The Samsung PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. The falling edge
interrupt will mostly work but it's not correct.
Fixes: 1fed2252713e ("ARM: dts: fix pinctrl for s2mps11-irq on exynos5420-arndale-octa") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20201210212903.216728-5-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The Samsung PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
The Samsung PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Fixes: faaf348ef468 ("ARM: dts: Add board dts file for exynos3250-rinato") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20201210212903.216728-3-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The Samsung PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
The Samsung PMIC datasheets describe the interrupt line as active low
with a requirement of acknowledge from the CPU. Without specifying the
interrupt type in Devicetree, kernel might apply some fixed
configuration, not necessarily working for this hardware.
Fixes: b004a34bd0ff ("ARM: dts: exynos: Add exynos3250-artik5 dtsi file for ARTIK5 module") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20201210212903.216728-1-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Looks like this was missed when patching the source to clear the structures
throughout, causing this one instance to clear the struct after the response
id is assigned.
Fixes: eddb7732119d ("Bluetooth: A2MP: Fix not initializing all members") Signed-off-by: Christopher William Snowhill <chris@kode54.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 4945af5b264f ("ath10k: enable SRRI/DRRI support on ddr for WCN3990") Signed-off-by: Rakesh Pillai <pillair@codeaurora.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1607713210-18320-1-git-send-email-pillair@codeaurora.org Signed-off-by: Sasha Levin <sashal@kernel.org>
MIPS uses its own declaration of rwdata, and thus it should be kept
in sync with the asm-generic one. Currently PAGE_ALIGNED_DATA() is
missing from the linker script, which emits the following ld
warnings:
mips-alpine-linux-musl-ld: warning: orphan section
`.data..page_aligned' from `arch/mips/kernel/vdso.o' being placed
in section `.data..page_aligned'
mips-alpine-linux-musl-ld: warning: orphan section
`.data..page_aligned' from `arch/mips/vdso/vdso-image.o' being placed
in section `.data..page_aligned'
Add the necessary declaration, so the mentioned structures will be
placed in vmlinux as intended:
The commit f274baa49be6 ("ALSA: usb-audio: Allow non-vmalloc buffer
for PCM buffers") introduced the mode to allocate coherent pages for
PCM buffers, and it used bus->controller device as its DMA device.
It turned out, however, that bus->sysdev is a more appropriate device
to be used for DMA mapping in HCD code.
This patch corrects the device reference accordingly.
Note that, on most platforms, both point to the very same device,
hence this patch doesn't change anything practically. But on
platforms like xhcd-plat hcd, the change becomes effective.
bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
makes sense to merge current bfq queue with the in-service queue.
However if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains value
from the previously scheduled bfq queue which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:
where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).
Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.
Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The use of PHY_REFCLK_USE_PAD introduced a regression for apq8064 devices.
It was tested that while apq doesn't require the padding, ipq SoC must use
it or the kernel hangs on boot.
Link: https://lore.kernel.org/r/20201019165555.8269-1-ansuelsmth@gmail.com Fixes: de3c4bf64897 ("PCI: qcom: Add support for tx term offset for rev 2.1.0") Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently kdb uses in_interrupt() to determine whether its library
code has been called from the kgdb trap handler or from a saner calling
context such as driver init. This approach is broken because
in_interrupt() alone isn't able to determine kgdb trap handler entry from
normal task context. This can happen during normal use of basic features
such as breakpoints and can also be trivially reproduced using:
echo g > /proc/sysrq-trigger
We can improve this by adding check for in_dbg_master() instead which
explicitly determines if we are running in debugger context.
We expect toolchains to produce these new debug info sections as part of
DWARF v5. Add explicit placements to prevent the linker warnings from
--orphan-section=warn.
Compilers may produce such sections with explicit -gdwarf-5, or based on
the implicit default version of DWARF when -g is used via DEBUG_INFO.
This implicit default changes over time, and has changed to DWARF v5
with GCC 11.
.debug_sup was mentioned in review, but without compilers producing it
today, let's wait to add it until it becomes necessary.
Cc: stable@vger.kernel.org Link: https://bugzilla.redhat.com/show_bug.cgi?id=1922707 Reported-by: Chris Murphy <lists@colorremedies.com> Suggested-by: Fangrui Song <maskray@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Mark Wielaard <mark@klomp.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even though the atomic_dec_and_mutex_lock() in
__static_key_slow_dec_cpuslocked() can never see a negative value in
key->enabled the subsequent sanity check is re-reading key->enabled, which may
have been set to -1 in the meantime by static_key_slow_inc_cpuslocked().
The kernel test robot reported the following issue:
CC [M] drivers/soc/litex/litex_soc_ctrl.o
sh4-linux-objcopy: Unable to change endianness of input file(s)
sh4-linux-ld: cannot find drivers/soc/litex/.tmp_gl_litex_soc_ctrl.o: No such file or directory
sh4-linux-objcopy: 'drivers/soc/litex/.tmp_mx_litex_soc_ctrl.o': No such file
The problem is that the format of input file is elf32-shbig-linux, but
sh4-linux-objcopy wants to output a file which format is elf32-sh-linux:
$ sh4-linux-objdump -d drivers/soc/litex/litex_soc_ctrl.o | grep format
drivers/soc/litex/litex_soc_ctrl.o: file format elf32-shbig-linux
While debugging another issue today, Steve and I noticed that if a
subdir for a file share is already mounted on the client, any new
mount of any other subdir (or the file share root) of the same share
results in sharing the cifs superblock, which e.g. can result in
incorrect device name.
While setting prefix path for the root of a cifs_sb,
CIFS_MOUNT_USE_PREFIX_PATH flag should also be set.
Without it, prepath is not even considered in some places,
and output of "mount" and various /proc/<>/*mount* related
options can be missing part of the device name.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release & acquire sysfs_lock before registering/un-registering elevator
queue during switching elevator for avoiding potential deadlock from
showing & storing 'queue/iosched' attributes and removing elevator's
kobject.
Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
required in .show & .store of queue/iosched's attributes, and just
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold queue's sysfs lock when
registering/un-registering elevator queue.
The biggest issue is that commit cecf5d87ff20 assumes that concurrent
write on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent write
aren't called on the same open file, but the write could be from
different open on the file. So we can't release & re-acquire queue's
sysfs lock during switching elevator, otherwise use-after-free on
elevator could be triggered.
Fixes the issue by not releasing queue's sysfs lock during switching
elevator.
Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks") Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(jwang: adjust ctx for 4.19) Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release & actuire sysfs_lock again during switching elevator. So it
isn't enough to prevent switching elevator from happening by simply
clearing QUEUE_FLAG_REGISTERED with holding sysfs_lock, because
in-progress switch still can move on after re-acquiring the lock,
meantime the flag of QUEUE_FLAG_REGISTERED won't get checked.
Fixes this issue by checking 'q->elevator' directly & locklessly after
q->kobj is removed in blk_unregister_queue(), this way is safe because
q->elevator can't be changed at that time.
Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks") Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
required.
However, when mq & iosched kobjects are removed via
blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
too. This way causes AB-BA lock because the kernfs built-in lock of
'kn-count' is required inside kobject_del() too, see the lockdep warning[1].
On the other hand, it isn't necessary to acquire q->sysfs_lock for
both blk_mq_unregister_dev() & elv_unregister_queue() because
clearing REGISTERED flag prevents storing to 'queue/scheduler'
from being happened. Also sysfs write(store) is exclusive, so no
necessary to hold the lock for elv_unregister_queue() when it is
called in switching elevator path.
So split .sysfs_lock into two: one is still named as .sysfs_lock for
covering sync .store, the other one is named as .sysfs_dir_lock
for covering kobjects and related status change.
sysfs itself can handle the race between add/remove kobjects and
showing/storing attributes under kobjects. For switching scheduler
via storing to 'queue/scheduler', we use the queue flag of
QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then
we can avoid to hold .sysfs_lock during removing/adding kobjects.
[1] lockdep warning
======================================================
WARNING: possible circular locking dependency detected 5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
------------------------------------------------------
rmmod/777 is trying to acquire lock: 00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72
but task is already holding lock: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(jwang:cherry picked from commit cecf5d87ff2035127bb5a9ee054d0023a4a7cad3,
adjust ctx for 4,19) Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are 4 users which check if queue is registered, so add one helper
to check it.
Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise build fails if the headers are not in the default location. While at
it also ask pkg-config for the libs, with fallback to the existing value.
HDA initialization is failing occasionally on Tegra210 and following
print is observed in the boot log. Because of this probe() fails and
no sound card is registered.
[16.800802] tegra-hda 70030000.hda: no codecs found!
Codecs request a state change and enumeration by the controller. In
failure cases this does not seem to happen as STATETS register reads 0.
The problem seems to be related to the HDA codec dependency on SOR
power domain. If it is gated during HDA probe then the failure is
observed. Building Tegra HDA driver into kernel image avoids this
failure but does not completely address the dependency part. Fix this
problem by adding 'power-domains' DT property for Tegra210 HDA. Note
that Tegra186 and Tegra194 HDA do this already.
The HID subsystem allows an "HID report field" to have a different
number of "values" and "usages" when it is allocated. When a field
struct is created, the size of the usage array is guaranteed to be at
least as large as the values array, but it may be larger. This leads to
a potential out-of-bounds write in
__hidinput_change_resolution_multipliers() and an out-of-bounds read in
hidinput_count_leds().
To fix this, let's make sure that both the usage and value arrays are
the same size.
In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.
It means that tlbs_dirty is always used as int and the higher 32 bits
is useless. We need to check tlbs_dirty in a correct way and this
change checks it directly without propagating it to need_tlb_flush.
Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
cause problems in practice. It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.
Cc: stable@vger.kernel.org Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function uses a goto-based loop, which may lead to an earlier error
getting discarded by a later iteration. Exit this ad-hoc loop when an
error was encountered.
The out-of-memory error path additionally fails to fill a structure
field looked at by xen_blkbk_unmap_prepare() before inspecting the
handle which does get properly set (to BLKBACK_INVALID_HANDLE).
Since the earlier exiting from the ad-hoc loop requires the same field
filling (invalidation) as that on the out-of-memory path, fold both
paths. While doing so, drop the pr_alert(), as extra log messages aren't
going to help the situation (the kernel will log oom conditions already
anyway).
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
Failure of the kernel part of the mapping operation should also be
indicated as an error to the caller, or else it may assume the
respective kernel VA is okay to access.
Furthermore gnttab_map_refs() failing still requires recording
successfully mapped handles, so they can be unmapped subsequently. This
in turn requires there to be a way to tell full hypercall failure from
partial success - preset map_op status fields such that they won't
"happen" to look as if the operation succeeded.
Also again use GNTST_okay instead of implying its value (zero).
We may not skip setting the field in the unmap structure when
GNTMAP_device_map is in use - such an unmap would fail to release the
respective resources (a page ref in the hypervisor). Otoh the field
doesn't need setting at all when GNTMAP_device_map is not in use.
To record the value for unmapping, we also better don't use our local
p2m: In particular after a subsequent change it may not have got updated
for all the batch elements. Instead it can simply be taken from the
respective map's results.
We can additionally avoid playing this game altogether for the kernel
part of the mappings in (x86) PV mode.
Its sibling (set_foreign_p2m_mapping()) as well as the sibling of its
only caller (gnttab_map_refs()) don't clean up after themselves in case
of error. Higher level callers are expected to do so. However, in order
for that to really clean up any partially set up state, the operation
should not terminate upon encountering an entry in unexpected state. It
is particularly relevant to notice here that set_foreign_p2m_mapping()
would skip setting up a p2m entry if its grant mapping failed, but it
would continue to set up further p2m entries as long as their mappings
succeeded.
Arguably down the road set_foreign_p2m_mapping() may want its page state
related WARN_ON() also converted to an error return.
The port ID for control messages was uncorrectly set with broadcast
node ID value, causing message to be dropped on remote side since
not passing packet filtering (cb->dst_port != QRTR_PORT_CTRL).
Fixes: d27e77a3de28 ("net: qrtr: Reset the node and port ID of broadcast messages") Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Paolo Bonzini [Thu, 18 Feb 2021 18:40:58 +0000 (13:40 -0500)]
KVM: SEV: fix double locking due to incorrect backport
Fix an incorrect line in the 5.4.y and 4.19.y backports of commit 19a23da53932bc ("Fix unsynchronized access to sev members through
svm_register_enc_region"), first applied to 5.4.98 and 4.19.176.
Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active") Reported-by: Dov Murik <dovmurik@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 5.4.x Cc: stable@vger.kernel.org # 4.19.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
20bf2b378729 ("x86/build: Disable CET instrumentation in the kernel")
disabled CET instrumentation which gets added by default by the Ubuntu
gcc9 and 10 by default, but did that only for 64-bit builds. It would
still fail when building a 32-bit target. So disable CET for all x86
builds.
Fixes: 20bf2b378729 ("x86/build: Disable CET instrumentation in the kernel") Reported-by: AC <achirvasub@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: AC <achirvasub@gmail.com> Link: https://lkml.kernel.org/r/YCCIgMHkzh/xT4ex@arch-chirva.localdomain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS
control message is passed with user-controlled
0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition.
The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400.
So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10
is the closest limit, with 0x10 leftover.
Same condition is currently done in rds_cmsg_rdma_args().
In vsock_shutdown() we touched some socket fields without holding the
socket lock, such as 'state' and 'sk_flags'.
Also, after the introduction of multi-transport, we are accessing
'vsk->transport' in vsock_send_shutdown() without holding the lock
and this call can be made while the connection is in progress, so
the transport can change in the meantime.
To avoid issues, we hold the socket lock when we enter in
vsock_shutdown() and release it when we leave.
Among the transports that implement the 'shutdown' callback, only
hyperv_transport acquired the lock. Since the caller now holds it,
we no longer take it.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prevent netif_tx_disable() running concurrently with dev_watchdog() by
taking the device global xmit lock. Otherwise, the recommended:
netif_carrier_off(dev);
netif_tx_disable(dev);
driver shutdown sequence can happen after the watchdog has already
checked carrier, resulting in possible false alarms. This is because
netif_tx_lock() only sets the frozen bit without maintaining the locks
on the individual queues.
Fixes: c3f26a269c24 ("netdev: Fix lockdep warnings in multiqueue configurations.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A possible locking issue in vsock_connect_timeout() was recognized by
Eric Dumazet which might cause a null pointer dereference in
vsock_transport_cancel_pkt(). This patch assures that
vsock_transport_cancel_pkt() will be called within the lock, so a race
condition won't occur which could result in vsk->transport to be set to NULL.
Fixes: 380feae0def7 ("vsock: cancel packets when failing to connect") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Norbert Slusarek <nslusarek@gmx.net> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/trinity-f8e0937a-cf0e-4d80-a76e-d9a958ba3ef1-1612535522360@3c-app-gmx-bap12 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sctp transport seq_file iterators take a reference to the transport
in the ->start and ->next functions and releases the reference in the
->show function. The preferred handling for such resources is to
release them in the subsequent ->next or ->stop function call.
Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak references.
So move the sctp_transport_put() call to ->next and ->stop.
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Reported-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Originally the procedure of the ULPI transaction finish detection has been
developed as a simple busy-loop with just decrementing counter and no
delays. It's wrong since on different systems the loop will take a
different time to complete. So if the system bus and CPU are fast enough
to overtake the ULPI bus and the companion PHY reaction, then we'll get to
take a false timeout error. Fix this by converting the busy-loop procedure
to take the standard bus speed, address value and the registers access
mode into account for the busy-loop delay calculation.
Here is the way the fix works. It's known that the ULPI bus is clocked
with 60MHz signal. In accordance with [1] the ULPI bus protocol is created
so to spend 5 and 6 clock periods for immediate register write and read
operations respectively, and 6 and 7 clock periods - for the extended
register writes and reads. Based on that we can easily pre-calculate the
time which will be needed for the controller to perform a requested IO
operation. Note we'll still preserve the attempts counter in case if the
DWC USB3 controller has got some internals delays.
[1] UTMI+ Low Pin Interface (ULPI) Specification, Revision 1.1,
October 20, 2004, pp. 30 - 36.
The digital filter related computation are present in the driver
however the programming of the filter within the IP is missing.
The maximum value for the DNF is wrong and should be 15 instead of 16.
arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.
The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error. 32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).
Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/1204 Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image") Signed-off-by: Fangrui Song <maskray@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Douglas Anderson <dianders@chromium.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The queue_id is received from vf, if use it directly,
an out-of-bound issue may be caused, so add a check for
this queue_id before using it in hclge_reset_vf_queue().
Fixes: 1a426f8b40fc ("net: hns3: fix the VF queue reset flow error") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When disable CBS, mode_to_use parameter is not updated even the operation
mode of Tx Queue is changed to Data Centre Bridging (DCB). Therefore,
when tc_setup_cbs() function is called to re-enable CBS, the operation
mode of Tx Queue remains at DCB, which causing CBS fails to work.
This patch updates the value of mode_to_use parameter to MTL_QUEUE_DCB
after operation mode of Tx Queue is changed to DCB in stmmac_dma_qmode()
callback function.
Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC") Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/1612447396-20351-1-git-send-email-yoong.siang.song@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit 23025393dbeb3b8b3 ("xen/netback: use lateeoi irq binding")
xenvif_rx_ring_slots_available() is no longer called only from the rx
queue kernel thread, so it needs to access the rx queue with the
associated queue held.
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Fixes: 23025393dbeb3b8b3 ("xen/netback: use lateeoi irq binding") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org> Link: https://lore.kernel.org/r/20210202070938.7863-1-jgross@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When updating the tcp or udp header checksum on port nat the function
inet_proto_csum_replace2 with the last parameter pseudohdr as true.
This leads to an error in the case that GRO is used and packets are
split up in GSO. The tcp or udp checksum of all packets is incorrect.
The error is probably masked due to the fact the most network driver
implement tcp/udp checksum offloading. It also only happens when GRO is
applied and not on single packets.
The error is most visible when using a pppoe connection which is not
triggering the tcp/udp checksum offload.
When both --reap and --update flag are specified, there's a code
path at which the entry to be updated is reaped beforehand,
which then leads to kernel crash. Reap only entries which won't be
updated.
Fixes kernel bugzilla #207773.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207773 Reported-by: Reindl Harald <h.reindl@thelounge.net> Fixes: 0079c5aee348 ("netfilter: xt_recent: add an entry reaper") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On 32-bit architecture, roundup_pow_of_two() can return 0 when the argument
has upper most bit set due to resulting 1UL << 32. Add a check for this case.
Fix a memory leak in mt76_add_fragment routine returning the buffer
to the page_frag_cache when we receive a new fragment and the
skb_shared_info frag array is full.
Giancarlo Ferrari reports the following oops while trying to use kexec:
Unable to handle kernel paging request at virtual address 80112f38
pgd = fd7ef03e
[80112f38] *pgd=0001141e(bad)
Internal error: Oops: 80d [#1] PREEMPT SMP ARM
...
This is caused by machine_kexec() trying to set the kernel text to be
read/write, so it can poke values into the relocation code before
copying it - and an interrupt occuring which changes the page tables.
The subsequent writes then hit read-only sections that trigger a
data abort resulting in the above oops.
Fix this by copying the relocation code, and then writing the variables
into the destination, thereby avoiding the need to make the kernel text
read/write.
Reported-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The lpc32xx clock driver is not able to actually change the PLL rate as
this would require reparenting ARM_CLK, DDRAM_CLK, PERIPH_CLK to SYSCLK,
then stop the PLL, update the register, restart the PLL and wait for the
PLL to lock and finally reparent ARM_CLK, DDRAM_CLK, PERIPH_CLK to HCLK
PLL.
Currently, the HCLK driver simply updates the registers but this has no
real effect and all the clock rate calculation end up being wrong. This is
especially annoying for the peripheral (e.g. UARTs, I2C, SPI).
bfq.limit_depth passes word_depths[] as shallow_depth down to sbitmap core
sbitmap_get_shallow, which uses just the number to limit the scan depth of
each bitmap word, formula:
scan_percentage_for_each_word = shallow_depth / (1 << sbimap->shift) * 100%
That means the comments's percentiles 50%, 75%, 18%, 37% of bfq are correct.
But after commit patch 'bfq: Fix computation of shallow depth', we use
sbitmap.depth instead, as a example in following case:
sbitmap.depth = 256, map_nr = 4, shift = 6; sbitmap_word.depth = 64.
The resulsts of computed bfqd->word_depths[] are {128, 192, 48, 96}, and
three of the numbers exceed core dirver's 'sbitmap_word.depth=64' limit
nothing.
Signed-off-by: Lin Feng <linf@wangsu.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
virt_addr_valid macro checks that a virtual address is valid, ie that
the address belongs to the linear mapping and that the corresponding
physical page exists.
Add the missing check that ensures the virtual address belongs to the
linear mapping, otherwise __virt_to_phys, when compiled with
CONFIG_DEBUG_VIRTUAL enabled, raises a WARN that is interpreted as a
kernel bug by syzbot.
[why]
drm_atomic_commit was changed so that the caller must free their
drm_atomic_state reference on successes.
[how]
Add drm_atomic_commit_put after drm_atomic_commit call in
dm_force_atomic_commit.
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Roman Li <Roman.Li@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[why]
prev_sink is not used anywhere else in the function and the reference to
it from dc_link is replaced with a new dc_sink.
[how]
Change dc_sink_retain(prev_sink) to dc_sink_release(prev_sink).
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Anson Jacob <Anson.Jacob@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.
When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.
This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().
ovl_copy_xattr() skips copy up of security labels that are indentified by
inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr().
Since we are not going to copy them, skip vfs_getxattr() of the security
labels.
Reported-by: Michael Labriola <michael.d.labriola@gmail.com> Tested-by: Michael Labriola <michael.d.labriola@gmail.com> Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a capability is stored on disk in v2 format cap_inode_getsecurity() will
currently return in v2 format unconditionally.
This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid,
and so the same conversions performed on it.
If the rootid cannot be mapped, v3 is returned unconverted. Fix this so
that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
user namespace in case of v2) cannot be mapped into the current user
namespace.
The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr
exist on a lower layer file that is to be removed. If the xattr does not
exist, then no need to copy up the file.
This call of vfs_getxattr() wasn't wrapped in credential override, and this
is probably okay. But for consitency wrap this instance as well.
Recently userspace has started making more use of SW_TABLET_MODE
(when an input-dev reports this).
Specifically recent GNOME3 versions will:
1. When SW_TABLET_MODE is reported and is reporting 0:
1.1 Disable accelerometer-based screen auto-rotation
1.2 Disable automatically showing the on-screen keyboard when a
text-input field is focussed
2. When SW_TABLET_MODE is reported and is reporting 1:
2.1 Ignore input-events from the builtin keyboard and touchpad
(this is for 360° hinges style 2-in-1s where the keyboard and
touchpads are accessible on the back of the tablet when folded
into tablet-mode)
This means that claiming to support SW_TABLET_MODE when it does not
actually work / reports correct values has bad side-effects.
The check in the hp-wmi code which is used to decide if the input-dev
should claim SW_TABLET_MODE support, only checks if the
HPWMI_HARDWARE_QUERY is supported. It does *not* check if the hardware
actually is capable of reporting SW_TABLET_MODE.
This leads to the hp-wmi input-dev claiming SW_TABLET_MODE support,
while in reality it will always report 0 as SW_TABLET_MODE value.
This has been seen on a "HP ENVY x360 Convertible 15-cp0xxx" and
this likely is the case on a whole lot of other HP models.
This problem causes both auto-rotation and on-screen keyboard
support to not work on affected x360 models.
There is no easy fix for this, but since userspace expects
SW_TABLET_MODE reporting to be reliable when advertised it is
better to not claim/report SW_TABLET_MODE support at all, then
to claim to support it while it does not work.
To avoid the mentioned problems, add a new enable_tablet_mode_sw
module-parameter which defaults to false.
Note I've made this an int using the standard -1=auto, 0=off, 1=on
triplett, with the hope that in the future we can come up with a
better way to detect SW_TABLET_MODE support. ATM the default
auto option just does the same as off.
It recently became apparent that the lack of a 'device_type = "pci"'
in the PCIe root complex node for rk3399 is a violation of the PCI
binding, as documented in IEEE Std 1275-1994. Changes to the kernel's
parsing of the DT made such violation fatal, as drivers cannot
probe the controller anymore.
Add the missing property makes the PCIe node compliant. While we
are at it, drop the pointless linux,pci-domain property, which only
makes sense when there are multiple host bridges.
After Commit 3499ba8198cad ("xen: Fix event channel callback via
INTX/GSI"), xenbus_probe() will be called too early on Arm. This will
recent to a guest hang during boot.
If the hang wasn't there, we would have ended up to call
xenbus_probe() twice (the second time is in xenbus_probe_initcall()).
We don't need to initialize xenbus_probe() early for Arm guest.
Therefore, the call in xen_guest_init() is now removed.
After this change, there is no more external caller for xenbus_probe().
So the function is turned to a static one. Interestingly there were two
prototypes for it.
Cc: stable@vger.kernel.org Fixes: 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI") Reported-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20210210170654.5377-1-julien@xen.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When filters are used by trace events, a page is allocated on each CPU and
used to copy the trace event fields to this page before writing to the ring
buffer. The reason to use the filter and not write directly into the ring
buffer is because a filter may discard the event and there's more overhead
on discarding from the ring buffer than the extra copy.
The problem here is that there is no check against the size being allocated
when using this page. If an event asks for more than a page size while being
filtered, it will get only a page, leading to the caller writing more that
what was allocated.
Check the length of the request, and if it is more than PAGE_SIZE minus the
header default back to allocating from the ring buffer directly. The ring
buffer may reject the event if its too big anyway, but it wont overflow.
Link: https://lore.kernel.org/ath10k/1612839593-2308-1-git-send-email-wgong@codeaurora.org/ Cc: stable@vger.kernel.org Fixes: 0fc1b09ff1ff4 ("tracing: Use temp buffer when filtering events") Reported-by: Wen Gong <wgong@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The file /sys/kernel/tracing/events/enable is used to enable all events by
echoing in "1", or disabling all events when echoing in "0". To know if all
events are enabled, disabled, or some are enabled but not all of them,
cating the file should show either "1" (all enabled), "0" (all disabled), or
"X" (some enabled but not all of them). This works the same as the "enable"
files in the individule system directories (like tracing/events/sched/enable).
But when all events are enabled, the top level "enable" file shows "X". The
reason is that its checking the "ftrace" events, which are special events
that only exist for their format files. These include the format for the
function tracer events, that are enabled when the function tracer is
enabled, but not by the "enable" file. The check includes these events,
which will always be disabled, and even though all true events are enabled,
the top level "enable" file will show "X" instead of "1".
To fix this, have the check test the event's flags to see if it has the
"IGNORE_ENABLE" flag set, and if so, not test it.
Cc: stable@vger.kernel.org Fixes: 553552ce1796c ("tracing: Combine event filter_active and enable into single flags field") Reported-by: "Yordan Karadzhov (VMware)" <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit eaa7995c529b54 (regulator: core: avoid
regulator_resolve_supply() race condition) we started holding the rdev
lock while resolving supplies, an operation that requires holding the
regulator_list_mutex. This results in lockdep warnings since in other
places we take the list mutex then the mutex on an individual rdev.
Since the goal is to make sure that we don't call set_supply() twice
rather than a concern about the cost of resolution pull the rdev lock
and check for duplicate resolution down to immediately before we do the
set_supply() and drop it again once the allocation is done.
Fixes: eaa7995c529b54 (regulator: core: avoid regulator_resolve_supply() race condition) Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210122132042.10306-1-broonie@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
At the end of regulator_resolve_supply() we have historically turned
on our supply in some cases. This could be for one of two reasons:
1. If resolving supplies was happening before the call to
set_machine_constraints() we needed to predict if
set_machine_constraints() was going to turn the regulator on and we
needed to preemptively turn the supply on.
2. Maybe set_machine_constraints() happened before we could resolve
supplies (because we failed the first time to resolve) and thus we
might need to propagate an enable that already happened up to our
supply.
Historically regulator_resolve_supply() used _regulator_is_enabled()
to decide whether to turn on the supply.
Let's change things a little bit. Specifically:
1. Let's try to enable the supply and the regulator in the same place,
both in set_machine_constraints(). This means that we have exactly
the same logic for enabling the supply and the regulator.
2. Let's properly set use_count when we enable always-on or boot-on
regulators even for those that don't have supplies. The previous
commit 1fc12b05895e ("regulator: core: Avoid propagating to
supplies when possible") only did this right for regulators with
supplies.
3. Let's make it clear that the only time we need to enable the supply
in regulator_resolve_supply() is if the main regulator is currently
in use. By using use_count (like the rest of the code) to decide
if we're going to enable our supply we keep everything consistent.
Overall the new scheme should be cleaner and easier to reason about.
In addition to fixing regulator_summary to be more correct (because of
the more correct use_count), this change also has the effect of no
longer using _regulator_is_enabled() in this code path.
_regulator_is_enabled() could return an error code for some regulators
at bootup (like RPMh) that can't read their initial state. While one
can argue that the design of those regulators is sub-optimal, the new
logic sidesteps this brokenness. This fix in particular fixes
observed problems on Qualcomm sdm845 boards which use the
above-mentioned RPMh regulator. Those problems were made worse by
commit 1fc12b05895e ("regulator: core: Avoid propagating to supplies
when possible") because now we'd think at bootup that the SD
regulators were already enabled and we'd never try them again.
Fixes: 1fc12b05895e ("regulator: core: Avoid propagating to supplies when possible") Reported-by: Evan Green <evgreen@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When a regulator is marked as always on, it is enabled early on, when
checking and setting up constraints. It makes the assumption that the
bootloader properly initialized the regulator, and just in case enables
the regulator anyway.
Some constraints however currently get missed, such as the soft-start
and ramp-delay. This causes the regulator to be enabled, without the
soft-start and ramp-delay being applied, which in turn can cause
high-currents or other start-up problems.
By moving the always-enabled constraints later in the constraints check,
we can at least ensure all constraints for the regulator are followed.
Signed-off-by: Olliver Schinagl <oliver@schinagl.nl> Signed-off-by: Priit Laes <plaes@plaes.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit. This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.
This patch adds a number of additional sanity checks to detect this
corruption and others.
1. It checks for a corrupted xattr index read from the inode. This could
be because the metadata block is uncompressed, or because the
"compression" bit has been corrupted (turning a compressed block
into an uncompressed block). This would cause an out of bounds read.
2. It checks against corruption of the xattr_ids count. This can either
lead to the above kmalloc failure, or a smaller than expected
table to be read.
3. It checks the contents of the index table for corruption.
Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode. This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the inodes count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large inodes count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
Sysbot has reported a number of "slab-out-of-bounds reads" and
"use-after-free read" errors which has been identified as being caused
by a corrupted index value read from the inode. This could be because
the metadata block is uncompressed, or because the "compression" bit has
been corrupted (turning a compressed block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the ids count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large ids count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
blk_mq_map_swqueue() is called from blk_mq_init_allocated_queue()
and blk_mq_update_nr_hw_queues(). For the former caller, the kobject
isn't exposed to userspace yet. For the latter caller, hctx sysfs entries
and debugfs are un-registered before updating nr_hw_queues.
On the other hand, commit 2f8f1336a48b ("blk-mq: always free hctx after
request queue is freed") moves freeing hctx into queue's release
handler, so there won't be race with queue release path too.
So don't hold q->sysfs_lock in blk_mq_map_swqueue().
Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
q->sysfs_lock must be held to provide mutual exclusion between
elevator_switch() and here.
Which is simply wrong. elevator_init_mq() is only called from
blk_mq_init_allocated_queue, which is always called before the request
queue is registered via blk_register_queue(), for dm-rq or normal rq
based driver. However, queue's kobject is only exposed and added to sysfs
in blk_register_queue(). So there isn't such race between elevator_switch()
and elevator_init_mq().
So avoid to hold q->sysfs_lock in elevator_init_mq().
Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.
Add a lockdep assertion to help prevent future regressions.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active") Signed-off-by: Peter Gonda <pgonda@google.com>
V2
- Fix up patch description
- Correct file paths svm.c -> sev.c
- Add unlock of kvm->lock on sev_pin_memory error
Without memcg, there is a one-to-one mapping between the bdi and
bdi_writeback structures. In this world, things are fairly
straightforward; the first thing bdi_unregister() does is to shutdown
the bdi_writeback structure (or wb), and part of that writeback ensures
that no other work queued against the wb, and that the wb is fully
drained.
With memcg, however, there is a one-to-many relationship between the bdi
and bdi_writeback structures; that is, there are multiple wb objects
which can all point to a single bdi. There is a refcount which prevents
the bdi object from being released (and hence, unregistered). So in
theory, the bdi_unregister() *should* only get called once its refcount
goes to zero (bdi_put will drop the refcount, and when it is zero,
release_bdi gets called, which calls bdi_unregister).
Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about
the Brave New memcg World, and calls bdi_unregister directly. It does
this without informing the file system, or the memcg code, or anything
else. This causes the root wb associated with the bdi to be
unregistered, but none of the memcg-specific wb's are shutdown. So when
one of these wb's are woken up to do delayed work, they try to
dereference their wb->bdi->dev to fetch the device name, but
unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
called by del_gendisk(). As a result, *boom*.
Fortunately, it looks like the rest of the writeback path is perfectly
happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
create a bdi_dev_name() function which can handle bdi->dev being NULL.
This also allows us to bulletproof the writeback tracepoints to prevent
them from dereferencing a NULL pointer and crashing the kernel if one is
tracing with memcg's enabled, and an iSCSI device dies or a USB storage
stick is pulled.
The most common way of triggering this will be hotremoval of a device
while writeback with memcg enabled is going on. It was triggering
several times a day in a heavily loaded production environment.
In file included from ./arch/powerpc/include/asm/paca.h:15,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:51,
from fs/fs-writeback.c:19:
In function 'strncpy',
inlined from 'perf_trace_writeback_page_template' at
./include/trace/events/writeback.h:56:1:
./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified
bound 32 equals destination size [-Wstringop-truncation]
return __builtin_strncpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix it by using the new strscpy_pad() which was introduced in "lib/string:
Add strscpy_pad() function" and will always be NUL-terminated instead of
strncpy(). Also, change strlcpy() to use strscpy_pad() in this file for
consistency.
Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw Fixes: 455b2864686d ("writeback: Initial tracing support") Fixes: 028c2dd184c0 ("writeback: Add tracing to balance_dirty_pages") Fixes: e84d0a4f8e39 ("writeback: trace event writeback_queue_io") Fixes: b48c104d2211 ("writeback: trace event bdi_dirty_ratelimit") Fixes: cc1676d917f3 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()") Fixes: 9fb0a7da0c52 ("writeback: add more tracepoints") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nitin Gote <nitin.r.gote@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Stephen Kitt <steve@sk2.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have a function to copy strings safely and we have a function to copy
strings and zero the tail of the destination (if source string is
shorter than destination buffer) but we do not have a function to do
both at once. This means developers must write this themselves if they
desire this functionality. This is a chore, and also leaves us open to
off by one errors unnecessarily.
Add a function that calls strscpy() then memset()s the tail to zero if
the source string is shorter than the destination buffer.
When handling an auth_gss downcall, it's possible to get 0-length
opaque object for the acceptor. In the case of a 0-length XDR
object, make sure simple_get_netobj() fills in dest->data = NULL,
and does not continue to kmemdup() which will set
dest->data = ZERO_SIZE_PTR for the acceptor.
The trace event code can handle NULL but not ZERO_SIZE_PTR for a
string, and so without this patch the rpcgss_context trace event
will crash the kernel as follows:
Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.
If we get into a problem severe enough to attempt a reprobe,
we schedule a worker to do that. However, if the problem gets
more severe and the device is actually destroyed before this
worker has a chance to run, we use a free device. Bump up the
reference count of the device until the worker runs to avoid
this situation.