git.ipfire.org Git - thirdparty/kernel/stable.git/log

media: rc-loopback: return number of emitters rather than error

commit 6b7f554be8c92319d7e6df92fd247ebb9beb4a45 upstream.

The LIRC_SET_TRANSMITTER_MASK ioctl should return the number of emitters
if an invalid list was set.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: uvc: don't do DMA on stack

commit 1a10d7fdb6d0e235e9d230916244cc2769d3f170 upstream.

As warned by smatch:
drivers/media/usb/uvc/uvc_v4l2.c:911 uvc_ioctl_g_input() error: doing dma on the stack (&i)
drivers/media/usb/uvc/uvc_v4l2.c:943 uvc_ioctl_s_input() error: doing dma on the stack (&i)

those two functions call uvc_query_ctrl passing a pointer to
a data at the DMA stack. those are used to send URBs via
usb_control_msg(). Using DMA stack is not supported and should
not work anymore on modern Linux versions.

So, use a kmalloc'ed buffer.

Cc: stable@vger.kernel.org # Kernel 4.9 and upper
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

VMCI: fix NULL pointer dereference when unmapping queue pair

commit a30dc6cf0dc51419021550152e435736aaef8799 upstream.

I got a NULL pointer dereference report when doing fuzz test:

Call Trace:
  qp_release_pages+0xae/0x130
  qp_host_unregister_user_memory.isra.25+0x2d/0x80
  vmci_qp_broker_unmap+0x191/0x320
  ? vmci_host_do_alloc_queuepair.isra.9+0x1c0/0x1c0
  vmci_host_unlocked_ioctl+0x59f/0xd50
  ? do_vfs_ioctl+0x14b/0xa10
  ? tomoyo_file_ioctl+0x28/0x30
  ? vmci_host_do_alloc_queuepair.isra.9+0x1c0/0x1c0
  __x64_sys_ioctl+0xea/0x120
  do_syscall_64+0x34/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

When a queue pair is created by the following call, it will not
register the user memory if the page_store is NULL, and the
entry->state will be set to VMCIQPB_CREATED_NO_MEM.

vmci_host_unlocked_ioctl
  vmci_host_do_alloc_queuepair
    vmci_qp_broker_alloc
      qp_broker_alloc
        qp_broker_create // set entry->state = VMCIQPB_CREATED_NO_MEM;

When unmapping this queue pair, qp_host_unregister_user_memory() will
be called to unregister the non-existent user memory, which will
result in a null pointer reference. It will also change
VMCIQPB_CREATED_NO_MEM to VMCIQPB_CREATED_MEM, which should not be
present in this operation.

Only when the qp broker has mem, it can unregister the user
memory when unmapping the qp broker.

Only when the qp broker has no mem, it can register the user
memory when mapping the qp broker.

Fixes: 06164d2b72aa ("VMCI: queue pairs implementation.")
Cc: stable <stable@vger.kernel.org>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20210818124845.488312-1-wanghai38@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: ccp - shutdown SEV firmware on kexec

commit 5441a07a127f106c9936e4f9fa1a8a93e3f31828 upstream.

The commit 97f9ac3db6612 ("crypto: ccp - Add support for SEV-ES to the
PSP driver") added support to allocate Trusted Memory Region (TMR)
used during the SEV-ES firmware initialization. The TMR gets locked
during the firmware initialization and unlocked during the shutdown.
While the TMR is locked, access to it is disallowed.

Currently, the CCP driver does not shutdown the firmware during the
kexec reboot, leaving the TMR memory locked.

Register a callback to shutdown the SEV firmware on the kexec boot.

Fixes: 97f9ac3db6612 ("crypto: ccp - Add support for SEV-ES to the PSP driver")
Reported-by: Lucas Nussbaum <lucas.nussbaum@inria.fr>
Tested-by: Lucas Nussbaum <lucas.nussbaum@inria.fr>
Cc: <stable@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()

commit 528b16bfc3ae5f11638e71b3b63a81f9999df727 upstream.

On systems with many cores using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the page allocation limit
for a given device is reached or close to be reached. This is due
to percpu_counter_compare() taking a spinlock to compute an exact
result on potentially many CPUs at the same time.

Switch to non-exact comparison of allocated and allowed pages by using
the value returned by percpu_counter_read_positive() to avoid taking
the percpu_counter spinlock.

This may over/under estimate the actual number of allocated pages by at
most (batch-1) * num_online_cpus().

Currently, batch is bounded by 32. The system on which this issue was
first observed has 256 CPUs and 512GB of RAM. With a 4k page size, this
change may over/under estimate by 31MB. With ~10G (2%) allowed dm-crypt
allocations, this seems an acceptable error. Certainly preferred over
running into the spinlock contention.

This behavior was reproduced on an EC2 c5.24xlarge instance with 96 CPUs
and 192GB RAM as follows, but can be provoked on systems with less CPUs
as well.

* Disable swap
* Tune vm settings to promote regular writeback
     $ echo 50 > /proc/sys/vm/dirty_expire_centisecs
     $ echo 25 > /proc/sys/vm/dirty_writeback_centisecs
     $ echo $((128 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes

* Create 8 dmcrypt devices based on files on a tmpfs
* Create and mount an ext4 filesystem on each crypt devices
* Run stress-ng --hdd 8 within one of above filesystems

Total %system usage collected from sysstat goes to ~35%. Write throughput
on the underlying loop device is ~2GB/s. perf profiling an individual
kworker kcryptd thread shows the following profile, indicating spinlock
contention in percpu_counter_compare():

    99.98%     0.00%  kworker/u193:46  [kernel.kallsyms]  [k] ret_from_fork
      |
      --ret_from_fork
        kthread
        worker_thread
        |
         --99.92%--process_one_work
            |
            |--80.52%--kcryptd_crypt
            |    |
            |    |--62.58%--mempool_alloc
            |    |  |
            |    |   --62.24%--crypt_page_alloc
            |    |     |
            |    |      --61.51%--__percpu_counter_compare
            |    |        |
            |    |         --61.34%--__percpu_counter_sum
            |    |           |
            |    |           |--58.68%--_raw_spin_lock_irqsave
            |    |           |  |
            |    |           |   --58.30%--native_queued_spin_lock_slowpath
            |    |           |
            |    |            --0.69%--cpumask_next
            |    |                |
            |    |                 --0.51%--_find_next_bit
            |    |
            |    |--10.61%--crypt_convert
            |    |          |
            |    |          |--6.05%--xts_crypt
            ...

After applying this patch and running the same test, %system usage is
lowered to ~7% and write throughput on the loop device increases
to ~2.7GB/s. perf report shows mempool_alloc() as ~8% rather than ~62%
in the profile and not hitting the percpu_counter() spinlock anymore.

    |--8.15%--mempool_alloc
    |    |
    |    |--3.93%--crypt_page_alloc
    |    |    |
    |    |     --3.75%--__alloc_pages
    |    |         |
    |    |          --3.62%--get_page_from_freelist
    |    |              |
    |    |               --3.22%--rmqueue_bulk
    |    |                   |
    |    |                    --2.59%--_raw_spin_lock
    |    |                      |
    |    |                       --2.57%--native_queued_spin_lock_slowpath
    |    |
    |     --3.05%--_raw_spin_lock_irqsave
    |               |
    |                --2.49%--native_queued_spin_lock_slowpath

Suggested-by: DJ Gregor <dj@corelight.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Arne Welzel <arne.welzel@corelight.com>
Fixes: 5059353df86e ("dm crypt: limit the number of allocated pages")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

power: supply: max17042: handle fails of reading status register

commit 54784ffa5b267f57161eb8fbb811499f22a0a0bf upstream.

Reading status register can fail in the interrupt handler. In such
case, the regmap_read() will not store anything useful under passed
'val' variable and random stack value will be used to determine type of
interrupt.

Handle the regmap_read() failure to avoid handling interrupt type and
triggering changed power supply event based on random stack value.

Fixes: 39e7213edc4f ("max17042_battery: Support regmap to access device's registers")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: bfq: fix bfq_set_next_ioprio_data()

commit a680dd72ec336b81511e3bff48efac6dbfa563e7 upstream.

For a request that has a priority level equal to or larger than
IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but
defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This
is not consistent with the warning and the allowed values for priority
levels. Fix this by setting the request new_ioprio field to
IOPRIO_BE_NR - 1, the lowest priority level allowed.

Cc: <stable@vger.kernel.org>
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210811033702.368488-2-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: public_key: fix overflow during implicit conversion

commit f985911b7bc75d5c98ed24d8aaa8b94c590f7c6a upstream.

Hit kernel warning like this, it can be reproduced by verifying 256
bytes datafile by keyctl command, run script:
RAWDATA=rawdata
SIGDATA=sigdata

modprobe pkcs8_key_parser

rm -rf *.der *.pem *.pfx
rm -rf $RAWDATA
dd if=/dev/random of=$RAWDATA bs=256 count=1

openssl req -nodes -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -subj "/C=CN/ST=GD/L=SZ/O=vihoo/OU=dev/CN=xx.com/emailAddress=yy@xx.com"

KEY_ID=`openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER | keyctl \
  padd asymmetric 123 @s`

keyctl pkey_sign $KEY_ID 0 $RAWDATA enc=pkcs1 hash=sha1 > $SIGDATA
keyctl pkey_verify $KEY_ID 0 $RAWDATA $SIGDATA enc=pkcs1 hash=sha1

Then the kernel reports:
WARNING: CPU: 5 PID: 344556 at crypto/rsa-pkcs1pad.c:540
   pkcs1pad_verify+0x160/0x190
...
Call Trace:
  public_key_verify_signature+0x282/0x380
  ? software_key_query+0x12d/0x180
  ? keyctl_pkey_params_get+0xd6/0x130
  asymmetric_key_verify_signature+0x66/0x80
  keyctl_pkey_verify+0xa5/0x100
  do_syscall_64+0x35/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

The reason of this issue, in function 'asymmetric_key_verify_signature':
'.digest_size(u8) = params->in_len(u32)' leads overflow of an u8 value,
so use u32 instead of u8 for digest_size field. And reorder struct
public_key_signature, it saves 8 bytes on a 64-bit machine.

Cc: stable@vger.kernel.org
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wcn36xx: Ensure finish scan is not requested before start scan

commit d195d7aac09bddabc2c8326fb02fcec2b0a2de02 upstream.

If the operating channel is the first in the scan list, it was seen that
a finish scan request would be sent before a start scan request was
sent, causing the firmware to fail all future scans. Track the current
channel being scanned to avoid requesting the scan finish before it
starts.

Cc: <stable@vger.kernel.org>
Fixes: 5973a2947430 ("wcn36xx: Fix software-driven scan")
Signed-off-by: Joseph Gates <jgates@squareup.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1629286303-13179-1-git-send-email-loic.poulain@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iio: ltc2983: fix device probe

commit b76d26d69ecc97ebb24aaf40427a13c808a4f488 upstream.

There is no reason to assume that the IRQ rising edge (indicating that
the device start up phase is done) will happen after we request the IRQ.
If the device is already up by the time we request it, the call to
'wait_for_completion_timeout()' will timeout and we will fail the device
probe even though there's nothing wrong.

Fix it by just polling the status register until we get the indication that
the device is up and running. As a side effect of this fix, requesting the
IRQ is also moved to after the setup function.

Fixes: f110f3188e563 ("iio: temperature: Add support for LTC2983")
Reported-and-tested-by: Drew Fustini <drew@pdp7.com>
Reviewed-by: Drew Fustini <drew@pdp7.com>
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210811133220.190264-2-nuno.sa@analog.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: head: avoid over-mapping in map_memory

commit 90268574a3e8a6b883bd802d702a2738577e1006 upstream.

The `compute_indices` and `populate_entries` macros operate on inclusive
bounds, and thus the `map_memory` macro which uses them also operates
on inclusive bounds.

We pass `_end` and `_idmap_text_end` to `map_memory`, but these are
exclusive bounds, and if one of these is sufficiently aligned (as a
result of kernel configuration, physical placement, and KASLR), then:

* In `compute_indices`, the computed `iend` will be in the page/block *after*
  the final byte of the intended mapping.

* In `populate_entries`, an unnecessary entry will be created at the end
  of each level of table. At the leaf level, this entry will map up to
  SWAPPER_BLOCK_SIZE bytes of physical addresses that we did not intend
  to map.

As we may map up to SWAPPER_BLOCK_SIZE bytes more than intended, we may
violate the boot protocol and map physical address past the 2MiB-aligned
end address we are permitted to map. As we map these with Normal memory
attributes, this may result in further problems depending on what these
physical addresses correspond to.

The final entry at each level may require an additional table at that
level. As EARLY_ENTRIES() calculates an inclusive bound, we allocate
enough memory for this.

Avoid the extraneous mapping by having map_memory convert the exclusive
end address to an inclusive end address by subtracting one, and do
likewise in EARLY_ENTRIES() when calculating the number of required
tables. For clarity, comments are updated to more clearly document which
boundaries the macros operate on.  For consistency with the other
macros, the comments in map_memory are also updated to describe `vstart`
and `vend` as virtual addresses.

Fixes: 0370b31e4845 ("arm64: Extend early page table code to allow for larger kernels")
Cc: <stable@vger.kernel.org> # 4.16.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210823101253.55567-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: mm: Fix TLBI vs ASID rollover

commit 5e10f9887ed85d4f59266d5c60dd09be96b5dbd4 upstream.

When switching to an 'mm_struct' for the first time following an ASID
rollover, a new ASID may be allocated and assigned to 'mm->context.id'.
This reassignment can happen concurrently with other operations on the
mm, such as unmapping pages and subsequently issuing TLB invalidation.

Consequently, we need to ensure that (a) accesses to 'mm->context.id'
are atomic and (b) all page-table updates made prior to a TLBI using the
old ASID are guaranteed to be visible to CPUs running with the new ASID.

This was found by inspection after reviewing the VMID changes from
Shameer but it looks like a real (yet hard to hit) bug.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210806113109.2475-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

soc: aspeed: p2a-ctrl: Fix boundary check for mmap

commit 8b07e990fb254fcbaa919616ac77f981cb48c73d upstream.

The check mixes pages (vm_pgoff) with bytes (vm_start, vm_end) on one
side of the comparison, and uses resource address (rather than just the
resource size) on the other side of the comparison.
This can allow malicious userspace to easily bypass the boundary check and
map pages that are located outside memory-region reserved by the driver.

Fixes: 01c60dcea9f7 ("drivers/misc: Add Aspeed P2A control driver")
Cc: stable@vger.kernel.org
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Tested-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Joel Stanley <joel@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

soc: aspeed: lpc-ctrl: Fix boundary check for mmap

commit b49a0e69a7b1a68c8d3f64097d06dabb770fec96 upstream.

The check mixes pages (vm_pgoff) with bytes (vm_start, vm_end) on one
side of the comparison, and uses resource address (rather than just the
resource size) on the other side of the comparison.
This can allow malicious userspace to easily bypass the boundary check and
map pages that are located outside memory-region reserved by the driver.

Fixes: 6c4e97678501 ("drivers/misc: Add Aspeed LPC control driver")
Cc: stable@vger.kernel.org
Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Tested-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Joel Stanley <joel@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

soc: qcom: aoss: Fix the out of bound usage of cooling_devs

commit a89f355e469dcda129c2522be4fdba00c1c74c83 upstream.

In "qmp_cooling_devices_register", the count value is initially
QMP_NUM_COOLING_RESOURCES, which is 2. Based on the initial count value,
the memory for cooling_devs is allocated. Then while calling the
"qmp_cooling_device_add" function, count value is post-incremented for
each child node.

This makes the out of bound access to the cooling_dev array. Fix it by
passing the QMP_NUM_COOLING_RESOURCES definition to devm_kzalloc() and
initializing the count to 0.

While at it, let's also free the memory allocated to cooling_dev if no
cooling device is found in DT and during unroll phase.

Cc: stable@vger.kernel.org # 5.4
Fixes: 05589b30b21a ("soc: qcom: Extend AOSS QMP driver to support resources that are used to wake up the SoC.")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20210629153249.73428-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl: ingenic: Fix incorrect pull up/down info

commit d5e931403942b3af39212960c2592b5ba741b2bf upstream.

Fix the pull up/down info for both the JZ4760 and JZ4770 SoCs, as the
previous values sometimes contradicted what's written in the programming
manual.

Fixes: b5c23aa46537 ("pinctrl: add a pinctrl driver for the Ingenic jz47xx SoCs")
Cc: <stable@vger.kernel.org> # v4.12
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Tested-by: 周琰杰 (Zhou Yanjie)<zhouyanjie@wanyeetech.com>
Link: https://lore.kernel.org/r/20210717174836.14776-1-paul@crapouillou.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pinctrl: stmfx: Fix hazardous u8[] to unsigned long cast

commit 1b73e588f47397dee6e4bdfd953e0306c60b5fe5 upstream.

Casting a small array of u8 to an unsigned long is *never* OK:

- it does funny thing when the array size is less than that of a long,
as it accesses random places in the stack
- it makes everything even more fun with a BE kernel

Fix this by building the unsigned long used as a bitmap byte by byte,
in a way that works across endianess and has no undefined behaviours.

An extra BUILD_BUG_ON() catches the unlikely case where the array
would be larger than a single unsigned long.

Fixes: 1490d9f841b1 ("pinctrl: Add STMFX GPIO expander Pinctrl/GPIO driver")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Amelie Delaunay <amelie.delaunay@foss.st.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Link: https://lore.kernel.org/r/20210725180830.250218-1-maz@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: socfpga: agilex: add the bypass register for s2f_usr0 clock

commit d17929eb1066d1c1653aae9bb4396a9f1d6602ac upstream.

Add the bypass register for the s2f_user0_clk.

Fixes: 80c6b7a0894f ("clk: socfpga: agilex: add clock driver for the Agilex platform")
Cc: stable@vger.kernel.org
Signed-off-by: Kris Chaplin <kris.chaplin@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://lore.kernel.org/r/20210713144621.605140-3-dinguyen@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: socfpga: agilex: fix up s2f_user0_clk representation

commit f817c132db679d492d96c60993fa2f2c67ab18d0 upstream.

Correct the s2f_user0_mux clock representation.

Fixes: 80c6b7a0894f ("clk: socfpga: agilex: add clock driver for the Agilex platform")
Cc: stable@vger.kernel.org
Signed-off-by: Kris Chaplin <kris.chaplin@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://lore.kernel.org/r/20210713144621.605140-2-dinguyen@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: socfpga: agilex: fix the parents of the psi_ref_clk

commit 9d563236cca43fc4fe190b3be173444bd48e2a3b upstream.

The psi_ref_clk comes from the C2 node of the main_pll and periph_pll,
not the C3.

Fixes: 80c6b7a0894f ("clk: socfpga: agilex: add clock driver for the Agilex platform")
Cc: stable@vger.kernel.org
Signed-off-by: Kris Chaplin <kris.chaplin@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://lore.kernel.org/r/20210713144621.605140-1-dinguyen@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tools/thermal/tmon: Add cross compiling support

commit b5f7912bb604b47a0fe024560488a7556dce8ee7 upstream.

Default to prefixed pkg-config when crosscompiling, this matches what
other parts of the tools/ directory already do.

[dlezcano] : Reworked description

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/31302992.qZodDJZGDc@devpool47
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

selftests/ftrace: Fix requirement check of README file

commit 210f9df02611cbe641ced3239122b270fd907d86 upstream.

The selftest for ftrace checks some features by checking if the README has
text that states the feature is supported by that kernel. Unfortunately,
this check gives false positives because it many not be checked if there's
spaces in the string to check. This is due to the compare between the
required variable with the ":README" string stripped, because neither has
quotes around them.

Link: https://lkml.kernel.org/r/20210820204742.087177341@goodmis.org
Cc: "Tzvetomir Stoyanov" <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 1b8eec510ba64 ("selftests/ftrace: Support ":README" suffix for requires")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ceph: fix dereference of null pointer cf

commit 05a444d3f90a3c3e6362e88a1bf13e1a60f8cace upstream.

Currently in the case where kmem_cache_alloc fails the null pointer
cf is dereferenced when assigning cf->is_capsnap = false. Fix this
by adding a null pointer check and return path.

Cc: stable@vger.kernel.org
Addresses-Coverity: ("Dereference null return")
Fixes: b2f9fa1f3bd8 ("ceph: correctly handle releasing an embedded cap flush")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9p/xen: Fix end of loop tests for list_for_each_entry

commit 732b33d0dbf17e9483f0b50385bf606f724f50a2 upstream.

This patch addresses the following problems:
- priv can never be NULL, so this part of the check is useless
- if the loop ran through the whole list, priv->client is invalid and
it is more appropriate and sufficient to check for the end of
list_for_each_entry loop condition.

Link: http://lkml.kernel.org/r/20210727000709.225032-1-harshvardhan.jha@oracle.com
Signed-off-by: Harshvardhan Jha <harshvardhan.jha@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

xen: fix setting of max_pfn in shared_info

commit 4b511d5bfa74b1926daefd1694205c7f1bcf677f upstream.

Xen PV guests are specifying the highest used PFN via the max_pfn
field in shared_info. This value is used by the Xen tools when saving
or migrating the guest.

Unfortunately this field is misnamed, as in reality it is specifying
the number of pages (including any memory holes) of the guest, so it
is the highest used PFN + 1. Renaming isn't possible, as this is a
public Xen hypervisor interface which needs to be kept stable.

The kernel will set the value correctly initially at boot time, but
when adding more pages (e.g. due to memory hotplug or ballooning) a
real PFN number is stored in max_pfn. This is done when expanding the
p2m array, and the PFN stored there is even possibly wrong, as it
should be the last possible PFN of the just added P2M frame, and not
one which led to the P2M expansion.

Fix that by setting shared_info->max_pfn to the last possible PFN + 1.

Fixes: 98dd166ea3a3c3 ("x86/xen/p2m: hint at the last populated P2M entry")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20210730092622.9973-2-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/perf/hv-gpci: Fix counter value parsing

commit f9addd85fbfacf0d155e83dbee8696d6df5ed0c7 upstream.

H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in
the result buffer. Result buffer has specific format defined in the PAPR
specification. One of the fields is counter offset and width of the
counter data returned.

Counter data are returned in a unsigned char array in big endian byte
order. To get the final counter data, the values must be left shifted
byte at a time. But commit 220a0c609ad17 ("powerpc/perf: Add support for
the hv gpci (get performance counter info) interface") made the shifting
bitwise and also assumed little endian order. Because of that, hcall
counters values are reported incorrectly.

In particular this can lead to counters go backwards which messes up the
counter prev vs now calculation and leads to huge counter value
reporting:

  #: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
           -C 0 -I 1000
        time             counts unit events
     1.000078854 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.000213293                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     3.000320107                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     4.000428392                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     5.000537864                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     6.000649087                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     7.000760312                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     8.000865218             16,448      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     9.000978985 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    10.001088891             16,384      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    11.001201435                  0      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
    12.001307937 18,446,744,073,709,535,232      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/

Fix the shifting logic to correct match the format, ie. read bytes in
big endian order.

Fixes: e4f226b1580b ("powerpc/perf/hv-gpci: Increase request buffer size")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI/MSI: Skip masking MSI-X on Xen PV

commit 1a519dc7a73c977547d8b5108d98c6e769c89f4b upstream.

When running as Xen PV guest, masking MSI-X is a responsibility of the
hypervisor. The guest has no write access to the relevant BAR at all - when
it tries to, it results in a crash like this:

    BUG: unable to handle page fault for address: ffffc9004069100c
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0003) - permissions violation
    RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
     e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
     e1000_probe+0x41f/0xdb0 [e1000e]
     local_pci_probe+0x42/0x80
    (...)

The recently introduced function msix_mask_all() does not check the global
variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking
of MSI[-X] interrupts.

Add the check to make this function XEN PV compatible.

Fixes: 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN

commit 4d643b66089591b4769bcdb6fd1bfeff2fe301b8 upstream.

A user space process should not need the CAP_SYS_ADMIN capability set
in order to perform a BLKREPORTZONE ioctl.

Getting the zone report is required in order to get the write pointer.
Neither read() nor write() requires CAP_SYS_ADMIN, so it is reasonable
that a user space process that can read/write from/to the device, also
can get the write pointer. (Since e.g. writes have to be at the write
pointer.)

Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: stable@vger.kernel.org # v4.10+
Link: https://lore.kernel.org/r/20210811110505.29649-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

blk-zoned: allow zone management send operations without CAP_SYS_ADMIN

commit ead3b768bb51259e3a5f2287ff5fc9041eb6f450 upstream.

Zone management send operations (BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE
and BLKFINISHZONE) should be allowed under the same permissions as write().
(write() does not require CAP_SYS_ADMIN).

Additionally, other ioctls like BLKSECDISCARD and BLKZEROOUT only check if
the fd was successfully opened with FMODE_WRITE.
(They do not require CAP_SYS_ADMIN).

Currently, zone management send operations require both CAP_SYS_ADMIN
and that the fd was successfully opened with FMODE_WRITE.

Remove the CAP_SYS_ADMIN requirement, so that zone management send
operations match the access control requirement of write(), BLKSECDISCARD
and BLKZEROOUT.

Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: stable@vger.kernel.org # v4.10+
Link: https://lore.kernel.org/r/20210811110505.29649-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: reset replace target device to allocation state on close

commit 0d977e0eba234e01a60bdde27314dc21374201b3 upstream.

This crash was observed with a failed assertion on device close:

  BTRFS: Transaction aborted (error -28)
  WARNING: CPU: 1 PID: 3902 at fs/btrfs/extent-tree.c:2150 btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
  Modules linked in: btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress raid6_pq loop
  CPU: 1 PID: 3902 Comm: kworker/u8:4 Not tainted 5.14.0-rc5-default+ #1532
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
  RIP: 0010:btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
  RSP: 0018:ffffb7a5452d7d80 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
  RBP: ffff97834176a378 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff97835195d388
  R13: 0000000005b08000 R14: ffff978385484000 R15: 000000000000016c
  FS:  0000000000000000(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056190d003fe8 CR3: 000000002a81e005 CR4: 0000000000170ea0
  Call Trace:
   flush_space+0x197/0x2f0 [btrfs]
   btrfs_async_reclaim_metadata_space+0x139/0x300 [btrfs]
   process_one_work+0x262/0x5e0
   worker_thread+0x4c/0x320
   ? process_one_work+0x5e0/0x5e0
   kthread+0x144/0x170
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x1f/0x30
  irq event stamp: 19334989
  hardirqs last  enabled at (19334997): [<ffffffffab0e0c87>] console_unlock+0x2b7/0x400
  hardirqs last disabled at (19335006): [<ffffffffab0e0d0d>] console_unlock+0x33d/0x400
  softirqs last  enabled at (19334900): [<ffffffffaba0030d>] __do_softirq+0x30d/0x574
  softirqs last disabled at (19334893): [<ffffffffab0721ec>] irq_exit_rcu+0x12c/0x140
  ---[ end trace 45939e308e0dd3c7 ]---
  BTRFS: error (device vdd) in btrfs_run_delayed_refs:2150: errno=-28 No space left
  BTRFS info (device vdd): forced readonly
  BTRFS warning (device vdd): failed setting block group ro: -30
  BTRFS info (device vdd): suspending dev_replace for unmount
  assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3431!
  invalid opcode: 0000 [#1] PREEMPT SMP
  CPU: 1 PID: 3982 Comm: umount Tainted: G        W         5.14.0-rc5-default+ #1532
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
  RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
  RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
  RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
  R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
  FS:  00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
  Call Trace:
   btrfs_close_one_device.cold+0x11/0x55 [btrfs]
   close_fs_devices+0x44/0xb0 [btrfs]
   btrfs_close_devices+0x48/0x160 [btrfs]
   generic_shutdown_super+0x69/0x100
   kill_anon_super+0x14/0x30
   btrfs_kill_super+0x12/0x20 [btrfs]
   deactivate_locked_super+0x2c/0xa0
   cleanup_mnt+0x144/0x1b0
   task_work_run+0x59/0xa0
   exit_to_user_mode_loop+0xe7/0xf0
   exit_to_user_mode_prepare+0xaf/0xf0
   syscall_exit_to_user_mode+0x19/0x50
   do_syscall_64+0x4a/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This happens when close_ctree is called while a dev_replace hasn't
completed. In close_ctree, we suspend the dev_replace, but keep the
replace target around so that we can resume the dev_replace procedure
when we mount the root again. This is the call trace:

  close_ctree():
    btrfs_dev_replace_suspend_for_unmount();
    btrfs_close_devices():
      btrfs_close_fs_devices():
        btrfs_close_one_device():
          ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
                 &device->dev_state));

However, since the replace target sticks around, there is a device
with BTRFS_DEV_STATE_REPLACE_TGT set on close, and we fail the
assertion in btrfs_close_one_device.

To fix this, if we come across the replace target device when
closing, we should properly reset it back to allocation state. This
fix also ensures that if a non-target device has a corrupted state and
has the BTRFS_DEV_STATE_REPLACE_TGT bit set, the assertion will still
catch the error.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: b2a616676839 ("btrfs: fix rw device counting in __btrfs_free_extra_devids")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

btrfs: wake up async_delalloc_pages waiters after submit

commit ac98141d140444fe93e26471d3074c603b70e2ca upstream.

We use the async_delalloc_pages mechanism to make sure that we've
completed our async work before trying to continue our delalloc
flushing.  The reason for this is we need to see any ordered extents
that were created by our delalloc flushing.  However we're waking up
before we do the submit work, which is before we create the ordered
extents.  This is a pretty wide race window where we could potentially
think there are no ordered extents and thus exit shrink_delalloc
prematurely.  Fix this by waking us up after we've done the work to
create ordered extents.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io-wq: fix wakeup race when adding new work

commit 87df7fb922d18e96992aa5e824aa34b2065fef59 upstream.

When new work is added, io_wqe_enqueue() checks if we need to wake or
create a new worker. But that check is done outside the lock that
otherwise synchronizes us with a worker going to sleep, so we can end
up in the following situation:

CPU0 CPU1
lock
insert work
unlock
atomic_read(nr_running) != 0
lock
atomic_dec(nr_running)
no wakeup needed

Hold the wqe lock around the "need to wakeup" check. Then we can also get
rid of the temporary work_flags variable, as we know the work will remain
valid as long as we hold the lock.

Cc: stable@vger.kernel.org
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: fail links of cancelled timeouts

commit 2ae2eb9dde18979b40629dd413b9adbd6c894cdf upstream.

When we cancel a timeout we should mark it with REQ_F_FAIL, so
linked requests are cancelled as well, but not queued for further
execution.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fff625b44eeced3a5cae79f60e6acf3fbdf8f990.1631192135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: add ->splice_fd_in checks

commit 26578cda3db983b17cabe4e577af26306beb9987 upstream.

->splice_fd_in is used only by splice/tee, but no other request checks
it for validity. Add the check for most of request types excluding
reads/writes/sends/recvs, we don't want overhead for them and can leave
them be as is until the field is actually used.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f44bc2acd6777d932de3d71a5692235b5b2b7397.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: place fixed tables under memcg limits

commit 0bea96f59ba40e63c0ae93ad6a02417b95f22f4d upstream.

Fixed tables may be large enough, place all of them together with
allocated tags under memcg limits.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b3ac9f5da9821bb59837b5fe25e8ef4be982218c.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: limit fixed table size by RLIMIT_NOFILE

Limit the number of files in io_uring fixed tables by RLIMIT_NOFILE,
that's the first and the simpliest restriction that we should impose.

Cc: stable@vger.kernel.org
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2756c340aed7d6c0b302c26dab50c6c5907f4ce.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtc: tps65910: Correct driver module alias

commit 8d448fa0a8bb1c8d94eef7647edffe9ac81a281e upstream.

The TPS65910 RTC driver module doesn't auto-load because of the wrong
module alias that doesn't match the device name, fix it.

Cc: stable@vger.kernel.org
Reported-by: Anton Bambura <jenneron@protonmail.com>
Tested-by: Anton Bambura <jenneron@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210808160030.8556-1-digetx@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 5.10.66

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "time: Handle negative seconds correctly in timespec64_to_ns()"

This reverts commit 656f343d724b45295f73000eb6e7bd3d212af116 which is
commit 39ff83f2f6cc5cc1458dfcea9697f96338210beb upstream.

Arnd reports that this needs more review before being merged into all of
the trees.

Link: https://lore.kernel.org/r/CAK8P3a0z5jE=Z3Ps5bFTCFT7CHZR1JQ8VhdntDJAfsUxSPCcEw@mail.gmail.com
Reported-by: Arnd Bergmann <arnd@kernel.org>
Cc: Lukas Hannen <lukas.hannen@opensource.tttech-industrial.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "posix-cpu-timers: Force next expiration recalc after itimer reset"

This reverts commit 13ccaef77ee86047033c50bf59cb19e0dda3aa97 which is
commit 406dd42bd1ba0c01babf9cde169bb319e52f6147 upstream.

It is reported to cause regressions.  A proposed fix has been posted,
but it is not in a released kernel yet.  So just revert this from the
stable release so that the bug is fixed.  If it's really needed we can
add it back in in a future release.

Link: https://lore.kernel.org/r/87ilz1pwaq.fsf@wylie.me.uk
Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "block: nbd: add sanity check for first_minor"

This reverts commit 4b21d4e820bb9a1415ec76dfe565e4c5937337dd which is
commit b1a811633f7321cf1ae2bb76a66805b7720e44c9 upstream.

The backport of this is reported to be causing some problems, so revert
this for now until they are worked out.

Link: https://lore.kernel.org/r/CACPK8XfUWoOHr-0RwRoYoskia4fbAbZ7DYf5wWBnv6qUnGq18w@mail.gmail.com
Reported-by: Joel Stanley <joel@jms.id.au>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pavel Skripkin <paskripkin@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "Bluetooth: Move shutdown callback before flushing tx and rx queue"

This reverts commit 471128476819777a7b71f2b90bc868ccf74b185f.

Botched backport, dropping to rework for next release.

Reported-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Linux 5.10.65

Link: https://lore.kernel.org/r/20210913131100.316353015@linuxfoundation.org
Tested-by: Fox Chen <foxhlchen@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Hulk Robot <hulkrobot@huawei.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

clk: kirkwood: Fix a clocking boot regression

commit aaedb9e00e5400220a8871180d23a83e67f29f63 upstream.

Since a few kernel releases the Pogoplug 4 has crashed like this
during boot:

Unable to handle kernel NULL pointer dereference at virtual address 00000002
(...)
[<c04116ec>] (strlen) from [<c00ead80>] (kstrdup+0x1c/0x4c)
[<c00ead80>] (kstrdup) from [<c04591d8>] (__clk_register+0x44/0x37c)
[<c04591d8>] (__clk_register) from [<c04595ec>] (clk_hw_register+0x20/0x44)
[<c04595ec>] (clk_hw_register) from [<c045bfa8>] (__clk_hw_register_mux+0x198/0x1e4)
[<c045bfa8>] (__clk_hw_register_mux) from [<c045c050>] (clk_register_mux_table+0x5c/0x6c)
[<c045c050>] (clk_register_mux_table) from [<c0acf3e0>] (kirkwood_clk_muxing_setup.constprop.0+0x13c/0x1ac)
[<c0acf3e0>] (kirkwood_clk_muxing_setup.constprop.0) from [<c0aceae0>] (of_clk_init+0x12c/0x214)
[<c0aceae0>] (of_clk_init) from [<c0ab576c>] (time_init+0x20/0x2c)
[<c0ab576c>] (time_init) from [<c0ab3d18>] (start_kernel+0x3dc/0x56c)
[<c0ab3d18>] (start_kernel) from [<00000000>] (0x0)
Code: e3130020 1afffffb e12fff1e c08a1078 (e5d03000)

This is because the "powersave" mux clock 0 was provided in an unterminated
array, which is required by the loop in the driver:

        /* Count, allocate, and register clock muxes */
        for (n = 0; desc[n].name;)
                n++;

Here n will go out of bounds and then call clk_register_mux() on random
memory contents after the mux clock.

Fix this by terminating the array with a blank entry.

Fixes: 105299381d87 ("cpufreq: kirkwood: use the powersave multiplexer")
Cc: stable@vger.kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20210814235514.403426-1-linus.walleij@linaro.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

backlight: pwm_bl: Improve bootloader/kernel device handover

commit 79fad92f2e596f5a8dd085788a24f540263ef887 upstream.

Currently there are (at least) two problems in the way pwm_bl starts
managing the enable_gpio pin. Both occur when the backlight is initially
off and the driver finds the pin not already in output mode and, as a
result, unconditionally switches it to output-mode and asserts the signal.

Problem 1: This could cause the backlight to flicker since, at this stage
in driver initialisation, we have no idea what the PWM and regulator are
doing (an unconfigured PWM could easily "rest" at 100% duty cycle).

Problem 2: This will cause us not to correctly honour the
post_pwm_on_delay (which also risks flickers).

Fix this by moving the code to configure the GPIO output mode until after
we have examines the handover state. That allows us to initialize
enable_gpio to off if the backlight is currently off and on if the
backlight is on.

Cc: stable@vger.kernel.org
Reported-by: Marek Vasut <marex@denx.de>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Marek Vasut <marex@denx.de>
Tested-by: Marek Vasut <marex@denx.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fbmem: don't allow too huge resolutions

commit 8c28051cdcbe9dfcec6bd0a4709d67a09df6edae upstream.

syzbot is reporting page fault at vga16fb_fillrect() [1], for
vga16fb_check_var() is failing to detect multiplication overflow.

  if (vxres * vyres > maxmem) {
    vyres = maxmem / vxres;
    if (vyres < yres)
      return -ENOMEM;
  }

Since no module would accept too huge resolutions where multiplication
overflow happens, let's reject in the common path.

Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e
Reported-by: syzbot <syzbot+04168c8063cfdde1db5e@syzkaller.appspotmail.com>
Debugged-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/185175d6-227a-7b55-433d-b070929b262c@i-love.sakura.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IMA: remove the dependency on CRYPTO_MD5

commit 8510505d55e194d3f6c9644c9f9d12c4f6b0395a upstream.

MD5 is a weak digest algorithm that shouldn't be used for cryptographic
operation. It hinders the efficiency of a patch set that aims to limit
the digests allowed for the extended file attribute namely security.ima.
MD5 is no longer a requirement for IMA, nor should it be used there.

The sole place where we still use the MD5 algorithm inside IMA is setting
the ima_hash algorithm to MD5, if the user supplies 'ima_hash=md5'
parameter on the command line. With commit ab60368ab6a4 ("ima: Fallback
to the builtin hash algorithm"), setting "ima_hash=md5" fails gracefully
when CRYPTO_MD5 is not set:
ima: Can not allocate md5 (reason: -2)
ima: Allocating md5 failed, going to use default hash algorithm sha256

Remove the CRYPTO_MD5 dependency for IMA.

Signed-off-by: THOBY Simon <Simon.THOBY@viveris.fr>
Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
[zohar@linux.ibm.com: include commit number in patch description for
stable.]
Cc: stable@vger.kernel.org # 4.17
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IMA: remove -Wmissing-prototypes warning

commit a32ad90426a9c8eb3915eed26e08ce133bd9e0da upstream.

With W=1 build, the compiler throws warning message as below:

   security/integrity/ima/ima_mok.c:24:12: warning:
   no previous prototype for ‘ima_mok_init’ [-Wmissing-prototypes]
       __init int ima_mok_init(void)

Silence the warning by adding static keyword to ima_mok_init().

Signed-off-by: Austin Kim <austin.kim@lge.com>
Fixes: 41c89b64d718 ("IMA: create machine owner and blacklist keyrings")
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fuse: flush extending writes

commit 59bda8ecee2ffc6a602b7bf2b9e43ca669cdbdcd upstream.

Callers of fuse_writeback_range() assume that the file is ready for
modification by the server in the supplied byte range after the call
returns.

If there's a write that extends the file beyond the end of the supplied
range, then the file needs to be extended to at least the end of the range,
but currently that's not done.

There are at least two cases where this can cause problems:

- copy_file_range() will return short count if the file is not extended
   up to end of the source range.

- FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
   hence the region may not be fully allocated.

Fix by flushing writes from the start of the range up to the end of the
file.  This could be optimized if the writes are non-extending, etc, but
it's probably not worth the trouble.

Fixes: a2bc92362941 ("fuse: fix copy_file_range() in the writeback case")
Fixes: 6b1bdb56b17c ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)")
Cc: <stable@vger.kernel.org> # v5.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fuse: truncate pagecache on atomic_o_trunc

commit 76224355db7570cbe6b6f75c8929a1558828dd55 upstream.

fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic
O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in
fuse_launder_page() triggered by invalidate_inode_pages2().

Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a
truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE
or fc->writeback cache, so do it unconditionally.

Reported-by: Xie Yongji <xieyongji@bytedance.com>
Reported-and-tested-by: syzbot+bea44a5189836d956894@syzkaller.appspotmail.com
Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: dts: at91: add pinctrl-{names, 0} for all gpios

commit bf781869e5cf3e4ec1a47dad69b6f0df97629cbd upstream.

Add pinctrl-names and pinctrl-0 properties on controllers that claims to
use pins to avoid failures due to
commit 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges")
and also to avoid using pins that may be claimed my other IPs.

Fixes: b7c2b6157079 ("ARM: at91: add Atmel's SAMA5D3 Xplained board")
Fixes: 1e5f532c2737 ("ARM: dts: at91: sam9x60: add device tree for soc and board")
Fixes: 38153a017896 ("ARM: at91/dt: sama5d4: add dts for sama5d4 xplained board")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20210727074006.1609989-1-claudiu.beznea@microchip.com
Cc: <stable@vger.kernel.org> # v5.7+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter

commit f7782bb8d818d8f47c26b22079db10599922787a upstream.

Clear nested.pi_pending on nested VM-Enter even if L2 will run without
posted interrupts enabled.  If nested.pi_pending is left set from a
previous L2, vmx_complete_nested_posted_interrupt() will pick up the
stale flag and exit to userspace with an "internal emulation error" due
the new L2 not having a valid nested.pi_desc.

Arguably, vmx_complete_nested_posted_interrupt() should first check for
posted interrupts being enabled, but it's also completely reasonable that
KVM wouldn't screw up a fundamental flag.  Not to mention that the mere
existence of nested.pi_pending is a long-standing bug as KVM shouldn't
move the posted interrupt out of the IRR until it's actually processed,
e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit
with a "pending" posted interrupt.  Fixing the mess is a future problem.

Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI
descriptor as an error, this was a benign bug as the null PI descriptor
effectively served as a check on PI not being enabled.  Even then, the
new flow did not become problematic until KVM started checking the result
of kvm_check_nested_events().

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable")
Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810144526.2662272-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation

commit 81b4b56d4f8130bbb99cf4e2b48082e5b4cfccb9 upstream.

If we are emulating an invalid guest state, we don't have a correct
exit reason, and thus we shouldn't do anything in this function.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted

commit d9130a2dfdd4b21736c91b818f87dbc0ccd1e757 upstream.

When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature
especially there's a big tsc warp (like a new vCPU is hot-added into VM
which has been up for a long time), tsc_offset is added by a large value
then go back to guest. This causes system time jump as tsc_timestamp is
not adjusted in the meantime and pvclock monotonic character.
To fix this, just notify kvm to update vCPU's guest time before back to
guest.

Cc: stable@vger.kernel.org
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: s390: index kvm->arch.idle_mask by vcpu_idx

commit a3e03bc1368c1bc16e19b001fc96dc7430573cc8 upstream.

While in practice vcpu->vcpu_idx ==  vcpu->vcp_id is often true, it may
not always be, and we must not rely on this. Reason is that KVM decides
the vcpu_idx, userspace decides the vcpu_id, thus the two might not
match.

Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies
that code like
for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) {
                vcpu = kvm_get_vcpu(kvm, vcpu_id);
do_stuff(vcpu);
}
is not legit. Reason is that kvm_get_vcpu expects an vcpu_idx, not an
vcpu_id.  The trouble is, we do actually use kvm->arch.idle_mask like
this. To fix this problem we have two options. Either use
kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id,
or switch to indexing via vcpu_idx. The latter is preferable for obvious
reasons.

Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to
indexing it by vcpu_idx.  To keep gisa_int.kicked_mask indexed by the
same index as idle_mask lets make the same change for it as well.

Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array")
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Christian Bornträger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 3.15+
Link: https://lore.kernel.org/r/20210827125429.1912577-1-pasic@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"

commit e7177339d7b5f9594b316842122b5fda9513d5e2 upstream.

Revert a misguided illegal GPA check when "translating" a non-nested GPA.
The check is woefully incomplete as it does not fill in @exception as
expected by all callers, which leads to KVM attempting to inject a bogus
exception, potentially exposing kernel stack information in the process.

WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
Call Trace:
  x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
  kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
  handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
  __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
  vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
  vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
  vcpu_run arch/x86/kvm/x86.c:9779 [inline]
  kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
  kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652

The bug has escaped notice because practically speaking the GPA check is
useless.  The GPA check in question only comes into play when KVM is
walking guest page tables (or "translating" CR3), and KVM already handles
illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an
illegal CR3.  This particular failure doesn't hit the existing reserved
bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture
simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging
doesn't define any reserved PA bits checks, which KVM emulates by only
incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.

Simply remove the bogus check.  There is zero meaningful value and no
architectural justification for supporting guest.MAXPHYADDR < 32, and
properly filling the exception would introduce non-trivial complexity.

This reverts commit ec7771ab471ba6a945350353617e2e3385d0e013.

Fixes: ec7771ab471b ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
Cc: stable@vger.kernel.org
Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210831164224.1119728-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/resctrl: Fix a maybe-uninitialized build warning treated as error

commit 527f721478bce3f49b513a733bacd19d6f34b08c upstream.

The recent commit

  064855a69003 ("x86/resctrl: Fix default monitoring groups reporting")

caused a RHEL build failure with an uninitialized variable warning
treated as an error because it removed the default case snippet.

The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly
uninitialized variable warnings to be treated as errors. This is also
reported by smatch via the 0day robot.

The error from the RHEL build is:

  arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’:
  arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used
  uninitialized in this function [-Werror=maybe-uninitialized]
    m->chunks += chunks;
              ^~

The upstream Makefile does not build using '-Werror=maybe-uninitialized'.
So, the problem is not seen there. Fix the problem by putting back the
default case snippet.

[ bp: note that there's nothing wrong with the code and other compilers
   do not trigger this warning - this is being done just so the RHEL compiler
   is happy. ]

Fixes: 064855a69003 ("x86/resctrl: Fix default monitoring groups reporting")
Reported-by: Terry Bowman <Terry.Bowman@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op

commit f11dd0d80555cdc8eaf5cfc9e19c9e198217f9f1 upstream.

Commit:

2ff40250691e ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs")

neglected to do so.

Fixes: 2ff40250691e ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210817221048.88063-2-kim.phillips@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: Fix data race between tiocsti() and flush_to_ldisc()

commit bb2853a6a421a052268eee00fd5d3f6b3504b2b1 upstream.

The ops->receive_buf() may be accessed concurrently from these two
functions.  If the driver flushes data to the line discipline
receive_buf() method while tiocsti() is waiting for the
ops->receive_buf() to finish its work, the data race will happen.

For example:
tty_ioctl |tty_ldisc_receive_buf
->tioctsi | ->tty_port_default_receive_buf
|  ->tty_ldisc_receive_buf
   ->hci_uart_tty_receive |   ->hci_uart_tty_receive
    ->h4_recv                   |    ->h4_recv

In this case, the h4 receive buffer will be overwritten by the
latecomer, and we will lost the data.

Hence, change tioctsi() function to use the exclusive lock interface
from tty_buffer to avoid the data race.

Reported-by: syzbot+97388eb9d31b997fe1d0@syzkaller.appspotmail.com
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Link: https://lore.kernel.org/r/20210823000641.2082292-1-phind.uet@gmail.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bio: fix page leak bio_add_hw_page failure

commit d9cf3bd531844ffbfe94b16e417037a16efc988d upstream.

__bio_iov_append_get_pages() doesn't put not appended pages on
bio_add_hw_page() failure, so potentially leaking them, fix it. Also, do
the same for __bio_iov_iter_get_pages(), even though it looks like it
can't be triggered by userspace in this case.

Fixes: 0512a75b98f8 ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1edfa6a2ffd66d55e6345a477df5387d2c1415d0.1626653825.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring: IORING_OP_WRITE needs hash_reg_file set

commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8 upstream.

During some testing, it became evident that using IORING_OP_WRITE doesn't
hash buffered writes like the other writes commands do. That's simply
an oversight, and can cause performance regressions when doing buffered
writes with this command.

Correct that and add the flag, so that buffered writes are correctly
hashed when using the non-iovec based write command.

Cc: stable@vger.kernel.org
Fixes: 3a6820f2bb8a ("io_uring: add non-vectored read/write commands")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

time: Handle negative seconds correctly in timespec64_to_ns()

commit 39ff83f2f6cc5cc1458dfcea9697f96338210beb upstream.

timespec64_ns() prevents multiplication overflows by comparing the seconds
value of the timespec to KTIME_SEC_MAX. If the value is greater or equal it
returns KTIME_MAX.

But that check casts the signed seconds value to unsigned which makes the
comparision true for all negative values and therefore return wrongly
KTIME_MAX.

Negative second values are perfectly valid and required in some places,
e.g. ptp_clock_adjtime().

Remove the cast and add a check for the negative boundary which is required
to prevent undefined behaviour due to multiplication underflow.

Fixes: cb47755725da ("time: Prevent undefined behaviour in timespec64_to_ns()")'
Signed-off-by: Lukas Hannen <lukas.hannen@opensource.tttech-industrial.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/AM6PR01MB541637BD6F336B8FFB72AF80EEC69@AM6PR01MB5416.eurprd01.prod.exchangelabs.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: guarantee to write dirty data when enabling checkpoint back

commit dddd3d65293a52c2c3850c19b1e5115712e534d8 upstream.

We must flush all the dirty data when enabling checkpoint back. Let's guarantee
that first by adding a retry logic on sync_inodes_sb(). In addition to that,
this patch adds to flush data in fsync when checkpoint is disabled, which can
mitigate the sync_inodes_sb() failures in advance.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

iwlwifi Add support for ax201 in Samsung Galaxy Book Flex2 Alpha

commit 2f32c147a3816d789722c0bd242a9431332ec3ed upstream.

The Samsung Galaxy Book Flex2 Alpha uses an ax201 with the ID a0f0/6074.
This works fine with the existing driver once it knows to claim it.
Simple patch to add the device.

Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210702223155.1981510-1-jforbes@fedoraproject.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: rt5682: Remove unused variable in rt5682_i2c_remove()

commit a1ea05723c27a6f77894a60038a7b2b12fcec9a7 upstream.

In commit 772d44526e20 ("ASoC: rt5682: Properly turn off regulators if
wrong device ID") I deleted code but forgot to delete a variable
that's now unused. Delete it.

Fixes: 772d44526e20 ("ASoC: rt5682: Properly turn off regulators if wrong device ID")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20210813073402.1.Iaa9425cfab80f5233afa78b32d02b6dc23256eb3@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv4: fix endianness issue in inet_rtm_getroute_build_skb()

[ Upstream commit 92548b0ee220e000d81c27ac9a80e0ede895a881 ]

The UDP length field should be in network order.
This removes the following sparse error:

net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types)
net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len
net/ipv4/route.c:3173:27: got unsigned long

Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-af: Set proper errorcode for IPv4 checksum errors

[ Upstream commit 1e4428b6dba9b683dc2ec0a56ed7879de3200cce ]

With current config, for packets with IPv4 checksum errors,
errorcode is being set to UNKNOWN. Hence added a separate
errorcodes for outer and inner IPv4 checksum and changed
NPC configuration accordingly.

Also turn on L2 multicast address check in NPC protocol check block.

Fixes: 6b3321bacc5a ("octeontx2-af: Enable packet length and csum validation")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-af: Fix static code analyzer reported issues

[ Upstream commit 698a82ebfb4b2f2014baf31b7324b328a2a6366e ]

This patch fixes the static code analyzer reported issues
in rvu_npc.c. The reported errors are different sizes of
operands in bitops and returning uninitialized values.

Fixes: 651cd2652339 ("octeontx2-af: MCAM entry installation support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

octeontx2-af: Fix loop in free and unmap counter

[ Upstream commit 6537e96d743b89294b397b4865c6c061abae31b0 ]

When the given counter does not belong to the entry
then code ends up in infinite loop because the loop
cursor, entry is not getting updated further. This
patch fixes that by updating entry for every iteration.

Fixes: a958dd59f9ce ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: qualcomm: fix QCA7000 checksum handling

[ Upstream commit 429205da6c834447a57279af128bdd56ccd5225e ]

Based on tests the QCA7000 doesn't support checksum offloading. So assume
ip_summed is CHECKSUM_NONE and let the kernel take care of the checksum
handling. This fixes data transfer issues in noisy environments.

Reported-by: Michael Heimpold <michael.heimpold@in-tech.com>
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed

[ Upstream commit c66070125837900163b81a03063ddd657a7e9bfb ]

The reference counting issue happens in one exception handling path of
cbq_change_class(). When failing to get tcf_block, the function forgets
to decrease the refcount of "rtab" increased by qdisc_put_rtab(),
causing a refcount leak.

Fix this issue by jumping to "failure" label when get tcf_block failed.

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4: make exception cache less predictible

[ Upstream commit 67d6d681e15b578c1725bad8ad079e05d1c48a8e ]

Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in
fnhe_hashfun()"), an attacker can still use brute force to learn
some secrets from a victim linux host.

One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.

Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.

After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.

This is slightly increasing memory size used by the hash table,
by 50% in average, we do not expect this to be a problem.

This patch is more complex than the prior one (IPv6 equivalent),
because IPv4 was reusing the oldest entry.
Since we need to be able to evict more than one entry per
update_or_create_fnhe() call, I had to replace
fnhe_oldest() with fnhe_remove_oldest().

Also note that we will queue extra kfree_rcu() calls under stress,
which hopefully wont be a too big issue.

Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: make exception cache less predictible

[ Upstream commit a00df2caffed3883c341d5685f830434312e4a43 ]

Even after commit 4785305c05b2 ("ipv6: use siphash in rt6_exception_hash()"),
an attacker can still use brute force to learn some secrets from a victim
linux host.

One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.

Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.

After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.

This is slightly increasing memory size used by the hash table,
we do not expect this to be a problem.

Following patch is dealing with the same issue in IPv4.

Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

brcmfmac: pcie: fix oops on failure to resume and reprobe

[ Upstream commit d745ca4f2c4ae9f1bd8cf7d8ac6e22d739bffd19 ]

When resuming from suspend, brcmf_pcie_pm_leave_D3 will first attempt a
hot resume and then fall back to removing the PCI device and then
reprobing. If this probe fails, the kernel will oops, because brcmf_err,
which is called to report the failure will dereference the stale bus
pointer. Open code and use the default bus-less brcmf_err to avoid this.

Fixes: 8602e62441ab ("brcmfmac: pass bus to the __brcmf_err() in pcie.c")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210817063521.22450-1-a.fatoum@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>

bcma: Fix memory leak for internally-handled cores

[ Upstream commit b63aed3ff195130fef12e0af590f4838cf0201d8 ]

kmemleak reported that dev_name() of internally-handled cores were leaked
on driver unbinding. Let's use device_initialize() to take refcounts for
them and put_device() to properly free the related stuff.

While looking at it, there's another potential issue for those which should
be *registered* into driver core. If device_register() failed, we put
device once and freed bcma_device structures. In bcma_unregister_cores(),
they're treated as unregistered and we hit both UAF and double-free. That
smells not good and has also been fixed now.

Fixes: ab54bc8460b5 ("bcma: fill core details for every device")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210727025232.663-2-yuzenghui@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

atlantic: Fix driver resume flow.

[ Upstream commit 57f780f1c43362b86fd23d20bd940e2468237716 ]

Driver crashes when restoring from the Hibernate. In the resume flow,
driver need to clean up the older nic/vec objects and re-initialize them.

Fixes: 8aaa112a57c1d ("net: atlantic: refactoring pm logic")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point()

[ Upstream commit fd6729ec534cffbbeb3917761e6d1fe6a412d3fe ]

This error path is unlikely because of it checked for NULL and
returned -ENOMEM earlier in the function. But it should return
an error code here as well if we ever do hit it because of a
race condition or something.

Fixes: bdcd81707973 ("Add ath6kl cleaned up driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210813113438.GB30697@kili
Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: Only lock to update netdev dev_addr

[ Upstream commit b357d9717be7f95fde2c6c4650b186a995b71e59 ]

commit 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync
list") introduced calls to netif_addr_lock_bh() and
netif_addr_unlock_bh() in the driver's ndo_set_mac() callback. This is
fine since the driver is updated the netdev's dev_addr, but since this
is a spinlock, the driver cannot sleep when the lock is held.
Unfortunately the functions to add/delete MAC filters depend on a mutex.
This was causing a trace with the lock debug kernel config options
enabled when changing the mac address via iproute.

[  203.273059] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
[  203.273065] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6698, name: ip
[  203.273068] Preemption disabled at:
[  203.273068] [<ffffffffc04aaeab>] ice_set_mac_address+0x8b/0x1c0 [ice]
[  203.273097] CPU: 31 PID: 6698 Comm: ip Tainted: G S      W I       5.14.0-rc4 #2
[  203.273100] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  203.273102] Call Trace:
[  203.273107]  dump_stack_lvl+0x33/0x42
[  203.273113]  ? ice_set_mac_address+0x8b/0x1c0 [ice]
[  203.273124]  ___might_sleep.cold.150+0xda/0xea
[  203.273131]  mutex_lock+0x1c/0x40
[  203.273136]  ice_remove_mac+0xe3/0x180 [ice]
[  203.273155]  ? ice_fltr_add_mac_list+0x20/0x20 [ice]
[  203.273175]  ice_fltr_prepare_mac+0x43/0xa0 [ice]
[  203.273194]  ice_set_mac_address+0xab/0x1c0 [ice]
[  203.273206]  dev_set_mac_address+0xb8/0x120
[  203.273210]  dev_set_mac_address_user+0x2c/0x50
[  203.273212]  do_setlink+0x1dd/0x10e0
[  203.273217]  ? __nla_validate_parse+0x12d/0x1a0
[  203.273221]  __rtnl_newlink+0x530/0x910
[  203.273224]  ? __kmalloc_node_track_caller+0x17f/0x380
[  203.273230]  ? preempt_count_add+0x68/0xa0
[  203.273236]  ? _raw_spin_lock_irqsave+0x1f/0x30
[  203.273241]  ? kmem_cache_alloc_trace+0x4d/0x440
[  203.273244]  rtnl_newlink+0x43/0x60
[  203.273245]  rtnetlink_rcv_msg+0x13a/0x380
[  203.273248]  ? rtnl_calcit.isra.40+0x130/0x130
[  203.273250]  netlink_rcv_skb+0x4e/0x100
[  203.273256]  netlink_unicast+0x1a2/0x280
[  203.273258]  netlink_sendmsg+0x242/0x490
[  203.273260]  sock_sendmsg+0x58/0x60
[  203.273263]  ____sys_sendmsg+0x1ef/0x260
[  203.273265]  ? copy_msghdr_from_user+0x5c/0x90
[  203.273268]  ? ____sys_recvmsg+0xe6/0x170
[  203.273270]  ___sys_sendmsg+0x7c/0xc0
[  203.273272]  ? copy_msghdr_from_user+0x5c/0x90
[  203.273274]  ? ___sys_recvmsg+0x89/0xc0
[  203.273276]  ? __netlink_sendskb+0x50/0x50
[  203.273278]  ? mod_objcg_state+0xee/0x310
[  203.273282]  ? __dentry_kill+0x114/0x170
[  203.273286]  ? get_max_files+0x10/0x10
[  203.273288]  __sys_sendmsg+0x57/0xa0
[  203.273290]  do_syscall_64+0x37/0x80
[  203.273295]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  203.273296] RIP: 0033:0x7f8edf96e278
[  203.273298] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 63 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
[  203.273300] RSP: 002b:00007ffcb8bdac08 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  203.273303] RAX: ffffffffffffffda RBX: 000000006115e0ae RCX: 00007f8edf96e278
[  203.273304] RDX: 0000000000000000 RSI: 00007ffcb8bdac70 RDI: 0000000000000003
[  203.273305] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007ffcb8bda5b0
[  203.273306] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[  203.273306] R13: 0000555e10092020 R14: 0000000000000000 R15: 0000000000000005

Fix this by only locking when changing the netdev->dev_addr. Also, make
sure to restore the old netdev->dev_addr on any failures.

Fixes: 3ba7f53f8bf1 ("ice: don't remove netdev->dev_addr from uc sync list")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iwlwifi: skip first element in the WTAS ACPI table

[ Upstream commit 19426d54302e199b3fd2d575f926a13af66be2b9 ]

By mistake we were considering the first element of the WTAS wifi
package as part of the data we want to rid, but that element is the wifi
package signature (always 0x07), so it should be skipped.

Change the code to read the data starting from element 1 instead.

Signed-off-by: Abhishek Naik <abhishek.naik@intel.com>
Fixes: 28dd7ccdc56f ("iwlwifi: acpi: read TAS table from ACPI and send it to the FW")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210805141826.ff8148197b15.I70636c04e37b2b57a5df3ce611511f62203d27a7@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iwlwifi: follow the new inclusive terminology

[ Upstream commit cdaba917268d7b58bf02fcc587cb2a7a277dc931 ]

The new inclusive terminology requires to change a few
terms that were used in iwlwifi.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20201209231352.1eb4c8625f36.I1b17b68d4a8e77071da3e15ffbd902d15c1d4938@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: wcd9335: Disable irq on slave ports in the remove function

[ Upstream commit d3efd26af2e044ff2b48d38bb871630282d77e60 ]

The probe calls 'wcd9335_setup_irqs()' to enable interrupts on all slave
ports.
This must be undone in the remove function.

Add a 'wcd9335_teardown_irqs()' function that undoes 'wcd9335_setup_irqs()'
function, and call it from the remove function.

Fixes: 20aedafdf492 ("ASoC: wcd9335: add support to wcd9335 codec")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <8f761244d79bd4c098af8a482be9121d3a486d1b.1629091028.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: wcd9335: Fix a memory leak in the error handling path of the probe function

[ Upstream commit fc6fc81caa63900cef9ebb8b2e365c3ed5a9effb ]

If 'wcd9335_setup_irqs()' fails, me must release the memory allocated in
'wcd_clsh_ctrl_alloc()', as already done in the remove function.

Add an error handling path and the missing 'wcd_clsh_ctrl_free()' call.

Fixes: 20aedafdf492 ("ASoC: wcd9335: add support to wcd9335 codec")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <6dc12372f09fabb70bf05941dbe6a1382dc93e43.1629091028.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: wcd9335: Fix a double irq free in the remove function

[ Upstream commit 7a6a723e98aa45f393e6add18f7309dfffa1b0e2 ]

There is no point in calling 'free_irq()' explicitly for
'WCD9335_IRQ_SLIMBUS' in the remove function.

The irqs are requested in 'wcd9335_setup_irqs()' using a resource managed
function (i.e. 'devm_request_threaded_irq()').
'wcd9335_setup_irqs()' requests all what is defined in the 'wcd9335_irqs'
structure.
This structure has only one entry for 'WCD9335_IRQ_SLIMBUS'.

So 'devm_request...irq()' + explicit 'free_irq()' would lead to a double
free.

Remove the unneeded 'free_irq()' from the remove function.

Fixes: 20aedafdf492 ("ASoC: wcd9335: add support to wcd9335 codec")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <0614d63bc00edd7e81dd367504128f3d84f72efa.1629091028.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

tty: serial: fsl_lpuart: fix the wrong mapbase value

[ Upstream commit d5c38948448abc2bb6b36dbf85a554bf4748885e ]

Register offset needs to be applied on mapbase also.
dma_tx/rx_request use the physical address of UARTDATA.
Register offset is currently only applied to membase (the
corresponding virtual addr) but not on mapbase.

Fixes: 24b1e5f0e83c ("tty: serial: lpuart: add imx7ulp support")
Reviewed-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Adriana Reus <adriana.reus@nxp.com>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Andy Duan <fugang.duan@nxp.com>
Link: https://lore.kernel.org/r/20210819021033.32606-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: bdc: Fix a resource leak in the error handling path of 'bdc_probe()'

[ Upstream commit 6f15a2a09cecb7a2faba4a75bbd101f6f962294b ]

If an error occurs after a successful 'clk_prepare_enable()' call, it must
be undone by a corresponding 'clk_disable_unprepare()' call.
This call is already present in the remove function.

Add this call in the error handling path and reorder the code so that the
'clk_prepare_enable()' call happens later in the function.
The goal is to have as much managed resources functions as possible
before the 'clk_prepare_enable()' call in order to keep the error handling
path simple.

While at it, remove the now unneeded 'clk' variable.

Fixes: c87dca047849 ("usb: bdc: Add clock enable for new chips with a separate BDC clock")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/f8a4a6897deb0c8cb2e576580790303550f15fcd.1629314734.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available

[ Upstream commit d2f42e09393c774ab79088d8e3afcc62b3328fc9 ]

If no suitable DMA configuration is available, a previous 'bdc_phy_init()'
call must be undone by a corresponding 'bdc_phy_exit()' call.

Branch to the existing error handling path instead of returning
directly.

Fixes: cc29d4f67757 ("usb: bdc: Add support for USB phy")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0c5910979f39225d5d8fe68c9ab1c147c68ddee1.1629314734.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

usb: ehci-orion: Handle errors of clk_prepare_enable() in probe

[ Upstream commit 4720f1bf4ee4a784d9ece05420ba33c9222a3004 ]

ehci_orion_drv_probe() did not account for possible errors of
clk_prepare_enable() that in particular could cause invocation of
clk_disable_unprepare() on clocks that were not prepared/enabled yet,
e.g. in remove or on handling errors of usb_add_hcd() in probe. Though,
there were several patches fixing different issues with clocks in this
driver, they did not solve this problem.

Add handling of errors of clk_prepare_enable() in ehci_orion_drv_probe()
to avoid calls of clk_disable_unprepare() without previous successful
invocation of clk_prepare_enable().

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 8c869edaee07 ("ARM: Orion: EHCI: Add support for enabling clocks")
Co-developed-by: Kirill Shilimanov <kirill.shilimanov@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Kirill Shilimanov <kirill.shilimanov@huawei.com>
Link: https://lore.kernel.org/r/20210825170902.11234-1-novikov@ispras.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

i2c: xlp9xx: fix main IRQ check

[ Upstream commit 661e8a88e8317eb9ffe69c69d6cb4876370fe7e2 ]

Iff platform_get_irq() returns 0 for the main IRQ, the driver's probe()
method will return 0 early (as if the method's call was successful).
Let's consider IRQ0 valid for simplicity -- devm_request_irq() can always
override that decision...

Fixes: 2bbd681ba2b ("i2c: xlp9xx: Driver for Netlogic XLP9XX/5XX I2C controller")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

i2c: mt65xx: fix IRQ check

[ Upstream commit 58fb7c643d346e2364404554f531cfa6a1a3917c ]

Iff platform_get_irq() returns 0, the driver's probe() method will return 0
early (as if the method's call was successful). Let's consider IRQ0 valid
for simplicity -- devm_request_irq() can always override that decision...

Fixes: ce38815d39ea ("I2C: mediatek: Add driver for MediaTek I2C controller")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Reviewed-by: Qii Wang <qii.wang@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

CIFS: Fix a potencially linear read overflow

[ Upstream commit f980d055a0f858d73d9467bb0b570721bbfcdfb8 ]

strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated.

Also, the strnlen() call does not avoid the read overflow in the strlcpy
function when a not NUL-terminated string is passed.

So, replace this block by a call to kstrndup() that avoids this type of
overflow and does the same.

Fixes: 066ce6899484d ("cifs: rename cifs_strlcpy_to_host and make it use new functions")
Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: Fix possible out of bound write in narrow load handling

[ Upstream commit d7af7e497f0308bc97809cc48b58e8e0f13887e1 ]

Fix a verifier bug found by smatch static checker in [0].

This problem has never been seen in prod to my best knowledge. Fixing it
still seems to be a good idea since it's hard to say for sure whether
it's possible or not to have a scenario where a combination of
convert_ctx_access() and a narrow load would lead to an out of bound
write.

When narrow load is handled, one or two new instructions are added to
insn_buf array, but before it was only checked that

cnt >= ARRAY_SIZE(insn_buf)

And it's safe to add a new instruction to insn_buf[cnt++] only once. The
second try will lead to out of bound write. And this is what can happen
if `shift` is set.

Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.

The full report [0] is below:

kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array

kernel/bpf/verifier.c
    12282
    12283 insn->off = off & ~(size_default - 1);
    12284 insn->code = BPF_LDX | BPF_MEM | size_code;
    12285 }
    12286
    12287 target_size = 0;
    12288 cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
    12289 &target_size);
    12290 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bounds check.

    12291     (ctx_field_size && !target_size)) {
    12292 verbose(env, "bpf verifier is misconfigured\n");
    12293 return -EINVAL;
    12294 }
    12295
    12296 if (is_narrower_load && size < target_size) {
    12297 u8 shift = bpf_ctx_narrow_access_offset(
    12298 off, size, size_default) * 8;
    12299 if (ctx_field_size <= 4) {
    12300 if (shift)
    12301 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
                                                         ^^^^^
increment beyond end of array

    12302 insn->dst_reg,
    12303 shift);
--> 12304 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                                 ^^^^^
out of bounds write

    12305 (1 << size * 8) - 1);
    12306 } else {
    12307 if (shift)
    12308 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
    12309 insn->dst_reg,
    12310 shift);
    12311 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
                                        ^^^^^^^^^^^^^^^
Same.

    12312 (1ULL << size * 8) - 1);
    12313 }
    12314 }
    12315
    12316 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
    12317 if (!new_prog)
    12318 return -ENOMEM;
    12319
    12320 delta += cnt - 1;
    12321
    12322 /* keep walking new program and skip insns we just inserted */
    12323 env->prog = new_prog;
    12324 insn      = new_prog->insnsi + i + delta;
    12325 }
    12326
    12327 return 0;
    12328 }

[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/

v1->v2:
- clarify that problem was only seen by static checker but not in prod;

Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: moxart: Fix issue with uninitialized dma_slave_config

[ Upstream commit ee5165354d498e5bceb0b386e480ac84c5f8c28c ]

Depending on the DMA driver being used, the struct dma_slave_config may
need to be initialized to zero for the unused data.

For example, we have three DMA drivers using src_port_window_size and
dst_port_window_size. If these are left uninitialized, it can cause DMA
failures.

For moxart, this is probably not currently an issue but is still good to
fix though.

Fixes: 1b66e94e6b99 ("mmc: moxart: Add MOXA ART SD/MMC driver")
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20210810081644.19353-3-tony@atomide.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: dw_mmc: Fix issue with uninitialized dma_slave_config

[ Upstream commit c3ff0189d3bc9c03845fe37472c140f0fefd0c79 ]

Depending on the DMA driver being used, the struct dma_slave_config may
need to be initialized to zero for the unused data.

For example, we have three DMA drivers using src_port_window_size and
dst_port_window_size. If these are left uninitialized, it can cause DMA
failures.

For dw_mmc, this is probably not currently an issue but is still good to
fix though.

Fixes: 3fc7eaef44db ("mmc: dw_mmc: Add external dma interface support")
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Jaehoon Chung <jh80.chung@samsung.com>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Cc: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20210810081644.19353-2-tony@atomide.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mmc: sdhci: Fix issue with uninitialized dma_slave_config

[ Upstream commit 522654d534d315d540710124c57b49ca22ac5f72 ]

Depending on the DMA driver being used, the struct dma_slave_config may
need to be initialized to zero for the unused data.

For example, we have three DMA drivers using src_port_window_size and
dst_port_window_size. If these are left uninitialized, it can cause DMA
failures at least if external TI SDMA is ever configured for sdhci.

For other external DMA cases, this is probably not currently an issue but
is still good to fix though.

Fixes: 18e762e3b7a7 ("mmc: sdhci: add support for using external DMA devices")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chunyan Zhang <zhang.chunyan@linaro.org>
Cc: Faiz Abbas <faiz_abbas@ti.com>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Cc: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20210810081644.19353-1-tony@atomide.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: Intel: Skylake: Fix module resource and format selection

[ Upstream commit e8b374b649afe756c2470e0e6668022e90bf8518 ]

Module configuration may differ between its instances depending on
resources required and input and output audio format. Available
parameters to select from are stored in module resource and interface
(format) lists. These come from topology, together with description of
each of pipe's modules.

Ignoring index value provided by topology and relying always on 0th
entry leads to unexpected module behavior due to under/overbudged
resources assigned or impropper format selection. Fix by taking entry at
index specified by topology.

Fixes: f6fa56e22559 ("ASoC: Intel: Skylake: Parse and update module config structure")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Tested-by: Lukasz Majczak <lma@semihalf.com>
Link: https://lore.kernel.org/r/20210818075742.1515155-5-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: Intel: Skylake: Leave data as is when invoking TLV IPCs

[ Upstream commit 126b3422adc80f29d2129db7f61e0113a8a526c6 ]

Advancing pointer initially fixed issue for some users but caused
regression for others. Leave data as it to make it easier for end users
to adjust their topology files if needed.

Fixes: a8cd7066f042 ("ASoC: Intel: Skylake: Strip T and L from TLV IPCs")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Tested-by: Lukasz Majczak <lma@semihalf.com>
Link: https://lore.kernel.org/r/20210818075742.1515155-3-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ASoC: Intel: kbl_da7219_max98927: Fix format selection for max98373

[ Upstream commit 6d41bbf2fd3615c56dbf2b67f6cbf9e83d14a2e2 ]

Contrary to what is said in board's file, topology targeting
kbl_da7219_max98373 expects format 16b, not 24/32b. Partially revert
changes added in 'ASoC: Intel: Boards: Add Maxim98373 support' to bring
old behavior back, aligning with topology expectations.

Fixes: 716d53cc7837 ("ASoC: Intel: Boards: Add Maxim98373 support")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Tested-by: Lukasz Majczak <lma@semihalf.com>
Link: https://lore.kernel.org/r/20210818075742.1515155-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rsi: fix an error code in rsi_probe()

[ Upstream commit 9adcdf6758d7c4c9bdaf22d78eb9fcae260ed113 ]

Return -ENODEV instead of success for unsupported devices.

Fixes: 54fdb318c111 ("rsi: add new device model for 9116")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210816183947.GA2119@kili
Signed-off-by: Sasha Levin <sashal@kernel.org>

rsi: fix error code in rsi_load_9116_firmware()

[ Upstream commit d0f8430332a16c7baa80ce2886339182c5d85f37 ]

This code returns success if the kmemdup() fails, but obviously it
should return -ENOMEM instead.

Fixes: e5a1ecc97e5f ("rsi: add firmware loading for 9116 device")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210805103746.GA26417@kili
Signed-off-by: Sasha Levin <sashal@kernel.org>

gfs2: init system threads before freeze lock

[ Upstream commit a28dc123fa66ba7f3eca7cffc4b01d96bfd35c27 ]

Patch 96b1454f2e ("gfs2: move freeze glock outside the make_fs_rw and _ro
functions") changed the gfs2 mount sequence so that it holds the freeze
lock before calling gfs2_make_fs_rw. Before this patch, gfs2_make_fs_rw
called init_threads to initialize the quotad and logd threads. That is a
problem if the system needs to withdraw due to IO errors early in the
mount sequence, for example, while initializing the system statfs inode:

1. An IO error causes the statfs glock to not sync properly after
   recovery, and leaves items on the ail list.
2. The leftover items on the ail list causes its do_xmote call to fail,
   which makes it want to withdraw. But since the glock code cannot
   withdraw (because the withdraw sequence uses glocks) it relies upon
   the logd daemon to initiate the withdraw.
3. The withdraw can never be performed by the logd daemon because all
   this takes place before the logd daemon is started.

This patch moves function init_threads from super.c to ops_fstype.c
and it changes gfs2_fill_super to start its threads before holding the
freeze lock, and if there's an error, stop its threads after releasing
it. This allows the logd to run unblocked by the freeze lock. Thus,
the logd daemon can perform its withdraw sequence properly.

Fixes: 96b1454f2e8e ("gfs2: move freeze glock outside the make_fs_rw and _ro functions")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>