This fix is basically the same as 9bce02ef0dfa ("pwm: meson: Fix the
G12A AO clock parents order"). Vendor driver referenced there has
xtal as first parent also for axg ao. In addition fix the name
of the aoclk81 clock. Apparently name aoclk81 as used by the vendor
driver was changed when mainlining the axg clock driver.
Use primary channel index to determine which 5 MHz mask should be enable.
This mask is used to prevent noise from channel edge to effect CCA
threshold in wide bandwidth (>= 40 MHZ).
Fixes: 1b00e9236a71 ("rtw89: 8852c: add set channel of BB part") Fixes: 6b0698984eb0 ("wifi: rtw89: 8852b: add chip_ops::set_channel") Cc: stable@vger.kernel.org Signed-off-by: Eric Huang <echuang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230406072841.8308-1-pkshih@realtek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the documentation, ieee80211_rx_list must not run concurrently
with ieee80211_tx_status (or its variants).
Cc: stable@vger.kernel.org Fixes: 88046b2c9f6d ("mt76: add support for reporting tx status with skb") Reported-by: Brian Coverstone <brian@mainsequence.net> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Under CONFIG_FORTIFY_SOURCE, memcpy() will check the size of destination
and source buffers. Defining kernel_headers_data as "char" would trip
this check. Since these addresses are treated as byte arrays, define
them as arrays (as done everywhere else).
This was seen with:
$ cat /sys/kernel/kheaders.tar.xz >> /dev/null
detected buffer overflow in memcpy
kernel BUG at lib/string_helpers.c:1027!
...
RIP: 0010:fortify_panic+0xf/0x20
[...]
Call Trace:
<TASK>
ikheaders_read+0x45/0x50 [kheaders]
kernfs_fop_read_iter+0x1a4/0x2f0
...
Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20230302112130.6e402a98@kernel.org/ Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 43d8ce9d65a5 ("Provide in-kernel headers to make extending kernel easier") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230302224946.never.243-kees@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The priv->chan_out_states array and actual DAC value can become
mismatched if stx104_write_raw() is called concurrently. Prevent such a
race condition by utilizing a mutex.
Fixes: 97a445dad37a ("iio: Add IIO support for the DAC on the Apex Embedded Systems STX104") Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Link: https://lore.kernel.org/r/c95c9a77fcef36b2a052282146950f23bbc1ebdc.1680790580.git.william.gray@linaro.org Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ADC conversion procedure requires several device I/O operations
performed in a particular sequence. If stx104_read_raw() is called
concurrently, the ADC conversion procedure could be clobbered. Prevent
such a race condition by utilizing a mutex.
The ipmi communication is not restored after a specific version of BMC is
upgraded on our server.
The ipmi driver does not respond after printing the following log:
ipmi_ssif: Invalid response getting flags: 1c 1
I found that after entering this branch, ssif_info->ssif_state always
holds SSIF_GETTING_FLAGS and never return to IDLE.
As a result, the driver cannot be loaded, because the driver status is
checked during the unload process and must be IDLE in shutdown_ssif():
while (ssif_info->ssif_state != SSIF_IDLE)
schedule_timeout(1);
The process trigger this problem is:
1. One msg timeout and next msg start send, and call
ssif_set_need_watch().
2. ssif_set_need_watch()->watch_timeout()->start_flag_fetch() change
ssif_state to SSIF_GETTING_FLAGS.
3. In msg_done_handler() ssif_state == SSIF_GETTING_FLAGS, if an error
message is received, the second branch does not modify the ssif_state.
4. All retry action need IS_SSIF_IDLE() == True. Include retry action in
watch_timeout(), msg_done_handler(). Sending msg does not work either.
SSIF_IDLE is also checked in start_next_msg().
5. The only thing that can be triggered in the SSIF driver is
watch_timeout(), after destory_user(), this timer will stop too.
So, if enter this branch, the ssif_state will remain SSIF_GETTING_FLAGS
and can't send msg, no timer started, can't unload.
We did a comparative test before and after adding this patch, and the
result is effective.
'obj-$(CONFIG_SYSCTL) += sysctls.o' must be moved after "obj-y :=",
or it won't be built as it is overwrited.
Note that there is nothing that is going to break by linking
sysctl.o later, we were just being way to cautious and patches
have been updated to reflect these considerations and sent for
stable as well with the whole "base" stuff needing to be linked
prior to child sysctl tables that use that directory. All of
the kernel sysctl APIs always share the same directory, and races
against using it should end up re-using the same single created
directory.
And so something we can do eventually is do away with all the base stuff.
For now it's fine, it's not creating an issue. It is just a bit pedantic
and careful.
Fixes: ab171b952c6e ("fs: move namespace sysctls and declare fs base directory") Cc: stable@vger.kernel.org # v5.17 Cc: Christian Brauner <brauner@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
[mcgrof: enhanced commit log for stable criteria and clarify base stuff ] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined.
However, cpu_is_hotpluggable() still returns true for those CPUs. This causes
torture tests that do offlining to end up trying to offline this CPU causing
test failures. Such failure happens on all architectures.
Fix the repeated error messages thrown by this (even if the hotplug errors are
harmless) by asking the opinion of the nohz subsystem on whether the CPU can be
hotplugged.
[ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ]
For drivers/base/ portion: Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: rcu <rcu@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After upgrading from 5.16 to 6.1, our board with a MAX14830 started
producing lots of garbage data over UART. Bisection pointed out commit 285e76fc049c as the culprit. That patch tried to replace hand-written
code which I added in 2b4bac48c1084 ("serial: max310x: Use batched reads
when reasonably safe") with the generic regmap infrastructure for
batched operations.
Unfortunately, the `regmap_raw_read` and `regmap_raw_write` which were
used are actually functions which perform IO over *multiple* registers.
That's not what is needed for accessing these Tx/Rx FIFOs; the
appropriate functions are the `_noinc_` versions, not the `_raw_` ones.
Fix this regression by using `regmap_noinc_read()` and
`regmap_noinc_write()` along with the necessary `regmap_config` setup;
with this patch in place, our board communicates happily again. Since
our board uses SPI for talking to this chip, the I2C part is completely
untested.
There's a potential race before THRE/TEMT deasserts when DMA Tx is
starting up (or the next batch of continuous Tx is being submitted).
This can lead to misdetecting Tx empty condition.
It is entirely normal for THRE/TEMT to be set for some time after the
DMA Tx had been setup in serial8250_tx_dma(). As Tx side is definitely
not empty at that point, it seems incorrect for serial8250_tx_empty()
claim Tx is empty.
Fix the race by also checking in serial8250_tx_empty() whether there's
DMA Tx active.
Note: This fix only addresses in-kernel race mainly to make using
TCSADRAIN/FLUSH robust. Userspace can still cause other races but they
seem userspace concurrency control problems.
Wire up the debugfs regset device pointer so that the controller is
resumed before accessing registers to avoid crashing or locking up if it
happens to be runtime suspended.
If userspace races tcsetattr() with a write, the drained condition
might not be guaranteed by the kernel. There is a race window after
checking Tx is empty before tty_set_termios() takes termios_rwsem for
write. During that race window, more characters can be queued by a
racing writer.
Any ongoing transmission might produce garbage during HW's
->set_termios() call. The intent of TCSADRAIN/FLUSH seems to be
preventing such a character corruption. If those flags are set, take
tty's write lock to stop any writer before performing the lower layer
Tx empty check and wait for the pending characters to be sent (if any).
The initial wait for all-writers-done must be placed outside of tty's
write lock to avoid deadlock which makes it impossible to use
tty_wait_until_sent(). The write lock is retried if a racing write is
detected.
If blk_crypto_evict_key() sees that the key is still in-use (due to a
bug) or that ->keyslot_evict failed, it currently just returns while
leaving the key linked into the keyslot management structures.
However, blk_crypto_evict_key() is only called in contexts such as inode
eviction where failure is not an option. So actually the caller
proceeds with freeing the blk_crypto_key regardless of the return value
of blk_crypto_evict_key().
These two assumptions don't match, and the result is that there can be a
use-after-free in blk_crypto_reprogram_all_keys() after one of these
errors occurs. (Note, these errors *shouldn't* happen; we're just
talking about what happens if they do anyway.)
Fix this by making blk_crypto_evict_key() unlink the key from the
keyslot management structures even on failure.
Also improve some comments.
Fixes: 1b2628397058 ("block: Keyslot Manager for Inline Encryption") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
blk_crypto_evict_key() is only called in contexts such as inode eviction
where failure is not an option. So there is nothing the caller can do
with errors except log them. (dm-table.c does "use" the error code, but
only to pass on to upper layers, so it doesn't really count.)
Just make blk_crypto_evict_key() return void and log errors itself.
Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once all I/O using a blk_crypto_key has completed, filesystems can call
blk_crypto_evict_key(). However, the block layer currently doesn't call
blk_crypto_put_keyslot() until the request is being freed, which happens
after upper layers have been told (via bio_endio()) the I/O has
completed. This causes a race condition where blk_crypto_evict_key()
can see 'slot_refs != 0' without there being an actual bug.
This makes __blk_crypto_evict_key() hit the
'WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)' and return without
doing anything, eventually causing a use-after-free in
blk_crypto_reprogram_all_keys(). (This is a very rare bug and has only
been seen when per-file keys are being used with fscrypt.)
There are two options to fix this: either release the keyslot before
bio_endio() is called on the request's last bio, or make
__blk_crypto_evict_key() ignore slot_refs. Let's go with the first
solution, since it preserves the ability to report bugs (via
WARN_ON_ONCE) where a key is evicted while still in-use.
Fixes: a892c8d52c02 ("block: Inline encryption support for blk-mq") Cc: stable@vger.kernel.org Reviewed-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Legacy Display Unit (LDU) fb dirty support used a custom fb dirty callback. Latter
handled only the DIRTYFB IOCTL presentation path but not the ADDFB2/PAGE_FLIP/RMFB
IOCTL path, common for Wayland compositors.
Get rid of the custom callback in favor of drm_atomic_helper_dirtyfb and unify the
handling of the presentation paths inside of vmw_ldu_primary_plane_atomic_update.
This also homogenizes the fb dirty callbacks across all DUs: LDU, SOU and STDU.
Signed-off-by: Martin Krastev <krastevm@vmware.com> Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com> Fixes: 2f5544ff0300 ("drm/vmwgfx: Use atomic helper function for dirty fb IOCTL") Cc: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230321020949.335012-3-zack@kde.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It the device is probed in non-zero ACPI D state, the module
identification is delayed until the first streamon.
The module identification has two parts: deviceID and version. To rea
the version we have to enable OTP read. This cannot be done during
streamon, becase it modifies REG_MODE_SELECT.
Since the driver has the same behaviour for all the module versions, do
not read the module version from the sensor's OTP.
Cc: stable@vger.kernel.org Fixes: 0e014f1a8d54 ("media: ov8856: support device probe in non-zero ACPI D state") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For some unknown reason the introduction of the timer_wait_running callback
missed to fixup posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store
a the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Reported-by: Marco Elver <elver@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marco Elver <elver@google.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On DT unaware platforms of_property_read_u32_array() returns -ENOSYS
which wasn't handled by the code treating adi,pwm-active-state as
optional. Update the code to use device_property_read_u32_array() which
deals gracefully with DT unaware platforms.
Spec says, when CUR_TEMP_TJ_SEL == 3 and CUR_TEMP_RANGE_SEL == 0,
it should use RangeUnadjusted is 0, which is (CurTmp*0.125 -49) C. The
CUR_TEMP register is read-write when CUR_TEMP_TJ_SEL == 3 (bit 17-16).
It was observed that there are hosts that may complete pending SETUP
transactions before the stop active transfers and controller halt occurs,
leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
start iterations.
The sequence shows that the USB gadget disconnect (dwc3_gadget_pullup(0))
routine completed successfully, allowing for the USB gadget to proceed with
a USB gadget connect. However, if this occurs the system runs into an
issue where:
BUG: spinlock already unlocked on CPU
spin_bug+0x0
dwc3_remove_requests+0x278
dwc3_ep0_out_start+0xb0
__dwc3_gadget_start+0x25c
This is due to the pending endxfers, leading to gadget start (w/o lock
held) to execute the remove requests, which will unlock the dwc3
spinlock as part of giveback.
To mitigate this, resolve the pending endxfers on the pullup disable
path by re-locating the SETUP phase check after stop active transfers, since
that is where the DWC3_EP_DELAY_STOP is potentially set. This also allows
for handling of a host that may be unresponsive by using the completion
timeout to trigger the stall and restart for EP0.
usb_gadget_connect calls gadget->ops->pullup without checking whether
gadget->connected was previously set. Make this symmetric to
usb_gadget_disconnect by returning early if gadget->connected is
already set.
Fixes: 5a1da544e572 ("usb: gadget: core: do not try to disconnect gadget if it is not connected") Cc: stable@vger.kernel.org Signed-off-by: Badhri Jagan Sridharan <badhri@google.com> Link: https://lore.kernel.org/r/20230407030741.3163220-2-badhri@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb_udc_connect_control does not check to see if the udc has already
been started. This causes gadget->ops->pullup to be called through
usb_gadget_connect when invoked from usb_udc_vbus_handler even before
usb_gadget_udc_start is called. Guard this by checking for udc->started
in usb_udc_connect_control before invoking usb_gadget_connect.
Guarding udc->vbus, udc->started, gadget->connect, gadget->deactivate
related functions with connect_lock. usb_gadget_connect_locked,
usb_gadget_disconnect_locked, usb_udc_connect_control_locked,
usb_gadget_udc_start_locked, usb_gadget_udc_stop_locked are called with
this lock held as they can be simulataneously invoked from different code
paths.
Adding an additional check to make sure udc is started(udc->started)
before pullup callback is invoked.
The RTW88 chipsets have four different priority queues in hardware. For
the USB type chipsets the packets destined for a specific priority queue
must be sent through the endpoint corresponding to the queue. This was
not fully understood when porting from the RTW88 USB out of tree driver
and thus violated.
This patch implements the qsel to endpoint mapping as in
get_usb_bulkout_id_88xx() in the downstream driver.
Without this the driver often issues "timed out to flush queue 3"
warnings and often TX stalls completely.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Tested-by: ValdikSS <iam@valdikss.org.ru> Tested-by: Alexandru gagniuc <mr.nuke.me@gmail.com> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: stable@vger.kernel.org Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230417140358.2240429-2-s.hauer@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UML supports HAS_IOMEM since 0bbadafdc49d (um: allow disabling
NO_IOMEM).
Current IMA build on UML fails on allmodconfig (with TCG_TPM=m):
ld: security/integrity/ima/ima_queue.o: in function `ima_add_template_entry':
ima_queue.c:(.text+0x2d9): undefined reference to `tpm_pcr_extend'
ld: security/integrity/ima/ima_init.o: in function `ima_init':
ima_init.c:(.init.text+0x43f): undefined reference to `tpm_default_chip'
ld: security/integrity/ima/ima_crypto.o: in function `ima_calc_boot_aggregate_tfm':
ima_crypto.c:(.text+0x1044): undefined reference to `tpm_pcr_read'
ld: ima_crypto.c:(.text+0x10d8): undefined reference to `tpm_pcr_read'
Modify the IMA Kconfig entry so that it selects TCG_TPM if HAS_IOMEM
is set, regardless of the UML Kconfig setting.
This updates TCG_TPM from =m to =y and fixes the linker errors.
commit 2d5253a096c6 ("bus: mhi: host: pci_generic: Add a secondary AT port
to Telit FN990")
commit 479aa3b0ec2e ("bus: mhi: host: pci_generic: Add a secondary AT port
to Telit FN990")
This turned out to be due to the patch getting applied through different
trees and git settled on a resolution while applying it second time. But
the second AT port of Foxconn devices don't work in PCIe mode. So the
second commit needs to be reverted.
Cc: stable@vger.kernel.org # 6.2 Fixes: 2d5253a096c6 ("bus: mhi: host: pci_generic: Add a secondary AT port to Telit FN990") Signed-off-by: Slark Xiao <slark_xiao@163.com> Reviewed-by: Fabio Porcedda <fabio.porcedda@gmail.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://lore.kernel.org/r/20230310101715.69209-1-slark_xiao@163.com
[mani: massaged the commit message a bit, added fixes tag and CCed stable] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qcom PCIe IP version v2.7.0 and its derivatives don't contain the
PCIE20_PARF_AXI_MSTR_WR_ADDR_HALT register. Instead, they have the new
PCIE20_PARF_AXI_MSTR_WR_ADDR_HALT_V2 register. So fix the incorrect
register usage which is modifying a different register.
Also in this IP version, this register change doesn't depend on MSI
being enabled. So remove that check also.
Link: https://lore.kernel.org/r/20230316081117.14288-2-manivannan.sadhasivam@linaro.org Fixes: ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: <stable@vger.kernel.org> # 5.6+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amended PCIe hotplug to mask Presence Detect Changed events during a
Secondary Bus Reset. The reset thus no longer causes gratuitous slot
bringdown and bringup.
However the commits neglected to serialize reset with code paths reading
slot registers. For instance, a slot bringup due to an earlier hotplug
event may see the Presence Detect State bit cleared during a concurrent
Secondary Bus Reset.
In 2018, commit
5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
retrofitted the missing locking. It introduced a reset_lock which
serializes a Secondary Bus Reset with other parts of pciehp.
Unfortunately the locking turns out to be overzealous: reset_lock is
held for the entire enumeration and de-enumeration of hotplugged devices,
including driver binding and unbinding.
Driver binding and unbinding acquires device_lock while the reset_lock
of the ancestral hotplug port is held. A concurrent Secondary Bus Reset
acquires the ancestral reset_lock while already holding the device_lock.
The asymmetric locking order in the two code paths can lead to AB-BA
deadlocks.
Michael Haeuptle reports such deadlocks on simultaneous hot-removal and
vfio release (the latter implies a Secondary Bus Reset):
Fix by releasing the reset_lock during driver binding and unbinding,
thereby splitting and shrinking the critical section.
Driver binding and unbinding is protected by the device_lock() and thus
serialized with a Secondary Bus Reset. There's no need to additionally
protect it with the reset_lock. However, pciehp does not bind and
unbind devices directly, but rather invokes PCI core functions which
also perform certain enumeration and de-enumeration steps.
The reset_lock's purpose is to protect slot registers, not enumeration
and de-enumeration of hotplugged devices. That would arguably be the
job of the PCI core, not the PCIe hotplug driver. After all, an
AER-induced Secondary Bus Reset may as well happen during boot-time
enumeration of the PCI hierarchy and there's no locking to prevent that
either.
Exempting *de-enumeration* from the reset_lock is relatively harmless:
A concurrent Secondary Bus Reset may foil config space accesses such as
PME interrupt disablement. But if the device is physically gone, those
accesses are pointless anyway. If the device is physically present and
only logically removed through an Attention Button press or the sysfs
"power" attribute, PME interrupts as well as DMA cannot come through
because pciehp_unconfigure_device() disables INTx and Bus Master bits.
That's still protected by the reset_lock in the present commit.
Exempting *enumeration* from the reset_lock also has limited impact:
The exempted call to pci_bus_add_device() may perform device accesses
through pcibios_bus_add_device() and pci_fixup_device() which are now
no longer protected from a concurrent Secondary Bus Reset. Otherwise
there should be no impact.
In essence, the present commit seeks to fix the AB-BA deadlocks while
still retaining a best-effort reset protection for enumeration and
de-enumeration of hotplugged devices -- until a general solution is
implemented in the PCI core.
pcie-kirin uses regmaps, and needs to pull them in; otherwise, with
CONFIG_PCIE_KIRIN=y and without CONFIG_REGMAP_MMIO pcie-kirin produces
a linker failure looking for __devm_regmap_init_mmio_clk().
Fixes: d19afe7be126 ("PCI: kirin: Use regmap for APB registers") Link: https://lore.kernel.org/r/04636141da1d6d592174eefb56760511468d035d.1668410580.git.josh@joshtriplett.org Signed-off-by: Josh Triplett <josh@joshtriplett.org>
[lpieralisi@kernel.org: commit log and removed REGMAP select] Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org # 5.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-mcpu=power10 will generate prefixed and pcrel code by default, which
we do not support. The general kernel disables these with cflags, but
those were missed for the boot wrapper.
Fixes: 4b2a9315f20d ("powerpc/64s: POWER10 CPU Kconfig build option") Cc: stable@vger.kernel.org # v6.1+ Reported-by: Danny Tsen <dtsen@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230407040909.230998-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instead of reloading the shadow call stack pointer from the ordinary
stack, which may be vulnerable to the kind of gadget based attacks
shadow call stacks were designed to prevent, let's store a task's shadow
call stack pointer in the task struct when switching to the shadow IRQ
stack.
Given that currently, the task_struct::scs_sp field is only used to
preserve the shadow call stack pointer while a task is scheduled out or
running in user space, reusing this field to preserve and restore it
while running off the IRQ stack must be safe, as those occurrences are
guaranteed to never overlap. (The stack switching logic only switches
stacks when running from the task stack, and so the value being saved
here always corresponds to the task mode shadow stack)
While at it, fold a mov/add/mov sequence into a single add.
All occurrences of the scs_load macro load the value of the shadow call
stack pointer from the task which is current at that point. So instead
of taking a task struct register argument in the scs_load macro to
specify the task struct to load from, let's always reference the current
task directly. This should make it much harder to exploit any
instruction sequences reloading the shadow call stack pointer register
from memory.
Updating the clock source from ACLK to default clock
Signed-off-by: Syed Saba Kareem <Syed.SabaKareem@amd.com> Link: https://lore.kernel.org/r/20230331052102.2211115-1-Syed.SabaKareem@amd.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the BIOS has been configured for Fast Boot, systems with mt7921e
have non-functional wifi. Turning on Fast boot caused both bus master
enable and memory space enable bits in PCI_COMMAND not to get configured.
The mt7921 driver already sets bus master enable, but explicitly check
and set memory access enable as well to fix this problem.
Tested-by: Anson Tsao <anson.tsao@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Copy the forced type casts from the normal MMIO accessors to suppress
the sparse warnings that point out __raw_readl() returns a native endian
word (just like readl()).
A __field() in the TRACE_EVENT() macro is used to set up the fields of the
trace event data. It is for single storage units (word, char, int,
pointer, etc) and not for complex structures or arrays. Unfortunately,
there's nothing preventing the build from accepting:
__field(int, arr[5]);
from building. It will turn into a array value. This use to work fine, as
the offset and size use to be determined by the macro using the field name,
but things have changed and the offset and size are now determined by the
type. So the above would only be size 4, and the next field will be
located 4 bytes from it (instead of 20).
The proper way to declare static arrays is to use the __array() macro.
Instead of __field(int, arr[5]) it should be __array(int, arr, 5).
Add some macro tricks to the building of a trace event from the
TRACE_EVENT() macro such that __field(int, arr[5]) will fail to build. A
comment by the failure will explain why the build failed.
Currently ath11k breaks after hibernation, the reason being that ath11k expects
that the wireless device will have power during suspend and the firmware will
continue running. But of course during hibernation the power from the device is
cut off and firmware is not running when resuming, so ath11k will fail.
(The reason why ath11k needs the firmware running is the interaction between
mac80211 and MHI stack, it's a long story and more info in the bugzilla report.)
In SUSE kernels the watchdog timeout is reduced from the default 120 to 60 seconds:
CONFIG_DPM_WATCHDOG_TIMEOUT=60
But as the ath11k MHI timeout is 90 seconds the kernel will crash before will
ath11k will recover in resume callback. To avoid the crash reduce the MHI
timeout to just 20 seconds.
From the commit message adding the first s2idle quirks:
> Lenovo laptops that contain NVME SSDs across a variety of generations have
> trouble resuming from suspend to idle when the IOMMU translation layer is
> active for the NVME storage device.
>
> This generally manifests as a large resume delay or page faults. These
> delays and page faults occur as a result of a Lenovo BIOS specific SMI
> that runs during the D3->D0 transition on NVME devices.
Add the DMI ids for another variant of the T14s Gen1, which also needs
the s2idle quirk.
The HG MXPD-483II 1310nm SFP module is meant to operate with 2500Base-X,
however, in their EEPROM they incorrectly specify:
Transceiver type : Ethernet: 1000BASE-LX
...
BR, Nominal : 2600MBd
Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
Reported-by: chowtom <chowtom@gmail.com> Tested-by: chowtom <chowtom@gmail.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver is exiting from the fault watchdog thread if it sees the 0xF002
(Soft reset in progress) fault code.
If the driver initiates the soft reset, then the driver restarts the
watchdog at the end of the soft reset completion. However, if the soft
reset is initiated by the firmware asynchronously, then the driver will
never restart the watchdog and never re-initialize the controller after the
asynchronous soft reset completion.
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When playing with various compilers or their versions, some choke on
the t7xx code. For example (with gcc 13):
In file included from ./arch/s390/include/generated/asm/rwonce.h:1,
from ../include/linux/compiler.h:247,
from ../include/linux/build_bug.h:5,
from ../include/linux/bits.h:22,
from ../drivers/net/wwan/t7xx/t7xx_state_monitor.c:17:
In function 'preempt_count',
inlined from 't7xx_fsm_append_event' at ../drivers/net/wwan/t7xx/t7xx_state_monitor.c:439:43:
../include/asm-generic/rwonce.h:44:26: error: array subscript 0 is outside array bounds of 'const volatile int[0]' [-Werror=array-bounds=]
There is no reason for any code in the kernel to be built with -Werror
by default. Note that we have generic CONFIG_WERROR. So if anyone wants
-Werror, they can enable that.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/all/20230330232717.1f8bf5ea@kernel.org/ Cc: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com> Cc: Intel Corporation <linuxwwan@intel.com> Cc: Chiranjeevi Rapolu <chiranjeevi.rapolu@linux.intel.com> Cc: Liu Haijun <haijun.liu@mediatek.com> Cc: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Cc: Ricardo Martinez <ricardo.martinez@linux.intel.com> Cc: Loic Poulain <loic.poulain@linaro.org> Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Acer Iconia One 7 B1-750 tablet mostly works fine with the defaults
for an Bay Trail CR tablet. Except for the internal mic, instead of
an analog mic on IN3 a digital mic on DMIC1 is uses.
Add a quirk with these settings for this tablet.
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230322145332.131525-1-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Hyper-V should never specify a VM that is a Confidential VM and also
running in the root partition. Nonetheless, explicitly block such a
combination to guard against a compromised Hyper-V maliciously trying to
exploit root partition functionality in a Confidential VM to expose
Confidential VM secrets. No known bug is being fixed, but the attack
surface for Confidential VMs on Hyper-V is reduced.
The hw->formats may be set by snd_dmaengine_pcm_refine_runtime_hwparams()
in component's startup()/open(), but soc_pcm_hw_init() will init
hw->formats in dpcm_runtime_setup_fe() after component's startup()/open(),
which causes the valuable hw->formats to be cleared.
So need to store the hw->formats before initialization, then restore
it after initialization.
We used to access the dtb via its linear mapping address but now that the
dtb early mapping was moved in the fixmap region, we can keep using this
address since it is present in swapper_pg_dir, and remove the dtb
relocation.
Note that the relocation was wrong anyway since early_memremap() is
restricted to 256K whereas the maximum fdt size is 2MB.
early_init_dt_verify() is already called in parse_dtb() and since the dtb
address does not change anymore (it is now in the fixmap region), no need
to reset initial_boot_params by calling early_init_dt_verify() again.
- early_pg_dir maps the kernel which allows to discover the system
memory
- swapper_pg_dir installs the final mapping (linear mapping included)
We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.
The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
must be called before swapper_pg_dir is set up otherwise we could
allocate in reserved memory defined in the dtb.
So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.
Don't require the use of dynamic debug (or modification of the kernel to
add a #define DEBUG to the top of this file) to get the printk message
about driver probe timing. This printk is only emitted when
initcall_debug is enabled on the kernel commandline, and it isn't
immediately obvious that you have to do something else to debug boot
timing issues related to driver probe. Add a comment too so it doesn't
get converted back to pr_debug().
Fixes: eb7fbc9fb118 ("driver core: Add missing '\n' in log messages") Cc: stable <stable@kernel.org> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Brian Norris <briannorris@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20230412225842.3196599-1-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add UNISOC vendor ID and TOZED LT70-C modem which is based from UNISOC
SL8563. The modem supports the NCM mode. Interface 0 is used for running
the AT commands. Interface 12 is the ADB interface.
There are some warnings on older compilers (gcc 10, 7) or non-x86_64
architectures (aarch64). As btrfs wants to enable -Wmaybe-uninitialized
by default, fix the warnings even though it's not necessary on recent
compilers (gcc 12+).
../fs/btrfs/volumes.c: In function ‘btrfs_init_new_device’:
../fs/btrfs/volumes.c:2703:3: error: ‘seed_devices’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
2703 | btrfs_setup_sprout(fs_info, seed_devices);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../fs/btrfs/send.c: In function ‘get_cur_inode_state’:
../include/linux/compiler.h:70:32: error: ‘right_gen’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
70 | (__if_trace.miss_hit[1]++,1) : \
| ^
../fs/btrfs/send.c:1878:6: note: ‘right_gen’ was declared here
1878 | u64 right_gen;
| ^~~~~~~~~
Add SDIO ids for use with the muRata 1YN (Cypress CYW43439).
The odd thing about this is that the previous 1YN populated
on M.2 card for evaluation purposes had BRCM SDIO vendor ID,
while the chip populated on real hardware has a Cypress one.
The device ID also differs between the two devices. But they
are both 43439 otherwise, so add the IDs for both.
On-device 1YN (43439), the new one, chip label reads "1YN":
```
/sys/.../mmc_host/mmc2/mmc2:0001 # cat vendor device
0x04b4
0xbd3d
```
EA M.2 evaluation board 1YN (43439), the old one, chip label reads "1YN ES1.4":
```
/sys/.../mmc_host/mmc0/mmc0:0001/# cat vendor device
0x02d0
0xa9a6
```
Reviewed-by: Hans de Goede <hdegoede@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230407203752.128539-1-marex@denx.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, capability was checked using capable(), which verified that the
caller of the ioctl system call had the required capability. In addition,
the result of the check would be stored in the HCI_SOCK_TRUSTED flag,
making it persistent for the socket.
However, malicious programs can abuse this approach by deliberately sharing
an HCI socket with a privileged task. The HCI socket will be marked as
trusted when the privileged task occasionally makes an ioctl call.
This problem can be solved by using sk_capable() to check capability, which
ensures that not only the current task but also the socket opener has the
specified capability, thus reducing the risk of privilege escalation
through the previously identified vulnerability.
Cc: stable@vger.kernel.org Fixes: f81f5b2db869 ("Bluetooth: Send control open and close messages for HCI raw sockets") Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
changed the policy such that I2C touchpads may be able to wake up the
system by default if the system is configured as such.
However on Clevo NL5xNU there is a mistake in the ACPI tables that the
TP_ATTN# signal connected to GPIO 9 is configured as ActiveLow and level
triggered but connected to a pull up. As soon as the system suspends the
touchpad loses power and then the system wakes up.
To avoid this problem, introduce a quirk for this model that will prevent
the wakeup capability for being set for GPIO 9.
This patch is analoge to a very similar patch for NL5xRU, just the DMI
string changed.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Drivers are supposed to fix this up if needed if they don't outright
reject it. Uncovered by 6c11df58fd1a ("fbmem: Check virtual screen
sizes in fb_set_var()").
Fix a slab-out-of-bounds read that occurs in kmemdup() called from
brcmf_get_assoc_ies().
The bug could occur when assoc_info->req_len, data from a URB provided
by a USB device, is bigger than the size of buffer which is defined as
WL_EXTRA_BUF_MAX.
Add the size check for req_len/resp_len of assoc_info.
Found by a modified version of syzkaller.
[ 46.592467][ T7] ==================================================================
[ 46.594687][ T7] BUG: KASAN: slab-out-of-bounds in kmemdup+0x3e/0x50
[ 46.596572][ T7] Read of size 3014656 at addr ffff888019442000 by task kworker/0:1/7
[ 46.598575][ T7]
[ 46.599157][ T7] CPU: 0 PID: 7 Comm: kworker/0:1 Tainted: G O 5.14.0+ #145
[ 46.601333][ T7] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 46.604360][ T7] Workqueue: events brcmf_fweh_event_worker
[ 46.605943][ T7] Call Trace:
[ 46.606584][ T7] dump_stack_lvl+0x8e/0xd1
[ 46.607446][ T7] print_address_description.constprop.0.cold+0x93/0x334
[ 46.608610][ T7] ? kmemdup+0x3e/0x50
[ 46.609341][ T7] kasan_report.cold+0x79/0xd5
[ 46.610151][ T7] ? kmemdup+0x3e/0x50
[ 46.610796][ T7] kasan_check_range+0x14e/0x1b0
[ 46.611691][ T7] memcpy+0x20/0x60
[ 46.612323][ T7] kmemdup+0x3e/0x50
[ 46.612987][ T7] brcmf_get_assoc_ies+0x967/0xf60
[ 46.613904][ T7] ? brcmf_notify_vif_event+0x3d0/0x3d0
[ 46.614831][ T7] ? lock_chain_count+0x20/0x20
[ 46.615683][ T7] ? mark_lock.part.0+0xfc/0x2770
[ 46.616552][ T7] ? lock_chain_count+0x20/0x20
[ 46.617409][ T7] ? mark_lock.part.0+0xfc/0x2770
[ 46.618244][ T7] ? lock_chain_count+0x20/0x20
[ 46.619024][ T7] brcmf_bss_connect_done.constprop.0+0x241/0x2e0
[ 46.620019][ T7] ? brcmf_parse_configure_security.isra.0+0x2a0/0x2a0
[ 46.620818][ T7] ? __lock_acquire+0x181f/0x5790
[ 46.621462][ T7] brcmf_notify_connect_status+0x448/0x1950
[ 46.622134][ T7] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 46.622736][ T7] ? brcmf_cfg80211_join_ibss+0x7b0/0x7b0
[ 46.623390][ T7] ? find_held_lock+0x2d/0x110
[ 46.623962][ T7] ? brcmf_fweh_event_worker+0x19f/0xc60
[ 46.624603][ T7] ? mark_held_locks+0x9f/0xe0
[ 46.625145][ T7] ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
[ 46.625871][ T7] ? brcmf_cfg80211_join_ibss+0x7b0/0x7b0
[ 46.626545][ T7] brcmf_fweh_call_event_handler.isra.0+0x90/0x100
[ 46.627338][ T7] brcmf_fweh_event_worker+0x557/0xc60
[ 46.627962][ T7] ? brcmf_fweh_call_event_handler.isra.0+0x100/0x100
[ 46.628736][ T7] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 46.629396][ T7] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 46.629970][ T7] ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 46.630649][ T7] process_one_work+0x92b/0x1460
[ 46.631205][ T7] ? pwq_dec_nr_in_flight+0x330/0x330
[ 46.631821][ T7] ? rwlock_bug.part.0+0x90/0x90
[ 46.632347][ T7] worker_thread+0x95/0xe00
[ 46.632832][ T7] ? __kthread_parkme+0x115/0x1e0
[ 46.633393][ T7] ? process_one_work+0x1460/0x1460
[ 46.633957][ T7] kthread+0x3a1/0x480
[ 46.634369][ T7] ? set_kthread_struct+0x120/0x120
[ 46.634933][ T7] ret_from_fork+0x1f/0x30
[ 46.635431][ T7]
[ 46.635687][ T7] Allocated by task 7:
[ 46.636151][ T7] kasan_save_stack+0x1b/0x40
[ 46.636628][ T7] __kasan_kmalloc+0x7c/0x90
[ 46.637108][ T7] kmem_cache_alloc_trace+0x19e/0x330
[ 46.637696][ T7] brcmf_cfg80211_attach+0x4a0/0x4040
[ 46.638275][ T7] brcmf_attach+0x389/0xd40
[ 46.638739][ T7] brcmf_usb_probe+0x12de/0x1690
[ 46.639279][ T7] usb_probe_interface+0x2aa/0x760
[ 46.639820][ T7] really_probe+0x205/0xb70
[ 46.640342][ T7] __driver_probe_device+0x311/0x4b0
[ 46.640876][ T7] driver_probe_device+0x4e/0x150
[ 46.641445][ T7] __device_attach_driver+0x1cc/0x2a0
[ 46.642000][ T7] bus_for_each_drv+0x156/0x1d0
[ 46.642543][ T7] __device_attach+0x23f/0x3a0
[ 46.643065][ T7] bus_probe_device+0x1da/0x290
[ 46.643644][ T7] device_add+0xb7b/0x1eb0
[ 46.644130][ T7] usb_set_configuration+0xf59/0x16f0
[ 46.644720][ T7] usb_generic_driver_probe+0x82/0xa0
[ 46.645295][ T7] usb_probe_device+0xbb/0x250
[ 46.645786][ T7] really_probe+0x205/0xb70
[ 46.646258][ T7] __driver_probe_device+0x311/0x4b0
[ 46.646804][ T7] driver_probe_device+0x4e/0x150
[ 46.647387][ T7] __device_attach_driver+0x1cc/0x2a0
[ 46.647926][ T7] bus_for_each_drv+0x156/0x1d0
[ 46.648454][ T7] __device_attach+0x23f/0x3a0
[ 46.648939][ T7] bus_probe_device+0x1da/0x290
[ 46.649478][ T7] device_add+0xb7b/0x1eb0
[ 46.649936][ T7] usb_new_device.cold+0x49c/0x1029
[ 46.650526][ T7] hub_event+0x1c98/0x3950
[ 46.650975][ T7] process_one_work+0x92b/0x1460
[ 46.651535][ T7] worker_thread+0x95/0xe00
[ 46.651991][ T7] kthread+0x3a1/0x480
[ 46.652413][ T7] ret_from_fork+0x1f/0x30
[ 46.652885][ T7]
[ 46.653131][ T7] The buggy address belongs to the object at ffff888019442000
[ 46.653131][ T7] which belongs to the cache kmalloc-2k of size 2048
[ 46.654669][ T7] The buggy address is located 0 bytes inside of
[ 46.654669][ T7] 2048-byte region [ffff888019442000, ffff888019442800)
[ 46.656137][ T7] The buggy address belongs to the page:
[ 46.656720][ T7] page:ffffea0000651000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x19440
[ 46.657792][ T7] head:ffffea0000651000 order:3 compound_mapcount:0 compound_pincount:0
[ 46.658673][ T7] flags: 0x100000000010200(slab|head|node=0|zone=1)
[ 46.659422][ T7] raw: 01000000000102000000000000000000dead000000000122ffff888100042000
[ 46.660363][ T7] raw: 0000000000000000000000000008000800000001ffffffff0000000000000000
[ 46.661236][ T7] page dumped because: kasan: bad access detected
[ 46.661956][ T7] page_owner tracks the page as allocated
[ 46.662588][ T7] page last allocated via order 3, migratetype Unmovable, gfp_mask 0x52a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 7, ts 31136961085, free_ts 0
[ 46.664271][ T7] prep_new_page+0x1aa/0x240
[ 46.664763][ T7] get_page_from_freelist+0x159a/0x27c0
[ 46.665340][ T7] __alloc_pages+0x2da/0x6a0
[ 46.665847][ T7] alloc_pages+0xec/0x1e0
[ 46.666308][ T7] allocate_slab+0x380/0x4e0
[ 46.666770][ T7] ___slab_alloc+0x5bc/0x940
[ 46.667264][ T7] __slab_alloc+0x6d/0x80
[ 46.667712][ T7] kmem_cache_alloc_trace+0x30a/0x330
[ 46.668299][ T7] brcmf_usbdev_qinit.constprop.0+0x50/0x470
[ 46.668885][ T7] brcmf_usb_probe+0xc97/0x1690
[ 46.669438][ T7] usb_probe_interface+0x2aa/0x760
[ 46.669988][ T7] really_probe+0x205/0xb70
[ 46.670487][ T7] __driver_probe_device+0x311/0x4b0
[ 46.671031][ T7] driver_probe_device+0x4e/0x150
[ 46.671604][ T7] __device_attach_driver+0x1cc/0x2a0
[ 46.672192][ T7] bus_for_each_drv+0x156/0x1d0
[ 46.672739][ T7] page_owner free stack trace missing
[ 46.673335][ T7]
[ 46.673620][ T7] Memory state around the buggy address:
[ 46.674213][ T7] ffff888019442700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 46.675083][ T7] ffff888019442780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 46.675994][ T7] >ffff888019442800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 46.676875][ T7] ^
[ 46.677323][ T7] ffff888019442880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 46.678190][ T7] ffff888019442900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 46.679052][ T7] ==================================================================
[ 46.679945][ T7] Disabling lock debugging due to kernel taint
[ 46.680725][ T7] Kernel panic - not syncing:
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230309104457.22628-1-jisoo.jang@yonsei.ac.kr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/ Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com Tested-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode. The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories. Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed. And
the kfree_rcu_monitor() function does in fact check for this.
Except that the kfree_rcu_monitor() function checks these pointers one
at a time. This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately. This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
To see this, consider the following sequence of events, in which:
o Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
then is preempted.
o CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
after a later grace period. Except that "from_cset" is freed
right after the previous grace period ended, so that "from_cset"
is immediately freed. Task A resumes and references "from_cset"'s
member, after which nothing good happens.
In full detail:
CPU 0 CPU 1
---------------------- ----------------------
count_memcg_event_mm()
|rcu_read_lock() <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.
cgroup_attach_task()
|cgroup_migrate()
|cgroup_migrate_execute()
|css_set_move_task(task, from_cset, to_cset, true)
|cgroup_move_task(task, to_cset)
|rcu_assign_pointer(.., to_cset)
|...
|cgroup_migrate_finish()
|put_css_set_locked(from_cset)
|from_cset->refcount return 0
|kfree_rcu(cset, rcu_head) // free from_cset after new gp
|add_ptr_to_bulk_krc_lock()
|schedule_delayed_work(&krcp->monitor_work, ..)
kfree_rcu_monitor()
|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
|queue_rcu_work(system_wq, &krwp->rcu_work)
|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp
// There is a perious call_rcu(.., rcu_work_rcufn)
// gp end, rcu_work_rcufn() is called.
rcu_work_rcufn()
|__queue_work(.., rwork->wq, &rwork->work);
|kfree_rcu_work()
|krwp->bulk_head_free[0] bulk is freed before new gp end!!!
|The "from_cset" is freed before new gp end.
// the task resumes some time later.
|css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.
This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
Fixes: 34c881745549 ("rcu: Support kfree_bulk() interface in kfree_rcu()") Fixes: 5f3c8d620447 ("rcu/tree: Maintain separate array for vmalloc ptrs") Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As part of the Rust support for UML, we disable SSE (and similar flags)
to match the normal x86 builds. This both makes sense (we ideally want a
similar configuration to x86), and works around a crash bug with SSE
generation under Rust with LLVM.
However, this breaks compiling stdlib.h under gcc < 11, as the x86_64
ABI requires floating-point return values be stored in an SSE register.
gcc 11 fixes this by only doing register allocation when a function is
actually used, and since we never use atof(), it shouldn't be a problem:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652
Nevertheless, only disable SSE on clang setups, as that's a simple way
of working around everyone's bugs.
Fixes: 884981867947 ("rust: arch/um: Disable FP/SIMD instruction to match x86") Reported-by: Roberto Sassu <roberto.sassu@huaweicloud.com> Link: https://lore.kernel.org/linux-um/6df2ecef9011d85654a82acd607fdcbc93ad593c.camel@huaweicloud.com/ Tested-by: Roberto Sassu <roberto.sassu@huaweicloud.com> Tested-by: SeongJae Park <sj@kernel.org> Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Tested-by: Arthur Grillo <arthurgrillo@riseup.net> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernel disables all SSE and similar FP/SIMD instructions on
x86-based architectures (partly because we shouldn't be using floats in
the kernel, and partly to avoid the need for stack alignment, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 )
UML does not do the same thing, which isn't in itself a problem, but
does add to the list of differences between UML and "normal" x86 builds.
In addition, there was a crash bug with LLVM < 15 / rustc < 1.65 when
building with SSE, so disabling it fixes rust builds with earlier
compiler versions, see:
https://github.com/Rust-for-Linux/linux/pull/881
Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Sergio González Collado <sergio.collado@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dma_request_slave_channel() may return NULL which will lead to
NULL pointer dereference error in 'tmp_chan->private'.
Correct this behaviour by, first, switching from deprecated function
dma_request_slave_channel() to dma_request_chan(). Secondly, enable
sanity check for the resuling value of dma_request_chan().
Also, fix description that follows the enacted changes and that
concerns the use of dma_request_slave_channel().
Fixes: 706e2c881158 ("ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End") Co-developed-by: Natalia Petrova <n.petrova@fintech.ru> Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://lore.kernel.org/r/20230417133242.53339-1-n.zhandarovich@fintech.ru Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Petr Mladek <pmladek@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Patrick Daly <quic_pdaly@quicinc.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code path can lead to warnings because of uninitialized device,
which contains, as a consequence, uninitialized kobject. The uninitialized
device is passed to of_platform_populate, which will at some point, while
creating child device, try to get a reference on uninitialized parent,
resulting in the following warning:
kobject: '(null)' ((ptrval)): is not initialized, yet kobject_get() is
being called.
The warning is observed after migrating a kernel 5.10.x to 6.1.x.
Reverting commit 0d70af3c2530 ("fpga: bridge: Use standard dev_release for
class driver") seems to remove the warning.
This commit aggregates device_initialize() and device_add() into
device_register() but this new call is done AFTER of_platform_populate
Fixes: 0d70af3c2530 ("fpga: bridge: Use standard dev_release for class driver") Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Acked-by: Xu Yilun <yilun.xu@intel.com> Link: https://lore.kernel.org/r/20230404133102.2837535-2-alexis.lothore@bootlin.com Signed-off-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Warning raised because pegasus_driver submits transfer request for
bogus URB (pipe type does not match endpoint type). Add sanity check at
probe time for pipe value extracted from endpoint descriptor. Probe
will fail if sanity check fails.
Prior to this patch, the sensing configuration data was not parsed
correctly, breaking detection of max_tch. The vendor driver includes
this field. This change informs the driver about the correct maximum
number of simultaneous touch inputs.
Tested on a Pine64 PineNote with a modified touch screen controller
firmware.
We started disabling '-Warray-bounds' for gcc-12 originally on s390,
because it resulted in some warnings that weren't realistically fixable
(commit 8b202ee21839: "s390: disable -Warray-bounds").
That s390-specific issue was then found to be less common elsewhere, but
generic (see f0be87c42cbd: "gcc-12: disable '-Warray-bounds' universally
for now"), and then later expanded the version check was expanded to
gcc-11 (5a41237ad1d4: "gcc: disable -Warray-bounds for gcc-11 too").
And it turns out that I was much too optimistic in thinking that it's
all going to go away, and here we are with gcc-13 showing all the same
issues. So instead of expanding this one version at a time, let's just
disable it for gcc-11+, and put an end limit to it only when we actually
find a solution.
Yes, I'm sure some of this is because the kernel just does odd things
(like our "container_of()" use, but also knowingly playing games with
things like linker tables and array layouts).
And yes, some of the warnings are likely signs of real bugs, but when
there are hundreds of false positives, that doesn't really help.
pci_msix_validate_entries() validates the entries array which is handed in
by the caller for a MSI-X interrupt allocation. Aside of consistency
failures it also detects a failure when the size of the MSI-X hardware table
in the device is smaller than the size of the entries array.
That's wrong for the case of range allocations where the caller provides
the minimum and the maximum number of vectors to allocate, when the
hardware size is greater or equal than the mininum, but smaller than the
maximum.
Remove the hardware size check completely from that function and just
ensure that the entires array up to the maximum size is consistent.
The limitation and range checking versus the hardware size happens
independently of that afterwards anyway because the entries array is
optional.
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files") Signed-off-by: Alyssa Ross <hi@alyssa.is> Cc: stable@vger.kernel.org Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, which may be fixed in
newer chipsets).
This means WUC can only used for write-only memory regions now, so this
option is disabled by default, making WUC silently fallback to SUC for
ioremap(). You can enable this option if the kernel is ensured to run on
hardware without this bug.
Kernel parameter writecombine=on/off can be used to override the Kconfig
option.
Cc: stable@vger.kernel.org Suggested-by: WANG Xuerui <kernel@xen0n.name> Reviewed-by: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
configurable.
Not all LoongArch cores support h/w unaligned access, we can use the
-mstrict-align build parameter to prevent unaligned accesses.
CPUs with h/w unaligned access support:
Loongson-2K2000/2K3000/3A5000/3C5000/3D5000.
CPUs without h/w unaligned access support:
Loongson-2K500/2K1000.
This option is enabled by default to make the kernel be able to run on
all LoongArch systems. But you can disable it manually if you want to
run kernel only on systems with h/w unaligned access support in order to
optimise for performance.
Per-vcpu flags are updated using a non-atomic RMW operation.
Which means it is possible to get preempted between the read and
write operations.
Another interesting thing to note is that preemption also updates
flags, as we have some flag manipulation in both the load and put
operations.
It is thus possible to lose information communicated by either
load or put, as the preempted flag update will overwrite the flags
when the thread is resumed. This is specially critical if either
load or put has stored information which depends on the physical
CPU the vcpu runs on.
This results in really elusive bugs, and kudos must be given to
Mostafa for the long hours of debugging, and finally spotting
the problem.
Fix it by disabling preemption during the RMW operation, which
ensures that the state stays consistent. Also upgrade vcpu_get_flag
path to use READ_ONCE() to make sure the field is always atomically
accessed.
Fixes: e87abb73e594 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set") Reported-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230418125737.2327972-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@server->origin_fullpath already contains the tree name + optional
prefix, so avoid calling __build_path_from_dentry_optional_prefix() as
it might end up duplicating prefix path from @cifs_sb->prepath into
final full path.
Instead, generate DFS full path by simply merging
@server->origin_fullpath with dentry's path.
This fixes the following case
mount.cifs //root/dfs/dir /mnt/ -o ...
ls /mnt/link
where cifs_dfs_do_automount() will call smb3_parse_devname() with
@devname set to "//root/dfs/dir/link" instead of
"//root/dfs/dir/dir/link".
Fixes: 7ad54b98fc1f ("cifs: use origin fullpath for automounts") Cc: <stable@vger.kernel.org> # 6.2+ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The maple tree limits the gap returned to a window that specifically fits
what was asked. This may not be optimal in the case of switching search
directions or a gap that does not satisfy the requested space for other
reasons. Fix the search by retrying the operation and limiting the search
window in the rare occasion that a conflict occurs.
Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>