In the ad7780_write_raw() , val2 can be zero, which might lead to a
division by zero error in DIV_ROUND_CLOSEST(). The ad7780_write_raw()
is based on iio_info's write_raw. While val is explicitly declared that
can be zero (in read mode), val2 is not specified to be non-zero.
At btrfs_ref_tree_mod() after we successfully inserted the new ref entry
(local variable 'ref') into the respective block entry's rbtree (local
variable 'be'), if we find an unexpected action of BTRFS_DROP_DELAYED_REF,
we error out and free the ref entry without removing it from the block
entry's rbtree. Then in the error path of btrfs_ref_tree_mod() we call
btrfs_free_ref_cache(), which iterates over all block entries and then
calls free_block_entry() for each one, and there we will trigger a
use-after-free when we are called against the block entry to which we
added the freed ref entry to its rbtree, since the rbtree still points
to the block entry, as we didn't remove it from the rbtree before freeing
it in the error path at btrfs_ref_tree_mod(). Fix this by removing the
new ref entry from the rbtree before freeing it.
Syzbot report this with the following stack traces:
The buggy address belongs to the object at ffff888042d1af00
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 56 bytes inside of
freed 64-byte region [ffff888042d1af00, ffff888042d1af40)
Memory state around the buggy address: ffff888042d1ae00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888042d1ae80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff888042d1af00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^ ffff888042d1af80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc ffff888042d1b000: 00 00 00 00 00 fc fc 00 00 00 00 00 fc fc 00 00
Reported-by: syzbot+7325f164162e200000c1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/673723eb.050a0220.1324f8.00a8.GAE@google.com/T/#u Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Syzbot reports a null-ptr-deref in btrfs_search_slot().
The reproducer is using rescue=ibadroots, and the extent tree root is
corrupted thus the extent tree is NULL.
When scrub tries to search the extent tree to gather the needed extent
info, btrfs_search_slot() doesn't check if the target root is NULL or
not, resulting the null-ptr-deref.
Add sanity check for btrfs root before using it in btrfs_search_slot().
Add annotations to functions that might sleep due to allocations or IO
and could be called from various contexts. In case of btrfs_search_slot
it's not obvious why it would sleep:
Since we currently don't always flush the quota_release_work queue in
this path, we can end up with the following race:
1. dquot are added to releasing_dquots list during regular operations.
2. FS Freeze starts, however, this does not flush the quota_release_work queue.
3. Freeze completes.
4. Kernel eventually tries to flush the workqueue while FS is frozen which
hits a WARN_ON since transaction gets started during frozen state:
Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.
Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.
Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.
Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Log recovery has always run on read only mounts, even where the primary
superblock advertises unknown rocompat bits. Due to a misunderstanding
between Eric and Darrick back in 2018, we accidentally changed the
superblock write verifier to shutdown the fs over that exact scenario.
As a result, the log cleaning that occurs at the end of the mounting
process fails if there are unknown rocompat bits set.
As we now allow writing of the superblock if there are unknown rocompat
bits set on a RO mount, we no longer want to turn off RO state to allow
log recovery to succeed on a RO mount. Hence we also remove all the
(now unnecessary) RO state toggling from the log recovery path.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier" Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Stable-dep-of: 652f03db897b ("xfs: remove unknown compat feature check in superblock write validation") Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 17f2142bae4b ("ASoC: fsl_micfil: use GENMASK to define register bit fields") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/r/1651736047-28809-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the error handling for this function, d is freed without ever
removing it from intc_list which would lead to a use after free.
To fix this, let's only add it to the list after everything has
succeeded.
Fixes: 2dcec7a988a1 ("sh: intc: set_irq_wake() support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since transport->sock has been set to NULL during reset transport,
XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise, the
xs_tcp_set_socket_timeouts() may be triggered in xs_tcp_send_request()
to dereference the transport->sock that has been set to NULL.
Fixes: 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in
the socket layer, so replace it with our own flag in the transport
sock_state field.
Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Stable-dep-of: 4db9ad82a6c8 ("sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport") Signed-off-by: Sasha Levin <sashal@kernel.org>
When exporting only one file system with fsid=0 on the server side, the
client alternately uses the ro/rw mount options to perform the mount
operation, and a new vfsmount is generated each time.
It can be reproduced as follows:
[root@localhost ~]# mount /dev/sda /mnt2
[root@localhost ~]# echo "/mnt2 *(rw,no_root_squash,fsid=0)" >/etc/exports
[root@localhost ~]# systemctl restart nfs-server
[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount | grep nfs4
127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,...
[root@localhost ~]#
We expected that after mounting with the ro option, using the rw option to
mount again would return EBUSY, but the actual situation was not the case.
As shown above, when mounting for the first time, a superblock with the ro
flag will be generated, and at the same time, in do_new_mount_fc -->
do_add_mount, it detects that the superblock corresponding to the current
target directory is inconsistent with the currently generated one
(path->mnt->mnt_sb != newmnt->mnt.mnt_sb), and a new vfsmount will be
generated.
When mounting with the rw option for the second time, since no matching
superblock can be found in the fs_supers list, a new superblock with the
rw flag will be generated again. The superblock in use (ro) is different
from the newly generated superblock (rw), and a new vfsmount will be
generated again.
When mounting with the ro option for the third time, the superblock (ro)
is found in fs_supers, the superblock in use (rw) is different from the
found superblock (ro), and a new vfsmount will be generated again.
We can switch between ro/rw through remount, and only one superblock needs
to be generated, thus avoiding the problem of repeated generation of
vfsmount caused by switching superblocks.
Furthermore, This can also resolve the issue described in the link.
Fixes: 275a5d24bf56 ("NFS: Error when mounting the same filesystem with different options") Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This checked if the first character of eisa_device_id::sig was not '\0'.
However, commit ac551828993e changed it as follows:
Ā Ā if (sig[0])
sig[0] is NOT the first character of the eisa_device_id::sig. The
type of 'sig' is 'char (*)[8]', meaning that the type of 'sig[0]' is
'char [8]' instead of 'char'. 'sig[0]' and 'symval' refer to the same
address, which never becomes NULL.
The correct conversion would have been:
Ā Ā if ((*sig)[0])
However, this if-conditional was meaningless because the earlier change
in commitĀ ac551828993e was incorrect.
This commit removes the entire incorrect code, which should never have
been executed.
Fixes: ac551828993e ("modpost: i2c aliases need no trailing wildcard") Fixes: 6543becf26ff ("mod/file2alias: make modalias generation safe for cross compiling") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The undervoltage flags reported by the RTC are useful to know if the
time and date are reliable after a reboot. Although the threshold VLOW1
indicates that the thermometer has been shutdown and time compensation
is off, it doesn't mean that the temperature readout is currently
impossible.
As the system is running, the RTC voltage is now fully established and
we can read the temperature.
Large amount of mount hangs observed during hotplugging of 9pfs devices. The
9pfs Xen driver attempts to initialize itself more than once, causing the
frontend and backend to disagree: the backend listens on a channel that the
frontend does not send on, resulting in stalled processing.
When building the kernel with -Wmaybe-uninitialized, the compiler
reports this warning:
In function 'jffs2_mark_erased_block',
inlined from 'jffs2_erase_pending_blocks' at fs/jffs2/erase.c:116:4:
fs/jffs2/erase.c:474:9: warning: 'bad_offset' may be used uninitialized [-Wmaybe-uninitialized]
474 | jffs2_erase_failed(c, jeb, bad_offset);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/jffs2/erase.c: In function 'jffs2_erase_pending_blocks':
fs/jffs2/erase.c:402:18: note: 'bad_offset' was declared here
402 | uint32_t bad_offset;
| ^~~~~~~~~~
When mtd->point() is used, jffs2_erase_pending_blocks can return -EIO
without initializing bad_offset, which is later used at the filebad
label in jffs2_mark_erased_block.
Fix it by initializing this variable.
Fixes: 8a0f572397ca ("[JFFS2] Return values of jffs2_block_check_erase error paths") Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
After an insertion in TNC, the tree might split and cause a node to
change its `znode->parent`. A further deletion of other nodes in the
tree (which also could free the nodes), the aforementioned node's
`znode->cparent` could still point to a freed node. This
`znode->cparent` may not be updated when getting nodes to commit in
`ubifs_tnc_start_commit()`. This could then trigger a use-after-free
when accessing the `znode->cparent` in `write_index()` in
`ubifs_tnc_end_commit()`.
The offending `memcpy()` in `ubifs_copy_hash()` has a use-after-free
when a node becomes root in TNC but still has a `cparent` to an already
freed node. More specifically, consider the following TNC:
zroot
/
/
zp1
/
/
zn
Inserting a new node `zn_new` with a key smaller then `zn` will trigger
a split in `tnc_insert()` if `zp1` is full:
zroot
/ \
/ \
zp1 zp2
/ \
/ \
zn_new zn
`zn->parent` has now been moved to `zp2`, *but* `zn->cparent` still
points to `zp1`.
Now, consider a removal of all the nodes _except_ `zn`. Just when
`tnc_delete()` is about to delete `zroot` and `zp2`:
zroot
\
\
zp2
\
\
zn
`zroot` and `zp2` get freed and the tree collapses:
zn
`zn` now becomes the new `zroot`.
`get_znodes_to_commit()` will now only find `zn`, the new `zroot`, and
`write_index()` will check its `znode->cparent` that wrongly points to
the already freed `zp1`. `ubifs_copy_hash()` thus gets wrongly called
with `znode->cparent->zbranch[znode->iip].hash` that triggers the
use-after-free!
Fix this by explicitly setting `znode->cparent` to `NULL` in
`get_znodes_to_commit()` for the root node. The search for the dirty
nodes is bottom-up in the tree. Thus, when `find_next_dirty(znode)`
returns NULL, the current `znode` _is_ the root node. Add an assert for
this.
Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when
DEBUG_VM=y"), the duplicate slab cache names can be detected and a
kernel WARNING is thrown out.
In UBI fast attaching process, alloc_ai() could be invoked twice
with the same slab cache name 'ubi_aeb_slab_cache', which will trigger
following warning messages:
kmem_cache of name 'ubi_aeb_slab_cache' already exists
WARNING: CPU: 0 PID: 7519 at mm/slab_common.c:107
__kmem_cache_create_args+0x100/0x5f0
Modules linked in: ubi(+) nandsim [last unloaded: nandsim]
CPU: 0 UID: 0 PID: 7519 Comm: modprobe Tainted: G 6.12.0-rc2
RIP: 0010:__kmem_cache_create_args+0x100/0x5f0
Call Trace:
__kmem_cache_create_args+0x100/0x5f0
alloc_ai+0x295/0x3f0 [ubi]
ubi_attach+0x3c3/0xcc0 [ubi]
ubi_attach_mtd_dev+0x17cf/0x3fa0 [ubi]
ubi_init+0x3fb/0x800 [ubi]
do_init_module+0x265/0x7d0
__x64_sys_finit_module+0x7a/0xc0
The problem could be easily reproduced by loading UBI device by fastmap
with CONFIG_DEBUG_VM=y.
Fix it by using different slab names for alloc_ai() callers.
Fixes: d2158f69a7d4 ("UBI: Remove alloc_ai() slab name from parameter list") Fixes: fdf10ed710c0 ("ubi: Rework Fastmap attach base code") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit e874dcde1cbf ("ubifs: Reserve one leb for each journal
head while doing budget"), available space is calulated by deducting
reservation for all journal heads. However, the total block count (
which is only used by statfs) is not updated yet, which will cause
the wrong displaying for used space(total - available).
Fix it by deducting reservation for all journal heads from total
block count.
Fixes: e874dcde1cbf ("ubifs: Reserve one leb for each journal head while doing budget") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the __rtc_read_time call fails,, the struct rtc_time tm; may contain
uninitialized data, or an illegal date/time read from the RTC hardware.
When calling rtc_tm_to_ktime later, the result may be a very large value
(possibly KTIME_MAX). If there are periodic timers in rtc->timerqueue,
they will continually expire, may causing kernel softlockup.
Fixes: 6610e0893b8b ("RTC: Rework RTC code to use timerqueue for events") Signed-off-by: Yongliang Gao <leonylgao@tencent.com> Acked-by: Jingqun Li <jingqunli@tencent.com> Link: https://lore.kernel.org/r/20241011043153.3788112-1-leonylgao@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If request_irq() fails in st_rtc_probe(), there is no need to enable
the irq, and if it succeeds, disable_irq() after request_irq() still has
a time gap in which interrupts can come.
request_irq() with IRQF_NO_AUTOEN flag will disable IRQ auto-enable when
request IRQ.
Yang Erkun reports that when two threads are opening files at the same
time, and are forced to abort before a reply is seen, then the call to
nfs_release_seqid() in nfs4_opendata_free() can result in a
use-after-free of the pointer to the defunct rpc task of the other
thread.
The fix is to ensure that if the RPC call is aborted before the call to
nfs_wait_on_sequence() is complete, then we must call nfs_release_seqid()
in nfs4_open_release() before the rpc_task is freed.
Reported-by: Yang Erkun <yangerkun@huawei.com> Fixes: 24ac23ab88df ("NFSv4: Convert open() into an asynchronous RPC call") Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, show_stack() always dumps the trace of the current task.
However, it should dump the trace of the specified task if one is
provided. Otherwise, things like running "echo t > sysrq-trigger"
won't work as expected.
Fixes: 970e51feaddb ("um: Add support for CONFIG_STACKTRACE") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241106103933.1132365-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As support for splitting transmission over several messages using
TX_DATA_CONT was introduced it does not immediately return the return
value of qcom_glink_tx().
The result is that in the intentless case (i.e. intent == NULL), the
code will continue to send all additional chunks. This is wasteful, and
it's possible that the send operation could incorrectly indicate
success, if the last chunk fits in the TX fifo.
The function `c_show` was called with protection from RCU. This only
ensures that `cp` will not be freed. Therefore, the reference count for
`cp` can drop to zero, which will trigger a refcount use-after-free
warning when `cache_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `cp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 822 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
If the tag length is >= U32_MAX - 3 then the "length + 4" addition
can result in an integer overflow. Address this by splitting the
decoding into several steps so that decode_cb_compound4res() does
not have to perform arithmetic on the unsafe length value.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The output of ".%03u" with the unsigned int in range [0, 4294966295] may
get truncated if the target buffer is not 12 bytes. This can't really
happen here as the 'remainder' variable cannot exceed 999 but the
compiler doesn't know it. To make it happy just increase the buffer to
where the warning goes away.
Fixes: 3c9f3681d0b4 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Andy Shevchenko <andy@kernel.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Kees Cook <kees@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20241101205453.9353-1-brgl@bgdev.pl Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dwc3_request->num_queued_sgs is decremented on completion. If a
partially completed request is handled, then the
dwc3_request->num_queued_sgs no longer reflects the total number of
num_queued_sgs (it would be cleared).
Correctly check the number of request SG entries remained to be prepare
and queued. Failure to do this may cause null pointer dereference when
accessing non-existent SG entry.
The check whether the TRB ring is full or empty in dwc3_calc_trbs_left()
is insufficient. It assumes there are active TRBs if there's any request
in the started_list. However, that's not the case for requests with a
large SG list.
That is, if we have a single usb request that requires more TRBs than
the total TRBs in the TRB ring, the queued TRBs will be available when
all the TRBs in the ring are completed. But the request is only
partially completed and remains in the started_list. With the current
logic, the TRB ring is empty, but dwc3_calc_trbs_left() returns 0.
Fix this by additionally checking for the request->num_trbs for active
TRB count.
Internal Speaker of Infinix Y4 Max remains muted due to incorrect
Pin configuration, and the Internal Mic records high noise. This patch
corrects the Pin configuration for the Internal Speaker and limits
the Internal Mic boost.
HW Probe for device: https://linux-hardware.org/?probe=6d4386c347
Test: Internal Speaker works fine, Mic has low noise.
A driver might allow the mmap access before initializing its
runtime->dma_area properly. Add a proper NULL check before passing to
virt_to_page() for avoiding a panic.
The v4l2_detect_cvt/gtf functions should check the result against the
timing capabilities: these functions calculate the timings, so if they
are out of bounds, they should be rejected.
To do this, add the struct v4l2_dv_timings_cap as argument to those
functions.
This required updates to the adv7604 and adv7842 drivers since the
prototype of these functions has now changed. The timings struct
that is passed to v4l2_detect_cvt/gtf in those two drivers is filled
with the timings detected by the hardware.
The vivid driver was also updated, but an additional check was added:
the width and height specified by VIDIOC_S_DV_TIMINGS has to match the
calculated result, otherwise something went wrong. Note that vivid
*emulates* hardware, so all the values passed to the v4l2_detect_cvt/gtf
functions came from the timings struct that was filled by userspace
and passed on to the driver via VIDIOC_S_DV_TIMINGS. So these fields
can contain random data. Both the constraints check via
struct v4l2_dv_timings_cap and the additional width/height check
ensure that the resulting timings are sane and not messed up by the
v4l2_detect_cvt/gtf calculations.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Fixes: 2576415846bc ("[media] v4l2: move dv-timings related code to v4l2-dv-timings.c") Cc: stable@vger.kernel.org Reported-by: syzbot+a828133770f62293563e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-media/000000000000013050062127830a@google.com/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_find_compatible_node() requires a call to of_node_put() when the
pointer to the node is not required anymore to decrement its refcount
and avoid leaking memory.
Add the missing call to of_node_put() after the node has been used.
Atomicity violation occurs when the fmc_send_cmd() function is executed
simultaneously with the modification of the fmdev->resp_skb value.
Consider a scenario where, after passing the validity check within the
function, a non-null fmdev->resp_skb variable is assigned a null value.
This results in an invalid fmdev->resp_skb variable passing the validity
check. As seen in the later part of the function, skb = fmdev->resp_skb;
when the invalid fmdev->resp_skb passes the check, a null pointer
dereference error may occur at line 478, evt_hdr = (void *)skb->data;
To address this issue, it is recommended to include the validity check of
fmdev->resp_skb within the locked section of the function. This
modification ensures that the value of fmdev->resp_skb does not change
during the validation process, thereby maintaining its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: e8454ff7b9a4 ("[media] drivers:media:radio: wl128x: FM Driver Common sources") Cc: stable@vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tilt data contained in the Bluetooth packets of an Intuos Pro are
supposed to be interpreted as signed values. Simply casting the values
to type `char` is not guaranteed to work since it is implementation-
defined whether it is signed or unsigned. At least one user has noticed
the data being reported incorrectly on their system. To ensure that the
data is interpreted properly, we specifically cast to `signed char`
instead.
Link: https://github.com/linuxwacom/input-wacom/issues/445 Fixes: 4922cd26f03c ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface") CC: stable@vger.kernel.org # 4.11+ Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Supposing first scenario with a virtio_blk driver.
CPU0 CPU1
blk_mq_try_issue_directly()
__blk_mq_issue_directly()
q->mq_ops->queue_rq()
virtio_queue_rq()
blk_mq_stop_hw_queue()
virtblk_done()
blk_mq_request_bypass_insert() 1) store
blk_mq_start_stopped_hw_queue()
clear_bit(BLK_MQ_S_STOPPED) 3) store
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) 4) load
return
blk_mq_sched_dispatch_requests()
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending())
return
blk_mq_sched_dispatch_requests()
if (blk_mq_hctx_stopped()) 2) load
return
__blk_mq_sched_dispatch_requests()
Supposing another scenario.
CPU0 CPU1
blk_mq_requeue_work()
blk_mq_insert_request() 1) store
virtblk_done()
blk_mq_start_stopped_hw_queue()
blk_mq_run_hw_queues() clear_bit(BLK_MQ_S_STOPPED) 3) store
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) 4) load
return
blk_mq_sched_dispatch_requests()
if (blk_mq_hctx_stopped()) 2) load
continue
blk_mq_run_hw_queue()
Both scenarios are similar, the full memory barrier should be inserted
between 1) and 2), as well as between 3) and 4) to make sure that either
CPU0 sees BLK_MQ_S_STOPPED is cleared or CPU1 sees dispatch list.
Otherwise, either CPU will not rerun the hardware queue causing
starvation of the request.
The easy way to fix it is to add the essential full memory barrier into
helper of blk_mq_hctx_stopped(). In order to not affect the fast path
(hardware queue is not stopped most of the time), we only insert the
barrier into the slow path. Actually, only slow path needs to care about
missing of dispatching the request to the low-level device driver.
Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-4-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of
tpidrro_el0 for native tasks") tried to optimise the context switching
of tpidrro_el0 by eliding the clearing of the register when switching
to a native task with kpti enabled, on the erroneous assumption that
the kpti trampoline entry code would already have taken care of the
write.
Although the kpti trampoline does zero the register on entry from a
native task, the check in tls_thread_switch() is on the *next* task and
so we can end up leaving a stale, non-zero value in the register if the
previous task was 32-bit.
Drop the broken optimisation and zero tpidrro_el0 unconditionally when
switching to a native 64-bit task.
Cc: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks") Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241114095332.23391-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected,
cpu_max_bits_warn() generates a runtime warning similar as below when
showing /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
instead of NR_CPUS to iterate CPUs.
Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The drvdata is not available in release. Let's just use container_of()
to get the vector_device instance. Otherwise, removing a vector device
will result in a crash:
Currently in omap_8250_shutdown, the dma->rx_running flag is
set to zero in omap_8250_rx_dma_flush. Next pm_runtime_get_sync
is called, which is a runtime resume call stack which can
re-set the flag. When the call omap_8250_shutdown returns, the
flag is expected to be UN-SET, but this is not the case. This
is causing issues the next time UART is re-opened and
omap_8250_rx_dma is called. Fix by moving pm_runtime_get_sync
before the omap_8250_rx_dma_flush.
cc: stable@vger.kernel.org Fixes: 0e31c8d173ab ("tty: serial: 8250_omap: add custom DMA-RX callback") Signed-off-by: Bin Liu <b-liu@ti.com>
[Judith: Add commit message] Signed-off-by: Judith Mendez <jm@ti.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Link: https://lore.kernel.org/r/20241031172315.453750-1-jm@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The drvdata is not available in release. Let's just use container_of()
to get the uml_net instance. Otherwise, removing a network device will
result in a crash:
The drvdata is not available in release. Let's just use container_of()
to get the ubd instance. Otherwise, removing a ubd device will result
in a crash:
During wear-leveing work, the source PEB will be moved into scrub list
when source LEB cannot be locked in ubi_eba_copy_leb(), which is wrong
for non-scrub type source PEB. The problem could bring extra and
ineffective wear-leveing jobs, which makes more or less negative effects
for the life time of flash. Specifically, the process is divided 2 steps:
1. wear_leveling_worker // generate false scrub type PEB
ubi_eba_copy_leb // MOVE_RETRY is returned
leb_write_trylock // trylock failed
scrubbing = 1;
e1 is put into ubi->scrub
2. wear_leveling_worker // schedule false scrub type PEB for wl
scrubbing = 1
e1 = rb_entry(rb_first(&ubi->scrub))
The problem can be reproduced easily by running fsstress on a small
UBIFS partition(<64M, simulated by nandsim) for 5~10mins
(CONFIG_MTD_UBI_FASTMAP=y,CONFIG_MTD_UBI_WL_THRESHOLD=50). Following
message is shown:
ubi0: scrubbed PEB 66 (LEB 0:10), data moved to PEB 165
Since scrub type source PEB has set variable scrubbing as '1', and
variable scrubbing is checked before variable keep, so the problem can
be fixed by setting keep variable as 1 directly if the source LEB cannot
be locked.
Fixes: e801e128b220 ("UBI: fix missing scrub when there is a bit-flip") CC: stable@vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The device_for_each_child_node() macro requires explicit calls to
fwnode_handle_put() upon early exits (return, break, goto) to decrement
the fwnode's refcount, and avoid levaing a node reference behind.
Add the missing fwnode_handle_put() after the common label for all error
paths.
Cc: stable@vger.kernel.org Fixes: fdc6b21e2444 ("platform/chrome: Add Type C connector class driver") Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20241013-cross_ec_typec_fwnode_handle_put-v2-1-9182b2cd7767@gmail.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mtk_cpufreq_get_cpu_power() return 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not invalid
as power is uninitialized. As Lukasz suggested, it must return -EINVAL when
the 'policy' is not found. So return -EINVAL to fix it.
Cc: stable@vger.kernel.org Fixes: 4855e26bcf4d ("cpufreq: mediatek-hw: Add support for CPUFREQ HW") Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Suggested-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default dummy cycle for Macronix SPI NOR flash in Octal Output
Read Mode(1-1-8) is 20.
Currently, the dummy buswidth is set according to the address bus width.
In the 1-1-8 mode, this means the dummy buswidth is 1. When converting
dummy cycles to bytes, this results in 20 x 1 / 8 = 2 bytes, causing the
host to read data 4 cycles too early.
Since the protocol data buswidth is always greater than or equal to the
address buswidth. Setting the dummy buswidth to match the data buswidth
increases the likelihood that the dummy cycle-to-byte conversion will be
divisible, preventing the host from reading data prematurely.
Fixes: 0e30f47232ab ("mtd: spi-nor: add support for DTR protocol") Cc: stable@vger.kernel.org Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Cheng Ming Lin <chengminglin@mxic.com.tw> Link: https://lore.kernel.org/r/20241112075242.174010-2-linchengming884@gmail.com Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When probing spi device take care of deferred probe of ACPI irq gpio
similar like for OF/DT case.
>From practical standpoint this fixes issue with vsc-tp driver on
Dell XP 9340 laptop, which try to request interrupt with spi->irq
equal to -EPROBE_DEFER and fail to probe with the following error:
vsc-tp spi-INTC10D0:00: probe with driver vsc-tp failed with error -22
When tb[IPSET_ATTR_IP_TO] is not present but tb[IPSET_ATTR_CIDR] exists,
the values of ip and ip_to are slightly swapped. Therefore, the range check
for ip should be done later, but this part is missing and it seems that the
vulnerability occurs.
So we should add missing range checks and remove unnecessary range checks.
Cc: <stable@vger.kernel.org> Reported-by: syzbot+58c872f7790a4d2ac951@syzkaller.appspotmail.com Fixes: 72205fc68bd1 ("netfilter: ipset: bitmap:ip set type support") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before the GPIO direction is changed from an input to an output,
exar_set_value() is called with value = 1, but since the GPIO is an
input when exar_set_value() is called, _regmap_update_bits() reads a 1
due to an external pull-up. regmap_set_bits() sets force_write =
false, so the value (1) is not written. When the direction is then
changed, the GPIO becomes an output with the value of 0 (the hardware
default).
regmap_write_bits() sets force_write = true, so the value is always
written by exar_set_value() and an external pull-up doesn't affect the
outcome of setting direction = high.
The same can happen when a GPIO is pulled low, but the scenario is a
little more complicated.
$ echo high > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
1
$ echo in > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
0
The early_console_setup() function initializes the sci_ports[0].port with
an object of type struct uart_port obtained from the object of type
struct earlycon_device received as argument by the early_console_setup().
It may happen that later, when the rest of the serial ports are probed,
the serial port that was used as earlycon (e.g., port A) to be mapped to a
different position in sci_ports[] and the slot 0 to be used by a different
serial port (e.g., port B), as follows:
sci_ports[0] = port A
sci_ports[X] = port B
In this case, the new port mapped at index zero will have associated data
that was used for earlycon.
In case this happens, after Linux boot, any access to the serial port that
maps on sci_ports[0] (port A) will block the serial port that was used as
earlycon (port B).
To fix this, add early_console_exit() that clean the sci_ports[0] at
earlycon exit time.
Fix installation of WinUSB driver using OS descriptors. Without the
fix the drivers are not installed correctly and the property
'DeviceInterfaceGUID' is missing on host side.
The original change was based on the assumption that the interface
number is in the high byte of wValue but it is in the low byte,
instead. Unfortunately, the fix is based on MS documentation which is
also wrong.
The actual USB request for OS descriptors (using USB analyzer) looks
like:
C1: bmRequestType (device to host, vendor, interface)
A1: nas magic number
0002: wValue (2: nas interface)
0005: wIndex (5: get extended property i.e. nas interface GUID)
008E: wLength (142)
The fix was tested on Windows 10 and Windows 11.
Cc: stable@vger.kernel.org Fixes: ec6ce7075ef8 ("usb: gadget: composite: fix OS descriptors w_value logic") Signed-off-by: Michal Vrastil <michal.vrastil@hidglobal.com> Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com> Acked-by: Peter korsgaard <peter@korsgaard.com> Link: https://lore.kernel.org/r/20241113235433.20244-1-quic_eserrao@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For bus_register(), any error which happens after kset_register() will
cause that @priv are freed twice, fixed by setting @priv with NULL after
the first free.
xhci_invalidate_cancelled_tds() may not work correctly if the hardware
is modifying endpoint or stream contexts at the same time by executing
a Set TR Dequeue command. And even if it worked, it would be unable to
queue Set TR Dequeue for the next stream, failing to clear xHC cache.
On stream endpoints, a chain of Set TR Dequeue commands may take some
time to execute and we may want to cancel more TDs during this time.
Currently this leads to Stop Endpoint completion handler calling this
function without testing for SET_DEQ_PENDING, which will trigger the
aforementioned problems when it happens.
On all endpoints, a halt condition causes Reset Endpoint to be queued
and an error status given to the class driver, which may unlink more
URBs in response. Stop Endpoint is queued and its handler may execute
concurrently with Set TR Dequeue queued by Reset Endpoint handler.
(Reset Endpoint handler calls this function too, but there seems to
be no possibility of it running concurrently with Set TR Dequeue).
Fix xhci_invalidate_cancelled_tds() to work correctly under a pending
Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set,
then make the completion handler call the function again and also call
xhci_giveback_invalidated_tds(), which needs to be called next.
This seems to fix another potential bug, where the handler would call
xhci_invalidate_cancelled_tds(), which may clear some deferred TDs if
a sanity check fails, and the TDs wouldn't be given back promptly.
Said sanity check seems to be wrong and prone to false positives when
the endpoint halts, but fixing it is beyond the scope of this change,
besides ensuring that cleared TDs are given back properly.
Commit 9bf4e919ccad worked around an issue introduced after an innocuous
optimisation change in LLVM main:
> len is defined as an 'int' because it is assigned from
> '__user int *optlen'. However, it is clamped against the result of
> sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit
> platforms). This is done with min_t() because min() requires compatible
> types, which results in both len and the result of sizeof() being casted
> to 'unsigned int', meaning len changes signs and the result of sizeof()
> is truncated. From there, len is passed to copy_to_user(), which has a
> third parameter type of 'unsigned long', so it is widened and changes
> signs again. This excessive casting in combination with the KCSAN
> instrumentation causes LLVM to fail to eliminate the __bad_copy_from()
> call, failing the build.
The same issue occurs in rfcomm in functions rfcomm_sock_getsockopt and
rfcomm_sock_getsockopt_old.
Change the type of len to size_t in both rfcomm_sock_getsockopt and
rfcomm_sock_getsockopt_old and replace min_t() with min().
Cc: stable@vger.kernel.org Co-authored-by: Aleksei Vetrov <vvvvvv@google.com>
Improves: 9bf4e919ccad ("Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()") Link: https://github.com/ClangBuiltLinux/linux/issues/2007 Link: https://github.com/llvm/llvm-project/issues/85647 Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no check if stream size and start_clu are invalid.
If start_clu is EOF cluster and stream size is 4096, It will
cause uninit value access. because ei->hint_femp.eidx could
be 128(if cluster size is 4K) and wrong hint will allocate
next cluster. and this cluster will be same with the cluster
that is allocated by exfat_extend_valid_size(). The previous
patch will check invalid start_clu, but for clarity, initialize
hint_femp.eidx to zero.
Syzbot reports a problem that a warning will be triggered while
searching a lock class in look_up_lock_class().
The cause of the issue is that a new name is created and used by
lockdep_set_subclass() instead of using the existing one. This results
in a lock instance has a different name pointer than previous registered
one stored in lock class, and WARN_ONCE() is triggered because of that
in look_up_lock_class().
To fix this, change lockdep_set_subclass() to use the existing name
instead of a new one. Hence, no new name will be created by
lockdep_set_subclass(). Hence, the warning is avoided.
[boqun: Reword the commit log to state the correct issue]
Commit 7c0cca7c847e ("tty: ldisc: add sysctl to prevent autoloading of
ldiscs") introduces the tty_ldisc_autoload sysctl with the wrong
proc_handler. .extra1 and .extra2 parameters are set to avoid other values
thant SYSCTL_ZERO or SYSCTL_ONE to be set but proc_dointvec do not uses
them.
This commit fixes this by using proc_dointvec_minmax instead of
proc_dointvec.
Fixes: 7c0cca7c847e ("tty: ldisc: add sysctl to prevent autoloading of ldiscs") Cc: stable <stable@kernel.org> Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr> Reviewed-by: Lin Feng <linf@wangsu.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20241112131357.49582-4-nicolas.bouchinet@clip-os.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If some remap_pfn_range() calls succeeded before one failed, we still have
buffer pages mapped into the userspace page tables when we drop the buffer
reference with comedi_buf_map_put(bm). The userspace mappings are only
cleaned up later in the mmap error path.
Fix it by explicitly flushing all mappings in our VMA on the error path.
See commit 79a61cc3fc04 ("mm: avoid leaving partial pfn mappings around in
error case").
We got a report that adding a fanotify filsystem watch prevents tail -f
from receiving events.
Reproducer:
1. Create 3 windows / login sessions. Become root in each session.
2. Choose a mounted filesystem that is pretty quiet; I picked /boot.
3. In the first window, run: fsnotifywait -S -m /boot
4. In the second window, run: echo data >> /boot/foo
5. In the third window, run: tail -f /boot/foo
6. Go back to the second window and run: echo more data >> /boot/foo
7. Observe that the tail command doesn't show the new data.
8. In the first window, hit control-C to interrupt fsnotifywait.
9. In the second window, run: echo still more data >> /boot/foo
10. Observe that the tail command in the third window has now printed
the missing data.
When stracing tail, we observed that when fanotify filesystem mark is
set, tail does get the inotify event, but the event is receieved with
the filename:
This is unexpected, because tail is watching the file itself and not its
parent and is inconsistent with the inotify event received by tail when
fanotify filesystem mark is not set:
The inteference between different fsnotify groups was caused by the fact
that the mark on the sb requires the filename, so the filename is passed
to fsnotify(). Later on, fsnotify_handle_event() tries to take care of
not passing the filename to groups (such as inotify) that are interested
in the filename only when the parent is watching.
But the logic was incorrect for the case that no group is watching the
parent, some groups are watching the sb and some watching the inode.
Reported-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: 7372e79c9eb9 ("fanotify: fix logic of reporting name info with watched parent") Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
thunderbolt driver.
The reset clears the Presence Detect State and Data Link Layer Link Active
bits at the USB4 Host Router's Root Port and thus causes hot removal of the
dock.
The crash occurs when pciehp is unbound from one of the dock's Downstream
Ports: pciehp creates a pci_slot on bind and destroys it on unbind. The
pci_slot contains a pointer to the pci_bus below the Downstream Port, but
a reference on that pci_bus is never acquired. The pci_bus is destroyed
before the pci_slot, so a use-after-free ensues when pci_slot_release()
accesses slot->bus.
In principle this should not happen because pci_stop_bus_device() unbinds
pciehp (and therefore destroys the pci_slot) before the pci_bus is
destroyed by pci_remove_bus_device().
However the stacktrace provided by Dennis shows that pciehp is unbound from
pci_remove_bus_device() instead of pci_stop_bus_device(). To understand
the significance of this, one needs to know that the PCI core uses a two
step process to remove a portion of the hierarchy: It first unbinds all
drivers in the sub-hierarchy in pci_stop_bus_device() and then actually
removes the devices in pci_remove_bus_device(). There is no precaution to
prevent driver binding in-between pci_stop_bus_device() and
pci_remove_bus_device().
In Dennis' case, it seems removal of the hierarchy by pciehp races with
driver binding by pci_bus_add_devices(). pciehp is bound to the
Downstream Port after pci_stop_bus_device() has run, so it is unbound by
pci_remove_bus_device() instead of pci_stop_bus_device(). Because the
pci_bus has already been destroyed at that point, accesses to it result in
a use-after-free.
One might conclude that driver binding needs to be prevented after
pci_stop_bus_device() has run. However it seems risky that pci_slot points
to pci_bus without holding a reference. Solely relying on correct ordering
of driver unbind versus pci_bus destruction is certainly not defensive
programming.
If pci_slot has a need to access data in pci_bus, it ought to acquire a
reference. Amend pci_create_slot() accordingly. Dennis reports that the
crash is not reproducible with this change.
DDI0487K.a D13.3.1 describes the PMU overflow condition, which evaluates
to true if any counter's global enable (PMCR_EL0.E), overflow flag
(PMOVSSET_EL0[n]), and interrupt enable (PMINTENSET_EL1[n]) are all 1.
Of note, this does not require a counter to be enabled
(i.e. PMCNTENSET_EL0[n] = 1) to generate an overflow.
Align kvm_pmu_overflow_status() with the reality of the architecture
and stop using PMCNTENSET_EL0 as part of the overflow condition. The
bug was discovered while running an SBSA PMU test [*], which only sets
PMCR.E, PMOVSSET<0>, PMINTENSET<0>, and expects an overflow interrupt.
As per the kernel documentation[1], hardlockup detector should
be disabled in KVM guests as it may give false positives. On
PPC, hardlockup detector is enabled inside KVM guests because
disable_hardlockup_detector() is marked as early_initcall and it
relies on kvm_guest static key (is_kvm_guest()) which is initialized
later during boot by check_kvm_guest(), which is a core_initcall.
check_kvm_guest() is also called in pSeries_smp_probe(), which is called
before initcalls, but it is skipped if KVM guest does not have doorbell
support or if the guest is launched with SMT=1.
Call check_kvm_guest() in disable_hardlockup_detector() so that
is_kvm_guest() check goes through fine and hardlockup detector can be
disabled inside the KVM guest.
Fix the AEGIS assembly code to access 'unsigned int' arguments as 32-bit
values instead of 64-bit, since the upper bits of the corresponding
64-bit registers are not guaranteed to be zero.
Note: there haven't been any reports of this bug actually causing
incorrect behavior. Neither gcc nor clang guarantee zero-extension to
64 bits, but zero-extension is likely to happen in practice because most
instructions that operate on 32-bit registers zero-extend to 64 bits.
Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations") Cc: stable@vger.kernel.org Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the trace data buffer becomes full, a truncated flag [T] is reported
in PERF_RECORD_AUX. In some cases, the size reported is 0, even though
data must have been added to make the buffer full.
That happens when the buffer fills up from empty to full before the
Intel PT driver has updated the buffer position. Then the driver
calculates the new buffer position before calculating the data size.
If the old and new positions are the same, the data size is reported
as 0, even though it is really the whole buffer size.
Fix by detecting when the buffer position is wrapped, and adjust the
data size calculation accordingly.
Example
Use a very small buffer size (8K) and observe the size of truncated [T]
data. Before the fix, it is possible to see records of 0 size.
Before:
$ perf record -m,8K -e intel_pt// uname
Linux
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.105 MB perf.data ]
$ perf script -D --no-itrace | grep AUX | grep -F '[T]'
Warning:
AUX data lost 2 times out of 3!
$ perf record -m,8K -e intel_pt// uname
Linux
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.040 MB perf.data ]
$ perf script -D --no-itrace | grep AUX | grep -F '[T]'
Warning:
AUX data lost 2 times out of 3!
An atomicity violation occurs when the validity of the variables
da7219->clk_src and da7219->mclk_rate is being assessed. Since the entire
assessment is not protected by a lock, the da7219 variable might still be
in flux during the assessment, rendering this check invalid.
To fix this issue, we recommend adding a lock before the block
if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) so that
the legitimacy check for da7219->clk_src and da7219->mclk_rate is
protected by the lock, ensuring the validity of the check.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Commit 7c55b78818cf ("jfs: xattr: fix buffer overflow for invalid xattr")
also addresses this issue but it only fixes it for positive values, while
ea_size is an integer type and can take negative values, e.g. in case of
a corrupted filesystem. This still breaks validation and would overflow
because of implicit conversion from int to size_t in print_hex_dump().
Fix this issue by clamping the ea_size value instead.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
The original implementation ext4's FS_IOC_GETFSMAP handling only
worked when the range of queried blocks included at least one free
(unallocated) block range. This is because how the metadata blocks
were emitted was as a side effect of ext4_mballoc_query_range()
calling ext4_getfsmap_datadev_helper(), and that function was only
called when a free block range was identified. As a result, this
caused generic/365 to fail.
Fix this by creating a new function ext4_getfsmap_meta_helper() which
gets called so that blocks before the first free block range in a
block group can get properly reported.
find_group_other() and find_group_orlov() read *_lo, *_hi with
ext4_free_inodes_count without additional locking. This can cause
data-race warning, but since the lock is held for most writes and free
inodes value is generally not a problem even if it is incorrect, it is
more appropriate to use READ_ONCE()/WRITE_ONCE() than to add locking.
==================================================================
BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set
write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1:
ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405
__ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216
ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
vfs_symlink+0xca/0x1d0 fs/namei.c:4615
do_symlinkat+0xe3/0x340 fs/namei.c:4641
__do_sys_symlinkat fs/namei.c:4657 [inline]
__se_sys_symlinkat fs/namei.c:4654 [inline]
__x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0:
ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349
find_group_other fs/ext4/ialloc.c:594 [inline]
__ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017
ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
vfs_symlink+0xca/0x1d0 fs/namei.c:4615
do_symlinkat+0xe3/0x340 fs/namei.c:4641
__do_sys_symlinkat fs/namei.c:4657 [inline]
__se_sys_symlinkat fs/namei.c:4654 [inline]
__x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20241003125337.47283-1-aha310510@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In success case, the revision holds a non-null pointer. The current
logic incorrectly returns an error for a non-null pointer, whereas
it should return an error for a null pointer.
The socinfo driver for IPQ9574 and IPQ5332 is currently broken,
resulting in the following error message
qcom-socinfo qcom-socinfo: probe with driver qcom-socinfo failed with
error -12
Add a null check for the revision to ensure it returns an error only in
failure case (null pointer).
The current USB-audio driver code doesn't check bLength of each
descriptor at traversing for clock descriptors. That is, when a
device provides a bogus descriptor with a shorter bLength, the driver
might hit out-of-bounds reads.
For addressing it, this patch adds sanity checks to the validator
functions for the clock descriptor traversal. When the descriptor
length is shorter than expected, it's skipped in the loop.
For the clock source and clock multiplier descriptors, we can just
check bLength against the sizeof() of each descriptor type.
OTOH, the clock selector descriptor of UAC2 and UAC3 has an array
of bNrInPins elements and two more fields at its tail, hence those
have to be checked in addition to the sizeof() check.
This patch fixes an issue in the function xenbus_dev_probe(). In the
xenbus_dev_probe() function, within the if (err) branch at line 313, the
program incorrectly returns err directly without releasing the resources
allocated by err = drv->probe(dev, id). As the return value is non-zero,
the upper layers assume the processing logic has failed. However, the probe
operation was performed earlier without a corresponding remove operation.
Since the probe actually allocates resources, failing to perform the remove
operation could lead to problems.
To fix this issue, we followed the resource release logic of the
xenbus_dev_remove() function by adding a new block fail_remove before the
fail_put block. After entering the branch if (err) at line 313, the
function will use a goto statement to jump to the fail_remove block,
ensuring that the previously acquired resources are correctly released,
thus preventing the reference count leak.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and detecting potential issues where resources are not properly managed.
In this case, the tool flagged the missing release operation as a
potential problem, which led to the development of this patch.
ARCH_DMA_MINALIGN was defined as 16 - this is too small - it may be
possible that two unrelated 16-byte allocations share a cache line. If
one of these allocations is written using DMA and the other is written
using cached write, the value that was written with DMA may be
corrupted.
This commit changes ARCH_DMA_MINALIGN to be 128 on PA20 and 32 on PA1.1 -
that's the largest possible cache line size.
As different parisc microarchitectures have different cache line size, we
define arch_slab_minalign(), cache_line_size() and
dma_get_cache_alignment() so that the kernel may tune slab cache
parameters dynamically, based on the detected cache line size.
Multiple profiles shared 'ent->caps', so some logs missed.
Fixes: 0ed3b28ab8bf ("AppArmor: mediation of non file objects") Signed-off-by: chao liu <liuzgyid@outlook.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Syzbot reported two possible deadlocks]
The first possible deadlock is:
WARNING: possible recursive locking detected 6.12.0-rc1-syzkaller-00027-g4a9fe2a8ac53 #0 Not tainted
--------------------------------------------
syz-executor363/2651 is trying to acquire lock: ffffffff89b120e8 (chaoskey_list_lock){+.+.}-{3:3}, at: chaoskey_release+0x15d/0x2c0 drivers/usb/misc/chaoskey.c:322
but task is already holding lock: ffffffff89b120e8 (chaoskey_list_lock){+.+.}-{3:3}, at: chaoskey_release+0x7f/0x2c0 drivers/usb/misc/chaoskey.c:299
other info that might help us debug this:
Possible unsafe locking scenario:
The second possible deadlock is:
WARNING: possible circular locking dependency detected 6.12.0-rc1-syzkaller-00027-g4a9fe2a8ac53 #0 Not tainted
------------------------------------------------------
kworker/0:2/804 is trying to acquire lock: ffffffff899dadb0 (minor_rwsem){++++}-{3:3}, at: usb_deregister_dev+0x7c/0x1e0 drivers/usb/core/file.c:186
but task is already holding lock: ffffffff89b120e8 (chaoskey_list_lock){+.+.}-{3:3}, at: chaoskey_disconnect+0xa8/0x2a0 drivers/usb/misc/chaoskey.c:235
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
*** DEADLOCK ***
[Analysis]
The first is AA lock, it because wrong logic, it need a unlock.
The second is AB lock, it needs to rearrange the order of lock usage.
Fixes: 422dc0a4d12d ("USB: chaoskey: fail open after removal") Reported-by: syzbot+685e14d04fe35692d3bc@syzkaller.appspotmail.com Reported-by: syzbot+1f8ca5ee82576ec01f12@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=685e14d04fe35692d3bc Signed-off-by: Edward Adam Davis <eadavis@qq.com> Tested-by: syzbot+685e14d04fe35692d3bc@syzkaller.appspotmail.com Reported-by: syzbot+5f1ce62e956b7b19610e@syzkaller.appspotmail.com Tested-by: syzbot+5f1ce62e956b7b19610e@syzkaller.appspotmail.com Tested-by: syzbot+1f8ca5ee82576ec01f12@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/tencent_84EB865C89862EC22EE94CB3A7C706C59206@qq.com Cc: Oliver Neukum <oneukum@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
chaoskey_open() takes the lock only to increase the
counter of openings. That means that the mutual exclusion
with chaoskey_disconnect() cannot prevent an increase
of the counter and chaoskey_open() returning a success.
If that race is hit, chaoskey_disconnect() will happily
free all resources associated with the device after
it has dropped the lock, as it has read the counter
as zero.
To prevent this race chaoskey_open() has to check
the presence of the device under the lock.
However, the current per device lock cannot be used,
because it is a part of the data structure to be
freed. Hence an additional global mutex is needed.
The issue is as old as the driver.
The IO yurex_write() needs to wait for in order to have a device
ready for writing again can take a long time time.
Consequently the sleep is done in an interruptible state.
Therefore others waiting for yurex_write() itself to finish should
use mutex_lock_interruptible.
iowarrior_read() uses the iowarrior dev structure, but does not use any
lock on the structure. This can cause various bugs including data-races,
so it is more appropriate to use a mutex lock to safely protect the
iowarrior dev structure. When using a mutex lock, you should split the
branch to prevent blocking when the O_NONBLOCK flag is set.
In addition, it is unnecessary to check for NULL on the iowarrior dev
structure obtained by reading file->private_data. Therefore, it is
better to remove the check.
If i2c_smbus_write_byte_data() fails in al3010_init(),
al3010_set_pwr(false) is not called.
In order to avoid such a situation, move the devm_add_action_or_reset()
witch calls al3010_set_pwr(false) right after a successful
al3010_set_pwr(true).
The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
__inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
Then, oreq should be passed to reqsk_put() instead of req; otherwise
use-after-free of nreq could happen when reqsk is migrated but the
retry attempt failed (e.g. due to timeout).
Let's pass oreq to reqsk_put().
Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().") Reported-by: Liu Jian <liujian56@huawei.com> Closes: https://lore.kernel.org/netdev/1284490f-9525-42ee-b7b8-ccadf6606f6d@huawei.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After successful PCIe AER recovery, FW will reset all resource
reservations. If it is IF_UP, the driver will call bnxt_open() and
all resources will be reserved again. It it is IF_DOWN, we should
call bnxt_reserve_rings() so that we can reserve resources including
RoCE resources to allow RoCE to resume after AER. Without this
patch, RoCE fails to resume in this IF_DOWN scenario.
Later, if it becomes IF_UP, bnxt_open() will see that resources have
been reserved and will not reserve again.