The existing PowerNV hotplug code did not handle surprise plug events
correctly, leading to a complete failure of the hotplug system after device
removal and a required reboot to detect new devices.
This comes down to two issues:
1) When a device is surprise removed, often the bridge upstream
port will cause a PE freeze on the PHB. If this freeze is not
cleared, the MSI interrupts from the bridge hotplug notification
logic will not be received by the kernel, stalling all plug events
on all slots associated with the PE.
2) When a device is removed from a slot, regardless of surprise or
programmatic removal, the associated PHB/PE ls left frozen.
If this freeze is not cleared via a fundamental reset, skiboot
is unable to clear the freeze and cannot retrain / rescan the
slot. This also requires a reboot to clear the freeze and redetect
the device in the slot.
Issue the appropriate unfreeze and rescan commands on hotplug events,
and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.
The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system
was observed to incorrectly assert the Presence Detect Set bit in its
capabilities when tested on a Raptor Computing Systems Blackbird system,
resulting in the hot insert path never attempting a rescan of the bus
and any downstream devices not being re-detected.
Work around this by additionally checking whether the PCIe data link is
active or not when performing presence detection on downstream switches'
ports, similar to the pciehp_hpc.c driver.
When the root of a nested PCIe bridge configuration is unplugged, the
pnv_php driver leaked the allocated IRQ resources for the child bridges'
hotplug event notifications, resulting in a panic.
Fix this by walking all child buses and deallocating all its IRQ resources
before calling pci_hp_remove_devices().
Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so
that it is only destroyed in pnv_php_free_slot(), instead of
pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will
now be called by workers triggered by hot unplug interrupts, so the
workqueue needs to stay allocated.
The abridged kernel panic that occurs without this patch is as follows:
Remove comment for reorder_work which no longer exists.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 71203f68c774 ("padata: Fix pd UAF once and for all") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
This was missed during the initial implementation. The VFIO PCI encodes
the vf_token inside the device name when opening the device from the group
FD, something like:
This is used to control access to a VF unless there is co-ordination with
the owner of the PF.
Since we no longer have a device name in the cdev path, pass the token
directly through VFIO_DEVICE_BIND_IOMMUFD using an optional field
indicated by VFIO_DEVICE_BIND_FLAG_TOKEN.
Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-bdd8716e85fe+3978a-vfio_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") enabled libata EH to manage device power mode
trasitions for system suspend/resume and removed the flag from
ata_scsi_dev_config. However, since the sd_shutdown() function still
relies on the manage_system_start_stop flag, a spin-down command is not
issued to the disk with command "echo 1 > /sys/block/sdb/device/delete"
sd_shutdown() can be called for both system/runtime start stop
operations, so utilize the manage_run_time_start_stop flag set in the
ata_scsi_dev_config and issue a spin-down command during disk removal
when the system is running. This is in addition to when the system is
powering off and manage_shutdown flag is set. The
manage_system_start_stop flag will still be used for drivers that still
set the flag.
Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop") Signed-off-by: Salomon Dushimirimana <salomondush@google.com> Link: https://lore.kernel.org/r/20250724214520.112927-1-salomondush@google.com Tested-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the h8 exit fails during runtime resume process, the runtime thread
enters runtime suspend immediately and the error handler operates at the
same time. It becomes stuck and cannot be recovered through the error
handler. To fix this, use link recovery instead of the error handler.
Fixes: 4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and other error recovery paths") Signed-off-by: Seunghui Lee <sh043.lee@samsung.com> Link: https://lore.kernel.org/r/20250717081213.6811-1-sh043.lee@samsung.com Reviewed-by: Bean Huo <beanhuo@micron.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The invocation of iscsi_put_conn() in iscsi_iter_destory_conn_fn() is
used to free the initial reference counter of iscsi_cls_conn. For
non-qla4xxx cases, the ->destroy_conn() callback (e.g.,
iscsi_conn_teardown) will call iscsi_remove_conn() and iscsi_put_conn()
to remove the connection from the children list of session and free the
connection at last. However for qla4xxx, it is not the case. The
->destroy_conn() callback of qla4xxx will keep the connection in the
session conn_list and doesn't use iscsi_put_conn() to free the initial
reference counter. Therefore, it seems necessary to keep the
iscsi_put_conn() in the iscsi_iter_destroy_conn_fn(), otherwise, there
will be memory leak problem.
In the below noted Fixes commit we introduced a reflck mutex to allow
better scaling between devices for open and close. The reflck was
based on the hot reset granularity, device level for root bus devices
which cannot support hot reset or bus/slot reset otherwise. Overlooked
in this were SR-IOV VFs, where there's also no bus reset option, but
the default for a non-root-bus, non-slot-based device is bus level
reflck granularity.
The reflck mutex has since become the dev_set mutex (via commit 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")) and
is our defacto serialization for various operations and ioctls. It
still seems to be the case though that sets of vfio-pci devices really
only need serialization relative to hot resets affecting the entire
set, which is not relevant to SR-IOV VFs. As described in the Closes
link below, this serialization contributes to startup latency when
multiple VFs sharing the same "bus" are opened concurrently.
Mark the device itself as the basis of the dev_set for SR-IOV VFs.
Reported-by: Aaron Lewis <aaronlewis@google.com> Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com Tested-by: Aaron Lewis <aaronlewis@google.com> Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release") Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250626225623.1180952-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When vfio_df_close() is called with open_count=0, it triggers a warning in
vfio_assert_device_open() but still decrements open_count to -1. This allows
a subsequent open to incorrectly pass the open_count == 0 check, leading to
unintended behavior, such as setting df->access_granted = true.
For example, running an IOMMUFD compat no-IOMMU device with VFIO tests
(https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c)
results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first
run, but the second run succeeds incorrectly.
Add checks to avoid decrementing open_count below zero.
Fixes: 05f37e1c03b6 ("vfio: Pass struct vfio_device_file * to vfio_device_open/close()") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Link: https://lore.kernel.org/r/20250618234618.1910456-2-jacob.pan@linux.microsoft.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For devices with no-iommu enabled in IOMMUFD VFIO compat mode, the group open
path skips vfio_df_open(), leaving open_count at 0. This causes a warning in
vfio_assert_device_open(device) when vfio_df_close() is called during group
close.
The correct behavior is to skip only the IOMMUFD bind in the device open path
for no-iommu devices. Commit 6086efe73498 omitted vfio_df_open(), which was
too broad. This patch restores the previous behavior, ensuring
the vfio_df_open is called in the group open path.
Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()") Suggested-by: Alex Williamson <alex.williamson@redhat.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250618234618.1910456-1-jacob.pan@linux.microsoft.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Test: androbench by default setting, use 64GB sdcard.
the random write speed:
without this patch 3.5MB/s
with this patch 7MB/s
After patch "11a347fb6cef", the random write speed decreased significantly.
the .write_iter() interface had been modified, and check the differences
with generic_file_write_iter(), when calling generic_write_sync() and
exfat_file_write_iter() to call vfs_fsync_range(), the fdatasync flag is
wrong, and make not use the fdatasync mode, and make random write speed
decreased. So use generic_write_sync() instead of vfs_fsync_range().
Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Zhengxu Zhang <zhengxu.zhang@unisoc.com> Acked-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The root cause of we run out-of-space is: in f2fs_map_blocks(), f2fs may
trigger foreground gc only if it allocates any physical block, it will be
a little bit later when there is multiple threads writing data w/
aio/dio/bufio method in parallel, since we always use OPU in lfs mode, so
f2fs_map_blocks() does block allocations aggressively.
In order to fix this issue, let's give a chance to trigger foreground
gc in prior to block allocation in f2fs_map_blocks().
In lfs mode, dirty data needs OPU, we'd better calculate lower_p and
upper_p w/ them during has_not_enough_free_secs(), otherwise we may
encounter out-of-space issue due to we missed to reclaim enough
free section w/ foreground gc.
When testing F2FS with xfstests using UFS backed virtual disks the
kernel complains sometimes that f2fs_release_decomp_mem() calls
vm_unmap_ram() from an invalid context. Example trace from
f2fs/007 test:
This patch modifies in_task() check inside f2fs_read_end_io() to also
check if interrupts are disabled. This ensures that pages are unmapped
asynchronously in an interrupt handler.
Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq") Signed-off-by: Jan Prusakowski <jprusakowski@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If device path length equals to MAX_PATH_LEN, sbi->devs.path[] may
not end up w/ null character due to path array is fully filled, So
accidently, fields locate after path[] may be treated as part of
device path, result in parsing wrong device path.
The buggy address belongs to the object at ffff88812d961f20
which belongs to the cache f2fs_inode_cache of size 1200
The buggy address is located 856 bytes inside of
1200-byte region [ffff88812d961f20, ffff88812d9623d0)
This bug can be reproduced w/ the reproducer [2], once we enable
CONFIG_F2FS_CHECK_FS config, the reproducer will trigger panic as below,
so the direct reason of this bug is the same as the one below patch [3]
fixed.
The root cause is: in the fuzzed image, dnode #8 belongs to inode #7,
after inode #7 eviction, dnode #8 was dropped.
However there is dirent that has ino #8, so, once we unlink file3, in
f2fs_evict_inode(), both f2fs_truncate() and f2fs_update_inode_page()
will fail due to we can not load node #8, result in we missed to call
f2fs_inode_synced() to clear inode dirty status.
Let's fix this by calling f2fs_inode_synced() in error path of
f2fs_evict_inode().
PS: As I verified, the reproducer [2] can trigger this bug in v6.1.129,
but it failed in v6.16-rc4, this is because the testcase will stop due to
other corruption has been detected by f2fs:
F2FS-fs (loop0): inconsistent node block, node_type:2, nid:8, node_footer[nid:8,ino:8,ofs:0,cpver:5013063228981249506,blkaddr:15366]
F2FS-fs (loop0): f2fs_lookup: inode (ino=9) has zero i_nlink
Fixes: 0f18b462b2e5 ("f2fs: flush inode metadata when checkpoint is doing") Closes: https://syzkaller.appspot.com/x/report.txt?x=13448368580000 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff888100567dc8 by task kworker/u4:0/8
The buggy address belongs to the object at ffff888100567a10
which belongs to the cache f2fs_inode_cache of size 1360
The buggy address is located 952 bytes inside of
1360-byte region [ffff888100567a10, ffff888100567f60)
The root cause is w/ a fuzzed image, f2fs may missed to clear FI_DIRTY_INODE
flag for target inode, after f2fs_evict_inode(), the inode is still linked in
sbi->inode_list[DIRTY_META] global list, once it triggers checkpoint,
f2fs_sync_inode_meta() may access the released inode.
In f2fs_evict_inode(), let's always call f2fs_inode_synced() to clear
FI_DIRTY_INODE flag and drop inode from global dirty list to avoid this
UAF issue.
Fixes: 0f18b462b2e5 ("f2fs: flush inode metadata when checkpoint is doing") Closes: https://syzkaller.appspot.com/bug?extid=849174b2efaf0d8be6ba Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch adds missing upper boundary check while setting
gc_valid_thresh_ratio via sysfs.
Fixes: e791d00bd06c ("f2fs: add valid block ratio not to do excessive GC for one time GC") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
KMSAN reported a use of uninitialized value in `__is_extent_mergeable()`
and `__is_back_mergeable()` via the read extent tree path.
The root cause is that `get_read_extent_info()` only initializes three
fields (`fofs`, `blk`, `len`) of `struct extent_info`, leaving the
remaining fields uninitialized. This leads to undefined behavior
when those fields are accessed later, especially during
extent merging.
Fix it by zero-initializing the `extent_info` struct before population.
The decompress_io_ctx may be released asynchronously after
I/O completion. If this file is deleted immediately after read,
and the kworker of processing post_read_wq has not been executed yet
due to high workloads, It is possible that the inode(f2fs_inode_info)
is evicted and freed before it is used f2fs_free_dic.
The UAF case as below:
Thread A Thread B
- f2fs_decompress_end_io
- f2fs_put_dic
- queue_work
add free_dic work to post_read_wq
- do_unlink
- iput
- evict
- call_rcu
This file is deleted after read.
Thread C kworker to process post_read_wq
- rcu_do_batch
- f2fs_free_inode
- kmem_cache_free
inode is freed by rcu
- process_scheduled_works
- f2fs_late_free_dic
- f2fs_free_dic
- f2fs_release_decomp_mem
read (dic->inode)->i_compress_algorithm
This patch store compress_algorithm and sbi in dic to avoid inode UAF.
In addition, the previous solution is deprecated in [1] may cause system hang.
[1] https://lore.kernel.org/all/c36ab955-c8db-4a8b-a9d0-f07b5f426c3f@kernel.org
Cc: Daeho Jeong <daehojeong@google.com> Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Baocong Liu <baocong.liu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
=============================
[ BUG: Invalid wait context ]
6.13.0-rc1 #84 Tainted: G O
-----------------------------
cat/56160 is trying to lock: ffff888105c86648 (&cprc->stat_lock){+.+.}-{3:3}, at: update_general_status+0x32a/0x8c0 [f2fs]
other info that might help us debug this:
context-{5:5}
2 locks held by cat/56160:
#0: ffff88810a002a98 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x56/0x4c0
#1: ffffffffa0462638 (f2fs_stat_lock){....}-{2:2}, at: stat_show+0x29/0x1020 [f2fs]
stack backtrace:
CPU: 0 UID: 0 PID: 56160 Comm: cat Tainted: G O 6.13.0-rc1 #84
Tainted: [O]=OOT_MODULE
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
<TASK>
dump_stack_lvl+0x88/0xd0
dump_stack+0x14/0x20
__lock_acquire+0x8d4/0xbb0
lock_acquire+0xd6/0x300
_raw_spin_lock+0x38/0x50
update_general_status+0x32a/0x8c0 [f2fs]
stat_show+0x50/0x1020 [f2fs]
seq_read_iter+0x116/0x4c0
seq_read+0xfa/0x130
full_proxy_read+0x66/0x90
vfs_read+0xc4/0x350
ksys_read+0x74/0xf0
__x64_sys_read+0x1d/0x20
x64_sys_call+0x17d9/0x1b80
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x7f2ca53147e2
- seq_read
- stat_show
- raw_spin_lock_irqsave(&f2fs_stat_lock, flags)
: f2fs_stat_lock is raw_spinlock_t type variable
- update_general_status
- spin_lock(&sbi->cprc_info.stat_lock);
: stat_lock is spinlock_t type variable
The root cause is the lock order is incorrect [1], we should not acquire
spinlock_t lock after raw_spinlock_t lock, as if CONFIG_PREEMPT_LOCK is
on, spinlock_t is implemented based on rtmutex, which can sleep after
holding the lock.
To fix this issue, let's use change f2fs_stat_lock lock type from
raw_spinlock_t to spinlock_t, it's safe due to:
- we don't need to use raw version of spinlock as the path is not
performance sensitive.
- we don't need to use irqsave version of spinlock as it won't be
used in irq context.
Quoted from [1]:
"Extend lockdep to validate lock wait-type context.
Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).
This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, its a more fancy might_sleep()."
When rv3028_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
When pcf8563_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
When pcf85063_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
Fixes: 8c229ab6048b7 ("rtc: pcf85063: Add pcf85063 clkout control to common clock framework") Signed-off-by: Brian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20250710-rtc-clk-round-rate-v1-4-33140bb2278e@redhat.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When nct3018y_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
When hym8563_clkout_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
When ds3231_clk_sqw_round_rate() is called with a requested rate higher
than the highest supported rate, it currently returns 0, which disables
the clock. According to the clk API, round_rate() should instead return
the highest supported rate. Update the function to return the maximum
supported rate in this case.
The type of u argument of atomic_long_inc_below() should be long to avoid
unwanted truncation to int.
The patch fixes the wrong argument type of an internal function to
prevent unwanted argument truncation. It fixes an internal locking
primitive; it should not have any direct effect on userspace.
Mark said
: AFAICT there's no problem in practice because atomic_long_inc_below()
: is only used by inc_ucount(), and it looks like the value is
: constrained between 0 and INT_MAX.
:
: In inc_ucount() the limit value is taken from
: user_namespace::ucount_max[], and AFAICT that's only written by
: sysctls, to the table setup by setup_userns_sysctls(), where
: UCOUNT_ENTRY() limits the value between 0 and INT_MAX.
:
: This is certainly a cleanup, but there might be no functional issue in
: practice as above.
Link: https://lkml.kernel.org/r/20250721174610.28361-1-ubizjak@gmail.com Fixes: f9c82a4ea89c ("Increase size of ucounts to atomic_long_t") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Alexey Gladkov <legion@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: MengEn Sun <mengensun@tencent.com> Cc: "Thomas Weißschuh" <linux@weissschuh.net> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The moduleparam code allows modules to provide their own definition of
MODULE_PARAM_PREFIX, instead of using the default KBUILD_MODNAME ".".
Commit 730b69d22525 ("module: check kernel param length at compile time,
not runtime") added a check to ensure the prefix doesn't exceed
MODULE_NAME_LEN, as this is what param_sysfs_builtin() expects.
Later, commit 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking
for sysfs perms.") removed this check, but there is no indication this was
intentional.
Since the check is still useful for param_sysfs_builtin() to function
properly, reintroduce it in __module_param_call(), but in a modernized form
using static_assert().
While here, clean up the __module_param_call() comments. In particular,
remove the comment "Default value instead of permissions?", which comes
from commit 9774a1f54f17 ("[PATCH] Compile-time check re world-writeable
module params"). This comment was related to the test variable
__param_perm_check_##name, which was removed in the previously mentioned
commit 58f86cc89c33.
Fixes: 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250630143535.267745-4-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In a private write transfer, the driver pre-fills the FIFO to work around
the FIFO_EMPTY quirk. However, if an IBIWON event occurs, the hardware
emits a NACK and the driver initiates a retry. During the retry, driver
attempts to pre-fill the FIFO again if there is remaining data, but since
the FIFO is already full, this leads to data loss.
Check available space in FIFO to prevent overflow.
When CONFIG_I3C is disabled and the i3c_i2c_driver_register() happens
to not be inlined, any driver calling it still references the i3c_driver
instance, which then causes a link failure:
x86_64-linux-ld: drivers/hwmon/lm75.o: in function `lm75_i3c_reg_read':
lm75.c:(.text+0xc61): undefined reference to `i3cdev_to_dev'
x86_64-linux-ld: lm75.c:(.text+0xd25): undefined reference to `i3c_device_do_priv_xfers'
x86_64-linux-ld: lm75.c:(.text+0xdd8): undefined reference to `i3c_device_do_priv_xfers'
This issue was part of the original i3c code, but only now caused problems
when i3c support got added to lm75.
Change the 'inline' annotations in the header to '__always_inline' to
ensure that the dead-code-elimination pass in the compiler can optimize
it out as intended.
Fixes: 6071d10413ff ("hwmon: (lm75) add I3C support for P3T1755") Fixes: 3a379bbcea0a ("i3c: Add core I3C infrastructure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250725090609.2456262-1-arnd@kernel.org Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The testcase triggers some unnecessary unaligned memory accesses on the
parisc architecture:
Kernel: unaligned access to 0x12f28e27 in policy_unpack_test_init+0x180/0x374 (iir 0x0cdc1280)
Kernel: unaligned access to 0x12f28e67 in policy_unpack_test_init+0x270/0x374 (iir 0x64dc00ce)
Use the existing helper functions put_unaligned_le32() and
put_unaligned_le16() to avoid such warnings on architectures which
prefer aligned memory accesses.
Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 98c0cc48e27e ("apparmor: fix policy_unpack_test on big endian systems") Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There's error path that could lead to inactive uprobe:
1) uprobe_register succeeds - updates instruction to int3 and
changes ref_ctr from 0 to 1
2) uprobe_unregister fails - int3 stays in place, but ref_ctr
is changed to 0 (it's not restored to 1 in the fail path)
uprobe is leaked
3) another uprobe_register comes and re-uses the leaked uprobe
and succeds - but int3 is already in place, so ref_ctr update
is skipped and it stays 0 - uprobe CAN NOT be triggered now
4) uprobe_unregister fails because ref_ctr value is unexpected
Fix this by reverting the updated ref_ctr value back to 1 in step 2),
which is the case when uprobe_unregister fails (int3 stays in place), but
we have already updated refctr.
The new scenario will go as follows:
1) uprobe_register succeeds - updates instruction to int3 and
changes ref_ctr from 0 to 1
2) uprobe_unregister fails - int3 stays in place and ref_ctr
is reverted to 1.. uprobe is leaked
3) another uprobe_register comes and re-uses the leaked uprobe
and succeds - but int3 is already in place, so ref_ctr update
is skipped and it stays 1 - uprobe CAN be triggered now
4) uprobe_unregister succeeds
Link: https://lkml.kernel.org/r/20250514101809.2010193-1-jolsa@kernel.org Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference count (semaphore)") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Conflicting attachment resolution is based on the number of states
traversed to reach an accepting state in the attachment DFA, accounting
for DFA loops traversed during the matching process. However, the loop
counting logic had multiple bugs:
- The inc_wb_pos macro increments both position and length, but length
is supposed to saturate upon hitting buffer capacity, instead of
wrapping around.
- If no revisited state is found when traversing the history, is_loop
would still return true, as if there was a loop found the length of
the history buffer, instead of returning false and signalling that
no loop was found. As a result, the adjustment step of
aa_dfa_leftmatch would sometimes produce negative counts with loop-
free DFAs that traversed enough states.
- The iteration in the is_loop for loop is supposed to stop before
i = wb->len, so the conditional should be < instead of <=.
This patch fixes the above bugs as well as the following nits:
- The count and size fields in struct match_workbuf were not used,
so they can be removed.
- The history buffer in match_workbuf semantically stores aa_state_t
and not unsigned ints, even if aa_state_t is currently unsigned int.
- The local variables in is_loop are counters, and thus should be
unsigned ints instead of aa_state_t's.
Fixes: 21f606610502 ("apparmor: improve overlapping domain attachment resolution") Signed-off-by: Ryan Lee <ryan.lee@canonical.com> Co-developed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
WB_HISTORY_SIZE was defined to be a value not a power of 2, despite a
comment in the declaration of struct match_workbuf stating it is and a
modular arithmetic usage in the inc_wb_pos macro assuming that it is. Bump
WB_HISTORY_SIZE's value up to 32 and add a BUILD_BUG_ON_NOT_POWER_OF_2
line to ensure that any future changes to the value of WB_HISTORY_SIZE
respect this requirement.
Fixes: 136db994852a ("apparmor: increase left match history buffer size") Signed-off-by: Ryan Lee <ryan.lee@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Similarly to the previous patch fixing the flow_dissector ctx accesses,
nf_is_valid_access also doesn't check that ctx accesses are aligned.
Contrary to flow_dissector programs, netfilter programs don't have
context conversion. The unaligned ctx accesses are therefore allowed by
the verifier.
Fixes: fd9c663b9ad6 ("bpf: minimal support for programs hooked into netfilter framework") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/853ae9ed5edaa5196e8472ff0f1bb1cc24059214.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
flow_dissector_is_valid_access doesn't check that the context access is
aligned. As a consequence, an unaligned access within one of the exposed
field is considered valid and later rejected by
flow_dissector_convert_ctx_access when we try to convert it.
The later rejection is problematic because it's reported as a verifier
bug with a kernel warning and doesn't point to the right instruction in
verifier logs.
Since commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"),
the vhost uses vhost_task and operates as a child of the
owner thread. This is required for correct CPU usage accounting,
especially when using containers.
However, this change has caused confusion for some legacy
userspace applications, and we didn't notice until it's too late.
Unfortunately, it's too late to revert - we now have userspace
depending both on old and new behaviour :(
To address the issue, reintroduce kthread mode for vhost workers and
provide a configuration to select between kthread and task worker.
- Add 'fork_owner' parameter to vhost_dev to let users select kthread
or task mode. Default mode is task mode(VHOST_FORK_OWNER_TASK).
- Reintroduce kthread mode support:
* Bring back the original vhost_worker() implementation,
and renamed to vhost_run_work_kthread_list().
* Add cgroup support for the kthread
* Introduce struct vhost_worker_ops:
- Encapsulates create / stop / wake‑up callbacks.
- vhost_worker_create() selects the proper ops according to
inherit_owner.
- Userspace configuration interface:
* New IOCTLs:
- VHOST_SET_FORK_FROM_OWNER lets userspace select task mode
(VHOST_FORK_OWNER_TASK) or kthread mode (VHOST_FORK_OWNER_KTHREAD)
- VHOST_GET_FORK_FROM_OWNER reads the current worker mode
* Expose module parameter 'fork_from_owner_default' to allow system
administrators to configure the default mode for vhost workers
* Kconfig option CONFIG_VHOST_ENABLE_FORK_OWNER_CONTROL controls whether
these IOCTLs and the parameter are available
- The VHOST_NEW_WORKER functionality requires fork_owner to be set
to true, with validation added to ensure proper configuration
This partially reverts or improves upon:
commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
commit 1cdaafa1b8b4 ("vhost: replace single worker pointer with xarray")
Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20250714071333.59794-2-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add missing idr_destroy() call in vduse_exit() to properly free the
vduse_idr radix tree nodes. Without this, module load/unload cycles leak
576-byte radix tree node allocations, detectable by kmemleak as:
The vduse_idr is initialized via DEFINE_IDR() at line 136 and used throughout
the VDUSE (vDPA Device in Userspace) driver for device ID management. The fix
follows the documented pattern in lib/idr.c and matches the cleanup approach
used by other drivers.
This leak was discovered through comprehensive module testing with cumulative
kmemleak detection across 10 load/unload iterations per module.
Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Message-Id: <20250704125335.1084649-1-anders.roxell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit in the fixes tag made sure that mlx5_vdpa_free()
is the single entrypoint for removing the vdpa device resources
added in mlx5_vdpa_dev_add(), even in the cleanup path of
mlx5_vdpa_dev_add().
This means that all functions from mlx5_vdpa_free() should be able to
handle uninitialized resources. This was not the case though:
mlx5_vdpa_destroy_mr_resources() and mlx5_cmd_cleanup_async_ctx()
were not able to do so. This caused the splat below when adding
a vdpa device without a MAC address.
This patch fixes these remaining issues:
- Makes mlx5_vdpa_destroy_mr_resources() return early if called on
uninitialized resources.
- Moves mlx5_cmd_init_async_ctx() early on during device addition
because it can't fail. This means that mlx5_cmd_cleanup_async_ctx()
also can't fail. To mirror this, move the call site of
mlx5_cmd_cleanup_async_ctx() in mlx5_vdpa_free().
An additional comment was added in mlx5_vdpa_free() to document
the expectations of functions called from this context.
The condition comparing ret to VHOST_SCSI_PREALLOC_SGLS was incorrect,
as ret holds the result of kstrtouint() (typically 0 on success),
not the parsed value. Update the check to use cnt, which contains the
actual user-provided value.
prevents silently accepting values exceeding the maximum inline_sg_cnt.
Fixes: bca939d5bcd0 ("vhost-scsi: Dynamically allocate scatterlists") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20250628183405.3979538-1-alok.a.tiwari@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As part of the normal initiator side scanning the guest's scsi layer
will loop over all possible targets and send an inquiry. Since the
max number of targets for virtio-scsi is 256, this can result in 255
error messages about targets not existing if you only have a single
target. When there's more than 1 vhost-scsi device each with a single
target, then you get N * 255 log messages.
It looks like the log message was added by accident in:
commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from
control queue handler")
when we added common helpers. Then in:
commit 09d7583294aa ("vhost/scsi: Use common handling code in request
queue handler")
we converted the scsi command processing path to use the new
helpers so we started to see the extra log messages during scanning.
The patches were just making some code common but added the vq_err
call and I'm guessing the patch author forgot to enable the vq_err
call (vq_err is implemented by pr_debug which defaults to off). So
this patch removes the call since it's expected to hit this path
during device discovery.
Fixes: 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250611210113.10912-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
needs_teardown is a device flag that indicates when virtual queues need
to be recreated. This happens for certain configuration changes: queue
size and some specific features.
Currently, the needs_teardown state can be incorrectly reset by
subsequent .set_vq_num() calls. For example, for 1 rx VQ with size 512
and 1 tx VQ with size 256:
.set_vq_num(0, 512) -> sets needs_teardown to true (rx queue has a
non-default size)
.set_vq_num(1, 256) -> sets needs_teardown to false (tx queue has a
default size)
This change takes into account the previous value of the needs_teardown
flag when re-calculating it during VQ size configuration.
Fixes: 0fe963d6fc16 ("vdpa/mlx5: Re-create HW VQs under certain conditions") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu<si-wei.liu@oracle.com>
Message-Id: <20250604184802.2625300-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Lonial reported that an out-of-bounds access in cgroup local storage
can be crafted via tail calls. Given two programs each utilizing a
cgroup local storage with a different value size, and one program
doing a tail call into the other. The verifier will validate each of
the indivial programs just fine. However, in the runtime context
the bpf_cg_run_ctx holds an bpf_prog_array_item which contains the
BPF program as well as any cgroup local storage flavor the program
uses. Helpers such as bpf_get_local_storage() pick this up from the
runtime context:
For the second program which was called from the originally attached
one, this means bpf_get_local_storage() will pick up the former
program's map, not its own. With mismatching sizes, this can result
in an unintended out-of-bounds access.
To fix this issue, we need to extend bpf_map_owner with an array of
storage_cookie[] to match on i) the exact maps from the original
program if the second program was using bpf_get_local_storage(), or
ii) allow the tail call combination if the second program was not
using any of the cgroup local storage maps.
Fixes: 7d9c3427894f ("bpf: Make cgroup storages shared between programs on the same cgroup") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Given this is only relevant for BPF tail call maps, it is adding up space
and penalizing other map types. We also need to extend this with further
objects to track / compare to. Therefore, lets move this out into a separate
structure and dynamically allocate it only for BPF tail call maps.
Add a cookie to BPF maps to uniquely identify BPF maps for the timespan
when the node is up. This is different to comparing a pointer or BPF map
id which could get rolled over and reused.
It post-processes samples to find which DSO has samples. Based on that
info, it can save used DSOs in the build-ID cache directory. But for
some reason, it saves all DSOs without checking the hit mark. Skipping
unused DSOs can give some speedup especially with --buildid-mmap being
default.
On my idle machine, `time perf record -a sleep 1` goes down from 3 sec
to 1.5 sec with this change.
Fixes: e29386c8f7d71fa5 ("perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250731070330.57116-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
scarlett2_input_select_ctl_info() sets up the string arrays allocated
via kasprintf(), but it misses NULL checks, which may lead to NULL
dereference Oops. Let's add the proper NULL check.
In dbAllocCtl(), read_metapage() increases the reference count of the
metapage. However, when dp->tree.budmin < 0, the function returns -EIO
without calling release_metapage() to decrease the reference count,
leading to a memory leak.
Add release_metapage(mp) before the error return to properly manage
the metapage reference count and prevent the leak.
Fixes: a5f5e4698f8abbb25fe4959814093fb5bfa1aa9d ("jfs: fix shift-out-of-bounds in dbSplit") Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We are using pci_get_domain_bus_and_slot() function to verify if
the given config directory name matches any existing PCI device,
but we missed to call matching pci_dev_put() to release reference.
While around, also change error code in case of no device match,
to make it more specific than generic formatting error.
Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 0bdd05c2a82bbf2419415d012fd4f5faeca7f1af) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Zero-length AV pairs should be considered as valid target infos.
Don't skip the next AV pairs that follow them.
Cc: linux-cifs@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Fixes: 0e8ae9b953bc ("smb: client: parse av pair type 4 in CHALLENGE_MESSAGE") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
fb_add_videomode() can fail with -ENOMEM when its internal kmalloc() cannot
allocate a struct fb_modelist. If that happens, the modelist stays empty but
the driver continues to register. Add a check for its return value to prevent
poteintial null-ptr-deref, which is similar to the commit 17186f1f90d3 ("fbdev:
Fix do_register_framebuffer to prevent null-ptr-deref in fb_videomode_to_var").
The `adf_ring_next()` function in the QAT debug transport interface
fails to correctly update the position index when reaching the end of
the ring elements. This triggers the following kernel warning when
reading ring files, such as
/sys/kernel/debug/qat_c6xx_<D:B:D:F>/transport/bank_00/ring_00:
[27725.022965] seq_file: buggy .next function adf_ring_next [intel_qat] did not update position index
Ensure that the `*pos` index is incremented before returning NULL when
after the last element in the ring is found, satisfying the seq_file API
requirements and preventing the warning.
Fixes: a672a9dc872e ("crypto: qat - Intel(R) QAT transport code") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
QAT devices perform an additional integrity check during compression by
decompressing the output. Starting from QAT GEN4, this verification is
done in-line by the hardware. However, on GEN2 devices, the hardware
reads back the compressed output from the destination buffer and performs
a decompression operation using it as the source.
In the current QAT driver, destination buffers are always marked as
write-only. This is incorrect for QAT GEN2 compression, where the buffer
is also read during verification. Since commit 6f5dc7658094
("iommu/vt-d: Restore WO permissions on second-level paging entries"),
merged in v6.16-rc1, write-only permissions are strictly enforced, leading
to DMAR errors when using QAT GEN2 devices for compression, if VT-d is
enabled.
Mark the destination buffers as DMA_BIDIRECTIONAL. This ensures
compatibility with GEN2 devices, even though it is not required for
QAT GEN4 and later.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Fixes: cf5bb835b7c8 ("crypto: qat - fix DMA transfer direction") Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the `clk_round_rate` implementation for Versal platforms by calling
the Versal-specific divider calculation helper. The existing code used
the generic divider routine, which results in incorrect round rate.
arch/sh/Makefile defines and exports ld-bfd to be used by
arch/sh/boot/compressed/Makefile and arch/sh/boot/romimage/Makefile.
However some shells, including dash, will not pass through environment
variables whose name includes a hyphen. Usually GNU make does not use
a shell to recurse, but if e.g. $(srctree) contains '~' it will use a
shell here.
Other instances of this problem were previously fixed by commits 2bfbe7881ee0 "kbuild: Do not use hyphen in exported variable name"
and 82977af93a0d "sh: rename suffix-y to suffix_y".
Rename the variable to ld_bfd.
References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=sh4&ver=4.13%7Erc5-1%7Eexp1&stamp=1502943967&raw=0 Fixes: 7b022d07a0fd ("sh: Tidy up the ldscript output format specifier.") Signed-off-by: Ben Hutchings <benh@debian.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Return 0 instead of -EINVAL if function ccu_pll_recalc_rate() fails to
get correct rate entry. Follow .recalc_rate callback documentation
as mentioned in include/linux/clk-provider.h for error return value.
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in> Fixes: 1b72c59db0add ("clk: spacemit: Add clock support for SpacemiT K1 SoC") Reviewed-by: Haylen Chu <heylenay@4d2.org> Reviewed-by: Alex Elder <elder@riscstar.com> Link: https://lore.kernel.org/r/aIBzVClNQOBrjIFG@bhairav-test.ee.iitb.ac.in Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
For the XCVR module on i.MX95, even though it only supports SPDIF, the
channel status needs to be obtained from RAM space, which is processed
by firmware. Firmware is necessary to trigger the FSL_XCVR_IRQ_NEW_CS
interrupt.
This change also applies for the SPDIF & ARC function on i.MX8MP which
has the firmware.
Fixes: e6a9750a346b ("ASoC: fsl_xcvr: Add suspend and resume support") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Link: https://patch.msgid.link/20250710030405.3370671-3-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is no PHY for the XCVR module on i.MX93, the channel status needs
to be obtained from FSL_XCVR_RX_CS_DATA_* registers. And channel status
acknowledge (CSA) bit should be set once channel status is processed.
Fixes: e240b9329a30 ("ASoC: fsl_xcvr: Add support for i.MX93 platform") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Link: https://patch.msgid.link/20250710030405.3370671-2-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The current regmap readable/writeable helper functions always
allow the Next flag and allows any Control Number. Mask the Next
flag based on SDCA_ACCESS_MODE_DUAL which is the only Mode that
supports it. Also check that the Control Number is valid for
the given control.
This patch reflects the change made to move TPS65215 from 1 GPO and 1 GPIO
to 2 GPOs and 1 GPIO. TPS65215 and TPS65219 both have 2 GPOs and 1 GPIO.
TPS65214 has 1 GPO and 1 GPIO. TPS65215 will reuse the TPS65219 GPIO
compatible string.
Fixes: 7947219ab1a2 ("mfd: tps65219: Add support for TI TPS65214 PMIC") Signed-off-by: Shree Ramamoorthy <s-ramamoorthy@ti.com> Link: https://lore.kernel.org/r/20250527190455.169772-2-s-ramamoorthy@ti.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
FILENAME_MAX is the same as PATH_MAX (4kb) in glibc rather than
NAME_MAX's 255. Switch to using NAME_MAX and ensure the '\0' is
accounted for in the path's buffer size.
Fixes: 754baf426e09 ("perf pmu: Change aliases from list to hashmap") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250717150855.1032526-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
FILENAME_MAX is often PATH_MAX (4kb), far more than needed for the
/proc path. Make the buffer size sufficient for the maximum integer
plus "/proc/" and "/status" with a '\0' terminator.
Mux clocks are now described with a customized ccu_mux structure
consisting of ccu_internal and ccu_common substructures, and registered
later with devm_clk_hw_register_mux_parent_data_table(). As this helper
always allocates a new clk_hw structure, it's extremely hard to use mux
clocks as parents statically by clk_hw pointers, since CCF has no
knowledge about the clk_hw structure embedded in ccu_mux.
This scheme already causes issues for clock c910, which takes a mux
clock, c910-i0, as a possible parent. With mainline U-Boot that
reparents c910 to c910-i0 at boottime, c910 is considered as an orphan
by CCF.
This patch refactors handling of mux clocks, embeds a clk_mux structure
in ccu_mux directly. Instead of calling devm_clk_hw_register_mux_*(),
we could register mux clocks on our own without allocating any new
clk_hw pointer, fixing c910 clock's issue.
Fixes: ae81b69fd2b1 ("clk: thead: Add support for T-Head TH1520 AP_SUBSYS clocks") Signed-off-by: Yao Zi <ziyao@disroot.org> Signed-off-by: Drew Fustini <fustini@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The do_k_string() and do_c_string() functions do essentially the same
thing which is they add a string and a comma onto the end of an existing
string. At the end, the caller will overwrite the last comma with a
newline. Later, in orangefs_kernel_debug_init(), we add a newline to
the string.
The change to do_k_string() is just cosmetic. I moved the "- 1" to
the other side of the comparison and made it "+ 1". This has no
effect on runtime, I just wanted the functions to match each other
and the rest of the file.
However in do_c_string(), I removed the "- 2" which allows us to print
two extra characters. I noticed this issue while reviewing the code
and I doubt affects anything in real life. My guess is that this was
double counting the comma and the newline. The "+ 1" accounts for
the newline, and the caller will delete the final comma which ensures
there is enough space for the newline.
Removing the "- 2" lets us print 2 more characters, but mainly it makes
the code more consistent and understandable for reviewers.
Fixes: 44f4641073f1 ("orangefs: clean up debugfs globals") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As per a commit from Qualcomm's downstream 6.1 kernel[0], the init
sequence is missing setting the CMN_CTRL_OVERRIDE_EN bit back to 0 at
the end, as per the 'latest' HPG revision (as of November 2023).
When enabling runtime PM for clock suppliers that also belong to a power
domain, the following crash is thrown:
error: synchronous external abort: 0000000096000010 [#1] PREEMPT SMP
Workqueue: events_unbound deferred_probe_work_func
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : clk_mux_get_parent+0x60/0x90
lr : clk_core_reparent_orphans_nolock+0x58/0xd8
Call trace:
clk_mux_get_parent+0x60/0x90
clk_core_reparent_orphans_nolock+0x58/0xd8
of_clk_add_hw_provider.part.0+0x90/0x100
of_clk_add_hw_provider+0x1c/0x38
imx95_bc_probe+0x2e0/0x3f0
platform_probe+0x70/0xd8
Enabling runtime PM without explicitly resuming the device caused
the power domain cut off after clk_register() is called. As a result,
a crash happens when the clock hardware provider is added and attempts
to access the BLK_CTL register.
Fix this by using devm_pm_runtime_enable() instead of pm_runtime_enable()
and getting rid of the pm_runtime_disable() in the cleanup path.
Fixes: 5224b189462f ("clk: imx: add i.MX95 BLK CTL clk driver") Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250707-imx95-blk-ctl-7-1-v3-2-c1b676ec13be@nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
__iomem attribute is supposed to be used only with variables holding the
MMIO pointer. But here, 'mw_addr' variable is just holding a 'void *'
returned by pci_epf_alloc_space(). So annotating it with __iomem is clearly
wrong. Hence, drop the attribute.
This also fixes the below sparse warning:
drivers/pci/endpoint/functions/pci-epf-vntb.c:524:17: warning: incorrect type in assignment (different address spaces)
drivers/pci/endpoint/functions/pci-epf-vntb.c:524:17: expected void [noderef] __iomem *mw_addr
drivers/pci/endpoint/functions/pci-epf-vntb.c:524:17: got void *
drivers/pci/endpoint/functions/pci-epf-vntb.c:530:21: warning: incorrect type in assignment (different address spaces)
drivers/pci/endpoint/functions/pci-epf-vntb.c:530:21: expected unsigned int [usertype] *epf_db
drivers/pci/endpoint/functions/pci-epf-vntb.c:530:21: got void [noderef] __iomem *mw_addr
drivers/pci/endpoint/functions/pci-epf-vntb.c:542:38: warning: incorrect type in argument 2 (different address spaces)
drivers/pci/endpoint/functions/pci-epf-vntb.c:542:38: expected void *addr
drivers/pci/endpoint/functions/pci-epf-vntb.c:542:38: got void [noderef] __iomem *mw_addr
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP") Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250709125022.22524-1-mani@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The bus->params should be restored if the stream is failed to prepare.
The issue exists since beginning. The Fixes tag just indicates the
first commit that the commit can be applied to.
/proc/cgroups lists only v1 controllers by default, however, this is
only enforced since the commit af000ce85293b ("cgroup: Do not report
unavailable v1 controllers in /proc/cgroups") and there is software in
the wild that uses content of /proc/cgroups to decide on availability of
v2 (sic) controllers.
Add a boottime param that can bring back the previous behavior for
setups where the check in the software cannot be changed and it causes
e.g. unintended OOMs.
Also, this patch takes out cgrp_v1_visible from cgroup1_subsys_absent()
guard since it's only important to check which hierarchy (v1 vs v2) the
subsys is attached to. This has no effect on the printed message but
the code is cleaner since cgrp_v1_visible is really about mounted
hierarchies, not the content of /proc/cgroups.
Link: https://lore.kernel.org/r/b26b60b7d0d2a5ecfd2f3c45f95f32922ed24686.camel@decadent.org.uk Fixes: af000ce85293b ("cgroup: Do not report unavailable v1 controllers in /proc/cgroups") Fixes: a0ab1453226d8 ("cgroup: Print message when /proc/cgroups is read on v2-only system") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>