REG_PCIE*_MEM and REG_PCIE*_MEM_MASK regs (PBUS_CSR memory region) are not
part of the scu block on the EN7581 SoC and they are used to select the
PCIE ports on the PBUS, so remove this configuration from the clock driver
and set these registers in the PCIE host driver instead.
This patch does not introduce any backward incompatibility since the dts
for EN7581 SoC is not upstream yet.
Base clocks are the first in being probed and are real dependencies of the
rest of fixed, factor and peripheral clocks. For old ralink SoCs RT2880,
RT305x and RT3883 'xtal' must be defined first since in any other case,
when fixed clocks are probed they are delayed until 'xtal' is probed so the
following warning appears:
When this driver was mainlined we could not find any active users of old
ralink SoCs so we cannot perform any real tests for them. Now, one user
of a Belkin f9k1109 version 1 device which uses RT3883 SoC appeared and
reported some issues in openWRT:
- https://github.com/openwrt/openwrt/issues/16054
Thus, define a 'rt2880_xtal_recalc_rate()' just returning the expected
frequency 40Mhz and use it along the old ralink SoCs to have a correct
boot trace with no warnings and a working clock plan from the beggining.
Fixes: 6f3b15586eef ("clk: ralink: add clock and reset driver for MTMIPS SoCs") Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Link: https://lore.kernel.org/r/20240910044024.120009-3-sergio.paracuellos@gmail.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Clock plan for Ralink SoC RT3883 needs an extra 'periph' clock to properly
set some peripherals that has this clock as their parent. When this driver
was mainlined we could not find any active users of this SoC so we cannot
perform any real tests for it. Now, one user of a Belkin f9k1109 version 1
device which uses this SoC appear and reported some issues in openWRT:
- https://github.com/openwrt/openwrt/issues/16054
The peripherals that are wrong are 'uart', 'i2c', 'i2s' and 'uartlite' which
has a not defined 'periph' clock as parent. Hence, introduce it to have a
properly working clock plan for this SoC.
Fixes: 6f3b15586eef ("clk: ralink: add clock and reset driver for MTMIPS SoCs") Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Link: https://lore.kernel.org/r/20240910044024.120009-2-sergio.paracuellos@gmail.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add NULL check in applnco_probe, to handle kernel NULL pointer
dereference error.
Fixes: 6641057d5dba ("clk: clk-apple-nco: Add driver for Apple NCO") Signed-off-by: Charles Han <hanchunchao@inspur.com> Link: https://lore.kernel.org/r/20241114072820.3071-1-hanchunchao@inspur.com Reviewed-by: Martin Povišer <povik+lin@cutebit.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Move pkey change work initialization and cleanup from device resources
stage to notifier stage, since this is the stage which handles this work
events.
Fix a race between the device deregistration and pkey change work by moving
MLX5_IB_STAGE_DEVICE_NOTIFIER to be after MLX5_IB_STAGE_IB_REG in order to
ensure that the notifier is deregistered before the device during cleanup.
Which ensures there are no works that are being executed after the
device has already unregistered which can cause the panic below.
When information such as info->screen_base is not ready, calling
sh7760fb_free_mem() does not release memory correctly. Call
dma_free_coherent() instead.
DIP algorithm requires a one-to-one mapping between dgid and dip_idx.
Currently a queue 'spare_idx' is used to store QPN of QPs that use
DIP algorithm. For a new dgid, use a QPN from spare_idx as dip_idx.
This method lacks a mechanism for deduplicating QPN, which may result
in different dgids sharing the same dip_idx and break the one-to-one
mapping requirement.
This patch replaces spare_idx with xarray and introduces a refcnt of
a dip_idx to indicate the number of QPs that using this dip_idx.
The state machine for dip_idx management is implemented as:
* The entry at an index in xarray is empty -- This indicates that the
corresponding dip_idx hasn't been created.
* The entry at an index in xarray is not empty but with 0 refcnt --
This indicates that the corresponding dip_idx has been created but
not used as dip_idx yet.
* The entry at an index in xarray is not empty and with non-0 refcnt --
This indicates that the corresponding dip_idx is being used by refcnt
number of DIP QPs.
Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid") Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Feng Fang <fangfeng4@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241112055553.3681129-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes")
dropped the use of vcore->dpdes for msgsndp / SMT emulation. Prior to that
commit, the below code at L1 level (see [1] for terminology) was
responsible for setting vc->dpdes for the respective L2 vCPU:
if (!nested) {
kvmppc_core_prepare_to_enter(vcpu);
if (vcpu->arch.doorbell_request) {
vc->dpdes = 1;
smp_wmb();
vcpu->arch.doorbell_request = 0;
}
L1 then sent vc->dpdes to L0 via kvmhv_save_hv_regs(), and while
servicing H_ENTER_NESTED at L0, the below condition at L0 level made sure
to abort and go back to L1 if vcpu->arch.doorbell_request = 1 so that L1
sets vc->dpdes as per above if condition:
This worked fine since vcpu->arch.doorbell_request was used more like a
flag and vc->dpdes was used to pass around the doorbell state. But after
Commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes"),
vcpu->arch.doorbell_request is the only variable used to pass around
doorbell state.
With the plumbing for handling doorbells for nested guests updated to use
vcpu->arch.doorbell_request over vc->dpdes, the above "else if" stops
doorbells from working correctly as L0 aborts execution of L2 and
instead goes back to L1.
Remove vcpu->arch.doorbell_request from the above "else if" condition as
it is no longer needed for L0 to correctly handle the doorbell status
while running L2.
[1] Terminology
1. L0 : PowerNV linux running with HV privileges
2. L1 : Pseries KVM guest running on top of L0
2. L2 : Nested KVM guest running on top of L1
commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes")
introduced an optimization to use only vcpu->doorbell_request for SMT
emulation for Power9 and above guests, but the code for nested guests
still relies on the old way of handling doorbells, due to which an L2
guest (see [1]) cannot be booted with XICS with SMT>1. The command to
repro this issue is:
Fix the plumbing to utilize vcpu->doorbell_request instead of vcore->dpdes
for nested KVM guests on P9 and above.
[1] Terminology
1. L0 : PowerNV linux running with HV privileges
2. L1 : Pseries KVM guest running on top of L0
2. L2 : Nested KVM guest running on top of L1
After commit: 83762cb5c7c4 ("dax: Kill DEV_DAX_PMEM_COMPAT") the pmem/
directory is not needed anymore and Makefile changes were made
accordingly in this commit, but there is a Makefile and pmem.c in pmem/
which are now stale and pmem.c is empty, remove them.
Fixes: 83762cb5c7c4 ("dax: Kill DEV_DAX_PMEM_COMPAT") Suggested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20241017101144.1654085-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Uninit was created at:
__alloc_pages_noprof+0x9a7/0xe00
alloc_pages_mpol_noprof+0x299/0x990
alloc_pages_noprof+0x1bf/0x1e0
allocate_slab+0x33a/0x1250
___slab_alloc+0x12ef/0x35e0
kmem_cache_alloc_bulk_noprof+0x486/0x1330
__io_alloc_req_refill+0x84/0x560
io_submit_sqes+0x172f/0x2f30
__se_sys_io_uring_enter+0x406/0x41c0
__x64_sys_io_uring_enter+0x11f/0x1a0
x64_sys_call+0x2b54/0x3ba0
do_syscall_64+0xcd/0x1e0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Since an instance of 'struct kiocb' may be passed from the block layer
with 'private' field uninitialized, introduce 'ocfs2_iocb_init_rw_locked()'
and use it from where 'ocfs2_dio_end_io()' might take care, i.e. in
'ocfs2_file_read_iter()' and 'ocfs2_file_write_iter()'.
Link: https://lkml.kernel.org/r/20241029091736.1501946-1-dmantipov@yandex.ru Fixes: 7cdfc3a1c397 ("ocfs2: Remember rw lock level during direct io") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+a73e253cca4f0230a5a5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a73e253cca4f0230a5a5 Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The intent here was clearly to use the gfp variable flags instead of
hardcoding GFP_KERNEL. All the callers pass GFP_KERNEL as the gfp
flags so this doesn't affect runtime.
Fixes: b3231d353a51 ("kunit: add a convenience allocation wrapper for SKBs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "kasan: migrate the last module test to kunit", v4.
copy_user_test() is the last KUnit-incompatible test with
CONFIG_KASAN_MODULE_TEST requirement, which we are going to migrate to
KUnit framework and delete the former test and Kconfig as well.
In this patch series:
- [1/3] move kasan_check_write() and check_object_size() to
do_strncpy_from_user() to cover with KASAN checks with
multiple conditions in strncpy_from_user().
- [2/3] migrated copy_user_test() to KUnit, where we can also test
strncpy_from_user() due to [1/4].
KUnits have been tested on:
- x86_64 with CONFIG_KASAN_GENERIC. Passed
- arm64 with CONFIG_KASAN_SW_TAGS. 1 fail. See [1]
- arm64 with CONFIG_KASAN_HW_TAGS. 1 fail. See [1]
[1] https://lore.kernel.org/linux-mm/CACzwLxj21h7nCcS2-KA_q7ybe+5pxH0uCDwu64q_9pPsydneWQ@mail.gmail.com/
- [3/3] delete CONFIG_KASAN_MODULE_TEST and documentation occurrences.
This patch (of 3):
Since in the commit 2865baf54077("x86: support user address masking
instead of non-speculative conditional") do_strncpy_from_user() is called
from multiple places, we should sanitize the kernel *dst memory and size
which were done in strncpy_from_user() previously.
Link: https://lkml.kernel.org/r/20241016131802.3115788-1-snovitoll@gmail.com Link: https://lkml.kernel.org/r/20241016131802.3115788-2-snovitoll@gmail.com Fixes: 2865baf54077 ("x86: support user address masking instead of non-speculative conditional") Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
cppc_get_cpu_power() return 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not valid
as power is uninitialized. As Quentin pointed out, kernel energy model
core check the return value of active_power() first, so if the callback
failed it should tell the core. So return -EINVAL to fix it.
cppc_get_cpu_cost() return 0 if the policy is NULL. Then in
em_compute_costs(), the later zero check for cost is not valid
as cost is uninitialized. As Quentin pointed out, kernel energy model
core check the return value of get_cost() first, so if the callback
failed it should tell the core. Return -EINVAL to fix it.
Even if it's not critical, the avoidance of checking the error code
from devm_mutex_init() call today diminishes the point of using devm
variant of it. Tomorrow it may even leak something. Add the missed
check.
The FENCE indicator in hns WQE doesn't ensure that response data from
a previous Read/Atomic operation has been written to the requester's
memory before the subsequent Send/Write operation is processed. This
may result in the subsequent Send/Write operation accessing the original
data in memory instead of the expected response data.
Unlike FENCE, the SO (Strong Order) indicator blocks the subsequent
operation until the previous response data is written to memory and a
bresp is returned. Set the SO indicator instead of FENCE to maintain
strict order.
The param area is a memory region where the kernel places additional
command-line arguments for fadump kernel. Currently, the param memory
area is reserved in fadump kernel if it is above boot_mem_top. However,
it should be reserved if it is below boot_mem_top because the fadump
kernel already reserves memory from boot_mem_top to the end of DRAM.
Currently, there is no impact from not reserving param memory if it is
below boot_mem_top, as it is not used after the early boot phase of the
fadump kernel. However, if this changes in the future, it could lead to
issues in the fadump kernel.
Fixes: 3416c9daa6b1 ("powerpc/fadump: pass additional parameters when fadump is active") Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241107055817.489795-2-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Memory for passing additional parameters to fadump capture kernel
is allocated during subsys_initcall level, using memblock. But
as slab is already available by this time, allocation happens via
the buddy allocator. This may work for radix MMU but is likely to
fail in most cases for hash MMU as hash MMU needs this memory in
the first memory block for it to be accessible in real mode in the
capture kernel (second boot). So, allocate memory for additional
parameters area as soon as MMU mode is obvious.
Memory access #VEs are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VEs so the "other" OSes will work. That,
unfortunately has left us in the lurch and exposed to these
hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static
configuration is set to "send nasty #VEs", the kernel can dynamically
request that they be disabled. Once they are disabled, access to private
memory that is not in the Mapped state in the Secure-EPT (SEPT) will
result in an exit to the VMM rather than injecting a #VE.
Check if the feature is available and disable SEPT #VE if possible.
If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.
[ dhansen: remove 'return' at end of function ]
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20241104103803.195705-4-kirill.shutemov%40linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The TDG_VM_WR TDCALL is used to ask the TDX module to change some
TD-specific VM configuration. There is currently only one user in the
kernel of this TDCALL leaf. More will be added shortly.
Refactor to make way for more users of TDG_VM_WR who will need to modify
other TD configuration values.
Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific
metadata from the TDX module. There are currently no users for
TDG_VM_RD. Mark it as __maybe_unused until the first user appears.
This is preparation for enumeration and enabling optional TD features.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/all/20241104103803.195705-2-kirill.shutemov%40linux.intel.com
Stable-dep-of: f65aa0ad79fc ("x86/tdx: Dynamically disable SEPT violations from causing #VEs") Signed-off-by: Sasha Levin <sashal@kernel.org>
In 2010, runtime power management support was implemented in the SCSI
core. The description of patch "[SCSI] implement runtime Power
Management" mentions that the sg driver is skipped but not why. This
patch enables runtime power management even if an instance of the sg
driver is held open. Enabling runtime PM for the sg driver is safe
because all interactions of the sg driver with the SCSI device pass
through the block layer (blk_execute_rq_nowait()) and the block layer
already supports runtime PM.
Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Douglas Gilbert <dgilbert@interlog.com> Fixes: bc4f24014de5 ("[SCSI] implement runtime Power Management") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241030220310.1373569-1-bvanassche@acm.org Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Hook "qedi_ops->common->sb_init = qed_sb_init" does not release the DMA
memory sb_virt when it fails. Add dma_free_coherent() to free it. This
is the same way as qedr_alloc_mem_sb() and qede_alloc_mem_sb().
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20241026125711.484-3-thunder.leizhen@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Hook "qed_ops->common->sb_init = qed_sb_init" does not release the DMA
memory sb_virt when it fails. Add dma_free_coherent() to free it. This
is the same way as qedr_alloc_mem_sb() and qede_alloc_mem_sb().
Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20241026125711.484-2-thunder.leizhen@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The return value of scsi_device_reprobe() is currently ignored in
_scsih_reprobe_lun(). Fixing the calling code to deal with the potential
error is non-trivial, so for now just WARN_ON().
The handling of scsi_device_reprobe()'s return value refers to
_scsih_reprobe_lun() and the following link:
Fixes: f99be43b3024 ("[SCSI] fusion: power pc and miscellaneous bug fixs") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Link: https://lore.kernel.org/r/20241024084417.154655-1-zengheng4@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Don't call bfad_im_module_exit() if bfad_im_module_init() failed.
Fixes: 7725ccfda597 ("[SCSI] bfa: Brocade BFA FC SCSI driver") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20241023011809.63466-1-yebin@huaweicloud.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In pr_err(), bdev_open_by_path() should be renamed to
bdev_file_open_by_path()
Fixes: 034f0cf8fdf9 ("target: port block device access to file") Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> Link: https://lore.kernel.org/r/20241030021800.234980-1-liubaolin12138@163.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Coccinelle complains about the nested reuse of the pointer `iter' with
different pointer type:
./fs/proc/kcore.c:515:26-30: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:534:23-27: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:550:40-44: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:568:27-31: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:581:28-32: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:599:27-31: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:607:38-42: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:614:26-30: ERROR: invalid reference to the index variable of the iterator on line 499
Replacing `struct kcore_list *iter' with `struct kcore_list *tmp' doesn't change the
scope and the functionality is the same and coccinelle seems happy.
NOTE: There was an issue with using `struct kcore_list *pos' as the nested iterator.
The build did not work!
[akpm@linux-foundation.org: s/tmp/pos/] Link: https://lkml.kernel.org/r/20241029054651.86356-2-mtodorovac69@gmail.com Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220331223700.902556-1-jakobkoschel@gmail.com Fixes: 04d168c6d42d ("fs/proc/kcore.c: remove check of list iterator against head past the loop body") Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Brian Johannesmeyer" <bjohannesmeyer@gmail.com> Cc: Cristiano Giuffrida <c.giuffrida@vu.nl> Cc: "Bos, H.J." <h.j.bos@vu.nl> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Yan Zhen <yanzhen@vivo.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
props.timing is not set after commit b5a8c50e5c18 ("leds: ktd2692: Convert
to use ExpressWire library"). Set it with ktd2692_timing.
Fixes: b5a8c50e5c18 ("leds: ktd2692: Convert to use ExpressWire library") Signed-off-by: Raymond Hackley <raymondhackley@protonmail.com> Acked-by: Duje Mihanović <duje.mihanovic@skole.hr> Link: https://lore.kernel.org/r/20241103083505.49648-1-raymondhackley@protonmail.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Do not require the presence of `$balanced_parens` to get the commit SHA;
this allows a `Fixes: deadbeef` tag to get a correct suggestion rather
than a suggestion containing a reference to HEAD.
Given this patch:
: From: Tamir Duberstein <tamird@gmail.com>
: Subject: Test patch
: Date: Fri, 25 Oct 2024 19:30:51 -0400
:
: This is a test patch.
:
: Fixes: bd17e036b495
: Signed-off-by: Tamir Duberstein <tamird@gmail.com>
: --- /dev/null
: +++ b/new-file
: @@ -0,0 +1 @@
: +Test.
Before:
WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: c10a7d25e68f ("Test patch")'
After:
WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: bd17e036b495 ("checkpatch: warn for non-standard fixes tag style")'
The prior behavior incorrectly suggested the patch's own SHA and title
line rather than the referenced commit's. This fixes that.
Ironically this:
Fixes: bd17e036b495 ("checkpatch: warn for non-standard fixes tag style") Signed-off-by: Tamir Duberstein <tamird@gmail.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Louis Peens <louis.peens@corigine.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Cc: Philippe Schenker <philippe.schenker@toradex.com> Cc: Simon Horman <horms@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
1. Super page is dumped as non-present page
2. dma_pte_superpage() should not check against leaf page table entries
3. Pointer pte is never NULL so checking it is meaningless
4. When an entry is not present, it still makes sense to dump the entry
content.
Fix 1,2 by checking dma_pte_superpage()'s returned value after level check.
Fix 3 by removing pte check.
Fix 4 by checking present bit after printing.
By this chance, change to print "page table not present" instead of "PTE
not present" to be clearer.
1. return value of phys_to_virt() is used for checking if an entry is
present.
2. dump is confusing, e.g., "pasid table entry is not present", confusing
by unpresent pasid table vs. unpresent pasid table entry. Current code
means the former.
3. pgtable_walk() is called without checking if page table is present.
Fix 1 by checking present bit of an entry before dump a lower level entry.
Fix 2 by removing "entry" string, e.g., "pasid table is not present".
Fix 3 by checking page table present before walk.
The scu clk_ops only inplements prepare() and unprepare() callback.
Saving the clock state during suspend by checking clk_hw_is_enabled()
is not safe as it's possible that some device drivers may only
disable the clocks without unprepare. Then the state retention will not
work for such clocks.
Fixing it by checking clk_hw_is_prepared() which is more reasonable
and safe.
Fixes: d0409631f466 ("clk: imx: scu: add suspend/resume support") Reviewed-by: Peng Fan <peng.fan@nxp.com> Tested-by: Carlos Song <carlos.song@nxp.com> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Link: https://lore.kernel.org/r/20241027-imx-clk-v1-v3-4-89152574d1d7@nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
To i.MX93 which features dual Cortex-A55 cores and DSU, when using
writel_relaxed to write value to PLL registers, the value might be
buffered. To make sure the value has been written into the hardware,
using readl to read back the register could achieve the goal.
current PLL power up flow can be simplified as below:
1. writel_relaxed to set the PLL POWERUP bit;
2. readl_poll_timeout to check the PLL lock bit:
a). timeout = ktime_add_us(ktime_get(), timeout_us);
b). readl the pll the lock reg;
c). check if the pll lock bit ready
d). check if timeout
But in some corner cases, both the write in step 1 and read in
step 2 will be blocked by other bus transaction in the SoC for a
long time, saying the value into real hardware is just before step b).
That means the timeout counting has begins for quite sometime since
step a), but value still not written into real hardware until bus
released just at a point before step b).
Then there maybe chances that the pll lock bit is not ready
when readl done but the timeout happens. readl_poll_timeout will
err return due to timeout. To avoid such unexpected failure,
read back the reg to make sure the write has been done in HW
reg.
So use readl after writel_relaxed to fix the issue.
Since we are here, to avoid udelay to run before writel_relaxed, use
readl before udelay.
Fixes: 1b26cb8a77a4 ("clk: imx: support fracn gppll") Co-developed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20241027-imx-clk-v1-v3-3-89152574d1d7@nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Per i.MX93 Reference Mannual 22.4 Initialization information
1. Program appropriate value of DIV[ODIV], DIV[RDIV] and DIV[MFI]
as per Integer mode.
2. Wait for 5 μs.
3. Program the following field in CTRL register.
Set CTRL[POWERUP] to 1'b1 to enable PLL block.
4. Poll PLL_STATUS[PLL_LOCK] register, and wait till PLL_STATUS[PLL_LOCK]
is 1'b1 and pll_lock output signal is 1'b1.
5. Set CTRL[CLKMUX_EN] to 1'b1 to enable PLL output clock.
So move the CLKMUX_EN operation after PLL locked.
Fixes: 1b26cb8a77a4 ("clk: imx: support fracn gppll") Co-developed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20241027-imx-clk-v1-v3-2-89152574d1d7@nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Back-to-back LPCG writes can be ignored by the LPCG register due to
a HW bug. The writes need to be separated by at least 4 cycles of
the gated clock. See https://www.nxp.com.cn/docs/en/errata/IMX8_1N94W.pdf
The workaround is implemented as follows:
1. For clocks running greater than or equal to 24MHz, a read
followed by the write will provide sufficient delay.
2. For clocks running below 24MHz, add a delay of 4 clock cylces
after the write to the LPCG register.
Fixes: 2f77296d3df9 ("clk: imx: add lpcg clock support") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20241027-imx-clk-v1-v3-1-89152574d1d7@nxp.com Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In kvm_riscv_vcpu_sbi_init() the entry->ext_idx can contain an
out-of-bound index. This is used as a special marker for the base
extensions, that cannot be disabled. However, when traversing the
extensions, that special marker is not checked prior indexing the
array.
Add an out-of-bounds check to the function.
Fixes: 56d8a385b605 ("RISC-V: KVM: Allow some SBI extensions to be disabled by default") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241104191503.74725-1-bjorn@kernel.org Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the section "4.7 Precise effects on interrupt-pending bits"
of the RISC-V AIA specification defines that:
"If the source mode is Level1 or Level0 and the interrupt domain
is configured in MSI delivery mode (domaincfg.DM = 1):
The pending bit is cleared whenever the rectified input value is
low, when the interrupt is forwarded by MSI, or by a relevant
write to an in_clrip register or to clripnum."
Update the aplic_write_pending() to match the spec.
Same with commit e375b9c92985 ("RDMA/cxgb4: Set queue pair state when
being queried"). The API for ib_query_qp requires the driver to set
cur_qp_state on return, add the missing set.
Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Liu Jian <liujian56@huawei.com> Link: https://patch.msgid.link/20241031092019.2138467-1-liujian56@huawei.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
While computing foutpostdiv_rate, the value of params->pl5_fracin
is discarded, which results in the wrong refresh rate. Fix the formula
for computing foutpostdiv_rate.
To work around a limitation in our clock modelling, we try to force two
bits in the AUDIO0 PLL to 0, in the CCU probe routine.
However the ~ operator only applies to the first expression, and does
not cover the second bit, so we end up clearing only bit 1.
Group the bit-ORing with parentheses, to make it both clearer to read
and actually correct.
Fixes: 35b97bb94111 ("clk: sunxi-ng: Add support for the D1 SoC clocks") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Link: https://patch.msgid.link/20241001105016.1068558-1-andre.przywara@arm.com Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Invalidate rkey is cpu endian and immediate data is in big endian format.
Both immediate data and invalidate the remote key returned by
HW is in little endian format.
While handling the commit in fixes tag, the difference between
immediate data and invalidate rkey endianness was not considered.
Without changes of this patch, Kernel ULP was failing while processing
inv_rkey.
dmesg log snippet -
nvme nvme0: Bogus remote invalidation for rkey 0x2000019Fix in this patch
Do endianness conversion based on completion queue entry flag.
Also, the HW completions are already converted to host endianness in
bnxt_qplib_cq_process_res_rc and bnxt_qplib_cq_process_res_ud and there
is no need to convert it again in bnxt_re_poll_cq. Modified the union to
hold the correct data type.
During reset, cmd to destroy resources such as qp, cq, and mr may fail,
and error logs will be printed. When a large number of resources are
destroyed, there will be lots of printings, and it may lead to a cpu
stuck.
Delete some unnecessary printings and replace other printing functions
in these paths with the ratelimited version.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The sub-directory of hns_roce debugfs is named after the device's
kernel name currently, but it will be inconvenient to use when
the device is renamed.
Modify the name to pci name as users can always easily find the
correspondence between an RDMA device and its pci name.
Fixes: eb7854d63db5 ("RDMA/hns: Support SW stats with debugfs") Signed-off-by: Yuyu Li <liyuyu6@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
QP needs to be modified to IB_QPS_ERROR to trigger HW flush cqe. But
when this process races with destroy qp, the destroy-qp process may
modify the QP to IB_QPS_RESET first. In this case flush cqe will fail
since it is invalid to modify qp from IB_QPS_RESET to IB_QPS_ERROR.
Add lock and bit flag to make sure pending flush cqe work is completed
first and no more new works will be added.
eq_db_ci is updated only after all AEQEs are processed in the AEQ
interrupt handler, which is not timely enough and may result in
AEQ overflow. Two optimization methods are proposed:
1. Set an upper limit for AEQE processing.
2. Move time-consuming operations such as printings to the bottom
half of the interrupt.
cmd events and flush_cqe events are still fully processed in the top half
to ensure timely handling.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20241024124000.2931869-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit c7fc12354be0 ("iommu/amd/pgtbl_v2: Invalidate updated page ranges
only") missed to take domain lock before calling
amd_iommu_domain_flush_pages(). Fix this by taking protection domain
lock before calling TLB invalidation function.
The AMD io_pgtable stuff doesn't implement the tlb ops callbacks, instead
it invokes the invalidation ops directly on the struct protection_domain.
Narrow the use of struct protection_domain to only those few code paths.
Make everything else properly use struct amd_io_pgtable through the call
chains, which is the correct modular type for an io-pgtable module.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/9-v2-831cdc4d00f3+1a315-amd_iopgtbl_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: 016991606aa0 ("iommu/amd/pgtbl_v2: Take protection domain lock before invalidating TLB") Signed-off-by: Sasha Levin <sashal@kernel.org>
We already have memory in the union here that is being wasted in AMD's
case, use it to store the nid.
Putting the nid here further isolates the io_pgtable code from the struct
protection_domain.
Fixup protection_domain_alloc so that the NID from the device is provided,
at this point dev is never NULL for AMD so this will now allocate the
first table pointer on the correct NUMA node.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/8-v2-831cdc4d00f3+1a315-amd_iopgtbl_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: 016991606aa0 ("iommu/amd/pgtbl_v2: Take protection domain lock before invalidating TLB") Signed-off-by: Sasha Levin <sashal@kernel.org>
There is struct protection_domain iopt and struct amd_io_pgtable iopt.
Next patches are going to want to write domain.iopt.iopt.xx which is quite
unnatural to read.
Give one of them a different name, amd_io_pgtable has fewer references so
call it pgtbl, to match pgtbl_cfg, instead.
It is a serious bug if the domain is still mapped to any DTEs when it is
freed as we immediately start freeing page table memory, so any remaining
HW touch will UAF.
If it is not mapped then dev_list is empty and amd_iommu_domain_update()
does nothing.
Remove it and add a WARN_ON() to catch this class of bug.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v2-831cdc4d00f3+1a315-amd_iopgtbl_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: 016991606aa0 ("iommu/amd/pgtbl_v2: Take protection domain lock before invalidating TLB") Signed-off-by: Sasha Levin <sashal@kernel.org>
cpufreq_cpu_get_raw() may return NULL if the cpu is not in
policy->cpus cpu mask and it will cause null pointer dereference,
so check NULL for cppc_get_cpu_cost().
Fixes: 740fcdc2c20e ("cpufreq: CPPC: Register EM based on efficiency class information") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In octal DTR mode, RD_ANY_REG_OP needs to use 4-byte address regardless
of flash's internal address mode. Use nor->addr_nbytes which is set to 4
during setup.
This was found by a static analyzer.
There may be a potential integer overflow issue in
sg2042_pll_recalc_rate(). numerator is defined as u64 while
parent_rate is defined as unsigned long and ctrl_table.fbdiv
is defined as unsigned int. On 32-bit machine, the result of
the calculation will be limited to "u32" without correct casting.
Integer overflow may occur on high-performance systems.
Fixes: 48cf7e01386e ("clk: sophgo: Add SG2042 clock driver") Signed-off-by: Zichen Xie <zichenxie0106@gmail.com> Reviewed-by: Chen Wang <unicorn_wang@outlook.com> Link: https://lore.kernel.org/r/20241023145146.13130-1-zichenxie0106@gmail.com Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
copy_from_kernel_nofault() can be called when doing read of /proc/kcore.
/proc/kcore can have some unmapped kfence objects which when read via
copy_from_kernel_nofault() can cause page faults. Since *_nofault()
functions define their own fixup table for handling fault, use that
instead of asking kfence to handle such faults.
Hence we search the exception tables for the nip which generated the
fault. If there is an entry then we let the fixup table handler handle the
page fault by returning an error from within ___do_page_fault().
This can be easily triggered if someone tries to do dd from /proc/kcore.
eg. dd if=/proc/kcore of=/dev/null bs=1M
Some example false negatives:
===============================
BUG: KFENCE: invalid read in copy_from_kernel_nofault+0x9c/0x1a0
Invalid read at 0xc0000000fdff0000:
copy_from_kernel_nofault+0x9c/0x1a0
0xc00000000665f950
read_kcore_iter+0x57c/0xa04
proc_reg_read_iter+0xe4/0x16c
vfs_read+0x320/0x3ec
ksys_read+0x90/0x154
system_call_exception+0x120/0x310
system_call_vectored_common+0x15c/0x2ec
BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0x9c/0x1a0
Use-after-free read at 0xc0000000fe050000 (in kfence-#2):
copy_from_kernel_nofault+0x9c/0x1a0
0xc00000000665f950
read_kcore_iter+0x57c/0xa04
proc_reg_read_iter+0xe4/0x16c
vfs_read+0x320/0x3ec
ksys_read+0x90/0x154
system_call_exception+0x120/0x310
system_call_vectored_common+0x15c/0x2ec
The pmecc "user" structure is allocated in atmel_pmecc_create_user() and
was supposed to be freed with atmel_pmecc_destroy_user(), but this other
helper is never called. One solution would be to find the proper
location to call the destructor, but the trend today is to switch to
device managed allocations, which in this case fits pretty well.
Replace kzalloc() by devm_kzalloc() and drop the destructor entirely.
Reported-by: "Dr. David Alan Gilbert" <linux@treblig.org> Closes: https://lore.kernel.org/all/ZvmIvRJCf6VhHvpo@gallifrey/ Fixes: f88fc122cc34 ("mtd: nand: Cleanup/rework the atmel_nand driver") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20241001203149.387655-1-miquel.raynal@bootlin.com Signed-off-by: Sasha Levin <sashal@kernel.org>
During early init CMA_MIN_ALIGNMENT_BYTES can be PAGE_SIZE,
since pageblock_order is still zero and it gets initialized
later during initmem_init() e.g.
setup_arch() -> initmem_init() -> sparse_init() -> set_pageblock_order()
One such use case where this causes issue is -
early_setup() -> early_init_devtree() -> fadump_reserve_mem() -> fadump_cma_init()
This causes CMA memory alignment check to be bypassed in
cma_init_reserved_mem(). Then later cma_activate_area() can hit
a VM_BUG_ON_PAGE(pfn & ((1 << order) - 1)) if the reserved memory
area was not pageblock_order aligned.
Fix it by moving the fadump_cma_init() after initmem_init(),
where other such cma reservations also gets called.
We anyway don't use any return values from fadump_cma_init(). Since
fadump_reserve_mem() from where fadump_cma_init() gets called today,
already has the required checks.
This patch makes this function return type as void. Let's also handle
extra cases like return if fadump_supported is false or dump_active, so
that in later patches we can call fadump_cma_init() separately from
setup_arch().
Acked-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/a2afc3d6481a87a305e89cfc4a3f3d2a0b8ceab3.1729146153.git.ritesh.list@gmail.com
Stable-dep-of: 05b94cae1c47 ("powerpc/fadump: Move fadump_cma_init to setup_arch() after initmem_init()") Signed-off-by: Sasha Levin <sashal@kernel.org>
When cpufreq_register_driver() returns error, the cpufreq_init() returns
without unregister platform_driver, fix by add missing
platform_driver_unregister() when cpufreq_register_driver() failed.
Fixes: f8ede0f700f5 ("MIPS: Loongson 2F: Add CPU frequency scaling support") Signed-off-by: Yuan Can <yuancan@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When multiple IRQ domains are created from the same device-tree node they
will get the same name based on the device-tree path. This will cause a
naming collision in debugFS when IRQ domain specific entries are created.
The regmap-IRQ creates per instance IRQ domains. This will lead to a
domain name conflict when a device which provides more than one
interrupt line uses the regmap-IRQ.
Add support for specifying an IRQ domain name suffix when creating a
regmap-IRQ controller.
Devices can provide multiple interrupt lines. One reason for this is that
a device has multiple subfunctions, each providing its own interrupt line.
Another reason is that a device can be designed to be used (also) on a
system where some of the interrupts can be routed to another processor.
A line often further acts as a demultiplex for specific interrupts
and has it's respective set of interrupt (status, mask, ack, ...)
registers.
Regmap supports the handling of these registers and demultiplexing
interrupts, but the interrupt domain code ends up assigning the same name
for the per interrupt line domains. This causes a naming collision in the
debugFS code and leads to confusion, as /proc/interrupts shows two separate
interrupts with the same domain name and hardware interrupt number.
Instead of adding a workaround in regmap or driver code, allow giving a
name suffix for the domain name when the domain is created.
Add a name_suffix field in the irq_domain_info structure and make
irq_domain_instantiate() use this suffix if it is given when a domain is
created.
[ tglx: Adopt it to the cleanup patch and fixup the invalid NULL return ]
irq_domain_create_simple() and irq_domain_create_legacy() use
__irq_domain_instantiate(), but have extra handling of allocating interrupt
descriptors and associating interrupts in them. Some of that is duplicated.
There are also call sites which have conditonals to invoke different
interrupt domain creator functions, where one of them is usually
irq_domain_create_legacy(). Alternatively they associate the interrupts for
the legacy case after creating the domain.
Moving the extra logic of irq_domain_create_simple()/legacy() into
__irq_domain_instantiate() allows to consolidate that.
Introduce hwirq_base and virq_base members in the irq_domain_info
structure, which allows to transport the required information and add the
conditional interrupt descriptor allocation and interrupt association into
__irq_domain_instantiate().
This reduces irq_domain_create_legacy() and irq_domain_create_simple() to
trivial wrappers which fill in the info structure and allows call sites
which must support the legacy case along with more modern mechanism to
select the domain type via the parameters of the info struct.
While design wise the idea of converting the driver to use
the hierarchy of the IRQ chips is correct, the implementation
has (inherited) flaws. This was unveiled when platform_get_irq()
had started WARN() on IRQ 0 that is supposed to be a Linux
IRQ number (also known as vIRQ).
Rework the driver to respect IRQ domain when creating each MFD
device separately, as the domain is not the same for all of them.
Fixes: 57129044f504 ("mfd: intel_soc_pmic_bxtwc: Use chained IRQs for second level IRQ chips") Tested-by: Zhang Ning <zhangn1985@outlook.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20241005193029.1929139-4-andriy.shevchenko@linux.intel.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
While design wise the idea of converting the driver to use
the hierarchy of the IRQ chips is correct, the implementation
has (inherited) flaws. This was unveiled when platform_get_irq()
had started WARN() on IRQ 0 that is supposed to be a Linux
IRQ number (also known as vIRQ).
Rework the driver to respect IRQ domain when creating each MFD
device separately, as the domain is not the same for all of them.
Fixes: 957ae5098185 ("platform/x86: Add Whiskey Cove PMIC TMU support") Fixes: 57129044f504 ("mfd: intel_soc_pmic_bxtwc: Use chained IRQs for second level IRQ chips") Reported-by: Zhang Ning <zhangn1985@outlook.com> Closes: https://lore.kernel.org/r/TY2PR01MB3322FEDCDC048B7D3794F922CDBA2@TY2PR01MB3322.jpnprd01.prod.outlook.com Tested-by: Zhang Ning <zhangn1985@outlook.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20241005193029.1929139-3-andriy.shevchenko@linux.intel.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
While design wise the idea of converting the driver to use
the hierarchy of the IRQ chips is correct, the implementation
has (inherited) flaws. This was unveiled when platform_get_irq()
had started WARN() on IRQ 0 that is supposed to be a Linux
IRQ number (also known as vIRQ).
Rework the driver to respect IRQ domain when creating each MFD
device separately, as the domain is not the same for all of them.
Fixes: 9c6235c86332 ("mfd: intel_soc_pmic_bxtwc: Add bxt_wcove_usbc device") Fixes: d2061f9cc32d ("usb: typec: add driver for Intel Whiskey Cove PMIC USB Type-C PHY") Fixes: 57129044f504 ("mfd: intel_soc_pmic_bxtwc: Use chained IRQs for second level IRQ chips") Reported-by: Zhang Ning <zhangn1985@outlook.com> Closes: https://lore.kernel.org/r/TY2PR01MB3322FEDCDC048B7D3794F922CDBA2@TY2PR01MB3322.jpnprd01.prod.outlook.com Tested-by: Zhang Ning <zhangn1985@outlook.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20241005193029.1929139-2-andriy.shevchenko@linux.intel.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
As the comment said, disable_irq() after request_irq() still has a
time gap in which interrupts can come. request_irq() with IRQF_NO_AUTOEN
flag will disable IRQ auto-enable when request IRQ.
Symbol table '.dynsym' contains 12 entries:
Num: Value Size Type Bind Vis Ndx Name
...
1: 0000000000000524 84 NOTYPE GLOBAL DEFAULT 8 __[...]@@LINUX_2.6.15
...
4: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.6.15
5: 00000000000006c0 48 NOTYPE GLOBAL DEFAULT 8 __[...]@@LINUX_2.6.15
Symbol table '.symtab' contains 56 entries:
Num: Value Size Type Bind Vis Ndx Name
...
45: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.6.15
46: 00000000000006c0 48 NOTYPE GLOBAL DEFAULT 8 __kernel_getcpu
47: 0000000000000524 84 NOTYPE GLOBAL DEFAULT 8 __kernel_clock_getres
To overcome that, commit ba83b3239e65 ("selftests: vDSO: fix vDSO
symbols lookup for powerpc64") was applied to have selftests also
look for NOTYPE symbols, but the correct fix should be to flag VDSO
entry points as functions.
The original commit that brought VDSO support into powerpc/64 has the
following explanation:
Note that the symbols exposed by the vDSO aren't "normal" function symbols, apps
can't be expected to link against them directly, the vDSO's are both seen
as if they were linked at 0 and the symbols just contain offsets to the
various functions. This is done on purpose to avoid a relocation step
(ppc64 functions normally have descriptors with abs addresses in them).
When glibc uses those functions, it's expected to use it's own trampolines
that know how to reach them.
The descriptors it's talking about are the OPD function descriptors
used on ABI v1 (big endian). But it would be more correct for a text
symbol to have type function, even if there's no function descriptor
for it.
glibc has a special case already for handling the VDSO symbols which
creates a fake opd pointing at the kernel symbol. So changing the VDSO
symbol type to function shouldn't affect that.
For ABI v2, there is no function descriptors and VDSO functions can
safely have function type.
So lets flag VDSO entry points as functions and revert the
selftest change.
For the controller reset operation(such as FLR or clear nexus ha in SCSI
EH), we will disable all PHYs and then enable PHY based on the
hisi_hba->phy_state obtained in hisi_sas_controller_reset_prepare(). If
the device is removed before controller reset or the PHY is not attached
to any device in directly attached scenario, the corresponding bit of
phy_state is not set. After controller reset done, the PHY is disabled.
The device cannot be identified even if user reconnect the disk.
Therefore, for PHYs that are not disabled by user, hisi_sas_phy_enable()
needs to be executed even if the corresponding bit of phy_state is not
set.
Fixes: 89954f024c3a ("scsi: hisi_sas: Ensure all enabled PHYs up during controller reset") Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://lore.kernel.org/r/20241008021822.2617339-5-liyihang9@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This fixes a crash when surprise hot-unplugging a PCI device. This crash
happens because during hot-unplug __iommu_group_set_domain_nofail()
attaching the default domain fails when the platform no longer
recognizes the device as it has already been removed and we end up with
a NULL domain pointer and UAF. This is exactly the case referred to in
the second comment in __iommu_device_set_domain() and just as stated
there if we can instead attach the blocking domain the UAF is prevented
as this can handle the already removed device. Implement the blocking
domain to use this handling. With this change, the crash is fixed but
we still hit a warning attempting to change DMA ownership on a blocked
device.
When a tracepoint event is created with attr.freq = 1,
'hwc->period_left' is not initialized correctly. As a result,
in the perf_swevent_overflow() function, when the first time the event occurs,
it calculates the event overflow and the perf_swevent_set_period() returns 3,
this leads to the event are recorded for three duplicate times.
Commit 0f471d31e5e8 ("clk: mediatek: Split MT8195 clock drivers and allow
module build") adds a number of new COMMON_CLK_MT8195_* config options.
Among those, the config options COMMON_CLK_MT8195_AUDSYS and
COMMON_CLK_MT8195_MSDC have no reference in the source tree and are not
used in the Makefile to include a specific file.
Drop the dead config options COMMON_CLK_MT8195_AUDSYS and
COMMON_CLK_MT8195_MSDC.
Fixes: 0f471d31e5e8 ("clk: mediatek: Split MT8195 clock drivers and allow module build") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Link: https://lore.kernel.org/r/20240927092232.386511-1-lukas.bulwahn@redhat.com Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When HW is being reset, userspace should not ring doorbell otherwise
it may lead to abnormal consequence such as RAS.
Disassociate mmap pages for all uctx to prevent userspace from ringing
doorbell to HW. Since all resources will be destroyed during HW reset,
no new mmap is allowed after HW reset is completed.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240927103323.1897094-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Provide a new api rdma_user_mmap_disassociate() for drivers to
disassociate mmap pages for a device.
Since drivers can now disassociate mmaps by calling this api,
introduce a new disassociation_lock to specifically prevent
races between this disassociation process and new mmaps. And
thus the old hw_destroy_rwsem is not needed in this api.
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240927103323.1897094-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Stable-dep-of: 615b94746a54 ("RDMA/hns: Disassociate mmap pages for all uctx when HW is being reset") Signed-off-by: Sasha Levin <sashal@kernel.org>
The CPPC performance feedback counters could be 0 or unchanged when the
target cpu is in a low-power idle state, e.g. power-gated or clock-gated.
When the counters are 0, cppc_cpufreq_get_rate() returns 0 KHz, which makes
cpufreq_online() get a false error and fail to generate a cpufreq policy.
When the counters are unchanged, the existing cppc_perf_from_fbctrs()
returns a cached desired perf, but some platforms may update the real
frequency back to the desired perf reg.
For the above cases in cppc_cpufreq_get_rate(), get the latest desired perf
from the CPPC reg to reflect the frequency because some platforms may
update the actual frequency back there; if failed, use the cached desired
perf.
Fixes: 6a4fec4f6d30 ("cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.") Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
utf8_load() requests the symbol "utf8_data_table" and then checks if the
requested UTF-8 version is supported. If it's unsupported, it tries to
put the data table using symbol_put(). If an unsupported version is
requested, symbol_put() fails like this:
That happens because symbol_put() expects the unique string that
identify the symbol, instead of a pointer to the loaded symbol. Fix that
by using such string.
When the stream_verdict program returns SK_PASS, it places the received skb
into its own receive queue, but a recursive lock eventually occurs, leading
to an operating system deadlock. This issue has been present since v6.9.
Some distros may not load nf_conntrack by default, which will cause
subsequent nf_conntrack sets to fail. Load this module if it is not
already loaded.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org>
[ Jason: add [[ -e ... ]] check so this works in the qemu harness. ] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20241117212030.629159-4-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ndev->npinfo pointer in netpoll_poll_lock() is RCU-protected but is
being accessed directly for a NULL check. While no RCU read lock is held
in this context, we should still use proper RCU primitives for
consistency and correctness.
Replace the direct NULL check with rcu_access_pointer(), which is the
appropriate primitive when only checking for NULL without dereferencing
the pointer. This function provides the necessary ordering guarantees
without requiring RCU read-side protection.
Fixes: bea3348eef27 ("[NET]: Make NAPI polling independent of struct net_device objects.") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-2-a1888dcb4a02@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since the GPIO interrupt controller is always not working properly, we need
to constantly add workaround to cope with hardware deficiencies. So just
remove GPIO interrupt controller, and let the SFP driver poll the GPIO
status.