On DWMAC3 and later, there's a RX Watchdog interrupt that's used for
interrupt coalescing. It's known to be buggy on some platforms, and
dwmac-socfpga appears to be one of them. Changing the interrupt
coalescing from ethtool doesn't appear to have any effect here.
Without disabling RIWT (Received Interrupt Watchdog Timer, I
believe...), we observe latencies while receiving traffic that amount to
around ~0.4ms. This was discovered with NTP but can be easily reproduced
with a simple ping. Without this patch :
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.657 ms
With this patch :
64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.254 ms
If an optional resource is found but fails to remap, return on failure.
Avoids any potential problems when using the iomapped resource as the
assumption is that it's available.
Fixes: 23a890d493e3 ("net: mdio: Add the reset function for IPQ MDIO driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241121193152.8966-1-rosenp@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
RFC8981 section 3.4 says that existing temporary addresses must have their
lifetimes adjusted so that no temporary addresses should ever remain "valid"
or "preferred" longer than the incoming SLAAC Prefix Information. This would
strongly imply in Linux's case that if the "mngtmpaddr" address is deleted or
un-flagged as such, its corresponding temporary addresses must be cleared out
right away.
But now the temporary address is renewed even after ‘mngtmpaddr’ is removed
or becomes unmanaged as manage_tempaddrs() set temporary addresses
prefered/valid time to 0, and later in addrconf_verify_rtnl() all checkings
failed to remove the addresses. Fix this by deleting the temporary address
directly for these situations.
Fixes: 778964f2fdf0 ("ipv6/addrconf: fix timing bug in tempaddr regen") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Passing MSG_PEEK flag to skb_recv_datagram() increments skb refcount
(skb->users) and iucv_sock_recvmsg() does not decrement skb refcount
at exit.
This results in skb memory leak in skb_queue_purge() and WARN_ON in
iucv_sock_destruct() during socket close. To fix this decrease
skb refcount by one if MSG_PEEK is set in order to prevent memory
leak and WARN_ON.
Unaligned direct writes are invalid and should return an error
without making any changes, rather than extending ->valid_size
and then returning an error. Therefore, alignment checking is
required before extending ->valid_size.
Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Co-developed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit under fixes extended extack reporting to dumps.
It works under normal conditions, because extack errors are
usually reported during ->start() or the first ->dump(),
it's quite rare that the dump starts okay but fails later.
If the dump does fail later, however, the input skb will
already have the initiating message pulled, so checking
if bad attr falls within skb->data will fail.
Switch the check to using nlh, which is always valid.
syzbot found a way to hit that scenario by filling up
the receive queue. In this case we initiate a dump
but don't call ->dump() until there is read space for
an skb.
Reported-by: syzbot+d4373fa8042c06cefa84@syzkaller.appspotmail.com Fixes: 8af4f60472fc ("netlink: support all extack types in dumps") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241119224432.1713040-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
VCAP API unit tests fail randomly with errors such as
# vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:387
Expected 134 + 7 == iter.offset, but
134 + 7 == 141 (0x8d)
iter.offset == 17214 (0x433e)
# vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:388
Expected 5 == iter.reg_idx, but
iter.reg_idx == 702 (0x2be)
# vcap_api_iterator_init_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:389
Expected 11 == iter.reg_bitpos, but
iter.reg_bitpos == 15 (0xf)
# vcap_api_iterator_init_test: pass:0 fail:1 skip:0 total:1
Comments in the code state that "A typegroup table ends with an all-zero
terminator". Add the missing terminators.
Some of the typegroups did have a terminator of ".offset = 0, .width = 0,
.value = 0,". Replace those terminators with "{ }" (no trailing ',') for
consistency and to excplicitly state "this is a terminator".
Fixes: 67d637516fa9 ("net: microchip: sparx5: Adding KUNIT test for the VCAP API") Cc: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241119213202.2884639-1-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Validate Wake-on-LAN (WoL) options in `lan78xx_set_wol` before calling
`usb_autopm_get_interface`. This prevents USB autopm refcounting issues
and ensures the adapter can properly enter autosuspend when invalid WoL
options are provided.
The hardware on Broadcom 1G chipsets have a known limitation
where they cannot handle DMA addresses that cross over 4GB.
When such an address is encountered, the hardware sets the
address overflow error bit in the DMA status register and
triggers a reset.
However, BCM57766 hardware is setting the overflow bit and
triggering a reset in some cases when there is no actual
underlying address overflow. The hardware team analyzed the
issue and concluded that it is happening when the status
block update has an address with higher (b16 to b31) bits
as 0xffff following a previous update that had lowest bits
as 0xffff.
To work around this bug in the BCM57766 hardware, set the
coherent dma mask from the current 64b to 31b. This will
ensure that upper bits of the status block DMA address are
always at most 0x7fff, thus avoiding the improper overflow
check described above. This work around is intended for only
status block and ring memories and has no effect on TX and
RX buffers as they do not require coherent memory.
Add calls to `phy_device_free` after `fixed_phy_unregister` to fix a
memory leak that occurs when the device is unplugged. This ensures
proper cleanup of pseudo fixed-link PHYs.
Fixes: 89b36fb5e532 ("lan78xx: Lan7801 Support for Fixed PHY") Cc: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241116130558.1352230-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In lan78xx_probe(), the buffer `buf` was being freed twice: once
implicitly through `usb_free_urb(dev->urb_intr)` with the
`URB_FREE_BUFFER` flag and again explicitly by `kfree(buf)`. This caused
a double free issue.
To resolve this, reordered `kmalloc()` and `usb_alloc_urb()` calls to
simplify the initialization sequence and removed the redundant
`kfree(buf)`. Now, `buf` is allocated after `usb_alloc_urb()`, ensuring
it is correctly managed by `usb_fill_int_urb()` and freed by
`usb_free_urb()` as intended.
Correct bq27426 registers, according to technical reference manual
it does not have Design Capacity register so it is not register
compatible with bq27421.
Fixes: 5ef6a16033b47 ("power: supply: bq27xxx: Add support for BQ27426") Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Link: https://lore.kernel.org/r/20241016-fix_bq27426-v2-1-aa6c0f51a9f6@mainlining.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The put_device() call in power_supply_put() may call
power_supply_dev_release(). The latter function does not sleep so
power_supply_put() doesn't sleep either. Hence, remove the might_sleep()
call from power_supply_put(). This patch suppresses false positive
complaints about calling a sleeping function from atomic context if
power_supply_put() is called from atomic context.
Cc: Kyle Tso <kyletso@google.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Fixes: 1a352462b537 ("power_supply: Add power_supply_put for decrementing device reference counter") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240917193914.47566-1-bvanassche@acm.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There are no failed test cases compiled with the lower version of GCC
such as 13.3.0, while the problems only appear with higher version of
GCC such as 14.2.0.
This is because the problems were hidden by the lower version of GCC due
to redundant sign extension instructions generated by compiler, but with
optimization of higher version of GCC, the sign extension instructions
have been removed.
(4) Root Cause Analysis:
The LoongArch architecture does not expose sub-registers, and hold all
32-bit values in a sign-extended format. While BPF, on the other hand,
exposes sub-registers, and use zero-extension (similar to arm64/x86).
This has led to some subtle bugs, where a BPF JITted program has not
sign-extended the a0 register (return value in LoongArch land), passed
the return value up the kernel, for example:
| int from_bpf(void);
|
| long foo(void)
| {
| return from_bpf();
| }
Here, a0 would be 0xffffffff instead of the expected 0xffffffffffffffff.
Internally, the LoongArch JIT uses a5 as a dedicated register for BPF
return values. That is to say, the LoongArch BPF uses a5 for BPF return
values, which are zero-extended, whereas the LoongArch ABI uses a0 which
is sign-extended.
(5) Final Solution:
Keep a5 zero-extended, but explicitly sign-extend a0 (which is used
outside BPF land). Because libbpf currently defines the return value
of an ebpf program as a 32-bit unsigned integer, just use addi.w to
extend bit 31 into bits 63 through 32 of a5 to a0. This is similar to
commit 2f1b0d3d7331 ("riscv, bpf: Sign-extend return values").
Whenever I try to build the kernel with upcoming GCC 15 which defaults
to -std=gnu23 I get a build failure:
CC arch/loongarch/vdso/vgetcpu.o
In file included from ./include/uapi/linux/posix_types.h:5,
from ./include/uapi/linux/types.h:14,
from ./include/linux/types.h:6,
from ./include/linux/kasan-checks.h:5,
from ./include/asm-generic/rwonce.h:26,
from ./arch/loongarch/include/generated/asm/rwonce.h:1,
from ./include/linux/compiler.h:317,
from ./include/asm-generic/bug.h:5,
from ./arch/loongarch/include/asm/bug.h:60,
from ./include/linux/bug.h:5,
from ./include/linux/mmdebug.h:5,
from ./include/linux/mm.h:6,
from ./arch/loongarch/include/asm/vdso.h:10,
from arch/loongarch/vdso/vgetcpu.c:6:
./include/linux/stddef.h:11:9: error: expected identifier before 'false'
11 | false = 0,
| ^~~~~
./include/linux/types.h:35:33: error: two or more data types in declaration specifiers
35 | typedef _Bool bool;
| ^~~~
./include/linux/types.h:35:1: warning: useless type name in empty declaration
35 | typedef _Bool bool;
| ^~~~~~~
The kernel builds explicitly with -std=gnu11 in top Makefile, but
arch/loongarch/vdso does not use KBUILD_CFLAGS from the rest of the
kernel, just add -std=gnu11 flag to arch/loongarch/vdso/Makefile.
By the way, commit e8c07082a810 ("Kbuild: move to -std=gnu11") did a
similar change for arch/arm64/kernel/vdso32/Makefile.
Fixes: c6b99bed6b8f ("LoongArch: Add VDSO and VSYSCALL support") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add the missing 'name' parameter to the mount_api documentation for
fs_validate_description().
Fixes: 96cafb9ccb15 ("fs_parser: remove fs_parameter_description name field") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20241125215021.231758-1-rdunlap@infradead.org Cc: Eric Sandeen <sandeen@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There are cases where a PCIe extended capability should be hidden from
the user. For example, an unknown capability (i.e., capability with ID
greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally
chosen to be hidden from the user.
Hiding a capability is done by virtualizing and modifying the 'Next
Capability Offset' field of the previous capability so it points to the
capability after the one that should be hidden.
The special case where the first capability in the list should be hidden
is handled differently because there is no previous capability that can
be modified. In this case, the capability ID and version are zeroed
while leaving the next pointer intact. This hides the capability and
leaves an anchor for the rest of the capability list.
However, today, hiding the first capability in the list is not done
properly if the capability is unknown, as struct
vfio_pci_core_device->pci_config_map is set to the capability ID during
initialization but the capability ID is not properly checked later when
used in vfio_config_do_rw(). This leads to the following warning [1] and
to an out-of-bounds access to ecap_perms array.
Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater
than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct
read only access instead of the ecap_perms array.
Note that this is safe since the above is the only case where cap_id can
exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which
are already checked before).
Currently the mount_setattr_test fails on machines with a 64K PAGE_SIZE,
with errors such as:
# RUN mount_setattr_idmapped.invalid_fd_negative ...
mkfs.ext4: No space left on device while writing out and closing file system
# mount_setattr_test.c:1055:invalid_fd_negative:Expected system("mkfs.ext4 -q /mnt/C/ext4.img") (256) == 0 (0)
# invalid_fd_negative: Test terminated by assertion
# FAIL mount_setattr_idmapped.invalid_fd_negative
not ok 12 mount_setattr_idmapped.invalid_fd_negative
At first glance it seems like that should never work, after all 2MB is
larger than 100,000 bytes. However the filesystem image doesn't actually
occupy 2MB on "disk" (actually RAM, due to tmpfs). On 4K kernels the
ext4.img uses ~84KB of actual space (according to du), which just fits.
However on 64K PAGE_SIZE kernels the ext4.img takes at least 256KB,
which is too large to fit in the tmpfs, hence the errors.
It seems fraught to rely on the ext4.img taking less space on disk than
the allocated size, so instead create the tmpfs with a size of 2MB. With
that all 21 tests pass on 64K PAGE_SIZE kernels.
Fix unwind flows in mlx5vf_pci_save_device_data() and
mlx5vf_pci_resume_device_data() to avoid freeing the migf pointer at the
'end' label, as this will be handled by fput(migf->filp) through
mlx5vf_release_file().
To ensure mlx5vf_release_file() functions correctly, move the
initialization of migf fields (such as migf->lock) to occur before any
potential unwind flow, as these fields may be accessed within
mlx5vf_release_file().
The starting iova address to iterate iotlb map entry within a range
was set to an irrelevant value when passing to the itree_next()
iterator, although luckily it doesn't affect the outcome of finding
out the granule of the smallest iotlb map size. Fix the code to make
it consistent with the following for-loop.
Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20241021134040.975221-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the following register definitions for REG_CSR_2L_RX{0,1}_REV0
registers:
- CSR_2L_PXP_VOS_PNINV
- CSR_2L_PXP_FE_GAIN_NORMAL_MODE
- CSR_2L_PXP_FE_GAIN_TRAIN_MODE
Commit 120584c728a6 ("hwmon: (aquacomputer_d5next) Add support for Octo
flow sensor") added support for reading Octo flow sensor, but didn't
update the priv->speed_input array length. Since Octo has 8 fans, with
the addition of the flow sensor the proper length for speed_input is 9.
Reported by Arne Schwabe on Github [1], who received a UBSAN warning.
Fixes: 120584c728a6 ("hwmon: (aquacomputer_d5next) Add support for Octo flow sensor") Closes: https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/100 [1] Reported-by: Arne Schwabe <arne@rfc2549.org> Signed-off-by: Aleksa Savic <savicaleksa83@gmail.com>
Message-ID: <20241124152725.7205-1-savicaleksa83@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Negative temperatures are reported as large positive temperatures
due to missing sign extension from unsigned int to long. Cast unsigned
raw register values to signed before performing the calculations
to fix the problem.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
while ((copy = nfsd4_get_copy(clp)) != NULL)
nfsd4_stop_copy(copy);
nfsd4_get_copy() bumps @copy's reference count, preventing
nfsd4_stop_copy() from releasing @copy.
A while loop like this usually works by removing the first element
of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
alters the async_copies list.
Best I can tell, then, is that nfsd4_shutdown_copy() continues to
loop until other threads manage to remove all the items from this
list. The spinning loop blocks shutdown until these items are gone.
Possibly the reason we haven't seen this issue in the field is
because client_has_state() prevents __destroy_client() from calling
nfsd4_shutdown_copy() if there are any items on this list. In a
subsequent patch I plan to remove that restriction.
Fixes: e0639dc5805a ("NFSD introduce async copy feature") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not
destroy the percpu counters which init in svc_rdma_proc_init().
If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the
'percpu_counters' list. The above issue may occur once the module is
removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory
leakage occurs.
To solve above issue just destroy all percpu counters when
register_sysctl() return NULL.
Fixes: 1e7e55731628 ("svcrdma: Restore read and write stats") Fixes: 22df5a22462e ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter") Fixes: df971cd853c0 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The last reference for `cache_head` can be reduced to zero in `c_show`
and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently,
`svc_export_put` and `expkey_put` will be invoked, leading to two
issues:
1. The `svc_export_put` will directly free ex_uuid. However,
`e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can
trigger a use-after-free issue, shown below.
==================================================================
BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd]
Read of size 1 at addr ff11000010fdc120 by task cat/870
2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`.
However, `svc_export_put`/`expkey_put` will call path_put, which
subsequently triggers a sleeping operation due to the following
`dput`.
Fix these issues by using `rcu_work` to help release
`svc_expkey`/`svc_export`. This approach allows for an asynchronous
context to invoke `path_put` and also facilitates the freeing of
`uuid/exp/key` after an RCU grace period.
Fixes: 9ceddd9da134 ("knfsd: Allow lockless lookups of the exports") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
@ses is initialized to NULL. If __nfsd4_find_backchannel() finds no
available backchannel session, setup_callback_client() will try to
dereference @ses and segfault.
Fixes: dcbeaa68dbbd ("nfsd4: allow backchannel recovery") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If platform_get_resource_byname() fails and returns NULL because DT lacks
an 'mmio' property for the MHI endpoint, dereferencing res->start will
cause a NULL pointer access. Add a check to prevent it.
Fixes: 1bf5f25324f7 ("PCI: endpoint: Add PCI Endpoint function driver for MHI bus") Link: https://lore.kernel.org/r/20241105120735.1240728-1-quic_zhonhan@quicinc.com Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
[kwilczynski: error message update per the review feedback] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Any write access to the IMEM region when the Q6 is setting up XPU
protection on it will result in a XPU violation. Fix this by ensuring
IMEM writes related to the MBA post-mortem logs happen before the Q6
is brought out of reset.
Fixes: 318130cc9362 ("remoteproc: qcom_q6v5_mss: Add MBA log extraction support") Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20240819073020.3291287-1-quic_sibis@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The name len field of the CMD_OPEN packet is only 16-bits and the upper
16-bits of "param2" are a different "prio" field, which can be nonzero in
certain situations, and CMD_OPEN packets can be unexpectedly dropped
because of this.
Fix this by masking out the upper 16 bits of param2.
Current implementation of adsp_probe() in qcom_q6v5_adsp.c and does not
remove the subdevs of adsp on the error path. Fix this bug by calling
qcom_remove_{ssr,sysmon,pdm,smd,glink}_subdev(), and qcom_q6v5_deinit()
appropriately.
Current implementation of adsp_probe() in qcom_q6v5_pas.c does not
remove the subdevs of adsp on the error path. Fix this bug by calling
qcom_remove_{ssr,sysmon,pdm,smd,glink}_subdev(), qcom_q6v5_deinit(), and
adsp_unassign_memory_region() appropriately.
syscall__scnprintf_args may not place anything in the output buffer
(e.g., because the arguments are all zero). If that happened in
trace__fprintf_sys_enter, its fprintf would receive an unitialized
buffer leading to garbage output.
Fix the problem by passing the (possibly zero) bounds of the argument
buffer to the output fprintf.
Fixes: a98392bb1e169a04 ("perf trace: Use beautifiers on syscalls:sys_enter_ handlers") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-2-benjamin@engflow.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a perf trace event selector specifies a maximum number of events to output
(i.e., "/nr=N/" syntax), the event printing handler, trace__event_handler,
disables the event selector after the maximum number events are
printed.
Furthermore, trace__event_handler checked if the event selector was
disabled before doing any work. This avoided exceeding the maximum
number of events to print if more events were in the buffer before the
selector was disabled.
However, the event selector can be disabled for reasons other than
exceeding the maximum number of events. In particular, when the traced
subprocess exits, the main loop disables all event selectors. This meant
the last events of a traced subprocess might be lost to the printing
handler's short-circuiting logic.
This nondeterministic problem could be seen by running the following many times:
trace__event_handler should simply check for exceeding the maximum number of
events to print rather than the state of the event selector.
Fixes: a9c5e6c1e9bff42c ("perf trace: Introduce per-event maximum number of events property") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241107232128.108981-1-benjamin@engflow.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There exists a pids_filtered map in augmented_raw_syscalls.bpf.c that
ceases to provide functionality after the BPF skeleton migration done
in:
5e6da6be3082f77b ("perf trace: Migrate BPF augmentation to use a skeleton")
Before the migration, pid_filtered map works, courtesy of Arnaldo
Carvalho de Melo <acme@kernel.org>:
⬢ [acme@toolbox perf-tools]$ git log --oneline -5 6f769c3458b6cf2d (HEAD) perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename 7777ac3dfe29f55d perf test trace+probe_vfs_getname.sh: Remove stray \ before / 33d9c5062113a4bd perf script python: Add stub for PMU symbol to the python binding e59fea47f83e8a9a perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols 878460e8d0ff84a0 perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0
After using the heuristic, write is hooked to syscall_unaugmented, which
returns 1.
SEC("tp/raw_syscalls/sys_enter")
int syscall_unaugmented(struct syscall_enter_args *args)
{
return 1;
}
If the BPF program returns 1, the tracepoint filter will filter it
(since the tracepoint filter for perf is correctly set), but before the
heuristic, when it was hooked to a sys_enter_openat(), which is a BPF
program that calls bpf_perf_event_output() and writes to the buffer, it
didn't get filtered, thus creating feedback loop. So switching write to
unaugmented accidentally fixed the problem.
But some syscalls are not so lucky, for example newfstatat:
perf $ ./perf trace -e newfstatat --max-events=100 & echo #!
[1] 2166948
Currently, write is augmented by the new BTF general augmenter (which
calls bpf_perf_event_output()). The problem, which luckily got fixed,
resurfaced, and that’s how it was discovered.
Fixes: 5e6da6be3082f77b ("perf trace: Migrate BPF augmentation to use a skeleton") Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241030052431.2220130-1-howardchu95@gmail.com
[ Check if trace->skel is non-NULL, as it is only initialized if trace->trace_syscalls is set ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: e5c6109f4813246a ("perf list: Reorganize to use callbacks to allow honouring command line options") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Jean-Philippe Romain <jean-philippe.romain@foss.st.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Junhao He <hejunhao3@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241109025801.560378-1-irogers@google.com
[ I fixed the two callers and added it to Jean-Phillippe's original change. ] Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The inode that nfs4_open_delegation() passes to this function is
wrong, which throws off the result. The inode will end up getting a
directory-style change attr instead of a regular-file-style one.
Fix up nfs4_delegation_stat() to fetch STATX_MODE, and then drop the
inode parameter from nfsd4_change_attribute(), since it's no longer
needed.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When CONFIG_FEC is set (due to COMPILE_TEST) along with
CONFIG_M54xx, coldfire/device.c has compile errors due to
missing MCFEC_* and MCF_IRQ_FEC_* symbols.
Make the whole FEC blocks dependent on having the HW macros
defined, rather than on CONFIG_FEC itself.
This fix is very similar to commit e6e1e7b19fa1 ("m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined")
Fixes: b7ce7f0d0efc ("m68knommu: merge common ColdFire FEC platform setup code")
To: Greg Ungerer <gerg@linux-m68k.org>
To: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-m68k@lists.linux-m68k.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Signed-off-by: Greg Ungerer <gerg@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a typo in the CONFIG_M5441x preprocessor condition, where the GPIO
register offset was incorrectly set to 8 instead of 0. This prevented
proper GPIO configuration for m5441x targets.
Fixes: bea8bcb12da0 ("m68knommu: Add support for the Coldfire m5441x.") Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> Signed-off-by: Greg Ungerer <gerg@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
trace__fprintf_tp_fields may not print any tracepoint arguments. E.g., if the
argument values are all zero. Previously, this would result in a totally
uninitialized buffer being passed to fprintf, which could lead to garbage on the
console. Fix the problem by passing the number of initialized bytes fprintf.
Fixes: f11b2803bb88 ("perf trace: Allow choosing how to augment the tracepoint arguments") Signed-off-by: Benjamin Peterson <benjamin@engflow.com> Tested-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20241103204816.7834-1-benjamin@engflow.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Jinsu Lee reported a performance regression issue, after commit 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data
inode"), we forced direct write to use buffered IO on inline_data
inode, it will cause performace regression due to memory copy
and data flush.
It's fine to not force direct write to use buffered IO, as it
can convert inline inode before committing direct write IO.
Fixes: 5c8764f8679e ("f2fs: fix to force buffered IO on inline_data inode") Reported-by: Jinsu Lee <jinsu1.lee@samsung.com> Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
f2fs_map_blocks() supports to map continuous holes or preallocated
address, we should avoid setting F2FS_MAP_MAPPED for these cases
only, otherwise, it may fail f2fs_iomap_begin(), and make direct
write fallbacking to use buffered IO and flush, result in performance
regression.
Fixes: 9f0f6bf42714 ("f2fs: support to map continuous holes or preallocated address") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409122103.e45aa13b-oliver.sang@intel.com Cc: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit c7f114d864ac ("f2fs: fix to avoid use-after-free in
f2fs_stop_gc_thread()") attempted to fix this issue by using a read
semaphore to prevent races between shutdown and remount threads, but
it fails to prevent all race conditions.
Fix it by converting to write lock of s_umount in f2fs_do_shutdown().
Fixes: 7950e9ac638e ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When building with custom libtraceevent, below errors occur:
$ make -C tools/perf NO_LIBPYTHON=1 PKG_CONFIG_PATH=<custom libtraceevent>
In file included from util/session.h:5,
from builtin-buildid-list.c:17:
util/trace-event.h:153:10: fatal error: traceevent/event-parse.h: No such file or directory
153 | #include <traceevent/event-parse.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
<snip similar errors of missing headers>
This is because the include path is missed in the cflags. Add it.
Fixes: 0f0e1f445690 ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Guilherme Amadio <amadio@gentoo.org> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20241024133236.31016-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to Section 2.2 of the PCI Express Card Electromechanical
Specification (Revision 5.1), in order to ensure that the power and the
reference clock are stable, PERST# has to be deasserted after a delay of
100 milliseconds (TPVPERL).
Currently, it is being assumed that the power is already stable, which
is not necessarily true.
Hence, change the delay to PCIE_T_PVPERL_MS to guarantee that power and
reference clock are stable.
Fixes: f3e25911a430 ("PCI: j721e: Add TI J721E PCIe driver") Fixes: f96b69713733 ("PCI: j721e: Use T_PERST_CLK_US macro") Link: https://lore.kernel.org/r/20241104074420.1862932-1-s-vadapalli@ti.com Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add suspend and resume support. Only the Root Complex mode is supported.
During the suspend stage PERST# is asserted, then deasserted during the
resume stage.
Link: https://lore.kernel.org/linux-pci/20240102-j7200-pcie-s2r-v7-7-a2f9156da6c3@bootlin.com Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
[kwilczynski: commit log, update references to the PCI SIG specification] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Stable-dep-of: 22a9120479a4 ("PCI: j721e: Deassert PERST# after a delay of PCIE_T_PVPERL_MS milliseconds") Signed-off-by: Sasha Levin <sashal@kernel.org>
Add reset GPIO to struct j721e_pcie, so it can be used at suspend and
resume stages.
Link: https://lore.kernel.org/linux-pci/20240102-j7200-pcie-s2r-v7-4-a2f9156da6c3@bootlin.com Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Stable-dep-of: 22a9120479a4 ("PCI: j721e: Deassert PERST# after a delay of PCIE_T_PVPERL_MS milliseconds") Signed-off-by: Sasha Levin <sashal@kernel.org>
During the resume sequence of the host, cdns_pcie_host_init() needs to be
called, so set it global.
The dev function parameter is removed, as it isn't used.
Link: https://lore.kernel.org/linux-pci/20240102-j7200-pcie-s2r-v7-2-a2f9156da6c3@bootlin.com Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Stable-dep-of: 22a9120479a4 ("PCI: j721e: Deassert PERST# after a delay of PCIE_T_PVPERL_MS milliseconds") Signed-off-by: Sasha Levin <sashal@kernel.org>
The function cdns_pcie_host_setup() mixes probe structure and link setup.
The link setup must be done during the resume sequence. So extract it from
cdns_pcie_host_setup() and create a dedicated function.
Link: https://lore.kernel.org/linux-pci/20240102-j7200-pcie-s2r-v7-1-a2f9156da6c3@bootlin.com Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Stable-dep-of: 22a9120479a4 ("PCI: j721e: Deassert PERST# after a delay of PCIE_T_PVPERL_MS milliseconds") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the endpoint cleanup function dw_pcie_ep_cleanup() and EPF
deinit notify function pci_epc_deinit_notify() are called during the
execution of pex_ep_event_pex_rst_assert() i.e., when the host has asserted
PERST#. But quickly after this step, refclk will also be disabled by the
host.
All of the tegra194 endpoint SoCs supported as of now depend on the refclk
from the host for keeping the controller operational. Due to this
limitation, any access to the hardware registers in the absence of refclk
will result in a whole endpoint crash. Unfortunately, most of the
controller cleanups require accessing the hardware registers (like eDMA
cleanup performed in dw_pcie_ep_cleanup(), etc...). So these cleanup
functions can cause the crash in the endpoint SoC once host asserts PERST#.
One way to address this issue is by generating the refclk in the endpoint
itself and not depending on the host. But that is not always possible as
some of the endpoint designs do require the endpoint to consume refclk from
the host.
Thus, fix this crash by moving the controller cleanups to the start of
the pex_ep_event_pex_rst_deassert() function. This function is called
whenever the host has deasserted PERST# and it is guaranteed that the
refclk would be active at this point. So at the start of this function
(after enabling resources) the controller cleanup can be performed. Once
finished, rest of the code execution for PERST# deassert can continue as
usual.
Fixes: 473b2cf9c4d1 ("PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF drivers") Fixes: 570d7715eed8 ("PCI: dwc: ep: Introduce dw_pcie_ep_cleanup() API for drivers supporting PERST#") Link: https://lore.kernel.org/r/20240817-pci-qcom-ep-cleanup-v1-2-d6b958226559@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Vidya Sagar <vidyas@nvidia.com> Cc: linux-tegra@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the endpoint cleanup function dw_pcie_ep_cleanup() and EPF
deinit notify function pci_epc_deinit_notify() are called during the
execution of qcom_pcie_perst_assert() i.e., when the host has asserted
PERST#. But quickly after this step, refclk will also be disabled by the
host.
All of the Qcom endpoint SoCs supported as of now depend on the refclk from
the host for keeping the controller operational. Due to this limitation,
any access to the hardware registers in the absence of refclk will result
in a whole endpoint crash. Unfortunately, most of the controller cleanups
require accessing the hardware registers (like eDMA cleanup performed in
dw_pcie_ep_cleanup(), powering down MHI EPF etc...). So these cleanup
functions are currently causing the crash in the endpoint SoC once host
asserts PERST#.
One way to address this issue is by generating the refclk in the endpoint
itself and not depending on the host. But that is not always possible as
some of the endpoint designs do require the endpoint to consume refclk from
the host (as I was told by the Qcom engineers).
Thus, fix this crash by moving the controller cleanups to the start of
the qcom_pcie_perst_deassert() function. qcom_pcie_perst_deassert() is
called whenever the host has deasserted PERST# and it is guaranteed that
the refclk would be active at this point. So at the start of this function
(after enabling resources), the controller cleanup can be performed. Once
finished, rest of the code execution for PERST# deassert can continue as
usual.
Fixes: 473b2cf9c4d1 ("PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF drivers") Fixes: 570d7715eed8 ("PCI: dwc: ep: Introduce dw_pcie_ep_cleanup() API for drivers supporting PERST#") Link: https://lore.kernel.org/r/20240817-pci-qcom-ep-cleanup-v1-1-d6b958226559@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If gc_mode is set to GC_URGENT_LOW or GC_URGENT_MID, cost benefit GC
approach should be used, but if ATGC is enabled at the same time,
Age-threshold approach will be selected, which can only do amount of
GC and it is much less than the numbers of CB approach.
Both threads try to acquire read lock of lock A, then its upcoming write
lock grabber will trigger deadlock.
Let's always create an asynchronous task in f2fs_handle_critical_error()
rather than calling f2fs_record_stop_reason() synchronously to avoid
this potential deadlock issue.
Fixes: b62e71be2110 ("f2fs: support errors=remount-ro|continue|panic mountoption") Reported-by: syzbot+be4a9983e95a5e25c8d3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6704d667.050a0220.1e4d62.0081.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the following compilation warning:
fs/f2fs/data.c:2391:10: warning: variable ‘index’ set but not used
[-Wunused-but-set-variable]
2391 | pgoff_t index;
Only define and set the variable index when the CONFIG_F2FS_FS_COMPRESSION
is enabled.
Fixes: db92e6c729d8 ("f2fs: convert f2fs_mpage_readpages() to use folio") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the __f2fs_init_atgc_curseg->get_atssr_segment calling,
curseg->segno is NULL_SEGNO, indicating that there is no summary
block that needs to be written.
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This f2fs_bug_on was introduced by commit 2c1905042c8c ("f2fs: check
segment type in __f2fs_replace_block") when there were only 6 curseg types.
After commit d0b9e42ab615 ("f2fs: introduce inmem curseg") was introduced,
the condition should be changed to checking curseg->seg_type.
When a new device hotjoins, a new dynamic address is assigned.
i3c_master_add_i3c_dev_locked() identifies that the device was previously
attached to the bus and locates the olddev.
i3c_master_add_i3c_dev_locked()
{
...
olddev = i3c_master_search_i3c_dev_duplicate(newdev);
...
if (olddev) {
...
i3c_dev_disable_ibi_locked(olddev);
^^^^^^
The olddev should not receive any commands on the i3c bus as it
does not exist and has been assigned a new address. This will
result in NACK or timeout. So remove it.
}
i3c_dev_free_ibi_locked(olddev);
^^^^^^^^
This function internally calls i3c_dev_disable_ibi_locked() function
causing to send DISEC command with old Address.
The olddev should not receive any commands on the i3c bus as it
does not exist and has been assigned a new address. This will
result in NACK or timeout. So, update the olddev->ibi->enabled
flag to false to avoid DISEC with OldAddr.
}
Include part of Ravindra Yashvant Shinde's work:
https://lore.kernel.org/linux-i3c/20240820151917.3904956-1-ravindra.yashvant.shinde@nxp.com/T/#u
Fixes: 317bacf960a4 ("i3c: master: add enable(disable) hot join in sys entry") Co-developed-by: Ravindra Yashvant Shinde <ravindra.yashvant.shinde@nxp.com> Signed-off-by: Ravindra Yashvant Shinde <ravindra.yashvant.shinde@nxp.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241001162232.223724-1-Frank.Li@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.
2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().
3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().
4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs need DMA-able address, so
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passed the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():
if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
return NULL;
5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.
A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal read. And for virtio-fs write initiated
from kernel, it has the similar problem but now there is no way to limit
fc->max_write in kernel.
So instead of limiting both the values of max_read and max_write in
kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of pointer to pass the KVEC_IO data.
After switching to pages for KVEC_IO data, these pages will be used for
DMA through virtio-fs. If these pages are backed by vmalloc(),
{flush|invalidate}_kernel_vmap_range() are necessary to flush or
invalidate the cache before the DMA operation. So add two new fields in
fuse_args_pages to record the base address of vmalloc area and the
condition indicating whether invalidation is needed. Perform the flush
in fuse_get_user_pages() for write operations and the invalidation in
fuse_release_user_pages() for read operations.
It may seem necessary to introduce another field in fuse_conn to
indicate that these KVEC_IO pages are used for DMA, However, considering
that virtio-fs is currently the only user of use_pages_for_kvec_io, just
reuse use_pages_for_kvec_io to indicate that these pages will be used
for DMA.
Fix several issues with rustdoc formatting for the
`kernel::block::mq::Request` module, in particular:
- An ordered list not rendering correctly, fixed by using numbers
prefixes instead of letters.
- Code snippets formatted as regular text, fixed by wrapping the
code with `back-ticks`.
- References to types missing intra-doc links, fixed by wrapping the
types with [square brackets].
Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://github.com/Rust-for-Linux/linux/issues/1108 Signed-off-by: Francesco Zardi <frazar00@gmail.com> Acked-by: Andreas Hindborg <a.hindborg@kernel.org> Fixes: 3253aba3408a ("rust: block: introduce `kernel::block::mq` module") Link: https://lore.kernel.org/r/20240903173027.16732-3-frazar00@gmail.com
[ Added an extra intra-doc link. Took the chance to add some periods
for consistency. Reworded slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use PCI_POSSIBLE_ERROR() to check the response we get when we read data
from hardware. This unifies PCI error response checking and makes error
checks consistent and easier to find.
The doc comment for `ThisModule` incorrectly states the C header file
for `THIS_MODULE` as `include/linux/export.h`, while the correct path is
`include/linux/init.h`. This is because `THIS_MODULE` was moved in
commit 5b20755b7780 ("init: move THIS_MODULE from <linux/export.h> to
<linux/init.h>").
Update the doc comment for `ThisModule` to reflect the correct header
file path for `THIS_MODULE`.
Fixes: 5b20755b7780 ("init: move THIS_MODULE from <linux/export.h> to <linux/init.h>") Signed-off-by: Yutaro Ohno <yutaro.ono.418@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/ZxXDZwxWgoEiIYkj@ohnotp Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
# perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)"
probe-definition(0): test=print_data(int)
symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /home/niayan01/test_cpp_mangle
Try to find probe point from debuginfo.
Symbol print_data(int) address found : afc
Matched function: print_data [2ccf]
Probe point found: print_data+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//uprobe_events write=1
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xb38
...
When tried to probe symbol "print_data(int)", the log shows:
Symbol print_data(int) address found : afc
The found address is 0xafc - which is right with verifying the output
result from nm. Afterwards when write event, the command uses offset
0xb38 in the last log, which is a wrong address.
The dwarf_diename() gets a common function name, in above case, it
returns string "print_data". As a result, the tool parses the offset
based on the common name. This leads to probe at the wrong symbol
"print_data(Point&)".
To fix the issue, use the die_get_linkage_name() function to retrieve
the distinct linkage name - this is the mangled name for the C++ case.
Based on this unique name, the tool can get a correct offset for
probing. Based on DWARF doc, it is possible the linkage name is missed
in the DIE, it rolls back to use dwarf_diename().
After:
# perf --debug verbose=3 probe -x test_cpp_mangle --add "test=print_data(int)"
probe-definition(0): test=print_data(int)
symbol:print_data(int) file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /home/niayan01/test_cpp_mangle
Try to find probe point from debuginfo.
Symbol print_data(int) address found : afc
Matched function: print_data [2d06]
Probe point found: print_data+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//uprobe_events write=1
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0xafc
Added new event:
probe_test_cpp_mangle:test (on print_data(int) in /home/niayan01/test_cpp_mangle)
You can now use it in all perf tools, such as:
perf record -e probe_test_cpp_mangle:test -aR sleep 1
# perf --debug verbose=3 probe -x test_cpp_mangle --add "test2=print_data(Point&)"
probe-definition(0): test2=print_data(Point&)
symbol:print_data(Point&) file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /home/niayan01/test_cpp_mangle
Try to find probe point from debuginfo.
Symbol print_data(Point&) address found : b38
Matched function: print_data [2ccf]
Probe point found: print_data+0
Found 1 probe_trace_events.
Opening /sys/kernel/tracing//uprobe_events write=1
Parsing probe_events: p:probe_test_cpp_mangle/test /home/niayan01/test_cpp_mangle:0x0000000000000afc
Group:probe_test_cpp_mangle Event:test probe:p
Opening /sys/kernel/tracing//README write=0
Writing event: p:probe_test_cpp_mangle/test2 /home/niayan01/test_cpp_mangle:0xb38
Added new event:
probe_test_cpp_mangle:test2 (on print_data(Point&) in /home/niayan01/test_cpp_mangle)
You can now use it in all perf tools, such as:
perf record -e probe_test_cpp_mangle:test2 -aR sleep 1
Fixes: fb1587d869a3 ("perf probe: List probes with line number and file name") Signed-off-by: Leo Yan <leo.yan@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20241012141432.877894-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add missing dwarf_cfi_end to free memory associated with probe_finder
cfi_eh which is allocated and owned via a call to
dwarf_getcfi_elf. Confusingly cfi_dbg shouldn't be freed as its memory
is owned by the passed in debuginfo struct. Add comments to highlight
this.
This addresses leak sanitizer issues seen in:
tools/perf/tests/shell/test_uprobe_from_different_cu.sh
Fixes: 270bde1e76f4 ("perf probe: Search both .eh_frame and .debug_frame sections for probe location") Signed-off-by: Ian Rogers <irogers@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241016235622.52166-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
During the rework of the dso structure in patch ee756ef7491eafd an
increment was forgotten for the symtab_type in case the data for
the kernel module are compressed. This affects the probing of the
kernel modules, which fails if the data are not already cached.
Increment the value of the symtab_type to its compressed variant so the
data could be recovered successfully.
Fixes: ee756ef7491eafd7 ("perf dso: Add reference count checking and accessor functions") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20241010144836.16424-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The root cause is if checkpoint_disabling and lfs_mode are both on,
it will trigger OPU for all overwritten data, it may cost more free
segment than expected, so f2fs must account those data correctly to
calculate cosumed free segments later, and return ENOSPC earlier to
avoid run out of free segment during block allocation.
There's issue as follows when concurrently installing the f2fs.ko
module and mounting the f2fs file system:
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
RIP: 0010:__bio_alloc+0x2fb/0x6c0 [f2fs]
Call Trace:
<TASK>
f2fs_submit_page_bio+0x126/0x8b0 [f2fs]
__get_meta_page+0x1d4/0x920 [f2fs]
get_checkpoint_version.constprop.0+0x2b/0x3c0 [f2fs]
validate_checkpoint+0xac/0x290 [f2fs]
f2fs_get_valid_checkpoint+0x207/0x950 [f2fs]
f2fs_fill_super+0x1007/0x39b0 [f2fs]
mount_bdev+0x183/0x250
legacy_get_tree+0xf4/0x1e0
vfs_get_tree+0x88/0x340
do_new_mount+0x283/0x5e0
path_mount+0x2b2/0x15b0
__x64_sys_mount+0x1fe/0x270
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Above issue happens as the biset of the f2fs file system is not
initialized before register "f2fs_fs_type".
To address above issue just register "f2fs_fs_type" at the last in
init_f2fs_fs(). Ensure that all f2fs file system resources are
initialized.
Fixes: f543805fcd60 ("f2fs: introduce private bioset") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
For clusters of the following type, i_blocks are decremented by 1 and
i_compr_blocks are incremented by 7 in release_compress_blocks, while
updates to i_blocks and i_compr_blocks are skipped in reserve_compress_blocks.
raw node:
D D D D D D D D
after compress:
C D D D D D D D
after reserve:
C D D D D D D D
Let's update i_blocks and i_compr_blocks properly in reserve_compress_blocks.
Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster") Signed-off-by: Qi Han <hanqi@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
With the patch 0b6c5371c03c "Add missing topdown metrics events" eight
topdown metric events with numbers ranging from 0x8000 to 0x8700 were
added to the test since they were added as 'perf stat' default events.
Later the patch 951efb9976ce "Update no event/metric expectations" kept
only 4 of those events(0x8000-0x8300).
Currently, the topdown events with numbers 0x8400 to 0x8700 are missing
from the list of expected events resulting in a failure. Add back the
missing topdown events.
Fixes: 951efb9976ce ("perf test attr: Update no event/metric expectations") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: mpetlan@redhat.com Link: https://lore.kernel.org/r/20240311081611.7835-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since 9ffa6c7512ca ("perf machine thread: Remove exited threads by
default") perf cleans exited threads up, but as said, sometimes they
are necessary to be kept. The mentioned commit does not cover all the
cases, we also need the information to construct the summary table in
perf-trace.
Fixes: 49de179577e7 ("perf stat: No need to setup affinities when starting a workload") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20241001052327.7052-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When create_perf_stat_counter() failed, it doesn't close workload.cork_fd
open in evlist__prepare_workload(). This could make too many open file
error while __run_perf_stat() repeats.
Introduce evlist__cancel_workload to close workload.cork_fd and
wait workload.child_pid until exit to clear child process
when create_perf_stat_counter() is failed.
Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: nd@arm.com Cc: howardchu95@gmail.com Link: https://lore.kernel.org/r/20240925132022.2650180-2-yeoreum.yun@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Stable-dep-of: 7f6ccb70e465 ("perf stat: Fix affinity memory leaks on error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
In reset_method_store(), a string is allocated via kstrndup() and assigned
to the local "options". options is then used in with strsep() to find
spaces:
while ((name = strsep(&options, " ")) != NULL) {
If there are no remaining spaces, then options is set to NULL by strsep(),
so the subsequent kfree(options) doesn't free the memory allocated via
kstrndup().
Fix by using a separate tmp_options to iterate with strsep() so options is
preserved.
Link: https://lore.kernel.org/r/20241001231147.3583649-1-tkjos@google.com Fixes: d88f521da3ef ("PCI: Allow userspace to query and set device reset mechanism") Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With commit 8ec9497d3ef34 ("tools/include: Sync uapi/linux/perf.h
with the kernel sources"), 'perf mem report' gives an incorrect memory
access string.
...
0.02% 1 3644 L5 hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480
...
This occurs because, if no entry exists in mem_lvlnum, perf_mem__lvl_scnprintf
will default to 'L%d, lvl', which in this case for PERF_MEM_LVLNUM_L2_MHB is 0x05.
Add entries for PERF_MEM_LVLNUM_L2_MHB and PERF_MEM_LVLNUM_MSC to mem_lvlnum,
so that the correct strings are printed.
...
0.02% 1 3644 L2 MHB hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480
...
Fixes: 8ec9497d3ef34 ("tools/include: Sync uapi/linux/perf.h with the kernel sources") Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20240926144040.77897-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Intel TPEBS sampling mode is supported through perf record. The counting mode
code uses perf record to capture retire_latency value and use it in metric
calculation. This test checks the counting mode code on Intel platforms.
Committer testing:
root@x1:~# perf test tpebs
123: test Intel TPEBS counting mode : Ok
root@x1:~# set -o vi
root@x1:~# perf test tpebs
123: test Intel TPEBS counting mode : Ok
root@x1:~# perf test -v tpebs
123: test Intel TPEBS counting mode : Ok
root@x1:~# perf test -vvv tpebs
123: test Intel TPEBS counting mode:
--- start ---
test child forked, pid 16603
Testing without --record-tpebs
Testing with --record-tpebs
---- end(0) ----
123: test Intel TPEBS counting mode : Ok
root@x1:~#
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240720062102.444578-9-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stable-dep-of: 057f8bfc6f70 ("perf stat: Uniquify event name improvements") Signed-off-by: Sasha Levin <sashal@kernel.org>
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete"
work"), function delete_work_func() was used to trigger the eviction of
in-memory inodes from remote as well as deleting unlinked inodes at a
later point. These two kinds of work were then split into two kinds of
work, and the two places in the code were deferred deletion of inodes is
required accidentally ended up queuing the wrong kind of work. This
caused unlinked inodes to be left behind, which could in the worst case
fill up filesystems and require a filesystem check to recover.
Fix that by queuing the right kind of work in try_rgrp_unlink() and
gfs2_drop_inode().
Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
cs_etm__flush(), like cs_etm__sample() is an operation that generates a
sample and then swaps the current with the previous packet. Calling
flush after processing the queues results in two swaps which corrupts
the next sample. Therefore it wasn't appropriate to call flush here so
remove it.
Flushing is still done on a discontinuity to explicitly clear the last
branch buffer, but when the packet_queue fills up before reaching a
timestamp, that's not a discontinuity and the call to
cs_etm__process_traceid_queue() already generated samples and drained
the buffers correctly.
This is visible by looking for a branch that has the same target as the
previous branch and the following source is before the address of the
last target, which is impossible as execution would have had to have
gone backwards:
Make sure that a final branch stack is output at the end of the trace
by calling cs_etm__end_block(). This is already done for both the
timeless decode paths.
Fixes: 21fe8dc1191a ("perf cs-etm: Add support for CPU-wide trace scenarios") Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/ Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the size isn't a small constant, __access_ok() will call
valid_user_address() with the address after the last byte of the user
buffer.
It is valid for a buffer to end with the last valid user address so
valid_user_address() must allow accesses to the base of the guard page.
[ This introduces an off-by-one in the other direction for the plain
non-sized accesses, but since we have that guard region that is a
whole page, those checks "allowing" accesses to that guard region
don't really matter. The access will fault anyway, whether to the
guard page or if the address has been masked to all ones - Linus ]
Both the inner and outer loops in this code use the "i" iterator.
The inner loop should really use a different iterator.
It doesn't affect things in practice because the data comes from the
device tree. The "protocol" and "windows" variables are going to be
zero. That means we're always going to hit the "return &chans[channel];"
statement and we're not going to want to iterate through the outer
loop again.
Still it's worth fixing this for future use cases.