Relocate the runtime PM enable operation to sci_probe_single(). This change
prepares the codebase for upcoming fixes.
While at it, replace the existing logic with a direct call to
devm_pm_runtime_enable() and remove sci_cleanup_single(). The
devm_pm_runtime_enable() function automatically handles disabling runtime
PM during driver removal.
On the Renesas RZ/G3S, when doing suspend to RAM, the uart_suspend_port()
is called. The uart_suspend_port() calls 3 times the
struct uart_port::ops::tx_empty() before shutting down the port.
According to the documentation, the struct uart_port::ops::tx_empty()
API tests whether the transmitter FIFO and shifter for the port is
empty.
The Renesas RZ/G3S SCIFA IP reports the number of data units stored in the
transmit FIFO through the FDR (FIFO Data Count Register). The data units
in the FIFOs are written in the shift register and transmitted from there.
The TEND bit in the Serial Status Register reports if the data was
transmitted from the shift register.
In the previous code, in the tx_empty() API implemented by the sh-sci
driver, it is considered that the TX is empty if the hardware reports the
TEND bit set and the number of data units in the FIFO is zero.
According to the HW manual, the TEND bit has the following meaning:
0: Transmission is in the waiting state or in progress.
1: Transmission is completed.
It has been noticed that when opening the serial device w/o using it and
then switch to a power saving mode, the tx_empty() call in the
uart_port_suspend() function fails, leading to the "Unable to drain
transmitter" message being printed on the console. This is because the
TEND=0 if nothing has been transmitted and the FIFOs are empty. As the
TEND=0 has double meaning (waiting state, in progress) we can't
determined the scenario described above.
Add a software workaround for this. This sets a variable if any data has
been sent on the serial console (when using PIO) or if the DMA callback has
been called (meaning something has been transmitted). In the tx_empty()
API the status of the DMA transaction is also checked and if it is
completed or in progress the code falls back in checking the hardware
registers instead of relying on the software variable.
Sysfs interface for updating firmware for RMI devices is available even
when F34 probe fails. The code checks for presence of F34 "container"
pointer and then tries to use the function data attached to the
sub-device. F34 assigns the function data early, before it knows if
probe will succeed, leaving behind a stale pointer.
Fix this by expanding checks to not only test for presence of F34
"container" but also check if there is driver data assigned to the
sub-device, and call dev_set_drvdata() only after we are certain that
probe is successful.
This is not a complete fix, since F34 will be freed during firmware
update, so there is still a race when fetching and accessing this
pointer. This race will be addressed in follow-up changes.
Reported-by: Hanno Böck <hanno@hboeck.de> Fixes: 29fd0ec2bdbe ("Input: synaptics-rmi4 - add support for F34 device reflash") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/aBlAl6sGulam-Qcx@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
The error checking for of_count_phandle_with_args() does not handle
negative error codes correctly. The problem is that "index" is a u32 so
in the condition "if (index >= num_domains)" negative error codes stored
in "num_domains" are type promoted to very high positive values and
"index" is always going to be valid.
Test for negative error codes first and then test if "index" is valid.
Fixes: 3ccf3f0cd197 ("PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/aBxPQ8AI8N5v-7rL@stanley.mountain Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ensure that propagation settings can only be changed for mounts located
in the caller's mount namespace. This change aligns permission checking
with the rest of mount(2).
Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 07b20889e305 ("beginning of the shared-subtree proper") Reported-by: "Orlando, Noah" <Noah.Orlando@deshaw.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
9ffb14ef61ba "move_mount: allow to add a mount into an existing group"
breaks assertions on ->mnt_share/->mnt_slave. For once, the data structures
in question are actually documented.
Documentation/filesystem/sharedsubtree.rst:
All vfsmounts in a peer group have the same ->mnt_master. If it is
non-NULL, they form a contiguous (ordered) segment of slave list.
do_set_group() puts a mount into the same place in propagation graph
as the old one. As the result, if old mount gets events from somewhere
and is not a pure event sink, new one needs to be placed next to the
old one in the slave list the old one's on. If it is a pure event
sink, we only need to make sure the new one doesn't end up in the
middle of some peer group.
"move_mount: allow to add a mount into an existing group" ends up putting
the new one in the beginning of list; that's definitely not going to be
in the middle of anything, so that's fine for case when old is not marked
shared. In case when old one _is_ marked shared (i.e. is not a pure event
sink), that breaks the assumptions of propagation graph iterators.
Put the new mount next to the old one on the list - that does the right thing
in "old is marked shared" case and is just as correct as the current behaviour
if old is not marked shared (kudos to Pavel for pointing that out - my original
suggested fix changed behaviour in the "nor marked" case, which complicated
things for no good reason).
Reviewed-by: Christian Brauner <brauner@kernel.org> Fixes: 9ffb14ef61ba ("move_mount: allow to add a mount into an existing group") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The kernel currently validates that the length of the provided nexthop
address does not exceed the specified length. This can lead to the
kernel reading uninitialized memory if user space provided a shorter
length than the specified one.
Fix by validating that the provided length exactly matches the specified
one.
Fixes: d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250604113252.371528-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Enable threaded NAPI by default for WireGuard devices in response to low
performance behavior that we observed when multiple tunnels (and thus
multiple wg devices) are deployed on a single host. This affects any
kind of multi-tunnel deployment, regardless of whether the tunnels share
the same endpoints or not (i.e., a VPN concentrator type of gateway
would also be affected).
The problem is caused by the fact that, in case of a traffic surge that
involves multiple tunnels at the same time, the polling of the NAPI
instance of all these wg devices tends to converge onto the same core,
causing underutilization of the CPU and bottlenecking performance.
This happens because NAPI polling is hosted by default in softirq
context, but the WireGuard driver only raises this softirq after the rx
peer queue has been drained, which doesn't happen during high traffic.
In this case, the softirq already active on a core is reused instead of
raising a new one.
As a result, once two or more tunnel softirqs have been scheduled on
the same core, they remain pinned there until the surge ends.
In our experiments, this almost always leads to all tunnel NAPIs being
handled on a single core shortly after a surge begins, limiting
scalability to less than 3× the performance of a single tunnel, despite
plenty of unused CPU cores being available.
The proposed mitigation is to enable threaded NAPI for all WireGuard
devices. This moves the NAPI polling context to a dedicated per-device
kernel thread, allowing the scheduler to balance the load across all
available cores.
On our 32-core gateways, enabling threaded NAPI yields a ~4× performance
improvement with 16 tunnels, increasing throughput from ~13 Gbps to
~48 Gbps. Meanwhile, CPU usage on the receiver (which is the bottleneck)
jumps from 20% to 100%.
We have found no performance regressions in any scenario we tested.
Single-tunnel throughput remains unchanged.
Commit 03f1444016b7 ("PM: sleep: Fix handling devices with direct_complete
set on errors") caused power.is_suspended to be set for devices with
power.direct_complete set, but it forgot to ensure the clearing of that
flag for them in device_resume(), so power.is_suspended is still set for
them during the next system suspend-resume cycle.
If that cycle is aborted in dpm_suspend(), the subsequent invocation of
dpm_resume() will trigger a device_resume() call for every device and
because power.is_suspended is set for the devices in question, they will
not be skipped by device_resume() as expected which causes scary error
messages to be logged (as appropriate).
To address this issue, move the clearing of power.is_suspended in
device_resume() immediately after the power.is_suspended check so it
will be always cleared for all devices processed by that function.
Fixes: 03f1444016b7 ("PM: sleep: Fix handling devices with direct_complete set on errors") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4280 Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/4990586.GXAFRqVoOG@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 3d010c8031e3 ("udp: do not accept non-tunnel GSO skbs landing
in a tunnel") added checks in linux stack to not accept non-tunnel
GRO packets landing in a tunnel. This exposed an issue in vmxnet3
which was not correctly reporting GRO packets for tunnel packets.
This patch fixes this issue by setting correct GSO type for the
tunnel packets.
Currently, vmxnet3 does not support reporting inner fields for LRO
tunnel packets. The issue is not seen for egress drivers that do not
use skb inner fields. The workaround is to enable tnl-segmentation
offload on the egress interfaces if the driver supports it. This
problem pre-exists this patch fix and can be addressed as a separate
future patch.
Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Acked-by: Guolin Yang <guolin.yang@broadcom.com> Link: https://patch.msgid.link/20250530152701.70354-1-ronak.doshi@broadcom.com
[pabeni@redhat.com: dropped the changelog] Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The current implementation of the Tx scheduler tree attempts
to create nodes for all Tx queues, ignoring the fact that some
queues may already exist in the tree. For example, if the VSI
already has 128 Tx queues and the user requests for 16 new queues,
the Tx scheduler will compute the tree for 272 queues (128 existing
queues + 144 new queues), instead of 144 queues (128 existing queues
and 16 new queues).
Fix that by modifying the node count calculation algorithm to skip
the queues that already exist in the tree.
Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Depending on the security set the response to L2CAP_LE_CONN_REQ shall be
just L2CAP_CR_LE_ENCRYPTION if only encryption when BT_SECURITY_MEDIUM
is selected since that means security mode 2 which doesn't require
authentication which is something that is covered in the qualification
test L2CAP/LE/CFC/BV-25-C.
Link: https://github.com/bluez/bluez/issues/1270 Fixes: 27e2d4c8d28b ("Bluetooth: Add basic LE L2CAP connect request receiving support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The "freq" variable is in terms of MHz and "max_val_cycles" is in terms
of Hz. The fact that "max_val_cycles" is a u64 suggests that support
for high frequency is intended but the "freq_khz * 1000" would overflow
the u32 type if we went above 4GHz. Use unsigned long long type for the
mutliplication to prevent that.
Fixes: 31c128b66e5b ("net/mlx4_en: Choose time-stamping shift value according to HW frequency") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/aDbFHe19juIJKjsb@stanley.mountain Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Identify the cause of the suspend/resume hang: netif_carrier_off()
is called during link state changes and becomes stuck while
executing linkwatch_work().
To resolve this issue, call netif_device_detach() during the Ethernet
suspend process to temporarily detach the network device from the
kernel and prevent the suspend/resume hang.
Fixes: 8c7bd5a454ff ("net: ethernet: mtk-star-emac: new driver") Signed-off-by: Yanqing Wang <ot_yanqing.wang@mediatek.com> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com> Signed-off-by: Biao Huang <biao.huang@mediatek.com> Link: https://patch.msgid.link/20250528075351.593068-1-macpaul.lin@mediatek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
syzbot reported a refcount warning [1] caused by calling get_net() on
a network namespace that is being destroyed (refcount=0). This happens
when a TIPC discovery timer fires during network namespace cleanup.
The recently added get_net() call in commit e279024617134 ("net/tipc:
fix slab-use-after-free Read in tipc_aead_encrypt_done") attempts to
hold a reference to the network namespace. However, if the namespace
is already being destroyed, its refcount might be zero, leading to the
use-after-free warning.
Replace get_net() with maybe_get_net(), which safely checks if the
refcount is non-zero before incrementing it. If the namespace is being
destroyed, return -ENODEV early, after releasing the bearer reference.
Previously, the RX_BUFFERS_POSTED stat incorrectly reported the
fill_cnt from RX queue 0 for all queues, resulting in inaccurate
per-queue statistics.
Fix this by correctly indexing priv->rx[idx].fill_cnt for each RX queue.
Fixes: 24aeb56f2d38 ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20250527130830.1812903-1-alok.a.tiwari@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
bus_id is currently derived from the ethernetX alias. If one is missing
for the device, 0 is used. If ethernet0 points to another stmmac device
or if there are 2+ stmmac devices without an ethernet alias, then bus_id
will be 0 for all of those.
This is an issue because the bus_id is used to generate the mdio bus id
(new_bus->id in drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
stmmac_mdio_register) and this needs to be unique.
This allows to avoid needing to define ethernet aliases for devices with
multiple stmmac controllers (such as the Rockchip RK3588) for multiple
stmmac devices to probe properly.
Obviously, the bus_id isn't guaranteed to be stable across reboots if no
alias is set for the device but that is easily fixed by simply adding an
alias if this is desired.
Fixes: 25c83b5c2e82 ("dt:net:stmmac: Add support to dwmac version 3.610 and 3.710") Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250527-stmmac-mdio-bus_id-v2-1-a5ca78454e3c@cherry.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
They are listed amon those cmd values that "treat 'arg' as an integer"
which is wrong. They should instead fall into the default case. Probably
nobody ever relied on that code since 2009 but still.
Fixes: e92166517e3c ("tty: handle VT specific compat ioctls in vt driver") Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/pr214s15-36r8-6732-2pop-159nq85o48r7@syhkavp.arg Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The sinc4 filter has a factor 0.23 between Output Data Rate and f_{3dB}
and for sinc3 the factor is 0.272 according to the data sheets for
ad7124-4 (Rev. E.) and ad7124-8 (Rev. F).
Reorder the initialization sequence in `usbhs_probe()` to enable runtime
PM before accessing registers, preventing potential crashes due to
uninitialized clocks.
Currently, in the probe path, registers are accessed before enabling the
clocks, leading to a synchronous external abort on the RZ/V2H SoC.
The problematic call flow is as follows:
Since `iowrite16()` is performed without ensuring the required clocks are
enabled, this can lead to access errors. To fix this, enable PM runtime
early in the probe function and ensure clocks are acquired before register
access, preventing crashes like the following on RZ/V2H:
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "info" at declaration so it starts as all zeros.
Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-2-helgaas@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
devm_kasprintf() returns NULL when memory allocation fails. Currently,
udma_probe() does not check for this case, which results in a NULL
pointer dereference.
Add NULL check after devm_kasprintf() to prevent this issue.
Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Reviewed-by: Nathan Lynch <nathan.lynch@amd.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20250402023900.43440-1-bsdhenrymartin@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the call to pci_host_probe() in cdns_pcie_host_setup() fails, PM
runtime count is decremented in the error path using pm_runtime_put_sync().
But the runtime count is not incremented by this driver, but only by the
callers (cdns_plat_pcie_probe/j721e_pcie_probe). And the callers also
decrement the runtime PM count in their error path. So this leads to the
below warning from the PM core:
"runtime PM usage count underflow!"
So fix it by getting rid of pm_runtime_put_sync() in the error path and
directly return the errno.
The DT bindings for this driver define the interrupts in the order as
they are numbered in the interrupt controller. The old platform_data,
however, listed them in a different order. So, for DT based platforms,
they are mixed up. Assign them specifically for DT, so we can keep the
bindings stable. After the fix, 'rtctest' passes again on the Renesas
Genmai board (RZ-A1 / R7S72100).
In some scenarios, when mounting NFS, more than one superblock may be
created. The final superblock used is the last one created, but only the
first superblock carries the ro flag passed from user space. If a ro flag
is added to the superblock via remount, it will trigger the issue
described in Link[1].
Link[2] attempted to address this by marking the superblock as ro during
the initial mount. However, this introduced a new problem in scenarios
where multiple mount points share the same superblock:
[root@a ~]# mount /dev/sdb /mnt/sdb
[root@a ~]# echo "/mnt/sdb *(rw,no_root_squash)" > /etc/exports
[root@a ~]# echo "/mnt/sdb/test_dir2 *(ro,no_root_squash)" >> /etc/exports
[root@a ~]# systemctl restart nfs-server
[root@a ~]# mount -t nfs -o rw 127.0.0.1:/mnt/sdb/test_dir1 /mnt/test_mp1
[root@a ~]# mount | grep nfs4
127.0.0.1:/mnt/sdb/test_dir1 on /mnt/test_mp1 type nfs4 (rw,relatime,...
[root@a ~]# mount -t nfs -o ro 127.0.0.1:/mnt/sdb/test_dir2 /mnt/test_mp2
[root@a ~]# mount | grep nfs4
127.0.0.1:/mnt/sdb/test_dir1 on /mnt/test_mp1 type nfs4 (ro,relatime,...
127.0.0.1:/mnt/sdb/test_dir2 on /mnt/test_mp2 type nfs4 (ro,relatime,...
[root@a ~]#
When mounting the second NFS, the shared superblock is marked as ro,
causing the previous NFS mount to become read-only.
To resolve both issues, the ro flag is no longer applied to the superblock
during remount. Instead, the ro flag on the mount is used to control
whether the mount point is read-only.
Fixes: 281cad46b34d ("NFS: Create a submount rpc_op")
Link[1]: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/
Link[2]: https://lore.kernel.org/all/20241130035818.1459775-1-lilingfeng3@huawei.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As described in the link, commit 52cb7f8f1778 ("nfs: ignore SB_RDONLY when
mounting nfs") removed the check for the ro flag when determining whether
to share the superblock, which caused issues when mounting different
subdirectories under the same export directory via NFSv3. However, this
change did not affect NFSv4.
For NFSv3:
1) A single superblock is created for the initial mount.
2) When mounted read-only, this superblock carries the SB_RDONLY flag.
3) Before commit 52cb7f8f1778 ("nfs: ignore SB_RDONLY when mounting nfs"):
Subsequent rw mounts would not share the existing ro superblock due to
flag mismatch, creating a new superblock without SB_RDONLY.
After the commit:
The SB_RDONLY flag is ignored during superblock comparison, and this leads
to sharing the existing superblock even for rw mounts.
Ultimately results in write operations being rejected at the VFS layer.
For NFSv4:
1) Multiple superblocks are created and the last one will be kept.
2) The actually used superblock for ro mounts doesn't carry SB_RDONLY flag.
Therefore, commit 52cb7f8f1778 doesn't affect NFSv4 mounts.
Clear SB_RDONLY before getting superblock when NFS_MOUNT_UNSHARED is not
set to fix it.
Fixes: 52cb7f8f1778 ("nfs: ignore SB_RDONLY when mounting nfs") Closes: https://lore.kernel.org/all/12d7ea53-1202-4e21-a7ef-431c94758ce5@app.fastmail.com/T/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The test might fail on the Arm64 platform with the error:
# perf test -vvv "Track with sched_switch"
Missing sched_switch events
#
The issue is caused by incorrect handling of timestamp comparisons. The
comparison result, a signed 64-bit value, was being directly cast to an
int, leading to incorrect sorting for sched events.
The case does not fail everytime, usually I can trigger the failure
after run 20 ~ 30 times:
# while true; do perf test "Track with sched_switch"; done
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : FAILED!
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
106: Track with sched_switch : FAILED!
106: Track with sched_switch : Ok
106: Track with sched_switch : Ok
I used cross compiler to build Perf tool on my host machine and tested on
Debian / Juno board. Generally, I think this issue is not very specific
to GCC versions. As both internal CI and my local env can reproduce the
issue.
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
Fix this by explicitly returning 0, 1, or -1 based on whether the result
is zero, positive, or negative.
Fixes: d44bc558297222d9 ("perf tests: Add a test for tracking with sched_switch") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250331172759.115604-1-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The script allows the user to enter patterns to find symbols.
The pattern matching characters are converted for use in SQL.
For PostgreSQL the conversion involves using the Python maketrans()
method which is slightly different in Python 3 compared with Python 2.
Fix to work in Python 3.
Fixes: beda0e725e5f06ac ("perf script python: Add Python3 support to exported-sql-viewer.py") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tony Jones <tonyj@suse.de> Link: https://lore.kernel.org/r/20250512093932.79854-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
devm_kasprintf() returns NULL when memory allocation fails. Currently,
wled_configure() does not check for this case, which results in a NULL
pointer dereference.
Add NULL check after devm_kasprintf() to prevent this issue.
Fixes: f86b77583d88 ("backlight: pm8941: Convert to using %pOFn instead of device_node.name") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: "Daniel Thompson (RISCstar)" <danielt@kernel.org> Link: https://lore.kernel.org/r/20250401091647.22784-1-bsdhenrymartin@gmail.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In 7cecb7fe8388d5c3 ("perf hists: Move sort__has_comm into struct
perf_hpp_list") it assumes that act->thread is set prior to calling
do_zoom_thread().
This doesn't happen when we use ESC or the Left arrow key to Zoom out of
a specific thread, making this operation not to work and we get stuck
into the thread zoom.
In 6422184b087ff435 ("perf hists browser: Simplify zooming code using
pstack_peek()") it says no need to set actions->thread, and at that
point that was true, but in 7cecb7fe8388d5c3 a actions->thread == NULL
check was added before the zoom out of thread could kick in.
We can zoom out using the alternative 't' thread zoom toggle hotkey to
finally set actions->thread before calling do_zoom_thread() and zoom
out, but lets also fix the ESC/Zoom out of thread case.
Fixes: 7cecb7fe8388d5c3 ("perf hists: Move sort__has_comm into struct perf_hpp_list") Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Z_TYux5fUg2pW-pF@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: c7a14fdcb3fa7736 ("perf build-ids: Fall back to debuginfod query if debuginfo not found") Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Link: https://lore.kernel.org/r/Z_dkNDj9EPFwPqq1@gmail.com
[ Folded patch from Ingo to have the debian/ubuntu devel package added build warning message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In fb_find_mode_cvt(), iff mode->refresh somehow happens to be 0x80000000,
cvt.f_refresh will become 0 when multiplying it by 2 due to overflow. It's
then passed to fb_cvt_hperiod(), where it's used as a divider -- division
by 0 will result in kernel oops. Add a sanity check for cvt.f_refresh to
avoid such overflow...
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
devm_kasprintf() returns NULL when memory allocation fails. Currently,
aspeed_lpc_enable_snoop() does not check for this case, which results in a
NULL pointer dereference.
Add NULL check after devm_kasprintf() to prevent this issue.
Fixes: 3772e5da4454 ("drivers/misc: Aspeed LPC snoop output using misc chardev") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Link: https://patch.msgid.link/20250401074647.21300-1-bsdhenrymartin@gmail.com
[arj: Fix Fixes: tag to use subject from 3772e5da4454] Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The u2phy0_host port is the part of the USB PHY0 (namely the
HOST0_DP/DM lanes) which routes directly to the USB2.0 HOST
controller[1]. The other lanes of the PHY are routed to the USB3.0 OTG
controller (dwc3), which we do use.
The HOST0_DP/DM lanes aren't routed on RK3399 Puma so let's simply
disable the USB2.0 controllers.
USB3 OTG has been known to be unstable on RK3399 Puma Haikou for a
while, one of the recurring issues being that only USB2 is detected and
not USB3 in host mode. Reading the justification above and seeing that
we are keeping u2phy0_host in the Haikou carrierboard DTS probably may
have bothered you since it should be changed to u2phy0_otg. The issue is
that if it's switched to that, USB OTG on Haikou is entirely broken. I
have checked the routing in the Gerber file, the lanes are going to the
expected ball pins (that is, NOT HOST0_DP/DM).
u2phy0_host is for sure the wrong part of the PHY to use, but it's the
only one that works at the moment for that board so keep it until we
figure out what exactly is broken.
Follow up the expected way of describing the SFPB hwspinlock and merge
hwspinlock node into corresponding syscon node, fixing several dt-schema
warnings.
The blamed commit tried to simplify how the deallocations are done but,
in the process, introduced a double-free on the mc_dev variable.
In case the MC device is a DPRC, a new mc_bus is allocated and the
mc_dev variable is just a reference to one of its fields. In this
circumstance, on the error path only the mc_bus should be freed.
This commit introduces back the following checkpatch warning which is a
false-positive.
WARNING: kfree(NULL) is safe and this check is probably not required
+ if (mc_bus)
+ kfree(mc_bus);
In preparation for writing logs, in nilfs_btree_propagate(), which makes
parent and ancestor node blocks dirty starting from a modified data block
or b-tree node block, if the starting block does not belong to the b-tree,
i.e. is isolated, nilfs_btree_do_lookup() called within the function
fails with -ENOENT.
In this case, even though -ENOENT is an internal code, it is propagated to
the log writer via nilfs_bmap_propagate() and may be erroneously returned
to system calls such as fsync().
Fix this issue by changing the error code to -EINVAL in this case, and
having the bmap layer detect metadata corruption and convert the error
code appropriately.
Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "nilfs2: improve sanity checks in dirty state propagation".
This fixes one missed check for block mapping anomalies and one improper
return of an error code during a preparation step for log writing, thereby
improving checking for filesystem corruption on writeback.
This patch (of 2):
In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr()
need to be checked to ensure it is not an invalid pointer.
If the pointer value obtained by nilfs_direct_get_ptr() is
NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in
the nilfs_inode_info struct) that should point to the data block at the
buffer head of the argument is corrupted and the data block is orphaned,
meaning that the file system has lost consistency.
Add a value check and return -EINVAL when it is an invalid pointer.
If ocfs2_finish_quota_recovery() exits due to an error before passing all
rc_list elements to ocfs2_recover_local_quota_file() then it can lead to a
memory leak as rc_list may still contain elements that have to be freed.
Release all memory allocated by ocfs2_add_recovery_chunk() using
ocfs2_free_quota_recovery() instead of kfree().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Link: https://lkml.kernel.org/r/20250402065628.706359-2-m.masimov@mt-integration.ru Fixes: 2205363dce74 ("ocfs2: Implement quota recovery") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Syzkaller reports an "UBSAN: shift-out-of-bounds in squashfs_bio_read" bug.
Syzkaller forks multiple processes which after mounting the Squashfs
filesystem, issues an ioctl("/dev/loop0", LOOP_SET_BLOCK_SIZE, 0x8000).
Now if this ioctl occurs at the same time another process is in the
process of mounting a Squashfs filesystem on /dev/loop0, the failure
occurs. When this happens the following code in squashfs_fill_super()
fails.
Although not noticeable when used every day, the RTC appears to drift when
left to sit over time. This is due to the capacitive load not being
properly set. Fix RTC drift by correcting the capacitive load setting
from 7000 to 12500, which matches the actual hardware configuration.
Fixes: 36ca3c8ccb53 ("arm64: dts: imx: Add Beacon i.MX8M Nano development kit") Signed-off-by: Adam Ford <aford173@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Although not noticeable when used every day, the RTC appears to drift when
left to sit over time. This is due to the capacitive load not being
properly set. Fix RTC drift by correcting the capacitive load setting
from 7000 to 12500, which matches the actual hardware configuration.
Fixes: 593816fa2f35 ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit") Signed-off-by: Adam Ford <aford173@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
NAND did not work on my USB-A9263. I discovered that the offending
commit converted the PIO bank for chip selects wrongly, so all A9263
boards need to be fixed.
Fixes: 1004a2977bdc ("ARM: dts: at91: Switch to the new NAND bindings") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20250402210446.5972-2-wsa+renesas@sang-engineering.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have noticed that when PHY timestamping is enabled, L2 frames seems
to be modified by changing two 2 bytes with a value of 0. The place were
these 2 bytes seems to be random(or I couldn't find a pattern). In most
of the cases the userspace can ignore these frames but if for example
those 2 bytes are in the correction field there is nothing to do. This
seems to happen when configuring the HW for IPv4 even that the flow is
not enabled.
These 2 bytes correspond to the UDPv4 checksum and once we don't enable
clearing the checksum when using L2 frames then the frame doesn't seem
to be changed anymore.
The unexpected MPLS packet may not end with the bottom label stack.
When there are many stacks, The label count value has wrapped around.
A dead loop occurs, soft lockup/CPU stuck finally.
stack backtrace:
UBSAN: array-index-out-of-bounds in /build/linux-0Pa0xK/linux-5.15.0/net/openvswitch/flow.c:662:26
index -1 is out of range for type '__be32 [3]'
CPU: 34 PID: 0 Comm: swapper/34 Kdump: loaded Tainted: G OE 5.15.0-121-generic #131-Ubuntu
Hardware name: Dell Inc. PowerEdge C6420/0JP9TF, BIOS 2.12.2 07/14/2021
Call Trace:
<IRQ>
show_stack+0x52/0x5c
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_out_of_bounds.cold+0x44/0x49
key_extract_l3l4+0x82a/0x840 [openvswitch]
? kfree_skbmem+0x52/0xa0
key_extract+0x9c/0x2b0 [openvswitch]
ovs_flow_key_extract+0x124/0x350 [openvswitch]
ovs_vport_receive+0x61/0xd0 [openvswitch]
? kernel_init_free_pages.part.0+0x4a/0x70
? get_page_from_freelist+0x353/0x540
netdev_port_receive+0xc4/0x180 [openvswitch]
? netdev_port_receive+0x180/0x180 [openvswitch]
netdev_frame_hook+0x1f/0x40 [openvswitch]
__netif_receive_skb_core.constprop.0+0x23a/0xf00
__netif_receive_skb_list_core+0xfa/0x240
netif_receive_skb_list_internal+0x18e/0x2a0
napi_complete_done+0x7a/0x1c0
bnxt_poll+0x155/0x1c0 [bnxt_en]
__napi_poll+0x30/0x180
net_rx_action+0x126/0x280
? bnxt_msix+0x67/0x80 [bnxt_en]
handle_softirqs+0xda/0x2d0
irq_exit_rcu+0x96/0xc0
common_interrupt+0x8e/0xa0
</IRQ>
Fixes: fbdcdd78da7c ("Change in Openvswitch to support MPLS label depth of 3 in ingress direction") Signed-off-by: Faicker Mo <faicker.mo@zenlayer.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/259D3404-575D-4A6D-B263-1DF59A67CF89@zenlayer.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
syzkaller reported a null-ptr-deref in txopt_get(). [0]
The offset 0x70 was of struct ipv6_txoptions in struct ipv6_pinfo,
so struct ipv6_pinfo was NULL there.
However, this never happens for IPv6 sockets as inet_sk(sk)->pinet6
is always set in inet6_create(), meaning the socket was not IPv6 one.
The root cause is missing validation in netlbl_conn_setattr().
netlbl_conn_setattr() switches branches based on struct
sockaddr.sa_family, which is passed from userspace. However,
netlbl_conn_setattr() does not check if the address family matches
the socket.
The syzkaller must have called connect() for an IPv6 address on
an IPv4 socket.
We have a proper validation in tcp_v[46]_connect(), but
security_socket_connect() is called in the earlier stage.
Let's copy the validation to netlbl_conn_setattr().
Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=M1LzunrcQB1fSGauMrJrhL6GGps5cPAKzHJXj6GQV+-g@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250522221858.91240-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When creating bpf program, 'fp->jit_requested' depends on bpf_jit_enable.
This issue is triggered because of CONFIG_BPF_JIT_ALWAYS_ON is not set
and bpf_jit_enable is set to 1, causing the arch to attempt JIT the prog,
but jit failed due to FAULT_INJECTION. As a result, incorrectly
treats the program as valid, when the program runs it calls
`__bpf_prog_ret0_warn` and triggers the WARN_ON_ONCE(1).
Reported-by: syzbot+0903f6d7f285e41cdf10@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/6816e34e.a70a0220.254cdc.002c.GAE@google.com Fixes: fa9dd599b4da ("bpf: get rid of pure_initcall dependency to enable jits") Signed-off-by: KaFai Wan <mannkafai@gmail.com> Link: https://lore.kernel.org/r/20250526133358.2594176-1-mannkafai@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Syzkaller, courtesy of syzbot, identified an error (see report [1]) in
aqc111 driver, caused by incomplete sanitation of usb read calls'
results. This problem is quite similar to the one fixed in commit 920a9fa27e78 ("net: asix: add proper error handling of usb read errors").
For instance, usbnet_read_cmd() may read fewer than 'size' bytes,
even if the caller expected the full amount, and aqc111_read_cmd()
will not check its result properly. As [1] shows, this may lead
to MAC address in aqc111_bind() being only partly initialized,
triggering KMSAN warnings.
Fix the issue by verifying that the number of bytes read is
as expected and not less.
The sk->sk_socket is not locked or referenced in backlog thread, and
during the call to skb_send_sock(), there is a race condition with
the release of sk_socket. All types of sockets(tcp/udp/unix/vsock)
will be affected.
The ref of psock become 0 after sock_map_close() executed.
'''
void sock_map_close()
{
...
if (likely(psock)) {
...
// !! here we remove psock and the ref of psock become 0
sock_map_remove_links(sk, psock)
psock = sk_psock_get(sk);
if (unlikely(!psock))
goto no_psock; <=== Control jumps here via goto
...
cancel_delayed_work_sync(&psock->work); <=== not executed
sk_psock_put(sk, psock);
...
}
'''
Based on the fact that we already wait for the workqueue to finish in
sock_map_close() if psock is held, we simply increase the psock
reference count to avoid race conditions.
With this patch, if the backlog thread is running, sock_map_close() will
wait for the backlog thread to complete and cancel all pending work.
If no backlog running, any pending work that hasn't started by then will
fail when invoked by sk_psock_get(), as the psock reference count have
been zeroed, and sk_psock_drop() will cancel all jobs via
cancel_delayed_work_sync().
In summary, we require synchronization to coordinate the backlog thread
and close() thread.
When setting up dirty page tracking at the vfio IOMMU backend for
device migration, if an error is encountered allocating a tracking
bitmap, the unwind loop fails to free previously allocated tracking
bitmaps. This occurs because the wrong loop index is used to
generate the tracking object. This results in unintended memory
usage for the life of the current DMA mappings where bitmaps were
successfully allocated.
Use the correct loop index to derive the tracking object for
freeing during unwind.
Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20250521034647.2877-1-lirongqing@baidu.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With a VRF, ipv4 and ipv6 FIB expression behave differently.
fib daddr . iif oif
Will return the input interface name for ipv4, but the real device
for ipv6. Example:
If VRF device name is tvrf and real (incoming) device is veth0.
First round is ok, both ipv4 and ipv6 will yield 'veth0'.
But in the second round (incoming device will be set to "tvrf"), ipv4
will yield "tvrf" whereas ipv6 returns "veth0" for the second round too.
This makes ipv6 behave like ipv4.
A followup patch will add a test case for this, without this change
it will fail with:
get element inet t fibif6iif { tvrf . dead:1::99 . tvrf }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FAIL: did not find tvrf . dead:1::99 . tvrf in fibif6iif
Alternatively we could either not do anything at all or change
ipv4 to also return the lower/real device, however, nft (userspace)
doc says "iif: if fib lookup provides a route then check its output
interface is identical to the packets input interface." which is what
the nft fib ipv4 behaviour is.
A malicious USB device can send a WMI_SWBA_EVENTID event from an
ath9k_htc-managed device before beaconing has been enabled. This causes
a device-by-zero error in the driver, leading to either a crash or an
out of bounds read.
Prevent this by aborting the handling in ath9k_htc_swba() if beacons are
not enabled.
Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/r/88967.1743099372@localhost Fixes: 832f6a18fc2a ("ath9k_htc: Add beacon slots") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20250402112217.58533-1-toke@toke.dk Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is because leaf progs do not store backchain. Fix by making all
progs do it. This is what GCC and Clang-generated code does as well.
Now the call trace looks like this:
Tracepoint like trace_mmap_lock_acquire_returned may cause nested call
as the corner case show above, which will be resolved with more general
method in the future. As a result, WARN_ON_ONCE will be triggered. As
Alexei suggested, remove the WARN_ON_ONCE first.
at91_gpio_probe() doesn't check that given OF alias is not available or
something went wrong when trying to get it. This might have consequences
when accessing gpio_chips array with that value as an index. Note, that
BUG() can be compiled out and hence won't actually perform the required
checks.
Return value of the validate_nla() function can be propagated all the
way up to users of libbpf API. In case of error this libbpf version
of validate_nla returns -1 which will be seen as -EPERM from user's
point of view. Instead, return a more reasonable -EINVAL.
When we specify apply_bytes, we divide the msg into multiple segments,
each with a length of 'send', and every time we send this part of the data
using tcp_bpf_sendmsg_redir(), we use sk_msg_return_zero() to uncharge the
memory of the specified 'send' size.
However, if the first segment of data fails to send, for example, the
peer's buffer is full, we need to release all of the msg. When releasing
the msg, we haven't uncharged the memory of the subsequent segments.
This modification does not make significant logical changes, but only
fills in the missing uncharge places.
This issue has existed all along, until it was exposed after we added the
apply test in test_sockmap:
commit 3448ad23b34e ("selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap")
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Closes: https://lore.kernel.org/bpf/aAmIi0vlycHtbXeb@pop-os.localdomain/T/#t Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20250425060015.6968-2-jiayuan.chen@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
devm_kasprintf() returns NULL when memory allocation fails. Currently,
raspberrypi_clk_register() does not check for this case, which results
in a NULL pointer dereference.
Add NULL check after devm_kasprintf() to prevent this issue.
Fixes: 93d2725affd6 ("clk: bcm: rpi: Discover the firmware clocks") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://lore.kernel.org/r/20250402020513.42628-1-bsdhenrymartin@gmail.com Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Compared to the msm-4.19 driver the mainline GDSC driver always sets the
bits for en_rest, en_few & clk_dis, and if those values are not set
per-GDSC in the respective driver then the default value from the GDSC
driver is used. The downstream driver only conditionally sets
clk_dis_wait_val if qcom,clk-dis-wait-val is given in devicetree.
Correct this situation by explicitly setting those values. For all GDSCs
the reset value of those bits are used.
With the latest LLVM bpf selftests build will fail with
the following error message:
progs/profiler.inc.h:710:31: error: default initialization of an object of type 'typeof ((parent_task)->real_cred->uid.val)' (aka 'const unsigned int') leaves the object uninitialized and is incompatible with C++ [-Werror,-Wdefault-const-init-unsafe]
710 | proc_exec_data->parent_uid = BPF_CORE_READ(parent_task, real_cred, uid.val);
| ^
tools/testing/selftests/bpf/tools/include/bpf/bpf_core_read.h:520:35: note: expanded from macro 'BPF_CORE_READ'
520 | ___type((src), a, ##__VA_ARGS__) __r; \
| ^
This happens because BPF_CORE_READ (and other macro) declare the
variable __r using the ___type macro which can inherit const modifier
from intermediate types.
Fix this by using __typeof_unqual__, when supported. (And when it
is not supported, the problem shouldn't appear, as older compilers
haven't complained.)
Fixes: 792001f4f7aa ("libbpf: Add user-space variants of BPF_CORE_READ() family of macros") Fixes: a4b09a9ef945 ("libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250502193031.3522715-1-a.s.protopopov@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Upon RQ destruction if the firmware command fails which is the
last resource to be destroyed some SW resources were already cleaned
regardless of the failure.
Now properly rollback the object to its original state upon such failure.
The config NF_CONNTRACK_BRIDGE will change the bridge forwarding for
fragmented packets.
The original bridge does not know that it is a fragmented packet and
forwards it directly, after NF_CONNTRACK_BRIDGE is enabled, function
nf_br_ip_fragment and br_ip6_fragment will check the headroom.
In original br_forward, insufficient headroom of skb may indeed exist,
but there's still a way to save the skb in the device driver after
dev_queue_xmit.So droping the skb will change the original bridge
forwarding in some cases.
Return values of the linker_append_sec_data() and the
linker_append_elf_relos() functions are propagated all the
way up to users of libbpf API. In some error cases these
functions return -1 which will be seen as -EPERM from user's
point of view. Instead, return a more reasonable -EINVAL.
After 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block
migration"), we will tag page w/ gcing flag for raw page of cluster
during its migration.
However, if the inode is both encrypted and compressed, during
ioc_decompress(), it will tag page w/ gcing flag, and it increase
F2FS_WB_DATA reference count:
- f2fs_write_multi_page
- f2fs_write_raw_page
- f2fs_write_single_page
- do_write_page
- f2fs_submit_page_write
- WB_DATA_TYPE(bio_page, fio->compressed_page)
: bio_page is encrypted, so mapping is NULL, and fio->compressed_page
is NULL, it returns F2FS_WB_DATA
- inc_page_count(.., F2FS_WB_DATA)
Then, during end_io(), it decrease F2FS_WB_CP_DATA reference count:
- f2fs_write_end_io
- f2fs_compress_write_end_io
- fscrypt_pagecache_folio
: get raw page from encrypted page
- WB_DATA_TYPE(&folio->page, false)
: raw page has gcing flag, it returns F2FS_WB_CP_DATA
- dec_page_count(.., F2FS_WB_CP_DATA)
In order to fix this issue, we need to detect gcing flag in raw page
in f2fs_is_cp_guaranteed().
Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration") Reported-by: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
hns_roce_hw_v2.h has a direct dependency on hnae3.h due to the
inline function hns_roce_write64(), but it doesn't include this
header currently. This leads to that files including
hns_roce_hw_v2.h must also include hnae3.h to avoid compilation
errors, even if they themselves don't really rely on hnae3.h.
This doesn't make sense, hns_roce_hw_v2.h should include hnae3.h
directly.
Fixes: d3743fa94ccd ("RDMA/hns: Fix the chip hanging caused by sending doorbell during reset") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250421132750.1363348-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In 'rtw8822c_dpk_cal_coef1()', do not ignore error returned
by 'check_hw_ready()' but issue a warning to denote possible
DPK issue. Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.
$ readelf -s crash
Symbol table '.symtab' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
...
6: ffffffffffffffb8 104 FUNC GLOBAL DEFAULT 2 handle_tp
Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.
This is also reported by AddressSanitizer:
=================================================================
==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
READ of size 104 at 0x7c7302fe0000 thread T0
#0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
#1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
#2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
#3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
#4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
#5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
#6 0x000000400c16 in main /poc/poc.c:8
#7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
#8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
#9 0x000000400b34 in _start (/poc/poc+0x400b34)
0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
allocated by thread T0 here:
#0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
#1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
#2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
#3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740
The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947c1b3c ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").
Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.
Correct Get Controller Packet Statistics (GCPS) 64-bit wide member
variables, as per DSP0222 v1.0.0 and forward specs. The Driver currently
collects these stats, but they are yet to be exposed to the user.
Therefore, no user impact.
Statistics fixes:
Total Bytes Received (byte range 28..35)
Total Bytes Transmitted (byte range 36..43)
Total Unicast Packets Received (byte range 44..51)
Total Multicast Packets Received (byte range 52..59)
Total Broadcast Packets Received (byte range 60..67)
Total Unicast Packets Transmitted (byte range 68..75)
Total Multicast Packets Transmitted (byte range 76..83)
Total Broadcast Packets Transmitted (byte range 84..91)
Valid Bytes Received (byte range 204..11)
The reason is: in fuzzed image, sbi->total_valid_block_count is
inconsistent w/ mapped blocks indexed by inode, so, we should
not trigger panic for such case, instead, let's print log and
set fsck flag.
Fixes: 39a53e0ce0df ("f2fs: add superblock and major in-memory structure") Reported-by: syzbot+8b376a77b2f364097fbe@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/67f3c0b2.050a0220.396535.0547.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the !ingress path under sk_psock_handle_skb(), when sending data to the
remote under snd_buf limitations, partial skb data might be transmitted.
Although we preserved the partial transmission state (offset/length), the
state wasn't properly consumed during retries. This caused the retry path
to resend the entire skb data instead of continuing from the previous
offset, resulting in data overlap at the receiver side.
In workloads where there are many processes establishing connections using
RDMA CM in parallel (large scale MPI), there can be heavy contention for
mad_agent_lock in cm_alloc_msg.
This contention can occur while inside of a spin_lock_irq region, leading
to interrupts being disabled for extended durations on many
cores. Furthermore, it leads to the serialization of rdma_create_ah calls,
which has negative performance impacts for NICs which are capable of
processing multiple address handle creations in parallel.
The end result is the machine becoming unresponsive, hung task warnings,
netdev TX timeouts, etc.
Since the lock appears to be only for protection from cm_remove_one, it
can be changed to a rwlock to resolve these issues.
Reproducer:
Server:
for i in $(seq 1 512); do
ucmatose -c 32 -p $((i + 5000)) &
done
Client:
for i in $(seq 1 512); do
ucmatose -c 32 -p $((i + 5000)) -s 10.2.0.52 &
done
Fixes: 76039ac9095f ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock") Link: https://patch.msgid.link/r/20250220175612.2763122-1-jmoroni@google.com Signed-off-by: Jacob Moroni <jmoroni@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In current WLAN recovery code flow, ath11k_core_halt() only
reinitializes the "arvifs" list head. This will cause the
list node immediately following the list head to become an
invalid list node. Because the prev of that node still points
to the list head "arvifs", but the next of the list head "arvifs"
no longer points to that list node.
When a WLAN recovery occurs during the execution of a vif
removal, and it happens before the spin_lock_bh(&ar->data_lock)
in ath11k_mac_op_remove_interface(), list_del() will detect the
previously mentioned situation, thereby triggering a kernel panic.
The fix is to remove and reinitialize all vif list nodes from the
list head "arvifs" during WLAN halt. The reinitialization is to make
the list nodes valid, ensuring that the list_del() in
ath11k_mac_op_remove_interface() can execute normally.
SDEI usually initialize with the ACPI table, but on platforms where
ACPI is not used, the SDEI feature can still be used to handle
specific firmware calls or other customized purposes. Therefore, it
is not necessary for ARM_SDE_INTERFACE to depend on ACPI_APEI_GHES.
In commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES
in acpi_init()"), to make APEI ready earlier, sdei_init was moved
into acpi_ghes_init instead of being a standalone initcall, adding
ACPI_APEI_GHES dependency to ARM_SDE_INTERFACE. This restricts the
flexibility and usability of SDEI.
This patch corrects the dependency in Kconfig and splits sdei_init()
into two separate functions: sdei_init() and acpi_sdei_init().
sdei_init() will be called by arch_initcall and will only initialize
the platform driver, while acpi_sdei_init() will initialize the
device from acpi_ghes_init() when ACPI is ready. This allows the
initialization of SDEI without ACPI_APEI_GHES enabled.
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Cc: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250507045757.2658795-1-quic_hyiwei@quicinc.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>