====================
selftests/bpf: Fix a few dynptr test failures with 64K page size
There are a few dynptr test failures with arm64 64K page size.
They are fixed in this patch set and please see individual patches
for details.
====================
Yonghong Song [Fri, 25 Jul 2025 04:34:40 +0000 (21:34 -0700)]
selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
For arm64 64K page size, the xdp data size was set to be more than 64K
in one of previous patches. This will cause failure for bpf_dynptr_memset().
Since the failure of bpf_dynptr_memset() is expected with 64K page size,
return success.
Yonghong Song [Fri, 25 Jul 2025 04:34:35 +0000 (21:34 -0700)]
selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
For arm64 64K page size, the bpf_dynptr_copy() in test dynptr/test_dynptr_copy_xdp
will succeed, but the test will failure with 4K page size. This patch made a change
so the test will fail expectedly for both 4K and 64K page sizes.
Yonghong Song [Fri, 25 Jul 2025 04:34:30 +0000 (21:34 -0700)]
selftests/bpf: Increase xdp data size for arm64 64K page size
With arm64 64K page size, the following 4 subtests failed:
#97/25 dynptr/test_probe_read_user_dynptr:FAIL
#97/26 dynptr/test_probe_read_kernel_dynptr:FAIL
#97/27 dynptr/test_probe_read_user_str_dynptr:FAIL
#97/28 dynptr/test_probe_read_kernel_str_dynptr:FAIL
These failures are due to function bpf_dynptr_check_off_len() in
include/linux/bpf.h where there is a test
if (len > size || offset > size - len)
return -E2BIG;
With 64K page size, the 'offset' is greater than 'size - len',
which caused the test failure.
For 64KB page size, this patch increased the xdp buffer size from 5000 to
90000. The above 4 test failures are fixed as 'size' value is increased.
But it introduced two new failures:
#97/4 dynptr/test_dynptr_copy_xdp:FAIL
#97/12 dynptr/test_dynptr_memset_xdp_chunks:FAIL
These two failures will be addressed in subsequent patches.
Tristram Ha [Fri, 25 Jul 2025 00:17:53 +0000 (17:17 -0700)]
net: dsa: microchip: Disable PTP function of KSZ8463
The PTP function of KSZ8463 is on by default. However, its proprietary
way of storing timestamp directly in a reserved field inside the PTP
message header is not suitable for use with the current Linux PTP stack
implementation. It is necessary to disable the PTP function to not
interfere the normal operation of the MAC.
Note the PTP driver for KSZ switches does not work for KSZ8463 and is not
activated for it.
Tristram Ha [Fri, 25 Jul 2025 00:17:52 +0000 (17:17 -0700)]
net: dsa: microchip: Setup fiber ports for KSZ8463
The fiber ports in KSZ8463 cannot be detected internally, so it requires
specifying that condition in the device tree. Like the one used in
Micrel PHY the port link can only be read and there is no write to the
PHY. The driver programs registers to operate fiber ports correctly.
Tristram Ha [Fri, 25 Jul 2025 00:17:49 +0000 (17:17 -0700)]
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
KSZ8463 switch is a 3-port switch based from KSZ8863. Its major
difference from other KSZ SPI switches is its register access is not a
simple continual 8-bit transfer with automatic address increase but uses
a byte-enable mechanism specifying 8-bit, 16-bit, or 32-bit access. Its
registers are also defined in 16-bit format because it shares a design
with a MAC controller using 16-bit access. As a result some common
register accesses need to be re-arranged.
This patch adds the basic structure for using KSZ8463. It cannot use the
same regmap table for other KSZ switches as it interprets the 16-bit
value as little-endian and its SPI commands are different.
KSZ8463 uses a byte-enable mechanism to specify 8-bit, 16-bit, and 32-bit
access. The register is first shifted right by 2 then left by 4. Extra
4 bits are added. If the access is 8-bit one of the 4 bits is set. If
the access is 16-bit two of the 4 bits are set. If the access is 32-bit
all 4 bits are set. The SPI command for read or write is then added.
Because of this register transformation separate SPI read and write
functions are provided for KSZ8463.
KSZ8463's internal PHYs use standard PHY register definitions so there is
no need to remap things. However, the hardware has a bug that the high
word and low word of the PHY id are swapped. In addition the port
registers are arranged differently so KSZ8463 has its own mapping for
port registers and PHY registers. Therefore the PORT_CTRL_ADDR macro is
replaced with the get_port_addr helper function.
Currently, UDP exchange is prone to failure when cmd attempt to send data
while socat in bkg is not ready. Since, the behavior is probabilistic, this
can result in flakiness for XDP tests. While testing
test_xdp_native_tx_mb() on netdevsim, a failure rate of around 1% in 500
500 iterations was observed.
Use wait_port_listen() to ensure that the bkg socat is started and ready to
receive before cmd start sending. With proposed changes, a re-run of the
same test passed 100% of time.
====================
arm64: dts: socfpga: enable ethernet support for Agilex5
This patch set enables ethernet support for the Agilex5 family of SOCFPGAs,
and specifically enables gmac2 on the Agilex5 SOCFPGA Premium Development
Kit.
Patch 1 defines Agilex5 compatibility string in the device tree bindings.
Patch 2 add the new compatibility string to dwmac-socfpga.c.
====================
Jakub Kicinski [Fri, 25 Jul 2025 23:48:37 +0000 (16:48 -0700)]
Merge tag 'linux-can-fixes-for-6.16-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-07-25
The patch is by Stephane Grosjean and adds support the recent firmware
of USB CAN FD interfaces to the peak_usb driver.
* tag 'linux-can-fixes-for-6.16-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: peak_usb: fix USB FD devices potential malfunction
====================
The first patch is by Khaled Elnaggar and converts the janz-ican3
driver's fwinfo_show() to sysfs_emit().
Vincent Mailhol contributes 3 patches that first fix a warning in the
ti_hecc driver and then add missing COMPILE_TEST more compile
coverage to the ti_hecc and tscan1 driver.
Randy Dunlap's patch let's the tscan1 driver depend on PC104.
A patch by Luis Felipe Hernandez fixes a kernel-doc error in the
ctucanfd driver.
Jimmy Assarsson contributes 10 patches for the kvaser_pciefd and 11
for the kvaser_usb driver. Both series simplify the identification of
physical the CAN interfaces and add devlink support to get information
about the running firmware.
* tag 'linux-can-next-for-6.17-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (27 commits)
Documentation: devlink: add devlink documentation for the kvaser_usb driver
can: kvaser_usb: Add devlink port support
can: kvaser_usb: Expose device information via devlink info_get()
can: kvaser_usb: Add devlink support
can: kvaser_usb: Store additional device information
can: kvaser_usb: Store the different firmware version components in a struct
can: kvaser_usb: Move comment regarding max_tx_urbs
can: kvaser_usb: Add intermediate variables
can: kvaser_usb: Assign netdev.dev_port based on device channel index
can: kvaser_usb: Add support for ethtool set_phys_id()
can: kvaser_usb: Add support to control CAN LEDs on device
Documentation: devlink: add devlink documentation for the kvaser_pciefd driver
can: kvaser_pciefd: Add devlink port support
can: kvaser_pciefd: Expose device firmware version via devlink info_get()
can: kvaser_pciefd: Add devlink support
can: kvaser_pciefd: Split driver into C-file and header-file.
can: kvaser_pciefd: Store device channel index
can: kvaser_pciefd: Store the different firmware version components in a struct
can: kvaser_pciefd: Add intermediate variable for device struct in probe()
can: kvaser_pciefd: Add support for ethtool set_phys_id()
...
====================
It is a prework to allow reusing some specific Intel code (eq. fwlog).
Move common *_aq_desc structure to libie header and changing
it in ice, ixgbe, i40e and iavf.
Only generic adminq commands can be easily moved to common header, as
rest is slightly different. Format remains the same. It will be better
to correctly move it when it will be needed to commonize other part of
the code.
Move *_aq_str() to new libie module (libie_adminq) and use it across
drivers. The functions are exactly the same in each driver. Some more
adminq helpers/functions can be moved to libie_adminq when needed.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: use libie_aq_str
iavf: use libie_aq_str
ice: use libie_aq_str
libie: add adminq helper for converting err to str
iavf: use libie adminq descriptors
i40e: use libie adminq descriptors
ixgbe: use libie adminq descriptors
ice, libie: move generic adminq descriptors to lib
====================
Wolfram Sang [Fri, 25 Jul 2025 22:59:39 +0000 (00:59 +0200)]
Merge tag 'i2c-host-fixes-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.16-rc8
qup: avoid potential hang when waiting for bus idle
tegra: improve ACPI reset error handling
virtio: use interruptible wait to prevent hang during transfer
res = hfs_find_init(HFS_SB(inode->i_sb)->ext_tree, &fd);
if (!res) {
res = __hfs_ext_cache_extent(&fd, inode, block);
hfs_find_exit(&fd);
}
return res;
}
The problem here that hfs_find_init() is trying to use
HFS_SB(inode->i_sb)->ext_tree that is not initialized yet.
It will be initailized when hfs_btree_open() finishes
the execution.
The patch adds checking of tree pointer in hfs_find_init()
and it reworks the logic of hfs_btree_open() by reading
the b-tree's header directly from the volume. The read_mapping_page()
is exchanged on filemap_grab_folio() that grab the folio from
mapping. Then, sb_bread() extracts the b-tree's header
content and copy it into the folio.
Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250710213657.108285-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
This patch introduces is_bnode_offset_valid() method that checks
the requested offset value. Also, it introduces
check_and_correct_requested_length() method that checks and
correct the requested length (if it is necessary). These methods
are used in hfs_bnode_read(), hfs_bnode_write(), hfs_bnode_clear(),
hfs_bnode_copy(), and hfs_bnode_move() with the goal to prevent
the access out of allocated memory and triggering the crash.
The reason of the issue that code doesn't check the correctness
of the requested offset and length. As a result, incorrect value
of offset or/and length could result in access out of allocated
memory.
This patch introduces is_bnode_offset_valid() method that checks
the requested offset value. Also, it introduces
check_and_correct_requested_length() method that checks and
correct the requested length (if it is necessary). These methods
are used in hfsplus_bnode_read(), hfsplus_bnode_write(),
hfsplus_bnode_clear(), hfsplus_bnode_copy(), and hfsplus_bnode_move()
with the goal to prevent the access out of allocated memory
and triggering the crash.
Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Reported-by: Jiaji Qin <jjtan24@m.fudan.edu.cn> Reported-by: Shuoran Bai <baishuoran@hrbeu.edu.cn> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20250703214804.244077-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
There are cases in networking (e.g. wireguard, sctp) where a union is
used to provide coverage for either IPv4 or IPv6 network addresses,
and they include an embedded "struct sockaddr" as well (for "sa_family"
and raw "sa_data" access). The current struct sockaddr contains a
flexible array, which means these unions should not be further embedded
in other structs because they do not technically have a fixed size (and
are generating warnings for the coming -Wflexible-array-not-at-end flag
addition). But the future changes to make struct sockaddr a fixed size
(i.e. with a 14 byte sa_data member) make the "sa_data" uses with an IPv6
address a potential place for the compiler to get upset about object size
mismatches. Therefore, we need a sockaddr that cleanly provides both an
sa_family member and an appropriately fixed-sized sa_data member that does
not bloat member usage via the potential alternative of sockaddr_storage
to cover both IPv4 and IPv6, to avoid unseemly churn in the affected code
bases.
Introduce sockaddr_inet as a unified structure for holding both IPv4 and
IPv6 addresses (i.e. large enough to accommodate sockaddr_in6).
The structure is defined in linux/in6.h since its max size is sized
based on sockaddr_in6 and provides a more specific alternative to the
generic sockaddr_storage for IPv4 with IPv6 address family handling.
The "sa_family" member doesn't use the sa_family_t type to avoid needing
layer violating header inclusions.
Also includes the replacements for wireguard and sctp.
====================
There are cases in networking (e.g. wireguard, sctp) where a union is
used to provide coverage for either IPv4 or IPv6 network addresses,
and they include an embedded "struct sockaddr" as well (for "sa_family"
and raw "sa_data" access). The current struct sockaddr contains a
flexible array, which means these unions should not be further embedded
in other structs because they do not technically have a fixed size (and
are generating warnings for the coming -Wflexible-array-not-at-end flag
addition). But the future changes to make struct sockaddr a fixed size
(i.e. with a 14 byte sa_data member) make the "sa_data" uses with an IPv6
address a potential place for the compiler to get upset about object size
mismatches. Therefore, we need a sockaddr that cleanly provides both an
sa_family member and an appropriately fixed-sized sa_data member that does
not bloat member usage via the potential alternative of sockaddr_storage
to cover both IPv4 and IPv6, to avoid unseemly churn in the affected code
bases.
Introduce sockaddr_inet as a unified structure for holding both IPv4 and
IPv6 addresses (i.e. large enough to accommodate sockaddr_in6).
The structure is defined in linux/in6.h since its max size is sized
based on sockaddr_in6 and provides a more specific alternative to the
generic sockaddr_storage for IPv4 with IPv6 address family handling.
The "sa_family" member doesn't use the sa_family_t type to avoid needing
layer violating header inclusions.
net/mlx5e: Support routed networks during IPsec MACs initialization
Remote IPsec tunnel endpoint may refer to a network segment that is
not directly connected to the host. In such a case, IPsec tunnel
endpoints are connected to a router and reachable via a routing path.
In IPsec packet offload mode, HW is initialized with the MAC address
of both IPsec tunnel endpoints.
Extend the current IPsec init MACs procedure to resolve nexthop for
routed networks. Direct neighbour lookup and probe is still used
for directly connected networks and as a fallback mechanism if fib
lookup fails.
Signed-off-by: Alexandre Cassen <acassen@corp.free.fr> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1753194228-333722-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
where HFSPLUS_MAX_STRLEN is 255 bytes. The issue happens if length
of the structure instance has value bigger than 255 (for example,
65283). In such case, pointer on unicode buffer is going beyond of
the allocated memory.
The patch fixes the issue by checking the length value of
hfsplus_unistr instance and using 255 value in the case if length
value is bigger than HFSPLUS_MAX_STRLEN. Potential reason of such
situation could be a corruption of Catalog File b-tree's node.
Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org Reviewed-by: Yangtao Li <frank.li@vivo.com> Link: https://lore.kernel.org/r/20250710230830.110500-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
When the volume header contains erroneous values that do not reflect
the actual state of the filesystem, hfsplus_fill_super() assumes that
the attributes file is not yet created, which later results in hitting
BUG_ON() when hfsplus_create_attributes_file() is called. Replace this
BUG_ON() with -EIO error with a message to suggest running fsck tool.
hfsplus: don't set REQ_SYNC for hfsplus_submit_bio()
hfsplus_submit_bio() called by hfsplus_sync_fs() uses bdev_virt_rw() which
in turn uses submit_bio_wait() to submit the BIO.
But submit_bio_wait() already sets the REQ_SYNC flag on the BIO so there
is no need for setting the flag in hfsplus_sync_fs() when calling
hfsplus_submit_bio().
Steven Rostedt [Thu, 12 Jun 2025 14:05:52 +0000 (10:05 -0400)]
tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING
The events sched_move_numa, sched_stick_numa and sched_swap_numa are only
called when CONFIG_NUMA_BALANCING is configured. As each event can take up
to 5K of memory in text and meta data regardless if they are used or not,
they should not be defined when unused.
Move the #ifdef CONFIG_NUMA_BALANCING to hide these events as well.
Rework the read and write code paths in the driver to support operation
in atomic contexts. To achieve this, the driver must not rely on IRQs
or perform any scheduling, e.g., via a sleep or schedule routine.
Implement atomic, sleep-free, and IRQ-less operation. This increases
complexity but is necessary for atomic I2C transfers required by some
hardware configurations, e.g., to trigger reboots on an external PMIC chip.
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Carlos Song <carlos.song@nxp.com> Tested-by: Primoz Fiser <primoz.fiser@norik.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250718133429.67219-3-francesco@dolcini.it
Jonas Karlman [Wed, 23 Jul 2025 08:56:44 +0000 (08:56 +0000)]
dt-bindings: i2c: i2c-rk3x: Allow use of a power-domain
The I2C controllers in most Rockchip SoCs are part of power domains that
are always powered on, i.e. PD_BUS or PD_PMU. These always powered
on power domains have typically not been described in the device tree.
Because these power domains have been left out of the device tree there
has not been any real need to properly describe the I2C controllers
power domain.
On RK3528 the I2C controllers are spread out among the described
PD_RKVENC, PD_VO and PD_VPU power domains. However, one I2C controller
belong to an undescribed always powered on power domain.
Add support to describe an optional power-domains for the I2C
controllers in Rockchip SoCs.
Yuesong Li [Fri, 13 Jun 2025 11:06:38 +0000 (19:06 +0800)]
i2c: lpi2c: convert to use secs_to_jiffies()
Since secs_to_jiffies() has been introduced in commit b35108a51cf7
("jiffies: Define secs_to_jiffies()"), we can use it to avoid scaling
the time to msec.
Qianfeng Rong [Wed, 9 Jul 2025 04:23:46 +0000 (12:23 +0800)]
i2c: st: Use min() to improve code
Use min() to reduce the code and improve its readability.
The type of the max parameter in the st_i2c_rd_fill_tx_fifo()
was changed from int to u32, because the max parameter passed
in is always greater than 0.
====================
net: dsa: b53: mmap: Add bcm63xx EPHY power control
The gpio controller on some bcm63xx SoCs has a register for
controlling functionality of the internal fast ethernet phys.
These patches allow the b53 driver to enable/disable phy
power.
The register also contains reset bits which will be set by
a reset driver in another patch series:
https://lore.kernel.org/all/20250715234605.36216-1-kylehendrydev@gmail.com/
net: dsa: b53: mmap: Add syscon reference and register layout for bcm63268
On bcm63xx SoCs there are registers that control the PHYs in
the GPIO controller. Allow the b53 driver to access them
by passing in the syscon through the device tree.
Add a structure to describe the ephy control register
and add register info for bcm63268.
parisc: Revise __get_user() to probe user read access
Because of the way read access support is implemented, read access
interruptions are only triggered at privilege levels 2 and 3. The
kernel executes at privilege level 0, so __get_user() never triggers
a read access interruption (code 26). Thus, it is currently possible
for user code to access a read protected address via a system call.
Fix this by probing read access rights at privilege level 3 (PRIV_USER)
and setting __gu_err to -EFAULT (-14) if access isn't allowed.
Note the cmpiclr instruction does a 32-bit compare because COND macro
doesn't work inside asm.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+
parisc: Revise gateway LWS calls to probe user read access
We use load and stbys,e instructions to trigger memory reference
interruptions without writing to memory. Because of the way read
access support is implemented, read access interruptions are only
triggered at privilege levels 2 and 3. The kernel and gateway
page execute at privilege level 0, so this code never triggers
a read access interruption. Thus, it is currently possible for
user code to execute a LWS compare and swap operation at an
address that is read protected at privilege level 3 (PRIV_USER).
Fix this by probing read access rights at privilege level 3 and
branching to lws_fault if access isn't allowed.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+
parisc: Check region is readable by user in raw_copy_from_user()
Because of the way the _PAGE_READ is handled in the parisc PTE, an
access interruption is not generated when the kernel reads from a
region where the _PAGE_READ is zero. The current code was written
assuming read access faults would also occur in the kernel.
This change adds user access checks to raw_copy_from_user(). The
prober_user() define checks whether user code has read access to
a virtual address. Note that page faults are not handled in the
exception support for the probe instruction. For this reason, we
precede the probe by a ldb access check.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+
int main(int argc, char **argv)
{
unsigned long page_size = sysconf(_SC_PAGESIZE);
char *p = malloc(3 * page_size);
char *p_aligned;
/* initialize memory region. If not initialized, write syscall below will correctly return EFAULT. */
if (1)
memset(p, 'X', 3 * page_size);
p_aligned = (char *) ((((uintptr_t) p) + (2*page_size - 1)) & ~(page_size - 1));
/* Drop PROT_READ protection. Kernel and userspace should fault when accessing that memory region */
mprotect(p_aligned, page_size, PROT_NONE);
/* the following write() should return EFAULT, since PROT_READ was dropped by previous mprotect() */
int ret = write(2, p_aligned, 1);
if (!ret || errno != EFAULT)
printf("\n FAILURE: write() did not returned expected EFAULT value\n");
return 0;
}
Because of the way _PAGE_READ is handled, kernel code never generates
a read access fault when it access a page as the kernel privilege level
is always less than PL1 in the PTE.
This patch reworks the comments in the make_insert_tlb macro to try
to make this clearer.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.12+
Randy Dunlap [Wed, 25 Jun 2025 07:30:54 +0000 (00:30 -0700)]
parisc: Makefile: explain that 64BIT requires both 32-bit and 64-bit compilers
For building a 64-bit kernel, both 32-bit and 64-bit VDSO binaries
are built, so both 32-bit and 64-bit compilers (and tools) should be
in the PATH environment variable.
Tomas Glozar [Thu, 26 Jun 2025 12:34:04 +0000 (14:34 +0200)]
rtla/tests: Limit duration to maximum of 10s
Many of the original rtla tests included durations of 1 minute and 30
seconds. Experience has shown this is unnecessary, since 10 seconds as
waiting time for samples to appear.
Change duration of all rtla tests to at most 10 seconds. This speeds up
testing significantly.
Tomas Glozar [Thu, 26 Jun 2025 12:34:02 +0000 (14:34 +0200)]
rtla/tests: Check rtla output with grep
Add argument to the check command in the test suite that takes a regular
expression that the output of rtla command is checked against. This
allows testing for specific information in rtla output in addition
to checking the return value.
Two minor improvements are included: running rtla with "eval" so that
arguments with spaces can be passed to it via shell quotations, and
the stdout of pushd and popd is suppressed to clean up the test output.
Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-7-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tomas Glozar [Thu, 26 Jun 2025 12:34:01 +0000 (14:34 +0200)]
rtla/timerlat: Add action on end feature
Implement actions on end next to actions on threshold. A new option,
--on-end is added, parallel to --on-threshold. Instead of being
executed whenever a latency threshold is reached, it is executed at the
end of the measurement.
For example:
$ rtla timerlat hist -d 5s --on-end trace
will save the trace output at the end.
All actions supported by --on-threshold are also supported by --on-end,
except for continue, which does nothing with --on-end.
Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The feature is supported for both hist and top. After the continue
action is executed, processing of the list of actions is stopped and
tracing is resumed.
Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tomas Glozar [Thu, 26 Jun 2025 12:33:59 +0000 (14:33 +0200)]
rtla/timerlat_bpf: Allow resuming tracing
Currently, rtla-timerlat BPF program uses a global variable stored in a
.bss section to store whether tracing has been stopped.
Move the information to a separate map, so that it is easily writable
from userspace, and add a function that clears the value, resuming
tracing after it has been stopped.
Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-4-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tomas Glozar [Thu, 26 Jun 2025 12:33:58 +0000 (14:33 +0200)]
rtla/timerlat: Add action on threshold feature
Extend the functionality provided by the -t/--trace option, which
triggers saving the contents of a tracefs buffer after tracing is
stopped, to support implementing arbitrary actions.
A new option, --on-threshold, is added, taking an argument
that further specifies the action. Actions added in this patch are:
- trace[,file=<filename>]: Saves tracefs buffer, optionally taking a
filename.
- signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might
be specified instead of number to send signal to parent process.
- shell,command=<command>: Execute shell command.
Multiple actions may be specified and will be executed in order,
including multiple actions of the same type. Trace output requested via
-t and -a now adds a trace action to the end of the list.
If an action fails, the following actions are not executed. For
example, this command:
After the introduction of BPF-based sample collection, rtla-timerlat
effectively runs in one of three modes:
- Pure BPF mode, with tracefs only being used to set up the timerlat
tracer. Sample processing and stop on threshold are handled by BPF.
- tracefs mode. BPF is unsupported or kernel is lacking the necessary
trace event (osnoise:timerlat_sample). Stop on theshold is handled by
timerlat tracer stopping tracing in all instances.
- BPF/tracefs mixed mode - BPF is used for sample collection for top or
histogram, tracefs is used for trace output and/or auto-analysis. Stop
on threshold is handled both through BPF program, which stops sample
collection for top/histogram and wakes up rtla, and by timerlat
tracer, which stops tracing for trace output/auto-analysis instances.
Add enum timerlat_tracing_mode, with three values:
Those represent the modes described above. A field of this type is added
to struct timerlat_params, named "mode", replacing the no_bpf variable.
params->mode is set in timerlat_{top,hist}_parse_args to
TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output
and/or auto-analysis is requested. timerlat_{top,hist}_main then checks
if BPF is not unavailable or disabled, in that case, it sets
params->mode to TRACING_MODE_TRACEFS.
A condition is added to timerlat_apply_config that skips setting
timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those
are unnecessary, since they only turn off tracing, which is already
turned off in that case, since BPF is used to collect samples).
Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Commit 21b688dabecb ("net: phy: micrel: Cable Diag feature for lan8814
phy") introduced cable_test support for the LAN8814 that reuses parts of
the KSZ886x logic and introduced the cable_diag_reg and pair_mask
parameters to account for differences between those chips.
However, it did not update the ksz8081_type struct, so those members are
now 0, causing no pairs to be tested in ksz886x_cable_test_get_status
and ksz886x_cable_test_wait_for_completion to poll the wrong register
for the affected PHYs (Basic Control/Reset, which is 0 in normal
operation) and exit immediately.
Fix this by setting both struct members accordingly.
Merge tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes (part 2) from Dave Airlie:
"Just the follow up fixes for i915 and xe, all pretty minor.
i915:
- Fix DP 2.7 Gbps DP_LINK_BW value on g4x
- Fix return value on intel_atomic_commit_fence_wait
xe:
- Fix build without debugfs"
* tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe: Fix build without debugfs
drm/i915/display: Fix dma_fence_wait_timeout() return value handling
drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
dt-bindings: display: mediatek,dp: Allow DisplayPort AUX bus
Like others, the MediaTek DisplayPort controller provides an
auxiliary bus: import the common dp-aux-bus.yaml in this binding
to allow specifying an aux-bus subnode.
Tristram Ha [Wed, 23 Jul 2025 03:04:03 +0000 (20:04 -0700)]
net: dsa: microchip: Fix wrong rx drop MIB counter for KSZ8863
When KSZ8863 support was first added to KSZ driver the RX drop MIB
counter was somehow defined as 0x105. The TX drop MIB counter
starts at 0x100 for port 1, 0x101 for port 2, and 0x102 for port 3, so
the RX drop MIB counter should start at 0x103 for port 1, 0x104 for
port 2, and 0x105 for port 3.
There are 5 ports for KSZ8895, so its RX drop MIB counter starts at
0x105.
Fixes: 4b20a07e103f ("net: dsa: microchip: ksz8795: add support for ksz88xx chips") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250723030403.56878-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gabriel Goller [Tue, 22 Jul 2025 08:18:45 +0000 (10:18 +0200)]
ipv6: add `force_forwarding` sysctl to enable per-interface forwarding
It is currently impossible to enable ipv6 forwarding on a per-interface
basis like in ipv4. To enable forwarding on an ipv6 interface we need to
enable it on all interfaces and disable it on the other interfaces using
a netfilter rule. This is especially cumbersome if you have lots of
interfaces and only want to enable forwarding on a few. According to the
sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding
for all interfaces, while the interface-specific
`net.ipv6.conf.<interface>.forwarding` configures the interface
Host/Router configuration.
Introduce a new sysctl flag `force_forwarding`, which can be set on every
interface. The ip6_forwarding function will then check if the global
forwarding flag OR the force_forwarding flag is active and forward the
packet.
To preserve backwards-compatibility reset the flag (on all interfaces)
to 0 if the net.ipv6.conf.all.forwarding flag is set to 0.
Add a short selftest that checks if a packet gets forwarded with and
without `force_forwarding`.
Add missing description for AMD/Xilinx interrupt controller. The binding is
used by Microblaze before dt-binding even existed but never been
documented properly.
IP acts as primary interrupt controller on Microblaze systems or can be
used as secondary interrupt controller on ARM based systems like Zynq,
ZynqMP, Versal or Versal Gen 2. Also as secondary interrupt controller on
Microblaze-V (Risc-V) systems.
Over the years IP exists in multiple variants based on attached bus as OPB,
PLB or AXI that's why generic filename is used.
Property xlnx,kind-of-intr is in hex because every bit position corresponds
to interrupt line. Controller support mixing edge or level interrupts
together and this is the property which distinguish them.
Simon Horman [Thu, 24 Jul 2025 13:10:54 +0000 (14:10 +0100)]
octeontx2-af: use unsigned int as iterator for unsigned values
The local variable i is used to iterate over unsigned
values. The lower bound of the loop is set to 0. While
the upper bound is cgx->lmac_count, where they lmac_count is
an u8. So the theoretical upper bound is 255.
As is, GCC can't see this range of values and warns that
a formatted string, which includes the %d representation of i,
may overflow the buffer provided.
GCC 15.1.0 says:
.../cgx.c: In function 'cgx_lmac_init':
.../cgx.c:1737:49: warning: '%d' directive writing between 1 and 11 bytes into a region of size between 4 and 6 [-Wformat-overflow=]
1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i);
| ^~
.../cgx.c:1737:37: note: directive argument in the range [-2147483641, 254]
1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i);
| ^~~~~~~~~~~~~~~
.../cgx.c:1737:17: note: 'sprintf' output between 12 and 24 bytes into a destination of size 16
1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Empirically, changing the type of i from (signed) int to unsigned int
addresses this problem. I assume by allowing GCC to see the range of
values described above.
Also update the format specifiers for the integer values in the string
in question from %d to %u. This seems appropriate as they are now both
unsigned.
No functional change intended.
Compile tested only.
Paolo Abeni [Wed, 23 Jul 2025 14:32:23 +0000 (16:32 +0200)]
mptcp: track fallbacks accurately via mibs
Add the mibs required to cover the few possible fallback causes still
lacking suck info.
Move the relevant mib increment into the fallback helper, so that no
eventual future fallback operation will miss a paired mib increment.
Additionally track failed fallback via its own mib, such mib is
incremented only when a fallback mandated by the protocol fails - due to
racing subflow creation.
While at the above, rename an existing helper to reduce long lines
problems all along.
selftests: net: Skip test if IPv6 is not configured
Extend the `check_for_dependencies()` function in `lib_netcons.sh` to check
whether IPv6 is enabled by verifying the existence of
`/proc/net/if_inet6`. Having IPv6 is a now a dependency of netconsole
tests. If the file does not exist, the script will skip the test with an
appropriate message suggesting to verify if `CONFIG_IPV6` is enabled.
This prevents the test to misbehave if IPv6 is not configured.
Yi Cong [Thu, 24 Jul 2025 01:31:33 +0000 (09:31 +0800)]
usbnet: Set duplex status to unknown in the absence of MII
Currently, USB CDC devices that do not use MDIO to get link status have
their duplex mode set to half-duplex by default. However, since the CDC
specification does not define a duplex status, this can be misleading.
This patch changes the default to DUPLEX_UNKNOWN in the absence of MII,
which more accurately reflects the state of the link and avoids implying
an incorrect or error state.
Which is similar to recent syzkaller reports in [0] and [1] and triggers
because macsec does not advertise IFF_UNICAST_FLT although it has proper
ndo_set_rx_mode callback that takes care of pushing uc/mc addresses
down to the real device.
In general, dev_uc_add call path is problematic for stacking
non-IFF_UNICAST_FLT because we might grab netdev instance lock under
addr_list_lock spinlock, so this is not a systemic fix.
John Ernberg [Wed, 23 Jul 2025 10:25:35 +0000 (10:25 +0000)]
net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.
Interrupt URBs will in consecutive polls grab:
* Link Connected
* Link Disconnected
* Link Connected
Where the last Connected is then a stable link state.
When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.
In that event the URB queue is filled faster than it can drain, ending up
in a RCU stall:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
net: hibmcge: support for statistics of reset failures
Add a statistical item to count the number of reset failures.
This statistical item can be queried using ethtool -S or
reported through diagnose information.
Shahar Shitrit [Wed, 23 Jul 2025 07:44:32 +0000 (10:44 +0300)]
net/mlx5e: Fix potential deadlock by deferring RX timeout recovery
mlx5e_reporter_rx_timeout() is currently invoked synchronously
in the driver's open error flow. This causes the thread holding
priv->state_lock to attempt acquiring the devlink lock, which
can result in a circular dependency with other devlink operations.
For example:
- Devlink health diagnose flow:
- __devlink_nl_pre_doit() acquires the devlink lock.
- devlink_nl_health_reporter_diagnose_doit() invokes the
driver's diagnose callback.
- mlx5e_rx_reporter_diagnose() then attempts to acquire
priv->state_lock.
- Driver open flow:
- mlx5e_open() acquires priv->state_lock.
- If an error occurs, devlink_health_reporter may be called,
attempting to acquire the devlink lock.
To prevent this circular locking scenario, defer the RX timeout
recovery by scheduling it via a workqueue. This ensures that the
recovery work acquires locks in a consistent order: first the
devlink lock, then priv->state_lock.
Additionally, make the recovery work acquire the netdev instance
lock to safely synchronize with the open/close channel flows,
similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire
the netdev instance lock until it is taken or the target RQ is no
longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit.
Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianbo Liu [Wed, 23 Jul 2025 07:44:31 +0000 (10:44 +0300)]
net/mlx5e: Remove skb secpath if xfrm state is not found
Hardware returns a unique identifier for a decrypted packet's xfrm
state, this state is looked up in an xarray. However, the state might
have been freed by the time of this lookup.
Currently, if the state is not found, only a counter is incremented.
The secpath (sp) extension on the skb is not removed, resulting in
sp->len becoming 0.
Subsequently, functions like __xfrm_policy_check() attempt to access
fields such as xfrm_input_state(skb)->xso.type (which dereferences
sp->xvec[sp->len - 1]) without first validating sp->len. This leads to
a crash when dereferencing an invalid state pointer.
This patch prevents the crash by explicitly removing the secpath
extension from the skb if the xfrm state is not found after hardware
decryption. This ensures downstream functions do not operate on a
zero-length secpath.
net/mlx5e: Clear Read-Only port buffer size in PBMC before update
When updating the PBMC register, we read its current value,
modify desired fields, then write it back.
The port_buffer_size field within PBMC is Read-Only (RO).
If this RO field contains a non-zero value when read,
attempting to write it back will cause the entire PBMC
register update to fail.
This commit ensures port_buffer_size is explicitly cleared
to zero after reading the PBMC register but before writing
back the modified value.
This allows updates to other fields in the PBMC register to succeed.
Ian Rogers [Thu, 24 Jul 2025 16:33:02 +0000 (09:33 -0700)]
perf sort: Use perf_env to set arch sort keys and header
Previously arch_support_sort_key and arch_perf_header_entry used a
weak symbol to compile as appropriate for x86 and powerpc. A
limitation to this is that the handling of a data file could vary in
cross-platform development. Change to using the perf_env of the
current session to determine the architecture kind and set the sort
key and header entries as appropriate.