Julian Braha [Tue, 31 Mar 2026 07:42:42 +0000 (08:42 +0100)]
cpufreq: clean up dead code in Kconfig
There is already an 'if CPU_FREQ' condition wrapping these config
options, making the 'depends on' statement for each a duplicate
dependency (dead code).
Leave the outer 'if CPU_FREQ...endif' and remove the individual
'depends on' statement from each option.
This dead code was found by kconfirm, a static analysis tool for
Kconfig.
Viresh Kumar [Tue, 31 Mar 2026 05:03:46 +0000 (10:33 +0530)]
cpufreq: Allocate QoS freq_req objects with policy
A recent change exposed a bug in the error path: if
freq_qos_add_request(boost_freq_req) fails, min_freq_req may remain a
valid pointer even though it was never successfully added. During policy
teardown, this leads to an unconditional call to
freq_qos_remove_request(), triggering a WARN.
The current design allocates all three freq_req objects together, making
the lifetime rules unclear and error handling fragile.
Simplify this by allocating the QoS freq_req objects at policy
allocation time. The policy itself is dynamically allocated, and two of
the three requests are always needed anyway. This ensures consistent
lifetime management and eliminates the inconsistent state in failure
paths.
Reported-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Fixes: 6e39ba4e5a82 ("cpufreq: Add boost_freq_req QoS request") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Tested-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Link: https://patch.msgid.link/a293f29d841b86c51f34699c6e717e01858d8ada.1774933424.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'counter-fixes-for-7.0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-linus
William writes:
Counter fixes for 7.0
Two fixes for rz-mut3-cnt: synchronize runtime PM usage count to toggle
state of the counter, and set counter->parent during probe to ensure the
current dev pointer is accessed during driver operation.
* tag 'counter-fixes-for-7.0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
counter: rz-mtu3-cnt: do not use struct rz_mtu3_channel's dev member
counter: rz-mtu3-cnt: prevent counter from being toggled multiple times
Simon Trimmer [Tue, 31 Mar 2026 13:19:16 +0000 (13:19 +0000)]
ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
Ensure that subsystem_device is printed with leading zeros when combined
with subsystem_vendor to form the SSID. Without this, devices with upper
bits unset may appear to have an incorrect SSID in the debug output.
Merge tag 'qcom-drivers-fixes-for-7.0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
Qualcomm driver fixes for v7.0
Fix the length of the PD restart reason string in pd-mapper to avoid
QMI decoding errors, resulting in the notification being dropped.
Fix the newly introduce handling of TBT/USB4 notifications in pmic_glink
altmode driver, as it broke the handling of non-TBT/USB4 DisplayPort
unplug events.
* tag 'qcom-drivers-fixes-for-7.0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: pmic_glink_altmode: Fix TBT->SAFE->!TBT transition
soc: qcom: pmic_glink_altmode: Fix SVID=DP && unconnected edge case
soc: qcom: pd-mapper: Fix element length in servreg_loc_pfr_req_ei
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'riscv-soc-fixes-for-v7.0-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes
RISC-V soc fixes for v7.0-rc6
Microchip:
More resource leak fixes for unlikely scenarios, and a change to the
auto-update "firmware" driver to prevent it probing on systems with
engineering silicon where it cannot be used.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-soc-fixes-for-v7.0-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/conor/linux:
firmware: microchip: fail auto-update probe if no flash found
soc: microchip: mpfs-mss-top-sysreg: Fix resource leak on driver unbind
soc: microchip: mpfs-control-scb: Fix resource leak on driver unbind
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'aspeed-7.0-fixes-0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bmc/linux into arm/fixes
aspeed: first batch of fixes for v7.0
* tag 'aspeed-7.0-fixes-0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bmc/linux:
soc: aspeed: socinfo: Mask table entries for accurate SoC ID matching
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Merge tag 'reset-fixes-for-v7.0-2' of https://git.pengutronix.de/git/pza/linux into arm/fixes
Reset controller fixes for v7.0, part 2
* Decouple spacemit K3 reset lines that were incorrectly coupled
together as one, but are in fact separate resets in hardware.
* Fix a double free in the reset_add_gpio_aux_device() error path.
This has already been fixed on reset/next by commit a9b95ce36de4
("reset: gpio: add a devlink between reset-gpio and its consumer").
* Fix the MODULE_AUTHOR string in the rzg2l-usbphy-ctrl driver.
* tag 'reset-fixes-for-v7.0-2' of https://git.pengutronix.de/git/pza/linux:
reset: spacemit: k3: Decouple composite reset lines
reset: gpio: fix double free in reset_add_gpio_aux_device() error path
reset: rzg2l-usbphy-ctrl: Fix malformed MODULE_AUTHOR string
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
firmware: thead: Fix buffer overflow and use standard endian macros
Addresses two issues in the TH1520 AON firmware protocol driver:
1. Fix a potential buffer overflow where the code used unsafe pointer
arithmetic to access the 'mode' field through the 'resource' pointer
with an offset. This was flagged by Smatch static checker as:
"buffer overflow 'data' 2 <= 3"
2. Replace custom RPC_SET_BE* and RPC_GET_BE* macros with standard
kernel endianness conversion macros (cpu_to_be16, etc.) for better
portability and maintainability.
The functionality was re-tested with the GPU power-up sequence,
confirming the GPU powers up correctly and the driver probes
successfully.
[ 12.702370] powervr ffef400000.gpu: [drm] loaded firmware
powervr/rogue_36.52.104.182_v1.fw
[ 12.711043] powervr ffef400000.gpu: [drm] FW version v1.0 (build 6645434 OS)
[ 12.719787] [drm] Initialized powervr 1.0.0 for ffef400000.gpu on
minor 0
Fixes: e4b3cbd840e5 ("firmware: thead: Add AON firmware protocol driver") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/17a0ccce-060b-4b9d-a3c4-8d5d5823b1c9@stanley.mountain/ Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Drew Fustini <fustini@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
nft_queue is always used from userspace nftables to deliver the NF_QUEUE
verdict. Immediately emitting an NF_QUEUE verdict is never used by the
userspace nft tools, so reject immediate NF_QUEUE verdicts.
The arp family does not provide queue support, but such an immediate
verdict is still reachable. Globally reject NF_QUEUE immediate verdicts
to address this issue.
netfilter: x_tables: restrict xt_check_match/xt_check_target extensions for NFPROTO_ARP
Weiming Shi says:
xt_match and xt_target structs registered with NFPROTO_UNSPEC can be
loaded by any protocol family through nft_compat. When such a
match/target sets .hooks to restrict which hooks it may run on, the
bitmask uses NF_INET_* constants. This is only correct for families
whose hook layout matches NF_INET_*: IPv4, IPv6, INET, and bridge
all share the same five hooks (PRE_ROUTING ... POST_ROUTING).
ARP only has three hooks (IN=0, OUT=1, FORWARD=2) with different
semantics. Because NF_ARP_OUT == 1 == NF_INET_LOCAL_IN, the .hooks
validation silently passes for the wrong reasons, allowing matches to
run on ARP chains where the hook assumptions (e.g. state->in being
set on input hooks) do not hold. This leads to NULL pointer
dereferences; xt_devgroup is one concrete example:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000044: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000220-0x0000000000000227]
RIP: 0010:devgroup_mt+0xff/0x350
Call Trace:
<TASK>
nft_match_eval (net/netfilter/nft_compat.c:407)
nft_do_chain (net/netfilter/nf_tables_core.c:285)
nft_do_chain_arp (net/netfilter/nft_chain_filter.c:61)
nf_hook_slow (net/netfilter/core.c:623)
arp_xmit (net/ipv4/arp.c:666)
</TASK>
Kernel panic - not syncing: Fatal exception in interrupt
Fix it by restricting arptables to NFPROTO_ARP extensions only.
Note that arptables-legacy only supports:
- arpt_CLASSIFY
- arpt_mangle
- arpt_MARK
that provide explicit NFPROTO_ARP match/target declarations.
Fixes: 9291747f118d ("netfilter: xtables: add device group match") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yifan Wu [Mon, 30 Mar 2026 21:39:24 +0000 (14:39 -0700)]
netfilter: ipset: drop logically empty buckets in mtype_del
mtype_del() counts empty slots below n->pos in k, but it only drops the
bucket when both n->pos and k are zero. This misses buckets whose live
entries have all been removed while n->pos still points past deleted slots.
Treat a bucket as empty when all positions below n->pos are unused and
release it directly instead of shrinking it further.
Fixes: 8af1c6fbd923 ("netfilter: ipset: Fix forceadd evaluation path") Cc: stable@vger.kernel.org Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <dstsmallbird@foxmail.com> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ctnetlink: ignore explicit helper on new expectations
Use the existing master conntrack helper, anything else is not really
supported and it just makes validation more complicated, so just ignore
what helper userspace suggests for this expectation.
This was uncovered when validating CTA_EXPECT_CLASS via different helper
provided by userspace than the existing master conntrack helper:
BUG: KASAN: slab-out-of-bounds in nf_ct_expect_related_report+0x2479/0x27c0
Read of size 4 at addr ffff8880043fe408 by task poc/102
Call Trace:
nf_ct_expect_related_report+0x2479/0x27c0
ctnetlink_create_expect+0x22b/0x3b0
ctnetlink_new_expect+0x4bd/0x5c0
nfnetlink_rcv_msg+0x67a/0x950
netlink_rcv_skb+0x120/0x350
Allowing to read kernel memory bytes off the expectation boundary.
CTA_EXPECT_HELP_NAME is still used to offer the helper name to userspace
via netlink dump.
Fixes: bd0779370588 ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Reported-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Qi Tang [Tue, 31 Mar 2026 06:17:12 +0000 (14:17 +0800)]
netfilter: ctnetlink: zero expect NAT fields when CTA_EXPECT_NAT absent
ctnetlink_alloc_expect() allocates expectations from a non-zeroing
slab cache via nf_ct_expect_alloc(). When CTA_EXPECT_NAT is not
present in the netlink message, saved_addr and saved_proto are
never initialized. Stale data from a previous slab occupant can
then be dumped to userspace by ctnetlink_exp_dump_expect(), which
checks these fields to decide whether to emit CTA_EXPECT_NAT.
The safe sibling nf_ct_expect_init(), used by the packet path,
explicitly zeroes these fields.
Zero saved_addr, saved_proto and dir in the else branch, guarded
by IS_ENABLED(CONFIG_NF_NAT) since these fields only exist when
NAT is enabled.
Confirmed by priming the expect slab with NAT-bearing expectations,
freeing them, creating a new expectation without CTA_EXPECT_NAT,
and observing that the ctnetlink dump emits a spurious
CTA_EXPECT_NAT containing stale data from the prior allocation.
Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Qi Tang [Sun, 29 Mar 2026 16:50:36 +0000 (00:50 +0800)]
netfilter: nf_conntrack_helper: pass helper to expect cleanup
nf_conntrack_helper_unregister() calls nf_ct_expect_iterate_destroy()
to remove expectations belonging to the helper being unregistered.
However, it passes NULL instead of the helper pointer as the data
argument, so expect_iter_me() never matches any expectation and all
of them survive the cleanup.
After unregister returns, nfnl_cthelper_del() frees the helper
object immediately. Subsequent expectation dumps or packet-driven
init_conntrack() calls then dereference the freed exp->helper,
causing a use-after-free.
Pass the actual helper pointer so expectations referencing it are
properly destroyed before the helper object is freed.
BUG: KASAN: slab-use-after-free in string+0x38f/0x430
Read of size 1 at addr ffff888003b14d20 by task poc/103
Call Trace:
string+0x38f/0x430
vsnprintf+0x3cc/0x1170
seq_printf+0x17a/0x240
exp_seq_show+0x2e5/0x560
seq_read_iter+0x419/0x1280
proc_reg_read+0x1ac/0x270
vfs_read+0x179/0x930
ksys_read+0xef/0x1c0
Freed by task 103:
The buggy address is located 32 bytes inside of
freed 192-byte region [ffff888003b14d00, ffff888003b14dc0)
Fixes: ac7b84839003 ("netfilter: expect: add and use nf_ct_expect_iterate helpers") Signed-off-by: Qi Tang <tpluszz77@gmail.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: flowtable: strictly check for maximum number of actions
The maximum number of flowtable hardware offload actions in IPv6 is:
* ethernet mangling (4 payload actions, 2 for each ethernet address)
* SNAT (4 payload actions)
* DNAT (4 payload actions)
* Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing)
for QinQ.
* Redirect (1 action)
Which makes 17, while the maximum is 16. But act_ct supports for tunnels
actions too. Note that payload action operates at 32-bit word level, so
mangling an IPv6 address takes 4 payload actions.
Update flow_action_entry_next() calls to check for the maximum number of
supported actions.
While at it, rise the maximum number of actions per flow from 16 to 24
so this works fine with IPv6 setups.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ard Biesheuvel [Thu, 26 Mar 2026 18:04:36 +0000 (19:04 +0100)]
lis3lv02d: Omit IRQF_ONESHOT if no threaded handler is provided
The lis3lv02d started triggering a WARN in the IRQ code because it
passes IRQF_ONESHOT to request_threaded_irq() even when thread_fn is
NULL, which is an invalid combination.
Randy Dunlap [Thu, 12 Mar 2026 05:14:00 +0000 (22:14 -0700)]
lis3lv02d: fix kernel-doc warnings
Use the correct kernel-doc format to avoid kernel-doc warnings:
Warning: include/linux/lis3lv02d.h:125 struct member 'st_min_limits' not
described in 'lis3lv02d_platform_data'
Warning: include/linux/lis3lv02d.h:125 struct member 'st_max_limits' not
described in 'lis3lv02d_platform_data'
ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
Same issue as the other 1st Gen Scarletts: QUIRK_FLAG_SKIP_IFACE_SETUP
causes distorted audio on this revision of the Scarlett 2i2 1st Gen
(1235:8016).
The MCP251xFD provides a dedicated transceiver standby control function via
the INT0/GPIO0/XSTBY pin, controlled by the XSTBYEN bit in IOCON. When
enabled, the hardware automatically drives the pin low while the controller
is active and high when it enters Sleep mode, allowing automatic standby
control of an external CAN transceiver without software intervention.
This series adds driver support for the XSTBYEN-based transceiver standby
control feature.
Tested on QCS6490 RB3 Gen2 with a PCAN-USB FD adapter: the transceiver is
active in normal mode, CAN communication works correctly, and the pin is
automatically managed across sleep and wake transitions.
Viken Dadhaniya [Sat, 21 Mar 2026 13:50:31 +0000 (19:20 +0530)]
can: mcp251xfd: add support for XSTBYEN transceiver standby control
The MCP251xFD has a dedicated transceiver standby control function on
the INT0/GPIO0/XSTBY pin, controlled by the XSTBYEN bit in IOCON.
When enabled, the hardware automatically manages the transceiver
standby state: the pin is driven low when the controller is active
and high when it enters Sleep mode.
Enable this feature when the 'microchip,xstbyen' device tree property
is present.
net: can: ctucanfd: remove useless copy of PCI_DEVICE_DATA macro
The ctucanfd driver has its own copy of the PCI_DEVICE_DATA macro. I
assume this was done to support older kernel versions where it didn't
exist, but that is irrelevant once the driver is in the mainline
kernel. Remove it.
Add the boolean property 'microchip,xstbyen' to enable the dedicated
transceiver standby control function on the INT0/GPIO0/XSTBY pin of
the MCP251xFD family.
powerpc/net: Inline checksum wrappers and convert to scoped user access
Commit 861574d51bbd ("powerpc/uaccess: Implement masked user access")
provides optimised user access by avoiding the cost of access_ok().
Convert csum_and_copy_to_user() and csum_and_copy_from_user() to
scoped user access to benefit from masked user access.
csum_and_copy_to_user() and csum_and_copy_from_user() are only
called respectively by csum_and_copy_to_iter() and
csum_and_copy_from_iter_full() and they are only called twice.
Those functions used to be large but they were first reduced by
commit c693cc4676a0 ("saner calling conventions for
csum_and_copy_..._user()") then commit 70d65cd555c5 ("ppc: propagate
the calling conventions change down to csum_partial_copy_generic()").
With the additional size reduction provided by conversion to scoped
user access they are not worth being kept out of line.
Christophe Leroy [Tue, 10 Mar 2026 15:08:07 +0000 (16:08 +0100)]
powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
Commit e65e1fc2d24b ("[PATCH] syscall class hookup for all normal
targets") added generic support for AUDIT but that didn't include
support for bi-arch like powerpc.
Commit 4b58841149dc ("audit: Add generic compat syscall support")
added generic support for bi-arch.
Convert powerpc to that bi-arch generic audit support.
With this change generated text is similar.
Thomas has confirmed that the previously failing filter_exclude/test
is now successful both without and with this patch, see [1]
Shrikanth Hegde [Wed, 11 Mar 2026 06:17:09 +0000 (11:47 +0530)]
cpuidle: powerpc: avoid double clear when breaking snooze
snooze_loop is done often in any system which has fair bit of
idle time. So it qualifies for even micro-optimizations.
When breaking the snooze due to timeout, TIF_POLLING_NRFLAG is cleared
twice. Clearing the bit invokes atomics. Avoid double clear and thereby
avoid one atomic write.
dev->poll_time_limit indicates whether the loop was broken due to
timeout. Use that instead of defining a new variable.
Fixes: 7ded429152e8 ("cpuidle: powerpc: no memory barrier after break from idle") Cc: stable@vger.kernel.org Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260311061709.1230440-1-sshegde@linux.ibm.com
Randy Dunlap [Wed, 25 Feb 2026 05:53:28 +0000 (21:53 -0800)]
powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings
Fix enum and function return value kernel-doc warnings:
Warning: spu.c:36 Excess enum value '%spe_type_logical' description in 'spe_type'
Warning: spu.c:78 Excess enum value '%spe_ex_state_unexecutable' description in 'spe_ex_state'
Warning: spu.c:78 Excess enum value '%spe_ex_state_executable' description in 'spe_ex_state'
Warning: spu.c:78 Excess enum value '%spe_ex_state_executed' description in 'spe_ex_state'
Warning: spu.c:190 No description found for return value of 'setup_areas'
Randy Dunlap [Wed, 25 Feb 2026 05:53:14 +0000 (21:53 -0800)]
powerpc: kgdb: fix kernel-doc warnings
Remove empty comment line at the beginning of a kernel-doc function
block. Add a "Return:" section for this function.
These changes prevent 2 kernel-doc warnings:
Warning: ../arch/powerpc/kernel/kgdb.c:103 Cannot find identifier on line:
*
Warning: kgdb.c:113 No description found for return value of 'kgdb_skipexception'
Randy Dunlap [Sat, 29 Nov 2025 18:36:36 +0000 (10:36 -0800)]
powerpc/ps3: fix ps3.h kernel-doc warnings
Fix some kernel-doc warnings in ps3.h:
- add @dev to struct ps3_dma_region
- don't mark a function as "struct"
- add Returns: description for one function
- add a short description for ps3_system_bus_set_drvdata()
- correct an enum @name
- move intervening "struct ps3_system_bus_device;" from between
kernel-doc for ps3_dma_region_init() and the function declaration
to eliminate these warnings:
Warning: arch/powerpc/include/asm/ps3.h:96 struct member 'dev' not
described in 'ps3_dma_region'
Warning: arch/powerpc/include/asm/ps3.h:118 struct ps3_system_bus_device;
error: Cannot parse struct or union!
Warning: arch/powerpc/include/asm/ps3.h:166 int
ps3_mmio_region_init(struct ps3_system_bus_device *dev, struct
ps3_mmio_region *r, unsigned long bus_addr, unsigned long len, enum
ps3_mmio_page_size page_size); error: Cannot parse struct or union!
Warning: arch/powerpc/include/asm/ps3.h:167 No description found for
return value of 'ps3_mmio_region_init'
Warning: arch/powerpc/include/asm/ps3.h:407 missing initial short
description on line:
* ps3_system_bus_set_drvdata -
Warning: arch/powerpc/include/asm/ps3.h:473 Enum value
'PS3_LPM_TB_TYPE_INTERNAL' not described in enum 'ps3_lpm_tb_type'
Warning: arch/powerpc/include/asm/ps3.h:473 Excess enum value
'@PS3_LPM_RIGHTS_USE_TB' description in 'ps3_lpm_tb_type'
This leaves struct members in several structs and function parameters in
one function still undescribed.
J. Neuschäfer [Tue, 3 Mar 2026 15:09:49 +0000 (16:09 +0100)]
powerpc: Move GameCube/Wii options under EMBEDDED6xx
Move CONFIG_GAMECUBE and CONFIG_WII directly below other embedded6xx
boards, and above options such as TSI108_BRIDGE. This has two
advantages for the GC/Wii options:
- They won't be moved around by USBGECKO_UDBG appearing or disappearing
- They will be intendented in menuconfig/nconfig, to make it clear they
are part of the embedded6xx platforms
The driver currently sets the handler data and the chained handler in
two separate steps. This creates a theoretical race window where an
interrupt could fire after the handler is set but before the data is
assigned, leading to a NULL pointer dereference.
Replace the two calls with irq_set_chained_handler_and_data() to set
both the handler and its data atomically under the irq_desc->lock.
The driver currently sets the handler data and the chained handler in
two separate steps. This creates a theoretical race window where an
interrupt could fire after the handler is set but before the data is
assigned, leading to a NULL pointer dereference.
Replace the two calls with irq_set_chained_handler_and_data() to set
both the handler and its data atomically under the irq_desc->lock.
The driver currently sets the handler data and the chained handler in
two separate steps. This creates a theoretical race window where an
interrupt could fire after the handler is set but before the data is
assigned, leading to a NULL pointer dereference.
Replace the two calls with irq_set_chained_handler_and_data() to set
both the handler and its data atomically under the irq_desc->lock.
Amit Machhiwal [Fri, 13 Mar 2026 16:54:26 +0000 (22:24 +0530)]
selftests/powerpc: Suppress -Wmaybe-uninitialized with GCC 15
GCC 15 reports the below false positive '-Wmaybe-uninitialized' warning
in vphn_unpack_associativity() when building the powerpc selftests.
# make -C tools/testing/selftests TARGETS="powerpc"
[...]
CC test-vphn
In file included from test-vphn.c:3:
In function ‘vphn_unpack_associativity’,
inlined from ‘test_one’ at test-vphn.c:371:2,
inlined from ‘test_vphn’ at test-vphn.c:399:9:
test-vphn.c:10:33: error: ‘be_packed’ may be used uninitialized [-Werror=maybe-uninitialized]
10 | #define be16_to_cpup(x) bswap_16(*x)
| ^~~~~~~~
vphn.c:42:27: note: in expansion of macro ‘be16_to_cpup’
42 | u16 new = be16_to_cpup(field++);
| ^~~~~~~~~~~~
In file included from test-vphn.c:19:
vphn.c: In function ‘test_vphn’:
vphn.c:27:16: note: ‘be_packed’ declared here
27 | __be64 be_packed[VPHN_REGISTER_COUNT];
| ^~~~~~~~~
cc1: all warnings being treated as errors
When vphn_unpack_associativity() is called from hcall_vphn() in kernel
the error is not seen while building vphn.c during kernel compilation.
This is because the top level Makefile includes '-fno-strict-aliasing'
flag always.
The issue here is that GCC 15 emits '-Wmaybe-uninitialized' due to type
punning between __be64[] and __b16* when accessing the buffer via
be16_to_cpup(). The underlying object is fully initialized but GCC 15
fails to track the aliasing due to the strict aliasing violation here.
Please refer [1] and [2]. This results in a false positive warning which
is promoted to an error under '-Werror'. This problem is not seen when
the compilation is performed with GCC 13 and 14. An issue [1] has also
been created on GCC bugzilla.
The selftest compiles fine with '-fno-strict-aliasing'. Since this GCC
flag is used to compile vphn.c in kernel too, the same flag should be
used to build vphn tests when compiling vphn.c in the selftest as well.
Fix this by including '-fno-strict-aliasing' during vphn.c compilation
in the selftest. This keeps the build working while limiting the scope
of the suppression to building vphn tests.
Yury Norov [Thu, 19 Mar 2026 03:36:46 +0000 (23:36 -0400)]
powerpc/xive: rework xive_find_target_in_mask()
Switch the function to using modern cpumask API and drop most of the
housekeeping code.
Notice, if first >= nr_cpu_ids, for_each_cpu_wrap() iterator behaves just
like for_each_cpu(), i.e. begins from 0. So even if WARN_ON() is triggered,
no special handling is needed.
When called from xive_irq_startup(), the size of the cpumask can be
larger than nr_cpu_ids. This can result in a WARN_ON.
[...]
This happens because we're being called with our affinity mask set to
irq_default_affinity. That in turn was populated using
cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids
worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a
mask which has > nr_cpu_ids bits set.
In modern kernel, cpumask_weight() can't return > nr_cpu_ids.
In inline case, cpumask_setall() explicitly clears all bits above
nr_cpu_ids, see commit 63355b9884b3 ("cpumask: be more careful with
'cpumask_setall()'"). So, despite that cpumask_weight() is passed
with small_cpumask_bits, which is NR_CPUS in this case, it can't
count over the nr_cpu_ids.
In outline case, cpumask_setall() may set bits beyond the limit up to
the next byte alignment, but in this case small_cpumask_bits is wired
to nr_cpu_ids, thus making overcounting impossible.
Sourabh Jain [Thu, 12 Mar 2026 08:30:50 +0000 (14:00 +0530)]
powerpc/crash: Update backup region offset in elfcorehdr on memory hotplug
When elfcorehdr is prepared for kdump, the program header representing
the first 64 KB of memory is expected to have its offset point to the
backup region. This is required because purgatory copies the first 64 KB
of the crashed kernel memory to this backup region following a kernel
crash. This allows the capture kernel to use the first 64 KB of memory
to place the exception vectors and other required data.
When elfcorehdr is recreated due to memory hotplug, the offset of
the program header representing the first 64 KB is not updated.
As a result, the capture kernel exports the first 64 KB at offset
0, even though the data actually resides in the backup region.
Fix this by calling sync_backup_region_phdr() to update the program
header offset in the elfcorehdr created during memory hotplug.
sync_backup_region_phdr() works for images loaded via the
kexec_file_load syscall. However, it does not work for kexec_load,
because image->arch.backup_start is not initialized in that case.
So introduce machine_kexec_post_load() to process the elfcorehdr
prepared by kexec-tools and initialize image->arch.backup_start for
kdump images loaded via kexec_load syscall.
Rename update_backup_region_phdr() to sync_backup_region_phdr() and
extend it to synchronize the backup region offset between the kdump
image and the ELF core header. The helper now supports updating either
the kdump image from the ELF program header or updating the ELF program
header from the kdump image, avoiding code duplication.
Define ARCH_HAS_KIMAGE_ARCH and struct kimage_arch when
CONFIG_KEXEC_FILE or CONFIG_CRASH_DUMP is enabled so that
kimage->arch.backup_start is available with the kexec_load system call.
This patch depends on the patch titled
"powerpc/crash: fix backup region offset update to elfcorehdr".
Sourabh Jain [Thu, 12 Mar 2026 08:30:49 +0000 (14:00 +0530)]
powerpc/crash: fix backup region offset update to elfcorehdr
update_backup_region_phdr() in file_load_64.c iterates over all the
program headers in the kdump kernel’s elfcorehdr and updates the
p_offset of the program header whose physical address starts at 0.
However, the loop logic is incorrect because the program header pointer
is not updated during iteration. Since elfcorehdr typically contains
PT_NOTE entries first, the PT_LOAD program header with physical address
0 is never reached. As a result, its p_offset is not updated to point to
the backup region.
Because of this behavior, the capture kernel exports the first 64 KB of
the crashed kernel’s memory at offset 0, even though that memory
actually lives in the backup region. When a crash happens, purgatory
copies the first 64 KB of the crashed kernel’s memory into the backup
region so the capture kernel can safely use it.
This has not caused problems so far because the first 64 KB is usually
identical in both the crashed and capture kernels. However, this is
just an assumption and is not guaranteed to always hold true.
Fix update_backup_region_phdr() to correctly update the p_offset of the
program header with a starting physical address of 0 by correcting the
logic used to iterate over the program headers.
Nagamani PV [Mon, 30 Mar 2026 11:44:36 +0000 (13:44 +0200)]
net/iucv: Add missing kernel-doc return value descriptions
Add missing return value descriptions for several functions in
net/iucv/af_iucv.c and net/iucv/iucv.c to address kernel-doc warnings.
Warnings detected with:
scripts/kernel-doc -none -Wall net/iucv/*
Warning: net/iucv/af_iucv.c:131 No description found for return value of 'iucv_msg_length'
Warning: net/iucv/af_iucv.c:150 No description found for return value of 'iucv_sock_in_state'
...
No functional change.
Reviewed-by: Aswin Karuvally <aswin@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Nagamani PV <nagamani@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20260330114436.2010108-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: vxlan: check ipv6_mod_enabled() on neigh_reduce()
IPv6 must be enabled or otherwise neigh_reduce() might cause a kernel
panic. This was prevented by a check on in6_dev. Use ipv6_mod_enabled()
instead as it is cleaner and also consistent with the code at
route_shortcircuit().
Michal Piekos [Sat, 28 Mar 2026 08:55:51 +0000 (09:55 +0100)]
net: stmmac: skip VLAN restore when VLAN hash ops are missing
stmmac_vlan_restore() unconditionally calls stmmac_vlan_update() when
NETIF_F_VLAN_FEATURES is set. On platforms where priv->hw->vlan (or
->update_vlan_hash) is not provided, stmmac_update_vlan_hash() returns
-EINVAL via stmmac_do_void_callback(), resulting in a spurious
"Failed to restore VLANs" error even when no VLAN filtering is in use.
Remove not needed comment.
Remove not used return value from stmmac_vlan_restore().
net: mana: hardening: Validate adapter_mtu from MANA_QUERY_DEV_CONFIG
As a part of MANA hardening for CVM, validate the adapter_mtu value
returned from the MANA_QUERY_DEV_CONFIG HWC command.
The adapter_mtu value is used to compute ndev->max_mtu via:
gc->adapter_mtu - ETH_HLEN. If hardware returns a bogus adapter_mtu
smaller than ETH_HLEN (e.g. 0), the unsigned subtraction wraps to a
huge value, silently allowing oversized MTU settings.
Add a validation check to reject adapter_mtu values below
ETH_MIN_MTU + ETH_HLEN, returning -EPROTO to fail the device
configuration early with a clear error message.
Yufan Chen [Sat, 28 Mar 2026 16:32:57 +0000 (00:32 +0800)]
net: ftgmac100: fix ring allocation unwind on open failure
ftgmac100_alloc_rings() allocates rx_skbs, tx_skbs, rxdes, txdes, and
rx_scratch in stages. On intermediate failures it returned -ENOMEM
directly, leaking resources allocated earlier in the function.
Rework the failure path to use staged local unwind labels and free
allocated resources in reverse order before returning -ENOMEM. This
matches common netdev allocation cleanup style.
Fixes: d72e01a0430f ("ftgmac100: Use a scratch buffer for failed RX allocations") Cc: stable@vger.kernel.org Signed-off-by: Yufan Chen <yufan.chen@linux.dev> Link: https://patch.msgid.link/20260328163257.60836-1-yufan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Inspired by a recent discussion[1] I have come up with this pair of
small improvements to DMA error reporting with declance.
[1] Sebastian Andrzej Siewior, "declance: Remove IRQF_ONESHOT",
<https://lore.kernel.org/r/20260127135334.qUEaYP9G@linutronix.de/>
====================
declance: Include the offending address with DMA errors
The address latched in the I/O ASIC LANCE DMA Pointer Register uses the
TURBOchannel bus address encoding and therefore bits 33:29 of location
referred occupy bits 4:0, bits 28:2 are left-shifted by 3, and bits 1:0
are hardwired to zero. In reality no TURBOchannel system exceeds 1GiB
of RAM though, so the address reported will always fit in 8 hex digits.
Daniel Wagner [Mon, 30 Mar 2026 22:53:10 +0000 (23:53 +0100)]
net: phy: bcm84881: add BCM84891/BCM84892 support
The BCM84891 and BCM84892 are 10GBASE-T PHYs in the same family as the
BCM84881, sharing the register map and most callbacks. They add USXGMII
as a host interface mode.
bcm8489x_config_init() is separate from bcm84881_config_init(): it
allows only USXGMII (the only host mode available on the tested
hardware) and clears MDIO_CTRL1_LPOWER, which is set at boot on the
tested platform. Does not recur on ifdown/ifup, cable events, or
link-partner advertisement changes, so config_init is sufficient.
For USXGMII, read_status() skips the 0x4011 host-mode register: it
returns the same value regardless of negotiated copper speed (USXGMII
symbol replication). Speed comes from phy_resolve_aneg_linkmode() via
standard C45 AN resolution.
Tested on TRENDnet TEG-S750 (RTL9303 + 1x BCM84891 + 4x BCM84892)
running OpenWrt, where the MDIO controller driver is currently
OpenWrt-specific. Link verified at 100M, 1G, 2.5G, 10G.
Signed-off-by: Daniel Wagner <wagner.daniel.t@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260330225310.2801264-1-wagner.daniel.t@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Li Xiasong [Mon, 30 Mar 2026 12:03:35 +0000 (20:03 +0800)]
mptcp: fix soft lockup in mptcp_recvmsg()
syzbot reported a soft lockup in mptcp_recvmsg() [0].
When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
removed from the sk_receive_queue. This causes sk_wait_data() to always
find available data and never perform actual waiting, leading to a soft
lockup.
Fix this by adding a 'last' parameter to track the last peeked skb.
This allows sk_wait_data() to make informed waiting decisions and prevent
infinite loops when MSG_PEEK is used.
Eric Biggers [Tue, 31 Mar 2026 02:44:38 +0000 (19:44 -0700)]
lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
Since the lib/crypto/ files that include <crypto/algapi.h> need it only
for the transitive inclusion of <crypto/utils.h> (and not all the
traditional crypto API stuff that the rest of <crypto/algapi.h> is
filled with), replace these inclusions with direct inclusions of
<crypto/utils.h>.
Eric Biggers [Tue, 31 Mar 2026 02:44:30 +0000 (19:44 -0700)]
lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
aes_encrypt() now uses AES instructions when available instead of always
using table-based code. AES instructions are constant-time and don't
benefit from disabling IRQs as a constant-time hardening measure.
In fact, on two architectures (arm and riscv) disabling IRQs is
counterproductive because it prevents the AES instructions from being
used. (See the may_use_simd() implementation on those architectures.)
Therefore, let's remove the IRQ disabling/enabling and leave the choice
of constant-time hardening measures to the AES library code.
Note that currently the arm table-based AES code (which runs on arm
kernels that don't have ARMv8 CE) disables IRQs, while the generic
table-based AES code does not. So this does technically regress in
constant-time hardening when that generic code is used. But as
discussed in commit a22fd0e3c495 ("lib/crypto: aes: Introduce improved
AES library") I think just leaving IRQs enabled is the right choice.
Disabling them is slow and can cause problems, and AES instructions
(which modern CPUs have) solve the problem in a much better way anyway.
Eric Biggers [Tue, 31 Mar 2026 02:44:14 +0000 (19:44 -0700)]
lib/crypto: aescfb: Don't disable IRQs during AES block encryption
aes_encrypt() now uses AES instructions when available instead of always
using table-based code. AES instructions are constant-time and don't
benefit from disabling IRQs as a constant-time hardening measure.
In fact, on two architectures (arm and riscv) disabling IRQs is
counterproductive because it prevents the AES instructions from being
used. (See the may_use_simd() implementation on those architectures.)
Therefore, let's remove the IRQ disabling/enabling and leave the choice
of constant-time hardening measures to the AES library code.
Note that currently the arm table-based AES code (which runs on arm
kernels that don't have ARMv8 CE) disables IRQs, while the generic
table-based AES code does not. So this does technically regress in
constant-time hardening when that generic code is used. But as
discussed in commit a22fd0e3c495 ("lib/crypto: aes: Introduce improved
AES library") I think just leaving IRQs enabled is the right choice.
Disabling them is slow and can cause problems, and AES instructions
(which modern CPUs have) solve the problem in a much better way anyway.
Kees Cook [Tue, 24 Mar 2026 02:07:30 +0000 (19:07 -0700)]
lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
The str* family of fortified functions all use member-sized limits
for a while now, so the FORTIFY_STR_OBJECT test is redundant to
FORTIFY_STR_MEMBER. While here, replace the strncpy() use with strscpy(),
as strncpy() is being removed.
Siddharth Nayyar [Thu, 26 Mar 2026 21:25:05 +0000 (21:25 +0000)]
module: use kflagstab instead of *_gpl sections
Read kflagstab section for vmlinux and modules to determine whether
kernel symbols are GPL only.
This patch eliminates the need for fragmenting the ksymtab for infering
the value of GPL-only symbol flag, henceforth stop populating *_gpl
versions of the ksymtab and kcrctab in modpost.
Signed-off-by: Siddharth Nayyar <sidnayyar@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Siddharth Nayyar [Thu, 26 Mar 2026 21:25:03 +0000 (21:25 +0000)]
module: add kflagstab section to vmlinux and modules
This patch introduces a __kflagstab section to store symbol flags in a
dedicated data structure, similar to how CRCs are handled in the
__kcrctab.
The flags for a given symbol in __kflagstab will be located at the same
index as the symbol's entry in __ksymtab and its CRC in __kcrctab. This
design decouples the flags from the symbol table itself, allowing us to
maintain a single, sorted __ksymtab. As a result, the symbol search
remains an efficient, single lookup, regardless of the number of flags
we add in the future.
The motivation for this change comes from the Android kernel, which uses
an additional symbol flag to restrict the use of certain exported
symbols by unsigned modules, thereby enhancing kernel security. This
__kflagstab can be implemented as a bitmap to efficiently manage which
symbols are available for general use versus those restricted to signed
modules only.
This section will contain read-only data for values of kernel symbol
flags in the form of an 8-bit bitsets for each kernel symbol. Each bit
in the bitset represents a flag value defined by ksym_flags enumeration.
Petr Pavlu ran a small test to get a better understanding of the
different section sizes resulting from this patch series. He used
v6.17-rc6 together with the openSUSE x86_64 config [1], which is fairly
large. The resulting vmlinux.bin (no debuginfo) had an on-disk size of
58 MiB, and included 5937 + 6589 (GPL-only) exported symbols.
The following table summarizes his measurements and calculations
regarding the sizes of all sections related to exported symbols:
| HAVE_ARCH_PREL32_RELOCATIONS | !HAVE_ARCH_PREL32_RELOCATIONS
Section | Base [B] | Ext. [B] | Sep. [B] | Base [B] | Ext. [B] | Sep. [B]
----------------------------------------------------------------------------------------
__ksymtab | 71244 | 200416 | 150312 | 142488 | 400832 | 300624
__ksymtab_gpl | 79068 | NA | NA | 158136 | NA | NA
__kcrctab | 23748 | 50104 | 50104 | 23748 | 50104 | 50104
__kcrctab_gpl | 26356 | NA | NA | 26356 | NA | NA
__ksymtab_strings | 253628 | 253628 | 253628 | 253628 | 253628 | 253628
__kflagstab | NA | NA | 12526 | NA | NA | 12526
----------------------------------------------------------------------------------------
Total | 454044 | 504148 | 466570 | 604356 | 704564 | 616882
Increase to base [%] | NA | 11.0 | 2.8 | NA | 16.6 | 2.1
The column "HAVE_ARCH_PREL32_RELOCATIONS -> Base" contains the measured
numbers. The rest of the values are calculated. The "Ext." column
represents an alternative approach of extending __ksymtab to include a
bitset of symbol flags, and the "Sep." column represents the approach of
having a separate __kflagstab. With HAVE_ARCH_PREL32_RELOCATIONS, each
kernel_symbol is 12 B in size and is extended to 16 B. With
!HAVE_ARCH_PREL32_RELOCATIONS, it is 24 B, extended to 32 B. Note that
this does not include the metadata needed to relocate __ksymtab*, which
is freed after the initial processing.
Adding __kflagstab as a separate section has a negligible impact, as
expected. When extending __ksymtab (kernel_symbol) instead, the worst
case with !HAVE_ARCH_PREL32_RELOCATIONS increases the export data size
by 16.6%. Note that the larger increase in size for the latter approach
is due to 4-byte alignment of kernel_symbol data structure, instead of
1-byte alignment for the flags bitset in __kflagstab in the former
approach.
Based on the above, it was concluded that introducing __kflagstab makes
sense, as the added complexity is minimal over extending kernel_symbol,
and there is overall simplification of symbol finding logic in the
module loader.
Signed-off-by: Siddharth Nayyar <sidnayyar@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
[Sami: Updated commit message to include details from the cover letter.] Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Siddharth Nayyar [Thu, 26 Mar 2026 21:25:02 +0000 (21:25 +0000)]
module: define ksym_flags enumeration to represent kernel symbol flags
The core architectural issue with kernel symbol flags is our reliance on
splitting the main symbol table, ksymtab. To handle a single boolean
property, such as GPL-only, all exported symbols are split across two
separate tables: __ksymtab and __ksymtab_gpl.
This design forces the module loader to perform a separate search on
each of these tables for every symbol it needs, for vmlinux and for all
previously loaded modules.
This approach is fundamentally not scalable. If we were to introduce a
second flag, we would need four distinct symbol tables. For n boolean
flags, this model requires an exponential growth to 2^n tables,
dramatically increasing complexity.
Another consequence of this fragmentation is degraded performance. For
example, a binary search on the symbol table of vmlinux, that would take
only 14 comparison steps (assuming ~2^14 or 16K symbols) in a unified
table, can require up to 26 steps when spread across two tables
(assuming both tables have ~2^13 symbols). This performance penalty
worsens as more flags are added.
To address this, symbol flags is an enumeration used to represent flags
as a bitset, for example a flag to tell if a symbol is GPL only.
The said bitset is introduced in subsequent patches and will contain
values of kernel symbol flags. These bitset will then be used to infer
flag values rather than fragmenting ksymtab for separating symbols with
different flag values, thereby eliminating the need to fragment the
ksymtab.
Link: https://lore.kernel.org/r/20260326-kflagstab-v5-0-fa0796fe88d9@google.com Signed-off-by: Siddharth Nayyar <sidnayyar@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
[Sami: Updated the commit message to explain the use case for the series.] Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Fredric Cover [Mon, 30 Mar 2026 20:11:27 +0000 (13:11 -0700)]
fs/smb/client: fix out-of-bounds read in cifs_sanitize_prepath
When cifs_sanitize_prepath is called with an empty string or a string
containing only delimiters (e.g., "/"), the current logic attempts to
check *(cursor2 - 1) before cursor2 has advanced. This results in an
out-of-bounds read.
This patch adds an early exit check after stripping prepended
delimiters. If no path content remains, the function returns NULL.
The bug was identified via manual audit and verified using a
standalone test case compiled with AddressSanitizer, which
triggered a SEGV on affected inputs.
Signed-off-by: Fredric Cover <FredTheDude@proton.me> Reviewed-by: Henrique Carvalho <[2]henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
====================
Fix bpf_link grace period wait for tracepoints
A recent change to non-faultable tracepoints switched from
preempt-disabled critical sections to SRCU-fast, which breaks
assumptions in the bpf_link_free() path. Use call_srcu() to fix the
breakage.
* Introduce and switch to call_tracepoint_unregister_non_faultable(). (Steven)
* Address Andrii's comment and add Acked-by. (Andrii)
* Drop rcu_trace_implies_rcu_gp() conversion. (Alexei)
bpf: Fix grace period wait for tracepoint bpf_link
Recently, tracepoints were switched from using disabled preemption
(which acts as RCU read section) to SRCU-fast when they are not
faultable. This means that to do a proper grace period wait for programs
running in such tracepoints, we must use SRCU's grace period wait.
This is only for non-faultable tracepoints, faultable ones continue
using RCU Tasks Trace.
However, bpf_link_free() currently does call_rcu() for all cases when
the link is non-sleepable (hence, for tracepoints, non-faultable). Fix
this by doing a call_srcu() grace period wait.
As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed
unnecessary for tracepoint programs. The link and program are either
accessed under RCU Tasks Trace protection, or SRCU-fast protection now.
The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was
to generalize the logic, even if it conceded an extra RCU gp wait,
however that is unnecessary for tracepoints even before this change.
In practice no cost was paid since rcu_trace_implies_rcu_gp() was always
true. Hence we need not chaining any RCU gp after the SRCU gp.
For instance, in the non-faultable raw tracepoint, the RCU read section
of the program in __bpf_trace_run() is enclosed in the SRCU gp, likewise
for faultable raw tracepoint, the program is under the RCU Tasks Trace
protection. Hence, the outermost scope can be waited upon to ensure
correctness.
Also, sleepable programs cannot be attached to non-faultable
tracepoints, so whenever program or link is sleepable, only RCU Tasks
Trace protection is being used for the link and prog.
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20260331211021.1632902-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Mykyta Yatsenko [Tue, 31 Mar 2026 17:26:34 +0000 (18:26 +0100)]
selftests/bpf: Suppress veristat error messages in non-verbose mode
When running veristat across many BPF objects, expected load failures
produce noisy stderr output that obscures actual issues. Gate these
diagnostic messages behind --verbose.
Menglong Dong [Tue, 31 Mar 2026 07:04:34 +0000 (15:04 +0800)]
selftests/bpf: Test access to ringbuf position with map pointer
Add the testing to access the bpf_ringbuf with the map pointer.
"consumer_pos" and "producer_pos" is accessed in this testing. We reserve
128 bytes in the ringbuf to test the producer_pos, which should be
"128 + BPF_RINGBUF_HDR_SZ".
It will be helpful if we want to evaluate the usage of the ringbuf in bpf
prog with the consumer and producer position.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/bpf/20260331070434.10037-1-dongml2@chinatelecom.cn
Eyal Birger [Tue, 31 Mar 2026 13:06:12 +0000 (06:06 -0700)]
bpf: Clarify BPF_RB_NO_WAKEUP behavior for bpf_ringbuf_discard()
Clarify bpf_ringbuf_discard() documentation for BPF_RB_NO_WAKEUP.
Discarded ring buffer records are still left in the ring buffer and are
only skipped when user space consumes them. This can matter when
BPF_RB_NO_WAKEUP is used: a later submit relying on adaptive wakeup
might not wake the consumer, because the discarded record still needs to
be consumed first.
Zhengchuan Liang [Mon, 30 Mar 2026 08:46:24 +0000 (16:46 +0800)]
net: ipv6: flowlabel: defer exclusive option free until RCU teardown
`ip6fl_seq_show()` walks the global flowlabel hash under the seq-file
RCU read-side lock and prints `fl->opt->opt_nflen` when an option block
is present.
Exclusive flowlabels currently free `fl->opt` as soon as `fl->users`
drops to zero in `fl_release()`. However, the surrounding
`struct ip6_flowlabel` remains visible in the global hash table until
later garbage collection removes it and `fl_free_rcu()` finally tears it
down.
A concurrent `/proc/net/ip6_flowlabel` reader can therefore race that
early `kfree()` and dereference freed option state, triggering a crash
in `ip6fl_seq_show()`.
Fix this by keeping `fl->opt` alive until `fl_free_rcu()`. That matches
the lifetime already required for the enclosing flowlabel while readers
can still reach it under RCU.
Fixes: d3aedd5ebd4b ("ipv6 flowlabel: Convert hash list to RCU.") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/07351f0ec47bcee289576f39f9354f4a64add6e4.1774855883.git.zcliangcn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case rold->reg->range == BEYOND_PKT_END && rcur->reg->range == N
regsafe() may return true which may lead to current state with
valid packet range not being explored. Fix the bug.
Fixes: 6d94e741a8ff ("bpf: Support for pointers beyond pkt_end.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260331204228.26726-1-alexei.starovoitov@gmail.com
Dmytro Maluka [Thu, 19 Mar 2026 15:38:12 +0000 (16:38 +0100)]
selftests/cpu-hotplug: Fix check for cpu hotplug not supported
If CONFIG_HOTPLUG_CPU is disabled, /sys/devices/system/cpu/cpu*
directories are still populated, so the test fails to correctly detect
that CPU hotplug is not supported.
Fix this by checking for the presence of 'online' files in those
directories instead. The 'online' node is created for the given CPU if
and only if this CPU supports hotplug. So if none of the CPUs have
'online' nodes, it means CPU hotplug is not supported.
pstore/ftrace: Keep ftrace module parameter and debugfs switch in sync
Commit a5d05b07961a ("pstore/ftrace: Allow immediate recording")
introduced a kernel parameter to enable early-boot collection for
ftrace frontend. But then, if we enable the debugfs later, the
parameter remains set as N. This is not a biggie, things work fine;
but at the same time, why not have both in sync if possible, right?