hits the WARN_ON_ONCE() in static_key_disable_cpuslocked() and hangs the
system since both sysfs writes are trying to do
amd_pstate_change_driver_mode() without any synchronization.
Grab the "amd_pstate_driver_lock" mutex when modifying "dynamic_epp" to
prevent the two paths from racing with each other. Add a lockdep
assertion for "amd_pstate_driver_lock" in
amd_pstate_change_driver_mode() to formalize the dependency.
Since "cppc_mode" is stable under "amd_pstate_driver_lock", only reload
the driver when in "AMD_PSTATE_ACTIVE" mode and reject all writes when
in passive or guided mode, or if the driver is not loaded, since only
active mode operates on EPP.
Fixes: e30ca6dd5345 ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-2-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Dave Airlie [Fri, 8 May 2026 04:14:24 +0000 (14:14 +1000)]
Merge tag 'drm-misc-next-2026-05-07' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v7.2-rc1:
UAPI Changes:
- Support medium/low power modes in amdxdna.
- Support limiting frequency in ivpu.
- Document license for drm core uAPI headers.
- Add the following DRM formats: P230, Y7, XYYY2101010, T430,
XVUY210101010.
Core Changes:
- Do not call drop_master on file close if not master.
- Convert drm-bridge and drm/atomic to use drm_printf_indent.
- Remove the extra call to drm_connector_attach_encoder after
drm_bridge_connector_init().
- Assorted docbook updates.
Driver Changes:
- Bugfixes in amdxdna, ivpu, mipi-dsi, imagination, nouveau, panthor,
bridge/analogix_dp, ipv3, lontium-lt8912b, verisilicon, tve200,
etnaviv, panel/focaltech-ota7290b, panel/jadard-jd9365da-h3,
bridge/ite-it6263, renesas, xlnx, bridge/cdns-dsi, gma500,
bridge/microchip-lvds, mgag200.
- Add support for MStar TSUMU88ADT3-LF-1 bridge.
- Add support for WaveShare 7, Novatek NT35532, Startek KD070HDFLD092,
ChipWealth CH13726A AMOLED, Team Source Display TST070WSNE-196C,
Displaytech DT050BTFT-PTS panels.
- Improve mipi-dsi shutdown and convert a panasonic panel to use the
mipi-dsi wrappers.
- Allowing dumping vbios over debugfs in GSP-RM mode.
- Update maintainers for ivpu, add reviewer for drm-bridge code
and update maintainers for LT8912B DRM HDMI bridge.
- Add test pattern support to bridge/ti-sn65dsi83.
- Convert vmwgfx to vblank timers.
- Add power management to sysfb drm drivers to allow suspend/resume.
- Support the aforementioned new drm formats in xlnx/qynqmp.
- Fix panel Kconfig dependencies.
- Add carveout support for debugging and bringup to amxdna.
- Add support for long command tx via videobuffer in bridge/tc358768.
Hui Wang [Wed, 6 May 2026 13:21:52 +0000 (21:21 +0800)]
riscv: cpufeature: Use pre-defined ISA ext macros to index isa2hwcap
We have pre-defined ISA extension macros, here use those macros to
replace a magic number for isa2hwcap definition and some array
indexing for isa2hwcap access.
This doesn't change the original functionality, just improve the code
maintainability and readability.
Abel Vesa [Mon, 4 May 2026 15:56:52 +0000 (18:56 +0300)]
dt-bindings: soc: qcom,aoss-qmp: Document the Eliza Always-On Subsystem side channel
Document the Always-On Subsystem (AOSS) side channel found on the Qualcomm
Eliza SoC. It is used for communication with other clients, like
remoteprocs.
Add the missing QSPI-to-memory interconnect path alongside the existing
configuration path. Without this path, the interconnect framework cannot
correctly vote for the bandwidth required by QSPI DMA data transfers.
Add the missing QSPI-to-memory interconnect path alongside the existing
configuration path. Without it, the interconnect framework cannot vote for
the bandwidth required by QSPI DMA data transfers.
arm64: dts: qcom: qcs615-ride: Enable QSPI and NOR flash
The QCS615 Ride board has a SPI-NOR flash connected to the QSPI controller
on CS0. Enable the QSPI controller and add the corresponding SPI-NOR flash
node to allow the system to access it.
The Talos (QCS615) platform includes a QSPI controller used for accessing
external flash storage. Add the QSPI OPP table, TLMM pinmux entries, and
the QSPI controller node to enable support for this hardware.
Harshal Dev [Tue, 5 May 2026 07:40:04 +0000 (13:10 +0530)]
arm64: dts: qcom: glymur: Add crypto engine and BAM
On almost all Qualcomm platforms, including Glymur, there is a Crypto
engine IP block to which the CPU can off-load cryptographic computations
for achieving acceleration.
The engine is also DMA capable due to the presence of an associated Bus
Access Manager (BAM) module.
The testsuite defines several kprobes and kretprobes as static variables
that are preserved across test runs.
After register_kprobe and unregister_kprobe, a kprobe contains some
leftover data that must be cleared before the kprobe can be registered
again. The tests are setting symbol_name to define the probe location.
Address and flags must be cleared.
The existing code clears some of the probes between subsequent tests, but
not between two test runs. The leftover data from a previous test run
makes the registrations fail in the next run.
Move the cleanups for all kprobes into kprobes_test_init, this function
is called before each single test (including the first test of a test
run).
Jianpeng Chang [Fri, 8 May 2026 00:56:36 +0000 (09:56 +0900)]
kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
When kprobe_add_area_blacklist() iterates through a section like
.kprobes.text, the start address may not correspond to a named symbol.
On ARM64 with CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y (introduced by
commit baaf553d3bc3 ("arm64: Implement
HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")), the compiler flag
-fpatchable-function-entry=4,2 inserts 2 NOPs before each function entry
point for ftrace call_ops. These pre-function NOPs sit at the section base
address, before the first named function symbol. The compiler emits a $x
mapping symbol at offset 0x00 to mark the start of code, but
find_kallsyms_symbol() ignores mapping symbols.
Without CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS (e.g. defconfig), no
pre-function NOPs are inserted, the first function starts at offset
0x00, and the bug does not trigger.
This only affects modules that have a .kprobes.text section (i.e. those
using the __kprobes annotation). Modules using NOKPROBE_SYMBOL() instead
(like kretprobe_example.ko) blacklist exact function addresses via the
_kprobe_blacklist section and are not affected.
For kprobe_example.ko on ARM64 with -fpatchable-function-entry=4,2,
the .kprobes.text section layout is:
kprobe_add_area_blacklist() starts iterating from the section base
address (offset 0x00), which only has the $x mapping symbol.
kprobe_add_ksym_blacklist() then calls kallsyms_lookup_size_offset()
for this address, which goes through:
find_kallsyms_symbol() scans all module symbols to find the closest
preceding symbol.
Since no named text symbol exists at offset 0x00,
find_kallsyms_symbol() picks __UNIQUE_ID_vermagic (a .modinfo symbol
whose address is in the temporary image) as the "best" match. The
computed "size" = next_text_symbol - modinfo_symbol spans across
these two unrelated memory regions, creating a blacklist entry with
a bogus range of tens of terabytes.
Whether this causes a visible failure depends on address randomization,
here is what happens on Raspberry Pi 4/5:
- On RPi5, the bogus size was ~35 TB. start + size stayed within
64-bit range, so the blacklist entry covered the entire kernel
text. register_kprobe() in the module's own init function failed
with -EINVAL.
- On RPi4, the bogus size was ~75 TB. start + size overflowed
64 bits and wrapped to a small address near zero. The range
check (addr >= start && addr < end) then failed because end
wrapped around, so the bogus entry was accidentally harmless
and kprobes worked by luck.
The same bug exists on both machines, but randomization determines whether
the integer overflow masks it or not.
Fix this by adding notrace to the __kprobes macro. Functions in
.kprobes.text are kprobe infrastructure handlers that should never be
traced by ftrace. With notrace, the compiler stops inserting them and the
non-symbol gap at the section start disappears entirely.
tegra210_mixer_configure_gain() divides a 64-bit BIT_ULL() value by the
fade duration. On 32-bit MIPS, clang emits a call to __udivdi3 for that
plain C division, but that compiler helper is not exported to modules.
Use div_u64() for the inverse duration calculation so the driver uses the
kernel's 64-bit division helper instead of emitting a compiler runtime
call.
Linus Torvalds [Fri, 8 May 2026 00:26:43 +0000 (17:26 -0700)]
Merge tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Allow for multiple opens of /sys/fs/selinux/policy
Prevent a single process from blocking others from reading the
SELinux policy loaded in the kernel. This does have the side effect
of potentially allowing userspace to trigger additional kernel memory
allocations as part of the open/read operation, but this is mitigated
by requiring the SELinux security/read_policy permission.
- Reduce the critical sections where the SELinux policy mutex is held
This includes the patch to the policy loader code where we move the
permission checks and an allocation outside the mutex as well as the
the patch to checkreqprot which drops the code/lock entirely.
While the checkreqprot code had effectively been dropped in an
earlier release, portions of the code still remained that would have
triggered the mutex to perform an IMA measurement. This finally drops
all of that while preserving the user visible behavior.
- Eliminate potential sources of log spamming
There were a few areas where processes could flood the system logs
and hide other, more critical events. The previously disabled
checkreqprot and runtime disable knobs in selinuxfs were two such
areas that have now been greatly simplified and a pr_err() replaced
with a pr_err_once().
The third such place is the /sys/fs/selinux/user file, which hasn't
been used by a userspace release since 2020 and was scheduled for
removal after 2025; this effectively disables this functionality, but
similar to checkreqprot, it is done in a way that should not break
old userspace.
* tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: shrink critical section in sel_write_load()
selinux: allow multiple opens of /sys/fs/selinux/policy
selinux: prune /sys/fs/selinux/user
selinux: prune /sys/fs/selinux/disable
selinux: prune /sys/fs/selinux/checkreqprot
Li Xiasong [Thu, 7 May 2026 14:04:22 +0000 (22:04 +0800)]
netfilter: nf_conntrack_sip: get helper before allocating expectation
process_register_request() allocates an expectation and then checks
whether a conntrack helper is available. If helper lookup fails, the
function returns early and the allocated expectation is left behind.
Reorder the code to fetch and validate helper before calling
nf_ct_expect_alloc(). This keeps the logic simpler and removes the leak
path while preserving existing behavior.
Fixes: e14575fa7529 ("netfilter: nf_conntrack: use rcu accessors where needed") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_conntrack_expect: restore helper propagation via expectation
A recent series to fix expectations broke helper propagation via
expectation, this mechanism is used by the sip and h323 helper. This
also propagates the conntrack helper to expected connections. I changed
semantics of exp->helper which now tells us the actual helper that
created the expectation.
Add an explicit assign_helper field to expectations for this purpose
and update helpers to use it.
Restore this feature for userspace conntrack helper via ctnetlink
nfqueue integration so it is again possible to attach a helper to an
expectation, where it makes sense. This is not restored via ctnetlink
expectation creation as there is no client for such feature. Use the
expectation layer 4 protocol number for the helper lookup for
consistency.
Make sure the expectation using this helper propagation mechanism also
go away when the helper is unregistered.
netfilter: bridge: eb_tables: close module init race
sashiko reports for unrelated patch:
Does the core ebtables initialization in ebtables.c suffer from a similar race?
Once nf_register_sockopt() completes, the sockopts are exposed globally.
sockopt has to be registered last, just like in ip/ip6/arptables.
Fixes: 5b53951cfc85 ("netfilter: ebtables: use net_generic infra") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: close dangling table module init race
Similar to the previous ebtables patch:
template add exposes the table to userspace, we must do this last to
rnsure the pernet ops are set up (contain the destructors).
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ebtables: close dangling table module init race
sashiko reported for a related patch:
In modules like iptable_raw.c, [..], if register_pernet_subsys() fails,
the rollback might call kfree(rawtable_ops) before [..]
During this window, could a concurrent userspace process find the globally
visible template, trigger table_init(), [..]
The table init functions must always register the template last.
Otherwise, set/getsockopt can instantiate a table in a namespace
while the required pernet ops (contain the destructor) isn't available.
This change is also required in x_tables, handled in followup change.
Fixes: 87663c39f898 ("netfilter: ebtables: do not hook tables by default") Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: add and use xtables_unregister_table_exit
Previous change added xtables_unregister_table_pre_exit to detach the
table from the packetpath and to unlink it from the active table list.
In case of rmmod, userspace that is doing set/getsockopt for this table
will not be able to re-instantiate the table:
1. The larval table has been removed already
2. existing instantiated table is no longer on the xt pernet table list.
This adds the second stage helper:
unlink the table from the dying list, free the hook ops (if any) and do
the audit notification. It replaces xt_unregister_table().
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: unregister the templates first
When the module is going away we need to zap the template
first. Else there is a small race window where userspace
could instantiate a new table after the pernet exit function
has removed the current table.
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: add and use xt_unregister_table_pre_exit
Remove the copypasted variants of _pre_exit and add one single
function in the xtables core. ebtables is not compatible with
x_tables and therefore unchanged.
This is a preparation patch to reduce noise in the followup
bug fixes.
netfilter: x_tables: allocate hook ops while under mutex
arp/ip(6)t_register_table() add the table to the per-netns list via
xt_register_table() before allocating the per-netns hook ops copy
via kmemdup_array(). This leaves a window where the table is
visible in the list with ops=NULL.
If the pernet exit happens runs concurrently the pre_exit callback finds
the table via xt_find_table() and passes the NULL ops pointer to
nf_unregister_net_hooks(), causing a NULL dereference:
general protection fault in nf_unregister_net_hooks+0xbc/0x150
RIP: nf_unregister_net_hooks (net/netfilter/core.c:613)
Call Trace:
ipt_unregister_table_pre_exit
iptable_mangle_net_pre_exit
ops_pre_exit_list
cleanup_net
Fix by moving the ops allocation into the xtables core so the table is
never in the list without valid ops. Also ensure the table is no longer
processing packets before its torn down on error unwind.
nf_register_net_hooks might have published at least one hook; call
synchronize_rcu() if there was an error.
audit log register message gets deferred until all operations have
passed, this avoids need to emit another ureg message in case of
error unwinding.
At the moment we emit the audit log a bit too early, which makes it
necessary to also emit an unregister log in case we have to unwind
errors after possible hook register failure.
Followup patch will be slightly simpler if we can delay the
register message until after the hooks have been wired up.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tabrez Ahmed [Sat, 2 May 2026 02:08:42 +0000 (07:38 +0530)]
hwmon: (ads7871) Fix endianness bug in 16-bit register reads
The ads7871_read_reg16() function relies on spi_w8r16() to read the
16-bit sensor output. The ADS7871 device transmits the Least Significant
Byte (LSB) first.
On Little-Endian architectures, spi_w8r16() correctly reconstructs the
16-bit value. However, on Big-Endian architectures, the byte swapping
causes the first received byte (LSB) to be placed in the most significant
byte of the u16, resulting in corrupted voltage readings.
To fix this, cast the integer result of spi_w8r16() to a restricted
__le16 type and convert it to the host CPU's native byte order using
le16_to_cpu(). Negative error codes returned by the SPI core are caught
and returned prior to the conversion to avoid mangling the error status.
Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260418034601.90226-1-tabreztalks@gmail.com Fixes: e0c70b8078629 ("hwmon: add TI ads7871 a/d converter driver") Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com> Link: https://lore.kernel.org/r/20260502020844.110038-2-tabreztalks@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Sensors configurations are defined by set and clear masks. These
do not follow a consistent "clear mask is a superset of set mask"
rule. This relaxed definition breaks lm75_write_config()
Basically all bits from set_mask that are not defined in clr_mask are
dropped. Fix that by enhancing the helper to always combine clr_mask
and set_mask into the mask bits of regmap_update_bits().
hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
The initialization of the AS6200 has two shortcomings
- The device-add-commit states "Conversion mode: continuous" but the
the lm75_params structure uses set_mask = 0x94c0. This activates
single shot mode (bit 15). According to the datasheet "The device
features a single shot measurement mode if the device is in sleep
mode (SM=1)". This is quite contradictionary.
- It is the only device that activates polarity active-high (bit 10)
All this is paired with a undefined clear mask bug in function
lm75_write_config() that was introduced with a later refactoring
commit.
[as6200] = {
.config_reg_16bits = true,
.set_mask = 0x94C0,
-> .clr_mask not defined here
.default_resolution = 12,
...
static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
u16 clr_mask)
{
return regmap_update_bits(data->regmap, LM75_REG_CONF,
clr_mask | LM75_SHUTDOWN, set_mask);
}
regmap_update_bits() requires clr_mask to be a superset of set_mask.
So basically all sensors with "wrong" masks like the AS6200 are not
initialized as intended.
Fix that by
- Change the set_mask to 0xc010 to reflect the current active-low
setup properly and to drive the sensor in continous mode. This
takes into account that the config register is little endian and
the first byte sent to the chip is the LSB.
- Adapt the alarm handling so it can report the alarm correctly
even if it is high active. This is done by comparing config register
bit 5 and 10 (translated to 2 and 13).
This commit does not introduce any ABI breakage as the mutliple bugs
effectly drive the AS6200 in standard active-low mode.
Fixes: 4b6358e1fe46 ("hwmon: (lm75) Add AMS AS6200 temperature sensor") Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Link: https://lore.kernel.org/r/20260502173207.3567876-2-markus.stockhausen@gmx.de
[groeck: Update set_mask for as6200 further: As modeled, the upper bits
contain the conversion rate, so the config register needs to be set to
0xc010 instead of 0x10c0 to reflect 8 samples/s and 4 consecutive faults.
Fix the same problem for TMP112.] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Dave Airlie [Thu, 7 May 2026 22:51:01 +0000 (08:51 +1000)]
Merge tag 'drm-xe-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status (Gustavo)
- Fix EAGAIN sign in pf_migration_consume (Shuicheng)
- Fix MMIO access using PF view instead of VF view during migration (Shuicheng)
- Exclude indirect ring state page from ADS engine state size (Satya)
Dave Airlie [Thu, 7 May 2026 22:34:34 +0000 (08:34 +1000)]
Merge tag 'drm-rust-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-fixes
DRM Rust fixes for v7.1-rc3
- Fix unsound initialization in drm::Device::new(); if pinned
initialization of drm::Device::Data fails, make sure
drm::Device::release() isn't called, so we don't run the data's
destructor
- Fix missing GEM state cleanup in the init failure case; call
drm_gem_private_object_fini() if drm_gem_object_init() fails
- Fix wrong ARef import in the DRM shmem GEM helper abstraction
- Replace the nouveau mailing list with the new nova-gpu mailing list
for both nova-core and nova-drm, and remove unused patchwork entries
Robbie Ko [Fri, 1 May 2026 02:41:56 +0000 (10:41 +0800)]
btrfs: fix incorrect i_size after remount caused by KEEP_SIZE prealloc gap
When fallocate() with FALLOC_FL_KEEP_SIZE preallocates an extent past the
current i_size, the file_extent_tree of the inode is updated to cover
that range. However, on the next mount, btrfs_read_locked_inode() only
re-populates file_extent_tree with [0, round_up(i_size, sectorsize)),
losing the marks that belonged to the KEEP_SIZE prealloc extent beyond
i_size.
Later, when a non-KEEP_SIZE fallocate() extends i_size into / past that
old prealloc extent, the reservation loop in btrfs_fallocate() skips
already-prealloc segments and does not call into the path that marks the
file_extent_tree, so a gap remains inside the file_extent_tree across
[old_aligned_i_size, start_of_new_alloc). Then __btrfs_prealloc_file_range()
calls btrfs_inode_safe_disk_i_size_write(), which uses
find_contiguous_extent_bit() starting at offset 0 to derive disk_i_size.
The walk stops at the gap, so disk_i_size ends up smaller than i_size and
gets persisted. After the next mount, the file shows the wrong (smaller)
size.
# non-KEEP_SIZE fallocate that overlaps the previous prealloc tail
# and extends past it
fallocate -o 7M -l 2M $MNT/file1
ls -lh $MNT/file1
umount $MNT
mount $DEV $MNT
ls -lh $MNT/file1
umount $MNT
Running the reproducer gives the following result:
The size before the second mount is correct (9M), but after the
remount it drops to 7M, i.e. the start of the gap inside file_extent_tree.
Fix this in __btrfs_prealloc_file_range() by marking the entire range
[round_down(old_i_size, sectorsize), round_up(new_i_size, sectorsize))
in file_extent_tree before updating i_size and calling
btrfs_inode_safe_disk_i_size_write(). This ensures the contiguous bit
search starting from 0 is not truncated by a stale gap left behind by a
previous KEEP_SIZE prealloc that was not restored on inode load.
The fix has no effect when the NO_HOLES feature is enabled because
btrfs_inode_safe_disk_i_size_write() and
btrfs_inode_set_file_extent_range()
both take the fast path that directly tracks disk_i_size without
consulting file_extent_tree.
Fixes: 9ddc959e802b ("btrfs: use the file extent tree infrastructure") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com>
[ Minor updates to the change log ] Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: only release the dirty pages io tree after successful writes
[WARNING]
With extra warning on dirty extent buffers at umount (aka, the next
patch in the series), test case generic/388 can trigger the following
warning about dirty extent buffers at unmount time:
BTRFS critical (device dm-2 state E): emergency shutdown
BTRFS error (device dm-2 state E): error while writing out transaction: -30
BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction.
BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30)
BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem
BTRFS info (device dm-2 state EA): forced readonly
BTRFS info (device dm-2 state EA): last unmount of filesystem 4fbf2e15-f941-49a0-bc7c-716315d2777c
------------[ cut here ]------------
WARNING: disk-io.c:3311 at invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs], CPU#8: umount/914368
CPU: 8 UID: 0 PID: 914368 Comm: umount Tainted: G OE 7.1.0-rc1-custom+ #372 PREEMPT(full) 2de38db8d1deae71fde295430a0ff3ab98ccf596
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
RIP: 0010:invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs]
Call Trace:
<TASK>
close_ctree+0x52e/0x574 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
generic_shutdown_super+0x89/0x1a0
kill_anon_super+0x16/0x40
btrfs_kill_super+0x16/0x20 [btrfs d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
deactivate_locked_super+0x2d/0xb0
cleanup_mnt+0xdc/0x140
task_work_run+0x5a/0xa0
exit_to_user_mode_loop+0x123/0x4b0
do_syscall_64+0x243/0x7c0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30539776 owner 9 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30621696 owner 257 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30638080 owner 258 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30654464 owner 7 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30703616 owner 2 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30720000 owner 10 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30736384 owner 4 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer 30752768 owner 11 gen 9 refs 2 flags 0x7
I'm using a stripped down version, which seems to trigger the warning
more reliably:
for (( i = 0; i < $runtime; i++ )); do
echo "=== $i/$runtime ==="
workload
done
[CAUSE]
Inside btrfs_write_and_wait_transaction(), we first try to write all
dirty ebs, then wait for them to finish.
After that we call btrfs_extent_io_tree_release() to free all
extent states from dirty_pages io tree.
However if we hit an error from btrfs_write_marked_extent(), then we
still call btrfs_extent_io_tree_release() to clear that dirty_pages io
tree, which may contain dirty records that we haven't yet submitted.
Furthermore, the later transaction cleanup path will utilize that
dirty_pages io tree to properly cleanup those dirty ebs, but since it's
already empty, no dirty ebs are properly cleaned up, thus will later
trigger the warnings inside invalidate_btree_folios().
[FIX]
Normally such dirty ebs won't cause problems, as when the iput() is
called on the btree inode, the dirty ebs will be forcibly written back,
and since the fs is already in an error status, such writeback will not
reach disk and finish immediately.
But it's still better to get rid of such dirty ebs, if we ended up with
dirty ebs but the fs is not in an error status, then such writeback at
iput() time will be too late, as all workers are already stopped but
writeback will utilize workers, which will lead to NULL pointer
dereferences.
Instead of unconditionally calling btrfs_extent_io_tree_release(), only
call it if btrfs_write_and_wait_transaction() finished successfully, so
that @dirty_pages extent io tree is kept untouched for transaction
cleanup.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 28 Apr 2026 15:58:56 +0000 (16:58 +0100)]
btrfs: tracepoints: fix sleep while in atomic context in btrfs_sync_file()
The trace event btrfs_sync_file() is called in an atomic context (all trace
events are) and its call to dput(), which is needed due to the call to
dget_parent(), can sleep, triggering a kernel splat.
This can be reproduced by enabling the trace event and running btrfs/056
from fstests for example. The splat shown in dmesg is the following:
So stop using dget_parent() and dput() and access the parent dentry
directly as dentry->d_parent. This is also what ext4 is doing in
its equivalent trace event ext4_sync_file_enter().
Fixes: a85b46db143f ("btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
(lldb) source list -a add_ra_bio_pages+0x13c
.../vmlinux.unstripped add_ra_bio_pages + 316 at .../fs/btrfs/compression.c:454:8
451
452 folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, constraint_gfp),
453 0, NULL);
-> 454 if (!folio)
455 break;
I can reproduce this consistently by running a memory hog concurrently
with a buffered writer on a machine with a very large amount of swap.
Commit 7ae37b2c94ed ("btrfs: prevent direct reclaim during compressed
readahead") clearly intended to suppress these warnings. But because the
mask set in the address_space with mapping_set_gfp_mask() doesn't include
__GFP_NOWARN, mapping_gfp_constraint() removes it from constraint_gfp
before it is passed to filemap_alloc_folio().
Fix by refactoring the code to add __GFP_NOWARN after the call to
mapping_gfp_constraint().
Fixes: 7ae37b2c94ed ("btrfs: prevent direct reclaim during compressed readahead") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
ZhengYuan Huang [Wed, 25 Mar 2026 00:43:39 +0000 (08:43 +0800)]
btrfs: fix check_chunk_block_group_mappings() to iterate all chunk maps
[BUG]
A corrupted image with a chunk present in the chunk tree but whose
corresponding block group item is missing from the extent tree can be
mounted successfully, even though check_chunk_block_group_mappings()
is supposed to catch exactly this corruption at mount time. Once
mounted, running btrfs balance with a usage filter (-dusage=N or
-dusage=min..max) triggers a null-ptr-deref:
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
RIP: 0010:chunk_usage_filter fs/btrfs/volumes.c:3874 [inline]
RIP: 0010:should_balance_chunk fs/btrfs/volumes.c:4018 [inline]
RIP: 0010:__btrfs_balance fs/btrfs/volumes.c:4172 [inline]
RIP: 0010:btrfs_balance+0x2024/0x42b0 fs/btrfs/volumes.c:4604
[CAUSE]
The crash occurs because __btrfs_balance() iterates the on-disk chunk
tree, finds the orphaned chunk, calls chunk_usage_filter() (or
chunk_usage_range_filter()), which queries the in-memory block group
cache via btrfs_lookup_block_group(). Since no block group was ever
inserted for this chunk, the lookup returns NULL, and the subsequent
dereference of cache->used crashes.
check_chunk_block_group_mappings() uses btrfs_find_chunk_map() to
iterate the in-memory chunk map (fs_info->mapping_tree):
map = btrfs_find_chunk_map(fs_info, start, 1);
With @start = 0 and @length = 1, btrfs_find_chunk_map() looks for a
chunk map that *contains* the logical address 0. If no chunk contains
logical address 0, btrfs_find_chunk_map(fs_info, 0, 1) returns NULL
immediately and the loop breaks after the very first iteration,
having checked zero chunks. The entire verification function is therefore
a no-op, and the corrupted image passes the mount-time check undetected.
[FIX]
Replace the btrfs_find_chunk_map() based loop with a direct in-order
walk of fs_info->mapping_tree using rb_first_cached() + rb_next().
This guarantees that every chunk map in the tree is visited regardless
of the logical addresses involved.
No lock is taken around the traversal. This function is called during
mount from btrfs_read_block_groups(), which is invoked from open_ctree()
before any background threads (cleaner, transaction kthread, etc.) are
started. There are therefore no concurrent writers that could modify
mapping_tree at this point. An analogous lockless direct traversal of
mapping_tree already exists in fill_dummy_bgs() in the same file.
Since we walk the rb-tree directly via rb_entry() without going through
btrfs_find_chunk_map(), no reference is taken on each map entry, so the
btrfs_free_chunk_map() calls are also removed.
Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Tejun Heo [Thu, 7 May 2026 22:09:21 +0000 (12:09 -1000)]
sched_ext: Drop unused scx_find_sub_sched() stub
scx_find_sub_sched()'s only caller, scx_bpf_sub_dispatch(), is gated on
CONFIG_EXT_SUB_SCHED. When CONFIG_EXT_SUB_SCHED=n the caller compiles out
and the stub becomes dead code, tripping -Wunused-function on randconfigs.
Drop the stub.
Fixes: 25037af712eb ("sched_ext: Add rhashtable lookup for sub-schedulers") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/all/202605080556.42PXw8U9-lkp@intel.com/ Signed-off-by: Tejun Heo <tj@kernel.org>
Tristan Madani [Tue, 5 May 2026 11:12:59 +0000 (11:12 +0000)]
hfs/hfsplus: zero-initialize buffer in hfs_bnode_read
hfs_bnode_read() can return early without writing to the output buffer
when is_bnode_offset_valid() fails or when check_and_correct_requested_
length() corrects the length to zero. Callers such as hfs_bnode_read_
u16() and hfs_bnode_read_u8() pass stack-allocated buffers and use the
result unconditionally, leading to KMSAN uninit-value reports.
Rather than initializing at each individual call site, zero the buffer
at the start of hfs_bnode_read() before any validation checks. This
ensures all callers in both hfs and hfsplus get a deterministic zero
value regardless of which early-return path is taken.
Tristan Madani [Tue, 5 May 2026 11:12:58 +0000 (11:12 +0000)]
hfs/hfsplus: fix u32 overflow in check_and_correct_requested_length
check_and_correct_requested_length() compares (off + len) against
node_size using u32 arithmetic. When the caller passes a large len
value (e.g. from an underflowed subtraction in hfs_brec_remove()),
off + len can wrap past 2^32 and produce a small result, causing the
bounds check to pass when it should fail.
For example, with off=14 and len=0xFFFFFFF2 (underflowed from
data_off - keyoffset - size in hfs_brec_remove), off + len wraps to 6,
which is less than a typical node_size of 512, so the check passes and
the subsequent memmove reads ~4GB past the node buffer.
Fix this by widening the addition to u64 before comparing against
node_size. This prevents the u32 wrap while keeping the logic
straightforward.
Chen Wandun [Thu, 7 May 2026 10:54:34 +0000 (18:54 +0800)]
cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed()
Since prepare_alloc_pages() unconditionally adds __GFP_HARDWALL for the
fast path when cpusets are enabled, the __GFP_HARDWALL check in
cpuset_current_node_allowed() causes the PF_EXITING escape path to be
skipped on the first allocation attempt. This makes it unreachable in
the common case, so dying tasks can get stuck in direct reclaim or even
trigger OOM while trying to exit, despite being allowed to allocate from
any node.
Move the PF_EXITING check before __GFP_HARDWALL so that dying tasks
can allocate memory from any node to exit quickly, even when cpusets
are enabled.
Also update the function comment to reflect the actual behavior of
prepare_alloc_pages() and the corrected check ordering.
Signed-off-by: Chen Wandun <chenwandun@lixiang.com> Acked-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
umh: replace use of system_unbound_wq with system_dfl_wq
This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.
Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:
rapidio: rio: add WQ_PERCPU to alloc_workqueue users
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.
In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.
media: ddbridge: add WQ_PERCPU to alloc_workqueue users
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.
In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.
platform: cznic: turris-omnia-mcu: replace use of system_wq with system_percpu_wq
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
This patch continues the effort to refactor worqueue APIs, which has begun
with the change introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
Replace system_wq with system_percpu_wq, keeping the same behavior.
The old wq (system_wq) will be kept for a few release cycles.
Cc: Marek Behun <kabel@kernel.org> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Marek Behún <kabel@kernel.org>
media: synopsys: hdmirx: replace use of system_unbound_wq with system_dfl_wq
This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.
Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:
virt: acrn: Add WQ_PERCPU to alloc_workqueue users
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.
In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.
Nishad Saraf [Tue, 5 May 2026 16:09:36 +0000 (09:09 -0700)]
accel/amdxdna: Add AIE4 work buffer initialization
NPU firmware requires a host-allocated work buffer for hardware contexts.
Allocate a 4 MB host buffer and attach it to device during device init.
Refactor aie2_alloc_msg_buffer() and aie2_free_msg_buffer() into common
helpers by moving them to aie.c and renaming them to
amdxdna_alloc_msg_buffer() and amdxdna_free_msg_buffer(), allowing both
AIE2 and AIE4 to reuse the implementation.
David Zhang [Wed, 6 May 2026 16:16:42 +0000 (09:16 -0700)]
accel/amdxdna: Add AIE4 metadata query support
Add support for querying device metadata on AIE4 via a mailbox message.
Refactor aie2_get_aie_metadata() into a common helper by moving it to
aie.c and renaming it to amdxdna_get_metadata(), allowing both AIE2
and AIE4 to reuse the implementation.
David Zhang [Tue, 5 May 2026 16:09:34 +0000 (09:09 -0700)]
accel/amdxdna: Add command doorbell and wait support
Expose the command doorbell register to userspace on a per-hardware
context basis, enabling applications to notify the firmware of pending
commands via doorbell writes.
Introduce DRM_IOCTL_AMDXDNA_WAIT_CMD to allow userspace to wait for
completion of individual commands.
Lizhi Hou [Thu, 7 May 2026 04:02:07 +0000 (21:02 -0700)]
accel/amdxdna: Fix clflush buffer size
The firmware is told the buffer is req.buf_size bytes. It may read/write
the entire region. If the CPU only flushes a subset, the remaining cache
lines could contain stale data, causing the device to see garbage.
Tejun Heo [Thu, 7 May 2026 21:05:31 +0000 (11:05 -1000)]
sched_ext: Move scx_error() out of scx_link_sched()'s lock region
scx_link_sched() holds scx_sched_lock. The scx_error() calls inside take the
same lock through scx_claim_exit() and deadlock. Move them out of the guard.
Fixes: 6b4576b09714 ("sched_ext: Reject sub-sched attachment to a disabled parent") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
smb: client: validate dacloffset before building DACL pointers
parse_sec_desc(), build_sec_desc(), and the chown path in
id_mode_to_cifs_acl() all add the server-supplied dacloffset to pntsd
before proving a DACL header fits inside the returned security
descriptor.
On 32-bit builds a malicious server can return dacloffset near
U32_MAX, wrap the derived DACL pointer below end_of_acl, and then slip
past the later pointer-based bounds checks. build_sec_desc() and
id_mode_to_cifs_acl() can then dereference DACL fields from the wrapped
pointer in the chmod/chown rewrite paths.
Validate dacloffset numerically before building any DACL pointer and
reuse the same helper at the three DACL entry points.
Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Zisen Ye [Wed, 6 May 2026 03:49:08 +0000 (11:49 +0800)]
smb/client: fix out-of-bounds read in smb2_compound_op()
If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.
Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);
Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.
Zisen Ye [Sat, 2 May 2026 10:48:36 +0000 (18:48 +0800)]
smb/client: fix out-of-bounds read in symlink_data()
Since smb2_check_message() returns success without length validation for
the symlink error response, in symlink_data() it is possible for
iov->iov_len to be smaller than sizeof(struct smb2_err_rsp). If the buffer
only contains the base SMB2 header (64 bytes), accessing
err->ErrorContextCount (at offset 66) or err->ByteCount later in
symlink_data() will cause an out-of-bounds read.
Piyush Sachdeva [Thu, 7 May 2026 16:52:14 +0000 (22:22 +0530)]
smb: client: Zero-pad short GSS session keys per MS-SMB2
Per MS-SMB2 section 3.2.5.3, Session.SessionKey is the first 16 bytes
of the GSS cryptographic key, right-padded with zero bytes if the key
is shorter than 16 bytes.
SMB2_auth_kerberos() copies the GSS session key from the cifs.upcall
response using kmemdup(msg->data, msg->sesskey_len, ...) and stores
the GSS-reported length verbatim in ses->auth_key.len. generate_key()
reads SMB2_NTLMV2_SESSKEY_SIZE bytes from this buffer when feeding the
HMAC-SHA256 KDF for signing key derivation. If a GSS mechanism returns
a session key shorter than 16 bytes (e.g. a deprecated single-DES
Kerberos enctype with an 8-byte session key), the KDF call performs an
out-of-bounds slab read and derives keys that do not match the server,
which pads per the spec.
Modern KDCs disable short-key enctypes by default, so this is latent
rather than reachable in production, but it is still a kernel heap
over-read.
Allocate auth_key.response with kzalloc() at a length of
max(msg->sesskey_len, SMB2_NTLMV2_SESSKEY_SIZE), copy the GSS key in,
and rely on kzalloc()'s zero initialization for the spec-mandated
padding. Set ses->auth_key.len to the padded length. Larger GSS keys
(e.g. the 32-byte aes256-cts-hmac-sha1-96 session key) continue to be
stored at their natural length, preserving the FullSessionKey path.
Emit a cifs_dbg(VFS, ...) message when a short key is encountered to
surface deprecated-enctype usage.
NTLMv2 and NTLMSSP code paths produce a 16-byte session key by
construction and are unaffected.
Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com> Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Piyush Sachdeva [Thu, 7 May 2026 16:52:13 +0000 (22:22 +0530)]
smb: client: Use FullSessionKey for AES-256 encryption key derivation
When Kerberos authentication is used with AES-256 encryption (AES-256-CCM
or AES-256-GCM), the SMB3 encryption and decryption keys must be derived
using the full session key (Session.FullSessionKey) rather than just the
first 16 bytes (Session.SessionKey).
Per MS-SMB2 section 3.2.5.3.1, when Connection.Dialect is "3.1.1" and
Connection.CipherId is AES-256-CCM or AES-256-GCM, Session.FullSessionKey
must be set to the full cryptographic key from the GSS authentication
context. The encryption and decryption key derivation (SMBC2SCipherKey,
SMBS2CCipherKey) must use this FullSessionKey as the KDF input. The
signing key derivation continues to use Session.SessionKey (first 16
bytes) in all cases.
Previously, generate_key() hardcoded SMB2_NTLMV2_SESSKEY_SIZE (16) as the
HMAC-SHA256 key input length for all derivations. When Kerberos with
AES-256 provides a 32-byte session key, the KDF for encryption/decryption
was using only the first 16 bytes, producing keys that did not match the
server's, causing mount failures with sec=krb5 and require_gcm_256=1.
Add a full_key_size parameter to generate_key() and pass the appropriate
size from generate_smb3signingkey():
- Signing: always SMB2_NTLMV2_SESSKEY_SIZE (16 bytes)
- Encryption/Decryption: ses->auth_key.len when AES-256, otherwise 16
Also fix cifs_dump_full_key() to report the actual session key length for
AES-256 instead of hardcoded CIFS_SESS_KEY_SIZE, so that userspace tools
like Wireshark receive the correct key for decryption.
Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com> Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
drm/dp/mst: fix buffer overflows in sideband chunk accumulation
drm_dp_sideband_append_payload() has three related bugs when processing
device-provided sideband reply data:
1. Zero-length curchunk_len underflow: msg_len is a 6-bit field taken
directly from the DP sideband header. If a device sends msg_len=0,
curchunk_len is set to zero. The condition (curchunk_idx >= curchunk_len)
is immediately true, and curchunk_len-1 wraps to 255 (u8 underflow).
drm_dp_msg_data_crc4() reads 255 bytes from chunk[48], then memcpy()
writes 255 bytes into msg[], both far out of bounds.
2. chunk[48] overflow: curchunk_len can reach 63 (6-bit field). chunk[] is
only 48 bytes. Multi-iteration payload assembly appends 16-byte blocks
until curchunk_idx reaches curchunk_len, writing up to 15 bytes past
the end of chunk[] into msg[].
3. msg[256] overflow: each chunk contributes (curchunk_len-1) bytes to
msg[]. No check ensures curlen + (curchunk_len-1) stays within msg[256],
so the memcpy can spill into adjacent struct fields.
All three are reachable from any DP MST device that can forge sideband
reply messages on a physical connection.
Cross-merge networking fixes after downstream PR (net-7.1-rc3).
Conflicts:
net/ipv4/igmp.c 726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation") c6bebaa744f7 ("ipv4: igmp: annotate data-races in igmp_heard_query()")
https://lore.kernel.org/a7365e4873340f7a5e30411207de3bf9@kernel.org
Adjacent changes:
net/psp/psp_main.c 30cb24f97d44 ("psp: strip variable-length PSP header in psp_dev_rcv()") c2b22277ad89 ("psp: validate IPv4 header fields in psp_dev_rcv()")
Francois Dugast [Thu, 7 May 2026 09:16:06 +0000 (11:16 +0200)]
drm: Drop HPAGE_PMD_SIZE dependency in dma_iova_try_alloc calls
The phys argument to dma_iova_try_alloc() is used only to compute
the sub-granule offset (phys & (granule - 1)). Since HPAGE_PMD_SIZE
is a power of two larger than any IOMMU granule, this expression
always evaluates to zero.
Replace the ternary expressions with a plain 0, which is what the
API documentation recommends for callers doing PAGE_SIZE-aligned
transfers. This also removes the dependency on HPAGE_PMD_SIZE and
thus on CONFIG_PGTABLE_HAS_HUGE_LEAVES / HAVE_ARCH_TRANSPARENT_HUGEPAGE,
fixing build failures on architectures such as arm32 that lack that
config.
Dmitry Torokhov [Mon, 4 May 2026 18:54:46 +0000 (11:54 -0700)]
Input: atmel_mxt_ts - check mem_size before calculating config memory size
In mxt_update_cfg(), the driver calculates the memory size needed to store
the configuration as data->mem_size - cfg.start_ofs. If data->mem_size is
less than or equal to cfg.start_ofs, this calculation will underflow or
result in a zero-size buffer, neither of which is valid for a configuration
update.
Add a check to return -EINVAL if data->mem_size is too small. While at it,
change the types of start_ofs and mem_size in struct mxt_cfg to u16 to
match the device address space.
Dmitry Torokhov [Mon, 4 May 2026 18:54:45 +0000 (11:54 -0700)]
Input: atmel_mxt_ts - fix boundary check in mxt_prepare_cfg_mem
When a configuration file provides an object size that is larger than the
driver's known mxt_obj_size(object), the driver intends to discard the
extra bytes.
The loop iterates using for (i = 0; i < size; i++). Inside the loop, the
condition to skip processing extra bytes is:
if (i > mxt_obj_size(object))
continue;
Since i is a 0-based index, the valid indices for the object are 0 through
mxt_obj_size(object) - 1.
When i == mxt_obj_size(object), the condition evaluates to false, and the
code processes the byte instead of discarding it.
This causes the code to calculate byte_offset = reg + i - cfg->start_ofs
and writes the byte there, overwriting exactly one byte of the adjacent
instance or object.
Update the boundary check to skip extra bytes correctly by using >=.
Input: fm801-gp - simplify initialisation of pci_device_id array
Instead of assigning the pci_device_id members using a list (which is
hard to read as you need to look at the order of the members in that
struct in parallel) use the PCI_VDEVICE() convenience macro to compact
the initialisation while improving readability.
Also drop trailing zeros that the compiler will care about then.
The change doesn't introduce binary changes to the compiled driver,
verified on both ARCH=x86 and ARCH=arm64.
Daniel Machon [Wed, 6 May 2026 07:25:39 +0000 (09:25 +0200)]
net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
sparx5_port_init() only invokes sparx5_serdes_set() and the associated
shadow-device enable and low-speed device switch for SGMII and QSGMII.
On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G)
configured for 1000BASE-X the serdes is therefore left uninitialized,
the DEV2G5 shadow is never enabled, and the port stays pointed at its
high-speed device rather than the DEV2G5. The PCS1G block looks
healthy in isolation, but no frames reach the link partner.
Add 1000BASE-X to the check so the same three steps run.
Note: the same issue might apply to 2500BASE-X, but that will,
eventually, be addressed in a separate commit.
The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a
GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match
these IDs, so probing a TSN part fails with a "Target not supported"
error.
Fix the enum to use the actual 16-bit part IDs returned by the
hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558.
Linus Torvalds [Thu, 7 May 2026 15:55:15 +0000 (08:55 -0700)]
Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again a collection of small fixes, mostly for device-specific ones.
The only big LOC is about the removal of pretty old dead code in
ab8500 codec driver, while the rest all nice small changes.
Core / API:
- Fix race in deferred fasync state checks
- Fix UMP group filtering in sequencer
ASoC:
- cs35l56: fixes for driver cleanup and error paths
- tas2764/2770: workaround for bogus temperature readings
- wm_adsp: fixes for firmware unit tests
- amd-yc: more DMI quirks for laptops
- Minor fixes for fsl_xcvr and spacemit
HD-Audio:
- Mute LED and speaker quirks for HP, Lenovo, and Xiaomi laptops
USB-audio:
- New device-specific quirks (Motu, JBL, AlphaTheta, Razer)
- Fix of MIDI2 playback on resume
Others:
- Firewire-tascam control event fix
- Minor cleanups and fixes for sparc/dbri and pcmtest"
* tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
ASoC: cs35l56: Destroy workqueue in probe error path
ASoC: cs35l56: Don't use devres to unregister component
ALSA: sparc/dbri: add missing fallthrough
ALSA: core: Serialize deferred fasync state checks
ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
ALSA: seq: Fix UMP group 16 filtering
ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
ASoC: wm_adsp_fw_find_test: Redirect wm_adsp_release_firmware_files()
ASoC: tas2770: Deal with bogus initial temperature value
ASoC: tas2764: Deal with bogus initial temperature register value
ALSA: usb-audio: add clock quirk for Motu 1248
ALSA: usb-audio: midi2: Restart output URBs on resume
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
ALSA: usb-audio: Add quirk flags for JBL Pebbles
ALSA: firewire-tascam: Do not drop unread control events
ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
ASoC: fsl_xcvr: Fix event generation for cached controls
ASoC: sdw_utils: avoid the SDCA companion function not supported failure
ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
...
Peng Fan [Mon, 27 Apr 2026 01:01:48 +0000 (09:01 +0800)]
soc: imx8m: Fix match data lookup for soc device
The i.MX8M soc device is registered via platform_device_register_simple(),
so it is not associated with a Device Tree node and the imx8m_soc_driver
has no of_match_table.
As a result, device_get_match_data() always returns NULL when probing
the soc device.
Retrieve the match data directly from the machine compatible using
of_machine_get_match_data(imx8_soc_match), which provides the correct SoC
data.
Fixes: 2524b293a59e5 ("soc: imx8m: don't access of_root directly") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Frank Li <Frank.Li@nxp.com>
Linus Torvalds [Thu, 7 May 2026 15:43:25 +0000 (08:43 -0700)]
Merge tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- Fix detach procedure for virtual devices in genpd
- mediatek: Fix use-after-free in scpsys_get_bus_protection_legacy()
* tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: fix use-after-free in scpsys_get_bus_protection_legacy()
pmdomain: core: Fix detach procedure for virtual devices in genpd
Joey Lu [Wed, 6 May 2026 08:46:13 +0000 (16:46 +0800)]
net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
priv->dev was never initialized after devm_kzalloc() allocates the
private data structure. When nvt_set_phy_intf_sel() is later invoked
via the phylink interface_select callback, it calls
nvt_gmac_get_delay(priv->dev, ...) which dereferences the NULL pointer.
Fix this by assigning priv->dev = dev immediately after allocation.
If a socket is bound to a wildcard address, tcp_v[46]_connect()
updates it with a non-wildcard address based on the route lookup.
After bhash2 was introduced in the cited commit, we must call
inet_bhash2_update_saddr() to update the bhash2 entry as well.
If inet_bhash2_update_saddr() fails, we must release the refcount
for dst by ip_route_connect() or ip6_dst_lookup_flow().
While tcp_v4_connect() calls ip_rt_put() in the error path,
tcp_v6_connect() does not call dst_release().
Let's call dst_release() when inet_bhash2_update_saddr() fails
in tcp_v6_connect().
Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260506070443.1699879-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Chen [Tue, 5 May 2026 17:39:26 +0000 (10:39 -0700)]
net: phy: broadcom: Save PHY counters during suspend
The PHY counters can be lost if the PHY is reset during suspend. We
need to save the values into the shadow counters or the accounting
will be incorrect over multiple suspend and resume cycles.
Fixes: 820ee17b8d3b ("net: phy: broadcom: Add support code for reading PHY counters") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260505173926.2870069-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
D. Wythe [Wed, 6 May 2026 01:41:05 +0000 (09:41 +0800)]
net/smc: fix missing sk_err when TCP handshake fails
In smc_connect_work(), when the underlying TCP handshake fails, the error
code (rc) must be propagated to sk_err to ensure userspace can correctly
retrieve the error status via SO_ERROR. Currently, the code only handles
a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other
errors occurs, such as EHOSTUNREACH, sk_err remains unset (zero).
This affects applications that rely on SO_ERROR to determine connect
outcome. For example, higher versions of Go's netpoller treats
SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup
and re-enters epoll_wait(). Under ET mode, no further edge will be
generated since the socket is already in a terminal state, causing the
connect to hang indefinitely or until a user-specified timeout, if one
is set.
Jiexun Wang [Wed, 6 May 2026 14:08:23 +0000 (22:08 +0800)]
af_unix: Reject SIOCATMARK on non-stream sockets
SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.
In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.
Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.
Fixes: 314001f0bf92 ("af_unix: Add OOB support") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The top-level MFD driver now passes the device properties to the GPIO
cell via the software node. Use generic device property accessors and
stop using platform data. We can ignore the "ngpios" property here now
as it will be retrieved internally by GPIO core.
mfd: timberdale: Set up a software node for the GPIO cell
Using generic device properties instead of custom platform data
structures is preferred due to the resulting unification of the way
properties are accessed in consumer drivers. There's no DT node for the
GPIO cell in this driver but we can create a software node with device
properties and attach it to all the GPIO cells.
Shuhao Fu [Tue, 28 Apr 2026 08:12:38 +0000 (16:12 +0800)]
ALSA: hda: cs35l41: Put ACPI device on missing physical node
acpi_dev_get_first_match_dev() returns a refcounted ACPI device and
callers must balance it with acpi_dev_put().
cs35l41_hda_read_acpi() stores the returned ACPI device in
cs35l41->dacpi. That reference is normally released by the later
probe cleanup or the remove path, but the NULL-check on
physdev exits before either of those paths can run.
Drop the lookup reference before returning -ENODEV.
Fixes: c34b04cc6178 ("ALSA: hda: cs35l41: Fix NULL pointer dereference in cs35l41_hda_read_acpi()") Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Tested-by: Simon Trimmer <simont@opensource.cirrus.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20260428081238.GA1659932@chcpu16
Shuhao Fu [Tue, 28 Apr 2026 08:01:39 +0000 (16:01 +0800)]
ALSA: hda: cs35l56: Put ACPI device after setting companion
acpi_dev_get_first_match_dev() returns a refcounted ACPI device and
callers are expected to balance it with acpi_dev_put().
When no companion is already attached, cs35l56_hda_read_acpi() looks
up an ACPI device and sets it with ACPI_COMPANION_SET(), but leaves
the lookup reference held.
ACPI_COMPANION_SET() does not take ownership of that reference, so
drop it with acpi_dev_put() after attaching the companion.
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier") Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Tested-by: Simon Trimmer <simont@opensource.cirrus.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20260428080139.GA1649104@chcpu16
3b497c3f4f04 ("fs/resctrl: Introduce the interface to display monitoring modes")
introduced CONFIG_RESCTRL_ASSIGN_FIXED but left adding the Kconfig
entry until it was necessary. The counter assignment mode is fixed in
MPAM, even when there are assignable counters, and so addressing this
is needed to support MPAM.
To avoid the burden of another Kconfig entry, replace
CONFIG_RESCTRL_ASSIGN_FIXED with a new property in 'struct resctrl_mon',
'mbm_cntr_assign_fixed' to be set by the architecture.
Do not request the architecture to change the counter assignment mode if it
does not support doing so. Provide insight to user space about why such a
request fails.
veth: fix OOB txq access in veth_poll() with asymmetric queue counts
XDP redirect into a veth device (via bpf_redirect()) calls
veth_xdp_xmit(), which enqueues frames into the peer's ptr_ring using
smp_processor_id() % peer->real_num_rx_queues
as the ring index. With an asymmetric veth pair where the peer has
fewer TX queues than RX queues, that index can exceed
peer->real_num_tx_queues.
veth_poll() then resolves peer_txq for the ring via:
where queue_idx = rq->xdp_rxq.queue_index. When queue_idx exceeds
peer_dev->real_num_tx_queues this is an out-of-bounds (OOB) access
into the peer's netdev_queue array, triggering DEBUG_NET_WARN_ON_ONCE
in netdev_get_tx_queue().
The normal ndo_start_xmit path is not affected: the stack clamps
skb->queue_mapping via netdev_cap_txqueue() before invoking
ndo_start_xmit, so rxq in veth_xmit() never exceeds real_num_tx_queues.
Fix veth_poll() by clamping: only dereference peer_txq when queue_idx is
within bounds, otherwise set it to NULL. The out-of-range rings are fed
exclusively via XDP redirect (veth_xdp_xmit), never via ndo_start_xmit
(veth_xmit), so the peer txq was never stopped and there is nothing to
wake; NULL is the correct fallback.
Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://lore.kernel.org/all/20260502071828.616C3C19425@smtp.kernel.org/ Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20260505132159.241305-2-hawk@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
drm/panfrost: Fix wait_bo ioctl leaking positive return from dma_resv_wait_timeout()
dma_resv_wait_timeout() returns a positive 'remaining jiffies' value
on success, 0 on timeout, and -errno on failure.
panfrost_ioctl_wait_bo() returns this 'long' result from an int-typed
ioctl handler, so positive values reach userspace as bogus errors.
Explicitly set ret to 0 on the success path.
accel/rocket: Fix prep_bo ioctl leaking positive return from dma_resv_wait_timeout()
dma_resv_wait_timeout() returns a positive 'remaining jiffies' value
on success, 0 on timeout, and -errno on failure.
rocket_ioctl_prep_bo() returns this 'long' result from an int-typed
ioctl handler, so positive values reach userspace as bogus errors.
Explicitly set ret to 0 on the success path.
Thomas Richard [Mon, 27 Apr 2026 09:40:33 +0000 (11:40 +0200)]
backlight: cgbc: Remove redundant X86 dependency
The backlight driver depends on the MFD cgbc-core driver, which already
depends on X86. The explicit X86 dependency for the backlight driver is
redundant and can be safely removed.