git.ipfire.org Git - thirdparty/kernel/linux.git/log
3 weeks ago eventpoll: defer struct eventpoll free to RCU grace period
Nicholas Carlini [Tue, 31 Mar 2026 13:25:32 +0000 (15:25 +0200)] 
eventpoll: defer struct eventpoll free to RCU grace period

In certain situations, ep_free() in eventpoll.c will kfree the epi->ep
eventpoll struct while it is still being used by another concurrent thread.
Defer the kfree() to an RCU callback to prevent UAF.
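The idea can be illustrated with a user-space toy model, assuming a single thread and a simulated grace period: the freeing path queues the object, and the memory is reclaimed only once no reader is active. In the kernel, call_rcu()/kfree_rcu() and a real grace period do this job; every `toy_*` name below is hypothetical, and the reader counter is a drastic simplification, not how RCU is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct eventpoll. */
struct toy_ep {
	int id;
	struct toy_ep *next_deferred;	/* link in the deferred-free list */
};

/* Simulated RCU state (single-threaded): active readers, pending frees. */
static int toy_readers;
static struct toy_ep *toy_deferred;
static int toy_freed;

static void toy_read_lock(void)   { toy_readers++; }
static void toy_read_unlock(void) { toy_readers--; }

/* ep_free()-style path: queue the object instead of freeing it now. */
static void toy_free_deferred(struct toy_ep *ep)
{
	ep->next_deferred = toy_deferred;
	toy_deferred = ep;
}

/*
 * Reclaim queued objects only once no reader is active, i.e. once the
 * simulated grace period has elapsed. Returns false if readers remain.
 */
static bool toy_try_reclaim(void)
{
	if (toy_readers > 0)
		return false;
	while (toy_deferred) {
		struct toy_ep *ep = toy_deferred;

		toy_deferred = ep->next_deferred;
		free(ep);
		toy_freed++;
	}
	return true;
}
```

A concurrent user that entered its read-side section before the free can keep dereferencing the object until it leaves that section.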

Fixes: f2e467a48287 ("eventpoll: Fix semi-unbounded recursion")
Signed-off-by: Nicholas Carlini <nicholas@carlini.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks ago accel/ivpu: Trigger recovery on TDR with OS scheduling
Karol Wachowski [Thu, 2 Apr 2026 12:55:26 +0000 (14:55 +0200)] 
accel/ivpu: Trigger recovery on TDR with OS scheduling

With OS scheduling mode the driver cannot determine which context
caused the timeout, so context abort cannot be used. Instead of
queuing context_abort_work, directly trigger full device recovery
when a job timeout (TDR) occurs in OS scheduling mode.
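A sketch of the decision, assuming a two-mode driver in which the non-OS mode keeps the existing context-abort path. The enum and function names are hypothetical and do not appear in the ivpu driver.

```c
#include <assert.h>

/* Hypothetical names; they only model the decision described above. */
enum sched_mode { SCHED_MODE_HW, SCHED_MODE_OS };
enum tdr_action { TDR_ABORT_CONTEXT, TDR_FULL_RECOVERY };

/*
 * On a job timeout (TDR), only pick the per-context abort path when the
 * guilty context can be identified; with OS scheduling it cannot, so
 * fall back to recovering the whole device.
 */
static enum tdr_action tdr_choose_action(enum sched_mode mode)
{
	if (mode == SCHED_MODE_OS)
		return TDR_FULL_RECOVERY;
	return TDR_ABORT_CONTEXT;
}
```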

Fixes: ade00a6c903f ("accel/ivpu: Perform engine reset instead of device recovery on TDR")
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260402125526.845210-1-karol.wachowski@linux.intel.com
3 weeks ago sched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU
Changwoo Min [Thu, 2 Apr 2026 02:31:50 +0000 (11:31 +0900)] 
sched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU

Since commit 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for
trampoline.c"), the BPF prolog (__bpf_prog_enter) calls migrate_disable()
only when CONFIG_PREEMPT_RCU is enabled, via rcu_read_lock_dont_migrate().
Without CONFIG_PREEMPT_RCU, the prolog never touches migration_disabled,
so migration_disabled == 1 always means the task is truly
migration-disabled regardless of whether it is the current task.

The old unconditional p == current check was a false negative in this
case, potentially allowing a migration-disabled task to be dispatched to
a remote CPU and triggering scx_error in task_can_run_on_remote_rq().

Only apply the p == current disambiguation when CONFIG_PREEMPT_RCU is
enabled, where the ambiguity with the BPF prolog still exists.
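The corrected check can be modelled as a pure function. This is a user-space sketch: `preempt_rcu` stands in for `IS_ENABLED(CONFIG_PREEMPT_RCU)`, `is_current` for `p == current`, and the function name is hypothetical; the real helper reads these from the task and the kernel config.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_is_bpf_migration_disabled(int migration_disabled,
					    bool is_current, bool preempt_rcu)
{
	/*
	 * A count of exactly 1 on the current task may come from the BPF
	 * prolog's own migrate_disable(), but only with PREEMPT_RCU.
	 * Without it, a count of 1 is always a real migration disable.
	 */
	if (migration_disabled == 1 && preempt_rcu && is_current)
		return false;
	return migration_disabled > 0;
}
```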

Fixes: 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for trampoline.c")
Cc: stable@vger.kernel.org # v6.18+
Link: https://lore.kernel.org/lkml/20250821090609.42508-8-dongml2@chinatelecom.cn/
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
3 weeks ago drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations
Ionut Nechita [Mon, 23 Mar 2026 21:13:43 +0000 (23:13 +0200)] 
drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations

Description:
 - Commit b82f0759346617b2 ("drm/amd/display: Migrate DIO registers access
   from hwseq to dio component") moved DIO_MEM_PWR_CTRL register access
   behind the new dio abstraction layer but only created the dio object for
   DCN 4.01. On all other generations (DCN 10/20/21/201/30/301/302/303/
   31/314/315/316/32/321/35/351/36), the dio pointer is NULL, causing the
   register write to be silently skipped.

   This results in AFMT HDMI memory not being powered on during init_hw,
   which can cause HDMI audio failures and display issues on affected
   hardware including Renoir/Cezanne (DCN 2.1) APUs that use dcn10_init_hw.

   Call dcn10_dio_construct() in each older DCN generation's resource.c
   to create the dio object, following the same pattern as DCN 4.01. This
   ensures the dio pointer is non-NULL and the mem_pwr_ctrl callback works
   through the dio abstraction for all DCN generations.
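The failure mode (a NULL component pointer that silently turns a register write into a no-op) can be shown with a simplified model. All names are hypothetical, with `model_dio_construct()` playing the role of `dcn10_dio_construct()`; the real DC structures carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the dio abstraction. */
struct dio;
struct dio_funcs {
	void (*mem_pwr_ctrl)(struct dio *dio, bool enable);
};
struct dio {
	const struct dio_funcs *funcs;
	bool afmt_mem_powered;	/* stands in for the DIO_MEM_PWR_CTRL state */
};

static void model_mem_pwr_ctrl(struct dio *dio, bool enable)
{
	dio->afmt_mem_powered = enable;
}

static const struct dio_funcs model_dio_funcs = {
	.mem_pwr_ctrl = model_mem_pwr_ctrl,
};

/* Stand-in for dcn10_dio_construct(): wire up the callbacks. */
static void model_dio_construct(struct dio *dio)
{
	dio->funcs = &model_dio_funcs;
	dio->afmt_mem_powered = false;
}

/* init_hw-style caller: a NULL dio silently skips the register write. */
static void model_init_hw(struct dio *dio)
{
	if (dio && dio->funcs && dio->funcs->mem_pwr_ctrl)
		dio->funcs->mem_pwr_ctrl(dio, true);
}
```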

Fixes: b82f07593466 ("drm/amd/display: Migrate DIO registers access from hwseq to dio component.")
Reviewed-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks ago sched_ext: Fix missing warning in scx_set_task_state() default case
Samuele Mariotti [Thu, 2 Apr 2026 17:00:25 +0000 (19:00 +0200)] 
sched_ext: Fix missing warning in scx_set_task_state() default case

In scx_set_task_state(), the default case was setting the
warn flag, but then returning immediately. This is problematic
because the only purpose of the warn flag is to trigger
WARN_ONCE, but the early return prevented it from ever firing,
leaving invalid task states undetected and untraced.

To fix this, a WARN_ONCE call is now added directly in the
default case.

The fix addresses two aspects:

 - Guarantees the invalid task states are properly logged
   and traced.

 - Provides a distinct warning message
   ("sched_ext: Invalid task state") specifically for
   states outside the defined scx_task_state enum values,
   making it easier to distinguish from other transition
   warnings.

This ensures proper detection and reporting of invalid states.
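A user-space model of the fix, assuming a small, made-up set of states and transition rules: the default case emits its warning before returning, instead of setting a flag that the early return makes unreachable. `MODEL_WARN_ONCE` is a hypothetical stand-in for the kernel's `WARN_ONCE()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of task states, for illustration only. */
enum model_task_state { MODEL_NONE, MODEL_INIT, MODEL_READY, MODEL_ENABLED };

static int warn_count;	/* number of warnings actually emitted */

/* Stand-in for WARN_ONCE(): each call site reports at most once. */
#define MODEL_WARN_ONCE(cond, msg)			\
	do {						\
		static bool warned_;			\
		if ((cond) && !warned_) {		\
			warned_ = true;			\
			warn_count++;			\
			fprintf(stderr, "%s\n", (msg));	\
		}					\
	} while (0)

static void model_set_task_state(int *state, int new_state)
{
	int prev = *state;
	bool warn = false;

	switch (new_state) {
	case MODEL_NONE:
		break;
	case MODEL_INIT:
		warn = prev != MODEL_NONE;
		break;
	case MODEL_READY:
		warn = prev != MODEL_INIT;
		break;
	case MODEL_ENABLED:
		warn = prev != MODEL_READY;
		break;
	default:
		/*
		 * The fix: warn right here. Merely setting 'warn' and then
		 * returning would skip the check below, so it never fires.
		 */
		MODEL_WARN_ONCE(true, "sched_ext: Invalid task state");
		return;
	}

	/* Hypothetical message for ordinary bad transitions. */
	MODEL_WARN_ONCE(warn, "model: invalid state transition");
	*state = new_state;
}
```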

Signed-off-by: Samuele Mariotti <smariotti@disroot.org>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
3 weeks ago Merge tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 2 Apr 2026 19:03:15 +0000 (12:03 -0700)] 
Merge tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd

Pull smb server fix from Steve French:

 - Fix out of bound write

* tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
  ksmbd: fix OOB write in QUERY_INFO for compound requests

3 weeks ago ata: libata-transport: remove static variable ata_scsi_transport_template
Heiner Kallweit [Thu, 2 Apr 2026 13:32:13 +0000 (15:32 +0200)] 
ata: libata-transport: remove static variable ata_scsi_transport_template

Simplify the code by making struct ata_scsi_transportt public, instead
of using the separate variable ata_scsi_transport_template.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
3 weeks ago ata: libata-transport: split struct ata_internal
Heiner Kallweit [Thu, 2 Apr 2026 13:31:22 +0000 (15:31 +0200)] 
ata: libata-transport: split struct ata_internal

There's no need for an umbrella struct, so remove it. It's also a
prerequisite for making the embedded struct scsi_transport_template
public.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
3 weeks ago ata: libata-transport: use static struct ata_transport_internal to simplify match...
Heiner Kallweit [Thu, 2 Apr 2026 13:30:48 +0000 (15:30 +0200)] 
ata: libata-transport: use static struct ata_transport_internal to simplify match functions

Both matching functions can make use of static struct
ata_transport_internal. This eliminates the dependency on the static
variable ata_scsi_transport_template, and it allows removing the helper
to_ata_internal(). A small drawback is that a forward declaration of
both functions is needed.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
3 weeks ago fuse: support FSCONFIG_SET_FD for "fd" option
Miklos Szeredi [Thu, 12 Mar 2026 19:30:08 +0000 (20:30 +0100)] 
fuse: support FSCONFIG_SET_FD for "fd" option

This is not only cleaner to use in userspace (no need to sprintf the fd to
a string) but also allows userspace to detect that the devfd can be closed
after the fsconfig call.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
3 weeks ago fuse: clean up device cloning
Miklos Szeredi [Thu, 12 Mar 2026 11:19:10 +0000 (12:19 +0100)] 
fuse: clean up device cloning

 - fuse_mutex is not needed for device cloning, because fuse_dev_install()
   uses cmpxchg() to set fud->fc, which prevents races between clone/mount
   or clone/clone.  This makes the logic simpler.

 - Drop fc->dev_count.  This is only used to check in release if the device
   is the last clone, but checking list_empty(&fc->devices) is equivalent
   after removing the released device from the list.  Removing the fuse_dev
   before calling fuse_abort_conn() is okay, since the processing and io
   lists are now empty for this device.
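The second point can be sketched with a minimal list modelled on the kernel's `list_head`, assuming single-threaded use (locking is omitted): once the released device has been unlinked, an empty list means it was the last clone. The `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list modelled on the kernel's list_head. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *h)
{
	h->next = h->prev = h;
}

static bool model_list_empty(const struct model_list *h)
{
	return h->next == h;
}

static void model_list_add_tail(struct model_list *n, struct model_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void model_list_del(struct model_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	model_list_init(n);
}

/* Hypothetical stand-ins for struct fuse_conn and struct fuse_dev. */
struct model_conn { struct model_list devices; };
struct model_dev  { struct model_list entry; };

/*
 * Release one clone: unlink it first; "was this the last clone?" is
 * then simply whether the list is empty, so no separate counter
 * (the removed fc->dev_count) is needed.
 */
static bool model_dev_release(struct model_conn *fc, struct model_dev *fud)
{
	model_list_del(&fud->entry);
	return model_list_empty(&fc->devices);
}
```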

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
3 weeks ago ata: libata-transport: inline ata_attach|release_transport
Heiner Kallweit [Thu, 2 Apr 2026 13:30:05 +0000 (15:30 +0200)] 
ata: libata-transport: inline ata_attach|release_transport

Both functions are helpers which are used only once. So remove them and
merge their code into libata_transport_init() and libata_transport_exit()
respectively.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
3 weeks ago ata: libata-transport: instantiate struct ata_internal statically
Heiner Kallweit [Thu, 2 Apr 2026 13:29:21 +0000 (15:29 +0200)] 
ata: libata-transport: instantiate struct ata_internal statically

Struct ata_internal is only instantiated once, in module init code.
So we can also instantiate it statically, which allows simplifying
the code.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
3 weeks ago fuse: don't require /dev/fuse fd to be kept open during mount
Miklos Szeredi [Wed, 11 Mar 2026 21:27:44 +0000 (22:27 +0100)] 
fuse: don't require /dev/fuse fd to be kept open during mount

With the new mount API the sequence of syscalls would be:

        fs_fd = fsopen("fuse", 0);
        snprintf(opt, sizeof(opt), "%i", devfd);
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "fd", opt, 0);
        /* ... */
        fsconfig(fs_fd, FSCONFIG_CMD_CREATE, 0, 0, 0);

Current mount code just stores the value of devfd in the fs_context and
uses it during FSCONFIG_CMD_CREATE, which is inelegant.

Instead grab a reference to the underlying fuse_dev, and use that during
the filesystem creation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
3 weeks ago fuse: add refcount to fuse_dev
Miklos Szeredi [Wed, 11 Mar 2026 21:05:17 +0000 (22:05 +0100)] 
fuse: add refcount to fuse_dev

This will make it possible to grab the fuse_dev and subsequently release
the file that it came from.

In the above case, fud->fc will be set to FUSE_DEV_FC_DISCONNECTED to
indicate that this is no longer a functional device.

When trying to assign an fc to such a disconnected fuse_dev, the fc is set
to the disconnected state.

Use atomic operations xchg() and cmpxchg() to prevent races.
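A user-space sketch of the scheme, using C11 atomics in place of the kernel's `xchg()`/`cmpxchg()`. The sentinel value, the structure layout and the `model_*` helpers are assumptions for illustration, not the fuse implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_conn {
	bool disconnected;
};

/* Sentinel in the role of FUSE_DEV_FC_DISCONNECTED (value is an assumption). */
#define MODEL_FC_DISCONNECTED ((struct model_conn *)(uintptr_t)1)

/* Hypothetical stand-in for struct fuse_dev. */
struct model_dev {
	atomic_int refcount;
	_Atomic(struct model_conn *) fc;
};

static void model_dev_get(struct model_dev *dev)
{
	atomic_fetch_add(&dev->refcount, 1);
}

/* Returns true when the last reference has been dropped. */
static bool model_dev_put(struct model_dev *dev)
{
	return atomic_fetch_sub(&dev->refcount, 1) == 1;
}

/*
 * Releasing the file the device came from: atomically mark the device
 * as no longer functional, then drop the file's reference. Handling of
 * a previously installed connection is omitted from this sketch.
 */
static void model_dev_release_file(struct model_dev *dev)
{
	atomic_exchange(&dev->fc, MODEL_FC_DISCONNECTED);
	model_dev_put(dev);
}

/*
 * Attach fc if the device is still unused. If it was disconnected in
 * the meantime, put the connection into the disconnected state instead.
 */
static bool model_dev_install(struct model_dev *dev, struct model_conn *fc)
{
	struct model_conn *expected = NULL;

	if (atomic_compare_exchange_strong(&dev->fc, &expected, fc))
		return true;
	if (expected == MODEL_FC_DISCONNECTED)
		fc->disconnected = true;
	return false;
}
```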

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
3 weeks ago fuse: create fuse_dev on /dev/fuse open instead of mount
Miklos Szeredi [Wed, 11 Mar 2026 20:02:41 +0000 (21:02 +0100)] 
fuse: create fuse_dev on /dev/fuse open instead of mount

Allocate struct fuse_dev when opening the device.  This means that unlike
before, ->private_data is always set to a valid pointer.

The use of USE_DEV_SYNC_INIT magic pointer for the private_data is now
replaced with a simple bool sync_init member.

If sync INIT is not set, I/O on the device returns an error before mount.
Keep this behavior by checking for the ->fc member.  If fud->fc is set, the
mount has succeeded.  Previously this was tested with
READ_ONCE(file->private_data) and smp_mb() to try and provide the
necessary semantics.  Switch this to smp_store_release() and
smp_load_acquire().
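The publish/observe ordering maps onto C11 atomics, with `memory_order_release`/`memory_order_acquire` standing in for `smp_store_release()`/`smp_load_acquire()`. This is a hedged user-space sketch; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_conn { int initialized; };

struct model_dev {
	_Atomic(struct model_conn *) fc;
};

/* Writer (mount): fully set up fc, then publish it with release order. */
static void model_publish_fc(struct model_dev *dev, struct model_conn *fc)
{
	fc->initialized = 1;
	atomic_store_explicit(&dev->fc, fc, memory_order_release);
}

/*
 * Reader (I/O path): acquire-load fc. NULL means the mount has not
 * completed, so the caller returns an error; a non-NULL result
 * guarantees that writes made before publication are visible.
 */
static struct model_conn *model_get_fc(struct model_dev *dev)
{
	return atomic_load_explicit(&dev->fc, memory_order_acquire);
}
```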

Setting fud->fc is protected by fuse_mutex, this is unchanged.

This will be needed later, so that the /dev/fuse open file reference is not
held during FSCONFIG_CMD_CREATE.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
3 weeks ago fuse: check connection state on notification
Miklos Szeredi [Thu, 26 Mar 2026 10:45:44 +0000 (11:45 +0100)] 
fuse: check connection state on notification

Check if the connection is fully initialized and connected before trying to
process a notification from the fuse server.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
3 weeks ago fuse: fuse_dev_ioctl_clone() should wait for device file to be initialized
Miklos Szeredi [Thu, 2 Apr 2026 18:19:55 +0000 (20:19 +0200)] 
fuse: fuse_dev_ioctl_clone() should wait for device file to be initialized

Use fuse_get_dev() not __fuse_get_dev() on the old fd, since in the case of
synchronous INIT the caller will want to wait for the device file to be
available for cloning, just like I/O wants to wait instead of returning an
error.

Fixes: dfb84c330794 ("fuse: allow synchronous FUSE_INIT")
Cc: stable@vger.kernel.org # v6.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
3 weeks ago Merge branch 'net-stmmac-tso-fixes-cleanups'
Jakub Kicinski [Thu, 2 Apr 2026 18:28:23 +0000 (11:28 -0700)] 
Merge branch 'net-stmmac-tso-fixes-cleanups'

Russell King says:

====================
net: stmmac: TSO fixes/cleanups

This is a more refined version of the previous patch series fixing
and cleaning up the TSO code.

I'm not sure whether "TSO" or "GSO" should be used to describe this
feature - although it primarily handles TCP, dwmac4 appears to also
be able to handle UDP.

In essence, this series adds a .ndo_features_check() method to handle
whether TSO/GSO can be used for a particular skbuff - checking which
queue the skbuff is destined for and whether that has TBS available
which precludes TSO being enabled on that channel.

I'm also adding a check that the header is smaller than 1024 bytes,
as documented in those sources which have TSO support - this is due
to the hardware buffering the header in "TSO memory" which I guess
is limited to 1KiB. I expect this test never to trigger, but if
the headers ever exceed that size, the hardware will likely fail.
While IPv4 headers are unlikely to be anywhere near this, there is
nothing in the protocol which prevents IPv6 headers up to 64KiB.

As we now have a .ndo_features_check() method, I'm moving the VLAN
insertion for TSO packets into core code by unpublishing the VLAN
insertion features when we use TSO. Another move is for checksumming,
which is required for TSO, but stmmac's requirements for offloading
checksums are more strict - and this seems to be a bug in the TSO
path.

I've changed the hardware initialisation to always enable TSO support
on the channels even if the user requests TSO/GSO to be disabled -
this fixes another issue as pointed out by Jakub in a previous review.

I'm moving the setup of the GSO features, cleaning those up, and
adding a warning if platform glue requests this to be enabled but the
hardware has no support. Hopefully this will never trigger if everyone
got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL
value.

Finally, moving the "TSO supported" message to the new
stmmac_set_gso_features() function to keep all this TSO stuff together.
====================

Link: https://patch.msgid.link/aczHVF04LIGq_lYO@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: move "TSO supported" message to stmmac_set_gso_features()
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:20 +0000 (08:22 +0100)] 
net: stmmac: move "TSO supported" message to stmmac_set_gso_features()

Move the "TSO supported" message to stmmac_set_gso_features() so that
we group all probe-time TSO stuff in one place.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu8-0000000Eau5-3Zne@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: check txpbl for TSO
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:15 +0000 (08:22 +0100)] 
net: stmmac: check txpbl for TSO

Documentation states that TxPBL must be >= 4 to allow TSO support, but
the driver doesn't check this. TxPBL comes from the platform glue code
or DT. Add a check with a warning if platform glue code attempts to
enable TSO support with TxPBL too low.
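A sketch of the probe-time check, assuming it also refuses TSO when TxPBL is too low (the text above only promises a warning) and folding in the "requested but unsupported" warning from the same series. The function name and messages are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimum TxPBL for TSO, per the documentation cited above. */
#define MODEL_TSO_MIN_TXPBL 4

/* Hypothetical helper: decide whether TSO can be enabled at probe time. */
static bool model_tso_usable(bool hw_has_tso, bool plat_wants_tso, int txpbl)
{
	if (!plat_wants_tso)
		return false;
	if (!hw_has_tso) {
		fprintf(stderr, "TSO requested, but not supported by hardware\n");
		return false;
	}
	if (txpbl < MODEL_TSO_MIN_TXPBL) {
		fprintf(stderr, "TSO requested, but TxPBL %d < %d\n",
			txpbl, MODEL_TSO_MIN_TXPBL);
		return false;
	}
	return true;
}
```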

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu3-0000000Eatz-39ts@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: add warning when TSO is requested but unsupported
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:10 +0000 (08:22 +0100)] 
net: stmmac: add warning when TSO is requested but unsupported

Add a warning message if TSO is requested by the platform glue code but
the core wasn't configured for TSO.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pty-0000000Eatt-2TjZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: make stmmac_set_gso_features() more readable
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:05 +0000 (08:22 +0100)] 
net: stmmac: make stmmac_set_gso_features() more readable

Make stmmac_set_gso_features() more readable by adding some whitespace
and getting rid of the indentation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptt-0000000Eatn-1ziK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: split out gso features setup
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:00 +0000 (08:22 +0100)] 
net: stmmac: split out gso features setup

Move the GSO features setup into a separate function, co-located with
other GSO/TSO support.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pto-0000000Eath-1VDH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: simplify GSO/TSO test in stmmac_xmit()
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:55 +0000 (08:21 +0100)] 
net: stmmac: simplify GSO/TSO test in stmmac_xmit()

The test in stmmac_xmit() to see whether we should pass the skbuff to
stmmac_tso_xmit() is more complex than it needs to be. This test can
be simplified by storing the mask of GSO types that we will pass, and
setting it according to the enabled features.

Note that "tso" has been a misnomer since commit b776620651a1 ("net:
stmmac: Implement UDP Segmentation Offload"). Also note that that
commit controls both TCP and UDP segmentation via the TSO feature. We
preserve this behaviour in this commit.

Also, that commit unconditionally accessed skb_shinfo(skb)->gso_type
for all frames, even when skb_is_gso() was false. This access is now
eliminated.
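The simplified test can be modelled as a precomputed mask that is only consulted for GSO frames. The flag values, structures and helpers below are hypothetical and do not match the kernel's `SKB_GSO_*` definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the kernel's SKB_GSO_* constants differ. */
#define MODEL_GSO_TCPV4  (1u << 0)
#define MODEL_GSO_TCPV6  (1u << 1)
#define MODEL_GSO_UDP_L4 (1u << 2)

struct model_priv {
	unsigned int gso_enabled_types;	/* recomputed when features change */
};

struct model_skb {
	bool is_gso;
	unsigned int gso_type;	/* only meaningful when is_gso is true */
};

/* Recompute the mask whenever the TSO feature is toggled. */
static void model_set_tso(struct model_priv *priv, bool tso_on, bool hw_has_uso)
{
	priv->gso_enabled_types = 0;
	if (tso_on) {
		priv->gso_enabled_types = MODEL_GSO_TCPV4 | MODEL_GSO_TCPV6;
		/* One feature controls both TCP and UDP segmentation. */
		if (hw_has_uso)
			priv->gso_enabled_types |= MODEL_GSO_UDP_L4;
	}
}

/* Xmit-time test: gso_type is read only for GSO frames. */
static bool model_use_tso_xmit(const struct model_priv *priv,
			       const struct model_skb *skb)
{
	return skb->is_gso && (skb->gso_type & priv->gso_enabled_types);
}
```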

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptj-0000000Eatb-11zK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: move check for hardware checksum supported
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:50 +0000 (08:21 +0100)] 
net: stmmac: move check for hardware checksum supported

Add a check in .ndo_features_check() to indicate whether hardware
checksum can be performed on the skbuff. Where hardware checksum is
not supported (either because the channel does not support Tx COE,
or because the skb isn't suitable, since stmmac uses a tighter test
than can_checksum_protocol()), we also need to disable TSO, which
will be done by harmonize_features() in net/core/dev.c.

This fixes a bug where a channel which has COE disabled may still
receive TSO skbuffs.
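A simplified model of the split of responsibilities: the driver hook only withdraws checksum offload, and a core step in the spirit of `harmonize_features()` then drops TSO because it depends on checksum offload. The feature bits and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits standing in for NETIF_F_*. */
#define MODEL_F_HW_CSUM (1u << 0)
#define MODEL_F_TSO     (1u << 1)

/* Driver hook: drop checksum offload for frames the channel can't handle. */
static unsigned int model_features_check(unsigned int features,
					 bool channel_has_tx_coe,
					 bool skb_csum_ok)
{
	if (!channel_has_tx_coe || !skb_csum_ok)
		features &= ~MODEL_F_HW_CSUM;
	return features;
}

/* Core step (cf. harmonize_features()): no checksum offload, no TSO. */
static unsigned int model_harmonize(unsigned int features)
{
	if (!(features & MODEL_F_HW_CSUM))
		features &= ~MODEL_F_TSO;
	return features;
}
```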

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pte-0000000EatU-0ILt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: move TSO VLAN tag insertion to core code
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:44 +0000 (08:21 +0100)] 
net: stmmac: move TSO VLAN tag insertion to core code

stmmac_tso_xmit() checks whether the skbuff is trying to offload
vlan tag insertion to hardware, which from the comment in the code
appears to be buggy when the TSO feature is used.

Rather than stmmac_tso_xmit() inserting the VLAN tag, handle this
in stmmac_features_check() which will then use core net code to
handle this. See net/core/dev.c::validate_xmit_skb()

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptY-0000000EatO-42Qv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: add GSO MSS checks
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:39 +0000 (08:21 +0100)] 
net: stmmac: add GSO MSS checks

Add GSO MSS checks to stmmac_features_check().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptT-0000000EatI-3feh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: add TSO check for header length
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:34 +0000 (08:21 +0100)] 
net: stmmac: add TSO check for header length

According to the STM32MP151 documentation which covers dwmac v4.2, the
hardware TSO feature can handle header lengths up to a maximum of 1023
bytes.

Add a .ndo_features_check() method implementation to check the header
length meets these requirements, otherwise fall back to software GSO.
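A sketch of the header-length check, assuming `hdr_len` already holds the combined protocol header size (as a helper like `stmmac_tso_header_size()` would compute). The feature bit and function name are hypothetical; clearing the bit makes the core fall back to software GSO.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_F_GSO_MASK  (1u << 1)	/* hypothetical GSO feature bit(s) */
#define MODEL_TSO_MAX_HDR 1023u		/* max header length per the manual */

/*
 * features_check-style hook: if the protocol headers of a GSO frame
 * exceed what the hardware can buffer, clear the GSO bits so the core
 * segments the frame in software instead.
 */
static unsigned int model_check_hdr_len(unsigned int features, bool is_gso,
					unsigned int hdr_len)
{
	if (is_gso && hdr_len > MODEL_TSO_MAX_HDR)
		features &= ~MODEL_F_GSO_MASK;
	return features;
}
```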

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptO-0000000EatC-39il@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: add stmmac_tso_header_size()
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:29 +0000 (08:21 +0100)] 
net: stmmac: add stmmac_tso_header_size()

We will need to compute the size of the protocol headers in two places,
so move this into a separate function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptJ-0000000Eat5-2ZlA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: fix TSO support when some channels have TBS available
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:24 +0000 (08:21 +0100)] 
net: stmmac: fix TSO support when some channels have TBS available

According to the STM32MP25xx manual, which is dwmac v5.3, TBS (time
based scheduling) is not permitted for channels which have hardware
TSO enabled. Intel's commit 5e6038b88a57 ("net: stmmac: fix TSO and
TBS feature enabling during driver open") concurs with this, but it
is incomplete.

This commit avoids enabling TSO support on the channels which have
TBS available, which, as far as the hardware is concerned, means we
do not set the TSE bit in the DMA channel's transmit control register.

However, the net device's features apply to all queues (channels), which
means these channels may still be handed TSO skbs to transmit, and the
driver will pass them to stmmac_tso_xmit(). This will generate the
descriptors for TSO, even though the channel has the TSE bit clear.

Fix this by checking whether the queue (channel) has TBS available,
and if it does, fall back to software GSO support.
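A per-queue version of the check, assuming a simple per-channel `tbs_avail` array: GSO frames destined for a TBS-capable channel have their GSO features cleared, so the core segments them in software. The names and the queue limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_F_GSO_MASK (1u << 1)	/* hypothetical GSO feature bit(s) */
#define MODEL_MAX_QUEUES 8

struct model_priv {
	bool tbs_avail[MODEL_MAX_QUEUES];	/* per-channel TBS availability */
};

/*
 * Per-skb check: channels with TBS available never have TSE set, so a
 * GSO frame aimed at one of them must be segmented in software.
 */
static unsigned int model_check_queue(const struct model_priv *priv,
				      unsigned int features, unsigned int queue)
{
	if (queue < MODEL_MAX_QUEUES && priv->tbs_avail[queue])
		features &= ~MODEL_F_GSO_MASK;
	return features;
}
```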

Fixes: 5e6038b88a57 ("net: stmmac: fix TSO and TBS feature enabling during driver open")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptE-0000000Easz-28tv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: fix .ndo_fix_features()
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:19 +0000 (08:21 +0100)] 
net: stmmac: fix .ndo_fix_features()

netdev features documentation requires that .ndo_fix_features() is
stateless: it shouldn't modify driver state. Yet, stmmac_fix_features()
does exactly that, changing whether GSO frames are processed by the
driver.

Move this code to stmmac_set_features() instead, which is the correct
place for it. We don't need to check whether TSO is supported; this
is already handled via the setup of netdev->hw_features, and we are
guaranteed that if netdev->hw_features indicates that a feature is
not supported, .ndo_set_features() won't be called with it set.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pt9-0000000East-1YAO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: stmmac: fix channel TSO enable on resume
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:14 +0000 (08:21 +0100)] 
net: stmmac: fix channel TSO enable on resume

Rather than configuring the channels depending on whether GSO/TSO is
currently enabled by the user, always enable if the hardware has TSO
support and the platform wants TSO to be enabled.

This avoids the channel TSO enable bit being disabled after a resume
when the user has disabled TSO features, which would cause problems
when the user re-enables TSO.

This bug goes back to commit f748be531d70 ("stmmac: support new GMAC4").

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pt4-0000000Easn-14WL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago ntfs3: fix memory leak in indx_create_allocate()
Deepanshu Kartikey [Mon, 23 Mar 2026 05:21:48 +0000 (10:51 +0530)] 
ntfs3: fix memory leak in indx_create_allocate()

When indx_create_allocate() fails after
attr_allocate_clusters() succeeds, run_deallocate()
frees the disk clusters but never frees the memory
allocated by run_add_entry() via kvmalloc() for the
runs_tree structure.

Fix this by adding run_close() at the out: label to
free the run.runs memory on all error paths. The
success path is unaffected as it returns 0 directly
without going through out:, transferring ownership
of the run memory to indx->alloc_run via memcpy().
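The ownership rule can be sketched with the usual goto-cleanup pattern. `model_run` and the helpers are hypothetical simplifications of the ntfs3 runs tree (disk-cluster deallocation is not modelled); the point is that the error path frees the local run, while the success path hands the memory to the caller and never reaches `out:`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct runs_tree. */
struct model_run {
	int *runs;
	size_t count;
};

/* Counterpart of run_close(): release the run array. */
static void model_run_close(struct model_run *run)
{
	free(run->runs);
	run->runs = NULL;
	run->count = 0;
}

static int model_create_allocate(struct model_run *owner, bool fail_later)
{
	struct model_run run = { NULL, 0 };
	int err;

	/* Memory that run_add_entry() would kvmalloc() in the real code. */
	run.runs = calloc(4, sizeof(*run.runs));
	if (!run.runs)
		return -ENOMEM;
	run.count = 4;

	if (fail_later) {
		err = -EIO;	/* a later step fails */
		goto out;
	}

	/* Success: ownership moves to the caller; out: is not reached. */
	memcpy(owner, &run, sizeof(run));
	return 0;

out:
	model_run_close(&run);	/* the fix: free run.runs on every error path */
	return err;
}
```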

Reported-by: syzbot+7adcddaeeb860e5d3f2f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7adcddaeeb860e5d3f2f
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
3 weeks ago ata: libata-eh: Do not retry reset if the device is gone
Igor Pylypiv [Thu, 2 Apr 2026 16:07:05 +0000 (09:07 -0700)] 
ata: libata-eh: Do not retry reset if the device is gone

If a device is hot-unplugged or otherwise disappears during error handling,
ata_eh_reset() may fail with -ENODEV. Currently, the error handler will
continue to retry the reset operation up to max_tries times.

Prevent unnecessary reset retries by exiting the loop early when
ata_do_reset() returns -ENODEV.
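A sketch of the retry loop with the early exit, assuming a reset callback that returns 0 or a negative errno. The function names and signature are hypothetical; the actual loop in ata_eh_reset() is considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reset callback: returns 0 on success or a negative errno. */
typedef int (*model_reset_fn)(void *ctx);

/*
 * Retry up to max_tries times, but give up at once on -ENODEV: the
 * device is gone, so further attempts cannot succeed.
 */
static int model_eh_reset(model_reset_fn reset, void *ctx, int max_tries,
			  int *attempts)
{
	int rc = -EIO;

	*attempts = 0;
	for (int i = 0; i < max_tries; i++) {
		(*attempts)++;
		rc = reset(ctx);
		if (rc == 0 || rc == -ENODEV)
			break;
	}
	return rc;
}

/* Example callbacks: a vanished device and one that stays busy. */
static int model_reset_gone(void *ctx) { (void)ctx; return -ENODEV; }
static int model_reset_busy(void *ctx) { (void)ctx; return -EBUSY; }
```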

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
3 weeks ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 2 Apr 2026 17:57:09 +0000 (10:57 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.0-rc7).

Conflicts:

net/vmw_vsock/af_vsock.c
  b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()")
  0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
  ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic")
  57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change")

drivers/net/wireless/intel/iwlwifi/mld/mac80211.c
  4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections")
  687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data")

drivers/net/wireless/intel/iwlwifi/mld/scan.h
  b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling")
  ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing")

drivers/net/wireless/intel/iwlwifi/mvm/fw.c
  078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v2")
  323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago PCI: dwc: Fix type mismatch for kstrtou32_from_user() return value
Hans Zhang [Wed, 1 Apr 2026 02:30:48 +0000 (10:30 +0800)] 
PCI: dwc: Fix type mismatch for kstrtou32_from_user() return value

kstrtou32_from_user() returns int, but the return value was stored in
a u32 variable 'val', risking sign loss. Use a dedicated int variable
to correctly handle the return code.
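A user-space sketch of the corrected pattern: the return code lives in an `int`, the parsed value in a `uint32_t`. Had the return code been stored in the u32, a negative errno such as -EINVAL would wrap to a large positive value, and any `< 0` test on it would never be true. `model_kstrtou32()` is a hypothetical, simplified stand-in for kstrtou32_from_user() built on `strtoul()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal string into *res; return 0 or a negative errno. */
static int model_kstrtou32(const char *s, uint32_t *res)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (end == s || *end != '\0')
		return -EINVAL;
	if (errno == ERANGE || v > UINT32_MAX)
		return -ERANGE;
	*res = (uint32_t)v;
	return 0;
}

/* Fixed pattern: keep the return code in an int, the value in a u32. */
static int model_write_handler(const char *buf, uint32_t *out)
{
	uint32_t val;
	int err = model_kstrtou32(buf, &val);

	if (err)
		return err;	/* the negative errno survives intact */
	*out = val;
	return 0;
}
```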

Fixes: 4fbfa17f9a07 ("PCI: dwc: Add debugfs based Silicon Debug support for DWC")
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260401023048.4182452-1-18255117159@163.com
3 weeks ago Merge tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 2 Apr 2026 17:31:30 +0000 (10:31 -0700)] 
Merge tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix for a potential extent tree corruption due to an
  unexpected error value.

  When the search for an extent item failed, it under some circumstances
  was reported as a success to the caller"

* tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix incorrect return value after changing leaf in lookup_extent_data_ref()

3 weeks ago tracing: Allow backup to save persistent ring buffer before it starts
Steven Rostedt [Tue, 31 Mar 2026 20:39:24 +0000 (16:39 -0400)] 
tracing: Allow backup to save persistent ring buffer before it starts

When the persistent ring buffer was first introduced, it did not make
sense to start tracing for it on the kernel command line. That's because
if there was a crash, the start of events would invalidate the events from
the previous boot that had the crash.

But now that there's a "backup" instance that can take a snapshot of the
persistent ring buffer when boot starts, it is possible to have the
persistent ring buffer start events at boot up and not lose the old events.

Update the code so that the boot events are started only after all
boot-time instances are created. This allows the backup instance to copy
the persistent ring buffer from the previous boot, and allows the
persistent ring buffer to start tracing new events for the current boot.

  reserve_mem=100M:12M:trace trace_instance=boot_mapped^@trace,sched trace_instance=backup=boot_mapped

The above will create a boot_mapped persistent ring buffer and enable the
scheduler events. If there's a crash, a "backup" instance will be created
holding the events of the persistent ring buffer from the previous boot,
while the persistent ring buffer will once again start tracing scheduler
events of the current boot.

Now the user doesn't have to remember to start the persistent ring buffer.
It will always have the events started at each boot.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260331163924.6ccb3896@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
3 weeks ago tracing/Documentation: Add a section about backup instance
Masami Hiramatsu (Google) [Wed, 1 Apr 2026 06:38:05 +0000 (15:38 +0900)] 
tracing/Documentation: Add a section about backup instance

Add a section about backup instance to the debugging.rst.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502548479.1311542.7062269603547001007.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
3 weeks ago thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp
Thorsten Blum [Thu, 2 Apr 2026 16:56:18 +0000 (18:56 +0200)] 
thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp

Use max() to simplify brcmstb_get_temp() and improve its readability.
Since avs_tmon_code_to_temp() returns an int, change the data type of
the local variable 't' from long to int.  No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260402165616.895305-3-thorsten.blum@linux.dev
Masami Hiramatsu (Google) [Wed, 1 Apr 2026 06:37:57 +0000 (15:37 +0900)] 
tracing: Remove the backup instance automatically after read

Since the backup instance is readonly, no data is left on it once all
data has been read via the pipe. Thus it can be removed safely after all
files are closed.  This also removes it if the user resets the ring buffer
manually via the 'trace' file.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502547711.1311542.12572973358010839400.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Wed, 1 Apr 2026 06:37:49 +0000 (15:37 +0900)] 
tracing: Make the backup instance non-reusable

Since there is no reason to reuse the backup instance, make it readonly
(but erasable).  Note that only backup instances are made readonly, because
other trace instances would stay empty unless they are writable.  Only backup
instances have entries copied from the original.

With this change, most of the trace control files are removed from the
backup instance, including eventfs enable/filter etc.

 # find /sys/kernel/tracing/instances/backup/events/ | wc -l
 4093
 # find /sys/kernel/tracing/instances/boot_map/events/ | wc -l
 9573

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502546939.1311542.1826814401724828930.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Vincent Donnefort [Wed, 1 Apr 2026 05:36:59 +0000 (06:36 +0100)] 
ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers

On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a
ring_buffer_per_cpu will be allocated and its corresponding bit toggled
in the cpumask. Many readers check this cpumask to know if they can
safely read the ring_buffer_per_cpu but they are doing so without memory
ordering and may observe the cpumask bit set while still reading a NULL
buffer pointer.

Enforce the memory read ordering by sending an IPI to all online CPUs.
The hotplug path is a slow-path anyway and it saves us from adding read
barriers in numerous call sites.
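To show the ordering a reader depends on, here is a minimal userspace sketch with C11 atomics. Note that it swaps in release/acquire ordering purely for illustration; the commit itself avoids per-reader barriers and forces the ordering with an IPI on the hotplug slow path instead. All names (struct toy_trace_buffer, toy_add_cpu(), toy_get_cpu_buffer()) are hypothetical stand-ins, not ring-buffer code.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS 8

/* Hypothetical stand-in for ring_buffer_per_cpu. */
struct toy_cpu_buffer {
	int cpu;
};

/* Hypothetical stand-in for trace_buffer: a cpumask plus per-CPU buffers. */
struct toy_trace_buffer {
	atomic_uint cpumask;
	struct toy_cpu_buffer *_Atomic buffers[TOY_NR_CPUS];
};

static void toy_init(struct toy_trace_buffer *tb)
{
	atomic_init(&tb->cpumask, 0);
	for (int i = 0; i < TOY_NR_CPUS; i++)
		atomic_init(&tb->buffers[i], NULL);
}

/* Writer (hotplug path): publish the buffer first, then set the cpumask bit. */
static int toy_add_cpu(struct toy_trace_buffer *tb, int cpu)
{
	struct toy_cpu_buffer *b = malloc(sizeof(*b));

	if (!b)
		return -1;
	b->cpu = cpu;
	atomic_store_explicit(&tb->buffers[cpu], b, memory_order_relaxed);
	/* Release: the buffer store above becomes visible no later than the bit. */
	atomic_fetch_or_explicit(&tb->cpumask, 1u << cpu, memory_order_release);
	return 0;
}

/* Reader: test the bit, then load the buffer pointer. */
static struct toy_cpu_buffer *toy_get_cpu_buffer(struct toy_trace_buffer *tb, int cpu)
{
	/* Acquire pairs with the release above; with plain loads a reader
	 * could see the bit set and still load a NULL buffer pointer. */
	unsigned int mask = atomic_load_explicit(&tb->cpumask, memory_order_acquire);

	if (!(mask & (1u << cpu)))
		return NULL;
	return atomic_load_explicit(&tb->buffers[cpu], memory_order_relaxed);
}
```

The IPI approach trades this per-reader acquire for a one-off cost on the writer side, which is why it suits a path that only runs on CPU hotplug.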

Link: https://patch.msgid.link/20260401053659.3458961-1-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Daniel Borkmann [Tue, 31 Mar 2026 22:20:20 +0000 (00:20 +0200)] 
selftests/bpf: Add more precision tracking tests for atomics

Add verifier precision tracking tests for BPF atomic fetch operations.
Validate that backtrack_insn correctly propagates precision from the
fetch dst_reg to the stack slot for {fetch_add,xchg,cmpxchg} atomics.
For the first two, src_reg receives the old memory value; for the last
one, r0 does. The fetched register is used for pointer arithmetic to trigger
backtracking. Also add coverage for fetch_{or,and,xor} flavors, which
exercise the bitwise atomic fetch variants going through the same
insn->imm & BPF_FETCH check but with different imm values.

Add dual-precision regression tests for fetch_add and cmpxchg where
both the fetched value and a reread of the same stack slot are tracked
for precision. After the atomic operation, the stack slot is STACK_MISC,
so the ldx does not set INSN_F_STACK_ACCESS. These tests verify that
stack precision propagates solely through the atomic fetch's load side.

Add map-based tests for fetch_add and cmpxchg which validate that non-
stack atomic fetch completes precision tracking without falling back
to mark_all_scalars_precise. Lastly, add 32-bit variants for {fetch_add,
cmpxchg} on map values to cover the second valid atomic operand size.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_precision
  [...]
  + /etc/rcS.d/S50-startup
  ./test_progs -t verifier_precision
  [    1.697105] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.700220] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.777043] tsc: Refined TSC clocksource calibration: 3407.986 MHz
  [    1.777619] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc6d7268, max_idle_ns: 440795260133 ns
  [    1.778658] clocksource: Switched to clocksource tsc
  #633/1   verifier_precision/bpf_neg:OK
  #633/2   verifier_precision/bpf_end_to_le:OK
  #633/3   verifier_precision/bpf_end_to_be:OK
  #633/4   verifier_precision/bpf_end_bswap:OK
  #633/5   verifier_precision/bpf_load_acquire:OK
  #633/6   verifier_precision/bpf_store_release:OK
  #633/7   verifier_precision/state_loop_first_last_equal:OK
  #633/8   verifier_precision/bpf_cond_op_r10:OK
  #633/9   verifier_precision/bpf_cond_op_not_r10:OK
  #633/10  verifier_precision/bpf_atomic_fetch_add_precision:OK
  #633/11  verifier_precision/bpf_atomic_xchg_precision:OK
  #633/12  verifier_precision/bpf_atomic_fetch_or_precision:OK
  #633/13  verifier_precision/bpf_atomic_fetch_and_precision:OK
  #633/14  verifier_precision/bpf_atomic_fetch_xor_precision:OK
  #633/15  verifier_precision/bpf_atomic_cmpxchg_precision:OK
  #633/16  verifier_precision/bpf_atomic_fetch_add_dual_precision:OK
  #633/17  verifier_precision/bpf_atomic_cmpxchg_dual_precision:OK
  #633/18  verifier_precision/bpf_atomic_fetch_add_map_precision:OK
  #633/19  verifier_precision/bpf_atomic_cmpxchg_map_precision:OK
  #633/20  verifier_precision/bpf_atomic_fetch_add_32bit_precision:OK
  #633/21  verifier_precision/bpf_atomic_cmpxchg_32bit_precision:OK
  #633/22  verifier_precision/bpf_neg_2:OK
  #633/23  verifier_precision/bpf_neg_3:OK
  #633/24  verifier_precision/bpf_neg_4:OK
  #633/25  verifier_precision/bpf_neg_5:OK
  #633     verifier_precision:OK
  Summary: 1/25 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260331222020.401848-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 31 Mar 2026 22:20:19 +0000 (00:20 +0200)] 
bpf: Fix incorrect pruning due to atomic fetch precision tracking

When backtrack_insn encounters a BPF_STX instruction with BPF_ATOMIC
and BPF_FETCH, the src register (or r0 for BPF_CMPXCHG) also acts as
a destination, thus receiving the old value from the memory location.
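As a rough userspace model (not the interpreter's or JIT's code), C11 atomics have the same shape: a fetch variant hands back the old memory value, so the register that supplied the operand is overwritten with it, and for compare-exchange the old value lands in r0. The regs[] array and the model_* helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register file: regs[0] models r0, regs[2] models r2, ... */
static uint64_t regs[11];

/* Models "rSrc = atomic64_fetch_add(*mem, rSrc)": src is input AND output. */
static void model_fetch_add(_Atomic uint64_t *mem, int src)
{
	regs[src] = atomic_fetch_add(mem, regs[src]);
}

/* Models BPF_CMPXCHG: compare *mem with r0, store rSrc on a match,
 * and always load the old value of *mem into r0. */
static void model_cmpxchg(_Atomic uint64_t *mem, int src)
{
	uint64_t expected = regs[0];

	/* On success 'expected' already equals the old value; on failure
	 * it is overwritten with the current (old) value of *mem. */
	atomic_compare_exchange_strong(mem, &expected, regs[src]);
	regs[0] = expected;
}
```

Replaying insns 1-3 of the log below (fp-8 = 8, r2 = 0, fetch_add) leaves r2 = 8, matching the R2=8 annotation: r2's value now comes from the stack slot, which is exactly the dependency the backtracker has to follow.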

The current backtracking logic does not account for this. It treats
atomic fetch operations the same as regular stores, where the src
register is only an input. This causes backtrack_insn to fail to
propagate precision to the stack location, which is then not marked
as precise!

Later, the verifier's path pruning can incorrectly consider two states
equivalent when they differ in terms of stack state. Meaning, two
branches can be treated as equivalent and thus get pruned when they
should not be seen as such.

Fix it as follows: Extend the BPF_LDX handling in backtrack_insn to
also cover atomic fetch operations via is_atomic_fetch_insn() helper.
When the fetch dst register is being tracked for precision, clear it,
and propagate precision over to the stack slot. For non-stack memory,
the precision walk stops at the atomic instruction, same as regular
BPF_LDX. This covers all fetch variants.
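The fix can be illustrated with a deliberately tiny model of backward precision tracking. This is a hypothetical sketch, not the verifier's backtrack_insn(): it tracks only a register bitmask and a stack-slot bitmask and knows just the four instruction shapes used in the Before/After program. With handle_fetch set, an atomic fetch whose destination register is being tracked hands precision over to the stack slot, mirroring the BPF_LDX-style handling described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Instruction shapes needed to replay the example (hypothetical). */
enum toy_op { TOY_MOV_IMM, TOY_MOV_REG, TOY_STX_STACK, TOY_FETCH_ADD_STACK };

struct toy_insn {
	enum toy_op op;
	int dst;  /* destination register for TOY_MOV_* */
	int src;  /* source register for the other shapes */
	int slot; /* stack slot index for stack accesses */
};

struct toy_prec {
	uint32_t regs;  /* bit n set: rN must be precise */
	uint32_t slots; /* bit n set: stack slot n must be precise */
};

static void toy_backtrack_insn(const struct toy_insn *in, struct toy_prec *p,
			       bool handle_fetch)
{
	switch (in->op) {
	case TOY_MOV_IMM: /* a constant ends the chain for dst */
		p->regs &= ~(1u << in->dst);
		break;
	case TOY_MOV_REG: /* precision moves from dst to src */
		if (p->regs & (1u << in->dst)) {
			p->regs &= ~(1u << in->dst);
			p->regs |= 1u << in->src;
		}
		break;
	case TOY_STX_STACK: /* precision moves from the slot to the stored reg */
		if (p->slots & (1u << in->slot)) {
			p->slots &= ~(1u << in->slot);
			p->regs |= 1u << in->src;
		}
		break;
	case TOY_FETCH_ADD_STACK: /* src received the slot's old value */
		if (handle_fetch && (p->regs & (1u << in->src))) {
			p->regs &= ~(1u << in->src);
			p->slots |= 1u << in->slot;
		}
		break;
	}
}

/* Walk insns [0, last) backwards from 'start'; report whether any stack
 * slot was marked precise along the way, and return the final state. */
static bool toy_mark_precise(const struct toy_insn *prog, int last,
			     struct toy_prec start, bool handle_fetch,
			     struct toy_prec *end)
{
	bool slot_seen = false;

	for (int i = last - 1; i >= 0; i--) {
		toy_backtrack_insn(&prog[i], &start, handle_fetch);
		slot_seen |= start.slots != 0;
	}
	*end = start;
	return slot_seen;
}
```

Starting from r2 being precise at insn 5, handle_fetch=false never marks fp-8 (the chain dies at "r2 = 0", as in the Before log), while handle_fetch=true walks fp-8 back to r1 at insn 0, as in the After log.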

Before:

  0: (b7) r1 = 8                        ; R1=8
  1: (7b) *(u64 *)(r10 -8) = r1         ; R1=8 R10=fp0 fp-8=8
  2: (b7) r2 = 0                        ; R2=0
  3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)          ; R2=8 R10=fp0 fp-8=mmmmmmmm
  4: (bf) r3 = r10                      ; R3=fp0 R10=fp0
  5: (0f) r3 += r2
  mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
  mark_precise: frame0: regs=r2 stack= before 4: (bf) r3 = r10
  mark_precise: frame0: regs=r2 stack= before 3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)
  mark_precise: frame0: regs=r2 stack= before 2: (b7) r2 = 0
  6: R2=8 R3=fp8
  6: (b7) r0 = 0                        ; R0=0
  7: (95) exit

After:

  0: (b7) r1 = 8                        ; R1=8
  1: (7b) *(u64 *)(r10 -8) = r1         ; R1=8 R10=fp0 fp-8=8
  2: (b7) r2 = 0                        ; R2=0
  3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)          ; R2=8 R10=fp0 fp-8=mmmmmmmm
  4: (bf) r3 = r10                      ; R3=fp0 R10=fp0
  5: (0f) r3 += r2
  mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
  mark_precise: frame0: regs=r2 stack= before 4: (bf) r3 = r10
  mark_precise: frame0: regs=r2 stack= before 3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)
  mark_precise: frame0: regs= stack=-8 before 2: (b7) r2 = 0
  mark_precise: frame0: regs= stack=-8 before 1: (7b) *(u64 *)(r10 -8) = r1
  mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 8
  6: R2=8 R3=fp8
  6: (b7) r0 = 0                        ; R0=0
  7: (95) exit

Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Fixes: 5ca419f2864a ("bpf: Add BPF_FETCH field / create atomic_fetch_add instruction")
Reported-by: STAR Labs SG <info@starlabs.sg>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260331222020.401848-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Thu, 2 Apr 2026 16:57:06 +0000 (09:57 -0700)] 
Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "With fixes from wireless, bluetooth and netfilter included we're back
  to each PR carrying 30%+ more fixes than in previous era.

  The good news is that so far none of the "extra" fixes are themselves
  causing real regressions. Not sure how much comfort that is.

  Current release - fix to a fix:

   - netdevsim: fix build if SKB_EXTENSIONS=n

   - eth: stmmac: skip VLAN restore when VLAN hash ops are missing

  Previous releases - regressions:

   - wifi: iwlwifi: mvm: don't send a 6E related command when
     not supported

  Previous releases - always broken:

   - some info leak fixes

   - add missing clearing of skb->cb[] on ICMP paths from tunnels

   - ipv6:
      - flowlabel: defer exclusive option free until RCU teardown
      - avoid overflows in ip6_datagram_send_ctl()

   - mpls: add seqcount to protect platform_labels from OOB access

   - bridge: improve safety of parsing ND options

   - bluetooth: fix leaks, overflows and races in hci_sync

   - netfilter: add more input validation, some to address bugs directly
     some to prevent exploits from cooking up broken configurations

   - wifi:
      - ath: avoid poor performance due to stopping the wrong
        aggregation session
      - virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free

   - eth:
      - fec: fix the PTP periodic output sysfs interface
      - enetc: safely reinitialize TX BD ring when it has unsent frames"

* tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
  ipv6: avoid overflows in ip6_datagram_send_ctl()
  net: hsr: fix VLAN add unwind on slave errors
  net: hsr: serialize seq_blocks merge across nodes
  vsock: initialize child_ns_mode_locked in vsock_net_init()
  selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
  net/sched: cls_flow: fix NULL pointer dereference on shared blocks
  net/sched: cls_fw: fix NULL pointer dereference on shared blocks
  net/x25: Fix overflow when accumulating packets
  net/x25: Fix potential double free of skb
  bnxt_en: Restore default stat ctxs for ULP when resource is available
  bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
  bnxt_en: Refactor some basic ring setup and adjustment logic
  net/mlx5: Fix switchdev mode rollback in case of failure
  net/mlx5: Avoid "No data available" when FW version queries fail
  net/mlx5: lag: Check for LAG device before creating debugfs
  net: macb: properly unregister fixed rate clocks
  net: macb: fix clk handling on PCI glue driver removal
  virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
  net/sched: sch_netem: fix out-of-bounds access in packet corruption
  ...

Linus Torvalds [Thu, 2 Apr 2026 16:53:16 +0000 (09:53 -0700)] 
Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

 - IOMMU-PT related compile breakage in the AMD driver

 - IOTLB flushing behavior when unmapped region is larger than requested
   due to page-sizes

 - Fix IOTLB flush behavior with empty gathers

* tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
  iommupt: Fix short gather if the unmap goes into a large mapping
  iommu: Do not call drivers for empty gathers

Varun R Mallya [Wed, 1 Apr 2026 19:11:25 +0000 (00:41 +0530)] 
bpf: Reject sleepable kprobe_multi programs at attach time

kprobe.multi programs run in atomic/RCU context and cannot sleep.
However, bpf_kprobe_multi_link_attach() did not validate whether the
program being attached had the sleepable flag set, allowing sleepable
helpers such as bpf_copy_from_user() to be invoked from a non-sleepable
context.

This causes a "sleeping function called from invalid context" splat:

  BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:169
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1787, name: sudo
  preempt_count: 1, expected: 0
  RCU nest depth: 2, expected: 0

Fix this by rejecting sleepable programs early in
bpf_kprobe_multi_link_attach(), before any further processing.
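A minimal userspace model of where the check sits, assuming a simplified program descriptor. struct toy_prog and toy_kprobe_multi_attach() are hypothetical stand-ins for bpf_prog and bpf_kprobe_multi_link_attach(), and the -EINVAL return is an assumption; the point is only that the sleepable flag is refused before any other attach work happens.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_prog {
	bool sleepable; /* program was loaded with the sleepable flag */
};

static int toy_attach_work_done; /* counts attaches that got past the checks */

static int toy_kprobe_multi_attach(const struct toy_prog *prog)
{
	/* kprobe.multi handlers run in atomic/RCU context: refuse up front. */
	if (prog->sleepable)
		return -EINVAL;

	/* ... symbol resolution and probe registration would follow here ... */
	toy_attach_work_done++;
	return 0;
}
```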

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260401191126.440683-1-varunrmallya@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Qi Tang [Thu, 2 Apr 2026 09:29:22 +0000 (17:29 +0800)] 
bpf: reject direct access to nullable PTR_TO_BUF pointers

check_mem_access() matches PTR_TO_BUF via base_type() which strips
PTR_MAYBE_NULL, allowing direct dereference without a null check.

Map iterator ctx->key and ctx->value are PTR_TO_BUF | PTR_MAYBE_NULL.
On stop callbacks these are NULL, causing a kernel NULL dereference.

Add a type_may_be_null() guard to the PTR_TO_BUF branch, matching the
existing PTR_TO_BTF_ID pattern.
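The bug is a flag-stripping pitfall: comparing base types alone hides PTR_MAYBE_NULL. The sketch below models it with made-up constants; TOY_PTR_TO_BUF, TOY_PTR_MAYBE_NULL and the toy_* helpers only imitate the shape of the kernel's base_type()/type_may_be_null() and are not their real definitions. The -EACCES value is likewise an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical encoding: low bits hold the base type, high bits hold flags. */
#define TOY_BASE_TYPE_MASK 0xffu
#define TOY_PTR_TO_BUF     0x10u
#define TOY_PTR_MAYBE_NULL 0x100u

static unsigned int toy_base_type(unsigned int t)
{
	return t & TOY_BASE_TYPE_MASK; /* strips PTR_MAYBE_NULL */
}

static bool toy_type_may_be_null(unsigned int t)
{
	return t & TOY_PTR_MAYBE_NULL;
}

/* Returns 0 if a direct load through a register of type t is allowed. */
static int toy_check_mem_access(unsigned int t, bool with_null_guard)
{
	if (toy_base_type(t) == TOY_PTR_TO_BUF) {
		/* Without this guard, PTR_TO_BUF | PTR_MAYBE_NULL slips through. */
		if (with_null_guard && toy_type_may_be_null(t))
			return -EACCES;
		return 0;
	}
	return -EACCES; /* other register types are out of scope here */
}
```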

Fixes: 20b2aff4bc15 ("bpf: Introduce MEM_RDONLY flag")
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260402092923.38357-2-tpluszz77@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Thu, 2 Apr 2026 16:41:21 +0000 (09:41 -0700)] 
Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "People have been so busy hunting and we're still getting more
  changes than wished for, but it doesn't look too scary; almost all
  changes are device-specific small fixes.

  I guess it's rather a casual bump, and no more Easter eggs are left
  for 7.0 (hopefully)...

   - Fixes for the recent regression on ctxfi driver

   - Fix missing INIT_LIST_HEAD() for ASoC card_aux_list

   - Usual HD- and USB-audio, and ASoC AMD quirk updates

   - ASoC fixes for AMD and Intel"

* tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
  ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
  ALSA: hda/realtek: add quirk for Acer Swift SFG14-73
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
  ASoC: Intel: boards: fix unmet dependency on PINCTRL
  ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params
  ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10
  ALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxx
  ASoC: ep93xx: Fix unchecked clk_prepare_enable() and add rollback on failure
  ASoC: soc-core: call missing INIT_LIST_HEAD() for card_aux_list
  ALSA: hda/realtek: Add quirk for Samsung Book2 Pro 360 (NP950QED)
  ASoC: amd: yc: Add DMI entry for HP Laptop 15-fc0xxx
  ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 16X OLED M7601RM
  ALSA: hda/realtek: Add quirk for ASUS ROG Strix SCAR 15
  ALSA: usb-audio: Exclude Scarlett Solo 1st Gen from SKIP_IFACE_SETUP
  ALSA: caiaq: fix stack out-of-bounds read in init_card
  ALSA: ctxfi: Check the error for index mapping
  ALSA: ctxfi: Fix missing SPDIFI1 index handling
  ALSA: hda/realtek: add quirk for HP Victus 15-fb0xxx
  ...

Linus Torvalds [Thu, 2 Apr 2026 16:34:22 +0000 (09:34 -0700)] 
Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay

Pull auxdisplay fixes from Andy Shevchenko:

 - Fix NULL dereference in linedisp_release()

 - Fix ht16k33 DT bindings to avoid warnings

 - Handle errors in I²C transfers in lcd2s driver

* tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
  auxdisplay: line-display: fix NULL dereference in linedisp_release
  auxdisplay: lcd2s: add error handling for i2c transfers
  dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning

Philipp Zabel [Thu, 2 Apr 2026 12:30:10 +0000 (14:30 +0200)] 
Merge tag 'reset-fixes-for-v7.0-2' into reset/next

Reset controller fixes for v7.0, part 2

* Decouple spacemit K3 reset lines that were incorrectly coupled
  together as one, but are in fact separate resets in hardware.
* Fix a double free in the reset_add_gpio_aux_device() error path.
  This has already been fixed on reset/next by commit a9b95ce36de4
  ("reset: gpio: add a devlink between reset-gpio and its consumer").
* Fix the MODULE_AUTHOR string in the rzg2l-usbphy-ctrl driver.

We merge this into reset/next to resolve a conflict between commits
a9b95ce36de4 ("reset: gpio: add a devlink between reset-gpio and its
consumer") and fbffb8c7c7bb ("reset: gpio: fix double free in
reset_add_gpio_aux_device() error path").

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Alexei Starovoitov [Thu, 2 Apr 2026 16:31:42 +0000 (09:31 -0700)] 
Merge branch 'bpf-migrate-bpf_task_work-and-file-dynptr-to-kmalloc_nolock'

Mykyta Yatsenko says:

====================
bpf: Migrate bpf_task_work and file dynptr to kmalloc_nolock

Now that kmalloc can be used from NMI context via kmalloc_nolock(),
migrate BPF internal allocations away from bpf_mem_alloc to use the
standard slab allocator.

Use kfree_rcu() for deferred freeing, which waits for a regular RCU
grace period before the memory is reclaimed. Sleepable BPF programs
hold rcu_read_lock_trace but not regular rcu_read_lock, so patch 1
adds explicit rcu_read_lock/unlock around the pointer-to-refcount
window to prevent kfree_rcu from freeing memory while a sleepable
program is still between reading the pointer and acquiring a
reference.

Patch 1 migrates bpf_task_work_ctx from bpf_mem_alloc/bpf_mem_free to
kmalloc_nolock/kfree_rcu.

Patch 2 migrates bpf_dynptr_file_impl from bpf_mem_alloc/bpf_mem_free
to kmalloc_nolock/kfree.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v2:
- Switch to scoped_guard in patch 1 (Kumar)
- Remove rcu gp wait in patch 2 (Kumar)
- Defer to irq_work when irqs disabled in patch 1
- use bpf_map_kmalloc_nolock() for bpf_task_work
- use kmalloc_nolock() for file dynptr
- Link to v1: https://lore.kernel.org/all/20260325-kmalloc_special-v1-0-269666afb1ea@meta.com/
====================

Link: https://patch.msgid.link/20260330-kmalloc_special-v2-0-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:57 +0000 (15:27 -0700)] 
bpf: Migrate dynptr file to kmalloc_nolock

Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_nolock for
bpf_dynptr_file_impl, continuing the migration away from bpf_mem_alloc
now that kmalloc can be used from NMI context.

freader_cleanup() runs before kfree_nolock() while the dynptr still
holds exclusive access, so plain kfree_nolock() is safe — no concurrent
readers can access the object.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-2-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:56 +0000 (15:27 -0700)] 
bpf: Migrate bpf_task_work to kmalloc_nolock

Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_rcu for
bpf_task_work_ctx.

Replace guard(rcu_tasks_trace)() with guard(rcu)() in
bpf_task_work_irq(). The function only accesses ctx struct members
(not map values), so tasks trace protection is not needed - regular
RCU is sufficient since ctx is freed via kfree_rcu. The guard in
bpf_task_work_callback() remains as tasks trace since it accesses map
values from process context.

Sleepable BPF programs hold rcu_read_lock_trace but not regular
rcu_read_lock. Since kfree_rcu waits for a regular RCU grace period, the
ctx memory can be freed while a sleepable program is still running. Add
scoped_guard(rcu)
around the pointer read and refcount tryget in
bpf_task_work_acquire_ctx to close this race window.
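The "refcount tryget" in that window only succeeds while the object is still live. Below is a minimal C11 sketch of that increment-if-not-zero step. It does not model RCU itself (in the kernel, the surrounding read-side section is what keeps the memory from being reclaimed while this loop runs), and struct toy_ctx, toy_ctx_tryget() and toy_ctx_put() are hypothetical names, not the kernel's refcount_t API.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_ctx {
	atomic_int refcnt; /* 0 means the final reference is gone */
};

/* Take a reference only if the count is still non-zero. */
static bool toy_ctx_tryget(struct toy_ctx *ctx)
{
	int old = atomic_load_explicit(&ctx->refcnt, memory_order_relaxed);

	do {
		if (old == 0)
			return false; /* lost the race with the final put */
	} while (!atomic_compare_exchange_weak_explicit(&ctx->refcnt, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool toy_ctx_put(struct toy_ctx *ctx)
{
	return atomic_fetch_sub_explicit(&ctx->refcnt, 1, memory_order_acq_rel) == 1;
}
```

A plain increment would be wrong here: it could resurrect an object whose count already hit zero and whose free is already queued.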

Since kfree_rcu uses call_rcu internally which is not safe from
NMI context, defer destruction via irq_work when irqs are disabled.

For the lost-cmpxchg path the ctx was never published, so
kfree_nolock is safe.
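The lost-cmpxchg case follows a common publish-once pattern: allocate a candidate, try to install it with a compare-and-swap, and if another CPU won, free the candidate directly because nobody else could have seen it. A userspace sketch with C11 atomics, where malloc()/free() stand in for kmalloc_nolock()/kfree_nolock() and all toy_* names are hypothetical:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_work_ctx {
	int owner; /* which caller allocated this ctx */
};

/* Return the published ctx in *slot, allocating and installing one if needed. */
static struct toy_work_ctx *toy_get_or_publish(struct toy_work_ctx *_Atomic *slot, int me)
{
	struct toy_work_ctx *cur = atomic_load_explicit(slot, memory_order_acquire);
	struct toy_work_ctx *fresh;

	if (cur)
		return cur;

	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->owner = me;

	/* cur is NULL here; on failure it is updated to the winner's pointer. */
	if (atomic_compare_exchange_strong_explicit(slot, &cur, fresh,
						    memory_order_acq_rel,
						    memory_order_acquire))
		return fresh;

	/* Lost the race: fresh was never visible to anyone, so free it at once. */
	free(fresh);
	return cur;
}
```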

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-1-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Gautham R. Shenoy [Thu, 2 Apr 2026 10:26:11 +0000 (15:56 +0530)] 
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer

Mario Limonciello has led amd-pstate maintenance in recent years and
has done excellent work. The amd-pstate driver is in good hands with
him. I am stepping down as co-maintainer as I move on to other things.

Add K Prateek Nayak as a reviewer. He has been actively contributing
to the driver including preferred-core and ITMT improvements, and has
been helping review amd-pstate patches for a while now.

Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://lore.kernel.org/r/20260402102611.16519-1-gautham.shenoy@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
K Prateek Nayak [Mon, 16 Mar 2026 08:18:49 +0000 (08:18 +0000)] 
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()

cpufreq_cpu_get() can sleep on PREEMPT_RT in the presence of concurrent
writers; however, amd-pstate depends on fetching the cpudata via the
policy's driver data, which necessitates grabbing a reference.

Since the schedutil governor can call "cpufreq_driver->adjust_perf()"
during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
fetching the policy object using the cpufreq_cpu_get() helper in the
scheduler fast path leads to "BUG: scheduling while atomic" on
PREEMPT_RT [1].

Pass the cached cpufreq policy object in sg_policy to adjust_perf()
instead of just the CPU. The CPU can be inferred using "policy->cpu".

The lifetime of cpufreq_policy object outlasts that of the governor and
the cpufreq driver (allocated when the CPU is onlined and only reclaimed
when the CPU is offlined / the CPU device is removed) which makes it
safe to be referenced throughout the governor's lifetime.
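A simplified model of the calling convention after this change. The struct layouts, the parameters other than the policy, and all toy_* names are assumptions; only policy->cpu and policy->driver_data mirror fields named in this log. The callback derives both the CPU and the driver's per-CPU data from the policy it is handed, so no reference-taking lookup is needed in the scheduler fast path.

```c
#include <stddef.h>

struct toy_policy {
	unsigned int cpu;
	void *driver_data; /* driver-private per-policy data */
};

struct toy_cpudata {
	unsigned int last_cpu;
	unsigned long last_min_perf;
	unsigned long last_target_perf;
};

/* After the change: the governor passes its cached policy, not a CPU number. */
typedef void (*toy_adjust_perf_fn)(struct toy_policy *policy,
				   unsigned long min_perf,
				   unsigned long target_perf,
				   unsigned long capacity);

static void toy_driver_adjust_perf(struct toy_policy *policy,
				   unsigned long min_perf,
				   unsigned long target_perf,
				   unsigned long capacity)
{
	/* No cpufreq_cpu_get()-style lookup: the policy outlives the governor. */
	struct toy_cpudata *cd = policy->driver_data;

	(void)capacity; /* unused in this sketch */
	cd->last_cpu = policy->cpu;
	cd->last_min_perf = min_perf;
	cd->last_target_perf = target_perf;
}
```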

Closes: https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]

Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Gary Guo <gary@garyguo.net> # Rust
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260316081849.19368-3-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
K Prateek Nayak [Mon, 16 Mar 2026 08:18:48 +0000 (08:18 +0000)] 
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()

All callers of amd_pstate_update() already have a reference to the
cpufreq_policy object.

Pass the entire policy object and grab the cpudata using
"policy->driver_data" instead of passing the cpudata and unnecessarily
grabbing another read-side reference to the cpufreq policy object when
it is already available in the caller.

No functional changes intended.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20260316081849.19368-2-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:11 +0000 (15:38 -0500)] 
cpufreq/amd-pstate-ut: Add a unit test for raw EPP

Ensure that all supported raw EPP values work properly.

Export the driver helpers used by the test module so the test can drive
raw EPP writes and temporarily disable dynamic EPP while it runs.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Alexei Starovoitov [Thu, 2 Apr 2026 16:29:49 +0000 (09:29 -0700)] 
Merge branch 'bpf-fix-abuse-of-kprobe_write_ctx-via-freplace'

Leon Hwang says:

====================
bpf: Fix abuse of kprobe_write_ctx via freplace

The potential issue of kprobe_write_ctx+freplace was mentioned in
"bpf: Disallow !kprobe_write_ctx progs tail-calling kprobe_write_ctx progs" [1].

It is a real issue: the test in patch #2 verifies that kprobe_write_ctx=false
kprobe progs can be abused to modify struct pt_regs via kprobe_write_ctx=true
freplace progs.

When struct pt_regs is modified, bpf_prog_test_run_opts() gets -EFAULT instead
of 0.

test_freplace_kprobe_write_ctx:FAIL:bpf_prog_test_run_opts unexpected error: -14 (errno 14)

We will disallow attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.

Links:
[1] https://lore.kernel.org/bpf/CAP01T74w4KVMn9bEwpQXrk+bqcUxzb6VW1SQ_QvNy0A4EY-9Jg@mail.gmail.com/

Changes:
v2 -> v3:
* Add comment to the rejection of kprobe_write_ctx (per Jiri).
* Use libbpf_get_error() instead of errno in test (per Jiri).
* Collect Acked-by tags from Jiri and Song, thanks.
v2: https://lore.kernel.org/bpf/20260326141718.17731-1-leon.hwang@linux.dev/

v1 -> v2:
* Drop patch #1 in v1, as it wasn't an issue (per Toke).
* Check kprobe_write_ctx value at attach time instead of at load time, to
  prevent attaching kprobe_write_ctx=true freplace progs on
  kprobe_write_ctx=false kprobe progs (per Gemini/sashiko).
* Move kprobe_write_ctx test code to attach_probe.c and kprobe_write_ctx.c.
v1: https://lore.kernel.org/bpf/20260324150444.68166-1-leon.hwang@linux.dev/
====================

Link: https://patch.msgid.link/20260331145353.87606-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Leon Hwang [Tue, 31 Mar 2026 14:53:53 +0000 (22:53 +0800)] 
selftests/bpf: Add test to verify the fix of kprobe_write_ctx abuse

Add a test to verify the issue: kprobe_write_ctx can be abused to modify
struct pt_regs of kernel functions via kprobe_write_ctx=true freplace
progs.

Without the fix, the issue is verified:

A kprobe_write_ctx=true freplace prog is allowed to attach to a
kprobe_write_ctx=false kprobe prog. Then, the first arg of
bpf_fentry_test1 is set to 0, and bpf_prog_test_run_opts() gets
-EFAULT instead of 0.

With the fix, the attach is rejected.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Leon Hwang [Tue, 31 Mar 2026 14:53:52 +0000 (22:53 +0800)] 
bpf: Fix abuse of kprobe_write_ctx via freplace

uprobe programs are allowed to modify struct pt_regs.

Since the actual program type of uprobe is KPROBE, it can be abused to
modify struct pt_regs via kprobe+freplace when the kprobe attaches to
kernel functions.

For example,

  SEC("?kprobe")
  int kprobe(struct pt_regs *regs)
  {
      return 0;
  }

  SEC("?freplace")
  int freplace_kprobe(struct pt_regs *regs)
  {
      regs->di = 0;
      return 0;
  }

freplace_kprobe prog will attach to kprobe prog.
kprobe prog will attach to a kernel function.

Without this patch, when the kernel function runs, its first arg will
always be set to 0 via the freplace_kprobe prog.

To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow
attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.
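The attach-time rule can be modelled in a few lines. struct toy_prog, its fields, and toy_freplace_attach_check() are hypothetical stand-ins for the kernel's freplace attach path, and the -EINVAL return is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_prog {
	bool is_kprobe;        /* program type is KPROBE (uprobes included) */
	bool kprobe_write_ctx; /* program may write to struct pt_regs */
};

/* Reject replacing a kprobe program with one whose write_ctx differs. */
static int toy_freplace_attach_check(const struct toy_prog *freplace,
				     const struct toy_prog *target)
{
	if (target->is_kprobe &&
	    freplace->kprobe_write_ctx != target->kprobe_write_ctx)
		return -EINVAL;
	return 0;
}
```

Checking at attach time rather than load time matters because the target is only known once the freplace program is actually attached.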

Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers")
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:10 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: Add support for raw EPP writes

The energy performance preference field of the CPPC request MSR
supports values from 0 to 255, but the strings only offer 4 values.

The other values are useful for tuning the performance of some
workloads.

Add support for writing the raw energy performance preference value
to the sysfs file.  If the last value written was an integer then
an integer will be returned.  If the last value written was a string
then a string will be returned.
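A userspace sketch of the store/show behaviour described above. The preset table is an assumption based on the common AMD EPP presets (performance 0x00, balance_performance 0x80, balance_power 0xbf, power 0xff); struct toy_epp, toy_epp_store() and toy_epp_show() are hypothetical, and trailing-newline handling for real sysfs writes is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_epp {
	unsigned int value;   /* value for the EPP field (0..255) */
	bool last_was_raw;    /* whether the last write was an integer */
	const char *last_str; /* preset name when last_was_raw is false */
};

static const struct { const char *name; unsigned int value; } toy_presets[] = {
	{ "performance", 0x00 },
	{ "balance_performance", 0x80 },
	{ "balance_power", 0xbf },
	{ "power", 0xff },
};

static int toy_epp_store(struct toy_epp *e, const char *buf)
{
	char *end;
	unsigned long v;

	for (size_t i = 0; i < sizeof(toy_presets) / sizeof(toy_presets[0]); i++) {
		if (strcmp(buf, toy_presets[i].name) == 0) {
			e->value = toy_presets[i].value;
			e->last_was_raw = false;
			e->last_str = toy_presets[i].name;
			return 0;
		}
	}

	/* Not a preset: accept a decimal integer in 0..255. */
	errno = 0;
	v = strtoul(buf, &end, 10);
	if (end == buf || *end != '\0' || errno || v > 255)
		return -EINVAL;
	e->value = (unsigned int)v;
	e->last_was_raw = true;
	return 0;
}

/* Echo back in the same form as the last successful write. */
static int toy_epp_show(const struct toy_epp *e, char *out, size_t len)
{
	if (e->last_was_raw)
		return snprintf(out, len, "%u", e->value);
	return snprintf(out, len, "%s", e->last_str);
}
```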

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agocpufreq/amd-pstate: Add support for platform profile class
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:09 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: Add support for platform profile class

The platform profile core allows multiple drivers and devices to
register platform profile support.

When the legacy platform profile interface is used, all drivers will
adjust the platform profile as well.

Add support for registering every CPU with the platform profile handler
when dynamic EPP is enabled.

The end result will be that changing the platform profile will modify
EPP accordingly.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agocpufreq/amd-pstate: add kernel command line to override dynamic epp
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:08 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: add kernel command line to override dynamic epp

Add `amd_dynamic_epp=enable` and `amd_dynamic_epp=disable` to override
the kernel configuration option `CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP`
locally.
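A minimal model of the override, assuming a hypothetical handler that receives the text after `amd_dynamic_epp=`; the initial value of `dynamic_epp` stands in for the CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP default, and the driver's real parsing hook is not shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the Kconfig-selected default. */
static bool dynamic_epp = false;

/* Parse the value of "amd_dynamic_epp="; unknown values keep the default. */
static int parse_amd_dynamic_epp(const char *arg)
{
	if (!arg)
		return -1;
	if (strcmp(arg, "enable") == 0)
		dynamic_epp = true;
	else if (strcmp(arg, "disable") == 0)
		dynamic_epp = false;
	else
		return -1;
	return 0;
}
```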

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agocpufreq/amd-pstate: Add dynamic energy performance preference
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:07 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: Add dynamic energy performance preference

Dynamic energy performance preference changes the EPP profile based on
whether the machine is running on AC or DC power.

A notification chain from the power supply core is used to adjust EPP
values on plug in or plug out events.

When enabled, the driver exposes a sysfs toggle for dynamic EPP and
blocks manual writes to energy_performance_preference while it "owns"
the EPP updates.

For non-server systems:
    * the default EPP for AC mode is `performance`.
    * the default EPP for DC mode is `balance_performance`.

For server systems dynamic EPP is mostly a no-op.
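The policy above reduces to a small decision function. This sketch uses hypothetical names, models "mostly a no-op" on servers as "leave the current EPP alone" (NULL), and leaves out the power-supply notifier plumbing entirely.

```c
#include <stdbool.h>
#include <string.h>	/* also provides NULL */

enum power_source { ON_AC, ON_DC };

/* Pick the EPP profile on a plug-in/plug-out event; NULL means "no change". */
static const char *dynamic_epp_profile(bool is_server, enum power_source src)
{
	if (is_server)
		return NULL;
	return src == ON_AC ? "performance" : "balance_performance";
}

/* Manual writes to energy_performance_preference are refused while
 * dynamic EPP owns the EPP updates. */
static bool manual_epp_write_allowed(bool dynamic_epp_enabled)
{
	return !dynamic_epp_enabled;
}
```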

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoDocumentation: amd-pstate: fix dead links in the reference section
Ninad Naik [Mon, 30 Mar 2026 19:08:55 +0000 (00:38 +0530)] 
Documentation: amd-pstate: fix dead links in the reference section

The links for the AMD64 Architecture Programmer's Manual and the PPR for
AMD Family 19h Model 51h, Revision A1 Processors redirect to a generic
page. Update them to working links.

Signed-off-by: Ninad Naik <ninadnaik07@gmail.com>
Link: https://lore.kernel.org/r/20260330190855.1115304-1-ninadnaik07@gmail.com
Acked-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agocpufreq/amd-pstate: Cache the max frequency in cpudata
Mario Limonciello (AMD) [Thu, 26 Mar 2026 19:36:20 +0000 (14:36 -0500)] 
cpufreq/amd-pstate: Cache the max frequency in cpudata

The maximum frequency is fixed and never changes, so recalculating it
from perf every time is unnecessary.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20260326193620.649441-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoDocumentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:56 +0000 (17:17 +0530)] 
Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}

Add documentation for the sysfs files
/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_freq
and
/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_count.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoDocumentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:55 +0000 (17:17 +0530)] 
Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file

Add the missing amd_pstate_prefcore_ranking filename to the sysfs
listing example that leads to the descriptions of these parameters.
Clarify when the file is visible.

Fixes: 15a2b764ea7c ("amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoDocumentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:54 +0000 (17:17 +0530)] 
Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file

Add the missing amd_pstate_hw_prefcore filename to the sysfs listing
example that leads to the descriptions of these parameters. Clarify
when the file is visible.

Fixes: b96b82d1af7f ("cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate-ut: Add a testcase to validate the visibility of driver attributes
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:53 +0000 (17:17 +0530)] 
amd-pstate-ut: Add a testcase to validate the visibility of driver attributes

The amd-pstate driver has per-attribute visibility functions to
dynamically control which sysfs freq_attrs are exposed based on the
platform capabilities and the current amd_pstate mode. However, there
is no test coverage to validate that the driver's live attribute list
matches the expected visibility for each mode.

Add amd_pstate_ut_check_freq_attrs() to the amd-pstate unit test
module. For each enabled mode (passive, active, guided), the test
independently derives the expected visibility of each attribute:
  - Core attributes (max_freq, lowest_nonlinear_freq, highest_perf)
    are always expected.
  - Prefcore attributes (prefcore_ranking, hw_prefcore) are expected
    only when cpudata->hw_prefcore indicates platform support.
  - EPP attributes (energy_performance_preference,
    energy_performance_available_preferences) are expected only in
    active mode.
  - Floor frequency attributes (floor_freq, floor_count) are expected
    only when X86_FEATURE_CPPC_PERF_PRIO is present.
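The independent expectation the test computes can be sketched as a pure function of mode and platform capabilities. Attribute names are the short forms used in the list above, and the struct and enum are hypothetical; the real test compares against the driver's live attr array.

```c
#include <stdbool.h>
#include <string.h>

enum pstate_mode { MODE_PASSIVE, MODE_ACTIVE, MODE_GUIDED };

struct platform_caps {
	bool hw_prefcore;	/* cpudata->hw_prefcore in the driver */
	bool cppc_perf_prio;	/* X86_FEATURE_CPPC_PERF_PRIO present */
};

/* Derive whether attribute @name should be visible, independently of the driver. */
static bool attr_expected(const char *name, enum pstate_mode mode,
			  const struct platform_caps *caps)
{
	static const char *const core[] = {
		"max_freq", "lowest_nonlinear_freq", "highest_perf",
	};
	size_t i;

	for (i = 0; i < sizeof(core) / sizeof(core[0]); i++)
		if (strcmp(name, core[i]) == 0)
			return true;		/* core attributes: always */

	if (!strcmp(name, "prefcore_ranking") || !strcmp(name, "hw_prefcore"))
		return caps->hw_prefcore;

	if (!strcmp(name, "energy_performance_preference") ||
	    !strcmp(name, "energy_performance_available_preferences"))
		return mode == MODE_ACTIVE;

	if (!strcmp(name, "floor_freq") || !strcmp(name, "floor_count"))
		return caps->cppc_perf_prio;

	return false;	/* unknown attribute: never expected */
}
```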

Compare these independent expectations against the live driver's attr
array, catching bugs such as attributes leaking into wrong modes or
visibility functions checking incorrect conditions.

Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate-ut: Add module parameter to select testcases
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:52 +0000 (17:17 +0530)] 
amd-pstate-ut: Add module parameter to select testcases

Currently, when the amd-pstate-ut test module is loaded, it runs all the
tests in the amd_pstate_ut_cases[] array.

Add a module parameter named "test_list" that accepts a
comma-delimited list of test names, allowing users to run a
selected subset of tests. When the parameter is omitted or empty, all
tests are run as before.
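A sketch of matching a test name against a comma-delimited selection such as the one "test_list" accepts. The helper name is hypothetical and the module's actual parsing may differ; it only shows exact-token matching plus the "omitted or empty means all" rule.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if @name is a whole token in comma-delimited @list.
 * A NULL or empty list selects every test. */
static bool test_selected(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *p = list;

	if (!list || !*list)
		return true;

	while (*p) {
		const char *comma = strchr(p, ',');
		size_t tok = comma ? (size_t)(comma - p) : strlen(p);

		if (tok == len && strncmp(p, name, len) == 0)
			return true;
		if (!comma)
			break;
		p = comma + 1;	/* move past the delimiter */
	}
	return false;
}
```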

Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2()
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:51 +0000 (17:17 +0530)] 
amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2()

Introduce a new tracepoint trace_amd_pstate_cppc_req2() to track
updates to MSR_AMD_CPPC_REQ2.

Invoke this while changing the Floor Perf.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate: Add sysfs support for floor_freq and floor_count
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:50 +0000 (17:17 +0530)] 
amd-pstate: Add sysfs support for floor_freq and floor_count

When the Floor Performance feature is supported by the platform, expose
two sysfs files:

   * amd_pstate_floor_freq to allow userspace to request the floor
     frequency for each CPU.

   * amd_pstate_floor_count which advertises the number of distinct
     levels of floor frequencies supported on this platform.

Reset floor_perf to bios_floor_perf in the suspend, offline, and
exit paths, and restore the cached user-requested floor_freq on the
resume and online paths, mirroring how bios_min_perf is handled for
MSR_AMD_CPPC_REQ.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:49 +0000 (17:17 +0530)] 
amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF

Some future AMD processors have a feature named "CPPC Performance
Priority" which lets userspace specify different floor performance
levels for different CPUs. The platform firmware takes these different
floor performance levels into consideration while throttling the CPUs
under power/thermal constraints. The presence of this feature is
indicated by bit 16 of the EDX register for CPUID leaf
0x80000007. More details can be found in the AMD publication titled "AMD64
Collaborative Processor Performance Control (CPPC) Performance
Priority" Revision 1.10.

The number of distinct floor performance levels supported on the
platform is advertised through bits 32:39 of MSR_AMD_CPPC_CAP1.
Bits 0:7 of a new MSR, MSR_AMD_CPPC_REQ2 (0xc00102b5), are used to
specify the desired floor performance level for that CPU.

Add support for the aforementioned MSR_AMD_CPPC_REQ2, and macros for
parsing and updating the relevant bits from MSR_AMD_CPPC_CAP1 and
MSR_AMD_CPPC_REQ2.

On boot, if the default value of MSR_AMD_CPPC_REQ2[7:0] (Floor Perf)
is lower than CPPC.lowest_perf, and thus invalid, initialize it to
MSR_AMD_CPPC_CAP1.nominal_perf, which is a sane default value.

Save the boot-time floor_perf during amd_pstate_init_floor_perf(). In
a subsequent patch it will be restored in the suspend, offline, and
exit paths, mirroring how bios_min_perf is handled for
MSR_AMD_CPPC_REQ.
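The field layout and boot-time sanitising described above can be expressed as a few helpers operating on raw 64-bit MSR values. REQ2 bits 7:0 and the CAP1 floor-count field at bits 32:39 come from the description; the positions of lowest_perf (bits 7:0) and nominal_perf (bits 23:16) in MSR_AMD_CPPC_CAP1 are assumptions here, and all helper names are hypothetical.

```c
#include <stdint.h>

/* Extract @width bits starting at bit @lo. */
#define FIELD_BITS(val, lo, width) (((val) >> (lo)) & ((1ULL << (width)) - 1))

static inline uint8_t cap1_lowest_perf(uint64_t cap1)  { return FIELD_BITS(cap1, 0, 8); }  /* assumed */
static inline uint8_t cap1_nominal_perf(uint64_t cap1) { return FIELD_BITS(cap1, 16, 8); } /* assumed */
static inline uint8_t cap1_floor_count(uint64_t cap1)  { return FIELD_BITS(cap1, 32, 8); }
static inline uint8_t req2_floor_perf(uint64_t req2)   { return FIELD_BITS(req2, 0, 8); }

/* Replace bits 7:0 of REQ2, leaving the other bits untouched. */
static inline uint64_t req2_set_floor_perf(uint64_t req2, uint8_t perf)
{
	return (req2 & ~0xffULL) | perf;
}

/* A floor below lowest_perf is invalid: fall back to nominal_perf.
 * Returns the REQ2 value that would be written back. */
static uint64_t init_floor_perf(uint64_t cap1, uint64_t req2)
{
	if (req2_floor_perf(req2) < cap1_lowest_perf(cap1))
		req2 = req2_set_floor_perf(req2, cap1_nominal_perf(cap1));
	return req2;
}
```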

Link: https://docs.amd.com/v/u/en-US/69206_1.10_AMD64_CPPC_PUB
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agox86/cpufeatures: Add AMD CPPC Performance Priority feature.
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:48 +0000 (17:17 +0530)] 
x86/cpufeatures: Add AMD CPPC Performance Priority feature.

Some future AMD processors have a feature named "CPPC Performance
Priority" which lets userspace specify different floor performance
levels for different CPUs. The platform firmware takes these different
floor performance levels into consideration while throttling the CPUs
under power/thermal constraints. The presence of this feature is
indicated by bit 16 of the EDX register for CPUID leaf
0x80000007. More details can be found in the AMD publication titled "AMD64
Collaborative Processor Performance Control (CPPC) Performance
Priority" Revision 1.10.

Define a new feature bit named X86_FEATURE_CPPC_PERF_PRIO to map to
CPUID 0x80000007.EDX[16].
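Checking the same bit from userspace can be sketched with the GCC/Clang `<cpuid.h>` helper `__get_cpuid()`; inside the kernel the new X86_FEATURE_CPPC_PERF_PRIO flag would be tested instead (for example with cpu_feature_enabled()). The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define CPPC_PERF_PRIO_EDX_BIT 16	/* CPUID 0x80000007 EDX[16] */

/* Test the bit in an EDX value returned by CPUID leaf 0x80000007. */
static bool edx_has_cppc_perf_prio(uint32_t edx)
{
	return (edx >> CPPC_PERF_PRIO_EDX_BIT) & 1;
}

/* Query the running CPU; false on non-x86 or if the leaf is not available. */
static bool cpu_has_cppc_perf_prio(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the requested leaf is unsupported. */
	if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
		return false;
	return edx_has_cppc_perf_prio(edx);
#else
	return false;
#endif
}
```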

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate: Make certain freq_attrs conditionally visible
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:47 +0000 (17:17 +0530)] 
amd-pstate: Make certain freq_attrs conditionally visible

Certain amd_pstate freq_attrs such as amd_pstate_hw_prefcore and
amd_pstate_prefcore_ranking are enabled even when preferred core is
not supported on the platform.

Similarly, there are common freq_attrs between the amd-pstate and
amd-pstate-epp drivers (e.g. amd_pstate_max_freq,
amd_pstate_lowest_nonlinear_freq, etc.) that are duplicated in two
different freq_attr structs.

Unify all the attributes in a single place and associate each of them
with a visibility function that determines whether the attribute
should be visible based on the underlying platform support and the
current amd_pstate mode.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate: Update cppc_req_cached in fast_switch case
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:46 +0000 (17:17 +0530)] 
amd-pstate: Update cppc_req_cached in fast_switch case

The function msr_update_perf() does not cache the new value that is
written to MSR_AMD_CPPC_REQ into the variable cpudata->cppc_req_cached
when the update is happening from the fast path.

Fix that by caching the value every time MSR_AMD_CPPC_REQ is
updated.

This issue was discovered by Claude Opus 4.6 with the aid of Chris
Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel).

Assisted-by: Claude:claude-opus-4.6 review-prompts/linux
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Fixes: fff395796917 ("cpufreq/amd-pstate: Always write EPP value when updating perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agoamd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:45 +0000 (17:17 +0530)] 
amd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()

On failure to set the EPP, amd_pstate_epp_cpu_init() returns an error
code without freeing the cpudata object that was allocated at the
beginning of the function.

Ensure that the cpudata object is freed before returning from the
function.
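The fix follows the usual single-exit cleanup pattern. In this userspace sketch, calloc()/free() stand in for the kernel allocator, and set_epp_model() is a hypothetical stand-in for the EPP write that can fail.

```c
#include <errno.h>
#include <stdlib.h>

struct cpudata_model { int epp; };

/* Hypothetical stand-in for the EPP update that may fail. */
static int set_epp_model(struct cpudata_model *cd, int epp)
{
	if (epp < 0 || epp > 255)
		return -EINVAL;
	cd->epp = epp;
	return 0;
}

static int epp_cpu_init_model(struct cpudata_model **out, int epp)
{
	struct cpudata_model *cd;
	int ret;

	cd = calloc(1, sizeof(*cd));
	if (!cd)
		return -ENOMEM;

	ret = set_epp_model(cd, epp);
	if (ret)
		goto free_cpudata;	/* a bare "return ret;" here would leak cd */

	*out = cd;
	return 0;

free_cpudata:
	free(cd);
	return ret;
}
```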

This memory leak was discovered by Claude Opus 4.6 with the aid of
Chris Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel).

Assisted-by: Claude:claude-opus-4.6 review-prompts/linux
Fixes: f9a378ff6443 ("cpufreq/amd-pstate: Set different default EPP policy for Epyc and Ryzen")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
4 weeks agof2fs: fix to preserve previous reserve_{blocks,node} value when remount
Zhiguo Niu [Thu, 5 Mar 2026 03:22:46 +0000 (11:22 +0800)] 
f2fs: fix to preserve previous reserve_{blocks,node} value when remount

The following steps change the previous value of reserve_{blocks,node},
which does not match the original intention.

1. mount -t f2fs -o reserve_root=8192 imgfile test_mount/
F2FS-fs (loop56): Mounted with checkpoint version = 1b69f8c7
mount info:
/dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=8192,reserve_node=0,resuid=0,resgid=0,xxx)

2. mount -t f2fs -o remount,reserve_root=4096 /data/test_mount
F2FS-fs (loop56): Preserve previous reserve_root=8192
check mount info: reserve_root changed to 4096
/dev/block/loop56 on /data/test_mount type f2fs (xxx,reserve_root=4096,reserve_node=0,resuid=0,resgid=0,xxx)

Prior to commit d18535132523 ("f2fs: separate the options parsing and options checking"),
the value of reserve_{blocks,node} was only set during the first mount, along with
the corresponding mount option F2FS_MOUNT_RESERVE_{ROOT,NODE}. If the mount option
F2FS_MOUNT_RESERVE_{ROOT,NODE} was found to have been set during the mount/remount,
the previous value of reserve_{blocks,node} would also be preserved, as shown in
the code below:
    if (test_opt(sbi, RESERVE_ROOT)) {
        f2fs_info(sbi, "Preserve previous reserve_root=%u",
                  F2FS_OPTION(sbi).root_reserved_blocks);
    } else {
        F2FS_OPTION(sbi).root_reserved_blocks = arg;
        set_opt(sbi, RESERVE_ROOT);
    }
But commit d18535132523 ("f2fs: separate the options parsing and options checking")
only preserved the previous mount option; it did not preserve the previous value of
reserve_{blocks,node}. Since whether reserve_{blocks,node} is assigned depends on
ctx->spec_mask, ctx->spec_mask should also be handled in
f2fs_check_opt_consistency().

This patch clears the corresponding ctx->spec_mask bits in f2fs_check_opt_consistency()
to preserve the previous values of reserve_{blocks,node} if they already have a value.
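The parse/check/apply split and the effect of clearing spec_mask can be modelled as follows. The bit values, structs and function names are hypothetical simplifications of f2fs's mount context, not its real definitions; only root_reserved_blocks is a name taken from the code above.

```c
#include <stdbool.h>
#include <stdio.h>

#define SPEC_RESERVE_ROOT (1u << 0)	/* hypothetical bit layout */
#define SPEC_RESERVE_NODE (1u << 1)

struct sb_opts {		/* state of the already-mounted filesystem */
	bool reserve_root_set, reserve_node_set;
	unsigned int root_reserved_blocks, reserved_nodes;
};

struct mount_ctx {		/* freshly parsed (re)mount options */
	unsigned int spec_mask;
	unsigned int root_reserved_blocks, reserved_nodes;
};

/* Check phase: drop requests that would overwrite an existing value. */
static void check_reserve_consistency(const struct sb_opts *sb, struct mount_ctx *ctx)
{
	if ((ctx->spec_mask & SPEC_RESERVE_ROOT) && sb->reserve_root_set) {
		printf("Preserve previous reserve_root=%u\n", sb->root_reserved_blocks);
		ctx->spec_mask &= ~SPEC_RESERVE_ROOT;
	}
	if ((ctx->spec_mask & SPEC_RESERVE_NODE) && sb->reserve_node_set) {
		printf("Preserve previous reserve_node=%u\n", sb->reserved_nodes);
		ctx->spec_mask &= ~SPEC_RESERVE_NODE;
	}
}

/* Apply phase: only fields still flagged in spec_mask are copied. */
static void apply_reserve(struct sb_opts *sb, const struct mount_ctx *ctx)
{
	if (ctx->spec_mask & SPEC_RESERVE_ROOT) {
		sb->root_reserved_blocks = ctx->root_reserved_blocks;
		sb->reserve_root_set = true;
	}
	if (ctx->spec_mask & SPEC_RESERVE_NODE) {
		sb->reserved_nodes = ctx->reserved_nodes;
		sb->reserve_node_set = true;
	}
}
```

Replaying the two mounts from the reproducer against this model keeps reserve_root at 8192 after the remount requesting 4096.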

Fixes: d18535132523 ("f2fs: separate the options parsing and options checking")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: invalidate block device page cache on umount
Yongpeng Yang [Tue, 24 Mar 2026 09:47:08 +0000 (17:47 +0800)] 
f2fs: invalidate block device page cache on umount

Neither F2FS nor VFS invalidates the block device page cache on umount,
which can result in stale metadata being read. An example scenario is
shown below:

Terminal A                  Terminal B
mount /dev/vdb /mnt/f2fs
touch mx // ino = 4
sync
dump.f2fs -i 4 /dev/vdb // block on "[Y/N]"
                            touch mx2 // ino = 5
                            sync
                            umount /mnt/f2fs
                            dump.f2fs -i 5 /dev/vdb // block addr is 0

After umount, the block device page cache is not purged, causing
`dump.f2fs -i 5 /dev/vdb` to read stale metadata and see inode 5 with
block address 0.

Btrfs encountered a similar issue before; the solution there was to
call sync_blockdev() and invalidate_bdev() when the device is closed:

mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html

For the root user, the f2fs kernel calls sync_blockdev() on umount to
flush all cached data to disk, and f2fs-tools can release the page cache
by issuing ioctl(fd, BLKFLSBUF) when accessing the device. However,
non-root users are not permitted to drop the page cache, and may still
observe stale data.

This patch calls sync_blockdev() and invalidate_bdev() during umount to
invalidate the block device page cache, thereby preventing stale
metadata from being read.

Note that this may result in an extra sync_blockdev() call on the first
device, in both f2fs_put_super() and kill_block_super(). The second call
does nothing, as there are no dirty pages left to flush. The change
ensures that non-root users do not observe stale data.

Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix to freeze GC and discard threads quickly
Daeho Jeong [Mon, 16 Mar 2026 18:59:54 +0000 (11:59 -0700)] 
f2fs: fix to freeze GC and discard threads quickly

Suspend can fail if kernel threads do not freeze in time.
f2fs_gc and f2fs_discard threads can perform long-running operations
that prevent them from reaching a freeze point in a timely manner.

This patch adds explicit freezing checks in the following locations:
1. f2fs_gc: Added a check at the 'retry' label to exit the loop quickly
   if freezing is requested, especially during heavy GC rounds.
2. __issue_discard_cmd: Added a 'suspended' flag to break both inner and
   outer loops during discard command issuance if freezing is detected
   after at least one command has been issued.
3. __issue_discard_cmd_orderly: Added a similar check for orderly discard
   to ensure responsiveness.

These checks ensure that the threads release locks safely and enter the
frozen state.
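The "suspended" pattern for leaving nested loops can be sketched in userspace. Here freezing_fn stands in for the kernel's freezing(current) check, the loop counts stand in for the discard lists, and freeze_after() is a hypothetical test predicate; none of these are f2fs's actual names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's freezing(current) check. */
typedef bool (*freezing_fn)(void *arg);

/* Example predicate: report "freezing" once *arg counts down to zero. */
static bool freeze_after(void *arg)
{
	int *left = arg;

	return (*left)-- <= 0;
}

/*
 * Issue up to @nr_lists x @per_list discard commands, stopping early when a
 * freeze is requested, but only after at least one command has gone out.
 * Returns the number of commands issued.
 */
static int issue_discards(int nr_lists, int per_list,
			  freezing_fn freezing, void *arg)
{
	bool suspended = false;
	int issued = 0;

	for (int i = 0; i < nr_lists && !suspended; i++) {
		for (int j = 0; j < per_list; j++) {
			if (issued > 0 && freezing(arg)) {
				suspended = true;	/* also ends the outer loop */
				break;
			}
			issued++;	/* "submit" one discard command */
		}
	}
	return issued;
}
```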

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer
Chao Yu [Mon, 9 Mar 2026 02:22:37 +0000 (02:22 +0000)] 
f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer

syzbot reported an f2fs bug as below:

BUG: KMSAN: uninit-value in f2fs_sanity_check_node_footer+0x374/0xa20 fs/f2fs/node.c:1520
 f2fs_sanity_check_node_footer+0x374/0xa20 fs/f2fs/node.c:1520
 f2fs_finish_read_bio+0xe1e/0x1d60 fs/f2fs/data.c:177
 f2fs_read_end_io+0x6ab/0x2220 fs/f2fs/data.c:-1
 bio_endio+0x1006/0x1160 block/bio.c:1792
 submit_bio_noacct+0x533/0x2960 block/blk-core.c:891
 submit_bio+0x57a/0x620 block/blk-core.c:926
 blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
 f2fs_submit_read_bio+0x12c/0x360 fs/f2fs/data.c:557
 f2fs_submit_page_bio+0xee2/0x1450 fs/f2fs/data.c:775
 read_node_folio+0x384/0x4b0 fs/f2fs/node.c:1481
 __get_node_folio+0x5db/0x15d0 fs/f2fs/node.c:1576
 f2fs_get_inode_folio+0x40/0x50 fs/f2fs/node.c:1623
 do_read_inode fs/f2fs/inode.c:425 [inline]
 f2fs_iget+0x1209/0x9380 fs/f2fs/inode.c:596
 f2fs_fill_super+0x8f5a/0xb2e0 fs/f2fs/super.c:5184
 get_tree_bdev_flags+0x6e6/0x920 fs/super.c:1694
 get_tree_bdev+0x38/0x50 fs/super.c:1717
 f2fs_get_tree+0x35/0x40 fs/f2fs/super.c:5436
 vfs_get_tree+0xb3/0x5d0 fs/super.c:1754
 fc_mount fs/namespace.c:1193 [inline]
 do_new_mount_fc fs/namespace.c:3763 [inline]
 do_new_mount+0x885/0x1dd0 fs/namespace.c:3839
 path_mount+0x7a2/0x20b0 fs/namespace.c:4159
 do_mount fs/namespace.c:4172 [inline]
 __do_sys_mount fs/namespace.c:4361 [inline]
 __se_sys_mount+0x704/0x7f0 fs/namespace.c:4338
 __x64_sys_mount+0xe4/0x150 fs/namespace.c:4338
 x64_sys_call+0x39f0/0x3ea0 arch/x86/include/generated/asm/syscalls_64.h:166
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x134/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The root cause is that f2fs_finish_read_bio() may access uninitialized
data in the folio if reading the data from the device into the folio
failed; let's add a check condition to avoid such an issue.

Cc: stable@kernel.org
Fixes: 50ac3ecd8e05 ("f2fs: fix to do sanity check on node footer in {read,write}_end_io")
Reported-by: syzbot+9aac813cdc456cdd49f8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/69a9ca26.a70a0220.305d9a.0000.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix false alarm of lockdep on cp_global_sem lock
Chao Yu [Fri, 6 Mar 2026 12:24:20 +0000 (12:24 +0000)] 
f2fs: fix false alarm of lockdep on cp_global_sem lock

lockdep reported a potential deadlock:

a) TCMU device removal context:
 - call del_gendisk() to get q->q_usage_counter
 - call start_flush_work() to get work_completion of wb->dwork
b) f2fs writeback context:
 - in wb_workfn(), which holds work_completion of wb->dwork
 - call f2fs_balance_fs() to get sbi->gc_lock
c) f2fs vfs_write context:
 - call f2fs_gc() to get sbi->gc_lock
 - call f2fs_write_checkpoint() to get sbi->cp_global_sem
d) f2fs mount context:
 - call recover_fsync_data() to get sbi->cp_global_sem
 - call f2fs_check_and_fix_write_pointer() to call blkdev_report_zones()
   that goes down to blk_mq_alloc_request and get q->q_usage_counter

The original call stack is in the Closes: tag.

However, I think this is a false alarm: before mount returns
successfully (context d), files therein cannot be accessed via
vfs_write (context c).

Let's introduce a per-sb cp_global_sem_key and assign it to
cp_global_sem, so that lockdep can correctly distinguish cp_global_sem
instances from different super blocks.

A lot of the work was done by Shin'ichiro Kawasaki; thanks a lot for
the work.

Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones")
Cc: stable@kernel.org
Reported-and-tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20260218125237.3340441-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix data loss caused by incorrect use of nat_entry flag
Yongpeng Yang [Tue, 10 Mar 2026 09:36:14 +0000 (17:36 +0800)] 
f2fs: fix data loss caused by incorrect use of nat_entry flag

Data loss can occur when fsync is performed on a newly created file
(before any checkpoint has been written) concurrently with a checkpoint
operation. The scenario is as follows:

create & write & fsync 'file A'                 write checkpoint
- f2fs_do_sync_file // inline inode
 - f2fs_write_inode // inode folio is dirty
                                                - f2fs_write_checkpoint
                                                 - f2fs_flush_merged_writes
                                                 - f2fs_sync_node_pages
                                                 - f2fs_flush_nat_entries
 - f2fs_fsync_node_pages // no dirty node
 - f2fs_need_inode_block_update // return false
 SPO and lost 'file A'

f2fs_flush_nat_entries() sets the IS_CHECKPOINTED and HAS_LAST_FSYNC
flags for the nat_entry, but this does not mean that the checkpoint has
actually completed successfully. However, f2fs_need_inode_block_update()
checks these flags and incorrectly assumes that the checkpoint has
finished.

The root cause is that the semantics of IS_CHECKPOINTED and
HAS_LAST_FSYNC are only guaranteed after the checkpoint write fully
completes.

This patch modifies f2fs_need_inode_block_update() to acquire the
sbi->node_write lock before reading the nat_entry flags, ensuring that
once IS_CHECKPOINTED and HAS_LAST_FSYNC are observed to be set, the
checkpoint operation has already completed.

Fixes: e05df3b115e7 ("f2fs: add node operations")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix to skip empty sections in f2fs_get_victim
Daeho Jeong [Mon, 16 Mar 2026 18:59:21 +0000 (11:59 -0700)] 
f2fs: fix to skip empty sections in f2fs_get_victim

In age-based victim selection (ATGC, AT_SSR, or GC_CB), f2fs_get_victim
can encounter sections with zero valid blocks. This situation often
arises when checkpoint is disabled or due to race conditions between
SIT updates and dirty list management.

In such cases, f2fs_get_section_mtime() returns INVALID_MTIME, which
subsequently triggers a fatal f2fs_bug_on(sbi, mtime == INVALID_MTIME)
in add_victim_entry() or get_cb_cost().

This patch adds a check in f2fs_get_victim's selection loop to skip
sections with no valid blocks. This prevents unnecessary age
calculations for empty sections and avoids the associated kernel panic.
This change also allows removing redundant checks in add_victim_entry().
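A sketch of a victim-selection loop with the new skip. INVALID_MTIME and the section struct are simplified stand-ins, and "pick the oldest mtime" is only a placeholder for the real ATGC/cost-benefit calculations.

```c
#include <stdint.h>

#define INVALID_MTIME UINT64_MAX	/* stand-in for f2fs's sentinel */

struct section_model {
	unsigned int valid_blocks;
	uint64_t mtime;		/* INVALID_MTIME when the section is empty */
};

/* Age-based selection placeholder: oldest non-empty section wins; -1 if none. */
static int pick_oldest_victim(const struct section_model *secs, int n)
{
	int victim = -1;

	for (int i = 0; i < n; i++) {
		/* The new check: an empty section has no meaningful age. */
		if (secs[i].valid_blocks == 0)
			continue;
		if (victim < 0 || secs[i].mtime < secs[victim].mtime)
			victim = i;
	}
	return victim;
}
```

Without the skip, the empty section at index 3 in the test below (mtime 0) would win the age comparison even though there is nothing to migrate.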

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix inline data not being written to disk in writeback path
Yongpeng Yang [Wed, 18 Mar 2026 08:46:35 +0000 (16:46 +0800)] 
f2fs: fix inline data not being written to disk in writeback path

When f2fs_fiemap() is called with `fileinfo->fi_flags` containing the
FIEMAP_FLAG_SYNC flag, it attempts to write data to disk before
retrieving file mappings via filemap_write_and_wait(). However, there is
an issue where the file does not get mapped as expected. The following
scenario can occur:

root@vm:/mnt/f2fs# dd if=/dev/zero of=data.3k bs=3k count=1
root@vm:/mnt/f2fs# xfs_io data.3k -c "fiemap -v 0 4096"
data.3k:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..5]:          0..5                 6 0x307

The root cause of this issue is that f2fs_write_single_data_page() only
calls f2fs_write_inline_data() to copy data from the data folio to the
inode folio, and it clears the dirty flag on the data folio. However, it
does not mark the data folio as writeback. When
__filemap_fdatawait_range() checks for folios with the writeback flag,
it returns early, causing f2fs_fiemap() to report that the file has no
mapping.

To fix this issue, call f2fs_write_single_node_folio() in
f2fs_inline_data_fiemap() when getting fiemap with the FIEMAP_FLAG_SYNC
flag. This ensures that the inode folio is written back and that
writeback completes before proceeding.

Cc: stable@kernel.org
Fixes: 9ffe0fb5f3bb ("f2fs: handle inline data operations")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix fsck inconsistency caused by FGGC of node block
Yongpeng Yang [Wed, 18 Mar 2026 08:45:34 +0000 (16:45 +0800)] 
f2fs: fix fsck inconsistency caused by FGGC of node block

During FGGC node block migration, fsck may incorrectly treat the
migrated node block as fsync-written data.

The reproduction scenario:
root@vm:/mnt/f2fs# seq 1 2048 | xargs -n 1 ./test_sync // write inline inode and sync
root@vm:/mnt/f2fs# rm -f 1
root@vm:/mnt/f2fs# sync
root@vm:/mnt/f2fs# f2fs_io gc_range // move data block in sync mode and not write CP
  SPO, "fsck --dry-run" finds an inode that has already been checkpointed
  but still has DENT_BIT_SHIFT set

The root cause is that GC does not clear the dentry mark and fsync mark
during node block migration, leading fsck to misinterpret them as
user-issued fsync writes.

In BGGC mode, node block migration is handled by f2fs_sync_node_pages(),
which guarantees the dentry and fsync marks are cleared before writing.

This patch moves the set/clear of the fsync|dentry marks into
__write_node_folio() to make the logic clearer, and ensures the
fsync|dentry marks are cleared in FGGC.

Cc: stable@kernel.org
Fixes: da011cc0da8c ("f2fs: move node pages only in victim section during GC")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
Yongpeng Yang [Tue, 10 Mar 2026 09:36:12 +0000 (17:36 +0800)] 
f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage

f2fs_need_dentry_mark() reads nat_entry flags without mutual exclusion
with the checkpoint path, which can result in an incorrect inode block
marking state. The scenario is as follows:

create & write & fsync 'file A'                 write checkpoint
- f2fs_do_sync_file // inline inode
 - f2fs_write_inode // inode folio is dirty
                                                - f2fs_write_checkpoint
                                                 - f2fs_flush_merged_writes
                                                 - f2fs_sync_node_pages
 - f2fs_fsync_node_pages // no dirty node
 - f2fs_need_inode_block_update // return true
 - f2fs_fsync_node_pages // inode dirtied
  - f2fs_need_dentry_mark // return true
                                                 - f2fs_flush_nat_entries
                                                - f2fs_write_checkpoint end
  - __write_node_folio // inode with DENT_BIT_SHIFT set
  SPO, "fsck --dry-run" finds an inode that has already been checkpointed
  but still has DENT_BIT_SHIFT set

The state observed by f2fs_need_dentry_mark() can differ from the state
observed in __write_node_folio() after acquiring sbi->node_write. The
root cause is that the semantics of IS_CHECKPOINTED and
HAS_FSYNCED_INODE are only guaranteed after the checkpoint write has
fully completed.

This patch moves set_dentry_mark() into __write_node_folio() and
protects it with the sbi->node_write lock.

Cc: stable@kernel.org
Fixes: 88bd02c9472a ("f2fs: fix conditions to remain recovery information in f2fs_sync_file")
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks agof2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally
Chao Yu [Wed, 11 Mar 2026 13:35:42 +0000 (21:35 +0800)] 
f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally

Syzbot reported an f2fs bug as follows:

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:1900!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6527 Comm: syz.5.110 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
RIP: 0010:f2fs_issue_discard_timeout+0x59b/0x5a0 fs/f2fs/segment.c:1900
Code: d9 80 e1 07 80 c1 03 38 c1 0f 8c d6 fe ff ff 48 89 df e8 a8 5e fa fd e9 c9 fe ff ff e8 4e 46 94 fd 90 0f 0b e8 46 46 94 fd 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc9000494f940 EFLAGS: 00010283
RAX: ffffffff843009ca RBX: 0000000000000001 RCX: 0000000000080000
RDX: ffffc9001ca78000 RSI: 00000000000029f3 RDI: 00000000000029f4
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: dffffc0000000000 R11: ffffed100893a431 R12: 1ffff1100893a430
R13: 1ffff1100c2b702c R14: dffffc0000000000 R15: ffff8880449d2160
FS:  00007ffa35fed6c0(0000) GS:ffff88812643d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b68634000 CR3: 0000000039f62000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __f2fs_remount fs/f2fs/super.c:2960 [inline]
 f2fs_reconfigure+0x108a/0x1710 fs/f2fs/super.c:5443
 reconfigure_super+0x227/0x8a0 fs/super.c:1080
 do_remount fs/namespace.c:3391 [inline]
 path_mount+0xdc5/0x10e0 fs/namespace.c:4151
 do_mount fs/namespace.c:4172 [inline]
 __do_sys_mount fs/namespace.c:4361 [inline]
 __se_sys_mount+0x31d/0x420 fs/namespace.c:4338
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ffa37dbda0a

The root cause is a race condition between f2fs_ioc_fitrim() and
f2fs_remount():

- f2fs_remount                  - f2fs_ioc_fitrim
 - f2fs_issue_discard_timeout
  - __issue_discard_cmd
  - __drop_discard_cmd
  - __wait_all_discard_cmd
                                 - f2fs_trim_fs
                                  - f2fs_write_checkpoint
                                   - f2fs_clear_prefree_segments
                                    - f2fs_issue_discard
                                     - __issue_discard_async
                                      - __queue_discard_cmd
                                       - __update_discard_tree_range
                                        - __insert_discard_cmd
                                         - __create_discard_cmd
                                         : atomic_inc(&dcc->discard_cmd_cnt);
  - sanity check on dcc->discard_cmd_cnt (expect discard_cmd_cnt to be zero)

This can only happen when fitrim races with a read-write remount. When
remounting read-only, the remount path waits until mnt_pcp.mnt_writers
drops to zero, which means no fitrim can be in progress at that time, so
the sanity check is only valid in that case.
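A minimal sketch of why the zero-count check has to be conditional, assuming a simplified model in which draining the discard queue is reduced to resetting an atomic counter and FITRIM queuing a command is an atomic increment (as __create_discard_cmd() does with dcc->discard_cmd_cnt). The names dcc_model, drain_discards(), queue_discard() and discard_cnt_sane() are hypothetical, and gating the check on a read-only remount is our reading of the reasoning above, not a copy of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the discard control structure. */
struct dcc_model {
	atomic_int discard_cmd_cnt;
};

/* Remount side: issue, drop and wait for all queued commands (modelled as a reset). */
static void drain_discards(struct dcc_model *dcc)
{
	atomic_store(&dcc->discard_cmd_cnt, 0);
}

/* FITRIM side: the checkpoint queues a new discard command. */
static void queue_discard(struct dcc_model *dcc)
{
	atomic_fetch_add(&dcc->discard_cmd_cnt, 1);
}

/*
 * Sanity check after draining. It is only meaningful on a read-only
 * remount, where no writer such as FITRIM can still be running; on a
 * read-write remount a concurrent FITRIM may legitimately have queued
 * new commands, so the check is skipped.
 */
static bool discard_cnt_sane(struct dcc_model *dcc, bool remount_ro)
{
	if (!remount_ro)
		return true;
	return atomic_load(&dcc->discard_cmd_cnt) == 0;
}
```

An unconditional check would fail in the read-write interleaving, where the count is 1 after draining, which corresponds to the BUG in the syzbot report.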

Cc: stable@kernel.org
Fixes: 2482c4325dfe ("f2fs: detect bug_on in f2fs_wait_discard_bios")
Reported-by: syzbot+62538b67389ee582837a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/69b07d7c.050a0220.8df7.09a1.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks ago f2fs: refactor node footer flag setting related code
Yongpeng Yang [Wed, 18 Mar 2026 08:45:33 +0000 (16:45 +0800)] 
f2fs: refactor node footer flag setting related code

This patch refactors the node footer flag setting code to remove
redundant logic and to adjust function parameters and return types. No
functional changes.

Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks ago f2fs: refactor f2fs_move_node_folio function
Yongpeng Yang [Wed, 18 Mar 2026 08:45:32 +0000 (16:45 +0800)] 
f2fs: refactor f2fs_move_node_folio function

This patch refactors the f2fs_move_node_folio() function. No functional
changes.

Cc: stable@kernel.org
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 weeks ago KVM: riscv: selftests: Implement kvm_arch_has_default_irqchip
Mayuresh Chitale [Thu, 2 Apr 2026 10:18:14 +0000 (15:48 +0530)] 
KVM: riscv: selftests: Implement kvm_arch_has_default_irqchip

kvm_arch_has_default_irqchip() is required by irqfd_test and returns
true if an in-kernel interrupt controller is supported.
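To make "an in-kernel interrupt controller is supported" concrete, here is a hedged user-space sketch of one way such a probe can be done with the standard KVM API: open /dev/kvm and ask KVM_CHECK_EXTENSION about KVM_CAP_IRQCHIP. The helper name has_in_kernel_irqchip() is hypothetical, and we have not confirmed that the riscv selftest uses this capability; it may check for interrupt controller support differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical probe: returns true only if /dev/kvm can be opened and
 * reports KVM_CAP_IRQCHIP. Hosts without KVM simply get false.
 */
static bool has_in_kernel_irqchip(void)
{
	int kvm_fd, ret;

	kvm_fd = open("/dev/kvm", O_RDWR);
	if (kvm_fd < 0)
		return false;             /* KVM not available on this host */

	/* KVM_CHECK_EXTENSION returns > 0 when the capability is present. */
	ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_IRQCHIP);
	close(kvm_fd);
	return ret > 0;
}
```

A generic test can call such a helper up front and skip the irqfd cases when it returns false, instead of failing on hosts without an in-kernel irqchip.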

Fixes: a133052666bed ("KVM: selftests: Fix irqfd_test for non-x86 architectures")
Signed-off-by: Mayuresh Chitale <mayuresh.chitale@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260402101818.2982071-1-mayuresh.chitale@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
4 weeks ago kbuild: vdso_install: drop build ID architecture allow-list
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:22 +0000 (19:50 +0200)] 
kbuild: vdso_install: drop build ID architecture allow-list

Many architectures that do generate build IDs, for example arm64, riscv,
loongarch and mips, are missing from this list.

Now that errors from readelf and binaries without any build ID are
handled gracefully, the allow-list is no longer necessary; drop it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-4-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
4 weeks ago kbuild: vdso_install: gracefully handle images without build ID
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:21 +0000 (19:50 +0200)] 
kbuild: vdso_install: gracefully handle images without build ID

If the vDSO does not contain a build ID, skip the symlink step.
This will allow the removal of the explicit list of architectures.
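A minimal C sketch of the "skip the symlink when there is no build ID" behaviour. It assumes that `readelf -n` prints a line of the form `Build ID: <hex>` and that the symlink follows the conventional .build-id/XX/REST.debug layout (first two hex digits as a directory name). The real logic lives in the kbuild makefiles; build_id_link_path() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Derive "XX/REST.debug" from `readelf -n` output. Returns false when
 * the output has no usable "Build ID:" line; the caller then installs
 * the vDSO without creating a .build-id symlink.
 */
static bool build_id_link_path(const char *readelf_out, char *buf, size_t len)
{
	const char *p;
	size_t n;

	assert(readelf_out && buf && len > 0);
	p = strstr(readelf_out, "Build ID:");
	if (!p)
		return false;                 /* no build ID note: skip the symlink */
	p += strlen("Build ID:");
	p += strspn(p, " \t");                /* skip padding before the hex string */
	n = strspn(p, "0123456789abcdefABCDEF");
	if (n < 3)
		return false;                 /* too short to split into XX/REST */
	/* First two hex digits become the directory, the rest the file name. */
	return snprintf(buf, len, "%.2s/%.*s.debug", p, (int)(n - 2), p + 2) < (int)len;
}
```

For output containing `Build ID: 1a2b3c4d` this yields `1a/2b3c4d.debug`; for output without such a line it returns false, so no per-architecture list is needed to decide whether to create the link.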

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-3-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
4 weeks ago kbuild: vdso_install: hide readelf warnings
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:20 +0000 (19:50 +0200)] 
kbuild: vdso_install: hide readelf warnings

If 'readelf -n' encounters a note it does not recognize, it emits a
warning. This happens, for example, when inspecting a compat vDSO that
was not built with the main kernel toolchain. However, the relevant
build ID note is always readable, so the warnings are pointless.

Hide the warnings to make it possible to extract build IDs for more
architectures in the future.
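A hedged C sketch of the idea: run readelf through popen() with its stderr redirected to /dev/null, so warnings about unknown notes never reach the user, and keep only the Build ID value from stdout. It assumes a POSIX shell and a readelf binary on PATH; read_build_id() is a hypothetical helper, and the kernel change itself is made in the kbuild makefiles, not in C.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the "Build ID:" value reported by `readelf -n path` into `id`.
 * Returns false if readelf fails, is missing, or finds no build ID.
 * Note: `path` is only single-quoted, so this is for trusted paths.
 */
static bool read_build_id(const char *path, char *id, size_t len)
{
	char cmd[512], line[512];
	bool found = false;
	FILE *f;

	assert(path && id && len > 0);
	/* 2>/dev/null hides readelf warnings; stdout is still parsed. */
	if (snprintf(cmd, sizeof(cmd), "readelf -n '%s' 2>/dev/null", path) >= (int)sizeof(cmd))
		return false;
	f = popen(cmd, "r");
	if (!f)
		return false;
	while (!found && fgets(line, sizeof(line), f)) {
		char *p = strstr(line, "Build ID:");

		if (!p)
			continue;
		p += strlen("Build ID:");
		p += strspn(p, " \t");
		p[strcspn(p, "\r\n")] = '\0';     /* strip the trailing newline */
		found = *p && snprintf(id, len, "%s", p) < (int)len;
	}
	pclose(f);
	return found;
}
```

Because only stdout is parsed, discarding stderr loses nothing the caller needs, which matches the reasoning above that the build ID note itself is always readable.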

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-2-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
4 weeks ago kbuild: vdso_install: split out the readelf invocation
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:19 +0000 (19:50 +0200)] 
kbuild: vdso_install: split out the readelf invocation

Split up the logic, as upcoming changes to the readelf invocation would
otherwise create a very long line.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-1-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
4 weeks ago eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
Dimitri Daskalakis [Wed, 1 Apr 2026 16:28:48 +0000 (09:28 -0700)] 
eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64

On systems with 64K pages, RX queues will be wedged if users set the
descriptor count to the current minimum (16). Fbnic fragments large
pages into 4K chunks and scales down the ring size accordingly. With
64K pages and 16 descriptors, the ring size mask is 0 and the ring can
never be filled.

32 descriptors is another special case that wedges the RX rings.
Internally, the rings track pages for the head/tail pointers, not page
fragments. So with 32 descriptors (two page slots), there is only one
usable page, because one ring slot is kept empty to distinguish an empty
ring from a full one. As a result, the head pointer never advances and
the HW stalls after consuming 16 page fragments.
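The page arithmetic above can be checked with a small model. This sketch assumes, as the message describes, that each page is split into 4K fragments, that head/tail track page slots rather than fragments, and that one page slot is always left empty; usable_pages() and MODEL_FRAG_SIZE are hypothetical, not fbnic code.

```c
#include <assert.h>

#define MODEL_FRAG_SIZE 4096u  /* assumption: pages are split into 4K fragments */

/*
 * Hypothetical model of a page-granular RX ring with `descs` fragment
 * descriptors on pages of `page_size` bytes. Head/tail advance in whole
 * pages and one page slot is kept empty to tell empty from full.
 */
static unsigned int usable_pages(unsigned int descs, unsigned int page_size)
{
	unsigned int frags_per_page = page_size / MODEL_FRAG_SIZE;
	unsigned int page_slots;

	assert(frags_per_page > 0 && descs % frags_per_page == 0);
	page_slots = descs / frags_per_page;  /* ring size mask = page_slots - 1 */
	assert(page_slots > 0);
	return page_slots - 1;                /* one slot is always left empty */
}
```

With 64K pages it returns 0 for 16 descriptors, 1 for 32 and 3 for 64, while 16 descriptors on 4K pages still leave 15 usable pages. The message reports that both the 0 and the 1 case wedge the ring, which is consistent with raising the minimum to 64.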

Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free")
Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>