git.ipfire.org Git - thirdparty/kernel/linux.git/log
8 weeks ago  ALSA: hda/realtek: Support mute LED for Yoga with ALC287
Jackie Dong [Mon, 14 Jul 2025 09:46:55 +0000 (17:46 +0800)] 
ALSA: hda/realtek: Support mute LED for Yoga with ALC287

Support mute LED on keyboard for Lenovo Yoga series products with
Realtek ALC287 chipset.

Tested on Lenovo Slim Pro 7 14APH8.

[ slight comment cleanup by tiwai ]

Signed-off-by: Jackie Dong <xy-jackie@139.com>
Link: https://patch.msgid.link/20250714094655.4657-1-xy-jackie@139.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
8 weeks ago  Add RPMh regulator support for PM7550 & PMR735B
Mark Brown [Mon, 14 Jul 2025 10:34:22 +0000 (11:34 +0100)] 
Add RPMh regulator support for PM7550 & PMR735B

Merge series from Luca Weiss <luca.weiss@fairphone.com>:

Document and add support for the regulators on PM7550 and PMR735B, which
can be paired with the Milos SoC.

8 weeks ago  ASoC: codec: Convert to GPIO descriptors for
Mark Brown [Mon, 14 Jul 2025 10:34:16 +0000 (11:34 +0100)] 
ASoC: codec: Convert to GPIO descriptors for

Merge series from Peng Fan <peng.fan@nxp.com>:

This patchset picks up patches 1 and 2 from [1]; I have also collected
Linus's R-b for patch 2. After this patchset, there is only one user of
of_gpio.h left in the sound drivers (pxa2xx-ac97).

of_gpio.h is deprecated, so update the driver to use GPIO descriptors.

Patch 1 drops the legacy platform data, which has no in-tree users.
Patch 2 converts the driver to GPIO descriptors.

All in-tree DTS files that use this device specify GPIO_ACTIVE_LOW
polarity for reset-gpios, so everything should work as expected with this
patch.

[1] https://lore.kernel.org/all/20250408-asoc-gpio-v1-0-c0db9d3fd6e9@nxp.com/
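For illustration only, this kind of conversion usually follows the pattern
below (a sketch, not the actual patch; names and error handling are
simplified):

    /* before: legacy of_gpio.h interface */
    int gpio = of_get_named_gpio(np, "reset-gpios", 0);

    /* after: GPIO descriptor API; the polarity (GPIO_ACTIVE_LOW) comes
     * from the DT flags, so callers now deal in logic levels */
    struct gpio_desc *reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW);
    if (IS_ERR(reset))
            return PTR_ERR(reset);
    gpiod_set_value_cansleep(reset, 1);     /* assert reset */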

8 weeks ago  Update SDCA Kconfig
Mark Brown [Mon, 14 Jul 2025 10:34:12 +0000 (11:34 +0100)] 
Update SDCA Kconfig

Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>:

Tidy up a bunch of makefile and Kconfig things, and pull the HID and IRQ
into the main SDCA module.

Changes since v1:
 - Don't expose SND_SOC_SDCA to the user
 - Simplify the makefile bits for HID and IRQ

Thanks,
Charles

Charles Keepax (2):
  ASoC: SDCA: Kconfig/Makefile fixups
  ASoC: SDCA: Pull HID and IRQ into the primary SDCA module

 sound/soc/sdca/Kconfig           | 13 +++++++++----
 sound/soc/sdca/Makefile          | 12 ++++--------
 sound/soc/sdca/sdca_functions.c  |  1 -
 sound/soc/sdca/sdca_hid.c        |  2 +-
 sound/soc/sdca/sdca_interrupts.c |  8 ++++----
 5 files changed, 18 insertions(+), 18 deletions(-)

--
2.39.5

8 weeks ago  ASoC: set bias_level at if
Mark Brown [Mon, 14 Jul 2025 10:34:08 +0000 (11:34 +0100)] 
ASoC: set bias_level at if

Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:

ASoC has 2 functions to set bias level.

(A) snd_soc_dapm_force_bias_level()
(B) snd_soc_dapm_set_bias_level()

(A) sets dapm->bias_level, but (B) does not; it should be set in both (A)
and (B). This looks like an oversight or a bug, but Samsung is the only
vendor that runs into a problem because of it.

Patch [1/5] appears to be the correct approach, but some non-Samsung
vendors might be affected by this patch-set, so it is marked [RFC].

Furthermore, (B) handles both the Card and the Component, while (A)
handles the Component only, which is presumably why the latter is called
the "force" bias_level function. (A) is used by individual drivers, while
(B) is used from soc-dapm only. It is not entirely certain, but except
for special cases each driver should probably use (B).
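
As a rough sketch of the idea in [1/5] (not the actual patch): inside (B),
record the level on success the same way (A) already does:

    /* sketch: in snd_soc_dapm_set_bias_level(), after the bias level
     * transition succeeds */
    if (ret == 0)
            dapm->bias_level = level;   /* today only (A) updates this */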

8 weeks ago  iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range
Ethan Milon [Mon, 14 Jul 2025 04:50:28 +0000 (12:50 +0800)] 
iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range

The logic in cache_tag_flush_all() to iterate over cache tags and issue
TLB invalidations is largely duplicated in cache_tag_flush_range(), with
the only difference being the range parameters.

Extend cache_tag_flush_range() to handle a full address space flush when
called with start = 0 and end = ULONG_MAX. This allows
cache_tag_flush_all() to simply delegate to cache_tag_flush_range().
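
Assuming the existing signature of cache_tag_flush_range(), the delegation
presumably reduces to something like:

    /* sketch: a full flush is a range flush over the whole space */
    void cache_tag_flush_all(struct dmar_domain *domain)
    {
            cache_tag_flush_range(domain, 0, ULONG_MAX, 0);
    }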

Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Link: https://lore.kernel.org/r/20250708214821.30967-2-ethan.milon@eviden.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-12-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all
Ethan Milon [Mon, 14 Jul 2025 04:50:27 +0000 (12:50 +0800)] 
iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all

The function cache_tag_flush_all() was originally implemented with
incorrect device TLB invalidation logic that does not handle PASID, in
commit c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers").

This causes regressions where full address space TLB invalidations occur
with a PASID attached, such as during transparent hugepage unmapping in
SVA configurations or when calling iommu_flush_iotlb_all(). In these
cases, the device receives a TLB invalidation that lacks PASID.

This incorrect logic was later extracted into
cache_tag_flush_devtlb_all(), in commit 3297d047cd7f ("iommu/vt-d:
Refactor IOTLB and Dev-IOTLB flush for batching").

The fix replaces the call to cache_tag_flush_devtlb_all() with
cache_tag_flush_devtlb_psi(), which properly handles PASID.

Fixes: 4f609dbff51b ("iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs")
Fixes: 4e589a53685c ("iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all")
Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Link: https://lore.kernel.org/r/20250708214821.30967-1-ethan.milon@eviden.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-11-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Split paging_domain_compatible()
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:26 +0000 (12:50 +0800)] 
iommu/vt-d: Split paging_domain_compatible()

Make First/Second stage specific functions that follow the same pattern in
intel_iommu_domain_alloc_first/second_stage() for computing
EOPNOTSUPP. This makes the code easier to understand: if we couldn't
create a domain with the parameters for this IOMMU instance, then we
certainly are not compatible with it.

Check superpage support directly against the per-stage cap bits and the
pgsize_bitmap.

Add a note that the force_snooping is read without locking. The locking
needs to cover the compatible check and the add of the device to the list.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-10-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:25 +0000 (12:50 +0800)] 
iommu/vt-d: Split intel_iommu_enforce_cache_coherency()

First Stage and Second Stage have very different ways to deny
no-snoop. The first stage uses the PGSNP bit which is global per-PASID so
enabling requires loading new PASID entries for all the attached devices.

Second stage uses a bit per PTE, so enabling just requires telling future
maps to set the bit.

Since we now have two domain ops we can have two functions that can
directly code their required actions instead of a bunch of logic dancing
around use_first_level.

Combine domain_set_force_snooping() into the new functions since they are
the only caller.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-9-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Create unique domain ops for each stage
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:24 +0000 (12:50 +0800)] 
iommu/vt-d: Create unique domain ops for each stage

Use the domain ops pointer to tell what kind of domain it is instead of
the internal use_first_level indication. This also protects against
wrongly using a SVA/nested/IDENTITY/BLOCKED domain type in places they
should not be.

The only remaining uses of use_first_level outside the paging domain are in
paging_domain_compatible() and intel_iommu_enforce_cache_coherency().

Thus, remove the useless sets of use_first_level in
intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of
the unique ops for these domain types ever reference it on their call
chains.

Add a WARN_ON() check in domain_context_mapping_one() as it only works
with second stage.

This is preparation for iommupt which will have different ops for each of
the stages.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-8-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:23 +0000 (12:50 +0800)] 
iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()

Create stage-specific functions that check the stage-specific conditions
for whether each stage can be supported.

Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence
until one does not return EOPNOTSUPP and prefer to use the first stage if
available and suitable for the requested flags.
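
In outline, the sequencing described above presumably looks like this
(a sketch only; the function names are taken from the surrounding text):

    domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
    if (domain != ERR_PTR(-EOPNOTSUPP))
            return domain;          /* first stage preferred when usable */
    return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);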

Move second stage only operations like nested_parent and dirty_tracking
into the second stage function for clarity.

Move initialization of the iommu_domain members into paging_domain_alloc().

Drop initialization of domain->owner as the callers all do it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-7-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Do not wipe out the page table NID when devices detach
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:22 +0000 (12:50 +0800)] 
iommu/vt-d: Do not wipe out the page table NID when devices detach

The NID controls which NUMA node the page table's memory is allocated
from. It should be a permanent property of the page table, fixed when the
table is allocated, and should not change during attach/detach of devices.

Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: 7c204426b818 ("iommu/vt-d: Add domain_alloc_paging support")
Link: https://lore.kernel.org/r/20250714045028.958850-6-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:21 +0000 (12:50 +0800)] 
iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()

It has only one caller, no need for two functions.

Correct the WARN_ON() error handling to leak the entire page table if the
HW is still referencing it so we don't UAF during WARN_ON recovery.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-5-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:20 +0000 (12:50 +0800)] 
iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()

Pass the phys_addr_t down through the call chain from the top instead of
passing a pgd_t * KVA. This moves the __pa() into
domain_setup_first_level() which is the first function to obtain the pgd
from the IOMMU page table in this call chain.

The SVA flow is also adjusted to get the pa of the mm->pgd.

iommupt will move the __pa() into the iommupt code; it never shares the
KVA of the page table with the driver.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-4-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
Lu Baolu [Mon, 14 Jul 2025 04:50:19 +0000 (12:50 +0800)] 
iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes

The iotlb_sync_map iommu ops allows drivers to perform necessary cache
flushes when new mappings are established. For the Intel iommu driver,
this callback specifically serves two purposes:

- To flush caches when a second-stage page table is attached to a device
  whose iommu is operating in caching mode (CAP_REG.CM==1).
- To explicitly flush internal write buffers to ensure updates to memory-
  resident remapping structures are visible to hardware (CAP_REG.RWBF==1).

However, in scenarios where neither caching mode nor the RWBF flag is
active, the cache_tag_flush_range_np() helper, which is called in the
iotlb_sync_map path, effectively becomes a no-op.

Despite being a no-op, cache_tag_flush_range_np() involves iterating
through all cache tags of the IOMMUs attached to the domain, protected
by a spinlock. This unnecessary execution path introduces overhead,
leading to a measurable I/O performance regression. On systems with NVMes
under the same bridge, performance was observed to drop from approximately
~6150 MiB/s down to ~4985 MiB/s.

Introduce a flag in the dmar_domain structure. This flag will only be set
when iotlb_sync_map is required (i.e., when CM or RWBF is set). The
cache_tag_flush_range_np() is called only for domains where this flag is
set. This flag, once set, is immutable, given that there won't be mixed
configurations in real-world scenarios where some IOMMUs in a system
operate in caching mode while others do not. Theoretically, the
immutability of this flag does not impact functionality.
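
Conceptually, the change gates the sync path on a per-domain flag derived
from the IOMMU capabilities; a sketch (the field name and call sites may
differ from the actual patch):

    /* at domain attach time */
    if (cap_caching_mode(iommu->cap) || cap_rwbf(iommu->cap))
            domain->iotlb_sync_map = true;

    /* in the iotlb_sync_map path */
    if (!domain->iotlb_sync_map)
            return;                 /* skip the pointless tag walk */
    cache_tag_flush_range_np(domain, start, end);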

Reported-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2115738
Link: https://lore.kernel.org/r/20250701171154.52435-1-ioanna-maria.alifieraki@canonical.com
Fixes: 129dab6e1286 ("iommu/vt-d: Use cache_tag_flush_range_np() in iotlb_sync_map")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250703031545.3378602-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250714045028.958850-3-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook
Vineeth Pillai (Google) [Mon, 14 Jul 2025 04:50:18 +0000 (12:50 +0800)] 
iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook

The iommu init hook is wrapped in CONFIG_X86 and is a remnant from when
dmar.c was common code living in "drivers/pci/dmar.c". The wrapping was
added in commit 9d5ce73a64be2 ("x86: intel-iommu: Convert
detect_intel_iommu to use iommu_init hook").

Now that this code is built only for x86, the config wrap can be removed.

Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250616131740.3499289-1-vineeth@bitbyteword.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
8 weeks ago  EDAC/synopsys: Clear the ECC counters on init
Shubhrajyoti Datta [Sun, 13 Jul 2025 05:07:53 +0000 (10:37 +0530)] 
EDAC/synopsys: Clear the ECC counters on init

Clear the ECC error and counter registers during initialization/probe to avoid
reporting stale errors that may have occurred before EDAC registration.

For that, unify the Zynq and ZynqMP ECC state reading paths and simplify the
code.

  [ bp: Massage commit message.
    Fix an -Wsometimes-uninitialized warning as reported by
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507141048.obUv3ZUm-lkp@intel.com ]
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250713050753.7042-1-shubhrajyoti.datta@amd.com
8 weeks ago  pmdomain: samsung: Fix splash-screen handover by enforcing a sync_state
Ulf Hansson [Fri, 11 Jul 2025 11:47:19 +0000 (13:47 +0200)] 
pmdomain: samsung: Fix splash-screen handover by enforcing a sync_state

It has been reported that some Samsung platforms fail to boot with
genpd's new sync_state support.

Typically the problem shows up on platforms where the bootloader turns on
the splash-screen and hands it over to be managed by the kernel. However,
at this point it's not clear how to correctly solve the problem.

To make the platforms boot again, let's add a temporary hack in the
samsung power-domain provider driver, which enforces a sync_state that
allows the power-domains to be reset before consumer devices start to be
attached.

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/all/212a1a56-08a5-48a5-9e98-23de632168d0@samsung.com
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20250711114719.189441-1-ulf.hansson@linaro.org
8 weeks ago  Merge patch series "netfs: Fix use of fscache with ceph"
Christian Brauner [Mon, 14 Jul 2025 09:05:07 +0000 (11:05 +0200)] 
Merge patch series "netfs: Fix use of fscache with ceph"

David Howells <dhowells@redhat.com> says:

Here are a couple of patches that fix the use of fscache with ceph:

 (1) Fix the read collector to mark the write request that it creates to copy
     data to the cache with NETFS_RREQ_OFFLOAD_COLLECTION so that it will run
     the write collector on a workqueue as it's meant to run in the background
     and the app isn't going to wait for it.

 (2) Fix the read collector to wake up the copy-to-cache write request after
     it sets NETFS_RREQ_ALL_QUEUED if the write request doesn't have any
     subrequests left on it.  ALL_QUEUED indicates that there won't be any
     more subreqs coming and the collector should clean up - except that an
     event is needed to trigger that, but it only gets events from subreq
     termination and so the last event can beat us to setting ALL_QUEUED.

* patches from https://lore.kernel.org/20250711151005.2956810-1-dhowells@redhat.com:
  netfs: Fix race between cache write completion and ALL_QUEUED being set
  netfs: Fix copy-to-cache so that it performs collection with ceph+fscache

Link: https://lore.kernel.org/20250711151005.2956810-1-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  netfs: Fix race between cache write completion and ALL_QUEUED being set
David Howells [Fri, 11 Jul 2025 15:10:01 +0000 (16:10 +0100)] 
netfs: Fix race between cache write completion and ALL_QUEUED being set

When netfslib is issuing subrequests, the subrequests start processing
immediately and may complete before we reach the end of the issuing
function.  At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED
to indicate to the collector that we aren't going to issue any more subreqs
and that it can do the final notifications and cleanup.

Now, this isn't a problem if the request is synchronous
(NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be
done in-thread and we're guaranteed an opportunity to run the collector.

However, if the request is asynchronous, collection is primarily triggered
by the termination of subrequests queuing it on a workqueue.  Now, a race
can occur here if the app thread sets ALL_QUEUED after the last subrequest
terminates.

This can happen most easily with the copy2cache code (as used by Ceph)
where, in the collection routine of a read request, an asynchronous write
request is spawned to copy data to the cache.  Folios are added to the
write request as they're unlocked, but there may be a delay before
ALL_QUEUED is set as the write subrequests may complete before we get
there.

If all the write subreqs have finished by the ALL_QUEUED point, no further
events happen and the collection never happens, leaving the request
hanging.

Fix this by queuing the collector after setting ALL_QUEUED.  This is a bit
heavy-handed and it may be sufficient to do it only if there are no extant
subreqs.
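
In other words, roughly (a sketch; the wake-up helper name here is
illustrative, not necessarily the one the patch uses):

    set_bit(NETFS_RREQ_ALL_QUEUED, &creq->flags);
    /* kick the collector once more in case every subrequest already
     * completed before ALL_QUEUED became visible */
    netfs_wake_collector(creq);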

Also add a tracepoint to cross-reference both requests in a copy-to-request
operation and add a trace to the netfs_rreq tracepoint to indicate the
setting of ALL_QUEUED.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250711151005.2956810-3-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  netfs: Fix copy-to-cache so that it performs collection with ceph+fscache
David Howells [Fri, 11 Jul 2025 15:10:00 +0000 (16:10 +0100)] 
netfs: Fix copy-to-cache so that it performs collection with ceph+fscache

The netfs copy-to-cache that is used by Ceph with local caching sets up a
new request to write data just read to the cache.  The request is started
and then left to look after itself whilst the app continues.  The request
gets notified by the backing fs upon completion of the async DIO write, but
then tries to wake up the app because NETFS_RREQ_OFFLOAD_COLLECTION isn't
set - but the app isn't waiting there, and so the request just hangs.

Fix this by setting NETFS_RREQ_OFFLOAD_COLLECTION which causes the
notification from the backing filesystem to put the collection onto a work
queue instead.
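
The fix therefore amounts to marking the copy-to-cache request when it is
created, along these lines (sketch; the request variable name is assumed):

    /* when spawning the copy-to-cache write request */
    __set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
    /* collection now runs on a workqueue instead of trying to wake an
     * application thread that isn't waiting */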

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250711151005.2956810-2-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  sched/topology: Remove sched_domain_topology_level::flags
K Prateek Nayak [Fri, 11 Jul 2025 05:50:30 +0000 (11:20 +0530)] 
sched/topology: Remove sched_domain_topology_level::flags

Support for overlapping domains added in commit e3589f6c81e4 ("sched:
Allow for overlapping sched_domain spans") also allowed forcefully
setting SD_OVERLAP for !NUMA domains via FORCE_SD_OVERLAP sched_feat().

Since NUMA domains had to be presumed overlapping to ensure correct
behavior, "sched_domain_topology_level::flags" was introduced. NUMA
domains set the SDTL_OVERLAP flag, which ensured SD_OVERLAP was always
added during build_sched_domains() for these domains, even when
FORCE_SD_OVERLAP was off.

The condition for adding the SD_OVERLAP flag in the aforementioned commit
was as follows:

    if (tl->flags & SDTL_OVERLAP || sched_feat(FORCE_SD_OVERLAP))
            sd->flags |= SD_OVERLAP;

The FORCE_SD_OVERLAP debug feature was removed in commit af85596c74de
("sched/topology: Remove FORCE_SD_OVERLAP") which left the NUMA domains
as the exclusive users of SDTL_OVERLAP, SD_OVERLAP, and SD_NUMA flags.

Get rid of SDTL_OVERLAP and SD_OVERLAP as they have become redundant
and instead rely on SD_NUMA to detect the only overlapping domain
currently supported. Since SDTL_OVERLAP was the only user of
"tl->flags", get rid of "sched_domain_topology_level::flags" too.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/ba4dbdf8-bc37-493d-b2e0-2efb00ea3e19@amd.com
8 weeks ago  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
Li Chen [Thu, 10 Jul 2025 10:57:10 +0000 (18:57 +0800)] 
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled

Currently, the SMT domain is added into sched_domain_topology by default.

If cpu_attach_domain() finds that the CPU SMT domain’s cpumask_weight
is just 1, it will destroy it.

On a large machine, such as one with 512 cores, this results in
512 redundant domain attach/destroy operations.

Avoid these unnecessary operations by simply checking
cpu_smt_num_threads and skipping the SMT domain when SMT is not
enabled.
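
A sketch of the idea, using the SDTL_INIT() helper introduced elsewhere in
this series (the actual patch may differ):

    /* only emit the SMT level when SMT is actually active */
    if (cpu_smt_num_threads > 1)
            x86_topology[i++] = SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT);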

Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250710105715.66594-5-me@linux.beauty
8 weeks ago  x86/smpboot: move x86_topology to static initialization and truncate
Li Chen [Thu, 10 Jul 2025 10:57:09 +0000 (18:57 +0800)] 
x86/smpboot: move x86_topology to static initialization and truncate

The #ifdeffery and the initializers in build_sched_topology() are just
disgusting.

Statically initialize the domain levels in the topology array and let
build_sched_topology() invalidate the package domain level when NUMA in
package is available.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250710105715.66594-4-me@linux.beauty
8 weeks ago  x86/smpboot: remove redundant CONFIG_SCHED_SMT
Li Chen [Thu, 10 Jul 2025 10:57:08 +0000 (18:57 +0800)] 
x86/smpboot: remove redundant CONFIG_SCHED_SMT

On x86 CONFIG_SCHED_SMT is default y if SMP is enabled, so let's
simply drop CONFIG_SCHED_SMT.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250710105715.66594-3-me@linux.beauty
8 weeks ago  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
Li Chen [Thu, 10 Jul 2025 10:57:07 +0000 (18:57 +0800)] 
smpboot: introduce SDTL_INIT() helper to tidy sched topology setup

Define a small SDTL_INIT(maskfn, flagsfn, name) macro and use it to build the
sched_domain_topology_level array. Purely a cleanup; behaviour is unchanged.
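
The helper plausibly looks something like this (a sketch; the exact
definition in the patch may differ):

    #define SDTL_INIT(maskfn, flagsfn, dname)                       \
            ((struct sched_domain_topology_level){                  \
                    .mask     = maskfn,                             \
                    .sd_flags = flagsfn,                            \
                    SD_INIT_NAME(dname)                             \
            })

    /* usage */
    static struct sched_domain_topology_level x86_topology[] = {
            SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
    };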

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250710105715.66594-2-me@linux.beauty
8 weeks ago  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
Juri Lelli [Fri, 27 Jun 2025 11:51:18 +0000 (13:51 +0200)] 
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info

dl_rq bandwidth accounting information is crucial for the correct
functioning of SCHED_DEADLINE.

Add a drgn script for accessing that information at runtime, so that
it's easier to check and debug issues related to it.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-6-juri.lelli@redhat.com
8 weeks ago  tools/sched: Add root_domains_dump.py which dumps root domains info
Juri Lelli [Fri, 27 Jun 2025 11:51:17 +0000 (13:51 +0200)] 
tools/sched: Add root_domains_dump.py which dumps root domains info

Root domains information is somewhat hard to access at runtime. Even
with sched_debug and sched_verbose, such information is only printed
on kernel console when domains are modified.

Add a simple drgn script to more easily retrieve root domains
information at runtime.

Since tools/sched is a new directory, add it to MAINTAINERS as well.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-5-juri.lelli@redhat.com
8 weeks ago  sched/deadline: Fix accounting after global limits change
Juri Lelli [Fri, 27 Jun 2025 11:51:16 +0000 (13:51 +0200)] 
sched/deadline: Fix accounting after global limits change

A global limits change (sched_rt_handler() logic) currently leaves stale
and/or incorrect values in variables related to accounting (e.g.
extra_bw).

Properly clean up per runqueue variables before implementing the change
and rebuild scheduling domains (so that accounting is also properly
restored) after such a change is complete.

Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-4-juri.lelli@redhat.com
8 weeks ago  sched/deadline: Reset extra_bw to max_bw when clearing root domains
Juri Lelli [Fri, 27 Jun 2025 11:51:15 +0000 (13:51 +0200)] 
sched/deadline: Reset extra_bw to max_bw when clearing root domains

dl_clear_root_domain() doesn't take into account the fact that per-rq
extra_bw variables retain values computed before root domain changes,
resulting in broken accounting.

Fix it by resetting extra_bw to max_bw before restoring back dl-servers
contributions.

Fixes: 2ff899e351643 ("sched/deadline: Rebuild root domain accounting after every update")
Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-3-juri.lelli@redhat.com
8 weeks ago  sched/deadline: Initialize dl_servers after SMP
Juri Lelli [Fri, 27 Jun 2025 11:51:14 +0000 (13:51 +0200)] 
sched/deadline: Initialize dl_servers after SMP

dl-servers are currently initialized too early at boot when CPUs are not
fully up (only boot CPU is). This results in miscalculation of per
runqueue DEADLINE variables like extra_bw (which needs a stable CPU
count).

Move initialization of dl-servers later on after SMP has been
initialized and CPUs are all online, so that CPU count is stable and
DEADLINE variables can be computed correctly.

Fixes: d741f297bceaf ("sched/fair: Fair server interface")
Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-2-juri.lelli@redhat.com
8 weeks ago  sched: Change nr_uninterruptible type to unsigned long
Aruna Ramakrishna [Wed, 9 Jul 2025 17:33:28 +0000 (17:33 +0000)] 
sched: Change nr_uninterruptible type to unsigned long

The commit e6fe3f422be1 ("sched: Make multiple runqueue task counters
32-bit") changed nr_uninterruptible to an unsigned int. But the
nr_uninterruptible values for each of the CPU runqueues can grow to
large numbers, sometimes exceeding INT_MAX. This is valid, if, over
time, a large number of tasks are migrated off of one CPU after going
into an uninterruptible state. Only the sum of all nr_uninterruptible
values across all CPUs yields the correct result, as explained in a
comment in kernel/sched/loadavg.c.

Change the type of nr_uninterruptible back to unsigned long to prevent
overflows, and thus the miscalculation of load average.
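
The truncation is easy to demonstrate in isolation; a minimal userspace
illustration (assumes LP64, i.e. 64-bit long):

    #include <stdio.h>

    int main(void)
    {
            /* CPU0's counter went "negative" (a task slept there but was
             * woken and accounted on another CPU); CPU1's counter is +1 */
            unsigned int  n32[2] = { (unsigned int)-1, 1u };
            unsigned long n64[2] = { (unsigned long)-1, 1ul };
            long sum32 = 0, sum64 = 0;

            for (int i = 0; i < 2; i++) {
                    sum32 += n32[i];  /* -1 reads back as 4294967295 */
                    sum64 += n64[i];  /* wraps correctly at full width */
            }
            printf("unsigned int counters:  %ld\n", sum32); /* 4294967296 */
            printf("unsigned long counters: %ld\n", sum64); /* 0 (correct) */
            return 0;
    }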

Fixes: e6fe3f422be1 ("sched: Make multiple runqueue task counters 32-bit")
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250709173328.606794-1-aruna.ramakrishna@oracle.com
8 weeks ago  Merge patch series "refactor the iomap writeback code v5"
Christian Brauner [Mon, 14 Jul 2025 08:51:38 +0000 (10:51 +0200)] 
Merge patch series "refactor the iomap writeback code v5"

Christoph Hellwig <hch@lst.de> says:

This is an alternative approach to the writeback part of the
"fuse: use iomap for buffered writes + writeback" series from Joanne.

The big difference compared to Joanne's version is that I hope the
split between the generic and ioend/bio based writeback code is a bit
cleaner here.  We have two methods that define the split between the
generic writeback code and the implementation of it, and all knowledge
of ioends and bios now sits below that layer.

This version passes testing on xfs, and gets as far as mainline for
gfs2 (crashes in generic/361).

* patches from https://lore.kernel.org/20250710133343.399917-1-hch@lst.de:
  iomap: build the writeback code without CONFIG_BLOCK
  iomap: add read_folio_range() handler for buffered writes
  iomap: improve argument passing to iomap_read_folio_sync
  iomap: replace iomap_folio_ops with iomap_write_ops
  iomap: export iomap_writeback_folio
  iomap: move folio_unlock out of iomap_writeback_folio
  iomap: rename iomap_writepage_map to iomap_writeback_folio
  iomap: move all ioend handling to ioend.c
  iomap: add public helpers for uptodate state manipulation
  iomap: hide ioends from the generic writeback code
  iomap: refactor the writeback interface
  iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks
  iomap: pass more arguments using the iomap writeback context
  iomap: header diet

Link: https://lore.kernel.org/20250710133343.399917-1-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: build the writeback code without CONFIG_BLOCK
Christoph Hellwig [Thu, 10 Jul 2025 13:33:38 +0000 (15:33 +0200)] 
iomap: build the writeback code without CONFIG_BLOCK

Allow fuse to use the iomap writeback code even when CONFIG_BLOCK is
not enabled.  Do this with an ifdef instead of a separate file to keep
the iomap_folio_state local to buffered-io.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-15-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: add read_folio_range() handler for buffered writes
Christoph Hellwig [Thu, 10 Jul 2025 13:33:37 +0000 (15:33 +0200)] 
iomap: add read_folio_range() handler for buffered writes

Add a read_folio_range() handler for buffered writes that filesystems
may pass in if they wish to provide a custom handler for synchronously
reading in the contents of a folio.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: renamed to read_folio_range, pass less arguments]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-14-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: improve argument passing to iomap_read_folio_sync
Christoph Hellwig [Thu, 10 Jul 2025 13:33:36 +0000 (15:33 +0200)] 
iomap: improve argument passing to iomap_read_folio_sync

Pass the iomap_iter and derive the map inside iomap_read_folio_sync
instead of in the caller, and use the more descriptive srcmap name for
the source iomap.  Stop passing the offset into folio argument as it
can be derived from the folio and the file offset.  Rename the
variables for the offset into the file and the length to be more
descriptive and match the rest of the code.

Rename the function itself to iomap_read_folio_range to make the use
more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-13-hch@lst.de
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: replace iomap_folio_ops with iomap_write_ops
Christoph Hellwig [Thu, 10 Jul 2025 13:33:35 +0000 (15:33 +0200)] 
iomap: replace iomap_folio_ops with iomap_write_ops

The iomap_folio_ops are only used for buffered writes, including the zero
and unshare variants.  Rename them to iomap_write_ops to better describe
the usage, and pass them through the call chain like the other operation
specific methods instead of through the iomap.

xfs_iomap_valid grows an IOMAP_HOLE check to keep the existing behavior
of never attaching the folio_ops to an iomap representing a hole.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-12-hch@lst.de
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: export iomap_writeback_folio
Christoph Hellwig [Thu, 10 Jul 2025 13:33:34 +0000 (15:33 +0200)] 
iomap: export iomap_writeback_folio

Allow fuse to use iomap_writeback_folio for folio laundering.  Note
that the caller needs to manually submit the pending writeback context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-11-hch@lst.de
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: move folio_unlock out of iomap_writeback_folio
Joanne Koong [Thu, 10 Jul 2025 13:33:33 +0000 (15:33 +0200)] 
iomap: move folio_unlock out of iomap_writeback_folio

Move unlocking the folio out of iomap_writeback_folio into the caller.
This means the end writeback machinery is now run with the folio locked
when no writeback happened, or writeback completed extremely fast.

Note that having the folio locked over the call to folio_end_writeback in
iomap_writeback_folio means that the dropbehind handling there will never
run because the trylock fails.  The only way this can happen is if the
writepage either never wrote back any dirty data at all, in which case
the dropbehind handling isn't needed, or if all writeback finished
instantly, which is rather unlikely.  Even in the latter case the
dropbehind handling is an optional optimization so skipping it will not
cause correctness issues.

This prepares for exporting iomap_writeback_folio for use in folio
laundering.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-10-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: rename iomap_writepage_map to iomap_writeback_folio
Christoph Hellwig [Thu, 10 Jul 2025 13:33:32 +0000 (15:33 +0200)] 
iomap: rename iomap_writepage_map to iomap_writeback_folio

->writepage is gone, and our naming wasn't always that great to start
with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-9-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: move all ioend handling to ioend.c
Christoph Hellwig [Thu, 10 Jul 2025 13:33:31 +0000 (15:33 +0200)] 
iomap: move all ioend handling to ioend.c

Now that the writeback code has the proper abstractions, all the ioend
code can be self-contained in ioend.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-8-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: add public helpers for uptodate state manipulation
Joanne Koong [Thu, 10 Jul 2025 13:33:30 +0000 (15:33 +0200)] 
iomap: add public helpers for uptodate state manipulation

Add a new iomap_start_folio_write helper to abstract away the
write_bytes_pending handling, and export it and the existing
iomap_finish_folio_write for non-iomap writeback in fuse.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-7-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: hide ioends from the generic writeback code
Christoph Hellwig [Thu, 10 Jul 2025 13:33:29 +0000 (15:33 +0200)] 
iomap: hide ioends from the generic writeback code

Replace the ioend pointer in iomap_writeback_ctx with a void *wb_ctx
one to facilitate non-block, non-ioend writeback for use by fuse.  Rename
the submit_ioend method to writeback_submit and make it mandatory so
that the generic writeback code stops seeing ioends and bios.

Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-6-hch@lst.de
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: refactor the writeback interface
Christoph Hellwig [Thu, 10 Jul 2025 13:33:28 +0000 (15:33 +0200)] 
iomap: refactor the writeback interface

Replace ->map_blocks with a new ->writeback_range, which differs in the
following ways:

 - it must also queue up the I/O for writeback, that is, it calls into the
   slightly refactored and extended-in-scope iomap_add_to_ioend for
   each region
 - it can handle only a part of the requested region, that is, the retry
   loop for partial mappings moves to the caller
 - it handles cleanup on failures as well, and thus also replaces the
   discard_folio method that was only implemented by XFS.

This will allow the iomap writeback code to also be used for file systems
that are not block based, like fuse.
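
For orientation, the resulting operations table plausibly takes this shape
(a sketch; the argument lists are an assumption, and writeback_submit
comes from a companion patch in this series):

    struct iomap_writeback_ops {
            /* map *and* queue a sub-range of the folio for writeback */
            ssize_t (*writeback_range)(struct iomap_writepage_ctx *wpc,
                                       struct folio *folio, u64 pos,
                                       unsigned int len, u64 end_pos);
            /* submit or, on error, clean up everything queued so far */
            int (*writeback_submit)(struct iomap_writepage_ctx *wpc,
                                    int error);
    };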

Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-5-hch@lst.de
Acked-by: Damien Le Moal <dlemoal@kernel.org> # zonefs
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks
Joanne Koong [Thu, 10 Jul 2025 13:33:27 +0000 (15:33 +0200)] 
iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks

We don't care about the count of outstanding ioends, just if there is one.
Replace the count variable passed to iomap_writepage_map_blocks with a
boolean to make that more clear.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: rename the variable, update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-4-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: pass more arguments using the iomap writeback context
Christoph Hellwig [Thu, 10 Jul 2025 13:33:26 +0000 (15:33 +0200)] 
iomap: pass more arguments using the iomap writeback context

Add inode and wpc fields to pass the inode and writeback context that
are needed in the entire writeback call chain, and let the callers
initialize all fields in the writeback context before calling
iomap_writepages to simplify the argument passing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-3-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  iomap: header diet
Christoph Hellwig [Thu, 10 Jul 2025 13:33:25 +0000 (15:33 +0200)] 
iomap: header diet

Drop various unused #include statements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-2-hch@lst.de
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  don't bother with path_get()/path_put() in unix_open_file()
Al Viro [Sat, 12 Jul 2025 05:41:57 +0000 (06:41 +0100)] 
don't bother with path_get()/path_put() in unix_open_file()

Once unix_sock ->path is set, we are guaranteed that it will remain
unchanged (and pinned) until the socket is closed.  OTOH, dentry_open()
does not modify the path passed to it.

IOW, there's no need to copy unix_sk(sk)->path in unix_open_file() - we
can just pass it to dentry_open() and be done with that.
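
A sketch of the simplification (assuming unix_open_file() opens the path
with O_PATH, as the current code appears to):

    /* before: copy, pin, open, unpin */
    path = unix_sk(sk)->path;
    path_get(&path);
    f = dentry_open(&path, O_PATH, current_cred());
    path_put(&path);

    /* after: ->path is pinned until the socket is closed and
     * dentry_open() does not modify it, so pass it directly */
    f = dentry_open(&unix_sk(sk)->path, O_PATH, current_cred());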

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/20250712054157.GZ1880847@ZenIV
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  fix a leak in fcntl_dirnotify()
Al Viro [Sat, 12 Jul 2025 17:18:43 +0000 (18:18 +0100)] 
fix a leak in fcntl_dirnotify()

[into #fixes, unless somebody objects]

Lifetime of new_dn_mark is controlled by that of its ->fsn_mark,
pointed to by new_fsn_mark.  Unfortunately, a failure exit had
been inserted between the allocation of new_dn_mark and the
call of fsnotify_init_mark(), ending up with a leak.

Fixes: 1934b212615d ("file: reclaim 24 bytes from f_owner")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/20250712171843.GB1880847@ZenIV
Signed-off-by: Christian Brauner <brauner@kernel.org>
8 weeks ago  xen: fix UAF in dmabuf_exp_from_pages()
Al Viro [Sat, 12 Jul 2025 05:09:16 +0000 (06:09 +0100)] 
xen: fix UAF in dmabuf_exp_from_pages()

[dma_buf_fd() fixes; no preferences regarding the tree it goes through -
up to xen folks]

As soon as we'd inserted a file reference into descriptor table, another
thread could close it.  That's fine for the case when all we are doing is
returning that descriptor to userland (it's a race, but it's a userland
race and there's nothing the kernel can do about it).  However, if we
follow fd_install() with any kind of access to objects that would be
destroyed on close (be it the struct file itself or anything destroyed
by its ->release()), we have a UAF.

dma_buf_fd() is a combination of reserving a descriptor and fd_install().
gntdev dmabuf_exp_from_pages() calls it and then proceeds to access the
objects destroyed on close - starting with gntdev_dmabuf itself.

Fix that by reserving the descriptor before anything else and doing
fd_install() only when everything has been set up.
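
The general shape of the fix for this class of bug (a sketch, not the
literal gntdev code):

    int fd = get_unused_fd_flags(O_CLOEXEC);
    if (fd < 0)
            return fd;

    /* ... complete all setup and take every reference needed while
     * the file is still private to this thread ... */

    fd_install(fd, file);   /* publish last: from here on, another
                             * thread may close() the fd at any time */
    return fd;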

Fixes: a240d6e42e28 ("xen/gntdev: Implement dma-buf export functionality")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250712050916.GY1880847@ZenIV>
Signed-off-by: Juergen Gross <jgross@suse.com>
8 weeks ago  xen: Remove some deadcode (x)
Dr. David Alan Gilbert [Sun, 13 Jul 2025 13:26:25 +0000 (14:26 +0100)] 
xen: Remove some deadcode (x)

Remove three uncalled functions:

  xenbus_mkdir() was added in 2007 by
commit 4bac07c993d0 ("xen: add the Xenbus sysfs and virtual device hotplug
driver")
but has remained unused.

  xen_get_runstate_snapshot() last use was removed in 2016 by
commit 6ba286ad8457 ("xen: support runqueue steal time on xen")
which replaced its use with the _cpu version.

  xen_resume_notifier_unregister() last use was removed in 2017 by
commit 1914f0cd203c ("xen/acpi: upload PM state from init-domain to Xen")

Remove them.

Signed-off-by: "Dr. David Alan Gilbert" <linux@treblig.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250713132625.164728-1-linux@treblig.org>

8 weeks ago  xen-pciback: Replace scnprintf() with sysfs_emit_at()
Ryan Chung [Tue, 8 Jul 2025 00:14:40 +0000 (09:14 +0900)] 
xen-pciback: Replace scnprintf() with sysfs_emit_at()

This is the third revision (v3) of this patch series.
No changes since v2, only adding Reviewed-by lines.

Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250708001444.86155-1-seokwoo.chung130@gmail.com>

8 weeks ago  xen/xenbus: fix W=1 build warning in xenbus_va_dev_error function
Peng Jiang [Fri, 20 Jun 2025 00:41:04 +0000 (08:41 +0800)] 
xen/xenbus: fix W=1 build warning in xenbus_va_dev_error function

This patch fixes a W=1 format-string warning reported by GCC 12.3.0
by annotating xenbus_switch_fatal() and xenbus_va_dev_error()
with the __printf attribute. The attribute enables compile-time
validation of printf-style format strings in these functions.

The original warning trace:
drivers/xen/xenbus/xenbus_client.c:304:9: warning: function 'xenbus_va_dev_error' might be
a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
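
The annotation presumably takes this shape (sketch; the argument indices
depend on the actual prototypes):

    static void __printf(3, 0)
    xenbus_va_dev_error(struct xenbus_device *dev, int err,
                        const char *fmt, va_list ap);

    /* (3, 0): argument 3 is the format string; 0 because the values
     * arrive in a va_list rather than as variadic arguments */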

Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250620084104786r5xoR16_AmYZMJLnno3_Q@zte.com.cn>
Signed-off-by: Juergen Gross <jgross@suse.com>
8 weeks ago  ata: pata_rdc: Use registered definition for the RDC vendor
Andy Shevchenko [Fri, 11 Jul 2025 11:36:50 +0000 (14:36 +0300)] 
ata: pata_rdc: Use registered definition for the RDC vendor

Convert to PCI_VDEVICE() and use the registered definition for the RDC
vendor from pci_ids.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250711113650.1475307-1-andriy.shevchenko@linux.intel.com
[cassel: add ata: prefix to subject, fix typo in Damien's Rb tag]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
8 weeks ago  i2c: Clarify behavior of I2C_M_RD flag
I Viswanath [Wed, 9 Jul 2025 15:07:18 +0000 (20:37 +0530)] 
i2c: Clarify behavior of I2C_M_RD flag

Update the description of I2C_M_RD to clarify that not setting it
signals a write transaction.
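
For context, a typical message pair showing the convention (an
illustrative example, not taken from the patch):

    u8 reg = 0x10, val;
    struct i2c_msg msgs[] = {
            /* no I2C_M_RD: this segment writes the register address */
            { .addr = 0x50, .flags = 0,        .len = 1, .buf = &reg },
            /* I2C_M_RD set: this segment reads the register value */
            { .addr = 0x50, .flags = I2C_M_RD, .len = 1, .buf = &val },
    };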

Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
8 weeks ago  Merge tag 'i2c-host-fixes-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Wolfram Sang [Mon, 14 Jul 2025 07:02:25 +0000 (09:02 +0200)] 
Merge tag 'i2c-host-fixes-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.16-rc6

omap: add missing error check and fix PM disable in probe error
path.

stm32: unmap DMA buffer on transfer failure and use correct
device when mapping and unmapping during transfers.

8 weeks ago  Merge branch 'ipsec: fix splat due to ipcomp fallback tunnel'
Steffen Klassert [Mon, 14 Jul 2025 06:59:48 +0000 (08:59 +0200)] 
Merge branch 'ipsec: fix splat due to ipcomp fallback tunnel'

Sabrina Dubroca says:

====================
IPcomp tunnel states have an associated fallback tunnel and keep a
reference on the corresponding xfrm_state, to allow deleting that
extra state when it's not needed anymore. These states cause issues
during netns deletion.

Commit f75a2804da39 ("xfrm: destroy xfrm_state synchronously on net
exit path") tried to address these problems but doesn't fully solve
them, and slowed down netns deletion by adding one synchronize_rcu per
deleted state.

The first patch solves the problem by moving the fallback state
deletion earlier (when we delete the user state, rather than at
destruction), then we can revert the previous fix.
====================

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
8 weeks ago  clk: sunxi-ng: ccu_nm: convert from round_rate() to determine_rate()
Brian Masney [Thu, 3 Jul 2025 23:22:34 +0000 (19:22 -0400)] 
clk: sunxi-ng: ccu_nm: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

I manually fixed up one minor formatting issue that occurred after
applying the semantic patch:

        req->rate = ccu_nm_find_best(&nm->common, req->best_parent_rate,
                                     req->rate,
                                     &_nm);

I manually changed it to:

        req->rate = ccu_nm_find_best(&nm->common, req->best_parent_rate,
                                     req->rate, &_nm);
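
For reference, the conversion swaps the deprecated callback for the
request-based one, roughly:

    /* before */
    static long ccu_nm_round_rate(struct clk_hw *hw, unsigned long rate,
                                  unsigned long *parent_rate);

    /* after: rate and parent rate travel in the request structure */
    static int ccu_nm_determine_rate(struct clk_hw *hw,
                                     struct clk_rate_request *req);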

Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250703-clk-cocci-drop-round-rate-v1-10-3a8da898367e@redhat.com
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
8 weeks ago  clk: sunxi-ng: ccu_nkmp: convert from round_rate() to determine_rate()
Brian Masney [Thu, 3 Jul 2025 23:22:33 +0000 (19:22 -0400)] 
clk: sunxi-ng: ccu_nkmp: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250703-clk-cocci-drop-round-rate-v1-9-3a8da898367e@redhat.com
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
8 weeks ago  clk: sunxi-ng: ccu_nk: convert from round_rate() to determine_rate()
Brian Masney [Thu, 3 Jul 2025 23:22:32 +0000 (19:22 -0400)] 
clk: sunxi-ng: ccu_nk: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250703-clk-cocci-drop-round-rate-v1-8-3a8da898367e@redhat.com
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
8 weeks ago  clk: sunxi-ng: ccu_gate: convert from round_rate() to determine_rate()
Brian Masney [Thu, 3 Jul 2025 23:22:31 +0000 (19:22 -0400)] 
clk: sunxi-ng: ccu_gate: convert from round_rate() to determine_rate()

The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.

Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20250703-clk-cocci-drop-round-rate-v1-7-3a8da898367e@redhat.com
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
8 weeks ago  clk: sunxi-ng: v3s: Assign the de and tcon clocks to the video pll
Paul Kocialkowski [Fri, 4 Jul 2025 15:40:08 +0000 (17:40 +0200)] 
clk: sunxi-ng: v3s: Assign the de and tcon clocks to the video pll

It appears (based on experimentation) that both the de and tcon clocks
need to have the same parent for the two units to work together.

Assign them both to the video pll by manually clearing the parent
selection bits (effectively setting index 0) and marking the clocks
with the CLK_SET_RATE_NO_REPARENT flag, which ensures that they will
never use a different parent.

The video pll is also a possible parent for the camera subsystem,
but it can use the dedicated isp pll if needed so there should be
no negative side-effect due to this change.

Note that ccu_mux_helper_set_parent cannot be used at this stage as
it requires the clock driver to be initialized and this configuration
is best done before the clock driver is available to consumers.
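
A minimal sketch of the idea (the register name and mux mask below are
placeholders, not the real V3s CCU layout):

        val = readl(ccu_base + DE_CLK_REG);     /* hypothetical DE clock register */
        val &= ~DE_CLK_MUX_MASK;                /* parent select = 0: video pll */
        writel(val, ccu_base + DE_CLK_REG);

The de and tcon clock definitions then carry CLK_SET_RATE_NO_REPARENT so
that this parent selection is never changed afterwards.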

Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Link: https://patch.msgid.link/20250704154008.3463257-2-paulk@sys-base.io
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
8 weeks agoclk: sunxi-ng: v3s: Fix de clock definition
Paul Kocialkowski [Fri, 4 Jul 2025 15:40:07 +0000 (17:40 +0200)] 
clk: sunxi-ng: v3s: Fix de clock definition

The de clock is marked with CLK_SET_RATE_PARENT, which is not actually
necessary (as confirmed by experimentation) and significantly
restricts flexibility for other clocks using the same parent.

In addition, the source selection (parent) field is marked as using
2 bits, while the documentation reports that it uses 3.

Fix both issues in the de clock definition.

Fixes: d0f11d14b0bc ("clk: sunxi-ng: add support for V3s CCU")
Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Link: https://patch.msgid.link/20250704154008.3463257-1-paulk@sys-base.io
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
8 weeks agoext4: fix insufficient credits calculation in ext4_meta_trans_blocks()
Zhang Yi [Mon, 7 Jul 2025 14:08:13 +0000 (22:08 +0800)] 
ext4: fix insufficient credits calculation in ext4_meta_trans_blocks()

The calculation of journal credits in ext4_meta_trans_blocks() should
include pextents, as each extent may be allocated from a different
group and thus needs to update a different bitmap and group
descriptor block.
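
Schematically, the worst case looks like this (an illustrative sketch of
the accounting, not the exact ext4 formula):

        /*
         * Worst case: every extent is allocated from a different block
         * group, touching one block bitmap and one group descriptor each.
         */
        groups  = lblocks + pextents;   /* previously: lblocks only */
        credits = idxblocks + 2 * groups;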

Fixes: 0e32d8617012 ("ext4: correct the journal credits calculations of allocating blocks")
Reported-by: Jan Kara <jack@suse.cz>
Closes: https://lore.kernel.org/linux-ext4/nhxfuu53wyacsrq7xqgxvgzcggyscu2tbabginahcygvmc45hy@t4fvmyeky33e/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250707140814.542883-11-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: replace ext4_writepage_trans_blocks()
Zhang Yi [Mon, 7 Jul 2025 14:08:12 +0000 (22:08 +0800)] 
ext4: replace ext4_writepage_trans_blocks()

Now that ext4 supports large folios, reserving credits in units of pages
is no longer applicable. In most scenarios, reserving credits in units of
extents is sufficient. Therefore, introduce ext4_chunk_trans_extent()
to replace ext4_writepage_trans_blocks(). move_extent_per_page() is the
only remaining location where we still process extents in units of pages.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-10-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: reserve credits for one extent during the folio writeback
Zhang Yi [Mon, 7 Jul 2025 14:08:11 +0000 (22:08 +0800)] 
ext4: reserve credits for one extent during the folio writeback

Now that ext4 supports large folios, reserving journal credits for one
maximum-ordered folio based on the worst-case scenario during the
writeback process can easily exceed the maximum transaction credits.
Additionally, reserving journal credits for one page is also no
longer appropriate.

Currently, the folio writeback process can either extend the journal
credits or initiate a new transaction if the currently reserved journal
credits are insufficient. Therefore, it can be modified to reserve
credits for only one extent at the outset. In most cases involving
continuous mapping, these credits are generally adequate, and we may
only need to perform some basic credit expansion. However, in extreme
cases where the block size and folio size differ significantly, or when
the folios are sufficiently discontinuous, it may be necessary to
restart a new transaction and resubmit the folios.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-9-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: correct the reserved credits for extent conversion
Zhang Yi [Mon, 7 Jul 2025 14:08:10 +0000 (22:08 +0800)] 
ext4: correct the reserved credits for extent conversion

Currently, we reserve journal credits for converting extents in only one
page to the written state when the I/O operation is complete. This is
insufficient when large folios are enabled.

Fix this by reserving credits for converting up to one extent per block in
the largest 2MB folio. This calculation only involves the extent index
and leaf blocks, so it should not overestimate the credits.

Fixes: 7ac67301e82f ("ext4: enable large folio for regular file")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250707140814.542883-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: enhance tracepoints during the folios writeback
Zhang Yi [Mon, 7 Jul 2025 14:08:09 +0000 (22:08 +0800)] 
ext4: enhance tracepoints during the folios writeback

Now that mpage_map_and_submit_extent() supports restarting the handle if
credits are insufficient during block allocation, it is more likely to
exit the current mapping iteration and later continue processing the
partially mapped folio. The existing tracepoints are not sufficient to
track this situation, so enhance them to track the writeback position
and the return value before and after submitting the folios.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: restart handle if credits are insufficient during allocating blocks
Zhang Yi [Mon, 7 Jul 2025 14:08:08 +0000 (22:08 +0800)] 
ext4: restart handle if credits are insufficient during allocating blocks

After large folios are supported on ext4, writing back a sufficiently
large and discontinuous folio may consume a significant number of
journal credits, placing considerable strain on the journal. For
example, in a 20GB filesystem with 1K block size and 1MB journal size,
writing back a 2MB folio could require thousands of credits in the
worst-case scenario (when each block is discontinuous and distributed
across different block groups), potentially exceeding the journal size.
This issue can also occur in ext4_write_begin() and ext4_page_mkwrite()
when delalloc is not enabled.

Fix this by ensuring that there are sufficient journal credits before
allocating an extent in mpage_map_one_extent() and
ext4_block_write_begin(). If there are not enough credits, return
-EAGAIN, exit the current mapping loop, restart a new handle and a new
transaction, and allocate blocks for this folio again in the next
iteration.
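
The check-and-retry roughly takes the following shape (a simplified
sketch; whether this exact helper fits these call sites is an
assumption):

        if (!ext4_handle_has_enough_credits(handle, needed)) {
                err = -EAGAIN;  /* unwind, submit what is mapped so far, */
                goto out;       /* then restart a new handle/transaction */
        }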

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: refactor the block allocation process of ext4_page_mkwrite()
Zhang Yi [Mon, 7 Jul 2025 14:08:07 +0000 (22:08 +0800)] 
ext4: refactor the block allocation process of ext4_page_mkwrite()

The block allocation process and error handling in ext4_page_mkwrite()
are complex now. Refactor them by introducing a new helper function,
ext4_block_page_mkwrite(). It will call ext4_block_write_begin() to
allocate blocks instead of directly calling block_page_mkwrite(). This
prepares for implementing retry logic in a subsequent patch to address
situations where the reserved journal credits are insufficient.
Additionally, this modification will help prevent potential deadlocks
that may occur when waiting for folio writeback while holding the
transaction handle.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: fix stale data if it bails out of the extents mapping loop
Zhang Yi [Mon, 7 Jul 2025 14:08:06 +0000 (22:08 +0800)] 
ext4: fix stale data if it bails out of the extents mapping loop

During the process of writing back folios, if
mpage_map_and_submit_extent() exits the extent mapping loop due to an
ENOSPC or ENOMEM error, it may result in stale data or filesystem
inconsistency in environments where the block size is smaller than the
folio size.

When mapping a discontinuous folio in mpage_map_and_submit_extent(),
some buffers may have already been mapped. If we exit the mapping loop
prematurely, the folio data within the mapped range will not be written
back, and the file's disk size will not be updated. Once the transaction
that includes this range of extents is committed, this can lead to stale
data or filesystem inconsistency.

Fix this by submitting the partially mapped folio that is currently
being processed.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: move the calculation of wbc->nr_to_write to mpage_folio_done()
Zhang Yi [Mon, 7 Jul 2025 14:08:05 +0000 (22:08 +0800)] 
ext4: move the calculation of wbc->nr_to_write to mpage_folio_done()

mpage_folio_done() is a more appropriate place than
mpage_submit_folio() for updating wbc->nr_to_write after we have
submitted a fully mapped folio. This prepares for allowing
mpage_submit_folio() to submit a partially mapped folio that is still
under processing.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250707140814.542883-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 weeks agoext4: process folios writeback in bytes
Zhang Yi [Mon, 7 Jul 2025 14:08:04 +0000 (22:08 +0800)] 
ext4: process folios writeback in bytes

Since ext4 supports large folios, processing writeback in units of pages
is no longer appropriate; modify it to process writeback in units of bytes.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 months agotools/bootconfig: Cleanup bootconfig footer size calculations
Masami Hiramatsu (Google) [Thu, 10 Jul 2025 02:24:17 +0000 (11:24 +0900)] 
tools/bootconfig: Cleanup bootconfig footer size calculations

There are many instances of the same 8 + BOOTCONFIG_MAGIC_LEN pattern
for calculating the size of the bootconfig footer. Use the
BOOTCONFIG_FOOTER_SIZE macro to clean up those magic numbers.
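
The cleanup boils down to one define (a sketch matching the pattern named
above):

        #define BOOTCONFIG_FOOTER_SIZE  (8 + BOOTCONFIG_MAGIC_LEN)

        /* before: total_size - 8 - BOOTCONFIG_MAGIC_LEN */
        /* after:  total_size - BOOTCONFIG_FOOTER_SIZE   */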

Link: https://lore.kernel.org/all/175211425693.2591046.16029516706923643510.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2 months agotools/bootconfig: Replace some echo with printf for more portability
Masami Hiramatsu (Google) [Thu, 10 Jul 2025 02:24:09 +0000 (11:24 +0900)] 
tools/bootconfig: Replace some echo with printf for more portability

Since echo is not portable, use printf instead. This fixes wrong
test result formatting in some environments (a literal "\t\t[OK]" was
shown). Also use the printf command instead of echo for generating test
data.

Link: https://lore.kernel.org/all/175211424942.2591046.5439447789303314213.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2 months agotools/bootconfig: Improve portability
Masami Hiramatsu (Google) [Thu, 10 Jul 2025 02:24:02 +0000 (11:24 +0900)] 
tools/bootconfig: Improve portability

Since the 'stat' and 'truncate' commands are GNU extensions, use the
'wc' and 'dd' commands instead.

Link: https://lore.kernel.org/all/175211424184.2591046.3523297993175066026.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2 months agotools: bootconfig: Regex enclosed with quotes to make syntax highlighting proper
Bhaskar Chowdhury [Wed, 9 Jul 2025 02:57:59 +0000 (08:27 +0530)] 
tools: bootconfig: Regex enclosed with quotes to make syntax highlighting proper

As suggested, change the square-bracket escaping to quoting the whole
regex class.

Link: https://lore.kernel.org/all/20250709030141.27483-1-unixbhaskar@gmail.com/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2 months agomm/damon/reclaim: use parameter context correctly
SeongJae Park [Sun, 6 Jul 2025 19:32:07 +0000 (12:32 -0700)] 
mm/damon/reclaim: use parameter context correctly

damon_reclaim_apply_parameters() allocates a new DAMON context, stages
user-specified DAMON parameters on it, and commits them to the running
DAMON context at once, using damon_commit_ctx().  The code is mistakenly
overwriting the monitoring attributes and the reclaim scheme on the
running context.  It is not causing a real problem for the monitoring
attributes, but the scheme overwriting can remove the scheme's internal
status such as the charged quota.  Fix the wrong use of the parameter
context.
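
In other words, the user parameters must be staged on the separate
parameter context and committed in one go, not written to the running
context directly (a simplified sketch of the intended flow):

        param_ctx = damon_new_ctx();
        /* stage user-specified attributes and the scheme on param_ctx */
        err = damon_commit_ctx(ctx, param_ctx); /* ctx: running context */
        damon_destroy_ctx(param_ctx);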

Link: https://lkml.kernel.org/r/20250706193207.39810-7-sj@kernel.org
Fixes: 11ddcfc257a3 ("mm/damon/reclaim: use damon_commit_ctx()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/damon/lru_sort: reset enabled when DAMON start failed
SeongJae Park [Sun, 6 Jul 2025 19:32:06 +0000 (12:32 -0700)] 
mm/damon/lru_sort: reset enabled when DAMON start failed

When startup fails, the 'enabled' parameter is not reset.  As a result,
users see the parameter as 'Y' while it is not really working.  Fix it by
resetting 'enabled' to 'false' when startup fails.
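
Schematically (a sketch; the start-helper name is illustrative):

        err = damon_lru_sort_start();   /* hypothetical start helper */
        if (err)
                enabled = false;        /* keep the parameter truthful */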

Link: https://lkml.kernel.org/r/20250706193207.39810-6-sj@kernel.org
Fixes: 7a034fbba336 ("mm/damon/lru_sort: enable and disable synchronously")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/damon/reclaim: reset enabled when DAMON start failed
SeongJae Park [Sun, 6 Jul 2025 19:32:05 +0000 (12:32 -0700)] 
mm/damon/reclaim: reset enabled when DAMON start failed

When startup fails, the 'enabled' parameter is not reset.  As a result,
users see the parameter as 'Y' while it is not really working.  Fix it by
resetting 'enabled' to 'false' when startup fails.

Link: https://lkml.kernel.org/r/20250706193207.39810-5-sj@kernel.org
Fixes: 04e98764befa ("mm/damon/reclaim: enable and disable synchronously")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agosamples/damon/mtier: support boot time enable setup
SeongJae Park [Sun, 6 Jul 2025 19:32:04 +0000 (12:32 -0700)] 
samples/damon/mtier: support boot time enable setup

If the 'enable' parameter of the 'mtier' DAMON sample module is set at boot
time via the kernel command line, memory allocation is attempted before the
slab allocator is initialized.  As a result, a kernel NULL pointer
dereference BUG can happen.  Fix it by checking the initialization status.

Link: https://lkml.kernel.org/r/20250706193207.39810-4-sj@kernel.org
Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agosamples/damon/prcl: fix boot time enable crash
SeongJae Park [Sun, 6 Jul 2025 19:32:03 +0000 (12:32 -0700)] 
samples/damon/prcl: fix boot time enable crash

If the 'enable' parameter of the 'prcl' DAMON sample module is set at boot
time via the kernel command line, memory allocation is attempted before the
slab allocator is initialized.  As a result, a kernel NULL pointer
dereference BUG can happen.  Fix it by checking the initialization status.

Link: https://lkml.kernel.org/r/20250706193207.39810-3-sj@kernel.org
Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agosamples/damon/wsse: fix boot time enable handling
SeongJae Park [Sun, 6 Jul 2025 19:32:02 +0000 (12:32 -0700)] 
samples/damon/wsse: fix boot time enable handling

Patch series "mm/damon: fix misc bugs in DAMON modules".

From manual code review, I found the bugs below in DAMON modules.

DAMON sample modules crash if they are enabled at boot time via the
kernel command line.  A similar issue was found and fixed in DAMON
non-sample modules in the past, but we didn't check for it in the
sample modules.

DAMON non-sample modules are not setting their 'enabled' parameters
accordingly when the real enabling fails.  Honggyu found and fixed[1]
this type of bug in DAMON sample modules, and my inspection was motivated
by that great work.  Kudos to Honggyu.

Finally, DAMON_RECLAIM is mistakenly losing the scheme's internal status
due to misuse of damon_commit_ctx().  DAMON_LRU_SORT has a similar
misuse, but fortunately it is not causing real status loss.

Fix the bugs.  Since these are similar patterns of bugs that were found in
the past, it would be better to add tests or refactor the code in the future.

This patch (of 6):

If the 'enable' parameter of the 'wsse' DAMON sample module is set at boot
time via the kernel command line, memory allocation is attempted before the
slab allocator is initialized.  As a result, a kernel NULL pointer
dereference BUG can happen.  Fix it by checking the initialization status.
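
The fix pattern looks roughly like this (a sketch; the flag and the
start/stop helper names are illustrative):

        static bool init_called;        /* set at the end of module_init() */

        static int enabled_store(const char *val, const struct kernel_param *kp)
        {
                int err = param_set_bool(val, kp);

                if (err)
                        return err;
                if (!init_called)
                        return 0;       /* boot time: slab not ready, defer */
                return enabled ? wsse_start() : wsse_stop();
        }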

Link: https://lkml.kernel.org/r/20250706193207.39810-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250706193207.39810-2-sj@kernel.org
Link: https://lore.kernel.org/20250702000205.1921-1-honggyu.kim@sk.com
Fixes: b757c6cfc696 ("samples/damon/wsse: start and stop DAMON as the user requests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/damon: add trace event for effective size quota
SeongJae Park [Fri, 4 Jul 2025 22:14:08 +0000 (15:14 -0700)] 
mm/damon: add trace event for effective size quota

Aim-oriented DAMOS quota auto-tuning is an important and recommended
feature for DAMOS users.  Add a trace event for the observability of the
tuned quota and the tuning itself.

[sj@kernel.org: initialize sidx in damos_trace_esz()]
Link: https://lkml.kernel.org/r/20250705172003.52324-1-sj@kernel.org
[sj@kernel.org: make damos_esz unconditional trace event]
Link: https://lkml.kernel.org/r/20250709182843.35812-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250704221408.38510-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/damon: add trace event for auto-tuned monitoring intervals
SeongJae Park [Fri, 4 Jul 2025 22:14:07 +0000 (15:14 -0700)] 
mm/damon: add trace event for auto-tuned monitoring intervals

Patch series "mm/damon: add trace events for auto-tuned monitoring
intervals and DAMOS quota".

The aim-oriented auto-tuning features for monitoring intervals and DAMOS
quota are important and recommended.  Add tracepoints for observability
of those tuned values and the tuning itself.

This patch (of 2):

Aim-oriented monitoring intervals auto-tuning is an important and
recommended feature for DAMON users.  Add a trace event for the
observability of the tuned intervals and the tuning itself.

Link: https://lkml.kernel.org/r/20250704221408.38510-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250704221408.38510-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agokhugepaged: reduce race probability between migration and khugepaged
Dev Jain [Fri, 4 Jul 2025 04:04:17 +0000 (09:34 +0530)] 
khugepaged: reduce race probability between migration and khugepaged

Suppose a folio is under migration, and khugepaged is also trying to
collapse it.  collapse_pte_mapped_thp() will retrieve the folio from the
page cache via filemap_lock_folio(), thus taking a reference on the folio
and sleeping on the folio lock, since the lock is held by the migration
path.  Migration will then fail in __folio_migrate_mapping ->
folio_ref_freeze.  Reduce the probability of such a race happening
(leading to migration failure) by bailing out if we detect a PMD is marked
with a migration entry.
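
Conceptually, the bail-out is (a sketch; the surrounding collapse-path
context is omitted):

        pmd_t pmde = pmdp_get_lockless(pmd);

        if (is_pmd_migration_entry(pmde))
                goto drop_folio;        /* let migration finish first */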

This fixes the migration-shared-anon-thp testcase failure on Apple M3.

Note that this is not a "fix", since it only reduces the chance of
khugepaged interfering with migration; both kernel functionalities
are deemed "best-effort".

Link: https://lkml.kernel.org/r/20250704040417.63826-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agolib/test_vmalloc.c: introduce xfail for failing tests
Raghavendra K T [Wed, 2 Jul 2025 06:43:19 +0000 (06:43 +0000)] 
lib/test_vmalloc.c: introduce xfail for failing tests

The align_shift_alloc_test test is expected to fail.  Reporting it as a
failure makes it look like a genuine failure.  Introduce the widely used
xfail semantics to address the issue.

Note: a warn_alloc dump similar to below is still expected:

 Call Trace:
  <TASK>
  dump_stack_lvl+0x64/0x80
  warn_alloc+0x137/0x1b0
  ? __get_vm_area_node+0x134/0x140

Snippet of dmesg after change:

Summary: random_size_align_alloc_test passed: 1 failed: 0 xfailed: 0 ..
Summary: align_shift_alloc_test passed: 0 failed: 0 xfailed: 1 ..
Summary: pcpu_alloc_test passed: 1 failed: 0 xfailed: 0 ..
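
The accounting change is conceptually (a sketch; variable names are
illustrative, not the actual test_vmalloc code):

        if (!ok) {
                if (test->xfail)
                        xfailed++;      /* expected failure */
                else
                        failed++;       /* genuine failure */
        } else {
                passed++;
        }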

Link: https://lkml.kernel.org/r/20250702064319.885-1-raghavendra.kt@amd.com
Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Dev Jain <dev.jain@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/balloon_compaction: provide single balloon_page_insert() and balloon_mapping_gfp_m...
David Hildenbrand [Fri, 4 Jul 2025 10:25:23 +0000 (12:25 +0200)] 
mm/balloon_compaction: provide single balloon_page_insert() and balloon_mapping_gfp_mask()

Let's just special-case based on IS_ENABLED(CONFIG_BALLOON_COMPACTION)
like we did for balloon_page_finalize().

Link: https://lkml.kernel.org/r/20250704102524.326966-30-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/balloon_compaction: "movable_ops" doc updates
David Hildenbrand [Fri, 4 Jul 2025 10:25:22 +0000 (12:25 +0200)] 
mm/balloon_compaction: "movable_ops" doc updates

Let's bring the docs up-to-date.  Setting PG_movable_ops + page->private
very likely still needs to be performed under the documented locks: it's
complicated.

We will rework this in the future, as we will try avoiding using the page
lock.

Link: https://lkml.kernel.org/r/20250704102524.326966-29-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agodocs/mm: convert from "Non-LRU page migration" to "movable_ops page migration"
David Hildenbrand [Fri, 4 Jul 2025 10:25:21 +0000 (12:25 +0200)] 
docs/mm: convert from "Non-LRU page migration" to "movable_ops page migration"

Let's bring the docs up-to-date.

Link: https://lkml.kernel.org/r/20250704102524.326966-28-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: rename PAGE_MAPPING_* to FOLIO_MAPPING_*
David Hildenbrand [Fri, 4 Jul 2025 10:25:20 +0000 (12:25 +0200)] 
mm: rename PAGE_MAPPING_* to FOLIO_MAPPING_*

Now that the mapping flags are only used for folios, let's rename the
defines.

Link: https://lkml.kernel.org/r/20250704102524.326966-27-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: simplify folio_expected_ref_count()
David Hildenbrand [Fri, 4 Jul 2025 10:25:19 +0000 (12:25 +0200)] 
mm: simplify folio_expected_ref_count()

Now that PAGE_MAPPING_MOVABLE is gone, we can simplify and rely on the
folio_test_anon() test only.

... but staring at the users, this function should never even have been
called on movable_ops pages. E.g.,
* __buffer_migrate_folio() does not make sense for them
* folio_migrate_mapping() does not make sense for them
* migrate_huge_page_move_mapping() does not make sense for them
* __migrate_folio() does not make sense for them
* ... and khugepaged should never stumble over them

Let's simply refuse typed pages (which include slab) except hugetlb, and
WARN.
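
Roughly (a sketch; the refusal return value here is an assumption):

        /* typed pages (e.g. slab) never participate, except hugetlb */
        if (WARN_ON_ONCE(page_has_type(&folio->page) &&
                         !folio_test_hugetlb(folio)))
                return 0;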

Link: https://lkml.kernel.org/r/20250704102524.326966-26-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/page-flags: remove folio_mapping_flags()
David Hildenbrand [Fri, 4 Jul 2025 10:25:18 +0000 (12:25 +0200)] 
mm/page-flags: remove folio_mapping_flags()

It's unused and the page counterpart is gone, so let's remove it.

Link: https://lkml.kernel.org/r/20250704102524.326966-25-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/page-alloc: remove PageMappingFlags()
David Hildenbrand [Fri, 4 Jul 2025 10:25:17 +0000 (12:25 +0200)] 
mm/page-alloc: remove PageMappingFlags()

As PageMappingFlags() now only indicates anon (incl.  KSM) folios, we can
simply check for PageAnon() and remove PageMappingFlags().

...  and while at it, use the folio instead and operate on folio->mapping.

Link: https://lkml.kernel.org/r/20250704102524.326966-24-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/page-flags: rename PAGE_MAPPING_MOVABLE to PAGE_MAPPING_ANON_KSM
David Hildenbrand [Fri, 4 Jul 2025 10:25:16 +0000 (12:25 +0200)] 
mm/page-flags: rename PAGE_MAPPING_MOVABLE to PAGE_MAPPING_ANON_KSM

KSM is the only remaining user, let's rename the flag.  While at it,
adjust to remaining page -> folio in the doc.

Link: https://lkml.kernel.org/r/20250704102524.326966-23-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: rename PG_isolated to PG_movable_ops_isolated
David Hildenbrand [Fri, 4 Jul 2025 10:25:15 +0000 (12:25 +0200)] 
mm: rename PG_isolated to PG_movable_ops_isolated

Let's rename the flag to make it clearer where it applies (not folios
...).

While at it, define the flag only with CONFIG_MIGRATION.

Link: https://lkml.kernel.org/r/20250704102524.326966-22-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: convert "movable" flag in page->mapping to a page flag
David Hildenbrand [Fri, 4 Jul 2025 10:25:14 +0000 (12:25 +0200)] 
mm: convert "movable" flag in page->mapping to a page flag

Instead, let's use a page flag.  As the page flag can result in
false-positives, glue it to the page types for which we support/implement
movable_ops page migration.

We are reusing PG_uptodate, which is for example used to track file
system state and does not apply to movable_ops pages.  So warning when
it is set in page_has_movable_ops() on other page types could result in
false-positive warnings.

Likely we could set the bit using a non-atomic update: in contrast to
page->mapping, we could have others trying to update the flags
concurrently when trying to lock the folio.  In
isolate_movable_ops_page(), we already take care of that by checking if
the page has movable_ops before locking it.  Let's start with the atomic
variant; we could later switch to the non-atomic variant once we are sure
other cases are similarly fine.  Once we perform the switch, we'll have to
introduce __SETPAGEFLAG_NOOP().
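
Glued to the supported page types, the resulting test is conceptually (a
sketch, not necessarily the final helper; the two page types follow the
balloon and zsmalloc users of this series):

        static inline bool page_has_movable_ops(const struct page *page)
        {
                /* PG_movable_ops aliases PG_uptodate; types disambiguate */
                return PageMovableOps(page) &&
                       (PageOffline(page) || PageZsmalloc(page));
        }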

[david@redhat.com: add missing `:' in kerneldoc]
Link: https://lkml.kernel.org/r/d96e2916-2c43-462c-b6a1-2375ef397d8b@redhat.com
Link: https://lkml.kernel.org/r/20250704102524.326966-21-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: stop storing migration_ops in page->mapping
David Hildenbrand [Fri, 4 Jul 2025 10:25:13 +0000 (12:25 +0200)] 
mm: stop storing migration_ops in page->mapping

...  instead, look them up statically based on the page type.  Maybe in
the future we want a registration interface?  At least for now, it can be
easily handled using the two page types that actually support page
migration.

The remaining usage of page->mapping is to flag such pages as actually
being movable (having movable_ops), which we will change next.
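
The static lookup is then along these lines (a sketch; the zsmalloc ops
symbol name is an assumption):

        static const struct movable_operations *page_movable_ops(struct page *page)
        {
                if (PageOffline(page))
                        return &balloon_mops;   /* balloon compaction */
                if (PageZsmalloc(page))
                        return &zsmalloc_mops;  /* assumed symbol name */
                return NULL;
        }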

Link: https://lkml.kernel.org/r/20250704102524.326966-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: remove __folio_test_movable()
David Hildenbrand [Fri, 4 Jul 2025 10:25:12 +0000 (12:25 +0200)] 
mm: remove __folio_test_movable()

Convert to page_has_movable_ops().  While at it, cleanup relevant code a
bit.

The data_race() in migrate_folio_unmap() is questionable: we already hold
a page reference, and concurrent modifications can no longer happen (iow:
__ClearPageMovable() no longer exists).  Drop it for now; we'll rework
page_has_movable_ops() soon either way to no longer rely on page->mapping.

Wherever we cast from folio to page now is a clear sign that this code has
to be decoupled.

Link: https://lkml.kernel.org/r/20250704102524.326966-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm/page_isolation: drop __folio_test_movable() check for large folios
David Hildenbrand [Fri, 4 Jul 2025 10:25:11 +0000 (12:25 +0200)] 
mm/page_isolation: drop __folio_test_movable() check for large folios

Currently, we only support migration of individual movable_ops pages, so
we cannot run into that.

Link: https://lkml.kernel.org/r/20250704102524.326966-18-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>