--- /dev/null
+From 41cddf83d8b00f29fd105e7a0777366edc69a5cf Mon Sep 17 00:00:00 2001
+From: David Hildenbrand <david@redhat.com>
+Date: Mon, 10 Feb 2025 17:13:17 +0100
+Subject: mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: David Hildenbrand <david@redhat.com>
+
+commit 41cddf83d8b00f29fd105e7a0777366edc69a5cf upstream.
+
+If migration succeeded, we called
+folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from the
+old to the new folio. This will set memcg_data of the old folio to 0.
+
+Similarly, if migration failed, memcg_data of the dst folio is left unset.
+
+If we call folio_putback_lru() on such folios (memcg_data == 0), we will
+add the folio to be freed to the LRU, which triggers the
+VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()) check reported
+from folio_lruvec_lock_irqsave(). Running the hmm selftests:
+
+ # ./hmm-tests
+ ...
+ # RUN hmm.hmm_device_private.migrate ...
+ [ 102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00
+ [ 102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
+ [ 102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9
+ [ 102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000
+ [ 102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
+ [ 102.087230][T14893] ------------[ cut here ]------------
+ [ 102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170
+ [ 102.090478][T14893] Modules linked in:
+ [ 102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151
+ [ 102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
+ [ 102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170
+ [ 102.096104][T14893] Code: ...
+ [ 102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293
+ [ 102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426
+ [ 102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880
+ [ 102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
+ [ 102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8
+ [ 102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000
+ [ 102.108830][T14893] FS: 00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
+ [ 102.110643][T14893] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ [ 102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0
+ [ 102.113478][T14893] PKRU: 55555554
+ [ 102.114172][T14893] Call Trace:
+ [ 102.114805][T14893] <TASK>
+ [ 102.115397][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170
+ [ 102.116547][T14893] ? __warn.cold+0x110/0x210
+ [ 102.117461][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170
+ [ 102.118667][T14893] ? report_bug+0x1b9/0x320
+ [ 102.119571][T14893] ? handle_bug+0x54/0x90
+ [ 102.120494][T14893] ? exc_invalid_op+0x17/0x50
+ [ 102.121433][T14893] ? asm_exc_invalid_op+0x1a/0x20
+ [ 102.122435][T14893] ? __wake_up_klogd.part.0+0x76/0xd0
+ [ 102.123506][T14893] ? dump_page+0x4f/0x60
+ [ 102.124352][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170
+ [ 102.125500][T14893] folio_batch_move_lru+0xd4/0x200
+ [ 102.126577][T14893] ? __pfx_lru_add+0x10/0x10
+ [ 102.127505][T14893] __folio_batch_add_and_move+0x391/0x720
+ [ 102.128633][T14893] ? __pfx_lru_add+0x10/0x10
+ [ 102.129550][T14893] folio_putback_lru+0x16/0x80
+ [ 102.130564][T14893] migrate_device_finalize+0x9b/0x530
+ [ 102.131640][T14893] dmirror_migrate_to_device.constprop.0+0x7c5/0xad0
+ [ 102.133047][T14893] dmirror_fops_unlocked_ioctl+0x89b/0xc80
+
+Likely, nothing else goes wrong: putting the last folio reference will
+remove the folio from the LRU again. So besides memcg complaining, adding
+the folio to be freed to the LRU is just an unnecessary step.
+
+The new flow resembles what we have in migrate_folio_move(): add the dst
+folio to the LRU, remove the migration PTEs, then unlock and unref dst.
+
+Link: https://lkml.kernel.org/r/20250210161317.717936-1-david@redhat.com
+Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Cc: Jérôme Glisse <jglisse@redhat.com>
+Cc: John Hubbard <jhubbard@nvidia.com>
+Cc: Alistair Popple <apopple@nvidia.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: David Hildenbrand <david@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/migrate.c | 12 ++++--------
+ 1 file changed, 4 insertions(+), 8 deletions(-)
+
+--- a/mm/migrate.c
++++ b/mm/migrate.c
+@@ -2967,21 +2967,17 @@ void migrate_vma_finalize(struct migrate
+ 			newpage = page;
+ 		}
+
++		if (!is_zone_device_page(newpage))
++			lru_cache_add(newpage);
+ 		remove_migration_ptes(page, newpage, false);
+ 		unlock_page(page);
+ 		migrate->cpages--;
+
+-		if (is_zone_device_page(page))
+-			put_page(page);
+-		else
+-			putback_lru_page(page);
++		put_page(page);
+
+ 		if (newpage != page) {
+ 			unlock_page(newpage);
+-			if (is_zone_device_page(newpage))
+-				put_page(newpage);
+-			else
+-				putback_lru_page(newpage);
++			put_page(newpage);
+ 		}
+ 	}
+ }