From: Greg Kroah-Hartman Date: Mon, 21 Jul 2025 14:09:27 +0000 (+0200) Subject: 6.12-stable patches X-Git-Tag: v6.1.147~47 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=a265c8a52b512f659cf86dabeb2ffb544c60d20e;p=thirdparty%2Fkernel%2Fstable-queue.git 6.12-stable patches added patches: btrfs-fix-block-group-refcount-race-in-btrfs_create_pending_block_groups.patch clone_private_mnt-make-sure-that-caller-has-cap_sys_admin-in-the-right-userns.patch sched-change-nr_uninterruptible-type-to-unsigned-long.patch --- diff --git a/queue-6.12/btrfs-fix-block-group-refcount-race-in-btrfs_create_pending_block_groups.patch b/queue-6.12/btrfs-fix-block-group-refcount-race-in-btrfs_create_pending_block_groups.patch new file mode 100644 index 0000000000..fcf6f2ce8e --- /dev/null +++ b/queue-6.12/btrfs-fix-block-group-refcount-race-in-btrfs_create_pending_block_groups.patch @@ -0,0 +1,108 @@ +From 2d8e5168d48a91e7a802d3003e72afb4304bebfa Mon Sep 17 00:00:00 2001 +From: Boris Burkov +Date: Wed, 5 Mar 2025 15:03:13 -0800 +Subject: btrfs: fix block group refcount race in btrfs_create_pending_block_groups() + +From: Boris Burkov + +commit 2d8e5168d48a91e7a802d3003e72afb4304bebfa upstream. + +Block group creation is done in two phases, which results in a slightly +unintuitive property: a block group can be allocated/deallocated from +after btrfs_make_block_group() adds it to the space_info with +btrfs_add_bg_to_space_info(), but before creation is completely completed +in btrfs_create_pending_block_groups(). As a result, it is possible for a +block group to go unused and have 'btrfs_mark_bg_unused' called on it +concurrently with 'btrfs_create_pending_block_groups'. This causes a +number of issues, which were fixed with the block group flag +'BLOCK_GROUP_FLAG_NEW'. + +However, this fix is not quite complete. 
Since it does not use the +unused_bg_lock, it is possible for the following race to occur: + +btrfs_create_pending_block_groups btrfs_mark_bg_unused + if list_empty // false + list_del_init + clear_bit + else if (test_bit) // true + list_move_tail + +And we get into the exact same broken ref count and invalid new_bgs +state for transaction cleanup that BLOCK_GROUP_FLAG_NEW was designed to +prevent. + +The broken refcount aspect will result in a warning like: + + [1272.943527] refcount_t: underflow; use-after-free. + [1272.943967] WARNING: CPU: 1 PID: 61 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110 + [1272.944731] Modules linked in: btrfs virtio_net xor zstd_compress raid6_pq null_blk [last unloaded: btrfs] + [1272.945550] CPU: 1 UID: 0 PID: 61 Comm: kworker/u32:1 Kdump: loaded Tainted: G W 6.14.0-rc5+ #108 + [1272.946368] Tainted: [W]=WARN + [1272.946585] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 + [1272.947273] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] + [1272.947788] RIP: 0010:refcount_warn_saturate+0xba/0x110 + [1272.949532] RSP: 0018:ffffbf1200247df0 EFLAGS: 00010282 + [1272.949901] RAX: 0000000000000000 RBX: ffffa14b00e3f800 RCX: 0000000000000000 + [1272.950437] RDX: 0000000000000000 RSI: ffffbf1200247c78 RDI: 00000000ffffdfff + [1272.950986] RBP: ffffa14b00dc2860 R08: 00000000ffffdfff R09: ffffffff90526268 + [1272.951512] R10: ffffffff904762c0 R11: 0000000063666572 R12: ffffa14b00dc28c0 + [1272.952024] R13: 0000000000000000 R14: ffffa14b00dc2868 R15: 000001285dcd12c0 + [1272.952850] FS: 0000000000000000(0000) GS:ffffa14d33c40000(0000) knlGS:0000000000000000 + [1272.953458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + [1272.953931] CR2: 00007f838cbda000 CR3: 000000010104e000 CR4: 00000000000006f0 + [1272.954474] Call Trace: + [1272.954655] <TASK> + [1272.954812] ? refcount_warn_saturate+0xba/0x110 + [1272.955173] ? __warn.cold+0x93/0xd7 + [1272.955487] ? 
refcount_warn_saturate+0xba/0x110 + [1272.955816] ? report_bug+0xe7/0x120 + [1272.956103] ? handle_bug+0x53/0x90 + [1272.956424] ? exc_invalid_op+0x13/0x60 + [1272.956700] ? asm_exc_invalid_op+0x16/0x20 + [1272.957011] ? refcount_warn_saturate+0xba/0x110 + [1272.957399] btrfs_discard_cancel_work.cold+0x26/0x2b [btrfs] + [1272.957853] btrfs_put_block_group.cold+0x5d/0x8e [btrfs] + [1272.958289] btrfs_discard_workfn+0x194/0x380 [btrfs] + [1272.958729] process_one_work+0x130/0x290 + [1272.959026] worker_thread+0x2ea/0x420 + [1272.959335] ? __pfx_worker_thread+0x10/0x10 + [1272.959644] kthread+0xd7/0x1c0 + [1272.959872] ? __pfx_kthread+0x10/0x10 + [1272.960172] ret_from_fork+0x30/0x50 + [1272.960474] ? __pfx_kthread+0x10/0x10 + [1272.960745] ret_from_fork_asm+0x1a/0x30 + [1272.961035] </TASK> + [1272.961238] ---[ end trace 0000000000000000 ]--- + +Though we have seen them in the async discard workfn as well. It is +most likely to happen after a relocation finishes which cancels discard, +tears down the block group, etc. + +Fix this fully by taking the lock around the list_del_init + clear_bit +so that the two are done atomically. + +Fixes: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused") +Reviewed-by: Qu Wenruo +Reviewed-by: Filipe Manana +Signed-off-by: Boris Burkov +Signed-off-by: David Sterba +Signed-off-by: Alva Lan +Signed-off-by: Greg Kroah-Hartman +--- + fs/btrfs/block-group.c | 3 +++ + 1 file changed, 3 insertions(+) + +--- a/fs/btrfs/block-group.c ++++ b/fs/btrfs/block-group.c +@@ -2780,8 +2780,11 @@ void btrfs_create_pending_block_groups(s + /* Already aborted the transaction if it failed. 
*/ + next: + btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info); ++ ++ spin_lock(&fs_info->unused_bgs_lock); + list_del_init(&block_group->bg_list); + clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags); ++ spin_unlock(&fs_info->unused_bgs_lock); + + /* + * If the block group is still unused, add it to the list of diff --git a/queue-6.12/clone_private_mnt-make-sure-that-caller-has-cap_sys_admin-in-the-right-userns.patch b/queue-6.12/clone_private_mnt-make-sure-that-caller-has-cap_sys_admin-in-the-right-userns.patch new file mode 100644 index 0000000000..ba4d4667df --- /dev/null +++ b/queue-6.12/clone_private_mnt-make-sure-that-caller-has-cap_sys_admin-in-the-right-userns.patch @@ -0,0 +1,48 @@ +From c28f922c9dcee0e4876a2c095939d77fe7e15116 Mon Sep 17 00:00:00 2001 +From: Al Viro +Date: Sun, 1 Jun 2025 20:11:06 -0400 +Subject: clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns + +From: Al Viro + +commit c28f922c9dcee0e4876a2c095939d77fe7e15116 upstream. + +What we want is to verify there is that clone won't expose something +hidden by a mount we wouldn't be able to undo. "Wouldn't be able to undo" +may be a result of MNT_LOCKED on a child, but it may also come from +lacking admin rights in the userns of the namespace mount belongs to. + +clone_private_mnt() checks the former, but not the latter. + +There's a number of rather confusing CAP_SYS_ADMIN checks in various +userns during the mount, especially with the new mount API; they serve +different purposes and in case of clone_private_mnt() they usually, +but not always end up covering the missing check mentioned above. + +Reviewed-by: Christian Brauner +Reported-by: "Orlando, Noah" +Fixes: 427215d85e8d ("ovl: prevent private clone if bind mount is not allowed") +Signed-off-by: Al Viro +[ merge conflict resolution: clone_private_mount() was reworked in + db04662e2f4f ("fs: allow detached mounts in clone_private_mount()"). 
+ Tweak the relevant ns_capable check so that it works on older kernels ] +Signed-off-by: Noah Orlando +Signed-off-by: Greg Kroah-Hartman +--- + fs/namespace.c | 5 +++++ + 1 file changed, 5 insertions(+) + +--- a/fs/namespace.c ++++ b/fs/namespace.c +@@ -2263,6 +2263,11 @@ struct vfsmount *clone_private_mount(con + if (!check_mnt(old_mnt)) + goto invalid; + ++ if (!ns_capable(old_mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)) { ++ up_read(&namespace_sem); ++ return ERR_PTR(-EPERM); ++ } ++ + if (has_locked_children(old_mnt, path->dentry)) + goto invalid; + diff --git a/queue-6.12/sched-change-nr_uninterruptible-type-to-unsigned-long.patch b/queue-6.12/sched-change-nr_uninterruptible-type-to-unsigned-long.patch new file mode 100644 index 0000000000..99097943c8 --- /dev/null +++ b/queue-6.12/sched-change-nr_uninterruptible-type-to-unsigned-long.patch @@ -0,0 +1,54 @@ +From 36569780b0d64de283f9d6c2195fd1a43e221ee8 Mon Sep 17 00:00:00 2001 +From: Aruna Ramakrishna +Date: Wed, 9 Jul 2025 17:33:28 +0000 +Subject: sched: Change nr_uninterruptible type to unsigned long + +From: Aruna Ramakrishna + +commit 36569780b0d64de283f9d6c2195fd1a43e221ee8 upstream. + +The commit e6fe3f422be1 ("sched: Make multiple runqueue task counters +32-bit") changed nr_uninterruptible to an unsigned int. But the +nr_uninterruptible values for each of the CPU runqueues can grow to +large numbers, sometimes exceeding INT_MAX. This is valid, if, over +time, a large number of tasks are migrated off of one CPU after going +into an uninterruptible state. Only the sum of all nr_interruptible +values across all CPUs yields the correct result, as explained in a +comment in kernel/sched/loadavg.c. + +Change the type of nr_uninterruptible back to unsigned long to prevent +overflows, and thus the miscalculation of load average. 
+ +Fixes: e6fe3f422be1 ("sched: Make multiple runqueue task counters 32-bit") + +Signed-off-by: Aruna Ramakrishna +Signed-off-by: Peter Zijlstra (Intel) +Link: https://lkml.kernel.org/r/20250709173328.606794-1-aruna.ramakrishna@oracle.com +Signed-off-by: Greg Kroah-Hartman +--- + kernel/sched/loadavg.c | 2 +- + kernel/sched/sched.h | 2 +- + 2 files changed, 2 insertions(+), 2 deletions(-) + +--- a/kernel/sched/loadavg.c ++++ b/kernel/sched/loadavg.c +@@ -80,7 +80,7 @@ long calc_load_fold_active(struct rq *th + long nr_active, delta = 0; + + nr_active = this_rq->nr_running - adjust; +- nr_active += (int)this_rq->nr_uninterruptible; ++ nr_active += (long)this_rq->nr_uninterruptible; + + if (nr_active != this_rq->calc_load_active) { + delta = nr_active - this_rq->calc_load_active; +--- a/kernel/sched/sched.h ++++ b/kernel/sched/sched.h +@@ -1156,7 +1156,7 @@ struct rq { + * one CPU and if it got migrated afterwards it may decrease + * it on another CPU. Always updated under the runqueue lock: + */ +- unsigned int nr_uninterruptible; ++ unsigned long nr_uninterruptible; + + struct task_struct __rcu *curr; + struct sched_dl_entity *dl_server; diff --git a/queue-6.12/series b/queue-6.12/series index a501eff194..2b04334bdc 100644 --- a/queue-6.12/series +++ b/queue-6.12/series @@ -138,3 +138,5 @@ drm-mediatek-only-announce-afbc-if-really-supported.patch libbpf-fix-handling-of-bpf-arena-relocations.patch efivarfs-fix-memory-leak-of-efivarfs_fs_info-in-fs_c.patch sched-change-nr_uninterruptible-type-to-unsigned-long.patch +clone_private_mnt-make-sure-that-caller-has-cap_sys_admin-in-the-right-userns.patch +btrfs-fix-block-group-refcount-race-in-btrfs_create_pending_block_groups.patch