From: Greg Kroah-Hartman Date: Mon, 30 Mar 2026 11:38:24 +0000 (+0200) Subject: 6.19-stable patches X-Git-Tag: v6.6.131~28 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=65678e0462f6dedea484ab5d2530ff164dc350d9;p=thirdparty%2Fkernel%2Fstable-queue.git 6.19-stable patches added patches: ext4-always-drain-queued-discard-work-in-ext4_mb_release.patch ext4-avoid-allocate-block-from-corrupted-group-in-ext4_mb_find_by_goal.patch ext4-avoid-infinite-loops-caused-by-residual-data.patch ext4-convert-inline-data-to-extents-when-truncate-exceeds-inline-size.patch ext4-do-not-check-fast-symlink-during-orphan-recovery.patch ext4-fix-fsync-2-for-nojournal-mode.patch ext4-fix-iloc.bh-leak-in-ext4_fc_replay_inode-error-paths.patch ext4-fix-journal-credit-check-when-setting-fscrypt-context.patch ext4-fix-stale-xarray-tags-after-writeback.patch ext4-fix-the-might_sleep-warnings-in-kvfree.patch ext4-fix-use-after-free-in-update_super_work-when-racing-with-umount.patch ext4-handle-wraparound-when-searching-for-blocks-for-indirect-mapped-blocks.patch ext4-make-recently_deleted-properly-work-with-lazy-itable-initialization.patch ext4-publish-jinode-after-initialization.patch ext4-reject-mount-if-bigalloc-with-s_first_data_block-0.patch ext4-replace-bug_on-with-proper-error-handling-in-ext4_read_inline_folio.patch ext4-test-if-inode-s-all-dirty-pages-are-submitted-to-disk.patch ext4-validate-p_idx-bounds-in-ext4_ext_correct_indexes.patch --- diff --git a/queue-6.19/ext4-always-drain-queued-discard-work-in-ext4_mb_release.patch b/queue-6.19/ext4-always-drain-queued-discard-work-in-ext4_mb_release.patch new file mode 100644 index 0000000000..ff891c8f31 --- /dev/null +++ b/queue-6.19/ext4-always-drain-queued-discard-work-in-ext4_mb_release.patch @@ -0,0 +1,65 @@ +From 9ee29d20aab228adfb02ca93f87fb53c56c2f3af Mon Sep 17 00:00:00 2001 +From: Theodore Ts'o +Date: Fri, 27 Mar 2026 02:13:15 -0400 +Subject: ext4: always drain queued discard work in ext4_mb_release() + 
+From: Theodore Ts'o + +commit 9ee29d20aab228adfb02ca93f87fb53c56c2f3af upstream. + +While reviewing recent ext4 patch[1], Sashiko raised the following +concern[2]: + +> If the filesystem is initially mounted with the discard option, +> deleting files will populate sbi->s_discard_list and queue +> s_discard_work. If it is then remounted with nodiscard, the +> EXT4_MOUNT_DISCARD flag is cleared, but the pending s_discard_work is +> neither cancelled nor flushed. + +[1] https://lore.kernel.org/r/20260319094545.19291-1-qiang.zhang@linux.dev/ +[2] https://sashiko.dev/#/patchset/20260319094545.19291-1-qiang.zhang%40linux.dev + +The concern was valid, but it had nothing to do with the patch[1]. +One of the problems with Sashiko in its current (early) form is that +it will detect pre-existing issues and report it as a problem with the +patch that it is reviewing. + +In practice, it would be hard to hit deliberately (unless you are a +malicious syzkaller fuzzer), since it would involve mounting the file +system with -o discard, and then deleting a large number of files, +remounting the file system with -o nodiscard, and then immediately +unmounting the file system before the queued discard work has a chance +to drain on its own. + +Fix it because it's a real bug, and to avoid Sashiko from raising this +concern when analyzing future patches to mballoc.c. 
+ +Signed-off-by: Theodore Ts'o +Fixes: 55cdd0af2bc5 ("ext4: get discard out of jbd2 commit kthread contex") +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/mballoc.c | 12 +++++------- + 1 file changed, 5 insertions(+), 7 deletions(-) + +--- a/fs/ext4/mballoc.c ++++ b/fs/ext4/mballoc.c +@@ -3895,13 +3895,11 @@ void ext4_mb_release(struct super_block + struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits); + int count; + +- if (test_opt(sb, DISCARD)) { +- /* +- * wait the discard work to drain all of ext4_free_data +- */ +- flush_work(&sbi->s_discard_work); +- WARN_ON_ONCE(!list_empty(&sbi->s_discard_list)); +- } ++ /* ++ * wait the discard work to drain all of ext4_free_data ++ */ ++ flush_work(&sbi->s_discard_work); ++ WARN_ON_ONCE(!list_empty(&sbi->s_discard_list)); + + group_info = rcu_access_pointer(sbi->s_group_info); + if (group_info) { diff --git a/queue-6.19/ext4-avoid-allocate-block-from-corrupted-group-in-ext4_mb_find_by_goal.patch b/queue-6.19/ext4-avoid-allocate-block-from-corrupted-group-in-ext4_mb_find_by_goal.patch new file mode 100644 index 0000000000..a39066a6b8 --- /dev/null +++ b/queue-6.19/ext4-avoid-allocate-block-from-corrupted-group-in-ext4_mb_find_by_goal.patch @@ -0,0 +1,92 @@ +From 46066e3a06647c5b186cc6334409722622d05c44 Mon Sep 17 00:00:00 2001 +From: Ye Bin +Date: Mon, 2 Mar 2026 21:46:19 +0800 +Subject: ext4: avoid allocate block from corrupted group in ext4_mb_find_by_goal() + +From: Ye Bin + +commit 46066e3a06647c5b186cc6334409722622d05c44 upstream. + +There's issue as follows: +... +EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 +EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost + +EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 +EXT4-fs (mmcblk0p1): This should not happen!! 
Data will be lost + +EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 +EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost + +EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 206 at logical offset 0 with max blocks 1 with error 117 +EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost + +EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2243 at logical offset 0 with max blocks 1 with error 117 +EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost + +EXT4-fs (mmcblk0p1): Delayed block allocation failed for inode 2239 at logical offset 0 with max blocks 1 with error 117 +EXT4-fs (mmcblk0p1): This should not happen!! Data will be lost + +EXT4-fs (mmcblk0p1): error count since last fsck: 1 +EXT4-fs (mmcblk0p1): initial error at time 1765597433: ext4_mb_generate_buddy:760 +EXT4-fs (mmcblk0p1): last error at time 1765597433: ext4_mb_generate_buddy:760 +... + +According to the log analysis, blocks are always requested from the +corrupted block group. This may happen as follows: +ext4_mb_find_by_goal + ext4_mb_load_buddy + ext4_mb_load_buddy_gfp + ext4_mb_init_cache + ext4_read_block_bitmap_nowait + ext4_wait_block_bitmap + ext4_validate_block_bitmap + if (!grp || EXT4_MB_GRP_BBITMAP_CORRUPT(grp)) + return -EFSCORRUPTED; // There's no logs. + if (err) + return err; // Will return error +ext4_lock_group(ac->ac_sb, group); + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) // Unreachable + goto out; + +After commit 9008a58e5dce ("ext4: make the bitmap read routines return +real error codes") merged, Commit 163a203ddb36 ("ext4: mark block group +as corrupt on block bitmap error") is no real solution for allocating +blocks from corrupted block groups. This is because if +'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' is true, then +'ext4_mb_load_buddy()' may return an error. This means that the block +allocation will fail. 
+Therefore, check block group if corrupted when ext4_mb_load_buddy() +returns error. + +Fixes: 163a203ddb36 ("ext4: mark block group as corrupt on block bitmap error") +Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes") +Signed-off-by: Ye Bin +Reviewed-by: Ritesh Harjani (IBM) +Reviewed-by: Zhang Yi +Reviewed-by: Andreas Dilger +Reviewed-by: Jan Kara +Link: https://patch.msgid.link/20260302134619.3145520-1-yebin@huaweicloud.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/mballoc.c | 6 +++++- + 1 file changed, 5 insertions(+), 1 deletion(-) + +--- a/fs/ext4/mballoc.c ++++ b/fs/ext4/mballoc.c +@@ -2443,8 +2443,12 @@ int ext4_mb_find_by_goal(struct ext4_all + return 0; + + err = ext4_mb_load_buddy(ac->ac_sb, group, e4b); +- if (err) ++ if (err) { ++ if (EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info) && ++ !(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY)) ++ return 0; + return err; ++ } + + ext4_lock_group(ac->ac_sb, group); + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) diff --git a/queue-6.19/ext4-avoid-infinite-loops-caused-by-residual-data.patch b/queue-6.19/ext4-avoid-infinite-loops-caused-by-residual-data.patch new file mode 100644 index 0000000000..91595e5a27 --- /dev/null +++ b/queue-6.19/ext4-avoid-infinite-loops-caused-by-residual-data.patch @@ -0,0 +1,81 @@ +From 5422fe71d26d42af6c454ca9527faaad4e677d6c Mon Sep 17 00:00:00 2001 +From: Edward Adam Davis +Date: Fri, 6 Mar 2026 09:31:58 +0800 +Subject: ext4: avoid infinite loops caused by residual data + +From: Edward Adam Davis + +commit 5422fe71d26d42af6c454ca9527faaad4e677d6c upstream. 
+ +On the mkdir/mknod path, when mapping logical blocks to physical blocks, +if inserting a new extent into the extent tree fails (in this example, +because the file system disabled the huge file feature when marking the +inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to +reclaim the physical block without deleting the corresponding data in +the extent tree. This causes subsequent mkdir operations to reference +the previously reclaimed physical block number again, even though this +physical block is already being used by the xattr block. Therefore, a +situation arises where both the directory and xattr are using the same +buffer head block in memory simultaneously. + +The above causes ext4_xattr_block_set() to enter an infinite loop about +"inserted" and cannot release the inode lock, ultimately leading to the +143s blocking problem mentioned in [1]. + +If the metadata is corrupted, then trying to remove some extent space +can do even more harm. Also in case EXT4_GET_BLOCKS_DELALLOC_RESERVE +was passed, remove space wrongly update quota information. +Jan Kara suggests distinguishing between two cases: + +1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully +consistent and we must maintain its consistency including all the +accounting. However these errors can happen only early before we've +inserted the extent into the extent tree. So current code works correctly +for this case. + +2) Some other error - this means metadata is corrupted. We should strive to +do as few modifications as possible to limit damage. So I'd just skip +freeing of allocated blocks. + +[1] +INFO: task syz.0.17:5995 blocked for more than 143 seconds. 
+Call Trace: + inode_lock_nested include/linux/fs.h:1073 [inline] + __start_dirop fs/namei.c:2923 [inline] + start_dirop fs/namei.c:2934 [inline] + +Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7 +Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com +Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f +Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com +Signed-off-by: Edward Adam Davis +Reviewed-by: Jan Kara +Tested-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com +Link: https://patch.msgid.link/tencent_43696283A68450B761D76866C6F360E36705@qq.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/extents.c | 8 ++++++-- + 1 file changed, 6 insertions(+), 2 deletions(-) + +--- a/fs/ext4/extents.c ++++ b/fs/ext4/extents.c +@@ -4461,9 +4461,13 @@ got_allocated_blocks: + path = ext4_ext_insert_extent(handle, inode, path, &newex, flags); + if (IS_ERR(path)) { + err = PTR_ERR(path); +- if (allocated_clusters) { ++ /* ++ * Gracefully handle out of space conditions. If the filesystem ++ * is inconsistent, we'll just leak allocated blocks to avoid ++ * causing even more damage. ++ */ ++ if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC)) { + int fb_flags = 0; +- + /* + * free data blocks we just allocated. 
+ * not a good idea to call discard here directly, diff --git a/queue-6.19/ext4-convert-inline-data-to-extents-when-truncate-exceeds-inline-size.patch b/queue-6.19/ext4-convert-inline-data-to-extents-when-truncate-exceeds-inline-size.patch new file mode 100644 index 0000000000..07b905ea5b --- /dev/null +++ b/queue-6.19/ext4-convert-inline-data-to-extents-when-truncate-exceeds-inline-size.patch @@ -0,0 +1,67 @@ +From ed9356a30e59c7cc3198e7fc46cfedf3767b9b17 Mon Sep 17 00:00:00 2001 +From: Deepanshu Kartikey +Date: Sat, 7 Feb 2026 10:06:07 +0530 +Subject: ext4: convert inline data to extents when truncate exceeds inline size + +From: Deepanshu Kartikey + +commit ed9356a30e59c7cc3198e7fc46cfedf3767b9b17 upstream. + +Add a check in ext4_setattr() to convert files from inline data storage +to extent-based storage when truncate() grows the file size beyond the +inline capacity. This prevents the filesystem from entering an +inconsistent state where the inline data flag is set but the file size +exceeds what can be stored inline. + +Without this fix, the following sequence causes a kernel BUG_ON(): + +1. Mount filesystem with inode that has inline flag set and small size +2. truncate(file, 50MB) - grows size but inline flag remains set +3. sendfile() attempts to write data +4. ext4_write_inline_data() hits BUG_ON(write_size > inline_capacity) + +The crash occurs because ext4_write_inline_data() expects inline storage +to accommodate the write, but the actual inline capacity (~60 bytes for +i_block + ~96 bytes for xattrs) is far smaller than the file size and +write request. + +The fix checks if the new size from setattr exceeds the inode's actual +inline capacity (EXT4_I(inode)->i_inline_size) and converts the file to +extent-based storage before proceeding with the size change. + +This addresses the root cause by ensuring the inline data flag and file +size remain consistent during truncate operations. 
+ +Reported-by: syzbot+7de5fe447862fc37576f@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=7de5fe447862fc37576f +Tested-by: syzbot+7de5fe447862fc37576f@syzkaller.appspotmail.com +Signed-off-by: Deepanshu Kartikey +Link: https://patch.msgid.link/20260207043607.1175976-1-kartikey406@gmail.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/inode.c | 12 ++++++++++++ + 1 file changed, 12 insertions(+) + +--- a/fs/ext4/inode.c ++++ b/fs/ext4/inode.c +@@ -5901,6 +5901,18 @@ int ext4_setattr(struct mnt_idmap *idmap + if (attr->ia_size == inode->i_size) + inc_ivers = false; + ++ /* ++ * If file has inline data but new size exceeds inline capacity, ++ * convert to extent-based storage first to prevent inconsistent ++ * state (inline flag set but size exceeds inline capacity). ++ */ ++ if (ext4_has_inline_data(inode) && ++ attr->ia_size > EXT4_I(inode)->i_inline_size) { ++ error = ext4_convert_inline_data(inode); ++ if (error) ++ goto err_out; ++ } ++ + if (shrink) { + if (ext4_should_order_data(inode)) { + error = ext4_begin_ordered_truncate(inode, diff --git a/queue-6.19/ext4-do-not-check-fast-symlink-during-orphan-recovery.patch b/queue-6.19/ext4-do-not-check-fast-symlink-during-orphan-recovery.patch new file mode 100644 index 0000000000..0c6634eda7 --- /dev/null +++ b/queue-6.19/ext4-do-not-check-fast-symlink-during-orphan-recovery.patch @@ -0,0 +1,90 @@ +From 84e21e3fb8fd99ea460eb7274584750d11cf3e9f Mon Sep 17 00:00:00 2001 +From: Zhang Yi +Date: Sat, 31 Jan 2026 17:11:56 +0800 +Subject: ext4: do not check fast symlink during orphan recovery + +From: Zhang Yi + +commit 84e21e3fb8fd99ea460eb7274584750d11cf3e9f upstream. + +Commit '5f920d5d6083 ("ext4: verify fast symlink length")' causes the +generic/475 test to fail during orphan cleanup of zero-length symlinks. + + generic/475 84s ... 
_check_generic_filesystem: filesystem on /dev/vde is inconsistent + +The fsck reports are provided below: + + Deleted inode 9686 has zero dtime. + Deleted inode 158230 has zero dtime. + ... + Inode bitmap differences: -9686 -158230 + Orphan file (inode 12) block 13 is not clean. + Failed to initialize orphan file. + +In ext4_symlink(), a newly created symlink can be added to the orphan +list due to ENOSPC. Its data has not been initialized, and its size is +zero. Therefore, we need to disregard the length check of the symbolic +link when cleaning up orphan inodes. Instead, we should ensure that the +nlink count is zero. + +Fixes: 5f920d5d6083 ("ext4: verify fast symlink length") +Signed-off-by: Zhang Yi +Reviewed-by: Jan Kara +Link: https://patch.msgid.link/20260131091156.1733648-1-yi.zhang@huaweicloud.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/inode.c | 40 +++++++++++++++++++++++++++++----------- + 1 file changed, 29 insertions(+), 11 deletions(-) + +--- a/fs/ext4/inode.c ++++ b/fs/ext4/inode.c +@@ -5449,18 +5449,36 @@ struct inode *__ext4_iget(struct super_b + inode->i_op = &ext4_encrypted_symlink_inode_operations; + } else if (ext4_inode_is_fast_symlink(inode)) { + inode->i_op = &ext4_fast_symlink_inode_operations; +- if (inode->i_size == 0 || +- inode->i_size >= sizeof(ei->i_data) || +- strnlen((char *)ei->i_data, inode->i_size + 1) != +- inode->i_size) { +- ext4_error_inode(inode, function, line, 0, +- "invalid fast symlink length %llu", +- (unsigned long long)inode->i_size); +- ret = -EFSCORRUPTED; +- goto bad_inode; ++ ++ /* ++ * Orphan cleanup can see inodes with i_size == 0 ++ * and i_data uninitialized. Skip size checks in ++ * that case. This is safe because the first thing ++ * ext4_evict_inode() does for fast symlinks is ++ * clearing of i_data and i_size. 
++ */ ++ if ((EXT4_SB(sb)->s_mount_state & EXT4_ORPHAN_FS)) { ++ if (inode->i_nlink != 0) { ++ ext4_error_inode(inode, function, line, 0, ++ "invalid orphan symlink nlink %d", ++ inode->i_nlink); ++ ret = -EFSCORRUPTED; ++ goto bad_inode; ++ } ++ } else { ++ if (inode->i_size == 0 || ++ inode->i_size >= sizeof(ei->i_data) || ++ strnlen((char *)ei->i_data, inode->i_size + 1) != ++ inode->i_size) { ++ ext4_error_inode(inode, function, line, 0, ++ "invalid fast symlink length %llu", ++ (unsigned long long)inode->i_size); ++ ret = -EFSCORRUPTED; ++ goto bad_inode; ++ } ++ inode_set_cached_link(inode, (char *)ei->i_data, ++ inode->i_size); + } +- inode_set_cached_link(inode, (char *)ei->i_data, +- inode->i_size); + } else { + inode->i_op = &ext4_symlink_inode_operations; + } diff --git a/queue-6.19/ext4-fix-fsync-2-for-nojournal-mode.patch b/queue-6.19/ext4-fix-fsync-2-for-nojournal-mode.patch new file mode 100644 index 0000000000..9f9cdb3582 --- /dev/null +++ b/queue-6.19/ext4-fix-fsync-2-for-nojournal-mode.patch @@ -0,0 +1,60 @@ +From 1308255bbf8452762f89f44f7447ce137ecdbcff Mon Sep 17 00:00:00 2001 +From: Jan Kara +Date: Mon, 16 Feb 2026 17:48:44 +0100 +Subject: ext4: fix fsync(2) for nojournal mode + +From: Jan Kara + +commit 1308255bbf8452762f89f44f7447ce137ecdbcff upstream. + +When inode metadata is changed, we sometimes just call +ext4_mark_inode_dirty() to track modified metadata. This copies inode +metadata into block buffer which is enough when we are journalling +metadata. However when we are running in nojournal mode we currently +fail to write the dirtied inode buffer during fsync(2) because the inode +is not marked as dirty. Use explicit ext4_write_inode() call to make +sure the inode table buffer is written to the disk. This is a band aid +solution but proper solution requires a much larger rewrite including +changes in metadata bh tracking infrastructure. 
+ +Reported-by: Free Ekanayaka +Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/ +CC: stable@vger.kernel.org +Signed-off-by: Jan Kara +Reviewed-by: Zhang Yi +Link: https://patch.msgid.link/20260216164848.3074-4-jack@suse.cz +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/fsync.c | 16 ++++++++++++++-- + 1 file changed, 14 insertions(+), 2 deletions(-) + +--- a/fs/ext4/fsync.c ++++ b/fs/ext4/fsync.c +@@ -83,11 +83,23 @@ static int ext4_fsync_nojournal(struct f + int datasync, bool *needs_barrier) + { + struct inode *inode = file->f_inode; ++ struct writeback_control wbc = { ++ .sync_mode = WB_SYNC_ALL, ++ .nr_to_write = 0, ++ }; + int ret; + + ret = generic_buffers_fsync_noflush(file, start, end, datasync); +- if (!ret) +- ret = ext4_sync_parent(inode); ++ if (ret) ++ return ret; ++ ++ /* Force writeout of inode table buffer to disk */ ++ ret = ext4_write_inode(inode, &wbc); ++ if (ret) ++ return ret; ++ ++ ret = ext4_sync_parent(inode); ++ + if (test_opt(inode->i_sb, BARRIER)) + *needs_barrier = true; + diff --git a/queue-6.19/ext4-fix-iloc.bh-leak-in-ext4_fc_replay_inode-error-paths.patch b/queue-6.19/ext4-fix-iloc.bh-leak-in-ext4_fc_replay_inode-error-paths.patch new file mode 100644 index 0000000000..144c1d55a2 --- /dev/null +++ b/queue-6.19/ext4-fix-iloc.bh-leak-in-ext4_fc_replay_inode-error-paths.patch @@ -0,0 +1,84 @@ +From ec0a7500d8eace5b4f305fa0c594dd148f0e8d29 Mon Sep 17 00:00:00 2001 +From: Baokun Li +Date: Mon, 23 Mar 2026 14:08:36 +0800 +Subject: ext4: fix iloc.bh leak in ext4_fc_replay_inode() error paths + +From: Baokun Li + +commit ec0a7500d8eace5b4f305fa0c594dd148f0e8d29 upstream. + +During code review, Joseph found that ext4_fc_replay_inode() calls +ext4_get_fc_inode_loc() to get the inode location, which holds a +reference to iloc.bh that must be released via brelse(). 
+ +However, several error paths jump to the 'out' label without +releasing iloc.bh: + + - ext4_handle_dirty_metadata() failure + - sync_dirty_buffer() failure + - ext4_mark_inode_used() failure + - ext4_iget() failure + +Fix this by introducing an 'out_brelse' label placed just before +the existing 'out' label to ensure iloc.bh is always released. + +Additionally, make ext4_fc_replay_inode() propagate errors +properly instead of always returning 0. + +Reported-by: Joseph Qi +Fixes: 8016e29f4362 ("ext4: fast commit recovery path") +Signed-off-by: Baokun Li +Reviewed-by: Zhang Yi +Reviewed-by: Jan Kara +Link: https://patch.msgid.link/20260323060836.3452660-1-libaokun@linux.alibaba.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/fast_commit.c | 13 ++++++++----- + 1 file changed, 8 insertions(+), 5 deletions(-) + +--- a/fs/ext4/fast_commit.c ++++ b/fs/ext4/fast_commit.c +@@ -1613,19 +1613,21 @@ static int ext4_fc_replay_inode(struct s + /* Immediately update the inode on disk. */ + ret = ext4_handle_dirty_metadata(NULL, NULL, iloc.bh); + if (ret) +- goto out; ++ goto out_brelse; + ret = sync_dirty_buffer(iloc.bh); + if (ret) +- goto out; ++ goto out_brelse; + ret = ext4_mark_inode_used(sb, ino); + if (ret) +- goto out; ++ goto out_brelse; + + /* Given that we just wrote the inode on disk, this SHOULD succeed. 
*/ + inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL); + if (IS_ERR(inode)) { + ext4_debug("Inode not found."); +- return -EFSCORRUPTED; ++ inode = NULL; ++ ret = -EFSCORRUPTED; ++ goto out_brelse; + } + + /* +@@ -1642,13 +1644,14 @@ static int ext4_fc_replay_inode(struct s + ext4_inode_csum_set(inode, ext4_raw_inode(&iloc), EXT4_I(inode)); + ret = ext4_handle_dirty_metadata(NULL, NULL, iloc.bh); + sync_dirty_buffer(iloc.bh); ++out_brelse: + brelse(iloc.bh); + out: + iput(inode); + if (!ret) + blkdev_issue_flush(sb->s_bdev); + +- return 0; ++ return ret; + } + + /* diff --git a/queue-6.19/ext4-fix-journal-credit-check-when-setting-fscrypt-context.patch b/queue-6.19/ext4-fix-journal-credit-check-when-setting-fscrypt-context.patch new file mode 100644 index 0000000000..e6ba644bbb --- /dev/null +++ b/queue-6.19/ext4-fix-journal-credit-check-when-setting-fscrypt-context.patch @@ -0,0 +1,58 @@ +From b1d682f1990c19fb1d5b97d13266210457092bcd Mon Sep 17 00:00:00 2001 +From: Simon Weber +Date: Sat, 7 Feb 2026 10:53:03 +0100 +Subject: ext4: fix journal credit check when setting fscrypt context + +From: Simon Weber + +commit b1d682f1990c19fb1d5b97d13266210457092bcd upstream. + +Fix an issue arising when ext4 features has_journal, ea_inode, and encrypt +are activated simultaneously, leading to ENOSPC when creating an encrypted +file. + +Fix by passing XATTR_CREATE flag to xattr_set_handle function if a handle +is specified, i.e., when the function is called in the control flow of +creating a new inode. This aligns the number of jbd2 credits set_handle +checks for with the number allocated for creating a new inode. + +ext4_set_context must not be called with a non-null handle (fs_data) if +fscrypt context xattr is not guaranteed to not exist yet. The only other +usage of this function currently is when handling the ioctl +FS_IOC_SET_ENCRYPTION_POLICY, which calls it with fs_data=NULL. 
+ +Fixes: c1a5d5f6ab21eb7e ("ext4: improve journal credit handling in set xattr paths") + +Co-developed-by: Anthony Durrer +Signed-off-by: Anthony Durrer +Signed-off-by: Simon Weber +Reviewed-by: Eric Biggers +Link: https://patch.msgid.link/20260207100148.724275-4-simon.weber.39@gmail.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/crypto.c | 9 ++++++++- + 1 file changed, 8 insertions(+), 1 deletion(-) + +--- a/fs/ext4/crypto.c ++++ b/fs/ext4/crypto.c +@@ -163,10 +163,17 @@ static int ext4_set_context(struct inode + */ + + if (handle) { ++ /* ++ * Since the inode is new it is ok to pass the ++ * XATTR_CREATE flag. This is necessary to match the ++ * remaining journal credits check in the set_handle ++ * function with the credits allocated for the new ++ * inode. ++ */ + res = ext4_xattr_set_handle(handle, inode, + EXT4_XATTR_INDEX_ENCRYPTION, + EXT4_XATTR_NAME_ENCRYPTION_CONTEXT, +- ctx, len, 0); ++ ctx, len, XATTR_CREATE); + if (!res) { + ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT); + ext4_clear_inode_state(inode, diff --git a/queue-6.19/ext4-fix-stale-xarray-tags-after-writeback.patch b/queue-6.19/ext4-fix-stale-xarray-tags-after-writeback.patch new file mode 100644 index 0000000000..c150760e8f --- /dev/null +++ b/queue-6.19/ext4-fix-stale-xarray-tags-after-writeback.patch @@ -0,0 +1,60 @@ +From f4a2b42e78914ff15630e71289adc589c3a8eb45 Mon Sep 17 00:00:00 2001 +From: Jan Kara +Date: Thu, 5 Feb 2026 10:22:24 +0100 +Subject: ext4: fix stale xarray tags after writeback + +From: Jan Kara + +commit f4a2b42e78914ff15630e71289adc589c3a8eb45 upstream. + +There are cases where ext4_bio_write_page() gets called for a page which +has no buffers to submit. This happens e.g. when the part of the file is +actually a hole, when we cannot allocate blocks due to being called from +jbd2, or in data=journal mode when checkpointing writes the buffers +earlier. 
In these cases we just return from ext4_bio_write_page() +however if the page didn't need redirtying, we will leave stale DIRTY +and/or TOWRITE tags in xarray because those get cleared only in +__folio_start_writeback(). As a result we can leave these tags set in +mappings even after a final sync on filesystem that's getting remounted +read-only or that's being frozen. Various assertions can then get upset +when writeback is started on such filesystems (Gerald reported assertion +in ext4_journal_check_start() firing). + +Fix the problem by cycling the page through writeback state even if we +decide nothing needs to be written for it so that xarray tags get +properly updated. This is slightly silly (we could update the xarray +tags directly) but I don't think a special helper messing with xarray +tags is really worth it in this relatively rare corner case. + +Reported-by: Gerald Yang +Link: https://lore.kernel.org/all/20260128074515.2028982-1-gerald.yang@canonical.com +Fixes: dff4ac75eeee ("ext4: move keep_towrite handling to ext4_bio_write_page()") +Signed-off-by: Jan Kara +Link: https://patch.msgid.link/20260205092223.21287-2-jack@suse.cz +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/page-io.c | 10 ++++++++-- + 1 file changed, 8 insertions(+), 2 deletions(-) + +--- a/fs/ext4/page-io.c ++++ b/fs/ext4/page-io.c +@@ -523,9 +523,15 @@ int ext4_bio_write_folio(struct ext4_io_ + nr_to_submit++; + } while ((bh = bh->b_this_page) != head); + +- /* Nothing to submit? Just unlock the folio... */ +- if (!nr_to_submit) ++ if (!nr_to_submit) { ++ /* ++ * We have nothing to submit. Just cycle the folio through ++ * writeback state to properly update xarray tags. 
++ */ ++ __folio_start_writeback(folio, keep_towrite); ++ folio_end_writeback(folio); + return 0; ++ } + + bh = head = folio_buffers(folio); + diff --git a/queue-6.19/ext4-fix-the-might_sleep-warnings-in-kvfree.patch b/queue-6.19/ext4-fix-the-might_sleep-warnings-in-kvfree.patch new file mode 100644 index 0000000000..e7970d2f49 --- /dev/null +++ b/queue-6.19/ext4-fix-the-might_sleep-warnings-in-kvfree.patch @@ -0,0 +1,168 @@ +From 496bb99b7e66f48b178126626f47e9ba79e2d0fa Mon Sep 17 00:00:00 2001 +From: Zqiang +Date: Thu, 19 Mar 2026 17:45:45 +0800 +Subject: ext4: fix the might_sleep() warnings in kvfree() + +From: Zqiang + +commit 496bb99b7e66f48b178126626f47e9ba79e2d0fa upstream. + +Use the kvfree() in the RCU read critical section can trigger +the following warnings: + +EXT4-fs (vdb): unmounting filesystem cd983e5b-3c83-4f5a-a136-17b00eb9d018. + +WARNING: suspicious RCU usage + +./include/linux/rcupdate.h:409 Illegal context switch in RCU read-side critical section! + +other info that might help us debug this: + +rcu_scheduler_active = 2, debug_locks = 1 + +Call Trace: + + dump_stack_lvl+0xbb/0xd0 + dump_stack+0x14/0x20 + lockdep_rcu_suspicious+0x15a/0x1b0 + __might_resched+0x375/0x4d0 + ? put_object.part.0+0x2c/0x50 + __might_sleep+0x108/0x160 + vfree+0x58/0x910 + ? ext4_group_desc_free+0x27/0x270 + kvfree+0x23/0x40 + ext4_group_desc_free+0x111/0x270 + ext4_put_super+0x3c8/0xd40 + generic_shutdown_super+0x14c/0x4a0 + ? __pfx_shrinker_free+0x10/0x10 + kill_block_super+0x40/0x90 + ext4_kill_sb+0x6d/0xb0 + deactivate_locked_super+0xb4/0x180 + deactivate_super+0x7e/0xa0 + cleanup_mnt+0x296/0x3e0 + __cleanup_mnt+0x16/0x20 + task_work_run+0x157/0x250 + ? __pfx_task_work_run+0x10/0x10 + ? 
exit_to_user_mode_loop+0x6a/0x550 + exit_to_user_mode_loop+0x102/0x550 + do_syscall_64+0x44a/0x500 + entry_SYSCALL_64_after_hwframe+0x77/0x7f + + +BUG: sleeping function called from invalid context at mm/vmalloc.c:3441 +in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 556, name: umount +preempt_count: 1, expected: 0 +CPU: 3 UID: 0 PID: 556 Comm: umount +Call Trace: + + dump_stack_lvl+0xbb/0xd0 + dump_stack+0x14/0x20 + __might_resched+0x275/0x4d0 + ? put_object.part.0+0x2c/0x50 + __might_sleep+0x108/0x160 + vfree+0x58/0x910 + ? ext4_group_desc_free+0x27/0x270 + kvfree+0x23/0x40 + ext4_group_desc_free+0x111/0x270 + ext4_put_super+0x3c8/0xd40 + generic_shutdown_super+0x14c/0x4a0 + ? __pfx_shrinker_free+0x10/0x10 + kill_block_super+0x40/0x90 + ext4_kill_sb+0x6d/0xb0 + deactivate_locked_super+0xb4/0x180 + deactivate_super+0x7e/0xa0 + cleanup_mnt+0x296/0x3e0 + __cleanup_mnt+0x16/0x20 + task_work_run+0x157/0x250 + ? __pfx_task_work_run+0x10/0x10 + ? exit_to_user_mode_loop+0x6a/0x550 + exit_to_user_mode_loop+0x102/0x550 + do_syscall_64+0x44a/0x500 + entry_SYSCALL_64_after_hwframe+0x77/0x7f + +The above scenarios occur in initialization failures and teardown +paths, there are no parallel operations on the resources released +by kvfree(), this commit therefore remove rcu_read_lock/unlock() and +use rcu_access_pointer() instead of rcu_dereference() operations. 
+ +Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access") +Fixes: df3da4ea5a0f ("ext4: fix potential race between s_group_info online resizing and access") +Signed-off-by: Zqiang +Reviewed-by: Baokun Li +Link: https://patch.msgid.link/20260319094545.19291-1-qiang.zhang@linux.dev +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/mballoc.c | 10 +++------- + fs/ext4/super.c | 8 ++------ + 2 files changed, 5 insertions(+), 13 deletions(-) + +--- a/fs/ext4/mballoc.c ++++ b/fs/ext4/mballoc.c +@@ -3584,9 +3584,7 @@ err_freebuddy: + rcu_read_unlock(); + iput(sbi->s_buddy_cache); + err_freesgi: +- rcu_read_lock(); +- kvfree(rcu_dereference(sbi->s_group_info)); +- rcu_read_unlock(); ++ kvfree(rcu_access_pointer(sbi->s_group_info)); + return -ENOMEM; + } + +@@ -3903,7 +3901,8 @@ void ext4_mb_release(struct super_block + WARN_ON_ONCE(!list_empty(&sbi->s_discard_list)); + } + +- if (sbi->s_group_info) { ++ group_info = rcu_access_pointer(sbi->s_group_info); ++ if (group_info) { + for (i = 0; i < ngroups; i++) { + cond_resched(); + grinfo = ext4_get_group_info(sb, i); +@@ -3921,12 +3920,9 @@ void ext4_mb_release(struct super_block + num_meta_group_infos = (ngroups + + EXT4_DESC_PER_BLOCK(sb) - 1) >> + EXT4_DESC_PER_BLOCK_BITS(sb); +- rcu_read_lock(); +- group_info = rcu_dereference(sbi->s_group_info); + for (i = 0; i < num_meta_group_infos; i++) + kfree(group_info[i]); + kvfree(group_info); +- rcu_read_unlock(); + } + ext4_mb_avg_fragment_size_destroy(sbi); + ext4_mb_largest_free_orders_destroy(sbi); +--- a/fs/ext4/super.c ++++ b/fs/ext4/super.c +@@ -1249,12 +1249,10 @@ static void ext4_group_desc_free(struct + struct buffer_head **group_desc; + int i; + +- rcu_read_lock(); +- group_desc = rcu_dereference(sbi->s_group_desc); ++ group_desc = rcu_access_pointer(sbi->s_group_desc); + for (i = 0; i < sbi->s_gdb_count; i++) + brelse(group_desc[i]); + kvfree(group_desc); +- 
rcu_read_unlock(); + } + + static void ext4_flex_groups_free(struct ext4_sb_info *sbi) +@@ -1262,14 +1260,12 @@ static void ext4_flex_groups_free(struct + struct flex_groups **flex_groups; + int i; + +- rcu_read_lock(); +- flex_groups = rcu_dereference(sbi->s_flex_groups); ++ flex_groups = rcu_access_pointer(sbi->s_flex_groups); + if (flex_groups) { + for (i = 0; i < sbi->s_flex_groups_allocated; i++) + kvfree(flex_groups[i]); + kvfree(flex_groups); + } +- rcu_read_unlock(); + } + + static void ext4_put_super(struct super_block *sb) diff --git a/queue-6.19/ext4-fix-use-after-free-in-update_super_work-when-racing-with-umount.patch b/queue-6.19/ext4-fix-use-after-free-in-update_super_work-when-racing-with-umount.patch new file mode 100644 index 0000000000..89d8a4418f --- /dev/null +++ b/queue-6.19/ext4-fix-use-after-free-in-update_super_work-when-racing-with-umount.patch @@ -0,0 +1,113 @@ +From d15e4b0a418537aafa56b2cb80d44add83e83697 Mon Sep 17 00:00:00 2001 +From: Jiayuan Chen +Date: Thu, 19 Mar 2026 20:03:35 +0800 +Subject: ext4: fix use-after-free in update_super_work when racing with umount + +From: Jiayuan Chen + +commit d15e4b0a418537aafa56b2cb80d44add83e83697 upstream. + +Commit b98535d09179 ("ext4: fix bug_on in start_this_handle during umount +filesystem") moved ext4_unregister_sysfs() before flushing s_sb_upd_work +to prevent new error work from being queued via /proc/fs/ext4/xx/mb_groups +reads during unmount. 
However, this introduced a use-after-free because +update_super_work calls ext4_notify_error_sysfs() -> sysfs_notify() which +accesses the kobject's kernfs_node after it has been freed by kobject_del() +in ext4_unregister_sysfs(): + + update_super_work ext4_put_super + ----------------- -------------- + ext4_unregister_sysfs(sb) + kobject_del(&sbi->s_kobj) + __kobject_del() + sysfs_remove_dir() + kobj->sd = NULL + sysfs_put(sd) + kernfs_put() // RCU free + ext4_notify_error_sysfs(sbi) + sysfs_notify(&sbi->s_kobj) + kn = kobj->sd // stale pointer + kernfs_get(kn) // UAF on freed kernfs_node + ext4_journal_destroy() + flush_work(&sbi->s_sb_upd_work) + +Instead of reordering the teardown sequence, fix this by making +ext4_notify_error_sysfs() detect that sysfs has already been torn down +by checking s_kobj.state_in_sysfs, and skipping the sysfs_notify() call +in that case. A dedicated mutex (s_error_notify_mutex) serializes +ext4_notify_error_sysfs() against kobject_del() in ext4_unregister_sysfs() +to prevent TOCTOU races where the kobject could be deleted between the +state_in_sysfs check and the sysfs_notify() call. 
+ +Fixes: b98535d09179 ("ext4: fix bug_on in start_this_handle during umount filesystem") +Cc: Jiayuan Chen +Suggested-by: Jan Kara +Signed-off-by: Jiayuan Chen +Reviewed-by: Ritesh Harjani (IBM) +Reviewed-by: Jan Kara +Link: https://patch.msgid.link/20260319120336.157873-1-jiayuan.chen@linux.dev +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/ext4.h | 1 + + fs/ext4/super.c | 1 + + fs/ext4/sysfs.c | 10 +++++++++- + 3 files changed, 11 insertions(+), 1 deletion(-) + +--- a/fs/ext4/ext4.h ++++ b/fs/ext4/ext4.h +@@ -1583,6 +1583,7 @@ struct ext4_sb_info { + struct proc_dir_entry *s_proc; + struct kobject s_kobj; + struct completion s_kobj_unregister; ++ struct mutex s_error_notify_mutex; /* protects sysfs_notify vs kobject_del */ + struct super_block *s_sb; + struct buffer_head *s_mmp_bh; + +--- a/fs/ext4/super.c ++++ b/fs/ext4/super.c +@@ -5400,6 +5400,7 @@ static int __ext4_fill_super(struct fs_c + + timer_setup(&sbi->s_err_report, print_daily_error_info, 0); + spin_lock_init(&sbi->s_error_lock); ++ mutex_init(&sbi->s_error_notify_mutex); + INIT_WORK(&sbi->s_sb_upd_work, update_super_work); + + err = ext4_group_desc_init(sb, es, logical_sb_block, &first_not_zeroed); +--- a/fs/ext4/sysfs.c ++++ b/fs/ext4/sysfs.c +@@ -561,7 +561,10 @@ static const struct kobj_type ext4_feat_ + + void ext4_notify_error_sysfs(struct ext4_sb_info *sbi) + { +- sysfs_notify(&sbi->s_kobj, NULL, "errors_count"); ++ mutex_lock(&sbi->s_error_notify_mutex); ++ if (sbi->s_kobj.state_in_sysfs) ++ sysfs_notify(&sbi->s_kobj, NULL, "errors_count"); ++ mutex_unlock(&sbi->s_error_notify_mutex); + } + + static struct kobject *ext4_root; +@@ -574,8 +577,10 @@ int ext4_register_sysfs(struct super_blo + int err; + + init_completion(&sbi->s_kobj_unregister); ++ mutex_lock(&sbi->s_error_notify_mutex); + err = kobject_init_and_add(&sbi->s_kobj, &ext4_sb_ktype, ext4_root, + "%s", sb->s_id); ++ mutex_unlock(&sbi->s_error_notify_mutex); + if (err) { + 
kobject_put(&sbi->s_kobj); + wait_for_completion(&sbi->s_kobj_unregister); +@@ -608,7 +613,10 @@ void ext4_unregister_sysfs(struct super_ + + if (sbi->s_proc) + remove_proc_subtree(sb->s_id, ext4_proc_root); ++ ++ mutex_lock(&sbi->s_error_notify_mutex); + kobject_del(&sbi->s_kobj); ++ mutex_unlock(&sbi->s_error_notify_mutex); + } + + int __init ext4_init_sysfs(void) diff --git a/queue-6.19/ext4-handle-wraparound-when-searching-for-blocks-for-indirect-mapped-blocks.patch b/queue-6.19/ext4-handle-wraparound-when-searching-for-blocks-for-indirect-mapped-blocks.patch new file mode 100644 index 0000000000..732dfccf08 --- /dev/null +++ b/queue-6.19/ext4-handle-wraparound-when-searching-for-blocks-for-indirect-mapped-blocks.patch @@ -0,0 +1,60 @@ +From bb81702370fad22c06ca12b6e1648754dbc37e0f Mon Sep 17 00:00:00 2001 +From: Theodore Ts'o +Date: Thu, 26 Mar 2026 00:58:34 -0400 +Subject: ext4: handle wraparound when searching for blocks for indirect mapped blocks + +From: Theodore Ts'o + +commit bb81702370fad22c06ca12b6e1648754dbc37e0f upstream. + +Commit 4865c768b563 ("ext4: always allocate blocks only from groups +inode can use") restricts what blocks will be allocated for indirect +block based files to block numbers that fit within 32-bit block +numbers. + +However, when using a review bot running on the latest Gemini LLM to +check this commit when backporting into an LTS based kernel, it raised +this concern: + + If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal + group was populated via stream allocation from s_mb_last_groups), + then start will be >= ngroups. + + Does this allow allocating blocks beyond the 32-bit limit for + indirect block mapped files? The commit message mentions that + ext4_mb_scan_groups_linear() takes care to not select unsupported + groups. 
However, its loop uses group = *start, and the very first + iteration will call ext4_mb_scan_group() with this unsupported + group because next_linear_group() is only called at the end of the + iteration. + +After reviewing the code paths involved and considering the LLM +review, I determined that this can happen when there is a file system +where some files/directories are extent-mapped and others are +indirect-block mapped. To address this, add a safety clamp in +ext4_mb_scan_groups(). + +Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use") +Cc: Jan Kara +Reviewed-by: Baokun Li +Reviewed-by: Jan Kara +Signed-off-by: Theodore Ts'o +Link: https://patch.msgid.link/20260326045834.1175822-1-tytso@mit.edu +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/mballoc.c | 2 ++ + 1 file changed, 2 insertions(+) + +--- a/fs/ext4/mballoc.c ++++ b/fs/ext4/mballoc.c +@@ -1199,6 +1199,8 @@ static int ext4_mb_scan_groups(struct ex + + /* searching for the right group start from the goal value specified */ + start = ac->ac_g_ex.fe_group; ++ if (start >= ngroups) ++ start = 0; + ac->ac_prefetch_grp = start; + ac->ac_prefetch_nr = 0; + diff --git a/queue-6.19/ext4-make-recently_deleted-properly-work-with-lazy-itable-initialization.patch b/queue-6.19/ext4-make-recently_deleted-properly-work-with-lazy-itable-initialization.patch new file mode 100644 index 0000000000..7eb7f55c8a --- /dev/null +++ b/queue-6.19/ext4-make-recently_deleted-properly-work-with-lazy-itable-initialization.patch @@ -0,0 +1,42 @@ +From bd060afa7cc3e0ad30afa9ecc544a78638498555 Mon Sep 17 00:00:00 2001 +From: Jan Kara +Date: Mon, 16 Feb 2026 17:48:43 +0100 +Subject: ext4: make recently_deleted() properly work with lazy itable initialization + +From: Jan Kara + +commit bd060afa7cc3e0ad30afa9ecc544a78638498555 upstream. + +recently_deleted() checks whether inode has been used in the near past. 
+However this can give false positive result when inode table is not +initialized yet and we are in fact comparing to random garbage (or stale +itable block of a filesystem before mkfs). Ultimately this results in +uninitialized inodes being skipped during inode allocation and possibly +they are never initialized and thus e2fsck complains. Verify if the +inode has been initialized before checking for dtime. + +Signed-off-by: Jan Kara +Reviewed-by: Zhang Yi +Link: https://patch.msgid.link/20260216164848.3074-3-jack@suse.cz +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/ialloc.c | 6 ++++++ + 1 file changed, 6 insertions(+) + +--- a/fs/ext4/ialloc.c ++++ b/fs/ext4/ialloc.c +@@ -686,6 +686,12 @@ static int recently_deleted(struct super + if (unlikely(!gdp)) + return 0; + ++ /* Inode was never used in this filesystem? */ ++ if (ext4_has_group_desc_csum(sb) && ++ (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT) || ++ ino >= EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp))) ++ return 0; ++ + bh = sb_find_get_block(sb, ext4_inode_table(sb, gdp) + + (ino / inodes_per_block)); + if (!bh || !buffer_uptodate(bh)) diff --git a/queue-6.19/ext4-publish-jinode-after-initialization.patch b/queue-6.19/ext4-publish-jinode-after-initialization.patch new file mode 100644 index 0000000000..f17a6ae782 --- /dev/null +++ b/queue-6.19/ext4-publish-jinode-after-initialization.patch @@ -0,0 +1,145 @@ +From 1aec30021edd410b986c156f195f3d23959a9d11 Mon Sep 17 00:00:00 2001 +From: Li Chen +Date: Wed, 25 Feb 2026 16:26:16 +0800 +Subject: ext4: publish jinode after initialization + +From: Li Chen + +commit 1aec30021edd410b986c156f195f3d23959a9d11 upstream. + +ext4_inode_attach_jinode() publishes ei->jinode to concurrent users. +It used to set ei->jinode before jbd2_journal_init_jbd_inode(), +allowing a reader to observe a non-NULL jinode with i_vfs_inode +still unset. 
+ +The fast commit flush path can then pass this jinode to +jbd2_wait_inode_data(), which dereferences i_vfs_inode->i_mapping and +may crash. + +Below is the crash I observe: +``` +BUG: unable to handle page fault for address: 000000010beb47f4 +PGD 110e51067 P4D 110e51067 PUD 0 +Oops: Oops: 0000 [#1] SMP NOPTI +CPU: 1 UID: 0 PID: 4850 Comm: fc_fsync_bench_ Not tainted 6.18.0-00764-g795a690c06a5 #1 PREEMPT(voluntary) +Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014 +RIP: 0010:xas_find_marked+0x3d/0x2e0 +Code: e0 03 48 83 f8 02 0f 84 f0 01 00 00 48 8b 47 08 48 89 c3 48 39 c6 0f 82 fd 01 00 00 48 85 c9 74 3d 48 83 f9 03 77 63 4c 8b 0f <49> 8b 71 08 48 c7 47 18 00 00 00 00 48 89 f1 83 e1 03 48 83 f9 02 +RSP: 0018:ffffbbee806e7bf0 EFLAGS: 00010246 +RAX: 000000000010beb4 RBX: 000000000010beb4 RCX: 0000000000000003 +RDX: 0000000000000001 RSI: 0000002000300000 RDI: ffffbbee806e7c10 +RBP: 0000000000000001 R08: 0000002000300000 R09: 000000010beb47ec +R10: ffff9ea494590090 R11: 0000000000000000 R12: 0000002000300000 +R13: ffffbbee806e7c90 R14: ffff9ea494513788 R15: ffffbbee806e7c88 +FS: 00007fc2f9e3e6c0(0000) GS:ffff9ea6b1444000(0000) knlGS:0000000000000000 +CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 +CR2: 000000010beb47f4 CR3: 0000000119ac5000 CR4: 0000000000750ef0 +PKRU: 55555554 +Call Trace: + +filemap_get_folios_tag+0x87/0x2a0 +__filemap_fdatawait_range+0x5f/0xd0 +? srso_alias_return_thunk+0x5/0xfbef5 +? __schedule+0x3e7/0x10c0 +? srso_alias_return_thunk+0x5/0xfbef5 +? srso_alias_return_thunk+0x5/0xfbef5 +? srso_alias_return_thunk+0x5/0xfbef5 +? preempt_count_sub+0x5f/0x80 +? srso_alias_return_thunk+0x5/0xfbef5 +? cap_safe_nice+0x37/0x70 +? srso_alias_return_thunk+0x5/0xfbef5 +? preempt_count_sub+0x5f/0x80 +? srso_alias_return_thunk+0x5/0xfbef5 +filemap_fdatawait_range_keep_errors+0x12/0x40 +ext4_fc_commit+0x697/0x8b0 +? ext4_file_write_iter+0x64b/0x950 +? srso_alias_return_thunk+0x5/0xfbef5 +? 
preempt_count_sub+0x5f/0x80 +? srso_alias_return_thunk+0x5/0xfbef5 +? vfs_write+0x356/0x480 +? srso_alias_return_thunk+0x5/0xfbef5 +? preempt_count_sub+0x5f/0x80 +ext4_sync_file+0xf7/0x370 +do_fsync+0x3b/0x80 +? syscall_trace_enter+0x108/0x1d0 +__x64_sys_fdatasync+0x16/0x20 +do_syscall_64+0x62/0x2c0 +entry_SYSCALL_64_after_hwframe+0x76/0x7e +... +``` + +Fix this by initializing the jbd2_inode first. +Use smp_wmb() and WRITE_ONCE() to publish ei->jinode after +initialization. Readers use READ_ONCE() to fetch the pointer. + +Fixes: a361293f5fede ("jbd2: Fix oops in jbd2_journal_file_inode()") +Cc: stable@vger.kernel.org +Signed-off-by: Li Chen +Reviewed-by: Jan Kara +Link: https://patch.msgid.link/20260225082617.147957-1-me@linux.beauty +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/fast_commit.c | 4 ++-- + fs/ext4/inode.c | 15 +++++++++++---- + 2 files changed, 13 insertions(+), 6 deletions(-) + +--- a/fs/ext4/fast_commit.c ++++ b/fs/ext4/fast_commit.c +@@ -975,13 +975,13 @@ static int ext4_fc_flush_data(journal_t + int ret = 0; + + list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) { +- ret = jbd2_submit_inode_data(journal, ei->jinode); ++ ret = jbd2_submit_inode_data(journal, READ_ONCE(ei->jinode)); + if (ret) + return ret; + } + + list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) { +- ret = jbd2_wait_inode_data(journal, ei->jinode); ++ ret = jbd2_wait_inode_data(journal, READ_ONCE(ei->jinode)); + if (ret) + return ret; + } +--- a/fs/ext4/inode.c ++++ b/fs/ext4/inode.c +@@ -126,6 +126,8 @@ void ext4_inode_csum_set(struct inode *i + static inline int ext4_begin_ordered_truncate(struct inode *inode, + loff_t new_size) + { ++ struct jbd2_inode *jinode = READ_ONCE(EXT4_I(inode)->jinode); ++ + trace_ext4_begin_ordered_truncate(inode, new_size); + /* + * If jinode is zero, then we never opened the file for +@@ -133,10 +135,10 @@ static inline int ext4_begin_ordered_tru + * 
jbd2_journal_begin_ordered_truncate() since there's no + * outstanding writes we need to flush. + */ +- if (!EXT4_I(inode)->jinode) ++ if (!jinode) + return 0; + return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode), +- EXT4_I(inode)->jinode, ++ jinode, + new_size); + } + +@@ -4499,8 +4501,13 @@ int ext4_inode_attach_jinode(struct inod + spin_unlock(&inode->i_lock); + return -ENOMEM; + } +- ei->jinode = jinode; +- jbd2_journal_init_jbd_inode(ei->jinode, inode); ++ jbd2_journal_init_jbd_inode(jinode, inode); ++ /* ++ * Publish ->jinode only after it is fully initialized so that ++ * readers never observe a partially initialized jbd2_inode. ++ */ ++ smp_wmb(); ++ WRITE_ONCE(ei->jinode, jinode); + jinode = NULL; + } + spin_unlock(&inode->i_lock); diff --git a/queue-6.19/ext4-reject-mount-if-bigalloc-with-s_first_data_block-0.patch b/queue-6.19/ext4-reject-mount-if-bigalloc-with-s_first_data_block-0.patch new file mode 100644 index 0000000000..714e013cf1 --- /dev/null +++ b/queue-6.19/ext4-reject-mount-if-bigalloc-with-s_first_data_block-0.patch @@ -0,0 +1,40 @@ +From 3822743dc20386d9897e999dbb990befa3a5b3f8 Mon Sep 17 00:00:00 2001 +From: Helen Koike +Date: Tue, 17 Mar 2026 11:23:10 -0300 +Subject: ext4: reject mount if bigalloc with s_first_data_block != 0 + +From: Helen Koike + +commit 3822743dc20386d9897e999dbb990befa3a5b3f8 upstream. + +bigalloc with s_first_data_block != 0 is not supported, reject mounting +it. 
+ +Signed-off-by: Helen Koike +Suggested-by: Theodore Ts'o +Reported-by: syzbot+b73703b873a33d8eb8f6@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=b73703b873a33d8eb8f6 +Link: https://patch.msgid.link/20260317142325.135074-1-koike@igalia.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/super.c | 7 +++++++ + 1 file changed, 7 insertions(+) + +--- a/fs/ext4/super.c ++++ b/fs/ext4/super.c +@@ -3625,6 +3625,13 @@ int ext4_feature_set_ok(struct super_blo + "extents feature\n"); + return 0; + } ++ if (ext4_has_feature_bigalloc(sb) && ++ le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) { ++ ext4_msg(sb, KERN_WARNING, ++ "bad geometry: bigalloc file system with non-zero " ++ "first_data_block\n"); ++ return 0; ++ } + + #if !IS_ENABLED(CONFIG_QUOTA) || !IS_ENABLED(CONFIG_QFMT_V2) + if (!readonly && (ext4_has_feature_quota(sb) || diff --git a/queue-6.19/ext4-replace-bug_on-with-proper-error-handling-in-ext4_read_inline_folio.patch b/queue-6.19/ext4-replace-bug_on-with-proper-error-handling-in-ext4_read_inline_folio.patch new file mode 100644 index 0000000000..032fb84849 --- /dev/null +++ b/queue-6.19/ext4-replace-bug_on-with-proper-error-handling-in-ext4_read_inline_folio.patch @@ -0,0 +1,45 @@ +From 356227096eb66e41b23caf7045e6304877322edf Mon Sep 17 00:00:00 2001 +From: Yuto Ohnuki +Date: Mon, 23 Feb 2026 12:33:46 +0000 +Subject: ext4: replace BUG_ON with proper error handling in ext4_read_inline_folio + +From: Yuto Ohnuki + +commit 356227096eb66e41b23caf7045e6304877322edf upstream. + +Replace BUG_ON() with proper error handling when inline data size +exceeds PAGE_SIZE. This prevents kernel panic and allows the system to +continue running while properly reporting the filesystem corruption. + +The error is logged via ext4_error_inode(), the buffer head is released +to prevent memory leak, and -EFSCORRUPTED is returned to indicate +filesystem corruption. 
+ +Signed-off-by: Yuto Ohnuki +Link: https://patch.msgid.link/20260223123345.14838-2-ytohnuki@amazon.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/inline.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +--- a/fs/ext4/inline.c ++++ b/fs/ext4/inline.c +@@ -522,7 +522,15 @@ static int ext4_read_inline_folio(struct + goto out; + + len = min_t(size_t, ext4_get_inline_size(inode), i_size_read(inode)); +- BUG_ON(len > PAGE_SIZE); ++ ++ if (len > PAGE_SIZE) { ++ ext4_error_inode(inode, __func__, __LINE__, 0, ++ "inline size %zu exceeds PAGE_SIZE", len); ++ ret = -EFSCORRUPTED; ++ brelse(iloc.bh); ++ goto out; ++ } ++ + kaddr = kmap_local_folio(folio, 0); + ret = ext4_read_inline_data(inode, kaddr, len, &iloc); + kaddr = folio_zero_tail(folio, len, kaddr + len); diff --git a/queue-6.19/ext4-test-if-inode-s-all-dirty-pages-are-submitted-to-disk.patch b/queue-6.19/ext4-test-if-inode-s-all-dirty-pages-are-submitted-to-disk.patch new file mode 100644 index 0000000000..6fa331dfdf --- /dev/null +++ b/queue-6.19/ext4-test-if-inode-s-all-dirty-pages-are-submitted-to-disk.patch @@ -0,0 +1,48 @@ +From 73bf12adbea10b13647864cd1c62410d19e21086 Mon Sep 17 00:00:00 2001 +From: Ye Bin +Date: Tue, 3 Mar 2026 09:22:42 +0800 +Subject: ext4: test if inode's all dirty pages are submitted to disk + +From: Ye Bin + +commit 73bf12adbea10b13647864cd1c62410d19e21086 upstream. + +The commit aa373cf55099 ("writeback: stop background/kupdate works from +livelocking other works") introduced an issue where unmounting a filesystem +in a multi-logical-partition scenario could lead to batch file data loss. +This problem was not fixed until the commit d92109891f21 ("fs/writeback: +bail out if there is no more inodes for IO and queued once"). It took +considerable time to identify the root cause. Additionally, in actual +production environments, we frequently encountered file data loss after +normal system reboots. 
Therefore, we are adding a check in the inode +release flow to verify whether all dirty pages have been flushed to disk, +in order to determine whether the data loss is caused by a logic issue in +the filesystem code. + +Signed-off-by: Ye Bin +Reviewed-by: Jan Kara +Link: https://patch.msgid.link/20260303012242.3206465-1-yebin@huaweicloud.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/inode.c | 8 ++++++++ + 1 file changed, 8 insertions(+) + +--- a/fs/ext4/inode.c ++++ b/fs/ext4/inode.c +@@ -184,6 +184,14 @@ void ext4_evict_inode(struct inode *inod + if (EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL) + ext4_evict_ea_inode(inode); + if (inode->i_nlink) { ++ /* ++ * If there's dirty page will lead to data loss, user ++ * could see stale data. ++ */ ++ if (unlikely(!ext4_emergency_state(inode->i_sb) && ++ mapping_tagged(&inode->i_data, PAGECACHE_TAG_DIRTY))) ++ ext4_warning_inode(inode, "data will be lost"); ++ + truncate_inode_pages_final(&inode->i_data); + + goto no_delete; diff --git a/queue-6.19/ext4-validate-p_idx-bounds-in-ext4_ext_correct_indexes.patch b/queue-6.19/ext4-validate-p_idx-bounds-in-ext4_ext_correct_indexes.patch new file mode 100644 index 0000000000..1832b99230 --- /dev/null +++ b/queue-6.19/ext4-validate-p_idx-bounds-in-ext4_ext_correct_indexes.patch @@ -0,0 +1,67 @@ +From 2acb5c12ebd860f30e4faf67e6cc8c44ddfe5fe8 Mon Sep 17 00:00:00 2001 +From: Tejas Bharambe +Date: Tue, 3 Mar 2026 23:14:34 -0800 +Subject: ext4: validate p_idx bounds in ext4_ext_correct_indexes + +From: Tejas Bharambe + +commit 2acb5c12ebd860f30e4faf67e6cc8c44ddfe5fe8 upstream. + +ext4_ext_correct_indexes() walks up the extent tree correcting +index entries when the first extent in a leaf is modified. Before +accessing path[k].p_idx->ei_block, there is no validation that +p_idx falls within the valid range of index entries for that +level. 
+ +If the on-disk extent header contains a corrupted or crafted +eh_entries value, p_idx can point past the end of the allocated +buffer, causing a slab-out-of-bounds read. + +Fix this by validating path[k].p_idx against EXT_LAST_INDEX() at +both access sites: before the while loop and inside it. Return +-EFSCORRUPTED if the index pointer is out of range, consistent +with how other bounds violations are handled in the ext4 extent +tree code. + +Reported-by: syzbot+04c4e65cab786a2e5b7e@syzkaller.appspotmail.com +Closes: https://syzkaller.appspot.com/bug?extid=04c4e65cab786a2e5b7e +Signed-off-by: Tejas Bharambe +Link: https://patch.msgid.link/JH0PR06MB66326016F9B6AD24097D232B897CA@JH0PR06MB6632.apcprd06.prod.outlook.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Greg Kroah-Hartman +--- + fs/ext4/extents.c | 15 +++++++++++++++ + 1 file changed, 15 insertions(+) + +--- a/fs/ext4/extents.c ++++ b/fs/ext4/extents.c +@@ -1741,6 +1741,13 @@ static int ext4_ext_correct_indexes(hand + err = ext4_ext_get_access(handle, inode, path + k); + if (err) + return err; ++ if (unlikely(path[k].p_idx > EXT_LAST_INDEX(path[k].p_hdr))) { ++ EXT4_ERROR_INODE(inode, ++ "path[%d].p_idx %p > EXT_LAST_INDEX %p", ++ k, path[k].p_idx, ++ EXT_LAST_INDEX(path[k].p_hdr)); ++ return -EFSCORRUPTED; ++ } + path[k].p_idx->ei_block = border; + err = ext4_ext_dirty(handle, inode, path + k); + if (err) +@@ -1753,6 +1760,14 @@ static int ext4_ext_correct_indexes(hand + err = ext4_ext_get_access(handle, inode, path + k); + if (err) + goto clean; ++ if (unlikely(path[k].p_idx > EXT_LAST_INDEX(path[k].p_hdr))) { ++ EXT4_ERROR_INODE(inode, ++ "path[%d].p_idx %p > EXT_LAST_INDEX %p", ++ k, path[k].p_idx, ++ EXT_LAST_INDEX(path[k].p_hdr)); ++ err = -EFSCORRUPTED; ++ goto clean; ++ } + path[k].p_idx->ei_block = border; + err = ext4_ext_dirty(handle, inode, path + k); + if (err) diff --git a/queue-6.19/series b/queue-6.19/series index 0b4ed579c7..410055b125 100644 --- a/queue-6.19/series 
+++ b/queue-6.19/series @@ -287,3 +287,21 @@ xfs-scrub-unlock-dquot-before-early-return-in-quota-scrub.patch xfs-fix-ri_total-validation-in-xlog_recover_attri_commit_pass2.patch xfs-don-t-irele-after-failing-to-iget-in-xfs_attri_recover_work.patch xfs-remove-file_path-tracepoint-data.patch +ext4-fix-journal-credit-check-when-setting-fscrypt-context.patch +ext4-convert-inline-data-to-extents-when-truncate-exceeds-inline-size.patch +ext4-fix-stale-xarray-tags-after-writeback.patch +ext4-do-not-check-fast-symlink-during-orphan-recovery.patch +ext4-fix-fsync-2-for-nojournal-mode.patch +ext4-make-recently_deleted-properly-work-with-lazy-itable-initialization.patch +ext4-replace-bug_on-with-proper-error-handling-in-ext4_read_inline_folio.patch +ext4-publish-jinode-after-initialization.patch +ext4-test-if-inode-s-all-dirty-pages-are-submitted-to-disk.patch +ext4-validate-p_idx-bounds-in-ext4_ext_correct_indexes.patch +ext4-avoid-infinite-loops-caused-by-residual-data.patch +ext4-avoid-allocate-block-from-corrupted-group-in-ext4_mb_find_by_goal.patch +ext4-reject-mount-if-bigalloc-with-s_first_data_block-0.patch +ext4-fix-use-after-free-in-update_super_work-when-racing-with-umount.patch +ext4-fix-the-might_sleep-warnings-in-kvfree.patch +ext4-handle-wraparound-when-searching-for-blocks-for-indirect-mapped-blocks.patch +ext4-fix-iloc.bh-leak-in-ext4_fc_replay_inode-error-paths.patch +ext4-always-drain-queued-discard-work-in-ext4_mb_release.patch
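
Note on ext4-publish-jinode-after-initialization.patch: the ordering that patch establishes (initialize the object fully, then make the pointer visible; readers fetch the pointer once and use the local copy) can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions, not the ext4 code: `demo_jinode`, `demo_inode_info`, `demo_attach_jinode()` and `demo_read_owner()` are hypothetical names, and C11 release/acquire atomics stand in for the kernel's smp_wmb() + WRITE_ONCE() and READ_ONCE(). It also assumes writers are serialized by the caller, which the kernel does with inode->i_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct jbd2_inode; not the kernel type. */
struct demo_jinode {
	int *owner;	/* plays the role of i_vfs_inode */
};

/* Hypothetical stand-in for struct ext4_inode_info. */
struct demo_inode_info {
	int id;
	_Atomic(struct demo_jinode *) jinode;	/* NULL until published */
};

/*
 * Attach a jinode: initialize it completely first, then publish the
 * pointer with release ordering so that a reader who sees a non-NULL
 * pointer also sees the initialized fields.
 * Assumption: callers serialize writers (the kernel holds i_lock here).
 * Returns 0 on success or if already attached, -1 on allocation failure.
 */
int demo_attach_jinode(struct demo_inode_info *ei)
{
	struct demo_jinode *j;

	if (atomic_load_explicit(&ei->jinode, memory_order_acquire))
		return 0;

	j = malloc(sizeof(*j));
	if (!j)
		return -1;

	j->owner = &ei->id;	/* step 1: initialize */
	atomic_store_explicit(&ei->jinode, j,
			      memory_order_release);	/* step 2: publish */
	return 0;
}

/*
 * Reader: load the pointer exactly once and work on the local copy,
 * mirroring the READ_ONCE(ei->jinode) call sites in the patch.
 */
int demo_read_owner(struct demo_inode_info *ei, int fallback)
{
	struct demo_jinode *j =
		atomic_load_explicit(&ei->jinode, memory_order_acquire);

	if (!j)
		return fallback;	/* nothing attached yet */

	/* A published jinode is never observed half-initialized. */
	assert(j->owner != NULL);
	return *j->owner;
}
```

Swapping the two steps in demo_attach_jinode() would recreate the shape of the original bug: a reader could see a non-NULL pointer whose owner field has not been set yet.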