From: Sasha Levin
Date: Mon, 4 Dec 2023 19:45:52 +0000 (-0500)
Subject: Drop ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
X-Git-Tag: v4.14.332~23^2~27
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=d9b998a2d3b37e8ab0772102841837215d21e60b;p=thirdparty%2Fkernel%2Fstable-queue.git

Drop ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch

Signed-off-by: Sasha Levin
---

diff --git a/queue-4.19/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch b/queue-4.19/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
deleted file mode 100644
index 46c449575ef..00000000000
--- a/queue-4.19/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
+++ /dev/null
@@ -1,145 +0,0 @@
-From a69ad8b1ba62474d3923aeef9405f8109784a2a4 Mon Sep 17 00:00:00 2001
-From: Sasha Levin
-Date: Mon, 18 Sep 2023 16:15:50 +0530
-Subject: ext4: mark buffer new if it is unwritten to avoid stale data exposure
-
-From: Ojaswin Mujoo
-
-[ Upstream commit 2cd8bdb5efc1e0d5b11a4b7ba6b922fd2736a87f ]
-
-** Short Version **
-
-In ext4 with dioread_nolock, we could have a scenario where the bh returned by
-get_blocks (ext4_get_block_unwritten()) in __block_write_begin_int() has
-UNWRITTEN and MAPPED flag set. Since such a bh does not have NEW flag set we
-never zero out the range of bh that is not under write, causing whatever stale
-data is present in the folio at that time to be written out to disk. To fix this
-mark the buffer as new, in case it is unwritten, in ext4_get_block_unwritten().
-
-** Long Version **
-
-The issue mentioned above was resulting in two different bugs:
-
-1. On block size < page size case in ext4, generic/269 was reliably
-failing with dioread_nolock. The state of the write was as follows:
-
-  * The write was extending i_size.
-  * The last block of the file was fallocated and had an unwritten extent
-  * We were near ENOSPC and hence we were switching to non-delayed alloc
-    allocation.
-
-In this case, the back trace that triggers the bug is as follows:
-
-  ext4_da_write_begin()
-    /* switch to nodelalloc due to low space */
-    ext4_write_begin()
-      ext4_should_dioread_nolock() // true since mount flags still have delalloc
-      __block_write_begin(..., ext4_get_block_unwritten)
-        __block_write_begin_int()
-          for(each buffer head in page) {
-            /* first iteration, this is bh1 which contains i_size */
-            if (!buffer_mapped)
-              get_block() /* returns bh with only UNWRITTEN and MAPPED */
-            /* second iteration, bh2 */
-            if (!buffer_mapped)
-              get_block() /* we fail here, could be ENOSPC */
-          }
-          if (err)
-            /*
-             * this would zero out all new buffers and mark them uptodate.
-             * Since bh1 was never marked new, we skip it here which causes
-             * the bug later.
-             */
-            folio_zero_new_buffers();
-      /* ext4_wrte_begin() error handling */
-      ext4_truncate_failed_write()
-        ext4_truncate()
-          ext4_block_truncate_page()
-            __ext4_block_zero_page_range()
-              if(!buffer_uptodate())
-                ext4_read_bh_lock()
-                  ext4_read_bh() -> ... ext4_submit_bh_wbc()
-                    BUG_ON(buffer_unwritten(bh)); /* !!! */
-
-2. The second issue is stale data exposure with page size >= blocksize
-with dioread_nolock. The conditions needed for it to happen are same as
-the previous issue ie dioread_nolock around ENOSPC condition. The issue
-is also similar where in __block_write_begin_int() when we call
-ext4_get_block_unwritten() on the buffer_head and the underlying extent
-is unwritten, we get an unwritten and mapped buffer head. Since it is
-not new, we never zero out the partial range which is not under write,
-thus writing stale data to disk. This can be easily observed with the
-following reproducer:
-
- fallocate -l 4k testfile
- xfs_io -c "pwrite 2k 2k" testfile
- # hexdump output will have stale data in from byte 0 to 2k in testfile
- hexdump -C testfile
-
-NOTE: To trigger this, we need dioread_nolock enabled and write happening via
-ext4_write_begin(), which is usually used when we have -o nodealloc. Since
-dioread_nolock is disabled with nodelalloc, the only alternate way to call
-ext4_write_begin() is to ensure that delayed alloc switches to nodelalloc ie
-ext4_da_write_begin() calls ext4_write_begin(). This will usually happen when
-ext4 is almost full like the way generic/269 was triggering it in Issue 1 above.
-This might make the issue harder to hit. Hence, for reliable replication, I used
-the below patch to temporarily allow dioread_nolock with nodelalloc and then
-mount the disk with -o nodealloc,dioread_nolock. With this you can hit the stale
-data issue 100% of times:
-
-@@ -508,8 +508,8 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
- 	if (ext4_should_journal_data(inode))
- 		return 0;
- 	/* temporary fix to prevent generic/422 test failures */
--	if (!test_opt(inode->i_sb, DELALLOC))
--		return 0;
-+	// if (!test_opt(inode->i_sb, DELALLOC))
-+	//	return 0;
- 	return 1;
- }
-
-After applying this patch to mark buffer as NEW, both the above issues are
-fixed.
-
-Signed-off-by: Ojaswin Mujoo
-Cc: stable@kernel.org
-Reviewed-by: Jan Kara
-Reviewed-by: "Ritesh Harjani (IBM)"
-Link: https://lore.kernel.org/r/d0ed09d70a9733fbb5349c5c7b125caac186ecdf.1695033645.git.ojaswin@linux.ibm.com
-Signed-off-by: Theodore Ts'o
-Signed-off-by: Sasha Levin
----
- fs/ext4/inode.c | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
-
-diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
-index 949c5189c9be8..d0df0cd3fecff 100644
---- a/fs/ext4/inode.c
-+++ b/fs/ext4/inode.c
-@@ -828,10 +828,22 @@ int ext4_get_block(struct inode *inode, sector_t iblock,
- int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
- 			     struct buffer_head *bh_result, int create)
- {
-+	int ret = 0;
-+
- 	ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n",
- 		   inode->i_ino, create);
--	return _ext4_get_block(inode, iblock, bh_result,
-+	ret = _ext4_get_block(inode, iblock, bh_result,
- 			       EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
-+
-+	/*
-+	 * If the buffer is marked unwritten, mark it as new to make sure it is
-+	 * zeroed out correctly in case of partial writes. Otherwise, there is
-+	 * a chance of stale data getting exposed.
-+	 */
-+	if (ret == 0 && buffer_unwritten(bh_result))
-+		set_buffer_new(bh_result);
-+
-+	return ret;
- }
- 
- /* Maximum number of blocks we map for direct IO at once. */
--- 
-2.42.0
-
diff --git a/queue-4.19/series b/queue-4.19/series
index c8cfd9e372f..fad45a57ceb 100644
--- a/queue-4.19/series
+++ b/queue-4.19/series
@@ -47,7 +47,6 @@ ravb-fix-races-between-ravb_tx_timeout_work-and-net-.patch
 net-ravb-start-tx-queues-after-hw-initialization-suc.patch
 perf-intel-pt-adjust-sample-flags-for-vm-exit.patch
 perf-intel-pt-fix-async-branch-flags.patch
-ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
 smb3-fix-touch-h-of-symlink.patch
 pci-let-pci_disable_link_state-propagate-errors.patch
 pci-move-aspm-declarations-to-linux-pci.h.patch
diff --git a/queue-5.10/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch b/queue-5.10/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
deleted file mode 100644
index 3e29dfce9a8..00000000000
--- a/queue-5.10/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
+++ /dev/null
@@ -1,145 +0,0 @@
-From 831c8dbe1e4536ee9a7fbd5ab606712832f6eda9 Mon Sep 17 00:00:00 2001
-From: Sasha Levin
-Date: Mon, 18 Sep 2023 16:15:50 +0530
-Subject: ext4: mark buffer new if it is unwritten to avoid stale data exposure
-
-From: Ojaswin Mujoo
-
-[ Upstream commit 2cd8bdb5efc1e0d5b11a4b7ba6b922fd2736a87f ]
-
-** Short Version **
-
-In ext4 with dioread_nolock, we could have a scenario where the bh returned by
-get_blocks (ext4_get_block_unwritten()) in __block_write_begin_int() has
-UNWRITTEN and MAPPED flag set. Since such a bh does not have NEW flag set we
-never zero out the range of bh that is not under write, causing whatever stale
-data is present in the folio at that time to be written out to disk. To fix this
-mark the buffer as new, in case it is unwritten, in ext4_get_block_unwritten().
-
-** Long Version **
-
-The issue mentioned above was resulting in two different bugs:
-
-1. On block size < page size case in ext4, generic/269 was reliably
-failing with dioread_nolock. The state of the write was as follows:
-
-  * The write was extending i_size.
-  * The last block of the file was fallocated and had an unwritten extent
-  * We were near ENOSPC and hence we were switching to non-delayed alloc
-    allocation.
-
-In this case, the back trace that triggers the bug is as follows:
-
-  ext4_da_write_begin()
-    /* switch to nodelalloc due to low space */
-    ext4_write_begin()
-      ext4_should_dioread_nolock() // true since mount flags still have delalloc
-      __block_write_begin(..., ext4_get_block_unwritten)
-        __block_write_begin_int()
-          for(each buffer head in page) {
-            /* first iteration, this is bh1 which contains i_size */
-            if (!buffer_mapped)
-              get_block() /* returns bh with only UNWRITTEN and MAPPED */
-            /* second iteration, bh2 */
-            if (!buffer_mapped)
-              get_block() /* we fail here, could be ENOSPC */
-          }
-          if (err)
-            /*
-             * this would zero out all new buffers and mark them uptodate.
-             * Since bh1 was never marked new, we skip it here which causes
-             * the bug later.
-             */
-            folio_zero_new_buffers();
-      /* ext4_wrte_begin() error handling */
-      ext4_truncate_failed_write()
-        ext4_truncate()
-          ext4_block_truncate_page()
-            __ext4_block_zero_page_range()
-              if(!buffer_uptodate())
-                ext4_read_bh_lock()
-                  ext4_read_bh() -> ... ext4_submit_bh_wbc()
-                    BUG_ON(buffer_unwritten(bh)); /* !!! */
-
-2. The second issue is stale data exposure with page size >= blocksize
-with dioread_nolock. The conditions needed for it to happen are same as
-the previous issue ie dioread_nolock around ENOSPC condition. The issue
-is also similar where in __block_write_begin_int() when we call
-ext4_get_block_unwritten() on the buffer_head and the underlying extent
-is unwritten, we get an unwritten and mapped buffer head. Since it is
-not new, we never zero out the partial range which is not under write,
-thus writing stale data to disk. This can be easily observed with the
-following reproducer:
-
- fallocate -l 4k testfile
- xfs_io -c "pwrite 2k 2k" testfile
- # hexdump output will have stale data in from byte 0 to 2k in testfile
- hexdump -C testfile
-
-NOTE: To trigger this, we need dioread_nolock enabled and write happening via
-ext4_write_begin(), which is usually used when we have -o nodealloc. Since
-dioread_nolock is disabled with nodelalloc, the only alternate way to call
-ext4_write_begin() is to ensure that delayed alloc switches to nodelalloc ie
-ext4_da_write_begin() calls ext4_write_begin(). This will usually happen when
-ext4 is almost full like the way generic/269 was triggering it in Issue 1 above.
-This might make the issue harder to hit. Hence, for reliable replication, I used
-the below patch to temporarily allow dioread_nolock with nodelalloc and then
-mount the disk with -o nodealloc,dioread_nolock. With this you can hit the stale
-data issue 100% of times:
-
-@@ -508,8 +508,8 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
- 	if (ext4_should_journal_data(inode))
- 		return 0;
- 	/* temporary fix to prevent generic/422 test failures */
--	if (!test_opt(inode->i_sb, DELALLOC))
--		return 0;
-+	// if (!test_opt(inode->i_sb, DELALLOC))
-+	//	return 0;
- 	return 1;
- }
-
-After applying this patch to mark buffer as NEW, both the above issues are
-fixed.
-
-Signed-off-by: Ojaswin Mujoo
-Cc: stable@kernel.org
-Reviewed-by: Jan Kara
-Reviewed-by: "Ritesh Harjani (IBM)"
-Link: https://lore.kernel.org/r/d0ed09d70a9733fbb5349c5c7b125caac186ecdf.1695033645.git.ojaswin@linux.ibm.com
-Signed-off-by: Theodore Ts'o
-Signed-off-by: Sasha Levin
----
- fs/ext4/inode.c | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
-
-diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
-index 045f0efb696ad..61739f62e2b3c 100644
---- a/fs/ext4/inode.c
-+++ b/fs/ext4/inode.c
-@@ -818,10 +818,22 @@ int ext4_get_block(struct inode *inode, sector_t iblock,
- int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
- 			     struct buffer_head *bh_result, int create)
- {
-+	int ret = 0;
-+
- 	ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n",
- 		   inode->i_ino, create);
--	return _ext4_get_block(inode, iblock, bh_result,
-+	ret = _ext4_get_block(inode, iblock, bh_result,
- 			       EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
-+
-+	/*
-+	 * If the buffer is marked unwritten, mark it as new to make sure it is
-+	 * zeroed out correctly in case of partial writes. Otherwise, there is
-+	 * a chance of stale data getting exposed.
-+	 */
-+	if (ret == 0 && buffer_unwritten(bh_result))
-+		set_buffer_new(bh_result);
-+
-+	return ret;
- }
- 
- /* Maximum number of blocks we map for direct IO at once. */
--- 
-2.42.0
-
diff --git a/queue-5.10/series b/queue-5.10/series
index d7bbd7ccf9c..a812cb7a672 100644
--- a/queue-5.10/series
+++ b/queue-5.10/series
@@ -102,7 +102,6 @@ net-ravb-use-pm_runtime_resume_and_get.patch
 net-ravb-start-tx-queues-after-hw-initialization-suc.patch
 perf-intel-pt-adjust-sample-flags-for-vm-exit.patch
 perf-intel-pt-fix-async-branch-flags.patch
-ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
 smb3-fix-touch-h-of-symlink.patch
 asoc-intel-move-soc_intel_is_foo-helpers-to-a-generi.patch
 asoc-sof-sof-pci-dev-use-community-key-on-all-up-boa.patch
diff --git a/queue-5.15/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch b/queue-5.15/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
deleted file mode 100644
index e4c3b0980a9..00000000000
--- a/queue-5.15/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
+++ /dev/null
@@ -1,145 +0,0 @@
-From a23d2f2cef58f4c6e8e837657dc416a0ac609a76 Mon Sep 17 00:00:00 2001
-From: Sasha Levin
-Date: Mon, 18 Sep 2023 16:15:50 +0530
-Subject: ext4: mark buffer new if it is unwritten to avoid stale data exposure
-
-From: Ojaswin Mujoo
-
-[ Upstream commit 2cd8bdb5efc1e0d5b11a4b7ba6b922fd2736a87f ]
-
-** Short Version **
-
-In ext4 with dioread_nolock, we could have a scenario where the bh returned by
-get_blocks (ext4_get_block_unwritten()) in __block_write_begin_int() has
-UNWRITTEN and MAPPED flag set. Since such a bh does not have NEW flag set we
-never zero out the range of bh that is not under write, causing whatever stale
-data is present in the folio at that time to be written out to disk. To fix this
-mark the buffer as new, in case it is unwritten, in ext4_get_block_unwritten().
-
-** Long Version **
-
-The issue mentioned above was resulting in two different bugs:
-
-1. On block size < page size case in ext4, generic/269 was reliably
-failing with dioread_nolock. The state of the write was as follows:
-
-  * The write was extending i_size.
-  * The last block of the file was fallocated and had an unwritten extent
-  * We were near ENOSPC and hence we were switching to non-delayed alloc
-    allocation.
-
-In this case, the back trace that triggers the bug is as follows:
-
-  ext4_da_write_begin()
-    /* switch to nodelalloc due to low space */
-    ext4_write_begin()
-      ext4_should_dioread_nolock() // true since mount flags still have delalloc
-      __block_write_begin(..., ext4_get_block_unwritten)
-        __block_write_begin_int()
-          for(each buffer head in page) {
-            /* first iteration, this is bh1 which contains i_size */
-            if (!buffer_mapped)
-              get_block() /* returns bh with only UNWRITTEN and MAPPED */
-            /* second iteration, bh2 */
-            if (!buffer_mapped)
-              get_block() /* we fail here, could be ENOSPC */
-          }
-          if (err)
-            /*
-             * this would zero out all new buffers and mark them uptodate.
-             * Since bh1 was never marked new, we skip it here which causes
-             * the bug later.
-             */
-            folio_zero_new_buffers();
-      /* ext4_wrte_begin() error handling */
-      ext4_truncate_failed_write()
-        ext4_truncate()
-          ext4_block_truncate_page()
-            __ext4_block_zero_page_range()
-              if(!buffer_uptodate())
-                ext4_read_bh_lock()
-                  ext4_read_bh() -> ... ext4_submit_bh_wbc()
-                    BUG_ON(buffer_unwritten(bh)); /* !!! */
-
-2. The second issue is stale data exposure with page size >= blocksize
-with dioread_nolock. The conditions needed for it to happen are same as
-the previous issue ie dioread_nolock around ENOSPC condition. The issue
-is also similar where in __block_write_begin_int() when we call
-ext4_get_block_unwritten() on the buffer_head and the underlying extent
-is unwritten, we get an unwritten and mapped buffer head. Since it is
-not new, we never zero out the partial range which is not under write,
-thus writing stale data to disk. This can be easily observed with the
-following reproducer:
-
- fallocate -l 4k testfile
- xfs_io -c "pwrite 2k 2k" testfile
- # hexdump output will have stale data in from byte 0 to 2k in testfile
- hexdump -C testfile
-
-NOTE: To trigger this, we need dioread_nolock enabled and write happening via
-ext4_write_begin(), which is usually used when we have -o nodealloc. Since
-dioread_nolock is disabled with nodelalloc, the only alternate way to call
-ext4_write_begin() is to ensure that delayed alloc switches to nodelalloc ie
-ext4_da_write_begin() calls ext4_write_begin(). This will usually happen when
-ext4 is almost full like the way generic/269 was triggering it in Issue 1 above.
-This might make the issue harder to hit. Hence, for reliable replication, I used
-the below patch to temporarily allow dioread_nolock with nodelalloc and then
-mount the disk with -o nodealloc,dioread_nolock. With this you can hit the stale
-data issue 100% of times:
-
-@@ -508,8 +508,8 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
- 	if (ext4_should_journal_data(inode))
- 		return 0;
- 	/* temporary fix to prevent generic/422 test failures */
--	if (!test_opt(inode->i_sb, DELALLOC))
--		return 0;
-+	// if (!test_opt(inode->i_sb, DELALLOC))
-+	//	return 0;
- 	return 1;
- }
-
-After applying this patch to mark buffer as NEW, both the above issues are
-fixed.
-
-Signed-off-by: Ojaswin Mujoo
-Cc: stable@kernel.org
-Reviewed-by: Jan Kara
-Reviewed-by: "Ritesh Harjani (IBM)"
-Link: https://lore.kernel.org/r/d0ed09d70a9733fbb5349c5c7b125caac186ecdf.1695033645.git.ojaswin@linux.ibm.com
-Signed-off-by: Theodore Ts'o
-Signed-off-by: Sasha Levin
----
- fs/ext4/inode.c | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
-
-diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
-index c3b0fc7580e83..2ec1796922871 100644
---- a/fs/ext4/inode.c
-+++ b/fs/ext4/inode.c
-@@ -818,10 +818,22 @@ int ext4_get_block(struct inode *inode, sector_t iblock,
- int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
- 			     struct buffer_head *bh_result, int create)
- {
-+	int ret = 0;
-+
- 	ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n",
- 		   inode->i_ino, create);
--	return _ext4_get_block(inode, iblock, bh_result,
-+	ret = _ext4_get_block(inode, iblock, bh_result,
- 			       EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
-+
-+	/*
-+	 * If the buffer is marked unwritten, mark it as new to make sure it is
-+	 * zeroed out correctly in case of partial writes. Otherwise, there is
-+	 * a chance of stale data getting exposed.
-+	 */
-+	if (ret == 0 && buffer_unwritten(bh_result))
-+		set_buffer_new(bh_result);
-+
-+	return ret;
- }
- 
- /* Maximum number of blocks we map for direct IO at once. */
--- 
-2.42.0
-
diff --git a/queue-5.15/series b/queue-5.15/series
index fa895cf984c..58f56434d60 100644
--- a/queue-5.15/series
+++ b/queue-5.15/series
@@ -45,7 +45,6 @@ ravb-separate-handling-of-irq-enable-disable-regs-in.patch
 ravb-support-separate-line0-desc-line1-err-and-line2.patch
 net-ravb-stop-dma-in-case-of-failures-on-ravb_open.patch
 perf-intel-pt-fix-async-branch-flags.patch
-ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
 selftests-resctrl-add-missing-spdx-license-to-makefi.patch
 selftests-resctrl-move-_gnu_source-define-into-makef.patch
 powerpc-pseries-iommu-enable_ddw-incorrectly-returns.patch
diff --git a/queue-5.4/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch b/queue-5.4/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
deleted file mode 100644
index 9526ab1a6de..00000000000
--- a/queue-5.4/ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
+++ /dev/null
@@ -1,145 +0,0 @@
-From eb57900d1090262be4fa3b51cd416e0fe82cd0c4 Mon Sep 17 00:00:00 2001
-From: Sasha Levin
-Date: Mon, 18 Sep 2023 16:15:50 +0530
-Subject: ext4: mark buffer new if it is unwritten to avoid stale data exposure
-
-From: Ojaswin Mujoo
-
-[ Upstream commit 2cd8bdb5efc1e0d5b11a4b7ba6b922fd2736a87f ]
-
-** Short Version **
-
-In ext4 with dioread_nolock, we could have a scenario where the bh returned by
-get_blocks (ext4_get_block_unwritten()) in __block_write_begin_int() has
-UNWRITTEN and MAPPED flag set. Since such a bh does not have NEW flag set we
-never zero out the range of bh that is not under write, causing whatever stale
-data is present in the folio at that time to be written out to disk. To fix this
-mark the buffer as new, in case it is unwritten, in ext4_get_block_unwritten().
-
-** Long Version **
-
-The issue mentioned above was resulting in two different bugs:
-
-1. On block size < page size case in ext4, generic/269 was reliably
-failing with dioread_nolock. The state of the write was as follows:
-
-  * The write was extending i_size.
-  * The last block of the file was fallocated and had an unwritten extent
-  * We were near ENOSPC and hence we were switching to non-delayed alloc
-    allocation.
-
-In this case, the back trace that triggers the bug is as follows:
-
-  ext4_da_write_begin()
-    /* switch to nodelalloc due to low space */
-    ext4_write_begin()
-      ext4_should_dioread_nolock() // true since mount flags still have delalloc
-      __block_write_begin(..., ext4_get_block_unwritten)
-        __block_write_begin_int()
-          for(each buffer head in page) {
-            /* first iteration, this is bh1 which contains i_size */
-            if (!buffer_mapped)
-              get_block() /* returns bh with only UNWRITTEN and MAPPED */
-            /* second iteration, bh2 */
-            if (!buffer_mapped)
-              get_block() /* we fail here, could be ENOSPC */
-          }
-          if (err)
-            /*
-             * this would zero out all new buffers and mark them uptodate.
-             * Since bh1 was never marked new, we skip it here which causes
-             * the bug later.
-             */
-            folio_zero_new_buffers();
-      /* ext4_wrte_begin() error handling */
-      ext4_truncate_failed_write()
-        ext4_truncate()
-          ext4_block_truncate_page()
-            __ext4_block_zero_page_range()
-              if(!buffer_uptodate())
-                ext4_read_bh_lock()
-                  ext4_read_bh() -> ... ext4_submit_bh_wbc()
-                    BUG_ON(buffer_unwritten(bh)); /* !!! */
-
-2. The second issue is stale data exposure with page size >= blocksize
-with dioread_nolock. The conditions needed for it to happen are same as
-the previous issue ie dioread_nolock around ENOSPC condition. The issue
-is also similar where in __block_write_begin_int() when we call
-ext4_get_block_unwritten() on the buffer_head and the underlying extent
-is unwritten, we get an unwritten and mapped buffer head. Since it is
-not new, we never zero out the partial range which is not under write,
-thus writing stale data to disk. This can be easily observed with the
-following reproducer:
-
- fallocate -l 4k testfile
- xfs_io -c "pwrite 2k 2k" testfile
- # hexdump output will have stale data in from byte 0 to 2k in testfile
- hexdump -C testfile
-
-NOTE: To trigger this, we need dioread_nolock enabled and write happening via
-ext4_write_begin(), which is usually used when we have -o nodealloc. Since
-dioread_nolock is disabled with nodelalloc, the only alternate way to call
-ext4_write_begin() is to ensure that delayed alloc switches to nodelalloc ie
-ext4_da_write_begin() calls ext4_write_begin(). This will usually happen when
-ext4 is almost full like the way generic/269 was triggering it in Issue 1 above.
-This might make the issue harder to hit. Hence, for reliable replication, I used
-the below patch to temporarily allow dioread_nolock with nodelalloc and then
-mount the disk with -o nodealloc,dioread_nolock. With this you can hit the stale
-data issue 100% of times:
-
-@@ -508,8 +508,8 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
- 	if (ext4_should_journal_data(inode))
- 		return 0;
- 	/* temporary fix to prevent generic/422 test failures */
--	if (!test_opt(inode->i_sb, DELALLOC))
--		return 0;
-+	// if (!test_opt(inode->i_sb, DELALLOC))
-+	//	return 0;
- 	return 1;
- }
-
-After applying this patch to mark buffer as NEW, both the above issues are
-fixed.
-
-Signed-off-by: Ojaswin Mujoo
-Cc: stable@kernel.org
-Reviewed-by: Jan Kara
-Reviewed-by: "Ritesh Harjani (IBM)"
-Link: https://lore.kernel.org/r/d0ed09d70a9733fbb5349c5c7b125caac186ecdf.1695033645.git.ojaswin@linux.ibm.com
-Signed-off-by: Theodore Ts'o
-Signed-off-by: Sasha Levin
----
- fs/ext4/inode.c | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
-
-diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
-index 9ca7db0c4039a..0847657400a92 100644
---- a/fs/ext4/inode.c
-+++ b/fs/ext4/inode.c
-@@ -827,10 +827,22 @@ int ext4_get_block(struct inode *inode, sector_t iblock,
- int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
- 			     struct buffer_head *bh_result, int create)
- {
-+	int ret = 0;
-+
- 	ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n",
- 		   inode->i_ino, create);
--	return _ext4_get_block(inode, iblock, bh_result,
-+	ret = _ext4_get_block(inode, iblock, bh_result,
- 			       EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
-+
-+	/*
-+	 * If the buffer is marked unwritten, mark it as new to make sure it is
-+	 * zeroed out correctly in case of partial writes. Otherwise, there is
-+	 * a chance of stale data getting exposed.
-+	 */
-+	if (ret == 0 && buffer_unwritten(bh_result))
-+		set_buffer_new(bh_result);
-+
-+	return ret;
- }
- 
- /* Maximum number of blocks we map for direct IO at once. */
--- 
-2.42.0
-
diff --git a/queue-5.4/series b/queue-5.4/series
index bae3ca58844..eca2ba23a07 100644
--- a/queue-5.4/series
+++ b/queue-5.4/series
@@ -71,7 +71,6 @@ net-ravb-use-pm_runtime_resume_and_get.patch
 net-ravb-start-tx-queues-after-hw-initialization-suc.patch
 perf-intel-pt-adjust-sample-flags-for-vm-exit.patch
 perf-intel-pt-fix-async-branch-flags.patch
-ext4-mark-buffer-new-if-it-is-unwritten-to-avoid-sta.patch
 smb3-fix-touch-h-of-symlink.patch
 s390-mm-fix-phys-vs-virt-confusion-in-mark_kernel_px.patch
 s390-cmma-fix-detection-of-dat-pages.patch