From: Pádraig Brady
Date: Thu, 30 Oct 2025 13:02:48 +0000 (+0000)
Subject: copy: don't avoid copy-offload upon SEEK_HOLE indicating non-sparse
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=64b8fdb5b4767e0f833486507c3eae46ed1b40f8;p=thirdparty%2Fcoreutils.git

copy: don't avoid copy-offload upon SEEK_HOLE indicating non-sparse

* src/copy-file-data.c (infer_scantype): Fall back to a plain copy
if SEEK_HOLE indicates non-sparse, as zero copy avoids copy offload.
This was seen with transparently compressed files on OpenZFS.
* tests/cp/sparse-perf.sh: Add a test case even though it might only
trigger on compressed file systems that don't support reflink.
* NEWS: Mention the bug fix.
Addresses https://github.com/coreutils/coreutils/issues/122
---

diff --git a/NEWS b/NEWS
index ee0213fb78..53e5b387c2 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,10 @@ GNU coreutils NEWS -*- outline -*-
   Also non standard SHA2 tags with a bad length resulted in undefined behavior.
   [bug introduced in coreutils-9.8]
 
+  'cp' restores performance with transparently compressed files, which
+  regressed due to the avoidance of copy offload, seen with OpenZFS at least.
+  [bug introduced in coreutils-9.8]
+
   'numfmt' no longer reads out-of-bounds memory with trailing blanks in input.
   [bug introduced with numfmt in coreutils-8.21]
 
@@ -42,6 +46,10 @@ GNU coreutils NEWS -*- outline -*-
 
 ** Changes in behavior
 
+  'cp' with default options may again, like with versions before v9.8,
+  miss opportunities to create holes with file systems like squashfs,
+  that support SEEK_HOLE only trivially.
+
   'sort --compress-program' will continue without compressing temporary files
   if the specified program cannot be executed.  Also malformed shell scripts
   without a "shebang line" will no longer be executed.
diff --git a/src/copy-file-data.c b/src/copy-file-data.c
index 9eb6f47244..8fd25fee92 100644
--- a/src/copy-file-data.c
+++ b/src/copy-file-data.c
@@ -481,12 +481,19 @@ infer_scantype (int fd, struct stat const *sb, off_t pos,
           if (scan_inference->hole_start < sb->st_size)
             return LSEEK_SCANTYPE;
 
-          /* Though the file likely has holes, SEEK_DATA and SEEK_HOLE
+          /* Though the file may have holes, SEEK_DATA and SEEK_HOLE
              didn't find any.  This can happen with file systems like
              circa-2025 squashfs that support SEEK_HOLE only trivially.
-             Fall back on ZERO_SCANTYPE.  */
+             This can also happen due to transparent file compression,
+             which can also indicate fewer than the usual number of blocks.  */
+
           if (lseek (fd, pos, SEEK_SET) < 0)
             return ERROR_SCANTYPE;
+
+          /* we prefer to return PLAIN_SCANTYPE here so that copy offload
+             continues to be used.  Falling through to ZERO_SCANTYPE would be
+             less performant in the compressed file case.  */
+          return PLAIN_SCANTYPE;
         }
     }
   else if (pos < scan_inference->ext_start || errno == ENXIO)
diff --git a/tests/cp/sparse-perf.sh b/tests/cp/sparse-perf.sh
index 5a283c1fe6..5ee984c527 100755
--- a/tests/cp/sparse-perf.sh
+++ b/tests/cp/sparse-perf.sh
@@ -35,6 +35,16 @@ cmp $other_partition_sparse k2 || fail=1
 grep ': avoided' cp.out && { cat cp.out; fail=1; }
 
+# Create a large-non-sparse-but-compressible file
+# Ensure we don't avoid copy offload which we saw with
+# transparent compression on OpenZFS at least
+# (as that triggers our sparse heuristic).
+mls='might-look-sparse'
+yes | head -n1M > "$mls" || framework_failure_
+cp --debug "$mls" "$mls.cp" >cp.out || fail=1
+cmp "$mls" "$mls.cp" || fail=1
+grep ': avoided' cp.out && { cat cp.out; fail=1; }
+
 # Create a large-but-sparse file on the current partition.
 # We disable relinking below, thus verifying SEEK_HOLE support
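
For readers who want the decision in infer_scantype in isolation, the following is a minimal standalone sketch of it. It is not the coreutils code: the names scan_kind, looks_sparse and pick_scan are hypothetical, the block-count heuristic assumes st_blocks is counted in 512-byte units, and the control flow is simplified. The point it illustrates is the one the patch makes: when the block count suggests sparseness but SEEK_DATA/SEEK_HOLE report no hole, which can happen with transparent compression, the result is a plain copy, so copy offload stays available.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* exposes SEEK_DATA and SEEK_HOLE on glibc */
#endif
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the scan types named in the patch.  */
enum scan_kind { SCAN_ERROR, SCAN_PLAIN, SCAN_LSEEK };

/* Heuristic: fewer allocated blocks than the size implies.
   Assumes st_blocks is in 512-byte units.  Transparent compression
   can also make a fully populated file satisfy this test.  */
static bool
looks_sparse (struct stat const *sb)
{
  return S_ISREG (sb->st_mode) && sb->st_blocks < sb->st_size / 512;
}

/* Decide how to scan FD, whose copy starts at offset POS.  */
static enum scan_kind
pick_scan (int fd, struct stat const *sb, off_t pos)
{
  if (!looks_sparse (sb))
    return SCAN_PLAIN;

#ifdef SEEK_HOLE
  off_t data = lseek (fd, pos, SEEK_DATA);
  if (data < 0)
    {
      /* ENXIO means there is no data from POS to end of file.
         A failed lseek leaves the file offset unchanged.  */
      if (errno == ENXIO)
        return SCAN_LSEEK;
    }
  else
    {
      off_t hole = lseek (fd, data, SEEK_HOLE);
      bool found_hole = pos < data || (0 <= hole && hole < sb->st_size);

      /* The probes moved the file offset, so restore it.  */
      if (lseek (fd, pos, SEEK_SET) < 0)
        return SCAN_ERROR;
      if (found_hole)
        return SCAN_LSEEK;
    }
#else
  (void) fd;
  (void) pos;
#endif

  /* No hole was reported: prefer a plain copy over zero detection,
     so that copy offload can still be used.  */
  return SCAN_PLAIN;
}
```

A caller would fstat the source descriptor, pass the result to pick_scan along with the starting offset, and only walk extents when SCAN_LSEEK is returned.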