Pádraig Brady [Sat, 18 Oct 2025 16:44:49 +0000 (17:44 +0100)]
numfmt: support multi-byte --delimiter
* bootstrap.conf: Depend on mbsstr() to robustly search for a
multi-byte delimiter character (string) within a multi-byte string.
* src/numfmt.c (main): Accept a valid multi-byte delimiter character.
(next_field): Adjust delimiter search from single byte
to multi-byte aware. Use mbsstr to find the first match.
* tests/misc/numfmt.pl: Add test case.
* NEWS: Mention the improvement.
Pádraig Brady [Sat, 18 Oct 2025 12:02:44 +0000 (13:02 +0100)]
numfmt: use multi-byte aware suffix matching
* src/numfmt.c (process_suffixed_number): Use gnulib's
mbs_endswith() helper, which is more robust in non UTF-8 locales.
Also always output a devmsg if a suffix is specified.
Pádraig Brady [Fri, 17 Oct 2025 18:14:21 +0000 (19:14 +0100)]
numfmt: fix issues with multi-byte blanks
* src/numfmt.c (process_line): Restore byte overwritten with NUL,
as it may be part of a multi-byte blank.
(process_suffixed_number): Skip multi-byte blanks,
and correctly determine width with mbswidth().
(parse_format_string): Use c_isblank() to explicitly
indicate that's all the format spec supports.
* tests/misc/numfmt.pl: Add test cases.
* NEWS: Mention the bug fix.
Pádraig Brady [Thu, 9 Oct 2025 13:24:12 +0000 (14:24 +0100)]
numfmt: add --unit-separator
Output, accept, or disallow a string between the number and unit
as recommended in <https://physics.nist.gov/cuu/Units/checklist.html>
I.e. support outputting numbers of the form: "1234 M"
* src/numfmt.c (simple_strtod_human): Skip unit separator if present,
or disallow a unit separator if empty.
(double_to_human): Output unit separator if specified.
(main): Accept --unit-separator.
* tests/misc/numfmt.pl: Add test cases.
* doc/coreutils.texi: Describe the new option,
giving examples of interaction with --delimiter.
* NEWS: Mention the new feature.
* THANKS.in: Add Johannes Schauer Marin Rodrigues,
who provided a preliminary patch.
Pádraig Brady [Tue, 14 Oct 2025 20:06:03 +0000 (21:06 +0100)]
numfmt: support reading numbers with grouping characters
This does not validate grouping character placement,
and currently just ignores grouping characters.
* src/numfmt.c (simple_strtod_int): Skip grouping chars
that are part of a number.
* tests/misc/numfmt.pl: Add test cases.
* NEWS: Mention the improvement.
Pádraig Brady [Tue, 14 Oct 2025 15:17:56 +0000 (16:17 +0100)]
numfmt: support reading numbers with NBSP before unit
* src/numfmt.c (simple_strtod_human): Accept (multi-byte)
non-breaking space character between number and unit.
Note we restrict this to a single character between number
and unit, to allow less ambiguous parsing if multiple blanks
are used to delimit fields.
* tests/misc/numfmt.pl: Add test cases.
* doc/coreutils.texi (numfmt invocation): Fix stale description
--delimiter skipping whitespace.
* NEWS: Mention the improvement.
Nicolas Boichat [Sat, 26 Jul 2025 06:58:33 +0000 (14:58 +0800)]
tests: du/bigtime: try harder to find a suitable filesystem
* tests/du/bigtime.sh: At least on Linux, the ext4 filesystem
doesn't support such large timestamp, while tmpfs does. Try a bit
harder to look for a filesystem with large timestamp support.
Pádraig Brady [Tue, 7 Oct 2025 13:38:49 +0000 (14:38 +0100)]
maint: cksum: document a base64/hex parsing ambiguity with untagged
* src/digest.c (split_3): Mention the ambiguity in misinterpreting
base64 characters as hex is not a practical consideration.
Also add an example of both tagged formats which makes it
easier to interpret the parsing logic.
Pádraig Brady [Mon, 6 Oct 2025 18:41:24 +0000 (19:41 +0100)]
cksum: fix --check with untagged base64 format with tag matches
* src/digest.c (split_3): Fallback to untagged matching in the
case where -a is specified and we have matched a TAG in
the possibly base64 data. This might happen in 1 in every 64K files.
Note we remove the modification of string S (and redundant streq) in
the tag matching, as that was not needed since v8.32-223-g217cd278e.
* tests/cksum/cksum-c.sh: Add a test case.
* NEWS: Mention the bug fix.
Pádraig Brady [Mon, 6 Oct 2025 15:32:26 +0000 (16:32 +0100)]
cksum: fix length validation with SHA2- tagged format
* src/digest.c (sha2_sum_stream): Change from unreachable()
to affirm() so that we have defined behavior unless
we configure with --disable-assert.
(sha3_sum_stream): Likewise.
(split_3): Validate SHA2-lengths before passing on.
* tests/cksum/cksum-c.sh: Add a test case.
* NEWS: Mention the bug fix.
Pádraig Brady [Mon, 6 Oct 2025 12:23:55 +0000 (13:23 +0100)]
cksum: fix --check with --algorithm=sha2
* src/digest.c (split_3): Look up the provided tag with -a sha2
because there is not a 1:1 mapping between them.
* tests/cksum/cksum-c.sh: Add a test case.
* NEWS: Mention the bug fix.
Paul Eggert [Mon, 6 Oct 2025 20:31:48 +0000 (13:31 -0700)]
rm: make ‘rm -d DIR’ more like ‘rmdir DIR’
* src/remove.c (rm_fts): When not recursive,
arrange for ‘rm -d DIR’ to behave more like ‘rmdir DIR’.
This works better for Ceph snapshot directories.
Problem reported by Yannick Le Pennec (bug#78245).
Collin Funk [Sun, 5 Oct 2025 00:18:01 +0000 (17:18 -0700)]
cksum: allow -a {blake2b,sha2,sha3} --check to work on base64
* NEWS: Mention the bug.
* src/digest.c (split_3): Check that the base64 digest matches the
length supported by the algorithm.
(digest_check): Check that the read digest matches the base64 length of
the algorithm's digest. The previous condition would not work for
'cksum -a blake2b -l 8 ...'.
* tests/cksum/cksum-base64-untagged.sh: New file.
* tests/local.mk (all_tests): Add the new test.
Paul Eggert [Sat, 4 Oct 2025 00:04:01 +0000 (17:04 -0700)]
maint: omit trailing white space in config.h
* configure.ac (FORTIFY_SOURCE): Don’t indent a line
where the indentation can cause trailing white space in config.h.
Problem reported by Grisha Levit (Bug#79567).
Pádraig Brady [Thu, 2 Oct 2025 16:20:39 +0000 (17:20 +0100)]
tests: factor: add suggested large prime tests
* tests/factor/create-test.sh: Add 2 new large primes from:
https://github.com/coreutils/coreutils/issues/65
* tests/local.mk: Reference the 2 new generated tests.
* man/help2man: Relax the option match so more --option
variations are supported, and passed through to convert_option().
Specifically more variations after '=' are now supported.
Also split and document the regular expression.
Reported at https://github.com/coreutils/coreutils/issues/84
fold: move multi-byte character reading to a module
* gl/modules/mbbuf: New file.
* gl/lib/mbbuf.c: Likewise.
* gl/lib/mbbuf.h: Likewise.
* gl/local.mk (EXTRA_DIST): Add the new files.
* bootstrap.conf (gnulib_modules): Add mbbuf.
* src/fold.c: Include mbbuf.h.
(fold_file): Use the mbbuf functions instead of calling fread and
handling the input buffer ourselves.
* cfg.mk (exclude_file_name_regexp--sc_preprocessor_indentation)
(exclude_file_name_regexp--sc_GPL_version): Match gl/lib/mbbuf.c and
gl/lib/mbbuf.h.
* configure.ac: Add detection of AVX512 intrinsics for wc.
* src/local.mk: Build AVX512 wc libraries.
* src/wc.c: Add runtime detection of AVX512 intrinsics and call
appropriate function when detected.
* src/wc.h (wc_lines_avx512): Declare function.
* tests/wc/wc-cpu.sh: Add a test that disables AVX512 intrinsics.
* src/wc_avx512.c: New file containing the wc -l implementation using
AVX512. The logic and code is reused from the AVX2 implementation with
slight adaptations. Replaced __builtin_popcount by __builtin_popcountll
and the combination of _mm256_cmpeq_epi8 and _mm256_movemask_epi8 by a
single call to _mm512_cmpeq_epi8_mask.
* NEWS: Mention the improvement.
Bernhard Voelker [Thu, 25 Sep 2025 20:50:01 +0000 (22:50 +0200)]
join: remove unused #include "argmatch.h"
Prompted by the syntax-check failure:
maint.mk: the above files include argmatch.h but don't use it
make: *** [maint.mk:741: sc_prohibit_argmatch_without_use] Error 1
* src/join.c: Remove include, as the previous commit changed from using
ARRAY_CARDINALITY to countof - for which system.h includes the header.
Hannes Braun [Wed, 24 Sep 2025 19:20:49 +0000 (21:20 +0200)]
tail: fix tailing larger number of lines in regular files
* src/tail.c (file_lines): Seek to the previous block instead of the
beginning (or a little before) of the block that was just scanned.
Otherwise, the same block is read and scanned (at least partially)
again. This bug was introduced by commit v9.7-219-g976f8abc1.
* tests/tail/basic-seek.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: mention the bug fix.
* src/local.mk: Due to gnulib adjustments, this explicit
dependency is required with the mold linker at least.
Reported at https://github.com/coreutils/coreutils/issues/113
basenc: --base58: fix buffer overflow with input > 15MB
base58_length() operated naively on an int
which resulted in an overflow to a negative number
for any input > 2^31-1/138, i.e. 15,561,475 bytes.
* src/basenc.c (base_length): Change input and output
parameter types from int to idx_t since this needs to
cater for the full input size in the base58 case.
(base58_length): Likewise. Also reorder the calculation
to be less exact, but doing the division first
to minimize the chance of overflow (which now on 64 bit
would only happen for inputs > around 6 Exa bytes).
* tests/basenc/basenc-large.sh: Add a new test,
that triggers with valgrind or ASAN.
* tests/local.mk: Reference the new test.
* NEWS: Mention the bug fix.
* src/cksum.c (pclmul_supported): In the case where
gnulib has pclmul enabled but coreutils does not,
ensure we don't reference the coreutils pclmul function.
Addresses https://bugs.gnu.org/79491
tests: ls: avoid alignment check with non printable characters
* tests/ls/block-size.sh: Skip the case where there are
non-printable characters in ls' output, which is the case
with NBSP thousands separators on FreeBSD 11 and 12.
We may drop the MBSW_REJECT_UNPRINTABLE in future from
ls and numfmt, but for now avoid these characters in the test.
Reported by Bruno Haible.
* tests/du/move-dir-while-traversing.sh: Expand the work to avoid
a false failure where du completes before the directory is moved.
Also expand the timeout to our more standard 10s to avoid the
"directory mover" being killed before du processes the directory.
This doesn't perceptibly impact the run time of the test.
Reported by Bruno Haible on a CentOS 7 system.
* configure.ac: Check for statx using gl_CHECK_FUNCS_ANDROID since it is
hidden for Android API level <= 30.
* m4/jm-macros.m4 (coreutils_MACROS): Check for syncfs using
gl_CHECK_FUNCS_ANDROID since it is hidden for Android API level <= 28.
* tests/fold/fold-zero-width.sh: Check the shell was able to create
the redirection file, as intermittently on CentOS 5,6,7 this wasn't
the case, with the shell giving an xmalloc failure due to the ulimit.
Reported by William Bader and Bruno Haible.
* tests/tail/inotify-dir-recreate.sh: Add an extra check
that inotify is in use, as it's required for this test.
Inotify is avoided with overlayfs for which the
df --local check is not sufficient exclusion for.
* tests/fold/fold-characters.sh: Ensure we have independent verification
of the width of characters before testing based on those widths.
* tests/fold/fold-zero-width.sh: Likewise.
* tests/fold/fold.pl: Only compare the exit status,
as the error message can be translated.
* tests/fold/fold-zero-width.sh: Increase vm limit to avoid
failures on CentOS 5,6,7. Match the limit used in write-errors.sh
as per commit v9.5-255-g0bd149403
tests: env: skip a few tests if LD_LIBRARY_PATH is set
* tests/env/env-null.sh: Skip test if LD_LIBRARY_PATH or platform's
equivalent is set, since 'env -i' will unset it which may prevent
programs from running.
* tests/env/env-S.pl: Likewise.
Issue and suggested fix reported by Bruno Haible.
Paul Eggert [Fri, 19 Sep 2025 15:55:12 +0000 (08:55 -0700)]
copy: pacify older clang
* src/copy-file-data.c (copy_file_data):
Redo conditionals for clarity.
This pacifies the clang in FreeBSD 11 and OpenBSD 7.6.
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/coreutils/2025-09/msg00104.html
* src/dd.c (v): Add the O_EXCL flag to the set of bits that we do not
want to use for our definitions.
* cfg.mk (sc_dd_O_FLAGS): Adjust to pass syntax-check.
* NEWS: Mention the fix.
Reported by Bruno Haible.
tests: tail/overlay-headers.sh: protect against hang
* tests/tail/overlay-headers.sh: Protect tail invocation with timeout,
and extend the possible running period to 60 seconds like other tests.
Reported by Brudno Haible on T2SDE Linux/alpha
* tests/tail/wait.sh: This test was seen to hang occasionally
on an Alpine Linux 3.20 system, so protect the tail(1) call
with `timeout 60` as done in similar tests.
Reported by Bruno Haible.
* src/local.mk: Explicitly depend on @INTL_MACOS_LIBS@
which may be not implicitly referenced (in LIBINTL) without gettext.
This is a new transitive dependency through localename-unsafe.
We add this globally to ease future maintenance as currently
6 commands require the localename-unsafe dependency:
date, du through show-date() (fprintftime), and
ls, pr, stat, uptime through strftime().
tests: write-errors.sh: avoid portability issue with dash
* tests/misc/write-errors.sh: Use printf rather than echo
since the echo builtin in dash will interpret backslashes.
* tests/misc/read-errors.sh: Likewise for consistency.
cksum,wc: inspect GLIBC_TUNABLES on all architectures
* gnulib: Update gnulib to latest to pull in change
to inspect GLIBC_TUNABLES on non x86_64 or aarch64 platforms.
This was needed on 32 bit x86, which supports calling into
accelerated code, and so requires inspection of GLIBC_TUNABLES
to pass the disablement verification in tests/cksum/cksum.sh
Paul Eggert [Wed, 17 Sep 2025 16:12:23 +0000 (09:12 -0700)]
maint: STREQ → streq
Use new Gnulib streq function instead of rolling our own macro.
* bootstrap.conf (gnulib_modules): Add stringeq.
* src/rm.c (main): Don’t assume streq is a macro that expands to (...),
as it is now a function.
* src/system.h:
* tests/df/no-mtab-status.sh, tests/df/skip-duplicates.sh:
(STREQ): Remove. All uses replaced by streq.
* tests/fold/fold-characters.sh: Move memory limit test to ...
* tests/misc/write-errors.sh: ... which avoids "write error"
messages on stderr due to the ignored SIGPIPE. It also protects
the fold invocation with a timeout(1) so that fold implementations
that don't exit promptly upon write error don't hang the test suite
(Like we would have done before commit v9.7-311-gc95c7ee76).
maint: remove unnecessary uintmaxtostr usage in printf
* src/comm.c (compare_files): Use the "%ju" printf directive instead of
"%s" and printing the result of umaxtostr.
* src/df.c (get_header): Likewise.
* src/ls.c (print_long_format): Likewise.
* src/sort.c (check): Likewise.
* NEWS: Mention the new GLIBC_TUNABLES feature.
* doc/coreutils.texi (Hardware Acceleration): A new node
detailing the build time and run time configuration options.
fold: fix write error checks with invalid multi-byte input
* src/fold.c (write_out): A new helper to check all writes.
(fold-file): Use write_out() for all writes.
* tests/fold/fold-zero-width.sh: Adjust to writing more
data in various patterns, rather than two buffers of NULs.
This is a more robust memory bound check, and the '\303' case
tests this particular logic change.
* NEWS: fold now exits immediately, not just promptly.
* NEWS: Mention the improvement.
* src/fold.c (fold_file): Check for write errors
after each buffer read from stdin.
* tests/misc/write-errors.sh: Add test cases.
Paul Eggert [Wed, 6 Aug 2025 19:11:34 +0000 (12:11 -0700)]
cp: prefer signed types in copy-file-data.c
* src/copy-file-data.c (sparse_copy, lseek_copy)
(copy_file_data): Prefer idx_t to size_t.
(copy_file_data): Defend against st_size < 0, which can happen on
ancient buggy platforms. Check for overflow; the old code was
wrong on theoretical-but-valid hosts where SIZE_MAX <= INT_MAX.
Paul Eggert [Wed, 6 Aug 2025 18:48:21 +0000 (11:48 -0700)]
cp: don’t allocate a separate zero buffer
* src/copy-file-data.c (write_zeros): New args abuf, buf_size.
Use the lazily-allocated buffer, which most likely already exists,
rather than allocating a separate buffer just for zeros.
This makes the code more reentrant as there is no longer
a need for static storage here. Although there is some CPU
overhead due to the need to zero out the buffer for each file,
the overhead is relatively small as the buffer is smallish
and should be cached. All callers changed.
Paul Eggert [Wed, 6 Aug 2025 16:55:41 +0000 (09:55 -0700)]
cp: refactor copying to return bytes copied
This doesn’t change behavior; it simplifies future changes.
* src/copy-file-data.c (sparse_copy, lseek_copy, copy_file_data):
Return the number of bytes copied, or -1 on failure,
instead of merely returning a success indication.
All callers changed.
Paul Eggert [Wed, 6 Aug 2025 14:53:35 +0000 (07:53 -0700)]
cp: copy_file_data now supports ibytes
This does not affect current coreutils behavior;
it is merely to help make copy_file_data more useful in the future.
* src/copy-file-data.c (lseek_copy): New arg ibytes.
Caller changed.
(copy_file_data): Implement the ibytes arg; formerly
it was always treated as COUNT_MAX, though all callers
currently pass COUNT_MAX so there was no problem in practice.
Paul Eggert [Tue, 5 Aug 2025 22:45:35 +0000 (15:45 -0700)]
cp: port better to old limited hosts
Port better ancient platforms where OFF_T_MAX is only 2**31 - 1,
but some devices have more than that many bytes.
* src/copy-file-data.c (copy_file_data): Byte count is now
count_t, not off_t. All callers changed. Since we need to check
for overflow anyway, also check for too-small calls to fadvise.
Paul Eggert [Tue, 5 Aug 2025 21:45:18 +0000 (14:45 -0700)]
cp: refactor out data copying
* po/POTFILES.in, src/local.mk (copy_sources): Add the new file.
* src/copy.c: Move the #includes of alignalloc.h, buffer-lcm.h,
fadvise.h, full-write.h, ioblksize.h to copy-file-data.c.
(enum copy_debug_val, struct copy_debug): Move these decls to copy.h.
(punch_hole, create_hole, is_CLONENOTSUP, sparse_copy)
(write_zeros, lseek_copy, HAVE_STRUCT_STAT_ST_BLOCKS)
(enum scantype, struct scan_inference, infer_scantype):
Move to copy-file-data.c.
(copy_reg): Move the data-copying part of this function
to the new function copy_file_data in copy-file-data.c.
* src/copy-file-data.c: New file, taken from part of copy.c.
Paul Eggert [Tue, 5 Aug 2025 19:47:16 +0000 (12:47 -0700)]
cp: refactor src/copy.c
This is in preparation for splitting this large module.
* src/copy.c (sparse_copy, lseek_copy): New arg DEBUG, used to
identify copy debug info instead of using a static var. All
callers changed.
(lseek_copy, infer_scantype): New args SRC_POS and POS.
Callers changed.
(copy_file_data): New function, with contents taken from copy.
(copy_reg): Call it.
doc: NEWS: mention fold can operate on very long lines
* NEWS: Before commit fb9016d50 (fold: use fread instead of getline,
2025-08-24), fold required that the maximum line size in a file fit into
memory. Document that this is no longer the case.
fold: fix out of bounds write with zero width characters
* src/fold.c (fold_file): Prefer putchar ('\n') to copying characters.
If we do not have room in the output buffer print it since it is not a
full line of text.
* tests/fold/fold-zero-width.sh: New test case.
* tests/local.mk (all_tests): Add it.