Pick up mbrto{c32,wc} optimizations on UTF-8 on GLIBC.
Note configure.ac defines the required GNULIB_WCHAR_SINGLE_LOCALE.
This speeds up wc -m by 2.6x when processing non-ASCII characters,
and will similarly speed up per character processing
in the impending cut multi-byte implementation.
* NEWS: Mention the wc -m speed improvement.
basename: avoid duplicate strlen calls on the suffix
$ ltrace -c ./src/basename-prev -s a $(seq 100000) > /dev/null
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 50.00   30.030316          75    400000 strlen
[...]
$ ltrace -c ./src/basename -s a $(seq 100000) > /dev/null
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 42.88   22.413953          74    300001 strlen
[...]
* src/basename.c (remove_suffix, perform_basename): Add a length
argument for the suffix and use it instead of strlen.
(main): Calculate the suffix length. Refactor code to avoid calling
perform_basename in multiple places.
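A minimal sketch of the idea, not the actual src/basename.c code: the caller measures the suffix once and passes the length down, so the helper never calls strlen on the suffix again. The name strip_suffix and its return value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for remove_suffix: strip SUFFIX, whose length
   SUFFIX_LEN was computed once by the caller, from NAME in place.
   Return true if the suffix was removed.  */
static bool
strip_suffix (char *name, const char *suffix, size_t suffix_len)
{
  assert (name != NULL && suffix != NULL);
  size_t name_len = strlen (name);

  /* Leave NAME alone unless it is strictly longer than the suffix,
     so the result is never empty.  */
  if (name_len <= suffix_len)
    return false;

  char *tail = name + name_len - suffix_len;
  if (memcmp (tail, suffix, suffix_len) != 0)
    return false;

  *tail = '\0';
  return true;
}
```

With many operands, the caller would compute strlen (suffix) once before its loop and reuse it for every name.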
Paul Eggert [Fri, 3 Apr 2026 01:53:34 +0000 (18:53 -0700)]
date: simplify -u by not calling putenv
* src/date.c (TZSET): Remove; no longer needed.
(main): Simplify -u’s implementation by passing "UTC0" to tzalloc,
rather than by setting TZ in the environment and then calling getenv.
The old way of doing things dates back to before we had tzalloc.
* configure.ac (LOCALTIME_CACHE): Remove; no longer needed.
Paul Eggert [Wed, 1 Apr 2026 18:56:18 +0000 (11:56 -0700)]
maint: avoid Gnulib modules mbiter, mbiterf
* bootstrap.conf (avoided_gnulib_modules): Avoid mbiter and
mbiterf, for the same reason we avoid mbuiter and mbuiterf: these
modules are not needed because (due to mcel-prefer) we use mcel in
preference to mbiter/mbiterf/mbuiter/mbuiterf.
tests: dd: ensure memory exhaustion is handled gracefully
* tests/dd/no-allocate.sh: Ensure we exit 1 upon mem allocation failure.
Also check other buffer size edge cases.
https://github.com/uutils/coreutils/issues/11436
https://github.com/uutils/coreutils/issues/11580
https://github.com/coreutils/coreutils/pull/235
tests: dd: avoid false failure with no controlling terminal
* tests/dd/misc.sh: 'test -w /dev/tty' is not a strong enough check;
we need to actually open /dev/tty to ensure it's available.
It's not available under setsid for example.
oech3 [Tue, 31 Mar 2026 06:57:58 +0000 (15:57 +0900)]
tests: dd: check that erroneous seeks are not done in output
* tests/dd/misc.sh: Add test case for of=/dev/tty.
The same occurs for /dev/stdout, but that varies
in the test harness so is best avoided.
https://github.com/coreutils/coreutils/pull/234
Collin Funk [Sat, 28 Mar 2026 05:45:56 +0000 (22:45 -0700)]
doc: tty: mention the removal of the -s option from POSIX
* doc/coreutils.texi (tty invocation): Mention that POSIX.1-2001 removed
the -s option and that portable scripts can redirect standard out to
/dev/null instead.
oech3 [Thu, 12 Mar 2026 09:33:08 +0000 (18:33 +0900)]
tests: env/env.sh: improve portability
* tests/env/env.sh: Make more portable by avoiding references to our
build dir, and avoiding names that may cause false matches in
multi-call binaries.
https://github.com/coreutils/coreutils/pull/216
oech3 [Mon, 23 Mar 2026 13:28:22 +0000 (22:28 +0900)]
tests: yes: support more zero-copy related syscalls
* tests/misc/yes.sh: Disable other related zero-copy syscalls
to ensure better testing of future or other implementations.
https://github.com/coreutils/coreutils/pull/227
Chris Down [Mon, 23 Mar 2026 07:55:53 +0000 (15:55 +0800)]
sort: speed up keyed field sorting significantly using memchr
When sort is invoked with an explicit field separator with `-t SEP`,
begfield() and limfield() scan for the separator to locate boundaries.
Right now the implementation there uses a loop that iterates over bytes
one by one, which is not ideal since we must scan past many bytes of
non-separator data one byte at a time.
Let's replace each of these loops with memchr(). On glibc systems,
memchr() uses SIMD to scan 16 bytes per step (NEON on aarch64) or 32
bytes per step (AVX2 on x86_64), rather than 1 byte at a time, so any
field longer than a handful of bytes stands to benefit quite
significantly.
Using the following input data:
  awk 'BEGIN {
    srand(42)
    for (i = 1; i <= 500000; i++)
      printf "%*d,%*d,%d\n", 4+int(rand()*9), 0,
             4+int(rand()*9), 0, int(rand()*10000)
  }' > short_csv_500k
  awk 'BEGIN {
    for (i = 1; i <= 500000; i++)
      printf "%100d,%100d,%d\n", 0, 0, int(rand()*10000)
  }' > wide_csv_500k
sort -t, -k3,3n (500K lines, 4-12 byte short fields):
Before: 123.1 ms After: 108.1 ms (-12.2%)
sort -t, -k3,3n (500K lines, 100 byte wide fields):
Before: 243.5 ms After: 165.9 ms (-31.9%)
sort (default, no -k, 500K lines):
Before: 141.6 ms After: 141.8 ms (unchanged)
And on M1 Pro aarch64 with -O2:
sort -t, -k3,3n (500K lines, 4-12 byte short fields):
Before: 98.0 ms After: 92.3 ms (-5.8%)
sort -t, -k3,3n (500K lines, 100 byte wide fields):
Before: 240.8 ms After: 183.0 ms (-24.0%)
sort (default, no -k, 500K lines):
Before: 145.6 ms After: 145.6 ms (unchanged)
Looking at profiling, the improvement is larger on x86_64 in these runs
because glibc's memchr uses AVX2 to scan 32 bytes per step versus 16
bytes per step with NEON on aarch64.
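As a rough illustration of the technique, and not the actual begfield()/limfield() code, locating the start of a field with memchr() might look like the sketch below. The function name field_start and its 0-based field argument are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the start of 0-based field
   FIELD in the LEN-byte line LINE, with fields separated by SEP.
   Return LINE + LEN if the line has fewer fields.  */
static const char *
field_start (const char *line, size_t len, char sep, size_t field)
{
  assert (line != NULL);
  const char *ptr = line;
  const char *lim = line + len;

  while (field-- > 0 && ptr < lim)
    {
      /* One library call skips the whole field, rather than a loop
         that tests each byte against SEP.  */
      const char *hit = memchr (ptr, sep, (size_t) (lim - ptr));
      if (hit == NULL)
        return lim;
      ptr = hit + 1;
    }
  return ptr;
}
```

The longer the skipped fields, the more bytes each memchr() call covers, which matches the larger gains measured on the 100 byte wide fields.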
Collin Funk [Sat, 21 Mar 2026 08:07:28 +0000 (01:07 -0700)]
tac: promptly diagnose write errors
This patch also fixes a bug where 'tac' would print a vague error on
some inputs:
$ seq 10000 | ./src/tac-prev > /dev/full
tac-prev: write error
$ seq 10000 | ./src/tac > /dev/full
tac: write error: No space left on device
In this case ferror (stdout) is true, but errno has been set back to
zero by a successful fclose (stdout) call.
* src/tac.c (output): Call write_error() if fwrite fails.
* tests/misc/io-errors.sh: Check that 'tac' prints a detailed write
error.
* NEWS: Mention the improvement.
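The underlying point can be sketched as follows: capture errno at the moment the write fails, because a later successful call may leave nothing useful to report. This is an illustrative helper only; write_or_errno is a hypothetical name, and the real change calls write_error() from output().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write N bytes from BUF to OUT.  Return 0 on
   success, otherwise the errno value seen at the point of failure
   (EIO if the library did not set one).  */
static int
write_or_errno (FILE *out, const void *buf, size_t n)
{
  assert (out != NULL && buf != NULL);
  errno = 0;
  if (fwrite (buf, 1, n, out) != n)
    return errno ? errno : EIO;
  return 0;
}
```

Note that on a buffered stream fwrite can succeed and the failure may only surface at a later flush, so a caller would still check the final fflush or fclose.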
Pádraig Brady [Sat, 21 Mar 2026 12:37:20 +0000 (12:37 +0000)]
tests: support checking for specific write errors
* tests/misc/io-errors.sh: Support checking for a specific error
in commands that don't run indefinitely. Currently all the explicitly
listed commands output a specific error and do not need to be tagged.
Pádraig Brady [Mon, 16 Mar 2026 22:25:42 +0000 (22:25 +0000)]
tests: ls: fix false failure on FreeBSD
* tests/ls/non-utf8-hidden.sh: Avoid sorting in ls, to avoid:
ls: cannot compare file names ...: Illegal byte sequence
seen on FreeBSD 14.
Reported by Bruno Haible.
Collin Funk [Sun, 15 Mar 2026 03:21:53 +0000 (20:21 -0700)]
tee: prefer file descriptors over streams
We disable buffering on the streams anyways, so we were effectively
calling the write system call previously despite using streams.
* src/iopoll.h (fclose_wait, fwrite_wait): Remove declarations.
(close_wait, write_wait): Add declarations.
* src/iopoll.c (fwait_for_nonblocking_write, fclose_wait, fwrite_wait):
Remove functions.
(wait_for_nonblocking_write): New function based on
fwait_for_nonblocking_write.
(close_wait): New function based on fclose_wait.
(write_wait): New function based on fwrite_wait.
* src/tee.c: Include fcntl--.h. Don't include stdio--.h.
(get_next_out): Operate on file descriptors instead of streams.
(fail_output): Likewise. Remove clearerr call since we no longer call
fwrite on stdout.
(tee_files): Operate on file descriptors instead of streams. Remove
calls to setvbuf.
Collin Funk [Sat, 14 Mar 2026 03:37:10 +0000 (20:37 -0700)]
timeout: don't exit immediately if the parent is the init process
* src/timeout.c (main): Save the process ID before creating a child
process. Check if the result of getppid is different than the saved
process ID instead of checking if it is 1.
* tests/timeout/init-parent.sh: New file.
* tests/local.mk (all_tests): Add the new test.
* NEWS: Mention the bug fix. Also mention that this change allows
'timeout' to work when reparented by a subreaper process instead of
init.
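The check described above can be sketched like this, assuming the parent records its own process ID before forking. The helper name parent_exited is hypothetical and this is not the src/timeout.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, called in the child: SAVED_PARENT is the value
   of getpid () that the parent recorded before fork ().  Return true
   if we have since been reparented, whether to init (PID 1) or to a
   subreaper process.  */
static bool
parent_exited (pid_t saved_parent)
{
  assert (saved_parent > 0);
  return getppid () != saved_parent;
}
```

Comparing against the saved ID, rather than against 1, avoids a false positive when the original parent really is PID 1, for example inside a container.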
Pádraig Brady [Wed, 11 Mar 2026 15:39:20 +0000 (15:39 +0000)]
dd: always diagnose partial writes on write failure
* src/dd.c (dd_copy): Increment the partial write count upon failure.
* tests/dd/partial-write.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/80583
Pádraig Brady [Wed, 11 Mar 2026 15:57:22 +0000 (15:57 +0000)]
doc: clarify a recent NEWS item
* NEWS: It was ambiguous as to whether we quoted a range of
observed throughputs. Clarify this was the old and new
throughput on a single test system.
Pádraig Brady [Sat, 7 Mar 2026 14:23:38 +0000 (14:23 +0000)]
yes: use a zero-copy implementation via (vm)splice
A good reference for the concepts used here is:
https://mazzo.li/posts/fast-pipes.html
We don't consider huge pages or busy loops here,
but use vmsplice() and splice() to get significant speedups.
Throughput to a file (on BTRFS) was also seen to increase significantly,
improving from 690MiB/s to 1.1GiB/s on a Fedora 43 laptop.
* bootstrap.conf: Ensure sys/uio.h is present.
This was an existing transitive dependency.
* m4/jm-macros.m4: Define HAVE_SPLICE appropriately.
We assume vmsplice() is available if splice() is as they
were introduced at the same time to Linux and glibc.
* src/yes.c (repeat_pattern): A new function to efficiently
duplicate a pattern in a buffer with memcpy calls that double in size.
This also makes the setup for the existing write() path more efficient.
(pipe_splice_size): A new function to increase the kernel pipe buffer
if possible, and use an appropriately sized buffer based on that (25%).
(splice_write): A new function to call vmsplice() when outputting
to a pipe, and also splice() if outputting to a non-pipe.
(main): Adjust to always calling write on the minimal buffer first,
then trying vmsplice(), then falling back to write from bigger buffer.
* tests/misc/yes.sh: Verify the non-pipe output case,
and the vmsplice() fallback to write() case.
* NEWS: Mention the improvement.
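The doubling copy described for repeat_pattern can be sketched as below. This is a minimal illustration under the assumption that the pattern already sits at the start of the buffer; fill_by_doubling is a hypothetical name and not the function in src/yes.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: BUF holds one copy of a PATTERN_LEN-byte
   pattern at its start.  Fill BUF with as many whole copies as fit in
   BUF_SIZE bytes, and return the number of bytes filled.  */
static size_t
fill_by_doubling (char *buf, size_t buf_size, size_t pattern_len)
{
  assert (0 < pattern_len && pattern_len <= buf_size);
  size_t target = buf_size - buf_size % pattern_len;  /* whole copies */
  size_t filled = pattern_len;

  while (filled < target)
    {
      /* Copy everything filled so far, capped at what remains, so the
         copy size doubles on each pass and the regions never overlap.  */
      size_t remaining = target - filled;
      size_t chunk = filled < remaining ? filled : remaining;
      memcpy (buf + filled, buf, chunk);
      filled += chunk;
    }
  return filled;
}
```

Filling the buffer this way takes a number of memcpy calls that grows with the logarithm of the buffer size, instead of one call per pattern copy.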
Pádraig Brady [Mon, 9 Mar 2026 22:23:12 +0000 (22:23 +0000)]
all: use more consistent blank character determination
* src/system.h (c32issep): A new function that is essentially
iswblank() on GLIBC platforms, and iswspace() with exceptions elsewhere.
* src/expand.c: Use it instead of c32isblank().
* src/fold.c: Likewise.
* src/join.c: Likewise.
* src/numfmt.c: Likewise.
* src/unexpand.c: Likewise.
* src/uniq.c: Likewise.
* NEWS: Mention the improvement.
Collin Funk [Tue, 10 Mar 2026 02:32:27 +0000 (19:32 -0700)]
wc: improve aarch64 Neon optimization for 'wc -l'
$ yes abcdefghijklmnopqrstuvwxyz | head -n 200000000 > input
$ time ./src/wc-prev -l input
200000000 input
real 0m1.240s
user 0m0.456s
sys 0m0.784s
$ time ./src/wc -l input
200000000 input
real 0m0.936s
user 0m0.141s
sys 0m0.795s
* configure.ac: Use unsigned char for the buffer to avoid potential
compiler warnings. Check for the functions being used in src/wc_neon.c
after this patch.
* src/wc_neon.c (wc_lines_neon): Use vreinterpretq_s8_u8 to convert 0xff
into -1 instead of using bitwise AND instructions to convert it into 1.
Perform the pairwise addition and lane extraction once every 8192 bytes
instead of once every 64 bytes.
Thanks to Lasse Collin for spotting this and reviewing a draft of this
patch.
Collin Funk [Sun, 8 Mar 2026 00:16:01 +0000 (16:16 -0800)]
maint: prefer memset_explicit to explicit_bzero
The explicit_bzero function is a common extension, but memset_explicit
was standardized in C23. It will likely become more portable in the
future, and Gnulib provides an implementation if needed.
* bootstrap.conf (gnulib_modules): Add memset_explicit. Remove
explicit_bzero.
* gl/lib/randint.c (randint_free): Use memset_explicit instead of
explicit_bzero.
* gl/lib/randread.c (randread_free_body): Likewise.
Lukáš Zaoral [Fri, 6 Mar 2026 14:13:17 +0000 (14:13 +0000)]
expand,unexpand: support multi-byte input
* src/expand.c: Use mbbuf to support multi-byte input.
* src/unexpand.c: Likewise.
* tests/expand/mb.sh: New multi-byte test.
* tests/unexpand/mb.sh: Likewise.
* tests/local.mk: Reference new tests.
* NEWS: Mention the improvement.
Collin Funk [Thu, 5 Mar 2026 07:34:45 +0000 (23:34 -0800)]
maint: chown,chgrp: reduce variable scope
* src/chown-core.c (describe_change, restricted_chown)
(change_file_owner, chown_files): Declare variables where they are used
instead of at the start of the function.
* src/chown.c (main): Likewise.
Collin Funk [Sun, 1 Mar 2026 23:31:28 +0000 (15:31 -0800)]
install: allow the combination of --compare and --preserve-timestamps
* NEWS: Mention the improvement.
* src/install.c (enum copy_status): New type to let the caller know if
the copy was performed or skipped.
(copy_file): Return the new type instead of bool. Reduce variable scope.
(install_file_in_file): Only strip the file if the copy was
performed. Update the timestamps if the copy was skipped.
(main): Don't error when --compare and --preserve-timestamps are
combined.
* tests/install/install-C.sh: Add some test cases.
Pádraig Brady [Sat, 28 Feb 2026 11:09:26 +0000 (11:09 +0000)]
cksum: use more defensive escaping for --check
cksum --check is often the first interaction
users have with possibly untrusted downloads, so we should try
to be as defensive as possible when processing it.
Specifically we currently only escape \n characters in file names
presented in checksum files being parsed with cksum --check.
This gives some possibility of dumping arbitrary data to the terminal
when checking downloads from an untrusted source.
This change gives these advantages:
1. Avoids dumping arbitrary data to vulnerable terminals
2. Avoids visual deception with ANSI codes hiding checksum failures
3. More secure if users copy and paste file names from --check output
4. Simplifies programmatic parsing
Note this changes programmatic parsing, but given the original
format was so awkward to parse, I expect that's extremely rare.
I was not able to find an example in the wild at least.
To parse the new format from the shell, you can do something like:
  cksum -c checksums | while IFS= read -r line; do
    case $line in
      *': FAILED')
        filename=$(eval "printf '%s' ${line%: FAILED}")
        cp -v "$filename" /quarantine
        ;;
    esac
  done
This change also slightly reduces the size of the sum(1) utility.
This change also applies to md5sum, sha*sum, and b2sum.
* src/cksum.c (digest_check): Call quotef() instead of
cksum(1) specific quoting.
* tests/cksum/md5sum-bsd.sh: Adjust accordingly.
* doc/coreutils.texi (cksum general options): Describe the
shell quoting used for problematic file names.
* NEWS: Mention the change in behavior.
Reported by: Aaron Rainbolt
Pádraig Brady [Wed, 4 Mar 2026 16:56:48 +0000 (16:56 +0000)]
fold: fix output truncation with 0xFF bytes in input
On signed char platforms, 0xFF was converted to -1
which matches MBBUF_EOF, causing fold to stop processing.
* NEWS: Mention the bug fix.
* gl/lib/mbbuf.h: Avoid sign extension on signed char platforms.
* tests/fold/fold-characters.sh: Adjust test case.
Reported at https://src.fedoraproject.org/rpms/coreutils/pull-request/20
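The sign-extension problem can be illustrated with the following sketch, which is not the gl/lib/mbbuf.h code. SKETCH_EOF is a hypothetical stand-in for MBBUF_EOF, and the two helpers read the same byte through signed and unsigned types.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_EOF = -1 };  /* hypothetical stand-in for MBBUF_EOF */

/* What a plain char read does on signed char platforms: the byte 0xFF
   widens to -1 and collides with the EOF sentinel.  */
static int
byte_as_signed (const void *p)
{
  assert (p != NULL);
  return *(const signed char *) p;
}

/* Reading through unsigned char keeps every byte in 0..255, so no
   input byte can be mistaken for the sentinel.  */
static int
byte_as_unsigned (const void *p)
{
  assert (p != NULL);
  return *(const unsigned char *) p;
}
```

The usual remedy is to read or cast through unsigned char before widening to int.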
Sylvestre Ledru [Sat, 14 Feb 2026 19:08:12 +0000 (20:08 +0100)]
tests: date: add timezone conversion test
* tests/date/date.pl: Add the test case.
Add test case for https://github.com/uutils/coreutils/issues/10800
to verify `date -u -d '10:30 UTC-05'` converts to 15:30 UTC.
Collin Funk [Tue, 3 Mar 2026 05:43:22 +0000 (21:43 -0800)]
tests: avoid false test failure when using address sanitizer
* tests/misc/warning-errors.sh: Skip commands which have been built with
sanitizers, since standard error will not be closed and checked for
errors.
Reported by Bruno Haible.
Collin Funk [Tue, 3 Mar 2026 06:16:21 +0000 (22:16 -0800)]
tests: avoid failure on systems without an optimized 'cksum' or 'wc -l'
* tests/misc/warning-errors.sh: Expect 'wc' and 'cksum' to exit
successfully if there is not an optimized 'wc -l' implementation or
CRC32 implementation.
Reported by Bruno Haible.
oech3 [Mon, 2 Mar 2026 11:56:23 +0000 (11:56 +0000)]
tests: shuf: ensure memory exhaustion is handled gracefully
* tests/shuf/shuf.sh: Ensure we exit 1 upon failure
to allocate memory.
https://github.com/uutils/coreutils/issues/11170
https://github.com/coreutils/coreutils/pull/209
Collin Funk [Sun, 1 Mar 2026 02:36:34 +0000 (18:36 -0800)]
tests: wc,du: add additional --files0-from test cases
* tests/wc/wc-files0-from.pl ($limits): New variable.
(@Tests): Prefer the error strings from getlimits over writing them by
hand. Add test cases for --files0-from listing missing files and
duplicate files.
* tests/du/files0-from.pl ($limits): New variable.
(@Tests): Prefer the error strings from getlimits over writing them by
hand. Add test cases for --files0-from listing missing files. Add tests
for --files0-from listing duplicate files with and without the -l option
also in use.
Paul Eggert [Sat, 28 Feb 2026 00:17:27 +0000 (16:17 -0800)]
id: avoid unnecessary buffer flushing
* src/groups.c (main):
* src/id.c (main, print_stuff):
Don’t flush stdout before testing for write error.
Do the test only when in a loop, as a one-shot will
test for write error soon anyway.
Paul Eggert [Sat, 28 Feb 2026 00:17:27 +0000 (16:17 -0800)]
cksum: prefer signed int
* src/cksum.c (min_digest_line_length, digest_hex_bytes)
(digest_length, md5_sum_stream, sha1_sum_stream)
(sha224_sum_stream, sha256_sum_stream, sha384_sum_stream)
(sha512_sum_stream, sha2_sum_stream, sha3_sum_stream)
(blake2b_sum_stream, sm3_sum_stream, problematic_chars)
(filename_unescape, valid_digits, bsd_split_3)
(algorithm_from_tag, split_3, digest_file, output_file)
(b64_equal, hex_equal, digest_check, main):
* src/cksum_avx2.c (cksum_avx2):
* src/cksum_avx512.c (cksum_avx512):
* src/cksum_crc.c (cksum_fp_t, cksum_slice8, crc_sum_stream)
(crc32b_sum_stream, output_crc):
* src/cksum_pclmul.c (cksum_pclmul):
* src/cksum_vmull.c (cksum_vmull):
* src/sum.c (bsd_sum_stream, sysv_sum_stream, output_bsd, output_sysv):
Prefer signed to unsigned int where either will do.
This allows better checking with -fsanitize=undefined.
It should also help simplify future patches, so that they
needn’t worry whether comparisons like ‘i < len - 2’ will misbehave.
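The kind of comparison mentioned above can be sketched as follows. Both helper names are hypothetical; the only difference between them is the signedness of the operands.

```c
#include <assert.h>
#include <stdbool.h>

/* With unsigned operands, LEN - 2 wraps around to a huge value when
   LEN is less than 2, so the test is unexpectedly true.  */
static bool
in_range_unsigned (unsigned int i, unsigned int len)
{
  return i < len - 2;
}

/* With signed operands, LEN - 2 is simply negative for small LEN and
   the comparison behaves as written.  */
static bool
in_range_signed (int i, int len)
{
  assert (0 <= i && 0 <= len);
  return i < len - 2;
}
```

For example, with i = 0 and len = 1 the unsigned version returns true while the signed version returns false. Signed overflow is also something -fsanitize=undefined can report, whereas unsigned wraparound is well defined and goes unnoticed.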
Collin Funk [Fri, 27 Feb 2026 04:39:12 +0000 (20:39 -0800)]
stat: don't check QUOTING_STYLE when --printf %%N is used
* NEWS: Mention the fix.
* src/stat.c (main): Only check QUOTING_STYLE if there is a %N that is
not preceded by a percentage sign.
* tests/stat/stat-fmt.sh: Add some test cases.
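A simplified sketch of such a scan is shown below. It is not the src/stat.c logic: the function name format_uses_N is hypothetical, and the set of flag, width and precision characters it skips is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if FORMAT contains an %N directive
   that is not part of an escaped "%%" pair.  */
static bool
format_uses_N (const char *format)
{
  assert (format != NULL);
  const char *p = format;

  while ((p = strchr (p, '%')) != NULL)
    {
      p++;                      /* step past the '%' */
      if (*p == '%')
        {
          p++;                  /* "%%" is a literal percent sign */
          continue;
        }
      /* Skip optional flags, width and precision (assumed set).  */
      p += strspn (p, "'-+ #0123456789.");
      if (*p == 'N')
        return true;
    }
  return false;
}
```

Under this sketch "%%N" prints a literal "%N" and so needs no quoting style, while "%%%N" does contain a real %N directive.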
Collin Funk [Thu, 26 Feb 2026 04:59:35 +0000 (20:59 -0800)]
id: promptly diagnose write errors
* NEWS: Mention the improvement.
* src/id.c (print_stuff): Call fflush for each listed user to check for
write errors.
* tests/misc/io-errors.sh: Add an invocation of 'id'.
Collin Funk [Thu, 26 Feb 2026 04:56:12 +0000 (20:56 -0800)]
groups: promptly diagnose write errors
* NEWS: Mention the improvement.
* src/groups.c (main): Call fflush for each listed user to check for
write errors.
* tests/misc/io-errors.sh: Add an invocation of 'groups'.
Pádraig Brady [Thu, 26 Feb 2026 20:06:29 +0000 (20:06 +0000)]
tests: ensure failure to write warnings is handled gracefully
* tests/misc/warning-errors.sh: Add a new test to ensure
failure to write warnings is diagnosed in the exit status.
* tests/local.mk: Reference the new test.
oech3 [Thu, 26 Feb 2026 12:04:56 +0000 (21:04 +0900)]
tests: shuf: ensure randomization doesn't depend solely on ASLR
* tests/shuf/shuf.sh: Use setarch --addr-no-randomize to disable
ASLR, and show the output is still random.
https://github.com/coreutils/coreutils/pull/198
* tests/factor/factor.pl: Verify that embedded NULs
on stdin terminate the _number_.
* tests/numfmt/numfmt.pl: Verify that embedded NULs
on stdin terminate the _line_.
https://github.com/coreutils/coreutils/pull/196
Pádraig Brady [Tue, 24 Feb 2026 15:44:41 +0000 (15:44 +0000)]
tests: fix "Hangup" termination of non-interactive runs
This avoids the test harness being terminated like:
make[1]: *** [Makefile:24419: check-recursive] Hangup
make[3]: *** [Makefile:24668: check-TESTS] Hangup
make: *** [Makefile:24922: check] Hangup
make[2]: *** [Makefile:24920: check-am] Hangup
make[4]: *** [Makefile:24685: tests/misc/usage_vs_refs.log] Error 129
...
This happened sometimes when the tests were being run non interactively.
For example when run like:
setsid make TESTS="tests/timeout/timeout.sh \
tests/tail/overlay-headers.sh" SUBDIRS=. -j2 check
Note the race window can be made bigger by adding a sleep
after tail is stopped in overlay-headers.sh
The race can trigger the kernel to induce its job control
mechanism to prevent stuck processes.
I.e. where it sends SIGHUP + SIGCONT to a process group
when it determines that group may become orphaned,
and there are stopped processes in that group.
* tests/tail/overlay-headers.sh: Use setsid(1) to keep the stopped
tail process in a separate process group, thus avoiding any kernel
job control protection mechanism.
* tests/timeout/timeout.sh: Use setsid(1) to avoid the kernel
checking the main process group when sleep(1) is reparented.
Fixes https://bugs.gnu.org/80477
Collin Funk [Sun, 22 Feb 2026 22:20:30 +0000 (14:20 -0800)]
doc: tee: avoid the use of gpg cleartext signatures in an example
Cleartext signatures have many gotchas. Therefore, the use of detached
signatures is recommended where possible. See:
<https://gnupg.org/blog/20251226-cleartext-signatures.html>.
* doc/coreutils.texi (tee invocation): Adjust gpg invocation to produce
a detached signature.
oech3 [Mon, 23 Feb 2026 10:22:44 +0000 (19:22 +0900)]
tests: whoami, logname: verify error handling
* tests/df/no-mtab-status-masked-proc.sh: Tweak unshare check.
* tests/local.mk: Reference new test.
* tests/misc/user.sh: Add new test using unshare -U, to verify
that whoami and logname diagnose failure correctly.
https://github.com/coreutils/coreutils/pull/195