Pádraig Brady [Fri, 27 Oct 2023 12:24:04 +0000 (13:24 +0100)]
base32,base64: disallow non-canonical encodings
This will make decoding more resilient to corruption
whether due to transmission errors or nefarious adjustment.
See https://eprint.iacr.org/2022/361.pdf
* gnulib: Update to commit 3f463202bd enforcing canonical encoding.
* tests/basenc/base64.pl: Add test cases, and adjust existing cases.
* NEWS: Mention the change in behavior.
Paul Eggert [Wed, 25 Oct 2023 22:09:04 +0000 (15:09 -0700)]
basenc: fix unlikely locale issue; tune
This sped up ‘basenc -d --base16’ by 60% on my old platform,
AMD Phenom II X4 910e, Fedora 38.
* src/basenc.c (struct base16_decode_context): Simplify by
omitting have_nibble. ‘nibble’ is now negative if it’s missing.
All uses changed.
(B16): New macro, inspired by ../lib/base64.c.
(base16_to_int): New static var, likewise.
(isubase16): Reimplement using base16_to_int, since isxdigit is
not guaranteed to succeed on the chars we want when the locale is
oddball.
(base16_decode_ctx): Tune by using base16_to_int and by
Paul Eggert [Wed, 25 Oct 2023 21:43:32 +0000 (14:43 -0700)]
basenc: tweak checks to use unsigned char
This tends to generate better code, at least on x86-64,
because callers are just as fast and callees can avoid a conversion.
* src/basenc.c: The following renamings also change the arg type
from char to unsigned char. All uses changed.
(isubase): Rename from isbase.
(isubase64url): Rename from isbase64url.
(isubase32hex): Rename from isbase32hex.
(isubase16): Rename from isbase16.
(isuz85): Rename from isz85.
(isubase2): Rename from isbase2.
2023-10-24 Paul Eggert <eggert@cs.ucla.edu>
* src/basenc.c (struct base16_decode_context):
Simplify by storing -1 for missing nibbles. All uses changed.
Pádraig Brady [Mon, 23 Oct 2023 11:51:19 +0000 (12:51 +0100)]
basenc: --base16: support lower case hex digits
* src/basenc.c (base16_decode_ctx): Convert to uppercase
before converting from hex.
* tests/basenc/basenc.pl: Add a test case.
* NEWS: Mention the change in behavior.
Addresses https://bugs.gnu.org/66698
Pádraig Brady [Thu, 5 Oct 2023 16:00:51 +0000 (17:00 +0100)]
basenc: auto pad base32 and base64 inputs when decoding
Padding of encoded data is useful in cases where
base64 encoded data is concatenated / streamed.
I.e. where there are padding chars _within_ the stream.
In other cases padding is optional and can be inferred.
Note we continue to treat partial padding as invalid,
as that would be indicative of truncation.
* src/basenc.c (do_decode): Auto pad the end of the input.
* NEWS: Mention the change in behavior.
* tests/misc/base64.pl: Adjust to not fail for missing padding.
Addresses https://bugs.gnu.org/66265
Paul Eggert [Sun, 24 Sep 2023 00:07:33 +0000 (17:07 -0700)]
wc: goto considered harmful
* src/wc.c: Do not include assure.h. Replace the only
use of ‘assure’ with ‘unreachable’ which is good enough.
(wc, main): Remove labels and gotos. This doesn’t affect
performance in any way I can measure, and makes the code
a bit easier to follow.
Paul Eggert [Sat, 23 Sep 2023 21:22:16 +0000 (14:22 -0700)]
wc: prefer signed integers
Prefer signed to unsigned integers, to make it easier to catch
integer overflow errors.
* src/wc.c: Do not include safe-read.
(total_lines_overflow, total_words_overflow, total_chars_overflow)
(total_bytes_overflow): Now bool, not uintmax_t. All uses changed.
(max_line_length): Now intmax_t, not uintmax_t. All uses changed.
The total_... vars are still uintmax_t because overflow into them
is checked.
(page_size): Now idx_t, not size_t.
(wc_lines, wc, get_input_fstatus, compute_number_width, main):
Prefer signed to unsigned ints where either should do.
(wc_lines, wc): Use read rather than safe_read, since we don’t
need safe_read’s checks for huge buffers.
(wc): Redo call to mbrtoc32 to lessen the number of comparisons
against its returned value. Do this partly by keeping a pointer
to the end of the buffer rather than a count. Simplify
overflow-checking code.
(compute_number_width): Check for integer overflow.
Don’t assume size_t fits into unsigned long.
* src/wc.h (struct wc_lines): Prefer signed integers.
* src/wc_avx2.c: Do not include safe-read.h.
(wc_lines_avx2): Prefer signed integers. Use read, not safe_read.
Paul Eggert [Sat, 23 Sep 2023 20:38:08 +0000 (13:38 -0700)]
wc: improve avx2 API
* src/wc.c: Use "#include <...>" for files not in the current dir.
Include "wc.h" instead of declaring wc_lines_avx2 by hand.
(wc_lines): New API, with no file name (no longer needed) and
with a return struct rather than arg pointers. All uses changed.
Use avx2_supported directly instead of using a function pointer.
Exploit C99-style declarations after statements.
Multiply by 15 rather than dividing; it’s faster and more accurate
and cannot overflow here.
(wc): Simplify based on wc_lines API change.
* src/wc.h: New file.
* src/wc_avx2.c: Include it, to check API better.
(wc_lines_avx2): Use new API. All uses changed. Exploit C99.
Make locals more local.
Paul Eggert [Sat, 23 Sep 2023 08:15:08 +0000 (01:15 -0700)]
factor,tail: avoid quadratic reallocation
* src/factor.c (struct mp_factors): New member nalloc.
(mp_factor_init): Initialize it.
* src/factor.c (mp_factor_insert):
* src/tail.c (parse_options): Use xpalloc to avoid quadratic
worst-case behavior on reallocation.
* src/tail.c (pids_alloc): New static var.
Paul Eggert [Sat, 23 Sep 2023 07:03:41 +0000 (00:03 -0700)]
wc: simplify by removing SUPPORT_OLD_MBRTOWC
* src/wc.c (SUPPORT_OLD_MBRTOWC): Remove. All uses removed.
(wc): Simplify by assuming C99-or-later behavior for mbrtoc32,
which after all is a C11 API. Fix the !SUPPORT_OLD_MBRTOWC
code, which evidently was never tested seriously.
Paul Eggert [Sat, 23 Sep 2023 05:09:37 +0000 (22:09 -0700)]
wc: 3× speedup in C locale
The 3× speedup was measured by invoking 'wc $(find * -type f)'
on the coreutils sources etc. on an Ubuntu 23.04 x86-64.
These changes also speed up wc 20% in UTF-8 locales.
* src/wc.c (wc_isprint, wc_isspace): New static vars.
(wc): Use them for speed.
(main): Initialize them if needed.
(isnbspace): Remove; no longer used.
Paul Eggert [Fri, 22 Sep 2023 18:13:51 +0000 (11:13 -0700)]
wc: fix word count bug
* bootstrap.conf (gnulib_modules): Remove c32isprint.
* src/wc.c (wc): Consider all non-white-space characters
to be word constituents, even if they are not printable.
POSIX requires this, and it is what BSD does.
Partly do this by simplifying the check for a word,
by counting word starts rather than word ends.
* tests/wc/wc.pl: Test for the bug.
Paul Eggert [Fri, 22 Sep 2023 17:05:58 +0000 (10:05 -0700)]
maint: omit some unused function tests
* m4/jm-macros.m4: Do not check for ftruncate, iswspace,
mkfifo, mbrlen, sysctl. Coreutils no longer uses the
corresponding HAVE_* macros, typically because Gnulib
handles them now.
* src/wc.c (iswspace): Remove; unused.
Paul Eggert [Fri, 22 Sep 2023 16:45:12 +0000 (09:45 -0700)]
maint: prefer char32_t to wchar_t
This should work better on non-glibc platforms that don’t
use Unicode for wchar_t. However, POSIX appears to prohibit
this for printf.c so leave that alone.
* bootstrap.conf (gnulib_modules): Add btoc32, c32iscntrl,
c32isprint, c32isspace, c32width, mbrtoc32. Remove btoc, wcwidth.
* src/df.c, src/ls.c, src/wc.c:
Include uchar.h instead of wchar.h and wctype.h.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/wc.c (isnbspace, wc):
Use char32_t instead of wchar_t.
Paul Eggert [Fri, 22 Sep 2023 15:17:15 +0000 (08:17 -0700)]
wc: simplify #if MB_LEN_MAX
* src/wc.c: Don’t have special #ifs for platforms where
MB_LEN_MAX is 1. On these platforms, MB_CUR_MAX is 1 as well,
so the compiler should optimize away all multi-byte code.
Paul Eggert [Fri, 22 Sep 2023 01:45:47 +0000 (18:45 -0700)]
maint: prefer mcel
This causes Gnulib code to also use mcel, which is more consistent.
* bootstrap.conf (avoided_gnulib_modules): Avoid mbuiter
and mbuiterf, since we can now just use mcel. This avoids
the need to ship and compile mbchar and these modules.
(gnulib_modules): Change mcel to mcel-prefer.
Paul Eggert [Fri, 22 Sep 2023 01:45:08 +0000 (18:45 -0700)]
wc: stop worrying about EBCDIC, shift-JIS, etc
* src/wc.c: Do not include mbchar.h.
(wc): Check for ASCII characters instead of using is_basic.
Other parts of Gnulib and coreutils already assume the encoding
is upward compatible with ASCII, and the old code wouldn’t
have worked anyway with shift-JIS.
Paul Eggert [Thu, 21 Sep 2023 23:59:48 +0000 (16:59 -0700)]
expr: use mcel
The mcel API is simpler and corresponds more closely to how
Emacs etc. behave when the input has encoding errors,
since it treats each encoding-error byte separately.
* bootstrap.conf (gnulib_modules): Add mcel.
* src/expr.c: Include mcel.h instead of mbuiter.h.
(mbs_logical_cspn, mbs_logical_substr, mbs_offset_to_chars):
Use mcel API.
(mbs_logical_substr): Use ximemdup0 so as not to waste memory in
the result, fixing a FIXME.
build: avoid build failures on gcc <= 10, or clang
On gcc 10 the following build failure occurs:
"error: a label can only be part of a statement
and a declaration is not a statement"
This is because the current code is non standards conforming,
but GCC >= 11 will compile it (even with the -Wpedantic option).
This issue is tracked for GCC at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111526
* src/tail.c (parse_options): Avoid a declaration after label,
by using a surrounding block.
Stephen Kitt [Mon, 18 Sep 2023 16:09:29 +0000 (18:09 +0200)]
tail: allow multiple PIDs
tail can watch multiple files, but currently only a single writer. It
can be useful to watch files from multiple writers, or even processes
not directly related to the files (e.g. watch log files written by a
server process, for the duration of a test driven by a separate
client).
* src/tail.c (writers_are_dead): New function.
(tail_forever): Use it to wait for writers.
(tail_forever_inotify): As above.
(parse_options): Manage --pid options in an array.
* doc/coreutils.texi: Update documentation.
* tests/tail/pid.sh: Add a variant with two PIDs.
* News: Mention the new feature.
ls: --dired now implies long format with hyperlinks disabled
Currently --dired is silently ignored
with conflicting output formats
* src/ls.c (decode_switches): Set default format and hyperlink mode
when the --dired option is specified.
* tests/ls/dired.sh: Check that formats are implied / overridden.
* NEWS: Mention the change in behavior.
* doc/coreutils.texi (ls invocation): Adjust --dired description.
Paul Eggert [Fri, 8 Sep 2023 16:14:06 +0000 (09:14 -0700)]
cp: avoid needless unlinkat after fstatat ELOOP
* src/copy.c (copy_internal): When cp -f's fstatat fails on the
destination with ELOOP, report an error immediately when fstatat
used AT_SYMLINK_NOFOLLOW, as the later unlinkat would fail too.
Paul Eggert [Sat, 2 Sep 2023 20:27:45 +0000 (13:27 -0700)]
cp,mv,install: fix chmod on Linux CIFS
This bug occurs only when temporarily setting the mode to the
intersection of old and new modes when changing ownership.
* src/copy.c (owner_failure_ok): Treat EACCES like EPERM.
Paul Eggert [Fri, 1 Sep 2023 16:11:32 +0000 (09:11 -0700)]
chown: port to mingw and MSVC 14
* src/chown-core.c (restricted_chown): Don’t assume fchown exists.
The Gnulib doc says that nowadays this is needed only for
ports to mingw and MSVC 14, but it’s an easy port so let’s do it.
Paul Eggert [Thu, 31 Aug 2023 00:51:42 +0000 (17:51 -0700)]
maint: Gnulib module gc
Remove Gnulib modules that coreutils code no longer use directly.
Some of these are used indirectly, but gnulib-tool should do that.
* bootstrap.conf (gnulib_modules): Remove calloc-gnu, cloexec,
getgroups, getpass-gnu, getugroups, getusershell, gnu-mae,
group-member, lchown, mgetgroups, netinet_in, readlink,
realloc-gnu, rename, rpmatch, stpncpy, tzset, wchar-single,
wcswidth.
Paul Eggert [Thu, 31 Aug 2023 00:41:42 +0000 (17:41 -0700)]
maint: assume non-rare encodings
* configure.ac (GNULIB_WCHAR_SINGLE_LOCALE): Define.
This can improve performance, while dropping support for
rare encodings on non-GNU platforms. Nowadays these encodings
are typically not worth the hassle.
Paul Eggert [Wed, 30 Aug 2023 14:15:09 +0000 (07:15 -0700)]
maint: use modern Gnulib LIB_ macros
* src/local.mk (src_timeout_LDADD, src_dd_LDADD)
(src_shred_LDADD, src_sync_LDADD): Use TIMER_TIME_LIB
and FDATASYNC_LIB instead of LIB_TIMER_TIME and
LIB_FDATASYNC.
Paul Eggert [Wed, 30 Aug 2023 13:51:18 +0000 (06:51 -0700)]
kill: rely on Gnulib strsignal
Omit checks no longer needed now that we use strsignal.
* configure.ac: Do not check for strsignal-related decls.
* src/kill.c (sys_siglist, strsignal): Remove.
Paul Eggert [Wed, 30 Aug 2023 06:52:07 +0000 (23:52 -0700)]
maint: remove need for mbsalign
This simplifies memory allocation a bit, and removes an arbitrary
limitation from numfmt, which formerly limited cell output to 127
bytes.
* bootstrap.conf (gnulib_modules): Remove mbsalign, strncat.
Add strnlen (the code already used strnlen directly, and we were
saved only because Gnulib used the module indirectly)
* gl/lib/mbsalign.c, gl/lib/mbsalign.h, gl/modules/mbsalign:
* gl/modules/mbsalign-tests, gl/tests/test-mbsalign.c: Remove.
* src/df.c, src/ls.c: Do not include mbsalign.h.
(MBSWIDTH_FLAGS): New constant, now used for all
mbswidth calls. All callers changed to check for -1 return.
* src/df.c (struct field_data_t): ‘width’ is now int not size_t,
since mbswidth can’t do widths greater than INT_MAX anyway.
Replace ‘align’ with ‘align_right’. All uses changed.
(print_table): Redo to avoid the need for ambsalign.
(get_header, get_dev): mbswidth returns int, not size_t.
* src/ls.c (MAX_MON_WIDTH): Remove; no longer used.
(abmon_init): Use strnlen to cheaply discard too-long month names.
Align by hand instead of using mbsalign.
* src/numfmt.c: Include stdckdint.h, mbswidth.h.
Do not include mbsalign.h.
(padding_buffer_size): Now idx_t. All uses changed.
(padding_width): Now intmax_t, since it’s no longer an object
size. Its sign now records alignment. All uses changed.
(zero_padding_width): Now int, since it’s given to sprintf.
All uses changed.
(padding_alignment): Remove; it’s now taken from padding_width’s sign.
(double_to_human): Return string length. BUF_SIZE arg is now idx_t.
Include suffix in output. All callers changed. Simplify by not
calling strncat or stpcpy. Calculate fmt size bound more carefully.
(setup_padding_buffer): Remove. All uses removed.
(parse_format_string): Use intmax_t, not long, for pad.
On overflow, set widths to large values that cause later code
to do the right thing, rather than separately checking for
overflow here.
(prepare_padded_number): Return bool, not int 0/1. New arg
PADDING. All uses changed. Do not limit padded output to 127
bytes; instead, use xpalloc to expand the output buffer.
(print_padded_number): New arg PADDING. All uses changed.
(process_suffixed_number): Simplify.
(main): Take extremum if xstrtoimax overflows, as this does
the right thing.
* tests/misc/numfmt.pl: New test suf-20 to test for truncation bug.
Remove tests pad-3.2, fmt-err-7, as they’re no longer invalid but
are quite expensive.
Paul Eggert [Mon, 28 Aug 2023 19:42:23 +0000 (12:42 -0700)]
maint: spelling fixes, including author names
Most of this just affects commentary and documentations. The only
significant behavior change is translating author names via
proper_name_lite rather than proper_name_utf8, or not translating
them at all. proper_name_lite is good enough for coreutils and
avoids the bloat that had coreutils not using Gnulib proper_name.
* bootstrap.conf (gnulib_modules): Use propername-lite instead
of propername.
(XGETTEXT_OPTIONS): Look for proper_name_lite instead of for
proper_name_utf8.
* cfg.mk (local-checks-to-skip): Remove
sc_proper_name_utf8_requires_ICONV, since we no longer use
proper_name_utf8.
(old_NEWS_hash): Update.
(sc_check-I18N-AUTHORS): Remove; no longer needed.
Paul Eggert [Mon, 28 Aug 2023 02:13:42 +0000 (19:13 -0700)]
sort: port sort-continue test back to Solaris 10
* tests/sort/sort-continue.sh: Use ulimit -n 7 not -n 6. On
Solaris 10 'sort' uses Gnulib mkostemp, which calls Gnulib
getrandom, which opens /dev/urandom to calculate the temp file's
name, which means 'sort' needs one more file descriptor to work.
Pádraig Brady [Sun, 27 Aug 2023 19:22:32 +0000 (20:22 +0100)]
tests: avoid false failure on cygwin
* tests/cksum/md5sum-bsd.sh: Avoid part of test dealing with backslashes
in file names, on systems where backslash is a directory separator.
Issue reported by Bruno Haible on cygwin.
Pádraig Brady [Sun, 27 Aug 2023 18:52:13 +0000 (19:52 +0100)]
cksum: adjust tests and docs to binary mode handling
Following commit v9.3-80-g5e1e0993b which makes cksum
match the output of the standalone utilities...
* doc/coreutils.texi (cksum output modes): Remove the mention
that cksum never outputs a binary indicator, as that's no longer the
case.
* tests/cksum/b2sum.sh: Avoid outputting a binary indicator.
* tests/cksum/sm3sum.pl: Likewise.
Pádraig Brady [Sun, 27 Aug 2023 15:01:27 +0000 (16:01 +0100)]
all: avoid duplicated write errors on FreeBSD
* src/system.h (write_error): Also call fpurge(), which was seen to
be needed on FreeBSD 13.1 to avoid duplicated write errors.
* src/head.c (xwrite_stdout): Likewise.
* bootstrap.conf: Depend on fpurge.
Reported by Bruno Haible.
Pádraig Brady [Mon, 21 Aug 2023 13:51:49 +0000 (14:51 +0100)]
doc: reorg texinfo for the checksumming utilities
* doc/coreutils.texi: Reorg so that 'cksum invocation' is the
main node listing all options and output formats, which is then
referenced by the descriptions of the standalone utilities.
Use macros in the description of the standalone utilities
rather than referencing 'md5sum invocation' to be more direct.
Pádraig Brady [Mon, 21 Aug 2023 12:39:14 +0000 (13:39 +0100)]
cp: with --sparse=never, avoid COW and copy offload
* src/cp.c (main): Set default reflink mode appropriately
with --sparse=never.
* src/copy.c (infer_scantype): Add a comment to related code.
* tests/cp/sparse-2.sh: Add a test case.
* NEWS: Mention the bug.
Pádraig Brady [Tue, 15 Aug 2023 15:44:26 +0000 (16:44 +0100)]
tests: fix false failure due to locale on alpine
* tests/sort/sort-debug-keys.sh: Decimal point was seen to be '.'
on fr_FR.UTF-8 on Alpine Linux 3.18, so add an extra guard
to ensure we've a ',' as the decimal point on this locale.
Fixes https://bugs.gnu.org/65310
Paul Eggert [Tue, 15 Aug 2023 21:00:54 +0000 (14:00 -0700)]
uptime: be more generous about read_utmp failure
* src/uptime.c (print_uptime): Check for overflow
when computing uptime. Use C99-style decl after statements.
Do not let an idx_t value go negative.
(print_uptime, uptime): Be more generous about read_utmp failures,
or when read_utmp does not report the boot time. Instead of
failing, warn but keep going, printing the information that we did
get, and then exit with nonzero status.
(print_uptime): Return the desired exit status. Caller changed.
Bruno Haible [Tue, 15 Aug 2023 19:01:13 +0000 (12:01 -0700)]
uptime: Include VM sleep time in the "up" duration
* src/uptime.c: Don't include c-strtod.h.
(print_uptime): Don't read /proc/uptime, because the value it provides
does not change when a date adjustment occurs.
* bootstrap.conf (gnulib_modules): Remove 'uptime'.
Pádraig Brady [Sat, 12 Aug 2023 10:21:50 +0000 (11:21 +0100)]
maint: allow use of printf C99 integer size specifiers
Older systems that had issues with these like HP-UX and Solaris 8
are now obsolete, and can easily apply patches to provide support.
Also we've used %td since coreutils 9.1, with no reported issues.
* cfg.mk (sc_prohibit-c99-printf-format): Remove to allow use of
%[jtz] size specifiers, which allows for cleaner code
by avoiding the need to cast to PRI?MAX etc.
Pádraig Brady [Fri, 11 Aug 2023 13:27:59 +0000 (14:27 +0100)]
doc: separate out description of 2038 time stamp change
* NEWS: Separate out the description of the _existing_ issues
with outputting timestamps on 32 bit systems, from _future_ issues
outputting timestamps on all systems. Also move this to the
"improvement" section, since it's not really a coreutils
specific issue, and also is a build time configurable option.