Paul Eggert [Sun, 18 Feb 2024 05:51:03 +0000 (21:51 -0800)]
ls: remove unnecessary pragmas
* src/ls.c (decode_switches): Remove pragmas. They are no longer
needed to pacify GCC 13.2.1 with --enable-gcc-checking, and there’s
little point keeping them around for older GCC versions.
Paul Eggert [Sun, 18 Feb 2024 05:36:37 +0000 (21:36 -0800)]
maint: remove unneeded suggest-attributes pragmas
* gl/lib/fadvise.c: Remove pragma that works around GCC bug 83559
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83559>.
This bug was fixed in GCC 9, and we needn’t worry about
--enable-gcc-warnings for compilers that old.
* src/test.c: Likewise.
Greg Wooledge [Sat, 17 Feb 2024 13:07:12 +0000 (13:07 +0000)]
doc: fix typo in shred example
* doc/coreutils.texi (shred invocation): Fix the example
to correctly close file descriptor 3.
* THANKS.in: Remove old email since now recorded in repo history.
Reported at https://bugs.debian.org/1063837
Pádraig Brady [Wed, 7 Feb 2024 10:55:00 +0000 (10:55 +0000)]
build: fix od build on clang < 17
* configure.ac: Ensure the compiler can promote 16 bit floating point
types to float, before enabling that code in od. This was an issue
with clang 16 at least.
* src/od.c: Adjust for the new defines.
* tests/od/od-float.sh: Likewise. Also port to the dash shell,
whose inbuilt printf doesn't support hex escapes.
Pádraig Brady [Thu, 1 Feb 2024 17:59:51 +0000 (17:59 +0000)]
od: support half precision floating point
Rely on compiler support for _Float16 and __bf16
to support -fH and -fB formats respectively.
I.e. IEEE 16 bit, and brain 16 bit floats respectively.
Modern GCC and LLVM compilers support both types.
This was tested on:
gcc 13, clang 17 x86 (Both types supported)
gcc 7 aarch64 (Only -fH supported)
gcc 13 ppc(be) (Neither supported. Both will be with GCC 14)
* src/od.c: Support -tf2 or -tfH to print IEEE 16 bit floating point,
or -tfB to print Brain 16 bit floating point.
* configure.ac: Check for _Float16 and __bf16 types.
* doc/coreutils.texi (od invocation): Mention the new -f types.
* tests/od/od-float.sh: Add test cases.
* NEWS: Mention the new feature.
Addresses https://bugs.gnu.org/68871
Pádraig Brady [Thu, 18 Jan 2024 00:05:18 +0000 (00:05 +0000)]
doc: split -C: test and document a heap overflow
This was introduced in coreutils 9.2 through commit v9.1-184-g40bf1591b,
and was fixed in coreutils 9.5 through commit v9.4-111-gc4c5ed8f4.
This issue has been assigned CVE-2024-0684.
* NEWS: Mention the bug fix.
* tests/split/line-bytes.sh: Add a test case.
Reported by Valentin Metz.
Pádraig Brady [Wed, 17 Jan 2024 23:49:52 +0000 (23:49 +0000)]
tests: make ulimit -v interact better with ASAN
ulimit -v is generally not supported with ASAN, giving errors like:
"ReserveShadowMemoryRange failed while trying to map 0x... bytes.
Perhaps you're using ulimit -v"
* tests/cp/link-heap.sh: Mention ASAN as a possible reason for skipping.
* tests/csplit/csplit-heap.sh: Likewise.
* tests/cut/cut-huge-range.sh: Likewise.
* tests/dd/no-allocate.sh: Likewise.
* tests/printf/printf-surprise.sh: Likewise.
* tests/rm/many-dir-entries-vs-OOM.sh: Likewise.
* tests/head/head-c.sh: Only skip the part of the test needing ulimit.
* tests/split/line-bytes.sh: Likewise.
Paul Eggert [Tue, 16 Jan 2024 21:48:32 +0000 (13:48 -0800)]
split: do not shrink hold buffer
* src/split.c (line_bytes_split): Do not shrink hold buffer.
If it’s large for this batch it’s likely to be large for the next
batch, and for ‘split’ it’s not worth the complexity/CPU hassle to
shrink it. Do not assume hold_size can be bufsize.
Samuel Tardieu [Fri, 5 Jan 2024 15:51:34 +0000 (16:51 +0100)]
maint: add attributes to two functions without side effects
* src/date.c (res_width): This function computes its result solely
from the value of its parameter and qualifies for the const attribute.
* src/tee.c (get_next_out): This function has no side effect and
qualifies for the pure attribute.
* THANKS.in: Remove duplicate now that author has a commit in the repo.
Those two functions were flagged by GCC 12.3.0,
though not by GCC 13.2.1.
Pádraig Brady [Mon, 1 Jan 2024 13:22:42 +0000 (13:22 +0000)]
maint: update all copyright year number ranges
Update to latest gnulib with new copyright year.
Run "make update-copyright" and then...
* gnulib: Update included in this commit as copyright years
are the only change from the previous gnulib commit.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Manually update copyright year,
until we fully sync with gnulib at a later stage.
* tests/sample-test: Adjust to use the single most recent year.
Paul Eggert [Mon, 1 Jan 2024 03:48:24 +0000 (19:48 -0800)]
maint: pacify recent clang better
* configure.ac: Clang now seems to have -Wformat-extra-args,
-Wimplicit-const-int-float-conversion, and
-Wtautological-constant-out-of-range-compare on by default,
so disable them even if --enable-gcc-warnings is not used.
Rely on Gnulib’s check for clang rather than rolling our own.
Pádraig Brady [Thu, 28 Dec 2023 00:02:42 +0000 (00:02 +0000)]
sort: fix thousands grouping handling on single byte locales
* gl/lib/strnumcmp-in.h (numcompare): After commit v9.0-8-g6cafb122f,
we need to treat characters as signed to avoid invalid comparisons
between negative integers and unsigned characters.
* NEWS: Mention the bug fix.
Pádraig Brady [Wed, 27 Dec 2023 23:37:17 +0000 (23:37 +0000)]
tests: numfmt: fix test related to lower case 'k' SI unit
* tests/misc/numfmt.pl: Following on from v9.4-86-g615167cc4,
adjust this test accordingly. This test was being skipped
on some systems, and so only noticed now.
Reported by Jim Meyering.
Pádraig Brady [Wed, 27 Dec 2023 22:47:48 +0000 (22:47 +0000)]
tests: run locale tests on more systems
* tests/misc/numfmt.pl: Determine the thousands grouping character
in use, rather than skipping locale tests when it's not a space.
For example fr_FR.UTF-8 uses "NARROW NO-BREAK SPACE" as the grouping
char on modern glibc systems at least.
* tests/sort/sort-h-thousands-sep.sh: Likewise.
Do not perform SELinux context translation for operations not involving
user input or output. Context translation converts MCS/MLS labels into
human readable form, which is useful for user facing applications like
ls(1) or the --context=CTX argument of cp(1).
* src/copy.c (set_process_security_ctx): Use raw selinux variants.
* src/install.c (need_copy): Likewise.
(setdefaultfilecon): Likewise.
* src/selinux.c (computecon): Likewise.
(defaultcon): Likewise.
* tests/cp/no-ctx.sh: Add raw variants to preload lib.
* NEWS: Mention the improvement.
Pádraig Brady [Sun, 17 Dec 2023 14:35:36 +0000 (14:35 +0000)]
doc: cp --no-clobber: improve documentation
* doc/coreutils.texi (cp invocation): Reference the related --update
option, like we had already done in mv invocation.
* src/cp.c (usage): State clearly what --no-clobber does,
indicating it's protection focused, rather than being update focused.
* doc/coreutils.texi (chown invocation): Convert --from option
description to a macro and call from ...
(chgrp description): ... here.
* src/chown-core.h (emit_from_option_description): A new function
refactored from ...
* src/chown.c (usage): ... here, and called from ...
* src/chgrp.c (usage): ... here.
(main): Accept the --from option as chown(1) does.
* po/POTFILES.in: Add chown-core.h as now translated.
* tests/chown/basic.sh: Decouple the root user from id 0.
* tests/chgrp/from.sh: A new test largely based on chown/basic.sh.
* tests/local.mk: Reference the new test.
* NEWS: Mention the new feature.
Suggested by Ed Neville.
Pádraig Brady [Mon, 11 Dec 2023 17:03:33 +0000 (17:03 +0000)]
maint: remove obsolete AC_PROG_GCC_TRADITIONAL
* configure.ac: Remove obsolete macro call.
Recent autoconf warns that it is obsolete.
AC_PROG_CPP sets up the -traditional-cpp option if required.
GCC ignores -traditional since commit f458d1d5 (2002).
Fixes https://bugs.gnu.org/67756
Pádraig Brady [Wed, 6 Dec 2023 13:03:48 +0000 (13:03 +0000)]
doc: touch: clarify --time description in man page
* src/touch.c (usage): Reorganise the description to be similar to
the format used for the ls --time description, which formats better
when converted to a man page. Also separate the description
to allow for more granular translations.
Fixes https://bugs.gnu.org/67656
dann frazier [Thu, 30 Nov 2023 01:32:34 +0000 (18:32 -0700)]
tail: fix tailing sysfs files where PAGE_SIZE > BUFSIZ
* src/tail.c (file_lines): Ensure we use a buffer size >= PAGE_SIZE when
searching backwards to avoid seeking within a file,
which on sysfs files is accepted but also returns no data.
* tests/tail/tail-sysfs.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/67490
Pádraig Brady [Sun, 26 Nov 2023 16:41:56 +0000 (16:41 +0000)]
numfmt: support lowercase 'k' for Kilo and Kibi
For consistency with the "SI" standard, and with other coreutils
which output a lowercase 'k' in "SI" mode.
* src/numfmt.c (suffix_power): Treat 'k' like 'K' on input.
(double_to_human): Output lowercase 'k' in SI mode.
(usage): Adjust accordingly.
* doc/coreutils.texi: Mention 'k' accepted, and printed in SI mode.
* tests/misc/numfmt.pl: Adjust accordingly.
* NEWS: Mention the change in behavior.
Fixes https://bugs.gnu.org/47103
Paul Eggert [Thu, 16 Nov 2023 19:34:55 +0000 (11:34 -0800)]
uniq: fix bug with -w in multibyte locales
-w counted bytes not characters, which is wrong in multibyte locales.
This bug exists even in Fedora, which is why the recently-added
test cases from Fedora didn’t catch it.
* src/uniq.c (find_field): New arg PLEN. All callers changed.
Compute length of field correctly in multi-byte locales.
(different): Don’t worry about check_chars; find_field now does that.
* tests/uniq/uniq.pl: Test for this bug.
Paul Eggert [Wed, 15 Nov 2023 23:05:17 +0000 (15:05 -0800)]
uniq: prefer static init
* src/uniq.c (skip_fields, skip_chars, check_chars, count_occurrences)
(output_unique, output_first_repeated, output_later_repeated)
(delimit_groups): Initialize statically, rather than in ‘main’.
This shrinks the executable a bit.
Paul Eggert [Wed, 15 Nov 2023 22:57:17 +0000 (14:57 -0800)]
uniq: simplify and fix unlikely bug by using bool
* src/uniq.c (enum countmode): Remove this type.
(count_occurrences): New static var, replacing the old countmode,
and of type boolean instead of a two-value enum type that was
confusing (and which caused a hard-to-test bug when the count
exceeded INTMAX_MAX - 1). All uses changed.
Paul Eggert [Wed, 15 Nov 2023 07:13:26 +0000 (23:13 -0800)]
uniq: prefer signed integers
* src/uniq.c (skip_fields, skip_chars, check_chars, size_opt)
(find_field, different, writeline, check_file, main):
Prefer signed to unsigned integer types, since this allows
for better runtime checking with -fsanitize=undefined.
Paul Eggert [Wed, 15 Nov 2023 04:35:56 +0000 (20:35 -0800)]
maint: DECIMAL_DIGIT_ACCUMULATE uses stdckdint.h
* src/system.h: Include <stdckdint.h>, since the new
DECIMAL_DIGIT_ACCUMULATE uses it.
Do not include stdckdint.h from files that also include system.h.
(DECIMAL_DIGIT_ACCUMULATE): Omit last arg, which is no longer needed.
Reimplement by using C23-style stdckdint.h’s ckd_mul and ckd_add,
as that’s more standard and is more likely to generate better code.
Paul Eggert [Sat, 11 Nov 2023 08:17:11 +0000 (00:17 -0800)]
pinky: fix string size calculation
* src/pinky.c (count_ampersands): Simplify and return idx_t.
(create_fullname): Compute proper destination string size,
basically, by adding (ulen - 1) * ampersands rather than ulen *
(ampersands - 1). Problem found on CHERI-64.
Pádraig Brady [Fri, 3 Nov 2023 16:22:22 +0000 (16:22 +0000)]
ls: fix recent regression in size alignment
* src/ls.c (print_long_format): Use correct column width,
introduced due to a copy/paste error in commit v9.4-2-gcbb6dfec5
* tests/ls/size-align.sh: Add a test.
* tests/local.mk: Reference the new test.
Fixes https://bugs.gnu.org/66919
Paul Eggert [Mon, 30 Oct 2023 17:47:34 +0000 (10:47 -0700)]
join: fix recently introduced NUL bug
* src/join.c (xfields): Simplify and fix bug with fields
that start with a NUL byte when -t is not used.
* tests/misc/join-utf8.sh: Also test when -t is not used,
and when a field starts with NUL.
Paul Eggert [Mon, 30 Oct 2023 07:32:51 +0000 (00:32 -0700)]
join,uniq: support multi-byte separators
* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module
is now more trouble than it’s worth. All uses removed.
Add skipchars.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
Remove.
* gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars:
* tests/misc/join-utf8.sh:
New files.
* src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h.
(tab): Now mcel_t, not int. All uses changed.
(output_separator, output_seplen): New static vars.
(eq_tab, newline_or_blank, comma_or_blank): New functions.
(xfields, prfields, prjoin, add_field_list, main):
Support multi-byte characters.
* src/numfmt.c: Include ctype.h, skipchars.h.
Do not include cu-ctype.h.
(newline_or_blank): New function.
(next_field): Support multi-byte characters.
* src/sort.c: Include ctype.h instead of cu-ctype.h.
(inittables): Open-code field_sep since it no longer exists.
‘sort’ is not multi-byte safe yet, but when it is this code
will need revamping anyway.
* src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h.
(newline_or_blank): New function.
(find_field): Support multi-byte characters.
* tests/local.mk (all_tests): Add tests/misc/join-utf8.sh
Paul Eggert [Sat, 28 Oct 2023 16:22:09 +0000 (09:22 -0700)]
dircolors: assume C-locale spaces
* src/dircolors.c: Include c-ctype.h, not ctype.h.
(parse_line): Use c_isspace, not isspace, as the .dircolors
file format (which does not seem to be documented!) appears
to be ASCII.
Paul Eggert [Sat, 28 Oct 2023 00:31:49 +0000 (17:31 -0700)]
maint: include ctype.h selectively
Include ctype.h only in files that need it. Many of its uses
are incorrect, as they assume single-byte locales. The idea is
to remove the incorrect uses later, when there is time.
* src/chroot.c, src/csplit.c, src/dd.c, src/digest.c, src/dircolors.c:
* src/expand-common.c, src/expand.c, src/fmt.c, src/fold.c, src/ls.c:
* src/od.c, src/pinky.c, src/pr.c, src/ptx.c, src/seq.c:
* src/set-fields.c, src/split.c, src/stdbuf.c, src/test.c:
* src/tr.c, src/truncate.c, src/unexpand.c, src/wc.c:
Include ctype.h.
* src/system.h: Do not include ctype.h.
Paul Eggert [Sat, 28 Oct 2023 00:15:08 +0000 (17:15 -0700)]
maint: move field_sep into separate module
This is so that we don’t need to have every source file
include ctype.h.
* bootstrap.conf (gnulib_modules): Add cu-ctype.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
New files.
* src/join.c, src/numfmt.c, src/sort.c, src/uniq.c:
Include cu-ctype.h, for field_sep.
* src/system.h (field_sep): Remove; now supplied by cu-ctype.
Paul Eggert [Fri, 27 Oct 2023 15:45:50 +0000 (08:45 -0700)]
maint: prefer c_isxdigit when that is the intent
* src/digest.c (valid_digits, split_3):
* src/echo.c (main):
* src/printf.c (print_esc):
* src/ptx.c (unescape_string):
* src/stat.c (print_it):
When the code is supposed to support only POSIX-locale hex digits,
use c_isxdigit rather than isxdigit. Include c-ctype.h as needed.
This defends against oddball locales where isxdigit != c_isxdigit.
Pádraig Brady [Fri, 27 Oct 2023 12:24:04 +0000 (13:24 +0100)]
base32,base64: disallow non-canonical encodings
This will make decoding more resilient to corruption
whether due to transmission errors or nefarious adjustment.
See https://eprint.iacr.org/2022/361.pdf
* gnulib: Update to commit 3f463202bd enforcing canonical encoding.
* tests/basenc/base64.pl: Add test cases, and adjust existing cases.
* NEWS: Mention the change in behavior.
Paul Eggert [Wed, 25 Oct 2023 22:09:04 +0000 (15:09 -0700)]
basenc: fix unlikely locale issue; tune
This sped up ‘basenc -d --base16’ by 60% on my old platform,
AMD Phenom II X4 910e, Fedora 38.
* src/basenc.c (struct base16_decode_context): Simplify by
omitting have_nibble. ‘nibble’ is now negative if it’s missing.
All uses changed.
(B16): New macro, inspired by ../lib/base64.c.
(base16_to_int): New static var, likewise.
(isubase16): Reimplement using base16_to_int, since isxdigit is
not guaranteed to succeed on the chars we want when the locale is
oddball.
(base16_decode_ctx): Tune by using base16_to_int and by
Paul Eggert [Wed, 25 Oct 2023 21:43:32 +0000 (14:43 -0700)]
basenc: tweak checks to use unsigned char
This tends to generate better code, at least on x86-64,
because callers are just as fast and callees can avoid a conversion.
* src/basenc.c: The following renamings also change the arg type
from char to unsigned char. All uses changed.
(isubase): Rename from isbase.
(isubase64url): Rename from isbase64url.
(isubase32hex): Rename from isbase32hex.
(isubase16): Rename from isbase16.
(isuz85): Rename from isz85.
(isubase2): Rename from isbase2.
2023-10-24 Paul Eggert <eggert@cs.ucla.edu>
* src/basenc.c (struct base16_decode_context):
Simplify by storing -1 for missing nibbles. All uses changed.
Pádraig Brady [Mon, 23 Oct 2023 11:51:19 +0000 (12:51 +0100)]
basenc: --base16: support lower case hex digits
* src/basenc.c (base16_decode_ctx): Convert to uppercase
before converting from hex.
* tests/basenc/basenc.pl: Add a test case.
* NEWS: Mention the change in behavior.
Addresses https://bugs.gnu.org/66698
Pádraig Brady [Thu, 5 Oct 2023 16:00:51 +0000 (17:00 +0100)]
basenc: auto pad base32 and base64 inputs when decoding
Padding of encoded data is useful in cases where
base64 encoded data is concatenated / streamed.
I.e. where there are padding chars _within_ the stream.
In other cases padding is optional and can be inferred.
Note we continue to treat partial padding as invalid,
as that would be indicative of truncation.
* src/basenc.c (do_decode): Auto pad the end of the input.
* NEWS: Mention the change in behavior.
* tests/misc/base64.pl: Adjust to not fail for missing padding.
Addresses https://bugs.gnu.org/66265
Paul Eggert [Sun, 24 Sep 2023 00:07:33 +0000 (17:07 -0700)]
wc: goto considered harmful
* src/wc.c: Do not include assure.h. Replace the only
use of ‘assure’ with ‘unreachable’ which is good enough.
(wc, main): Remove labels and gotos. This doesn’t affect
performance in any way I can measure, and makes the code
a bit easier to follow.
Paul Eggert [Sat, 23 Sep 2023 21:22:16 +0000 (14:22 -0700)]
wc: prefer signed integers
Prefer signed to unsigned integers, to make it easier to catch
integer overflow errors.
* src/wc.c: Do not include safe-read.
(total_lines_overflow, total_words_overflow, total_chars_overflow)
(total_bytes_overflow): Now bool, not uintmax_t. All uses changed.
(max_line_length): Now intmax_t, not uintmax_t. All uses changed.
The total_... vars are still uintmax_t because overflow into them
is checked.
(page_size): Now idx_t, not size_t.
(wc_lines, wc, get_input_fstatus, compute_number_width, main):
Prefer signed to unsigned ints where either should do.
(wc_lines, wc): Use read rather than safe_read, since we don’t
need safe_read’s checks for huge buffers.
(wc): Redo call to mbrtoc32 to lessen the number of comparisons
against its returned value. Do this partly by keeping a pointer
to the end of the buffer rather than a count. Simplify
overflow-checking code.
(compute_number_width): Check for integer overflow.
Don’t assume size_t fits into unsigned long.
* src/wc.h (struct wc_lines): Prefer signed integers.
* src/wc_avx2.c: Do not include safe-read.h.
(wc_lines_avx2): Prefer signed integers. Use read, not safe_read.
Paul Eggert [Sat, 23 Sep 2023 20:38:08 +0000 (13:38 -0700)]
wc: improve avx2 API
* src/wc.c: Use "#include <...>" for files not in the current dir.
Include "wc.h" instead of declaring wc_lines_avx2 by hand.
(wc_lines): New API, with no file name (no longer needed) and
with a return struct rather than arg pointers. All uses changed.
Use avx2_supported directly instead of using a function pointer.
Exploit C99-style declarations after statements.
Multiply by 15 rather than dividing; it’s faster and more accurate
and cannot overflow here.
(wc): Simplify based on wc_lines API change.
* src/wc.h: New file.
* src/wc_avx2.c: Include it, to check API better.
(wc_lines_avx2): Use new API. All uses changed. Exploit C99.
Make locals more local.
Paul Eggert [Sat, 23 Sep 2023 08:15:08 +0000 (01:15 -0700)]
factor,tail: avoid quadratic reallocation
* src/factor.c (struct mp_factors): New member nalloc.
(mp_factor_init): Initialize it.
* src/factor.c (mp_factor_insert):
* src/tail.c (parse_options): Use xpalloc to avoid quadratic
worst-case behavior on reallocation.
* src/tail.c (pids_alloc): New static var.
Paul Eggert [Sat, 23 Sep 2023 07:03:41 +0000 (00:03 -0700)]
wc: simplify by removing SUPPORT_OLD_MBRTOWC
* src/wc.c (SUPPORT_OLD_MBRTOWC): Remove. All uses removed.
(wc): Simplify by assuming C99-or-later behavior for mbrtoc32,
which after all is a C11 API. Fix the !SUPPORT_OLD_MBRTOWC
code, which evidently was never tested seriously.
Paul Eggert [Sat, 23 Sep 2023 05:09:37 +0000 (22:09 -0700)]
wc: 3× speedup in C locale
The 3× speedup was measured by invoking 'wc $(find * -type f)'
on the coreutils sources etc. on an Ubuntu 23.04 x86-64.
These changes also speed up wc 20% in UTF-8 locales.
* src/wc.c (wc_isprint, wc_isspace): New static vars.
(wc): Use them for speed.
(main): Initialize them if needed.
(isnbspace): Remove; no longer used.
Paul Eggert [Fri, 22 Sep 2023 18:13:51 +0000 (11:13 -0700)]
wc: fix word count bug
* bootstrap.conf (gnulib_modules): Remove c32isprint.
* src/wc.c (wc): Consider all non-white-space characters
to be word constituents, even if they are not printable.
POSIX requires this, and it is what BSD does.
Partly do this by simplifying the check for a word,
by counting word starts rather than word ends.
* tests/wc/wc.pl: Test for the bug.
Paul Eggert [Fri, 22 Sep 2023 17:05:58 +0000 (10:05 -0700)]
maint: omit some unused function tests
* m4/jm-macros.m4: Do not check for ftruncate, iswspace,
mkfifo, mbrlen, sysctl. Coreutils no longer uses the
corresponding HAVE_* macros, typically because Gnulib
handles them now.
* src/wc.c (iswspace): Remove; unused.
Paul Eggert [Fri, 22 Sep 2023 16:45:12 +0000 (09:45 -0700)]
maint: prefer char32_t to wchar_t
This should work better on non-glibc platforms that don’t
use Unicode for wchar_t. However, POSIX appears to prohibit
this for printf.c so leave that alone.
* bootstrap.conf (gnulib_modules): Add btoc32, c32iscntrl,
c32isprint, c32isspace, c32width, mbrtoc32. Remove btoc, wcwidth.
* src/df.c, src/ls.c, src/wc.c:
Include uchar.h instead of wchar.h and wctype.h.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/wc.c (isnbspace, wc):
Use char32_t instead of wchar_t.
Paul Eggert [Fri, 22 Sep 2023 15:17:15 +0000 (08:17 -0700)]
wc: simplify #if MB_LEN_MAX
* src/wc.c: Don’t have special #ifs for platforms where
MB_LEN_MAX is 1. On these platforms, MB_CUR_MAX is 1 as well,
so the compiler should optimize away all multi-byte code.
Paul Eggert [Fri, 22 Sep 2023 01:45:47 +0000 (18:45 -0700)]
maint: prefer mcel
This causes Gnulib code to also use mcel, which is more consistent.
* bootstrap.conf (avoided_gnulib_modules): Avoid mbuiter
and mbuiterf, since we can now just use mcel. This avoids
the need to ship and compile mbchar and these modules.
(gnulib_modules): Change mcel to mcel-prefer.
Paul Eggert [Fri, 22 Sep 2023 01:45:08 +0000 (18:45 -0700)]
wc: stop worrying about EBCDIC, shift-JIS, etc
* src/wc.c: Do not include mbchar.h.
(wc): Check for ASCII characters instead of using is_basic.
Other parts of Gnulib and coreutils already assume the encoding
is upward compatible with ASCII, and the old code wouldn’t
have worked anyway with shift-JIS.
Paul Eggert [Thu, 21 Sep 2023 23:59:48 +0000 (16:59 -0700)]
expr: use mcel
The mcel API is simpler and corresponds more closely to how
Emacs etc. behave when the input has encoding errors,
since it treats each encoding-error byte separately.
* bootstrap.conf (gnulib_modules): Add mcel.
* src/expr.c: Include mcel.h instead of mbuiter.h.
(mbs_logical_cspn, mbs_logical_substr, mbs_offset_to_chars):
Use mcel API.
(mbs_logical_substr): Use ximemdup0 so as not to waste memory in
the result, fixing a FIXME.
build: avoid build failures on gcc <= 10, or clang
On gcc 10 the following build failure occurs:
"error: a label can only be part of a statement
and a declaration is not a statement"
This is because the current code is non standards conforming,
but GCC >= 11 will compile it (even with the -Wpedantic option).
This issue is tracked for GCC at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111526
* src/tail.c (parse_options): Avoid a declaration after label,
by using a surrounding block.
Stephen Kitt [Mon, 18 Sep 2023 16:09:29 +0000 (18:09 +0200)]
tail: allow multiple PIDs
tail can watch multiple files, but currently only a single writer. It
can be useful to watch files from multiple writers, or even processes
not directly related to the files (e.g. watch log files written by a
server process, for the duration of a test driven by a separate
client).
* src/tail.c (writers_are_dead): New function.
(tail_forever): Use it to wait for writers.
(tail_forever_inotify): As above.
(parse_options): Manage --pid options in an array.
* doc/coreutils.texi: Update documentation.
* tests/tail/pid.sh: Add a variant with two PIDs.
* News: Mention the new feature.
ls: --dired now implies long format with hyperlinks disabled
Currently --dired is silently ignored
with conflicting output formats
* src/ls.c (decode_switches): Set default format and hyperlink mode
when the --dired option is specified.
* tests/ls/dired.sh: Check that formats are implied / overridden.
* NEWS: Mention the change in behavior.
* doc/coreutils.texi (ls invocation): Adjust --dired description.