git.ipfire.org Git - thirdparty/coreutils.git/log
17 months agomaint: remove unneeded suggest-attributes pragmas
Paul Eggert [Sun, 18 Feb 2024 05:36:37 +0000 (21:36 -0800)] 
maint: remove unneeded suggest-attributes pragmas

* gl/lib/fadvise.c: Remove pragma that works around GCC bug 83559
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83559>.
This bug was fixed in GCC 9, and we needn’t worry about
--enable-gcc-warnings for compilers that old.
* src/test.c: Likewise.

17 months agodoc: fix typo in shred example
Greg Wooledge [Sat, 17 Feb 2024 13:07:12 +0000 (13:07 +0000)] 
doc: fix typo in shred example

* doc/coreutils.texi (shred invocation): Fix the example
to correctly close file descriptor 3.
* THANKS.in: Remove old email since now recorded in repo history.
Reported at https://bugs.debian.org/1063837

17 months agomaint: avoid -Wshadow warning under clang
Collin Funk [Wed, 7 Feb 2024 11:58:02 +0000 (03:58 -0800)] 
maint: avoid -Wshadow warning under clang

* src/env.c (parse_signal_action_params, parse_signal_block_params):
Rename OPTARG to ARG so that it does not conflict with OPTARG used by
getopt.

Copyright-paperwork-exempt: Yes

18 months agobuild: fix od build on clang < 17
Pádraig Brady [Wed, 7 Feb 2024 10:55:00 +0000 (10:55 +0000)] 
build: fix od build on clang < 17

* configure.ac: Ensure the compiler can promote 16 bit floating point
types to float, before enabling that code in od.  This was an issue
with clang 16 at least.
* src/od.c: Adjust for the new defines.
* tests/od/od-float.sh: Likewise.  Also port to the dash shell,
whose inbuilt printf doesn't support hex escapes.

18 months agood: support half precision floating point
Pádraig Brady [Thu, 1 Feb 2024 17:59:51 +0000 (17:59 +0000)] 
od: support half precision floating point

Rely on compiler support for _Float16 and __bf16
to support -fH and -fB formats respectively.
I.e. IEEE 16 bit, and brain 16 bit floats respectively.
Modern GCC and LLVM compilers support both types.

clang-sect=half-precision-floating-point
https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
https://clang.llvm.org/docs/LanguageExtensions.html#$clang-sect
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0192r4.html
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html

This was tested on:
gcc 13, clang 17 x86 (Both types supported)
gcc 7 aarch64 (Only -fH supported)
gcc 13 ppc(be) (Neither supported. Both will be with GCC 14)

* src/od.c: Support -tf2 or -tfH to print IEEE 16 bit floating point,
or -tfB to print Brain 16 bit floating point.
* configure.ac: Check for _Float16 and __bf16 types.
* doc/coreutils.texi (od invocation): Mention the new -f types.
* tests/od/od-float.sh: Add test cases.
* NEWS: Mention the new feature.
Addresses https://bugs.gnu.org/68871

18 months agoseq: say why not ‘x += step’
Paul Eggert [Mon, 29 Jan 2024 07:35:49 +0000 (23:35 -0800)] 
seq: say why not ‘x += step’

* src/seq.c (print_numbers): Add comment.

18 months agodoc: split -C: test and document a heap overflow
Pádraig Brady [Thu, 18 Jan 2024 00:05:18 +0000 (00:05 +0000)] 
doc: split -C: test and document a heap overflow

This was introduced in coreutils 9.2 through commit v9.1-184-g40bf1591b,
and was fixed in coreutils 9.5 through commit v9.4-111-gc4c5ed8f4.
This issue has been assigned CVE-2024-0684.

* NEWS: Mention the bug fix.
* tests/split/line-bytes.sh: Add a test case.
Reported by Valentin Metz.

18 months agotests: make ulimit -v interact better with ASAN
Pádraig Brady [Wed, 17 Jan 2024 23:49:52 +0000 (23:49 +0000)] 
tests: make ulimit -v interact better with ASAN

ulimit -v is generally not supported with ASAN, giving errors like:
  "ReserveShadowMemoryRange failed while trying to map 0x... bytes.
   Perhaps you're using ulimit -v"

* tests/cp/link-heap.sh: Mention ASAN as a possible reason for skipping.
* tests/csplit/csplit-heap.sh: Likewise.
* tests/cut/cut-huge-range.sh: Likewise.
* tests/dd/no-allocate.sh: Likewise.
* tests/printf/printf-surprise.sh: Likewise.
* tests/rm/many-dir-entries-vs-OOM.sh: Likewise.
* tests/head/head-c.sh: Only skip the part of the test needing ulimit.
* tests/split/line-bytes.sh: Likewise.

18 months agosplit: do not shrink hold buffer
Paul Eggert [Tue, 16 Jan 2024 21:48:32 +0000 (13:48 -0800)] 
split: do not shrink hold buffer

* src/split.c (line_bytes_split): Do not shrink hold buffer.
If it’s large for this batch it’s likely to be large for the next
batch, and for ‘split’ it’s not worth the complexity/CPU hassle to
shrink it.  Do not assume hold_size can be bufsize.

18 months agotests: ls: add a test to verify that '+' is added
Sylvestre Ledru [Wed, 10 Jan 2024 18:18:05 +0000 (19:18 +0100)] 
tests: ls: add a test to verify that '+' is added

* tests/ls/acl.sh: Add a new test.
* tests/local.mk: Reference the new test.

19 months agomaint: add attributes to two functions without side effects
Samuel Tardieu [Fri, 5 Jan 2024 15:51:34 +0000 (16:51 +0100)] 
maint: add attributes to two functions without side effects

* src/date.c (res_width): This function computes its result solely
from the value of its parameter and qualifies for the const attribute.
* src/tee.c (get_next_out): This function has no side effect and
qualifies for the pure attribute.
* THANKS.in: Remove duplicate now that author has a commit in the repo.

Those two functions were flagged by GCC 12.3.0,
though not by GCC 13.2.1.
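
The difference between the two attributes can be sketched as follows. The helper names here are hypothetical and are not the coreutils functions; the attribute spelling is the GCC/Clang extension syntax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not the coreutils code.
   'const': the result depends only on the argument values; the
   function reads no global memory and dereferences no pointers.  */
__attribute__ ((const))
static int
decimal_width (unsigned int n)
{
  int width = 1;
  while (n >= 10)
    {
      n /= 10;
      width++;
    }
  return width;
}

/* 'pure': no side effects on program state, but the result may depend
   on memory reachable through the pointer argument.  */
__attribute__ ((pure))
static size_t
count_byte (char const *s, char c)
{
  size_t n = 0;
  assert (s != NULL);
  for (; *s; s++)
    n += *s == c;
  return n;
}
```

Either attribute lets the compiler merge repeated calls with the same arguments; 'const' is the stronger promise, so it must not be used on a function that reads through pointers.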

19 months agomaint: update all copyright year number ranges
Pádraig Brady [Mon, 1 Jan 2024 13:22:42 +0000 (13:22 +0000)] 
maint: update all copyright year number ranges

Update to latest gnulib with new copyright year.
Run "make update-copyright" and then...

* gnulib: Update included in this commit as copyright years
are the only change from the previous gnulib commit.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Manually update copyright year,
until we fully sync with gnulib at a later stage.
* tests/sample-test: Adjust to use the single most recent year.

19 months agomaint: pacify recent clang better
Paul Eggert [Mon, 1 Jan 2024 03:48:24 +0000 (19:48 -0800)] 
maint: pacify recent clang better

* configure.ac: Clang now seems to have -Wformat-extra-args,
-Wimplicit-const-int-float-conversion, and
-Wtautological-constant-out-of-range-compare on by default,
so disable them even if --enable-gcc-warnings is not used.
Rely on Gnulib’s check for clang rather than rolling our own.

19 months agomaint: pacify clang -Winclude-next-absolute-path
Paul Eggert [Mon, 1 Jan 2024 03:48:24 +0000 (19:48 -0800)] 
maint: pacify clang -Winclude-next-absolute-path

* gl/lib/xdectoint.c: Use #include <...> instead of #include "...".

19 months agobuild: update gnulib submodule to latest
Paul Eggert [Mon, 1 Jan 2024 03:48:24 +0000 (19:48 -0800)] 
build: update gnulib submodule to latest

19 months agols: omit bad_cast
Paul Eggert [Fri, 29 Dec 2023 00:32:28 +0000 (16:32 -0800)] 
ls: omit bad_cast

* src/ls.c (decode_switches): Declare some local vars to be
char const *, not char *, and omit unnecessary bad_cast calls.

19 months agosplit: omit bad_cast
Paul Eggert [Fri, 29 Dec 2023 00:32:28 +0000 (16:32 -0800)] 
split: omit bad_cast

* src/split.c (infile): Now char const *, not char *.
(main): Omit unnecessary bad_cast calls.

19 months agosort: fix thousands grouping handling on single byte locales
Pádraig Brady [Thu, 28 Dec 2023 00:02:42 +0000 (00:02 +0000)] 
sort: fix thousands grouping handling on single byte locales

* gl/lib/strnumcmp-in.h (numcompare): After commit v9.0-8-g6cafb122f,
we need to treat characters as signed to avoid invalid comparisons
between negative integers and unsigned characters.
* NEWS: Mention the bug fix.

19 months agotests: numfmt: fix test related to lower case 'k' SI unit
Pádraig Brady [Wed, 27 Dec 2023 23:37:17 +0000 (23:37 +0000)] 
tests: numfmt: fix test related to lower case 'k' SI unit

* tests/misc/numfmt.pl: Following on from v9.4-86-g615167cc4,
adjust this test accordingly.  This test was being skipped
on some systems, and so only noticed now.
Reported by Jim Meyering.

19 months agotests: run locale tests on more systems
Pádraig Brady [Wed, 27 Dec 2023 22:47:48 +0000 (22:47 +0000)] 
tests: run locale tests on more systems

* tests/misc/numfmt.pl: Determine the thousands grouping character
in use, rather than skipping locale tests when it's not a space.
For example fr_FR.UTF-8 uses "NARROW NO-BREAK SPACE" as the grouping
char on modern glibc systems at least.
* tests/sort/sort-h-thousands-sep.sh: Likewise.

19 months agomaint: distribute new header from previous commit
Pádraig Brady [Fri, 29 Dec 2023 17:51:19 +0000 (17:51 +0000)] 
maint: distribute new header from previous commit

* src/local.mk: Reference the new header, so it's distributed.

19 months agomaint: merge chgrp and chown sources
Pádraig Brady [Wed, 27 Dec 2023 13:28:02 +0000 (13:28 +0000)] 
maint: merge chgrp and chown sources

chown is a close superset of chgrp functionality,
so merge sources to avoid unwanted divergence in future.
This removes about 300 lines in chgrp.c

* build-aux/gen-single-binary.sh: Generate new rules for chgrp.
* cfg.mk: Exclude new wrappers.
* po/POTFILES.in: Remove chgrp.c
* src/chgrp.c: Remove.
* src/chown-chgrp.c: New wrapper.
* src/chown-chown.c: Likewise.
* src/chown.c (main): Prepend ':' for chgrp(1).
* src/chown.h: Define both operating modes.
(usage): Adjust depending on utility being called.
* src/coreutils-chgrp.c: Likewise.
* src/local.mk: Reference new wrappers.

19 months agocopy,install: avoid unnecessary security context translations
Christian Göttsche [Tue, 19 Dec 2023 14:55:28 +0000 (15:55 +0100)] 
copy,install: avoid unnecessary security context translations

Do not perform SELinux context translation for operations not involving
user input or output.  Context translation converts MCS/MLS labels into
human readable form, which is useful for user facing applications like
ls(1) or the --context=CTX argument of cp(1).

* src/copy.c (set_process_security_ctx): Use raw selinux variants.
* src/install.c (need_copy): Likewise.
(setdefaultfilecon): Likewise.
* src/selinux.c (computecon): Likewise.
(defaultcon): Likewise.
* tests/cp/no-ctx.sh: Add raw variants to preload lib.
* NEWS: Mention the improvement.

19 months agobuild: update gnulib to latest
Pádraig Brady [Tue, 19 Dec 2023 17:18:46 +0000 (17:18 +0000)] 
build: update gnulib to latest

* gnulib: Primarily to get raw selinux wrappers

19 months agomaint: avoid false positive warning with newer gcc
Pádraig Brady [Sun, 17 Dec 2023 17:13:31 +0000 (17:13 +0000)] 
maint: avoid false positive warning with newer gcc

* src/pr.c (read_line): GCC 13.2.1 can't discern that CHARS
is not used with '\n', so avoid the -Werror=maybe-uninitialized
issue in dev builds.

19 months agodoc: cp --no-clobber: improve documentation
Pádraig Brady [Sun, 17 Dec 2023 14:35:36 +0000 (14:35 +0000)] 
doc: cp --no-clobber: improve documentation

* doc/coreutils.texi (cp invocation): Reference the related --update
option, like we had already done in mv invocation.
* src/cp.c (usage): State clearly what --no-clobber does,
indicating it's protection focused, rather than being update focused.

19 months agochgrp: add --from parameter similar to chown
Pádraig Brady [Wed, 27 Sep 2023 19:32:06 +0000 (20:32 +0100)] 
chgrp: add --from parameter similar to chown

* doc/coreutils.texi (chown invocation): Convert --from option
description to a macro and call from ...
(chgrp description): ... here.
* src/chown-core.h (emit_from_option_description): A new function
refactored from ...
* src/chown.c (usage): ... here, and called from ...
* src/chgrp.c (usage): ... here.
(main): Accept the --from option as chown(1) does.
* po/POTFILES.in: Add chown-core.h as now translated.
* tests/chown/basic.sh: Decouple the root user from id 0.
* tests/chgrp/from.sh: A new test largely based on chown/basic.sh.
* tests/local.mk: Reference the new test.
* NEWS: Mention the new feature.
Suggested by Ed Neville.

19 months agomaint: remove obsolete AC_PROG_GCC_TRADITIONAL
Pádraig Brady [Mon, 11 Dec 2023 17:03:33 +0000 (17:03 +0000)] 
maint: remove obsolete AC_PROG_GCC_TRADITIONAL

* configure.ac: Remove obsolete macro call.
Recent autoconf warns that it is obsolete.
AC_PROG_CPP sets up the -traditional-cpp option if required.
GCC ignores -traditional since commit f458d1d5 (2002).
Fixes https://bugs.gnu.org/67756

19 months agodoc: ls: fix regression in -f description
Pádraig Brady [Mon, 11 Dec 2023 14:20:47 +0000 (14:20 +0000)] 
doc: ls: fix regression in -f description

The description of -f regressed in coreutils 9.0

* doc/coreutils.texi (ls invocation): Detail which options
are enabled/disabled with -f.
* src/ls.c (usage): Likewise.
(decode_switches): Update comments.
Fixes https://bugs.gnu.org/67765

19 months agomaint: add list/obstack.h to .gitignore
Pádraig Brady [Mon, 11 Dec 2023 14:33:14 +0000 (14:33 +0000)] 
maint: add list/obstack.h to .gitignore

Following recent gnulib update

19 months agobuild: update gnulib submodule to latest
Pádraig Brady [Sun, 10 Dec 2023 19:04:59 +0000 (19:04 +0000)] 
build: update gnulib submodule to latest

* bootstrap: Copy from latest Gnulib,
to fix --bootstrap-sync with other options.

20 months agodoc: touch: clarify --time description in man page
Pádraig Brady [Wed, 6 Dec 2023 13:03:48 +0000 (13:03 +0000)] 
doc: touch: clarify --time description in man page

* src/touch.c (usage): Reorganise the description to be similar to
the format used for the ls --time description, which formats better
when converted to a man page.  Also separate the description
to allow for more granular translations.
Fixes https://bugs.gnu.org/67656

20 months agotail: fix tailing sysfs files where PAGE_SIZE > BUFSIZ
dann frazier [Thu, 30 Nov 2023 01:32:34 +0000 (18:32 -0700)] 
tail: fix tailing sysfs files where PAGE_SIZE > BUFSIZ

* src/tail.c (file_lines): Ensure we use a buffer size >= PAGE_SIZE when
searching backwards to avoid seeking within a file,
which on sysfs files is accepted but also returns no data.
* tests/tail/tail-sysfs.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/67490
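
A minimal sketch of the buffer-size choice described above, assuming a POSIX system. The function name is hypothetical and this is not the tail.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: pick a read size for scanning a file backwards
   that is at least one page, so that a sysfs-style file can be read
   from offset 0 in one go instead of by seeking into its middle.  */
static size_t
backward_scan_bufsize (void)
{
  size_t size = BUFSIZ;
  long page = sysconf (_SC_PAGESIZE);
  if (page > 0 && size < (size_t) page)
    size = page;
  assert (size >= BUFSIZ);
  return size;
}
```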

20 months agonumfmt: support lowercase 'k' for Kilo and Kibi
Pádraig Brady [Sun, 26 Nov 2023 16:41:56 +0000 (16:41 +0000)] 
numfmt: support lowercase 'k' for Kilo and Kibi

For consistency with the "SI" standard, and with other coreutils
which output a lowercase 'k' in "SI" mode.

* src/numfmt.c (suffix_power): Treat 'k' like 'K' on input.
(double_to_human): Output lowercase 'k' in SI mode.
(usage): Adjust accordingly.
* doc/coreutils.texi: Mention 'k' accepted, and printed in SI mode.
* tests/misc/numfmt.pl: Adjust accordingly.
* NEWS: Mention the change in behavior.
Fixes https://bugs.gnu.org/47103

20 months agouniq: fix bug with -w in multibyte locales
Paul Eggert [Thu, 16 Nov 2023 19:34:55 +0000 (11:34 -0800)] 
uniq: fix bug with -w in multibyte locales

-w counted bytes not characters, which is wrong in multibyte locales.
This bug exists even in Fedora, which is why the recently-added
test cases from Fedora didn’t catch it.
* src/uniq.c (find_field): New arg PLEN.  All callers changed.
Compute length of field correctly in multi-byte locales.
(different): Don’t worry about check_chars; find_field now does that.
* tests/uniq/uniq.pl: Test for this bug.

20 months agotests: omit inapplicable test code
Paul Eggert [Thu, 16 Nov 2023 18:12:55 +0000 (10:12 -0800)] 
tests: omit inapplicable test code

* tests/misc/join.pl, tests/uniq/uniq.pl:
Remove test for "invalid byte, character or field list" message
that is not generated.

20 months agouniq: change macro to function
Paul Eggert [Wed, 15 Nov 2023 23:08:34 +0000 (15:08 -0800)] 
uniq: change macro to function

* src/uniq.c (swap_lines): New static function, replacing
the old SWAP_LINES macro.  These days this is just as fast.
All uses changed.

20 months agouniq: prefer static init
Paul Eggert [Wed, 15 Nov 2023 23:05:17 +0000 (15:05 -0800)] 
uniq: prefer static init

* src/uniq.c (skip_fields, skip_chars, check_chars, count_occurrences)
(output_unique, output_first_repeated, output_later_repeated)
(delimit_groups): Initialize statically, rather than in ‘main’.
This shrinks the executable a bit.

20 months agouniq: simplify and fix unlikely bug by using bool
Paul Eggert [Wed, 15 Nov 2023 22:57:17 +0000 (14:57 -0800)] 
uniq: simplify and fix unlikely bug by using bool

* src/uniq.c (enum countmode): Remove this type.
(count_occurrences): New static var, replacing the old countmode,
and of type boolean instead of a two-value enum type that was
confusing (and which caused a hard-to-test bug when the count
exceeded INTMAX_MAX - 1).  All uses changed.

20 months agouniq: prefer signed integers
Paul Eggert [Wed, 15 Nov 2023 07:13:26 +0000 (23:13 -0800)] 
uniq: prefer signed integers

* src/uniq.c (skip_fields, skip_chars, check_chars, size_opt)
(find_field, different, writeline, check_file, main):
Prefer signed to unsigned integer types, since this allows
for better runtime checking with -fsanitize=undefined.

20 months agomaint: DECIMAL_DIGIT_ACCUMULATE uses stdckdint.h
Paul Eggert [Wed, 15 Nov 2023 04:35:56 +0000 (20:35 -0800)] 
maint: DECIMAL_DIGIT_ACCUMULATE uses stdckdint.h

* src/system.h: Include <stdckdint.h>, since the new
DECIMAL_DIGIT_ACCUMULATE uses it.
Do not include stdckdint.h from files that also include system.h.
(DECIMAL_DIGIT_ACCUMULATE): Omit last arg, which is no longer needed.
Reimplement by using C23-style stdckdint.h’s ckd_mul and ckd_add,
as that’s more standard and is more likely to generate better code.
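
The checked-arithmetic pattern can be sketched like this. The function names are hypothetical and the types are simplified to int; the fallback to the GCC/Clang builtins for toolchains without <stdckdint.h> is an assumption of the sketch, not something the commit does.

```c
#include <assert.h>
#include <stdbool.h>

/* Use C23 <stdckdint.h> when the toolchain ships it; otherwise fall
   back to the GCC/Clang builtins, which follow the same convention
   of returning true on overflow.  */
#if defined __has_include
# if __has_include (<stdckdint.h>)
#  include <stdckdint.h>
# endif
#endif
#ifndef ckd_add
# define ckd_add(r, a, b) __builtin_add_overflow (a, b, r)
# define ckd_mul(r, a, b) __builtin_mul_overflow (a, b, r)
#endif

/* Hypothetical helper: append decimal DIGIT (0..9) to *ACCUM.
   Return true on success; on overflow return false and leave
   *ACCUM unchanged.  */
static bool
accumulate_digit (int *accum, int digit)
{
  int tmp;
  assert (0 <= digit && digit <= 9);
  if (ckd_mul (&tmp, *accum, 10) || ckd_add (&tmp, tmp, digit))
    return false;
  *accum = tmp;
  return true;
}

/* Parse a non-empty string of ASCII digits into *OUT, rejecting
   overflow and non-digit bytes.  *OUT is untouched on failure.  */
static bool
parse_decimal (char const *s, int *out)
{
  int value = 0;
  if (!*s)
    return false;
  for (; *s; s++)
    if (! ('0' <= *s && *s <= '9') || !accumulate_digit (&value, *s - '0'))
      return false;
  *out = value;
  return true;
}
```

Because the multiply and the add each report overflow directly, no separate "maximum value" argument is needed to bound the accumulator.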

20 months agopinky: fix string size calculation
Paul Eggert [Sat, 11 Nov 2023 08:17:11 +0000 (00:17 -0800)] 
pinky: fix string size calculation

* src/pinky.c (count_ampersands): Simplify and return idx_t.
(create_fullname): Compute proper destination string size,
basically, by adding (ulen - 1) * ampersands rather than ulen *
(ampersands - 1).  Problem found on CHERI-64.
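
The size arithmetic can be checked with a small sketch. This is a hypothetical stand-alone function, not the pinky.c code; the capitalisation of the first letter and the use of malloc are assumptions of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: expand each '&' in GECOS_NAME to USER with its
   first letter upper-cased.  Return a malloc'd string, or NULL on
   allocation failure.  Overflow checks on the size computation are
   omitted for brevity.  */
static char *
expand_ampersands (char const *gecos_name, char const *user)
{
  size_t ulen = strlen (user);
  size_t ampersands = 0;
  for (char const *p = gecos_name; *p; p++)
    ampersands += *p == '&';

  /* Each '&' is removed (-1) and replaced by ULEN bytes, so the
     result needs strlen + (ulen - 1) * ampersands bytes plus the
     terminating NUL.  It is written as below so that no intermediate
     value goes negative when ULEN is 0.  */
  size_t size = strlen (gecos_name) - ampersands + ulen * ampersands + 1;
  char *result = malloc (size);
  if (!result)
    return NULL;

  char *r = result;
  for (char const *p = gecos_name; *p; p++)
    {
      if (*p == '&')
        {
          memcpy (r, user, ulen);
          if (ulen)
            *r = toupper ((unsigned char) *r);
          r += ulen;
        }
      else
        *r++ = *p;
    }
  *r = '\0';
  assert ((size_t) (r - result) + 1 == size);
  return result;
}
```

With ulen * (ampersands - 1) the buffer would be too small for a single '&' and a long user name, which is the kind of error a bounds-checking platform such as CHERI reports.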

20 months agomaint: port randread to FreeBSD 14
Paul Eggert [Sat, 11 Nov 2023 08:14:48 +0000 (00:14 -0800)] 
maint: port randread to FreeBSD 14

* gl/lib/randread.c (POINTER_IS_ALIGNED): Rename from
ALIGNED_POINTER to avoid a collision with <machine/param.h>
on FreeBSD 14.

20 months agobuild: update gnulib submodule to latest
Paul Eggert [Sat, 11 Nov 2023 03:08:54 +0000 (19:08 -0800)] 
build: update gnulib submodule to latest

21 months agols: fix recent regression in size alignment
Pádraig Brady [Fri, 3 Nov 2023 16:22:22 +0000 (16:22 +0000)] 
ls: fix recent regression in size alignment

* src/ls.c (print_long_format): Use the correct column width.
The wrong width was introduced by a copy/paste error in commit v9.4-2-gcbb6dfec5
* tests/ls/size-align.sh: Add a test.
* tests/local.mk: Reference the new test.
Fixes https://bugs.gnu.org/66919

21 months agojoin: fix recently introduced NUL bug
Paul Eggert [Mon, 30 Oct 2023 17:47:34 +0000 (10:47 -0700)] 
join: fix recently introduced NUL bug

* src/join.c (xfields): Simplify and fix bug with fields
that start with a NUL byte when -t is not used.
* tests/misc/join-utf8.sh: Also test when -t is not used,
and when a field starts with NUL.

21 months agomaint: pacify ‘make syntax-check’
Paul Eggert [Mon, 30 Oct 2023 08:32:37 +0000 (01:32 -0700)] 
maint: pacify ‘make syntax-check’

* tests/misc/join-utf8.sh: Omit fail=0.
Fix framework_failure_ typo.
* tests/misc/join.pl: Change ` to '.

21 months agomaint: copy join, uniq tests from Fedora
Paul Eggert [Mon, 30 Oct 2023 08:24:28 +0000 (01:24 -0700)] 
maint: copy join, uniq tests from Fedora

* tests/misc/join.pl, tests/uniq/uniq.pl:
Copy from Fedora 39.  This adds more multi-byte tests.

21 months agojoin,uniq: support multi-byte separators
Paul Eggert [Mon, 30 Oct 2023 07:32:51 +0000 (00:32 -0700)] 
join,uniq: support multi-byte separators

* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module
is now more trouble than it’s worth.  All uses removed.
Add skipchars.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
Remove.
* gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars:
* tests/misc/join-utf8.sh:
New files.
* src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h.
(tab): Now mcel_t, not int.  All uses changed.
(output_separator, output_seplen): New static vars.
(eq_tab, newline_or_blank, comma_or_blank): New functions.
(xfields, prfields, prjoin, add_field_list, main):
Support multi-byte characters.
* src/numfmt.c: Include ctype.h, skipchars.h.
Do not include cu-ctype.h.
(newline_or_blank): New function.
(next_field): Support multi-byte characters.
* src/sort.c: Include ctype.h instead of cu-ctype.h.
(inittables): Open-code field_sep since it no longer exists.
‘sort’ is not multi-byte safe yet, but when it is this code
will need revamping anyway.
* src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h.
(newline_or_blank): New function.
(find_field): Support multi-byte characters.
* tests/local.mk (all_tests): Add tests/misc/join-utf8.sh

21 months agotest: allow non-blank white space in numbers
Paul Eggert [Sat, 28 Oct 2023 23:15:49 +0000 (16:15 -0700)] 
test: allow non-blank white space in numbers

* src/test.c (find_int): Use isspace, not isblank,
for compatibility with how strtol works, which
is how most other shells do this.
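
The difference between the two predicates can be seen in a short sketch. The helper names are hypothetical; only the standard isspace, isblank and strtol are used.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the leading bytes of S accepted by PRED, e.g. isspace or
   isblank.  */
static size_t
leading_run (char const *s, int (*pred) (int))
{
  size_t n = 0;
  assert (s && pred);
  while (s[n] && pred ((unsigned char) s[n]))
    n++;
  return n;
}

/* strtol itself skips leading isspace bytes, so "\n 42" parses as 42.  */
static long
parse_long (char const *s)
{
  return strtol (s, NULL, 10);
}
```

In the C locale isblank accepts only space and tab, so a scanner based on it stops at a newline that strtol would have skipped.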

21 months agostdbuf: port to oddball toupper
Paul Eggert [Sat, 28 Oct 2023 16:30:49 +0000 (09:30 -0700)] 
stdbuf: port to oddball toupper

* src/stdbuf.c: Do not include ctype.h.
(set_libstdbuf_options): Use c_toupper, not toupper,
since the C locale is intended here.

21 months agodircolors: assume C-locale spaces
Paul Eggert [Sat, 28 Oct 2023 16:22:09 +0000 (09:22 -0700)] 
dircolors: assume C-locale spaces

* src/dircolors.c: Include c-ctype.h, not ctype.h.
(parse_line): Use c_isspace, not isspace, as the .dircolors
file format (which does not seem to be documented!) appears
to be ASCII.

21 months agomaint: port to oddball tolower
Paul Eggert [Sat, 28 Oct 2023 16:07:14 +0000 (09:07 -0700)] 
maint: port to oddball tolower

* src/digest.c (hex_equal): Work even in oddball locales
where tolower does not work as expected on ASCII letters.

21 months agomaint: include ctype.h selectively
Paul Eggert [Sat, 28 Oct 2023 00:31:49 +0000 (17:31 -0700)] 
maint: include ctype.h selectively

Include ctype.h only in files that need it.  Many of its uses
are incorrect, as they assume single-byte locales.  The idea is
to remove the incorrect uses later, when there is time.
* src/chroot.c, src/csplit.c, src/dd.c, src/digest.c, src/dircolors.c:
* src/expand-common.c, src/expand.c, src/fmt.c, src/fold.c, src/ls.c:
* src/od.c, src/pinky.c, src/pr.c, src/ptx.c, src/seq.c:
* src/set-fields.c, src/split.c, src/stdbuf.c, src/test.c:
* src/tr.c, src/truncate.c, src/unexpand.c, src/wc.c:
Include ctype.h.
* src/system.h: Do not include ctype.h.

21 months agomaint: move field_sep into separate module
Paul Eggert [Sat, 28 Oct 2023 00:15:08 +0000 (17:15 -0700)] 
maint: move field_sep into separate module

This is so that we don’t need to have every source file
include ctype.h.
* bootstrap.conf (gnulib_modules): Add cu-ctype.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
New files.
* src/join.c, src/numfmt.c, src/sort.c, src/uniq.c:
Include cu-ctype.h, for field_sep.
* src/system.h (field_sep): Remove; now supplied by cu-ctype.

21 months agodigest: omit unnecessary b2sum includes
Paul Eggert [Fri, 27 Oct 2023 15:56:39 +0000 (08:56 -0700)] 
digest: omit unnecessary b2sum includes

* src/blake2/b2sum.c: Do not include string.h, errno.h,
ctype.h, unistd.h, getopt.h.

21 months agomaint: prefer c_isxdigit when that is the intent
Paul Eggert [Fri, 27 Oct 2023 15:45:50 +0000 (08:45 -0700)] 
maint: prefer c_isxdigit when that is the intent

* src/digest.c (valid_digits, split_3):
* src/echo.c (main):
* src/printf.c (print_esc):
* src/ptx.c (unescape_string):
* src/stat.c (print_it):
When the code is supposed to support only POSIX-locale hex digits,
use c_isxdigit rather than isxdigit.  Include c-ctype.h as needed.
This defends against oddball locales where isxdigit != c_isxdigit.

21 months agomaint: fix syntax check issue
Pádraig Brady [Fri, 27 Oct 2023 13:19:01 +0000 (14:19 +0100)] 
maint: fix syntax check issue

* src/basenc.c: Fix preprocessor indentation.

21 months agobase32,base64: disallow non-canonical encodings
Pádraig Brady [Fri, 27 Oct 2023 12:24:04 +0000 (13:24 +0100)] 
base32,base64: disallow non-canonical encodings

This will make decoding more resilient to corruption
whether due to transmission errors or nefarious adjustment.
See https://eprint.iacr.org/2022/361.pdf

* gnulib: Update to commit 3f463202bd enforcing canonical encoding.
* tests/basenc/base64.pl: Add test cases, and adjust existing cases.
* NEWS: Mention the change in behavior.

21 months agobasenc: fix unlikely locale issue; tune
Paul Eggert [Wed, 25 Oct 2023 22:09:04 +0000 (15:09 -0700)] 
basenc: fix unlikely locale issue; tune

This sped up ‘basenc -d --base16’ by 60% on my old platform,
AMD Phenom II X4 910e, Fedora 38.
* src/basenc.c (struct base16_decode_context): Simplify by
omitting have_nibble.  ‘nibble’ is now negative if it’s missing.
All uses changed.
(B16): New macro, inspired by ../lib/base64.c.
(base16_to_int): New static var, likewise.
(isubase16): Reimplement using base16_to_int, since isxdigit is
not guaranteed to succeed on the chars we want when the locale is
oddball.
(base16_decode_ctx): Tune by using base16_to_int and by

21 months agobasenc: tweak checks to use unsigned char
Paul Eggert [Wed, 25 Oct 2023 21:43:32 +0000 (14:43 -0700)] 
basenc: tweak checks to use unsigned char

This tends to generate better code, at least on x86-64,
because callers are just as fast and callees can avoid a conversion.
* src/basenc.c: The following renamings also change the arg type
from char to unsigned char.  All uses changed.
(isubase): Rename from isbase.
(isubase64url): Rename from isbase64url.
(isubase32hex): Rename from isbase32hex.
(isubase16): Rename from isbase16.
(isuz85): Rename from isz85.
(isubase2): Rename from isbase2.

2023-10-24  Paul Eggert  <eggert@cs.ucla.edu>

* src/basenc.c (struct base16_decode_context):
Simplify by storing -1 for missing nibbles.  All uses changed.
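
Storing -1 for a missing nibble can be sketched as follows. The type and function names are hypothetical, and the digit lookup is written as comparisons rather than the table the commit describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the basenc.c code.
   Return the value of hex digit C, or -1 if C is not one.  */
static int
hex_value (unsigned char c)
{
  if ('0' <= c && c <= '9')
    return c - '0';
  if ('a' <= c && c <= 'f')
    return c - 'a' + 10;
  if ('A' <= c && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

struct hex_decoder
{
  int nibble;  /* pending high nibble, or -1 if none */
};

/* Decode INLEN bytes of IN into OUT, which must have room for
   (INLEN + 1) / 2 bytes.  Return the number of bytes written, or -1
   on an invalid digit.  A trailing odd digit stays pending in CTX
   for the next call.  */
static ptrdiff_t
hex_decode (struct hex_decoder *ctx, char const *in, size_t inlen,
            unsigned char *out)
{
  ptrdiff_t written = 0;
  assert (ctx && in && out);
  for (size_t i = 0; i < inlen; i++)
    {
      int v = hex_value (in[i]);
      if (v < 0)
        return -1;
      if (ctx->nibble < 0)
        ctx->nibble = v;
      else
        {
          out[written++] = ctx->nibble * 16 + v;
          ctx->nibble = -1;
        }
    }
  return written;
}
```

A single signed field replaces the separate flag and value, so the "is a nibble pending" test and the stored nibble cannot get out of sync.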

21 months agobuild: update gnulib submodule to latest
Paul Eggert [Wed, 25 Oct 2023 15:45:15 +0000 (08:45 -0700)] 
build: update gnulib submodule to latest

21 months agobasenc: --base16: also allow lower case with --ignore-garbage
Pádraig Brady [Wed, 25 Oct 2023 13:04:00 +0000 (14:04 +0100)] 
basenc: --base16: also allow lower case with --ignore-garbage

* src/basenc.c (isbase16): Also return true for lower case.
* tests/basenc/basenc.pl: Add a test case.
Reported by Paul Eggert.

21 months agobasenc: --base16: support lower case hex digits
Pádraig Brady [Mon, 23 Oct 2023 11:51:19 +0000 (12:51 +0100)] 
basenc: --base16: support lower case hex digits

* src/basenc.c (base16_decode_ctx): Convert to uppercase
before converting from hex.
* tests/basenc/basenc.pl: Add a test case.
* NEWS: Mention the change in behavior.
Addresses https://bugs.gnu.org/66698

21 months agodoc: fix RFC references
Pádraig Brady [Mon, 23 Oct 2023 11:29:03 +0000 (12:29 +0100)] 
doc: fix RFC references

* doc/coreutils.texi: Adjust RFC URLs, as the originals
now give 404 errors.

22 months agotests: move all basenc tests to their own directory
Pádraig Brady [Fri, 6 Oct 2023 15:31:47 +0000 (16:31 +0100)] 
tests: move all basenc tests to their own directory

* tests/misc/base64.pl: Move to tests/basenc/base64.pl
* tests/misc/basenc.pl: Move to tests/basenc/basenc.pl
* tests/local.mk: Adjust accordingly

22 months agobasenc: auto pad base32 and base64 inputs when decoding
Pádraig Brady [Thu, 5 Oct 2023 16:00:51 +0000 (17:00 +0100)] 
basenc: auto pad base32 and base64 inputs when decoding

Padding of encoded data is useful in cases where
base64 encoded data is concatenated / streamed.
I.e. where there are padding chars _within_ the stream.
In other cases padding is optional and can be inferred.
Note we continue to treat partial padding as invalid,
as that would be indicative of truncation.

* src/basenc.c (do_decode): Auto pad the end of the input.
* NEWS: Mention the change in behavior.
* tests/misc/base64.pl: Adjust to not fail for missing padding.
Addresses https://bugs.gnu.org/66265
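
One way to infer the missing padding is from the input length alone. The sketch below assumes base64 input that contains no '=' at all; the function names are hypothetical and this is not the do_decode implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given N base64 digits with no '=' padding,
   return how many '=' bytes complete the final 4-digit group, or -1
   if N cannot be a valid length (a lone digit carries under 8 bits).  */
static int
base64_missing_padding (size_t n)
{
  switch (n % 4)
    {
    case 0: return 0;
    case 2: return 2;
    case 3: return 1;
    default: return -1;
    }
}

/* Append the inferred padding to BUF, which holds LEN digits and has
   room for BUFSIZE bytes.  Return the new length, or -1 on failure.
   The result is not NUL-terminated.  */
static ptrdiff_t
base64_auto_pad (char *buf, size_t len, size_t bufsize)
{
  assert (buf && len <= bufsize);
  int pad = base64_missing_padding (len);
  if (pad < 0 || bufsize - len < (size_t) pad)
    return -1;
  memset (buf + len, '=', pad);
  return len + pad;
}
```

Input that already ends in a partial run of '=' is outside this sketch; the commit message says such input remains invalid.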

22 months agosort: improve --help
Paul Eggert [Fri, 29 Sep 2023 01:02:25 +0000 (18:02 -0700)] 
sort: improve --help

Problem reported by Jorge Stolfi (bug#66253).
* src/sort.c (usage): Suggest looking at the manual for -n details.

22 months agodoc: rm --help: mention that '.' or '..' are rejected
Pádraig Brady [Mon, 25 Sep 2023 13:46:48 +0000 (14:46 +0100)] 
doc: rm --help: mention that '.' or '..' are rejected

* src/rm.c (usage): State that '.' or '..' are rejected.

22 months agowc: pacify ‘make syntax-check’
Paul Eggert [Sun, 24 Sep 2023 00:19:35 +0000 (17:19 -0700)] 
wc: pacify ‘make syntax-check’

* src/wc_avx2.c (wc_lines_avx2): Explicitly make it ‘extern’.
Not sure why this is needed.

22 months agowc: distribute src/wc.h
Paul Eggert [Sun, 24 Sep 2023 00:18:45 +0000 (17:18 -0700)] 
wc: distribute src/wc.h

* src/local.mk (noinst_HEADERS): Add src/wc.h.

22 months agowc: goto considered harmful
Paul Eggert [Sun, 24 Sep 2023 00:07:33 +0000 (17:07 -0700)] 
wc: goto considered harmful

* src/wc.c: Do not include assure.h.  Replace the only
use of ‘assure’ with ‘unreachable’ which is good enough.
(wc, main): Remove labels and gotos.  This doesn’t affect
performance in any way I can measure, and makes the code
a bit easier to follow.

22 months agowc: prefer signed integers
Paul Eggert [Sat, 23 Sep 2023 21:22:16 +0000 (14:22 -0700)] 
wc: prefer signed integers

Prefer signed to unsigned integers, to make it easier to catch
integer overflow errors.
* src/wc.c: Do not include safe-read.h.
(total_lines_overflow, total_words_overflow, total_chars_overflow)
(total_bytes_overflow): Now bool, not uintmax_t.  All uses changed.
(max_line_length): Now intmax_t, not uintmax_t.  All uses changed.
The total_... vars are still uintmax_t because overflow into them
is checked.
(page_size): Now idx_t, not size_t.
(wc_lines, wc, get_input_fstatus, compute_number_width, main):
Prefer signed to unsigned ints where either should do.
(wc_lines, wc): Use read rather than safe_read, since we don’t
need safe_read’s checks for huge buffers.
(wc): Redo call to mbrtoc32 to lessen the number of comparisons
against its returned value.  Do this partly by keeping a pointer
to the end of the buffer rather than a count.  Simplify
overflow-checking code.
(compute_number_width): Check for integer overflow.
Don’t assume size_t fits into unsigned long.
* src/wc.h (struct wc_lines): Prefer signed integers.
* src/wc_avx2.c: Do not include safe-read.h.
(wc_lines_avx2): Prefer signed integers.  Use read, not safe_read.

22 months agowc: improve avx2 API
Paul Eggert [Sat, 23 Sep 2023 20:38:08 +0000 (13:38 -0700)] 
wc: improve avx2 API

* src/wc.c: Use "#include <...>" for files not in the current dir.
Include "wc.h" instead of declaring wc_lines_avx2 by hand.
(wc_lines): New API, with no file name (no longer needed) and
with a return struct rather than arg pointers.  All uses changed.
Use avx2_supported directly instead of using a function pointer.
Exploit C99-style declarations after statements.
Multiply by 15 rather than dividing; it’s faster and more accurate
and cannot overflow here.
(wc): Simplify based on wc_lines API change.
* src/wc.h: New file.
* src/wc_avx2.c: Include it, to check API better.
(wc_lines_avx2): Use new API.  All uses changed.  Exploit C99.
Make locals more local.

22 months agofactor,tail: avoid quadratic reallocation
Paul Eggert [Sat, 23 Sep 2023 08:15:08 +0000 (01:15 -0700)] 
factor,tail: avoid quadratic reallocation

* src/factor.c (struct mp_factors): New member nalloc.
(mp_factor_init): Initialize it.
* src/factor.c (mp_factor_insert):
* src/tail.c (parse_options): Use xpalloc to avoid quadratic
worst-case behavior on reallocation.
* src/tail.c (pids_alloc): New static var.
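
The reallocation change can be illustrated with a minimal sketch of
geometric growth.  This is not Gnulib's xpalloc, whose interface and
growth policy differ; struct pid_vec and pid_vec_push are hypothetical
names, and the roughly 1.5x growth factor is an assumption chosen for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array; not the data structure used in tail.c.  */
struct pid_vec
{
  long *items;
  size_t len;     /* elements in use */
  size_t nalloc;  /* elements allocated */
};

/* Append VALUE to V.  When the array is full, grow the capacity by
   about half, so that N appends copy O(N) elements in total instead
   of the O(N*N) that growing by one element per append can cost.
   Return 0 on success, -1 on overflow or allocation failure.  */
int
pid_vec_push (struct pid_vec *v, long value)
{
  assert (v && v->len <= v->nalloc);
  if (v->len == v->nalloc)
    {
      size_t n = v->nalloc ? v->nalloc + v->nalloc / 2 + 1 : 4;
      if (n <= v->nalloc || n > (size_t) -1 / sizeof *v->items)
        return -1;
      long *p = realloc (v->items, n * sizeof *p);
      if (!p)
        return -1;
      v->items = p;
      v->nalloc = n;
    }
  v->items[v->len++] = value;
  return 0;
}
```

A caller starts from a zero-initialized struct pid_vec and frees
the items member when done.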

22 months ago doc: mention Unicode exceptions for wc
Paul Eggert [Sat, 23 Sep 2023 07:23:26 +0000 (00:23 -0700)] 
doc: mention Unicode exceptions for wc

22 months ago wc: simplify by removing SUPPORT_OLD_MBRTOWC
Paul Eggert [Sat, 23 Sep 2023 07:03:41 +0000 (00:03 -0700)] 
wc: simplify by removing SUPPORT_OLD_MBRTOWC

* src/wc.c (SUPPORT_OLD_MBRTOWC): Remove.  All uses removed.
(wc): Simplify by assuming C99-or-later behavior for mbrtoc32,
which after all is a C11 API.  Fix the !SUPPORT_OLD_MBRTOWC
code, which evidently was never tested seriously.

22 months ago wc: 3× speedup in C locale
Paul Eggert [Sat, 23 Sep 2023 05:09:37 +0000 (22:09 -0700)] 
wc: 3× speedup in C locale

The 3× speedup was measured by invoking 'wc $(find * -type f)'
on the coreutils sources etc. on an Ubuntu 23.04 x86-64 host.
These changes also speed up wc 20% in UTF-8 locales.
* src/wc.c (wc_isprint, wc_isspace): New static vars.
(wc): Use them for speed.
(main): Initialize them if needed.
(isnbspace): Remove; no longer used.
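
A hedged sketch of the table idea behind this kind of speedup:
classify each byte value once, then index an array in the hot loop
instead of calling a locale-aware function per byte.  The names
byte_isspace, init_byte_tables and count_space_bytes are hypothetical;
the actual layout of wc_isspace in src/wc.c is not shown in this log.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry per possible byte value, filled in once at startup.  */
static bool byte_isspace[UCHAR_MAX + 1];

/* Fill the table from the current locale's isspace.  */
void
init_byte_tables (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    byte_isspace[i] = isspace (i) != 0;
}

/* Count white-space bytes in BUF[0..LEN-1] using only table lookups.  */
size_t
count_space_bytes (char const *buf, size_t len)
{
  assert (buf || len == 0);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    n += byte_isspace[(unsigned char) buf[i]];
  return n;
}
```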

22 months ago wc: treat encoding errors as non white space
Paul Eggert [Sat, 23 Sep 2023 03:53:57 +0000 (20:53 -0700)] 
wc: treat encoding errors as non white space

* src/wc.c (wc): Treat encoding errors like non white space
characters.

22 months ago wc: fix word count bug
Paul Eggert [Fri, 22 Sep 2023 18:13:51 +0000 (11:13 -0700)] 
wc: fix word count bug

* bootstrap.conf (gnulib_modules): Remove c32isprint.
* src/wc.c (wc): Consider all non-white-space characters
to be word constituents, even if they are not printable.
POSIX requires this, and it is what BSD does.
Partly do this by simplifying the check for a word,
by counting word starts rather than word ends.
* tests/wc/wc.pl: Test for the bug.
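
Counting word starts can be sketched as follows, assuming a
single-byte locale.  count_words is a hypothetical name and this is
not the code in src/wc.c; it only shows that printability never needs
to be consulted once a word is defined as a run of non-white-space.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Count words in BUF[0..LEN-1].  A word starts at any non-white-space
   byte that is at the start of the buffer or follows white space.
   Nonprintable bytes are word constituents like any other.  */
size_t
count_words (char const *buf, size_t len)
{
  assert (buf || len == 0);
  size_t words = 0;
  bool in_word = false;
  for (size_t i = 0; i < len; i++)
    {
      bool space = isspace ((unsigned char) buf[i]) != 0;
      if (!space && !in_word)
        words++;
      in_word = !space;
    }
  return words;
}
```

With this definition, input consisting of a control byte, a space and
a letter counts as two words, which is the behavior the commit
message attributes to POSIX and BSD.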

22 months ago maint: omit some unused function tests
Paul Eggert [Fri, 22 Sep 2023 17:05:58 +0000 (10:05 -0700)] 
maint: omit some unused function tests

* m4/jm-macros.m4: Do not check for ftruncate, iswspace,
mkfifo, mbrlen, sysctl.  Coreutils no longer uses the
corresponding HAVE_* macros, typically because Gnulib
handles them now.
* src/wc.c (iswspace): Remove; unused.

22 months ago sort: not a special case for mbrtowc
Paul Eggert [Fri, 22 Sep 2023 16:49:41 +0000 (09:49 -0700)] 
sort: not a special case for mbrtowc

* configure.ac (GNULIB_MBRTOWC_SINGLE_THREAD): Define.

22 months ago maint: prefer char32_t to wchar_t
Paul Eggert [Fri, 22 Sep 2023 16:45:12 +0000 (09:45 -0700)] 
maint: prefer char32_t to wchar_t

This should work better on non-glibc platforms that don’t
use Unicode for wchar_t.  However, POSIX appears to prohibit
this for printf.c so leave that alone.
* bootstrap.conf (gnulib_modules): Add btoc32, c32iscntrl,
c32isprint, c32isspace, c32width, mbrtoc32.  Remove btoc, wcwidth.
* src/df.c, src/ls.c, src/wc.c:
Include uchar.h instead of wchar.h and wctype.h.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/wc.c (isnbspace, wc):
Use char32_t instead of wchar_t.

22 months ago wc: simplify #if MB_LEN_MAX
Paul Eggert [Fri, 22 Sep 2023 15:17:15 +0000 (08:17 -0700)] 
wc: simplify #if MB_LEN_MAX

* src/wc.c: Don’t have special #ifs for platforms where
MB_LEN_MAX is 1.  On these platforms, MB_CUR_MAX is 1 as well,
so the compiler should optimize away all multi-byte code.

22 months ago wc: avoid undefined conversion state
Paul Eggert [Fri, 22 Sep 2023 02:23:56 +0000 (19:23 -0700)] 
wc: avoid undefined conversion state

* src/wc.c (wc): When mbrtowc returns (size_t) -1, zero the
conversion state, since POSIX says it’s undefined.
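
A minimal sketch of the pattern described above, using standard
mbrtowc.  count_chars is a hypothetical name, and counting each
encoding-error byte as one character is an assumption of this sketch
rather than a statement of what src/wc.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count characters in BUF[0..LEN-1] in the current locale.  */
size_t
count_chars (char const *buf, size_t len)
{
  assert (buf || len == 0);
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t chars = 0;
  char const *p = buf;
  char const *end = buf + len;
  while (p < end)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, (size_t) (end - p), &state);
      if (n == (size_t) -1)
        {
          /* Encoding error: the state can no longer be trusted,
             so zero it and skip a single byte.  */
          memset (&state, 0, sizeof state);
          n = 1;
        }
      else if (n == (size_t) -2)
        {
          /* Incomplete character at the end of the buffer; this
             sketch counts the remainder as one character.  */
          memset (&state, 0, sizeof state);
          n = (size_t) (end - p);
        }
      else if (n == 0)
        n = 1;  /* A null character occupies one byte.  */
      chars++;
      p += n;
    }
  return chars;
}
```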

22 months ago maint: use mbszero
Paul Eggert [Fri, 22 Sep 2023 02:09:15 +0000 (19:09 -0700)] 
maint: use mbszero

* bootstrap.conf (gnulib_modules): Add mbszero.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/pathchk.c (portable_chars_only):
* src/printf.c (STRTOX):
* src/wc.c (wc):
Prefer mbszero to clearing an mbstate_t by hand.

22 months ago maint: prefer mcel
Paul Eggert [Fri, 22 Sep 2023 01:45:47 +0000 (18:45 -0700)] 
maint: prefer mcel

This causes Gnulib code to also use mcel, which is more consistent.
* bootstrap.conf (avoided_gnulib_modules): Avoid mbuiter
and mbuiterf, since we can now just use mcel.  This avoids
the need to ship and compile mbchar and these modules.
(gnulib_modules): Change mcel to mcel-prefer.

22 months ago wc: stop worrying about EBCDIC, shift-JIS, etc
Paul Eggert [Fri, 22 Sep 2023 01:45:08 +0000 (18:45 -0700)] 
wc: stop worrying about EBCDIC, shift-JIS, etc

* src/wc.c: Do not include mbchar.h.
(wc): Check for ASCII characters instead of using is_basic.
Other parts of Gnulib and coreutils already assume the encoding
is upward compatible with ASCII, and the old code wouldn’t
have worked anyway with shift-JIS.

22 months ago expr: use mcel
Paul Eggert [Thu, 21 Sep 2023 23:59:48 +0000 (16:59 -0700)] 
expr: use mcel

The mcel API is simpler and corresponds more closely to how
Emacs etc. behave when the input has encoding errors,
since it treats each encoding-error byte separately.
* bootstrap.conf (gnulib_modules): Add mcel.
* src/expr.c: Include mcel.h instead of mbuiter.h.
(mbs_logical_cspn, mbs_logical_substr, mbs_offset_to_chars):
Use mcel API.
(mbs_logical_substr): Use ximemdup0 so as not to waste memory in
the result, fixing a FIXME.

22 months ago build: update gnulib submodule to latest
Paul Eggert [Thu, 21 Sep 2023 17:44:25 +0000 (10:44 -0700)] 
build: update gnulib submodule to latest

22 months ago build: avoid build failures on gcc <= 10, or clang
Pádraig Brady [Thu, 21 Sep 2023 17:48:49 +0000 (18:48 +0100)] 
build: avoid build failures on gcc <= 10, or clang

On gcc 10 the following build failure occurs:
  "error: a label can only be part of a statement
   and a declaration is not a statement"
This is because the current code is not standards-conforming,
but GCC >= 11 will compile it (even with the -Wpedantic option).
This issue is tracked for GCC at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111526

* src/tail.c (parse_options): Avoid a declaration after label,
by using a surrounding block.
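
The workaround can be sketched as below, assuming the label in
question is a case label.  parse_opt and its option letter are
hypothetical and do not reproduce parse_options in src/tail.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical option handler.  Return the numeric value of ARG for
   option 'p', or -1 for any other option.  */
long
parse_opt (int opt, char const *arg)
{
  assert (arg);
  switch (opt)
    {
    case 'p':
      {
        /* The braces make the labeled statement a compound statement,
           so this declaration no longer follows the label directly.  */
        long pid = strtol (arg, NULL, 10);
        return pid;
      }
    default:
      return -1;
    }
}
```

Without the inner braces, the declaration of pid would come directly
after the case label, which is the construct the quoted diagnostic
rejects.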

22 months ago tail: allow multiple PIDs
Stephen Kitt [Mon, 18 Sep 2023 16:09:29 +0000 (18:09 +0200)] 
tail: allow multiple PIDs

tail can watch multiple files, but currently only a single writer. It
can be useful to watch files from multiple writers, or even processes
not directly related to the files (e.g. watch log files written by a
server process, for the duration of a test driven by a separate
client).

* src/tail.c (writers_are_dead): New function.
(tail_forever): Use it to wait for writers.
(tail_forever_inotify): As above.
(parse_options): Manage --pid options in an array.
* doc/coreutils.texi: Update documentation.
* tests/tail/pid.sh: Add a variant with two PIDs.
* NEWS: Mention the new feature.

22 months ago ls: --dired now implies long format with hyperlinks disabled
Sylvestre Ledru [Sun, 17 Sep 2023 13:55:57 +0000 (15:55 +0200)] 
ls: --dired now implies long format with hyperlinks disabled

Currently --dired is silently ignored
when combined with conflicting output formats.

* src/ls.c (decode_switches): Set default format and hyperlink mode
when the --dired option is specified.
* tests/ls/dired.sh: Check that formats are implied / overridden.
* NEWS: Mention the change in behavior.
* doc/coreutils.texi (ls invocation): Adjust --dired description.

22 months ago tests: improve ls --dired testing
Sylvestre Ledru [Thu, 14 Sep 2023 21:40:08 +0000 (23:40 +0200)] 
tests: improve ls --dired testing

* tests/ls/dired.sh: Verify ls --dired output against varying offsets.

22 months ago maint: use C99 int size specifiers rather than PRI.MAX defines
Pádraig Brady [Wed, 13 Sep 2023 22:08:02 +0000 (23:08 +0100)] 
maint: use C99 int size specifiers rather than PRI.MAX defines

Following on from commit v9.3-128-gf31229ebd,
replace all uses of the PRI.MAX portability defines
with C99 size specifiers %z, %j, and %t.
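
For illustration, the three length modifiers pair with their types as
in the sketch below.  format_sizes is a hypothetical helper, not a
coreutils function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a size_t, an intmax_t and a ptrdiff_t into OUT without
   PRIuMAX-style macros: z goes with size_t, j with intmax_t and
   uintmax_t, t with ptrdiff_t.  Return what snprintf returns.  */
int
format_sizes (char *out, size_t outsize, size_t z, intmax_t j, ptrdiff_t t)
{
  assert (out && outsize > 0);
  return snprintf (out, outsize, "%zu %jd %td", z, j, t);
}
```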

22 months ago doc: add subsections for cksum nodes
Pádraig Brady [Mon, 11 Sep 2023 19:21:39 +0000 (20:21 +0100)] 
doc: add subsections for cksum nodes

* doc/coreutils.texi: Specify each of the cksum nodes as a subsection,
so that the docs are organised appropriately in the pdf and html manual.

22 months ago cp,mv,install: add copy_internal comment
Paul Eggert [Fri, 8 Sep 2023 23:25:00 +0000 (16:25 -0700)] 
cp,mv,install: add copy_internal comment

* src/copy.c (copy_internal): Add comment about
some particularly tricky logic.

22 months ago cp: avoid needless unlinkat after fstatat ELOOP
Paul Eggert [Fri, 8 Sep 2023 16:14:06 +0000 (09:14 -0700)] 
cp: avoid needless unlinkat after fstatat ELOOP

* src/copy.c (copy_internal): When cp -f's fstatat fails on the
destination with ELOOP, report an error immediately when fstatat
used AT_SYMLINK_NOFOLLOW, as the later unlinkat would fail too.

22 months ago cp,mv,install: minor copy_internal refactoring
Paul Eggert [Fri, 8 Sep 2023 16:10:21 +0000 (09:10 -0700)] 
cp,mv,install: minor copy_internal refactoring

* src/copy.c (copy_internal): Redo to avoid need for calculating
fstatat_flags when not needed.  This is for clarity, not speed.

22 months ago cp,mv,install: fix comment punctuation
Paul Eggert [Tue, 5 Sep 2023 17:10:12 +0000 (10:10 -0700)] 
cp,mv,install: fix comment punctuation

* src/copy.h: Fix punctuation in comment.