Pádraig Brady [Mon, 26 Feb 2024 16:38:41 +0000 (16:38 +0000)]
build: fix libcrypto version linked by sort at runtime
One should link the versioned lib at runtime,
and the unversioned lib at build time,
as the unversioned lib may not be installed,
and better couples the binary with the required version.
* configure.ac: Define LIBCRYPTO_SONAME, determined from
the test binary linked with -lcrypto. Also document
why we use SHA512() in the check, rather than MD5().
* src/sort.c (link_libcrypto): Use the versioned lib in dlopen().
Pádraig Brady [Mon, 26 Feb 2024 14:42:40 +0000 (14:42 +0000)]
maint: avoid sc_tight_scope failure in sort.c
* cfg.mk: Exclude the ptr_MD5_* symbols added in
commit v9.4-130-g7f57ac2d2, as there is no way
to declare these static given they way they're defined.
Paul Eggert [Mon, 26 Feb 2024 01:13:12 +0000 (17:13 -0800)]
sort: dynamically link -lcrypto if -R
This saves time in the usual case, which does not need -lcrypto.
* configure.ac (DLOPEN_LIBCRYPTO): New macro.
* src/sort.c [DLOPEN_LIBCRYPTO && HAVE_OPENSSL_MD5]: New macros
MD5_Init, MD5_Update, MD5_Final. Include "md5.h" after defining
them. Include <dlfcn.h>, and define new functions link_failure
and symbol_address.
(link_libcrypto): New function.
(random_md5_state_init): Call it before using crypto functions.
Daan De Meyer [Thu, 25 Jan 2024 13:02:32 +0000 (14:02 +0100)]
cp: add --keep-directory-symlink option
When recursively copying files into OS trees, it often happens that
some subdirectory of the source directory is a symlink in the target
directory. Currently, cp will fail in that scenario with the error:
"cannot overwrite non-directory %s with directory %s"
However, we'd like cp in this scenario to follow the destination
directory symlink and copy the files into the symlinked directory
instead. Let's support this by adding a new option
--keep-directory-symlink that makes cp follow destination directory
symlinks.
We name the option --keep-directory-symlink to keep consistent with
tar which has the same option with the same effect.
* doc/coreutils.texi (cp invocation): Describe the new option.
* src/copy.h: Add the new setting.
* src/copy.h: Adjust to follow symlinks if setting enabled.
* src/cp.c (usage): Describe the new option.
(main): Accept the new option.
* tests/cp/keep-directory-symlink.sh: A new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the new feature.
Paul Eggert [Sun, 18 Feb 2024 05:51:03 +0000 (21:51 -0800)]
ls: remove unnecessary pragmas
* src/ls.c (decode_switches): Remove pragmas. They are no longer
needed to pacify GCC 13.2.1 with --enable-gcc-checking, and there’s
little point keeping them around for older GCC versions.
Paul Eggert [Sun, 18 Feb 2024 05:36:37 +0000 (21:36 -0800)]
maint: remove unneeded suggest-attributes pragmas
* gl/lib/fadvise.c: Remove pragma that works around GCC bug 83559
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83559>.
This bug was fixed in GCC 9, and we needn’t worry about
--enable-gcc-warnings for compilers that old.
* src/test.c: Likewise.
Greg Wooledge [Sat, 17 Feb 2024 13:07:12 +0000 (13:07 +0000)]
doc: fix typo in shred example
* doc/coreutils.texi (shred invocation): Fix the example
to correctly close file descriptor 3.
* THANKS.in: Remove old email since now recorded in repo history.
Reported at https://bugs.debian.org/1063837
Pádraig Brady [Wed, 7 Feb 2024 10:55:00 +0000 (10:55 +0000)]
build: fix od build on clang < 17
* configure.ac: Ensure the compiler can promote 16 bit floating point
types to float, before enabling that code in od. This was an issue
with clang 16 at least.
* src/od.c: Adjust for the new defines.
* tests/od/od-float.sh: Likewise. Also port to the dash shell,
whose inbuilt printf doesn't support hex escapes.
Pádraig Brady [Thu, 1 Feb 2024 17:59:51 +0000 (17:59 +0000)]
od: support half precision floating point
Rely on compiler support for _Float16 and __bf16
to support -fH and -fB formats respectively.
I.e. IEEE 16 bit, and brain 16 bit floats respectively.
Modern GCC and LLVM compilers support both types.
This was tested on:
gcc 13, clang 17 x86 (Both types supported)
gcc 7 aarch64 (Only -fH supported)
gcc 13 ppc(be) (Neither supported. Both will be with GCC 14)
* src/od.c: Support -tf2 or -tfH to print IEEE 16 bit floating point,
or -tfB to print Brain 16 bit floating point.
* configure.ac: Check for _Float16 and __bf16 types.
* doc/coreutils.texi (od invocation): Mention the new -f types.
* tests/od/od-float.sh: Add test cases.
* NEWS: Mention the new feature.
Addresses https://bugs.gnu.org/68871
Pádraig Brady [Thu, 18 Jan 2024 00:05:18 +0000 (00:05 +0000)]
doc: split -C: test and document a heap overflow
This was introduced in coreutils 9.2 through commit v9.1-184-g40bf1591b,
and was fixed in coreutils 9.5 through commit v9.4-111-gc4c5ed8f4.
This issue has been assigned CVE-2024-0684.
* NEWS: Mention the bug fix.
* tests/split/line-bytes.sh: Add a test case.
Reported by Valentin Metz.
Pádraig Brady [Wed, 17 Jan 2024 23:49:52 +0000 (23:49 +0000)]
tests: make ulimit -v interact better with ASAN
ulimit -v is generally not supported with ASAN, giving errors like:
"ReserveShadowMemoryRange failed while trying to map 0x... bytes.
Perhaps you're using ulimit -v"
* tests/cp/link-heap.sh: Mention ASAN as a possible reason for skipping.
* tests/csplit/csplit-heap.sh: Likewise.
* tests/cut/cut-huge-range.sh: Likewise.
* tests/dd/no-allocate.sh: Likewise.
* tests/printf/printf-surprise.sh: Likewise.
* tests/rm/many-dir-entries-vs-OOM.sh: Likewise.
* tests/head/head-c.sh: Only skip the part of the test needing ulimit.
* tests/split/line-bytes.sh: Likewise.
Paul Eggert [Tue, 16 Jan 2024 21:48:32 +0000 (13:48 -0800)]
split: do not shrink hold buffer
* src/split.c (line_bytes_split): Do not shrink hold buffer.
If it’s large for this batch it’s likely to be large for the next
batch, and for ‘split’ it’s not worth the complexity/CPU hassle to
shrink it. Do not assume hold_size can be bufsize.
Samuel Tardieu [Fri, 5 Jan 2024 15:51:34 +0000 (16:51 +0100)]
maint: add attributes to two functions without side effects
* src/date.c (res_width): This function computes its result solely
from the value of its parameter and qualifies for the const attribute.
* src/tee.c (get_next_out): This function has no side effect and
qualifies for the pure attribute.
* THANKS.in: Remove duplicate now that author has a commit in the repo.
Those two functions were flagged by GCC 12.3.0,
though not by GCC 13.2.1.
Pádraig Brady [Mon, 1 Jan 2024 13:22:42 +0000 (13:22 +0000)]
maint: update all copyright year number ranges
Update to latest gnulib with new copyright year.
Run "make update-copyright" and then...
* gnulib: Update included in this commit as copyright years
are the only change from the previous gnulib commit.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Manually update copyright year,
until we fully sync with gnulib at a later stage.
* tests/sample-test: Adjust to use the single most recent year.
Paul Eggert [Mon, 1 Jan 2024 03:48:24 +0000 (19:48 -0800)]
maint: pacify recent clang better
* configure.ac: Clang now seems to have -Wformat-extra-args,
-Wimplicit-const-int-float-conversion, and
-Wtautological-constant-out-of-range-compare on by default,
so disable them even if --enable-gcc-warnings is not used.
Rely on Gnulib’s check for clang rather than rolling our own.
Pádraig Brady [Thu, 28 Dec 2023 00:02:42 +0000 (00:02 +0000)]
sort: fix thousands grouping handling on single byte locales
* gl/lib/strnumcmp-in.h (numcompare): After commit v9.0-8-g6cafb122f,
we need to treat characters as signed to avoid invalid comparisons
between negative integers and unsigned characters.
* NEWS: Mention the bug fix.
Pádraig Brady [Wed, 27 Dec 2023 23:37:17 +0000 (23:37 +0000)]
tests: numfmt: fix test related to lower case 'k' SI unit
* tests/misc/numfmt.pl: Following on from v9.4-86-g615167cc4,
adjust this test accordingly. This test was being skipped
on some systems, and so only noticed now.
Reported by Jim Meyering.
Pádraig Brady [Wed, 27 Dec 2023 22:47:48 +0000 (22:47 +0000)]
tests: run locale tests on more systems
* tests/misc/numfmt.pl: Determine the thousands grouping character
in use, rather than skipping locale tests when it's not a space.
For example fr_FR.UTF-8 uses "NARROW NO-BREAK SPACE" as the grouping
char on modern glibc systems at least.
* tests/sort/sort-h-thousands-sep.sh: Likewise.
Do not perform SELinux context translation for operations not involving
user input or output. Context translation converts MCS/MLS labels into
human readable form, which is useful for user facing applications like
ls(1) or the --context=CTX argument of cp(1).
* src/copy.c (set_process_security_ctx): Use raw selinux variants.
* src/install.c (need_copy): Likewise.
(setdefaultfilecon): Likewise.
* src/selinux.c (computecon): Likewise.
(defaultcon): Likewise.
* tests/cp/no-ctx.sh: Add raw variants to preload lib.
* NEWS: Mention the improvement.
Pádraig Brady [Sun, 17 Dec 2023 14:35:36 +0000 (14:35 +0000)]
doc: cp --no-clobber: improve documentation
* doc/coreutils.texi (cp invocation): Reference the related --update
option, like we had already done in mv invocation.
* src/cp.c (usage): State clearly what --no-clobber does,
indicating it's protection focused, rather than being update focused.
* doc/coreutils.texi (chown invocation): Convert --from option
description to a macro and call from ...
(chgrp description): ... here.
* src/chown-core.h (emit_from_option_description): A new function
refactored from ...
* src/chown.c (usage): ... here, and called from ...
* src/chgrp.c (usage): ... here.
(main): Accept the --from option as chown(1) does.
* po/POTFILES.in: Add chown-core.h as now translated.
* tests/chown/basic.sh: Decouple the root user from id 0.
* tests/chgrp/from.sh: A new test largely based on chown/basic.sh.
* tests/local.mk: Reference the new test.
* NEWS: Mention the new feature.
Suggested by Ed Neville.
Pádraig Brady [Mon, 11 Dec 2023 17:03:33 +0000 (17:03 +0000)]
maint: remove obsolete AC_PROG_GCC_TRADITIONAL
* configure.ac: Remove obsolete macro call.
Recent autoconf warns that it is obsolete.
AC_PROG_CPP sets up the -traditional-cpp option if required.
GCC ignores -traditional since commit f458d1d5 (2002).
Fixes https://bugs.gnu.org/67756
Pádraig Brady [Wed, 6 Dec 2023 13:03:48 +0000 (13:03 +0000)]
doc: touch: clarify --time description in man page
* src/touch.c (usage): Reorganise the description to be similar to
the format used for the ls --time description, which formats better
when converted to a man page. Also separate the description
to allow for more granular translations.
Fixes https://bugs.gnu.org/67656
dann frazier [Thu, 30 Nov 2023 01:32:34 +0000 (18:32 -0700)]
tail: fix tailing sysfs files where PAGE_SIZE > BUFSIZ
* src/tail.c (file_lines): Ensure we use a buffer size >= PAGE_SIZE when
searching backwards to avoid seeking within a file,
which on sysfs files is accepted but also returns no data.
* tests/tail/tail-sysfs.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/67490
Pádraig Brady [Sun, 26 Nov 2023 16:41:56 +0000 (16:41 +0000)]
numfmt: support lowercase 'k' for Kilo and Kibi
For consistency with the "SI" standard, and with other coreutils
which output a lowercase 'k' in "SI" mode.
* src/numfmt.c (suffix_power): Treat 'k' like 'K' on input.
(double_to_human): Output lowercase 'k' in SI mode.
(usage): Adjust accordingly.
* doc/coreutils.texi: Mention 'k' accepted, and printed in SI mode.
* tests/misc/numfmt.pl: Adjust accordingly.
* NEWS: Mention the change in behavior.
Fixes https://bugs.gnu.org/47103
Paul Eggert [Thu, 16 Nov 2023 19:34:55 +0000 (11:34 -0800)]
uniq: fix bug with -w in multibyte locales
-w counted bytes not characters, which is wrong in multibyte locales.
This bug exists even in Fedora, which is why the recently-added
test cases from Fedora didn’t catch it.
* src/uniq.c (find_field): New arg PLEN. All callers changed.
Compute length of field correctly in multi-byte locales.
(different): Don’t worry about check_chars; find_field now does that.
* tests/uniq/uniq.pl: Test for this bug.
Paul Eggert [Wed, 15 Nov 2023 23:05:17 +0000 (15:05 -0800)]
uniq: prefer static init
* src/uniq.c (skip_fields, skip_chars, check_chars, count_occurrences)
(output_unique, output_first_repeated, output_later_repeated)
(delimit_groups): Initialize statically, rather than in ‘main’.
This shrinks the executable a bit.
Paul Eggert [Wed, 15 Nov 2023 22:57:17 +0000 (14:57 -0800)]
uniq: simplify and fix unlikely bug by using bool
* src/uniq.c (enum countmode): Remove this type.
(count_occurrences): New static var, replacing the old countmode,
and of type boolean instead of a two-value enum type that was
confusing (and which caused a hard-to-test bug when the count
exceeded INTMAX_MAX - 1). All uses changed.
Paul Eggert [Wed, 15 Nov 2023 07:13:26 +0000 (23:13 -0800)]
uniq: prefer signed integers
* src/uniq.c (skip_fields, skip_chars, check_chars, size_opt)
(find_field, different, writeline, check_file, main):
Prefer signed to unsigned integer types, since this allows
for better runtime checking with -fsanitize=undefined.
Paul Eggert [Wed, 15 Nov 2023 04:35:56 +0000 (20:35 -0800)]
maint: DECIMAL_DIGIT_ACCUMULATE uses stdckdint.h
* src/system.h: Include <stdckdint.h>, since the new
DECIMAL_DIGIT_ACCUMULATE uses it.
Do not include stdckdint.h from files that also include system.h.
(DECIMAL_DIGIT_ACCUMULATE): Omit last arg, which is no longer needed.
Reimplement by using C23-style stdckdint.h’s ckd_mul and ckd_add,
as that’s more standard and is more likely to generate better code.
Paul Eggert [Sat, 11 Nov 2023 08:17:11 +0000 (00:17 -0800)]
pinky: fix string size calculation
* src/pinky.c (count_ampersands): Simplify and return idx_t.
(create_fullname): Compute proper destination string size,
basically, by adding (ulen - 1) * ampersands rather than ulen *
(ampersands - 1). Problem found on CHERI-64.
Pádraig Brady [Fri, 3 Nov 2023 16:22:22 +0000 (16:22 +0000)]
ls: fix recent regression in size alignment
* src/ls.c (print_long_format): Use correct column width,
introduced due to a copy/paste error in commit v9.4-2-gcbb6dfec5
* tests/ls/size-align.sh: Add a test.
* tests/local.mk: Reference the new test.
Fixes https://bugs.gnu.org/66919
Paul Eggert [Mon, 30 Oct 2023 17:47:34 +0000 (10:47 -0700)]
join: fix recently introduced NUL bug
* src/join.c (xfields): Simplify and fix bug with fields
that start with a NUL byte when -t is not used.
* tests/misc/join-utf8.sh: Also test when -t is not used,
and when a field starts with NUL.
Paul Eggert [Mon, 30 Oct 2023 07:32:51 +0000 (00:32 -0700)]
join,uniq: support multi-byte separators
* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module
is now more trouble than it’s worth. All uses removed.
Add skipchars.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
Remove.
* gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars:
* tests/misc/join-utf8.sh:
New files.
* src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h.
(tab): Now mcel_t, not int. All uses changed.
(output_separator, output_seplen): New static vars.
(eq_tab, newline_or_blank, comma_or_blank): New functions.
(xfields, prfields, prjoin, add_field_list, main):
Support multi-byte characters.
* src/numfmt.c: Include ctype.h, skipchars.h.
Do not include cu-ctype.h.
(newline_or_blank): New function.
(next_field): Support multi-byte characters.
* src/sort.c: Include ctype.h instead of cu-ctype.h.
(inittables): Open-code field_sep since it no longer exists.
‘sort’ is not multi-byte safe yet, but when it is this code
will need revamping anyway.
* src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h.
(newline_or_blank): New function.
(find_field): Support multi-byte characters.
* tests/local.mk (all_tests): Add tests/misc/join-utf8.sh
Paul Eggert [Sat, 28 Oct 2023 16:22:09 +0000 (09:22 -0700)]
dircolors: assume C-locale spaces
* src/dircolors.c: Include c-ctype.h, not ctype.h.
(parse_line): Use c_isspace, not isspace, as the .dircolors
file format (which does not seem to be documented!) appears
to be ASCII.
Paul Eggert [Sat, 28 Oct 2023 00:31:49 +0000 (17:31 -0700)]
maint: include ctype.h selectively
Include ctype.h only in files that need it. Many of its uses
are incorrect, as they assume single-byte locales. The idea is
to remove the incorrect uses later, when there is time.
* src/chroot.c, src/csplit.c, src/dd.c, src/digest.c, src/dircolors.c:
* src/expand-common.c, src/expand.c, src/fmt.c, src/fold.c, src/ls.c:
* src/od.c, src/pinky.c, src/pr.c, src/ptx.c, src/seq.c:
* src/set-fields.c, src/split.c, src/stdbuf.c, src/test.c:
* src/tr.c, src/truncate.c, src/unexpand.c, src/wc.c:
Include ctype.h.
* src/system.h: Do not include ctype.h.
Paul Eggert [Sat, 28 Oct 2023 00:15:08 +0000 (17:15 -0700)]
maint: move field_sep into separate module
This is so that we don’t need to have every source file
include ctype.h.
* bootstrap.conf (gnulib_modules): Add cu-ctype.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
New files.
* src/join.c, src/numfmt.c, src/sort.c, src/uniq.c:
Include cu-ctype.h, for field_sep.
* src/system.h (field_sep): Remove; now supplied by cu-ctype.
Paul Eggert [Fri, 27 Oct 2023 15:45:50 +0000 (08:45 -0700)]
maint: prefer c_isxdigit when that is the intent
* src/digest.c (valid_digits, split_3):
* src/echo.c (main):
* src/printf.c (print_esc):
* src/ptx.c (unescape_string):
* src/stat.c (print_it):
When the code is supposed to support only POSIX-locale hex digits,
use c_isxdigit rather than isxdigit. Include c-ctype.h as needed.
This defends against oddball locales where isxdigit != c_isxdigit.
Pádraig Brady [Fri, 27 Oct 2023 12:24:04 +0000 (13:24 +0100)]
base32,base64: disallow non-canonical encodings
This will make decoding more resilient to corruption
whether due to transmission errors or nefarious adjustment.
See https://eprint.iacr.org/2022/361.pdf
* gnulib: Update to commit 3f463202bd enforcing canonical encoding.
* tests/basenc/base64.pl: Add test cases, and adjust existing cases.
* NEWS: Mention the change in behavior.
Paul Eggert [Wed, 25 Oct 2023 22:09:04 +0000 (15:09 -0700)]
basenc: fix unlikely locale issue; tune
This sped up ‘basenc -d --base16’ by 60% on my old platform,
AMD Phenom II X4 910e, Fedora 38.
* src/basenc.c (struct base16_decode_context): Simplify by
omitting have_nibble. ‘nibble’ is now negative if it’s missing.
All uses changed.
(B16): New macro, inspired by ../lib/base64.c.
(base16_to_int): New static var, likewise.
(isubase16): Reimplement using base16_to_int, since isxdigit is
not guaranteed to succeed on the chars we want when the locale is
oddball.
(base16_decode_ctx): Tune by using base16_to_int and by
Paul Eggert [Wed, 25 Oct 2023 21:43:32 +0000 (14:43 -0700)]
basenc: tweak checks to use unsigned char
This tends to generate better code, at least on x86-64,
because callers are just as fast and callees can avoid a conversion.
* src/basenc.c: The following renamings also change the arg type
from char to unsigned char. All uses changed.
(isubase): Rename from isbase.
(isubase64url): Rename from isbase64url.
(isubase32hex): Rename from isbase32hex.
(isubase16): Rename from isbase16.
(isuz85): Rename from isz85.
(isubase2): Rename from isbase2.
2023-10-24 Paul Eggert <eggert@cs.ucla.edu>
* src/basenc.c (struct base16_decode_context):
Simplify by storing -1 for missing nibbles. All uses changed.
Pádraig Brady [Mon, 23 Oct 2023 11:51:19 +0000 (12:51 +0100)]
basenc: --base16: support lower case hex digits
* src/basenc.c (base16_decode_ctx): Convert to uppercase
before converting from hex.
* tests/basenc/basenc.pl: Add a test case.
* NEWS: Mention the change in behavior.
Addresses https://bugs.gnu.org/66698
Pádraig Brady [Thu, 5 Oct 2023 16:00:51 +0000 (17:00 +0100)]
basenc: auto pad base32 and base64 inputs when decoding
Padding of encoded data is useful in cases where
base64 encoded data is concatenated / streamed.
I.e. where there are padding chars _within_ the stream.
In other cases padding is optional and can be inferred.
Note we continue to treat partial padding as invalid,
as that would be indicative of truncation.
* src/basenc.c (do_decode): Auto pad the end of the input.
* NEWS: Mention the change in behavior.
* tests/misc/base64.pl: Adjust to not fail for missing padding.
Addresses https://bugs.gnu.org/66265
Paul Eggert [Sun, 24 Sep 2023 00:07:33 +0000 (17:07 -0700)]
wc: goto considered harmful
* src/wc.c: Do not include assure.h. Replace the only
use of ‘assure’ with ‘unreachable’ which is good enough.
(wc, main): Remove labels and gotos. This doesn’t affect
performance in any way I can measure, and makes the code
a bit easier to follow.
Paul Eggert [Sat, 23 Sep 2023 21:22:16 +0000 (14:22 -0700)]
wc: prefer signed integers
Prefer signed to unsigned integers, to make it easier to catch
integer overflow errors.
* src/wc.c: Do not include safe-read.
(total_lines_overflow, total_words_overflow, total_chars_overflow)
(total_bytes_overflow): Now bool, not uintmax_t. All uses changed.
(max_line_length): Now intmax_t, not uintmax_t. All uses changed.
The total_... vars are still uintmax_t because overflow into them
is checked.
(page_size): Now idx_t, not size_t.
(wc_lines, wc, get_input_fstatus, compute_number_width, main):
Prefer signed to unsigned ints where either should do.
(wc_lines, wc): Use read rather than safe_read, since we don’t
need safe_read’s checks for huge buffers.
(wc): Redo call to mbrtoc32 to lessen the number of comparisons
against its returned value. Do this partly by keeping a pointer
to the end of the buffer rather than a count. Simplify
overflow-checking code.
(compute_number_width): Check for integer overflow.
Don’t assume size_t fits into unsigned long.
* src/wc.h (struct wc_lines): Prefer signed integers.
* src/wc_avx2.c: Do not include safe-read.h.
(wc_lines_avx2): Prefer signed integers. Use read, not safe_read.
Paul Eggert [Sat, 23 Sep 2023 20:38:08 +0000 (13:38 -0700)]
wc: improve avx2 API
* src/wc.c: Use "#include <...>" for files not in the current dir.
Include "wc.h" instead of declaring wc_lines_avx2 by hand.
(wc_lines): New API, with no file name (no longer needed) and
with a return struct rather than arg pointers. All uses changed.
Use avx2_supported directly instead of using a function pointer.
Exploit C99-style declarations after statements.
Multiply by 15 rather than dividing; it’s faster and more accurate
and cannot overflow here.
(wc): Simplify based on wc_lines API change.
* src/wc.h: New file.
* src/wc_avx2.c: Include it, to check API better.
(wc_lines_avx2): Use new API. All uses changed. Exploit C99.
Make locals more local.
Paul Eggert [Sat, 23 Sep 2023 08:15:08 +0000 (01:15 -0700)]
factor,tail: avoid quadratic reallocation
* src/factor.c (struct mp_factors): New member nalloc.
(mp_factor_init): Initialize it.
* src/factor.c (mp_factor_insert):
* src/tail.c (parse_options): Use xpalloc to avoid quadratic
worst-case behavior on reallocation.
* src/tail.c (pids_alloc): New static var.
Paul Eggert [Sat, 23 Sep 2023 07:03:41 +0000 (00:03 -0700)]
wc: simplify by removing SUPPORT_OLD_MBRTOWC
* src/wc.c (SUPPORT_OLD_MBRTOWC): Remove. All uses removed.
(wc): Simplify by assuming C99-or-later behavior for mbrtoc32,
which after all is a C11 API. Fix the !SUPPORT_OLD_MBRTOWC
code, which evidently was never tested seriously.
Paul Eggert [Sat, 23 Sep 2023 05:09:37 +0000 (22:09 -0700)]
wc: 3× speedup in C locale
The 3× speedup was measured by invoking 'wc $(find * -type f)'
on the coreutils sources etc. on an Ubuntu 23.04 x86-64.
These changes also speed up wc 20% in UTF-8 locales.
* src/wc.c (wc_isprint, wc_isspace): New static vars.
(wc): Use them for speed.
(main): Initialize them if needed.
(isnbspace): Remove; no longer used.
Paul Eggert [Fri, 22 Sep 2023 18:13:51 +0000 (11:13 -0700)]
wc: fix word count bug
* bootstrap.conf (gnulib_modules): Remove c32isprint.
* src/wc.c (wc): Consider all non-white-space characters
to be word constituents, even if they are not printable.
POSIX requires this, and it is what BSD does.
Partly do this by simplifying the check for a word,
by counting word starts rather than word ends.
* tests/wc/wc.pl: Test for the bug.
Paul Eggert [Fri, 22 Sep 2023 17:05:58 +0000 (10:05 -0700)]
maint: omit some unused function tests
* m4/jm-macros.m4: Do not check for ftruncate, iswspace,
mkfifo, mbrlen, sysctl. Coreutils no longer uses the
corresponding HAVE_* macros, typically because Gnulib
handles them now.
* src/wc.c (iswspace): Remove; unused.
Paul Eggert [Fri, 22 Sep 2023 16:45:12 +0000 (09:45 -0700)]
maint: prefer char32_t to wchar_t
This should work better on non-glibc platforms that don’t
use Unicode for wchar_t. However, POSIX appears to prohibit
this for printf.c so leave that alone.
* bootstrap.conf (gnulib_modules): Add btoc32, c32iscntrl,
c32isprint, c32isspace, c32width, mbrtoc32. Remove btoc, wcwidth.
* src/df.c, src/ls.c, src/wc.c:
Include uchar.h instead of wchar.h and wctype.h.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/wc.c (isnbspace, wc):
Use char32_t instead of wchar_t.
Paul Eggert [Fri, 22 Sep 2023 15:17:15 +0000 (08:17 -0700)]
wc: simplify #if MB_LEN_MAX
* src/wc.c: Don’t have special #ifs for platforms where
MB_LEN_MAX is 1. On these platforms, MB_CUR_MAX is 1 as well,
so the compiler should optimize away all multi-byte code.