Pádraig Brady [Sun, 29 Aug 2021 19:57:33 +0000 (20:57 +0100)]
sum: always output a file name if one passed
Adjust to output the file name if any name parameter is passed.
This is consistent with sum -s, cksum, and sum implementations
on other platforms. This should not cause significant compat
issues, as multiple fields are already output, and so already
need to be parsed.
* src/sum.c (bsd_sum_file): Output the file name
if any name parameter is passed.
* tests/misc/sum.pl: Adjust accordingly.
* doc/coreutils.texi (sum invocation): Likewise.
* NEWS: Mention the change in behavior.
Paul Eggert [Mon, 30 Aug 2021 23:57:15 +0000 (16:57 -0700)]
tests: port better to NetBSD
* tests/misc/help-version.sh: Test that /dev/full causes
shell printf to fail. This ports better to NetBSD 9.88.46,
where it doesn’t. Problem reported by Nelson H. F. Beebe.
Paul Eggert [Mon, 30 Aug 2021 23:57:15 +0000 (16:57 -0700)]
tests: merge help-version changes back from gzip
* tests/misc/help-version.sh: Merge gzip-related changes
back from gzip/tests/help-version. This fixes problems
when TERM is not 'dumb', and should simplify maintenance.
Assaf Gordon [Mon, 16 Aug 2021 21:03:36 +0000 (15:03 -0600)]
basenc: fix bug49741: using wrong decoding buffer length
Emil Lundberg <lundberg.emil@gmail.com> reports in
https://bugs.gnu.org/49741 about a 'basenc --base64 -d' decoding bug.
The input buffer length was not divisible by 3, resulting in
decoding errors.
* NEWS: Mention fix.
* src/basenc.c (DEC_BLOCKSIZE): Change from 1024*5 to 4200 (35*3*5*8)
which is divisible by 3,4,5,8 - satisfying both base32 and base64;
Use compile-time verify() macro to enforce the above.
* tests/misc/basenc.pl: Add test.
Paul Eggert [Fri, 27 Aug 2021 22:29:46 +0000 (15:29 -0700)]
basenc: prefer signed to unsigned integers
This patch modifies basenc to prefer signed integers to
unsigned, as signed are less error-prone.
This patch also updates Gnulib to to latest, which updates Gnulib’s
base32 and base64 modules to prefer signed to unsigned integers.
* src/basenc.c: Include idx.h.
(struct base2_decode_context): Use unsigned char, not unsigned
for an octet that must fit in an unsigned char.
(base_encode, struct base_decode_context)
(base64_decode_ctx_wrapper, prepare_inbuf, base64url_encode)
(base64url_decode_ctx_wrapper, base32_decode_ctx_wrapper)
(base32hex_encode, base32hex_decode_ctx_wrapper, base16_encode)
(base16_decode_ctx, z85_encode, Z85_HI_CTX_TO_32BIT_VAL)
(z85_decoding, z85_decode_ctx, base2msbf_encode)
(base2lsbf_encode, base2lsbf_decode_ctx, base2msbf_decode_ctx)
(wrap_write, do_encode, do_decode, main):
Prefer signed integers to unsigned.
(main): Treat extremely large wrap columns as if they were
infinite; that’s good enough. Since we’re now using xstrtoimax,
this allows ‘-w -0’ (same as ‘-w 0’).
* tests/misc/base64.pl (gen_tests): -w-0 is no longer an error.
Jim Meyering [Wed, 25 Aug 2021 08:34:27 +0000 (01:34 -0700)]
maint: avoid new syntax-check failure
find-mount-point.h rightly includes <stdlib.h> for its use
of _GL_ATTRIBUTE_DEALLOC_FREE, which uses free, yet that new
inclusion provoked a syntax-check failure. Exempt this header
file as we've done for others.
* cfg.mk (exclude_file_name_regexp--sc_system_h_headers):
Add find-mount-point.h to the regexp.
(sc_system_h_headers): Use grep -E, for a more readable regexp.
Pádraig Brady [Tue, 24 Aug 2021 21:48:46 +0000 (22:48 +0100)]
tests: avoid reflinks when testing SEEK_DATA logic
This better tests the SEEK_HOLE logic which
replaced the original fiemap hole identification logic.
Also it avoids a false failure in sparse-2.sh
on reflink supporting file systems, where we
try to correlate the file sizes produced by cp and dd.
Paul Eggert [Sun, 22 Aug 2021 18:54:44 +0000 (11:54 -0700)]
maint: use clearerr on stdin when appropriate
This is so that commands like ‘fmt - -’ read from stdin
both times, even when it is a tty. Fix some other minor
issues that are related.
* src/blake2/b2sum.c (main):
* src/cksum.c (cksum):
* src/cut.c (cut_file):
* src/expand-common.c (next_file):
* src/fmt.c (fmt):
* src/fold.c (fold_file):
* src/md5sum.c (digest_file, digest_check):
* src/nl.c (nl_file):
* src/od.c (check_and_close):
* src/paste.c (paste_parallel, paste_serial):
* src/pr.c (close_file):
* src/sum.c (bsd_sum_file):
Use clearerr on stdin so that stdin can be read multiple times
even if it is a tty. Do not assume that ferror preserves errno as
POSIX does not guarantee this. Coalesce duplicate diagnostic
calls.
* src/blake2/b2sum.c (main):
* src/fmt.c (main, fmt):
Report read error, even if it's merely fclose failure.
* src/fmt.c: Include die.h.
(fmt): New arg FILE. Close input (reporting error) if not stdin.
All callers changed.
* src/ptx.c (swallow_file_in_memory): Clear stdin's EOF flag.
* src/sort.c (xfclose): Remove unnecessary feof call.
Paul Eggert [Mon, 16 Aug 2021 04:29:38 +0000 (21:29 -0700)]
chmod: fix use of uninitialized var if -v
Problem reported by Michael Debertol (Bug#50070).
* NEWS: Mention the fix.
* src/chmod.c (struct change_status): New struct, replacing the
old enum Change_status. All uses changed.
(describe_change): Distinguish between cases depending on
whether 'stat' or its equivalent succeeded. Report a line
of output even if 'stat' failed, as that matches the documentation.
Rework to avoid casts.
(process_file): Do not output nonsense modes computed from
uninitialized storage, removing a couple of IF_LINTs. Simplify by
defaulting to CH_NO_STAT.
Paul Eggert [Wed, 11 Aug 2021 18:16:05 +0000 (11:16 -0700)]
df: fix bug with automounted
If the command-line argument is automounted, df would use
stat info that became wrong after the following open.
* NEWS: Mention the fix (bug#50012).
* src/df.c (automount_stat_err): New function.
This fixes the hang on fifos in a better way, by using O_NONBLOCK.
(main): Use it.
Pádraig Brady [Sat, 7 Aug 2021 17:47:57 +0000 (18:47 +0100)]
cat: with -E fix handling of \r\n spanning buffers
We must delay handling when \r is the last character
of the buffer being processed, as the next character
may or may not be \n.
* src/cat.c (pending_cr): A new global to record whether
the last character processed (in -E mode) is '\r'.
(cat): Honor pending_cr when processing the start of the buffer.
(main): Honor pending_cr if no more files to process.
* tests/misc/cat-E.sh: Add test cases.
Fixes https://bugs.gnu.org/49925
Paul Eggert [Sat, 31 Jul 2021 18:14:43 +0000 (11:14 -0700)]
uniq: pacify GCC -fanalyzer
Pacify GCC 11.1 -fanalyzer.
* src/uniq.c (check_file): Use simpler test to check whether this
is the first time through the loop. Although the old test was
correct, the new one is easier to understand and perhaps a tiny
bit more efficient.
Paul Eggert [Sat, 31 Jul 2021 18:10:57 +0000 (11:10 -0700)]
numfmt: omit unnecessary pointer test
Caught by GCC 11.1 -fanalyzer.
* src/numfmt.c (simple_strtod_int): Remove unnecessary test of
*endptr vs NULL. Presumably this was a typo and **endptr was
intended instead of *endptr, but an **endptr test is also
unnecessary since c_isdigit (0) returns false.
* doc/coreutils.texi (tr invocation): Provide a summary
list of the available options, which is useful to
provide a quick reminder for those already familiar
with the functionality of tr.
Fixes https://bugs.gnu.org/49764
Paul Eggert [Wed, 28 Jul 2021 19:22:11 +0000 (12:22 -0700)]
doc: modernize usage of “disk” and “core”
In documentation and comments, don’t assume that secondary storage
devices are disk devices. Similarly, don’t assume that main memory
uses magnetic cores, which became obsolete in the 1970s.
* src/du.c (usage):
* src/ls.c (usage):
* src/shred.c (usage): Reword to avoid “disk” in usage messages.
Paul Eggert [Wed, 28 Jul 2021 18:26:48 +0000 (11:26 -0700)]
doc: improve ls documentation
* doc/coreutils.texi (ls invocation): Document implementation more
closely. Be more consistent about style. Omit some needless words.
* src/ls.c (usage): Don’t overdocument -f, as the details were wrong.
Omit -1 advice as it’s a bit obsolete now that we have --zero and
is a bit much for --usage output anyway.
Paul Eggert [Wed, 28 Jul 2021 00:34:43 +0000 (17:34 -0700)]
ls: rename --null to --zero (Bug#49716)
* NEWS, doc/coreutils.texi (General output formatting):
* src/ls.c (usage):
Document this.
* src/ls.c (ZERO_OPTION): Rename from NULL_OPTION.
All uses changed.
(long_options): Rename --null to --zero.
(dired_dump_obstack, main, print_dir): Use '\n' instead of
eolbyte where eolbyte must equal '\n'.
(decode_switches): Decode --zero instead of --null.
--zero also implies -1, -N, --color=none, --show-control-chars.
Use easier-to-decipher code to set ‘format’ and ‘dired’.
Reject attempts to combine --dired and --zero.
* tests/local.mk: Adjust to test script renaming.
* tests/ls/zero-option.sh: Rename from tests/ls/null-option.sh,
and test --zero instead of --null.
Paul Eggert [Tue, 27 Jul 2021 21:27:00 +0000 (14:27 -0700)]
ls: compute defaults more lazily
* src/ls.c (enum time_type, enum sort_type, enum indicator_style)
(enum Dereference_symlink, ignore_mode):
Put ‘= 0’ after default values, since the code relies
on static storage defaulting to zero.
(enum sort_type): Reorder so that -1 can be used to represent unset.
(main): Test print_with_color after parse_ls_color may have reset it.
(decode_line_length): Return the line length instead of setting
static storage. All uses changed. Treat line lengths exceeding
PTRDIFF_MAX as infinite, to avoid pointer-subtraction glitches.
(stdout_isatty): New function, to avoid calling isatty twice.
(decode_switches): Calculate defaults more lazily, to avoid using
syscalls or getenv during startup unless the results are more
likely to be needed. Use -1 to indicate options that haven’t been
set on the command line yet. Move print_with_color test from
here to ‘main’. Suppress bogus GCC warning.
(getenv_quoting_style): Return the quoting style instead of
setting static storage.
(init_column_info): New arg MAX_COLS, to avoid recalculating it.
Caller changed.
Paul Eggert [Mon, 26 Jul 2021 07:26:32 +0000 (00:26 -0700)]
ls: port to wider off_t, uid_t, gid_t
* src/ls.c (dired_pos): Now off_t, not size_t, since it counts
output file offsets.
(dired_dump_obstack): This obstack's file offsets are now
off_t, not size_t.
(format_user_or_group, format_user_or_group_width):
ID arg is now uintmax_t, not unsigned long, since uid_t and
gid_t values might exceed ULONG_MAX.
(format_user_or_group_width): Use snprintf with NULL instead of
sprintf with a discarded buffer. This avoids a stack buffer,
and so should be safer.
Paul Eggert [Mon, 26 Jul 2021 04:01:31 +0000 (21:01 -0700)]
ls: demacroize
Prefer functions or constants to macros where either will do.
That’s cleaner, and nowadays there’s no performance reason to
prefer macros. All uses changed.
* src/ls.c (INITIAL_TABLE_SIZE, MIN_COLUMN_WIDTH):
Now constants instead of macros.
(file_or_link_mode): New function, replacing the old macro
FILE_OR_LINK_MODE.
(dired_outbyte): New function, replacing the old macro DIRED_PUTCHAR.
(dired_outbuf): New function, replacing the old macro DIRED_FPUTS.
(dired_outstring): New function, replacing the old macro
DIRED_FPUTS_LITERAL.
(dired_indent): New function, replacing the old macro DIRED_INDENT.
(push_current_dired_pos): New function, replacing the old macro
PUSH_CURRENT_DIRED_POS.
(assert_matching_dev_ino): New function, replacing the old macro
ASSERT_MATCHING_DEV_INO.
(do_stat, do_lstat, stat_for_mode, stat_for_ino, fstat_for_ino)
(signal_init, signal_restore, cmp_ctime, cmp_mtime, cmp_atime)
(cmp_btime, cmp_size, cmp_name, cmp_extension)
(fileinfo_name_width, cmp_width, cmp_version):
No longer inline; compilers can deduce this well enough nowadays.
(main): Protect unused assert with ‘if (false)’ rather than
commenting it out, so that the compiler checks the code.
(print_dir): Output the space and newline in the same buffer
as the human-readable number they surround.
(dirfirst_check): New function, replacing the old macro
DIRFIRST_CHECK. Simplify by using subtraction.
(off_cmp): New function, replacing the old macro longdiff.
(print_long_format): No need to null-terminate the string now.
(format_user_or_group): Let printf count the bytes.
Paul Eggert [Mon, 26 Jul 2021 01:54:10 +0000 (18:54 -0700)]
ls: simplify sprintf usage
* src/ls.c (format_user_or_group_width, print_long_format):
Use return value from sprintf instead of calling strlen on
the resulting buffer, or inferring the length some other way.
Kamil Dudka [Wed, 30 Jun 2021 15:53:22 +0000 (17:53 +0200)]
df: fix duplicated remote entries due to bind mounts
As originally reported in <https://bugzilla.redhat.com/1962515>,
df invoked without -a printed duplicated entries for NFS mounts
of bind mounts. This is a regression from commit v8.25-54-g1c17f61ef99,
which introduced the use of a hash table.
The proposed patch makes sure that the devlist entry seen the last time
is used for comparison when eliminating duplicated mount entries. This
way it worked before introducing the hash table.
Patch co-authored by Roberto Bergantinos.
* src/ls.c (struct devlist): Introduce the seen_last pointer.
(devlist_for_dev): Return the devlist entry seen the last time if found.
(filter_mount_list): Remember the devlist entry seen the last time for
each hashed item.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/49298
Paul Eggert [Sun, 27 Jun 2021 01:23:52 +0000 (18:23 -0700)]
tail: use poll, not select
This fixes an unlikely stack out-of-bounds write reported by
Stepan Broz via Kamil Dudka (Bug#49209).
* bootstrap.conf (gnulib_modules): Replace select with poll.
* src/tail.c: Do not include <sys/select.h>.
[!_AIX]: Include poll.h.
(check_output_alive) [!_AIX]: Use poll instead of select.
(tail_forever_inotify): Likewise. Simplify logic, as there is no
need for a ‘while (len <= evbuf_off)’ loop.
Pádraig Brady [Sun, 20 Jun 2021 20:26:21 +0000 (21:26 +0100)]
stat: use decomposed decimal device numbers by default
* src/stat.c (default_format): Use decomposed decimal
representation (major,minor) in the default format.
This is least ambiguous for human interpretation,
and more consistent with ls for example.
Fixes https://bugs.gnu.org/48960
Pádraig Brady [Sun, 20 Jun 2021 14:16:49 +0000 (15:16 +0100)]
stat: support more device number representations
In preparation for changing the default device number
representation (to decomposed decimal), provide more
formatting options for device numbers.
These new (FreeBSD compat) formatting options are added:
%Hd major device number in decimal (st_dev)
%Ld minor device number in decimal (st_dev)
%Hr major device type in decimal (st_rdev)
%Lr minor device type in decimal (st_rdev)
%r (composed) device type in decimal (st_rdev)
%R (composed) device type in hex (st_rdev)
* doc/coreutils.texi (stat invocation): Document new formats.
* src/stat.c (print_it): Handle the new %H and %L modifiers.
(print_statfs): Adjust to passing the format as two chars
rather than an int. Using an int was introduced in commit db42ae78,
but using separate chars is cleaner and more extensible.
(print_stat): Likewise. Handle any modifiers and the new 'r' format.
(usage): Document the new formats.
* tests/misc/stat-fmt.sh: Add a test case for new modifiers.
Addresses https://bugs.gnu.org/48960
Paul Eggert [Sat, 12 Jun 2021 00:41:37 +0000 (17:41 -0700)]
build: update gnulib submodule to latest
Coreutils mistakenly did not list xstrndup as a module
that it depends on directly. When the latest Gnulib removed
the dirname module's dependency on xstrndup, this mistake
caused coreutils to not build. Since all of Coreutils's
uses of xstrndup know the string length, xmemdup0 is a better
match for what's needed. Since the size args are typically
signed or derived from subtracting pointers, the new Gnulib
ximemdup0 function is a better match yet.
So, use ximemdup0 instead of xstrndup.
* src/cut.c, src/dircolors.c, src/expand-common.c, src/expand.c:
* src/numfmt.c, src/set-fields.c, src/unexpand.c:
Do not include xstrndup.h; no longer needed.
* src/dircolors.c (parse_line):
* src/expand-common.c (parse_tab_stops):
* src/numfmt.c (parse_format_string):
* src/set-fields.c (set_fields):
Use ximemdup0 instead of xstrndup.
Pádraig Brady [Sat, 15 May 2021 11:40:45 +0000 (12:40 +0100)]
copy: remove fiemap logic
This is now only used on 10 year old linux kernels,
and performs a sync before each copy.
* src/copy.c (extent_copy): Remove function and all callers.
* src/extent-scan.c: Remove.
* src/extent-scan.h: Remove.
* src/fiemap.h: Remove.
* src/local.mk: Adjust for removed files.
* NEWS: Adjust to say fiemap is removed.
Pádraig Brady [Wed, 12 May 2021 22:47:38 +0000 (23:47 +0100)]
copy: disallow copy_file_range() on Linux kernels before 5.3
copy_file_range() before Linux kernel release 5.3 had many issues,
as described at https://lwn.net/Articles/789527/, which was
referenced from https://lwn.net/Articles/846403/; a more general
article discussing the generality of copy_file_range().
Linux kernel 5.3 was released in September 2019, which is new enough
that we need to actively avoid older kernels.
* src/copy.c (functional_copy_file_range): A new function
that returns false for Linux kernels before version 5.3.
(sparse_copy): Call this new function to gate use of
copy_file_range().
Pádraig Brady [Sun, 9 May 2021 22:41:00 +0000 (23:41 +0100)]
tests: fix tests/cp/sparse-2.sh false failure on some systems
* tests/cp/sparse-2.sh: Double check cp --sparse=always,
with dd conv=sparse, in the case where the former didn't
create a sparse file. Now that this test is being newly run
on macos, we're seeing a failure due to seek() not creating
holes on apfs unless the size is >= 16MiB.
Pádraig Brady [Sun, 9 May 2021 13:29:01 +0000 (14:29 +0100)]
tests: ensure we test SEEK_DATA where used
fiemap is no longer the default copy implementation,
so check for SEEK_DATA support instead as that's preferred.
This will ensure better test coverage on systems without fiemap.
* init.cfg: Replace fiemap_capable_ with seek_data_capable_.
This is best supported with python 3 so prefer that.
* tests/seek-data-capable: A new test script checking for
SEEK_DATA support on the passed file name,
called from seek_data_capable_.
* tests/fiemap-capable: Remove no longer used probing script.
* tests/cp/fiemap-perf.sh: Renamed to tests/cp/sparse-perf.sh
* tests/cp/fiemap-2.sh: Renamed to tests/cp/sparse-2.sh
* tests/cp/fiemap-extents.sh: Renamed to tests/cp/sparse-extents.sh
* tests/cp/sparse-fiemap.sh: Renamed to tests/cp/sparse-extents-2.sh
* tests/cp/fiemap-FMR.sh: Renamed to tests/cp/copy-FMR.sh
* tests/local.mk: Reference the renamed tests.
Pádraig Brady [Sat, 8 May 2021 18:23:20 +0000 (19:23 +0100)]
copy: handle system security config issues with copy_file_range()
* src/copy.c (sparse_copy): Upon EPERM from copy_file_range(),
fall back to a standard copy, which will give a more accurate
error as to whether the issue is with the source or destination.
Also this will avoid the issue where seccomp or apparmor are
not configured to handle copy_file_range(), in which case
the fall back standard copy would succeed without issue.
This specific issue with seccomp was noticed for example in:
https://github.com/golang/go/issues/40900
Pádraig Brady [Sun, 9 May 2021 20:55:22 +0000 (21:55 +0100)]
copy: handle EOPNOTSUPP from SEEK_DATA
* src/copy.c (infer_scantype): Ensure we don't error out
if SEEK_DATA returns EOPNOTSUPP, on systems where this value
is distinct from ENOTSUP. Generally both of these should be checked.
Pádraig Brady [Sat, 8 May 2021 16:18:54 +0000 (17:18 +0100)]
copy: handle ENOTSUP from copy_file_range()
* src/copy.c (sparse_copy): Ensure we fall back to
a standard copy if copy_file_range() returns ENOTSUP.
This generally is best checked when checking ENOSYS,
but it also seems to be a practical concern on Centos 7,
as a quick search gave https://bugzilla.redhat.com/1840284
Carl Edquist [Mon, 10 May 2021 10:22:11 +0000 (05:22 -0500)]
build: fix __get_cpuid_count check to catch link failure
The test program will compile successfully even if __get_cpuid_count
is not declared. The error for the missing symbol will only show up
at link time. Thus, use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE.
* configure.ac (__get_cpuid_count check): Use C_LINK_IFELSE instead
of AC_COMPILE_IFELSE.
(__get_cpuid check): Likewise.
Pádraig Brady [Mon, 3 May 2021 17:11:04 +0000 (18:11 +0100)]
maint: consistently free hash structures in dev mode
Ensure we call hash_free() to avoid valgrind and leak_sanitizer
"definitely lost" warnings. These were not real leaks as
we terminate immediately after, but we should avoid these
"definitely lost" warnings where possible.
* src/copy.c: Add dest_info_free() and src_info_free().
* src/copy.h: Declare the above.
* src/cp-hash.c: Don't define unless "lint" is defined.
* src/install.c: Call dest_info_free() in dev mode.
* src/mv.c: Likewise.
* src/cp.c: Likewise. Also call src_info_free().
* src/ln.c: Call hash_free() in dev mode.
* src/tail.c: Call hash_free() even if about to exit, in dev mode.
Pádraig Brady [Sun, 2 May 2021 20:27:17 +0000 (21:27 +0100)]
copy: ensure we enforce --reflink=never
* src/copy.c (sparse_copy): Don't use copy_file_range()
with --reflink=never as copy_file_range() may implicitly
use acceleration techniques like reflinking.
(extent_copy): Pass through whether we allow reflinking.
(lseek_copy): Likewise.
Fixes https://bugs.gnu.org/48164
Pádraig Brady [Sat, 1 May 2021 19:02:02 +0000 (20:02 +0100)]
wc: add --debug to diagnose which implementation used
* src/wc.c: (main): Handle the new --debug option.
Only call avx2_supported if needed.
(avx2_supported): Diagnose various failures and attempts.
* NEWS: Mention the new wc improvement and --debug option.
wc: use avx2 optimization when counting only lines
Use cpuid to detect CPU support for avx2 instructions.
Performance was seen to improve by 5x for a file with only newlines,
while the performance for a file with no such characters is unchanged.
* configure.ac [USE_AVX2_WC_LINECOUNT]: A new conditional,
set when __get_cpuid_count() and avx2 compiler intrinsics are supported.
* src/wc.c (avx2_supported): A new function using __get_cpuid_count()
to determine if avx2 instructions are supported.
(wc_lines): A new function refactored from wc(),
which implements the standard line counting logic,
and provides the fallback implementation for when avx2 is not supported.
* src/wc_avx2.c: A new module to implement using avx2 intrinsics.
* src/local.mk: Reference the new module. Note we build as a separate
lib so that it can be portably built with separate -mavx2 etc. flags.
Paul Eggert [Sat, 1 May 2021 22:19:16 +0000 (15:19 -0700)]
touch: fix wrong diagnostic (Bug#48106)
Problem reported by Roland (Bug#48106).
* src/touch.c (touch): Take more care when deciding whether
to use open_errno or utime_errno in the diagnostic.
Stop worrying about SunOS 4 (which as part of the problem),
as it’s long obsolete. For Solaris 10, verify that EINVAL
really means the file was a directory.
Paul Eggert [Tue, 27 Apr 2021 06:27:59 +0000 (23:27 -0700)]
maint: port to Autoconf 2.71
* configure.ac: Use AC_PROG_CC, not AC_PROG_CC_STDC.
* gl/modules/smack (configure.ac):
* m4/jm-macros.m4 (coreutils_MACROS):
* m4/xattr.m4 (gl_FUNC_XATTR):
Use AS_HELP_STRING, not AC_HELP_STRING.
* m4/check-decl.m4 (gl_CHECK_DECLS):
Do not require AC_HEADER_TIME; we no longer care about it directly.
* m4/jm-macros.m4 (coreutils_MACROS):
Do not require AC_ISC_POSIX, which became obsolete in 2006.
Use AC_LINK_IFELSE instead of AC_TRY_LINK.
Paul Eggert [Tue, 27 Apr 2021 01:02:16 +0000 (18:02 -0700)]
build: update gnulib submodule to latest
* src/csplit.c (load_buffer):
* src/pinky.c (create_fullname):
Use intprops-based checks rather than xalloc_oversized,
since Gnulib xalloc.h no longer includes xalloc-oversized.h.
Zorro Lang [Mon, 26 Apr 2021 15:25:18 +0000 (17:25 +0200)]
copy: do not refuse to copy a swap file
* src/copy.c (sparse_copy): Fallback to read() if copy_file_range()
fails with ETXTBSY. Otherwise it would be impossible to copy files
that are being used as swap. This used to work before introducing
the support for copy_file_range() in coreutils. (Bug#48036)
doc: clarify that ln --relative requires --symbolic to be specified
* doc/coreutils.texi (ln invocation): State --symbolic is required.
* src/ln.c (usage): Explicitly state -s is not implied.
Fixes https://bugs.gnu.org/47703
* src/wc.c (usage): State that only printable characters are considered
when counting words. This also disambiguates wether we're talking
about bytes or characters in this context.
* doc/coreutils.texi (wc invocation): Likewise. Also clarify
that --characters counts valid locale aware characters,
and that --lines does not count a trailing "line" unless
it ends with a newline character.
Fixes https://bugs.gnu.org/47702
This is especially important now for --sort=width,
as that can greatly increase how often this
expensive quote_name_width() function is called per file.
This also helps the default invocation of ls,
or specifically the --format={across,vertical} cases
(when --width is not set to 0),
to avoid two calls to this function per file.
Note the only case where we later compute the width,
is for --format=commas. That's only done once though,
so we leave the computation close to use to
maximize hardware caching.
* src/ls.c (struct fileinfo): Add a WIDTH member to cache
the screen width of the file name.
(update_current_files_info): Set the WIDTH members for cases
they're needed multiple times. Note we do this explicitly here,
rather than caching at use, so that the fileinfo
structures can remain const in the sorting and presentation functions.
(sort_files): Call the new update_current_files_info() in this
initialization function.
(fileinfo_name_width): Renamed from fileinfo_width,
and adjusted to return the cached value if available.
Carl Edquist [Fri, 26 Mar 2021 09:27:54 +0000 (04:27 -0500)]
ls: add --sort=width option to sort by file name width
This helps identify the outliers for long filenames, and also produces
a more compact display of columns when listing a directory with many
entries of various widths.
* src/ls.c (sort_type, sort_types, sort_width): New sort_width sort
type.
(sort_args): Add "width" sort arg.
(cmp_width, fileinfo_width): New sort function and helper for file name
width.
(quote_name_width): Add function prototype declaration.
(usage): Document --sort=width option.
* doc/coreutils.texi: Document --sort=width option.
* tests/ls/sort-width-option.sh: New test for --sort=width option.
* tests/local.mk: Reference new test.
* NEWS: Mention the new feature.
Paul Eggert [Tue, 30 Mar 2021 04:42:44 +0000 (21:42 -0700)]
env: simplify --split-string memory management
* bootstrap.conf (gnulib_modules): Add idx.
* src/env.c: Include idx.h, minmax.h.
Prefer idx_t to ptrdiff_t when values are nonnegative.
(valid_escape_sequence, escape_char, validate_split_str)
(CHECK_START_NEW_ARG):
Remove; no longer needed now that we validate as we go.
(struct splitbuf): New type.
(splitbuf_grow, splitbuf_append_byte, check_start_new_arg)
(splitbuf_finishup): New functions.
(build_argv): New arg ARGC. Validate and process in one go, using
the new functions; this is simpler and more reliable than the old
approach (as witness the recent bug). Avoid integer overflow in
the unlikely case where the string contains more than INT_MAX
arguments.
(parse_split_string): Simplify by exploiting the new build_argv.
Paul Eggert [Fri, 26 Mar 2021 21:06:26 +0000 (14:06 -0700)]
env: remove asserts
The assertions didn’t help catch the most recent bug which
was in their area, and kind of get in the way.
* src/env.c: Do not include <assert.h>, and remove all assertions.
These seem to have been put in to pacify gcov, but surely there’s
a better way.
(escape_char): Pacify GCC with 'assume' instead.
Paul Eggert [Fri, 26 Mar 2021 20:49:49 +0000 (13:49 -0700)]
env: fix address violation with '\v' in -S
Problem reported by Frank Busse (Bug#47412).
* src/env.c (C_ISSPACE_CHARS): New macro.
(shortopts, build_argv, main): Treate all C-locale space
characters like space and tab, for compatibility with FreeBSD.
(validate_split_str, build_argv, parse_split_string):
Use the C locale, not the current locale, to determine whether a
byte is a space character.
Paul Eggert [Wed, 24 Mar 2021 23:43:05 +0000 (16:43 -0700)]
cksum: port recent changes to macOS
* src/cksum.c (cksum_slice8): Fix bug on little-endian
platforms lacking __bswap_32: the SWAP macro evaluates
its argument multiple times, but the macro has a side effect.
Paul Eggert [Sun, 21 Mar 2021 21:00:26 +0000 (14:00 -0700)]
ptx: remove use of diacrit module
The diacrit module is obsolete, and ptx’s use of it is obsolete
too; it assumes an 8-bit locale (not that common these days) and
that TeX cannot process the 8-bit characters (nowadays, it can).
* NEWS, doc/coreutils.texi (Charset selection in ptx): Document this.
* bootstrap.conf (gnulib_modules): Remove diacrit.
* src/ptx.c: Do not include diacrit.h.
(print_field, fix_output_parameters): Remove obsolete support
for 8-bit diacritics.
Pádraig Brady [Sat, 13 Mar 2021 18:10:12 +0000 (18:10 +0000)]
cksum: add --debug to diagnose which implementation used
* src/cksum.c: (main): Use getopt_long to parse options,
and handle the new --debug option.
(pclmul_supported): Diagnose various failures and attempts.
* NEWS: Mention the new option.
cksum: use pclmul hardware instruction for CRC32 calculation
Use cpuid to detect CPU support for hardware instruction.
Fall back to slice by 8 algorithm if not supported.
A 500MiB file improves from 1.40s to 0.67s on an i3-2310M
* configure.ac [USE_PCLMUL_CRC32]: A new conditional,
set when __get_cpuid() and clmul compiler intrinsics are supported.
* src/cksum.c (pclmul_supported): A new function using __get_cpuid()
to determine if pclmul instructions are supported.
(cksum): A new function refactored from cksum_slice8(),
which calls pclmul_supported() and then cksum_slice8()
or cksum_pclmul() as appropriate.
* src/cksum.h: Export the crctab array for use in the new module.
* src/cksum_pclmul.c: A new module to implement using pclmul intrinsics.
* po/POTFILES.in: Reference the new cksum_pclmul module.
* src/local.mk: Likewise. Note we build it as a separate library
so that it can be portably built with separate -mavx etc. flags.
* tests/misc/cksum.sh: Add new test modes for pertinent buffer sizes.
Pádraig Brady [Sun, 14 Mar 2021 22:43:42 +0000 (22:43 +0000)]
maint: propagate DEPENDENCIES to libs in single binary mode
build-aux/gen-single-binary.sh (override_single): A new function
to refactor the existing mappings for dir, vdir, and arch.
This function now also sets the DEPENDENCIES variable so that these
dependencies can be maintained later in the script, where
we now propagate the automake generated $(src_$cmd_DEPENDENCIES)
to our equivalent src_libsinglebin_$cmd_a_DEPENDENCIES.
This will ensure that any required libs are built,
which we require in a following change to cksum that
builds part of it as a separate library.
Pádraig Brady [Tue, 16 Feb 2021 05:07:40 +0000 (05:07 +0000)]
rmdir: diagnose non following of symlinks with trailing slash
GNU/Linux is unusual here in that rmdir("symlink/") returns ENOTDIR,
whereas Solaris and FreeBSD at least, will follow the symlink
and remove the target directory. We don't make the behavior
on Linux kernels consistent, but at least clarify
the confusing error message.
* src/rmdir (main): Output a specific error message for the above case.
(remove_parents): In the error message, don't assume intermediate paths
are directories, as they could be symlinks.
* tests/rmdir/symlink-errors.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the improvement.