Paul Eggert [Sun, 3 Mar 2024 21:27:32 +0000 (13:27 -0800)]
tar: fix current_block confusion
Problem reported by Robert Morris in:
https://lists.gnu.org/r/bug-tar/2024-03/msg00001.html
* src/delete.c (flush_file): Simply return at EOF,
so that current_block continues to point to end of input.
Paul Eggert [Sun, 3 Mar 2024 21:17:32 +0000 (13:17 -0800)]
tar: improve diagnostic for truncated archive
* src/buffer.c (seek_archive): If EOF has been read, don’t attempt
to seek past it. This replaces a bogus "rmtlseek not stopped at a
record boundary" message with a better "Unexpected EOF in archive"
when I run ‘tar tvf gtar13c.tar’ using the gtar13.tar file here:
https://lists.gnu.org/r/bug-tar/2024-03/msg00001.html
When given -c -a, issue a warning if no compressor is associated with the suffix.
* src/suffix.c (find_compression_suffix): Always return stripped
archive name length in the last argument. Return 0 if there is no
suffix.
(find_compression_program): Remove.
(set_compression_program_by_suffix): Take third argument, controlling
whether to issue a warning if no suitable compression program is found
for the suffix.
* src/common.h (set_compression_program_by_suffix): Change prototype.
* src/buffer.c, src/tar.c: All uses of set_compression_program_by_suffix
changed.
Paul Eggert [Tue, 2 Jan 2024 03:09:59 +0000 (19:09 -0800)]
Skip test on macOS 12.6
* tests/xform04.at: Skip test on macOS 12.6, which is behind the times
and doesn’t think that ⱥ (U+2C65 LATIN SMALL LETTER A WITH STROKE) is
printable.
Paul Eggert [Wed, 13 Sep 2023 04:21:18 +0000 (23:21 -0500)]
Support multi-byte --transform='...\L...' etc
Support upcasing and downcasing in multi-byte locales.
* gnulib.modules: Add c32rtomb, c32tolower, c32toupper,
mbrtoc32-regular.
* src/transform.c: Do not include ctype.h. Include mcel.h.
(stk, stk_init): Move up.
(run_case_conv): Return void, not char *. Append result to
stk directly; this avoids the need for a separate allocation.
All callers changed. Do not assume a single-byte locale.
* tests/xform04.at: New test.
* tests/Makefile.am (TESTSUITE_AT):
* tests/testsuite.at: Add it.
Paul Eggert [Tue, 12 Sep 2023 05:15:52 +0000 (00:15 -0500)]
Parse in a more locale-independent way
update submodules to latest
* gnulib.modules: Add c-ctype.
* lib/wordsplit.c, src/buffer.c, src/exclist.c, src/incremen.c:
* src/list.c, src/misc.c, src/names.c, src/sparse.c, src/tar.c:
* src/xheader.c:
Include c-ctype.h, and use its API rather than ctype.h’s.
This is more likely to work when oddball locales are used.
* src/transform.c: Include ctype.h, since this module still uses
tolower and toupper (this is probably wrong - should be multi-byte).
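A minimal sketch of why locale-independent classification helps when
parsing archive data. The helper names ascii_isdigit and parse_decimal
are hypothetical and are not Gnulib's implementation; they only
illustrate a test whose result depends on the byte value alone, not on
the locale selected with setlocale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a c-ctype style test: the result depends
   only on the byte value, never on the current locale.  */
static bool
ascii_isdigit (char c)
{
  return '0' <= c && c <= '9';
}

/* Parse a run of ASCII decimal digits at the start of S.
   Store the value in *VALUE and return the number of bytes consumed.
   Overflow is not handled; this is a sketch only.  */
static size_t
parse_decimal (char const *s, unsigned long *value)
{
  assert (s && value);
  size_t i = 0;
  unsigned long v = 0;
  for (; ascii_isdigit (s[i]); i++)
    v = v * 10 + (unsigned long) (s[i] - '0');
  *value = v;
  return i;
}
```

A header field parsed this way gives the same result whatever locale
the user has selected.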
Paul Eggert [Mon, 11 Sep 2023 06:17:02 +0000 (01:17 -0500)]
Fix pointer bug in drop_volume_label_suffix
Problem reported by Marc Espie in:
https://lists.gnu.org/r/bug-tar/2023-09/msg00003.html
* src/buffer.c (drop_volume_label_suffix):
Redo to not compute a pointer before the start of a buffer,
as this is not portable.
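The portability issue can be sketched as follows. This is not the code
in drop_volume_label_suffix; the function name
length_without_trailing_digits is hypothetical. It scans backward with
an index that stops at 0, so no pointer before the start of the buffer
is ever formed.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of S (LEN bytes) with any trailing ASCII digits
   removed.  The scan uses an index that stops at 0.  A non-portable
   alternative would be a loop such as
   "for (p = s + len - 1; p >= s; p--)", which computes s - 1 when the
   loop runs off the front of the buffer.  */
static size_t
length_without_trailing_digits (char const *s, size_t len)
{
  assert (s);
  while (0 < len && '0' <= s[len - 1] && s[len - 1] <= '9')
    len--;
  return len;
}
```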
Paul Eggert [Sun, 10 Sep 2023 17:10:52 +0000 (10:10 -0700)]
Prefer mcel to mbuiter
Prefer the lighter-weight mcel implementation to the heavier-weight
mbuiter that GNU tar does not need.
* bootstrap.conf (avoided_gnulib_modules): Avoid mbuiter, mbuiterf.
* gnulib.modules: Add mcel-prefer.
Paul Eggert [Mon, 21 Aug 2023 20:40:37 +0000 (13:40 -0700)]
Simplify recently-added hash code
* src/extract.c (delay_set_stat): Simplify hash lookup;
no need to initialize members other than file_name.
Avoid assignment in ‘if’ when it’s easy.
(extract_finish): Do not bother to free when we are about to exit.
delay_set_stat avoids inserting duplicate entries into
delayed_set_stat_head. It was doing this by scanning the entire
list.
Normally this list is small, but if --delay-directory-restore is
used (including automatically for incremental archives), this list
grows with the total number of directories in the archive.
Each scan takes O(n) time, so extracting an archive with n
directories could take O(n^2) time.
The included test uses AT_SKIP_LARGE_FILES, allowing it to optionally be
skipped. It may execute slowly on certain filesystems or disks, as it
creates thousands of directories.
There are still potentially problematic O(n) scans in
find_direct_ancestor and remove_delayed_set_stat, which this patch does
not attempt to fix.
* NEWS: Update.
* src/extract.c (delayed_set_stat_table): Create a table for O(1)
lookups of entries in the delayed_set_stat_head list. The list
remains, as tracking insertion order is important.
(dl_hash, dl_compare): New hash table helper functions.
(delay_set_stat): Create the hash table, replace the O(n) list scan
with a hash_lookup, insert new entries into the hash table.
(remove_delayed_set_stat): Also remove entry from hash table.
(apply_nonancestor_delayed_set_stat): Also remove entry from hash
table.
(extract_finish): Free the (empty) hash table.
* tests/extrac26.at: New file.
* tests/Makefile.am (TESTSUITE_AT): Include extrac26.at.
* tests/testsuite.at: Include extrac26.at.
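The idea of pairing a list with a hash table can be sketched as below.
This is an illustration under stated assumptions, not the code in
src/extract.c: tar uses the Gnulib hash module, whereas this sketch
uses a hand-rolled chained table of fixed size, and the names entry,
entry_set, hash_name and entry_set_add are hypothetical. The list
keeps insertion order; the table answers "is this name already
present?" without walking the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { NBUCKETS = 1024 };   /* fixed size keeps the sketch short */

struct entry
{
  struct entry *next;        /* insertion-order list */
  struct entry *bucket_next; /* hash bucket chain */
  char *file_name;
};

struct entry_set
{
  struct entry *head, *tail;
  struct entry *bucket[NBUCKETS];
};

static size_t
hash_name (char const *s)
{
  size_t h = 5381;
  for (; *s; s++)
    h = h * 33 + (unsigned char) *s;
  return h % NBUCKETS;
}

/* Add NAME unless it is already present.  Return true if added.
   The lookup cost is the bucket chain length, not the list length.  */
static bool
entry_set_add (struct entry_set *set, char const *name)
{
  assert (set && name);
  size_t h = hash_name (name);
  for (struct entry *e = set->bucket[h]; e; e = e->bucket_next)
    if (strcmp (e->file_name, name) == 0)
      return false;

  struct entry *e = malloc (sizeof *e);
  char *copy = malloc (strlen (name) + 1);
  if (!e || !copy)
    abort ();
  strcpy (copy, name);
  e->file_name = copy;
  e->next = NULL;
  e->bucket_next = set->bucket[h];
  set->bucket[h] = e;
  if (set->tail)
    set->tail->next = e;
  else
    set->head = e;
  set->tail = e;
  return true;
}
```

A zero-initialized struct entry_set is an empty set. Removing an entry
would have to unlink it from both the list and its bucket chain, which
matches the commit's note that removal paths also update the table.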
Commit e89c7a45eb broke deletion from archives. The reported number
of bytes read is rounded to the nearest record anyway, so revert the
commit and document the fact.
Reported by Ed Santiago. See
https://bugzilla.redhat.com/show_bug.cgi?id=2230127
* doc/tar.texi: Document the fact that --totals rounds up the
number of bytes read to the nearest record.
* src/buffer.c: Revert changes.
* tests/delete06.at: Fix expected status code and stderr.
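The rounding described above amounts to the following arithmetic. The
function name round_up_to_record is hypothetical and the code is a
sketch, not taken from src/buffer.c. With the default record size of
10240 bytes (blocking factor 20, 512-byte blocks), 1536 bytes of
archive data would be reported as 10240.

```c
#include <assert.h>
#include <stdint.h>

/* Round BYTES up to a multiple of RECORD_SIZE, which must be positive.
   Overflow is not handled in this sketch.  */
static uintmax_t
round_up_to_record (uintmax_t bytes, uintmax_t record_size)
{
  assert (0 < record_size);
  uintmax_t remainder = bytes % record_size;
  return remainder ? bytes + (record_size - remainder) : bytes;
}
```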
Paul Eggert [Wed, 2 Aug 2023 15:41:12 +0000 (08:41 -0700)]
Stop using alloca
* gnulib.modules: Remove alloca.
* src/create.c (dump_file0): Return address of any allocated
storage. Caller changed to free it. Use xmalloc instead
of alloca, to obtain this storage.
* src/list.c (from_header): Use quote_mem instead of quote,
removing the need to use alloca.
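The general pattern of replacing alloca with heap storage that the
caller frees might look like this. It is a sketch only: checked_malloc
stands in for Gnulib's xmalloc, and concat_name is a hypothetical
helper rather than anything in src/create.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Gnulib's xmalloc: allocate or abort.  */
static void *
checked_malloc (size_t n)
{
  void *p = malloc (n ? n : 1);
  if (!p)
    abort ();
  return p;
}

/* Return DIR/NAME in newly allocated storage.  With alloca the
   storage would vanish when the function returned and could overflow
   the stack for long names; here the caller owns the result and must
   free it.  */
static char *
concat_name (char const *dir, char const *name)
{
  assert (dir && name);
  size_t dirlen = strlen (dir);
  size_t namelen = strlen (name);
  char *result = checked_malloc (dirlen + 1 + namelen + 1);
  memcpy (result, dir, dirlen);
  result[dirlen] = '/';
  memcpy (result + dirlen + 1, name, namelen + 1);
  return result;
}
```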
Paul Eggert [Tue, 25 Jul 2023 16:43:16 +0000 (09:43 -0700)]
Improve reproducibility recipe
* doc/tar.texi (Reproducibility): Improve index.
Improve and add comments to recipe. In the recipe,
don’t worry about file names beginning with ‘-’ for simplicity;
don’t use touch -c as it exits with status 0 even when it
does not work; and set directory timestamps too.
Paul Eggert [Wed, 19 Jul 2023 22:48:25 +0000 (15:48 -0700)]
tests: fix LDADD
Problem reported by Christian Weisgerber <naddy@mips.inka.de> in:
https://lists.gnu.org/r/bug-tar/2023-07/msg00015.html
* tests/Makefile.am (LDADD): Add $(LIBINTL), $(LIBICONV).
Paul Eggert [Tue, 18 Jul 2023 16:15:03 +0000 (09:15 -0700)]
tests: fix TESTSUITE_AT
Problem reported by Lukas Javorsky <ljavorsk@redhat.com> in:
https://lists.gnu.org/r/bug-tar/2023-07/msg00002.html
* tests/Makefile.am (TESTSUITE_AT): Add exclude17.at, exclude18.at.
Remove compress.m4; all uses changed. Add a comment saying how
to rederive this. Sort.
* src/common.h (name): New field: is_wildcard.
(name_scan): Change prototype.
* src/delete.c: Update calls to name_scan.
* src/names.c (addname, add_starting_file): Initialize is_wildcard.
(namelist_match): Take two arguments. If second one is true, return
only exact matches.
(name_scan): Likewise. All callers updated.
(name_from_list): Skip patterns.
* src/update.c (remove_exact_name): New function.
(update_archive): Do not remove matching name, if it is a pattern.
Instead, add a new entry with the matching file name.
* tests/update04.at: New test.
* tests/Makefile.am: Add new test.
* tests/testsuite.at: Include new test.
* doc/tar.1: Add missing dots, use plural when necessary,
tweak some wording. Remove an incorrect observation, three times.
Add some missing articles, correct some formatting,
and expand the opaque descriptions of two options.
* doc/tar.texi: Drop a stray `cd` command from an example.
Correct two cross references, correct the paragraph
about the manpage, and unbreak a URL.
* src/names.c: Correct and shorten an error message: "non-optional"
means "mandatory", but "non-option" is what was meant. And the
phrase "in archive create or update mode" was both unneeded and
incomplete.
* tests/positional01.at: Change expected error text.
* tests/positional02.at: Likewise.
* tests/positional03.at: Likewise.
Paul Eggert [Sun, 25 Jun 2023 19:54:20 +0000 (12:54 -0700)]
tar: extract delayed links in order
Extract delayed links in tar file order, rather than
in hash table order with modifications.
This is simpler and more likely to use the kernel’s
cached filesystem data, assuming related delayed links
are nearby in the tar file.
* src/extract.c (struct delayed_link.has_predecessor):
Remove. All uses removed.
(delayed_link_head, delayed_link_tail): New static vars.
This resurrects delayed_link_head’s old function
except that the linked list is now in forward order, not reverse.
(find_delayed_link_source): Now simply returns bool,
since the callers no longer need the pointer.
(create_placeholder_file):
Put the delayed link at the end of the linked list.
Omit no-longer-needed last arg. All callers changed.
(apply_delayed_links): Simplify now that we can just iterate
through the delayed_link_head list.
Paul Eggert [Sun, 25 Jun 2023 20:54:14 +0000 (13:54 -0700)]
tar: make safe for -Wunused-parameter
This also ports to C23 [[maybe_unused]].
* configure.ac (WARN_CFLAGS): Do not add -Wno-unused-parameter.
Add MAYBE_UNUSED where needed in source code.
Also, put it at the front where C23 requires it.
* src/extract.c (create_placeholder_file): Use FLEXNSIZEOF (overlooked
by c542d3d0c8)
(apply_delayed_links): Don't follow the "next" chain after its entries
have been applied.
Paul Eggert [Fri, 16 Jun 2023 23:34:19 +0000 (16:34 -0700)]
Port to strict C99 struct hack
Portability bug caught by GCC 13 -fstrict-flex-arrays.
* gnulib.modules: Add flexmember.
* src/create.c (struct link):
* src/exclist.c (struct excfile):
* src/extract.c (struct delayed_link, struct string_list):
Include <flexmember.h>. Use FLEXIBLE_ARRAY_MEMBER, for
portability to strict C99 or later. All storage
allocations changed to use FLEXNSIZEOF.
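A plain C99 analogue of the change is sketched below. It does not
reproduce the Gnulib macros FLEXIBLE_ARRAY_MEMBER or FLEXNSIZEOF; the
struct string_node and the function string_node_new are hypothetical.
The older struct hack typically declares the trailing array with one
element and over-allocates, which strict flexible-array checking
treats as an array of exactly one element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct string_node
{
  struct string_node *next;
  char string[];   /* C99 flexible array member, not string[1] */
};

/* Allocate a node holding a copy of S.  The size is computed from the
   offset of the flexible member plus the bytes actually needed, and
   is never smaller than the struct itself.  */
static struct string_node *
string_node_new (char const *s)
{
  assert (s);
  size_t len = strlen (s);
  size_t size = offsetof (struct string_node, string) + len + 1;
  if (size < sizeof (struct string_node))
    size = sizeof (struct string_node);
  struct string_node *n = malloc (size);
  if (!n)
    abort ();
  n->next = NULL;
  memcpy (n->string, s, len + 1);
  return n;
}
```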
Paul Eggert [Fri, 16 Jun 2023 23:34:19 +0000 (16:34 -0700)]
Use Gnulib ‘dup2’ module
This simplifies code that would otherwise use dup and close.
* gnulib.modules: Add dup2.
* src/system.c: Add #pragma to pacify GCC 13.
(xdup2): Simplify by using dup2.
Pavel Raiskup [Tue, 6 Jun 2023 09:33:27 +0000 (12:33 +0300)]
Fix --xattr-include='*' documentation
* doc/tar.texi (Extended File Attributes): The default extraction
pattern consists of the 'user.*' namespace only. While at it, try
to explain the reasons for this default behavior.
Based on patch from Fabian Grünbichler <f.gruenbichler@proxmox.com>
* src/xattrs.c (acls_get_text): New function. If given --numeric-owner,
use acl_to_any_text to convert ACL to textual representation. Print
warning if that function is not available.
(xattrs__acls_get_a, xattrs__acls_get_d): Use acls_get_text.
Add missing option to manpage and remove duplicate operation
* doc/tar.1: Add needed option -f after operation -A, sort operation -t
alphabetically, add --file after --concatenate, consistently use long
option --file in the GNU-style section, and delete duplicate --update.
* doc/tar.texi: Add small missing word, and lowercase a letter.
* src/extract.c (maybe_recoverable): If make_directories indicates
success, assume some intermediate directories have been made, even
if in fact they have not. That's necessary to avoid infinite loops when
maybe_recoverable is called with the same arguments again.
Paul Eggert [Fri, 6 Jan 2023 20:47:09 +0000 (12:47 -0800)]
Go back to single-file bootstrap
Gnulib now supports a single-file bootstrap with --pull
and --gen options, in place of the three files
autopull.sh, autogen.sh, bootstrap-funclib.sh.
This keeps the top level a bit cleaner.
* bootstrap: Sync from Gnulib build-aux/bootstrap
instead of from top/bootstrap.
* autopull.sh, autogen.sh, bootstrap-funclib.sh: Remove.
Optionally warn about missing zero blocks at the end of the archive
(In response to savannah bug #63574)
* doc/intern.texi: Document actual tar behaviour in regard to
missing end-of-file marker.
* doc/tar.texi: Rewrite the "warnings" section. Document
--warning=missing-zero-blocks
* src/common.h (WARN_MISSING_ZERO_BLOCKS): New constant.
(WARN_ALL): Include all warning bits.
* src/list.c (read_and): If EOF is reached without seeing end-of-file
blocks and the "missing-zero-blocks" warning is requested, warn about
the fact.
* src/warning.c: New warnings: "missing-zero-blocks", "verbose".
(warning_option): Change definition to reflect changes in common.h
Paul Eggert [Fri, 4 Nov 2022 06:07:11 +0000 (23:07 -0700)]
Fix -Af F bug when F is not a regular file
Problem reported by Boris Gjenero in:
https://lists.gnu.org/r/bug-tar/2022-11/msg00001.html
* src/update.c (append_file): Don’t assume that FILE_NAME is a
regular file whose size can be determined before reading.
Instead, simply read from the file until its end is reached.
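Reading until end of input, rather than trusting a size obtained
beforehand, can be sketched with stdio as below. This is an
illustration only: append_file in tar works with file descriptors and
archive blocks, and the name copy_until_eof is hypothetical. The
approach also works for pipes and devices, whose size is not known in
advance.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Copy everything from IN to OUT until EOF, without asking for the
   input size beforehand.  Return the number of bytes copied, or -1
   on a read or write error.  */
static intmax_t
copy_until_eof (FILE *in, FILE *out)
{
  assert (in && out);
  char buf[512];
  intmax_t total = 0;
  size_t n;
  while ((n = fread (buf, 1, sizeof buf, in)) != 0)
    {
      if (fwrite (buf, 1, n, out) != n)
        return -1;
      total += (intmax_t) n;
    }
  return ferror (in) ? -1 : total;
}
```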
Paul Eggert [Fri, 4 Nov 2022 05:56:18 +0000 (22:56 -0700)]
Fix README-* files
README-alpha is for alpha releases, which are not from Git or CVS, so
omit mention of that. I'm not sure we'll ever do alpha releases, but
if we do, README-alpha assumes the tarballs are already built.
Update README-hacking with info that was mistakenly put into
README-alpha. Also mention Bison, needed for parse-date.y.
The bug was introduced by commit 79d1ac38c1, which didn't take into
account all the consequences of returning RECOVER_OK on EEXIST, in
particular interactions with the delayed_set_stat logic.
The commit 79d1ac38c1 is reverted (the bug it was intended to fix
was actually fixed by 79a442d7b0). Instead:
* src/extract.c (maybe_recoverable): Don't call maybe_recoverable
if EEXIST is reported when UNLINK_FIRST_OLD_FILES option is set.
Aurélien Martin [Fri, 7 Oct 2022 19:08:40 +0000 (21:08 +0200)]
tar: fix --exclude-vcs-ignores memory
The function frees the patterns' wordsplit structure without asking
add_exclude to reallocate the strings. In many cases, this leads to
each file name in the directory being checked against the memory
location where it just got reallocated.
* src/exclist.c: Use EXCLUDE_ALLOC.
Copyright-paperwork-exempt: Yes
Paul Eggert [Sat, 10 Sep 2022 21:44:36 +0000 (16:44 -0500)]
build: update submodules to latest
* src/common.h: Include <inttostr.h> since paxutils no longer does.
(STRINGIFY_BIGINT): New macro, copied from older paxutils.
(UINTMAX_STRSIZE_BOUND): New constant, also from older paxutils.
Paul Eggert [Sat, 3 Sep 2022 23:22:34 +0000 (18:22 -0500)]
Fix data loss when acting as filter
This bug was introduced by the recent lseek-related changes.
* src/delete.c (delete_archive_members):
* src/update.c (update_archive):
Copy the member if acting as a filter, rather than lseeking over
it, which is possible if stdin is a regular file.
* src/list.c (skim_file, skim_member):
* src/sparse.c (sparse_skim_file):
New functions, for copying when a filter.
* src/list.c (skip_file): Remove; replaced with skim_file.
All callers changed.
(skip_member): Reimplement in terms of skim_member.
* src/sparse.c (sparse_skip_file):
Remove; replaced with sparse_skim_file. All callers changed.
* src/update.c (acting_as_filter): New static var.
(update_archive): Set it; this is like delete.c.
* tests/delete01.at (deleting a member after a big one):
* tests/delete02.at (deleting a member from stdin archive):
Also test filter case.
Paul Eggert [Fri, 26 Aug 2022 21:38:29 +0000 (16:38 -0500)]
Do not diagnose same xattr file twice
* src/extract.c (set_xattr): Simplify, by having it do only
the mknodat and xattrs_xattrs_set, rather than also
trying to recover from failure. Caller simplified too.
* tests/xattr07.at (xattrs: xattrs and --skip-old-files):
Adjust test to match fixed behavior.
Paul Eggert [Fri, 26 Aug 2022 20:23:23 +0000 (15:23 -0500)]
Fix bug with -x --xattr read-only files
Problem reported by Kevin Raymond in:
https://bugzilla.redhat.com/show_bug.cgi?id=1886540
* src/extract.c (open_output_file): If we already created the
empty file, do not open with O_EXCL, or with O_CREAT or O_TRUNC
for that matter. Instead, use only O_NOFOLLOW to avoid some
races. When estimating current mode, use openflag & O_EXCL rather
than overwriting_old_files.
(extract_file): Also invert S_IWUSR if it’s not set.
* tests/xattr08.at: New test.
* tests/Makefile.am, tests/testsuite.at: Add it.
Paul Eggert [Mon, 15 Aug 2022 07:05:53 +0000 (00:05 -0700)]
Avoid quadratic behavior with delayed links
Do this by searching a hash table instead of a linked list.
Problem reported by Martin Dørum in https://mort.coffee/home/tar/
via Gavin Smith in:
https://lists.gnu.org/r/bug-tar/2022-07/msg00003.html
* src/extract.c: Include hash.h.
(struct delayed_link.has_predecessor): New member.
(delayed_link_head): Remove, replacing with ...
(delayed_link_table): ... this new variable. All uses
of linked list replaced with hash table.
(dl_hash, dl_compare): New functions for hash table.
(create_placeholder_file): Initialize has_predecessor.
(apply_delayed_link): New function, with body taken from
most of the old apply_delayed_links.
(apply_delayed_links): Use it. Respect has_predecessor.
Don’t bother freeing as we are about to exit.
Paul Eggert [Mon, 15 Aug 2022 06:16:42 +0000 (23:16 -0700)]
Improve performance a bit on non-birthtime hosts
* src/extract.c (HAVE_BIRTHTIME, BIRTHTIME_EQ): New macros.
(struct delayed_link, create_placeholder_file, extract_link)
(apply_delayed_links): Avoid unnecessary work on platforms
like GNU/Linux that lack birthtime.