Paul Eggert [Thu, 12 Jun 2025 07:20:52 +0000 (00:20 -0700)]
Port short_read to UBSan
Problem reported by Kirill Furman in:
https://lists.gnu.org/r/bug-tar/2025-06/msg00002.html
* src/buffer.c (short_read): Use (char *) record_start,
instead of record_start->buffer, to avoid undefined behavior
accessing past end of buffer. In practice the undefined
behavior is harmless unless running with -fsanitize=undefined
or a similarly-picky implementation.
* src/names.c (namelist_match_from): New function.
(namelist_match): Rewrite as a wrapper over it.
(register_match): New function.
(name_match)" Update all possible matches in the name list.
* tests/extrac29.at: New test.
* tests/Makefile.am: Add new test.
* tests/testsuite.at: Likewise.
Handle directory members consistently when listing and when extracting.
* src/list.c (skim_member): Recognize directory members using
the same rules as during extraction.
* tests/skipdir.at: New testcase.
* tests/testsuite.at: Add new test.
* tests/Makefile.am: Likewise.
Anssi Hannula [Wed, 6 Nov 2024 10:43:32 +0000 (12:43 +0200)]
Fix missing type in mknodat() mode argument
Per POSIX, the type of the file to be created should be OR'ed to the
`mode` argument of mknodat().
However, set_xattr() creates an empty file using mknodat() and does not
do that.
E.g. Linux kernel considers zero type as S_IFREG, so the code works on
most systems.
However, e.g. fakeroot, at least up to the current v1.36, does not
consider 0 as S_IFREG, instead creating an invalid file, causing tar to
enter an infinite retry loop when trying to create a file with xattrs
under fakeroot.
Since set_xattr is used only when extracting regular files, fix that
by ORing its mode argument with S_IFREG.
Paul Eggert [Tue, 29 Apr 2025 21:05:59 +0000 (14:05 -0700)]
Port to recent Gnulib hash_remove
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/bug-tar/2025-04/msg00003.html
* src/incremen.c (remove_directory): hash_delete → hash_remove.
* src/extract.c (update_interdir_set_stat): New function.
(extract_dir): If the directory already exists, check if it
has been created as intermediate directory earlier. If so,
update its delayed_set_stat data from archive.
* tests/Makefile.am: Add new testcase.
* tests/testsuite.at: Add new testcase.
* tests/extrac28.at: New file.
Paul Eggert [Wed, 6 Nov 2024 18:18:13 +0000 (10:18 -0800)]
Fix bad pointer usage in xsparse.c
* scripts/xsparse.c (read_xheader): Avoid undefined behavior by
accessing via null pointer sparse_map or out of its bounds when
the input is invalid. This means we no longer need the ‘expect’
local, so omit it for simplicity.
Paul Eggert [Wed, 6 Nov 2024 18:05:23 +0000 (10:05 -0800)]
Port xsparse.c to AIX
* scripts/xsparse.c (emalloc): Do not report failure when malloc
(0) returns NULL, as it does on AIX. Simply return a null
pointer; that’s good enough for xsparse.c.
Paul Eggert [Wed, 6 Nov 2024 18:02:02 +0000 (10:02 -0800)]
Fix xsparse.c big heap allocation bugs
* scripts/xsparse.c (expand_sparse): Read into auto buffer, not heap.
The heap code was wrong for two reasons: it called malloc just once
in the try-again loop, and even when it succeeded it could have
left so few bytes available in the heap that later stdio calls
could fail. Reading into the auto buffer might be a bit slower
but speed is not an issue here and it’s better to be simple.
Paul Eggert [Sat, 2 Nov 2024 20:42:02 +0000 (13:42 -0700)]
Add LG_BLOCKSIZE to omit some *, % ops
* src/buffer.c (_flush_write, short_read, seek_archive)
(_gnu_flush_write):
* src/create.c (write_gnu_long_link, dump_regular_file)
(dump_dir0):
* src/delete.c (write_recent_bytes, flush_file)
(delete_archive_members):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region)
(pax_dump_header_1):
* src/tar.c (parse_opt):
* src/update.c (append_file):
Prefer shifting and masking to dividing and remaindering by
BLOCKSIZE. This reclaims some compiler optimizations lost
by our recent preference for signed integers.
* src/tar.h (LG_BLOCKSIZE): New constant, for shifting.
Paul Eggert [Sat, 2 Nov 2024 20:06:47 +0000 (13:06 -0700)]
Improve sparse I/O performance
* src/sparse.c (sparse_dump_region, sparse_extract_region):
Don’t insist on reading and writing sparse files 512
bytes at a time. This resulted in a 4× to 6× performance
improvement on my platform.
Paul Eggert [Sat, 2 Nov 2024 18:52:28 +0000 (11:52 -0700)]
Simplify read_incr_db_01 malloc
* src/incremen.c (read_incr_db_01): Replace arg initbuf with two
args pbuf and pbufsize so that we can simplify memory allocation.
Caller changed. Omit now-unnecessary free, xstrdup, strlen.
Paul Eggert [Sat, 2 Nov 2024 16:54:10 +0000 (09:54 -0700)]
Count short read slop when seeking
* src/buffer.c (short_read_slop): New static var.
(get_archive_status): Treat anything other than fifos and sockets
as potentially seekable; they’ll tell us if they aren’t, whereas
fifos and sockets cannot be seekable. Check named files for
initial offset too, to deal with names like /dev/stdin.
Do not worry about start_offset’s value if !seekable_archive,
as it won’t be used. Use short_read_slop.
(short_read, try_new_volume, simple_flush_read, _gnu_flush_read):
Set short_read_slop.
Paul Eggert [Sat, 2 Nov 2024 02:49:02 +0000 (19:49 -0700)]
Prefer other types to int in names.c
* src/names.c (uname_to_uid, gname_to_gid, handle_option)
(make_file_name): Prefer bool for boolean.
(struct name_elt, read_name_from_file): Prefer char for char.
(handle_option): Invert sense of return value, for clarity.
All uses changed.
(merge_sort_sll, merge_sort, collect_and_sort_names):
Don’t assume list length fits in int. Use intptr_t not idx_t,
since the bound is the size of all memory rather than one array.
Paul Eggert [Sat, 2 Nov 2024 02:09:44 +0000 (19:09 -0700)]
Prefer other types to int in misc.c
* src/misc.c (quote_copy_string, tar_savedir):
Use bool for booleans. All uses changed.
(quote_copy_string): Use char for chars.
(unquote_string): Return void, since nobody uses return value.
(unquote_string): Check for overflow in escapes like \777.
(wdcache): Now array of idx_t not int, since in theory it
might contain values greater than INT_MAX. All uses changed.
Paul Eggert [Sat, 2 Nov 2024 01:45:00 +0000 (18:45 -0700)]
Fix some uses of int in list.c
* src/list.c (decode_xform): Last arg is now int, not a void *
pointer to that int. All uses changed.
(enforce_one_top_level): Don’t assume string length fits in int.
(transform_stat_info): Prefer char to int for typeflag.
All uses changed.
(decode_header): Prefer bool for booleans. All uses changed.
(ugswidth): Now idx_t, not int, since in theory it could
exceed INT_MAX. All uses changed.
(simple_print_header, print_for_mkdir): Don’t assume printf length
fits in int, and similarly for length of user or group name.
* src/transform.c (transform_name_fp): Last arg is now int, not void *.
All uses changed.
Paul Eggert [Fri, 1 Nov 2024 21:27:21 +0000 (14:27 -0700)]
Prefer other types to int in incremen.c
* src/incremen.c (struct dumpdir_iter, dumpdir_first)
(read_incr_db_01, dumpdir_ok, list_dumpdir):
Prefer bool to int for booleans. All uses changed.
(read_incr_db_01): Don’t assume getline returns <= INT_MAX.
(dumpdir_ok): Prefer char to int for chars.
Paul Eggert [Fri, 1 Nov 2024 21:15:09 +0000 (14:15 -0700)]
Prefer other types to int in extract.c
* src/extract.c (fd_chmod, extract_chdir, open_output_file)
(extract_file, extract_link, extract_symlink, extract_node)
(extract_fifo, tar_extractor_t, pepare_to_extract): Prefer char to
int for typeflag, since it’s a char. All uses changed.
(fd_chmod): Use clearer code for errno.
(extract_dir, extract_file, create_placeholder_file, extract_link)
(extract_symlink, extract_node, extract_fifo, tar_extractor_t):
Return bool true for success, false for failure. All uses changed.
(open_output_file): Prefer bool for boolean.
(prepare_to_extract): Simplify by returning the extractor a null
pointer, rather than storing through a pointer to an extractor.
Paul Eggert [Fri, 1 Nov 2024 18:04:39 +0000 (11:04 -0700)]
Check for checkpoint string overflow
It’s very unlikely, but would lead to undefined behavior.
* src/checkpoint.c (format_checkpoint_string): Accept and return
intmax_t, not idx_t. All callers changed. Check for integer
overflow by using add_printf. If overflow occurs, don’t bother
with extending width.
Paul Eggert [Fri, 1 Nov 2024 17:37:39 +0000 (10:37 -0700)]
Prefer int to idx_t for some small sizes
* src/create.c (max_octal_val, to_octal, tar_copy_str)
(tar_name_copy_str, to_base256, to_chars_subst, to_chars)
(gid_to_chars, major_to_chars, minor_to_chars, mode_to_chars)
(off_to_chars, time_to_chars, uid_to_chars, string_to_chars)
(split_long_name, write_ustar_long_name, simple_finish_header):
* src/list.c (from_header, gid_from_header, major_from_header)
(minor_from_header, mode_from_header, off_from_header)
(time_from_header, uid_from_header):
Prefer int to idx_t where either will do because the buffer sizes
are known to be small, as this can be a performance win on 32-bit
platforms. Also, in a few cases the values were negative, whereas
idx_t is supposed to be nonnegative.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer other types to int in buffer.c
This increases the volume number maximum from 2**31 - 1 to 2**63 - 1.
* src/buffer.c (record_index, inhibit_map, new_volume):
Prefer bool to int for booleans.
* src/buffer.c (volno, global_volno):
* src/system.c (sys_exec_info_script):
Prefer intmax_t to int.
* src/buffer.c (increase_volume_number): Omit by-hand check for
overflow that relied on undefined behavior.
(new_volume): Check for that overflow here instead, without
relying on undefined behavior.
(print_stats): Avoid undefined behavior if printf sums overflow,
and reliably treat printf error like overflow.
* src/common.h (add_printf): New inline function.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer other types to int in tar.c
Use types that are more specific than ‘int’, if that is easy.
* src/tar.c (after_date_option, xattrs_option, check_links_option)
(confirm, confirm_file_EOF, set_xattr_option, optloc_eq)
(get_date_or_file):
Prefer bool to int.
(tar_list_quoting_styles, tar_set_quoting_style, parse_opt):
Prefer idx_t to int.
(optloc_lookup, option_set_in_cl): Prefer enum option_class to int.
(decode_signal): Avoid some pointer reallocation.
(sort_mode_flag, hole_detection_types, set_old_files_option)
(is_subcommand_class): Prefer enum to int.
(parse_opt) [DEVICE_PREFIX]: Remove unused var.
Simplify creation of device name.
(find_argp_option_key, find_argp_option): Prefer char to int.
(enum subcommand_class): Now named.
(subcommand_class): Now char, not int.
(decode_options): Check for unlikely int overflow.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Fewer uses of uintmax_t in xheader.c
* src/tar.h (struct xheader):
* src/xheader.c (xheader_string_end):
Use idx_t, not uintmax_t, for string length.
* src/xheader.c (xheader_string_add):
Avoid duplicate calls to strlen.
(xheader_string_end): Remove by-hand check for size overflow;
it’s not possible, as this is measuring allocated storage.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Check for setenv failures when running scripts
* src/system.c (dec_to_env): Use umaxtostr for speed,
since convenience isn’t needed here.
(sys_exec_info_script, sys_exec_checkpoint_script):
Check for setenv failure.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer intmax_t to size_t in xheader.c
* src/common.h (INTMAX_STRSIZE_BOUND): New constant.
(SYSINT_BUFSIZE): Use it.
* src/xheader.c (global_header_count, xheader_format_name):
Prefer intmax_t to size_t, as the values are not sizes.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer idx_t to size_t in tar.c
* src/tar.c (strip_name_components, archive_names)
(allocated_archive_names, tar_list_quoting_styles)
(expand_pax_option, parse_opt):
Prefer idx_t to size_t.
(decode_options): Use a static word rather than going
to to the bother of dynamically allocating an array.
(main): Do not preallocate array. Do not call ‘free’
on a pointer that now might be to static storage.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Streamline compression suffix detection
* src/suffix.c (struct compression_suffix):
Use arrays rather than pointers that need relocation.
All uses changed.
(compression_suffixes): Now const.
Omit trailing null entry; all uses changed.
(find_compression_suffix): Simplify length calculations.
No longer any need to call strlen.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Adjust better to Gnulib signed-int read changes
The 2024-08-09 Gnulib changes that caused some modules prefer
signed types to size_t means that Tar should follow suit.
* src/buffer.c (short_read):
* src/system.c (sys_child_open_for_compress)
(sys_child_open_for_uncompress):
rmtread and safe_read return ptrdiff_t not idx_t;
don’t rely on implementation defined conversion.
* src/misc.c (blocking_read): Never return a negative number.
Return idx_t, not ptrdiff_t, with the same convention for EOF
and error as the new full_read. All callers changed.
* src/sparse.c (sparse_dump_region, check_sparse_region)
(check_data_region):
* src/update.c (append_file):
full_read no longer returns SAFE_READ_ERROR for I/O error; instead it
returns the number of bytes successfully read, and sets errno.
Adjust to this.
* src/system.c (sys_child_open_for_uncompress):
Rewrite to avoid need for goto and label.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Simplify name_buffer initialization
* src/names.c (name_init): Remove no-longer-needed initialization
of name_buffer, name_buffer_length. It was confusing anyway,
since it caused name_buffer_length to not equal the length of
name_buffer.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Pacify GCC in info_attach_exclist
* src/exclist.c (info_attach_exclist): Remove unnecessary test
for whether dir and ex are null. GCC complains about the first
one in some cases. Use C99-style decls.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Simplify checkpoint_action allocation
* src/checkpoint.c: Include <flexmember.h>.
(struct checkpoint_action): New member commandbuf.
(checkpoint_action_tail): Now pointer to pointer,
to simplify updating. All uses changed.
(alloc_action): New arg quoted_string, to lessen number of
separate allocations. All uses changed.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Fewer uses of size_t in checkpoint.c
* src/checkpoint.c (copy_string_unquote, getarg)
(format_checkpoint_string): Prefer idx_t to size_t.
(copy_string_unquote): Simplify by using ximemdup0.
(getarg): Avoid quadratic reallocation behavior by
using xpalloc.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Fewer uses of size_t in buffer.c
* src/buffer.c (flush_write_ptr, flush_bufmap, bufmap_locate):
(struct zip_magic, available_space_after, _flush_write)
(short_read, flush_archive, try_new_volume)
(gnu_add_multi_volume_header, simple_flush_read)
(simple_flush_write, _gnu_flush_read, _gnu_flush_write)
(gnu_flush_write): Prefer idx_t to size_t when either will do, as
signed types are typically safer. For a tiny value in memory,
just use ‘char’.
Don't assume archive read from stdin starts at offset 0
* src/buffer.c (start_offset): New variable.
(get_archive_status): If reading from seekable stdin, store the
position in the stream corresponding to record_start in start_offset.
(seek_archive): Compute current offset relative to start_offset.