Paul Eggert [Sat, 2 Nov 2024 16:54:10 +0000 (09:54 -0700)]
Count short read slop when seeking
* src/buffer.c (short_read_slop): New static var.
(get_archive_status): Treat anything other than fifos and sockets
as potentially seekable; they’ll tell us if they aren’t, whereas
fifos and sockets cannot be seekable. Check named files for
initial offset too, to deal with names like /dev/stdin.
Do not worry about start_offset’s value if !seekable_archive,
as it won’t be used. Use short_read_slop.
(short_read, try_new_volume, simple_flush_read, _gnu_flush_read):
Set short_read_slop.
Paul Eggert [Sat, 2 Nov 2024 02:49:02 +0000 (19:49 -0700)]
Prefer other types to int in names.c
* src/names.c (uname_to_uid, gname_to_gid, handle_option)
(make_file_name): Prefer bool for boolean.
(struct name_elt, read_name_from_file): Prefer char for char.
(handle_option): Invert sense of return value, for clarity.
All uses changed.
(merge_sort_sll, merge_sort, collect_and_sort_names):
Don’t assume list length fits in int. Use intptr_t not idx_t,
since the bound is the size of all memory rather than one array.
Paul Eggert [Sat, 2 Nov 2024 02:09:44 +0000 (19:09 -0700)]
Prefer other types to int in misc.c
* src/misc.c (quote_copy_string, tar_savedir):
Use bool for booleans. All uses changed.
(quote_copy_string): Use char for chars.
(unquote_string): Return void, since nobody uses return value.
(unquote_string): Check for overflow in escapes like \777.
(wdcache): Now array of idx_t not int, since in theory it
might contain values greater than INT_MAX. All uses changed.
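The overflow check for escapes like \777 can be illustrated with a short sketch. This is not the code in src/misc.c: parse_octal_escape is a hypothetical helper, and the sketch assumes an escape is rejected when its value does not fit in an unsigned char.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not the function in src/misc.c.
   Parse up to three octal digits at S (the text just after the
   backslash).  On success store the value in *OUT and return the
   number of digits consumed.  Return 0 if there are no octal digits
   or if the value does not fit in an unsigned char, as with \777.  */
static int
parse_octal_escape (char const *s, unsigned char *out)
{
  assert (s && out);
  int value = 0;
  int n = 0;
  while (n < 3 && '0' <= s[n] && s[n] <= '7')
    {
      value = value * 8 + (s[n] - '0');
      n++;
    }
  if (n == 0 || UCHAR_MAX < value)
    return 0;
  *out = value;
  return n;
}
```

With 8-bit bytes, "377" is the largest three-digit escape this sketch accepts; "777" (decimal 511) is rejected rather than silently truncated.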
Paul Eggert [Sat, 2 Nov 2024 01:45:00 +0000 (18:45 -0700)]
Fix some uses of int in list.c
* src/list.c (decode_xform): Last arg is now int, not a void *
pointer to that int. All uses changed.
(enforce_one_top_level): Don’t assume string length fits in int.
(transform_stat_info): Prefer char to int for typeflag.
All uses changed.
(decode_header): Prefer bool for booleans. All uses changed.
(ugswidth): Now idx_t, not int, since in theory it could
exceed INT_MAX. All uses changed.
(simple_print_header, print_for_mkdir): Don’t assume printf length
fits in int, and similarly for length of user or group name.
* src/transform.c (transform_name_fp): Last arg is now int, not void *.
All uses changed.
Paul Eggert [Fri, 1 Nov 2024 21:27:21 +0000 (14:27 -0700)]
Prefer other types to int in incremen.c
* src/incremen.c (struct dumpdir_iter, dumpdir_first)
(read_incr_db_01, dumpdir_ok, list_dumpdir):
Prefer bool to int for booleans. All uses changed.
(read_incr_db_01): Don’t assume getline returns <= INT_MAX.
(dumpdir_ok): Prefer char to int for chars.
Paul Eggert [Fri, 1 Nov 2024 21:15:09 +0000 (14:15 -0700)]
Prefer other types to int in extract.c
* src/extract.c (fd_chmod, extract_chdir, open_output_file)
(extract_file, extract_link, extract_symlink, extract_node)
(extract_fifo, tar_extractor_t, prepare_to_extract): Prefer char to
int for typeflag, since it’s a char. All uses changed.
(fd_chmod): Use clearer code for errno.
(extract_dir, extract_file, create_placeholder_file, extract_link)
(extract_symlink, extract_node, extract_fifo, tar_extractor_t):
Return bool true for success, false for failure. All uses changed.
(open_output_file): Prefer bool for boolean.
(prepare_to_extract): Simplify by returning the extractor or a null
pointer, rather than storing through a pointer to an extractor.
Paul Eggert [Fri, 1 Nov 2024 18:04:39 +0000 (11:04 -0700)]
Check for checkpoint string overflow
It’s very unlikely, but would lead to undefined behavior.
* src/checkpoint.c (format_checkpoint_string): Accept and return
intmax_t, not idx_t. All callers changed. Check for integer
overflow by using add_printf. If overflow occurs, don’t bother
with extending width.
Paul Eggert [Fri, 1 Nov 2024 17:37:39 +0000 (10:37 -0700)]
Prefer int to idx_t for some small sizes
* src/create.c (max_octal_val, to_octal, tar_copy_str)
(tar_name_copy_str, to_base256, to_chars_subst, to_chars)
(gid_to_chars, major_to_chars, minor_to_chars, mode_to_chars)
(off_to_chars, time_to_chars, uid_to_chars, string_to_chars)
(split_long_name, write_ustar_long_name, simple_finish_header):
* src/list.c (from_header, gid_from_header, major_from_header)
(minor_from_header, mode_from_header, off_from_header)
(time_from_header, uid_from_header):
Prefer int to idx_t where either will do because the buffer sizes
are known to be small, as this can be a performance win on 32-bit
platforms. Also, in a few cases the values were negative, whereas
idx_t is supposed to be nonnegative.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer other types to int in buffer.c
This increases the volume number maximum from 2**31 - 1 to 2**63 - 1.
* src/buffer.c (record_index, inhibit_map, new_volume):
Prefer bool to int for booleans.
* src/buffer.c (volno, global_volno):
* src/system.c (sys_exec_info_script):
Prefer intmax_t to int.
* src/buffer.c (increase_volume_number): Omit by-hand check for
overflow that relied on undefined behavior.
(new_volume): Check for that overflow here instead, without
relying on undefined behavior.
(print_stats): Avoid undefined behavior if printf sums overflow,
and reliably treat printf error like overflow.
* src/common.h (add_printf): New inline function.
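The two overflow checks described above can be sketched without relying on undefined behavior. The helpers below are hypothetical: sum_printf_results is not the actual add_printf from src/common.h (whose signature is not shown in this log), and next_volume_number is not the code in new_volume. The sketch assumes that a negative printf result and an overflowing sum are both reported as -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add the printf-style result N to the running
   total SUM.  Return -1 if either is negative (a printf error) or if
   the sum would exceed INT_MAX, so that errors and overflow are
   treated alike.  The comparison is done before adding, so no signed
   overflow ever occurs.  */
static int
sum_printf_results (int sum, int n)
{
  if (sum < 0 || n < 0 || INT_MAX - sum < n)
    return -1;
  return sum + n;
}

/* Hypothetical helper: advance *VOLNO by one.  Return false and
   leave *VOLNO unchanged if it is already INTMAX_MAX.  */
static bool
next_volume_number (intmax_t *volno)
{
  assert (volno);
  if (*volno == INTMAX_MAX)
    return false;
  ++*volno;
  return true;
}
```

Testing before the arithmetic, rather than inspecting the result afterwards, is what keeps the check independent of how the compiler treats signed overflow.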
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer other types to int in tar.c
Use types that are more specific than ‘int’, if that is easy.
* src/tar.c (after_date_option, xattrs_option, check_links_option)
(confirm, confirm_file_EOF, set_xattr_option, optloc_eq)
(get_date_or_file):
Prefer bool to int.
(tar_list_quoting_styles, tar_set_quoting_style, parse_opt):
Prefer idx_t to int.
(optloc_lookup, option_set_in_cl): Prefer enum option_class to int.
(decode_signal): Avoid some pointer reallocation.
(sort_mode_flag, hole_detection_types, set_old_files_option)
(is_subcommand_class): Prefer enum to int.
(parse_opt) [DEVICE_PREFIX]: Remove unused var.
Simplify creation of device name.
(find_argp_option_key, find_argp_option): Prefer char to int.
(enum subcommand_class): Now named.
(subcommand_class): Now char, not int.
(decode_options): Check for unlikely int overflow.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Fewer uses of uintmax_t in xheader.c
* src/tar.h (struct xheader):
* src/xheader.c (xheader_string_end):
Use idx_t, not uintmax_t, for string length.
* src/xheader.c (xheader_string_add):
Avoid duplicate calls to strlen.
(xheader_string_end): Remove by-hand check for size overflow;
it’s not possible, as this is measuring allocated storage.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Check for setenv failures when running scripts
* src/system.c (dec_to_env): Use umaxtostr for speed,
since convenience isn’t needed here.
(sys_exec_info_script, sys_exec_checkpoint_script):
Check for setenv failure.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer intmax_t to size_t in xheader.c
* src/common.h (INTMAX_STRSIZE_BOUND): New constant.
(SYSINT_BUFSIZE): Use it.
* src/xheader.c (global_header_count, xheader_format_name):
Prefer intmax_t to size_t, as the values are not sizes.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Prefer idx_t to size_t in tar.c
* src/tar.c (strip_name_components, archive_names)
(allocated_archive_names, tar_list_quoting_styles)
(expand_pax_option, parse_opt):
Prefer idx_t to size_t.
(decode_options): Use a static word rather than going
to the bother of dynamically allocating an array.
(main): Do not preallocate array. Do not call ‘free’
on a pointer that now might be to static storage.
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Streamline compression suffix detection
* src/suffix.c (struct compression_suffix):
Use arrays rather than pointers that need relocation.
All uses changed.
(compression_suffixes): Now const.
Omit trailing null entry; all uses changed.
(find_compression_suffix): Simplify length calculations.
No longer any need to call strlen.
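A table laid out this way might look like the sketch below. All names, field sizes and entries here are assumptions for illustration, not the contents of src/suffix.c. Each entry stores its strings inline as arrays, so the const table needs no relocation at load time, and it records each suffix length so the table strings never need strlen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: strings are stored inline as arrays.  */
struct suffix_entry
{
  char suffix[8];   /* suffix without the leading dot */
  char length;      /* length of suffix, precomputed */
  char program[8];  /* name of the compression program */
};

/* Illustrative entries only; no trailing null entry is needed
   because the loop bound comes from sizeof.  */
static struct suffix_entry const suffix_table[] = {
  { "gz", 2, "gzip" },
  { "bz2", 3, "bzip2" },
  { "xz", 2, "xz" },
};

/* Return the program for NAME's suffix, or NULL if none matches.
   strlen is applied only to the caller's string, never to the
   table.  */
static char const *
lookup_program (char const *name)
{
  assert (name);
  char const *dot = strrchr (name, '.');
  if (!dot)
    return NULL;
  size_t len = strlen (dot + 1);
  size_t n = sizeof suffix_table / sizeof suffix_table[0];
  for (size_t i = 0; i < n; i++)
    if (len == (size_t) suffix_table[i].length
        && memcmp (dot + 1, suffix_table[i].suffix, len) == 0)
      return suffix_table[i].program;
  return NULL;
}
```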
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Adjust better to Gnulib signed-int read changes
The 2024-08-09 Gnulib changes that caused some modules to prefer
signed types to size_t mean that Tar should follow suit.
* src/buffer.c (short_read):
* src/system.c (sys_child_open_for_compress)
(sys_child_open_for_uncompress):
rmtread and safe_read return ptrdiff_t not idx_t;
don’t rely on implementation defined conversion.
* src/misc.c (blocking_read): Never return a negative number.
Return idx_t, not ptrdiff_t, with the same convention for EOF
and error as the new full_read. All callers changed.
* src/sparse.c (sparse_dump_region, check_sparse_region)
(check_data_region):
* src/update.c (append_file):
full_read no longer returns SAFE_READ_ERROR for I/O error; instead it
returns the number of bytes successfully read, and sets errno.
Adjust to this.
* src/system.c (sys_child_open_for_uncompress):
Rewrite to avoid need for goto and label.
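The calling convention described for full_read (return the number of bytes read, and report the reason for a short count through errno) can be sketched with plain POSIX read. read_fully is a hypothetical stand-in, not Gnulib's full_read, and the choice of errno == 0 to signal end of file is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a full_read-style function.  Read up to
   COUNT bytes from FD into BUF, retrying after partial reads.
   Return the number of bytes read.  If the result is less than
   COUNT, errno is 0 at end of file and the read error otherwise.  */
static size_t
read_fully (int fd, void *buf, size_t count)
{
  assert (buf);
  char *p = buf;
  size_t total = 0;
  while (total < count)
    {
      ssize_t n = read (fd, p + total, count - total);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted; try again */
          return total;         /* errno describes the failure */
        }
      if (n == 0)
        {
          errno = 0;            /* short count due to end of file */
          return total;
        }
      total += n;
    }
  return total;
}
```

A caller compares the result with the requested count and consults errno only when the result is short, so there is no special error return value to convert or compare against.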
Paul Eggert [Fri, 1 Nov 2024 16:40:36 +0000 (09:40 -0700)]
Simplify name_buffer initialization
* src/names.c (name_init): Remove no-longer-needed initialization
of name_buffer, name_buffer_length. It was confusing anyway,
since it caused name_buffer_length to not equal the length of
name_buffer.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Pacify GCC in info_attach_exclist
* src/exclist.c (info_attach_exclist): Remove unnecessary test
for whether dir and ex are null. GCC complains about the first
one in some cases. Use C99-style decls.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Simplify checkpoint_action allocation
* src/checkpoint.c: Include <flexmember.h>.
(struct checkpoint_action): New member commandbuf.
(checkpoint_action_tail): Now pointer to pointer,
to simplify updating. All uses changed.
(alloc_action): New arg quoted_string, to lessen number of
separate allocations. All uses changed.
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Fewer uses of size_t in checkpoint.c
* src/checkpoint.c (copy_string_unquote, getarg)
(format_checkpoint_string): Prefer idx_t to size_t.
(copy_string_unquote): Simplify by using ximemdup0.
(getarg): Avoid quadratic reallocation behavior by
using xpalloc.
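The point about quadratic reallocation can be illustrated with a sketch that grows a buffer geometrically. It uses plain realloc rather than Gnulib's xpalloc, and the names charbuf and charbuf_push are hypothetical. Growing by a fixed amount per append copies the data O(n) times for n appends; growing by a constant factor keeps the total copying linear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable character buffer.  */
struct charbuf
{
  char *data;
  size_t len;   /* bytes in use */
  size_t cap;   /* bytes allocated */
};

/* Append C to *B, growing the allocation by about 50% when full.
   Return false, leaving *B unchanged, if memory cannot be
   obtained.  */
static bool
charbuf_push (struct charbuf *b, char c)
{
  assert (b);
  if (b->len == b->cap)
    {
      if (SIZE_MAX / 2 < b->cap)
        return false;           /* new capacity would overflow */
      size_t newcap = b->cap + b->cap / 2 + 16;
      char *p = realloc (b->data, newcap);
      if (!p)
        return false;
      b->data = p;
      b->cap = newcap;
    }
  b->data[b->len++] = c;
  return true;
}
```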
Paul Eggert [Fri, 1 Nov 2024 02:53:25 +0000 (19:53 -0700)]
Fewer uses of size_t in buffer.c
* src/buffer.c (flush_write_ptr, flush_bufmap, bufmap_locate)
(struct zip_magic, available_space_after, _flush_write)
(short_read, flush_archive, try_new_volume)
(gnu_add_multi_volume_header, simple_flush_read)
(simple_flush_write, _gnu_flush_read, _gnu_flush_write)
(gnu_flush_write): Prefer idx_t to size_t when either will do, as
signed types are typically safer. For a tiny value in memory,
just use ‘char’.
Don't assume archive read from stdin starts at offset 0
* src/buffer.c (start_offset): New variable.
(get_archive_status): If reading from seekable stdin, store the
position in the stream corresponding to record_start in start_offset.
(seek_archive): Compute current offset relative to start_offset.
Paul Eggert [Mon, 19 Aug 2024 16:42:59 +0000 (09:42 -0700)]
Fewer macros in extract.c
* src/extract.c (ALL_MODE_BITS, RECOVER_NO, RECOVER_OK)
(RECOVER_SKIP): Now constants or inline functions, not macros.
(maybe_recoverable): Return enum recover, not int.
Paul Eggert [Mon, 19 Aug 2024 16:41:37 +0000 (09:41 -0700)]
Fewer macros in create.c
* src/create.c (CACHEDIR_SIGNATURE, CACHEDIR_SIGNATURE_SIZE)
(MAX_VAL_WITH_DIGITS, MAX_OCTAL_VAL): Now constants or
inline functions or removed, instead of macros.
(max_octal_val): Accept size rather than type.
Paul Eggert [Mon, 19 Aug 2024 16:12:52 +0000 (09:12 -0700)]
Fewer macros in common.h
In common.h, replace macros with constants or functions when that
is easy. This makes code a bit more reliable (functions evaluate
their args exactly once) and easier to debug (many debugging
environments cannot access macros).
* src/common.h (CHKBLANKS): Remove. All uses removed.
(NAME_FIELD_SIZE, PREFIX_FIELD_SIZE, UNAME_FIELD_SIZE)
(GNAME_FIELD_SIZE, TAREXIT_SUCCESS, TAREXIT_DIFFERS)
(TAREXIT_FAILURE, LG_8, LG_256, DEFAULT_CHECKPOINT)
(MAX_OLD_FILES, TF_READ, TF_WRITE, TF_DELETED, XFORM_REGFILE)
(XFORM_LINK, XFORM_SYMLINK, XFORM_ALL, WARN_ALONE_ZERO_BLOCK)
(WARN_BAD_DUMPDIR, WARN_CACHEDIR, WARN_CONTIGUOUS_CAST)
(WARN_FILE_CHANGED, WARN_FILE_IGNORED, WARN_FILE_REMOVED)
(WARN_FILE_SHRANK, WARN_FILE_UNCHANGED, WARN_FILENAME_WITH_NULS)
(WARN_IGNORE_ARCHIVE, WARN_IGNORE_NEWER, WARN_NEW_DIRECTORY)
(WARN_RENAME_DIRECTORY, WARN_SYMLINK_CAST, WARN_TIMESTAMP)
(WARN_UNKNOWN_CAST, WARN_UNKNOWN_KEYWORD, WARN_XDEV)
(WARN_DECOMPRESS_PROGRAM, WARN_EXISTING_FILE, WARN_XATTR_WRITE)
(WARN_RECORD_SIZE, WARN_FAILED_READ, WARN_MISSING_ZERO_BLOCKS)
(WARN_VERBOSE_WARNINGS, WARN_ALL, EXCL_DEFAULT, EXCL_RECURSIVE)
(EXCL_NON_RECURSIVE): Now enum constants rather than macros.
(time_option_initialized, isfound, wasfound, warning_enabled):
Now functions rather than macros TIME_OPTION_INITIALIZED, ISFOUND,
WASFOUND, WARNING_ENABLED. All uses changed.
(OLDER_STAT_TIME, OLDER_TAR_STAT_TIME, EXTRACT_OVER_PIPE)
(TAR_ARGS_INITIALIZER): Remove. All uses replaced with their
definiens or equivalent.
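The remark that functions evaluate their arguments exactly once can be shown with a small sketch. SQUARE_MACRO, square and next_value are made-up names for illustration and do not appear in src/common.h.

```c
#include <assert.h>

/* A function-like macro: the argument text is pasted twice.  */
#define SQUARE_MACRO(x) ((x) * (x))

/* The equivalent function: the argument is evaluated once.  */
static inline int
square (int x)
{
  return x * x;
}

/* Helper with a visible side effect: increment *COUNTER and return
   the new value.  */
static int
next_value (int *counter)
{
  assert (counter);
  return ++*counter;
}
```

SQUARE_MACRO (next_value (&n)) calls next_value twice and multiplies two different values, whereas square (next_value (&n)) calls it once. A debugger can also set a breakpoint on square, which is not possible for the macro.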
Paul Eggert [Sun, 18 Aug 2024 18:02:51 +0000 (11:02 -0700)]
Prefer function to COPY_BUF macro
* src/sparse.c (struct ok_n_block_ptr): New type.
(decode_num): Revamp API so that it does the work of both
the old decode_num and the old COPY_BUF. Always read to the
next newline even if there is a lot of junk in between.
(pax_decode_header): Use the new API.
(COPY_BUF): Remove.
Paul Eggert [Sun, 18 Aug 2024 16:44:25 +0000 (09:44 -0700)]
Prefer function to COPY_STRING macro
* src/sparse.c (struct block_ptr):
New type, to allow a functional style.
(dump_str_nl, floorlog10): New static functions.
(COPY_STRING): Remove. All uses replaced by dump_str_nl.
(pax_dump_header_1): Use floorlog10 instead of creating a string.
Simplify size calculation.
Paul Eggert [Tue, 13 Aug 2024 15:35:24 +0000 (08:35 -0700)]
Avoid need for base64_init and extra table
Simplify the code by assuming C99 initializers.
* src/list.c (base_64_digits): Remove.
(base64_map): Now a constant. Now has its (old value + 1) % 65,
as that’s the only easy portable way to do it with a static
initializer (even on platforms where CHAR_BIT != 8); all uses changed.
(base64_init): Remove; only use removed.
(from_header): Adjust to new values in base64_map.
* src/list.c (base_64_digits): Remove; no longer needed.
(base64_map): Now const, initialized statically, and with
invalid entries being 0 not 64, and with valid entries
being 1 greater than before.
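The static-initializer approach can be sketched as follows. The names b64_map and b64_digit_value are hypothetical and the table is written out for illustration rather than copied from src/list.c. C99 designated initializers leave every unnamed element zero, so storing each digit's value plus 1 lets 0 mean "invalid" without any run-time initialization.

```c
#include <assert.h>
#include <limits.h>

/* Each valid base-64 digit maps to its value plus 1; all other
   entries are implicitly 0.  Indexing by character constant works
   for any character set and any CHAR_BIT.  */
static char const b64_map[UCHAR_MAX + 1] = {
  ['A'] =  1, ['B'] =  2, ['C'] =  3, ['D'] =  4, ['E'] =  5, ['F'] =  6,
  ['G'] =  7, ['H'] =  8, ['I'] =  9, ['J'] = 10, ['K'] = 11, ['L'] = 12,
  ['M'] = 13, ['N'] = 14, ['O'] = 15, ['P'] = 16, ['Q'] = 17, ['R'] = 18,
  ['S'] = 19, ['T'] = 20, ['U'] = 21, ['V'] = 22, ['W'] = 23, ['X'] = 24,
  ['Y'] = 25, ['Z'] = 26,
  ['a'] = 27, ['b'] = 28, ['c'] = 29, ['d'] = 30, ['e'] = 31, ['f'] = 32,
  ['g'] = 33, ['h'] = 34, ['i'] = 35, ['j'] = 36, ['k'] = 37, ['l'] = 38,
  ['m'] = 39, ['n'] = 40, ['o'] = 41, ['p'] = 42, ['q'] = 43, ['r'] = 44,
  ['s'] = 45, ['t'] = 46, ['u'] = 47, ['v'] = 48, ['w'] = 49, ['x'] = 50,
  ['y'] = 51, ['z'] = 52,
  ['0'] = 53, ['1'] = 54, ['2'] = 55, ['3'] = 56, ['4'] = 57, ['5'] = 58,
  ['6'] = 59, ['7'] = 60, ['8'] = 61, ['9'] = 62,
  ['+'] = 63, ['/'] = 64,
};

/* Return the value (0 through 63) of base-64 digit C, or -1 if C is
   not a base-64 digit.  */
static int
b64_digit_value (char c)
{
  int v = b64_map[(unsigned char) c] - 1;
  assert (-1 <= v && v < 64);
  return v;
}
```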
Paul Eggert [Tue, 13 Aug 2024 01:17:58 +0000 (18:17 -0700)]
Prefer signed to unsigned when decoding options
* src/tar.c (assert_format, decode_options):
Prefer signed to unsigned integers.
(optloc_save): Prefer enum to unsigned integer.
Simplify allocation.
(decode_options): No need to call ngettext for a value known
to be plenty large.