Tim Kientzle [Fri, 8 May 2026 04:50:50 +0000 (21:50 -0700)]
[test_utils] Fix a minor UB
(UBSan occasionally finds something interesting and
often reports whacky non-bugs like this one. "Fixing"
it will make the real UB bugs easier to identify, so...)
According to C's integer promotion rules, `unsigned short` gets
promoted to _signed_ `int`, and shifting into the sign bit of an `int`
is technically UB. Explicit cast to `unsigned` quiets UBSan.
Tim Kientzle [Fri, 8 May 2026 04:11:54 +0000 (21:11 -0700)]
[XAR] Fix two UB
1. The XAR writer's path normalization code uses strcpy() to move
parts of a path string within the same buffer. The source and
destination ranges overlap, which is undefined behavior for strcpy().
2. Failure to check string length before accessing the last character
of a path component. For empty components (e.g., //), the length is 0,
and length-1 underflows to SIZE_MAX.
Tim Kientzle [Fri, 8 May 2026 03:15:37 +0000 (20:15 -0700)]
[pathmatch] Heap buffer over-read
The bracket expression matching [ in the pathmatch engine fails to
handle malformed patterns, specifically when a closing ] is missing or
when high-byte characters are used. The scanner advances the pattern
pointer beyond the allocated buffer.
Tim Kientzle [Fri, 8 May 2026 02:41:04 +0000 (19:41 -0700)]
[ACL] Parser out-of-bounds read
The ACL parser fails to validate buffer length when processing PAX
attributes (SCHILY.acl.access/default). The next_field() function
attempts to read a separator character from a pointer even when the
remaining length is zero.
Jose Luis Duran [Wed, 15 Apr 2026 01:36:07 +0000 (01:36 +0000)]
mtree: Do not append '/' when basename is '.'
If the basename is '.', it means it is the root directory ('/'). Do not
append '/' to '.', as this will produce a path '/.', resulting in an
invalid mtree entry.
Tim Kientzle [Thu, 7 May 2026 21:35:02 +0000 (14:35 -0700)]
Date parsing: reject date components with numbers of more than 4 digits
Only the Unix epoch format `@<timestamp>` can have a number with
more than 4 digits. So let's break out the numeric parsing into
a standalone uint64 parser and use it separately to parse epoch
timestamps (which are only limited by the range of time_t) and
other date components.
It also avoids a time-consuming leap-year correction for
nonsensically large year values.
Tim Kientzle [Wed, 6 May 2026 03:54:12 +0000 (20:54 -0700)]
[CMake] Automatically update `list.h`
`list.h` contains a list of all the tests and is generated
by grepping the test source files for `DEFINE_TEST`.
Previously, it was generated at configure time.
This meant that if you added a new test to an existing
source file, you had to manually reconfigure.
This adds the necessary dependencies so that `list.h`
is regenerated whenever any C test source changes.
This ensures that new tests are always discovered automatically.
Note: If someone wants to update the autoconf-based
build system to do this, please send a PR.
Tim Kientzle [Tue, 5 May 2026 17:00:01 +0000 (10:00 -0700)]
[RAR5] Correct handling of unknown filter types
The change to return FAILED for entry-specific issues uncovered
flaws in RAR5 handling of filter types:
* Supported filter types are verified in `parse_filter` and
were being also checked in `run_filter` -- the duplication
confused the error handling here.
* `do_uncompress_file` was only checking for `FATAL` from
the upstream filter logic, so failed to properly pass
`FAILED` errors through
Tim Kientzle [Mon, 4 May 2026 23:38:15 +0000 (16:38 -0700)]
[rar5] Fix infinite loop in header parsing
The change to return `FAILED` instead of `FATAL` for issues that
impact a single entry (but don't necessarily terminate the entire archive)
created a bug in header parsing since `FAILED` wasn't handled in a
header-check loop.
Tim Kientzle [Mon, 4 May 2026 03:42:22 +0000 (20:42 -0700)]
7zip: propagate skip_stream's actual error code in read_data_skip
archive_read_format_7zip_read_data_skip used to coerce any negative
skip_stream() return into ARCHIVE_FATAL. That is wrong in principle:
ARCHIVE_FAILED can legitimately propagate up from setup_decode_folder()
through read_stream() and skip_stream(), and the wrapper should not
upgrade it.
In the current encryption-partially test case this is empirically a
no-op because skip_stream() still returns ARCHIVE_FATAL via a second,
deeper code path through extract_pack_stream(). An inline TODO comment
flags that asymmetry for a follow-up audit.
Tim Kientzle [Mon, 4 May 2026 03:12:42 +0000 (20:12 -0700)]
rar5: convert remaining per-entry data errors to ARCHIVE_FAILED
Follow-up to 9fa772ab. An audit of the rar5 reader found many more
ARCHIVE_FATAL returns in data-decode paths that should be ARCHIVE_FAILED
so the caller can move on to the next entry after a corrupt one:
Programmer assertions, ENOMEM, true I/O errors, and propagation of
copy_string's window-buf-NULL FATAL return are intentionally kept as
ARCHIVE_FATAL because they are not recoverable per-entry conditions.
Tim Kientzle [Sun, 3 May 2026 23:25:55 +0000 (16:25 -0700)]
rar: convert remaining per-entry data errors to ARCHIVE_FAILED
Follow-up to 4f148608. A code review found additional ARCHIVE_FATAL
returns in RAR4 data-decode paths that should be ARCHIVE_FAILED so the
caller can move on to the next entry:
read_data_block Truncated RAR file data
read_data_compressed PPMd "Invalid symbol" (3 sites)
parse_codes Zero window size is invalid
add_value Prefix found (second site)
make_table_recurse Huffman tree was not created
make_table_recurse Invalid location to Huffman tree specified
These are all per-entry parse/decode failures. As with the earlier
batch, the rar4 input position is tracked by rar_br_fillup so
read_data_skip will correctly advance past the damaged entry, and
RAR4 solid mode is not supported, so subsequent entries are not at
risk from a half-consumed shared decoder state.
Tim Kientzle [Sun, 3 May 2026 22:23:48 +0000 (15:23 -0700)]
tests: update expected return codes from FATAL to FAILED
Per-entry data errors (encryption, invalid filters, bad bitstream) now
return ARCHIVE_FAILED instead of ARCHIVE_FATAL. Update tests that were
asserting the old incorrect ARCHIVE_FATAL return codes.
The one exception is test_read_format_7zip_encryption_partially line 71,
which asserts ARCHIVE_FATAL after archive_read_next_header on the entry
following an encrypted entry: 7zip cannot skip an encrypted entry (the
decode-folder setup fails), so the skip legitimately returns ARCHIVE_FATAL
and the archive is done.
Tim Kientzle [Sun, 3 May 2026 22:21:44 +0000 (15:21 -0700)]
7zip: return ARCHIVE_FAILED (not ARCHIVE_FATAL) for per-entry data errors
setup_decode_folder() returned ARCHIVE_FATAL for both header-level and
data-level encryption/filter errors. Header encryption is a true
archive-fatal condition; data encryption is per-entry.
Distinguish the two by returning ARCHIVE_FATAL when decoding archive
headers (header==1) and ARCHIVE_FAILED when decoding file content
(header==0). Fix the call site in read_stream() to propagate the
actual return value rather than mapping all errors to ARCHIVE_FATAL.
Tim Kientzle [Sun, 3 May 2026 22:21:09 +0000 (15:21 -0700)]
rar: return ARCHIVE_FAILED (not ARCHIVE_FATAL) for per-entry data errors
ARCHIVE_FATAL means the entire archive is unreadable and no further
operations are valid. ARCHIVE_FAILED means the current entry cannot
be processed but iteration over subsequent entries may still succeed.
The RAR4 decompressor was returning ARCHIVE_FATAL from a large number
of data-parsing failures (invalid Huffman prefix, invalid PPMd sequence,
bad CRC, invalid symbol, etc.) that are per-entry errors. Because each
entry's compressed data region can be skipped using the packed_size
recorded in its file header, a decompressor error does not prevent
reading the next entry's header.
Change all such per-entry errors in the data-reading path
(read_data_stored, read_data_compressed, parse_codes, create_code,
add_value, make_table_recurse, expand, copy_from_lzss_window,
copy_from_lzss_window_to_unp) to return ARCHIVE_FAILED. OOM errors
and true I/O failures (rar_br_preparation truncated-data) remain
ARCHIVE_FATAL.
Tim Kientzle [Sun, 3 May 2026 22:20:33 +0000 (15:20 -0700)]
archive_read: make ARCHIVE_FATAL sticky in data-reading entry points
Three entry points in archive_read.c could return ARCHIVE_FATAL from
the format layer without setting a->archive.state = ARCHIVE_STATE_FATAL,
so subsequent API calls would not see the archive as fatally broken:
- archive_read_data_skip() unconditionally reset state to HEADER even
when the format's skip returned ARCHIVE_FATAL.
- archive_seek_data() and _archive_read_data_block() forwarded FATAL
from the format layer without recording it in the archive state.
Fix all three so that ARCHIVE_FATAL causes the state to become
ARCHIVE_STATE_FATAL, consistent with the existing behavior of
archive_read_next_header().
Tim Kientzle [Sun, 3 May 2026 16:32:00 +0000 (09:32 -0700)]
[Zip] Reject empty pathnames in ZIP writer
An empty pathname caused a one-byte OOB read before the heap buffer in
write_path() due to two compounding bugs:
(1) misuse of bitwise & instead of logical &&, and
(2) missing gaurd for an empty pathname
Tim Kientzle [Sat, 2 May 2026 18:51:47 +0000 (11:51 -0700)]
[archive_acl] Reject ACL entries with out-of-range numeric IDs
isint() and isint_w() previously clamped values >= INT_MAX to INT_MAX
and returned success, allowing malformed ACL text to silently set IDs
to an arbitrary sentinel value. Change them to return -1 (a new
"overflow" indication) instead, and update all callers in both the
NFS4 and POSIX parsers (narrow and wide) to treat overflow as
ARCHIVE_WARN and skip the offending entry.
Add test_acl_nfs4_text.c with four test functions covering NFS4 ACL
text round-trips, audit/alarm entry types, numeric-ID handling
including the overflow boundary (INT_MAX - 1 accepted, INT_MAX
rejected), and malformed-entry error paths.
Tim Kientzle [Sat, 2 May 2026 20:58:45 +0000 (13:58 -0700)]
[tar] Harden timestamp parsing
Improves the parsing of timestamps in a couple of ways:
* Saturate when timestamps exceed the range of time_t.
In particular, this provides more rational behavior on
systems with 32-bit time_t.
* Validate the format of overlong pax timestamps.
We previously failed to check that high-resolution
timestamps had only digits in the fractional part.
We now notice and ignore those with a warning.
Tim Kientzle [Sat, 2 May 2026 16:46:06 +0000 (09:46 -0700)]
archive_acl: Fix buffer overrun and wrong output for NULL-name ACL entries
archive_acl_text_len() counted the trailing ":id" digits only when
ARCHIVE_ENTRY_ACL_STYLE_EXTRA_ID was set, but archive_acl_to_text_l()
always writes them for USER/GROUP entries whose name is NULL. With a
7-digit or larger id the allocated buffer was too short, causing
append_id() to write past its end.
Fix the estimator to also count the extra colon and digits when the
name is NULL, matching the serializer's logic.
The wide serializer (archive_acl_to_text_w) had the opposite problem:
it passed id=-1 to append_entry_w() for NULL-name entries regardless
of the id value, causing a garbage character to be written in the name
field and the numeric id to be omitted entirely. Fix it to mirror the
narrow serializer by setting id = ap->id when wname is NULL.
Darwin covers a wide range of platforms with similar but not identical
sets of libraries. MD5, SHA1 and SHA2 are available from libsystem on
all of these, but macOS also has them in libc and libmd. Restricting
our search to only libsystem means we can run configure on macOS and get
a config.h that also works for other Darwin platforms.
Return an int for error information and supply offset through a given
argument. This fits other function declarations and makes it much easier
to differentiate between status and "return value".
While at it, merge fallback mechanism of both functions.
Its functionality is split off from archive_read_format_7zip_bid
and returns offset to actual data, i.e. it handles self extracting
(SFX) files if offset 0 is not already a 7zip magic.