Archives are not necessarily 256 bytes long, e.g. the last part of a
multi-volume archive. Only try to read up to 256 bytes when looking for
an archive or disk name, not necessarily exactly 256 bytes. This matches
the behavior of regular entry name lookups.
Also, disk names can be empty. Handle this as well.
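A minimal sketch of the bounded lookup described above, not the actual libarchive code: the helper name find_name_length and the NAME_LIMIT macro are hypothetical. It scans at most 256 bytes, never more than what is available, and treats an empty name as valid.

```c
#include <stddef.h>
#include <string.h>

#define NAME_LIMIT 256 /* assumed limit: 255 name bytes plus the NUL */

/*
 * Hypothetical helper: search for a NUL-terminated name in buf, of which
 * only `avail` bytes are readable. Returns the name length (0 for an
 * empty name) or -1 if no terminator is found within the limit.
 */
long
find_name_length(const unsigned char *buf, size_t avail)
{
    size_t limit = avail < NAME_LIMIT ? avail : NAME_LIMIT;
    const unsigned char *end;

    if (limit == 0)
        return -1;
    end = memchr(buf, '\0', limit);
    if (end == NULL)
        return -1;
    return (long)(end - buf);
}
```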
Make sure that the content of the link can fit into a size_t. This
should always be true, but be cautious with 32-bit systems and very
weird filesystems (possibly through FUSE).
I took SSIZE_MAX as the upper limit due to signedness and eventual
readlink calls, which would fail with larger values anyway.
Use size_t for avail_in, avail_out and stream_in for ppmd streams.
The avail_in and avail_out fields are set in the decompress function
based on size_t variables (t_avail_in/t_avail_out) and eventually
written back. The stream_in field is only incremented.
The actual use case happens within ppmd_read to support situations in
which not enough bytes are available. In such cases, more bytes are read
on demand but not written into next_in.
In such cases, avail_in can turn negative and next_in can point outside
of its allocated memory area.
Since stream_in is always incremented by one, it won't overflow on real
hardware, given that size_t would address the whole available heap
space.
Make sure that avail_in never turns negative (which allows the size_t
usage) and also guarantee that t_avail_in will never wrap around,
leading to a huge "used" value.
As a bonus, __archive_read_ahead can now be reliably called with a NULL
argument, since the second argument is no longer cast, which
was missing in the test.
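The avail_in invariant described above can be illustrated with a reduced stream. This is a sketch under assumptions, not the ppmd reader itself: struct byte_stream, its extra/extra_left fields (a stand-in for bytes read on demand) and read_byte are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical, reduced stream state. */
struct byte_stream {
    const unsigned char *next_in; /* prefetched bytes */
    size_t avail_in;              /* bytes left at next_in, never negative */
    const unsigned char *extra;   /* stand-in for on-demand reads */
    size_t extra_left;
    size_t stream_in;             /* total bytes consumed, only incremented */
};

/* Returns the next byte, or -1 if no more input exists. */
int
read_byte(struct byte_stream *s)
{
    unsigned char b;

    if (s->avail_in > 0) {
        b = *s->next_in++;
        s->avail_in--;
    } else if (s->extra_left > 0) {
        /* On-demand byte: next_in and avail_in stay untouched. */
        b = *s->extra++;
        s->extra_left--;
    } else {
        return -1;
    }
    s->stream_in++;
    return b;
}
```

Because avail_in is only decremented while it is greater than zero, it can be an unsigned size_t without any risk of wrapping.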
The cab_read_ahead_remaining function might return more bytes through
avail than initially asked for. The given limit is 255+1, i.e. the
maximum file name length.
While it's not a big deal for libarchive to handle file names longer
than that, the CAB format does not allow longer names.
Limit the amount of available bytes to the given argument for proper
checking.
If not enough bytes are available, __archive_read_ahead will return the
amount of bytes still available, which can be larger than 0. Only in
error cases, a negative value is returned.
Check the return pointer instead. It simplifies the error handling,
allows a NULL argument, and covers more truncation issues.
Right now, a NULL pointer with a non-zero size could be further
processed, which just asks for more technical or logical issues to
arise.
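A sketch of the pointer-based check described above. mock_read_ahead is a hypothetical stand-in that only mimics the contract stated in this message; it is not libarchive's internal function or its signature.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical input source; not part of libarchive. */
struct mock_source {
    const unsigned char *data;
    size_t len;
};

/*
 * Mimics the contract described above: a pointer is returned only when at
 * least `min` bytes are available. Otherwise NULL is returned and the
 * number of bytes still available is reported through *avail, if given.
 */
const void *
mock_read_ahead(struct mock_source *s, size_t min, ssize_t *avail)
{
    if (avail != NULL)
        *avail = (ssize_t)s->len;
    return (s->len >= min) ? s->data : NULL;
}

/* Returns 0 on success, -1 on truncation or error. */
int
read_header(struct mock_source *s, size_t need)
{
    /* Passing NULL for avail is fine: only the pointer is checked. */
    const void *p = mock_read_ahead(s, need, NULL);

    if (p == NULL)
        return -1; /* covers short input as well as hard errors */
    return 0;
}
```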
data [Sat, 13 Jun 2026 00:12:52 +0000 (08:12 +0800)]
tar: avoid wide pathname conversion for trailing slash check
Avoid requesting the wide-character pathname when the tar reader only
needs to check whether a regular entry name ends in '/'.
archive_entry_pathname_w() can lazily convert the pathname to WCS. In the
common tar read path, the multibyte pathname is already available, so
checking it first avoids unnecessary per-entry conversion. The WCS fallback
is kept for cases where the multibyte pathname is unavailable.
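The order of checks can be sketched as follows. The function name_ends_with_slash is hypothetical and takes both strings as parameters for simplicity; in the real reader the wide string would only be requested inside the fallback branch.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: prefer the multibyte name, fall back to WCS. */
int
name_ends_with_slash(const char *mbs, const wchar_t *wcs)
{
    size_t len;

    if (mbs != NULL) {
        len = strlen(mbs);
        return len > 0 && mbs[len - 1] == '/';
    }
    if (wcs != NULL) {
        len = wcslen(wcs);
        return len > 0 && wcs[len - 1] == L'/';
    }
    return 0;
}
```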
GeorgH93 [Wed, 10 Jun 2026 19:23:31 +0000 (21:23 +0200)]
Refactor zip archive reader by moving decryption-related code blocks into their own functions to make them reusable for compression formats other than deflate
data [Thu, 11 Jun 2026 19:36:33 +0000 (03:36 +0800)]
tar reader: avoid temporary buffer for empty-prefix ustar names
For empty-prefix ustar entries, copy the fixed-width name field
directly into the archive entry instead of first building a temporary
archive_string.
This avoids a temporary buffer allocation and intermediate copy in the
common case. It also fixes a small fatal-error leak by freeing the
temporary prefix/name buffer before returning on pathname conversion
failure.
Dustin L. Howett [Wed, 10 Jun 2026 01:14:28 +0000 (20:14 -0500)]
Merge pull request #3132 from stoeckmann/lz4_double_free
lz4: Fix double-free on reallocation failure
Alternative version of https://github.com/libarchive/libarchive/pull/2945 which removes the test (which requires a modified malloc to actually fail the 4 MB allocation).
isomorph-cyber [Wed, 25 Mar 2026 03:19:10 +0000 (23:19 -0400)]
Fix double-free in LZ4 filter on reallocation failure (CWE-415)
lz4_allocate_out_block() frees state->out_block without NULLing
the pointer. If the subsequent malloc fails, the function returns
ARCHIVE_FATAL with a dangling pointer. lz4_filter_close() later
calls free(state->out_block) again, triggering a double-free.
Also, state->out_block_size was updated before checking if malloc
succeeded, leaving inconsistent metadata on failure.
Fix both lz4_allocate_out_block() and lz4_allocate_out_block_for_legacy():
- NULL the pointer immediately after free
- Move size update to after malloc succeeds
- Reset size to 0 on allocation failure
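The three steps listed above follow a common pattern, sketched here with a reduced, hypothetical state struct (struct out_state, reallocate_out_block and close_state are illustrative names, not the filter's actual code).

```c
#include <stdlib.h>

/* Hypothetical, reduced version of the filter state. */
struct out_state {
    void *out_block;
    size_t out_block_size;
};

/* Returns 0 on success, -1 on allocation failure. */
int
reallocate_out_block(struct out_state *state, size_t new_size)
{
    free(state->out_block);
    state->out_block = NULL;   /* no dangling pointer after free */
    state->out_block_size = 0; /* size stays 0 if malloc fails */

    state->out_block = malloc(new_size);
    if (state->out_block == NULL)
        return -1;
    state->out_block_size = new_size; /* updated only after success */
    return 0;
}

void
close_state(struct out_state *state)
{
    free(state->out_block); /* safe: pointer is either valid or NULL */
    state->out_block = NULL;
    state->out_block_size = 0;
}
```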
datauwu [Tue, 9 Jun 2026 19:10:27 +0000 (03:10 +0800)]
7zip: add malformed SubStreamsInfo test
Add a 7z regression test for malformed SubStreamsInfo metadata that
declares more than one unpack stream without the kSize data needed to
describe those streams.
Store the archive as a .7z.uu file, matching the existing malformed
7z tests.
unzip: reject absolute or traversing symlink targets
This is overly broad, and will reject some well-formed archives which
contain symlinks to trees which exist in the archive; however, this is
the best we can do without some rudimentary path parsing.
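The message does not spell out the exact test, so the following is only one possible reading: reject targets that start with '/' or contain ".." anywhere. The function name is hypothetical. A plain substring match like this is deliberately broad and also rejects harmless targets.

```c
#include <string.h>

/* Hypothetical check; returns 1 if the symlink target should be rejected. */
int
symlink_target_is_rejected(const char *target)
{
    if (target[0] == '/')
        return 1; /* absolute target */
    if (strstr(target, "..") != NULL)
        return 1; /* possible traversal, no path parsing attempted */
    return 0;
}
```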
Merge pull request #3116 from stoeckmann/hardening
This PR does not fix any reachable issue, but fixes the code in question nonetheless to prevent regressions in the future:
- Do not call `archive_copy_error` after `archive_read_free` to prevent a use-after-free bug
- Reset `vtable` to `NULL` to prevent `close` from being called after filter initialization error, since `data` is already freed and set to `NULL`, preventing a `NULL` pointer dereference
If a system with sizeof(wchar_t)=2 (e.g. Cygwin) tries to convert a wide
character string into a multibyte string representation, it
precalculates the required length with sizeof(wchar_t) instead of
MB_LEN_MAX. This can lead to an undersized memory allocation for
filenames which have a shorter representation in wchar_t than in UTF-8.
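For example, U+20AC (the euro sign) is a single 2-byte wchar_t on such a system but takes three bytes in UTF-8. A worst-case size calculation based on MB_LEN_MAX could be sketched like this; the function name mbs_buffer_size is hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/*
 * Hypothetical helper: worst-case buffer size, including the terminating
 * NUL, for converting ws to a multibyte string. Returns 0 on overflow.
 */
size_t
mbs_buffer_size(const wchar_t *ws)
{
    size_t len = wcslen(ws);

    if (len > (SIZE_MAX - 1) / MB_LEN_MAX)
        return 0;
    return len * MB_LEN_MAX + 1;
}
```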
A system with sizeof(wchar_t)=2 (Cygwin on Windows) can trigger an
out-of-bounds write in archive_read_open_filenames_w when converting the
wide character string into a multibyte string.
The current finite state machine carefully handles short reads, i.e. the
loop can enter as often as needed until enough bytes arrive for the
current state to perform its actions.
This can be simplified by relying on __archive_filter_read_ahead to
return the amount of bytes actually needed. I assume that this did not
happen in the original code due to its age (2009) and evolution of
libarchive's internals over time.
Also, headers are only skipped at the beginning. As soon as the reader
starts returning data (ST_ARCHIVE reached), the filter pretty much
becomes a pass-through filter.
Split the initial lead and header skipping into its own function and
only keep track of whether the initial skipping was performed. This
greatly simplifies the reader function.
Also, it avoids bookkeeping of internal states and "total_in" tracking,
which I don't have to properly audit for edge cases anymore.
Last but not least, this refactoring properly reports truncated streams
now.
00redbeer [Sun, 7 Jun 2026 12:23:25 +0000 (14:23 +0200)]
rar5: fix integer underflow in bytes_remaining
A malformed RAR5 archive with data_size=1 forces bytes_remaining
(ssize_t) to wrap to -2 when a compressed block header consumes
to_skip=3 bytes (CWE-191). That negative value is then implicitly
cast to size_t ~0 inside malloc(), requesting a ~16-exabyte
allocation — confirmed heap buffer overflow via ASAN/UBSan on a
48-byte crafted archive requiring no authentication.
Three guards added to archive_read_support_format_rar5.c:
1. Reject data_size > SSIZE_MAX before assigning to bytes_remaining
(CWE-195, unsafe unsigned-to-signed conversion)
2. Reject to_skip > bytes_remaining in process_block() before the
subtraction — this is the primary fix for the underflow (CWE-191)
3. Change cur_block_size == 0 to cur_block_size <= 0 in merge_block()
as defense-in-depth so that any negative bytes_remaining reaching
read_ahead() is caught before it becomes a malloc size (CWE-122)
00redbeer [Sun, 7 Jun 2026 12:14:34 +0000 (14:14 +0200)]
rar5: check integer overflow in bytes_remaining
A malformed RAR5 archive with data_size=1 forces bytes_remaining
(ssize_t) to wrap to -2 when a compressed block header consumes
to_skip=3 bytes (CWE-191). That negative value is then implicitly
cast to size_t ~0 inside malloc(), requesting a ~16-exabyte
allocation — confirmed heap buffer overflow via ASAN/UBSan on a
48-byte crafted archive requiring no authentication.
Reproducer: 48-byte crafted RAR5 archive; ASAN confirms
"allocation-size-too-big 0xfffffffffffffffe".