Make sure that the content of the link can fit into a size_t. This
should always be true, but be cautious with 32-bit systems and very
unusual filesystems (possibly through FUSE).
I took SSIZE_MAX as the upper limit because of signedness and because
later readlink calls would fail with larger values anyway.
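A minimal sketch of such a bound check, assuming the link size is
available as an unsigned 64-bit value. link_size_is_readable is a
hypothetical helper for illustration, not a libarchive function:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>	/* SSIZE_MAX (POSIX) */
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a link of this size can be handled.
 * readlink() returns ssize_t, so anything above SSIZE_MAX could never be
 * returned by it, and every value up to SSIZE_MAX also fits into size_t.
 */
static int
link_size_is_readable(uint64_t link_size)
{
	return link_size <= (uint64_t)SSIZE_MAX;
}
```

A caller would reject the entry before allocating a buffer when the
helper returns 0.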
Use size_t for avail_in, avail_out and stream_in for ppmd streams.
The avail_in and avail_out fields are set in the decompress function
from size_t variables (t_avail_in/t_avail_out) and eventually written
back. The stream_in field is only incremented.
The actual use case happens within ppmd_read to support situations in
which not enough bytes are available. In such cases, more bytes are read
on demand but not written into next_in. As a result, avail_in can turn
negative and next_in can point outside of its allocated memory area.
Since stream_in is always incremented by one, it won't overflow on real
hardware, given that size_t would address the whole available heap
space.
Make sure that avail_in never turns negative (which allows the size_t
usage) and also guarantee that t_avail_in will never wrap around,
leading to a huge "used" value.
As a bonus, __archive_read_ahead can now be reliably called with a NULL
argument, since the second argument is no longer cast, which was
missing in the test.
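The "never turns negative" rule can be sketched as follows. This is an
illustration only: struct byte_stream and stream_read_byte are
hypothetical names, and the real ppmd_read fetches more input on demand
instead of returning an error:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stream fields named above. */
struct byte_stream {
	const uint8_t	*next_in;
	size_t		 avail_in;
	size_t		 stream_in;
};

/* Returns the next byte (0..255), or -1 if the buffer is exhausted. */
static int
stream_read_byte(struct byte_stream *s)
{
	/* Check before decrementing: an unsigned avail_in must not wrap. */
	if (s->avail_in == 0)
		return -1;
	s->avail_in--;
	s->stream_in++;		/* only ever incremented by one */
	return *s->next_in++;
}
```

Since the check happens before the decrement, next_in never advances
past the end of the buffer either.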
GeorgH93 [Wed, 10 Jun 2026 19:23:31 +0000 (21:23 +0200)]
Refactor zip archive reader by moving decryption-related code blocks into their own functions, making them reusable for compression formats other than deflate
data [Thu, 11 Jun 2026 19:36:33 +0000 (03:36 +0800)]
tar reader: avoid temporary buffer for empty-prefix ustar names
For empty-prefix ustar entries, copy the fixed-width name field
directly into the archive entry instead of first building a temporary
archive_string.
This avoids a temporary buffer allocation and intermediate copy in the
common case. It also fixes a small fatal-error leak by freeing the
temporary prefix/name buffer before returning on pathname conversion
failure.
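A rough sketch of how the length of a fixed-width name field can be
determined without a temporary copy, assuming the 100-byte ustar name
field, which is not NUL-terminated when fully used. ustar_name_length
is a hypothetical helper, not the function used by the tar reader:

```c
#include <stddef.h>
#include <string.h>

#define USTAR_NAME_SIZE 100	/* width of the ustar name field */

/*
 * Hypothetical helper: length of the name stored in a fixed-width
 * field. The field is only NUL-terminated if the name is shorter than
 * the field, so the scan is bounded by the field width.
 */
static size_t
ustar_name_length(const char *field)
{
	const char *nul = memchr(field, '\0', USTAR_NAME_SIZE);

	return nul != NULL ? (size_t)(nul - field) : USTAR_NAME_SIZE;
}
```

With the length known, the field can be handed to a length-bounded copy
routine directly.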
Dustin L. Howett [Wed, 10 Jun 2026 01:14:28 +0000 (20:14 -0500)]
Merge pull request #3132 from stoeckmann/lz4_double_free
lz4: Fix double-free on reallocation failure
Alternative version of https://github.com/libarchive/libarchive/pull/2945 which removes the test (which requires a modified malloc to actually fail the 4 MB allocation).
isomorph-cyber [Wed, 25 Mar 2026 03:19:10 +0000 (23:19 -0400)]
Fix double-free in LZ4 filter on reallocation failure (CWE-415)
lz4_allocate_out_block() frees state->out_block without NULLing
the pointer. If the subsequent malloc fails, the function returns
ARCHIVE_FATAL with a dangling pointer. lz4_filter_close() later
calls free(state->out_block) again, triggering a double-free.
Also, state->out_block_size was updated before checking if malloc
succeeded, leaving inconsistent metadata on failure.
Fix both lz4_allocate_out_block() and lz4_allocate_out_block_for_legacy():
- NULL the pointer immediately after free
- Move size update to after malloc succeeds
- Reset size to 0 on allocation failure
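The three points above can be sketched like this. It is a simplified
illustration, not the patched code: struct out_state, resize_out_block
and failing_alloc are hypothetical, and the allocator is passed in only
so that the failure path can be exercised:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the filter's private state. */
struct out_state {
	void	*out_block;
	size_t	 out_block_size;
};

/* Allocator that always fails, only used to exercise the error path. */
static void *
failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/* Returns 0 on success, -1 on allocation failure. */
static int
resize_out_block(struct out_state *st, size_t n, void *(*alloc)(size_t))
{
	free(st->out_block);
	st->out_block = NULL;		/* no dangling pointer is left */
	st->out_block = alloc(n);
	if (st->out_block == NULL) {
		st->out_block_size = 0;	/* keep metadata consistent */
		return -1;
	}
	st->out_block_size = n;		/* updated only after success */
	return 0;
}
```

After a failed call, a later free(st->out_block) in a close function is
a harmless free(NULL).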
datauwu [Tue, 9 Jun 2026 19:10:27 +0000 (03:10 +0800)]
7zip: add malformed SubStreamsInfo test
Add a 7z regression test for malformed SubStreamsInfo metadata that
declares more than one unpack stream without the kSize data needed to
describe those streams.
Store the archive as a .7z.uu file, matching the existing malformed
7z tests.
unzip: reject absolute or traversing symlink targets
This is overly broad, and will reject some well-formed archives which
contain symlinks to trees which exist in the archive; however, this is
the best we can do without some rudimentary path parsing.
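A minimal sketch of such a broad check, assuming '/' as the only path
separator. symlink_target_is_rejected is a hypothetical name and the
exact rules used by unzip may differ:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: returns 1 if the target is absolute or contains a
 * ".." component, 0 otherwise. No attempt is made to resolve the path,
 * so "a/../b" is rejected even though it stays inside the tree.
 */
static int
symlink_target_is_rejected(const char *target)
{
	const char *p = target;

	if (target[0] == '/')
		return 1;
	while (*p != '\0') {
		const char *end = strchr(p, '/');
		size_t len = end != NULL ? (size_t)(end - p) : strlen(p);

		if (len == 2 && p[0] == '.' && p[1] == '.')
			return 1;
		if (end == NULL)
			break;
		p = end + 1;
	}
	return 0;
}
```

Names that merely contain two dots, such as "a..b", are still accepted
because only whole components are compared.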
Merge pull request #3116 from stoeckmann/hardening
This PR does not fix any reachable issue, but fixes the code in question nonetheless to prevent regressions in the future:
- Do not call `archive_copy_error` after `archive_read_free` to prevent a use-after-free bug
- Reset `vtable` to `NULL` to prevent `close` from being called after filter initialization error, since `data` is already freed and set to `NULL`, preventing a `NULL` pointer dereference
If a system with sizeof(wchar_t)=2 (e.g. Cygwin) converts a wide
character string into a multibyte string representation, it
precalculates the required length with sizeof(wchar_t) instead of
MB_LEN_MAX. This can lead to an undersized memory allocation for
filenames which have a shorter representation in wchar_t than in UTF-8.
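A sketch of a worst-case size calculation based on MB_LEN_MAX, with an
overflow check added for good measure. mbs_capacity is a hypothetical
helper and not the code used in libarchive:

```c
#include <limits.h>	/* MB_LEN_MAX */
#include <stddef.h>
#include <stdint.h>	/* SIZE_MAX */

/*
 * Hypothetical helper: worst-case buffer size (including the
 * terminating NUL) for converting wlen wide characters into a
 * multibyte string. Returns 1 and stores the size in *out, or returns
 * 0 if the calculation would overflow.
 */
static int
mbs_capacity(size_t wlen, size_t *out)
{
	if (wlen > (SIZE_MAX - 1) / MB_LEN_MAX)
		return 0;
	*out = wlen * MB_LEN_MAX + 1;
	return 1;
}
```

Using sizeof(wchar_t) as the factor instead would reserve only two
bytes per character on such systems, while a single character can need
up to MB_LEN_MAX bytes.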
A system with sizeof(wchar_t)=2 (Cygwin on Windows) can trigger an
out-of-bounds write in archive_read_open_filenames_w when converting the
wide character string into a multibyte string.
The current finite state machine carefully handles short reads, i.e. the
loop can be entered as often as needed until enough bytes arrive for the
current state to perform its actions.
This can be simplified by relying on __archive_filter_read_ahead to
return the number of bytes actually needed. I assume that this did not
happen in the original code due to its age (2009) and the evolution of
libarchive's internals over time.
Also, headers are only skipped at the beginning. As soon as the reader
starts returning data (ST_ARCHIVE reached), the filter pretty much
becomes a pass-through filter.
Split the initial lead and header skipping into its own function and
only keep track if the initial skipping was performed or not. This
greatly simplifies the reader function.
Also, it avoids bookkeeping of internal states and "total_in" tracking,
which I no longer have to audit for edge cases.
Last but not least, this refactoring properly reports truncated streams
now.
00redbeer [Sun, 7 Jun 2026 12:23:25 +0000 (14:23 +0200)]
rar5: fix integer underflow in bytes_remaining
A malformed RAR5 archive with data_size=1 forces bytes_remaining
(ssize_t) to wrap to -2 when a compressed block header consumes
to_skip=3 bytes (CWE-191). That negative value is then implicitly
converted to a size_t near SIZE_MAX inside malloc(), requesting a
~16-exabyte allocation; this was confirmed as a heap buffer overflow
via ASAN/UBSan on a 48-byte crafted archive requiring no authentication.
Three guards added to archive_read_support_format_rar5.c:
1. Reject data_size > SSIZE_MAX before assigning to bytes_remaining
(CWE-195, unsafe unsigned-to-signed conversion)
2. Reject to_skip > bytes_remaining in process_block() before the
subtraction — this is the primary fix for the underflow (CWE-191)
3. Change cur_block_size == 0 to cur_block_size <= 0 in merge_block()
as defense-in-depth so that any negative bytes_remaining reaching
read_ahead() is caught before it becomes a malloc size (CWE-122)
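Guards 1 and 2 can be sketched as follows. The function names are
hypothetical and the real code in archive_read_support_format_rar5.c is
structured differently; the sketch only shows the order of the checks:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>	/* SSIZE_MAX (POSIX) */
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/* Guard 1: refuse sizes that do not fit into a ssize_t. */
static int
set_bytes_remaining(uint64_t data_size, ssize_t *bytes_remaining)
{
	if (data_size > (uint64_t)SSIZE_MAX)
		return -1;
	*bytes_remaining = (ssize_t)data_size;
	return 0;
}

/* Guard 2: check before subtracting, so the result stays >= 0. */
static int
skip_block_header(ssize_t *bytes_remaining, ssize_t to_skip)
{
	if (to_skip < 0 || to_skip > *bytes_remaining)
		return -1;
	*bytes_remaining -= to_skip;
	return 0;
}
```

With data_size=1 and to_skip=3, the second guard fails and
bytes_remaining keeps its value instead of becoming -2.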
00redbeer [Sun, 7 Jun 2026 12:14:34 +0000 (14:14 +0200)]
rar5: check integer overflow in bytes_remaining
A malformed RAR5 archive with data_size=1 forces bytes_remaining
(ssize_t) to wrap to -2 when a compressed block header consumes
to_skip=3 bytes (CWE-191). That negative value is then implicitly
converted to a size_t near SIZE_MAX inside malloc(), requesting a
~16-exabyte allocation; this was confirmed as a heap buffer overflow
via ASAN/UBSan on a 48-byte crafted archive requiring no authentication.
Reproducer: 48-byte crafted RAR5 archive; ASAN confirms
"allocation-size-too-big 0xfffffffffffffffe".
If vtable is not set to NULL, the close function would be called during
shutdown. Since data is already freed and set to NULL, this would lead
to a NULL pointer dereference later on.
The called library functions should never fail though, so this is a
purely defensive measure against future lzma changes.
If archive_read_next_header in add_pattern_from_file would ever return
anything but ARCHIVE_OK or ARCHIVE_EOF, a use after free would occur
when copying error information.
Since this is impossible with the current setup (format raw without any
further filter, thus only open_filename code), this change is a purely
defensive measure against future changes.
If the original name cannot be duplicated, return ARCHIVE_FAILED instead
of ARCHIVE_WARN. The latter implies that the option is unknown, which is
not the case.
All arithmetic operations are unsigned, and it makes sense to keep
total_in unsigned: its value is written at the end of the stream, and if
it overflows, it is pretty much expected to wrap modulo 2^32
(UINT32_MAX + 1).
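A small illustration of the well-defined wraparound, assuming a 32-bit
counter as the UINT32_MAX reference suggests. total_in_add is a
hypothetical helper:

```c
#include <stdint.h>

/*
 * Hypothetical helper: unsigned addition wraps around modulo 2^32,
 * which is defined behavior in C, unlike signed overflow.
 */
static uint32_t
total_in_add(uint32_t total_in, uint32_t n)
{
	return total_in + n;
}
```

Adding 1 to UINT32_MAX therefore yields 0 rather than undefined
behavior.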
Very unlikely that int64_t will ever overflow, but the fix is cheap.
The CreateSymbolicLinkW function is available since _WIN32_WINNT 0x0600
(Windows Vista) and is also part of the Nano Server APIs. On earlier
systems, don't even try. Otherwise use it directly to simplify code.
- Use a stack array for 22 bytes
- Entering the if-branch already implies that we will add data
- Use snprintf instead of strcpy
Even though snprintf is slower than strcpy, it's easier to verify and
since nobody complained so far about the malloc overhead, this should be
okay (for now).
As a bonus, this code cannot fail anymore, which previously meant that
file attributes were silently ignored.
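A sketch of the stack array plus snprintf pattern. It assumes that the
22 bytes hold a decimal 64-bit number, which the text above does not
state, so treat the value type as a guess. format_number and
NUM_BUF_SIZE are hypothetical names:

```c
#include <inttypes.h>	/* PRId64 */
#include <stdint.h>
#include <stdio.h>

#define NUM_BUF_SIZE 22	/* assumption: room for a signed 64-bit value */

/*
 * Hypothetical helper: formats value into a caller-provided stack
 * buffer. snprintf never writes more than NUM_BUF_SIZE bytes and always
 * NUL-terminates, so no allocation and no failure path is needed.
 * Returns the number of characters written, excluding the NUL.
 */
static int
format_number(char buf[NUM_BUF_SIZE], int64_t value)
{
	return snprintf(buf, NUM_BUF_SIZE, "%" PRId64, value);
}
```

The longest possible output, INT64_MIN, needs 20 characters plus the
terminating NUL, which fits into the 22-byte array.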