Sanjay Rawat [Thu, 21 May 2026 21:51:09 +0000 (23:51 +0200)]
xar: fix fflags_text leak in file_free
file_free() releases pathname, symlink, uname, gname, and hardlink but
omits fflags_text. When a XAR archive describes a file with <flags> or
<ext2> children (e.g. <SystemNoUnlink/>, <Compress/>), xml_parse_file_flags
/ xml_parse_file_ext2 populate xar_file->fflags_text via archive_strcat,
which heap-allocates. The buffer leaks on every file_free().
Reproduces with ASan+LSan via the bundled bsdtar:
bsdtar -tvf <xar-with-flags>
=> Direct leak of N bytes ... archive_strcat ... xml_parse_file_flags
Same shape as commit 6767cbe3 ("Free XAR xattr fstype metadata"), which
fixed the analogous miss in xattr_free().
Existing release of fflags_text in archive_string_free is a no-op when
the field was never populated (.s == NULL, free(NULL) is safe), so the
patch is harmless on the non-flags path.
Currently, the code calls GetTempPathW to figure out required size for a
buffer larger enough to contain the temporary directory path, allocates
the memory, and then calls GetTempPathW again to populate the memory.
Since libarchive is designed with multi-threading in mind, the worst
situation would be that another thread modifies the environment variable
between these two calls.
Use a buffer of MAX_PATH + 1 (261) to basically cover all regular
situations. If long paths are enabled, reallocate until enough bytes
were available (32 kb is maximum) without another thread intefering.
Realistically, this will happen only once.
datauwu [Sun, 17 May 2026 07:34:15 +0000 (15:34 +0800)]
cpio: reject oversized pathnames before read-ahead
Reject malformed CPIO entries whose pathname field exceeds 1 MiB before asking the read-ahead layer to satisfy the padded pathname length.
This prevents newc archives with attacker-controlled c_namesize values from forcing large metadata read-ahead and pathname allocation during archive listing. Add a regression test that fails on the unpatched reader and passes once the cap is enforced.
Tim Kientzle [Sat, 16 May 2026 17:04:24 +0000 (10:04 -0700)]
Fix unchecked calloc results in init_unpack (rar5)
window_buf and filtered_buf were allocated via calloc without checking
for NULL. Change init_unpack to return int and propagate ARCHIVE_FATAL
on allocation failure to the caller.
The strcpy and sprintf functions are generally hard to reason about.
While they are safe in this context, I think, it's easy to refactor the
code to avoid them completely.
The code can be simplified to avoid strcpy usage. While not exactly a
much safer approach by manually adjusting characters in a string, this
attempt reduces size of libarchive and avoids unneeded copy operations.
cmdline: Use free+strdup instead of realloc+strcpy
The cmdline_set_path function contains logic to reallocate memory of
command path and copy a new value into it.
Inline a simpler free+strdup alternative into __archive_cmdline_parse.
Since the function is always called with empty data, free could be
removed as well. I've kept it that way just to make sure that it's 100 %
compatible with previous code.
read: Fix memory corruption in client_switch_proxy
Switching a multi-volume archive file with another active filter, e.g.
decompression, can lead to memory corruption due to modifying the wrong
private data (self->data).
Use highest upstream filter to replace the correct private data.
Parsing a string into an integer also means that boundary checks have to
be performed. Check within atou64 as well as outside by verifying that
subsequent casts won't truncate numbers.
br0nzu [Wed, 13 May 2026 07:24:44 +0000 (16:24 +0900)]
Free XAR xattr fstype metadata
The XAR xattr cleanup helper released the xattr name string but not the parallel fstype archive_string. Release fstype from the same xattr_free() ownership boundary.
br0nzu [Wed, 13 May 2026 07:24:44 +0000 (16:24 +0900)]
Add XAR xattr fstype cleanup coverage
Add a focused XAR sample with xattr fstype metadata and exercise it through the public read/free path. This gives leak-checking builds coverage for the xattr cleanup ownership boundary.
Avoid requesting NEWSUB extended data through read-ahead while parsing the header. The full NEWSUB block size is still validated and consumed, but the extended data is not required to be present in memory during header parsing.
Add a test for a malformed NEWSUB header with a large packed size.
Allocate enough memory for possible addition of 3 characters within the
range of 0-Z. Since UTF-16 is in use, allocate 6 bytes + 2 bytes for the
terminating NUL character.
Also keep in mind that "l" is already size in bytes, which means that a
multiplication of 2 is not needed (and prevented overflow issues with
longer filenames).
It is possible to trigger an out of boundary write with short filenames
which contain illegal ISO9660 characters. For these files, Joliet IDs
are generated. If multiple files lead to the same ID (which can happen
because illegal characters are replaced with an underscore), 3
characters/digits in the range of 0-Z are added.
iso9660: Fix infinite loop in Joliet ID generation
3 characters/digits base 36 means that 46656 combinations are possible.
If a directory with even more conflicting identifiers is encountered, the
code would trigger an endless loop.
Strings pointed to by these variables are actually modified. They point
to modifiable data areas (own stack arrays or argv arguments), so the
code does not erroneously modify them. Instead, clarify that they are
modifiable by removing the qualifier.
Rudi Heitbaum [Mon, 16 Feb 2026 10:20:04 +0000 (10:20 +0000)]
fix handling of missing const type qualifier
Since glibc-2.43:
For ISO C23, the functions bsearch, memchr, strchr, strpbrk, strrchr,
strstr, wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr that return pointers
into their input arrays now have definitions as macros that return a
pointer to a const-qualified type when the input argument is a pointer
to a const-qualified type.
The supplied nanoseconds of time keyword could be truncated due to
casting from int64_t to long (relevant for Windows and x86), resulting
in an incorrect value.
Since the implementation already caps the value at specific limits for
bug compatibility, just use the correct data type for parsing to not
make things worse.
If no flags are supplied, anchor flags are supposed to be not special.
This means that ^ at the beginning of a pattern should be treated as a
regular character.
This breaks current behavior, but complies with comments in code, i.e.
archive_pathmatch.h line 41/42.
Patterns with a lot of asterisks may overflow the call stack, crashing
the application. Check the recursion depth. If it is too deep, fail
with an error.
These functions can return negative values, in which case operation
itself failed. While internal libarchive libraries handle these cases,
the tools don't. Check for negative values in them as well.
Tim Kientzle [Fri, 8 May 2026 04:50:50 +0000 (21:50 -0700)]
[test_utils] Fix a minor UB
(UBSan occasionally finds something interesting and
often reports whacky non-bugs like this one. "Fixing"
it will make the real UB bugs easier to identify, so...)
According to C's integer promotion rules, `unsigned short` gets
promoted to _signed_ `int`, and shifting into the sign bit of an `int`
is technically UB. Explicit cast to `unsigned` quiets UBSan.
Tim Kientzle [Fri, 8 May 2026 04:11:54 +0000 (21:11 -0700)]
[XAR] Fix two UB
1. The XAR writer's path normalization code uses strcpy() to move
parts of a path string within the same buffer. The source and
destination ranges overlap, which is undefined behavior for strcpy().
2. Failure to check string length before accessing the last character
of a path component. For empty components (e.g., //), the length is 0,
and length-1 underflows to SIZE_MAX.
Tim Kientzle [Fri, 8 May 2026 03:15:37 +0000 (20:15 -0700)]
[pathmatch] Heap buffer over-read
The bracket expression matching [ in the pathmatch engine fails to
handle malformed patterns, specifically when a closing ] is missing or
when high-byte characters are used. The scanner advances the pattern
pointer beyond the allocated buffer.
Tim Kientzle [Fri, 8 May 2026 02:41:04 +0000 (19:41 -0700)]
[ACL] Parser out-of-bounds read
The ACL parser fails to validate buffer length when processing PAX
attributes (SCHILY.acl.access/default). The next_field() function
attempts to read a separator character from a pointer even when the
remaining length is zero.
Jose Luis Duran [Wed, 15 Apr 2026 01:36:07 +0000 (01:36 +0000)]
mtree: Do not append '/' when basename is '.'
If the basename is '.', it means it is the root directory ('/'). Do not
append '/' to '.', as this will produce a path '/.', resulting in an
invalid mtree entry.
Tim Kientzle [Thu, 7 May 2026 21:35:02 +0000 (14:35 -0700)]
Date parsing: reject date components with numbers of more than 4 digits
Only the Unix epoch format `@<timestamp>` can have a number with
more than 4 digits. So let's break out the numeric parsing into
a standalone uint64 parser and use it separately to parse epoch
timestamps (which are only limited by the range of time_t) and
other date components.
It also avoids a time-consuming leap-year correction for
nonsensically large year values.