It is possible to handle entries and files with sizes which do not fit
into off_t of the current system (Windows always has 32 bit off_t and
32 bit systems without large file support also have 32 bit off_t).
Set sizes to 0 in such cases. The fstat system call would return -1 and
set errno to EOVERFLOW, but that's not how archive_entry_set_size acts.
It would simply ignore negative values and set the size to 0.
Callers of archive_entry_stat in other projects do not seem to check
for NULL return values at all, so let's try to handle such cases as
gracefully as possible.
Affects mtree's checkfs option as well (Windows only, 32 bit systems
would simply fail in fstat/stat).
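The size rule described above can be sketched like this. The helper name is hypothetical. max_off stands in for the largest value off_t can hold on the target, for example INT32_MAX with a 32 bit off_t.

```c
#include <stdint.h>

/* Hypothetical helper: the size to report in struct stat. Sizes that do
 * not fit into off_t (max_off) and negative sizes are reported as 0,
 * mirroring archive_entry_set_size, instead of failing like fstat would. */
static int64_t
stat_size_or_zero(int64_t size, int64_t max_off)
{
	if (size < 0 || size > max_off)
		return 0;
	return size;
}
```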
Dustin Howett [Fri, 15 Oct 2021 22:47:53 +0000 (17:47 -0500)]
win32: shim wopen, and make both open/wopen use _s "secure" variant
The new `__la_wopen` wrapper is a copy of `__la_open` that expects a
wcs parameter instead of converting its argument to one.
The `sopen` variants are offered as "more secure" variants of `open` and
`wopen`; I cannot vouch for their security, but some build systems are
strict about the use of "banned insecure APIs".
I've confirmed that `_wsopen_s` and `_open_s` are present in the Windows
Vista SDK.
I did not confirm that they are available in the Windows XP Platform
SDK, in part because in e61afbd463d1 (2016!) Tim says:
> I'd like to completely remove support for WinXP and earlier.
Rose [Wed, 11 Jun 2025 19:21:46 +0000 (15:21 -0400)]
Fix error checking in writing files
For write, 0 does not necessarily mean an error. We need to check for a short write instead, i.e. the returned length not being the same as the requested one.
With fwrite, 0 could mean an error, but not always. We must check that everything was written!
Note that unlike write, fwrite's description according to POSIX does not mention returning a negative value at all. Nor does it say that you can retry, unlike write.
Finally, with write, we need to check for less than 0, not 0, as 0 is a valid return value and does not mean an error.
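The two checks can be sketched as follows. checked_write and checked_fwrite are hypothetical helper names, not libarchive functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* write(): a negative return is an error; any other value is the number
 * of bytes written, which may be less than requested (or even 0). */
static int
checked_write(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n < 0)
		return -1;	/* error, errno is set */
	if ((size_t)n != len)
		return -2;	/* short write: not everything was written */
	return 0;
}

/* fwrite(): returns the number of items written, never a negative value;
 * a count below the requested one signals an error. */
static int
checked_fwrite(FILE *f, const void *buf, size_t len)
{
	return fwrite(buf, 1, len, f) == len ? 0 : -1;
}
```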
Make sure that the string table size is not smaller than 6 (and also
not larger than SIZE_MAX for better 32 bit support).
Such small values would lead to a large loop limit which either leads to
a crash or wrong detection of a ".data" string in possibly uninitialized
memory.
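This sketch shows why the lower bound matters. It assumes the search loop runs up to size - 6 with an unsigned size, which is presumably because ".data" plus its terminating NUL is 6 bytes. The helper name and the exact loop are assumptions, not the actual libarchive code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: search a string table for ".data" (6 bytes with
 * its NUL). For strtab_size < 6 the unsigned expression strtab_size - 6
 * would wrap around to a huge loop limit, so reject such sizes first. */
static int
has_data_name(const unsigned char *strtab, size_t strtab_size)
{
	size_t i;

	if (strtab_size < 6)
		return 0;
	for (i = 0; i <= strtab_size - 6; i++) {
		if (memcmp(strtab + i, ".data", 6) == 0)
			return 1;
	}
	return 0;
}
```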
If a sparse hole is located at the end of an entry, then the tar
parser returns ARCHIVE_EOF while updating the offset where 0 bytes of
data will follow.
If archive_read_data encounters such an ARCHIVE_EOF return value, it
has to recheck if the offsets (data offset and output offset) still
match. If they do not match, it has to keep filling 0 bytes.
This change assumes that it's okay to call archive_read_data_block
again after an EOF. As far as I understand the parsers, this should
be okay, since they always return ARCHIVE_EOF afterwards.
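The zero filling after ARCHIVE_EOF can be sketched as below. The function and its parameters are hypothetical. data_offset is the offset reported with the EOF, and output_offset is what has been handed to the caller so far.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: while output_offset lags behind data_offset after
 * EOF (trailing sparse hole), emit zero bytes. Returns the number of zero
 * bytes placed in buf; 0 means the offsets match and the entry is done. */
static size_t
fill_trailing_hole(char *buf, size_t buf_size, int64_t *output_offset,
    int64_t data_offset)
{
	int64_t remaining = data_offset - *output_offset;
	size_t n = buf_size;

	if (remaining <= 0)
		return 0;
	if ((uint64_t)n > (uint64_t)remaining)
		n = (size_t)remaining;
	memset(buf, 0, n);
	*output_offset += (int64_t)n;
	return n;
}
```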
rar: Do not forcefully set offset to unpacked size
If an entry reaches its end of file, the offset is not necessarily
the same as unp_size. This is especially true for links which have
a "0 size body" even though the unpacked size is not 0.
When _warc_read encounters the end of an entry, it adds 4 bytes to the
last offset for the \r\n\r\n separator, which is never written. Ignore
these bytes since they are not part of the returned entry.
The string constants can be used directly for comparison, which makes
this code robust against future changes which could lead to names being
longer than str could hold on stack.
Also removes around 100 bytes from compiled library (with gcc 15).
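Comparing directly against string constants might look like the sketch below. It assumes the name is length-delimited rather than NUL-terminated. Both that assumption and the helper name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare a name of len bytes against a string
 * constant without first copying it into a fixed-size stack buffer. */
static int
name_equals(const char *name, size_t len, const char *constant)
{
	return strlen(constant) == len && memcmp(name, constant, len) == 0;
}
```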
If zlib is not supported, do not run tests to avoid false positives.
Also adjust tests to support latest gzip versions (1.10+) which store
less information for improved reproducibility. The gzip binary is
used as a fallback if zlib is not available.
If no encryption support exists, the -P option will always fail.
"Skip" the test by making sure that there really is no encryption
support according to libarchive functions.
Kyle Evans [Tue, 3 Jun 2025 02:43:28 +0000 (21:43 -0500)]
libarchive/test: fix build when memcpy() is a macro
After importing the latest libarchive into FreeBSD, Shawn Webb @
HardenedBSD noted that the test build is broken when FORTIFY_SOURCE=2
while building the base system. Braced initializer lists are a special
case that need some extra fun parentheses when we're dealing with the
preprocessor.
While it's not a particularly common setup, the extra parentheses don't
really hurt readability all that much so it's worth fixing for wider
compatibility.
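The problem can be illustrated with any function-like macro. COPY_BYTES below is a hypothetical stand-in for a fortified memcpy(). The preprocessor splits macro arguments at top-level commas, and only parentheses (not braces) protect them.

```c
#include <string.h>

/* Stand-in for a memcpy() that is a function-like macro. */
#define COPY_BYTES(dst, src, n) memcpy((dst), (src), (n))

static void
copy_sample(unsigned char out[4])
{
	/* COPY_BYTES(out, (unsigned char[]){1, 2, 3, 4}, 4) would pass six
	 * arguments to the macro and fail to compile. The extra parentheses
	 * keep the compound literal a single argument. */
	COPY_BYTES(out, ((unsigned char[]){1, 2, 3, 4}), 4);
}
```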
Ignoring SIGCHLD is inherited by child processes. This affects
waitpid: zombie processes won't be created, which means that the
exit status of a child can never be read.
We can't enforce this in the library, but libarchive's tools can be
protected against this by enforcing the default handling.
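Restoring the default disposition at tool startup can be sketched with sigaction(). The function name is hypothetical, and the actual tools may use signal() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Restore the default SIGCHLD disposition. If the parent set SIGCHLD to
 * SIG_IGN, that setting survives exec, and waitpid() could then never
 * report the exit status of our own children. */
static int
reset_sigchld(void)
{
	struct sigaction sa;

	sa.sa_handler = SIG_DFL;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGCHLD, &sa, NULL);
}
```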
Use pid_t since waitpid returns a pid_t. Also check for a negative
return value in the writer to avoid reading the possibly
uninitialized status value.
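A sketch of the pattern follows. The helper name is hypothetical, and retrying on EINTR is an assumption about how interruptions are handled. status is only inspected after waitpid() has succeeded.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for child and return its exit status, or -1 on failure. */
static int
wait_child_exit(pid_t child)
{
	int status;
	pid_t ret;

	do {
		ret = waitpid(child, &status, 0);
	} while (ret == -1 && errno == EINTR);
	if (ret < 0)
		return -1;	/* status was not written, don't read it */
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```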
According to the documentation, calling CloseHandle multiple times for
the same handle can lead to exceptions while debugging.
Mimic the waitpid handling for success cases to behave more like the
Unix version, which would "reap the zombie".
Doing this for an unsuccessful call is odd, but the loop is never
entered again, so I guess it's okay and worth it to reduce the amount
of Windows-specific definitions in source files.
The archive_utility_string_sort function won't be part of the 4.0.0 API
anymore. No users were found and such a task should be done outside of
the library.
The utility function "archive_utility_string_sort" is a custom qsort
implementation. Since qsort is specified in C11 and POSIX.1-2008
which libarchive is based on, use system's qsort directly.
The function is not used directly in libarchive, so this is a good
way to save around 500 bytes in resulting library without breaking
compatibility for any user of this function (none found).
Also allows more than UINT_MAX entries which previously were limited
by data type and (way earlier) due to recursion.
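Sorting an array of strings with the system's qsort only needs a small comparator. The helper names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, i.e. pointers to char *. */
static int
cmp_strings(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/* size_t count: no UINT_MAX limit, no recursion in our own code. */
static void
sort_strings(const char **strings, size_t count)
{
	qsort(strings, count, sizeof(*strings), cmp_strings);
}
```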
Test cases already get a C locale, which is sufficient for this test.
If LC_TIME was not previously set, the en_US.UTF-8 value set by the
test would stay in the environment, possibly affecting other test cases.
Since en_US.UTF-8 is not guaranteed to be available, C is a better
choice.
Reset current locale settings through setlocale and also all
environment variables which might affect test cases which
spawn children through systemf which in turn would call setlocale
on their own, e.g. bsdtar.
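A reset of this kind might look like the sketch below. The function name is hypothetical, and the list of variables is an assumption, not necessarily the list the test harness uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Remove locale-related environment variables (so children spawned via
 * system() start clean) and switch the current process to the C locale. */
static int
reset_locale_environment(void)
{
	static const char *const vars[] = {
		"LANG", "LC_ALL", "LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
		"LC_MONETARY", "LC_NUMERIC", "LC_TIME", NULL
	};
	size_t i;

	for (i = 0; vars[i] != NULL; i++)
		if (unsetenv(vars[i]) != 0)
			return -1;
	return setlocale(LC_ALL, "C") != NULL ? 0 : -1;
}
```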
Explicitly use goto to turn a recursive call into an iterative one.
Most compilers do this on their own with default settings, but MSVC
with default settings would create a binary which actually performs
recursive calls.
Fixes call stack overflow in binaries compiled with low optimization.
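The transformation is sketched below on a hypothetical linked list walk, not the actual libarchive function. The recursive version uses one stack frame per node unless the compiler eliminates the tail call. The goto version has constant stack depth at any optimization level.

```c
#include <stddef.h>

struct node {
	struct node *next;
};

/* Tail-recursive form: relies on the compiler to turn the call into a jump. */
static struct node *
last_node_recursive(struct node *n)
{
	if (n->next == NULL)
		return n;
	return last_node_recursive(n->next);
}

/* Same logic with the tail call replaced by an explicit goto. */
static struct node *
last_node(struct node *n)
{
again:
	if (n->next == NULL)
		return n;
	n = n->next;
	goto again;
}
```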
tar: Handle many sparse comments on 32 bit systems
The sparse 1.0 parser skips lines with comments. The number of skipped
bytes is stored in a ssize_t variable, although common 32 bit systems
allow files larger than 4 GB.
Gracefully handle files with more than 2 GB of comments to prevent
integer truncation.
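One way to avoid the truncation is sketched below, under the assumption that the fix tracks the total in a 64 bit type and consumes it in pieces that fit into ssize_t. The function is hypothetical, and max_chunk stands in for SSIZE_MAX on a 32 bit system.

```c
#include <stdint.h>

/* Hypothetical sketch: skip a 64 bit byte count in pieces of at most
 * max_chunk bytes. Returns the number of pieces, or -1 for a bad limit. */
static int64_t
skip_in_chunks(int64_t total, int64_t max_chunk)
{
	int64_t pieces = 0;

	if (max_chunk <= 0)
		return -1;
	while (total > 0) {
		int64_t step = total < max_chunk ? total : max_chunk;
		/* a real reader would consume `step` bytes here */
		total -= step;
		pieces++;
	}
	return pieces;
}
```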
If a pax global header specifies a negative size, it is possible to
reduce variable `unconsumed` by 512 bytes, leading to a re-reading
of the pax global header. Fortunately the loop verifies that only one
global header per entry is allowed, leading to a later ARCHIVE_FATAL.
Avoid any form of negative size handling and fail early.
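Failing early can be sketched as a size check done before any 512 byte padding arithmetic. The function is hypothetical. A negative size is rejected instead of being allowed to move the read position backwards.

```c
#include <stdint.h>

/* Hypothetical sketch: bytes occupied by a header body including padding
 * to the 512 byte block size, or -1 for negative or overflowing sizes. */
static int64_t
padded_body_size(int64_t size)
{
	if (size < 0 || size > INT64_MAX - 511)
		return -1;
	return (size + 511) & ~(int64_t)511;
}
```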
Skip all entry bytes after sparse entries were encountered. This matches
GNU tar behavior.
I have adjusted (and fixed) the existing test case for this. The test
case test_read_format_gtar_sparse_skip_entry did not work with GNU tar.
In #2558 it was explained that the pax size always overrides the header
size (correct). Since the pax size in the test case was way larger than
the actual entry bytes in the archive, GNU tar choked on the test file.
The libarchive parser did not skip any bytes not already read due to
references by sparse entries, so the huge pax size was not detected.
By adjusting the test case to have a leftover byte (only 3 bytes are
referenced through sparse entry now, leaving one extra byte) with a
correct pax size and an invalid header size (after all it is overridden
by pax size), GNU tar works and libarchive gets off its 512 byte
alignment, not being able to read the next entry.
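The skip can be sketched like this. The helper is hypothetical. After the sparse data has been read, the reader still has to skip the unread entry bytes (size taken from pax) plus padding to reach the next header. In the adjusted test case 4 entry bytes exist and 3 are consumed, which leaves 509 bytes to skip.

```c
#include <stdint.h>

/* Hypothetical sketch: bytes to skip after `consumed` bytes of an entry
 * of entry_size bytes were read, to land on the next 512 byte header. */
static int64_t
bytes_to_next_header(int64_t entry_size, int64_t consumed)
{
	int64_t padded;

	if (entry_size < 0 || entry_size > INT64_MAX - 511 ||
	    consumed < 0 || consumed > entry_size)
		return -1;
	padded = (entry_size + 511) & ~(int64_t)511;
	return padded - consumed;
}
```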
Steve Lhomme [Mon, 26 May 2025 08:44:49 +0000 (10:44 +0200)]
[cmake] add uuid library when using xmllite
Follow-up to 16fd043f51d911b106f2a7834ad8f08f65051977
IID_ISequentialStream is required by the code.
This GUID is defined in uuid.lib or libuuid.a in mingw-w64. It is required
to link with that library to get the definition of the GUID. Some toolchains
add it by default but not all.
If a pax attribute has a 0 length value and no newline, the tar reader
gets out of sync with block alignment.
This happens because the pax parser assumes that variable value_length
(which includes the terminating newline) is at least 1. To get the
real value length, 1 is subtracted. This result is subtracted from
extsize, which in this case would lead to `extsize -= -1`, i.e.
the remaining byte count is increased.
Such an unexpected calculation leads to an off-by-one when skipping
to the next block. In the supplied test case, bsdtar complains that the
checksum of the next block is wrong. Since the tar parser was not
properly 512 bytes aligned, this is no surprise.
Gracefully handle such a case like GNU tar does and warn the user that
an invalid attribute has been encountered.
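The missing guard can be sketched with the variable names used above. The function itself is hypothetical, and the real reader additionally warns the user.

```c
#include <stdint.h>

/* Hypothetical sketch: value_length counts the value plus its terminating
 * newline, so it must be at least 1. Without that check, value_length - 1
 * is -1 and "extsize -= -1" grows the remaining byte count. */
static int
consume_pax_value(int64_t *extsize, int64_t value_length)
{
	int64_t data_len;

	if (value_length < 1)
		return -1;	/* invalid attribute: caller should warn */
	data_len = value_length - 1;	/* exclude the newline */
	if (data_len > *extsize)
		return -1;
	*extsize -= data_len;
	return 0;
}
```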
Zhaofeng Li [Sat, 24 May 2025 19:45:18 +0000 (13:45 -0600)]
tar: Reset accumulated header state after reading macOS metadata blob
AppleDouble extension entries are present as separate files immediately
preceding the corresponding real files. In libarchive, we process the
entire metadata file (headers + data) as if it were a header in the real
file. However, the code forgets to reset the accumulated header state
before parsing the real file's headers. In one code path, this causes
the metadata file's name to be used as the real file's name.
Specifically, this can be triggered with a tar containing two files:
1. A file named `._badname` with pax header containing the `path` attribute
2. A file named `goodname` _with_ a pax header but _without_ the `path` attribute
libarchive will list one file, `._badname` containing the data of `goodname`.
This code is pretty brittle and we really should let the client deal with
it :(
Pax extended headers may specify negative time values for files older
than the epoch.
Adjust the code to clear values to 0.0 more often and set ps to
INT64_MIN to have a proper error specifier, because the parser does
not allow anything below -INT64_MAX.
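The accepted range and the INT64_MIN error marker can be sketched as follows. This is a hypothetical parser for the integer seconds part only, not libarchive's actual pax time parser. Because the magnitude is accumulated as a positive value, -INT64_MAX is the lowest accepted result.

```c
#include <stdint.h>

/* Hypothetical sketch: parse the integer seconds of a pax time such as
 * "-123.5" (fraction ignored). On bad input or overflow, set *out to
 * INT64_MIN as an error marker and return -1. */
static int
parse_pax_seconds(const char *p, int64_t *out)
{
	int neg = 0;
	int64_t v = 0;

	if (*p == '-') {
		neg = 1;
		p++;
	}
	if (*p < '0' || *p > '9') {
		*out = INT64_MIN;
		return -1;
	}
	while (*p >= '0' && *p <= '9') {
		int d = *p - '0';

		if (v > (INT64_MAX - d) / 10) {
			*out = INT64_MIN;	/* magnitude above INT64_MAX */
			return -1;
		}
		v = v * 10 + d;
		p++;
	}
	*out = neg ? -v : v;
	return 0;
}
```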
The count fields are merely used to check if a list is empty or not.
A check for first being not NULL is sufficient and is already in
place while iterating over the linked elements (count is not used).
The operations for key and node comparison depend on the platform
libarchive is compiled for. Since these values do not change
during runtime, set them only once during initialisation.
Further simplify the code by declaring only one "rb_ops" with
required functions based on platform.
The cygwin FAQ states that __CYGWIN__ is defined when building for a
Cygwin environment. Only a few test files check (inconsistently) for
CYGWIN, so adjust them to the recommended __CYGWIN__ definition.
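In a test file the recommended check looks like this. The function name is hypothetical, and only the preprocessor condition matters.

```c
/* Use the compiler-defined __CYGWIN__ macro, not CYGWIN. */
static int
built_for_cygwin(void)
{
#if defined(__CYGWIN__)
	return 1;
#else
	return 0;
#endif
}
```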