Make sure that the string table size is not smaller than 6 (and also
not larger than SIZE_MAX for better 32 bit support).
Such small values would lead to a large loop limit which either leads to
a crash or wrong detection of a ".data" string in possibly uninitialized
memory.
If a sparse hole is located at the end of an entry, then the tar
parser returns ARCHIVE_EOF while updating the offset where 0 bytes of
data will follow.
If archive_read_data encounters such an ARCHIVE_EOF return value, it
has to recheck if the offsets (data offset and output offset) still
match. If they do not match, it has to keep filling 0 bytes.
This changes assumes that it's okay to call archive_read_data_block
again after an EOF. As far as I understood the parsers so far, this
should be okay, since it's always ARCHIVE_EOF afterwards.
rar: Do not forcefully set offset to unpacked size
If an entry reaches its end of file, the offset is not necessarily
the same as unp_size. This is especially true for links which have
a "0 size body" even though the unpacked size is not 0.
When _warc_read encounters end of entry, it adds 4 bytes to the last
offset for \r\n\r\n separator, which is never written. Ignore these
bytes since they are not part of the returned entry.
The string constants can be used directly for comparison, which makes
this code robust against future changes which could lead to names being
longer than str could hold on stack.
Also removes around 100 bytes from compiled library (with gcc 15).
Kyle Evans [Tue, 3 Jun 2025 02:43:28 +0000 (21:43 -0500)]
libarchive/test: fix build when memcpy() is a macro
After importing the latest libarchive into FreeBSD, Shawn Webb @
HardenedBSD noted that the test build is broken when FORTIFY_SOURCE=2
while building the base system. Braced initializer lists are a special
case that need some extra fun parentheses when we're dealing with the
preprocessor.
While it's not a particularly common setup, the extra parentheses don't
really hurt readability all that much so it's worth fixing for wider
compatibility.
Ignoring SIGCHLD gets passed to child processes. Doing that has
influence on waitpid, namely that zombie processes won't be
created. This means that a status can never be read.
We can't enforce this in library, but libarchive's tools can be
protected against this by enforcing default handling.
Use pid_t since waitpid returns a pid_t. Also check for a negative
return value in writer as well to avoid reading the possibly
unitialized status value.
Calling CloseHandle multiple times for the same handle can lead to
exceptions while debugging according to documentation.
Mimic the waitpid handling for success cases to behave more like the
Unix version which would "reap the zombie".
Doing this for an unsuccessful call is off, but the loop is never
entered again, so I guess it's okay and worth it to reduce the amount
of Windows specific definitions in source files.
The archive_utility_string_sort function won't be part of the 4.0.0 API
anymore. No users were found and such a task should be done outside of
the library.
The utility function "archive_utility_string_sort" is a custom qsort
implementation. Since qsort is specified in C11 and POSIX.1-2008
which libarchive is based on, use system's qsort directly.
The function is not used directly in libarchive, so this is a good
way to save around 500 bytes in resulting library without breaking
compatibility for any user of this function (none found).
Also allows more than UINT_MAX entries which previously were limited
by data type and (way earlier) due to recursion.
Test cases already get a C locale, which is sufficient for this test.
IF LC_TIME was not previously set, the used en_US.UTF-8 would stay
as an environment variable, possibly affecting other test cases.
Since en_US.UTF-8 is not guaranteed to be available, C is a better
choice.
Reset current locale settings through setlocale and also all
environment variables which might affect test cases which
spawn children through systemf which in turn would call setlocale
on their own, e.g. bsdtar.
Explicitly use goto to turn a recursive call into an iterative one.
Most compilers do this on their own with default settings, but MSVC
with default settings would create a binary which actually performs
recursive calls.
Fixes call stack overflow in binaries compiled with low optimization.
tar: Handle many sparse comments on 32 bit systems
The sparse 1.0 parser skips lines with comments. The amount of skipped
bytes is stored in a ssize_t variable, although common 32 bit systems
allow files larger than 4 GB.
Gracefully handle files with more than 2 GB bytes full of comments to
prevent integer truncations.
If a pax global header specifies a negative size, it is possible to
reduce variable `unconsumed` by 512 bytes, leading to a re-reading
of the pax global header. Fortunately the loop verifies that only one
global header per entry is allowed, leading to a later ARCHIVE_FATAL.
Avoid any form of negative size handling and fail early.
Skip all entry bytes after sparse entries were encountered. This matches
GNU tar behavior.
I have adjusted (and fixed) the existing test case for this. The test
case test_read_format_gtar_sparse_skip_entry did not work with GNU tar.
In #2558 it was explained that the pax size always overrides the header
size (correct). Since the pax size in the test case was way larger than
the actual entry bytes in archive, GNU tar choke on the test file.
The libarchive parser did not skip any bytes not already read due to
references by sparse entries, so the huge pax size was not detected.
By adjusting the test case to have a leftover byte (only 3 bytes are
referenced through sparse entry now, leaving one extra byte) with a
correct pax size and an invalid header size (after all it is overridden
by pax size), GNU tar works and libarchive gets off its 512 byte
alignment, not being able to read the next entry.
Steve Lhomme [Mon, 26 May 2025 08:44:49 +0000 (10:44 +0200)]
[cmake] add uuid library when using xmllite
Consecutive to 16fd043f51d911b106f2a7834ad8f08f65051977
IID_ISequentialStream is required by the code.
This GUID is defined in uuid.lib or libuuid.a in mingw-w64. It is required
to link with that library to get the definition of the GUID. Some toolchains
add it by default but not all.
If a pax attribute has a 0 length value and no newline, the tar reader
gets out of sync with block alignment.
This happens because the pax parser assumes that variable value_length
(which includes the terminating newline) is at least 1. To get the
real value length, 1 is subtracted. This result is subtracted from
extsize, which in this case would lead to `extsize -= -1`, i.e.
the remaining byte count is increased.
Such an unexpected calculation leads to an off-by-one when skipping
to the next block. In supplied test case, bsdtar complains that the
checksum of the next block is wrong. Since the tar parser was not
properly 512 bytes aligned, this is no surprise.
Gracefully handle such a case like GNU tar does and warn the user that
an invalid attribute has been encountered.
Zhaofeng Li [Sat, 24 May 2025 19:45:18 +0000 (13:45 -0600)]
tar: Reset accumulated header state after reading macOS metadata blob
AppleDouble extension entries are present as separate files immediately
preceding the corresponding real files. In libarchive, we process the
entire metadata file (headers + data) as if it were a header in the real
file. However, the code forgets to reset the accumulated header state
before parsing the real file's headers. In one code path, this causes
the metadata file's name to be used as the real file's name.
Specifically, this can be triggered with a tar containing two files:
1. A file named `._badname` with pax header containing the `path` attribute
2. A file named `goodname` _with_ a pax header but _without_ the `path` attribute
libarchive will list one file, `._badname` containing the data of `goodname`.
This code is pretty brittle and we really should let the client deal with
it :(
Pax extended headers may specify negative time values for files older
than the epoch.
Adjust the code to clear values to 0.0 more often and set ps to
INT64_MIN to have a proper error specifier, because the parser does
not allow anything below -INT64_MAX.
The count fields are merely used to check if a list is empty or not.
A check for first being not NULL is sufficient and is already in
place while iterating over the linked elements (count is not used).
The operations for key and node comparison depend on the platform
libarchive is compiled for. Since these values do not change
during runtime, set them only once during initialisation.
Further simplify the code by declaring only one "rb_ops" with
required functions based on platform.
The cygwin FAQ states that __CYGWIN__ is defined when building for a
Cygwin environment. Only a few test files check (inconsistently) for
CYGWIN, so adjust them to the recommended __CYGWIN__ definition.
Cast address of "version" to BYTE pointer for CryptGetProvParam.
Fix "major" variable assignment for picky compilers like MSVC.
The "length" variable is an in/out variable. It must be set to the size
of available memory within "version". Right now it is undefined behavior
and 0 would crash during runtime.
dependabot[bot] [Tue, 20 May 2025 08:19:56 +0000 (10:19 +0200)]
CI: Bump the all-actions group across 1 directory with 4 updates (#2623)
Bumps the all-actions group with 4 updates:
`actions/checkout` from 4.2.1 to 4.2.2
`actions/upload-artifact` from 4.4.3 to 4.6.2
`github/codeql-action` from 3.26.12 to 3.28.18
`ossf/scorecard-action` from 2.4.0 to 2.4.1
Rose [Sat, 17 May 2025 23:35:22 +0000 (19:35 -0400)]
Fatal if field[0].start or field[0].end is null
We should not get here, but given that the check exists, we should not let it happen if this is NULL because otherwise we just dereference it later on.