RAR5 reader: add more checks for invalid extraction parameters
Some specially crafted files declare invalid extraction parameters that
can confuse the RAR5 reader.
One such parameter is the window size, which the archive can declare
for each file it stores. Some crafted files declare a window size of 0,
which is clearly wrong.
This commit adds safety checks that make the RAR5 reader less tolerant
of such invalid values.
RAR5 reader: fix invalid memory access in some files
RAR5 reader uses several variables to manage the window buffer during
extraction: the buffer itself (`window_buf`), the current size of the
window buffer (`window_size`), and a helper variable (`window_mask`)
that is used to constrain read and write offsets to the window buffer.
Some specially crafted files can force the unpacker to update the
`window_mask` variable to a value that is out of sync with the current
buffer size. If `window_mask` becomes larger than the actual buffer
size, an invalid memory access (SIGSEGV) can occur.
This commit ensures that whenever `window_size` and `window_mask`
change, the window buffer is reallocated to the proper size, so no
invalid memory operation should be possible.
This commit contains a test file from OSSFuzz #30442.
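The invariant described above can be sketched as follows. This is only an illustration, not the actual reader code: the `window_state` struct and the `window_resize()` helper are hypothetical, and the sketch assumes the window size is a power of two so that the mask can be derived as `size - 1`. The struct must start out zero-initialized.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the reader's window state. */
struct window_state {
	uint8_t *window_buf;   /* the window buffer itself */
	size_t   window_size;  /* current size of window_buf in bytes */
	size_t   window_mask;  /* constrains offsets into window_buf */
};

/*
 * Change the window size, keeping buffer, size and mask in sync.
 * Returns 0 on success, -1 on an invalid size or allocation failure.
 * On failure the previous state is left untouched.
 */
static int
window_resize(struct window_state *w, size_t new_size)
{
	uint8_t *new_buf;

	/* Reject a declared size of 0 and (by assumption) any size that
	 * is not a power of two. */
	if (new_size == 0 || (new_size & (new_size - 1)) != 0)
		return (-1);

	/* Nothing to do if the buffer already has the requested size. */
	if (w->window_buf != NULL && new_size == w->window_size)
		return (0);

	new_buf = realloc(w->window_buf, new_size);
	if (new_buf == NULL)
		return (-1);

	/* Zero the newly added tail so stale offsets never read
	 * uninitialized memory. */
	if (new_size > w->window_size)
		memset(new_buf + w->window_size, 0,
		    new_size - w->window_size);

	/* Update all three fields together, only after the
	 * reallocation has succeeded. */
	w->window_buf = new_buf;
	w->window_size = new_size;
	w->window_mask = new_size - 1;
	return (0);
}
```

With this shape, any offset computed as `pos & w->window_mask` stays below `w->window_size`, because the mask is never updated without the buffer.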
Alex Richardson [Thu, 17 Sep 2020 17:28:17 +0000 (18:28 +0100)]
Avoid millions of rand() calls when running tests
Many tests use a loop calling rand() to fill buffers with test data. As
these calls cannot be inlined, this adds up to noticeable overhead:
For example, running on QEMU RISC-V the test_write_format_7zip_large_copy
test took ~22 seconds before and with this change it's ~17 seconds.
This change uses a simpler xorshift64 random number generator that can be
inlined into the loop filling the data buffer. By default the seed for this
RNG is rand(), but it can be overwritten by setting the TEST_RANDOM_SEED
environment variable.
For a native build the difference is much less noticeable, but it's still
measurable: test_write_format_7zip_large_copy takes 314.9 ms ± 3.9 ms
before and 227.8 ms ± 5.8 ms after (i.e. 38% faster for that test).
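A minimal sketch of the approach described above, using Marsaglia's 13/7/17 xorshift64 variant. The helper names `test_seed()` and `fill_random()` are hypothetical and not taken from the test suite; only the `TEST_RANDOM_SEED` variable and the `rand()` default come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One xorshift64 step; small enough to be inlined into the fill loop. */
static inline uint64_t
xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return (x);
}

/* Pick the seed: TEST_RANDOM_SEED if set, otherwise rand(). */
static uint64_t
test_seed(void)
{
	const char *env = getenv("TEST_RANDOM_SEED");
	uint64_t seed;

	if (env != NULL)
		seed = strtoull(env, NULL, 0);
	else
		seed = (uint64_t)rand();
	/* xorshift gets stuck at zero, so never start there. */
	return (seed != 0 ? seed : 1);
}

/* Fill a buffer with pseudorandom bytes, eight bytes per RNG step. */
static void
fill_random(void *buffer, size_t size, uint64_t seed)
{
	uint8_t *p = buffer;
	uint64_t state = (seed != 0 ? seed : 1);
	uint64_t r;
	size_t i;

	for (i = 0; i + sizeof(r) <= size; i += sizeof(r)) {
		r = xorshift64(&state);
		memcpy(p + i, &r, sizeof(r));
	}
	if (i < size) {
		/* Handle the tail that is shorter than eight bytes. */
		r = xorshift64(&state);
		memcpy(p + i, &r, size - i);
	}
}
```

A test would call something like `fill_random(buf, sizeof(buf), test_seed())`; reusing the same seed reproduces the same buffer contents.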
Petr Malat [Thu, 23 Dec 2021 10:47:04 +0000 (11:47 +0100)]
Support libzstd compiled with compressor disabled
The ZSTD library can be compiled with the compressor disabled, which is
handy on space-restricted systems, as the compressor accounts for more
than two thirds of the library size.
Detect this case and use libzstd for decompression only.
Compression will be done using the zstd binary if it's available.
Peter Pentchev [Wed, 22 Dec 2021 15:05:53 +0000 (17:05 +0200)]
Raise the lzip max dictionary size to 512MB.
The lzip documentation specifies that the logarithm of the dictionary
base size may be in the range 12-29, and the lzip utility is quite
capable of creating an archive with a dictionary larger than 128M if
passed the appropriate -s command-line option.
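For illustration, the coded dictionary size byte of an lzip header can be decoded roughly as below. This is a sketch based on the lzip file format description (the low five bits hold the base-2 logarithm of the base size, the high three bits a number of 1/16 "wedges" to subtract); the function name `lzip_dict_size()` is hypothetical and this is not presented as the reader's actual code.

```c
#include <stdint.h>

/*
 * Decode the coded dictionary size byte of an lzip header.
 * Returns 0 and stores the size in *out on success, -1 if the
 * logarithm is outside the documented 12-29 range.
 */
static int
lzip_dict_size(uint8_t ds, uint32_t *out)
{
	unsigned log2size = ds & 0x1F;  /* bits 4-0: log2 of base size */
	unsigned wedges = ds >> 5;      /* bits 7-5: sixteenths to subtract */
	uint32_t size;

	if (log2size < 12 || log2size > 29)
		return (-1);

	size = (uint32_t)1 << log2size;
	/* Do not go below the 4 KiB minimum for the smallest base size. */
	if (log2size > 12)
		size -= (size / 16) * wedges;
	*out = size;
	return (0);
}
```

With a logarithm of 29 and no wedges the result is 512 MiB, which is the upper limit this commit allows.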
Graham Percival [Wed, 22 Dec 2021 02:00:19 +0000 (18:00 -0800)]
Fix Y2038 check
If time_t is a signed 32-bit integer, then we cannot represent times
after 03:14:07 on 2038-01-19. Indicating an error if (Year > 2038) is
not sufficient; for safety, we need to bail if (Year >= 2038).
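To see why the boundary year itself has to be rejected, the sketch below decodes the largest value a signed 32-bit time_t can hold. The helper names are hypothetical and only illustrate the reasoning; they are not the code changed by this commit.

```c
#include <stdint.h>
#include <time.h>

/*
 * Break down the last second representable in a signed 32-bit
 * time_t (INT32_MAX seconds after the epoch) as UTC.
 * Returns 0 on success, -1 if gmtime() fails.
 */
static int
last_32bit_second(struct tm *out)
{
	time_t t = (time_t)INT32_MAX;
	struct tm *tm = gmtime(&t);

	if (tm == NULL)
		return (-1);
	*out = *tm;
	return (0);
}

/*
 * Conservative year check: most of 2038 lies beyond the limit,
 * so the whole year is treated as unrepresentable.
 */
static int
year_fits_32bit_time_t(int year)
{
	return (year < 2038);
}
```

The limit falls less than three weeks into 2038, so a check that only rejects years after 2038 would let nearly all of that year through.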
As the comment above this line notes, it would be better to check if
time_t is 32 bits first. And even if we didn't check for that, we could
use a much more complicated check:
Walter Lozano [Sat, 18 Dec 2021 01:44:09 +0000 (22:44 -0300)]
Fix check for tape device
In b6b423f0 a fallback in tar to stdio was implemented. However, the check
for the tape device didn't correctly interpret the value returned by
access(). Fix the check so that the fallback to stdio works properly.
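The point about the return value can be sketched as below: POSIX access() returns 0 when the path is accessible and -1 otherwise, so the device should be used only when the result equals 0. The helper name `default_archive_path()` is hypothetical and not taken from the tar sources.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Choose the default archive path: the tape device if it exists,
 * otherwise "-" (stdio).
 */
static const char *
default_archive_path(const char *tape_device)
{
	/* access() returns 0 on success, so "exists" means == 0. */
	if (tape_device != NULL && access(tape_device, F_OK) == 0)
		return (tape_device);
	return ("-");
}
```

Treating the return value as a boolean "exists" flag inverts the logic, because a successful check yields 0.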
Signed-off-by: Walter Lozano <walter.lozano@collabora.com>
Tim Kientzle [Sat, 4 Dec 2021 18:56:33 +0000 (10:56 -0800)]
Merge pull request #1632 from wlozano0collabora/default-archive-file
Have `bsdtar` default to stdout if this system has no tape device. This uses an `access()` check to see if the default tape device (e.g., `/dev/tape` on FreeBSD) exists and will use stdout as the default if it doesn't exist. If the system does have a tape device, there is no change to the existing behavior.
For libarchive 4.0, we'll change the default behavior of `bsdtar`:
* The `TAPE` environment variable will still be honored at runtime
* The `_PATH_DEFTAPE` preprocessor macro will still be honored at build time
* But `_PATH_DEFTAPE` will no longer be set by libarchive's default build, with the effect that for most people, bsdtar will default to stdout if there is no `-f` option provided.
Walter Lozano [Sat, 27 Nov 2021 00:23:20 +0000 (21:23 -0300)]
Add path fallback in tar
Since tar currently defaults to tape devices, which are rare nowadays,
add an additional step that falls back to "-" if no tape device is found.
This is a clean way to default to "-" on systems where tape devices are
not present while keeping the current behavior in other cases. It also
prepares for future releases where this kind of default will be
dropped.
Signed-off-by: Walter Lozano <walter.lozano@collabora.com>
Emil Velikov [Sun, 21 Nov 2021 17:38:38 +0000 (17:38 +0000)]
autotools: enable -fdata/function-sections and --gc-sections
Analogous to the parent cmake commit, with linker flag detection.
The former two split the functions and data into separate sections
within the object file, which makes it easier for the latter to properly
garbage collect and discard unused sections.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sun, 21 Nov 2021 17:38:28 +0000 (17:38 +0000)]
cmake: enable -fdata/function-sections and --gc-sections
The former two split the functions and data into separate sections
within the object file, which makes it easier for the latter to properly
garbage collect and discard unused sections. For example:
   text    data     bss     dec     hex filename
 208268    2056    4424  214748   346dc bsdcat -- before
  93396    1304    4360   99060   182f4 bsdcat -- after
1059167   12112   24176 1095455  10b71f bsdcpio -- before
1002538    7320   23984 1033842   fc672 bsdcpio -- after
1093676   14248    6608 1114532  1101a4 bsdtar -- before
1062231   14176    6416 1082823  1085c7 bsdtar -- after
1097259   15032    6408 1118699  1111eb libarchive.so.18 -- before
1095675   14992    6216 1116883  110ad3 libarchive.so.18 -- after
Note:
This is enabled only with gcc/clang on non-Mac platforms. Ideally we'll
have a compile-time check, albeit that seems impossible with our ancient
cmake requirement.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sun, 21 Nov 2021 18:05:19 +0000 (18:05 +0000)]
tar: demote -xa from error to a warning
It's fairly common for people to use caf and xaf on Linux. The former in
itself being GNU tar specific - libarchive tar does not allow xa.
While it makes little sense to use xaf with libarchive tar, that is an
implementation detail which gets in the way when trying to write trivial
tooling/scripts.
For the sake of compatibility, reduce the error to a warning and augment
the message itself, making it clear that the option makes little sense.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sun, 21 Nov 2021 14:50:25 +0000 (14:50 +0000)]
cmake: drop -rdynamic aka CMP0065 NEW
Prior to version 3.3 cmake would always use -rdynamic. That in itself
causes all the internal symbols to be exported, increasing the binaries
by 5-10% and making it impossible for the compiler to reason, optimise
and discard unused code.
The -rdynamic is useful in two cases:
- having a third party module (say /usr/lib/foo/foobar.so) which is
underlinked and depends on symbols from the main binary - apps like
irssi, bash and zsh use that
- uses the glibc backtrace, which relies on dlopen/dlsym to fetch the
symbol data. Unwind is a much better solution, since it relies on the
DWARF data
Our binaries do not use either of these - so drop the -rdynamic. The
autotools build doesn't use it either.
   text    data     bss     dec     hex filename
 229000    2120    4424  235544   39818 bsdcat -- before
 208324    2120    4424  214868   34754 bsdcat -- after
1093939   12128   24176 1130243  113f03 bsdcpio -- before
1059181   12128   24176 1095485  10b73d bsdcpio -- after
1130091   14264    6608 1150963  118ff3 bsdtar -- before
1093690   14264    6608 1114562  1101c2 bsdtar -- after
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sun, 21 Nov 2021 14:26:53 +0000 (14:26 +0000)]
cmake: fold gcc/clang sections
The flags used across the two are identical, apart from -g.
There is no compelling reason to omit -g for debug builds with GCC
while using it with clang.
De-duplicate the sections.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Jonas Witschel [Sun, 21 Nov 2021 09:07:52 +0000 (10:07 +0100)]
test_sparse_basic: do not assume that holes can be read in one go
verify_sparse_file() assumes that every hole will be fully contained in only
one archive_read_data_block(). This is a reasonable assumption if the file is
indeed sparsely encoded in the archive because archive_read_data_block() will
just skip the hole and return the offset of the next data block.
However, if the file is not sparsely encoded in the archive, a hole consists of
a lot of zeroes that need to be read byte by byte. In this case, the archive
contains no information on where this block of zeroes ends and where actual
data begins. Therefore it can happen that a single archive_read_data_block()
contains both zeroes from a hole and actual data.
If this happens, assert(sparse->type == HOLE) fails. This assertion is
reasonable for sparsely encoded files because archive_read_data_block() will
never only read part of a hole (since it does not really "read" a hole at all,
it just returns a higher offset accounting for the size of the hole).
However, we want to start testing files with verify_sparse_file() that are
explicitly not sparsely encoded. In this case, the assertion does not
necessarily hold any more. Therefore we need to account for the case where the
overlapping block consists of data. To make sure the file contents are
correctly encoded in the archive, we need to test the contents of the data
block, like it is already done for blocks completely contained in the data read
by archive_read_data_block().
Note that this modification does not change the way sparsely encoded files are
verified, it just relaxes an edge case that cannot happen with sparsely encoded
files to make it possible to test any kind of file, whether sparsely encoded or
not.
Theo Buehler [Fri, 19 Nov 2021 17:55:29 +0000 (18:55 +0100)]
Remove OpenSSL compat code that misuses the API
Immediately after EVP_CIPHER_CTX_new() neither EVP_CIPHER_CTX_init()
nor EVP_CIPHER_CTX_reset() should be called: the purpose of the init
function is to initialize a context on the stack while reset clears
a used context for reuse. Neither situation is the case here.
Removing the code also fixes a potential NULL dereference because an
error of reset is not signaled to the caller. Fortunately reset doesn't
currently fail in this situation in current OpenSSL and LibreSSL.
Martin Matuska [Wed, 17 Nov 2021 20:06:00 +0000 (21:06 +0100)]
archive_write_disk_posix: fix writing fflags broken in 8a1bd5c
The fixup list was erroneously assumed to be directories only.
Only in the case of critical file flags modification (e.g. SF_IMMUTABLE
on BSD systems), other file types (e.g. regular files or symbolic links)
may be added to the fixup list. We still need to verify that we are writing
to the correct file type, so compare the archive entry file type with
the file type of the file to be modified.
Jonas Witschel [Sun, 14 Nov 2021 17:56:49 +0000 (18:56 +0100)]
Add ARCHIVE_READDISK_NO_SPARSE to suppress reading sparse file info
Sparse file information depends on the file system and can therefore be a
source of unreproducibility for the generated archives, e.g. if the same
content is compressed on a file system with and without sparse file support.
Add an option to suppress reading this information from disk entirely.
Emil Velikov [Mon, 30 Mar 2020 22:13:15 +0000 (23:13 +0100)]
reader: track read_filter "can_skip" with a flag
Analogous to the earlier "can_seek" change. Drop the function pointer
in favour of a flag. Over the years, with over a dozen filters, no
filters actually implemented it.
If at a point in the future that changes, one can reinstate it.
Alternatively one could use an ARCHIVE_FILTER_NONE check.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sat, 23 Oct 2021 17:22:05 +0000 (18:22 +0100)]
reader: transform get_bidder into register_bidder
There's a notable duplication across all the read bidder code.
Check the archive magic/state, set the bidder private data (to NULL in
all but one case), name and vtable.
Change the helper to do the actual registration, keeping things simpler
in the dozen+ filters. This also allows us to enforce that the bidder
::bid and ::init dispatches are non-NULL. The final one, ::free, is
optional.
NOTE: some of the bidders do _not_ set a name, which I suspect is a
pre-existing bug. I've left them as-is, but we might want to fix and
enforce that somehow.
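The registration pattern described above might look roughly like the sketch below. All names here (`bidder_vtable`, `bidder_registry`, `register_bidder()`, the example callbacks) are simplified and hypothetical; they only illustrate a single helper that fills in the private data, name and vtable and enforces the required entries.

```c
#include <stddef.h>

#define MAX_BIDDERS 16

/* Hypothetical dispatch table: bid and init are required. */
struct bidder_vtable {
	int  (*bid)(void *data);
	int  (*init)(void *data);
	void (*free)(void *data);  /* optional, may be NULL */
};

struct bidder {
	void *data;
	const char *name;
	const struct bidder_vtable *vtable;
};

struct bidder_registry {
	struct bidder slots[MAX_BIDDERS];
	size_t count;
};

/*
 * Register a bidder in the next free slot.
 * Returns 0 on success, -1 if a required callback is missing or
 * the registry is full.
 */
static int
register_bidder(struct bidder_registry *reg, void *data,
    const char *name, const struct bidder_vtable *vtable)
{
	struct bidder *b;

	if (vtable == NULL || vtable->bid == NULL || vtable->init == NULL)
		return (-1);
	if (reg->count >= MAX_BIDDERS)
		return (-1);

	b = &reg->slots[reg->count++];
	b->data = data;
	b->name = name;
	b->vtable = vtable;
	return (0);
}

/* Example callbacks a filter might supply. */
static int
example_bid(void *data)
{
	(void)data;
	return (0);
}

static int
example_init(void *data)
{
	(void)data;
	return (0);
}
```

Each filter then reduces to a single call such as `register_bidder(&reg, NULL, "example", &vtable)`, and a missing ::bid or ::init is caught at registration time rather than at dispatch.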
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>