RAR5 reader: Removed a memory leak in process_head_file
The process_head_file function was using memset() to clear the
archive_entry structure. The problem was that this structure could
contain pointers to allocated blocks of memory, and overwriting those
pointers with memset() leaked the blocks they pointed to.
Switching to archive_entry_clear() still clears the structure, but it
also releases any allocated memory blocks first, which removes the
memory leak.
The commit also changes the way a temporary archive_entry instance is
created when skipping a base block after a block merge; instead of
creating a new instance directly on the stack, the constructor function
archive_entry_new() is used to ensure the new archive_entry instance is
never in an inconsistent state. This is needed because the fix described
in the first half of this commit message relies on the archive_entry
instance being in a consistent state when archive_entry_clear() is
called.
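As a rough illustration of the difference, here is a minimal sketch using
the public archive_entry API; the function below is illustrative, not the
actual process_head_file() code:

    #include <archive_entry.h>

    /* Illustrative sketch, not the actual RAR5 reader code. */
    static void example_reuse_entry(void)
    {
        /* archive_entry_new() constructs the entry in a consistent,
         * fully initialized state. */
        struct archive_entry *entry = archive_entry_new();
        if (entry == NULL)
            return;

        /* The entry now owns heap-allocated data internally. */
        archive_entry_set_pathname(entry, "file.txt");

        /* archive_entry_clear() frees anything the entry owns and resets
         * it for reuse; a plain memset() would only zero the pointers and
         * leak those allocations. */
        archive_entry_clear(entry);

        archive_entry_free(entry);
    }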
RAR5 reader: Fixed a read from invalid memory block
In multi-file RAR5 archives, if a block spans from one file to another,
the RAR5 reader merges both blocks into one, and feeds this merged block
to the decompressor function. The problem is that the block merge
function allocated exactly the number of bytes needed for the merged
block. When the caller reads the last byte of this block through the
bit reader functions, those functions also reference a few additional
bytes right after the byte being read, resulting in an out-of-bounds
read.
The commit increases the allocation size for the new merged block. This
ensures that the bit reader functions never perform any out-of-bounds
reads. The additional space is zeroed out to avoid reports from
instrumentation tools such as ASan or Valgrind.
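The general pattern looks roughly like the sketch below; the function name
and padding size are illustrative, not the actual rar5.c values:

    #include <stdlib.h>
    #include <string.h>

    /* Illustrative padding size; the real reader uses a value large
     * enough for its bit reader's maximum read-ahead. */
    #define MERGE_BLOCK_PADDING 8

    static unsigned char *alloc_merged_block(size_t block_size)
    {
        unsigned char *p = malloc(block_size + MERGE_BLOCK_PADDING);
        if (p == NULL)
            return NULL;
        /* Zero the padding so tools like ASan or Valgrind don't flag
         * reads of uninitialized bytes past the logical end. */
        memset(p + block_size, 0, MERGE_BLOCK_PADDING);
        return p;
    }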
Daniel Axtens [Tue, 1 Jan 2019 05:01:40 +0000 (16:01 +1100)]
7zip: fix crash when parsing certain archives
Fuzzing with CRCs disabled revealed that a call to get_uncompressed_data()
would sometimes fail to return at least 'minimum' bytes. This can cause
the crc32() invocation in header_bytes to read off into invalid memory.
A specially crafted archive can use this to cause a crash.
An ASAN trace is below, but ASAN is not required - an uninstrumented
binary will also crash.
==7719==ERROR: AddressSanitizer: SEGV on unknown address 0x631000040000 (pc 0x7fbdb3b3ec1d bp 0x7ffe77a51310 sp 0x7ffe77a51150 T0)
==7719==The signal is caused by a READ memory access.
#0 0x7fbdb3b3ec1c in crc32_z (/lib/x86_64-linux-gnu/libz.so.1+0x2c1c)
#1 0x84f5eb in header_bytes (/tmp/libarchive/bsdtar+0x84f5eb)
#2 0x856156 in read_Header (/tmp/libarchive/bsdtar+0x856156)
#3 0x84e134 in slurp_central_directory (/tmp/libarchive/bsdtar+0x84e134)
#4 0x849690 in archive_read_format_7zip_read_header (/tmp/libarchive/bsdtar+0x849690)
#5 0x5713b7 in _archive_read_next_header2 (/tmp/libarchive/bsdtar+0x5713b7)
#6 0x570e63 in _archive_read_next_header (/tmp/libarchive/bsdtar+0x570e63)
#7 0x6f08bd in archive_read_next_header (/tmp/libarchive/bsdtar+0x6f08bd)
#8 0x52373f in read_archive (/tmp/libarchive/bsdtar+0x52373f)
#9 0x5257be in tar_mode_x (/tmp/libarchive/bsdtar+0x5257be)
#10 0x51daeb in main (/tmp/libarchive/bsdtar+0x51daeb)
#11 0x7fbdb27cab96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#12 0x41dd09 in _start (/tmp/libarchive/bsdtar+0x41dd09)
This was primarily done with afl and FairFuzz. Some early corpus entries
may have been generated by qsym.
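The defensive pattern behind the fix looks roughly like this; the function
name, signature, and error handling below are hypothetical, not the actual
7zip reader code:

    #include <stddef.h>
    #include <stdint.h>
    #include <zlib.h>

    /* Illustrative sketch: make sure the decoder really produced at
     * least 'minimum' bytes before hashing them, instead of letting
     * crc32() read past the end of the buffer. */
    static int checked_crc_update(uint32_t *crc, const unsigned char *buf,
        size_t got, size_t minimum)
    {
        if (buf == NULL || got < minimum)
            return -1;   /* treat a short read as a format error */
        *crc = (uint32_t)crc32((uLong)*crc, buf, (uInt)minimum);
        return 0;
    }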
Daniel Axtens [Tue, 1 Jan 2019 06:10:49 +0000 (17:10 +1100)]
iso9660: Fail when an expected Rockridge extension is missing
A corrupted or malicious ISO9660 image can cause read_CE() to loop
forever.
read_CE() calls parse_rockridge(), expecting a Rockridge extension
to be read. However, parse_rockridge() is structured as a while
loop starting with a sanity check, and if the sanity check fails
before the loop has run, the function returns ARCHIVE_OK without
advancing the position in the file. This causes read_CE() to retry
indefinitely.
Make parse_rockridge() return ARCHIVE_WARN if it didn't read an
extension. Speaking as someone with no real knowledge of the format,
this seems more apt than ARCHIVE_FATAL, but both call sites escalate
it to a fatal error immediately anyway.
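A rough sketch of the control flow; the struct, helpers and record layout
below are illustrative stand-ins, not the actual iso9660 reader code:

    #include <archive.h>   /* ARCHIVE_OK, ARCHIVE_WARN */

    struct rr_parser { const unsigned char *p, *end; };

    /* Hypothetical sanity check: at least one 4-byte record with a
     * plausible length field remains. */
    static int record_looks_sane(const struct rr_parser *st)
    {
        return st->end - st->p >= 4 && st->p[2] >= 4;
    }

    /* Hypothetical record parser: advance by the record's length byte. */
    static void parse_one_extension(struct rr_parser *st)
    {
        st->p += st->p[2];
    }

    static int parse_rockridge_sketch(struct rr_parser *st)
    {
        int parsed_any = 0;
        while (record_looks_sane(st)) {
            parse_one_extension(st);
            parsed_any = 1;
        }
        /* Returning ARCHIVE_OK without having parsed anything would make
         * the caller (read_CE()) retry at the same offset forever;
         * ARCHIVE_WARN breaks that loop, and both call sites escalate it
         * to a fatal error anyway. */
        return parsed_any ? ARCHIVE_OK : ARCHIVE_WARN;
    }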
Found with a combination of AFL, afl-rb (FairFuzz) and qsym.
ZIP reader: added support for XZ, LZMA, PPMD8 and BZIP2 decompression
This commit adds some support for extraction of '.zipx' files. Those
files are standard ZIP files whose entries can be compressed with a
different set of algorithms than standard '.zip' files use.
Support is still missing for the Deflate64, JPEG and WavPack algorithms.
Tim Kientzle [Sat, 15 Dec 2018 18:40:38 +0000 (10:40 -0800)]
Issue 1104: Explicitly limit the printed string to 12 characters
GCC8 tries to diagnose `snprintf()` overflows but isn't quite
smart enough for this case, so emits a false-positive warning.
Remember that `%12s` only specifies a minimum field width. GCC8
conservatively assumes this might result in writing the full length of
`date2`. (Which will never be longer than 12 bytes, but GCC8
apparently can't reason about `strftime` format specifiers yet.)
Changing the specifier here to `%12.12s` explicitly truncates to 12
bytes and should help the compiler understand that this will never
overflow.
While I'm here, correct a minor typo in the previous line; it used
`sizeof(date)` instead of `sizeof(date2)`. (Both are the same
size, so this had no functional impact.)
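For illustration (this is not the code from the issue), the precision form
caps how many characters snprintf() copies from the argument:

    #include <stdio.h>

    int main(void)
    {
        char buf[32];
        const char *date2 = "Jan  1  2019 and then some";

        /* "%12s" pads to at least 12 characters but still copies the
         * whole argument, so the result depends on strlen(date2). */
        snprintf(buf, sizeof(buf), "%12s", date2);
        printf("[%s]\n", buf);

        /* "%12.12s" additionally truncates to at most 12 characters,
         * so the formatted field can never exceed 12 bytes. */
        snprintf(buf, sizeof(buf), "%12.12s", date2);
        printf("[%s]\n", buf);
        return 0;
    }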
Daniel Axtens [Tue, 4 Dec 2018 05:33:42 +0000 (16:33 +1100)]
warc: consume data once read
The warc decoder only used read-ahead; it wouldn't actually consume
data that had previously been returned. This means that if an archive
specifies an invalid content length, the reader just returns the same
data over and over again until it hits the desired length.
This means that a WARC resource with e.g.
Content-Length: 666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666665
but only a few hundred bytes of data, causes a quasi-infinite loop.
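The general fix pattern, sketched with hypothetical stream primitives
(libarchive's internal read-ahead/consume helpers work along these lines,
but the names and types below are illustrative):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical stream with peek/consume semantics. */
    struct stream { const unsigned char *buf; size_t len, pos; };

    static const unsigned char *peek(struct stream *s, size_t *avail)
    {
        *avail = s->len - s->pos;
        return (*avail > 0) ? s->buf + s->pos : NULL;
    }

    static void consume(struct stream *s, size_t n) { s->pos += n; }

    /* Copy up to 'remaining' bytes of the entry body out of the stream.
     * Without the consume() call, every invocation would return the same
     * bytes again, and a huge Content-Length would never terminate. */
    static size_t read_body(struct stream *s, unsigned char *out,
        size_t outlen, size_t remaining)
    {
        size_t avail, take;
        const unsigned char *p = peek(s, &avail);
        if (p == NULL)
            return 0;                 /* end of input: stop, don't loop */
        take = avail < outlen ? avail : outlen;
        if (take > remaining)
            take = remaining;
        memcpy(out, p, take);
        consume(s, take);             /* the step missing in the old code */
        return take;
    }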
Daniel Axtens [Tue, 4 Dec 2018 03:29:42 +0000 (14:29 +1100)]
Skip 0-length ACL fields
Currently, it is possible to create an archive that crashes bsdtar
with a malformed ACL:
Program received signal SIGSEGV, Segmentation fault.
archive_acl_from_text_l (acl=<optimised out>, text=0x7e2e92 "", want_type=<optimised out>, sc=<optimised out>) at libarchive/archive_acl.c:1726
1726 switch (*s) {
(gdb) p n
$1 = 1
(gdb) p field[n]
$2 = {start = 0x0, end = 0x0}
Stop this by checking that the length is not zero before beginning
the switch statement.
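A rough sketch of the check; the field structure and return values are
illustrative, not the exact archive_acl.c code:

    #include <stddef.h>

    /* A parsed field is a [start, end) range; a malformed ACL can yield
     * an empty field whose pointers are NULL, as in the crash above. */
    struct acl_field { const char *start; const char *end; };

    static int classify_field(const struct acl_field *f)
    {
        const char *s = f->start;

        /* Skip 0-length fields: dereferencing *s would read through a
         * NULL (or otherwise invalid) pointer. */
        if (s == NULL || f->end == f->start)
            return -1;

        switch (*s) {
        case 'u': return 0;   /* e.g. a "user" entry -- illustrative */
        case 'g': return 1;   /* e.g. a "group" entry -- illustrative */
        default:  return -1;
        }
    }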
I am pretty sure this is the bug mentioned in the qsym paper [1],
and I was able to replicate it with a qsym + AFL + afl-rb setup.
Daniel Axtens [Mon, 3 Dec 2018 13:55:22 +0000 (00:55 +1100)]
rar: file split across multi-part archives must match
Fuzzing uncovered some UAF and memory overrun bugs where a file in a
single file archive reported that it was split across multiple
volumes. This was caused by ppmd7 operations calling
rar_br_fillup. This would invoke rar_read_ahead, which would in some
situations invoke archive_read_format_rar_read_header. That would
check the new file name against the old file name, and if they didn't
match up it would free the ppmd7 buffer and allocate a new
one. However, because the ppmd7 decoder wasn't actually done with the
buffer, it would continue to use the freed buffer. Both reads and
writes to the freed region can be observed.
This is quite tricky to solve: once the buffer has been freed it is
too late, as the ppmd7 decoder functions almost universally assume
success - there's no way for ppmd_read to signal error, nor are there
good ways for functions like Range_Normalise to propagate them. So we
can't detect after the fact that we're in an invalid state (e.g. by
checking rar->cursor); we have to prevent ourselves from ever ending up
there. So, when we are in the dangerous part of rar_read_ahead that
assumes a valid split, we set a flag to force read_header to either go
down the path for split files or bail. This means that the ppmd7
decoder keeps a valid buffer and just runs out of data.
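A rough sketch of the guard; the structure and names below are
hypothetical, not the actual rar reader code:

    /* Illustrative sketch of guarding the header path with a flag. */
    struct rar_state {
        unsigned char *ppmd_buffer;
        int filename_must_match;   /* set while reading a split block */
    };

    /* Called from the dangerous part of the read-ahead path that
     * assumes a valid split. */
    static void begin_split_read(struct rar_state *rar)
    {
        rar->filename_must_match = 1;
    }

    /* In read_header: if the file names don't match while the flag is
     * set, bail out instead of freeing and reallocating the ppmd7
     * buffer that the decoder is still using. */
    static int check_continuation(struct rar_state *rar, int names_match)
    {
        if (rar->filename_must_match && !names_match)
            return -1;    /* ARCHIVE_FATAL in the real reader */
        return 0;
    }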
The reader assumed it was running on a little-endian machine. The
commit changes direct memory reads to archive_le* function calls, which
should allow the reader to run on big-endian machines as well.
Changes were needed in the reader itself and in the file holding the
reader's test cases.
The commit also removes a warning encountered when compiling with GCC
8 on the PowerPC architecture.
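The idea, sketched with a local helper that mirrors what libarchive's
archive_le32dec() does (the helper and surrounding comments are
illustrative, not the reader's code):

    #include <stdint.h>

    /* Portable little-endian 32-bit read, equivalent in spirit to
     * libarchive's archive_le32dec(): it produces the same value on
     * little- and big-endian hosts alike. */
    static uint32_t le32dec_sketch(const void *pv)
    {
        const unsigned char *p = (const unsigned char *)pv;
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    /* The non-portable pattern being replaced:
     *     uint32_t v = *(const uint32_t *)p;
     * which yields byte-swapped values on big-endian machines. */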
Pavel Raiskup [Fri, 23 Nov 2018 13:08:48 +0000 (14:08 +0100)]
Fix use-after-free in delayed link processing (newc format)
During archiving, if some of the "delayed" hard link entries
happened to disappear from the filesystem (or become unreadable) for
some reason (most probably a race), the old code free()d the 'entry'
and continued with the loop; the next loop iteration, though,
dereferenced 'entry' and crashed the archiver.
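The failure mode and fix, sketched generically; the list structure and
names are illustrative, not the actual cpio code:

    #include <stdlib.h>

    struct delayed_link {
        struct delayed_link *next;
        /* ... entry data ... */
    };

    /* Stand-in for processing one delayed entry; returns nonzero if the
     * file vanished or became unreadable. */
    static int process_entry(struct delayed_link *e) { (void)e; return -1; }

    static void drain_links(struct delayed_link *head)
    {
        struct delayed_link *e = head;
        while (e != NULL) {
            /* Read the successor before any free(); the old code freed
             * 'e' and then still dereferenced it on the next iteration. */
            struct delayed_link *next = e->next;
            if (process_entry(e) != 0) {
                free(e);
                e = next;
                continue;
            }
            /* ... write the entry to the archive ... */
            e = next;
        }
    }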
reset CMAKE_REQUIRED_LIBRARIES before checking system headers
This fixes this CMake warning with CMake 3.12 and newer:
CMake Warning (dev) at /usr/share/cmake/Modules/CheckIncludeFiles.cmake:110 (message):
Policy CMP0075 is not set: Include file check macros honor
CMAKE_REQUIRED_LIBRARIES. Run "cmake --help-policy CMP0075" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.
CMAKE_REQUIRED_LIBRARIES is set to:
/usr/lib/liblzma.so
For compatibility with CMake 3.11 and below this check is ignoring it.
Call Stack (most recent call first):
CMakeLists.txt:602 (CHECK_INCLUDE_FILES)
CMakeLists.txt:609 (LA_CHECK_INCLUDE_FILE)
Martin Matuska [Thu, 25 Oct 2018 22:48:19 +0000 (00:48 +0200)]
RAR5 reader: fix build errors on some FreeBSD platforms
- "index" shadows a global declaration on powerpc(64), mips(64) and sparc64
- avoid uninitialized size_t on riscv64
Martin Matuska [Sat, 6 Oct 2018 20:13:44 +0000 (22:13 +0200)]
Add information about BLAKE2 multi-license to COPYING
The BLAKE2 source files are multi-licensed with the ability to choose
between CC0 1.0 Universal, OpenSSL or Apache 2.0 licenses. For libarchive
the CC0 1.0 Universal Public Domain Dedication should be acceptable.
Fixed broken unit tests in other parts of the project.
The problem was a bad return value from the set_option() function. This
commit changes that return value to ARCHIVE_WARN, as there are currently
no options to be handled by the decompressor.
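The convention, sketched with an illustrative handler (the name and
signature below are assumptions for the sketch, not the actual code):

    #include <archive.h>

    /* Illustrative option handler: with no options to handle, report
     * ARCHIVE_WARN ("option not handled here") rather than a hard
     * failure, so generic option strings passed by the test suite do
     * not abort unrelated tests. */
    static int sketch_set_option(const char *key, const char *value)
    {
        (void)key;
        (void)value;
        return ARCHIVE_WARN;
    }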
This is an initial implementation of a stream-oriented unpacker. Things
that should work (a minimal usage sketch follows the list):
- Extraction of any compression level (stored or compressed), file
enumeration, skipping through files,
- Support for any dictionary sizes,
- CRC32 and BLAKE2sp checksums,
- Solid archives,
- Multi-volume archives (part001, part002, etc.),
- Solid multi-volume archives,
- DELTA, x86 and ARM filter support: other filters are not used
in version 5 of the format.
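For reference, a minimal sketch of reading such an archive through the
public API; the file name and the fixed block size are illustrative:

    #include <archive.h>
    #include <archive_entry.h>
    #include <stdio.h>

    int main(void)
    {
        struct archive *a = archive_read_new();
        struct archive_entry *entry;

        archive_read_support_filter_all(a);
        archive_read_support_format_rar5(a);  /* or archive_read_support_format_all() */

        /* "example.rar" is an illustrative file name. */
        if (archive_read_open_filename(a, "example.rar", 10240) != ARCHIVE_OK) {
            fprintf(stderr, "%s\n", archive_error_string(a));
            return 1;
        }
        while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
            printf("%s\n", archive_entry_pathname(entry));
            archive_read_data_skip(a);        /* enumerate without extracting */
        }
        archive_read_free(a);
        return 0;
    }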