Martin Matuska [Mon, 3 Jun 2019 21:33:49 +0000 (23:33 +0200)]
Minor bsdtar.1 manpage fixes
- the -p option does not restore owner by default.
- the -n option was listed twice
- file flags are called file attributes on Linux and are platform-specific
Mike Frysinger [Wed, 22 May 2019 04:04:35 +0000 (09:49 +0545)]
simplify gitignore a bit
Let's ignore autotools-generated files (.la .dirstamp .deps) everywhere
rather than in hardcoded specific subdirs. We'll never add files with
those names to the source repo, so that should be OK.
We're already ignoring CMakeFiles/ everywhere (since the rule lacks
a leading / anchor), so we can delete the redundant paths.
Rather than hardcode every possible unittest and related files, add
globs that ignore all *_test related paths in the topdir. We won't
be adding paths like that to the source repo, so it should be OK.
Mike Frysinger [Fri, 17 May 2019 10:53:18 +0000 (22:53 +1200)]
zip: check filename crc in Info-ZIP Unicode Path Extra Field
The 0x7075 extension includes a crc of the filename that's in the CDE
to make sure that the UTF8 string is always up to date. If an older
tool updates the CDE but doesn't update the 0x7075 field, we want to
ignore the UTF8 string since it's stale.
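The staleness check described above can be sketched roughly as follows. This assumes the 0x7075 payload layout from the ZIP APPNOTE (one version byte, a little-endian CRC-32 of the header filename, then the UTF-8 name). The names zip_crc32() and unicode_path_is_current() are hypothetical and are not libarchive functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by ZIP. */
static uint32_t
zip_crc32(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (crc ^ 0xFFFFFFFFu);
}

/*
 * Hypothetical helper: returns 1 if the CRC stored in the 0x7075 payload
 * matches the filename from the CDE, 0 if the field is stale or too short.
 * Payload layout: version (1 byte), name CRC-32 (4 bytes LE), UTF-8 name.
 * The version byte is not examined in this sketch.
 */
static int
unicode_path_is_current(const unsigned char *field, size_t field_len,
    const unsigned char *cde_name, size_t cde_name_len)
{
	uint32_t stored;

	if (field_len < 5)
		return (0);
	stored = (uint32_t)field[1] | ((uint32_t)field[2] << 8) |
	    ((uint32_t)field[3] << 16) | ((uint32_t)field[4] << 24);
	return (stored == zip_crc32(cde_name, cde_name_len));
}
```

A caller would use the UTF-8 name only when the helper returns 1 and fall back to the CDE filename otherwise.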
Martin Matuska [Sat, 11 May 2019 19:36:39 +0000 (21:36 +0200)]
CI: increase make command output verbosity
Add Fedora_29_distcheck task with "make distcheck"
Add support for debug build with address sanitizer
Add missing free to tar/test/test_option_C_mtree.c
Martin Matuska [Sat, 11 May 2019 00:36:53 +0000 (02:36 +0200)]
RAR reader: fix use after free
If read_data_compressed() returns ARCHIVE_FAILED, the caller is allowed
to continue with next archive headers. We need to set rar->start_new_table
after the ppmd7_context got freed, otherwise it won't be allocated again.
RAR5 reader: fix a potential SIGSEGV on 32-bit builds
The reader crashed with a SIGSEGV when a file declared a specific
dictionary size. Dictionary sizes above 0xFFFFFFFF bytes overflow the
size_t type on 32-bit builds. If the file declared a dictionary size of
0x100000000 (that is, UINT_MAX+1), the window_size variable effectively
held the value 0. Later, the memory allocation function skipped the
actual allocation of 0 bytes, but the reader still tried to unpack the
data.
This commit limits the dictionary window size buffer to 64MB, so it
always fits in a size_t variable, and disallows a zero dictionary size
for files in the header processing stage.
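As a rough illustration of the overflow and the guard, the sketch below models the 32-bit size_t with a uint32_t. The function names and the constant are hypothetical. Rejecting oversized values (rather than clamping them) is an assumption of this sketch, since the text above only says the size is limited.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_WINDOW (64ull * 1024 * 1024)	/* 64MB limit from the text */

/* What a 32-bit size_t would keep of a 64-bit declared size. */
static uint32_t
sketch_truncate_to_32bit(uint64_t declared)
{
	return ((uint32_t)declared);
}

/*
 * Hypothetical header-stage validation: a zero size and anything above
 * the limit are refused before any buffer is allocated.
 */
static int
sketch_check_window_size(uint64_t declared, size_t *out)
{
	if (declared == 0 || declared > SKETCH_MAX_WINDOW)
		return (-1);
	*out = (size_t)declared;	/* always fits after the check */
	return (0);
}
```

The check is done on the 64-bit value, before any conversion to size_t, so the truncated zero never reaches the allocator.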
One unit test had to be modified after this change.
RAR5 reader: don't try to unpack entries marked as directories
The RAR5 structure contains two places where a file can be marked as a
directory. The first is the file_flags field in the FILE and SERVICE
base blocks, and the second is the file_attributes bitfield in the same
base blocks.
The first directory flag was used to decide whether the reader should
allocate any memory for the dictionary buffer needed to unpack the
files. If the file is actually a directory, there should be nothing to
unpack, so when a file was marked as a directory here, the reader did
not allocate any dictionary buffer.
The second directory flag was used to indicate which file attributes
should be passed to the caller. This second flag was therefore the
actual indicator of what the caller should do during archive unpacking:
treat the entry as a directory, or treat it as a file.
Because of this, it was possible to declare a file as a directory in
the file_flags field but not in the second field, while also adding a
compressed stream to the FILE/SERVICE base block. This led to a
condition where the reader tried to use unallocated or already freed
memory (because it did not allocate a new dictionary buffer due to the
directory flag set in file_flags).
This commit makes the reader check whether it is about to decompress a
FILE/SERVICE block that has been declared as a directory in the
file_flags field. If so, it returns an ARCHIVE_FAILED code, because
decompressing such a block is not a valid action, and such blocks
shouldn't exist in valid archives at all.
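A minimal sketch of such a guard is shown below. The struct, the flag bit, and the return codes are hypothetical stand-ins defined only for this example; they are not the libarchive definitions.

```c
#include <stdint.h>

/* Hypothetical constants for this sketch; not libarchive definitions. */
#define SKETCH_OK		0
#define SKETCH_FAILED		(-25)
#define SKETCH_FLAG_DIRECTORY	0x0001u

struct sketch_entry {
	uint64_t file_flags;	/* flags from the FILE/SERVICE base block */
};

/*
 * Called right before decompression starts. No dictionary buffer was
 * allocated for an entry flagged as a directory, so unpacking it must
 * be refused instead of touching that buffer.
 */
static int
sketch_begin_unpack(const struct sketch_entry *e)
{
	if ((e->file_flags & SKETCH_FLAG_DIRECTORY) != 0)
		return (SKETCH_FAILED);
	return (SKETCH_OK);
}
```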
Also added a unit test for this issue.
This should fix OSSFuzz issue #14574.
This commit also affected some of the other unit tests, because it
turned out that the sample files used in those tests also had
inconsistent directory flags in the file_flags and file_attributes
fields. Some assertions in the affected test cases have therefore been
relaxed, but they remain functional.
Converted space indentation to tabs in RAR reader, ZIP reader tests
All of libarchive uses tab characters to indent scopes. The RAR5 reader
and the RAR5 reader tests were using space characters for indentation.
Additionally, the ZIP reader tests were using space indentation in
specific places, while most of the file used tab characters.
This commit converts space indentation characters to tabs.
RAR5 reader: fix invalid type used for dictionary size mask.
This commit fixes places where the window_mask variable, which is needed
to perform operations on the dictionary circular buffer, was cast to
an int variable.
In files that declare a dictionary buffer size of 4GB, window_mask has a
value of 0xFFFFFFFF. If this value is assigned to an int variable, the
variable effectively contains -1. Any cast to a 64-bit value then
sign-extends the int variable to 0xFFFFFFFFFFFFFFFF. This was happening
during a read operation from the dictionary. Such an invalid
window_mask did not guard against buffer underflow.
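The widening problem can be reproduced in isolation, as in the sketch below. Both function names are hypothetical. Converting 0xFFFFFFFF to int is implementation-defined in C; the sketch assumes the usual two's complement result of -1.

```c
#include <stdint.h>

/* Buggy pattern: the mask passes through an int before being widened. */
static uint64_t
sketch_widen_mask_via_int(uint32_t window_mask)
{
	int narrow = (int)window_mask;	/* 0xFFFFFFFF typically becomes -1 */

	return ((uint64_t)(int64_t)narrow);	/* sign-extended to 64 bits */
}

/* Fixed pattern: the mask stays unsigned all the way. */
static uint64_t
sketch_widen_mask_unsigned(uint32_t window_mask)
{
	return ((uint64_t)window_mask);
}
```

With the sign-extended mask, a 64-bit position ANDed with it is left unchanged, so the mask no longer confines the index to the window.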
This commit should fix the OSSFuzz issue #14537.
The commit also contains a test case for this issue.
RAR5 reader: handle a case with truncated huffman tables.
The RAR5 reader assumed that the block contains the full huffman table
data. In invalid files that declare the existence of huffman tables but
also declare a block size too small to fit them, the RAR5 reader was
interpreting memory beyond the allocated block.
The commit adds the necessary buffer overflow checks and fails the
huffman table reading function when truncated data is detected.
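The kind of bounds check involved might look like the sketch below. The cursor struct and sketch_read_table() are hypothetical; the real reader's data structures are different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over one compressed block. */
struct sketch_block {
	const uint8_t *data;
	size_t size;	/* declared block size */
	size_t pos;	/* current read offset */
};

/* Copy n table bytes out of the block, or fail if the block is too short. */
static int
sketch_read_table(struct sketch_block *b, uint8_t *dst, size_t n)
{
	if (b->pos > b->size || n > b->size - b->pos)
		return (-1);	/* truncated: refuse to read past the block */
	memcpy(dst, b->data + b->pos, n);
	b->pos += n;
	return (0);
}
```

The comparison is written as n > size - pos rather than pos + n > size, so the check itself cannot overflow.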
The commit also provides a unit test for this case.
This commit fixes some undefined left-shift operations on types that
are too narrow for the requested shift count. Those invalid shifts were
triggered by invalid files produced by fuzzing.
The commit also contains two unit tests that ensure such problems won't
arise in the future.
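For context, shifting a value by a count equal to or larger than the width of its type is undefined behavior in C. A guarded shift could be sketched like this; sketch_shl32() is a hypothetical helper and not necessarily how the commit addressed the problem.

```c
#include <stdint.h>

/* Shift left only when the count is valid for a 32-bit operand. */
static int
sketch_shl32(uint32_t value, unsigned int count, uint32_t *out)
{
	if (count >= 32)
		return (-1);	/* shifting by >= the type width is undefined */
	*out = value << count;
	return (0);
}
```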
RAR5 reader: fix buffer overflow when parsing huffman tables.
RAR5 compresses its Huffman tables using an algorithm similar to Run
Length Encoding. When uncompressing those tables, the RAR5 reader
didn't perform enough checks to prevent a buffer overflow in some
cases.
This commit adds a check that prevents a buffer overflow in some
files.
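To illustrate the class of bug, the sketch below expands a simplified run-length input into a fixed-size table. The (value, count) pair encoding is hypothetical and much simpler than the real RAR5 scheme; only the bounds check is the point.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand (value, repeat count) pairs into table. Returns the number of
 * entries written, or -1 if a run would write past the end of the table.
 */
static int
sketch_expand_runs(const uint8_t *pairs, size_t npairs,
    uint8_t *table, size_t table_size)
{
	size_t i, w = 0;

	for (i = 0; i < npairs; i++) {
		uint8_t value = pairs[2 * i];
		size_t run = pairs[2 * i + 1];

		if (run > table_size - w)
			return (-1);	/* run would overflow the table */
		while (run-- > 0)
			table[w++] = value;
	}
	return ((int)w);
}
```

Without the check on run, a crafted repeat count near the end of the table would write beyond it.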
The commit also adds a unit test to guard against regression of this
issue.
This commit fixes a memory leak which is triggered by invalid files.
A sample test case that triggers the leak is provided by OSSFuzz #14470.
If the ZIPX file contains an LZMA stream, and this stream is invalid,
the reader allocated an LZMA decoding context which wasn't freed.
Later, when trying to unpack another LZMA stream, the context was
re-initialized by allocating a new context and overwriting the old
pointers to the unfreed memory, causing a memory leak.
After this commit, the LZMA stream context initialization function
checks whether a previous context is still allocated. If it is, the
reader frees that memory before allocating a new LZMA unpacking
context.
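The release-before-reinitialize pattern can be sketched without liblzma, using plain malloc-style allocation. All names below are hypothetical stand-ins for the reader state and its decoding context.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the reader state and its decoding context. */
struct sketch_lzma_ctx { int dummy; };
struct sketch_zip {
	struct sketch_lzma_ctx *ctx;
	int ctx_valid;
};

static int sketch_live_contexts;	/* allocations not yet freed */

static void
sketch_lzma_release(struct sketch_zip *zip)
{
	if (zip->ctx_valid) {
		free(zip->ctx);
		zip->ctx = NULL;
		zip->ctx_valid = 0;
		sketch_live_contexts--;
	}
}

static int
sketch_lzma_init(struct sketch_zip *zip)
{
	/* Free a context left behind by an earlier, failed stream. */
	sketch_lzma_release(zip);

	zip->ctx = calloc(1, sizeof(*zip->ctx));
	if (zip->ctx == NULL)
		return (-1);
	zip->ctx_valid = 1;
	sketch_live_contexts++;
	return (0);
}
```

Calling the init function twice in a row leaves exactly one live context, which is the property the fix is after.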
The commit also contains a test case with OSSFuzz sample #14470.
RAR5 reader: add support for 'version' extra field and ignore unknown fields.
This commit adds support for the VERSION extra field appended to the
FILE base block. This field adds version support for files inside the
archive. If the file name is 'abc' and its version is 15, libarchive
will unpack this file as 'abc;15'. The file name has to change because
the archive can hold multiple files with the same name and different
versions. So that the user is not confused about which file is which,
the RAR5 reader changes the name.
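The naming rule from the example above ('abc' with version 15 becomes 'abc;15') amounts to a formatted append. A minimal sketch follows, with a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build "<name>;<version>" in dst. Returns 0, or -1 if dst is too small. */
static int
sketch_versioned_name(char *dst, size_t dst_size, const char *name,
    uint64_t version)
{
	int n = snprintf(dst, dst_size, "%s;%llu", name,
	    (unsigned long long)version);

	return ((n < 0 || (size_t)n >= dst_size) ? -1 : 0);
}
```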
Also this commit contains a unit test for VERSION extra field support.
Another change this commit introduces is ignoring unknown extra
fields. Before this commit, the RAR5 reader failed to unpack the file
if it encountered an unknown field. But since the reader knows the
unknown field's size, it can skip the field and proceed with parsing
the structure. After this commit, the RAR5 reader skips and ignores
unknown fields.
Unknown fields that are skipped include fields in FILE's extra header,
as well as unsupported REDIR types.
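Skipping by size can be sketched as below. The record layout here (a one-byte size followed by a one-byte type) is a simplification invented for the example, and the type constant is hypothetical; real RAR5 headers use variable-length integers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FIELD_VERSION 0x04	/* the only type this sketch "knows" */

/*
 * Walk records laid out as: size (1 byte, counts type + payload),
 * type (1 byte), payload. Returns the number of known records seen,
 * or -1 if a record is empty or overruns the buffer.
 */
static int
sketch_walk_extra(const uint8_t *p, size_t len)
{
	size_t pos = 0;
	int known = 0;

	while (pos < len) {
		size_t size = p[pos++];

		if (size == 0 || size > len - pos)
			return (-1);	/* malformed or truncated record */
		if (p[pos] == SKETCH_FIELD_VERSION)
			known++;	/* a real reader would parse it here */
		/* Known or not, the size says where the next record starts. */
		pos += size;
	}
	return (known);
}
```

An unknown type no longer aborts parsing; only a record whose size does not fit the buffer does.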
RAR5 reader: fix ASan errors, fix OSSFuzz samples, add a unit test
This commit fixes errors reported by ASan, as well as fixes runtime
behavior of RAR5 reader on OSSFuzz sample files:
#12999, #13029, #13144, #13478, #13490
The root cause of these changes is that the merge_block() function was
sometimes called recursively. The function must not be used this way,
because a recursive call overwrites the global state that the function
uses. The commit ensures the function is not called recursively.
There is also one fix that changes some tabs to spaces, because the
whole file originally used space indentation.
Mike Frysinger [Mon, 27 Mar 2017 00:29:34 +0000 (20:29 -0400)]
support reading metadata from compressed files
The raw format provides very little metadata. Allow filters to pass
back state that they know about. With gzip, we know the original file
name, mtime, and file size. For now, we only pull out the first two
as those are available in the file header. The file size is in the file
trailer, so we'll have to add support for that later (if we can seek
the input).
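For reference, the two header fields mentioned above can be pulled out of a gzip member header as sketched below, following the layout in RFC 1952. The function name is hypothetical and this is not the libarchive filter code.

```c
#include <stddef.h>
#include <stdint.h>

#define GZ_FEXTRA 0x04	/* FLG bit: extra field present */
#define GZ_FNAME  0x08	/* FLG bit: original file name present */

/*
 * Extract mtime and the original name from a gzip member header.
 * *name points into buf, or is NULL when FNAME is not set.
 * Returns 0 on success, -1 on a bad or truncated header.
 */
static int
sketch_gzip_header(const uint8_t *buf, size_t len, uint32_t *mtime,
    const char **name)
{
	size_t pos = 10, end;	/* the fixed part of the header is 10 bytes */

	if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b)
		return (-1);
	*mtime = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
	    ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
	*name = NULL;
	if (buf[3] & GZ_FEXTRA) {	/* skip XLEN and the extra bytes */
		size_t xlen;

		if (len - pos < 2)
			return (-1);
		xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
		pos += 2;
		if (xlen > len - pos)
			return (-1);
		pos += xlen;
	}
	if (buf[3] & GZ_FNAME) {	/* zero-terminated original name */
		for (end = pos; end < len && buf[end] != 0; end++)
			;
		if (end == len)
			return (-1);	/* no terminator inside the buffer */
		*name = (const char *)(buf + pos);
	}
	return (0);
}
```

The uncompressed size is not available here: it sits in the member trailer, after the compressed data.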
RAR5 reader: invalid window buffer read in E8E9 filter
The E8E9 filter was accessing the window buffer with a direct memory
read. But since the window buffer is a circular buffer, some of its data
can span the end of the buffer and the beginning of the buffer. This
means that the window buffer must always be accessed through a reading
function that is aware of the fact that the window buffer is circular.
The commit replaces the direct memory read with access through the
circular_memcpy() function.
This fixes some edge cases where the E8E9 filter data (4 bytes) spans
the end of the window buffer and the beginning of the buffer. This
situation can happen in archives compressed with a small dictionary
size.
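A wrap-aware read can be sketched as follows. The helpers are hypothetical and are not the signature of libarchive's circular_memcpy(). The sketch assumes a power-of-two window size and a little-endian 4-byte operand.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes starting at logical position pos out of a circular
 * window. window_size must be a power of two so that (size - 1) works
 * as an index mask.
 */
static void
sketch_circular_read(uint8_t *dst, const uint8_t *window, size_t window_size,
    size_t pos, size_t len)
{
	size_t mask = window_size - 1;
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = window[(pos + i) & mask];
}

/* Read a 4-byte little-endian value that may wrap around the window end. */
static uint32_t
sketch_read_u32(const uint8_t *window, size_t window_size, size_t pos)
{
	uint8_t b[4];

	sketch_circular_read(b, window, window_size, pos, 4);
	return ((uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	    ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24));
}
```

A direct read of 4 bytes at position 6 of an 8-byte window would run past the buffer, while the masked index wraps to positions 0 and 1.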