Tim Kientzle [Thu, 1 Dec 2011 06:35:11 +0000 (01:35 -0500)]
Refactor the Zip header read:
* Move some common initialization into read_local_file_header
* Ensure that the file pointer is always pointing to the start of
the local file header when calling read_local_file_header
* Merge old search_next_signature into streamable_read_header
and simplify the resulting code
Tim Kientzle [Thu, 1 Dec 2011 05:35:22 +0000 (00:35 -0500)]
Issue 201: Since we're using the seeking Zip reader here, we only
have to suppress the actual read if we lack libz. We
don't expect an error when reading the next header.
If ENABLE_ICONV is OFF, clean up the ICONV variables in CACHE in case ENABLE_ICONV was changed from ON to OFF.
The leftover variables caused a build failure.
Add support for 7-Zip.
This new 7Zip reader supports COPY, Deflate, BZ2, LZMA and LZMA2.
It also supports BCJ + LZMA2, DELTA + LZMA and DELTA + LZMA2.
(BCJ + LZMA is not supported; there are some problems supporting it.)
This 7Zip reader cannot work on a stream because of the format:
its header is placed at the bottom of the archive file, so we
have to call seek().
Tim Kientzle [Sat, 26 Nov 2011 07:00:43 +0000 (02:00 -0500)]
When reading deflate data, we should still treat the "compressed size"
value as a limit instead of assuming the deflate stream will agree.
This fixes an internal assertion failure when trying to read certain
inconsistent Zip archives.
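A minimal sketch of the idea, not the actual libarchive code: before
handing buffered bytes to the inflater, clamp the count to what the
header's "compressed size" still allows. The helper name and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given `avail` buffered bytes and `remaining`
 * bytes left in this entry according to the header's compressed-size
 * field, return how many bytes may be fed to the inflater.
 */
size_t
clamp_to_compressed_size(size_t avail, int64_t remaining)
{
    assert(remaining >= 0);
    /* Never read past the declared compressed size, even if the
     * deflate stream itself has not signalled its end yet. */
    if ((uint64_t)avail > (uint64_t)remaining)
        return (size_t)remaining;
    return avail;
}
```

With this clamp, a deflate stream that disagrees with the header runs
out of input at the declared boundary instead of consuming bytes that
belong to the next entry.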
Improve the robustness of the ISO reader against corrupted ISO images.
- Do not allow a negative location when the file is not empty: the
value is too large and would produce a negative file offset, which the
ISO reader does not expect for non-empty files.
Tim Kientzle [Thu, 24 Nov 2011 08:56:20 +0000 (03:56 -0500)]
Issue 152: With a little work, we actually can extract
this type of archive even with the streaming reader.
The end-of-data marker used with length-at-end storage
is actually more distinctive than I had at first thought.
This commit changes the streaming reader to look
for the end-of-data marker when scanning uncompressed
bodies.
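The scan described above can be sketched as follows. This assumes the
end-of-data marker is the optional Zip data descriptor signature
"PK\007\010"; the commit message does not name it, and the function
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first "PK\007\010" in buf, or -1 if the
 * marker does not appear in the first `len` bytes.
 */
long
find_end_of_data_marker(const unsigned char *buf, size_t len)
{
    static const unsigned char sig[4] = { 'P', 'K', 0x07, 0x08 };
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + sizeof(sig) <= len; i++) {
        if (memcmp(buf + i, sig, sizeof(sig)) == 0)
            return (long)i;
    }
    return -1;
}
```

A real streaming reader would also have to handle a marker that
straddles two read buffers and would probably want to sanity-check the
CRC and size fields that follow it, since the same four bytes can occur
inside uncompressed file data.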
Issue 199: Improve robustness against corrupted ISO images.
- Back out r3823. It was not sufficient for such corrupted ISO images.
- Check "RE" extension strictly: stop just clearing "RE" extension
as if it were not defined.
- Check "CL" extension strictly:
* "CL" does not point at itself.
* "CL" does not point at its parents.
* A directory file cannot have "CL" extension.
* A file which has "CL" extension is not on the top level of
a directory tree.
Tim Kientzle [Wed, 23 Nov 2011 09:15:30 +0000 (04:15 -0500)]
Issue 152: Now that we have a seeking Zip reader that understands
Central Directory, we can take the sample Zip file from this issue
and build a test around it.
This mostly involves cleaning up the handling of data that appears
in both the central directory and the local file header. In
particular, some writers set crc and size fields to zero in the
local file header for no apparent reason. Fixing this also changes
a few tests that assumed the Zip reader didn't always have accurate
file information. It now does have accurate file information
from the central directory in the seek case.
Tim Kientzle [Wed, 23 Nov 2011 07:47:46 +0000 (02:47 -0500)]
When we do an implicit data-skip during a header request, be
a little more careful about errors:
* FATAL during the data-skip is very bad (obviously)
* EOF during the data-skip is very bad (EOF should only ever be
returned by next-header)
* EOF from the header fetch always has to be returned, even if we
drop a warning to do so. (In theory, we could defer the warning
until close, but that's a little weird, too.)
* Otherwise, return whichever is worst.
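The rules above can be sketched as a small function. The constants
mirror the return codes in libarchive's archive.h; the function name is
hypothetical, and mapping an EOF during the data-skip to ARCHIVE_FATAL
is an assumption about what "very bad" means here.

```c
#include <assert.h>

/* Values mirror the return codes defined in libarchive's archive.h. */
#define ARCHIVE_EOF      1
#define ARCHIVE_OK       0
#define ARCHIVE_WARN   (-20)
#define ARCHIVE_FAILED (-25)
#define ARCHIVE_FATAL  (-30)

/*
 * Hypothetical helper: merge the result of the implicit data-skip
 * with the result of the header fetch that follows it.
 */
int
combine_skip_and_header(int skip_r, int header_r)
{
    assert(skip_r <= ARCHIVE_EOF && header_r <= ARCHIVE_EOF);

    /* FATAL or EOF while skipping data is unrecoverable. */
    if (skip_r == ARCHIVE_FATAL || skip_r == ARCHIVE_EOF)
        return ARCHIVE_FATAL;
    /* EOF from the header fetch wins, even over a skip warning. */
    if (header_r == ARCHIVE_EOF)
        return ARCHIVE_EOF;
    /* Otherwise the more negative (worse) code is returned. */
    return skip_r < header_r ? skip_r : header_r;
}
```

Because the error codes are ordered by severity, "whichever is worst"
reduces to taking the smaller value once the EOF cases are out of the
way.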
Tim Kientzle [Mon, 21 Nov 2011 00:52:32 +0000 (19:52 -0500)]
Try to decrease the performance hit of seeking during the bid phase.
Tell each bidder about the best bid so far so it can decide
to do nothing. Reorder support_format_all so formats
with relatively inexpensive bidders will run first.
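A rough sketch of this bidding scheme, under stated assumptions: the
struct, the function names, and the bid values below are all
hypothetical and are not libarchive's internal API. Cheap bidders are
listed first, and every bidder sees the best bid so far so that an
expensive one can decline without doing any work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bidder: returns a bid, or -1 to decline. */
typedef int (*bid_fn)(const unsigned char *buf, size_t len, int best_bid);

struct bidder {
    const char *name;
    bid_fn bid;
};

int expensive_calls = 0; /* counts how often the costly path runs */

/* Cheap bidder: only looks at the first two bytes. */
int
cheap_bid(const unsigned char *buf, size_t len, int best_bid)
{
    (void)best_bid;
    return (len >= 2 && buf[0] == 'P' && buf[1] == 'K') ? 30 : -1;
}

/* Expensive bidder (imagine it seeks): skips work if already beaten. */
int
expensive_bid(const unsigned char *buf, size_t len, int best_bid)
{
    (void)buf;
    (void)len;
    if (best_bid >= 30)
        return -1;
    expensive_calls++;
    return 8;
}

/* Ordered so that inexpensive bidders run first. */
const struct bidder example_bidders[2] = {
    { "cheap", cheap_bid },
    { "expensive", expensive_bid },
};

/* Returns the index of the winning bidder, or -1 if none bid. */
int
pick_format(const struct bidder *bidders, size_t n,
    const unsigned char *buf, size_t len)
{
    int best_bid = -1, best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        int b = bidders[i].bid(buf, len, best_bid);
        if (b > best_bid) {
            best_bid = b;
            best = (int)i;
        }
    }
    return best;
}
```

In this sketch, input that starts with "PK" is claimed by the cheap
bidder and the expensive bidder returns immediately; only input that
no cheap bidder recognizes pays for the costly check.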
Tim Kientzle [Mon, 21 Nov 2011 00:12:15 +0000 (19:12 -0500)]
There are now two different Zip
readers. archive_read_support_format_zip() enables both of them:
* The "streamable" reader is the old one, which reads Zip archives
serially from the beginning. This is efficient but cannot use
information from the Central Directory (especially file mode). It
also has problems with SFX archives. (I've actually removed the
seek-ahead SFX support from this reader.)
* The new "seekable" reader reads the Central Directory first and
therefore has access to slightly more information about each file.
In particular, it can fully support SFX archives. But it requires
seek support, so cannot read Zip archives from pipes or other
non-seekable inputs.
The two implementations obviously share a lot of code.
There are still rough edges but this does basically work. I've added
one Zip test to verify that SFX archives can be read correctly only
when seek is supported, but more testing is needed (in particular, the
fuzz tester only exercises one of the two Zip readers).
There are some unpleasant consequences of having a format reader that
relies on seek support. In particular, there are some performance
headaches during bid phase because the seeking bidder thrashes
libarchive's internal read-ahead buffers. There are also some
potentially ugly problems with read alignment when seeking.
On Windows, we have written our own lseek, which can handle 64-bit offsets, and we use it.
If there is an obvious reason to use off_t, which is defined as a 32-bit offset, we can use int64_t
for lseek instead of off_t on Windows.
I stupidly misunderstood the handling of the CMake build options ENABLE_TAR, ENABLE_TAR_SHARED,
ENABLE_CPIO and ENABLE_CPIO_SHARED in r3240 and r3713. As a result, bsdtar and bsdcpio were
either disabled or failed their tests on Windows.
- Simplify CMake build options; all platforms have the same default build options.
- Back out r3713.
Tim Kientzle [Sat, 12 Nov 2011 05:26:31 +0000 (00:26 -0500)]
Disable OpenSSL probe on Darwin.
This avoids a painful deprecation build error when configure
finds RMD160 in OpenSSL.
We can re-enable this in the production version of libarchive 3.0.0,
since the production versions don't enable -Werror.
Tim Kientzle [Sat, 12 Nov 2011 05:21:58 +0000 (00:21 -0500)]
Issue 196: cpio format tests should verify that nlinks in the
archive matches what's on disk, instead of assuming historical
Unix behavior.
Thanks to: dpmcgee
Colin Percival [Thu, 10 Nov 2011 06:08:52 +0000 (01:08 -0500)]
Garbage collect NO_NAME. It hasn't been used since r423 (January 2009)
when the uname/gname lookups moved out of write.c to use the new
archive_read_disk API instead.
Tim Kientzle [Wed, 9 Nov 2011 05:05:57 +0000 (00:05 -0500)]
Issue 191: Update the manpage for archive_entry_stat to
clarify the fields actually copied.
While I'm here, fill in a few missing items and try to
make some of the descriptions a little more accurate.