Tim Kientzle [Mon, 30 Dec 2013 04:57:17 +0000 (20:57 -0800)]
Support experimental "at" extra block for better streaming.
The writer now writes this extra block with every local
file header; the reader recognizes it and uses it.
This allows streaming extraction to properly restore
file permissions and symlinks.
Without this, streaming extraction of Zip archives is
somewhat hobbled by the lack of full information in
the local file header.
Here's a detailed description of the new extra block.
The details here are subject to change at any time.
-Extended Local File Header Extra Field (0x7461):
The following is the layout of the extended local file
header "extra" block. This allows information to be
included with the local file header that could previously
only be stored with the central directory file header.
Note: all fields stored in Intel low-byte/high-byte order.
Value          Size      Description
-----          ----      -----------
0x7461         2 bytes   Tag for this "extra" block type
Size           2 bytes   Size of this "extra" block
Version
Made By        2 bytes   See "Version Made By" above
Internal File
Attributes     2 bytes   See "Internal File Attributes" above
External File
Attributes     4 bytes   See "External File Attributes" above
This extra block should only be used with the local
file header. The values stored should exactly match
the corresponding values in the central directory
file header.
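As a rough illustration of the layout above, here is a minimal sketch that serializes and parses the block in little-endian order. The names (at_extra, at_extra_write, at_extra_read) are hypothetical and not libarchive's. The sketch also assumes that, as with other Zip extra fields, "Size" counts only the 8 payload bytes that follow the 4-byte tag/size pair; the description does not state this explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AT_EXTRA_TAG      0x7461u /* tag from the description above */
#define AT_EXTRA_PAYLOAD  8u      /* 2 + 2 + 4 bytes; assumed meaning of "Size" */

/* Hypothetical in-memory form of the block's three fields. */
struct at_extra {
	uint16_t version_made_by;
	uint16_t internal_attr;
	uint32_t external_attr;
};

static void put_le16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
	put_le16(p, (uint16_t)(v & 0xffff));
	put_le16(p + 2, (uint16_t)(v >> 16));
}

static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)get_le16(p) | ((uint32_t)get_le16(p + 2) << 16);
}

/* Serialize the block; returns bytes written, or 0 if buf is too small. */
size_t at_extra_write(unsigned char *buf, size_t len, const struct at_extra *x)
{
	assert(buf != NULL && x != NULL);
	if (len < 4 + AT_EXTRA_PAYLOAD)
		return 0;
	put_le16(buf, AT_EXTRA_TAG);
	put_le16(buf + 2, AT_EXTRA_PAYLOAD);
	put_le16(buf + 4, x->version_made_by);
	put_le16(buf + 6, x->internal_attr);
	put_le32(buf + 8, x->external_attr);
	return 4 + AT_EXTRA_PAYLOAD;
}

/* Parse the block; returns 1 on success, 0 if it is not a complete "at" block. */
int at_extra_read(const unsigned char *buf, size_t len, struct at_extra *x)
{
	assert(buf != NULL && x != NULL);
	if (len < 4 + AT_EXTRA_PAYLOAD)
		return 0;
	if (get_le16(buf) != AT_EXTRA_TAG || get_le16(buf + 2) < AT_EXTRA_PAYLOAD)
		return 0;
	x->version_made_by = get_le16(buf + 4);
	x->internal_attr = get_le16(buf + 6);
	x->external_attr = get_le32(buf + 8);
	return 1;
}
```

Stored low byte first, the tag 0x7461 appears in the file as the ASCII bytes "at", which is where the block's nickname comes from.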
Tim Kientzle [Sat, 28 Dec 2013 09:45:32 +0000 (01:45 -0800)]
Test for large Zip archives, following code for large Tar test.
Fix several bugs:
* comparison function for ordering entries in reader was wrong
* writer wasn't including 64-bit sizes for entries of exactly 0xffffffff bytes
Also, add options to suppress CRC calculations and checks
(otherwise, this test spends a *lot* of time in CRC routines).
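The 0xffffffff fix listed above comes down to an off-by-one in the threshold: Zip uses 0xffffffff in a 32-bit size field as the marker for "the real value is in the Zip64 extra data", so a size of exactly 0xffffffff cannot be stored directly. A minimal sketch of that check, with hypothetical helper names that are not taken from the writer's code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: does this size need a 64-bit field? */
int needs_zip64_size(uint64_t size)
{
	/* >= rather than >: 0xffffffff itself is reserved as the marker. */
	return size >= UINT64_C(0xffffffff);
}

/* Hypothetical helper: value to place in the 32-bit size field. */
uint32_t zip32_size_field(uint64_t size)
{
	if (needs_zip64_size(size))
		return 0xffffffffu;
	assert(size < UINT64_C(0xffffffff));
	return (uint32_t)size;
}
```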
Tim Kientzle [Thu, 26 Dec 2013 21:59:55 +0000 (13:59 -0800)]
Fix all current tests:
* When searching for start of Central directory, DTRT if there are no entries.
* Find end-of-archive marker when it is exactly at end of file
* Recognize zip64 central directory locator (not yet fully parsed)
* Don't return entry size if length-at-end, even if size looks reasonable
Trying to assign -1 (ARCHIVE_READ_FORMAT_ENCRYPTION_DONT_KNOW)
to a data type we're not sure will always be signed is a bad idea
(blew up on ARM, for instance).
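The commit does not say which type was involved. Purely as an illustration, the sketch below uses an 8-bit unsigned type, which is how plain char behaves on common ARM ABIs, to show how -1 stops comparing equal to itself once it has been stored in such a type. The macro name is shortened and the helper names are hypothetical.

```c
#include <assert.h>

#define ENCRYPTION_DONT_KNOW (-1) /* value given in the text above */

/* Store the value in an unsigned 8-bit type and read it back. */
int roundtrip_unsigned(void)
{
	unsigned char state = (unsigned char)ENCRYPTION_DONT_KNOW;
	return state; /* wraps to 255 when unsigned char is 8 bits wide */
}

/* Store the value in a type that is always signed. */
int roundtrip_int(void)
{
	int state = ENCRYPTION_DONT_KNOW;
	return state; /* still -1 */
}
```

A check such as `state == ENCRYPTION_DONT_KNOW` would therefore fail in the first case and succeed in the second.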
NiLuJe [Sun, 22 Dec 2013 23:04:38 +0000 (00:04 +0100)]
More MinGW trickery...
Poor man's attempt at making 4e002d9a92ecd7cec0fb98b0bedbace8aad81f6e
play nice with MinGW.
Dry coded for the fastest solution, there's probably a much better
way to handle that sanely & properly.
Fixes a build failure on MinGW:
CC tar/bsdtar-bsdtar.o
In file included from tar/bsdtar_platform.h:62:0,
from tar/bsdtar.c:26:
./libarchive/archive_entry.h:250:64: error: unknown type name 'BY_HANDLE_FILE_INFORMATION'
__LA_DECL void archive_entry_copy_bhfi(struct archive_entry *, BY_HANDLE_FILE_INFORMATION *);
^
Tim Kientzle [Fri, 20 Dec 2013 05:17:23 +0000 (21:17 -0800)]
Zip and Rar store file times in local time,
so we can't verify them in tests (since the
time varies depending on the time zone where
the tests are being run).
Konrad Kleine [Fri, 13 Dec 2013 08:28:55 +0000 (09:28 +0100)]
Include coverage script if ENABLE_COVERAGE is set
Rather than including the LibarchiveCodeCoverage.cmake file and checking
there if ENABLE_COVERAGE is set, we only include the file if this option
is set in the first place.
Hans Johnson [Thu, 16 May 2013 16:26:46 +0000 (11:26 -0500)]
ENH: Allow fine-grained control over dependencies
An end user may want to explicitly avoid using
a feature that can be automatically found on the
build system. This arose when the build machine
had libraries for LZMA but the target machine
did not have those libraries available.
With the new flags, the optional features provided by
LZMA/ZLIB/BZip2/EXPAT/PCREPOSIX/LibGCC
can be explicitly disabled (the default behavior is
to use whatever can be found).
Brad King [Mon, 9 Dec 2013 19:03:43 +0000 (14:03 -0500)]
libarchive: Use ARCHIVE_LITERAL_ULL to add ULL integer suffix
The macro maps to an implementation that works on older compilers when
necessary. Convert the 0ULL literal introduced by commit 6cf33c93
(Issue 320: Rewrite (again) to avoid the left shift that CLang dislikes
so much, 2013-12-07) to use the macro.
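For readers unfamiliar with this kind of macro, here is one way a suffix-adding macro could be defined. MY_LITERAL_ULL and my_u64 are hypothetical stand-ins and not libarchive's actual definitions; the old-MSVC branch is an assumption about which compilers need the alternative spelling.

```c
#include <assert.h>

#if defined(_MSC_VER) && _MSC_VER < 1300
/* Assumed: very old MSVC lacks the ULL suffix but accepts ui64. */
typedef unsigned __int64 my_u64;
#define MY_LITERAL_ULL(x) x##ui64
#else
typedef unsigned long long my_u64;
#define MY_LITERAL_ULL(x) x##ULL
#endif

/* A constant that does not fit in 32 bits, written portably. */
my_u64 four_gib(void)
{
	assert(sizeof(my_u64) >= 8);
	return MY_LITERAL_ULL(0x100000000);
}
```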
Tim Kientzle [Sun, 1 Dec 2013 22:45:56 +0000 (14:45 -0800)]
Refactor Zip writer.
Zip writer no longer preserves full archive_entry objects for
every entry; it just accumulates the actual bytes to be put
into the central directory. Most of the central directory file
header is formatted at the same time as the local file header.
The header formatting is refactored to make it easier to support
variable-length extra data.
The tests are adjusted to match the new output: We include more
detailed extra data in the central directory, we're more selective
about generating data descriptors (they're not needed for directory
entries, for instance), UT extra data now includes only the time
fields specified by the user, we're setting the "version required"
field more accurately.
There are some initial attempts to include Zip64 extensions
when appropriate; that still needs lots of work. I'm not
yet sure how to test Zip64 support without generating gigantic
archives. Hmmm...
Tim Kientzle [Tue, 26 Nov 2013 18:03:01 +0000 (10:03 -0800)]
Support Zip64 extra data fields for handling large entries.
Process extra data fields for central directory and local file headers
so we get correct full size information in both cases.
Correct central directory vs. local file header sanity check
to compare full size information (including data picked out of
the extra data).
Note: This does not yet support the Zip64 end-of-central-directory
marker so doesn't correctly handle very large archives.
Tim Kientzle [Tue, 26 Nov 2013 17:57:03 +0000 (09:57 -0800)]
Start refactoring Zip writer:
* Build list of entries for Central directory at entry_finish
(So we can switch in-memory Central dir to a list of binary blobs.)
* Rename some variables to clarify the code.
* Add 'zip64' option to force zip64 extensions for testing
Tim Kientzle [Sun, 24 Nov 2013 16:44:34 +0000 (08:44 -0800)]
Issue 332: Be more careful guessing file mode information from
incomplete Zip archives. In particular, some epub files have
0 in the file type part of the mode field.
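The commit message does not spell out the new heuristic. As one plausible fallback, and only as a sketch, a reader could treat a zero file-type field as "unknown" and guess from the entry name. The function name and the trailing-slash rule are assumptions, not the actual fix.

```c
#include <assert.h>
#include <string.h>

#define MODE_IFMT  0170000u /* file type bits of a Unix mode */
#define MODE_IFREG 0100000u
#define MODE_IFDIR 0040000u

/* Hypothetical helper: fill in a file type when the archive left it as 0. */
unsigned guess_mode(unsigned mode, const char *name)
{
	size_t n;

	assert(name != NULL);
	if ((mode & MODE_IFMT) != 0)
		return mode; /* the archive supplied a file type; keep it */
	n = strlen(name);
	if (n > 0 && name[n - 1] == '/')
		return mode | MODE_IFDIR; /* assumed: trailing '/' marks a directory */
	return mode | MODE_IFREG;
}
```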
Konrad Kleine [Wed, 9 Oct 2013 11:53:15 +0000 (13:53 +0200)]
Better archive_read_has_encrypted_entries function
These are the possible return values:
=====================================
ARCHIVE_READ_FORMAT_ENCRYPTION_UNSUPPORTED (-2):
------------------------------------------------
This is the default return value for readers that don't support
encryption detection at all. When this value is returned you can be sure
that it will always be returned, even on later calls to
archive_read_has_encrypted_entries.
ARCHIVE_READ_FORMAT_ENCRYPTION_DONT_KNOW (-1):
----------------------------------------------
This is the default return value for readers that support encryption. It
means that the reader is not yet ready (e.g. not enough data has been
read so far) to say whether there are encrypted entries in the archive.
As the reader consumes more data, this value might change to 0 or 1.
0:
--
No encrypted entries have been found yet (maybe there will be one when
reading the next header). When 0 is returned, a later call might
return 1.
1:
--
At least one encrypted entry was found. Once 1 is returned, it will
always be returned, even if another entry is not encrypted.
NOTE:
=====
If the metadata/header of an archive is also encrypted, you
cannot rely on the number of encrypted entries. That is why this
function does not return the number of encrypted entries but
just 1 to show that there are some. Also, two distinct readers might
detect the number of entries and their encryption status at different
times. If one reader can say how many files are encrypted after reading
the first header another reader might need more data. In the end the
number of encrypted entries might be the same in two archives while the
appropriate readers output different results at different points in
time. This is more confusing than helpful.
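The four return values described above differ mainly in whether a later call can give a different answer. A small sketch of how a caller might encode that; the two macro values are the ones given above, while the helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#ifndef ARCHIVE_READ_FORMAT_ENCRYPTION_UNSUPPORTED
#define ARCHIVE_READ_FORMAT_ENCRYPTION_UNSUPPORTED (-2)
#define ARCHIVE_READ_FORMAT_ENCRYPTION_DONT_KNOW (-1)
#endif

/* Returns 1 if the value can no longer change on later calls. */
int encryption_answer_is_final(int r)
{
	return r == ARCHIVE_READ_FORMAT_ENCRYPTION_UNSUPPORTED || r == 1;
}

/* Human-readable form of the return value, e.g. for logging. */
const char *encryption_answer_text(int r)
{
	switch (r) {
	case ARCHIVE_READ_FORMAT_ENCRYPTION_UNSUPPORTED:
		return "reader cannot detect encryption";
	case ARCHIVE_READ_FORMAT_ENCRYPTION_DONT_KNOW:
		return "not enough data read yet";
	case 0:
		return "no encrypted entries seen so far";
	case 1:
		return "at least one encrypted entry";
	default:
		return "unexpected value";
	}
}
```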
Konrad Kleine [Mon, 1 Jul 2013 15:12:24 +0000 (17:12 +0200)]
Detect encrypted archive entries (ZIP, RAR, 7Zip)
With this change you can detect if an archive entry is encrypted. The
archive formats covered with this change are: ZIP, RAR, and 7zip. Other
formats can be added quite simply by looking at the already supported
formats. For all the already supported formats we have tests that check
if:
* an archive entry's data is encrypted (data test)
* an archive entry's metadata is encrypted (header test)
* one file is encrypted while another is not (partial test)
The following new functions are introduced:
int archive_read_format_capabilities(struct archive*)
Returns a bitmask of capabilities that are supported by the archive
format reader. If the reader has no special capabilities,
ARCHIVE_READ_FORMAT_CAPS_NONE is returned.
You can call this function even before reading the first header from
an archive.
Return Values:
* ARCHIVE_READ_FORMAT_CAPS_ENCRYPT_DATA
The reader supports detection of encrypted data.
* ARCHIVE_READ_FORMAT_CAPS_ENCRYPT_METADATA
The reader supports detection of encrypted metadata (e.g.
filename, modification time, size, etc.)
* ARCHIVE_READ_FORMAT_CAPS_NONE
None of the above capabilities. If this value is returned, it
doesn't mean that the format itself doesn't support any type of
encryption; it simply means that the reader is not capable of
detecting it.
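Because the result is a bitmask, a reader may report both capabilities at once, and each should be tested with a bitwise AND. A minimal sketch follows. The numeric bit values are assumed here for illustration (real code should take them from archive.h), and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed values for a self-contained example; use archive.h in real code. */
#ifndef ARCHIVE_READ_FORMAT_CAPS_NONE
#define ARCHIVE_READ_FORMAT_CAPS_NONE 0
#define ARCHIVE_READ_FORMAT_CAPS_ENCRYPT_DATA (1 << 0)
#define ARCHIVE_READ_FORMAT_CAPS_ENCRYPT_METADATA (1 << 1)
#endif

/* Returns 1 if the reader can detect encrypted entry data. */
int can_detect_encrypted_data(int caps)
{
	return (caps & ARCHIVE_READ_FORMAT_CAPS_ENCRYPT_DATA) != 0;
}

/* Returns 1 if the reader can detect encrypted entry metadata. */
int can_detect_encrypted_metadata(int caps)
{
	return (caps & ARCHIVE_READ_FORMAT_CAPS_ENCRYPT_METADATA) != 0;
}
```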
int archive_read_has_encrypted_entries(struct archive *)
Returns "true" (non-zero) if the archive contains at least one
encrypted entry, no matter which encryption type (data or metadata)
is used; otherwise 0 is returned.
You should only call this function after reading the first header
from an archive.
NOTE: I'm not sure that this function will stay in for long.
int archive_entry_is_data_encrypted(struct archive_entry*)
Returns "true" (non-zero) if the archive entry's data is encrypted;
otherwise 0 is returned. You can call this function after calling
archive_read_next_header().
int archive_entry_is_metadata_encrypted(struct archive_entry*)
Returns "true" (non-zero) if the archive entry's metadata is
encrypted; otherwise 0 is returned. You can call this function after
calling archive_read_next_header().
int archive_entry_is_encrypted(struct archive_entry*)
Returns "true" (non-zero) if the archive entry's data and/or
its metadata is encrypted; otherwise 0 is returned. You can call
this function after calling archive_read_next_header().
If you use archive_read_format_capabilities() in combination with one of
the archive_entry_is_[data|metadata]_encrypted() functions, you can be
sure that you've encountered an encrypted entry. This allows you to
react differently depending on libarchive's return codes. For instance,
you might want to skip extracting encrypted files until
decryption support has been implemented.
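The skip-encrypted-files policy mentioned above might be expressed roughly as follows. This is only a sketch of the decision logic: the enum and function names are hypothetical, reader_can_detect stands for a capabilities result other than ARCHIVE_READ_FORMAT_CAPS_NONE, and entry_is_encrypted stands for the result of archive_entry_is_encrypted().

```c
#include <assert.h>

/* Hypothetical outcomes for one archive entry. */
enum entry_action {
	ENTRY_EXTRACT,            /* reader can detect encryption, entry is clear */
	ENTRY_SKIP,               /* entry is known to be encrypted */
	ENTRY_EXTRACT_UNVERIFIED  /* reader cannot tell; a 0 flag proves nothing */
};

enum entry_action choose_action(int reader_can_detect, int entry_is_encrypted)
{
	if (entry_is_encrypted)
		return ENTRY_SKIP;
	if (!reader_can_detect)
		return ENTRY_EXTRACT_UNVERIFIED;
	return ENTRY_EXTRACT;
}
```

The third outcome exists because a reader without detection support reports every entry as unencrypted, so the caller may want to warn rather than assume the data is readable.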
Here's how I generated the 7zip test files:
-------------------------------------------