Tim Kientzle [Sun, 4 Jan 2009 02:02:36 +0000 (21:02 -0500)]
Big decompression refactor, Phase 2.
First, I've changed terminology: the objects that were called
"sources" and "readers" in libarchive 2.6 are now called "filters" and
"filter bidders." I think this reads a lot better. It also more
cleanly conveys that filters can be stacked and are not limited
to decompression.
Filter objects are now created and owned by the archive_read core,
which attaches reblocking logic to the output of each one. This
allows filters to use the read_ahead/consume protocol to pull data
from upstream filters (the client itself is also handled as a filter,
one with no upstream). Peek/consume read semantics greatly simplify
bidding and should make it easier to handle other lookahead scenarios
such as concatenated gzip streams.
Bid protocol:
* Filters install bidders into the core.
* Bidder has a "bid" function which is handed an upstream filter to taste.
* Core creates a filter object for the winner and allows the filter
to initialize that object.
* Core repeats the bid with the new filter to build out filter streams.
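The bid step above can be sketched roughly as follows. This is an illustration only: the struct layouts, the names (struct filter, struct bidder, choose_bidder) and the "bits matched" scoring are assumptions made for the example, not the actual libarchive definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not libarchive's real structures. */
struct filter {
    const char *name;
    const unsigned char *data;  /* bytes this filter exposes for tasting */
    size_t len;
    struct filter *upstream;    /* NULL for the client */
};

struct bidder {
    const char *name;
    /* Returns a score > 0 if the bidder recognizes the upstream data. */
    int (*bid)(const struct filter *upstream);
};

static int bid_gzip(const struct filter *up)
{
    /* gzip streams start with the magic bytes 0x1f 0x8b. */
    if (up->len >= 2 && up->data[0] == 0x1f && up->data[1] == 0x8b)
        return 16;              /* assumed scoring: number of bits matched */
    return 0;
}

static int bid_bzip2(const struct filter *up)
{
    /* bzip2 streams start with "BZh". */
    if (up->len >= 3 && memcmp(up->data, "BZh", 3) == 0)
        return 24;
    return 0;
}

/* Let every bidder taste the upstream filter; return the best, or NULL. */
static const struct bidder *
choose_bidder(const struct bidder *bidders, size_t n, const struct filter *up)
{
    const struct bidder *best = NULL;
    int best_bid = 0;
    size_t i;

    assert(up != NULL);
    for (i = 0; i < n; i++) {
        int b = bidders[i].bid(up);
        if (b > best_bid) {
            best_bid = b;
            best = &bidders[i];
        }
    }
    return best;
}
```

In this sketch the core would create a filter for the winner, make it the new upstream, and call choose_bidder() again until no bidder returns a positive bid.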
Filter protocol:
* read() can use peek/consume to pull from upstream filter, returns
arbitrary blocks
* close() releases private data
* Optional skip() seeks forward in the stream.
The core obviously provides a lot more support in the way of reblocking
logic so that downstream filters can use simple peek/consume semantics
while upstream filters can provide arbitrary blocks without regard
for downstream needs. The lazy reblocking makes this pretty efficient
in practice.
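A minimal sketch of the peek/consume idea, assuming a hypothetical upstream that returns small arbitrary blocks and a fixed-size reblocking buffer. The names (struct upstream, struct reblock, peek, consume) are invented for the example. Unlike the lazy reblocking described above, this version always copies into its buffer, which keeps the example short but is less efficient.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upstream that hands out its data in small arbitrary blocks. */
struct upstream {
    const char *data;
    size_t len, pos;
    size_t block;               /* max bytes per read; assumed <= 128 */
};

static size_t upstream_read(struct upstream *u, const char **out)
{
    size_t n = u->len - u->pos;

    if (n > u->block)
        n = u->block;
    *out = u->data + u->pos;
    u->pos += n;
    return n;                   /* 0 means end of data */
}

/* Reblocking state that the (hypothetical) core keeps per filter. */
struct reblock {
    struct upstream *up;
    char buf[256];
    size_t avail;               /* valid bytes at the start of buf */
};

/* Return a pointer to at least `min` contiguous bytes, or NULL at EOF. */
static const char *peek(struct reblock *r, size_t min, size_t *avail)
{
    assert(min <= 128 && r->up->block <= 128);  /* so a block always fits */
    while (r->avail < min) {
        const char *p;
        size_t n = upstream_read(r->up, &p);

        if (n == 0)
            break;
        memcpy(r->buf + r->avail, p, n);
        r->avail += n;
    }
    *avail = r->avail;
    return r->avail >= min ? r->buf : NULL;
}

/* Discard `n` bytes that were previously returned by peek(). */
static void consume(struct reblock *r, size_t n)
{
    assert(n <= r->avail);
    memmove(r->buf, r->buf + n, r->avail - n);
    r->avail -= n;
}
```

A downstream reader can ask for exactly the bytes it needs (for example, a header) without caring how the upstream happened to chunk its output.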
Upcoming: I think the core should provide a "prepare" stage that's called
before the first read. Concatenated gzip streams would go back to prepare
after close. It will require a little thought to properly separate the
init() and prepare() operations.
Tim Kientzle [Thu, 1 Jan 2009 02:07:29 +0000 (21:07 -0500)]
Create test_option_s to exercise basic -s handling.
In particular, add some checks for multiple -s options in order
to verify a reported bug where multiple -s options had no effect.
A bad typo was causing the substitution-matching loop to always exit
after checking the first match.
Fixed.
Thanks to Wayne Marshall for reporting this problem.
Tim Kientzle [Thu, 1 Jan 2009 00:57:15 +0000 (19:57 -0500)]
The code to support concatenated gzip streams is broken and
it breaks some non-concatenated streams. So just disable this
by always marking EOF immediately when we hit the end of a
gzip stream.
Tim Kientzle [Wed, 31 Dec 2008 07:20:56 +0000 (02:20 -0500)]
Skip testing character conversion failures on platforms
where the "C" locale never generates such failures.
(Cygwin apparently has an overly-permissive "C" locale;
wctomb() never fails.)
Tim Kientzle [Wed, 31 Dec 2008 06:56:54 +0000 (01:56 -0500)]
If zlib/bzlib don't exist, use stub implementations of
archive_write_set_compression_gzip/archive_write_set_compression_bzip2
that always return errors.
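The stub pattern looks roughly like this. Everything here is a stand-in: the SKETCH_ names, the struct, and the SKETCH_HAVE_ZLIB_H macro are assumptions for illustration, not the real libarchive symbols or configure macros.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_OK      0
#define SKETCH_FATAL (-30)      /* assumed error code for the example */

/* Hypothetical archive handle that only carries an error message. */
struct sketch_archive {
    char error[128];
};

#ifdef SKETCH_HAVE_ZLIB_H
/* The real zlib-backed implementation would be compiled here. */
#else
/* Stub: same entry point, but it always reports failure. */
int sketch_write_set_compression_gzip(struct sketch_archive *a)
{
    assert(a != NULL);
    snprintf(a->error, sizeof(a->error),
        "gzip compression not supported on this platform");
    return SKETCH_FATAL;
}
#endif
```

Keeping the symbol available means callers still link without zlib and get a clear runtime error instead of an unresolved reference.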
Tim Kientzle [Mon, 29 Dec 2008 06:47:01 +0000 (01:47 -0500)]
Visual Studio lacks getopt(). Fortunately, the
test suite only requires fairly basic argument parsing,
so just add the dozen lines needed to implement it from
scratch.
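For reference, a basic option parser of this kind can be sketched as below. This is not the code that was committed; the mini_ names are made up to avoid clashing with a system getopt(), and the sketch deliberately handles only "-x", "-x arg" and "-xarg" (no bundled flags such as "-ab", no long options).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, chosen to avoid clashing with a system getopt(). */
static int mini_optind = 1;
static const char *mini_optarg = NULL;

/*
 * Returns the next option character, '?' for an unknown option or a
 * missing argument, or -1 when there are no more options.
 */
static int mini_getopt(int argc, char *const argv[], const char *optstring)
{
    const char *arg, *spec;

    assert(optstring != NULL);
    mini_optarg = NULL;
    if (mini_optind >= argc)
        return -1;
    arg = argv[mini_optind];
    if (arg[0] != '-' || arg[1] == '\0')
        return -1;              /* first non-option stops parsing */
    mini_optind++;
    if (strcmp(arg, "--") == 0)
        return -1;              /* explicit end of options */
    spec = (arg[1] == ':') ? NULL : strchr(optstring, arg[1]);
    if (spec == NULL)
        return '?';
    if (spec[1] == ':') {
        if (arg[2] != '\0')
            mini_optarg = arg + 2;              /* "-rvalue" */
        else if (mini_optind < argc)
            mini_optarg = argv[mini_optind++];  /* "-r value" */
        else
            return '?';                         /* argument missing */
    }
    return (unsigned char)arg[1];
}
```

After the loop ends, mini_optind indexes the first non-option argument, as with the standard getopt().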
Tim Kientzle [Mon, 29 Dec 2008 01:59:44 +0000 (20:59 -0500)]
Compatibility, based on Windows compatibility fixes from Michihiro NAKAJIMA:
* We don't actually need the name we were run as; we can just use our expected program name (removes need to parse argv[0])
* Add some platform-specific setup on Windows
* Use getcwd() instead of trying to fork pwd shell command; this may need a bit more tweaking on Windows
Tim Kientzle [Mon, 29 Dec 2008 00:45:58 +0000 (19:45 -0500)]
Use native conversion functions on Windows which
handle UTF-16 encoding correctly.
I have two concerns about the current code:
* Windows isn't unique in using UTF-16. It's just a
couple lines of code to combine surrogate pairs
prior to converting to UTF-8. It's another couple
of lines to expand surrogate pairs when converting
from UTF-8 (but only if sizeof(wchar_t) == 2).
* mbtowc() isn't thread-safe. Although there are
parts of libarchive that aren't thread safe---in
particular, archive_write_disk has some umask()
and chdir() calls that are hard to avoid---most
of libarchive can be made thread safe with a little
care. I originally switched this code to mbtowc()
style for reasons of portability and error handling,
so some care is needed.
Tim Kientzle [Mon, 29 Dec 2008 00:36:39 +0000 (19:36 -0500)]
Handle surrogate pairs properly when encoding UTF-8.
In particular, this gives us correct encoding of non-BMP
values on platforms such as Windows whose native wide
character representation is UTF-16.
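As an illustration of the surrogate handling, the sketch below combines a high/low surrogate pair into one code point before emitting UTF-8. The function name and the use of unsigned short for UTF-16 code units are choices made for this example; the libarchive code works on wchar_t and differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode UTF-16 code units as UTF-8. Returns the number of bytes
 * written, or (size_t)-1 on an unpaired surrogate or a full buffer.
 */
static size_t utf16_to_utf8(const unsigned short *in, size_t n,
    unsigned char *out, size_t outsize)
{
    size_t i, o = 0, need;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        unsigned long c = in[i];

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00UL);
            i++;
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return (size_t)-1;  /* stray low surrogate */
        }

        need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (outsize - o < need)
            return (size_t)-1;
        if (need == 1) {
            out[o++] = (unsigned char)c;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Without the pairing step, each surrogate would be encoded separately as a 3-byte sequence, which is not valid UTF-8 for a non-BMP character.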
Tim Kientzle [Mon, 29 Dec 2008 00:24:22 +0000 (19:24 -0500)]
Use MultiByteToWideChar() on Windows instead of mbstowcs().
I'm not yet convinced about this approach. Microsoft documents
mbstowcs(); I'd prefer to avoid platform conditionals.
Tim Kientzle [Mon, 29 Dec 2008 00:13:57 +0000 (19:13 -0500)]
Updated Visual Studio solution files, including
projects for libarchive_test. I'm still working
through the rest of Michihiro's fixes for getting
libarchive_test working under Visual Studio.
Tim Kientzle [Sun, 28 Dec 2008 23:54:08 +0000 (18:54 -0500)]
Skip a test for handling of invalid characters if the
local platform's "C" locale has no invalid characters.
In particular, this fixes libarchive_test on Cygwin.
Tim Kientzle [Sat, 27 Dec 2008 22:06:33 +0000 (17:06 -0500)]
IFC: Various style corrections to libarchive_test: Use more
informative assertXxxx macros; rework some tests so they give
up before failures get out of hand.
Tim Kientzle [Thu, 25 Dec 2008 14:41:32 +0000 (09:41 -0500)]
The EXT2 ioctls are used on Linux to get/set file flags.
The header defining these exists on some Cygwin installations,
but it's broken. I don't think Cygwin supports these ioctls
anyway, so I don't see any point in including the header there.
Someone with more autoconf-fu than I have probably knows a better
solution to this problem.
Tim Kientzle [Thu, 25 Dec 2008 14:31:49 +0000 (09:31 -0500)]
Straighten out the close handling. archive_read_close() now
walks the decompression filter list, invoking the close handler
on each one. In particular, this means that the compress handler
should not recursively invoke close on its source.
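The close walk can be pictured like this. The sketch_ types and functions are hypothetical and only illustrate the division of labor: each handler releases its own data, and the core is the only place that follows the upstream links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter node; not the real libarchive structure. */
struct sketch_filter {
    struct sketch_filter *upstream;             /* NULL for the last one */
    int (*close)(struct sketch_filter *self);   /* releases own data only */
    int closed;                                 /* counts close calls */
};

static int sketch_filter_close(struct sketch_filter *self)
{
    assert(self != NULL);
    /* Must NOT call self->upstream->close(); the core does the walk. */
    self->closed++;
    return 0;
}

/* Core: close every filter exactly once; return the lowest status seen. */
static int sketch_close_all(struct sketch_filter *f)
{
    int r = 0;

    while (f != NULL) {
        /* Save the link first, in case a close handler frees its node. */
        struct sketch_filter *up = f->upstream;

        if (f->close != NULL) {
            int r1 = f->close(f);
            if (r1 < r)
                r = r1;
        }
        f = up;
    }
    return r;
}
```

If a handler also closed its source, that source would be closed twice once the core reached it, which is the problem the commit avoids.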
Tim Kientzle [Sun, 21 Dec 2008 19:02:56 +0000 (14:02 -0500)]
Reduce the number of file patterns tested from 200 to 170.
This seems to be necessary in order to run the tests on Cygwin. (?)
Submitted by: Michihiro NAKAJIMA