Improve test_read_format_zip_filename for the Windows platform.
- Some tests now work on both POSIX and Windows platforms.
- Add test_read_format_zip_filename_CP866_CP1251_win to check the automatic filename translation.
Fix the automatic filename translation on Windows and add a test for it.
When reading CP866 filenames from a zip file on a Russian Windows platform,
whose ACP differs from its OEMCP, we should automatically translate
CP866 (the Russian OEMCP) filenames to CP1251 (the Russian ACP) filenames, because
other archivers on Windows store filenames in CP866
via WideCharToMultiByte() with CP_OEMCP.
Roman Neuhauser [Sat, 9 Apr 2011 08:06:57 +0000 (04:06 -0400)]
test_archive_string.c: drop assertEqualArchiveString
the interesting bit about the archive_string* returning functions
is the returned address (these must return their first argument),
the value pointed at is tested separately.
Using mbrtowc/wcrtomb is slower than using WideCharToMultiByte/MultiByteToWideChar,
so we should use WideCharToMultiByte/MultiByteToWideChar and
simulate the CRT's mbrtowc/wcrtomb handling in the "C" locale.
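A rough sketch of what simulating "C" locale handling could look like. This assumes the MSVC CRT "C" locale maps each byte to the wchar_t of the same value (0x00-0xFF) and rejects wide characters above 0xFF with EILSEQ; c_locale_mbrtowc and c_locale_wcrtomb are hypothetical names, not libarchive functions, and the shift-state argument is omitted because this model is stateless.

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical sketch, assuming the MSVC CRT "C" locale behaviour:
 * each byte maps to the wchar_t with the same value (0x00-0xFF). */
static size_t
c_locale_mbrtowc(wchar_t *pwc, const char *s, size_t n)
{
	if (s == NULL)
		return 0;
	if (n == 0)
		return (size_t)-2; /* incomplete: no bytes available */
	if (pwc != NULL)
		*pwc = (wchar_t)(unsigned char)*s;
	return (*s == '\0') ? 0 : 1;
}

/* Reverse direction: wide characters above 0xFF have no single-byte
 * form in this model, so they are rejected with EILSEQ. */
static size_t
c_locale_wcrtomb(char *s, wchar_t wc)
{
	if (s == NULL)
		return 1;
	if ((unsigned long)wc > 0xFF) {
		errno = EILSEQ;
		return (size_t)-1;
	}
	*s = (char)wc;
	return 1;
}
```

Because the mapping is a plain byte-to-value copy, it can be done without any locale lookup, which is the point of avoiding the CRT calls.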
The following tests now run on all platforms, not only Windows.
test_read_format_zip_filename_CP932_CP932
test_read_format_zip_filename_UTF8_CP932
test_read_format_zip_filename_CP866_CP1251
test_read_format_zip_filename_KOI8R_CP1251
test_read_format_zip_filename_UTF8_CP1251
Simplify sparse test code.
Prevent the current locale state from affecting the handling of a multi-byte
current working directory name during directory traversals on Windows.
Using archive_strcat() in archive_string_append_from_wcs_to_mbs() is slightly inefficient,
because archive_strcat() has to scan for the terminating NUL. We can trust the contents
of the buffer and already know its length, so we should use archive_string_append() instead.
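A minimal sketch of the difference, using a hypothetical struct strbuf loosely modelled on struct archive_string; strbuf_append and strbuf_cat are illustrative names, not libarchive's implementation. The cat-style call pays for an extra strlen() pass, while the append-style call copies a length the caller already has.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable string, loosely modelled on
 * struct archive_string (s, length, buffer_length). */
struct strbuf {
	char *s;
	size_t length;
	size_t buffer_length;
};

/* Append exactly n bytes; the caller already knows the length. */
static struct strbuf *
strbuf_append(struct strbuf *as, const char *p, size_t n)
{
	if (as->length + n + 1 > as->buffer_length) {
		size_t newsize = (as->length + n + 1) * 2;
		char *ns = realloc(as->s, newsize);
		if (ns == NULL)
			return NULL;
		as->s = ns;
		as->buffer_length = newsize;
	}
	memcpy(as->s + as->length, p, n);
	as->length += n;
	as->s[as->length] = '\0';
	return as;
}

/* strcat-style variant: must first scan p for its terminating NUL. */
static struct strbuf *
strbuf_cat(struct strbuf *as, const char *p)
{
	return strbuf_append(as, p, strlen(p));
}
```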
Use mbstowcs() in __la_win_permissive_name() instead of MultiByteToWideChar() with CP_ACP, and
reduce the effect of the locale on the functions that use __la_win_permissive_name().
On Windows, make the MBS<==>WCS conversion respect setlocale().
Using _get_current_locale() worked well, but msys does not provide it; using CP_ACP
mostly worked well, but it ignored the locale set by setlocale().
While looking for the best solution to that issue, I found that
setlocale(LC_CTYPE, NULL) always returns the current codepage in a form like
"English_United States.1252", so we can get the current codepage through
setlocale().
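A small sketch of extracting the codepage from such a locale name. codepage_from_locale_name is a hypothetical helper, not libarchive code; it assumes the codepage is the numeric suffix after the last '.', and returns 0 for names like "C" that carry no numeric suffix.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the codepage number from a Windows
 * locale name such as "English_United States.1252".
 * Returns 0 if no numeric codepage suffix is found. */
static unsigned
codepage_from_locale_name(const char *name)
{
	const char *dot;
	char *end;
	unsigned long cp;

	if (name == NULL)
		return 0;
	dot = strrchr(name, '.');
	if (dot == NULL || !isdigit((unsigned char)dot[1]))
		return 0;
	cp = strtoul(dot + 1, &end, 10);
	if (*end != '\0')
		return 0; /* trailing non-digits: not a plain codepage */
	return (unsigned)cp;
}
```

On Windows it would be called as codepage_from_locale_name(setlocale(LC_CTYPE, NULL)); a 0 result could fall back to GetACP() or "C" locale handling.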
Add filename conversion tests for Windows.
We have to check WCS (UTF-16) filenames in those tests because the MBS
byte sequence can differ when the current locale differs.
For example, CP932 MBS can also represent CP866 characters by using
escape sequences, so you can see a CP866-specific character in a non-CP866
codepage, but you cannot compare its byte sequence with the formal CP866
byte sequence because of the escape sequences.
Introduce
archive_string_default_conversion_for_read() and
archive_string_default_conversion_for_write()
for CP_ACP <==> CP_OEMCP conversion (CP_OEMCP being the charset of the archive file) on the Windows platform.
On non-Windows platforms those functions always return NULL.
Drop the use of the CRT function _get_current_locale(), which msys does not provide.
We should use GetACP() instead. This is simple and fast, and should work well
for applications using libarchive.
I changed my mind: it is better to use GetOEMCP() only in the reader and writer.
Allow hdrcharset=UTF-8 in the pax writer because libarchive_test needs it to check
whether the running platform supports the string conversion.
Copy lib-link.m4,lib-prefix.m4 and lib-ld.m4 from
http://git.savannah.gnu.org/cgit/gnulib.git/tree/m4 into build/autoconf
so that iconv.m4 can be used without errors on Ubuntu and other platforms.
Rename archive_string_append_from_unicode_to_mbs back to archive_string_append_from_wcs_to_mbs,
since we have changed the handling of WCS and no longer depend on a particular WCS format.
Rename the "charset" option to "hdrcharset" since the name "charset" is unclear.
In particular, when reading or writing the pax format, a "charset" option could refer to
the content of a file stored in the pax archive, whereas "hdrcharset" describes the
encoding of its metadata, such as the filename.
Roman Neuhauser [Mon, 4 Apr 2011 11:36:14 +0000 (07:36 -0400)]
archive_read_support_format_xar.c: multiple fixes based on Tim's comments
* various helper functions take archive_read* so that they can
call archive_set_error()
* the expat-based TOC parser should now behave correctly on errors
This change almost reverts r3148 and r3149, which were based on my misunderstanding.
We should allow only BINARY and UTF-8 (the default) as the charset for the pax header.
- The pax writer allows only the charset=BINARY option.
- On the reader side, the pax header encoding does not affect other tar headers, and
the pax header parser accepts only BINARY and "ISO-IR 10646 2000 UTF-8" for hdrcharset.
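A sketch of the reader-side rule above: only the two hdrcharset values named in this change are accepted. parse_hdrcharset and enum hdrcharset are hypothetical names for illustration, not the actual pax parser code.

```c
#include <string.h>

/* Hypothetical result of parsing a pax "hdrcharset" value. */
enum hdrcharset {
	HDRCHARSET_INVALID = -1,
	HDRCHARSET_UTF8 = 0,	/* default: header fields are UTF-8 */
	HDRCHARSET_BINARY = 1	/* header fields are raw bytes */
};

/* Accept only the two values the pax header parser allows. */
static enum hdrcharset
parse_hdrcharset(const char *value)
{
	if (value == NULL)
		return HDRCHARSET_INVALID;
	if (strcmp(value, "BINARY") == 0)
		return HDRCHARSET_BINARY;
	if (strcmp(value, "ISO-IR 10646 2000 UTF-8") == 0)
		return HDRCHARSET_UTF8;
	return HDRCHARSET_INVALID;
}
```

Because the result only describes the pax header, a caller would apply it to pax-supplied fields and leave the other tar header fields untouched.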
Improve the Windows version of archive_read_disk_entry_from_file() to avoid using the stat() family.
- Move the Windows version of archive_read_disk_entry_from_file() into
archive_read_disk_windows.c, because archive_read_disk_windows.c already has the same
utility functions, and calling archive_read_disk_entry_from_file() through its API from
directory traversals is inefficient.
- Use WCS for pathnames to get around the PATH_MAX limitation.
- Properly detect whether the file is a symbolic link, as lstat() does.
the last two calls were in __archive_read_get_bidder() and
__archive_read_register_format(). __archive_read_get_bidder()
required a change in interface (not really, but if it calls
archive_set_error(), it should return ARCHIVE_*); i added
a wrapper macro à la archive_check_magic() to keep it DRY.
some but not all archive_read_support_compression_*() functions
called archive_clear_error() right after __archive_read_get_bidder.
the reason isn't clear to me and no tests break without these
calls, so i've removed them.
Get the current codepage through _get_current_locale() and use it for
MultiByteToWideChar and WideCharToMultiByte instead of CP_OEMCP or CP_ACP.
This simulates the CRT version of mbrtowc/wcrtomb.
Use an archive_string_conv object in the test to determine whether the current locale is
UTF-8.
Delay the initialization of variables in strncpy_to_utf16be and
strncpy_from_utf16be until those are really needed.
maintain BC aliases (ARCHIVE_VERSION_NUMBER < 4000000). in fact,
the new names are wrappers around the old ones as i want to switch
the tests to the new names and have the old names still tested.
the wrappers revealed a mismatch between
archive_read_support_compression_program_signature declaration in
archive.h and its definition (const void* vs void*), i'm going with
const void*.
Improve character-set conversion functions.
- Change the interface to reduce the number of charset name
comparisons. The previous version compared them every time the functions
were called, which was very inefficient, so I have made a conversion
object, struct archive_string_conv, to resolve that issue.
- Integrate *_from_charset and *_to_charset into *_in_locale because
of the above.
- Integrate *_from_utf16be and *_to_utf16be into *_in_locale.
- On Windows, derive a codepage from a charset name to determine whether
the current codepage and the specified charset are the same.
Roman Neuhauser [Mon, 28 Mar 2011 09:18:49 +0000 (05:18 -0400)]
libarchive_changes.3: changes in libarchive interface
this is meant to be the target of 'man removed_function' and
'man deprecated_function'. we don't have the needed machinery (yet).
the FreeBSD-bundled copy of libarchive installs many links using
a FreeBSD-provided mechanism, we could probably get that into
Makefile.am (http://sources.redhat.com/automake/automake.html#Extending
sadly doesn't mention install-man in the list of extensible targets..).
not sure about cmake. spotted nothing interesting in the 2.8 manual,
google returned nothing relevant.
Roman Neuhauser [Mon, 28 Mar 2011 09:17:47 +0000 (05:17 -0400)]
archive_write_finish_entry.3: separate man page
this moves archive_write_finish_entry() out of sight a bit:
its description says it's not normally necessary, so the man page
is referenced only from archive_write_data.3 (for now)