Silent [Wed, 1 Jan 2025 16:31:35 +0000 (17:31 +0100)]
Fix a Y2038 bug by replacing `Int32x32To64` with regular multiplication (#2471)
The `Int32x32To64` macro internally truncates its arguments to int32, while
`time_t` is 64-bit on most/all modern platforms. Therefore, usage of
this macro creates a Year 2038 bug.
I detailed this issue a while ago in a writeup, and spotted the same
issue in this repository when updating the list of affected
repositories:
<https://cookieplmonster.github.io/2022/02/17/year-2038-problem/>
A few more notes:
1. I changed all uses of `Int32x32To64` en masse, even though at least
one of them was technically OK and used with int32 parameters only. IMO
better safe than sorry.
2. This is untested, but it's a small enough change that I hope the CI
success is a good enough indicator.
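A rough illustration of the truncation, assuming the common Unix-time to FILETIME conversion (multiply by 10,000,000 and add the 1601-to-1970 offset). The helper names are hypothetical, and `mul_32x32_to_64` is only a portable stand-in that mimics the narrowing described above, not the Windows macro itself:

```c
#include <stdint.h>

/* 100 ns ticks between 1601-01-01 and 1970-01-01. */
#define EPOCH_DIFF_100NS INT64_C(116444736000000000)

/* Hypothetical stand-in for a 32x32->64 multiply: both operands are
 * narrowed to 32 bits before multiplying, as the commit describes. */
static int64_t mul_32x32_to_64(int64_t a, int64_t b)
{
    return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
}

/* Old behaviour: the timestamp is narrowed to 32 bits first. */
static int64_t unix_to_filetime_truncating(int64_t t)
{
    return mul_32x32_to_64(t, 10000000) + EPOCH_DIFF_100NS;
}

/* New behaviour: plain 64-bit multiplication keeps the full value. */
static int64_t unix_to_filetime(int64_t t)
{
    return t * 10000000 + EPOCH_DIFF_100NS;
}
```

Both functions agree up to 2147483647 (the last second representable in int32) and diverge from 2147483648 onwards.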
Endianness is easy to determine at runtime, but detecting this a single
time and then reusing the cached result might require API changes.
However we can use compile-time detection for some known compiler macros
without API changes fairly easily. Let's start by enabling this for
Clang and GCC.
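A minimal sketch of the idea, assuming the `__BYTE_ORDER__` family of macros that GCC and Clang predefine. `HOST_LITTLE_ENDIAN` and both function names are made up for this example and are not the names used in the actual change:

```c
#include <stdint.h>

/* Compile-time detection via macros predefined by GCC and Clang. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define HOST_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_LITTLE_ENDIAN 0
#endif

/* Runtime fallback: inspect the first byte of a known 16-bit value. */
static int runtime_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Uses the compile-time answer when available, else the runtime probe. */
static int host_is_little_endian(void)
{
#ifdef HOST_LITTLE_ENDIAN
    return HOST_LITTLE_ENDIAN;
#else
    return runtime_is_little_endian();
#endif
}
```

With the compile-time branch the result is a constant, so the compiler can drop the unused byte-order path entirely.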
The test archive that contains this executable was created like so,
using 7-Zip 24.08:
`7zz a -t7z -m0=deflate -mf=ppc
libarchive/test/test_read_format_7zip_deflate_powerpc.7z hw-powerpc`
This test fails in the first commit in this PR, and passes in the second
commit.
tar: fix bug when -s/a/b/ used more than once with b flag (#2435)
When the -s/regexp/replacement/ option was used with the b flag more
than once, the result of the previous substitution was appended to the
previous subject instead of replacing it. Fixed it by making sure the
subject is made the empty string before the call to realloc_strcat().
That in effect makes it more like a realloc_strcpy(), but creating a new
realloc_strcpy() function for that one usage doesn't feel worth it.
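The shape of the fix can be sketched as follows. `append_str` and `assign_str` are hypothetical stand-ins written for this example; they are not bsdtar's `realloc_strcat()`:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical append helper: grows *dst and appends src to it. */
static int append_str(char **dst, const char *src)
{
    size_t old_len = (*dst == NULL) ? 0 : strlen(*dst);
    size_t add_len = strlen(src);
    char *p = realloc(*dst, old_len + add_len + 1);

    if (p == NULL)
        return -1;
    memcpy(p + old_len, src, add_len + 1); /* copies the NUL too */
    *dst = p;
    return 0;
}

/* Copy semantics built on the append helper: make the existing string
 * empty first, so the new value replaces the old one. */
static int assign_str(char **dst, const char *src)
{
    if (*dst != NULL)
        (*dst)[0] = '\0';
    return append_str(dst, src);
}
```

Without the truncation step, a second call would yield the old value followed by the new one, which matches the symptom described above.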
ci: use at most the number of make threads as there are cores on mac and linux github runners (#2437)
We previously told make to run as many threads as it likes on these CI
jobs, but that might sometimes hit resource limits like RAM or the
allowed number of open files.
These numbers were found experimentally by using `sysctl -n hw.ncpu` on
mac and `nproc` on linux.
This plumbing is required for cmake/ctest to recognise and report
skipped tests.
Now skipped tests in cmake ci jobs are reported like so:
```
Start 7: libarchive_test_acl_platform_posix1e_read
7/785 Test #7: libarchive_test_acl_platform_posix1e_read ................................***Skipped 0.02 sec
```
And there is a list of skipped tests shown at the end of the test run.
ci: make autoconf look for headers and libraries in /opt/homebrew if those directories exist (#2427)
Prior to this change, the ci autoconf jobs weren't looking for homebrew
headers or libraries unless pkg-config was used, so for example the
"MacOS (autotools)" ci job wasn't testing lz4 or zstd code.
ci: log bsdtar's version text, so we can see which support libraries were used (#2426)
A few of libarchive's CI jobs don't find all the local support libraries
that they could be using. This change makes it easier to see which of
them are used.
ci: find liblzma >= 5.6.3 on windows msvc tests (#2421)
We currently use XZ Utils 5.6.3 on windows CI jobs, but the Windows
(msvc) job which uses cmake seems to only be looking for the old
library name, liblzma.lib:
```
-- Looking for lzma_auto_decoder in C:/Program Files (x86)/xz/lib/liblzma.lib
-- Looking for lzma_auto_decoder in C:/Program Files (x86)/xz/lib/liblzma.lib - not found
-- Looking for lzma_easy_encoder in C:/Program Files (x86)/xz/lib/liblzma.lib
-- Looking for lzma_easy_encoder in C:/Program Files (x86)/xz/lib/liblzma.lib - not found
-- Looking for lzma_lzma_preset in C:/Program Files (x86)/xz/lib/liblzma.lib
-- Looking for lzma_lzma_preset in C:/Program Files (x86)/xz/lib/liblzma.lib - not found
-- Could NOT find LibLZMA (missing: LIBLZMA_HAS_AUTO_DECODER LIBLZMA_HAS_EASY_ENCODER LIBLZMA_HAS_LZMA_PRESET) (found version "5.6.3")
```
We need to update build/ci/github_actions/ci.cmd to look for lzma.lib
instead.
Julian Uy [Fri, 6 Dec 2024 15:57:27 +0000 (09:57 -0600)]
Add missing definition for getline polyfill (#2425)
The fallback for when `getline` is not implemented in libc was not
compiling due to the fact that the definition for it was missing, so add
the definition.
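For illustration only, a `getline` fallback could look roughly like this. The name `fallback_getline` and the `long` return type (in place of `ssize_t`) are assumptions made to keep the sketch portable; this is not the definition added by the commit:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line (including the trailing '\n', if any) into *lineptr,
 * growing the buffer as needed. Returns the number of bytes read, or -1
 * on EOF with nothing read, on error, or on invalid arguments. */
static long fallback_getline(char **lineptr, size_t *n, FILE *stream)
{
    size_t len = 0;
    int c;

    if (lineptr == NULL || n == NULL || stream == NULL)
        return -1;
    if (*lineptr == NULL || *n == 0) {
        char *p = realloc(*lineptr, 128);
        if (p == NULL)
            return -1;
        *lineptr = p;
        *n = 128;
    }
    while ((c = fgetc(stream)) != EOF) {
        if (len + 2 > *n) { /* need room for c and the NUL */
            char *p = realloc(*lineptr, *n * 2);
            if (p == NULL)
                return -1;
            *lineptr = p;
            *n *= 2;
        }
        (*lineptr)[len++] = (char)c;
        if (c == '\n')
            break;
    }
    if (len == 0)
        return -1;
    (*lineptr)[len] = '\0';
    return (long)len;
}
```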
Alexander Ziaee [Fri, 6 Dec 2024 15:50:06 +0000 (10:50 -0500)]
bsdtar.1: Mention rar support + manual page polish (#2423)
I have been using this for years without realizing it decompresses rar.
+ add rar to supported decompression formats
+ use section references to link sections (this makes them clickable in
GUIs)
+ add paragraph breaks for consistent spacing
+ pdtar is not this program, so use Sy per mdoc style guide
+ do almost the same in reverse for bsdtar
+ remove parenthetical around a complete sentence
Test with XZ Utils 5.6.3 on windows CI jobs (#2417)
This change fixes the autotools build to work with xz-utils 5.6.3, which
changed library names on windows, and fixes a couple of tests that I
noticed had dependencies on liblzma.
ljdarj [Sun, 17 Nov 2024 01:42:27 +0000 (02:42 +0100)]
Moving the tests' integer reading functions to test_utils. (#2410)
Move the tests' integer reading functions to test_utils so that all
tests use the same helpers, and switch the few tests that used the
archive_endian functions over to the test_utils helpers.
Tim Kientzle [Wed, 6 Nov 2024 21:21:54 +0000 (13:21 -0800)]
Ignore ustar size when pax size is present (#2405)
When the pax `size` field is present, we should ignore the size value in
the ustar header. In particular, this fixes reading pax archives created
by GNU tar with entries larger than 8GB.
Note: This doesn't impact reading pax archives created by libarchive
because libarchive uses tar extensions to store an accurate large size
field in the ustar header. GNU tar instead strictly follows ustar in
this case, which prevents it from storing accurate sizes in the ustar
header.
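Some background on the 8GB limit: the ustar size field is 12 bytes, which leaves room for 11 octal digits, so the largest strictly conforming value is 8^11 - 1 bytes. The precedence rule can be sketched like this; `parse_octal` and `effective_size` are hypothetical helpers for this example, not the reader's actual functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Parses an octal ustar numeric field, skipping leading spaces and
 * stopping at the first non-octal byte (NUL or space terminator). */
static int64_t parse_octal(const char *field, size_t len)
{
    int64_t v = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        v = v * 8 + (field[i] - '0');
    return v;
}

/* A pax `size` attribute, when present, overrides the ustar field. */
static int64_t effective_size(int has_pax_size, int64_t pax_size,
    const char ustar_size[12])
{
    if (has_pax_size)
        return pax_size;
    return parse_octal(ustar_size, 12);
}
```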
The two test archives that contain this executable were created like so,
using the https://github.com/tehmul/p7zip-zstd fork of 7-Zip:
`7z a -t7z -m0=zstd -mf=SPARC
libarchive/test/test_read_format_7zip_zstd_sparc.7z hw-sparc64`
`7z a -t7z -m0=lzma2 -mf=SPARC
libarchive/test/test_read_format_7zip_lzma2_sparc.7z hw-sparc64`
Two test files are required, because the 7zip reader code has two
different paths, one for lzma and one for all other compressors.
The test_read_format_7zip_lzma2_sparc test is expected to pass, because
LZMA BCJ filters are implemented in liblzma.
The test_read_format_7zip_zstd_sparc test is expected to fail in the
first commit, because libarchive does not currently implement the SPARC
BCJ filter. The second commit will make test_read_format_7zip_zstd_sparc
pass.
Dustin L. Howett [Tue, 22 Oct 2024 09:10:50 +0000 (04:10 -0500)]
write_xar: move libxml2 behind an abstraction layer (#1849)
This commit prepares the XAR writer for another XML writing backend.
Almost everything in this changeset leaves the code identical to how
it started, except for a new layer of indirection between the xar writer
and the XML writer.
The things that are not one-to-one renames include:
- The removal of `UTF8Toisolat1` for the purposes of validating UTF-8
- The writer code made a copy of every filename for the purposes of
checking whether it was Latin-1 stored as UTF-8. In xar, non-Latin-1
gets stored Base64-encoded.
- I've replaced this use because (1) it was inefficient and (2)
`UTF8Toisolat1` is a `libxml2` export.
- The new function has slightly different results than the one it is
replacing for invalid UTF-8. Namely, it treats illegal UTF-8 "overlong"
encodings of Latin-1 codepoints as _invalid_. It operates on the principle
that we can determine whether something is Latin-1 based entirely on how
long the sequence is expected to be.
- The move of `SetIndent` to before `StartDocument`, which the
abstraction layer immediately undoes. This is to accommodate XML writers
that require indent to be set _before_ the document starts.
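The length-based Latin-1 check described above can be sketched as follows. This is an illustration under stated assumptions, not the function from this changeset: the names are invented, and sequences outside the Latin-1 range are classified by their lead byte only, without full validation.

```c
#include <stddef.h>

enum latin1_check { INVALID_UTF8 = -1, NOT_LATIN1 = 0, IS_LATIN1 = 1 };

/* Code points U+0000..U+007F take one byte; U+0080..U+00FF take two
 * bytes with lead byte 0xC2 or 0xC3. Lead bytes 0xC0 and 0xC1 can only
 * start overlong encodings, so they are rejected as invalid. */
static enum latin1_check utf8_is_latin1(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];

        if (b < 0x80) {
            i += 1;
        } else if (b == 0xC2 || b == 0xC3) {
            if (i + 1 >= len || (s[i + 1] & 0xC0) != 0x80)
                return INVALID_UTF8; /* missing continuation byte */
            i += 2;
        } else if (b == 0xC0 || b == 0xC1) {
            return INVALID_UTF8;     /* overlong encoding */
        } else if (b >= 0xC4 && b <= 0xF4) {
            return NOT_LATIN1;       /* code point above U+00FF */
        } else {
            return INVALID_UTF8;     /* stray continuation or 0xF5..0xFF */
        }
    }
    return IS_LATIN1;
}
```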
ljdarj [Tue, 22 Oct 2024 08:58:22 +0000 (10:58 +0200)]
Adding XZ, LZMA, ZSTD and BZIP2 support to ZIP writer (#2284)
PPMD may come later, but I'd rather first iron out style issues with the
methods that only need wiring up libraries already used in libarchive
before tackling the ones that may require implementing algorithms as well.
dependabot[bot] [Sun, 13 Oct 2024 07:42:01 +0000 (09:42 +0200)]
CI: Bump the all-actions group across 1 directory with 4 updates (#2379)
Bumps the all-actions group with 4 updates:
actions/checkout from 4.1.6 to 4.2.1
actions/upload-artifact from 4.3.3 to 4.4.3
github/codeql-action from 3.25.6 to 3.26.12
ossf/scorecard-action from 2.3.3 to 2.4.0
Emil Velikov [Sun, 13 Oct 2024 03:54:16 +0000 (04:54 +0100)]
Convert the tools and respective tests to SPDX (#2317)
This is the first part of converting the project to use SPDX license
identifiers instead of the verbose license text.
The patches are semi-automated and I've gone through them manually to
ensure no license changes were made. That said, I would welcome another
pair of eyes, since I am only human.
See https://github.com/libarchive/libarchive/issues/2298
---------
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Update archive_private to avoid template keyword (#2342)
People really should never, ever, ever use libarchive internal headers. And they definitely should not expect libarchive internal headers to work in a C++ compiler. (C++ and C are really just not that compatible.)
However, people do a lot of things they shouldn't, so avoid the reserved C++ keyword `template`.
vcoxvco [Sun, 13 Oct 2024 00:44:32 +0000 (02:44 +0200)]
configure.ac,CMakeLists.txt: Add libbsd on Haiku for readpassphrase (#2352)
Followup from #2346
Add libbsd to make/cmake configuration for linking readpassphrase on
Haiku.
There may be a better way to do this for cmake; I'm not that familiar
with it.
Duncan Horn [Fri, 11 Oct 2024 06:30:25 +0000 (23:30 -0700)]
[7zip] Read/write symlink paths as UTF-8 (#2252)
I previously tried to find documentation on how symlinks are expected to
be stored in 7zip files, however the best reference I could find was
[here](https://py7zr.readthedocs.io/en/latest/archive_format.html). That
site suggests that symlink paths are stored as UTF-8 encoded strings:
Duncan Horn [Fri, 11 Oct 2024 06:25:47 +0000 (23:25 -0700)]
Update RAR5 code to report encryption (#2096)
Currently, the RAR5 code always reports
`ARCHIVE_READ_FORMAT_ENCRYPTION_UNSUPPORTED` for
`archive_read_has_encrypted_entries` and does not set any of the
entry-specific properties, even though it has enough information to
report this properly. Accurate reporting of encryption is very useful
for applications, because showing a message such as
"the archive is encrypted, but we don't currently support encryption" is
a lot better than a not generally useful `errno` value and a
non-localizable, confusing, and unpredictable error string.
ljdarj [Fri, 11 Oct 2024 06:18:55 +0000 (08:18 +0200)]
Change to Windows absolute symlinks. (#2362)
Change to read absolute symlinks as verbatim paths instead of NT paths:
as far as I can see, libarchive can deal with verbatim paths while it
can't with NT ones.
Michał Górny [Fri, 11 Oct 2024 06:17:01 +0000 (08:17 +0200)]
configure.ac: remove incorrect 4th argument to `AC_CHECK_FUNCS` (#2334)
Remove the incorrect 4th argument from `AC_CHECK_FUNCS` calls. The macro
uses only three arguments, so it was ignored anyway. Furthermore, in at
least one instance it was wrong, due to a typo in the `attr/xatr.h`
header name.
Tim Kientzle [Fri, 11 Oct 2024 06:16:12 +0000 (23:16 -0700)]
Don't crash on truncated tar archives (#2364)
The tar header parsing overhaul in #2127 introduced a systematic
mishandling of truncated files: the code incorrectly checks for whether
a given read operation failed, and ends up dereferencing a NULL pointer
in this case. I've gone back and double-checked how
`__archive_read_ahead` actually works (it returns NULL precisely when it
was unable to satisfy the read request) and reworked the error handling
for each call to this function in archive_read_support_format_tar.c.
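The calling pattern looks roughly like the sketch below. `mock_read_ahead` and `struct mock_reader` are a simplified mock written for this example; they mirror only the "NULL when the request cannot be satisfied" contract described above, not the real `__archive_read_ahead` signature:

```c
#include <stddef.h>
#include <string.h>

#define HEADER_SIZE 512

enum { READ_OK = 0, READ_TRUNCATED = -1 };

/* Mock input source over an in-memory buffer. */
struct mock_reader {
    const unsigned char *buf;
    size_t len;
    size_t pos;
};

/* Returns a pointer to at least `min` readable bytes, or NULL when
 * fewer than `min` bytes remain. */
static const void *mock_read_ahead(struct mock_reader *r, size_t min,
    size_t *avail)
{
    size_t remaining = r->len - r->pos;

    if (avail != NULL)
        *avail = remaining;
    if (remaining < min)
        return NULL;
    return r->buf + r->pos;
}

/* Checks for NULL before touching the data, so a truncated input
 * produces an error instead of a NULL dereference. */
static int read_header_block(struct mock_reader *r,
    unsigned char out[HEADER_SIZE])
{
    const void *p = mock_read_ahead(r, HEADER_SIZE, NULL);

    if (p == NULL)
        return READ_TRUNCATED;
    memcpy(out, p, HEADER_SIZE);
    r->pos += HEADER_SIZE;
    return READ_OK;
}
```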
Tim Kientzle [Fri, 11 Oct 2024 06:14:58 +0000 (23:14 -0700)]
Sanity-check gzip header field length (#2366)
OSS-Fuzz managed to construct a small gzip input that decompresses into
another gzip input with an extremely large filename field. This causes
libarchive to hang processing the inner gzip.
Address this by rejecting any gzip input where the filename or comment
fields exceed 1MiB.
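A sketch of such a bounded scan for a NUL-terminated header field. `scan_nul_field`, `MAX_FIELD_LEN` and the return convention are assumptions for this example, not the code in the gzip filter:

```c
#include <stddef.h>

#define MAX_FIELD_LEN (1024 * 1024) /* 1 MiB cap, as in the commit */

/* Scans a NUL-terminated field in the bytes available so far.
 * Returns the field length including the NUL once the terminator is
 * found, 0 if more input is needed, or -1 if the cap is exceeded. */
static long scan_nul_field(const unsigned char *p, size_t avail)
{
    size_t i;

    for (i = 0; i < avail; i++) {
        if (i >= MAX_FIELD_LEN)
            return -1;
        if (p[i] == 0)
            return (long)(i + 1);
    }
    return (avail >= MAX_FIELD_LEN) ? -1 : 0;
}
```

Bounding the scan means a hostile input can cost at most 1 MiB of work per field before it is rejected.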
Tim Kientzle [Fri, 11 Oct 2024 06:13:00 +0000 (23:13 -0700)]
Clarify crc32 variable names (#2367)
No functional change, just a tiny style improvement.
Use `crc32_computed` to refer to the crc32 that the reader has computed
and `crc32_read` to refer to the value that we read from the archive.
That hopefully makes this code a tiny bit easier to follow. (It confused
me recently when I was double-checking something in this area, so I
thought an improvement here might help others.)
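To show how the two names read in practice, here is a small self-contained sketch. The bitwise CRC-32 routine (reflected polynomial 0xEDB88320) and the `crc32_matches` helper are illustrative only; they are not the reader's implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32, written for clarity rather than speed. */
static uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* crc32_read comes from the archive; crc32_computed is our own result. */
static int crc32_matches(const unsigned char *data, size_t len,
    uint32_t crc32_read)
{
    uint32_t crc32_computed = crc32_bitwise(data, len);

    return crc32_computed == crc32_read;
}
```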
Tim Kientzle [Fri, 11 Oct 2024 06:11:43 +0000 (23:11 -0700)]
Fix error message printing (#2368)
We always print the error message with or without -v, but for some
reason, we were omitting the path being processed. Simplify so that we
always print the full error including context.
This fixes various code quality issues I encountered while chasing a
memory leak reported by test automation. I failed to reproduce the
memory leak, but I hope you find this useful nonetheless.
These were disabled when migrating from Cirrus CI. Let's enable them for
github workflows, disable any failing tests on this configuration and
leave TODO notes to fix them.
This was the only failure that I found:
```
684/764 Test #684: bsdtar_test_option_ignore_zeros_mode_c ...................................***Failed 0.10 sec
If tests fail or crash, details will be in:
C:\Users\RUNNER~1\AppData\Local\Temp/bsdtar_test.exe.2024-09-29T11.42.13-000
Reference files will be read from: D:/a/libarchive/libarchive/tar/test
Running tests on: "D:\a\libarchive\libarchive\build_ci\cmake\bin\Release\bsdtar.exe"
Exercising: bsdtar 3.8.0 - libarchive 3.8.0dev zlib/1.3 liblzma/5.4.4 bz2lib/1.1.0 libzstd/1.5.5
39: test_option_ignore_zeros_mode_c
D:\a\libarchive\libarchive\tar\test\test_option_ignore_zeros.c(99): File should be empty: test-c.err
File size: 112
Contents:
0000 62 73 64 74 61 72 2e 65 78 65 3a 20 61 3a 20 43 bsdtar.exe: a: C
0010 61 6e 27 74 20 74 72 61 6e 73 6c 61 74 65 20 75 an't translate u
0020 6e 61 6d 65 20 27 28 6e 75 6c 6c 29 27 20 74 6f name '(null)' to
0030 20 55 54 46 2d 38 0d 0a 62 73 64 74 61 72 2e 65 UTF-8..bsdtar.e
0040 78 65 3a 20 62 3a 20 43 61 6e 27 74 20 74 72 61 xe: b: Can't tra
0050 6e 73 6c 61 74 65 20 75 6e 61 6d 65 20 27 28 6e nslate uname '(n
0060 75 6c 6c 29 27 20 74 6f 20 55 54 46 2d 38 0d 0a ull)' to UTF-8..
```
Tim Kientzle [Sun, 22 Sep 2024 23:06:34 +0000 (16:06 -0700)]
Clean up linkpath between entries (#2343)
PR #2127 failed to clean up the linkpath storage between entries. As a
result, after the first hard/symlink entry in a pax format archive, all
subsequent entries would get the same link information.
I'm really unsure how this bug failed to trip CI. I'll do some digging
in the test suite before I merge this.
Resolves #2331, #2337
P.S. Thanks to Brad King for noting that the linkpath wasn't being
managed correctly, which was a big hint for me.
Michał Górny [Sat, 21 Sep 2024 02:44:06 +0000 (04:44 +0200)]
tar/write.h: Support `sys/xattr.h` (#2335)
Synchronize the last use of `attr/xattr.h` to support using
`sys/xattr.h` instead. The former header is deprecated on GNU/Linux, and
this replacement makes it possible to build libarchive without the
`attr` package.
Brad King [Fri, 20 Sep 2024 12:11:43 +0000 (08:11 -0400)]
tar: fix memory leaks when processing symlinks or parsing pax headers (#2338)
Fix memory leaks introduced by #2127:
* `struct tar` member `entry_linkpath` was moved at the same time as
other members were removed, but its cleanup was accidentally removed
with the others.
* `header_pax_extension` local variable `attr_name` was not cleaned up.
Tim Kientzle [Fri, 20 Sep 2024 05:20:02 +0000 (22:20 -0700)]
Be more cautious about parsing ISO-9660 timestamps (#2330)
Some ISO images don't have valid timestamps for the root directory
entry. Parsing such timestamps can generate nonsensical results, which
in one case showed up as an unexpected overflow on a 32-bit system.
Add some validation logic that can check whether a 7-byte or 17-byte
timestamp is reasonable-looking, and use this to ignore invalid
timestamps in various locations. This also requires us to be a little
more careful about tracking which timestamps are actually known.
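A sketch of what a plausibility check for the 7-byte form might look like, assuming the usual ISO-9660 layout (years since 1900, month, day, hour, minute, second, GMT offset in 15-minute units). The function name and exact bounds are assumptions, not necessarily the checks libarchive applies:

```c
/* v points at a 7-byte directory-record timestamp. */
static int isodate7_looks_valid(const unsigned char *v)
{
    int month = v[1];
    int day = v[2];
    int hour = v[3];
    int minute = v[4];
    int second = v[5];
    int offset = (signed char)v[6]; /* 15-minute units from GMT */

    if (month < 1 || month > 12)
        return 0;
    if (day < 1 || day > 31)
        return 0;
    if (hour > 23 || minute > 59 || second > 59)
        return 0;
    if (offset < -48 || offset > 52)
        return 0;
    return 1;
}
```

An all-zero timestamp fails the month check, so it is treated as unset rather than parsed into a date.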
Followup to #2318 which accidentally made zlib required.
Tested locally by increasing the version in CMakeLists.txt to 1.4.1
(which does not exist yet), and confirming that the build reports that a
suitable version of zlib was not found, while the build continued.