Lasse Collin [Wed, 14 Feb 2024 17:11:48 +0000 (19:11 +0200)]
Docs: Include doc/examples/11_file_info.c in tarballs.
It was added in 2017 in c2e29f06a7d1e3ba242ac2fafc69f5d6e92f62cd
but it never got into any release tarballs because it was
forgotten to be added to Makefile.am.
Lasse Collin [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
liblzma: Range decoder: Add branchless C code.
It's used only for basic bittrees and fixed-size reverse bittree
because those showed a clear benefit on x86-64 with GCC and Clang.
The other methods were more mixed and thus are commented out but
they should be tested on other archs.
Lasse Collin [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
liblzma: Optimize LZ decoder slightly.
Now extra buffer space is reserved so that repeating bytes for
any single match will never need to copy from two places (both
the beginning and the end of the buffer). This simplifies
dict_repeat() and helps a little with speed.
This seems to reduce .lzma decompression time about 2 %, so
with .xz and CRC it could be slightly less. The small things
add up still.
Jia Tan [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
liblzma: Creates Non-resumable and Resumable modes for lzma_decoder.
The new decoder resumes the first decoder loop in the Resumable mode.
Then, the code executes in Non-resumable mode until it detects that it
cannot guarantee to have enough input/output to decode another symbol.
The Resumable mode is how the decoder has always worked. Before decoding
every input bit, it checks if there is enough space and will save its
location to be resumed later. When the decoder has more input/output,
it jumps back to the correct sequence in the Resumable mode code.
When the input/output buffers are large, the Resumable mode is much
slower than the Non-resumable because it has more branches and is harder
for the compiler to optimize since it is in a large switch block.
Early benchmarking shows significant time improvement (8-10% on gcc and
clang x86) by using the Non-resumable code as much as possible.
Jia Tan [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
liblzma: Creates separate "safe" range decoder mode.
The new "safe" range decoder mode is the same as old range decoder, but
now the default behavior of the range decoder will not check if there is
enough input or output to complete the operation. When the buffers are
close to fully consumed, the "safe" operations must be used instead. This
will improve speed because it will reduce the number of branches needed
for most of the range decoder operations.
Lasse Collin [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
Translations: Translate also messages of lzmainfo.
lzmainfo has had translation support since 2009 at least but
it was never added to po/POTFILES.in so the messages weren't
translated. It's a very rarely needed tool so it's not too bad.
This also adds src/xz/mytime.c to po/POTFILES.in although there
are no translatable strings. It's simpler this way so that it
won't be forgotten if strings were ever added to that file.
Lasse Collin [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
xzdiff, xzgrep, and xzmore: Rewrite the man pages.
The main reason is a kind of silly one:
xz-man.pot contains strings from all man pages in XZ Utils.
The man pages of xzdiff, xzgrep, and xzmore were under GPLv2
and the rest under 0BSD. Thus xz-man.pot contained strings
under two licences. po4a creates the translated man pages
from the combined 0BSD+GPLv2 xz-man.pot.
I haven't liked this mixing in xz-man.pot but the
Translation Project requires that all man pages must be
in the same .pot file. So a separate xz-man-gpl.pot
wasn't an option.
Since these man pages are short, rewriting them was quick enough.
Now xz-man.pot is entirely under 0BSD and marking the per-file
licenses is simpler.
As a bonus, some wording hopefully is now slightly better
although it's perhaps a matter of taste.
NOTE: In xzgrep.1, the EXIT STATUS section was written by me
in the commit d796b6d7fdb8b7238b277056cf9146cce25db604 so that's
why that section could be taken as is from the old xzgrep.1.
Lasse Collin [Mon, 12 Feb 2024 15:09:10 +0000 (17:09 +0200)]
liblzma: Include the SPDX license identifier 0BSD to generated files.
Perhaps the generated files aren't even copyrightable but
using the same license for them as for the rest of the liblzma
keeps things more consistent for tools that look for license info.
Lasse Collin [Fri, 9 Feb 2024 15:20:31 +0000 (17:20 +0200)]
Fix SHA-256 authors.
The initial commit 5d018dc03549c1ee4958364712fb0c94e1bf2741
in 2007 had a comment in sha256.c that the code is based on
Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
and the AUTHORS file was updated with information that the
code had come from Crypto++ but via 7-Zip. I know I had viewed
7-Zip's SHA-256 code but back then the C code has been identical
enough with Crypto++, so I don't why I thought the author info
would need that extra step via 7-Zip for this single file.
Another error is that I had mixed sha.* and shacal2.* files
when checking for author info in Crypto++. The shacal2.* files
aren't related to liblzma's sha256.c and thus Kevin Springle's
code in Crypto++ isn't either.
Jia Tan [Thu, 1 Feb 2024 08:06:29 +0000 (16:06 +0800)]
liblzma: Check HAVE_USABLE_CLMUL before omitting CRC64 table.
If liblzma is configured with --disable-clmul-crc
CFLAGS="-msse4.1 -mpclmul", then it will fail to compile because the
generic version must be used but the CRC tables were not included.
Jia Tan [Tue, 23 Jan 2024 10:02:13 +0000 (18:02 +0800)]
liblzma: Only use ifunc in crcXX_fast.c if its needed.
The code was using HAVE_FUNC_ATTRIBUTE_IFUNC instead of CRC_USE_IFUNC.
With ARM64, ifunc is incompatible because it requires non-inline
function calls for runtime detection.
Jia Tan [Sun, 21 Jan 2024 16:36:47 +0000 (00:36 +0800)]
Build: Add support for ARM64 CRC32 instruction detection.
This adds --enable-arm64-crc32/--disable-arm64-crc32 (enabled by
default) for using the ARM64 CRC32 instruction. This can be disabled if
one knows the binary will never need to run on an ARM64 machine
with this instruction extension.
Chenxi Mao [Tue, 9 Jan 2024 09:23:11 +0000 (17:23 +0800)]
Speed up CRC32 calculation on ARM64
The CRC32 instructions in ARM64 can calculate the CRC32 result
for 8 bytes in a single operation, making the use of ARM64
instructions much faster compared to the general CRC32 algorithm.
Optimized CRC32 will be enabled if ARM64 has CRC extension
running on Linux.
Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
Jia Tan [Tue, 9 Jan 2024 08:40:56 +0000 (16:40 +0800)]
Doxygen: Added the XZ logo and copyright information.
The PROJECT_LOGO field is now used to include the XZ logo. The footer
of each page now lists the copyright information instead of the default
footer. The license is also copied to statisfy the copyright and so the
link in the documentation can be local.
Lasse Collin [Tue, 23 Jan 2024 16:29:28 +0000 (18:29 +0200)]
xz: Use threaded mode by defaut (as if --threads=0 was used).
This hopefully does more good than bad:
+ It's faster by default.
+ Only the threaded compressor creates files that
can be decompressed in threaded mode.
- Compression ratio is worse, usually not too much though.
When it matters, -T1 must be used.
- Memory usage increases.
- Scripts that assume single-threaded mode but don't use -T1 will
possibly use too much resources, for example, if they run
multiple xz processes in parallel to compress multiple files.
- Output from single-threaded and multi-threaded compressors
differ but such changes could happen for other reasons too
(they just haven't happened since 5.0.0).
Lasse Collin [Mon, 22 Jan 2024 22:09:48 +0000 (00:09 +0200)]
liblzma: RISC-V filter: Use byte-by-byte access.
Not all RISC-V processors support fast unaligned access so
it's better to read only one byte in the main loop. This can
be faster even on x86-64 when compared to reading 32 bits at
a time as half the time the address is only 16-bit aligned.
The downside is larger code size on archs that do support
fast unaligned access.
Jia Tan [Mon, 22 Jan 2024 15:33:39 +0000 (23:33 +0800)]
xz: Update xz -lvv for RISC-V filter.
Version 5.6.0 will be shown, even though upcoming alphas and betas
will be able to support this filter. 5.6.0 looks nicer in the output and
people shouldn't be encouraged to use an unstable version in production
in any way.
Jia Tan [Mon, 22 Jan 2024 15:33:39 +0000 (23:33 +0800)]
Tests: Add two RISC-V Filter test files.
These test files achieve 100% code coverage in
src/liblzma/simple/riscv.c. They contain all of the instructions that
should be filtered and a few cases that should not.