Lasse Collin [Sun, 23 Jun 2024 12:35:35 +0000 (15:35 +0300)]
liblzma: Tidy up crc_common.h
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.
ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.
It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved their own crc_macros.h like they were in 5.2.x but in practice
this is fine enough already.
Lasse Collin [Sun, 23 Jun 2024 11:22:08 +0000 (14:22 +0300)]
liblzma: Move lzma_crcXX_table[][] declarations to crc_common.h
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
Lasse Collin [Wed, 19 Jun 2024 15:38:22 +0000 (18:38 +0300)]
liblzma: Make 32-bit x86 CRC assembly co-exist with CLMUL
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
Lasse Collin [Thu, 20 Jun 2024 15:12:21 +0000 (18:12 +0300)]
CMake: Move threading detection a few lines up
It feels clearer this way, and when support for external SHA-256
is added, this will keep the order of the library detection the
same as in configure.ac (check for pthreads before libmd) although
it shouldn't matter in practice.
Lasse Collin [Thu, 20 Jun 2024 18:53:03 +0000 (21:53 +0300)]
CMake: Refactor XZ_SYMBOL_VERSIONING to match configure.ac
Make the available options and their behavior match
--enable-symbol-versions in configure.ac.
Don't enable symbol versions on Linux if not using glibc. Previously
the generic variant was selected on Microblaze or if using NVHPC
without checking that libc is glibc.
Leave the cache variable to "auto" or "yes" if that was specified
instead of setting it to the autodetected value by default. A downside
is that one cannot easily see which variant the autodetection code
has selected. The same applies to XZ_SANDBOX and XZ_THREADS though.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Use the same option list for XZ_THREADS as in configure.ac
Also clarify that "yes" will fail if no threading support is found.
If no threading is wanted, it has to be disabled manually.
configure.ac doesn't behave this way at the moment. Instead it
assumes pthreads to be present if not targeting Windows. If pthreads
actually are missing, the build fails later.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Use \040 instead of \x20 for a space
This is for consistency with 4c81c9611f8b2e1ad65eb7fa166afc570c58607e
where \040 has to be used because \0x20F gets interpret at three hex
digits. Octals escapes are never longer than three digits.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Refactor ADDITIONAL_CHECK_TYPES to XZ_CHECKS
Now "crc32" is in the list too for completeness but it doesn't
actually have any effect. The description of the cache variable
says that "crc32 is always built" so it should be clear enough.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Rename CREATE_LZMA_SYMLINKS to XZ_TOOL_LZMA_SYMLINKS
Update the description too.
It affects creation of not only the legacy lzma, unlzma, lzcat symlinks
but also lzgrep and other legacy names for the scripts. The last
LZMA Utils release was made in 2008 but these names are still used
in some places to handle .lzma files.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Rename ENABLE_NLS to XZ_NLS
Also update the description to mention that this affects installation
of translated man pages too.
Prefixing the cache variables with the project name helps if
the package is used as a subproject in another package.
It also makes the package-specific options group more nicely
in ccmake and cmake-gui.
Lasse Collin [Sat, 15 Jun 2024 20:34:29 +0000 (23:34 +0300)]
CMake: Link Threads::Threads as PRIVATE to liblzma
This way pthread options aren't passed to the linker when linking
against shared liblzma but they are still passed when linking against
static liblzma. (Also, one never needs the include path of the
threading library to use liblzma since liblzma's API headers
don't #include <pthread.h>. But <pthread.h> tends to be in the
default include path so here this change makes no difference.)
One cannot mix target_link_libraries() calls that use the scope
(PRIVATE, PUBLIC, or INTERFACE) keyword and calls that don't use it.
The calls without the keyword are like PUBLIC except perhaps when
they aren't, or something like that... It seems best to always
specify a scope keyword as the meanings of those three keywords
at least are clear.
Lasse Collin [Sun, 16 Jun 2024 16:37:36 +0000 (19:37 +0300)]
CMake: Use CMAKE_THREAD_LIBS_INIT in liblzma.pc only with pthreads
This shouldn't make much difference in practice as on Windows
no flags are needed anyway and unitialized variable (when threading
is disabled) expands to empty. But it's clearer this way.
Lasse Collin [Sun, 16 Jun 2024 16:18:56 +0000 (19:18 +0300)]
CMake: Use relative paths in liblzma.pc if possible
Now liblzma.pc can be relocatable only if using CMake >= 3.20
but that should be OK as now we shouldn't get broken liblzma.pc
if CMAKE_INSTALL_LIBDIR or CMAKE_INSTALL_INCLUDEDIR contain an
absolute path.
Lasse Collin [Sun, 16 Jun 2024 10:39:37 +0000 (13:39 +0300)]
liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Lasse Collin [Sun, 16 Jun 2024 10:21:34 +0000 (13:21 +0300)]
liblzma: x86 CLMUL CRC: Rewrite
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Lasse Collin [Mon, 10 Jun 2024 12:12:48 +0000 (15:12 +0300)]
liblzma: CRC64 CLMUL: Refactor the constants
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
Lasse Collin [Thu, 9 May 2024 18:03:39 +0000 (21:03 +0300)]
liblzma: Remove crc_attr_no_sanitize_address
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.
Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.
It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)
See also:
https://github.com/tukaani-project/xz/issues/112
https://github.com/tukaani-project/xz/issues/122
Lasse Collin [Wed, 12 Jun 2024 11:26:44 +0000 (14:26 +0300)]
CMake: Prefer C11 with a fallback to C99
There is no need to make a similar change in configure.ac.
With Autoconf 2.72, the deprecated macro AC_PROG_CC_C99
is an alias for AC_PROG_CC which prefers a C11 compiler.
Lasse Collin [Fri, 7 Jun 2024 12:47:20 +0000 (15:47 +0300)]
tuklib_integer: Fix building on OpenBSD/sparc64 that uses GCC 4.2
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.
An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.
Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de> Co-authored-by: Brad Smith <brad@comstyle.com> Closes: https://github.com/tukaani-project/xz/pull/126
Lasse Collin [Tue, 4 Jun 2024 20:59:29 +0000 (23:59 +0300)]
CMake: Fix liblzma filename in Windows environments
This is a mess because liblzma DLL outside Cygwin and MSYS2
is liblzma.dll instead of lzma.dll to avoid a conflict with
lzma.dll from LZMA SDK.
On Cygwin the name was "liblzma-5.dll" while "cyglzma-5.dll"
would have been correct (and match what Libtool produces).
MSYS2 likely was broken too as it uses the "msys-" prefix.
This change has no effect with MinGW-w64 because with that
the "lib" prefix was correct already.
With MSVC builds this is a small breaking change that requires developers
to adjust the library name when linking against liblzma. The liblzma.dll
name is kept as is but the import library and static library are now
lzma.lib instead of liblzma.lib. This is helpful when using pkgconf
because "pkgconf --msvc-syntax --libs liblzma" outputs "lzma.lib"
(it's converted from "-llzma" in liblzma.pc). It would be easy to
keep the liblzma.lib naming but the pkgconf compatibility seems worth
it in the long run. The lzma.lib name is compatible with MinGW-w64
too as -llzma will find also lzma.lib.
vcpkg had been patching CMakeLists.txt this way since 2022 but I
learned this only recently. The reasoning for the patch makes sense,
and while this is a small breaking change with MSVC, it seems like
a decent compromise as it keeps the DLL name the same.
2022 patch in vcpkg: https://github.com/microsoft/vcpkg/blob/0707a17ecf1466d64cf1a3c1ee18c8ff02aadb2d/ports/liblzma/win_output_name.patch
See the discussion: https://github.com/microsoft/vcpkg/pull/39024
Thanks to Vincent Torri for confirming the naming issue on Cygwin.
Lasse Collin [Mon, 3 Jun 2024 13:55:03 +0000 (16:55 +0300)]
Fix version.sh compatiblity with Solaris
The ancient /bin/tr on Solaris doesn't support '\n'.
With /usr/xpg4/bin/tr it works but it might not be in PATH.
Another problem was that sed was given input that didn't have a newline
at the end. Text files must end with a newline to be portable.
Fix both problems:
- Handle multiline input within sed itself to avoid one tr invocation.
The default sed even on Solaris does understand \n.
- Use octals in tr -d. \012 works for ASCII "line feed", it's even
used as an example in the Solaris man page. But we must strip
also ASCII "carriage return" \015 and EBCDIC "next line" \025.
The EBCDIC case got handled with \n previously. Stripping \012
and \015 on EBCDIC system won't matter as those control chars
won't be present in the string in the first place.
An awk-based solution could be an alternative but it might need
special casing on Solaris to used nawk instead of awk. The changes
in this commit are smaller and should have a smaller risk for
regressions. It's also possible that version.sh will be dropped
entirely at some point.
Lasse Collin [Mon, 3 Jun 2024 14:44:50 +0000 (17:44 +0300)]
CMake: Install liblzma.pc even with MSVC
I had misunderstood that it wouldn't be useful with MSVC.
vcpkg had been installing liblzma.pc with custom rules since 2020,
years before liblzma.pc support was added to CMakeLists.txt.
Sam James [Mon, 3 Jun 2024 05:16:23 +0000 (06:16 +0100)]
ci: don't pin official GH actions via commit, just tag
There's no real value in doing it via commit for official GH actions. We
can keep using pinned commits for unofficial actions. It's hassle for no
gain.
Maybe going forward we can limit this further by only being paranoid
for the jobs with any access to tokens.
Sam James [Sun, 14 Apr 2024 07:08:00 +0000 (08:08 +0100)]
xz: list: suppress -Wformat-nonliteral for Solaris
Solaris' GCC can't understand that our use is fine, unlike modern compilers:
```
list.c: In function 'print_totals_basic':
list.c:1191:4: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
uint64_to_str(totals.files, 0));
^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```
It's presumably because of older gettext missing format attributes.
Lasse Collin [Wed, 29 May 2024 14:44:53 +0000 (17:44 +0300)]
Translations: Run "make -C po update-po"
In the past this wasn't done before releases; the Git repository
just contained the files from the Translation Project. But this
way it is clearer when comparing release tarballs against the
Git repository. In future releases this might no longer be necessary
within a stable branch as the .po files won't change so easily anymore
when creating a tarball.