Lasse Collin [Tue, 2 Jul 2024 15:02:50 +0000 (18:02 +0300)]
CMake: The compile definition is ENABLE_NLS, not XZ_NLS
The CMake variables were renamed and accidentally also
the compile definition was renamed. As a result, translation
support wasn't actually enabled in the executables.
Xi Ruoyao [Fri, 28 Jun 2024 10:36:43 +0000 (13:36 +0300)]
liblzma: Speed up CRC32 calculation on 64-bit LoongArch
The crc.w.{b/h/w/d}.w instructions in LoongArch can calculate the CRC32
result for 1/2/4/8 bytes in a single operation. Using these is much
faster compared to the generic method.
Optimized CRC32 is enabled unconditionally on 64-bit LoongArch because
the LoongArch specification says that CRC32 instructions shall be
implemented for 64-bit processors. Optimized CRC32 isn't enabled for
32-bit LoongArch processors because not enough information is available
about them.
Co-authored-by: Lasse Collin <lasse.collin@tukaani.org> Closes: https://github.com/tukaani-project/xz/pull/86
Sam James [Fri, 28 Jun 2024 11:18:35 +0000 (14:18 +0300)]
CI: Speed up Valgrind job by using --trace-children-skip-by-arg=...
This addresses the issue I mentioned in 6c095a98fbec70b790253a663173ecdb669108c4 and speeds up the Valgrind
job a bit, because non-xz tools aren't run unnecessarily with
Valgrind by the script tests.
Lasse Collin [Tue, 25 Jun 2024 13:00:22 +0000 (16:00 +0300)]
Build: Prepend, not append, PTHREAD_CFLAGS to LIBS
It shouldn't make any difference because LIBS should be empty
at that point in configure. But prepending is the correct way
because in general the libraries being added might require other
libraries that come later on the command line.
Lasse Collin [Tue, 25 Jun 2024 11:24:29 +0000 (14:24 +0300)]
Build: Use AC_LINK_IFELSE to handle implicit function declarations
It's more robust in case the compiler allows pre-C99 implicit function
declarations. If an x86 intrinsic is missing and gets treated as
implicit function, the linking step will very probably fail. This
isn't the only way to workaround implicit function declarations but
it might be the simplest and cleanest.
The problem hasn't been observed in the wild.
There are a couple more AC_COMPILE_IFELSE uses in configure.ac.
Of these, Landlock check calls prctl() and in theory could have
the same problem. In practice it doesn't as the check program
looks for several other things too. However, it was changed to
AC_LINK_IFELSE still to look more correct.
Similarly, m4/tuklib_cpucores.m4 and m4/tuklib_physmem.m4 were
updated although they haven't given any trouble either. They
have worked all these years because those check programs rely
on specific headers and types: if headers or types are missing,
compilation will fail. Using the linker makes these checks more
similar to the ones in cmake/tuklib_*.cmake which always link.
Lasse Collin [Mon, 24 Jun 2024 20:35:59 +0000 (23:35 +0300)]
Build: Use AC_LINK_IFELSE instead of -Werror
AC_COMPILE_IFELSE needed -Werror because Clang <= 14 would merely
warn about the unsupported attribute and implicit function declaration.
Changing to AC_LINK_IFELSE handles the implicit declaration because
the symbol __crc32d is unlikely to exist in libc.
Note that the other part of the check is that #include <arm_acle.h>
must work. If the header is missing, most compilers give an error
and the linking step won't be attempted.
Avoiding -Werror makes the check more robust in case CFLAGS contains
warning flags that break -Werror anyway (but this isn't the only check
in configure.ac that has this problem). Using AC_LINK_IFELSE also makes
the check more similar to how it is done in CMakeLists.txt.
Lasse Collin [Mon, 24 Jun 2024 17:14:43 +0000 (20:14 +0300)]
CMake: Not experimental anymore
While the CMake support has gotten a lot less testing than
the Autotools-based build, the supported features should now
be equal. The output may differ slightly, for example,
liblzma.pc may have
Libs.private: -pthread -lpthread
with Autotools on GNU/Linux. CMake doesn't put any options
in Libs.private because on modern glibc the pthread functions
are in libc. The options options aren't required to link static
liblzma into an application.
Autotools-based build doesn't generate or install
lib/cmake/liblzma-*.cmake files. This means that on most
platforms one cannot rely on
Lasse Collin [Tue, 25 Jun 2024 12:51:48 +0000 (15:51 +0300)]
CMake: Always add pthread flags into CMAKE_REQUIRED_LIBRARIES
It was weird to add CMAKE_THREAD_LIBS_INIT in CMAKE_REQUIRED_LIBRARIES
only if CLOCK_MONOTONIC is available. Alternative would be to remove
the thread libs from CMAKE_REQUIRED_LIBRARIES after the check for
pthread_condattr_setclock() but keeping the libs should be fine too.
Then it's ready in case more pthread functions were wanted some day.
Lasse Collin [Mon, 24 Jun 2024 19:41:10 +0000 (22:41 +0300)]
CMake: Fix three checks if building with -flto
In CMake, check_c_source_compiles() always links too. With
link-time optimization, unused functions may get omitted if
main() doesn't depend on them. Consider the following which
tries to check if somefunction() is available when <someheader.h>
has been included:
#include <someheader.h>
int foo(void) { return somefunction(); }
int main(void) { return 0; }
LTO may omit foo() completely because the program as a whole doesn't
need it and then the program will link even if the symbol somefunction
isn't available in libc or other library being linked in, and then
the test may pass when it shouldn't.
What happens if <someheader.h> doesn't declare somefunction()?
Shouldn't the test fail in the compilation phase already? It should
but many compilers don't follow the C99 and later standards that
prohibit implicit function declarations. Instead such compilers
assume that somefunction() exists, compilation succeeds (with a
warning), and then linker with LTO omits the call to somefunction().
Change the tests so that they are part of main(). If compiler accepts
implicitly declared functions, LTO cannot omit them because it has to
assume that they might have side effects and thus linking will fail.
On the other hand, if the functions/intrinsics being used are supported,
they might get optimized away but in that case it's fine because they
really are supported.
It is fine to use __attribute__((target(...))) for main(). At least
it works with GCC 4.9 to 14.1 on x86-64.
Lasse Collin [Mon, 24 Jun 2024 14:39:54 +0000 (17:39 +0300)]
CI: Workaround buggy config.guess on Ubuntu 22.04LTS and 24.04LTS
Check for the wrong triplet from config.guess and override it with
the --build option on the configure command line. Then i386 assembly
autodetection will work.
These Ubuntu versions (and as of writing, also Debian unstable)
ship config.guess version 2022-01-09 which contains a bug that
was fixed in version 2022-05-08. It results in a wrong configure
triplet when using CC="gcc -m32" to build i386 binaries.
Lasse Collin [Sun, 23 Jun 2024 12:35:35 +0000 (15:35 +0300)]
liblzma: Tidy up crc_common.h
Prefix ARM64_RUNTIME_DETECTION with CRC_ and reorder it to be with
the other ARM64-specific lines. That macro isn't used outside this
file.
ARM64 CLMUL implementation doesn't exist yet and thus CRC64_ARM64_CLMUL
isn't used anywhere yet.
It's not ideal that the single-letter CRC utility macros are here
as they pollute the namespace of the LZ encoder files. Those could
be moved their own crc_macros.h like they were in 5.2.x but in practice
this is fine enough already.
Lasse Collin [Sun, 23 Jun 2024 11:22:08 +0000 (14:22 +0300)]
liblzma: Move lzma_crcXX_table[][] declarations to crc_common.h
LZ encoder needs lzma_crc32_table[0] but otherwise those tables
are private to the CRC code. In contrast, the other things in
check.h are needed in several places.
Lasse Collin [Wed, 19 Jun 2024 15:38:22 +0000 (18:38 +0300)]
liblzma: Make 32-bit x86 CRC assembly co-exist with CLMUL
Now runtime detection of CLMUL support can pick between the CLMUL and
the generic assembly implementations. Whatever overhead this has for
builds that omit CLMUL completely isn't important because builds for
any non-ancient system is likely to include the CLMUL code too.
Handle the CRC tables in crcXX_fast.c files because now these files
are built even when assembly code is used.
If 32-bit x86 assembly is enabled then it will always be built even
if compiler flags were such that CLMUL would be allowed unconditionally.
That is, runtime detection will be used anyway. This keeps the build
rules simpler.
In LZ encoder, build and use lzma_lz_hash_table[256] if CLMUL CRC
is used without runtime detection. Previously this wasn't needed
because crc32_table.c included the lzma_crc32_table[][] in the build
unless encoder support had been disabled. Including an 8 KiB table
was silly when only 1 KiB is actually used. So now liblzma is 7 KiB
smaller if CLMUL is enabled without runtime detection.
Lasse Collin [Thu, 20 Jun 2024 15:12:21 +0000 (18:12 +0300)]
CMake: Move threading detection a few lines up
It feels clearer this way, and when support for external SHA-256
is added, this will keep the order of the library detection the
same as in configure.ac (check for pthreads before libmd) although
it shouldn't matter in practice.
Lasse Collin [Thu, 20 Jun 2024 18:53:03 +0000 (21:53 +0300)]
CMake: Refactor XZ_SYMBOL_VERSIONING to match configure.ac
Make the available options and their behavior match
--enable-symbol-versions in configure.ac.
Don't enable symbol versions on Linux if not using glibc. Previously
the generic variant was selected on Microblaze or if using NVHPC
without checking that libc is glibc.
Leave the cache variable to "auto" or "yes" if that was specified
instead of setting it to the autodetected value by default. A downside
is that one cannot easily see which variant the autodetection code
has selected. The same applies to XZ_SANDBOX and XZ_THREADS though.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Use the same option list for XZ_THREADS as in configure.ac
Also clarify that "yes" will fail if no threading support is found.
If no threading is wanted, it has to be disabled manually.
configure.ac doesn't behave this way at the moment. Instead it
assumes pthreads to be present if not targeting Windows. If pthreads
actually are missing, the build fails later.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Use \040 instead of \x20 for a space
This is for consistency with 4c81c9611f8b2e1ad65eb7fa166afc570c58607e
where \040 has to be used because \0x20F gets interpret at three hex
digits. Octals escapes are never longer than three digits.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Refactor ADDITIONAL_CHECK_TYPES to XZ_CHECKS
Now "crc32" is in the list too for completeness but it doesn't
actually have any effect. The description of the cache variable
says that "crc32 is always built" so it should be clear enough.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Rename CREATE_LZMA_SYMLINKS to XZ_TOOL_LZMA_SYMLINKS
Update the description too.
It affects creation of not only the legacy lzma, unlzma, lzcat symlinks
but also lzgrep and other legacy names for the scripts. The last
LZMA Utils release was made in 2008 but these names are still used
in some places to handle .lzma files.
Lasse Collin [Sat, 15 Jun 2024 15:07:04 +0000 (18:07 +0300)]
CMake: Rename ENABLE_NLS to XZ_NLS
Also update the description to mention that this affects installation
of translated man pages too.
Prefixing the cache variables with the project name helps if
the package is used as a subproject in another package.
It also makes the package-specific options group more nicely
in ccmake and cmake-gui.
Lasse Collin [Sat, 15 Jun 2024 20:34:29 +0000 (23:34 +0300)]
CMake: Link Threads::Threads as PRIVATE to liblzma
This way pthread options aren't passed to the linker when linking
against shared liblzma but they are still passed when linking against
static liblzma. (Also, one never needs the include path of the
threading library to use liblzma since liblzma's API headers
don't #include <pthread.h>. But <pthread.h> tends to be in the
default include path so here this change makes no difference.)
One cannot mix target_link_libraries() calls that use the scope
(PRIVATE, PUBLIC, or INTERFACE) keyword and calls that don't use it.
The calls without the keyword are like PUBLIC except perhaps when
they aren't, or something like that... It seems best to always
specify a scope keyword as the meanings of those three keywords
at least are clear.
Lasse Collin [Sun, 16 Jun 2024 16:37:36 +0000 (19:37 +0300)]
CMake: Use CMAKE_THREAD_LIBS_INIT in liblzma.pc only with pthreads
This shouldn't make much difference in practice as on Windows
no flags are needed anyway and unitialized variable (when threading
is disabled) expands to empty. But it's clearer this way.
Lasse Collin [Sun, 16 Jun 2024 16:18:56 +0000 (19:18 +0300)]
CMake: Use relative paths in liblzma.pc if possible
Now liblzma.pc can be relocatable only if using CMake >= 3.20
but that should be OK as now we shouldn't get broken liblzma.pc
if CMAKE_INSTALL_LIBDIR or CMAKE_INSTALL_INCLUDEDIR contain an
absolute path.
Lasse Collin [Sun, 16 Jun 2024 10:39:37 +0000 (13:39 +0300)]
liblzma: CRC CLMUL: Omit is_arch_extension_supported() when not needed
On E2K the function compiles only due to compiler emulation but the
function is never used. It's cleaner to omit the function when it's
not needed even though it's a "static inline" function.
Lasse Collin [Sun, 16 Jun 2024 10:21:34 +0000 (13:21 +0300)]
liblzma: x86 CLMUL CRC: Rewrite
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.
The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.
Thanks to Ilya Kurdyukov for testing the speed with aligned and
unaligned buffers on a few x86 processors and on E2K v6.
Lasse Collin [Mon, 10 Jun 2024 12:12:48 +0000 (15:12 +0300)]
liblzma: CRC64 CLMUL: Refactor the constants
Now it refers to crc_clmul_consts_gen.c. vfold8 was renamed to mu_p
and the p no longer has the lowest bit set (it makes no difference
as the output bits it affects are ignored).
Lasse Collin [Thu, 9 May 2024 18:03:39 +0000 (21:03 +0300)]
liblzma: Remove crc_attr_no_sanitize_address
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.
Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.
It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__ which is implictly unsanitized.)
See also:
https://github.com/tukaani-project/xz/issues/112
https://github.com/tukaani-project/xz/issues/122
Lasse Collin [Wed, 12 Jun 2024 11:26:44 +0000 (14:26 +0300)]
CMake: Prefer C11 with a fallback to C99
There is no need to make a similar change in configure.ac.
With Autoconf 2.72, the deprecated macro AC_PROG_CC_C99
is an alias for AC_PROG_CC which prefers a C11 compiler.
Lasse Collin [Fri, 7 Jun 2024 12:47:20 +0000 (15:47 +0300)]
tuklib_integer: Fix building on OpenBSD/sparc64 that uses GCC 4.2
GCC 4.2 doesn't have __builtin_bswap16() and friends so tuklib_integer.h
tries to use OS-specific byte swap methods instead. On OpenBSD those
macros are swap16/32/64 instead of bswap16/32/64 like on other *BSDs
and Darwin.
An alternative to "#ifdef __OpenBSD__" could be "#ifdef swap16" as it
is a macro. But since OpenBSD seems to be a special case under this
special case of "*BSDs and Darwin", checking for __OpenBSD__ seems
the more conservative choice now.
Thanks to Christian Weisgerber and Brad Smith who both submitted
the same patch a few hours apart.
Co-authored-by: Christian Weisgerber <naddy@mips.inka.de> Co-authored-by: Brad Smith <brad@comstyle.com> Closes: https://github.com/tukaani-project/xz/pull/126