Lasse Collin [Sun, 9 Mar 2025 12:06:35 +0000 (14:06 +0200)]
CMake: Revise tuklib_use_system_extensions
Define NetBSD and Darwin/macOS feature test macros. Autoconf defines
these too (and a few others).
Define the macros on Windows except with MSVC. The _GNU_SOURCE macro
makes a difference with mingw-w64.
Use a function instead of a macro. Don't take the TARGET_OR_ALL argument
because there's always global effect because the global variable
CMAKE_REQUIRED_DEFINITIONS is modified.
Lasse Collin [Thu, 6 Mar 2025 15:37:39 +0000 (17:37 +0200)]
Docs: Add a few TRANSLATORS comments to man pages
All translators know that --command-line-options must not be translated.
With some other strings it's not obvious when the untranslated string
must be preserved. These comments hopefully help.
Lasse Collin [Thu, 6 Mar 2025 14:34:32 +0000 (16:34 +0200)]
Scripts: Mark the LZMA Utils script aliases as deprecated
The deprecated aliases are lzcmp, lzdiff, lzless, lzmore,
lzgrep, lzegrep, and lzfgrep. The commands that start with
the xz prefix have identical behavior, for example, both
lzgrep and xzgrep handle all supported file formats.
This doesn't affect lzma, unlzma, lzcat, lzmadec, or lzmainfo.
The last release of LZMA Utils was made in 2008, but the lzma
compatibility alias for the gzip-like tool is still in common use.
Deprecating it would cause unnecessary breakage.
Lasse Collin [Mon, 17 Feb 2025 19:46:15 +0000 (21:46 +0200)]
Build: Fix out-of-tree builds when using the replacement getopt_long
Nowaways $(top_builddir)/lib/getopt.h depends on headers in
$(top_srcdir)/lib, so both have to be in the include path.
CMake-based build already did this.
Lasse Collin [Mon, 17 Feb 2025 16:11:58 +0000 (18:11 +0200)]
Build: Allow forcing the use of the replacement getopt_long
Now one can pass gl_replace_getopt=yes to configure to force the use
of GNU getopt_long from the lib directory. This only checks that the
value of gl_replace_getopt is non-empty, so one cannot force the
replacement to be disabled.
Lasse Collin [Mon, 3 Feb 2025 14:15:38 +0000 (16:15 +0200)]
Translations: Update Chinese (traditional) translation
Since there are no spaces between words, the unsophisticated automatic
word wrapping code needs some help. Compared to the version in the
Translation Project, I added a few \t characters which the word
wrapping code interprets as zero width spaces (hopefully they are
placed correctly). These edits can be seen with this command:
Lasse Collin [Sun, 2 Feb 2025 12:15:07 +0000 (14:15 +0200)]
Build: Update posix-shell.m4 from Gnulib
Tabs have been converted to spaces and a "serial" number has been
added. The previous version was from 2008/2009. There are no functional
changes since then but now it's clearer that the copy in XZ Utils
isn't outdated.
The new file was picked from the Gnulib commit 81a4c1e3b7692e95c0806d948cbab9148ad85ef2. A later commit adds
a warranty disclaimer to the license, which obviously is fine,
but I didn't find a SPDX license identifier for the new license,
so for simplicity I used the earlier commit.
Lasse Collin [Sun, 2 Feb 2025 10:51:03 +0000 (12:51 +0200)]
Build: Check for -fsanitize= also in $CC
People may put -fsanitize in CC instead of CFLAGS so check both.
Landlock sandbox isn't compatible with sanitizers so it's nice
to catch the incompatible options at configure time.
Don't attempt to do the same in CMakeLists.txt; the check for
CMAKE_C_FLAGS / CFLAGS shall be enough there. The extra flags from
the CC environment variable go into the undocumented internal variable
CMAKE_C_COMPILER_ARG1 (all flags from CC go into that same variable).
Peeking the internal variable merely for improved diagnostics isn't
worth it.
Lasse Collin [Tue, 28 Jan 2025 14:28:18 +0000 (16:28 +0200)]
Windows: Avoid an error message on broken pipe
Also make xz not process more input files after a broken pipe has
been detected. This matches the behavior on POSIX. If all files
are being written to standard output, trying with the next file is
pointless when it's known that standard output won't accept more data.
xzdec already stopped after the first error. It does so with all
errors, so it differs from xz:
$ xz -dc not_found_1 not_found_2
xz: not_found_1: No such file or directory
xz: not_found_2: No such file or directory
$ xzdec not_found_1 not_found_2
xzdec: not_found_1: No such file or directory
Lasse Collin [Wed, 22 Jan 2025 13:03:55 +0000 (15:03 +0200)]
Windows: Disable MinGW-w64's stdio functions in size-optimized builds
This only affects builds with UCRT. With legacy MSVCRT, the replacement
functions are always enabled.
Omitting the MinGW-w64 replacements saves over 20 KiB per executable.
The downside is that --enable-small or XZ_SMALL=ON disables thousand
separator support in xz messages. If someone is OK with the slower
speed of slightly smaller builds, lack of thousand separators won't
matter.
Don't override __USE_MINGW_ANSI_STDIO if it is already defined (via
CPPFLAGS or such method).
Lasse Collin [Mon, 13 Jan 2025 06:44:58 +0000 (08:44 +0200)]
liblzma: memcmplen.h: Use 8-byte method on 64-bit unaligned archs
Previously it was enabled only on x86-64 and ARM64 when also support
for unaligned access was detected or manually enabled at built time.
In the default build configuration, the 8-byte method is now enabled
also on 64-bit RISC-V and 64-bit PowerPC (both endiannesses). It was
reported that on big endian POWER9, encoding time may reduce 12-13 %.
This change only affects builds with GCC and Clang because the code
uses __builtin_ctzll or __builtin_clzll.
Lasse Collin [Fri, 10 Jan 2025 11:11:40 +0000 (13:11 +0200)]
Translations: Update Serbian translation
I rewrapped a few overlong lines. Those edits aren't in the
Translation Project. Automatic wrapping in the master branch
means that these strings need to be updated soon anyway.
Lasse Collin [Wed, 8 Jan 2025 17:08:08 +0000 (19:08 +0200)]
xz: Workaround broken O_SEARCH in musl
Testing with musl 1.2.5 and Linux 6.12, O_SEARCH doesn't result
in a file descriptor that works with fsync() although it should work.
See the added comment.
Lasse Collin [Sun, 5 Jan 2025 19:43:11 +0000 (21:43 +0200)]
xz: O_SEARCH cannot be used for fsync()
Opening a directory with O_SEARCH results in a file descriptor that can
be used with functions like openat(). Such a file descriptor cannot be
used with fsync(). Use O_RDONLY instead.
In musl, O_SEARCH becomes Linux-specific O_PATH. A file descriptor
from O_PATH doesn't allow fsync().
Seems that it's not possible to fsync() a directory that has write
and search permissions but not read permission.
tuklib_mask_nonprint() may call mbrtowc() and malloc() which may modify
errno. If errno isn't preserved, the error message might be wrong if
a compiler decides to call tuklib_mask_nonprint() before strerror().
Lasse Collin [Sun, 5 Jan 2025 18:14:49 +0000 (20:14 +0200)]
xz: Use fsync() before deleting the input file, and add --no-sync
xz's default behavior is to delete the input file after successful
compression or decompression (unless writing to standard output).
If the system crashes soon after the deletion, it is possible that
the newly written file has not yet hit the disk while the previous
delete operation might have. In that case neither the original file
nor the written file is available.
Call fsync() on the file. On POSIX systems, sync also the directory
where the file was created.
Add a new option --no-sync which disables fsync() usage. It can avoid
a (possibly significant) performance penalty when processing many
small files. It's fine to use --no-sync when one knows that the files
are easy to recreate or restore after a system crash.
Using fsync() after every flush initiated by --flush-timeout was
considered. It wasn't implemented at least for now.
- --flush-timeout is typically used when writing to stdout. If stdout
is a file, xz cannot (portably) sync the directory of the file.
One would need to create the output file first, sync the directory,
and then run xz with fsync() enabled.
- If xz --flush-timeout output goes to a file, it's possible to use
a separate script to sync the file, for example, once per minute
while telling xz to flush more frequently.
- Not supporting syncing with --flush-timeout was simpler.
Portability notes:
- On systems that lack O_SEARCH (like Linux), "xz dir/file" will now
fail if "dir" cannot be opened for reading. If "dir" still has
write and search permissions (like d-wx------ in "ls -l"),
previously xz would have been able to compress "dir/file" still.
Now it only works if using --no-sync (or --keep or --stdout).
- <libgen.h> and dirname() should be available on all POSIX systems,
and aren't needed on non-POSIX systems.
- fsync() is available on all POSIX systems. The directory syncing
could be changed to fdatasync() although at least on ext4 it
doesn't seem to make a performance difference in xz's usage.
fdatasync() would need a build system check to support (old)
special cases, for example, MINIX 3.3.0 doesn't have fdatasync()
and Solaris 10 needs -lrt.
- On native Windows, _commit() is used to replace fsync(). Directory
syncing isn't done and shouldn't be needed. (In Cygwin, fsync() on
directories is a no-op.)
- DJGPP has fsync() for files. ;-)
Using fsync() was considered somewhere around 2009 and again in 2016 but
those times the idea was rejected. For comparison, GNU gzip 1.7 (2016)
added the option --synchronous which enables fsync().
Lasse Collin [Sun, 5 Jan 2025 10:10:05 +0000 (12:10 +0200)]
liblzma: Always validate the first digit of a preset string
lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.
However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() will return an invalid value which is passed to
lzma_lzma_preset() which safely rejects it. The bug still affects the
the error message:
Lasse Collin [Sun, 5 Jan 2025 09:40:34 +0000 (11:40 +0200)]
xz: Fix getopt_long argument type in --filters*
Forgetting the argument (or not using = to separate the option from
the argument) resulted in lzma_str_to_filters() being called with NULL
as input string argument. The function handles it fine but xz passes
the NULL to printf() too:
$ xz --filters
xz: Error in --filters=FILTERS option:
xz: (null)
xz: ^
xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters()
Now it's correct:
$ xz --filters
xz: option '--filters' requires an argument
The --filters-help option doesn't take any arguments.
Lasse Collin [Sat, 4 Jan 2025 13:02:16 +0000 (15:02 +0200)]
xz: Avoid printf formats like %2$s
It's a POSIX feature that isn't in standard C. It's not available on
Windows. Even MinGW-w64 with __USE_MINGW_ANSI_STDIO doesn't support
it even though it supports POSIX %'d for thousand separators.
Gettext's <libintl.h> provides overrides for printf and other functions
which do support the %2$s formats. Translations use them. But xz should
work on Windows without <libintl.h> too.
Lasse Collin [Thu, 2 Jan 2025 13:32:10 +0000 (15:32 +0200)]
xz: Use my_landlock.h
A slightly silly thing is that xz may now query the ABI version up to
three times. We could call my_landlock_ruleset_attr_forbid_all() only
once and cache the result but it didn't seem worth doing.