Lasse Collin [Wed, 1 Jan 2025 13:34:51 +0000 (15:34 +0200)]
Build: Use -Wimplicit-fallthrough=5 when supported
Now that we have the FALLTHROUGH macro, use the strictest mode with
GCC so that comment-based fallthrough markings are no longer accepted.
In GCC, -Wextra includes -Wimplicit-fallthrough=3 and
-Wimplicit-fallthrough is the same as -Wimplicit-fallthrough=3.
Thus, the strict mode requires specifying -Wimplicit-fallthrough=5.
Clang has -Wimplicit-fallthrough which is *not* enabled by -Wextra.
Clang doesn't have a variant that takes an argument. Thus we need
to check for -Wimplicit-fallthrough. Do it before checking for
-Wimplicit-fallthrough=5 so that the latter overrides the former
when using GCC.
Lasse Collin [Thu, 2 Jan 2025 11:35:48 +0000 (13:35 +0200)]
Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1
Also remove the recently-added workaround from tuklib_gettext.h.
Requiring a new enough gettext-runtime is cleaner. I guess it's
mostly MSYS2 where xz is built with translation support, so once
MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem
in practice.
Lasse Collin [Mon, 30 Dec 2024 08:51:26 +0000 (10:51 +0200)]
xz man page: Describe the source file deletion in -z and -d options
The DESCRIPTION section always explained it, and the OPTIONS section
only described the differences to the default behavior. However, new
users in a hurry may skip reading DESCRIPTION. The default behavior
is a bit dangerous, thus it's good to repeat in --compress and
--decompress docs that source file is removed after successful operation.
Lasse Collin [Sat, 28 Dec 2024 16:28:56 +0000 (18:28 +0200)]
CMake/macOS: Use GNU Libtool compatible shared library versioning
Because this increases the Mach-O compatibility_version, this commit
shouldn't cause any ABI compatibility trouble for existing CMake users
on macOS. This is assuming that they won't later downgrade to an older
liblzma version that was built with CMake before this commit.
Meson allows customising the Mach-O versioning too. So the three
build systems can be configured to be compatible.
liblzma and xz can't be compiled as a unity/jumbo build because of
redeclarations and type name reuse. The CMake documentation recommends
setting UNITY_BUILD to false in this case.
This is especially important if we're compiled as a subproject and the
consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code
base.
Lasse Collin [Thu, 19 Dec 2024 16:31:09 +0000 (18:31 +0200)]
xzdec: Use setlocale() instead of tuklib_gettext_setlocale()
xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, thus libintl_setlocale() cannot interfere
with the locale settings. Thus, standard setlocale() works perfectly.
In the commit 78868b6e, the explanation in the commit message is wrong.
Lasse Collin [Thu, 19 Dec 2024 17:36:15 +0000 (19:36 +0200)]
Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation
Only leave the FindFileFirstA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)]
xz: Use tuklib_mbstr_nonprint
Call tuklib_mask_nonprint() on filenames and also on a few other
strings from the command line too.
The filename printed by "xz --robot --list" (in list.c) is also masked.
It's good to get rid of tabs and newlines which would desync the output
but masking other chars wouldn't be strictly necessary. It might matter
with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject
non-ASCII chars) and a script wants to read a filename from xz's output.
Hopefully it's an unusual enough corner case to not be a real problem.
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)]
Add tuklib_mbstr_nonprint to mask non-printable characters
Malicious filenames or other untrusted strings may affect the state of
the terminal when such strings are printed as part of (error) messages.
Add functions that mask such characters.
It's not enough to handle only single-byte control characters.
In multibyte locales, some control characters are multibyte too, for
example, terminals interpret C1 control characters (U+0080 to U+009F)
that are two bytes as UTF-8.
Instead of checking for control characters with iswcntrl(), this
uses iswprint() to detect printable characters. This is much stricter.
On Windows it's actually too strict as it rejects some characters that
definitely are printable.
Gnulib's quotearg would do a lot more but I hope this simpler method
is good enough here.
Thanks to Ryan Colyer for the discussion about the problems of
the earlier single-byte-only method.
Thanks to Christian Weisgerber for reporting a bug in an earlier
version of this code.
Lasse Collin [Wed, 18 Dec 2024 09:33:09 +0000 (11:33 +0200)]
Translations: Add preliminary Georgian translation
Most of the auto-wrapped strings are translated already. A few
strings have changed since this was created though. This file
isn't in the Translation Project *yet* because these strings
are still very new.
Lasse Collin [Mon, 16 Dec 2024 16:43:52 +0000 (18:43 +0200)]
Add tuklib_mbstr_wrap for automatic word wrapping
Automatic word wrapping makes translators' work easier and reduces
errors like misaligned columns or overlong lines. Right-to-left
languages and languages that don't use spaces between words will
still need extra effort. (xz hasn't been translated to any RTL
language so far.)
Lasse Collin [Mon, 16 Dec 2024 18:06:07 +0000 (20:06 +0200)]
tuklib_mbstr_width: Change the behavior when wcwidth() is not available
If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.
In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.
Lasse Collin [Wed, 18 Dec 2024 12:23:13 +0000 (14:23 +0200)]
xzdec: Use setlocale() via tuklib_gettext_setlocale()
xzdec isn't translated and didn't have locale-specific behavior
in the past. On Windows with UTF-8 in the application manifest,
setting the locale makes a difference though:
- Without any setlocale() call, non-ASCII filenames don't display
properly in Command Prompt unless one first uses "chcp 65001"
to set the console code page to UTF-8.
- setlocale(LC_ALL, "") is enough to make non-ASCII filenames
print correctly in Command Prompt without using "chcp 65001",
assuming that the non-UTF-8 code page (like 850) supports
those non-ASCII characters.
- setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and
such functions use an UTF-8 locale instead of a legacy code page.
The tuklib_gettext_setlocale() macro takes care of this (without
enabling any translations).
Lasse Collin [Tue, 17 Dec 2024 12:59:37 +0000 (14:59 +0200)]
Windows: Use UTF-8 locale when active code page is UTF-8
XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.
Lasse Collin [Sun, 15 Dec 2024 17:08:32 +0000 (19:08 +0200)]
CMake: Bump maximum policy version to 3.31
With CMake 3.31, there were a few warnings from
CMP0177 "install() DESTINATION paths are normalized".
These occurred because the install(FILES) command in
my_install_man_lang() is called with a DESTINATION path
that contains two consecutive slashes, for example,
"share/man//man1". Such a path is for the English man pages.
With translated man pages, the language code goes between
the slashes. The warning was probably triggered because the
extra slash gets removed by the normalization.
Lasse Collin [Sun, 1 Dec 2024 19:38:17 +0000 (21:38 +0200)]
doc/SHA256SUMS: Add the list of SHA-256 hashes of release files
The release files are signed but verifying the signatures cannot
catch certain types of attacks:
1. A malicious maintainer could make more than one variant of
a package. One could be for general distribution. Another
with malicious content could be targeted to specific users,
for example, distributing the malicious version on a mirror
controlled by the attacker.
2. If the signing key of an honest maintainer was compromised
without being detected, a similar situation as described
above could occur.
SHA256SUMS could be put on the project website but having it in
the Git repository makes it obvious that old lines aren't modified
when the file is updated.
Hashes of uncompressed files are included too. This way tarballs
can be recompressed and the hashes can still be verified.
Lasse Collin [Sat, 30 Nov 2024 10:05:59 +0000 (12:05 +0200)]
Docs: Remove .github/SECURITY.md
One of the reasons to have this file in the xz repository was to
show vulnerability reporting info in the Security section on GitHub.
On 2024-11-25, I added SECURITY.md to the tukaani-project organization
on GitHub:
GitHub shows that file in all projects in the organization unless
overridden by a project-specific SECURITY.md. Thus, removing
the file from the xz repo makes GitHub show the organization-wide
text instead.
Maintaining a single copy for the whole GitHub organization makes
things simpler. It's also nicer to have fewer GitHub-specific files
in the xz repo. Information how to report bugs (including security
issues) is available in README and on the home page too.
The OpenSSF Scorecard tool didn't find .github/SECURITY.md from the
xz repository. There was a suggestion to move the file to the top-level
directory where Scorecard should find it. However, Scorecard does find
the organization-wide SECURITY.md. Thus, the file isn't needed in the
xz repository to score points in the Scorecard game:
Lasse Collin [Tue, 1 Oct 2024 09:10:23 +0000 (12:10 +0300)]
Windows: Embed an application manifest in the EXE files
IMPORTANT: This includes a security fix to command line tool
argument handling.
Some toolchains embed an application manifest by default to declare
UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11
to let the app access features newer than those of Vista.
We want all the above but also two more things:
- Declare that the app is long path aware to support paths longer
than 259 characters (this may also require a registry change).
- Force the code page to UTF-8. This allows the command line tools
to access files whose names contain characters that don't exist
in the current legacy code page (except unpaired surrogates).
The UTF-8 code page also fixes security issues in command line
argument handling which can be exploited with malicious filenames.
See the new file w32_application.manifest.comments.txt.
Thanks to Orange Tsai and splitline from DEVCORE Research Team
for discovering this issue.
Thanks to Vijay Sarvepalli for reporting the issue to me.
Thanks to Kelvin Lee for testing with MSVC and helping with
the required build system fixes.
Lasse Collin [Sun, 29 Sep 2024 11:46:52 +0000 (14:46 +0300)]
Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2
Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.
Lasse Collin [Sat, 28 Sep 2024 17:09:50 +0000 (20:09 +0300)]
CMake: Add the resource files to the Cygwin and MSYS2 builds
Autotools-based build has always done this so this is for consistency.
However, the CMake build won't create the DEF file when building
for Cygwin or MSYS2 because in that context it should be useless.
(If Cygwin or MSYS2 is used to host building of normal Windows
binaries then the DEF file is still created.)
"xzdec -M123" exited with exit status 1 without printing
any messages. The "M:" entry should have been removed when
the memory usage limiter support was removed from xzdec.
Yifeng Li [Thu, 22 Aug 2024 02:18:49 +0000 (02:18 +0000)]
liblzma: Fix x86-64 movzw compatibility in range_decoder.h
Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:
lzma_decoder.c: Assembler messages:
lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'
OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.
Lasse Collin [Sat, 6 Jul 2024 11:04:48 +0000 (14:04 +0300)]
xz: Remove the TODO comment about --recursive
It won't be implemented. find + xargs is more flexible, for example,
it allows compressing small files in parallel. An example for that
has been included in the xz man page since 2010.
Lasse Collin [Wed, 3 Jul 2024 17:45:48 +0000 (20:45 +0300)]
CMake: Link xz against Threads::Threads if using pthreads
The liblzma target was recently changed to link against Threads::Threads
with the PRIVATE keyword. I had forgotten that xz itself depends on
pthreads too due to pthread_sigmask(). Thus, the build broke when
building shared liblzma and pthread_sigmask() wasn't in libc.
Lasse Collin [Tue, 2 Jul 2024 17:19:47 +0000 (20:19 +0300)]
CMake: Update the comment at the top of CMakeLists.txt
While po/*.gmo files won't be used from the release tarball,
the generated translated man pages will be used still. Those
are text files and po4a has slightly more dependencies than
gettext tools so installing po4a might be a bit more challenging
in some situations.
Lasse Collin [Tue, 2 Jul 2024 17:12:40 +0000 (20:12 +0300)]
CMake: Drop support for pre-generated po/*.gmo files
When a release tarball is created using Autotools, the tarball includes
po/*.gmo files which are binary files generated from po/*.po. Other
tarball creation methods don't and won't create the .gmo files.
It feels clearer if CMake will never install pre-generated binary files
from the source package. If people are able to install CMake, they
likely are able to install gettext tools as well (assuming they want
translations).
Lasse Collin [Tue, 2 Jul 2024 16:14:50 +0000 (19:14 +0300)]
CMake: Make XZ_NLS handling more robust
If a user set XZ_NLS=ON but find_package(Intl) failed or CMake version
wasn't at least 3.20, the configuration would fail in a cryptic way.
If XZ_NLS is enabled, require that CMake is new enough and that either
gettext tools or pre-generated .gmo files are available. Otherwise fail
the configuration. Previously missing gettext tools and .gmo files would
only result in a warning.
Missing man page translations are still only a warning.
Lasse Collin [Tue, 2 Jul 2024 15:02:50 +0000 (18:02 +0300)]
CMake: The compile definition is ENABLE_NLS, not XZ_NLS
The CMake variables were renamed and accidentally also
the compile definition was renamed. As a result, translation
support wasn't actually enabled in the executables.