git.ipfire.org Git - thirdparty/xz.git/log
7 months ago xz: O_SEARCH cannot be used for fsync()
Lasse Collin [Sun, 5 Jan 2025 19:43:11 +0000 (21:43 +0200)] 
xz: O_SEARCH cannot be used for fsync()

Opening a directory with O_SEARCH results in a file descriptor that can
be used with functions like openat(). Such a file descriptor cannot be
used with fsync(). Use O_RDONLY instead.

In musl, O_SEARCH becomes Linux-specific O_PATH. A file descriptor
from O_PATH doesn't allow fsync().

It seems that it's not possible to fsync() a directory that has write
and search permissions but not read permission.

Fixes: 2a9e91d796d091740489d951fa7780525e4275f1
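The point of the commit above can be sketched in C. This is a minimal illustration, not the code from xz: the helper name sync_directory_sketch() is hypothetical, and the only claim taken from the commit message is that the directory has to be opened with O_RDONLY for fsync() to accept the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not xz's actual function): flush a directory
 * to disk. Returns 0 on success and -1 on error with errno set. */
int sync_directory_sketch(const char *dir_name)
{
    /* O_RDONLY gives a descriptor that fsync() accepts. A descriptor
     * from O_SEARCH (or from O_PATH, which musl uses for O_SEARCH)
     * would be rejected by fsync(). */
    const int fd = open(dir_name, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    const int ret = fsync(fd);

    /* close() may modify errno, so keep the value set by fsync(). */
    const int saved_errno = errno;
    (void)close(fd);
    errno = saved_errno;

    return ret;
}
```

The cost of this approach is the one the commit message names: a directory with write and search permissions but no read permission cannot be opened this way.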
7 months ago CI: Make ctest show errors from failed tests
Lasse Collin [Sun, 5 Jan 2025 18:48:28 +0000 (20:48 +0200)] 
CI: Make ctest show errors from failed tests

7 months ago tuklib_mbstr_nonprint: Preserve the value of errno
Lasse Collin [Sun, 5 Jan 2025 18:14:49 +0000 (20:14 +0200)] 
tuklib_mbstr_nonprint: Preserve the value of errno

A typical use case is like this:

    printf("%s: %s\n", tuklib_mask_nonprint(filename), strerror(errno));

tuklib_mask_nonprint() may call mbrtowc() and malloc() which may modify
errno. If errno isn't preserved, the error message might be wrong if
a compiler decides to call tuklib_mask_nonprint() before strerror().

Fixes: 40e573305535960574404d2eae848b248c95ea7e
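The save-and-restore pattern described above can be sketched as follows. Both function names are hypothetical stand-ins and the masking rule (ASCII control bytes only) is a simplification chosen for the example; only the idea of preserving errno across the call comes from the commit message.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical worker: returns a malloc()ed copy of str in which ASCII
 * control bytes are replaced with '?'. malloc() is allowed to modify
 * errno even when it succeeds. */
static char *mask_ascii_controls(const char *str)
{
    const size_t len = strlen(str);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; ++i) {
        const unsigned char c = (unsigned char)str[i];
        out[i] = (c < 0x20 || c == 0x7F) ? '?' : str[i];
    }

    out[len] = '\0';
    return out;
}

/* Wrapper that guarantees errno has the same value on return as it
 * had on entry, so strerror(errno) in the same printf() call is safe
 * regardless of argument evaluation order. */
char *mask_preserving_errno(const char *str)
{
    const int saved_errno = errno;
    char *masked = mask_ascii_controls(str);
    errno = saved_errno;
    return masked;
}
```

Unlike the function in the commit, this sketch returns memory that the caller must free().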
7 months ago xz: Use fsync() before deleting the input file, and add --no-sync
Lasse Collin [Sun, 5 Jan 2025 18:14:49 +0000 (20:14 +0200)] 
xz: Use fsync() before deleting the input file, and add --no-sync

xz's default behavior is to delete the input file after successful
compression or decompression (unless writing to standard output).
If the system crashes soon after the deletion, it is possible that
the newly written file has not yet hit the disk while the previous
delete operation might have. In that case neither the original file
nor the written file is available.

Call fsync() on the file. On POSIX systems, also sync the directory
where the file was created.

Add a new option --no-sync which disables fsync() usage. It can avoid
a (possibly significant) performance penalty when processing many
small files. It's fine to use --no-sync when one knows that the files
are easy to recreate or restore after a system crash.

Using fsync() after every flush initiated by --flush-timeout was
considered. It wasn't implemented, at least for now, for these reasons:

  - --flush-timeout is typically used when writing to stdout. If stdout
    is a file, xz cannot (portably) sync the directory of the file.
    One would need to create the output file first, sync the directory,
    and then run xz with fsync() enabled.

  - If xz --flush-timeout output goes to a file, it's possible to use
    a separate script to sync the file, for example, once per minute
    while telling xz to flush more frequently.

  - Not supporting syncing with --flush-timeout was simpler.

Portability notes:

  - On systems that lack O_SEARCH (like Linux), "xz dir/file" will now
    fail if "dir" cannot be opened for reading. Previously xz could
    still compress "dir/file" when "dir" had only write and search
    permissions (like d-wx------ in "ls -l"). Now that only works
    with --no-sync (or --keep or --stdout).

  - <libgen.h> and dirname() should be available on all POSIX systems,
    and aren't needed on non-POSIX systems.

  - fsync() is available on all POSIX systems. The directory syncing
    could be changed to fdatasync() although at least on ext4 it
    doesn't seem to make a performance difference in xz's usage.
    fdatasync() would need a build system check to support (old)
    special cases, for example, MINIX 3.3.0 doesn't have fdatasync()
    and Solaris 10 needs -lrt.

  - On native Windows, _commit() is used to replace fsync(). Directory
    syncing isn't done and shouldn't be needed. (In Cygwin, fsync() on
    directories is a no-op.)

  - DJGPP has fsync() for files. ;-)

Using fsync() was considered somewhere around 2009 and again in 2016 but
both times the idea was rejected. For comparison, GNU gzip 1.7 (2016)
added the option --synchronous which enables fsync().

Co-authored-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Fixes: https://bugs.debian.org/814089
Link: https://www.mail-archive.com/xz-devel@tukaani.org/msg00282.html
Closes: https://github.com/tukaani-project/xz/pull/151
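The ordering that the commit above relies on (sync the new file, sync its directory, and only then delete the input) can be sketched like this. The function name and its parameters are hypothetical, and the error handling is simplified; it is an illustration for POSIX systems under those assumptions, not the implementation in xz.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: dest_fd is the still-open output file,
 * dest_name is its path, and src_name is the input file to delete.
 * Returns 0 on success and -1 on error. */
int sync_then_remove_sketch(int dest_fd, const char *dest_name,
        const char *src_name)
{
    /* 1. Make sure the contents of the new file are on disk. */
    if (fsync(dest_fd) != 0)
        return -1;

    /* 2. Sync the directory so that the new directory entry is on
     *    disk too. dirname() may modify its argument, so it gets
     *    a copy of the name. */
    char *copy = strdup(dest_name);
    if (copy == NULL)
        return -1;

    const int dir_fd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    const int open_errno = errno;
    free(copy);
    if (dir_fd == -1) {
        errno = open_errno;
        return -1;
    }

    const int ret = fsync(dir_fd);
    const int sync_errno = errno;
    (void)close(dir_fd);
    if (ret != 0) {
        errno = sync_errno;
        return -1;
    }

    /* 3. Only now is it safe to remove the input file. */
    return unlink(src_name);
}
```

A --no-sync style switch would skip steps 1 and 2 and go straight to the unlink() call.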
7 months ago xz: Use "goto" for error handling in io_open_dest_real()
Lasse Collin [Fri, 27 Dec 2024 07:15:50 +0000 (09:15 +0200)] 
xz: Use "goto" for error handling in io_open_dest_real()

7 months ago liblzma: Always validate the first digit of a preset string
Lasse Collin [Sun, 5 Jan 2025 10:10:05 +0000 (12:10 +0200)] 
liblzma: Always validate the first digit of a preset string

lzma_str_to_filters() may call parse_lzma12_preset() in two ways. The
call from str_to_filters() detects the string type from the first
character(s) and as a side-effect it validates the first digit of
the preset string. So this change makes no difference there.

However, the call from parse_options() doesn't pre-validate the string.
parse_lzma12_preset() will return an invalid value which is passed to
lzma_lzma_preset() which safely rejects it. The bug still affects the
error message:

    $ xz --filters=lzma2:preset=X
    xz: Error in --filters=FILTERS option:
    xz: lzma2:preset=X
    xz:               ^
    xz: Unsupported preset

After the fix:

    $ xz --filters=lzma2:preset=X
    xz: Error in --filters=FILTERS option:
    xz: lzma2:preset=X
    xz:              ^
    xz: Unsupported preset

The ^ now correctly points to the X and not past it because the X itself
is the problematic character.

Fixes: cedeeca2ea6ada5b0411b2ae10d7a859e837f203
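Why validating the first digit inside the parser moves the ^ marker can be shown with a small sketch. The function below is hypothetical and much simpler than parse_lzma12_preset(): it accepts one digit and an optional 'e' flag and reports the position of the first offending character. The constant is named locally for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the "extreme" flag bit used in the sketch. */
#define PRESET_EXTREME_SKETCH UINT32_C(0x80000000)

/* Hypothetical parser: returns NULL on success, or a pointer to the
 * first character that is not acceptable. The caller can print a ^
 * under that position. */
const char *parse_preset_sketch(const char *str, uint32_t *preset)
{
    /* Validate the first character here instead of trusting every
     * caller to have done it. A bad first character is reported at
     * offset 0, not one past it. */
    if (str[0] < '0' || str[0] > '9')
        return str;

    *preset = (uint32_t)(str[0] - '0');

    const char *p = str + 1;
    if (*p == 'e') {
        *preset |= PRESET_EXTREME_SKETCH;
        ++p;
    }

    return *p == '\0' ? NULL : p;
}
```

With the input "X" the returned pointer equals the start of the string, which corresponds to the corrected ^ position in the second error message above.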
7 months ago xz: Fix getopt_long argument type in --filters*
Lasse Collin [Sun, 5 Jan 2025 09:40:34 +0000 (11:40 +0200)] 
xz: Fix getopt_long argument type in --filters*

Forgetting the argument (or not using = to separate the option from
the argument) resulted in lzma_str_to_filters() being called with NULL
as input string argument. The function handles it fine but xz passes
the NULL to printf() too:

    $ xz --filters
    xz: Error in --filters=FILTERS option:
    xz: (null)
    xz: ^
    xz: Unexpected NULL pointer argument(s) to lzma_str_to_filters()

Now it's correct:

    $ xz --filters
    xz: option '--filters' requires an argument

The --filters-help option doesn't take any arguments.

Fixes: 9ded880a0221f4d1256845fc4ab957ffd377c760
Fixes: d6af7f347077b22403133239592e478931307759
Fixes: a165d7df1964121eb9df715e6f836a31c865beef
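The difference between the argument types can be sketched with a tiny getopt_long() table. The option names match the commit message, but the function, the enum values, and the return codes are hypothetical; this is not the option table from xz.

```c
#include <getopt.h>
#include <stddef.h>

enum { OPT_FILTERS = 256, OPT_FILTERS_HELP };

/* Hypothetical parser: returns 0 on success, 1 if --filters-help was
 * given, and -1 if getopt_long() reported an error. */
int parse_args_sketch(int argc, char **argv, const char **filters)
{
    static const struct option long_opts[] = {
        /* required_argument makes getopt_long() itself reject
         * a missing argument, so optarg is never NULL here. */
        { "filters",      required_argument, NULL, OPT_FILTERS },
        { "filters-help", no_argument,       NULL, OPT_FILTERS_HELP },
        { NULL, 0, NULL, 0 }
    };

    optind = 1; /* Restart scanning so the function can be reused. */

    int c;
    while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
        switch (c) {
        case OPT_FILTERS:
            *filters = optarg;
            break;

        case OPT_FILTERS_HELP:
            return 1;

        default:
            /* '?' was returned; getopt_long() already printed
             * a message such as "requires an argument". */
            return -1;
        }
    }

    return 0;
}
```

With optional_argument instead, "--filters" without "=VALUE" would reach the OPT_FILTERS case with optarg set to NULL, which is the situation the commit fixes.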
7 months ago xzdec: Don't leave Landlock file descriptor open for no reason
Lasse Collin [Sat, 4 Jan 2025 18:04:56 +0000 (20:04 +0200)] 
xzdec: Don't leave Landlock file descriptor open for no reason

This fix is similar to 48ff3f06521ca326996ab9a04d1b342098960427.

Fixes: d74fb5f060b76db709b50f5fd37490394e52f975
7 months ago xz: Make --single-stream imply --keep
Lasse Collin [Sat, 4 Jan 2025 18:02:18 +0000 (20:02 +0200)] 
xz: Make --single-stream imply --keep

Suggested by xx on #tukaani on 2024-04-12.

7 months ago Update AUTHORS
Lasse Collin [Sat, 4 Jan 2025 17:57:07 +0000 (19:57 +0200)] 
Update AUTHORS

The contributions have been rewritten.

7 months ago xz: Avoid printf formats like %2$s
Lasse Collin [Sat, 4 Jan 2025 13:02:16 +0000 (15:02 +0200)] 
xz: Avoid printf formats like %2$s

It's a POSIX feature that isn't in standard C. It's not available on
Windows. Even MinGW-w64 with __USE_MINGW_ANSI_STDIO doesn't support
it even though it supports POSIX %'d for thousand separators.

Gettext's <libintl.h> provides overrides for printf and other functions
which do support the %2$s formats. Translations use them. But xz should
work on Windows without <libintl.h> too.

Fixes: 3e9177fd206d20d6d8acc7d203c25a9ae0549229
7 months ago tuklib_mbstr_wrap: Add printf format attribute
Lasse Collin [Sat, 4 Jan 2025 12:41:37 +0000 (14:41 +0200)] 
tuklib_mbstr_wrap: Add printf format attribute

It's supported by GCC 3.x already.

7 months ago xz: Translate a Windows-specific string
Lasse Collin [Sat, 4 Jan 2025 11:44:12 +0000 (13:44 +0200)] 
xz: Translate a Windows-specific string

Originally I thought that native Windows builds wouldn't be translated
but nowadays at least MSYS2 ships such binaries.

7 months ago xz: Use my_landlock.h
Lasse Collin [Thu, 2 Jan 2025 13:32:10 +0000 (15:32 +0200)] 
xz: Use my_landlock.h

A slightly silly thing is that xz may now query the ABI version up to
three times. We could call my_landlock_ruleset_attr_forbid_all() only
once and cache the result but it didn't seem worth doing.

7 months ago xzdec: Use my_landlock.h
Lasse Collin [Thu, 2 Jan 2025 13:32:10 +0000 (15:32 +0200)] 
xzdec: Use my_landlock.h

7 months ago Add my_landlock.h with helper functions to use Linux Landlock
Lasse Collin [Thu, 2 Jan 2025 13:32:10 +0000 (15:32 +0200)] 
Add my_landlock.h with helper functions to use Linux Landlock

This supports up to Landlock ABI version 6. The current code in
xz and xzdec only supports up to ABI version 4.

7 months ago liblzma: Silence warnings from "clang -Wimplicit-fallthrough"
Lasse Collin [Wed, 1 Jan 2025 16:46:50 +0000 (18:46 +0200)] 
liblzma: Silence warnings from "clang -Wimplicit-fallthrough"

7 months ago Build: Use -Wimplicit-fallthrough=5 when supported
Lasse Collin [Wed, 1 Jan 2025 13:34:51 +0000 (15:34 +0200)] 
Build: Use -Wimplicit-fallthrough=5 when supported

Now that we have the FALLTHROUGH macro, use the strictest mode with
GCC so that comment-based fallthrough markings are no longer accepted.

In GCC, -Wextra includes -Wimplicit-fallthrough=3 and
-Wimplicit-fallthrough is the same as -Wimplicit-fallthrough=3.
Thus, the strict mode requires specifying -Wimplicit-fallthrough=5.

Clang has -Wimplicit-fallthrough which is *not* enabled by -Wextra.
Clang doesn't have a variant that takes an argument. Thus we need
to check for -Wimplicit-fallthrough. Do it before checking for
-Wimplicit-fallthrough=5 so that the latter overrides the former
when using GCC.
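A macro of this kind could look roughly like the sketch below. The definition is an assumption made for illustration and may differ from the one in sysdefs.h; the function sum_down_sketch() is hypothetical too. The attribute form is the kind of marking that -Wimplicit-fallthrough=5 still accepts, while a "Fall through" comment is not.

```c
/* Assumed definition for illustration; the real macro may differ.
 * Use the fallthrough attribute when the compiler has it and fall
 * back to a no-op statement otherwise. */
#if defined(__has_attribute)
#   if __has_attribute(__fallthrough__)
#       define FALLTHROUGH __attribute__((__fallthrough__))
#   endif
#endif

#ifndef FALLTHROUGH
#   define FALLTHROUGH ((void)0)
#endif

/* Hypothetical example: adds up n + (n - 1) + ... + 1 for n <= 3 by
 * deliberately falling through the cases. */
unsigned sum_down_sketch(unsigned n)
{
    unsigned total = 0;

    switch (n) {
    case 3:
        total += 3;
        FALLTHROUGH;

    case 2:
        total += 2;
        FALLTHROUGH;

    case 1:
        total += 1;
        break;

    default:
        break;
    }

    return total;
}
```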

7 months ago Replace "Fall through" comments with FALLTHROUGH
Lasse Collin [Wed, 1 Jan 2025 13:30:50 +0000 (15:30 +0200)] 
Replace "Fall through" comments with FALLTHROUGH

7 months ago sysdefs.h: Add FALLTHROUGH macro
Lasse Collin [Wed, 1 Jan 2025 13:08:51 +0000 (15:08 +0200)] 
sysdefs.h: Add FALLTHROUGH macro

7 months ago xzdec: Fix language in a comment
Lasse Collin [Wed, 1 Jan 2025 13:06:15 +0000 (15:06 +0200)] 
xzdec: Fix language in a comment

7 months ago Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1
Lasse Collin [Thu, 2 Jan 2025 11:35:48 +0000 (13:35 +0200)] 
Windows: Make NLS require UCRT and gettext-runtime >= 0.23.1

Also remove the recently-added workaround from tuklib_gettext.h.
Requiring a new enough gettext-runtime is cleaner. I guess it's
mostly MSYS2 where xz is built with translation support, so once
MSYS2 has Gettext >= 0.23.1, this requirement shouldn't be a problem
in practice.

7 months ago windows/build-with-cmake.bat: Fix ENABLE_NLS to XZ_NLS
Lasse Collin [Thu, 2 Jan 2025 09:52:17 +0000 (11:52 +0200)] 
windows/build-with-cmake.bat: Fix ENABLE_NLS to XZ_NLS

Fixes: 29f77c7b707f2458fb047e77497354b195e05b14
7 months ago Build: Use git log --pretty=medium when creating ChangeLog
Lasse Collin [Mon, 30 Dec 2024 09:21:57 +0000 (11:21 +0200)] 
Build: Use git log --pretty=medium when creating ChangeLog

It's the default in git-log. Specifying it explicitly is good in case
a user has set format.pretty to a different value.

7 months ago Windows: Update MinGW-w64 + CMake instructions to recommend UCRT
Lasse Collin [Mon, 30 Dec 2024 08:51:33 +0000 (10:51 +0200)] 
Windows: Update MinGW-w64 + CMake instructions to recommend UCRT

7 months ago xz man page: Describe the source file deletion in -z and -d options
Lasse Collin [Mon, 30 Dec 2024 08:51:26 +0000 (10:51 +0200)] 
xz man page: Describe the source file deletion in -z and -d options

The DESCRIPTION section always explained it, and the OPTIONS section
only described the differences to the default behavior. However, new
users in a hurry may skip reading DESCRIPTION. The default behavior
is a bit dangerous, thus it's good to repeat in --compress and
--decompress docs that the source file is removed after successful operation.

Fixes: https://github.com/tukaani-project/xz/issues/150
7 months ago Build: Set libtool -version-info so that it matches with CMake
Lasse Collin [Fri, 27 Dec 2024 19:52:28 +0000 (21:52 +0200)] 
Build: Set libtool -version-info so that it matches with CMake

In the past, they haven't been in sync in development versions
although they (of course) have been in stable releases.

7 months ago CMake/macOS: Use GNU Libtool compatible shared library versioning
Lasse Collin [Sat, 28 Dec 2024 16:28:56 +0000 (18:28 +0200)] 
CMake/macOS: Use GNU Libtool compatible shared library versioning

Because this increases the Mach-O compatibility_version, this commit
shouldn't cause any ABI compatibility trouble for existing CMake users
on macOS. This is assuming that they won't later downgrade to an older
liblzma version that was built with CMake before this commit.

Meson allows customising the Mach-O versioning too. So the three
build systems can be configured to be compatible.

7 months ago CMake: Edit a comment
Lasse Collin [Sat, 28 Dec 2024 12:49:45 +0000 (14:49 +0200)] 
CMake: Edit a comment

7 months ago version.sh: Omit an unwanted dot from development versions
Lasse Collin [Sat, 28 Dec 2024 18:39:49 +0000 (20:39 +0200)] 
version.sh: Omit an unwanted dot from development versions

It printed 5.7.0.alpha instead of 5.7.0alpha.

Fixes: e7a42cda7c827e016619e8cab15e2faf5d4181ae
7 months ago CMake: Remove a duplicate word from a comment
Lasse Collin [Fri, 27 Dec 2024 14:25:07 +0000 (16:25 +0200)] 
CMake: Remove a duplicate word from a comment

7 months ago INSTALL: Document CMAKE_DLL_NAME_WITH_SOVERSION
Lasse Collin [Fri, 27 Dec 2024 14:23:12 +0000 (16:23 +0200)] 
INSTALL: Document CMAKE_DLL_NAME_WITH_SOVERSION

7 months ago xz: Fix comments
Lasse Collin [Thu, 26 Dec 2024 19:27:18 +0000 (21:27 +0200)] 
xz: Fix comments

7 months ago CMake: Disable unity builds project-wide
Dexter Castor Döpping [Sun, 22 Dec 2024 12:44:03 +0000 (13:44 +0100)] 
CMake: Disable unity builds project-wide

liblzma and xz can't be compiled as a unity/jumbo build because of
redeclarations and type name reuse. The CMake documentation recommends
setting UNITY_BUILD to false in this case.

This is especially important if we're compiled as a subproject and the
consumer wants to use CMAKE_UNITY_BUILD=ON for the rest of their code
base.

Closes: https://github.com/tukaani-project/xz/pull/158
7 months ago Windows: Workaround a UTF-8 issue in Gettext's libintl_setlocale()
Lasse Collin [Fri, 20 Dec 2024 06:51:18 +0000 (08:51 +0200)] 
Windows: Workaround a UTF-8 issue in Gettext's libintl_setlocale()

See the comment. In this package, locale is set at program startup and
not changed later, so the point (2) in the comment isn't a problem.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
7 months ago Revert "Windows: Use UTF-8 locale when active code page is UTF-8"
Lasse Collin [Fri, 20 Dec 2024 04:50:36 +0000 (06:50 +0200)] 
Revert "Windows: Use UTF-8 locale when active code page is UTF-8"

This reverts commit 0d0b574cc45045d6150d397776340c068df59e2a.

7 months ago xzdec: Use setlocale() instead of tuklib_gettext_setlocale()
Lasse Collin [Thu, 19 Dec 2024 16:31:09 +0000 (18:31 +0200)] 
xzdec: Use setlocale() instead of tuklib_gettext_setlocale()

xzdec isn't translated and doesn't need libintl on Windows even
when NLS is enabled, thus libintl_setlocale() cannot interfere
with the locale settings. Thus, standard setlocale() works perfectly.

In the commit 78868b6e, the explanation in the commit message is wrong.

Fixes: 78868b6ed63fa4c89f73e3dfed27abfb8b0d46db
7 months ago Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation
Lasse Collin [Thu, 19 Dec 2024 17:36:15 +0000 (19:36 +0200)] 
Windows: Revert the setlocale(LC_ALL, ".UTF8") documentation

Only leave the FindFirstFileA() notes from 20dfca81, reverting
the incorrect setlocale() notes. On Windows, Gettext's <libintl.h>
overrides setlocale() with libintl_setlocale() wrapper. I hadn't
noticed this, and thus my conclusions were wrong.

Fixes: 20dfca8171dad4c64785ac61d5b68972c444877b
7 months ago tuklib_mbstr_wrap: Silence a warning from Clang
Lasse Collin [Wed, 18 Dec 2024 15:49:05 +0000 (17:49 +0200)] 
tuklib_mbstr_wrap: Silence a warning from Clang

Fixes: ca529c3f41a4a19a59e2e252e6dd9255f130c634
7 months ago Update THANKS
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)] 
Update THANKS

7 months ago Update TODO
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)] 
Update TODO

Fixes: 5f6dddc6c911df02ba660564e78e6de80947c947
7 months ago lzmainfo: Use tuklib_mbstr_nonprint
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)] 
lzmainfo: Use tuklib_mbstr_nonprint

7 months ago xzdec: Use tuklib_mbstr_nonprint
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)] 
xzdec: Use tuklib_mbstr_nonprint

7 months ago xz: Use tuklib_mbstr_nonprint
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)] 
xz: Use tuklib_mbstr_nonprint

Call tuklib_mask_nonprint() on filenames and on a few other
strings from the command line.

The filename printed by "xz --robot --list" (in list.c) is also masked.
It's good to get rid of tabs and newlines which would desync the output
but masking other chars wouldn't be strictly necessary. It might matter
with sensible filenames if LC_CTYPE is "C" (when iswprint() might reject
non-ASCII chars) and a script wants to read a filename from xz's output.
Hopefully it's an unusual enough corner case to not be a real problem.

7 months ago Add tuklib_mbstr_nonprint to mask non-printable characters
Lasse Collin [Wed, 18 Dec 2024 12:00:09 +0000 (14:00 +0200)] 
Add tuklib_mbstr_nonprint to mask non-printable characters

Malicious filenames or other untrusted strings may affect the state of
the terminal when such strings are printed as part of (error) messages.
Add functions that mask such characters.

It's not enough to handle only single-byte control characters.
In multibyte locales, some control characters are multibyte too, for
example, terminals interpret C1 control characters (U+0080 to U+009F)
that are two bytes as UTF-8.

Instead of checking for control characters with iswcntrl(), this
uses iswprint() to detect printable characters. This is much stricter.
On Windows it's actually too strict as it rejects some characters that
definitely are printable.

Gnulib's quotearg would do a lot more but I hope this simpler method
is good enough here.

Thanks to Ryan Colyer for the discussion about the problems of
the earlier single-byte-only method.

Thanks to Christian Weisgerber for reporting a bug in an earlier
version of this code.

Thanks to Jeroen Roovers for a typo fix.

Closes: https://github.com/tukaani-project/xz/pull/118
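The iswprint()-based approach described above can be sketched as follows. This is an illustration under assumptions, not the code from tuklib_mbstr_nonprint: the function name, the caller-provided buffer, and the choice of '?' as the replacement are all made up for the example. The result depends on the current LC_CTYPE locale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical sketch: copy src to dst (capacity dst_size bytes) and
 * replace every character that iswprint() rejects with '?'. Invalid
 * or incomplete multibyte sequences are masked one byte at a time.
 * Returns the number of replacements made. */
size_t mask_nonprint_sketch(const char *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return 0;

    mbstate_t mbs;
    memset(&mbs, 0, sizeof(mbs));

    size_t in_left = strlen(src);
    size_t out = 0;
    size_t masked = 0;

    while (in_left > 0 && out + 1 < dst_size) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, src, in_left, &mbs);
        bool printable;

        if (n == 0 || n == (size_t)-1 || n == (size_t)-2) {
            /* Undecodable input: consume one byte and start over
             * from the initial conversion state. */
            n = 1;
            printable = false;
            memset(&mbs, 0, sizeof(mbs));
        } else {
            printable = iswprint((wint_t)wc) != 0;
        }

        if (!printable) {
            dst[out++] = '?';
            ++masked;
        } else if (out + n < dst_size) {
            memcpy(dst + out, src, n);
            out += n;
        } else {
            break; /* No room for the whole character. */
        }

        src += n;
        in_left -= n;
    }

    dst[out] = '\0';
    return masked;
}
```

Because the check is "printable or not" instead of "control character or not", multibyte C1 controls are caught without listing them explicitly, at the cost of the over-strictness the commit message mentions.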
7 months ago Translations: Add preliminary Georgian translation
Lasse Collin [Wed, 18 Dec 2024 09:33:09 +0000 (11:33 +0200)] 
Translations: Add preliminary Georgian translation

Most of the auto-wrapped strings are translated already. A few
strings have changed since this was created though. This file
isn't in the Translation Project *yet* because these strings
are still very new.

Closes: https://github.com/tukaani-project/xz/pull/145
7 months ago xz: Make one string simpler for translators
Lasse Collin [Wed, 30 Oct 2024 18:50:20 +0000 (20:50 +0200)] 
xz: Make one string simpler for translators

Leading spaces in the string can get miscounted by translators.

7 months ago lzmainfo: Sync the translatable strings with xz
Lasse Collin [Tue, 17 Dec 2024 08:26:10 +0000 (10:26 +0200)] 
lzmainfo: Sync the translatable strings with xz

7 months ago xz: Use automatic word wrapping for help texts
Lasse Collin [Tue, 17 Dec 2024 08:26:10 +0000 (10:26 +0200)] 
xz: Use automatic word wrapping for help texts

--long-help is now one line longer because --lzma1 is now on its
own line.

7 months ago po/Makevars: Add --keyword=W_:... to XGETTEXT_OPTIONS
Lasse Collin [Mon, 16 Dec 2024 16:46:45 +0000 (18:46 +0200)] 
po/Makevars: Add --keyword=W_:... to XGETTEXT_OPTIONS

The text was copied from tuklib_gettext.h.

Also rearrange the --keyword options to be last on the line.

7 months ago Add tuklib_mbstr_wrap for automatic word wrapping
Lasse Collin [Mon, 16 Dec 2024 16:43:52 +0000 (18:43 +0200)] 
Add tuklib_mbstr_wrap for automatic word wrapping

Automatic word wrapping makes translators' work easier and reduces
errors like misaligned columns or overlong lines. Right-to-left
languages and languages that don't use spaces between words will
still need extra effort. (xz hasn't been translated to any RTL
language so far.)

7 months ago Build: Sort filenames to ASCII order in Makefile.am
Lasse Collin [Tue, 17 Dec 2024 15:57:18 +0000 (17:57 +0200)] 
Build: Sort filenames to ASCII order in Makefile.am

7 months ago tuklib_mbstr_width: Add tuklib_mbstr_width_mem()
Lasse Collin [Mon, 21 Oct 2024 15:51:24 +0000 (18:51 +0300)] 
tuklib_mbstr_width: Add tuklib_mbstr_width_mem()

It's a new function split from tuklib_mbstr_width().
It's useful with partial strings that aren't terminated with \0.

7 months ago tuklib_mbstr_width: Update a comment about shift states
Lasse Collin [Mon, 16 Dec 2024 18:08:27 +0000 (20:08 +0200)] 
tuklib_mbstr_width: Update a comment about shift states

7 months ago tuklib_mbstr_width: Don't mention shift states in the API docs
Lasse Collin [Mon, 21 Oct 2024 15:47:56 +0000 (18:47 +0300)] 
tuklib_mbstr_width: Don't mention shift states in the API docs

It is assumed that this code won't be used with charsets that use
locking shift states.

7 months ago tuklib_mbstr_width: Use stricter return value checking
Lasse Collin [Mon, 21 Oct 2024 15:41:41 +0000 (18:41 +0300)] 
tuklib_mbstr_width: Use stricter return value checking

This should make no difference in practice (at least if mbrtowc()
isn't broken).

7 months ago tuklib_mbstr_width: Change the behavior when wcwidth() is not available
Lasse Collin [Mon, 16 Dec 2024 18:06:07 +0000 (20:06 +0200)] 
tuklib_mbstr_width: Change the behavior when wcwidth() is not available

If wcwidth() isn't available (Windows), previously it was assumed
that one byte == one column in the terminal. Now it is assumed that
one multibyte character == one column. This works better with UTF-8.
Languages that only use single-width characters without any combining
characters should work correctly with this.

In xz, none of po/*.po contain combining characters and only ko.po,
zh_CN.po, and zh_TW.po contain fullwidth characters. Thus, "only"
those three translations in xz are broken on Windows with the
UTF-8 code page. Broken means that column headings in xz -lvv and
(only in the master branch) strings in --long-help are misaligned,
so it's not a huge problem. I don't know if those three languages
displayed perfectly before the UTF-8 change because I hadn't tested
translations with native Windows builds before.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
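The fallback described above (one multibyte character == one column) can be sketched like this. The function name is hypothetical and the code is an illustration, not the body of tuklib_mbstr_width(). It assumes an encoding without locking shift states, and the result depends on the current LC_CTYPE locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch of a width function for systems that lack
 * wcwidth(): every successfully decoded multibyte character counts
 * as one column. Returns (size_t)-1 if the string cannot be decoded. */
size_t char_count_width_sketch(const char *str)
{
    mbstate_t mbs;
    memset(&mbs, 0, sizeof(mbs));

    size_t len = strlen(str);
    size_t width = 0;

    while (len > 0) {
        wchar_t wc;
        const size_t n = mbrtowc(&wc, str, len, &mbs);

        /* 0 cannot legitimately occur before the terminator;
         * -1 is an invalid sequence and -2 is a truncated one. */
        if (n == 0 || n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;

        str += n;
        len -= n;
        ++width;
    }

    return width;
}
```

Counting bytes instead would overestimate the width of every non-ASCII character in UTF-8. Counting characters is still wrong for fullwidth and combining characters, which is the limitation the commit message describes.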
7 months ago xzdec: Use setlocale() via tuklib_gettext_setlocale()
Lasse Collin [Wed, 18 Dec 2024 12:23:13 +0000 (14:23 +0200)] 
xzdec: Use setlocale() via tuklib_gettext_setlocale()

xzdec isn't translated and didn't have locale-specific behavior
in the past. On Windows with UTF-8 in the application manifest,
setting the locale makes a difference though:

  - Without any setlocale() call, non-ASCII filenames don't display
    properly in Command Prompt unless one first uses "chcp 65001"
    to set the console code page to UTF-8.

  - setlocale(LC_ALL, "") is enough to make non-ASCII filenames
    print correctly in Command Prompt without using "chcp 65001",
    assuming that the non-UTF-8 code page (like 850) supports
    those non-ASCII characters.

  - setlocale(LC_ALL, ".UTF8") is even better because then mbrtowc() and
    such functions use a UTF-8 locale instead of a legacy code page.
    The tuklib_gettext_setlocale() macro takes care of this (without
    enabling any translations).

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
7 months ago Windows: Use UTF-8 locale when active code page is UTF-8
Lasse Collin [Tue, 17 Dec 2024 12:59:37 +0000 (14:59 +0200)] 
Windows: Use UTF-8 locale when active code page is UTF-8

XZ Utils 5.6.3 set the active code page to UTF-8 to fix CVE-2024-47611.
This wasn't paired with UCRT-specific setlocale(LC_ALL, ".UTF8"), thus
non-ASCII characters from translations became mojibake.

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
7 months ago Windows: Document the need for setlocale(LC_ALL, ".UTF8")
Lasse Collin [Tue, 17 Dec 2024 13:01:29 +0000 (15:01 +0200)] 
Windows: Document the need for setlocale(LC_ALL, ".UTF8")

Also warn about unpaired surrogates and the (somewhat UTF-8-specific)
MAX_PATH issue in FindFirstFileA().

Fixes: 46ee0061629fb075d61d83839e14dd193337af59
7 months ago xzdec: Call tuklib_progname_init() early enough
Lasse Collin [Wed, 18 Dec 2024 12:12:22 +0000 (14:12 +0200)] 
xzdec: Call tuklib_progname_init() early enough

If the early pledge() call on OpenBSD fails, it calls my_errorf()
which requires the "progname" variable.

Fixes: d74fb5f060b76db709b50f5fd37490394e52f975
7 months ago CMake: Bump maximum policy version to 3.31
Lasse Collin [Sun, 15 Dec 2024 17:08:32 +0000 (19:08 +0200)] 
CMake: Bump maximum policy version to 3.31

With CMake 3.31, there were a few warnings from
CMP0177 "install() DESTINATION paths are normalized".
These occurred because the install(FILES) command in
my_install_man_lang() is called with a DESTINATION path
that contains two consecutive slashes, for example,
"share/man//man1". Such a path is for the English man pages.
With translated man pages, the language code goes between
the slashes. The warning was probably triggered because the
extra slash gets removed by the normalization.

7 months ago Update THANKS
Lasse Collin [Sun, 15 Dec 2024 16:35:27 +0000 (18:35 +0200)] 
Update THANKS

7 months ago liblzma: Fix incorrect macro name in a comment
Dexter Castor Döpping [Sun, 8 Dec 2024 17:24:29 +0000 (18:24 +0100)] 
liblzma: Fix incorrect macro name in a comment

Fixes: 33b8a24b6646a9dbfd8358405aec466b13078559
Closes: https://github.com/tukaani-project/xz/pull/155
7 months ago license-check.sh: Add an exception for doc/SHA256SUMS
Lasse Collin [Tue, 17 Dec 2024 08:36:43 +0000 (10:36 +0200)] 
license-check.sh: Add an exception for doc/SHA256SUMS

Fixes: 36b531022f24a2ab57a2dfb9e5052f1c176e9d9a
8 months ago doc/SHA256SUMS: Add the list of SHA-256 hashes of release files
Lasse Collin [Sun, 1 Dec 2024 19:38:17 +0000 (21:38 +0200)] 
doc/SHA256SUMS: Add the list of SHA-256 hashes of release files

The release files are signed but verifying the signatures cannot
catch certain types of attacks:

1. A malicious maintainer could make more than one variant of
   a package. One could be for general distribution. Another
   with malicious content could be targeted to specific users,
   for example, distributing the malicious version on a mirror
   controlled by the attacker.

2. If the signing key of an honest maintainer was compromised
   without being detected, a similar situation as described
   above could occur.

SHA256SUMS could be put on the project website but having it in
the Git repository makes it obvious that old lines aren't modified
when the file is updated.

Hashes of uncompressed files are included too. This way tarballs
can be recompressed and the hashes can still be verified.

8 months ago Docs: Remove .github/SECURITY.md
Lasse Collin [Sat, 30 Nov 2024 10:05:59 +0000 (12:05 +0200)] 
Docs: Remove .github/SECURITY.md

One of the reasons to have this file in the xz repository was to
show vulnerability reporting info in the Security section on GitHub.
On 2024-11-25, I added SECURITY.md to the tukaani-project organization
on GitHub:

    https://github.com/tukaani-project/.github/blob/main/SECURITY.md

GitHub shows that file in all projects in the organization unless
overridden by a project-specific SECURITY.md. Thus, removing
the file from the xz repo makes GitHub show the organization-wide
text instead.

Maintaining a single copy for the whole GitHub organization makes
things simpler. It's also nicer to have fewer GitHub-specific files
in the xz repo. Information on how to report bugs (including security
issues) is available in README and on the home page too.

The OpenSSF Scorecard tool didn't find .github/SECURITY.md in the
xz repository. There was a suggestion to move the file to the top-level
directory where Scorecard should find it. However, Scorecard does find
the organization-wide SECURITY.md. Thus, the file isn't needed in the
xz repository to score points in the Scorecard game:

    https://scorecard.dev/viewer/?uri=github.com/tukaani-project/xz

Closes: https://github.com/tukaani-project/xz/issues/148
Closes: https://github.com/tukaani-project/xz/pull/149
8 months ago Translations: Update the Chinese (traditional) translation
Lasse Collin [Sat, 30 Nov 2024 08:27:14 +0000 (10:27 +0200)] 
Translations: Update the Chinese (traditional) translation

8 months ago liblzma: Optimize the loop conditions in BCJ filters
Lasse Collin [Wed, 30 Oct 2024 17:54:34 +0000 (19:54 +0200)] 
liblzma: Optimize the loop conditions in BCJ filters

Compilers cannot optimize the addition "i + 4" away since theoretically
it could overflow.
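The kind of rewrite this refers to can be sketched as below. It assumes that the fix is to adjust the bound once before the loop; the two functions are hypothetical and only mimic the shape of a BCJ filter loop that handles 4-byte units and returns the number of bytes processed.

```c
#include <stddef.h>
#include <stdint.h>

/* "i + 4" is evaluated on every iteration. Unsigned arithmetic wraps
 * around, so the compiler cannot assume that "i + 4 <= size" means
 * the same as "i <= size - 4". */
size_t scan_naive_sketch(uint8_t *buf, size_t size)
{
    size_t i;
    for (i = 0; i + 4 <= size; i += 4)
        buf[i] ^= 0xFF;

    return i;
}

/* The subtraction is done once, after checking that it cannot wrap.
 * The loop condition then compares i against a fixed bound. */
size_t scan_hoisted_sketch(uint8_t *buf, size_t size)
{
    if (size < 4)
        return 0;

    size -= 4;

    size_t i;
    for (i = 0; i <= size; i += 4)
        buf[i] ^= 0xFF;

    return i;
}
```

Both functions touch the same bytes and return the same value for every size; only the form of the loop condition differs.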

8 months ago Update THANKS
Lasse Collin [Mon, 25 Nov 2024 14:26:54 +0000 (16:26 +0200)] 
Update THANKS

8 months ago xz: Landlock: Fix a file descriptor leak
Mark Wielaard [Mon, 25 Nov 2024 10:28:44 +0000 (12:28 +0200)] 
xz: Landlock: Fix a file descriptor leak

10 months ago CI: update FreeBSD, NetBSD, OpenBSD, Solaris actions
Sam James [Wed, 2 Oct 2024 02:04:03 +0000 (03:04 +0100)] 
CI: update FreeBSD, NetBSD, OpenBSD, Solaris actions

Checked the changes and they're all innocuous. This should hopefully
fix the "externally managed" pip error in these jobs that started
recently.

10 months ago Add NEWS for 5.6.3
Lasse Collin [Tue, 1 Oct 2024 09:17:39 +0000 (12:17 +0300)] 
Add NEWS for 5.6.3

10 months ago cmake/tuklib_large_file_support.cmake: Add a missing include
Lasse Collin [Tue, 1 Oct 2024 11:49:41 +0000 (14:49 +0300)] 
cmake/tuklib_large_file_support.cmake: Add a missing include

v5.2 didn't build with CMake. Other branches had
include(CMakePushCheckState) in top-level CMakeLists.txt
which made the build work.

Fixes: 597f49b61475438a43a417236989b2acc968a686
10 months ago Update THANKS
Lasse Collin [Tue, 1 Oct 2024 09:10:23 +0000 (12:10 +0300)] 
Update THANKS

10 months ago Tests/Windows: Add the application manifest to the test programs
Lasse Collin [Tue, 1 Oct 2024 09:10:23 +0000 (12:10 +0300)] 
Tests/Windows: Add the application manifest to the test programs

This ensures that the test programs get executed the same way as
the binaries that are installed.

10 months ago license-check.sh: Add an exception for w32_application.manifest
Lasse Collin [Tue, 1 Oct 2024 09:10:23 +0000 (12:10 +0300)] 
license-check.sh: Add an exception for w32_application.manifest

The file gets embedded as is into executables, thus it cannot
hold a license identifier.

10 months ago Windows: Embed an application manifest in the EXE files
Lasse Collin [Tue, 1 Oct 2024 09:10:23 +0000 (12:10 +0300)] 
Windows: Embed an application manifest in the EXE files

IMPORTANT: This includes a security fix to command line tool
           argument handling.

Some toolchains embed an application manifest by default to declare
UAC-compliance. Some also declare compatibility with Vista/8/8.1/10/11
to let the app access features newer than those of Vista.

We want all the above but also two more things:

  - Declare that the app is long path aware to support paths longer
    than 259 characters (this may also require a registry change).

  - Force the code page to UTF-8. This allows the command line tools
    to access files whose names contain characters that don't exist
    in the current legacy code page (except unpaired surrogates).
    The UTF-8 code page also fixes security issues in command line
    argument handling which can be exploited with malicious filenames.
    See the new file w32_application.manifest.comments.txt.

Thanks to Orange Tsai and splitline from DEVCORE Research Team
for discovering this issue.

Thanks to Vijay Sarvepalli for reporting the issue to me.

Thanks to Kelvin Lee for testing with MSVC and helping with
the required build system fixes.

10 months agoWindows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2
Lasse Collin [Sun, 29 Sep 2024 11:46:52 +0000 (14:46 +0300)] 
Windows: Set DLL name accurately in StringFileInfo on Cygwin and MSYS2

Now the information in the "Details" tab in the file properties
dialog matches the naming convention of Cygwin and MSYS2. This
is only a cosmetic change.

10 months agocommon_w32res.rc: White space edits
Lasse Collin [Wed, 25 Sep 2024 12:47:55 +0000 (15:47 +0300)] 
common_w32res.rc: White space edits

LANGUAGE and VS_VERSION_INFO begin new statements so put an empty line
between them.

10 months agoCMake: Add the resource files to the Cygwin and MSYS2 builds
Lasse Collin [Sat, 28 Sep 2024 17:09:50 +0000 (20:09 +0300)] 
CMake: Add the resource files to the Cygwin and MSYS2 builds

The Autotools-based build has always done this, so this is for consistency.

However, the CMake build won't create the DEF file when building
for Cygwin or MSYS2 because in that context it should be useless.
(If Cygwin or MSYS2 is used to host building of normal Windows
binaries then the DEF file is still created.)

10 months agoCMake: Fix Windows resource file dependencies
Lasse Collin [Sat, 28 Sep 2024 12:19:14 +0000 (15:19 +0300)] 
CMake: Fix Windows resource file dependencies

If common_w32res.rc is modified, the resource files need to be rebuilt.
In contrast, the liblzma*.map files truly are link dependencies.

10 months agoCMake: Checking for CYGWIN covers MSYS2 too
Lasse Collin [Sat, 28 Sep 2024 22:20:03 +0000 (01:20 +0300)] 
CMake: Checking for CYGWIN covers MSYS2 too

On MSYS2, both CYGWIN and MSYS are set.

10 months agoTranslations: Add the SPDX license identifier to pt_BR.po
Lasse Collin [Sat, 28 Sep 2024 06:37:30 +0000 (09:37 +0300)] 
Translations: Add the SPDX license identifier to pt_BR.po

10 months agoWindows/CMake: Use the correct resource file for lzmadec.exe
Lasse Collin [Wed, 25 Sep 2024 13:41:37 +0000 (16:41 +0300)] 
Windows/CMake: Use the correct resource file for lzmadec.exe

CMakeLists.txt was using xzdec_w32res.rc for both xzdec and lzmadec.

Fixes: 998d0b29536094a89cf385a3b894e157db1ccefe
10 months agoTranslations: Update the Brazilian Portuguese translation
Lasse Collin [Wed, 25 Sep 2024 18:29:59 +0000 (21:29 +0300)] 
Translations: Update the Brazilian Portuguese translation

10 months agoUpdate THANKS
Lasse Collin [Mon, 16 Sep 2024 22:21:15 +0000 (01:21 +0300)] 
Update THANKS

10 months agolzmainfo: Avoid integer overflow
Tobias Stoeckmann [Mon, 16 Sep 2024 21:19:46 +0000 (23:19 +0200)] 
lzmainfo: Avoid integer overflow

The MB output can overflow with huge numbers. Most likely these are
invalid .lzma files anyway, but let's avoid garbage output.

lzmadec was adapted from LZMA Utils. The original code with this bug
was written in 2005, over 19 years ago.
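
For illustration only, here is a minimal sketch of how a "round to MiB" calculation can wrap around and how dividing first avoids it. The function names are hypothetical and this is not necessarily the exact change made in lzmainfo; it only shows the class of bug described above.

```c
#include <stdint.h>

// Hypothetical naive version: adding the rounding term first can wrap
// around when bytes is close to UINT64_MAX, producing a tiny result.
static uint64_t
bytes_to_mib_naive(uint64_t bytes)
{
	return (bytes + 512 * 1024) / (1024 * 1024);
}

// Hypothetical overflow-safe version: divide first, then round up
// based on the remainder. No intermediate value exceeds the input.
static uint64_t
bytes_to_mib(uint64_t bytes)
{
	const uint64_t mib = UINT64_C(1) << 20;
	return bytes / mib + (bytes % mib >= mib / 2);
}
```

With an input of UINT64_MAX, the naive version wraps and yields 0, while the safe version yields 2^44.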

Co-authored-by: Lasse Collin <lasse.collin@tukaani.org>
Closes: https://github.com/tukaani-project/xz/pull/144
10 months agoxzdec: Remove unused short option -M
Tobias Stoeckmann [Mon, 16 Sep 2024 20:04:40 +0000 (22:04 +0200)] 
xzdec: Remove unused short option -M

"xzdec -M123" exited with exit status 1 without printing
any messages. The "M:" entry should have been removed when
the memory usage limiter support was removed from xzdec.

Fixes: 792331bdee706aa852a78b171040ebf814c6f3ae
Closes: https://github.com/tukaani-project/xz/pull/143
[ Lasse: Commit message edits ]

11 months agoUpdate THANKS
Lasse Collin [Tue, 10 Sep 2024 10:54:47 +0000 (13:54 +0300)] 
Update THANKS

11 months agoBuild: Fix a typo in autogen.sh
Firas Khalil Khana [Tue, 10 Sep 2024 09:30:32 +0000 (12:30 +0300)] 
Build: Fix a typo in autogen.sh

Fixes: e9be74f5b129fe8a5388d588e68b1b7f5168a310
Closes: https://github.com/tukaani-project/xz/pull/141
11 months agoTranslations: Update Chinese (simplified) translation
Lasse Collin [Mon, 2 Sep 2024 17:08:40 +0000 (20:08 +0300)] 
Translations: Update Chinese (simplified) translation

Differences to the zh_CN.po file from the Translation Project:

  - Two uses of \v were fixed.

  - Missing "OPTS" translation in --riscv[=OPTS] was copied from
    previous lines.

  - "make update-po" was run to remove line numbers from comments.

11 months agoTranslations: Update the Catalan translation
Lasse Collin [Mon, 2 Sep 2024 16:40:50 +0000 (19:40 +0300)] 
Translations: Update the Catalan translation

Differences to the ca.po file from the Translation Project:

  - An overlong line translating --filters-help was wrapped.

  - "make update-po" was used to remove line numbers from the comments
    to match the changes in fccebe2b4fd513488fc920e4dac32562ed3c7637
    and 093490b58271e9424ce38a7b1b38bcf61b9c86c6. xz.pot in the TP
    is older than these commits.

11 months agoUpdate THANKS
Lasse Collin [Thu, 22 Aug 2024 11:06:16 +0000 (14:06 +0300)] 
Update THANKS

11 months agoCMake: Don't install lzmadec.1 symlinks if XZ_TOOL_LZMADEC=OFF
Lasse Collin [Thu, 22 Aug 2024 11:06:16 +0000 (14:06 +0300)] 
CMake: Don't install lzmadec.1 symlinks if XZ_TOOL_LZMADEC=OFF

Thanks-to: 榆柳松 (ZhengSen Wang) <wzhengsen@gmail.com>
Fixes: fb50c6ba1d4c9405e5b12b5988b01a3002638c5d
Closes: https://github.com/tukaani-project/xz/pull/134
11 months agoCMake: Fix the build when XZ_TOOL_LZMADEC=OFF
Lasse Collin [Thu, 22 Aug 2024 11:06:16 +0000 (14:06 +0300)] 
CMake: Fix the build when XZ_TOOL_LZMADEC=OFF

Co-developed-by: 榆柳松 (ZhengSen Wang) <wzhengsen@gmail.com>
Fixes: fb50c6ba1d4c9405e5b12b5988b01a3002638c5d
Fixes: https://github.com/tukaani-project/xz/pull/134
11 months agoUpdate THANKS
Lasse Collin [Thu, 22 Aug 2024 08:01:07 +0000 (11:01 +0300)] 
Update THANKS

11 months agoliblzma: Fix x86-64 movzw compatibility in range_decoder.h
Yifeng Li [Thu, 22 Aug 2024 02:18:49 +0000 (02:18 +0000)] 
liblzma: Fix x86-64 movzw compatibility in range_decoder.h

Support for instruction "movzw" without suffix in "GNU as" was
added in commit [1] and stabilized in binutils 2.27, released
in August 2016. Earlier systems don't accept this instruction
without a suffix, making range_decoder.h's inline assembly
unable to build on old systems such as Ubuntu 16.04, creating
error messages like:

    lzma_decoder.c: Assembler messages:
    lzma_decoder.c:371: Error: no such instruction: `movzw 2(%r11),%esi'
    lzma_decoder.c:373: Error: no such instruction: `movzw 4(%r11),%edi'
    lzma_decoder.c:388: Error: no such instruction: `movzw 6(%r11),%edx'
    lzma_decoder.c:398: Error: no such instruction: `movzw (%r11,%r14,4),%esi'

Change "movzw" to "movzwl" for compatibility.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c07315e0c610e0e3317b4c02266f81793df253d2
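
As a rough sketch of the suffixed form (not the actual code from range_decoder.h; the function name is hypothetical), GNU C inline assembly on x86-64 can spell out the operand sizes with "movzwl", which older versions of GNU as also accept:

```c
#include <stdint.h>

// Hypothetical helper: load a 16-bit value and zero-extend it to 32 bits.
static uint32_t
load_u16_zero_extended(const uint16_t *p)
{
#if defined(__GNUC__) && defined(__x86_64__)
	uint32_t out;
	// "movzwl" = move with zero-extension, word (16-bit) source,
	// long (32-bit) destination, in AT&T syntax.
	__asm__("movzwl %1, %0" : "=r"(out) : "m"(*p));
	return out;
#else
	// Portable fallback: integer conversion zero-extends.
	return *p;
#endif
}
```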

Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Tested-by: Yifeng Li <tomli@tomli.me>
Signed-off-by: Yifeng Li <tomli@tomli.me>
Fixes: 3182a330c1512cc1f5c87b5c5a272578e60a5158
Fixes: https://github.com/tukaani-project/xz/issues/121
Closes: https://github.com/tukaani-project/xz/pull/136
12 months agoBuild: Comment that elf_aux_info(3) will be available on OpenBSD >= 7.6
Lasse Collin [Fri, 19 Jul 2024 17:02:43 +0000 (20:02 +0300)] 
Build: Comment that elf_aux_info(3) will be available on OpenBSD >= 7.6

12 months agoRevert "liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD"
Lasse Collin [Fri, 19 Jul 2024 16:42:26 +0000 (19:42 +0300)] 
Revert "liblzma: Add ARM64 CRC32 instruction support detection on OpenBSD"

This reverts commit dc03f6290f5b9bd3d50c7e12e58dee870889d599.

OpenBSD 7.6 will support elf_aux_info(3), and the detection code used
on FreeBSD will work on OpenBSD 7.6 too. Keep things simpler and drop
the OpenBSD-specific sysctl() method.

Thanks to Christian Weisgerber.