Jia Tan [Fri, 24 Feb 2023 10:10:37 +0000 (18:10 +0800)]
Tests: Small tweak to test-vli.c.
The static global variables can be disabled if encoders and decoders
are not built. If they are not disabled and -Werror is used, it will
cause an usused warning as an error.
Jia Tan [Mon, 6 Feb 2023 13:35:06 +0000 (21:35 +0800)]
liblzma: Improve documentation in filter.h.
All functions now explicitly specify parameter and return values.
The notes and code annotations were moved before the parameter and
return value descriptions for consistency.
Also, the description above lzma_filter_encoder_is_supported() about
not being able to list available filters was removed since
lzma_str_list_filters() will do this.
Lasse Collin [Tue, 21 Feb 2023 20:57:10 +0000 (22:57 +0200)]
liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):
void bar(char *a, char *b);
int foo(char *a, size_t n)
{
bar(a, a + n);
return a == NULL;
}
In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.
To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.
Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:
In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.
Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().
Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
Jia Tan [Thu, 26 Jan 2023 15:16:34 +0000 (23:16 +0800)]
liblzma: Improve documentation for container.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
Lasse Collin [Fri, 17 Feb 2023 18:48:28 +0000 (20:48 +0200)]
Build: Use only the generic symbol versioning on MicroBlaze.
On MicroBlaze, GCC 12 is broken in sense that
__has_attribute(__symver__) returns true but it still doesn't
support the __symver__ attribute even though the platform is ELF
and symbol versioning is supported if using the traditional
__asm__(".symver ...") method. Avoiding the traditional method is
good because it breaks LTO (-flto) builds with GCC.
See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766
For now the only extra symbols in liblzma_linux.map are the
compatibility symbols with the patch that spread from RHEL/CentOS 7.
These require the use of __symver__ attribute or __asm__(".symver ...")
in the C code. Compatibility with the patch from CentOS 7 doesn't
seem valuable on MicroBlaze so use liblzma_generic.map on MicroBlaze
instead. It doesn't require anything special in the C code and thus
no LTO issues either.
An alternative would be to detect support for __symver__
attribute in configure.ac and CMakeLists.txt and fall back
to __asm__(".symver ...") but then LTO would be silently broken
on MicroBlaze. It sounds likely that MicroBlaze is a special
case so let's treat it as a such because that is simpler. If
a similar issue exists on some other platform too then hopefully
someone will report it and this can be reconsidered.
(This doesn't do the same fix in CMakeLists.txt. Perhaps it should
but perhaps CMake build of liblzma doesn't matter much on MicroBlaze.
The problem breaks the build so it's easy to notice and can be fixed
later.)
Thanks to Vincent Fazio for reporting the problem and proposing
a patch (in the end that solution wasn't used):
https://github.com/tukaani-project/xz/pull/32
Lasse Collin [Thu, 16 Feb 2023 15:59:50 +0000 (17:59 +0200)]
liblzma: Silence a warning from MSVC.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with substraction makes it clearer and avoids the warning.
Jia Tan [Thu, 16 Feb 2023 13:04:54 +0000 (21:04 +0800)]
liblzma: Improve documentation for stream_flags.h
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
A few small things were reworded and long sentences broken up.
Jia Tan [Fri, 27 Jan 2023 14:44:06 +0000 (22:44 +0800)]
liblzma: Improve documentation in check.h.
All functions now explicitly specify parameter and return values.
Also moved the note about SHA-256 functions not being exported to the
top of the file.
Jia Tan [Thu, 2 Feb 2023 16:33:32 +0000 (00:33 +0800)]
liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length.
The bug is only a problem in applications that do not properly terminate
the filters[] array with LZMA_VLI_UNKNOWN or have more than
LZMA_FILTERS_MAX filters. This bug does not affect xz.
Jia Tan [Thu, 2 Feb 2023 16:20:20 +0000 (00:20 +0800)]
liblzma: Clarify block encoder and decoder documentation.
Added a few sentences to the description for lzma_block_encoder() and
lzma_block_decoder() to highlight that the Block Header must be coded
before calling these functions.
Jia Tan [Thu, 2 Feb 2023 16:07:23 +0000 (00:07 +0800)]
liblzma: Improve documentation for block.h.
Standardizing each function to always specify params and return values.
Output pointer parameters are also marked with doxygen style [out] to
make it clear. Any note sections were also moved above the parameter and
return sections for consistency.
Jia Tan [Wed, 1 Feb 2023 15:38:30 +0000 (23:38 +0800)]
liblzma: Clarify a comment about LZMA_STR_NO_VALIDATION.
The flag description for LZMA_STR_NO_VALIDATION was previously confusing
about the treatment for filters than cannot be used with .xz format
(lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that
LZMA_STR_NO_VALIDATION is not a super set of LZMA_STR_ALL_FILTERS.
Jia Tan [Tue, 24 Jan 2023 12:48:50 +0000 (20:48 +0800)]
liblzma: Fix documentation in filter.h for lzma_str_to_filters()
The previous documentation for lzma_str_to_filters() was technically
correct, but misleading. lzma_str_to_filters() returns NULL on success,
which is in practice always defined to 0. This is the same value as
LZMA_OK, but lzma_str_to_filters() does not return lzma_ret so we should
be more clear.
Jia Tan [Thu, 19 Jan 2023 12:35:09 +0000 (20:35 +0800)]
tuklib_physmem: Silence warning from -Wcast-function-type on MinGW-w64.
tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64
to retrieve a function address. The proper way to do this is to cast the
return value to the type of function pointer retrieved. Unfortunately,
this causes a cast-function-type warning, so the best solution is to
simply ignore the warning.
Jia Tan [Mon, 16 Jan 2023 12:55:10 +0000 (20:55 +0800)]
xz: Do not set compression settings with raw format in list mode.
Calling coder_set_compression_settings() in list mode with verbose mode
on caused the filter chain and memory requirements to print. This was
unnecessary since the command results in an error and not consistent
with other formats like lzma and alone.
Lasse Collin [Thu, 12 Jan 2023 11:04:05 +0000 (13:04 +0200)]
Build: Omit -Wmissing-noreturn from the default warnings.
It's not that important. It can be annoying in builds that
disable many features since in those cases the tests programs
will correctly trigger this warning with Clang.
Lasse Collin [Thu, 12 Jan 2023 03:38:48 +0000 (05:38 +0200)]
liblzma: Silence another warning from -Wsign-conversion in a 32-bit build.
It doesn't warn on a 64-bit system because truncating
a ptrdiff_t (signed long) to uint32_t is diagnosed under
-Wconversion by GCC and -Wshorten-64-to-32 by Clang.
Lasse Collin [Thu, 12 Jan 2023 02:14:18 +0000 (04:14 +0200)]
Tests: Fix warnings from clang --Wassign-enum.
Explicitly casting the integer to lzma_check silences the warning.
Since such an invalid value is needed in multiple tests, a constant
INVALID_LZMA_CHECK_ID was added to tests.h.
The use of 0x1000 for lzma_block.check wasn't optimal as if
the underlying type is a char then 0x1000 will be truncated to 0.
However, in these test cases the value is ignored, thus even with
such truncation the test would have passed.
Lasse Collin [Thu, 12 Jan 2023 01:51:07 +0000 (03:51 +0200)]
Tests: Silence warnings from -Wsign-conversion.
Note that assigning an unsigned int to lzma_check doesn't warn
on GNU/Linux x86-64 since the enum type is unsigned on that
platform. The enum can be signed on some other platform though
so it's best to use enumeration type lzma_check in these situations.
Lasse Collin [Thu, 12 Jan 2023 01:19:59 +0000 (03:19 +0200)]
liblzma: Silence warnings from clang -Wconditional-uninitialized.
This is similar to 2ce4f36f179a81d0c6e182a409f363df759d1ad0.
The actual initialization of the variables is done inside
mythread_sync() macro. Clang doesn't seem to see that
the initialization code inside the macro is always executed.
Lasse Collin [Tue, 10 Jan 2023 09:56:11 +0000 (11:56 +0200)]
sysdefs.h: Don't include strings.h anymore.
On some platforms src/xz/suffix.c may need <strings.h> for
strcasecmp() but suffix.c includes the header when it needs it.
Unless there is an old system that otherwise supports enough C99
to build XZ Utils but doesn't have C89/C90-compatible <string.h>,
there should be no need to include <strings.h> in sysdefs.h.
Lasse Collin [Tue, 10 Jan 2023 09:23:41 +0000 (11:23 +0200)]
xz: Include <strings.h> in suffix.c if needed for strcasecmp().
SUSv2 and POSIX.1‐2017 declare only a few functions in <strings.h>.
Of these, strcasecmp() is used on some platforms in suffix.c.
Nothing else in the project needs <strings.h> (at least if
building on a modern system).
sysdefs.h currently includes <strings.h> if HAVE_STRINGS_H is
defined and suffix.c relied on this.
Note that dos/config.h doesn't #define HAVE_STRINGS_H even though
DJGPP does have strings.h. It isn't needed with DJGPP as strcasecmp()
is also in <string.h> in DJGPP.
Jia Tan [Wed, 11 Jan 2023 14:46:48 +0000 (22:46 +0800)]
xz: Fix warning -Wformat-nonliteral on clang in message.c.
clang and gcc differ in how they handle -Wformat-nonliteral. gcc will
allow a non-literal format string as long as the function takes its
format arguments as a va_list.
Jia Tan [Wed, 11 Jan 2023 12:42:29 +0000 (20:42 +0800)]
Tests: Fix type-limits warning in test_filter_flags.
This only occurs in test_filter_flags when the BCJ filters are not
configured and built. In this case, ARRAY_SIZE() returns 0 and causes a
type-limits warning with the loop variable since an unsigned number will
always be >= 0.
Lasse Collin [Tue, 10 Jan 2023 20:14:03 +0000 (22:14 +0200)]
liblzma: CLMUL CRC64: Work around a bug in MSVC, second attempt.
This affects only 32-bit x86 builds. x86-64 is OK as is.
I still cannot easily test this myself. The reporter has tested
this and it passes the tests included in the CMake build and
performance is good: raw CRC64 is 2-3 times faster than the
C version of the slice-by-four method. (Note that liblzma doesn't
include a MSVC-compatible version of the 32-bit x86 assembly code
for the slice-by-four method.)
Thanks to Iouri Kharon for figuring out a fix, testing, and
benchmarking.
It was reported that it wasn't a good enough fix and MSVC
still produced (different kind of) bad code when building
for 32-bit x86 if optimizations are enabled.
Lasse Collin [Tue, 10 Jan 2023 08:04:06 +0000 (10:04 +0200)]
sysdefs.h: Don't include memory.h anymore even if it were available.
It quite probably was never needed, that is, any system where memory.h
was required likely couldn't compile XZ Utils for other reasons anyway.
XZ Utils 5.2.6 and later source packages were generated using
Autoconf 2.71 which no longer defines HAVE_MEMORY_H. So the code
being removed is no longer used anyway.
Lasse Collin [Fri, 6 Jan 2023 20:53:38 +0000 (22:53 +0200)]
Tests: test_filter_flags: Clean up minor issues.
Here are the list of the most significant issues addressed:
- Avoid using internal common.h header. It's not good to copy the
constants like this but common.h cannot be included for use outside
of liblzma. This is the quickest thing to do that could be fixed later.
- Omit the INIT_FILTER macro. Initialization should be done with just
regular designated initializers.
- Use start_offset = 257 for BCJ tests. It demonstrates that Filter
Flags encoder and decoder don't validate the options thoroughly.
257 is valid only for the x86 filter. This is a bit silly but
not a significant problem in practice because the encoder and
decoder initialization functions will catch bad alignment still.
Perhaps this should be fixed but it's not urgent and doesn't need
to be in 5.4.x.
- Various tweaks to comments such as filter id -> Filter ID
Lasse Collin [Sat, 7 Jan 2023 22:32:29 +0000 (00:32 +0200)]
Tests: tuktest.h: Support tuktest_malloc(0).
It's not needed in XZ Utils at least for now. It's good to support
it still because if such use is needed later, it wouldn't be
caught on GNU/Linux since malloc(0) from glibc returns non-NULL.
Jia Tan [Thu, 5 Jan 2023 12:57:25 +0000 (20:57 +0800)]
liblzma: Remove common.h include from common/index.h.
common/index.h is needed by liblzma internally and tests. common.h will
include and define many things that are not needed by the tests.
Also, this prevents include order problems because both common.h and
lzma.h define LZMA_API. On most platforms it results only in a warning
but on Windows it would break the build as the definition in common.h
must be used only for building liblzma itself.
Jia Tan [Wed, 4 Jan 2023 15:58:58 +0000 (23:58 +0800)]
Tests: Replace non portable shell parameter expansion
The shell parameter expansion using # and ## is not supported in
Solaris 10 Bourne shell (/bin/sh). Even though this is POSIX, it is not fully
portable, so we should avoid it.