git.ipfire.org Git - thirdparty/glibc.git/log

wordexp: handle overflow in positional parameter number (bug 28011)

Use strtoul instead of atoi so that overflow can be detected.

(cherry picked from commit 5adda61f62b77384718b4c0d8336ade8f2b4b35c)

nscd: Fix double free in netgroupcache [BZ #27462]

In commit 745664bd798ec8fd50438605948eea594179fba1 a use-after-free
was fixed, but this led to an occasional double-free. This patch
tracks the "live" allocation better.

Tested manually by a third party.

Related: RHBZ 1927877

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit dca565886b5e8bd7966e15f0ca42ee5cff686673)

nscd: Fix use-after-free in addgetnetgrentX [BZ #23520]

addinnetgrX may use the heap-allocated buffer, so free the buffer
in this function.

(cherry picked from commit 745664bd798ec8fd50438605948eea594179fba1)

Add NEWS entry for CVE-2020-29562 (BZ #26923)

BZ #26923 now has a CVE entry, so add a NEWS entry for it.

(cherry picked from commit 38a9e93cb1c58e3c899d638480e6d6e42af8e6fc)

iconv: Fix incorrect UCS4 inner loop bounds (BZ#26923)

Previously, in UCS4 conversion routines we limit the number of
characters we examine to the minimum of the number of characters in the
input and the number of characters in the output. This is not the
correct behavior when __GCONV_IGNORE_ERRORS is set, as we do not consume
an output character when we skip a code unit. Instead, track the input
and output pointers and terminate the loop when either reaches its
limit.

This resolves assertion failures when resetting the input buffer in a step of
iconv, which assumes that the input will be fully consumed given sufficient
output space.

(cherry picked from commit 228edd356f03bf62dcf2b1335f25d43c602ee68d)

iconv: Accept redundant shift sequences in IBM1364 [BZ #26224]

The IBM1364, IBM1371, IBM1388, IBM1390 and IBM1399 character sets
share converter logic (iconvdata/ibm1364.c) which would reject
redundant shift sequences when processing input in these character
sets. This led to a hang in the iconv program (CVE-2020-27618).

This commit adjusts the converter to ignore redundant shift sequences
and adds test cases for iconv_prog hangs that would be triggered upon
their rejection. This brings the implementation in line with other
converters that also ignore redundant shift sequences (e.g. IBM930
etc., fixed in commit 692de4b3960d).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9a99c682144bdbd40792ebf822fe9264e0376fb5)

intl: Handle translation output codesets with suffixes [BZ #26383]

Commit 91927b7c7643 (Rewrite iconv option parsing [BZ #19519]) did not
handle cases where the output codeset for translations (via the `gettext'
family of functions) might have a caller specified encoding suffix such as
TRANSLIT or IGNORE. This led to a regression where translations did not
work when the codeset had a suffix.

This commit fixes the above issue by parsing any suffixes passed to
__dcigettext and adds two new test-cases to intl/tst-codeset.c to
verify correct behaviour. The iconv-internal function __gconv_create_spec
and the static iconv-internal function gconv_destroy_spec are now visible
internally within glibc and used in intl/dcigettext.c.

(cherry picked from commit 7d4ec75e111291851620c6aa2c4460647b7fd50d)

Add NEWS entry for CVE-2016-10228 (bug 19519)

(cherry picked from commit 17a0126abf02955cabf6256c67f8f9462a64163f)

Rewrite iconv option parsing [BZ #19519]

This commit replaces string manipulation during `iconv_open' and iconv_prog
option parsing with a structured, flag based conversion specification. In
doing so, it alters the internal `__gconv_open' interface and accordingly
adjusts its uses.

This change fixes several hangs in the iconv program and therefore includes
a new test to exercise iconv_prog options that originally led to these hangs.
It also includes a new regression test for option handling in the iconv
function.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 91927b7c76437db860cd86a7714476b56bb39d07)

support: Add xsetlocale function

(cherry picked from commit cce35a50c1de0cec5cd1f6c18979ff6ee3ea1dd1)

Add Transliterations for Unicode Misc. Mathematical Symbols-A/B [BZ #23132]

This commit adds previously missing transliterations for several code points
in the Unicode blocks "Miscellaneous Mathematical Symbols-A/B" -
transliterated to their approximate ASCII representations. It also adds a
corresponding iconv transliteration test.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 513aaa0d782f8fae36732d06ca59d658149f0139)

support: Correct error message for TEST_COMPARE_STRING

It should say "string", not "blob".

(cherry picked from commit 6175507c06de56e03407004bd2f289ed2cce034d)

support: Fix printf format for TEST_COMPARE_STRING

Fix the following on 32 bits targets:

support_test_compare_string.c: In function 'support_test_compare_string':
support_test_compare_string.c:80:37: error: format '%lu' expects argument of
type 'long unsigned int', but argument 2 has type 'size_t' {aka 'unsigned int'}
[-Werror=format=]
         printf ("  string length: %lu bytes\n", left_length);
                                   ~~^           ~~~~~~~~~~~
                                   %u
Checked on arm-linux-gnueabihf.

* support/support_test_compare_string.c
(support_test_compare_string): Fix printf format.

(cherry picked from commit 00c86a37d1b63044e3169d1f2ebec23447c73f79)

support: Implement TEST_COMPARE_STRING

(cherry picked from commit 1df872fd74f730bcae3df201a229195445d2e18a)

Fix iconv buffer handling with IGNORE error handler (bug #18830)

(cherry picked from commit 4802be92c891903caaf8cae47f685da6f26d4b9a)

Add NEWS entry for CVE-2020-10029 (bug 25487)

(cherry picked from commit 15ab195229dc288d1d49612c3de14a33b88065ed)

math/test-sinl-pseudo: Use stack protector only if available

This fixes commit 9333498794cde1d5cca518bad ("Avoid ldbl-96 stack
corruption from range reduction of pseudo-zero (bug 25487).").

(cherry picked from commit c10acd40262486dac597001aecc20ad9d3bd0e4a)

Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487).

Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero
argument (an representation where all the significand bits, including
the explicit high bit, are zero, but the exponent is not zero, which
is not a valid representation for the long double type).

Although this is not a valid long double representation, existing
practice in this area (see bug 4586, originally marked invalid but
subsequently fixed) is that we still seek to avoid invalid memory
accesses as a result, in case of programs that treat arbitrary binary
data as long double representations, although the invalid
representations of the ldbl-96 format do not need to be consistently
handled the same as any particular valid representation.

This patch makes the range reduction detect pseudo-zero and unnormal
representations that would otherwise go to __kernel_rem_pio2, and
returns a NaN for them instead of continuing with the range reduction
process. (Pseudo-zero and unnormal representations whose unbiased
exponent is less than -1 have already been safely returned from the
function before this point without going through the rest of range
reduction.) Pseudo-zero representations would previously result in
the value passed to __kernel_rem_pio2 being all-zero, which is
definitely unsafe; unnormal representations would previously result in
a value passed whose high bit is zero, which might well be unsafe
since that is not a form of input expected by __kernel_rem_pio2.

Tested for x86_64.

(cherry picked from commit 9333498794cde1d5cca518badf79533a24114b6f)

Add NEWS entry for CVE-2020-1751 (bug 25423)

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 07d16a6debc830ebcf9533da5396edd2eff688e0)

Add NEWS entry for CVE-2020-6096 (bug 25620)

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 17400c4bcd57d84add1da3aa93248ef2efdb0ccb)

arm: CVE-2020-6096: Fix multiarch memcpy for negative length [BZ #25620]

Unsigned branch instructions could be used for r2 to fix the wrong
behavior when a negative length is passed to memcpy.
This commit fixes the armv7 version.

(cherry picked from commit beea361050728138b82c57dda0c4810402d342b9)

arm: CVE-2020-6096: fix memcpy and memmove for negative length [BZ #25620]

Unsigned branch instructions could be used for r2 to fix the wrong
behavior when a negative length is passed to memcpy and memmove.
This commit fixes the generic arm implementation of memcpy amd memmove.

(cherry picked from commit 79a4fa341b8a89cb03f84564fd72abaa1a2db394)

Fix use-after-free in glob when expanding ~user (bug 25414)

The value of `end_name' points into the value of `dirname', thus don't
deallocate the latter before the last use of the former.

(cherry picked from commit ddc650e9b3dc916eab417ce9f79e67337b05035c)
(cherry picked from commit 39a05214fe14ff722d4d92e697fb71ff15e84e70)

Fix array overflow in backtrace on PowerPC (bug 25423)

When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address. Fixes commit
d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").

(cherry picked from commit d93769405996dfc11d216ddbe415946617b5a494)

misc/test-errno-linux: Handle EINVAL from quotactl

In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
argument, causing the test to fail due to too restrictive test
expectations.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 1f7525d924b608a3e43b10fcfb3d46b8a6e9e4f9)

<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232]

Without the asm redirects, strchr et al. are not const-correct.

libc++ has a wrapper header that works with and without
__CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when
Clang is used with libstdc++ or just C headers, the overloaded functions
with the correct types are not declared.

This change does not impact current GCC (with libstdc++ or libc++).

(cherry picked from commit 953ceff17a4a15b10cfdd5edc3c8cae4884c8ec3)

intl/tst-gettext: fix failure with newest msgfmt

Since upstream gettext commit d13f165b83 (msgfmt: Remove
POT-Creation-Date field from the header in the output.), msgfmt does not
copy the POT-Creation-Date field in the header entry from the po file to
the mo file anymore. This breaks the assumption that we can test gettext
by comparing each message in the po files with the corresponding string
return by gettext. This makes the intl/tst-gettext to fail.

While it would have been possible to modify the po2test.awk script to
also strip the line POT-Creation-Date field when creating the msgs.h
file, it would not work with both the old and new msgfmt.

Instead create a tst-gettext-de.po file from de.po by removing the
POT-Creation-Date line. Another alternative would be to use a static
tst-gettext-de.po file, but I guess the reason for using de.po is to
also catch issues caused by newly added strings.

As tst-catgets also uses msg.h, it should also be updated. Instead of
using the new tst-gettext-de.po file, the patch modifies xopen-msg.awk
to avoid creating a second catgets->intl dependency.

Changelog:
[BZ #21508]
* catgets/xopen-msg.awk: Ignore POT-Creation-Date line.
* intl/Makefile ($(objpfx)tst-gettext-de.po): Generate
intl/tst-gettext-de.po from po/de.po by removing the
POT-Creation-Date line.
($(objpfx)msgs.h): Depend on $(objpfx)tst-gettext-de.po instead of
../po/de.po.
* intl/tst-gettext.sh: Use ${objpfx}tst-gettext-de.po instead of
../po/de.po.
(cherry picked from commit 56456a2aadf0522b51ea55be1291ca832c5d0524)

aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798]

The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)

In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.

Fixes bug 26798.

(cherry picked from commit 558251bd8785760ad40fcbfeaaee5d27fa5b0fe4)

aarch64: add HWCAP_ATOMICS to HWCAP_IMPORTANT

This enables searching shared libraries in atomics/ when the hardware
supports LSE atomics of armv8.1 so one can provide optimized variants
of libraries in a portable way.

LSE atomics does not affect library abi, the new instructions can
interoperate with old ones.

I considered the earlier comments on the patch

https://sourceware.org/ml/libc-alpha/2018-04/msg00400.html
https://sourceware.org/ml/libc-alpha/2018-04/msg00625.html

It turns out that the way glibc dynamic linker decides on the search
path is not very flexible: it wants to use hwcap bits and associated
strings.  So some targets reuse hwcap bits for glibc internal purposes
to affect the search logic.  But hwcap is an interface with the kernel,
glibc should not allocate bits in it for its internal logic as that
limits future hwcap extensions and confusing to users who expect to see
hwcap bits in ifunc resolvers.  Instead of rewriting the dynamic linker
path logic (which affects all targets) this patch just uses the existing
mechanism, however this means that the path name has to be the hwcap
name "atomics" and cannot be changed to something more meaningful to
users.

It is hard to tell how much performance benefit this can give, in
principle armv8.1 atomics can be better optimized in the hardware, so it
can make a difference for synchronization heavy code.  On some systems
such multilib setup may be the only viable way to get optimized
libraries used.

* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
HWCAP_ATOMICS.

(cherry picked from commit 397c54c1afa531242602fe3ac7bb47eff0e909f9)

aarch64: Remove HWCAP_CPUID from HWCAP_IMPORTANT

This partially reverts

commit f82e9672ad89ea1ef40bbe1af71478e255e87c5e
Author:     Siddhesh Poyarekar <siddhesh@sourceware.org>

    aarch64: Allow overriding HWCAP_CPUID feature check using HWCAP_MASK

The idea was to make it possible to disable cpuid based ifunc resolution
in glibc by changing the hwcap mask which the user could already control.

However the hwcap mask has an orthogonal role: it specifies additional
library search paths for the dynamic linker.  So "cpuid" got added to
the search paths when it was set in the default mask (HWCAP_IMPORTANT),
which is not useful behaviour, the hwcap masking should not be reused
in the cpu features code.

Meanwhile there is a tunable to set the cpu explicitly so it is possible
to disable the cpuid based dispatch without using a hwcap mask:

  GLIBC_TUNABLES=glibc.tune.cpu=generic

* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (init_cpu_features):
Use dl_hwcap without masking.
* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT):
Remove HWCAP_CPUID.

(cherry picked from commit d0cd79807157e399ff58e67cb51651f90442122e)

libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]

Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable
vtable validation in case of interposition [BZ #23313]") only covered
the interposable glibc 2.1 handles, in libio/stdfiles.c. The
parallel code in libio/oldstdfiles.c needs similar detection logic.

Fixes (again) commit db3476aff19b75c4fdefbe65fcd5f0a90588ba51
("libio: Implement vtable verification [BZ #20191]").

Change-Id: Ief6f9f17e91d1f7263421c56a7dc018f4f595c21
(cherry picked from commit cb61630ed712d033f54295f776967532d3f4b46a)

rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]

The problem was introduced in glibc 2.23, in commit
b9eb92ab05204df772eb4929eccd018637c9f3e9
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").

(cherry picked from commit d5dfad4326fc683c813df1e37bbf5cf920591c8e)

mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernels

Linux/Mips kernels prior to 4.8 could potentially crash the user
process when doing FPU emulation while running on non-executable
user stack.

Currently, gcc doesn't emit .note.GNU-stack for mips, but that will
change in the future. To ensure that glibc can be used with such
future gcc, without silently resulting in binaries that might crash
in runtime, this patch forces RWX stack for all built objects if
configured to run against minimum kernel version less than 4.8.

* sysdeps/unix/sysv/linux/mips/Makefile
(test-xfail-check-execstack):
Move under mips-has-gnustack != yes.
(CFLAGS-.o*, ASFLAGS-.o*): New rules.
Apply -Wa,-execstack if mips-force-execstack == yes.
* sysdeps/unix/sysv/linux/mips/configure: Regenerated.
* sysdeps/unix/sysv/linux/mips/configure.ac
(mips-force-execstack): New var.
Set to yes for hard-float builds with minimum_kernel < 4.8.0
or minimum_kernel not set at all.
(mips-has-gnustack): New var.
Use value of libc_cv_as_noexecstack
if mips-force-execstack != yes, otherwise set to no.

(cherry picked from commit 33bc9efd91de1b14354291fc8ebd5bce96379f12)

Add undef to fix test failure.

Improve performance of memmem

This patch significantly improves performance of memmem using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.

Small needles up to size 2 use a dedicated linear search. Very long needles
use the Two-Way algorithm (to avoid increasing stack size or slowing down
the common case, inlining is disabled).

The performance gain is 6.6 times on English text on AArch64 using random
needles with average size 8.

Tested against GLIBC testsuite and randomized tests.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/memmem.c (__memmem): Rewrite to improve performance.

(cherry picked from commit 680942b0167715e123d934b609060cd382f8e39f)

Improve performance of strstr

This patch significantly improves performance of strstr using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.

Small needles up to size 3 use a dedicated linear search. Very long needles
use the Two-Way algorithm.

The performance gain using the improved bench-strstr on Cortex-A72 is 5.8
times basic_strstr and 3.7 times twoway_strstr.

Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/str-two-way.h (two_way_short_needle): Add inline to avoid
warning.
(two_way_long_needle): Block inlining.
* string/strstr.c (strstr2): Add new function.
(strstr3): Likewise.
(STRSTR): Completely rewrite strstr to improve performance.

(cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)

Fix strstr bug with huge needles (bug 23637)

The generic strstr in GLIBC 2.28 fails to match huge needles.  The optimized
AVAILABLE macro reads ahead a large fixed amount to reduce the overhead of
repeatedly checking for the end of the string.  However if the needle length
is larger than this, two_way_long_needle may confuse this as meaning the end
of the string and return NULL.  This is fixed by adding the needle length to
the amount to read ahead.

[BZ #23637]
* string/test-strstr.c (pr23637): New function.
(test_main): Add tests with longer needles.
* string/strcasestr.c (AVAILABLE): Fix readahead distance.
* string/strstr.c (AVAILABLE): Likewise.

(cherry picked from commit 83a552b0bb9fc2a5e80a0ab3723c0a80ce1db9f2)

Speedup first memmem match

As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp
can be used after memchr to avoid the initialization overhead of the
two-way algorithm for the first match. This has shown improvement
>40% for first match.

(cherry picked from commit c8dd67e7c958de04c3783cbea7c384431707b5f8)

Simplify and speedup strstr/strcasestr first match

Looking at the benchtests, both strstr and strcasestr spend a lot of time
in a slow initialization loop handling one character per iteration.
This can be simplified and use the much faster strlen/strnlen/strchr/memcmp.
Read ahead a few cachelines to reduce the number of strnlen calls, which
improves performance by ~3-4%. This patch improves the time taken for the
full strstr benchtest by >40%.

* string/strcasestr.c (STRCASESTR): Simplify and speedup first match.
* string/strstr.c (AVAILABLE): Likewise.

(cherry picked from commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5)

Improve strstr performance

Improve strstr performance. Strstr tends to be slow because it uses
many calls to memchr and a slow byte loop to scan for the next match.
Performance is significantly improved by using strnlen on larger blocks
and using strchr to search for the next matching character. strcasestr
can also use strnlen to scan ahead, and memmem can use memchr to check
for the next match.

On the GLIBC bench tests the performance gains on Cortex-A72 are:
strstr: +25%
strcasestr: +4.3%
memmem: +18%

On a 256KB dataset strstr performance improves by 67%, strcasestr by 47%.

Reviewd-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3ae725dfb6d7f61447d27d00ed83e573bd5454f4)

[AArch64] Add ifunc support for Ares

Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares
supports 2 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename this to
__memcpy_simd or similar).

* manual/tunables.texi (glibc.cpu.name): Add ares tunable.
* sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
__memcpy_falkor for ares.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
Add new define.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add ares cpu.

(cherry picked from commit 02f440c1ef5d5d79552a524065aa3e2fabe469b9)

aarch64,falkor: Use vector registers for memcpy

Vector registers perform better than scalar register pairs for copying
data so prefer them instead. This results in a time reduction of over
50% (i.e. 2x speed improvemnet) for some smaller sizes for memcpy-walk.
Larger sizes show improvements of around 1% to 2%. memcpy-random shows
a very small improvement, in the range of 1-2%.

* sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor):
Use vector registers.

(cherry picked from commit 0aec4c1d1801e8016ebe89281d16597e0557b8be)

aarch64,falkor: Ignore prefetcher tagging for smaller copies

For smaller and medium sized copies, the effect of hardware
prefetching are not as dominant as instruction level parallelism.
Hence it makes more sense to load data into multiple registers than to
try and route them to the same prefetch unit. This is also the case
for the loop exit where we are unable to latch on to the same prefetch
unit anyway so it makes more sense to have data loaded in parallel.

The performance results are a bit mixed with memcpy-random, with
numbers jumping between -1% and +3%, i.e. the numbers don't seem
repeatable. memcpy-walk sees a 70% improvement (i.e. > 2x) for 128
bytes and that improvement reduces down as the impact of the tail copy
decreases in comparison to the loop.

* sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor):
Use multiple registers to copy data in loop tail.

(cherry picked from commit db725a458e1cb0e17204daa543744faf08bb2e06)

aarch64/strncmp: Use lsr instead of mov+lsr

A lsr can do what the mov and lsr did.

(cherry picked from commit b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1)

aarch64/strncmp: Unbreak builds with old binutils

Binutils 2.26.* and older do not support moves with shifted registers,
so use a separate shift instruction instead.

(cherry picked from commit d46f84de745db8f3f06a37048261f4e5ceacf0a3)

aarch64: Improve strncmp for mutually misaligned inputs

The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark is as follows:

falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)

All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.

* sysdeps/aarch64/strncmp.S (count): New macro.
(strncmp): Store misaligned length in SRC1 in COUNT.
(mutual_align): Adjust.
(misaligned8): Load dword at a time when it is safe.

(cherry picked from commit 7108f1f944792ac68332967015d5e6418c5ccc88)

aarch64/strcmp: fix misaligned loop jump target

I accidentally set the loop jump back label as misaligned8 instead of
do_misaligned. The typo is harmless but it's always nice to not have
to unnecessarily execute those two instructions.

* sysdeps/aarch64/strcmp.S (do_misaligned): Jump back to
do_misaligned, not misaligned8.

(cherry picked from commit 6ca24c43481e2c93a6eec362b04c3e77a35b28e3)

aarch64: Improve strcmp unaligned performance

Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.

This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp.

* sysdeps/aarch64/strcmp.S (misaligned8): Compare dword at a
time whenever possible.

(cherry picked from commit 2bce01ebbaf8db52ba4a5635eb5744f989cdbf69)

aarch64: Fix branch target to loop16

I goofed up when changing the loop8 name to loop16 and missed on out
the branch instance. Fixed and actually build tested this time.

* sysdeps/aarch64/memcmp.S (more16): Fix branch target loop16.

(cherry picked from commit 4e54d918630ea53e29dd70d3bdffcb00d29ed3d4)

aarch64: Optimized memcmp for medium to large sizes

This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.

* sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
time.

(cherry picked from commit 30a81dae5b752f8aa5f96e7f7c341ec57cba3585)

aarch64: Use the L() macro for labels in memcmp

The L() macro makes the assembly a bit more readable.

* sysdeps/aarch64/memcmp.S: Use L() macro for labels.

(cherry picked from commit 84c94d2fd90d84ae7e67657ee8e22c2d1b796f63)

posix: Fix large mmap64 offset for mips64n32 (BZ#24699)

The fix for BZ#21270 (commit 158d5fa0e19) added a mask to avoid offset larger
than 1^44 to be used along __NR_mmap2. However mips64n32 users __NR_mmap,
as mips64n64, but still defines off_t as old non-LFS type (other ILP32, such
x32, defines off_t being equal to off64_t). This leads to use the same
mask meant only for __NR_mmap2 call for __NR_mmap, thus limiting the maximum
offset it can use with mmap64.

This patch fixes by setting the high mask only for __NR_mmap2 usage. The
posix/tst-mmap-offset.c already tests it and also fails for mips64n32. The
patch also change the test to check for an arch-specific header that defines
the maximum supported offset.

Checked on x86_64-linux-gnu, i686-linux-gnu, and I also tests tst-mmap-offset
on qemu simulated mips64 with kernel 3.2.0 kernel for both mips-linux-gnu and
mips64-n32-linux-gnu.

[BZ #24699]
* posix/tst-mmap-offset.c: Mention BZ #24699.
(do_test_bz21270): Rename to do_test_large_offset and use
mmap64_maximum_offset to check for maximum expected offset value.
* sysdeps/generic/mmap_info.h: New file.
* sysdeps/unix/sysv/linux/mips/mmap_info.h: Likewise.
* sysdeps/unix/sysv/linux/mmap64.c (MMAP_OFF_HIGH_MASK): Define iff
__NR_mmap2 is used.

(cherry picked from commit a008c76b56e4f958cf5a0d6f67d29fade89421b7)

aarch64: handle STO_AARCH64_VARIANT_PCS

Backport of commit 82bc69c012838a381c4167c156a06f4598f34227
and commit 30ba0375464f34e4bf8129f3d3dc14d0c09add17
without using DT_AARCH64_VARIANT_PCS for optimizing the symbol table check.
This is needed so the internal abi between ld.so and libc.so is unchanged.

Avoid lazy binding of symbols that may follow a variant PCS with different
register usage convention from the base PCS.

Currently the lazy binding entry code does not preserve all the registers
required for AdvSIMD and SVE vector calls. Saving and restoring all
registers unconditionally may break existing binaries, even if they never
use vector calls, because of the larger stack requirement for lazy
resolution, which can be significant on an SVE system.

The solution is to mark all symbols in the symbol table that may follow
a variant PCS so the dynamic linker can handle them specially. In this
patch such symbols are always resolved at load time, not lazily.

So currently LD_AUDIT for variant PCS symbols are not supported, for that
the _dl_runtime_profile entry needs to be changed e.g. to unconditionally
save/restore all registers (but pass down arg and retval registers to
pltentry/exit callbacks according to the base PCS).

This patch also removes a __builtin_expect from the modified code because
the branch prediction hint did not seem useful.

* sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Check
STO_AARCH64_VARIANT_PCS and bind such symbols at load time.

aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS

STO_AARCH64_VARIANT_PCS is a non-visibility st_other flag for marking
symbols that reference functions that may follow a variant PCS with
different register usage convention from the base PCS.

DT_AARCH64_VARIANT_PCS is a dynamic tag that marks ELF modules that
have R_*_JUMP_SLOT relocations for symbols marked with
STO_AARCH64_VARIANT_PCS (i.e. have variant PCS calls via a PLT).

* elf/elf.h (STO_AARCH64_VARIANT_PCS): Define.
(DT_AARCH64_VARIANT_PCS): Define.

io: Remove copy_file_range emulation [BZ #24744]

The kernel is evolving this interface (e.g., removal of the
restriction on cross-device copies), and keeping up with that
is difficult. Applications which need the function should
run kernels which support the system call instead of relying on
the imperfect glibc emulation.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 5a659ccc0ec217ab02a4c273a1f6d346a359560a)

libio: do not attempt to free wide buffers of legacy streams [BZ #24228]

Commit a601b74d31ca086de38441d316a3dee24c866305 aka glibc-2.23~693
("In preparation for fixing BZ#16734, fix failure in misc/tst-error1-mem
when _G_HAVE_MMAP is turned off.") introduced a regression:
_IO_unbuffer_all now invokes _IO_wsetb to free wide buffers of all
files, including legacy standard files which are small statically
allocated objects that do not have wide buffers and the _mode member,
causing memory corruption.

Another memory corruption in _IO_unbuffer_all happens when -1
is assigned to the _mode member of legacy standard files that
do not have it.

[BZ #24228]
* libio/genops.c (_IO_unbuffer_all)
[SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)]: Do not attempt to free wide
buffers and access _IO_FILE_complete members of legacy libio streams.
* libio/tst-bz24228.c: New file.
* libio/tst-bz24228.map: Likewise.
* libio/Makefile [build-shared] (tests): Add tst-bz24228.
[build-shared] (generated): Add tst-bz24228.mtrace and
tst-bz24228.check.
[run-built-tests && build-shared] (tests-special): Add
$(objpfx)tst-bz24228-mem.out.
(LDFLAGS-tst-bz24228, tst-bz24228-ENV): New variables.
($(objpfx)tst-bz24228-mem.out): New rule.

(cherry picked from commit 21cc130b78a4db9113fb6695e2b951e697662440)

NEWS: add entries for bugs 22964, 24180, and 24531

Fix tcache count maximum (BZ #24531)

The tcache counts[] array is a char, which has a very small range and thus
may overflow. When setting tcache_count tunable, there is no overflow check.
However the tunable must not be larger than the maximum value of the tcache
counts[] array, otherwise it can overflow when filling the tcache.

[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.

(cherry picked from commit 5ad533e8e65092be962e414e0417112c65d154fb)

Fix crash in _IO_wfile_sync (bug 20568)

When computing the length of the converted part of the stdio buffer, use
the number of consumed wide characters, not the (negative) distance to the
end of the wide buffer.

(cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)

elf: Fix pldd (BZ#18035)

Since 9182aa67994 (Fix vDSO l_name for GDB's, BZ#387) the initial link_map
for executable itself and loader will have both l_name and l_libname->name
holding the same value due:

elf/dl-object.c

95   new->l_name = *realname ? realname : (char *) newname->name + libname_len - 1;

Since newname->name points to new->l_libname->name.

This leads to pldd to an infinite call at:

elf/pldd-xx.c

203     again:
204       while (1)
205         {
206           ssize_t n = pread64 (memfd, tmpbuf.data, tmpbuf.length, name_offset);

228           /* Try the l_libname element.  */
229           struct E(libname_list) ln;
230           if (pread64 (memfd, &ln, sizeof (ln), m.l_libname) == sizeof (ln))
231             {
232               name_offset = ln.name;
233               goto again;
234             }

Since the value at ln.name (l_libname->name) will be the same as previously
read. The straightforward fix is just avoid the check and read the new list
entry.

I checked also against binaries issues with old loaders with fix for BZ#387,
and pldd could dump the shared objects.

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.

[BZ #18035]
* elf/Makefile (tests-container): Add tst-pldd.
* elf/pldd-xx.c: Use _Static_assert in of pldd_assert.
(E(find_maps)): Avoid use alloca, use default read file operations
instead of explicit LFS names, and fix infinite loop.
* elf/pldd.c: Explicit set _FILE_OFFSET_BITS, cleanup headers.
(get_process_info): Use _Static_assert instead of assert, use default
directory operations instead of explicit LFS names, and free some
leadek pointers.
* elf/tst-pldd.c: New file.

(cherry picked from commit 1a4c27355e146b6d8cc6487b998462c7fdd1048f)
(Backported without the test case due to lack of test-in-container
support.)

ja_JP locale: Add entry for the new Japanese era [BZ #22964]

The Japanese era name will be changed on May 1, 2019. The Japanese
government made a preliminary announcement on April 1, 2019.

The glibc ja_JP locale must be updated to include the new era name for
strftime's alternative year format support.

This is a minimal cherry pick of just the required locale changes.

(cherry picked from commit 466afec30896585b60c2106df7a722a86247c9f3)

S390: Mark vx and vxe as important hwcap.

This patch adds vx and vxe as important hwcaps
which allows one to provide shared libraries
tuned for platforms with non-vx/-vxe, vx or vxe.

ChangeLog:

* sysdeps/s390/dl-procinfo.h (HWCAP_IMPORTANT):
Add HWCAP_S390_VX and HWCAP_S390_VXE.

(cherry picked from commit 61f5e9470fb397a4c334938ac5a667427d9047df)

Add compiler barriers around modifications of the robust mutex list for pthread_mutex_trylock. [BZ #24180]

While debugging a kernel warning, Thomas Gleixner, Sebastian Sewior and
Heiko Carstens found a bug in pthread_mutex_trylock due to misordered
instructions:
140:   a5 1b 00 01             oill    %r1,1
144:   e5 48 a0 f0 00 00       mvghi   240(%r10),0   <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
14a:   e3 10 a0 e0 00 24       stg     %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI

vs (with compiler barriers):
140:   a5 1b 00 01             oill    %r1,1
144:   e3 10 a0 e0 00 24       stg     %r1,224(%r10)
14a:   e5 48 a0 f0 00 00       mvghi   240(%r10),0

Please have a look at the discussion:
"Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede"
(https://lore.kernel.org/lkml/20190202112006.GB3381@osiris/)

This patch is introducing the same compiler barriers and comments
for pthread_mutex_trylock as introduced for pthread_mutex_lock and
pthread_mutex_timedlock by commit 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
"Add compiler barriers around modifications of the robust mutex list."

ChangeLog:

[BZ #24180]
* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock):
Add compiler barriers and comments.

(cherry picked from commit 823624bdc47f1f80109c9c52dee7939b9386d708)

x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155]

Since the size argument is unsigned. we should use unsigned Jcc
instructions, instead of signed, to check size.

Tested on x86-64 and x32, with and without --disable-multi-arch.

[BZ #24155]
CVE-2019-7309
* NEWS: Updated for CVE-2019-7309.
* sysdeps/x86_64/memcmp.S: Use RDX_LP for size. Clear the
upper 32 bits of RDX register for x32. Use unsigned Jcc
instructions, instead of signed.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2.
* sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.

(cherry picked from commit 3f635fb43389b54f682fc9ed2acc0b2aaf4a923d)

x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes strnlen/wcsnlen for x32. Tested on x86-64 and x32. On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length.
Clear the upper 32 bits of RSI register.
* sysdeps/x86_64/strlen.S: Use RSI_LP for length.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen
and tst-size_t-wcsnlen.
* sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.

(cherry picked from commit 5165de69c0908e28a380cbd4bb054e55ea4abc95)

x86-64 strncpy: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes strncpy for x32. Tested on x86-64 and x32. On x86-64,
libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Use RDX_LP
for length.
* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy.
* sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.

(cherry picked from commit c7c54f65b080affb87a1513dee449c8ad6143c8b)

x86-64 strncmp family: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes the strncmp family for x32. Tested on x86-64 and x32.
On x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strcmp-sse42.S: Use RDX_LP for length.
* sysdeps/x86_64/strcmp.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp,
tst-size_t-strncmp and tst-size_t-wcsncmp.
* sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file.
* sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise.
* sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise.

(cherry picked from commit ee915088a0231cd421054dbd8abab7aadf331153)

x86-64 memset/wmemset: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memset/wmemset for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use
RDX_LP for length.  Clear the upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset.
* sysdeps/x86_64/x32/tst-size_t-memset.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise.

(cherry picked from commit 82d0b4a4d76db554eb6757acb790fcea30b19965)

x86-64 memrchr: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memrchr for x32. Tested on x86-64 and x32. On x86-64,
libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/memrchr.S: Use RDX_LP for length.
* sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr.
* sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file.

(cherry picked from commit ecd8b842cf37ea112e59cd9085ff1f1b6e208ae0)

x86-64 memcpy: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memcpy for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Use RDX_LP for
length.  Clear the upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcpy.
tst-size_t-wmemchr.
* sysdeps/x86_64/x32/tst-size_t-memcpy.c: New file.

(cherry picked from commit 231c56760c1e2ded21ad96bbb860b1f08c556c7a)

x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memcmp/wmemcmp for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: Use RDX_LP for
length.  Clear the upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp and
tst-size_t-wmemcmp.
* sysdeps/x86_64/x32/tst-size_t-memcmp.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wmemcmp.c: Likewise.

(cherry picked from commit b304fc201d2f6baf52ea790df8643e99772243cd)

x86-64 memchr/wmemchr: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memchr/wmemchr for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/memchr.S: Use RDX_LP for length.  Clear the
upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and
tst-size_t-wmemchr.
* sysdeps/x86_64/x32/test-size_t.h: New file.
* sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise.
* sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise.

(cherry picked from commit 97700a34f36721b11a754cf37a1cc40695ece1fd)

NEWS: add entries for bugs 23275, 23861, and 23907

intl: Do not return NULL on asprintf failure in gettext [BZ #24018]

Fixes commit 9695dd0c9309712ed8e9c17a7040fe7af347f2dc ("DCIGETTEXT:
Use getcwd, asprintf to construct absolute pathname").

(cherry picked from commit 8c1aafc1f34d090a5b41dc527c33e8687f6a1287)

malloc: Always call memcpy in _int_realloc [BZ #24027]

This commit removes the custom memcpy implementation from _int_realloc
for small chunk sizes. The ncopies variable has the wrong type, and
an integer wraparound could cause the existing code to copy too few
elements (leaving the new memory region mostly uninitialized).
Therefore, removing this code fixes bug 24027.

(cherry picked from commit b50dd3bc8cbb1efe85399b03d7e6c0310c2ead84)

Fix rwlock stall with PREFER_WRITER_NONRECURSIVE_NP (bug 23861)

In the read lock function (__pthread_rwlock_rdlock_full) there was a
code path which would fail to reload __readers while waiting for
PTHREAD_RWLOCK_RWAITING to change. This failure to reload __readers
into a local value meant that various conditionals used the old value
of __readers and with only two threads left it could result in an
indefinite stall of one of the readers (waiting for PTHREAD_RWLOCK_RWAITING
to go to zero, but it never would).

(cherry picked from commit f21e8f8ca466320fed38bdb71526c574dae98026)

powerpc: Add missing CFI register information (bug #23614)

Add CFI information about the offset of registers stored in the stack
frame.

[BZ #23614]
* sysdeps/powerpc/powerpc64/addmul_1.S (FUNC): Add CFI offset for
registers saved in the stack frame.
* sysdeps/powerpc/powerpc64/lshift.S (__mpn_lshift): Likewise.
* sysdeps/powerpc/powerpc64/mul_1.S (__mpn_mul_1): Likewise.

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
(cherry picked from commit 1d880d4a9bf7608c2cd33bbe954ce6995f79121a)

Fix _dl_profile_fixup data-dependency issue (Bug 23690)

There is a data-dependency between the fields of struct l_reloc_result
and the field used as the initialization guard. Users of the guard
expect writes to the structure to be observable when they also observe
the guard initialized. The solution for this problem is to use an acquire
and release load and store to ensure previous writes to the structure are
observable if the guard is initialized.

The previous implementation used DL_FIXUP_VALUE_ADDR (l_reloc_result->addr)
as the initialization guard, making it impossible for some architectures
to load and store it atomically, i.e. hppa and ia64, due to its larger size.

This commit adds an unsigned int to l_reloc_result to be used as the new
initialization guard of the struct, making it possible to load and store
it atomically in all architectures. The fix ensures that the values
observed in l_reloc_result are consistent and do not lead to crashes.
The algorithm is documented in the code in elf/dl-runtime.c
(_dl_profile_fixup). Not all data races have been eliminated.

Tested with build-many-glibcs and on powerpc, powerpc64, and powerpc64le.

[BZ #23690]
* elf/dl-runtime.c (_dl_profile_fixup): Guarantee memory
modification order when accessing reloc_result->addr.
* include/link.h (reloc_result): Add field init.
* nptl/Makefile (tests): Add tst-audit-threads.
(modules-names): Add tst-audit-threads-mod1 and
tst-audit-threads-mod2.
Add rules to build tst-audit-threads.
* nptl/tst-audit-threads-mod1.c: New file.
* nptl/tst-audit-threads-mod2.c: Likewise.
* nptl/tst-audit-threads.c: Likewise.
* nptl/tst-audit-threads.h: Likewise.

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit e5d262effe3a87164308a3f37e61b32d0348692a)

inet/tst-if_index-long: New test case for CVE-2018-19591 [BZ #23927]

(cherry picked from commit 899478c2bfa00c5df8d8bedb52effbb065700278)

support: Implement <support/descriptors.h> to track file descriptors

(cherry picked from commit f255336a9301619519045548acb2e1027065a837)

support: Close original descriptors in support_capture_subprocess

(cherry picked from commit 02cd5c1a8d033d7f91fea12a66bb44d1bbf85f76)

support_quote_string: Do not use str parameter name

This avoids a build failure if this identifier is used as a macro
in a test.

(cherry picked from commit 47d8d9a2172f827a8dde7695f415aa6f78a82d0e)

support: Implement support_quote_string

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
(cherry picked from commit c74a91deaa5de416237c02bbb3e41bda76ca4c7b)

malloc: Add another test for tcache double free check.

This one tests for BZ#23907 where the double free
test didn't check the tcache bin bounds before dereferencing
the bin.

[BZ #23907]
* malloc/tst-tcfree3.c: New.
* malloc/Makefile: Add it.

(cherry picked from commit 7c9a7c68363051cfc5fa1ebb96b3b2c1f82dcb76)

malloc: tcache double free check

* malloc/malloc.c (tcache_entry): Add key field.
(tcache_put): Set it.
(tcache_get): Likewise.
(_int_free): Check for double free in tcache.
* malloc/tst-tcfree1.c: New.
* malloc/tst-tcfree2.c: New.
* malloc/Makefile: Run the new tests.
* manual/probes.texi: Document memory_tcache_double_free probe.

* dlfcn/dlerror.c (check_free): Prevent double frees.

(cherry picked from commit bcdaad21d4635931d1bd3b54a7894276925d081d)

malloc: tcache: Validate tc_idx before checking for double-frees [BZ #23907]

The previous check could read beyond the end of the tcache entry
array. If the e->key == tcache cookie check happened to pass, this
would result in crashes.

(cherry picked from commit affec03b713c82c43a5b025dddc21bde3334f41e)

CVE-2018-19591: if_nametoindex: Fix descriptor for overlong name [BZ #23927]

(cherry picked from commit d527c860f5a3f0ed687bd03f0cb464612dc23408)

Add an additional test to resolv/tst-resolv-network.c

Test for the infinite loop in getnetbyname, bug #17630.

(cherry picked from commit ac8060265bcaca61568ef3a20b9a0140a270af54)

libanl: properly cleanup if first helper thread creation failed (bug 22927)

(cherry picked from commit bd3b0fbae33a9a4cc5e2daf049443d5cf03d4251)

x86: Fix Haswell CPU string flags (BZ#23709)

Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
default flags for Haswell models. Previously, new models were handled by the
default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
Prefer_PMINUB_for_stringop (only the TSX one).

This patch fixes it by disentangle the TSX flag handling from the memory
optimization ones. The strstr case cited on patch now selects the
__strstr_sse2_unaligned as expected for the Haswell cpu.

Checked on x86_64-linux-gnu.

[BZ #23709]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
independently of other flags.

(cherry picked from commit c3d8dc45c9df199b8334599a6cbd98c9950dba62)

i64: fix missing exp2f, log2f and powf symbols in libm.a [BZ #23822]

When new symbol versions were introduced without SVID compatible
error handling the exp2f, log2f and powf symbols were accidentally
removed from the ia64 lim.a. The regression was introduced by
the commits

f5f0f5265162fe6f4f238abcd3086985f7c38d6d
New expf and exp2f version without SVID compat wrapper

72d3d281080be9f674982067d72874fd6cdb4b64
New symbol version for logf, log2f and powf without SVID compat

With WEAK_LIBM_ENTRY(foo), there is a hidden __foo and weak foo
symbol definition in both SHARED and !SHARED build.

[BZ #23822]
* sysdeps/ia64/fpu/e_exp2f.S (exp2f): Use WEAK_LIBM_ENTRY.
* sysdeps/ia64/fpu/e_log2f.S (log2f): Likewise.
* sysdeps/ia64/fpu/e_exp2f.S (powf): Likewise.

(cherry picked from commit ba5b14c7613980dfefcad6b6e88f913e5f596c59)

conform: XFAIL siginfo_t si_band test on sparc64

We can use long int on sparcv9, but on sparc64, we must match the int
type used by the kernel (and not long int, as in POSIX).

(cherry picked from commit 7c5e34d7f1b8f8f5acd94c2b885ae13b85414dcd)

signal: Use correct type for si_band in siginfo_t [BZ #23562]

(cherry picked from commit f997b4be18f7e57d757d39e42f7715db26528aa0)

Fix race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP [BZ #23275]

The race leads either to pthread_mutex_destroy returning EBUSY
or triggering an assertion (See description in bugzilla).

This patch is fixing the race by ensuring that the elision path is
used in all cases if elision is enabled by the GLIBC_TUNABLES framework.

The __kind variable in struct __pthread_mutex_s is accessed concurrently.
Therefore we are now using the atomic macros.

The new testcase tst-mutex10 is triggering the race on s390x and intel.
Presumably also on power, but I don't have access to a power machine
with lock-elision. At least the code for power is the same as on the other
two architectures.

ChangeLog:

[BZ #23275]
* nptl/tst-mutex10.c: New File.
* nptl/Makefile (tests): Add tst-mutex10.
(tst-mutex10-ENV): New variable.
* sysdeps/unix/sysv/linux/s390/force-elision.h: (FORCE_ELISION):
Ensure that elision path is used if elision is available.
* sysdeps/unix/sysv/linux/powerpc/force-elision.h (FORCE_ELISION):
Likewise.
* sysdeps/unix/sysv/linux/x86/force-elision.h: (FORCE_ELISION):
Likewise.
* nptl/pthreadP.h (PTHREAD_MUTEX_TYPE, PTHREAD_MUTEX_TYPE_ELISION)
(PTHREAD_MUTEX_PSHARED): Use atomic_load_relaxed.
* nptl/pthread_mutex_consistent.c (pthread_mutex_consistent): Likewise.
* nptl/pthread_mutex_getprioceiling.c (pthread_mutex_getprioceiling):
Likewise.
* nptl/pthread_mutex_lock.c (__pthread_mutex_lock_full)
(__pthread_mutex_cond_lock_adjust): Likewise.
* nptl/pthread_mutex_setprioceiling.c (pthread_mutex_setprioceiling):
Likewise.
* nptl/pthread_mutex_timedlock.c (__pthread_mutex_timedlock): Likewise.
* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock): Likewise.
* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise.
* sysdeps/nptl/bits/thread-shared-types.h (struct __pthread_mutex_s):
Add comments.
* nptl/pthread_mutex_destroy.c (__pthread_mutex_destroy):
Use atomic_load_relaxed and atomic_store_relaxed.
* nptl/pthread_mutex_init.c (__pthread_mutex_init):
Use atomic_store_relaxed.

(cherry picked from commit 403b4feb22dcbc85ace72a361d2a951380372471)

Fix misreported errno on preadv2/pwritev2 (BZ#23579)

The fallback code of Linux wrapper for preadv2/pwritev2 executes
regardless of the errno code for preadv2, instead of the case where
the syscall is not supported.

This fixes it by calling the fallback code iff errno is ENOSYS. The
patch also adds tests for both invalid file descriptor and invalid
iov_len and vector count.

The only discrepancy between preadv2 and fallback code regarding
error reporting is when an invalid flags are used. The fallback code
bails out earlier with ENOTSUP instead of EINVAL/EBADF when the syscall
is used.

Checked on x86_64-linux-gnu on a 4.4.0 and 4.15.0 kernel.

[BZ #23579]
* misc/tst-preadvwritev2-common.c (do_test_with_invalid_fd): New
test.
* misc/tst-preadvwritev2.c, misc/tst-preadvwritev64v2.c (do_test):
Call do_test_with_invalid_fd.
* sysdeps/unix/sysv/linux/preadv2.c (preadv2): Use fallback code iff
errno is ENOSYS.
* sysdeps/unix/sysv/linux/preadv64v2.c (preadv64v2): Likewise.
* sysdeps/unix/sysv/linux/pwritev2.c (pwritev2): Likewise.
* sysdeps/unix/sysv/linux/pwritev64v2.c (pwritev64v2): Likewise.

(cherry picked from commit 7a16bdbb9ff4122af0a28dc20996c95352011fdd)

preadv2/pwritev2: Handle offset == -1 [BZ #22753]

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit d4b4a00a462348750bb18544eb30853ee6ac5d10)

Fix segfault in maybe_script_execute.

If glibc is built with gcc 8 and -march=z900,
the testcase posix/tst-spawn4-compat crashes with a segfault.

In function maybe_script_execute, the new_argv array is dynamically
initialized on stack with (argc + 1) elements.
The function wants to add _PATH_BSHELL as the first argument
and writes out of bounds of new_argv.
There is an off-by-one because maybe_script_execute fails to count
the terminating NULL when sizing new_argv.

ChangeLog:

* sysdeps/unix/sysv/linux/spawni.c (maybe_script_execute):
Increment size of new_argv by one.

(cherry picked from commit 28669f86f6780a18daca264f32d66b1428c9c6f1)

pthread_cond_broadcast: Fix waiters-after-spinning case [BZ #23538]

(cherry picked from commit 99ea93ca31795469d2a1f1570f17a5c39c2eb7e2)

x86: Populate COMMON_CPUID_INDEX_80000001 for Intel CPUs [BZ #23459]

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23459]
* sysdeps/x86/cpu-features.c (get_extended_indices): New
function.
(init_cpu_features): Call get_extended_indices for both Intel
and AMD CPUs.
* sysdeps/x86/cpu-features.h (COMMON_CPUID_INDEX_80000001):
Remove "for AMD" comment.

(cherry picked from commit be525a69a6630abc83144c0a96474f2e26da7443)

x86: Correct index_cpu_LZCNT [BZ #23456]

cpu-features.h has

#define bit_cpu_LZCNT (1 << 5)
#define index_cpu_LZCNT COMMON_CPUID_INDEX_1
#define reg_LZCNT

But the LZCNT feature bit is in COMMON_CPUID_INDEX_80000001:

Initial EAX Value: 80000001H
ECX Extended Processor Signature and Feature Bits:
Bit 05: LZCNT available

index_cpu_LZCNT should be COMMON_CPUID_INDEX_80000001, not
COMMON_CPUID_INDEX_1. The VMX feature bit is in COMMON_CPUID_INDEX_1:

Initial EAX Value: 01H
Feature Information Returned in the ECX Register:
5 VMX

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23456]
* sysdeps/x86/cpu-features.h (index_cpu_LZCNT): Set to
COMMON_CPUID_INDEX_80000001.

(cherry picked from commit 65d87ade1ee6f3ac099105e3511bd09bdc24cf3f)

NEWS: Mention bug 22996.

The fix has been backported in the commit
718445e569ec5c5892bc7c5ff17f434cdc586dc8.