Noah Goldstein [Mon, 3 May 2021 07:01:58 +0000 (03:01 -0400)]
x86: Optimize memchr-avx2.S
No bug. This commit optimizes memchr-avx2.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
and saving a few instructions in the loop return path. test-memchr,
test-rawmemchr, and test-wmemchr are all passing.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit acfd088a1963ba51cd83c78f95c0ab25ead79e04)
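As a hedged illustration of the cmovcc idea (not the actual assembly
change), a C ternary like the following is typically compiled by GCC and
Clang to a conditional move on x86, replacing a conditional branch:

  /* Illustrative only: a branchless select in the spirit of the
     cmovcc replacement the commit describes.  */
  static inline const char *
  select_result (const char *match, int found)
  {
    return found ? match : NULL;   /* cmovcc instead of a jump */
  }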
H.J. Lu [Sun, 7 Mar 2021 17:45:23 +0000 (09:45 -0800)]
x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
H.J. Lu [Sun, 7 Mar 2021 17:44:18 +0000 (09:44 -0800)]
x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
H.J. Lu [Tue, 23 Feb 2021 14:33:10 +0000 (06:33 -0800)]
x86: Add string/memory function tests in RTM region
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort. When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation. Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
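A hedged sketch of the test idea, using the RTM intrinsics from
immintrin.h (illustrative only; requires -mrtm and RTM hardware, and the
real tests live under sysdeps/x86/):

  #include <immintrin.h>
  #include <string.h>

  /* Run memcpy inside a transaction; returns 1 if the transaction
     committed (no RTM abort), 0 if it aborted.  A real test retries,
     since transactions can also abort for unrelated reasons.  */
  static int
  string_fn_commits_in_rtm (void)
  {
    char dst[32];
    if (_xbegin () == _XBEGIN_STARTED)
      {
        memcpy (dst, "hello", 6);   /* a VZEROUPPER here would abort */
        _xend ();
        return 1;
      }
    return 0;
  }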
H.J. Lu [Fri, 5 Mar 2021 15:26:42 +0000 (07:26 -0800)]
x86-64: Add AVX optimized string/memory functions for RTM
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with
xtest
jz 1f
vzeroall
ret
1:
vzeroupper
ret
at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
H.J. Lu [Fri, 5 Mar 2021 15:20:28 +0000 (07:20 -0800)]
x86-64: Add memcmp family functions with 256-bit EVEX
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
H.J. Lu [Fri, 5 Mar 2021 15:15:03 +0000 (07:15 -0800)]
x86-64: Add memset family functions with 256-bit EVEX
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
H.J. Lu [Fri, 5 Mar 2021 14:46:08 +0000 (06:46 -0800)]
x86-64: Add memmove family functions with 256-bit EVEX
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
H.J. Lu [Fri, 5 Mar 2021 14:36:50 +0000 (06:36 -0800)]
x86-64: Add strcpy family functions with 256-bit EVEX
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
H.J. Lu [Fri, 5 Mar 2021 14:24:52 +0000 (06:24 -0800)]
x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.
For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
H.J. Lu [Fri, 26 Feb 2021 13:36:59 +0000 (05:36 -0800)]
x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
Noah Goldstein [Sun, 9 Jan 2022 22:02:21 +0000 (16:02 -0600)]
x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_avx2. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
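A minimal C sketch of the dispatch described above (the real change is
in strcmp-avx2.S; plain wcscmp/wcsncmp stand in for the AVX2 variants):

  #include <stddef.h>
  #include <wchar.h>

  int
  wcsncmp_sketch (const wchar_t *s1, const wchar_t *s2, size_t n)
  {
    if (n >= (size_t) 1 << 56)    /* cannot be a real bound on x86_64 */
      return wcscmp (s1, s2);     /* stands in for __wcscmp_avx2 */
    return wcsncmp (s1, s2, n);   /* the bounded comparison path */
  }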
Fix SXID_ERASE behavior in setuid programs (BZ #27471)
When parse_tunables tries to erase a tunable marked as SXID_ERASE for
setuid programs, it ends up setting the envvar string iterator
incorrectly, because of which it may parse the next tunable
incorrectly. Given that currently the implementation allows malformed
and unrecognized tunables pass through, it may even allow SXID_ERASE
tunables to go through.
This change revamps the SXID_ERASE implementation so that:
- Only valid tunables are written back to the tunestr string, because
of which children of SXID programs will only inherit a clean list of
identified tunables that are not SXID_ERASE.
- Unrecognized tunables get scrubbed off from the environment and
subsequently from the child environment.
- This has the side-effect that a tunable that is not identified by
the setxid binary, will not be passed on to a non-setxid child even
if the child could have identified that tunable. This may break
applications that expect this behaviour but expecting such tunables
to cross the SXID boundary is wrong.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 2ed18c5b534d9e92fc006202a5af0df6b72e7aca)
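A minimal sketch of the write-back idea, assuming hypothetical helpers
tunable_is_valid and tunable_is_sxid_erase (the real parsing lives in
elf/dl-tunables.c and differs in detail):

  #include <stdbool.h>
  #include <string.h>

  /* Hypothetical helpers standing in for the real tunable lookup.  */
  extern bool tunable_is_valid (const char *name, size_t len);
  extern bool tunable_is_sxid_erase (const char *name, size_t len);

  /* Copy only recognized, non-SXID_ERASE name=value pairs back into
     tunestr; everything else is scrubbed from the inherited string.  */
  static void
  rewrite_tunestr (char *tunestr, const char *envval)
  {
    size_t off = 0;
    for (const char *p = envval; *p != '\0'; )
      {
        size_t len = strcspn (p, ":");      /* one name=value pair */
        size_t namelen = strcspn (p, "=");
        if (namelen < len
            && tunable_is_valid (p, namelen)
            && !tunable_is_sxid_erase (p, namelen))
          {
            if (off > 0)
              tunestr[off++] = ':';
            memcpy (tunestr + off, p, len); /* keep this pair */
            off += len;
          }
        p += len + (p[len] == ':');
      }
    tunestr[off] = '\0';
  }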
Instead of passing GLIBC_TUNABLES via the environment, pass the
environment variable from parent to child. This allows us to test
multiple variables to ensure better coverage.
The test list currently only includes the case that's already being
tested. More tests will be added later.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 061fe3f8add46a89b7453e87eabb9c4695005ced)
tst-env-setuid: Use support_capture_subprogram_self_sgid
Use the support_capture_subprogram_self_sgid to spawn an sgid child.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit ca335281068a1ed549a75ee64f90a8310755956f)
Add a new function support_capture_subprogram_self_sgid that spawns an
sgid child of the running program with its own image and returns the
exit code of the child process. This functionality is used by at
least three tests in the testsuite at the moment, so it makes sense to
consolidate.
There is also a new function support_subprogram_wait which should
provide simple system() like functionality that does not set up file
actions. This is useful in cases where only the return code of the
spawned subprocess is interesting.
This patch also ports tst-secure-getenv to this new function. A
subsequent patch will port other tests. This also brings an important
change to tst-secure-getenv behaviour. Now instead of succeeding, the
test fails as UNSUPPORTED if it is unable to spawn a setgid child,
which is how it should have been in the first place.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 716a3bdc41b2b4b864dc64475015ba51e35e1273)
Its API is similar to support_capture_subprocess, but instead creates a
new process based on the input path and arguments. Under the hood it
uses posix_spawn to create the new process (sketched after the ChangeLog
below).
It also allows the use of other support_capture_* functions to check
for expected results and free the resources.
Checked on x86_64-linux-gnu.
* support/Makefile (libsupport-routines): Add support_subprocess,
xposix_spawn, xposix_spawn_file_actions_addclose, and
xposix_spawn_file_actions_adddup2.
(tst-support_capture_subprocess-ARGS): New rule.
* support/capture_subprocess.h (support_capture_subprogram): New
prototype.
* support/support_capture_subprocess.c (support_capture_subprocess):
Refactor to use support_subprocess and support_capture_poll.
(support_capture_subprogram): New function.
* support/tst-support_capture_subprocess.c (write_mode_to_str,
str_to_write_mode, test_common, parse_int, handle_restart,
do_subprocess, do_subprogram, do_multiple_tests): New functions.
(do_test): Add support_capture_subprogram tests.
* support/subprocess.h: New file.
* support/support_subprocess.c: Likewise.
* support/xposix_spawn.c: Likewise.
* support/xposix_spawn_file_actions_addclose.c: Likewise.
* support/xposix_spawn_file_actions_adddup2.c: Likewise.
* support/xspawn.h: Likewise.
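A minimal sketch of the posix_spawn-plus-wait flow, without the
file-action plumbing the real support_subprocess code adds:

  #include <spawn.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>

  extern char **environ;

  /* Spawn PATH with ARGV and return the raw wait status, roughly the
     system()-like behaviour support_subprogram_wait is described as
     providing.  */
  static int
  run_subprogram (const char *path, char *const argv[])
  {
    pid_t pid;
    int status;
    if (posix_spawn (&pid, path, NULL, NULL, argv, environ) != 0
        || waitpid (pid, &status, 0) < 0)
      {
        perror ("spawn");
        exit (1);
      }
    return status;
  }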
DJ Delorie [Thu, 25 Feb 2021 21:08:21 +0000 (16:08 -0500)]
nscd: Fix double free in netgroupcache [BZ #27462]
In commit 745664bd798ec8fd50438605948eea594179fba1 a use-after-free
was fixed, but this led to an occasional double-free. This patch
tracks the "live" allocation better.
Florian Weimer [Wed, 27 Jan 2021 12:36:12 +0000 (13:36 +0100)]
gconv: Fix assertion failure in ISO-2022-JP-3 module (bug 27256)
The conversion loop to the internal encoding does not follow
the interface contract that __GCONV_FULL_OUTPUT is only returned
after the internal wchar_t buffer has been filled completely. This
is enforced by the first of the two asserts in iconv/skeleton.c:
/* We must run out of output buffer space in this
rerun. */
assert (outbuf == outerr);
assert (nstatus == __GCONV_FULL_OUTPUT);
This commit solves this issue by queuing a second wide character
which cannot be written immediately in the state variable, like
other converters already do (e.g., BIG5-HKSCS or TSCII).
H.J. Lu [Mon, 28 Dec 2020 13:28:49 +0000 (05:28 -0800)]
x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in unrelocated executable also leads to
segfault. Issue a fatal error message when calling IFUNC function defined
in the unrelocated executable from a shared library.
On x86, ifuncmain6pie failed with:
[hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct
./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency.
[hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo
00003ff4  00000706 R_386_GLOB_DAT         0000400c   foo_ptr
00003ff8  00000406 R_386_GLOB_DAT         00000000   foo
0000400c  00000401 R_386_32               00000000   foo
[hjl@gnu-cfl-2 build-i686-linux]$
Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which
trigger the circular IFUNC dependency, and build ifuncmain6pie with
-Wl,-z,lazy.
H.J. Lu [Tue, 12 Jan 2021 13:15:49 +0000 (05:15 -0800)]
x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow. This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:
cmpl $63, %ecx
// Don't use "rep movsb" if ECX <= 63
jbe L(Don't use rep movsb)
Use "rep movsb"
Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.
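In C terms, the guard is roughly the following (illustrative only; the
real check operates on the low 32 bits of the copy distance in
memmove-vec-unaligned-erms.S):

  #include <stdint.h>

  /* Return nonzero when "rep movsb" should be avoided because the low
     32 bits of the copy distance fall in the slow window.  */
  static int
  avoid_rep_movsb (const void *dst, const void *src)
  {
    uint32_t dist = (uint32_t) ((uintptr_t) dst - (uintptr_t) src);
    return dist <= 63;   /* mirrors "cmpl $63, %ecx; jbe ..." */
  }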
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)
In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.
Wilco Dijkstra [Wed, 11 Mar 2020 17:15:25 +0000 (17:15 +0000)]
[AArch64] Improve integer memcpy
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
Krzysztof Koch [Tue, 5 Nov 2019 17:35:18 +0000 (17:35 +0000)]
aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.
Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.
Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html
Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S
Tested against GLIBC testsuite and randomized tests.
Wilco Dijkstra [Fri, 28 Aug 2020 16:51:40 +0000 (17:51 +0100)]
AArch64: Improve backwards memmove performance
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses. So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.
Add a new memcpy using 128-bit Q registers - this is faster on modern
cores and reduces codesize. Similar to the generic memcpy, small cases
include copies up to 32 bytes. 64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies. Large copies align
the source rather than the destination.
bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
so make this memcpy the default on N1 (on Centriq it is 15% faster than
memcpy_falkor).
strcmp-avx2.S: In the avx2 strncmp function, strings are compared in
chunks of four vector sizes (i.e. 32x4 = 128 bytes for avx2). After the
first 4-vector-size comparison, the code must check whether it has
already passed the given offset. This patch implements the avx2 offset
check condition for the strncmp function, for the case where both
strings compare equal over the first 4 vector sizes.
Andreas Schwab [Mon, 20 Jan 2020 16:01:50 +0000 (17:01 +0100)]
Fix array overflow in backtrace on PowerPC (bug 25423)
When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address. Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
Florian Weimer [Thu, 5 Dec 2019 16:29:42 +0000 (17:29 +0100)]
misc/test-errno-linux: Handle EINVAL from quotactl
In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
argument, causing the test to fail due to too restrictive test
expectations.
Kamlesh Kumar [Thu, 5 Dec 2019 15:56:04 +0000 (16:56 +0100)]
<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232]
Without the asm redirects, strchr et al. are not const-correct.
libc++ has a wrapper header that works with and without
__CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when
Clang is used with libstdc++ or just C headers, the overloaded functions
with the correct types are not declared.
This change does not impact current GCC (with libstdc++ or libc++).
Florian Weimer [Tue, 3 Dec 2019 20:11:32 +0000 (21:11 +0100)]
x86: Assume --enable-cet if GCC defaults to CET [BZ #25225]
This links in CET support if GCC defaults to CET. Otherwise, __CET__
is defined, yet CET functionality is not compiled and linked into the
dynamic loader, resulting in a linker failure due to undefined
references to _dl_cet_check and _dl_open_check.
Florian Weimer [Thu, 28 Nov 2019 13:42:11 +0000 (14:42 +0100)]
libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]
Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable
vtable validation in case of interposition [BZ #23313]") only covered
the interposable glibc 2.1 handles, in libio/stdfiles.c. The
parallel code in libio/oldstdfiles.c needs similar detection logic.
rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]
The problem was introduced in glibc 2.23, in commit b9eb92ab05204df772eb4929eccd018637c9f3e9
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").
Stefan Liebler [Wed, 6 Feb 2019 08:06:34 +0000 (09:06 +0100)]
Fix alignment of TLS variables for tls variant TLS_TCB_AT_TP [BZ #23403]
The alignment of TLS variables is wrong if accessed from within a thread
for architectures with tls variant TLS_TCB_AT_TP.
For the main thread the static tls data is properly aligned.
For other threads the alignment depends on the alignment of the thread
pointer as the static tls data is located relative to this pointer.
This patch adds this alignment for TLS_TCB_AT_TP variants in the same way
as it is already done for TLS_DTV_AT_TP. The thread pointer is also already
properly aligned if the user provides its own stack for the new thread.
This patch extends the testcase nptl/tst-tls1.c in order to check the
alignment of the tls variables and it adds a pthread_create invocation
with a user provided stack.
The test itself is migrated from test-skeleton.c to test-driver.c
and the missing support functions xpthread_attr_setstack and xposix_memalign
are added.
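The alignment check itself is simple; a hedged sketch of what the
extended test verifies (illustrative, not the actual nptl/tst-tls1.c
code):

  #include <assert.h>
  #include <stdint.h>

  /* An over-aligned TLS variable: with TLS_TCB_AT_TP it sits at a fixed
     offset below the thread pointer, so a misaligned thread pointer
     shows up as a misaligned variable in the new thread.  */
  static __thread int tls_var __attribute__ ((aligned (0x100)));

  static void *
  thread_func (void *arg)
  {
    assert (((uintptr_t) &tls_var & 0xff) == 0);
    return arg;
  }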
mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernels
Linux/Mips kernels prior to 4.8 could potentially crash the user
process when doing FPU emulation while running on non-executable
user stack.
Currently, gcc doesn't emit .note.GNU-stack for mips, but that will
change in the future. To ensure that glibc can be used with such
future gcc, without silently resulting in binaries that might crash
in runtime, this patch forces RWX stack for all built objects if
configured to run against minimum kernel version less than 4.8.
* sysdeps/unix/sysv/linux/mips/Makefile
(test-xfail-check-execstack):
Move under mips-has-gnustack != yes.
(CFLAGS-.o*, ASFLAGS-.o*): New rules.
Apply -Wa,-execstack if mips-force-execstack == yes.
* sysdeps/unix/sysv/linux/mips/configure: Regenerated.
* sysdeps/unix/sysv/linux/mips/configure.ac
(mips-force-execstack): New var.
Set to yes for hard-float builds with minimum_kernel < 4.8.0
or minimum_kernel not set at all.
(mips-has-gnustack): New var.
Use value of libc_cv_as_noexecstack
if mips-force-execstack != yes, otherwise set to no.
nss_db allows for getpwent et al to be called without a set*ent,
but it only works once. After the last get*ent a set*ent is
required to restart, because the end*ent did not properly reset
the module. Resetting it to NULL allows for a proper restart.
If the database doesn't exist, however, end*ent erroneously called
munmap, which set errno.
The test case runs "makedb" inside the testroot, so needs selinux
DSOs installed.
H.J. Lu [Mon, 1 Jul 2019 19:23:10 +0000 (12:23 -0700)]
Call _dl_open_check after relocation [BZ #24259]
This is a workaround for [BZ #20839] which doesn't remove the NODELETE
object when _dl_open_check throws an exception. Move it after relocation
in dl_open_worker to avoid leaving the NODELETE object mapped without
relocation.
DJ Delorie [Wed, 30 Oct 2019 22:03:14 +0000 (18:03 -0400)]
Base max_fast on alignment, not width, of bins (Bug 24903)
set_max_fast sets the "impossibly small" value based on,
eventually, MALLOC_ALIGNMENT. The comparison for the smallest
chunk used is, eventually, against MIN_CHUNK_SIZE. Note that i386
is the only platform where these are the same, so a smallest
chunk *would* be put in a no-fastbins fastbin.
This change calculates the "impossibly small" value
based on MIN_CHUNK_SIZE instead, so that we can know it will
always be impossibly small.
malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
Fixes `<total type="rest" size="...">` incorrectly showing as 0 most
of the time.
The rest value being wrong is significant because to compute the
actual amount of memory handed out via malloc, the user must subtract
it from <system type="current" size="...">. That result being wrong
makes investigating memory fragmentation issues like
<https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to
impossible.
Joseph Myers [Mon, 4 Feb 2019 23:46:58 +0000 (23:46 +0000)]
Fix assertion in malloc.c:tcache_get.
One of the warnings that appears with -Wextra is "ordered comparison
of pointer with integer zero" in malloc.c:tcache_get, for the
assertion:
assert (tcache->entries[tc_idx] > 0);
Indeed, a "> 0" comparison does not make sense for
tcache->entries[tc_idx], which is a pointer. My guess is that
tcache->counts[tc_idx] is what's intended here, and this patch changes
the assertion accordingly.
Tested for x86_64.
* malloc/malloc.c (tcache_get): Compare tcache->counts[tc_idx]
with 0, not tcache->entries[tc_idx].
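The corrected assertion therefore reads:

  assert (tcache->counts[tc_idx] > 0);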
Wilco Dijkstra [Wed, 12 Jun 2019 10:42:34 +0000 (11:42 +0100)]
Improve performance of memmem
This patch significantly improves performance of memmem using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.
Small needles up to size 2 use a dedicated linear search. Very long needles
use the Two-Way algorithm (to avoid increasing stack size or slowing down
the common case, inlining is disabled).
The performance gain is 6.6 times on English text on AArch64 using random
needles with average size 8.
Tested against GLIBC testsuite and randomized tests.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/memmem.c (__memmem): Rewrite to improve performance.
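A heavily simplified sketch of the pair-hash Horspool idea (illustrative
only; glibc's string/memmem.c differs in the hash function, table
handling and the long-needle filtering step):

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hash a pair of adjacent bytes into the 256-entry shift table.  */
  #define HASH2(p) ((((uint8_t) (p)[0]) * 31 + (uint8_t) (p)[1]) & 0xff)

  /* Sketch for needle lengths 2..256, so every shift fits in 8 bits.  */
  static const char *
  memmem_sketch (const char *hs, size_t hs_len,
                 const char *ne, size_t ne_len)
  {
    uint8_t shift[256];
    memset (shift, ne_len - 1, sizeof shift);   /* default: big skip */
    /* Record the rightmost occurrence of each byte pair; the pair
       ending at the last needle byte gets shift 0 (match candidate). */
    for (size_t i = 1; i < ne_len; i++)
      shift[HASH2 (ne + i - 1)] = ne_len - 1 - i;

    for (size_t off = 0; off + ne_len <= hs_len; )
      {
        size_t s = shift[HASH2 (hs + off + ne_len - 2)];
        if (s == 0)
          {
            if (memcmp (hs + off, ne, ne_len) == 0)
              return hs + off;
            s = 1;
          }
        off += s;   /* skip past the mismatch */
      }
    return NULL;
  }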
Wilco Dijkstra [Wed, 12 Jun 2019 10:38:52 +0000 (11:38 +0100)]
Improve performance of strstr
This patch significantly improves performance of strstr using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.
Small needles up to size 3 use a dedicated linear search. Very long needles
use the Two-Way algorithm.
The performance gain using the improved bench-strstr on Cortex-A72 is 5.8
times basic_strstr and 3.7 times twoway_strstr.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/str-two-way.h (two_way_short_needle): Add inline to avoid
warning.
(two_way_long_needle): Block inlining.
* string/strstr.c (strstr2): Add new function.
(strstr3): Likewise.
(STRSTR): Completely rewrite strstr to improve performance.
As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp
can be used after memchr to avoid the initialization overhead of the
two-way algorithm for the first match. This has shown an improvement
of more than 40% for the first match.
Wilco Dijkstra [Fri, 3 Aug 2018 16:24:12 +0000 (17:24 +0100)]
Simplify and speedup strstr/strcasestr first match
Looking at the benchtests, both strstr and strcasestr spend a lot of time
in a slow initialization loop handling one character per iteration.
This can be simplified and use the much faster strlen/strnlen/strchr/memcmp.
Read ahead a few cachelines to reduce the number of strnlen calls, which
improves performance by ~3-4%. This patch improves the time taken for the
full strstr benchtest by >40%.
* string/strcasestr.c (STRCASESTR): Simplify and speedup first match.
* string/strstr.c (AVAILABLE): Likewise.
Wilco Dijkstra [Wed, 19 Dec 2018 18:28:24 +0000 (18:28 +0000)]
[AArch64] Add ifunc support for Ares
Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares
supports 2 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename this to
__memcpy_simd or similar).
* manual/tunables.texi (glibc.cpu.name): Add ares tunable.
* sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
__memcpy_falkor for ares.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
Add new define.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add ares cpu.
posix: Fix large mmap64 offset for mips64n32 (BZ#24699)
The fix for BZ#21270 (commit 158d5fa0e19) added a mask to avoid offsets
larger than 2^44 being used with __NR_mmap2. However mips64n32 uses
__NR_mmap, as does mips64n64, but still defines off_t as the old non-LFS
type (other ILP32 ABIs, such as x32, define off_t equal to off64_t). This
leads to the mask meant only for __NR_mmap2 calls being applied to
__NR_mmap as well, thus limiting the maximum offset that can be used with
mmap64.
This patch fixes it by setting the high mask only for __NR_mmap2 usage.
The posix/tst-mmap-offset.c test already exercises this and also fails
for mips64n32. The patch also changes the test to check for an
arch-specific header that defines the maximum supported offset.
Checked on x86_64-linux-gnu and i686-linux-gnu, and I also ran
tst-mmap-offset on QEMU-simulated mips64 with a 3.2.0 kernel for both
mips-linux-gnu and mips64-n32-linux-gnu.
[BZ #24699]
* posix/tst-mmap-offset.c: Mention BZ #24699.
(do_test_bz21270): Rename to do_test_large_offset and use
mmap64_maximum_offset to check for maximum expected offset value.
* sysdeps/generic/mmap_info.h: New file.
* sysdeps/unix/sysv/linux/mips/mmap_info.h: Likewise.
* sysdeps/unix/sysv/linux/mmap64.c (MMAP_OFF_HIGH_MASK): Define iff
__NR_mmap2 is used.
Avoid lazy binding of symbols that may follow a variant PCS with different
register usage convention from the base PCS.
Currently the lazy binding entry code does not preserve all the registers
required for AdvSIMD and SVE vector calls. Saving and restoring all
registers unconditionally may break existing binaries, even if they never
use vector calls, because of the larger stack requirement for lazy
resolution, which can be significant on an SVE system.
The solution is to mark all symbols in the symbol table that may follow
a variant PCS so the dynamic linker can handle them specially. In this
patch such symbols are always resolved at load time, not lazily.
So currently LD_AUDIT for variant PCS symbols is not supported; for that
the _dl_runtime_profile entry needs to be changed e.g. to unconditionally
save/restore all registers (but pass down arg and retval registers to
pltentry/exit callbacks according to the base PCS).
This patch also removes a __builtin_expect from the modified code because
the branch prediction hint did not seem useful.
* sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Check
STO_AARCH64_VARIANT_PCS and bind such symbols at load time.
Szabolcs Nagy [Thu, 25 Apr 2019 14:35:35 +0000 (15:35 +0100)]
aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS
STO_AARCH64_VARIANT_PCS is a non-visibility st_other flag for marking
symbols that reference functions that may follow a variant PCS with
different register usage convention from the base PCS.
DT_AARCH64_VARIANT_PCS is a dynamic tag that marks ELF modules that
have R_*_JUMP_SLOT relocations for symbols marked with
STO_AARCH64_VARIANT_PCS (i.e. have variant PCS calls via a PLT).
Florian Weimer [Fri, 28 Jun 2019 07:39:21 +0000 (09:39 +0200)]
io: Remove copy_file_range emulation [BZ #24744]
The kernel is evolving this interface (e.g., removal of the
restriction on cross-device copies), and keeping up with that
is difficult. Applications which need the function should
run kernels which support the system call instead of relying on
the imperfect glibc emulation.
Dmitry V. Levin [Wed, 13 Feb 2019 01:20:51 +0000 (01:20 +0000)]
libio: do not attempt to free wide buffers of legacy streams [BZ #24228]
Commit a601b74d31ca086de38441d316a3dee24c866305 aka glibc-2.23~693
("In preparation for fixing BZ#16734, fix failure in misc/tst-error1-mem
when _G_HAVE_MMAP is turned off.") introduced a regression:
_IO_unbuffer_all now invokes _IO_wsetb to free wide buffers of all
files, including legacy standard files which are small statically
allocated objects that do not have wide buffers and the _mode member,
causing memory corruption.
Another memory corruption in _IO_unbuffer_all happens when -1
is assigned to the _mode member of legacy standard files that
do not have it.
[BZ #24228]
* libio/genops.c (_IO_unbuffer_all)
[SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)]: Do not attempt to free wide
buffers and access _IO_FILE_complete members of legacy libio streams.
* libio/tst-bz24228.c: New file.
* libio/tst-bz24228.map: Likewise.
* libio/Makefile [build-shared] (tests): Add tst-bz24228.
[build-shared] (generated): Add tst-bz24228.mtrace and
tst-bz24228.check.
[run-built-tests && build-shared] (tests-special): Add
$(objpfx)tst-bz24228-mem.out.
(LDFLAGS-tst-bz24228, tst-bz24228-ENV): New variables.
($(objpfx)tst-bz24228-mem.out): New rule.
Wilco Dijkstra [Fri, 10 May 2019 15:38:21 +0000 (16:38 +0100)]
Fix tcache count maximum (BZ #24531)
The tcache counts[] array is a char, which has a very small range and thus
may overflow. When setting tcache_count tunable, there is no overflow check.
However the tunable must not be larger than the maximum value of the tcache
counts[] array, otherwise it can overflow when filling the tcache.
[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.
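A sketch of the clamp, assuming a one-byte counts[] entry (names follow
the ChangeLog above; the exact bound and surrounding code in glibc may
differ):

  #include <limits.h>
  #include <stddef.h>

  /* Sketch only: counts[] entries are a char, so the tunable must not
     exceed what a char can hold, else filling the tcache overflows the
     count.  Using CHAR_MAX as the bound is an assumption here.  */
  #define MAX_TCACHE_COUNT CHAR_MAX

  static size_t tcache_count_setting;   /* stands in for mp_.tcache_count */

  static int
  do_set_tcache_count (size_t value)
  {
    if (value <= MAX_TCACHE_COUNT)      /* only update if small enough */
      tcache_count_setting = value;
    return 1;
  }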
Mark Wielaard [Wed, 15 May 2019 15:14:01 +0000 (17:14 +0200)]
dlfcn: Guard __dlerror_main_freeres with __libc_once_get (once) [BZ#24476]
dlerror.c (__dlerror_main_freeres) will try to free resources which only
have been initialized when init () has been called. That function is
called when resources are needed using __libc_once (once, init) where
once is a __libc_once_define (static, once) in the dlerror.c file.
Trying to free those resources if init () hasn't been called will
produce errors under valgrind memcheck. So guard the freeing of those
resources using __libc_once_get (once) and make sure we have a valid
key. Also add a similar guard to __dlerror ().
* dlfcn/dlerror.c (__dlerror_main_freeres): Guard using
__libc_once_get (once) and static_buf == NULL.
(__dlerror): Check we have a valid key, set result to static_buf
otherwise.
Andreas Schwab [Tue, 14 May 2019 15:14:59 +0000 (17:14 +0200)]
Fix crash in _IO_wfile_sync (bug 20568)
When computing the length of the converted part of the stdio buffer, use
the number of consumed wide characters, not the (negative) distance to the
end of the wide buffer.
Adam Maris [Thu, 14 Mar 2019 20:51:16 +0000 (16:51 -0400)]
malloc: Check for large bin list corruption when inserting unsorted chunk
Fixes bug 24216. This patch adds security checks for bk and bk_nextsize pointers
of chunks in large bin when inserting chunk from unsorted bin. It was possible
to write the pointer to victim (newly inserted chunk) to arbitrary memory
locations if bk or bk_nextsize pointers of the next large bin chunk
got corrupted.
Florian Weimer [Mon, 20 Aug 2018 12:57:13 +0000 (14:57 +0200)]
malloc: Add ChangeLog for accidentally committed change
Commit b90ddd08f6dd688e651df9ee89ca3a69ff88cd0c ("malloc: Additional
checks for unsorted bin integrity I.") was committed without a
whitespace fix, so it is adjusted here as well.
Since 9182aa67994 (Fix vDSO l_name for GDB's, BZ#387) the initial link_map
for the executable itself and the loader will have both l_name and
l_libname->name holding the same value, because l_name is set from
newname->name, which points to l_libname->name. This leads pldd into an
infinite loop when re-reading the map name, since the value at ln.name
(l_libname->name) will be the same as previously read. The straightforward
fix is to just avoid the check and read the next list entry.
I checked also against binaries issues with old loaders with fix for BZ#387,
and pldd could dump the shared objects.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
[BZ #18035]
* elf/Makefile (tests-container): Add tst-pldd.
* elf/pldd-xx.c: Use _Static_assert in place of pldd_assert.
(E(find_maps)): Avoid use of alloca, use default read file operations
instead of explicit LFS names, and fix infinite loop.
* elf/pldd.c: Explicitly set _FILE_OFFSET_BITS, cleanup headers.
(get_process_info): Use _Static_assert instead of assert, use default
directory operations instead of explicit LFS names, and free some
leaked pointers.
* elf/tst-pldd.c: New file.
Mike Frysinger [Wed, 24 Apr 2019 17:30:53 +0000 (19:30 +0200)]
memusagestat: use local glibc when linking [BZ #18465]
The memusagestat is the only binary that has its own link line which
causes it to be linked against the existing installed C library. It
has been this way since it was originally committed in 1999, but I
don't see any reason as to why. Since we want all the programs we
build locally to be against the new copy of glibc, change the build
to be like all other programs.
Paul Eggert [Mon, 21 Jan 2019 19:08:13 +0000 (11:08 -0800)]
regex: fix read overrun [BZ #24114]
Problem found by AddressSanitizer, reported by Hongxu Chen in:
https://debbugs.gnu.org/34140
* posix/regexec.c (proceed_next_node):
Do not read past end of input buffer.
powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC
Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
instead it suspend and resume it when leaving the kernel. The
side-effects of the syscall will always remain visible, even if the
transaction is aborted. This is an issue when transaction is used along
with futex syscall, on pthread_cond_wait for instance, where the futex
call might succeed but the transaction is rolled back leading the
pthread_cond object in an inconsistent state.
Glibc used to prevent it by always aborting a transaction before issuing
a syscall. Linux 4.2 also decided to abort active transactions in
syscalls, which makes the glibc workaround superfluous. Worse, the
glibc transaction abort leads to a performance issue on recent kernels
where the HTM state is saved/restored lazily (v4.9). By aborting a
transaction on every syscall, regardless of whether a transaction has
been initiated before, glibc makes the kernel always save/restore HTM state
(it can not even lazily disable it after a certain number of syscall
iterations).
Because of this shortcoming, Transactional Lock Elision is just enabled
when it has been explicitly set (either by tunables of by a configure
switch) and if kernel aborts HTM transactions on syscalls
(PPC_FEATURE2_HTM_NOSC). It is reported that using simple benchmark [1],
the context-switch is about 5% faster by not issuing a tabort in every
syscall in newer kernels.
Checked on powerpc64le-linux-gnu with 4.4.0 kernel (Ubuntu 16.04).
* NEWS: Add note about new TLE support on powerpc64le.
* sysdeps/powerpc/nptl/tcb-offsets.sym (TM_CAPABLE): Remove.
* sysdeps/powerpc/nptl/tls.h (tcbhead_t): Rename tm_capable to
__ununsed1.
(TLS_INIT_TP, TLS_DEFINE_INIT_TP): Remove tm_capable setup.
(THREAD_GET_TM_CAPABLE, THREAD_SET_TM_CAPABLE): Remove macros.
* sysdeps/powerpc/powerpc32/sysdep.h,
sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION_IMPL,
ABORT_TRANSACTION): Remove macros.
* sysdeps/powerpc/sysdep.h (ABORT_TRANSACTION): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-conf.c (elision_init): Set
__pthread_force_elision iff PPC_FEATURE2_HTM_NOSC is set.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h,
sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
sysdeps/unix/sysv/linux/powerpc/syscall.S (ABORT_TRANSACTION): Remove
usage.
* sysdeps/unix/sysv/linux/powerpc/not-errno.h: Remove file.
Jim Wilson [Sun, 13 Jan 2019 23:48:09 +0000 (15:48 -0800)]
RISC-V: Fix elfutils testsuite unwind failures.
The clone.S patch fixes 2 elfutils testsuite unwind failures, where the
backtrace gets stuck repeating __thread_start until we hit the backtrace
limit. This was confirmed by building and installing a patched glibc and
then building elfutils and running its testsuite.
Unfortunately, the testcase isn't working as expected and I don't know why.
The testcase passes even when my clone.S patch is not installed. The testcase
looks logically similarly to the elfutils testcases that are failing. Maybe
there is a subtle difference in how the glibc unwinding works versus the
elfutils unwinding? I don't have good gdb pthread support yet, so I haven't
found a way to debug this. Anyways, I don't know if the testcase is useful or
not. If the testcase isn't useful then maybe the clone.S patch is OK without
a testcase?
Jim
[BZ #24040]
* elf/Makefile (CFLAGS-tst-unwind-main.c): Add -DUSE_PTHREADS=0.
* elf/tst-unwind-main.c: If USE_PTHREADS, include pthread.h and error.h
(func): New.
(main): If USE_PTHREADS, call pthread_create to run func. Otherwise
call func directly.
* nptl/Makefile (tests): Add tst-unwind-thread.
(CFLAGS-tst-unwind-thread.c): Define.
* nptl/tst-unwind-thread.c: New file.
* sysdeps/unix/sysv/linux/riscv/clone.S (__thread_start): Mark ra
as undefined.
For a full analysis of both the pthread_rwlock_tryrdlock() stall
and the pthread_rwlock_trywrlock() stall see:
https://sourceware.org/bugzilla/show_bug.cgi?id=23844#c14
In the pthread_rwlock_tryrdlock() function we fail to inspect for
PTHREAD_RWLOCK_FUTEX_USED in __wrphase_futex and wake the waiting
readers.
In the pthread_rwlock_trywrlock() function we write 1 to
__wrphase_futex and lose the setting of the PTHREAD_RWLOCK_FUTEX_USED
bit, again failing to wake waiting readers during unlock.
The fix in the case of pthread_rwlock_tryrdlock() is to check for
PTHREAD_RWLOCK_FUTEX_USED and wake the readers.
The fix in the case of pthread_rwlock_trywrlock() is to only write
1 to __wrphase_futex if we installed the write phase, since all other
readers would be spinning waiting for this step.
We add two new tests, one exercises the stall for
pthread_rwlock_trywrlock() which is easy to exercise, and one exercises
the stall for pthread_rwlock_tryrdlock() which is harder to exercise.
The pthread_rwlock_trywrlock() test fails consistently without the fix,
and passes after. The pthread_rwlock_tryrdlock() test fails roughly
5-10% of the time without the fix, and passes all the time after.
Florian Weimer [Fri, 8 Feb 2019 11:55:21 +0000 (12:55 +0100)]
nptl: Avoid fork handler lock for async-signal-safe fork [BZ #24161]
Commit 27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork
handlers") introduced a lock, atfork_lock, around fork handler list
accesses. It turns out that this lock occasionally results in
self-deadlocks in malloc/tst-mallocfork2:
(gdb) bt
#0 __lll_lock_wait_private ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:63
#1 0x00007f160c6f927a in __run_fork_handlers (who=(unknown: 209394016),
who@entry=atfork_run_prepare) at register-atfork.c:116
#2 0x00007f160c6b7897 in __libc_fork () at ../sysdeps/nptl/fork.c:58
#3 0x00000000004027d6 in sigusr1_handler (signo=<optimized out>)
at tst-mallocfork2.c:80
#4 sigusr1_handler (signo=<optimized out>) at tst-mallocfork2.c:64
#5 <signal handler called>
#6 0x00007f160c6f92e4 in __run_fork_handlers (who=who@entry=atfork_run_parent)
at register-atfork.c:136
#7 0x00007f160c6b79a2 in __libc_fork () at ../sysdeps/nptl/fork.c:152
#8 0x0000000000402567 in do_test () at tst-mallocfork2.c:156
#9 0x0000000000402dd2 in support_test_main (argc=1, argv=0x7ffc81ef1ab0,
config=config@entry=0x7ffc81ef1970) at support_test_main.c:350
#10 0x0000000000402362 in main (argc=<optimized out>, argv=<optimized out>)
at ../support/test-driver.c:168
If no locking happens in the single-threaded case (where fork is
expected to be async-signal-safe), this deadlock is avoided.
(pthread_atfork is not required to be async-signal-safe, so a fork
call from a signal handler interrupting pthread_atfork is not
a problem.)
Stefan Liebler [Thu, 7 Feb 2019 14:18:36 +0000 (15:18 +0100)]
Add compiler barriers around modifications of the robust mutex list for pthread_mutex_trylock. [BZ #24180]
While debugging a kernel warning, Thomas Gleixner, Sebastian Sewior and
Heiko Carstens found a bug in pthread_mutex_trylock due to misordered
instructions:
140: a5 1b 00 01 oill %r1,1
144: e5 48 a0 f0 00 00 mvghi 240(%r10),0 <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
14a: e3 10 a0 e0 00 24 stg %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI
Please have a look at the discussion:
"Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede"
(https://lore.kernel.org/lkml/20190202112006.GB3381@osiris/)
This patch is introducing the same compiler barriers and comments
for pthread_mutex_trylock as introduced for pthread_mutex_lock and
pthread_mutex_timedlock by commit 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
"Add compiler barriers around modifications of the robust mutex list."
ChangeLog:
[BZ #24180]
* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock):
Add compiler barriers and comments.
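The shape of the fix, as a hedged reconstruction (the real code operates
on glibc's robust mutex list in pthread_mutex_trylock.c):

  /* Enqueue first, then clear the pending marker; the empty asm is a
     compiler barrier that stops the two stores being reordered, which
     is exactly the misordering visible in the disassembly above.  */
  struct mutex_stub { struct mutex_stub *next; };
  static __thread struct mutex_stub *list_op_pending;

  static void
  enqueue_then_clear_pending (struct mutex_stub *m, struct mutex_stub **head)
  {
    m->next = *head;              /* stands in for ENQUEUE_MUTEX_PI */
    *head = m;
    __asm__ ("" ::: "memory");    /* compiler barrier */
    list_op_pending = NULL;       /* ... THREAD_SETMEM (..., NULL) */
  }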
Florian Weimer [Mon, 4 Feb 2019 19:07:18 +0000 (20:07 +0100)]
nscd: Do not use __inet_aton_exact@GLIBC_PRIVATE [BZ #20018]
This commit avoids referencing the __inet_aton_exact@GLIBC_PRIVATE
symbol from nscd. In master, the separately-compiled getaddrinfo
implementation in nscd needs it, however such an internal ABI change
is not desirable on a release branch if it can be avoided.
The IPv4 address parser in the getaddrinfo function is changed so that
it does not ignore trailing whitespace and all characters after it.
For backwards compatibility, the getaddrinfo function still recognizes
legacy name syntax, such as 192.000.002.010 interpreted as 192.0.2.8
(octal).
This commit does not change the behavior of inet_addr and inet_aton.
gethostbyname already had additional sanity checks (but is switched
over to the new __inet_aton_exact function for completeness as well).
To avoid sending the problematic query names over DNS, commit 6ca53a2453598804a2559a548a08424fca96434a ("resolv: Do not send queries
for non-host-names in nss_dns [BZ #24112]") is needed.
Florian Weimer [Mon, 21 Jan 2019 08:26:41 +0000 (09:26 +0100)]
resolv: Do not send queries for non-host-names in nss_dns [BZ #24112]
Before this commit, nss_dns would send a query which did not contain a
host name as the query name (such as invalid\032name.example.com) and
then reject the answer in getanswer_r and gaih_getanswer_slice, using
a check based on res_hnok. With this commit, no query is sent, and a
host-not-found error is returned to NSS without network interaction.
H.J. Lu [Mon, 4 Feb 2019 16:55:52 +0000 (08:55 -0800)]
x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155]
Since the size argument is unsigned, we should use unsigned Jcc
instructions, instead of signed, to check size.
Tested on x86-64 and x32, with and without --disable-multi-arch.
[BZ #24155]
CVE-2019-7309
* NEWS: Updated for CVE-2019-7309.
* sysdeps/x86_64/memcmp.S: Use RDX_LP for size. Clear the
upper 32 bits of RDX register for x32. Use unsigned Jcc
instructions, instead of signed.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2.
* sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.
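A short illustration of why the signedness matters for a size check
(illustrative C on a two's-complement target; the actual fix is in the
assembly's choice of Jcc):

  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    size_t n = SIZE_MAX;   /* a huge size, e.g. from a bogus caller */
    /* Signed compare thinks n is negative and "small": wrong path.  */
    printf ("signed:   %d\n", (intptr_t) n < 64);   /* prints 1 */
    /* Unsigned compare correctly sees a huge size.  */
    printf ("unsigned: %d\n", n < 64);              /* prints 0 */
    return 0;
  }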
H.J. Lu [Fri, 1 Feb 2019 20:24:08 +0000 (12:24 -0800)]
x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ #24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.
This patch fixes strnlen/wcsnlen for x32. Tested on x86-64 and x32. On
x86-64, libc.so is the same with and without the fix.
[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length.
Clear the upper 32 bits of RSI register.
* sysdeps/x86_64/strlen.S: Use RSI_LP for length.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen
and tst-size_t-wcsnlen.
* sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.
H.J. Lu [Fri, 1 Feb 2019 20:23:23 +0000 (12:23 -0800)]
x86-64 strncpy: Properly handle the length parameter [BZ #24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.
This patch fixes strncpy for x32. Tested on x86-64 and x32. On x86-64,
libc.so is the same with and without the fix.
[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Use RDX_LP
for length.
* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy.
* sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.