git.ipfire.org Git - thirdparty/glibc.git/log

aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798]

The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)

In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.

Fixes bug 26798.

(cherry picked from commit 558251bd8785760ad40fcbfeaaee5d27fa5b0fe4)

aarch64: add HWCAP_ATOMICS to HWCAP_IMPORTANT

This enables searching shared libraries in atomics/ when the hardware
supports LSE atomics of armv8.1 so one can provide optimized variants
of libraries in a portable way.

LSE atomics does not affect library abi, the new instructions can
interoperate with old ones.

I considered the earlier comments on the patch

https://sourceware.org/ml/libc-alpha/2018-04/msg00400.html
https://sourceware.org/ml/libc-alpha/2018-04/msg00625.html

It turns out that the way glibc dynamic linker decides on the search
path is not very flexible: it wants to use hwcap bits and associated
strings.  So some targets reuse hwcap bits for glibc internal purposes
to affect the search logic.  But hwcap is an interface with the kernel,
glibc should not allocate bits in it for its internal logic as that
limits future hwcap extensions and confusing to users who expect to see
hwcap bits in ifunc resolvers.  Instead of rewriting the dynamic linker
path logic (which affects all targets) this patch just uses the existing
mechanism, however this means that the path name has to be the hwcap
name "atomics" and cannot be changed to something more meaningful to
users.

It is hard to tell how much performance benefit this can give, in
principle armv8.1 atomics can be better optimized in the hardware, so it
can make a difference for synchronization heavy code.  On some systems
such multilib setup may be the only viable way to get optimized
libraries used.

* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add
HWCAP_ATOMICS.

(cherry picked from commit 397c54c1afa531242602fe3ac7bb47eff0e909f9)

aarch64: Remove HWCAP_CPUID from HWCAP_IMPORTANT

This partially reverts

commit f82e9672ad89ea1ef40bbe1af71478e255e87c5e
Author:     Siddhesh Poyarekar <siddhesh@sourceware.org>

    aarch64: Allow overriding HWCAP_CPUID feature check using HWCAP_MASK

The idea was to make it possible to disable cpuid based ifunc resolution
in glibc by changing the hwcap mask which the user could already control.

However the hwcap mask has an orthogonal role: it specifies additional
library search paths for the dynamic linker.  So "cpuid" got added to
the search paths when it was set in the default mask (HWCAP_IMPORTANT),
which is not useful behaviour, the hwcap masking should not be reused
in the cpu features code.

Meanwhile there is a tunable to set the cpu explicitly so it is possible
to disable the cpuid based dispatch without using a hwcap mask:

  GLIBC_TUNABLES=glibc.tune.cpu=generic

* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (init_cpu_features):
Use dl_hwcap without masking.
* sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT):
Remove HWCAP_CPUID.

(cherry picked from commit d0cd79807157e399ff58e67cb51651f90442122e)

Fix use-after-free in glob when expanding ~user (bug 25414)

The value of `end_name' points into the value of `dirname', thus don't
deallocate the latter before the last use of the former.

(cherry picked from commit ddc650e9b3dc916eab417ce9f79e67337b05035c with
changes from commit d711a00f93fa964f41a53839228598fbf1a6b482)

Fix array overflow in backtrace on PowerPC (bug 25423)

When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address. Fixes commit
d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").

(cherry picked from commit d93769405996dfc11d216ddbe415946617b5a494)

libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]

Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable
vtable validation in case of interposition [BZ #23313]") only covered
the interposable glibc 2.1 handles, in libio/stdfiles.c. The
parallel code in libio/oldstdfiles.c needs similar detection logic.

Fixes (again) commit db3476aff19b75c4fdefbe65fcd5f0a90588ba51
("libio: Implement vtable verification [BZ #20191]").

Change-Id: Ief6f9f17e91d1f7263421c56a7dc018f4f595c21
(cherry picked from commit cb61630ed712d033f54295f776967532d3f4b46a)

rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]

The problem was introduced in glibc 2.23, in commit
b9eb92ab05204df772eb4929eccd018637c9f3e9
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").

(cherry picked from commit d5dfad4326fc683c813df1e37bbf5cf920591c8e)

mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernels

Linux/Mips kernels prior to 4.8 could potentially crash the user
process when doing FPU emulation while running on non-executable
user stack.

Currently, gcc doesn't emit .note.GNU-stack for mips, but that will
change in the future. To ensure that glibc can be used with such
future gcc, without silently resulting in binaries that might crash
in runtime, this patch forces RWX stack for all built objects if
configured to run against minimum kernel version less than 4.8.

* sysdeps/unix/sysv/linux/mips/Makefile
(test-xfail-check-execstack):
Move under mips-has-gnustack != yes.
(CFLAGS-.o*, ASFLAGS-.o*): New rules.
Apply -Wa,-execstack if mips-force-execstack == yes.
* sysdeps/unix/sysv/linux/mips/configure: Regenerated.
* sysdeps/unix/sysv/linux/mips/configure.ac
(mips-force-execstack): New var.
Set to yes for hard-float builds with minimum_kernel < 4.8.0
or minimum_kernel not set at all.
(mips-has-gnustack): New var.
Use value of libc_cv_as_noexecstack
if mips-force-execstack != yes, otherwise set to no.

(cherry picked from commit 33bc9efd91de1b14354291fc8ebd5bce96379f12)

Improve performance of memmem

This patch significantly improves performance of memmem using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.

Small needles up to size 2 use a dedicated linear search. Very long needles
use the Two-Way algorithm (to avoid increasing stack size or slowing down
the common case, inlining is disabled).

The performance gain is 6.6 times on English text on AArch64 using random
needles with average size 8.

Tested against GLIBC testsuite and randomized tests.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/memmem.c (__memmem): Rewrite to improve performance.

(cherry picked from commit 680942b0167715e123d934b609060cd382f8e39f)

Improve performance of strstr

This patch significantly improves performance of strstr using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.

Small needles up to size 3 use a dedicated linear search. Very long needles
use the Two-Way algorithm.

The performance gain using the improved bench-strstr on Cortex-A72 is 5.8
times basic_strstr and 3.7 times twoway_strstr.

Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/str-two-way.h (two_way_short_needle): Add inline to avoid
warning.
(two_way_long_needle): Block inlining.
* string/strstr.c (strstr2): Add new function.
(strstr3): Likewise.
(STRSTR): Completely rewrite strstr to improve performance.

(cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)

Fix strstr bug with huge needles (bug 23637)

The generic strstr in GLIBC 2.28 fails to match huge needles.  The optimized
AVAILABLE macro reads ahead a large fixed amount to reduce the overhead of
repeatedly checking for the end of the string.  However if the needle length
is larger than this, two_way_long_needle may confuse this as meaning the end
of the string and return NULL.  This is fixed by adding the needle length to
the amount to read ahead.

[BZ #23637]
* string/test-strstr.c (pr23637): New function.
(test_main): Add tests with longer needles.
* string/strcasestr.c (AVAILABLE): Fix readahead distance.
* string/strstr.c (AVAILABLE): Likewise.

(cherry picked from commit 83a552b0bb9fc2a5e80a0ab3723c0a80ce1db9f2)

Speedup first memmem match

As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp
can be used after memchr to avoid the initialization overhead of the
two-way algorithm for the first match. This has shown improvement
>40% for first match.

(cherry picked from commit c8dd67e7c958de04c3783cbea7c384431707b5f8)

Simplify and speedup strstr/strcasestr first match

Looking at the benchtests, both strstr and strcasestr spend a lot of time
in a slow initialization loop handling one character per iteration.
This can be simplified and use the much faster strlen/strnlen/strchr/memcmp.
Read ahead a few cachelines to reduce the number of strnlen calls, which
improves performance by ~3-4%. This patch improves the time taken for the
full strstr benchtest by >40%.

* string/strcasestr.c (STRCASESTR): Simplify and speedup first match.
* string/strstr.c (AVAILABLE): Likewise.

(cherry picked from commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5)

Improve strstr performance

Improve strstr performance. Strstr tends to be slow because it uses
many calls to memchr and a slow byte loop to scan for the next match.
Performance is significantly improved by using strnlen on larger blocks
and using strchr to search for the next matching character. strcasestr
can also use strnlen to scan ahead, and memmem can use memchr to check
for the next match.

On the GLIBC bench tests the performance gains on Cortex-A72 are:
strstr: +25%
strcasestr: +4.3%
memmem: +18%

On a 256KB dataset strstr performance improves by 67%, strcasestr by 47%.

Reviewd-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3ae725dfb6d7f61447d27d00ed83e573bd5454f4)

[AArch64] Add ifunc support for Ares

Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares
supports 2 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename this to
__memcpy_simd or similar).

* manual/tunables.texi (glibc.cpu.name): Add ares tunable.
* sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
__memcpy_falkor for ares.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
Add new define.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add ares cpu.

(cherry picked from commit 02f440c1ef5d5d79552a524065aa3e2fabe469b9)

aarch64,falkor: Use vector registers for memcpy

Vector registers perform better than scalar register pairs for copying
data so prefer them instead. This results in a time reduction of over
50% (i.e. 2x speed improvemnet) for some smaller sizes for memcpy-walk.
Larger sizes show improvements of around 1% to 2%. memcpy-random shows
a very small improvement, in the range of 1-2%.

* sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor):
Use vector registers.

(cherry picked from commit 0aec4c1d1801e8016ebe89281d16597e0557b8be)

aarch64,falkor: Ignore prefetcher tagging for smaller copies

For smaller and medium sized copies, the effect of hardware
prefetching are not as dominant as instruction level parallelism.
Hence it makes more sense to load data into multiple registers than to
try and route them to the same prefetch unit. This is also the case
for the loop exit where we are unable to latch on to the same prefetch
unit anyway so it makes more sense to have data loaded in parallel.

The performance results are a bit mixed with memcpy-random, with
numbers jumping between -1% and +3%, i.e. the numbers don't seem
repeatable. memcpy-walk sees a 70% improvement (i.e. > 2x) for 128
bytes and that improvement reduces down as the impact of the tail copy
decreases in comparison to the loop.

* sysdeps/aarch64/multiarch/memcpy_falkor.S (__memcpy_falkor):
Use multiple registers to copy data in loop tail.

(cherry picked from commit db725a458e1cb0e17204daa543744faf08bb2e06)

aarch64/strncmp: Use lsr instead of mov+lsr

A lsr can do what the mov and lsr did.

(cherry picked from commit b47c3e7637efb77818cbef55dcd0ed1f0ea0ddf1)

aarch64/strncmp: Unbreak builds with old binutils

Binutils 2.26.* and older do not support moves with shifted registers,
so use a separate shift instruction instead.

(cherry picked from commit d46f84de745db8f3f06a37048261f4e5ceacf0a3)

aarch64: Improve strncmp for mutually misaligned inputs

The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark is as follows:

falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)

All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.

* sysdeps/aarch64/strncmp.S (count): New macro.
(strncmp): Store misaligned length in SRC1 in COUNT.
(mutual_align): Adjust.
(misaligned8): Load dword at a time when it is safe.

(cherry picked from commit 7108f1f944792ac68332967015d5e6418c5ccc88)

aarch64/strcmp: fix misaligned loop jump target

I accidentally set the loop jump back label as misaligned8 instead of
do_misaligned. The typo is harmless but it's always nice to not have
to unnecessarily execute those two instructions.

* sysdeps/aarch64/strcmp.S (do_misaligned): Jump back to
do_misaligned, not misaligned8.

(cherry picked from commit 6ca24c43481e2c93a6eec362b04c3e77a35b28e3)

aarch64: Improve strcmp unaligned performance

Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place. For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.

This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp.

* sysdeps/aarch64/strcmp.S (misaligned8): Compare dword at a
time whenever possible.

(cherry picked from commit 2bce01ebbaf8db52ba4a5635eb5744f989cdbf69)

aarch64: Fix branch target to loop16

I goofed up when changing the loop8 name to loop16 and missed on out
the branch instance. Fixed and actually build tested this time.

* sysdeps/aarch64/memcmp.S (more16): Fix branch target loop16.

(cherry picked from commit 4e54d918630ea53e29dd70d3bdffcb00d29ed3d4)

aarch64: Optimized memcmp for medium to large sizes

This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.

* sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
time.

(cherry picked from commit 30a81dae5b752f8aa5f96e7f7c341ec57cba3585)

aarch64: Use the L() macro for labels in memcmp

The L() macro makes the assembly a bit more readable.

* sysdeps/aarch64/memcmp.S: Use L() macro for labels.

(cherry picked from commit 84c94d2fd90d84ae7e67657ee8e22c2d1b796f63)

[AArch64] Optimized memcmp.

This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

* sysdeps/aarch64/memcmp.S (memcmp):
Rewrite of optimized memcmp.

(cherry picked from commit 922369032c604b4dcfd535e1bcddd4687e7126a5)

posix: Fix large mmap64 offset for mips64n32 (BZ#24699)

The fix for BZ#21270 (commit 158d5fa0e19) added a mask to avoid offset larger
than 1^44 to be used along __NR_mmap2. However mips64n32 users __NR_mmap,
as mips64n64, but still defines off_t as old non-LFS type (other ILP32, such
x32, defines off_t being equal to off64_t). This leads to use the same
mask meant only for __NR_mmap2 call for __NR_mmap, thus limiting the maximum
offset it can use with mmap64.

This patch fixes by setting the high mask only for __NR_mmap2 usage. The
posix/tst-mmap-offset.c already tests it and also fails for mips64n32. The
patch also change the test to check for an arch-specific header that defines
the maximum supported offset.

Checked on x86_64-linux-gnu, i686-linux-gnu, and I also tests tst-mmap-offset
on qemu simulated mips64 with kernel 3.2.0 kernel for both mips-linux-gnu and
mips64-n32-linux-gnu.

[BZ #24699]
* posix/tst-mmap-offset.c: Mention BZ #24699.
(do_test_bz21270): Rename to do_test_large_offset and use
mmap64_maximum_offset to check for maximum expected offset value.
* sysdeps/generic/mmap_info.h: New file.
* sysdeps/unix/sysv/linux/mips/mmap_info.h: Likewise.
* sysdeps/unix/sysv/linux/mmap64.c (MMAP_OFF_HIGH_MASK): Define iff
__NR_mmap2 is used.

(cherry picked from commit a008c76b56e4f958cf5a0d6f67d29fade89421b7)

aarch64: handle STO_AARCH64_VARIANT_PCS

Backport of commit 82bc69c012838a381c4167c156a06f4598f34227
and commit 30ba0375464f34e4bf8129f3d3dc14d0c09add17
without using DT_AARCH64_VARIANT_PCS for optimizing the symbol table check.
This is needed so the internal abi between ld.so and libc.so is unchanged.

Avoid lazy binding of symbols that may follow a variant PCS with different
register usage convention from the base PCS.

Currently the lazy binding entry code does not preserve all the registers
required for AdvSIMD and SVE vector calls. Saving and restoring all
registers unconditionally may break existing binaries, even if they never
use vector calls, because of the larger stack requirement for lazy
resolution, which can be significant on an SVE system.

The solution is to mark all symbols in the symbol table that may follow
a variant PCS so the dynamic linker can handle them specially. In this
patch such symbols are always resolved at load time, not lazily.

So currently LD_AUDIT for variant PCS symbols are not supported, for that
the _dl_runtime_profile entry needs to be changed e.g. to unconditionally
save/restore all registers (but pass down arg and retval registers to
pltentry/exit callbacks according to the base PCS).

This patch also removes a __builtin_expect from the modified code because
the branch prediction hint did not seem useful.

* sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Check
STO_AARCH64_VARIANT_PCS and bind such symbols at load time.

aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS

STO_AARCH64_VARIANT_PCS is a non-visibility st_other flag for marking
symbols that reference functions that may follow a variant PCS with
different register usage convention from the base PCS.

DT_AARCH64_VARIANT_PCS is a dynamic tag that marks ELF modules that
have R_*_JUMP_SLOT relocations for symbols marked with
STO_AARCH64_VARIANT_PCS (i.e. have variant PCS calls via a PLT).

* elf/elf.h (STO_AARCH64_VARIANT_PCS): Define.
(DT_AARCH64_VARIANT_PCS): Define.

Fix tcache count maximum (BZ #24531)

The tcache counts[] array is a char, which has a very small range and thus
may overflow. When setting tcache_count tunable, there is no overflow check.
However the tunable must not be larger than the maximum value of the tcache
counts[] array, otherwise it can overflow when filling the tcache.

[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.

(cherry picked from commit 5ad533e8e65092be962e414e0417112c65d154fb)

Fix crash in _IO_wfile_sync (bug 20568)

When computing the length of the converted part of the stdio buffer, use
the number of consumed wide characters, not the (negative) distance to the
end of the wide buffer.

(cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)

Add compiler barriers around modifications of the robust mutex list for pthread_mutex_trylock. [BZ #24180]

While debugging a kernel warning, Thomas Gleixner, Sebastian Sewior and
Heiko Carstens found a bug in pthread_mutex_trylock due to misordered
instructions:
140:   a5 1b 00 01             oill    %r1,1
144:   e5 48 a0 f0 00 00       mvghi   240(%r10),0   <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
14a:   e3 10 a0 e0 00 24       stg     %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI

vs (with compiler barriers):
140:   a5 1b 00 01             oill    %r1,1
144:   e3 10 a0 e0 00 24       stg     %r1,224(%r10)
14a:   e5 48 a0 f0 00 00       mvghi   240(%r10),0

Please have a look at the discussion:
"Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede"
(https://lore.kernel.org/lkml/20190202112006.GB3381@osiris/)

This patch is introducing the same compiler barriers and comments
for pthread_mutex_trylock as introduced for pthread_mutex_lock and
pthread_mutex_timedlock by commit 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
"Add compiler barriers around modifications of the robust mutex list."

ChangeLog:

[BZ #24180]
* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock):
Add compiler barriers and comments.

(cherry picked from commit 823624bdc47f1f80109c9c52dee7939b9386d708)

x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155]

Since the size argument is unsigned. we should use unsigned Jcc
instructions, instead of signed, to check size.

Tested on x86-64 and x32, with and without --disable-multi-arch.

[BZ #24155]
CVE-2019-7309
* NEWS: Updated for CVE-2019-7309.
* sysdeps/x86_64/memcmp.S: Use RDX_LP for size. Clear the
upper 32 bits of RDX register for x32. Use unsigned Jcc
instructions, instead of signed.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2.
* sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.

(cherry picked from commit 3f635fb43389b54f682fc9ed2acc0b2aaf4a923d)

x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes strnlen/wcsnlen for x32. Tested on x86-64 and x32. On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length.
Clear the upper 32 bits of RSI register.
* sysdeps/x86_64/strlen.S: Use RSI_LP for length.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen
and tst-size_t-wcsnlen.
* sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.

(cherry picked from commit 5165de69c0908e28a380cbd4bb054e55ea4abc95)

x86-64 strncpy: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes strncpy for x32. Tested on x86-64 and x32. On x86-64,
libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Use RDX_LP
for length.
* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy.
* sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.

(cherry picked from commit c7c54f65b080affb87a1513dee449c8ad6143c8b)

x86-64 strncmp family: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes the strncmp family for x32. Tested on x86-64 and x32.
On x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/strcmp-sse42.S: Use RDX_LP for length.
* sysdeps/x86_64/strcmp.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp,
tst-size_t-strncmp and tst-size_t-wcsncmp.
* sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file.
* sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise.
* sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise.

(cherry picked from commit ee915088a0231cd421054dbd8abab7aadf331153)

x86-64 memset/wmemset: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memset/wmemset for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use
RDX_LP for length.  Clear the upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset.
* sysdeps/x86_64/x32/tst-size_t-memset.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise.

(cherry picked from commit 82d0b4a4d76db554eb6757acb790fcea30b19965)

x86-64 memrchr: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memrchr for x32. Tested on x86-64 and x32. On x86-64,
libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/memrchr.S: Use RDX_LP for length.
* sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr.
* sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file.

(cherry picked from commit ecd8b842cf37ea112e59cd9085ff1f1b6e208ae0)

x86-64 memcpy: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memcpy for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Use RDX_LP for
length.  Clear the upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcpy.
tst-size_t-wmemchr.
* sysdeps/x86_64/x32/tst-size_t-memcpy.c: New file.

(cherry picked from commit 231c56760c1e2ded21ad96bbb860b1f08c556c7a)

x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memcmp/wmemcmp for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: Use RDX_LP for
length.  Clear the upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp and
tst-size_t-wmemcmp.
* sysdeps/x86_64/x32/tst-size_t-memcmp.c: New file.
* sysdeps/x86_64/x32/tst-size_t-wmemcmp.c: Likewise.

(cherry picked from commit b304fc201d2f6baf52ea790df8643e99772243cd)

x86-64 memchr/wmemchr: Properly handle the length parameter [BZ #24097]

On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memchr/wmemchr for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/memchr.S: Use RDX_LP for length.  Clear the
upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and
tst-size_t-wmemchr.
* sysdeps/x86_64/x32/test-size_t.h: New file.
* sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise.
* sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise.

(cherry picked from commit 97700a34f36721b11a754cf37a1cc40695ece1fd)

powerpc: Regenerate ULPs

On POWER9, cbrtf128 fails by 1 ULP.

* sysdeps/powerpc/fpu/libm-test-ulps: Regenerate.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
(cherry picked from commit 428fc49eaafe0fe5352445fcf23d9f603e9083a2)

[BZ #21745] powerpc: build some IFUNC math functions for libc and libm

Some math functions have to be distributed in libc because they're
required by printf.
libc and libm require their own builds of these functions, e.g. libc
functions have to call __stack_chk_fail_local in order to bypass the
PLT, while libm functions have to call __stack_chk_fail.

While math/Makefile treat the generic cases, i.e. s_isinff, the
multiarch Makefile has to treat its own files, i.e. s_isinff-ppc64.

[BZ #21745]
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile:
[$(subdir) = math] (sysdep_calls): New variable. Has the
previous contents of sysdep_routines, but re-sorted..
[$(subdir) = math] (sysdep_routines): Re-use the contents from
sysdep_calls.
[$(subdir) = math] (libm-sysdep_routines): Remove the functions
defined in sysdep_calls and replace by the respective m_* names.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S:
(compat_symbol): Undefine to avoid duplicated compat symbols in
libc.

(cherry picked from commit 61c45f250528dae431391823a9766053e61ccde1)

intl: Do not return NULL on asprintf failure in gettext [BZ #24018]

Fixes commit 9695dd0c9309712ed8e9c17a7040fe7af347f2dc ("DCIGETTEXT:
Use getcwd, asprintf to construct absolute pathname").

(cherry picked from commit 8c1aafc1f34d090a5b41dc527c33e8687f6a1287)

malloc: Always call memcpy in _int_realloc [BZ #24027]

This commit removes the custom memcpy implementation from _int_realloc
for small chunk sizes. The ncopies variable has the wrong type, and
an integer wraparound could cause the existing code to copy too few
elements (leaving the new memory region mostly uninitialized).
Therefore, removing this code fixes bug 24027.

(cherry picked from commit b50dd3bc8cbb1efe85399b03d7e6c0310c2ead84)

CVE-2018-19591: if_nametoindex: Fix descriptor for overlong name [BZ #23927]

(cherry picked from commit d527c860f5a3f0ed687bd03f0cb464612dc23408)

Add an additional test to resolv/tst-resolv-network.c

Test for the infinite loop in getnetbyname, bug #17630.

(cherry picked from commit ac8060265bcaca61568ef3a20b9a0140a270af54)

libanl: properly cleanup if first helper thread creation failed (bug 22927)

(cherry picked from commit bd3b0fbae33a9a4cc5e2daf049443d5cf03d4251)

x86: Fix Haswell CPU string flags (BZ#23709)

Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
default flags for Haswell models. Previously, new models were handled by the
default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
Prefer_PMINUB_for_stringop (only the TSX one).

This patch fixes it by disentangle the TSX flag handling from the memory
optimization ones. The strstr case cited on patch now selects the
__strstr_sse2_unaligned as expected for the Haswell cpu.

Checked on x86_64-linux-gnu.

[BZ #23709]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
independently of other flags.

(cherry picked from commit c3d8dc45c9df199b8334599a6cbd98c9950dba62)

Disable -Wrestrict for two nptl/tst-attr3.c tests.

nptl/tst-attr3 fails to build with GCC mainline because of
(deliberate) aliasing between the second (attributes) and fourth
(argument to thread start routine) arguments to pthread_create.

Although both those arguments are restrict-qualified in POSIX,
pthread_create does not actually dereference its fourth argument; it's
an opaque pointer passed to the thread start routine.  Thus, the
aliasing is actually valid in this case, and it's deliberate in the
test.  So this patch makes the test disable -Wrestrict for the two
pthread_create calls in question.  (-Wrestrict was added in GCC 7,
hence the __GNUC_PREREQ conditions, but the particular warning in
question is new in GCC 8.)

Tested compilation with build-many-glibcs.py for aarch64-linux-gnu.

* nptl/tst-attr3.c: Include <libc-diag.h>.
(do_test) [__GNUC_PREREQ (7, 0)]: Ignore -Wrestrict for two tests.

(cherry picked from commit 40c4162df6766fb1e8ede875ca8df25d8075d3a5)

Fix string/bug-strncat1.c build with GCC 8.

GCC 8 warns about strncat calls with truncated output.
string/bug-strncat1.c tests such a call; this patch disables the
warning for it.

Tested (compilation) with GCC 8 for x86_64-linux-gnu with
build-many-glibcs.py (in conjunction with Martin's patch to allow
glibc to build).

* string/bug-strncat1.c: Include <libc-diag.h>.
(main): Disable -Wstringop-truncation for strncat call for GCC 8.

(cherry picked from commit ec72135e5f1d061cb5cf7cd1b855fd6290be10d9)

Ignore -Wrestrict for one strncat test.

With current GCC mainline, one strncat test involving a size close to
SIZE_MAX results in a -Wrestrict warning that that buffer size would
imply that the two buffers must overlap. This patch fixes the build
by adding disabling of -Wrestrict (for GCC versions supporting that
option) to the already-present disabling of -Wstringop-overflow= and
-Warray-bounds for this test.

Tested with build-many-glibcs.py that this restores the testsuite
build with GCC mainline for aarch64-linux-gnu.

* string/tester.c (test_strncat) [__GNUC_PREREQ (7, 0)]: Also
ignore -Wrestrict for one test.

(cherry picked from commit 35ebb6b0c48bc671d9c54e089884f9bf6fca540e)

Disable strncat test array-bounds warnings for GCC 8.

Some strncat tests fail to build with GCC 8 because of -Warray-bounds
warnings. These tests are deliberately test over-large size arguments
passed to strncat, and already disable -Wstringop-overflow warnings,
but now the warnings for these tests come under -Warray-bounds so that
option needs disabling for them as well, which this patch does (with
an update on the comments; the DIAG_IGNORE_NEEDS_COMMENT call for
-Warray-bounds doesn't need to be conditional itself, because that
option is supported by all versions of GCC that can build glibc).

Tested compilation with build-many-glibcs.py for aarch64-linux-gnu.

* string/tester.c (test_strncat): Also disable -Warray-bounds
warnings for two tests.

(cherry picked from commit 1421f39b7eadd3b5fbd2a3f2da1fc006b69fbc42)

Fix string/tester.c build with GCC 8.

GCC 8 warns about more cases of string functions truncating their
output or not copying a trailing NUL byte.

This patch fixes testsuite build failures caused by such warnings in
string/tester.c.  In general, the warnings are disabled around the
relevant calls using DIAG_* macros, since the relevant cases are being
deliberately tested.  In one case, the warning is with
-Wstringop-overflow= instead of -Wstringop-truncation; in that case,
the conditional is __GNUC_PREREQ (7, 0) (being the version where
-Wstringop-overflow= was introduced), to allow the conditional to be
removed sooner, since it's harmless to disable the warning for a
GCC version where it doesn't actually occur.  In the case of warnings
for strncpy calls in test_memcmp, the calls in question are changed to
use memcpy, as they don't copy a trailing NUL and the point of that
code is to test memcmp rather than strncpy.

Tested (compilation) with GCC 8 for x86_64-linux-gnu with
build-many-glibcs.py (in conjunction with Martin's patch to allow
glibc to build).

* string/tester.c (test_stpncpy): Disable -Wstringop-truncation
for stpncpy calls for GCC 8.
(test_strncat): Disable -Wstringop-truncation warning for strncat
calls for GCC 8.  Disable -Wstringop-overflow= warning for one
strncat call for GCC 7.
(test_strncpy): Disable -Wstringop-truncation warning for strncpy
calls for GCC 8.
(test_memcmp): Use memcpy instead of strncpy for calls not copying
trailing NUL.

(cherry picked from commit 2e64ec9c9eac3aeb70f7cfa2392846c87c28068e)

Fix nscd readlink argument aliasing (bug 22446).

Current GCC mainline detects that nscd calls readlink with the same
buffer for both input and output, which is not valid (those arguments
are both restrict-qualified in POSIX). This patch makes it use a
separate buffer for readlink's input (with a size that is sufficient
to avoid truncation, so there should be no problems with warnings
about possible truncation, though not strictly minimal, but much
smaller than the buffer for output) to avoid this problem.

Tested compilation for aarch64-linux-gnu with build-many-glibcs.py.

[BZ #22446]
* nscd/connections.c (handle_request) [SO_PEERCRED]: Use separate
buffers for readlink input and output.

(cherry picked from commit 49b036bce9f021ae994a85aee8b410d20b29c8b7)

Increase buffer size due to warning from ToT GCC

* nscd/dbg_log.c (dbg_log): Increase msg buffer size.

(cherry picked from commit a7e3edf4f252fb72afeb8ecca946a2d8294bb577)

Fix p_secstodate overflow handling (bug 22463).

The resolv/res_debug.c function p_secstodate (which is a public
function exported from libresolv, taking an unsigned long argument)
does:

        struct tm timebuf;
        time = __gmtime_r(&clock, &timebuf);
        time->tm_year += 1900;
        time->tm_mon += 1;
        sprintf(output, "%04d%02d%02d%02d%02d%02d",
                time->tm_year, time->tm_mon, time->tm_mday,
                time->tm_hour, time->tm_min, time->tm_sec);

If __gmtime_r returns NULL (because the year overflows the range of
int), this will dereference a null pointer.  Otherwise, if the
computed year does not fit in four characters, this will cause a
buffer overrun of the fixed-size 15-byte buffer.  With current GCC
mainline, there is a compilation failure because of the possible
buffer overrun.

I couldn't find a specification for how this function is meant to
behave, but Paul pointed to RFC 4034 as relevant to the cases where
this function is called from within glibc.  The function's interface
is inherently problematic when dates beyond Y2038 might be involved,
because of the ambiguity in how to interpret 32-bit timestamps as such
dates (the RFC suggests interpreting times as being within 68 years of
the present date, which would mean some kind of interface whose
behavior depends on the present date).

This patch works on the basis of making a minimal fix in preparation
for obsoleting the function.  The function is made to handle times in
the interval [0, 0x7fffffff] only, on all platforms, with <overflow>
used as the output string in other cases (and errno set to EOVERFLOW
in such cases).  This seems to be a reasonable state for the function
to be in when made a compat symbol by a future patch, being compatible
with any existing uses for existing timestamps without trying to work
for later timestamps.  Results independent of the range of time_t also
simplify the testcase.

I couldn't persuade GCC to recognize the ranges of the struct tm
fields by adding explicit range checks with a call to
__builtin_unreachable if outside the range (this looks similar to
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80776>), so having added
a range check on the input, this patch then disables the
-Wformat-overflow= warning for the sprintf call (I prefer that to the
use of strftime, as being more transparently correct without knowing
what each of %m and %M etc. is).

I do not know why this build failure should be new with mainline GCC
(that is, I don't know what GCC change might have introduced it, when
the basic functionality for such warnings was already in GCC 7).

I do not know if this is a security issue (that is, if there are
plausible ways in which a date before -999 or after 9999 from an
untrusted source might end up in this function).  The system clock is
arguably an untrusted source (in that e.g. NTP is insecure), but
probably not to that extent (NTP can't communicate such wild
timestamps), and uses from within glibc are limited to 32-bit inputs.

Tested with build-many-glibcs.py that this restores the build for arm
with yesterday's mainline GCC.  Also tested for x86_64 and x86.

[BZ #22463]
* resolv/res_debug.c: Include <libc-diag.h>.
(p_secstodate): Assert time_t at least as wide as u_long.  On
overflow, use integer seconds since the epoch as output, or use
"<overflow>" as output and set errno to EOVERFLOW if integer
seconds since the epoch would be 14 or more characters.
(p_secstodate) [__GNUC_PREREQ (7, 0)]: Disable -Wformat-overflow=
for sprintf call.
* resolv/tst-p_secstodate.c: New file.
* resolv/Makefile (tests): Add tst-p_secstodate.
($(objpfx)tst-p_secstodate): Depend on $(objpfx)libresolv.so.

(cherry picked from commit f120cda6072d830df92656dad0c89967547b97dc)

timezone: pacify GCC -Wstringop-truncation

Problem reported by Martin Sebor in:
https://sourceware.org/ml/libc-alpha/2017-11/msg00336.html
* timezone/zic.c (writezone): Use memcpy, not strncpy.

(cherry picked from commit e69897bf202e18034cbef26f363bae64de70a196)

utmp: Avoid -Wstringop-truncation warning

The -Wstringop-truncation option new in GCC 8 detects common misuses
of the strncat and strncpy function that may result in truncating
the copied string before the terminating NUL. To avoid false positive
warnings for correct code that intentionally creates sequences of
characters that aren't guaranteed to be NUL-terminated, arrays that
are intended to store such sequences should be decorated with a new
nonstring attribute. This change add this attribute to Glibc and
uses it to suppress such false positives.

ChangeLog:
* misc/sys/cdefs.h (__attribute_nonstring__): New macro.
* sysdeps/gnu/bits/utmp.h (struct utmp): Use it.
* sysdeps/unix/sysv/linux/s390/bits/utmp.h (struct utmp): Same.

(cherry picked from commit 7532837d7b03b3ca5b9a63d77a5bd81dd23f3d9c)

Avoid use of strlen in getlogin_r (bug 22447).

Building glibc with current mainline GCC fails, among other reasons,
because of an error for use of strlen on the nonstring ut_user field.
This patch changes the problem code in getlogin_r to use __strnlen
instead. It also needs to set the trailing NUL byte of the result
explicitly, because of the case where ut_user does not have such a
trailing NUL byte (but the result should always have one).

Tested for x86_64. Also tested that, in conjunction with
<https://sourceware.org/ml/libc-alpha/2017-11/msg00797.html>, it fixes
the build for arm with mainline GCC.

[BZ #22447]
* sysdeps/unix/getlogin_r.c (__getlogin_r): Use __strnlen not
strlen to compute length of ut_user and set trailing NUL byte of
result explicitly.

(cherry picked from commit 4bae615022cb5a5da79ccda83cc6c9ba9f2d479c)

signal: Use correct type for si_band in siginfo_t [BZ #23562]

(cherry picked from commit f997b4be18f7e57d757d39e42f7715db26528aa0)

Fix misreported errno on preadv2/pwritev2 (BZ#23579)

The fallback code of Linux wrapper for preadv2/pwritev2 executes
regardless of the errno code for preadv2, instead of the case where
the syscall is not supported.

This fixes it by calling the fallback code iff errno is ENOSYS. The
patch also adds tests for both invalid file descriptor and invalid
iov_len and vector count.

The only discrepancy between preadv2 and fallback code regarding
error reporting is when an invalid flags are used. The fallback code
bails out earlier with ENOTSUP instead of EINVAL/EBADF when the syscall
is used.

Checked on x86_64-linux-gnu on a 4.4.0 and 4.15.0 kernel.

[BZ #23579]
* misc/tst-preadvwritev2-common.c (do_test_with_invalid_fd): New
test.
* misc/tst-preadvwritev2.c, misc/tst-preadvwritev64v2.c (do_test):
Call do_test_with_invalid_fd.
* sysdeps/unix/sysv/linux/preadv2.c (preadv2): Use fallback code iff
errno is ENOSYS.
* sysdeps/unix/sysv/linux/preadv64v2.c (preadv64v2): Likewise.
* sysdeps/unix/sysv/linux/pwritev2.c (pwritev2): Likewise.
* sysdeps/unix/sysv/linux/pwritev64v2.c (pwritev64v2): Likewise.

(cherry picked from commit 7a16bdbb9ff4122af0a28dc20996c95352011fdd)

preadv2/pwritev2: Handle offset == -1 [BZ #22753]

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit d4b4a00a462348750bb18544eb30853ee6ac5d10)

Fix segfault in maybe_script_execute.

If glibc is built with gcc 8 and -march=z900,
the testcase posix/tst-spawn4-compat crashes with a segfault.

In function maybe_script_execute, the new_argv array is dynamically
initialized on stack with (argc + 1) elements.
The function wants to add _PATH_BSHELL as the first argument
and writes out of bounds of new_argv.
There is an off-by-one because maybe_script_execute fails to count
the terminating NULL when sizing new_argv.

ChangeLog:

* sysdeps/unix/sysv/linux/spawni.c (maybe_script_execute):
Increment size of new_argv by one.

(cherry picked from commit 28669f86f6780a18daca264f32d66b1428c9c6f1)

pthread_cond_broadcast: Fix waiters-after-spinning case [BZ #23538]

(cherry picked from commit 99ea93ca31795469d2a1f1570f17a5c39c2eb7e2)

x86: Populate COMMON_CPUID_INDEX_80000001 for Intel CPUs [BZ #23459]

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23459]
* sysdeps/x86/cpu-features.c (get_extended_indices): New
function.
(init_cpu_features): Call get_extended_indices for both Intel
and AMD CPUs.
* sysdeps/x86/cpu-features.h (COMMON_CPUID_INDEX_80000001):
Remove "for AMD" comment.

(cherry picked from commit be525a69a6630abc83144c0a96474f2e26da7443)

x86: Correct index_cpu_LZCNT [BZ #23456]

cpu-features.h has

#define bit_cpu_LZCNT (1 << 5)
#define index_cpu_LZCNT COMMON_CPUID_INDEX_1
#define reg_LZCNT

But the LZCNT feature bit is in COMMON_CPUID_INDEX_80000001:

Initial EAX Value: 80000001H
ECX Extended Processor Signature and Feature Bits:
Bit 05: LZCNT available

index_cpu_LZCNT should be COMMON_CPUID_INDEX_80000001, not
COMMON_CPUID_INDEX_1. The VMX feature bit is in COMMON_CPUID_INDEX_1:

Initial EAX Value: 01H
Feature Information Returned in the ECX Register:
5 VMX

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23456]
* sysdeps/x86/cpu-features.h (index_cpu_LZCNT): Set to
COMMON_CPUID_INDEX_80000001.

(cherry picked from commit 65d87ade1ee6f3ac099105e3511bd09bdc24cf3f)

conform/conformtest.pl: Escape literal braces in regular expressions

This suppresses Perl warnings like these:

Unescaped left brace in regex is deprecated here (and will be fatal in
Perl 5.32), passed through in regex; marked by <-- HERE in m/^element
*({ <-- HERE ([^}]*)}|([^{ ]*)) *({([^}]*)}|([^{ ]*)) *([A-Za-z0-9_]*)
*(.*)/ at conformtest.pl line 370.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit ddb3c626b0a159de3547d901420198b626c29554)

stdio-common/tst-printf.c: Remove part under a non-free license [BZ #23363]

The license does not allow modification.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 5a357506659f9a00fcf5bc9c5d8fc676175c89a7)

libio: Add tst-vtables, tst-vtables-interposed

(cherry picked from commit 29055464a03c72762969a2e8734d0d05d4d70e58)

Some adjustments were needed for a tricky multi-inclusion issue related
to libioP.h.

Synchronize support/ infrastructure with master

This commit updates the support/ subdirectory to
commit 5c0202af4b3d588c04bcec7baf05706b21cd7416
on the master branch.

NEWS: Reorder out-of-order bugs

libio: Disable vtable validation in case of interposition [BZ #23313]

(cherry picked from commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0)

Check length of ifname before copying it into to ifreq structure.

[BZ #22442]
* sysdeps/unix/sysv/linux/if_index.c (__if_nametoindex):
Check if ifname is too long.

(cherry picked from commit 2180fee114b778515b3f560e5ff1e795282e60b0)

getifaddrs: Don't return ifa entries with NULL names [BZ #21812]

A lookup operation in map_newlink could turn into an insert because of
holes in the interface part of the map. This leads to incorrectly set
the name of the interface to NULL when the interface is not present
for the address being processed (most likely because the interface was
added between the RTM_GETLINK and RTM_GETADDR calls to the kernel).
When such changes are detected by the kernel, it'll mark the dump as
"inconsistent" by setting NLM_F_DUMP_INTR flag on the next netlink
message.

This patch checks this condition and retries the whole operation.
Hopes are that next time the interface corresponding to the address
entry is present in the list and correct name is returned.

(cherry picked from commit c1f86a33ca32e26a9d6e29fc961e5ecb5e2e5eb4)

Use _STRUCT_TIMESPEC as guard in <bits/types/struct_timespec.h> [BZ #23349]

After commit d76d3703551a362b472c866b5b6089f66f8daa8e ("Fix missing
timespec definition for sys/stat.h (BZ #21371)") in combination with
kernel UAPI changes, GCC sanitizer builds start to fail due to a
conflicting definition of struct timespec in <linux/time.h>. Use
_STRUCT_TIMESPEC as the header file inclusion guard, which is already
checked in the kernel header, to support including <linux/time.h> and
<sys/stat.h> in the same translation unit.

(cherry picked from commit c1c2848b572ea7f92b7fa81dd5b1b9ef7c69b83b)

Fix parameter type in C++ version of iseqsig (bug 23171)

The commit

  commit c85e54ac6cef0faed7b7ffc722f52523dec59bf5
  Author: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
  Date:   Fri Nov 3 10:44:36 2017 -0200

      Provide a C++ version of iseqsig (bug 22377)

mistakenly used double parameters in the long double version of iseqsig,
thus causing spurious conversions to double, as reported on bug 23171.

Tested for powerpc64le and x86_64.

(cherry picked from commit fb0e10b8eb1ebb68c57d4551f7a95118f3c54837)

libio: Avoid _allocate_buffer, _free_buffer function pointers [BZ #23236]

These unmangled function pointers reside on the heap and could
be targeted by exploit writers, effectively bypassing libio vtable
validation. Instead, we ignore these pointers and always call
malloc or free.

In theory, this is a backwards-incompatible change, but using the
global heap instead of the user-supplied callback functions should
have little application impact. (The old libstdc++ implementation
exposed this functionality via a public, undocumented constructor
in its strstreambuf class.)

(cherry picked from commit 4e8a6346cd3da2d88bbad745a1769260d36f2783)

Add NEWS entry for CVE-2018-11236

Add references to CVE-2018-11236, CVE-2017-18269

Add a test case for [BZ #23196]

[BZ #23196]
* string/test-memcpy.c (do_test1): New function.
(test_main): Call it.

(cherry picked from commit ed983107bbc62245b06b99f02e69acf36a0baa3e)

Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196)

When compiled as mempcpy, the return value is the end of the destination
buffer, thus it cannot be used to refer to the start of it.

(cherry picked from commit 9aaaab7c6e4176e61c59b0a63c6ba906d875dc0e)

Fix path length overflow in realpath [BZ #22786]

Integer addition overflow may cause stack buffer overflow
when realpath() input length is close to SSIZE_MAX.

2018-05-09 Paul Pluzhnikov <ppluzhnikov@google.com>

[BZ #22786]
* stdlib/canonicalize.c (__realpath): Fix overflow in path length
computation.
* stdlib/Makefile (test-bz22786): New test.
* stdlib/test-bz22786.c: New test.

(cherry picked from commit 5460617d1567657621107d895ee2dd83bc1f88f2)

Fix stack overflow with huge PT_NOTE segment [BZ #20419]

A PT_NOTE in a binary could be arbitratily large, so using alloca
for it may cause stack overflow. If the note is larger than
__MAX_ALLOCA_CUTOFF, use dynamically allocated memory to read it in.

2018-05-05 Paul Pluzhnikov <ppluzhnikov@google.com>

[BZ #20419]
* elf/dl-load.c (open_verify): Fix stack overflow.
* elf/Makefile (tst-big-note): New test.
* elf/tst-big-note-lib.S: New.
* elf/tst-big-note.c: New.

(cherry picked from commit 0065aaaaae51cd60210ec3a7e13dddd8e01ffe2c)

Fix blocking pthread_join. [BZ #23137]

On s390 (31bit) if glibc is build with -Os, pthread_join sometimes
blocks indefinitely. This is e.g. observable with
testcase intl/tst-gettext6.

pthread_join is calling lll_wait_tid(tid), which performs the futex-wait
syscall in a loop as long as tid != 0 (thread is alive).

On s390 (and build with -Os), tid is loaded from memory before
comparing against zero and then the tid is loaded a second time
in order to pass it to the futex-wait-syscall.
If the thread exits in between, then the futex-wait-syscall is
called with the value zero and it waits until a futex-wake occurs.
As the thread is already exited, there won't be a futex-wake.

In lll_wait_tid, the tid is stored to the local variable __tid,
which is then used as argument for the futex-wait-syscall.
But unfortunately the compiler is allowed to reload the value
from memory.

With this patch, the tid is loaded with atomic_load_acquire.
Then the compiler is not allowed to reload the value for __tid from memory.

ChangeLog:

[BZ #23137]
* sysdeps/nptl/lowlevellock.h (lll_wait_tid):
Use atomic_load_acquire to load __tid.

(cherry picked from commit 1660901840dfc9fde6c5720a32f901af6f08f00a)

Fix signed integer overflow in random_r (bug 17343).

Bug 17343 reports that stdlib/random_r.c has code with undefined
behavior because of signed integer overflow on int32_t. This patch
changes the code so that the possibly overflowing computations use
unsigned arithmetic instead.

Note that the bug report refers to "Most code" in that file. The
places changed in this patch are the only ones I found where I think
such overflow can occur.

Tested for x86_64 and x86.

[BZ #17343]
* stdlib/random_r.c (__random_r): Use unsigned arithmetic for
possibly overflowing computations.

(cherry picked from commit 8a07b0c43c46a480da070efd53a2720195e2256f)

i386: Fix i386 sigaction sa_restorer initialization (BZ#21269)

This patch fixes the i386 sa_restorer field initialization for sigaction
syscall for kernel with vDSO.  As described in bug report, i386 Linux
(and compat on x86_64) interprets SA_RESTORER clear with nonzero
sa_restorer as a request for stack switching if the SS segment is 'funny'.
This means that anything that tries to mix glibc's signal handling with
segmentation (for instance through modify_ldt syscall) is randomly broken
depending on what values lands in sa_restorer.

The testcase added  is based on Linux test tools/testing/selftests/x86/ldt_gdt.c,
more specifically in do_multicpu_tests function.  The main changes are:

  - C11 atomics instead of plain access.

  - Remove x86_64 support which simplifies the syscall handling and fallbacks.

  - Replicate only the test required to trigger the issue.

Checked on i686-linux-gnu.

[BZ #21269]
* sysdeps/unix/sysv/linux/i386/Makefile (tests): Add tst-bz21269.
* sysdeps/unix/sysv/linux/i386/sigaction.c (SET_SA_RESTORER): Clear
sa_restorer for vDSO case.
* sysdeps/unix/sysv/linux/i386/tst-bz21269.c: New file.

(cherry picked from commit 68448be208ee06e76665918b37b0a57e3e00c8b4)

[BZ #22342] Fix netgroup cache keys.

Unlike other nscd caches, the netgroup cache contains two types of
records - those for "iterate through a netgroup" (i.e. setnetgrent())
and those for "is this user in this netgroup" (i.e. innetgr()),
i.e. full and partial records. The timeout code assumes these records
have the same key for the group name, so that the collection of records
that is "this netgroup" can be expired as a unit.

However, the keys are not the same, as the in-netgroup key is generated
by nscd rather than being passed to it from elsewhere, and is generated
without the trailing NUL. All other keys have the trailing NUL, and as
noted in the linked BZ, debug statements confirm that two keys for the
same netgroup are added to the cache with two different lengths.

The result of this is that as records in the cache expire, the purge
code only cleans out one of the two types of entries, resulting in
stale, possibly incorrect, and possibly inconsistent cache data.

The patch simply includes the existing NUL in the computation for the
key length ('key' points to the char after the NUL, and 'group' to the
first char of the group, so 'key-group' includes the first char to the
NUL, inclusive).

[BZ #22342]
* nscd/netgroupcache.c (addinnetgrX): Include trailing NUL in
key value.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1c81d55fc4b07b51adf68558ba74ce975153e580)

Fix i386 memmove issue (bug 22644).

[BZ #22644]
* sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S: Fixed
branch conditions.
* string/test-memmove.c (do_test2): New testcase.

(cherry picked from commit cd66c0e584c6d692bc8347b5e72723d02b8a8ada)

Fix crash in resolver on memory allocation failure (bug 23005)

(cherry picked from commit f178e59fa5eefbbd37fde040ae8334aa5c857ee1)

getlogin_r: return early when linux sentinel value is set

When there is no login uid Linux sets /proc/self/loginid to the sentinel
value of, (uid_t) -1. If this is set we can return early and avoid
needlessly looking up the sentinel value in any configured nss
databases.

Checked on aarch64-linux-gnu.

* sysdeps/unix/sysv/linux/getlogin_r.c (__getlogin_r_loginuid): Return
early when linux sentinel value is set.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit cc8a1620eb97ccddd337d157263c13c57b39ab71)

resolv: Fully initialize struct mmsghdr in send_dg [BZ #23037]

(cherry picked from commit 583a27d525ae189bdfaa6784021b92a9a1dae12e)

Update NEWS for bug 22343 and bug 22774

Fix integer overflows in internal memalign and malloc [BZ #22343] [BZ #22774]

When posix_memalign is called with an alignment less than MALLOC_ALIGNMENT
and a requested size close to SIZE_MAX, it falls back to malloc code
(because the alignment of a block returned by malloc is sufficient to
satisfy the call). In this case, an integer overflow in _int_malloc leads
to posix_memalign incorrectly returning successfully.

Upon fixing this and writing a somewhat thorough regression test, it was
discovered that when posix_memalign is called with an alignment larger than
MALLOC_ALIGNMENT (so it uses _int_memalign instead) and a requested size
close to SIZE_MAX, a different integer overflow in _int_memalign leads to
posix_memalign incorrectly returning successfully.

Both integer overflows affect other memory allocation functions that use
_int_malloc (one affected malloc in x86) or _int_memalign as well.

This commit fixes both integer overflows. In addition to this, it adds a
regression test to guard against false successful allocations by the
following memory allocation functions when called with too-large allocation
sizes and, where relevant, various valid alignments:
malloc, realloc, calloc, reallocarray, memalign, posix_memalign,
aligned_alloc, valloc, and pvalloc.

(cherry picked from commit 8e448310d74b283c5cd02b9ed7fb997b47bf9b22)

powerpc: Fix syscalls during early process initialization [BZ #22685]

The tunables framework needs to execute syscall early in process
initialization, before the TCB is available for consumption. This
behavior conflicts with powerpc{|64|64le}'s lock elision code, that
checks the TCB before trying to abort transactions immediately before
executing a syscall.

This patch adds a powerpc-specific implementation of __access_noerrno
that does not abort transactions before the executing syscall.

Tested on powerpc{|64|64le}.

[BZ #22685]
* sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION_IMPL): Renamed
from ABORT_TRANSACTION.
(ABORT_TRANSACTION): Redirect to ABORT_TRANSACTION_IMPL.
* sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION,
ABORT_TRANSACTION_IMPL): Likewise.
* sysdeps/unix/sysv/linux/powerpc/not-errno.h: New file. Reuse
Linux code, but remove the code that aborts transactions.

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit 4612268a0ad8e3409d8ce2314dd2dd8ee0af5269)

Provide a C++ version of iseqsig (bug 22377)

In C++ mode, __MATH_TG cannot be used for defining iseqsig, because
__MATH_TG relies on __builtin_types_compatible_p, which is a C-only
builtin.  This is true when float128 is provided as an ABI-distinct type
from long double.

Moreover, the comparison macros from ISO C take two floating-point
arguments, which need not have the same type.  Choosing what underlying
function to call requires evaluating the formats of the arguments, then
selecting which is wider.  The macro __MATH_EVAL_FMT2 provides this
information, however, only the type of the macro expansion is relevant
(actually evaluating the expression would be incorrect).

This patch provides a C++ version of iseqsig, in which only the type of
__MATH_EVAL_FMT2 (__typeof or decltype) is used as a template parameter
for __iseqsig_type.  This function calls the appropriate underlying
function.

Tested for powerpc64le and x86_64.

[BZ #22377]
* math/Makefile [C++] (tests): Add test for iseqsig.
* math/math.h [C++] (iseqsig): New implementation, which does
not rely on __MATH_TG/__builtin_types_compatible_p.
* math/test-math-iseqsig.cc: New file.
* sysdeps/powerpc/powerpc64le/Makefile
(CFLAGS-test-math-iseqsig.cc): New variable.

(cherry picked from commit c85e54ac6cef0faed7b7ffc722f52523dec59bf5)

[AARCH64] Rewrite elf_machine_load_address using _DYNAMIC symbol

This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
symbol instead of _dl_start.

The static address of _DYNAMIC symbol is stored in the first GOT entry.
Here is the change which makes this solution work (part of binutils 2.24):
https://sourceware.org/ml/binutils/2013-06/msg00248.html

i386, x86_64 targets use the same method to do this as well.

The original implementation relies on a trick that R_AARCH64_ABS32 relocation
being resolved at link time and the static address fits in the 32bits.
However, in LP64, normally, the address is defined to be 64 bit.

Here is the C version one which should be portable in all cases.

* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
_DYNAMIC symbol to calculate load address.

(cherry picked from commit a68ba2f3cd3cbe32c1f31e13c20ed13487727b32)

NEWS: Add an entry for [BZ #22715]

x86-64: Properly align La_x86_64_retval to VEC_SIZE [BZ #22715]

_dl_runtime_profile calls _dl_call_pltexit, passing a pointer to
La_x86_64_retval which is allocated on stack.  The lrv_vector0
field in La_x86_64_retval must be aligned to size of vector register.
When allocating stack space for La_x86_64_retval, we need to make sure
that the address of La_x86_64_retval + RV_VECTOR0_OFFSET is aligned to
VEC_SIZE.  This patch checks the alignment of the lrv_vector0 field
and pads the stack space if needed.

Tested with x32 and x86-64 on SSE4, AVX and AVX512 machines.  It fixed

FAIL: elf/tst-audit10
FAIL: elf/tst-audit4
FAIL: elf/tst-audit5
FAIL: elf/tst-audit6
FAIL: elf/tst-audit7

on x32 AVX512 machine.

(cherry picked from commit 207a72e2988c6d6343f50fe0128eb4fc4edfdd15)

[BZ #22715]
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_profile): Properly
align La_x86_64_retval to VEC_SIZE.

nptl/tst-thread-exit-clobber: Run with any C++ compiler

We do not need thread_local support in the C++11 comiler, and the
minimum GCC version for glibc has C++11 support (if it has C++ support).

(cherry picked from commit 10d200dbace0ea5198006b313f40c3b884c88724)