git.ipfire.org Git - thirdparty/glibc.git/log

Make io/ftwtest-sh remove temporary files on early exit.

The test io/ftwtest-sh creates a directory that at some points during
the test does not have execute permission. To avoid leaving behind
such a directory that prevents the build directory from being removed
with a simple "rm -rf", it traps various signals to make the directory
executable and remove it before exit. However, this doesn't cover the
case where one of the tests simply fails (which happens with cross
testing if testing on a remote system where the path to the build
directory involves a symlink, or if that remote system fell over
during testing - I think the latter is the case where the directory is
left behind with bad permissions).

This patch makes that test also trap signal 0 (exit) so that the
directory gets properly removed in such failure cases as well.

Tested in both configurations where the test passes and where it fails
to verify that the result of the test is unchanged but the directory
is no longer left behind where it was previously left behind.

* io/ftwtest-sh: Also trap on exit to remove temporary files.

posix: Add cleanup on the trap list for globtest.sh

This patch prevents lingering files for SIGSEGV failures by adding
a cleanup handler on trap handler. Checked on x86_64-linux-gnu.

* posix/globtest.sh: Add cleanup routine on trap 0.

Cherry-pick of 4fee33f.

malloc: Fix i686 build from malloc interpose backport

This patch fixes i686 build after malloc interpose fix backport
(commit 749c94d4).

Checked on i686-linux-gnu.

* malloc/arena.c (__malloc_fork_lock_parent): Add internal_function
attribute.
(__malloc_fork_unlock_parent): Likewise.
(__malloc_fork_unlock_child): Likewise.

Extend local PLT reference check

On x86, linker in binutils 2.26 and newer consolidates R_*_JUMP_SLOT with
R_*_GLOB_DAT relocation against the same symbol. This patch extends
local PLT reference check to support alternate relocations.

[BZ #18078]
* scripts/check-localplt.awk: Support alternate relocations.
* scripts/localplt.awk: Also check relocations in DT_RELA/DT_REL
sections.
* sysdeps/unix/sysv/linux/i386/localplt.data: Mark free and
malloc entries with + REL R_386_GLOB_DAT.
* sysdeps/x86_64/localplt.data: New file.

Fix localplt test breakage with new readelf

Since 2014-11-24 binutils git commit bb4d2ac2, readelf has appended
the symbol version to symbols shown in reloc dumps.

[BZ #16512]
* scripts/localplt.awk: Strip off symbol version.
* NEWS: Mention bug fix.

malloc: Simplify static malloc interposition [BZ #20432]

Existing interposed mallocs do not define the glibc-internal
fork callbacks (and they should not), so statically interposed
mallocs lead to link failures because the strong reference from
fork pulls in glibc's malloc, resulting in multiple definitions
of malloc-related symbols.

nptl/tst-tls3-malloc: Force freeing of thread stacks

It turns out that due to the reduced stack size in tst-tls3 and the
(fixed) default stack cache size, allocated TLS variables are never
freed, so the test coverage for tst-tls3-malloc is less than complete.
This change increases the thread stack size for tst-tls3-malloc only,
to make sure thread stacks and TLS variables are freed.

malloc: Run tests without calling mallopt [BZ #19469]

The compiled tests no longer refer to the mallopt symbol
from their main functions. (Some tests still call mallopt
explicitly, which is fine.)

elf: Do not use memalign for TCB/TLS blocks allocation [BZ #17730]

Instead, call malloc and explicitly align the pointer.

There is no external location to store the original (unaligned)
pointer, and this commit increases the allocation size to store
the pointer at a fixed location relative to the TCB pointer.

The manual alignment means that some space goes unused which
was previously made available for subsequent allocations.
However, in the TLS_DTV_AT_TP case, the manual alignment code
avoids aligning the pre-TCB to the TLS block alignment. (Even
while using memalign, the allocation had some unused padding
in front.)

This concludes the removal of memalign calls from the TLS code,
and the new tst-tls3-malloc test verifies that only core malloc
routines are used.

elf: Avoid using memalign for TLS allocations [BZ #17730]

Instead of a flag which indicates the pointer can be freed, dtv_t
now includes the pointer which should be freed. Due to padding,
the size of dtv_t does not increase.

To avoid using memalign, the new allocate_dtv_entry function
allocates a sufficiently large buffer so that a sub-buffer
can be found in it which starts with an aligned pointer. Both
the aligned and original pointers are kept, the latter for calling
free later.

Fix DTV race, assert, DTV_SURPLUS Static TLS limit, and nptl_db garbage

for  ChangeLog

[BZ #17090]
[BZ #17620]
[BZ #17621]
[BZ #17628]
* NEWS: Update.
* elf/dl-tls.c (_dl_update_slotinfo): Clean up outdated DTV
entries with Static TLS too.  Skip entries past the end of the
allocated DTV, from Alan Modra.
(tls_get_addr_tail): Update to glibc_likely/unlikely.  Move
Static TLS DTV entry set up from...
(_dl_allocate_tls_init): ... here (fix modid assertion), ...
* elf/dl-reloc.c (_dl_nothread_init_static_tls): ... here...
* nptl/allocatestack.c (init_one_static_tls): ... and here...
* elf/dlopen.c (dl_open_worker): Drop l_tls_modid upper bound
for Static TLS.
* elf/tlsdeschtab.h (map_generation): Return size_t.  Check
that the slot we find is associated with the given map before
using its generation count.
* nptl_db/db_info.c: Include ldsodefs.h.
(rtld_global, dtv_slotinfo_list, dtv_slotinfo): New typedefs.
* nptl_db/structs.def (DB_RTLD_VARIABLE): New macro.
(DB_MAIN_VARIABLE, DB_RTLD_GLOBAL_FIELD): Likewise.
(link_map::l_tls_offset): New struct field.
(dtv_t::counter): Likewise.
(rtld_global): New struct.
(_rtld_global): New rtld variable.
(dl_tls_dtv_slotinfo_list): New rtld global field.
(dtv_slotinfo_list): New struct.
(dtv_slotinfo): Likewise.
* nptl_db/td_symbol_list.c: Drop gnu/lib-names.h include.
(td_lookup): Rename to...
(td_mod_lookup): ... this.  Use new mod parameter instead of
LIBPTHREAD_SO.
* nptl_db/td_thr_tlsbase.c: Include link.h.
(dtv_slotinfo_list, dtv_slotinfo): New functions.
(td_thr_tlsbase): Check DTV generation.  Compute Static TLS
addresses even if the DTV is out of date or missing them.
* nptl_db/fetch-value.c (_td_locate_field): Do not refuse to
index zero-length arrays.
* nptl_db/thread_dbP.h: Include gnu/lib-names.h.
(td_lookup): Make it a macro implemented in terms of...
(td_mod_lookup): ... this declaration.
* nptl_db/db-symbols.awk (DB_RTLD_VARIABLE): Override.
(DB_MAIN_VARIABLE): Likewise.

elf: Consolidate machine-agnostic DTV definitions in <dl-dtv.h>

Identical definitions of dtv_t and TLS_DTV_UNALLOCATED were
repeated for all architectures using DTVs.

malloc: Run fork handler as late as possible [BZ #19431]

Previously, a thread M invoking fork would acquire locks in this order:

  (M1) malloc arena locks (in the registered fork handler)
  (M2) libio list lock

A thread F invoking flush (NULL) would acquire locks in this order:

  (F1) libio list lock
  (F2) individual _IO_FILE locks

A thread G running getdelim would use this order:

  (G1) _IO_FILE lock
  (G2) malloc arena lock

After executing (M1), (F1), (G1), none of the threads can make progress.

This commit changes the fork lock order to:

  (M'1) libio list lock
  (M'2) malloc arena locks

It explicitly encodes the lock order in the implementations of fork,
and does not rely on the registration order, thus avoiding the deadlock.

malloc: Fix list_lock/arena lock deadlock [BZ #19182]

* malloc/arena.c (list_lock): Document lock ordering requirements.
(free_list_lock): New lock.
(ptmalloc_lock_all): Comment on free_list_lock.
(ptmalloc_unlock_all2): Reinitialize free_list_lock.
(detach_arena): Update comment.  free_list_lock is now needed.
(_int_new_arena): Use free_list_lock around detach_arena call.
Acquire arena lock after list_lock.  Add comment, including FIXME
about incorrect synchronization.
(get_free_list): Switch to free_list_lock.
(reused_arena): Acquire free_list_lock around detach_arena call
and attached threads counter update.  Add two FIXMEs about
incorrect synchronization.
(arena_thread_freeres): Switch to free_list_lock.
* malloc/malloc.c (struct malloc_state): Update comments to
mention free_list_lock.

Replace MUTEX_INITIALIZER with _LIBC_LOCK_INITIALIZER in generic code

* sysdeps/mach/hurd/libc-lock.h (_LIBC_LOCK_INITIALIZER): Define.
(__libc_lock_define_initialized): Use it.
* sysdeps/nptl/libc-lockP.h (_LIBC_LOCK_INITIALIZER): Define.
* malloc/arena.c (list_lock): Use _LIBC_LOCK_INITIALIZER.
* malloc/malloc.c (main_arena): Likewise.
* sysdeps/generic/malloc-machine.h (MUTEX_INITIALIZER): Remove.
* sysdeps/nptl/malloc-machine.h (MUTEX_INITIALIZER): Remove.

malloc: Prevent arena free_list from turning cyclic [BZ #19048]

[BZ# 19048]
* malloc/malloc.c (struct malloc_state): Update comment. Add
attached_threads member.
(main_arena): Initialize attached_threads.
* malloc/arena.c (list_lock): Update comment.
(ptmalloc_lock_all, ptmalloc_unlock_all): Likewise.
(ptmalloc_unlock_all2): Reinitialize arena reference counts.
(deattach_arena): New function.
(_int_new_arena): Initialize arena reference count and deattach
replaced arena.
(get_free_list, reused_arena): Update reference count and deattach
replaced arena.
(arena_thread_freeres): Update arena reference count and only put
unreferenced arenas on the free list.

malloc: Rewrite with explicit TLS access using __thread

Consolidate arena_lookup and arena_lock into a single arena_get

This seems to have been left behind as an artifact of some old changes
and can now be merged.  Verified that the only generated code change
on x86_64 is that of line numbers in asserts, like so:

@@ -27253,7 +27253,7 @@ Disassembly of section .text:
   416f09:      48 89 42 20             mov    %rax,0x20(%rdx)
   416f0d:      e9 7e f6 ff ff          jmpq   416590 <_int_free+0x230>
   416f12:      b9 3f 9f 4a 00          mov    $0x4a9f3f,%ecx
-  416f17:      ba d5 0f 00 00          mov    $0xfd5,%edx
+  416f17:      ba d6 0f 00 00          mov    $0xfd6,%edx
   416f1c:      be a8 9b 4a 00          mov    $0x4a9ba8,%esi
   416f21:      bf 6a 9c 4a 00          mov    $0x4a9c6a,%edi
   416f26:      e8 45 e8 ff ff          callq  415770 <__malloc_assert>

Add runtime check for __ASSUME_REQUEUE_PI (BZ# 18463)

This patch adds a runtime check for requeue priority futexes
(FUTEX_{WAIT,CMP}_REQUEUE_PI) for configurations that do not define
__ASSUME_REQUEUE_PI.

It uses the same check uses for priority lock/unlock support, where
issuing a futex operation with FUTEX_UNLOCK_PI returns if kernel has
or not internal 'futex_cmpxchg_enabled' support (which is also used
on requeue priority futexes operations).

For architectures that already have __ASSUME_REQUEUE_PI the code
flow does not change, 'use_requeue_pi' returns the previous chec
logic. Also, if 'futex_cmpxchg_enabled' is not supported by the
kernel, the previous logic will be used (by issuing a default futex
operation).

Tested on ARM (v3.8 kernel) and x86_64.

* nptl/pthreadP.h (prio_inherit_missing): Change name to
__prio_inherit_missing.
(USE_REQUEUE_PI): Remove define.
(use_requeue_pi): New function.
* nptl/pthread_cond_broadcast.c [__ASSUME_REQUEUE_PI]
(__pthread_cond_broadcast): Remove ifdef.
(USE_REQUEUE_PI): Change to use_requeue_pi.
* nptl/pthread_cond_signal.c [__ASSUME_REQUEUE_PI]
(__pthread_cond_signal): Remove ifdef.
(USE_REQUEUE_PI): Change to use_requeue_pi.
* nptl/pthread_cond_timedwait.c [__ASSUME_REQUEUE_PI]
(__pthread_cond_timedwait): Remove ifdef.
(USE_REQUEUE_PI): Change to use_requeue_pi.
( nptl/pthread_cond_wait.c [__ASSUME_REQUEUE_PI]
(__pthread_cond_wait): Remove ifdef.
(USE_REQUEUE_PI): Change to use_requeue_pi.
* nptl/pthread_mutex_init.c (prio_inherit_missing): Remove 'static'
qualifier and change name to __prio_inherit_missing.

Remove __ASSUME_SET_ROBUST_LIST

This patch removes __ASSUME_SET_ROBUST_LIST usage and assumes that
kernel will correctly return if it supports or not
futex_atomic_cmpxchg_inatomic.

On minimum supported kernel (v3.2 and v2.6.32 for x86) kernel has:

2418 SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
2419                 size_t, len)
2420 {
2421         if (!futex_cmpxchg_enabled)
2422                 return -ENOSYS;

The patch also adds the __set_robust_list_avail runtime check for all
architectures, since for some the syscall may still return ENOSYS if
futex_atomic_cmpxchg_inatomic is not supported (for instance ARM).

Tested on armhf (with 3.8 kernel) and x86_64.

* nptl/nptl-init.c [__ASSUME_SET_ROBUST_LIST]
(__set_robust_list_avail): Remove define.
[__NR_set_robust_list] (__pthread_initialize_minimal_internal):
Likewise.
* nptl/pthreadP.h [__ASSUME_SET_ROBUST_LIST]
(__set_robust_list_avail): Likewise.
* nptl/pthread_create.c
[__NR_set_robust_list && !__ASSUME_SET_ROBUST_LIST]
(START_THREAD_DEFN): Likewise.
* nptl/pthread_mutex_init.c [!__ASSUME_SET_ROBUST_LIST]
(__pthread_mutex_init): Likewise.
* sysdeps/unix/sysv/linux/arm/kernel-features.h
(__ASSUME_SET_ROBUST_LIST): Likewise.
* sysdeps/unix/sysv/linux/kernel-features.h:
(__ASSUME_SET_ROBUST_LIST): Likewise.
* sysdeps/unix/sysv/linux/m68k/kernel-features.h:
(__ASSUME_SET_ROBUST_LIST): Likewise.
* sysdeps/unix/sysv/linux/mips/kernel-features.h:
(__ASSUME_SET_ROBUST_LIST): Likewise.
* sysdeps/unix/sysv/linux/sparc/kernel-features.h:
(__ASSUME_SET_ROBUST_LIST): Likewise.

Fix nptl-init.c use of INTERNAL_SYSCALL_DECL.

Remove __ASSUME_FUTEX_LOCK_PI

This patch removes __ASSUME_FUTEX_LOCK_PI usage and assumes that
kernel will correctly return if it supports or not
futex_atomic_cmpxchg_inatomic.

Current PI mutex code already has runtime support by calling
prio_inherit_missing and returns ENOTSUP if the futex operation fails
at initialization (it issues a FUTEX_UNLOCK_PI futex operation).

Also, current minimum supported kernel (v3.2) will return ENOSYS if
futex_atomic_cmpxchg_inatomic is not supported in the system:

kernel/futex.c:

2628 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
2629                 u32 __user *uaddr2, u32 val2, u32 val3)
2630 {
2631         int ret = -ENOSYS, cmd = op & FUTEX_CMD_MASK;
[...]
2667         case FUTEX_UNLOCK_PI:
2668                 if (futex_cmpxchg_enabled)
2669                         ret = futex_unlock_pi(uaddr, flags);
[...]
2686         return ret;
2687 }

The futex_cmpxchg_enabled is initialized by calling cmpxchg_futex_value_locked,
which calls futex_atomic_cmpxchg_inatomic.

For ARM futex_atomic_cmpxchg_inatomic will be either defined (if both
CONFIG_CPU_USE_DOMAINS and CONFIG_SMP are not defined) or use the
default generic implementation that returns ENOSYS.

For m68k is uses the default generic implementation.

For mips futex_atomic_cmpxchg_inatomic will return ENOSYS if cpu has no
'cpu_has_llsc' support (defined by each chip supporte inside kernel).

For sparc, 32-bit kernel will just use default generic implementation,
while 64-bit kernel has support.

Tested on ARM (v3.8 kernel) and x86_64.

* nptl/pthread_mutex_init.c [__ASSUME_FUTEX_LOCK_PI]
(prio_inherit_missing): Remove define.
* sysdeps/unix/sysv/linux/arm/kernel-features.h
(__ASSUME_FUTEX_LOCK_PI): Likewise.
* sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_FUTEX_LOCK_PI):
Likewise.
* sysdeps/unix/sysv/linux/m68k/kernel-features.h
(__ASSUME_FUTEX_LOCK_PI): Likewise.
* sysdeps/unix/sysv/linux/mips/kernel-features.h
(__ASSUME_FUTEX_LOCK_PI): Likewise.
* sysdeps/unix/sysv/linux/sparc/kernel-features.h
(__ASSUME_FUTEX_LOCK_PI): Likewise.

Suppress GCC 6 warning about ambiguous 'else' with -Wparentheses

Backport of df1cf48777fe4cd81ad7fb09ecbe5b31432b7c1c.

* stdlib/setenv.c (unsetenv): Fix ambiguous 'else'.
* nis/nis_call.c (nis_server_cache_add): Likewise.

S390: Fix "backtrace() returns infinitely deep stack frames with makecontext()" [BZ #18508].

On s390/s390x backtrace(buffer, size) returns the series of called functions until
"makecontext_ret" and additional entries (up to "size") with "makecontext_ret".
GDB-backtrace is also warning:
"Backtrace stopped: previous frame identical to this frame (corrupt stack?)"

To reproduce this scenario you have to setup a new context with makecontext()
and activate it with setcontext(). See e.g. cf() function in testcase stdlib/tst-makecontext.c.
Or see bug in libgo "Bug 66303 - runtime.Caller() returns infinitely deep stack frames
on s390x " (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66303).

This patch omits the cfi_startproc/cfi_endproc directives in ENTRY/END macro of
__makecontext_ret. Thus no frame information is generated in .eh_frame and backtrace
stops after __makecontext_ret. There is also no .eh_frame info for _start or
thread_start functions.

ChangeLog:

[BZ #18508]
* stdlib/Makefile ($(objpfx)tst-makecontext3):
Depend on $(libdl).
* stdlib/tst-makecontext.c (cf): Test if _Unwind_Backtrace
is not called infinitely times.
(backtrace_helper): New function.
(trace_arg): New struct.
(st1): Enlarge stack size.
* sysdeps/unix/sysv/linux/s390/s390-32/__makecontext_ret.S:
(__makecontext_ret): Omit cfi_startproc and cfi_endproc.
* sysdeps/unix/sysv/linux/s390/s390-64/__makecontext_ret.S:
Likewise.

(cherry picked from commit 890b7a4b33d482b5c768ab47d70758b80227e9bc)

S/390: Fix setcontext/swapcontext which are not restoring sigmask.

This patch uses sigprocmask(SIG_SETMASK) instead of SIG_BLOCK
in setcontext, swapcontext.

(cherry picked from commit 2e807f29595eb5b1e5d0decc6e356a3562ecc58e)

configure: fix `test ==` usage

POSIX defines the = operator, but not ==. Fix the few places where we
incorrectly used ==.

(cherry picked from commit b2d4456b333970ab4cb01ed8045b9a8d2c4832f3)

S390: Extend structs La_s390_regs / La_s390_retval with vector-registers.

Starting with z13, vector registers can also occur as argument registers.
Thus the passed input/output register structs for
la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new
registers. This patch extends these structs La_s390_regs and La_s390_retval
and adjusts _dl_runtime_profile() to handle those fields in case of
running on a z13 machine.

ChangeLog:

* sysdeps/s390/bits/link.h: (La_s390_vr) New typedef.
(La_s390_32_regs): Append vector register lr_v24-lr_v31.
(La_s390_64_regs): Likewise.
(La_s390_32_retval): Append vector register lrv_v24.
(La_s390_64_retval): Likeweise.
* sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile):
Handle extended structs La_s390_32_regs and La_s390_32_retval.
* sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile):
Handle extended structs La_s390_64_regs and La_s390_64_retval.

(cherry picked from commit 5cdd1989d1d2f135d02e66250f37ba8e767f9772)

S390: Save and restore fprs/vrs while resolving symbols.

On s390, no fpr/vrs were saved while resolving a symbol
via _dl_runtime_resolve/_dl_runtime_profile.

According to the abi, the fpr-arguments are defined as call clobbered.
In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs
instead of saving them to the stack.
If gcc do this in one of the resolver-functions, then the floating point
arguments of a library-function are invalid for the first library-function-call.
Thus, this patch saves/restores the fprs around the resolving code.

The same could occur for vector registers. Furthermore an ifunc-resolver
could also clobber the vector/floating point argument registers.
Thus this patch provides the further variants _dl_runtime_resolve_vx/
_dl_runtime_profile_vx, which are used if the kernel claims, that
we run on a machine with vector registers.

Furthermore, if _dl_runtime_profile calls _dl_call_pltexit,
the pointers to inregs-/outregs-structs were setup invalid.
Now they point to the correct location in the stack-frame.
Before branching back to the caller, the return values are now
restored instead of containing the return values of the
_dl_call_pltexit() call.
On s390-32, an endless loop occurs if _dl_call_pltexit() should be called.
Now, this code-path branches to this function instead of just after the
preceding basr-instruction.

ChangeLog:

* sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice
to create a non-vector/vector version for _dl_runtime_resolve and
_dl_runtime_profile. Move implementation to ...
* sysdeps/s390/s390-32/dl-trampoline.h: ... here.
(_dl_runtime_resolve) Save and restore fpr/vrs.
(_dl_runtime_profile) Save and restore vrs and fix some issues
if _dl_call_pltexit is called.
* sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup):
Choose the correct resolver function if running on a machine with vx.
* sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice
to create a non-vector/vector version for _dl_runtime_resolve and
_dl_runtime_profile. Move implementation to ...
* sysdeps/s390/s390-64/dl-trampoline.h: ... here.
(_dl_runtime_resolve) Save and restore fpr/vrs.
(_dl_runtime_profile) Save and restore vrs and fix some issues
* sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup):
Choose the correct resolver function if running on a machine with vx.

(cherry picked from commit 4603c51ef7989d7eb800cdd6f42aab206f891077
and commit d8a012c5c9e4bfc1b8db2bc6deacb85b44a2e1eb)

S390: configure check for vector instruction support in assembler.

The S390 specific test checks if the assembler has support for the new z13
vector instructions by compiling a vector instruction. The .machine and
.machinemode directives are needed to compile the vector instruction without
-march=z13 option on 31/64 bit.
On success the macro HAVE_S390_VX_ASM_SUPPORT is defined. This macro is used
to determine if the optimized functions can be build without compile errors.
If the used assembler lacks vector support, then a warning is dumped while
configuring and only the common code functions are build.

The z13 instruction support was introduced in
"[Committed] S/390: Add support for IBM z13."
(https://sourceware.org/ml/binutils/2015-01/msg00197.html)

ChangeLog:

* config.h.in (HAVE_S390_VX_ASM_SUPPORT): New macro undefine.
* sysdeps/s390/configure.ac: Add test for S390 vector instruction
assembler support.
* sysdeps/s390/configure: Regenerated.

(cherry picked from commit 4f0a1cea34c05fb2acc16f1a2d291f53230eb4fb)

S390: Add new s390 platform.

The new IBM z13 is added to platform string array.
The macro _DL_PLATFORMS_COUNT is incremented to 8,
because it was not incremented by commit
"S/390: Sync AUXV capabilities and archs with kernel".

ChangeLog:

* sysdeps/s390/dl-procinfo.c (_dl_s390_cap_flags): Add z13.
* sysdeps/s390/dl-procinfo.h (_DL_PLATFORMS_COUNT): Increased.

(cherry picked from commit a1b0488fc9df3d895a2e5eefbcd348d3f7fe0e52)

S390: Add hwcaps value for vector facility.

The HWCAP_S390_VX flag in hwcap field of auxiliary vector indicates
if the vector facility is available and the kernel is aware of it.
This can be tested with LD_SHOW_AUXV=1 <prog>.
Currently it does not show te, because it was not incremented
by commit "S/390: Add hwcap value for transactional execution.".
Thus _DL_HWCAP_COUNT is incremented by two.

ChangeLog:

* sysdeps/s390/dl-procinfo.c (_dl_s390_platforms): Add vector flag.
* sysdeps/s390/dl-procinfo.h: Add vector capability.
* sysdeps/unix/sysv/linux/s390/bits/hwcap.h (HWCAP_S390_VX): Define.

(cherry picked from commit 4e28fa80886c71e6aaf85016b82ce981c0f12e6d)

sln: use stat64

When using sln on some filesystems which return 64-bit inodes,
the stat call might fail during install like so:
.../elf/sln .../elf/symlink.list
/lib32/libc.so.6: invalid destination: Value too large for defined data type
/lib32/ld-linux.so.2: invalid destination: Value too large for defined data type
Makefile:104: recipe for target 'install-symbolic-link' failed

Switch to using stat64 all the time to avoid this.

URL: https://bugs.gentoo.org/576396
(cherry picked from commit f5e753c8c3a18a1e3c715dd11bf4dc341b5c481f)

S390: Do not use direct socket syscalls if build on kernels >= 4.3. [BZ #19682]

Beginning with Linux 4.3, the kernel headers contain direct
system call numbers __NR_socket etc. on s390x. On older kernels,
the socket-multiplexer syscall __NR_socketcall was used.

To enable these new syscalls, the patch
"S390: Call direct system calls for socket operations."
(https://sourceware.org/git/?p=glibc.git;a=commit;h=016495b818cb61df7d0d10e6db54074271b3e3a5)
was applied upstream.

If glibc 2.23 is configured with --enable-kernel=4.3 and newer,
the direct socket syscalls are used.
For older kernels, the socket-multiplexer syscall is used instead.

In glibc 2.22 and earlier, this patch is not applied.
If you build glibc on a kernel < 4.3, the socket-multiplexer
syscall is used. But if you build glibc on kernel >= 4.3, the
direct socket-syscalls are used. If you install this glibc on a
kernel < 4.3, all socket operations will fail.
See "Bug 19682 - s390x: Incorrect syscall definitions cause
breakage with Linux 4.3 headers"
(https://sourceware.org/bugzilla/show_bug.cgi?id=19682)
The configure switch --enable-kernel does not influence this
behaviour on older glibc-releases.

The solution is to remove the direct socket-syscalls in
sysdeps/unix/sysv/linux/s390/s390-64/syscalls.list
(this patch) on older glibc-releases as it was done by the
upstream patch, too. These entries were never used on s390x,
but the c-files in sysdeps/unix/sysv/linux/.
After this removal, the behaviour of the socket functions are
not changed compared to the original glibc release version
and the socket-multiplexer-syscall is always used.

CVE-2015-7547: getaddrinfo() stack-based buffer overflow (Bug 18665).

* A stack-based buffer overflow was found in libresolv when invoked from
  libnss_dns, allowing specially crafted DNS responses to seize control
  of execution flow in the DNS client.  The buffer overflow occurs in
  the functions send_dg (send datagram) and send_vc (send TCP) for the
  NSS module libnss_dns.so.2 when calling getaddrinfo with AF_UNSPEC
  family.  The use of AF_UNSPEC triggers the low-level resolver code to
  send out two parallel queries for A and AAAA.  A mismanagement of the
  buffers used for those queries could result in the response of a query
  writing beyond the alloca allocated buffer created by
  _nss_dns_gethostbyname4_r.  Buffer management is simplified to remove
  the overflow.  Thanks to the Google Security Team and Red Hat for
  reporting the security impact of this issue, and Robert Holiday of
  Ciena for reporting the related bug 18665. (CVE-2015-7547)

See also:
https://sourceware.org/ml/libc-alpha/2016-02/msg00416.html
https://sourceware.org/ml/libc-alpha/2016-02/msg00418.html

(cherry picked from commit e9db92d3acfe1822d56d11abcea5bfc4c41cf6ca)

hsearch_r: Apply VM size limit in test case

(cherry picked from commit f34f146e682d8d529dcf64b3c2781bf3f2f05f6c)

Improve check against integer wraparound in hcreate_r [BZ #18240]

(cherry picked from commit bae7c7c764413b23e61cb099ce33be4c4ee259bb)

Handle overflow in __hcreate_r

Hi,

As in bugzilla entry there is overflow in hsearch when looking for prime
number as SIZE_MAX - 1 is divisible by 5. We fix that by rejecting large
inputs before looking for prime.

* misc/hsearch_r.c (__hcreate_r): Handle overflow.

(cherry picked from commit 2f5c1750558fe64bac361f52d6827ab1bcfe52bc)

Fix BZ #18985 -- out of range data to strftime() causes a segfault

(cherry picked from commit d36c75fc0d44deec29635dd239b0fbd206ca49b7)

Fix trailing space.

(cherry picked from commit 7565d2a862683a3c26ffb1f32351b8c5ab9f7b31)

Fix BZ #17905

(cherry picked from commit 0f58539030e436449f79189b6edab17d7479796e)

[AArch64] Fix the big endian loader name.

(cherry picked from commit 44cb254f9a024db33ba549e59dc9d90355b797c9)

nscd: drop selinux/flask.h include

Building nscd w/selinux enabled yields a warning which yields an error:
In file included from selinux.c:32:0:
/usr/include/selinux/flask.h:5:2: error:
#warning "Please remove any #include's of this header in your source code."

I've done just that and it builds cleanly with libselinux-2.4.

(cherry picked from commit 808696696837b8b8fc858f2e6f8d4e40e26e1308)

Fix parallel build error

(cherry picked from commit e8b6be0016f131c2ac72bf3213eabdb59800e63b)

CVE-2014-8121: Do not close NSS files database during iteration [BZ #18007]

Robin Hack discovered Samba would enter an infinite loop processing
certain quota-related requests.  We eventually tracked this down to a
glibc issue.

Running a (simplified) test case under strace shows that /etc/passwd
is continuously opened and closed:

…
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2717
lseek(3, 2717, SEEK_SET)                = 2717
close(3)                                = 0
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2717
lseek(3, 2717, SEEK_SET)                = 2717
close(3)                                = 0
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
lseek(3, 0, SEEK_CUR)                   = 0
…

The lookup function implementation in
nss/nss_files/files-XXX.c:DB_LOOKUP has code to prevent that.  It is
supposed skip closing the input file if it was already open.

  /* Reset file pointer to beginning or open file.  */       \
  status = internal_setent (keep_stream);       \
      \
  if (status == NSS_STATUS_SUCCESS)       \
    {       \
      /* Tell getent function that we have repositioned the file pointer.  */ \
      last_use = getby;       \
      \
      while ((status = internal_getent (result, buffer, buflen, errnop       \
H_ERRNO_ARG EXTRA_ARGS_VALUE))       \
     == NSS_STATUS_SUCCESS)       \
{ break_if_match }       \
      \
      if (! keep_stream)       \
internal_endent ();       \
    }       \

keep_stream is initialized from the stayopen flag in internal_setent.
internal_setent is called from the set*ent implementation as:

  status = internal_setent (stayopen);

However, for non-host database, this flag is always 0, per the
STAYOPEN magic in nss/getXXent_r.c.

Thus, the fix is this:

-  status = internal_setent (stayopen);
+  status = internal_setent (1);

This is not a behavioral change even for the hosts database (where the
application can specify the stayopen flag) because with a call to
sethostent(0), the file handle is still not closed in the
implementation of gethostent.

(cherry picked from commit 03d2730b44cc2236318fd978afa2651753666c55)

Conflicts:
ChangeLog
NEWS

getmntent: fix memory corruption w/blank lines [BZ #18887]

The fix for BZ #17273 introduced a single byte of memory corruption when
the line is entirely blank.  It would walk back past the start of the
buffer if the heap happened to be 0x20 or 0x09 and then write a NUL byte.
buffer = '\n';
end_ptr = buffer;
while (end_ptr[-1] == ' ' || end_ptr[-1] == '\t')
end_ptr--;
*end_ptr = '\0';

Fix that and rework the tests.  Adding the testcase for BZ #17273 to the
existing \040 parser does not really make sense as it's unrelated, and
leads to confusing behavior: it implicitly relies on the new entry being
longer than the previous entry (since it just rewinds the FILE*).  Split
it out into its own dedicated testcase instead.

(cherry picked from commit b0e805fa0d6fea33745952df7b7f5442ca4c374f)

ia64: atomic.h: fix atomic_exchange_and_add 64bit handling

Way back in 2005 the atomic_exchange_and_add function was cleaned up to
avoid the explicit size checking and instead let gcc handle things itself.
Unfortunately that change ended up leaving beyond a cast to int, even when
the incoming value was a long.  This has flown under the radar for a long
time due to the function not being heavily used in the tree (especially as
a full 64bit field), but a recent change to semaphores made some nptl tests
fail reliably.  This is due to the code packing two 32bit values into one
64bit variable (where the high 32bits contained the number of waiters), and
then the whole variable being atomically updated between threads.  On ia64,
that meant we never atomically updated the count, so sometimes the sem_post
would not wake up the waiters.

(cherry picked from commit cf31a2c79957936b60de34ea1e718e892baf669c)

Fix BZ #17269 -- _IO_wstr_overflow integer overflow

(cherry picked from commit bdf1ff052a8e23d637f2c838fa5642d78fcedc33)

Fix read past end of pattern in fnmatch (bug 18032)

(cherry picked from commit 4a28f4d55a6cc33474c0792fe93b5942d81bf185)

sparc: fix sigaction for 32bit builds [BZ #18694]

Commit a059d359d86130b5fa74e04a978c8523a0293f77 changed the sigaction
struct to pass conform tests, but it ended up also changing the ABI for
32 bit builds. For 64 bit builds, changing the long to two ints works,
but for 32 bit builds, it inserts 4 extra bytes. This leads to many
packages randomly failing like bash that spews things like:
configure: line 471: wait_for: No record of process 0

Bracket the new member by a wordsize check to fix the ABI for 32bit.

(cherry picked from commit 7fde904c73c57faea48c9679bbdc0932d81b3a2f)

CVE-2015-1781: resolv/nss_dns/dns-host.c buffer overflow [BZ#18287]

(cherry picked from commit 2959eda9272a033863c271aff62095abd01bd4e3)

Fix __memcpy_chk on non-SSE2 CPUs

In commit 8b4416d, the 1: jump label in __mempcpy_chk was accidentally
moved. This resulted in failures of mempcpy on CPU without SSE2.

(cherry picked from commit 132a1328eccd20621b77f7810eebbeec0a1af187)

NEWS: Also mention CVE-2015-1473

Fix missing ChangeLog attribution.

Update version.h and include/features.h for 2.21 release

hppa: Sync with pthread.h.

This reverts part of the previous commit to refactor pthread.h.
The refactoring must be done by having pthread.h include arch
bits headers, not the other way around. Then hppa provides the
arch bits header. For now we synchronzie again with pthread.h
and include the entire contents in the hppa copy.

CVE-2015-1472: wscanf allocates too little memory

BZ #16618

Under certain conditions wscanf can allocate too little memory for the
to-be-scanned arguments and overflow the allocated buffer. The
implementation now correctly computes the required buffer size when
using malloc.

A regression test was added to tst-sscanf.

glibc 2.21 pre-release update.

Update all translations.

Update contributions in the manual.

Update installation notes with information about newest working tools.

Reconfigure using exactly autoconf 2.69.

Regenerate INSTALL.

hppa: Remove warnings and fix conformance errors.

(1) Fix warnings.

This is a bulk update to fix all the warnings that were causing
build failures with -Werror on hppa.

The most egregious problems are in dl-fptr.c which needs to be
entirely rewritten, thus I've used -Wno-error for that.

(2) Fix conformance errors.

The sysdep.c file had __syscall_error and syscall in one file
which caused conformance issues by including syscall when
__syscall_error was linked to. The fix is obviously to split
the file and use syscall.c to implement syscall.

Function name typo error in non-PIC case, fixed in this patch.

Fix two bugs in sparc atomics.

* sysdeps/sparc/sparc32/bits/atomic.h
(__sparc32_atomic_do_unlock24): Put the memory barrier before the
unlock not after it.
(__v9_compare_and_exchange_val_32_acq): Use unions to avoid getting
volatile register usage warnings from the compiler.

Fix sparc semaphore implementation after recent changes.

* sysdeps/sparc/nptl/sem_init.c: Delete.
* sysdeps/sparc/nptl/sem_post.c: Delete.
* sysdeps/sparc/nptl/sem_timedwait.c: Delete.
* sysdeps/sparc/nptl/sem_wait.c: Delete.
* sysdeps/sparc/sparc32/sem_init.c: New file.
* sysdeps/sparc/sparc32/sem_waitcommon.c: New file.
* sysdeps/sparc/sparc32/sem_open.c: Generic nptl version with
padding explicitly initialized.
* sysdeps/sparc/sparc32/sem_post.c: Generic nptl version using
padding for in-semaphore spinlock.
* sysdeps/sparc/sparc32/sem_wait.c: Likewise.
* sysdeps/sparc/sparc32/sem_trywait.c: Delete.
* sysdeps/sparc/sparc32/sem_timedwait.c: Delete.
* sysdeps/sparc/sparc32/sparcv9/sem_init.c: New file.
* sysdeps/sparc/sparc32/sparcv9/sem_open.c: New file.
* sysdeps/sparc/sparc32/sparcv9/sem_post.c: New file.
* sysdeps/sparc/sparc32/sparcv9/sem_waitcommon.c: New file.
* sysdeps/sparc/sparc32/sparcv9/sem_wait.c: Redirect to nptl
version.
* sysdeps/sparc/sparc32/sparcv9/sem_timedwait.c: Delete.
* sysdeps/sparc/sparc32/sparcv9/sem_trywait.c: Delete.

Use AVX unaligned memcpy only if AVX2 is available

memcpy with unaligned 256-bit AVX register loads/stores are slow on older
processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.

[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.

Include <signal.h> in sysdeps/nptl/allocrtsig.c

Architectures which don't use hp-timing-common.h don't include <signal.h>
via <sys/param.h>.

Fix up ChangeLog formatting

Initialize nscd stats data [BZ #17892]

The padding bytes in the statsdata struct are not initialized, due to
which valgrind throws a warning:

==11384== Memcheck, a memory error detector
==11384== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11384== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11384== Command: nscd -d
==11384==
Fri 25 Apr 2014 10:34:53 AM CEST - 11384: handle_request: request received (Version = 2) from PID 11396
Fri 25 Apr 2014 10:34:53 AM CEST - 11384:       GETSTAT
==11384== Thread 6:
==11384== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==11384==    at 0x4E4ACDC: send (in /lib64/libpthread-2.12.so)
==11384==    by 0x11AF6B: send_stats (in /usr/sbin/nscd)
==11384==    by 0x112F75: nscd_run_worker (in /usr/sbin/nscd)
==11384==    by 0x4E439D0: start_thread (in /lib64/libpthread-2.12.so)
==11384==    by 0x599AB6C: clone (in /lib64/libc-2.12.so)
==11384==  Address 0x15708395 is on thread 6's stack

Fix the warning by initializing the structure.

Clarify math/README.libm-test. Add "How to read the test output."

tilegx32: set __HAVE_64B_ATOMICS to 0

This is because of alignment issues in the sem_t support.
tilegx32 does in fact support 64-bit atomics and we will need
to revisit this after the 2.21 freeze.

Disable 64-bit atomics for MIPS n32.

This patch disables use of 64-bit atomics for MIPS n32 to fix the
problems with unaligned semaphores.

Before 64-bit atomics are used for anything for which such alignment
issues do not arise, and before the addition of any new ILP32 ports
with 64-bit semaphores for which the ABI can be set to have the
greater alignment (AARCH64?), a better approach will need to be
established that allows architectures to declare their 64-bit atomics
availability accurately, without doing so causing inappropriate use of
such atomics on unaligned semaphores.

Tested for MIPS n32 that this fixes the nptl/tst-sem3 failure.

* sysdeps/mips/bits/atomic.h [_MIPS_SIM == _ABIN32]
(__HAVE_64B_ATOMICS): Define to 0.

powerpc: Fix fesetexceptflag [BZ#17885]

This patch fixes a bug introduced by 18f2945ae9216cfc, where it optimizes
the FPSCR set by just issuing a mtfs instruction if new flag is different
from older one. The issue is a typo, where the new flag should the the
new value, instead of the old one.

It fixes BZ#17885.

powerpc: Fix fsqrt build in libm [BZ#16576]

Some powerpc64 processors (e5500 core for instance) does not provide the
fsqrt instruction, however current check to use in math_private.h is
__WORDSIZE and _ARCH_PWR4 (ISA 2.02). This is patch change it to use
the compiler flag _ARCH_PPCSQ (which is the same condition GCC uses to
decide whether to generate fsqrt instruction).

It fixes BZ#16576.

iconv: Suppress array out of bounds warning.

ia64: avoid set-but-not-used warning

m68k/coldfire: avoid warning about volatile register variables

m68k: fix missing definition of __feraiseexcept

m68k: force inlining bswap functions

Fix segmentation fault when LD_LIBRARY_PATH contains only non-existings paths

powerpc: Fix powerpc64 build failure with binutils 2.22

GLIBC memset optimization for POWER8 uses the '.machine power8'
directive, which is only supported officially on binutils 2.24+. This
causes a build failure on older binutils.

Since the requirement of .machine power8 is to correctly assembly the
'mtvsrd' instruction and it is already handled by the MTVSRD_V1_R4
macro, there is no really needed of using it.

The patch replaces the power8 with power7 for .machine directive.

It fixes BZ#17869.

powerpc: Fix ifuncmain6pie failure with GCC 4.9

This patch fix the elf/ifuncmain6pie failure when building with GCC
4.9+. For some reason, the compiler removes the branch taken code at
resolve_ifunc (sysdeps/powerpc/powerpc64/dl-machine.h) as dead-code
and thus the testcase fails because the ifunc resolves branches to an
invalid memory location. It fixes by explicit adding a dependency of
value based on odp variable to avoid compiler optimization.

It fixes BZ#17868.

Also treat model numbers 0x5a/0x5d as Silvermont

Treat model numbers 0x4a/0x4d as Silvermont

* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Treat model numbers 0x4a/0x4d as Intel Silvermont architecture.

Also use uint64_t in __new_sem_wait_fast

Use uint64_t and (uint64_t) 1 for 64-bit int

This patch replaces unsigned long int and 1UL with uint64_t and
(uint64_t) 1 to support ILP32 targets like x32.

[BZ #17870]
* nptl/sem_post.c (__new_sem_post): Replace unsigned long int
with uint64_t.
* nptl/sem_waitcommon.c (__sem_wait_cleanup): Replace 1UL with
(uint64_t) 1.
(__new_sem_wait_slow): Replace unsigned long int with uint64_t.
Replace 1UL with (uint64_t) 1.
* sysdeps/nptl/internaltypes.h (new_sem): Replace unsigned long
int with uint64_t.

Add missing libc_hidden_weak to stub if_nameindex, if_freenameindex.

Add missing libc_hidden_def to stub getrlimit64.

soft-fp: Use __label__ for all labels within macros.

soft-fp has various macros containing labels and goto statements.
Because label names are function-scoped, this is problematic for using
the same macro more than once within a function, which some
architectures do in the Linux kernel (the soft-fp version there
predates the addition of any of these labels and gotos). This patch
fixes this by using __label__ to make the labels local to the block
with the __label__ declaration.

Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch.

* soft-fp/op-common.h (_FP_ADD_INTERNAL): Declare labels with
__label__.
(_FP_FMA): Likewise.
(_FP_TO_INT_ROUND): Likewise.
(_FP_FROM_INT): Likewise.

BZ #16418: Fix powerpc get_clockfreq raciness

This patch fix powerpc __get_clockfreq racy and cancel-safe issues by
dropping internal static cache and by using nocancel file operations.
The vDSO failure check is also removed, since kernel code does not
return an error (it cleans cr0.so bit on function return) and the static
code (to read value /proc) now uses non-cancellable calls.

tst-getpw: Rewrite.

The test is rewritten to look for the testable conditions and
exit once they are all detected. This prevents the test from
iterating over 2000 UIDs and looking up each one. It speeds up
the test and prevents it from failing if the system under test
has an NSS-based passwd that is slower than the test timeout.

See:
https://sourceware.org/ml/libc-alpha/2015-01/msg00394.html

Fix tst_wcscpy.c test.

Fix recursive dlopen.

The ability to recursively call dlopen is useful for malloc
implementations that wish to load other dynamic modules that
implement reentrant/AS-safe functions to use in their own
implementation.

Given that a user malloc implementation may be called by an
ongoing dlopen to allocate memory the user malloc
implementation interrupts dlopen and if it calls dlopen again
that's a reentrant call.

This patch fixes the issues with the ld.so.cache mapping
and the _r_debug assertion which prevent this from working
as expected.

See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00446.html

Fix semaphore destruction (bug 12674).

This commit fixes semaphore destruction by either using 64b atomic
operations (where available), or by using two separate fields when only
32b atomic operations are available. In the latter case, we keep a
conservative estimate of whether there are any waiting threads in one
bit of the field that counts the number of available tokens, thus
allowing sem_post to atomically both add a token and determine whether
it needs to call futex_wake.

See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00155.html

Regenerate INSTALL.

Update libc.pot:

In preparation for providing a tarball to the translation project.

* po/libc.pot: Regenerated.

Commit nios2 port to master.

S390: Get rid of linknamespace failures for utmp functions.

S390: Get rid of linknamespace failures for string functions.

Fix powerpc-nofpu fesetenv namespace (bug 17748).

When fixing namespace issues for <fenv.h> functions I missed one call
to fesetenv for powerpc-nofpu. This patch changes this to a call to
__fesetenv.

Tested for powerpc-nofpu; it fixes the previously observed math.h
linknamespace test failures.

[BZ #17748]
* sysdeps/powerpc/nofpu/feholdexcpt.c (__feholdexcept): Call
__fesetenv instead of fesetenv.

[s390] Define a __tls_get_addr macro to avoid declaring it again

commit 050f7298e1ecc39887c329037575ccd972071255 added an extern
declaration for __tls_get_addr that conflicts with the one in s390
dl-tls.h, based on whether __tls_get_addr is defined as a macro.  The
rationale seems to be based on the assumption that __tls_get_addr is
exported for every architecture and hence an internal non-plt alias is
needed.  This is not true for s390 though, since it exports
__tls_get_offset and not __tls_get_addr.  This results in tst-audit9
being stuck in an infinite loop.

This patch fixes this by defining a __tls_get_addr macro to itself so
as to not use the conflicting declaration.

powerpc: Fix POWER7/PPC64 performance regression on LE

This patch fixes a performance regression on the POWER7/PPC64 memcmp
porting for Little Endian.  The LE code uses 'ldbrx' instruction to read
the memory on byte reversed form, however ISA 2.06 just provide the indexed
form which uses a register value as additional index, instead of a fixed value
enconded in the instruction.

And the port strategy for LE uses r0 index value and update the address
value on each compare loop interation.  For large compare size values,
it adds 8 more instructions plus some more depending of trailing
size.  This patch fixes it by adding pre-calculate indexes to remove the
address update on loops and tailing sizes.

For large sizes it shows a considerable gain, with double performance
pairing with BE.

powerpc: Optimized strncmp for POWER8/PPC64

This patch adds an optimized POWER8 strncmp. The implementation focus
on speeding up unaligned cases follwing the ideas of power8 strcmp.

The algorithm first check the initial 16 bytes, then align the first
function source and uses unaligned loads on second argument only.
Aditional checks for page boundaries are done for unaligned cases
(where sources alignment are different).