Previously, the UCS4 conversion routines limited the number of
characters they examined to the minimum of the number of characters in
the input and the number of characters in the output. This is not the
correct behavior when __GCONV_IGNORE_ERRORS is set, because we do not
consume an output character when we skip a code unit. Instead, track
the input and output pointers and terminate the loop when either
reaches its limit.
This resolves assertion failures when resetting the input buffer in a step of
iconv, which assumes that the input will be fully consumed given sufficient
output space.
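As a rough illustration (hypothetical names, not the actual gconv loop), the fixed loop shape looks like this: the input and output pointers advance independently, and an invalid code unit consumes input without producing output.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only: stop when either pointer reaches its limit, instead
       of iterating min (input count, output count) times.  */
    static size_t
    ucs4_convert_sketch (const uint32_t **inp, const uint32_t *inend,
                         uint32_t **outp, uint32_t *outend,
                         bool ignore_errors)
    {
      size_t skipped = 0;
      while (*inp < inend && *outp < outend)
        {
          uint32_t ch = **inp;
          if (ch > 0x10ffff)            /* Invalid code unit.  */
            {
              if (!ignore_errors)
                break;
              ++*inp;                   /* Consume input, emit nothing.  */
              ++skipped;
              continue;
            }
          *(*outp)++ = ch;
          ++*inp;
        }
      return skipped;
    }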
Arjun Shankar [Wed, 4 Nov 2020 11:19:38 +0000 (12:19 +0100)]
iconv: Accept redundant shift sequences in IBM1364 [BZ #26224]
The IBM1364, IBM1371, IBM1388, IBM1390 and IBM1399 character sets
share converter logic (iconvdata/ibm1364.c) which would reject
redundant shift sequences when processing input in these character
sets. This led to a hang in the iconv program (CVE-2020-27618).
This commit adjusts the converter to ignore redundant shift sequences
and adds test cases for iconv_prog hangs that would be triggered upon
their rejection. This brings the implementation in line with other
converters that also ignore redundant shift sequences (e.g. IBM930
etc., fixed in commit 692de4b3960d).
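A minimal sketch of the tolerant behavior (hypothetical names; the real converter lives in iconvdata/ibm1364.c): a shift byte that selects the mode the converter is already in is consumed and ignored rather than rejected.

    enum mode { SB_MODE, DB_MODE };     /* Single- / double-byte mode.  */

    #define SI 0x0f                     /* Shift in: single-byte mode.  */
    #define SO 0x0e                     /* Shift out: double-byte mode.  */

    /* Returns nonzero if BYTE was a shift byte and has been consumed.
       A redundant shift (selecting the current mode) is simply ignored.  */
    static int
    handle_shift (enum mode *curcs, unsigned char byte)
    {
      if (byte == SI)
        {
          *curcs = SB_MODE;
          return 1;
        }
      if (byte == SO)
        {
          *curcs = DB_MODE;
          return 1;
        }
      return 0;                         /* Not a shift byte.  */
    }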
DJ Delorie [Thu, 25 Feb 2021 21:08:21 +0000 (16:08 -0500)]
nscd: Fix double free in netgroupcache [BZ #27462]
In commit 745664bd798ec8fd50438605948eea594179fba1 a use-after-free
was fixed, but this led to an occasional double-free. This patch
tracks the "live" allocation better.
Florian Weimer [Wed, 27 Jan 2021 12:36:12 +0000 (13:36 +0100)]
gconv: Fix assertion failure in ISO-2022-JP-3 module (bug 27256)
The conversion loop to the internal encoding does not follow
the interface contract that __GCONV_FULL_OUTPUT is only returned
after the internal wchar_t buffer has been filled completely. This
is enforced by the first of the two asserts in iconv/skeleton.c:
    /* We must run out of output buffer space in this
       rerun.  */
    assert (outbuf == outerr);
    assert (nstatus == __GCONV_FULL_OUTPUT);
This commit solves this issue by queuing a second wide character
which cannot be written immediately in the state variable, like
other converters already do (e.g., BIG5-HKSCS or TSCII).
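A sketch of the queuing scheme (hypothetical names; BIG5-HKSCS and TSCII use the same idea): when a sequence expands to two wide characters and only one slot is free, emit the first and park the second in the conversion state for the next call.

    #include <stdbool.h>
    #include <wchar.h>

    struct conv_state { wchar_t pending; };   /* L'\0': nothing queued.  */

    /* Returns false only if there was no room even for the first char,
       i.e. the output buffer is genuinely full.  */
    static bool
    emit_two (struct conv_state *st, wchar_t c1, wchar_t c2,
              wchar_t **outp, wchar_t *outend)
    {
      if (*outp >= outend)
        return false;
      *(*outp)++ = c1;
      if (*outp < outend)
        *(*outp)++ = c2;                /* Room for both.  */
      else
        st->pending = c2;               /* Queue for the next call.  */
      return true;
    }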
H.J. Lu [Mon, 28 Dec 2020 13:28:49 +0000 (05:28 -0800)]
x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in an unrelocated executable also
leads to a segfault. Issue a fatal error message when an IFUNC function
defined in the unrelocated executable is called from a shared library.
On x86, ifuncmain6pie failed with:
[hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct
./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency.
[hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo
00003ff4  00000706 R_386_GLOB_DAT    0000400c   foo_ptr
00003ff8  00000406 R_386_GLOB_DAT    00000000   foo
0000400c  00000401 R_386_32          00000000   foo
[hjl@gnu-cfl-2 build-i686-linux]$
Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which
trigger the circular IFUNC dependency, and build ifuncmain6pie with
-Wl,-z,lazy.
H.J. Lu [Tue, 12 Jan 2021 13:15:49 +0000 (05:15 -0800)]
x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow. This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:
    cmpl    $63, %ecx
    // Don't use "rep movsb" if ECX <= 63
    jbe     L(Don't use rep movsb)
    Use "rep movsb"
Benchtest data from bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that the
performance impact is within the noise range, as "rep movsb" is only
used for data sizes >= 4KB.
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)
In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.
Wilco Dijkstra [Wed, 11 Mar 2020 17:15:25 +0000 (17:15 +0000)]
[AArch64] Improve integer memcpy
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
Krzysztof Koch [Tue, 5 Nov 2019 17:35:18 +0000 (17:35 +0000)]
aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.
Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.
Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html
Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final versions of memcpy.S.
Tested against GLIBC testsuite and randomized tests.
Wilco Dijkstra [Fri, 28 Aug 2020 16:51:40 +0000 (17:51 +0100)]
AArch64: Improve backwards memmove performance
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses. So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.
Add a new memcpy using 128-bit Q registers - this is faster on modern
cores and reduces codesize. Similar to the generic memcpy, small cases
include copies up to 32 bytes. 64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies. Large copies align
the source rather than the destination.
bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
so make this memcpy the default on N1 (on Centriq it is 15% faster than
memcpy_falkor).
arm: CVE-2020-6096: Fix multiarch memcpy for negative length [BZ #25620]
Use unsigned branch instructions on r2 to fix the incorrect behavior
when a negative length is passed to memcpy.
This commit fixes the armv7 version.
arm: CVE-2020-6096: fix memcpy and memmove for negative length [BZ #25620]
Use unsigned branch instructions on r2 to fix the incorrect behavior
when a negative length is passed to memcpy and memmove.
This commit fixes the generic arm implementation of memcpy and memmove.
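To see why a negative length is dangerous at all, consider what it becomes at the memcpy interface (standalone illustration, not glibc code):

    #include <stdio.h>
    #include <string.h>

    int
    main (void)
    {
      int len = -1;
      /* memcpy's size parameter is size_t, so -1 becomes SIZE_MAX.  An
         implementation that compares the length with signed branches
         sees a negative number and takes the wrong path; unsigned
         branches see one huge, consistent value.  */
      printf ("%zu\n", (size_t) len);   /* 4294967295 on 32-bit arm.  */
      char dst[8], src[8] = "abc";
      memcpy (dst, src, sizeof src);    /* A bounded copy, for contrast.  */
      return 0;
    }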
strcmp-avx2.S: In the avx2 strncmp function, strings are compared in
chunks of 4 vectors (i.e., 32x4 = 128 bytes for avx2). After the first
4-vector comparison, the code must check whether it has already passed
the given offset. This patch implements the avx2 offset check for
strncmp, for the case where both strings compare equal over the first
4 vectors.
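A scalar C rendering of the fixed loop shape (a sketch only: memcmp stands in for the vector comparison, and the final partial block, page-boundary, and NUL handling are all omitted):

    #include <stddef.h>
    #include <string.h>

    #define BLOCK 128                   /* 4 x 32-byte AVX2 vectors.  */

    static int
    strncmp_blocks_sketch (const char *s1, const char *s2, size_t n)
    {
      size_t offset = 0;
      for (;;)
        {
          int r = memcmp (s1 + offset, s2 + offset, BLOCK);
          if (r != 0)
            return r;
          offset += BLOCK;
          if (offset >= n)              /* The added check: the limit
                                           has been reached, stop.  */
            return 0;
        }
    }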
Florian Weimer [Tue, 19 May 2020 12:09:38 +0000 (14:09 +0200)]
nss_compat: internal_end*ent may clobber errno, hiding ERANGE [BZ #25976]
During cleanup, before returning from get*_r functions, the end*ent
calls must not change errno. Otherwise, an ERANGE error from the
underlying implementation can be hidden, causing unexpected lookup
failures. This commit introduces an internal_end*ent_noerror
function which saves and restores errno, and marks the original
internal_end*ent function as warn_unused_result, so that it is used
only in contexts where errors from it can be handled explicitly.
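The pattern described reduces to this (sketch; the fclose-based cleanup here is a stand-in for the real internal_end*ent functions):

    #include <errno.h>
    #include <stdio.h>

    /* Stand-in for the real internal_end*ent cleanup, which may fail
       and set errno.  */
    static void
    internal_endent (FILE **stream)
    {
      if (*stream != NULL && fclose (*stream) != 0)
        ;                               /* Error deliberately ignored.  */
      *stream = NULL;
    }

    /* Wrapper that preserves errno, so an earlier ERANGE from the
       underlying lookup is not hidden from the caller.  */
    static void
    internal_endent_noerror (FILE **stream)
    {
      int saved_errno = errno;
      internal_endent (stream);
      errno = saved_errno;
    }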
H.J. Lu [Wed, 29 Apr 2020 20:20:27 +0000 (13:20 -0700)]
Add C wrappers for process_vm_readv/process_vm_writev [BZ #25810]
Since the U marker can only be applied to 2 unsigned long arguments
in syscalls.list files, add C wrappers for the process_vm_readv and
process_vm_writev syscalls, which have more than 2 unsigned long arguments.
H.J. Lu [Wed, 29 Apr 2020 15:08:40 +0000 (08:08 -0700)]
Mark unsigned long arguments with U in more syscalls [BZ #25810]
Mark unsigned long arguments in mmap, read, recv, recvfrom, send, sendto,
write, ioperm, sendfile64, setxattr, lsetxattr, fsetxattr, getxattr,
lgetxattr, fgetxattr, listxattr, llistxattr and flistxattr with U in
syscalls.list files.
H.J. Lu [Wed, 29 Apr 2020 12:35:34 +0000 (05:35 -0700)]
Add SYSCALL_ULONG_ARG_[12] to pass long to syscall [BZ #25810]
X32 has a 32-bit long and pointer with a 64-bit off_t. Since the x32
psABI requires that pointers passed in registers be zero-extended to
64 bits, x32 can share many syscall interfaces with LP64. When an LP64
syscall with long and unsigned long int arguments is used for x32, these
arguments must be properly extended to 64 bits. Otherwise, if the upper
32 bits of the register have an undefined value, such a syscall will be
rejected by the kernel.
For syscalls implemented in assembly code, 'U' is added to the syscall
signature key letters for unsigned long, which is zero-extended to
64 bits. SYSCALL_ULONG_ARG_1 and SYSCALL_ULONG_ARG_2 are passed
to syscall-template.S for the first and second unsigned long int
arguments if PSEUDOS_HAVE_ULONG_INDICES is defined. They are used by
x32 to zero-extend 32-bit arguments to 64 bits.
Tested on i386, x86-64 and x32 as well as with build-many-glibcs.py.
H.J. Lu [Mon, 13 Apr 2020 17:31:26 +0000 (10:31 -0700)]
x32: Properly pass long to syscall [BZ #25810]
X32 has a 32-bit long and pointer with a 64-bit off_t. Since the x32
psABI requires that pointers passed in registers be zero-extended to
64 bits, x32 can share many syscall interfaces with LP64. When an LP64
syscall with long and unsigned long arguments is used for x32, these
arguments must be properly extended to 64 bits. Otherwise, if the upper
32 bits of the register have an undefined value, such a syscall will be
rejected by the kernel.
Enforce zero-extension for pointers and array system call arguments.
For integer types, extend to int64_t (the full register) using a
regular cast, resulting in zero or sign extension based on the
signedness of the original type.
For
    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, off_t offset);
1. addr is unchanged.
2. length is zero-extended to 64 bits.
3. prot is sign-extended to 64 bits.
4. flags is sign-extended to 64 bits.
5. fd is sign-extended to 64 bits.
6. offset is unchanged.
For int arguments, since the kernel uses only the lower 32 bits and
ignores the upper 32 bits in 64-bit registers, these work correctly.
Tested on x86-64 and x32. There are no code changes on x86-64.
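The extension rules stated above reduce to these conversions (illustrative helpers, not the glibc macros):

    #include <stdint.h>

    /* x32: long and pointers are 32-bit, but syscall arguments travel
       in full 64-bit registers.  */

    static inline uint64_t
    arg_pointer (const void *p)
    {
      return (uint64_t) (uintptr_t) p;  /* Pointers: zero-extend.  */
    }

    static inline uint64_t
    arg_long (long x)
    {
      return (uint64_t) (int64_t) x;    /* Signed types: sign-extend.  */
    }

    static inline uint64_t
    arg_ulong (unsigned long x)
    {
      return (uint64_t) x;              /* Unsigned long ('U'): zero-extend.  */
    }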
Fix data race in setting function descriptors during lazy binding on hppa.
This addresses an issue that is present mainly on SMP machines running
threaded code. In a typical indirect call or PLT import stub, the
target address is loaded first. Then the global pointer is loaded into
the PIC register in the delay slot of a branch to the target address.
During lazy binding, the target address is a trampoline which transfers
to _dl_runtime_resolve().
_dl_runtime_resolve() uses the relocation offset stored in the global
pointer and the linkage map stored in the trampoline to find the
relocation. Then, the function descriptor is updated.
In a multi-threaded application, it is possible for the global pointer
to be updated between the load of the target address and the load of
the global pointer. When this happens, the relocation offset has been
replaced by the new global pointer.
by the new global pointer. The function pointer has probably been
updated as well but there is no way to find the address of the function
descriptor and to transfer to the target. So, _dl_runtime_resolve()
typically crashes.
HP-UX addressed this problem by adding an extra pc-relative branch to
the trampoline. The descriptor is initially setup to point to the
branch. The branch then transfers to the trampoline. This allowed
the trampoline code to figure out which descriptor was being used
without any modification to user code. I didn't use this approach
as it is more complex and changes function pointer canonicalization.
The order of loading the target address and global pointer in
indirect calls was not consistent with the order used in import stubs.
In particular, $$dyncall and some inline versions of it loaded the
global pointer first. This was inconsistent with the global pointer
being updated first in dl-machine.h. Assuming the accesses are
ordered, we want elf_machine_fixup_plt() to store the global pointer
first and calls to load it last. Then, the global pointer will be
correct when the target function is entered.
However, just to make things more fun, HP added support for
out-of-order execution of accesses in PA 2.0. The accesses used by
calls are weakly ordered. So, it is possible under some circumstances
that a function might be entered with the wrong global pointer.
However, HP uses weakly ordered accesses in 64-bit HP-UX, so I assume
that loading the global pointer in the delay slot of the branch must
work consistently.
The basic fix for the race is a combination of modifying user code to
preserve the address of the function descriptor in register %r22 and
setting the least-significant bit in the relocation offset. The
latter was suggested by Carlos as a way to distinguish relocation
offsets from global pointer values. Conventionally, %r22 is used
as the address of the function descriptor in calls to $$dyncall.
So, it wasn't hard to preserve the address in %r22.
I have updated gcc trunk and gcc-9 branch to not clobber %r22 in
$$dyncall and inline indirect calls. I have also modified the import
stubs in binutils trunk and the 2.33 branch to preserve %r22. This
required making the stubs one instruction longer but we save one
relocation. I also modified binutils to align the .plt section on
an 8-byte boundary. This allows descriptors to be updated atomically
with a floating-point store.
With these changes, _dl_runtime_resolve() can fall back to an alternate
mechanism to find the relocation offset when it has been clobbered.
There's just one additional instruction in the fast path. I tested
the fallback function, _dl_fix_reloc_arg(), by changing the branch to
always use the fallback. Old code still runs as it did before.
As indicated by kernel developers [1], the sigreturn stub cannot change
the register window or the stack pointer, since the kernel has set up
the restore frame at a precise location relative to the stack pointer
when the stub is invoked.
I tried playing with some compiler flags, and even _Noreturn and
__builtin_unreachable after the asm do not help (and SPARC does not
support naked functions).
To avoid similar issues, since the stack-protector support has also
stumbled over this, this patch moves the implementation of the
sigreturn stubs to assembly.
Checked on sparcv9-linux-gnu and sparc64-linux-gnu with gcc 9.2.1
and gcc 7.5.0.
The commit "arm: Split BE/LE abilist"
(1673ba87fefe019c834c09d33673d1d453ea698d) changed the soft-fp order for
ARM selection when __SOFTFP__ is defined by the compiler.
It makes the build select some routines (fadd, fdiv, fmul, fsub, and fma)
from ieee754/flt-32 and ieee754/dbl-64 that require fenv support to be
correctly rounded, which in turn leads to math failures, since
__SOFTFP__ does not have fenv support.
i386: Use comdat instead of .gnu.linkonce for i386 setup pic register (BZ #20543)
GCC no longer uses .gnu.linkonce for the i386 PIC register setup with
the minimum supported compiler version (likewise, the minimum supported
binutils supports comdat).
Trying to pinpoint when binutils added comdat support for i686, it
seems it was around 2004 [1]. Checking with some ancient binutils
older than 2.16, I see:
test.o: In function `__x86.get_pc_thunk.bx':
test.o(.text.__x86.get_pc_thunk.bx+0x0): multiple definition of `__x86.get_pc_thunk.bx'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../i386-linux-gnu/crti.o(.gnu.linkonce.t.__x86.get_pc_thunk.bx+0x0): first defined here
It seems that such versions cannot handle either comdat at all or
a mix of linkonce and comdat. With binutils 2.16.1 I get a different
issue trying to link a binary with a more recent crti.o (an
unrecognized relocation (0x2b) in section `.init', which is
R_386_GOT32X; old binutils won't generate it anyway).
So it is unlikely that someone will use a binutils older than the one
used to build glibc, and even that scenario may fail with issues such
as the R_386_GOT32X one. Also, 2.16.1 is quite old and not really
supported (glibc itself requires 2.25).
Andreas Schwab [Mon, 20 Jan 2020 16:01:50 +0000 (17:01 +0100)]
Fix array overflow in backtrace on PowerPC (bug 25423)
When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address.
Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
Joseph Myers [Wed, 12 Feb 2020 23:31:56 +0000 (23:31 +0000)]
Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487).
Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero
argument (a representation where all the significand bits, including
the explicit high bit, are zero, but the exponent is not zero, which
is not a valid representation for the long double type).
Although this is not a valid long double representation, existing
practice in this area (see bug 4586, originally marked invalid but
subsequently fixed) is that we still seek to avoid invalid memory
accesses as a result, in case of programs that treat arbitrary binary
data as long double representations, although the invalid
representations of the ldbl-96 format do not need to be consistently
handled the same as any particular valid representation.
This patch makes the range reduction detect pseudo-zero and unnormal
representations that would otherwise go to __kernel_rem_pio2, and
returns a NaN for them instead of continuing with the range reduction
process. (Pseudo-zero and unnormal representations whose unbiased
exponent is less than -1 have already been safely returned from the
function before this point without going through the rest of range
reduction.) Pseudo-zero representations would previously result in
the value passed to __kernel_rem_pio2 being all-zero, which is
definitely unsafe; unnormal representations would previously result in
a value passed whose high bit is zero, which might well be unsafe
since that is not a form of input expected by __kernel_rem_pio2.
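A sketch of the detection (x86 little-endian ldbl-96 layout assumed; the real code uses glibc's GET_LDOUBLE_* accessors):

    #include <stdint.h>
    #include <string.h>

    /* Pseudo-zero / unnormal: biased exponent nonzero, but the explicit
       integer bit (bit 63 of the significand) is clear.  */
    static int
    is_pseudo_or_unnormal (long double x)
    {
      unsigned char b[10];
      memcpy (b, &x, sizeof b);         /* Low 10 bytes hold the value.  */
      uint64_t mantissa;
      uint16_t se;                      /* Sign + 15-bit biased exponent.  */
      memcpy (&mantissa, b, sizeof mantissa);
      memcpy (&se, b + 8, sizeof se);
      return (se & 0x7fff) != 0 && (mantissa >> 63) == 0;
    }

    /* Range reduction can then return a NaN for such inputs instead of
       calling __kernel_rem_pio2 with a bogus significand.  */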
Fangrui Song [Wed, 5 Feb 2020 05:55:44 +0000 (21:55 -0800)]
Improve IFUNC check [BZ #25506]
GNU ld's RISCV port does not support IFUNC. ld -no-pie produces no
relocation and the test passed incorrectly. Be more rigid by testing
IRELATIVE explicitly.
malloc/tst-mallocfork2: Kill lingering process for unexpected failures
If the test fails due to some unexpected failure after the children
are created, either in the signal handler (by calling abort) or in the
main loop, the created children might not be killed properly.
This patch fixes it by:
* Avoiding aborting in the signal handler: set a flag that an error
has occurred and check it in the main loop.
* Adding an atexit handler to kill the child processes, as sketched
below.
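A sketch of both changes (hypothetical names and child count):

    #include <signal.h>
    #include <stdlib.h>
    #include <sys/types.h>

    #define NCHILDREN 8
    static pid_t pids[NCHILDREN];
    static volatile sig_atomic_t error_seen;

    /* Signal handler: record the error instead of calling abort, so
       the children can still be cleaned up.  */
    static void
    handler (int sig)
    {
      (void) sig;
      error_seen = 1;                   /* Checked in the main loop.  */
    }

    /* atexit handler: make sure the children die on every exit path.  */
    static void
    kill_children (void)
    {
      for (int i = 0; i < NCHILDREN; ++i)
        if (pids[i] > 0)
          kill (pids[i], SIGKILL);
    }

    /* In main, after forking:  atexit (kill_children);  */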
riscv: Avoid clobbering register parameters in syscall
The riscv INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with riscv64-linux-gnu-rv64imafdc-lp64d build.
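The shape of the fix, in miniature (an illustrative one-argument syscall; the real macros handle all argument counts):

    /* Evaluate the (possibly register-clobbering) argument expression
       into an ordinary temporary *before* binding any local register
       variables, per GCC manual 6.47.5.2.  */
    static long
    riscv_syscall1_sketch (long nr, long (*compute_arg) (void))
    {
      long tmp = compute_arg ();                 /* May clobber a0/a7.  */
      register long a7 __asm__ ("a7") = nr;
      register long a0 __asm__ ("a0") = tmp;     /* Safe: plain copy.  */
      __asm__ volatile ("ecall"
                        : "+r" (a0)
                        : "r" (a7)
                        : "memory");
      return a0;
    }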
microblaze: Avoid clobbering register parameters in syscall
The microblaze INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with microblaze-linux-gnu and microblazeel-linux-gnu build.
hppa: Align __clone stack argument to 8 bytes (Bug 25066)
The hppa architecture requires strict alignment for loads and stores.
As a result, the minimum stack alignment that will work is 8 bytes.
This patch adjusts __clone() to align the stack argument passed to it.
It also makes some slight formatting adjustments.
Florian Weimer [Fri, 17 Jan 2020 14:11:20 +0000 (15:11 +0100)]
Remove incorrect alloc_size attribute from pvalloc [BZ #25401]
pvalloc is guarantueed to round up the allocation size to the page
size, so applications can assume that the memory region is larger
than the passed-in argument. The alloc_size attribute cannot express
that.
The test case is based on a suggestion from Jakub Jelinek.
Florian Weimer [Tue, 12 Nov 2019 11:25:49 +0000 (12:25 +0100)]
login: Use pread64 in utmp implementation
This reduces the possible error scenarios considerably because
no longer can file seek fail, leaving the file descriptor in an
inconsistent state and out of sync with the cache.
As a result, it is possible to avoid setting file_offset to -1
to make an error persistent. Instead, subsequent calls will retry
the operation and report any errors returned by the kernel.
This change also avoids reading the file from the start if pututline
is called multiple times, to work around lock acquisition failures
due to timeouts.
Florian Weimer [Tue, 12 Nov 2019 11:02:57 +0000 (12:02 +0100)]
login: Introduce matches_last_entry to utmp processing
This simplifies internal_getut_nolock and fixes a regression,
introduced in commit be6b16d975683e6cca57852cd4cfe715b2a9d8b1
("login: Acquire write lock early in pututline [BZ #24882]")
in pututxline because __utmp_equal can only compare process-related
utmp entries.
Fixes: be6b16d975683e6cca57852cd4cfe715b2a9d8b1
Change-Id: Ib8a85002f7f87ee41590846d16d7e52bdb82f5a5
(cherry picked from commit 76a7c103eb9060f9e3ba01d073ae4621a17d8b46)
Florian Weimer [Thu, 7 Nov 2019 17:15:18 +0000 (18:15 +0100)]
login: Acquire write lock early in pututline [BZ #24882]
It has been reported that due to lack of fairness in POSIX file
locking, the current reader-to-writer lock upgrade can result in
lack of forward progress. Acquiring the write lock directly
hopefully avoids this issue if there are only writers.
This also fixes bug 24882 due to the cache revalidation in
__libc_pututline.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I57e31ae30719e609a53505a0924dda101d46372e
(cherry picked from commit be6b16d975683e6cca57852cd4cfe715b2a9d8b1)
Commit 7532837d7b03b3ca5b9a63d77a5bd81dd23f3d9c ("The
-Wstringop-truncation option new in GCC 8 detects common misuses")
added __attribute_nonstring__ to bits/utmp.h, but it did not update
the parallel bits/utmpx.h header. In struct utmp, the nonstring
attribute for ut_id was missing.
Florian Weimer [Thu, 7 Nov 2019 08:53:41 +0000 (09:53 +0100)]
login: Remove double-assignment of fl.l_whence in try_file_lock
Since l_whence is the second member of struct flock, it is written
twice. The double-assignment is technically undefined behavior due to
the lack of a sequence point.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I2baf9e70690e723c61051b25ccbd510aec15976c
(cherry picked from commit b0a83ae71b2588bd2a9e6b40f95191602940e01e)
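The pattern, reconstructed as a sketch (the real code is in the utmp file locking): the buggy form used an assignment expression as the second initializer element, which both positionally initializes l_whence and stores to it.

    #include <fcntl.h>
    #include <unistd.h>

    static int
    try_lock_sketch (int fd, short type)
    {
      /* Buggy form (two unsequenced writes to l_whence):
           struct flock fl =
             {
               .l_type = type,
               fl.l_whence = SEEK_SET,
             };
         Fixed form, with a designated initializer:  */
      struct flock fl =
        {
          .l_type = type,
          .l_whence = SEEK_SET,
        };
      return fcntl (fd, F_SETLKW, &fl);
    }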
Florian Weimer [Wed, 28 Aug 2019 09:59:45 +0000 (11:59 +0200)]
login: pututxline could fail to overwrite existing entries [BZ #24902]
The internal_getut_r function updates the file_offset variable and
therefore must always update last_entry as well.
Previously, if pututxline could not upgrade the read lock to a
write lock, internal_getut_r would update file_offset only,
without updating last_entry, and a subsequent call would not
overwrite the existing utmpx entry at file_offset, instead
creating a new entry. This has been observed to cause unbounded
file growth in high-load situations.
This commit removes the buffer argument to internal_getut_r and
updates the last_entry variable directly, along with file_offset.
Florian Weimer [Thu, 15 Aug 2019 14:09:20 +0000 (16:09 +0200)]
login: Use struct flock64 in utmp [BZ #24880]
Commit 06ab719d30b01da401150068054d3b8ea93dd12f ("Fix Linux fcntl OFD
locks for non-LFS architectures (BZ#20251)") introduced the use of
fcntl64 into the utmp implementation. However, the lock file
structure was not updated to struct flock64 at that point.
Florian Weimer [Thu, 15 Aug 2019 14:09:05 +0000 (16:09 +0200)]
login: Disarm timer after utmp lock acquisition [BZ #24879]
If the file processing takes a long time for some reason, SIGALRM can
arrive while the file is still being processed. At that point, file
access will fail with EINTR. Disarming the timer after lock
acquisition avoids that. (If there was a previous alarm, it is the
responsibility of the caller to deal with the EINTR error.)
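In outline (a sketch with a hypothetical timeout value):

    #include <fcntl.h>
    #include <unistd.h>

    #define LOCK_TIMEOUT 10             /* Seconds; hypothetical value.  */

    static int
    lock_with_timeout (int fd, struct flock *fl)
    {
      alarm (LOCK_TIMEOUT);             /* Bound only the blocking lock.  */
      int rc = fcntl (fd, F_SETLKW, fl);
      alarm (0);                        /* Disarm before file processing,
                                           so SIGALRM cannot cause EINTR
                                           during reads and writes.  */
      return rc;
    }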
Florian Weimer [Thu, 15 Aug 2019 08:30:23 +0000 (10:30 +0200)]
login: Fix updwtmp, updwtmx unlocking
Commit 5a3afa9738f3dbbaf8c0a35665318c1af782111b (login: Replace
macro-based control flow with function calls in utmp) introduced
a regression because after it, __libc_updwtmp attempts to unlock
the wrong file descriptor.
There is just one file-based implementation, so this dispatch
mechanism is unnecessary. Instead of the vtable pointer
__libc_utmp_jump_table, use a non-negative file_fd as the indicator
that the backend is initialized.
Florian Weimer [Thu, 5 Dec 2019 16:29:42 +0000 (17:29 +0100)]
misc/test-errno-linux: Handle EINVAL from quotactl
In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
argument, causing the test to fail due to too restrictive test
expectations.
Kamlesh Kumar [Thu, 5 Dec 2019 15:54:47 +0000 (16:54 +0100)]
<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232]
Without the asm redirects, strchr et al. are not const-correct.
libc++ has a wrapper header that works with and without
__CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when
Clang is used with libstdc++ or just C headers, the overloaded functions
with the correct types are not declared.
This change does not impact current GCC (with libstdc++ or libc++).
Florian Weimer [Tue, 3 Dec 2019 19:26:28 +0000 (20:26 +0100)]
x86: Assume --enable-cet if GCC defaults to CET [BZ #25225]
This links in CET support if GCC defaults to CET. Otherwise, __CET__
is defined, yet CET functionality is not compiled and linked into the
dynamic loader, resulting in a linker failure due to undefined
references to _dl_cet_check and _dl_open_check.
Florian Weimer [Thu, 28 Nov 2019 13:18:12 +0000 (14:18 +0100)]
libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]
Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable
vtable validation in case of interposition [BZ #23313]") only covered
the interposable glibc 2.1 handles, in libio/stdfiles.c. The
parallel code in libio/oldstdfiles.c needs similar detection logic.
Stefan Liebler [Wed, 27 Nov 2019 11:35:40 +0000 (12:35 +0100)]
S390: Fix handling of needles crossing a page in strstr z15 ifunc-variant. [BZ #25226]
If the specified needle crosses a page-boundary, the s390-z15 ifunc variant of
strstr truncates the needle which results in invalid results.
This is fixed by loading the needle beyond the page boundary to v18 instead of v16.
The bug is sometimes observable in test-strstr.c in check1 and check2,
as the haystack and needle are stored on the stack; thus the needle can
be on a page boundary. check2 is now extended to test haystacks and
needles located on the stack, at the end of a page, and across two
pages.
rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]
The problem was introduced in glibc 2.23, in commit b9eb92ab05204df772eb4929eccd018637c9f3e9
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").
Don't use a custom wrapper macro around __has_include (bug 25189).
This causes issues when using clang with -frewrite-includes to, e.g.,
submit the translation unit to a distributed compiler.
In my case, I was building Firefox using sccache.
See [1] for a reduced test-case since I initially thought this was a
clang bug, and [2] for more context.
Apparently doing this is invalid C++ per [cpp.cond], which mentions [3]:
> The #ifdef and #ifndef directives, and the defined conditional
> inclusion operator, shall treat __has_include and __has_cpp_attribute
> as if they were the names of defined macros. The identifiers
> __has_include and __has_cpp_attribute shall not appear in any context
> not mentioned in this subclause.
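The fix is to test __has_include directly in the conditional rather than routing it through a wrapper macro (sketch; <sys/random.h> is just an example header):

    /* Problematic: a wrapper macro such as
         #define __glibc_has_include(header) __has_include (header)
       places __has_include in a context the standard does not allow.
       Test it directly instead:  */
    #if defined __has_include
    # if __has_include (<sys/random.h>)
    #  include <sys/random.h>
    # endif
    #endif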
DJ Delorie [Wed, 30 Oct 2019 22:03:14 +0000 (18:03 -0400)]
Base max_fast on alignment, not width, of bins (Bug 24903)
set_max_fast sets the "impossibly small" value based on, eventually,
MALLOC_ALIGNMENT. The comparisons for the smallest chunk used are,
eventually, against MIN_CHUNK_SIZE. Note that i386 is the only platform
where these are the same, so a smallest chunk *would* be put in a
no-fastbins fastbin.
This change calculates the "impossibly small" value
based on MIN_CHUNK_SIZE instead, so that we can know it will
always be impossibly small.
malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
Fixes `<total type="rest" size="...">` incorrectly showing as 0 most
of the time.
The rest value being wrong is significant because to compute the
actual amount of memory handed out via malloc, the user must subtract
it from <system type="current" size="...">. That result being wrong
makes investigating memory fragmentation issues like
<https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to
impossible.
Paul A. Clarke [Thu, 19 Sep 2019 16:11:04 +0000 (11:11 -0500)]
[powerpc] No need to enter "Ignore Exceptions Mode"
Since at least POWER8, there is no performance advantage to entering
"Ignore Exceptions Mode", and doing so conditionally requires
- the conditional logic, and
- a system call.
Paul A. Clarke [Thu, 19 Sep 2019 19:04:45 +0000 (14:04 -0500)]
[powerpc] Rename fesetenv_mode to fesetenv_control
fesetenv_mode is used variously to write the FPSCR exception enable
bits and rounding mode bits. These are referred to as the control
bits in the POWER ISA. Change the name to be reflective of its
current and expected use, and match up well with fegetenv_control.
libc_feholdsetround_noex_ppc_ctx currently performs:
1. Read FPSCR, save to context.
2. Create new FPSCR value: clear enables and set new rounding mode.
3. Write new value to FPSCR.
Since other bits just pass through, there is no need to write them.
Instead, write just the changed values (enables and rounding mode),
which can be a bit more efficient.
Paul A. Clarke [Thu, 19 Sep 2019 16:58:46 +0000 (11:58 -0500)]
[powerpc] Rename fegetenv_status to fegetenv_control
fegetenv_status is used variously to retrieve the FPSCR exception enable
bits, rounding mode bits, or both. These are referred to as the control
bits in the POWER ISA. FPSCR status bits are also returned by the
'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
uses of fegetenv_status. Change the name to be reflective of its
current and expected use.
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
Paul A. Clarke [Thu, 19 Sep 2019 16:39:44 +0000 (11:39 -0500)]
[powerpc] __fesetround_inline optimizations
On POWER9, use more efficient means to update the 2-bit rounding mode
via the 'mffscrn' instruction (instead of two 'mtfsb0/1' instructions
or one 'mtfsfi' instruction that modifies 4 bits).
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
ROUND_TO_ODD and a couple of other places use libc_feupdateenv_test to
restore the rounding mode and exception enables, preserve exception flags,
and test whether given exception(s) were generated.
If the exception flags haven't changed, then it is sufficient and a bit
more efficient to just restore the rounding mode and enables, rather than
writing the full Floating-Point Status and Control Register (FPSCR).
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Paul A. Clarke [Thu, 19 Sep 2019 13:35:16 +0000 (08:35 -0500)]
[powerpc] SET_RESTORE_ROUND optimizations and bug fix
SET_RESTORE_ROUND brackets a block of code, temporarily setting and
restoring the rounding mode and letting everything else, including
exceptions generated within the block, pass through.
On powerpc, the current code clears the exception enables, which will hide
exceptions generated within the block. This issue was introduced by me
in commit e905212627350d54b58426214b5a54ddc852b0c9.
Fix this by not clearing exception enable bits in the prologue.
Also, since we are no longer changing the enable bits in either the
prologue or the epilogue, there is no need to test for entering/exiting
non-stop mode.
Also, optimize the prologue get/save/set rounding mode operations for
POWER9 and later by using 'mffscrn' when possible.
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Fixes: e905212627350d54b58426214b5a54ddc852b0c9
2019-09-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_and_set_rn): New.
(__fe_mffscrn): New.
* sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
Do not clear enable bits, remove obsolete code, use
fegetenv_and_set_rn.
(libc_feresetround_ppc): Remove obsolete code, use
fegetenv_and_set_rn.
fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.
Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'. 'mffs' is a superset
of 'mffsl'.
Paul A. Clarke [Tue, 6 Aug 2019 04:13:45 +0000 (00:13 -0400)]
[powerpc] fesetenv: optimize FPSCR access
fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state. All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.
Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.
Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').
Paul A. Clarke [Sat, 3 Aug 2019 02:47:57 +0000 (22:47 -0400)]
[powerpc] SET_RESTORE_ROUND improvements
SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.
For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
a. If transitioning from a state where some exceptions enabled,
enter "ignore exceptions / non-stop" mode.
b. Write new value to FPSCR.
c. Put a mark on the wall indicating the FPSCR was changed.
(1) uses the 'mffs' instruction. On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.
For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retrieve saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
- Summary bits and exception indicators = current OR saved.
- Rounding mode and enables = saved.
- Status bits = current.
4. If transitioning from some exceptions enabled to none,
enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
enter "catch exceptions" mode.
6. Write new value to FPSCR.
The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them. This is certainly not expected.
So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR. Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.
We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.
Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR. The net is better performance.
In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR. This local variable can be used instead of again reading
the FPSCR.
Also, that value of the FPSCR which is read the second time is validated
against the requested enables. Since the write can't fail, this validation
step is unnecessary, and can be removed. Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set. For example, FE_INVALID_SQRT can be
requested, but cannot actually be set. This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.
Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.