H.J. Lu [Tue, 12 Jan 2021 13:15:49 +0000 (05:15 -0800)]
x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow. This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:
cmpl $63, %ecx
// Don't use "rep movsb" if ECX <= 63
jbe L(Don't use rep movsb")
Use "rep movsb"
Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)
In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.
Wilco Dijkstra [Wed, 11 Mar 2020 17:15:25 +0000 (17:15 +0000)]
[AArch64] Improve integer memcpy
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
Krzysztof Koch [Tue, 5 Nov 2019 17:35:18 +0000 (17:35 +0000)]
aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.
Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.
Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html
Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S
Tested against GLIBC testsuite and randomized tests.
Wilco Dijkstra [Fri, 28 Aug 2020 16:51:40 +0000 (17:51 +0100)]
AArch64: Improve backwards memmove performance
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses. So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.
Add a new memcpy using 128-bit Q registers - this is faster on modern
cores and reduces codesize. Similar to the generic memcpy, small cases
include copies up to 32 bytes. 64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies. Large copies align
the source rather than the destination.
bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
so make this memcpy the default on N1 (on Centriq it is 15% faster than
memcpy_falkor).
arm: CVE-2020-6096: Fix multiarch memcpy for negative length [BZ #25620]
Unsigned branch instructions could be used for r2 to fix the wrong
behavior when a negative length is passed to memcpy.
This commit fixes the armv7 version.
arm: CVE-2020-6096: fix memcpy and memmove for negative length [BZ #25620]
Unsigned branch instructions could be used for r2 to fix the wrong
behavior when a negative length is passed to memcpy and memmove.
This commit fixes the generic arm implementation of memcpy amd memmove.
strcmp-avx2.S: In avx2 strncmp function, strings are compared in
chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4
vector size comparison, code must check whether it already passed
the given offset. This patch implement avx2 offset check condition
for strncmp function, if both string compare same for first 4 vector
size.
Florian Weimer [Tue, 19 May 2020 12:09:38 +0000 (14:09 +0200)]
nss_compat: internal_end*ent may clobber errno, hiding ERANGE [BZ #25976]
During cleanup, before returning from get*_r functions, the end*ent
calls must not change errno. Otherwise, an ERANGE error from the
underlying implementation can be hidden, causing unexpected lookup
failures. This commit introduces an internal_end*ent_noerror
function which saves and restore errno, and marks the original
internal_end*ent function as warn_unused_result, so that it is used
only in contexts were errors from it can be handled explicitly.
H.J. Lu [Wed, 29 Apr 2020 20:20:27 +0000 (13:20 -0700)]
Add C wrappers for process_vm_readv/process_vm_writev [BZ #25810]
Since the the U marker can only be applied to 2 unsigned long arguments
in syscalls.list files, add a C wrapper for process_vm_readv and
process_vm_writev syscals which have more than 2 unsigned long arguments.
H.J. Lu [Wed, 29 Apr 2020 15:08:40 +0000 (08:08 -0700)]
Mark unsigned long arguments with U in more syscalls [BZ #25810]
Mark unsigned long arguments in mmap, read, recv, recvfrom, send, sendto,
write, ioperm, sendfile64, setxattr, lsetxattr, fsetxattr, getxattr,
lgetxattr, fgetxattr, listxattr, llistxattr and flistxattr with U in
syscalls.list files.
H.J. Lu [Wed, 29 Apr 2020 12:35:34 +0000 (05:35 -0700)]
Add SYSCALL_ULONG_ARG_[12] to pass long to syscall [BZ #25810]
X32 has 32-bit long and pointer with 64-bit off_t. Since x32 psABI
requires that pointers passed in registers must be zero-extended to
64bit, x32 can share many syscall interfaces with LP64. When a LP64
syscall with long and unsigned long int arguments is used for x32, these
arguments must be properly extended to 64-bit. Otherwise if the upper
32 bits of the register have undefined value, such a syscall will be
rejected by kernel.
For syscalls implemented in assembly codes, 'U' is added to syscall
signature key letters for unsigned long, which is zero-extended to
64-bit types. SYSCALL_ULONG_ARG_1 and SYSCALL_ULONG_ARG_2 are passed
to syscall-template.S for the first and the second unsigned long int
arguments if PSEUDOS_HAVE_ULONG_INDICES is defined. They are used by
x32 to zero-extend 32-bit arguments to 64 bits.
Tested on i386, x86-64 and x32 as well as with build-many-glibcs.py.
H.J. Lu [Mon, 13 Apr 2020 17:31:26 +0000 (10:31 -0700)]
x32: Properly pass long to syscall [BZ #25810]
X32 has 32-bit long and pointer with 64-bit off_t. Since x32 psABI
requires that pointers passed in registers must be zero-extended to
64bit, x32 can share many syscall interfaces with LP64. When a LP64
syscall with long and unsigned long arguments is used for x32, these
arguments must be properly extended to 64-bit. Otherwise if the upper
32 bits of the register have undefined value, such a syscall will be
rejected by kernel.
Enforce zero-extension for pointers and array system call arguments.
For integer types, extend to int64_t (the full register) using a
regular cast, resulting in zero or sign extension based on the
signedness of the original type.
For
void *mmap(void *addr, size_t length, int prot, int flags,
int fd, off_t offset);
1. addr is unchanged.
2. length is zero-extend to 64 bits.
3. prot is sign-extend to 64 bits.
4. flags is sign-extend to 64 bits.
5. fd is sign-extend to 64 bits.
6. offset is unchanged.
For int arguments, since kernel uses only the lower 32 bits and ignores
the upper 32 bits in 64-bit registers, these work correctly.
Tested on x86-64 and x32. There are no code changes on x86-64.
Fix data race in setting function descriptors during lazy binding on hppa.
This addresses an issue that is present mainly on SMP machines running
threaded code. In a typical indirect call or PLT import stub, the
target address is loaded first. Then the global pointer is loaded into
the PIC register in the delay slot of a branch to the target address.
During lazy binding, the target address is a trampoline which transfers
to _dl_runtime_resolve().
_dl_runtime_resolve() uses the relocation offset stored in the global
pointer and the linkage map stored in the trampoline to find the
relocation. Then, the function descriptor is updated.
In a multi-threaded application, it is possible for the global pointer
to be updated between the load of the target address and the global
pointer. When this happens, the relocation offset has been replaced
by the new global pointer. The function pointer has probably been
updated as well but there is no way to find the address of the function
descriptor and to transfer to the target. So, _dl_runtime_resolve()
typically crashes.
HP-UX addressed this problem by adding an extra pc-relative branch to
the trampoline. The descriptor is initially setup to point to the
branch. The branch then transfers to the trampoline. This allowed
the trampoline code to figure out which descriptor was being used
without any modification to user code. I didn't use this approach
as it is more complex and changes function pointer canonicalization.
The order of loading the target address and global pointer in
indirect calls was not consistent with the order used in import stubs.
In particular, $$dyncall and some inline versions of it loaded the
global pointer first. This was inconsistent with the global pointer
being updated first in dl-machine.h. Assuming the accesses are
ordered, we want elf_machine_fixup_plt() to store the global pointer
first and calls to load it last. Then, the global pointer will be
correct when the target function is entered.
However, just to make things more fun, HP added support for
out-of-order execution of accesses in PA 2.0. The accesses used by
calls are weakly ordered. So, it's possibly under some circumstances
that a function might be entered with the wrong global pointer.
However, HP uses weakly ordered accesses in 64-bit HP-UX, so I assume
that loading the global pointer in the delay slot of the branch must
work consistently.
The basic fix for the race is a combination of modifying user code to
preserve the address of the function descriptor in register %r22 and
setting the least-significant bit in the relocation offset. The
latter was suggested by Carlos as a way to distinguish relocation
offsets from global pointer values. Conventionally, %r22 is used
as the address of the function descriptor in calls to $$dyncall.
So, it wasn't hard to preserve the address in %r22.
I have updated gcc trunk and gcc-9 branch to not clobber %r22 in
$$dyncall and inline indirect calls. I have also modified the import
stubs in binutils trunk and the 2.33 branch to preserve %r22. This
required making the stubs one instruction longer but we save one
relocation. I also modified binutils to align the .plt section on
a 8-byte boundary. This allows descriptors to be updated atomically
with a floting-point store.
With these changes, _dl_runtime_resolve() can fallback to an alternate
mechanism to find the relocation offset when it has been clobbered.
There's just one additional instruction in the fast path. I tested
the fallback function, _dl_fix_reloc_arg(), by changing the branch to
always use the fallback. Old code still runs as it did before.
As indicated by kernel developers [1], the sigreturn stub can not change
the register window or the stack pointer since the kernel has setup the
restore frame at a precise location relative to the stack pointer when
the stub is invoked.
I tried to play with some compiler flags and even with _Noreturn and
__builtin_unreachable after the asm does not help (and Sparc does not
support naked functions).
To avoid similar issues, as the stack-protector support also have
stumbled, this patch moves the implementation of the sigreturn stubs to
assembly.
Checked on sparcv9-linux-gnu and sparc64-linux-gnu with gcc 9.2.1
and gcc 7.5.0.
The commit "arm: Split BE/LE abilist"
(1673ba87fefe019c834c09d33673d1d453ea698d) changed the soft-fp order for
ARM selection when __SOFTFP__ is defined by the compiler.
It make the build select some routines (fadd, fdiv, fmul, fsub, and fma)
on ieee754/flt-32 and ieee754/dbl-64 that requires fenv support to be
correctly rounded which in turns lead to math failures since the
__SOFTFP__ does not have fenv support.
i386: Use comdat instead of .gnu.linkonce for i386 setup pic register (BZ #20543)
GCC has moved from using .gnu.linkonce for i386 setup pic register with
minimum current version (as for binutils minimum binutils that support
comdat).
Trying to pinpoint when binutils has added comdat support for i686, it
seems it was around 2004 [1]. I also checking with some ancient
binutils older than 2.16 I see:
test.o: In function `__x86.get_pc_thunk.bx':
test.o(.text.__x86.get_pc_thunk.bx+0x0): multiple definition of `__x86.get_pc_thunk.bx'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../i386-linux-gnu/crti.o(.gnu.linkonce.t.__x86.get_pc_thunk.bx+0x0): first defined here
Which seems that such version can not handle either comdat at all or
a mix of linkonce and comdat. For binutils 2.16.1 I am getting a
different issue trying to link a binary with and more recent
ctri.o (unrecognized relocation (0x2b) in section `.init', which is
R_386_GOT32X and old binutils won't generate it anyway).
So I think that either unlikely someone will use an older binutils than
the one used to glibc and even this scenario may fail with some issue
as the R_386_GOT32X. Also, 2.16.1 is quite old and not really supported
(glibc itself required 2.25).
Andreas Schwab [Mon, 20 Jan 2020 16:01:50 +0000 (17:01 +0100)]
Fix array overflow in backtrace on PowerPC (bug 25423)
When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address. Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
Joseph Myers [Wed, 12 Feb 2020 23:31:56 +0000 (23:31 +0000)]
Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487).
Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero
argument (an representation where all the significand bits, including
the explicit high bit, are zero, but the exponent is not zero, which
is not a valid representation for the long double type).
Although this is not a valid long double representation, existing
practice in this area (see bug 4586, originally marked invalid but
subsequently fixed) is that we still seek to avoid invalid memory
accesses as a result, in case of programs that treat arbitrary binary
data as long double representations, although the invalid
representations of the ldbl-96 format do not need to be consistently
handled the same as any particular valid representation.
This patch makes the range reduction detect pseudo-zero and unnormal
representations that would otherwise go to __kernel_rem_pio2, and
returns a NaN for them instead of continuing with the range reduction
process. (Pseudo-zero and unnormal representations whose unbiased
exponent is less than -1 have already been safely returned from the
function before this point without going through the rest of range
reduction.) Pseudo-zero representations would previously result in
the value passed to __kernel_rem_pio2 being all-zero, which is
definitely unsafe; unnormal representations would previously result in
a value passed whose high bit is zero, which might well be unsafe
since that is not a form of input expected by __kernel_rem_pio2.
Fangrui Song [Wed, 5 Feb 2020 05:55:44 +0000 (21:55 -0800)]
Improve IFUNC check [BZ #25506]
GNU ld's RISCV port does not support IFUNC. ld -no-pie produces no
relocation and the test passed incorrectly. Be more rigid by testing
IRELATIVE explicitly.
malloc/tst-mallocfork2: Kill lingering process for unexpected failures
If the test fails due some unexpected failure after the children
creation, either in the signal handler by calling abort or in the main
loop; the created children might not be killed properly.
This patches fixes it by:
* Avoid aborting in the signal handler by setting a flag that
an error has occured and add a check in the main loop.
* Add a atexit handler to handle kill child processes.
riscv: Avoid clobbering register parameters in syscall
The riscv INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with riscv64-linux-gnu-rv64imafdc-lp64d build.
microblaze: Avoid clobbering register parameters in syscall
The microblaze INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with microblaze-linux-gnu and microblazeel-linux-gnu build.
hppa: Align __clone stack argument to 8 bytes (Bug 25066)
The hppa architecture requires strict alignment for loads and stores.
As a result, the minimum stack alignment that will work is 8 bytes.
This patch adjusts __clone() to align the stack argument passed to it.
It also adjusts slightly some formatting.
Florian Weimer [Fri, 17 Jan 2020 14:11:20 +0000 (15:11 +0100)]
Remove incorrect alloc_size attribute from pvalloc [BZ #25401]
pvalloc is guarantueed to round up the allocation size to the page
size, so applications can assume that the memory region is larger
than the passed-in argument. The alloc_size attribute cannot express
that.
The test case is based on a suggestion from Jakub Jelinek.
Florian Weimer [Tue, 12 Nov 2019 11:25:49 +0000 (12:25 +0100)]
login: Use pread64 in utmp implementation
This reduces the possible error scenarios considerably because
no longer can file seek fail, leaving the file descriptor in an
inconsistent state and out of sync with the cache.
As a result, it is possible to avoid setting file_offset to -1
to make an error persistent. Instead, subsequent calls will retry
the operation and report any errors returned by the kernel.
This change also avoids reading the file from the start if pututline
is called multiple times, to work around lock acquisition failures
due to timeouts.
Florian Weimer [Tue, 12 Nov 2019 11:02:57 +0000 (12:02 +0100)]
login: Introduce matches_last_entry to utmp processing
This simplifies internal_getut_nolock and fixes a regression,
introduced in commit be6b16d975683e6cca57852cd4cfe715b2a9d8b1
("login: Acquire write lock early in pututline [BZ #24882]")
in pututxline because __utmp_equal can only compare process-related
utmp entries.
Fixes: be6b16d975683e6cca57852cd4cfe715b2a9d8b1
Change-Id: Ib8a85002f7f87ee41590846d16d7e52bdb82f5a5
(cherry picked from commit 76a7c103eb9060f9e3ba01d073ae4621a17d8b46)
Florian Weimer [Thu, 7 Nov 2019 17:15:18 +0000 (18:15 +0100)]
login: Acquire write lock early in pututline [BZ #24882]
It has been reported that due to lack of fairness in POSIX file
locking, the current reader-to-writer lock upgrade can result in
lack of forward progress. Acquiring the write lock directly
hopefully avoids this issue if there are only writers.
This also fixes bug 24882 due to the cache revalidation in
__libc_pututline.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I57e31ae30719e609a53505a0924dda101d46372e
(cherry picked from commit be6b16d975683e6cca57852cd4cfe715b2a9d8b1)
Commit 7532837d7b03b3ca5b9a63d77a5bd81dd23f3d9c ("The
-Wstringop-truncation option new in GCC 8 detects common misuses")
added __attribute_nonstring__ to bits/utmp.h, but it did not update
the parallel bits/utmpx.h header. In struct utmp, the nonstring
attribute for ut_id was missing.
Florian Weimer [Thu, 7 Nov 2019 08:53:41 +0000 (09:53 +0100)]
login: Remove double-assignment of fl.l_whence in try_file_lock
Since l_whence is the second member of struct flock, it is written
twice. The double-assignment is technically undefined behavior due to
the lack of a sequence point.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I2baf9e70690e723c61051b25ccbd510aec15976c
(cherry picked from commit b0a83ae71b2588bd2a9e6b40f95191602940e01e)
Florian Weimer [Wed, 28 Aug 2019 09:59:45 +0000 (11:59 +0200)]
login: pututxline could fail to overwrite existing entries [BZ #24902]
The internal_getut_r function updates the file_offset variable and
therefore must always update last_entry as well.
Previously, if pututxline could not upgrade the read lock to a
write lock, internal_getut_r would update file_offset only,
without updating last_entry, and a subsequent call would not
overwrite the existing utmpx entry at file_offset, instead
creating a new entry. This has been observed to cause unbounded
file growth in high-load situations.
This commit removes the buffer argument to internal_getut_r and
updates the last_entry variable directly, along with file_offset.
Florian Weimer [Thu, 15 Aug 2019 14:09:20 +0000 (16:09 +0200)]
login: Use struct flock64 in utmp [BZ #24880]
Commit 06ab719d30b01da401150068054d3b8ea93dd12f ("Fix Linux fcntl OFD
locks for non-LFS architectures (BZ#20251)") introduced the use of
fcntl64 into the utmp implementation. However, the lock file
structure was not updated to struct flock64 at that point.
Florian Weimer [Thu, 15 Aug 2019 14:09:05 +0000 (16:09 +0200)]
login: Disarm timer after utmp lock acquisition [BZ #24879]
If the file processing takes a long time for some reason, SIGALRM can
arrive while the file is still being processed. At that point, file
access will fail with EINTR. Disarming the timer after lock
acquisition avoids that. (If there was a previous alarm, it is the
responsibility of the caller to deal with the EINTR error.)
Florian Weimer [Thu, 15 Aug 2019 08:30:23 +0000 (10:30 +0200)]
login: Fix updwtmp, updwtmx unlocking
Commit 5a3afa9738f3dbbaf8c0a35665318c1af782111b (login: Replace
macro-based control flow with function calls in utmp) introduced
a regression because after it, __libc_updwtmp attempts to unlock
the wrong file descriptor.
There is just one file-based implementation, so this dispatch
mechanism is unnecessary. Instead of the vtable pointer
__libc_utmp_jump_table, use a non-negative file_fd as the indicator
that the backend is initialized.
Florian Weimer [Thu, 5 Dec 2019 16:29:42 +0000 (17:29 +0100)]
misc/test-errno-linux: Handle EINVAL from quotactl
In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
argument, causing the test to fail due to too restrictive test
expectations.
Kamlesh Kumar [Thu, 5 Dec 2019 15:54:47 +0000 (16:54 +0100)]
<string.h>: Define __CORRECT_ISO_CPP_STRING_H_PROTO for Clang [BZ #25232]
Without the asm redirects, strchr et al. are not const-correct.
libc++ has a wrapper header that works with and without
__CORRECT_ISO_CPP_STRING_H_PROTO (using a Clang extension). But when
Clang is used with libstdc++ or just C headers, the overloaded functions
with the correct types are not declared.
This change does not impact current GCC (with libstdc++ or libc++).
Florian Weimer [Tue, 3 Dec 2019 19:26:28 +0000 (20:26 +0100)]
x86: Assume --enable-cet if GCC defaults to CET [BZ #25225]
This links in CET support if GCC defaults to CET. Otherwise, __CET__
is defined, yet CET functionality is not compiled and linked into the
dynamic loader, resulting in a linker failure due to undefined
references to _dl_cet_check and _dl_open_check.
Florian Weimer [Thu, 28 Nov 2019 13:18:12 +0000 (14:18 +0100)]
libio: Disable vtable validation for pre-2.1 interposed handles [BZ #25203]
Commit c402355dfa7807b8e0adb27c009135a7e2b9f1b0 ("libio: Disable
vtable validation in case of interposition [BZ #23313]") only covered
the interposable glibc 2.1 handles, in libio/stdfiles.c. The
parallel code in libio/oldstdfiles.c needs similar detection logic.
Stefan Liebler [Wed, 27 Nov 2019 11:35:40 +0000 (12:35 +0100)]
S390: Fix handling of needles crossing a page in strstr z15 ifunc-variant. [BZ #25226]
If the specified needle crosses a page-boundary, the s390-z15 ifunc variant of
strstr truncates the needle which results in invalid results.
This is fixed by loading the needle beyond the page boundary to v18 instead of v16.
The bug is sometimes observable in test-strstr.c in check1 and check2 as the
haystack and needle is stored on stack. Thus the needle can be on a page boundary.
check2 is now extended to test haystack / needles located on stack, at end of page
and on two pages.
rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]
The problem was introduced in glibc 2.23, in commit b9eb92ab05204df772eb4929eccd018637c9f3e9
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").
Don't use a custom wrapper macro around __has_include (bug 25189).
This causes issues when using clang with -frewrite-includes to e.g.,
submit the translation unit to a distributed compiler.
In my case, I was building Firefox using sccache.
See [1] for a reduced test-case since I initially thought this was a
clang bug, and [2] for more context.
Apparently doing this is invalid C++ per [cpp.cond], which mentions [3]:
> The #ifdef and #ifndef directives, and the defined conditional
> inclusion operator, shall treat __has_include and __has_cpp_attribute
> as if they were the names of defined macros. The identifiers
> __has_include and __has_cpp_attribute shall not appear in any context
> not mentioned in this subclause.
DJ Delorie [Wed, 30 Oct 2019 22:03:14 +0000 (18:03 -0400)]
Base max_fast on alignment, not width, of bins (Bug 24903)
set_max_fast sets the "impossibly small" value based on,
eventually, MALLOC_ALIGNMENT. The comparisons for the smallest
chunk used is, eventually, MIN_CHUNK_SIZE. Note that i386
is the only platform where these are the same, so a smallest
chunk *would* be put in a no-fastbins fastbin.
This change calculates the "impossibly small" value
based on MIN_CHUNK_SIZE instead, so that we can know it will
always be impossibly small.
malloc: Fix missing accounting of top chunk in malloc_info [BZ #24026]
Fixes `<total type="rest" size="..."> incorrectly showing as 0 most
of the time.
The rest value being wrong is significant because to compute the
actual amount of memory handed out via malloc, the user must subtract
it from <system type="current" size="...">. That result being wrong
makes investigating memory fragmentation issues like
<https://bugzilla.redhat.com/show_bug.cgi?id=843478> close to
impossible.
mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernels
Linux/Mips kernels prior to 4.8 could potentially crash the user
process when doing FPU emulation while running on non-executable
user stack.
Currently, gcc doesn't emit .note.GNU-stack for mips, but that will
change in the future. To ensure that glibc can be used with such
future gcc, without silently resulting in binaries that might crash
in runtime, this patch forces RWX stack for all built objects if
configured to run against minimum kernel version less than 4.8.
* sysdeps/unix/sysv/linux/mips/Makefile
(test-xfail-check-execstack):
Move under mips-has-gnustack != yes.
(CFLAGS-.o*, ASFLAGS-.o*): New rules.
Apply -Wa,-execstack if mips-force-execstack == yes.
* sysdeps/unix/sysv/linux/mips/configure: Regenerated.
* sysdeps/unix/sysv/linux/mips/configure.ac
(mips-force-execstack): New var.
Set to yes for hard-float builds with minimum_kernel < 4.8.0
or minimum_kernel not set at all.
(mips-has-gnustack): New var.
Use value of libc_cv_as_noexecstack
if mips-force-execstack != yes, otherwise set to no.
Make tst-strftime2 and tst-strftime3 depend on locale generation
Building the test cases in parallel might make tst-strftime2 and
tst-strftime3 fail. Simply re-running the test case (or building
serially) makes the problem go away. This patch adds the necessary
dependency to allow parallel builds in the time subdirectory.
Tested for powerpc64le.
Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
(cherry picked from commit 52151051b39293dac9fc3181cfd6240d7ce86ca1)
Joseph Myers [Wed, 18 Sep 2019 13:22:24 +0000 (13:22 +0000)]
Fix RISC-V vfork build with Linux 5.3 kernel headers.
Building glibc for RISC-V with Linux 5.3 kernel headers fails because
<linux/sched.h>, included in vfork.S for CLONE_* constants, contains a
structure definition not safe for inclusion in assembly code.
All other architectures already avoid use of that header in vfork.S,
either defining the CLONE_* constants locally or embedding the
required values directly in the relevant instruction, where they
implement vfork using the clone syscall (see the implementations for
aarch64, ia64, mips and nios2). This patch makes the RISC-V version
define the constants locally like the other architectures.
Tested build for all three RISC-V configurations in
build-many-glibcs.py with Linux 5.3 headers.
* sysdeps/unix/sysv/linux/riscv/vfork.S: Do not include
<linux/sched.h>.
(CLONE_VM): New macro.
(CLONE_VFORK): Likewise.
alpha: force old OSF1 syscalls for getegid, geteuid and getppid [BZ #24986]
On alpha, Linux kernel 5.1 added the standard getegid, geteuid and
getppid syscalls (commit ecf7e0a4ad15287). Up to now alpha was using
the corresponding OSF1 syscalls through:
- sysdeps/unix/alpha/getegid.S
- sysdeps/unix/alpha/geteuid.S
- sysdeps/unix/alpha/getppid.S
When building against kernel headers >= 5.1, the glibc now use the new
syscalls through sysdeps/unix/sysv/linux/syscalls.list. When it is then
used with an older kernel, the corresponding 3 functions fail.
A quick fix is to move the OSF1 wrappers under the
sysdeps/unix/sysv/linux/alpha directory so they override the standard
linux ones. A better fix would be to try the new syscalls and fallback
to the old OSF1 in case the new ones fail. This can be implemented in
a later commit.
Changelog:
[BZ #24986]
* sysdeps/unix/alpha/getegid.S: Move to ...
* sysdeps/unix/sysv/linux/alpha/getegid.S: ... here.
* sysdeps/unix/alpha/geteuid.S: Move to ...
* sysdeps/unix/sysv/linux/alpha/geteuid.S: ... here.
* sysdeps/unix/alpha/getppid.S: Move to ...
* sysdeps/unix/sysv/linux/alpha/getppid.S: ... here
(cherry picked from commit 1a6566094d3097f4a3037ab5555cddc6cb11c3a3)
The changes introduce a memory leak for gconv steps arrays whose
first element is an internal conversion, which has a fixed
reference count which is not decremented. As a result, after the
change in commit 50ce3eae5ba304650459d4441d7d246a7cefc26f, the steps
array is never freed, resulting in an unbounded memory leak.
This reverts commit 50ce3eae5ba304650459d4441d7d246a7cefc26f
("gconv: Check reference count in __gconv_release_cache
[BZ #24677]") and commit 7e740ab2e7be7d83b75513aa406e0b10875f7f9c
("libio: Fix gconv-related memory leak [BZ #24583]"). It
reintroduces bug 24583. (Bug 24677 was just a regression caused by
the second commit.)
Joseph Myers [Tue, 30 Jul 2019 14:05:11 +0000 (14:05 +0000)]
Restore r31 setting in powerpc32 swapcontext.
Commit ffe8a9a8318e1db225b22da8bc067408494bac5c, "powerpc: Remove
rt_sigreturn usage on context function", removed from powerpc32
swapcontext a setting of r31 that is relied upon in subsequent code.
I'm not sure why this didn't produce test failures in Adhemerval's
32-bit testing; in my (soft-float) testing in preparation for 2.30
release, I see several context-related failures
nptl: Use uintptr_t for address diagnostic in nptl/tst-pthread-getattr
Recent GCC versions warn about the attempt to return the address of a
local variable:
tst-pthread-getattr.c: In function ‘allocate_and_test’:
tst-pthread-getattr.c:54:10: error: function returns address of local variable [-Werror=return-local-addr]
54 | return mem;
| ^~~
In file included from ../include/alloca.h:3,
from tst-pthread-getattr.c:26:
../stdlib/alloca.h:35:23: note: declared here
35 | # define alloca(size) __builtin_alloca (size)
| ^~~~~~~~~~~~~~~~~~~~~~~
tst-pthread-getattr.c:51:9: note: in expansion of macro ‘alloca’
51 | mem = alloca ((size_t) (mem - target));
| ^~~~~~
The address itself is used in a check in the caller, so using
uintptr_t instead is reasonable.
test-container: Install with $(sorted-subdirs) [BZ #24794]
Commit 35e038c1d2ccb3a75395662f9c4f28d85a61444f started to use an
incomplete list of subdirs based on $(all-subdirs) causing
testroot.pristine to miss files from nss.
Tested if the list of files in testroot.pristine remains the same.
[BZ #24794]
* Makeconfig (all-subdirs): Improved source comments.
* Makefile (testroot.pristine/install.stamp): Pass
subdirs='$(sorted-subdirs)' to make install.
__gconv_release_cache is only ever called with heap-allocated
arrays which contain at least one member. The statically allocated
ASCII steps are filtered out by __wcsmbs_close_conv.
H.J. Lu [Wed, 24 Jul 2019 21:48:33 +0000 (14:48 -0700)]
x86-64: Compile branred.c with -mprefer-vector-width=128 [BZ #24603]
When compiled with -O3 and AVX, GCC 8 and 9 optimize some loops in
sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions,
which leads to store forward stall:
There is no easy fix in compiler. This patch limits vector width to
128 bits to work around this issue. It improves performance of sin
and cos by more than 40% on Skylake compiled with -O3 -march=skylake.
Tested with GCC 7/8/9 on x86-64.
[BZ #24603]
* sysdeps/x86_64/configure.ac: Check if -mprefer-vector-width=128
works.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set
to -mprefer-vector-width=128 if supported.
Linux: Use in-tree copy of SO_ constants for !__USE_MISC [BZ #24532]
The kernel changes for a 64-bit time_t on 32-bit architectures
resulted in <asm/socket.h> indirectly including <linux/posix_types.h>.
The latter is not namespace-clean for the POSIX version of
<sys/socket.h>.
This issue has persisted across several Linux releases, so this commit
creates our own copy of the SO_* definitions for !__USE_MISC mode.
The new test socket/tst-socket-consts ensures that the copy is
consistent with the kernel definitions (which vary across
architectures). The test is tricky to get right because CPPFLAGS
includes include/libc-symbols.h, which in turn defines _GNU_SOURCE
unconditionally.
Tested with build-many-glibcs.py. I verified that a discrepancy in
the definitions actually results in a failure of the
socket/tst-socket-consts test.
test-container: Install with $(all-subdirs) [BZ #24794]
Whenever a sub-make is created, it inherits the variable subdirs from its
parent. This is also true when make check is called with a restricted
list of subdirs. In this scenario, make install is executed "partially"
and testroot.pristine ends up with an incomplete installation.
[BZ #24794]
* Makefile (testroot.pristine/install.stamp): Pass
subdirs='$(all-subdirs)' to make install.
test-container: Avoid copying unintended system libraries
Some DSOs are distributed in hardware capability directories, e.g.
/usr/lib64/power7/libc.so.6
Whenever the processor is able to use one of these hardware-enabled
DSOs, testroot.pristine ends up with copies of glibc-provided libraries
from the system because it can't overwrite or remove them.
This patch avoids the unintended copies by executing ld.so with the same
arguments passed to each glibc test.
* Makefile (testroot.pristine/install.stamp): Execute ld.so with
the same arguments used in all tests.