Florian Weimer [Fri, 2 Jun 2017 09:59:28 +0000 (11:59 +0200)]
Add internal facility for dynamic array handling
This is intended as a type-safe alternative to obstacks and
hand-written realloc constructs. The implementation avoids
writing function pointers to the heap.
Florian Weimer [Thu, 4 Jan 2018 11:32:36 +0000 (12:32 +0100)]
getaddrinfo: Fix error handling in gethosts [BZ #21915] [BZ #21922]
The old code uses errno as the primary indicator for success or
failure. This is wrong because errno is only set for specific
combinations of the status return value and the h_errno variable.
This simplifies the code because it is not necessary to propagate the
temporary h_errno value to the thread-local variable. It also increases
compatibility with NSS modules which update only one of the two places.
Florian Weimer [Thu, 11 May 2017 09:32:16 +0000 (11:32 +0200)]
support_format_addrinfo: Fix flags and canonname formatting
The address family splitting via format_ai_family made unpredictable
the place where the canonname field was printed. This commit adjusts
the implementation so that the ai_flags is checked for consistency
across the list, and ai_canonname must only be present on the first
list element.
Tests for AI_CANONNAME are added to resolv/tst-resolv-basic.
Florian Weimer [Thu, 4 Jan 2018 10:45:20 +0000 (11:45 +0100)]
resolv: Support an exactly sized buffer in ns_name_pack [BZ #21359]
This bug did not affect name resolution because those functions
indirectly call ns_name_pack with a buffer which is always larger
than the generated query packet, even in the case of the
longest-possible domain name.
Florian Weimer [Fri, 2 Jun 2017 12:54:56 +0000 (14:54 +0200)]
getaddrinfo: Always allocate canonical name on the heap
A further simplification could eliminate the canon variable in
gaih_inet and replace it with canonbuf. However, canonbuf is
used as a flag in the nscd code, which makes this somewhat
non-straightforward.
Aurelien Jarno [Sat, 30 Dec 2017 09:54:23 +0000 (10:54 +0100)]
elf: Check for empty tokens before dynamic string token expansion [BZ #22625]
The fillin_rpath function in elf/dl-load.c loops over each RPATH or
RUNPATH tokens and interprets empty tokens as the current directory
("./"). In practice the check for empty token is done *after* the
dynamic string token expansion. The expansion process can return an
empty string for the $ORIGIN token if __libc_enable_secure is set
or if the path of the binary can not be determined (/proc not mounted).
Fix that by moving the check for empty tokens before the dynamic string
token expansion. In addition, check for NULL pointer or empty strings
return by expand_dynamic_string_token.
The above changes highlighted a bug in decompose_rpath, an empty array
is represented by the first element being NULL at the fillin_rpath
level, but by using a -1 pointer in decompose_rpath and other functions.
Changelog:
[BZ #22625]
* elf/dl-load.c (fillin_rpath): Check for empty tokens before dynamic
string token expansion. Check for NULL pointer or empty string possibly
returned by expand_dynamic_string_token.
(decompose_rpath): Check for empty path after dynamic string
token expansion.
(cherry picked from commit 3e3c904daef69b8bf7d5cc07f793c9f07c3553ef)
Dmitry V. Levin [Sun, 17 Dec 2017 23:49:46 +0000 (23:49 +0000)]
elf: do not substitute dst in $LD_LIBRARY_PATH twice [BZ #22627]
Starting with commit glibc-2.18.90-470-g2a939a7e6d81f109d49306bc2e10b4ac9ceed8f9 that
introduced substitution of dynamic string tokens in fillin_rpath,
_dl_init_paths invokes _dl_dst_substitute for $LD_LIBRARY_PATH twice:
the first time it's called directly, the second time the result
is passed on to fillin_rpath which calls expand_dynamic_string_token
which in turn calls _dl_dst_substitute, leading to the following
behaviour:
$ mkdir -p /tmp/'$ORIGIN' && cd /tmp/'$ORIGIN' &&
echo 'int main(){}' |gcc -xc - &&
strace -qq -E LD_LIBRARY_PATH='$ORIGIN' -e /open ./a.out
open("/tmp//tmp/$ORIGIN/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/tmp//tmp/$ORIGIN/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/tmp//tmp/$ORIGIN/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/tmp//tmp/$ORIGIN/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
Fix this by removing the direct _dl_dst_substitute invocation.
* elf/dl-load.c (_dl_init_paths): Remove _dl_dst_substitute preparatory
code and invocation.
James Clarke [Tue, 12 Dec 2017 14:17:10 +0000 (12:17 -0200)]
ia64: Add ipc_priv.h header to set __IPC_64 to zero
When running strace, IPC_64 was set in the command, but ia64 is
an architecture where CONFIG_ARCH_WANT_IPC_PARSE_VERSION *isn't* set
in the kernel, so ipc_parse_version just returns IPC_64 without
clearing the IPC_64 bit in the command.
* sysdeps/unix/sysv/linux/ia64/ipc_priv.h: New file defining
__IPC_64 to 0 to avoid IPC_64 being set.
This patch syncs posix/glob.c implementation with gnulib version b5ec983 (glob: simplify symlink detection). The only difference
to gnulib code is
* DT_UNKNOWN, DT_DIR, and DT_LNK definition in the case there
were not already defined. Gnulib code which uses
HAVE_STRUCT_DIRENT_D_TYPE will redefine them wrongly because
GLIBC does not define HAVE_STRUCT_DIRENT_D_TYPE. Instead
the patch check for each definition instead.
Also, the patch requires additional globfree and globfree64 files
for compatibility version on some architectures. Also the code
simplification leads to not macro simplification (not need for
NO_GLOB_PATTERN_P anymore).
Checked on x86_64-linux-gnu and on a build using build-many-glibcs.py
for all major architectures.
[BZ #1062]
* posix/Makefile (routines): Add globfree, globfree64, and
glob_pattern_p.
* posix/flexmember.h: New file.
* posix/glob_internal.h: Likewise.
* posix/glob_pattern_p.c: Likewise.
* posix/globfree.c: Likewise.
* posix/globfree64.c: Likewise.
* sysdeps/gnu/globfree64.c: Likewise.
* sysdeps/unix/sysv/linux/alpha/globfree.c: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/globfree64.c: Likewise.
* sysdeps/unix/sysv/linux/oldglob.c: Likewise.
* sysdeps/unix/sysv/linux/wordsize-64/globfree64.c: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/globfree.c: Likewise.
* sysdeps/wordsize-64/globfree.c: Likewise.
* sysdeps/wordsize-64/globfree64.c: Likewise.
* posix/glob.c (HAVE_CONFIG_H): Use !_LIBC instead.
[NDEBUG): Remove comments.
(GLOB_ONLY_P, _AMIGA, VMS): Remove define.
(dirent_type): New type. Use uint_fast8_t not
uint8_t, as C99 does not require uint8_t.
(DT_UNKNOWN, DT_DIR, DT_LNK): New macros.
(struct readdir_result): Use dirent_type. Do not define skip_entry
unless it is needed; this saves a byte on platforms lacking d_ino.
(readdir_result_type, readdir_result_skip_entry):
New functions, replacing ...
(readdir_result_might_be_symlink, readdir_result_might_be_dir):
these functions, which were removed. This makes the callers
easier to read. All callers changed.
(D_INO_TO_RESULT): Now empty if there is no d_ino.
(size_add_wrapv, glob_use_alloca): New static functions.
(glob, glob_in_dir): Check for size_t overflow in several places,
and fix some size_t checks that were not quite right.
Remove old code using SHELL since Bash no longer
uses this.
(glob, prefix_array): Separate MS code better.
(glob_in_dir): Remove old Amiga and VMS code.
(globfree, __glob_pattern_type, __glob_pattern_p): Move to
separate files.
(glob_in_dir): Do not rely on undefined behavior in accessing
struct members beyond their bounds. Use a flexible array member
instead
(link_stat): Rename from link_exists2_p and return -1/0 instead of
0/1. Caller changed.
(glob): Fix memory leaks.
* posix/glob64 (globfree64): Move to separate file.
* sysdeps/gnu/glob64.c (NO_GLOB_PATTERN_P): Remove define.
(globfree64): Remove hidden alias.
* sysdeps/unix/sysv/linux/Makefile (sysdeps_routines): Add
oldglob.
* sysdeps/unix/sysv/linux/alpha/glob.c (__new_globfree): Move to
separate file.
* sysdeps/unix/sysv/linux/i386/glob64.c (NO_GLOB_PATTERN_P): Remove
define.
Move compat code to separate file.
* sysdeps/wordsize-64/glob.c (globfree): Move definitions to
separate file.
posix: Do not use WNOHANG in waitpid call for Linux posix_spawn
As shown in some buildbot issues on aarch64 and powerpc, calling
clone (VFORK) and waitpid (WNOHANG) does not guarantee the child
is ready to be collected. This patch changes the call back to 0
as before fe05e1cb6d64 fix.
This change can lead to the scenario 4.3 described in the commit,
where the waitpid call can hang undefinitely on the call. However
this is also a very unlikely and also undefinied situation where
both the caller is trying to terminate a pid before posix_spawn
returns and the race pid reuse is triggered. I don't see how to
correct handle this specific situation within posix_spawn.
Checked on x86_64-linux-gnu, aarch64-linux-gnu and
powerpc64-linux-gnu.
* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Use 0 instead of
WNOHANG in waitpid call.
posix: Fix improper assert in Linux posix_spawn (BZ#22273)
As noted by Florian Weimer, current Linux posix_spawn implementation
can trigger an assert if the auxiliary process is terminated before
actually setting the err member:
340 /* Child must set args.err to something non-negative - we rely on
341 the parent and child sharing VM. */
342 args.err = -1;
[...]
362 new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
363 CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
364
365 if (new_pid > 0)
366 {
367 ec = args.err;
368 assert (ec >= 0);
Another possible issue is killing the child between setting the err and
actually calling execve. In this case the process will not ran, but
posix_spawn also will not report any error:
As suggested by Andreas Schwab, this patch removes the faulty assert
and also handles any signal that happens before fork and execve as the
spawn was successful (and thus relaying the handling to the caller to
figure this out). Different than Florian, I can not see why using
atomics to set err would help here, essentially the code runs
sequentially (due CLONE_VFORK) and I think it would not be legal the
compiler evaluate ec without checking for new_pid result (thus there
is no need to compiler barrier).
Summarizing the possible scenarios on posix_spawn execution, we
have:
1. For default case with a success execution, args.err will be 0, pid
will not be collected and it will be reported to caller.
2. For default failure case, args.err will be positive and the it will
be collected by the waitpid. An error will be reported to the
caller.
3. For the unlikely case where the process was terminated and not
collected by a caller signal handler, it will be reported as succeful
execution and not be collected by posix_spawn (since args.err will
be 0). The caller will need to actually handle this case.
4. For the unlikely case where the process was terminated and collected
by caller we have 3 other possible scenarios:
4.1. The auxiliary process was terminated with args.err equal to 0:
it will handled as 1. (so it does not matter if we hit the pid
reuse race since we won't possible collect an unexpected
process).
4.2. The auxiliary process was terminated after execve (due a failure
in calling it) and before setting args.err to -1: it will also
be handle as 1. but with the issue of not be able to report the
caller a possible execve failures.
4.3. The auxiliary process was terminated after args.err is set to -1:
this is the case where it will be possible to hit the pid reuse
case where we will need to collected the auxiliary pid but we
can not be sure if it will be expected one. I think for this
case we need to actually change waitpid to use WNOHANG to avoid
hanging indefinitely on the call and report an error to caller
since we can't differentiate between a default failure as 2.
and a possible pid reuse race issue.
Checked on x86_64-linux-gnu.
* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Handle the case where
the auxiliary process is terminated by a signal before calling _exit
or execve.
Andreas Schwab [Tue, 8 Aug 2017 14:21:58 +0000 (16:21 +0200)]
Don't use IFUNC resolver for longjmp or system in libpthread (bug 21041)
Unlike the vfork forwarder and like the fork forwarder as in bug 19861,
there won't be a problem when the compiler does not turn this into a tail
call.
James Clarke [Fri, 13 Oct 2017 18:44:39 +0000 (15:44 -0300)]
Fix TLS relocations against local symbols on powerpc32, sparc32 and sparc64
Normally, TLS relocations against local symbols are optimised by the linker
to be absolute. However, gold does not do this, and so it is possible to
end up with, for example, R_SPARC_TLS_DTPMOD64 referring to a local symbol.
Since sym_map is left as null in elf_machine_rela for the special local
symbol case, the relocation handling thinks it has nothing to do, and so
the module gets left as 0. Havoc then ensues when the variable in question
is accessed.
Before this fix, the main_local_gold program would receive a SIGBUS on
sparc64, and SIGSEGV on powerpc32. With this fix applied, that test now
passes like the rest of them.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
Assign sym_map to be map for local symbols, as TLS relocations
use sym_map to determine whether the symbol is defined and to
extract the TLS information.
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela): Likewise.
Joseph Myers [Thu, 19 Oct 2017 17:32:20 +0000 (17:32 +0000)]
Install correct bits/long-double.h for MIPS64 (bug 22322).
Similar to bug 21987 for SPARC, MIPS64 wrongly installs the ldbl-128
version of bits/long-double.h, meaning incorrect results when using
headers installed from a 64-bit installation for a 32-bit build. (I
haven't actually seen this cause build failures before its interaction
with bits/floatn.h did so - installed headers wrongly expecting
_Float128 to be available in a 32-bit configuration.)
This patch fixes the bug by moving the MIPS header to
sysdeps/mips/ieee754, which comes before sysdeps/ieee754/ldbl-128 in
the sysdeps directory ordering. (bits/floatn.h will need a similar
fix - duplicating the ldbl-128 version for MIPS will suffice - for
headers from a 32-bit installation to be correct for 64-bit builds.)
Tested with build-many-glibcs.py (compilers build for
mips64-linux-gnu, where there was previously a libstdc++ build failure
as at
<https://sourceware.org/ml/libc-testresults/2017-q4/msg00130.html>).
H.J. Lu [Sun, 22 Oct 2017 15:20:38 +0000 (08:20 -0700)]
x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.
Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:
This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases. With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.
On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises. On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.
H.J. Lu [Thu, 19 Oct 2017 15:47:38 +0000 (08:47 -0700)]
x86-64: Verify that _dl_runtime_resolve preserves vector registers
On x86-64, _dl_runtime_resolve must preserve the first 8 vector
registers. Add 3 _dl_runtime_resolve tests to verify that SSE,
AVX and AVX512 registers are preserved.
Refactor long double information into bits/long-double.h.
resulted in sparc32 configurations installing the ldbl-opt version of
bits/long-double.h instead of the intended
sysdeps/unix/sysv/linux/sparc version.
For sparc32 by itself, this is not a problem, since the ldbl-opt
version is correct for sparc32. However, both sparc32 and sparc64 are
supposed to install sets of headers that work for both of them, so
that a single sysroot, whichever order the libraries are built and
installed in, works for both. The effect of having the wrong version
installed is that you end up with a miscompiled sparc64 libstdc++
which fails glibc's configure tests for the C++ compiler.
This patch moves the header from sysdeps/unix/sysv/linux/sparc to
separate copies of the same file for sparc32 and sparc64, to ensure it
comes before ldbl-opt in the sysdeps directory ordering.
Tested with build-many-glibcs.py for sparc64-linux-gnu and
sparcv9-linux-gnu.
[BZ #21987]
* sysdeps/unix/sysv/linux/sparc/bits/long-double.h: Remove file
and copy to ...
* sysdeps/unix/sysv/linux/sparc/sparc32/bits/long-double.h:
... here.
* sysdeps/unix/sysv/linux/sparc/sparc64/bits/long-double.h:
... and here.
H.J. Lu [Mon, 11 Sep 2017 16:04:57 +0000 (09:04 -0700)]
Make copy of <bits/std_abs.h> from GCC 7 [BZ #21573]
<bits/std_abs.h> from GCC 7 will include /usr/include/stdlib.h from
"#include_next" (instead of stdlib/stdlib.h in the glibc source
directory), and this turns up as a make dependency. Also make a copy
of <bits/std_abs.h> to prevent it from including /usr/include/stdlib.h.
H.J. Lu [Mon, 11 Sep 2017 15:51:52 +0000 (08:51 -0700)]
string/stratcliff.c: Replace int with size_t [BZ #21982]
Fix GCC 7 errors when string/stratcliff.c is compiled with -O3:
stratcliff.c: In function ‘do_test’:
cc1: error: assuming signed overflow does not occur when assuming that (X - c) <= X is always true [-Werror=strict-overflow]
[BZ #21982]
* string/stratcliff.c (do_test): Declare size, nchars, inner,
middle and outer with size_t instead of int. Repleace %d and
%Zd with %zu in printf. Update "MAX (0, nchars - 128)" and
"MAX (outer, nchars - 64)" to support unsigned outer and
nchars. Also exit loop when outer == 0.
H.J. Lu [Thu, 31 Aug 2017 13:28:31 +0000 (06:28 -0700)]
Place $(elf-objpfx)sofini.os last [BZ #22051]
Since sofini.os terminates .eh_frame section, it should be placed last.
[BZ #22051]
* Makerules (build-module-helper-objlist): Filter out
$(elf-objpfx)sofini.os.
(build-shlib-objlist): Append $(elf-objpfx)sofini.os if it is
needed.
Carlos O'Donell [Sat, 29 Jul 2017 04:02:03 +0000 (00:02 -0400)]
mutex: Fix robust mutex lock acquire (Bug 21778)
65810f0ef05e8c9e333f17a44e77808b163ca298 fixed a robust mutex bug but
introduced BZ 21778: if the CAS used to try to acquire a lock fails, the
expected value is not updated, which breaks other cases in the loce
acquisition loop. The fix is to simply update the expected value with
the value returned by the CAS, which ensures that behavior is as if the
first case with the CAS never happened (if the CAS fails).
This is a regression introduced in the last release.
Tested on x86_64, i686, ppc64, ppc64le, s390x, aarch64, armv7hl.
Carlos O'Donell [Mon, 28 Aug 2017 13:04:39 +0000 (15:04 +0200)]
rwlock: Fix explicit hand-over (bug 21298)
Without this fix, the rwlock can fail to execute the explicit hand-over
in certain cases (e.g., empty critical sections that switch quickly between
read and write phases). This can then lead to errors in how __wrphase_futex
is accessed, which in turn can lead to deadlocks.
In 1e5834c38a22 ("Refactor Linux ipc_priv header") a different
approach to passing __IPC_64 as zero was created. Hppa kernel ABI
requires to oass __IPC_64 as zero since it does not set
CONFIG_ARCH_WANT_IPC_PARSE_VERSION in the kernel.
Checked on hppa-linux-gnu with some adjustments to avoid BZ#21016
(basically by removing hppa compat implementations and adjusting
required headers).
* sysdeps/unix/sysv/linux/hppa/ipc_priv.h: New file.
[BZ 20098]
* sysdeps/hppa/dl-fptr.c (_dl_read_access_allowed): New.
(_dl_lookup_address): Return address if it is not consistent with
being a linker defined function pointer. Likewise, return address
if address and function descriptor addresses are not accessible.
since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest. _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.
[BZ #21871]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
Aurelien Jarno [Thu, 3 Aug 2017 22:35:48 +0000 (22:35 +0000)]
Fix the return type of the getentropy stub
The return type of the getentropy stub is wrongly defined as ssize_t,
while both the <sys/random.h> header and the Linux implementation
define it as int. This patch fixes that.
1. Fix the results for negative subnormals by ignoring the signal when
normalizing the value.
2. Fix the output when the high part is a power of 2 and the low part
is a nonzero number with opposite sign. This fix is based on commit 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
After applying this patch, logbl() tests pass cleanly on POWER >= 7.
Tested on powerpc, powerpc64 and powerpc64le
[BZ #21280]
* sysdeps/powerpc/power7/fpu/s_logbl.c (__logbl): Ignore the
signal of subnormals and adjust the exponent of power of 2 down
when low part has opposite sign.
continue to work even if vector instructions are used in glibc which
require the ABI stack realignment.
__tls_get_addr_slow is added to handle the slow paths in the default
implementation of__tls_get_addr in elf/dl-tls.c. The new __tls_get_addr
calls __tls_get_addr_slow after realigning the stack. Internal calls
within ld.so go directly to the default implementation of __tls_get_addr
because they do not need stack realignment.
Florian Weimer [Wed, 14 Jun 2017 06:11:22 +0000 (08:11 +0200)]
i686: Add missing IS_IN (libc) guards to vectorized strcspn
Since commit d957c4d3fa48d685ff2726c605c988127ef99395 (i386: Compile
rtld-*.os with -mno-sse -mno-mmx -mfpmath=387), vector intrinsics can
no longer be used in ld.so, even if the compiled code never makes it
into the final ld.so link. This commit adds the missing IS_IN (libc)
guard to the SSE 4.2 strcspn implementation, so that it can be used from
ld.so in the future.
Ignore and remove LD_HWCAP_MASK for AT_SECURE programs (bug #21209)
The LD_HWCAP_MASK environment variable may alter the selection of
function variants for some architectures. For AT_SECURE process it
means that if an outdated routine has a bug that would otherwise not
affect newer platforms by default, LD_HWCAP_MASK will allow that bug
to be exploited.
To be on the safe side, ignore and disable LD_HWCAP_MASK for setuid
binaries.
Joseph Myers [Wed, 15 Mar 2017 17:32:46 +0000 (17:32 +0000)]
Fix test-math-vector-sincos.h aliasing.
x86_64 libmvec tests have been failing to build lately with GCC
mainline with -Wuninitialized errors, and Markus Trippelsdorf traced
this to an aliasing issue
<https://sourceware.org/ml/libc-alpha/2017-03/msg00169.html>.
This patch fixes the aliasing issue, so that the vectors-of-pointers
are initialized using a union instead of pointer casts. This also
fixes the testsuite build failures with GCC mainline.
Tested for x86_64 (full testsuite with GCC 6; testsuite build with GCC
mainline with build-many-glibcs.py).
* sysdeps/x86/fpu/test-math-vector-sincos.h (INIT_VEC_PTRS_LOOP):
Use a union when storing pointers.
(VECTOR_WRAPPER_fFF_2): Do not take address of integer vector and
cast result when passing to INIT_VEC_PTRS_LOOP.
(VECTOR_WRAPPER_fFF_3): Likewise.
(VECTOR_WRAPPER_fFF_4): Likewise.
This patch fixes the regression added by 23d2770 for final address
overflow calculation. The subtraction of the considered size (16)
at line 120 is at wrong place, for sizes less than 16 subsequent
overflow check will not take in consideration an invalid size (since
the subtraction will be negative). Also, the lea instruction also
does not raise the carry flag (CF) that is used in subsequent jbe
to check for overflow.
The fix is to follow x86_64 logic from 3daef2c where the overflow
is first check and a sub instruction is issued. In case of resulting
negative size, CF will be set by the sub instruction and a NULL
result will be returned. The patch also add similar tests reported
in bug report.
Checked on i686-linux-gnu and x86_64-linux-gnu.
* string/test-memchr.c (do_test): Add BZ#21182 checks for address
near end of a page.
* sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
overflow calculation.
H.J. Lu [Fri, 28 Apr 2017 17:04:15 +0000 (10:04 -0700)]
x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations. Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.
Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.
[BZ #21396]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Prefer_No_AVX512 if AVX512ER isn't available.
* sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New.
(index_arch_Prefer_No_AVX512): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use
AVX512 version if Prefer_No_AVX512 is set.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk):
Likewise.
* sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk):
Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk):
Likewise.
* sysdeps/x86_64/multiarch/memset.S (memset): Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk):
Likewise.
H.J. Lu [Fri, 28 Apr 2017 17:03:09 +0000 (10:03 -0700)]
x86: Set Prefer_No_VZEROUPPER if AVX512ER is available
AVX512ER won't be implemented in any Xeon processors and will be in
all Xeon Phi processors. Don't check CPU model number when setting
Prefer_No_VZEROUPPER for Xeon Phi. Instead, set Prefer_No_VZEROUPPER
if AVX512ER is available. It works with current and future Xeon Phi
and non-Xeon Phi processors.
H.J. Lu [Tue, 21 Mar 2017 17:59:31 +0000 (10:59 -0700)]
x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]
On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
the first 8 vector registers. The code layout is
if only %xmm0 - %xmm7 registers are used
preserve %xmm0 - %xmm7 registers
if only %ymm0 - %ymm7 registers are used
preserve %ymm0 - %ymm7 registers
preserve %zmm0 - %zmm7 registers
Branch predication always executes the fallthrough code path to preserve
%zmm0 - %zmm7 registers speculatively, even though only %xmm0 - %xmm7
registers are used. This leads to lower CPU frequency on Skylake
server. This patch changes the fallthrough code path to preserve
%xmm0 - %xmm7 registers instead:
if whole %zmm0 - %zmm7 registers are used
preserve %zmm0 - %zmm7 registers
if only %ymm0 - %ymm7 registers are used
preserve %ymm0 - %ymm7 registers
preserve %xmm0 - %xmm7 registers
Tested on Skylake server.
[BZ #21258]
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
Define only if _dl_runtime_resolve is defined to
_dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
Fallthrough to _dl_runtime_resolve_sse_vex.
Mike Frysinger [Thu, 16 Mar 2017 06:59:31 +0000 (23:59 -0700)]
posix_spawn: use a larger min stack for -fstack-check [BZ #21253]
When glibc is built with -fstack-check, trying to use posix_spawn can
lead to segfaults due to gcc internally probing stack memory too far.
The new spawn API will allocate a minimum of 1 page, but the stack
checking logic might probe a couple of pages. When it tries to walk
them, everything falls apart.
The gcc internal docs [1] state the default interval checking is one
page. Which means we need two pages (the current one, and the next
probed). No target currently defines it larger.
Further, it mentions that the default minimum stack size needed to
recover from an overflow is 4/8KiB for sjlj or 8/12KiB for others.
But some Linux targets (like mips and ppc) go up to 16KiB (and some
non-Linux targets go up to 24KiB).
Let's create each child with a minimum of 32KiB slack space to support
them all, and give us future breathing room.
No test is added as existing ones crash. Even a simple call is
enough to trigger the problem:
char *argv[] = { "/bin/ls", NULL };
posix_spawn(NULL, "/bin/ls", NULL, NULL, argv, NULL);
Call the right helper function when setting mallopt M_ARENA_MAX (BZ #21338)
Fixes a typo introduced in commit be7991c0705e35b4d70a419d117addcd6c627319. This caused
mallopt(M_ARENA_MAX) as well as the environment variable
MALLOC_ARENA_MAX to not work as intended because it set the
wrong internal parameter.
[BZ #21338]
* malloc/malloc.c: Call do_set_arena_max for M_ARENA_MAX
instead of incorrect do_set_arena_test