Now that MEMCPY_OK_FOR_FWD_MEMMOVE should be defined in memcopy.h, there
is no need for a specialized powerpc memmove implementation. This patch
moves the define to the powerpc memcopy and cleans up its definition in
the powerpc code.
PowerPC: Align power7 memcpy using VSX to quadword
This patch changes power7 memcpy to use VSX instructions only when
memory is aligned to a quadword. This avoids unaligned kernel traps
on non-cacheable memory (for instance, memory-mapped I/O).
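As a rough illustration of the alignment test described above, the C sketch below checks whether pointers are quadword (16-byte) aligned before a vector path would be chosen. The function names, and the choice to test both source and destination, are assumptions made for illustration; the actual power7 memcpy is written in assembly and may structure the check differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when P is aligned to a 16-byte quadword.  */
static int
is_quadword_aligned (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Hypothetical predicate: allow the vector path only when both the
   destination and the source are quadword aligned.  */
static int
can_use_vector_copy (const void *dst, const void *src)
{
  return is_quadword_aligned (dst) && is_quadword_aligned (src);
}
```

A caller would fall back to scalar loads and stores whenever can_use_vector_copy returns zero.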
This patch adds an optimized memmove for POWER7/powerpc64.
Basically the idea is to use the memcpy for POWER7 on non-overlapping
memory regions and an optimized backward memcpy for memory regions
that overlap (similar to the idea of string/memmove.c).
The backward memcpy algorithm used is similar to the one used for memcpy
for POWER7, with adjustments done for alignment. The difference is that
memory is always aligned to 16 bytes before using VSX/altivec instructions.
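The dispatch between a forward and a backward copy can be sketched in portable C as follows. This is only an illustration of the idea: memmove_sketch, copy_forward and copy_backward are hypothetical names, and the byte loops stand in for the POWER7 memcpy and the vectorized backward copy described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the forward memcpy: copies from low to high addresses.  */
static void
copy_forward (unsigned char *dst, const unsigned char *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; ++i)
    dst[i] = src[i];
}

/* Stand-in for the backward copy: copies from high to low addresses.  */
static void
copy_backward (unsigned char *dst, const unsigned char *src, size_t n)
{
  while (n > 0)
    {
      --n;
      dst[n] = src[n];
    }
}

static void *
memmove_sketch (void *dst, const void *src, size_t n)
{
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;

  /* Unsigned wrap-around makes this true when dst is below src, or when
     dst is at least n bytes above src.  In both cases a forward copy
     never overwrites source bytes that are still to be read.  */
  if (d - s >= n)
    copy_forward (dst, src, n);
  else
    copy_backward (dst, src, n);
  return dst;
}
```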
This patch removes the powerpc specific logic in memmove and instead
includes the default implementation with MEMCPY_OK_FOR_FWD_MEMMOVE defined.
This leads to increased performance, since the constraints to use
memcpy in powerpc code are too restrictive and memcpy can be used for
any forward memmove.
This patch adds an ifunc power7 strcat symbol that uses the logic in
sysdeps/powerpc/strcat.c but calls the power7 strlen/strcpy symbols instead
of the default ones.
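The underlying idea of building strcat from strlen and strcpy can be sketched as below. This is a minimal illustration using the standard library functions; the name strcat_sketch is hypothetical, and the actual file presumably routes the two calls to the power7 variants through macros rather than calling strlen and strcpy directly.

```c
#include <string.h>

/* Append SRC to DEST by locating the end of DEST and copying SRC there,
   terminator included.  */
static char *
strcat_sketch (char *dest, const char *src)
{
  strcpy (dest + strlen (dest), src);
  return dest;
}
```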
Linux commit dd58a092c4202f2bd490adab7285b3ff77f8e467 added the
PPC_FEATURE2_VEC_CRYPTO auxv capability to indicate whether the
hardware supports vector crypto instructions. This patch
adds its definition to the powerpc hwcap bits.
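A program could test such a capability bit roughly as follows. This sketch assumes a Linux system whose C library provides getauxval and AT_HWCAP2; the helper names are hypothetical, and the fallback value 0x02000000 is the one used by the Linux powerpc headers, defined here only when the system headers do not already provide the macro.

```c
#include <sys/auxv.h>

/* Fallback for systems whose headers lack the definition.  */
#ifndef PPC_FEATURE2_VEC_CRYPTO
# define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

/* Hypothetical helper: test the bit in an AT_HWCAP2 value.  */
static int
hwcap2_has_vec_crypto (unsigned long hwcap2)
{
  return (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) != 0;
}

/* Hypothetical helper: query the running process's auxiliary vector.  */
static int
cpu_has_vec_crypto (void)
{
  return hwcap2_has_vec_crypto (getauxval (AT_HWCAP2));
}
```

On non-powerpc machines the bit has no defined meaning, so cpu_has_vec_crypto is only useful in powerpc builds.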
This patch fixes a few failures in nearbyintl() where the fraction part is
close to 0.5. The new tests added report a few extra failures in
nearbyint_downward and nearbyint_towardzero, which is a known issue.
The optimization is achieved on 8-byte aligned strings with doubleword
comparison using the cmpb instruction. On unaligned strings, loop
unrolling is applied for a POWER7 gain.
This patch fixes the optimized ppc64/power7 strncat strlen call for
static build without ifunc enabled. The strlen symbol to call in such
a situation is just strlen, instead of __GI_strlen (since the __GI_
alias is only created for shared objects).
This patch fixes some powerpc32 and powerpc64 builds with
--disable-multi-arch option along with different --with-cpu=powerN.
It cleans up the Implies directories by removing the multiarch
folder for non-multiarch configs and also fixes two assembly
implementations: powerpc64/power7/strncat.S, which calls the
wrong strlen; and power8/fpu/s_isnan.S, which misses the hidden_def and
weak_alias directives.
This patch fixes an issue similar to
736c304a1ab4cee36a2f3343f1698bc0abae4608, where for PPC32, if the symbol
is defined as hidden (memchr), the compiler will create a local branch
(symbol@local) and the linker will not create the PLT call required to
make the ifunc work. It changes the default hidden symbol (__GI_memchr)
to the default memchr symbol for powerpc32 (__memchr_ppc32).
PowerPC: strncpy/stpncpy optimization for PPC64/POWER7
The optimization is achieved by the following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding
[gain by reduction of branch penalty].
> zero padding done by calling optimized memset
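The split between copying and zero padding can be sketched in C as below. It is an illustration only: strncpy_sketch is a hypothetical name, and the standard memcpy and memset stand in for the aligned, unrolled copy loop and the optimized memset mentioned in the list above.

```c
#include <stddef.h>
#include <string.h>

static char *
strncpy_sketch (char *dest, const char *src, size_t n)
{
  size_t len = 0;

  /* Length of SRC, capped at N.  */
  while (len < n && src[len] != '\0')
    ++len;

  memcpy (dest, src, len);
  /* Pad the remainder with zeros in a single memset call.  */
  memset (dest + len, 0, n - len);
  return dest;
}
```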
This patch changes the default symbol redirection for internal calls of
memcpy, memset, memchr, and strlen to the IFUNC resolved ones. The
performance improvement is noticeable in algorithms that use these
symbols extensively, like the regex functions.
Alan Modra [Wed, 16 Apr 2014 10:03:32 +0000 (19:33 +0930)]
Correct IBM long double frexpl.
Besides fixing the bugzilla, this also fixes corner-cases where the high
and low double differ greatly in magnitude, and handles a denormal
input without resorting to a fp rescale.
PowerPC: Fix nearbyint/nearbyintf result for FE_DOWNWARD
This patch fixes the powerpc32 optimized nearbyint/nearbyintf bogus
results for the FE_DOWNWARD rounding mode. This is due to the wrong
instruction sequence used in the rounding calculation (two subtractions
instead of an addition and a subtraction).
Alan Modra [Wed, 2 Apr 2014 03:16:19 +0000 (13:46 +1030)]
Correct IBM long double nextafterl.
Fix for values near a power of two, and some tidies.
[BZ #16739]
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c (__nextafterl): Correct
output when value is near a power of two. Use int64_t for lx and
remove casts. Use decimal rather than hex exponent constants.
Don't use long double multiplication when double will suffice.
* math/libm-test.inc (nextafter_test_data): Add tests.
* NEWS: Add 16739 and 16786 to bug list.
This patch adds an optimized strpbrk for POWER7 by using a different
algorithm than the default implementation: it constructs a table based on
the 'accept' argument and uses this table to check for any occurrence in
the input string. The idea is similar to what x86_64 uses.
For PowerPC some tunings were added, such as unrolled loops and memory
clearing using VSX instructions.
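A portable C sketch of the table-based approach is shown below. It illustrates the algorithm only: strpbrk_sketch is a hypothetical name, and the memset and the simple scan loop stand in for the VSX table clear and the unrolled loop of the assembly version.

```c
#include <stddef.h>
#include <string.h>

static char *
strpbrk_sketch (const char *s, const char *accept)
{
  unsigned char table[256];
  const unsigned char *p;

  /* Mark every byte value that appears in ACCEPT.  */
  memset (table, 0, sizeof table);
  for (p = (const unsigned char *) accept; *p != '\0'; ++p)
    table[*p] = 1;

  /* One table lookup per input byte, independent of ACCEPT's length.  */
  for (p = (const unsigned char *) s; *p != '\0'; ++p)
    if (table[*p])
      return (char *) p;
  return NULL;
}
```

Building the table costs a fixed amount up front, so the approach pays off mainly when the input string is long or 'accept' has many characters.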
This patch adds an optimized strcspn for POWER7 by using a different
algorithm than the default implementation: it constructs a table based on
the 'accept' argument and uses this table to check for any occurrence
in the input string. The idea is similar to what x86_64 uses.
For PowerPC some tunings were added, such as unrolled loops and aligning
the stack memory for the table to 16 bytes (so the VSX clear can run
without alignment issues).
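The stack table alignment can be illustrated in C11 with _Alignas, as in the sketch below. The name strcspn_sketch is hypothetical, the second parameter is called reject here following the standard prototype, and marking the terminator in the table is a choice made for this sketch rather than something the text above states.

```c
#include <stddef.h>
#include <string.h>

static size_t
strcspn_sketch (const char *s, const char *reject)
{
  /* 16-byte aligned so that a vectorized clear could use aligned stores;
     memset stands in for that clear here.  */
  _Alignas (16) unsigned char table[256];
  const unsigned char *p;

  memset (table, 0, sizeof table);
  /* Treat the terminator as a stop byte so the scan needs one test.  */
  table[0] = 1;
  for (p = (const unsigned char *) reject; *p != '\0'; ++p)
    table[*p] = 1;

  for (p = (const unsigned char *) s; !table[*p]; ++p)
    continue;
  return (size_t) (p - (const unsigned char *) s);
}
```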
PowerPC: remove wrong roundl implementation for PowerPC64
The roundl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_roundl.S)
returns wrong results for some inputs where the first double is an exact
integer and the precision is determined by the second long double.
Checking the implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f, which fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
Just removing the implementation and making the build select
sysdeps/ieee754/ldbl-128ibm/s_roundl.c instead fixes the failing math.
PowerPC: remove wrong nearbyintl implementation for PPC64
The nearbyintl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_nearbyintl.S)
returns wrong results for some inputs where the first double is an exact
integer and the precision is determined by the second long double.
Checking the implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f, which fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
Just removing the implementation and making the build select
sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c instead fixes the failing
math.
PowerPC: remove wrong ceill implementation for PowerPC64
The ceill assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_ceill.S)
returns wrong results for some inputs where the first double is an exact
integer and the precision is determined by the second long double.
Checking the implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d401698a58cf7da150d9cce769fa6679ba5f, which fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
Just removing the implementation and making the build select
sysdeps/ieee754/ldbl-128ibm/s_ceill.c instead fixes the failing math.
PowerPC: Fix bzero definition for static libc for PPC32
This patch fixes an issue for powerpc32-fpu static build which fails
with a 'bzero' undefined reference. This patch adds a bzero ifunc selector
for static builds and fixes the '__bzero_ppc' reference to the default
memset symbol (since the static memset build does not provide an ifunc
selector).
PowerPC: Fix bzero definition for static libc for PPC64
This patch fixes an issue for the powerpc64[le] static build where __bzero
is defined in multiple places (memset-ppc64.o and bzero.o). It is now
defined only in bzero.o, and memset-ppc64.o only defines __bzero_ppc for
both the dynamic and static library.
The optimization is achieved by the following techniques:
> hashing of needle.
> hashing avoids scanning of duplicate entries in needle across the string.
> initializing the hash table with Vector instructions (VSX) by quadword access.
> unrolling when scanning for character in string across hash table.
The optimization is achieved by the following techniques:
1. Doubleword aligned memory access and compares using
cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache miss.
This patch optimizes strrchr() for ppc64. It uses aligned memory
access along with cmpb instruction and CPU prefetch to avoid
cache misses for speed improvement.
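For readers unfamiliar with cmpb, the sketch below emulates its byte-wise comparison in portable C and uses it to look for a byte inside one doubleword. The function names are hypothetical, and the loop only models the result of the instruction; real code would issue the single cmpb instruction on an aligned doubleword load.

```c
#include <stdint.h>

/* Emulate cmpb: each result byte is 0xff where the corresponding bytes
   of A and B are equal, and 0x00 where they differ.  */
static uint64_t
cmpb_emul (uint64_t a, uint64_t b)
{
  uint64_t result = 0;
  int i;

  for (i = 0; i < 8; ++i)
    {
      uint64_t mask = (uint64_t) 0xff << (8 * i);
      if ((a & mask) == (b & mask))
        result |= mask;
    }
  return result;
}

/* Nonzero mask when any byte of WORD equals C: replicate C into all
   eight bytes and compare the whole doubleword at once.  */
static uint64_t
match_byte (uint64_t word, unsigned char c)
{
  return cmpb_emul (word, (uint64_t) c * UINT64_C (0x0101010101010101));
}
```

A string routine can apply match_byte once for the searched character and once for the terminator, then inspect the two masks to locate the match positions within the doubleword.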
This patch adds an optimized llround/llroundf implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
This patch adds an optimized llrint/llrintf implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
This patch adds an optimized finite/finitef implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
This patch adds an optimized isinf/isinff implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
This patch adds an optimized isnan/isnanf implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
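The classification functions above work on the integer bit pattern of the floating-point value, which is why a cheap register move matters. The C sketch below shows the bit tests for IEEE 754 binary64 values; the function names are hypothetical, and memcpy is the portable way to express the move that the assembly performs with a single instruction.

```c
#include <stdint.h>
#include <string.h>

/* Return the bits of X with the sign bit cleared.  */
static uint64_t
abs_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits & UINT64_C (0x7fffffffffffffff);
}

/* NaN: exponent all ones and a nonzero mantissa.  */
static int
isnan_bits (double x)
{
  return abs_bits (x) > UINT64_C (0x7ff0000000000000);
}

/* Infinity: exponent all ones and a zero mantissa.  */
static int
isinf_bits (double x)
{
  return abs_bits (x) == UINT64_C (0x7ff0000000000000);
}

/* Finite: exponent not all ones.  */
static int
finite_bits (double x)
{
  return abs_bits (x) < UINT64_C (0x7ff0000000000000);
}
```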
Carlos O'Donell [Thu, 6 Feb 2014 16:12:48 +0000 (11:12 -0500)]
BZ #16529: Fix pedantic warning with netinet/in.h.
When compiling with pedantic the following warning is seen:
gcc -Wall -pedantic -O0 -o test test.c
In file included from test.c:3:0:
/path/inet/netinet/in.h:111:21: warning: comma at end of \
enumerator list [-Wpedantic]
IPPROTO_MH = 135, /* IPv6 mobility header. */
^
It is valid C99 to have a trailing comma after the last item in
an enumeration. However it is not valid C90. If possible glibc
attempts to keep all headers C90 + long long without requiring
C99 features. In this case it's easy to fix the headers and it
removes the warning seen with -pedantic.
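The shape of such a fix can be shown with a small enumeration. The enum and its names below are hypothetical and only mirror the value from the warning above; the point is that the last enumerator carries no trailing comma, which both C90 and C99 accept.

```c
/* Hypothetical enumeration: the final enumerator has no trailing comma,
   so -pedantic stays quiet in C90 mode as well.  */
enum example_proto
  {
    EXAMPLE_PROTO_FIRST = 1,
    EXAMPLE_PROTO_MH = 135    /* Last item: no comma after it.  */
  };
```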
Carlos O'Donell [Wed, 5 Feb 2014 15:10:34 +0000 (10:10 -0500)]
Fix tst-setgetname for Linux kernels < 2.6.33.
Support for /proc/self/task/$tid/comm was added in Linux 2.6.33.
Since the test tst-setgetname relies on this functionality to operate,
we must skip the test on kernels < 2.6.33. We wrap the checks with
__ASSUME_PROC_PID_TASK_COMM so that in the future, when we move
arch_minimum_kernel to 2.6.33, we can remove this code.
Fix infinite loop in ftell when writing wide char data (BZ #16398)
ftell tries to avoid flushing the buffer when it is in write mode by
converting the wide char data and placing it into the binary buffer.
If the output buffer space is full and there is data to write, the
code reverts to flushing the buffer. This breaks when there is space
in the buffer but it is not enough to convert the next character in
the wide data buffer, due to which __codecvt_do_out returns a
__codecvt_partial status. In this case, ftell keeps running in an
infinite loop.
The fix here is to detect the __codecvt_partial status in addition to
checking if the buffer is full. I have also added a test case that
demonstrates the infinite loop.
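The failure mode and the fix can be modeled with a small, self-contained C sketch. Everything in it is hypothetical (the status enum, convert and drain are not libio functions); it only imitates a conversion loop that must flush both when the buffer is full and when the converter reports a partial result.

```c
#include <stddef.h>

/* Hypothetical status codes, loosely modeled on the __codecvt_* results
   named above.  */
enum conv_status { CONV_OK, CONV_PARTIAL };

/* Hypothetical converter: every input unit needs WIDTH output bytes.
   It converts as many whole units as fit and reports CONV_PARTIAL when
   the next unit does not fit in the remaining space.  */
static enum conv_status
convert (size_t *in_left, size_t *out_left, size_t width)
{
  while (*in_left > 0)
    {
      if (*out_left < width)
        return CONV_PARTIAL;
      *out_left -= width;
      --*in_left;
    }
  return CONV_OK;
}

/* Drain IN_LEFT units through a buffer of CAP bytes and return the
   number of flushes, or (size_t) -1 if a unit can never fit.  */
static size_t
drain (size_t in_left, size_t cap, size_t width)
{
  size_t out_left = cap;
  size_t flushes = 0;

  if (cap < width)
    return (size_t) -1;
  while (in_left > 0)
    {
      enum conv_status status = convert (&in_left, &out_left, width);

      /* Testing only out_left == 0 would loop forever whenever
         0 < out_left < width; the partial status must also flush.  */
      if (out_left == 0 || status == CONV_PARTIAL)
        {
          out_left = cap;    /* Stand-in for flushing the buffer.  */
          ++flushes;
        }
    }
  return flushes;
}
```

With a 10-byte buffer and 3-byte units, one byte is left over after three units; without the CONV_PARTIAL check the loop would make no further progress.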
This patch creates implicit rules to match the abifiles if
abilist-pattern is defined in the architecture Makefile. This allows
machine specific Makefiles to define different abifiles names
(for instance *-le.abilist for powerpc64le).
Carlos O'Donell [Mon, 3 Feb 2014 17:43:25 +0000 (12:43 -0500)]
Fix manual build warnings.
The mixed use of automatic and manual node next, previous,
and top specification causes warnings when building the manual.
This fix explicitly specifies each node's next, previous and top
values to fix the warnings.
Alexandre Oliva [Mon, 3 Feb 2014 19:17:59 +0000 (17:17 -0200)]
* manual/threads.texi (pthread_key_create, pthread_key_delete,
pthread_getspecific, pthread_setspecific): Format with
@deftypefun, and add @safety note.
* manual/signal.texi: Move comments that analyze the above
functions to their home place.