PowerPC - Optimization for str[n]casecmp functions
This patch provides a throughput boost for the strcasecmp function
(25% on ppc32 and 40% on ppc64) and for strncasecmp (15% on both ppc32
and ppc64) on POWER7. The optimization is done by manually
(strcasecmp) or automatically (strncasecmp) unrolling the test loop
to avoid CPU stalls caused by a test followed by a load.
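
The unrolling idea can be sketched in portable C. This is a minimal
illustration only, not the POWER7 assembly from the patch: the name
sketch_strcasecmp is hypothetical, the unroll factor of two is an
assumption, and tolower () stands in for whatever case-folding table
the real code uses. Each iteration handles two characters, and the
loads for both strings are issued together before the compare.

```c
#include <ctype.h>

/* Hypothetical sketch: case-insensitive compare, loop unrolled by two.
   Returns <0, 0 or >0 like strcasecmp.  */
int
sketch_strcasecmp (const char *s1, const char *s2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  for (;;)
    {
      /* First character pair: both loads happen before the test.  */
      int a0 = tolower (p1[0]);
      int b0 = tolower (p2[0]);
      if (a0 != b0 || a0 == 0)
        return a0 - b0;

      /* Second character pair.  In C this load has to stay after the
         NUL check above so that we never read past the terminator.  */
      int a1 = tolower (p1[1]);
      int b1 = tolower (p2[1]);
      if (a1 != b1 || a1 == 0)
        return a1 - b1;

      p1 += 2;
      p2 += 2;
    }
}
```

Hand-written assembly has more freedom than C to hoist the second load
above the first test, which is where the stall avoidance described
above comes from.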
This patch provides a throughput boost for the nearbyint[f] functions
on POWER. On POWER7, it speeds up nearbyintf by 5x (ppc32) to 6x
(ppc64) and nearbyint by 2.5x to 5x. On POWER6, it speeds up
nearbyintf by up to 2x (ppc64) and nearbyint by up to 4x.
PowerPC: Arithmetic function optimizations for POWER
This patch creates inline assembly functions that use intrinsic PPC
floating point instructions when the platform supports them but rely on
the internal GLIBC functions when the instructions are not implemented
(for instance, on POWER4).
Alan Modra [Fri, 19 Aug 2011 16:39:38 +0000 (11:39 -0500)]
Fix profiling on powerpc32 secure-plt shared libs and PIEs
This patch moves the ppc32 _mcount to libc_shared.a, fixing a
long-standing bug with profiling of secure-plt shared libraries and
PIEs. The problem is that a ppc32 PIC PLT call stub uses r30 (GOT
pointer) to load the function address from the PLT, r30 being set up
in the function prologue, but _mcount is called before the function
prologue. So chances are good that r30 will be pointing to the
executable GOT when trying to call _mcount in a shared lib function.
A similar problem can occur in a PIE if a shared lib calls a function
in the executable.
Dave Flaherty [Fri, 19 Aug 2011 15:29:30 +0000 (10:29 -0500)]
Check for finite/infinity parameters in IBM Long Double 128 fmal ()
This patch addresses some IBM Long Double 128 fmal () test-ldouble.out
and test-ildoubl.out failures. If the ‘x’ and ‘y’ parameters are
finite values and ‘z’ is infinity, the result of fmal () should be ‘z’,
not NaN.
Will Schmidt [Thu, 18 Aug 2011 16:01:44 +0000 (11:01 -0500)]
Provide a throughput boost of approximately 15% to the 64-bit POWER7
strncmp code. The 32-bit throughput is not notably affected by this
change, so the change to the 32-bit code is made only to keep the two
files in sync with each other.
These POWER optimizations remove most of the FP->INT conversions in
hypot/hypotf and sinf/cosf, so that the computation is done with FP
operations instead. This eliminates Load-Hit-Store (LHS) stalls,
increasing performance of hypot/hypotf (by about 100% on POWER7 and
12% on POWER6) and sinf/cosf (by 80% on POWER7 and 30% on POWER6).
These optimizations remove most of the FP->INT conversions, so that
the computation is done with FP operations instead. This eliminates
Load-Hit-Store (LHS) stalls on POWER, increasing performance of
hypot/hypotf (about 50% on POWER7, 25% on POWER6, and 30% on POWER5)
and sinf/cosf (30% on POWER7, 15% on POWER6, and 10% on POWER5).
Ryan S. Arnold [Wed, 16 Feb 2011 19:04:16 +0000 (13:04 -0600)]
Prevent VSX type TOC ref in _dl_start before relocs are resolved.
Disable VSX instruction usage in rtld.c with -mno-vsx so that, under
-O3 optimization, a TOC reference isn't used for a zero constant in a
VSX register prior to resolution of relocations.
Ryan S. Arnold [Tue, 15 Feb 2011 15:50:09 +0000 (09:50 -0600)]
Prevent VSX type TOC ref in _dl_start before relocs are resolved.
Disable VSX instruction usage in rtld.c with -mno-vsx so that, under
-O3 optimization, a TOC reference isn't used for a zero constant in a
VSX register prior to resolution of relocations.
Ryan Arnold [Mon, 1 Nov 2010 20:38:51 +0000 (15:38 -0500)]
PowerPC64 doesn't need an executable stack and therefore doesn't need
PT_GNU_STACK to make the stack no-exec. This change abstracts the stack
permissions settings into a macro defined in a header.
Luis Machado [Wed, 30 Jun 2010 16:57:38 +0000 (09:57 -0700)]
powerpc: Re-work the Implies structure
This patch tries to organize the Implies files for ppc, since there are
a number of processors and most of them are backwards compatible with
each other.
Keeping in mind that we start the search for processor-specific files in
the sysdeps/unix/sysv/linux tree
(sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]/fpu to be
exact), we would like to grab any linux-specific code from that tree
prior to going through the other tree (sysdeps/powerpc/...).
For that, I removed the Implies files that were originally inside the
fpu directories and placed them in the non-fpu directories (still inside
the unix/sysv/linux tree). If no processor-specific/linux-specific files
could be found, we "imply" the other tree's (sysdeps/powerpc/...) fpu
directory for that specific processor AND also the non-fpu directory for
that same tree.
If, again, no processor-specific code is found, we read another Implies
file that will point to the most compatible processor that we should
grab code from, and so on, until we reach the power4 processor.
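
The fallback from one processor to the next can be pictured as a walk
along a chain. The C sketch below is a simplification under stated
assumptions: the table contents and the names implies_table,
next_implied and chain_length are hypothetical, and the real Implies
files name directories (fpu and non-fpu, in both trees) rather than
bare processor names. Only the "follow the chain until power4" idea is
taken from the description above.

```c
#include <stddef.h>
#include <string.h>

struct implies_entry
{
  const char *cpu;      /* processor whose Implies file is being read */
  const char *implies;  /* next most compatible processor, or NULL */
};

/* Illustrative chain only; the actual Implies files may differ.  */
static const struct implies_entry implies_table[] = {
  { "power7", "power6" },
  { "power6", "power5" },
  { "power5", "power4" },
  { "power4", NULL },
};

/* Return the processor implied by CPU, or NULL at the end of the chain
   (or when CPU is not in the table).  */
const char *
next_implied (const char *cpu)
{
  size_t n = sizeof implies_table / sizeof implies_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (implies_table[i].cpu, cpu) == 0)
      return implies_table[i].implies;
  return NULL;
}

/* Count how many processors are visited when starting at CPU.  */
int
chain_length (const char *cpu)
{
  int count = 0;
  while (cpu != NULL)
    {
      count++;
      cpu = next_implied (cpu);
    }
  return count;
}
```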
So, in summary, the Implies files will live inside these directories
now: