+2018-10-24 H.J. Lu <hongjiu.lu@intel.com>
+
+ * benchtests/Makefile (CPPFLAGS-nonlib): Add -DUSE_RDTSCP if
+ USE_RDTSCP is defined.
+ * sysdeps/x86/hp-timing.h (HP_TIMING_NOW): Use RDTSCP if
+ USE_RDTSCP is defined.
+
2018-10-23 Adhemerval Zanella <adhemerval.zanella@linaro.org>
* misc/tst-preadvwritev2-common.c (IOV_MAX): Define if not
# HP_TIMING if it is available.
ifdef USE_CLOCK_GETTIME
CPPFLAGS-nonlib += -DUSE_CLOCK_GETTIME
+else
+# On x86 processors, define USE_RDTSCP to measure function performance
+# with RDTSCP instead of RDTSC. Most x86 processors released since 2010
+# support the RDTSCP instruction.
+ifdef USE_RDTSCP
+CPPFLAGS-nonlib += -DUSE_RDTSCP
+endif
endif
DETAILED_OPT :=
Again, one must run `make bench-clean' before changing the measurement method.
+On x86 processors, the RDTSCP instruction provides more precise timing
+data than the RDTSC instruction because it waits for all previous
+instructions to execute before reading the time-stamp counter. Most x86
+processors released since 2010 support RDTSCP. One can force the
+benchmark to use RDTSCP by invoking make as follows:
+
+ $ make USE_RDTSCP=1 bench
+
+Likewise, one must run `make bench-clean' before switching to or from RDTSCP.
+
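To illustrate what the `-DUSE_RDTSCP` switch selects, here is a minimal sketch, assuming GCC or Clang and `<x86intrin.h>` (fine in a standalone program, even though glibc avoids that header for build speed). `read_ticks` and `time_loop` are hypothetical names, and the `CLOCK_MONOTONIC` fallback for non-x86 targets is an added assumption so the sketch still compiles elsewhere; it is not part of the patch.

```c
/* Sketch: pick the timing instruction at compile time, the way
   CPPFLAGS-nonlib += -DUSE_RDTSCP does for the benchtests.
   read_ticks and time_loop are hypothetical helpers, not glibc APIs.  */
#if !defined (__x86_64__) && !defined (__i386__)
# define _POSIX_C_SOURCE 199309L  /* For clock_gettime in the fallback.  */
#endif
#include <stdint.h>

#if defined (__x86_64__) || defined (__i386__)
# include <x86intrin.h>

/* Read the time-stamp counter.  */
static inline uint64_t
read_ticks (void)
{
# ifdef USE_RDTSCP
  unsigned int aux;        /* Receives IA32_TSC_AUX; ignored here.  */
  return __rdtscp (&aux);  /* Waits for earlier instructions to execute.  */
# else
  return __rdtsc ();       /* May read before earlier instructions finish.  */
# endif
}
#else
# include <time.h>

/* Assumed non-x86 fallback: nanoseconds from CLOCK_MONOTONIC.  */
static inline uint64_t
read_ticks (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}
#endif

/* Return the ticks spent in N iterations of a trivial loop.  */
static uint64_t
time_loop (unsigned long n)
{
  volatile unsigned long sink = 0;  /* volatile keeps the loop alive.  */
  uint64_t start = read_ticks ();
  for (unsigned long i = 0; i < n; i++)
    sink += i;
  uint64_t end = read_ticks ();
  return end - start;
}
```

Compiling with `cc -O2 -DUSE_RDTSCP` selects RDTSCP and omitting the flag selects RDTSC, mirroring how the benchtests Makefile forwards the flag. On x86 the result is in raw TSC ticks, not nanoseconds.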
Running benchmarks on another target:
====================================
NB: Use __builtin_ia32_rdtsc directly since including <x86intrin.h>
makes building glibc very slow. */
-# define HP_TIMING_NOW(Var) ((Var) = __builtin_ia32_rdtsc ())
+# ifdef USE_RDTSCP
+/* RDTSCP waits until all previous instructions have executed and all
+ previous loads are globally visible before reading the counter.
+ Later instructions may still begin executing before the read.
+ RDTSC doesn't wait until all previous instructions have executed
+ before reading the counter. */
+# define HP_TIMING_NOW(Var) \
+ (__extension__ ({ \
+ unsigned int __aux; \
+ (Var) = __builtin_ia32_rdtscp (&__aux); \
+ }))
+# else
+# define HP_TIMING_NOW(Var) ((Var) = __builtin_ia32_rdtsc ())
+# endif
# include <hp-timing-common.h>
#else
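The RDTSCP form of HP_TIMING_NOW above uses a statement expression so it can declare a scratch output for IA32_TSC_AUX and then throw it away. A minimal sketch, assuming x86 with GCC or Clang, copies that shape and adds a hypothetical variant that keeps the AUX value. `SKETCH_TIMING_NOW`, `SKETCH_TIMING_NOW_AUX`, `time_region`, and `sample_work` are illustrative names, and the scratch variable is renamed from `__aux` because double-underscore names are reserved for the implementation.

```c
/* Sketch for x86 with GCC or Clang.  SKETCH_TIMING_NOW copies the shape
   of the patch's RDTSCP HP_TIMING_NOW; SKETCH_TIMING_NOW_AUX,
   time_region and sample_work are hypothetical additions.  */
#include <stdbool.h>
#include <stdint.h>

#if !defined (__x86_64__) && !defined (__i386__)
# error "This sketch needs an x86 target for RDTSCP."
#endif

/* A statement expression lets the macro declare a scratch variable for
   the IA32_TSC_AUX output and still be used where an expression fits.  */
#define SKETCH_TIMING_NOW(Var) \
  (__extension__ ({ \
    unsigned int sketch_aux; \
    (Var) = __builtin_ia32_rdtscp (&sketch_aux); \
  }))

/* Keep the IA32_TSC_AUX value instead of discarding it.  */
#define SKETCH_TIMING_NOW_AUX(Var, Aux) \
  ((Var) = __builtin_ia32_rdtscp (&(Aux)))

struct region_time
{
  uint64_t ticks;   /* End minus start, in TSC ticks.  */
  bool same_aux;    /* IA32_TSC_AUX matched at both reads.  */
};

/* Hypothetical workload used only for demonstration.  */
static void
sample_work (void)
{
  volatile unsigned int sink = 0;
  for (unsigned int i = 0; i < 10000; i++)
    sink += i;
}

/* Time FN with two RDTSCP reads.  If the OS loads a per-CPU value into
   IA32_TSC_AUX, a mismatch suggests the thread changed CPUs.  */
static struct region_time
time_region (void (*fn) (void))
{
  uint64_t start, end;
  unsigned int aux_start, aux_end;
  SKETCH_TIMING_NOW_AUX (start, aux_start);
  fn ();
  SKETCH_TIMING_NOW_AUX (end, aux_end);
  return (struct region_time) { end - start, aux_start == aux_end };
}
```

Calling `time_region (sample_work)` returns the elapsed ticks plus the AUX comparison. Linux is generally understood to load a CPU identifier into IA32_TSC_AUX, but that is an assumption about the OS; the patch itself discards the value, so using it is a design choice beyond this change.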