Using the glibc microbenchmark suite
====================================

The glibc microbenchmark suite automatically generates code for specified
functions, builds and calls them repeatedly for given inputs to give some
basic performance properties of the function.

Running the benchmark:
=====================

The benchmark needs Python 2.7 or later in addition to the
dependencies required to build the GNU C Library.  One may run the
benchmark by invoking make as follows:

  $ make bench

This runs each function for 10 seconds and appends its output to
benchtests/bench.out.  To ensure that the tests are rebuilt, one could run:

  $ make bench-clean

The duration of each test can be configured by setting the BENCH_DURATION
variable in the call to make.  One should run `make bench-clean' before
changing BENCH_DURATION.

  $ make BENCH_DURATION=1 bench

The benchmark suite does function call measurements using architecture-specific
high precision timing instructions whenever available.  When such support is
not available, it uses clock_gettime (CLOCK_MONOTONIC).

On x86 processors, the RDTSCP instruction provides more precise timing data
than the RDTSC instruction.  All x86 processors since 2010 support the
RDTSCP instruction.  One can force the benchmark to use RDTSCP by invoking
make as follows:

  $ make USE_RDTSCP=1 bench

One must run `make bench-clean' before changing the measurement method.

Running benchmarks on another target:
====================================

If the target where you want to run benchmarks is not capable of building the
code or you're cross-building, you could build and execute the benchmark in
separate steps.  On the build system run:

  $ make bench-build

and then copy the source and build directories to the target and run the
benchmarks from the build directory as usual:

  $ make bench

Make sure the copy preserves timestamps by using either rsync or scp -p;
otherwise the above command may try to build the benchmark again.  Benchmarks
that require generated code to be executed during the build are skipped when
cross-building.
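Put together, the cross-target flow looks something like the following
transcript.  The host name and directory layout are placeholders, not part of
the build system:

```
  # On the build system:
  $ make bench-build

  # Copy the source and build directories to the target, preserving
  # timestamps ("target" and the paths are placeholders):
  $ rsync -a src/ build/ target:glibc/

  # On the target, from the copied build directory:
  $ make bench
```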

Running subsets of benchmarks:
==============================

To run only a subset of benchmarks, one may invoke make as follows:

  $ make bench BENCHSET="bench-pthread bench-math malloc-thread"

where BENCHSET may be a space-separated list of the following values:

  bench-math
  bench-pthread
  bench-string
  string-benchset
  wcsmbs-benchset
  stdlib-benchset
  stdio-common-benchset
  math-benchset
  malloc-thread

Adding a function to benchtests:
===============================

If the name of the function is `foo', then the following procedure should allow
one to add `foo' to the bench tests:

- Append the function name to the bench variable in the Makefile.

- Make a file called `foo-inputs' to provide the definition and input for the
  function.  The file should have some directives telling the parser script
  about the function and then one input per line.  Directives are lines that
  have a special meaning for the parser and they begin with two hashes '##'.
  The following directives are recognized:

  - args: This should be assigned a colon-separated list of types of the input
    arguments.  This directive may be skipped if the function does not take any
    inputs.  One may identify output arguments by nesting them in <>.  The
    generator will create variables to get outputs from the calling function.
  - ret: This should be assigned the type that the function returns.  This
    directive may be skipped if the function does not return a value.
  - includes: This should be assigned a comma-separated list of headers that
    need to be included to provide declarations for the function and types it
    may need (specifically, this includes using "#include <header>").
  - include-sources: This should be assigned a comma-separated list of source
    files that need to be included to provide definitions of global variables
    and functions (specifically, this includes using "#include "source").
    See pthread_once-inputs and pthread_once-source.c for an example of how
    to use this to benchmark a function that needs state across several calls.
  - init: Name of an initializer function to call to initialize the benchtest.
  - name: See the following section for instructions on how to use this
    directive.

Lines beginning with a single hash '#' are treated as comments.  See
pow-inputs for an example of an input file.
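As an illustration, an input file for a hypothetical function
double foo (double, double) might look like the following sketch; the
function, directives' values and input values are invented, and pow-inputs
shows the real format:

```
##args: double:double
##ret: double
##includes: math.h
# A comment line describing the inputs below.
0.9, 2.3
1.5, 0.8
42.0, 0.5
```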

Multiple execution units per function:
=====================================

Some functions have distinct performance characteristics for different input
domains and it may be necessary to measure those separately.  For example, some
math functions perform computations at different levels of precision (64-bit vs
240-bit vs 768-bit) and mixing them does not give a very useful picture of the
performance of these functions.  One could separate inputs for these domains in
the same file by using the `name' directive that looks something like this:

  ##name: 240bit

See the pow-inputs file for an example of what such a partitioned input file
would look like.
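For instance, after the usual args/ret/includes directives, the partitioned
part of such a file might be laid out as in this sketch (the domain names and
values are invented; pow-inputs shows the real partitioning):

```
##name: 64bit
0.75, 0.0625
##name: 240bit
1.0000000000000020, 1.5
```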

It is also possible to measure throughput of a (partial) trace extracted from
a real workload.  In this case the whole trace is iterated over multiple times
rather than repeating every input multiple times.  This can be done via:

  ##name: workload-<name>

Benchmark Sets:
==============

In addition to standard benchmarking of functions, one may also generate
custom outputs for a set of functions.  This is currently used by string
function benchmarks where the aim is to compare performance between
implementations at various alignments and for various sizes.

To add a benchset for `foo':

- Add `foo' to the benchset variable.
- Write your bench-foo.c that prints out the measurements to stdout.
- On execution, a bench-foo.out is created in $(objpfx) with the contents of
  stdout.

Reading String Benchmark Results:
================================

Some of the string benchmark results are now in JSON to make it easier to read
in scripts.  Use the benchtests/scripts/compare_strings.py script to show the
results in a tabular format, generate graphs and more.  Run

  benchtests/scripts/compare_strings.py -h

for usage information.
158 | for usage information. |