Using the glibc microbenchmark suite
====================================

The glibc microbenchmark suite automatically generates code for specified
functions, builds and calls them repeatedly for given inputs to give some
basic performance properties of the function.

Running the benchmark:
=====================

The benchmark needs Python 2.7 or later in addition to the
dependencies required to build the GNU C Library. One may run the
benchmark by invoking make as follows:

  $ make bench

This runs each function for 10 seconds and appends its output to
benchtests/bench.out. To ensure that the tests are rebuilt, one could run:

  $ make bench-clean

The duration of each test can be configured by setting the BENCH_DURATION
variable in the call to make. One should run `make bench-clean' before
changing BENCH_DURATION.

  $ make BENCH_DURATION=1 bench

The benchmark suite does function call measurements using architecture-specific
high precision timing instructions whenever available. When such support is
not available, it uses clock_gettime (CLOCK_PROCESS_CPUTIME_ID). One can force
the benchmark to use clock_gettime by invoking make as follows:

  $ make USE_CLOCK_GETTIME=1 bench

Again, one must run `make bench-clean' before changing the measurement method.

Running benchmarks on another target:
====================================

If the target where you want to run benchmarks is not capable of building the
code or you're cross-building, you could build and execute the benchmark in
separate steps. On the build system run:

  $ make bench-build

and then copy the source and build directories to the target and run the
benchmarks from the build directory as usual:

  $ make bench

Make sure the copy preserves timestamps by using either rsync or scp -p;
otherwise the above command may try to build the benchmark again. Benchmarks
that require generated code to be executed during the build are skipped when
cross-building.
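
For instance (the directory names and paths here are illustrative), rsync
invocations such as the following preserve timestamps, since -a implies -t:

  $ rsync -a srcdir/ target:/path/to/srcdir/
  $ rsync -a builddir/ target:/path/to/builddir/
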
Adding a function to benchtests:
===============================

If the name of the function is `foo', then the following procedure should allow
one to add `foo' to the bench tests:

- Append the function name to the bench variable in the Makefile.

- Make a file called `foo-inputs' to provide the definition and inputs for the
  function. The file should have some directives telling the parser script
  about the function and then one input per line. Directives are lines that
  have a special meaning for the parser and they begin with two hashes '##'.
  The following directives are recognized:

  - args: This should be assigned a colon-separated list of types of the input
    arguments. This directive may be skipped if the function does not take any
    inputs. One may identify output arguments by nesting them in <>. The
    generator will create variables to get outputs from the calling function.
  - ret: This should be assigned the type that the function returns. This
    directive may be skipped if the function does not return a value.
  - includes: This should be assigned a comma-separated list of headers that
    need to be included to provide declarations for the function and types it
    may need; each is included in the generated code as "#include <header>".
  - include-sources: This should be assigned a comma-separated list of source
    files that need to be included to provide definitions of global variables
    and functions; each is included in the generated code as
    "#include "source"". See pthread_once-inputs and pthread_once-source.c
    for an example of how to use this to benchmark a function that needs
    state across several calls.
  - init: Name of an initializer function to call to initialize the benchtest.
  - name: See the following section for instructions on how to use this
    directive.

Lines beginning with a single hash '#' are treated as comments. See
pow-inputs for an example of an input file.
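
As a rough illustration, an input file for a hypothetical function `foo'
taking two doubles and returning a double might look like the following
(the values are made up, and the comma-separated arguments per line follow
the style of pow-inputs):

  ##args: double:double
  ##ret: double
  ##includes: math.h
  # This is a comment; each line below is one input.
  0.5, 2.0
  1.0, 100.0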

Multiple execution units per function:
=====================================

Some functions have distinct performance characteristics for different input
domains and it may be necessary to measure those separately. For example, some
math functions perform computations at different levels of precision (64-bit vs
240-bit vs 768-bit) and mixing them does not give a very useful picture of the
performance of these functions. One could separate inputs for these domains in
the same file by using the `name' directive that looks something like this:

  ##name: 240bit

See the pow-inputs file for an example of what such a partitioned input file
would look like.
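
For instance (illustrative names and values), appending sections like these
to the input file sketched above would split the measurements into two
domains:

  ##name: 64bit
  1.0, 2.0
  ##name: 240bit
  0x1.000000000000001p0, 1.5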

It is also possible to measure throughput of a (partial) trace extracted from
a real workload. In this case the whole trace is iterated over multiple times
rather than repeating every input multiple times. This can be done via:

  ##name: workload-<name>

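For example (a hypothetical trace), the following section would be timed by
iterating over the whole list of inputs rather than repeating each input:

  ##name: workload-random
  0.5, 2.0
  1.3, 0.25
  2.0, 10.0
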
Benchmark Sets:
==============

In addition to standard benchmarking of functions, one may also generate
custom outputs for a set of functions. This is currently used by string
function benchmarks where the aim is to compare performance between
implementations at various alignments and for various sizes.

To add a benchset for `foo':

- Add `foo' to the benchset variable.
- Write your bench-foo.c that prints out the measurements to stdout (a
  minimal sketch is shown after this list).
- On execution, a bench-foo.out is created in $(objpfx) with the contents of
  stdout.
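
The sketch below is a minimal, self-contained idea of such a program rather
than a copy of an actual benchset source; real benchset sources in
benchtests/ use the suite's shared timing and output helpers, and the
function measured here (strlen), the buffer size, and the iteration count
are arbitrary choices:

  /* Hypothetical bench-foo.c: time strlen on a fixed buffer and print
     the result to stdout, which ends up in bench-foo.out.  */
  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  int
  main (void)
  {
    enum { ITERS = 1000000 };
    static char buf[4096];
    memset (buf, 'a', sizeof (buf) - 1);   /* NUL-terminated run of 'a'.  */

    volatile size_t sink;                  /* Keep calls from being
                                              optimized away.  */
    struct timespec start, end;

    clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &start);
    for (int i = 0; i < ITERS; i++)
      sink = strlen (buf);
    clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
                + (end.tv_nsec - start.tv_nsec);
    printf ("strlen (4 KiB buffer): %.2f ns per call\n", ns / ITERS);
    return 0;
  }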

Reading String Benchmark Results:
================================

Some of the string benchmark results are now in JSON to make it easier to read
in scripts. Use the benchtests/scripts/compare_strings.py script to show the
results in a tabular format, generate graphs and more. Run

  $ benchtests/scripts/compare_strings.py -h

for usage information.