Commit | Line | Data |
---|---|---|
14e21f86 AP |
1 | =pod |
2 | ||
3 | =head1 NAME | |
4 | ||
eeac54ef | 5 | OPENSSL_ia32cap - the x86[_64] processor capabilities vector |
14e21f86 AP |
6 | |
7 | =head1 SYNOPSIS | |
8 | ||
eeac54ef | 9 | env OPENSSL_ia32cap=... <application> |
14e21f86 AP |
10 | |
11 | =head1 DESCRIPTION | |
12 | ||
eeac54ef AP |
13 | OpenSSL supports a range of x86[_64] instruction set extensions. These |
14 | extensions are denoted by individual bits in the capability vector returned | |
15 | by the processor in the EDX:ECX register pair after executing the CPUID | |
16 | instruction with EAX=1 as input (see Intel Application Note #241618). This | |
17 | vector is copied to memory upon toolkit initialization and used to choose | |
18 | between different code paths to provide optimal performance across a wide | |
19 | range of processors. At the time of this writing the following bits are | |
20 | significant: | |
b9064221 | 21 | |
e1271ac2 | 22 | =over 4 |
aafbe1cc | 23 | |
b9064221 AP |
24 | =item bit #4 denoting presence of Time-Stamp Counter; |
25 | ||
26 | =item bit #19 denoting availability of CLFLUSH instruction; | |
27 | ||
28 | =item bit #20, reserved by Intel, is used to choose among RC4 code paths; | |
29 | ||
30 | =item bit #23 denoting MMX support; | |
31 | ||
32 | =item bit #24, FXSR bit, denoting availability of XMM registers; | |
33 | ||
34 | =item bit #25 denoting SSE support; | |
35 | ||
36 | =item bit #26 denoting SSE2 support; | |
37 | ||
aafbe1cc MC |
38 | =item bit #28 denoting Hyperthreading, which is used to distinguish |
39 | cores with shared cache; | |
b9064221 | 40 | |
4bb90087 | 41 | =item bit #30, reserved by Intel, denotes specifically Intel CPUs; |
b9064221 AP |
42 | |
43 | =item bit #33 denoting availability of PCLMULQDQ instruction; | |
44 | ||
45 | =item bit #41 denoting SSSE3, Supplemental SSE3, support; | |
46 | ||
4bb90087 | 47 | =item bit #43 denoting AMD XOP support (forced to zero on non-AMD CPUs); |
b9064221 | 48 | |
2ac68bd6 AP |
49 | =item bit #54 denoting availability of MOVBE instruction; |
50 | ||
b9064221 AP |
51 | =item bit #57 denoting AES-NI instruction set extension; |
52 | ||
2ac68bd6 AP |
53 | =item bit #58, XSAVE bit, lack of which in combination with MOVBE is used |
54 | to identify Atom Silvermont core; | |
55 | ||
b9064221 AP |
56 | =item bit #59, OSXSAVE bit, denoting availability of YMM registers; |
57 | ||
58 | =item bit #60 denoting AVX extension; | |
162f677d | 59 | |
301799b8 AP |
60 | =item bit #62 denoting availability of RDRAND instruction; |
61 | ||
aafbe1cc MC |
62 | =back |
63 | ||
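The bit list above can be read as a lookup table. The following is a minimal sketch in Python, not part of OpenSSL, that decodes a 64-bit capability value into the bits documented above. The short labels (for example "RC4-path" and "Intel") and the function name decode_caps are assumptions made for this illustration only.

```python
# Shorthand labels for the documented bits of the EDX:ECX capability vector.
# The labels are illustrative; they are not identifiers used by OpenSSL.
CAP_BITS = {
    4: "TSC", 19: "CLFLUSH", 20: "RC4-path", 23: "MMX", 24: "FXSR",
    25: "SSE", 26: "SSE2", 28: "HTT", 30: "Intel", 33: "PCLMULQDQ",
    41: "SSSE3", 43: "XOP", 54: "MOVBE", 57: "AES-NI", 58: "XSAVE",
    59: "OSXSAVE", 60: "AVX", 62: "RDRAND",
}


def decode_caps(vector):
    """Return the labels of all documented bits set in `vector`, lowest bit first."""
    return [name for bit, name in sorted(CAP_BITS.items()) if (vector >> bit) & 1]


# A made-up value with bits #25, #26 and #57 set.
example = (1 << 25) | (1 << 26) | (1 << 57)
print(decode_caps(example))  # ['SSE', 'SSE2', 'AES-NI']
```

Bits that are not in the list are ignored by this sketch, so it says nothing about how the library treats them.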
2ac68bd6 AP |
64 | For example, in a 32-bit application context, clearing bit #26 at run-time |
65 | disables high-performance SSE2 code present in the crypto library, while | |
66 | clearing bit #24 disables SSE2 code operating on the 128-bit XMM register | |
67 | bank. You might have to do the latter if the target OpenSSL application is | |
68 | executed on an SSE2-capable CPU, but under control of an OS that does not | |
eeac54ef AP |
69 | enable XMM registers. Historically the address of the capability vector copy |
70 | was exposed to the application through OPENSSL_ia32cap_loc(), but not | |
71 | anymore. Now the only way to affect the capability detection is to set | |
9d22666e | 72 | the OPENSSL_ia32cap environment variable prior to target application start. To |
eeac54ef AP |
73 | give a specific example, on an Intel P4 processor 'env |
74 | OPENSSL_ia32cap=0x16980010 apps/openssl', or better yet 'env | |
75 | OPENSSL_ia32cap=~0x1000000 apps/openssl' would achieve the desired | |
76 | effect of clearing bit #24. Alternatively you can reconfigure the toolkit | |
2ac68bd6 | 77 | with the no-sse2 option and recompile. |
14e21f86 | 78 | |
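The two forms in the example above differ in how the value is interpreted. The sketch below models that difference under two assumptions: a plain value replaces the detected vector outright, and a value prefixed with '~' clears only the named bits. The detected value 0x17980010 is hypothetical, chosen so that it equals the first example with bit #24 still set. The function name apply_override is invented for this illustration.

```python
MASK64 = (1 << 64) - 1  # keep results within the 64-bit capability vector


def apply_override(detected, spec):
    """Model how one OPENSSL_ia32cap value could modify a detected vector.

    Assumption: '~N' clears the bits of N, a plain 'N' replaces the vector.
    """
    spec = spec.strip()
    if spec.startswith("~"):
        return detected & ~int(spec[1:], 0) & MASK64
    return int(spec, 0) & MASK64


detected = 0x17980010  # hypothetical detected value with bit #24 (FXSR) set

print(hex(apply_override(detected, "0x16980010")))   # 0x16980010
print(hex(apply_override(detected, "~0x1000000")))   # 0x16980010
```

The '~' form is "better yet" because it leaves every other detected bit untouched, whereas the literal value only fits the processor it was written for.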
eeac54ef AP |
79 | Less intuitive is clearing bit #28, or ~0x10000000 in "environment |
80 | variable" terms. The truth is that this bit is not copied from CPUID output | |
81 | verbatim, but is adjusted to reflect whether or not the data cache is | |
82 | actually shared between logical cores. This in turn affects the decision | |
83 | on whether or not expensive countermeasures against cache-timing attacks | |
84 | are applied, most notably in the AES assembler module. | |
c5cd28bd | 85 | |
2ac68bd6 AP |
86 | The capability vector is further extended with the EBX and ECX values |
87 | returned by CPUID with EAX=7 and ECX=0 as input. Significant bits are: | |
c5cd28bd | 88 | |
e1271ac2 | 89 | =over 4 |
aafbe1cc | 90 | |
c5cd28bd AP |
91 | =item bit #64+3 denoting availability of BMI1 instructions, e.g. ANDN; |
92 | ||
93 | =item bit #64+5 denoting availability of AVX2 instructions; | |
94 | ||
2ac68bd6 | 95 | =item bit #64+8 denoting availability of BMI2 instructions, e.g. MULX |
aafbe1cc | 96 | and RORX; |
c5cd28bd | 97 | |
2ac68bd6 AP |
98 | =item bit #64+16 denoting availability of AVX512F extension; |
99 | ||
c5cd28bd AP |
100 | =item bit #64+18 denoting availability of RDSEED instruction; |
101 | ||
aafbe1cc MC |
102 | =item bit #64+19 denoting availability of ADCX and ADOX instructions; |
103 | ||
569204be AP |
104 | =item bit #64+21 denoting availability of VPMADD52[LH]UQ instructions, |
105 | a.k.a. AVX512IFMA extension; | |
106 | ||
2ac68bd6 AP |
107 | =item bit #64+29 denoting availability of SHA extension; |
108 | ||
109 | =item bit #64+30 denoting availability of AVX512BW extension; | |
110 | ||
111 | =item bit #64+31 denoting availability of AVX512VL extension; | |
112 | ||
d6ee8f3d AP |
113 | =item bit #64+41 denoting availability of VAES extension; |
114 | ||
115 | =item bit #64+42 denoting availability of VPCLMULQDQ extension; | |
116 | ||
aafbe1cc | 117 | =back |
99ec4fdb | 118 | |
2ac68bd6 AP |
119 | To control this extended capability word use ':' as a delimiter when |
120 | setting the OPENSSL_ia32cap environment variable. For example, assigning | |
121 | ':~0x20' would disable AVX2 code paths, and ':0' would disable all | |
122 | post-AVX extensions. | |
123 | ||
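The ':' syntax above can be sketched as splitting the value into two fields, one per capability word. This is a minimal illustration, assuming that an empty field leaves its word unchanged, that '~N' clears the bits of N, and that a plain 'N' replaces the word. The names apply_field and apply_ia32cap and the detected values are hypothetical.

```python
MASK64 = (1 << 64) - 1  # each capability word is treated as 64 bits wide


def apply_field(detected, field):
    """Apply one ':'-separated field to one capability word (assumed semantics)."""
    field = field.strip()
    if not field:                      # empty field: leave the word as detected
        return detected
    if field.startswith("~"):          # '~N': clear the bits named by N
        return detected & ~int(field[1:], 0) & MASK64
    return int(field, 0) & MASK64      # plain 'N': replace the word


def apply_ia32cap(base, ext, value):
    """Return the (base, extended) words after applying an OPENSSL_ia32cap value."""
    first, _, second = value.partition(":")
    return apply_field(base, first), apply_field(ext, second)


base = 1 << 60                          # hypothetical: AVX (bit #60) detected
ext = (1 << 3) | (1 << 5) | (1 << 8)    # hypothetical: BMI1, AVX2, BMI2 detected

print([hex(w) for w in apply_ia32cap(base, ext, ":~0x20")])  # AVX2 bit cleared
print([hex(w) for w in apply_ia32cap(base, ext, ":0")])      # extended word zeroed
```

In both calls the first field is empty, so the base word, including the AVX bit, is left as detected; only the extended word changes.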
124 | It should be noted that whether or not some of the most "fancy" | |
125 | extension code paths are actually assembled depends on the assembler | |
126 | version in use. The base minimum of AES-NI/PCLMULQDQ, SSSE3 and SHA code | |
100ebb32 | 127 | paths is always assembled. Apart from that, minimum assembler version |
2ac68bd6 AP |
128 | requirements are summarized in the table below: |
129 | ||
130 | Extension | GNU as | nasm | llvm | |
131 | ------------+--------+--------+-------- | |
132 | AVX | 2.19 | 2.09 | 3.0 | |
133 | AVX2 | 2.22 | 2.10 | 3.1 | |
bf78883d | 134 | ADCX/ADOX | 2.23 | 2.10 | 3.3 |
569204be AP |
135 | AVX512 | 2.25 | 2.11.8 | see NOTES |
136 | AVX512IFMA | 2.26 | 2.11.8 | see NOTES | |
100ebb32 | 137 | VAES | 2.30 | 2.13.3 | |
569204be AP |
138 | |
139 | =head1 NOTES | |
140 | ||
141 | Even though AVX512 support was implemented in llvm 3.6, compilation of | |
142 | assembly modules apparently requires an explicit -march flag. But then the | |
143 | compiler generates processor-specific code, which in turn contradicts | |
144 | the very idea of run-time switching facilitated by the variable in | |
145 | question. Until the limitation is lifted, it's possible to work around | |
146 | the problem by making the build procedure use the following script: | |
147 | ||
148 | #!/bin/sh | |
149 | exec clang -no-integrated-as "$@" | |
150 | ||
151 | instead of the real clang. In that case it doesn't matter which clang | |
152 | version is used, as it is the GNU assembler version that will be checked. | |
2ac68bd6 | 153 | |
a085f43f PY |
154 | =head1 RETURN VALUES |
155 | ||
156 | Not available. | |
157 | ||
e2f92610 RS |
158 | =head1 COPYRIGHT |
159 | ||
48e5119a | 160 | Copyright 2004-2018 The OpenSSL Project Authors. All Rights Reserved. |
e2f92610 RS |
161 | |
162 | Licensed under the OpenSSL license (the "License"). You may not use | |
163 | this file except in compliance with the License. You can obtain a copy | |
164 | in the file LICENSE in the source distribution or at | |
165 | L<https://www.openssl.org/source/license.html>. | |
166 | ||
167 | =cut |