Commit | Line | Data |
---|---|---|
2800db82 MF |
1 | .\" Written by Mike Frysinger <vapier@gentoo.org> |
2 | .\" | |
3 | .\" %%%LICENSE_START(PUBLIC_DOMAIN) | |
4 | .\" This page is in the public domain. | |
5 | .\" %%%LICENSE_END | |
6 | .\" | |
fb634bd8 | 7 | .\" Useful background: |
8635ed1b MK |
8 | .\" http://articles.manugarg.com/systemcallinlinux2_6.html |
9 | .\" https://lwn.net/Articles/446528/ | |
10 | .\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken | |
11 | .\" http://www.trilithium.com/johan/2005/08/linux-gate/ | |
fb634bd8 MK |
12 | .\" |
13 | .TH VDSO 7 2014-01-01 "Linux" "Linux Programmer's Manual" | |
2800db82 MF |
14 | .SH NAME |
15 | vDSO \- overview of the virtual ELF dynamic shared object | |
16 | .SH SYNOPSIS | |
17 | .B #include <sys/auxv.h> | |
18 | ||
19 | .B void *vdso = (void *) getauxval(AT_SYSINFO_EHDR); | |
20 | .SH DESCRIPTION | |
8635ed1b MK |
21 | The "vDSO" is a small shared library that |
22 | the kernel automatically maps into the | |
2800db82 | 23 | address space of all user-space applications. |
fb634bd8 | 24 | Applications usually do not need to concern themselves with these details |
2800db82 | 25 | as the vDSO is most commonly called by the C library. |
f6816de9 | 26 | This way you can code in the normal way using standard functions |
fb634bd8 MK |
27 | and the C library will take care |
28 | of using any functionality that is available via the vDSO. | |
2800db82 MF |
29 | |
30 | Why does the vDSO exist at all? | |
8635ed1b | 31 | There are some system calls the kernel provides that |
dd6b62ec | 32 | user-space code ends up using frequently, |
8635ed1b | 33 | to the point that such calls can dominate overall performance. |
fb634bd8 | 34 | This is due both to the frequency of the call and to the |
35432a03 | 35 | context-switch overhead that results |
2800db82 MF |
36 | from exiting user space and entering the kernel. |
37 | ||
8635ed1b MK |
38 | The rest of this documentation is geared toward the curious and/or |
39 | C library writers rather than general developers. | |
2800db82 MF |
40 | If you're trying to call the vDSO in your own application rather than using |
41 | the C library, you're most likely doing it wrong. | |
42 | .SS Example background | |
43 | Making system calls can be slow. | |
fb634bd8 MK |
44 | On x86 32-bit systems, you can trigger a software interrupt |
45 | .RI ( "int $0x80" ) | |
46 | to tell the kernel you wish to make a system call. | |
47 | However, this instruction is expensive: it goes through | |
48 | the full interrupt-handling paths | |
49 | in the processor's microcode as well as in the kernel. | |
50 | Newer processors have faster (but backward incompatible) instructions to | |
2800db82 MF |
51 | initiate system calls. |
52 | Rather than require the C library to figure out if this functionality is | |
8635ed1b | 53 | available at run time, |
fb634bd8 | 54 | the C library can use functions provided by the kernel in |
2800db82 MF |
55 | the vDSO. |
56 | ||
57 | Note that the terminology can be confusing. | |
fb634bd8 MK |
58 | On x86 systems, the vDSO function |
59 | used to determine the preferred method of making a system call is | |
60 | named "__kernel_vsyscall", but on x86_64, | |
8635ed1b MK |
61 | the term "vsyscall" also refers to an obsolete way to ask the kernel |
62 | what time it is or what CPU the caller is on. | |
2800db82 | 63 | |
fb634bd8 MK |
64 | One frequently used system call is |
65 | .BR gettimeofday (2). | |
66 | This system call is called both directly by user-space applications | |
67 | and indirectly by | |
2800db82 | 68 | the C library. |
8635ed1b MK |
69 | Think timestamps or timing loops or polling\(emall of these |
70 | frequently need to know what time it is right now. | |
71 | This information is also not secret\(emany application in any | |
72 | privilege mode (root or any unprivileged user) will get the same answer. | |
73 | Thus the kernel arranges for the information required to answer | |
74 | this question to be placed in memory the process can access. | |
fb634bd8 MK |
75 | Now a call to |
76 | .BR gettimeofday (2) | |
77 | changes from a system call to a normal function | |
2800db82 MF |
78 | call and a few memory accesses. |
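The difference described above can be observed with a rough measurement. The following is a minimal sketch, not a rigorous benchmark: the helper name avg_gettimeofday_ns is hypothetical, and it assumes that the C library routes gettimeofday() through the vDSO and that SYS_gettimeofday is defined on the architecture in use (it is not on every ABI).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed on the monotonic clock since *start. */
static double ns_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1e9 +
           (now.tv_nsec - start->tv_nsec);
}

/*
 * Hypothetical helper: average cost, in nanoseconds, of one
 * gettimeofday() call.  If use_raw_syscall is nonzero, every call is
 * forced through syscall(2) and therefore enters the kernel; otherwise
 * the C library wrapper is used, which may be served by the vDSO.
 * Returns a negative value on failure.
 */
double avg_gettimeofday_ns(long iterations, int use_raw_syscall)
{
    struct timeval tv;
    struct timespec start;

    if (iterations <= 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        long rc = use_raw_syscall
                      ? syscall(SYS_gettimeofday, &tv, NULL)
                      : gettimeofday(&tv, NULL);
        if (rc != 0)
            return -1.0;
    }
    return ns_since(&start) / (double) iterations;
}
```

Calling avg_gettimeofday_ns(1000000, 0) and avg_gettimeofday_ns(1000000, 1) and comparing the two results gives a feel for the overhead of entering the kernel; the actual numbers depend on the CPU, the kernel, and the clock source.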
79 | .SS Finding the vDSO | |
8635ed1b MK |
80 | The base address of the vDSO (if one exists) is passed by the kernel to |
81 | each program in the initial auxiliary vector (see | |
d3532647 | 82 | .BR getauxval (3)), |
fb634bd8 | 83 | via the |
2800db82 MF |
84 | .B AT_SYSINFO_EHDR |
85 | tag. | |
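As an illustration of the lookup described above, here is a minimal sketch that fetches the base address with getauxval(3) and checks that the memory there starts with the ELF magic bytes. The function names vdso_base and vdso_looks_like_elf are hypothetical and exist only for this example.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Base address of the vDSO, or NULL if the kernel did not provide one
   (getauxval() returns 0 when the tag is absent). */
const void *vdso_base(void)
{
    return (const void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Nonzero if the memory at base begins with the ELF magic bytes. */
int vdso_looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would typically test vdso_base() for NULL first, since a vDSO is not guaranteed to exist.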
86 | ||
87 | You must not assume the vDSO is mapped at any particular location in the | |
88 | user's memory map. | |
8635ed1b | 89 | The base address will usually be randomized at run time every time a new |
2800db82 MF |
90 | process image is created (at |
91 | .BR execve (2) | |
92 | time). | |
fb634bd8 MK |
93 | This is done for security reasons, |
94 | to make "return-to-libc" attacks harder. | |
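To see the randomization in practice, the mapping can be located at run time rather than assumed. The sketch below reads /proc/self/maps (see proc(5)) and returns the start address of the mapping that the kernel labels "[vdso]". That label and the helper name vdso_start_from_maps are assumptions of this example; older kernels or unusual environments may label the mapping differently or not expose it at all.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Start address of this process's "[vdso]" mapping, or 0 if no such
   mapping is listed in /proc/self/maps. */
uintptr_t vdso_start_from_maps(void)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    uintptr_t start = 0;

    if (fp == NULL)
        return 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        uintptr_t lo;

        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, "[vdso]") != NULL &&
            sscanf(line, "%" SCNxPTR "-", &lo) == 1) {
            start = lo;
            break;
        }
    }
    fclose(fp);
    return start;
}
```

Printing the returned address from several runs of the same program will usually show a different value each time when address-space randomization is enabled.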
2800db82 | 95 | |
fb634bd8 | 96 | For some architectures, there is also an |
2800db82 MF |
97 | .B AT_SYSINFO |
98 | tag. | |
99 | This is used only for locating the vsyscall entry point and is frequently | |
100 | omitted or set to 0 (meaning it's not available). | |
fb634bd8 MK |
101 | This tag is a throwback to the initial vDSO work (see |
102 | .IR History | |
103 | below) and its use should be avoided. | |
2800db82 MF |
104 | .SS File format |
105 | Since the vDSO is a fully formed ELF image, you can do symbol lookups on it. | |
8635ed1b MK |
106 | This allows new symbols to be added with newer kernel releases, |
107 | and allows the C library to detect available functionality at | |
108 | run time when running under different kernel versions. | |
fb634bd8 | 109 | Often the C library performs this detection on the first call and then |
2800db82 MF |
110 | cache the result for subsequent calls. |
111 | ||
112 | All symbols are also versioned (using the GNU version format). | |
113 | This allows the kernel to update the function signature without breaking | |
fb634bd8 | 114 | backward compatibility. |
2800db82 MF |
115 | Here, updating the signature means changing the arguments that the function accepts as well as its | |
116 | return value. | |
fb634bd8 MK |
117 | Thus, when looking up a symbol in the vDSO, |
118 | you must always include the version | |
2800db82 MF |
119 | to match the ABI you expect. |
120 | ||
fb634bd8 MK |
121 | Typically the vDSO follows the naming convention of prefixing |
122 | all symbols with "__vdso_" or "__kernel_" | |
123 | so as to distinguish them from other standard symbols. | |
124 | For example, the "gettimeofday" function is named "__vdso_gettimeofday". | |
2800db82 | 125 | |
fb634bd8 MK |
126 | You use the standard C calling conventions when calling |
127 | any of these functions. | |
2800db82 MF |
128 | No need to worry about weird register or stack behavior. |
129 | .SH NOTES | |
130 | .SS Source | |
8635ed1b MK |
131 | When you compile the kernel, |
132 | it will automatically compile and link the vDSO code for you. | |
fb634bd8 | 133 | You will frequently find it under the architecture-specific directory: |
2800db82 MF |
134 | |
135 | find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*' | |
136 | ||
2800db82 | 137 | .SS vDSO names |
35432a03 | 138 | The name of the vDSO varies across architectures. |
d3532647 | 139 | It will often show up in things like glibc's |
fb634bd8 MK |
140 | .BR ldd (1) |
141 | output. | |
2800db82 MF |
142 | The exact name should not matter to any code, so do not hardcode it. |
143 | .if t \{\ | |
144 | .ft CW | |
145 | \} | |
146 | .TS | |
147 | l l. | |
148 | user ABI vDSO name | |
149 | _ | |
150 | aarch64 linux-vdso.so.1 | |
151 | ia64 linux-gate.so.1 | |
152 | ppc/32 linux-vdso32.so.1 | |
153 | ppc/64 linux-vdso64.so.1 | |
154 | s390 linux-vdso32.so.1 | |
155 | s390x linux-vdso64.so.1 | |
156 | sh linux-gate.so.1 | |
157 | i386 linux-gate.so.1 | |
158 | x86_64 linux-vdso.so.1 | |
159 | x86/x32 linux-vdso.so.1 | |
160 | .TE | |
161 | .if t \{\ | |
162 | .in | |
163 | .ft P | |
164 | \} | |
dd6b62ec | 165 | .SH ARCHITECTURE-SPECIFIC NOTES |
f6816de9 MK |
166 | The subsections below provide architecture-specific notes |
167 | on the vDSO. | |
168 | ||
169 | Note that the vDSO that is used is based on the ABI of your user-space code | |
170 | and not the ABI of the kernel. | |
171 | Thus, for example, | |
172 | when you run an i386 32-bit ELF binary, | |
173 | you'll get the same vDSO regardless of whether you run it under | |
174 | an i386 32-bit kernel or under an x86_64 64-bit kernel. | |
dd6b62ec | 175 | Therefore, the name of the user-space ABI should be used to determine |
f6816de9 | 176 | which of the sections below is relevant. |
fb634bd8 | 177 | .SS ARM functions |
2800db82 MF |
178 | .\" See linux/arch/arm/kernel/entry-armv.S |
179 | .\" See linux/Documentation/arm/kernel_user_helpers.txt | |
fb634bd8 | 180 | The ARM port has a code page full of utility functions. |
2800db82 MF |
181 | Since it's just a raw page of code, there is no ELF information for doing |
182 | symbol lookups or versioning. | |
183 | It does provide support for different versions though. | |
184 | ||
fb634bd8 MK |
185 | For information on this code page, |
186 | it's best to refer to the kernel documentation | |
2800db82 | 187 | as it's extremely detailed and covers everything you need to know: |
fb634bd8 | 188 | .IR Documentation/arm/kernel_user_helpers.txt . |
2800db82 MF |
189 | .SS aarch64 functions |
190 | .\" See linux/arch/arm64/kernel/vdso/vdso.lds.S | |
f6816de9 | 191 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
192 | .if t \{\ |
193 | .ft CW | |
194 | \} | |
195 | .TS | |
196 | l l. | |
197 | symbol version | |
198 | _ | |
199 | __kernel_rt_sigreturn LINUX_2.6.39 | |
200 | __kernel_gettimeofday LINUX_2.6.39 | |
201 | __kernel_clock_gettime LINUX_2.6.39 | |
202 | __kernel_clock_getres LINUX_2.6.39 | |
203 | .TE | |
204 | .if t \{\ | |
205 | .in | |
206 | .ft P | |
207 | \} | |
208 | .SS bfin (Blackfin) functions | |
209 | .\" See linux/arch/blackfin/kernel/fixed_code.S | |
210 | .\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code | |
8635ed1b MK |
211 | As this CPU lacks a memory management unit (MMU), |
212 | it doesn't set up a vDSO in the normal sense. | |
213 | Instead, it maps at boot time a few raw functions into | |
214 | a fixed location in memory. | |
2800db82 | 215 | User-space applications then call directly into that region. |
8635ed1b MK |
216 | There is no provision for backward compatibility |
217 | beyond sniffing raw opcodes, | |
fb634bd8 | 218 | but as this is an embedded CPU, it can get away with things\(emsome of the |
2800db82 MF |
219 | object formats it runs aren't even ELF based (they're bFLT/FLAT). |
220 | ||
f6816de9 MK |
221 | For information on this code page, |
222 | it's best to refer to the public documentation: | |
2800db82 MF |
223 | .br |
224 | http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code | |
225 | .SS ia64 (Itanium) functions | |
226 | .\" See linux/arch/ia64/kernel/gate.lds.S | |
227 | .\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt | |
f6816de9 | 228 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
229 | .if t \{\ |
230 | .ft CW | |
231 | \} | |
232 | .TS | |
233 | l l. | |
234 | symbol version | |
235 | _ | |
236 | __kernel_sigtramp LINUX_2.5 | |
237 | __kernel_syscall_via_break LINUX_2.5 | |
238 | __kernel_syscall_via_epc LINUX_2.5 | |
239 | .TE | |
240 | .if t \{\ | |
241 | .in | |
242 | .ft P | |
243 | \} | |
244 | ||
fb634bd8 | 245 | The Itanium port is somewhat tricky. |
8635ed1b MK |
246 | In addition to the vDSO above, it also has "light-weight system calls" |
247 | (also known as "fast syscalls" or "fsys"). | |
fb634bd8 MK |
248 | You can invoke these via the |
249 | .I __kernel_syscall_via_epc | |
250 | vDSO helper. | |
2800db82 MF |
251 | The system calls listed here have the same semantics as if you called them |
252 | directly via | |
fb634bd8 | 253 | .BR syscall (2), |
2800db82 MF |
254 | so refer to the relevant |
255 | documentation for each. | |
256 | The table below lists the functions available via this mechanism. | |
257 | .if t \{\ | |
258 | .ft CW | |
259 | \} | |
260 | .TS | |
261 | l. | |
262 | function | |
263 | _ | |
264 | clock_gettime | |
265 | getcpu | |
266 | getpid | |
267 | getppid | |
268 | gettimeofday | |
269 | set_tid_address | |
270 | .TE | |
271 | .if t \{\ | |
272 | .in | |
273 | .ft P | |
274 | \} | |
275 | .SS parisc (hppa) functions | |
276 | .\" See linux/arch/parisc/kernel/syscall.S | |
277 | .\" See linux/Documentation/parisc/registers | |
8635ed1b MK |
278 | The parisc port has a code page full of utility functions |
279 | called a gateway page. | |
fb634bd8 MK |
280 | Rather than use the normal ELF auxiliary vector approach, |
281 | it passes the address of | |
2800db82 MF |
282 | the page to the process via the SR2 register. |
283 | The permissions on the page are such that merely executing those addresses | |
dd6b62ec | 284 | automatically executes with kernel privileges and not in user space. |
2800db82 MF |
285 | This is done to match the way HP-UX works. |
286 | ||
287 | Since it's just a raw page of code, there is no ELF information for doing | |
288 | symbol lookups or versioning. | |
fb634bd8 MK |
289 | Simply call into the appropriate offset via the branch instruction, |
290 | for example: | |
291 | ||
292 | ble <offset>(%sr2, %r0) | |
2800db82 MF |
293 | .if t \{\ |
294 | .ft CW | |
295 | \} | |
296 | .TS | |
297 | l l. | |
298 | offset function | |
299 | _ | |
300 | 00b0 lws_entry | |
301 | 00e0 set_thread_pointer | |
302 | 0100 linux_gateway_entry (syscall) | |
303 | 0268 syscall_nosys | |
304 | 0274 tracesys | |
305 | 0324 tracesys_next | |
306 | 0368 tracesys_exit | |
307 | 03a0 tracesys_sigexit | |
308 | 03b8 lws_start | |
309 | 03dc lws_exit_nosys | |
310 | 03e0 lws_exit | |
311 | 03e4 lws_compare_and_swap64 | |
312 | 03e8 lws_compare_and_swap | |
313 | 0404 cas_wouldblock | |
314 | 0410 cas_action | |
315 | .TE | |
316 | .if t \{\ | |
317 | .in | |
318 | .ft P | |
319 | \} | |
320 | .SS ppc/32 functions | |
321 | .\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S | |
f6816de9 | 322 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
323 | The functions marked with a |
324 | .I * | |
f6816de9 MK |
325 | are available only when the kernel is |
326 | a PowerPC64 (64-bit) kernel. | |
2800db82 MF |
327 | .if t \{\ |
328 | .ft CW | |
329 | \} | |
330 | .TS | |
331 | l l. | |
332 | symbol version | |
333 | _ | |
334 | __kernel_clock_getres LINUX_2.6.15 | |
335 | __kernel_clock_gettime LINUX_2.6.15 | |
336 | __kernel_datapage_offset LINUX_2.6.15 | |
337 | __kernel_get_syscall_map LINUX_2.6.15 | |
338 | __kernel_get_tbfreq LINUX_2.6.15 | |
339 | __kernel_getcpu \fI*\fR LINUX_2.6.15 | |
340 | __kernel_gettimeofday LINUX_2.6.15 | |
341 | __kernel_sigtramp_rt32 LINUX_2.6.15 | |
342 | __kernel_sigtramp32 LINUX_2.6.15 | |
343 | __kernel_sync_dicache LINUX_2.6.15 | |
344 | __kernel_sync_dicache_p5 LINUX_2.6.15 | |
345 | .TE | |
346 | .if t \{\ | |
347 | .in | |
348 | .ft P | |
349 | \} | |
350 | .SS ppc/64 functions | |
351 | .\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S | |
f6816de9 | 352 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
353 | .if t \{\ |
354 | .ft CW | |
355 | \} | |
356 | .TS | |
357 | l l. | |
358 | symbol version | |
359 | _ | |
360 | __kernel_clock_getres LINUX_2.6.15 | |
361 | __kernel_clock_gettime LINUX_2.6.15 | |
362 | __kernel_datapage_offset LINUX_2.6.15 | |
363 | __kernel_get_syscall_map LINUX_2.6.15 | |
364 | __kernel_get_tbfreq LINUX_2.6.15 | |
365 | __kernel_getcpu LINUX_2.6.15 | |
366 | __kernel_gettimeofday LINUX_2.6.15 | |
367 | __kernel_sigtramp_rt64 LINUX_2.6.15 | |
368 | __kernel_sync_dicache LINUX_2.6.15 | |
369 | __kernel_sync_dicache_p5 LINUX_2.6.15 | |
370 | .TE | |
371 | .if t \{\ | |
372 | .in | |
373 | .ft P | |
374 | \} | |
375 | .SS s390 functions | |
376 | .\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S | |
f6816de9 | 377 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
378 | .if t \{\ |
379 | .ft CW | |
380 | \} | |
381 | .TS | |
382 | l l. | |
383 | symbol version | |
384 | _ | |
385 | __kernel_clock_getres LINUX_2.6.29 | |
386 | __kernel_clock_gettime LINUX_2.6.29 | |
387 | __kernel_gettimeofday LINUX_2.6.29 | |
388 | .TE | |
389 | .if t \{\ | |
390 | .in | |
391 | .ft P | |
392 | \} | |
393 | .SS s390x functions | |
394 | .\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S | |
f6816de9 | 395 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
396 | .if t \{\ |
397 | .ft CW | |
398 | \} | |
399 | .TS | |
400 | l l. | |
401 | symbol version | |
402 | _ | |
403 | __kernel_clock_getres LINUX_2.6.29 | |
404 | __kernel_clock_gettime LINUX_2.6.29 | |
405 | __kernel_gettimeofday LINUX_2.6.29 | |
406 | .TE | |
407 | .if t \{\ | |
408 | .in | |
409 | .ft P | |
410 | \} | |
411 | .SS sh (SuperH) functions | |
412 | .\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S | |
f6816de9 | 413 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
414 | .if t \{\ |
415 | .ft CW | |
416 | \} | |
417 | .TS | |
418 | l l. | |
419 | symbol version | |
420 | _ | |
421 | __kernel_rt_sigreturn LINUX_2.6 | |
422 | __kernel_sigreturn LINUX_2.6 | |
423 | __kernel_vsyscall LINUX_2.6 | |
424 | .TE | |
425 | .if t \{\ | |
426 | .in | |
427 | .ft P | |
428 | \} | |
429 | .SS i386 functions | |
430 | .\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S | |
f6816de9 | 431 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
432 | .if t \{\ |
433 | .ft CW | |
434 | \} | |
435 | .TS | |
436 | l l. | |
437 | symbol version | |
438 | _ | |
439 | __kernel_sigreturn LINUX_2.5 | |
440 | __kernel_rt_sigreturn LINUX_2.5 | |
441 | __kernel_vsyscall LINUX_2.5 | |
442 | .TE | |
443 | .if t \{\ | |
444 | .in | |
445 | .ft P | |
446 | \} | |
447 | .SS x86_64 functions | |
448 | .\" See linux/arch/x86/vdso/vdso.lds.S | |
f6816de9 | 449 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
450 | All of these symbols are also available without the "__vdso_" prefix, but |
451 | you should ignore those and stick to the names below. | |
452 | .if t \{\ | |
453 | .ft CW | |
454 | \} | |
455 | .TS | |
456 | l l. | |
457 | symbol version | |
458 | _ | |
459 | __vdso_clock_gettime LINUX_2.6 | |
460 | __vdso_getcpu LINUX_2.6 | |
461 | __vdso_gettimeofday LINUX_2.6 | |
462 | __vdso_time LINUX_2.6 | |
463 | .TE | |
464 | .if t \{\ | |
465 | .in | |
466 | .ft P | |
467 | \} | |
468 | .SS x86/x32 functions | |
469 | .\" See linux/arch/x86/vdso/vdsox32.lds.S | |
f6816de9 | 470 | The table below lists the symbols exported by the vDSO. |
2800db82 MF |
471 | .if t \{\ |
472 | .ft CW | |
473 | \} | |
474 | .TS | |
475 | l l. | |
476 | symbol version | |
477 | _ | |
478 | __vdso_clock_gettime LINUX_2.6 | |
479 | __vdso_getcpu LINUX_2.6 | |
480 | __vdso_gettimeofday LINUX_2.6 | |
481 | __vdso_time LINUX_2.6 | |
482 | .TE | |
483 | .if t \{\ | |
484 | .in | |
485 | .ft P | |
486 | \} | |
487 | .SS History | |
fb634bd8 MK |
488 | The vDSO was originally just a single function\(emthe vsyscall. |
489 | In older kernels, you might see that name | |
490 | in a process's memory map rather than "vdso". | |
d3532647 | 491 | Over time, people realized that this mechanism |
fb634bd8 | 492 | was a great way to pass more functionality |
2800db82 MF |
493 | to user space, so it was reconceived as a vDSO in the current format. |
494 | .SH SEE ALSO | |
495 | .BR syscalls (2), | |
496 | .BR getauxval (3), | |
497 | .BR proc (5) | |
498 | ||
fb634bd8 MK |
499 | The documents, examples, and source code in the Linux source code tree: |
500 | .in +4n | |
2800db82 | 501 | .nf |
fb634bd8 | 502 | |
2800db82 | 503 | Documentation/ABI/stable/vdso |
fb634bd8 | 504 | Documentation/ia64/fsys.txt |
2800db82 | 505 | Documentation/vDSO/* (includes examples of using the vDSO) |
fb634bd8 | 506 | |
2800db82 MF |
507 | find arch/ -iname '*vdso*' -o -iname '*gate*' |
508 | .fi | |
fb634bd8 | 509 | .in |