.\" Written by Mike Frysinger <vapier@gentoo.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.
.\" %%%LICENSE_END
.\"
.\" Useful background:
.\" http://articles.manugarg.com/systemcallinlinux2_6.html
.\" https://lwn.net/Articles/446528/
.\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
.\" http://www.trilithium.com/johan/2005/08/linux-gate/
.\"
.TH VDSO 7 2014-01-01 "Linux" "Linux Programmer's Manual"
.SH NAME
vDSO \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>
.PP
.B void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" is a small shared library that
the kernel automatically maps into the
address space of all user-space applications.
Applications usually do not need to concern themselves with these details
as the vDSO is most commonly called by the C library.
This way you can code in the normal way using standard functions
and the C library will take care
of using any functionality that is available via the vDSO.
.PP
Why does the vDSO exist at all?
There are some system calls the kernel provides that
user-space code ends up using frequently,
to the point that such calls can dominate overall performance.
This is due both to the frequency of the call and to the
context-switch overhead that results
from exiting user space and entering the kernel.
.PP
The rest of this documentation is geared toward the curious and/or
C library writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example background
Making system calls can be slow.
On x86 32-bit systems, you can trigger a software interrupt
.RI ( "int $0x80" )
to tell the kernel you wish to make a system call.
However, this instruction is expensive: it goes through
the full interrupt-handling paths
in the processor's microcode as well as in the kernel.
Newer processors have faster (but backward incompatible) instructions to
initiate system calls.
Rather than require the C library to figure out if this functionality is
available at run time,
the C library can use functions provided by the kernel in
the vDSO.
.PP
Note that the terminology can be confusing.
On x86 systems, the vDSO function
used to determine the preferred method of making a system call is
named "__kernel_vsyscall", but on x86_64,
the term "vsyscall" also refers to an obsolete way to ask the kernel
what time it is or what CPU the caller is on.
.PP
One frequently used system call is
.BR gettimeofday (2).
This system call is called both directly by user-space applications
and indirectly by
the C library.
Think timestamps or timing loops or polling\(emall of these
frequently need to know what time it is right now.
This information is also not secret\(emany application in any
privilege mode (root or any unprivileged user) will get the same answer.
Thus the kernel arranges for the information required to answer
this question to be placed in memory the process can access.
Now a call to
.BR gettimeofday (2)
changes from a system call to a normal function
call and a few memory accesses.
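.PP
As a small illustration of the point above, the sketch below reads the
current time through the ordinary C library interface.
Whether the call is serviced by the vDSO or by a real system call is
decided by the C library and the kernel; the calling code looks the same
either way.
The helper name now_usec is hypothetical and not part of any library.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time in microseconds since the Epoch,
   or -1 if gettimeofday() fails. */
static long long now_usec(void)
{
    struct timeval tv;

    /* An ordinary libc call; the C library may route it through the vDSO. */
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return (long long) tv.tv_sec * 1000000LL + tv.tv_usec;
}
```

Calling now_usec() before and after a piece of work and subtracting the
two values gives a rough elapsed time.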
.SS Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see
.BR getauxval (3)),
via the
.B AT_SYSINFO_EHDR
tag.
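.PP
A minimal sketch of this lookup, assuming a C library that provides
.BR getauxval (3)
and an <elf.h> that defines ELFMAG and SELFMAG.
The function names vdso_base and looks_like_elf are hypothetical.
The check only confirms that the mapping begins with the ELF magic bytes;
it does not parse the image.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: the vDSO base address, or NULL if the kernel
   did not supply AT_SYSINFO_EHDR (getauxval() returns 0 in that case). */
static const void *vdso_base(void)
{
    return (const void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: 1 if base is non-NULL and starts with "\177ELF". */
static int looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would treat a NULL result from vdso_base() as "no vDSO" and fall
back to regular system calls.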
.PP
You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at run time every time a new
process image is created (at
.BR execve (2)
time).
This is done for security reasons,
to prevent "return-to-libc" attacks.
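.PP
One way to observe where the vDSO landed in a given process is to read
the process's own memory map (see
.BR proc (5)).
The sketch below assumes that /proc is mounted and that the kernel labels
the mapping "[vdso]"; the helper name vdso_start_from_maps is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: start address of the "[vdso]" mapping listed in
   /proc/self/maps, or 0 if the file is unreadable or has no such line. */
static unsigned long vdso_start_from_maps(void)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    unsigned long start = 0;

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        unsigned long addr;

        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, "[vdso]") != NULL &&
            sscanf(line, "%lx-", &addr) == 1) {
            start = addr;
            break;
        }
    }
    fclose(fp);
    return start;
}
```

Printing this value from two separate runs of the same program will
usually show different addresses when address-space randomization is
enabled, which is why the address must never be hardcoded.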
.PP
For some architectures, there is also an
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
omitted or set to 0 (meaning it's not available).
This tag is a throwback to the initial vDSO work (see
.IR History
below) and its use should be avoided.
.SS File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at
run time when running under different kernel versions.
Oftentimes the C library will do detection with the first call and then
cache the result for subsequent calls.
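.PP
The detect-once-then-cache pattern can be sketched as follows.
This is an illustration only, not how any particular C library does it:
lookup_vdso_clock_gettime is a hypothetical stand-in for a real vDSO
symbol lookup and always reports "not found", so the sketch always takes
the system call fallback.
It also assumes a 64-bit target where the SYS_clock_gettime system call
and the C library agree on the layout of struct timespec, and it is not
thread-safe.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*clock_gettime_fn)(clockid_t, struct timespec *);

/* Hypothetical stand-in for a real vDSO symbol lookup.
   Always returns NULL, meaning "symbol not available". */
static clock_gettime_fn lookup_vdso_clock_gettime(void)
{
    return NULL;
}

/* Fallback path: issue the real system call. */
static int clock_gettime_syscall(clockid_t id, struct timespec *ts)
{
    return (int) syscall(SYS_clock_gettime, id, ts);
}

/* Resolve the implementation on the first call, then reuse it. */
static int cached_clock_gettime(clockid_t id, struct timespec *ts)
{
    static clock_gettime_fn impl;   /* NULL until the first call */

    if (impl == NULL) {
        clock_gettime_fn fn = lookup_vdso_clock_gettime();

        impl = (fn != NULL) ? fn : clock_gettime_syscall;
    }
    return impl(id, ts);
}
```

Only the first call pays for the lookup; later calls go straight through
the cached function pointer.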
.PP
All symbols are also versioned (using the GNU version format).
This allows the kernel to update the function signature without breaking
backward compatibility.
Such an update might change the arguments that the function accepts
or the value that it returns.
Thus, when looking up a symbol in the vDSO,
you must always include the version
to match the ABI you expect.
.PP
Typically the vDSO follows the naming convention of prefixing
all symbols with "__vdso_" or "__kernel_"
so as to distinguish them from other standard symbols.
For example, the "gettimeofday" function is named "__vdso_gettimeofday".
.PP
You use the standard C calling conventions when calling
any of these functions.
No need to worry about weird register or stack behavior.
.SH NOTES
.SS Source
When you compile the kernel,
it will automatically compile and link the vDSO code for you.
You will frequently find it under the architecture-specific directory:
.PP
    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
.SS vDSO names
The name of the vDSO varies across architectures.
It will often show up in things like glibc's
.BR ldd (1)
output.
The exact name should not matter to any code, so do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
user ABI	vDSO name
_
aarch64	linux-vdso.so.1
ia64	linux-gate.so.1
ppc/32	linux-vdso32.so.1
ppc/64	linux-vdso64.so.1
s390	linux-vdso32.so.1
s390x	linux-vdso64.so.1
sh	linux-gate.so.1
i386	linux-gate.so.1
x86_64	linux-vdso.so.1
x86/x32	linux-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
.SH ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes
on the vDSO.
.PP
Note that the vDSO that is used is based on the ABI of your user-space code
and not the ABI of the kernel.
Thus, for example,
when you run an i386 32-bit ELF binary,
you'll get the same vDSO regardless of whether you run it under
an i386 32-bit kernel or under an x86_64 64-bit kernel.
Therefore, the name of the user-space ABI should be used to determine
which of the sections below is relevant.
.SS ARM functions
.\" See linux/arch/arm/kernel/entry-armv.S
.\" See linux/Documentation/arm/kernel_user_helpers.txt
The ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
It does provide support for different versions though.
.PP
For information on this code page,
it's best to refer to the kernel documentation
as it's extremely detailed and covers everything you need to know:
.IR Documentation/arm/kernel_user_helpers.txt .
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this CPU lacks a memory management unit (MMU),
it doesn't set up a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into
a fixed location in memory.
User-space applications then call directly into that region.
There is no provision for backward compatibility
beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things\(emsome of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).
.PP
For information on this code page,
it's best to refer to the public documentation:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}
.PP
The Itanium port is somewhat tricky.
In addition to the vDSO above, it also has "light-weight system calls"
(also known as "fast syscalls" or "fsys").
You can invoke these via the
.I __kernel_syscall_via_epc
vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (2),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions
called a gateway page.
Rather than use the normal ELF auxiliary vector approach,
it passes the address of
the page to the process via the SR2 register.
The permissions on the page are such that merely executing those addresses
automatically executes with kernel privileges and not in user space.
This is done to match the way HP-UX works.
.PP
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction,
for example:
.PP
    ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry
00e0	set_thread_pointer
0100	linux_gateway_entry (syscall)
0268	syscall_nosys
0274	tracesys
0324	tracesys_next
0368	tracesys_exit
03a0	tracesys_sigexit
03b8	lws_start
03dc	lws_exit_nosys
03e0	lws_exit
03e4	lws_compare_and_swap64
03e8	lws_compare_and_swap
0404	cas_wouldblock
0410	cas_action
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
The functions marked with a
.I *
are available only when the kernel is
a PowerPC64 (64-bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}
.SS x86_64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
All of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS History
The vDSO was originally just a single function\(emthe vsyscall.
In older kernels, you might see that name
in a process's memory map rather than "vdso".
Over time, people realized that this mechanism
was a great way to pass more functionality
to user space, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)
.PP
The documents, examples, and source code in the Linux source code tree:
.in +4n
.nf

Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)

find arch/ -iname '*vdso*' -o -iname '*gate*'
.fi
.in