.\" Written by Mike Frysinger <vapier@gentoo.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.
.\" %%%LICENSE_END
.\"
.\" Useful background:
.\" http://articles.manugarg.com/systemcallinlinux2_6.html
.\" https://lwn.net/Articles/446528/
.\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
.\" http://www.trilithium.com/johan/2005/08/linux-gate/
.\"
.TH VDSO 7 2015-12-28 "Linux" "Linux Programmer's Manual"
.SH NAME
vdso \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>

.B void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library that
the kernel automatically maps into the
address space of all user-space applications.
Applications usually do not need to concern themselves with these details
as the vDSO is most commonly called by the C library.
This way you can code in the normal way using standard functions
and the C library will take care
of using any functionality that is available via the vDSO.

Why does the vDSO exist at all?
There are some system calls the kernel provides that
user-space code ends up using frequently,
to the point that such calls can dominate overall performance.
This is due both to the frequency of the call and to the
context-switch overhead that results
from exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or
C library writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example background
Making system calls can be slow.
In x86 32-bit systems, you can trigger a software interrupt
.RI ( "int $0x80" )
to tell the kernel you wish to make a system call.
However, this instruction is expensive: it goes through
the full interrupt-handling paths
in the processor's microcode as well as in the kernel.
Newer processors have faster (but backward incompatible) instructions to
initiate system calls.
Rather than require the C library to figure out if this functionality is
available at run time,
the C library can use functions provided by the kernel in
the vDSO.

Note that the terminology can be confusing.
On x86 systems, the vDSO function
used to determine the preferred method of making a system call is
named "__kernel_vsyscall", but on x86_64,
the term "vsyscall" also refers to an obsolete way to ask the kernel
what time it is or what CPU the caller is on.

One frequently used system call is
.BR gettimeofday (2).
This system call is called both directly by user-space applications
and indirectly by
the C library.
Think timestamps or timing loops or polling\(emall of these
frequently need to know what time it is right now.
This information is also not secret\(emany application in any
privilege mode (root or any unprivileged user) will get the same answer.
Thus the kernel arranges for the information required to answer
this question to be placed in memory the process can access.
Now a call to
.BR gettimeofday (2)
changes from a system call to a normal function
call and a few memory accesses.
.SS Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see
.BR getauxval (3)),
via the
.B AT_SYSINFO_EHDR
tag.

You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at run time every time a new
process image is created (at
.BR execve (2)
time).
This is done for security reasons,
to prevent "return-to-libc" attacks.

For some architectures, there is also an
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
omitted or set to 0 (meaning it's not available).
This tag is a throwback to the initial vDSO work (see
.IR History
below) and its use should be avoided.
.SS File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at
run time when running under different kernel versions.
Oftentimes the C library will do detection with the first call and then
cache the result for subsequent calls.

All symbols are also versioned (using the GNU version format).
This allows the kernel to update the function signature without breaking
backward compatibility.
Such an update may change the arguments that the function accepts as well as
its return value.
Thus, when looking up a symbol in the vDSO,
you must always include the version
to match the ABI you expect.

Typically the vDSO follows the naming convention of prefixing
all symbols with "__vdso_" or "__kernel_"
so as to distinguish them from other standard symbols.
For example, the "gettimeofday" function is named "__vdso_gettimeofday".

You use the standard C calling conventions when calling
any of these functions.
No need to worry about weird register or stack behavior.
.SH NOTES
.SS Source
When you compile the kernel,
it will automatically compile and link the vDSO code for you.
You will frequently find it under the architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
.\"
.SS vDSO names
The name of the vDSO varies across architectures.
It will often show up in things like glibc's
.BR ldd (1)
output.
The exact name should not matter to any code, so do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
user ABI	vDSO name
_
aarch64	linux-vdso.so.1
arm	linux-vdso.so.1
ia64	linux-gate.so.1
ppc/32	linux-vdso32.so.1
ppc/64	linux-vdso64.so.1
s390	linux-vdso32.so.1
s390x	linux-vdso64.so.1
sh	linux-gate.so.1
i386	linux-gate.so.1
x86_64	linux-vdso.so.1
x86/x32	linux-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
.SS strace(1) and the vDSO
When tracing system calls with
.BR strace (1),
symbols (system calls) that are exported by the vDSO will
.I not
appear in the trace output.
.SH ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes
on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space code
and not the ABI of the kernel.
Thus, for example,
when you run an i386 32-bit ELF binary,
you'll get the same vDSO regardless of whether you run it under
an i386 32-bit kernel or under an x86_64 64-bit kernel.
Therefore, the name of the user-space ABI should be used to determine
which of the sections below is relevant.
.SS ARM functions
.\" See linux/arch/arm/vdso/vdso.lds.S
.\" Commit: 8512287a8165592466cb9cb347ba94892e9c56a5
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_gettimeofday	LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime	LINUX_2.6 (exported since Linux 4.1)
.TE
.if t \{\
.in
.ft P
\}

.\" See linux/arch/arm/kernel/entry-armv.S
.\" See linux/Documentation/arm/kernel_user_helpers.txt
Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
It does provide support for different versions though.

For information on this code page,
it's best to refer to the kernel documentation
as it's extremely detailed and covers everything you need to know:
.IR Documentation/arm/kernel_user_helpers.txt .
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this CPU lacks a memory management unit (MMU),
it doesn't set up a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into
a fixed location in memory.
User-space applications then call directly into that region.
There is no provision for backward compatibility
beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things\(emsome of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page,
it's best to refer to the public documentation:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}

The Itanium port is somewhat tricky.
In addition to the vDSO above, it also has "light-weight system calls"
(also known as "fast syscalls" or "fsys").
You can invoke these via the
.I __kernel_syscall_via_epc
vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (2),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions
called a gateway page.
Rather than use the normal ELF auxiliary vector approach,
it passes the address of
the page to the process via the SR2 register.
The permissions on the page are such that merely executing those addresses
automatically executes with kernel privileges and not in user space.
This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction,
for example:

    ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry
00e0	set_thread_pointer
0100	linux_gateway_entry (syscall)
0268	syscall_nosys
0274	tracesys
0324	tracesys_next
0368	tracesys_exit
03a0	tracesys_sigexit
03b8	lws_start
03dc	lws_exit_nosys
03e0	lws_exit
03e4	lws_compare_and_swap64
03e8	lws_compare_and_swap
0404	cas_wouldblock
0410	cas_action
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
The functions marked with a
.I *
are available only when the kernel is
a PowerPC64 (64-bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.\" Added in 7a59ed415f5b57469e22e41fc4188d5399e0b194 and updated
.\" in 37c975545ec63320789962bf307f000f08fabd48.
__vdso_clock_gettime	LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday	LINUX_2.6 (exported since Linux 3.15)
__vdso_time	LINUX_2.6 (exported since Linux 3.15)
.TE
.if t \{\
.in
.ft P
\}
.SS x86_64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
All of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS History
The vDSO was originally just a single function\(emthe vsyscall.
In older kernels, you might see that name
in a process's memory map rather than "vdso".
Over time, people realized that this mechanism
was a great way to pass more functionality
to user space, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)

The documents, examples, and source code in the Linux source code tree:
.in +4n
.nf

Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)

find arch/ -iname '*vdso*' -o -iname '*gate*'
.fi
.in