.\" Written by Mike Frysinger <vapier@gentoo.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.
.\" %%%LICENSE_END
.\"
.\" Useful background:
.\"   http://articles.manugarg.com/systemcallinlinux2_6.html
.\"   https://lwn.net/Articles/446528/
.\"   http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
.\"   http://www.trilithium.com/johan/2005/08/linux-gate/
.\"
.TH VDSO 7 2015-07-23 "Linux" "Linux Programmer's Manual"
.SH NAME
vDSO \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>

.B void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library that
the kernel automatically maps into the
address space of all user-space applications.
Applications usually do not need to concern themselves with these details
as the vDSO is most commonly called by the C library.
This way you can code in the normal way using standard functions
and the C library will take care
of using any functionality that is available via the vDSO.

Why does the vDSO exist at all?
There are some system calls the kernel provides that
user-space code ends up using frequently,
to the point that such calls can dominate overall performance.
This is due both to the frequency of the call and to the
context-switch overhead that results
from exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or
C library writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example background
Making system calls can be slow.
On 32-bit x86 systems, you can trigger a software interrupt
.RI ( "int $0x80" )
to tell the kernel you wish to make a system call.
However, this instruction is expensive: it goes through
the full interrupt-handling paths
in the processor's microcode as well as in the kernel.
Newer processors have faster (but backward incompatible) instructions to
initiate system calls.
Rather than require the C library to figure out if this functionality is
available at run time,
the C library can use functions provided by the kernel in
the vDSO.

Note that the terminology can be confusing.
On x86 systems, the vDSO function
used to determine the preferred method of making a system call is
named "__kernel_vsyscall", but on x86_64,
the term "vsyscall" also refers to an obsolete way to ask the kernel
what time it is or what CPU the caller is on.

One frequently used system call is
.BR gettimeofday (2).
This system call is called both directly by user-space applications
and indirectly by
the C library.
Think timestamps or timing loops or polling\(emall of these
frequently need to know what time it is right now.
This information is also not secret\(emany application in any
privilege mode (root or any unprivileged user) will get the same answer.
Thus the kernel arranges for the information required to answer
this question to be placed in memory the process can access.
Now a call to
.BR gettimeofday (2)
changes from a system call to a normal function
call and a few memory accesses.
.SS Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see
.BR getauxval (3)),
via the
.B AT_SYSINFO_EHDR
tag.

You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at run time every time a new
process image is created (at
.BR execve (2)
time).
This is done for security reasons,
to prevent "return-to-libc" attacks.

For some architectures, there is also an
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
omitted or set to 0 (meaning it's not available).
This tag is a throwback to the initial vDSO work (see
.IR History
below) and its use should be avoided.
.SS File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at
run time when running under different kernel versions.
Oftentimes the C library will do detection with the first call and then
cache the result for subsequent calls.

All symbols are also versioned (using the GNU version format).
This allows the kernel to update the function signature without breaking
backward compatibility.
This means changing the arguments that the function accepts as well as the
return value.
Thus, when looking up a symbol in the vDSO,
you must always include the version
to match the ABI you expect.

Typically the vDSO follows the naming convention of prefixing
all symbols with "__vdso_" or "__kernel_"
so as to distinguish them from other standard symbols.
For example, the "gettimeofday" function is named "__vdso_gettimeofday".

You use the standard C calling conventions when calling
any of these functions.
No need to worry about weird register or stack behavior.
.SH NOTES
.SS Source
When you compile the kernel,
it will automatically compile and link the vDSO code for you.
You will frequently find it under the architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

.SS vDSO names
The name of the vDSO varies across architectures.
It will often show up in things like glibc's
.BR ldd (1)
output.
The exact name should not matter to any code, so do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
user ABI	vDSO name
_
aarch64	linux-vdso.so.1
arm	linux-vdso.so.1
ia64	linux-gate.so.1
ppc/32	linux-vdso32.so.1
ppc/64	linux-vdso64.so.1
s390	linux-vdso32.so.1
s390x	linux-vdso64.so.1
sh	linux-gate.so.1
i386	linux-gate.so.1
x86_64	linux-vdso.so.1
x86/x32	linux-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
.SH ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes
on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space code
and not the ABI of the kernel.
Thus, for example,
when you run an i386 32-bit ELF binary,
you'll get the same vDSO regardless of whether you run it under
an i386 32-bit kernel or under an x86_64 64-bit kernel.
Therefore, the name of the user-space ABI should be used to determine
which of the sections below is relevant.
.SS ARM functions
.\" See linux/arch/arm/vdso/vdso.lds.S
.\" Commit: 8512287a8165592466cb9cb347ba94892e9c56a5
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_gettimeofday	LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime	LINUX_2.6 (exported since Linux 4.1)
.TE
.if t \{\
.in
.ft P
\}

.\" See linux/arch/arm/kernel/entry-armv.S
.\" See linux/Documentation/arm/kernel_user_helpers.txt
Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
It does provide support for different versions though.

For information on this code page,
it's best to refer to the kernel documentation
as it's extremely detailed and covers everything you need to know:
.IR Documentation/arm/kernel_user_helpers.txt .
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this CPU lacks a memory management unit (MMU),
it doesn't set up a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into
a fixed location in memory.
User-space applications then call directly into that region.
There is no provision for backward compatibility
beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things\(emsome of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page,
it's best to refer to the public documentation:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}

The Itanium port is somewhat tricky.
In addition to the vDSO above, it also has "light-weight system calls"
(also known as "fast syscalls" or "fsys").
You can invoke these via the
.I __kernel_syscall_via_epc
vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (2),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions
called a gateway page.
Rather than use the normal ELF auxiliary vector approach,
it passes the address of
the page to the process via the SR2 register.
The permissions on the page are such that merely executing those addresses
automatically executes with kernel privileges and not in user space.
This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction,
for example:

    ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry
00e0	set_thread_pointer
0100	linux_gateway_entry (syscall)
0268	syscall_nosys
0274	tracesys
0324	tracesys_next
0368	tracesys_exit
03a0	tracesys_sigexit
03b8	lws_start
03dc	lws_exit_nosys
03e0	lws_exit
03e4	lws_compare_and_swap64
03e8	lws_compare_and_swap
0404	cas_wouldblock
0410	cas_action
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
The functions marked with a
.I *
are available only when the kernel is
a PowerPC64 (64-bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.\" Added in 7a59ed415f5b57469e22e41fc4188d5399e0b194 and updated
.\" in 37c975545ec63320789962bf307f000f08fabd48.
__vdso_clock_gettime	LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday	LINUX_2.6 (exported since Linux 3.15)
__vdso_time	LINUX_2.6 (exported since Linux 3.15)
.TE
.if t \{\
.in
.ft P
\}
.SS x86_64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
All of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS History
The vDSO was originally just a single function\(emthe vsyscall.
In older kernels, you might see that name
in a process's memory map rather than "vdso".
Over time, people realized that this mechanism
was a great way to pass more functionality
to user space, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)

The documents, examples, and source code in the Linux source code tree:
.in +4n
.nf

Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)

find arch/ -iname '*vdso*' -o -iname '*gate*'
.fi
.in