.\" Written by Mike Frysinger <vapier@gentoo.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.
.\" %%%LICENSE_END
.\"
.\" Useful background:
.\" http://articles.manugarg.com/systemcallinlinux2_6.html
.\" https://lwn.net/Articles/446528/
.\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
.\" http://www.trilithium.com/johan/2005/08/linux-gate/
.\"
.TH VDSO 7 2019-08-02 "Linux" "Linux Programmer's Manual"
.SH NAME
vdso \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>
.PP
.B void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library that
the kernel automatically maps into the
address space of all user-space applications.
Applications usually do not need to concern themselves with these details
as the vDSO is most commonly called by the C library.
You can therefore write code in the usual way using standard functions,
and the C library will take care
of using any functionality that is available via the vDSO.
.PP
Why does the vDSO exist at all?
There are some system calls the kernel provides that
user-space code ends up using frequently,
to the point that such calls can dominate overall performance.
This is due both to the frequency of the call and to the
context-switch overhead that results
from exiting user space and entering the kernel.
.PP
The rest of this documentation is geared toward the curious and/or
C library writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example background
Making system calls can be slow.
In x86 32-bit systems, you can trigger a software interrupt
.RI ( "int $0x80" )
to tell the kernel you wish to make a system call.
However, this instruction is expensive: it goes through
the full interrupt-handling paths
in the processor's microcode as well as in the kernel.
Newer processors have faster (but backward incompatible) instructions to
initiate system calls.
Rather than require the C library to figure out if this functionality is
available at run time,
the C library can use functions provided by the kernel in
the vDSO.
.PP
Note that the terminology can be confusing.
On x86 systems, the vDSO function
used to determine the preferred method of making a system call is
named "__kernel_vsyscall", but on x86-64,
the term "vsyscall" also refers to an obsolete way to ask the kernel
what time it is or what CPU the caller is on.
.PP
One frequently used system call is
.BR gettimeofday (2).
This system call is called both directly by user-space applications
and indirectly by
the C library.
Think timestamps or timing loops or polling\(emall of these
frequently need to know what time it is right now.
This information is also not secret\(emany application in any
privilege mode (root or any unprivileged user) will get the same answer.
Thus the kernel arranges for the information required to answer
this question to be placed in memory the process can access.
Now a call to
.BR gettimeofday (2)
changes from a system call to a normal function
call and a few memory accesses.
.SS Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see
.BR getauxval (3)),
via the
.B AT_SYSINFO_EHDR
tag.
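.PP
As a minimal sketch of the lookup described above, the following C fragment
reads the tag with
.BR getauxval (3)
and checks that the mapping starts with the ELF magic bytes.
The helper names
.I vdso_base
and
.I looks_like_elf
are illustrative only and are not part of any library.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO base address, or NULL if the kernel supplied none. */
const void *vdso_base(void)
{
    /* getauxval() returns 0 when the requested tag is absent. */
    unsigned long addr = getauxval(AT_SYSINFO_EHDR);

    return (const void *) (uintptr_t) addr;
}

/* Return 1 if the memory at base begins with the ELF magic, else 0. */
int looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would typically treat a NULL result from the hypothetical
.I vdso_base
helper as "no vDSO available" and use ordinary system calls instead.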
.PP
You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at run time every time a new
process image is created (at
.BR execve (2)
time).
This is done for security reasons,
to prevent "return-to-libc" attacks.
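.PP
One way to observe the mapping, assuming
.I /proc
is mounted and the kernel labels the region "[vdso]" as described in
.BR proc (5),
is to scan
.IR /proc/self/maps .
The function name
.I find_vdso_mapping
below is hypothetical; this is a sketch rather than a recommended interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the first line of /proc/self/maps that mentions "[vdso]" into buf.
 * Returns 1 if such a line was found, 0 otherwise.
 */
int find_vdso_mapping(char *buf, size_t len)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, "[vdso]") != NULL) {
            snprintf(buf, len, "%s", line);
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Printing the returned line from two separate runs of the same program will
usually show two different address ranges when address randomization is
enabled.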
.PP
For some architectures, there is also an
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
omitted or set to 0 (meaning it's not available).
This tag is a throwback to the initial vDSO work (see
.IR History
below) and its use should be avoided.
.SS File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at
run time when running under different kernel versions.
Oftentimes the C library will do detection with the first call and then
cache the result for subsequent calls.
.PP
All symbols are also versioned (using the GNU version format).
This allows the kernel to update the function signature without breaking
backward compatibility.
This means changing the arguments that the function accepts as well as the
return value.
Thus, when looking up a symbol in the vDSO,
you must always include the version
to match the ABI you expect.
.PP
Typically the vDSO follows the naming convention of prefixing
all symbols with "__vdso_" or "__kernel_"
so as to distinguish them from other standard symbols.
For example, the "gettimeofday" function is named "__vdso_gettimeofday".
.PP
You use the standard C calling conventions when calling
any of these functions.
No need to worry about weird register or stack behavior.
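.PP
The following sketch shows one possible versioned lookup.
It assumes glibc, whose dynamic loader registers the vDSO under its soname so
that
.BR dlopen (3)
with
.B RTLD_NOLOAD
can return a handle to it, and it uses
.BR dlvsym (3)
to request a specific symbol version.
The wrapper name
.I vdso_lookup
is hypothetical, and the soname is passed in by the caller rather than
hardcoded.
A C library would normally parse the ELF image directly instead.
On glibc releases before 2.34, link with \-ldl.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <time.h>

/* Signature used for the clock_gettime entry point in this sketch. */
typedef int (*vdso_clock_gettime_fn)(clockid_t, struct timespec *);

/*
 * Look up symbol@version in an already-mapped object named soname.
 * Returns NULL if the object is not mapped or the versioned symbol
 * does not exist. Assumes glibc's dlopen()/dlvsym().
 */
void *vdso_lookup(const char *soname, const char *symbol,
                  const char *version)
{
    /* RTLD_NOLOAD: succeed only if the object is already mapped. */
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);

    if (handle == NULL)
        return NULL;
    /* The handle is deliberately kept open; the lookup is the point. */
    return dlvsym(handle, symbol, version);
}
```

On x86-64, for instance, a caller might pass "linux\-vdso.so.1",
"__vdso_clock_gettime", and "LINUX_2.6", cast a non-NULL result to
.IR vdso_clock_gettime_fn ,
and call it like any other C function.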
.SH NOTES
.SS Source
When you compile the kernel,
it will automatically compile and link the vDSO code for you.
You will frequently find it under the architecture-specific directory:
.PP
    find arch/$ARCH/ \-name \(aq*vdso*.so*\(aq \-o \-name \(aq*gate*.so*\(aq
.\"
.SS vDSO names
The name of the vDSO varies across architectures.
It will often show up in things like glibc's
.BR ldd (1)
output.
The exact name should not matter to any code, so do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
user ABI	vDSO name
_
aarch64	linux\-vdso.so.1
arm	linux\-vdso.so.1
ia64	linux\-gate.so.1
mips	linux\-vdso.so.1
ppc/32	linux\-vdso32.so.1
ppc/64	linux\-vdso64.so.1
riscv	linux\-vdso.so.1
s390	linux\-vdso32.so.1
s390x	linux\-vdso64.so.1
sh	linux\-gate.so.1
i386	linux\-gate.so.1
x86-64	linux\-vdso.so.1
x86/x32	linux\-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
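.PP
If a program wants to report the name without hardcoding it, one option
(sketched here under the assumption that the C library provides
.BR dl_iterate_phdr (3)
and lists the vDSO among the loaded objects, as glibc does) is to find the
loaded object whose segments cover the
.B AT_SYSINFO_EHDR
address.
The names
.I vdso_query
and
.I vdso_name
are illustrative.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

struct vdso_query {
    uintptr_t base;   /* address taken from AT_SYSINFO_EHDR */
    char name[64];    /* filled in when a matching object is found */
    int found;
};

/* dl_iterate_phdr() callback: does a PT_LOAD segment cover q->base? */
static int match_vdso(struct dl_phdr_info *info, size_t size, void *data)
{
    struct vdso_query *q = data;

    (void) size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;

        if (ph->p_type == PT_LOAD &&
            q->base >= start && q->base < start + ph->p_memsz) {
            snprintf(q->name, sizeof(q->name), "%s", info->dlpi_name);
            q->found = 1;
            return 1;   /* a nonzero return stops the iteration */
        }
    }
    return 0;
}

/* Store the loader's name for the vDSO in buf; return 1 on success. */
int vdso_name(char *buf, size_t len)
{
    struct vdso_query q = { .base = (uintptr_t) getauxval(AT_SYSINFO_EHDR) };

    if (q.base == 0)
        return 0;       /* the kernel supplied no vDSO */
    dl_iterate_phdr(match_vdso, &q);
    if (q.found)
        snprintf(buf, len, "%s", q.name);
    return q.found;
}
```

The result is suitable for diagnostics only; program logic should not depend
on the particular string.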
.SS strace(1), seccomp(2), and the vDSO
When tracing system calls with
.BR strace (1),
symbols (system calls) that are exported by the vDSO will
.I not
appear in the trace output.
Those system calls will likewise not be visible to
.BR seccomp (2)
filters.
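.PP
To make the difference concrete, the sketch below reads the same clock in two
ways.
It assumes a 64-bit ABI in which
.I struct timespec
matches the layout expected by the raw
.B SYS_clock_gettime
system call; the function names are hypothetical.
The first path goes through the C library, which typically uses the vDSO and
so leaves no trace entry, while the second uses
.BR syscall (2)
and therefore does enter the kernel.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Read the clock through the C library (typically served by the vDSO). */
int now_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Force a real system call, which strace(1) and seccomp(2) can see. */
int now_via_syscall(struct timespec *ts)
{
    return (int) syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Running a program that calls both functions under
.BR strace (1)
would be expected to show an entry only for the second call on systems where
the C library uses the vDSO.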
.SH ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes
on the vDSO.
.PP
Note that the vDSO that is used is based on the ABI of your user-space code
and not the ABI of the kernel.
Thus, for example,
when you run an i386 32-bit ELF binary,
you'll get the same vDSO regardless of whether you run it under
an i386 32-bit kernel or under an x86-64 64-bit kernel.
Therefore, the name of the user-space ABI should be used to determine
which of the sections below is relevant.
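.PP
Because the relevant ABI is that of the compiled program, it can be
determined at compile time.
The following sketch maps macros predefined by GCC-compatible compilers to
the user ABI names used in this page; the function name
.I vdso_user_abi
is illustrative, and the list covers only some of the ABIs discussed below.

```c
/* Map compiler-predefined macros to the user ABI names used in this page. */
const char *vdso_user_abi(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x86/x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#elif defined(__riscv)
    return "riscv";
#elif defined(__powerpc64__)
    return "ppc/64";
#elif defined(__powerpc__)
    return "ppc/32";
#elif defined(__s390x__)   /* test before __s390__, which s390x also defines */
    return "s390x";
#elif defined(__s390__)
    return "s390";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}
```

A binary built with \-m32 on an x86-64 host would report "i386" here, matching
the vDSO it actually receives.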
.SS ARM functions
.\" See linux/arch/arm/vdso/vdso.lds.S
.\" Commit: 8512287a8165592466cb9cb347ba94892e9c56a5
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_gettimeofday	LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime	LINUX_2.6 (exported since Linux 4.1)
.TE
.if t \{\
.in
.ft P
\}
.PP
.\" See linux/arch/arm/kernel/entry-armv.S
.\" See linux/Documentation/arm/kernel_user_helpers.txt
Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
It does provide support for different versions though.
.PP
For information on this code page,
it's best to refer to the kernel documentation
as it's extremely detailed and covers everything you need to know:
.IR Documentation/arm/kernel_user_helpers.txt .
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions (port removed in Linux 4.17)
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this CPU lacks a memory management unit (MMU),
it doesn't set up a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into
a fixed location in memory.
User-space applications then call directly into that region.
There is no provision for backward compatibility
beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things\(emsome of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).
.PP
For information on this code page,
it's best to refer to the public documentation:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux\-kernel:fixed\-code
.SS mips functions
.\" See linux/arch/mips/vdso/vdso.ld.S
.PP
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_gettimeofday	LINUX_2.6 (exported since Linux 4.4)
__kernel_clock_gettime	LINUX_2.6 (exported since Linux 4.4)
.TE
.if t \{\
.in
.ft P
\}
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}
.PP
The Itanium port is somewhat tricky.
In addition to the vDSO above, it also has "light-weight system calls"
(also known as "fast syscalls" or "fsys").
You can invoke these via the
.I __kernel_syscall_via_epc
vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (2),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page with utility functions
called a gateway page.
Rather than use the normal ELF auxiliary vector approach,
it passes the address of
the page to the process via the SR2 register.
The permissions on the page are such that merely executing those addresses
automatically executes with kernel privileges and not in user space.
This is done to match the way HP-UX works.
.PP
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction,
for example:
.PP
    ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry (CAS operations)
00e0	set_thread_pointer (used by glibc)
0100	linux_gateway_entry (syscall)
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
The functions marked with a
.I *
are available only when the kernel is
a PowerPC64 (64-bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.PP
The
.B CLOCK_REALTIME_COARSE
and
.B CLOCK_MONOTONIC_COARSE
clocks are
.I not
supported by the
.I __kernel_clock_getres
and
.I __kernel_clock_gettime
interfaces;
the kernel falls back to the real system call.
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.PP
The
.B CLOCK_REALTIME_COARSE
and
.B CLOCK_MONOTONIC_COARSE
clocks are
.I not
supported by the
.I __kernel_clock_getres
and
.I __kernel_clock_gettime
interfaces;
the kernel falls back to the real system call.
.SS riscv functions
.\" See linux/arch/riscv/kernel/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_4.15
__kernel_gettimeofday	LINUX_4.15
__kernel_clock_gettime	LINUX_4.15
__kernel_clock_getres	LINUX_4.15
__kernel_getcpu	LINUX_4.15
__kernel_flush_icache	LINUX_4.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.\" Added in 7a59ed415f5b57469e22e41fc4188d5399e0b194 and updated
.\" in 37c975545ec63320789962bf307f000f08fabd48.
__vdso_clock_gettime	LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday	LINUX_2.6 (exported since Linux 3.15)
__vdso_time	LINUX_2.6 (exported since Linux 3.15)
.TE
.if t \{\
.in
.ft P
\}
.SS x86-64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
All of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS History
The vDSO was originally just a single function\(emthe vsyscall.
In older kernels, you might see that name
in a process's memory map rather than "vdso".
Over time, people realized that this mechanism
was a great way to pass more functionality
to user space, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)
.PP
The documents, examples, and source code in the Linux source code tree:
.PP
.in +4n
.EX
Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)

find arch/ \-iname \(aq*vdso*\(aq \-o \-iname \(aq*gate*\(aq
.EE
.in
593Documentation/vDSO/* (includes examples of using the vDSO)
594
595find arch/ \-iname \(aq*vdso*\(aq \-o \-iname \(aq*gate*\(aq
596.EE
597.in