.\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
.\" http://www.trilithium.com/johan/2005/08/linux-gate/
.\"
-.TH VDSO 7 2015-12-28 "Linux" "Linux Programmer's Manual"
+.TH VDSO 7 2019-08-02 "Linux" "Linux Programmer's Manual"
.SH NAME
-vDSO \- overview of the virtual ELF dynamic shared object
+vdso \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>
-
+.PP
.B void *vdso = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library that
This way you can code in the normal way using standard functions
and the C library will take care
of using any functionality that is available via the vDSO.
-
+.PP
Why does the vDSO exist at all?
There are some system calls the kernel provides that
user-space code ends up using frequently,
This is due both to the frequency of the call as well as the
context-switch overhead that results
from exiting user space and entering the kernel.
-
+.PP
The rest of this documentation is geared toward the curious and/or
C library writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
available at run time,
the C library can use functions provided by the kernel in
the vDSO.
-
+.PP
Note that the terminology can be confusing.
On x86 systems, the vDSO function
used to determine the preferred method of making a system call is
-named "__kernel_vsyscall", but on x86_64,
+named "__kernel_vsyscall", but on x86-64,
the term "vsyscall" also refers to an obsolete way to ask the kernel
what time it is or what CPU the caller is on.
-
+.PP
One frequently used system call is
.BR gettimeofday (2).
This system call is called both directly by user-space applications
via the
.B AT_SYSINFO_EHDR
tag.
-
+.PP
You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at run time every time a new
time).
This is done for security reasons,
to prevent "return-to-libc" attacks.
-
+.PP
For some architectures, there is also an
.B AT_SYSINFO
tag.
run time when running under different kernel versions.
Oftentimes the C library will do detection with the first call and then
cache the result for subsequent calls.
-
+.PP
All symbols are also versioned (using the GNU version format).
This allows the kernel to update the function signature without breaking
backward compatibility.
Thus, when looking up a symbol in the vDSO,
you must always include the version
to match the ABI you expect.
-
+.PP
Typically the vDSO follows the naming convention of prefixing
all symbols with "__vdso_" or "__kernel_"
so as to distinguish them from other standard symbols.
For example, the "gettimeofday" function is named "__vdso_gettimeofday".
-
+.PP
You use the standard C calling conventions when calling
any of these functions.
No need to worry about weird register or stack behavior.
When you compile the kernel,
it will automatically compile and link the vDSO code for you.
You will frequently find it under the architecture-specific directory:
-
- find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
-
+.PP
+ find arch/$ARCH/ \-name \(aq*vdso*.so*\(aq \-o \-name \(aq*gate*.so*\(aq
+.\"
.SS vDSO names
The name of the vDSO varies across architectures.
It will often show up in things like glibc's
l l.
user ABI vDSO name
_
-aarch64 linux-vdso.so.1
-arm linux-vdso.so.1
-ia64 linux-gate.so.1
-ppc/32 linux-vdso32.so.1
-ppc/64 linux-vdso64.so.1
-s390 linux-vdso32.so.1
-s390x linux-vdso64.so.1
-sh linux-gate.so.1
-i386 linux-gate.so.1
-x86_64 linux-vdso.so.1
-x86/x32 linux-vdso.so.1
+aarch64 linux\-vdso.so.1
+arm linux\-vdso.so.1
+ia64 linux\-gate.so.1
+mips linux\-vdso.so.1
+ppc/32 linux\-vdso32.so.1
+ppc/64 linux\-vdso64.so.1
+riscv linux\-vdso.so.1
+s390 linux\-vdso32.so.1
+s390x linux\-vdso64.so.1
+sh linux\-gate.so.1
+i386 linux\-gate.so.1
+x86-64 linux\-vdso.so.1
+x86/x32 linux\-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
-.SS strace(1) and the vDSO
+.SS strace(1), seccomp(2), and the vDSO
-When tracing systems calls with
+When tracing system calls with
.BR strace (1),
symbols (system calls) that are exported by the vDSO will
.I not
appear in the trace output.
+Those system calls will likewise not be visible to
+.BR seccomp (2)
+filters.
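A minimal sketch of the behaviour described above, not part of the patch itself. The helper names time_via_libc and time_via_syscall are hypothetical. The sketch assumes a C library that routes clock_gettime(3) through the vDSO where one is available, and an ABI where SYS_clock_gettime is defined and uses the same struct timespec layout as the C library (true on common 64-bit ABIs). Under those assumptions, the first helper would normally leave no entry in strace(1) output, while the second always traps into the kernel and is therefore visible to both strace(1) and seccomp(2) filters.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_clock_gettime */
#include <time.h>          /* clock_gettime(), struct timespec */
#include <unistd.h>        /* syscall() */

/* Read CLOCK_MONOTONIC through the C library wrapper.
   Assumption: the C library may satisfy this call from the vDSO,
   in which case no system call is made and nothing is traced. */
static int time_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Read the same clock by invoking the system call directly.
   This bypasses the vDSO, so the call is a real kernel entry. */
static int time_via_syscall(struct timespec *ts)
{
    return (int) syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Calling both helpers from a small program run under strace(1) should show, on a system where the vDSO path is in use, only the call made by time_via_syscall.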
.SH ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes
on the vDSO.
-
+.PP
Note that the vDSO that is used is based on the ABI of your user-space code
and not the ABI of the kernel.
Thus, for example,
when you run an i386 32-bit ELF binary,
you'll get the same vDSO regardless of whether you run it under
-an i386 32-bit kernel or under an x86_64 64-bit kernel.
+an i386 32-bit kernel or under an x86-64 64-bit kernel.
Therefore, the name of the user-space ABI should be used to determine
which of the sections below is relevant.
.SS ARM functions
.in
.ft P
\}
-
+.PP
.\" See linux/arch/arm/kernel/entry-armv.S
.\" See linux/Documentation/arm/kernel_user_helpers.txt
Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
It does provide support for different versions though.
-
+.PP
For information on this code page,
it's best to refer to the kernel documentation
as it's extremely detailed and covers everything you need to know:
.in
.ft P
\}
-.SS bfin (Blackfin) functions
+.SS bfin (Blackfin) functions (port removed in Linux 4.17)
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this CPU lacks a memory management unit (MMU),
beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things\(emsome of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).
-
+.PP
For information on this code page,
it's best to refer to the public documentation:
.br
-http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
+http://docs.blackfin.uclinux.org/doku.php?id=linux\-kernel:fixed\-code
+.SS mips functions
+.\" See linux/arch/mips/vdso/vdso.ld.S
+.PP
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_gettimeofday LINUX_2.6 (exported since Linux 4.4)
+__kernel_clock_gettime LINUX_2.6 (exported since Linux 4.4)
+.TE
+.if t \{\
+.in
+.ft P
+\}
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
.in
.ft P
\}
-
+.PP
The Itanium port is somewhat tricky.
In addition to the vDSO above, it also has "light-weight system calls"
(also known as "fast syscalls" or "fsys").
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
-The parisc port has a code page full of utility functions
+The parisc port has a code page with utility functions
called a gateway page.
Rather than use the normal ELF auxiliary vector approach,
it passes the address of
The permissions on the page are such that merely executing those addresses
automatically executes with kernel privileges and not in user space.
This is done to match the way HP-UX works.
-
+.PP
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction,
for example:
-
+.PP
ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
l l.
offset function
_
-00b0 lws_entry
-00e0 set_thread_pointer
+00b0 lws_entry (CAS operations)
+00e0 set_thread_pointer (used by glibc)
0100 linux_gateway_entry (syscall)
-0268 syscall_nosys
-0274 tracesys
-0324 tracesys_next
-0368 tracesys_exit
-03a0 tracesys_sigexit
-03b8 lws_start
-03dc lws_exit_nosys
-03e0 lws_exit
-03e4 lws_compare_and_swap64
-03e8 lws_compare_and_swap
-0404 cas_wouldblock
-0410 cas_action
.TE
.if t \{\
.in
.in
.ft P
\}
+.PP
+The
+.B CLOCK_REALTIME_COARSE
+and
+.B CLOCK_MONOTONIC_COARSE
+clocks are
+.I not
+supported by the
+.I __kernel_clock_getres
+and
+.I __kernel_clock_gettime
+interfaces;
+the vDSO functions fall back to the real system call.
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.in
.ft P
\}
+.PP
+The
+.B CLOCK_REALTIME_COARSE
+and
+.B CLOCK_MONOTONIC_COARSE
+clocks are
+.I not
+supported by the
+.I __kernel_clock_getres
+and
+.I __kernel_clock_gettime
+interfaces;
+the vDSO functions fall back to the real system call.
+.SS riscv functions
+.\" See linux/arch/riscv/kernel/vdso/vdso.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_rt_sigreturn LINUX_4.15
+__kernel_gettimeofday LINUX_4.15
+__kernel_clock_gettime LINUX_4.15
+__kernel_clock_getres LINUX_4.15
+__kernel_getcpu LINUX_4.15
+__kernel_flush_icache LINUX_4.15
+.TE
+.if t \{\
+.in
+.ft P
+\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.in
.ft P
\}
-.SS x86_64 functions
+.SS x86-64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
All of these symbols are also available without the "__vdso_" prefix, but
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)
-
+.PP
The documents, examples, and source code in the Linux source code tree:
+.PP
.in +4n
-.nf
-
+.EX
Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)
-find arch/ -iname '*vdso*' -o -iname '*gate*'
-.fi
+find arch/ \-iname \(aq*vdso*\(aq \-o \-iname \(aq*gate*\(aq
+.EE
.in
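A minimal sketch of the lookup summarized in the SYNOPSIS, not part of the patch itself. The helper names vdso_base and vdso_has_elf_magic are hypothetical. The sketch assumes a C library that provides getauxval(3); it reads the AT_SYSINFO_EHDR tag and checks that the mapping begins with the ELF magic bytes, which is the first step a symbol lookup would take before parsing the ELF headers. It does not resolve or version-check any symbols.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <stdint.h>     /* uintptr_t */
#include <string.h>     /* memcmp() */
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/* Return the vDSO base address taken from the auxiliary vector,
   or NULL if the kernel did not supply AT_SYSINFO_EHDR. */
static void *vdso_base(void)
{
    return (void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if the mapping at base starts with the ELF magic
   bytes, 0 otherwise (including when base is NULL). */
static int vdso_has_elf_magic(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

The address returned by vdso_base should be treated as valid only for the current process, since the base is usually randomized for each new process image.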