]> git.ipfire.org Git - thirdparty/kernel/stable.git/blame - Documentation/kvm/api.txt
KVM: Protect update_cr8_intercept() when running without an apic
[thirdparty/kernel/stable.git] / Documentation / kvm / api.txt
CommitLineData
9c1b96e3
AK
1The Definitive KVM (Kernel-based Virtual Machine) API Documentation
2===================================================================
3
41. General description
5
6The kvm API is a set of ioctls that are issued to control various aspects
7of a virtual machine. The ioctls belong to three classes
8
9 - System ioctls: These query and set global attributes which affect the
10 whole kvm subsystem. In addition a system ioctl is used to create
11 virtual machines
12
13 - VM ioctls: These query and set attributes that affect an entire virtual
14 machine, for example memory layout. In addition a VM ioctl is used to
15 create virtual cpus (vcpus).
16
17 Only run VM ioctls from the same process (address space) that was used
18 to create the VM.
19
20 - vcpu ioctls: These query and set attributes that control the operation
21 of a single virtual cpu.
22
23 Only run vcpu ioctls from the same thread that was used to create the
24 vcpu.
25
262. File descritpors
27
28The kvm API is centered around file descriptors. An initial
29open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
30can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
31handle will create a VM file descripror which can be used to issue VM
32ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
33and return a file descriptor pointing to it. Finally, ioctls on a vcpu
34fd can be used to control the vcpu, including the important task of
35actually running guest code.
36
37In general file descriptors can be migrated among processes by means
38of fork() and the SCM_RIGHTS facility of unix domain socket. These
39kinds of tricks are explicitly not supported by kvm. While they will
40not cause harm to the host, their actual behavior is not guaranteed by
41the API. The only supported use is one virtual machine per process,
42and one vcpu per thread.
43
443. Extensions
45
46As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
47incompatible change are allowed. However, there is an extension
48facility that allows backward-compatible extensions to the API to be
49queried and used.
50
51The extension mechanism is not based on on the Linux version number.
52Instead, kvm defines extension identifiers and a facility to query
53whether a particular extension identifier is available. If it is, a
54set of ioctls is available for application use.
55
564. API description
57
58This section describes ioctls that can be used to control kvm guests.
59For each ioctl, the following information is provided along with a
60description:
61
62 Capability: which KVM extension provides this ioctl. Can be 'basic',
63 which means that is will be provided by any kernel that supports
64 API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
65 means availability needs to be checked with KVM_CHECK_EXTENSION
66 (see section 4.4).
67
68 Architectures: which instruction set architectures provide this ioctl.
69 x86 includes both i386 and x86_64.
70
71 Type: system, vm, or vcpu.
72
73 Parameters: what parameters are accepted by the ioctl.
74
75 Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
76 are not detailed, but errors with specific meanings are.
77
784.1 KVM_GET_API_VERSION
79
80Capability: basic
81Architectures: all
82Type: system ioctl
83Parameters: none
84Returns: the constant KVM_API_VERSION (=12)
85
86This identifies the API version as the stable kvm API. It is not
87expected that this number will change. However, Linux 2.6.20 and
882.6.21 report earlier versions; these are not documented and not
89supported. Applications should refuse to run if KVM_GET_API_VERSION
90returns a value other than 12. If this check passes, all ioctls
91described as 'basic' will be available.
92
934.2 KVM_CREATE_VM
94
95Capability: basic
96Architectures: all
97Type: system ioctl
98Parameters: none
99Returns: a VM fd that can be used to control the new virtual machine.
100
101The new VM has no virtual cpus and no memory. An mmap() of a VM fd
102will access the virtual machine's physical address space; offset zero
103corresponds to guest physical address zero. Use of mmap() on a VM fd
104is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
105available.
106
1074.3 KVM_GET_MSR_INDEX_LIST
108
109Capability: basic
110Architectures: x86
111Type: system
112Parameters: struct kvm_msr_list (in/out)
113Returns: 0 on success; -1 on error
114Errors:
115 E2BIG: the msr index list is to be to fit in the array specified by
116 the user.
117
118struct kvm_msr_list {
119 __u32 nmsrs; /* number of msrs in entries */
120 __u32 indices[0];
121};
122
123This ioctl returns the guest msrs that are supported. The list varies
124by kvm version and host processor, but does not change otherwise. The
125user fills in the size of the indices array in nmsrs, and in return
126kvm adjusts nmsrs to reflect the actual number of msrs and fills in
127the indices array with their numbers.
128
1294.4 KVM_CHECK_EXTENSION
130
131Capability: basic
132Architectures: all
133Type: system ioctl
134Parameters: extension identifier (KVM_CAP_*)
135Returns: 0 if unsupported; 1 (or some other positive integer) if supported
136
137The API allows the application to query about extensions to the core
138kvm API. Userspace passes an extension identifier (an integer) and
139receives an integer that describes the extension availability.
140Generally 0 means no and 1 means yes, but some extensions may report
141additional information in the integer return value.
142
1434.5 KVM_GET_VCPU_MMAP_SIZE
144
145Capability: basic
146Architectures: all
147Type: system ioctl
148Parameters: none
149Returns: size of vcpu mmap area, in bytes
150
151The KVM_RUN ioctl (cf.) communicates with userspace via a shared
152memory region. This ioctl returns the size of that region. See the
153KVM_RUN documentation for details.
154
1554.6 KVM_SET_MEMORY_REGION
156
157Capability: basic
158Architectures: all
159Type: vm ioctl
160Parameters: struct kvm_memory_region (in)
161Returns: 0 on success, -1 on error
162
163struct kvm_memory_region {
164 __u32 slot;
165 __u32 flags;
166 __u64 guest_phys_addr;
167 __u64 memory_size; /* bytes */
168};
169
170/* for kvm_memory_region::flags */
171#define KVM_MEM_LOG_DIRTY_PAGES 1UL
172
173This ioctl allows the user to create or modify a guest physical memory
174slot. When changing an existing slot, it may be moved in the guest
175physical memory space, or its flags may be modified. It may not be
176resized. Slots may not overlap.
177
178The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
179instructs kvm to keep track of writes to memory within the slot. See
180the KVM_GET_DIRTY_LOG ioctl.
181
182It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead
183of this API, if available. This newer API allows placing guest memory
184at specified locations in the host address space, yielding better
185control and easy access.
186
1874.6 KVM_CREATE_VCPU
188
189Capability: basic
190Architectures: all
191Type: vm ioctl
192Parameters: vcpu id (apic id on x86)
193Returns: vcpu fd on success, -1 on error
194
195This API adds a vcpu to a virtual machine. The vcpu id is a small integer
196in the range [0, max_vcpus).
197
1984.7 KVM_GET_DIRTY_LOG (vm ioctl)
199
200Capability: basic
201Architectures: x86
202Type: vm ioctl
203Parameters: struct kvm_dirty_log (in/out)
204Returns: 0 on success, -1 on error
205
206/* for KVM_GET_DIRTY_LOG */
207struct kvm_dirty_log {
208 __u32 slot;
209 __u32 padding;
210 union {
211 void __user *dirty_bitmap; /* one bit per page */
212 __u64 padding;
213 };
214};
215
216Given a memory slot, return a bitmap containing any pages dirtied
217since the last call to this ioctl. Bit 0 is the first page in the
218memory slot. Ensure the entire structure is cleared to avoid padding
219issues.
220
2214.8 KVM_SET_MEMORY_ALIAS
222
223Capability: basic
224Architectures: x86
225Type: vm ioctl
226Parameters: struct kvm_memory_alias (in)
227Returns: 0 (success), -1 (error)
228
229struct kvm_memory_alias {
230 __u32 slot; /* this has a different namespace than memory slots */
231 __u32 flags;
232 __u64 guest_phys_addr;
233 __u64 memory_size;
234 __u64 target_phys_addr;
235};
236
237Defines a guest physical address space region as an alias to another
238region. Useful for aliased address, for example the VGA low memory
239window. Should not be used with userspace memory.
240
2414.9 KVM_RUN
242
243Capability: basic
244Architectures: all
245Type: vcpu ioctl
246Parameters: none
247Returns: 0 on success, -1 on error
248Errors:
249 EINTR: an unmasked signal is pending
250
251This ioctl is used to run a guest virtual cpu. While there are no
252explicit parameters, there is an implicit parameter block that can be
253obtained by mmap()ing the vcpu fd at offset 0, with the size given by
254KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
255kvm_run' (see below).
256
2574.10 KVM_GET_REGS
258
259Capability: basic
260Architectures: all
261Type: vcpu ioctl
262Parameters: struct kvm_regs (out)
263Returns: 0 on success, -1 on error
264
265Reads the general purpose registers from the vcpu.
266
267/* x86 */
268struct kvm_regs {
269 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
270 __u64 rax, rbx, rcx, rdx;
271 __u64 rsi, rdi, rsp, rbp;
272 __u64 r8, r9, r10, r11;
273 __u64 r12, r13, r14, r15;
274 __u64 rip, rflags;
275};
276
2774.11 KVM_SET_REGS
278
279Capability: basic
280Architectures: all
281Type: vcpu ioctl
282Parameters: struct kvm_regs (in)
283Returns: 0 on success, -1 on error
284
285Writes the general purpose registers into the vcpu.
286
287See KVM_GET_REGS for the data structure.
288
2894.12 KVM_GET_SREGS
290
291Capability: basic
292Architectures: x86
293Type: vcpu ioctl
294Parameters: struct kvm_sregs (out)
295Returns: 0 on success, -1 on error
296
297Reads special registers from the vcpu.
298
299/* x86 */
300struct kvm_sregs {
301 struct kvm_segment cs, ds, es, fs, gs, ss;
302 struct kvm_segment tr, ldt;
303 struct kvm_dtable gdt, idt;
304 __u64 cr0, cr2, cr3, cr4, cr8;
305 __u64 efer;
306 __u64 apic_base;
307 __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
308};
309
310interrupt_bitmap is a bitmap of pending external interrupts. At most
311one bit may be set. This interrupt has been acknowledged by the APIC
312but not yet injected into the cpu core.
313
3144.13 KVM_SET_SREGS
315
316Capability: basic
317Architectures: x86
318Type: vcpu ioctl
319Parameters: struct kvm_sregs (in)
320Returns: 0 on success, -1 on error
321
322Writes special registers into the vcpu. See KVM_GET_SREGS for the
323data structures.
324
3254.14 KVM_TRANSLATE
326
327Capability: basic
328Architectures: x86
329Type: vcpu ioctl
330Parameters: struct kvm_translation (in/out)
331Returns: 0 on success, -1 on error
332
333Translates a virtual address according to the vcpu's current address
334translation mode.
335
336struct kvm_translation {
337 /* in */
338 __u64 linear_address;
339
340 /* out */
341 __u64 physical_address;
342 __u8 valid;
343 __u8 writeable;
344 __u8 usermode;
345 __u8 pad[5];
346};
347
3484.15 KVM_INTERRUPT
349
350Capability: basic
351Architectures: x86
352Type: vcpu ioctl
353Parameters: struct kvm_interrupt (in)
354Returns: 0 on success, -1 on error
355
356Queues a hardware interrupt vector to be injected. This is only
357useful if in-kernel local APIC is not used.
358
359/* for KVM_INTERRUPT */
360struct kvm_interrupt {
361 /* in */
362 __u32 irq;
363};
364
365Note 'irq' is an interrupt vector, not an interrupt pin or line.
366
3674.16 KVM_DEBUG_GUEST
368
369Capability: basic
370Architectures: none
371Type: vcpu ioctl
372Parameters: none)
373Returns: -1 on error
374
375Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
376
3774.17 KVM_GET_MSRS
378
379Capability: basic
380Architectures: x86
381Type: vcpu ioctl
382Parameters: struct kvm_msrs (in/out)
383Returns: 0 on success, -1 on error
384
385Reads model-specific registers from the vcpu. Supported msr indices can
386be obtained using KVM_GET_MSR_INDEX_LIST.
387
388struct kvm_msrs {
389 __u32 nmsrs; /* number of msrs in entries */
390 __u32 pad;
391
392 struct kvm_msr_entry entries[0];
393};
394
395struct kvm_msr_entry {
396 __u32 index;
397 __u32 reserved;
398 __u64 data;
399};
400
401Application code should set the 'nmsrs' member (which indicates the
402size of the entries array) and the 'index' member of each array entry.
403kvm will fill in the 'data' member.
404
4054.18 KVM_SET_MSRS
406
407Capability: basic
408Architectures: x86
409Type: vcpu ioctl
410Parameters: struct kvm_msrs (in)
411Returns: 0 on success, -1 on error
412
413Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
414data structures.
415
416Application code should set the 'nmsrs' member (which indicates the
417size of the entries array), and the 'index' and 'data' members of each
418array entry.
419
4204.19 KVM_SET_CPUID
421
422Capability: basic
423Architectures: x86
424Type: vcpu ioctl
425Parameters: struct kvm_cpuid (in)
426Returns: 0 on success, -1 on error
427
428Defines the vcpu responses to the cpuid instruction. Applications
429should use the KVM_SET_CPUID2 ioctl if available.
430
431
432struct kvm_cpuid_entry {
433 __u32 function;
434 __u32 eax;
435 __u32 ebx;
436 __u32 ecx;
437 __u32 edx;
438 __u32 padding;
439};
440
441/* for KVM_SET_CPUID */
442struct kvm_cpuid {
443 __u32 nent;
444 __u32 padding;
445 struct kvm_cpuid_entry entries[0];
446};
447
4484.20 KVM_SET_SIGNAL_MASK
449
450Capability: basic
451Architectures: x86
452Type: vcpu ioctl
453Parameters: struct kvm_signal_mask (in)
454Returns: 0 on success, -1 on error
455
456Defines which signals are blocked during execution of KVM_RUN. This
457signal mask temporarily overrides the threads signal mask. Any
458unblocked signal received (except SIGKILL and SIGSTOP, which retain
459their traditional behaviour) will cause KVM_RUN to return with -EINTR.
460
461Note the signal will only be delivered if not blocked by the original
462signal mask.
463
464/* for KVM_SET_SIGNAL_MASK */
465struct kvm_signal_mask {
466 __u32 len;
467 __u8 sigset[0];
468};
469
4704.21 KVM_GET_FPU
471
472Capability: basic
473Architectures: x86
474Type: vcpu ioctl
475Parameters: struct kvm_fpu (out)
476Returns: 0 on success, -1 on error
477
478Reads the floating point state from the vcpu.
479
480/* for KVM_GET_FPU and KVM_SET_FPU */
481struct kvm_fpu {
482 __u8 fpr[8][16];
483 __u16 fcw;
484 __u16 fsw;
485 __u8 ftwx; /* in fxsave format */
486 __u8 pad1;
487 __u16 last_opcode;
488 __u64 last_ip;
489 __u64 last_dp;
490 __u8 xmm[16][16];
491 __u32 mxcsr;
492 __u32 pad2;
493};
494
4954.22 KVM_SET_FPU
496
497Capability: basic
498Architectures: x86
499Type: vcpu ioctl
500Parameters: struct kvm_fpu (in)
501Returns: 0 on success, -1 on error
502
503Writes the floating point state to the vcpu.
504
505/* for KVM_GET_FPU and KVM_SET_FPU */
506struct kvm_fpu {
507 __u8 fpr[8][16];
508 __u16 fcw;
509 __u16 fsw;
510 __u8 ftwx; /* in fxsave format */
511 __u8 pad1;
512 __u16 last_opcode;
513 __u64 last_ip;
514 __u64 last_dp;
515 __u8 xmm[16][16];
516 __u32 mxcsr;
517 __u32 pad2;
518};
519
5205. The kvm_run structure
521
522Application code obtains a pointer to the kvm_run structure by
523mmap()ing a vcpu fd. From that point, application code can control
524execution by changing fields in kvm_run prior to calling the KVM_RUN
525ioctl, and obtain information about the reason KVM_RUN returned by
526looking up structure members.
527
528struct kvm_run {
529 /* in */
530 __u8 request_interrupt_window;
531
532Request that KVM_RUN return when it becomes possible to inject external
533interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
534
535 __u8 padding1[7];
536
537 /* out */
538 __u32 exit_reason;
539
540When KVM_RUN has returned successfully (return value 0), this informs
541application code why KVM_RUN has returned. Allowable values for this
542field are detailed below.
543
544 __u8 ready_for_interrupt_injection;
545
546If request_interrupt_window has been specified, this field indicates
547an interrupt can be injected now with KVM_INTERRUPT.
548
549 __u8 if_flag;
550
551The value of the current interrupt flag. Only valid if in-kernel
552local APIC is not used.
553
554 __u8 padding2[2];
555
556 /* in (pre_kvm_run), out (post_kvm_run) */
557 __u64 cr8;
558
559The value of the cr8 register. Only valid if in-kernel local APIC is
560not used. Both input and output.
561
562 __u64 apic_base;
563
564The value of the APIC BASE msr. Only valid if in-kernel local
565APIC is not used. Both input and output.
566
567 union {
568 /* KVM_EXIT_UNKNOWN */
569 struct {
570 __u64 hardware_exit_reason;
571 } hw;
572
573If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
574reasons. Further architecture-specific information is available in
575hardware_exit_reason.
576
577 /* KVM_EXIT_FAIL_ENTRY */
578 struct {
579 __u64 hardware_entry_failure_reason;
580 } fail_entry;
581
582If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
583to unknown reasons. Further architecture-specific information is
584available in hardware_entry_failure_reason.
585
586 /* KVM_EXIT_EXCEPTION */
587 struct {
588 __u32 exception;
589 __u32 error_code;
590 } ex;
591
592Unused.
593
594 /* KVM_EXIT_IO */
595 struct {
596#define KVM_EXIT_IO_IN 0
597#define KVM_EXIT_IO_OUT 1
598 __u8 direction;
599 __u8 size; /* bytes */
600 __u16 port;
601 __u32 count;
602 __u64 data_offset; /* relative to kvm_run start */
603 } io;
604
605If exit_reason is KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT, then the vcpu has
606executed a port I/O instruction which could not be satisfied by kvm.
607data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
608where kvm expects application code to place the data for the next
609KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a patcked array.
610
611 struct {
612 struct kvm_debug_exit_arch arch;
613 } debug;
614
615Unused.
616
617 /* KVM_EXIT_MMIO */
618 struct {
619 __u64 phys_addr;
620 __u8 data[8];
621 __u32 len;
622 __u8 is_write;
623 } mmio;
624
625If exit_reason is KVM_EXIT_MMIO or KVM_EXIT_IO_OUT, then the vcpu has
626executed a memory-mapped I/O instruction which could not be satisfied
627by kvm. The 'data' member contains the written data if 'is_write' is
628true, and should be filled by application code otherwise.
629
630 /* KVM_EXIT_HYPERCALL */
631 struct {
632 __u64 nr;
633 __u64 args[6];
634 __u64 ret;
635 __u32 longmode;
636 __u32 pad;
637 } hypercall;
638
639Unused.
640
641 /* KVM_EXIT_TPR_ACCESS */
642 struct {
643 __u64 rip;
644 __u32 is_write;
645 __u32 pad;
646 } tpr_access;
647
648To be documented (KVM_TPR_ACCESS_REPORTING).
649
650 /* KVM_EXIT_S390_SIEIC */
651 struct {
652 __u8 icptcode;
653 __u64 mask; /* psw upper half */
654 __u64 addr; /* psw lower half */
655 __u16 ipa;
656 __u32 ipb;
657 } s390_sieic;
658
659s390 specific.
660
661 /* KVM_EXIT_S390_RESET */
662#define KVM_S390_RESET_POR 1
663#define KVM_S390_RESET_CLEAR 2
664#define KVM_S390_RESET_SUBSYSTEM 4
665#define KVM_S390_RESET_CPU_INIT 8
666#define KVM_S390_RESET_IPL 16
667 __u64 s390_reset_flags;
668
669s390 specific.
670
671 /* KVM_EXIT_DCR */
672 struct {
673 __u32 dcrn;
674 __u32 data;
675 __u8 is_write;
676 } dcr;
677
678powerpc specific.
679
680 /* Fix the size of the union. */
681 char padding[256];
682 };
683};