The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

1. General description
----------------------

The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to three classes:

 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem. In addition a system ioctl is used to create
   virtual machines.

 - VM ioctls: These query and set attributes that affect an entire virtual
   machine, for example memory layout. In addition a VM ioctl is used to
   create virtual cpus (vcpus).

   Only run VM ioctls from the same process (address space) that was used
   to create the VM.

 - vcpu ioctls: These query and set attributes that control the operation
   of a single virtual cpu.

   Only run vcpu ioctls from the same thread that was used to create the
   vcpu.


2. File descriptors
-------------------

The kvm API is centered around file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
and return a file descriptor pointing to it. Finally, ioctls on a vcpu
fd can be used to control the vcpu, including the important task of
actually running guest code.

In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of Unix domain sockets. These
kinds of tricks are explicitly not supported by kvm. While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API. The only supported use is one virtual machine per process,
and one vcpu per thread.
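
The following is a minimal sketch of walking the three descriptor levels
described above. It assumes a Linux host with <linux/kvm.h> installed; the
helper kvm_open_all() and struct kvm_fds are hypothetical names, and only
open(), ioctl() and the KVM_* constants come from the API. It also applies
the KVM_GET_API_VERSION check from section 4.1 before creating anything.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical container for the three fd levels described above. */
struct kvm_fds {
    int kvm;   /* system fd, from open("/dev/kvm") */
    int vm;    /* VM fd, from KVM_CREATE_VM */
    int vcpu;  /* vcpu fd, from KVM_CREATE_VCPU */
};

/* Returns 0 on success; -1 on failure, with every opened fd closed. */
static int kvm_open_all(struct kvm_fds *f)
{
    f->kvm = f->vm = f->vcpu = -1;

    f->kvm = open("/dev/kvm", O_RDWR);
    if (f->kvm < 0)
        return -1;

    /* Refuse to run on anything but the stable API (see 4.1). */
    if (ioctl(f->kvm, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
        goto fail;

    f->vm = ioctl(f->kvm, KVM_CREATE_VM, 0);     /* machine type 0 */
    if (f->vm < 0)
        goto fail;

    f->vcpu = ioctl(f->vm, KVM_CREATE_VCPU, 0);  /* vcpu id 0 */
    if (f->vcpu < 0)
        goto fail;
    return 0;

fail:
    if (f->vm >= 0)
        close(f->vm);
    close(f->kvm);
    f->kvm = f->vm = f->vcpu = -1;
    return -1;
}
```

If any step fails, the sketch closes what it opened and returns -1, so a
caller can report the problem cleanly when /dev/kvm is absent or not
accessible. Per the rules above, the vm fd should stay in this process and
the vcpu fd should only be driven from the calling thread.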


3. Extensions
-------------

As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
incompatible changes are allowed. However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.

The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available. If it is, a
set of ioctls is available for application use.


4. API description
------------------

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl. Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4), or 'none' which means that while not all kernels
      support this ioctl, there's no capability bit to check its
      availability: for kernels that don't support the ioctl,
      the ioctl returns -ENOTTY.

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Type: system, vm, or vcpu.

  Parameters: what parameters are accepted by the ioctl.

  Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.


4.1 KVM_GET_API_VERSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: the constant KVM_API_VERSION (=12)

This identifies the API version as the stable kvm API. It is not
expected that this number will change. However, Linux 2.6.20 and
2.6.21 report earlier versions; these are not documented and not
supported. Applications should refuse to run if KVM_GET_API_VERSION
returns a value other than 12. If this check passes, all ioctls
described as 'basic' will be available.


4.2 KVM_CREATE_VM

Capability: basic
Architectures: all
Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.

The new VM has no virtual cpus and no memory.
You probably want to use 0 as machine type.

In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).

To use hardware assisted virtualization on MIPS (VZ ASE) rather than
the default trap & emulate implementation (which changes the virtual
memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
flag KVM_VM_MIPS_VZ.

On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40 bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
identifier, where IPA_Bits is the maximum width of any physical
address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
machine type identifier.

e.g., to configure a guest to use 48bit physical address size:

    vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));

The requested size (IPA_Bits) must be:
  0 - Implies default size, 40 bits (for backward compatibility)

  or

  N - Implies N bits, where N is a positive integer such that,
      32 <= N <= Host_IPA_Limit

Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and
is dependent on the CPU capability and the kernel configuration. The limit can
be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION
ioctl() at run-time.

Please note that configuring the IPA size does not affect the capability
exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
the size of the address translated by the stage2 level (guest physical to
host physical address translations).
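
A small sketch of the IPA size validation described above, assuming
host_ipa_limit is the value KVM_CHECK_EXTENSION returned for
KVM_CAP_ARM_VM_IPA_SIZE (0 when the extension is absent). The helper
arm64_ipa_vm_type() and the local IPA_SIZE_MASK are hypothetical stand-ins;
real code would use KVM_VM_TYPE_ARM_IPA_SIZE() from the kernel headers.

```c
/* Local copy of the bits[7-0] encoding described above (hypothetical). */
#define IPA_SIZE_MASK 0xffu

/*
 * Hypothetical helper: check a requested IPA width against the host limit
 * and build the KVM_CREATE_VM machine type. Returns 0 and fills *type on
 * success, or -1 if the request is out of range.
 */
static int arm64_ipa_vm_type(unsigned int ipa_bits, unsigned int host_ipa_limit,
                             unsigned long *type)
{
    /* 0 always means "default size, 40 bits". */
    if (ipa_bits != 0 && (ipa_bits < 32 || ipa_bits > host_ipa_limit))
        return -1;
    *type = ipa_bits & IPA_SIZE_MASK;
    return 0;
}
```

With a host limit of 0, only the default request (0) is accepted, which
matches the backward-compatibility rule above. On success the resulting
type is passed as the KVM_CREATE_VM argument.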


4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST

Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST
Architectures: x86
Type: system ioctl
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  EFAULT: the msr index list cannot be read from or written to
  E2BIG: the msr index list is too big to fit in the array specified by
         the user.

struct kvm_msr_list {
    __u32 nmsrs; /* number of msrs in entries */
    __u32 indices[0];
};

The user fills in the size of the indices array in nmsrs, and in return
kvm adjusts nmsrs to reflect the actual number of msrs and fills in the
indices array with their numbers.

KVM_GET_MSR_INDEX_LIST returns the guest msrs that are supported. The list
varies by kvm version and host processor, but does not change otherwise.

Note: if kvm indicates support for MCE (KVM_CAP_MCE), then the MCE bank MSRs are
not returned in the MSR list, as different vcpus can have a different number
of banks, as set via the KVM_X86_SETUP_MCE ioctl.

KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed
to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities
and processor features that are exposed via MSRs (e.g., VMX capabilities).
This list also varies by kvm version and host processor, but does not change
otherwise.
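
A sketch of the size negotiation for KVM_GET_MSR_INDEX_LIST, assuming an x86
host and, as described above, that kvm writes the real count back into nmsrs
even when it fails with E2BIG. The helper get_msr_index_list() is
hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the supported guest
 * MSR index list (caller frees), or NULL on error.
 */
static struct kvm_msr_list *get_msr_index_list(int kvm_fd)
{
    struct kvm_msr_list probe = { .nmsrs = 0 };
    struct kvm_msr_list *list;

    /* First call: zero-sized array; expected to fail with E2BIG while
     * kvm stores the real number of msrs in probe.nmsrs. */
    if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) == 0)
        return calloc(1, sizeof(probe));         /* empty list */
    if (errno != E2BIG)
        return NULL;                             /* e.g. EBADF, EFAULT */

    /* Second call: an array large enough for every index. */
    list = malloc(sizeof(*list) + probe.nmsrs * sizeof(list->indices[0]));
    if (!list)
        return NULL;
    list->nmsrs = probe.nmsrs;
    if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
        free(list);
        return NULL;
    }
    return list;                                 /* list->nmsrs valid entries */
}
```

The same two-call pattern should apply to KVM_GET_MSR_FEATURE_INDEX_LIST
when KVM_CAP_GET_MSR_FEATURES is available, since it uses the same
struct kvm_msr_list.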


4.4 KVM_CHECK_EXTENSION

Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
Architectures: all
Type: system ioctl, vm ioctl
Parameters: extension identifier (KVM_CAP_*)
Returns: 0 if unsupported; 1 (or some other positive integer) if supported

The API allows the application to query about extensions to the core
kvm API. Userspace passes an extension identifier (an integer) and
receives an integer that describes the extension availability.
Generally 0 means no and 1 means yes, but some extensions may report
additional information in the integer return value.

Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd).
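
A sketch of querying a capability as recommended above: use the vm fd when
the kernel advertises KVM_CAP_CHECK_EXTENSION_VM, otherwise fall back to the
system fd. The helper kvm_check_cap() is hypothetical, and it deliberately
treats any ioctl failure as "unsupported".

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper. Pass vm_fd = -1 when no VM exists yet.
 * Returns the raw KVM_CHECK_EXTENSION result, or 0 on any error.
 */
static int kvm_check_cap(int kvm_fd, int vm_fd, long cap)
{
    int fd = kvm_fd;

    /* Prefer the per-VM query, since VMs may differ in capabilities. */
    if (vm_fd >= 0 &&
        ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_CHECK_EXTENSION_VM) > 0)
        fd = vm_fd;

    int ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);
    return ret < 0 ? 0 : ret;  /* values > 1 may carry extra information */
}
```

Because the raw return value is passed through, callers can read counts
such as the KVM_CAP_NR_VCPUS result directly from it.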


4.5 KVM_GET_VCPU_MMAP_SIZE

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: size of vcpu mmap area, in bytes

The KVM_RUN ioctl (see section 4.10) communicates with userspace via a shared
memory region. This ioctl returns the size of that region. See the
KVM_RUN documentation for details.


4.6 KVM_SET_MEMORY_REGION

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: struct kvm_memory_region (in)
Returns: 0 on success, -1 on error

This ioctl is obsolete and has been removed.


4.7 KVM_CREATE_VCPU

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: vcpu id (apic id on x86)
Returns: vcpu fd on success, -1 on error

This API adds a vcpu to a virtual machine. No more than max_vcpus may be added.
The vcpu id is an integer in the range [0, max_vcpu_id).

The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
the KVM_CHECK_EXTENSION ioctl() at run-time.
The maximum possible value for max_vcpus can be retrieved using the
KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4
cpus max.
If the KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.

The maximum possible value for max_vcpu_id can be retrieved using the
KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that max_vcpu_id
is the same as the value returned from KVM_CAP_MAX_VCPUS.
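
The fallback rules above can be collapsed into one function. This sketch
assumes the three inputs are the raw KVM_CHECK_EXTENSION results for
KVM_CAP_NR_VCPUS, KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_VCPU_ID, with 0 meaning
the capability does not exist; struct vcpu_limits and resolve_vcpu_limits()
are hypothetical.

```c
/* Hypothetical result type for the three limits described above. */
struct vcpu_limits {
    int recommended;  /* recommended max_vcpus */
    int max_vcpus;    /* maximum number of vcpus */
    int max_vcpu_id;  /* vcpu ids must lie in [0, max_vcpu_id) */
};

static struct vcpu_limits resolve_vcpu_limits(int nr_vcpus, int max_vcpus,
                                              int max_vcpu_id)
{
    struct vcpu_limits l;

    /* No KVM_CAP_NR_VCPUS: assume 4. */
    l.recommended = nr_vcpus > 0 ? nr_vcpus : 4;
    /* No KVM_CAP_MAX_VCPUS: same as the KVM_CAP_NR_VCPUS value. */
    l.max_vcpus = max_vcpus > 0 ? max_vcpus : l.recommended;
    /* No KVM_CAP_MAX_VCPU_ID: same as the KVM_CAP_MAX_VCPUS value. */
    l.max_vcpu_id = max_vcpu_id > 0 ? max_vcpu_id : l.max_vcpus;
    return l;
}
```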

On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores. (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.) The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore). The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore. The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids. For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
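
The vcore arithmetic above, as a sketch. threads_per_vcore is assumed to be
the KVM_CAP_PPC_SMT value (at least 1); both helper names are hypothetical.

```c
/* vcore id = vcpu id / vcpus per vcore */
static unsigned int ppc_vcore_id(unsigned int vcpu_id,
                                 unsigned int threads_per_vcore)
{
    return vcpu_id / threads_per_vcore;
}

/* Id for the n-th vcpu of a single-threaded guest: one vcpu per vcore,
 * so every id is a multiple of threads_per_vcore. */
static unsigned int ppc_single_thread_vcpu_id(unsigned int n,
                                              unsigned int threads_per_vcore)
{
    return n * threads_per_vcore;
}
```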

For virtual cpus that have been created with S390 user controlled virtual
machines, the resulting vcpu fd can be memory mapped at page offset
KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.


4.8 KVM_GET_DIRTY_LOG (vm ioctl)

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_dirty_log (in/out)
Returns: 0 on success, -1 on error

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
    __u32 slot;
    __u32 padding1;
    union {
        void __user *dirty_bitmap; /* one bit per page */
        __u64 padding2;
    };
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl. Bit 0 is the first page in the
memory slot. Ensure the entire structure is cleared to avoid padding
issues.

If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field
specify the address space for which you want to return the dirty bitmap.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
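
A sketch of a KVM_GET_DIRTY_LOG round trip and of decoding the result. It
assumes a 64-bit host (so the bitmap can be handled as 64-bit words) and that
the caller knows how many pages the slot contains. The helpers
dirty_log_slot(), get_dirty_log() and collect_dirty_pages() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* slot field: bits 0-15 slot id, bits 16-31 address space id
 * (the latter only with KVM_CAP_MULTI_ADDRESS_SPACE). */
static uint32_t dirty_log_slot(uint16_t as_id, uint16_t slot_id)
{
    return ((uint32_t)as_id << 16) | slot_id;
}

/* bitmap must hold one bit per page of the slot, rounded up to whole
 * 64-bit words. Returns 0 or -1. */
static int get_dirty_log(int vm_fd, uint32_t slot, uint64_t *bitmap)
{
    struct kvm_dirty_log log;

    memset(&log, 0, sizeof(log));  /* clear padding, as required above */
    log.slot = slot;
    log.dirty_bitmap = bitmap;
    return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0 ? -1 : 0;
}

/* Store up to max_out dirty page indices (bit 0 = first page of the slot)
 * into out[]; returns how many were stored. */
static size_t collect_dirty_pages(const uint64_t *bitmap, size_t npages,
                                  size_t *out, size_t max_out)
{
    size_t n = 0;

    for (size_t page = 0; page < npages && n < max_out; page++)
        if (bitmap[page / 64] & (UINT64_C(1) << (page % 64)))
            out[n++] = page;
    return n;
}
```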


4.9 KVM_SET_MEMORY_ALIAS

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_memory_alias (in)
Returns: 0 (success), -1 (error)

This ioctl is obsolete and has been removed.


4.10 KVM_RUN

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Errors:
  EINTR: an unmasked signal is pending

This ioctl is used to run a guest virtual cpu. While there are no
explicit parameters, there is an implicit parameter block that can be
obtained by mmap()ing the vcpu fd at offset 0, with the size given by
KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
kvm_run' (see below).
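
A minimal sketch of the mmap()-then-KVM_RUN sequence described above,
assuming <linux/kvm.h> and a vcpu created as in section 4.7. map_kvm_run()
and run_vcpu_once() are hypothetical helpers; a real VMM would call
run_vcpu_once() in a loop and dispatch on the returned KVM_EXIT_* reason.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Map the vcpu's struct kvm_run: offset 0 of the vcpu fd, size from
 * KVM_GET_VCPU_MMAP_SIZE on the system fd. Returns NULL on error. */
static struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd)
{
    int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (size < (int)sizeof(struct kvm_run))   /* also catches -1 */
        return NULL;

    void *p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   vcpu_fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Enter the guest once. Returns run->exit_reason, or -1 on error
 * (errno == EINTR means an unmasked signal is pending). */
static int run_vcpu_once(int vcpu_fd, const struct kvm_run *run)
{
    if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
        return -1;
    return (int)run->exit_reason;
}
```

Reporting EINTR as an ordinary failure keeps the sketch short; a caller
that installs signal handlers would check errno for EINTR, handle the
signal, and then issue KVM_RUN again.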


4.11 KVM_GET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (out)
Returns: 0 on success, -1 on error

Reads the general purpose registers from the vcpu.

/* x86 */
struct kvm_regs {
    /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
    __u64 rax, rbx, rcx, rdx;
    __u64 rsi, rdi, rsp, rbp;
    __u64 r8,  r9,  r10, r11;
    __u64 r12, r13, r14, r15;
    __u64 rip, rflags;
};

/* mips */
struct kvm_regs {
    /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
    __u64 gpr[32];
    __u64 hi;
    __u64 lo;
    __u64 pc;
};


4.12 KVM_SET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (in)
Returns: 0 on success, -1 on error

Writes the general purpose registers into the vcpu.

See KVM_GET_REGS for the data structure.
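
Because KVM_SET_REGS writes every general purpose register at once, a common
pattern is read-modify-write. A sketch for x86, using a hypothetical helper
set_vcpu_rip():

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper (x86): point the vcpu at a new instruction while
 * leaving all other registers unchanged. Returns 0 or -1. */
static int set_vcpu_rip(int vcpu_fd, __u64 rip)
{
    struct kvm_regs regs;

    if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)  /* read current state */
        return -1;
    regs.rip = rip;                               /* modify one field */
    return ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0 ? -1 : 0;
}
```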


4.13 KVM_GET_SREGS

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (out)
Returns: 0 on success, -1 on error

Reads special registers from the vcpu.

/* x86 */
struct kvm_sregs {
    struct kvm_segment cs, ds, es, fs, gs, ss;
    struct kvm_segment tr, ldt;
    struct kvm_dtable gdt, idt;
    __u64 cr0, cr2, cr3, cr4, cr8;
    __u64 efer;
    __u64 apic_base;
    __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
};

/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */

interrupt_bitmap is a bitmap of pending external interrupts. At most
one bit may be set. This interrupt has been acknowledged by the APIC
but not yet injected into the cpu core.
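
Since at most one bit of interrupt_bitmap may be set, locating the pending
vector is a simple scan. This sketch takes the bitmap as 64-bit words (for
the real structure that is (KVM_NR_INTERRUPTS + 63) / 64 words);
pending_irq_vector() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the pending vector recorded in the bitmap, or -1 if none. */
static int pending_irq_vector(const uint64_t *bitmap, size_t nwords)
{
    for (size_t w = 0; w < nwords; w++)
        for (int bit = 0; bit < 64; bit++)
            if (bitmap[w] & (UINT64_C(1) << bit))
                return (int)(w * 64 + bit);
    return -1;
}
```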


4.14 KVM_SET_SREGS

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (in)
Returns: 0 on success, -1 on error

Writes special registers into the vcpu. See KVM_GET_SREGS for the
data structures.


4.15 KVM_TRANSLATE

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_translation (in/out)
Returns: 0 on success, -1 on error

Translates a virtual address according to the vcpu's current address
translation mode.

struct kvm_translation {
    /* in */
    __u64 linear_address;

    /* out */
    __u64 physical_address;
    __u8  valid;
    __u8  writeable;
    __u8  usermode;
    __u8  pad[5];
};


4.16 KVM_INTERRUPT

Capability: basic
Architectures: x86, ppc, mips
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
Returns: 0 on success, negative on failure.

Queues a hardware interrupt vector to be injected.

/* for KVM_INTERRUPT */
struct kvm_interrupt {
    /* in */
    __u32 irq;
};

X86:

Returns: 0 on success,
         -EEXIST if an interrupt is already enqueued
         -EINVAL if the irq number is invalid
         -ENXIO if the PIC is in the kernel
         -EFAULT if the pointer is invalid

Note 'irq' is an interrupt vector, not an interrupt pin or line. This
ioctl is useful if the in-kernel PIC is not used.

PPC:

Queues an external interrupt to be injected. This ioctl is overloaded
with 3 different irq values:

a) KVM_INTERRUPT_SET

   This injects an edge type external interrupt into the guest once it's ready
   to receive interrupts. When injected, the interrupt is done.

b) KVM_INTERRUPT_UNSET

   This unsets any pending interrupt.

   Only available with KVM_CAP_PPC_UNSET_IRQ.

c) KVM_INTERRUPT_SET_LEVEL

   This injects a level type external interrupt into the guest context. The
   interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
   is triggered.

   Only available with KVM_CAP_PPC_IRQ_LEVEL.

Note that any value for 'irq' other than the ones stated above is invalid
and incurs unexpected behavior.

MIPS:

Queues an external interrupt to be injected into the virtual CPU. A negative
interrupt number dequeues the interrupt.


4.17 KVM_DEBUG_GUEST

Capability: basic
Architectures: none
Type: vcpu ioctl
Parameters: none
Returns: -1 on error

Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.


4.18 KVM_GET_MSRS

Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system)
Architectures: x86
Type: system ioctl, vcpu ioctl
Parameters: struct kvm_msrs (in/out)
Returns: number of msrs successfully returned;
         -1 on error

When used as a system ioctl:
Reads the values of MSR-based features that are available for the VM. This
is similar to KVM_GET_SUPPORTED_CPUID, but it returns MSR indices and values.
The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST
in a system ioctl.

When used as a vcpu ioctl:
Reads model-specific registers from the vcpu. Supported msr indices can
be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl.

struct kvm_msrs {
    __u32 nmsrs; /* number of msrs in entries */
    __u32 pad;

    struct kvm_msr_entry entries[0];
};

struct kvm_msr_entry {
    __u32 index;
    __u32 reserved;
    __u64 data;
};

Application code should set the 'nmsrs' member (which indicates the
size of the entries array) and the 'index' member of each array entry.
kvm will fill in the 'data' member.
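
A sketch of building the variable-length struct kvm_msrs for the vcpu form
of KVM_GET_MSRS, assuming an x86 host. read_msrs() is hypothetical; it copies
back only as many values as the ioctl reports as successfully returned,
treating those as the leading entries of the array.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: read n MSRs selected by indices[] into values[].
 * Returns the number of msrs returned, or -1 on error. */
static int read_msrs(int vcpu_fd, const __u32 *indices, __u64 *values, __u32 n)
{
    struct kvm_msrs *msrs =
        calloc(1, sizeof(*msrs) + n * sizeof(struct kvm_msr_entry));
    if (!msrs)
        return -1;

    msrs->nmsrs = n;                       /* size of the entries array */
    for (__u32 i = 0; i < n; i++)
        msrs->entries[i].index = indices[i];

    int ret = ioctl(vcpu_fd, KVM_GET_MSRS, msrs);
    for (int i = 0; i < ret; i++)          /* skipped when ret is -1 */
        values[i] = msrs->entries[i].data;

    free(msrs);
    return ret < 0 ? -1 : ret;
}
```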


4.19 KVM_SET_MSRS

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_msrs (in)
Returns: 0 on success, -1 on error

Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
data structures.

Application code should set the 'nmsrs' member (which indicates the
size of the entries array), and the 'index' and 'data' members of each
array entry.


4.20 KVM_SET_CPUID

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_cpuid (in)
Returns: 0 on success, -1 on error

Defines the vcpu responses to the cpuid instruction. Applications
should use the KVM_SET_CPUID2 ioctl if available.

struct kvm_cpuid_entry {
    __u32 function;
    __u32 eax;
    __u32 ebx;
    __u32 ecx;
    __u32 edx;
    __u32 padding;
};

/* for KVM_SET_CPUID */
struct kvm_cpuid {
    __u32 nent;
    __u32 padding;
    struct kvm_cpuid_entry entries[0];
};


4.21 KVM_SET_SIGNAL_MASK

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_signal_mask (in)
Returns: 0 on success, -1 on error

Defines which signals are blocked during execution of KVM_RUN. This
signal mask temporarily overrides the thread's signal mask. Any
unblocked signal received (except SIGKILL and SIGSTOP, which retain
their traditional behaviour) will cause KVM_RUN to return with -EINTR.

Note the signal will only be delivered if not blocked by the original
signal mask.

/* for KVM_SET_SIGNAL_MASK */
struct kvm_signal_mask {
    __u32 len;
    __u8  sigset[0];
};


4.22 KVM_GET_FPU

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (out)
Returns: 0 on success, -1 on error

Reads the floating point state from the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
    __u8  fpr[8][16];
    __u16 fcw;
    __u16 fsw;
    __u8  ftwx;  /* in fxsave format */
    __u8  pad1;
    __u16 last_opcode;
    __u64 last_ip;
    __u64 last_dp;
    __u8  xmm[16][16];
    __u32 mxcsr;
    __u32 pad2;
};


4.23 KVM_SET_FPU

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (in)
Returns: 0 on success, -1 on error

Writes the floating point state to the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
    __u8  fpr[8][16];
    __u16 fcw;
    __u16 fsw;
    __u8  ftwx;  /* in fxsave format */
    __u8  pad1;
    __u16 last_opcode;
    __u64 last_ip;
    __u64 last_dp;
    __u8  xmm[16][16];
    __u32 mxcsr;
    __u32 pad2;
};


4.24 KVM_CREATE_IRQCHIP

Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
Architectures: x86, ARM, arm64, s390
Type: vm ioctl
Parameters: none
Returns: 0 on success, -1 on error

Creates an interrupt controller model in the kernel.
On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both
PIC and IOAPIC; GSI 16-23 only go to the IOAPIC.
On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of
KVM_CREATE_DEVICE, which also supports creating a GICv2. Using
KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2.
On s390, a dummy irq routing table is created.

Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.


4.25 KVM_IRQ_LINE

Capability: KVM_CAP_IRQCHIP
Architectures: x86, arm, arm64
Type: vm ioctl
Parameters: struct kvm_irq_level
Returns: 0 on success, -1 on error

Sets the level of a GSI input to the interrupt controller model in the kernel.
On some architectures it is required that an interrupt controller model has
been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered
interrupts require the level to be set to 1 and then back to 0.

On real hardware, interrupt pins can be active-low or active-high. This
does not matter for the level field of struct kvm_irq_level: 1 always
means active (asserted), 0 means inactive (deasserted).

x86 allows the operating system to program the interrupt polarity
(active-low/active-high) for level-triggered interrupts, and KVM used
to consider the polarity. However, due to bitrot in the handling of
active-low interrupts, the above convention is now valid on x86 too.
This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace
should not present interrupts to the guest as active-low unless this
capability is present (or unless it is not using the in-kernel irqchip,
of course).

ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus. The irq field is interpreted
like this:

  bits:  | 31 ... 24 | 23  ...  16 | 15    ...    0 |
  field: | irq_type  | vcpu_index  |     irq_id     |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs.)

In both cases, level is used to assert/deassert the line.

struct kvm_irq_level {
    union {
        __u32 irq;     /* GSI */
        __s32 status;  /* not used for KVM_IRQ_LEVEL */
    };
    __u32 level;       /* 0 or 1 */
};
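
The ARM/arm64 irq encoding and the edge-triggered pulse described above, as
a sketch. The shift and type constants are local copies of the table (the
arm64 kernel headers provide KVM_ARM_IRQ_* equivalents), and arm_irq_field()
and pulse_irq_line() are hypothetical helpers.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Local copies of the field layout above (hypothetical names). */
#define IRQ_TYPE_SHIFT   24
#define VCPU_INDEX_SHIFT 16
#define IRQ_TYPE_CPU     0  /* out-of-kernel GIC: irq_id 0 = IRQ, 1 = FIQ */
#define IRQ_TYPE_SPI     1  /* in-kernel GIC, irq_id 32..1019 */
#define IRQ_TYPE_PPI     2  /* in-kernel GIC, irq_id 16..31 */

static uint32_t arm_irq_field(uint32_t type, uint32_t vcpu_index, uint32_t irq_id)
{
    return (type << IRQ_TYPE_SHIFT) |
           ((vcpu_index & 0xff) << VCPU_INDEX_SHIFT) |
           (irq_id & 0xffff);
}

/* Edge-triggered interrupt: raise the level to 1, then back to 0. */
static int pulse_irq_line(int vm_fd, uint32_t irq)
{
    struct kvm_irq_level l = { .irq = irq, .level = 1 };

    if (ioctl(vm_fd, KVM_IRQ_LINE, &l) < 0)
        return -1;
    l.level = 0;
    return ioctl(vm_fd, KVM_IRQ_LINE, &l) < 0 ? -1 : 0;
}
```

For example, arm_irq_field(IRQ_TYPE_SPI, 0, 40) yields 0x01000028, and
arm_irq_field(IRQ_TYPE_PPI, 2, 27) yields 0x0202001B.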


4.26 KVM_GET_IRQCHIP

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_irqchip (in/out)
Returns: 0 on success, -1 on error

Reads the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP into a buffer provided by the caller.

struct kvm_irqchip {
    __u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
    __u32 pad;
    union {
        char dummy[512];  /* reserving space */
        struct kvm_pic_state pic;
        struct kvm_ioapic_state ioapic;
    } chip;
};


4.27 KVM_SET_IRQCHIP

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_irqchip (in)
Returns: 0 on success, -1 on error

Sets the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP from a buffer provided by the caller.

struct kvm_irqchip {
    __u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
    __u32 pad;
    union {
        char dummy[512];  /* reserving space */
        struct kvm_pic_state pic;
        struct kvm_ioapic_state ioapic;
    } chip;
};


4.28 KVM_XEN_HVM_CONFIG

Capability: KVM_CAP_XEN_HVM
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_xen_hvm_config (in)
Returns: 0 on success, -1 on error

Sets the MSR that the Xen HVM guest uses to initialize its hypercall
page, and provides the starting address and size of the hypercall
blobs in userspace. When the guest writes the MSR, kvm copies one
page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
memory.

struct kvm_xen_hvm_config {
    __u32 flags;
    __u32 msr;
    __u64 blob_addr_32;
    __u64 blob_addr_64;
    __u8  blob_size_32;
    __u8  blob_size_64;
    __u8  pad2[30];
};


4.29 KVM_GET_CLOCK

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (out)
Returns: 0 on success, -1 on error

Gets the current timestamp of kvmclock as seen by the current guest. In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
set of bits that KVM can return in struct kvm_clock_data's flags member.

The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
value is the exact kvmclock value seen by all VCPUs at the instant
when KVM_GET_CLOCK was called. If clear, the returned value is simply
CLOCK_MONOTONIC plus a constant offset; the offset can be modified
with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
but the exact value read by each VCPU could differ, because the host
TSC is not stable.

struct kvm_clock_data {
    __u64 clock; /* kvmclock current value */
    __u32 flags;
    __u32 pad[9];
};


4.30 KVM_SET_CLOCK

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (in)
Returns: 0 on success, -1 on error

Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

struct kvm_clock_data {
    __u64 clock; /* kvmclock current value */
    __u32 flags;
    __u32 pad[9];
};
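
A sketch of the migration use described above: read kvmclock on the source
VM and program the same value into the destination VM. copy_kvmclock() is
hypothetical, and clearing flags before KVM_SET_CLOCK is an assumption made
because at least some kernels reject a nonzero flags field there.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical migration step. Returns 0 or -1. */
static int copy_kvmclock(int src_vm_fd, int dst_vm_fd)
{
    struct kvm_clock_data data;

    memset(&data, 0, sizeof(data));  /* keep pad/reserved fields zero */
    if (ioctl(src_vm_fd, KVM_GET_CLOCK, &data) < 0)
        return -1;

    /* Assumption: KVM_CLOCK_TSC_STABLE is only reported by GET, so no
     * flags are passed back to SET. */
    data.flags = 0;
    return ioctl(dst_vm_fd, KVM_SET_CLOCK, &data) < 0 ? -1 : 0;
}
```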


4.31 KVM_GET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
Returns: 0 on success, -1 on error

X86:

Gets currently pending exceptions, interrupts, and NMIs as well as related
states of the vcpu.

struct kvm_vcpu_events {
    struct {
        __u8 injected;
        __u8 nr;
        __u8 has_error_code;
        __u8 pending;
        __u32 error_code;
    } exception;
    struct {
        __u8 injected;
        __u8 nr;
        __u8 soft;
        __u8 shadow;
    } interrupt;
    struct {
        __u8 injected;
        __u8 pending;
        __u8 masked;
        __u8 pad;
    } nmi;
    __u32 sipi_vector;
    __u32 flags;
    struct {
        __u8 smm;
        __u8 pending;
        __u8 smm_inside_nmi;
        __u8 latched_init;
    } smi;
    __u8 reserved[27];
    __u8 exception_has_payload;
    __u64 exception_payload;
};

The following bits are defined in the flags field:

- KVM_VCPUEVENT_VALID_SHADOW may be set to signal that
  interrupt.shadow contains a valid state.

- KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a
  valid state.

- KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the
  exception_has_payload, exception_payload, and exception.pending
  fields contain a valid state. This bit will be set whenever
  KVM_CAP_EXCEPTION_PAYLOAD is enabled.
b0960b95 925ARM/ARM64:
b7b27fac
DG
926
927If the guest accesses a device that is being emulated by the host kernel in
928such a way that a real device would generate a physical SError, KVM may make
929a virtual SError pending for that VCPU. This system error interrupt remains
930pending until the guest takes the exception by unmasking PSTATE.A.
931
932Running the VCPU may cause it to take a pending SError, or make an access that
933causes an SError to become pending. The event's description is only valid while
934the VPCU is not running.
935
936This API provides a way to read and write the pending 'event' state that is not
937visible to the guest. To save, restore or migrate a VCPU the struct representing
938the state can be read then written using this GET/SET API, along with the other
939guest-visible registers. It is not possible to 'cancel' an SError that has been
940made pending.
941
942A device being emulated in user-space may also wish to generate an SError. To do
943this the events structure can be populated by user-space. The current state
944should be read first, to ensure no existing SError is pending. If an existing
945SError is pending, the architecture's 'Multiple SError interrupts' rules should
946be followed. (2.5.3 of DDI0587.a "ARM Reliability, Availability, and
947Serviceability (RAS) Specification").
948
be26b3a7
DG
949SError exceptions always have an ESR value. Some CPUs have the ability to
950specify what the virtual SError's ESR value should be. These systems will
688e0581 951advertise KVM_CAP_ARM_INJECT_SERROR_ESR. In this case exception.has_esr will
be26b3a7
DG
952always have a non-zero value when read, and the agent making an SError pending
953should specify the ISS field in the lower 24 bits of exception.serror_esr. If
688e0581 954the system supports KVM_CAP_ARM_INJECT_SERROR_ESR, but user-space sets the events
be26b3a7
DG
955with exception.has_esr as zero, KVM will choose an ESR.
956
957Specifying exception.has_esr on a system that does not support it will return
958-EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
959will return -EINVAL.
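As a rough illustration of the two -EINVAL rules above, userspace could validate an ESR before storing it in exception.serror_esr. This is a minimal sketch that assumes only what the text states: the ISS field occupies the lower 24 bits, and serror_has_esr should only be set when KVM_CAP_ARM_INJECT_SERROR_ESR is advertised. The helper names are hypothetical, and the sketch does not issue the ioctl itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the lower 24 bits (the ISS field) may be set in serror_esr. */
#define SERROR_ISS_MASK 0x00ffffffULL

/* Hypothetical helper: true if this serror_esr value avoids the -EINVAL case. */
static bool serror_esr_is_valid(uint64_t serror_esr)
{
	return (serror_esr & ~SERROR_ISS_MASK) == 0;
}

/*
 * Hypothetical helper: choose the value for serror_has_esr.
 * Requesting a specific ESR is only allowed when the host advertises
 * KVM_CAP_ARM_INJECT_SERROR_ESR; otherwise it must stay zero.
 */
static uint8_t serror_has_esr_value(bool cap_inject_serror_esr,
				    bool want_specific_esr)
{
	return (cap_inject_serror_esr && want_specific_esr) ? 1 : 0;
}
```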

struct kvm_vcpu_events {
	struct {
		__u8 serror_pending;
		__u8 serror_has_esr;
		/* Align it to 8 bytes */
		__u8 pad[6];
		__u64 serror_esr;
	} exception;
	__u32 reserved[12];
};

4.32 KVM_SET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (in)
Returns: 0 on success, -1 on error

X86:

Set pending exceptions, interrupts, and NMIs as well as related states of the
vcpu.

See KVM_GET_VCPU_EVENTS for the data structure.

Fields that may be modified asynchronously by running VCPUs can be excluded
from the update. These fields are nmi.pending, sipi_vector, smi.smm, and
smi.pending. Keep the corresponding bits in the flags field cleared to
suppress overwriting the current in-kernel state. The bits are:

KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
KVM_VCPUEVENT_VALID_SMM         - transfer the smi sub-struct.

If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.

KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.

If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD
can be set in the flags field to signal that the
exception_has_payload, exception_payload, and exception.pending fields
contain a valid state and shall be written into the VCPU.

ARM/ARM64:

Set the pending SError exception state for this VCPU. It is not possible to
'cancel' an SError that has been made pending.

See KVM_GET_VCPU_EVENTS for the data structure.


4.33 KVM_GET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (out)
Returns: 0 on success, -1 on error

Reads debug registers from the vcpu.

struct kvm_debugregs {
	__u64 db[4];
	__u64 dr6;
	__u64 dr7;
	__u64 flags;
	__u64 reserved[9];
};


4.34 KVM_SET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (in)
Returns: 0 on success, -1 on error

Writes debug registers into the vcpu.

See KVM_GET_DEBUGREGS for the data structure. The flags field is currently
unused and must be cleared on entry.


4.35 KVM_SET_USER_MEMORY_REGION

Capability: KVM_CAP_USER_MEM
Architectures: all
Type: vm ioctl
Parameters: struct kvm_userspace_memory_region (in)
Returns: 0 on success, -1 on error

struct kvm_userspace_memory_region {
	__u32 slot;
	__u32 flags;
	__u64 guest_phys_addr;
	__u64 memory_size; /* bytes */
	__u64 userspace_addr; /* start of the userspace allocated memory */
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)

This ioctl allows the user to create or modify a guest physical memory
slot. When changing an existing slot, it may be moved in the guest
physical memory space, or its flags may be modified. It may not be
resized. Slots may not overlap in guest physical address space.
Bits 0-15 of "slot" specify the slot id, and this value should be
less than the maximum number of user memory slots supported per VM.
The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
if this capability is supported by the architecture.

If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
specify the address space which is being modified. They must be
less than the value that KVM_CHECK_EXTENSION returns for the
KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces
are unrelated; the restriction on overlapping slots only applies within
each address space.
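The bit layout of the "slot" field described above can be sketched as a pair of pack/unpack helpers. This is a minimal illustration with hypothetical helper names. It assumes the caller has already obtained the KVM_CAP_NR_MEMSLOTS and KVM_CAP_MULTI_ADDRESS_SPACE values via KVM_CHECK_EXTENSION, treating the address-space count as 1 when the latter capability is absent.

```c
#include <assert.h>
#include <stdint.h>

/* Compose "slot": bits 0-15 slot id, bits 16-31 address space id. */
static uint32_t make_memslot(uint16_t as_id, uint16_t slot_id)
{
	return ((uint32_t)as_id << 16) | slot_id;
}

static uint16_t memslot_as_id(uint32_t slot)
{
	return (uint16_t)(slot >> 16);
}

static uint16_t memslot_id(uint32_t slot)
{
	return (uint16_t)(slot & 0xffff);
}

/*
 * nr_memslots: KVM_CHECK_EXTENSION(KVM_CAP_NR_MEMSLOTS) result.
 * nr_as: KVM_CHECK_EXTENSION(KVM_CAP_MULTI_ADDRESS_SPACE) result, or 1
 * if that capability is not available (assumption of this sketch).
 */
static int memslot_fields_ok(uint32_t slot, uint32_t nr_memslots, uint32_t nr_as)
{
	return memslot_id(slot) < nr_memslots && memslot_as_id(slot) < nr_as;
}
```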

Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for
the entire memory slot size. Any object may back this memory, including
anonymous memory, ordinary files, and hugetlbfs.

It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical. This allows large pages in the guest to be backed by large
pages in the host.
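The 21-bit recommendation amounts to "same offset within a 2 MiB page". The sketch below checks that condition and rounds a host pointer up to the next 2 MiB boundary. It assumes the common approach of over-allocating the backing memory with mmap() by 2 MiB when guest_phys_addr is itself 2 MiB aligned. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lower 21 bits: the offset within a 2 MiB large page. */
#define LARGE_PAGE_OFFSET_MASK ((UINT64_C(1) << 21) - 1)

/* True if guest and host addresses share the same offset within 2 MiB. */
static bool large_page_friendly(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
	return (guest_phys_addr & LARGE_PAGE_OFFSET_MASK) ==
	       (userspace_addr & LARGE_PAGE_OFFSET_MASK);
}

/* Round a host address up to the next 2 MiB boundary. */
static uint64_t align_up_2m(uint64_t addr)
{
	return (addr + LARGE_PAGE_OFFSET_MASK) & ~LARGE_PAGE_OFFSET_MASK;
}
```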

The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
writes to memory within the slot. See the KVM_GET_DIRTY_LOG ioctl for how to
use it. The latter can be set, if the KVM_CAP_READONLY_MEM capability allows
it, to make a new slot read-only. In this case, writes to this memory will be
posted to userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest. For example, an
mmap() that affects the region will be made visible immediately. Another
example is madvise(MADV_DROP).

It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION ioctl does not allow fine grained control over memory
allocation and is deprecated.


4.36 KVM_SET_TSS_ADDR

Capability: KVM_CAP_SET_TSS_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long tss_address (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a three-page region in the guest
physical address space. The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address. The guest may malfunction if it accesses this memory
region.
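A caller might validate a candidate TSS address against the constraints above before issuing the ioctl. This is a minimal sketch that assumes 4 KiB x86 pages. The hypothetical struct gpa_range lists the guest physical ranges of the VM's memory slots. Conflicts with mmio addresses are not checked here and would need the caller's own device map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_PAGE_SIZE   4096ULL
#define TSS_REGION_SIZE (3 * X86_PAGE_SIZE)   /* three-page region */
#define FOUR_GB         (1ULL << 32)

/* Hypothetical: the guest physical range covered by one memory slot. */
struct gpa_range {
	uint64_t start;
	uint64_t size;
};

static bool ranges_overlap(uint64_t a, uint64_t a_size, uint64_t b, uint64_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

/* True if [tss, tss + 3 pages) lies below 4GB and avoids every slot. */
static bool tss_addr_ok(uint64_t tss, const struct gpa_range *slots, size_t n)
{
	if (tss + TSS_REGION_SIZE > FOUR_GB)
		return false;
	for (size_t i = 0; i < n; i++)
		if (ranges_overlap(tss, TSS_REGION_SIZE, slots[i].start, slots[i].size))
			return false;
	return true;
}
```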

This ioctl is required on Intel-based hosts because of a quirk in the
virtualization implementation on Intel hardware (see the internals
documentation when it pops into existence).


4.37 KVM_ENABLE_CAP

Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
               mips (only KVM_CAP_ENABLE_CAP), ppc, s390
Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Not all extensions are enabled by default. Using this ioctl, the application
can enable an extension, making it available to the guest.

On systems that do not support this ioctl, it always fails. On systems that
do support it, it only works for extensions that are supported for enablement.

To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
be used.

struct kvm_enable_cap {
	/* in */
	__u32 cap;

The capability that is supposed to get enabled.

	__u32 flags;

A bitfield indicating future enhancements. Has to be 0 for now.

	__u64 args[4];

Arguments for enabling a feature. If a feature needs initial values to
function properly, this is the place to put them.

	__u8  pad[64];
};

The vcpu ioctl should be used for vcpu-specific capabilities; the vm ioctl
should be used for vm-wide capabilities.
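The sequence described above (check with KVM_CHECK_EXTENSION, then enable with a zeroed struct kvm_enable_cap) might look like the following sketch. It assumes the kernel UAPI header <linux/kvm.h> is installed. The capability number and the meaning of args[] are capability-specific and left to the caller. Setting errno to EOPNOTSUPP when the check returns 0 is this sketch's own convention, not KVM's.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Enable 'cap' on a vcpu fd (or vm fd with KVM_CAP_ENABLE_CAP_VM). */
static int enable_cap(int fd, __u32 cap, __u64 arg0)
{
	struct kvm_enable_cap ec;

	memset(&ec, 0, sizeof(ec));      /* flags must be 0; pad zeroed too */
	ec.cap = cap;
	ec.args[0] = arg0;
	return ioctl(fd, KVM_ENABLE_CAP, &ec);  /* 0, or -1 with errno set */
}

/*
 * check_fd: the /dev/kvm fd (or a vm fd) used for KVM_CHECK_EXTENSION.
 * fd: the vcpu or vm fd the capability should be enabled on.
 */
static int enable_cap_checked(int check_fd, int fd, __u32 cap, __u64 arg0)
{
	int r = ioctl(check_fd, KVM_CHECK_EXTENSION, (int)cap);

	if (r < 0)
		return -1;               /* errno set by ioctl */
	if (r == 0) {
		errno = EOPNOTSUPP;      /* capability not offered */
		return -1;
	}
	return enable_cap(fd, cap, arg0);
}
```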


4.38 KVM_GET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (out)
Returns: 0 on success; -1 on error

struct kvm_mp_state {
	__u32 mp_state;
};

Returns the vcpu's current "multiprocessing state" (though also valid on
uniprocessor guests).

Possible values are:

 - KVM_MP_STATE_RUNNABLE:        the vcpu is currently running [x86,arm/arm64]
 - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
                                 which has not yet received an INIT signal [x86]
 - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
                                 now ready for a SIPI [x86]
 - KVM_MP_STATE_HALTED:          the vcpu has executed a HLT instruction and
                                 is waiting for an interrupt [x86]
 - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
                                 accessible via KVM_GET_VCPU_EVENTS) [x86]
 - KVM_MP_STATE_STOPPED:         the vcpu is stopped [s390,arm/arm64]
 - KVM_MP_STATE_CHECK_STOP:      the vcpu is in a special error state [s390]
 - KVM_MP_STATE_OPERATING:       the vcpu is operating (running or halted)
                                 [s390]
 - KVM_MP_STATE_LOAD:            the vcpu is in a special load/startup state
                                 [s390]

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu is paused or not.


4.39 KVM_SET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (in)
Returns: 0 on success; -1 on error

Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
arguments.

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu should be paused or not.


4.40 KVM_SET_IDENTITY_MAP_ADDR

Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long identity (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a one-page region in the guest
physical address space. The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address. The guest may malfunction if it accesses this memory
region.

Setting the address to 0 will result in resetting the address to its default
(0xfffbc000).

This ioctl is required on Intel-based hosts because of a quirk in the
virtualization implementation on Intel hardware (see the internals
documentation when it pops into existence).

Fails if any VCPU has already been created.


4.41 KVM_SET_BOOT_CPU_ID

Capability: KVM_CAP_SET_BOOT_CPU_ID
Architectures: x86
Type: vm ioctl
Parameters: unsigned long vcpu_id
Returns: 0 on success, -1 on error

Define which vcpu is the Bootstrap Processor (BSP). Values are the same
as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
is vcpu 0.


4.42 KVM_GET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (out)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl copies the current vcpu's xsave struct to userspace.


4.43 KVM_SET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (in)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl copies userspace's xsave struct to the kernel.


4.44 KVM_GET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (out)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl copies the current vcpu's xcrs to userspace.


4.45 KVM_SET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (in)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl sets the vcpu's xcrs to the values specified by userspace.


4.46 KVM_GET_SUPPORTED_CPUID

Capability: KVM_CAP_EXT_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
	__u32 nent;
	__u32 padding;
	struct kvm_cpuid_entry2 entries[0];
};

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)

struct kvm_cpuid_entry2 {
	__u32 function;
	__u32 index;
	__u32 flags;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding[3];
};

This ioctl returns x86 cpuid features which are supported by both the
hardware and kvm in its default configuration. Userspace can use the
information returned by this ioctl to construct cpuid information (for
KVM_SET_CPUID2) that is consistent with hardware, kernel, and
userspace capabilities, and with user requirements (for example, the
user may wish to constrain cpuid to emulate older hardware, or for
feature consistency across a cluster).

Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
expose cpuid features (e.g. MONITOR) which are not supported by kvm in
its default configuration. If userspace enables such capabilities, it
is responsible for modifying the results of this ioctl appropriately.

Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'. If the number of entries is too low to describe the cpu
capabilities, an error (E2BIG) is returned. If the number is too high,
the 'nent' field is adjusted and an error (ENOMEM) is returned. If the
number is just right, the 'nent' field is adjusted to the number of valid
entries in the 'entries' array, which is then filled.

The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out. Some features (for example,
x2apic) may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently. The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination
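To show how the index field and KVM_CPUID_FLAG_SIGNIFCANT_INDEX interact, here is a lookup over a returned entries array. It is a sketch under stated assumptions: struct cpuid_entry and CPUID_FLAG_SIGNIFICANT_INDEX are local copies of the layout and BIT(0) flag shown above, so the x86-only headers are not needed. KVM_CPUID_FLAG_STATEFUL_FUNC entries are not treated specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the layout and flag documented above. */
#define CPUID_FLAG_SIGNIFICANT_INDEX (1u << 0)

struct cpuid_entry {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/*
 * Find the entry for (function, index). The index is only compared when
 * the entry marks it as significant; otherwise the first entry with a
 * matching function is returned. Returns NULL if nothing matches.
 */
static const struct cpuid_entry *
find_cpuid_entry(const struct cpuid_entry *e, size_t nent,
		 uint32_t function, uint32_t index)
{
	for (size_t i = 0; i < nent; i++) {
		if (e[i].function != function)
			continue;
		if ((e[i].flags & CPUID_FLAG_SIGNIFICANT_INDEX) && e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}
```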

The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
support. Instead, it is reported via:

   ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

If that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
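The rule above reduces to a small decision about ecx bit 24 of leaf 1. In this sketch the hypothetical helper takes the KVM_CHECK_EXTENSION result and the VMM's own configuration as booleans, and returns the ecx value to place in the leaf-1 entry passed to KVM_SET_CPUID2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)   /* CPUID leaf 1, ecx[24] */

/*
 * cap_tsc_deadline: KVM_CHECK_EXTENSION(KVM_CAP_TSC_DEADLINE_TIMER) > 0
 * in_kernel_irqchip: KVM_CREATE_IRQCHIP was used
 * userspace_emulates: the VMM emulates the feature itself
 */
static uint32_t leaf1_ecx_for_guest(uint32_t supported_ecx, bool cap_tsc_deadline,
				    bool in_kernel_irqchip, bool userspace_emulates)
{
	if ((cap_tsc_deadline && in_kernel_irqchip) || userspace_emulates)
		return supported_ecx | CPUID_1_ECX_TSC_DEADLINE;
	return supported_ecx & ~CPUID_1_ECX_TSC_DEADLINE;
}
```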
1424
414fa985 1425
68ba6974 14264.47 KVM_PPC_GET_PVINFO
15711e9c
AG
1427
1428Capability: KVM_CAP_PPC_GET_PVINFO
1429Architectures: ppc
1430Type: vm ioctl
1431Parameters: struct kvm_ppc_pvinfo (out)
1432Returns: 0 on success, !0 on error
1433
1434struct kvm_ppc_pvinfo {
1435 __u32 flags;
1436 __u32 hcall[4];
1437 __u8 pad[108];
1438};
1439
1440This ioctl fetches PV specific information that need to be passed to the guest
1441using the device tree or other means from vm context.
1442
9202e076 1443The hcall array defines 4 instructions that make up a hypercall.
15711e9c
AG
1444
1445If any additional field gets added to this structure later on, a bit for that
1446additional piece of information will be set in the flags bitmap.
1447
9202e076
LYB
1448The flags bitmap is defined as:
1449
1450 /* the host supports the ePAPR idle hcall
1451 #define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)


4.52 KVM_SET_GSI_ROUTING

Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86 s390 arm arm64
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error

Sets the GSI routing table entries, overwriting any previously set entries.

On arm/arm64, GSI routing has the following limitation:
- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.

struct kvm_irq_routing {
	__u32 nr;
	__u32 flags;
	struct kvm_irq_routing_entry entries[0];
};

No flags are specified so far; the corresponding field must be set to zero.

struct kvm_irq_routing_entry {
	__u32 gsi;
	__u32 type;
	__u32 flags;
	__u32 pad;
	union {
		struct kvm_irq_routing_irqchip irqchip;
		struct kvm_irq_routing_msi msi;
		struct kvm_irq_routing_s390_adapter adapter;
		struct kvm_irq_routing_hv_sint hv_sint;
		__u32 pad[8];
	} u;
};

/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
#define KVM_IRQ_ROUTING_HV_SINT 4

flags:
- KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
  type, specifies that the devid field contains a valid value. The per-VM
  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
  the device ID. If this capability is not available, userspace should
  never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
- zero otherwise

struct kvm_irq_routing_irqchip {
	__u32 irqchip;
	__u32 pin;
};

struct kvm_irq_routing_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	union {
		__u32 pad;
		__u32 devid;
	};
};

If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
for the device that wrote the MSI message. For PCI, this is usually a
BDF (bus/device/function) identifier in the lower 16 bits.
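For a PCI device, one way to produce such a devid is to pack bus, device and function with the conventional PCI bit layout: bus in bits 15-8, device in bits 7-3, and function in bits 2-0. The helper below is a hypothetical sketch of that packing. It ignores the PCI segment number, and the result would go into msi.devid only when KVM_MSI_VALID_DEVID is set in the routing entry's flags.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional 16-bit BDF: bus[15:8], device[7:3], function[2:0]. */
static uint32_t pci_bdf_devid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return ((uint32_t)bus << 8) | ((uint32_t)(dev & 0x1f) << 3) | (fn & 0x7u);
}
```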

On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
address_hi must be zero.
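Assuming KVM_X2APIC_API_USE_32BIT_IDS has been enabled, a 32-bit destination id can be split between the two address words as sketched below. The address_lo layout (0xFEE in bits 31-20, destination id bits 7-0 in bits 19-12, remaining bits zero for physical destination mode) comes from the x86 architecture manual's MSI address format, not from this text. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fill address_lo/address_hi of struct kvm_irq_routing_msi for dest_id. */
static void msi_encode_dest(uint32_t dest_id, uint32_t *address_lo,
			    uint32_t *address_hi)
{
	/* x86 MSI address: 0xFEE in bits 31-20, dest id bits 7-0 in 19-12. */
	*address_lo = 0xfee00000u | ((dest_id & 0xffu) << 12);
	/* Dest id bits 31-8 go to address_hi bits 31-8; bits 7-0 stay zero. */
	*address_hi = dest_id & 0xffffff00u;
}
```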

struct kvm_irq_routing_s390_adapter {
	__u64 ind_addr;
	__u64 summary_addr;
	__u64 ind_offset;
	__u32 summary_offset;
	__u32 adapter_id;
};

struct kvm_irq_routing_hv_sint {
	__u32 vcpu;
	__u32 sint;
};


4.55 KVM_SET_TSC_KHZ

Capability: KVM_CAP_TSC_CONTROL
Architectures: x86
Type: vcpu ioctl
Parameters: virtual tsc_khz
Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is kHz.


4.56 KVM_GET_TSC_KHZ

Capability: KVM_CAP_GET_TSC_KHZ
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: virtual tsc-khz on success, negative value on error

Returns the tsc frequency of the guest. The unit of the return value is
kHz. If the host has an unstable tsc, this ioctl returns -EIO instead as an
error.


4.57 KVM_GET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Reads the Local APIC registers and copies them into the input argument. The
data format and layout are the same as documented in the architecture manual.

If the KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
enabled, then the format of the APIC_ID register depends on the APIC mode
(reported by MSR_IA32_APICBASE) of its VCPU. x2APIC stores the APIC ID in
the APIC_ID register (bytes 32-35). xAPIC only allows an 8-bit APIC ID,
which is stored in bits 31-24 of the APIC_ID register, or equivalently in
byte 35 of struct kvm_lapic_state's regs field. KVM_GET_LAPIC must then
be called after MSR_IA32_APICBASE has been set with KVM_SET_MSRS.

If the KVM_X2APIC_API_USE_32BIT_IDS feature is disabled, struct kvm_lapic_state
always uses xAPIC format.
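The APIC ID rules above can be turned into a small reader over the regs buffer. This is a sketch assuming an x86 (little-endian) register layout. The x2apic_mode argument should reflect the vCPU's MSR_IA32_APICBASE and should be passed as false whenever KVM_X2APIC_API_USE_32BIT_IDS is disabled. APIC_REG_SIZE is a local copy of KVM_APIC_REG_SIZE, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_REG_SIZE  0x400   /* local copy of KVM_APIC_REG_SIZE */
#define APIC_ID_OFFSET 0x20    /* APIC_ID register: bytes 32-35 of regs */

/* Return the APIC ID stored in a struct kvm_lapic_state regs buffer. */
static uint32_t lapic_apic_id(const char regs[APIC_REG_SIZE], bool x2apic_mode)
{
	const unsigned char *r = (const unsigned char *)regs + APIC_ID_OFFSET;
	uint32_t reg = (uint32_t)r[0] | ((uint32_t)r[1] << 8) |
		       ((uint32_t)r[2] << 16) | ((uint32_t)r[3] << 24);

	/* x2APIC: full 32-bit id; xAPIC: bits 31-24, i.e. byte 35. */
	return x2apic_mode ? reg : reg >> 24;
}
```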


4.58 KVM_SET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Copies the input argument into the Local APIC registers. The data format
and layout are the same as documented in the architecture manual.

The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
See the note in KVM_GET_LAPIC.


4.59 KVM_IOEVENTFD

Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error

This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest. A guest write to the registered address will signal the
provided event instead of triggering an exit.

struct kvm_ioeventfd {
	__u64 datamatch;
	__u64 addr;        /* legal pio/mmio address */
	__u32 len;         /* 0, 1, 2, 4, or 8 bytes */
	__s32 fd;
	__u32 flags;
	__u8  pad[36];
};

For the special case of virtio-ccw devices on s390, the ioevent is matched
to a subchannel/virtqueue tuple instead.

The following flags are defined:

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)

If the datamatch flag is set, the event will be signaled only if the value
written to the registered address is equal to datamatch in struct
kvm_ioeventfd.

For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.

With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
the kernel will ignore the length of the guest write and may get a faster
vmexit. The speedup may only apply to specific architectures, but the
ioeventfd will work anyway.
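A typical MMIO registration with datamatch might look like the following sketch. It assumes the kernel UAPI header <linux/kvm.h> and glibc's eventfd(2) wrapper. The helper name is hypothetical, and it returns the new eventfd so the caller can poll or read it. Detaching the ioeventfd later would, per the flags above, repeat the ioctl with the same fields plus KVM_IOEVENTFD_FLAG_DEASSIGN.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Register an eventfd that is signalled when the guest writes 'value'
 * ('len' bytes) to MMIO address 'addr'. Returns the eventfd, or -1 with
 * errno set if either eventfd() or the KVM_IOEVENTFD ioctl fails.
 */
static int add_mmio_ioeventfd(int vm_fd, __u64 addr, __u32 len, __u64 value)
{
	struct kvm_ioeventfd ioev;
	int efd = eventfd(0, EFD_CLOEXEC);

	if (efd < 0)
		return -1;

	memset(&ioev, 0, sizeof(ioev));          /* pad must be zero */
	ioev.addr = addr;
	ioev.len = len;
	ioev.datamatch = value;
	ioev.fd = efd;
	ioev.flags = KVM_IOEVENTFD_FLAG_DATAMATCH;   /* MMIO: no FLAG_PIO */

	if (ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0) {
		int saved = errno;
		close(efd);
		errno = saved;
		return -1;
	}
	return efd;
}
```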


4.60 KVM_DIRTY_TLB

Capability: KVM_CAP_SW_TLB
Architectures: ppc
Type: vcpu ioctl
Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_dirty_tlb {
	__u64 bitmap;
	__u32 num_dirty;
};

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array. This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.

The array is little-endian: bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything. It must
be set to the number of set bits in the bitmap.
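The bitmap rules above (size rounded up to 64 bits, little-endian bit order, num_dirty equal to the popcount) are sketched below with hypothetical helper names. The caller would then store the array's address in the bitmap field, for example as (__u64)(uintptr_t)bitmap, before issuing KVM_DIRTY_TLB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for 'nentries' bits, rounded up to a multiple of 64 bits. */
static size_t dirty_tlb_bitmap_bytes(size_t nentries)
{
	return ((nentries + 63) / 64) * 8;
}

/* Little-endian bit order: entry n is bit (n % 8) of byte (n / 8). */
static void dirty_tlb_mark(uint8_t *bitmap, size_t entry)
{
	bitmap[entry / 8] |= (uint8_t)(1u << (entry % 8));
}

/* Value for num_dirty: the number of set bits in the bitmap. */
static uint32_t dirty_tlb_count(const uint8_t *bitmap, size_t bytes)
{
	uint32_t n = 0;

	for (size_t i = 0; i < bytes; i++)
		for (uint8_t b = bitmap[i]; b; b &= (uint8_t)(b - 1))
			n++;   /* each iteration clears the lowest set bit */
	return n;
}
```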


4.62 KVM_CREATE_SPAPR_TCE

Capability: KVM_CAP_SPAPR_TCE
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce (in)
Returns: file descriptor for manipulating the created TCE table

This creates a virtual TCE (translation control entry) table, which
is an IOMMU for PAPR-style virtual I/O. It is used to translate
logical addresses used in virtual I/O into guest physical addresses,
and provides a scatter/gather capability for PAPR virtual I/O.

/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
	__u64 liobn;
	__u32 window_size;
};

The liobn field gives the logical IO bus number for which to create a
TCE table. The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64
bit TCE entry for every 4kiB of the DMA window.
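The sizing rule above (one 64-bit TCE per 4 KiB of window) gives the table size directly. This sketch assumes window_size is a multiple of 4 KiB and that the window starts at I/O bus address 0 when computing an entry index. Those are assumptions of the illustration, and the helper names are hypothetical. Such figures might help decide how much of the returned fd to mmap(2), after rounding up to the host page size.

```c
#include <assert.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12   /* one TCE per 4 KiB of DMA window */

/* Number of TCEs in a table covering 'window_size' bytes. */
static uint64_t tce_entries(uint32_t window_size)
{
	return (uint64_t)window_size >> TCE_PAGE_SHIFT;
}

/* Table size in bytes: each TCE is a 64-bit entry. */
static uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_entries(window_size) * sizeof(uint64_t);
}

/* Index of the TCE translating I/O bus address 'ioba' (window at 0). */
static uint64_t tce_index(uint64_t ioba)
{
	return ioba >> TCE_PAGE_SHIFT;
}
```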

When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table. H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.

The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace. This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.


4.63 KVM_ALLOCATE_RMA

Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA

This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel. An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
	__u64 rma_size;
};

The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace. The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine. The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.

The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.


4.64 KVM_NMI

Capability: KVM_CAP_USER_NMI
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an NMI on the thread's vcpu. Note this is well defined only
when KVM_CREATE_IRQCHIP has not been called, since this is an interface
between the virtual cpu core and virtual local APIC. After KVM_CREATE_IRQCHIP
has been called, this interface is completely emulated within the kernel.

To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:

  - pause the vcpu
  - read the local APIC's state (KVM_GET_LAPIC)
  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
  - if so, issue KVM_NMI
  - resume the vcpu
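The third step of the algorithm above can be sketched as a check on the LVT LINT1 entry in the KVM_GET_LAPIC regs buffer. The register offset (0x360), the mask bit (16) and the NMI delivery mode (100b in bits 10-8) are taken from the x86 architecture manual's local APIC description, not from this text. The sketch assumes a little-endian register layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_LVT_LINT1  0x360        /* offset of the LVT LINT1 register */
#define APIC_LVT_MASKED (1u << 16)   /* mask bit */
#define APIC_DM_MASK    (7u << 8)    /* delivery mode, bits 10-8 */
#define APIC_DM_NMI     (4u << 8)    /* 100b = NMI */

/* Read a 32-bit little-endian register from a kvm_lapic_state regs buffer. */
static uint32_t apic_reg(const char *regs, unsigned off)
{
	const unsigned char *r = (const unsigned char *)regs + off;

	return (uint32_t)r[0] | ((uint32_t)r[1] << 8) |
	       ((uint32_t)r[2] << 16) | ((uint32_t)r[3] << 24);
}

/* True if LINT1 is unmasked and set to NMI delivery: then issue KVM_NMI. */
static bool lint1_queues_nmi(const char *regs)
{
	uint32_t lvt = apic_reg(regs, APIC_LVT_LINT1);

	return !(lvt & APIC_LVT_MASKED) && (lvt & APIC_DM_MASK) == APIC_DM_NMI;
}
```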

Some guests configure the LINT1 NMI input to cause a panic, aiding in
debugging.


4.65 KVM_S390_UCAS_MAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
	struct kvm_s390_ucas_mapping {
		__u64 user_addr;
		__u64 vcpu_addr;
		__u64 length;
	};

This ioctl maps the memory at "user_addr" with the length "length" to
the vcpu's address space starting at "vcpu_addr". All parameters need to
be aligned to 1 megabyte.
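
Userspace may want to check the alignment rule before issuing the ioctl.
The sketch below mirrors struct kvm_s390_ucas_mapping with fixed-width
types; the helper name is hypothetical, and the check is a convenience
rather than something KVM requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct kvm_s390_ucas_mapping as documented above. */
struct ucas_mapping {
	uint64_t user_addr;
	uint64_t vcpu_addr;
	uint64_t length;
};

#define UCAS_ALIGN (1ULL << 20) /* 1 megabyte */

/* Hypothetical helper: true if every field is a multiple of 1 megabyte. */
static bool ucas_mapping_is_aligned(const struct ucas_mapping *m)
{
	return ((m->user_addr | m->vcpu_addr | m->length) & (UCAS_ALIGN - 1)) == 0;
}
```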


4.66 KVM_S390_UCAS_UNMAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
	struct kvm_s390_ucas_mapping {
		__u64 user_addr;
		__u64 vcpu_addr;
		__u64 length;
	};

This ioctl unmaps the memory in the vcpu's address space starting at
"vcpu_addr" with the length "length". The field "user_addr" is ignored.
All parameters need to be aligned to 1 megabyte.


4.67 KVM_S390_VCPU_FAULT

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: vcpu absolute address (in)
Returns: 0 in case of success

This call creates a page table entry in the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
so it is recommended to access the subject memory page via the user page
table beforehand. This is useful to handle validity intercepts for user
controlled virtual machines, to fault in the virtual cpu's lowcore pages
prior to calling the KVM_RUN ioctl.


4.68 KVM_SET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure

struct kvm_one_reg {
	__u64 id;
	__u64 addr;
};

Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable with the respective size. There can be architecture agnostic
and architecture specific registers. Each has its own range of operation
and its own constants and width. To keep track of the implemented
registers, find a list below:
1862
bf5590f3
JH
1863 Arch | Register | Width (bits)
1864 | |
1865 PPC | KVM_REG_PPC_HIOR | 64
1866 PPC | KVM_REG_PPC_IAC1 | 64
1867 PPC | KVM_REG_PPC_IAC2 | 64
1868 PPC | KVM_REG_PPC_IAC3 | 64
1869 PPC | KVM_REG_PPC_IAC4 | 64
1870 PPC | KVM_REG_PPC_DAC1 | 64
1871 PPC | KVM_REG_PPC_DAC2 | 64
1872 PPC | KVM_REG_PPC_DABR | 64
1873 PPC | KVM_REG_PPC_DSCR | 64
1874 PPC | KVM_REG_PPC_PURR | 64
1875 PPC | KVM_REG_PPC_SPURR | 64
1876 PPC | KVM_REG_PPC_DAR | 64
1877 PPC | KVM_REG_PPC_DSISR | 32
1878 PPC | KVM_REG_PPC_AMR | 64
1879 PPC | KVM_REG_PPC_UAMOR | 64
1880 PPC | KVM_REG_PPC_MMCR0 | 64
1881 PPC | KVM_REG_PPC_MMCR1 | 64
1882 PPC | KVM_REG_PPC_MMCRA | 64
1883 PPC | KVM_REG_PPC_MMCR2 | 64
1884 PPC | KVM_REG_PPC_MMCRS | 64
1885 PPC | KVM_REG_PPC_SIAR | 64
1886 PPC | KVM_REG_PPC_SDAR | 64
1887 PPC | KVM_REG_PPC_SIER | 64
1888 PPC | KVM_REG_PPC_PMC1 | 32
1889 PPC | KVM_REG_PPC_PMC2 | 32
1890 PPC | KVM_REG_PPC_PMC3 | 32
1891 PPC | KVM_REG_PPC_PMC4 | 32
1892 PPC | KVM_REG_PPC_PMC5 | 32
1893 PPC | KVM_REG_PPC_PMC6 | 32
1894 PPC | KVM_REG_PPC_PMC7 | 32
1895 PPC | KVM_REG_PPC_PMC8 | 32
1896 PPC | KVM_REG_PPC_FPR0 | 64
a8bd19ef 1897 ...
bf5590f3
JH
1898 PPC | KVM_REG_PPC_FPR31 | 64
1899 PPC | KVM_REG_PPC_VR0 | 128
a8bd19ef 1900 ...
bf5590f3
JH
1901 PPC | KVM_REG_PPC_VR31 | 128
1902 PPC | KVM_REG_PPC_VSR0 | 128
a8bd19ef 1903 ...
bf5590f3
JH
1904 PPC | KVM_REG_PPC_VSR31 | 128
1905 PPC | KVM_REG_PPC_FPSCR | 64
1906 PPC | KVM_REG_PPC_VSCR | 32
1907 PPC | KVM_REG_PPC_VPA_ADDR | 64
1908 PPC | KVM_REG_PPC_VPA_SLB | 128
1909 PPC | KVM_REG_PPC_VPA_DTL | 128
1910 PPC | KVM_REG_PPC_EPCR | 32
1911 PPC | KVM_REG_PPC_EPR | 32
1912 PPC | KVM_REG_PPC_TCR | 32
1913 PPC | KVM_REG_PPC_TSR | 32
1914 PPC | KVM_REG_PPC_OR_TSR | 32
1915 PPC | KVM_REG_PPC_CLEAR_TSR | 32
1916 PPC | KVM_REG_PPC_MAS0 | 32
1917 PPC | KVM_REG_PPC_MAS1 | 32
1918 PPC | KVM_REG_PPC_MAS2 | 64
1919 PPC | KVM_REG_PPC_MAS7_3 | 64
1920 PPC | KVM_REG_PPC_MAS4 | 32
1921 PPC | KVM_REG_PPC_MAS6 | 32
1922 PPC | KVM_REG_PPC_MMUCFG | 32
1923 PPC | KVM_REG_PPC_TLB0CFG | 32
1924 PPC | KVM_REG_PPC_TLB1CFG | 32
1925 PPC | KVM_REG_PPC_TLB2CFG | 32
1926 PPC | KVM_REG_PPC_TLB3CFG | 32
1927 PPC | KVM_REG_PPC_TLB0PS | 32
1928 PPC | KVM_REG_PPC_TLB1PS | 32
1929 PPC | KVM_REG_PPC_TLB2PS | 32
1930 PPC | KVM_REG_PPC_TLB3PS | 32
1931 PPC | KVM_REG_PPC_EPTCFG | 32
1932 PPC | KVM_REG_PPC_ICP_STATE | 64
1933 PPC | KVM_REG_PPC_TB_OFFSET | 64
1934 PPC | KVM_REG_PPC_SPMC1 | 32
1935 PPC | KVM_REG_PPC_SPMC2 | 32
1936 PPC | KVM_REG_PPC_IAMR | 64
1937 PPC | KVM_REG_PPC_TFHAR | 64
1938 PPC | KVM_REG_PPC_TFIAR | 64
1939 PPC | KVM_REG_PPC_TEXASR | 64
1940 PPC | KVM_REG_PPC_FSCR | 64
1941 PPC | KVM_REG_PPC_PSPB | 32
1942 PPC | KVM_REG_PPC_EBBHR | 64
1943 PPC | KVM_REG_PPC_EBBRR | 64
1944 PPC | KVM_REG_PPC_BESCR | 64
1945 PPC | KVM_REG_PPC_TAR | 64
1946 PPC | KVM_REG_PPC_DPDES | 64
1947 PPC | KVM_REG_PPC_DAWR | 64
1948 PPC | KVM_REG_PPC_DAWRX | 64
1949 PPC | KVM_REG_PPC_CIABR | 64
1950 PPC | KVM_REG_PPC_IC | 64
1951 PPC | KVM_REG_PPC_VTB | 64
1952 PPC | KVM_REG_PPC_CSIGR | 64
1953 PPC | KVM_REG_PPC_TACR | 64
1954 PPC | KVM_REG_PPC_TCSCR | 64
1955 PPC | KVM_REG_PPC_PID | 64
1956 PPC | KVM_REG_PPC_ACOP | 64
1957 PPC | KVM_REG_PPC_VRSAVE | 32
cc568ead
PB
1958 PPC | KVM_REG_PPC_LPCR | 32
1959 PPC | KVM_REG_PPC_LPCR_64 | 64
bf5590f3
JH
1960 PPC | KVM_REG_PPC_PPR | 64
1961 PPC | KVM_REG_PPC_ARCH_COMPAT | 32
1962 PPC | KVM_REG_PPC_DABRX | 32
1963 PPC | KVM_REG_PPC_WORT | 64
bc8a4e5c
BB
1964 PPC | KVM_REG_PPC_SPRG9 | 64
1965 PPC | KVM_REG_PPC_DBSR | 32
e9cf1e08
PM
1966 PPC | KVM_REG_PPC_TIDR | 64
1967 PPC | KVM_REG_PPC_PSSCR | 64
5855564c 1968 PPC | KVM_REG_PPC_DEC_EXPIRY | 64
30323418 1969 PPC | KVM_REG_PPC_PTCR | 64
bf5590f3 1970 PPC | KVM_REG_PPC_TM_GPR0 | 64
3b783474 1971 ...
bf5590f3
JH
1972 PPC | KVM_REG_PPC_TM_GPR31 | 64
1973 PPC | KVM_REG_PPC_TM_VSR0 | 128
3b783474 1974 ...
bf5590f3
JH
1975 PPC | KVM_REG_PPC_TM_VSR63 | 128
1976 PPC | KVM_REG_PPC_TM_CR | 64
1977 PPC | KVM_REG_PPC_TM_LR | 64
1978 PPC | KVM_REG_PPC_TM_CTR | 64
1979 PPC | KVM_REG_PPC_TM_FPSCR | 64
1980 PPC | KVM_REG_PPC_TM_AMR | 64
1981 PPC | KVM_REG_PPC_TM_PPR | 64
1982 PPC | KVM_REG_PPC_TM_VRSAVE | 64
1983 PPC | KVM_REG_PPC_TM_VSCR | 32
1984 PPC | KVM_REG_PPC_TM_DSCR | 64
1985 PPC | KVM_REG_PPC_TM_TAR | 64
0d808df0 1986 PPC | KVM_REG_PPC_TM_XER | 64
c2d2c21b
JH
1987 | |
1988 MIPS | KVM_REG_MIPS_R0 | 64
1989 ...
1990 MIPS | KVM_REG_MIPS_R31 | 64
1991 MIPS | KVM_REG_MIPS_HI | 64
1992 MIPS | KVM_REG_MIPS_LO | 64
1993 MIPS | KVM_REG_MIPS_PC | 64
1994 MIPS | KVM_REG_MIPS_CP0_INDEX | 32
013044cc
JH
1995 MIPS | KVM_REG_MIPS_CP0_ENTRYLO0 | 64
1996 MIPS | KVM_REG_MIPS_CP0_ENTRYLO1 | 64
c2d2c21b 1997 MIPS | KVM_REG_MIPS_CP0_CONTEXT | 64
dffe042f 1998 MIPS | KVM_REG_MIPS_CP0_CONTEXTCONFIG| 32
c2d2c21b 1999 MIPS | KVM_REG_MIPS_CP0_USERLOCAL | 64
dffe042f 2000 MIPS | KVM_REG_MIPS_CP0_XCONTEXTCONFIG| 64
c2d2c21b 2001 MIPS | KVM_REG_MIPS_CP0_PAGEMASK | 32
c992a4f6 2002 MIPS | KVM_REG_MIPS_CP0_PAGEGRAIN | 32
4b7de028
JH
2003 MIPS | KVM_REG_MIPS_CP0_SEGCTL0 | 64
2004 MIPS | KVM_REG_MIPS_CP0_SEGCTL1 | 64
2005 MIPS | KVM_REG_MIPS_CP0_SEGCTL2 | 64
5a2f352f
JH
2006 MIPS | KVM_REG_MIPS_CP0_PWBASE | 64
2007 MIPS | KVM_REG_MIPS_CP0_PWFIELD | 64
2008 MIPS | KVM_REG_MIPS_CP0_PWSIZE | 64
c2d2c21b 2009 MIPS | KVM_REG_MIPS_CP0_WIRED | 32
5a2f352f 2010 MIPS | KVM_REG_MIPS_CP0_PWCTL | 32
c2d2c21b
JH
2011 MIPS | KVM_REG_MIPS_CP0_HWRENA | 32
2012 MIPS | KVM_REG_MIPS_CP0_BADVADDR | 64
edc89260
JH
2013 MIPS | KVM_REG_MIPS_CP0_BADINSTR | 32
2014 MIPS | KVM_REG_MIPS_CP0_BADINSTRP | 32
c2d2c21b
JH
2015 MIPS | KVM_REG_MIPS_CP0_COUNT | 32
2016 MIPS | KVM_REG_MIPS_CP0_ENTRYHI | 64
2017 MIPS | KVM_REG_MIPS_CP0_COMPARE | 32
2018 MIPS | KVM_REG_MIPS_CP0_STATUS | 32
ad58d4d4 2019 MIPS | KVM_REG_MIPS_CP0_INTCTL | 32
c2d2c21b
JH
2020 MIPS | KVM_REG_MIPS_CP0_CAUSE | 32
2021 MIPS | KVM_REG_MIPS_CP0_EPC | 64
1068eaaf 2022 MIPS | KVM_REG_MIPS_CP0_PRID | 32
7801bbe1 2023 MIPS | KVM_REG_MIPS_CP0_EBASE | 64
c2d2c21b
JH
2024 MIPS | KVM_REG_MIPS_CP0_CONFIG | 32
2025 MIPS | KVM_REG_MIPS_CP0_CONFIG1 | 32
2026 MIPS | KVM_REG_MIPS_CP0_CONFIG2 | 32
2027 MIPS | KVM_REG_MIPS_CP0_CONFIG3 | 32
c771607a
JH
2028 MIPS | KVM_REG_MIPS_CP0_CONFIG4 | 32
2029 MIPS | KVM_REG_MIPS_CP0_CONFIG5 | 32
c2d2c21b 2030 MIPS | KVM_REG_MIPS_CP0_CONFIG7 | 32
c992a4f6 2031 MIPS | KVM_REG_MIPS_CP0_XCONTEXT | 64
c2d2c21b 2032 MIPS | KVM_REG_MIPS_CP0_ERROREPC | 64
05108709
JH
2033 MIPS | KVM_REG_MIPS_CP0_KSCRATCH1 | 64
2034 MIPS | KVM_REG_MIPS_CP0_KSCRATCH2 | 64
2035 MIPS | KVM_REG_MIPS_CP0_KSCRATCH3 | 64
2036 MIPS | KVM_REG_MIPS_CP0_KSCRATCH4 | 64
2037 MIPS | KVM_REG_MIPS_CP0_KSCRATCH5 | 64
2038 MIPS | KVM_REG_MIPS_CP0_KSCRATCH6 | 64
d42a008f 2039 MIPS | KVM_REG_MIPS_CP0_MAAR(0..63) | 64
c2d2c21b
JH
2040 MIPS | KVM_REG_MIPS_COUNT_CTL | 64
2041 MIPS | KVM_REG_MIPS_COUNT_RESUME | 64
2042 MIPS | KVM_REG_MIPS_COUNT_HZ | 64
379245cd
JH
2043 MIPS | KVM_REG_MIPS_FPR_32(0..31) | 32
2044 MIPS | KVM_REG_MIPS_FPR_64(0..31) | 64
ab86bd60 2045 MIPS | KVM_REG_MIPS_VEC_128(0..31) | 128
379245cd
JH
2046 MIPS | KVM_REG_MIPS_FCR_IR | 32
2047 MIPS | KVM_REG_MIPS_FCR_CSR | 32
ab86bd60
JH
2048 MIPS | KVM_REG_MIPS_MSA_IR | 32
2049 MIPS | KVM_REG_MIPS_MSA_CSR | 32
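
The width column above is also encoded in each id, together with the
architecture. The sketch below decodes both. The mask, shift and
architecture values are local copies of the KVM_REG_* definitions in
linux/kvm.h (redefined here so the fragment stands alone), and the helper
names are hypothetical. The size it returns is the size of the variable
that "addr" must point to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the KVM_REG_* id layout from linux/kvm.h. */
#define REG_ARCH_MASK  0xff00000000000000ULL
#define REG_ARCH_PPC   0x1000000000000000ULL
#define REG_ARCH_ARM   0x4000000000000000ULL
#define REG_ARCH_ARM64 0x6000000000000000ULL
#define REG_ARCH_MIPS  0x7000000000000000ULL
#define REG_SIZE_SHIFT 52
#define REG_SIZE_MASK  0x00f0000000000000ULL

/* Size in bytes of the buffer that "addr" must point to for this id. */
static size_t one_reg_size(uint64_t id)
{
	return (size_t)1 << ((id & REG_SIZE_MASK) >> REG_SIZE_SHIFT);
}

/* Architecture field of the id (one of the REG_ARCH_* values above). */
static uint64_t one_reg_arch(uint64_t id)
{
	return id & REG_ARCH_MASK;
}
```

For example, an id starting with 0x6030 (arm64, 64-bit) decodes to an
8-byte access, and one starting with 0x7040 (MIPS, 128-bit) to a 16-byte
access.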

ARM registers are mapped using the lower 32 bits. The upper 16 bits of
that field are the register group type, or coprocessor number:

ARM core registers have the following id bit patterns:
  0x4020 0000 0010 <index into the kvm_regs struct:16>

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

ARM CCSIDR registers are demultiplexed by CSSELR value:
  0x4020 0000 0011 00 <csselr:8>

ARM 32-bit VFP control registers have the following id bit patterns:
  0x4020 0000 0012 1 <regno:12>

ARM 64-bit FP registers have the following id bit patterns:
  0x4030 0000 0012 0 <regno:12>

ARM firmware pseudo-registers have the following bit pattern:
  0x4030 0000 0014 <regno:16>
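
As an illustration of the 32-bit CP15 pattern, the helper below packs the
MRC/MCR operands into an id. The shifts follow directly from the field
widths listed above; the macro and function names are hypothetical rather
than taken from the kernel headers. For example, SCTLR (c1, c0, opc1 0,
opc2 0) would encode as 0x40200000000F0800.

```c
#include <assert.h>
#include <stdint.h>

/* 0x4020 0000 000F ....: ARM arch, 32-bit access size, coprocessor 15. */
#define ARM_CP15_32_BASE 0x40200000000F0000ULL

/*
 * Hypothetical helper: build the ONE_REG id of a 32-bit CP15 register.
 * Low 16 bits, from bit 15 down: <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>.
 */
static uint64_t arm_cp15_32_id(unsigned int crn, unsigned int crm,
			       unsigned int opc1, unsigned int opc2)
{
	assert(crn < 16 && crm < 16 && opc1 < 16 && opc2 < 8);
	return ARM_CP15_32_BASE |
	       ((uint64_t)crn << 11) |
	       ((uint64_t)crm << 7) |
	       ((uint64_t)opc1 << 3) |
	       (uint64_t)opc2;
}
```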


arm64 registers are mapped using the lower 32 bits. The upper 16 bits of
that field are the register group type, or coprocessor number:

arm64 core/FP-SIMD registers have the following id bit patterns. Note
that the size of the access is variable, as the kvm_regs structure
contains elements ranging from 32 to 128 bits. The index is the offset
of the register within the kvm_regs structure counted in 32-bit words,
i.e. with the structure seen as an array of 32-bit values.
  0x60x0 0000 0010 <index into the kvm_regs struct:16>

arm64 CCSIDR registers are demultiplexed by CSSELR value:
  0x6020 0000 0011 00 <csselr:8>

arm64 system registers have the following id bit patterns:
  0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>

arm64 firmware pseudo-registers have the following bit pattern:
  0x6030 0000 0014 <regno:16>
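
Similarly, an arm64 system register id packs the (op0, op1, CRn, CRm,
op2) tuple used by the MRS/MSR instructions. The sketch below is
illustrative and its names are hypothetical; its output is intended to
match the ARM64_SYS_REG() macro from the arm64 uapi kvm.h header. For
example, MPIDR_EL1 (3, 0, 0, 0, 5) would encode as 0x603000000013C005.

```c
#include <assert.h>
#include <stdint.h>

/* 0x6030 0000 0013 ....: arm64 arch, 64-bit access size, sysreg group. */
#define ARM64_SYSREG_BASE 0x6030000000130000ULL

/*
 * Hypothetical helper: pack <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
 * into the low 16 bits of an arm64 system register id.
 */
static uint64_t arm64_sysreg_id(unsigned int op0, unsigned int op1,
				unsigned int crn, unsigned int crm,
				unsigned int op2)
{
	assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);
	return ARM64_SYSREG_BASE |
	       ((uint64_t)op0 << 14) |
	       ((uint64_t)op1 << 11) |
	       ((uint64_t)crn << 7) |
	       ((uint64_t)crm << 3) |
	       (uint64_t)op2;
}
```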


MIPS registers are mapped using the lower 32 bits. The upper 16 bits of
that field are the register group type:

MIPS core registers (see above) have the following id bit patterns:
  0x7030 0000 0000 <reg:16>

MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit
patterns depending on whether they're 32-bit or 64-bit registers:
  0x7020 0000 0001 00 <reg:5> <sel:3>   (32-bit)
  0x7030 0000 0001 00 <reg:5> <sel:3>   (64-bit)
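
A CP0 id therefore combines the size bits with the (register, select)
pair used by the MFC0/MTC0 instructions. In the sketch below the helper
name is hypothetical; the operands in the usage note (Status is CP0
register 12, select 0) come from the MIPS architecture rather than from
this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIPS_CP0_32_BASE 0x7020000000010000ULL /* 32-bit CP0 register */
#define MIPS_CP0_64_BASE 0x7030000000010000ULL /* 64-bit CP0 register */

/*
 * Hypothetical helper: build a MIPS CP0 ONE_REG id following
 * 0x7020/0x7030 0000 0001 00 <reg:5> <sel:3>.
 */
static uint64_t mips_cp0_id(unsigned int reg, unsigned int sel, bool is_64bit)
{
	assert(reg < 32 && sel < 8);
	return (is_64bit ? MIPS_CP0_64_BASE : MIPS_CP0_32_BASE) |
	       ((uint64_t)reg << 3) | (uint64_t)sel;
}
```

With those operands, mips_cp0_id(12, 0, false) yields 0x7020000000010060.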

Note: KVM_REG_MIPS_CP0_ENTRYLO0 and KVM_REG_MIPS_CP0_ENTRYLO1 are the MIPS64
versions of the EntryLo registers regardless of the word size of the host
hardware, host kernel, guest, and whether XPA is present in the guest, i.e.
with the RI and XI bits (if they exist) in bits 63 and 62 respectively, and
the PFNX field starting at bit 30.

MIPS MAARs (see KVM_REG_MIPS_CP0_MAAR(*) above) have the following id bit
patterns:
  0x7030 0000 0001 01 <reg:8>

MIPS KVM control registers (see above) have the following id bit patterns:
  0x7030 0000 0002 <reg:16>

MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following
id bit patterns depending on the size of the register being accessed. They are
always accessed according to the current guest FPU mode (Status.FR and
Config5.FRE), i.e. as the guest would see them, and they become unpredictable
if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector
registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns, since
they overlap the FPU registers:
  0x7020 0000 0003 00 <0:3> <reg:5>   (32-bit FPU registers)
  0x7030 0000 0003 00 <0:3> <reg:5>   (64-bit FPU registers)
  0x7040 0000 0003 00 <0:3> <reg:5>   (128-bit MSA vector registers)

MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the
following id bit patterns:
  0x7020 0000 0003 01 <0:3> <reg:5>

MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the
following id bit patterns:
  0x7020 0000 0003 02 <0:3> <reg:5>


4.69 KVM_GET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in and out)
Returns: 0 on success, negative value on failure

This ioctl allows userspace to read the value of a single register
implemented in a vcpu. The register to read is indicated by the "id" field
of the kvm_one_reg struct passed in. On success, the register value can be
found at the memory location pointed to by "addr".

The list of registers accessible using this interface is identical to the
list in 4.68.


4.70 KVM_KVMCLOCK_CTRL

Capability: KVM_CAP_KVMCLOCK_CTRL
Architectures: Any that implement pvclocks (currently x86 only)
Type: vcpu ioctl
Parameters: None
Returns: 0 on success, -1 on error

This signals to the host kernel that the specified guest is being paused by
userspace. The host will set a flag in the pvclock structure that is checked
from the soft lockup watchdog. The flag is part of the pvclock structure that
is shared between guest and host, specifically the second bit of the flags
field of the pvclock_vcpu_time_info structure. It will be set exclusively by
the host and read/cleared exclusively by the guest. The guest operation of
checking and clearing the flag must be an atomic operation, so
load-link/store-conditional, or equivalent, must be used. There are two cases
where the guest will clear the flag: when the soft lockup watchdog timer resets
itself or when a soft lockup is detected. This ioctl can be called any time
after pausing the vcpu, but before it is resumed.
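
On the guest side, the check-and-clear can be written as a single atomic
read-modify-write, which the compiler lowers to load-link/store-conditional
or an equivalent locked instruction. The sketch below uses C11
<stdatomic.h> and models the flags byte as a standalone atomic variable
rather than a field of a real pvclock_vcpu_time_info; the macro and
function names are hypothetical (Linux calls this bit
PVCLOCK_GUEST_STOPPED).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Second bit of pvclock_vcpu_time_info.flags: "guest stopped by host". */
#define GUEST_STOPPED_FLAG ((uint8_t)(1u << 1))

/*
 * Guest side: atomically test and clear the flag. atomic_fetch_and()
 * returns the previous value, so the test and the clear cannot be split
 * by a concurrent update of the same byte.
 */
static bool check_and_clear_guest_stopped(_Atomic uint8_t *flags)
{
	uint8_t old = atomic_fetch_and(flags, (uint8_t)~GUEST_STOPPED_FLAG);

	return (old & GUEST_STOPPED_FLAG) != 0;
}
```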


4.71 KVM_SIGNAL_MSI

Capability: KVM_CAP_SIGNAL_MSI
Architectures: x86 arm arm64
Type: vm ioctl
Parameters: struct kvm_msi (in)
Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error

Directly inject an MSI message. Only valid with an in-kernel irqchip that
handles MSI messages.

struct kvm_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	__u32 flags;
	__u32 devid;
	__u8  pad[12];
};

flags: KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM
  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
  the device ID. If this capability is not available, userspace
  should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.

If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
for the device that wrote the MSI message. For PCI, this is usually a
BDF (bus/device/function) identifier in the lower 16 bits.

On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
address_hi must be zero.
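
For x86 with 32-bit destination ids enabled, the sketch below splits an
id between the two address words. It assumes the standard x86 MSI address
format, in which bits 19-12 of address_lo carry bits 7-0 of the
destination id (this section does not spell that out), and the 0xFEE00000
address window. It also packs a PCI bus/device/function triple into the
usual 16-bit requester-id form for devid. struct msi_msg is a local mirror
of struct kvm_msi, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct kvm_msi as documented above. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	uint32_t flags;
	uint32_t devid;
	uint8_t  pad[12];
};

#define X86_MSI_ADDR_BASE 0xFEE00000u /* x86 MSI address window */

/* Bits 7-0 of the id go to address_lo bits 19-12; bits 31-8 go to
 * address_hi bits 31-8, leaving address_hi bits 7-0 zero. */
static void msi_set_dest_id(struct msi_msg *m, uint32_t dest_id)
{
	m->address_lo = X86_MSI_ADDR_BASE | ((dest_id & 0xFFu) << 12);
	m->address_hi = dest_id & 0xFFFFFF00u;
}

/* Recombine the destination id from the two address words. */
static uint32_t msi_get_dest_id(const struct msi_msg *m)
{
	return (m->address_hi & 0xFFFFFF00u) | ((m->address_lo >> 12) & 0xFFu);
}

/* PCI requester id: bus in bits 15-8, device in 7-3, function in 2-0. */
static uint32_t pci_bdf(unsigned int bus, unsigned int dev, unsigned int fn)
{
	assert(bus < 256 && dev < 32 && fn < 8);
	return (bus << 8) | (dev << 3) | fn;
}
```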


4.71 KVM_CREATE_PIT2

Capability: KVM_CAP_PIT2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_config (in)
Returns: 0 on success, -1 on error

Creates an in-kernel device model for the i8254 PIT. This call is only valid
after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
parameters have to be passed:

struct kvm_pit_config {
	__u32 flags;
	__u32 pad[15];
};

Valid flags are:

#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */

PIT timer interrupts may use a per-VM kernel thread for injection. If it
exists, this thread will have a name of the following pattern:

kvm-pit/<owner-process-pid>

When running a guest with elevated priorities, the scheduling parameters of
this thread may have to be adjusted accordingly.

This IOCTL replaces the obsolete KVM_CREATE_PIT.


4.72 KVM_GET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (out)
Returns: 0 on success, -1 on error

Retrieves the state of the in-kernel PIT model. Only valid after
KVM_CREATE_PIT2. The state is returned in the following structure:

struct kvm_pit_state2 {
	struct kvm_pit_channel_state channels[3];
	__u32 flags;
	__u32 reserved[9];
};

Valid flags are:

/* disable PIT in HPET legacy mode */
#define KVM_PIT_FLAGS_HPET_LEGACY     0x00000001

This IOCTL replaces the obsolete KVM_GET_PIT.


4.73 KVM_SET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (in)
Returns: 0 on success, -1 on error

Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
See KVM_GET_PIT2 for details on struct kvm_pit_state2.

This IOCTL replaces the obsolete KVM_SET_PIT.


4.74 KVM_PPC_GET_SMMU_INFO

Capability: KVM_CAP_PPC_GET_SMMU_INFO
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_smmu_info (out)
Returns: 0 on success, -1 on error

This populates and returns a structure describing the features of
the "Server" class MMU emulation supported by KVM.
This can in turn be used by userspace to generate the appropriate
device-tree properties for the guest operating system.

The structure contains some global information, followed by an
array of supported segment page sizes:

	struct kvm_ppc_smmu_info {
		__u64 flags;
		__u32 slb_size;
		__u32 pad;
		struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
	};

The supported flags are:

 - KVM_PPC_PAGE_SIZES_REAL:
     When that flag is set, guest page sizes must "fit" the backing
     store page sizes. When not set, any page size in the list can
     be used regardless of how they are backed by userspace.

 - KVM_PPC_1T_SEGMENTS:
     The emulated MMU supports 1T segments in addition to the
     standard 256M ones.

 - KVM_PPC_NO_HASH:
     This flag indicates that HPT guests are not supported by KVM,
     thus all guests must use radix MMU mode.

The "slb_size" field indicates how many SLB entries are supported.

The "sps" array contains 8 entries indicating the supported base
page sizes for a segment in increasing order. Each entry is defined
as follows:

	struct kvm_ppc_one_seg_page_size {
		__u32 page_shift;	/* Base page shift of segment (or 0) */
		__u32 slb_enc;		/* SLB encoding for BookS */
		struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
	};

An entry with a "page_shift" of 0 is unused. Because the array is
organized in increasing order, a lookup can stop when encountering
such an entry.

The "slb_enc" field provides the encoding to use in the SLB for the
page size. The bits are in positions such that the value can directly
be OR'ed into the "vsid" argument of the slbmte instruction.

The "enc" array is a list which, for each of those segment base page
sizes, provides the list of supported actual page sizes (which can
only be larger than or equal to the base page size), along with the
corresponding encoding in the hash PTE. Similarly, the array is
8 entries sorted by increasing sizes and an entry with a "0" shift
is an empty entry and a terminator:

	struct kvm_ppc_one_page_size {
		__u32 page_shift;	/* Page shift (or 0) */
		__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
	};

The "pte_enc" field provides a value that can be OR'ed into the hash
PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it
into the hash PTE second double word).
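
Putting the two arrays together, userspace can look up the HPTE encoding
for a (base page size, actual page size) pair, stopping at the first zero
shift in each array as described above. The sketch below uses local
mirrors of the three structures, with KVM_PPC_PAGE_SIZES_MAX_SZ taken as 8
to match the entry count stated above; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZES_MAX_SZ 8 /* KVM_PPC_PAGE_SIZES_MAX_SZ: 8 entries */

struct one_page_size {
	uint32_t page_shift; /* Page shift (or 0) */
	uint32_t pte_enc;    /* Encoding in the HPTE (>>12) */
};

struct one_seg_page_size {
	uint32_t page_shift; /* Base page shift of segment (or 0) */
	uint32_t slb_enc;    /* SLB encoding for BookS */
	struct one_page_size enc[PAGE_SIZES_MAX_SZ];
};

struct smmu_info {
	uint64_t flags;
	uint32_t slb_size;
	uint32_t pad;
	struct one_seg_page_size sps[PAGE_SIZES_MAX_SZ];
};

/*
 * Hypothetical lookup: on success, *rpn_bits receives pte_enc already
 * shifted left by 12, ready to be OR'ed into the second doubleword of
 * the hash PTE.
 */
static bool smmu_lookup_pte_enc(const struct smmu_info *info,
				uint32_t base_shift, uint32_t actual_shift,
				uint64_t *rpn_bits)
{
	for (int i = 0; i < PAGE_SIZES_MAX_SZ; i++) {
		const struct one_seg_page_size *sps = &info->sps[i];

		if (sps->page_shift == 0)
			break; /* terminator: no more base sizes */
		if (sps->page_shift != base_shift)
			continue;
		for (int j = 0; j < PAGE_SIZES_MAX_SZ; j++) {
			if (sps->enc[j].page_shift == 0)
				break; /* terminator: no more actual sizes */
			if (sps->enc[j].page_shift == actual_shift) {
				*rpn_bits = (uint64_t)sps->enc[j].pte_enc << 12;
				return true;
			}
		}
		return false; /* base size found, actual size unsupported */
	}
	return false;
}
```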

4.75 KVM_IRQFD

Capability: KVM_CAP_IRQFD
Architectures: x86 s390 arm arm64
Type: vm ioctl
Parameters: struct kvm_irqfd (in)
Returns: 0 on success, -1 on error

Allows setting an eventfd to directly trigger a guest interrupt.
kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When
an event is triggered on the eventfd, an interrupt is injected into
the guest using the specified gsi pin. The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
mechanism allowing emulation of level-triggered, irqfd-based
interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
additional eventfd in the kvm_irqfd.resamplefd field. When operating
in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
the specified gsi in the irqchip. When the irqchip is resampled, such
as from an EOI, the gsi is de-asserted and the user is notified via
kvm_irqfd.resamplefd. It is the user's responsibility to re-queue
the interrupt if the device making use of it still requires service.
Note that closing the resamplefd is not sufficient to disable the
irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.

On arm/arm64, where gsi routing is supported, the following can happen:
- in case no routing entry is associated with this gsi, injection fails
- in case the gsi is associated with an irqchip routing entry,
  irqchip.pin + 32 corresponds to the injected SPI ID.
- in case the gsi is associated with an MSI routing entry, the MSI
  message and device ID are translated into an LPI (support restricted
  to GICv3 ITS in-kernel emulation).
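
From userspace, triggering an event on an irqfd means adding to the
eventfd counter with an 8-byte write. The sketch below shows only that
half, using the Linux eventfd(2) interface; registering the fd with
KVM_IRQFD is omitted, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal an eventfd that has been registered with
 * KVM_IRQFD. eventfd accepts only 8-byte writes; each write adds to the
 * counter and wakes the waiter, which (once registered) is KVM injecting
 * the interrupt for the associated gsi.
 */
static int irqfd_trigger(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```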

4.76 KVM_PPC_ALLOCATE_HTAB

Capability: KVM_CAP_PPC_ALLOC_HTAB
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to u32 containing hash table order (in/out)
Returns: 0 on success, -1 on error

This requests the host kernel to allocate an MMU hash table for a
guest using the PAPR paravirtualization interface. This only does
anything if the kernel is configured to use the Book 3S HV style of
virtualization. Otherwise the capability doesn't exist and the ioctl
returns an ENOTTY error. The rest of this description assumes Book 3S
HV.

There must be no vcpus running when this ioctl is called; if there
are, it will do nothing and return an EBUSY error.

The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46. On successful return from the
ioctl, the value will not be changed by the kernel.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
with a different order from the existing hash table, the existing hash
table will be freed and a new one allocated. If this ioctl is
called when a hash table has already been allocated of the same order
as specified, the kernel will clear out the existing hash table (zero
all HPTEs). In either case, if the guest is using the virtualized
real-mode area (VRMA) facility, the kernel will re-create the VRMA
HPTEs on the next KVM_RUN of any vcpu.
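
Because the parameter is an order rather than a byte count, userspace may
convert a desired size into the smallest order that covers it and check
it against the 18 to 46 range. A sketch with hypothetical helper names
follows; the entry count assumes the 16-byte HPT entries described under
KVM_PPC_GET_HTAB_FD. The 16 MB default corresponds to order 24.

```c
#include <assert.h>
#include <stdint.h>

#define HTAB_ORDER_MIN 18 /* 256 KB */
#define HTAB_ORDER_MAX 46
#define HPTE_BYTES     16 /* size of one hashed page table entry */

/* Hypothetical helper: smallest valid order whose table holds 'bytes'.
 * Returns 0 if the request cannot be satisfied within the valid range. */
static uint32_t htab_order_for_size(uint64_t bytes)
{
	uint32_t order = HTAB_ORDER_MIN;

	while (order <= HTAB_ORDER_MAX && (1ULL << order) < bytes)
		order++;
	return order <= HTAB_ORDER_MAX ? order : 0;
}

/* Number of HPTEs in a hash table of the given order. */
static uint64_t htab_entries(uint32_t order)
{
	assert(order >= HTAB_ORDER_MIN && order <= HTAB_ORDER_MAX);
	return (1ULL << order) / HPTE_BYTES;
}
```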

4.77 KVM_S390_INTERRUPT

Capability: basic
Architectures: s390
Type: vm ioctl, vcpu ioctl
Parameters: struct kvm_s390_interrupt (in)
Returns: 0 on success, -1 on error

Allows userspace to inject an interrupt into the guest. Interrupts can be
floating (vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.

Interrupt parameters are passed via kvm_s390_interrupt:

struct kvm_s390_interrupt {
	__u32 type;
	__u32 parm;
	__u64 parm64;
};

type can be one of the following:

KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm
KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
KVM_S390_RESTART (vcpu) - restart
KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt
KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt
KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
                           parameters in parm and parm64
KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
    I/O interruption parameters in parm (subchannel) and parm64 (intparm,
    interruption subclass)
KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
                           machine check interrupt code in parm64 (note that
                           machine checks needing further payload are not
                           supported by this ioctl)

Note that the vcpu ioctl is asynchronous to vcpu execution.

4.78 KVM_PPC_GET_HTAB_FD

Capability: KVM_CAP_PPC_HTAB_FD
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to struct kvm_get_htab_fd (in)
Returns: file descriptor number (>= 0) on success, -1 on error

This returns a file descriptor that can be used either to read out the
entries in the guest's hashed page table (HPT), or to write entries to
initialize the HPT. The returned fd can only be written to if the
KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
can only be read if that bit is clear. The argument struct looks like
this:

/* For KVM_PPC_GET_HTAB_FD */
struct kvm_get_htab_fd {
	__u64	flags;
	__u64	start_index;
	__u64	reserved[2];
};

/* Values for kvm_get_htab_fd.flags */
#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
#define KVM_GET_HTAB_WRITE		((__u64)0x2)

The `start_index' field gives the index in the HPT of the entry at
which to start reading. It is ignored when writing.

Reads on the fd will initially supply information about all
"interesting" HPT entries. Interesting entries are those with the
bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
all entries. When the end of the HPT is reached, the read() will
return. If read() is called again on the fd, it will start again from
the beginning of the HPT, but will only return HPT entries that have
changed since they were last read.

Data read or written is structured as a header (8 bytes) followed by a
series of valid HPT entries (16 bytes each). The header indicates how
many valid HPT entries there are and how many invalid entries follow
the valid entries. The invalid entries are not represented explicitly
in the stream. The header format is:

struct kvm_get_htab_header {
	__u32	index;
	__u16	n_valid;
	__u16	n_invalid;
};

Writes to the fd create HPT entries starting at the index given in the
header; first `n_valid' valid entries with contents from the data
written, then `n_invalid' invalid entries, invalidating any previously
valid entries found.
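
A consumer of this stream walks it one header at a time. The sketch below
counts the valid and invalid entries described by a buffer filled by
read() on the fd. It uses a local mirror of struct kvm_get_htab_header,
assumes the header is in host byte order, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_get_htab_header (8 bytes). */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE 16 /* each valid HPT entry is 16 bytes in the stream */

/*
 * Hypothetical parser: walk "header + n_valid * 16 bytes" chunks.
 * Returns 0 and fills the totals on success, or -1 if the buffer ends
 * in the middle of a chunk.
 */
static int htab_count_entries(const uint8_t *buf, size_t len,
			      uint64_t *valid, uint64_t *invalid)
{
	size_t off = 0;

	*valid = 0;
	*invalid = 0;
	while (off < len) {
		struct htab_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
		off += sizeof(hdr);
		if ((len - off) / HPTE_SIZE < hdr.n_valid)
			return -1;
		off += (size_t)hdr.n_valid * HPTE_SIZE; /* skip entry contents */
		*valid += hdr.n_valid;
		*invalid += hdr.n_invalid; /* not present in the stream */
	}
	return 0;
}
```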

4.79 KVM_CREATE_DEVICE

Capability: KVM_CAP_DEVICE_CTRL
Type: vm ioctl
Parameters: struct kvm_create_device (in/out)
Returns: 0 on success, -1 on error
Errors:
  ENODEV: The device type is unknown or unsupported
  EEXIST: Device already created, and this type of device may not
          be instantiated multiple times

  Other error conditions may be defined by individual device types or
  have their standard meanings.

Creates an emulated device in the kernel. The file descriptor returned
in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.

If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
device type is supported (not necessarily whether it can be created
in the current vm).

Individual devices should not define flags. Attributes should be used
for specifying any behavior that is not implied by the device type
number.

struct kvm_create_device {
	__u32	type;	/* in: KVM_DEV_TYPE_xxx */
	__u32	fd;	/* out: device handle */
	__u32	flags;	/* in: KVM_CREATE_DEVICE_xxx */
};

4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR

Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
  KVM_CAP_VCPU_ATTRIBUTES for vcpu device
Type: device ioctl, vm ioctl, vcpu ioctl
Parameters: struct kvm_device_attr
Returns: 0 on success, -1 on error
Errors:
  ENXIO:  The group or attribute is unknown/unsupported for this device
          or hardware support is missing.
  EPERM:  The attribute cannot (currently) be accessed this way
          (e.g. read-only attribute, or attribute that only makes
          sense when the device is in a different state)

  Other error conditions may be defined by individual device types.

Gets/sets a specified piece of device configuration and/or state. The
semantics are device-specific. See individual device documentation in
the "devices" directory. As with ONE_REG, the size of the data
transferred is defined by the particular attribute.

struct kvm_device_attr {
	__u32	flags;		/* no flags currently defined */
	__u32	group;		/* device-defined */
	__u64	attr;		/* group-defined */
	__u64	addr;		/* userspace address of attr data */
};

4.81 KVM_HAS_DEVICE_ATTR

Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
            KVM_CAP_VCPU_ATTRIBUTES for vcpu device
Type: device ioctl, vm ioctl, vcpu ioctl
Parameters: struct kvm_device_attr
Returns: 0 on success, -1 on error
Errors:
  ENXIO:  The group or attribute is unknown/unsupported for this device
          or hardware support is missing.

Tests whether a device supports a particular attribute. A successful
return indicates the attribute is implemented. It does not necessarily
indicate that the attribute can be read or written in the device's
current state. "addr" is ignored.

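The attribute ioctls are often used together: KVM_HAS_DEVICE_ATTR first, then KVM_GET_DEVICE_ATTR. The sketch below assumes <linux/kvm.h>, a device fd from KVM_CREATE_DEVICE, and an attribute whose data is exactly 64 bits wide. The name read_u64_attr is hypothetical. The real group/attr values and data size come from the individual device documentation.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: read a 64-bit device attribute.
 * Returns 1 on success, 0 if the attribute is not implemented (ENXIO),
 * -1 on any other error (errno set).
 */
int read_u64_attr(int dev_fd, __u32 group, __u64 attr, __u64 *val)
{
    struct kvm_device_attr da = {
        .flags = 0,                         /* no flags defined */
        .group = group,
        .attr  = attr,
        .addr  = (__u64)(uintptr_t)val,     /* ignored by HAS, used by GET */
    };

    if (ioctl(dev_fd, KVM_HAS_DEVICE_ATTR, &da) < 0)
        return errno == ENXIO ? 0 : -1;

    /* Implemented, but GET may still fail (e.g. EPERM) in this state. */
    if (ioctl(dev_fd, KVM_GET_DEVICE_ATTR, &da) < 0)
        return -1;
    return 1;
}
```
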
4.82 KVM_ARM_VCPU_INIT

Capability: basic
Architectures: arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_init (in)
Returns: 0 on success; -1 on error
Errors:
  EINVAL:    the target is unknown, or the combination of features is invalid.
  ENOENT:    a specified feature bit is unknown.

This tells KVM what type of CPU to present to the guest, and what
optional features it should have.  This will cause a reset of the cpu
registers to their initial values.  If this is not called, KVM_RUN will
return ENOEXEC for that vcpu.

Note that because some registers reflect machine topology, all vcpus
should be created before this ioctl is invoked.

Userspace can call this function multiple times for a given vcpu, including
after the vcpu has been run. This will reset the vcpu to its initial
state. All calls to this function after the initial call must use the same
target and same set of feature flags, otherwise EINVAL will be returned.

Possible features:
  - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
    Depends on KVM_CAP_ARM_PSCI.  If not set, the CPU will be powered on
    and execute guest code when KVM_RUN is called.
  - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
    Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
  - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
    backward compatible with v0.2) for the CPU.
    Depends on KVM_CAP_ARM_PSCI_0_2.
  - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
    Depends on KVM_CAP_ARM_PMU_V3.

4.83 KVM_ARM_PREFERRED_TARGET

Capability: basic
Architectures: arm, arm64
Type: vm ioctl
Parameters: struct kvm_vcpu_init (out)
Returns: 0 on success; -1 on error
Errors:
  ENODEV:    no preferred target available for the host

This queries KVM for the preferred CPU target type that can be emulated
by KVM on the underlying host.

The ioctl returns a struct kvm_vcpu_init instance containing information
about the preferred CPU target type and recommended features for it. The
kvm_vcpu_init->features bitmap returned will have feature bits set if
the preferred target recommends setting these features, but this is
not mandatory.

The information returned by this ioctl can be used to prepare an instance
of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
in a VCPU matching the underlying host.

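The sketch below shows one way to combine KVM_ARM_PREFERRED_TARGET output with extra feature bits before calling KVM_ARM_VCPU_INIT. So that it compiles on any host, it uses an assumed local mirror of struct kvm_vcpu_init (a 32-bit target followed by seven 32-bit feature words) and assumed feature bit numbers. On arm64, real code would include <linux/kvm.h> and use the KVM_ARM_VCPU_* constants instead. All names here are hypothetical. Powering off secondary vcpus, so that they can be started later through PSCI, is a design choice for this example rather than a requirement.

```c
#include <stdint.h>
#include <string.h>

/* Assumed mirror of the arm/arm64 struct kvm_vcpu_init layout. */
struct vcpu_init_mirror {
    uint32_t target;
    uint32_t features[7];
};

/* Assumed feature bit numbers (KVM_ARM_VCPU_* on arm64). */
enum {
    FEAT_POWER_OFF = 0,   /* KVM_ARM_VCPU_POWER_OFF */
    FEAT_EL1_32BIT = 1,   /* KVM_ARM_VCPU_EL1_32BIT */
    FEAT_PSCI_0_2  = 2,   /* KVM_ARM_VCPU_PSCI_0_2  */
    FEAT_PMU_V3    = 3,   /* KVM_ARM_VCPU_PMU_V3    */
};

/* Set feature bit 'bit' in the bitmap; returns -1 if out of range. */
int vcpu_init_set_feature(struct vcpu_init_mirror *init, unsigned bit)
{
    if (bit >= 32 * 7)
        return -1;
    init->features[bit / 32] |= UINT32_C(1) << (bit % 32);
    return 0;
}

/*
 * Start from the KVM_ARM_PREFERRED_TARGET result (target plus recommended
 * features) and add what this VMM wants; the result is what would be
 * passed to KVM_ARM_VCPU_INIT.
 */
struct vcpu_init_mirror prepare_vcpu_init(const struct vcpu_init_mirror *preferred,
                                          int secondary)
{
    struct vcpu_init_mirror init;

    memcpy(&init, preferred, sizeof(init));
    vcpu_init_set_feature(&init, FEAT_PSCI_0_2);
    if (secondary)
        vcpu_init_set_feature(&init, FEAT_POWER_OFF);
    return init;
}
```

Since every later KVM_ARM_VCPU_INIT call for a vcpu must repeat the same target and features, a VMM would typically keep the prepared structure around for vcpu resets.
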
4.84 KVM_GET_REG_LIST

Capability: basic
Architectures: arm, arm64, mips
Type: vcpu ioctl
Parameters: struct kvm_reg_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  E2BIG:     the reg index list is too big to fit in the array specified by
             the user (the number required will be written into n).

struct kvm_reg_list {
    __u64 n; /* number of registers in reg[] */
    __u64 reg[0];
};

This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.

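The E2BIG behavior described above lends itself to a two-call pattern. The first call passes n = 0 to learn the required count; userspace then allocates a large enough array and calls again. The following is a minimal sketch, assuming <linux/kvm.h> and a vcpu fd. The helper name get_reg_list is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: return a malloc'd register list (caller frees it),
 * or NULL with errno set on failure.
 */
struct kvm_reg_list *get_reg_list(int vcpu_fd)
{
    struct kvm_reg_list probe = { .n = 0 };
    struct kvm_reg_list *list;

    /* With n = 0 KVM normally fails with E2BIG and writes the count to n. */
    if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) < 0 && errno != E2BIG)
        return NULL;                    /* a real error, e.g. EBADF */

    list = malloc(sizeof(*list) + probe.n * sizeof(list->reg[0]));
    if (!list)
        return NULL;
    list->n = probe.n;
    if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
        free(list);
        return NULL;
    }
    return list;                        /* list->reg[] holds the register ids */
}
```

Each id in list->reg[] can then be used as the id field for KVM_GET_ONE_REG/KVM_SET_ONE_REG.
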
4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)

Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
Architectures: arm, arm64
Type: vm ioctl
Parameters: struct kvm_arm_device_addr (in)
Returns: 0 on success, -1 on error
Errors:
  ENODEV: The device id is unknown
  ENXIO:  Device not supported on current system
  EEXIST: Address already set
  E2BIG:  Address outside guest physical address space
  EBUSY:  Address overlaps with other device range

struct kvm_arm_device_addr {
    __u64 id;
    __u64 addr;
};

Specify a device address in the guest's physical address space where guests
can access emulated or directly exposed devices, which the host kernel needs
to know about. The id field is an architecture-specific identifier for a
specific device.

ARM/arm64 divides the id field into two parts, a device id and an
address type id specific to the individual device.

  bits:  | 63        ...        32 | 31    ...    16 | 15    ...    0 |
  field: |        0x00000000       |    device id    |  addr type id  |

ARM/arm64 currently only requires this when using the in-kernel GIC
support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
as the device id. When setting the base address for the guest's
mapping of the VGIC virtual CPU and distributor interface, the ioctl
must be called after calling KVM_CREATE_IRQCHIP, but before calling
KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
base addresses will return -EEXIST.

Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API
should be used instead.

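Composing the id field from the layout above is just a shift and an OR. Here is a hypothetical helper, with an illustrative name; the arm uapi headers may provide their own shift and mask constants for the same purpose:

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the id field of struct kvm_arm_device_addr.
 * Bits 63..32 stay zero, bits 31..16 hold the device id and bits 15..0
 * the device-specific address type id.
 */
uint64_t arm_device_addr_id(uint16_t device_id, uint16_t addr_type)
{
    return ((uint64_t)device_id << 16) | addr_type;
}
```

Taking 16-bit parameters guarantees that neither field can spill into its neighbor or into the reserved upper half.
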
4.86 KVM_PPC_RTAS_DEFINE_TOKEN

Capability: KVM_CAP_PPC_RTAS
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_rtas_token_args
Returns: 0 on success, -1 on error

Defines a token value for an RTAS (Run Time Abstraction Services)
service in order to allow it to be handled in the kernel. The
argument struct gives the name of the service, which must be the name
of a service that has a kernel-side implementation. If the token
value is non-zero, it will be associated with that service, and
subsequent RTAS calls by the guest specifying that token will be
handled by the kernel. If the token value is 0, then any token
associated with the service will be forgotten, and subsequent RTAS
calls by the guest for that service will be passed to userspace to be
handled.

4.87 KVM_SET_GUEST_DEBUG

Capability: KVM_CAP_SET_GUEST_DEBUG
Architectures: x86, s390, ppc, arm64
Type: vcpu ioctl
Parameters: struct kvm_guest_debug (in)
Returns: 0 on success; -1 on error

struct kvm_guest_debug {
    __u32 control;
    __u32 pad;
    struct kvm_guest_debug_arch arch;
};

Set up the processor-specific debug registers and configure the vcpu for
handling guest debug events. The structure has two parts. The first, a
control bitfield, indicates the type of debug events to handle when
running. Common control bits are:

  - KVM_GUESTDBG_ENABLE:        guest debugging is enabled
  - KVM_GUESTDBG_SINGLESTEP:    the next run should single-step

The top 16 bits of the control field are architecture-specific control
flags which can include the following:

  - KVM_GUESTDBG_USE_SW_BP:     using software breakpoints [x86, arm64]
  - KVM_GUESTDBG_USE_HW_BP:     using hardware breakpoints [x86, s390, arm64]
  - KVM_GUESTDBG_INJECT_DB:     inject DB type exception [x86]
  - KVM_GUESTDBG_INJECT_BP:     inject BP type exception [x86]
  - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]

For example, KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
are enabled in memory, so breakpoint exceptions must be correctly trapped
and the KVM run loop must exit at the breakpoint instead of running off
into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP, the guest
vCPU's architecture-specific registers must be updated to the correct
(supplied) values.

The second part of the structure is architecture specific and
typically contains a set of debug registers.

For arm64 the number of debug registers is implementation defined and
can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
indicating the number of supported registers.

When a debug event causes an exit from the main run loop, the exit
reason is KVM_EXIT_DEBUG, and the kvm_debug_exit_arch part of the
kvm_run structure contains architecture-specific debug information.

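The following sketch shows how the control word might be assembled. It assumes <linux/kvm.h> for the generic KVM_GUESTDBG_ENABLE and KVM_GUESTDBG_SINGLESTEP bits. The caller supplies the architecture-specific flags, such as KVM_GUESTDBG_USE_SW_BP from the arch header. The name guestdbg_control is hypothetical. Rejecting architecture flags that fall in the low 16 bits is this sketch's own sanity check, not a documented KVM rule.

```c
#include <stdint.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: build kvm_guest_debug.control.
 * arch_flags must only use the top 16 bits; returns -1 otherwise.
 */
int guestdbg_control(uint32_t arch_flags, int singlestep, uint32_t *control)
{
    if (arch_flags & 0x0000ffffu)
        return -1;
    *control = KVM_GUESTDBG_ENABLE | arch_flags;
    if (singlestep)
        *control |= KVM_GUESTDBG_SINGLESTEP;
    return 0;
}
```

The resulting value would be stored in the control field of struct kvm_guest_debug, with the arch part filled as the architecture requires, before KVM_SET_GUEST_DEBUG is issued on the vcpu fd.
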
4.88 KVM_GET_EMULATED_CPUID

Capability: KVM_CAP_EXT_EMUL_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
    __u32 nent;
    __u32 flags;
    struct kvm_cpuid_entry2 entries[0];
};

The member 'flags' is used for passing flags from userspace.

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX    BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC       BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT     BIT(2)

struct kvm_cpuid_entry2 {
    __u32 function;
    __u32 index;
    __u32 flags;
    __u32 eax;
    __u32 ebx;
    __u32 ecx;
    __u32 edx;
    __u32 padding[3];
};

This ioctl returns x86 cpuid features which are emulated by
kvm. Userspace can use the information returned by this ioctl to query
which features are emulated by kvm instead of being present natively.

Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
structure with the 'nent' field indicating the number of entries in
the variable-size array 'entries'. If the number of entries is too low
to describe the cpu capabilities, an error (E2BIG) is returned. If the
number is too high, the 'nent' field is adjusted and an error (ENOMEM)
is returned. If the number is just right, the 'nent' field is adjusted
to the number of valid entries in the 'entries' array, which is then
filled.

The entries returned are the set CPUID bits of the respective features
which kvm emulates, as returned by the CPUID instruction, with unknown
or unsupported feature bits cleared.

Features like x2apic, for example, may not be present in the host cpu
but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
emulated efficiently, and thus are not included here.

The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination

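The following sketch shows how userspace might look up one entry in the array filled by KVM_GET_EMULATED_CPUID, honoring KVM_CPUID_FLAG_SIGNIFCANT_INDEX as described above (the flag name is spelled as in the header). It assumes an x86 host, where <linux/kvm.h> provides struct kvm_cpuid2. The helper names are hypothetical. The MOVBE check (CPUID.01H:ECX bit 22) is only an example query.

```c
#include <stddef.h>
#include <stdlib.h>
#include <linux/kvm.h>   /* x86: struct kvm_cpuid2, KVM_CPUID_FLAG_* */

/*
 * Hypothetical helper: find the entry for (function, index). The index is
 * compared only when the entry carries KVM_CPUID_FLAG_SIGNIFCANT_INDEX.
 */
const struct kvm_cpuid_entry2 *
find_cpuid_entry(const struct kvm_cpuid2 *cpuid, __u32 function, __u32 index)
{
    for (__u32 i = 0; i < cpuid->nent; i++) {
        const struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

        if (e->function != function)
            continue;
        if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index)
            continue;
        return e;
    }
    return NULL;
}

/* Example query: is MOVBE (CPUID.01H:ECX bit 22) reported as emulated? */
int movbe_is_emulated(const struct kvm_cpuid2 *cpuid)
{
    const struct kvm_cpuid_entry2 *e = find_cpuid_entry(cpuid, 1, 0);

    return e != NULL && (e->ecx & (1u << 22)) != 0;
}
```
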
4.89 KVM_S390_MEM_OP

Capability: KVM_CAP_S390_MEM_OP
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_mem_op (in)
Returns: = 0 on success,
         < 0 on generic error (e.g. -EFAULT or -ENOMEM),
         > 0 if an exception occurred while walking the page tables

Read or write data from/to the logical (virtual) memory of a VCPU.

Parameters are specified via the following structure:

struct kvm_s390_mem_op {
    __u64 gaddr;         /* the guest address */
    __u64 flags;         /* flags */
    __u32 size;          /* number of bytes */
    __u32 op;            /* type of operation */
    __u64 buf;           /* buffer in userspace */
    __u8 ar;             /* the access register number */
    __u8 reserved[31];   /* should be set to 0 */
};

The type of operation is specified in the "op" field. It is either
KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check
whether the corresponding memory access would create an access exception
(without touching the data in the memory at the destination). If an
access exception occurs while walking the MMU tables of the guest, the
ioctl returns a positive error number to indicate the type of exception.
This exception is also raised directly at the corresponding VCPU if the
flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field.

The start address of the memory region has to be specified in the "gaddr"
field, and the length of the region in the "size" field. "buf" is the buffer
supplied by the userspace application where the read data should be written
to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL
when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access
register number to be used.

The "reserved" field is meant for future extensions. It is not used by
KVM with the currently defined set of flags.

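The following is a minimal sketch of a guest write that first runs a KVM_S390_MEMOP_F_CHECK_ONLY pass. It assumes <linux/kvm.h> and an s390 vcpu fd. The helper name is hypothetical. Running the check pass first is a choice of this sketch: it reports a faulting access before the destination is touched.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: write 'size' bytes from 'data' to guest logical
 * address 'gaddr' using access register 'ar'.
 * Returns 0 on success, > 0 (exception code) if the access would fault,
 * or -1 with errno set on other errors.
 */
int s390_write_guest(int vcpu_fd, __u64 gaddr, const void *data,
                     __u32 size, __u8 ar)
{
    struct kvm_s390_mem_op op;
    int r;

    memset(&op, 0, sizeof(op));               /* reserved bytes must be 0 */
    op.gaddr = gaddr;
    op.size  = size;
    op.op    = KVM_S390_MEMOP_LOGICAL_WRITE;
    op.ar    = ar;

    op.flags = KVM_S390_MEMOP_F_CHECK_ONLY;   /* buf is unused here */
    r = ioctl(vcpu_fd, KVM_S390_MEM_OP, &op);
    if (r != 0)
        return r;

    op.flags = 0;
    op.buf   = (__u64)(uintptr_t)data;
    return ioctl(vcpu_fd, KVM_S390_MEM_OP, &op);
}
```
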
4.90 KVM_S390_GET_SKEYS

Capability: KVM_CAP_S390_SKEYS
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_skeys
Returns: 0 on success, KVM_S390_GET_SKEYS_NONE if guest is not using storage
         keys, negative value on error

This ioctl is used to get guest storage key values on the s390
architecture. The ioctl takes parameters via the kvm_s390_skeys struct.

struct kvm_s390_skeys {
    __u64 start_gfn;
    __u64 count;
    __u64 skeydata_addr;
    __u32 flags;
    __u32 reserved[9];
};

The start_gfn field is the number of the first guest frame whose storage keys
you want to get.

The count field is the number of consecutive frames (starting from start_gfn)
whose storage keys to get. The count field must be at least 1 and the maximum
allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
will cause the ioctl to return -EINVAL.

The skeydata_addr field is the address of a buffer large enough to hold count
bytes. This buffer will be filled with storage key data by the ioctl.

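The sketch below shows a KVM_S390_GET_SKEYS call. It assumes <linux/kvm.h> and an s390 VM fd. The text above gives the upper limit only by name, so the hypothetical helper takes it as a max_count parameter instead of hard-coding a value. An out-of-range count is rejected locally, just as the kernel would reject it.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: fetch storage keys of 'count' frames starting at
 * start_gfn into keys[] (one byte per frame). max_count stands in for the
 * documented maximum. Returns the ioctl result (0, or the "guest not using
 * storage keys" value), or -1 with errno set.
 */
int s390_get_skeys(int vm_fd, __u64 start_gfn, __u64 count, uint8_t *keys,
                   __u64 max_count)
{
    struct kvm_s390_skeys args;

    if (count < 1 || count > max_count) {
        errno = EINVAL;                 /* same verdict the kernel gives */
        return -1;
    }
    memset(&args, 0, sizeof(args));     /* flags and reserved stay 0 */
    args.start_gfn = start_gfn;
    args.count = count;
    args.skeydata_addr = (__u64)(uintptr_t)keys;
    return ioctl(vm_fd, KVM_S390_GET_SKEYS, &args);
}
```
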
4.91 KVM_S390_SET_SKEYS

Capability: KVM_CAP_S390_SKEYS
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_skeys
Returns: 0 on success, negative value on error

This ioctl is used to set guest storage key values on the s390
architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
See section on KVM_S390_GET_SKEYS for struct definition.

The start_gfn field is the number of the first guest frame whose storage keys
you want to set.

The count field is the number of consecutive frames (starting from start_gfn)
whose storage keys to set. The count field must be at least 1 and the maximum
allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
will cause the ioctl to return -EINVAL.

The skeydata_addr field is the address of a buffer containing count bytes of
storage keys. Each byte in the buffer will be set as the storage key for a
single frame starting at start_gfn for count frames.

Note: If any architecturally invalid key value is found in the given data then
the ioctl will return -EINVAL.

4.92 KVM_S390_IRQ

Capability: KVM_CAP_S390_INJECT_IRQ
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq (in)
Returns: 0 on success, -1 on error
Errors:
  EINVAL: interrupt type is invalid
          type is KVM_S390_SIGP_STOP and the flag parameter has an invalid value
          type is KVM_S390_INT_EXTERNAL_CALL and code exceeds the
            maximum number of VCPUs
  EBUSY:  type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
          type is KVM_S390_SIGP_STOP and a stop irq is already pending
          type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
            is already pending

Allows userspace to inject an interrupt into the guest.

Using struct kvm_s390_irq as a parameter allows injecting additional
payload, which is not possible via KVM_S390_INTERRUPT.

Interrupt parameters are passed via kvm_s390_irq:

struct kvm_s390_irq {
    __u64 type;
    union {
        struct kvm_s390_io_info io;
        struct kvm_s390_ext_info ext;
        struct kvm_s390_pgm_info pgm;
        struct kvm_s390_emerg_info emerg;
        struct kvm_s390_extcall_info extcall;
        struct kvm_s390_prefix_info prefix;
        struct kvm_s390_stop_info stop;
        struct kvm_s390_mchk_info mchk;
        char reserved[64];
    } u;
};

type can be one of the following:

KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
KVM_S390_PROGRAM_INT - program check; parameters in .pgm
KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
KVM_S390_RESTART - restart; no parameters
KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
KVM_S390_MCHK - machine check interrupt; parameters in .mchk

Note that the vcpu ioctl is asynchronous to vcpu execution.

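The following sketch injects an emergency signal with KVM_S390_IRQ. It assumes <linux/kvm.h> and an s390 vcpu fd. The helper name is hypothetical. Treating emerg.code as the address of the sending CPU is an assumption of this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: inject a sigp emergency signal into the vcpu
 * behind vcpu_fd; src_cpu_addr goes into the .emerg payload.
 */
int s390_inject_emergency(int vcpu_fd, __u32 src_cpu_addr)
{
    struct kvm_s390_irq irq;

    memset(&irq, 0, sizeof(irq));       /* clear the whole union */
    irq.type = KVM_S390_INT_EMERGENCY;
    irq.u.emerg.code = src_cpu_addr;
    return ioctl(vcpu_fd, KVM_S390_IRQ, &irq);
}
```
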
4.94 KVM_S390_GET_IRQ_STATE

Capability: KVM_CAP_S390_IRQ_STATE
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq_state (out)
Returns: >= 0: number of bytes copied into buffer,
         -EINVAL if buffer size is 0,
         -ENOBUFS if buffer size is too small to fit all pending interrupts,
         -EFAULT if the buffer address was invalid

This ioctl allows userspace to retrieve the complete state of all currently
pending interrupts in a single buffer. Use cases include migration
and introspection. The parameter structure contains the address of a
userspace buffer and its length:

struct kvm_s390_irq_state {
    __u64 buf;
    __u32 flags;        /* will stay unused for compatibility reasons */
    __u32 len;
    __u32 reserved[4];  /* will stay unused for compatibility reasons */
};

Userspace passes in the above struct and for each pending interrupt a
struct kvm_s390_irq is copied to the provided buffer.

The structure contains a flags and a reserved field for future extensions. As
the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and
reserved, these fields cannot be used in the future without breaking
compatibility.

If -ENOBUFS is returned, the buffer provided was too small and userspace
may retry with a bigger buffer.

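Because -ENOBUFS invites a retry, saving interrupt state naturally takes the form of a grow-and-retry loop. The following sketch assumes <linux/kvm.h> and an s390 vcpu fd. The initial size, the cap, and the helper name are arbitrary choices for illustration. At the C level, ioctl() reports the errors listed above as -1 with errno set.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: save all pending interrupts of a vcpu. On success
 * returns a malloc'd array (caller frees) and stores the entry count in
 * *count; on failure returns NULL with errno set.
 */
struct kvm_s390_irq *s390_save_irqs(int vcpu_fd, size_t *count)
{
    for (size_t n = 8; n <= 4096; n *= 2) {
        struct kvm_s390_irq *buf = calloc(n, sizeof(*buf));
        struct kvm_s390_irq_state st;
        int r, err;

        if (!buf)
            return NULL;
        memset(&st, 0, sizeof(st));               /* flags/reserved = 0 */
        st.buf = (__u64)(uintptr_t)buf;
        st.len = (__u32)(n * sizeof(*buf));
        r = ioctl(vcpu_fd, KVM_S390_GET_IRQ_STATE, &st);
        if (r >= 0) {
            *count = (size_t)r / sizeof(*buf);    /* r is a byte count */
            return buf;
        }
        err = errno;
        free(buf);
        if (err != ENOBUFS) {                     /* only ENOBUFS is retried */
            errno = err;
            return NULL;
        }
    }
    return NULL;
}
```
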
4.95 KVM_S390_SET_IRQ_STATE

Capability: KVM_CAP_S390_IRQ_STATE
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq_state (in)
Returns: 0 on success,
         -EFAULT if the buffer address was invalid,
         -EINVAL for an invalid buffer length (see below),
         -EBUSY if there were already interrupts pending,
         errors occurring when actually injecting the
         interrupt. See KVM_S390_IRQ.

This ioctl allows userspace to set the complete state of all cpu-local
interrupts currently pending for the vcpu. It is intended for restoring
interrupt state after a migration. The input parameter is a userspace buffer
containing a struct kvm_s390_irq_state:

struct kvm_s390_irq_state {
    __u64 buf;
    __u32 flags;        /* will stay unused for compatibility reasons */
    __u32 len;
    __u32 reserved[4];  /* will stay unused for compatibility reasons */
};

The restrictions on flags and reserved described for KVM_S390_GET_IRQ_STATE
apply here as well.

The userspace memory referenced by buf contains a struct kvm_s390_irq
for each interrupt to be injected into the guest.
If one of the interrupts cannot be injected for some reason, the
ioctl aborts.

len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
because max_vcpus + 32 is the maximum number of cpu-local interrupts
that can be pending.

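Userspace can check the length rules above before issuing the ioctl. The hypothetical helper below takes sizeof(struct kvm_s390_irq) and the VM's vcpu limit as parameters rather than assuming their values.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check of the len field for KVM_S390_SET_IRQ_STATE:
 * non-zero, a whole number of irq entries, and at most max_vcpus + 32 of them.
 */
bool s390_irq_state_len_ok(size_t len, size_t irq_size, size_t max_vcpus)
{
    if (len == 0 || len % irq_size != 0)
        return false;
    return len <= (max_vcpus + 32) * irq_size;
}
```
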
4.96 KVM_SMI

Capability: KVM_CAP_X86_SMM
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an SMI on the thread's vcpu.

4.97 KVM_CAP_PPC_MULTITCE

Capability: KVM_CAP_PPC_MULTITCE
Architectures: ppc
Type: vm

This capability means the kernel is capable of handling hypercalls
H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
space. This significantly accelerates DMA operations for PPC KVM guests.
User space should expect that its handlers for these hypercalls
are not going to be called if user space previously registered the LIOBN
in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).

In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
user space might have to advertise it for the guest. For example, an
IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
present in the "ibm,hypertas-functions" device-tree property.

The hypercalls mentioned above may or may not be processed successfully
in the kernel-based fast path. If they cannot be handled by the kernel,
they will get passed on to user space. So user space still has to have
an implementation for these despite the in-kernel acceleration.

This capability is always enabled.

4.98 KVM_CREATE_SPAPR_TCE_64

Capability: KVM_CAP_SPAPR_TCE_64
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce_64 (in)
Returns: file descriptor for manipulating the created TCE table

This is an extension for KVM_CAP_SPAPR_TCE, which only supports 32bit
windows, described in 4.62 KVM_CREATE_SPAPR_TCE.

This capability uses an extended struct in the ioctl interface:

/* for KVM_CAP_SPAPR_TCE_64 */
struct kvm_create_spapr_tce_64 {
    __u64 liobn;
    __u32 page_shift;
    __u32 flags;
    __u64 offset;   /* in pages */
    __u64 size;     /* in pages */
};

The aim of this extension is to support an additional bigger DMA window with
a variable page size.
KVM_CREATE_SPAPR_TCE_64 receives a 64bit window size, an IOMMU page shift and
a bus offset of the corresponding DMA window; @size and @offset are numbers
of IOMMU pages.

@flags are not used at the moment.

The rest of the functionality is identical to KVM_CREATE_SPAPR_TCE.

4.99 KVM_REINJECT_CONTROL

Capability: KVM_CAP_REINJECT_CONTROL
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_reinject_control (in)
Returns: 0 on success,
         -EFAULT if struct kvm_reinject_control cannot be read,
         -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.

i8254 (PIT) has two modes, reinject and !reinject. The default is reinject,
where KVM queues elapsed i8254 ticks and monitors completion of interrupts
from the vector(s) that the i8254 injects. Reinject mode dequeues a tick and
injects its interrupt whenever there isn't a pending interrupt from the i8254.
!reinject mode injects an interrupt as soon as a tick arrives.

struct kvm_reinject_control {
    __u8 pit_reinject;
    __u8 reserved[31];
};

pit_reinject = 0 (!reinject mode) is recommended, unless running an old
operating system that uses the PIT for timing (e.g. Linux 2.4.x).

4.100 KVM_PPC_CONFIGURE_V3_MMU

Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_mmuv3_cfg (in)
Returns: 0 on success,
         -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
         -EINVAL if the configuration is invalid

This ioctl controls whether the guest will use radix or HPT (hashed
page table) translation, and sets the pointer to the process table for
the guest.

struct kvm_ppc_mmuv3_cfg {
    __u64 flags;
    __u64 process_table;
};

There are two bits that can be set in flags: KVM_PPC_MMUV3_RADIX and
KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
to use radix tree translation, and if clear, to use HPT translation.
KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
to be able to use the global TLB and SLB invalidation instructions;
if clear, the guest may not use these instructions.

The process_table field specifies the address and size of the guest
process table, which is in the guest's space. This field is formatted
as the second doubleword of the partition table entry, as defined in
the Power ISA V3.00, Book III section 5.7.6.1.

4.101 KVM_PPC_GET_RMMU_INFO

Capability: KVM_CAP_PPC_RADIX_MMU
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_rmmu_info (out)
Returns: 0 on success,
         -EFAULT if struct kvm_ppc_rmmu_info cannot be written,
         -EINVAL if no useful information can be returned

This ioctl returns a structure containing two things: (a) a list
containing supported radix tree geometries, and (b) a list that maps
page sizes to put in the "AP" (actual page size) field for the tlbie
(TLB invalidate entry) instruction.

struct kvm_ppc_rmmu_info {
    struct kvm_ppc_radix_geom {
        __u8 page_shift;
        __u8 level_bits[4];
        __u8 pad[3];
    } geometries[8];
    __u32 ap_encodings[8];
};

The geometries[] field gives up to 8 supported geometries for the
radix page table, in terms of the log base 2 of the smallest page
size, and the number of bits indexed at each level of the tree, from
the PTE level up to the PGD level in that order. Any unused entries
will have 0 in the page_shift field.

The ap_encodings gives the supported page sizes and their AP field
encodings, encoded with the AP value in the top 3 bits and the log
base 2 of the page size in the bottom 6 bits.

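Decoding an ap_encodings[] word is a matter of masking, following the layout above: the AP value sits in the top 3 bits of the 32-bit word and the log2 page size in the bottom 6 bits. The helpers below are hypothetical. Treating an all-zero word as an unused slot is an assumption of this sketch; the text guarantees that convention only for geometries[].

```c
#include <stdint.h>

/* Hypothetical decoders for one ap_encodings[] word. */
unsigned rmmu_ap_value(uint32_t enc)
{
    return enc >> 29;            /* AP field: top 3 bits */
}

unsigned rmmu_page_shift(uint32_t enc)
{
    return enc & 0x3f;           /* log2(page size): bottom 6 bits */
}

/* Returns the AP value for page_shift, or -1 if that size is not listed. */
int rmmu_ap_for_shift(const uint32_t enc[8], unsigned page_shift)
{
    for (int i = 0; i < 8; i++)
        if (enc[i] != 0 && rmmu_page_shift(enc[i]) == page_shift)
            return (int)rmmu_ap_value(enc[i]);
    return -1;
}
```

For example, a word of 0xA0000010 decodes to AP 5 for 64K pages (page shift 16).
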
4.102 KVM_PPC_RESIZE_HPT_PREPARE

Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
         >0 if a new HPT is being prepared, the value is an estimated
            number of milliseconds until preparation is complete
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
         -EINVAL if the supplied shift or flags are invalid
         -ENOMEM if unable to allocate the new HPT
         -ENOSPC if there was a hash collision when moving existing
                 HPT entries to the new HPT
         -EIO on other error conditions

Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this starts, stops or monitors
the preparation of a new potential HPT for the guest, essentially
implementing the H_RESIZE_HPT_PREPARE hypercall.

If called with shift > 0 when there is no pending HPT for the guest,
this begins preparation of a new pending HPT of size 2^(shift) bytes.
It then returns a positive integer with the estimated number of
milliseconds until preparation is complete.

If called when there is a pending HPT whose size does not match that
requested in the parameters, discards the existing pending HPT and
creates a new one as above.

If called when there is a pending HPT of the size requested, will:
  * If preparation of the pending HPT is already complete, return 0
  * If preparation of the pending HPT has failed, return an error
    code, then discard the pending HPT.
  * If preparation of the pending HPT is still in progress, return an
    estimated number of milliseconds until preparation is complete.

If called with shift == 0, discards any currently pending HPT and
returns 0 (i.e. cancels any in-progress preparation).

flags is reserved for future expansion; currently setting any bits in
flags will result in an -EINVAL.

Normally this will be called repeatedly with the same parameters until
it returns <= 0. The first call will initiate preparation, subsequent
ones will monitor preparation until it completes or fails.

struct kvm_ppc_resize_hpt {
    __u64 flags;
    __u32 shift;
    __u32 pad;
};

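The repeated-call protocol above can be written as a polling loop that sleeps for the kernel's estimate between calls. The following sketch assumes <linux/kvm.h> and a VM fd. The name hpt_prepare_wait is hypothetical. Only after the loop returns 0, and once the guest is quiescent, would KVM_PPC_RESIZE_HPT_COMMIT be issued with the same parameters.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <errno.h>
#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: start (or continue) preparing an HPT of 2^shift
 * bytes and poll until preparation completes or fails.
 * Returns 0 when the pending HPT is ready, or -1 with errno set.
 */
int hpt_prepare_wait(int vm_fd, __u32 shift)
{
    struct kvm_ppc_resize_hpt rhpt;
    int r;

    memset(&rhpt, 0, sizeof(rhpt));     /* flags must be 0, pad unused */
    rhpt.shift = shift;

    /* > 0: still preparing; the value is an estimate in milliseconds. */
    while ((r = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt)) > 0) {
        struct timespec ts = {
            .tv_sec  = r / 1000,
            .tv_nsec = (long)(r % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
    }
    return r;
}
```
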
4.103 KVM_PPC_RESIZE_HPT_COMMIT

Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
         -EINVAL if the supplied shift or flags are invalid
         -ENXIO if there is no pending HPT, or the pending HPT doesn't
                have the requested size
         -EBUSY if the pending HPT is not fully prepared
         -ENOSPC if there was a hash collision when moving existing
                 HPT entries to the new HPT
         -EIO on other error conditions

Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this requests that the guest be
transferred to working with the new HPT, essentially implementing the
H_RESIZE_HPT_COMMIT hypercall.

This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
returned 0 with the same parameters. In other cases
KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
-EBUSY, though others may be possible if the preparation was started,
but failed).

This will have undefined effects on the guest if it has not already
placed itself in a quiescent state where no vcpu will make MMU enabled
memory accesses.

On successful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.

On failure, the guest will still be operating on its previous HPT.

struct kvm_ppc_resize_hpt {
    __u64 flags;
    __u32 shift;
    __u32 pad;
};

4.104 KVM_X86_GET_MCE_CAP_SUPPORTED

Capability: KVM_CAP_MCE
Architectures: x86
Type: system ioctl
Parameters: u64 mce_cap (out)
Returns: 0 on success, -1 on error

Returns supported MCE capabilities. The u64 mce_cap parameter
has the same format as the MSR_IA32_MCG_CAP register. Supported
capabilities will have the corresponding bits set.

33574.105 KVM_X86_SETUP_MCE
3358
3359Capability: KVM_CAP_MCE
3360Architectures: x86
3361Type: vcpu ioctl
3362Parameters: u64 mcg_cap (in)
3363Returns: 0 on success,
3364 -EFAULT if u64 mcg_cap cannot be read,
3365 -EINVAL if the requested number of banks is invalid,
3366 -EINVAL if requested MCE capability is not supported.
3367
3368Initializes MCE support for use. The u64 mcg_cap parameter
3369has the same format as the MSR_IA32_MCG_CAP register and
3370specifies which capabilities should be enabled. The maximum
3371supported number of error-reporting banks can be retrieved when
3372checking for KVM_CAP_MCE. The supported capabilities can be
3373retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED.
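
The two MCE ioctls above are typically used together when a vcpu is set up. The sketch below assumes the Intel SDM layout of MCG_CAP, where bits 7:0 hold the bank count. compose_mcg_cap() and setup_vcpu_mce() are hypothetical helper names. The sketch keeps every capability bit that KVM reports as supported, and caps the requested bank count at the maximum returned by KVM_CHECK_EXTENSION(KVM_CAP_MCE).

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* MCG_CAP[7:0] is the number of error-reporting banks (Intel SDM). */
#define MCG_CAP_BANK_MASK 0xffULL

/* Value for KVM_X86_SETUP_MCE: supported capability bits plus
 * min(wanted_banks, max_banks) in the bank-count field. */
static uint64_t compose_mcg_cap(uint64_t supported, unsigned max_banks,
				unsigned wanted_banks)
{
	unsigned banks = wanted_banks < max_banks ? wanted_banks : max_banks;

	return (supported & ~MCG_CAP_BANK_MASK) | (banks & MCG_CAP_BANK_MASK);
}

/* Hypothetical helper: query /dev/kvm (kvmfd) and enable MCE on vcpufd.
 * Returns 0 on success or a negative errno value. */
static int setup_vcpu_mce(int kvmfd, int vcpufd, unsigned wanted_banks)
{
	uint64_t supported, mcg_cap;
	int max_banks;

	/* Checking for KVM_CAP_MCE reports the maximum number of banks. */
	max_banks = ioctl(kvmfd, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
	if (max_banks < 0)
		return -errno;
	if (max_banks == 0)
		return -ENOTSUP;	/* MCE emulation not available */

	if (ioctl(kvmfd, KVM_X86_GET_MCE_CAP_SUPPORTED, &supported) < 0)
		return -errno;

	mcg_cap = compose_mcg_cap(supported, (unsigned)max_banks, wanted_banks);
	if (ioctl(vcpufd, KVM_X86_SETUP_MCE, &mcg_cap) < 0)
		return -errno;
	return 0;
}
```

For example, setup_vcpu_mce(kvm_fd, vcpu_fd, 10) requests ten banks, or fewer if KVM supports fewer. Enabling every supported capability bit is a policy choice. A stricter VMM could mask the supported value further.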

4.106 KVM_X86_SET_MCE

Capability: KVM_CAP_MCE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_x86_mce (in)
Returns: 0 on success,
         -EFAULT if struct kvm_x86_mce cannot be read,
         -EINVAL if the bank number is invalid,
         -EINVAL if VAL bit is not set in status field.

Inject a machine check error (MCE) into the guest. The input
parameter is:

struct kvm_x86_mce {
	__u64 status;
	__u64 addr;
	__u64 misc;
	__u64 mcg_status;
	__u8 bank;
	__u8 pad1[7];
	__u64 pad2[3];
};

If the MCE being reported is an uncorrected error, KVM will
inject it as an MCE exception into the guest. If the guest
MCG_STATUS register reports that an MCE is in progress, KVM
causes a KVM_EXIT_SHUTDOWN vmexit.

Otherwise, if the MCE is a corrected error, KVM will just
store it in the corresponding bank (provided this bank is
not holding a previously reported uncorrected error).
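
The sketch below fills struct kvm_x86_mce for an uncorrected memory error and injects it. The IA32_MCi_STATUS and IA32_MCG_STATUS bit positions come from the Intel SDM and are defined locally for this sketch. fill_uncorrected_mce() and inject_mce() are hypothetical names. For brevity, the model-specific error code in the low bits of status is left at 0.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Intel SDM bit positions, defined locally for this sketch. */
#define MCI_STATUS_VAL   (1ULL << 63)	/* status register is valid */
#define MCI_STATUS_UC    (1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)	/* error reporting enabled */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* addr field is valid */
#define MCG_STATUS_RIPV  (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_MCIP  (1ULL << 2)	/* machine check in progress */

/* Hypothetical helper: describe an uncorrected error at guest physical
 * address 'gpa', reported through error-reporting bank 'bank'. */
static void fill_uncorrected_mce(struct kvm_x86_mce *mce, uint8_t bank,
				 uint64_t gpa)
{
	memset(mce, 0, sizeof(*mce));	/* also clears the padding */
	mce->status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
		      MCI_STATUS_ADDRV;	/* VAL is mandatory, see above */
	mce->addr = gpa;
	mce->mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
	mce->bank = bank;
}

/* Returns 0 on success or a negative errno value. */
static int inject_mce(int vcpufd, const struct kvm_x86_mce *mce)
{
	return ioctl(vcpufd, KVM_X86_SET_MCE, mce) < 0 ? -errno : 0;
}
```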

4.107 KVM_S390_GET_CMMA_BITS

Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in, out)
Returns: 0 on success, a negative value on error

This ioctl is used to get the values of the CMMA bits on the s390
architecture. It is meant to be used in two scenarios:
- During live migration to save the CMMA values. Live migration needs
  to be enabled via the KVM_REQ_START_MIGRATION VM property.
- To non-destructively peek at the CMMA values, with the flag
  KVM_S390_CMMA_PEEK set.

The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
values are written to a buffer whose location is indicated via the "values"
member in the kvm_s390_cmma_log struct. The values in the input struct are
also updated as needed.
Each CMMA value takes up one byte.

struct kvm_s390_cmma_log {
	__u64 start_gfn;
	__u32 count;
	__u32 flags;
	union {
		__u64 remaining;
		__u64 mask;
	};
	__u64 values;
};

start_gfn is the number of the first guest frame whose CMMA values are
to be retrieved.

count is the length of the buffer in bytes.

values points to the buffer where the result will be written.

If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
other ioctls.

The result is written to the buffer pointed to by the field values, and
the values of the input parameter are updated as follows.

Depending on the flags, different actions are performed. The only
supported flag so far is KVM_S390_CMMA_PEEK.

The default behaviour if KVM_S390_CMMA_PEEK is not set is:
start_gfn will indicate the first page frame whose CMMA bits were dirty.
It is not necessarily the same as the one passed as input, as clean pages
are skipped.

count will indicate the number of bytes actually written in the buffer.
It can (and very often will) be smaller than the input value, since the
buffer is only filled until 16 bytes of clean values are found (which
are then not copied in the buffer). Since a CMMA migration block needs
the base address and the length, for a total of 16 bytes, we will send
back some clean data if there is some dirty data afterwards, as long as
the size of the clean data does not exceed the size of the header. This
minimizes the amount of data to be saved or transferred over the
network at the expense of more roundtrips to userspace. The next
invocation of the ioctl will skip over all the clean values, saving
potentially more than just the 16 bytes we found.

If KVM_S390_CMMA_PEEK is set:
the existing storage attributes are read even when not in migration
mode, and no other action is performed;

the output start_gfn will be equal to the input start_gfn,

the output count will be equal to the input count, except if the end of
memory has been reached.

In both cases:
the field "remaining" will indicate the total number of dirty CMMA values
still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
not enabled.

mask is unused.

values points to the userspace buffer where the result will be stored.

This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
-EFAULT if the userspace address is invalid or if no page table is
present for the addresses (e.g. when using hugepages).
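
The default (non-PEEK) mode can be driven by a loop like the one sketched below. It assumes migration mode has already been enabled. The callback type and helper names are hypothetical. Each call resumes at the frame after the last one returned (output start_gfn plus output count), because the output start_gfn may differ from the input. A pass stops when KVM reports no remaining dirty values or returns an empty buffer. If "remaining" is still non-zero at that point, the caller can start another pass.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical callback: receives 'count' one-byte CMMA values for the
 * guest frames starting at 'start_gfn'. */
typedef void (*cmma_sink_fn)(uint64_t start_gfn, const uint8_t *vals,
			     uint32_t count, void *opaque);

/* Frame at which the next KVM_S390_GET_CMMA_BITS call should resume. */
static uint64_t cmma_next_gfn(const struct kvm_s390_cmma_log *log)
{
	return log->start_gfn + log->count;
}

/* One pass over the dirty CMMA values (migration mode must be on).
 * Returns the number of dirty values KVM still reports, or -errno. */
static int64_t cmma_save_pass(int vmfd, uint8_t *buf, uint32_t bufsize,
			      cmma_sink_fn sink, void *opaque)
{
	struct kvm_s390_cmma_log log = { .start_gfn = 0 };

	for (;;) {
		log.count = bufsize;		/* buffer length in bytes */
		log.flags = 0;			/* not KVM_S390_CMMA_PEEK */
		log.values = (uint64_t)(uintptr_t)buf;

		if (ioctl(vmfd, KVM_S390_GET_CMMA_BITS, &log) < 0)
			return -errno;
		if (log.count == 0)
			break;			/* nothing more from this point */
		/* Output start_gfn is the first dirty frame in 'buf'. */
		if (sink)
			sink(log.start_gfn, buf, log.count, opaque);
		if (log.remaining == 0)
			break;
		log.start_gfn = cmma_next_gfn(&log);
	}
	return (int64_t)log.remaining;
}
```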

4.108 KVM_S390_SET_CMMA_BITS

Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in)
Returns: 0 on success, a negative value on error

This ioctl is used to set the values of the CMMA bits on the s390
architecture. It is meant to be used during live migration to restore
the CMMA values, but there are no restrictions on its use.
The ioctl takes parameters via the kvm_s390_cmma_log struct.
Each CMMA value takes up one byte.

struct kvm_s390_cmma_log {
	__u64 start_gfn;
	__u32 count;
	__u32 flags;
	union {
		__u64 remaining;
		__u64 mask;
	};
	__u64 values;
};

start_gfn indicates the starting guest frame number.

count indicates how many values are to be considered in the buffer.

flags is not used and must be 0.

mask indicates which PGSTE bits are to be considered.

remaining is not used.

values points to the userspace buffer from which the values are read.

This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
if the flags field was not 0, with -EFAULT if the userspace address is
invalid, if invalid pages are written to (e.g. after the end of memory)
or if no page table is present for the addresses (e.g. when using
hugepages).
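
On the restore side, values saved with KVM_S390_GET_CMMA_BITS can be written back with a sketch like the one below. cmma_set_args() and cmma_restore() are hypothetical helpers. The mask value is left to the caller, because this section does not say which PGSTE bits a given migration stream carries.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Build the argument for KVM_S390_SET_CMMA_BITS. 'mask' selects which
 * PGSTE bits are taken from 'vals' (choice of mask is an assumption). */
static struct kvm_s390_cmma_log cmma_set_args(uint64_t start_gfn,
					      const uint8_t *vals,
					      uint32_t count, uint64_t mask)
{
	struct kvm_s390_cmma_log log = {
		.start_gfn = start_gfn,
		.count = count,		/* one byte per guest frame */
		.flags = 0,		/* must be 0 */
		.mask = mask,		/* shares storage with 'remaining' */
		.values = (uint64_t)(uintptr_t)vals,
	};
	return log;
}

/* Returns 0 on success or a negative errno value. */
static int cmma_restore(int vmfd, uint64_t start_gfn, const uint8_t *vals,
			uint32_t count, uint64_t mask)
{
	struct kvm_s390_cmma_log log = cmma_set_args(start_gfn, vals, count, mask);

	return ioctl(vmfd, KVM_S390_SET_CMMA_BITS, &log) < 0 ? -errno : 0;
}
```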

4.109 KVM_PPC_GET_CPU_CHAR

Capability: KVM_CAP_PPC_GET_CPU_CHAR
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_cpu_char (out)
Returns: 0 on successful completion,
         -EFAULT if struct kvm_ppc_cpu_char cannot be written

This ioctl gives userspace information about certain characteristics
of the CPU relating to speculative execution of instructions and
possible information leakage resulting from speculative execution (see
CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is
returned in struct kvm_ppc_cpu_char, which looks like this:

struct kvm_ppc_cpu_char {
	__u64	character;		/* characteristics of the CPU */
	__u64	behaviour;		/* recommended software behaviour */
	__u64	character_mask;		/* valid bits in character */
	__u64	behaviour_mask;		/* valid bits in behaviour */
};

For extensibility, the character_mask and behaviour_mask fields
indicate which bits of character and behaviour have been filled in by
the kernel. If the set of defined bits is extended in the future,
userspace will be able to tell whether it is running on a kernel that
knows about the new bits.

The character field describes attributes of the CPU which can help
with preventing inadvertent information disclosure - specifically,
whether there is an instruction to flash-invalidate the L1 data cache
(ori 30,30,0 or mtspr SPRN_TRIG2,rN), whether the L1 data cache is set
to a mode where entries can only be used by the thread that created
them, whether the bcctr[l] instruction prevents speculation, and
whether a speculation barrier instruction (ori 31,31,0) is provided.

The behaviour field describes actions that software should take to
prevent inadvertent information disclosure, and thus describes which
vulnerabilities the hardware is subject to; specifically whether the
L1 data cache should be flushed when returning to user mode from the
kernel, and whether a speculation barrier should be placed between an
array bounds check and the array access.

These fields use the same bit definitions as the new
H_GET_CPU_CHARACTERISTICS hypercall.

4.110 KVM_MEMORY_ENCRYPT_OP

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: an opaque platform specific structure (in/out)
Returns: 0 on success; -1 on error

If the platform supports creating encrypted VMs then this ioctl can be used
for issuing platform-specific memory encryption commands to manage those
encrypted VMs.

Currently, this ioctl is used for issuing Secure Encrypted Virtualization
(SEV) commands on AMD processors. The SEV commands are defined in
Documentation/virtual/kvm/amd-memory-encryption.rst.

4.111 KVM_MEMORY_ENCRYPT_REG_REGION

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_enc_region (in)
Returns: 0 on success; -1 on error

This ioctl can be used to register a guest memory region which may
contain encrypted data (e.g. guest RAM, SMRAM etc).

It is used in SEV-enabled guests. When encryption is enabled, a guest
memory region may contain encrypted data. The SEV memory encryption
engine uses a tweak such that two identical plaintext pages, each at
a different location, will have differing ciphertexts. So swapping or
moving the ciphertext of those pages will not result in the plaintext
being swapped. Relocating (or migrating) the physical backing pages of
an SEV guest will therefore require some additional steps.

Note: The current SEV key management spec does not provide commands to
swap or migrate (move) ciphertext pages. Hence, for now we pin the guest
memory region registered with the ioctl.

4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_enc_region (in)
Returns: 0 on success; -1 on error

This ioctl can be used to unregister the guest memory region registered
with the KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
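
A minimal sketch of registering, and later unregistering, a guest RAM mapping on the VM file descriptor follows. enc_region(), enc_register() and enc_unregister() are hypothetical helpers. The sketch assumes that addr is the host virtual address of the mapping. It also assumes that unregistration must name exactly the range that was registered.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Describe a host mapping of guest memory that may hold encrypted data. */
static struct kvm_enc_region enc_region(void *hva, uint64_t size)
{
	struct kvm_enc_region r = {
		.addr = (uint64_t)(uintptr_t)hva,	/* assumed: host virtual address */
		.size = size,
	};
	return r;
}

/* Register the range; KVM pins it while it stays registered. */
static int enc_register(int vmfd, void *hva, uint64_t size)
{
	struct kvm_enc_region r = enc_region(hva, size);

	return ioctl(vmfd, KVM_MEMORY_ENCRYPT_REG_REGION, &r) < 0 ? -errno : 0;
}

/* Assumed to require the same addr/size pair that was registered. */
static int enc_unregister(int vmfd, void *hva, uint64_t size)
{
	struct kvm_enc_region r = enc_region(hva, size);

	return ioctl(vmfd, KVM_MEMORY_ENCRYPT_UNREG_REGION, &r) < 0 ? -errno : 0;
}
```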

4.113 KVM_HYPERV_EVENTFD

Capability: KVM_CAP_HYPERV_EVENTFD
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_hyperv_eventfd (in)

This ioctl (un)registers an eventfd to receive notifications from the guest on
the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
causing a user exit. A SIGNAL_EVENT hypercall with a non-zero event flag number
(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.

struct kvm_hyperv_eventfd {
	__u32 conn_id;
	__s32 fd;
	__u32 flags;
	__u32 padding[3];
};

The conn_id field should fit within 24 bits:

#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff

The acceptable values for the flags field are:

#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)

Returns: 0 on success,
         -EINVAL if conn_id or flags is outside the allowed range,
         -ENOENT on deassign if the conn_id isn't registered,
         -EEXIST on assign if the conn_id is already registered
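
The sketch below binds a newly created eventfd to a connection id. It checks the 24-bit limit before calling into the kernel. hv_eventfd() and hv_bind_conn() are hypothetical helpers. KVM_HYPERV_CONN_ID_MASK and KVM_HYPERV_EVENTFD_DEASSIGN are the definitions shown above, assumed to come from <linux/kvm.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assign (deassign == 0) or deassign an eventfd for a Hyper-V connection
 * id. Returns 0 on success or a negative errno value. */
static int hv_eventfd(int vmfd, uint32_t conn_id, int fd, int deassign)
{
	struct kvm_hyperv_eventfd hvevfd = {
		.conn_id = conn_id,
		.fd = fd,
		.flags = deassign ? KVM_HYPERV_EVENTFD_DEASSIGN : 0,
	};	/* padding is zero-initialized */

	/* Reject ids wider than 24 bits without a system call. */
	if (conn_id & ~KVM_HYPERV_CONN_ID_MASK)
		return -EINVAL;
	return ioctl(vmfd, KVM_HYPERV_EVENTFD, &hvevfd) < 0 ? -errno : 0;
}

/* Create an eventfd and assign it to conn_id.
 * Returns the eventfd on success or a negative errno value. */
static int hv_bind_conn(int vmfd, uint32_t conn_id)
{
	int efd = eventfd(0, EFD_CLOEXEC);
	int ret;

	if (efd < 0)
		return -errno;
	ret = hv_eventfd(vmfd, conn_id, efd, 0);
	if (ret < 0) {
		close(efd);
		return ret;
	}
	return efd;
}
```

A VMM would then poll the returned descriptor. As with any eventfd in its default mode, each read() returns the number of notifications since the previous read and resets the counter.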

4.114 KVM_GET_NESTED_STATE

Capability: KVM_CAP_NESTED_STATE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_nested_state (in/out)
Returns: 0 on success, -1 on error
Errors:
  E2BIG:     the total state size (including the fixed-size part of struct
             kvm_nested_state) exceeds the value of 'size' specified by
             the user; the size required will be written into size.

struct kvm_nested_state {
	__u16 flags;
	__u16 format;
	__u32 size;
	union {
		struct kvm_vmx_nested_state vmx;
		struct kvm_svm_nested_state svm;
		__u8 pad[120];
	};
	__u8 data[0];
};

#define KVM_STATE_NESTED_GUEST_MODE	0x00000001
#define KVM_STATE_NESTED_RUN_PENDING	0x00000002

#define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
#define KVM_STATE_NESTED_SMM_VMXON	0x00000002

struct kvm_vmx_nested_state {
	__u64 vmxon_pa;
	__u64 vmcs_pa;

	struct {
		__u16 flags;
	} smm;
};

This ioctl copies the vcpu's nested virtualization state from the kernel to
userspace.

The maximum size of the state, including the fixed-size part of struct
kvm_nested_state, can be retrieved by passing KVM_CAP_NESTED_STATE to
the KVM_CHECK_EXTENSION ioctl().

4.115 KVM_SET_NESTED_STATE

Capability: KVM_CAP_NESTED_STATE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_nested_state (in)
Returns: 0 on success, -1 on error

This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For
the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.
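
Saving and restoring nested state, for example around migration, could look like the sketch below. save_nested_state() and restore_nested_state() are hypothetical helpers. The buffer is sized from KVM_CHECK_EXTENSION(KVM_CAP_NESTED_STATE), as described for KVM_GET_NESTED_STATE, and 'size' is set to the buffer length before the call.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Save the vcpu's nested state into a heap buffer of the maximum size.
 * Returns NULL with errno set on failure; the caller frees the result. */
static struct kvm_nested_state *save_nested_state(int kvmfd, int vcpufd)
{
	int max = ioctl(kvmfd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
	struct kvm_nested_state *st;

	if (max < 0)
		return NULL;
	if (max == 0) {
		errno = ENOTSUP;	/* nested state not supported */
		return NULL;
	}
	if ((size_t)max < sizeof(*st))
		max = (int)sizeof(*st);
	st = calloc(1, (size_t)max);
	if (!st)
		return NULL;
	st->size = (__u32)max;	/* in: buffer size; out: size actually used */
	if (ioctl(vcpufd, KVM_GET_NESTED_STATE, st) < 0) {
		int saved = errno;	/* E2BIG means 'size' was too small */

		free(st);
		errno = saved;
		return NULL;
	}
	return st;
}

/* Returns 0 on success or a negative errno value. */
static int restore_nested_state(int vcpufd, const struct kvm_nested_state *st)
{
	return ioctl(vcpufd, KVM_SET_NESTED_STATE, st) < 0 ? -errno : 0;
}
```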

4.116 KVM_(UN)REGISTER_COALESCED_MMIO

Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio)
            KVM_CAP_COALESCED_PIO (for coalesced pio)
Architectures: all
Type: vm ioctl
Parameters: struct kvm_coalesced_mmio_zone
Returns: 0 on success, < 0 on error

Coalesced I/O is a performance optimization that defers hardware
register write emulation so that userspace exits are avoided. It is
typically used to reduce the overhead of emulating frequently accessed
hardware registers.

When a hardware register is configured for coalesced I/O, write accesses
do not exit to userspace and their values are recorded in a ring buffer
that is shared between kernel and userspace.

Coalesced I/O is used if one or more write accesses to a hardware
register can be deferred until a read or a write to another hardware
register on the same device. This last access will cause a vmexit and
userspace will process accesses from the ring buffer before emulating
it. That will avoid exiting to userspace on repeated writes.

Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.
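
The sketch below registers a zone and drains the shared ring. It relies on several assumptions:
- The struct kvm_coalesced_mmio_zone, struct kvm_coalesced_mmio_ring and struct kvm_coalesced_mmio layouts come from <linux/kvm.h>.
- The ring is mmap()ed from the vcpu fd at the page offset returned by KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).
- The ring's capacity is the page size minus the ring header, divided by the entry size.

coalesce_zone() and drain_coalesced() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Register a coalesced zone. pio != 0 selects I/O ports (requires
 * KVM_CAP_COALESCED_PIO); otherwise 'addr' is a guest physical address. */
static int coalesce_zone(int vmfd, uint64_t addr, uint32_t size, int pio)
{
	struct kvm_coalesced_mmio_zone zone = {
		.addr = addr,
		.size = size,
		.pio = pio ? 1 : 0,
	};

	return ioctl(vmfd, KVM_REGISTER_COALESCED_MMIO, &zone) < 0 ? -errno : 0;
}

/* Consume queued writes oldest first. 'ring_entries' is the ring capacity
 * (assumed computed as described above); 'emulate' may be NULL.
 * Returns the number of entries consumed. */
static unsigned drain_coalesced(struct kvm_coalesced_mmio_ring *ring,
				unsigned ring_entries,
				void (*emulate)(const struct kvm_coalesced_mmio *))
{
	unsigned n = 0;

	while (ring->first != ring->last) {
		if (emulate)
			emulate(&ring->coalesced_mmio[ring->first]);
		/* Finish reading the entry before handing the slot back. */
		__sync_synchronize();
		ring->first = (ring->first + 1) % ring_entries;
		n++;
	}
	return n;
}
```

Per the description above, userspace drains the ring before it emulates the access that caused the vmexit, so deferred writes are replayed in order.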

5. The kvm_run structure
------------------------

Application code obtains a pointer to the kvm_run structure by
mmap()ing a vcpu fd. From that point, application code can control
execution by changing fields in kvm_run prior to calling the KVM_RUN
ioctl, and obtain information about the reason KVM_RUN returned by
looking up structure members.

struct kvm_run {
	/* in */
	__u8 request_interrupt_window;

Request that KVM_RUN return when it becomes possible to inject external
interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.

	__u8 immediate_exit;

This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
exits immediately, returning -EINTR. In the common scenario where a
signal is used to "kick" a VCPU out of KVM_RUN, this field can be used
to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
Rather than blocking the signal outside KVM_RUN, userspace can set up
a signal handler that sets run->immediate_exit to a non-zero value.

This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.

	__u8 padding1[6];

	/* out */
	__u32 exit_reason;

When KVM_RUN has returned successfully (return value 0), this informs
application code why KVM_RUN has returned. Allowable values for this
field are detailed below.

	__u8 ready_for_interrupt_injection;

If request_interrupt_window has been specified, this field indicates
an interrupt can be injected now with KVM_INTERRUPT.

	__u8 if_flag;

The value of the current interrupt flag. Only valid if in-kernel
local APIC is not used.

	__u16 flags;

More architecture-specific flags detailing state of the VCPU that may
affect the device's behavior. The only currently defined flag is
KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
VCPU is in system management mode.

	/* in (pre_kvm_run), out (post_kvm_run) */
	__u64 cr8;

The value of the cr8 register. Only valid if in-kernel local APIC is
not used. Both input and output.

	__u64 apic_base;

The value of the APIC BASE msr. Only valid if in-kernel local
APIC is not used. Both input and output.

	union {
		/* KVM_EXIT_UNKNOWN */
		struct {
			__u64 hardware_exit_reason;
		} hw;

If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
reasons. Further architecture-specific information is available in
hardware_exit_reason.

		/* KVM_EXIT_FAIL_ENTRY */
		struct {
			__u64 hardware_entry_failure_reason;
		} fail_entry;

If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
to unknown reasons. Further architecture-specific information is
available in hardware_entry_failure_reason.

		/* KVM_EXIT_EXCEPTION */
		struct {
			__u32 exception;
			__u32 error_code;
		} ex;

Unused.

		/* KVM_EXIT_IO */
		struct {
#define KVM_EXIT_IO_IN  0
#define KVM_EXIT_IO_OUT 1
			__u8 direction;
			__u8 size; /* bytes */
			__u16 port;
			__u32 count;
			__u64 data_offset; /* relative to kvm_run start */
		} io;

If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.

		/* KVM_EXIT_DEBUG */
		struct {
			struct kvm_debug_exit_arch arch;
		} debug;

If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
for which architecture specific information is returned.

		/* KVM_EXIT_MMIO */
		struct {
			__u64 phys_addr;
			__u8  data[8];
			__u32 len;
			__u8  is_write;
		} mmio;

If exit_reason is KVM_EXIT_MMIO, then the vcpu has
executed a memory-mapped I/O instruction which could not be satisfied
by kvm. The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.

The 'data' member contains, in its first 'len' bytes, the value as it would
appear if the VCPU performed a load or store of the appropriate width directly
to the byte array.

NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR_HCALL and
      KVM_EXIT_EPR the corresponding operations are complete (and guest state
      is consistent) only after userspace has re-entered the kernel with
      KVM_RUN. The kernel side will first finish incomplete operations and
      then check for pending signals. Userspace can re-enter the guest with
      an unmasked signal pending to complete pending operations.

		/* KVM_EXIT_HYPERCALL */
		struct {
			__u64 nr;
			__u64 args[6];
			__u64 ret;
			__u32 longmode;
			__u32 pad;
		} hypercall;

Unused. This was once used for 'hypercall to userspace'. To implement
such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.

		/* KVM_EXIT_TPR_ACCESS */
		struct {
			__u64 rip;
			__u32 is_write;
			__u32 pad;
		} tpr_access;

To be documented (KVM_TPR_ACCESS_REPORTING).

		/* KVM_EXIT_S390_SIEIC */
		struct {
			__u8 icptcode;
			__u64 mask; /* psw upper half */
			__u64 addr; /* psw lower half */
			__u16 ipa;
			__u32 ipb;
		} s390_sieic;

s390 specific.

		/* KVM_EXIT_S390_RESET */
#define KVM_S390_RESET_POR       1
#define KVM_S390_RESET_CLEAR     2
#define KVM_S390_RESET_SUBSYSTEM 4
#define KVM_S390_RESET_CPU_INIT  8
#define KVM_S390_RESET_IPL       16
		__u64 s390_reset_flags;

s390 specific.

		/* KVM_EXIT_S390_UCONTROL */
		struct {
			__u64 trans_exc_code;
			__u32 pgm_code;
		} s390_ucontrol;

s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z Architecture
Principles of Operation Book in the Chapter for Dynamic Address Translation
(DAT).

		/* KVM_EXIT_DCR */
		struct {
			__u32 dcrn;
			__u32 data;
			__u8  is_write;
		} dcr;

Deprecated - was used for 440 KVM.

		/* KVM_EXIT_OSI */
		struct {
			__u64 gprs[32];
		} osi;

MOL (Mac-on-Linux) uses a special hypercall interface it calls 'OSI'. To
enable it, we catch hypercalls and exit with this exit struct that contains
all the guest gprs.

If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.

		/* KVM_EXIT_PAPR_HCALL */
		struct {
			__u64 nr;
			__u64 ret;
			__u64 args[9];
		} papr_hcall;

This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu. It occurs when the
guest does a hypercall using the 'sc 1' instruction. The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12). Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).

		/* KVM_EXIT_S390_TSCH */
		struct {
			__u16 subchannel_id;
			__u16 subchannel_nr;
			__u32 io_int_parm;
			__u32 io_int_word;
			__u32 ipb;
			__u8 dequeued;
		} s390_tsch;

s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
interrupt for the target subchannel has been dequeued and subchannel_id,
subchannel_nr, io_int_parm and io_int_word contain the parameters for that
interrupt. ipb is needed for instruction parameter decoding.

		/* KVM_EXIT_EPR */
		struct {
			__u32 epr;
		} epr;

On FSL BookE PowerPC chips, the interrupt controller has a fast interrupt
acknowledge path to the core. When the core successfully delivers an
interrupt, it automatically populates the EPR register with the interrupt
vector number and acknowledges the interrupt inside the interrupt
controller.

If the interrupt controller lives in user space, this exit is used to run
the interrupt acknowledge cycle through it and fetch the vector of the next
interrupt to be delivered.

It is triggered whenever KVM_CAP_PPC_EPR is enabled and an external
interrupt has just been delivered into the guest. User space should put
the acknowledged interrupt vector into the 'epr' field.

		/* KVM_EXIT_SYSTEM_EVENT */
		struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN       1
#define KVM_SYSTEM_EVENT_RESET          2
#define KVM_SYSTEM_EVENT_CRASH          3
			__u32 type;
			__u64 flags;
		} system_event;

If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
a system-level event using some architecture specific mechanism (hypercall
or some special instruction). On ARM/ARM64, this is triggered by a PSCI
call made with the HVC instruction from the vcpu. The 'type' field describes
the system-level event type. The 'flags' field describes architecture
specific flags for the system-level event.

Valid values for 'type' are:
  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
   VM. Userspace is not obliged to honour this, and if it does honour
   this, it does not need to destroy the VM synchronously (i.e. it may call
   KVM_RUN again before shutdown finally occurs).
  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
   As with SHUTDOWN, userspace can choose to ignore the request, or
   to schedule the reset to occur in the future and may call KVM_RUN again.
  KVM_SYSTEM_EVENT_CRASH -- a guest crash has occurred and the guest
   has requested that the crash be handled. Userspace can choose
   to ignore the request, or to gather a VM memory core dump and/or
   reset or shut down the VM.

		/* KVM_EXIT_IOAPIC_EOI */
		struct {
			__u8 vector;
		} eoi;

Indicates that the VCPU's in-kernel local APIC received an EOI for a
level-triggered IOAPIC interrupt. This exit only triggers when the
IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
the userspace IOAPIC should process the EOI and retrigger the interrupt if
it is still asserted. Vector is the LAPIC interrupt vector for which the
EOI was received.

		struct kvm_hyperv_exit {
#define KVM_EXIT_HYPERV_SYNIC          1
#define KVM_EXIT_HYPERV_HCALL          2
			__u32 type;
			union {
				struct {
					__u32 msr;
					__u64 control;
					__u64 evt_page;
					__u64 msg_page;
				} synic;
				struct {
					__u64 input;
					__u64 result;
					__u64 params[2];
				} hcall;
			} u;
		};
		/* KVM_EXIT_HYPERV */
		struct kvm_hyperv_exit hyperv;

Indicates that the VCPU exits into userspace to process some tasks
related to Hyper-V emulation.
Valid values for 'type' are:
  KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
   Hyper-V SynIC state change. Notification is used to remap SynIC
   event/message pages and to enable/disable SynIC messages/events
   processing in userspace.

		/* Fix the size of the union. */
		char padding[256];
	};

	/*
	 * shared registers between kvm and userspace.
	 * kvm_valid_regs specifies the register classes set by the host
	 * kvm_dirty_regs specifies the register classes dirtied by userspace
	 * struct kvm_sync_regs is architecture specific, as well as the
	 * bits for kvm_valid_regs and kvm_dirty_regs
	 */
	__u64 kvm_valid_regs;
	__u64 kvm_dirty_regs;
	union {
		struct kvm_sync_regs regs;
		char padding[SYNC_REGS_SIZE_BYTES];
	} s;

If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
certain guest registers without having to call SET/GET_*REGS. Thus we can
avoid some system call overhead if userspace has to handle the exit.
Userspace can query the validity of the structure by checking
kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers (e.g. one bit
for general purpose registers).

Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

};
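
The following sketch ties several kvm_run fields together. It has three parts:
- a signal handler that kicks the vcpu through immediate_exit;
- a KVM_RUN wrapper that clears the flag after the resulting -EINTR;
- a helper that locates KVM_EXIT_IO data through data_offset.

The names, the global kick_run pointer, and the choice to answer unhandled port reads with 0xff are illustrative assumptions, not part of the API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical global: this thread's mmap()ed kvm_run, for the handler. */
static struct kvm_run *volatile kick_run;

/* A signal during KVM_RUN already forces an exit; setting immediate_exit
 * also covers a signal that arrives just before KVM_RUN is entered. */
static void kick_handler(int sig)
{
	(void)sig;
	if (kick_run)
		kick_run->immediate_exit = 1;
}

static int install_kick_handler(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = kick_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL) < 0 ? -errno : 0;
}

/* KVM_EXIT_IO data: a packed array of io.count elements of io.size bytes,
 * located data_offset bytes from the start of kvm_run. */
static uint8_t *io_data(struct kvm_run *run)
{
	return (uint8_t *)run + run->io.data_offset;
}

/* Illustrative policy: unhandled port reads return all ones, writes are
 * dropped. */
static void handle_io(struct kvm_run *run)
{
	if (run->io.direction == KVM_EXIT_IO_IN)
		memset(io_data(run), 0xff, (size_t)run->io.count * run->io.size);
}

/* Enter the guest once; returns exit_reason, or a negative errno value. */
static int run_once(int vcpufd, struct kvm_run *run)
{
	if (ioctl(vcpufd, KVM_RUN, 0) < 0) {
		int err = errno;

		if (err == EINTR)
			run->immediate_exit = 0;	/* kick consumed */
		return -err;
	}
	return (int)run->exit_reason;
}
```

Clearing immediate_exit only after the -EINTR return means a kick that lands between two KVM_RUN calls is not lost.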


6. Capabilities that can be enabled on vCPUs
--------------------------------------------

There are certain capabilities that change the behavior of the virtual CPU or
the virtual machine when enabled. To enable them, please see section 4.37.
Below you can find a list of capabilities and what their effect on the vCPU or
the virtual machine is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this capability.
      x86 includes both i386 and x86_64.

  Target: whether this is a per-vcpu or per-vm capability.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.
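
The capabilities below are turned on with the KVM_ENABLE_CAP ioctl from section 4.37. A small wrapper keeps callers from forgetting to zero the unused fields. It is sketched here under the assumption that struct kvm_enable_cap from <linux/kvm.h> has the cap, flags and args[4] members. enable_cap() is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Enable 'cap' on a vcpu (or VM) fd with up to four arguments.
 * Returns 0 on success or a negative errno value. */
static int enable_cap(int fd, uint32_t cap, const uint64_t *args,
		      unsigned nargs)
{
	struct kvm_enable_cap ec;

	memset(&ec, 0, sizeof(ec));	/* flags and padding stay 0 */
	ec.cap = cap;
	for (unsigned i = 0; i < nargs && i < 4; i++)
		ec.args[i] = args[i];
	return ioctl(fd, KVM_ENABLE_CAP, &ec) < 0 ? -errno : 0;
}
```

For instance, enable_cap(vcpu_fd, KVM_CAP_PPC_PAPR, NULL, 0) would enable PAPR hypercall interception. For KVM_CAP_SW_TLB, args[0] would carry the address of the struct kvm_config_tlb.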
4150
414fa985 4151
821246a5
AG
41526.1 KVM_CAP_PPC_OSI
4153
4154Architectures: ppc
0907c855 4155Target: vcpu
821246a5
AG
4156Parameters: none
4157Returns: 0 on success; -1 on error
4158
4159This capability enables interception of OSI hypercalls that otherwise would
4160be treated as normal system calls to be injected into the guest. OSI hypercalls
4161were invented by Mac-on-Linux to have a standardized communication mechanism
4162between the guest and the host.
4163
4164When this capability is enabled, KVM_EXIT_OSI can occur.
4165
414fa985 4166
821246a5
AG
41676.2 KVM_CAP_PPC_PAPR
4168
4169Architectures: ppc
0907c855 4170Target: vcpu
821246a5
AG
4171Parameters: none
4172Returns: 0 on success; -1 on error
4173
4174This capability enables interception of PAPR hypercalls. PAPR hypercalls are
4175done using the hypercall instruction "sc 1".
4176
4177It also sets the guest privilege level to "supervisor" mode. Usually the guest
4178runs in "hypervisor" privilege mode with a few missing features.
4179
4180In addition to the above, it changes the semantics of SDR1. In this mode, the
4181HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
4182HTAB invisible to the guest.
4183
4184When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
dc83b8bc 4185
414fa985 4186
dc83b8bc
SW
6.3 KVM_CAP_SW_TLB

Architectures: ppc
Target: vcpu
Parameters: args[0] is the address of a struct kvm_config_tlb
Returns: 0 on success; -1 on error

struct kvm_config_tlb {
	__u64 params;
	__u64 array;
	__u32 mmu_type;
	__u32 array_len;
};

Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM.  The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures.  The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array.  It must be at least the size dictated
by "mmu_type" and "params".

While KVM_RUN is active, the shared region is under control of KVM.  Its
contents are undefined, and any modification by userspace results in
boundedly undefined behavior.

On return from KVM_RUN, the shared region will reflect the current state of
the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
to tell KVM which entries have been changed, prior to calling KVM_RUN again
on this vcpu.

For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
 - The "array" field points to an array of type "struct
   kvm_book3e_206_tlb_entry".
 - The array consists of all entries in the first TLB, followed by all
   entries in the second TLB.
 - Within a TLB, entries are ordered first by increasing set number.  Within a
   set, entries are ordered by way (increasing ESEL).
 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
   hardware ignores this value for TLB0.
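
The ordering and hashing rules above can be sketched in C as follows. This is an illustration only: the helper names are hypothetical, tlb_sizes/tlb_ways stand for the per-TLB values from struct kvm_book3e_206_tlb_params (index 0 for TLB0, index 1 for TLB1), and num_sets is assumed to be a power of two, as the mask in the hash requires.

```c
#include <stdint.h>

/* Set number of a TLB0 entry: (MAS2 >> 12) & (num_sets - 1). */
static uint32_t tlb0_set_number(uint64_t mas2, uint32_t tlb0_size,
				uint32_t tlb0_ways)
{
	uint32_t num_sets = tlb0_size / tlb0_ways;   /* assumed power of 2 */

	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/*
 * Position of an entry in the shared array: all TLB0 entries come first,
 * then all TLB1 entries; within a TLB, entries are ordered by set number,
 * then by way (ESEL).
 */
static uint32_t tlb_array_index(unsigned int tlb, uint32_t set, uint32_t esel,
				const uint32_t tlb_sizes[2],
				const uint32_t tlb_ways[2])
{
	uint32_t base = (tlb == 0) ? 0 : tlb_sizes[0];

	return base + set * tlb_ways[tlb] + esel;
}
```

For example, with a 512-entry, 4-way TLB0 (128 sets) and a 64-entry fully associative TLB1, the first TLB1 entry sits at array index 512.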

6.4 KVM_CAP_S390_CSS_SUPPORT

Architectures: s390
Target: vcpu
Parameters: none
Returns: 0 on success; -1 on error

This capability enables support for handling of channel I/O instructions.

TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
handled in-kernel, while the other I/O instructions are passed to userspace.

When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
SUBCHANNEL intercepts.

Note that even though this capability is enabled per-vcpu, the complete
virtual machine is affected.

6.5 KVM_CAP_PPC_EPR

Architectures: ppc
Target: vcpu
Parameters: args[0] defines whether the proxy facility is active
Returns: 0 on success; -1 on error

This capability enables or disables the delivery of interrupts through the
external proxy facility.

When enabled (args[0] != 0), every time the guest gets an external interrupt
delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
to receive the topmost interrupt vector.

When disabled (args[0] == 0), behavior is as if this facility is unsupported.

When this capability is enabled, KVM_EXIT_EPR can occur.

6.6 KVM_CAP_IRQ_MPIC

Architectures: ppc
Target: vcpu
Parameters: args[0] is the MPIC device fd
            args[1] is the MPIC CPU number for this vcpu

This capability connects the vcpu to an in-kernel MPIC device.

6.7 KVM_CAP_IRQ_XICS

Architectures: ppc
Target: vcpu
Parameters: args[0] is the XICS device fd
            args[1] is the XICS CPU number (server ID) for this vcpu

This capability connects the vcpu to an in-kernel XICS device.
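
As a sketch (assuming <linux/kvm.h>; both helper names are hypothetical), connecting a vcpu to an in-kernel MPIC or XICS device means passing the device fd and the per-vcpu CPU number (the server ID, for XICS) in args[0] and args[1] of struct kvm_enable_cap, and issuing KVM_ENABLE_CAP on the vcpu fd. The device fd is assumed to be the one obtained when the in-kernel interrupt controller device was created.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fill in the argument for KVM_CAP_IRQ_MPIC or KVM_CAP_IRQ_XICS. */
static void fill_irq_connect_cap(struct kvm_enable_cap *ec, __u32 cap,
				 int device_fd, __u32 cpu_number)
{
	memset(ec, 0, sizeof(*ec));
	ec->cap = cap;
	ec->args[0] = (__u64)device_fd;   /* in-kernel MPIC or XICS device fd */
	ec->args[1] = cpu_number;         /* MPIC CPU number / XICS server ID */
}

/* Connect vcpu_fd to the device; returns 0, or -1 with errno set. */
static int connect_vcpu_irq(int vcpu_fd, __u32 cap, int device_fd,
			    __u32 cpu_number)
{
	struct kvm_enable_cap ec;

	fill_irq_connect_cap(&ec, cap, device_fd, cpu_number);
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &ec);
}
```

Splitting the argument setup from the ioctl keeps the argument layout checkable without a running VM.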

6.8 KVM_CAP_S390_IRQCHIP

Architectures: s390
Target: vm
Parameters: none

This capability enables the in-kernel irqchip for s390. Please refer to
"4.24 KVM_CREATE_IRQCHIP" for details.

6.9 KVM_CAP_MIPS_FPU

Architectures: mips
Target: vcpu
Parameters: args[0] is reserved for future use (should be 0).

This capability allows the use of the host Floating Point Unit by the guest. It
allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is
done, the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed
(depending on the current guest FPU register mode), and the Status.FR and
Config5.FRE bits are accessible via the KVM API and also from the guest,
provided they are supported by the FPU.

6.10 KVM_CAP_MIPS_MSA

Architectures: mips
Target: vcpu
Parameters: args[0] is reserved for future use (should be 0).

This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest.
It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest.
Once this is done, the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
the guest.

6.11 KVM_CAP_SYNC_REGS

Architectures: s390, x86
Target: s390: always enabled, x86: vcpu
Parameters: none
Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
         sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h).

As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
without having to call SET/GET_*REGS". This reduces overhead by eliminating
repeated ioctl calls for setting and/or getting register values. This is
particularly important when userspace is making synchronous guest state
modifications, e.g. when emulating and/or intercepting instructions in
userspace.

For s390 specifics, please refer to the source code.

For x86:
- the register sets to be copied out to kvm_run are selectable
  by userspace (rather than all sets being copied out for every exit).
- vcpu_events are available in addition to regs and sregs.

For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to
function as an input bit-array field set by userspace to indicate the
specific register sets to be copied out on the next exit.

To indicate when userspace has modified values that should be copied into
the vCPU, the 'kvm_dirty_regs' bit-array field, which is common to all
architectures, must be set.  This is done using the same bitflags as for the
'kvm_valid_regs' field.  If the dirty bit is not set, then the register set
values will not be copied into the vCPU even if they've been modified.

Unused bitfields in the bitarrays must be set to zero.

struct kvm_sync_regs {
	struct kvm_regs regs;
	struct kvm_sregs sregs;
	struct kvm_vcpu_events events;
};
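
A minimal sketch of the x86 flow, assuming an x86 host whose <linux/kvm.h> provides KVM_SYNC_X86_REGS and the other KVM_SYNC_X86_* flags, and a struct kvm_run that userspace has mmap()ed from the vcpu fd. The helper names are hypothetical. The sketch asks KVM to copy out only the register sets it supports, then updates RIP in place and marks the regs set dirty so that the next KVM_RUN picks up the change.

```c
#include <linux/kvm.h>

/*
 * Select the register sets KVM copies into run->s.regs on the next exit.
 * 'supported' is the KVM_CHECK_EXTENSION(KVM_CAP_SYNC_REGS) result; masking
 * with it keeps unsupported (unused) bits zero.
 */
static void request_sync_regs(struct kvm_run *run, __u64 wanted,
			      __u64 supported)
{
	run->kvm_valid_regs = wanted & supported;
}

/*
 * Change RIP through the shared area instead of KVM_GET/SET_REGS.  The whole
 * kvm_regs set is copied back into the vCPU, so this is only meaningful if
 * the regs set was copied out on the last exit.
 */
static void set_guest_rip(struct kvm_run *run, __u64 rip)
{
	run->s.regs.regs.rip = rip;
	run->kvm_dirty_regs |= KVM_SYNC_X86_REGS;
}
```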

7. Capabilities that can be enabled on VMs
------------------------------------------

There are certain capabilities that change the behavior of the virtual
machine when enabled. To enable them, please see section 4.37. Below
you can find a list of capabilities and what their effect on the VM
is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this capability.
                 x86 includes both i386 and x86_64.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
           are not detailed, but errors with specific meanings are.


7.1 KVM_CAP_PPC_ENABLE_HCALL

Architectures: ppc
Parameters: args[0] is the sPAPR hcall number
            args[1] is 0 to disable, 1 to enable in-kernel handling

This capability controls whether individual sPAPR hypercalls (hcalls)
get handled by the kernel or not.  Enabling or disabling in-kernel
handling of an hcall is effective across the VM.  On creation, an
initial set of hcalls is enabled for in-kernel handling, which
consists of those hcalls for which in-kernel handlers were implemented
before this capability was implemented.  If disabled, the kernel will
not attempt to handle the hcall, but will always exit to userspace
to handle it.  Note that it may not make sense to enable some and
disable others of a group of related hcalls, but KVM does not prevent
userspace from doing that.

If the hcall number specified is not one that has an in-kernel
implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
error.
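
A minimal sketch of toggling in-kernel handling of one hcall on a VM fd, assuming <linux/kvm.h>. The helper name set_hcall_in_kernel is hypothetical, and the hcall number is left to the caller (for instance, the H_RANDOM number when enabling the handler described under KVM_CAP_PPC_HWRNG).

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Enable (enable != 0) or disable in-kernel handling of one sPAPR hcall for
 * the whole VM.  Returns 0, or -1 with errno set; per the text above, errno
 * is EINVAL if the kernel has no in-kernel handler for that hcall.
 */
static int set_hcall_in_kernel(int vm_fd, __u64 hcall, int enable)
{
	struct kvm_enable_cap ec;

	memset(&ec, 0, sizeof(ec));
	ec.cap = KVM_CAP_PPC_ENABLE_HCALL;
	ec.args[0] = hcall;            /* sPAPR hcall number */
	ec.args[1] = enable ? 1 : 0;   /* 1: handle in kernel, 0: exit to userspace */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &ec);
}
```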

7.2 KVM_CAP_S390_USER_SIGP

Architectures: s390
Parameters: none

This capability controls which SIGP orders will be handled completely in user
space. With this capability enabled, all fast orders will be handled completely
in the kernel:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL

All other orders will be handled completely in user space.

Only privileged operation exceptions will be checked for in the kernel (or even
in the hardware prior to interception). If this capability is not enabled, the
old way of handling SIGP orders is used (partially in kernel and user space).

7.3 KVM_CAP_S390_VECTOR_REGISTERS

Architectures: s390
Parameters: none
Returns: 0 on success, negative value on error

Allows use of the vector registers introduced with the z13 processor, and
provides for the synchronization between host and user space.  Will
return -EINVAL if the machine does not support vectors.

7.4 KVM_CAP_S390_USER_STSI

Architectures: s390
Parameters: none

This capability allows post-handlers for the STSI instruction. After
initial handling in the kernel, KVM exits to user space with
KVM_EXIT_S390_STSI to allow user space to insert further data.

Before exiting to userspace, kvm handlers should fill in the s390_stsi field
of vcpu->run:
struct {
	__u64 addr;
	__u8 ar;
	__u8 reserved;
	__u8 fc;
	__u8 sel1;
	__u16 sel2;
} s390_stsi;

@addr - guest address of STSI SYSIB
@fc   - function code
@sel1 - selector 1
@sel2 - selector 2
@ar   - access register number

KVM handlers should exit to userspace with rc = -EREMOTE.

7.5 KVM_CAP_SPLIT_IRQCHIP

Architectures: x86
Parameters: args[0] - number of routes reserved for userspace IOAPICs
Returns: 0 on success, -1 on error

Creates a local APIC for each processor in the kernel. This can be used
instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the
IOAPIC and PIC (and also the PIT, even though this has to be enabled
separately).

This capability also enables in-kernel routing of interrupt requests;
when KVM_CAP_SPLIT_IRQCHIP is enabled, only routes of type
KVM_IRQ_ROUTING_MSI are used in the IRQ routing table. The first args[0]
MSI routes are reserved for the IOAPIC pins. Whenever the LAPIC receives
an EOI for these routes, a KVM_EXIT_IOAPIC_EOI vmexit will be reported to
userspace.

Fails if a VCPU has already been created, or if the irqchip is already in
the kernel (i.e. KVM_CREATE_IRQCHIP has already been called).

7.6 KVM_CAP_S390_RI

Architectures: s390
Parameters: none

Allows use of runtime-instrumentation introduced with the zEC12 processor.
Will return -EINVAL if the machine does not support runtime-instrumentation.
Will return -EBUSY if a VCPU has already been created.

7.7 KVM_CAP_X2APIC_API

Architectures: x86
Parameters: args[0] - features that should be enabled
Returns: 0 on success, -EINVAL when args[0] contains invalid features

Valid feature flags in args[0] are

#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)

Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
allowing the use of 32-bit APIC IDs.  See KVM_CAP_X2APIC_API in their
respective sections.

KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK must be enabled for x2APIC to work
in logical mode or with more than 255 VCPUs.  Otherwise, KVM treats 0xff
as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping.  This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
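
The rules above can be expressed as two small userspace checks made before calling KVM_ENABLE_CAP. This is a sketch under stated assumptions: the helper names are hypothetical, and the flag values are the #defines listed above, guarded with #ifndef in case <linux/kvm.h> is also included. "Valid" here means only the two flags documented in this section.

```c
#include <stdint.h>
#include <stdbool.h>

#ifndef KVM_X2APIC_API_USE_32BIT_IDS
#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
#endif
#ifndef KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
#endif

/* KVM_ENABLE_CAP fails with -EINVAL if args[0] has any other bits set. */
static bool x2apic_api_flags_valid(uint64_t flags)
{
	const uint64_t known = KVM_X2APIC_API_USE_32BIT_IDS |
			       KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK;

	return (flags & ~known) == 0;
}

/* The broadcast quirk must be disabled for logical mode or > 255 VCPUs. */
static bool x2apic_needs_quirk_disabled(unsigned int nr_vcpus,
					bool logical_mode)
{
	return logical_mode || nr_vcpus > 255;
}
```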

7.8 KVM_CAP_S390_USER_INSTR0

Architectures: s390
Parameters: none

With this capability enabled, all illegal instructions 0x0000 (2 bytes) will
be intercepted and forwarded to user space. User space can use this
mechanism e.g. to realize 2-byte software breakpoints. The kernel will
not inject an operating exception for these instructions; user space has
to take care of that.

This capability can be enabled dynamically even if VCPUs were already
created and are running.

7.9 KVM_CAP_S390_GS

Architectures: s390
Parameters: none
Returns: 0 on success; -EINVAL if the machine does not support
         guarded storage; -EBUSY if a VCPU has already been created.

Allows use of guarded storage for the KVM guest.

7.10 KVM_CAP_S390_AIS

Architectures: s390
Parameters: none
Returns: 0 on success; -EBUSY if a VCPU has already been created.

Allows use of adapter-interruption suppression.

7.11 KVM_CAP_PPC_SMT

Architectures: ppc
Parameters: vsmt_mode, flags

Enabling this capability on a VM provides userspace with a way to set
the desired virtual SMT mode (i.e. the number of virtual CPUs per
virtual core).  The virtual SMT mode, vsmt_mode, must be a power of 2
between 1 and 8.  On POWER8, vsmt_mode must also be no greater than
the number of threads per subcore for the host.  Currently flags must
be 0.  A successful call to enable this capability will result in
vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
subsequently queried for the VM.  This capability is only supported by
HV KVM, and can only be set before any VCPUs have been created.
The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
modes are available.
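
The constraints on vsmt_mode and flags can be checked in userspace before calling KVM_ENABLE_CAP. A minimal sketch: the helper name is hypothetical, and the POWER8 threads-per-subcore limit is passed in by the caller (0 meaning no extra limit), because how userspace learns that limit is not covered here.

```c
#include <stdbool.h>

/*
 * vsmt_mode must be a power of 2 between 1 and 8, and flags must currently
 * be 0.  max_threads: caller-supplied POWER8 threads-per-subcore limit, or
 * 0 when no such limit applies.
 */
static bool ppc_smt_args_valid(unsigned long vsmt_mode, unsigned long flags,
			       unsigned long max_threads)
{
	if (flags != 0)
		return false;
	if (vsmt_mode < 1 || vsmt_mode > 8)
		return false;
	if ((vsmt_mode & (vsmt_mode - 1)) != 0)   /* not a power of 2 */
		return false;
	if (max_threads != 0 && vsmt_mode > max_threads)
		return false;
	return true;
}
```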

7.12 KVM_CAP_PPC_FWNMI

Architectures: ppc
Parameters: none

With this capability, a machine check exception in the guest address
space will cause KVM to exit the guest with an NMI exit reason. This
enables QEMU to build an error log and branch to the machine check
handling routine registered by the guest kernel. Without this capability,
KVM will branch to the guest's 0x200 interrupt vector.

7.13 KVM_CAP_X86_DISABLE_EXITS

Architectures: x86
Parameters: args[0] defines which exits are disabled
Returns: 0 on success, -EINVAL when args[0] contains invalid exits

Valid bits in args[0] are

#define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)

Enabling this capability on a VM provides userspace with a way to stop
intercepting some instructions, for improved latency in some workloads;
it is suggested when vCPUs are associated with dedicated physical CPUs.
More bits can be added in the future; userspace can just pass the
KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable all such vmexits.

Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
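
A minimal sketch of choosing args[0]: start from the KVM_CHECK_EXTENSION result, as suggested above, and drop the HLT bit if the guest will be offered KVM_FEATURE_PV_UNHALT. The helper name is hypothetical; the bit values are the ones listed above, guarded with #ifndef in case <linux/kvm.h> is included as well.

```c
#include <stdint.h>
#include <stdbool.h>

#ifndef KVM_X86_DISABLE_EXITS_MWAIT
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#endif
#ifndef KVM_X86_DISABLE_EXITS_HLT
#define KVM_X86_DISABLE_EXITS_HLT   (1 << 1)
#endif

/*
 * supported: value returned by KVM_CHECK_EXTENSION(KVM_CAP_X86_DISABLE_EXITS).
 * HLT exits stay enabled when the guest may use PV unhalt.
 */
static uint64_t disable_exits_arg(uint64_t supported, bool guest_has_pv_unhalt)
{
	uint64_t arg = supported;

	if (guest_has_pv_unhalt)
		arg &= ~(uint64_t)KVM_X86_DISABLE_EXITS_HLT;
	return arg;
}
```

Passing the checked value through, rather than a hard-coded mask, lets bits added in future kernels be picked up automatically.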

7.14 KVM_CAP_S390_HPAGE_1M

Architectures: s390
Parameters: none
Returns: 0 on success, -EINVAL if hpage module parameter was not set
         or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL
         flag set

With this capability the KVM support for memory backing with 1M pages
through hugetlbfs can be enabled for a VM. After the capability is
enabled, cmma can't be enabled anymore and pfmfi and the storage key
interpretation are disabled. If cmma has already been enabled or the
hpage module parameter is not set to 1, -EINVAL is returned.

While it is generally possible to create a huge page backed VM without
this capability, the VM will not be able to run.

7.15 KVM_CAP_MSR_PLATFORM_INFO

Architectures: x86
Parameters: args[0] whether feature should be enabled or not

With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
a #GP would be raised when the guest tries to access it. Currently, this
capability does not give the guest write access to this MSR.

7.16 KVM_CAP_PPC_NESTED_HV

Architectures: ppc
Parameters: none
Returns: 0 on success, -EINVAL when the implementation doesn't support
         nested-HV virtualization.

HV-KVM on POWER9 and later systems allows for "nested-HV"
virtualization, which provides a way for a guest VM to run guests that
can run using the CPU's supervisor mode (privileged non-hypervisor
state).  Enabling this capability on a VM depends on the CPU having
the necessary functionality and on the facility being enabled with a
kvm-hv module parameter.

7.17 KVM_CAP_EXCEPTION_PAYLOAD

Architectures: x86
Parameters: args[0] whether feature should be enabled or not

With this capability enabled, CR2 will not be modified prior to the
emulated VM-exit when L1 intercepts a #PF exception that occurs in
L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
the emulated VM-exit when L1 intercepts a #DB exception that occurs in
L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
#DB) exception for L2, exception.has_payload will be set and the
faulting address (or the new DR6 bits*) will be reported in the
exception_payload field. Similarly, when userspace injects a #PF (or
#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
exception.has_payload and to put the faulting address (or the new DR6
bits*) in the exception_payload field.

This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.


* For the new DR6 bits, note that bit 16 is set iff the #DB exception
  will clear DR6.RTM.

8. Other capabilities.
----------------------

This section lists capabilities that give information about other
features of the KVM implementation.

8.1 KVM_CAP_PPC_HWRNG

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
H_RANDOM hypercall backed by a hardware random-number generator.
If present, the kernel H_RANDOM handler can be enabled for guest use
with the KVM_CAP_PPC_ENABLE_HCALL capability.

8.2 KVM_CAP_HYPERV_SYNIC

Architectures: x86

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
used to support Windows Hyper-V based guest paravirt drivers (VMBus).

In order to use SynIC, it has to be activated by setting this
capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
will disable the use of APIC hardware virtualization even if supported
by the CPU, as it's incompatible with SynIC auto-EOI behavior.

8.3 KVM_CAP_PPC_RADIX_MMU

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).

8.4 KVM_CAP_PPC_HASH_MMU_V3

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
hashed page table MMU defined in Power ISA V3.00 (as implemented in
the POWER9 processor), including in-memory segment tables.

8.5 KVM_CAP_MIPS_VZ

Architectures: mips

This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
it is available, means that full hardware assisted virtualization capabilities
of the hardware are available for use through KVM. An appropriate
KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
utilises it.

If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
available, it means that the VM is using full hardware assisted virtualization
capabilities of the hardware. This is useful to check after creating a VM with
KVM_VM_MIPS_DEFAULT.

The value returned by KVM_CHECK_EXTENSION should be compared against known
values (see below). All other values are reserved. This is to allow for the
possibility of other hardware assisted virtualization implementations which
may be incompatible with the MIPS VZ ASE.

 0: The trap & emulate implementation is in use to run guest code in user
    mode. Guest virtual memory segments are rearranged to fit the guest in the
    user mode address space.

 1: The MIPS VZ ASE is in use, providing full hardware assisted
    virtualization, including standard guest virtual memory segments.

8.6 KVM_CAP_MIPS_TE

Architectures: mips

This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
it is available, means that the trap & emulate implementation is available to
run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
to KVM_CREATE_VM to create a VM which utilises it.

If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
available, it means that the VM is using trap & emulate.

8.7 KVM_CAP_MIPS_64BIT

Architectures: mips

This capability indicates the supported architecture type of the guest, i.e. the
supported register and address width.

The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
be checked specifically against known values (see below). All other values are
reserved.

 0: MIPS32 or microMIPS32.
    Both registers and addresses are 32-bits wide.
    It will only be possible to run 32-bit guest code.

 1: MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
    Registers are 64-bits wide, but addresses are 32-bits wide.
    64-bit guest code may run but cannot access MIPS64 memory segments.
    It will also be possible to run 32-bit guest code.

 2: MIPS64 or microMIPS64 with access to all address segments.
    Both registers and addresses are 64-bits wide.
    It will be possible to run 64-bit or 32-bit guest code.
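
Because values other than 0, 1 and 2 are reserved, userspace should compare the result against the known values explicitly rather than, say, treating any non-zero value as "64-bit". A minimal sketch; the enum and helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for the KVM_CAP_MIPS_64BIT values listed above. */
enum mips_guest_width {
	MIPS_GUEST_32BIT        = 0,  /* 32-bit registers and addresses */
	MIPS_GUEST_64BIT_COMPAT = 1,  /* 64-bit registers, 32-bit segments only */
	MIPS_GUEST_64BIT_FULL   = 2,  /* 64-bit registers and addresses */
};

/* Reserved values must not be interpreted. */
static bool mips_64bit_known(int val)
{
	return val == MIPS_GUEST_32BIT || val == MIPS_GUEST_64BIT_COMPAT ||
	       val == MIPS_GUEST_64BIT_FULL;
}

/* Can 64-bit guest code run at all (possibly without MIPS64 segments)? */
static bool mips_can_run_64bit_code(int val)
{
	return val == MIPS_GUEST_64BIT_COMPAT || val == MIPS_GUEST_64BIT_FULL;
}

/* Can the guest access the MIPS64 memory segments? */
static bool mips_can_access_64bit_segments(int val)
{
	return val == MIPS_GUEST_64BIT_FULL;
}
```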

8.9 KVM_CAP_ARM_USER_IRQ

Architectures: arm, arm64

This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
that if userspace creates a VM without an in-kernel interrupt controller, it
will be notified of changes to the output level of in-kernel emulated devices,
which can generate virtual interrupts, presented to the VM.
For such VMs, on every return to userspace, the kernel
updates the vcpu's run->s.regs.device_irq_level field to represent the actual
output level of the device.

Whenever kvm detects a change in the device output level, kvm guarantees at
least one return to userspace before running the VM.  This exit could either
be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
userspace can always sample the device output level and re-compute the state of
the userspace interrupt controller.  Userspace should always check the state
of run->s.regs.device_irq_level on every kvm exit.
The value in run->s.regs.device_irq_level can represent both level and edge
triggered interrupt signals, depending on the device.  Edge triggered interrupt
signals will exit to userspace with the bit in run->s.regs.device_irq_level
set exactly once per edge signal.

The field run->s.regs.device_irq_level is available independent of
run->kvm_valid_regs or run->kvm_dirty_regs bits.

If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
number larger than 0 indicating the version of this capability that is
implemented and thereby which bits in run->s.regs.device_irq_level can signal
values.

Currently the following bits are defined for the device_irq_level bitmap:

  KVM_CAP_ARM_USER_IRQ >= 1:

    KVM_ARM_DEV_EL1_VTIMER -  EL1 virtual timer
    KVM_ARM_DEV_EL1_PTIMER -  EL1 physical timer
    KVM_ARM_DEV_PMU        -  ARM PMU overflow interrupt signal

Future versions of kvm may implement additional events. These will get
indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
listed above.
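
A minimal sketch of the userspace side: on every exit, compare the new device_irq_level with the value seen on the previous exit and feed the changed lines into the userspace interrupt controller. The bit values below are our reading of the arm64 uapi header (KVM_ARM_DEV_EL1_VTIMER = bit 0, KVM_ARM_DEV_EL1_PTIMER = bit 1, KVM_ARM_DEV_PMU = bit 2) and are guarded with #ifndef; treat them as assumptions and prefer the header's definitions. The struct and function names are hypothetical.

```c
#include <stdint.h>

#ifndef KVM_ARM_DEV_EL1_VTIMER
#define KVM_ARM_DEV_EL1_VTIMER (1 << 0)
#endif
#ifndef KVM_ARM_DEV_EL1_PTIMER
#define KVM_ARM_DEV_EL1_PTIMER (1 << 1)
#endif
#ifndef KVM_ARM_DEV_PMU
#define KVM_ARM_DEV_PMU        (1 << 2)
#endif

/* Hypothetical per-vcpu bookkeeping kept by the VMM. */
struct user_irq_state {
	uint64_t last_level;   /* device_irq_level seen on the previous exit */
};

/*
 * Call on every exit with run->s.regs.device_irq_level.  Returns the mask of
 * lines whose level changed since the previous exit; *raised receives the
 * lines that went from 0 to 1.
 */
static uint64_t user_irq_update(struct user_irq_state *st, uint64_t level,
				uint64_t *raised)
{
	uint64_t changed = st->last_level ^ level;

	*raised = changed & level;
	st->last_level = level;
	return changed;
}
```

For an edge-triggered device, a set bit already marks one edge, so a VMM might act on the bit itself rather than only on the 0-to-1 transition.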

8.10 KVM_CAP_PPC_SMT_POSSIBLE

Architectures: ppc

Querying this capability returns a bitmap indicating the possible
virtual SMT modes that can be set using KVM_CAP_PPC_SMT.  If bit N
(counting from the right) is set, then a virtual SMT mode of 2^N is
available.
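
Decoding the bitmap: a mode m is available when m is a power of two and bit log2(m) is set. A minimal sketch with a hypothetical helper name; combining it with the vsmt_mode constraints of KVM_CAP_PPC_SMT is left to the caller.

```c
#include <stdbool.h>

/* True if virtual SMT mode 'mode' is allowed by the SMT_POSSIBLE bitmap. */
static bool smt_mode_possible(unsigned long bitmap, unsigned long mode)
{
	unsigned int n = 0;

	if (mode == 0 || (mode & (mode - 1)) != 0)
		return false;              /* only powers of 2 are SMT modes */
	while ((1UL << n) != mode)
		n++;                       /* n = log2(mode) */
	return (bitmap >> n) & 1;
}
```

For example, a bitmap of 0xB (bits 0, 1 and 3 set) allows virtual SMT modes 1, 2 and 8, but not 4.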

8.11 KVM_CAP_HYPERV_SYNIC2

Architectures: x86

This capability enables a newer version of Hyper-V Synthetic interrupt
controller (SynIC).  The only difference from KVM_CAP_HYPERV_SYNIC is that KVM
doesn't clear SynIC message and event flags pages when they are enabled by
writing to the respective MSRs.

8.12 KVM_CAP_HYPERV_VP_INDEX

Architectures: x86

This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr.  Its
value is used to denote the target vcpu for a SynIC interrupt.  For
compatibility, KVM initializes this msr to KVM's internal vcpu index.  When this
capability is absent, userspace can still query this msr's value.

8.13 KVM_CAP_S390_AIS_MIGRATION

Architectures: s390
Parameters: none

This capability indicates if the flic device will be able to get/set the
AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
userspace to discover this without having to create a flic device.

8.14 KVM_CAP_S390_PSW

Architectures: s390

This capability indicates that the PSW is exposed via the kvm_run structure.

8.15 KVM_CAP_S390_GMAP

Architectures: s390

This capability indicates that the user space memory used as guest mapping can
be anywhere in the user memory address space, as long as the memory slots are
aligned and sized to a segment (1MB) boundary.
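
A userspace-side sanity check for the segment constraint, sketched with struct kvm_userspace_memory_region from <linux/kvm.h>. The text above does not spell out which fields are checked; this sketch assumes that both the userspace address and the size must be multiples of 1MB. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <linux/kvm.h>

#define S390_SEGMENT_SIZE (1ULL << 20)   /* 1MB segment, as stated above */

/* Check that a slot's userspace mapping starts and ends on a segment boundary. */
static bool s390_slot_segment_aligned(const struct kvm_userspace_memory_region *r)
{
	return (r->userspace_addr % S390_SEGMENT_SIZE) == 0 &&
	       (r->memory_size % S390_SEGMENT_SIZE) == 0;
}
```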

8.16 KVM_CAP_S390_COW

Architectures: s390

This capability indicates that the user space memory used as guest mapping can
use copy-on-write semantics as well as dirty pages tracking via read-only page
tables.

8.17 KVM_CAP_S390_BPB

Architectures: s390

This capability indicates that kvm will implement the interfaces to handle
reset, migration and nested KVM for branch prediction blocking. The stfle
facility 82 should not be provided to the guest without this capability.

8.18 KVM_CAP_HYPERV_TLBFLUSH

Architectures: x86

This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
hypercalls:
HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.

8.19 KVM_CAP_ARM_INJECT_SERROR_ESR

Architectures: arm, arm64

This capability indicates that userspace can specify (via the
KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
takes a virtual SError interrupt exception.
If KVM advertises this capability, userspace can only specify the ISS field for
the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the
CPU when the exception is taken. If this virtual SError is taken to EL1 using
AArch64, this value will be reported in the ISS field of ESR_ELx.

See KVM_CAP_VCPU_EVENTS for more details.

8.20 KVM_CAP_HYPERV_SEND_IPI

Architectures: x86

This capability indicates that KVM supports paravirtualized Hyper-V IPI send
hypercalls:
HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.