Commit | Line | Data |
---|---|---|
1f673135 | 1 | \input texinfo @c -*- texinfo -*- |
debc7065 FB |
2 | @c %**start of header |
3 | @setfilename qemu-tech.info | |
e080e785 SW |
4 | |
5 | @documentlanguage en | |
6 | @documentencoding UTF-8 | |
7 | ||
debc7065 FB |
8 | @settitle QEMU Internals |
9 | @exampleindent 0 | |
10 | @paragraphindent 0 | |
11 | @c %**end of header | |
1f673135 | 12 | |
a1a32b05 SW |
13 | @ifinfo |
14 | @direntry | |
15 | * QEMU Internals: (qemu-tech). The QEMU Emulator Internals. | |
16 | @end direntry | |
17 | @end ifinfo | |
18 | ||
1f673135 | 19 | @iftex |
1f673135 FB |
20 | @titlepage |
21 | @sp 7 | |
22 | @center @titlefont{QEMU Internals} | |
23 | @sp 3 | |
24 | @end titlepage | |
25 | @end iftex | |
26 | ||
debc7065 FB |
27 | @ifnottex |
28 | @node Top | |
29 | @top | |
30 | ||
31 | @menu | |
32 | * Introduction:: | |
33 | * QEMU Internals:: | |
34 | * Regression Tests:: | |
debc7065 FB |
35 | @end menu |
36 | @end ifnottex | |
37 | ||
38 | @contents | |
39 | ||
40 | @node Introduction | |
1f673135 FB |
41 | @chapter Introduction |
42 | ||
debc7065 | 43 | @menu |
3aeaea65 MF |
44 | * intro_x86_emulation:: x86 and x86-64 emulation |
45 | * intro_arm_emulation:: ARM emulation | |
46 | * intro_mips_emulation:: MIPS emulation | |
47 | * intro_ppc_emulation:: PowerPC emulation | |
48 | * intro_sparc_emulation:: Sparc32 and Sparc64 emulation | |
49 | * intro_xtensa_emulation:: Xtensa emulation | |
50 | * intro_other_emulation:: Other CPU emulation | |
debc7065 FB |
51 | @end menu |
52 | ||
debc7065 | 53 | @node intro_x86_emulation |
998a0501 | 54 | @section x86 and x86-64 emulation |
1f673135 FB |
55 | |
56 | QEMU x86 target features: | |
57 | ||
5fafdf24 | 58 | @itemize |
1f673135 | 59 | |
5fafdf24 | 60 | @item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. |
998a0501 BS |
61 | LDT/GDT and IDT are emulated. VM86 mode is also supported to run |
62 | DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3, | |
63 | and SSE4 as well as x86-64 SVM. | |
1f673135 FB |
64 | |
65 | @item Support of host page sizes bigger than 4KB in user mode emulation. | |
66 | ||
67 | @item QEMU can emulate itself on x86. | |
68 | ||
5fafdf24 | 69 | @item An extensive Linux x86 CPU test program is included in @file{tests/test-i386}.
1f673135 FB |
70 | It can be used to test other x86 virtual CPUs. |
71 | ||
72 | @end itemize | |
73 | ||
74 | Current QEMU limitations: | |
75 | ||
5fafdf24 | 76 | @itemize |
1f673135 | 77 | |
998a0501 | 78 | @item Limited x86-64 support. |
1f673135 FB |
79 | |
80 | @item IPC syscalls are missing. | |
81 | ||
5fafdf24 | 82 | @item The x86 segment limits and access rights are not tested at every |
1f673135 FB |
83 | memory access (yet). Fortunately, very few OSes seem to rely on that for |
84 | normal use. | |
85 | ||
1f673135 FB |
86 | @end itemize |
87 | ||
debc7065 | 88 | @node intro_arm_emulation |
1f673135 FB |
89 | @section ARM emulation |
90 | ||
91 | @itemize | |
92 | ||
93 | @item Full ARM 7 user emulation. | |
94 | ||
95 | @item NWFPE FPU support included in user Linux emulation. | |
96 | ||
97 | @item Can run most ARM Linux binaries. | |
98 | ||
99 | @end itemize | |
100 | ||
24d4de45 TS |
101 | @node intro_mips_emulation |
102 | @section MIPS emulation | |
103 | ||
104 | @itemize | |
105 | ||
106 | @item The system emulation allows full MIPS32/MIPS64 Release 2 emulation, | |
107 | including privileged instructions, FPU and MMU, in both little and big | |
108 | endian modes. | |
109 | ||
110 | @item The Linux userland emulation can run many 32 bit MIPS Linux binaries. | |
111 | ||
112 | @end itemize | |
113 | ||
114 | Current QEMU limitations: | |
115 | ||
116 | @itemize | |
117 | ||
118 | @item Self-modifying code is not always handled correctly. | |
119 | ||
120 | @item 64 bit userland emulation is not implemented. | |
121 | ||
122 | @item The system emulation is not complete enough to run real firmware. | |
123 | ||
b1f45238 TS |
124 | @item The watchpoint debug facility is not implemented. |
125 | ||
24d4de45 TS |
126 | @end itemize |
127 | ||
debc7065 | 128 | @node intro_ppc_emulation |
1f673135 FB |
129 | @section PowerPC emulation |
130 | ||
131 | @itemize | |
132 | ||
5fafdf24 | 133 | @item Full PowerPC 32 bit emulation, including privileged instructions, |
1f673135 FB |
134 | FPU and MMU. |
135 | ||
136 | @item Can run most PowerPC Linux binaries. | |
137 | ||
138 | @end itemize | |
139 | ||
debc7065 | 140 | @node intro_sparc_emulation |
998a0501 | 141 | @section Sparc32 and Sparc64 emulation |
1f673135 FB |
142 | |
143 | @itemize | |
144 | ||
f6b647cd | 145 | @item Full SPARC V8 emulation, including privileged |
3475187d | 146 | instructions, FPU and MMU. SPARC V9 emulation includes most privileged |
a785e42e | 147 | and VIS instructions, FPU and I/D MMU. Alignment is fully enforced. |
1f673135 | 148 | |
a785e42e BS |
149 | @item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and |
150 | some 64-bit SPARC Linux binaries. | |
3475187d FB |
151 | |
152 | @end itemize | |
153 | ||
154 | Current QEMU limitations: | |
155 | ||
5fafdf24 | 156 | @itemize |
3475187d | 157 | |
3475187d FB |
158 | @item IPC syscalls are missing. |
159 | ||
1f587329 | 160 | @item Floating point exception support is buggy. |
3475187d FB |
161 | |
162 | @item Atomic instructions are not correctly implemented. | |
163 | ||
998a0501 BS |
164 | @item There are still some problems with Sparc64 emulators. |
165 | ||
166 | @end itemize | |
167 | ||
3aeaea65 MF |
168 | @node intro_xtensa_emulation |
169 | @section Xtensa emulation | |
170 | ||
171 | @itemize | |
172 | ||
173 | @item Core Xtensa ISA emulation, including most options: code density, | |
174 | loop, extended L32R, 16- and 32-bit multiplication, 32-bit division, | |
044d003d MF |
175 | MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor |
176 | context, debug, multiprocessor synchronization, | |
3aeaea65 MF |
177 | conditional store, exceptions, relocatable vectors, unaligned exception, |
178 | interrupts (including high priority and timer), hardware alignment, | |
179 | region protection, region translation, MMU, windowed registers, thread | |
180 | pointer, processor ID. | |
181 | ||
044d003d MF |
182 | @item Options not implemented: data/instruction cache (including cache |
183 | prefetch and locking), XLMI, processor interface. Also options not | |
184 | covered by the core ISA (e.g. FLIX, wide branches) are not implemented. | |
3aeaea65 MF |
185 | |
186 | @item Can run most Xtensa Linux binaries. | |
187 | ||
188 | @item A new core configuration that requires no additional instructions | |
189 | may be created from an overlay with a minimal amount of hand-written code. | |
190 | ||
191 | @end itemize | |
192 | ||
998a0501 BS |
193 | @node intro_other_emulation |
194 | @section Other CPU emulation | |
1f673135 | 195 | |
998a0501 BS |
196 | In addition to the above, QEMU supports emulation of other CPUs with |
197 | varying levels of success. These are: | |
198 | ||
199 | @itemize | |
200 | ||
201 | @item | |
202 | Alpha | |
203 | @item | |
204 | CRIS | |
205 | @item | |
206 | M68k | |
207 | @item | |
208 | SH4 | |
1f673135 FB |
209 | @end itemize |
210 | ||
debc7065 | 211 | @node QEMU Internals |
1f673135 FB |
212 | @chapter QEMU Internals |
213 | ||
debc7065 FB |
214 | @menu |
215 | * QEMU compared to other emulators:: | |
216 | * Portable dynamic translation:: | |
debc7065 FB |
217 | * CPU state optimisations:: |
218 | * Translation cache:: | |
219 | * Direct block chaining:: | |
220 | * Self-modifying code and translated code invalidation:: | |
221 | * Exception support:: | |
222 | * MMU emulation:: | |
998a0501 | 223 | * Device emulation:: |
debc7065 FB |
224 | * Bibliography:: |
225 | @end menu | |
226 | ||
227 | @node QEMU compared to other emulators | |
1f673135 FB |
228 | @section QEMU compared to other emulators |
229 | ||
8e9620a6 | 230 | Like bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than |
1f673135 FB |
231 | bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC |
232 | emulation while QEMU can emulate several processors. | |
233 | ||
234 | Like Valgrind [2], QEMU does user space emulation and dynamic | |
235 | translation. Valgrind is mainly a memory debugger while QEMU has no | |
236 | support for it (QEMU could be used to detect out of bound memory | |
237 | accesses like Valgrind, but it has no support to track uninitialised data | |
238 | as Valgrind does). The Valgrind dynamic translator generates better code | |
239 | than QEMU (in particular it does register allocation) but it is closely | |
240 | tied to an x86 host and target and has no support for precise exceptions | |
241 | and system emulation. | |
242 | ||
8e9620a6 | 243 | EM86 [3] is the closest project to user space QEMU (and QEMU still uses |
1f673135 FB |
244 | some of its code, in particular the ELF file loader). EM86 was limited |
245 | to an alpha host and used a proprietary and slow interpreter (the | |
8e9620a6 | 246 | interpreter part of the FX!32 Digital Win32 code translator [4]). |
1f673135 | 247 | |
8e9620a6 TH |
248 | TWIN from Willows Software was a Windows API emulator like Wine. It is less |
249 | accurate than Wine but includes a protected mode x86 interpreter to launch | |
250 | x86 Windows executables. Such an approach has greater potential because most | |
251 | of the Windows API is executed natively but it is far more difficult to | |
252 | develop because all the data structures and function parameters exchanged | |
1f673135 FB |
253 | between the API and the x86 code must be converted. |
254 | ||
8e9620a6 | 255 | User mode Linux [5] was the only solution before QEMU to launch a |
1f673135 FB |
256 | Linux kernel as a process while not needing any host kernel |
257 | patches. However, user mode Linux requires heavy kernel patches while | |
258 | QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is | |
259 | slower. | |
260 | ||
8e9620a6 | 261 | The Plex86 [6] PC virtualizer is done in the same spirit as the now |
998a0501 BS |
262 | obsolete qemu-fast system emulator. It requires a patched Linux kernel |
263 | to work (you cannot launch the same kernel on your PC), but the | |
264 | patches are really small. As it is a PC virtualizer (no emulation is | |
265 | done except for some privileged instructions), it has the potential of | |
266 | being faster than QEMU. The downside is that a complicated (and | |
267 | potentially unsafe) host kernel patch is needed. | |
1f673135 | 268 | |
8e9620a6 TH |
269 | The commercial PC Virtualizers (VMWare [7], VirtualPC [8]) are faster |
270 | than QEMU (without virtualization), but they all need specific, proprietary | |
1f673135 FB |
271 | and potentially unsafe host drivers. Moreover, they are unable to |
272 | provide cycle exact simulation as an emulator can. | |
273 | ||
8e9620a6 TH |
274 | VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC |
275 | [12] uses QEMU to simulate a system where some hardware devices are | |
998a0501 BS |
276 | developed in SystemC. |
277 | ||
debc7065 | 278 | @node Portable dynamic translation |
1f673135 FB |
279 | @section Portable dynamic translation |
280 | ||
281 | QEMU is a dynamic translator. When it first encounters a piece of code, | |
282 | it converts it to the host instruction set. Usually dynamic translators | |
283 | are very complicated and highly CPU dependent. QEMU uses some tricks | |
284 | which make it relatively easily portable and simple while achieving good | |
285 | performance. | |
286 | ||
bf28a69e PB |
287 | QEMU's dynamic translation backend is called TCG, for "Tiny Code |
288 | Generator". For more information, please take a look at @code{tcg/README}. | |
1f673135 | 289 | |
debc7065 | 290 | @node CPU state optimisations |
1f673135 FB |
291 | @section CPU state optimisations |
292 | ||
998a0501 BS |
293 | The target CPUs have many internal states which change the way they |
294 | evaluate instructions. In order to achieve a good speed, the | |
295 | translation phase considers that some state information of the virtual | |
296 | CPU cannot change within a block. The state is recorded in the Translation | |
297 | Block (TB). If the state changes (e.g. privilege level), a new TB will | |
298 | be generated and the previous TB won't be used anymore until the state | |
299 | matches the state recorded in the previous TB. For example, if the SS, | |
300 | DS and ES segments have a zero base, then the translator does not even | |
301 | generate an addition for the segment base. | |
1f673135 FB |
302 | |
303 | [The FPU stack pointer register is not handled that way yet]. | |
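
The way recorded state selects a TB can be sketched in C. This is only an illustration of the idea, not QEMU's actual data structures: @code{struct tb_key}, the @code{STATE_*} bits and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits describing state assumed constant inside a TB. */
enum {
    STATE_CPL_MASK = 0x3,      /* privilege level, 0..3 */
    STATE_ADDSEG   = 1u << 2   /* set when SS, DS or ES has a non-zero base */
};

/* State captured when a block is translated (illustrative only). */
struct tb_key {
    uint64_t pc;       /* simulated program counter of the first instruction */
    uint64_t cs_base;  /* code segment base at translation time */
    uint32_t flags;    /* STATE_* bits assumed constant within the block */
};

/* Build the flags word from the current (hypothetical) CPU state. */
uint32_t make_state_flags(unsigned cpl, uint64_t ss_base,
                          uint64_t ds_base, uint64_t es_base)
{
    uint32_t flags = cpl & STATE_CPL_MASK;
    if (ss_base != 0 || ds_base != 0 || es_base != 0) {
        flags |= STATE_ADDSEG;
    }
    return flags;
}

/* A cached block may be reused only if every recorded field still matches. */
bool tb_key_matches(const struct tb_key *tb, uint64_t pc,
                    uint64_t cs_base, uint32_t flags)
{
    return tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags;
}

/* The translator can skip the segment-base addition when ADDSEG is clear. */
bool needs_segment_add(uint32_t flags)
{
    return (flags & STATE_ADDSEG) != 0;
}
```

With this layout, a privilege-level change yields a different flags word, so the lookup misses the old TB and a new one is generated. The old TB stays usable once the state matches it again.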
304 | ||
debc7065 | 305 | @node Translation cache |
1f673135 FB |
306 | @section Translation cache |
307 | ||
27c8efcb | 308 | A 32 MByte cache holds the most recently used translations. For |
1f673135 FB |
309 | simplicity, it is completely flushed when it is full. A translation unit |
310 | contains just a single basic block (a block of x86 instructions | |
311 | terminated by a jump or by a virtual CPU state change which the | |
312 | translator cannot deduce statically). | |
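
The flush-when-full policy can be sketched as a bump allocator over one buffer. This is a minimal sketch, assuming a caller-supplied buffer; @code{struct code_cache} and its functions are hypothetical names, and QEMU's real bookkeeping differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative cache state for a single buffer of translated code. */
struct code_cache {
    uint8_t *base;         /* start of the buffer holding translated code */
    size_t   size;         /* capacity in bytes (32 MByte in the text) */
    size_t   used;         /* bytes handed out since the last flush */
    unsigned flush_count;  /* number of complete flushes so far */
};

void code_cache_init(struct code_cache *c, uint8_t *buf, size_t size)
{
    c->base = buf;
    c->size = size;
    c->used = 0;
    c->flush_count = 0;
}

/* Drop every translation at once; no per-block eviction is attempted. */
void code_cache_flush(struct code_cache *c)
{
    c->used = 0;
    c->flush_count++;
}

/*
 * Reserve len bytes for a new translation. If the request does not fit in
 * the remaining space, flush the whole cache and start again from the base.
 * Returns NULL only when len exceeds the total capacity.
 */
uint8_t *code_cache_alloc(struct code_cache *c, size_t len)
{
    if (len > c->size) {
        return NULL;
    }
    if (len > c->size - c->used) {
        code_cache_flush(c);
    }
    uint8_t *p = c->base + c->used;
    c->used += len;
    return p;
}
```

A full flush avoids tracking which blocks are least recently used, at the cost of retranslating hot code after each flush.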
313 | ||
debc7065 | 314 | @node Direct block chaining |
1f673135 FB |
315 | @section Direct block chaining |
316 | ||
317 | After each translated basic block is executed, QEMU uses the simulated | |
d274e07c | 318 | Program Counter (PC) and other cpu state information (such as the CS |
1f673135 FB |
319 | segment base value) to find the next basic block. |
320 | ||
321 | In order to accelerate the most common cases where the new simulated PC | |
322 | is known, QEMU can patch a basic block so that it jumps directly to the | |
323 | next one. | |
324 | ||
325 | The most portable code uses an indirect jump. An indirect jump makes | |
326 | it easier to make the jump target modification atomic. On some host | |
327 | architectures (such as x86 or PowerPC), the @code{JUMP} opcode is | |
328 | directly patched so that the block chaining has no overhead. | |
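
The portable, indirect-jump variant of chaining can be modelled with a pointer per block. This is a sketch under stated assumptions: @code{struct tb}, @code{tb_chain}, @code{tb_unchain} and @code{run_chained} are hypothetical names, and real translated blocks contain host machine code rather than a C pointer.

```c
#include <stddef.h>

/* Illustrative translated block; field names are assumptions. */
struct tb {
    unsigned long pc;   /* simulated PC where this block starts */
    struct tb *next;    /* chained successor, or NULL if not patched */
};

/* Patch 'from' so that it continues straight into 'to'. */
void tb_chain(struct tb *from, struct tb *to)
{
    from->next = to;    /* a single pointer store models the patched jump */
}

/* Undo the patch, forcing a return to the main loop after 'from'. */
void tb_unchain(struct tb *from)
{
    from->next = NULL;
}

/*
 * Follow chained blocks starting at 'tb', visiting at most max_blocks
 * blocks (max_blocks must be at least 1). Stores the number of blocks
 * visited in *executed and returns the last one; the caller then has to
 * look up the next block from the simulated PC and CPU state.
 */
struct tb *run_chained(struct tb *tb, unsigned max_blocks, unsigned *executed)
{
    unsigned n = 1;     /* 'tb' itself counts as visited */
    while (tb->next != NULL && n < max_blocks) {
        tb = tb->next;
        n++;
    }
    *executed = n;
    return tb;
}
```

Because the link is a single pointer-sized store, updating it is simple to make atomic, which is the property the indirect jump is chosen for.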
329 | ||
debc7065 | 330 | @node Self-modifying code and translated code invalidation |
1f673135 FB |
331 | @section Self-modifying code and translated code invalidation |
332 | ||
333 | Self-modifying code is a special challenge in x86 emulation because no | |
334 | instruction cache invalidation is signaled by the application when code | |
335 | is modified. | |
336 | ||
337 | When translated code is generated for a basic block, the corresponding | |
998a0501 BS |
338 | host page is write protected if it is not already read-only. Then, if |
339 | a write access is done to the page, Linux raises a SEGV signal. QEMU | |
340 | then invalidates all the translated code in the page and enables write | |
341 | accesses to the page. | |
1f673135 FB |
342 | |
343 | Correct translated code invalidation is done efficiently by maintaining | |
344 | a linked list of every translated block contained in a given page. Other | |
5fafdf24 | 345 | linked lists are also maintained to undo direct block chaining. |
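
The per-page list of translated blocks can be sketched as follows. This is an illustration only: the names, the 16-page address space and the boolean standing in for host write protection are all assumptions, and blocks that span two pages are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12
#define NUM_PAGES 16   /* tiny address space, for illustration only */

struct tb {
    uint32_t pc;            /* start address of the translated guest code */
    bool valid;             /* cleared when the block is invalidated */
    struct tb *page_next;   /* next block translated from the same page */
};

struct page_desc {
    struct tb *first_tb;    /* head of the per-page list */
    bool write_protected;   /* models the host-side write protection */
};

static struct page_desc pages[NUM_PAGES];

/* Register a freshly translated block and write-protect its page. */
void tb_link_page(struct tb *tb)
{
    struct page_desc *p = &pages[(tb->pc >> PAGE_BITS) % NUM_PAGES];
    tb->valid = true;
    tb->page_next = p->first_tb;
    p->first_tb = tb;
    p->write_protected = true;
}

/*
 * Model of the write-fault path: invalidate every block translated from
 * the page containing addr, then allow writes to the page again.
 * Returns the number of blocks invalidated.
 */
unsigned page_write_fault(uint32_t addr)
{
    struct page_desc *p = &pages[(addr >> PAGE_BITS) % NUM_PAGES];
    unsigned n = 0;
    struct tb *tb = p->first_tb;
    while (tb != NULL) {
        struct tb *next = tb->page_next;
        tb->valid = false;
        tb->page_next = NULL;
        tb = next;
        n++;
    }
    p->first_tb = NULL;
    p->write_protected = false;
    return n;
}
```

Walking one short list per faulting page keeps invalidation proportional to the number of blocks in that page rather than to the size of the whole translation cache.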
1f673135 | 346 | |
998a0501 BS |
347 | On RISC targets, correctly written software uses memory barriers and |
348 | cache flushes, so some of the protection above would not be | |
349 | necessary. However, QEMU still requires that the generated code always | |
350 | matches the target instructions in memory in order to handle | |
351 | exceptions correctly. | |
1f673135 | 352 | |
debc7065 | 353 | @node Exception support |
1f673135 FB |
354 | @section Exception support |
355 | ||
356 | longjmp() is used when an exception such as division by zero is | |
5fafdf24 | 357 | encountered. |
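
The longjmp() pattern can be shown with standard C alone. This is a minimal sketch, not QEMU's code: the exception codes, @code{cpu_loop_env}, @code{helper_div} and @code{cpu_exec_div} are hypothetical names.

```c
#include <setjmp.h>

enum { EXCP_NONE = 0, EXCP_DIVIDE_ERROR = 1 };  /* hypothetical codes */

static jmp_buf cpu_loop_env;   /* jump target set up by the main loop */

/* Helper called from "translated code"; does not return on a zero divisor. */
int helper_div(int dividend, int divisor)
{
    if (divisor == 0) {
        longjmp(cpu_loop_env, EXCP_DIVIDE_ERROR);
    }
    return dividend / divisor;
}

/*
 * Run one division under the exception handler. Returns the exception
 * code, and stores the quotient in *result only when no exception occurred.
 */
int cpu_exec_div(int dividend, int divisor, int *result)
{
    switch (setjmp(cpu_loop_env)) {
    case EXCP_NONE:
        /* Direct return from setjmp(): execute the guest operation. */
        *result = helper_div(dividend, divisor);
        return EXCP_NONE;
    case EXCP_DIVIDE_ERROR:
        /* Reached via longjmp() from helper_div(). */
        return EXCP_DIVIDE_ERROR;
    default:
        return -1;
    }
}
```

Calling @code{cpu_exec_div(1, 0, &q)} returns @code{EXCP_DIVIDE_ERROR} and leaves @code{q} untouched, since control leaves the helper before any result is produced.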
1f673135 FB |
358 | |
359 | The host SIGSEGV and SIGBUS signal handlers are used to catch invalid | |
998a0501 BS |
360 | memory accesses. The simulated program counter is found by |
361 | retranslating the corresponding basic block and by looking where the | |
362 | host program counter was at the exception point. | |
1f673135 FB |
363 | |
364 | The virtual CPU cannot retrieve the exact @code{EFLAGS} register because | |
365 | condition code optimisations mean that it is not always computed. This | |
366 | is not a big concern because the emulated code can still be restarted | |
367 | in any case. | |
368 | ||
debc7065 | 369 | @node MMU emulation |
1f673135 FB |
370 | @section MMU emulation |
371 | ||
998a0501 BS |
372 | For system emulation QEMU supports a soft MMU. In that mode, the MMU |
373 | virtual to physical address translation is done at every memory | |
374 | access. QEMU uses an address translation cache to speed up the | |
375 | translation. | |
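
An address translation cache of this kind can be sketched as a small direct-mapped table. This is a sketch under stated assumptions: the table size, the names and the fixed-offset @code{page_walk} stand-in for a real guest page-table walk are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_MASK (~((UINT32_C(1) << PAGE_BITS) - 1))
#define TLB_SIZE  256

struct tlb_entry {
    uint32_t vpage;   /* virtual page address (low bits clear) */
    uint32_t ppage;   /* physical page address (low bits clear) */
    bool valid;
};

static struct tlb_entry tlb[TLB_SIZE];
static unsigned slow_path_count;   /* counts simulated page-table walks */

/* Stand-in for the guest page-table walk: a fixed offset mapping. */
static uint32_t page_walk(uint32_t vpage)
{
    slow_path_count++;
    return vpage + UINT32_C(0x100000);
}

/* Translate a virtual address, filling the cache on a miss. */
uint32_t soft_mmu_translate(uint32_t vaddr)
{
    uint32_t vpage = vaddr & PAGE_MASK;
    struct tlb_entry *e = &tlb[(vaddr >> PAGE_BITS) % TLB_SIZE];
    if (!e->valid || e->vpage != vpage) {
        e->vpage = vpage;
        e->ppage = page_walk(vpage);
        e->valid = true;
    }
    return e->ppage | (vaddr & ~PAGE_MASK);
}

/* Called when the guest changes its mappings. */
void soft_mmu_flush(void)
{
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        tlb[i].valid = false;
    }
}
```

Repeated accesses to the same page take the fast path and skip the walk; only the first access after a flush pays for it.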
1f673135 FB |
376 | |
377 | In order to avoid flushing the translated code each time the MMU | |
378 | mappings change, QEMU uses a physically indexed translation cache. It | |
5fafdf24 | 379 | means that each basic block is indexed with its physical address. |
1f673135 FB |
380 | |
381 | When MMU mappings change, only the chaining of the basic blocks is | |
382 | reset (i.e. a basic block can no longer jump directly to another one). | |
383 | ||
998a0501 BS |
384 | @node Device emulation |
385 | @section Device emulation | |
386 | ||
387 | Systems emulated by QEMU are organized by boards. At initialization | |
388 | phase, each board instantiates a number of CPUs, devices, RAM and | |
389 | ROM. Each device in turn can assign I/O ports or memory areas (for | |
390 | MMIO) to its handlers. When the emulation starts, an access to the | |
391 | ports or MMIO memory areas assigned to the device causes the | |
392 | corresponding handler to be called. | |
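
The dispatch from I/O port accesses to device handlers can be sketched with a table of callbacks. This is an illustration, not QEMU's API: @code{ioport_register}, @code{ioport_read}, @code{ioport_write} and the one-byte latch device are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IOPORTS 65536u

typedef uint32_t (*ioport_read_fn)(void *opaque, uint16_t port);
typedef void (*ioport_write_fn)(void *opaque, uint16_t port, uint32_t val);

struct ioport_slot {
    ioport_read_fn read;
    ioport_write_fn write;
    void *opaque;           /* device state passed back to the handlers */
};

static struct ioport_slot ioports[MAX_IOPORTS];

/* Assign 'count' consecutive ports starting at 'start' to one device. */
void ioport_register(uint16_t start, unsigned count, ioport_read_fn rd,
                     ioport_write_fn wr, void *opaque)
{
    for (unsigned i = 0; i < count && (unsigned)start + i < MAX_IOPORTS; i++) {
        struct ioport_slot *s = &ioports[(unsigned)start + i];
        s->read = rd;
        s->write = wr;
        s->opaque = opaque;
    }
}

/* Guest port read: call the handler, or return all ones if unassigned. */
uint32_t ioport_read(uint16_t port)
{
    const struct ioport_slot *s = &ioports[port];
    return s->read ? s->read(s->opaque, port) : UINT32_C(0xffffffff);
}

/* Guest port write: call the handler, or ignore the write if unassigned. */
void ioport_write(uint16_t port, uint32_t val)
{
    const struct ioport_slot *s = &ioports[port];
    if (s->write) {
        s->write(s->opaque, port, val);
    }
}

/* Example device: a one-byte latch that returns the last value written. */
struct latch_dev {
    uint8_t value;
};

uint32_t latch_read(void *opaque, uint16_t port)
{
    (void)port;
    return ((struct latch_dev *)opaque)->value;
}

void latch_write(void *opaque, uint16_t port, uint32_t val)
{
    (void)port;
    ((struct latch_dev *)opaque)->value = (uint8_t)val;
}
```

A board would call @code{ioport_register(0x80, 1, latch_read, latch_write, &dev)} during initialization; later guest accesses to port 0x80 then reach the latch handlers. Returning all ones for unassigned ports is a choice made for this sketch.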
393 | ||
394 | RAM and ROM are handled more efficiently: only the offset to the host | |
395 | memory needs to be added to the guest address. | |
396 | ||
397 | The video RAM of VGA and other display cards is special: it can be | |
398 | read or written directly like RAM, but write accesses cause the memory | |
399 | to be marked with the VGA_DIRTY flag as well. | |
400 | ||
401 | QEMU supports some device classes like serial and parallel ports, USB, | |
402 | drives and network devices, by providing APIs for easier connection to | |
403 | the generic, higher level implementations. The API hides the | |
404 | implementation details from the devices, like native device use or | |
405 | advanced block device formats like QCOW. | |
406 | ||
407 | Usually the devices implement a reset method and register support for | |
408 | saving and loading of the device state. The devices can also use | |
409 | timers, especially together with the use of bottom halves (BHs). | |
410 | ||
debc7065 | 411 | @node Bibliography |
1f673135 FB |
412 | @section Bibliography |
413 | ||
414 | @table @asis | |
415 | ||
5fafdf24 | 416 | @item [1] |
8e9620a6 TH |
417 | @url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project, |
418 | by Kevin Lawton et al. | |
1f673135 FB |
419 | |
420 | @item [2] | |
8e9620a6 TH |
421 | @url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger |
422 | for GNU/Linux. | |
1f673135 FB |
423 | |
424 | @item [3] | |
8e9620a6 TH |
425 | @url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html}, |
426 | the EM86 x86 emulator on Alpha-Linux. | |
1f673135 FB |
427 | |
428 | @item [4] | |
debc7065 | 429 | @url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf}, |
1f673135 FB |
430 | DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton |
431 | Chernoff and Ray Hookway. | |
432 | ||
8e9620a6 | 433 | @item [5] |
5fafdf24 | 434 | @url{http://user-mode-linux.sourceforge.net/}, |
1f673135 FB |
435 | The User-mode Linux Kernel. |
436 | ||
8e9620a6 | 437 | @item [6] |
5fafdf24 | 438 | @url{http://www.plex86.org/}, |
1f673135 FB |
439 | The new Plex86 project. |
440 | ||
8e9620a6 | 441 | @item [7] |
5fafdf24 | 442 | @url{http://www.vmware.com/}, |
1f673135 FB |
443 | The VMWare PC virtualizer. |
444 | ||
8e9620a6 TH |
445 | @item [8] |
446 | @url{https://www.microsoft.com/download/details.aspx?id=3702}, | |
1f673135 FB |
447 | The VirtualPC PC virtualizer. |
448 | ||
8e9620a6 | 449 | @item [9] |
998a0501 BS |
450 | @url{http://virtualbox.org/}, |
451 | The VirtualBox PC virtualizer. | |
452 | ||
8e9620a6 | 453 | @item [10] |
998a0501 BS |
454 | @url{http://www.xen.org/}, |
455 | The Xen hypervisor. | |
456 | ||
8e9620a6 TH |
457 | @item [11] |
458 | @url{http://www.linux-kvm.org/}, | |
998a0501 BS |
459 | Kernel Based Virtual Machine (KVM). |
460 | ||
8e9620a6 | 461 | @item [12] |
998a0501 BS |
462 | @url{http://www.greensocs.com/projects/QEMUSystemC}, |
463 | QEMU-SystemC, a hardware co-simulator. | |
464 | ||
1f673135 FB |
465 | @end table |
466 | ||
debc7065 | 467 | @node Regression Tests |
1f673135 FB |
468 | @chapter Regression Tests |
469 | ||
470 | In the directory @file{tests/}, various interesting testing programs | |
b1f45238 | 471 | are available. They are used for regression testing. |
1f673135 | 472 | |
debc7065 FB |
473 | @menu |
474 | * test-i386:: | |
475 | * linux-test:: | |
debc7065 FB |
476 | @end menu |
477 | ||
478 | @node test-i386 | |
1f673135 FB |
479 | @section @file{test-i386} |
480 | ||
481 | This program executes most of the 16 bit and 32 bit x86 instructions and | |
482 | generates a text output. It can be compared with the output obtained with | |
483 | a real CPU or another emulator. The target @code{make test} runs this | |
484 | program and a @code{diff} on the generated output. | |
485 | ||
486 | The Linux system call @code{modify_ldt()} is used to create x86 selectors | |
487 | to test some 16 bit addressing cases and 32 bit addressing with segmentation. | |
488 | ||
489 | The Linux system call @code{vm86()} is used to test vm86 emulation. | |
490 | ||
491 | Various exceptions are raised to test most of the x86 user space | |
492 | exception reporting. | |
493 | ||
debc7065 | 494 | @node linux-test |
1f673135 FB |
495 | @section @file{linux-test} |
496 | ||
497 | This program tests various Linux system calls. It is used to verify | |
498 | that the system call parameters are correctly converted between target | |
499 | and host CPUs. | |
500 | ||
debc7065 | 501 | @bye |