Commit | Line | Data |
---|---|---|
1f673135 | 1 | \input texinfo @c -*- texinfo -*- |
debc7065 FB |
2 | @c %**start of header |
3 | @setfilename qemu-tech.info | |
e080e785 SW |
4 | |
5 | @documentlanguage en | |
6 | @documentencoding UTF-8 | |
7 | ||
debc7065 FB |
8 | @settitle QEMU Internals |
9 | @exampleindent 0 | |
10 | @paragraphindent 0 | |
11 | @c %**end of header | |
1f673135 | 12 | |
a1a32b05 SW |
13 | @ifinfo |
14 | @direntry | |
15 | * QEMU Internals: (qemu-tech). The QEMU Emulator Internals. | |
16 | @end direntry | |
17 | @end ifinfo | |
18 | ||
1f673135 | 19 | @iftex |
1f673135 FB |
20 | @titlepage |
21 | @sp 7 | |
22 | @center @titlefont{QEMU Internals} | |
23 | @sp 3 | |
24 | @end titlepage | |
25 | @end iftex | |
26 | ||
debc7065 FB |
27 | @ifnottex |
28 | @node Top | |
29 | @top | |
30 | ||
31 | @menu | |
77d47e16 PB |
32 | * CPU emulation:: |
33 | * Translator Internals:: | |
77d47e16 PB |
34 | * QEMU compared to other emulators:: |
35 | * Bibliography:: | |
debc7065 FB |
36 | @end menu |
37 | @end ifnottex | |
38 | ||
39 | @contents | |
40 | ||
77d47e16 PB |
41 | @node CPU emulation |
42 | @chapter CPU emulation | |
1f673135 | 43 | |
debc7065 | 44 | @menu |
77d47e16 PB |
45 | * x86:: x86 and x86-64 emulation |
46 | * ARM:: ARM emulation | |
47 | * MIPS:: MIPS emulation | |
48 | * PPC:: PowerPC emulation | |
49 | * SPARC:: Sparc32 and Sparc64 emulation | |
50 | * Xtensa:: Xtensa emulation | |
debc7065 FB |
51 | @end menu |
52 | ||
77d47e16 | 53 | @node x86 |
998a0501 | 54 | @section x86 and x86-64 emulation |
1f673135 FB |
55 | |
56 | QEMU x86 target features: | |
57 | ||
5fafdf24 | 58 | @itemize |
1f673135 | 59 | |
5fafdf24 | 60 | @item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. |
998a0501 BS |
61 | LDT/GDT and IDT are emulated. VM86 mode is also supported to run |
62 | DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3, | |
63 | and SSE4 as well as x86-64 SVM. | |
1f673135 FB |
64 | |
65 | @item Support for host page sizes larger than 4KB in user mode emulation. | |
66 | ||
67 | @item QEMU can emulate itself on x86. | |
68 | ||
5fafdf24 | 69 | @item An extensive Linux x86 CPU test program is included as @file{tests/test-i386}. |
1f673135 FB |
70 | It can be used to test other x86 virtual CPUs. |
71 | ||
72 | @end itemize | |
73 | ||
74 | Current QEMU limitations: | |
75 | ||
5fafdf24 | 76 | @itemize |
1f673135 | 77 | |
998a0501 | 78 | @item Limited x86-64 support. |
1f673135 FB |
79 | |
80 | @item IPC syscalls are missing. | |
81 | ||
5fafdf24 | 82 | @item The x86 segment limits and access rights are not tested at every |
1f673135 FB |
83 | memory access (yet). Fortunately, very few OSes seem to rely on that for |
84 | normal use. | |
85 | ||
1f673135 FB |
86 | @end itemize |
87 | ||
77d47e16 | 88 | @node ARM |
1f673135 FB |
89 | @section ARM emulation |
90 | ||
91 | @itemize | |
92 | ||
93 | @item Full ARM 7 user emulation. | |
94 | ||
95 | @item NWFPE FPU support included in user Linux emulation. | |
96 | ||
97 | @item Can run most ARM Linux binaries. | |
98 | ||
99 | @end itemize | |
100 | ||
77d47e16 | 101 | @node MIPS |
24d4de45 TS |
102 | @section MIPS emulation |
103 | ||
104 | @itemize | |
105 | ||
106 | @item The system emulation allows full MIPS32/MIPS64 Release 2 emulation, | |
107 | including privileged instructions, FPU and MMU, in both little and big | |
108 | endian modes. | |
109 | ||
110 | @item The Linux userland emulation can run many 32 bit MIPS Linux binaries. | |
111 | ||
112 | @end itemize | |
113 | ||
114 | Current QEMU limitations: | |
115 | ||
116 | @itemize | |
117 | ||
118 | @item Self-modifying code is not always handled correctly. | |
119 | ||
120 | @item 64 bit userland emulation is not implemented. | |
121 | ||
122 | @item The system emulation is not complete enough to run real firmware. | |
123 | ||
b1f45238 TS |
124 | @item The watchpoint debug facility is not implemented. |
125 | ||
24d4de45 TS |
126 | @end itemize |
127 | ||
77d47e16 | 128 | @node PPC |
1f673135 FB |
129 | @section PowerPC emulation |
130 | ||
131 | @itemize | |
132 | ||
5fafdf24 | 133 | @item Full PowerPC 32 bit emulation, including privileged instructions, |
1f673135 FB |
134 | FPU and MMU. |
135 | ||
136 | @item Can run most PowerPC Linux binaries. | |
137 | ||
138 | @end itemize | |
139 | ||
77d47e16 | 140 | @node SPARC |
998a0501 | 141 | @section Sparc32 and Sparc64 emulation |
1f673135 FB |
142 | |
143 | @itemize | |
144 | ||
f6b647cd | 145 | @item Full SPARC V8 emulation, including privileged |
3475187d | 146 | instructions, FPU and MMU. SPARC V9 emulation includes most privileged |
a785e42e | 147 | and VIS instructions, FPU and I/D MMU. Alignment is fully enforced. |
1f673135 | 148 | |
a785e42e BS |
149 | @item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and |
150 | some 64-bit SPARC Linux binaries. | |
3475187d FB |
151 | |
152 | @end itemize | |
153 | ||
154 | Current QEMU limitations: | |
155 | ||
5fafdf24 | 156 | @itemize |
3475187d | 157 | |
3475187d FB |
158 | @item IPC syscalls are missing. |
159 | ||
1f587329 | 160 | @item Floating point exception support is buggy. |
3475187d FB |
161 | |
162 | @item Atomic instructions are not correctly implemented. | |
163 | ||
998a0501 BS |
164 | @item There are still some problems with Sparc64 emulators. |
165 | ||
166 | @end itemize | |
167 | ||
77d47e16 | 168 | @node Xtensa |
3aeaea65 MF |
169 | @section Xtensa emulation |
170 | ||
171 | @itemize | |
172 | ||
173 | @item Core Xtensa ISA emulation, including most options: code density, | |
174 | loop, extended L32R, 16- and 32-bit multiplication, 32-bit division, | |
044d003d MF |
175 | MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor |
176 | context, debug, multiprocessor synchronization, | |
3aeaea65 MF |
177 | conditional store, exceptions, relocatable vectors, unaligned exception, |
178 | interrupts (including high priority and timer), hardware alignment, | |
179 | region protection, region translation, MMU, windowed registers, thread | |
180 | pointer, processor ID. | |
181 | ||
044d003d MF |
182 | @item Options not implemented: data/instruction cache (including cache |
183 | prefetch and locking), XLMI, processor interface. Also options not | |
184 | covered by the core ISA (e.g. FLIX, wide branches) are not implemented. | |
3aeaea65 MF |
185 | |
186 | @item Can run most Xtensa Linux binaries. | |
187 | ||
188 | @item A new core configuration that requires no additional instructions | |
189 | may be created from an overlay with a minimal amount of hand-written code. | |
190 | ||
191 | @end itemize | |
192 | ||
77d47e16 PB |
193 | @node Translator Internals |
194 | @chapter Translator Internals | |
1f673135 | 195 | |
1f673135 FB |
196 | QEMU is a dynamic translator. When it first encounters a piece of code, |
197 | it converts it to the host instruction set. Usually dynamic translators | |
198 | are very complicated and highly CPU dependent. QEMU uses some tricks | |
199 | which make it relatively easily portable and simple while achieving good | |
200 | performance. | |
201 | ||
bf28a69e PB |
202 | QEMU's dynamic translation backend is called TCG, for "Tiny Code |
203 | Generator". For more information, please take a look at @code{tcg/README}. | |
1f673135 | 204 | |
36e4970e | 205 | Some notable features of QEMU's dynamic translator are: |
1f673135 | 206 | |
36e4970e PB |
207 | @table @strong |
208 | ||
209 | @item CPU state optimisations: | |
998a0501 BS |
210 | The target CPUs have many internal states which change the way they |
211 | evaluate instructions. In order to achieve a good speed, the | |
212 | translation phase assumes that some state information of the virtual | |
213 | CPU cannot change within a translated block. The state is recorded in the Translation | |
214 | Block (TB). If the state changes (e.g. privilege level), a new TB will | |
215 | be generated and the previous TB won't be used anymore until the state | |
36e4970e PB |
216 | matches the state recorded in the previous TB. The same idea can be applied |
217 | to other aspects of the CPU state. For example, on x86, if the SS, | |
998a0501 BS |
218 | DS and ES segments have a zero base, then the translator does not even |
219 | generate an addition for the segment base. | |
1f673135 | 220 | |
36e4970e | 221 | @item Direct block chaining: |
1f673135 | 222 | After each translated basic block is executed, QEMU uses the simulated |
d274e07c | 223 | Program Counter (PC) and other CPU state information (such as the CS |
1f673135 FB |
224 | segment base value) to find the next basic block. |
225 | ||
226 | In order to accelerate the most common cases where the new simulated PC | |
227 | is known, QEMU can patch a basic block so that it jumps directly to the | |
228 | next one. | |
229 | ||
230 | The most portable code uses an indirect jump. An indirect jump makes | |
231 | it easier to make the jump target modification atomic. On some host | |
232 | architectures (such as x86 or PowerPC), the @code{JUMP} opcode is | |
233 | directly patched so that the block chaining has no overhead. | |
234 | ||
36e4970e | 235 | @item Self-modifying code and translated code invalidation: |
1f673135 FB |
236 | Self-modifying code is a special challenge in x86 emulation because no |
237 | instruction cache invalidation is signaled by the application when code | |
238 | is modified. | |
239 | ||
36e4970e PB |
240 | User-mode emulation marks a host page as write-protected (if it is |
241 | not already read-only) every time translated code is generated for a | |
242 | basic block. Then, if a write access is done to the page, Linux raises | |
243 | a SEGV signal. QEMU then invalidates all the translated code in the page | |
244 | and enables write accesses to the page. For system emulation, write | |
245 | protection is achieved through the software MMU. | |
1f673135 FB |
246 | |
247 | Correct translated code invalidation is done efficiently by maintaining | |
248 | a linked list of every translated block contained in a given page. Other | |
5fafdf24 | 249 | linked lists are also maintained to undo direct block chaining. |
1f673135 | 250 | |
998a0501 BS |
251 | On RISC targets, correctly written software uses memory barriers and |
252 | cache flushes, so some of the protection above would not be | |
253 | necessary. However, QEMU still requires that the generated code always | |
254 | matches the target instructions in memory in order to handle | |
255 | exceptions correctly. | |
1f673135 | 256 | |
36e4970e | 257 | @item Exception support: |
1f673135 | 258 | longjmp() is used when an exception such as division by zero is |
5fafdf24 | 259 | encountered. |
1f673135 FB |
260 | |
261 | The host SIGSEGV and SIGBUS signal handlers are used to catch invalid | |
36e4970e PB |
262 | memory accesses. QEMU keeps a map from host program counter to |
263 | target program counter, and looks up where the exception happened | |
264 | based on the host program counter at the exception point. | |
265 | ||
266 | On some targets, some bits of the virtual CPU's state are not flushed to the | |
267 | memory until the end of the translation block. This is done for internal | |
268 | emulation state that is rarely accessed directly by the program and/or changes | |
269 | very often throughout the execution of a translation block---this includes | |
270 | condition codes on x86, delay slots on SPARC, conditional execution on | |
271 | ARM, and so on. This state is stored for each target instruction, and | |
272 | looked up on exceptions. | |
273 | ||
274 | @item MMU emulation: | |
275 | For system emulation QEMU uses a software MMU. In that mode, the MMU | |
998a0501 | 276 | virtual to physical address translation is done at every memory |
36e4970e | 277 | access. |
1f673135 | 278 | |
36e4970e | 279 | QEMU uses an address translation cache (TLB) to speed up the translation. |
1f673135 | 280 | In order to avoid flushing the translated code each time the MMU |
36e4970e | 281 | mappings change, all caches in QEMU are physically indexed. This |
5fafdf24 | 282 | means that each basic block is indexed with its physical address. |
1f673135 | 283 | |
36e4970e PB |
284 | In order to avoid invalidating the basic block chain when MMU mappings |
285 | change, chaining is only performed when the destination of the jump | |
286 | shares a page with the basic block that is performing the jump. | |
287 | ||
288 | The MMU can also distinguish RAM and ROM memory areas from MMIO memory | |
289 | areas. Access is faster for RAM and ROM because the translation cache also | |
290 | hosts the offset between guest address and host memory. Accessing MMIO | |
291 | memory areas instead calls out to C code for device emulation. | |
292 | Finally, the MMU helps track dirty pages and pages pointed to by | |
293 | translation blocks. | |
294 | @end table | |
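
The CPU state optimisation described above can be illustrated with a small lookup routine. This is only a sketch under stated assumptions, not QEMU's actual data structures: the names TB, tb_find and tb_find_or_translate, the fixed-size array and the linear search are all hypothetical. It shows a translated block being reused only when the program counter, the segment base and the state flags all match the values recorded at translation time, so that blocks for different privilege levels can coexist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified translation block descriptor. */
typedef struct TB {
    uint64_t pc;       /* guest PC of the first instruction in the block */
    uint64_t cs_base;  /* segment base assumed when the block was translated */
    uint32_t flags;    /* CPU state assumed constant inside the block */
} TB;

#define MAX_TBS 64

static TB tbs[MAX_TBS];
static size_t tb_count;

/* Return the block translated for exactly this (pc, cs_base, flags), or NULL. */
static TB *tb_find(uint64_t pc, uint64_t cs_base, uint32_t flags)
{
    for (size_t i = 0; i < tb_count; i++) {
        TB *tb = &tbs[i];
        if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
            return tb;
        }
    }
    return NULL;
}

/*
 * Look up a block; on a miss, record a new one. A real translator would
 * generate host code here; this sketch only stores the lookup key.
 */
static TB *tb_find_or_translate(uint64_t pc, uint64_t cs_base, uint32_t flags)
{
    TB *tb = tb_find(pc, cs_base, flags);
    if (tb == NULL) {
        assert(tb_count < MAX_TBS);  /* the sketch has no eviction policy */
        tb = &tbs[tb_count++];
        tb->pc = pc;
        tb->cs_base = cs_base;
        tb->flags = flags;
    }
    return tb;
}
```

Requesting the same PC with different flags yields two distinct blocks; when the flags return to their earlier value, the earlier block is found again instead of being retranslated.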
998a0501 | 295 | |
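
The fast path of the software MMU described above, where the translation cache holds the offset between guest address and host memory, can be sketched as follows. The layout is an assumption made for illustration: TLBEntry, tlb_fill, tlb_lookup, the direct-mapped organisation and the 4 KB page size are hypothetical and do not reproduce QEMU's real TLB. A miss returns NULL, which stands in for the slow path (page table walk or MMIO callback).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12
#define PAGE_MASK   (~(((uint64_t)1 << PAGE_BITS) - 1))
#define TLB_SIZE    256
#define TLB_INVALID UINT64_MAX  /* not page-aligned, so it never matches */

/* Hypothetical TLB entry: one guest page mapped to host RAM. */
typedef struct {
    uint64_t guest_page;  /* page-aligned guest virtual address */
    uintptr_t addend;     /* host address minus guest address for this page */
} TLBEntry;

static TLBEntry tlb[TLB_SIZE];

/* Invalidate every entry, e.g. after the guest changes its MMU mappings. */
static void tlb_flush(void)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        tlb[i].guest_page = TLB_INVALID;
    }
}

static size_t tlb_index(uint64_t vaddr)
{
    return (size_t)(vaddr >> PAGE_BITS) & (TLB_SIZE - 1);
}

/* Slow path result: remember where the guest page lives in host memory. */
static void tlb_fill(uint64_t vaddr, void *host_page)
{
    TLBEntry *e = &tlb[tlb_index(vaddr)];
    e->guest_page = vaddr & PAGE_MASK;
    e->addend = (uintptr_t)host_page - (uintptr_t)(vaddr & PAGE_MASK);
}

/* Fast path: one compare and one add. Returns NULL on a miss. */
static void *tlb_lookup(uint64_t vaddr)
{
    TLBEntry *e = &tlb[tlb_index(vaddr)];
    if (e->guest_page != (vaddr & PAGE_MASK)) {
        return NULL;
    }
    return (void *)(uintptr_t)(vaddr + e->addend);
}
```

Storing the addend rather than the host page address means a hit costs a single addition on the guest address, with no separate masking of the page offset.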
77d47e16 PB |
296 | @node QEMU compared to other emulators |
297 | @chapter QEMU compared to other emulators | |
298 | ||
299 | Like Bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than | |
300 | Bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC | |
301 | emulation while QEMU can emulate several processors. | |
302 | ||
303 | Like Valgrind [2], QEMU does user space emulation and dynamic | |
304 | translation. Valgrind is mainly a memory debugger while QEMU has no | |
305 | support for it (QEMU could be used to detect out-of-bounds memory | |
306 | accesses as Valgrind does, but it has no support for tracking uninitialised data | |
307 | as Valgrind does). The Valgrind dynamic translator generates better code | |
308 | than QEMU (in particular it does register allocation) but it is closely | |
309 | tied to an x86 host and target and has no support for precise exceptions | |
310 | and system emulation. | |
311 | ||
312 | EM86 [3] is the closest project to user space QEMU (and QEMU still uses | |
313 | some of its code, in particular the ELF file loader). EM86 was limited | |
314 | to an alpha host and used a proprietary and slow interpreter (the | |
315 | interpreter part of the FX!32 Digital Win32 code translator [4]). | |
316 | ||
317 | TWIN from Willows Software was a Windows API emulator like Wine. It is less | |
318 | accurate than Wine but includes a protected mode x86 interpreter to launch | |
319 | x86 Windows executables. Such an approach has greater potential because most | |
320 | of the Windows API is executed natively but it is far more difficult to | |
321 | develop because all the data structures and function parameters exchanged | |
322 | between the API and the x86 code must be converted. | |
323 | ||
324 | User mode Linux [5] was the only solution before QEMU to launch a | |
325 | Linux kernel as a process while not needing any host kernel | |
326 | patches. However, user mode Linux requires heavy patches to the guest kernel while | |
327 | QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is | |
328 | slower. | |
329 | ||
330 | The Plex86 [6] PC virtualizer is done in the same spirit as the now | |
331 | obsolete qemu-fast system emulator. It requires a patched Linux kernel | |
332 | to work (you cannot launch the same kernel on your PC), but the | |
333 | patches are really small. As it is a PC virtualizer (no emulation is | |
334 | done except for some privileged instructions), it has the potential of | |
335 | being faster than QEMU. The downside is that a complicated (and | |
336 | potentially unsafe) host kernel patch is needed. | |
337 | ||
338 | The commercial PC virtualizers (VMware [7], VirtualPC [8]) are faster | |
339 | than QEMU (without virtualization), but they all need specific, proprietary | |
340 | and potentially unsafe host drivers. Moreover, they are unable to | |
341 | provide cycle exact simulation as an emulator can. | |
342 | ||
343 | VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC | |
344 | [12] uses QEMU to simulate a system where some hardware devices are | |
345 | developed in SystemC. | |
346 | ||
debc7065 | 347 | @node Bibliography |
77d47e16 | 348 | @chapter Bibliography |
1f673135 FB |
349 | |
350 | @table @asis | |
351 | ||
5fafdf24 | 352 | @item [1] |
8e9620a6 TH |
353 | @url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project, |
354 | by Kevin Lawton et al. | |
1f673135 FB |
355 | |
356 | @item [2] | |
8e9620a6 TH |
357 | @url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger |
358 | for GNU/Linux. | |
1f673135 FB |
359 | |
360 | @item [3] | |
8e9620a6 TH |
361 | @url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html}, |
362 | the EM86 x86 emulator on Alpha-Linux. | |
1f673135 FB |
363 | |
364 | @item [4] | |
debc7065 | 365 | @url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf}, |
1f673135 FB |
366 | DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton |
367 | Chernoff and Ray Hookway. | |
368 | ||
8e9620a6 | 369 | @item [5] |
5fafdf24 | 370 | @url{http://user-mode-linux.sourceforge.net/}, |
1f673135 FB |
371 | The User-mode Linux Kernel. |
372 | ||
8e9620a6 | 373 | @item [6] |
5fafdf24 | 374 | @url{http://www.plex86.org/}, |
1f673135 FB |
375 | The new Plex86 project. |
376 | ||
8e9620a6 | 377 | @item [7] |
5fafdf24 | 378 | @url{http://www.vmware.com/}, |
1f673135 FB |
379 | The VMware PC virtualizer. |
380 | ||
8e9620a6 TH |
381 | @item [8] |
382 | @url{https://www.microsoft.com/download/details.aspx?id=3702}, | |
1f673135 FB |
383 | The VirtualPC PC virtualizer. |
384 | ||
8e9620a6 | 385 | @item [9] |
998a0501 BS |
386 | @url{http://virtualbox.org/}, |
387 | The VirtualBox PC virtualizer. | |
388 | ||
8e9620a6 | 389 | @item [10] |
998a0501 BS |
390 | @url{http://www.xen.org/}, |
391 | The Xen hypervisor. | |
392 | ||
8e9620a6 TH |
393 | @item [11] |
394 | @url{http://www.linux-kvm.org/}, | |
998a0501 BS |
395 | Kernel Based Virtual Machine (KVM). |
396 | ||
8e9620a6 | 397 | @item [12] |
998a0501 BS |
398 | @url{http://www.greensocs.com/projects/QEMUSystemC}, |
399 | QEMU-SystemC, a hardware co-simulator. | |
400 | ||
1f673135 FB |
401 | @end table |
402 | ||
debc7065 | 403 | @bye |