]> git.ipfire.org Git - thirdparty/gcc.git/blob - gcc/jit/docs/intro/tutorial04.rst
Update copyright years.
[thirdparty/gcc.git] / gcc / jit / docs / intro / tutorial04.rst
1 .. Copyright (C) 2014-2016 Free Software Foundation, Inc.
2 Originally contributed by David Malcolm <dmalcolm@redhat.com>
3
4 This is free software: you can redistribute it and/or modify it
5 under the terms of the GNU General Public License as published by
6 the Free Software Foundation, either version 3 of the License, or
7 (at your option) any later version.
8
9 This program is distributed in the hope that it will be useful, but
10 WITHOUT ANY WARRANTY; without even the implied warranty of
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12 General Public License for more details.
13
14 You should have received a copy of the GNU General Public License
15 along with this program. If not, see
16 <http://www.gnu.org/licenses/>.
17
18 Tutorial part 4: Adding JIT-compilation to a toy interpreter
19 ------------------------------------------------------------
20 In this example we construct a "toy" interpreter, and add JIT-compilation
21 to it.
22
23 Our toy interpreter
24 *******************
25
26 It's a stack-based interpreter, and is intended as a (very simple) example
27 of the kind of bytecode interpreter seen in dynamic languages such as
28 Python, Ruby etc.
29
30 For the sake of simplicity, our toy virtual machine is very limited:
31
32 * The only data type is `int`
33
34 * It can only work on one function at a time (so that the only
35 function call that can be made is to recurse).
36
37 * Functions can only take one parameter.
38
39 * Functions have a stack of `int` values.
40
41 * We'll implement function call within the interpreter by calling a
42 function in our implementation, rather than implementing our own
43 frame stack.
44
45 * The parser is only good enough to get the examples to work.
46
47 Naturally, a real interpreter would be much more complicated that this.
48
49 The following operations are supported:
50
51 ====================== ======================== =============== ==============
52 Operation Meaning Old Stack New Stack
53 ====================== ======================== =============== ==============
54 DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]``
55 ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]``
56 of stack.
57 BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
58 on the stack.
59 BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]``
60 BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]``
61 BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]``
62 elements on the stack
63 and push a nonzero/zero
64 if (x<y).
65 RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]``
66 of the stack, and
67 popping the result.
68 RETURN Return the top of the ``[x]`` ``[]``
69 stack.
70 PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]``
71 JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]``
72 nonzero, jump to
73 ``arg``.
74 ====================== ======================== =============== ==============
75
76 Programs can be interpreted, disassembled, and compiled to machine code.
77
78 The interpreter reads ``.toy`` scripts. Here's what a simple recursive
79 factorial program looks like, the script ``factorial.toy``.
80 The parser ignores lines beginning with a `#`.
81
82 .. literalinclude:: ../examples/tut04-toyvm/factorial.toy
83 :lines: 1-
84 :language: c
85
86 The interpreter is a simple infinite loop with a big ``switch`` statement
87 based on what the next opcode is:
88
89 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
90 :start-after: /* Execute the given function. */
91 :end-before: /* JIT compilation. */
92 :language: c
93
94 Compiling to machine code
95 *************************
96 We want to generate machine code that can be cast to this type and
97 then directly executed in-process:
98
99 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
100 :start-after: /* Functions are compiled to this function ptr type. */
101 :end-before: enum opcode
102 :language: c
103
104 The lifetime of the code is tied to that of a :c:type:`gcc_jit_result *`.
105 We'll handle this by bundling them up in a structure, so that we can
106 clean them up together by calling :c:func:`gcc_jit_result_release`:
107
108 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
109 :start-after: /* A struct to hold the compilation results. */
110 :end-before: /* The main compilation hook. */
111 :language: c
112
113 Our compiler isn't very sophisticated; it takes the implementation of
114 each opcode above, and maps it directly to the operations supported by
115 the libgccjit API.
116
117 How should we handle the stack? In theory we could calculate what the
118 stack depth will be at each opcode, and optimize away the stack
119 manipulation "by hand". We'll see below that libgccjit is able to do
120 this for us, so we'll implement stack manipulation
121 in a direct way, by creating a ``stack`` array and ``stack_depth``
122 variables, local within the generated function, equivalent to this C code:
123
124 .. code-block:: c
125
126 int stack_depth;
127 int stack[MAX_STACK_DEPTH];
128
129 We'll also have local variables ``x`` and ``y`` for use when implementing
130 the opcodes, equivalent to this:
131
132 .. code-block:: c
133
134 int x;
135 int y;
136
137 This means our compiler has the following state:
138
139 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
140 :start-after: /* JIT compilation. */
141 :end-before: /* Stack manipulation. */
142 :language: c
143
144 Setting things up
145 *****************
146
147 First we create our types:
148
149 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
150 :start-after: /* Create types. */
151 :end-before: /* The constant value 1. */
152 :language: c
153
154 along with extracting a useful `int` constant:
155
156 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
157 :start-after: /* The constant value 1. */
158 :end-before: /* Create locations. */
159 :language: c
160
161 We'll implement push and pop in terms of the ``stack`` array and
162 ``stack_depth``. Here are helper functions for adding statements to
163 a block, implementing pushing and popping values:
164
165 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
166 :start-after: /* Stack manipulation. */
167 :end-before: /* A struct to hold the compilation results. */
168 :language: c
169
170 We will support single-stepping through the generated code in the
171 debugger, so we need to create :c:type:`gcc_jit_location` instances, one
172 per operation in the source code. These will reference the lines of
173 e.g. ``factorial.toy``.
174
175 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
176 :start-after: /* Create locations. */
177 :end-before: /* Creating the function. */
178 :language: c
179
180 Let's create the function itself. As usual, we create its parameter
181 first, then use the parameter to create the function:
182
183 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
184 :start-after: /* Creating the function. */
185 :end-before: /* Create stack lvalues. */
186 :language: c
187
188 We create the locals within the function.
189
190 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
191 :start-after: /* Create stack lvalues. */
192 :end-before: /* 1st pass: create blocks, one per opcode.
193 :language: c
194
195 Populating the function
196 ***********************
197
198 There's some one-time initialization, and the API treats the first block
199 you create as the entrypoint of the function, so we need to create that
200 block first:
201
202 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
203 :start-after: first. */
204 :end-before: /* Create a block per operation. */
205 :language: c
206
207 We can now create blocks for each of the operations. Most of these will
208 be consolidated into larger blocks when the optimizer runs.
209
210 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
211 :start-after: /* Create a block per operation. */
212 :end-before: /* Populate the initial block. */
213 :language: c
214
215 Now that we have a block it can jump to when it's done, we can populate
216 the initial block:
217
218 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
219 :start-after: /* Populate the initial block. */
220 :end-before: /* 2nd pass: fill in instructions. */
221 :language: c
222
223 We can now populate the blocks for the individual operations. We loop
224 through them, adding instructions to their blocks:
225
226 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
227 :start-after: /* 2nd pass: fill in instructions. */
228 :end-before: /* Helper macros. */
229 :language: c
230
231 We're going to have another big ``switch`` statement for implementing
232 the opcodes, this time for compiling them, rather than interpreting
233 them. It's helpful to have macros for implementing push and pop, so that
234 we can make the ``switch`` statement that's coming up look as much as
235 possible like the one above within the interpreter:
236
237 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
238 :start-after: /* Helper macros. */
239 :end-before: gcc_jit_block_add_comment
240 :language: c
241
242 .. note::
243
244 A particularly clever implementation would have an *identical*
245 ``switch`` statement shared by the interpreter and the compiler, with
246 some preprocessor "magic". We're not doing that here, for the sake
247 of simplicity.
248
249 When I first implemented this compiler, I accidentally missed an edit
250 when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
251 stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
252 uninitialized.
253
254 To track this kind of thing down, we can use
255 :c:func:`gcc_jit_block_add_comment` to add descriptive comments
256 to the internal representation. This is invaluable when looking through
257 the generated IR for, say ``factorial``:
258
259 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
260 :start-after: PUSH_RVALUE (gcc_jit_lvalue_as_rvalue (state.y))
261 :end-before: /* Handle the individual opcodes. */
262 :language: c
263
264 We can now write the big ``switch`` statement that implements the
265 individual opcodes, populating the relevant block with statements:
266
267 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
268 :start-after: /* Handle the individual opcodes. */
269 :end-before: /* Go to the next block. */
270 :language: c
271
272 Every block must be terminated, via a call to one of the
273 ``gcc_jit_block_end_with_`` entrypoints. This has been done for two
274 of the opcodes, but we need to do it for the other ones, by jumping
275 to the next block.
276
277 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
278 :start-after: /* Go to the next block. */
279 :end-before: /* end of loop on PC locations. */
280 :language: c
281
282 This is analogous to simply incrementing the program counter.
283
284 Verifying the control flow graph
285 ********************************
286 Having finished looping over the blocks, the context is complete.
287
288 As before, we can verify that the control flow and statements are sane by
289 using :c:func:`gcc_jit_function_dump_to_dot`:
290
291 .. code-block:: c
292
293 gcc_jit_function_dump_to_dot (state.fn, "/tmp/factorial.dot");
294
295 and viewing the result. Note how the label names, comments, and
296 variable names show up in the dump, to make it easier to spot
297 errors in our compiler.
298
299 .. figure:: factorial.png
300 :alt: image of a control flow graph
301
302 Compiling the context
303 *********************
304 Having finished looping over the blocks and populating them with
305 statements, the context is complete.
306
307 We can now compile it, and extract machine code from the result:
308
309 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
310 :start-after: /* We've now finished populating the context. Compile it. */
311 :end-before: /* (this leaks "result" and "funcname") */
312 :language: c
313
314 We can now run the result:
315
316 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
317 :start-after: /* JIT-compilation. */
318 :end-before: return 0;
319 :language: c
320
321 Single-stepping through the generated code
322 ******************************************
323
324 It's possible to debug the generated code. To do this we need to both:
325
326 * Set up source code locations for our statements, so that we can
327 meaningfully step through the code. We did this above by
328 calling :c:func:`gcc_jit_context_new_location` and using the
329 results.
330
331 * Enable the generation of debugging information, by setting
332 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
333 :c:type:`gcc_jit_context` via
334 :c:func:`gcc_jit_context_set_bool_option`:
335
336 .. code-block:: c
337
338 gcc_jit_context_set_bool_option (
339 ctxt,
340 GCC_JIT_BOOL_OPTION_DEBUGINFO,
341 1);
342
343 Having done this, we can put a breakpoint on the generated function:
344
345 .. code-block:: console
346
347 $ gdb --args ./toyvm factorial.toy 10
348 (gdb) break factorial
349 Function "factorial" not defined.
350 Make breakpoint pending on future shared library load? (y or [n]) y
351 Breakpoint 1 (factorial) pending.
352 (gdb) run
353 Breakpoint 1, factorial (arg=10) at factorial.toy:14
354 14 DUP
355
356 We've set up location information, which references ``factorial.toy``.
357 This allows us to use e.g. ``list`` to see where we are in the script:
358
359 .. code-block:: console
360
361 (gdb) list
362 9
363 10 # Initial state:
364 11 # stack: [arg]
365 12
366 13 # 0:
367 14 DUP
368 15 # stack: [arg, arg]
369 16
370 17 # 1:
371 18 PUSH_CONST 2
372
373 and to step through the function, examining the data:
374
375 .. code-block:: console
376
377 (gdb) n
378 18 PUSH_CONST 2
379 (gdb) n
380 22 BINARY_COMPARE_LT
381 (gdb) print stack
382 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
383 (gdb) print stack_depth
384 $6 = 3
385
386 You'll see that the parts of the ``stack`` array that haven't been
387 touched yet are uninitialized.
388
389 .. note::
390
391 Turning on optimizations may lead to unpredictable results when
392 stepping through the generated code: the execution may appear to
393 "jump around" the source code. This is analogous to turning up the
394 optimization level in a regular compiler.
395
396 Examining the generated code
397 ****************************
398
399 How good is the optimized code?
400
401 We can turn up optimizations, by calling
402 :c:func:`gcc_jit_context_set_int_option` with
403 :c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
404
405 .. code-block:: c
406
407 gcc_jit_context_set_int_option (
408 ctxt,
409 GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
410 3);
411
412 One of GCC's internal representations is called "gimple". A dump of the
413 initial gimple representation of the code can be seen by setting:
414
415 .. code-block:: c
416
417 gcc_jit_context_set_bool_option (ctxt,
418 GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE,
419 1);
420
421 With optimization on and source locations displayed, this gives:
422
423 .. We'll use "c" for gimple dumps
424
425 .. code-block:: c
426
427 factorial (signed int arg)
428 {
429 <unnamed type> D.80;
430 signed int D.81;
431 signed int D.82;
432 signed int D.83;
433 signed int D.84;
434 signed int D.85;
435 signed int y;
436 signed int x;
437 signed int stack_depth;
438 signed int stack[8];
439
440 try
441 {
442 initial:
443 stack_depth = 0;
444 stack[stack_depth] = arg;
445 stack_depth = stack_depth + 1;
446 goto instr0;
447 instr0:
448 /* DUP */:
449 stack_depth = stack_depth + -1;
450 x = stack[stack_depth];
451 stack[stack_depth] = x;
452 stack_depth = stack_depth + 1;
453 stack[stack_depth] = x;
454 stack_depth = stack_depth + 1;
455 goto instr1;
456 instr1:
457 /* PUSH_CONST */:
458 stack[stack_depth] = 2;
459 stack_depth = stack_depth + 1;
460 goto instr2;
461
462 /* etc */
463
464 You can see the generated machine code in assembly form via:
465
466 .. code-block:: c
467
468 gcc_jit_context_set_bool_option (
469 ctxt,
470 GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE,
471 1);
472 result = gcc_jit_context_compile (ctxt);
473
474 which shows that (on this x86_64 box) the compiler has unrolled the loop
475 and is using MMX instructions to perform several multiplications
476 simultaneously:
477
478 .. code-block:: gas
479
480 .file "fake.c"
481 .text
482 .Ltext0:
483 .p2align 4,,15
484 .globl factorial
485 .type factorial, @function
486 factorial:
487 .LFB0:
488 .file 1 "factorial.toy"
489 .loc 1 14 0
490 .cfi_startproc
491 .LVL0:
492 .L2:
493 .loc 1 26 0
494 cmpl $1, %edi
495 jle .L13
496 leal -1(%rdi), %edx
497 movl %edx, %ecx
498 shrl $2, %ecx
499 leal 0(,%rcx,4), %esi
500 testl %esi, %esi
501 je .L14
502 cmpl $9, %edx
503 jbe .L14
504 leal -2(%rdi), %eax
505 movl %eax, -16(%rsp)
506 leal -3(%rdi), %eax
507 movd -16(%rsp), %xmm0
508 movl %edi, -16(%rsp)
509 movl %eax, -12(%rsp)
510 movd -16(%rsp), %xmm1
511 xorl %eax, %eax
512 movl %edx, -16(%rsp)
513 movd -12(%rsp), %xmm4
514 movd -16(%rsp), %xmm6
515 punpckldq %xmm4, %xmm0
516 movdqa .LC1(%rip), %xmm4
517 punpckldq %xmm6, %xmm1
518 punpcklqdq %xmm0, %xmm1
519 movdqa .LC0(%rip), %xmm0
520 jmp .L5
521 # etc - edited for brevity
522
523 This is clearly overkill for a function that will likely overflow the
524 ``int`` type before the vectorization is worthwhile - but then again, this
525 is a toy example.
526
527 Turning down the optimization level to 2:
528
529 .. code-block:: c
530
531 gcc_jit_context_set_int_option (
532 ctxt,
533 GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
534 3);
535
536 yields this code, which is simple enough to quote in its entirety:
537
538 .. code-block:: gas
539
540 .file "fake.c"
541 .text
542 .p2align 4,,15
543 .globl factorial
544 .type factorial, @function
545 factorial:
546 .LFB0:
547 .cfi_startproc
548 .L2:
549 cmpl $1, %edi
550 jle .L8
551 movl $1, %edx
552 jmp .L4
553 .p2align 4,,10
554 .p2align 3
555 .L6:
556 movl %eax, %edi
557 .L4:
558 .L5:
559 leal -1(%rdi), %eax
560 imull %edi, %edx
561 cmpl $1, %eax
562 jne .L6
563 .L3:
564 .L7:
565 imull %edx, %eax
566 ret
567 .L8:
568 movl %edi, %eax
569 movl $1, %edx
570 jmp .L7
571 .cfi_endproc
572 .LFE0:
573 .size factorial, .-factorial
574 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
575 .section .note.GNU-stack,"",@progbits
576
577 Note that the stack pushing and popping have been eliminated, as has the
578 recursive call (in favor of an iteration).
579
580 Putting it all together
581 ***********************
582
583 The complete example can be seen in the source tree at
584 ``gcc/jit/docs/examples/tut04-toyvm/toyvm.c``
585
586 along with a Makefile and a couple of sample .toy scripts:
587
588 .. code-block:: console
589
590 $ ls -al
591 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 .
592 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 ..
593 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy
594 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy
595 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile
596 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.c
597
598 $ make toyvm
599 g++ -Wall -g -o toyvm toyvm.c -lgccjit
600
601 $ ./toyvm factorial.toy 10
602 interpreter result: 3628800
603 compiler result: 3628800
604
605 $ ./toyvm fibonacci.toy 10
606 interpreter result: 55
607 compiler result: 55
608
609 Behind the curtain: How does our code get optimized?
610 ****************************************************
611
612 Our example is done, but you may be wondering about exactly how the
613 compiler turned what we gave it into the machine code seen above.
614
615 We can examine what the compiler is doing in detail by setting:
616
617 .. code-block:: c
618
619 gcc_jit_context_set_bool_option (state.ctxt,
620 GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING,
621 1);
622 gcc_jit_context_set_bool_option (state.ctxt,
623 GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES,
624 1);
625
626 This will dump detailed information about the compiler's state to a
627 directory under ``/tmp``, and keep it from being cleaned up.
628
629 The precise names and their formats of these files is subject to change.
630 Higher optimization levels lead to more files.
631 Here's what I saw (edited for brevity; there were almost 200 files):
632
633 .. code-block:: console
634
635 intermediate files written to /tmp/libgccjit-KPQbGw
636 $ ls /tmp/libgccjit-KPQbGw/
637 fake.c.000i.cgraph
638 fake.c.000i.type-inheritance
639 fake.c.004t.gimple
640 fake.c.007t.omplower
641 fake.c.008t.lower
642 fake.c.011t.eh
643 fake.c.012t.cfg
644 fake.c.014i.visibility
645 fake.c.015i.early_local_cleanups
646 fake.c.016t.ssa
647 # etc
648
649 The gimple code is converted into Static Single Assignment form,
650 with annotations for use when generating the debuginfo:
651
652 .. code-block:: console
653
654 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
655
656 .. code-block:: c
657
658 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
659
660 factorial (signed int arg)
661 {
662 signed int stack[8];
663 signed int stack_depth;
664 signed int x;
665 signed int y;
666 <unnamed type> _20;
667 signed int _21;
668 signed int _38;
669 signed int _44;
670 signed int _51;
671 signed int _56;
672
673 initial:
674 stack_depth_3 = 0;
675 # DEBUG stack_depth => stack_depth_3
676 stack[stack_depth_3] = arg_5(D);
677 stack_depth_7 = stack_depth_3 + 1;
678 # DEBUG stack_depth => stack_depth_7
679 # DEBUG instr0 => NULL
680 # DEBUG /* DUP */ => NULL
681 stack_depth_8 = stack_depth_7 + -1;
682 # DEBUG stack_depth => stack_depth_8
683 x_9 = stack[stack_depth_8];
684 # DEBUG x => x_9
685 stack[stack_depth_8] = x_9;
686 stack_depth_11 = stack_depth_8 + 1;
687 # DEBUG stack_depth => stack_depth_11
688 stack[stack_depth_11] = x_9;
689 stack_depth_13 = stack_depth_11 + 1;
690 # DEBUG stack_depth => stack_depth_13
691 # DEBUG instr1 => NULL
692 # DEBUG /* PUSH_CONST */ => NULL
693 stack[stack_depth_13] = 2;
694
695 /* etc; edited for brevity */
696
697 We can perhaps better see the code by turning off
698 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
699 statements, giving:
700
701 .. code-block:: console
702
703 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
704
705 .. code-block:: c
706
707 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
708
709 factorial (signed int arg)
710 {
711 signed int stack[8];
712 signed int stack_depth;
713 signed int x;
714 signed int y;
715 <unnamed type> _20;
716 signed int _21;
717 signed int _38;
718 signed int _44;
719 signed int _51;
720 signed int _56;
721
722 initial:
723 stack_depth_3 = 0;
724 stack[stack_depth_3] = arg_5(D);
725 stack_depth_7 = stack_depth_3 + 1;
726 stack_depth_8 = stack_depth_7 + -1;
727 x_9 = stack[stack_depth_8];
728 stack[stack_depth_8] = x_9;
729 stack_depth_11 = stack_depth_8 + 1;
730 stack[stack_depth_11] = x_9;
731 stack_depth_13 = stack_depth_11 + 1;
732 stack[stack_depth_13] = 2;
733 stack_depth_15 = stack_depth_13 + 1;
734 stack_depth_16 = stack_depth_15 + -1;
735 y_17 = stack[stack_depth_16];
736 stack_depth_18 = stack_depth_16 + -1;
737 x_19 = stack[stack_depth_18];
738 _20 = x_19 < y_17;
739 _21 = (signed int) _20;
740 stack[stack_depth_18] = _21;
741 stack_depth_23 = stack_depth_18 + 1;
742 stack_depth_24 = stack_depth_23 + -1;
743 x_25 = stack[stack_depth_24];
744 if (x_25 != 0)
745 goto <bb 4> (instr9);
746 else
747 goto <bb 3> (instr4);
748
749 instr4:
750 /* DUP */:
751 stack_depth_26 = stack_depth_24 + -1;
752 x_27 = stack[stack_depth_26];
753 stack[stack_depth_26] = x_27;
754 stack_depth_29 = stack_depth_26 + 1;
755 stack[stack_depth_29] = x_27;
756 stack_depth_31 = stack_depth_29 + 1;
757 stack[stack_depth_31] = 1;
758 stack_depth_33 = stack_depth_31 + 1;
759 stack_depth_34 = stack_depth_33 + -1;
760 y_35 = stack[stack_depth_34];
761 stack_depth_36 = stack_depth_34 + -1;
762 x_37 = stack[stack_depth_36];
763 _38 = x_37 - y_35;
764 stack[stack_depth_36] = _38;
765 stack_depth_40 = stack_depth_36 + 1;
766 stack_depth_41 = stack_depth_40 + -1;
767 x_42 = stack[stack_depth_41];
768 _44 = factorial (x_42);
769 stack[stack_depth_41] = _44;
770 stack_depth_46 = stack_depth_41 + 1;
771 stack_depth_47 = stack_depth_46 + -1;
772 y_48 = stack[stack_depth_47];
773 stack_depth_49 = stack_depth_47 + -1;
774 x_50 = stack[stack_depth_49];
775 _51 = x_50 * y_48;
776 stack[stack_depth_49] = _51;
777 stack_depth_53 = stack_depth_49 + 1;
778
779 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
780 instr9:
781 /* RETURN */:
782 stack_depth_54 = stack_depth_1 + -1;
783 x_55 = stack[stack_depth_54];
784 _56 = x_55;
785 stack ={v} {CLOBBER};
786 return _56;
787
788 }
789
790 Note in the above how all the :c:type:`gcc_jit_block` instances we
791 created have been consolidated into just 3 blocks in GCC's internal
792 representation: ``initial``, ``instr4`` and ``instr9``.
793
794 Optimizing away stack manipulation
795 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
796 Recall our simple implementation of stack operations. Let's examine
797 how the stack operations are optimized away.
798
799 After a pass of constant-propagation, the depth of the stack at each
800 opcode can be determined at compile-time:
801
802 .. code-block:: console
803
804 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
805
806 .. code-block:: c
807
808 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
809
810 factorial (signed int arg)
811 {
812 signed int stack[8];
813 signed int stack_depth;
814 signed int x;
815 signed int y;
816 <unnamed type> _20;
817 signed int _21;
818 signed int _38;
819 signed int _44;
820 signed int _51;
821
822 initial:
823 stack[0] = arg_5(D);
824 x_9 = stack[0];
825 stack[0] = x_9;
826 stack[1] = x_9;
827 stack[2] = 2;
828 y_17 = stack[2];
829 x_19 = stack[1];
830 _20 = x_19 < y_17;
831 _21 = (signed int) _20;
832 stack[1] = _21;
833 x_25 = stack[1];
834 if (x_25 != 0)
835 goto <bb 4> (instr9);
836 else
837 goto <bb 3> (instr4);
838
839 instr4:
840 /* DUP */:
841 x_27 = stack[0];
842 stack[0] = x_27;
843 stack[1] = x_27;
844 stack[2] = 1;
845 y_35 = stack[2];
846 x_37 = stack[1];
847 _38 = x_37 - y_35;
848 stack[1] = _38;
849 x_42 = stack[1];
850 _44 = factorial (x_42);
851 stack[1] = _44;
852 y_48 = stack[1];
853 x_50 = stack[0];
854 _51 = x_50 * y_48;
855 stack[0] = _51;
856
857 instr9:
858 /* RETURN */:
859 x_55 = stack[0];
860 x_56 = x_55;
861 stack ={v} {CLOBBER};
862 return x_56;
863
864 }
865
866 Note how, in the above, all those ``stack_depth`` values are now just
867 constants: we're accessing specific stack locations at each opcode.
868
869 The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
870 out our "stack" array into individual elements:
871
872 .. code-block:: console
873
874 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
875
876 .. code-block:: c
877
878 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
879
880 Created a replacement for stack offset: 0, size: 32: stack$0
881 Created a replacement for stack offset: 32, size: 32: stack$1
882 Created a replacement for stack offset: 64, size: 32: stack$2
883
884 Symbols to be put in SSA form
885 { D.89 D.90 D.91 }
886 Incremental SSA update started at block: 0
887 Number of blocks in CFG: 5
888 Number of blocks to update: 4 ( 80%)
889
890
891 factorial (signed int arg)
892 {
893 signed int stack$2;
894 signed int stack$1;
895 signed int stack$0;
896 signed int stack[8];
897 signed int stack_depth;
898 signed int x;
899 signed int y;
900 <unnamed type> _20;
901 signed int _21;
902 signed int _38;
903 signed int _44;
904 signed int _51;
905
906 initial:
907 stack$0_45 = arg_5(D);
908 x_9 = stack$0_45;
909 stack$0_39 = x_9;
910 stack$1_32 = x_9;
911 stack$2_30 = 2;
912 y_17 = stack$2_30;
913 x_19 = stack$1_32;
914 _20 = x_19 < y_17;
915 _21 = (signed int) _20;
916 stack$1_28 = _21;
917 x_25 = stack$1_28;
918 if (x_25 != 0)
919 goto <bb 4> (instr9);
920 else
921 goto <bb 3> (instr4);
922
923 instr4:
924 /* DUP */:
925 x_27 = stack$0_39;
926 stack$0_22 = x_27;
927 stack$1_14 = x_27;
928 stack$2_12 = 1;
929 y_35 = stack$2_12;
930 x_37 = stack$1_14;
931 _38 = x_37 - y_35;
932 stack$1_10 = _38;
933 x_42 = stack$1_10;
934 _44 = factorial (x_42);
935 stack$1_6 = _44;
936 y_48 = stack$1_6;
937 x_50 = stack$0_22;
938 _51 = x_50 * y_48;
939 stack$0_1 = _51;
940
941 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
942 instr9:
943 /* RETURN */:
944 x_55 = stack$0_52;
945 x_56 = x_55;
946 stack ={v} {CLOBBER};
947 return x_56;
948
949 }
950
951 Hence at this point, all those pushes and pops of the stack are now
952 simply assignments to specific temporary variables.
953
954 After some copy propagation, the stack manipulation has been completely
955 optimized away:
956
957 .. code-block:: console
958
959 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
960
961 .. code-block:: c
962
963 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
964
965 factorial (signed int arg)
966 {
967 signed int stack$2;
968 signed int stack$1;
969 signed int stack$0;
970 signed int stack[8];
971 signed int stack_depth;
972 signed int x;
973 signed int y;
974 <unnamed type> _20;
975 signed int _21;
976 signed int _38;
977 signed int _44;
978 signed int _51;
979
980 initial:
981 stack$0_39 = arg_5(D);
982 _20 = arg_5(D) <= 1;
983 _21 = (signed int) _20;
984 if (_21 != 0)
985 goto <bb 4> (instr9);
986 else
987 goto <bb 3> (instr4);
988
989 instr4:
990 /* DUP */:
991 _38 = arg_5(D) + -1;
992 _44 = factorial (_38);
993 _51 = arg_5(D) * _44;
994 stack$0_1 = _51;
995
996 # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
997 instr9:
998 /* RETURN */:
999 stack ={v} {CLOBBER};
1000 return stack$0_52;
1001
1002 }
1003
1004 Later on, another pass finally eliminated ``stack_depth`` local and the
1005 unused parts of the `stack`` array altogether:
1006
1007 .. code-block:: console
1008
1009 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
1010
1011 .. code-block:: c
1012
1013 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1014
1015 Released 44 names, 314.29%, removed 44 holes
1016 factorial (signed int arg)
1017 {
1018 signed int stack$0;
1019 signed int mult_acc_1;
1020 <unnamed type> _5;
1021 signed int _6;
1022 signed int _7;
1023 signed int mul_tmp_10;
1024 signed int mult_acc_11;
1025 signed int mult_acc_13;
1026
1027 # arg_9 = PHI <arg_8(D)(0)>
1028 # mult_acc_13 = PHI <1(0)>
1029 initial:
1030
1031 <bb 5>:
1032 # arg_4 = PHI <arg_9(2), _7(3)>
1033 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1034 _5 = arg_4 <= 1;
1035 _6 = (signed int) _5;
1036 if (_6 != 0)
1037 goto <bb 4> (instr9);
1038 else
1039 goto <bb 3> (instr4);
1040
1041 instr4:
1042 /* DUP */:
1043 _7 = arg_4 + -1;
1044 mult_acc_11 = mult_acc_1 * arg_4;
1045 goto <bb 5>;
1046
1047 # stack$0_12 = PHI <arg_4(5)>
1048 instr9:
1049 /* RETURN */:
1050 mul_tmp_10 = mult_acc_1 * stack$0_12;
1051 return mul_tmp_10;
1052
1053 }
1054
1055
1056 Elimination of tail recursion
1057 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1058 Another significant optimization is the detection that the call to
1059 ``factorial`` is tail recursion, which can be eliminated in favor of
1060 an iteration:
1061
1062 .. code-block:: console
1063
1064 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1065
1066 .. code-block:: c
1067
1068 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1069
1070
1071 Symbols to be put in SSA form
1072 { D.88 }
1073 Incremental SSA update started at block: 0
1074 Number of blocks in CFG: 5
1075 Number of blocks to update: 4 ( 80%)
1076
1077
1078 factorial (signed int arg)
1079 {
1080 signed int stack$2;
1081 signed int stack$1;
1082 signed int stack$0;
1083 signed int stack[8];
1084 signed int stack_depth;
1085 signed int x;
1086 signed int y;
1087 signed int mult_acc_1;
1088 <unnamed type> _20;
1089 signed int _21;
1090 signed int _38;
1091 signed int mul_tmp_44;
1092 signed int mult_acc_51;
1093
1094 # arg_5 = PHI <arg_39(D)(0), _38(3)>
1095 # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1096 initial:
1097 _20 = arg_5 <= 1;
1098 _21 = (signed int) _20;
1099 if (_21 != 0)
1100 goto <bb 4> (instr9);
1101 else
1102 goto <bb 3> (instr4);
1103
1104 instr4:
1105 /* DUP */:
1106 _38 = arg_5 + -1;
1107 mult_acc_51 = mult_acc_1 * arg_5;
1108 goto <bb 2> (initial);
1109
1110 # stack$0_52 = PHI <arg_5(2)>
1111 instr9:
1112 /* RETURN */:
1113 stack ={v} {CLOBBER};
1114 mul_tmp_44 = mult_acc_1 * stack$0_52;
1115 return mul_tmp_44;
1116
1117 }