1 .. Copyright (C) 2014-2016 Free Software Foundation, Inc.
2 Originally contributed by David Malcolm <dmalcolm@redhat.com>
4 This is free software: you can redistribute it and/or modify it
5 under the terms of the GNU General Public License as published by
6 the Free Software Foundation, either version 3 of the License, or
7 (at your option) any later version.
9 This program is distributed in the hope that it will be useful, but
10 WITHOUT ANY WARRANTY; without even the implied warranty of
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12 General Public License for more details.
14 You should have received a copy of the GNU General Public License
15 along with this program. If not, see
16 <http://www.gnu.org/licenses/>.
18 Tutorial part 4: Adding JIT-compilation to a toy interpreter
19 ------------------------------------------------------------
20 In this example we construct a "toy" interpreter, and add JIT-compilation
26 It's a stack-based interpreter, and is intended as a (very simple) example
27 of the kind of bytecode interpreter seen in dynamic languages such as
30 For the sake of simplicity, our toy virtual machine is very limited:
32 * The only data type is `int`
34 * It can only work on one function at a time (so that the only
35 function call that can be made is to recurse).
37 * Functions can only take one parameter.
39 * Functions have a stack of `int` values.
41 * We'll implement function call within the interpreter by calling a
42 function in our implementation, rather than implementing our own
45 * The parser is only good enough to get the examples to work.
47 Naturally, a real interpreter would be much more complicated that this.
49 The following operations are supported:
51 ====================== ======================== =============== ==============
52 Operation Meaning Old Stack New Stack
53 ====================== ======================== =============== ==============
54 DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]``
55 ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]``
57 BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
59 BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]``
60 BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]``
61 BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]``
63 and push a nonzero/zero
65 RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]``
68 RETURN Return the top of the ``[x]`` ``[]``
70 PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]``
71 JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]``
74 ====================== ======================== =============== ==============
76 Programs can be interpreted, disassembled, and compiled to machine code.
78 The interpreter reads ``.toy`` scripts. Here's what a simple recursive
79 factorial program looks like, the script ``factorial.toy``.
80 The parser ignores lines beginning with a `#`.
82 .. literalinclude:: ../examples/tut04-toyvm/factorial.toy
86 The interpreter is a simple infinite loop with a big ``switch`` statement
87 based on what the next opcode is:
89 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
90 :start-after: /* Execute the given function. */
91 :end-before: /* JIT compilation. */
94 Compiling to machine code
95 *************************
96 We want to generate machine code that can be cast to this type and
97 then directly executed in-process:
99 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
100 :start-after: /* Functions are compiled to this function ptr type. */
101 :end-before: enum opcode
104 The lifetime of the code is tied to that of a :c:type:`gcc_jit_result *`.
105 We'll handle this by bundling them up in a structure, so that we can
106 clean them up together by calling :c:func:`gcc_jit_result_release`:
108 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
109 :start-after: /* A struct to hold the compilation results. */
110 :end-before: /* The main compilation hook. */
113 Our compiler isn't very sophisticated; it takes the implementation of
114 each opcode above, and maps it directly to the operations supported by
117 How should we handle the stack? In theory we could calculate what the
118 stack depth will be at each opcode, and optimize away the stack
119 manipulation "by hand". We'll see below that libgccjit is able to do
120 this for us, so we'll implement stack manipulation
121 in a direct way, by creating a ``stack`` array and ``stack_depth``
122 variables, local within the generated function, equivalent to this C code:
127 int stack[MAX_STACK_DEPTH];
129 We'll also have local variables ``x`` and ``y`` for use when implementing
130 the opcodes, equivalent to this:
137 This means our compiler has the following state:
139 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
140 :start-after: /* JIT compilation. */
141 :end-before: /* Stack manipulation. */
147 First we create our types:
149 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
150 :start-after: /* Create types. */
151 :end-before: /* The constant value 1. */
154 along with extracting a useful `int` constant:
156 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
157 :start-after: /* The constant value 1. */
158 :end-before: /* Create locations. */
161 We'll implement push and pop in terms of the ``stack`` array and
162 ``stack_depth``. Here are helper functions for adding statements to
163 a block, implementing pushing and popping values:
165 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
166 :start-after: /* Stack manipulation. */
167 :end-before: /* A struct to hold the compilation results. */
170 We will support single-stepping through the generated code in the
171 debugger, so we need to create :c:type:`gcc_jit_location` instances, one
172 per operation in the source code. These will reference the lines of
173 e.g. ``factorial.toy``.
175 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
176 :start-after: /* Create locations. */
177 :end-before: /* Creating the function. */
180 Let's create the function itself. As usual, we create its parameter
181 first, then use the parameter to create the function:
183 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
184 :start-after: /* Creating the function. */
185 :end-before: /* Create stack lvalues. */
188 We create the locals within the function.
190 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
191 :start-after: /* Create stack lvalues. */
192 :end-before: /* 1st pass: create blocks, one per opcode.
195 Populating the function
196 ***********************
198 There's some one-time initialization, and the API treats the first block
199 you create as the entrypoint of the function, so we need to create that
202 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
203 :start-after: first. */
204 :end-before: /* Create a block per operation. */
207 We can now create blocks for each of the operations. Most of these will
208 be consolidated into larger blocks when the optimizer runs.
210 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
211 :start-after: /* Create a block per operation. */
212 :end-before: /* Populate the initial block. */
215 Now that we have a block it can jump to when it's done, we can populate
218 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
219 :start-after: /* Populate the initial block. */
220 :end-before: /* 2nd pass: fill in instructions. */
223 We can now populate the blocks for the individual operations. We loop
224 through them, adding instructions to their blocks:
226 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
227 :start-after: /* 2nd pass: fill in instructions. */
228 :end-before: /* Helper macros. */
231 We're going to have another big ``switch`` statement for implementing
232 the opcodes, this time for compiling them, rather than interpreting
233 them. It's helpful to have macros for implementing push and pop, so that
234 we can make the ``switch`` statement that's coming up look as much as
235 possible like the one above within the interpreter:
237 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
238 :start-after: /* Helper macros. */
239 :end-before: gcc_jit_block_add_comment
244 A particularly clever implementation would have an *identical*
245 ``switch`` statement shared by the interpreter and the compiler, with
246 some preprocessor "magic". We're not doing that here, for the sake
249 When I first implemented this compiler, I accidentally missed an edit
250 when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
251 stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
254 To track this kind of thing down, we can use
255 :c:func:`gcc_jit_block_add_comment` to add descriptive comments
256 to the internal representation. This is invaluable when looking through
257 the generated IR for, say ``factorial``:
259 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
260 :start-after: PUSH_RVALUE (gcc_jit_lvalue_as_rvalue (state.y))
261 :end-before: /* Handle the individual opcodes. */
264 We can now write the big ``switch`` statement that implements the
265 individual opcodes, populating the relevant block with statements:
267 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
268 :start-after: /* Handle the individual opcodes. */
269 :end-before: /* Go to the next block. */
272 Every block must be terminated, via a call to one of the
273 ``gcc_jit_block_end_with_`` entrypoints. This has been done for two
274 of the opcodes, but we need to do it for the other ones, by jumping
277 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
278 :start-after: /* Go to the next block. */
279 :end-before: /* end of loop on PC locations. */
282 This is analogous to simply incrementing the program counter.
284 Verifying the control flow graph
285 ********************************
286 Having finished looping over the blocks, the context is complete.
288 As before, we can verify that the control flow and statements are sane by
289 using :c:func:`gcc_jit_function_dump_to_dot`:
293 gcc_jit_function_dump_to_dot (state.fn, "/tmp/factorial.dot");
295 and viewing the result. Note how the label names, comments, and
296 variable names show up in the dump, to make it easier to spot
297 errors in our compiler.
299 .. figure:: factorial.png
300 :alt: image of a control flow graph
302 Compiling the context
303 *********************
304 Having finished looping over the blocks and populating them with
305 statements, the context is complete.
307 We can now compile it, and extract machine code from the result:
309 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
310 :start-after: /* We've now finished populating the context. Compile it. */
311 :end-before: /* (this leaks "result" and "funcname") */
314 We can now run the result:
316 .. literalinclude:: ../examples/tut04-toyvm/toyvm.c
317 :start-after: /* JIT-compilation. */
318 :end-before: return 0;
321 Single-stepping through the generated code
322 ******************************************
324 It's possible to debug the generated code. To do this we need to both:
326 * Set up source code locations for our statements, so that we can
327 meaningfully step through the code. We did this above by
328 calling :c:func:`gcc_jit_context_new_location` and using the
331 * Enable the generation of debugging information, by setting
332 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
333 :c:type:`gcc_jit_context` via
334 :c:func:`gcc_jit_context_set_bool_option`:
338 gcc_jit_context_set_bool_option (
340 GCC_JIT_BOOL_OPTION_DEBUGINFO,
343 Having done this, we can put a breakpoint on the generated function:
345 .. code-block:: console
347 $ gdb --args ./toyvm factorial.toy 10
348 (gdb) break factorial
349 Function "factorial" not defined.
350 Make breakpoint pending on future shared library load? (y or [n]) y
351 Breakpoint 1 (factorial) pending.
353 Breakpoint 1, factorial (arg=10) at factorial.toy:14
356 We've set up location information, which references ``factorial.toy``.
357 This allows us to use e.g. ``list`` to see where we are in the script:
359 .. code-block:: console
368 15 # stack: [arg, arg]
373 and to step through the function, examining the data:
375 .. code-block:: console
382 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
383 (gdb) print stack_depth
386 You'll see that the parts of the ``stack`` array that haven't been
387 touched yet are uninitialized.
391 Turning on optimizations may lead to unpredictable results when
392 stepping through the generated code: the execution may appear to
393 "jump around" the source code. This is analogous to turning up the
394 optimization level in a regular compiler.
396 Examining the generated code
397 ****************************
399 How good is the optimized code?
401 We can turn up optimizations, by calling
402 :c:func:`gcc_jit_context_set_int_option` with
403 :c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
407 gcc_jit_context_set_int_option (
409 GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
412 One of GCC's internal representations is called "gimple". A dump of the
413 initial gimple representation of the code can be seen by setting:
417 gcc_jit_context_set_bool_option (ctxt,
418 GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE,
421 With optimization on and source locations displayed, this gives:
423 .. We'll use "c" for gimple dumps
427 factorial (signed int arg)
437 signed int stack_depth;
444 stack[stack_depth] = arg;
445 stack_depth = stack_depth + 1;
449 stack_depth = stack_depth + -1;
450 x = stack[stack_depth];
451 stack[stack_depth] = x;
452 stack_depth = stack_depth + 1;
453 stack[stack_depth] = x;
454 stack_depth = stack_depth + 1;
458 stack[stack_depth] = 2;
459 stack_depth = stack_depth + 1;
464 You can see the generated machine code in assembly form via:
468 gcc_jit_context_set_bool_option (
470 GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE,
472 result = gcc_jit_context_compile (ctxt);
474 which shows that (on this x86_64 box) the compiler has unrolled the loop
475 and is using MMX instructions to perform several multiplications
485 .type factorial, @function
488 .file 1 "factorial.toy"
499 leal 0(,%rcx,4), %esi
507 movd -16(%rsp), %xmm0
510 movd -16(%rsp), %xmm1
513 movd -12(%rsp), %xmm4
514 movd -16(%rsp), %xmm6
515 punpckldq %xmm4, %xmm0
516 movdqa .LC1(%rip), %xmm4
517 punpckldq %xmm6, %xmm1
518 punpcklqdq %xmm0, %xmm1
519 movdqa .LC0(%rip), %xmm0
521 # etc - edited for brevity
523 This is clearly overkill for a function that will likely overflow the
524 ``int`` type before the vectorization is worthwhile - but then again, this
527 Turning down the optimization level to 2:
531 gcc_jit_context_set_int_option (
533 GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL,
536 yields this code, which is simple enough to quote in its entirety:
544 .type factorial, @function
573 .size factorial, .-factorial
574 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
575 .section .note.GNU-stack,"",@progbits
577 Note that the stack pushing and popping have been eliminated, as has the
578 recursive call (in favor of an iteration).
580 Putting it all together
581 ***********************
583 The complete example can be seen in the source tree at
584 ``gcc/jit/docs/examples/tut04-toyvm/toyvm.c``
586 along with a Makefile and a couple of sample .toy scripts:
588 .. code-block:: console
591 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 .
592 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 ..
593 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy
594 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy
595 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile
596 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.c
599 g++ -Wall -g -o toyvm toyvm.c -lgccjit
601 $ ./toyvm factorial.toy 10
602 interpreter result: 3628800
603 compiler result: 3628800
605 $ ./toyvm fibonacci.toy 10
606 interpreter result: 55
609 Behind the curtain: How does our code get optimized?
610 ****************************************************
612 Our example is done, but you may be wondering about exactly how the
613 compiler turned what we gave it into the machine code seen above.
615 We can examine what the compiler is doing in detail by setting:
619 gcc_jit_context_set_bool_option (state.ctxt,
620 GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING,
622 gcc_jit_context_set_bool_option (state.ctxt,
623 GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES,
626 This will dump detailed information about the compiler's state to a
627 directory under ``/tmp``, and keep it from being cleaned up.
629 The precise names and their formats of these files is subject to change.
630 Higher optimization levels lead to more files.
631 Here's what I saw (edited for brevity; there were almost 200 files):
633 .. code-block:: console
635 intermediate files written to /tmp/libgccjit-KPQbGw
636 $ ls /tmp/libgccjit-KPQbGw/
638 fake.c.000i.type-inheritance
644 fake.c.014i.visibility
645 fake.c.015i.early_local_cleanups
649 The gimple code is converted into Static Single Assignment form,
650 with annotations for use when generating the debuginfo:
652 .. code-block:: console
654 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
658 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
660 factorial (signed int arg)
663 signed int stack_depth;
675 # DEBUG stack_depth => stack_depth_3
676 stack[stack_depth_3] = arg_5(D);
677 stack_depth_7 = stack_depth_3 + 1;
678 # DEBUG stack_depth => stack_depth_7
679 # DEBUG instr0 => NULL
680 # DEBUG /* DUP */ => NULL
681 stack_depth_8 = stack_depth_7 + -1;
682 # DEBUG stack_depth => stack_depth_8
683 x_9 = stack[stack_depth_8];
685 stack[stack_depth_8] = x_9;
686 stack_depth_11 = stack_depth_8 + 1;
687 # DEBUG stack_depth => stack_depth_11
688 stack[stack_depth_11] = x_9;
689 stack_depth_13 = stack_depth_11 + 1;
690 # DEBUG stack_depth => stack_depth_13
691 # DEBUG instr1 => NULL
692 # DEBUG /* PUSH_CONST */ => NULL
693 stack[stack_depth_13] = 2;
695 /* etc; edited for brevity */
697 We can perhaps better see the code by turning off
698 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
701 .. code-block:: console
703 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
707 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
709 factorial (signed int arg)
712 signed int stack_depth;
724 stack[stack_depth_3] = arg_5(D);
725 stack_depth_7 = stack_depth_3 + 1;
726 stack_depth_8 = stack_depth_7 + -1;
727 x_9 = stack[stack_depth_8];
728 stack[stack_depth_8] = x_9;
729 stack_depth_11 = stack_depth_8 + 1;
730 stack[stack_depth_11] = x_9;
731 stack_depth_13 = stack_depth_11 + 1;
732 stack[stack_depth_13] = 2;
733 stack_depth_15 = stack_depth_13 + 1;
734 stack_depth_16 = stack_depth_15 + -1;
735 y_17 = stack[stack_depth_16];
736 stack_depth_18 = stack_depth_16 + -1;
737 x_19 = stack[stack_depth_18];
739 _21 = (signed int) _20;
740 stack[stack_depth_18] = _21;
741 stack_depth_23 = stack_depth_18 + 1;
742 stack_depth_24 = stack_depth_23 + -1;
743 x_25 = stack[stack_depth_24];
745 goto <bb 4> (instr9);
747 goto <bb 3> (instr4);
751 stack_depth_26 = stack_depth_24 + -1;
752 x_27 = stack[stack_depth_26];
753 stack[stack_depth_26] = x_27;
754 stack_depth_29 = stack_depth_26 + 1;
755 stack[stack_depth_29] = x_27;
756 stack_depth_31 = stack_depth_29 + 1;
757 stack[stack_depth_31] = 1;
758 stack_depth_33 = stack_depth_31 + 1;
759 stack_depth_34 = stack_depth_33 + -1;
760 y_35 = stack[stack_depth_34];
761 stack_depth_36 = stack_depth_34 + -1;
762 x_37 = stack[stack_depth_36];
764 stack[stack_depth_36] = _38;
765 stack_depth_40 = stack_depth_36 + 1;
766 stack_depth_41 = stack_depth_40 + -1;
767 x_42 = stack[stack_depth_41];
768 _44 = factorial (x_42);
769 stack[stack_depth_41] = _44;
770 stack_depth_46 = stack_depth_41 + 1;
771 stack_depth_47 = stack_depth_46 + -1;
772 y_48 = stack[stack_depth_47];
773 stack_depth_49 = stack_depth_47 + -1;
774 x_50 = stack[stack_depth_49];
776 stack[stack_depth_49] = _51;
777 stack_depth_53 = stack_depth_49 + 1;
779 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
782 stack_depth_54 = stack_depth_1 + -1;
783 x_55 = stack[stack_depth_54];
785 stack ={v} {CLOBBER};
790 Note in the above how all the :c:type:`gcc_jit_block` instances we
791 created have been consolidated into just 3 blocks in GCC's internal
792 representation: ``initial``, ``instr4`` and ``instr9``.
794 Optimizing away stack manipulation
795 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
796 Recall our simple implementation of stack operations. Let's examine
797 how the stack operations are optimized away.
799 After a pass of constant-propagation, the depth of the stack at each
800 opcode can be determined at compile-time:
802 .. code-block:: console
804 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
808 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
810 factorial (signed int arg)
813 signed int stack_depth;
831 _21 = (signed int) _20;
835 goto <bb 4> (instr9);
837 goto <bb 3> (instr4);
850 _44 = factorial (x_42);
861 stack ={v} {CLOBBER};
866 Note how, in the above, all those ``stack_depth`` values are now just
867 constants: we're accessing specific stack locations at each opcode.
869 The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
870 out our "stack" array into individual elements:
872 .. code-block:: console
874 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
878 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
880 Created a replacement for stack offset: 0, size: 32: stack$0
881 Created a replacement for stack offset: 32, size: 32: stack$1
882 Created a replacement for stack offset: 64, size: 32: stack$2
884 Symbols to be put in SSA form
886 Incremental SSA update started at block: 0
887 Number of blocks in CFG: 5
888 Number of blocks to update: 4 ( 80%)
891 factorial (signed int arg)
897 signed int stack_depth;
907 stack$0_45 = arg_5(D);
915 _21 = (signed int) _20;
919 goto <bb 4> (instr9);
921 goto <bb 3> (instr4);
934 _44 = factorial (x_42);
941 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
946 stack ={v} {CLOBBER};
951 Hence at this point, all those pushes and pops of the stack are now
952 simply assignments to specific temporary variables.
954 After some copy propagation, the stack manipulation has been completely
957 .. code-block:: console
959 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
963 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
965 factorial (signed int arg)
971 signed int stack_depth;
981 stack$0_39 = arg_5(D);
983 _21 = (signed int) _20;
985 goto <bb 4> (instr9);
987 goto <bb 3> (instr4);
992 _44 = factorial (_38);
993 _51 = arg_5(D) * _44;
996 # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
999 stack ={v} {CLOBBER};
1004 Later on, another pass finally eliminated ``stack_depth`` local and the
1005 unused parts of the `stack`` array altogether:
1007 .. code-block:: console
1009 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
1013 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1015 Released 44 names, 314.29%, removed 44 holes
1016 factorial (signed int arg)
1019 signed int mult_acc_1;
1023 signed int mul_tmp_10;
1024 signed int mult_acc_11;
1025 signed int mult_acc_13;
1027 # arg_9 = PHI <arg_8(D)(0)>
1028 # mult_acc_13 = PHI <1(0)>
1032 # arg_4 = PHI <arg_9(2), _7(3)>
1033 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1035 _6 = (signed int) _5;
1037 goto <bb 4> (instr9);
1039 goto <bb 3> (instr4);
1044 mult_acc_11 = mult_acc_1 * arg_4;
1047 # stack$0_12 = PHI <arg_4(5)>
1050 mul_tmp_10 = mult_acc_1 * stack$0_12;
1056 Elimination of tail recursion
1057 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1058 Another significant optimization is the detection that the call to
1059 ``factorial`` is tail recursion, which can be eliminated in favor of
1062 .. code-block:: console
1064 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1068 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1071 Symbols to be put in SSA form
1073 Incremental SSA update started at block: 0
1074 Number of blocks in CFG: 5
1075 Number of blocks to update: 4 ( 80%)
1078 factorial (signed int arg)
1083 signed int stack[8];
1084 signed int stack_depth;
1087 signed int mult_acc_1;
1091 signed int mul_tmp_44;
1092 signed int mult_acc_51;
1094 # arg_5 = PHI <arg_39(D)(0), _38(3)>
1095 # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1098 _21 = (signed int) _20;
1100 goto <bb 4> (instr9);
1102 goto <bb 3> (instr4);
1107 mult_acc_51 = mult_acc_1 * arg_5;
1108 goto <bb 2> (initial);
1110 # stack$0_52 = PHI <arg_5(2)>
1113 stack ={v} {CLOBBER};
1114 mul_tmp_44 = mult_acc_1 * stack$0_52;