1 .. Copyright (C) 2014-2016 Free Software Foundation, Inc.
2 Originally contributed by David Malcolm <dmalcolm@redhat.com>
4 This is free software: you can redistribute it and/or modify it
5 under the terms of the GNU General Public License as published by
6 the Free Software Foundation, either version 3 of the License, or
7 (at your option) any later version.
9 This program is distributed in the hope that it will be useful, but
10 WITHOUT ANY WARRANTY; without even the implied warranty of
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12 General Public License for more details.
14 You should have received a copy of the GNU General Public License
15 along with this program. If not, see
16 <http://www.gnu.org/licenses/>.
18 .. default-domain:: cpp
20 Tutorial part 4: Adding JIT-compilation to a toy interpreter
21 ------------------------------------------------------------
22 In this example we construct a "toy" interpreter, and add JIT-compilation
28 It's a stack-based interpreter, and is intended as a (very simple) example
29 of the kind of bytecode interpreter seen in dynamic languages such as
32 For the sake of simplicity, our toy virtual machine is very limited:
34 * The only data type is `int`
36 * It can only work on one function at a time (so that the only
37 function call that can be made is to recurse).
39 * Functions can only take one parameter.
41 * Functions have a stack of `int` values.
43 * We'll implement function call within the interpreter by calling a
44 function in our implementation, rather than implementing our own
47 * The parser is only good enough to get the examples to work.
49 Naturally, a real interpreter would be much more complicated that this.
51 The following operations are supported:
53 ====================== ======================== =============== ==============
54 Operation Meaning Old Stack New Stack
55 ====================== ======================== =============== ==============
56 DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]``
57 ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]``
59 BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
61 BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]``
62 BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]``
63 BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]``
65 and push a nonzero/zero
67 RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]``
70 RETURN Return the top of the ``[x]`` ``[]``
72 PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]``
73 JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]``
76 ====================== ======================== =============== ==============
78 Programs can be interpreted, disassembled, and compiled to machine code.
80 The interpreter reads ``.toy`` scripts. Here's what a simple recursive
81 factorial program looks like, the script ``factorial.toy``.
82 The parser ignores lines beginning with a `#`.
84 .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy
88 The interpreter is a simple infinite loop with a big ``switch`` statement
89 based on what the next opcode is:
91 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
92 :start-after: /* Execute the given function. */
93 :end-before: /* JIT compilation. */
96 Compiling to machine code
97 *************************
98 We want to generate machine code that can be cast to this type and
99 then directly executed in-process:
101 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
102 :start-after: /* Functions are compiled to this function ptr type. */
103 :end-before: enum opcode
106 Our compiler isn't very sophisticated; it takes the implementation of
107 each opcode above, and maps it directly to the operations supported by
110 How should we handle the stack? In theory we could calculate what the
111 stack depth will be at each opcode, and optimize away the stack
112 manipulation "by hand". We'll see below that libgccjit is able to do
113 this for us, so we'll implement stack manipulation
114 in a direct way, by creating a ``stack`` array and ``stack_depth``
115 variables, local within the generated function, equivalent to this C code:
120 int stack[MAX_STACK_DEPTH];
122 We'll also have local variables ``x`` and ``y`` for use when implementing
123 the opcodes, equivalent to this:
130 This means our compiler has the following state:
132 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
133 :start-after: /* State. */
140 First we create our types:
142 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
143 :start-after: /* Create types. */
144 :end-before: /* The constant value 1. */
147 along with extracting a useful `int` constant:
149 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
150 :start-after: /* The constant value 1. */
151 :end-before: /* Create locations. */
154 We'll implement push and pop in terms of the ``stack`` array and
155 ``stack_depth``. Here are helper functions for adding statements to
156 a block, implementing pushing and popping values:
158 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
159 :start-after: /* Stack manipulation. */
160 :end-before: /* Create the context. */
163 We will support single-stepping through the generated code in the
164 debugger, so we need to create :type:`gccjit::location` instances, one
165 per operation in the source code. These will reference the lines of
166 e.g. ``factorial.toy``.
168 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
169 :start-after: /* Create locations. */
170 :end-before: /* Creating the function. */
173 Let's create the function itself. As usual, we create its parameter
174 first, then use the parameter to create the function:
176 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
177 :start-after: /* Creating the function. */
178 :end-before: /* Create stack lvalues. */
181 We create the locals within the function.
183 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
184 :start-after: /* Create stack lvalues. */
185 :end-before: /* 1st pass: create blocks, one per opcode.
188 Populating the function
189 ***********************
191 There's some one-time initialization, and the API treats the first block
192 you create as the entrypoint of the function, so we need to create that
195 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
196 :start-after: first. */
197 :end-before: /* Create a block per operation. */
200 We can now create blocks for each of the operations. Most of these will
201 be consolidated into larger blocks when the optimizer runs.
203 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
204 :start-after: /* Create a block per operation. */
205 :end-before: /* Populate the initial block. */
208 Now that we have a block it can jump to when it's done, we can populate
211 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
212 :start-after: /* Populate the initial block. */
213 :end-before: /* 2nd pass: fill in instructions. */
216 We can now populate the blocks for the individual operations. We loop
217 through them, adding instructions to their blocks:
219 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
220 :start-after: /* 2nd pass: fill in instructions. */
221 :end-before: /* Helper macros. */
224 We're going to have another big ``switch`` statement for implementing
225 the opcodes, this time for compiling them, rather than interpreting
226 them. It's helpful to have macros for implementing push and pop, so that
227 we can make the ``switch`` statement that's coming up look as much as
228 possible like the one above within the interpreter:
230 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
231 :start-after: /* Helper macros. */
232 :end-before: block.add_comment
237 A particularly clever implementation would have an *identical*
238 ``switch`` statement shared by the interpreter and the compiler, with
239 some preprocessor "magic". We're not doing that here, for the sake
242 When I first implemented this compiler, I accidentally missed an edit
243 when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
244 stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
247 To track this kind of thing down, we can use
248 :func:`gccjit::block::add_comment` to add descriptive comments
249 to the internal representation. This is invaluable when looking through
250 the generated IR for, say ``factorial``:
252 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
253 :start-after: PUSH_RVALUE (y)
254 :end-before: /* Handle the individual opcodes. */
257 We can now write the big ``switch`` statement that implements the
258 individual opcodes, populating the relevant block with statements:
260 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
261 :start-after: /* Handle the individual opcodes. */
262 :end-before: /* Go to the next block. */
265 Every block must be terminated, via a call to one of the
266 ``gccjit::block::end_with_`` entrypoints. This has been done for two
267 of the opcodes, but we need to do it for the other ones, by jumping
270 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
271 :start-after: /* Go to the next block. */
272 :end-before: /* end of loop on PC locations. */
275 This is analogous to simply incrementing the program counter.
277 Verifying the control flow graph
278 ********************************
279 Having finished looping over the blocks, the context is complete.
281 As before, we can verify that the control flow and statements are sane by
282 using :func:`gccjit::function::dump_to_dot`:
286 fn.dump_to_dot ("/tmp/factorial.dot");
288 and viewing the result. Note how the label names, comments, and
289 variable names show up in the dump, to make it easier to spot
290 errors in our compiler.
292 .. figure:: ../../intro/factorial.png
293 :alt: image of a control flow graph
295 Compiling the context
296 *********************
297 Having finished looping over the blocks and populating them with
298 statements, the context is complete.
300 We can now compile it, and extract machine code from the result:
302 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
303 :start-after: /* We've now finished populating the context. Compile it. */
304 :end-before: /* (this leaks "result" and "funcname") */
307 We can now run the result:
309 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
310 :start-after: /* JIT-compilation. */
311 :end-before: return 0;
314 Single-stepping through the generated code
315 ******************************************
317 It's possible to debug the generated code. To do this we need to both:
319 * Set up source code locations for our statements, so that we can
320 meaningfully step through the code. We did this above by
321 calling :func:`gccjit::context::new_location` and using the
324 * Enable the generation of debugging information, by setting
325 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
326 :type:`gccjit::context` via
327 :func:`gccjit::context::set_bool_option`:
331 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1);
333 Having done this, we can put a breakpoint on the generated function:
335 .. code-block:: console
337 $ gdb --args ./toyvm factorial.toy 10
338 (gdb) break factorial
339 Function "factorial" not defined.
340 Make breakpoint pending on future shared library load? (y or [n]) y
341 Breakpoint 1 (factorial) pending.
343 Breakpoint 1, factorial (arg=10) at factorial.toy:14
346 We've set up location information, which references ``factorial.toy``.
347 This allows us to use e.g. ``list`` to see where we are in the script:
349 .. code-block:: console
358 15 # stack: [arg, arg]
363 and to step through the function, examining the data:
365 .. code-block:: console
372 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
373 (gdb) print stack_depth
376 You'll see that the parts of the ``stack`` array that haven't been
377 touched yet are uninitialized.
381 Turning on optimizations may lead to unpredictable results when
382 stepping through the generated code: the execution may appear to
383 "jump around" the source code. This is analogous to turning up the
384 optimization level in a regular compiler.
386 Examining the generated code
387 ****************************
389 How good is the optimized code?
391 We can turn up optimizations, by calling
392 :cpp:func:`gccjit::context::set_int_option` with
393 :c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
397 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3);
399 One of GCC's internal representations is called "gimple". A dump of the
400 initial gimple representation of the code can be seen by setting:
404 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1);
406 With optimization on and source locations displayed, this gives:
408 .. We'll use "c" for gimple dumps
412 factorial (signed int arg)
422 signed int stack_depth;
429 stack[stack_depth] = arg;
430 stack_depth = stack_depth + 1;
434 stack_depth = stack_depth + -1;
435 x = stack[stack_depth];
436 stack[stack_depth] = x;
437 stack_depth = stack_depth + 1;
438 stack[stack_depth] = x;
439 stack_depth = stack_depth + 1;
443 stack[stack_depth] = 2;
444 stack_depth = stack_depth + 1;
449 You can see the generated machine code in assembly form via:
453 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1);
454 result = ctxt.compile ();
456 which shows that (on this x86_64 box) the compiler has unrolled the loop
457 and is using MMX instructions to perform several multiplications
467 .type factorial, @function
470 .file 1 "factorial.toy"
481 leal 0(,%rcx,4), %esi
489 movd -16(%rsp), %xmm0
492 movd -16(%rsp), %xmm1
495 movd -12(%rsp), %xmm4
496 movd -16(%rsp), %xmm6
497 punpckldq %xmm4, %xmm0
498 movdqa .LC1(%rip), %xmm4
499 punpckldq %xmm6, %xmm1
500 punpcklqdq %xmm0, %xmm1
501 movdqa .LC0(%rip), %xmm0
503 # etc - edited for brevity
505 This is clearly overkill for a function that will likely overflow the
506 ``int`` type before the vectorization is worthwhile - but then again, this
509 Turning down the optimization level to 2:
513 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2);
515 yields this code, which is simple enough to quote in its entirety:
523 .type factorial, @function
552 .size factorial, .-factorial
553 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
554 .section .note.GNU-stack,"",@progbits
556 Note that the stack pushing and popping have been eliminated, as has the
557 recursive call (in favor of an iteration).
559 Putting it all together
560 ***********************
562 The complete example can be seen in the source tree at
563 ``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc``
565 along with a Makefile and a couple of sample .toy scripts:
567 .. code-block:: console
570 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 .
571 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 ..
572 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy
573 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy
574 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile
575 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.cc
578 g++ -Wall -g -o toyvm toyvm.cc -lgccjit
580 $ ./toyvm factorial.toy 10
581 interpreter result: 3628800
582 compiler result: 3628800
584 $ ./toyvm fibonacci.toy 10
585 interpreter result: 55
588 Behind the curtain: How does our code get optimized?
589 ****************************************************
591 Our example is done, but you may be wondering about exactly how the
592 compiler turned what we gave it into the machine code seen above.
594 We can examine what the compiler is doing in detail by setting:
598 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1);
599 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1);
601 This will dump detailed information about the compiler's state to a
602 directory under ``/tmp``, and keep it from being cleaned up.
604 The precise names and their formats of these files is subject to change.
605 Higher optimization levels lead to more files.
606 Here's what I saw (edited for brevity; there were almost 200 files):
608 .. code-block:: console
610 intermediate files written to /tmp/libgccjit-KPQbGw
611 $ ls /tmp/libgccjit-KPQbGw/
613 fake.c.000i.type-inheritance
619 fake.c.014i.visibility
620 fake.c.015i.early_local_cleanups
624 The gimple code is converted into Static Single Assignment form,
625 with annotations for use when generating the debuginfo:
627 .. code-block:: console
629 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
633 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
635 factorial (signed int arg)
638 signed int stack_depth;
650 # DEBUG stack_depth => stack_depth_3
651 stack[stack_depth_3] = arg_5(D);
652 stack_depth_7 = stack_depth_3 + 1;
653 # DEBUG stack_depth => stack_depth_7
654 # DEBUG instr0 => NULL
655 # DEBUG /* DUP */ => NULL
656 stack_depth_8 = stack_depth_7 + -1;
657 # DEBUG stack_depth => stack_depth_8
658 x_9 = stack[stack_depth_8];
660 stack[stack_depth_8] = x_9;
661 stack_depth_11 = stack_depth_8 + 1;
662 # DEBUG stack_depth => stack_depth_11
663 stack[stack_depth_11] = x_9;
664 stack_depth_13 = stack_depth_11 + 1;
665 # DEBUG stack_depth => stack_depth_13
666 # DEBUG instr1 => NULL
667 # DEBUG /* PUSH_CONST */ => NULL
668 stack[stack_depth_13] = 2;
670 /* etc; edited for brevity */
672 We can perhaps better see the code by turning off
673 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
676 .. code-block:: console
678 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
682 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
684 factorial (signed int arg)
687 signed int stack_depth;
699 stack[stack_depth_3] = arg_5(D);
700 stack_depth_7 = stack_depth_3 + 1;
701 stack_depth_8 = stack_depth_7 + -1;
702 x_9 = stack[stack_depth_8];
703 stack[stack_depth_8] = x_9;
704 stack_depth_11 = stack_depth_8 + 1;
705 stack[stack_depth_11] = x_9;
706 stack_depth_13 = stack_depth_11 + 1;
707 stack[stack_depth_13] = 2;
708 stack_depth_15 = stack_depth_13 + 1;
709 stack_depth_16 = stack_depth_15 + -1;
710 y_17 = stack[stack_depth_16];
711 stack_depth_18 = stack_depth_16 + -1;
712 x_19 = stack[stack_depth_18];
714 _21 = (signed int) _20;
715 stack[stack_depth_18] = _21;
716 stack_depth_23 = stack_depth_18 + 1;
717 stack_depth_24 = stack_depth_23 + -1;
718 x_25 = stack[stack_depth_24];
720 goto <bb 4> (instr9);
722 goto <bb 3> (instr4);
726 stack_depth_26 = stack_depth_24 + -1;
727 x_27 = stack[stack_depth_26];
728 stack[stack_depth_26] = x_27;
729 stack_depth_29 = stack_depth_26 + 1;
730 stack[stack_depth_29] = x_27;
731 stack_depth_31 = stack_depth_29 + 1;
732 stack[stack_depth_31] = 1;
733 stack_depth_33 = stack_depth_31 + 1;
734 stack_depth_34 = stack_depth_33 + -1;
735 y_35 = stack[stack_depth_34];
736 stack_depth_36 = stack_depth_34 + -1;
737 x_37 = stack[stack_depth_36];
739 stack[stack_depth_36] = _38;
740 stack_depth_40 = stack_depth_36 + 1;
741 stack_depth_41 = stack_depth_40 + -1;
742 x_42 = stack[stack_depth_41];
743 _44 = factorial (x_42);
744 stack[stack_depth_41] = _44;
745 stack_depth_46 = stack_depth_41 + 1;
746 stack_depth_47 = stack_depth_46 + -1;
747 y_48 = stack[stack_depth_47];
748 stack_depth_49 = stack_depth_47 + -1;
749 x_50 = stack[stack_depth_49];
751 stack[stack_depth_49] = _51;
752 stack_depth_53 = stack_depth_49 + 1;
754 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
757 stack_depth_54 = stack_depth_1 + -1;
758 x_55 = stack[stack_depth_54];
760 stack ={v} {CLOBBER};
765 Note in the above how all the :type:`gccjit::block` instances we
766 created have been consolidated into just 3 blocks in GCC's internal
767 representation: ``initial``, ``instr4`` and ``instr9``.
769 Optimizing away stack manipulation
770 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
771 Recall our simple implementation of stack operations. Let's examine
772 how the stack operations are optimized away.
774 After a pass of constant-propagation, the depth of the stack at each
775 opcode can be determined at compile-time:
777 .. code-block:: console
779 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
783 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
785 factorial (signed int arg)
788 signed int stack_depth;
806 _21 = (signed int) _20;
810 goto <bb 4> (instr9);
812 goto <bb 3> (instr4);
825 _44 = factorial (x_42);
836 stack ={v} {CLOBBER};
841 Note how, in the above, all those ``stack_depth`` values are now just
842 constants: we're accessing specific stack locations at each opcode.
844 The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
845 out our "stack" array into individual elements:
847 .. code-block:: console
849 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
853 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
855 Created a replacement for stack offset: 0, size: 32: stack$0
856 Created a replacement for stack offset: 32, size: 32: stack$1
857 Created a replacement for stack offset: 64, size: 32: stack$2
859 Symbols to be put in SSA form
861 Incremental SSA update started at block: 0
862 Number of blocks in CFG: 5
863 Number of blocks to update: 4 ( 80%)
866 factorial (signed int arg)
872 signed int stack_depth;
882 stack$0_45 = arg_5(D);
890 _21 = (signed int) _20;
894 goto <bb 4> (instr9);
896 goto <bb 3> (instr4);
909 _44 = factorial (x_42);
916 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
921 stack ={v} {CLOBBER};
926 Hence at this point, all those pushes and pops of the stack are now
927 simply assignments to specific temporary variables.
929 After some copy propagation, the stack manipulation has been completely
932 .. code-block:: console
934 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
938 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
940 factorial (signed int arg)
946 signed int stack_depth;
956 stack$0_39 = arg_5(D);
958 _21 = (signed int) _20;
960 goto <bb 4> (instr9);
962 goto <bb 3> (instr4);
967 _44 = factorial (_38);
968 _51 = arg_5(D) * _44;
971 # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
974 stack ={v} {CLOBBER};
979 Later on, another pass finally eliminated ``stack_depth`` local and the
980 unused parts of the `stack`` array altogether:
982 .. code-block:: console
984 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
988 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
990 Released 44 names, 314.29%, removed 44 holes
991 factorial (signed int arg)
994 signed int mult_acc_1;
998 signed int mul_tmp_10;
999 signed int mult_acc_11;
1000 signed int mult_acc_13;
1002 # arg_9 = PHI <arg_8(D)(0)>
1003 # mult_acc_13 = PHI <1(0)>
1007 # arg_4 = PHI <arg_9(2), _7(3)>
1008 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1010 _6 = (signed int) _5;
1012 goto <bb 4> (instr9);
1014 goto <bb 3> (instr4);
1019 mult_acc_11 = mult_acc_1 * arg_4;
1022 # stack$0_12 = PHI <arg_4(5)>
1025 mul_tmp_10 = mult_acc_1 * stack$0_12;
1031 Elimination of tail recursion
1032 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1033 Another significant optimization is the detection that the call to
1034 ``factorial`` is tail recursion, which can be eliminated in favor of
1037 .. code-block:: console
1039 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1043 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1046 Symbols to be put in SSA form
1048 Incremental SSA update started at block: 0
1049 Number of blocks in CFG: 5
1050 Number of blocks to update: 4 ( 80%)
1053 factorial (signed int arg)
1058 signed int stack[8];
1059 signed int stack_depth;
1062 signed int mult_acc_1;
1066 signed int mul_tmp_44;
1067 signed int mult_acc_51;
1069 # arg_5 = PHI <arg_39(D)(0), _38(3)>
1070 # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1073 _21 = (signed int) _20;
1075 goto <bb 4> (instr9);
1077 goto <bb 3> (instr4);
1082 mult_acc_51 = mult_acc_1 * arg_5;
1083 goto <bb 2> (initial);
1085 # stack$0_52 = PHI <arg_5(2)>
1088 stack ={v} {CLOBBER};
1089 mul_tmp_44 = mult_acc_1 * stack$0_52;