1 .. Copyright (C) 2014-2021 Free Software Foundation, Inc.
2 Originally contributed by David Malcolm <dmalcolm@redhat.com>
4 This is free software: you can redistribute it and/or modify it
5 under the terms of the GNU General Public License as published by
6 the Free Software Foundation, either version 3 of the License, or
7 (at your option) any later version.
9 This program is distributed in the hope that it will be useful, but
10 WITHOUT ANY WARRANTY; without even the implied warranty of
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12 General Public License for more details.
14 You should have received a copy of the GNU General Public License
15 along with this program. If not, see
16 <http://www.gnu.org/licenses/>.
18 .. default-domain:: cpp
20 Tutorial part 4: Adding JIT-compilation to a toy interpreter
21 ------------------------------------------------------------
22 In this example we construct a "toy" interpreter, and add JIT-compilation
28 It's a stack-based interpreter, and is intended as a (very simple) example
29 of the kind of bytecode interpreter seen in dynamic languages such as
32 For the sake of simplicity, our toy virtual machine is very limited:
34 * The only data type is `int`
36 * It can only work on one function at a time (so that the only
37 function call that can be made is to recurse).
39 * Functions can only take one parameter.
41 * Functions have a stack of `int` values.
43 * We'll implement function call within the interpreter by calling a
44 function in our implementation, rather than implementing our own
47 * The parser is only good enough to get the examples to work.
49 Naturally, a real interpreter would be much more complicated that this.
51 The following operations are supported:
53 ====================== ======================== =============== ==============
54 Operation Meaning Old Stack New Stack
55 ====================== ======================== =============== ==============
56 DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]``
57 ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]``
59 BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
61 BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]``
62 BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]``
63 BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]``
65 and push a nonzero/zero
67 RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]``
70 RETURN Return the top of the ``[x]`` ``[]``
72 PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]``
73 JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]``
76 ====================== ======================== =============== ==============
78 Programs can be interpreted, disassembled, and compiled to machine code.
80 The interpreter reads ``.toy`` scripts. Here's what a simple recursive
81 factorial program looks like, the script ``factorial.toy``.
82 The parser ignores lines beginning with a `#`.
84 .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy
88 The interpreter is a simple infinite loop with a big ``switch`` statement
89 based on what the next opcode is:
91 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
92 :start-after: /* Execute the given function. */
93 :end-before: /* JIT compilation. */
96 Compiling to machine code
97 *************************
98 We want to generate machine code that can be cast to this type and
99 then directly executed in-process:
101 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
102 :start-after: /* Functions are compiled to this function ptr type. */
103 :end-before: enum opcode
106 Our compiler isn't very sophisticated; it takes the implementation of
107 each opcode above, and maps it directly to the operations supported by
110 How should we handle the stack? In theory we could calculate what the
111 stack depth will be at each opcode, and optimize away the stack
112 manipulation "by hand". We'll see below that libgccjit is able to do
113 this for us, so we'll implement stack manipulation
114 in a direct way, by creating a ``stack`` array and ``stack_depth``
115 variables, local within the generated function, equivalent to this C code:
120 int stack[MAX_STACK_DEPTH];
122 We'll also have local variables ``x`` and ``y`` for use when implementing
123 the opcodes, equivalent to this:
130 This means our compiler has the following state:
132 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
133 :start-after: /* State. */
140 First we create our types:
142 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
143 :start-after: /* Create types. */
144 :end-before: /* The constant value 1. */
147 along with extracting a useful `int` constant:
149 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
150 :start-after: /* The constant value 1. */
151 :end-before: /* Create locations. */
154 We'll implement push and pop in terms of the ``stack`` array and
155 ``stack_depth``. Here are helper functions for adding statements to
156 a block, implementing pushing and popping values:
158 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
159 :start-after: /* Stack manipulation. */
160 :end-before: /* Create the context. */
163 We will support single-stepping through the generated code in the
164 debugger, so we need to create :type:`gccjit::location` instances, one
165 per operation in the source code. These will reference the lines of
166 e.g. ``factorial.toy``.
168 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
169 :start-after: /* Create locations. */
170 :end-before: /* Creating the function. */
173 Let's create the function itself. As usual, we create its parameter
174 first, then use the parameter to create the function:
176 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
177 :start-after: /* Creating the function. */
178 :end-before: /* Create stack lvalues. */
181 We create the locals within the function.
183 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
184 :start-after: /* Create stack lvalues. */
185 :end-before: /* 1st pass: create blocks, one per opcode.
188 Populating the function
189 ***********************
191 There's some one-time initialization, and the API treats the first block
192 you create as the entrypoint of the function, so we need to create that
195 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
196 :start-after: first. */
197 :end-before: /* Create a block per operation. */
200 We can now create blocks for each of the operations. Most of these will
201 be consolidated into larger blocks when the optimizer runs.
203 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
204 :start-after: /* Create a block per operation. */
205 :end-before: /* Populate the initial block. */
208 Now that we have a block it can jump to when it's done, we can populate
211 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
212 :start-after: /* Populate the initial block. */
213 :end-before: /* 2nd pass: fill in instructions. */
216 We can now populate the blocks for the individual operations. We loop
217 through them, adding instructions to their blocks:
219 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
220 :start-after: /* 2nd pass: fill in instructions. */
221 :end-before: /* Helper macros. */
224 We're going to have another big ``switch`` statement for implementing
225 the opcodes, this time for compiling them, rather than interpreting
226 them. It's helpful to have macros for implementing push and pop, so that
227 we can make the ``switch`` statement that's coming up look as much as
228 possible like the one above within the interpreter:
230 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
231 :start-after: /* Helper macros. */
232 :end-before: block.add_comment
237 A particularly clever implementation would have an *identical*
238 ``switch`` statement shared by the interpreter and the compiler, with
239 some preprocessor "magic". We're not doing that here, for the sake
242 When I first implemented this compiler, I accidentally missed an edit
243 when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
244 stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
247 To track this kind of thing down, we can use
248 :func:`gccjit::block::add_comment` to add descriptive comments
249 to the internal representation. This is invaluable when looking through
250 the generated IR for, say ``factorial``:
252 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
253 :start-after: PUSH_RVALUE (y)
254 :end-before: /* Handle the individual opcodes. */
257 We can now write the big ``switch`` statement that implements the
258 individual opcodes, populating the relevant block with statements:
260 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
261 :start-after: /* Handle the individual opcodes. */
262 :end-before: /* Go to the next block. */
265 Every block must be terminated, via a call to one of the
266 ``gccjit::block::end_with_`` entrypoints. This has been done for two
267 of the opcodes, but we need to do it for the other ones, by jumping
270 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
271 :start-after: /* Go to the next block. */
272 :end-before: /* end of loop on PC locations. */
275 This is analogous to simply incrementing the program counter.
277 Verifying the control flow graph
278 ********************************
279 Having finished looping over the blocks, the context is complete.
281 As before, we can verify that the control flow and statements are sane by
282 using :func:`gccjit::function::dump_to_dot`:
286 fn.dump_to_dot ("/tmp/factorial.dot");
288 and viewing the result. Note how the label names, comments, and
289 variable names show up in the dump, to make it easier to spot
290 errors in our compiler.
292 .. figure:: ../../intro/factorial.png
293 :alt: image of a control flow graph
295 Compiling the context
296 *********************
297 Having finished looping over the blocks and populating them with
298 statements, the context is complete.
300 We can now compile it, extract machine code from the result, and
303 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
304 :start-after: /* Wrapper around a gcc_jit_result *. */
305 :end-before: /* Functions are compiled to this function ptr type. */
308 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
309 :start-after: /* JIT-compilation. */
310 :end-before: return 0;
313 Single-stepping through the generated code
314 ******************************************
316 It's possible to debug the generated code. To do this we need to both:
318 * Set up source code locations for our statements, so that we can
319 meaningfully step through the code. We did this above by
320 calling :func:`gccjit::context::new_location` and using the
323 * Enable the generation of debugging information, by setting
324 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
325 :type:`gccjit::context` via
326 :func:`gccjit::context::set_bool_option`:
330 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1);
332 Having done this, we can put a breakpoint on the generated function:
334 .. code-block:: console
336 $ gdb --args ./toyvm factorial.toy 10
337 (gdb) break factorial
338 Function "factorial" not defined.
339 Make breakpoint pending on future shared library load? (y or [n]) y
340 Breakpoint 1 (factorial) pending.
342 Breakpoint 1, factorial (arg=10) at factorial.toy:14
345 We've set up location information, which references ``factorial.toy``.
346 This allows us to use e.g. ``list`` to see where we are in the script:
348 .. code-block:: console
357 15 # stack: [arg, arg]
362 and to step through the function, examining the data:
364 .. code-block:: console
371 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
372 (gdb) print stack_depth
375 You'll see that the parts of the ``stack`` array that haven't been
376 touched yet are uninitialized.
380 Turning on optimizations may lead to unpredictable results when
381 stepping through the generated code: the execution may appear to
382 "jump around" the source code. This is analogous to turning up the
383 optimization level in a regular compiler.
385 Examining the generated code
386 ****************************
388 How good is the optimized code?
390 We can turn up optimizations, by calling
391 :cpp:func:`gccjit::context::set_int_option` with
392 :c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
396 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3);
398 One of GCC's internal representations is called "gimple". A dump of the
399 initial gimple representation of the code can be seen by setting:
403 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1);
405 With optimization on and source locations displayed, this gives:
407 .. We'll use "c" for gimple dumps
411 factorial (signed int arg)
421 signed int stack_depth;
428 stack[stack_depth] = arg;
429 stack_depth = stack_depth + 1;
433 stack_depth = stack_depth + -1;
434 x = stack[stack_depth];
435 stack[stack_depth] = x;
436 stack_depth = stack_depth + 1;
437 stack[stack_depth] = x;
438 stack_depth = stack_depth + 1;
442 stack[stack_depth] = 2;
443 stack_depth = stack_depth + 1;
448 You can see the generated machine code in assembly form via:
452 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1);
453 result = ctxt.compile ();
455 which shows that (on this x86_64 box) the compiler has unrolled the loop
456 and is using MMX instructions to perform several multiplications
466 .type factorial, @function
469 .file 1 "factorial.toy"
480 leal 0(,%rcx,4), %esi
488 movd -16(%rsp), %xmm0
491 movd -16(%rsp), %xmm1
494 movd -12(%rsp), %xmm4
495 movd -16(%rsp), %xmm6
496 punpckldq %xmm4, %xmm0
497 movdqa .LC1(%rip), %xmm4
498 punpckldq %xmm6, %xmm1
499 punpcklqdq %xmm0, %xmm1
500 movdqa .LC0(%rip), %xmm0
502 # etc - edited for brevity
504 This is clearly overkill for a function that will likely overflow the
505 ``int`` type before the vectorization is worthwhile - but then again, this
508 Turning down the optimization level to 2:
512 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2);
514 yields this code, which is simple enough to quote in its entirety:
522 .type factorial, @function
551 .size factorial, .-factorial
552 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
553 .section .note.GNU-stack,"",@progbits
555 Note that the stack pushing and popping have been eliminated, as has the
556 recursive call (in favor of an iteration).
558 Putting it all together
559 ***********************
561 The complete example can be seen in the source tree at
562 ``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc``
564 along with a Makefile and a couple of sample .toy scripts:
566 .. code-block:: console
569 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 .
570 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 ..
571 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy
572 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy
573 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile
574 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.cc
577 g++ -Wall -g -o toyvm toyvm.cc -lgccjit
579 $ ./toyvm factorial.toy 10
580 interpreter result: 3628800
581 compiler result: 3628800
583 $ ./toyvm fibonacci.toy 10
584 interpreter result: 55
587 Behind the curtain: How does our code get optimized?
588 ****************************************************
590 Our example is done, but you may be wondering about exactly how the
591 compiler turned what we gave it into the machine code seen above.
593 We can examine what the compiler is doing in detail by setting:
597 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1);
598 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1);
600 This will dump detailed information about the compiler's state to a
601 directory under ``/tmp``, and keep it from being cleaned up.
603 The precise names and their formats of these files is subject to change.
604 Higher optimization levels lead to more files.
605 Here's what I saw (edited for brevity; there were almost 200 files):
607 .. code-block:: console
609 intermediate files written to /tmp/libgccjit-KPQbGw
610 $ ls /tmp/libgccjit-KPQbGw/
612 fake.c.000i.type-inheritance
618 fake.c.014i.visibility
619 fake.c.015i.early_local_cleanups
623 The gimple code is converted into Static Single Assignment form,
624 with annotations for use when generating the debuginfo:
626 .. code-block:: console
628 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
632 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
634 factorial (signed int arg)
637 signed int stack_depth;
649 # DEBUG stack_depth => stack_depth_3
650 stack[stack_depth_3] = arg_5(D);
651 stack_depth_7 = stack_depth_3 + 1;
652 # DEBUG stack_depth => stack_depth_7
653 # DEBUG instr0 => NULL
654 # DEBUG /* DUP */ => NULL
655 stack_depth_8 = stack_depth_7 + -1;
656 # DEBUG stack_depth => stack_depth_8
657 x_9 = stack[stack_depth_8];
659 stack[stack_depth_8] = x_9;
660 stack_depth_11 = stack_depth_8 + 1;
661 # DEBUG stack_depth => stack_depth_11
662 stack[stack_depth_11] = x_9;
663 stack_depth_13 = stack_depth_11 + 1;
664 # DEBUG stack_depth => stack_depth_13
665 # DEBUG instr1 => NULL
666 # DEBUG /* PUSH_CONST */ => NULL
667 stack[stack_depth_13] = 2;
669 /* etc; edited for brevity */
671 We can perhaps better see the code by turning off
672 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
675 .. code-block:: console
677 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
681 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
683 factorial (signed int arg)
686 signed int stack_depth;
698 stack[stack_depth_3] = arg_5(D);
699 stack_depth_7 = stack_depth_3 + 1;
700 stack_depth_8 = stack_depth_7 + -1;
701 x_9 = stack[stack_depth_8];
702 stack[stack_depth_8] = x_9;
703 stack_depth_11 = stack_depth_8 + 1;
704 stack[stack_depth_11] = x_9;
705 stack_depth_13 = stack_depth_11 + 1;
706 stack[stack_depth_13] = 2;
707 stack_depth_15 = stack_depth_13 + 1;
708 stack_depth_16 = stack_depth_15 + -1;
709 y_17 = stack[stack_depth_16];
710 stack_depth_18 = stack_depth_16 + -1;
711 x_19 = stack[stack_depth_18];
713 _21 = (signed int) _20;
714 stack[stack_depth_18] = _21;
715 stack_depth_23 = stack_depth_18 + 1;
716 stack_depth_24 = stack_depth_23 + -1;
717 x_25 = stack[stack_depth_24];
719 goto <bb 4> (instr9);
721 goto <bb 3> (instr4);
725 stack_depth_26 = stack_depth_24 + -1;
726 x_27 = stack[stack_depth_26];
727 stack[stack_depth_26] = x_27;
728 stack_depth_29 = stack_depth_26 + 1;
729 stack[stack_depth_29] = x_27;
730 stack_depth_31 = stack_depth_29 + 1;
731 stack[stack_depth_31] = 1;
732 stack_depth_33 = stack_depth_31 + 1;
733 stack_depth_34 = stack_depth_33 + -1;
734 y_35 = stack[stack_depth_34];
735 stack_depth_36 = stack_depth_34 + -1;
736 x_37 = stack[stack_depth_36];
738 stack[stack_depth_36] = _38;
739 stack_depth_40 = stack_depth_36 + 1;
740 stack_depth_41 = stack_depth_40 + -1;
741 x_42 = stack[stack_depth_41];
742 _44 = factorial (x_42);
743 stack[stack_depth_41] = _44;
744 stack_depth_46 = stack_depth_41 + 1;
745 stack_depth_47 = stack_depth_46 + -1;
746 y_48 = stack[stack_depth_47];
747 stack_depth_49 = stack_depth_47 + -1;
748 x_50 = stack[stack_depth_49];
750 stack[stack_depth_49] = _51;
751 stack_depth_53 = stack_depth_49 + 1;
753 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
756 stack_depth_54 = stack_depth_1 + -1;
757 x_55 = stack[stack_depth_54];
759 stack ={v} {CLOBBER};
764 Note in the above how all the :type:`gccjit::block` instances we
765 created have been consolidated into just 3 blocks in GCC's internal
766 representation: ``initial``, ``instr4`` and ``instr9``.
768 Optimizing away stack manipulation
769 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
770 Recall our simple implementation of stack operations. Let's examine
771 how the stack operations are optimized away.
773 After a pass of constant-propagation, the depth of the stack at each
774 opcode can be determined at compile-time:
776 .. code-block:: console
778 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
782 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
784 factorial (signed int arg)
787 signed int stack_depth;
805 _21 = (signed int) _20;
809 goto <bb 4> (instr9);
811 goto <bb 3> (instr4);
824 _44 = factorial (x_42);
835 stack ={v} {CLOBBER};
840 Note how, in the above, all those ``stack_depth`` values are now just
841 constants: we're accessing specific stack locations at each opcode.
843 The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
844 out our "stack" array into individual elements:
846 .. code-block:: console
848 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
852 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
854 Created a replacement for stack offset: 0, size: 32: stack$0
855 Created a replacement for stack offset: 32, size: 32: stack$1
856 Created a replacement for stack offset: 64, size: 32: stack$2
858 Symbols to be put in SSA form
860 Incremental SSA update started at block: 0
861 Number of blocks in CFG: 5
862 Number of blocks to update: 4 ( 80%)
865 factorial (signed int arg)
871 signed int stack_depth;
881 stack$0_45 = arg_5(D);
889 _21 = (signed int) _20;
893 goto <bb 4> (instr9);
895 goto <bb 3> (instr4);
908 _44 = factorial (x_42);
915 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
920 stack ={v} {CLOBBER};
925 Hence at this point, all those pushes and pops of the stack are now
926 simply assignments to specific temporary variables.
928 After some copy propagation, the stack manipulation has been completely
931 .. code-block:: console
933 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
937 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
939 factorial (signed int arg)
945 signed int stack_depth;
955 stack$0_39 = arg_5(D);
957 _21 = (signed int) _20;
959 goto <bb 4> (instr9);
961 goto <bb 3> (instr4);
966 _44 = factorial (_38);
967 _51 = arg_5(D) * _44;
970 # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
973 stack ={v} {CLOBBER};
978 Later on, another pass finally eliminated ``stack_depth`` local and the
979 unused parts of the `stack`` array altogether:
981 .. code-block:: console
983 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
987 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
989 Released 44 names, 314.29%, removed 44 holes
990 factorial (signed int arg)
993 signed int mult_acc_1;
997 signed int mul_tmp_10;
998 signed int mult_acc_11;
999 signed int mult_acc_13;
1001 # arg_9 = PHI <arg_8(D)(0)>
1002 # mult_acc_13 = PHI <1(0)>
1006 # arg_4 = PHI <arg_9(2), _7(3)>
1007 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1009 _6 = (signed int) _5;
1011 goto <bb 4> (instr9);
1013 goto <bb 3> (instr4);
1018 mult_acc_11 = mult_acc_1 * arg_4;
1021 # stack$0_12 = PHI <arg_4(5)>
1024 mul_tmp_10 = mult_acc_1 * stack$0_12;
1030 Elimination of tail recursion
1031 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1032 Another significant optimization is the detection that the call to
1033 ``factorial`` is tail recursion, which can be eliminated in favor of
1036 .. code-block:: console
1038 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1042 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1045 Symbols to be put in SSA form
1047 Incremental SSA update started at block: 0
1048 Number of blocks in CFG: 5
1049 Number of blocks to update: 4 ( 80%)
1052 factorial (signed int arg)
1057 signed int stack[8];
1058 signed int stack_depth;
1061 signed int mult_acc_1;
1065 signed int mul_tmp_44;
1066 signed int mult_acc_51;
1068 # arg_5 = PHI <arg_39(D)(0), _38(3)>
1069 # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1072 _21 = (signed int) _20;
1074 goto <bb 4> (instr9);
1076 goto <bb 3> (instr4);
1081 mult_acc_51 = mult_acc_1 * arg_5;
1082 goto <bb 2> (initial);
1084 # stack$0_52 = PHI <arg_5(2)>
1087 stack ={v} {CLOBBER};
1088 mul_tmp_44 = mult_acc_1 * stack$0_52;