]> git.ipfire.org Git - thirdparty/gcc.git/blob - gcc/jit/docs/cp/intro/tutorial04.rst
Update copyright years.
[thirdparty/gcc.git] / gcc / jit / docs / cp / intro / tutorial04.rst
1 .. Copyright (C) 2014-2016 Free Software Foundation, Inc.
2 Originally contributed by David Malcolm <dmalcolm@redhat.com>
3
4 This is free software: you can redistribute it and/or modify it
5 under the terms of the GNU General Public License as published by
6 the Free Software Foundation, either version 3 of the License, or
7 (at your option) any later version.
8
9 This program is distributed in the hope that it will be useful, but
10 WITHOUT ANY WARRANTY; without even the implied warranty of
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12 General Public License for more details.
13
14 You should have received a copy of the GNU General Public License
15 along with this program. If not, see
16 <http://www.gnu.org/licenses/>.
17
18 .. default-domain:: cpp
19
20 Tutorial part 4: Adding JIT-compilation to a toy interpreter
21 ------------------------------------------------------------
22 In this example we construct a "toy" interpreter, and add JIT-compilation
23 to it.
24
25 Our toy interpreter
26 *******************
27
28 It's a stack-based interpreter, and is intended as a (very simple) example
29 of the kind of bytecode interpreter seen in dynamic languages such as
30 Python, Ruby etc.
31
32 For the sake of simplicity, our toy virtual machine is very limited:
33
34 * The only data type is `int`
35
36 * It can only work on one function at a time (so that the only
37 function call that can be made is to recurse).
38
39 * Functions can only take one parameter.
40
41 * Functions have a stack of `int` values.
42
43 * We'll implement function call within the interpreter by calling a
44 function in our implementation, rather than implementing our own
45 frame stack.
46
47 * The parser is only good enough to get the examples to work.
48
49 Naturally, a real interpreter would be much more complicated that this.
50
51 The following operations are supported:
52
53 ====================== ======================== =============== ==============
54 Operation Meaning Old Stack New Stack
55 ====================== ======================== =============== ==============
56 DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]``
57 ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]``
58 of stack.
59 BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]``
60 on the stack.
61 BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]``
62 BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]``
63 BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]``
64 elements on the stack
65 and push a nonzero/zero
66 if (x<y).
67 RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]``
68 of the stack, and
69 popping the result.
70 RETURN Return the top of the ``[x]`` ``[]``
71 stack.
72 PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]``
73 JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]``
74 nonzero, jump to
75 ``arg``.
76 ====================== ======================== =============== ==============
77
78 Programs can be interpreted, disassembled, and compiled to machine code.
79
80 The interpreter reads ``.toy`` scripts. Here's what a simple recursive
81 factorial program looks like, the script ``factorial.toy``.
82 The parser ignores lines beginning with a `#`.
83
84 .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy
85 :lines: 1-
86 :language: c
87
88 The interpreter is a simple infinite loop with a big ``switch`` statement
89 based on what the next opcode is:
90
91 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
92 :start-after: /* Execute the given function. */
93 :end-before: /* JIT compilation. */
94 :language: c++
95
96 Compiling to machine code
97 *************************
98 We want to generate machine code that can be cast to this type and
99 then directly executed in-process:
100
101 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
102 :start-after: /* Functions are compiled to this function ptr type. */
103 :end-before: enum opcode
104 :language: c++
105
106 Our compiler isn't very sophisticated; it takes the implementation of
107 each opcode above, and maps it directly to the operations supported by
108 the libgccjit API.
109
110 How should we handle the stack? In theory we could calculate what the
111 stack depth will be at each opcode, and optimize away the stack
112 manipulation "by hand". We'll see below that libgccjit is able to do
113 this for us, so we'll implement stack manipulation
114 in a direct way, by creating a ``stack`` array and ``stack_depth``
115 variables, local within the generated function, equivalent to this C code:
116
117 .. code-block:: c
118
119 int stack_depth;
120 int stack[MAX_STACK_DEPTH];
121
122 We'll also have local variables ``x`` and ``y`` for use when implementing
123 the opcodes, equivalent to this:
124
125 .. code-block:: c
126
127 int x;
128 int y;
129
130 This means our compiler has the following state:
131
132 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
133 :start-after: /* State. */
134 :end-before: };
135 :language: c++
136
137 Setting things up
138 *****************
139
140 First we create our types:
141
142 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
143 :start-after: /* Create types. */
144 :end-before: /* The constant value 1. */
145 :language: c++
146
147 along with extracting a useful `int` constant:
148
149 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
150 :start-after: /* The constant value 1. */
151 :end-before: /* Create locations. */
152 :language: c++
153
154 We'll implement push and pop in terms of the ``stack`` array and
155 ``stack_depth``. Here are helper functions for adding statements to
156 a block, implementing pushing and popping values:
157
158 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
159 :start-after: /* Stack manipulation. */
160 :end-before: /* Create the context. */
161 :language: c++
162
163 We will support single-stepping through the generated code in the
164 debugger, so we need to create :type:`gccjit::location` instances, one
165 per operation in the source code. These will reference the lines of
166 e.g. ``factorial.toy``.
167
168 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
169 :start-after: /* Create locations. */
170 :end-before: /* Creating the function. */
171 :language: c++
172
173 Let's create the function itself. As usual, we create its parameter
174 first, then use the parameter to create the function:
175
176 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
177 :start-after: /* Creating the function. */
178 :end-before: /* Create stack lvalues. */
179 :language: c++
180
181 We create the locals within the function.
182
183 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
184 :start-after: /* Create stack lvalues. */
185 :end-before: /* 1st pass: create blocks, one per opcode.
186 :language: c++
187
188 Populating the function
189 ***********************
190
191 There's some one-time initialization, and the API treats the first block
192 you create as the entrypoint of the function, so we need to create that
193 block first:
194
195 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
196 :start-after: first. */
197 :end-before: /* Create a block per operation. */
198 :language: c++
199
200 We can now create blocks for each of the operations. Most of these will
201 be consolidated into larger blocks when the optimizer runs.
202
203 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
204 :start-after: /* Create a block per operation. */
205 :end-before: /* Populate the initial block. */
206 :language: c++
207
208 Now that we have a block it can jump to when it's done, we can populate
209 the initial block:
210
211 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
212 :start-after: /* Populate the initial block. */
213 :end-before: /* 2nd pass: fill in instructions. */
214 :language: c++
215
216 We can now populate the blocks for the individual operations. We loop
217 through them, adding instructions to their blocks:
218
219 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
220 :start-after: /* 2nd pass: fill in instructions. */
221 :end-before: /* Helper macros. */
222 :language: c++
223
224 We're going to have another big ``switch`` statement for implementing
225 the opcodes, this time for compiling them, rather than interpreting
226 them. It's helpful to have macros for implementing push and pop, so that
227 we can make the ``switch`` statement that's coming up look as much as
228 possible like the one above within the interpreter:
229
230 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
231 :start-after: /* Helper macros. */
232 :end-before: block.add_comment
233 :language: c++
234
235 .. note::
236
237 A particularly clever implementation would have an *identical*
238 ``switch`` statement shared by the interpreter and the compiler, with
239 some preprocessor "magic". We're not doing that here, for the sake
240 of simplicity.
241
242 When I first implemented this compiler, I accidentally missed an edit
243 when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the
244 stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y``
245 uninitialized.
246
247 To track this kind of thing down, we can use
248 :func:`gccjit::block::add_comment` to add descriptive comments
249 to the internal representation. This is invaluable when looking through
250 the generated IR for, say ``factorial``:
251
252 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
253 :start-after: PUSH_RVALUE (y)
254 :end-before: /* Handle the individual opcodes. */
255 :language: c++
256
257 We can now write the big ``switch`` statement that implements the
258 individual opcodes, populating the relevant block with statements:
259
260 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
261 :start-after: /* Handle the individual opcodes. */
262 :end-before: /* Go to the next block. */
263 :language: c++
264
265 Every block must be terminated, via a call to one of the
266 ``gccjit::block::end_with_`` entrypoints. This has been done for two
267 of the opcodes, but we need to do it for the other ones, by jumping
268 to the next block.
269
270 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
271 :start-after: /* Go to the next block. */
272 :end-before: /* end of loop on PC locations. */
273 :language: c++
274
275 This is analogous to simply incrementing the program counter.
276
277 Verifying the control flow graph
278 ********************************
279 Having finished looping over the blocks, the context is complete.
280
281 As before, we can verify that the control flow and statements are sane by
282 using :func:`gccjit::function::dump_to_dot`:
283
284 .. code-block:: c++
285
286 fn.dump_to_dot ("/tmp/factorial.dot");
287
288 and viewing the result. Note how the label names, comments, and
289 variable names show up in the dump, to make it easier to spot
290 errors in our compiler.
291
292 .. figure:: ../../intro/factorial.png
293 :alt: image of a control flow graph
294
295 Compiling the context
296 *********************
297 Having finished looping over the blocks and populating them with
298 statements, the context is complete.
299
300 We can now compile it, and extract machine code from the result:
301
302 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
303 :start-after: /* We've now finished populating the context. Compile it. */
304 :end-before: /* (this leaks "result" and "funcname") */
305 :language: c++
306
307 We can now run the result:
308
309 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc
310 :start-after: /* JIT-compilation. */
311 :end-before: return 0;
312 :language: c++
313
314 Single-stepping through the generated code
315 ******************************************
316
317 It's possible to debug the generated code. To do this we need to both:
318
319 * Set up source code locations for our statements, so that we can
320 meaningfully step through the code. We did this above by
321 calling :func:`gccjit::context::new_location` and using the
322 results.
323
324 * Enable the generation of debugging information, by setting
325 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the
326 :type:`gccjit::context` via
327 :func:`gccjit::context::set_bool_option`:
328
329 .. code-block:: c++
330
331 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1);
332
333 Having done this, we can put a breakpoint on the generated function:
334
335 .. code-block:: console
336
337 $ gdb --args ./toyvm factorial.toy 10
338 (gdb) break factorial
339 Function "factorial" not defined.
340 Make breakpoint pending on future shared library load? (y or [n]) y
341 Breakpoint 1 (factorial) pending.
342 (gdb) run
343 Breakpoint 1, factorial (arg=10) at factorial.toy:14
344 14 DUP
345
346 We've set up location information, which references ``factorial.toy``.
347 This allows us to use e.g. ``list`` to see where we are in the script:
348
349 .. code-block:: console
350
351 (gdb) list
352 9
353 10 # Initial state:
354 11 # stack: [arg]
355 12
356 13 # 0:
357 14 DUP
358 15 # stack: [arg, arg]
359 16
360 17 # 1:
361 18 PUSH_CONST 2
362
363 and to step through the function, examining the data:
364
365 .. code-block:: console
366
367 (gdb) n
368 18 PUSH_CONST 2
369 (gdb) n
370 22 BINARY_COMPARE_LT
371 (gdb) print stack
372 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0}
373 (gdb) print stack_depth
374 $6 = 3
375
376 You'll see that the parts of the ``stack`` array that haven't been
377 touched yet are uninitialized.
378
379 .. note::
380
381 Turning on optimizations may lead to unpredictable results when
382 stepping through the generated code: the execution may appear to
383 "jump around" the source code. This is analogous to turning up the
384 optimization level in a regular compiler.
385
386 Examining the generated code
387 ****************************
388
389 How good is the optimized code?
390
391 We can turn up optimizations, by calling
392 :cpp:func:`gccjit::context::set_int_option` with
393 :c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`:
394
395 .. code-block:: c++
396
397 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3);
398
399 One of GCC's internal representations is called "gimple". A dump of the
400 initial gimple representation of the code can be seen by setting:
401
402 .. code-block:: c++
403
404 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1);
405
406 With optimization on and source locations displayed, this gives:
407
408 .. We'll use "c" for gimple dumps
409
410 .. code-block:: c
411
412 factorial (signed int arg)
413 {
414 <unnamed type> D.80;
415 signed int D.81;
416 signed int D.82;
417 signed int D.83;
418 signed int D.84;
419 signed int D.85;
420 signed int y;
421 signed int x;
422 signed int stack_depth;
423 signed int stack[8];
424
425 try
426 {
427 initial:
428 stack_depth = 0;
429 stack[stack_depth] = arg;
430 stack_depth = stack_depth + 1;
431 goto instr0;
432 instr0:
433 /* DUP */:
434 stack_depth = stack_depth + -1;
435 x = stack[stack_depth];
436 stack[stack_depth] = x;
437 stack_depth = stack_depth + 1;
438 stack[stack_depth] = x;
439 stack_depth = stack_depth + 1;
440 goto instr1;
441 instr1:
442 /* PUSH_CONST */:
443 stack[stack_depth] = 2;
444 stack_depth = stack_depth + 1;
445 goto instr2;
446
447 /* etc */
448
449 You can see the generated machine code in assembly form via:
450
451 .. code-block:: c++
452
453 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1);
454 result = ctxt.compile ();
455
456 which shows that (on this x86_64 box) the compiler has unrolled the loop
457 and is using MMX instructions to perform several multiplications
458 simultaneously:
459
460 .. code-block:: gas
461
462 .file "fake.c"
463 .text
464 .Ltext0:
465 .p2align 4,,15
466 .globl factorial
467 .type factorial, @function
468 factorial:
469 .LFB0:
470 .file 1 "factorial.toy"
471 .loc 1 14 0
472 .cfi_startproc
473 .LVL0:
474 .L2:
475 .loc 1 26 0
476 cmpl $1, %edi
477 jle .L13
478 leal -1(%rdi), %edx
479 movl %edx, %ecx
480 shrl $2, %ecx
481 leal 0(,%rcx,4), %esi
482 testl %esi, %esi
483 je .L14
484 cmpl $9, %edx
485 jbe .L14
486 leal -2(%rdi), %eax
487 movl %eax, -16(%rsp)
488 leal -3(%rdi), %eax
489 movd -16(%rsp), %xmm0
490 movl %edi, -16(%rsp)
491 movl %eax, -12(%rsp)
492 movd -16(%rsp), %xmm1
493 xorl %eax, %eax
494 movl %edx, -16(%rsp)
495 movd -12(%rsp), %xmm4
496 movd -16(%rsp), %xmm6
497 punpckldq %xmm4, %xmm0
498 movdqa .LC1(%rip), %xmm4
499 punpckldq %xmm6, %xmm1
500 punpcklqdq %xmm0, %xmm1
501 movdqa .LC0(%rip), %xmm0
502 jmp .L5
503 # etc - edited for brevity
504
505 This is clearly overkill for a function that will likely overflow the
506 ``int`` type before the vectorization is worthwhile - but then again, this
507 is a toy example.
508
509 Turning down the optimization level to 2:
510
511 .. code-block:: c++
512
513 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2);
514
515 yields this code, which is simple enough to quote in its entirety:
516
517 .. code-block:: gas
518
519 .file "fake.c"
520 .text
521 .p2align 4,,15
522 .globl factorial
523 .type factorial, @function
524 factorial:
525 .LFB0:
526 .cfi_startproc
527 .L2:
528 cmpl $1, %edi
529 jle .L8
530 movl $1, %edx
531 jmp .L4
532 .p2align 4,,10
533 .p2align 3
534 .L6:
535 movl %eax, %edi
536 .L4:
537 .L5:
538 leal -1(%rdi), %eax
539 imull %edi, %edx
540 cmpl $1, %eax
541 jne .L6
542 .L3:
543 .L7:
544 imull %edx, %eax
545 ret
546 .L8:
547 movl %edi, %eax
548 movl $1, %edx
549 jmp .L7
550 .cfi_endproc
551 .LFE0:
552 .size factorial, .-factorial
553 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})"
554 .section .note.GNU-stack,"",@progbits
555
556 Note that the stack pushing and popping have been eliminated, as has the
557 recursive call (in favor of an iteration).
558
559 Putting it all together
560 ***********************
561
562 The complete example can be seen in the source tree at
563 ``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc``
564
565 along with a Makefile and a couple of sample .toy scripts:
566
567 .. code-block:: console
568
569 $ ls -al
570 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 .
571 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 ..
572 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy
573 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy
574 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile
575 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.cc
576
577 $ make toyvm
578 g++ -Wall -g -o toyvm toyvm.cc -lgccjit
579
580 $ ./toyvm factorial.toy 10
581 interpreter result: 3628800
582 compiler result: 3628800
583
584 $ ./toyvm fibonacci.toy 10
585 interpreter result: 55
586 compiler result: 55
587
588 Behind the curtain: How does our code get optimized?
589 ****************************************************
590
591 Our example is done, but you may be wondering about exactly how the
592 compiler turned what we gave it into the machine code seen above.
593
594 We can examine what the compiler is doing in detail by setting:
595
596 .. code-block:: c++
597
598 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1);
599 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1);
600
601 This will dump detailed information about the compiler's state to a
602 directory under ``/tmp``, and keep it from being cleaned up.
603
604 The precise names and their formats of these files is subject to change.
605 Higher optimization levels lead to more files.
606 Here's what I saw (edited for brevity; there were almost 200 files):
607
608 .. code-block:: console
609
610 intermediate files written to /tmp/libgccjit-KPQbGw
611 $ ls /tmp/libgccjit-KPQbGw/
612 fake.c.000i.cgraph
613 fake.c.000i.type-inheritance
614 fake.c.004t.gimple
615 fake.c.007t.omplower
616 fake.c.008t.lower
617 fake.c.011t.eh
618 fake.c.012t.cfg
619 fake.c.014i.visibility
620 fake.c.015i.early_local_cleanups
621 fake.c.016t.ssa
622 # etc
623
624 The gimple code is converted into Static Single Assignment form,
625 with annotations for use when generating the debuginfo:
626
627 .. code-block:: console
628
629 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa
630
631 .. code-block:: c
632
633 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
634
635 factorial (signed int arg)
636 {
637 signed int stack[8];
638 signed int stack_depth;
639 signed int x;
640 signed int y;
641 <unnamed type> _20;
642 signed int _21;
643 signed int _38;
644 signed int _44;
645 signed int _51;
646 signed int _56;
647
648 initial:
649 stack_depth_3 = 0;
650 # DEBUG stack_depth => stack_depth_3
651 stack[stack_depth_3] = arg_5(D);
652 stack_depth_7 = stack_depth_3 + 1;
653 # DEBUG stack_depth => stack_depth_7
654 # DEBUG instr0 => NULL
655 # DEBUG /* DUP */ => NULL
656 stack_depth_8 = stack_depth_7 + -1;
657 # DEBUG stack_depth => stack_depth_8
658 x_9 = stack[stack_depth_8];
659 # DEBUG x => x_9
660 stack[stack_depth_8] = x_9;
661 stack_depth_11 = stack_depth_8 + 1;
662 # DEBUG stack_depth => stack_depth_11
663 stack[stack_depth_11] = x_9;
664 stack_depth_13 = stack_depth_11 + 1;
665 # DEBUG stack_depth => stack_depth_13
666 # DEBUG instr1 => NULL
667 # DEBUG /* PUSH_CONST */ => NULL
668 stack[stack_depth_13] = 2;
669
670 /* etc; edited for brevity */
671
672 We can perhaps better see the code by turning off
673 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG``
674 statements, giving:
675
676 .. code-block:: console
677
678 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa
679
680 .. code-block:: c
681
682 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
683
684 factorial (signed int arg)
685 {
686 signed int stack[8];
687 signed int stack_depth;
688 signed int x;
689 signed int y;
690 <unnamed type> _20;
691 signed int _21;
692 signed int _38;
693 signed int _44;
694 signed int _51;
695 signed int _56;
696
697 initial:
698 stack_depth_3 = 0;
699 stack[stack_depth_3] = arg_5(D);
700 stack_depth_7 = stack_depth_3 + 1;
701 stack_depth_8 = stack_depth_7 + -1;
702 x_9 = stack[stack_depth_8];
703 stack[stack_depth_8] = x_9;
704 stack_depth_11 = stack_depth_8 + 1;
705 stack[stack_depth_11] = x_9;
706 stack_depth_13 = stack_depth_11 + 1;
707 stack[stack_depth_13] = 2;
708 stack_depth_15 = stack_depth_13 + 1;
709 stack_depth_16 = stack_depth_15 + -1;
710 y_17 = stack[stack_depth_16];
711 stack_depth_18 = stack_depth_16 + -1;
712 x_19 = stack[stack_depth_18];
713 _20 = x_19 < y_17;
714 _21 = (signed int) _20;
715 stack[stack_depth_18] = _21;
716 stack_depth_23 = stack_depth_18 + 1;
717 stack_depth_24 = stack_depth_23 + -1;
718 x_25 = stack[stack_depth_24];
719 if (x_25 != 0)
720 goto <bb 4> (instr9);
721 else
722 goto <bb 3> (instr4);
723
724 instr4:
725 /* DUP */:
726 stack_depth_26 = stack_depth_24 + -1;
727 x_27 = stack[stack_depth_26];
728 stack[stack_depth_26] = x_27;
729 stack_depth_29 = stack_depth_26 + 1;
730 stack[stack_depth_29] = x_27;
731 stack_depth_31 = stack_depth_29 + 1;
732 stack[stack_depth_31] = 1;
733 stack_depth_33 = stack_depth_31 + 1;
734 stack_depth_34 = stack_depth_33 + -1;
735 y_35 = stack[stack_depth_34];
736 stack_depth_36 = stack_depth_34 + -1;
737 x_37 = stack[stack_depth_36];
738 _38 = x_37 - y_35;
739 stack[stack_depth_36] = _38;
740 stack_depth_40 = stack_depth_36 + 1;
741 stack_depth_41 = stack_depth_40 + -1;
742 x_42 = stack[stack_depth_41];
743 _44 = factorial (x_42);
744 stack[stack_depth_41] = _44;
745 stack_depth_46 = stack_depth_41 + 1;
746 stack_depth_47 = stack_depth_46 + -1;
747 y_48 = stack[stack_depth_47];
748 stack_depth_49 = stack_depth_47 + -1;
749 x_50 = stack[stack_depth_49];
750 _51 = x_50 * y_48;
751 stack[stack_depth_49] = _51;
752 stack_depth_53 = stack_depth_49 + 1;
753
754 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)>
755 instr9:
756 /* RETURN */:
757 stack_depth_54 = stack_depth_1 + -1;
758 x_55 = stack[stack_depth_54];
759 _56 = x_55;
760 stack ={v} {CLOBBER};
761 return _56;
762
763 }
764
765 Note in the above how all the :type:`gccjit::block` instances we
766 created have been consolidated into just 3 blocks in GCC's internal
767 representation: ``initial``, ``instr4`` and ``instr9``.
768
769 Optimizing away stack manipulation
770 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
771 Recall our simple implementation of stack operations. Let's examine
772 how the stack operations are optimized away.
773
774 After a pass of constant-propagation, the depth of the stack at each
775 opcode can be determined at compile-time:
776
777 .. code-block:: console
778
779 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1
780
781 .. code-block:: c
782
783 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
784
785 factorial (signed int arg)
786 {
787 signed int stack[8];
788 signed int stack_depth;
789 signed int x;
790 signed int y;
791 <unnamed type> _20;
792 signed int _21;
793 signed int _38;
794 signed int _44;
795 signed int _51;
796
797 initial:
798 stack[0] = arg_5(D);
799 x_9 = stack[0];
800 stack[0] = x_9;
801 stack[1] = x_9;
802 stack[2] = 2;
803 y_17 = stack[2];
804 x_19 = stack[1];
805 _20 = x_19 < y_17;
806 _21 = (signed int) _20;
807 stack[1] = _21;
808 x_25 = stack[1];
809 if (x_25 != 0)
810 goto <bb 4> (instr9);
811 else
812 goto <bb 3> (instr4);
813
814 instr4:
815 /* DUP */:
816 x_27 = stack[0];
817 stack[0] = x_27;
818 stack[1] = x_27;
819 stack[2] = 1;
820 y_35 = stack[2];
821 x_37 = stack[1];
822 _38 = x_37 - y_35;
823 stack[1] = _38;
824 x_42 = stack[1];
825 _44 = factorial (x_42);
826 stack[1] = _44;
827 y_48 = stack[1];
828 x_50 = stack[0];
829 _51 = x_50 * y_48;
830 stack[0] = _51;
831
832 instr9:
833 /* RETURN */:
834 x_55 = stack[0];
835 x_56 = x_55;
836 stack ={v} {CLOBBER};
837 return x_56;
838
839 }
840
841 Note how, in the above, all those ``stack_depth`` values are now just
842 constants: we're accessing specific stack locations at each opcode.
843
844 The "esra" pass ("Early Scalar Replacement of Aggregates") breaks
845 out our "stack" array into individual elements:
846
847 .. code-block:: console
848
849 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra
850
851 .. code-block:: c
852
853 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
854
855 Created a replacement for stack offset: 0, size: 32: stack$0
856 Created a replacement for stack offset: 32, size: 32: stack$1
857 Created a replacement for stack offset: 64, size: 32: stack$2
858
859 Symbols to be put in SSA form
860 { D.89 D.90 D.91 }
861 Incremental SSA update started at block: 0
862 Number of blocks in CFG: 5
863 Number of blocks to update: 4 ( 80%)
864
865
866 factorial (signed int arg)
867 {
868 signed int stack$2;
869 signed int stack$1;
870 signed int stack$0;
871 signed int stack[8];
872 signed int stack_depth;
873 signed int x;
874 signed int y;
875 <unnamed type> _20;
876 signed int _21;
877 signed int _38;
878 signed int _44;
879 signed int _51;
880
881 initial:
882 stack$0_45 = arg_5(D);
883 x_9 = stack$0_45;
884 stack$0_39 = x_9;
885 stack$1_32 = x_9;
886 stack$2_30 = 2;
887 y_17 = stack$2_30;
888 x_19 = stack$1_32;
889 _20 = x_19 < y_17;
890 _21 = (signed int) _20;
891 stack$1_28 = _21;
892 x_25 = stack$1_28;
893 if (x_25 != 0)
894 goto <bb 4> (instr9);
895 else
896 goto <bb 3> (instr4);
897
898 instr4:
899 /* DUP */:
900 x_27 = stack$0_39;
901 stack$0_22 = x_27;
902 stack$1_14 = x_27;
903 stack$2_12 = 1;
904 y_35 = stack$2_12;
905 x_37 = stack$1_14;
906 _38 = x_37 - y_35;
907 stack$1_10 = _38;
908 x_42 = stack$1_10;
909 _44 = factorial (x_42);
910 stack$1_6 = _44;
911 y_48 = stack$1_6;
912 x_50 = stack$0_22;
913 _51 = x_50 * y_48;
914 stack$0_1 = _51;
915
916 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)>
917 instr9:
918 /* RETURN */:
919 x_55 = stack$0_52;
920 x_56 = x_55;
921 stack ={v} {CLOBBER};
922 return x_56;
923
924 }
925
926 Hence at this point, all those pushes and pops of the stack are now
927 simply assignments to specific temporary variables.
928
929 After some copy propagation, the stack manipulation has been completely
930 optimized away:
931
932 .. code-block:: console
933
934 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1
935
936 .. code-block:: c
937
938 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
939
940 factorial (signed int arg)
941 {
942 signed int stack$2;
943 signed int stack$1;
944 signed int stack$0;
945 signed int stack[8];
946 signed int stack_depth;
947 signed int x;
948 signed int y;
949 <unnamed type> _20;
950 signed int _21;
951 signed int _38;
952 signed int _44;
953 signed int _51;
954
955 initial:
956 stack$0_39 = arg_5(D);
957 _20 = arg_5(D) <= 1;
958 _21 = (signed int) _20;
959 if (_21 != 0)
960 goto <bb 4> (instr9);
961 else
962 goto <bb 3> (instr4);
963
964 instr4:
965 /* DUP */:
966 _38 = arg_5(D) + -1;
967 _44 = factorial (_38);
968 _51 = arg_5(D) * _44;
969 stack$0_1 = _51;
970
971 # stack$0_52 = PHI <arg_5(D)(2), _51(3)>
972 instr9:
973 /* RETURN */:
974 stack ={v} {CLOBBER};
975 return stack$0_52;
976
977 }
978
979 Later on, another pass finally eliminated ``stack_depth`` local and the
980 unused parts of the `stack`` array altogether:
981
982 .. code-block:: console
983
984 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa
985
986 .. code-block:: c
987
988 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
989
990 Released 44 names, 314.29%, removed 44 holes
991 factorial (signed int arg)
992 {
993 signed int stack$0;
994 signed int mult_acc_1;
995 <unnamed type> _5;
996 signed int _6;
997 signed int _7;
998 signed int mul_tmp_10;
999 signed int mult_acc_11;
1000 signed int mult_acc_13;
1001
1002 # arg_9 = PHI <arg_8(D)(0)>
1003 # mult_acc_13 = PHI <1(0)>
1004 initial:
1005
1006 <bb 5>:
1007 # arg_4 = PHI <arg_9(2), _7(3)>
1008 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)>
1009 _5 = arg_4 <= 1;
1010 _6 = (signed int) _5;
1011 if (_6 != 0)
1012 goto <bb 4> (instr9);
1013 else
1014 goto <bb 3> (instr4);
1015
1016 instr4:
1017 /* DUP */:
1018 _7 = arg_4 + -1;
1019 mult_acc_11 = mult_acc_1 * arg_4;
1020 goto <bb 5>;
1021
1022 # stack$0_12 = PHI <arg_4(5)>
1023 instr9:
1024 /* RETURN */:
1025 mul_tmp_10 = mult_acc_1 * stack$0_12;
1026 return mul_tmp_10;
1027
1028 }
1029
1030
1031 Elimination of tail recursion
1032 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1033 Another significant optimization is the detection that the call to
1034 ``factorial`` is tail recursion, which can be eliminated in favor of
1035 an iteration:
1036
1037 .. code-block:: console
1038
1039 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1
1040
1041 .. code-block:: c
1042
1043 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0)
1044
1045
1046 Symbols to be put in SSA form
1047 { D.88 }
1048 Incremental SSA update started at block: 0
1049 Number of blocks in CFG: 5
1050 Number of blocks to update: 4 ( 80%)
1051
1052
1053 factorial (signed int arg)
1054 {
1055 signed int stack$2;
1056 signed int stack$1;
1057 signed int stack$0;
1058 signed int stack[8];
1059 signed int stack_depth;
1060 signed int x;
1061 signed int y;
1062 signed int mult_acc_1;
1063 <unnamed type> _20;
1064 signed int _21;
1065 signed int _38;
1066 signed int mul_tmp_44;
1067 signed int mult_acc_51;
1068
1069 # arg_5 = PHI <arg_39(D)(0), _38(3)>
1070 # mult_acc_1 = PHI <1(0), mult_acc_51(3)>
1071 initial:
1072 _20 = arg_5 <= 1;
1073 _21 = (signed int) _20;
1074 if (_21 != 0)
1075 goto <bb 4> (instr9);
1076 else
1077 goto <bb 3> (instr4);
1078
1079 instr4:
1080 /* DUP */:
1081 _38 = arg_5 + -1;
1082 mult_acc_51 = mult_acc_1 * arg_5;
1083 goto <bb 2> (initial);
1084
1085 # stack$0_52 = PHI <arg_5(2)>
1086 instr9:
1087 /* RETURN */:
1088 stack ={v} {CLOBBER};
1089 mul_tmp_44 = mult_acc_1 * stack$0_52;
1090 return mul_tmp_44;
1091
1092 }