]> git.ipfire.org Git - thirdparty/git.git/blob - Documentation/MyFirstObjectWalk.txt
Merge branch 'jc/unleak-core-excludesfile'
[thirdparty/git.git] / Documentation / MyFirstObjectWalk.txt
1 = My First Object Walk
2
3 == What's an Object Walk?
4
5 The object walk is a key concept in Git - this is the process that underpins
6 operations like object transfer and fsck. Beginning from a given commit, the
7 list of objects is found by walking parent relationships between commits (commit
8 X based on commit W) and containment relationships between objects (tree Y is
9 contained within commit X, and blob Z is located within tree Y, giving our
10 working tree for commit X something like `y/z.txt`).
11
12 A related concept is the revision walk, which is focused on commit objects and
13 their parent relationships and does not delve into other object types. The
14 revision walk is used for operations like `git log`.
15
16 === Related Reading
17
18 - `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
19 the revision walker in its various incarnations.
20 - `revision.h`
21 - https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
22 gives a good overview of the types of objects in Git and what your object
23 walk is really describing.
24
25 == Setting Up
26
27 Create a new branch from `master`.
28
29 ----
30 git checkout -b revwalk origin/master
31 ----
32
33 We'll put our fiddling into a new command. For fun, let's name it `git walken`.
34 Open up a new file `builtin/walken.c` and set up the command handler:
35
36 ----
37 /*
38 * "git walken"
39 *
40 * Part of the "My First Object Walk" tutorial.
41 */
42
43 #include "builtin.h"
44 #include "trace.h"
45
46 int cmd_walken(int argc, const char **argv, const char *prefix)
47 {
48 trace_printf(_("cmd_walken incoming...\n"));
49 return 0;
50 }
51 ----
52
53 NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
54 that it can be turned on or off at runtime. For the purposes of this
55 tutorial, we will write `walken` as though it is intended for use as
56 a "plumbing" command: that is, a command which is used primarily in
57 scripts, rather than interactively by humans (a "porcelain" command).
58 So we will send our debug output to `trace_printf()` instead.
59 When running, enable trace output by setting the environment variable `GIT_TRACE`.
60
61 Add usage text and `-h` handling, like all subcommands should consistently do
62 (our test suite will notice and complain if you fail to do so).
63 We'll need to include the `parse-options.h` header.
64
65 ----
66 #include "parse-options.h"
67
68 ...
69
70 int cmd_walken(int argc, const char **argv, const char *prefix)
71 {
72 const char * const walken_usage[] = {
73 N_("git walken"),
74 NULL,
75 };
76 struct option options[] = {
77 OPT_END()
78 };
79
80 argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
81
82 ...
83 }
84 ----
85
86 Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
87
88 ----
89 int cmd_walken(int argc, const char **argv, const char *prefix);
90 ----
91
92 Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
93 maintaining alphabetical ordering:
94
95 ----
96 { "walken", cmd_walken, RUN_SETUP },
97 ----
98
99 Add it to the `Makefile` near the line for `builtin/worktree.o`:
100
101 ----
102 BUILTIN_OBJS += builtin/walken.o
103 ----
104
105 Build and test out your command, without forgetting to ensure the `DEVELOPER`
106 flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
107
108 ----
109 $ echo DEVELOPER=1 >>config.mak
110 $ make
111 $ GIT_TRACE=1 ./bin-wrappers/git walken
112 ----
113
114 NOTE: For a more exhaustive overview of the new command process, take a look at
115 `Documentation/MyFirstContribution.txt`.
116
117 NOTE: A reference implementation can be found at
118 https://github.com/nasamuffin/git/tree/revwalk.
119
120 === `struct rev_cmdline_info`
121
122 The definition of `struct rev_cmdline_info` can be found in `revision.h`.
123
124 This struct is contained within the `rev_info` struct and is used to reflect
125 parameters provided by the user over the CLI.
126
127 `nr` represents the number of `rev_cmdline_entry` present in the array.
128
129 `alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is
130 used to track the allocated size of the list.
131
132 Per entry, we find:
133
134 `item` is the object provided upon which to base the object walk. Items in Git
135 can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
136
137 `name` is the object ID (OID) of the object - a hex string you may be familiar
138 with from using Git to organize your source in the past. Check the tutorial
139 mentioned above towards the top for a discussion of where the OID can come
140 from.
141
142 `whence` indicates some information about what to do with the parents of the
143 specified object. We'll explore this flag more later on; take a look at
144 `Documentation/revisions.txt` to get an idea of what could set the `whence`
145 value.
146
147 `flags` are used to hint the beginning of the revision walk and are the first
148 block under the `#include`s in `revision.h`. The most likely ones to be set in
149 the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
150 can be used during the walk, as well.
151
152 === `struct rev_info`
153
154 This one is quite a bit longer, and many fields are only used during the walk
155 by `revision.c` - not configuration options. Most of the configurable flags in
156 `struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
157 good idea to take some time and read through that document.
158
159 == Basic Commit Walk
160
161 First, let's see if we can replicate the output of `git log --oneline`. We'll
162 refer back to the implementation frequently to discover norms when performing
163 an object walk of our own.
164
165 To do so, we'll first find all the commits, in order, which preceded the current
166 commit. We'll extract the name and subject of the commit from each.
167
168 Ideally, we will also be able to find out which ones are currently at the tip of
169 various branches.
170
171 === Setting Up
172
173 Preparing for your object walk has some distinct stages.
174
175 1. Perform default setup for this mode, and others which may be invoked.
176 2. Check configuration files for relevant settings.
177 3. Set up the `rev_info` struct.
178 4. Tweak the initialized `rev_info` to suit the current walk.
179 5. Prepare the `rev_info` for the walk.
180 6. Iterate over the objects, processing each one.
181
182 ==== Default Setups
183
184 Before examining configuration files which may modify command behavior, set up
185 default state for switches or options your command may have. If your command
186 utilizes other Git components, ask them to set up their default states as well.
187 For instance, `git log` takes advantage of `grep` and `diff` functionality, so
188 its `init_log_defaults()` sets its own state (`decoration_style`) and asks
189 `grep` and `diff` to initialize themselves by calling each of their
190 initialization functions.
191
192 ==== Configuring From `.gitconfig`
193
194 Next, we should have a look at any relevant configuration settings (i.e.,
195 settings readable and settable from `git config`). This is done by providing a
196 callback to `git_config()`; within that callback, you can also invoke methods
197 from other components you may need that need to intercept these options. Your
198 callback will be invoked once per each configuration value which Git knows about
199 (global, local, worktree, etc.).
200
201 Similarly to the default values, we don't have anything to do here yet
202 ourselves; however, we should call `git_default_config()` if we aren't calling
203 any other existing config callbacks.
204
205 Add a new function to `builtin/walken.c`.
206 We'll also need to include the `config.h` header:
207
208 ----
209 #include "config.h"
210
211 ...
212
213 static int git_walken_config(const char *var, const char *value,
214 const struct config_context *ctx, void *cb)
215 {
216 /*
217 * For now, we don't have any custom configuration, so fall back to
218 * the default config.
219 */
220 return git_default_config(var, value, ctx, cb);
221 }
222 ----
223
224 Make sure to invoke `git_config()` with it in your `cmd_walken()`:
225
226 ----
227 int cmd_walken(int argc, const char **argv, const char *prefix)
228 {
229 ...
230
231 git_config(git_walken_config, NULL);
232
233 ...
234 }
235 ----
236
237 ==== Setting Up `rev_info`
238
239 Now that we've gathered external configuration and options, it's time to
240 initialize the `rev_info` object which we will use to perform the walk. This is
241 typically done by calling `repo_init_revisions()` with the repository you intend
242 to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
243 struct.
244
245 Add the `struct rev_info` and the `repo_init_revisions()` call.
246 We'll also need to include the `revision.h` header:
247
248 ----
249 #include "revision.h"
250
251 ...
252
253 int cmd_walken(int argc, const char **argv, const char *prefix)
254 {
255 /* This can go wherever you like in your declarations.*/
256 struct rev_info rev;
257 ...
258
259 /* This should go after the git_config() call. */
260 repo_init_revisions(the_repository, &rev, prefix);
261
262 ...
263 }
264 ----
265
266 ==== Tweaking `rev_info` For the Walk
267
268 We're getting close, but we're still not quite ready to go. Now that `rev` is
269 initialized, we can modify it to fit our needs. This is usually done within a
270 helper for clarity, so let's add one:
271
272 ----
273 static void final_rev_info_setup(struct rev_info *rev)
274 {
275 /*
276 * We want to mimic the appearance of `git log --oneline`, so let's
277 * force oneline format.
278 */
279 get_commit_format("oneline", rev);
280
281 /* Start our object walk at HEAD. */
282 add_head_to_pending(rev);
283 }
284 ----
285
286 [NOTE]
287 ====
288 Instead of using the shorthand `add_head_to_pending()`, you could do
289 something like this:
290 ----
291 struct setup_revision_opt opt;
292
293 memset(&opt, 0, sizeof(opt));
294 opt.def = "HEAD";
295 opt.revarg_opt = REVARG_COMMITTISH;
296 setup_revisions(argc, argv, rev, &opt);
297 ----
298 Using a `setup_revision_opt` gives you finer control over your walk's starting
299 point.
300 ====
301
302 Then let's invoke `final_rev_info_setup()` after the call to
303 `repo_init_revisions()`:
304
305 ----
306 int cmd_walken(int argc, const char **argv, const char *prefix)
307 {
308 ...
309
310 final_rev_info_setup(&rev);
311
312 ...
313 }
314 ----
315
316 Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
317 now, this is all we need.
318
319 ==== Preparing `rev_info` For the Walk
320
321 Now that `rev` is all initialized and configured, we've got one more setup step
322 before we get rolling. We can do this in a helper, which will both prepare the
323 `rev_info` for the walk, and perform the walk itself. Let's start the helper
324 with the call to `prepare_revision_walk()`, which can return an error without
325 dying on its own:
326
327 ----
328 static void walken_commit_walk(struct rev_info *rev)
329 {
330 if (prepare_revision_walk(rev))
331 die(_("revision walk setup failed"));
332 }
333 ----
334
335 NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
336 `stderr` it's likely to be seen by a human, so we will localize it.
337
338 ==== Performing the Walk!
339
340 Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
341 can also be used as an iterator; we move to the next item in the walk by using
342 `get_revision()` repeatedly. Add the listed variable declarations at the top and
343 the walk loop below the `prepare_revision_walk()` call within your
344 `walken_commit_walk()`:
345
346 ----
347 #include "pretty.h"
348
349 ...
350
351 static void walken_commit_walk(struct rev_info *rev)
352 {
353 struct commit *commit;
354 struct strbuf prettybuf = STRBUF_INIT;
355
356 ...
357
358 while ((commit = get_revision(rev))) {
359 strbuf_reset(&prettybuf);
360 pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
361 puts(prettybuf.buf);
362 }
363 strbuf_release(&prettybuf);
364 }
365 ----
366
367 NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
368 command we expect to be machine-parsed, we're sending it directly to stdout.
369
370 Give it a shot.
371
372 ----
373 $ make
374 $ ./bin-wrappers/git walken
375 ----
376
377 You should see all of the subject lines of all the commits in
378 your tree's history, in order, ending with the initial commit, "Initial revision
379 of "git", the information manager from hell". Congratulations! You've written
380 your first revision walk. You can play with printing some additional fields
381 from each commit if you're curious; have a look at the functions available in
382 `commit.h`.
383
384 === Adding a Filter
385
386 Next, let's try to filter the commits we see based on their author. This is
387 equivalent to running `git log --author=<pattern>`. We can add a filter by
388 modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
389
390 First some setup. Add `grep_config()` to `git_walken_config()`:
391
392 ----
393 static int git_walken_config(const char *var, const char *value,
394 const struct config_context *ctx, void *cb)
395 {
396 grep_config(var, value, ctx, cb);
397 return git_default_config(var, value, ctx, cb);
398 }
399 ----
400
401 Next, we can modify the `grep_filter`. This is done with convenience functions
402 found in `grep.h`. For fun, we're filtering to only commits from folks using a
403 `gmail.com` email address - a not-very-precise guess at who may be working on
404 Git as a hobby. Since we're checking the author, which is a specific line in the
405 header, we'll use the `append_header_grep_pattern()` helper. We can use
406 the `enum grep_header_field` to indicate which part of the commit header we want
407 to search.
408
409 In `final_rev_info_setup()`, add your filter line:
410
411 ----
412 static void final_rev_info_setup(int argc, const char **argv,
413 const char *prefix, struct rev_info *rev)
414 {
415 ...
416
417 append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
418 "gmail");
419 compile_grep_patterns(&rev->grep_filter);
420
421 ...
422 }
423 ----
424
425 `append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
426 it won't work unless we compile it with `compile_grep_patterns()`.
427
428 NOTE: If you are using `setup_revisions()` (for example, if you are passing a
429 `setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
430 to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
431
432 NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
433 wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
434 `enum grep_pat_token` for us.
435
436 === Changing the Order
437
438 There are a few ways that we can change the order of the commits during a
439 revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
440 typical orderings.
441
442 `topo_order` is the same as `git log --topo-order`: we avoid showing a parent
443 before all of its children have been shown, and we avoid mixing commits which
444 are in different lines of history. (`git help log`'s section on `--topo-order`
445 has a very nice diagram to illustrate this.)
446
447 Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
448 `REV_SORT_BY_AUTHOR_DATE`. Add the following:
449
450 ----
451 static void final_rev_info_setup(int argc, const char **argv,
452 const char *prefix, struct rev_info *rev)
453 {
454 ...
455
456 rev->topo_order = 1;
457 rev->sort_order = REV_SORT_BY_COMMIT_DATE;
458
459 ...
460 }
461 ----
462
463 Let's output this into a file so we can easily diff it with the walk sorted by
464 author date.
465
466 ----
467 $ make
468 $ ./bin-wrappers/git walken > commit-date.txt
469 ----
470
471 Then, let's sort by author date and run it again.
472
473 ----
474 static void final_rev_info_setup(int argc, const char **argv,
475 const char *prefix, struct rev_info *rev)
476 {
477 ...
478
479 rev->topo_order = 1;
480 rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
481
482 ...
483 }
484 ----
485
486 ----
487 $ make
488 $ ./bin-wrappers/git walken > author-date.txt
489 ----
490
491 Finally, compare the two. This is a little less helpful without object names or
492 dates, but hopefully we get the idea.
493
494 ----
495 $ diff -u commit-date.txt author-date.txt
496 ----
497
498 This display indicates that commits can be reordered after they're written, for
499 example with `git rebase`.
500
501 Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
502 Set that flag somewhere inside of `final_rev_info_setup()`:
503
504 ----
505 static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
506 struct rev_info *rev)
507 {
508 ...
509
510 rev->reverse = 1;
511
512 ...
513 }
514 ----
515
516 Run your walk again and note the difference in order. (If you remove the grep
517 pattern, you should see the last commit this call gives you as your current
518 HEAD.)
519
520 == Basic Object Walk
521
522 So far we've been walking only commits. But Git has more types of objects than
523 that! Let's see if we can walk _all_ objects, and find out some information
524 about each one.
525
526 We can base our work on an example. `git pack-objects` prepares all kinds of
527 objects for packing into a bitmap or packfile. The work we are interested in
528 resides in `builtin/pack-objects.c:get_object_list()`; examination of that
529 function shows that the all-object walk is being performed by
530 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
531 functions reside in `list-objects.c`; examining the source shows that, despite
532 the name, these functions traverse all kinds of objects. Let's have a look at
533 the arguments to `traverse_commit_list()`.
534
535 - `struct rev_info *revs`: This is the `rev_info` used for the walk. If
536 its `filter` member is not `NULL`, then `filter` contains information for
537 how to filter the object list.
538 - `show_commit_fn show_commit`: A callback which will be used to handle each
539 individual commit object.
540 - `show_object_fn show_object`: A callback which will be used to handle each
541 non-commit object (so each blob, tree, or tag).
542 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
543 and `show_object`.
544
545 In addition, `traverse_commit_list_filtered()` has an additional parameter:
546
547 - `struct oidset *omitted`: A linked-list of object IDs which the provided
548 filter caused to be omitted.
549
550 It looks like these methods use callbacks we provide instead of needing us
551 to call it repeatedly ourselves. Cool! Let's add the callbacks first.
552
553 For the sake of this tutorial, we'll simply keep track of how many of each kind
554 of object we find. At file scope in `builtin/walken.c` add the following
555 tracking variables:
556
557 ----
558 static int commit_count;
559 static int tag_count;
560 static int blob_count;
561 static int tree_count;
562 ----
563
564 Commits are handled by a different callback than other objects; let's do that
565 one first:
566
567 ----
568 static void walken_show_commit(struct commit *cmt, void *buf)
569 {
570 commit_count++;
571 }
572 ----
573
574 The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
575 the `buf` argument is actually the context buffer that we can provide to the
576 traversal calls - `show_data`, which we mentioned a moment ago.
577
578 Since we have the `struct commit` object, we can look at all the same parts that
579 we looked at in our earlier commit-only walk. For the sake of this tutorial,
580 though, we'll just increment the commit counter and move on.
581
582 The callback for non-commits is a little different, as we'll need to check
583 which kind of object we're dealing with:
584
585 ----
586 static void walken_show_object(struct object *obj, const char *str, void *buf)
587 {
588 switch (obj->type) {
589 case OBJ_TREE:
590 tree_count++;
591 break;
592 case OBJ_BLOB:
593 blob_count++;
594 break;
595 case OBJ_TAG:
596 tag_count++;
597 break;
598 case OBJ_COMMIT:
599 BUG("unexpected commit object in walken_show_object\n");
600 default:
601 BUG("unexpected object type %s in walken_show_object\n",
602 type_name(obj->type));
603 }
604 }
605 ----
606
607 Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
608 context pointer that `walken_show_commit()` receives: the `show_data` argument
609 to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
610 `str` contains the name of the object, which ends up being something like
611 `foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
612
613 To help assure us that we aren't double-counting commits, we'll include some
614 complaining if a commit object is routed through our non-commit callback; we'll
615 also complain if we see an invalid object type. Since those two cases should be
616 unreachable, and would only change in the event of a semantic change to the Git
617 codebase, we complain by using `BUG()` - which is a signal to a developer that
618 the change they made caused unintended consequences, and the rest of the
619 codebase needs to be updated to understand that change. `BUG()` is not intended
620 to be seen by the public, so it is not localized.
621
622 Our main object walk implementation is substantially different from our commit
623 walk implementation, so let's make a new function to perform the object walk. We
624 can perform setup which is applicable to all objects here, too, to keep separate
625 from setup which is applicable to commit-only walks.
626
627 We'll start by enabling all types of objects in the `struct rev_info`. We'll
628 also turn on `tree_blobs_in_commit_order`, which means that we will walk a
629 commit's tree and everything it points to immediately after we find each commit,
630 as opposed to waiting for the end and walking through all trees after the commit
631 history has been discovered. With the appropriate settings configured, we are
632 ready to call `prepare_revision_walk()`.
633
634 ----
635 static void walken_object_walk(struct rev_info *rev)
636 {
637 rev->tree_objects = 1;
638 rev->blob_objects = 1;
639 rev->tag_objects = 1;
640 rev->tree_blobs_in_commit_order = 1;
641
642 if (prepare_revision_walk(rev))
643 die(_("revision walk setup failed"));
644
645 commit_count = 0;
646 tag_count = 0;
647 blob_count = 0;
648 tree_count = 0;
649 ----
650
651 Let's start by calling just the unfiltered walk and reporting our counts.
652 Complete your implementation of `walken_object_walk()`.
653 We'll also need to include the `list-objects.h` header.
654
655 ----
656 #include "list-objects.h"
657
658 ...
659
660 traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
661
662 printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
663 blob_count, tag_count, tree_count);
664 }
665 ----
666
667 NOTE: This output is intended to be machine-parsed. Therefore, we are not
668 sending it to `trace_printf()`, and we are not localizing it - we need scripts
669 to be able to count on the formatting to be exactly the way it is shown here.
670 If we were intending this output to be read by humans, we would need to localize
671 it with `_()`.
672
673 Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
674 command line options is out of scope for this tutorial, so we'll just hardcode
675 a branch we can change at compile time. Where you call `final_rev_info_setup()`
676 and `walken_commit_walk()`, instead branch like so:
677
678 ----
679 if (1) {
680 add_head_to_pending(&rev);
681 walken_object_walk(&rev);
682 } else {
683 final_rev_info_setup(argc, argv, prefix, &rev);
684 walken_commit_walk(&rev);
685 }
686 ----
687
688 NOTE: For simplicity, we've avoided all the filters and sorts we applied in
689 `final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
690 want, you can certainly use the filters we added before by moving
691 `final_rev_info_setup()` out of the conditional and removing the call to
692 `add_head_to_pending()`.
693
694 Now we can try to run our command! It should take noticeably longer than the
695 commit walk, but an examination of the output will give you an idea why. Your
696 output should look similar to this example, but with different counts:
697
698 ----
699 Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
700 ----
701
702 This makes sense. We have more trees than commits because the Git project has
703 lots of subdirectories which can change, plus at least one tree per commit. We
704 have no tags because we started on a commit (`HEAD`) and while tags can point to
705 commits, commits can't point to tags.
706
707 NOTE: You will have different counts when you run this yourself! The number of
708 objects grows along with the Git project.
709
710 === Adding a Filter
711
712 There are a handful of filters that we can apply to the object walk laid out in
713 `Documentation/rev-list-options.txt`. These filters are typically useful for
714 operations such as creating packfiles or performing a partial clone. They are
715 defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
716 will use the "tree:1" filter, which causes the walk to omit all trees and blobs
717 which are not directly referenced by commits reachable from the commit in
718 `pending` when the walk begins. (`pending` is the list of objects which need to
719 be traversed during a walk; you can imagine a breadth-first tree traversal to
720 help understand. In our case, that means we omit trees and blobs not directly
721 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
722 `HEAD` in the `pending` list.)
723
724 For now, we are not going to track the omitted objects, so we'll replace those
725 parameters with `NULL`. For the sake of simplicity, we'll add a simple
726 build-time branch to use our filter or not. Preface the line calling
727 `traverse_commit_list()` with the following, which will remind us which kind of
728 walk we've just performed:
729
730 ----
731 if (0) {
732 /* Unfiltered: */
733 trace_printf(_("Unfiltered object walk.\n"));
734 } else {
735 trace_printf(
736 _("Filtered object walk with filterspec 'tree:1'.\n"));
737
738 parse_list_objects_filter(&rev->filter, "tree:1");
739 }
740 traverse_commit_list(rev, walken_show_commit,
741 walken_show_object, NULL);
742 ----
743
744 The `rev->filter` member is usually built directly from a command
745 line argument, so the module provides an easy way to build one from a string.
746 Even though we aren't taking user input right now, we can still build one with
747 a hardcoded string using `parse_list_objects_filter()`.
748
749 With the filter spec "tree:1", we are expecting to see _only_ the root tree for
750 each commit; therefore, the tree object count should be less than or equal to
751 the number of commits. (For an example of why that's true: `git commit --revert`
752 points to the same tree object as its grandparent.)
753
754 === Counting Omitted Objects
755
756 We also have the capability to enumerate all objects which were omitted by a
757 filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
758 change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
759 able to populate an `omitted` list. Asking for this list of filtered objects
760 may cause performance degradations, however, because in this case, despite
761 filtering objects, the possibly much larger set of all reachable objects must
762 be processed in order to populate that list.
763
764 First, add the `struct oidset` and related items we will use to iterate it:
765
766 ----
767 #include "oidset.h"
768
769 ...
770
771 static void walken_object_walk(
772 ...
773
774 struct oidset omitted;
775 struct oidset_iter oit;
776 struct object_id *oid = NULL;
777 int omitted_count = 0;
778 oidset_init(&omitted, 0);
779
780 ...
781 ----
782
783 Replace the call to `traverse_commit_list()` with
784 `traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
785 defined and initialized above:
786
787 ----
788 ...
789
790 traverse_commit_list_filtered(rev,
791 walken_show_commit, walken_show_object, NULL, &omitted);
792
793 ...
794 ----
795
796 Then, after your traversal, the `oidset` traversal is pretty straightforward.
797 Count all the objects within and modify the print statement:
798
799 ----
800 /* Count the omitted objects. */
801 oidset_iter_init(&omitted, &oit);
802
803 while ((oid = oidset_iter_next(&oit)))
804 omitted_count++;
805
806 printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
807 commit_count, blob_count, tag_count, tree_count, omitted_count);
808 ----
809
810 By running your walk with and without the filter, you should find that the total
811 object count in each case is identical. You can also time each invocation of
812 the `walken` subcommand, with and without `omitted` being passed in, to confirm
813 to yourself the runtime impact of tracking all omitted objects.
814
815 === Changing the Order
816
817 Finally, let's demonstrate that you can also reorder walks of all objects, not
818 just walks of commits. First, we'll make our handlers chattier - modify
819 `walken_show_commit()` and `walken_show_object()` to print the object as they
820 go:
821
822 ----
823 #include "hex.h"
824
825 ...
826
827 static void walken_show_commit(struct commit *cmt, void *buf)
828 {
829 trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
830 commit_count++;
831 }
832
833 static void walken_show_object(struct object *obj, const char *str, void *buf)
834 {
835 trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
836
837 ...
838 }
839 ----
840
841 NOTE: Since we will be examining this output directly as humans, we'll use
842 `trace_printf()` here. Additionally, since this change introduces a significant
843 number of printed lines, using `trace_printf()` will allow us to easily silence
844 those lines without having to recompile.
845
846 (Leave the counter increment logic in place.)
847
848 With only that change, run again (but save yourself some scrollback):
849
850 ----
851 $ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
852 ----
853
854 Take a look at the top commit with `git show` and the object ID you printed; it
855 should be the same as the output of `git show HEAD`.
856
857 Next, let's change a setting on our `struct rev_info` within
858 `walken_object_walk()`. Find where you're changing the other settings on `rev`,
859 such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
860 `reverse` setting at the bottom:
861
862 ----
863 ...
864
865 rev->tree_objects = 1;
866 rev->blob_objects = 1;
867 rev->tag_objects = 1;
868 rev->tree_blobs_in_commit_order = 1;
869 rev->reverse = 1;
870
871 ...
872 ----
873
874 Now, run again, but this time, let's grab the last handful of objects instead
875 of the first handful:
876
877 ----
878 $ make
879 $ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
880 ----
881
882 The last commit object given should have the same OID as the one we saw at the
883 top before, and running `git show <oid>` with that OID should give you again
884 the same results as `git show HEAD`. Furthermore, if you run and examine the
885 first ten lines again (with `head` instead of `tail` like we did before applying
886 the `reverse` setting), you should see that now the first commit printed is the
887 initial commit, `e83c5163`.
888
889 == Wrapping Up
890
891 Let's review. In this tutorial, we:
892
893 - Built a commit walk from the ground up
894 - Enabled a grep filter for that commit walk
895 - Changed the sort order of that filtered commit walk
896 - Built an object walk (tags, commits, trees, and blobs) from the ground up
897 - Learned how to add a filter-spec to an object walk
898 - Changed the display order of the filtered object walk