= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git - this is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.
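
To make the distinction concrete, here is a toy model in plain C. It does not use Git's `struct commit` or `struct tree`; the `toy_*` types and functions are hypothetical and exist only for this illustration. The revision walk follows parent links only, while the object walk also descends from each commit into its tree and from trees into their entries, visiting each object once.

----
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object model; not Git's real structures. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	unsigned seen;                 /* id of the last walk that visited it */
	struct toy_object *links[4];   /* parents (commits) or entries (trees) */
	size_t nr_links;
	struct toy_object *tree;       /* root tree; used by commits only */
};

/*
 * Revision walk: follow parent links only. Returns the number of commits
 * visited. walk_id must be non-zero and different for every walk.
 */
int toy_revision_walk(struct toy_object *commit, unsigned walk_id)
{
	int count = 1;
	size_t i;

	if (!commit || commit->seen == walk_id)
		return 0;
	assert(commit->type == TOY_COMMIT);
	commit->seen = walk_id;
	for (i = 0; i < commit->nr_links; i++)
		count += toy_revision_walk(commit->links[i], walk_id);
	return count;
}

/* Object walk: parent links plus containment; counts[] is indexed by type. */
void toy_object_walk(struct toy_object *obj, unsigned walk_id, int counts[3])
{
	size_t i;

	if (!obj || obj->seen == walk_id)
		return;
	obj->seen = walk_id;
	counts[obj->type]++;
	if (obj->type == TOY_COMMIT)
		toy_object_walk(obj->tree, walk_id, counts);
	for (i = 0; i < obj->nr_links; i++)
		toy_object_walk(obj->links[i], walk_id, counts);
}
----

For example, if commit X has parent W and both point at tree Y containing blob Z, a revision walk from X visits 2 commits, while an object walk finds 2 commits, 1 tree and 1 blob: the shared tree is visited only once.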

=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  the revision walker in its various incarnations.
- `Documentation/technical/api-revision-walking.txt`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  gives a good overview of the types of objects in Git and what your object
  walk is really describing.

== Setting Up

Create a new branch from `master`.

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	trace_printf(_("cmd_walken incoming...\n"));
	return 0;
}
----

NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
off at runtime. For the purposes of this tutorial, we will write `walken` as
though it is intended for use as a "plumbing" command: that is, a command which
is used primarily in scripts, rather than interactively by humans (a "porcelain"
command). So we will send our debug output to `trace_printf()` instead. When
running, enable trace output by setting the environment variable `GIT_TRACE`.
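
The runtime switch amounts to checking the environment before writing anything. The sketch below is a hypothetical, simplified stand-in (`toy_trace_enabled()` and `toy_trace_printf()` are not Git functions): it writes to `stderr` only when `GIT_TRACE` is set to something other than empty, `0` or `false`. Git itself also compares case-insensitively and accepts a file descriptor number or an absolute path in `GIT_TRACE`; none of that is modeled here.

----
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical simplification: tracing is on when GIT_TRACE is set to
 * anything other than "", "0" or "false" (case-sensitive here).
 */
int toy_trace_enabled(void)
{
	const char *v = getenv("GIT_TRACE");

	return v && *v && strcmp(v, "0") && strcmp(v, "false");
}

/* Like printf() to stderr, but a no-op unless tracing is enabled. */
int toy_trace_printf(const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (!toy_trace_enabled())
		return 0;
	va_start(ap, fmt);
	ret = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return ret;
}
----

Because the check happens on every call, the same binary can be run quietly or verbosely without recompiling, which is the property the tutorial relies on.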

Add usage text and `-h` handling, as all subcommands should (our test suite
will notice and complain if you fail to do so).

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test your command, making sure the `DEVELOPER` flag is set and that
`GIT_TRACE` is enabled so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user over the CLI.

`nr` represents the number of `rev_cmdline_entry` present in the array.

`alloc` is used by the `ALLOC_GROW` macro. Check
`Documentation/technical/api-allocation-growing.txt` - this variable is used to
track the allocated size of the list.

Per entry, we find:

`item` is the object provided upon which to base the object walk. Items in Git
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)

`name` is the name the user gave for the object on the command line, such as a
ref like `HEAD` or an object ID (OID), the hex string you may be familiar with
from using Git to organize your source in the past. Check the tutorial mentioned
above towards the top for a discussion of where the OID can come from.

`whence` records how the object was specified on the command line: for
example, as a plain ref, as one end of a range such as `A..B`, or as a commit's
parents via `^@`. Take a look at `Documentation/revisions.txt` to get an idea of
what could set the `whence` value.

`flags` are used to hint the beginning of the revision walk and are the first
block under the `#include`s in `revision.h`. The most likely ones to be set in
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
can be used during the walk, as well.
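
The `nr`/`alloc` pair is a common Git idiom. The self-contained sketch below mirrors the shape of `struct rev_cmdline_info` using simplified, hypothetical types (`toy_cmdline_info`, `toy_cmdline_push()` and the `TOY_UNINTERESTING` bit are not Git names). It grows the array the way `ALLOC_GROW` does, with a growth rule of `((x) + 16) * 3 / 2` modeled on Git's `alloc_nr()` macro; treat the exact constants as illustrative. Roughly speaking, a command line such as `HEAD ^origin/master` would produce two entries, the second one flagged as uninteresting.

----
#include <assert.h>
#include <stdlib.h>

/* Growth rule modeled on Git's alloc_nr(); illustrative only. */
#define TOY_ALLOC_NR(x) (((x) + 16) * 3 / 2)
#define TOY_UNINTERESTING (1u << 0)

struct toy_cmdline_entry {
	const char *name;   /* as typed by the user, e.g. "HEAD" */
	unsigned flags;     /* e.g. TOY_UNINTERESTING for "^origin/master" */
};

struct toy_cmdline_info {
	size_t nr;                     /* entries in use */
	size_t alloc;                  /* entries allocated */
	struct toy_cmdline_entry *rev;
};

/* Append one entry, growing the array like ALLOC_GROW. Returns 0 or -1. */
int toy_cmdline_push(struct toy_cmdline_info *info, const char *name,
		     unsigned flags)
{
	if (info->nr + 1 > info->alloc) {
		size_t alloc = TOY_ALLOC_NR(info->alloc);
		struct toy_cmdline_entry *rev;

		if (alloc < info->nr + 1)
			alloc = info->nr + 1;
		rev = realloc(info->rev, alloc * sizeof(*rev));
		if (!rev)
			return -1;
		info->rev = rev;
		info->alloc = alloc;
	}
	info->rev[info->nr].name = name;
	info->rev[info->nr].flags = flags;
	info->nr++;
	return 0;
}
----

Growing geometrically rather than by one entry at a time keeps the number of `realloc()` calls small; with this rule the first push already reserves room for 24 entries.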

=== `struct rev_info`

This struct is considerably longer, and many of its fields are used only by
`revision.c` during the walk rather than being configuration options. Most of
the configurable flags in `struct rev_info` have a mirror in
`Documentation/rev-list-options.txt`. It's a good idea to take some time and
read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

For our first example within `git walken`, we don't intend to use any other
components within Git, and we don't have any configuration to do. However, we
may want to add some later, so for now, we can add an empty placeholder. Create
a new function in `builtin/walken.c`:

----
static void init_walken_defaults(void)
{
	/*
	 * We don't actually need the same components `git log` does; leave this
	 * empty for now.
	 */
}
----

Make sure to add a line invoking it inside of `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	init_walken_defaults();

	...
}
----

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the
config callbacks of any other components you use that need to see these
options. Your callback will be invoked once for each configuration variable Git
reads (from the global, local, worktree, etc. configuration).

Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.
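
The callback contract can be modeled without Git: a driver calls your callback once per configuration entry, and your callback handles the keys it recognizes and hands everything else to a default handler. In the sketch below, `toy_config()`, `toy_default_config()`, `toy_walken_config()` and the `walken.mode` key are all hypothetical; they only illustrate the fall-back pattern that `git_walken_config()` follows.

----
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_entry {
	const char *var;
	const char *value;
};

/* Hypothetical driver: invoke fn once per entry; stop on the first error. */
int toy_config(const struct toy_entry *entries, size_t nr,
	       toy_config_fn fn, void *cb)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(entries[i].var, entries[i].value, cb);

		if (ret < 0)
			return ret;
	}
	return 0;
}

struct toy_settings {
	int defaults_seen;        /* entries handled by the fallback */
	const char *walken_mode;  /* hypothetical "walken.mode" setting */
};

/* Stand-in for a generic handler such as git_default_config(). */
int toy_default_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	(void)var;
	(void)value;
	s->defaults_seen++;
	return 0;
}

/* Handle our own key; pass everything else on to the default handler. */
int toy_walken_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	if (!strcmp(var, "walken.mode")) {
		s->walken_mode = value;
		return 0;
	}
	return toy_default_config(var, value, cb);
}
----

Chaining like this lets each component see every entry exactly once without the driver needing to know which component owns which key.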

Add a new function to `builtin/walken.c`:

----
static int git_walken_config(const char *var, const char *value, void *cb)
{
	/*
	 * For now, we don't have any custom configuration, so fall back to
	 * the default config.
	 */
	return git_default_config(var, value, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	git_config(git_walken_config, NULL);

	...
}
----

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	/* This can go wherever you like in your declarations. */
	struct rev_info rev;
	...

	/* This should go after the git_config() call. */
	repo_init_revisions(the_repository, &rev, prefix);

	...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	/*
	 * We want to mimic the appearance of `git log --oneline`, so let's
	 * force oneline format.
	 */
	get_commit_format("oneline", rev);

	/* Start our object walk at HEAD. */
	add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
	struct setup_revision_opt opt;

	memset(&opt, 0, sizeof(opt));
	opt.def = "HEAD";
	opt.revarg_opt = REVARG_COMMITTISH;
	setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point. Note that `setup_revisions()` needs `argc` and `argv`, so you would also
have to pass those into `final_rev_info_setup()`.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	final_rev_info_setup(&rev);

	...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk, and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
`stderr` it's likely to be seen by a human, so we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by using
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
static void walken_commit_walk(struct rev_info *rev)
{
	struct commit *commit;
	struct strbuf prettybuf = STRBUF_INIT;

	...

	/* get_revision() returns NULL once the walk is complete. */
	while ((commit = get_revision(rev))) {
		strbuf_reset(&prettybuf);
		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
		puts(prettybuf.buf);
	}
	strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
command we expect to be machine-parsed, we're sending it directly to stdout.
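
By default `git log` shows commits roughly in reverse chronological order of commit date. The toy model below shows one way an iterator like `get_revision()` can produce that order: keep a queue of commits, always hand back the one with the newest commit date, and queue its parents the first time they are reached. The `toy_*` types and functions are hypothetical, and the linear scan stands in for a real priority queue; `revision.c` is considerably more involved (it also handles `UNINTERESTING` commits, sorting options and much more).

----
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_QUEUE 64

struct toy_commit {
	const char *subject;
	long date;                      /* commit date, seconds since epoch */
	struct toy_commit *parents[2];
	int seen;
};

struct toy_walk {
	struct toy_commit *queue[TOY_MAX_QUEUE];
	size_t nr;
};

/* Queue a commit the first time it is reached. */
static void toy_push(struct toy_walk *w, struct toy_commit *c)
{
	if (!c || c->seen)
		return;
	assert(w->nr < TOY_MAX_QUEUE);
	c->seen = 1;
	w->queue[w->nr++] = c;
}

void toy_walk_init(struct toy_walk *w, struct toy_commit *start)
{
	w->nr = 0;
	toy_push(w, start);
}

/* Return the newest queued commit and queue its parents; NULL when done. */
struct toy_commit *toy_get_revision(struct toy_walk *w)
{
	size_t i, best = 0;
	struct toy_commit *c;

	if (!w->nr)
		return NULL;
	for (i = 1; i < w->nr; i++)
		if (w->queue[i]->date > w->queue[best]->date)
			best = i;
	c = w->queue[best];
	w->queue[best] = w->queue[--w->nr];
	toy_push(w, c->parents[0]);
	toy_push(w, c->parents[1]);
	return c;
}
----

Used as `while ((c = toy_get_revision(&w)))`, this yields a merge first, then its two parents newest-first, and the shared root commit last, exactly once.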

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see all of the subject lines of all the commits in your tree's
history, in order, ending with the initial commit, "Initial revision of "git",
the information manager from hell". Congratulations! You've written your first
revision walk. You can play with printing some additional fields from each
commit if you're curious; have a look at the functions available in `commit.h`.

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
`grep_config()` to `git_walken_config()`:

----
static void init_walken_defaults(void)
{
	init_grep_defaults(the_repository);
}

...

static int git_walken_config(const char *var, const char *value, void *cb)
{
	grep_config(var, value, cb);
	return git_default_config(var, value, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address - a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
		"gmail");
	compile_grep_patterns(&rev->grep_filter);

	...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.
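
To see what scoping a pattern to a header buys us, consider a raw commit buffer: the author appears on a line beginning with `author `, and a header pattern should match only within that line, never in the commit message. The hypothetical helper below (`toy_author_matches()` is not part of Git) does a plain substring search restricted to that line. Git's grep machinery supports regular expressions and much more, so treat this only as a model of the scoping idea.

----
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `pattern` occurs in the "author" header
 * line of a raw commit buffer, else 0. The header ends at the first empty
 * line, so text in the commit message is never searched.
 */
int toy_author_matches(const char *buf, const char *pattern)
{
	const char *line = buf;
	size_t plen = strlen(pattern);

	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		size_t i;

		if (len >= 7 && !strncmp(line, "author ", 7)) {
			for (i = 7; i + plen <= len; i++)
				if (!strncmp(line + i, pattern, plen))
					return 1;
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
----

A commit whose message mentions "gmail" but whose author uses another address does not match, which is the difference between an author filter and a plain `--grep`.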

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)
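
The rule that no parent is shown before all of its children is a topological sort of the commit graph. The sketch below applies Kahn's algorithm to a hypothetical array-based graph (`toy_topo_sort()` is not a Git function). It keeps ready commits on a stack, so after showing a commit it tends to continue with that commit's own parents rather than jumping to another line of history. Git's real `--topo-order` implementation differs in its details.

----
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

/*
 * Hypothetical graph: commit i has nr_parents[i] parents listed in
 * parents[i]. Writes a topological order (children before parents)
 * into out[] and returns the number of commits written.
 */
size_t toy_topo_sort(size_t n, const size_t parents[][2],
		     const size_t nr_parents[], size_t out[])
{
	size_t children[TOY_MAX] = { 0 };  /* not-yet-shown children */
	size_t stack[TOY_MAX];
	size_t top = 0, nr_out = 0, i, j;

	assert(n <= TOY_MAX);
	for (i = 0; i < n; i++)
		for (j = 0; j < nr_parents[i]; j++)
			children[parents[i][j]]++;

	/* Tips have no children and are ready immediately. */
	for (i = n; i-- > 0;)
		if (!children[i])
			stack[top++] = i;

	while (top) {
		size_t c = stack[--top];

		out[nr_out++] = c;
		/* Push in reverse so the first parent is popped first. */
		for (j = nr_parents[c]; j-- > 0;) {
			size_t p = parents[c][j];

			if (!--children[p])
				stack[top++] = p;
		}
	}
	return nr_out;
}
----

For a merge 0 with parents 1 and 2, both of which have parent 3, this yields 0, 1, 2, 3: commit 3 is held back until both of its children have been shown.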

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_COMMIT_DATE;

	...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

	...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

This display indicates that commits can be reordered after they're written:
for example, `git rebase` normally keeps each commit's author date but gives the
rewritten commit a new commit date.
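
A small model shows why the two orderings can disagree. In the hypothetical data below, an older patch has been rebased on top of newer work, so its author date is old but its commit date is the newest of all. Sorting the same commits by each date, newest first, then gives different orders. This sketch uses the C library's `qsort()` and hypothetical `toy_*` names; it models only the comparison, not Git's sorting code.

----
#include <assert.h>
#include <stdlib.h>

struct toy_commit {
	const char *subject;
	long author_date;    /* when the change was first written */
	long commit_date;    /* when this commit object was created */
};

/* Newest first, like `git log`. */
static int by_commit_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (y->commit_date > x->commit_date) -
	       (y->commit_date < x->commit_date);
}

static int by_author_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (y->author_date > x->author_date) -
	       (y->author_date < x->author_date);
}

void toy_sort(struct toy_commit *commits, size_t n, int use_author_date)
{
	qsort(commits, n, sizeof(*commits),
	      use_author_date ? by_author_date : by_commit_date);
}
----

With a base commit (dates 100/100), newer work (300/300) and a rebased old patch (author 200, commit 400), the rebased patch comes first by commit date but second by author date.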

Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->reverse = 1;

	...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, you should see the last commit this call gives you as your current
HEAD.)

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list_filtered()`, which are a superset of the
arguments to the unfiltered version.

- `struct list_objects_filter_options *filter_options`: This is a struct which
  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
- `struct rev_info *revs`: This is the `rev_info` used for the walk.
- `show_commit_fn show_commit`: A callback which will be used to handle each
  individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
  non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
  and `show_object`.
- `struct oidset *omitted`: A set of object IDs which the provided filter
  caused to be omitted.

It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
instead of needing us to call it repeatedly ourselves. Cool! Let's add the
callbacks first.
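
The callback style is easy to model in isolation. In the hypothetical sketch below, a traversal hands each commit to one callback and every other object to another, passing the same opaque `void *` context to both, much as `show_data` is passed to `show_commit` and `show_object`. This tutorial keeps its counts in file-scope variables for simplicity; passing a struct through the context pointer, as here, is one way to avoid that global state. None of the `toy_*` or `count_*` names exist in Git.

----
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
	const char *name;   /* path or tag name; NULL for commits */
};

typedef void (*toy_show_commit_fn)(const struct toy_object *commit,
				   void *data);
typedef void (*toy_show_object_fn)(const struct toy_object *obj,
				   const char *name, void *data);

/* Hypothetical traversal over an already-ordered list of objects. */
void toy_traverse(const struct toy_object *objs, size_t nr,
		  toy_show_commit_fn show_commit,
		  toy_show_object_fn show_object, void *show_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], show_data);
		else
			show_object(&objs[i], objs[i].name, show_data);
	}
}

struct toy_counts {
	int commits, trees, blobs, tags;
};

void count_commit(const struct toy_object *commit, void *data)
{
	struct toy_counts *counts = data;

	(void)commit;
	counts->commits++;
}

void count_object(const struct toy_object *obj, const char *name, void *data)
{
	struct toy_counts *counts = data;

	(void)name;
	switch (obj->type) {
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	case TOY_TAG:
		counts->tags++;
		break;
	default:
		assert(!"commit routed to the object callback");
	}
}
----

Because all state lives in the struct the caller passes in, two walks can run with independent counters.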

For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls - `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	switch (obj->type) {
	case OBJ_TREE:
		tree_count++;
		break;
	case OBJ_BLOB:
		blob_count++;
		break;
	case OBJ_TAG:
		tag_count++;
		break;
	case OBJ_COMMIT:
		BUG("unexpected commit object in walken_show_object");
	default:
		BUG("unexpected object type %s in walken_show_object",
			type_name(obj->type));
	}
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()` - which is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, to keep it
separate from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;

	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));

	commit_count = 0;
	tag_count = 0;
	blob_count = 0;
	tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`:

----
	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
		blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it - we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
	if (1) {
		add_head_to_pending(&rev);
		walken_object_walk(&rev);
	} else {
		final_rev_info_setup(&rev);
		walken_commit_walk(&rev);
	}
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters that we can apply to the object walk laid out in
`Documentation/rev-list-options.txt`. These filters are typically useful for
operations such as creating packfiles or performing a partial clone. They are
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; it may help to imagine a breadth-first tree
traversal. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)
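
The depth rule can be modeled directly. As described in `Documentation/rev-list-options.txt`, `tree:<depth>` omits blobs and trees whose depth from the root tree is greater than or equal to `<depth>`, where a commit's root tree is at depth 0. The hypothetical sketch below (`toy_filter_tree_depth()` is not a Git function) applies that rule to toy trees, classifies each object once, and keeps descending into omitted trees so every reachable object is counted as either shown or omitted. Git's real filter in `list-objects-filter.c` also handles objects reachable at several depths, which this sketch ignores.

----
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	struct toy_object *entries[4];   /* tree entries; unused for blobs */
	size_t nr_entries;
	int seen;
};

struct toy_filter_result {
	int trees, blobs, omitted;
};

/*
 * Hypothetical tree:<max_depth> filter: visit `obj` found at `depth`
 * (0 for a commit's root tree) and classify it as shown or omitted.
 */
void toy_filter_tree_depth(struct toy_object *obj, size_t depth,
			   size_t max_depth, struct toy_filter_result *res)
{
	size_t i;

	if (obj->seen)
		return;
	obj->seen = 1;
	if (depth >= max_depth)
		res->omitted++;          /* would go into the omitted set */
	else if (obj->type == TOY_TREE)
		res->trees++;
	else
		res->blobs++;
	/* Keep descending so every reachable object is classified. */
	if (obj->type == TOY_TREE)
		for (i = 0; i < obj->nr_entries; i++)
			toy_filter_tree_depth(obj->entries[i], depth + 1,
					      max_depth, res);
}
----

With two root trees, one containing `README` and a `src` tree holding `main.c` and the other containing only `README`, a depth limit of 1 shows just the 2 root trees and omits the other 3 objects; shown plus omitted equals the 5 objects an unfiltered walk would find.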

First, we'll need to `#include "list-objects-filter-options.h"` and set up the
`struct list_objects_filter_options` at the top of the function.

----
static void walken_object_walk(struct rev_info *rev)
{
	struct list_objects_filter_options filter_options = { 0 };

	...
----

For now, we are not going to track the omitted objects, so we'll replace those
parameters with `NULL`. For the sake of simplicity, we'll add a simple
build-time branch to use our filter or not. Replace the line calling
`traverse_commit_list()` with the following, which will remind us which kind of
walk we've just performed:

----
	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
		traverse_commit_list(rev, walken_show_commit,
				walken_show_object, NULL);
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));
		parse_list_objects_filter(&filter_options, "tree:1");

		traverse_commit_list_filtered(&filter_options, rev,
			walken_show_commit, walken_show_object, NULL, NULL);
	}
----

`struct list_objects_filter_options` is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (It can be strictly less because several commits may
share one root tree: for example, a commit made by `git revert` to undo its
parent has the same tree as its grandparent, and that tree is counted only
once.)

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
Asking `traverse_commit_list_filtered()` to populate the `omitted` list means
that our object walk does not perform any better than an unfiltered object
walk; all reachable objects are walked in order to populate the list.

First, add the `struct oidset` and related items we will use to iterate it:

----
static void walken_object_walk(
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
object:

----
	...

	traverse_commit_list_filtered(&filter_options, rev,
		walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, iterating over the `oidset` is pretty
straightforward. Count all the objects within and modify the print statement:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);

	/* We no longer need the set; free its memory. */
	oidset_clear(&omitted);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical once the omitted objects are added to the
filtered walk's counts. You can also time each invocation of the `walken`
subcommand, with and without `omitted` being passed in, to confirm to yourself
the runtime impact of tracking all omitted objects.

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier - modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
	commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

	...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, run again (but save yourself some scrollback). Trace
output is written to `stderr`, so redirect it into the pipe:

----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
	...

	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;
	rev->reverse = 1;

	...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk