= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git - this is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.

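To make the difference between the two walks concrete, here is a minimal
standalone sketch. It does not use Git's data structures; every `toy_` name is
made up for illustration, and each commit is assumed to have at most one parent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Git objects; these are NOT Git's real structs. */
struct toy_blob {
	int seen;
};

struct toy_tree {
	int seen;
	struct toy_blob *blobs[4];	/* NULL-terminated unless full */
	struct toy_tree *subtrees[4];	/* NULL-terminated unless full */
};

struct toy_commit {
	struct toy_commit *parent;	/* single parent, for simplicity */
	struct toy_tree *tree;		/* root tree of this commit */
};

/* Revision walk: follow parent links only, counting commits. */
static int toy_revision_walk(const struct toy_commit *tip)
{
	int n = 0;

	for (; tip; tip = tip->parent)
		n++;
	return n;
}

/* Count a tree and everything below it, skipping objects already seen. */
static int toy_count_tree(struct toy_tree *t)
{
	int n = 1, i;

	assert(t != NULL);
	if (t->seen)
		return 0;
	t->seen = 1;
	for (i = 0; i < 4 && t->blobs[i]; i++) {
		if (!t->blobs[i]->seen) {
			t->blobs[i]->seen = 1;
			n++;
		}
	}
	for (i = 0; i < 4 && t->subtrees[i]; i++)
		n += toy_count_tree(t->subtrees[i]);
	return n;
}

/* Object walk: each commit plus every tree and blob it contains. */
static int toy_object_walk(const struct toy_commit *tip)
{
	int n = 0;

	for (; tip; tip = tip->parent) {
		n++;
		if (tip->tree)
			n += toy_count_tree(tip->tree);
	}
	return n;
}
```

With two commits that share the subtree `y` and its blob `z`, the revision walk
in this sketch finds 2 objects, while the object walk finds 6: two commits, two
root trees, and `y` and `z` counted once each.
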
=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  the revision walker in its various incarnations.
- `revision.h`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  gives a good overview of the types of objects in Git and what your object
  walk is really describing.

== Setting Up

Create a new branch from `master`.

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	trace_printf(_("cmd_walken incoming...\n"));
	return 0;
}
----

NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
off at runtime. For the purposes of this tutorial, we will write `walken` as
though it is intended for use as a "plumbing" command: that is, a command which
is used primarily in scripts, rather than interactively by humans (a "porcelain"
command). So we will send our debug output to `trace_printf()` instead. When
running, enable trace output by setting the environment variable `GIT_TRACE`.

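The idea of output that is switched on at runtime can be sketched outside of
Git as follows. This is only an illustration: `TOY_TRACE`,
`toy_trace_enabled()` and `toy_trace_printf()` are hypothetical names, and the
rules Git applies to `GIT_TRACE` are richer than the simplified check shown
here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decide whether tracing is on, given the value of the environment
 * variable. Simplified assumption: unset, empty, "0" and "false"
 * mean "off"; anything else means "on".
 */
static int toy_trace_enabled(const char *value)
{
	if (!value || !*value)
		return 0;
	if (!strcmp(value, "0") || !strcmp(value, "false"))
		return 0;
	return 1;
}

/*
 * Print to stderr only when TOY_TRACE enables tracing. Returns the
 * number of characters written, or 0 when tracing is off.
 */
static int toy_trace_printf(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	if (!toy_trace_enabled(getenv("TOY_TRACE")))
		return 0;

	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);	/* stderr keeps stdout clean for scripts */
	va_end(ap);
	return n;
}
```

Because the switch is read when the program runs, the same binary can be quiet
in scripts and chatty while debugging, with no rebuild in between.
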
Add usage text and `-h` handling, like all subcommands should consistently do
(our test suite will notice and complain if you fail to do so).

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test out your command, without forgetting to ensure the `DEVELOPER`
flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user over the CLI.

`nr` represents the number of `rev_cmdline_entry` present in the array.

`alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is
used to track the allocated size of the list.

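The `nr`/`alloc` pairing is a common growable-array convention. The sketch
below shows the general pattern in plain C; it is not the `ALLOC_GROW` macro
itself, the `toy_` names are invented, and the growth formula is chosen only
for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy growable list that follows the nr/alloc convention. */
struct toy_list {
	int *items;
	size_t nr;	/* number of entries in use */
	size_t alloc;	/* number of entries allocated */
};

/* Make room for at least `want` entries, growing geometrically. */
static void toy_grow(struct toy_list *l, size_t want)
{
	size_t new_alloc;
	int *p;

	if (want <= l->alloc)
		return;		/* already big enough */

	new_alloc = (l->alloc + 16) * 3 / 2;
	if (new_alloc < want)
		new_alloc = want;

	p = realloc(l->items, new_alloc * sizeof(*l->items));
	assert(p != NULL);	/* real code would report the failure */
	l->items = p;
	l->alloc = new_alloc;
}

/* Append one value, growing the backing storage only when needed. */
static void toy_push(struct toy_list *l, int value)
{
	toy_grow(l, l->nr + 1);
	l->items[l->nr++] = value;
}
```

Keeping `alloc` separate from `nr` means most appends only bump `nr`; the
allocation is touched again only once the spare capacity runs out.
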
Per entry, we find:

`item` is the object provided upon which to base the object walk. Items in Git
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)

`name` is the object ID (OID) of the object - a hex string you may be familiar
with from using Git to organize your source in the past. Check the tutorial
mentioned above towards the top for a discussion of where the OID can come
from.

`whence` indicates some information about what to do with the parents of the
specified object. We'll explore this flag more later on; take a look at
`Documentation/revisions.txt` to get an idea of what could set the `whence`
value.

`flags` are used to hint the beginning of the revision walk and are the first
block under the `#include`s in `revision.h`. The most likely ones to be set in
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
can be used during the walk, as well.

=== `struct rev_info`

This one is quite a bit longer, and many fields are only used during the walk
by `revision.c` - not configuration options. Most of the configurable flags in
`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
good idea to take some time and read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

For our first example within `git walken`, we don't intend to use any other
components within Git, and we don't have any configuration to do. However, we
may want to add some later, so for now, we can add an empty placeholder. Create
a new function in `builtin/walken.c`:

----
static void init_walken_defaults(void)
{
	/*
	 * We don't actually need the same components `git log` does; leave this
	 * empty for now.
	 */
}
----

Make sure to add a line invoking it inside of `cmd_walken()`.

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	init_walken_defaults();
}
----

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke methods
from other components that need to intercept these options. Your callback will
be invoked once for each configuration value which Git knows about (global,
local, worktree, etc.).

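The "one callback invocation per configuration value" pattern, with a fallback
to a default handler, can be sketched in plain C like this. None of these names
exist in Git: `toy_config()`, the `toy_` handlers and the `walken.limit` key
are all assumptions made for the example, and stopping on a negative return is
a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_config_entry {
	const char *var;
	const char *value;
};

/* Call fn once per entry; this sketch stops at the first negative return. */
static int toy_config(const struct toy_config_entry *entries, size_t nr,
		      toy_config_fn fn, void *cb)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < nr; i++) {
		int ret = fn(entries[i].var, entries[i].value, cb);
		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Fallback for keys the command itself does not care about. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	(void)var;
	(void)value;
	(void)cb;
	return 0;
}

/* Handle our own (hypothetical) key; delegate everything else. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
	if (!strcmp(var, "walken.limit")) {
		*(int *)cb = atoi(value);
		return 0;
	}
	return toy_default_config(var, value, cb);
}
```

The callback sees every key, so it is responsible for passing along the ones it
does not recognize; that is the role the default handler plays here.
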
Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.

Add a new function to `builtin/walken.c`:

----
static int git_walken_config(const char *var, const char *value, void *cb)
{
	/*
	 * For now, we don't have any custom configuration, so fall back to
	 * the default config.
	 */
	return git_default_config(var, value, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	git_config(git_walken_config, NULL);

	...
}
----

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call:
----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	/* This can go wherever you like in your declarations.*/
	struct rev_info rev;
	...

	/* This should go after the git_config() call. */
	repo_init_revisions(the_repository, &rev, prefix);

	...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	/*
	 * We want to mimic the appearance of `git log --oneline`, so let's
	 * force oneline format.
	 */
	get_commit_format("oneline", rev);

	/* Start our object walk at HEAD. */
	add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
	struct setup_revision_opt opt;

	memset(&opt, 0, sizeof(opt));
	opt.def = "HEAD";
	opt.revarg_opt = REVARG_COMMITTISH;
	setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	final_rev_info_setup(&rev);

	...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk, and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
`stderr` it's likely to be seen by a human, so we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by using
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
static void walken_commit_walk(struct rev_info *rev)
{
	struct commit *commit;
	struct strbuf prettybuf = STRBUF_INIT;

	...

	while ((commit = get_revision(rev))) {
		strbuf_reset(&prettybuf);
		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
		puts(prettybuf.buf);
	}
	strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
command we expect to be machine-parsed, we're sending it directly to stdout.

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see all of the subject lines of all the commits in
your tree's history, in order, ending with the initial commit, "Initial revision
of "git", the information manager from hell". Congratulations! You've written
your first revision walk. You can play with printing some additional fields
from each commit if you're curious; have a look at the functions available in
`commit.h`.

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
`grep_config()` to `git_walken_config()`:

----
static void init_walken_defaults(void)
{
	init_grep_defaults(the_repository);
}

...

static int git_walken_config(const char *var, const char *value, void *cb)
{
	grep_config(var, value, cb);
	return git_default_config(var, value, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address - a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
		"gmail");
	compile_grep_patterns(&rev->grep_filter);

	...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_COMMIT_DATE;

	...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

	...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

This display indicates that commits can be reordered after they're written, for
example with `git rebase`.

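To see why the two orderings can disagree, consider this small standalone
sketch. It is not how `revision.c` sorts (in particular it ignores topology
entirely), and the `toy_` struct and functions are made up for illustration. It
assumes only that each commit carries an author date and a separate commit
date.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_dated_commit {
	const char *subject;
	long author_date;	/* when the change was first written */
	long commit_date;	/* when this commit object was created */
};

/* Newest commit date first. */
static int toy_by_commit_date(const void *a, const void *b)
{
	const struct toy_dated_commit *x = a, *y = b;

	return (x->commit_date < y->commit_date) -
	       (x->commit_date > y->commit_date);
}

/* Newest author date first. */
static int toy_by_author_date(const void *a, const void *b)
{
	const struct toy_dated_commit *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

/* Sort in place by whichever date the caller asks for. */
static void toy_sort(struct toy_dated_commit *c, size_t nr, int by_author)
{
	assert(c != NULL || nr == 0);
	qsort(c, nr, sizeof(*c),
	      by_author ? toy_by_author_date : toy_by_commit_date);
}
```

If commit "B" was authored at time 200 but rebased at time 400, while "C" was
authored and committed at time 300, then sorting by commit date lists B before
C, and sorting by author date lists C before B.
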
Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->reverse = 1;

	...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, you should see the last commit this call gives you as your current
HEAD.)

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list_filtered()`, which are a superset of the
arguments to the unfiltered version.

- `struct list_objects_filter_options *filter_options`: This is a struct which
  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
- `struct rev_info *revs`: This is the `rev_info` used for the walk.
- `show_commit_fn show_commit`: A callback which will be used to handle each
  individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
  non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
  and `show_object`.
- `struct oidset *omitted`: A set of object IDs which the provided filter
  caused to be omitted.

It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
instead of needing us to call it repeatedly ourselves. Cool! Let's add the
callbacks first.

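Before writing the real callbacks, it may help to see the callback-plus-context
shape in isolation. The sketch below is plain C with invented `toy_` names; it
only mirrors the roles of `show_commit`, `show_object` and `show_data`, and is
not Git's traversal code. It also shows one alternative to file-scope counters:
passing a struct through the context pointer.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	const char *name;
};

/* Callback types, loosely modeled on the roles described above. */
typedef void (*toy_show_commit_fn)(const struct toy_object *commit, void *data);
typedef void (*toy_show_object_fn)(const struct toy_object *obj, void *data);

/* The traversal owns the loop; the caller only supplies callbacks. */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_show_commit_fn show_commit,
			 toy_show_object_fn show_object,
			 void *show_data)
{
	size_t i;

	assert(show_commit != NULL && show_object != NULL);
	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], show_data);
		else
			show_object(&objs[i], show_data);
	}
}

/* Context passed through show_data instead of using globals. */
struct toy_counts {
	int commits;
	int others;
};

static void toy_count_commit(const struct toy_object *commit, void *data)
{
	struct toy_counts *counts = data;

	(void)commit;
	counts->commits++;
}

static void toy_count_object(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj;
	counts->others++;
}
```

The tutorial itself uses file-scope counters and passes `NULL` as the context,
which is simpler for a first walk; the context pointer becomes useful once the
callbacks need state that should not be global.
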
For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls - `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	switch (obj->type) {
	case OBJ_TREE:
		tree_count++;
		break;
	case OBJ_BLOB:
		blob_count++;
		break;
	case OBJ_TAG:
		tag_count++;
		break;
	case OBJ_COMMIT:
		BUG("unexpected commit object in walken_show_object\n");
	default:
		BUG("unexpected object type %s in walken_show_object\n",
			type_name(obj->type));
	}
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()` - which is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, to keep separate
from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;

	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));

	commit_count = 0;
	tag_count = 0;
	blob_count = 0;
	tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`:

----
	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
		blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it - we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
	if (1) {
		add_head_to_pending(&rev);
		walken_object_walk(&rev);
	} else {
		final_rev_info_setup(&rev);
		walken_commit_walk(&rev);
	}
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters that we can apply to the object walk laid out in
`Documentation/rev-list-options.txt`. These filters are typically useful for
operations such as creating packfiles or performing a partial clone. They are
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; you can imagine a breadth-first tree traversal to
help understand. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)

First, we'll need to `#include "list-objects-filter-options.h"` and set up the
`struct list_objects_filter_options` at the top of the function.

----
static void walken_object_walk(struct rev_info *rev)
{
	struct list_objects_filter_options filter_options = { 0 };

	...
----

For now, we are not going to track the omitted objects, so we'll replace those
parameters with `NULL`. For the sake of simplicity, we'll add a simple
build-time branch to use our filter or not. Replace the line calling
`traverse_commit_list()` with the following, which will remind us which kind of
walk we've just performed:

----
	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
		traverse_commit_list(rev, walken_show_commit,
				walken_show_object, NULL);
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));
		parse_list_objects_filter(&filter_options, "tree:1");

		traverse_commit_list_filtered(&filter_options, rev,
			walken_show_commit, walken_show_object, NULL, NULL);
	}
----

`struct list_objects_filter_options` is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why that's true: a commit created by
`git revert` to undo its immediate parent points to the same tree object as its
grandparent.)

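The "trees less than or equal to commits" bound can be checked with a toy
model. In this sketch each commit is reduced to an integer standing in for its
root tree's OID; the `toy_` names are hypothetical and nothing here touches
Git's object store.

```c
#include <assert.h>
#include <stddef.h>

struct toy_commit_ref {
	int root_tree_id;	/* stands in for the root tree's OID */
};

/*
 * Count distinct root trees among nr commits. Quadratic, which is
 * fine for a demonstration but not for a real history.
 */
static size_t toy_count_root_trees(const struct toy_commit_ref *commits,
				   size_t nr)
{
	size_t i, j, distinct = 0;

	for (i = 0; i < nr; i++) {
		int seen = 0;

		for (j = 0; j < i; j++) {
			if (commits[j].root_tree_id == commits[i].root_tree_id) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			distinct++;
	}
	assert(distinct <= nr);	/* each commit contributes at most one */
	return distinct;
}
```

A history of three commits whose root trees are 1, 2, and 1 again (a change
followed by its revert) yields two distinct root trees, one fewer than the
number of commits.
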
=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
Asking `traverse_commit_list_filtered()` to populate the `omitted` list means
that our object walk does not perform any better than an unfiltered object walk;
all reachable objects are walked in order to populate the list.

First, add the `struct oidset` and related items we will use to iterate it:

----
static void walken_object_walk(
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
object:

----
	...

		traverse_commit_list_filtered(&filter_options, rev,
			walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, the `oidset` traversal is pretty straightforward.
Count all the objects within and modify the print statement:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical. You can also time each invocation of
the `walken` subcommand, with and without `omitted` being passed in, to confirm
to yourself the runtime impact of tracking all omitted objects.

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier - modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
	commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

	...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, run again (but save yourself some scrollback). Trace
output goes to `stderr`, so redirect it to `stdout` before piping:

----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
	...

	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;
	rev->reverse = 1;

	...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk