1= My First Object Walk
2
3== What's an Object Walk?
4
5The object walk is a key concept in Git - this is the process that underpins
6operations like object transfer and fsck. Beginning from a given commit, the
7list of objects is found by walking parent relationships between commits (commit
8X based on commit W) and containment relationships between objects (tree Y is
9contained within commit X, and blob Z is located within tree Y, giving our
10working tree for commit X something like `y/z.txt`).
11
12A related concept is the revision walk, which is focused on commit objects and
13their parent relationships and does not delve into other object types. The
14revision walk is used for operations like `git log`.
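
To make the difference between the two walks concrete, here is a minimal sketch using toy data structures. Everything in it (`struct toy_object`, `toy_revision_walk()`, `toy_object_walk()`) is hypothetical and is not how Git represents objects; it also assumes a linear history where each commit has at most one parent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Git's object types; these are NOT Git's real structs. */
enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

struct toy_object {
	enum toy_type type;
	int seen;                       /* set once the walk has visited us */
	struct toy_object *parent;      /* commits only: the commit we are based on */
	struct toy_object *children[4]; /* commit -> root tree; tree -> entries */
	size_t nr_children;
};

/* Revision walk: follow parent links only and count commits. */
size_t toy_revision_walk(struct toy_object *commit)
{
	size_t count = 0;

	for (; commit; commit = commit->parent) {
		assert(commit->type == TOY_COMMIT);
		count++;
	}
	return count;
}

/* Containment: visit an object and everything it contains, once each. */
static size_t toy_walk_contents(struct toy_object *obj)
{
	size_t count = 1, i;

	if (obj->seen)
		return 0;
	obj->seen = 1;
	for (i = 0; i < obj->nr_children; i++)
		count += toy_walk_contents(obj->children[i]);
	return count;
}

/* Object walk: follow parent links AND containment, counting every object. */
size_t toy_object_walk(struct toy_object *commit)
{
	size_t count = 0;

	for (; commit; commit = commit->parent)
		count += toy_walk_contents(commit);
	return count;
}
```

The `seen` flag matters because two commits may contain the same blob or tree; without it, shared objects would be counted once per commit that reaches them.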
15
16=== Related Reading
17
18- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
19 the revision walker in its various incarnations.
- `revision.h`
21- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
22 gives a good overview of the types of objects in Git and what your object
23 walk is really describing.
24
25== Setting Up
26
27Create a new branch from `master`.
28
29----
30git checkout -b revwalk origin/master
31----
32
33We'll put our fiddling into a new command. For fun, let's name it `git walken`.
34Open up a new file `builtin/walken.c` and set up the command handler:
35
36----
37/*
38 * "git walken"
39 *
40 * Part of the "My First Object Walk" tutorial.
41 */
42
43#include "builtin.h"
44
45int cmd_walken(int argc, const char **argv, const char *prefix)
46{
47 trace_printf(_("cmd_walken incoming...\n"));
48 return 0;
49}
50----
51
52NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
53off at runtime. For the purposes of this tutorial, we will write `walken` as
54though it is intended for use as a "plumbing" command: that is, a command which
55is used primarily in scripts, rather than interactively by humans (a "porcelain"
56command). So we will send our debug output to `trace_printf()` instead. When
57running, enable trace output by setting the environment variable `GIT_TRACE`.
58
Add usage text and `-h` handling, as all subcommands should (the test suite
will notice and complain if you fail to do so).
61
62----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
77----
78
79Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
80
81----
82int cmd_walken(int argc, const char **argv, const char *prefix);
83----
84
85Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
86maintaining alphabetical ordering:
87
88----
89{ "walken", cmd_walken, RUN_SETUP },
90----
91
92Add it to the `Makefile` near the line for `builtin/worktree.o`:
93
94----
95BUILTIN_OBJS += builtin/walken.o
96----
97
98Build and test out your command, without forgetting to ensure the `DEVELOPER`
99flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
100
101----
102$ echo DEVELOPER=1 >>config.mak
103$ make
104$ GIT_TRACE=1 ./bin-wrappers/git walken
105----
106
107NOTE: For a more exhaustive overview of the new command process, take a look at
108`Documentation/MyFirstContribution.txt`.
109
110NOTE: A reference implementation can be found at
111https://github.com/nasamuffin/git/tree/revwalk.
112
113=== `struct rev_cmdline_info`
114
115The definition of `struct rev_cmdline_info` can be found in `revision.h`.
116
117This struct is contained within the `rev_info` struct and is used to reflect
118parameters provided by the user over the CLI.
119
120`nr` represents the number of `rev_cmdline_entry` present in the array.
121
122`alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is
123used to track the allocated size of the list.
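
The `nr`/`alloc` pairing is a common growable-array idiom. The sketch below shows the general idea only; `struct toy_list`, `toy_grow()`, and `toy_append()` are hypothetical names, and the growth factor is an arbitrary choice rather than the one `ALLOC_GROW` uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list type; `nr` and `alloc` play the roles described above. */
struct toy_list {
	int *items;
	size_t nr;    /* entries currently in use */
	size_t alloc; /* entries the allocation can hold */
};

/*
 * Make room for at least `want` entries. Growing geometrically keeps
 * repeated appends cheap; the exact growth factor here is arbitrary.
 */
static void toy_grow(struct toy_list *list, size_t want)
{
	size_t new_alloc;
	int *new_items;

	if (want <= list->alloc)
		return;
	new_alloc = list->alloc ? list->alloc * 2 : 4;
	if (new_alloc < want)
		new_alloc = want;
	new_items = realloc(list->items, new_alloc * sizeof(*new_items));
	assert(new_items); /* a real program would report the failure */
	list->items = new_items;
	list->alloc = new_alloc;
}

/* Append one value, growing the backing storage only when it is full. */
void toy_append(struct toy_list *list, int value)
{
	toy_grow(list, list->nr + 1);
	list->items[list->nr++] = value;
}
```

Keeping `alloc` separate from `nr` means most appends touch no allocator at all; only the occasional append that crosses the current capacity pays for a `realloc()`.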
124
125Per entry, we find:
126
127`item` is the object provided upon which to base the object walk. Items in Git
128can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
129
130`name` is the object ID (OID) of the object - a hex string you may be familiar
131with from using Git to organize your source in the past. Check the tutorial
132mentioned above towards the top for a discussion of where the OID can come
133from.
134
135`whence` indicates some information about what to do with the parents of the
136specified object. We'll explore this flag more later on; take a look at
137`Documentation/revisions.txt` to get an idea of what could set the `whence`
138value.
139
140`flags` are used to hint the beginning of the revision walk and are the first
141block under the `#include`s in `revision.h`. The most likely ones to be set in
142the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
143can be used during the walk, as well.
144
145=== `struct rev_info`
146
147This one is quite a bit longer, and many fields are only used during the walk
148by `revision.c` - not configuration options. Most of the configurable flags in
149`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
150good idea to take some time and read through that document.
151
152== Basic Commit Walk
153
154First, let's see if we can replicate the output of `git log --oneline`. We'll
155refer back to the implementation frequently to discover norms when performing
156an object walk of our own.
157
158To do so, we'll first find all the commits, in order, which preceded the current
159commit. We'll extract the name and subject of the commit from each.
160
161Ideally, we will also be able to find out which ones are currently at the tip of
162various branches.
163
164=== Setting Up
165
166Preparing for your object walk has some distinct stages.
167
1681. Perform default setup for this mode, and others which may be invoked.
1692. Check configuration files for relevant settings.
1703. Set up the `rev_info` struct.
1714. Tweak the initialized `rev_info` to suit the current walk.
1725. Prepare the `rev_info` for the walk.
1736. Iterate over the objects, processing each one.
174
175==== Default Setups
176
177Before examining configuration files which may modify command behavior, set up
178default state for switches or options your command may have. If your command
179utilizes other Git components, ask them to set up their default states as well.
180For instance, `git log` takes advantage of `grep` and `diff` functionality, so
181its `init_log_defaults()` sets its own state (`decoration_style`) and asks
182`grep` and `diff` to initialize themselves by calling each of their
183initialization functions.
184
185==== Configuring From `.gitconfig`
186
Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the
config handlers of any other components your command relies on, so that they
see these options too. Your callback will be invoked once for each
configuration value which Git knows about (global, local, worktree, etc.).
193
194Similarly to the default values, we don't have anything to do here yet
195ourselves; however, we should call `git_default_config()` if we aren't calling
196any other existing config callbacks.
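
The "one callback invocation per value, with a fallback to a default handler" pattern can be sketched in isolation. This is a toy model under stated assumptions: `toy_config()`, `toy_default_config()`, `toy_walken_config()`, and the `walken.verbose` key are all hypothetical and only mimic the shape of the real functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type modeled on the shape described above. */
typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_config_entry {
	const char *var;
	const char *value;
};

/* Call `fn` once for each entry; stop early if a callback reports an error. */
int toy_config(const struct toy_config_entry *entries, size_t nr,
	       toy_config_fn fn, void *cb)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (fn(entries[i].var, entries[i].value, cb) < 0)
			return -1;
	return 0;
}

/* Stand-in for a shared default handler: accepts everything. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	(void)var;
	(void)value;
	(void)cb;
	return 0;
}

/* Command-specific handler: pick out one made-up key, defer the rest. */
int toy_walken_config(const char *var, const char *value, void *cb)
{
	if (!strcmp(var, "walken.verbose")) {
		assert(cb);
		*(int *)cb = !strcmp(value, "true");
		return 0;
	}
	return toy_default_config(var, value, cb);
}
```

The command-specific handler sees every key first and passes along whatever it does not recognize, which is why forgetting the fallback call would silently drop settings that other parts of the program depend on.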
197
198Add a new function to `builtin/walken.c`:
199
200----
201static int git_walken_config(const char *var, const char *value, void *cb)
202{
203 /*
204 * For now, we don't have any custom configuration, so fall back to
205 * the default config.
206 */
207 return git_default_config(var, value, cb);
208}
209----
210
211Make sure to invoke `git_config()` with it in your `cmd_walken()`:
212
213----
214int cmd_walken(int argc, const char **argv, const char *prefix)
215{
216 ...
217
218 git_config(git_walken_config, NULL);
219
220 ...
221}
222----
223
224==== Setting Up `rev_info`
225
226Now that we've gathered external configuration and options, it's time to
227initialize the `rev_info` object which we will use to perform the walk. This is
228typically done by calling `repo_init_revisions()` with the repository you intend
229to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
230struct.
231
232Add the `struct rev_info` and the `repo_init_revisions()` call:
233----
234int cmd_walken(int argc, const char **argv, const char *prefix)
235{
236 /* This can go wherever you like in your declarations.*/
237 struct rev_info rev;
238 ...
239
240 /* This should go after the git_config() call. */
241 repo_init_revisions(the_repository, &rev, prefix);
242
243 ...
244}
245----
246
247==== Tweaking `rev_info` For the Walk
248
249We're getting close, but we're still not quite ready to go. Now that `rev` is
250initialized, we can modify it to fit our needs. This is usually done within a
251helper for clarity, so let's add one:
252
253----
254static void final_rev_info_setup(struct rev_info *rev)
255{
256 /*
257 * We want to mimic the appearance of `git log --oneline`, so let's
258 * force oneline format.
259 */
260 get_commit_format("oneline", rev);
261
262 /* Start our object walk at HEAD. */
263 add_head_to_pending(rev);
264}
265----
266
267[NOTE]
268====
269Instead of using the shorthand `add_head_to_pending()`, you could do
270something like this:
271----
272 struct setup_revision_opt opt;
273
274 memset(&opt, 0, sizeof(opt));
275 opt.def = "HEAD";
276 opt.revarg_opt = REVARG_COMMITTISH;
277 setup_revisions(argc, argv, rev, &opt);
278----
279Using a `setup_revision_opt` gives you finer control over your walk's starting
280point.
281====
282
283Then let's invoke `final_rev_info_setup()` after the call to
284`repo_init_revisions()`:
285
286----
287int cmd_walken(int argc, const char **argv, const char *prefix)
288{
289 ...
290
291 final_rev_info_setup(&rev);
292
293 ...
294}
295----
296
297Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
298now, this is all we need.
299
300==== Preparing `rev_info` For the Walk
301
302Now that `rev` is all initialized and configured, we've got one more setup step
303before we get rolling. We can do this in a helper, which will both prepare the
304`rev_info` for the walk, and perform the walk itself. Let's start the helper
305with the call to `prepare_revision_walk()`, which can return an error without
306dying on its own:
307
308----
309static void walken_commit_walk(struct rev_info *rev)
310{
311 if (prepare_revision_walk(rev))
312 die(_("revision walk setup failed"));
313}
314----
315
316NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
317`stderr` it's likely to be seen by a human, so we will localize it.
318
319==== Performing the Walk!
320
321Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
322can also be used as an iterator; we move to the next item in the walk by using
323`get_revision()` repeatedly. Add the listed variable declarations at the top and
324the walk loop below the `prepare_revision_walk()` call within your
325`walken_commit_walk()`:
326
327----
328static void walken_commit_walk(struct rev_info *rev)
329{
330 struct commit *commit;
331 struct strbuf prettybuf = STRBUF_INIT;
332
333 ...
334
335 while ((commit = get_revision(rev))) {
336 strbuf_reset(&prettybuf);
337 pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
338 puts(prettybuf.buf);
339 }
340 strbuf_release(&prettybuf);
341}
342----
343
344NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
345command we expect to be machine-parsed, we're sending it directly to stdout.
346
347Give it a shot.
348
349----
350$ make
351$ ./bin-wrappers/git walken
352----
353
354You should see all of the subject lines of all the commits in
355your tree's history, in order, ending with the initial commit, "Initial revision
356of "git", the information manager from hell". Congratulations! You've written
357your first revision walk. You can play with printing some additional fields
358from each commit if you're curious; have a look at the functions available in
359`commit.h`.
360
361=== Adding a Filter
362
363Next, let's try to filter the commits we see based on their author. This is
364equivalent to running `git log --author=<pattern>`. We can add a filter by
365modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
366
First some setup. Add `grep_config()` to `git_walken_config()`:
368
369----
370static int git_walken_config(const char *var, const char *value, void *cb)
371{
372 grep_config(var, value, cb);
373 return git_default_config(var, value, cb);
374}
375----
376
377Next, we can modify the `grep_filter`. This is done with convenience functions
378found in `grep.h`. For fun, we're filtering to only commits from folks using a
379`gmail.com` email address - a not-very-precise guess at who may be working on
380Git as a hobby. Since we're checking the author, which is a specific line in the
381header, we'll use the `append_header_grep_pattern()` helper. We can use
382the `enum grep_header_field` to indicate which part of the commit header we want
383to search.
384
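In `final_rev_info_setup()`, add your filter line. The signature shown here
has gained the extra arguments anticipated earlier, so update the call in
`cmd_walken()` to pass `argc`, `argv`, and `prefix` as well: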
386
387----
388static void final_rev_info_setup(int argc, const char **argv,
389 const char *prefix, struct rev_info *rev)
390{
391 ...
392
393 append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
394 "gmail");
395 compile_grep_patterns(&rev->grep_filter);
396
397 ...
398}
399----
400
401`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
402it won't work unless we compile it with `compile_grep_patterns()`.
403
404NOTE: If you are using `setup_revisions()` (for example, if you are passing a
405`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
406to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
407
408NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
409wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
410`enum grep_pat_token` for us.
411
412=== Changing the Order
413
414There are a few ways that we can change the order of the commits during a
415revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
416typical orderings.
417
418`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
419before all of its children have been shown, and we avoid mixing commits which
420are in different lines of history. (`git help log`'s section on `--topo-order`
421has a very nice diagram to illustrate this.)
422
423Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
424`REV_SORT_BY_AUTHOR_DATE`. Add the following:
425
426----
427static void final_rev_info_setup(int argc, const char **argv,
428 const char *prefix, struct rev_info *rev)
429{
430 ...
431
432 rev->topo_order = 1;
433 rev->sort_order = REV_SORT_BY_COMMIT_DATE;
434
435 ...
436}
437----
438
439Let's output this into a file so we can easily diff it with the walk sorted by
440author date.
441
442----
443$ make
444$ ./bin-wrappers/git walken > commit-date.txt
445----
446
447Then, let's sort by author date and run it again.
448
449----
450static void final_rev_info_setup(int argc, const char **argv,
451 const char *prefix, struct rev_info *rev)
452{
453 ...
454
455 rev->topo_order = 1;
456 rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
457
458 ...
459}
460----
461
462----
463$ make
464$ ./bin-wrappers/git walken > author-date.txt
465----
466
467Finally, compare the two. This is a little less helpful without object names or
468dates, but hopefully we get the idea.
469
470----
471$ diff -u commit-date.txt author-date.txt
472----
473
474This display indicates that commits can be reordered after they're written, for
475example with `git rebase`.
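
To see why the two sort keys can disagree, consider this toy sketch. It assumes each commit carries two timestamps and that rewriting a commit (as a rebase does) refreshes the commit date while keeping the author date. `struct toy_commit` and `toy_sort()` are hypothetical, and the sketch ignores the graph constraints that `topo_order` adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record holding the two timestamps a commit carries. */
struct toy_commit {
	const char *subject;
	long author_date;   /* when the change was originally written */
	long commit_date;   /* when this commit object was created */
};

/* Newest first by commit date. */
static int by_commit_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (x->commit_date < y->commit_date) -
	       (x->commit_date > y->commit_date);
}

/* Newest first by author date. */
static int by_author_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

/* Sort in place using whichever key the caller asks for. */
void toy_sort(struct toy_commit *commits, size_t nr, int use_author_date)
{
	assert(commits || !nr);
	qsort(commits, nr, sizeof(*commits),
	      use_author_date ? by_author_date : by_commit_date);
}
```

A commit written second but rewritten last has an old author date and the newest commit date, so it lands in a different position depending on which key is used; that is the kind of difference the `diff` above surfaces.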
476
477Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
478Set that flag somewhere inside of `final_rev_info_setup()`:
479
480----
481static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
482 struct rev_info *rev)
483{
484 ...
485
486 rev->reverse = 1;
487
488 ...
489}
490----
491
492Run your walk again and note the difference in order. (If you remove the grep
493pattern, you should see the last commit this call gives you as your current
494HEAD.)
495
496== Basic Object Walk
497
498So far we've been walking only commits. But Git has more types of objects than
499that! Let's see if we can walk _all_ objects, and find out some information
500about each one.
501
502We can base our work on an example. `git pack-objects` prepares all kinds of
503objects for packing into a bitmap or packfile. The work we are interested in
504resides in `builtins/pack-objects.c:get_object_list()`; examination of that
505function shows that the all-object walk is being performed by
506`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
507functions reside in `list-objects.c`; examining the source shows that, despite
508the name, these functions traverse all kinds of objects. Let's have a look at
509the arguments to `traverse_commit_list_filtered()`, which are a superset of the
510arguments to the unfiltered version.
511
512- `struct list_objects_filter_options *filter_options`: This is a struct which
513 stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
514- `struct rev_info *revs`: This is the `rev_info` used for the walk.
515- `show_commit_fn show_commit`: A callback which will be used to handle each
516 individual commit object.
517- `show_object_fn show_object`: A callback which will be used to handle each
518 non-commit object (so each blob, tree, or tag).
519- `void *show_data`: A context buffer which is passed in turn to `show_commit`
520 and `show_object`.
- `struct oidset *omitted`: A set of object IDs which the provided
  filter caused to be omitted.
523
524It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
525instead of needing us to call it repeatedly ourselves. Cool! Let's add the
526callbacks first.
527
528For the sake of this tutorial, we'll simply keep track of how many of each kind
529of object we find. At file scope in `builtin/walken.c` add the following
530tracking variables:
531
532----
533static int commit_count;
534static int tag_count;
535static int blob_count;
536static int tree_count;
537----
538
539Commits are handled by a different callback than other objects; let's do that
540one first:
541
542----
543static void walken_show_commit(struct commit *cmt, void *buf)
544{
545 commit_count++;
546}
547----
548
549The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
550the `buf` argument is actually the context buffer that we can provide to the
551traversal calls - `show_data`, which we mentioned a moment ago.
552
553Since we have the `struct commit` object, we can look at all the same parts that
554we looked at in our earlier commit-only walk. For the sake of this tutorial,
555though, we'll just increment the commit counter and move on.
556
557The callback for non-commits is a little different, as we'll need to check
558which kind of object we're dealing with:
559
560----
561static void walken_show_object(struct object *obj, const char *str, void *buf)
562{
563 switch (obj->type) {
564 case OBJ_TREE:
565 tree_count++;
566 break;
567 case OBJ_BLOB:
568 blob_count++;
569 break;
570 case OBJ_TAG:
571 tag_count++;
572 break;
573 case OBJ_COMMIT:
574 BUG("unexpected commit object in walken_show_object\n");
575 default:
576 BUG("unexpected object type %s in walken_show_object\n",
577 type_name(obj->type));
578 }
579}
580----
581
582Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
583context pointer that `walken_show_commit()` receives: the `show_data` argument
584to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
585`str` contains the name of the object, which ends up being something like
586`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
587
588To help assure us that we aren't double-counting commits, we'll include some
589complaining if a commit object is routed through our non-commit callback; we'll
590also complain if we see an invalid object type. Since those two cases should be
591unreachable, and would only change in the event of a semantic change to the Git
592codebase, we complain by using `BUG()` - which is a signal to a developer that
593the change they made caused unintended consequences, and the rest of the
594codebase needs to be updated to understand that change. `BUG()` is not intended
595to be seen by the public, so it is not localized.
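
This tutorial keeps its counters at file scope and passes `NULL` as the context pointer, which is the simplest option. As an alternative, the context pointer that both callbacks receive can carry the counters instead. The sketch below models that design with toy types; `toy_traverse()`, `struct toy_item`, and `struct toy_counts` are hypothetical and are not part of Git.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_KIND_COMMIT, TOY_KIND_TREE, TOY_KIND_BLOB };

struct toy_item {
	enum toy_kind kind;
};

/* Hypothetical callback types, shaped like the ones described above. */
typedef void (*toy_show_commit_fn)(struct toy_item *commit, void *data);
typedef void (*toy_show_object_fn)(struct toy_item *obj, void *data);

/* Hand each item to the matching callback, passing `data` through untouched. */
void toy_traverse(struct toy_item *items, size_t nr,
		  toy_show_commit_fn show_commit,
		  toy_show_object_fn show_object, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (items[i].kind == TOY_KIND_COMMIT)
			show_commit(&items[i], data);
		else
			show_object(&items[i], data);
	}
}

/* Counters kept in a caller-owned struct instead of at file scope. */
struct toy_counts {
	int commits;
	int others;
};

void toy_count_commit(struct toy_item *commit, void *data)
{
	struct toy_counts *counts = data;

	assert(commit->kind == TOY_KIND_COMMIT);
	counts->commits++;
}

void toy_count_object(struct toy_item *obj, void *data)
{
	struct toy_counts *counts = data;

	assert(obj->kind != TOY_KIND_COMMIT);
	counts->others++;
}
```

Routing state through the context pointer lets two walks run with independent counters, at the cost of a cast in every callback.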
596
Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can also perform setup which is applicable to all objects here, keeping it
separate from setup which is applicable to commit-only walks.
601
602We'll start by enabling all types of objects in the `struct rev_info`. We'll
603also turn on `tree_blobs_in_commit_order`, which means that we will walk a
604commit's tree and everything it points to immediately after we find each commit,
605as opposed to waiting for the end and walking through all trees after the commit
606history has been discovered. With the appropriate settings configured, we are
607ready to call `prepare_revision_walk()`.
608
609----
610static void walken_object_walk(struct rev_info *rev)
611{
612 rev->tree_objects = 1;
613 rev->blob_objects = 1;
614 rev->tag_objects = 1;
615 rev->tree_blobs_in_commit_order = 1;
616
617 if (prepare_revision_walk(rev))
618 die(_("revision walk setup failed"));
619
620 commit_count = 0;
621 tag_count = 0;
622 blob_count = 0;
623 tree_count = 0;
624----
625
626Let's start by calling just the unfiltered walk and reporting our counts.
627Complete your implementation of `walken_object_walk()`:
628
629----
630 traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
631
632 printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
633 blob_count, tag_count, tree_count);
634}
635----
636
637NOTE: This output is intended to be machine-parsed. Therefore, we are not
638sending it to `trace_printf()`, and we are not localizing it - we need scripts
639to be able to count on the formatting to be exactly the way it is shown here.
640If we were intending this output to be read by humans, we would need to localize
641it with `_()`.
642
643Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
644command line options is out of scope for this tutorial, so we'll just hardcode
645a branch we can change at compile time. Where you call `final_rev_info_setup()`
646and `walken_commit_walk()`, instead branch like so:
647
648----
649 if (1) {
650 add_head_to_pending(&rev);
651 walken_object_walk(&rev);
652 } else {
653 final_rev_info_setup(argc, argv, prefix, &rev);
654 walken_commit_walk(&rev);
655 }
656----
657
658NOTE: For simplicity, we've avoided all the filters and sorts we applied in
659`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
660want, you can certainly use the filters we added before by moving
661`final_rev_info_setup()` out of the conditional and removing the call to
662`add_head_to_pending()`.
663
664Now we can try to run our command! It should take noticeably longer than the
665commit walk, but an examination of the output will give you an idea why. Your
666output should look similar to this example, but with different counts:
667
668----
commits 55733
blobs 100274
tags 0
trees 104210
670----
671
672This makes sense. We have more trees than commits because the Git project has
673lots of subdirectories which can change, plus at least one tree per commit. We
674have no tags because we started on a commit (`HEAD`) and while tags can point to
675commits, commits can't point to tags.
676
677NOTE: You will have different counts when you run this yourself! The number of
678objects grows along with the Git project.
679
680=== Adding a Filter
681
682There are a handful of filters that we can apply to the object walk laid out in
683`Documentation/rev-list-options.txt`. These filters are typically useful for
684operations such as creating packfiles or performing a partial clone. They are
685defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
686will use the "tree:1" filter, which causes the walk to omit all trees and blobs
687which are not directly referenced by commits reachable from the commit in
688`pending` when the walk begins. (`pending` is the list of objects which need to
689be traversed during a walk; you can imagine a breadth-first tree traversal to
690help understand. In our case, that means we omit trees and blobs not directly
691referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
692`HEAD` in the `pending` list.)
693
694First, we'll need to `#include "list-objects-filter-options.h`" and set up the
695`struct list_objects_filter_options` at the top of the function.
696
697----
698static void walken_object_walk(struct rev_info *rev)
699{
700 struct list_objects_filter_options filter_options = {};
701
702 ...
703----
704
705For now, we are not going to track the omitted objects, so we'll replace those
706parameters with `NULL`. For the sake of simplicity, we'll add a simple
707build-time branch to use our filter or not. Replace the line calling
708`traverse_commit_list()` with the following, which will remind us which kind of
709walk we've just performed:
710
711----
712 if (0) {
713 /* Unfiltered: */
714 trace_printf(_("Unfiltered object walk.\n"));
715 traverse_commit_list(rev, walken_show_commit,
716 walken_show_object, NULL);
717 } else {
718 trace_printf(
719 _("Filtered object walk with filterspec 'tree:1'.\n"));
720 parse_list_objects_filter(&filter_options, "tree:1");
721
722 traverse_commit_list_filtered(&filter_options, rev,
723 walken_show_commit, walken_show_object, NULL, NULL);
724 }
725----
726
727`struct list_objects_filter_options` is usually built directly from a command
728line argument, so the module provides an easy way to build one from a string.
729Even though we aren't taking user input right now, we can still build one with
730a hardcoded string using `parse_list_objects_filter()`.
731
732With the filter spec "tree:1", we are expecting to see _only_ the root tree for
733each commit; therefore, the tree object count should be less than or equal to
734the number of commits. (For an example of why that's true: `git commit --revert`
735points to the same tree object as its grandparent.)
736
737=== Counting Omitted Objects
738
739We also have the capability to enumerate all objects which were omitted by a
740filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
741`traverse_commit_list_filtered()` to populate the `omitted` list means that our
742object walk does not perform any better than an unfiltered object walk; all
743reachable objects are walked in order to populate the list.
744
745First, add the `struct oidset` and related items we will use to iterate it:
746
747----
748static void walken_object_walk(
749 ...
750
751 struct oidset omitted;
752 struct oidset_iter oit;
753 struct object_id *oid = NULL;
754 int omitted_count = 0;
755 oidset_init(&omitted, 0);
756
757 ...
758----
759
760Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
761object:
762
763----
764 ...
765
766 traverse_commit_list_filtered(&filter_options, rev,
767 walken_show_commit, walken_show_object, NULL, &omitted);
768
769 ...
770----
771
772Then, after your traversal, the `oidset` traversal is pretty straightforward.
773Count all the objects within and modify the print statement:
774
775----
776 /* Count the omitted objects. */
777 oidset_iter_init(&omitted, &oit);
778
779 while ((oid = oidset_iter_next(&oit)))
780 omitted_count++;
781
782 printf("commits %d\nblobs %d\ntags %d\ntrees%d\nomitted %d\n",
783 commit_count, blob_count, tag_count, tree_count, omitted_count);
784----
785
786By running your walk with and without the filter, you should find that the total
787object count in each case is identical. You can also time each invocation of
788the `walken` subcommand, with and without `omitted` being passed in, to confirm
789to yourself the runtime impact of tracking all omitted objects.
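
The relationship between shown and omitted objects can be modeled with a toy depth-limited walk. This sketch assumes the root tree of a commit sits at depth 0 and that a "tree:<limit>" spec omits everything at depth `limit` or deeper; `struct toy_entry`, `struct toy_tally`, and `toy_walk_tree()` are hypothetical and do not reflect how the filter is implemented in Git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node: a tree has entries, a blob has none. */
struct toy_entry {
	struct toy_entry *entries[4];
	size_t nr;
};

struct toy_tally {
	int shown;
	int omitted;
};

/*
 * Visit `entry`, which sits `depth` levels below a commit's root tree
 * (the root tree itself is depth 0). Anything at depth >= limit is
 * tallied as omitted rather than shown. We keep descending either way,
 * because tallying omitted entries requires visiting them.
 */
void toy_walk_tree(const struct toy_entry *entry, size_t depth, size_t limit,
		   struct toy_tally *tally)
{
	size_t i;

	assert(entry && tally);
	if (depth >= limit)
		tally->omitted++;
	else
		tally->shown++;
	for (i = 0; i < entry->nr; i++)
		toy_walk_tree(entry->entries[i], depth + 1, limit, tally);
}
```

Whatever the limit, `shown + omitted` equals the total number of entries, and the walk visits all of them; a filtered walk that does not need the tally could stop descending at the limit and finish sooner.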
790
791=== Changing the Order
792
793Finally, let's demonstrate that you can also reorder walks of all objects, not
794just walks of commits. First, we'll make our handlers chattier - modify
795`walken_show_commit()` and `walken_show_object()` to print the object as they
796go:
797
798----
799static void walken_show_commit(struct commit *cmt, void *buf)
800{
801 trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
802 commit_count++;
803}
804
805static void walken_show_object(struct object *obj, const char *str, void *buf)
806{
807 trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
808
809 ...
810}
811----
812
813NOTE: Since we will be examining this output directly as humans, we'll use
814`trace_printf()` here. Additionally, since this change introduces a significant
815number of printed lines, using `trace_printf()` will allow us to easily silence
816those lines without having to recompile.
817
818(Leave the counter increment logic in place.)
819
820With only that change, run again (but save yourself some scrollback):
821
822----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
824----
825
826Take a look at the top commit with `git show` and the object ID you printed; it
827should be the same as the output of `git show HEAD`.
828
829Next, let's change a setting on our `struct rev_info` within
830`walken_object_walk()`. Find where you're changing the other settings on `rev`,
831such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
832`reverse` setting at the bottom:
833
834----
835 ...
836
837 rev->tree_objects = 1;
838 rev->blob_objects = 1;
839 rev->tag_objects = 1;
840 rev->tree_blobs_in_commit_order = 1;
841 rev->reverse = 1;
842
843 ...
844----
845
846Now, run again, but this time, let's grab the last handful of objects instead
847of the first handful:
848
849----
850$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
852----
853
854The last commit object given should have the same OID as the one we saw at the
855top before, and running `git show <oid>` with that OID should give you again
856the same results as `git show HEAD`. Furthermore, if you run and examine the
857first ten lines again (with `head` instead of `tail` like we did before applying
858the `reverse` setting), you should see that now the first commit printed is the
859initial commit, `e83c5163`.
860
861== Wrapping Up
862
863Let's review. In this tutorial, we:
864
865- Built a commit walk from the ground up
866- Enabled a grep filter for that commit walk
867- Changed the sort order of that filtered commit walk
868- Built an object walk (tags, commits, trees, and blobs) from the ground up
869- Learned how to add a filter-spec to an object walk
870- Changed the display order of the filtered object walk