= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git - this is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.
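
To make the difference between the two walks concrete, here is a minimal sketch
in plain C. The `toy_object` type and both functions are invented for this
illustration and are not part of Git's API; real objects are looked up by
object ID rather than linked by pointers.

```c
#include <stddef.h>

/* Toy object model; these types are illustrative, not Git's own. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	int seen;                        /* set once the walk has visited us */
	struct toy_object *parent;       /* commits only: the previous commit */
	struct toy_object *children[4];  /* commit: root tree; tree: entries */
	size_t nr_children;
};

/* Revision walk: follow parent links only, so only commits are counted. */
size_t toy_revision_walk(struct toy_object *commit)
{
	size_t n = 0;

	for (; commit; commit = commit->parent)
		n++;
	return n;
}

/* Object walk: follow parents and containment, counting each object once. */
size_t toy_object_walk(struct toy_object *obj)
{
	size_t n = 1, i;

	if (!obj || obj->seen)
		return 0;
	obj->seen = 1;
	for (i = 0; i < obj->nr_children; i++)
		n += toy_object_walk(obj->children[i]);
	if (obj->type == TOY_COMMIT)
		n += toy_object_walk(obj->parent);
	return n;
}
```

With two commits that share one tree holding one blob, `toy_revision_walk()`
returns 2 while `toy_object_walk()` returns 4, because the shared tree and blob
are each counted once.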

=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  the revision walker in its various incarnations.
- `revision.h`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  gives a good overview of the types of objects in Git and what your object
  walk is really describing.

== Setting Up

Create a new branch from `master`.

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"
#include "trace.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	trace_printf(_("cmd_walken incoming...\n"));
	return 0;
}
----

NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
that it can be turned on or off at runtime. For the purposes of this
tutorial, we will write `walken` as though it is intended for use as
a "plumbing" command: that is, a command which is used primarily in
scripts, rather than interactively by humans (a "porcelain" command).
So we will send our debug output to `trace_printf()` instead.
When running, enable trace output by setting the environment variable `GIT_TRACE`.

Add usage text and `-h` handling, like all subcommands should consistently do
(our test suite will notice and complain if you fail to do so).
We'll need to include the `parse-options.h` header.

----
#include "parse-options.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test out your command, without forgetting to ensure the `DEVELOPER`
flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user over the CLI.

`nr` represents the number of `rev_cmdline_entry` elements present in the array.

`alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is
used to track the allocated size of the list.
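
The `nr`/`alloc` pairing is a common growable-array idiom. The sketch below
shows the general idea in plain C; `toy_list` and `toy_list_append()` are
hypothetical names, and the doubling growth policy is an assumption made for
the example rather than what `ALLOC_GROW` actually does.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an array of command-line entries. */
struct toy_list {
	int *items;
	size_t nr;     /* entries in use */
	size_t alloc;  /* entries allocated */
};

/* Append one value, growing the allocation when `nr` would exceed `alloc`. */
int toy_list_append(struct toy_list *list, int value)
{
	if (list->nr + 1 > list->alloc) {
		/* Assumed policy: start at 4 slots, then double. */
		size_t new_alloc = list->alloc ? list->alloc * 2 : 4;
		int *grown = realloc(list->items, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		list->items = grown;
		list->alloc = new_alloc;
	}
	list->items[list->nr++] = value;
	return 0;
}
```

Keeping `alloc` separate from `nr` means most appends only bump `nr`, and the
array is reallocated only occasionally.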

Per entry, we find:

`item` is the object provided as the starting point for the object walk. Items
in Git can be blobs, trees, commits, or tags. (See
`Documentation/gittutorial-2.txt`.)

`name` is the object ID (OID) of the object - a hex string you may be familiar
with from using Git to organize your source in the past. See the top of the
tutorial mentioned above for a discussion of where the OID can come from.

`whence` indicates some information about what to do with the parents of the
specified object. We'll explore this flag more later on; take a look at
`Documentation/revisions.txt` to get an idea of what could set the `whence`
value.

`flags` are used to hint the beginning of the revision walk and are the first
block under the `#include`s in `revision.h`. The most likely ones to be set in
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
can be used during the walk, as well.
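
Flags like these are typically single bits combined in one unsigned field. The
following is a small sketch of that pattern; the `TOY_` names and their bit
positions are placeholders chosen for the example and should not be read as the
values defined in `revision.h`.

```c
/* Illustrative flag bits; the names and values are placeholders, not Git's. */
#define TOY_SEEN          (1u << 0)
#define TOY_UNINTERESTING (1u << 1)
#define TOY_BOTTOM        (1u << 2)

struct toy_entry {
	const char *name;
	unsigned flags;
};

/* Mark an entry as excluded from the walk by setting two bits at once. */
void toy_mark_excluded(struct toy_entry *entry)
{
	entry->flags |= TOY_UNINTERESTING | TOY_BOTTOM;
}

/* Non-zero when every bit in `mask` is set on the entry. */
int toy_has_flags(const struct toy_entry *entry, unsigned mask)
{
	return (entry->flags & mask) == mask;
}
```

Because each flag occupies its own bit, setting one never disturbs the others,
which is why the same field can carry both setup hints and walk state.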

=== `struct rev_info`

This one is quite a bit longer, and many fields are only used during the walk
by `revision.c` and are not configuration options. Most of the configurable
flags in `struct rev_info` have a mirror in
`Documentation/rev-list-options.txt`. It's a good idea to take some time and
read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the
methods of other components that need to intercept these options. Your
callback will be invoked once for each configuration value which Git knows about
(global, local, worktree, etc.).

Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.
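
The "callback per configuration value, falling back to a default handler"
shape can be sketched without any Git headers. Everything here is hypothetical:
the `toy_` functions, the `toy_settings` struct, and the `walken.verbose` key
exist only for this example.

```c
#include <string.h>

/* Hypothetical settings this toy command cares about. */
struct toy_settings {
	int verbose;
	int defaults_seen;  /* keys that fell through to the default handler */
};

/* Stand-in for a shared default handler that accepts any key. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	(void)var;
	(void)value;
	s->defaults_seen++;
	return 0;
}

/* Called once per key/value pair; unknown keys go to the default handler. */
int toy_walken_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	if (!strcmp(var, "walken.verbose")) {
		s->verbose = !strcmp(value, "true");
		return 0;
	}
	return toy_default_config(var, value, cb);
}
```

The callback handles the keys it recognizes and hands everything else on, so
shared settings are still processed even though the command adds its own.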

Add a new function to `builtin/walken.c`.
We'll also need to include the `config.h` header:

----
#include "config.h"

...

static int git_walken_config(const char *var, const char *value,
		const struct config_context *ctx, void *cb)
{
	/*
	 * For now, we don't have any custom configuration, so fall back to
	 * the default config.
	 */
	return git_default_config(var, value, ctx, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	git_config(git_walken_config, NULL);

	...
}
----

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call.
We'll also need to include the `revision.h` header:

----
#include "revision.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	/* This can go wherever you like in your declarations.*/
	struct rev_info rev;
	...

	/* This should go after the git_config() call. */
	repo_init_revisions(the_repository, &rev, prefix);

	...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	/*
	 * We want to mimic the appearance of `git log --oneline`, so let's
	 * force oneline format.
	 */
	get_commit_format("oneline", rev);

	/* Start our object walk at HEAD. */
	add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
	struct setup_revision_opt opt;

	memset(&opt, 0, sizeof(opt));
	opt.def = "HEAD";
	opt.revarg_opt = REVARG_COMMITTISH;
	setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	final_rev_info_setup(&rev);

	...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk, and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
`stderr` it's likely to be seen by a human, so we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by using
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
#include "pretty.h"

...

static void walken_commit_walk(struct rev_info *rev)
{
	struct commit *commit;
	struct strbuf prettybuf = STRBUF_INIT;

	...

	while ((commit = get_revision(rev))) {
		strbuf_reset(&prettybuf);
		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
		puts(prettybuf.buf);
	}
	strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
command we expect to be machine-parsed, we're sending it directly to stdout.

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see all of the subject lines of all the commits in
your tree's history, in order, ending with the initial commit, "Initial revision
of "git", the information manager from hell". Congratulations! You've written
your first revision walk. You can play with printing some additional fields
from each commit if you're curious; have a look at the functions available in
`commit.h`.
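
The iterator shape of the loop above, where a call returns the next item or
`NULL` when done, can be modelled in a few lines. This is only a sketch:
`toy_commit`, `toy_walk`, and `toy_get_revision()` are made-up stand-ins that
follow a single parent chain, whereas the real walk handles merges, ordering,
and filtering.

```c
#include <stddef.h>

/* Toy commit with a single parent; not Git's `struct commit`. */
struct toy_commit {
	const char *subject;
	struct toy_commit *parent;
};

/* Toy walk state: remembers which commit to hand out next. */
struct toy_walk {
	struct toy_commit *next;
};

/* Return the next commit and advance, or NULL once history is exhausted. */
struct toy_commit *toy_get_revision(struct toy_walk *walk)
{
	struct toy_commit *current = walk->next;

	if (current)
		walk->next = current->parent;
	return current;
}

/* Drain the walk with the same loop shape as the tutorial's helper. */
size_t toy_count_commits(struct toy_walk *walk)
{
	struct toy_commit *commit;
	size_t n = 0;

	while ((commit = toy_get_revision(walk)))
		n++;
	return n;
}
```

Because the walk state lives in the struct, the caller's loop stays trivial and
the walk can be resumed from wherever it was left.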

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `grep_config()` to `git_walken_config()`:

----
static int git_walken_config(const char *var, const char *value,
		const struct config_context *ctx, void *cb)
{
	grep_config(var, value, ctx, cb);
	return git_default_config(var, value, ctx, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address - a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line:

----
static void final_rev_info_setup(int argc, const char **argv,
		const char *prefix, struct rev_info *rev)
{
	...

	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
		"gmail");
	compile_grep_patterns(&rev->grep_filter);

	...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.
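
To see why a header-specific helper matters, consider a rough sketch of
matching only the author line of a raw commit buffer. `toy_author_matches()` is
a hypothetical function using plain substring search; it is not how `grep.c`
is implemented, and it assumes the usual "headers, blank line, message" layout.

```c
#include <string.h>

/*
 * Return 1 when the "author " header line of a raw commit buffer contains
 * `pattern`. Plain substring matching; a real grep would handle regexes.
 */
int toy_author_matches(const char *commit_buf, const char *pattern)
{
	const char *line = commit_buf;

	/* A blank line ends the header, so stop before the message body. */
	while (line && *line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len >= 7 && !strncmp(line, "author ", 7)) {
			size_t plen = strlen(pattern);
			size_t i;

			for (i = 7; i + plen <= len; i++)
				if (!strncmp(line + i, pattern, plen))
					return 1;
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return 0;
}
```

Restricting the search to one header line means a commit whose message merely
mentions "gmail", or whose committer uses it, is not matched by accident.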

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(int argc, const char **argv,
		const char *prefix, struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_COMMIT_DATE;

	...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(int argc, const char **argv,
		const char *prefix, struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

	...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

Any differences show that commits can be reordered after they're written, for
example with `git rebase`, which keeps the author date but assigns a new commit
date.
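
A tiny model shows how the two dates can disagree. In this sketch,
`toy_dated_commit` and the sort helpers are invented for illustration, the
timestamps are arbitrary, and the topological constraint that `topo_order`
adds is deliberately left out.

```c
#include <stdlib.h>

/* Toy commit carrying both timestamps, in seconds since the epoch. */
struct toy_dated_commit {
	const char *subject;
	long author_date;
	long commit_date;
};

/* Newest commit date first. */
static int by_commit_date(const void *a, const void *b)
{
	const struct toy_dated_commit *x = a, *y = b;

	return (x->commit_date < y->commit_date) -
	       (x->commit_date > y->commit_date);
}

/* Newest author date first. */
static int by_author_date(const void *a, const void *b)
{
	const struct toy_dated_commit *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

void toy_sort_by_commit_date(struct toy_dated_commit *commits, size_t nr)
{
	qsort(commits, nr, sizeof(*commits), by_commit_date);
}

void toy_sort_by_author_date(struct toy_dated_commit *commits, size_t nr)
{
	qsort(commits, nr, sizeof(*commits), by_author_date);
}
```

A commit that was authored early but rebased late sorts first by commit date
and last by author date, which is the kind of difference the `diff` reveals.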

Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
		struct rev_info *rev)
{
	...

	rev->reverse = 1;

	...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, you should see the last commit this call gives you as your current
HEAD.)

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list()`.

- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
  its `filter` member is not `NULL`, then `filter` contains information for
  how to filter the object list.
- `show_commit_fn show_commit`: A callback which will be used to handle each
  individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
  non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
  and `show_object`.

`traverse_commit_list_filtered()` takes one additional parameter:

- `struct oidset *omitted`: A set of object IDs which the provided
  filter caused to be omitted.

It looks like these methods use callbacks we provide instead of needing us
to call them repeatedly ourselves. Cool! Let's add the callbacks first.

For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls - `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.
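
The role of the context pointer is easier to see in a stripped-down model of a
callback-driven traversal. The sketch below is an assumption-laden toy:
`toy_traverse()` and its callback types only imitate the shape of the real
functions, and it carries the counters through `show_data` rather than using
file-scope variables as this tutorial does.

```c
#include <stddef.h>

enum toy_kind { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_item {
	enum toy_kind kind;
	const char *name;
};

typedef void (*toy_show_commit_fn)(const struct toy_item *item, void *data);
typedef void (*toy_show_object_fn)(const struct toy_item *item,
				   const char *name, void *data);

/* Visit every item, routing commits and non-commits to separate callbacks. */
void toy_traverse(const struct toy_item *items, size_t nr,
		  toy_show_commit_fn show_commit,
		  toy_show_object_fn show_object, void *show_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (items[i].kind == TOY_COMMIT)
			show_commit(&items[i], show_data);
		else
			show_object(&items[i], items[i].name, show_data);
	}
}

/* Counters passed through `show_data` instead of living at file scope. */
struct toy_counts {
	int commits;
	int others;
};

void toy_count_commit(const struct toy_item *item, void *data)
{
	(void)item;
	((struct toy_counts *)data)->commits++;
}

void toy_count_object(const struct toy_item *item, const char *name, void *data)
{
	(void)item;
	(void)name;
	((struct toy_counts *)data)->others++;
}
```

Passing state through the context pointer keeps the callbacks reentrant;
file-scope counters are simpler to read, which suits a tutorial.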

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	switch (obj->type) {
	case OBJ_TREE:
		tree_count++;
		break;
	case OBJ_BLOB:
		blob_count++;
		break;
	case OBJ_TAG:
		tag_count++;
		break;
	case OBJ_COMMIT:
		BUG("unexpected commit object in walken_show_object\n");
	default:
		BUG("unexpected object type %s in walken_show_object\n",
			type_name(obj->type));
	}
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()` - which is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, to keep separate
from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;

	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));

	commit_count = 0;
	tag_count = 0;
	blob_count = 0;
	tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`.
We'll also need to include the `list-objects.h` header.

----
#include "list-objects.h"

...

	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
		blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it - we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
	if (1) {
		add_head_to_pending(&rev);
		walken_object_walk(&rev);
	} else {
		final_rev_info_setup(argc, argv, prefix, &rev);
		walken_commit_walk(&rev);
	}
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters, laid out in
`Documentation/rev-list-options.txt`, that we can apply to the object walk.
These filters are typically useful for operations such as creating packfiles or
performing a partial clone. They are defined in
`list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; you can imagine a breadth-first tree traversal to
help understand. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)

For now, we are not going to track the omitted objects, so we'll keep calling
`traverse_commit_list()`, which does not collect them. For the sake of
simplicity, we'll add a simple build-time branch to use our filter or not.
Preface the line calling `traverse_commit_list()` with the following, which
will remind us which kind of walk we've just performed:

----
	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));

		parse_list_objects_filter(&rev->filter, "tree:1");
	}
	traverse_commit_list(rev, walken_show_commit,
			     walken_show_object, NULL);
----

The `rev->filter` member is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why that's true: a commit created by
running `git revert` on its parent points to the same tree object as its
grandparent.)

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --filter=<spec> --filter-print-omitted`. To do
this, change `traverse_commit_list()` to `traverse_commit_list_filtered()`,
which is able to populate an `omitted` list. Asking for this list of filtered
objects may cause performance degradations, however, because in this case,
despite filtering objects, the possibly much larger set of all reachable
objects must be processed in order to populate that list.

First, add the `struct oidset` and related items we will use to iterate it:

----
#include "oidset.h"

...

static void walken_object_walk(
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Replace the call to `traverse_commit_list()` with
`traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
defined and initialized above:

----
	...

	traverse_commit_list_filtered(rev,
		walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, the `oidset` traversal is pretty straightforward.
Count all the objects within and modify the print statement:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical. You can also time each invocation of
the `walken` subcommand, with and without `omitted` being passed in, to confirm
to yourself the runtime impact of tracking all omitted objects.
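
The relationship between shown and omitted objects can be sketched with a toy
depth limit. This is only a model under stated assumptions: `toy_node`,
`toy_filter_result`, and `toy_walk_depth()` are hypothetical, depth 0 stands
for a commit's root tree, and a limit of 1 is meant to mimic the spirit of
"tree:1" rather than reproduce the real filter code.

```c
#include <stddef.h>

/* Toy tree or blob; blobs simply have no entries. Illustrative only. */
struct toy_node {
	struct toy_node *entries[4];
	size_t nr_entries;
};

struct toy_filter_result {
	size_t shown;    /* objects that passed the depth limit */
	size_t omitted;  /* objects the limit filtered out */
};

/*
 * Walk `node`, showing it only while `depth` is below `max_depth`; deeper
 * objects are still visited so that they can be counted as omitted.
 */
void toy_walk_depth(const struct toy_node *node, int depth, int max_depth,
		    struct toy_filter_result *res)
{
	size_t i;

	if (depth < max_depth)
		res->shown++;
	else
		res->omitted++;
	for (i = 0; i < node->nr_entries; i++)
		toy_walk_depth(node->entries[i], depth + 1, max_depth, res);
}
```

For a root tree containing one subtree with one blob, a limit of 1 shows one
object and omits two. The sum matches the unfiltered total, and every object
is still visited, which is where the extra runtime of tracking omissions comes
from.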

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier - modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
#include "hex.h"

...

static void walken_show_commit(struct commit *cmt, void *buf)
{
	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
	commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

	...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, run again (but save yourself some scrollback):

----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
	...

	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;
	rev->reverse = 1;

	...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk
898- Changed the display order of the filtered object walk