= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git - this is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.
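
To make the distinction concrete, here is a minimal, self-contained sketch in
plain C. It does not use Git's real data structures: `toy_blob`, `toy_tree`,
`toy_commit` and the functions below are hypothetical names invented for
illustration. The revision walk follows only parent links, while the object
walk also descends into each commit's tree and blobs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Git objects; these are NOT Git's real structs. */
struct toy_blob {
	const char *name;
	int seen;
};

struct toy_tree {
	const char *name;
	struct toy_blob *blobs[4];	/* NULL-terminated */
	int seen;
};

struct toy_commit {
	const char *subject;
	struct toy_commit *parent;	/* one parent keeps the toy simple */
	struct toy_tree *tree;
};

/* Revision walk: follow parent links only and count the commits. */
static int toy_revision_walk(const struct toy_commit *tip)
{
	int n = 0;

	for (; tip; tip = tip->parent)
		n++;
	return n;
}

/*
 * Object walk: count each commit, then its tree and the tree's blobs.
 * The `seen` marks keep shared objects from being counted twice.
 */
static int toy_object_walk(const struct toy_commit *tip)
{
	int n = 0;

	for (; tip; tip = tip->parent) {
		struct toy_tree *t = tip->tree;

		n++;				/* the commit itself */
		if (!t || t->seen)
			continue;
		t->seen = 1;
		n++;				/* its tree */
		for (size_t i = 0; i < 4 && t->blobs[i]; i++) {
			if (t->blobs[i]->seen)
				continue;
			t->blobs[i]->seen = 1;
			n++;			/* a blob in that tree */
		}
	}
	return n;
}

/* Sample history from the text: commit X is based on commit W. */
static struct toy_commit *toy_sample_history(void)
{
	static struct toy_blob z = { "z.txt", 0 };
	static struct toy_tree y = { "y", { &z, NULL }, 0 };
	static struct toy_commit w = { "commit W", NULL, &y };
	static struct toy_commit x = { "commit X", &w, &y };

	return &x;
}
```

In this sample both commits are assumed to share tree `y`, so the object walk
counts the tree and its blob once even though two commits reach them.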

=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  the revision walker in its various incarnations.
- `revision.h`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  gives a good overview of the types of objects in Git and what your object
  walk is really describing.

== Setting Up

Create a new branch from `master`.

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"
#include "trace.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	trace_printf(_("cmd_walken incoming...\n"));
	return 0;
}
----

NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
that it can be turned on or off at runtime. For the purposes of this
tutorial, we will write `walken` as though it is intended for use as
a "plumbing" command: that is, a command which is used primarily in
scripts, rather than interactively by humans (a "porcelain" command).
So we will send our debug output to `trace_printf()` instead.
When running, enable trace output by setting the environment variable `GIT_TRACE`.

Add usage text and `-h` handling, like all subcommands should consistently do
(our test suite will notice and complain if you fail to do so).
We'll need to include the `parse-options.h` header.

----
#include "parse-options.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test out your command, without forgetting to ensure the `DEVELOPER`
flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user over the CLI.

`nr` represents the number of `rev_cmdline_entry` present in the array.

`alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is
used to track the allocated size of the list.

Per entry, we find:

`item` is the object provided upon which to base the object walk. Items in Git
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)

`name` is the object ID (OID) of the object - a hex string you may be familiar
with from using Git to organize your source in the past. Check the tutorial
mentioned above towards the top for a discussion of where the OID can come
from.

`whence` indicates some information about what to do with the parents of the
specified object. We'll explore this flag more later on; take a look at
`Documentation/revisions.txt` to get an idea of what could set the `whence`
value.

`flags` are used to hint the beginning of the revision walk and are the first
block under the `#include`s in `revision.h`. The most likely ones to be set in
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
can be used during the walk, as well.
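
The `nr`/`alloc` pairing is a common growable-array idiom. The sketch below
reproduces the idea in standalone C. `toy_entry`, `toy_cmdline` and
`toy_cmdline_add()` are hypothetical names, and the growth policy (start at
four slots, then double) is an assumption made for illustration rather than a
copy of what `ALLOC_GROW` does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified mirror of the per-entry fields described above. */
struct toy_entry {
	const char *name;	/* name recorded for the entry */
	unsigned flags;		/* e.g. a bit marking the entry uninteresting */
};

struct toy_cmdline {
	unsigned int nr;	/* entries in use */
	unsigned int alloc;	/* entries allocated */
	struct toy_entry *rev;
};

/* Append an entry, growing the array when it is full. Returns 0 or -1. */
static int toy_cmdline_add(struct toy_cmdline *info, const char *name,
			   unsigned flags)
{
	if (info->nr == info->alloc) {
		/* Assumed growth policy: start at 4 slots, then double. */
		unsigned int new_alloc = info->alloc ? info->alloc * 2 : 4;
		struct toy_entry *grown =
			realloc(info->rev, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		info->rev = grown;
		info->alloc = new_alloc;
	}
	info->rev[info->nr].name = name;
	info->rev[info->nr].flags = flags;
	info->nr++;
	return 0;
}
```

Keeping `nr` and `alloc` separate means that appending is cheap most of the
time: the array is only reallocated when the two counters meet.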

=== `struct rev_info`

This one is quite a bit longer, and many fields are only used during the walk
by `revision.c` - not configuration options. Most of the configurable flags in
`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
good idea to take some time and read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the config
handlers of other components that need to intercept these options. Your
callback will be invoked once for each configuration value which Git knows about
(global, local, worktree, etc.).

Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.
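
The callback pattern can be sketched without Git at all. Below, `toy_config()`
plays the role of `git_config()` by calling a handler once per key/value pair,
and the handler defers to a default handler for keys it does not own. All
`toy_*` names and the `walken.verbose` key are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_kv {
	const char *var;
	const char *value;
};

typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

static int toy_defaults_seen;

/* Stand-in for a shared default handler; here it only counts its calls. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	(void)var;
	(void)value;
	(void)cb;
	toy_defaults_seen++;
	return 0;
}

/* Command-specific handler: claim our own key, defer everything else. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
	if (!strcmp(var, "walken.verbose")) {
		*(int *)cb = !strcmp(value, "true");
		return 0;
	}
	return toy_default_config(var, value, cb);
}

/* Invoke fn once per entry and stop at the first handler error. */
static int toy_config(const struct toy_kv *kv, size_t nr,
		      toy_config_fn fn, void *cb)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(kv[i].var, kv[i].value, cb);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

The `cb` pointer is how a caller hands its own state to the handler, which is
the same role the last argument of `git_config()` plays in the tutorial code.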

Add a new function to `builtin/walken.c`.
We'll also need to include the `config.h` header:

----
#include "config.h"

...

static int git_walken_config(const char *var, const char *value, void *cb)
{
	/*
	 * For now, we don't have any custom configuration, so fall back to
	 * the default config.
	 */
	return git_default_config(var, value, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	git_config(git_walken_config, NULL);

	...
}
----

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call.
We'll also need to include the `revision.h` header:

----
#include "revision.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	/* This can go wherever you like in your declarations. */
	struct rev_info rev;
	...

	/* This should go after the git_config() call. */
	repo_init_revisions(the_repository, &rev, prefix);

	...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	/*
	 * We want to mimic the appearance of `git log --oneline`, so let's
	 * force oneline format.
	 */
	get_commit_format("oneline", rev);

	/* Start our object walk at HEAD. */
	add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
	struct setup_revision_opt opt;

	memset(&opt, 0, sizeof(opt));
	opt.def = "HEAD";
	opt.revarg_opt = REVARG_COMMITTISH;
	setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	final_rev_info_setup(&rev);

	...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk, and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
`stderr` it's likely to be seen by a human, so we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by using
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
#include "pretty.h"

...

static void walken_commit_walk(struct rev_info *rev)
{
	struct commit *commit;
	struct strbuf prettybuf = STRBUF_INIT;

	...

	while ((commit = get_revision(rev))) {
		strbuf_reset(&prettybuf);
		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
		puts(prettybuf.buf);
	}
	strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
command we expect to be machine-parsed, we're sending it directly to stdout.

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see all of the subject lines of all the commits in
your tree's history, in order, ending with the initial commit, "Initial revision
of "git", the information manager from hell". Congratulations! You've written
your first revision walk. You can play with printing some additional fields
from each commit if you're curious; have a look at the functions available in
`commit.h`.

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `grep_config()` to `git_walken_config()`:

----
static int git_walken_config(const char *var, const char *value, void *cb)
{
	grep_config(var, value, cb);
	return git_default_config(var, value, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address - a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line:

----
static void final_rev_info_setup(int argc, const char **argv,
		const char *prefix, struct rev_info *rev)
{
	...

	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
		"gmail");
	compile_grep_patterns(&rev->grep_filter);

	...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.
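
Conceptually, a header filter inspects one header line of the commit rather
than the whole message. The standalone sketch below imitates that with
`strstr()`. The `toy_*` names are hypothetical, and the plain substring test is
a simplification, not how Git's grep machinery matches patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_commit {
	const char *author;	/* e.g. "A U Thor <author@gmail.com>" */
	const char *message;
};

/* Return 1 when the author header contains the pattern, else 0. */
static int toy_author_matches(const struct toy_commit *c, const char *pattern)
{
	return strstr(c->author, pattern) != NULL;
}

/* Count the commits a walk would show with the author filter applied. */
static size_t toy_count_matching(const struct toy_commit *commits, size_t nr,
				 const char *pattern)
{
	size_t shown = 0;

	for (size_t i = 0; i < nr; i++)
		if (toy_author_matches(&commits[i], pattern))
			shown++;
	return shown;
}
```

A commit whose message merely mentions "gmail" is not selected, because only
the author field is examined.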

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(int argc, const char **argv,
		const char *prefix, struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_COMMIT_DATE;

	...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(int argc, const char **argv,
		const char *prefix, struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

	...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

This display indicates that commits can be reordered after they're written, for
example with `git rebase`.
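
Each commit carries two timestamps, and the two sort orders differ only in
which one they compare. The sketch below uses `qsort()` on a hypothetical
`toy_commit` array to show how a rebased commit (old author date, new commit
date) changes position. It models date ordering only, not the topological
constraint that `topo_order` adds, and the dates are made-up integers.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_commit {
	const char *subject;
	long author_date;	/* when the change was first written */
	long commit_date;	/* when this commit object was created */
};

/* Newest first, by commit date. */
static int by_commit_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (y->commit_date > x->commit_date) -
	       (y->commit_date < x->commit_date);
}

/* Newest first, by author date. */
static int by_author_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (y->author_date > x->author_date) -
	       (y->author_date < x->author_date);
}

/* Sort in place with the chosen comparator and return the first subject. */
static const char *toy_newest(struct toy_commit *commits, size_t nr,
			      int (*cmp)(const void *, const void *))
{
	qsort(commits, nr, sizeof(*commits), cmp);
	return commits[0].subject;
}
```

With a commit written early but rebased late, `by_commit_date` puts it first
while `by_author_date` puts it last, which is the kind of difference the
`diff` above surfaces.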

Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
		struct rev_info *rev)
{
	...

	rev->reverse = 1;

	...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, you should see the last commit this call gives you as your current
HEAD.)

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list()`.

- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
  its `filter` member is not `NULL`, then `filter` contains information for
  how to filter the object list.
- `show_commit_fn show_commit`: A callback which will be used to handle each
  individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
  non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
  and `show_object`.

`traverse_commit_list_filtered()` takes one additional parameter:

- `struct oidset *omitted`: A set of object IDs which the provided
  filter caused to be omitted.

It looks like these methods use callbacks we provide instead of needing us
to call them repeatedly ourselves. Cool! Let's add the callbacks first.

For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls - `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	switch (obj->type) {
	case OBJ_TREE:
		tree_count++;
		break;
	case OBJ_BLOB:
		blob_count++;
		break;
	case OBJ_TAG:
		tag_count++;
		break;
	case OBJ_COMMIT:
		BUG("unexpected commit object in walken_show_object\n");
	default:
		BUG("unexpected object type %s in walken_show_object\n",
			type_name(obj->type));
	}
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()` - which is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.
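
The callback style can be modeled in a few lines of standalone C. In this
sketch the traversal owns the loop and hands each object to one of two
callbacks, passing along an opaque context pointer. Unlike the tutorial, which
counts in file-scope variables, the sketch keeps its counters in a struct
reached through that pointer; this is one possible design, not the tutorial's
method, and every `toy_*` and `TOY_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
	const char *name;
};

struct toy_counts {
	int commits, trees, blobs, tags;
};

typedef void (*toy_show_commit_fn)(const struct toy_object *, void *);
typedef void (*toy_show_object_fn)(const struct toy_object *,
				   const char *, void *);

static void toy_show_commit(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj;
	counts->commits++;
}

static void toy_show_object(const struct toy_object *obj, const char *name,
			    void *data)
{
	struct toy_counts *counts = data;

	(void)name;
	switch (obj->type) {
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	case TOY_TAG:
		counts->tags++;
		break;
	case TOY_COMMIT:
		/* Should be unreachable: commits go to the other callback. */
		assert(0 && "commit routed to the non-commit callback");
		break;
	}
}

/* The traversal owns the loop and routes each object to a callback. */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_show_commit_fn show_commit,
			 toy_show_object_fn show_object, void *show_data)
{
	for (size_t i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], show_data);
		else
			show_object(&objs[i], objs[i].name, show_data);
	}
}
```

Passing the counters through `show_data` keeps the callbacks free of global
state, at the cost of a cast in each callback.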

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, to keep separate
from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;

	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));

	commit_count = 0;
	tag_count = 0;
	blob_count = 0;
	tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`.
We'll also need to include the `list-objects.h` header.

----
#include "list-objects.h"

...

	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
		blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it - we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
	if (1) {
		add_head_to_pending(&rev);
		walken_object_walk(&rev);
	} else {
		final_rev_info_setup(argc, argv, prefix, &rev);
		walken_commit_walk(&rev);
	}
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters that we can apply to the object walk laid out in
`Documentation/rev-list-options.txt`. These filters are typically useful for
operations such as creating packfiles or performing a partial clone. They are
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; you can imagine a breadth-first tree traversal to
help understand. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)

For now, we are not going to track the omitted objects, so we can keep calling
`traverse_commit_list()`, which has no `omitted` parameter. For the sake of
simplicity, we'll add a simple build-time branch to use our filter or not.
Preface the line calling `traverse_commit_list()` with the following, which
will remind us which kind of walk we've just performed:

----
	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));
		CALLOC_ARRAY(rev->filter, 1);
		parse_list_objects_filter(rev->filter, "tree:1");
	}
	traverse_commit_list(rev, walken_show_commit,
		walken_show_object, NULL);
----

The `rev->filter` member is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why that's true: a commit created by
`git revert` points to the same tree object as its grandparent.)
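
A depth-limited filter can be pictured as a recursive walk that stops
descending once it reaches the cutoff. The sketch below is a toy model, not
Git's filter implementation: `toy_tree` and `toy_count_trees()` are
hypothetical, and the convention that the root tree sits at depth 0 and is
omitted once its depth reaches the limit is an assumption chosen to match the
behavior described above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_tree {
	const char *name;
	const struct toy_tree *subtrees[4];	/* NULL-terminated */
};

/*
 * Count the trees a walk would show when trees at depth >= limit are
 * omitted. The root tree has depth 0, so a limit of 1 keeps only the root.
 */
static int toy_count_trees(const struct toy_tree *t, int depth, int limit)
{
	int n;

	if (!t || depth >= limit)
		return 0;
	n = 1;
	for (size_t i = 0; i < 4 && t->subtrees[i]; i++)
		n += toy_count_trees(t->subtrees[i], depth + 1, limit);
	return n;
}
```

Called once per commit with a limit of 1, this returns at most one tree per
commit, which is why the tree count cannot exceed the commit count.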

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --filter=<spec> --filter-print-omitted`. Asking
`traverse_commit_list_filtered()` to populate the `omitted` list means that our
object walk does not perform any better than an unfiltered object walk; all
reachable objects are walked in order to populate the list.

First, add the `struct oidset` and related items we will use to iterate it:

----
#include "oidset.h"

...

static void walken_object_walk(
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Replace the call to `traverse_commit_list()` with a call to
`traverse_commit_list_filtered()` that includes your `omitted` object:

----
	...

	traverse_commit_list_filtered(rev,
		walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, the `oidset` traversal is pretty straightforward.
Count all the objects within and modify the print statement:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical. You can also time each invocation of
the `walken` subcommand, with and without `omitted` being passed in, to confirm
to yourself the runtime impact of tracking all omitted objects.
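
The accounting behind that observation can be sketched in standalone C: every
object the walk reaches is either shown or recorded as omitted, so the two
numbers add up to the unfiltered total. The `toy_*` names, the flat object
array, and the use of a negative limit to mean "no filter" are all assumptions
made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object {
	const char *name;
	int depth;	/* 0 for a root tree, 1 for its entries, ... */
};

struct toy_result {
	int shown;
	int omitted;
};

/*
 * Visit every object. Objects at depth >= limit are recorded as omitted
 * rather than skipped outright, which is why tracking them means the walk
 * still has to reach them. A negative limit disables the filter.
 */
static struct toy_result toy_walk(const struct toy_object *objs, size_t nr,
				  int limit)
{
	struct toy_result res = { 0, 0 };

	for (size_t i = 0; i < nr; i++) {
		if (limit >= 0 && objs[i].depth >= limit)
			res.omitted++;
		else
			res.shown++;
	}
	return res;
}
```

Comparing `shown + omitted` for the filtered walk against `shown` for the
unfiltered walk is the same check the paragraph above suggests doing by hand.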

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier - modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
#include "hex.h"

...

static void walken_show_commit(struct commit *cmt, void *buf)
{
	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
	commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

	...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, run again (but save yourself some scrollback):

----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
	...

	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;
	rev->reverse = 1;

	...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk