Documentation/MyFirstObjectWalk.txt

   1 = My First Object Walk
   2
   3 == What's an Object Walk?
   4
   5 The object walk is a key concept in Git - this is the process that underpins
   6 operations like object transfer and fsck. Beginning from a given commit, the
   7 list of objects is found by walking parent relationships between commits (commit
   8 X based on commit W) and containment relationships between objects (tree Y is
   9 contained within commit X, and blob Z is located within tree Y, giving our
  10 working tree for commit X something like `y/z.txt`).
  11
  12 A related concept is the revision walk, which is focused on commit objects and
  13 their parent relationships and does not delve into other object types. The
  14 revision walk is used for operations like `git log`.
  15
  16 === Related Reading
  17
  18 - `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  19   the revision walker in its various incarnations.
  20 - `revision.h`
  21 - https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  22   gives a good overview of the types of objects in Git and what your object
  23   walk is really describing.
  24
  25 == Setting Up
  26
  27 Create a new branch from `master`.
  28
  29 ----
  30 git checkout -b revwalk origin/master
  31 ----
  32
  33 We'll put our fiddling into a new command. For fun, let's name it `git walken`.
  34 Open up a new file `builtin/walken.c` and set up the command handler:
  35
  36 ----
  37 /*
  38  * "git walken"
  39  *
  40  * Part of the "My First Object Walk" tutorial.
  41  */
  42
  43 #include "builtin.h"
  44
  45 int cmd_walken(int argc, const char **argv, const char *prefix)
  46 {
  47         trace_printf(_("cmd_walken incoming...\n"));
  48         return 0;
  49 }
  50 ----
  51
  52 NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
  53 off at runtime. For the purposes of this tutorial, we will write `walken` as
  54 though it is intended for use as a "plumbing" command: that is, a command which
  55 is used primarily in scripts, rather than interactively by humans (a "porcelain"
  56 command). So we will send our debug output to `trace_printf()` instead. When
  57 running, enable trace output by setting the environment variable `GIT_TRACE`.
  58
  59 Add usage text and `-h` handling, like all subcommands should consistently do
  60 (our test suite will notice and complain if you fail to do so).
  61 We'll need to include the `parse-options.h` header.
  62
  63 ----
  64 #include "parse-options.h"
  65
  66 ...
  67
  68 int cmd_walken(int argc, const char **argv, const char *prefix)
  69 {
  70         const char * const walken_usage[] = {
  71                 N_("git walken"),
  72                 NULL,
  73         };
  74         struct option options[] = {
  75                 OPT_END()
  76         };
  77
  78         argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
  79
  80         ...
  81 }
  82 ----
  83
  84 Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
  85
  86 ----
  87 int cmd_walken(int argc, const char **argv, const char *prefix);
  88 ----
  89
  90 Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
  91 maintaining alphabetical ordering:
  92
  93 ----
  94 { "walken", cmd_walken, RUN_SETUP },
  95 ----
  96
  97 Add it to the `Makefile` near the line for `builtin/worktree.o`:
  98
  99 ----
 100 BUILTIN_OBJS += builtin/walken.o
 101 ----
 102
 103 Build and test out your command, without forgetting to ensure the `DEVELOPER`
 104 flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
 105
 106 ----
 107 $ echo DEVELOPER=1 >>config.mak
 108 $ make
 109 $ GIT_TRACE=1 ./bin-wrappers/git walken
 110 ----
 111
 112 NOTE: For a more exhaustive overview of the new command process, take a look at
 113 `Documentation/MyFirstContribution.txt`.
 114
 115 NOTE: A reference implementation can be found at
 116 https://github.com/nasamuffin/git/tree/revwalk.
 117
 118 === `struct rev_cmdline_info`
 119
 120 The definition of `struct rev_cmdline_info` can be found in `revision.h`.
 121
 122 This struct is contained within the `rev_info` struct and is used to reflect
 123 parameters provided by the user over the CLI.
 124
 125 `nr` represents the number of `rev_cmdline_entry` present in the array.
 126
 127 `alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is
 128 used to track the allocated size of the list.
 129
 130 Per entry, we find:
 131
 132 `item` is the object provided upon which to base the object walk. Items in Git
 133 can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
 134
 135 `name` is the object ID (OID) of the object - a hex string you may be familiar
 136 with from using Git to organize your source in the past. Check the tutorial
 137 mentioned above towards the top for a discussion of where the OID can come
 138 from.
 139
 140 `whence` indicates some information about what to do with the parents of the
 141 specified object. We'll explore this flag more later on; take a look at
 142 `Documentation/revisions.txt` to get an idea of what could set the `whence`
 143 value.
 144
 145 `flags` are used to hint the beginning of the revision walk and are the first
 146 block under the `#include`s in `revision.h`. The most likely ones to be set in
 147 the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
 148 can be used during the walk, as well.
 149
 150 === `struct rev_info`
 151
 152 This one is quite a bit longer, and many fields are only used during the walk
 153 by `revision.c` - not configuration options. Most of the configurable flags in
 154 `struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
 155 good idea to take some time and read through that document.
 156
 157 == Basic Commit Walk
 158
 159 First, let's see if we can replicate the output of `git log --oneline`. We'll
 160 refer back to the implementation frequently to discover norms when performing
 161 an object walk of our own.
 162
 163 To do so, we'll first find all the commits, in order, which preceded the current
 164 commit. We'll extract the name and subject of the commit from each.
 165
 166 Ideally, we will also be able to find out which ones are currently at the tip of
 167 various branches.
 168
 169 === Setting Up
 170
 171 Preparing for your object walk has some distinct stages.
 172
 173 1. Perform default setup for this mode, and others which may be invoked.
 174 2. Check configuration files for relevant settings.
 175 3. Set up the `rev_info` struct.
 176 4. Tweak the initialized `rev_info` to suit the current walk.
 177 5. Prepare the `rev_info` for the walk.
 178 6. Iterate over the objects, processing each one.
 179
 180 ==== Default Setups
 181
 182 Before examining configuration files which may modify command behavior, set up
 183 default state for switches or options your command may have. If your command
 184 utilizes other Git components, ask them to set up their default states as well.
 185 For instance, `git log` takes advantage of `grep` and `diff` functionality, so
 186 its `init_log_defaults()` sets its own state (`decoration_style`) and asks
 187 `grep` and `diff` to initialize themselves by calling each of their
 188 initialization functions.
 189
 190 ==== Configuring From `.gitconfig`
 191
 192 Next, we should have a look at any relevant configuration settings (i.e.,
 193 settings readable and settable from `git config`). This is done by providing a
 194 callback to `git_config()`; within that callback, you can also invoke methods
 195 from other components you may need that need to intercept these options. Your
 196 callback will be invoked once per each configuration value which Git knows about
 197 (global, local, worktree, etc.).
 198
 199 Similarly to the default values, we don't have anything to do here yet
 200 ourselves; however, we should call `git_default_config()` if we aren't calling
 201 any other existing config callbacks.
 202
 203 Add a new function to `builtin/walken.c`.
 204 We'll also need to include the `config.h` header:
 205
 206 ----
 207 #include "config.h"
 208
 209 ...
 210
 211 static int git_walken_config(const char *var, const char *value, void *cb)
 212 {
 213         /*
 214          * For now, we don't have any custom configuration, so fall back to
 215          * the default config.
 216          */
 217         return git_default_config(var, value, cb);
 218 }
 219 ----
 220
 221 Make sure to invoke `git_config()` with it in your `cmd_walken()`:
 222
 223 ----
 224 int cmd_walken(int argc, const char **argv, const char *prefix)
 225 {
 226         ...
 227
 228         git_config(git_walken_config, NULL);
 229
 230         ...
 231 }
 232 ----
 233
 234 ==== Setting Up `rev_info`
 235
 236 Now that we've gathered external configuration and options, it's time to
 237 initialize the `rev_info` object which we will use to perform the walk. This is
 238 typically done by calling `repo_init_revisions()` with the repository you intend
 239 to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
 240 struct.
 241
 242 Add the `struct rev_info` and the `repo_init_revisions()` call.
 243 We'll also need to include the `revision.h` header:
 244
 245 ----
 246 #include "revision.h"
 247
 248 ...
 249
 250 int cmd_walken(int argc, const char **argv, const char *prefix)
 251 {
 252         /* This can go wherever you like in your declarations.*/
 253         struct rev_info rev;
 254         ...
 255
 256         /* This should go after the git_config() call. */
 257         repo_init_revisions(the_repository, &rev, prefix);
 258
 259         ...
 260 }
 261 ----
 262
 263 ==== Tweaking `rev_info` For the Walk
 264
 265 We're getting close, but we're still not quite ready to go. Now that `rev` is
 266 initialized, we can modify it to fit our needs. This is usually done within a
 267 helper for clarity, so let's add one:
 268
 269 ----
 270 static void final_rev_info_setup(struct rev_info *rev)
 271 {
 272         /*
 273          * We want to mimic the appearance of `git log --oneline`, so let's
 274          * force oneline format.
 275          */
 276         get_commit_format("oneline", rev);
 277
 278         /* Start our object walk at HEAD. */
 279         add_head_to_pending(rev);
 280 }
 281 ----
 282
 283 [NOTE]
 284 ====
 285 Instead of using the shorthand `add_head_to_pending()`, you could do
 286 something like this:
 287 ----
 288         struct setup_revision_opt opt;
 289
 290         memset(&opt, 0, sizeof(opt));
 291         opt.def = "HEAD";
 292         opt.revarg_opt = REVARG_COMMITTISH;
 293         setup_revisions(argc, argv, rev, &opt);
 294 ----
 295 Using a `setup_revision_opt` gives you finer control over your walk's starting
 296 point.
 297 ====
 298
 299 Then let's invoke `final_rev_info_setup()` after the call to
 300 `repo_init_revisions()`:
 301
 302 ----
 303 int cmd_walken(int argc, const char **argv, const char *prefix)
 304 {
 305         ...
 306
 307         final_rev_info_setup(&rev);
 308
 309         ...
 310 }
 311 ----
 312
 313 Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
 314 now, this is all we need.
 315
 316 ==== Preparing `rev_info` For the Walk
 317
 318 Now that `rev` is all initialized and configured, we've got one more setup step
 319 before we get rolling. We can do this in a helper, which will both prepare the
 320 `rev_info` for the walk, and perform the walk itself. Let's start the helper
 321 with the call to `prepare_revision_walk()`, which can return an error without
 322 dying on its own:
 323
 324 ----
 325 static void walken_commit_walk(struct rev_info *rev)
 326 {
 327         if (prepare_revision_walk(rev))
 328                 die(_("revision walk setup failed"));
 329 }
 330 ----
 331
 332 NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
 333 `stderr` it's likely to be seen by a human, so we will localize it.
 334
 335 ==== Performing the Walk!
 336
 337 Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
 338 can also be used as an iterator; we move to the next item in the walk by using
 339 `get_revision()` repeatedly. Add the listed variable declarations at the top and
 340 the walk loop below the `prepare_revision_walk()` call within your
 341 `walken_commit_walk()`:
 342
 343 ----
 344 static void walken_commit_walk(struct rev_info *rev)
 345 {
 346         struct commit *commit;
 347         struct strbuf prettybuf = STRBUF_INIT;
 348
 349         ...
 350
 351         while ((commit = get_revision(rev))) {
 352                 strbuf_reset(&prettybuf);
 353                 pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
 354                 puts(prettybuf.buf);
 355         }
 356         strbuf_release(&prettybuf);
 357 }
 358 ----
 359
 360 NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
 361 command we expect to be machine-parsed, we're sending it directly to stdout.
 362
 363 Give it a shot.
 364
 365 ----
 366 $ make
 367 $ ./bin-wrappers/git walken
 368 ----
 369
 370 You should see all of the subject lines of all the commits in
 371 your tree's history, in order, ending with the initial commit, "Initial revision
 372 of "git", the information manager from hell". Congratulations! You've written
 373 your first revision walk. You can play with printing some additional fields
 374 from each commit if you're curious; have a look at the functions available in
 375 `commit.h`.
 376
 377 === Adding a Filter
 378
 379 Next, let's try to filter the commits we see based on their author. This is
 380 equivalent to running `git log --author=<pattern>`. We can add a filter by
 381 modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
 382
 383 First some setup. Add `grep_config()` to `git_walken_config()`:
 384
 385 ----
 386 static int git_walken_config(const char *var, const char *value, void *cb)
 387 {
 388         grep_config(var, value, cb);
 389         return git_default_config(var, value, cb);
 390 }
 391 ----
 392
 393 Next, we can modify the `grep_filter`. This is done with convenience functions
 394 found in `grep.h`. For fun, we're filtering to only commits from folks using a
 395 `gmail.com` email address - a not-very-precise guess at who may be working on
 396 Git as a hobby. Since we're checking the author, which is a specific line in the
 397 header, we'll use the `append_header_grep_pattern()` helper. We can use
 398 the `enum grep_header_field` to indicate which part of the commit header we want
 399 to search.
 400
 401 In `final_rev_info_setup()`, add your filter line:
 402
 403 ----
 404 static void final_rev_info_setup(int argc, const char **argv,
 405                 const char *prefix, struct rev_info *rev)
 406 {
 407         ...
 408
 409         append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
 410                 "gmail");
 411         compile_grep_patterns(&rev->grep_filter);
 412
 413         ...
 414 }
 415 ----
 416
 417 `append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
 418 it won't work unless we compile it with `compile_grep_patterns()`.
 419
 420 NOTE: If you are using `setup_revisions()` (for example, if you are passing a
 421 `setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
 422 to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
 423
 424 NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
 425 wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
 426 `enum grep_pat_token` for us.
 427
 428 === Changing the Order
 429
 430 There are a few ways that we can change the order of the commits during a
 431 revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
 432 typical orderings.
 433
 434 `topo_order` is the same as `git log --topo-order`: we avoid showing a parent
 435 before all of its children have been shown, and we avoid mixing commits which
 436 are in different lines of history. (`git help log`'s section on `--topo-order`
 437 has a very nice diagram to illustrate this.)
 438
 439 Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
 440 `REV_SORT_BY_AUTHOR_DATE`. Add the following:
 441
 442 ----
 443 static void final_rev_info_setup(int argc, const char **argv,
 444                 const char *prefix, struct rev_info *rev)
 445 {
 446         ...
 447
 448         rev->topo_order = 1;
 449         rev->sort_order = REV_SORT_BY_COMMIT_DATE;
 450
 451         ...
 452 }
 453 ----
 454
 455 Let's output this into a file so we can easily diff it with the walk sorted by
 456 author date.
 457
 458 ----
 459 $ make
 460 $ ./bin-wrappers/git walken > commit-date.txt
 461 ----
 462
 463 Then, let's sort by author date and run it again.
 464
 465 ----
 466 static void final_rev_info_setup(int argc, const char **argv,
 467                 const char *prefix, struct rev_info *rev)
 468 {
 469         ...
 470
 471         rev->topo_order = 1;
 472         rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
 473
 474         ...
 475 }
 476 ----
 477
 478 ----
 479 $ make
 480 $ ./bin-wrappers/git walken > author-date.txt
 481 ----
 482
 483 Finally, compare the two. This is a little less helpful without object names or
 484 dates, but hopefully we get the idea.
 485
 486 ----
 487 $ diff -u commit-date.txt author-date.txt
 488 ----
 489
 490 This display indicates that commits can be reordered after they're written, for
 491 example with `git rebase`.
 492
 493 Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
 494 Set that flag somewhere inside of `final_rev_info_setup()`:
 495
 496 ----
 497 static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
 498                 struct rev_info *rev)
 499 {
 500         ...
 501
 502         rev->reverse = 1;
 503
 504         ...
 505 }
 506 ----
 507
 508 Run your walk again and note the difference in order. (If you remove the grep
 509 pattern, you should see the last commit this call gives you as your current
 510 HEAD.)
 511
 512 == Basic Object Walk
 513
 514 So far we've been walking only commits. But Git has more types of objects than
 515 that! Let's see if we can walk _all_ objects, and find out some information
 516 about each one.
 517
 518 We can base our work on an example. `git pack-objects` prepares all kinds of
 519 objects for packing into a bitmap or packfile. The work we are interested in
 520 resides in `builtins/pack-objects.c:get_object_list()`; examination of that
 521 function shows that the all-object walk is being performed by
 522 `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
 523 functions reside in `list-objects.c`; examining the source shows that, despite
 524 the name, these functions traverse all kinds of objects. Let's have a look at
 525 the arguments to `traverse_commit_list_filtered()`, which are a superset of the
 526 arguments to the unfiltered version.
 527
 528 - `struct list_objects_filter_options *filter_options`: This is a struct which
 529   stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
 530 - `struct rev_info *revs`: This is the `rev_info` used for the walk.
 531 - `show_commit_fn show_commit`: A callback which will be used to handle each
 532   individual commit object.
 533 - `show_object_fn show_object`: A callback which will be used to handle each
 534   non-commit object (so each blob, tree, or tag).
 535 - `void *show_data`: A context buffer which is passed in turn to `show_commit`
 536   and `show_object`.
 537 - `struct oidset *omitted`: A linked-list of object IDs which the provided
 538   filter caused to be omitted.
 539
 540 It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
 541 instead of needing us to call it repeatedly ourselves. Cool! Let's add the
 542 callbacks first.
 543
 544 For the sake of this tutorial, we'll simply keep track of how many of each kind
 545 of object we find. At file scope in `builtin/walken.c` add the following
 546 tracking variables:
 547
 548 ----
 549 static int commit_count;
 550 static int tag_count;
 551 static int blob_count;
 552 static int tree_count;
 553 ----
 554
 555 Commits are handled by a different callback than other objects; let's do that
 556 one first:
 557
 558 ----
 559 static void walken_show_commit(struct commit *cmt, void *buf)
 560 {
 561         commit_count++;
 562 }
 563 ----
 564
 565 The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
 566 the `buf` argument is actually the context buffer that we can provide to the
 567 traversal calls - `show_data`, which we mentioned a moment ago.
 568
 569 Since we have the `struct commit` object, we can look at all the same parts that
 570 we looked at in our earlier commit-only walk. For the sake of this tutorial,
 571 though, we'll just increment the commit counter and move on.
 572
 573 The callback for non-commits is a little different, as we'll need to check
 574 which kind of object we're dealing with:
 575
 576 ----
 577 static void walken_show_object(struct object *obj, const char *str, void *buf)
 578 {
 579         switch (obj->type) {
 580         case OBJ_TREE:
 581                 tree_count++;
 582                 break;
 583         case OBJ_BLOB:
 584                 blob_count++;
 585                 break;
 586         case OBJ_TAG:
 587                 tag_count++;
 588                 break;
 589         case OBJ_COMMIT:
 590                 BUG("unexpected commit object in walken_show_object\n");
 591         default:
 592                 BUG("unexpected object type %s in walken_show_object\n",
 593                         type_name(obj->type));
 594         }
 595 }
 596 ----
 597
 598 Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
 599 context pointer that `walken_show_commit()` receives: the `show_data` argument
 600 to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
 601 `str` contains the name of the object, which ends up being something like
 602 `foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
 603
 604 To help assure us that we aren't double-counting commits, we'll include some
 605 complaining if a commit object is routed through our non-commit callback; we'll
 606 also complain if we see an invalid object type. Since those two cases should be
 607 unreachable, and would only change in the event of a semantic change to the Git
 608 codebase, we complain by using `BUG()` - which is a signal to a developer that
 609 the change they made caused unintended consequences, and the rest of the
 610 codebase needs to be updated to understand that change. `BUG()` is not intended
 611 to be seen by the public, so it is not localized.
 612
 613 Our main object walk implementation is substantially different from our commit
 614 walk implementation, so let's make a new function to perform the object walk. We
 615 can perform setup which is applicable to all objects here, too, to keep separate
 616 from setup which is applicable to commit-only walks.
 617
 618 We'll start by enabling all types of objects in the `struct rev_info`.  We'll
 619 also turn on `tree_blobs_in_commit_order`, which means that we will walk a
 620 commit's tree and everything it points to immediately after we find each commit,
 621 as opposed to waiting for the end and walking through all trees after the commit
 622 history has been discovered. With the appropriate settings configured, we are
 623 ready to call `prepare_revision_walk()`.
 624
 625 ----
 626 static void walken_object_walk(struct rev_info *rev)
 627 {
 628         rev->tree_objects = 1;
 629         rev->blob_objects = 1;
 630         rev->tag_objects = 1;
 631         rev->tree_blobs_in_commit_order = 1;
 632
 633         if (prepare_revision_walk(rev))
 634                 die(_("revision walk setup failed"));
 635
 636         commit_count = 0;
 637         tag_count = 0;
 638         blob_count = 0;
 639         tree_count = 0;
 640 ----
 641
 642 Let's start by calling just the unfiltered walk and reporting our counts.
 643 Complete your implementation of `walken_object_walk()`.
 644 We'll also need to include the `list-objects.h` header.
 645
 646 ----
 647 #include "list-objects.h"
 648
 649 ...
 650
 651         traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
 652
 653         printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
 654                 blob_count, tag_count, tree_count);
 655 }
 656 ----
 657
 658 NOTE: This output is intended to be machine-parsed. Therefore, we are not
 659 sending it to `trace_printf()`, and we are not localizing it - we need scripts
 660 to be able to count on the formatting to be exactly the way it is shown here.
 661 If we were intending this output to be read by humans, we would need to localize
 662 it with `_()`.
 663
 664 Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
 665 command line options is out of scope for this tutorial, so we'll just hardcode
 666 a branch we can change at compile time. Where you call `final_rev_info_setup()`
 667 and `walken_commit_walk()`, instead branch like so:
 668
 669 ----
 670         if (1) {
 671                 add_head_to_pending(&rev);
 672                 walken_object_walk(&rev);
 673         } else {
 674                 final_rev_info_setup(argc, argv, prefix, &rev);
 675                 walken_commit_walk(&rev);
 676         }
 677 ----
 678
 679 NOTE: For simplicity, we've avoided all the filters and sorts we applied in
 680 `final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
 681 want, you can certainly use the filters we added before by moving
 682 `final_rev_info_setup()` out of the conditional and removing the call to
 683 `add_head_to_pending()`.
 684
 685 Now we can try to run our command! It should take noticeably longer than the
 686 commit walk, but an examination of the output will give you an idea why. Your
 687 output should look similar to this example, but with different counts:
 688
 689 ----
 690 Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
 691 ----
 692
 693 This makes sense. We have more trees than commits because the Git project has
 694 lots of subdirectories which can change, plus at least one tree per commit. We
 695 have no tags because we started on a commit (`HEAD`) and while tags can point to
 696 commits, commits can't point to tags.
 697
 698 NOTE: You will have different counts when you run this yourself! The number of
 699 objects grows along with the Git project.
 700
 701 === Adding a Filter
 702
 703 There are a handful of filters that we can apply to the object walk laid out in
 704 `Documentation/rev-list-options.txt`. These filters are typically useful for
 705 operations such as creating packfiles or performing a partial clone. They are
 706 defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
 707 will use the "tree:1" filter, which causes the walk to omit all trees and blobs
 708 which are not directly referenced by commits reachable from the commit in
 709 `pending` when the walk begins. (`pending` is the list of objects which need to
 710 be traversed during a walk; you can imagine a breadth-first tree traversal to
 711 help understand. In our case, that means we omit trees and blobs not directly
 712 referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
 713 `HEAD` in the `pending` list.)
 714
 715 First, we'll need to `#include "list-objects-filter-options.h"` and set up the
 716 `struct list_objects_filter_options` at the top of the function.
 717
 718 ----
 719 static void walken_object_walk(struct rev_info *rev)
 720 {
 721         struct list_objects_filter_options filter_options = { 0 };
 722
 723         ...
 724 ----
 725
 726 For now, we are not going to track the omitted objects, so we'll replace those
 727 parameters with `NULL`. For the sake of simplicity, we'll add a simple
 728 build-time branch to use our filter or not. Replace the line calling
 729 `traverse_commit_list()` with the following, which will remind us which kind of
 730 walk we've just performed:
 731
 732 ----
 733         if (0) {
 734                 /* Unfiltered: */
 735                 trace_printf(_("Unfiltered object walk.\n"));
 736                 traverse_commit_list(rev, walken_show_commit,
 737                                 walken_show_object, NULL);
 738         } else {
 739                 trace_printf(
 740                         _("Filtered object walk with filterspec 'tree:1'.\n"));
 741                 parse_list_objects_filter(&filter_options, "tree:1");
 742
 743                 traverse_commit_list_filtered(&filter_options, rev,
 744                         walken_show_commit, walken_show_object, NULL, NULL);
 745         }
 746 ----
 747
 748 `struct list_objects_filter_options` is usually built directly from a command
 749 line argument, so the module provides an easy way to build one from a string.
 750 Even though we aren't taking user input right now, we can still build one with
 751 a hardcoded string using `parse_list_objects_filter()`.
 752
 753 With the filter spec "tree:1", we are expecting to see _only_ the root tree for
 754 each commit; therefore, the tree object count should be less than or equal to
 755 the number of commits. (For an example of why that's true: `git commit --revert`
 756 points to the same tree object as its grandparent.)
 757
 758 === Counting Omitted Objects
 759
 760 We also have the capability to enumerate all objects which were omitted by a
 761 filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
 762 `traverse_commit_list_filtered()` to populate the `omitted` list means that our
 763 object walk does not perform any better than an unfiltered object walk; all
 764 reachable objects are walked in order to populate the list.
 765
 766 First, add the `struct oidset` and related items we will use to iterate it:
 767
 768 ----
 769 static void walken_object_walk(
 770         ...
 771
 772         struct oidset omitted;
 773         struct oidset_iter oit;
 774         struct object_id *oid = NULL;
 775         int omitted_count = 0;
 776         oidset_init(&omitted, 0);
 777
 778         ...
 779 ----
 780
 781 Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
 782 object:
 783
 784 ----
 785         ...
 786
 787                 traverse_commit_list_filtered(&filter_options, rev,
 788                         walken_show_commit, walken_show_object, NULL, &omitted);
 789
 790         ...
 791 ----
 792
 793 Then, after your traversal, the `oidset` traversal is pretty straightforward.
 794 Count all the objects within and modify the print statement:
 795
 796 ----
 797         /* Count the omitted objects. */
 798         oidset_iter_init(&omitted, &oit);
 799
 800         while ((oid = oidset_iter_next(&oit)))
 801                 omitted_count++;
 802
 803         printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
 804                 commit_count, blob_count, tag_count, tree_count, omitted_count);
 805 ----
 806
 807 By running your walk with and without the filter, you should find that the total
 808 object count in each case is identical. You can also time each invocation of
 809 the `walken` subcommand, with and without `omitted` being passed in, to confirm
 810 to yourself the runtime impact of tracking all omitted objects.
 811
 812 === Changing the Order
 813
 814 Finally, let's demonstrate that you can also reorder walks of all objects, not
 815 just walks of commits. First, we'll make our handlers chattier - modify
 816 `walken_show_commit()` and `walken_show_object()` to print the object as they
 817 go:
 818
 819 ----
 820 static void walken_show_commit(struct commit *cmt, void *buf)
 821 {
 822         trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
 823         commit_count++;
 824 }
 825
 826 static void walken_show_object(struct object *obj, const char *str, void *buf)
 827 {
 828         trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
 829
 830         ...
 831 }
 832 ----
 833
 834 NOTE: Since we will be examining this output directly as humans, we'll use
 835 `trace_printf()` here. Additionally, since this change introduces a significant
 836 number of printed lines, using `trace_printf()` will allow us to easily silence
 837 those lines without having to recompile.
 838
 839 (Leave the counter increment logic in place.)
 840
 841 With only that change, run again (but save yourself some scrollback):
 842
 843 ----
 844 $ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10
 845 ----
 846
 847 Take a look at the top commit with `git show` and the object ID you printed; it
 848 should be the same as the output of `git show HEAD`.
 849
 850 Next, let's change a setting on our `struct rev_info` within
 851 `walken_object_walk()`. Find where you're changing the other settings on `rev`,
 852 such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
 853 `reverse` setting at the bottom:
 854
 855 ----
 856         ...
 857
 858         rev->tree_objects = 1;
 859         rev->blob_objects = 1;
 860         rev->tag_objects = 1;
 861         rev->tree_blobs_in_commit_order = 1;
 862         rev->reverse = 1;
 863
 864         ...
 865 ----
 866
 867 Now, run again, but this time, let's grab the last handful of objects instead
 868 of the first handful:
 869
 870 ----
 871 $ make
 872 $ GIT_TRACE=1 ./bin-wrappers git walken | tail -n 10
 873 ----
 874
 875 The last commit object given should have the same OID as the one we saw at the
 876 top before, and running `git show <oid>` with that OID should give you again
 877 the same results as `git show HEAD`. Furthermore, if you run and examine the
 878 first ten lines again (with `head` instead of `tail` like we did before applying
 879 the `reverse` setting), you should see that now the first commit printed is the
 880 initial commit, `e83c5163`.
 881
 882 == Wrapping Up
 883
 884 Let's review. In this tutorial, we:
 885
 886 - Built a commit walk from the ground up
 887 - Enabled a grep filter for that commit walk
 888 - Changed the sort order of that filtered commit walk
 889 - Built an object walk (tags, commits, trees, and blobs) from the ground up
 890 - Learned how to add a filter-spec to an object walk
 891 - Changed the display order of the filtered object walk