Documentation/MyFirstObjectWalk.txt

= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git - this is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).
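
To make these relationships concrete, here is a minimal standalone C sketch. It is a toy model, not Git code: `struct toy_object`, `struct toy_counts` and `toy_walk()` are hypothetical names, and real Git objects are looked up by object ID rather than linked directly by pointers. It shows the two kinds of edges an object walk follows (commit to parent, and commit to tree to blob) and why the walk must remember what it has already seen: a blob shared by two commits should only be counted once.

----
#include <stddef.h>

/* Hypothetical toy model, not Git's API. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

#define TOY_MAX_EDGES 4

/*
 * A commit's edges are its root tree and its parents; a tree's edges
 * are the trees and blobs it contains; a blob has no edges.
 */
struct toy_object {
	enum toy_type type;
	int seen;		/* set once the walk has visited this object */
	size_t nr_edges;
	struct toy_object *edges[TOY_MAX_EDGES];
};

/* Per-type counters filled in by the walk. */
struct toy_counts {
	int commits, trees, blobs;
};

/*
 * Depth-first walk from `obj`, following both parent and containment
 * edges and counting each reachable object exactly once.
 */
static void toy_walk(struct toy_object *obj, struct toy_counts *counts)
{
	size_t i;

	if (!obj || obj->seen)
		return;
	obj->seen = 1;

	switch (obj->type) {
	case TOY_COMMIT:
		counts->commits++;
		break;
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	}

	for (i = 0; i < obj->nr_edges; i++)
		toy_walk(obj->edges[i], counts);
}
----

For example, if commit X has parent W, and X's root tree Y and W's root tree both contain the same blob Z, a walk started at X counts two commits, two trees and one blob.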

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.

=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
  the revision walker in its various incarnations.
- `revision.h`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
  gives a good overview of the types of objects in Git and what your object
  walk is really describing.

== Setting Up

Create a new branch from `master`.

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	trace_printf(_("cmd_walken incoming...\n"));
	return 0;
}
----

NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
off at runtime. For the purposes of this tutorial, we will write `walken` as
though it is intended for use as a "plumbing" command: that is, a command which
is used primarily in scripts, rather than interactively by humans (a "porcelain"
command). So we will send our debug output to `trace_printf()` instead. When
running, enable trace output by setting the environment variable `GIT_TRACE`.

Add usage text and `-h` handling, as all subcommands should consistently do
(our test suite will notice and complain if you fail to do so).

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test out your command. Don't forget to make sure the `DEVELOPER`
flag is set, and enable `GIT_TRACE` so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user over the CLI.

`nr` represents the number of `rev_cmdline_entry` present in the array.

`alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is
used to track the allocated size of the list.
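
The `nr`/`alloc` pair is a common pattern for growable arrays. The sketch below shows the idea in standalone C. `struct toy_cmdline`, `toy_grow()` and `toy_add()` are hypothetical, and the growth rule (`(alloc + 16) * 3 / 2`, but never less than requested) is an assumption modeled on what `ALLOC_GROW` does; check `cache.h` for the real macro.

----
#include <stdlib.h>

/* Hypothetical toy model, not Git's API. */
struct toy_entry {
	const char *name;
	unsigned flags;
};

/* A growable array tracked with an nr/alloc pair. */
struct toy_cmdline {
	size_t nr;			/* entries in use */
	size_t alloc;			/* entries allocated */
	struct toy_entry *entries;
};

/* Make room for at least `want` entries; returns 0 on success. */
static int toy_grow(struct toy_cmdline *info, size_t want)
{
	size_t new_alloc;
	struct toy_entry *p;

	if (want <= info->alloc)
		return 0;
	/* Grow by roughly 1.5x, but never less than what was asked for. */
	new_alloc = (info->alloc + 16) * 3 / 2;
	if (new_alloc < want)
		new_alloc = want;
	p = realloc(info->entries, new_alloc * sizeof(*p));
	if (!p)
		return -1;
	info->entries = p;
	info->alloc = new_alloc;
	return 0;
}

/* Append one entry, growing the array first if needed. */
static int toy_add(struct toy_cmdline *info, const char *name, unsigned flags)
{
	if (toy_grow(info, info->nr + 1))
		return -1;
	info->entries[info->nr].name = name;
	info->entries[info->nr].flags = flags;
	info->nr++;
	return 0;
}
----

Appending 100 entries one at a time grows `alloc` from 0 to 24, then 60, then 114, so only three reallocations happen while `nr` counts every entry.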

Per entry, we find:

`item` is the object provided upon which to base the object walk. Items in Git
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)

`name` is the name by which the object was specified. This is often the object
ID (OID) of the object - a hex string you may be familiar with from using Git to
organize your source in the past - but it can also be a ref such as `HEAD`.
Check the tutorial mentioned above towards the top for a discussion of where
the OID can come from.

`whence` indicates some information about what to do with the parents of the
specified object. Take a look at `Documentation/revisions.txt` to get an idea
of what could set the `whence` value.

`flags` are used to hint the beginning of the revision walk and are the first
block under the `#include`s in `revision.h`. The most likely ones to be set in
the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
can be used during the walk, as well.
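
To see how a flag like `UNINTERESTING` shapes a walk, consider a range such as `git log B..D`: everything reachable from `B` is uninteresting, and only the remaining commits reachable from `D` are shown. The following standalone C sketch models that with bit flags. The flag names mimic the ones in `revision.h`, but their values here, `struct toy_commit`, and both functions are hypothetical. The eager marking is also a simplification of what Git does during a real walk.

----
#include <stddef.h>

/* Hypothetical toy model, not Git's API; flag values are illustrative. */
#define TOY_SEEN          (1u << 0)
#define TOY_UNINTERESTING (1u << 1)
#define TOY_MAX_PARENTS   2

struct toy_commit {
	const char *subject;
	unsigned flags;
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* Mark `c` and all of its ancestors as uninteresting. */
static void mark_uninteresting(struct toy_commit *c)
{
	size_t i;

	if (!c || (c->flags & TOY_UNINTERESTING))
		return;
	c->flags |= TOY_UNINTERESTING;
	for (i = 0; i < c->nr_parents; i++)
		mark_uninteresting(c->parents[i]);
}

/*
 * Count commits reachable from `c` that are not uninteresting,
 * i.e. what a "bottom..tip" range would show.
 */
static int count_interesting(struct toy_commit *c)
{
	size_t i;
	int n;

	if (!c || (c->flags & (TOY_SEEN | TOY_UNINTERESTING)))
		return 0;
	c->flags |= TOY_SEEN;
	n = 1;
	for (i = 0; i < c->nr_parents; i++)
		n += count_interesting(c->parents[i]);
	return n;
}
----

On a linear history `A <- B <- C <- D`, marking `B` uninteresting leaves only `C` and `D` to be shown.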

=== `struct rev_info`

This one is quite a bit longer, and many fields are only used during the walk
by `revision.c` - not configuration options. Most of the configurable flags in
`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
good idea to take some time and read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

For our first example within `git walken`, we don't intend to use any other
components within Git, and we don't have any configuration to do. However, we
may want to add some later, so for now, we can add an empty placeholder. Create
a new function in `builtin/walken.c`:

----
static void init_walken_defaults(void)
{
	/*
	 * We don't actually need the same components `git log` does; leave this
	 * empty for now.
	 */
}
----

Make sure to add a line invoking it inside of `cmd_walken()`.

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	init_walken_defaults();

	...
}
----

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the config
callbacks of any other components you use, so that they can intercept the
options they care about. Your callback will be invoked once for each
configuration value which Git knows about (global, local, worktree, etc.).

Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.
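
The callback design can be illustrated without Git at all. In this standalone C sketch, `toy_read_config()` plays the role of `git_config()` by calling a callback once per key/value pair, `toy_walken_config()` handles the command's own keys, and everything else is passed on to `toy_default_config()`, in the same way our callback will fall back to `git_default_config()`. All of these names, and the `walken.verbose` key, are hypothetical.

----
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not Git's config API. */
typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_settings {
	int color;		/* stands in for a setting every command shares */
	int walken_verbose;	/* stands in for a command-specific setting */
};

/* Fallback handler, playing the part git_default_config() plays. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	if (!strcmp(var, "color.ui"))
		s->color = value && !strcmp(value, "always");
	return 0;	/* unknown keys are ignored, not treated as errors */
}

/* Command handler: claim our own keys, pass everything else on. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	if (!strcmp(var, "walken.verbose")) {
		s->walken_verbose = value && !strcmp(value, "true");
		return 0;
	}
	return toy_default_config(var, value, cb);
}

/* Invoke `fn` once per key/value pair, stopping on a negative return. */
static int toy_read_config(const char *const (*pairs)[2], size_t nr,
			   toy_config_fn fn, void *cb)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(pairs[i][0], pairs[i][1], cb);

		if (ret < 0)
			return ret;
	}
	return 0;
}
----

Feeding it `color.ui=always`, `walken.verbose=true` and `user.name=A U Thor` sets both fields and quietly ignores `user.name`, since neither handler claims it. Returning 0 for unknown keys matters here: a negative return would abort the whole read.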

Add a new function to `builtin/walken.c`:

----
static int git_walken_config(const char *var, const char *value, void *cb)
{
	/*
	 * For now, we don't have any custom configuration, so fall back to
	 * the default config.
	 */
	return git_default_config(var, value, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	git_config(git_walken_config, NULL);

	...
}
----

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call:
----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	/* This can go wherever you like in your declarations. */
	struct rev_info rev;
	...

	/* This should go after the git_config() call. */
	repo_init_revisions(the_repository, &rev, prefix);

	...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	/*
	 * We want to mimic the appearance of `git log --oneline`, so let's
	 * force oneline format.
	 */
	get_commit_format("oneline", rev);

	/* Start our object walk at HEAD. */
	add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
	struct setup_revision_opt opt;

	memset(&opt, 0, sizeof(opt));
	opt.def = "HEAD";
	opt.revarg_opt = REVARG_COMMITTISH;
	setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point, although you would then need to pass `argc` and `argv` through to
`final_rev_info_setup()`.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	final_rev_info_setup(&rev);

	...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk, and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
`stderr` it's likely to be seen by a human, so we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by using
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
static void walken_commit_walk(struct rev_info *rev)
{
	struct commit *commit;
	struct strbuf prettybuf = STRBUF_INIT;

	...

	/* get_revision() returns NULL once the walk is exhausted. */
	while ((commit = get_revision(rev))) {
		strbuf_reset(&prettybuf);
		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
		puts(prettybuf.buf);
	}
	strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
command we expect to be machine-parsed, we're sending it directly to stdout.
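
The iterator shape of `get_revision()` can be imitated in standalone C: keep a queue of pending commits, hand back the newest one, and queue its parents. This sketch is a simplification under stated assumptions: `struct toy_rev`, `toy_add_pending()` and `toy_get_revision()` are hypothetical, ordering is by a single commit date, and the linear scan for the newest commit stands in for the more efficient bookkeeping a real implementation would use.

----
#include <stddef.h>

/* Hypothetical toy model, not Git's API. */
#define TOY_MAX_PARENTS 2
#define TOY_QUEUE_MAX 64

struct toy_commit {
	const char *subject;
	long date;		/* committer timestamp */
	int added;		/* already queued once? */
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* A stand-in for rev_info that only holds the pending queue. */
struct toy_rev {
	size_t nr;
	struct toy_commit *queue[TOY_QUEUE_MAX];
};

/* Queue a commit once; a full queue silently drops it in this toy. */
static void toy_add_pending(struct toy_rev *rev, struct toy_commit *c)
{
	if (c->added || rev->nr == TOY_QUEUE_MAX)
		return;
	c->added = 1;
	rev->queue[rev->nr++] = c;
}

/*
 * Pop the newest pending commit and queue its parents; return NULL
 * when nothing is left, so it can drive a get_revision()-style loop.
 */
static struct toy_commit *toy_get_revision(struct toy_rev *rev)
{
	size_t i, best = 0;
	struct toy_commit *c;

	if (!rev->nr)
		return NULL;
	for (i = 1; i < rev->nr; i++)
		if (rev->queue[i]->date > rev->queue[best]->date)
			best = i;
	c = rev->queue[best];
	rev->queue[best] = rev->queue[--rev->nr];	/* fill the hole */
	for (i = 0; i < c->nr_parents; i++)
		toy_add_pending(rev, c->parents[i]);
	return c;
}
----

For a merge `D` (date 4) of `B` (date 2) and `C` (date 3), both children of the root `A` (date 1), a `while ((commit = toy_get_revision(&rev)))` loop returns `D`, `C`, `B`, `A`: each parent is queued only once, and the newest pending commit always comes out first.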

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see the subject lines of all the commits in your tree's history, in
order, ending with the initial commit, "Initial revision of "git", the
information manager from hell". Congratulations! You've written your first
revision walk. You can play with printing some additional fields from each
commit if you're curious; have a look at the functions available in `commit.h`.

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
`grep_config()` to `git_walken_config()`:

----
static void init_walken_defaults(void)
{
	init_grep_defaults(the_repository);
}

...

static int git_walken_config(const char *var, const char *value, void *cb)
{
	grep_config(var, value, cb);
	return git_default_config(var, value, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address - a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
		"gmail");
	compile_grep_patterns(&rev->grep_filter);

	...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.
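
Git's grep machinery supports much more than this (regular expressions, several patterns, combining them with `--all-match`), but the core idea of a header-field filter fits in a few lines of standalone C. `toy_author_matches()` is hypothetical: it does a plain substring search restricted to the `author` line of a raw commit buffer, which approximates the effect of restricting a pattern to `GREP_HEADER_AUTHOR`.

----
#include <string.h>

/* Hypothetical helper, not Git's grep API. */

/*
 * Return 1 if the "author" header line of a raw commit buffer
 * contains `needle`, else 0. Lines after the blank line that ends
 * the header (i.e. the commit message) are never searched.
 */
static int toy_author_matches(const char *buf, const char *needle)
{
	const char *line = buf;
	size_t nlen = strlen(needle);

	while (*line && *line != '\n') {	/* a blank line ends the header */
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len > 7 && !strncmp(line, "author ", 7)) {
			size_t off;

			/* substring search confined to this one line */
			for (off = 7; off + nlen <= len; off++)
				if (!strncmp(line + off, needle, nlen))
					return 1;
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
----

A commit whose message mentions "gmail" but whose author uses a different address does not match, because only the `author` line of the header is searched.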

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)
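
The guarantee that no parent is shown before all of its children can be sketched with Kahn's algorithm. In this standalone C sketch, `toy_topo_order()` and `struct toy_commit` are hypothetical. It uses a stack, so after emitting a commit it keeps following that commit's own line of history, which loosely imitates how `--topo-order` avoids mixing lines. Git's own implementation differs in detail, and the sketch assumes `tip` is the only commit without children among those passed in.

----
#include <stddef.h>

/* Hypothetical toy model, not Git's API. */
#define TOY_MAX_PARENTS 2
#define TOY_MAX_COMMITS 64

struct toy_commit {
	const char *subject;
	size_t nr_parents;
	int parents[TOY_MAX_PARENTS];	/* indices into the commit array */
	int pending_children;		/* children not yet emitted */
};

/*
 * Write the indices of commits[0..nr) to `out` so that no commit is
 * emitted before all of its children. Assumes `tip` is the only commit
 * in the array without children. Returns the number of commits emitted.
 */
static size_t toy_topo_order(struct toy_commit *commits, size_t nr,
			     int tip, int *out)
{
	int stack[TOY_MAX_COMMITS];
	size_t i, j, top = 0, n = 0;

	if (nr > TOY_MAX_COMMITS)
		return 0;

	/* Count how many children point at each commit. */
	for (i = 0; i < nr; i++)
		commits[i].pending_children = 0;
	for (i = 0; i < nr; i++)
		for (j = 0; j < commits[i].nr_parents; j++)
			commits[commits[i].parents[j]].pending_children++;

	/*
	 * A commit becomes ready once all of its children are out. Using a
	 * stack (LIFO) means we keep following the line we just came from.
	 */
	stack[top++] = tip;
	while (top) {
		int cur = stack[--top];

		out[n++] = cur;
		for (j = 0; j < commits[cur].nr_parents; j++) {
			int p = commits[cur].parents[j];

			if (--commits[p].pending_children == 0)
				stack[top++] = p;
		}
	}
	return n;
}
----

With root `A`, one branch `A <- B <- C`, another branch `A <- D <- E`, and a merge `M` of `C` and `E`, the output is `M E D C B A`: the `D`/`E` line is finished before the `B`/`C` line starts, and `A` comes last because it has to wait for both `B` and `D`.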

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_COMMIT_DATE;

	...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

	...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

Any differences show that a commit's author date and commit date can disagree:
commits can be rewritten after they were first authored, for example with
`git rebase`, which updates the commit date but keeps the author date.
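
The effect of choosing one timestamp over the other can be shown with a plain sort. This standalone C sketch orders two unrelated commits, newest first, by each timestamp using `qsort()`. `struct toy_commit` and both comparators are hypothetical, and a real walk also respects parent/child order, which this plain sort ignores.

----
#include <stdlib.h>

/* Hypothetical toy model, not Git's API. */
struct toy_commit {
	const char *subject;
	long author_date;	/* when the change was first written */
	long commit_date;	/* when it was last committed, e.g. by a rebase */
};

/* qsort() comparator: newest committer date first. */
static int by_commit_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (y->commit_date > x->commit_date) -
	       (y->commit_date < x->commit_date);
}

/* qsort() comparator: newest author date first. */
static int by_author_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (y->author_date > x->author_date) -
	       (y->author_date < x->author_date);
}
----

A commit authored at time 100 but rebased at time 300 sorts before a commit made at time 200 when ordering by commit date, and after it when ordering by author date.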

Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	...

	rev->reverse = 1;

	...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, the last commit printed should be your current HEAD.)

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list_filtered()`, which are a superset of the
arguments to the unfiltered version.

- `struct list_objects_filter_options *filter_options`: This is a struct which
  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
- `struct rev_info *revs`: This is the `rev_info` used for the walk.
- `show_commit_fn show_commit`: A callback which will be used to handle each
  individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
  non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
  and `show_object`.
- `struct oidset *omitted`: A set of object IDs which the provided filter
  caused to be omitted.

It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
instead of needing us to call it repeatedly ourselves. Cool! Let's add the
callbacks first.
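
The callback-plus-context shape can be shown in isolation. In this standalone C sketch, `toy_traverse()` stands in for `traverse_commit_list()`: it simply hands each object in a flat, already-ordered list to one of two callbacks, passing the `data` pointer through untouched, much as `show_data` is passed to `show_commit` and `show_object`. All names are hypothetical. Unlike the file-scope counters used in this tutorial, the counts here live in a struct passed as the context pointer, which is one way to avoid global state.

----
#include <stddef.h>

/* Hypothetical toy model, not Git's API. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
	const char *name;	/* path or tag name; NULL for commits */
};

/* Callback shapes loosely modeled on show_commit_fn / show_object_fn. */
typedef void (*toy_show_commit_fn)(struct toy_object *obj, void *data);
typedef void (*toy_show_object_fn)(struct toy_object *obj, const char *name,
				   void *data);

/* Route each object to the matching callback, passing `data` along. */
static void toy_traverse(struct toy_object *objs, size_t nr,
			 toy_show_commit_fn show_commit,
			 toy_show_object_fn show_object, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], data);
		else
			show_object(&objs[i], objs[i].name, data);
	}
}

/* Counters live in the context struct instead of in globals. */
struct toy_counts {
	int commits, trees, blobs, tags;
};

static void count_commit(struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj;
	counts->commits++;
}

static void count_object(struct toy_object *obj, const char *name, void *data)
{
	struct toy_counts *counts = data;

	(void)name;
	switch (obj->type) {
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	case TOY_TAG:
		counts->tags++;
		break;
	case TOY_COMMIT:
		break;	/* toy_traverse() never routes commits here */
	}
}
----

Running it over one commit, two trees, two blobs and one tag leaves the counts at 1, 2, 2 and 1 respectively.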

For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls - `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	switch (obj->type) {
	case OBJ_TREE:
		tree_count++;
		break;
	case OBJ_BLOB:
		blob_count++;
		break;
	case OBJ_TAG:
		tag_count++;
		break;
	case OBJ_COMMIT:
		BUG("unexpected commit object in walken_show_object");
	default:
		BUG("unexpected object type %s in walken_show_object",
			type_name(obj->type));
	}
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()` - which is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, to keep it
separate from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;

	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));

	commit_count = 0;
	tag_count = 0;
	blob_count = 0;
	tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`:

----
	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
		blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it - we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
	if (1) {
		add_head_to_pending(&rev);
		walken_object_walk(&rev);
	} else {
		final_rev_info_setup(&rev);
		walken_commit_walk(&rev);
	}
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters, laid out in `Documentation/rev-list-options.txt`,
that we can apply to the object walk. These filters are typically useful for
operations such as creating packfiles or performing a partial clone. They are
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; you can imagine a breadth-first tree traversal to
help understand. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)

First, we'll need to `#include "list-objects-filter-options.h"` and set up the
`struct list_objects_filter_options` at the top of the function.

----
static void walken_object_walk(struct rev_info *rev)
{
	struct list_objects_filter_options filter_options = { 0 };

	...
----

For now, we are not going to track the omitted objects, so we'll replace those
parameters with `NULL`. For the sake of simplicity, we'll add a simple
build-time branch to use our filter or not. Replace the line calling
`traverse_commit_list()` with the following, which will remind us which kind of
walk we've just performed:

----
	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
		traverse_commit_list(rev, walken_show_commit,
				walken_show_object, NULL);
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));
		parse_list_objects_filter(&filter_options, "tree:1");

		traverse_commit_list_filtered(&filter_options, rev,
			walken_show_commit, walken_show_object, NULL, NULL);
	}
----

`struct list_objects_filter_options` is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (It can be less because an identical tree is only
visited once: for example, using `git revert` to revert the most recent commit
produces a commit which points to the same tree object as its grandparent.)

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
Asking `traverse_commit_list_filtered()` to populate the `omitted` list means
that our object walk does not perform any better than an unfiltered object walk;
all reachable objects are walked in order to populate the list.
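
Both the "tree:1" filter and the cost of tracking omissions can be modeled on a single tree in standalone C. In this sketch, `toy_filter_walk()`, `struct toy_node` and `struct toy_filter_result` are hypothetical. The depth rule (objects at depth `max_depth` or deeper are omitted, with the root tree at depth 0) is an assumption consistent with the description of "tree:1" above, and shared subtrees are ignored so no seen-flag is needed. When omissions are not tracked, the walk stops descending as soon as everything below would be omitted; when they are tracked, it has to visit every object.

----
#include <stddef.h>

/* Hypothetical toy model, not Git's API. */
enum toy_type { TOY_TREE, TOY_BLOB };

#define TOY_MAX_ENTRIES 4

struct toy_node {
	enum toy_type type;
	size_t nr;				/* number of entries (trees only) */
	struct toy_node *entries[TOY_MAX_ENTRIES];
};

struct toy_filter_result {
	int shown;
	int omitted;
};

/*
 * Walk `node` at `depth`. Objects shallower than `max_depth` are shown;
 * deeper ones are omitted. Without `track_omitted` we prune as soon as
 * an object is omitted, since everything below it is deeper still.
 * With `track_omitted` we keep walking so that every omitted object is
 * counted, which is why tracking costs as much as an unfiltered walk.
 */
static void toy_filter_walk(struct toy_node *node, int depth, int max_depth,
			    int track_omitted, struct toy_filter_result *res)
{
	size_t i;

	if (depth >= max_depth) {
		if (!track_omitted)
			return;
		res->omitted++;
	} else {
		res->shown++;
	}
	if (node->type == TOY_TREE)
		for (i = 0; i < node->nr; i++)
			toy_filter_walk(node->entries[i], depth + 1, max_depth,
					track_omitted, res);
}
----

For a root tree holding blob `a` and subtree `dir/` with blob `b`, a depth of 1 shows only the root tree. Tracking omissions then also counts the three hidden objects, so shown plus omitted is still 4, the size of the unfiltered walk.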

First, add the `struct oidset` and related items we will use to iterate it:

----
static void walken_object_walk(struct rev_info *rev)
{
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
object:

----
	...

		traverse_commit_list_filtered(&filter_options, rev,
			walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, iterating over the `oidset` is pretty
straightforward. Count all the objects within it, release the set, and modify
the print statement:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	/* The set is no longer needed once it has been counted. */
	oidset_clear(&omitted);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical: the objects counted by the filtered walk
plus the omitted objects add up to the objects counted by the unfiltered walk.
You can also time each invocation of the `walken` subcommand, with and without
`omitted` being passed in, to confirm to yourself the runtime impact of tracking
all omitted objects.

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier - modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
	commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

	...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, run again (but save yourself some scrollback). With
`GIT_TRACE=1`, trace output goes to `stderr`, so redirect it into the pipe:

----
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
	...

	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;
	rev->reverse = 1;

	...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk