]>
Commit | Line | Data |
---|---|---|
e0479fa0 ES |
1 | = My First Object Walk |
2 | ||
3 | == What's an Object Walk? | |
4 | ||
5 | The object walk is a key concept in Git - this is the process that underpins | |
6 | operations like object transfer and fsck. Beginning from a given commit, the | |
7 | list of objects is found by walking parent relationships between commits (commit | |
8 | X based on commit W) and containment relationships between objects (tree Y is | |
9 | contained within commit X, and blob Z is located within tree Y, giving our | |
10 | working tree for commit X something like `y/z.txt`). | |
11 | ||
12 | A related concept is the revision walk, which is focused on commit objects and | |
13 | their parent relationships and does not delve into other object types. The | |
14 | revision walk is used for operations like `git log`. | |
15 | ||
16 | === Related Reading | |
17 | ||
18 | - `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of | |
19 | the revision walker in its various incarnations. | |
301d595e | 20 | - `revision.h` |
e0479fa0 ES |
21 | - https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] |
22 | gives a good overview of the types of objects in Git and what your object | |
23 | walk is really describing. | |
24 | ||
25 | == Setting Up | |
26 | ||
27 | Create a new branch from `master`. | |
28 | ||
29 | ---- | |
30 | git checkout -b revwalk origin/master | |
31 | ---- | |
32 | ||
33 | We'll put our fiddling into a new command. For fun, let's name it `git walken`. | |
34 | Open up a new file `builtin/walken.c` and set up the command handler: | |
35 | ||
36 | ---- | |
37 | /* | |
38 | * "git walken" | |
39 | * | |
40 | * Part of the "My First Object Walk" tutorial. | |
41 | */ | |
42 | ||
43 | #include "builtin.h" | |
bbd7c7b7 | 44 | #include "trace.h" |
e0479fa0 ES |
45 | |
46 | int cmd_walken(int argc, const char **argv, const char *prefix) | |
47 | { | |
48 | trace_printf(_("cmd_walken incoming...\n")); | |
49 | return 0; | |
50 | } | |
51 | ---- | |
52 | ||
bbd7c7b7 VD |
53 | NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in |
54 | that it can be turned on or off at runtime. For the purposes of this | |
55 | tutorial, we will write `walken` as though it is intended for use as | |
56 | a "plumbing" command: that is, a command which is used primarily in | |
57 | scripts, rather than interactively by humans (a "porcelain" command). | |
58 | So we will send our debug output to `trace_printf()` instead. | |
59 | When running, enable trace output by setting the environment variable `GIT_TRACE`. | |
e0479fa0 ES |
60 | |
61 | Add usage text and `-h` handling, like all subcommands should consistently do | |
62 | (our test suite will notice and complain if you fail to do so). | |
7d1b8667 | 63 | We'll need to include the `parse-options.h` header. |
e0479fa0 ES |
64 | |
65 | ---- | |
7d1b8667 JC |
66 | #include "parse-options.h" |
67 | ||
68 | ... | |
69 | ||
e0479fa0 ES |
70 | int cmd_walken(int argc, const char **argv, const char *prefix) |
71 | { | |
72 | const char * const walken_usage[] = { | |
73 | N_("git walken"), | |
74 | NULL, | |
f0ac30ec | 75 | }; |
e0479fa0 ES |
76 | struct option options[] = { |
77 | OPT_END() | |
78 | }; | |
79 | ||
80 | argc = parse_options(argc, argv, prefix, options, walken_usage, 0); | |
81 | ||
82 | ... | |
83 | } | |
84 | ---- | |
85 | ||
86 | Also add the relevant line in `builtin.h` near `cmd_whatchanged()`: | |
87 | ||
88 | ---- | |
89 | int cmd_walken(int argc, const char **argv, const char *prefix); | |
90 | ---- | |
91 | ||
92 | Include the command in `git.c` in `commands[]` near the entry for `whatchanged`, | |
93 | maintaining alphabetical ordering: | |
94 | ||
95 | ---- | |
96 | { "walken", cmd_walken, RUN_SETUP }, | |
97 | ---- | |
98 | ||
99 | Add it to the `Makefile` near the line for `builtin/worktree.o`: | |
100 | ||
101 | ---- | |
102 | BUILTIN_OBJS += builtin/walken.o | |
103 | ---- | |
104 | ||
105 | Build and test out your command, without forgetting to ensure the `DEVELOPER` | |
106 | flag is set, and with `GIT_TRACE` enabled so the debug output can be seen: | |
107 | ||
108 | ---- | |
109 | $ echo DEVELOPER=1 >>config.mak | |
110 | $ make | |
111 | $ GIT_TRACE=1 ./bin-wrappers/git walken | |
112 | ---- | |
113 | ||
114 | NOTE: For a more exhaustive overview of the new command process, take a look at | |
115 | `Documentation/MyFirstContribution.txt`. | |
116 | ||
117 | NOTE: A reference implementation can be found at | |
118 | https://github.com/nasamuffin/git/tree/revwalk. | |
119 | ||
120 | === `struct rev_cmdline_info` | |
121 | ||
122 | The definition of `struct rev_cmdline_info` can be found in `revision.h`. | |
123 | ||
124 | This struct is contained within the `rev_info` struct and is used to reflect | |
125 | parameters provided by the user over the CLI. | |
126 | ||
127 | `nr` represents the number of `rev_cmdline_entry` present in the array. | |
128 | ||
bc5c5ec0 | 129 | `alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is |
13aa9c8b | 130 | used to track the allocated size of the list. |
e0479fa0 ES |
131 | |
132 | Per entry, we find: | |
133 | ||
134 | `item` is the object provided upon which to base the object walk. Items in Git | |
135 | can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.) | |
136 | ||
137 | `name` is the object ID (OID) of the object - a hex string you may be familiar | |
138 | with from using Git to organize your source in the past. Check the tutorial | |
139 | mentioned above towards the top for a discussion of where the OID can come | |
140 | from. | |
141 | ||
142 | `whence` indicates some information about what to do with the parents of the | |
143 | specified object. We'll explore this flag more later on; take a look at | |
144 | `Documentation/revisions.txt` to get an idea of what could set the `whence` | |
145 | value. | |
146 | ||
147 | `flags` are used to hint the beginning of the revision walk and are the first | |
148 | block under the `#include`s in `revision.h`. The most likely ones to be set in | |
149 | the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags | |
150 | can be used during the walk, as well. | |
151 | ||
152 | === `struct rev_info` | |
153 | ||
154 | This one is quite a bit longer, and many fields are only used during the walk | |
155 | by `revision.c` - not configuration options. Most of the configurable flags in | |
156 | `struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a | |
157 | good idea to take some time and read through that document. | |
158 | ||
159 | == Basic Commit Walk | |
160 | ||
161 | First, let's see if we can replicate the output of `git log --oneline`. We'll | |
162 | refer back to the implementation frequently to discover norms when performing | |
163 | an object walk of our own. | |
164 | ||
165 | To do so, we'll first find all the commits, in order, which preceded the current | |
166 | commit. We'll extract the name and subject of the commit from each. | |
167 | ||
168 | Ideally, we will also be able to find out which ones are currently at the tip of | |
169 | various branches. | |
170 | ||
171 | === Setting Up | |
172 | ||
173 | Preparing for your object walk has some distinct stages. | |
174 | ||
175 | 1. Perform default setup for this mode, and others which may be invoked. | |
176 | 2. Check configuration files for relevant settings. | |
177 | 3. Set up the `rev_info` struct. | |
178 | 4. Tweak the initialized `rev_info` to suit the current walk. | |
179 | 5. Prepare the `rev_info` for the walk. | |
180 | 6. Iterate over the objects, processing each one. | |
181 | ||
182 | ==== Default Setups | |
183 | ||
184 | Before examining configuration files which may modify command behavior, set up | |
185 | default state for switches or options your command may have. If your command | |
186 | utilizes other Git components, ask them to set up their default states as well. | |
187 | For instance, `git log` takes advantage of `grep` and `diff` functionality, so | |
188 | its `init_log_defaults()` sets its own state (`decoration_style`) and asks | |
189 | `grep` and `diff` to initialize themselves by calling each of their | |
190 | initialization functions. | |
191 | ||
e0479fa0 ES |
192 | ==== Configuring From `.gitconfig` |
193 | ||
194 | Next, we should have a look at any relevant configuration settings (i.e., | |
195 | settings readable and settable from `git config`). This is done by providing a | |
196 | callback to `git_config()`; within that callback, you can also invoke methods | |
197 | from other components you may need that need to intercept these options. Your | |
198 | callback will be invoked once per each configuration value which Git knows about | |
199 | (global, local, worktree, etc.). | |
200 | ||
201 | Similarly to the default values, we don't have anything to do here yet | |
202 | ourselves; however, we should call `git_default_config()` if we aren't calling | |
203 | any other existing config callbacks. | |
204 | ||
7d1b8667 JC |
205 | Add a new function to `builtin/walken.c`. |
206 | We'll also need to include the `config.h` header: | |
e0479fa0 ES |
207 | |
208 | ---- | |
7d1b8667 JC |
209 | #include "config.h" |
210 | ||
211 | ... | |
212 | ||
d08a189c DG |
213 | static int git_walken_config(const char *var, const char *value, |
214 | const struct config_context *ctx, void *cb) | |
e0479fa0 ES |
215 | { |
216 | /* | |
217 | * For now, we don't have any custom configuration, so fall back to | |
218 | * the default config. | |
219 | */ | |
d08a189c | 220 | return git_default_config(var, value, ctx, cb); |
e0479fa0 ES |
221 | } |
222 | ---- | |
223 | ||
224 | Make sure to invoke `git_config()` with it in your `cmd_walken()`: | |
225 | ||
226 | ---- | |
227 | int cmd_walken(int argc, const char **argv, const char *prefix) | |
228 | { | |
229 | ... | |
230 | ||
231 | git_config(git_walken_config, NULL); | |
232 | ||
233 | ... | |
234 | } | |
235 | ---- | |
236 | ||
237 | ==== Setting Up `rev_info` | |
238 | ||
239 | Now that we've gathered external configuration and options, it's time to | |
240 | initialize the `rev_info` object which we will use to perform the walk. This is | |
241 | typically done by calling `repo_init_revisions()` with the repository you intend | |
242 | to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info` | |
243 | struct. | |
244 | ||
7d1b8667 JC |
245 | Add the `struct rev_info` and the `repo_init_revisions()` call. |
246 | We'll also need to include the `revision.h` header: | |
247 | ||
e0479fa0 | 248 | ---- |
7d1b8667 JC |
249 | #include "revision.h" |
250 | ||
251 | ... | |
252 | ||
e0479fa0 ES |
253 | int cmd_walken(int argc, const char **argv, const char *prefix) |
254 | { | |
255 | /* This can go wherever you like in your declarations.*/ | |
256 | struct rev_info rev; | |
257 | ... | |
258 | ||
259 | /* This should go after the git_config() call. */ | |
260 | repo_init_revisions(the_repository, &rev, prefix); | |
261 | ||
262 | ... | |
263 | } | |
264 | ---- | |
265 | ||
266 | ==== Tweaking `rev_info` For the Walk | |
267 | ||
268 | We're getting close, but we're still not quite ready to go. Now that `rev` is | |
269 | initialized, we can modify it to fit our needs. This is usually done within a | |
270 | helper for clarity, so let's add one: | |
271 | ||
272 | ---- | |
273 | static void final_rev_info_setup(struct rev_info *rev) | |
274 | { | |
275 | /* | |
276 | * We want to mimic the appearance of `git log --oneline`, so let's | |
277 | * force oneline format. | |
278 | */ | |
279 | get_commit_format("oneline", rev); | |
280 | ||
281 | /* Start our object walk at HEAD. */ | |
282 | add_head_to_pending(rev); | |
283 | } | |
284 | ---- | |
285 | ||
286 | [NOTE] | |
287 | ==== | |
288 | Instead of using the shorthand `add_head_to_pending()`, you could do | |
289 | something like this: | |
290 | ---- | |
291 | struct setup_revision_opt opt; | |
292 | ||
293 | memset(&opt, 0, sizeof(opt)); | |
294 | opt.def = "HEAD"; | |
295 | opt.revarg_opt = REVARG_COMMITTISH; | |
296 | setup_revisions(argc, argv, rev, &opt); | |
297 | ---- | |
298 | Using a `setup_revision_opt` gives you finer control over your walk's starting | |
299 | point. | |
300 | ==== | |
301 | ||
302 | Then let's invoke `final_rev_info_setup()` after the call to | |
303 | `repo_init_revisions()`: | |
304 | ||
305 | ---- | |
306 | int cmd_walken(int argc, const char **argv, const char *prefix) | |
307 | { | |
308 | ... | |
309 | ||
310 | final_rev_info_setup(&rev); | |
311 | ||
312 | ... | |
313 | } | |
314 | ---- | |
315 | ||
316 | Later, we may wish to add more arguments to `final_rev_info_setup()`. But for | |
317 | now, this is all we need. | |
318 | ||
319 | ==== Preparing `rev_info` For the Walk | |
320 | ||
321 | Now that `rev` is all initialized and configured, we've got one more setup step | |
322 | before we get rolling. We can do this in a helper, which will both prepare the | |
323 | `rev_info` for the walk, and perform the walk itself. Let's start the helper | |
324 | with the call to `prepare_revision_walk()`, which can return an error without | |
325 | dying on its own: | |
326 | ||
327 | ---- | |
328 | static void walken_commit_walk(struct rev_info *rev) | |
329 | { | |
330 | if (prepare_revision_walk(rev)) | |
331 | die(_("revision walk setup failed")); | |
332 | } | |
333 | ---- | |
334 | ||
335 | NOTE: `die()` prints to `stderr` and exits the program. Since it will print to | |
336 | `stderr` it's likely to be seen by a human, so we will localize it. | |
337 | ||
338 | ==== Performing the Walk! | |
339 | ||
340 | Finally! We are ready to begin the walk itself. Now we can see that `rev_info` | |
341 | can also be used as an iterator; we move to the next item in the walk by using | |
342 | `get_revision()` repeatedly. Add the listed variable declarations at the top and | |
343 | the walk loop below the `prepare_revision_walk()` call within your | |
344 | `walken_commit_walk()`: | |
345 | ||
346 | ---- | |
bbd7c7b7 VD |
347 | #include "pretty.h" |
348 | ||
349 | ... | |
350 | ||
e0479fa0 ES |
351 | static void walken_commit_walk(struct rev_info *rev) |
352 | { | |
353 | struct commit *commit; | |
354 | struct strbuf prettybuf = STRBUF_INIT; | |
355 | ||
356 | ... | |
357 | ||
358 | while ((commit = get_revision(rev))) { | |
e0479fa0 ES |
359 | strbuf_reset(&prettybuf); |
360 | pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); | |
361 | puts(prettybuf.buf); | |
362 | } | |
363 | strbuf_release(&prettybuf); | |
364 | } | |
365 | ---- | |
366 | ||
367 | NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the | |
368 | command we expect to be machine-parsed, we're sending it directly to stdout. | |
369 | ||
370 | Give it a shot. | |
371 | ||
372 | ---- | |
373 | $ make | |
374 | $ ./bin-wrappers/git walken | |
375 | ---- | |
376 | ||
377 | You should see all of the subject lines of all the commits in | |
378 | your tree's history, in order, ending with the initial commit, "Initial revision | |
379 | of "git", the information manager from hell". Congratulations! You've written | |
380 | your first revision walk. You can play with printing some additional fields | |
381 | from each commit if you're curious; have a look at the functions available in | |
382 | `commit.h`. | |
383 | ||
384 | === Adding a Filter | |
385 | ||
386 | Next, let's try to filter the commits we see based on their author. This is | |
387 | equivalent to running `git log --author=<pattern>`. We can add a filter by | |
388 | modifying `rev_info.grep_filter`, which is a `struct grep_opt`. | |
389 | ||
96313423 | 390 | First some setup. Add `grep_config()` to `git_walken_config()`: |
e0479fa0 ES |
391 | |
392 | ---- | |
d08a189c DG |
393 | static int git_walken_config(const char *var, const char *value, |
394 | const struct config_context *ctx, void *cb) | |
e0479fa0 | 395 | { |
d08a189c DG |
396 | grep_config(var, value, ctx, cb); |
397 | return git_default_config(var, value, ctx, cb); | |
e0479fa0 ES |
398 | } |
399 | ---- | |
400 | ||
401 | Next, we can modify the `grep_filter`. This is done with convenience functions | |
402 | found in `grep.h`. For fun, we're filtering to only commits from folks using a | |
403 | `gmail.com` email address - a not-very-precise guess at who may be working on | |
404 | Git as a hobby. Since we're checking the author, which is a specific line in the | |
405 | header, we'll use the `append_header_grep_pattern()` helper. We can use | |
406 | the `enum grep_header_field` to indicate which part of the commit header we want | |
407 | to search. | |
408 | ||
409 | In `final_rev_info_setup()`, add your filter line: | |
410 | ||
411 | ---- | |
412 | static void final_rev_info_setup(int argc, const char **argv, | |
413 | const char *prefix, struct rev_info *rev) | |
414 | { | |
415 | ... | |
416 | ||
417 | append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, | |
418 | "gmail"); | |
419 | compile_grep_patterns(&rev->grep_filter); | |
420 | ||
421 | ... | |
422 | } | |
423 | ---- | |
424 | ||
425 | `append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but | |
426 | it won't work unless we compile it with `compile_grep_patterns()`. | |
427 | ||
428 | NOTE: If you are using `setup_revisions()` (for example, if you are passing a | |
429 | `setup_revision_opt` instead of using `add_head_to_pending()`), you don't need | |
430 | to call `compile_grep_patterns()` because `setup_revisions()` calls it for you. | |
431 | ||
432 | NOTE: We could add the same filter via the `append_grep_pattern()` helper if we | |
433 | wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and | |
434 | `enum grep_pat_token` for us. | |
435 | ||
436 | === Changing the Order | |
437 | ||
438 | There are a few ways that we can change the order of the commits during a | |
439 | revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some | |
440 | typical orderings. | |
441 | ||
442 | `topo_order` is the same as `git log --topo-order`: we avoid showing a parent | |
443 | before all of its children have been shown, and we avoid mixing commits which | |
444 | are in different lines of history. (`git help log`'s section on `--topo-order` | |
445 | has a very nice diagram to illustrate this.) | |
446 | ||
447 | Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to | |
448 | `REV_SORT_BY_AUTHOR_DATE`. Add the following: | |
449 | ||
450 | ---- | |
451 | static void final_rev_info_setup(int argc, const char **argv, | |
452 | const char *prefix, struct rev_info *rev) | |
453 | { | |
454 | ... | |
455 | ||
456 | rev->topo_order = 1; | |
457 | rev->sort_order = REV_SORT_BY_COMMIT_DATE; | |
458 | ||
459 | ... | |
460 | } | |
461 | ---- | |
462 | ||
463 | Let's output this into a file so we can easily diff it with the walk sorted by | |
464 | author date. | |
465 | ||
466 | ---- | |
467 | $ make | |
468 | $ ./bin-wrappers/git walken > commit-date.txt | |
469 | ---- | |
470 | ||
471 | Then, let's sort by author date and run it again. | |
472 | ||
473 | ---- | |
474 | static void final_rev_info_setup(int argc, const char **argv, | |
475 | const char *prefix, struct rev_info *rev) | |
476 | { | |
477 | ... | |
478 | ||
479 | rev->topo_order = 1; | |
480 | rev->sort_order = REV_SORT_BY_AUTHOR_DATE; | |
481 | ||
482 | ... | |
483 | } | |
484 | ---- | |
485 | ||
486 | ---- | |
487 | $ make | |
488 | $ ./bin-wrappers/git walken > author-date.txt | |
489 | ---- | |
490 | ||
491 | Finally, compare the two. This is a little less helpful without object names or | |
492 | dates, but hopefully we get the idea. | |
493 | ||
494 | ---- | |
495 | $ diff -u commit-date.txt author-date.txt | |
496 | ---- | |
497 | ||
498 | This display indicates that commits can be reordered after they're written, for | |
499 | example with `git rebase`. | |
500 | ||
501 | Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag. | |
502 | Set that flag somewhere inside of `final_rev_info_setup()`: | |
503 | ||
504 | ---- | |
505 | static void final_rev_info_setup(int argc, const char **argv, const char *prefix, | |
506 | struct rev_info *rev) | |
507 | { | |
508 | ... | |
509 | ||
510 | rev->reverse = 1; | |
511 | ||
512 | ... | |
513 | } | |
514 | ---- | |
515 | ||
516 | Run your walk again and note the difference in order. (If you remove the grep | |
517 | pattern, you should see the last commit this call gives you as your current | |
518 | HEAD.) | |
519 | ||
520 | == Basic Object Walk | |
521 | ||
522 | So far we've been walking only commits. But Git has more types of objects than | |
523 | that! Let's see if we can walk _all_ objects, and find out some information | |
524 | about each one. | |
525 | ||
526 | We can base our work on an example. `git pack-objects` prepares all kinds of | |
527 | objects for packing into a bitmap or packfile. The work we are interested in | |
34e0b72b | 528 | resides in `builtin/pack-objects.c:get_object_list()`; examination of that |
e0479fa0 ES |
529 | function shows that the all-object walk is being performed by |
530 | `traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two | |
531 | functions reside in `list-objects.c`; examining the source shows that, despite | |
532 | the name, these functions traverse all kinds of objects. Let's have a look at | |
f0d2f849 | 533 | the arguments to `traverse_commit_list()`. |
e0479fa0 | 534 | |
f0d2f849 DS |
535 | - `struct rev_info *revs`: This is the `rev_info` used for the walk. If |
536 | its `filter` member is not `NULL`, then `filter` contains information for | |
537 | how to filter the object list. | |
e0479fa0 ES |
538 | - `show_commit_fn show_commit`: A callback which will be used to handle each |
539 | individual commit object. | |
540 | - `show_object_fn show_object`: A callback which will be used to handle each | |
541 | non-commit object (so each blob, tree, or tag). | |
542 | - `void *show_data`: A context buffer which is passed in turn to `show_commit` | |
543 | and `show_object`. | |
f0d2f849 | 544 | |
72991ff5 | 545 | In addition, `traverse_commit_list_filtered()` has an additional parameter: |
f0d2f849 | 546 | |
e0479fa0 ES |
547 | - `struct oidset *omitted`: A linked-list of object IDs which the provided |
548 | filter caused to be omitted. | |
549 | ||
f0d2f849 DS |
550 | It looks like these methods use callbacks we provide instead of needing us |
551 | to call it repeatedly ourselves. Cool! Let's add the callbacks first. | |
e0479fa0 ES |
552 | |
553 | For the sake of this tutorial, we'll simply keep track of how many of each kind | |
554 | of object we find. At file scope in `builtin/walken.c` add the following | |
555 | tracking variables: | |
556 | ||
557 | ---- | |
558 | static int commit_count; | |
559 | static int tag_count; | |
560 | static int blob_count; | |
561 | static int tree_count; | |
562 | ---- | |
563 | ||
564 | Commits are handled by a different callback than other objects; let's do that | |
565 | one first: | |
566 | ||
567 | ---- | |
568 | static void walken_show_commit(struct commit *cmt, void *buf) | |
569 | { | |
570 | commit_count++; | |
571 | } | |
572 | ---- | |
573 | ||
574 | The `cmt` argument is fairly self-explanatory. But it's worth mentioning that | |
575 | the `buf` argument is actually the context buffer that we can provide to the | |
576 | traversal calls - `show_data`, which we mentioned a moment ago. | |
577 | ||
578 | Since we have the `struct commit` object, we can look at all the same parts that | |
579 | we looked at in our earlier commit-only walk. For the sake of this tutorial, | |
580 | though, we'll just increment the commit counter and move on. | |
581 | ||
582 | The callback for non-commits is a little different, as we'll need to check | |
583 | which kind of object we're dealing with: | |
584 | ||
585 | ---- | |
586 | static void walken_show_object(struct object *obj, const char *str, void *buf) | |
587 | { | |
588 | switch (obj->type) { | |
589 | case OBJ_TREE: | |
590 | tree_count++; | |
591 | break; | |
592 | case OBJ_BLOB: | |
593 | blob_count++; | |
594 | break; | |
595 | case OBJ_TAG: | |
596 | tag_count++; | |
597 | break; | |
598 | case OBJ_COMMIT: | |
599 | BUG("unexpected commit object in walken_show_object\n"); | |
600 | default: | |
601 | BUG("unexpected object type %s in walken_show_object\n", | |
602 | type_name(obj->type)); | |
603 | } | |
604 | } | |
605 | ---- | |
606 | ||
607 | Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same | |
608 | context pointer that `walken_show_commit()` receives: the `show_data` argument | |
609 | to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally, | |
610 | `str` contains the name of the object, which ends up being something like | |
611 | `foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag). | |
612 | ||
613 | To help assure us that we aren't double-counting commits, we'll include some | |
614 | complaining if a commit object is routed through our non-commit callback; we'll | |
615 | also complain if we see an invalid object type. Since those two cases should be | |
616 | unreachable, and would only change in the event of a semantic change to the Git | |
617 | codebase, we complain by using `BUG()` - which is a signal to a developer that | |
618 | the change they made caused unintended consequences, and the rest of the | |
619 | codebase needs to be updated to understand that change. `BUG()` is not intended | |
620 | to be seen by the public, so it is not localized. | |
621 | ||
622 | Our main object walk implementation is substantially different from our commit | |
623 | walk implementation, so let's make a new function to perform the object walk. We | |
624 | can perform setup which is applicable to all objects here, too, to keep separate | |
625 | from setup which is applicable to commit-only walks. | |
626 | ||
627 | We'll start by enabling all types of objects in the `struct rev_info`. We'll | |
628 | also turn on `tree_blobs_in_commit_order`, which means that we will walk a | |
629 | commit's tree and everything it points to immediately after we find each commit, | |
630 | as opposed to waiting for the end and walking through all trees after the commit | |
631 | history has been discovered. With the appropriate settings configured, we are | |
632 | ready to call `prepare_revision_walk()`. | |
633 | ||
634 | ---- | |
635 | static void walken_object_walk(struct rev_info *rev) | |
636 | { | |
637 | rev->tree_objects = 1; | |
638 | rev->blob_objects = 1; | |
639 | rev->tag_objects = 1; | |
640 | rev->tree_blobs_in_commit_order = 1; | |
641 | ||
642 | if (prepare_revision_walk(rev)) | |
643 | die(_("revision walk setup failed")); | |
644 | ||
645 | commit_count = 0; | |
646 | tag_count = 0; | |
647 | blob_count = 0; | |
648 | tree_count = 0; | |
649 | ---- | |
650 | ||
651 | Let's start by calling just the unfiltered walk and reporting our counts. | |
7d1b8667 JC |
652 | Complete your implementation of `walken_object_walk()`. |
653 | We'll also need to include the `list-objects.h` header. | |
e0479fa0 ES |
654 | |
655 | ---- | |
7d1b8667 JC |
656 | #include "list-objects.h" |
657 | ||
658 | ... | |
659 | ||
e0479fa0 ES |
660 | traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); |
661 | ||
662 | printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count, | |
663 | blob_count, tag_count, tree_count); | |
664 | } | |
665 | ---- | |
666 | ||
667 | NOTE: This output is intended to be machine-parsed. Therefore, we are not | |
668 | sending it to `trace_printf()`, and we are not localizing it - we need scripts | |
669 | to be able to count on the formatting to be exactly the way it is shown here. | |
670 | If we were intending this output to be read by humans, we would need to localize | |
671 | it with `_()`. | |
672 | ||
673 | Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing | |
674 | command line options is out of scope for this tutorial, so we'll just hardcode | |
675 | a branch we can change at compile time. Where you call `final_rev_info_setup()` | |
676 | and `walken_commit_walk()`, instead branch like so: | |
677 | ||
678 | ---- | |
679 | if (1) { | |
680 | add_head_to_pending(&rev); | |
681 | walken_object_walk(&rev); | |
682 | } else { | |
683 | final_rev_info_setup(argc, argv, prefix, &rev); | |
684 | walken_commit_walk(&rev); | |
685 | } | |
686 | ---- | |
687 | ||
688 | NOTE: For simplicity, we've avoided all the filters and sorts we applied in | |
689 | `final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you | |
690 | want, you can certainly use the filters we added before by moving | |
691 | `final_rev_info_setup()` out of the conditional and removing the call to | |
692 | `add_head_to_pending()`. | |
693 | ||
694 | Now we can try to run our command! It should take noticeably longer than the | |
695 | commit walk, but an examination of the output will give you an idea why. Your | |
696 | output should look similar to this example, but with different counts: | |
697 | ||
698 | ---- | |
699 | Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees. | |
700 | ---- | |
701 | ||
702 | This makes sense. We have more trees than commits because the Git project has | |
703 | lots of subdirectories which can change, plus at least one tree per commit. We | |
704 | have no tags because we started on a commit (`HEAD`) and while tags can point to | |
705 | commits, commits can't point to tags. | |
706 | ||
707 | NOTE: You will have different counts when you run this yourself! The number of | |
708 | objects grows along with the Git project. | |
709 | ||
710 | === Adding a Filter | |
711 | ||
712 | There are a handful of filters that we can apply to the object walk laid out in | |
713 | `Documentation/rev-list-options.txt`. These filters are typically useful for | |
714 | operations such as creating packfiles or performing a partial clone. They are | |
715 | defined in `list-objects-filter-options.h`. For the purposes of this tutorial we | |
716 | will use the "tree:1" filter, which causes the walk to omit all trees and blobs | |
717 | which are not directly referenced by commits reachable from the commit in | |
718 | `pending` when the walk begins. (`pending` is the list of objects which need to | |
719 | be traversed during a walk; you can imagine a breadth-first tree traversal to | |
720 | help understand. In our case, that means we omit trees and blobs not directly | |
721 | referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only | |
722 | `HEAD` in the `pending` list.) | |
723 | ||
e0479fa0 ES |
724 | For now, we are not going to track the omitted objects, so we'll replace those |
725 | parameters with `NULL`. For the sake of simplicity, we'll add a simple | |
f0d2f849 | 726 | build-time branch to use our filter or not. Preface the line calling |
e0479fa0 ES |
727 | `traverse_commit_list()` with the following, which will remind us which kind of |
728 | walk we've just performed: | |
729 | ||
730 | ---- | |
731 | if (0) { | |
732 | /* Unfiltered: */ | |
733 | trace_printf(_("Unfiltered object walk.\n")); | |
e0479fa0 ES |
734 | } else { |
735 | trace_printf( | |
736 | _("Filtered object walk with filterspec 'tree:1'.\n")); | |
af388889 DG |
737 | |
738 | parse_list_objects_filter(&rev->filter, "tree:1"); | |
e0479fa0 | 739 | } |
f0d2f849 DS |
740 | traverse_commit_list(rev, walken_show_commit, |
741 | walken_show_object, NULL); | |
e0479fa0 ES |
742 | ---- |
743 | ||
f0d2f849 | 744 | The `rev->filter` member is usually built directly from a command |
e0479fa0 ES |
745 | line argument, so the module provides an easy way to build one from a string. |
746 | Even though we aren't taking user input right now, we can still build one with | |
747 | a hardcoded string using `parse_list_objects_filter()`. | |
748 | ||
749 | With the filter spec "tree:1", we are expecting to see _only_ the root tree for | |
750 | each commit; therefore, the tree object count should be less than or equal to | |
751 | the number of commits. (For an example of why that's true: `git commit --revert` | |
752 | points to the same tree object as its grandparent.) | |
753 | ||
754 | === Counting Omitted Objects | |
755 | ||
756 | We also have the capability to enumerate all objects which were omitted by a | |
7250cdb6 DG |
757 | filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this, |
758 | change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is | |
759 | able to populate an `omitted` list. Asking for this list of filtered objects | |
760 | may cause performance degradations, however, because in this case, despite | |
761 | filtering objects, the possibly much larger set of all reachable objects must | |
762 | be processed in order to populate that list. | |
e0479fa0 ES |
763 | |
764 | First, add the `struct oidset` and related items we will use to iterate it: | |
765 | ||
766 | ---- | |
bbd7c7b7 VD |
767 | #include "oidset.h" |
768 | ||
769 | ... | |
770 | ||
e0479fa0 ES |
771 | static void walken_object_walk( |
772 | ... | |
773 | ||
774 | struct oidset omitted; | |
775 | struct oidset_iter oit; | |
776 | struct object_id *oid = NULL; | |
777 | int omitted_count = 0; | |
778 | oidset_init(&omitted, 0); | |
779 | ||
780 | ... | |
781 | ---- | |
782 | ||
7250cdb6 DG |
783 | Replace the call to `traverse_commit_list()` with |
784 | `traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset | |
785 | defined and initialized above: | |
e0479fa0 ES |
786 | |
787 | ---- | |
788 | ... | |
789 | ||
f0d2f849 | 790 | traverse_commit_list_filtered(rev, |
e0479fa0 ES |
791 | walken_show_commit, walken_show_object, NULL, &omitted); |
792 | ||
793 | ... | |
794 | ---- | |
795 | ||
796 | Then, after your traversal, the `oidset` traversal is pretty straightforward. | |
797 | Count all the objects within and modify the print statement: | |
798 | ||
799 | ---- | |
800 | /* Count the omitted objects. */ | |
801 | oidset_iter_init(&omitted, &oit); | |
802 | ||
803 | while ((oid = oidset_iter_next(&oit))) | |
804 | omitted_count++; | |
805 | ||
469888e6 | 806 | printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n", |
e0479fa0 ES |
807 | commit_count, blob_count, tag_count, tree_count, omitted_count); |
808 | ---- | |
809 | ||
810 | By running your walk with and without the filter, you should find that the total | |
811 | object count in each case is identical. You can also time each invocation of | |
812 | the `walken` subcommand, with and without `omitted` being passed in, to confirm | |
813 | to yourself the runtime impact of tracking all omitted objects. | |
814 | ||
815 | === Changing the Order | |
816 | ||
817 | Finally, let's demonstrate that you can also reorder walks of all objects, not | |
818 | just walks of commits. First, we'll make our handlers chattier - modify | |
819 | `walken_show_commit()` and `walken_show_object()` to print the object as they | |
820 | go: | |
821 | ||
822 | ---- | |
bbd7c7b7 VD |
823 | #include "hex.h" |
824 | ||
825 | ... | |
826 | ||
e0479fa0 ES |
827 | static void walken_show_commit(struct commit *cmt, void *buf) |
828 | { | |
829 | trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid)); | |
830 | commit_count++; | |
831 | } | |
832 | ||
833 | static void walken_show_object(struct object *obj, const char *str, void *buf) | |
834 | { | |
835 | trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid)); | |
836 | ||
837 | ... | |
838 | } | |
839 | ---- | |
840 | ||
841 | NOTE: Since we will be examining this output directly as humans, we'll use | |
842 | `trace_printf()` here. Additionally, since this change introduces a significant | |
843 | number of printed lines, using `trace_printf()` will allow us to easily silence | |
844 | those lines without having to recompile. | |
845 | ||
846 | (Leave the counter increment logic in place.) | |
847 | ||
848 | With only that change, run again (but save yourself some scrollback): | |
849 | ||
850 | ---- | |
95ab557b | 851 | $ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10 |
e0479fa0 ES |
852 | ---- |
853 | ||
854 | Take a look at the top commit with `git show` and the object ID you printed; it | |
855 | should be the same as the output of `git show HEAD`. | |
856 | ||
857 | Next, let's change a setting on our `struct rev_info` within | |
858 | `walken_object_walk()`. Find where you're changing the other settings on `rev`, | |
859 | such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the | |
860 | `reverse` setting at the bottom: | |
861 | ||
862 | ---- | |
863 | ... | |
864 | ||
865 | rev->tree_objects = 1; | |
866 | rev->blob_objects = 1; | |
867 | rev->tag_objects = 1; | |
868 | rev->tree_blobs_in_commit_order = 1; | |
869 | rev->reverse = 1; | |
870 | ||
871 | ... | |
872 | ---- | |
873 | ||
874 | Now, run again, but this time, let's grab the last handful of objects instead | |
875 | of the first handful: | |
876 | ||
877 | ---- | |
878 | $ make | |
95ab557b | 879 | $ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10 |
e0479fa0 ES |
880 | ---- |
881 | ||
882 | The last commit object given should have the same OID as the one we saw at the | |
883 | top before, and running `git show <oid>` with that OID should give you again | |
884 | the same results as `git show HEAD`. Furthermore, if you run and examine the | |
885 | first ten lines again (with `head` instead of `tail` like we did before applying | |
886 | the `reverse` setting), you should see that now the first commit printed is the | |
887 | initial commit, `e83c5163`. | |
888 | ||
889 | == Wrapping Up | |
890 | ||
891 | Let's review. In this tutorial, we: | |
892 | ||
893 | - Built a commit walk from the ground up | |
894 | - Enabled a grep filter for that commit walk | |
895 | - Changed the sort order of that filtered commit walk | |
896 | - Built an object walk (tags, commits, trees, and blobs) from the ground up | |
897 | - Learned how to add a filter-spec to an object walk | |
898 | - Changed the display order of the filtered object walk |