pack-bitmap: pass object position to `fill_bitmap_tree()`

In the following commit, callers of `fill_bitmap_tree()` will be
required to check the bit corresponding to their tree before calling
that function. That change will reduce the overhead of setting up and
tearing down stack frames for trees whose bits are already set.

To prepare for that change, have callers pass the tree's bit position
to `fill_bitmap_tree()`, which will make the next commit easier to read.

In the meantime, this change has a surprising and measurable benefit
during bitmap generation, particularly on very large repositories.

When processing sub-trees within `fill_bitmap_tree()`, the preimage of
this patch did the following:

    while (tree_entry(&desc, &entry)) {
            switch (object_type(entry.mode)) {
            case OBJ_TREE:
                    if (fill_bitmap_tree(writer, bitmap,
                                         lookup_tree(writer->repo,
                                                     &entry.oid)) < 0) {
                            /* ... */
                    }
                    /* ... */
            }
    }

That is, it first performed the object lookup via `lookup_tree()`, and
only then located the tree's bit position within the recursive call.
This patch effectively reorders those two calls so that we first
discover the sub-tree's bit position, *then* load its tree.
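
The reordering can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the code from this patch: every
`toy_*` name, the four-tree store, and the identity mapping from object
id to bit position are hypothetical stand-ins for the writer's real data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not Git's real types. */
#define TOY_NR_TREES 4

struct toy_tree {
	size_t nr;                /* number of sub-trees */
	const uint32_t *children; /* ids of those sub-trees */
};

/* Tree 0 is the root; trees 1 and 2 both contain tree 3. */
static const uint32_t toy_c0[] = { 1, 2 };
static const uint32_t toy_c1[] = { 3 };
static const uint32_t toy_c2[] = { 3 };
static const struct toy_tree toy_store[TOY_NR_TREES] = {
	{ 2, toy_c0 }, { 1, toy_c1 }, { 1, toy_c2 }, { 0, NULL },
};

static unsigned toy_nr_lookups; /* how many times a tree was loaded */

/* Map an object id to its bit position (the identity in this toy). */
static uint32_t toy_find_pos(uint32_t id)
{
	assert(id < TOY_NR_TREES);
	return id;
}

/* Stand-in for an object lookup such as lookup_tree(). */
static const struct toy_tree *toy_lookup_tree(uint32_t id)
{
	assert(id < TOY_NR_TREES);
	toy_nr_lookups++;
	return &toy_store[id];
}

/* The callee receives the bit position instead of computing it. */
static void toy_fill_tree(bool *bitmap, const struct toy_tree *tree,
			  uint32_t pos)
{
	size_t i;

	if (bitmap[pos])
		return; /* this tree was already marked on an earlier visit */
	bitmap[pos] = true;

	for (i = 0; i < tree->nr; i++) {
		uint32_t id = tree->children[i];
		/* First find the bit position, then load the tree. */
		uint32_t sub_pos = toy_find_pos(id);
		const struct toy_tree *sub = toy_lookup_tree(id);

		toy_fill_tree(bitmap, sub, sub_pos);
	}
}

/* Fill a bitmap starting from the root; returns the number of set bits. */
static unsigned toy_fill_from_root(bool *bitmap)
{
	uint32_t pos = toy_find_pos(0);
	const struct toy_tree *root = toy_lookup_tree(0);
	unsigned i, nr = 0;

	toy_fill_tree(bitmap, root, pos);
	for (i = 0; i < TOY_NR_TREES; i++)
		nr += bitmap[i];
	return nr;
}
```

Calling `toy_fill_from_root()` on a zeroed bitmap sets all four bits.
Tree 3 is reached twice, and its second visit returns at the bit check;
that early return is the check the following commit hoists into callers.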

By reordering these two operations, we spend fewer CPU cycles per
instruction, likely due to improved data-dependency, cache, or pipeline
behavior. Comparing the results of running `perf stat` before and after
this commit, we have:

+--------------+-------------+-------------+-------------------+
|              | HEAD^       | HEAD        | Delta             |
+--------------+-------------+-------------+-------------------+
| elapsed      | 612.5 s     | 582.4 s     | -30.1 s (-4.9%)   |
| cycles       | 2,857.3 B   | 2,713.3 B   | -144.0 B (-5.0%)  |
| instructions | 2,413.2 B   | 2,415.5 B   | +2.3 B (+0.1%)    |
| CPI          | 1.184       | 1.123       | -0.061 (-5.1%)    |
+--------------+-------------+-------------+-------------------+
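
The CPI row is derived from the two rows above it: cycles divided by
instructions. As a minimal sketch of that arithmetic (the helper names
`cpi()` and `pct_change()` are made up for this illustration and are not
part of `perf`):

```c
#include <assert.h>

/* Cycles-per-instruction: total cycles divided by total instructions. */
static double cpi(double cycles, double instructions)
{
	assert(instructions > 0);
	return cycles / instructions;
}

/* Relative change from `before` to `after`, as a percentage. */
static double pct_change(double before, double after)
{
	assert(before != 0);
	return (after - before) / before * 100.0;
}
```

With the figures above (in billions), `cpi(2857.3, 2413.2)` is roughly
1.184 and `cpi(2713.3, 2415.5)` is roughly 1.123. Because the instruction
count is nearly unchanged, almost all of the CPI drop comes from the
cycle count.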

In a large repository with ~4.8M commits and ~37.1M tree objects, this
change improves timing from ~612.5 seconds down to ~582.4 seconds, a
~4.9% improvement. More importantly, the number of CPU cycles spent
dropped significantly as a result of this commit, lowering our
cycles-per-instruction ratio by ~5.1%.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>