From: Taylor Blau Date: Tue, 2 Jun 2026 22:21:53 +0000 (-0400) Subject: pack-objects: support `--delta-islands` with `--path-walk` X-Git-Url: http://git.ipfire.org/gitweb/index.cgi?a=commitdiff_plain;h=d88060561e0bd621cc68434b79c18b44721dfd90;p=thirdparty%2Fgit.git pack-objects: support `--delta-islands` with `--path-walk` Since the inception of `--path-walk`, this option has had a documented incompatibility with `--delta-islands`. When discussing those original patches on the list, a message from Stolee in [1] noted the following: this could be remedied by [...] doing a separate walk to identify islands using the normal method In a related portion of the thread, Peff explains[2]: The delta islands code already does its own tree walk to propagate the bits down (it does rely on the base walk's show_commit() to propagate through the commits). Once each object has its island bitmaps, I think however you choose to come up with delta candidates [...] you should be able to use it. It's fundamentally just answering the question of "am I allowed to delta between these two objects". That is similar to what this patch does, and it turns out the cheaper option is sufficient: perform the same island side effects from the path-walk callback rather than doing a second walk. Recall how delta-islands are computed during a normal repack: - `show_commit()` calls `propagate_island_marks()` for each commit, which merges the commit's island bitset onto its root tree object and onto each of its parent commits. - `show_object()` for a tree records the tree's depth derived from the slash-separated pathname. Subsequent `resolve_tree_islands()` uses that depth to walk trees in increasing-depth order, propagating each tree's marks to its children. - At delta-search time, `in_same_island()` enforces that a delta target's island bitmap is a subset of its base's: every island that reaches the target must also reach the base. Path-walk's enumeration callback is `add_objects_by_path()`. It already adds objects to `to_pack`, but until now did not perform the island-related side effects. Two things are needed: - For each commit batch, call `propagate_island_marks()` on commits, exactly as `show_commit()` does. We have to be careful about the order in which we call this function, and we must see a commit before its parents in order to have island marks to propagate. The path-walk batch preserves that order. Path-walk appends commits to its `OBJ_COMMIT` batch as they come back from the same `get_revision()` loop the regular traversal uses, and `add_objects_by_path()` iterates the batch in array order. So every commit reaches `propagate_island_marks()` in the same sequence that `show_commit()` would have seen it, and the descendant-first chain that the algorithm relies on is intact. Skip island propagation for excluded commits to match the regular traversal, whose `show_commit()` callback is only invoked for interesting commits. Boundary commits may still be present in path-walk's callback so they can serve as thin-pack bases, but they should not contribute island marks. - For each tree batch, record the tree's depth from the path. Use the `record_tree_depth()` helper from the previous commit so both callbacks behave identically, including the max-depth-wins behavior when a tree is reached via more than one path. The helper accepts both the `show_object()` path shape ("foo", "foo/bar") and the path-walk shape with a trailing slash ("foo/", "foo/bar/"), so depths recorded from either traversal mode are directly comparable. This is implicit in the implementation sketch from Peff above. `resolve_tree_islands()` sorts trees by `oe->tree_depth` in increasing-depth order before propagating marks down, so that a parent tree's marks are finalized before its children inherit them. Without recording the depth at path-walk time, every path-walk-discovered tree would land at depth 0 in `to_pack`, the sort would lose its ordering, and children could inherit marks from parents whose own contributions had not yet been merged in. With those two pieces in place, `resolve_tree_islands()` receives the same island inputs from path-walk as it would from the regular traversal, so the existing island checks can be reused unchanged. Drop the documented incompatibility between `--path-walk` and `--delta-islands`, and add t5320 coverage for path-walk island repacks with and without bitmap writing, as well as the same-island case where a delta remains allowed. [1]: https://lore.kernel.org/git/9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com/ [2]: https://lore.kernel.org/git/20240911063203.GA1538586@coredump.intra.peff.net/ Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc index 0adce8961a..65cd00c152 100644 --- a/Documentation/git-pack-objects.adoc +++ b/Documentation/git-pack-objects.adoc @@ -402,13 +402,13 @@ will be automatically changed to version `1`. of filenames that cause collisions in Git's default name-hash algorithm. + -Incompatible with `--delta-islands`. When `--use-bitmap-index` is -specified with `--path-walk`, a successful bitmap traversal is used for -object enumeration, with path-walk remaining as the fallback traversal -when the bitmap cannot satisfy the request. The `--path-walk` option -supports the `--filter=` forms `blob:none`, `blob:limit=`, -`tree:0`, `object:type=`, and `sparse:`. These supported filter -types can be combined with the `combine:+` form. +When `--use-bitmap-index` is specified with `--path-walk`, a successful +bitmap traversal is used for object enumeration, with path-walk +remaining as the fallback traversal when the bitmap cannot satisfy the +request. The `--path-walk` option supports the `--filter=` forms +`blob:none`, `blob:limit=`, `tree:0`, `object:type=`, and +`sparse:`. These supported filter types can be combined with the +`combine:+` form. DELTA ISLANDS diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index db65d0e189..33db92be4c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4747,13 +4747,29 @@ static int add_objects_by_path(const char *path, add_object_entry(oid, type, path, exclude); - if (type == OBJ_COMMIT && write_bitmap_index) { + if (type == OBJ_COMMIT) { struct commit *commit; + if (!write_bitmap_index && !use_delta_islands) + continue; + commit = lookup_commit(the_repository, oid); if (!commit) die(_("could not find commit %s"), oid_to_hex(oid)); - index_commit_for_bitmap(commit); + if (write_bitmap_index) + index_commit_for_bitmap(commit); + /* + * Skip island propagation for boundary commits. + * The regular traversal's show_commit() is only + * called for interesting commits; matching that + * here keeps path-walk from doing extra work that + * would only be a no-op anyway (boundary commits + * are not in island_marks). + */ + if (use_delta_islands && !exclude) + propagate_island_marks(the_repository, commit); + } else if (type == OBJ_TREE && use_delta_islands) { + record_tree_depth(oid, path); } } @@ -5215,8 +5231,6 @@ int cmd_pack_objects(int argc, const char *option = NULL; if (!path_walk_filter_compatible(&filter_options)) option = "--filter"; - else if (use_delta_islands) - option = "--delta-islands"; if (option) { warning(_("cannot use %s with %s"), diff --git a/t/t5320-delta-islands.sh b/t/t5320-delta-islands.sh index 2c961c7096..9b28344a0a 100755 --- a/t/t5320-delta-islands.sh +++ b/t/t5320-delta-islands.sh @@ -53,6 +53,35 @@ test_expect_success 'separate islands disallows delta' ' ! is_delta_base $two $one ' +test_expect_success 'path-walk island repack respects islands' ' + GIT_TRACE2_EVENT="$(pwd)/trace.path-walk-islands" \ + git -c "pack.island=refs/heads/(.*)" repack -adfi \ + --path-walk 2>err && + test_region pack-objects path-walk trace.path-walk-islands && + test_grep ! "cannot use --delta-islands with --path-walk" err && + ! is_delta_base $one $two && + ! is_delta_base $two $one +' + +test_expect_success 'path-walk island bitmap repack respects islands' ' + GIT_TRACE2_EVENT="$(pwd)/trace.path-walk-island-bitmap" \ + git -c "pack.island=refs/heads/(.*)" repack -a -d -f -i -b \ + --path-walk 2>err && + test_region pack-objects path-walk trace.path-walk-island-bitmap && + test_path_is_file .git/objects/pack/*.bitmap && + git rev-list --test-bitmap --use-bitmap-index one && + test_grep ! "cannot use --delta-islands with --path-walk" err && + ! is_delta_base $one $two && + ! is_delta_base $two $one +' + +test_expect_success 'path-walk same island allows delta' ' + GIT_TRACE2_EVENT="$(pwd)/trace.path-walk-same-island" \ + git -c "pack.island=refs/heads" repack -adfi --path-walk && + test_region pack-objects path-walk trace.path-walk-same-island && + is_delta_base $one $two +' + test_expect_success 'same island allows delta' ' git -c "pack.island=refs/heads" repack -adfi && is_delta_base $one $two