From: Patrick Steinhardt Date: Fri, 24 Oct 2025 06:57:23 +0000 (+0200) Subject: builtin/maintenance: introduce "geometric" strategy X-Git-Tag: v2.52.0-rc0~3^2~1 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=d9bccf2ec3871963098dcd78c61990e27733eb03;p=thirdparty%2Fgit.git builtin/maintenance: introduce "geometric" strategy We have two different repacking strategies in Git: - The "gc" strategy uses git-gc(1). - The "incremental" strategy uses multi-pack indices and `git multi-pack-index repack` to merge together smaller packfiles as determined by a specific batch size. The former strategy is our old and trusted default, whereas the latter has historically been used for our scheduled maintenance. But both strategies have their shortcomings: - The "gc" strategy performs regular all-into-one repacks. Furthermore it is rather inflexible, as it is not easily possible for a user to enable or disable specific subtasks. - The "incremental" strategy is not a full replacement for the "gc" strategy as it doesn't know to prune stale data. So today, we don't have a strategy that is well-suited for large repos while being a full replacement for the "gc" strategy. Introduce a new "geometric" strategy that aims to fill this gap. This strategy invokes all the usual cleanup tasks that git-gc(1) does like pruning reflogs and rerere caches as well as stale worktrees. But where it differs from both the "gc" and "incremental" strategy is that it uses our geometric repacking infrastructure exposed by git-repack(1) to repack packfiles. The advantage of geometric repacking is that we only need to perform an all-into-one repack when the object count in a repo has grown significantly. One downside of this strategy is that pruning of unreferenced objects is not going to happen regularly anymore. Every geometric repack knows to soak up all loose objects regardless of their reachability, and merging two or more packs doesn't consider reachability, either. Consequently, the number of unreachable objects will grow over time. This is remedied by doing an all-into-one repack instead of a geometric repack whenever we determine that the geometric repack would end up merging all packfiles anyway. This all-into-one repack then performs our usual reachability checks and writes unreachable objects into a cruft pack. As cruft packs won't ever be merged during geometric repacks we can thus phase out these objects over time. Of course, this still means that we retain unreachable objects for far longer than with the "gc" strategy. But the maintenance strategy is intended especially for large repositories, where the basic assumption is that the set of unreachable objects will be significantly dwarfed by the number of reachable objects. If this assumption is ever proven to be too disadvantageous we could for example introduce a time-based strategy: if the largest packfile has not been touched for longer than $T, we perform an all-into-one repack. But for now, such a mechanism is deferred into the future as it is not clear yet whether it is needed in the first place. Signed-off-by: Patrick Steinhardt Acked-by: Taylor Blau Signed-off-by: Junio C Hamano --- diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc index b2bacdc822..d0c38f03fa 100644 --- a/Documentation/config/maintenance.adoc +++ b/Documentation/config/maintenance.adoc @@ -32,6 +32,15 @@ The possible strategies are: strategy for scheduled maintenance. * `gc`: This strategy runs the `gc` task. This is the default strategy for manual maintenance. +* `geometric`: This strategy performs geometric repacking of packfiles and + keeps auxiliary data structures up-to-date. The strategy expires data in the + reflog and removes worktrees that cannot be located anymore. When the + geometric repacking strategy would decide to do an all-into-one repack, then + the strategy generates a cruft pack for all unreachable objects. Objects that + are already part of a cruft pack will be expired. ++ +This repacking strategy is a full replacement for the `gc` strategy and is +recommended for large repositories. * `incremental`: This setting optimizes for performing small maintenance activities that do not delete any data. This does not schedule the `gc` task, but runs the `prefetch` and `commit-graph` tasks hourly, the diff --git a/builtin/gc.c b/builtin/gc.c index 8cab145009..19be3f87e1 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1891,12 +1891,43 @@ static const struct maintenance_strategy incremental_strategy = { }, }; +static const struct maintenance_strategy geometric_strategy = { + .tasks = { + [TASK_COMMIT_GRAPH] = { + .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL, + .schedule = SCHEDULE_HOURLY, + }, + [TASK_GEOMETRIC_REPACK] = { + .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL, + .schedule = SCHEDULE_DAILY, + }, + [TASK_PACK_REFS] = { + .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL, + .schedule = SCHEDULE_DAILY, + }, + [TASK_RERERE_GC] = { + .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL, + .schedule = SCHEDULE_WEEKLY, + }, + [TASK_REFLOG_EXPIRE] = { + .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL, + .schedule = SCHEDULE_WEEKLY, + }, + [TASK_WORKTREE_PRUNE] = { + .type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL, + .schedule = SCHEDULE_WEEKLY, + }, + }, +}; + static struct maintenance_strategy parse_maintenance_strategy(const char *name) { if (!strcasecmp(name, "incremental")) return incremental_strategy; if (!strcasecmp(name, "gc")) return gc_strategy; + if (!strcasecmp(name, "geometric")) + return geometric_strategy; die(_("unknown maintenance strategy: '%s'"), name); } diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 85e0cea4d9..0d76693fee 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -930,11 +930,29 @@ test_expect_success 'maintenance.strategy is respected' ' git gc --quiet --no-detach --skip-foreground-tasks EOF - test_strategy gc --schedule=weekly <<-\EOF + test_strategy gc --schedule=weekly <<-\EOF && git pack-refs --all --prune git reflog expire --all git gc --quiet --no-detach --skip-foreground-tasks EOF + + test_strategy geometric <<-\EOF && + git pack-refs --all --prune + git reflog expire --all + git repack -d -l --geometric=2 --quiet --write-midx + git commit-graph write --split --reachable --no-progress + git worktree prune --expire 3.months.ago + git rerere gc + EOF + + test_strategy geometric --schedule=weekly <<-\EOF + git pack-refs --all --prune + git reflog expire --all + git repack -d -l --geometric=2 --quiet --write-midx + git commit-graph write --split --reachable --no-progress + git worktree prune --expire 3.months.ago + git rerere gc + EOF ) '