= Cruft packs

The cruft packs feature offers an alternative to Git's traditional mechanism of
removing unreachable objects. This document provides an overview of Git's
pruning mechanism, and how a cruft pack can be used instead to accomplish the
same.

== Background

To remove unreachable objects from your repository, Git offers `git repack -Ad`
(see linkgit:git-repack[1]). Quoting from the documentation:

[quote]
[...] unreachable objects in a previous pack become loose, unpacked objects,
instead of being left in the old pack. [...] loose unreachable objects will be
pruned according to normal expiry rules with the next 'git gc' invocation.

Unreachable objects aren't removed immediately, since doing so could race with
an incoming push which may reference an object which is about to be deleted.
Instead, those unreachable objects are stored as loose objects and stay that way
until they are older than the expiration window, at which point they are removed
by linkgit:git-prune[1].

Git must store these unreachable objects loose in order to keep track of their
per-object mtimes. If these unreachable objects were written into one big pack,
then either freshening that pack (because an object contained within it was
re-written) or creating a new pack of unreachable objects would cause the pack's
mtime to get updated, and the objects within it would never leave the expiration
window. Instead, objects are stored loose in order to keep track of the
individual object mtimes and avoid a situation where all cruft objects are
freshened at once.
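To make the problem concrete, here is a minimal Python sketch. It is a toy
model, not Git's implementation: objects are reduced to a name-to-mtime
mapping, and the helper names (`prune_loose`, `prune_single_pack`) and the
two-week window are assumptions chosen for illustration.

```python
# Illustrative model only: the helper names and the window are hypothetical.
DAY = 24 * 3600
GRACE = 14 * DAY  # assumed expiration window, in seconds


def prune_loose(mtimes, now, grace=GRACE):
    """Keep loose objects whose own mtime is still inside the window."""
    return {oid: t for oid, t in mtimes.items() if now - t <= grace}


def prune_single_pack(mtimes, pack_mtime, now, grace=GRACE):
    """Without per-object data, every object inherits the pack's mtime."""
    if now - pack_mtime <= grace:
        return dict(mtimes)  # nothing in the pack can be expired
    return {}


now = 100 * DAY
objects = {"old": now - 30 * DAY, "recent": now - 1 * DAY}

# Per-object mtimes: only the object past the window is dropped.
kept_loose = prune_loose(objects, now)

# One shared mtime: a pack freshened for "recent" also shields "old".
kept_packed = prune_single_pack(objects, pack_mtime=max(objects.values()), now=now)
```

In this model `kept_loose` retains only `recent`, while `kept_packed` retains
both objects, which is the "never leave the expiration window" situation
described above.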

This can lead to undesirable situations when a repository contains many
unreachable objects which have not yet left the grace period. Having large
directories in the shards of `.git/objects` can lead to decreased performance in
the repository. Given enough unreachable objects, this can also lead to inode
starvation and degrade the performance of the whole system. Because these
objects can never be packed, such repositories often take up a large amount of
disk space: each object can only be zlib-compressed on its own, not stored in a
delta chain.

== Cruft packs

A cruft pack eliminates the need for storing unreachable objects in a loose
state by including the per-object mtimes in a separate file alongside a single
pack containing all of the unreachable objects that would otherwise be loose.

A cruft pack is written by `git repack --cruft` when generating a new pack,
using linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
is a classic all-into-one repack, meaning that everything in the resulting pack is
reachable, and everything else is unreachable. Once that pack is written, the
`--cruft` option instructs `git repack` to generate another pack containing only
objects not packed in the previous step (which equates to packing all
unreachable objects together). This progresses as follows:

  1. Enumerate every object, marking any object which (a) is not contained in a
     kept-pack, and (b) has an mtime within the grace period as a traversal
     tip.

  2. Perform a reachability traversal based on the tips gathered in the previous
     step, adding every object along the way to the pack.

  3. Write the pack out, along with a `.mtimes` file that records the per-object
     timestamps.
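The three steps above can be sketched in Python as follows. This is a
simplified model under stated assumptions, not the code in `pack-objects`: the
object store is a plain dictionary, `select_cruft_objects` is a hypothetical
name, and the traversal is assumed to stop at objects that already live in a
kept pack, since those are reachable and stored elsewhere.

```python
def select_cruft_objects(objects, kept, now, grace):
    """Return {oid: mtime} for the objects a cruft pack would contain.

    objects: oid -> (mtime, [oids it references])   (toy object store)
    kept:    set of oids stored in kept packs       (the reachable objects)
    """
    # Step 1: unreachable objects still inside the grace period become tips.
    tips = [
        oid
        for oid, (mtime, _refs) in objects.items()
        if oid not in kept and now - mtime <= grace
    ]

    # Step 2: walk from the tips, collecting every object along the way.
    selected = {}
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in selected or oid in kept or oid not in objects:
            continue  # assumption: do not descend into kept-pack objects
        mtime, refs = objects[oid]
        selected[oid] = mtime
        stack.extend(refs)

    # Step 3: stand-in for writing the pack plus its per-object timestamps.
    return dict(sorted(selected.items()))


store = {
    "commit_r": (10, ["tree_r"]),   # reachable, lives in a kept pack
    "tree_r": (10, []),
    "commit_u": (95, ["tree_u"]),   # unreachable and recent: a tip
    "tree_u": (20, ["blob_u"]),     # old, but reachable from a recent tip
    "blob_u": (20, []),
    "blob_x": (20, []),             # old and not reachable from any tip
}
cruft = select_cruft_objects(store, kept={"commit_r", "tree_r"}, now=100, grace=14)
```

Here `tree_u` and `blob_u` are kept despite their age because a recent
unreachable commit still points at them, while `blob_x` is left out.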

This mode is invoked internally by linkgit:git-repack[1] when instructed to
write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
of packs which will not be deleted by the repack; in other words, they contain
all of the repository's reachable objects.

When a repository already has a cruft pack, `git repack --cruft` typically only
adds objects to it. An exception to this is when `git repack` is given the
`--cruft-expiration` option, which allows the generated cruft pack to omit
expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
later on.
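The difference between the two behaviours can be illustrated with a small
Python sketch. It is deliberately simplified: it ignores the reachability
traversal, the function name `rewrite_cruft_pack` and the `expire_before`
parameter are hypothetical, and keeping the newest mtime for an object seen
twice is an assumption of this model.

```python
def rewrite_cruft_pack(existing, newly_unreachable, expire_before=None):
    """Model the contents of the next cruft pack as {oid: mtime}.

    existing:          contents of the current cruft pack
    newly_unreachable: objects that became unreachable since the last repack
    expire_before:     optional cutoff; older objects are omitted
    """
    merged = dict(existing)
    for oid, mtime in newly_unreachable.items():
        # Assumption: if an object is seen twice, keep its newest mtime.
        merged[oid] = max(mtime, merged.get(oid, mtime))

    if expire_before is None:
        return merged  # no expiration: the cruft pack only grows
    return {oid: t for oid, t in merged.items() if t >= expire_before}


existing = {"a": 100, "b": 900}
incoming = {"c": 950}

grown = rewrite_cruft_pack(existing, incoming)
trimmed = rewrite_cruft_pack(existing, incoming, expire_before=500)
```

Without a cutoff the result contains `a`, `b`, and `c`; with the cutoff, the
expired object `a` is omitted while the pack is being written.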

It is linkgit:git-gc[1] that is typically responsible for removing expired
unreachable objects.

== Caution for mixed-version environments

Repositories that have cruft packs in them will continue to work with any older
version of Git. Note, however, that previous versions of Git which do not
understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
all of the objects in it. In other words, do not expect older (pre-cruft pack)
versions of Git to interpret or even read the contents of the `.mtimes` file.

Note that having mixed versions of Git GC-ing the same repository can lead to
unreachable objects never being completely pruned. This can happen under the
following circumstances:

  - An older version of Git running GC explodes the contents of an existing
    cruft pack loose, using the cruft pack's mtime.
  - A newer version running GC collects those loose objects into a cruft pack,
    where the `.mtimes` file reflects the loose objects' actual mtimes, but the
    cruft pack mtime is "now".

Repeating this process will lead to unreachable objects not getting pruned as a
result of repeatedly resetting the objects' mtimes to the present time.
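The cycle can be simulated with a short Python sketch. This is a toy model
under assumptions, not a reproduction of either Git version: the function
names, the five-day interval between GC runs, and the two-week window are all
hypothetical.

```python
DAY = 24 * 3600
GRACE = 14 * DAY  # assumed expiration window


def old_gc_explode(cruft_pack_mtime, mtimes_file):
    """Older Git: ignores .mtimes, so each loose object gets the pack's mtime."""
    return {oid: cruft_pack_mtime for oid in mtimes_file}


def new_gc_collect(loose, now):
    """Newer Git: records the loose mtimes, but the pack itself is written 'now'."""
    return now, dict(loose)  # (cruft pack mtime, .mtimes contents)


def simulate(rounds, interval):
    """Alternate old and new GC runs; return the recorded age after each round."""
    now = 0
    pack_mtime, mtimes = new_gc_collect({"obj": now}, now)
    ages = []
    for _ in range(rounds):
        now += interval
        loose = old_gc_explode(pack_mtime, mtimes)  # mtime reset to pack mtime
        now += interval
        pack_mtime, mtimes = new_gc_collect(loose, now)
        ages.append(now - mtimes["obj"])
    return ages


ages = simulate(rounds=4, interval=5 * DAY)
```

In this model the recorded age of `obj` is ten days after every round, even
though forty days have passed in total, so it never crosses the window and is
never pruned.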

If you are GC-ing repositories in a mixed-version environment, consider omitting
the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
leaving the `gc.cruftPacks` configuration unset until all writers understand
cruft packs.

== Alternatives

Notable alternatives to this design include:

  - The location of the per-object mtime data, and
  - Storing unreachable objects in multiple cruft packs.

On the location of mtime data, a new auxiliary file tied to the pack was chosen
to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
support for optional chunks of data, it may make sense to consolidate the
`.mtimes` format into the `.idx` itself.

Storing unreachable objects among multiple cruft packs (e.g., creating a new
cruft pack during each repacking operation including only unreachable objects
which aren't already stored in an earlier cruft pack) is significantly more
complicated to construct, and so isn't pursued here. The obvious drawback to
the current implementation is that the entire cruft pack must be re-written from
scratch.