]>
Commit | Line | Data |
---|---|---|
b84f767c DS |
1 | Git commit graph format |
2 | ======================= | |
3 | ||
4 | The Git commit graph stores a list of commit OIDs and some associated | |
5 | metadata, including: | |
6 | ||
7 | - The generation number of the commit. Commits with no parents have | |
8 | generation number 1; commits with parents have generation number | |
9 | one more than the maximum generation number of its parents. We | |
10 | reserve zero as special, and can be used to mark a generation | |
11 | number invalid or as "not computed". | |
12 | ||
13 | - The root tree OID. | |
14 | ||
15 | - The commit date. | |
16 | ||
17 | - The parents of the commit, stored using positional references within | |
18 | the graph file. | |
19 | ||
76ffbca7 GS |
20 | - The Bloom filter of the commit carrying the paths that were changed between |
21 | the commit and its first parent, if requested. | |
22 | ||
b84f767c | 23 | These positional references are stored as unsigned 32-bit integers |
06994ae0 | 24 | corresponding to the array position within the list of commit OIDs. Due |
a9aa3c09 DS |
25 | to some special constants we use to track parents, we can store at most |
26 | (1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits. | |
b84f767c DS |
27 | |
28 | == Commit graph files have the following format: | |
29 | ||
30 | In order to allow extensions that add extra data to the graph, we organize | |
31 | the body into "chunks" and provide a binary lookup table at the beginning | |
32 | of the body. The header includes certain values, such as number of chunks | |
33 | and hash type. | |
34 | ||
6141cdfd | 35 | All multi-byte numbers are in network byte order. |
b84f767c DS |
36 | |
37 | HEADER: | |
38 | ||
39 | 4-byte signature: | |
40 | The signature is: {'C', 'G', 'P', 'H'} | |
41 | ||
42 | 1-byte version number: | |
43 | Currently, the only valid version is 1. | |
44 | ||
665d70ad DS |
45 | 1-byte Hash Version |
46 | We infer the hash length (H) from this value: | |
47 | 1 => SHA-1 | |
48 | 2 => SHA-256 | |
49 | If the hash type does not match the repository's hash algorithm, the | |
50 | commit-graph file should be ignored with a warning presented to the | |
51 | user. | |
b84f767c DS |
52 | |
53 | 1-byte number (C) of "chunks" | |
54 | ||
118bd570 DS |
55 | 1-byte number (B) of base commit-graphs |
56 | We infer the length (H*B) of the Base Graphs chunk | |
57 | from this value. | |
b84f767c DS |
58 | |
59 | CHUNK LOOKUP: | |
60 | ||
61 | (C + 1) * 12 bytes listing the table of contents for the chunks: | |
62 | First 4 bytes describe the chunk id. Value 0 is a terminating label. | |
63 | Other 8 bytes provide the byte-offset in current file for chunk to | |
64 | start. (Chunks are ordered contiguously in the file, so you can infer | |
65 | the length using the next chunk position if necessary.) Each chunk | |
66 | ID appears at most once. | |
67 | ||
68 | The remaining data in the body is described one chunk at a time, and | |
69 | these chunks may be given in any order. Chunks are required unless | |
70 | otherwise specified. | |
71 | ||
72 | CHUNK DATA: | |
73 | ||
74 | OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) | |
75 | The ith entry, F[i], stores the number of OIDs with first | |
76 | byte at most i. Thus F[255] stores the total | |
77 | number of commits (N). | |
78 | ||
79 | OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) | |
80 | The OIDs for all commits in the graph, sorted in ascending order. | |
81 | ||
a9aa3c09 | 82 | Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes) |
b84f767c DS |
83 | * The first H bytes are for the OID of the root tree. |
84 | * The next 8 bytes are for the positions of the first two parents | |
e40e9365 | 85 | of the ith commit. Stores value 0x70000000 if no parent in that |
b84f767c DS |
86 | position. If there are more than two parents, the second value |
87 | has its most-significant bit on and the other bits store an array | |
5af7417b | 88 | position into the Extra Edge List chunk. |
b84f767c DS |
89 | * The next 8 bytes store the generation number of the commit and |
90 | the commit time in seconds since EPOCH. The generation number | |
91 | uses the higher 30 bits of the first 4 bytes, while the commit | |
92 | time uses the 32 bits of the second 4 bytes, along with the lowest | |
93 | 2 bits of the lowest byte, storing the 33rd and 34th bit of the | |
94 | commit time. | |
95 | ||
5af7417b | 96 | Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] |
b84f767c DS |
97 | This list of 4-byte values store the second through nth parents for |
98 | all octopus merges. The second parent value in the commit data stores | |
99 | an array position within this list along with the most-significant bit | |
100 | on. Starting at that array position, iterate through this list of commit | |
101 | positions for the parents until reaching a value with the most-significant | |
102 | bit on. The other bits correspond to the position of the last parent. | |
103 | ||
76ffbca7 | 104 | Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional] |
88093289 DS |
105 | * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters |
106 | from commit 0 to commit i (inclusive) in lexicographic order. The Bloom | |
107 | filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header | |
108 | length), where BIDX[-1] is 0. | |
76ffbca7 GS |
109 | * The BIDX chunk is ignored if the BDAT chunk is not present. |
110 | ||
111 | Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] | |
112 | * It starts with header consisting of three unsigned 32-bit integers: | |
113 | - Version of the hash algorithm being used. We currently only support | |
114 | value 1 which corresponds to the 32-bit version of the murmur3 hash | |
115 | implemented exactly as described in | |
116 | https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double | |
117 | hashing technique using seed values 0x293ae76f and 0x7e646e2 as | |
118 | described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters | |
119 | in Probabilistic Verification" | |
120 | - The number of times a path is hashed and hence the number of bit positions | |
121 | that cumulatively determine whether a file is present in the commit. | |
122 | - The minimum number of bits 'b' per entry in the Bloom filter. If the filter | |
123 | contains 'n' entries, then the filter size is the minimum number of 64-bit | |
124 | words that contain n*b bits. | |
125 | * The rest of the chunk is the concatenation of all the computed Bloom | |
126 | filters for the commits in lexicographic order. | |
127 | * Note: Commits with no changes or more than 512 changes have Bloom filters | |
59f0d507 | 128 | of length one, with either all bits set to zero or one respectively. |
76ffbca7 GS |
129 | * The BDAT chunk is present if and only if BIDX is present. |
130 | ||
118bd570 DS |
131 | Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] |
132 | This list of H-byte hashes describe a set of B commit-graph files that | |
133 | form a commit-graph chain. The graph position for the ith commit in this | |
134 | file's OID Lookup chunk is equal to i plus the number of commits in all | |
135 | base graphs. If B is non-zero, this chunk must exist. | |
136 | ||
b84f767c DS |
137 | TRAILER: |
138 | ||
139 | H-byte HASH-checksum of all of the above. |