Commit | Line | Data |
---|---|---|
48a8c26c | 1 | Git pack format |
9760662f JH |
2 | =============== |
3 | ||
5316c8e9 | 4 | == pack-*.pack files have the following format: |
9760662f | 5 | |
71362bd5 | 6 | - A header appears at the beginning and consists of the following: |
9760662f | 7 | |
1361fa3e SP |
8 | 4-byte signature: |
9 | The signature is: {'P', 'A', 'C', 'K'} | |
10 | ||
11 | 4-byte version number (network byte order): | |
48a8c26c | 12 | Git currently accepts version number 2 or 3 but |
1361fa3e SP |
13 | generates version 2 only. |
14 | ||
9760662f JH |
15 | 4-byte number of objects contained in the pack (network byte order) |
16 | ||
17 | Observation: we cannot have more than 4G versions ;-) and | |
18 | more than 4G objects in a pack. | |
19 | ||
20 | - The header is followed by the stated number of object entries, each of | |
21 | which looks like this: | |
22 | ||
23 | (undeltified representation) | |
979ea585 | 24 | n-byte type and length (3-bit type, (n-1)*7+4-bit length) |
9760662f JH |
25 | compressed data |
26 | ||
27 | (deltified representation) | |
979ea585 | 28 | n-byte type and length (3-bit type, (n-1)*7+4-bit length) |
06cb843f SS |
29 | 20-byte base object name if OBJ_REF_DELTA or a negative relative |
30 | offset from the delta object's position in the pack if this | |
31 | is an OBJ_OFS_DELTA object | |
9760662f JH |
32 | compressed delta data |
33 | ||
34 | Observation: length of each object is encoded in a variable | |
35 | length format and is not constrained to 32-bit or anything. | |
36 | ||
d5fa1f1a | 37 | - The trailer records a 20-byte SHA-1 checksum of all of the above. |
9760662f | 38 | |
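The 12-byte header described above can be read in a few lines. This is only a minimal sketch: the function name `parse_pack_header` is hypothetical, and it assumes the caller has already read the first 12 bytes of the pack into memory.

```python
import struct

def parse_pack_header(data):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # 4-byte signature, then two 4-byte integers in network byte order.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("bad pack signature")
    if version not in (2, 3):
        raise ValueError("unsupported pack version %d" % version)
    return version, count

# A hand-built header announcing a version 2 pack with 3 objects.
sample = b"PACK" + struct.pack(">II", 2, 3)
print(parse_pack_header(sample))  # (2, 3)
```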
011b6486 NTND |
39 | === Object types |
40 | ||
41 | Valid object types are: | |
42 | ||
43 | - OBJ_COMMIT (1) | |
44 | - OBJ_TREE (2) | |
45 | - OBJ_BLOB (3) | |
46 | - OBJ_TAG (4) | |
47 | - OBJ_OFS_DELTA (6) | |
48 | - OBJ_REF_DELTA (7) | |
49 | ||
50 | Type 5 is reserved for future expansion. Type 0 is invalid. | |
51 | ||
52 | === Deltified representation | |
53 | ||
54 | Conceptually there are only four object types: commit, tree, tag and | |
55 | blob. However to save space, an object could be stored as a "delta" of | |
56 | another "base" object. These representations are assigned new types | |
57 | ofs-delta and ref-delta, which are only valid in a pack file. | |
58 | ||
59 | Both ofs-delta and ref-delta store the "delta" to be applied to | |
60 | another object (called 'base object') to reconstruct the object. The | |
61 | difference between them is that ref-delta directly encodes the 20-byte | |
62 | base object name, whereas ofs-delta, which requires the base object to | |
63 | be in the same pack, encodes the relative offset of the base object instead. | |
64 | ||
65 | The base object could also be deltified if it's in the same pack. | |
66 | Ref-delta can also refer to an object outside the pack (i.e. the | |
67 | so-called "thin pack"). When stored on disk, however, the pack should | |
68 | be self-contained to avoid cyclic dependencies. | |
69 | ||
70 | The delta data is a sequence of instructions to reconstruct an object | |
71 | from the base object. If the base object is deltified, it must be | |
72 | converted to canonical form first. Each instruction appends more and | |
73 | more data to the target object until it's complete. There are two | |
74 | supported instructions so far: one for copying a byte range from the | |
75 | source object and one for inserting new data embedded in the | |
76 | instruction itself. | |
77 | ||
78 | Each instruction has variable length. Instruction type is determined | |
79 | by the most significant bit (bit 7) of the first octet. The following diagrams follow | |
80 | the convention in RFC 1951 (Deflate compressed data format). | |
81 | ||
82 | ==== Instruction to copy from base object | |
83 | ||
84 | +----------+---------+---------+---------+---------+-------+-------+-------+ | |
85 | | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 | | |
86 | +----------+---------+---------+---------+---------+-------+-------+-------+ | |
87 | ||
88 | This is the instruction format to copy a byte range from the source | |
89 | object. It encodes the offset to copy from and the number of bytes to | |
90 | copy. Offset and size are in little-endian order. | |
91 | ||
92 | All offset and size bytes are optional. This is to reduce the | |
93 | instruction size when encoding small offsets or sizes. The low seven | |
94 | bits of the first octet determine which of the next seven octets are | |
95 | present. If bit zero is set, offset1 is present. If bit one is set, | |
96 | offset2 is present, and so on. | |
97 | ||
98 | Note that a more compact instruction does not change offset and size | |
99 | encoding. For example, if only offset2 is omitted like below, offset3 | |
100 | still contains bits 16-23. It does not become offset2 and contain | |
101 | bits 8-15 even if it's right next to offset1. | |
102 | ||
103 | +----------+---------+---------+ | |
104 | | 10000101 | offset1 | offset3 | | |
105 | +----------+---------+---------+ | |
106 | ||
107 | In its most compact form, this instruction only takes up one byte | |
108 | (0x80) with both offset and size omitted, in which case both default | |
109 | to zero. There is one exception: a size of zero is automatically | |
110 | converted to 0x10000. | |
111 | ||
112 | ==== Instruction to add new data | |
113 | ||
114 | +----------+============+ | |
115 | | 0xxxxxxx | data | | |
116 | +----------+============+ | |
117 | ||
118 | This is the instruction to construct the target object without the base | |
119 | object. The data that follows is appended to the target object. The low | |
120 | seven bits of the first octet determine the size of the data in | |
121 | bytes. The size must be non-zero. | |
122 | ||
123 | ==== Reserved instruction | |
124 | ||
125 | +----------+============ | |
126 | | 00000000 | | |
127 | +----------+============ | |
128 | ||
129 | This is the instruction reserved for future expansion. | |
130 | ||
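The copy, insert and reserved instructions described above can be interpreted with a short loop. The following is a sketch rather than Git's implementation: the name `apply_delta_instructions` is hypothetical, and it assumes the caller passes only the already-inflated instruction stream, with any leading size fields of the delta data (not described in this section) removed beforehand.

```python
def apply_delta_instructions(base, instructions):
    """Rebuild a target object from `base` and a delta instruction stream."""
    out = bytearray()
    pos = 0
    while pos < len(instructions):
        cmd = instructions[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: bits 0-3 select offset1..offset4 and
            # bits 4-6 select size1..size3, each in little-endian order.
            offset = 0
            size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= instructions[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= instructions[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a size of zero means 0x10000
            if offset + size > len(base):
                raise ValueError("copy range exceeds the base object")
            out += base[offset:offset + size]
        elif cmd:
            # Insert instruction: the low seven bits give the data length.
            if pos + cmd > len(instructions):
                raise ValueError("truncated insert instruction")
            out += instructions[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("instruction 0x00 is reserved")
    return bytes(out)

base = b"hello world"
delta = bytes([0x91, 6, 5])      # copy 5 bytes starting at offset 6
delta += bytes([0x02]) + b", "   # insert 2 literal bytes
delta += bytes([0x90, 5])        # copy 5 bytes from the default offset 0
print(apply_delta_instructions(base, delta))  # b'world, hello'
```

Note how the last copy omits every offset byte, so the offset keeps its default of zero, while size1 alone carries the length.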
5316c8e9 | 131 | == Original (version 1) pack-*.idx files have the following format: |
9760662f JH |
132 | |
133 | - The header consists of 256 4-byte network byte order | |
134 | integers. N-th entry of this table records the number of | |
135 | objects in the corresponding pack, the first byte of whose | |
71362bd5 | 136 | object name is less than or equal to N. This is called the |
9760662f JH |
137 | 'first-level fan-out' table. |
138 | ||
1361fa3e | 139 | - The header is followed by sorted 24-byte entries, one entry |
9760662f JH |
140 | per object in the pack. Each entry is: |
141 | ||
142 | 4-byte network byte order integer, recording where the | |
143 | object is stored in the packfile as the offset from the | |
144 | beginning. | |
145 | ||
146 | 20-byte object name. | |
147 | ||
9760662f JH |
148 | - The file is concluded with a trailer: |
149 | ||
d5fa1f1a | 150 | A copy of the 20-byte SHA-1 checksum at the end of |
9760662f JH |
151 | corresponding packfile. |
152 | ||
d5fa1f1a | 153 | 20-byte SHA-1-checksum of all of the above. |
9760662f JH |
154 | |
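To illustrate how the first-level fan-out table narrows the search, here is a sketch of a lookup in a version 1 index held in memory. The function name `find_offset_v1` and the sample data are made up for illustration; the sample index omits the trailer, which the lookup does not need.

```python
import struct

FANOUT_SIZE = 256 * 4  # 256 4-byte entries
ENTRY_SIZE = 24        # 4-byte offset followed by a 20-byte object name

def find_offset_v1(idx, name):
    """Return the pack offset recorded for a 20-byte `name`, or None."""
    fanout = struct.unpack(">256I", idx[:FANOUT_SIZE])
    first = name[0]
    # Entries whose first byte equals `first` occupy [lo, hi).
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    while lo < hi:
        mid = (lo + hi) // 2
        entry = FANOUT_SIZE + ENTRY_SIZE * mid
        entry_name = idx[entry + 4:entry + ENTRY_SIZE]
        if entry_name < name:
            lo = mid + 1
        elif entry_name > name:
            hi = mid
        else:
            return struct.unpack(">I", idx[entry:entry + 4])[0]
    return None

# Build a tiny index with three made-up object names.
names = [b"\x00" + b"a" * 19, b"\x01" + b"b" * 19, b"\xff" + b"c" * 19]
offsets = [12, 345, 6789]
fanout = [sum(1 for n in names if n[0] <= i) for i in range(256)]
idx = struct.pack(">256I", *fanout)
idx += b"".join(struct.pack(">I", o) + n for o, n in zip(offsets, names))

print(find_offset_v1(idx, names[1]))  # 345
print(find_offset_v1(idx, bytes(20)))  # None
```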
155 | Pack Idx file: | |
156 | ||
71362bd5 | 157 | -- +--------------------------------+ |
158 | fanout | fanout[0] = 2 (for example) |-. | |
159 | table +--------------------------------+ | | |
9760662f JH |
160 | | fanout[1] | | |
161 | +--------------------------------+ | | |
162 | | fanout[2] | | | |
163 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | |
71362bd5 | 164 | | fanout[255] = total objects |---. |
165 | -- +--------------------------------+ | | | |
166 | main | offset | | | | |
167 | index | object name 00XXXXXXXXXXXXXXXX | | | | |
168 | table +--------------------------------+ | | | |
169 | | offset | | | | |
170 | | object name 00XXXXXXXXXXXXXXXX | | | | |
171 | +--------------------------------+<+ | | |
172 | .-| offset | | | |
173 | | | object name 01XXXXXXXXXXXXXXXX | | | |
174 | | +--------------------------------+ | | |
175 | | | offset | | | |
176 | | | object name 01XXXXXXXXXXXXXXXX | | | |
177 | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | |
178 | | | offset | | | |
179 | | | object name FFXXXXXXXXXXXXXXXX | | | |
180 | --| +--------------------------------+<--+ | |
9760662f JH |
181 | trailer | | packfile checksum | |
182 | | +--------------------------------+ | |
183 | | | idxfile checksum | | |
184 | | +--------------------------------+ | |
a6080a0a | 185 | .-------. |
9760662f JH |
186 | | |
187 | Pack file entry: <+ | |
188 | ||
189 | packed object header: | |
979ea585 PE |
190 | 1-byte size extension bit (MSB) |
191 | type (next 3 bit) | |
a6080a0a | 192 | size0 (lower 4-bit) |
9760662f JH |
193 | n-byte sizeN (as long as MSB is set, each 7-bit) |
194 | size0..sizeN form 4+7+7+..+7 bit integer, size0 | |
979ea585 PE |
195 | is the least significant part, and sizeN is the |
196 | most significant part. | |
9760662f JH |
197 | packed object data: |
198 | If it is not DELTA, then deflated bytes (the size above | |
199 | is the size before compression). | |
9de328fe | 200 | If it is REF_DELTA, then |
d5fa1f1a | 201 | 20-byte base object name SHA-1 (the size above is the |
a6080a0a | 202 | size of the delta data that follows). |
9760662f | 203 | delta data, deflated. |
9de328fe PE |
204 | If it is OFS_DELTA, then |
205 | n-byte offset (see below) interpreted as a negative | |
206 | offset from the type-byte of the header of the | |
207 | ofs-delta entry (the size above is the size of | |
208 | the delta data that follows). | |
209 | delta data, deflated. | |
210 | ||
211 | offset encoding: | |
212 | n bytes with MSB set in all but the last one. | |
213 | The offset is then the number constructed by | |
214 | concatenating the lower 7 bit of each byte, and | |
215 | for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1)) | |
216 | to the result. | |
217 | ||
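The two variable-length encodings above, the packed object header and the ofs-delta offset, can be decoded as sketched below. The function names are hypothetical, and both functions assume `buf` holds raw pack bytes with `pos` pointing at the first byte of the field.

```python
def parse_entry_header(buf, pos=0):
    """Decode a packed object header; return (type, size, next_pos)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # 3 bits following the extension bit
    size = byte & 0x0F             # size0: the lowest 4 bits of the size
    shift = 4
    while byte & 0x80:             # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift  # later bytes are more significant
        shift += 7
    return obj_type, size, pos

def parse_ofs_delta_offset(buf, pos=0):
    """Decode an ofs-delta offset; return (offset, next_pos)."""
    byte = buf[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:             # MSB set in all but the last byte
        byte = buf[pos]
        pos += 1
        # Adding 1 before each shift accumulates the 2^7 + 2^14 + ... bias.
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, pos

print(parse_entry_header(bytes([0xBC, 0x12])))      # (3, 300, 2)
print(parse_ofs_delta_offset(bytes([0x80, 0x00])))  # (128, 2)
```

The base object of an ofs-delta entry then starts at the position of the entry's type byte minus the decoded offset. Because of the added bias, 128 is the smallest value that needs two bytes, so no offset has two encodings.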
71362bd5 | 218 | |
219 | ||
5316c8e9 TA |
220 | == Version 2 pack-*.idx files support packs larger than 4 GiB, and |
221 | have some other reorganizations. They have the format: | |
71362bd5 | 222 | |
223 | - A 4-byte magic number '\377tOc' which is an unreasonable | |
224 | fanout[0] value. | |
225 | ||
226 | - A 4-byte version number (= 2) | |
227 | ||
228 | - A 256-entry fan-out table just like v1. | |
229 | ||
d5fa1f1a | 230 | - A table of sorted 20-byte SHA-1 object names. These are |
71362bd5 | 231 | packed together without offset values to reduce the cache |
232 | footprint of the binary search for a specific object name. | |
233 | ||
234 | - A table of 4-byte CRC32 values of the packed object data. | |
235 | This is new in v2 so compressed data can be copied directly | |
f1cdcc70 | 236 | from pack to pack during repacking without undetected |
71362bd5 | 237 | data corruption. |
238 | ||
239 | - A table of 4-byte offset values (in network byte order). | |
240 | These are usually 31-bit pack file offsets, but large | |
241 | offsets are encoded as an index into the next table with | |
242 | the msbit set. | |
243 | ||
244 | - A table of 8-byte offset entries (empty for pack files less | |
245 | than 2 GiB). Pack files are organized with heavily used | |
246 | objects toward the front, so most object references should | |
247 | not need to refer to this table. | |
248 | ||
249 | - The same trailer as a v1 idx file: | |
250 | ||
d5fa1f1a | 251 | A copy of the 20-byte SHA-1 checksum at the end of |
71362bd5 | 252 | corresponding packfile. |
253 | ||
d5fa1f1a | 254 | 20-byte SHA-1-checksum of all of the above. |
e0d1bcf8 DS |
255 | |
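The interplay between the 4-byte offset table and the 8-byte offset table can be sketched as follows. The function name `resolve_offset_v2` is hypothetical, the two tables are assumed to have been sliced out of the index file already, and the 8-byte entries are assumed to be in network byte order like the 4-byte ones.

```python
import struct

def resolve_offset_v2(offset_table, large_table, i):
    """Return the pack offset of the i-th object in a v2 index."""
    (value,) = struct.unpack_from(">I", offset_table, 4 * i)
    if value & 0x80000000:
        # msbit set: the low 31 bits index into the 8-byte offset table.
        row = value & 0x7FFFFFFF
        (value,) = struct.unpack_from(">Q", large_table, 8 * row)
    return value

# Two objects: one at a small offset, one stored 5 GiB into the pack.
offset_table = struct.pack(">II", 1234, 0x80000000 | 0)
large_table = struct.pack(">Q", 5 * 2**30)
print(resolve_offset_v2(offset_table, large_table, 0))  # 1234
print(resolve_offset_v2(offset_table, large_table, 1))  # 5368709120
```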
256 | == multi-pack-index (MIDX) files have the following format: | |
257 | ||
258 | The multi-pack-index files refer to multiple pack-files and loose objects. | |
259 | ||
260 | In order to allow extensions that add extra data to the MIDX, we organize | |
261 | the body into "chunks" and provide a lookup table at the beginning of the | |
262 | body. The header includes certain length values, such as the number of packs, | |
263 | the number of base MIDX files, hash lengths and types. | |
264 | ||
265 | All 4-byte numbers are in network order. | |
266 | ||
267 | HEADER: | |
268 | ||
269 | 4-byte signature: | |
270 | The signature is: {'M', 'I', 'D', 'X'} | |
271 | ||
272 | 1-byte version number: | |
273 | Git only writes or recognizes version 1. | |
274 | ||
275 | 1-byte Object Id Version | |
276 | Git only writes or recognizes version 1 (SHA1). | |
277 | ||
278 | 1-byte number of "chunks" | |
279 | ||
280 | 1-byte number of base multi-pack-index files: | |
281 | This value is currently always zero. | |
282 | ||
283 | 4-byte number of pack files | |
284 | ||
285 | CHUNK LOOKUP: | |
286 | ||
287 | (C + 1) * 12 bytes providing the chunk offsets: | |
288 | First 4 bytes describe chunk id. Value 0 is a terminating label. | |
289 | Other 8 bytes provide offset in current file for chunk to start. | |
290 | (Chunks are provided in file-order, so you can infer the length | |
291 | using the next chunk position if necessary.) | |
292 | ||
293 | The remaining data in the body is described one chunk at a time, and | |
294 | these chunks may be given in any order. Chunks are required unless | |
295 | otherwise specified. | |
296 | ||
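As an illustration of the header and chunk lookup layout, the sketch below reads both from a MIDX file held in memory. The function name `parse_midx_header` is hypothetical, the 8-byte chunk offsets are assumed to be in network byte order like the 4-byte numbers, and the sample bytes are hand-built and list only two chunks, so they do not form a complete MIDX file.

```python
import struct

def parse_midx_header(data):
    """Return (number_of_packs, {chunk_id: offset}) for a MIDX file."""
    signature, version, oid_version, num_chunks, num_bases, num_packs = \
        struct.unpack_from(">4sBBBBI", data, 0)
    if signature != b"MIDX":
        raise ValueError("bad multi-pack-index signature")
    if version != 1 or oid_version != 1:
        raise ValueError("unsupported version or object id version")
    chunks = {}
    pos = 12  # the fixed-size header is 12 bytes long
    for _ in range(num_chunks + 1):
        # Each lookup row is a 4-byte chunk id and an 8-byte file offset.
        chunk_id, offset = struct.unpack_from(">4sQ", data, pos)
        pos += 12
        if chunk_id == b"\0\0\0\0":
            break  # terminating label
        chunks[chunk_id] = offset
    return num_packs, chunks

sample = b"MIDX" + bytes([1, 1, 2, 0]) + struct.pack(">I", 3)
sample += struct.pack(">4sQ", b"OIDF", 48)
sample += struct.pack(">4sQ", b"PNAM", 1072)
sample += struct.pack(">4sQ", b"\0\0\0\0", 1100)
print(parse_midx_header(sample))  # (3, {b'OIDF': 48, b'PNAM': 1072})
```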
297 | CHUNK DATA: | |
298 | ||
32f3c541 DS |
299 | Packfile Names (ID: {'P', 'N', 'A', 'M'}) |
300 | Stores the packfile names as concatenated, null-terminated strings. | |
301 | Packfiles must be listed in lexicographic order for fast lookups by | |
302 | name. This is the only chunk not guaranteed to be a multiple of four | |
303 | bytes in length, so should be the last chunk for alignment reasons. | |
304 | ||
d7cacf29 DS |
305 | OID Fanout (ID: {'O', 'I', 'D', 'F'}) |
306 | The ith entry, F[i], stores the number of OIDs with first | |
307 | byte at most i. Thus F[255] stores the total | |
308 | number of objects. | |
309 | ||
0d5b3a5e DS |
310 | OID Lookup (ID: {'O', 'I', 'D', 'L'}) |
311 | The OIDs for all objects in the MIDX are stored in lexicographic | |
312 | order in this chunk. | |
313 | ||
662148c4 DS |
314 | Object Offsets (ID: {'O', 'O', 'F', 'F'}) |
315 | Stores two 4-byte values for every object. | |
316 | 1: The pack-int-id for the pack storing this object. | |
317 | 2: The offset within the pack. | |
318 | If all offsets are less than 2^32, then the large offset chunk | |
319 | will not exist and offsets are stored as in IDX v1. | |
320 | If there is at least one offset value larger than 2^32-1, then | |
321 | the large offset chunk must exist. If the large offset chunk | |
322 | exists and the 31st bit is on, then removing that bit reveals | |
323 | the row in the large offsets containing the 8-byte offset of | |
324 | this object. | |
325 | ||
326 | [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'}) | |
327 | 8-byte offsets into large packfiles. | |
e0d1bcf8 DS |
328 | |
329 | TRAILER: | |
330 | ||
331 | 20-byte SHA1-checksum of the above contents. |