Commit | Line | Data |
---|---|---|
00d3e8d7 ÆAB |
1 | gitformat-index(5) |
2 | ================== | |
3 | ||
4 | NAME | |
5 | ---- | |
6 | gitformat-index - Git index format | |
7 | ||
8 | SYNOPSIS | |
9 | -------- | |
10 | [verse] | |
11 | $GIT_DIR/index | |
12 | ||
13 | DESCRIPTION | |
14 | ----------- | |
15 | ||
48a8c26c | 16 | Git index format |
8c7d0517 | 17 | |
2de9b711 | 18 | == The Git index file has the following format |
8c7d0517 | 19 | |
123712ba MÅ |
20 | All binary numbers are in network byte order. |
21 | In a repository using the traditional SHA-1, checksums and object IDs | |
22 | (object names) mentioned below are all computed using SHA-1. Similarly, | |
23 | in SHA-256 repositories, these values are computed using SHA-256. | |
24 | Version 2 is described here unless stated otherwise. | |
8c7d0517 NTND |
25 | |
26 | - A 12-byte header consisting of | |
27 | ||
28 | 4-byte signature: | |
23fcc98f | 29 | The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") |
8c7d0517 NTND |
30 | |
31 | 4-byte version number: | |
300e39f6 | 32 | The currently supported versions are 2, 3 and 4. |
8c7d0517 NTND |
33 | |
34 | 32-bit number of index entries. | |
35 | ||
23fcc98f | 36 | - A number of sorted index entries (see below). |
8c7d0517 NTND |
37 | |
38 | - Extensions | |
39 | ||
40 | Extensions are identified by signature. Optional extensions can | |
48a8c26c | 41 | be ignored if Git does not understand them. |
8c7d0517 | 42 | |
8c7d0517 NTND |
43 | 4-byte extension signature. If the first byte is 'A'..'Z' the |
44 | extension is optional and can be ignored. | |
45 | ||
46 | 32-bit size of the extension | |
47 | ||
48 | Extension data | |
49 | ||
123712ba | 50 | - Hash checksum over the content of the index file before this checksum. |
8c7d0517 NTND |
51 | |
52 | == Index entry | |
53 | ||
54 | Index entries are sorted in ascending order on the name field, | |
23fcc98f JH |
55 | interpreted as a string of unsigned bytes (i.e. memcmp() order, no |
56 | localization, no special casing of directory separator '/'). Entries | |
57 | with the same name are sorted by their stage field. | |
8c7d0517 | 58 | |
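As a small illustration of this ordering, the sketch below sorts hypothetical `(name, stage)` pairs. Python compares `bytes` objects as sequences of unsigned bytes, which matches the memcmp()-style comparison described above; the tuple representation is an assumption made for the example.

```python
def index_sort_key(entry):
    """Sort by raw name bytes first, then by stage."""
    name, stage = entry
    return (name, stage)

entries = [(b"a/b", 0), (b"a.c", 0), (b"a", 2), (b"a", 1), (b"a-b", 0)]
ordered = sorted(entries, key=index_sort_key)
# '-' (0x2d) < '.' (0x2e) < '/' (0x2f): the separator gets no special treatment.
```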
0ad6090b DS |
59 | An index entry typically represents a file. However, if sparse-checkout |
60 | is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the | |
61 | `extensions.sparseIndex` extension is enabled, then the index may | |
62 | contain entries for directories outside of the sparse-checkout definition. | |
63 | These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and | |
64 | the path ends in a directory separator. | |
65 | ||
8c7d0517 NTND |
66 | 32-bit ctime seconds, the last time a file's metadata changed |
67 | this is stat(2) data | |
68 | ||
69 | 32-bit ctime nanosecond fractions | |
70 | this is stat(2) data | |
71 | ||
72 | 32-bit mtime seconds, the last time a file's data changed | |
73 | this is stat(2) data | |
74 | ||
75 | 32-bit mtime nanosecond fractions | |
76 | this is stat(2) data | |
77 | ||
78 | 32-bit dev | |
79 | this is stat(2) data | |
80 | ||
81 | 32-bit ino | |
82 | this is stat(2) data | |
83 | ||
84 | 32-bit mode, split into (high to low bits) | |
85 | ||
3a2ebaeb GC |
86 | 16-bit unused, must be zero |
87 | ||
8c7d0517 | 88 | 4-bit object type |
23fcc98f | 89 | valid values in binary are 1000 (regular file), 1010 (symbolic link) |
8c7d0517 NTND |
90 | and 1110 (gitlink) |
91 | ||
3a2ebaeb | 92 | 3-bit unused, must be zero |
8c7d0517 | 93 | |
23fcc98f JH |
94 | 9-bit unix permission. Only 0755 and 0644 are valid for regular files. |
95 | Symbolic links and gitlinks have value 0 in this field. | |
8c7d0517 NTND |
96 | |
97 | 32-bit uid | |
98 | this is stat(2) data | |
99 | ||
100 | 32-bit gid | |
101 | this is stat(2) data | |
102 | ||
103 | 32-bit file size | |
23fcc98f | 104 | This is the on-disk size from stat(2), truncated to 32-bit. |
8c7d0517 | 105 | |
123712ba | 106 | Object name for the represented object |
8c7d0517 | 107 | |
23fcc98f | 108 | A 16-bit 'flags' field split into (high to low bits) |
8c7d0517 NTND |
109 | |
110 | 1-bit assume-valid flag | |
111 | ||
112 | 1-bit extended flag (must be zero in version 2) | |
113 | ||
114 | 2-bit stage (during merge) | |
115 | ||
23fcc98f JH |
116 | 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF |
117 | is stored in this field. | |
8c7d0517 | 118 | |
300e39f6 NTND |
119 | (Version 3 or later) A 16-bit field, only applicable if the |
120 | "extended flag" above is 1, split into (high to low bits). | |
8c7d0517 NTND |
121 | |
122 | 1-bit reserved for future | |
123 | ||
124 | 1-bit skip-worktree flag (used by sparse checkout) | |
125 | ||
126 | 1-bit intent-to-add flag (used by "git add -N") | |
127 | ||
128 | 13-bit unused, must be zero | |
129 | ||
130 | Entry path name (variable length) relative to top level directory | |
131 | (without leading slash). '/' is used as path separator. The special | |
23fcc98f | 132 | path components ".", ".." and ".git" (without quotes) are disallowed. |
8c7d0517 NTND |
133 | Trailing slash is also disallowed. |
134 | ||
135 | The exact encoding is undefined, but the '.' and '/' characters | |
23fcc98f JH |
136 | are encoded in 7-bit ASCII and the encoding cannot contain a NUL |
137 | byte (iow, this is a UNIX pathname). | |
8c7d0517 | 138 | |
afd7bd22 JH |
139 | (Version 4) In version 4, the entry path name is prefix-compressed |
140 | relative to the path name for the previous entry (the very first | |
141 | entry is encoded as if the path name for the previous entry is an | |
142 | empty string). At the beginning of an entry, an integer N in the | |
143 | variable width encoding (the same encoding as the offset is encoded | |
977c47b4 | 144 | for OFS_DELTA pack entries; see linkgit:gitformat-pack[5]) is stored, followed |
afd7bd22 JH |
145 | by a NUL-terminated string S. Removing N bytes from the end of the |
146 | path name for the previous entry, and replacing it with the string S | |
147 | yields the path name for this entry. | |
148 | ||
8c7d0517 NTND |
149 | 1-8 NUL bytes as necessary to pad the entry to a multiple of eight bytes |
150 | while keeping the name NUL-terminated. | |
151 | ||
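The fields listed so far can be decoded as in the following sketch for a version 2 entry. It is a simplified reading of the layout, not Git's implementation: the function name and the returned dictionary keys are hypothetical, and a 20-byte (SHA-1) object name is assumed by default.

```python
import struct

def parse_entry_v2(data, offset=0, oid_len=20):
    """Decode one version 2 entry; return (fields, offset_of_next_entry)."""
    start = offset
    # Ten 32-bit stat-like fields in network byte order.
    (ctime_s, ctime_ns, mtime_s, mtime_ns, dev, ino,
     mode, uid, gid, size) = struct.unpack_from(">10I", data, offset)
    offset += 40
    oid = data[offset:offset + oid_len]
    offset += oid_len
    (flags,) = struct.unpack_from(">H", data, offset)
    offset += 2
    if flags & 0x4000:
        raise ValueError("extended flag must be zero in version 2")
    # The name is NUL-terminated, so scanning for NUL also works when the
    # 12-bit length field is saturated at 0xFFF.
    end = data.index(b"\x00", offset)
    fields = {
        "object_type": (mode >> 12) & 0xF,   # 4-bit object type
        "permissions": mode & 0x1FF,         # 9-bit unix permission
        "size": size,
        "oid": oid,
        "assume_valid": bool(flags & 0x8000),
        "stage": (flags >> 12) & 0x3,
        "name_length": flags & 0xFFF,
        "name": data[offset:end],
    }
    # 1-8 NUL bytes pad the entry to a multiple of eight bytes.
    next_offset = start + ((end - start) // 8 + 1) * 8
    return fields, next_offset

# Synthetic entry for a regular file named "README" with mode 100644.
name = b"README"
raw = (struct.pack(">10I", 1, 0, 2, 0, 0, 0, 0o100644, 0, 0, 5)
       + bytes(20) + struct.pack(">H", len(name)) + name)
raw += b"\x00" * (8 - len(raw) % 8)
fields, next_offset = parse_entry_v2(raw)
```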
afd7bd22 JH |
152 | (Version 4) In version 4, the padding after the pathname does not |
153 | exist. | |
154 | ||
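The version 4 prefix compression can be sketched as follows. For illustration only, the buffer below holds just the path fields back to back, with the fixed-size fields of each entry omitted; the function names are hypothetical. The variable width decoder follows the OFS_DELTA offset encoding referenced above.

```python
def decode_varint(data, offset):
    """Decode an OFS_DELTA-style variable width integer."""
    byte = data[offset]
    offset += 1
    value = byte & 0x7F
    while byte & 0x80:  # high bit set: another byte follows
        byte = data[offset]
        offset += 1
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, offset

def decode_v4_path(data, offset, previous):
    """Return (path, next_offset) given the previous entry's path."""
    strip, offset = decode_varint(data, offset)
    if strip > len(previous):
        raise ValueError("cannot remove more bytes than the previous path has")
    end = data.index(b"\x00", offset)
    suffix = data[offset:end]
    # Remove N bytes from the end of the previous path, then append S.
    return previous[:len(previous) - strip] + suffix, end + 1

# Three synthetic path fields; the first is relative to the empty string.
stream = b"\x00src/main.c\x00" + b"\x01h\x00" + b"\x06util.c\x00"
paths, offset, previous = [], 0, b""
while offset < len(stream):
    previous, offset = decode_v4_path(stream, offset, previous)
    paths.append(previous)
```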
5fc2fc8f NTND |
155 | Interpretation of index entries in split index mode is completely |
156 | different. See below for details. | |
157 | ||
8c7d0517 NTND |
158 | == Extensions |
159 | ||
845d15d4 | 160 | === Cache tree |
8c7d0517 | 161 | |
22ad8600 DS |
162 | Since the index does not record entries for directories, the cache |
163 | entries cannot describe tree objects that already exist in the object | |
164 | database for regions of the index that are unchanged from an existing | |
165 | commit. The cache tree extension stores a recursive tree structure that | |
166 | describes the trees that already exist and completely match sections of | |
167 | the cache entries. This speeds up tree object generation from the index | |
168 | for a new commit by only computing the trees that are "new" to that | |
169 | commit. It also assists when comparing the index to another tree, such | |
170 | as `HEAD^{tree}`, since sections of the index can be skipped when a tree | |
171 | comparison demonstrates equality. | |
172 | ||
173 | The recursive tree structure uses nodes that store a number of cache | |
174 | entries, a list of subnodes, and an object ID (OID). The OID references | |
175 | the existing tree for that node, if it is known to exist. The subnodes | |
176 | correspond to subdirectories that themselves have cache tree nodes. The | |
177 | number of cache entries corresponds to the number of cache entries in | |
178 | the index that describe paths within that tree's directory. | |
179 | ||
180 | The cache tree extension tracks the full directory structure, but this | |
181 | is generally smaller than the full cache entry list. | |
182 | ||
183 | When a path is updated in the index, Git invalidates all nodes of the | |
184 | recursive cache tree corresponding to the parent directories of that | |
185 | path. We store these tree nodes as being "invalid" by using "-1" as the | |
186 | number of cache entries. Invalid nodes still store a span of index | |
187 | entries, allowing Git to focus its efforts when reconstructing a full | |
188 | cache tree. | |
8c7d0517 | 189 | |
23fcc98f | 190 | The signature for this extension is { 'T', 'R', 'E', 'E' }. |
8c7d0517 | 191 | |
23fcc98f JH |
192 | A series of entries fills the entire extension; each entry |
193 | consists of: | |
8c7d0517 | 194 | |
23fcc98f | 195 | - NUL-terminated path component (relative to its parent directory); |
8c7d0517 | 196 | |
23fcc98f JH |
197 | - ASCII decimal number of entries in the index that are covered by the |
198 | tree this entry represents (entry_count); | |
8c7d0517 | 199 | |
23fcc98f | 200 | - A space (ASCII 32); |
8c7d0517 | 201 | |
23fcc98f JH |
202 | - ASCII decimal number that represents the number of subtrees this |
203 | tree has; | |
8c7d0517 | 204 | |
23fcc98f JH |
205 | - A newline (ASCII 10); and |
206 | ||
123712ba MÅ |
207 | - Object name for the object that would result from writing this span |
208 | of index as a tree. | |
23fcc98f | 209 | |
e44b6df9 | 210 | An entry can be in an invalidated state, which is represented by |
4a6385fe NTND |
211 | a negative number in the entry_count field. In this case, there is no |
212 | object name and the next entry starts immediately after the newline. | |
213 | When writing an invalid entry, -1 should always be used as entry_count. | |
23fcc98f JH |
214 | |
215 | The entries are written out in the top-down, depth-first order. The | |
216 | first entry represents the root level of the repository, followed by the | |
3b19dba7 | 217 | first subtree--let's call this A--of the root level (with its name |
23fcc98f | 218 | relative to the root level), followed by the first subtree of A (with |
4bdde337 DS |
219 | its name relative to A), and so on. The specified number of subtrees |
220 | indicates when the current level of the recursive stack is complete. | |
8c7d0517 NTND |
221 | |
222 | === Resolve undo | |
223 | ||
23fcc98f | 224 | A conflict is represented in the index as a set of higher stage entries. |
8c7d0517 | 225 | When a conflict is resolved (e.g. with "git add path"), these higher |
17b83d71 | 226 | stage entries will be removed and a stage-0 entry with proper resolution |
23fcc98f | 227 | is added. |
8c7d0517 | 228 | |
23fcc98f JH |
229 | When these higher stage entries are removed, they are saved in the |
230 | resolve undo extension, so that conflicts can be recreated (e.g. with | |
231 | "git checkout -m"), in case users want to redo a conflict resolution | |
232 | from scratch. | |
8c7d0517 | 233 | |
23fcc98f | 234 | The signature for this extension is { 'R', 'E', 'U', 'C' }. |
8c7d0517 | 235 | |
23fcc98f JH |
236 | A series of entries fills the entire extension; each entry |
237 | consists of: | |
8c7d0517 | 238 | |
23fcc98f JH |
239 | - NUL-terminated pathname the entry describes (relative to the root of |
240 | the repository, i.e. full pathname); | |
8c7d0517 | 241 | |
23fcc98f JH |
242 | - Three NUL-terminated ASCII octal numbers, entry mode of entries in |
243 | stage 1 to 3 (a missing stage is represented by "0" in this field); | |
244 | and | |
8c7d0517 | 245 | |
123712ba | 246 | - At most three object names of the entry in stages from 1 to 3 |
23fcc98f | 247 | (nothing is written for a missing stage). |
8c7d0517 | 248 | |
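The entry layout above can be read back with a short loop. This sketch assumes 20-byte object names and works on the extension data alone; the function name and the shape of the result are hypothetical.

```python
def parse_resolve_undo(data, oid_len=20):
    """Return a list of (path, {stage: (mode, oid)}) records."""
    records = []
    offset = 0
    while offset < len(data):
        end = data.index(b"\x00", offset)
        path = data[offset:end]
        offset = end + 1
        modes = []
        for _ in range(3):                   # stages 1 to 3, ASCII octal
            end = data.index(b"\x00", offset)
            modes.append(int(data[offset:end], 8))
            offset = end + 1
        stages = {}
        for stage, mode in enumerate(modes, start=1):
            if mode == 0:                    # missing stage: no object name stored
                continue
            stages[stage] = (mode, data[offset:offset + oid_len])
            offset += oid_len
        records.append((path, stages))
    return records

# Synthetic record: stage 2 is missing, stages 1 and 3 are regular files.
reuc_data = (b"file.txt\x00" + b"100644\x00" + b"0\x00" + b"100644\x00"
             + b"\x11" * 20 + b"\x33" * 20)
records = parse_resolve_undo(reuc_data)
```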
5fc2fc8f NTND |
249 | === Split index |
250 | ||
251 | In split index mode, the majority of index entries could be stored | |
252 | in a separate file. This extension records the changes to be made on | |
253 | top of that to produce the final index. | |
254 | ||
f2667a83 | 255 | The signature for this extension is { 'l', 'i', 'n', 'k' }. |
5fc2fc8f NTND |
256 | |
257 | The extension consists of: | |
258 | ||
123712ba MÅ |
259 | - Hash of the shared index file. The shared index file path |
260 | is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the | |
5fc2fc8f NTND |
261 | index does not require a shared index file. |
262 | ||
263 | - An ewah-encoded delete bitmap; each bit represents an entry in the |
264 | shared index. If a bit is set, its corresponding entry in the | |
265 | shared index will be removed from the final index. Note: because | |
266 | a delete operation changes index entry positions, and the replace | |
267 | phase needs the original positions, it is best to just mark | |
268 | entries for removal, then do a mass deletion after replacement. | |
269 | ||
270 | - An ewah-encoded replace bitmap; each bit represents an entry in | |
271 | the shared index. If a bit is set, its corresponding entry in the | |
272 | shared index will be replaced with an entry in this index | |
273 | file. All replaced entries are stored in sorted order in this | |
274 | index. The first "1" bit in the replace bitmap corresponds to the | |
275 | first index entry, the second "1" bit to the second entry and so | |
276 | on. Replaced entries may have empty path names to save space. | |
277 | ||
278 | The remaining index entries after replaced ones will be added to the | |
f745acb0 | 279 | final index. These added entries are also sorted by entry name then |
5fc2fc8f | 280 | stage. |
83c094ad NTND |
281 | |
282 | === Untracked cache | |
283 | ||
284 | Untracked cache saves the untracked file list and necessary data to | |
285 | verify the cache. The signature for this extension is { 'U', 'N', | |
286 | 'T', 'R' }. | |
287 | ||
288 | The extension starts with | |
289 | ||
1e8fef60 NTND |
290 | - A sequence of NUL-terminated strings, preceded by the size of the |
291 | sequence in variable width encoding. Each string describes the | |
292 | environment where the cache can be used. | |
293 | ||
83c094ad NTND |
294 | - Stat data of $GIT_DIR/info/exclude. See the "Index entry" section, |
295 | from the ctime field through "file size". | |
296 | ||
7dd0eaa3 | 297 | - Stat data of core.excludesFile |
83c094ad NTND |
298 | |
299 | - 32-bit dir_flags (see struct dir_struct) | |
300 | ||
123712ba | 301 | - Hash of $GIT_DIR/info/exclude. A null hash means the file |
83c094ad NTND |
302 | does not exist. |
303 | ||
7dd0eaa3 | 304 | - Hash of core.excludesFile. A null hash means the file does |
83c094ad NTND |
305 | not exist. |
306 | ||
307 | - NUL-terminated string of per-dir exclude file name. This usually | |
308 | is ".gitignore". | |
309 | ||
310 | - The number of following directory blocks, variable width | |
311 | encoding. If this number is zero, the extension ends here with a | |
312 | following NUL. | |
313 | ||
314 | - A number of directory blocks in depth-first-search order, each | |
315 | consisting of | |
316 | ||
317 | - The number of untracked entries, variable width encoding. | |
318 | ||
319 | - The number of sub-directory blocks, variable width encoding. | |
320 | ||
321 | - The directory name terminated by NUL. | |
322 | ||
da4c5ada | 323 | - A number of untracked file/dir names terminated by NUL. |
83c094ad NTND |
324 | |
325 | The remaining data of each directory block is grouped by type: | |
326 | ||
327 | - An ewah bitmap, the n-th bit marks whether the n-th directory has | |
328 | valid untracked cache entries. | |
329 | ||
330 | - An ewah bitmap, the n-th bit records "check-only" bit of | |
331 | read_directory_recursive() for the n-th directory. | |
332 | ||
123712ba | 333 | - An ewah bitmap, the n-th bit indicates whether hash and stat data |
83c094ad NTND |
334 | is valid for the n-th directory and exists in the next data. |
335 | ||
336 | - An array of stat data. The n-th data corresponds with the n-th | |
337 | "one" bit in the previous ewah bitmap. | |
338 | ||
123712ba | 339 | - An array of hashes. The n-th hash corresponds with the n-th "one" bit |
83c094ad NTND |
340 | in the previous ewah bitmap. |
341 | ||
342 | - One NUL. | |
780494b1 BP |
343 | |
344 | === File System Monitor cache | |
345 | ||
346 | The file system monitor cache tracks files for which the core.fsmonitor | |
347 | hook has told us about changes. The signature for this extension is | |
348 | { 'F', 'S', 'M', 'N' }. | |
349 | ||
350 | The extension starts with | |
351 | ||
5885367e | 352 | - 32-bit version number: the currently supported versions are 1 and 2. |
780494b1 | 353 | |
5885367e JH |
354 | - (Version 1) |
355 | 64-bit time: the extension data reflects all changes through the given | |
780494b1 BP |
356 | time which is stored as the nanoseconds elapsed since midnight, |
357 | January 1, 1970. | |
358 | ||
5885367e JH |
359 | - (Version 2) |
360 | A NUL-terminated string: an opaque token defined by the file system | |
361 | monitor application. The extension data reflects all changes relative | |
362 | to that token. | |
363 | ||
780494b1 BP |
364 | - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. |
365 | ||
366 | - An ewah bitmap, the n-th bit indicates whether the n-th index entry | |
367 | is not CE_FSMONITOR_VALID. | |
3b1d9e04 BP |
368 | |
369 | === End of Index Entry | |
370 | ||
371 | The End of Index Entry (EOIE) is used to locate the end of the variable | |
031fd4b9 | 372 | length index entries and the beginning of the extensions. Code can take |
3b1d9e04 BP |
373 | advantage of this to quickly locate the index extensions without having |
374 | to parse through all of the index entries. | |
375 | ||
376 | Because it must be able to be loaded before the variable length cache | |
377 | entries and other index extensions, this extension must be written last. | |
378 | The signature for this extension is { 'E', 'O', 'I', 'E' }. | |
379 | ||
380 | The extension consists of: | |
381 | ||
382 | - 32-bit offset to the end of the index entries | |
383 | ||
123712ba | 384 | - Hash over the extension types and their sizes (but not |
3b1d9e04 BP |
385 | their contents). E.g. if we have "TREE" extension that is N-bytes |
386 | long, "REUC" extension that is M-bytes long, followed by "EOIE", | |
387 | then the hash would be: | |
388 | ||
123712ba | 389 | Hash("TREE" + <binary representation of N> + |
3b1d9e04 | 390 | "REUC" + <binary representation of M>) |
3255089a BP |
391 | |
392 | === Index Entry Offset Table | |
393 | ||
394 | The Index Entry Offset Table (IEOT) is used to help address the CPU | |
395 | cost of loading the index by enabling multi-threading the process of | |
396 | converting cache entries from the on-disk format to the in-memory format. | |
397 | The signature for this extension is { 'I', 'E', 'O', 'T' }. | |
398 | ||
399 | The extension consists of: | |
400 | ||
401 | - 32-bit version (currently 1) | |
402 | ||
403 | - A number of index offset entries each consisting of: | |
404 | ||
031fd4b9 | 405 | - 32-bit offset from the beginning of the file to the first cache entry |
3255089a BP |
406 | in this block of entries. |
407 | ||
408 | - 32-bit count of cache entries in this block | |
cd42415f DS |
409 | |
410 | === Sparse Directory Entries | |
411 | ||
412 | When using sparse-checkout in cone mode, some entire directories within | |
413 | the index can be summarized by pointing to a tree object instead of the | |
414 | entire expanded list of paths within that tree. An index containing such | |
415 | entries is a "sparse index". Index format versions 4 and earlier were not | |
416 | implemented with such entries in mind. Thus, for these versions, an | |
417 | index containing sparse directory entries will include this extension | |
418 | with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, | |
419 | tools should avoid interacting with a sparse index unless they understand | |
420 | this extension. | |
00d3e8d7 ÆAB |
421 | |
422 | GIT | |
423 | --- | |
424 | Part of the linkgit:git[1] suite |