]>
Commit | Line | Data |
---|---|---|
ebba6c0c TA |
1 | Concerning Git's Packing Heuristics |
2 | =================================== | |
b116b297 JL |
3 | |
4 | Oh, here's a really stupid question: | |
5 | ||
6 | Where do I go | |
7 | to learn the details | |
2de9b711 | 8 | of Git's packing heuristics? |
b116b297 JL |
9 | |
10 | Be careful what you ask! | |
11 | ||
2de9b711 | 12 | Followers of the Git, please open the Git IRC Log and turn to |
b116b297 JL |
13 | February 10, 2006. |
14 | ||
15 | It's a rare occasion, and we are joined by the King Git Himself, | |
16 | Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor | |
17 | and seeks enlightenment. Others are present, but silent. | |
18 | ||
19 | Let's listen in! | |
20 | ||
21 | <njs`> Oh, here's a really stupid question -- where do I go to | |
2de9b711 | 22 | learn the details of Git's packing heuristics? google avails |
b116b297 JL |
23 | me not, reading the source didn't help a lot, and wading |
24 | through the whole mailing list seems less efficient than any | |
25 | of that. | |
26 | ||
27 | It is a bold start! A plea for help combined with a simultaneous | |
28 | tri-part attack on some of the tried and true mainstays in the quest | |
29 | for enlightenment. Brash accusations of google being useless. Hubris! | |
30 | Maligning the source. Heresy! Disdain for the mailing list archives. | |
31 | Woe. | |
32 | ||
33 | <pasky> yes, the packing-related delta stuff is somewhat | |
34 | mysterious even for me ;) | |
35 | ||
36 | Ah! Modesty after all. | |
37 | ||
38 | <linus> njs, I don't think the docs exist. That's something where | |
39 | I don't think anybody else than me even really got involved. | |
2de9b711 | 40 | Most of the rest of Git others have been busy with (especially |
b116b297 JL |
41 | Junio), but packing nobody touched after I did it. |
42 | ||
43 | It's cryptic, yet vague. Linus in style for sure. Wise men | |
44 | interpret this as an apology. A few argue it is merely a | |
45 | statement of fact. | |
46 | ||
47 | <njs`> I guess the next step is "read the source again", but I | |
48 | have to build up a certain level of gumption first :-) | |
49 | ||
50 | Indeed! On both points. | |
51 | ||
52 | <linus> The packing heuristic is actually really really simple. | |
53 | ||
54 | Bait... | |
55 | ||
56 | <linus> But strange. | |
57 | ||
58 | And switch. That ought to do it! | |
59 | ||
2de9b711 | 60 | <linus> Remember: Git really doesn't follow files. So what it does is |
b116b297 JL |
61 | - generate a list of all objects |
62 | - sort the list according to magic heuristics | |
63 | - walk the list, using a sliding window, seeing if an object | |
64 | can be diffed against another object in the window | |
65 | - write out the list in recency order | |
66 | ||
67 | The traditional understatement: | |
68 | ||
69 | <njs`> I suspect that what I'm missing is the precise definition of | |
70 | the word "magic" | |
71 | ||
72 | The traditional insight: | |
73 | ||
74 | <pasky> yes | |
75 | ||
addf88e4 | 76 | And Babel-like confusion flowed. |
b116b297 JL |
77 | |
78 | <njs`> oh, hmm, and I'm not sure what this sliding window means either | |
79 | ||
80 | <pasky> iirc, it appeared to me to be just the sha1 of the object | |
81 | when reading the code casually ... | |
82 | ||
83 | ... which simply doesn't sound as a very good heuristics, though ;) | |
84 | ||
85 | <njs`> .....and recency order. okay, I think it's clear I didn't | |
86 | even realize how much I wasn't realizing :-) | |
87 | ||
88 | Ah, grasshopper! And thus the enlightenment begins anew. | |
89 | ||
90 | <linus> The "magic" is actually in theory totally arbitrary. | |
91 | ANY order will give you a working pack, but no, it's not | |
d5fa1f1a | 92 | ordered by SHA-1. |
b116b297 JL |
93 | |
94 | Before talking about the ordering for the sliding delta | |
95 | window, let's talk about the recency order. That's more | |
96 | important in one way. | |
97 | ||
98 | <njs`> Right, but if all you want is a working way to pack things | |
99 | together, you could just use cat and save yourself some | |
100 | trouble... | |
101 | ||
102 | Waaait for it.... | |
103 | ||
104 | <linus> The recency ordering (which is basically: put objects | |
105 | _physically_ into the pack in the order that they are | |
106 | "reachable" from the head) is important. | |
107 | ||
108 | <njs`> okay | |
109 | ||
110 | <linus> It's important because that's the thing that gives packs | |
111 | good locality. It keeps the objects close to the head (whether | |
112 | they are old or new, but they are _reachable_ from the head) | |
113 | at the head of the pack. So packs actually have absolutely | |
114 | _wonderful_ IO patterns. | |
115 | ||
116 | Read that again, because it is important. | |
117 | ||
118 | <linus> But recency ordering is totally useless for deciding how | |
119 | to actually generate the deltas, so the delta ordering is | |
120 | something else. | |
121 | ||
122 | The delta ordering is (wait for it): | |
123 | - first sort by the "basename" of the object, as defined by | |
124 | the name the object was _first_ reached through when | |
125 | generating the object list | |
126 | - within the same basename, sort by size of the object | |
127 | - but always sort different types separately (commits first). | |
128 | ||
129 | That's not exactly it, but it's very close. | |
130 | ||
131 | <njs`> The "_first_ reached" thing is not too important, just you | |
132 | need some way to break ties since the same objects may be | |
133 | reachable many ways, yes? | |
134 | ||
135 | And as if to clarify: | |
136 | ||
137 | <linus> The point is that it's all really just any random | |
138 | heuristic, and the ordering is totally unimportant for | |
139 | correctness, but it helps a lot if the heuristic gives | |
140 | "clumping" for things that are likely to delta well against | |
141 | each other. | |
142 | ||
143 | It is an important point, so secretly, I did my own research and have | |
144 | included my results below. To be fair, it has changed some over time. | |
145 | And through the magic of Revisionistic History, I draw upon this entry | |
146 | from The Git IRC Logs on my father's birthday, March 1: | |
147 | ||
148 | <gitster> The quote from the above linus should be rewritten a | |
149 | bit (wait for it): | |
150 | - first sort by type. Different objects never delta with | |
151 | each other. | |
152 | - then sort by filename/dirname. hash of the basename | |
153 | occupies the top BITS_PER_INT-DIR_BITS bits, and bottom | |
154 | DIR_BITS are for the hash of leading path elements. | |
155 | - then if we are doing "thin" pack, the objects we are _not_ | |
156 | going to pack but we know about are sorted earlier than | |
157 | other objects. | |
158 | - and finally sort by size, larger to smaller. | |
159 | ||
160 | In one swell-foop, clarification and obscurification! Nonetheless, | |
161 | authoritative. Cryptic, yet concise. It even solicits notions of | |
162 | quotes from The Source Code. Clearly, more study is needed. | |
163 | ||
164 | <gitster> That's the sort order. What this means is: | |
165 | - we do not delta different object types. | |
166 | - we prefer to delta the objects with the same full path, but | |
167 | allow files with the same name from different directories. | |
168 | - we always prefer to delta against objects we are not going | |
169 | to send, if there are some. | |
170 | - we prefer to delta against larger objects, so that we have | |
171 | lots of removals. | |
172 | ||
173 | The penultimate rule is for "thin" packs. It is used when | |
174 | the other side is known to have such objects. | |
175 | ||
176 | There it is again. "Thin" packs. I'm thinking to myself, "What | |
177 | is a 'thin' pack?" So I ask: | |
178 | ||
179 | <jdl> What is a "thin" pack? | |
180 | ||
181 | <gitster> Use of --objects-edge to rev-list as the upstream of | |
182 | pack-objects. The pack transfer protocol negotiates that. | |
183 | ||
184 | Woo hoo! Cleared that _right_ up! | |
185 | ||
186 | <gitster> There are two directions - push and fetch. | |
187 | ||
188 | There! Did you see it? It is not '"push" and "pull"'! How often the | |
189 | confusion has started here. So casually mentioned, too! | |
190 | ||
191 | <gitster> For push, git-send-pack invokes git-receive-pack on the | |
192 | other end. The receive-pack says "I have up to these commits". | |
193 | send-pack looks at them, and computes what are missing from | |
194 | the other end. So "thin" could be the default there. | |
195 | ||
196 | In the other direction, fetch, git-fetch-pack and | |
197 | git-clone-pack invokes git-upload-pack on the other end | |
198 | (via ssh or by talking to the daemon). | |
199 | ||
200 | There are two cases: fetch-pack with -k and clone-pack is one, | |
201 | fetch-pack without -k is the other. clone-pack and fetch-pack | |
202 | with -k will keep the downloaded packfile without expanded, so | |
203 | we do not use thin pack transfer. Otherwise, the generated | |
204 | pack will have delta without base object in the same pack. | |
205 | ||
206 | But fetch-pack without -k will explode the received pack into | |
207 | individual objects, so we automatically ask upload-pack to | |
208 | give us a thin pack if upload-pack supports it. | |
209 | ||
210 | OK then. | |
211 | ||
212 | Uh. | |
213 | ||
214 | Let's return to the previous conversation still in progress. | |
215 | ||
216 | <njs`> and "basename" means something like "the tail of end of | |
217 | path of file objects and dir objects, as per basename(3), and | |
218 | we just declare all commit and tag objects to have the same | |
219 | basename" or something? | |
220 | ||
221 | Luckily, that too is a point that gitster clarified for us! | |
222 | ||
223 | If I might add, the trick is to make files that _might_ be similar be | |
224 | located close to each other in the hash buckets based on their file | |
225 | names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and | |
226 | "Makefile" all landed in the same bucket due to their common basename, | |
227 | "Makefile". However, now they land in "close" buckets. | |
228 | ||
229 | The algorithm allows not just for the _same_ bucket, but for _close_ | |
230 | buckets to be considered delta candidates. The rationale is | |
231 | essentially that files, like Makefiles, often have very similar | |
232 | content no matter what directory they live in. | |
233 | ||
234 | <linus> I played around with different delta algorithms, and with | |
235 | making the "delta window" bigger, but having too big of a | |
236 | sliding window makes it very expensive to generate the pack: | |
237 | you need to compare every object with a _ton_ of other objects. | |
238 | ||
239 | There are a number of other trivial heuristics too, which | |
240 | basically boil down to "don't bother even trying to delta this | |
241 | pair" if we can tell before-hand that the delta isn't worth it | |
242 | (due to size differences, where we can take a previous delta | |
243 | result into account to decide that "ok, no point in trying | |
244 | that one, it will be worse"). | |
245 | ||
246 | End result: packing is actually very size efficient. It's | |
247 | somewhat CPU-wasteful, but on the other hand, since you're | |
248 | really only supposed to do it maybe once a month (and you can | |
249 | do it during the night), nobody really seems to care. | |
250 | ||
251 | Nice Engineering Touch, there. Find when it doesn't matter, and | |
252 | proclaim it a non-issue. Good style too! | |
253 | ||
254 | <njs`> So, just to repeat to see if I'm following, we start by | |
255 | getting a list of the objects we want to pack, we sort it by | |
256 | this heuristic (basically lexicographically on the tuple | |
257 | (type, basename, size)). | |
258 | ||
259 | Then we walk through this list, and calculate a delta of | |
addf88e4 | 260 | each object against the last n (tunable parameter) objects, |
b116b297 JL |
261 | and pick the smallest of these deltas. |
262 | ||
263 | Vastly simplified, but the essence is there! | |
264 | ||
265 | <linus> Correct. | |
266 | ||
267 | <njs`> And then once we have picked a delta or fulltext to | |
268 | represent each object, we re-sort by recency, and write them | |
269 | out in that order. | |
270 | ||
271 | <linus> Yup. Some other small details: | |
272 | ||
273 | And of course there is the "Other Shoe" Factor too. | |
274 | ||
275 | <linus> - We limit the delta depth to another magic value (right | |
276 | now both the window and delta depth magic values are just "10") | |
277 | ||
278 | <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO | |
279 | patterns, because the things you want are near by, but to | |
280 | actually reconstruct them you may have to jump all over in | |
281 | random ways. | |
282 | ||
283 | <linus> - When we write out a delta, and we haven't yet written | |
284 | out the object it is a delta against, we write out the base | |
285 | object first. And no, when we reconstruct them, we actually | |
286 | get nice IO patterns, because: | |
287 | - larger objects tend to be "more recent" (Linus' law: files grow) | |
288 | - we actively try to generate deltas from a larger object to a | |
289 | smaller one | |
290 | - this means that the top-of-tree very seldom has deltas | |
abda1ef5 | 291 | (i.e. deltas in _practice_ are "backwards deltas") |
b116b297 JL |
292 | |
293 | Again, we should reread that whole paragraph. Not just because | |
294 | Linus has slipped Linus's Law in there on us, but because it is | |
295 | important. Let's make sure we clarify some of the points here: | |
296 | ||
297 | <njs`> So the point is just that in practice, delta order and | |
298 | recency order match each other quite well. | |
299 | ||
300 | <linus> Yes. There's another nice side to this (and yes, it was | |
301 | designed that way ;): | |
302 | - the reason we generate deltas against the larger object is | |
303 | actually a big space saver too! | |
304 | ||
305 | <njs`> Hmm, but your last comment (if "we haven't yet written out | |
306 | the object it is a delta against, we write out the base object | |
307 | first"), seems like it would make these facts mostly | |
308 | irrelevant because even if in practice you would not have to | |
309 | wander around much, in fact you just brute-force say that in | |
310 | the cases where you might have to wander, don't do that :-) | |
311 | ||
312 | <linus> Yes and no. Notice the rule: we only write out the base | |
313 | object first if the delta against it was more recent. That | |
314 | means that you can actually have deltas that refer to a base | |
315 | object that is _not_ close to the delta object, but that only | |
316 | happens when the delta is needed to generate an _old_ object. | |
317 | ||
318 | <linus> See? | |
319 | ||
320 | Yeah, no. I missed that on the first two or three readings myself. | |
321 | ||
322 | <linus> This keeps the front of the pack dense. The front of the | |
323 | pack never contains data that isn't relevant to a "recent" | |
324 | object. The size optimization comes from our use of xdelta | |
325 | (but is true for many other delta algorithms): removing data | |
326 | is cheaper (in size) than adding data. | |
327 | ||
328 | When you remove data, you only need to say "copy bytes n--m". | |
329 | In contrast, in a delta that _adds_ data, you have to say "add | |
330 | these bytes: 'actual data goes here'" | |
331 | ||
332 | *** njs` has quit: Read error: 104 (Connection reset by peer) | |
333 | ||
334 | <linus> Uhhuh. I hope I didn't blow njs` mind. | |
335 | ||
336 | *** njs` has joined channel #git | |
337 | ||
338 | <pasky> :) | |
339 | ||
340 | The silent observers are amused. Of course. | |
341 | ||
342 | And as if njs` was expected to be omniscient: | |
343 | ||
344 | <linus> njs - did you miss anything? | |
345 | ||
346 | OK, I'll spell it out. That's Geek Humor. If njs` was not actually | |
347 | connected for a little bit there, how would he know if missed anything | |
348 | while he was disconnected? He's a benevolent dictator with a sense of | |
349 | humor! Well noted! | |
350 | ||
351 | <njs`> Stupid router. Or gremlins, or whatever. | |
352 | ||
353 | It's a cheap shot at Cisco. Take 'em when you can. | |
354 | ||
355 | <njs`> Yes and no. Notice the rule: we only write out the base | |
356 | object first if the delta against it was more recent. | |
357 | ||
358 | I'm getting lost in all these orders, let me re-read :-) | |
359 | So the write-out order is from most recent to least recent? | |
360 | (Conceivably it could be the opposite way too, I'm not sure if | |
361 | we've said) though my connection back at home is logging, so I | |
362 | can just read what you said there :-) | |
363 | ||
364 | And for those of you paying attention, the Omniscient Trick has just | |
365 | been detailed! | |
366 | ||
367 | <linus> Yes, we always write out most recent first | |
368 | ||
b116b297 JL |
369 | <njs`> And, yeah, I got the part about deeper-in-history stuff |
370 | having worse IO characteristics, one sort of doesn't care. | |
371 | ||
372 | <linus> With the caveat that if the "most recent" needs an older | |
373 | object to delta against (hey, shrinking sometimes does | |
374 | happen), we write out the old object with the delta. | |
375 | ||
376 | <njs`> (if only it happened more...) | |
377 | ||
378 | <linus> Anyway, the pack-file could easily be denser still, but | |
2de9b711 | 379 | because it's used both for streaming (the Git protocol) and |
b116b297 JL |
380 | for on-disk, it has a few pessimizations. |
381 | ||
382 | Actually, it is a made-up word. But it is a made-up word being | |
383 | used as setup for a later optimization, which is a real word: | |
384 | ||
385 | <linus> In particular, while the pack-file is then compressed, | |
386 | it's compressed just one object at a time, so the actual | |
387 | compression factor is less than it could be in theory. But it | |
388 | means that it's all nice random-access with a simple index to | |
389 | do "object name->location in packfile" translation. | |
390 | ||
391 | <njs`> I'm assuming the real win for delta-ing large->small is | |
addf88e4 | 392 | more homogeneous statistics for gzip to run over? |
b116b297 JL |
393 | |
394 | (You have to put the bytes in one place or another, but | |
395 | putting them in a larger blob wins on compression) | |
396 | ||
397 | Actually, what is the compression strategy -- each delta | |
398 | individually gzipped, the whole file gzipped, somewhere in | |
399 | between, no compression at all, ....? | |
400 | ||
401 | Right. | |
402 | ||
403 | Reality IRC sets in. For example: | |
404 | ||
405 | <pasky> I'll read the rest in the morning, I really have to go | |
406 | sleep or there's no hope whatsoever for me at the today's | |
407 | exam... g'nite all. | |
408 | ||
409 | Heh. | |
410 | ||
411 | <linus> pasky: g'nite | |
412 | ||
413 | <njs`> pasky: 'luck | |
414 | ||
415 | <linus> Right: large->small matters exactly because of compression | |
416 | behaviour. If it was non-compressed, it probably wouldn't make | |
417 | any difference. | |
418 | ||
419 | <njs`> yeah | |
420 | ||
421 | <linus> Anyway: I'm not even trying to claim that the pack-files | |
422 | are perfect, but they do tend to have a nice balance of | |
423 | density vs ease-of use. | |
424 | ||
425 | Gasp! OK, saved. That's a fair Engineering trade off. Close call! | |
426 | In fact, Linus reflects on some Basic Engineering Fundamentals, | |
427 | design options, etc. | |
428 | ||
2de9b711 | 429 | <linus> More importantly, they allow Git to still _conceptually_ |
b116b297 JL |
430 | never deal with deltas at all, and be a "whole object" store. |
431 | ||
432 | Which has some problems (we discussed bad huge-file | |
2de9b711 TA |
433 | behaviour on the Git lists the other day), but it does mean |
434 | that the basic Git concepts are really really simple and | |
b116b297 JL |
435 | straightforward. |
436 | ||
437 | It's all been quite stable. | |
438 | ||
439 | Which I think is very much a result of having very simple | |
440 | basic ideas, so that there's never any confusion about what's | |
441 | going on. | |
442 | ||
443 | Bugs happen, but they are "simple" bugs. And bugs that | |
444 | actually get some object store detail wrong are almost always | |
addf88e4 | 445 | so obvious that they never go anywhere. |
b116b297 JL |
446 | |
447 | <njs`> Yeah. | |
448 | ||
449 | Nuff said. | |
450 | ||
451 | <linus> Anyway. I'm off for bed. It's not 6AM here, but I've got | |
452 | three kids, and have to get up early in the morning to send | |
453 | them off. I need my beauty sleep. | |
454 | ||
455 | <njs`> :-) | |
456 | ||
457 | <njs`> appreciate the infodump, I really was failing to find the | |
2de9b711 | 458 | details on Git packs :-) |
b116b297 JL |
459 | |
460 | And now you know the rest of the story. |