Commit | Line | Data |
---|---|---|
2729cadc NP |
1 | Date: Fri, 9 Nov 2007 08:28:38 -0800 (PST) |
2 | From: Linus Torvalds <torvalds@linux-foundation.org> | |
3 | Subject: corrupt object on git-gc | |
4 | Abstract: Some tricks to reconstruct blob objects in order to fix | |
5 | a corrupted repository. | |
1797e5c5 | 6 | Content-type: text/asciidoc |
2729cadc | 7 | |
1797e5c5 TA |
8 | How to recover a corrupted blob object |
9 | ====================================== | |
10 | ||
11 | ----------------------------------------------------------- | |
2729cadc NP |
12 | On Fri, 9 Nov 2007, Yossi Leybovich wrote: |
13 | > | |
14 | > Did not help still the repository look for this object? | |
15 | > Any one know how can I track this object and understand which file is it | |
1797e5c5 | 16 | ----------------------------------------------------------- |
2729cadc | 17 | |
d5fa1f1a | 18 | So exactly *because* the SHA-1 hash is cryptographically secure, the hash |
2729cadc NP |
19 | itself doesn't actually tell you anything; in order to fix a corrupt |
20 | object you basically have to find the "original source" for it. | |
21 | ||
22 | The easiest way to do that is almost always to have backups, and find the | |
2de9b711 | 23 | same object somewhere else. Backups really are a good idea, and Git makes |
2729cadc NP |
24 | it pretty easy (if nothing else, just clone the repository somewhere else, |
25 | and make sure that you do *not* use a hard-linked clone, and preferably | |
26 | not the same disk/machine). | |
27 | ||
28 | But since you don't seem to have backups right now, the good news is that | |
29 | especially with a single blob being corrupt, these things *are* somewhat | |
30 | debuggable. | |
31 | ||
32 | First off, move the corrupt object away, and *save* it. The most common | |
33 | cause of corruption so far has been memory corruption, but even so, there | |
34 | are people who would be interested in seeing the corruption - but it's | |
35 | basically impossible to judge the corruption until we can also see the | |
36 | original object, so right now the corrupt object is useless, but it's very | |
37 | interesting for the future, in the hope that you can re-create a | |
38 | non-corrupt version. | |
39 | ||
1797e5c5 | 40 | ----------------------------------------------------------- |
2729cadc NP |
41 | So: |
42 | ||
43 | > ib]$ mv .git/objects/4b/9458b3786228369c63936db65827de3cc06200 ../ | |
1797e5c5 | 44 | ----------------------------------------------------------- |
2729cadc NP |
45 | |
46 | This is the right thing to do, although it's usually best to save it under | |
d5fa1f1a | 47 | its full SHA-1 name (you just dropped the "4b" from the result ;). |
2729cadc NP |
48 | |
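A minimal sketch of that advice, assuming a POSIX shell started at the top of the working tree and a loose (unpacked) object. The function name `save_corrupt_object` and its two arguments are hypothetical, not part of Git:

```shell
# Hypothetical helper (not a Git command): move a loose object out of
# .git/objects and keep it under its full 40-character SHA-1 name.
save_corrupt_object() {
    sha=$1     # full object name, e.g. 4b9458b3786228369c63936db65827de3cc06200
    dest=$2    # an existing directory outside the repository
    dir=$(printf '%s' "$sha" | cut -c1-2)    # first two hex digits: the fan-out directory
    file=$(printf '%s' "$sha" | cut -c3-)    # remaining 38 hex digits: the file name
    mv ".git/objects/$dir/$file" "$dest/$sha"
}
```

Called as `save_corrupt_object 4b9458b3786228369c63936db65827de3cc06200 ..`, this would leave the damaged file in the parent directory with the "4b" still attached to its name.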
49 | Let's see what that tells us: | |
50 | ||
1797e5c5 | 51 | ----------------------------------------------------------- |
2729cadc NP |
52 | > ib]$ git-fsck --full |
53 | > broken link from tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8 | |
54 | > to blob 4b9458b3786228369c63936db65827de3cc06200 | |
55 | > missing blob 4b9458b3786228369c63936db65827de3cc06200 | |
1797e5c5 | 56 | ----------------------------------------------------------- |
2729cadc NP |
57 | |
58 | Ok, I removed the "dangling commit" messages, because they are just | |
59 | messages about the fact that you probably have rebased etc, so they're not | |
60 | at all interesting. But what remains is still very useful. In particular, | |
61 | we now know which tree points to it! | |
62 | ||
63 | Now you can do | |
64 | ||
65 | git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8 | |
66 | ||
67 | which will show something like | |
68 | ||
69 | 100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8 .gitignore | |
70 | 100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883 .mailmap | |
71 | 100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c COPYING | |
72 | 100644 blob ee909f2cc49e54f0799a4739d24c4cb9151ae453 CREDITS | |
73 | 040000 tree 0f5f709c17ad89e72bdbbef6ea221c69807009f6 Documentation | |
74 | 100644 blob 1570d248ad9237e4fa6e4d079336b9da62d9ba32 Kbuild | |
75 | 100644 blob 1c7c229a092665b11cd46a25dbd40feeb31661d9 MAINTAINERS | |
76 | ... | |
77 | ||
78 | and you should now have a line that looks like | |
79 | ||
80 | 100644 blob 4b9458b3786228369c63936db65827de3cc06200 my-magic-file | |
81 | ||
82 | in the output. This already tells you a *lot*: it tells you what file the | |
83 | corrupt blob came from! | |
84 | ||
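For a large tree, picking that line out by eye gets tedious. A small sketch, assuming a POSIX shell; `blob_name_in_tree` is a hypothetical name, and it relies only on the `git ls-tree` output layout shown above (mode, type, object name, then a TAB and the file name):

```shell
# Hypothetical helper: print the name(s) under which a tree lists a blob.
blob_name_in_tree() {
    tree=$1    # the tree reported by fsck
    blob=$2    # the missing blob
    # Keep only the lines that mention the blob, then print everything
    # after the TAB, which is the file name.
    git ls-tree "$tree" | grep -F " $blob" | cut -f2-
}
```

With the IDs from the fsck output, `blob_name_in_tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8 4b9458b3786228369c63936db65827de3cc06200` should print `my-magic-file`. Note that `git ls-tree` quotes file names containing unusual characters, so such names come out in quoted form.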
85 | Now, it doesn't tell you quite enough, though: it doesn't tell you what | |
86 | *version* of the file didn't get correctly written! You might be really | |
87 | lucky, and it may be the version that you already have checked out in your | |
88 | working tree, in which case fixing this problem is really simple: just do | |
89 | ||
90 | git hash-object -w my-magic-file | |
91 | ||
d5fa1f1a | 92 | again, and if it outputs the missing SHA-1 (4b945..) you're now all done! |
2729cadc NP |
93 | |
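If you would rather not write objects that turn out to be irrelevant, you can check first: without `-w`, `git hash-object` only prints the object name and stores nothing. A sketch under that assumption, with the hypothetical helper name `restore_if_match`:

```shell
# Hypothetical helper: write a candidate file into the object store only
# if it hashes to the object name that is missing.
restore_if_match() {
    want=$1         # full SHA-1 of the missing blob
    candidate=$2    # file that might hold the right content
    got=$(git hash-object "$candidate") || return 1    # compute only, nothing is written
    if [ "$got" = "$want" ]; then
        git hash-object -w "$candidate" >/dev/null      # now actually store the blob
        echo "restored $want from $candidate"
    else
        echo "no match: $candidate hashes to $got" >&2
        return 1
    fi
}
```

For example, `restore_if_match 4b9458b3786228369c63936db65827de3cc06200 my-magic-file` either restores the blob or reports which object name the checked-out file really has.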
94 | But that's the really lucky case, so let's assume that it was some older | |
95 | version that was broken. How do you tell which version it was? | |
96 | ||
97 | The easiest way to do it is to do | |
98 | ||
99 | git log --raw --all --full-history -- subdirectory/my-magic-file | |
100 | ||
101 | and that will show you the whole log for that file (please realize that | |
102 | the tree you had may not be the top-level tree, so you need to figure out | |
103 | which subdirectory it was in on your own), and because you're asking for | |
104 | raw output, you'll now get something like | |
105 | ||
106 | commit abc | |
107 | Author: | |
108 | Date: | |
109 | .. | |
110 | :100644 100644 4b9458b... newsha... M subdirectory/my-magic-file | |
111 | ||
112 | ||
113 | commit xyz | |
114 | Author: | |
115 | Date: | |
116 | ||
117 | .. | |
118 | :100644 100644 oldsha... 4b9458b... M subdirectory/my-magic-file | |
119 | ||
120 | and this actually tells you what the *previous* and *subsequent* versions | |
121 | of that file were! So now you can look at those ("oldsha" and "newsha" | |
122 | respectively), and hopefully you have done commits often, and can | |
123 | re-create the missing my-magic-file version by looking at those older and | |
124 | newer versions! | |
125 | ||
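To actually look at those two neighbouring versions, one option is to dump both blobs to files and diff them. This is only a sketch: `dump_neighbours` and the `.old`/`.new` suffixes are hypothetical choices, and "oldsha"/"newsha" stand for whatever object names your own log output shows:

```shell
# Hypothetical helper: extract the versions on either side of the missing
# one and show how they differ.
dump_neighbours() {
    old=$1     # "oldsha" from the log output
    new=$2     # "newsha" from the log output
    name=$3    # base name for the extracted copies
    git cat-file -p "$old" > "$name.old"
    git cat-file -p "$new" > "$name.new"
    # diff exits with 1 when the files differ; treat only real failures as errors.
    diff -u "$name.old" "$name.new" || [ $? -eq 1 ]
}
```

The missing version usually lies somewhere between the two, so the diff narrows down which lines you need to reconstruct before trying `git hash-object` on your attempt.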
126 | If you can do that, you can now recreate the missing object with | |
127 | ||
128 | git hash-object -w <recreated-file> | |
129 | ||
130 | and your repository is good again! | |
131 | ||
132 | (Btw, you could have ignored the fsck, and started with doing a | |
133 | ||
134 | git log --raw --all | |
135 | ||
136 | and just looked for the sha of the missing object (4b9458b..) in that | |
2de9b711 | 137 | whole thing. It's up to you - Git does *have* a lot of information, it is |
2729cadc NP |
138 | just missing one particular blob version.) |
139 | ||
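Scanning the whole raw log by eye does not scale, so here is a sketch that filters it. It assumes the `--no-abbrev` diff option (so that raw lines carry full object names) and a full 40-character ID as input; `find_blob_in_history` is a hypothetical name:

```shell
# Hypothetical helper: list every commit whose raw diff mentions the blob,
# followed by the raw line itself (old name, new name, status, path).
find_blob_in_history() {
    blob=$1    # full SHA-1 of the missing blob
    git log --raw --all --no-abbrev --format='commit %H' |
    awk -v id="$blob" '
        /^commit /            { commit = $2; next }       # remember the current commit
        /^:/ && index($0, id) { print commit; print $0 }  # raw line mentioning the blob
    '
}
```

Each hit tells you both the path and whether the blob was the old or the new side of that change, which is the same information the per-file log gives you.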
140 | Trying to recreate trees and especially commits is *much* harder. So you | |
141 | were lucky that it's a blob. It's quite possible that you can recreate the | |
142 | thing. | |
143 | ||
144 | Linus |