A tutorial introduction to git: part two
========================================

You should work through link:tutorial.html[A tutorial introduction to
git] before reading this tutorial.

The goal of this tutorial is to introduce two fundamental pieces of
git's architecture--the object database and the index file--and to
provide the reader with everything necessary to understand the rest
of the git documentation.

The git object database
-----------------------

Let's start a new project and create a small amount of history:

------------------------------------------------
$ mkdir test-project
$ cd test-project
$ git init-db
defaulting to local storage area
$ echo 'hello world' > file.txt
$ git add .
$ git commit -a -m "initial commit"
Committing initial tree 92b8b694ffb1675e5975148e1121810081dbdffe
$ echo 'hello world!' >file.txt
$ git commit -a -m "add emphasis"
------------------------------------------------

What are the 40 hex digits that git printed in response to the first
commit?

We saw in part one of the tutorial that commits have names like this.
It turns out that every object in the git history is stored under
such a 40-digit hex name. That name is the SHA1 hash of the object's
contents; among other things, this ensures that git will never store
the same data twice (since identical data is given an identical SHA1
name), and that the contents of a git object will never change (since
that would change the object's name as well).

We can ask git about this particular object with the cat-file
command--just cut-and-paste from the reply to the initial commit, to
save yourself typing all 40 hex digits:

------------------------------------------------
$ git cat-file -t 92b8b694ffb1675e5975148e1121810081dbdffe
tree
------------------------------------------------

A tree can refer to one or more "blob" objects, each corresponding to
a file. In addition, a tree can also refer to other tree objects,
thus creating a directory hierarchy. You can examine the contents of
any tree using ls-tree (remember that a long enough initial portion
of the SHA1 will also work):

------------------------------------------------
$ git ls-tree 92b8b694
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad file.txt
------------------------------------------------

Thus we see that this tree has one file in it. The SHA1 hash is a
reference to that file's data:

------------------------------------------------
$ git cat-file -t 3b18e512
blob
------------------------------------------------

A "blob" is just file data, which we can also examine with cat-file:

------------------------------------------------
$ git cat-file blob 3b18e512
hello world
------------------------------------------------

Note that this is the old file data; so the object that git named in
its response to the initial commit was a tree with a snapshot of the
directory state that was recorded by the first commit.

All of these objects are stored under their SHA1 names inside the git
directory:

------------------------------------------------
$ find .git/objects/
.git/objects/
.git/objects/pack
.git/objects/info
.git/objects/3b
.git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
.git/objects/92
.git/objects/92/b8b694ffb1675e5975148e1121810081dbdffe
.git/objects/54
.git/objects/54/196cc2703dc165cbd373a65a4dcf22d50ae7f7
.git/objects/a0
.git/objects/a0/423896973644771497bdc03eb99d5281615b51
.git/objects/d0
.git/objects/d0/492b368b66bdabf2ac1fd8c92b39d3db916e59
.git/objects/c4
.git/objects/c4/d59f390b9cfd4318117afde11d601c1085f241
------------------------------------------------

and the contents of these files are just the compressed data plus a
header identifying their length and their type. The type is either a
blob, a tree, a commit, or a tag. We've seen a blob and a tree now,
so next we should look at a commit.
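
As a rough check of the naming scheme, the name of the first blob can
be recomputed without git. This sketch assumes two things the text
above does not spell out: that the name is the SHA1 of the
uncompressed header ("blob", a space, the length in bytes, and a NUL
byte) followed by the file data, and that the coreutils `sha1sum`
tool is installed:

```shell
# The header is "blob 12\0": the type, the content length, and a NUL byte.
# 'hello world' plus the newline written by echo is 12 bytes.
name=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d' ' -f1)
echo "$name"
```

This should print 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the name
that ls-tree reported for file.txt.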

The simplest commit to find is the HEAD commit, which we can find
from .git/HEAD:

------------------------------------------------
$ cat .git/HEAD
ref: refs/heads/master
------------------------------------------------

As you can see, this tells us which branch we're currently on, and it
tells us this by naming a file under the .git directory, which itself
contains a SHA1 name referring to a commit object, which we can
examine with cat-file:

------------------------------------------------
$ cat .git/refs/heads/master
c4d59f390b9cfd4318117afde11d601c1085f241
$ git cat-file -t c4d59f39
commit
$ git cat-file commit c4d59f39
tree d0492b368b66bdabf2ac1fd8c92b39d3db916e59
parent 54196cc2703dc165cbd373a65a4dcf22d50ae7f7
author J. Bruce Fields <bfields@puzzle.fieldses.org> 1143418702 -0500
committer J. Bruce Fields <bfields@puzzle.fieldses.org> 1143418702 -0500

add emphasis
------------------------------------------------

The "tree" object here refers to the new state of the tree:

------------------------------------------------
$ git ls-tree d0492b36
100644 blob a0423896973644771497bdc03eb99d5281615b51 file.txt
$ git cat-file blob a0423896
hello world!
------------------------------------------------

and the "parent" object refers to the previous commit:

------------------------------------------------
$ git cat-file commit 54196cc2
tree 92b8b694ffb1675e5975148e1121810081dbdffe
author J. Bruce Fields <bfields@puzzle.fieldses.org> 1143414668 -0500
committer J. Bruce Fields <bfields@puzzle.fieldses.org> 1143414668 -0500

initial commit
------------------------------------------------

The tree object is the tree we examined first, and this commit is
unusual in that it lacks any parent.

Most commits have only one parent, but it is also common for a commit
to have multiple parents. In that case the commit represents a
merge, with the parent references pointing to the heads of the merged
branches.
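
To see a commit with more than one parent, the following sketch
builds a throwaway repository, lets two branches diverge, and merges
them. The file names, the branch name "side", and the identity set
through environment variables are all made up for illustration; the
script also assumes a reasonably recent git (for example, one that
has "git merge --no-edit"):

```shell
set -e
# Hypothetical identity, supplied via the environment so that the
# sketch does not depend on any user configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo base >base.txt
git add base.txt
git commit -q -m "base"

# Diverge: one commit on a side branch, one on the original branch.
git checkout -q -b side
echo side >side.txt
git add side.txt
git commit -q -m "side work"
git checkout -q -
echo main >main.txt
git add main.txt
git commit -q -m "main work"

# The histories have diverged, so this records a merge commit.
git merge -q --no-edit side

# Count the "parent" lines in the merge commit object.
parents=$(git cat-file commit HEAD | grep -c '^parent ')
echo "$parents"
```

The merge commit should carry two "parent" lines, one for the head of
each merged branch.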

Besides blobs, trees, and commits, the only remaining type of object
is a "tag", which we won't discuss here; refer to gitlink:git-tag[1]
for details.

So now we know how git uses the object database to represent a
project's history:

  * "commit" objects refer to "tree" objects representing the
    snapshot of a directory tree at a particular point in the
    history, and refer to "parent" commits to show how they're
    connected into the project history.
  * "tree" objects represent the state of a single directory,
    associating file names with "blob" objects containing file
    data and subdirectory names with "tree" objects containing
    subdirectory information.
  * "blob" objects contain file data without any other structure.
  * References to commit objects at the head of each branch are
    stored in files under .git/refs/heads/.
  * The name of the current branch is stored in .git/HEAD.
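
The chain summarized above can be followed in a few commands. This is
a minimal sketch in a scratch repository, assuming a git recent
enough to understand the `HEAD^{tree}` and `HEAD:file.txt` revision
syntax; the identity variables are placeholders:

```shell
set -e
# Hypothetical identity so that "git commit" works without configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo 'hello world' >file.txt
git add file.txt
git commit -q -m "initial commit"

git symbolic-ref HEAD                  # the branch named in .git/HEAD
commit=$(git rev-parse HEAD)           # commit the branch head refers to
tree=$(git rev-parse 'HEAD^{tree}')    # tree that the commit refers to
blob=$(git rev-parse HEAD:file.txt)    # blob the tree lists for file.txt

# Print the type of each object next to its name.
for obj in "$commit" "$tree" "$blob"; do
    printf '%s %s\n' "$(git cat-file -t "$obj")" "$obj"
done
```

The commit name will differ from run to run, since it depends on the
author and timestamp, but the blob name depends only on the file
contents and should again be 3b18e512dba79e4c8300dd08aeb37f8e728b8dad.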

Note, by the way, that lots of commands take a tree as an argument.
But as we can see above, a tree can be referred to in many different
ways--by the SHA1 name for that tree, by the name of a commit that
refers to the tree, by the name of a branch whose head refers to that
tree, etc.--and most such commands can accept any of these names.

In command synopses, the word "tree-ish" is sometimes used to
designate such an argument.
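
As an illustration, ls-tree is one such command. The sketch below
(scratch repository, placeholder identity, and a git new enough to
have "git symbolic-ref --short") names the same tree in three
different ways and collects the output of each:

```shell
set -e
# Hypothetical identity so that "git commit" works without configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo 'hello world' >file.txt
git add file.txt
git commit -q -m "initial commit"

# Current branch name, whatever this git's default happens to be.
branch=$(git symbolic-ref --short HEAD)

by_tree=$(git ls-tree "$(git rev-parse 'HEAD^{tree}')")   # SHA1 of the tree
by_commit=$(git ls-tree "$(git rev-parse HEAD)")          # SHA1 of the commit
by_branch=$(git ls-tree "$branch")                        # name of the branch
printf '%s\n' "$by_tree" "$by_commit" "$by_branch"
```

All three lines of output should be identical, because each argument
resolves to the same tree.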

The index file
--------------

The primary tool we've been using to create commits is "git commit
-a", which creates a commit including every change you've made to
your working tree. But what if you want to commit changes only to
certain files? Or only certain changes to certain files?

If we look at the way commits are created under the covers, we'll see
that there are more flexible ways of creating commits.

Continuing with our test-project, let's modify file.txt again:

------------------------------------------------
$ echo "hello world, again" >>file.txt
------------------------------------------------

but this time instead of immediately making the commit, let's take an
intermediate step, and ask for diffs along the way to keep track of
what's happening:

------------------------------------------------
$ git diff
--- a/file.txt
+++ b/file.txt
@@ -1 +1,2 @@
 hello world!
+hello world, again
$ git update-index file.txt
$ git diff
------------------------------------------------

The last diff is empty, but no new commits have been made, and the
head still doesn't contain the new line:

------------------------------------------------
$ git diff HEAD
diff --git a/file.txt b/file.txt
index a042389..513feba 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1,2 @@
 hello world!
+hello world, again
------------------------------------------------

So "git diff" is comparing against something other than the head.
The thing that it's comparing against is actually the index file,
which is stored in .git/index in a binary format, but whose contents
we can examine with ls-files:

------------------------------------------------
$ git ls-files --stage
100644 513feba2e53ebbd2532419ded848ba19de88ba00 0 file.txt
$ git cat-file -t 513feba2
blob
$ git cat-file blob 513feba2
hello world!
hello world, again
------------------------------------------------

So what our "git update-index" did was store a new blob and then put
a reference to it in the index file. If we modify the file again,
we'll see that the new modifications are reflected in the "git diff"
output:

------------------------------------------------
$ echo 'again?' >>file.txt
$ git diff
index 513feba..ba3da7b 100644
--- a/file.txt
+++ b/file.txt
@@ -1,2 +1,3 @@
 hello world!
 hello world, again
+again?
------------------------------------------------
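
The two names on the "index" line of that diff can be looked up
directly. In this sketch, which uses a scratch repository and "git
add" where the text uses update-index, the first name is read from
the index entry and the second is computed from the working tree
file with hash-object:

```shell
set -e
cd "$(mktemp -d)"
git init -q
printf 'hello world!\nhello world, again\n' >file.txt
git add file.txt          # stores a blob and records its name in the index

# The second field of the --stage output is the blob name in the index.
indexed=$(git ls-files --stage file.txt | cut -d' ' -f2)

echo 'again?' >>file.txt  # the working tree now differs from the index
worktree=$(git hash-object file.txt)

echo "index:        $indexed"
echo "working tree: $worktree"
```

If the file contents match the tutorial's exactly, the two names
should begin with 513feba and ba3da7b respectively; in any case they
differ, which is why a bare "git diff" has something to report.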

With the right arguments, git diff can also show us the difference
between the working directory and the last commit, or between the
index and the last commit:

------------------------------------------------
$ git diff HEAD
diff --git a/file.txt b/file.txt
index a042389..ba3da7b 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1,3 @@
 hello world!
+hello world, again
+again?
$ git diff --cached
diff --git a/file.txt b/file.txt
index a042389..513feba 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1,2 @@
 hello world!
+hello world, again
------------------------------------------------

At any time, we can create a new commit using "git commit" (without
the -a option), and verify that the state committed only includes the
changes stored in the index file, not the additional change that is
still only in our working tree:

------------------------------------------------
$ git commit -m "repeat"
$ git diff HEAD
diff --git a/file.txt b/file.txt
index 513feba..ba3da7b 100644
--- a/file.txt
+++ b/file.txt
@@ -1,2 +1,3 @@
 hello world!
 hello world, again
+again?
------------------------------------------------

So by default "git commit" uses the index to create the commit, not
the working tree; the -a option to commit tells it to first update
the index with all changes in the working tree.
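
The same sequence can be replayed as a script. This sketch recreates
the situation in a scratch repository (with a placeholder identity
and "git add" standing in for update-index) and then compares the
committed copy of file.txt with the working tree copy by counting
lines:

```shell
set -e
# Hypothetical identity so that "git commit" works without configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo 'hello world!' >file.txt
git add file.txt
git commit -q -m "add emphasis"

echo 'hello world, again' >>file.txt
git add file.txt               # stage the first change
echo 'again?' >>file.txt       # second change stays in the working tree only
git commit -q -m "repeat"      # no -a: commits the index as it stands

# tr strips the padding that some wc implementations add.
committed=$(git cat-file blob HEAD:file.txt | wc -l | tr -d ' ')
working=$(wc -l <file.txt | tr -d ' ')
echo "lines committed: $committed, lines in working tree: $working"
```

The commit should contain two lines and the working tree three,
showing that the unstaged "again?" line was left out of the commit.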

Finally, it's worth looking at the effect of "git add" on the index
file:

------------------------------------------------
$ echo "goodbye, world" >closing.txt
$ git add closing.txt
------------------------------------------------

The effect of the "git add" was to add one entry to the index file:

------------------------------------------------
$ git ls-files --stage
100644 8b9743b20d4b15be3955fc8d5cd2b09cd2336138 0 closing.txt
100644 513feba2e53ebbd2532419ded848ba19de88ba00 0 file.txt
------------------------------------------------

And, as you can see with cat-file, this new entry refers to the
current contents of the file:

------------------------------------------------
$ git cat-file blob 8b9743b2
goodbye, world
------------------------------------------------

The "status" command is a useful way to get a quick summary of the
situation:

------------------------------------------------
$ git status
#
# Updated but not checked in:
#   (will commit)
#
#       new file: closing.txt
#
#
# Changed but not updated:
#   (use git-update-index to mark for commit)
#
#       modified: file.txt
#
------------------------------------------------

Since the current state of closing.txt is cached in the index file,
it is listed as "updated but not checked in". Since file.txt has
changes in the working directory that aren't reflected in the index,
it is marked "changed but not updated". At this point, running "git
commit" would create a commit that added closing.txt (with its new
contents), but that didn't modify file.txt.

Also, note that a bare "git diff" shows the changes to file.txt, but
not the addition of closing.txt, because the version of closing.txt
in the index file is identical to the one in the working directory.
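
The two halves of the status report correspond to the two kinds of
diff. As a sketch (scratch repository, placeholder identity, and a
git that supports the --name-only option to diff), the file names in
each category can be listed separately:

```shell
set -e
# Hypothetical identity so that "git commit" works without configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo 'hello world!' >file.txt
git add file.txt
git commit -q -m "initial commit"

echo 'again?' >>file.txt              # modified, but not put in the index
echo 'goodbye, world' >closing.txt
git add closing.txt                   # new file, recorded in the index

unstaged=$(git diff --name-only)          # working tree compared to index
staged=$(git diff --cached --name-only)   # index compared to HEAD
echo "changed but not updated: $unstaged"
echo "will commit:             $staged"
```

Only file.txt should appear in the first list and only closing.txt in
the second, matching the status output above.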

In addition to being the staging area for new commits, the index file
is also populated from the object database when checking out a
branch, and is used to hold the trees involved in a merge operation.
See the link:core-tutorial.html[core tutorial] and the relevant man
pages for details.

What next?
----------

At this point you should know everything necessary to read the man
pages for any of the git commands; one good place to start would be
with the commands mentioned in link:everyday.html[Everyday git]. You
should be able to find any unknown jargon in the
link:glossary.html[Glossary].

The link:cvs-migration.html[CVS migration] document explains how to
import a CVS repository into git, and shows how to use git in a
CVS-like way.

For some interesting examples of git use, see the
link:howto-index.html[howtos].

For git developers, the link:core-tutorial.html[Core tutorial] goes
into detail on the lower-level git mechanisms involved in, for
example, creating a new commit.