Commit | Line | Data |
---|---|---|
8c7fa247 LT |
1 | A short git tutorial |
2 | ==================== | |
3 | May 2005 | |
4 | ||
5 | ||
6 | Introduction | |
7 | ------------ | |
8 | ||
9 | This is trying to be a short tutorial on setting up and using a git | |
10 | archive, mainly because being hands-on and using explicit examples is | |
11 | often the best way of explaining what is going on. | |
12 | ||
13 | In normal life, most people wouldn't use the "core" git programs | |
14 | directly, but rather script around them to make them more palatable. | |
15 | Understanding the core git stuff may help some people get those scripts | |
16 | done, though, and it may also be instructive in helping people | |
17 | understand what it is that the higher-level helper scripts are actually | |
18 | doing. | |
19 | ||
20 | The core git is often called "plumbing", with the prettier user | |
f35ca9ed LT |
21 | interfaces on top of it called "porcelain". You may not want to use the |
22 | plumbing directly very often, but it can be good to know what the | |
23 | plumbing does for when the porcelain isn't flushing... | |
8c7fa247 LT |
24 | |
25 | ||
26 | Creating a git archive | |
27 | ---------------------- | |
28 | ||
29 | Creating a new git archive couldn't be easier: all git archives start | |
30 | out empty, and the only thing you need to do is find yourself a | |
31 | subdirectory that you want to use as a working tree - either an empty | |
32 | one for a totally new project, or an existing working tree that you want | |
33 | to import into git. | |
34 | ||
837eedf4 | 35 | For our first example, we're going to start a totally new archive from |
8c7fa247 LT |
36 | scratch, with no pre-existing files, and we'll call it "git-tutorial". |
37 | To start up, create a subdirectory for it, change into that | |
38 | subdirectory, and initialize the git infrastructure with "git-init-db": | |
39 | ||
40 | mkdir git-tutorial | |
41 | cd git-tutorial | |
42 | git-init-db | |
43 | ||
44 | to which git will reply | |
45 | ||
46 | defaulting to local storage area | |
47 | ||
837eedf4 | 48 | which is just git's way of saying that you haven't been doing anything |
8c7fa247 LT |
49 | strange, and that it will have created a local .git directory setup for |
50 | your new project. You will now have a ".git" directory, and you can | |
51 | inspect that with "ls". For your new empty project, ls should show you | |
52 | three entries: | |
53 | ||
54 | - a symlink called HEAD, pointing to "refs/heads/master" | |
55 | ||
56 | Don't worry about the fact that the file that the HEAD link points to | |
837eedf4 | 57 | doesn't even exist yet - you haven't created the commit that will |
8c7fa247 LT |
58 | start your HEAD development branch yet. |
59 | ||
60 | - a subdirectory called "objects", which will contain all the git SHA1 | |
61 | objects of your project. You should never have any real reason to | |
62 | look at the objects directly, but you might want to know that these | |
63 | objects are what contains all the real _data_ in your repository. | |
64 | ||
65 | - a subdirectory called "refs", which contains references to objects. | |
66 | ||
67 | In particular, the "refs" subdirectory will contain two other | |
68 | subdirectories, named "heads" and "tags" respectively. They do | |
69 | exactly what their names imply: they contain references to any number | |
70 | of different "heads" of development (aka "branches"), and to any | |
71 | "tags" that you have created to name specific versions of your | |
72 | repository. | |
73 | ||
74 | One note: the special "master" head is the default branch, which is | |
75 | why the .git/HEAD file was created as a symlink to it even if it | |
837eedf4 | 76 | doesn't yet exist. Basically, the HEAD link is supposed to always |
8c7fa247 LT |
77 | point to the branch you are working on right now, and you always |
78 | start out expecting to work on the "master" branch. | |
79 | ||
80 | However, this is only a convention, and you can name your branches | |
81 | anything you want, and don't have to ever even _have_ a "master" | |
82 | branch. A number of the git tools will assume that .git/HEAD is | |
83 | valid, though. | |
84 | ||
85 | [ Implementation note: an "object" is identified by its 160-bit SHA1 | |
86 | hash, aka "name", and a reference to an object is always the 40-byte | |
87 | hex representation of that SHA1 name. The files in the "refs" | |
88 | subdirectory are expected to contain these hex references (usually | |
89 | with a final '\n' at the end), and you should thus expect to see a | |
90 | number of 41-byte files containing these references in the refs | |
91 | subdirectories when you actually start populating your tree ] | |
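
To make the 41-byte figure concrete, here is a minimal sketch that writes a made-up ref file by hand and measures it. The scratch directory, the ref name "example" and the hex string are all placeholders picked for illustration (the string is just 40 hex digits, not a commit in any real archive), and nothing here touches an actual git archive.

```shell
# Hypothetical scratch directory standing in for a ".git" directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/refs/heads"

# 40 hex characters; echo appends the final newline, giving 41 bytes.
echo "0123456789abcdef0123456789abcdef01234567" > "$tmp/refs/heads/example"

# wc -c counts bytes; tr strips the padding some wc versions add.
ref_size=$(wc -c < "$tmp/refs/heads/example" | tr -d ' ')
echo "$ref_size"

rm -rf "$tmp"
```

The point is only that a ref is a tiny text file: 40 bytes of hex name plus one newline.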
92 | ||
93 | You have now created your first git archive. Of course, since it's | |
94 | empty, that's not very useful, so let's start populating it with data. | |
95 | ||
96 | ||
97 | Populating a git archive | |
98 | ------------------------ | |
99 | ||
100 | We'll keep this simple and stupid, so we'll start off with populating a | |
101 | few trivial files just to get a feel for it. | |
102 | ||
103 | Start off with just creating any random files that you want to maintain | |
104 | in your git archive. We'll start off with a few bad examples, just to | |
105 | get a feel for how this works: | |
106 | ||
107 | echo "Hello World" > a | |
108 | echo "Silly example" > b | |
109 | ||
110 | you have now created two files in your working directory, but to | |
111 | actually check in your hard work, you will have to go through two steps: | |
112 | ||
113 | - fill in the "cache" aka "index" file with the information about your | |
114 | working directory state | |
115 | ||
116 | - commit that index file as an object. | |
117 | ||
118 | The first step is trivial: when you want to tell git about any changes | |
119 | to your working directory, you use the "git-update-cache" program. That | |
120 | program normally just takes a list of filenames you want to update, but | |
121 | to avoid trivial mistakes, it refuses to add new entries to the cache | |
122 | (or remove existing ones) unless you explicitly tell it that you're | |
123 | adding a new entry with the "--add" flag (or removing an entry with the | |
124 | "--remove" flag). | |
125 | ||
126 | So to populate the index with the two files you just created, you can do | |
127 | ||
128 | git-update-cache --add a b | |
129 | ||
130 | and you have now told git to track those two files. | |
131 | ||
132 | In fact, as you did that, if you now look into your object directory, | |
837eedf4 | 133 | you'll notice that git will have added two new objects to the object |
8c7fa247 LT |
134 | store. If you did exactly the steps above, you should now be able to do |
135 | ||
136 | ls .git/objects/??/* | |
137 | ||
138 | and see two files: | |
139 | ||
140 | .git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238 | |
141 | .git/objects/f2/4c74a2e500f5ee1332c86b94199f52b1d1d962 | |
142 | ||
143 | which correspond to the objects with SHA1 names 557db... and f24c7... | |
144 | respectively. | |
145 | ||
146 | If you want to, you can use "git-cat-file" to look at those objects, but | |
147 | you'll have to use the object name, not the filename of the object: | |
148 | ||
149 | git-cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238 | |
150 | ||
151 | where the "-t" tells git-cat-file to tell you what the "type" of the | |
152 | object is. Git will tell you that you have a "blob" object (ie just a | |
153 | regular file), and you can see the contents with | |
154 | ||
155 | git-cat-file "blob" 557db03de997c86a4a028e1ebd3a1ceb225be238 | |
156 | ||
157 | which will print out "Hello World". The object 557db... is nothing | |
158 | more than the contents of your file "a". | |
159 | ||
160 | [ Digression: don't confuse that object with the file "a" itself. The | |
81bb573e LT |
161 | object is literally just those specific _contents_ of the file, and |
162 | however much you later change the contents in file "a", the object we | |
163 | just looked at will never change. Objects are immutable. ] | |
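
If you are curious where a name like 557db... comes from, the following sketch recomputes it without git at all. It assumes that a blob's name is the SHA1 of the header "blob <size>", a NUL byte, and then the file contents, and that GNU "sha1sum" is installed; the helper name blob_name and the scratch directory are made up for this example.

```shell
# Hypothetical helper: print the object name git would give to file $1.
blob_name() {
    # Size in bytes of the file contents (tr strips any padding from wc).
    size=$(wc -c < "$1" | tr -d ' ')
    # Hash the header "blob <size>", a NUL byte, then the raw contents.
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
echo "Hello World" > "$tmp/a"
name_a=$(blob_name "$tmp/a")
echo "$name_a"

# Changing the file gives different contents, hence a different name;
# the object with the old name is untouched.
echo "It's a new day for git" >> "$tmp/a"
name_a_changed=$(blob_name "$tmp/a")
echo "$name_a_changed"

rm -rf "$tmp"
```

The first name printed should match the 557db... object listed earlier, which is another way of seeing that the object is the contents, not the file.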
8c7fa247 LT |
164 | |
165 | Anyway, as we mentioned previously, you normally never actually take a | |
166 | look at the objects themselves, and typing long 40-character hex SHA1 | |
167 | names is not something you'd normally want to do. The above digression | |
168 | was just to show that "git-update-cache" did something magical, and | |
169 | actually saved away the contents of your files into the git content | |
170 | store. | |
171 | ||
172 | Updating the cache did something else too: it created a ".git/index" | |
173 | file. This is the index that describes your current working tree, and | |
174 | something you should be very aware of. Again, you normally never worry | |
175 | about the index file itself, but you should be aware of the fact that | |
176 | you have not actually really "checked in" your files into git so far, | |
177 | you've only _told_ git about them. | |
178 | ||
f35ca9ed | 179 | However, since git knows about them, you can now start using some of the |
8c7fa247 LT |
180 | most basic git commands to manipulate the files or look at their status. |
181 | ||
182 | In particular, let's not even check in the two files into git yet, we'll | |
183 | start off by adding another line to "a" first: | |
184 | ||
185 | echo "It's a new day for git" >> a | |
186 | ||
187 | and you can now, since you told git about the previous state of "a", ask | |
188 | git what has changed in the tree compared to your old index, using the | |
189 | "git-diff-files" command: | |
190 | ||
191 | git-diff-files | |
192 | ||
193 | Oops. That wasn't very readable. It just spit out its own internal | |
194 | version of a "diff", but that internal version really just tells you | |
195 | that it has noticed that "a" has been modified, and that the old object | |
196 | contents it had have been replaced with something else. | |
197 | ||
198 | To make it readable, we can tell git-diff-files to output the | |
199 | differences as a patch, using the "-p" flag: | |
200 | ||
201 | git-diff-files -p | |
202 | ||
203 | which will spit out | |
204 | ||
205 | diff --git a/a b/a | |
206 | --- a/a | |
207 | +++ b/a | |
208 | @@ -1 +1,2 @@ | |
209 | Hello World | |
210 | +It's a new day for git | |
211 | ||
212 | ie the diff of the change we caused by adding another line to "a". | |
213 | ||
214 | In other words, git-diff-files always shows us the difference between | |
215 | what is recorded in the index, and what is currently in the working | |
216 | tree. That's very useful. | |
217 | ||
218 | ||
219 | Committing git state | |
220 | -------------------- | |
221 | ||
222 | Now, we want to go to the next stage in git, which is to take the files | |
223 | that git knows about in the index, and commit them as a real tree. We do | |
224 | that in two phases: creating a "tree" object, and committing that "tree" | |
225 | object as a "commit" object together with an explanation of what the | |
226 | tree was all about, along with information of how we came to that state. | |
227 | ||
228 | Creating a tree object is trivial, and is done with "git-write-tree". | |
229 | There are no options or other input: git-write-tree will take the | |
230 | current index state, and write an object that describes that whole | |
231 | index. In other words, we're now tying together all the different | |
232 | filenames with their contents (and their permissions), and we're | |
233 | creating the equivalent of a git "directory" object: | |
234 | ||
235 | git-write-tree | |
236 | ||
237 | and this will just output the name of the resulting tree, in this case | |
238 | (if you have done exactly as I've described) it should be | |
239 | ||
240 | 3ede4ed7e895432c0a247f09d71a76db53bd0fa4 | |
241 | ||
242 | which is another incomprehensible object name. Again, if you want to, | |
243 | you can use "git-cat-file -t 3ede4.." to see that this time the object | |
244 | is not a "blob" object, but a "tree" object (you can also use | |
245 | git-cat-file to actually output the raw object contents, but you'll see | |
246 | mainly a binary mess, so that's less interesting). | |
247 | ||
248 | However - normally you'd never use "git-write-tree" on its own, because | |
249 | normally you always commit a tree into a commit object using the | |
250 | "git-commit-tree" command. In fact, it's easier to not actually use | |
251 | git-write-tree on its own at all, but to just pass its result in as an | |
252 | argument to "git-commit-tree". | |
253 | ||
254 | "git-commit-tree" normally takes several arguments - it wants to know | |
255 | what the _parent_ of a commit was, but since this is the first commit | |
256 | ever in this new archive, and it has no parents, we only need to pass in | |
257 | the tree ID. However, git-commit-tree also wants to get a commit message | |
258 | on its standard input, and it will write out the resulting ID for the | |
259 | commit to its standard output. | |
260 | ||
261 | And this is where we start using the .git/HEAD file. The HEAD file is | |
262 | supposed to contain the reference to the top-of-tree, and since that's | |
263 | exactly what git-commit-tree spits out, we can do this all with a simple | |
264 | shell pipeline: | |
265 | ||
266 | echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD | |
267 | ||
268 | which will say: | |
269 | ||
270 | Committing initial tree 3ede4ed7e895432c0a247f09d71a76db53bd0fa4 | |
271 | ||
272 | just to warn you about the fact that it created a totally new commit | |
273 | that is not related to anything else. Normally you do this only _once_ | |
274 | for a project ever, and all later commits will be parented on top of an | |
275 | earlier commit, and you'll never see this "Committing initial tree" | |
276 | message ever again. | |
277 | ||
278 | ||
279 | Making a change | |
280 | --------------- | |
281 | ||
282 | Remember how we did the "git-update-cache" on file "a" and then we | |
837eedf4 | 283 | changed "a" afterward, and could compare the new state of "a" with the |
8c7fa247 LT |
284 | state we saved in the index file? |
285 | ||
286 | Further, remember how I said that "git-write-tree" writes the contents | |
287 | of the _index_ file to the tree, and thus what we just committed was in | |
288 | fact the _original_ contents of the file "a", not the new ones. We did | |
289 | that on purpose, to show the difference between the index state, and the | |
290 | state in the working directory, and how they don't have to match, even | |
291 | when we commit things. | |
292 | ||
293 | As before, if we do "git-diff-files -p" in our git-tutorial project, | |
294 | we'll still see the same difference we saw last time: the index file | |
295 | hasn't changed by the act of committing anything. However, now that we | |
296 | have committed something, we can also learn to use a new command: | |
297 | "git-diff-cache". | |
298 | ||
299 | Unlike "git-diff-files", which showed the difference between the index | |
300 | file and the working directory, "git-diff-cache" shows the differences | |
301 | between a committed _tree_ and the index file. In other words, | |
302 | git-diff-cache wants a tree to be diffed against, and before we did the | |
303 | commit, we couldn't do that, because we didn't have anything to diff | |
304 | against. | |
305 | ||
306 | But now we can do | |
307 | ||
308 | git-diff-cache -p HEAD | |
309 | ||
310 | (where "-p" has the same meaning as it did in git-diff-files), and it | |
311 | will show us the same difference, but for a totally different reason. | |
312 | Now we're not comparing against the index file, we're comparing against | |
313 | the tree we just wrote. It just so happens that those two are obviously | |
314 | the same. | |
315 | ||
316 | "git-diff-cache" also has a specific flag "--cached", which is used to | |
317 | tell it to show the differences purely with the index file, and ignore | |
318 | the current working directory state entirely. Since we just wrote the | |
319 | index file to HEAD, doing "git-diff-cache --cached -p HEAD" should thus | |
320 | return an empty set of differences, and that's exactly what it does. | |
321 | ||
322 | However, our next step is to commit the _change_ we did, and again, to | |
837eedf4 | 323 | understand what's going on, keep in mind the difference between "working |
8c7fa247 LT |
324 | directory contents", "index file" and "committed tree". We have changes |
325 | in the working directory that we want to commit, and we always have to | |
326 | work through the index file, so the first thing we need to do is to | |
327 | update the index cache: | |
328 | ||
329 | git-update-cache a | |
330 | ||
331 | (note how we didn't need the "--add" flag this time, since git knew | |
332 | about the file already). | |
333 | ||
334 | Note what happens to the different git-diff-xxx versions here. After | |
335 | we've updated "a" in the index, "git-diff-files -p" now shows no | |
336 | differences, but "git-diff-cache -p HEAD" still _does_ show that the | |
337 | current state is different from the state we committed. In fact, now | |
338 | "git-diff-cache" shows the same difference whether we use the "--cached" | |
339 | flag or not, since now the index is coherent with the working directory. | |
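
As an analogy only (no git involved), the three states can be modelled with three plain files, and each diff command with a comparison between two of them. The file names "committed", "index" and "working" and the helper "compare" are invented for this sketch; it shows which pair each command looks at, not how git implements it.

```shell
# Hypothetical helper: report whether two files have identical contents.
compare() {
    if cmp -s "$1" "$2"; then echo same; else echo differ; fi
}

tmp=$(mktemp -d)
echo "Hello World" > "$tmp/committed"     # stand-in for the tree in HEAD
cp "$tmp/committed" "$tmp/working"        # stand-in for the file "a"
echo "It's a new day for git" >> "$tmp/working"

# Stand-in for "git-update-cache a": the index catches up with the file.
cp "$tmp/working" "$tmp/index"

files_diff=$(compare "$tmp/index" "$tmp/working")       # git-diff-files
cached_diff=$(compare "$tmp/committed" "$tmp/index")    # git-diff-cache --cached HEAD
cache_diff=$(compare "$tmp/committed" "$tmp/working")   # git-diff-cache HEAD
echo "$files_diff $cached_diff $cache_diff"

rm -rf "$tmp"
```

The index now agrees with the working file, while both still differ from what was committed, which is the situation described above.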
340 | ||
341 | Now, since we've updated "a" in the index, we can commit the new | |
342 | version. We could do it by writing the tree by hand, and committing the | |
343 | tree (this time we'd have to use the "-p HEAD" flag to tell commit that | |
837eedf4 | 344 | the HEAD was the _parent_ of the new commit, and that this wasn't an |
8c7fa247 LT |
345 | initial commit any more), but the fact is, git has a simple helper |
346 | script for doing all of the non-initial commits that does all of this | |
347 | for you, and starts up an editor to let you write your commit message | |
348 | yourself, so let's just use that: | |
349 | ||
81bb573e | 350 | git commit |
8c7fa247 LT |
351 | |
352 | Write whatever message you want, and all the lines that start with '#' | |
353 | will be pruned out, and the rest will be used as the commit message for | |
354 | the change. If you decide you don't want to commit anything after all at | |
355 | this point (you can continue to edit things and update the cache), you | |
356 | can just leave an empty message. Otherwise git-commit-script will commit | |
357 | the change for you. | |
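
The pruning of '#' lines can be pictured with a one-line filter. This is a sketch of the idea, not the actual contents of git-commit-script; the file name "editmsg" and the sample text are made up for the example.

```shell
tmp=$(mktemp -d)

# Hypothetical message file as it might look when you leave the editor.
cat > "$tmp/editmsg" <<'EOF'
# Lines starting with '#' are hints for you, not part of the message.
Add a second line to "a"
# modified: a
EOF

# Drop every line that begins with '#'; what remains is the message.
message=$(grep -v '^#' "$tmp/editmsg")
echo "$message"

# An empty result would mean there is nothing to commit with.
if [ -z "$message" ]; then
    echo "empty message, not committing"
fi

rm -rf "$tmp"
```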
358 | ||
359 | (Btw, current versions of git will consider the change in question to be | |
360 | so big that it's considered a whole new file, since the diff is actually | |
361 | bigger than the file. So the helpful comments that git-commit-script | |
362 | tells you for this example will say that you deleted and re-created the | |
837eedf4 | 363 | file "a". For a less contrived example, these things are usually more |
8c7fa247 LT |
364 | obvious). |
365 | ||
366 | You've now made your first real git commit. And if you're interested in | |
367 | looking at what git-commit-script really does, feel free to investigate: | |
368 | it's a few very simple shell scripts to generate the helpful (?) commit | |
369 | message headers, and a few one-liners that actually do the commit itself. | |
370 | ||
371 | ||
372 | Checking it out | |
373 | --------------- | |
374 | ||
375 | While creating changes is useful, it's even more useful if you can tell | |
376 | later what changed. The most useful command for this is another of the | |
377 | "diff" family, namely "git-diff-tree". | |
378 | ||
379 | git-diff-tree can be given two arbitrary trees, and it will tell you the | |
380 | differences between them. Perhaps even more commonly, though, you can | |
381 | give it just a single commit object, and it will figure out the parent | |
382 | of that commit itself, and show the difference directly. Thus, to get | |
383 | the same diff that we've already seen several times, we can now do | |
384 | ||
385 | git-diff-tree -p HEAD | |
386 | ||
387 | (again, "-p" means to show the difference as a human-readable patch), | |
388 | and it will show what the last commit (in HEAD) actually changed. | |
389 | ||
390 | More interestingly, you can also give git-diff-tree the "-v" flag, which | |
391 | tells it to also show the commit message and author and date of the | |
392 | commit, and you can tell it to show a whole series of diffs. | |
393 | Alternatively, you can tell it to be "silent", and not show the diffs at | |
394 | all, but just show the actual commit message. | |
395 | ||
396 | In fact, together with the "git-rev-list" program (which generates a | |
397 | list of revisions), git-diff-tree ends up being a veritable fount of | |
398 | changes. A trivial (but very useful) script called "git-whatchanged" is | |
399 | included with git which does exactly this, and shows a log of recent | |
400 | activity. | |
401 | ||
81bb573e | 402 | To see the whole history of our pitiful little git-tutorial project, you |
8c7fa247 LT |
403 | can do |
404 | ||
81bb573e LT |
405 | git log |
406 | ||
407 | which shows just the log messages, or if we want to see the log together | |
408 | with the associated patches use the more complex (and much more | |
409 | powerful) | |
410 | ||
837eedf4 | 411 | git-whatchanged -p --root |
8c7fa247 | 412 | |
81bb573e LT |
413 | and you will see exactly what has changed in the repository over its |
414 | short history. | |
415 | ||
416 | [ Side note: the "--root" flag is a flag to git-diff-tree to tell it to | |
417 | show the initial aka "root" commit too. Normally you'd probably not | |
418 | want to see the initial import diff, but since the tutorial project | |
419 | was started from scratch and is so small, we use it to make the result | |
420 | a bit more interesting ] | |
8c7fa247 | 421 | |
837eedf4 | 422 | With that, you should now be having some inkling of what git does, and |
8c7fa247 LT |
423 | can explore on your own. |
424 | ||
f35ca9ed LT |
425 | |
426 | Copying archives | |
427 | ---------------- | |
428 | ||
429 | Git archives are normally totally self-sufficient, and it's worth noting | |
430 | that unlike CVS, for example, there is no separate notion of | |
431 | "repository" and "working tree". A git repository normally _is_ the | |
432 | working tree, with the local git information hidden in the ".git" | |
433 | subdirectory. There is nothing else. What you see is what you got. | |
434 | ||
435 | [ Side note: you can tell git to split the git internal information from | |
436 | the directory that it tracks, but we'll ignore that for now: it's not | |
437 | how normal projects work, and it's really only meant for special uses. | |
438 | So the mental model of "the git information is always tied directly to | |
439 | the working directory that it describes" may not be technically 100% | |
440 | accurate, but it's a good model for all normal use ] | |
441 | ||
442 | This has two implications: | |
443 | ||
444 | - if you grow bored with the tutorial archive you created (or you've | |
445 | made a mistake and want to start all over), you can just do simple | |
446 | ||
447 | rm -rf git-tutorial | |
448 | ||
449 | and it will be gone. There's no external repository, and there's no | |
450 | history outside of the project you created. | |
451 | ||
452 | - if you want to move or duplicate a git archive, you can do so. There | |
453 | is no "git clone" command: if you want to create a copy of your | |
454 | archive (with all the full history that went along with it), you can | |
455 | do so with a regular "cp -a git-tutorial new-git-tutorial". | |
456 | ||
457 | Note that when you've moved or copied a git archive, your git index | |
458 | file (which caches various information, notably some of the "stat" | |
459 | information for the files involved) will likely need to be refreshed. | |
460 | So after you do a "cp -a" to create a new copy, you'll want to do | |
461 | ||
462 | git-update-cache --refresh | |
463 | ||
464 | to make sure that the index file is up-to-date in the new one. | |
465 | ||
466 | Note that the second point is true even across machines. You can | |
467 | duplicate a remote git archive with _any_ regular copy mechanism, be it | |
468 | "scp", "rsync" or "wget". | |
469 | ||
470 | When copying a remote repository, you'll want to at a minimum update the | |
471 | index cache when you do this, and especially with other people's | |
472 | repositories you often want to make sure that the index cache is in some | |
473 | known state (you don't know _what_ they've done and not yet checked in), | |
474 | so usually you'll precede the "git-update-cache" with a | |
475 | ||
476 | git-read-tree HEAD | |
477 | git-update-cache --refresh | |
478 | ||
479 | which will force a total index re-build from the tree pointed to by | |
480 | HEAD. | |
481 | ||
482 | In fact, many public remote repositories will not contain any of the | |
483 | checked out files or even an index file, and will _only_ contain the | |
484 | actual core git files. Such a repository usually doesn't even have the | |
485 | ".git" subdirectory, but has all the git files directly in the | |
486 | repository. | |
487 | ||
488 | To create your own local live copy of such a "raw" git repository, you'd | |
489 | first create your own subdirectory for the project, and then copy the | |
490 | raw repository contents into the ".git" directory. For example, to | |
491 | create your own copy of the git repository, you'd do the following | |
492 | ||
493 | mkdir my-git | |
494 | cd my-git | |
495 | rsync -rL rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/git.git/ .git | |
496 | ||
497 | followed by | |
498 | ||
499 | git-read-tree HEAD | |
500 | ||
501 | to populate the index. However, now you have populated the index, and | |
502 | you have all the git internal files, but you will notice that you don't | |
503 | actually have any of the _working_directory_ files to work on. To get | |
504 | those, you'd check them out with | |
505 | ||
506 | git-checkout-cache -u -a | |
507 | ||
508 | where the "-u" flag means that you want the checkout to keep the index | |
509 | up-to-date (so that you don't have to refresh it afterwards), and the | |
510 | "-a" flag means "check out all files" (if you have a stale copy or an | |
511 | older version of a checked out tree you may also need to add the "-f" | |
512 | flag first, to tell git-checkout-cache to _force_ overwriting of any old | |
513 | files). | |
514 | ||
515 | You have now successfully copied somebody else's (mine) remote | |
516 | repository, and checked it out. | |
517 | ||
8c7fa247 | 518 | [ to be continued.. cvs2git, tagging versions, branches, merging.. ] |