////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////
"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command. The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

This is a stupid (but extremely fast) directory content manager. It
doesn't do a whole lot, but what it _does_ do is track directory
contents efficiently.

There are two object abstractions: the "object database", and the
"current directory cache" aka "index".

The Object Database
~~~~~~~~~~~~~~~~~~~
The object database is literally just a content-addressable collection
of objects. All objects are named by their content, which is
approximated by the SHA1 hash of the object itself. Objects may refer
to other objects (by referencing their SHA1 hash), and so you can
build up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (i.e. how it is used, and how it can refer to other
objects). There are currently four different object types: "blob",
"tree", "commit" and "tag".

A "blob" object cannot refer to any other object, and is, like the tag
implies, a pure storage object containing some user data. It is used to
actually store the file data, i.e. a blob object is associated with some
particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.

A "commit" object ties such directory hierarchies together into
a DAG of revisions - each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" refers to one or more "parent" commit objects that describe the
history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root"
object, and is the point of an initial project commit. Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people. So aim for the notion of "one root object
per project", even if git itself does not enforce that.

A "tag" object symbolically identifies and can be used to sign other
objects. It contains the identifier and type of another object, a
symbolic name (of course!) and, optionally, a signature.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original data.
(Historical note: in the dawn of the age of git the hash
was the sha1 of the _compressed_ object)

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii tag without space> + <space> + <ascii decimal
size> + <byte\0> + <binary object data>.
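
The two checks above can be sketched in a few lines of Python. This is
only an illustration, not git's own code: the function names are made
up for this example, and it assumes the hash covers the uncompressed
header plus data.

```python
import hashlib
import zlib


def object_name(tag: str, data: bytes) -> str:
    """SHA1 of '<tag> <size>\\0<data>', computed on the uncompressed bytes."""
    header = f"{tag} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def deflate_object(tag: str, data: bytes) -> bytes:
    """Return the zlib-deflated form in which the object would be stored."""
    header = f"{tag} {len(data)}".encode("ascii") + b"\0"
    return zlib.compress(header + data)


def validate_object(name: str, stored: bytes) -> tuple[str, bytes]:
    """Check (a) the hash and (b) the header layout; return (tag, data)."""
    raw = zlib.decompress(stored)  # raises zlib.error if it does not inflate
    if hashlib.sha1(raw).hexdigest() != name:
        raise ValueError("hash does not match content")
    header, nul, data = raw.partition(b"\0")
    if not nul:
        raise ValueError("missing NUL byte after header")
    tag, space, size = header.partition(b" ")
    if not tag or not space or not size.isdigit() or int(size) != len(data):
        raise ValueError("malformed header")
    return tag.decode("ascii"), data
```

Note that validate_object() never needs to know what kind of object it
is looking at, which is the point made above.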

The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the "git-fsck-cache" program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).

The object types in some more detail:

Blob Object
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
verification of the data, so while the object is consistent (it _is_
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
contents").

In particular, since the blob is entirely defined by its data, if two
files in a directory tree (or in multiple different versions of the
repository) have the same contents, they will share the same blob
object. The object is totally independent of its location in the
directory tree, and renaming a file does not change the object that
file is associated with in any way.

A blob is typically created when link:git-update-cache.html[git-update-cache]
is run, and its data can be accessed by link:git-cat-file.html[git-cat-file].

Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
is a list of mode/name/blob data, sorted by name. Alternatively, the
mode data may specify a directory mode, in which case instead of
naming a blob, that name is associated with another TREE object.

Like the "blob" object, a tree object is uniquely determined by the
set contents, and so two separate but identical trees will always
share the exact same object. This is true at all levels, i.e. it's
true for a "leaf" tree (which does not refer to any other trees, only
blobs) as well as for a whole subdirectory.

For that reason a "tree" object is just a pure data abstraction: it
has no history, no signatures, no verification of validity, except
that since the contents are again protected by the hash itself, we can
trust that the tree is immutable and its contents never change.

So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
contents _came_ from.

Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
actually having to unpack two trees. Just ignore all common parts,
and your diff will look right. In other words, you can effectively
(and efficiently) tell the difference between any two random trees in
O(n) where "n" is the size of the difference, rather than the size of
the tree.
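
A rough sketch of that idea in Python, under some loud assumptions:
trees live in a plain dict called "store" that maps a tree name to its
sorted list of (name, kind, object name) entries, and kind is either
"blob" or "tree". None of these names come from git itself. Subtrees
whose names match are skipped without being opened.

```python
def _all_blobs(store, entries, prefix, as_old):
    """Report every blob under `entries` as removed (as_old) or added."""
    for name, kind, oid in entries:
        if kind == "tree":
            yield from _all_blobs(store, store[oid], prefix + name + "/", as_old)
        else:
            yield (prefix + name, oid, None) if as_old else (prefix + name, None, oid)


def diff_trees(store, a, b, prefix=""):
    """Yield (path, old_id, new_id) for each blob that differs between a and b."""
    if a == b:
        return  # same name means same content: nothing to look at
    old, new = store[a], store[b]
    i = j = 0
    while i < len(old) and j < len(new):
        o, n = old[i], new[j]
        if o[0] < n[0]:  # name only present in the old tree
            yield from _all_blobs(store, [o], prefix, True)
            i += 1
        elif n[0] < o[0]:  # name only present in the new tree
            yield from _all_blobs(store, [n], prefix, False)
            j += 1
        else:
            i += 1
            j += 1
            if o == n:
                continue  # common part, ignored
            if o[1] == n[1] == "tree":
                yield from diff_trees(store, o[2], n[2], prefix + o[0] + "/")
            elif o[1] == n[1] == "blob":
                yield (prefix + o[0], o[2], n[2])
            else:  # a file became a directory or the other way round
                yield from _all_blobs(store, [o], prefix, True)
                yield from _all_blobs(store, [n], prefix, False)
    yield from _all_blobs(store, old[i:], prefix, True)
    yield from _all_blobs(store, new[j:], prefix, False)
```

The walk is a plain merge of two sorted lists, so each tree level is
visited once, and only the levels that actually differ are visited at
all.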

Side note 2 on trees: since the name of a "blob" depends entirely and
exclusively on its contents (i.e. there are no names or permissions
involved), you can see trivial renames or permission changes by
noticing that the blob stayed the same. However, renames with data
changes need a smarter "diff" implementation.
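
Spotting such trivial renames can be sketched as follows. The input
format, a list of (path, old blob name, new blob name) with None for
"not present on that side", is an assumption made for this example
and not something git defines.

```python
def find_exact_renames(changes):
    """Pair removed and added paths that carry the same blob name."""
    changes = list(changes)
    removed = {}
    for path, old_id, new_id in changes:
        if new_id is None:  # path disappeared
            removed.setdefault(old_id, []).append(path)
    renames = []
    for path, old_id, new_id in changes:
        if old_id is None and removed.get(new_id):  # path appeared
            renames.append((removed[new_id].pop(0), path))
    return renames
```

A rename that also changes the data produces two unrelated blob names,
so this sketch will not see it.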

A tree is created with link:git-write-tree.html[git-write-tree] and
its data can be accessed by link:git-ls-tree.html[git-ls-tree].

Commit Object
~~~~~~~~~~~~~
The "commit" object is an object that introduces the notion of
history into the picture. In contrast to the other objects, it
doesn't just describe the physical state of a tree, it describes how
we got there, and why.

A "commit" is defined by the tree-object that it results in, the
parent commits (zero, one or more) that led up to that point, and a
comment on what happened. Again, a commit is not trusted per se:
the contents are well-defined and "safe" due to the cryptographically
strong signatures at all levels, but there is no reason to believe
that the tree is "good" or that the merge information makes sense.
The parents do not have to actually have any relationship with the
result, for example.

Note on commits: unlike real SCMs, commits do not contain
rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.

A commit is created with link:git-commit-tree.html[git-commit-tree] and
its data can be accessed by link:git-cat-file.html[git-cat-file].

Trust
~~~~~
An aside on the notion of "trust". Trust is really outside the scope
of "git", but it's worth noting a few things. First off, since
everything is hashed with SHA1, you _can_ trust that an object is
intact and has not been messed with by external sources. So the name
of an object uniquely identifies a known state - just not a state that
you may want to trust.

Furthermore, since the SHA1 signature of a commit refers to the
SHA1 signatures of the tree it is associated with and the signatures
of the parent, a single named commit specifies uniquely a whole set
of history, with full contents. You can't later fake any step of the
way once you have the name of a commit.
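
To see why, here is a deliberately simplified model in Python. The
commit body below (a "tree" line, "parent" lines, a blank line and a
message) leaves out fields a real commit carries, and both function
names are invented for this example.

```python
import hashlib


def commit_name(tree: str, parents: list[str], message: str) -> str:
    """Name a simplified commit: SHA1 over its tree, parents and message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def build_history(trees: list[str]) -> list[str]:
    """Chain one commit per tree name; return commit names, oldest first."""
    names, parents = [], []
    for n, tree in enumerate(trees):
        name = commit_name(tree, parents, f"change {n}")
        names.append(name)
        parents = [name]
    return names
```

Calling build_history() twice with tree lists that differ only in the
first entry gives commit names that differ at every step, since each
name is fed into the next one. Whoever holds the last name can
therefore notice a change made anywhere earlier in the chain.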

So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just _one_ special note, which includes the
name of a top-level commit. Your digital signature shows others
that you trust that commit, and the immutability of the history of
commits tells others that they can trust the whole history.

In other words, you can easily validate a whole archive by just
sending out a single email that tells the people the name (SHA1 hash)
of the top commit, and digitally sign that email using something
like GPG/PGP.

To assist in this, git also provides the tag object...

Tag Object
~~~~~~~~~~
Git provides the "tag" object to simplify creating, managing and
exchanging symbolic and signed tokens. The "tag" object at its
simplest simply symbolically identifies another object by containing
the sha1, type and symbolic name.

However it can optionally contain additional signature information
(which git doesn't care about as long as there's less than 8k of
it). This can then be verified externally to git.

Note that despite the tag features, "git" itself only handles content
integrity; the trust framework (and signature provision and
verification) has to come from outside.

A tag is created with link:git-mktag.html[git-mktag] and
its data can be accessed by link:git-cat-file.html[git-cat-file].


The "index" aka "Current Directory Cache"
-----------------------------------------
The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random time. It
does so by a simple array that associates a set of names, dates,
permissions and content (aka "blob") objects together. The cache is
always kept ordered by name, and names are unique (with a few very
specific rules) at any point in time, but the cache has no long-term
meaning, and can be partially updated at any time.

In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index _not_ be consistent with the directory
hierarchy), but it has three very important attributes:

'(a) it can re-generate the full state it caches (not just the
directory structure: it contains pointers to the "blob" objects so
that it can regenerate the data too)'

As a special case, there is a clear and unambiguous one-way mapping
from a current directory cache to a "tree object", which can be
efficiently created from just the current directory cache without
actually looking at any other data. So a directory cache at any one
time uniquely specifies one and only one "tree" object (but has
additional data to make it easy to match up that tree object with what
has happened in the directory).

'(b) it has efficient methods for finding inconsistencies between that
cached state ("tree object waiting to be instantiated") and the
current state.'

'(c) it can additionally efficiently represent information about merge
conflicts between different tree objects, allowing each pathname to be
associated with sufficient information about the trees involved that
you can create a three-way merge between them.'

Those are the three ONLY things that the directory cache does. It's a
cache, and the normal operation is to re-generate it completely from a
known tree object, or update/compare it with a live tree that is being
developed. If you blow the directory cache away entirely, you generally
haven't lost any information as long as you have the name of the tree
that it described.

At the same time, the index is also the
staging area for creating new trees, and creating a new tree always
involves a controlled modification of the index file. In particular,
the index file can have the representation of an intermediate tree that
has not yet been instantiated. So the index can be thought of as a
write-back cache, which can contain dirty information that has not yet
been written back to the backing store.
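
The one-way mapping from index to tree mentioned under (a) can be
sketched like this. Everything here is an assumption made for the
example: the index is modelled as a list of (path, mode, blob name)
tuples, directories get the mode "40000", and a tree is named by
hashing a made-up textual listing rather than git's real binary tree
format.

```python
import hashlib


def write_tree(index, store):
    """Record trees for `index` in `store`; return the top-level tree name."""
    entries, subdirs = [], {}
    for path, mode, blob in index:
        head, slash, rest = path.partition("/")
        if slash:  # entry lives in a subdirectory
            subdirs.setdefault(head, []).append((rest, mode, blob))
        else:
            entries.append((head, mode, blob))
    for name, sub in subdirs.items():
        entries.append((name, "40000", write_tree(sub, store)))
    entries.sort()  # trees are kept sorted by name
    listing = "".join(f"{mode} {name} {oid}\n" for name, mode, oid in entries)
    tree_id = hashlib.sha1(listing.encode()).hexdigest()
    store[tree_id] = entries
    return tree_id
```

Only the index entries are consulted, never the working directory or
the blob data, and the same index always yields the same tree name.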



The Workflow
------------
Generally, all "git" operations work on the index file. Some operations
work *purely* on the index file (showing the current state of the
index), but most operations move data to and from the index file, either
from the database or from the working directory. Thus there are four
main combinations:

1) working directory -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update the index with information from the working directory with
the link:git-update-cache.html[git-update-cache] command. You
generally update the index information by just specifying the filename
you want to update, like so:

	git-update-cache filename

but to avoid common mistakes with filename globbing etc, the command
will not normally add totally new entries or remove old entries,
i.e. it will normally just update existing cache entries.

To tell git that yes, you really do realize that certain files no
longer exist in the archive, or that new files should be added, you
should use the "--remove" and "--add" flags respectively.

NOTE! A "--remove" flag does _not_ mean that subsequent filenames will
necessarily be removed: if the files still exist in your directory
structure, the index will be updated with their new status, not
removed. The only thing "--remove" means is that update-cache will be
considering a removed file to be a valid thing, and if the file really
does not exist any more, it will update the index accordingly.

As a special case, you can also do "git-update-cache --refresh", which
will refresh the "stat" information of each index entry to match the
current stat information. It will _not_ update the object status itself,
and it will only update the fields that are used to quickly test whether
an object still matches its old backing store object.
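
The idea behind "--refresh" can be sketched as below. This is a guess
at the general shape, not the real implementation: the entry is a
plain dict with hypothetical "stat" and "blob" keys, the choice of stat
fields is arbitrary, and hash_file stands for whatever function names
a file's contents.

```python
import os


def stat_signature(path):
    """The cheap-to-read fields used as a stand-in for 'file unchanged'."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_mode)


def probably_unchanged(entry, path):
    """Quick test: True if the cached stat fields still match the file."""
    return entry["stat"] == stat_signature(path)


def refresh(entry, path, hash_file):
    """Bring the stat fields up to date, but never touch the blob name."""
    if probably_unchanged(entry, path):
        return True
    if hash_file(path) == entry["blob"]:
        entry["stat"] = stat_signature(path)  # same content, stale stat info
        return True
    return False  # really modified: entry is left exactly as it was
```

The expensive hash is only computed when the stat fields disagree, and
a file that really did change keeps its old entry until it is updated
explicitly.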

2) index -> object database
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You write your current index file to a "tree" object with the program

	git-write-tree

that doesn't come with any options - it will just write out the
current index into the set of tree objects that describe that state,
and it will return the name of the resulting top-level tree. You can
use that tree to re-generate the index at any time by going in the
other direction:

3) object database -> index
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You read a "tree" file from the object database, and use that to
populate (and overwrite - don't do this if your index contains any
unsaved state that you might want to restore later!) your current
index. Normal operation is just

	git-read-tree <sha1 of tree>

and your index file will now be equivalent to the tree that you saved
earlier. However, that is only your _index_ file: your working
directory contents have not been modified.

4) index -> working directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You update your working directory from the index by "checking out"
files. This is not a very common operation, since normally you'd just
keep your files updated, and rather than write to your working
directory, you'd tell the index files about the changes in your
working directory (i.e. "git-update-cache").

However, if you decide to jump to a new version, or check out somebody
else's version, or just restore a previous tree, you'd populate your
index file with read-tree, and then you need to check out the result
with

	git-checkout-cache filename

or, if you want to check out all of the index, use "-a".

NOTE! git-checkout-cache normally refuses to overwrite old files, so
if you have an old version of the tree already checked out, you will
need to use the "-f" flag (_before_ the "-a" flag or the filename) to
_force_ the checkout.


Finally, there are a few odds and ends which are not purely moving
from one representation to the other:

5) Tying it all together
~~~~~~~~~~~~~~~~~~~~~~~~
To commit a tree you have instantiated with "git-write-tree", you'd
create a "commit" object that refers to that tree and the history
behind it - most notably the "parent" commits that preceded it in
history.

Normally a "commit" has one parent: the previous state of the tree
before a certain change was made. However, sometimes it can have two
or more parent commits, in which case we call it a "merge", due to the
fact that such a commit brings together ("merges") two or more
previous states represented by other commits.

In other words, while a "tree" represents a particular directory state
of a working directory, a "commit" represents that state in "time",
and explains how we got there.

You create a commit object by giving it the tree that describes the
state at the time of the commit, and a list of parents:

	git-commit-tree <tree> -p <parent> [-p <parent2> ..]

and then giving the reason for the commit on stdin (either through
redirection from a pipe or file, or by just typing it at the tty).

git-commit-tree will return the name of the object that represents
that commit, and you should save it away for later use. Normally,
you'd commit a new "HEAD" state, and while git doesn't care where you
save the note about that state, in practice we tend to just write the
result to the file ".git/HEAD", so that we can always see what the
last committed state was.

6) Examining the data
~~~~~~~~~~~~~~~~~~~~~

You can examine the data represented in the object database and the
index with various helper tools. For every object, you can use
link:git-cat-file.html[git-cat-file] to examine details about the
object:

	git-cat-file -t <objectname>

shows the type of the object, and once you have the type (which is
usually implicit in where you find the object), you can use

	git-cat-file blob|tree|commit <objectname>

to show its contents. NOTE! Trees have binary content, and as a result
there is a special helper for showing that content, called
"git-ls-tree", which turns the binary content into a more easily
readable form.

It's especially instructive to look at "commit" objects, since those
tend to be small and fairly self-explanatory. In particular, if you
follow the convention of having the top commit name in ".git/HEAD",
you can do

	git-cat-file commit $(cat .git/HEAD)

to see what the top commit was.
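
Picking such a commit apart by program is straightforward. The sketch
below assumes the text has a "tree" line first, then "parent" lines and
any other header lines, then a blank line and the comment. The
function name and the returned dict layout are made up for this
example.

```python
def parse_commit(text: str) -> dict:
    """Split a commit's text into tree, parents, other headers and message."""
    header, _, message = text.partition("\n\n")
    lines = header.split("\n")
    kind, _, tree = lines[0].partition(" ")
    if kind != "tree":
        raise ValueError("first line of a commit must name its tree")
    commit = {"tree": tree, "parents": [], "other": [], "message": message}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)
        else:
            commit["other"].append(line)  # kept verbatim, not interpreted
    return commit
```

A commit with an empty "parents" list is a root, and one with two or
more is a merge.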

7) Merging multiple trees
~~~~~~~~~~~~~~~~~~~~~~~~~

Git helps you do a three-way merge, which you can expand to n-way by
repeating the merge procedure an arbitrary number of times until you
finally "commit" the state. The normal situation is that you'd only do
one three-way merge (two parents), and commit it, but if you like to, you
can do multiple parents in one go.

To do a three-way merge, you need the two sets of "commit" objects
that you want to merge, use those to find the closest common parent (a
third "commit" object), and then use those commit objects to find the
state of the directory ("tree" object) at these points.

To get the "base" for the merge, you first look up the common parent
of two commits with

	git-merge-base <commit1> <commit2>

which will return you the commit they are both based on. You should
now look up the "tree" objects of those commits, which you can easily
do with (for example)

	git-cat-file commit <commitname> | head -1

since the tree object information is always the first line in a commit
object.
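
What "closest common parent" means can be illustrated on a toy history.
The sketch below is not how git-merge-base is implemented. It assumes
the history is given as a dict mapping each commit name to the list of
its parents, and it simply picks the shared ancestor with the smallest
combined distance from both commits.

```python
from collections import deque


def ancestors(parents, start):
    """Return {commit: distance from start} for start and all its ancestors."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        for p in parents.get(c, []):
            if p not in dist:
                dist[p] = dist[c] + 1
                queue.append(p)
    return dist


def merge_base(parents, a, b):
    """Pick a common ancestor of a and b that is closest to both, or None."""
    da, db = ancestors(parents, a), ancestors(parents, b)
    common = set(da) & set(db)
    if not common:
        return None  # two unrelated roots
    return min(common, key=lambda c: (da[c] + db[c], c))
```

With several roots tied into one project there may be no common
ancestor at all, which is why the function can return None.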

Once you know the three trees you are going to merge (the one
"original" tree, aka the common case, and the two "result" trees, aka
the branches you want to merge), you do a "merge" read into the
index. This will throw away your old index contents, so you should
make sure that you've committed those - in fact you would normally
always do a merge against your last commit (which should thus match
what you have in your current index anyway).

To do the merge, do

	git-read-tree -m <origtree> <target1tree> <target2tree>

which will do all trivial merge operations for you directly in the
index file, and you can just write the result out with
"git-write-tree".

NOTE! Because the merge is done in the index file, and not in your
working directory, your working directory will no longer match your
index. You can use "git-checkout-cache -f -a" to make the effect of
the merge be seen in your working directory.

NOTE2! Sadly, many merges aren't trivial. If there are files that have
been added, moved or removed, or if both branches have modified the
same file, you will be left with an index tree that contains "merge
entries" in it. Such an index tree can _NOT_ be written out to a tree
object, and you will have to resolve any such merge clashes using
other tools before you can write out the result.
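
The split between trivial and non-trivial cases described above can be
modelled per pathname. This is only a sketch of the rules as stated
here, not of what git-read-tree really does internally: each tree is
flattened into a dict from path to blob name, and a "merge entry" is
represented as a (base, ours, theirs) tuple.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of {path: blob name} maps.

    Returns (merged, conflicts). `conflicts` maps each unresolved path to
    its (base, ours, theirs) blob names, with None where a side lacks it.
    """
    merged, conflicts = {}, {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if None in (b, o, t):
            conflicts[path] = (b, o, t)  # added, moved or removed somewhere
        elif o == t:
            merged[path] = o  # both sides agree
        elif o == b:
            merged[path] = t  # only "theirs" changed the file
        elif t == b:
            merged[path] = o  # only "ours" changed the file
        else:
            conflicts[path] = (b, o, t)  # both changed it, differently
    return merged, conflicts
```

As long as "conflicts" is non-empty there is no single tree to write
out, which matches the restriction in NOTE2.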


[ fixme: talk about resolving merges here ]