Tweaking diff output
====================
June 2005


Introduction
------------

The diff commands git-diff-index, git-diff-files, and git-diff-tree
can be told to manipulate the differences they find in
unconventional ways before showing diff(1) output. These manipulations
are collectively called "diffcore transformations". This short note
describes what they are and how to use them to produce diff output
that is easier to understand than the conventional kind.


The chain of operation
----------------------

The git-diff-* family works by first comparing two sets of
files:

 - git-diff-index compares contents of a "tree" object and the
   working directory (when '\--cached' flag is not used) or a
   "tree" object and the index file (when '\--cached' flag is
   used);

 - git-diff-files compares contents of the index file and the
   working directory;

 - git-diff-tree compares contents of two "tree" objects;

In all of these cases, the commands themselves compare
corresponding paths in the two sets of files. The result of
comparison is passed from these commands to what is internally
called "diffcore", in a format similar to what is output when
the -p option is not used. E.g.

------------------------------------------------
in-place edit  :100644 100644 bcd1234... 0123456... M file0
create         :000000 100644 0000000... 1234567... A file4
delete         :100644 000000 1234567... 0000000... D file5
unmerged       :000000 000000 0000000... 0000000... U file6
------------------------------------------------

The diffcore mechanism is fed a list of such comparison results
(each of which is called a "filepair", although at this point each
of them talks about a single file), and transforms such a list
into another list. There are currently 6 such transformations:

- diffcore-pathspec
- diffcore-break
- diffcore-rename
- diffcore-merge-broken
- diffcore-pickaxe
- diffcore-order

These are applied in sequence. The set of filepairs that git-diff-\*
commands find is used as the input to diffcore-pathspec, and
the output from diffcore-pathspec is used as the input to the
next transformation. The final result is then passed to the
output routine and generates either diff-raw format (see Output
format sections of the manual for git-diff-\* commands) or
diff-patch format.

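As a rough illustration of the chain described above (not git's actual
implementation, which is written in C), the following Python sketch
parses raw-format lines into filepairs and feeds them through a list of
transformations in sequence. The names `FilePair`, `parse_raw_line` and
`run_chain` are hypothetical, and the parser assumes whitespace-separated
fields and paths without whitespace, as in the examples in this note.

```python
from typing import Callable, List, NamedTuple, Optional


class FilePair(NamedTuple):
    """One comparison result, as it would be fed to diffcore (hypothetical type)."""
    old_mode: str
    new_mode: str
    old_id: str
    new_id: str
    status: str            # one letter such as M, A, D, U, R or C
    score: Optional[int]   # e.g. 100 for "R100"; None when no score is shown
    src: str
    dst: str               # same as src unless the pair is a rename or copy


def parse_raw_line(line: str) -> FilePair:
    """Parse one line such as ':100644 100644 bcd1234... 0123456... M file0'."""
    fields = line.strip().lstrip(":").split()
    old_mode, new_mode, old_id, new_id, status = fields[:5]
    paths = fields[5:]
    score = int(status[1:]) if len(status) > 1 else None
    src = paths[0]
    dst = paths[1] if len(paths) > 1 else paths[0]
    return FilePair(old_mode, new_mode, old_id, new_id,
                    status[0], score, src, dst)


Transform = Callable[[List[FilePair]], List[FilePair]]


def run_chain(pairs: List[FilePair], transforms: List[Transform]) -> List[FilePair]:
    """Apply each transformation to the output of the previous one."""
    for transform in transforms:
        pairs = transform(pairs)
    return pairs
```

Each transformation is modelled here as a function from a list of
filepairs to another list, so the output of one step is the input to
the next.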

diffcore-pathspec: For Ignoring Files Outside Our Consideration
---------------------------------------------------------------

The first transformation in the chain is diffcore-pathspec, and
is controlled by giving the pathname parameters to the
git-diff-* commands on the command line. The pathspec is used
to limit the world diff operates in. It removes the filepairs
outside the specified set of pathnames. E.g. if the input set
of filepairs included:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M junkfile
------------------------------------------------

but the command invocation was "git-diff-files myfile", then the
junkfile entry would be removed from the list because only "myfile"
is under consideration.

Implementation note. For performance reasons, git-diff-tree
uses the pathname parameters on the command line to cull the set of
filepairs it feeds the diffcore mechanism itself, and does not
use diffcore-pathspec, but the end result is the same.

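The filtering described in this section could be sketched as follows.
This is only an illustration: the function names are hypothetical, a
filepair is reduced to a `(status, path)` tuple, and the matching rule
(an exact path, or a leading directory of the path) is a simplifying
assumption rather than a statement of how git matches pathspecs.

```python
def path_in_pathspec(path, pathspecs):
    """Return True if path is covered by one of the pathname parameters."""
    if not pathspecs:
        # Assumption: with no pathname parameters nothing is filtered out.
        return True
    for spec in pathspecs:
        spec = spec.rstrip("/")
        # Assumed rule: exact match, or spec names a leading directory.
        if path == spec or path.startswith(spec + "/"):
            return True
    return False


def diffcore_pathspec(pairs, pathspecs):
    """Drop (status, path) filepairs that fall outside the pathspec."""
    return [pair for pair in pairs if path_in_pathspec(pair[1], pathspecs)]
```

With the input from the example above, the call
`diffcore_pathspec([("M", "junkfile")], ["myfile"])` yields an empty
list, because only "myfile" is under consideration.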

diffcore-break: For Splitting Up "Complete Rewrites"
----------------------------------------------------

The second transformation in the chain is diffcore-break, and is
controlled by the -B option to the git-diff-* commands. This is
used to detect a filepair that represents a "complete rewrite" and
break such a filepair into two filepairs that represent delete and
create. E.g. if the input contained this filepair:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M file0
------------------------------------------------

and if it detects that the file "file0" is completely rewritten,
it changes it to:

------------------------------------------------
:100644 000000 bcd1234... 0000000... D file0
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

For the purpose of breaking a filepair, diffcore-break examines
the extent of changes between the contents of the files before
and after modification (i.e. the contents that have "bcd1234..."
and "0123456..." as their SHA1 content ID, in the above
example). The amount of original content deleted and the amount
of new material inserted are added together, and if the sum exceeds
the "break score", the filepair is broken into two. The break
score defaults to 50% of the size of the smaller of the original
and the result (i.e. if the edit shrinks the file, the size of
the result is used; if the edit lengthens the file, the size of
the original is used), and can be customized by giving a number
after "-B" option (e.g. "-B75" to tell it to use 75%).

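The break criterion described above can be written down as a small
sketch. The function names are hypothetical, and the note does not say
how the amounts of deletion and insertion are measured, so they are
simply taken as inputs here, assumed to be in the same unit as the two
sizes.

```python
def should_break(deleted, inserted, old_size, new_size, break_pct=50):
    """Return True if a modification counts as a "complete rewrite".

    The sum of deleted and inserted material is compared against
    break_pct percent of the smaller of the two sizes.
    """
    smaller = min(old_size, new_size)
    # Integer arithmetic: (deleted + inserted) > smaller * break_pct / 100
    return (deleted + inserted) * 100 > smaller * break_pct


def break_pair(old_id, new_id, path, null_id="0000000..."):
    """Split one modification into a delete and a create filepair."""
    return [
        ("D", old_id, null_id, path),
        ("A", null_id, new_id, path),
    ]
```

For example, an edit that shrinks a file from 100 to 50 units while
deleting 60 and inserting 10 gives a sum of 70 against a threshold of
25, so `should_break(60, 10, 100, 50)` is true and the filepair would
be broken.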

diffcore-rename: For Detecting Renames and Copies
-------------------------------------------------

This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option
(to detect copies as well) to the git-diff-* commands. If the
input contained these filepairs:

------------------------------------------------
:100644 000000 0123456... 0000000... D fileX
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

and the contents of the deleted file fileX are similar enough to
the contents of the created file file0, then rename detection
merges these filepairs and creates:

------------------------------------------------
:100644 100644 0123456... 0123456... R100 fileX file0
------------------------------------------------

When the "-C" option is used, the original contents of modified files
and deleted files (and also unmodified files, if the
"\--find-copies-harder" option is used) are considered as candidate
source files for the rename/copy operation. If the input were like
these filepairs, which talk about a modified file fileY and a newly
created file file0:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:000000 100644 0000000... bcd3456... A file0
------------------------------------------------

the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... bcd3456... C100 fileY file0
------------------------------------------------

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two
files are "similar enough", and can be customized to use
a similarity score different from the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).

Note. When the "-C" option is used with `\--find-copies-harder`
option, git-diff-\* commands feed unmodified filepairs to
diffcore mechanism as well as modified ones. This lets the copy
detector consider unmodified files as copy source candidates at
the expense of making it slower. Without `\--find-copies-harder`,
git-diff-\* commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.

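The pairing step of rename detection might be sketched as below. Note
the loud assumption: the similarity measure used here is Python's
`difflib.SequenceMatcher`, a stand-in chosen only to keep the example
short, and it is not the "extent of changes" algorithm git uses. The
function names and the dictionary-based representation are hypothetical
as well.

```python
from difflib import SequenceMatcher


def similarity_pct(old_text, new_text):
    """Stand-in similarity score in percent (NOT git's algorithm)."""
    return int(SequenceMatcher(None, old_text, new_text).ratio() * 100)


def detect_renames(deleted, added, min_score=50):
    """Pair deleted files with created files that are similar enough.

    deleted and added map path -> contents. Returns a tuple of
    (renames, remaining_deleted, remaining_added), where each rename
    is a (status, source, destination) tuple such as
    ("R100", "fileX", "file0").
    """
    renames = []
    remaining_deleted = dict(deleted)
    remaining_added = {}
    for dst, new_text in added.items():
        best = None
        for src, old_text in remaining_deleted.items():
            score = similarity_pct(old_text, new_text)
            if score >= min_score and (best is None or score > best[0]):
                best = (score, src)
        if best is None:
            remaining_added[dst] = new_text
        else:
            score, src = best
            # Each deleted file is used as a rename source at most once.
            del remaining_deleted[src]
            renames.append((f"R{score:03d}", src, dst))
    return renames, remaining_deleted, remaining_added
```

Copy detection would differ mainly in where the candidate sources come
from (modified files, and unmodified ones with `\--find-copies-harder`)
and in leaving the source filepair in place.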

diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
--------------------------------------------------------------------

This transformation is used to merge filepairs broken by
diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification. This always
runs when diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a
different "extent of changes" computation from the ones used by
diffcore-break and diffcore-rename. It counts only the deletion
from the original, and does not count insertion. If you removed
only 10 lines from a 100-line document, even if you added 910
new lines to make a new 1000-line document, you did not do a
complete rewrite. diffcore-break breaks such a case in order to
help diffcore-rename to consider such filepairs as candidates for
rename/copy detection, but if filepairs broken that way were not
matched with other filepairs to create rename/copy, then this
transformation merges them back into the original
"modification".

The "extent of changes" parameter can be tweaked from the
default 80% (that is, unless more than 80% of the original
material is deleted, the broken pairs are merged back into a
single modification) by giving a second number to -B option,
like these:

* -B50/60 (give 50% "break score" to diffcore-break, use 60%
  for diffcore-merge-broken).

* -B/60 (the same as above, since diffcore-break defaults to 50%).

Note that earlier implementations left a broken pair as separate
creation and deletion patches. This was an unnecessary hack, and
the latest implementation always merges all the broken pairs
back into modifications, but the resulting patch output is
formatted differently for easier review in case of such
a complete rewrite by showing the entire contents of old version
prefixed with '-', followed by the entire contents of new
version prefixed with '+'.

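The deletion-only criterion described in this section can be sketched
as follows. The function names and the tuple layout are hypothetical,
and the amount of deleted material is taken as an input in the same
unit as the original size (lines, in the example from the text).

```python
def is_complete_rewrite(deleted, old_size, merge_pct=80):
    """Return True if more than merge_pct percent of the original is gone.

    Only deletion from the original counts; insertion is ignored.
    """
    return deleted * 100 > old_size * merge_pct


def merge_broken(delete_pair, create_pair):
    """Join a broken delete/create pair back into one modification.

    delete_pair is ("D", old_id, null_id, path) and create_pair is
    ("A", null_id, new_id, path); both must name the same path.
    """
    _, old_id, _, old_path = delete_pair
    _, _, new_id, new_path = create_pair
    if old_path != new_path:
        raise ValueError("not two halves of the same broken filepair")
    return ("M", old_id, new_id, old_path)
```

With the numbers from the example above, `is_complete_rewrite(10, 100)`
is false no matter how many lines were added, so the broken pair would
be merged back into a plain modification.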

diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
---------------------------------------------------------------------

This transformation is used to find filepairs that represent
changes that touch a specified string, and is controlled by the
-S option and the `\--pickaxe-all` option to the git-diff-*
commands.

When diffcore-pickaxe is in use, it checks if there are
filepairs whose "result" side has the specified string and
whose "original" side does not. Such a filepair represents "the
string appeared in this changeset". It also checks for the
opposite case that loses the specified string.

When `\--pickaxe-all` is not in effect, diffcore-pickaxe leaves
only such filepairs that touch the specified string in its
output. When `\--pickaxe-all` is used, diffcore-pickaxe leaves all
filepairs intact if there is such a filepair, or makes the
output empty otherwise. The latter behaviour is designed to
make reviewing of the changes in the context of the whole
changeset easier.

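A minimal sketch of the selection logic described in this section,
assuming a filepair is represented as a hypothetical
`(path, original_text, result_text)` tuple and that "touching" the
string means exactly one of the two sides contains it:

```python
def touches(pair, needle):
    """True if the string appears on exactly one side of the filepair."""
    _, original, result = pair
    return (needle in original) != (needle in result)


def diffcore_pickaxe(pairs, needle, pickaxe_all=False):
    """Filter filepairs the way -S and --pickaxe-all are described above."""
    hits = [pair for pair in pairs if touches(pair, needle)]
    if pickaxe_all:
        # Keep the whole changeset if anything touched the string,
        # otherwise make the output empty.
        return list(pairs) if hits else []
    return hits
```

Without `pickaxe_all` only the filepairs that gained or lost the string
survive; with it, the entire list is kept as context whenever at least
one such filepair exists.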

diffcore-order: For Sorting the Output Based on Filenames
---------------------------------------------------------

This is used to reorder the filepairs according to the user's
(or project's) taste, and is controlled by the -O option to the
git-diff-* commands.

This takes a text file, each line of which is a shell glob
pattern. Filepairs that match a glob pattern on an earlier line
in the file are output before ones that match a later line, and
filepairs that do not match any glob pattern are output last.

As an example, a typical orderfile for the core git would
probably look like this:

------------------------------------------------
README
Makefile
Documentation
*.h
*.c
t
------------------------------------------------

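The ordering rule can be sketched with Python's `fnmatch` module. The
function names are hypothetical, and one assumption is made so that
entries such as "Documentation" and "t" in the example orderfile can
apply to files inside those directories: a pattern is tried against the
full path and against each of its leading directories.

```python
from fnmatch import fnmatchcase


def order_rank(path, patterns):
    """Index of the first orderfile pattern matching path.

    Paths that match no pattern get rank len(patterns), i.e. sort last.
    """
    parts = path.split("/")
    for rank, pattern in enumerate(patterns):
        # Assumption: try the full path, then each leading directory.
        for depth in range(len(parts), 0, -1):
            if fnmatchcase("/".join(parts[:depth]), pattern):
                return rank
    return len(patterns)


def diffcore_order(paths, patterns):
    """Reorder paths by orderfile rank; sorted() keeps ties in input order."""
    return sorted(paths, key=lambda path: order_rank(path, patterns))
```

Given the six patterns of the example orderfile, a path such as
"cache.h" is ranked by the "*.h" line and "t/t0000-basic.sh" by the "t"
line, while a file like "COPYING" matches nothing and is output last.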