Tweaking diff output
====================
June 2005


Introduction
------------

The diff commands git-diff-cache, git-diff-files, and
git-diff-tree can be told to manipulate the differences they find
in unconventional ways before showing diff(1) output.  This
manipulation is collectively called "diffcore transformation".
This short note describes what these transformations are and how
to use them to produce diff output that is easier to understand
than the conventional kind.


The chain of operation
----------------------

The git-diff-* family works by first comparing two sets of
files:

 - git-diff-cache compares the contents of a "tree" object and
   the working directory (when the '--cached' flag is not used),
   or a "tree" object and the index file (when the '--cached'
   flag is used);

 - git-diff-files compares the contents of the index file and
   the working directory;

 - git-diff-tree compares the contents of two "tree" objects.

In all of these cases, the commands themselves compare
corresponding paths in the two sets of files.  The result of the
comparison is passed from these commands to what is internally
called "diffcore", in a format similar to what is output when
the -p option is not used.  For example:

------------------------------------------------
in-place edit  :100644 100644 bcd1234... 0123456... M file0
create         :000000 100644 0000000... 1234567... A file4
delete         :100644 000000 1234567... 0000000... D file5
unmerged       :000000 000000 0000000... 0000000... U file6
------------------------------------------------
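
To make the format above concrete, here is a minimal Python sketch that splits one such line into its fields. It assumes that the leading words ("in-place edit", "create", and so on) are only annotations for the reader, and that each real line begins with ':'. The FilePair class and the parse_raw_line function are hypothetical names, not part of git.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilePair:
    """One comparison result; field names are illustrative only."""
    src_mode: str
    dst_mode: str
    src_id: str
    dst_id: str
    status: str                     # e.g. "M", "A", "D", "U", "R100", "C100"
    src_path: str
    dst_path: Optional[str] = None  # second path, present for R/C entries


def parse_raw_line(line: str) -> FilePair:
    """Parse one line such as ':100644 100644 bcd1234... 0123456... M file0'.

    Fields are split on whitespace, so paths containing spaces are not
    handled by this sketch.
    """
    if not line.startswith(":"):
        raise ValueError(f"raw line must start with ':': {line!r}")
    fields = line[1:].split()
    if len(fields) < 6:
        raise ValueError(f"too few fields: {line!r}")
    src_mode, dst_mode, src_id, dst_id, status = fields[:5]
    paths = fields[5:]
    return FilePair(src_mode, dst_mode, src_id, dst_id, status,
                    paths[0], paths[1] if len(paths) > 1 else None)
```

For instance, parsing the "create" line yields a FilePair whose status is "A" and whose src_path is "file4".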

The diffcore mechanism is fed a list of such comparison results
(each of which is called a "filepair", although at this point each
of them talks about a single file), and transforms that list
into another list.  There are currently 6 such transformations:

 - diffcore-pathspec
 - diffcore-break
 - diffcore-rename
 - diffcore-merge-broken
 - diffcore-pickaxe
 - diffcore-order

These are applied in sequence.  The set of filepairs git-diff-\*
commands find is used as the input to diffcore-pathspec, and
the output from diffcore-pathspec is used as the input to the
next transformation.  The final result is then passed to the
output routine, which generates either diff-raw format (see the
Output format sections of the manual for git-diff-\* commands) or
diff-patch format.
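
The chaining can be pictured as a simple fold over a list of stages. In this hypothetical Python sketch, each stage is any function that takes a list of filepairs and returns a list of filepairs. The identity placeholders stand in for the real transformations, which are not modelled here.

```python
from typing import Callable, Dict, List

FilePair = Dict[str, str]  # hypothetical stand-in: {"status": ..., "path": ...}
Transform = Callable[[List[FilePair]], List[FilePair]]


def identity(pairs: List[FilePair]) -> List[FilePair]:
    """Placeholder for a stage that changes nothing."""
    return pairs


def run_diffcore(pairs: List[FilePair], stages: List[Transform]) -> List[FilePair]:
    """Apply each stage in order; each stage's output is the next one's input."""
    for stage in stages:
        pairs = stage(pairs)
    return pairs


# Stage order as listed in this note; every stage is a no-op placeholder here.
CHAIN: List[Transform] = [
    identity,  # diffcore-pathspec
    identity,  # diffcore-break
    identity,  # diffcore-rename
    identity,  # diffcore-merge-broken
    identity,  # diffcore-pickaxe
    identity,  # diffcore-order
]
```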


diffcore-pathspec
-----------------

The first transformation in the chain is diffcore-pathspec, and
it is controlled by giving pathname parameters to the
git-diff-* commands on the command line.  The pathspec is used
to limit the world diff operates in: it removes the filepairs
outside the specified set of pathnames.

Implementation note.  For performance reasons, git-diff-tree
uses the pathname parameters on the command line to cull the set
of filepairs it feeds to the diffcore mechanism itself, and does
not use diffcore-pathspec, but the end result is the same.
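
Below is a sketch of the culling step. It assumes that a pathname parameter matches a path that is equal to it or that lies under it as a directory; this note does not state that rule. The helpers filter_by_pathspec and matches_pathspec are hypothetical, and they work on bare paths rather than full filepairs.

```python
from typing import List


def matches_pathspec(path: str, specs: List[str]) -> bool:
    """True if path equals a spec or lies under a spec directory.
    An empty spec list is taken to mean "no limiting" (assumption)."""
    if not specs:
        return True
    for spec in specs:
        spec = spec.rstrip("/")
        if path == spec or path.startswith(spec + "/"):
            return True
    return False


def filter_by_pathspec(paths: List[str], specs: List[str]) -> List[str]:
    """Drop paths outside the specified set, keeping the input order."""
    return [p for p in paths if matches_pathspec(p, specs)]
```

Note that "Doc.txt" is not kept by a pathspec of "Documentation" under this rule, because matching is by whole path components rather than by raw string prefix.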
81 | ||
82 | ||
83 | diffcore-break | |
84 | -------------- | |
85 | ||
86 | The second transformation in the chain is diffcore-break, and is | |
87 | controlled by the -B option to the git-diff-* commands. This is | |
88 | used to detect a filepair that represents "complete rewrite" and | |
89 | break such filepair into two filepairs that represent delete and | |
90 | create. E.g. If the input contained this filepair: | |
91 | ||
8db9307c JH |
92 | ------------------------------------------------ |
93 | :100644 100644 bcd1234... 0123456... M file0 | |
94 | ------------------------------------------------ | |
4a1332d0 JH |
95 | |
96 | and if it detects that the file "file0" is completely rewritten, | |
97 | it changes it to: | |
98 | ||
8db9307c JH |
99 | ------------------------------------------------ |
100 | :100644 000000 bcd1234... 0000000... D file0 | |
101 | :000000 100644 0000000... 0123456... A file0 | |
102 | ------------------------------------------------ | |

For the purpose of breaking a filepair, diffcore-break examines
the extent of changes between the contents of the files before
and after modification (i.e. the contents that have "bcd1234..."
and "0123456..." as their SHA1 content IDs in the above
example).  The amount of original content deleted and the amount
of new material inserted are added together, and if the sum
exceeds the "break score", the filepair is broken into two.  The
break score defaults to 50% of the size of the smaller of the
original and the result (i.e. if the edit shrinks the file, the
size of the result is used; if the edit lengthens the file, the
size of the original is used), and it can be customized by giving
a number after the "-B" option (e.g. "-B75" to use 75%).
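
The break decision can be sketched as follows. This note does not say how deletions and insertions are measured. This sketch therefore assumes a line-based multiset comparison, which is not claimed to be what git does. The names change_extent and should_break are hypothetical, and break_score is the fraction given after -B.

```python
from collections import Counter
from typing import List, Tuple


def change_extent(src: List[str], dst: List[str]) -> Tuple[int, int]:
    """Return (deleted, inserted), counting lines that have no partner
    on the other side (a multiset comparison, assumed for illustration)."""
    before, after = Counter(src), Counter(dst)
    deleted = sum((before - after).values())   # original lines that vanished
    inserted = sum((after - before).values())  # result lines that are new
    return deleted, inserted


def should_break(src: List[str], dst: List[str], break_score: float = 0.5) -> bool:
    """Break the pair if deletions + insertions exceed break_score times
    the size (in lines, here) of the smaller of the two versions."""
    deleted, inserted = change_extent(src, dst)
    smaller = min(len(src), len(dst))
    return deleted + inserted > break_score * smaller
```

Changing one line of a four-line file gives 1 + 1 = 2, which does not exceed 50% of 4, so the pair is kept. A full replacement gives 8, so the pair is broken.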
116 | ||
117 | ||
118 | diffcore-rename | |
119 | --------------- | |
120 | ||
121 | This transformation is used to detect renames and copies, and is | |
122 | controlled by the -M option (to detect renames) and the -C option | |
123 | (to detect copies as well) to the git-diff-* commands. If the | |
124 | input contained these filepairs: | |
125 | ||
8db9307c JH |
126 | ------------------------------------------------ |
127 | :100644 000000 0123456... 0000000... D fileX | |
128 | :000000 100644 0000000... 0123456... A file0 | |
129 | ------------------------------------------------ | |
4a1332d0 JH |
130 | |
131 | and the contents of the deleted file fileX is similar enough to | |
132 | the contents of the created file file0, then rename detection | |
133 | merges these filepairs and creates: | |
134 | ||
8db9307c JH |
135 | ------------------------------------------------ |
136 | :100644 100644 0123456... 0123456... R100 fileX file0 | |
137 | ------------------------------------------------ | |
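
Here is a rough sketch of how a deleted file and a created file might be paired into a rename. It makes two simplifying assumptions:

- similarity is line-based: shared lines divided by the larger file's line count;
- matching is greedy and first-come.

The names similarity and detect_renames are hypothetical. The returned percentage corresponds to the number after "R" in the output above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def similarity(src: List[str], dst: List[str]) -> float:
    """Shared lines relative to the larger file (a stand-in measure)."""
    if not src and not dst:
        return 1.0
    common = sum((Counter(src) & Counter(dst)).values())
    return common / max(len(src), len(dst))


def detect_renames(deleted: Dict[str, List[str]],
                   created: Dict[str, List[str]],
                   min_score: float = 0.5) -> List[Tuple[str, str, int]]:
    """Pair each created file with the most similar unused deleted file.
    Returns (old_path, new_path, score_percent) tuples."""
    renames = []
    used = set()
    for new_path, new_lines in created.items():
        best, best_score = None, min_score
        for old_path, old_lines in deleted.items():
            if old_path in used:
                continue
            score = similarity(old_lines, new_lines)
            if score >= best_score:  # at or above the threshold qualifies
                best, best_score = old_path, score
        if best is not None:
            used.add(best)
            renames.append((best, new_path, round(best_score * 100)))
    return renames
```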

When the "-C" option is used, the original contents of modified
files and the contents of unchanged files are considered as
candidate sources for a rename/copy operation, in addition to the
deleted files.  If the input contained these filepairs, which
talk about a modified file fileY and a newly created file file0:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

then the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, the filepairs
are changed to:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... 0123456... C100 fileY file0
------------------------------------------------

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two
files are "similar enough".  It can be customized to use a
similarity score other than the default 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to
use 8/10 = 80%).
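
The "-M8" example suggests that the number is read as a decimal fraction. The hypothetical parser below assumes that reading, which also agrees with "-B75" meaning 75% in the diffcore-break section.

```python
import re


def parse_score(digits: str, default: float = 0.5) -> float:
    """Read digits as a decimal fraction: "8" -> 0.8, "75" -> 0.75."""
    if digits == "":
        return default
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"not a score: {digits!r}")
    return int(digits) / 10 ** len(digits)


def parse_rename_option(opt: str) -> float:
    """Minimum similarity for an option such as -M, -M8 or -C75."""
    if opt[:2] not in ("-M", "-C"):
        raise ValueError(f"not a -M/-C option: {opt!r}")
    return parse_score(opt[2:])
```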

Note.  When the "-C" option is used with the --find-copies-harder
option, git-diff-\* commands feed unmodified filepairs to the
diffcore mechanism as well as modified ones.  This lets the copy
detector consider unmodified files as copy source candidates, at
the expense of making it slower.  Without --find-copies-harder,
git-diff-\* commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.
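
The small sketch below summarizes which files can serve as sources. The (status, path) tuples and the "=" marker for an unmodified filepair are hypothetical conventions used only here. fed_pairs models what a git-diff-* command feeds to diffcore, and source_candidates models what rename/copy detection then considers.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (status, path); "=" marks an unmodified file (hypothetical)


def fed_pairs(all_pairs: List[Pair], find_copies_harder: bool) -> List[Pair]:
    """Unmodified filepairs are fed only when --find-copies-harder is given."""
    return [p for p in all_pairs if p[0] != "=" or find_copies_harder]


def source_candidates(pairs: List[Pair], find_copies: bool) -> List[str]:
    """Deleted files are always candidates; -C adds modified and unmodified ones."""
    wanted = {"D"} | ({"M", "="} if find_copies else set())
    return [path for status, path in pairs if status in wanted]
```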


diffcore-merge-broken
---------------------

This transformation is used to merge filepairs that were broken by
diffcore-break, and were not transformed into rename/copy by
diffcore-rename, back into a single modification.  It always
runs when diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a
different "extent of changes" computation from the ones used by
diffcore-break and diffcore-rename.  It counts only the deletion
from the original, and does not count insertion.  If you removed
only 10 lines from a 100-line document, even if you added 910
new lines to make a new 1000-line document, you did not do a
complete rewrite.  diffcore-break breaks such a case in order to
help diffcore-rename consider such filepairs as candidates for
rename/copy detection, but if filepairs broken that way are not
matched with other filepairs to create a rename/copy, this
transformation merges them back into the original
"modification".
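
Below is a sketch of the deletion-only test, using the 100-line example above. Measuring deletion by counting unmatched lines is an assumption, and should_stay_broken is a hypothetical name. merge_score defaults to 0.8, the 80% default this note gives below.

```python
from collections import Counter
from typing import List


def should_stay_broken(src: List[str], dst: List[str], merge_score: float = 0.8) -> bool:
    """Keep an unmatched broken pair apart only if more than merge_score of
    the original was deleted; insertions are deliberately not counted."""
    deleted = sum((Counter(src) - Counter(dst)).values())
    return deleted > merge_score * len(src)
```

In the example, 10 of 100 original lines are deleted. That is well under 80%, so the pair is merged back into a modification no matter how many lines were added.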

The "extent of changes" parameter can be tweaked from the
default 80% (that is, unless more than 80% of the original
material is deleted, the broken pairs are merged back into a
single modification) by giving a second number to the -B option,
as in these examples:

 * -B50/60 (give 50% "break score" to diffcore-break, use 60%
   for diffcore-merge-broken).

 * -B/60 (the same as above, since diffcore-break defaults to 50%).
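
Both forms can be handled by one parser. This hypothetical sketch assumes that each number is read as a decimal fraction, so "-B50" and "-B5" would both mean 50%. When a number is omitted, it falls back to the 50% and 80% defaults given above.

```python
import re
from typing import Optional, Tuple


def parse_break_option(opt: str) -> Tuple[float, float]:
    """Split -B[<break>][/<merge>] into (break_score, merge_score)."""
    m = re.fullmatch(r"-B([0-9]*)(?:/([0-9]*))?", opt)
    if m is None:
        raise ValueError(f"not a -B option: {opt!r}")

    def score(digits: Optional[str], default: float) -> float:
        # Missing or empty digits fall back to the documented default.
        return int(digits) / 10 ** len(digits) if digits else default

    return score(m.group(1), 0.5), score(m.group(2), 0.8)
```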

Note that earlier implementations left a broken pair as separate
creation and deletion patches.  This was an unnecessary hack, and
the latest implementation always merges all the broken pairs
back into modifications.  The resulting patch output, however, is
formatted differently to make reviewing such a complete rewrite
easier: it shows the entire contents of the old version
prefixed with '-', followed by the entire contents of the new
version prefixed with '+'.
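
The rewrite layout can be sketched as a single hunk. The '@@ -a,b +c,d @@' header follows the usual unified-diff convention and is an assumption of this sketch, not a claim about git's exact output. format_complete_rewrite is a hypothetical name.

```python
from typing import List


def _hunk_range(n: int) -> str:
    # Unified-diff convention: an empty side is written as "0,0".
    return f"{1 if n else 0},{n}"


def format_complete_rewrite(old: List[str], new: List[str]) -> List[str]:
    """Every old line prefixed with '-', then every new line with '+',
    with no interleaved context lines."""
    header = f"@@ -{_hunk_range(len(old))} +{_hunk_range(len(new))} @@"
    return [header] + ["-" + line for line in old] + ["+" + line for line in new]
```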


diffcore-pickaxe
----------------

This transformation is used to find filepairs that represent
changes that touch a specified string, and is controlled by the
-S option and the --pickaxe-all option to the git-diff-*
commands.

When diffcore-pickaxe is in use, it checks if there are
filepairs whose "result" side has the specified string and
whose "original" side does not.  Such a filepair represents "the
string appeared in this changeset".  It also checks for the
opposite case, a filepair that loses the specified string.

When --pickaxe-all is not in effect, diffcore-pickaxe leaves
only the filepairs that touch the specified string in its
output.  When --pickaxe-all is used, diffcore-pickaxe leaves all
filepairs intact if there is at least one such filepair, or makes
the output empty otherwise.  The latter behaviour is designed to
make reviewing the changes in the context of the whole
changeset easier.
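
Here is a minimal sketch of both modes. A filepair is simplified to (path, original text, result text), with None for a missing side. The sketch tests only whether the string is present on each side, which is how the check is described above. The names pickaxe and touches are hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str], Optional[str]]  # (path, original, result)


def touches(pair: Pair, needle: str) -> bool:
    """True if the string is on exactly one side: it appeared or disappeared."""
    _, original, result = pair
    in_original = original is not None and needle in original
    in_result = result is not None and needle in result
    return in_original != in_result


def pickaxe(pairs: List[Pair], needle: str, pickaxe_all: bool = False) -> List[Pair]:
    hits = [p for p in pairs if touches(p, needle)]
    if pickaxe_all:
        # Whole changeset if anything matched, otherwise nothing.
        return list(pairs) if hits else []
    return hits
```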
239 | ||
240 | ||
241 | diffcore-order | |
242 | -------------- | |
243 | ||
244 | This is used to reorder the filepairs according to the user's | |
245 | (or project's) taste, and is controlled by the -O option to the | |
246 | git-diff-* commands. | |
247 | ||
248 | This takes a text file each of whose line is a shell glob | |
249 | pattern. Filepairs that match a glob pattern on an earlier line | |
250 | in the file are output before ones that match a later line, and | |
251 | filepairs that do not match any glob pattern are output last. | |
252 | ||
253 | As an example, typical orderfile for the core GIT probably | |
8db9307c | 254 | would look like this: |
4a1332d0 | 255 | |
8db9307c | 256 | ------------------------------------------------ |
4a1332d0 JH |
257 | README |
258 | Makefile | |
259 | Documentation | |
260 | *.h | |
261 | *.c | |
262 | t | |
8db9307c | 263 | ------------------------------------------------ |
4a1332d0 | 264 |
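
Below is a sketch of the reordering. It assumes that a pattern also matches when it matches one of a path's leading directories, as the "Documentation" and "t" entries in the example suggest. The glob test uses Python's fnmatchcase. order_rank and order_paths are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List


def order_rank(path: str, patterns: List[str]) -> int:
    """Index of the first pattern matching the path or one of its leading
    directories; len(patterns) if nothing matches."""
    for i, pattern in enumerate(patterns):
        candidate = path
        while candidate:
            if fnmatchcase(candidate, pattern):
                return i
            if "/" not in candidate:
                break
            candidate = candidate.rsplit("/", 1)[0]  # drop the last component
    return len(patterns)


def order_paths(paths: List[str], orderfile_text: str) -> List[str]:
    patterns = [line.strip() for line in orderfile_text.splitlines() if line.strip()]
    # sorted() is stable, so paths with equal rank keep their input order.
    return sorted(paths, key=lambda p: order_rank(p, patterns))
```

With the orderfile above, README and Makefile come first, then Documentation/..., headers, C sources and t/..., and anything unmatched comes last.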