]>
Commit | Line | Data |
---|---|---|
bb80333c EN |
1 | Rebases and cherry-picks involve a sequence of merges whose results are |
2 | recorded as new single-parent commits. The first parent side of those | |
3 | merges represent the "upstream" side, and often include a far larger set of | |
4 | changes than the second parent side. Traditionally, the renames on the | |
5 | first-parent side of that sequence of merges were repeatedly re-detected | |
6 | for every merge. This file explains why it is safe and effective during | |
7 | rebases and cherry-picks to remember renames on the upstream side of | |
8 | history as an optimization, assuming all merges are automatic and clean | |
9 | (i.e. no conflicts and not interrupted for user input or editing). | |
10 | ||
11 | Outline: | |
12 | ||
13 | 0. Assumptions | |
14 | ||
15 | 1. How rebasing and cherry-picking work | |
16 | ||
17 | 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a | |
18 | superset of the renames on MERGE_SIDE1 for the next pick. | |
19 | ||
20 | 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also | |
21 | a rename on MERGE_SIDE1 for the next pick | |
22 | ||
c9dba103 | 23 | 4. A detailed description of the counter-examples to #3. |
bb80333c EN |
24 | |
25 | 5. Why the special cases in #4 are still fully reasonable to use to pair | |
26 | up files for three-way content merging in the merge machinery, and why | |
27 | they do not affect the correctness of the merge. | |
28 | ||
29 | 6. Interaction with skipping of "irrelevant" renames | |
30 | ||
31 | 7. Additional items that need to be cached | |
32 | ||
33 | 8. How directory rename detection interacts with the above and why this | |
34 | optimization is still safe even if merge.directoryRenames is set to | |
35 | "true". | |
36 | ||
37 | ||
38 | === 0. Assumptions === | |
39 | ||
40 | There are two assumptions that will hold throughout this document: | |
41 | ||
42 | * The upstream side where commits are transplanted to is treated as the | |
43 | first parent side when rebase/cherry-pick call the merge machinery | |
44 | ||
45 | * All merges are fully automatic | |
46 | ||
47 | and a third that will hold in sections 2-5 for simplicity, that I'll later | |
48 | address in section 8: | |
49 | ||
50 | * No directory renames occur | |
51 | ||
52 | ||
53 | Let me explain more about each assumption and why I include it: | |
54 | ||
55 | ||
56 | The first assumption is merely for the purposes of making this document | |
57 | clearer; the optimization implementation does not actually depend upon it. | |
58 | However, the assumption does hold in all cases because it reflects the way | |
59 | that both rebase and cherry-pick were implemented; and the implementation | |
60 | of cherry-pick and rebase are not readily changeable for backwards | |
61 | compatibility reasons (see for example the discussion of the --ours and | |
62 | --theirs flag in the documentation of `git checkout`, particularly the | |
63 | comments about how they behave with rebase). The optimization avoids | |
64 | checking first-parent-ness, though. It checks the conditions that make the | |
65 | optimization valid instead, so it would still continue working if someone | |
66 | changed the parent ordering that cherry-pick and rebase use. But making | |
67 | this assumption does make this document much clearer and prevents me from | |
68 | having to repeat every example twice. | |
69 | ||
70 | If the second assumption is violated, then the optimization simply is | |
71 | turned off and thus isn't relevant to consider. The second assumption can | |
72 | also be stated as "there is no interruption for a user to resolve conflicts | |
73 | or to just further edit or tweak files". While real rebases and | |
74 | cherry-picks are often interrupted (either because it's an interactive | |
75 | rebase where the user requested to stop and edit, or because there were | |
76 | conflicts that the user needs to resolve), the cache of renames is not | |
77 | stored on disk, and thus is thrown away as soon as the rebase or cherry | |
78 | pick stops for the user to resolve the operation. | |
79 | ||
80 | The third assumption makes sections 2-5 simpler, and allows people to | |
81 | understand the basics of why this optimization is safe and effective, and | |
82 | then I can go back and address the specifics in section 8. It is probably | |
83 | also worth noting that if directory renames do occur, then the default of | |
84 | merge.directoryRenames being set to "conflict" means that the operation | |
85 | will stop for users to resolve the conflicts and the cache will be thrown | |
86 | away, and thus that there won't be an optimization to apply. So, the only | |
87 | reason we need to address directory renames specifically, is that some | |
88 | users will have set merge.directoryRenames to "true" to allow the merges to | |
89 | continue to proceed automatically. The optimization is still safe with | |
90 | this config setting, but we have to discuss a few more cases to show why; | |
91 | this discussion is deferred until section 8. | |
92 | ||
93 | ||
94 | === 1. How rebasing and cherry-picking work === | |
95 | ||
96 | Consider the following setup (from the git-rebase manpage): | |
97 | ||
98 | A---B---C topic | |
99 | / | |
100 | D---E---F---G main | |
101 | ||
102 | After rebasing or cherry-picking topic onto main, this will appear as: | |
103 | ||
104 | A'--B'--C' topic | |
105 | / | |
106 | D---E---F---G main | |
107 | ||
108 | The way the commits A', B', and C' are created is through a series of | |
109 | merges, where rebase or cherry-pick sequentially uses each of the three | |
110 | A-B-C commits in a special merge operation. Let's label the three commits | |
111 | in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For | |
112 | this picture, the three commits for each of the three merges would be: | |
113 | ||
114 | To create A': | |
115 | MERGE_BASE: E | |
116 | MERGE_SIDE1: G | |
117 | MERGE_SIDE2: A | |
118 | ||
119 | To create B': | |
120 | MERGE_BASE: A | |
121 | MERGE_SIDE1: A' | |
122 | MERGE_SIDE2: B | |
123 | ||
124 | To create C': | |
125 | MERGE_BASE: B | |
126 | MERGE_SIDE1: B' | |
127 | MERGE_SIDE2: C | |
128 | ||
129 | Sometimes, folks are surprised that these three-way merges are done. It | |
130 | can be useful in understanding these three-way merges to view them in a | |
131 | slightly different light. For example, in creating C', you can view it as | |
132 | either: | |
133 | ||
134 | * Apply the changes between B & C to B' | |
135 | * Apply the changes between B & B' to C | |
136 | ||
137 | Conceptually the two statements above are the same as a three-way merge of | |
138 | B, B', and C, at least the parts before you decide to record a commit. | |
139 | ||
140 | ||
141 | === 2. Why the renames on MERGE_SIDE1 in any given pick are always a === | |
142 | === superset of the renames on MERGE_SIDE1 for the next pick. === | |
143 | ||
144 | The merge machinery uses the filenames it is fed from MERGE_BASE, | |
145 | MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different | |
146 | filename under one of three conditions: | |
147 | ||
148 | * To make both pieces of a conflict available to a user during conflict | |
149 | resolution (examples: directory/file conflict, add/add type conflict | |
150 | such as symlink vs. regular file) | |
151 | ||
152 | * When MERGE_SIDE1 renames the file. | |
153 | ||
154 | * When MERGE_SIDE2 renames the file. | |
155 | ||
156 | First, let's remember what commits are involved in the first and second | |
157 | picks of the cherry-pick or rebase sequence: | |
158 | ||
159 | To create A': | |
160 | MERGE_BASE: E | |
161 | MERGE_SIDE1: G | |
162 | MERGE_SIDE2: A | |
163 | ||
164 | To create B': | |
165 | MERGE_BASE: A | |
166 | MERGE_SIDE1: A' | |
167 | MERGE_SIDE2: B | |
168 | ||
169 | So, in particular, we need to show that the renames between E and G are a | |
170 | superset of those between A and A'. | |
171 | ||
172 | A' is created by the first merge. A' will only have renames for one of the | |
173 | three reasons listed above. The first case, a conflict, results in a | |
174 | situation where the cache is dropped and thus this optimization doesn't | |
175 | take effect, so we need not consider that case. The third case, a rename | |
176 | on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up | |
177 | in A -- therefore when diffing A and A' that path does not show up as a | |
178 | rename. The only remaining way for renames to show up in A' is for the | |
179 | rename to come from MERGE_SIDE1. Therefore, all renames between A and A' | |
180 | are a subset of those between E and G. Equivalently, all renames between E | |
181 | and G are a superset of those between A and A'. | |
182 | ||
183 | ||
184 | === 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === | |
185 | === always also a rename on MERGE_SIDE1 for the next pick. === | |
186 | ||
187 | Let's again look at the first two picks: | |
188 | ||
189 | To create A': | |
190 | MERGE_BASE: E | |
191 | MERGE_SIDE1: G | |
192 | MERGE_SIDE2: A | |
193 | ||
194 | To create B': | |
195 | MERGE_BASE: A | |
196 | MERGE_SIDE1: A' | |
197 | MERGE_SIDE2: B | |
198 | ||
199 | Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. | |
200 | any given rename from E to G. Let's use the filenames 'oldfile' and | |
201 | 'newfile' for demonstration purposes. That first pick will function as | |
202 | follows; when the rename is detected, the merge machinery will do a | |
203 | three-way content merge of the following: | |
204 | E:oldfile | |
205 | G:newfile | |
206 | A:oldfile | |
207 | and produce a new result: | |
208 | A':newfile | |
209 | ||
210 | Note above that I've assumed that E->A did not rename oldfile. If that | |
211 | side did rename, then we most likely have a rename/rename(1to2) conflict | |
212 | that will cause the rebase or cherry-pick operation to halt and drop the | |
213 | in-memory cache of renames and thus doesn't need to be considered further. | |
214 | In the special case that E->A does rename the file but also renames it to | |
215 | newfile, then there is no conflict from the renaming and the merge can | |
216 | succeed. In this special case, the rename is not valid to cache because | |
a22099f5 EN |
217 | the second merge will find A:newfile in the MERGE_BASE (see also the new |
218 | testcases in t6429 with "rename same file identically" in their | |
219 | description). So a rename/rename(1to1) needs to be specially handled by | |
220 | pruning renames from the cache and decrementing the dir_rename_counts in | |
221 | the current and leading directories associated with those renames. Or, | |
222 | since these are really rare, one could just take the easy way out and | |
223 | disable the remembering renames optimization when a rename/rename(1to1) | |
224 | happens. | |
bb80333c EN |
225 | |
226 | The previous paragraph handled the cases for E->A renaming oldfile, let's | |
227 | continue assuming that oldfile is not renamed in A. | |
228 | ||
229 | As per the diagram for creating B', MERGE_SIDE1 involves the changes from A | |
230 | to A'. So, we are curious whether A:oldfile and A':newfile will be viewed | |
231 | as renames. Note that: | |
232 | ||
233 | * There will be no A':oldfile (because there could not have been a | |
234 | G:oldfile as we do not do break detection in the merge machinery and | |
235 | G:newfile was detected as a rename, and by the construction of the | |
236 | rename above that merged cleanly, the merge machinery will ensure there | |
237 | is no 'oldfile' in the result). | |
238 | ||
239 | * There will be no A:newfile (if there had been, we would have had a | |
240 | rename/add conflict). | |
241 | ||
242 | * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a | |
243 | clean three-way content merge involving A:oldfile). | |
244 | ||
245 | We can also expound on the third point above, by noting that three-way | |
246 | content merges can also be viewed as applying the differences between the | |
247 | base and one side to the other side. Thus we can view A':newfile as | |
248 | having been created by taking the changes between E:oldfile and G:newfile | |
249 | (which were detected as being related, i.e. <50% changed) to A:oldfile. | |
250 | ||
251 | Thus A:oldfile and A':newfile are just as related as E:oldfile and | |
252 | G:newfile are -- they have exactly identical differences. Since the latter | |
253 | were detected as renames, A:oldfile and A':newfile should also be | |
254 | detectable as renames almost always. | |
255 | ||
256 | ||
257 | === 4. A detailed description of the counter-examples to #3. === | |
258 | ||
259 | We already noted in section 3 that rename/rename(1to1) (i.e. both sides | |
260 | renaming a file the same way) was one counter-example. The more | |
261 | interesting bit, though, is why did we need to use the "almost" qualifier | |
262 | when stating that A:oldfile and A':newfile are "almost" always detectable | |
263 | as renames? | |
264 | ||
265 | Let's repeat an earlier point that section 3 made: | |
266 | ||
267 | A':newfile was created by applying the changes between E:oldfile and | |
268 | G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were | |
269 | <50% of the size of E:oldfile. | |
270 | ||
271 | If those changes that were <50% of the size of E:oldfile are also <50% of | |
272 | the size of A:oldfile, then A:oldfile and A':newfile will be detectable as | |
273 | renames. However, if there is a dramatic size reduction between E:oldfile | |
274 | and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile | |
275 | still somehow merge cleanly), then traditional rename detection would not | |
276 | detect A:oldfile and A':newfile as renames. | |
277 | ||
278 | Here's an example where that can happen: | |
279 | * E:oldfile had 20 lines | |
280 | * G:newfile added 10 new lines at the beginning of the file | |
281 | * A:oldfile kept the first 3 lines of the file, and deleted all the rest | |
282 | then | |
283 | => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. | |
284 | E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and | |
285 | A':newfile would not be. | |
286 | ||
287 | ||
288 | === 5. Why the special cases in #4 are still fully reasonable to use to === | |
289 | === pair up files for three-way content merging in the merge machinery, === | |
290 | === and why they do not affect the correctness of the merge. === | |
291 | ||
292 | In the rename/rename(1to1) case, A:newfile and A':newfile are not renames | |
293 | since they use the *same* filename. However, files with the same filename | |
294 | are obviously fine to pair up for three-way content merging (the merge | |
295 | machinery has never employed break detection). The interesting | |
296 | counter-example case is thus not the rename/rename(1to1) case, but the case | |
297 | where A did not rename oldfile. That was the case that we spent most of | |
298 | the time discussing in sections 3 and 4. The remainder of this section | |
299 | will be devoted to that case as well. | |
300 | ||
301 | So, even if A:oldfile and A':newfile aren't detectable as renames, why is | |
302 | it still reasonable to pair them up for three-way content merging in the | |
303 | merge machinery? There are multiple reasons: | |
304 | ||
305 | * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile | |
306 | is *exactly* the same as the diff between E:oldfile and G:newfile. The | |
307 | latter pair were detected as renames, so it seems unlikely to surprise | |
308 | users for us to treat A:oldfile and A':newfile as renames. | |
309 | ||
310 | * In fact, "oldfile" and "newfile" were at one point detected as renames | |
311 | due to how they were constructed in the E..G chain. And we used that | |
312 | information once already in this rebase/cherry-pick. I think users | |
313 | would be unlikely to be surprised at us continuing to treat the files | |
314 | as renames and would quickly understand why we had done so. | |
315 | ||
316 | * Marking or declaring files as renames is *not* the end goal for merges. | |
317 | Merges use renames to determine which files make sense to be paired up | |
318 | for three-way content merges. | |
319 | ||
320 | * A:oldfile and A':newfile were _already_ paired up in a three-way | |
321 | content merge; that is how A':newfile was created. In fact, that | |
322 | three-way content merge was clean. So using them again in a later | |
323 | three-way content merge seems very reasonable. | |
324 | ||
325 | However, the above is focusing on the common scenarios. Let's try to look | |
326 | at all possible unusual scenarios and compare without the optimization to | |
327 | with the optimization. Consider the following theoretical cases; we will | |
328 | then dive into each to determine which of them are possible, | |
329 | and if so, what they mean: | |
330 | ||
331 | 1. Without the optimization, the second merge results in a conflict. | |
332 | With the optimization, the second merge also results in a conflict. | |
333 | Questions: Are the conflicts confusingly different? Better in one case? | |
334 | ||
335 | 2. Without the optimization, the second merge results in NO conflict. | |
336 | With the optimization, the second merge also results in NO conflict. | |
337 | Questions: Are the merges the same? | |
338 | ||
339 | 3. Without the optimization, the second merge results in a conflict. | |
340 | With the optimization, the second merge results in NO conflict. | |
341 | Questions: Possible? Bug, bugfix, or something else? | |
342 | ||
343 | 4. Without the optimization, the second merge results in NO conflict. | |
344 | With the optimization, the second merge results in a conflict. | |
345 | Questions: Possible? Bug, bugfix, or something else? | |
346 | ||
347 | I'll consider all four cases, but out of order. | |
348 | ||
349 | The fourth case is impossible. For the code without the remembering | |
350 | renames optimization to not get a conflict, B:oldfile would need to exactly | |
351 | match A:oldfile -- if it doesn't, there would be a modify/delete conflict. | |
352 | If A:oldfile matches B:oldfile exactly, then a three-way content merge | |
353 | between A:oldfile, A':newfile, and B:oldfile would have no conflict and | |
354 | just give us the version of newfile from A' as the result. | |
355 | ||
356 | From the same logic as the above paragraph, the second case would indeed | |
357 | result in identical merges. When A:oldfile exactly matches B:oldfile, an | |
358 | undetected rename would say, "Oh, I see one side didn't modify 'oldfile' | |
359 | and the other side deleted it. I'll delete it. And I see you have this | |
360 | brand new file named 'newfile' in A', so I'll keep it." That gives the | |
361 | same results as three-way content merging A:oldfile, A':newfile, and | |
362 | B:oldfile -- a removal of oldfile with the version of newfile from A' | |
363 | showing up in the result. | |
364 | ||
365 | The third case is interesting. It means that A:oldfile and A':newfile were | |
366 | not just similar enough, but that the changes between them did not conflict | |
367 | with the changes between A:oldfile and B:oldfile. This would validate our | |
368 | hunch that the files were similar enough to be used in a three-way content | |
369 | merge, and thus seems entirely correct for us to have used them that way. | |
370 | (Sidenote: One particular example here may be enlightening. Let's say that | |
371 | B was an immediate revert of A. B clearly would have been a clean revert | |
372 | of A, since A was B's immediate parent. One would assume that if you can | |
373 | pick a commit, you should also be able to cherry-pick its immediate revert. | |
374 | However, this is one of those funny corner cases; without this | |
375 | optimization, we just successfully picked a commit cleanly, but we are | |
376 | unable to cherry-pick its immediate revert due to the size differences | |
377 | between E:oldfile and A:oldfile.) | |
378 | ||
379 | That leaves only the first case to consider -- when we get conflicts both | |
380 | with or without the optimization. Without the optimization, we'll have a | |
381 | modify/delete conflict, where both A':newfile and B:oldfile are left in the | |
382 | tree for the user to deal with and no hints about the potential similarity | |
383 | between the two. With the optimization, we'll have a three-way content | |
384 | merged A:oldfile, A':newfile, and B:oldfile with conflict markers | |
385 | suggesting we thought the files were related but giving the user the chance | |
386 | to resolve. As noted above, I don't think users will find us treating | |
387 | 'oldfile' and 'newfile' as related as a surprise since they were between E | |
388 | and G. In any event, though, this case shouldn't be concerning since we | |
389 | hit a conflict in both cases, told the user what we know, and asked them to | |
390 | resolve it. | |
391 | ||
392 | So, in summary, case 4 is impossible, case 2 yields the same behavior, and | |
393 | cases 1 and 3 seem to provide as good or better behavior with the | |
394 | optimization than without. | |
395 | ||
396 | ||
397 | === 6. Interaction with skipping of "irrelevant" renames === | |
398 | ||
399 | Previous optimizations involved skipping rename detection for paths | |
400 | considered to be "irrelevant". See for example the following commits: | |
401 | ||
402 | * 32a56dfb99 ("merge-ort: precompute subset of sources for which we | |
403 | need rename detection", 2021-03-11) | |
404 | * 2fd9eda462 ("merge-ort: precompute whether directory rename | |
405 | detection is needed", 2021-03-11) | |
406 | * 9bd342137e ("diffcore-rename: determine which relevant_sources are | |
407 | no longer relevant", 2021-03-13) | |
408 | ||
409 | Relevance is always determined by what the _other_ side of history has | |
bbb0c357 | 410 | done, in terms of modifying a file that our side renamed, or adding a |
bb80333c EN |
411 | file to a directory which our side renamed. This means that a path |
412 | that is "irrelevant" when picking the first commit of a series in a | |
413 | rebase or cherry-pick, may suddenly become "relevant" when picking the | |
414 | next commit. | |
415 | ||
416 | The upshot of this is that we can only cache rename detection results | |
417 | for relevant paths, and need to re-check relevance in subsequent | |
418 | commits. If those subsequent commits have additional paths that are | |
419 | relevant for rename detection, then we will need to redo rename | |
420 | detection -- though we can limit it to the paths for which we have not | |
421 | already detected renames. | |
422 | ||
423 | ||
424 | === 7. Additional items that need to be cached === | |
425 | ||
426 | It turns out we have to cache more than just renames; we also cache: | |
427 | ||
428 | A) non-renames (i.e. unpaired deletes) | |
429 | B) counts of renames within directories | |
430 | C) sources that were marked as RELEVANT_LOCATION, but which were | |
431 | downgraded to RELEVANT_NO_MORE | |
432 | D) the toplevel trees involved in the merge | |
433 | ||
434 | These are all stored in struct rename_info, and respectively appear in | |
435 | * cached_pairs (along side actual renames, just with a value of NULL) | |
436 | * dir_rename_counts | |
437 | * cached_irrelevant | |
438 | * merge_trees | |
439 | ||
440 | The reason for (A) comes from the irrelevant renames skipping | |
441 | optimization discussed in section 6. The fact that irrelevant renames | |
442 | are skipped means we only get a subset of the potential renames | |
443 | detected and subsequent commits may need to run rename detection on | |
444 | the upstream side on a subset of the remaining renames (to get the | |
445 | renames that are relevant for that later commit). Since unpaired | |
446 | deletes are involved in rename detection too, we don't want to | |
447 | repeatedly check that those paths remain unpaired on the upstream side | |
448 | with every commit we are transplanting. | |
449 | ||
450 | The reason for (B) is that diffcore_rename_extended() is what | |
451 | generates the counts of renames by directory which is needed in | |
452 | directory rename detection, and if we don't run | |
453 | diffcore_rename_extended() again then we need to have the output from | |
454 | it, including dir_rename_counts, from the previous run. | |
455 | ||
456 | The reason for (C) is that merge-ort's tree traversal will again think | |
457 | those paths are relevant (marking them as RELEVANT_LOCATION), but the | |
458 | fact that they were downgraded to RELEVANT_NO_MORE means that | |
459 | dir_rename_counts already has the information we need for directory | |
460 | rename detection. (A path which becomes RELEVANT_CONTENT in a | |
461 | subsequent commit will be removed from cached_irrelevant.) | |
462 | ||
463 | The reason for (D) is that is how we determine whether the remember | |
464 | renames optimization can be used. In particular, remembering that our | |
465 | sequence of merges looks like: | |
466 | ||
467 | Merge 1: | |
468 | MERGE_BASE: E | |
469 | MERGE_SIDE1: G | |
470 | MERGE_SIDE2: A | |
471 | => Creates A' | |
472 | ||
473 | Merge 2: | |
474 | MERGE_BASE: A | |
475 | MERGE_SIDE1: A' | |
476 | MERGE_SIDE2: B | |
477 | => Creates B' | |
478 | ||
479 | It is the fact that the trees A and A' appear both in Merge 1 and in | |
480 | Merge 2, with A as a parent of A' that allows this optimization. So | |
481 | we store the trees to compare with what we are asked to merge next | |
482 | time. | |
483 | ||
484 | ||
485 | === 8. How directory rename detection interacts with the above and === | |
486 | === why this optimization is still safe even if === | |
487 | === merge.directoryRenames is set to "true". === | |
488 | ||
489 | As noted in the assumptions section: | |
490 | ||
491 | """ | |
492 | ...if directory renames do occur, then the default of | |
493 | merge.directoryRenames being set to "conflict" means that the operation | |
494 | will stop for users to resolve the conflicts and the cache will be | |
495 | thrown away, and thus that there won't be an optimization to apply. | |
496 | So, the only reason we need to address directory renames specifically, | |
497 | is that some users will have set merge.directoryRenames to "true" to | |
498 | allow the merges to continue to proceed automatically. | |
499 | """ | |
500 | ||
501 | Let's remember that we need to look at how any given pick affects the next | |
502 | one. So let's again use the first two picks from the diagram in section | |
503 | one: | |
504 | ||
505 | First pick does this three-way merge: | |
506 | MERGE_BASE: E | |
507 | MERGE_SIDE1: G | |
508 | MERGE_SIDE2: A | |
509 | => creates A' | |
510 | ||
511 | Second pick does this three-way merge: | |
512 | MERGE_BASE: A | |
513 | MERGE_SIDE1: A' | |
514 | MERGE_SIDE2: B | |
515 | => creates B' | |
516 | ||
517 | Now, directory rename detection exists so that if one side of history | |
518 | renames a directory, and the other side adds a new file to the old | |
519 | directory, then the merge (with merge.directoryRenames=true) can move the | |
520 | file into the new directory. There are two qualitatively different ways to | |
521 | add a new file to an old directory: create a new file, or rename a file | |
522 | into that directory. Also, directory renames can be done on either side of | |
523 | history, so there are four cases to consider: | |
524 | ||
525 | * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir | |
526 | * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir | |
527 | * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir | |
528 | * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir | |
529 | ||
530 | One last note before we consider these four cases: There are some | |
531 | important properties about how we implement this optimization with | |
532 | respect to directory rename detection that we need to bear in mind | |
533 | while considering all of these cases: | |
534 | ||
535 | * rename caching occurs *after* applying directory renames | |
536 | ||
537 | * a rename created by directory rename detection is recorded for the side | |
538 | of history that did the directory rename. | |
539 | ||
540 | * dir_rename_counts, the nested map of | |
541 | {oldname => {newname => count}}, | |
542 | is cached between runs as well. This basically means that directory | |
543 | rename detection is also cached, though only on the side of history | |
544 | that we cache renames for (MERGE_SIDE1 as far as this document is | |
545 | concerned; see the assumptions section). Two interesting sub-notes | |
546 | about these counts: | |
547 | ||
548 | * If we need to perform rename-detection again on the given side (e.g. | |
549 | some paths are relevant for rename detection that weren't before), | |
550 | then we clear dir_rename_counts and recompute it, making use of | |
551 | cached_pairs. The reason it is important to do this is optimizations | |
552 | around RELEVANT_LOCATION exist to prevent us from computing | |
553 | unnecessary renames for directory rename detection and from computing | |
554 | dir_rename_counts for irrelevant directories; but those same renames | |
555 | or directories may become necessary for subsequent merges. The | |
556 | easiest way to "fix up" dir_rename_counts in such cases is to just | |
557 | recompute it. | |
558 | ||
559 | * If we prune rename/rename(1to1) entries from the cache, then we also | |
560 | need to update dir_rename_counts to decrement the counts for the | |
561 | involved directory and any relevant parent directories (to undo what | |
562 | update_dir_rename_counts() in diffcore-rename.c incremented when the | |
563 | rename was initially found). If we instead just disable the | |
564 | remembering renames optimization when the exceedingly rare | |
565 | rename/rename(1to1) cases occur, then dir_rename_counts will get | |
566 | re-computed the next time rename detection occurs, as noted above. | |
567 | ||
568 | * the side with multiple commits to pick, is the side of history that we | |
569 | do NOT cache renames for. Thus, there are no additional commits to | |
570 | change the number of renames in a directory, except for those done by | |
571 | directory rename detection (which always pad the majority). | |
572 | ||
573 | * the "renames" we cache are modified slightly by any directory rename, | |
574 | as noted below. | |
575 | ||
576 | Now, with those notes out of the way, let's go through the four cases | |
577 | in order: | |
578 | ||
579 | Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir | |
580 | ||
581 | This case looks like this: | |
582 | ||
583 | MERGE_BASE: E, Has olddir/ | |
584 | MERGE_SIDE1: G, Renames olddir/ -> newdir/ | |
585 | MERGE_SIDE2: A, Adds olddir/newfile | |
586 | => creates A', With newdir/newfile | |
587 | ||
588 | MERGE_BASE: A, Has olddir/newfile | |
589 | MERGE_SIDE1: A', Has newdir/newfile | |
590 | MERGE_SIDE2: B, Modifies olddir/newfile | |
591 | => expected B', with threeway-merged newdir/newfile from above | |
592 | ||
593 | In this case, with the optimization, note that after the first commit: | |
594 | * MERGE_SIDE1 remembers olddir/ -> newdir/ | |
595 | * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile | |
596 | Given the cached rename noted above, the second merge can proceed as | |
597 | expected without needing to perform rename detection from A -> A'. | |
598 | ||
599 | Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir | |
600 | ||
601 | This case looks like this: | |
602 | MERGE_BASE: E oldfile, olddir/ | |
603 | MERGE_SIDE1: G oldfile, olddir/ -> newdir/ | |
604 | MERGE_SIDE2: A oldfile -> olddir/newfile | |
605 | => creates A', With newdir/newfile representing original oldfile | |
606 | ||
607 | MERGE_BASE: A olddir/newfile | |
608 | MERGE_SIDE1: A' newdir/newfile | |
609 | MERGE_SIDE2: B modify olddir/newfile | |
610 | => expected B', with threeway-merged newdir/newfile from above | |
611 | ||
612 | In this case, with the optimization, note that after the first commit: | |
613 | * MERGE_SIDE1 remembers olddir/ -> newdir/ | |
614 | * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile | |
615 | (NOT oldfile -> newdir/newfile; compare to case with | |
616 | (p->status == 'R' && new_path) in possibly_cache_new_pair()) | |
617 | ||
618 | Given the cached rename noted above, the second merge can proceed as | |
619 | expected without needing to perform rename detection from A -> A'. | |
620 | ||
621 | Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir | |
622 | ||
623 | This case looks like this: | |
624 | ||
625 | MERGE_BASE: E, Has olddir/ | |
626 | MERGE_SIDE1: G, Adds olddir/newfile | |
627 | MERGE_SIDE2: A, Renames olddir/ -> newdir/ | |
628 | => creates A', With newdir/newfile | |
629 | ||
630 | MERGE_BASE: A, Has newdir/, but no notion of newdir/newfile | |
631 | MERGE_SIDE1: A', Has newdir/newfile | |
632 | MERGE_SIDE2: B, Has newdir/, but no notion of newdir/newfile | |
633 | => expected B', with newdir/newfile from A' | |
634 | ||
635 | In this case, with the optimization, note that after the first commit there | |
636 | were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. | |
637 | But the second merge didn't need any renames so this is fine. | |
638 | ||
639 | Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir | |
640 | ||
641 | This case looks like this: | |
642 | ||
643 | MERGE_BASE: E, Has olddir/ | |
644 | MERGE_SIDE1: G, Renames oldfile -> olddir/newfile | |
645 | MERGE_SIDE2: A, Renames olddir/ -> newdir/ | |
646 | => creates A', With newdir/newfile representing original oldfile | |
647 | ||
648 | MERGE_BASE: A, Has oldfile | |
649 | MERGE_SIDE1: A', Has newdir/newfile | |
650 | MERGE_SIDE2: B, Modifies oldfile | |
651 | => expected B', with threeway-merged newdir/newfile from above | |
652 | ||
653 | In this case, with the optimization, note that after the first commit: | |
654 | * MERGE_SIDE1 remembers oldfile -> newdir/newfile | |
655 | (NOT oldfile -> olddir/newfile; compare to case of second | |
656 | block under p->status == 'R' in possibly_cache_new_pair()) | |
657 | * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered | |
658 | ||
659 | Given the cached rename noted above, the second merge can proceed as | |
660 | expected without needing to perform rename detection from A -> A'. | |
661 | ||
662 | Finally, I'll just note here that interactions with the | |
663 | skip-irrelevant-renames optimization means we sometimes don't detect | |
664 | renames for any files within a directory that was renamed, in which | |
665 | case we will not have been able to detect any rename for the directory | |
666 | itself. In such a case, we do not know whether the directory was | |
548afb0d | 667 | renamed; we want to be careful to avoid caching some kind of "this |
bb80333c EN |
668 | directory was not renamed" statement. If we did, then a subsequent |
669 | commit being rebased could add a file to the old directory, and the | |
670 | user would expect it to end up in the correct directory -- something | |
671 | our erroneous "this directory was not renamed" cache would preclude. |