git.ipfire.org Git - thirdparty/git.git/blame - Documentation/technical/remembering-renames.txt
Rebases and cherry-picks involve a sequence of merges whose results are
recorded as new single-parent commits. The first parent side of those
merges represents the "upstream" side, and often includes a far larger set
of changes than the second parent side. Traditionally, the renames on the
first-parent side of that sequence of merges were repeatedly re-detected
for every merge. This file explains why it is safe and effective during
rebases and cherry-picks to remember renames on the upstream side of
history as an optimization, assuming all merges are automatic and clean
(i.e. no conflicts and not interrupted for user input or editing).

Outline:

  0. Assumptions

  1. How rebasing and cherry-picking work

  2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
     superset of the renames on MERGE_SIDE1 for the next pick.

  3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
     a rename on MERGE_SIDE1 for the next pick

  4. A detailed description of the counter-examples to #3.

  5. Why the special cases in #4 are still fully reasonable to use to pair
     up files for three-way content merging in the merge machinery, and why
     they do not affect the correctness of the merge.

  6. Interaction with skipping of "irrelevant" renames

  7. Additional items that need to be cached

  8. How directory rename detection interacts with the above and why this
     optimization is still safe even if merge.directoryRenames is set to
     "true".


=== 0. Assumptions ===

There are two assumptions that will hold throughout this document:

  * The upstream side where commits are transplanted to is treated as the
    first parent side when rebase/cherry-pick call the merge machinery

  * All merges are fully automatic

and a third that will hold in sections 2-5 for simplicity, that I'll later
address in section 8:

  * No directory renames occur


Let me explain more about each assumption and why I include it:


The first assumption is merely for the purposes of making this document
clearer; the optimization implementation does not actually depend upon it.
However, the assumption does hold in all cases because it reflects the way
that both rebase and cherry-pick were implemented; and the implementations
of cherry-pick and rebase are not readily changeable for backwards
compatibility reasons (see for example the discussion of the --ours and
--theirs flags in the documentation of `git checkout`, particularly the
comments about how they behave with rebase). The optimization avoids
checking first-parent-ness, though. It checks the conditions that make the
optimization valid instead, so it would still continue working if someone
changed the parent ordering that cherry-pick and rebase use. But making
this assumption does make this document much clearer and prevents me from
having to repeat every example twice.

If the second assumption is violated, then the optimization simply is
turned off and thus isn't relevant to consider. The second assumption can
also be stated as "there is no interruption for a user to resolve conflicts
or to just further edit or tweak files". While real rebases and
cherry-picks are often interrupted (either because it's an interactive
rebase where the user requested to stop and edit, or because there were
conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or
cherry-pick stops for the user to resolve conflicts or edit.

The third assumption makes sections 2-5 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 8. It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
away, and thus that there won't be an optimization to apply. So, the only
reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 8.


=== 1. How rebasing and cherry-picking work ===

Consider the following setup (from the git-rebase manpage):

          A---B---C topic
         /
    D---E---F---G main

After rebasing or cherry-picking topic onto main, this will appear as:

                  A'--B'--C' topic
                 /
    D---E---F---G main

The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be:

To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B

To create C':
   MERGE_BASE:   B
   MERGE_SIDE1:  B'
   MERGE_SIDE2:  C

Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a
slightly different light. For example, in creating C', you can view it as
either:

  * Apply the changes between B & C to B'
  * Apply the changes between B & B' to C

Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.

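The pattern in the three merges above can be written down mechanically.
The following is only an illustrative sketch: the function name
pick_merges() and the use of plain strings for commits are my own
assumptions, not anything taken from the rebase or cherry-pick code.

```python
def pick_merges(old_base, new_base, commits):
    """Return (MERGE_BASE, MERGE_SIDE1, MERGE_SIDE2, result) for each pick.

    old_base: parent of the first commit being transplanted (E above)
    new_base: commit being transplanted onto (G above)
    commits:  commits to pick, oldest first (A, B, C above)
    """
    merges = []
    base, side1 = old_base, new_base
    for commit in commits:
        result = commit + "'"          # hypothetical label for the new commit
        merges.append((base, side1, commit, result))
        # The next pick uses the commit just picked as its merge base and
        # the commit just created as its first-parent side.
        base, side1 = commit, result
    return merges


for base, side1, side2, result in pick_merges("E", "G", ["A", "B", "C"]):
    print(f"To create {result}: MERGE_BASE={base} "
          f"MERGE_SIDE1={side1} MERGE_SIDE2={side2}")
```

Note how MERGE_BASE and MERGE_SIDE1 of each pick are exactly MERGE_SIDE2
and the result of the previous pick; the rest of this document relies on
that relationship.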

=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
=== superset of the renames on MERGE_SIDE1 for the next pick. ===

The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
filename under one of three conditions:

  * To make both pieces of a conflict available to a user during conflict
    resolution (examples: directory/file conflict, add/add type conflict
    such as symlink vs. regular file)

  * When MERGE_SIDE1 renames the file.

  * When MERGE_SIDE2 renames the file.

First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:

To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B

So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.

A' is created by the first merge. A' will only have renames for one of the
three reasons listed above. The first case, a conflict, results in a
situation where the cache is dropped and thus this optimization doesn't
take effect, so we need not consider that case. The third case, a rename
on MERGE_SIDE2 (i.e. from E to A), will show up in A' but it also shows up
in A -- therefore when diffing A and A' that path does not show up as a
rename. The only remaining way for renames to show up in A' is for the
rename to come from MERGE_SIDE1. Therefore, all renames between A and A'
are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'.

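To make the subset claim concrete, here is a toy model. It is only a
sketch under loud assumptions: trees are dicts mapping path to a content
label, exact_renames() is a made-up stand-in that only pairs files with
identical content (real rename detection is similarity based), and the
tree for A' is written out by hand rather than produced by a merge.

```python
def exact_renames(old_tree, new_tree):
    """Toy rename detection: pair deleted and added paths with equal content."""
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    return {(src, dst)
            for src, src_content in deleted.items()
            for dst, dst_content in added.items()
            if src_content == dst_content}


# E: common ancestor.  G renames oldfile -> newfile (MERGE_SIDE1).
# A renames x -> y and edits "other" (MERGE_SIDE2).
E = {"oldfile": "c1", "x": "c2", "other": "c3"}
G = {"newfile": "c1", "x": "c2", "other": "c3"}
A = {"oldfile": "c1", "y": "c2", "other": "c4"}
# Hand-written result of cleanly merging the three trees above.
A_prime = {"newfile": "c1", "y": "c2", "other": "c4"}

print(exact_renames(E, G))        # renames on MERGE_SIDE1 of the first pick
print(exact_renames(A, A_prime))  # renames on MERGE_SIDE1 of the second pick
```

The x -> y rename made by A is present in both A and A', so it does not
appear when diffing A against A'; only the upstream rename does.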

=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ ===
=== always also a rename on MERGE_SIDE1 for the next pick. ===

Let's again look at the first two picks:

To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B

Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:
    E:oldfile
    G:newfile
    A:oldfile
and produce a new result:
    A':newfile

Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict
that will cause the rebase or cherry-pick operation to halt and drop the
in-memory cache of renames and thus doesn't need to be considered further.
In the special case that E->A does rename the file but also renames it to
newfile, then there is no conflict from the renaming and the merge can
succeed. In this special case, the rename is not valid to cache because
the second merge will find A:newfile in the MERGE_BASE (see also the new
testcases in t6429 with "rename same file identically" in their
description). So a rename/rename(1to1) needs to be specially handled by
pruning renames from the cache and decrementing the dir_rename_counts in
the current and leading directories associated with those renames. Or,
since these are really rare, one could just take the easy way out and
disable the remembering renames optimization when a rename/rename(1to1)
happens.

The previous paragraph handled the cases for E->A renaming oldfile; let's
continue assuming that oldfile is not renamed in A.

As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
to A'. So, we are curious whether A:oldfile and A':newfile will be viewed
as renames. Note that:

  * There will be no A':oldfile (because there could not have been a
    G:oldfile as we do not do break detection in the merge machinery and
    G:newfile was detected as a rename, and by the construction of the
    rename above that merged cleanly, the merge machinery will ensure there
    is no 'oldfile' in the result).

  * There will be no A:newfile (if there had been, we would have had a
    rename/add conflict).

  * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
    clean three-way content merge involving A:oldfile).

We can also expound on the third point above, by noting that three-way
content merges can also be viewed as applying the differences between the
base and one side to the other side. Thus we can view A':newfile as
having been created by applying the changes between E:oldfile and G:newfile
(which were detected as being related, i.e. <50% changed) to A:oldfile.

Thus A:oldfile and A':newfile are just as related as E:oldfile and
G:newfile are -- they have exactly identical differences. Since the latter
were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.


=== 4. A detailed description of the counter-examples to #3. ===

We already noted in section 3 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?

Let's repeat an earlier point that section 3 made:

  A':newfile was created by applying the changes between E:oldfile and
  G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
  <50% of the size of E:oldfile.

If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
renames. However, if there is a dramatic size reduction between E:oldfile
and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.

Here's an example where that can happen:
  * E:oldfile had 20 lines
  * G:newfile added 10 new lines at the beginning of the file
  * A:oldfile kept the first 3 lines of the file, and deleted all the rest
then
  => A':newfile would have 13 lines, 3 of which match those in A:oldfile.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be.

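The arithmetic behind this example can be checked with a small script.
This is a rough sketch only: similarity() below is a hypothetical
line-based approximation (shared lines divided by the size of the larger
file), whereas the real rename detection scores similarity differently;
the 50% threshold is the one referred to in the text above.

```python
from collections import Counter

THRESHOLD = 0.5  # the "<50% changed" cut-off used in the text above


def similarity(src_lines, dst_lines):
    """Toy score: lines shared by both files over the larger line count."""
    common = sum((Counter(src_lines) & Counter(dst_lines)).values())
    return common / max(len(src_lines), len(dst_lines))


e_old = [f"line {i}" for i in range(1, 21)]        # E:oldfile, 20 lines
added = [f"new line {i}" for i in range(1, 11)]    # 10 lines added upstream
g_new = added + e_old                              # G:newfile, 30 lines
a_old = e_old[:3]                                  # A:oldfile, first 3 lines
a_prime_new = added + a_old                        # A':newfile, 13 lines

print(f"E:oldfile vs G:newfile:  {similarity(e_old, g_new):.2f}")
print(f"A:oldfile vs A':newfile: {similarity(a_old, a_prime_new):.2f}")
```

Under this toy measure the first pair scores 20/30 and is above the
threshold, while the second pair scores only 3/13 and falls below it, even
though both pairs differ by exactly the same 10 added lines.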

=== 5. Why the special cases in #4 are still fully reasonable to use to ===
=== pair up files for three-way content merging in the merge machinery, ===
=== and why they do not affect the correctness of the merge. ===

In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename. However, files with the same filename
are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 3 and 4. The remainder of this section
will be devoted to that case as well.

So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons:

  * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
    is *exactly* the same as the diff between E:oldfile and G:newfile. The
    latter pair were detected as renames, so it seems unlikely to surprise
    users for us to treat A:oldfile and A':newfile as renames.

  * In fact, "oldfile" and "newfile" were at one point detected as renames
    due to how they were constructed in the E..G chain. And we used that
    information once already in this rebase/cherry-pick. I think users
    would be unlikely to be surprised at us continuing to treat the files
    as renames and would quickly understand why we had done so.

  * Marking or declaring files as renames is *not* the end goal for merges.
    Merges use renames to determine which files make sense to be paired up
    for three-way content merges.

  * A:oldfile and A':newfile were _already_ paired up in a three-way
    content merge; that is how A':newfile was created. In fact, that
    three-way content merge was clean. So using them again in a later
    three-way content merge seems very reasonable.

However, the above is focusing on the common scenarios. Let's try to look
at all possible unusual scenarios and compare the behavior without the
optimization to the behavior with it. Consider the following theoretical
cases; we will then dive into each to determine which of them are possible,
and if so, what they mean:

  1. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge also results in a conflict.
     Questions: Are the conflicts confusingly different? Better in one case?

  2. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge also results in NO conflict.
     Questions: Are the merges the same?

  3. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge results in NO conflict.
     Questions: Possible? Bug, bugfix, or something else?

  4. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge results in a conflict.
     Questions: Possible? Bug, bugfix, or something else?

I'll consider all four cases, but out of order.

The fourth case is impossible. For the code without the remembering
renames optimization to not get a conflict, B:oldfile would need to exactly
match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
If A:oldfile matches B:oldfile exactly, then a three-way content merge
between A:oldfile, A':newfile, and B:oldfile would have no conflict and
just give us the version of newfile from A' as the result.

From the same logic as the above paragraph, the second case would indeed
result in identical merges. When A:oldfile exactly matches B:oldfile, an
undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
and the other side deleted it. I'll delete it. And I see you have this
brand new file named 'newfile' in A', so I'll keep it." That gives the
same results as three-way content merging A:oldfile, A':newfile, and
B:oldfile -- a removal of oldfile with the version of newfile from A'
showing up in the result.

The third case is interesting. It means that A:oldfile and A':newfile were
not just similar enough, but that the changes between them did not conflict
with the changes between A:oldfile and B:oldfile. This would validate our
hunch that the files were similar enough to be used in a three-way content
merge, and thus seems entirely correct for us to have used them that way.
(Sidenote: One particular example here may be enlightening. Let's say that
B was an immediate revert of A. B clearly would have been a clean revert
of A, since A was B's immediate parent. One would assume that if you can
pick a commit, you should also be able to cherry-pick its immediate revert.
However, this is one of those funny corner cases; without this
optimization, we just successfully picked a commit cleanly, but we are
unable to cherry-pick its immediate revert due to the size differences
between E:oldfile and A:oldfile.)

That leaves only the first case to consider -- when we get conflicts both
with and without the optimization. Without the optimization, we'll have a
modify/delete conflict, where both A':newfile and B:oldfile are left in the
tree for the user to deal with and no hints about the potential similarity
between the two. With the optimization, we'll have three-way content
merged A:oldfile, A':newfile, and B:oldfile, with conflict markers
suggesting we thought the files were related but giving the user the chance
to resolve. As noted above, I don't think users will find us treating
'oldfile' and 'newfile' as related as a surprise since they were treated as
related between E and G. In any event, though, this case shouldn't be
concerning since we hit a conflict in both cases, told the user what we
know, and asked them to resolve it.

So, in summary, case 4 is impossible, case 2 yields the same behavior, and
cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.

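The reasoning for cases 2 and 4 can be illustrated with a deliberately
crude model. Everything here is an assumption made for illustration: file
contents are opaque strings, merge_content() is a whole-file three-way
merge that cannot combine edits, and the two second_pick_*() helpers are
hypothetical names for "no rename known" and "cached rename used". Because
the merge works on whole files, this model cannot show case 3, which needs
non-overlapping edits within a file.

```python
def merge_content(base, side1, side2):
    """Whole-file three-way merge; returns None to signal a conflict."""
    if side1 == side2:
        return side1
    if base == side2:      # only side1 changed
        return side1
    if base == side1:      # only side2 changed
        return side2
    return None            # both changed differently


def second_pick_without_cache(a_old, a_prime_new, b_old):
    """A':newfile is not seen as a rename of A:oldfile."""
    if b_old != a_old:
        return None        # modify/delete conflict on oldfile
    # oldfile unmodified by B and deleted in A': delete it, keep newfile.
    return {"newfile": a_prime_new}


def second_pick_with_cache(a_old, a_prime_new, b_old):
    """The cached rename pairs A:oldfile, A':newfile and B:oldfile."""
    merged = merge_content(a_old, a_prime_new, b_old)
    return None if merged is None else {"newfile": merged}


# Case 2: B leaves oldfile untouched, so both approaches agree.
print(second_pick_without_cache("v1", "v1+upstream", "v1"))
print(second_pick_with_cache("v1", "v1+upstream", "v1"))
```

Whenever the version without the cache succeeds, B:oldfile equals
A:oldfile, so the three-way merge trivially succeeds too and yields the
same tree; that is why case 4 cannot happen.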

=== 6. Interaction with skipping of "irrelevant" renames ===

Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits:

  * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
                need rename detection", 2021-03-11)
  * 2fd9eda462 ("merge-ort: precompute whether directory rename
                detection is needed", 2021-03-11)
  * 9bd342137e ("diffcore-rename: determine which relevant_sources are
                no longer relevant", 2021-03-13)

Relevance is always determined by what the _other_ side of history has
done, in terms of modifying a file that our side renamed, or adding a
file to a directory which our side renamed. This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
next commit.

The upshot of this is that we can only cache rename detection results
for relevant paths, and need to re-check relevance in subsequent
commits. If those subsequent commits have additional paths that are
relevant for rename detection, then we will need to redo rename
detection -- though we can limit it to the paths for which we have not
already detected renames.

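The "limit it to the paths for which we have not already detected
renames" step amounts to a set difference. A minimal sketch, assuming a
hypothetical helper paths_to_detect() and a cache modelled as a dict from
source path to destination path (with None meaning the source was checked
and found to have no rename); the example paths are made up:

```python
def paths_to_detect(relevant_sources, cached_pairs):
    """Return the relevant source paths that still need rename detection."""
    return {path for path in relevant_sources if path not in cached_pairs}


# After the first pick: only src/a.c was relevant, and it was a rename.
cached_pairs = {"src/a.c": "lib/a.c"}

# The second pick touches one more upstream-renamed file.
relevant_for_second_pick = {"src/a.c", "src/b.c"}

print(paths_to_detect(relevant_for_second_pick, cached_pairs))
```

Only src/b.c needs to go through rename detection for the second pick;
the answer for src/a.c is reused from the cache.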

=== 7. Additional items that need to be cached ===

It turns out we have to cache more than just renames; we also cache:

  A) non-renames (i.e. unpaired deletes)
  B) counts of renames within directories
  C) sources that were marked as RELEVANT_LOCATION, but which were
     downgraded to RELEVANT_NO_MORE
  D) the toplevel trees involved in the merge

These are all stored in struct rename_info, and respectively appear in
  * cached_pairs (alongside actual renames, just with a value of NULL)
  * dir_rename_counts
  * cached_irrelevant
  * merge_trees

The reason for (A) comes from the irrelevant renames skipping
optimization discussed in section 6. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
renames that are relevant for that later commit). Since unpaired
deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.

The reason for (B) is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.

The reason for (C) is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)

The reason for (D) is that this is how we determine whether the remember
renames optimization can be used. In particular, remembering that our
sequence of merges looks like:

   Merge 1:
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A
   => Creates    A'

   Merge 2:
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
   => Creates    B'

It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So
we store the trees to compare with what we are asked to merge next
time.

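The comparison described for (D) can be sketched as follows. This is a
simplified model and not the actual check in merge-ort: the MergeTrees
type, the cache_is_reusable() function, and the use of strings in place of
tree object IDs are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class MergeTrees(NamedTuple):
    base: str    # MERGE_BASE tree
    side1: str   # MERGE_SIDE1 tree
    side2: str   # MERGE_SIDE2 tree


def cache_is_reusable(prev: Optional[MergeTrees],
                      prev_result: Optional[str],
                      nxt: MergeTrees) -> bool:
    """True if the next merge continues the same pick sequence."""
    if prev is None or prev_result is None:
        return False  # nothing remembered, e.g. first pick or after a stop
    # The commit just picked (A) must be the new base, and the tree the
    # previous merge produced (A') must be the new first-parent side.
    return nxt.base == prev.side2 and nxt.side1 == prev_result


merge1 = MergeTrees(base="E", side1="G", side2="A")
merge2 = MergeTrees(base="A", side1="A'", side2="B")
print(cache_is_reusable(merge1, "A'", merge2))
```

If the next merge does not line up with the remembered trees, the cached
renames say nothing about it and must be discarded.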

=== 8. How directory rename detection interacts with the above and ===
=== why this optimization is still safe even if ===
=== merge.directoryRenames is set to "true". ===

As noted in the assumptions section:

    """
    ...if directory renames do occur, then the default of
    merge.directoryRenames being set to "conflict" means that the operation
    will stop for users to resolve the conflicts and the cache will be
    thrown away, and thus that there won't be an optimization to apply.
    So, the only reason we need to address directory renames specifically,
    is that some users will have set merge.directoryRenames to "true" to
    allow the merges to continue to proceed automatically.
    """

Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section
1:

  First pick does this three-way merge:
    MERGE_BASE:   E
    MERGE_SIDE1:  G
    MERGE_SIDE2:  A
    => creates A'

  Second pick does this three-way merge:
    MERGE_BASE:   A
    MERGE_SIDE1:  A'
    MERGE_SIDE2:  B
    => creates B'

Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
directory, then the merge (with merge.directoryRenames=true) can move the
file into the new directory. There are two qualitatively different ways to
add a new file to an old directory: create a new file, or rename a file
into that directory. Also, directory renames can be done on either side of
history, so there are four cases to consider:

  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
  * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
  * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

One last note before we consider these four cases: There are some
important properties about how we implement this optimization with
respect to directory rename detection that we need to bear in mind
while considering all of these cases:

  * rename caching occurs *after* applying directory renames

  * a rename created by directory rename detection is recorded for the side
    of history that did the directory rename.

  * dir_rename_counts, the nested map of
        {oldname => {newname => count}},
    is cached between runs as well. This basically means that directory
    rename detection is also cached, though only on the side of history
    that we cache renames for (MERGE_SIDE1 as far as this document is
    concerned; see the assumptions section). Two interesting sub-notes
    about these counts:

    * If we need to perform rename-detection again on the given side (e.g.
      some paths are relevant for rename detection that weren't before),
      then we clear dir_rename_counts and recompute it, making use of
      cached_pairs. The reason it is important to do this is that
      optimizations around RELEVANT_LOCATION exist to prevent us from
      computing unnecessary renames for directory rename detection and
      from computing dir_rename_counts for irrelevant directories; but
      those same renames or directories may become necessary for
      subsequent merges. The easiest way to "fix up" dir_rename_counts in
      such cases is to just recompute it.

    * If we prune rename/rename(1to1) entries from the cache, then we also
      need to update dir_rename_counts to decrement the counts for the
      involved directory and any relevant parent directories (to undo what
      update_dir_rename_counts() in diffcore-rename.c incremented when the
      rename was initially found). If we instead just disable the
      remembering renames optimization when the exceedingly rare
      rename/rename(1to1) cases occur, then dir_rename_counts will get
      re-computed the next time rename detection occurs, as noted above.

  * the side with multiple commits to pick is the side of history that we
    do NOT cache renames for. Thus, there are no additional commits to
    change the number of renames in a directory, except for those done by
    directory rename detection (which always pad the majority).

  * the "renames" we cache are modified slightly by any directory rename,
    as noted below.

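As a rough picture of the {oldname => {newname => count}} map and of the
increment/decrement bookkeeping mentioned above, consider the sketch below.
It is a simplification under stated assumptions: adjust_count() and
majority_target() are hypothetical helpers, only the immediate parent
directory of each path is counted (the text above notes that leading
directories are involved as well), and the example paths are made up.

```python
import posixpath
from collections import defaultdict


def adjust_count(counts, old_path, new_path, delta):
    """Add delta (+1 when a rename is found, -1 when pruned) to the map."""
    old_dir = posixpath.dirname(old_path)
    new_dir = posixpath.dirname(new_path)
    counts[old_dir][new_dir] += delta
    if counts[old_dir][new_dir] <= 0:
        del counts[old_dir][new_dir]      # drop empty entries
        if not counts[old_dir]:
            del counts[old_dir]


def majority_target(counts, old_dir):
    """Directory that received the most renamed files from old_dir."""
    targets = counts.get(old_dir)
    if not targets:
        return None
    return max(targets, key=targets.get)


counts = defaultdict(lambda: defaultdict(int))
adjust_count(counts, "olddir/a", "newdir/a", +1)
adjust_count(counts, "olddir/b", "newdir/b", +1)
adjust_count(counts, "olddir/c", "other/c", +1)
print(majority_target(counts, "olddir"))

# Pruning a cached rename undoes its contribution to the counts.
adjust_count(counts, "olddir/c", "other/c", -1)
print(dict(counts["olddir"]))
```

Renames added by directory rename detection land in the directory that
already holds the majority, which is why they cannot change the answer
that majority_target() gives for later picks.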
Now, with those notes out of the way, let's go through the four cases
in order:

Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir

  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
    MERGE_SIDE1:  G,   Renames olddir/ -> newdir/
    MERGE_SIDE2:  A,   Adds olddir/newfile
    => creates    A',  With newdir/newfile

    MERGE_BASE:   A,   Has olddir/newfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Modifies olddir/newfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.

Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir

  This case looks like this:

    MERGE_BASE:   E    oldfile, olddir/
    MERGE_SIDE1:  G    oldfile, olddir/ -> newdir/
    MERGE_SIDE2:  A    oldfile -> olddir/newfile
    => creates    A',  With newdir/newfile representing original oldfile

    MERGE_BASE:   A    olddir/newfile
    MERGE_SIDE1:  A'   newdir/newfile
    MERGE_SIDE2:  B    modify olddir/newfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
      (NOT oldfile -> newdir/newfile; compare to case with
       (p->status == 'R' && new_path) in possibly_cache_new_pair())

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.

Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir

  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
    MERGE_SIDE1:  G,   Adds olddir/newfile
    MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
    => creates    A',  With newdir/newfile

    MERGE_BASE:   A,   Has newdir/, but no notion of newdir/newfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Has newdir/, but no notion of newdir/newfile
    => expected   B',  with newdir/newfile from A'

  In this case, with the optimization, note that after the first commit there
  were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
  But the second merge didn't need any renames so this is fine.

Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
    MERGE_SIDE1:  G,   Renames oldfile -> olddir/newfile
    MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
    => creates    A',  With newdir/newfile representing original oldfile

    MERGE_BASE:   A,   Has oldfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Modifies oldfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers oldfile -> newdir/newfile
      (NOT oldfile -> olddir/newfile; compare to case of second
       block under p->status == 'R' in possibly_cache_new_pair())
    * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.

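The cached MERGE_SIDE1 pairs listed in cases 1, 2 and 4 can be reproduced
with a tiny helper. This is only a sketch of the outcome, not of how
possibly_cache_new_pair() is written: apply_dir_rename() and the
cached_side1 table are hypothetical, and case 3 is omitted because nothing
is cached for MERGE_SIDE1 there.

```python
def apply_dir_rename(path, old_dir, new_dir):
    """Move path from old_dir to new_dir if it lives under old_dir."""
    prefix = old_dir.rstrip("/") + "/"
    if path.startswith(prefix):
        return new_dir.rstrip("/") + "/" + path[len(prefix):]
    return path


# (source as seen in A, destination as seen in A') remembered for
# MERGE_SIDE1 after the first pick.
cached_side1 = {
    # Case 1: A adds olddir/newfile; upstream renamed olddir/ -> newdir/.
    "case 1": ("olddir/newfile",
               apply_dir_rename("olddir/newfile", "olddir/", "newdir/")),
    # Case 2: A renames oldfile -> olddir/newfile; the cached source is
    # the name the file has in A, not its original name.
    "case 2": ("olddir/newfile",
               apply_dir_rename("olddir/newfile", "olddir/", "newdir/")),
    # Case 4: upstream renames oldfile -> olddir/newfile; A's directory
    # rename moves the cached destination into newdir/.
    "case 4": ("oldfile",
               apply_dir_rename("olddir/newfile", "olddir/", "newdir/")),
}

for case, (src, dst) in cached_side1.items():
    print(f"{case}: {src} -> {dst}")
```

In every case the cached source is a path that exists in A and the cached
destination is a path that exists in A', which is exactly what the second
merge (with A as base and A' as MERGE_SIDE1) needs.
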
Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization mean we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself. In such a case, we do not know whether the directory was
renamed; we want to be careful to avoid caching some kind of "this
directory was not renamed" statement. If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
our erroneous "this directory was not renamed" cache would preclude.