]>
Commit | Line | Data |
---|---|---|
fb90dca3 TG |
1 | Rerere |
2 | ====== | |
3 | ||
4 | This document describes the rerere logic. | |
5 | ||
6 | Conflict normalization | |
7 | ---------------------- | |
8 | ||
9 | To ensure recorded conflict resolutions can be looked up in the rerere | |
10 | database, even when branches are merged in a different order, | |
11 | different branches are merged that result in the same conflict, or | |
12 | when different conflict style settings are used, rerere normalizes the | |
13 | conflicts before writing them to the rerere database. | |
14 | ||
15 | Different conflict styles and branch names are normalized by stripping | |
16 | the labels from the conflict markers, and removing the common ancestor | |
17 | version from the `diff3` conflict style. Branches that are merged | |
18 | in different order are normalized by sorting the conflict hunks. More | |
19 | on each of those steps in the following sections. | |
20 | ||
21 | Once these two normalization operations are applied, a conflict ID is | |
22 | calculated based on the normalized conflict, which is later used by | |
23 | rerere to look up the conflict in the rerere database. | |
24 | ||
25 | Removing the common ancestor version | |
26 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
27 | ||
28 | Say we have three branches AB, AC and AC2. The common ancestor of | |
29 | these branches has a file with a line containing the string "A" (for | |
30 | brevity this is called "line A" in the rest of the document). In | |
31 | branch AB this line is changed to "B", in AC, this line is changed to | |
32 | "C", and branch AC2 is forked off of AC, after the line was changed to | |
33 | "C". | |
34 | ||
35 | Forking a branch ABAC off of branch AB and then merging AC into it, we | |
36 | get a conflict like the following: | |
37 | ||
38 | <<<<<<< HEAD | |
39 | B | |
40 | ======= | |
41 | C | |
42 | >>>>>>> AC | |
43 | ||
44 | Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB | |
45 | and then merging branch AC2 into it), using the diff3 conflict style, | |
46 | we get a conflict like the following: | |
47 | ||
48 | <<<<<<< HEAD | |
49 | B | |
50 | ||||||| merged common ancestors | |
51 | A | |
52 | ======= | |
53 | C | |
54 | >>>>>>> AC2 | |
55 | ||
56 | By resolving this conflict, to leave line D, the user declares: | |
57 | ||
58 | After examining what branches AB and AC did, I believe that making | |
59 | line A into line D is the best thing to do that is compatible with | |
60 | what AB and AC wanted to do. | |
61 | ||
62 | As branch AC2 refers to the same commit as AC, the above implies that | |
63 | this is also compatible what AB and AC2 wanted to do. | |
64 | ||
65 | By extension, this means that rerere should recognize that the above | |
66 | conflicts are the same. To do this, the labels on the conflict | |
67 | markers are stripped, and the common ancestor version is removed. The above | |
68 | examples would both result in the following normalized conflict: | |
69 | ||
70 | <<<<<<< | |
71 | B | |
72 | ======= | |
73 | C | |
74 | >>>>>>> | |
75 | ||
76 | Sorting hunks | |
77 | ~~~~~~~~~~~~~ | |
78 | ||
79 | As before, lets imagine that a common ancestor had a file with line A | |
80 | its early part, and line X in its late part. And then four branches | |
81 | are forked that do these things: | |
82 | ||
83 | - AB: changes A to B | |
84 | - AC: changes A to C | |
85 | - XY: changes X to Y | |
86 | - XZ: changes X to Z | |
87 | ||
88 | Now, forking a branch ABAC off of branch AB and then merging AC into | |
89 | it, and forking a branch ACAB off of branch AC and then merging AB | |
90 | into it, would yield the conflict in a different order. The former | |
91 | would say "A became B or C, what now?" while the latter would say "A | |
92 | became C or B, what now?" | |
93 | ||
94 | As a reminder, the act of merging AC into ABAC and resolving the | |
95 | conflict to leave line D means that the user declares: | |
96 | ||
97 | After examining what branches AB and AC did, I believe that | |
98 | making line A into line D is the best thing to do that is | |
99 | compatible with what AB and AC wanted to do. | |
100 | ||
101 | So the conflict we would see when merging AB into ACAB should be | |
102 | resolved the same way---it is the resolution that is in line with that | |
103 | declaration. | |
104 | ||
105 | Imagine that similarly previously a branch XYXZ was forked from XY, | |
106 | and XZ was merged into it, and resolved "X became Y or Z" into "X | |
107 | became W". | |
108 | ||
109 | Now, if a branch ABXY was forked from AB and then merged XY, then ABXY | |
110 | would have line B in its early part and line Y in its later part. | |
111 | Such a merge would be quite clean. We can construct 4 combinations | |
112 | using these four branches ((AB, AC) x (XY, XZ)). | |
113 | ||
114 | Merging ABXY and ACXZ would make "an early A became B or C, a late X | |
115 | became Y or Z" conflict, while merging ACXY and ABXZ would make "an | |
116 | early A became C or B, a late X became Y or Z". We can see there are | |
117 | 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X"). | |
118 | ||
119 | By sorting, the conflict is given its canonical name, namely, "an | |
120 | early part became B or C, a late part becames X or Y", and whenever | |
121 | any of these four patterns appear, and we can get to the same conflict | |
122 | and resolution that we saw earlier. | |
123 | ||
124 | Without the sorting, we'd have to somehow find a previous resolution | |
125 | from combinatorial explosion. | |
126 | ||
127 | Conflict ID calculation | |
128 | ~~~~~~~~~~~~~~~~~~~~~~~ | |
129 | ||
130 | Once the conflict normalization is done, the conflict ID is calculated | |
131 | as the sha1 hash of the conflict hunks appended to each other, | |
132 | separated by <NUL> characters. The conflict markers are stripped out | |
133 | before the sha1 is calculated. So in the example above, where we | |
134 | merge branch AC which changes line A to line C, into branch AB, which | |
135 | changes line A to line C, the conflict ID would be | |
136 | SHA1('B<NUL>C<NUL>'). | |
137 | ||
138 | If there are multiple conflicts in one file, the sha1 is calculated | |
139 | the same way with all hunks appended to each other, in the order in | |
140 | which they appear in the file, separated by a <NUL> character. | |
4af32207 TG |
141 | |
142 | Nested conflicts | |
143 | ~~~~~~~~~~~~~~~~ | |
144 | ||
145 | Nested conflicts are handled very similarly to "simple" conflicts. | |
146 | Similar to simple conflicts, the conflict is first normalized by | |
147 | stripping the labels from conflict markers, stripping the common ancestor | |
148 | version, and the sorting the conflict hunks, both for the outer and the | |
149 | inner conflict. This is done recursively, so any number of nested | |
150 | conflicts can be handled. | |
151 | ||
bc4caecf TG |
152 | Note that this only works for conflict markers that "cleanly nest". If |
153 | there are any unmatched conflict markers, rerere will fail to handle | |
154 | the conflict and record a conflict resolution. | |
155 | ||
4af32207 TG |
156 | The only difference is in how the conflict ID is calculated. For the |
157 | inner conflict, the conflict markers themselves are not stripped out | |
158 | before calculating the sha1. | |
159 | ||
160 | Say we have the following conflict for example: | |
161 | ||
162 | <<<<<<< HEAD | |
163 | 1 | |
164 | ======= | |
165 | <<<<<<< HEAD | |
166 | 3 | |
167 | ======= | |
168 | 2 | |
169 | >>>>>>> branch-2 | |
170 | >>>>>>> branch-3~ | |
171 | ||
172 | After stripping out the labels of the conflict markers, and sorting | |
173 | the hunks, the conflict would look as follows: | |
174 | ||
175 | <<<<<<< | |
176 | 1 | |
177 | ======= | |
178 | <<<<<<< | |
179 | 2 | |
180 | ======= | |
181 | 3 | |
182 | >>>>>>> | |
183 | >>>>>>> | |
184 | ||
185 | and finally the conflict ID would be calculated as: | |
186 | `sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')` |