git-filter-branch(1)
====================

NAME
----
git-filter-branch - Rewrite branches

SYNOPSIS
--------
[verse]
'git filter-branch' [--env-filter <command>] [--tree-filter <command>]
	[--index-filter <command>] [--parent-filter <command>]
	[--msg-filter <command>] [--commit-filter <command>]
	[--tag-name-filter <command>] [--subdirectory-filter <directory>]
	[--original <namespace>] [-d <directory>] [-f | --force]
	[--] [<rev-list options>...]

DESCRIPTION
-----------
Lets you rewrite git revision history by rewriting the branches mentioned
in the <rev-list options>, applying custom filters on each revision.
Those filters can modify each tree (e.g. removing a file or running
a perl rewrite on all files) or information about each commit.
Otherwise, all information (including original commit times or merge
information) will be preserved.

The command will only rewrite the _positive_ refs mentioned in the
command line (e.g. if you pass 'a..b', only 'b' will be rewritten).
If you specify no filters, the commits will be recommitted without any
changes, which would normally have no effect.  Nevertheless, this may be
useful in the future for compensating for some git bugs or such,
therefore such a usage is permitted.

*NOTE*: This command honors `.git/info/grafts`. If you have any grafts
defined, running this command will make them permanent.

*WARNING*! The rewritten history will have different object names for all
the objects and will not converge with the original branch.  You will not
be able to easily push and distribute the rewritten branch on top of the
original branch.  Please do not use this command if you do not know the
full implications, and avoid using it anyway, if a simple single commit
would suffice to fix your problem.  (See the "RECOVERING FROM UPSTREAM
REBASE" section in linkgit:git-rebase[1] for further information about
rewriting published history.)

Always verify that the rewritten version is correct: The original refs,
if different from the rewritten ones, will be stored in the namespace
'refs/original/'.

Note that since this operation is very I/O expensive, it might
be a good idea to redirect the temporary directory off-disk with the
'-d' option, e.g. on tmpfs.  Reportedly the speedup is very noticeable.

Filters
~~~~~~~

The filters are applied in the order listed below.  The <command>
argument is always evaluated in the shell context using the 'eval' command
(with the notable exception of the commit filter, for technical reasons).
Prior to that, the $GIT_COMMIT environment variable will be set to contain
the id of the commit being rewritten.  Also, GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL,
and GIT_COMMITTER_DATE are set according to the current commit.  The values
of these variables after the filters have run are used for the new commit.
If any evaluation of <command> returns a non-zero exit status, the whole
operation will be aborted.

A 'map' function is available that takes an "original sha1 id" argument
and outputs a "rewritten sha1 id" if the commit has been already
rewritten, and "original sha1 id" otherwise; the 'map' function can
return several ids on separate lines if your commit filter emitted
multiple commits.

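As a rough illustration of the evaluation model described above, the
following sketch mimics, outside of git, how a filter string is evaluated
with 'eval' once the commit variables are set, and how the values left
behind are the ones that get used.  It is not the actual
'git-filter-branch' implementation: the helper name `apply_env_filter`,
the author name and the e-mail addresses are assumptions made for this
example only.

```shell
# Sketch only: imitate how a filter <command> string is evaluated.
# apply_env_filter is a hypothetical helper, not part of git.
apply_env_filter() {
	filter=$1
	(
		# Values that would normally come from the commit being rewritten.
		GIT_AUTHOR_NAME="A U Thor"
		GIT_AUTHOR_EMAIL="old@example.com"
		export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL

		# The filter runs in the shell context via eval; a non-zero
		# status makes the helper fail, mirroring the abort rule.
		eval "$filter" || exit 1

		# Whatever the filter left in the variables is what is reported.
		echo "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
	)
}

apply_env_filter '
	if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]
	then
		GIT_AUTHOR_EMAIL="new@example.com"
		export GIT_AUTHOR_EMAIL
	fi'
```

With a real repository, the same filter body would be passed as the
argument of `--env-filter`.
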

OPTIONS
-------

--env-filter <command>::
	This filter may be used if you only need to modify the environment
	in which the commit will be performed.  Specifically, you might
	want to rewrite the author/committer name/email/time environment
	variables (see linkgit:git-commit[1] for details).  Do not forget
	to re-export the variables.

--tree-filter <command>::
	This is the filter for rewriting the tree and its contents.
	The argument is evaluated in shell with the working
	directory set to the root of the checked out tree.  The new tree
	is then used as-is (new files are auto-added, disappeared files
	are auto-removed - neither .gitignore files nor any other ignore
	rules *HAVE ANY EFFECT*!).

--index-filter <command>::
	This is the filter for rewriting the index.  It is similar to the
	tree filter but does not check out the tree, which makes it much
	faster.  Frequently used with `git rm \--cached
	\--ignore-unmatch ...`, see EXAMPLES below.  For hairy
	cases, see linkgit:git-update-index[1].

--parent-filter <command>::
	This is the filter for rewriting the commit's parent list.
	It will receive the parent string on stdin and shall output
	the new parent string on stdout.  The parent string is in
	the format described in linkgit:git-commit-tree[1]: empty for
	the initial commit, "-p parent" for a normal commit and
	"-p parent1 -p parent2 -p parent3 ..." for a merge commit.

--msg-filter <command>::
	This is the filter for rewriting the commit messages.
	The argument is evaluated in the shell with the original
	commit message on standard input; its standard output is
	used as the new commit message.

--commit-filter <command>::
	This is the filter for performing the commit.
	If this filter is specified, it will be called instead of the
	'git-commit-tree' command, with arguments of the form
	"<TREE_ID> [-p <PARENT_COMMIT_ID>]..." and the log message on
	stdin.  The commit id is expected on stdout.
+
As a special extension, the commit filter may emit multiple
commit ids; in that case, the rewritten children of the original commit will
have all of them as parents.
+
You can use the 'map' convenience function in this filter, and other
convenience functions, too.  For example, calling 'skip_commit "$@"'
will leave out the current commit (but not its changes! If you want
that, use 'git-rebase' instead).
+
You can also use 'git_commit_non_empty_tree "$@"' instead of
'git commit-tree "$@"' if you don't wish to keep commits that have a single
parent and make no change to the tree.

--tag-name-filter <command>::
	This is the filter for rewriting tag names.  When passed,
	it will be called for every tag ref that points to a rewritten
	object (or to a tag object which points to a rewritten object).
	The original tag name is passed via standard input, and the new
	tag name is expected on standard output.
+
The original tags are not deleted, but can be overwritten;
use "--tag-name-filter cat" to simply update the tags.  In this
case, be very careful and make sure you have the old tags
backed up in case the conversion has run afoul.
+
Nearly proper rewriting of tag objects is supported.  If the tag has
a message attached, a new tag object will be created with the same message,
author, and timestamp.  If the tag has a signature attached, the
signature will be stripped.  It is by definition impossible to preserve
signatures.  The reason this is "nearly" proper is that, ideally, a tag
that did not change (points to the same object, has the same name, etc.)
should retain any signature.  That is not the case: signatures will always
be removed, buyer beware.  There is also no support for changing the
author or timestamp (or the tag message for that matter).  Tags which point
to other tags will be rewritten to point to the underlying commit.

--subdirectory-filter <directory>::
	Only look at the history which touches the given subdirectory.
	The result will contain that directory (and only that) as its
	project root.

--prune-empty::
	Some filters generate empty commits that leave the tree
	untouched.  This switch allows git-filter-branch to ignore such
	commits.  However, it applies only to commits that have exactly
	one parent, so merge points are kept.  This option is also not
	compatible with the use of '--commit-filter'; to get the same
	effect there, use the function 'git_commit_non_empty_tree "$@"'
	instead of the 'git commit-tree "$@"' idiom in your commit filter.

--original <namespace>::
	Use this option to set the namespace where the original commits
	will be stored.  The default value is 'refs/original'.

-d <directory>::
	Use this option to set the path to the temporary directory used for
	rewriting.  When applying a tree filter, the command needs to
	temporarily check out the tree to some directory, which may consume
	considerable space in case of large projects.  By default it
	does this in the '.git-rewrite/' directory but you can override
	that choice by this parameter.

-f::
--force::
	'git-filter-branch' refuses to start with an existing temporary
	directory or when there are already refs starting with
	'refs/original/', unless forced.

<rev-list options>...::
	Arguments for 'git-rev-list'.  All positive refs included by
	these options are rewritten.  You may also specify options
	such as '--all', but you must use '--' to separate them from
	the 'git-filter-branch' options.

Examples
--------

Suppose you want to remove a file (containing confidential information
or copyright violation) from all commits:

-------------------------------------------------------
git filter-branch --tree-filter 'rm filename' HEAD
-------------------------------------------------------

However, if the file is absent from the tree of some commit,
a simple `rm filename` will fail for that tree and commit.
Thus you may instead want to use `rm -f filename` as the script.

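The two commands differ only in their exit status, and a non-zero status
from a filter is what aborts the rewrite.  A minimal sketch, run outside
of git in a scratch directory created with `mktemp -d` (an assumption of
this example) in which `filename` does not exist:

```shell
# Sketch: compare exit statuses when the file is absent from the tree.
scratch=$(mktemp -d)

# Plain rm reports failure for a missing file ...
if rm "$scratch/filename" 2>/dev/null
then plain_status=0
else plain_status=$?
fi

# ... while rm -f treats a missing file as success.
if rm -f "$scratch/filename"
then force_status=0
else force_status=$?
fi

rmdir "$scratch"
echo "rm: $plain_status, rm -f: $force_status"
```
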

Using `\--index-filter` with 'git-rm' yields a significantly faster
version.  As with `rm filename`, `git rm --cached filename`
will fail if the file is absent from the tree of a commit.  If you
want to "completely forget" a file, it does not matter when it entered
history, so we also add `\--ignore-unmatch`:

--------------------------------------------------------------------------
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
--------------------------------------------------------------------------

Now, you will get the rewritten history saved in HEAD.

To rewrite the repository to look as if `foodir/` had been its project
root, and discard all other history:

-------------------------------------------------------
git filter-branch --subdirectory-filter foodir -- --all
-------------------------------------------------------

Thus you can, e.g., turn a library subdirectory into a repository of
its own.  Note the `\--` that separates 'filter-branch' options from
revision options, and the `\--all` to rewrite all branches and tags.

To set a commit (which typically is at the tip of another
history) to be the parent of the current initial commit, in
order to paste the other history behind the current history:

-------------------------------------------------------------------
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
-------------------------------------------------------------------

(if the parent string is empty - which happens when we are dealing with
the initial commit - add graftcommit as a parent).  Note that this assumes
history with a single root (that is, no merge without common ancestors
happened).  If this is not the case, use:

--------------------------------------------------------------------------
git filter-branch --parent-filter \
	'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
--------------------------------------------------------------------------

or even simpler:

-----------------------------------------------
echo "$commit-id $graft-id" >> .git/info/grafts
git filter-branch $graft-id..HEAD
-----------------------------------------------

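To see what the two parent filters above do to their input, the filter
bodies can be fed parent strings by hand.  In this sketch the ids
`1111111`, `2222222` and `aaaaaaa` are made-up stand-ins for `<graft-id>`,
`<commit-id>` and an existing parent, and the helper names `graft_root`
and `graft_one` are hypothetical.

```shell
# Sketch: exercise the parent filters outside of git with made-up ids.
graft_id=1111111	# stands in for <graft-id>
commit_id=2222222	# stands in for <commit-id>

# First form: an empty parent string (initial commit) gains a parent.
graft_root() {
	sed "s/^\$/-p $graft_id/"
}

# Second form: only the commit named by $commit_id is re-parented;
# every other parent string is passed through by cat.
graft_one() {
	test "$GIT_COMMIT" = "$commit_id" && echo "-p $graft_id" || cat
}

echo "" | graft_root				# initial commit: parent added
echo "-p aaaaaaa" | graft_root			# ordinary commit: unchanged
echo "" | ( GIT_COMMIT=$commit_id; graft_one )	# the chosen commit
```
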

To remove commits authored by "Darl McBribe" from the history:

------------------------------------------------------------------------------
git filter-branch --commit-filter '
	if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
	then
		skip_commit "$@";
	else
		git commit-tree "$@";
	fi' HEAD
------------------------------------------------------------------------------

The function 'skip_commit' is defined as follows:

--------------------------
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}
--------------------------

The shift magic first throws away the tree id and then the -p
parameters.  Note that this handles merges properly! In case Darl
committed a merge between P1 and P2, it will be propagated properly
and all children of the merge will become merge commits with P1,P2
as their parents instead of the merge commit.

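The argument handling can be checked without a repository by pairing
'skip_commit' with a stand-in for 'map'.  In the sketch below the 'map'
stub simply echoes its argument, which is what the real function does for
a commit that has not been rewritten; the stub and the tree and parent ids
are assumptions made for illustration.

```shell
# Stand-in for filter-branch's map: echo the id unchanged.
# (Assumption for this sketch; the real map looks up rewritten ids.)
map()
{
	echo "$1"
}

skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}

# Arguments have the form "<TREE_ID> [-p <PARENT_COMMIT_ID>]...".
skip_commit deadbeef -p P1 -p P2
```

Because the parents' ids are emitted in place of a new commit id, the
children of the skipped commit end up attached to those parents.
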

You can rewrite the commit log messages using `--msg-filter`.  For
example, 'git-svn-id' strings in a repository created by 'git-svn' can
be removed this way:

-------------------------------------------------------
git filter-branch --msg-filter '
	sed -e "/^git-svn-id:/d"
'
-------------------------------------------------------

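A message filter is an ordinary stdin-to-stdout command, so it can be
tried on a sample message before anything is rewritten.  The commit
message and URL below are invented for illustration, and `strip_svn_id`
is a hypothetical wrapper name.

```shell
# Sketch: run the msg-filter body on a made-up commit message.
strip_svn_id() {
	sed -e "/^git-svn-id:/d"
}

printf '%s\n' \
	'Fix off-by-one in parser' \
	'' \
	'git-svn-id: https://svn.example.com/repo/trunk@42 0000-uuid' |
strip_svn_id
```

Only the line starting with `git-svn-id:` is dropped; the blank line that
preceded it is still part of the filter's output.
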

To restrict rewriting to only part of the history, specify a revision
range in addition to the new branch name.  The new branch name will
point to the top-most revision that a 'git-rev-list' of this range
will print.

*NOTE* the changes introduced by the commits, and which are not reverted
by subsequent commits, will still be in the rewritten branch. If you want
to throw out _changes_ together with the commits, you should use the
interactive mode of 'git-rebase'.


Consider this history:

------------------
     D--E--F--G--H
    /     /
A--B-----C
------------------

To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:

--------------------------------
git filter-branch ... C..H
--------------------------------

To rewrite commits E,F,G,H, use one of these:

----------------------------------------
git filter-branch ... C..H --not D
git filter-branch ... D..H --not C
----------------------------------------

To move the whole tree into a subdirectory, or remove it from there:

---------------------------------------------------------------
git filter-branch --index-filter \
	'git ls-files -s | sed "s-\t-&newsubdir/-" |
		GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
			git update-index --index-info &&
	 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
---------------------------------------------------------------

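The 'sed' expression in the example above inserts the directory name
right after the tab that separates the mode, object name and stage from
the path in `git ls-files -s` output.  The sketch below applies the same
substitution to a hand-written line.  The object name is a placeholder and
`prefix_paths` is a hypothetical helper; it uses a literal tab obtained
from `printf` on the assumption that not every 'sed' understands `\t`.

```shell
# Sketch: prefix the path in "git ls-files -s"-style lines.
# prefix_paths is a hypothetical helper; $1 is the new subdirectory
# and must not contain "-", which serves as the sed delimiter here.
prefix_paths() {
	tab=$(printf '\t')
	sed "s-${tab}-&$1/-"
}

printf '100644 %s 0\t%s\n' 0123456789abcdef0123456789abcdef01234567 Makefile |
prefix_paths newsubdir
```
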


Checklist for Shrinking a Repository
------------------------------------

git-filter-branch is often used to get rid of a subset of files,
usually with some combination of `\--index-filter` and
`\--subdirectory-filter`.  People expect the resulting repository to
be smaller than the original, but you need a few more steps to
actually make it smaller, because git tries hard not to lose your
objects until you tell it to.  First make sure that:

* You really removed all variants of a filename, if a blob was moved
  over its lifetime.  `git log \--name-only \--follow \--all \--
  filename` can help you find renames.

* You really filtered all refs: use `\--tag-name-filter cat \--
  \--all` when calling git-filter-branch.

Then there are two ways to get a smaller repository.  The safer way is
to clone, which keeps your original intact.

* Clone it with `git clone +++file:///path/to/repo+++`.  The clone
  will not have the removed objects.  See linkgit:git-clone[1].  (Note
  that cloning with a plain path just hardlinks everything!)

If you really don't want to clone it, for whatever reasons, check the
following points instead (in this order).  This is a very destructive
approach, so *make a backup* or go back to cloning it.  You have been
warned.

* Remove the original refs backed up by git-filter-branch: say `git
  for-each-ref \--format="%(refname)" refs/original/ | xargs -n 1 git
  update-ref -d`.

* Expire all reflogs with `git reflog expire \--expire=now \--all`.

* Garbage collect all unreferenced objects with `git gc \--prune=now`
  (or if your git-gc is not new enough to support arguments to
  `\--prune`, use `git repack -ad; git prune` instead).


Author
------
Written by Petr "Pasky" Baudis <pasky@suse.cz>,
and the git list <git@vger.kernel.org>

Documentation
-------------
Documentation by Petr Baudis and the git list.

GIT
---
Part of the linkgit:git[1] suite