Git 1.9.0
git-filter-branch(1)
====================

NAME
----
git-filter-branch - Rewrite branches

SYNOPSIS
--------
[verse]
'git filter-branch' [--env-filter <command>] [--tree-filter <command>]
	[--index-filter <command>] [--parent-filter <command>]
	[--msg-filter <command>] [--commit-filter <command>]
	[--tag-name-filter <command>] [--subdirectory-filter <directory>]
	[--prune-empty]
	[--original <namespace>] [-d <directory>] [-f | --force]
	[--] [<rev-list options>...]

DESCRIPTION
-----------
Lets you rewrite Git revision history by rewriting the branches mentioned
in the <rev-list options>, applying custom filters on each revision.
Those filters can modify each tree (e.g. removing a file or running
a perl rewrite on all files) or information about each commit.
Otherwise, all information (including original commit times or merge
information) will be preserved.

The command will only rewrite the _positive_ refs mentioned in the
command line (e.g. if you pass 'a..b', only 'b' will be rewritten).
If you specify no filters, the commits will be recommitted without any
changes, which would normally have no effect.  Nevertheless, this may be
useful in the future for compensating for some Git bugs or such,
so such usage is permitted.

*NOTE*: This command honors the `.git/info/grafts` file and refs in
the `refs/replace/` namespace.
If you have any grafts or replacement refs defined, running this command
will make them permanent.

*WARNING*! The rewritten history will have different object names for all
the objects and will not converge with the original branch.  You will not
be able to easily push and distribute the rewritten branch on top of the
original branch.  Please do not use this command if you do not know the
full implications, and avoid using it anyway if a simple single commit
would suffice to fix your problem.  (See the "RECOVERING FROM UPSTREAM
REBASE" section in linkgit:git-rebase[1] for further information about
rewriting published history.)

Always verify that the rewritten version is correct: The original refs,
if different from the rewritten ones, will be stored in the namespace
'refs/original/'.

Note that since this operation is very I/O expensive, it might
be a good idea to redirect the temporary directory off-disk with the
'-d' option, e.g. on tmpfs.  Reportedly the speedup is very noticeable.


Filters
~~~~~~~

The filters are applied in the order listed below.  The <command>
argument is always evaluated in the shell context using the 'eval' command
(with the notable exception of the commit filter, for technical reasons).
Prior to that, the $GIT_COMMIT environment variable will be set to contain
the id of the commit being rewritten.  Also, GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL,
and GIT_COMMITTER_DATE are taken from the current commit and exported to
the environment, in order to affect the author and committer identities of
the replacement commit created by linkgit:git-commit-tree[1] after the
filters have run.

If any evaluation of <command> returns a non-zero exit status, the whole
operation will be aborted.

A 'map' function is available that takes an "original sha1 id" argument
and outputs a "rewritten sha1 id" if the commit has already been
rewritten, and "original sha1 id" otherwise; the 'map' function can
return several ids on separate lines if your commit filter emitted
multiple commits.
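
As a rough illustration of that contract (a sketch, not the code that
'git filter-branch' itself runs), the lookup can be modelled with a
directory of files named after original ids.  The `map_dir` variable, its
layout and the short ids below are assumptions made up for this example.

```shell
# Hypothetical stand-in for the 'map' helper: $map_dir holds one file per
# rewritten commit, named after the original id and containing the new id(s).
map_dir=$(mktemp -d)

map()
{
	if test -r "$map_dir/$1"
	then
		# Already rewritten: print the rewritten id(s), one per line.
		cat "$map_dir/$1"
	else
		# Not rewritten: echo the original id back.
		echo "$1"
	fi
}

# Made-up ids: 'aaa111' was rewritten into two commits, 'ccc333' was not.
printf '%s\n' bbb222 bbb333 >"$map_dir/aaa111"

map aaa111   # prints bbb222 and bbb333 on separate lines
map ccc333   # prints ccc333
```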


OPTIONS
-------

--env-filter <command>::
	This filter may be used if you only need to modify the environment
	in which the commit will be performed.  Specifically, you might
	want to rewrite the author/committer name/email/time environment
	variables (see linkgit:git-commit-tree[1] for details).  Do not forget
	to re-export the variables.

--tree-filter <command>::
	This is the filter for rewriting the tree and its contents.
	The argument is evaluated in shell with the working
	directory set to the root of the checked out tree.  The new tree
	is then used as-is (new files are auto-added, disappeared files
	are auto-removed - neither .gitignore files nor any other ignore
	rules *HAVE ANY EFFECT*!).

--index-filter <command>::
	This is the filter for rewriting the index.  It is similar to the
	tree filter but does not check out the tree, which makes it much
	faster.  Frequently used with `git rm --cached
	--ignore-unmatch ...`, see EXAMPLES below.  For hairy
	cases, see linkgit:git-update-index[1].

--parent-filter <command>::
	This is the filter for rewriting the commit's parent list.
	It will receive the parent string on stdin and shall output
	the new parent string on stdout.  The parent string is in
	the format described in linkgit:git-commit-tree[1]: empty for
	the initial commit, "-p parent" for a normal commit and
	"-p parent1 -p parent2 -p parent3 ..." for a merge commit.

--msg-filter <command>::
	This is the filter for rewriting the commit messages.
	The argument is evaluated in the shell with the original
	commit message on standard input; its standard output is
	used as the new commit message.

--commit-filter <command>::
	This is the filter for performing the commit.
	If this filter is specified, it will be called instead of the
	'git commit-tree' command, with arguments of the form
	"<TREE_ID> [(-p <PARENT_COMMIT_ID>)...]" and the log message on
	stdin.  The commit id is expected on stdout.
+
As a special extension, the commit filter may emit multiple
commit ids; in that case, the rewritten children of the original commit will
have all of them as parents.
+
You can use the 'map' convenience function in this filter, and other
convenience functions, too.  For example, calling 'skip_commit "$@"'
will leave out the current commit (but not its changes! If you want
that, use 'git rebase' instead).
+
You can also use `git_commit_non_empty_tree "$@"` instead of
`git commit-tree "$@"` if you don't wish to keep commits that have a
single parent and make no change to the tree.

--tag-name-filter <command>::
	This is the filter for rewriting tag names.  When passed,
	it will be called for every tag ref that points to a rewritten
	object (or to a tag object which points to a rewritten object).
	The original tag name is passed via standard input, and the new
	tag name is expected on standard output.
+
The original tags are not deleted, but can be overwritten;
use "--tag-name-filter cat" to simply update the tags.  In this
case, be very careful and make sure you have the old tags
backed up in case the conversion has run afoul.
+
Nearly proper rewriting of tag objects is supported.  If the tag has
a message attached, a new tag object will be created with the same message,
author, and timestamp.  If the tag has a signature attached, the
signature will be stripped.  It is by definition impossible to preserve
signatures.  The reason this is "nearly" proper is that ideally, if
the tag did not change (points to the same object, has the same name, etc.)
it should retain any signature.  That is not the case: signatures will always
be removed, so buyer beware.  There is also no support for changing the
author or timestamp (or the tag message for that matter).  Tags which point
to other tags will be rewritten to point to the underlying commit.

--subdirectory-filter <directory>::
	Only look at the history which touches the given subdirectory.
	The result will contain that directory (and only that) as its
	project root.  Implies <<Remap_to_ancestor>>.

--prune-empty::
	Some filters will generate empty commits that leave the tree
	untouched.  This switch allows git-filter-branch to ignore such
	commits.  However, it only applies to commits that have one
	and only one parent, so merge points are kept.  Also, this
	option is not compatible with the use of '--commit-filter'.  To get
	the same effect there, use the function
	'git_commit_non_empty_tree "$@"' instead of the
	`git commit-tree "$@"` idiom in your commit filter.

--original <namespace>::
	Use this option to set the namespace where the original commits
	will be stored.  The default value is 'refs/original'.

-d <directory>::
	Use this option to set the path to the temporary directory used for
	rewriting.  When applying a tree filter, the command needs to
	temporarily check out the tree to some directory, which may consume
	considerable space in case of large projects.  By default it
	does this in the '.git-rewrite/' directory but you can override
	that choice by this parameter.

-f::
--force::
	'git filter-branch' refuses to start with an existing temporary
	directory or when there are already refs starting with
	'refs/original/', unless forced.

<rev-list options>...::
	Arguments for 'git rev-list'.  All positive refs included by
	these options are rewritten.  You may also specify options
	such as '--all', but you must use '--' to separate them from
	the 'git filter-branch' options.  Implies <<Remap_to_ancestor>>.


[[Remap_to_ancestor]]
Remap to ancestor
~~~~~~~~~~~~~~~~~

By using linkgit:git-rev-list[1] arguments, e.g., path limiters, you can limit the
set of revisions which get rewritten.  However, positive refs on the command
line are distinguished: we don't let them be excluded by such limiters.  For
this purpose, they are instead rewritten to point at the nearest ancestor that
was not excluded.


Examples
--------

Suppose you want to remove a file (containing confidential information
or copyright violation) from all commits:

-------------------------------------------------------
git filter-branch --tree-filter 'rm filename' HEAD
-------------------------------------------------------

However, if the file is absent from the tree of some commit,
a simple `rm filename` will fail for that tree and commit.
Thus you may instead want to use `rm -f filename` as the script.
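
The difference matters because a filter that returns a non-zero exit
status aborts the whole operation.  A minimal sketch of the two exit
statuses, run in an empty scratch directory rather than in a repository
(`filename` is the placeholder from the example above):

```shell
# Work in an empty scratch directory so that 'filename' is guaranteed absent.
scratch=$(mktemp -d)
cd "$scratch"

# Plain rm complains about the missing file and exits non-zero ...
if rm filename 2>/dev/null
then
	plain_status=0
else
	plain_status=$?
fi

# ... while rm -f treats a missing file as success.
if rm -f filename
then
	forced_status=0
else
	forced_status=$?
fi

echo "rm: $plain_status, rm -f: $forced_status"
```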

Using `--index-filter` with 'git rm' yields a significantly faster
version.  Like with using `rm filename`, `git rm --cached filename`
will fail if the file is absent from the tree of a commit.  If you
want to "completely forget" a file, it does not matter when it entered
history, so we also add `--ignore-unmatch`:

--------------------------------------------------------------------------
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
--------------------------------------------------------------------------

Now, you will get the rewritten history saved in HEAD.

To rewrite the repository to look as if `foodir/` had been its project
root, and discard all other history:

-------------------------------------------------------
git filter-branch --subdirectory-filter foodir -- --all
-------------------------------------------------------

Thus you can, e.g., turn a library subdirectory into a repository of
its own.  Note the `--` that separates 'filter-branch' options from
revision options, and the `--all` to rewrite all branches and tags.

To set a commit (which typically is at the tip of another
history) to be the parent of the current initial commit, in
order to paste the other history behind the current history:

-------------------------------------------------------------------
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
-------------------------------------------------------------------

(if the parent string is empty - which happens when we are dealing with
the initial commit - add the graft commit as a parent).  Note that this assumes
history with a single root (that is, no merge without common ancestors
happened).  If this is not the case, use:

--------------------------------------------------------------------------
git filter-branch --parent-filter \
	'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
--------------------------------------------------------------------------

or even simpler:

-----------------------------------------------
echo "$commit-id $graft-id" >> .git/info/grafts
git filter-branch $graft-id..HEAD
-----------------------------------------------
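
A parent filter only reads a parent string on stdin and writes one on
stdout, so the `sed` variant above can be tried without a repository.  The
sketch below feeds it hand-written parent strings; the abbreviated ids and
the `parent_filter` wrapper are made up for illustration.

```shell
# Made-up abbreviated id standing in for <graft-id>.
graft=1234abc

# The sed filter from the first example, with the placeholder filled in.
parent_filter()
{
	sed "s/^\$/-p $graft/"
}

# Initial commit: the parent string is empty, so the graft is added.
echo "" | parent_filter                        # prints: -p 1234abc

# Normal and merge commits: the parent string passes through unchanged.
echo "-p 9fceb02" | parent_filter              # prints: -p 9fceb02
echo "-p 9fceb02 -p 3c4e9cd" | parent_filter   # prints: -p 9fceb02 -p 3c4e9cd
```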

To remove commits authored by "Darl McBribe" from the history:

------------------------------------------------------------------------------
git filter-branch --commit-filter '
	if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
	then
		skip_commit "$@";
	else
		git commit-tree "$@";
	fi' HEAD
------------------------------------------------------------------------------

The function 'skip_commit' is defined as follows:

--------------------------
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}
--------------------------

The shift magic first throws away the tree id and then the -p
parameters.  Note that this handles merges properly! In case Darl
committed a merge between P1 and P2, it will be propagated properly
and all children of the merge will become merge commits with P1,P2
as their parents instead of the merge commit.
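
To watch the shift logic at work outside of a rewrite, the function can
be called by hand with arguments shaped like those a commit filter
receives.  In this sketch 'map' is replaced by a stub that merely echoes
its argument (the real helper translates ids), and `TREE`, `P1` and `P2`
are placeholders for object ids.

```shell
# Stub for illustration only: the real 'map' translates original ids to
# rewritten ids; this one just echoes its argument.
map()
{
	echo "$1"
}

# Same definition as in the text above.
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}

# Arguments as passed for a merge of P1 and P2: the tree id comes first,
# then one "-p <parent>" pair per parent.
skip_commit TREE -p P1 -p P2
# prints P1 and P2 on separate lines: the ids the children will use as parents
```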

*NOTE*: the changes introduced by the skipped commits, if not reverted
by subsequent commits, will still be in the rewritten branch.  If you want
to throw out _changes_ together with the commits, you should use the
interactive mode of 'git rebase'.

You can rewrite the commit log messages using `--msg-filter`.  For
example, `git-svn-id` strings in a repository created by 'git svn' can
be removed this way:

-------------------------------------------------------
git filter-branch --msg-filter '
	sed -e "/^git-svn-id:/d"
'
-------------------------------------------------------
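
Since a message filter is an ordinary stdin-to-stdout command, it is easy
to preview on a sample message before rewriting anything.  The sketch
below does that for the `sed` filter above; the message, URL and UUID-like
string are invented, and `msg_filter` is just a wrapper name for this
example.

```shell
# The filter from the example above, wrapped so it can be reused.
msg_filter()
{
	sed -e "/^git-svn-id:/d"
}

# A made-up commit message of the kind 'git svn' leaves behind.
printf '%s\n' \
	'Fix off-by-one in parser' \
	'' \
	'git-svn-id: https://svn.example.com/repo/trunk@42 00000000-0000-0000-0000-000000000000' |
msg_filter
# The subject line and the blank line survive; the git-svn-id line is dropped.
```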

If you need to add 'Acked-by' lines to, say, the last 10 commits (none
of which is a merge), use this command:

--------------------------------------------------------
git filter-branch --msg-filter '
	cat &&
	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
' HEAD~10..HEAD
--------------------------------------------------------
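
The same kind of dry run works for this filter: `cat` copies the original
message through and `echo` appends the trailer.  A small sketch with an
invented message (`add_acked_by` is a hypothetical wrapper name):

```shell
# The filter body from the example above.
add_acked_by()
{
	cat &&
	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
}

# Made-up message with an existing trailer; the new line lands after it.
printf '%s\n' \
	'Teach the parser about tabs' \
	'' \
	'Signed-off-by: A U Thor <author@example.com>' |
add_acked_by
```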

The `--env-filter` option can be used to modify committer and/or author
identity.  For example, if you found out that your commits have the wrong
identity due to a misconfigured user.email, you can make a correction,
before publishing the project, like this:

--------------------------------------------------------
git filter-branch --env-filter '
	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
	then
		GIT_AUTHOR_EMAIL=john@example.com
		export GIT_AUTHOR_EMAIL
	fi
	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
	then
		GIT_COMMITTER_EMAIL=john@example.com
		export GIT_COMMITTER_EMAIL
	fi
' -- --all
--------------------------------------------------------
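
To check such a filter before running it over the whole history, its body
can be exercised with hand-set variables.  This is only a sketch: the
`fix_identity` wrapper and the `jane@example.com` address are assumptions
for the example, and no repository is involved.

```shell
# The filter body from the example above, wrapped in a function.
fix_identity()
{
	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
	then
		GIT_AUTHOR_EMAIL=john@example.com
		export GIT_AUTHOR_EMAIL
	fi
	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
	then
		GIT_COMMITTER_EMAIL=john@example.com
		export GIT_COMMITTER_EMAIL
	fi
}

# Simulate a commit with a wrong author but a correct committer address.
GIT_AUTHOR_EMAIL=root@localhost
GIT_COMMITTER_EMAIL=jane@example.com
fix_identity

echo "author:    $GIT_AUTHOR_EMAIL"      # author:    john@example.com
echo "committer: $GIT_COMMITTER_EMAIL"   # committer: jane@example.com
```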

To restrict rewriting to only part of the history, specify a revision
range.  Only the positive ref of the range is rewritten; it will end up
pointing to the rewritten version of the top-most revision that a
'git rev-list' of this range will print.

Consider this history:

------------------
     D--E--F--G--H
    /     /
A--B-----C
------------------

To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:

--------------------------------
git filter-branch ... C..H
--------------------------------

To rewrite commits E,F,G,H, use one of these:

----------------------------------------
git filter-branch ... C..H --not D
git filter-branch ... D..H --not C
----------------------------------------
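
The ranges can be checked before any rewriting, because they are plain
revision arguments.  The following sketch rebuilds the history above in
a throwaway repository out of empty commits and lists what each range
selects; the `commit` and `range` helpers, the `topic` branch name and the
identity passed with `-c` are all made up for the example.

```shell
# Throwaway repository; every commit is empty and tagged with its letter.
repo=$(mktemp -d)
cd "$repo"
git init -q .

commit()
{
	git -c user.name=Example -c user.email=example@example.com \
		commit -q --allow-empty -m "$1" &&
	git tag "$1"
}

commit A; commit B
git checkout -q -b topic        # D..H live on this branch
commit D; commit E
git checkout -q -               # back to the branch that holds A and B
commit C
git checkout -q topic
git -c user.name=Example -c user.email=example@example.com \
	merge -q --no-ff -m F C     # F merges E and C
git tag F
commit G; commit H

# Print the subjects selected by a range, sorted and joined on one line.
range()
{
	git log --format=%s "$@" | sort | tr -d '\n'
	echo
}

range C..H           # DEFGH
range C..H --not D   # EFGH
range D..H --not C   # EFGH
```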

To move the whole tree into a subdirectory, or remove it from there:

---------------------------------------------------------------
git filter-branch --index-filter \
	'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
		GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
			git update-index --index-info &&
	 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
---------------------------------------------------------------
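
The heart of that filter is the `sed` substitution, which inserts the
directory name right after the tab that separates the path from the rest
of each index entry.  The sketch below applies it to two invented entries.
It uses a literal tab in place of `\t`, on the assumption that not every
`sed` interprets that escape; `move_into_subdir` is a name made up here.

```shell
# A literal tab, so the sketch does not rely on sed understanding "\t".
tab=$(printf '\t')

# Same substitution as in the filter above: insert "newsubdir/" right after
# the tab (and an optional opening quote) that precedes the path.
move_into_subdir()
{
	sed "s-${tab}\"*-&newsubdir/-"
}

# Two made-up index entries in 'git ls-files -s' format:
# <mode> <object id> <stage><TAB><path>
printf '100644 %s 0\t%s\n' \
	e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 Makefile \
	e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 src/main.c |
move_into_subdir
# The paths become newsubdir/Makefile and newsubdir/src/main.c.
```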


Checklist for Shrinking a Repository
------------------------------------

git-filter-branch can be used to get rid of a subset of files,
usually with some combination of `--index-filter` and
`--subdirectory-filter`.  People expect the resulting repository to
be smaller than the original, but you need a few more steps to
actually make it smaller, because Git tries hard not to lose your
objects until you tell it to.  First make sure that:

* You really removed all variants of a filename, if a blob was moved
  over its lifetime.  `git log --name-only --follow --all -- filename`
  can help you find renames.

* You really filtered all refs: use `--tag-name-filter cat -- --all`
  when calling git-filter-branch.

Then there are two ways to get a smaller repository.  The safer way is
to clone, which keeps your original intact.

* Clone it with `git clone file:///path/to/repo`.  The clone
  will not have the removed objects.  See linkgit:git-clone[1].  (Note
  that cloning with a plain path just hardlinks everything!)

If you really don't want to clone it, for whatever reason, check the
following points instead (in this order).  This is a very destructive
approach, so *make a backup* or go back to cloning it.  You have been
warned.

* Remove the original refs backed up by git-filter-branch: say `git
  for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
  update-ref -d`.

* Expire all reflogs with `git reflog expire --expire=now --all`.

* Garbage collect all unreferenced objects with `git gc --prune=now`
  (or if your git-gc is not new enough to support arguments to
  `--prune`, use `git repack -ad; git prune` instead).
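
The three destructive steps can be collected in one small function, which
makes it harder to run them out of order.  This is a sketch under stated
assumptions: `shrink_repository` is a made-up name, and a `while read` loop
stands in for `xargs` so that 'git update-ref' is never invoked when no
backup refs exist.  Run it only in a repository you have backed up.

```shell
# Hypothetical helper wrapping the three steps of the checklist above.
shrink_repository()
{
	# 1. Drop the backup refs that git-filter-branch left under refs/original/.
	git for-each-ref --format="%(refname)" refs/original/ |
	while read -r ref
	do
		git update-ref -d "$ref"
	done &&

	# 2. Expire all reflog entries so they no longer keep old objects alive.
	git reflog expire --expire=now --all &&

	# 3. Repack and delete every object that is now unreachable.
	git gc --quiet --prune=now
}
```

Call `shrink_repository` from inside the rewritten repository once you
have verified the result of the rewrite.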

Notes
-----

git-filter-branch allows you to make complex shell-scripted rewrites
of your Git history, but you probably don't need this flexibility if
you're simply _removing unwanted data_ like large files or passwords.
For those operations you may want to consider
link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
a JVM-based alternative to git-filter-branch, typically at least
10-50x faster for those use-cases, and with quite different
characteristics:

* Any particular version of a file is cleaned exactly _once_. The BFG,
  unlike git-filter-branch, does not give you the opportunity to
  handle a file differently based on where or when it was committed
  within your history. This constraint gives the core performance
  benefit of The BFG, and is well-suited to the task of cleansing bad
  data - you don't care _where_ the bad data is, you just want it
  _gone_.

* By default The BFG takes full advantage of multi-core machines,
  cleansing commit file-trees in parallel. git-filter-branch cleans
  commits sequentially (i.e. in a single-threaded manner), though it
  _is_ possible to write filters that include their own parallelism,
  in the scripts executed against each commit.

* The link:http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
  are much more restrictive than those of git-filter-branch, and dedicated
  just to the tasks of removing unwanted data, e.g.
  `--strip-blobs-bigger-than 1M`.

GIT
---
Part of the linkgit:git[1] suite