git-filter-branch(1)
====================

NAME
----
git-filter-branch - Rewrite branches

SYNOPSIS
--------
[verse]
'git filter-branch' [--setup <command>] [--subdirectory-filter <directory>]
	[--env-filter <command>] [--tree-filter <command>]
	[--index-filter <command>] [--parent-filter <command>]
	[--msg-filter <command>] [--commit-filter <command>]
	[--tag-name-filter <command>] [--prune-empty]
	[--original <namespace>] [-d <directory>] [-f | --force]
	[--state-branch <branch>] [--] [<rev-list options>...]

DESCRIPTION
-----------
Lets you rewrite Git revision history by rewriting the branches mentioned
in the <rev-list options>, applying custom filters on each revision.
Those filters can modify each tree (e.g. removing a file or running
a perl rewrite on all files) or information about each commit.
Otherwise, all information (including original commit times or merge
information) will be preserved.

The command will only rewrite the _positive_ refs mentioned in the
command line (e.g. if you pass 'a..b', only 'b' will be rewritten).
If you specify no filters, the commits will be recommitted without any
changes, which would normally have no effect. Nevertheless, this may be
useful in the future for compensating for some Git bugs or such,
therefore such a usage is permitted.

*NOTE*: This command honors the `.git/info/grafts` file and refs in
the `refs/replace/` namespace.
If you have any grafts or replacement refs defined, running this command
will make them permanent.

*WARNING*! The rewritten history will have different object names for all
the objects and will not converge with the original branch. You will not
be able to easily push and distribute the rewritten branch on top of the
original branch. Please do not use this command if you do not know the
full implications, and avoid using it anyway, if a simple single commit
would suffice to fix your problem. (See the "RECOVERING FROM UPSTREAM
REBASE" section in linkgit:git-rebase[1] for further information about
rewriting published history.)

Always verify that the rewritten version is correct: The original refs,
if different from the rewritten ones, will be stored in the namespace
'refs/original/'.

Note that since this operation is very I/O expensive, it might
be a good idea to redirect the temporary directory off-disk with the
`-d` option, e.g. on tmpfs. Reportedly the speedup is very noticeable.


Filters
~~~~~~~

The filters are applied in the order listed below. The <command>
argument is always evaluated in the shell context using the 'eval' command
(with the notable exception of the commit filter, for technical reasons).
Prior to that, the `$GIT_COMMIT` environment variable will be set to contain
the id of the commit being rewritten. Also, GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL,
and GIT_COMMITTER_DATE are taken from the current commit and exported to
the environment, in order to affect the author and committer identities of
the replacement commit created by linkgit:git-commit-tree[1] after the
filters have run.

If any evaluation of <command> returns a non-zero exit status, the whole
operation will be aborted.

A 'map' function is available that takes an "original sha1 id" argument
and outputs a "rewritten sha1 id" if the commit has been already
rewritten, and "original sha1 id" otherwise; the 'map' function can
return several ids on separate lines if your commit filter emitted
multiple commits.


OPTIONS
-------

--setup <command>::
	This is not a real filter executed for each commit but a one
	time setup just before the loop. Therefore no commit-specific
	variables are defined yet. Functions or variables defined here
	can be used or modified in the following filter steps except
	the commit filter, for technical reasons.

--subdirectory-filter <directory>::
	Only look at the history which touches the given subdirectory.
	The result will contain that directory (and only that) as its
	project root. Implies <<Remap_to_ancestor>>.

--env-filter <command>::
	This filter may be used if you only need to modify the environment
	in which the commit will be performed. Specifically, you might
	want to rewrite the author/committer name/email/time environment
	variables (see linkgit:git-commit-tree[1] for details).

--tree-filter <command>::
	This is the filter for rewriting the tree and its contents.
	The argument is evaluated in shell with the working
	directory set to the root of the checked out tree. The new tree
	is then used as-is (new files are auto-added, disappeared files
	are auto-removed - neither .gitignore files nor any other ignore
	rules *HAVE ANY EFFECT*!).

--index-filter <command>::
	This is the filter for rewriting the index. It is similar to the
	tree filter but does not check out the tree, which makes it much
	faster. Frequently used with `git rm --cached
	--ignore-unmatch ...`, see EXAMPLES below. For hairy
	cases, see linkgit:git-update-index[1].

--parent-filter <command>::
	This is the filter for rewriting the commit's parent list.
	It will receive the parent string on stdin and shall output
	the new parent string on stdout. The parent string is in
	the format described in linkgit:git-commit-tree[1]: empty for
	the initial commit, "-p parent" for a normal commit and
	"-p parent1 -p parent2 -p parent3 ..." for a merge commit.

--msg-filter <command>::
	This is the filter for rewriting the commit messages.
	The argument is evaluated in the shell with the original
	commit message on standard input; its standard output is
	used as the new commit message.

--commit-filter <command>::
	This is the filter for performing the commit.
	If this filter is specified, it will be called instead of the
	'git commit-tree' command, with arguments of the form
	"<TREE_ID> [(-p <PARENT_COMMIT_ID>)...]" and the log message on
	stdin. The commit id is expected on stdout.
+
As a special extension, the commit filter may emit multiple
commit ids; in that case, the rewritten children of the original commit will
have all of them as parents.
+
You can use the 'map' convenience function in this filter, and other
convenience functions, too. For example, calling 'skip_commit "$@"'
will leave out the current commit (but not its changes! If you want
that, use 'git rebase' instead).
+
You can also use `git_commit_non_empty_tree "$@"` instead of
`git commit-tree "$@"` if you don't wish to keep commits that have a
single parent and make no change to the tree.

--tag-name-filter <command>::
	This is the filter for rewriting tag names. When passed,
	it will be called for every tag ref that points to a rewritten
	object (or to a tag object which points to a rewritten object).
	The original tag name is passed via standard input, and the new
	tag name is expected on standard output.
+
The original tags are not deleted, but can be overwritten;
use "--tag-name-filter cat" to simply update the tags. In this
case, be very careful and make sure you have the old tags
backed up in case the conversion has run afoul.
+
Nearly proper rewriting of tag objects is supported. If the tag has
a message attached, a new tag object will be created with the same message,
author, and timestamp. If the tag has a signature attached, the
signature will be stripped. It is by definition impossible to preserve
signatures. The reason this is "nearly" proper is that ideally, if
the tag did not change (points to the same object, has the same name, etc.),
it should retain any signature. That is not the case: signatures will always
be removed, buyer beware. There is also no support for changing the
author or timestamp (or the tag message for that matter). Tags which point
to other tags will be rewritten to point to the underlying commit.

--prune-empty::
	Some filters will generate empty commits that leave the tree untouched.
	This option instructs git-filter-branch to remove such commits if they
	have exactly one or zero non-pruned parents; merge commits will
	therefore remain intact. This option cannot be used together with
	`--commit-filter`, though the same effect can be achieved by using the
	provided `git_commit_non_empty_tree` function in a commit filter.

--original <namespace>::
	Use this option to set the namespace where the original commits
	will be stored. The default value is 'refs/original'.

-d <directory>::
	Use this option to set the path to the temporary directory used for
	rewriting. When applying a tree filter, the command needs to
	temporarily check out the tree to some directory, which may consume
	considerable space in case of large projects. By default it
	does this in the `.git-rewrite/` directory but you can override
	that choice by this parameter.

-f::
--force::
	'git filter-branch' refuses to start with an existing temporary
	directory or when there are already refs starting with
	'refs/original/', unless forced.

--state-branch <branch>::
	This option will cause the mapping from old to new objects to
	be loaded from the named branch upon startup and saved as a new
	commit to that branch upon exit, enabling incremental rewriting
	of large trees. If '<branch>' does not exist it will be created.

<rev-list options>...::
	Arguments for 'git rev-list'. All positive refs included by
	these options are rewritten. You may also specify options
	such as `--all`, but you must use `--` to separate them from
	the 'git filter-branch' options. Implies <<Remap_to_ancestor>>.


[[Remap_to_ancestor]]
Remap to ancestor
~~~~~~~~~~~~~~~~~

By using linkgit:git-rev-list[1] arguments, e.g., path limiters, you can limit the
set of revisions which get rewritten. However, positive refs on the command
line are distinguished: we don't let them be excluded by such limiters. For
this purpose, they are instead rewritten to point at the nearest ancestor that
was not excluded.


EXIT STATUS
-----------

On success, the exit status is `0`. If the filter can't find any commits to
rewrite, the exit status is `2`. On any other error, the exit status may be
any other non-zero value.


EXAMPLES
--------

Suppose you want to remove a file (containing confidential information
or copyright violation) from all commits:

-------------------------------------------------------
git filter-branch --tree-filter 'rm filename' HEAD
-------------------------------------------------------

However, if the file is absent from the tree of some commit,
a simple `rm filename` will fail for that tree and commit.
Thus you may instead want to use `rm -f filename` as the script.

Using `--index-filter` with 'git rm' yields a significantly faster
version. Like with using `rm filename`, `git rm --cached filename`
will fail if the file is absent from the tree of a commit. If you
want to "completely forget" a file, it does not matter when it entered
history, so we also add `--ignore-unmatch`:

--------------------------------------------------------------------------
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
--------------------------------------------------------------------------

Now, you will get the rewritten history saved in HEAD.

To rewrite the repository to look as if `foodir/` had been its project
root, and discard all other history:

-------------------------------------------------------
git filter-branch --subdirectory-filter foodir -- --all
-------------------------------------------------------

Thus you can, e.g., turn a library subdirectory into a repository of
its own. Note the `--` that separates 'filter-branch' options from
revision options, and the `--all` to rewrite all branches and tags.

To set a commit (which typically is at the tip of another
history) to be the parent of the current initial commit, in
order to paste the other history behind the current history:

-------------------------------------------------------------------
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
-------------------------------------------------------------------

(if the parent string is empty - which happens when we are dealing with
the initial commit - add the graft commit as a parent). Note that this assumes
history with a single root (that is, no merge without common ancestors
happened). If this is not the case, use:

--------------------------------------------------------------------------
git filter-branch --parent-filter \
	'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
--------------------------------------------------------------------------

or even simpler:

-----------------------------------------------
git replace --graft <commit-id> <graft-id>
git filter-branch <graft-id>..HEAD
-----------------------------------------------
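
To see what the `sed` expression in the first `--parent-filter` example
does, it can be tried on hand-written parent strings outside of
'git filter-branch'. This is only a sketch: both ids are made up, and
`add_graft` is a helper name invented for the demonstration.

```shell
# Hypothetical graft id, used only for this demonstration.
graft=1111111111111111111111111111111111111111

# Same substitution as in the first --parent-filter example.
add_graft() {
	sed "s/^\$/-p $graft/"
}

# Initial commit: the parent string is empty, so the graft is added.
root_parents=$(echo "" | add_graft)

# Ordinary commit: the parent string is non-empty and passes through.
child_parents=$(echo "-p 2222222222222222222222222222222222222222" | add_graft)

echo "root:  $root_parents"
echo "child: $child_parents"
```

Only the empty parent string is changed, which is why this variant
assumes a history with a single root.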

To remove commits authored by "Darl McBribe" from the history:

------------------------------------------------------------------------------
git filter-branch --commit-filter '
	if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
	then
		skip_commit "$@";
	else
		git commit-tree "$@";
	fi' HEAD
------------------------------------------------------------------------------

The function 'skip_commit' is defined as follows:

--------------------------
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}
--------------------------

The shift magic first throws away the tree id and then the -p
parameters. Note that this handles merges properly! In case Darl
committed a merge between P1 and P2, it will be propagated properly
and all children of the merge will become merge commits with P1,P2
as their parents instead of the merge commit.
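
The argument handling of 'skip_commit' can be tried in a plain shell.
The sketch below assumes a stand-in `map` that merely prefixes each id
with `new-`; the real 'map' is provided by 'git filter-branch' and looks
up rewritten ids. `TREE`, `P1` and `P2` are placeholder ids.

```shell
# Stand-in for filter-branch's 'map' function (an assumption made for
# this demo): pretend every parent was rewritten to "new-<id>".
map()
{
	echo "new-$1"
}

# skip_commit as defined in the text.
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}

# Arguments in the form a commit filter receives them: the tree id,
# then "-p <parent>" per parent (here a merge of P1 and P2).
merge_result=$(skip_commit TREE -p P1 -p P2)
echo "$merge_result"

# An initial commit has no parents, so nothing is emitted.
root_result=$(skip_commit TREE)
```

For the merge, one mapped id per parent is printed on its own line;
these become the parents of the skipped commit's children.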

*NOTE*: the changes introduced by the commits, and not reverted
by subsequent commits, will still be in the rewritten branch. If you want
to throw out _changes_ together with the commits, you should use the
interactive mode of 'git rebase'.

You can rewrite the commit log messages using `--msg-filter`. For
example, `git-svn-id` strings in a repository created by 'git svn' can
be removed this way:

-------------------------------------------------------
git filter-branch --msg-filter '
	sed -e "/^git-svn-id:/d"
'
-------------------------------------------------------

If you need to add 'Acked-by' lines to, say, the last 10 commits (none
of which is a merge), use this command:

--------------------------------------------------------
git filter-branch --msg-filter '
	cat &&
	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
' HEAD~10..HEAD
--------------------------------------------------------
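
Since a message filter only reads the old message on standard input and
writes the new one to standard output, both filters can be tried on a
hand-written message before rewriting anything. The message and the
svn URL below are made up for illustration.

```shell
# A made-up commit message, as a 'git svn' import might produce it.
original_message='Fix typo in README

Spotted while reading the install notes.

git-svn-id: https://svn.example.com/repo/trunk@42 0123abcd'

# Filter from the first example: drop the git-svn-id line.
# (The $(...) capture also strips trailing newlines; that is an effect
# of this demo, not of the filter itself.)
stripped=$(printf '%s\n' "$original_message" | sed -e "/^git-svn-id:/d")

# Filter from the second example: keep the message, append a trailer.
acked=$(printf '%s\n' "$stripped" | {
	cat &&
	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
})

printf '%s\n' "$acked"
```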

The `--env-filter` option can be used to modify committer and/or author
identity. For example, if you found out that your commits have the wrong
identity due to a misconfigured user.email, you can make a correction,
before publishing the project, like this:

--------------------------------------------------------
git filter-branch --env-filter '
	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
	then
		GIT_AUTHOR_EMAIL=john@example.com
	fi
	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
	then
		GIT_COMMITTER_EMAIL=john@example.com
	fi
' -- --all
--------------------------------------------------------
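
Because the filter text is evaluated with 'eval', its effect on the
identity variables can be checked in an ordinary shell first. In this
sketch the variable `env_filter` and the two starting addresses are
assumptions chosen for the demonstration; during a real run the values
come from the commit being rewritten.

```shell
# The filter text from the example, held in a variable.
env_filter='
	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
	then
		GIT_AUTHOR_EMAIL=john@example.com
	fi
	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
	then
		GIT_COMMITTER_EMAIL=john@example.com
	fi
'

# Made-up identities for one commit: a misconfigured author and an
# already correct committer.
GIT_AUTHOR_EMAIL=root@localhost
GIT_COMMITTER_EMAIL=jane@example.com

# Evaluate the filter in the current shell so its assignments stick.
eval "$env_filter"

echo "author:    $GIT_AUTHOR_EMAIL"
echo "committer: $GIT_COMMITTER_EMAIL"
```

Only the author address matches `root@localhost`, so only it is
replaced; the committer address is left alone.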

To restrict rewriting to only part of the history, specify a revision
range in addition to the new branch name. The new branch name will
point to the top-most revision that a 'git rev-list' of this range
will print.

Consider this history:

------------------
     D--E--F--G--H
    /     /
A--B-----C
------------------

To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:

--------------------------------
git filter-branch ... C..H
--------------------------------

To rewrite commits E,F,G,H, use one of these:

----------------------------------------
git filter-branch ... C..H --not D
git filter-branch ... D..H --not C
----------------------------------------

To move the whole tree into a subdirectory, or remove it from there:

---------------------------------------------------------------
git filter-branch --index-filter \
	'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
		GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
			git update-index --index-info &&
	 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
---------------------------------------------------------------
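
The `sed` expression in that filter inserts `newsubdir/` right after the
tab (and any opening quote) that precedes each path in the
'git ls-files -s' output. The sketch below applies it to two made-up
index entries; the object id and paths are placeholders, and a literal
tab is used in place of `\t`, which some `sed` implementations may not
understand.

```shell
# A literal tab character.
tab=$(printf '\t')

# Two made-up entries in 'git ls-files -s' format:
# <mode> <object> <stage><TAB><path>; the second path is quoted.
entries=$(
	printf '100644 %s 0\t%s\n' \
		e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 'README' \
		e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 '"docs/caf\303\251.txt"'
)

# Same substitution as in the index filter, with the tab spelled out.
moved=$(printf '%s\n' "$entries" | sed "s-${tab}\"*-&newsubdir/-")

printf '%s\n' "$moved"
```

Mode, object id and stage are untouched; only the path gains the new
prefix, inside the quotes where the path is quoted.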


CHECKLIST FOR SHRINKING A REPOSITORY
------------------------------------

git-filter-branch can be used to get rid of a subset of files,
usually with some combination of `--index-filter` and
`--subdirectory-filter`. People expect the resulting repository to
be smaller than the original, but you need a few more steps to
actually make it smaller, because Git tries hard not to lose your
objects until you tell it to. First make sure that:

* You really removed all variants of a filename, if a blob was moved
  over its lifetime. `git log --name-only --follow --all -- filename`
  can help you find renames.

* You really filtered all refs: use `--tag-name-filter cat -- --all`
  when calling git-filter-branch.

Then there are two ways to get a smaller repository. The safer way is
to clone, which keeps your original intact.

* Clone it with `git clone file:///path/to/repo`. The clone
  will not have the removed objects. See linkgit:git-clone[1]. (Note
  that cloning with a plain path just hardlinks everything!)

If you really don't want to clone it, for whatever reasons, check the
following points instead (in this order). This is a very destructive
approach, so *make a backup* or go back to cloning it. You have been
warned.

* Remove the original refs backed up by git-filter-branch: say `git
  for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
  update-ref -d`.

* Expire all reflogs with `git reflog expire --expire=now --all`.

* Garbage collect all unreferenced objects with `git gc --prune=now`
  (or if your git-gc is not new enough to support arguments to
  `--prune`, use `git repack -ad; git prune` instead).

NOTES
-----

git-filter-branch allows you to make complex shell-scripted rewrites
of your Git history, but you probably don't need this flexibility if
you're simply _removing unwanted data_ like large files or passwords.
For those operations you may want to consider
http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
a JVM-based alternative to git-filter-branch, typically at least
10-50x faster for those use-cases, and with quite different
characteristics:

* Any particular version of a file is cleaned exactly _once_. The BFG,
  unlike git-filter-branch, does not give you the opportunity to
  handle a file differently based on where or when it was committed
  within your history. This constraint gives the core performance
  benefit of The BFG, and is well-suited to the task of cleansing bad
  data - you don't care _where_ the bad data is, you just want it
  _gone_.

* By default The BFG takes full advantage of multi-core machines,
  cleansing commit file-trees in parallel. git-filter-branch cleans
  commits sequentially (i.e. in a single-threaded manner), though it
  _is_ possible to write filters that include their own parallelism,
  in the scripts executed against each commit.

* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
  are much more restrictive than git-filter-branch, and dedicated just
  to the tasks of removing unwanted data, e.g.
  `--strip-blobs-bigger-than 1M`.

GIT
---
Part of the linkgit:git[1] suite