git-filter-branch(1)
====================

NAME
----
git-filter-branch - Rewrite branches

SYNOPSIS
--------
[verse]
'git filter-branch' [--setup <command>] [--env-filter <command>]
	[--tree-filter <command>] [--index-filter <command>]
	[--parent-filter <command>] [--msg-filter <command>]
	[--commit-filter <command>] [--tag-name-filter <command>]
	[--subdirectory-filter <directory>] [--prune-empty]
	[--original <namespace>] [-d <directory>] [-f | --force]
	[--state-branch <branch>] [--] [<rev-list options>...]

DESCRIPTION
-----------
Lets you rewrite Git revision history by rewriting the branches mentioned
in the <rev-list options>, applying custom filters on each revision.
Those filters can modify each tree (e.g. removing a file or running
a perl rewrite on all files) or information about each commit.
Otherwise, all information (including original commit times or merge
information) will be preserved.

The command will only rewrite the _positive_ refs mentioned in the
command line (e.g. if you pass 'a..b', only 'b' will be rewritten).
If you specify no filters, the commits will be recommitted without any
changes, which would normally have no effect. Nevertheless, this may be
useful in the future to compensate for some Git bugs, so such usage
is permitted.

*NOTE*: This command honors the `.git/info/grafts` file and refs in
the `refs/replace/` namespace.
If you have any grafts or replacement refs defined, running this command
will make them permanent.

*WARNING*! The rewritten history will have different object names for all
the objects and will not converge with the original branch. You will not
be able to easily push and distribute the rewritten branch on top of the
original branch. Please do not use this command if you do not know the
full implications, and avoid using it anyway, if a simple single commit
would suffice to fix your problem. (See the "RECOVERING FROM UPSTREAM
REBASE" section in linkgit:git-rebase[1] for further information about
rewriting published history.)

Always verify that the rewritten version is correct: The original refs,
if different from the rewritten ones, will be stored in the namespace
'refs/original/'.

Note that since this operation is very I/O expensive, it might
be a good idea to redirect the temporary directory off-disk with the
`-d` option, e.g. on tmpfs. Reportedly the speedup is very noticeable.


Filters
~~~~~~~

The filters are applied in the order listed below. The <command>
argument is always evaluated in the shell context using the 'eval' command
(with the notable exception of the commit filter, for technical reasons).
Prior to that, the `$GIT_COMMIT` environment variable will be set to contain
the id of the commit being rewritten. Also, GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL,
and GIT_COMMITTER_DATE are taken from the current commit and exported to
the environment, in order to affect the author and committer identities of
the replacement commit created by linkgit:git-commit-tree[1] after the
filters have run.

If any evaluation of <command> returns a non-zero exit status, the whole
operation will be aborted.

A 'map' function is available that takes an "original sha1 id" argument
and outputs a "rewritten sha1 id" if the commit has been already
rewritten, and "original sha1 id" otherwise; the 'map' function can
return several ids on separate lines if your commit filter emitted
multiple commits.
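
The lookup that 'map' performs can be pictured with the following sketch.
It assumes a hypothetical directory `$mapdir` holding one file per rewritten
commit, named after the original id and containing the rewritten id(s). The
real function is provided by 'git filter-branch' itself, and its storage
layout may differ.

```shell
# Hypothetical stand-in for the built-in 'map' helper (illustration only).
mapdir=$(mktemp -d)

map()
{
	# If the commit was rewritten, print the new id(s); else echo the original.
	if test -r "$mapdir/$1"
	then
		cat "$mapdir/$1"
	else
		echo "$1"
	fi
}

# Pretend commit "aaa" was rewritten into two commits and "bbb" was untouched.
printf '%s\n' 111 222 >"$mapdir/aaa"

rewritten=$(map aaa)
untouched=$(map bbb)

echo "aaa -> $rewritten"
echo "bbb -> $untouched"
```

The ids `aaa`, `bbb`, `111` and `222` are placeholders standing in for full
object names.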


OPTIONS
-------

--setup <command>::
	This is not a real filter executed for each commit but a one-time
	setup just before the loop. Therefore no commit-specific
	variables are defined yet. Functions or variables defined here
	can be used or modified in the following filter steps except
	the commit filter, for technical reasons.

--env-filter <command>::
	This filter may be used if you only need to modify the environment
	in which the commit will be performed. Specifically, you might
	want to rewrite the author/committer name/email/time environment
	variables (see linkgit:git-commit-tree[1] for details).

--tree-filter <command>::
	This is the filter for rewriting the tree and its contents.
	The argument is evaluated in shell with the working
	directory set to the root of the checked out tree. The new tree
	is then used as-is (new files are auto-added, disappeared files
	are auto-removed - neither .gitignore files nor any other ignore
	rules *HAVE ANY EFFECT*!).

--index-filter <command>::
	This is the filter for rewriting the index. It is similar to the
	tree filter but does not check out the tree, which makes it much
	faster. Frequently used with `git rm --cached
	--ignore-unmatch ...`, see EXAMPLES below. For hairy
	cases, see linkgit:git-update-index[1].

--parent-filter <command>::
	This is the filter for rewriting the commit's parent list.
	It will receive the parent string on stdin and shall output
	the new parent string on stdout. The parent string is in
	the format described in linkgit:git-commit-tree[1]: empty for
	the initial commit, "-p parent" for a normal commit and
	"-p parent1 -p parent2 -p parent3 ..." for a merge commit.

--msg-filter <command>::
	This is the filter for rewriting the commit messages.
	The argument is evaluated in the shell with the original
	commit message on standard input; its standard output is
	used as the new commit message.

--commit-filter <command>::
	This is the filter for performing the commit.
	If this filter is specified, it will be called instead of the
	'git commit-tree' command, with arguments of the form
	"<TREE_ID> [(-p <PARENT_COMMIT_ID>)...]" and the log message on
	stdin. The commit id is expected on stdout.
+
As a special extension, the commit filter may emit multiple
commit ids; in that case, the rewritten children of the original commit will
have all of them as parents.
+
You can use the 'map' convenience function in this filter, and other
convenience functions, too. For example, calling 'skip_commit "$@"'
will leave out the current commit (but not its changes! If you want
that, use 'git rebase' instead).
+
You can also use `git_commit_non_empty_tree "$@"` instead of
`git commit-tree "$@"` if you don't wish to keep commits with a single parent
that make no change to the tree.

--tag-name-filter <command>::
	This is the filter for rewriting tag names. When passed,
	it will be called for every tag ref that points to a rewritten
	object (or to a tag object which points to a rewritten object).
	The original tag name is passed via standard input, and the new
	tag name is expected on standard output.
+
The original tags are not deleted, but can be overwritten;
use "--tag-name-filter cat" to simply update the tags. In this
case, be very careful and make sure you have the old tags
backed up in case the conversion has run afoul.
+
Nearly proper rewriting of tag objects is supported. If the tag has
a message attached, a new tag object will be created with the same message,
author, and timestamp. If the tag has a signature attached, the
signature will be stripped. It is by definition impossible to preserve
signatures. The reason this is "nearly" proper is that, ideally, if
the tag did not change (points to the same object, has the same name, etc.)
it should retain any signature. That is not the case: signatures will always
be removed, buyer beware. There is also no support for changing the
author or timestamp (or the tag message for that matter). Tags which point
to other tags will be rewritten to point to the underlying commit.

--subdirectory-filter <directory>::
	Only look at the history which touches the given subdirectory.
	The result will contain that directory (and only that) as its
	project root. Implies <<Remap_to_ancestor>>.

--prune-empty::
	Some filters will generate empty commits that leave the tree untouched.
	This option instructs git-filter-branch to remove such commits if they
	have exactly one or zero non-pruned parents; merge commits will
	therefore remain intact. This option cannot be used together with
	`--commit-filter`, though the same effect can be achieved by using the
	provided `git_commit_non_empty_tree` function in a commit filter.

--original <namespace>::
	Use this option to set the namespace where the original commits
	will be stored. The default value is 'refs/original'.

-d <directory>::
	Use this option to set the path to the temporary directory used for
	rewriting. When applying a tree filter, the command needs to
	temporarily check out the tree to some directory, which may consume
	considerable space in case of large projects. By default it
	does this in the '.git-rewrite/' directory but you can override
	that choice by this parameter.

-f::
--force::
	'git filter-branch' refuses to start with an existing temporary
	directory or when there are already refs starting with
	'refs/original/', unless forced.

--state-branch <branch>::
	This option will cause the mapping from old to new objects to
	be loaded from the named branch upon startup and saved as a new
	commit to that branch upon exit, enabling incremental rewriting
	of large trees. If '<branch>' does not exist it will be created.

<rev-list options>...::
	Arguments for 'git rev-list'. All positive refs included by
	these options are rewritten. You may also specify options
	such as `--all`, but you must use `--` to separate them from
	the 'git filter-branch' options. Implies <<Remap_to_ancestor>>.


[[Remap_to_ancestor]]
Remap to ancestor
~~~~~~~~~~~~~~~~~

By using linkgit:git-rev-list[1] arguments, e.g., path limiters, you can limit the
set of revisions which get rewritten. However, positive refs on the command
line are distinguished: we don't let them be excluded by such limiters. For
this purpose, they are instead rewritten to point at the nearest ancestor that
was not excluded.


Examples
--------

Suppose you want to remove a file (containing confidential information
or copyright violation) from all commits:

-------------------------------------------------------
git filter-branch --tree-filter 'rm filename' HEAD
-------------------------------------------------------

However, if the file is absent from the tree of some commit,
a simple `rm filename` will fail for that tree and commit.
Thus you may instead want to use `rm -f filename` as the script.
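
The difference matters because a non-zero exit status from the filter
aborts the whole run. The following minimal sketch compares the two exit
statuses in a throwaway directory; the directory and variable names are
placeholders chosen for illustration.

```shell
# Work in a scratch directory that does not contain 'filename'.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Plain rm fails on a missing file, which would abort the tree filter.
rm filename 2>/dev/null
plain_status=$?

# rm -f treats a missing file as success, so the filter keeps going.
rm -f filename
forced_status=$?

echo "rm: $plain_status, rm -f: $forced_status"
```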

Using `--index-filter` with 'git rm' yields a significantly faster
version. Like with using `rm filename`, `git rm --cached filename`
will fail if the file is absent from the tree of a commit. If you
want to "completely forget" a file, it does not matter when it entered
history, so we also add `--ignore-unmatch`:

--------------------------------------------------------------------------
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
--------------------------------------------------------------------------

Now, you will get the rewritten history saved in HEAD.

To rewrite the repository to look as if `foodir/` had been its project
root, and discard all other history:

-------------------------------------------------------
git filter-branch --subdirectory-filter foodir -- --all
-------------------------------------------------------

Thus you can, e.g., turn a library subdirectory into a repository of
its own. Note the `--` that separates 'filter-branch' options from
revision options, and the `--all` to rewrite all branches and tags.

To set a commit (which typically is at the tip of another
history) to be the parent of the current initial commit, in
order to paste the other history behind the current history:

-------------------------------------------------------------------
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
-------------------------------------------------------------------

(if the parent string is empty - which happens when we are dealing with
the initial commit - add <graft-id> as a parent). Note that this assumes
history with a single root (that is, no merge without common ancestors
happened). If this is not the case, use:

--------------------------------------------------------------------------
git filter-branch --parent-filter \
	'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
--------------------------------------------------------------------------

or even simpler:

-----------------------------------------------
echo "$commit-id $graft-id" >> .git/info/grafts
git filter-branch $graft-id..HEAD
-----------------------------------------------
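
Since a parent filter simply maps stdin to stdout, the `sed` expression from
the first variant can be tried by hand before rewriting anything. This sketch
assumes the parent string arrives as a single line (an empty line for the
initial commit) and uses the made-up ids `abc123` and `def456`.

```shell
# The filter body from the first example, with a made-up graft id.
parent_filter='sed "s/^\$/-p abc123/"'

# Initial commit: the parent string is empty, so the graft is added.
root_parents=$(echo "" | eval "$parent_filter")

# Ordinary commit: the existing parent is passed through unchanged.
child_parents=$(echo "-p def456" | eval "$parent_filter")

echo "root:  $root_parents"
echo "child: $child_parents"
```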

To remove commits authored by "Darl McBribe" from the history:

------------------------------------------------------------------------------
git filter-branch --commit-filter '
	if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
	then
		skip_commit "$@";
	else
		git commit-tree "$@";
	fi' HEAD
------------------------------------------------------------------------------

The function 'skip_commit' is defined as follows:

--------------------------
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}
--------------------------

The shift magic first throws away the tree id and then the -p
parameters. Note that this handles merges properly! In case Darl
committed a merge between P1 and P2, it will be propagated properly
and all children of the merge will become merge commits with P1,P2
as their parents instead of the merge commit.
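
To see what the shift sequence does, 'skip_commit' can be run outside
'git filter-branch' with a stand-in for 'map'. In this sketch the stub 'map'
and the ids `T`, `P1` and `P2` are invented for illustration; the real 'map'
looks up rewritten commit ids.

```shell
# Stub 'map': pretend every parent was rewritten to "new-<id>".
map()
{
	echo "new-$1"
}

# Same definition as above.
skip_commit()
{
	shift;
	while [ -n "$1" ];
	do
		shift;
		map "$1";
		shift;
	done;
}

# Arguments have the form "<tree> -p <parent1> -p <parent2>".
result=$(skip_commit T -p P1 -p P2)
echo "$result"
```

The tree id is dropped and only the mapped parents are printed, one per line,
which is how the children of a skipped commit end up attached to its parents.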

*NOTE*: the changes introduced by the commits, and not reverted
by subsequent commits, will still be in the rewritten branch. If you want
to throw out _changes_ together with the commits, you should use the
interactive mode of 'git rebase'.

You can rewrite the commit log messages using `--msg-filter`. For
example, `git-svn-id` strings in a repository created by 'git svn' can
be removed this way:

-------------------------------------------------------
git filter-branch --msg-filter '
	sed -e "/^git-svn-id:/d"
'
-------------------------------------------------------

If you need to add 'Acked-by' lines to, say, the last 10 commits (none
of which is a merge), use this command:

--------------------------------------------------------
git filter-branch --msg-filter '
	cat &&
	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
' HEAD~10..HEAD
--------------------------------------------------------
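
Because a message filter is just a stdin-to-stdout command, both filters
above can be tried on a sample message first. The message text and the
Subversion URL below are made up for this sketch.

```shell
# A sample commit message as 'git svn' might have recorded it (made up).
sample='Fix typo

git-svn-id: https://svn.example.org/repo/trunk@42 0000-0000'

# First filter: drop the git-svn-id line.
cleaned=$(printf '%s\n' "$sample" | sed -e "/^git-svn-id:/d")

# Second filter: keep the message and append an Acked-by line.
acked=$(printf '%s\n' "Fix typo" |
	{ cat && echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"; })

echo "$cleaned"
echo "$acked"
```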

The `--env-filter` option can be used to modify committer and/or author
identity. For example, if you found out that your commits have the wrong
identity due to a misconfigured user.email, you can make a correction,
before publishing the project, like this:

--------------------------------------------------------
git filter-branch --env-filter '
	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
	then
		GIT_AUTHOR_EMAIL=john@example.com
	fi
	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
	then
		GIT_COMMITTER_EMAIL=john@example.com
	fi
' -- --all
--------------------------------------------------------
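
One way to check such a filter before running it over the whole history is
to evaluate its body by hand against sample values. This is only a dry-run
sketch: the address `jane@example.com` is a placeholder, and the variables
are set manually here rather than taken from a commit.

```shell
# The filter body from the example above.
env_filter='
	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
	then
		GIT_AUTHOR_EMAIL=john@example.com
	fi
	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
	then
		GIT_COMMITTER_EMAIL=john@example.com
	fi
'

# Simulate one commit: wrong author address, correct committer address.
GIT_AUTHOR_EMAIL=root@localhost
GIT_COMMITTER_EMAIL=jane@example.com

eval "$env_filter"

echo "author:    $GIT_AUTHOR_EMAIL"
echo "committer: $GIT_COMMITTER_EMAIL"
```

Only the author address is replaced; the committer address does not match
`root@localhost` and is left alone.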

To restrict rewriting to only part of the history, specify a revision
range in addition to the new branch name. The new branch name will
point to the top-most revision that a 'git rev-list' of this range
will print.

Consider this history:

------------------
     D--E--F--G--H
    /     /
A--B-----C
------------------

To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:

--------------------------------
git filter-branch ... C..H
--------------------------------

To rewrite commits E,F,G,H, use one of these:

----------------------------------------
git filter-branch ... C..H --not D
git filter-branch ... D..H --not C
----------------------------------------

To move the whole tree into a subdirectory, or remove it from there:

---------------------------------------------------------------
git filter-branch --index-filter \
	'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
		GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
			git update-index --index-info &&
	 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
---------------------------------------------------------------
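
The `sed` expression inserts `newsubdir/` right after the tab that separates
the path from the mode, object id and stage in `git ls-files -s` output (and
after an opening double quote, if the path is quoted). The sketch below
applies it to one hand-written entry. The object id is a dummy, and a literal
tab from `printf` replaces `\t`, on the assumption that not every `sed`
understands that escape.

```shell
# A literal tab, for sed implementations that do not interpret "\t".
tab=$(printf '\t')

# One index entry in 'git ls-files -s' format (dummy object id).
entry="100644 0123456789abcdef0123456789abcdef01234567 0${tab}src/main.c"

# Same substitution as in the index filter above, using '-' as delimiter.
moved=$(printf '%s\n' "$entry" | sed "s-${tab}\"*-&newsubdir/-")

echo "$moved"
```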



Checklist for Shrinking a Repository
------------------------------------

git-filter-branch can be used to get rid of a subset of files,
usually with some combination of `--index-filter` and
`--subdirectory-filter`. People expect the resulting repository to
be smaller than the original, but you need a few more steps to
actually make it smaller, because Git tries hard not to lose your
objects until you tell it to. First make sure that:

* You really removed all variants of a filename, if a blob was moved
  over its lifetime. `git log --name-only --follow --all -- filename`
  can help you find renames.

* You really filtered all refs: use `--tag-name-filter cat -- --all`
  when calling git-filter-branch.

Then there are two ways to get a smaller repository. A safer way is
to clone, which keeps your original intact.

* Clone it with `git clone file:///path/to/repo`. The clone
  will not have the removed objects. See linkgit:git-clone[1]. (Note
  that cloning with a plain path just hardlinks everything!)

If you really don't want to clone it, for whatever reasons, check the
following points instead (in this order). This is a very destructive
approach, so *make a backup* or go back to cloning it. You have been
warned.

* Remove the original refs backed up by git-filter-branch: say `git
  for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
  update-ref -d`.

* Expire all reflogs with `git reflog expire --expire=now --all`.

* Garbage collect all unreferenced objects with `git gc --prune=now`
  (or if your git-gc is not new enough to support arguments to
  `--prune`, use `git repack -ad; git prune` instead).
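
Taken together, the in-place steps could be wrapped in a small helper. The
function name `shrink_in_place` is made up for this sketch; it only chains
the commands listed above and is just as destructive, so it is defined here
without being called.

```shell
# Hypothetical helper: run only inside a repository you have backed up.
shrink_in_place()
{
	# 1. Drop the backup refs left behind by git-filter-branch.
	git for-each-ref --format="%(refname)" refs/original/ |
		xargs -n 1 git update-ref -d &&
	# 2. Expire all reflogs so they no longer keep old objects alive.
	git reflog expire --expire=now --all &&
	# 3. Collect the now-unreferenced objects.
	git gc --prune=now
}
```

Chaining with `&&` stops at the first failing step, which keeps the order
required by the checklist.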

Notes
-----

git-filter-branch allows you to make complex shell-scripted rewrites
of your Git history, but you probably don't need this flexibility if
you're simply _removing unwanted data_ like large files or passwords.
For those operations you may want to consider
http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
a JVM-based alternative to git-filter-branch, typically at least
10-50x faster for those use-cases, and with quite different
characteristics:

* Any particular version of a file is cleaned exactly _once_. The BFG,
  unlike git-filter-branch, does not give you the opportunity to
  handle a file differently based on where or when it was committed
  within your history. This constraint gives the core performance
  benefit of The BFG, and is well-suited to the task of cleansing bad
  data - you don't care _where_ the bad data is, you just want it
  _gone_.

* By default The BFG takes full advantage of multi-core machines,
  cleansing commit file-trees in parallel. git-filter-branch cleans
  commits sequentially (i.e. in a single-threaded manner), though it
  _is_ possible to write filters that include their own parallelism,
  in the scripts executed against each commit.

* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
  are much more restrictive than git-filter-branch, and dedicated just
  to the tasks of removing unwanted data, e.g.
  `--strip-blobs-bigger-than 1M`.

GIT
---
Part of the linkgit:git[1] suite