git-fast-import(1)
==================

NAME
----
git-fast-import - Backend for fast Git data importers


SYNOPSIS
--------
[verse]
frontend | 'git fast-import' [<options>]

DESCRIPTION
-----------
This program is usually not what the end user wants to run directly.
Most end users want to use one of the existing frontend programs,
which parse a specific type of foreign source and feed the contents
stored there to 'git fast-import'.

fast-import reads a mixed command/data stream from standard input and
writes one or more packfiles directly into the current repository.
When EOF is received on standard input, fast-import writes out
updated branch and tag refs, fully updating the current repository
with the newly imported data.

The fast-import backend itself can import into an empty repository (one that
has already been initialized by 'git init') or incrementally
update an existing populated repository. Whether or not incremental
imports are supported from a particular foreign source depends on
the frontend program in use.


OPTIONS
-------

--force::
	Force updating modified existing branches, even if doing
	so would cause commits to be lost (as the new commit does
	not contain the old commit).

--quiet::
	Disable the output shown by --stats, so that fast-import is
	usually silent when it succeeds. However, if the import stream
	has directives intended to show user output (e.g. `progress`
	directives), the corresponding messages will still be shown.

--stats::
	Display some basic statistics about the objects fast-import has
	created, the packfiles they were stored into, and the
	memory used by fast-import during this run. Showing this output
	is currently the default, but can be disabled with --quiet.

--allow-unsafe-features::
	Many command-line options can be provided as part of the
	fast-import stream itself by using the `feature` or `option`
	commands. However, some of these options are unsafe (e.g.,
	allowing fast-import to access the filesystem outside of the
	repository). These options are disabled by default, but can be
	allowed by providing this option on the command line. This
	currently impacts only the `export-marks`, `import-marks`, and
	`import-marks-if-exists` feature commands.
+
Only enable this option if you trust the program generating the
fast-import stream! This option is enabled automatically for
remote-helpers that use the `import` capability, as they are
already trusted to run their own code.

Options for Frontends
~~~~~~~~~~~~~~~~~~~~~

--cat-blob-fd=<fd>::
	Write responses to `get-mark`, `cat-blob`, and `ls` queries to the
	file descriptor <fd> instead of `stdout`. Allows `progress`
	output intended for the end-user to be separated from other
	output.

--date-format=<fmt>::
	Specify the type of dates the frontend will supply to
	fast-import within `author`, `committer` and `tagger` commands.
	See ``Date Formats'' below for details about which formats
	are supported, and their syntax.

--done::
	Terminate with error if there is no `done` command at the end of
	the stream. This option might be useful for detecting errors
	that cause the frontend to terminate before it has started to
	write a stream.

Locations of Marks Files
~~~~~~~~~~~~~~~~~~~~~~~~

--export-marks=<file>::
	Dumps the internal marks table to <file> when complete.
	Marks are written one per line as `:markid SHA-1`.
	Frontends can use this file to validate imports after they
	have been completed, or to save the marks table across
	incremental runs. As <file> is only opened and truncated
	at checkpoint (or completion) the same path can also be
	safely given to --import-marks.

--import-marks=<file>::
	Before processing any input, load the marks specified in
	<file>. The input file must exist, must be readable, and
	must use the same format as produced by --export-marks.
	Multiple options may be supplied to import more than one
	set of marks. If a mark is defined to different values,
	the last file wins.

--import-marks-if-exists=<file>::
	Like --import-marks but instead of erroring out, silently
	skips the file if it does not exist.

--[no-]relative-marks::
	After specifying --relative-marks the paths specified
	with --import-marks= and --export-marks= are relative
	to an internal directory in the current repository.
	In git-fast-import this means that the paths are relative
	to the .git/info/fast-import directory. However, other
	importers may use a different location.
+
Relative and non-relative marks may be combined by interweaving
--(no-)relative-marks with the --(import|export)-marks= options.

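The marks file format described above (`:markid SHA-1`, one per line) is simple
enough for a frontend to read back itself. The following is a minimal sketch,
not part of fast-import; the function names `parse_marks` and `merge_marks` are
hypothetical, and the merge step only imitates the "last file wins" rule stated
for --import-marks.

```python
def parse_marks(text):
    """Parse marks-file text into a dict mapping mark number to object ID."""
    marks = {}
    for line in text.splitlines():
        if not line:
            continue
        # Each line has the form ":<markid> <object id>".
        mark, oid = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError(f"not a mark line: {line!r}")
        marks[int(mark[1:])] = oid
    return marks


def merge_marks(*tables):
    """Combine several mark tables; a later table overrides earlier ones."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged
```

A frontend could use such a table to check, after an incremental run, which
foreign revisions have already been imported.
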
Submodule Rewriting
~~~~~~~~~~~~~~~~~~~

--rewrite-submodules-from=<name>:<file>::
--rewrite-submodules-to=<name>:<file>::
  Rewrite the object IDs for the submodule specified by <name> from the values
	used in the from <file> to those used in the to <file>. The from marks should
	have been created by `git fast-export`, and the to marks should have been
	created by `git fast-import` when importing that same submodule.
+
<name> may be any arbitrary string not containing a colon character, but the
same value must be used with both options when specifying corresponding marks.
Multiple submodules may be specified with different values for <name>. It is an
error not to use these options in corresponding pairs.
+
These options are primarily useful when converting a repository from one hash
algorithm to another; without them, fast-import will fail if it encounters a
submodule because it has no way of writing the object ID into the new hash
algorithm.

Performance and Compression Tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--active-branches=<n>::
	Maximum number of branches to maintain active at once.
	See ``Memory Utilization'' below for details. Default is 5.

--big-file-threshold=<n>::
	Maximum size of a blob that fast-import will attempt to
	create a delta for, expressed in bytes. The default is 512m
	(512 MiB). Some importers may wish to lower this on systems
	with constrained memory.

--depth=<n>::
	Maximum delta depth, for blob and tree deltification.
	Default is 50.

--export-pack-edges=<file>::
	After creating a packfile, print a line of data to
	<file> listing the filename of the packfile and the last
	commit on each branch that was written to that packfile.
	This information may be useful after importing projects
	whose total object set exceeds the 4 GiB packfile limit,
	as these commits can be used as edge points during calls
	to 'git pack-objects'.

--max-pack-size=<n>::
	Maximum size of each output packfile.
	The default is unlimited.

fastimport.unpackLimit::
	See linkgit:git-config[1]

PERFORMANCE
-----------
The design of fast-import allows it to import large projects with minimal
memory usage and processing time. Assuming the frontend
is able to keep up with fast-import and feed it a constant stream of data,
imports of projects holding 10+ years of history and containing
100,000+ individual commits generally complete in just 1-2
hours on quite modest (~$2,000 USD) hardware.

Most bottlenecks appear to be in foreign source data access (the
source just cannot extract revisions fast enough) or disk IO (fast-import
writes as fast as the disk will take the data). Imports will run
faster if the source data is stored on a different drive than the
destination Git repository (due to less IO contention).


DEVELOPMENT COST
----------------
A typical frontend for fast-import tends to weigh in at approximately 200
lines of Perl/Python/Ruby code. Most developers have been able to
create working importers in just a couple of hours, even though it
is their first exposure to fast-import, and sometimes even to Git. This is
an ideal situation, given that most conversion tools are throw-away
(use once, and never look back).


PARALLEL OPERATION
------------------
Like 'git push' or 'git fetch', imports handled by fast-import are safe to
run alongside parallel `git repack -a -d` or `git gc` invocations,
or any other Git operation (including 'git prune', as loose objects
are never used by fast-import).

fast-import does not lock the branch or tag refs it is actively importing.
After the import, during its ref update phase, fast-import tests each
existing branch ref to verify the update will be a fast-forward
update (the commit stored in the ref is contained in the new
history of the commit to be written). If the update is not a
fast-forward update, fast-import will skip updating that ref and instead
print a warning message. fast-import will always attempt to update all
branch refs, and does not stop on the first failure.

Branch updates can be forced with --force, but it's recommended that
this only be used on an otherwise quiet repository. Using --force
is not necessary for an initial import into an empty repository.


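The fast-forward test in the ref update phase above amounts to an ancestry
check: the commit currently stored in the ref must be reachable from the commit
about to be written. The sketch below only illustrates that idea on a toy
history; it is not how fast-import implements the check, and the `parents`
mapping and the name `is_fast_forward` are assumptions made for the example.

```python
def is_fast_forward(old, new, parents):
    """Return True if commit `old` is reachable from commit `new`.

    `parents` is a hypothetical dict mapping a commit ID to the list of
    its parent commit IDs.
    """
    stack = [new]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        # Walk towards the roots of the history.
        stack.extend(parents.get(commit, []))
    return False
```

With a history `a <- b <- c` and a side commit `x` whose parent is `a`, moving a
ref from `a` to `c` is a fast-forward, while moving it from `x` to `c` is not
and would need --force.
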
TECHNICAL DISCUSSION
--------------------
fast-import tracks a set of branches in memory. Any branch can be created
or modified at any point during the import process by sending a
`commit` command on the input stream. This design allows a frontend
program to process an unlimited number of branches simultaneously,
generating commits in the order they are available from the source
data. It also simplifies the frontend programs considerably.

fast-import does not use or alter the current working directory, or any
file within it. (It does however update the current Git repository,
as referenced by `GIT_DIR`.) Therefore an import frontend may use
the working directory for its own purposes, such as extracting file
revisions from the foreign source. This ignorance of the working
directory also allows fast-import to run very quickly, as it does not
need to perform any costly file update operations when switching
between branches.

INPUT FORMAT
------------
With the exception of raw file data (which Git does not interpret)
the fast-import input format is text (ASCII) based. This text based
format simplifies development and debugging of frontend programs,
especially when a higher level language such as Perl, Python or
Ruby is being used.

fast-import is very strict about its input. Where we say SP below we mean
*exactly* one space. Likewise LF means one (and only one) linefeed
and HT one (and only one) horizontal tab.
Supplying additional whitespace characters will cause unexpected
results, such as branch names or file names with leading or trailing
spaces in their name, or early termination of fast-import when it encounters
unexpected input.

Stream Comments
~~~~~~~~~~~~~~~
To aid in debugging frontends fast-import ignores any line that
begins with `#` (ASCII pound/hash) up to and including the line
ending `LF`. A comment line may contain any sequence of bytes
that does not contain an LF and therefore may be used to include
any detailed debugging information that might be specific to the
frontend and useful when inspecting a fast-import data stream.

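A frontend that wants to annotate its stream can emit comment lines with a
small helper such as the one sketched here. The name `comment_line` is
hypothetical, and the space after `#` is only a readability choice; the one
rule taken from the text above is that a comment must not contain an LF.

```python
def comment_line(text):
    """Return a stream comment line as bytes, refusing embedded LFs."""
    data = text.encode("utf-8") if isinstance(text, str) else bytes(text)
    if b"\n" in data:
        raise ValueError("a comment line cannot contain LF")
    # The line starts with '#' and is terminated by exactly one LF.
    return b"# " + data + b"\n"
```
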
Date Formats
~~~~~~~~~~~~
The following date formats are supported. A frontend should select
the format it will use for this import by passing the format name
in the --date-format=<fmt> command-line option.

`raw`::
	This is the Git native format and is `<time> SP <offutc>`.
	It is also fast-import's default format, if --date-format was
	not specified.
+
The time of the event is specified by `<time>` as the number of
seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
written as an ASCII decimal integer.
+
The local offset is specified by `<offutc>` as a positive or negative
offset from UTC. For example EST (which is 5 hours behind UTC)
would be expressed in `<offutc>` by ``-0500'' while UTC is ``+0000''.
The local offset does not affect `<time>`; it is used only as an
advisement to help formatting routines display the timestamp.
+
If the local offset is not available in the source material, use
``+0000'', or the most common local offset. For example many
organizations have a CVS repository which has only ever been accessed
by users who are located in the same location and time zone. In this
case a reasonable offset from UTC could be assumed.
+
Unlike the `rfc2822` format, this format is very strict. Any
variation in formatting will cause fast-import to reject the value.

`rfc2822`::
	This is the standard email format as described by RFC 2822.
+
An example value is ``Tue Feb 6 11:22:18 2007 -0500''. The Git
parser is accurate, but a little on the lenient side. It is the
same parser used by 'git am' when applying patches
received from email.
+
Some malformed strings may be accepted as valid dates. In some of
these cases Git will still be able to obtain the correct date from
the malformed string. There are also some types of malformed
strings which Git will parse incorrectly, and yet consider valid.
Seriously malformed strings will be rejected.
+
Unlike the `raw` format above, the time zone/UTC offset information
contained in an RFC 2822 date string is used to adjust the date
value to UTC prior to storage. Therefore it is important that
this information be as accurate as possible.
+
If the source material uses RFC 2822 style dates,
the frontend should let fast-import handle the parsing and conversion
(rather than attempting to do it itself) as the Git parser has
been well tested in the wild.
+
Frontends should prefer the `raw` format if the source material
already uses UNIX-epoch format, can be coaxed to give dates in that
format, or its format is easily convertible to it, as there is no
ambiguity in parsing.

`now`::
	Always use the current time and time zone. The literal
	`now` must always be supplied for `<when>`.
+
This is a toy format. The current time and time zone of this system
is always copied into the identity string at the time it is being
created by fast-import. There is no way to specify a different time or
time zone.
+
This particular format is supplied as it's short to implement and
may be useful to a process that wants to create a new commit
right now, without needing to use a working directory or
'git update-index'.
+
If separate `author` and `committer` commands are used in a `commit`
the timestamps may not match, as the system clock will be polled
twice (once for each command). The only way to ensure that both
author and committer identity information has the same timestamp
is to omit `author` (thus copying from `committer`) or to use a
date format other than `now`.

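Because the `raw` format is strict, a frontend usually builds it from a
structured timestamp rather than by string manipulation. The following sketch
assumes the frontend already holds a timezone-aware `datetime`; the helper name
`raw_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def raw_date(dt):
    """Format an aware datetime as `<time> SP <offutc>`."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    minutes = int(offset.total_seconds()) // 60
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    # <time> is seconds since the UNIX epoch; the offset never changes it.
    return f"{int(dt.timestamp())} {sign}{hours:02d}{mins:02d}"


est = timezone(timedelta(hours=-5))
print(raw_date(datetime(2007, 2, 6, 11, 22, 18, tzinfo=est)))
# prints "1170778938 -0500"
```

The printed value is the `raw` equivalent of the `rfc2822` example
``Tue Feb 6 11:22:18 2007 -0500'' used above.
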
Commands
~~~~~~~~
fast-import accepts several commands to update the current repository
and control the current import process. More detailed discussion
(with examples) of each command follows later.

`commit`::
	Creates a new branch or updates an existing branch by
	creating a new commit and updating the branch to point at
	the newly created commit.

`tag`::
	Creates an annotated tag object from an existing commit or
	branch. Lightweight tags are not supported by this command,
	as they are not recommended for recording meaningful points
	in time.

`reset`::
	Reset an existing branch (or a new branch) to a specific
	revision. This command must be used to change a branch to
	a specific revision without making a commit on it.

`blob`::
	Convert raw file data into a blob, for future use in a
	`commit` command. This command is optional and is not
	needed to perform an import.

`alias`::
	Record that a mark refers to a given object without first
	creating any new object. Using --import-marks and referring
	to missing marks will cause fast-import to fail, so aliases
	can provide a way to set otherwise pruned commits to a valid
	value (e.g. the nearest non-pruned ancestor).

`checkpoint`::
	Forces fast-import to close the current packfile, generate its
	unique SHA-1 checksum and index, and start a new packfile.
	This command is optional and is not needed to perform
	an import.

`progress`::
	Causes fast-import to echo the entire line to its own
	standard output. This command is optional and is not needed
	to perform an import.

`done`::
	Marks the end of the stream. This command is optional
	unless the `done` feature was requested using the
	`--done` command-line option or `feature done` command.

`get-mark`::
	Causes fast-import to print the SHA-1 corresponding to a mark
	to the file descriptor set with `--cat-blob-fd`, or `stdout` if
	unspecified.

`cat-blob`::
	Causes fast-import to print a blob in 'cat-file --batch'
	format to the file descriptor set with `--cat-blob-fd` or
	`stdout` if unspecified.

`ls`::
	Causes fast-import to print a line describing a directory
	entry in 'ls-tree' format to the file descriptor set with
	`--cat-blob-fd` or `stdout` if unspecified.

`feature`::
	Enable the specified feature. This requires that fast-import
	supports the specified feature, and aborts if it does not.

`option`::
	Specify any of the options listed under OPTIONS that do not
	change stream semantics to suit the frontend's needs. This
	command is optional and is not needed to perform an import.

`commit`
~~~~~~~~
Create or update a branch with a new commit, recording one logical
change to the project.

....
	'commit' SP <ref> LF
	mark?
	original-oid?
	('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
	'committer' (SP <name>)? SP LT <email> GT SP <when> LF
	('encoding' SP <encoding> LF)?
	data
	('from' SP <commit-ish> LF)?
	('merge' SP <commit-ish> LF)*
	(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
	LF?
....

where `<ref>` is the name of the branch to make the commit on.
Typically branch names are prefixed with `refs/heads/` in
Git, so importing the CVS branch symbol `RELENG-1_0` would use
`refs/heads/RELENG-1_0` for the value of `<ref>`. The value of
`<ref>` must be a valid refname in Git. As `LF` is not valid in
a Git refname, no quoting or escaping syntax is supported here.

A `mark` command may optionally appear, requesting fast-import to save a
reference to the newly created commit for future use by the frontend
(see below for format). It is very common for frontends to mark
every commit they create, thereby allowing future branch creation
from any imported commit.

The `data` command following `committer` must supply the commit
message (see below for `data` command syntax). To import an empty
commit message use a 0 length data. Commit messages are free-form
and are not interpreted by Git. They are normally encoded in
UTF-8; a different encoding can be declared with the optional
`encoding` command (see below).

Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit. These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).

The `LF` after the command is optional (it used to be required). Note
that for reasons of backward compatibility, if the commit ends with a
`data` command (i.e. it has no `from`, `merge`, `filemodify`,
`filedelete`, `filecopy`, `filerename`, `filedeleteall` or
`notemodify` commands) then two `LF` characters may appear at the end of
the command instead of just one.

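The grammar above can be turned into a small stream builder. The sketch below
assembles only the mandatory parts plus an optional mark. The function names
are hypothetical. It also assumes the exact byte count form of the `data`
command (`data SP <count> LF <raw> LF?`) and the `mark SP :<idnum> LF` form,
both of which are defined later in this manual, not in this section.

```python
def data_command(payload):
    """Return a `data` command in exact byte count form (payload is bytes)."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit_command(ref, committer, message, mark=None, changes=()):
    """Assemble a `commit` command as bytes.

    `committer` is the text after the `committer` keyword, for example
    "Com M Itter <cm@example.com> 1170778938 -0500".
    `changes` holds pre-built file command lines (bytes), if any.
    """
    if "\n" in ref:
        raise ValueError("LF is not valid in a refname")
    out = [b"commit " + ref.encode("utf-8") + b"\n"]
    if mark is not None:
        out.append(b"mark :%d\n" % mark)
    out.append(b"committer " + committer.encode("utf-8") + b"\n")
    out.append(data_command(message.encode("utf-8")))
    out.extend(changes)
    out.append(b"\n")  # the optional LF that ends the command
    return b"".join(out)
```

Writing the result to the standard input of 'git fast-import' would create one
commit on `<ref>`. An empty message is expressed as `data 0`.
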
`author`
^^^^^^^^
An `author` command may optionally appear, if the author information
might differ from the committer information. If `author` is omitted
then fast-import will automatically use the committer's information for
the author portion of the commit. See below for a description of
the fields in `author`, as they are identical to `committer`.

`committer`
^^^^^^^^^^^
The `committer` command indicates who made this commit, and when
they made it.

Here `<name>` is the person's display name (for example
``Com M Itter'') and `<email>` is the person's email address
(``\cm@example.com''). `LT` and `GT` are the literal less-than (\x3c)
and greater-than (\x3e) symbols. These are required to delimit
the email address from the other fields in the line. Note that
`<name>` and `<email>` are free-form and may contain any sequence
of bytes, except `LT`, `GT` and `LF`. `<name>` is typically UTF-8 encoded.

The time of the change is specified by `<when>` using the date format
that was selected by the --date-format=<fmt> command-line option.
See ``Date Formats'' above for the set of supported formats, and
their syntax.

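An identity line can be produced with a helper along the following lines. This
is a sketch with a hypothetical function name, `identity_line`. It enforces
only the rule stated above (no `LT`, `GT` or `LF` in `<name>` or `<email>`) and
leaves the formatting of `<when>` to the caller.

```python
def identity_line(command, name, email, when):
    """Build an `author` or `committer` line as bytes."""
    if command not in ("author", "committer"):
        raise ValueError("command must be 'author' or 'committer'")
    for field in (name, email):
        if any(ch in field for ch in "<>\n"):
            raise ValueError("name and email cannot contain LT, GT or LF")
    # <name> is optional; when it is absent only one SP precedes LT.
    prefix = f"{command} {name} " if name else f"{command} "
    return f"{prefix}<{email}> {when}\n".encode("utf-8")
```
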
`encoding`
^^^^^^^^^^
The optional `encoding` command indicates the encoding of the commit
message. Most commits are UTF-8 and the encoding is omitted, but this
allows importing commit messages into Git without first re-encoding them.

`from`
^^^^^^
The `from` command is used to specify the commit to initialize
this branch from. This revision will be the first ancestor of the
new commit. The state of the tree built at this commit will begin
with the state at the `from` commit, and be altered by the content
modifications in this commit.

Omitting the `from` command in the first commit of a new branch
will cause fast-import to create that commit with no ancestor. This
tends to be desired only for the initial commit of a project.
If the frontend creates all files from scratch when making a new
branch, a `merge` command may be used instead of `from` to start
the commit with an empty tree.
Omitting the `from` command on existing branches is usually desired,
as the current commit on that branch is automatically assumed to
be the first ancestor of the new commit.

As `LF` is not valid in a Git refname or SHA-1 expression, no
quoting or escaping syntax is supported within `<commit-ish>`.

Here `<commit-ish>` is any of the following:

* The name of an existing branch already in fast-import's internal branch
  table. If fast-import doesn't know the name, it's treated as a SHA-1
  expression.

* A mark reference, `:<idnum>`, where `<idnum>` is the mark number.
+
The reason fast-import uses `:` to denote a mark reference is this character
is not legal in a Git branch name. The leading `:` makes it easy
to distinguish between the mark 42 (`:42`) and the branch 42 (`42`
or `refs/heads/42`), or an abbreviated SHA-1 which happened to
consist only of base-10 digits.
+
Marks must be declared (via `mark`) before they can be used.

* A complete 40 byte or abbreviated commit SHA-1 in hex.

* Any valid Git SHA-1 expression that resolves to a commit. See
  ``SPECIFYING REVISIONS'' in linkgit:gitrevisions[7] for details.

* The special null SHA-1 (40 zeros) specifies that the branch is to be
  removed.

The special case of restarting an incremental import from the
current branch value should be written as:
----
	from refs/heads/branch^0
----
The `^0` suffix is necessary as fast-import does not permit a branch to
start from itself, and the branch is created in memory before the
`from` command is even read from the input. Adding `^0` will force
fast-import to resolve the commit through Git's revision parsing library,
rather than its internal branch table, thereby loading in the
existing value of the branch.

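The two most common `from` forms, a mark reference and the `^0` restart of an
existing branch, can be generated as sketched below. All three function names
are hypothetical. `is_mark_reference` only shows how the leading `:` separates
mark `:42` from branch `42`.

```python
import re


def from_mark(idnum):
    """`from` line that refers to a previously declared mark."""
    return b"from :%d\n" % idnum


def from_existing_branch(ref):
    """`from` line that restarts an incremental import on `ref`."""
    if "\n" in ref:
        raise ValueError("LF is not valid in a refname")
    # The ^0 suffix forces resolution through Git's revision parser.
    return b"from " + ref.encode("utf-8") + b"^0\n"


def is_mark_reference(commit_ish):
    """True for `:<idnum>`, False for branch names and SHA-1 expressions."""
    return re.fullmatch(r":[0-9]+", commit_ish) is not None
```
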
`merge`
^^^^^^^
Includes one additional ancestor commit. The additional ancestry
link does not change the way the tree state is built at this commit.
If the `from` command is
omitted when creating a new branch, the first `merge` commit will be
the first ancestor of the current commit, and the branch will start
out with no files. An unlimited number of `merge` commands per
commit are permitted by fast-import, thereby establishing an n-way merge.

Here `<commit-ish>` is any of the commit specification expressions
also accepted by `from` (see above).

`filemodify`
^^^^^^^^^^^^
Included in a `commit` command to add a new file or change the
content of an existing file. This command has two different means
of specifying the content of the file.

External data format::
	The data content for the file was already supplied by a prior
	`blob` command. The frontend just needs to connect it.
+
....
	'M' SP <mode> SP <dataref> SP <path> LF
....
+
Here usually `<dataref>` must be either a mark reference (`:<idnum>`)
set by a prior `blob` command, or a full 40-byte SHA-1 of an
existing Git blob object. If `<mode>` is `040000` then
`<dataref>` must be the full 40-byte SHA-1 of an existing
Git tree object or a mark reference set with `--import-marks`.

Inline data format::
	The data content for the file has not been supplied yet.
	The frontend wants to supply it as part of this modify
	command.
+
....
	'M' SP <mode> SP 'inline' SP <path> LF
	data
....
+
See below for a detailed description of the `data` command.

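Both `filemodify` forms are short enough to build directly. This is a sketch
only: the function names are hypothetical, the path is assumed not to need
quoting, and the inline variant assumes the exact byte count form of the `data`
command (`data SP <count> LF <raw> LF?`) described later in this manual.

```python
def filemodify_external(mode, dataref, path):
    """`M` line whose content was supplied earlier (mark or full object ID)."""
    return f"M {mode} {dataref} {path}\n".encode("utf-8")


def filemodify_inline(mode, path, content):
    """`M` line followed by a `data` command carrying `content` (bytes)."""
    header = f"M {mode} inline {path}\n".encode("utf-8")
    return header + b"data %d\n" % len(content) + content + b"\n"
```

The external form suits content that is referenced more than once, since the
blob is sent a single time and then connected by its mark.
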
In both formats `<mode>` is the type of file entry, specified
in octal. Git only supports the following modes:

* `100644` or `644`: A normal (not-executable) file. The majority
  of files in most projects use this mode. If in doubt, this is
  what you want.
* `100755` or `755`: A normal, but executable, file.
* `120000`: A symlink; the content of the file will be the link target.
* `160000`: A gitlink; the SHA-1 of the object refers to a commit in
  another repository. Git links can only be specified by SHA or through
  a commit mark. They are used to implement submodules.
* `040000`: A subdirectory. Subdirectories can only be specified by
  SHA or through a tree mark set with `--import-marks`.

In both formats `<path>` is the complete path of the file to be added
(if not already existing) or modified (if already existing).

A `<path>` string must use UNIX-style directory separators (forward
slash `/`), may contain any byte other than `LF`, and must not
start with double quote (`"`).

A path can use C-style string quoting; this is accepted in all cases
and mandatory if the filename starts with double quote or contains
`LF`. In C-style quoting, the complete name should be surrounded with
double quotes, and any `LF`, backslash, or double quote characters
must be escaped by preceding them with a backslash (e.g.,
`"path/with\n, \\ and \" in it"`).

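The quoting rules above translate into a few byte replacements. The sketch
below uses a hypothetical helper, `quote_path`. Its `force` parameter is an
assumption added for callers that must quote for another reason, such as a
`filecopy` or `filerename` source path containing SP.

```python
def quote_path(path, force=False):
    """Return `path` (bytes), C-style quoted when the rules require it."""
    needs_quotes = force or path.startswith(b'"') or b"\n" in path
    if not needs_quotes:
        return path
    # Escape backslashes first so later escapes are not doubled.
    escaped = (
        path.replace(b"\\", b"\\\\")
        .replace(b'"', b'\\"')
        .replace(b"\n", b"\\n")
    )
    return b'"' + escaped + b'"'
```

Paths that need no quoting are returned unchanged, so the helper can be applied
to every path a frontend emits.
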
The value of `<path>` must be in canonical form. That is it must not:

* contain an empty directory component (e.g. `foo//bar` is invalid),
* end with a directory separator (e.g. `foo/` is invalid),
* start with a directory separator (e.g. `/foo` is invalid),
* contain the special component `.` or `..` (e.g. `foo/./bar` and
  `foo/../bar` are invalid).

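A frontend can check the canonical form before emitting a path. The following
sketch, with a hypothetical function name, covers exactly the four rules listed
above. It treats the empty string as valid because this manual allows it as a
representation of the root of the tree.

```python
def is_canonical(path):
    """Return True if `path` (str, unquoted) is in canonical form."""
    if path == "":
        return True  # the empty string names the root of the tree
    components = path.split("/")
    # An empty component covers `foo//bar`, a leading `/` and a trailing `/`.
    return all(c not in ("", ".", "..") for c in components)
```
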
The root of the tree can be represented by an empty string as `<path>`.

It is recommended that `<path>` always be encoded using UTF-8.

`filedelete`
^^^^^^^^^^^^
Included in a `commit` command to remove a file or recursively
delete an entire directory from the branch. If the file or directory
removal makes its parent directory empty, the parent directory will
be automatically removed too. This cascades up the tree until the
first non-empty directory or the root is reached.

....
	'D' SP <path> LF
....

here `<path>` is the complete path of the file or subdirectory to
be removed from the branch.
See `filemodify` above for a detailed description of `<path>`.

`filecopy`
^^^^^^^^^^
Recursively copies an existing file or subdirectory to a different
location within the branch. The source file or directory must
exist. If the destination exists it will be completely replaced
by the content copied from the source.

....
	'C' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination. See `filemodify` above for a detailed
description of what `<path>` may look like. To use a source path
that contains SP the path must be quoted.

A `filecopy` command takes effect immediately. Once the source
location has been copied to the destination any future commands
applied to the source location will not impact the destination of
the copy.

686`filerename`
687^^^^^^^^^^^^
688Renames an existing file or subdirectory to a different location
689within the branch. The existing file or directory must exist. If
690the destination exists it will be replaced by the source directory.
691
692....
693 'R' SP <path> SP <path> LF
694....
695
696here the first `<path>` is the source location and the second
697`<path>` is the destination. See `filemodify` above for a detailed
698description of what `<path>` may look like. To use a source path
699that contains SP the path must be quoted.
700
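The quoting rule for source paths in `filecopy` and `filerename` can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that a path also needs quoting when it contains `LF` or begins with a double quote, as described under `filemodify`. Paths are handled as text here for brevity; a real frontend would work with bytes.

```python
def c_quote(path: str) -> str:
    """C-style quote a path: escape backslash, double quote, and LF."""
    body = path.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + body + '"'


def format_path(path: str, is_source: bool = False) -> str:
    """Quote a path only when its unquoted form would be misparsed."""
    must_quote = "\n" in path or path.startswith('"')
    if is_source and " " in path:
        # An unquoted source path ends at the first SP.
        must_quote = True
    return c_quote(path) if must_quote else path


def filecopy(src: str, dst: str) -> str:
    """Build a 'C' command line."""
    return f"C {format_path(src, is_source=True)} {format_path(dst)}\n"


def filerename(src: str, dst: str) -> str:
    """Build an 'R' command line."""
    return f"R {format_path(src, is_source=True)} {format_path(dst)}\n"
```

With these helpers, `filerename("old name.txt", "new name.txt")` quotes only the source path, since the destination runs to the end of the line.
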
701A `filerename` command takes effect immediately. Once the source
702location has been renamed to the destination any future commands
703applied to the source location will create new files there and not
704impact the destination of the rename.
705
Note that a `filerename` is the same as a `filecopy` followed by a
`filedelete` of the source location. There is a slight performance
advantage to using `filerename`, but the advantage is so small
that it is never worth trying to convert a delete/add pair in
source material into a rename for fast-import. This `filerename`
command is provided just to simplify frontends that already have
rename information and don't want to bother with decomposing it into a
`filecopy` followed by a `filedelete`.
714
715`filedeleteall`
716^^^^^^^^^^^^^^^
717Included in a `commit` command to remove all files (and also all
718directories) from the branch. This command resets the internal
719branch structure to have no files in it, allowing the frontend
720to subsequently add all interesting files from scratch.
721
722....
723 'deleteall' LF
724....
725
726This command is extremely useful if the frontend does not know
727(or does not care to know) what files are currently on the branch,
728and therefore cannot generate the proper `filedelete` commands to
729update the content.
730
731Issuing a `filedeleteall` followed by the needed `filemodify`
732commands to set the correct content will produce the same results
733as sending only the needed `filemodify` and `filedelete` commands.
882227f1 734The `filedeleteall` approach may however require fast-import to use slightly
735more memory per active branch (less than 1 MiB for even most large
736projects); so frontends that can easily obtain only the affected
737paths for a commit are encouraged to do so.
738
739`notemodify`
740^^^^^^^^^^^^
Included in a `commit` `<notes_ref>` command to add a new note
annotating a `<commit-ish>` or to change the contents of an existing
annotation. Internally it is similar to filemodify 100644 on the
`<commit-ish>` path (maybe split into subdirectories). It's not advised to
use any other commands to write to the `<notes_ref>` tree except
`filedeleteall` to delete all existing notes in this tree.
This command has two different means of specifying the content
of the note.
749
750External data format::
751 The data content for the note was already supplied by a prior
752 `blob` command. The frontend just needs to connect it to the
753 commit that is to be annotated.
754+
755....
a8a5406a 756 'N' SP <dataref> SP <commit-ish> LF
757....
758+
759Here `<dataref>` can be either a mark reference (`:<idnum>`)
760set by a prior `blob` command, or a full 40-byte SHA-1 of an
761existing Git blob object.
762
763Inline data format::
764 The data content for the note has not been supplied yet.
765 The frontend wants to supply it as part of this modify
766 command.
767+
768....
a8a5406a 769 'N' SP 'inline' SP <commit-ish> LF
770 data
771....
772+
773See below for a detailed description of the `data` command.
774
a8a5406a 775In both formats `<commit-ish>` is any of the commit specification
776expressions also accepted by `from` (see above).
777
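To make the inline format concrete, the following Python sketch builds the bytes of a notes commit that carries one `notemodify` command. It is only an illustration: the function names, the notes ref, and the identity strings are hypothetical, and the `commit`, `committer`, and `data` lines follow the syntax described in their own sections.

```python
def data_block(payload: bytes) -> bytes:
    """Emit a 'data' command in the exact byte count format."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def note_commit(notes_ref: str, committer: str, when: str,
                message: str, commit_ish: str, note: str) -> bytes:
    """Build a commit on <notes_ref> that annotates <commit-ish> inline."""
    out = b"commit " + notes_ref.encode() + b"\n"
    out += b"committer " + committer.encode() + b" " + when.encode() + b"\n"
    out += data_block(message.encode())          # the commit message
    out += b"N inline " + commit_ish.encode() + b"\n"
    out += data_block(note.encode())             # the note contents
    return out
```

The byte count is taken after encoding, so notes containing non-ASCII text are counted correctly.
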
778`mark`
779~~~~~~
Arranges for fast-import to save a reference to the current object, allowing
the frontend to recall this object at a future point in time, without
knowing its SHA-1. Here the current object is the object creation
command the `mark` command appears within. This can be `commit`,
`tag`, or `blob`, but `commit` is the most common usage.
785
786....
787 'mark' SP ':' <idnum> LF
788....
789
790where `<idnum>` is the number assigned by the frontend to this mark.
791The value of `<idnum>` is expressed as an ASCII decimal integer.
792The value 0 is reserved and cannot be used as
793a mark. Only values greater than or equal to 1 may be used as marks.
794
795New marks are created automatically. Existing marks can be moved
796to another object simply by reusing the same `<idnum>` in another
797`mark` command.
798
799`original-oid`
800~~~~~~~~~~~~~~
Provides the name of the object in the original source control system.
fast-import will simply ignore this directive, but filter processes
which operate on and modify the stream before feeding it to fast-import
may have uses for this information.
805
806....
807 'original-oid' SP <object-identifier> LF
808....
809
where `<object-identifier>` is any string not containing LF.
811
812`tag`
813~~~~~
814Creates an annotated tag referring to a specific commit. To create
815lightweight (non-annotated) tags see the `reset` command below.
816
817....
818 'tag' SP <name> LF
f73b2aba 819 mark?
a8a5406a 820 'from' SP <commit-ish> LF
a965bb31 821 original-oid?
74fbd118 822 'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
6e411d20 823 data
824....
825
826where `<name>` is the name of the tag to create.
827
828Tag names are automatically prefixed with `refs/tags/` when stored
829in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
882227f1 830use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
831corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
832
833The value of `<name>` must be a valid refname in Git and therefore
834may contain forward slashes. As `LF` is not valid in a Git refname,
835no quoting or escaping syntax is supported here.
836
837The `from` command is the same as in the `commit` command; see
838above for details.
839
840The `tagger` command uses the same format as `committer` within
841`commit`; again see above for details.
842
843The `data` command following `tagger` must supply the annotated tag
844message (see below for `data` command syntax). To import an empty
845tag message use a 0 length data. Tag messages are free-form and are
846not interpreted by Git. Currently they must be encoded in UTF-8,
882227f1 847as fast-import does not permit other encodings to be specified.
6e411d20 848
882227f1 849Signing annotated tags during import from within fast-import is not
850supported. Trying to include your own PGP/GPG signature is not
851recommended, as the frontend does not (easily) have access to the
852complete set of bytes which normally goes into such a signature.
882227f1 853If signing is required, create lightweight tags from within fast-import with
6e411d20 854`reset`, then create the annotated versions of those tags offline
0b444cdb 855with the standard 'git tag' process.
856
857`reset`
858~~~~~~~
859Creates (or recreates) the named branch, optionally starting from
860a specific revision. The reset command allows a frontend to issue
861a new `from` command for an existing branch, or to create a new
862branch from an existing commit without creating a new commit.
863
864....
865 'reset' SP <ref> LF
a8a5406a 866 ('from' SP <commit-ish> LF)?
1fdb649c 867 LF?
868....
869
a8a5406a 870For a detailed description of `<ref>` and `<commit-ish>` see above
871under `commit` and `from`.
872
873The `LF` after the command is optional (it used to be required).
874
875The `reset` command can also be used to create lightweight
876(non-annotated) tags. For example:
877
====
	reset refs/tags/938
	from :938
====
882
883would create the lightweight tag `refs/tags/938` referring to
884whatever commit mark `:938` references.
885
886`blob`
887~~~~~~
888Requests writing one file revision to the packfile. The revision
889is not connected to any commit; this connection must be formed in
890a subsequent `commit` command by referencing the blob through an
891assigned mark.
892
893....
894 'blob' LF
895 mark?
a965bb31 896 original-oid?
897 data
898....
899
The mark command is optional here as some frontends have chosen
to generate the Git SHA-1 for the blob on their own, and feed that
directly to `commit`. This is typically more work than it's worth,
however, as marks are inexpensive to store and easy to use.
904
905`data`
906~~~~~~
907Supplies raw data (for use as blob/file content, commit messages, or
882227f1 908annotated tag messages) to fast-import. Data can be supplied using an exact
909byte count or delimited with a terminating line. Real frontends
910intended for production-quality conversions should always use the
911exact byte count format, as it is more robust and performs better.
882227f1 912The delimited format is intended primarily for testing fast-import.
6e411d20 913
914Comment lines appearing within the `<raw>` part of `data` commands
915are always taken to be part of the body of the data and are therefore
916never ignored by fast-import. This makes it safe to import any
917file/message content whose lines might start with `#`.
918
919Exact byte count format::
920 The frontend must specify the number of bytes of data.
921+
922....
923 'data' SP <count> LF
2c570cde 924 <raw> LF?
6e411d20 925....
ef94edb5 926+
6e411d20 927where `<count>` is the exact number of bytes appearing within
928`<raw>`. The value of `<count>` is expressed as an ASCII decimal
929integer. The `LF` on either side of `<raw>` is not
6e411d20 930included in `<count>` and will not be included in the imported data.
931+
932The `LF` after `<raw>` is optional (it used to be required) but
933recommended. Always including it makes debugging a fast-import
934stream easier as the next command always starts in column 0
935of the next line, even if `<raw>` did not end with an `LF`.
6e411d20 936
937Delimited format::
938 A delimiter string is used to mark the end of the data.
882227f1 939 fast-import will compute the length by searching for the delimiter.
02783075 940 This format is primarily useful for testing and is not
941 recommended for real data.
942+
943....
944 'data' SP '<<' <delim> LF
945 <raw> LF
946 <delim> LF
2c570cde 947 LF?
6e411d20 948....
ef94edb5 949+
where `<delim>` is the chosen delimiter string. The string `<delim>`
must not appear on a line by itself within `<raw>`, as otherwise
fast-import will think the data ends earlier than it really does. The `LF`
immediately trailing `<raw>` is part of `<raw>`. This is one of
the limitations of the delimited format: it is impossible to supply
a data chunk which does not have an LF as its last byte.
956+
957The `LF` after `<delim> LF` is optional (it used to be required).
6e411d20 958
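The two formats can be compared side by side with a short Python sketch. The function names are hypothetical and the default delimiter `EOF` is just an example; the checks in `data_delimited` mirror the limitations described above rather than any validation performed by fast-import itself.

```python
def data_exact(payload: bytes) -> bytes:
    """Exact byte count format; the trailing LF is optional but recommended."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def data_delimited(payload: bytes, delim: bytes = b"EOF") -> bytes:
    """Delimited format; only usable for payloads that end in LF."""
    if not payload.endswith(b"\n"):
        # The LF before the delimiter line is always part of the data.
        raise ValueError("delimited format cannot express data without a final LF")
    if delim in payload.split(b"\n"):
        # A line equal to the delimiter would end the data too early.
        raise ValueError("delimiter appears on a line by itself in the payload")
    return b"data <<" + delim + b"\n" + payload + delim + b"\n"
```

Because `data_exact` never inspects the payload, it works for arbitrary binary content, which is one reason the exact byte count format is preferred for real imports.
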
959`alias`
960~~~~~~~
961Record that a mark refers to a given object without first creating any
962new object.
963
964....
965 'alias' LF
966 mark
967 'to' SP <commit-ish> LF
968 LF?
969....
970
971For a detailed description of `<commit-ish>` see above under `from`.
972
973
974`checkpoint`
975~~~~~~~~~~~~
882227f1 976Forces fast-import to close the current packfile, start a new one, and to
820b9310 977save out all current branch refs, tags and marks.
978
979....
980 'checkpoint' LF
1fdb649c 981 LF?
982....
983
882227f1 984Note that fast-import automatically switches packfiles when the current
1c262bb7 985packfile reaches --max-pack-size, or 4 GiB, whichever limit is
882227f1 986smaller. During an automatic packfile switch fast-import does not update
987the branch refs, tags or marks.
988
989As a `checkpoint` can require a significant amount of CPU time and
990disk IO (to compute the overall pack SHA-1 checksum, generate the
991corresponding index file, and update the refs) it can easily take
992several minutes for a single `checkpoint` command to complete.
993
994Frontends may choose to issue checkpoints during extremely large
995and long running imports, or when they need to allow another Git
996process access to a branch. However given that a 30 GiB Subversion
882227f1 997repository can be loaded into Git through fast-import in about 3 hours,
998explicit checkpointing may not be necessary.
999
1fdb649c 1000The `LF` after the command is optional (it used to be required).
820b9310 1001
1002`progress`
1003~~~~~~~~~~
1004Causes fast-import to print the entire `progress` line unmodified to
1005its standard output channel (file descriptor 1) when the command is
1006processed from the input stream. The command otherwise has no impact
1007on the current import, or on any of fast-import's internal state.
1008
1009....
1010 'progress' SP <any> LF
1011 LF?
1012....
1013
1014The `<any>` part of the command may contain any sequence of bytes
1015that does not contain `LF`. The `LF` after the command is optional.
1016Callers may wish to process the output through a tool such as sed to
1017remove the leading part of the line, for example:
1018
====
	frontend | git fast-import | sed 's/^progress //'
====
1022
1023Placing a `progress` command immediately after a `checkpoint` will
1024inform the reader when the `checkpoint` has been completed and it
1025can safely access the refs that fast-import updated.
1026
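The checkpoint-then-progress pattern might look like the following Python sketch. The function name and the token are hypothetical, and the sketch assumes the frontend holds a writable binary stream connected to fast-import's standard input and a readable one connected to its standard output, with only `progress` lines of interest arriving on the latter.

```python
import io


def checkpoint_and_wait(to_fast_import, from_fast_import, token: str) -> None:
    """Request a checkpoint and block until the matching progress line returns."""
    to_fast_import.write(b"checkpoint\nprogress " + token.encode() + b"\n")
    to_fast_import.flush()
    # fast-import prints the entire progress line unmodified.
    expected = b"progress " + token.encode() + b"\n"
    for line in iter(from_fast_import.readline, b""):
        if line == expected:
            return  # the refs written by the checkpoint can now be read
    raise EOFError("fast-import closed its output before the checkpoint finished")
```

Using a unique token per checkpoint lets the reader tell this line apart from other progress messages in the stream.
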
1027`get-mark`
1028~~~~~~~~~~
1029Causes fast-import to print the SHA-1 corresponding to a mark to
1030stdout or to the file descriptor previously arranged with the
1031`--cat-blob-fd` argument. The command otherwise has no impact on the
1032current import; its purpose is to retrieve SHA-1s that later commits
1033might want to refer to in their commit messages.
1034
1035....
1036 'get-mark' SP ':' <idnum> LF
1037....
1038
1039See ``Responses To Commands'' below for details about how to read
1040this output safely.
1041
1042`cat-blob`
1043~~~~~~~~~~
1044Causes fast-import to print a blob to a file descriptor previously
1045arranged with the `--cat-blob-fd` argument. The command otherwise
1046has no impact on the current import; its main purpose is to
1047retrieve blobs that may be in fast-import's memory but not
1048accessible from the target repository.
1049
1050....
1051 'cat-blob' SP <dataref> LF
1052....
1053
1054The `<dataref>` can be either a mark reference (`:<idnum>`)
1055set previously or a full 40-byte SHA-1 of a Git blob, preexisting or
1056ready to be written.
1057
898243b8 1058Output uses the same format as `git cat-file --batch`:
1059
1060====
1061 <sha1> SP 'blob' SP <size> LF
1062 <contents> LF
1063====
1064
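A frontend reading this response could parse it roughly as follows. This Python sketch assumes `stream` is a blocking binary file object attached to the descriptor given with `--cat-blob-fd`; the function name is hypothetical.

```python
import io


def read_cat_blob(stream):
    """Read one cat-blob response and return (sha1, contents)."""
    header = stream.readline()
    # Header layout: <sha1> SP 'blob' SP <size> LF
    sha1, kind, size = header.rstrip(b"\n").split(b" ")
    if kind != b"blob":
        raise ValueError("unexpected object type in response")
    length = int(size)
    contents = stream.read(length)
    if len(contents) != length:
        raise EOFError("response ended before all blob bytes arrived")
    if stream.read(1) != b"\n":
        raise ValueError("missing LF after blob contents")
    return sha1.decode(), contents
```

Reading exactly `<size>` bytes, instead of scanning for a line ending, keeps the parser correct for blobs that contain `LF` or arbitrary binary data.
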
1065This command can be used where a `filemodify` directive can appear,
1066allowing it to be used in the middle of a commit. For a `filemodify`
1067using an inline directive, it can also appear right before the `data`
1068directive.
777f80d7 1069
1070See ``Responses To Commands'' below for details about how to read
1071this output safely.
1072
1073`ls`
1074~~~~
1075Prints information about the object at a path to a file descriptor
1076previously arranged with the `--cat-blob-fd` argument. This allows
1077printing a blob from the active commit (with `cat-blob`) or copying a
1078blob or tree from a previous commit for use in the current one (with
1079`filemodify`).
1080
1081The `ls` command can also be used where a `filemodify` directive can
1082appear, allowing it to be used in the middle of a commit.
1083
1084Reading from the active commit::
1085 This form can only be used in the middle of a `commit`.
1086 The path names a directory entry within fast-import's
1087 active commit. The path must be quoted in this case.
1088+
1089....
1090 'ls' SP <path> LF
1091....
1092
1093Reading from a named tree::
1094 The `<dataref>` can be a mark reference (`:<idnum>`) or the
1095 full 40-byte SHA-1 of a Git tag, commit, or tree object,
1096 preexisting or waiting to be written.
1097 The path is relative to the top level of the tree
1098 named by `<dataref>`.
1099+
1100....
1101 'ls' SP <dataref> SP <path> LF
1102....
1103
1104See `filemodify` above for a detailed description of `<path>`.
1105
6cf378f0 1106Output uses the same format as `git ls-tree <tree> -- <path>`:
1107
1108====
1109 <mode> SP ('blob' | 'tree' | 'commit') SP <dataref> HT <path> LF
1110====
1111
1112The <dataref> represents the blob, tree, or commit object at <path>
1113and can be used in later 'get-mark', 'cat-blob', 'filemodify', or
1114'ls' commands.
1115
1116If there is no file or subtree at that path, 'git fast-import' will
1117instead report
1118
1119====
1120 missing SP <path> LF
1121====
1122
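Both response shapes can be handled by one small parser. The Python sketch below uses a hypothetical function name and returns the path as raw bytes; it makes no attempt to undo any quoting that may have been applied to the path.

```python
def parse_ls_line(line: bytes):
    """Parse one ls response line; return None for a 'missing' report."""
    line = line.rstrip(b"\n")
    if line.startswith(b"missing "):
        return None
    # Layout: <mode> SP <type> SP <dataref> HT <path>
    meta, path = line.split(b"\t", 1)
    mode, kind, dataref = meta.split(b" ")
    if kind not in (b"blob", b"tree", b"commit"):
        raise ValueError("unexpected object type in ls response")
    return mode.decode(), kind.decode(), dataref.decode(), path
```

Splitting on the first `HT` before splitting on `SP` keeps paths that contain spaces intact.
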
1123See ``Responses To Commands'' below for details about how to read
1124this output safely.
1125
1126`feature`
1127~~~~~~~~~
1128Require that fast-import supports the specified feature, or abort if
1129it does not.
1130
1131....
4980fffb 1132 'feature' SP <feature> ('=' <argument>)? LF
1133....
1134
4980fffb 1135The <feature> part of the command may be any one of the following:
f963bd5d 1136
1137date-format::
1138export-marks::
1139relative-marks::
1140no-relative-marks::
1141force::
1142 Act as though the corresponding command-line option with
04b125de 1143 a leading `--` was passed on the command line
4980fffb 1144 (see OPTIONS, above).
f963bd5d 1145
4980fffb 1146import-marks::
3beb4fc4 1147import-marks-if-exists::
	Like --import-marks except in three respects: first, only one
	"feature import-marks" or "feature import-marks-if-exists"
	command is allowed per stream; second, an --import-marks=
	or --import-marks-if-exists command-line option overrides
	any of these "feature" commands in the stream; third,
	"feature import-marks-if-exists", like the corresponding
	command-line option, silently skips a nonexistent file.
f963bd5d 1155
28c7b1f7 1156get-mark::
85c62395 1157cat-blob::
8dc6a373 1158ls::
1159 Require that the backend support the 'get-mark', 'cat-blob',
1160 or 'ls' command respectively.
1161 Versions of fast-import not supporting the specified command
1162 will exit with a message indicating so.
1163 This lets the import error out early with a clear message,
1164 rather than wasting time on the early part of an import
1165 before the unsupported command is detected.
081751c8 1166
1167notes::
1168 Require that the backend support the 'notemodify' (N)
1169 subcommand to the 'commit' command.
1170 Versions of fast-import not supporting notes will exit
1171 with a message indicating so.
1172
1173done::
	Error out if the stream ends without a 'done' command.
	Without this feature, errors causing the frontend to end
	abruptly at a convenient point in the stream can go
	undetected. This may occur, for example, if an import
	frontend dies in mid-operation without sending SIGTERM
	or SIGKILL to its subordinate git fast-import instance.
a8e4a594 1180
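A frontend that wants to fail early on typos could build its `feature` lines through a small helper like the Python sketch below. The names `KNOWN_FEATURES` and `feature_cmd` are hypothetical, and the set simply mirrors the features listed above; it says nothing about which features a particular fast-import version supports.

```python
KNOWN_FEATURES = {
    "date-format", "export-marks", "relative-marks", "no-relative-marks",
    "force", "import-marks", "import-marks-if-exists",
    "get-mark", "cat-blob", "ls", "notes", "done",
}


def feature_cmd(name: str, argument=None) -> str:
    """Build a 'feature' command line, with an optional '=<argument>' part."""
    if name not in KNOWN_FEATURES:
        raise ValueError(f"unknown feature: {name}")
    if argument is None:
        return f"feature {name}\n"
    return f"feature {name}={argument}\n"
```

For example, `feature_cmd("done")` yields `feature done` and `feature_cmd("date-format", "raw")` yields `feature date-format=raw`, each terminated by `LF`.
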
1181`option`
1182~~~~~~~~
1183Processes the specified option so that git fast-import behaves in a
1184way that suits the frontend's needs.
1185Note that options specified by the frontend are overridden by any
1186options the user may specify to git fast-import itself.
1187
1188....
1189 'option' SP <option> LF
1190....
1191
1192The `<option>` part of the command may contain any of the options
1193listed in the OPTIONS section that do not change import semantics,
04b125de 1194without the leading `--` and is treated in the same way.
1195
Option commands must be the first commands on the input (not counting
feature commands); giving an option command after any non-option
command is an error.
1199
The following command-line options change import semantics and may therefore
not be passed as an `option` command:
1202
1203* date-format
1204* import-marks
1205* export-marks
85c62395 1206* cat-blob-fd
1207* force
1208
1209`done`
1210~~~~~~
If the `done` feature is not in use, this command is treated as if EOF was read.
This can be used to tell fast-import to finish early.
1213
06ab60c0 1214If the `--done` command-line option or `feature done` command is
1215in use, the `done` command is mandatory and marks the end of the
1216stream.
1217
76a8788c 1218RESPONSES TO COMMANDS
1219---------------------
1220New objects written by fast-import are not available immediately.
1221Most fast-import commands have no visible effect until the next
1222checkpoint (or completion). The frontend can send commands to
1223fill fast-import's input pipe without worrying about how quickly
1224they will take effect, which improves performance by simplifying
1225scheduling.
1226
1227For some frontends, though, it is useful to be able to read back
1228data from the current repository as it is being updated (for
1229example when the source material describes objects in terms of
1230patches to be applied to previously imported objects). This can
1231be accomplished by connecting the frontend and fast-import via
1232bidirectional pipes:
1233
====
	mkfifo fast-import-output
	frontend <fast-import-output |
	git fast-import >fast-import-output
====
1239
1240A frontend set up this way can use `progress`, `get-mark`, `ls`, and
1241`cat-blob` commands to read information from the import in progress.
1242
1243To avoid deadlock, such frontends must completely consume any
28c7b1f7 1244pending output from `progress`, `ls`, `get-mark`, and `cat-blob` before
1245performing writes to fast-import that might block.
1246
76a8788c 1247CRASH REPORTS
1248-------------
If fast-import is supplied invalid input it will terminate with a
non-zero exit status and create a crash report in the top level of
the Git repository it was importing into. Crash reports contain
a snapshot of the internal fast-import state as well as the most
recent commands that led up to the crash.
1254
1255All recent commands (including stream comments, file changes and
1256progress commands) are shown in the command history within the crash
1257report, but raw file data and commit messages are excluded from the
1258crash report. This exclusion saves space within the report file
1259and reduces the amount of buffering that fast-import must perform
1260during execution.
1261
1262After writing a crash report fast-import will close the current
1263packfile and export the marks table. This allows the frontend
1264developer to inspect the repository state and resume the import from
1265the point where it crashed. The modified branches and tags are not
1266updated during a crash, as the import did not complete successfully.
1267Branch and tag information can be found in the crash report and
1268must be applied manually if the update is needed.
1269
1270An example crash:
1271
1272====
1273 $ cat >in <<END_OF_INPUT
1274 # my very first test commit
1275 commit refs/heads/master
1276 committer Shawn O. Pearce <spearce> 19283 -0400
1277 # who is that guy anyway?
1278 data <<EOF
1279 this is my commit
1280 EOF
1281 M 644 inline .gitignore
1282 data <<EOF
1283 .gitignore
1284 EOF
1285 M 777 inline bob
1286 END_OF_INPUT
1287
b1889c36 1288 $ git fast-import <in
1289 fatal: Corrupt mode: M 777 inline bob
1290 fast-import: dumping crash report to .git/fast_import_crash_8434
1291
1292 $ cat .git/fast_import_crash_8434
1293 fast-import crash report:
1294 fast-import process: 8434
1295 parent process : 1391
1296 at Sat Sep 1 00:58:12 2007
1297
1298 fatal: Corrupt mode: M 777 inline bob
1299
1300 Most Recent Commands Before Crash
1301 ---------------------------------
1302 # my very first test commit
1303 commit refs/heads/master
1304 committer Shawn O. Pearce <spearce> 19283 -0400
1305 # who is that guy anyway?
1306 data <<EOF
1307 M 644 inline .gitignore
1308 data <<EOF
1309 * M 777 inline bob
1310
1311 Active Branch LRU
1312 -----------------
1313 active_branches = 1 cur, 5 max
1314
1315 pos clock name
1316 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1317 1) 0 refs/heads/master
1318
1319 Inactive Branches
1320 -----------------
1321 refs/heads/master:
1322 status : active loaded dirty
1323 tip commit : 0000000000000000000000000000000000000000
1324 old tree : 0000000000000000000000000000000000000000
1325 cur tree : 0000000000000000000000000000000000000000
1326 commit clock: 0
1327 last pack :
1328
1329
1330 -------------------
1331 END OF CRASH REPORT
1332====
1333
76a8788c 1334TIPS AND TRICKS
1335---------------
1336The following tips and tricks have been collected from various
882227f1 1337users of fast-import, and are offered here as suggestions.
1338
1339Use One Mark Per Commit
1340~~~~~~~~~~~~~~~~~~~~~~~
1341When doing a repository conversion, use a unique mark per commit
1c262bb7 1342(`mark :<n>`) and supply the --export-marks option on the command
882227f1 1343line. fast-import will dump a file which lists every mark and the Git
1344object SHA-1 that corresponds to it. If the frontend can tie
1345the marks back to the source repository, it is easy to verify the
1346accuracy and completeness of the import by comparing each Git
1347commit to the corresponding source revision.
1348
1349Coming from a system such as Perforce or Subversion this should be
882227f1 1350quite simple, as the fast-import mark can also be the Perforce changeset
1351number or the Subversion revision number.
1352
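Verification along these lines usually starts by loading the exported marks file. The Python sketch below assumes the one-mark-per-line layout `:markid SHA-1` used by `--export-marks`; the function name is hypothetical.

```python
def parse_marks(text: str) -> dict:
    """Map each mark number to its object name from an exported marks file."""
    marks = {}
    for line in text.splitlines():
        if not line:
            continue
        mark, sha1 = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError(f"malformed marks line: {line!r}")
        marks[int(mark[1:])] = sha1
    return marks
```

If the frontend used the Subversion revision number or Perforce changeset number as the mark, the resulting dictionary maps source revisions directly to Git commits.
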
1353Freely Skip Around Branches
1354~~~~~~~~~~~~~~~~~~~~~~~~~~~
1355Don't bother trying to optimize the frontend to stick to one branch
1356at a time during an import. Although doing so might be slightly
882227f1 1357faster for fast-import, it tends to increase the complexity of the frontend
1358code considerably.
1359
The branch LRU built into fast-import tends to behave very well, and the
cost of activating an inactive branch is so low that bouncing around
between branches has virtually no impact on import performance.
1363
1364Handling Renames
1365~~~~~~~~~~~~~~~~
1366When importing a renamed file or directory, simply delete the old
1367name(s) and modify the new name(s) during the corresponding commit.
1368Git performs rename detection after-the-fact, rather than explicitly
1369during a commit.
1370
1371Use Tag Fixup Branches
1372~~~~~~~~~~~~~~~~~~~~~~
Some other SCM systems let the user create a tag from multiple
files which are not from the same commit/changeset, or create
tags which are a subset of the files available in the repository.
1376
1377Importing these tags as-is in Git is impossible without making at
1378least one commit which ``fixes up'' the files to match the content
882227f1 1379of the tag. Use fast-import's `reset` command to reset a dummy branch
1380outside of your normal branch space to the base commit for the tag,
1381then commit one or more file fixup commits, and finally tag the
1382dummy branch.
1383
For example, since all normal branches are stored under `refs/heads/`,
name the tag fixup branch `TAG_FIXUP`. This way it is impossible for
the fixup branch used by the importer to have namespace conflicts
with real branches imported from the source (the name `TAG_FIXUP`
is not `refs/heads/TAG_FIXUP`).
1389
1390When committing fixups, consider using `merge` to connect the
1391commit(s) which are supplying file revisions to the fixup branch.
0b444cdb 1392Doing so will allow tools such as 'git blame' to track
1393through the real commit history and properly annotate the source
1394files.
1395
After fast-import terminates, the frontend will need to do `rm .git/TAG_FIXUP`
to remove the dummy branch.
1398
1399Import Now, Repack Later
1400~~~~~~~~~~~~~~~~~~~~~~~~
882227f1 1401As soon as fast-import completes the Git repository is completely valid
02783075 1402and ready for use. Typically this takes only a very short time,
1403even for considerably large projects (100,000+ commits).
1404
However, repacking the repository is necessary to improve data
locality and access performance. It can also take hours on extremely
large projects (especially if -f and a large --window parameter are
used). Since repacking is safe to run alongside readers and writers,
run the repack in the background and let it finish when it finishes.
There is no reason to wait to explore your new Git project!
1411
1412If you choose to wait for the repack, don't try to run benchmarks
882227f1 1413or performance tests until repacking is completed. fast-import outputs
1414suboptimal packfiles that are simply never seen in real use
1415situations.
1416
1417Repacking Historical Data
1418~~~~~~~~~~~~~~~~~~~~~~~~~
1419If you are repacking very old imported data (e.g. older than the
1420last year), consider expending some extra CPU time and supplying
1c262bb7 1421--window=50 (or higher) when you run 'git repack'.
1422This will take longer, but will also produce a smaller packfile.
1423You only need to expend the effort once, and everyone using your
1424project will benefit from the smaller repository.
1425
1426Include Some Progress Messages
1427~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1428Every once in a while have your frontend emit a `progress` message
1429to fast-import. The contents of the messages are entirely free-form,
1430so one suggestion would be to output the current month and year
1431each time the current commit date moves into the next month.
1432Your users will feel better knowing how much of the data stream
1433has been processed.
1434
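The month-by-month suggestion could be implemented along these lines. This Python sketch uses a hypothetical generator name, assumes commit times are given as seconds since the epoch, interprets them in UTC, and also emits a message for the very first commit.

```python
from datetime import datetime, timezone


def monthly_progress(commit_times):
    """Yield a progress command each time the commit date enters a new month."""
    last = None
    for seconds in commit_times:
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        month = (when.year, when.month)
        if month != last:
            last = month
            yield f"progress {when.year}-{when.month:02d}\n"
```

A frontend would write each yielded line to the stream just before the first commit of that month.
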
bdd9f424 1435
76a8788c 1436PACKFILE OPTIMIZATION
6e411d20 1437---------------------
882227f1 1438When packing a blob fast-import always attempts to deltify against the last
1439blob written. Unless specifically arranged for by the frontend,
1440this will probably not be a prior version of the same file, so the
1441generated delta will not be the smallest possible. The resulting
1442packfile will be compressed, but will not be optimal.
1443
1444Frontends which have efficient access to all revisions of a
1445single file (for example reading an RCS/CVS ,v file) can choose
1446to supply all revisions of that file as a sequence of consecutive
882227f1 1447`blob` commands. This allows fast-import to deltify the different file
1448revisions against each other, saving space in the final packfile.
1449Marks can be used to later identify individual file revisions during
1450a sequence of `commit` commands.
1451
1452The packfile(s) created by fast-import do not encourage good disk access
1453patterns. This is caused by fast-import writing the data in the order
1454it is received on standard input, while Git typically organizes
1455data within packfiles to make the most recent (current tip) data
1456appear before historical data. Git also clusters commits together,
1457speeding up revision traversal through better cache locality.
1458
1459For this reason it is strongly recommended that users repack the
882227f1 1460repository with `git repack -a -d` after fast-import completes, allowing
6e411d20
SP
1461Git to reorganize the packfiles for faster data access. If blob
1462deltas are suboptimal (see above) then also adding the `-f` option
1463to force recomputation of all deltas can significantly reduce the
1464final packfile size (30-50% smaller can be quite typical).
1465
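A frontend wrapper could run the repack step itself once the import has
finished. The sketch below assumes a Python driver script; the function
names `repack_argv` and `repack` are made up for this example, and only
the `-a`, `-d` and `-f` options mentioned above are used.

```python
import subprocess


def repack_argv(recompute_deltas=False):
    """Return the `git repack` command line recommended after an import."""
    argv = ["git", "repack", "-a", "-d"]
    if recompute_deltas:
        # -f forces recomputation of all deltas (slower, smaller pack).
        argv.append("-f")
    return argv


def repack(repo_dir, recompute_deltas=False):
    """Run the repack inside `repo_dir`; raises if git exits non-zero."""
    subprocess.run(repack_argv(recompute_deltas), cwd=repo_dir, check=True)
```

Passing `recompute_deltas=True` is only worth the extra time when the
frontend could not arrange its blobs for good deltas.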

Instead of running `git repack` you can also run `git gc
--aggressive`, which will also optimize other things after an import
(e.g. pack loose refs). As noted in the "AGGRESSIVE" section in
linkgit:git-gc[1], the `--aggressive` option will find new deltas with
the `-f` option to linkgit:git-repack[1]. For the reasons elaborated
on above, using `--aggressive` after a fast-import is one of the few
cases where it's known to be worthwhile.

MEMORY UTILIZATION
------------------
There are a number of factors which affect how much memory fast-import
requires to perform an import. Like critical sections of core
Git, fast-import uses its own memory allocators to amortize any overheads
associated with malloc. In practice fast-import tends to amortize any
malloc overheads to 0, due to its use of large block allocations.

per object
~~~~~~~~~~
fast-import maintains an in-memory structure for every object written in
this execution. On a 32 bit system the structure is 32 bytes,
on a 64 bit system the structure is 40 bytes (due to the larger
pointer sizes). Objects in the table are not deallocated until
fast-import terminates. Importing 2 million objects on a 32 bit system
will require approximately 61 MiB (64,000,000 bytes) of memory.

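The per-object arithmetic can be checked with a few lines of Python.
This is only a back-of-the-envelope sketch; the names
`BYTES_PER_OBJECT`, `object_table_bytes` and `to_mib` are invented
here, and the sizes are simply the 32 and 40 byte figures quoted above.

```python
# Structure size per object, keyed by pointer width in bits
# (figures taken from the text above, not measured).
BYTES_PER_OBJECT = {32: 32, 64: 40}


def object_table_bytes(num_objects, pointer_bits=32):
    """Estimate the memory used by the per-object table, in bytes."""
    return num_objects * BYTES_PER_OBJECT[pointer_bits]


def to_mib(num_bytes):
    """Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)
```

For 2 million objects on a 32 bit system this works out to 64,000,000
bytes, or about 61 MiB; the same import on a 64 bit system needs
80,000,000 bytes.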

The object table is actually a hashtable keyed on the object name
(the unique SHA-1). This storage configuration allows fast-import to reuse
an existing or already written object and avoid writing duplicates
to the output packfile. Duplicate blobs are surprisingly common
in an import, typically due to branch merges in the source.

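The idea of keying the table on the object name can be sketched in
Python as below. This is an illustration of the technique, not
fast-import's actual data structure: the class `ObjectTable` and its
`packed` list (standing in for the output packfile) are assumptions of
this example. The blob name is computed the way Git does for SHA-1
repositories, by hashing a `blob <size>` header, a NUL byte and the
content.

```python
import hashlib


def blob_object_name(content):
    """Return the SHA-1 object name Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectTable:
    """Toy object table: a dict keyed on object name, used to skip duplicates."""

    def __init__(self):
        self._seen = {}   # object name -> index into self.packed
        self.packed = []  # stand-in for objects written to the packfile

    def store_blob(self, content):
        name = blob_object_name(content)
        if name not in self._seen:
            # First time this content is seen: "write" it exactly once.
            self._seen[name] = len(self.packed)
            self.packed.append(content)
        return name
```

Storing the same content twice returns the same name and leaves
`packed` with a single entry, which is the duplicate suppression
described above.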

per mark
~~~~~~~~
Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark. Although the array
is sparse, frontends are still strongly encouraged to use marks
between 1 and n, where n is the total number of marks required for
this import.

per branch
~~~~~~~~~~
Branches are classified as active and inactive. The memory usage
of the two classes is significantly different.

Inactive branches are stored in a structure which uses 96 or 120
bytes (32 bit or 64 bit systems, respectively), plus the length of
the branch name (typically under 200 bytes), per branch. fast-import will
easily handle as many as 10,000 inactive branches in under 2 MiB
of memory.

Active branches have the same overhead as inactive branches, but
also contain copies of every tree that has been recently modified on
that branch. If subtree `include` has not been modified since the
branch became active, its contents will not be loaded into memory,
but if subtree `src` has been modified by a commit since the branch
became active, then its contents will be loaded in memory.

As active branches store metadata about the files contained on that
branch, their in-memory storage size can grow to a considerable size
(see below).

fast-import automatically moves active branches to inactive status based on
a simple least-recently-used algorithm. The LRU chain is updated on
each `commit` command. The maximum number of active branches can be
increased or decreased on the command line with --active-branches=.

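A least-recently-used policy of this kind can be sketched in Python as
follows. This is a simplified model for illustration only, not
fast-import's implementation: the class `BranchCache`, its `max_active`
parameter (playing the role of the `--active-branches=` limit) and the
empty dict used for loaded tree data are all assumptions of this
example.

```python
from collections import OrderedDict


class BranchCache:
    """Toy model of active/inactive branches with LRU eviction."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = OrderedDict()  # branch name -> loaded tree data
        self.inactive = set()        # branch names with no trees loaded

    def commit(self, branch):
        """Record a `commit` on `branch`, making it the most recently used."""
        self.inactive.discard(branch)
        trees = self.active.pop(branch, {})
        self.active[branch] = trees  # re-insert at the most-recent end
        while len(self.active) > self.max_active:
            # Evict the least recently used branch and drop its trees.
            evicted, _trees = self.active.popitem(last=False)
            self.inactive.add(evicted)
```

With `max_active=2`, committing to branches `a`, `b`, `a`, `c` in that
order leaves `a` and `c` active and moves `b` to the inactive set.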

per active tree
~~~~~~~~~~~~~~~
Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see ``per active file entry'' below).
The cost of a tree is virtually 0, as its overhead amortizes out
over the individual file entries.

per active file entry
~~~~~~~~~~~~~~~~~~~~~
Files (and pointers to subtrees) within active trees require 52 or 64
bytes (32/64 bit platforms) per entry. To conserve space, file and
tree names are pooled in a common string table, allowing the filename
``Makefile'' to use just 16 bytes (after including the string header
overhead) no matter how many times it occurs within the project.

The active branch LRU, when coupled with the filename string pool
and lazy loading of subtrees, allows fast-import to efficiently import
projects with 2,000+ branches and 45,114+ files in a very limited
memory footprint (less than 2.7 MiB per active branch).

SIGNALS
-------
Sending *SIGUSR1* to the 'git fast-import' process ends the current
packfile early, simulating a `checkpoint` command. The impatient
operator can use this facility to peek at the objects and refs from an
import in progress, at the cost of some added running time and worse
compression.

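From a Python driver script on a POSIX system, the signal can be sent
with the standard `os.kill` call. The helper name `request_checkpoint`
is hypothetical, and the caller is assumed to already know the process
ID of the running 'git fast-import' (for example from the
`subprocess.Popen` object that started it).

```python
import os
import signal


def request_checkpoint(pid):
    """Send SIGUSR1 to the fast-import process with the given PID.

    fast-import reacts by ending its current packfile early, as if the
    stream had contained a `checkpoint` command.
    """
    os.kill(pid, signal.SIGUSR1)
```

Each call costs some running time and compression, so it is best used
sparingly.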

SEE ALSO
--------
linkgit:git-fast-export[1]

GIT
---
Part of the linkgit:git[1] suite