git-fast-import(1)
==================

NAME
----
git-fast-import - Backend for fast Git data importers


SYNOPSIS
--------
[verse]
frontend | 'git fast-import' [options]

DESCRIPTION
-----------
This program is usually not what the end user wants to run directly.
Most end users want to use one of the existing frontend programs,
which parse a specific type of foreign source and feed the contents
stored there to 'git fast-import'.

fast-import reads a mixed command/data stream from standard input and
writes one or more packfiles directly into the current repository.
When EOF is received on standard input, fast-import writes out
updated branch and tag refs, fully updating the current repository
with the newly imported data.

The fast-import backend itself can import into an empty repository (one that
has already been initialized by 'git init') or incrementally
update an existing populated repository. Whether or not incremental
imports are supported from a particular foreign source depends on
the frontend program in use.


OPTIONS
-------
--date-format=<fmt>::
	Specify the type of dates the frontend will supply to
	fast-import within `author`, `committer` and `tagger` commands.
	See ``Date Formats'' below for details about which formats
	are supported, and their syntax.

--done::
	Terminate with error if there is no 'done' command at the
	end of the stream.

--force::
	Force updating modified existing branches, even if doing
	so would cause commits to be lost (as the new commit does
	not contain the old commit).

--max-pack-size=<n>::
	Maximum size of each output packfile.
	The default is unlimited.

--big-file-threshold=<n>::
	Maximum size of a blob that fast-import will attempt to
	create a delta for, expressed in bytes. The default is 512m
	(512 MiB). Some importers may wish to lower this on systems
	with constrained memory.

--depth=<n>::
	Maximum delta depth, for blob and tree deltification.
	Default is 10.

--active-branches=<n>::
	Maximum number of branches to maintain active at once.
	See ``Memory Utilization'' below for details. Default is 5.

--export-marks=<file>::
	Dumps the internal marks table to <file> when complete.
	Marks are written one per line as `:markid SHA-1`.
	Frontends can use this file to validate imports after they
	have been completed, or to save the marks table across
	incremental runs. As <file> is only opened and truncated
	at checkpoint (or completion), the same path can also be
	safely given to \--import-marks.

--import-marks=<file>::
	Before processing any input, load the marks specified in
	<file>. The input file must exist, must be readable, and
	must use the same format as produced by \--export-marks.
	Multiple options may be supplied to import more than one
	set of marks. If the same mark is defined with different
	values in several files, the last file wins.

--import-marks-if-exists=<file>::
	Like --import-marks but instead of erroring out, silently
	skips the file if it does not exist.

--relative-marks::
	After specifying --relative-marks, the paths specified
	with --import-marks= and --export-marks= are relative
	to an internal directory in the current repository.
	In git-fast-import this means that the paths are relative
	to the .git/info/fast-import directory. However, other
	importers may use a different location.

--no-relative-marks::
	Negates a previous --relative-marks. Allows for combining
	relative and non-relative marks by interweaving
	--(no-)relative-marks with the --(import|export)-marks=
	options.

--cat-blob-fd=<fd>::
	Write responses to `cat-blob` and `ls` queries to the
	file descriptor <fd> instead of `stdout`. Allows `progress`
	output intended for the end-user to be separated from other
	output.

--done::
	Require a `done` command at the end of the stream.
	This option might be useful for detecting errors that
	cause the frontend to terminate before it has started to
	write a stream.

--export-pack-edges=<file>::
	After creating a packfile, print a line of data to
	<file> listing the filename of the packfile and the last
	commit on each branch that was written to that packfile.
	This information may be useful after importing projects
	whose total object set exceeds the 4 GiB packfile limit,
	as these commits can be used as edge points during calls
	to 'git pack-objects'.

--quiet::
	Disable all non-fatal output, making fast-import silent when it
	is successful. This option disables the output shown by
	\--stats.

--stats::
	Display some basic statistics about the objects fast-import has
	created, the packfiles they were stored into, and the
	memory used by fast-import during this run. Showing this output
	is currently the default, but can be disabled with \--quiet.

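The `--export-marks` format described above (one `:markid SHA-1` pair per line) is simple enough for a frontend to read back, for example to validate an import or to carry marks across incremental runs. The following is a minimal Python sketch; `parse_marks` is a hypothetical helper, not part of Git, and it assumes the file content has already been read into a string. If the same mark appears more than once, it keeps the last value, in the same spirit as the ``last file wins'' rule for `--import-marks`.

```python
from typing import Dict


# Hypothetical frontend helper; not part of Git.
def parse_marks(text: str) -> Dict[int, str]:
    """Parse --export-marks output: one ':<idnum> <object name>' pair per line."""
    marks: Dict[int, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        mark, sep, name = line.partition(" ")
        if not sep or not mark.startswith(":") or not mark[1:].isdigit():
            raise ValueError(f"line {lineno}: malformed mark entry {line!r}")
        marks[int(mark[1:])] = name.strip()  # a later entry for the same mark wins
    return marks
```

A frontend could, for instance, compare the returned dictionary against its own record of which source revision each mark was assigned to.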

Performance
-----------
The design of fast-import allows it to import large projects with a minimum
of memory usage and processing time. Assuming the frontend
is able to keep up with fast-import and feed it a constant stream of data,
import times for projects holding 10+ years of history and containing
100,000+ individual commits are generally completed in just 1-2
hours on quite modest (~$2,000 USD) hardware.

Most bottlenecks appear to be in foreign source data access (the
source just cannot extract revisions fast enough) or disk IO (fast-import
writes as fast as the disk will take the data). Imports will run
faster if the source data is stored on a different drive than the
destination Git repository (due to less IO contention).


Development Cost
----------------
A typical frontend for fast-import tends to weigh in at approximately 200
lines of Perl/Python/Ruby code. Most developers have been able to
create working importers in just a couple of hours, even though it
is their first exposure to fast-import, and sometimes even to Git. This is
an ideal situation, given that most conversion tools are throw-away
(use once, and never look back).


Parallel Operation
------------------
Like 'git push' or 'git fetch', imports handled by fast-import are safe to
run alongside parallel `git repack -a -d` or `git gc` invocations,
or any other Git operation (including 'git prune', as loose objects
are never used by fast-import).

fast-import does not lock the branch or tag refs it is actively importing.
After the import, during its ref update phase, fast-import tests each
existing branch ref to verify the update will be a fast-forward
update (the commit stored in the ref is contained in the new
history of the commit to be written). If the update is not a
fast-forward update, fast-import will skip updating that ref and instead
print a warning message. fast-import will always attempt to update all
branch refs, and does not stop on the first failure.

Branch updates can be forced with \--force, but it's recommended that
this only be used on an otherwise quiet repository. Using \--force
is not necessary for an initial import into an empty repository.


Technical Discussion
--------------------
fast-import tracks a set of branches in memory. Any branch can be created
or modified at any point during the import process by sending a
`commit` command on the input stream. This design allows a frontend
program to process an unlimited number of branches simultaneously,
generating commits in the order they are available from the source
data. It also simplifies the frontend programs considerably.

fast-import does not use or alter the current working directory, or any
file within it. (It does however update the current Git repository,
as referenced by `GIT_DIR`.) Therefore an import frontend may use
the working directory for its own purposes, such as extracting file
revisions from the foreign source. This ignorance of the working
directory also allows fast-import to run very quickly, as it does not
need to perform any costly file update operations when switching
between branches.

Input Format
------------
With the exception of raw file data (which Git does not interpret)
the fast-import input format is text (ASCII) based. This text based
format simplifies development and debugging of frontend programs,
especially when a higher level language such as Perl, Python or
Ruby is being used.

fast-import is very strict about its input. Where we say SP below we mean
*exactly* one space. Likewise LF means one (and only one) linefeed
and HT one (and only one) horizontal tab.
Supplying additional whitespace characters will cause unexpected
results, such as branch names or file names with leading or trailing
spaces in their name, or early termination of fast-import when it encounters
unexpected input.

Stream Comments
~~~~~~~~~~~~~~~
To aid in debugging frontends, fast-import ignores any line that
begins with `#` (ASCII pound/hash) up to and including the line
ending `LF`. A comment line may contain any sequence of bytes
that does not contain an LF and therefore may be used to include
any detailed debugging information that might be specific to the
frontend and useful when inspecting a fast-import data stream.

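Because fast-import skips any line starting with `#`, a frontend can interleave its own diagnostics with the commands it emits. Below is a minimal Python sketch; the helper name `comment_line` is hypothetical. Since a comment ends at the first `LF`, it folds embedded newlines into spaces rather than letting the remainder of the text be read as commands.

```python
# Hypothetical frontend helper; not part of Git.
def comment_line(text: str) -> str:
    """Format debugging text as one fast-import stream comment line."""
    # A comment runs up to and including the first LF, so an embedded newline
    # would end the comment early; fold newlines into spaces instead.
    return "#" + text.replace("\n", " ") + "\n"
```

Writing such a line before each `commit`, for example one naming the foreign revision being converted, makes a saved stream easier to correlate with the source history.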

Date Formats
~~~~~~~~~~~~
The following date formats are supported. A frontend should select
the format it will use for this import by passing the format name
in the \--date-format=<fmt> command line option.

`raw`::
	This is the Git native format and is `<time> SP <offutc>`.
	It is also fast-import's default format, if \--date-format was
	not specified.
+
The time of the event is specified by `<time>` as the number of
seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
written as an ASCII decimal integer.
+
The local offset is specified by `<offutc>` as a positive or negative
offset from UTC. For example EST (which is 5 hours behind UTC)
would be expressed in `<offutc>` by ``-0500'' while UTC is ``+0000''.
The local offset does not affect `<time>`; it is used only as an
advisement to help formatting routines display the timestamp.
+
If the local offset is not available in the source material, use
``+0000'', or the most common local offset. For example many
organizations have a CVS repository which has only ever been accessed
by users who are located in the same location and timezone. In this
case a reasonable offset from UTC could be assumed.
+
Unlike the `rfc2822` format, this format is very strict. Any
variation in formatting will cause fast-import to reject the value.

`rfc2822`::
	This is the standard email format as described by RFC 2822.
+
An example value is ``Tue Feb 6 11:22:18 2007 -0500''. The Git
parser is accurate, but a little on the lenient side. It is the
same parser used by 'git am' when applying patches
received from email.
+
Some malformed strings may be accepted as valid dates. In some of
these cases Git will still be able to obtain the correct date from
the malformed string. There are also some types of malformed
strings which Git will parse wrong, and yet consider valid.
Seriously malformed strings will be rejected.
+
Unlike the `raw` format above, the timezone/UTC offset information
contained in an RFC 2822 date string is used to adjust the date
value to UTC prior to storage. Therefore it is important that
this information be as accurate as possible.
+
If the source material uses RFC 2822 style dates,
the frontend should let fast-import handle the parsing and conversion
(rather than attempting to do it itself) as the Git parser has
been well tested in the wild.
+
Frontends should prefer the `raw` format if the source material
already uses UNIX-epoch format, can be coaxed to give dates in that
format, or its format is easily convertible to it, as there is no
ambiguity in parsing.

`now`::
	Always use the current time and timezone. The literal
	`now` must always be supplied for `<when>`.
+
This is a toy format. The current time and timezone of this system
is always copied into the identity string at the time it is being
created by fast-import. There is no way to specify a different time or
timezone.
+
This particular format is supplied as it's short to implement and
may be useful to a process that wants to create a new commit
right now, without needing to use a working directory or
'git update-index'.
+
If separate `author` and `committer` commands are used in a `commit`
the timestamps may not match, as the system clock will be polled
twice (once for each command). The only way to ensure that both
author and committer identity information has the same timestamp
is to omit `author` (thus copying from `committer`) or to use a
date format other than `now`.

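Producing the `raw` format from a timezone-aware timestamp takes only a few lines. The Python sketch below uses a hypothetical helper, `raw_date`, that converts a `datetime` into `<time> SP <offutc>`: `<time>` comes from the absolute instant, while the offset is only formatted as `+HHMM` or `-HHMM`, matching the rule above that the local offset does not affect `<time>`. It assumes the frontend has already attached the correct timezone to each source date.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical frontend helper; not part of Git.
def raw_date(when: datetime) -> str:
    """Format an aware datetime as fast-import's `raw` date: '<time> SP <offutc>'."""
    offset = when.utcoffset()
    if offset is None:
        raise ValueError("raw dates need a timezone-aware datetime")
    seconds = int(when.timestamp())              # seconds since the UNIX epoch (UTC)
    total_minutes = int(offset.total_seconds()) // 60
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{seconds} {sign}{hours:02d}{minutes:02d}"
```

For example, the RFC 2822 sample date above, taken as 11:22:18 in a fixed UTC-5 zone, becomes `1170778938 -0500`.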

Commands
~~~~~~~~
fast-import accepts several commands to update the current repository
and control the current import process. More detailed discussion
(with examples) of each command follows later.

`commit`::
	Creates a new branch or updates an existing branch by
	creating a new commit and updating the branch to point at
	the newly created commit.

`tag`::
	Creates an annotated tag object from an existing commit or
	branch. Lightweight tags are not supported by this command,
	as they are not recommended for recording meaningful points
	in time.

`reset`::
	Reset an existing branch (or a new branch) to a specific
	revision. This command must be used to change a branch to
	a specific revision without making a commit on it.

`blob`::
	Convert raw file data into a blob, for future use in a
	`commit` command. This command is optional and is not
	needed to perform an import.

`checkpoint`::
	Forces fast-import to close the current packfile, generate its
	unique SHA-1 checksum and index, and start a new packfile.
	This command is optional and is not needed to perform
	an import.

`progress`::
	Causes fast-import to echo the entire line to its own
	standard output. This command is optional and is not needed
	to perform an import.

`done`::
	Marks the end of the stream. This command is optional
	unless the `done` feature was requested using the
	`--done` command line option or `feature done` command.

`cat-blob`::
	Causes fast-import to print a blob in 'cat-file --batch'
	format to the file descriptor set with `--cat-blob-fd` or
	`stdout` if unspecified.

`ls`::
	Causes fast-import to print a line describing a directory
	entry in 'ls-tree' format to the file descriptor set with
	`--cat-blob-fd` or `stdout` if unspecified.

`feature`::
	Require that fast-import supports the specified feature, or
	abort if it does not.

`option`::
	Specify any of the options listed under OPTIONS that do not
	change stream semantics to suit the frontend's needs. This
	command is optional and is not needed to perform an import.

`commit`
~~~~~~~~
Create or update a branch with a new commit, recording one logical
change to the project.

....
	'commit' SP <ref> LF
	mark?
	('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
	'committer' (SP <name>)? SP LT <email> GT SP <when> LF
	data
	('from' SP <committish> LF)?
	('merge' SP <committish> LF)*
	(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
	LF?
....

where `<ref>` is the name of the branch to make the commit on.
Typically branch names are prefixed with `refs/heads/` in
Git, so importing the CVS branch symbol `RELENG-1_0` would use
`refs/heads/RELENG-1_0` for the value of `<ref>`. The value of
`<ref>` must be a valid refname in Git. As `LF` is not valid in
a Git refname, no quoting or escaping syntax is supported here.

A `mark` command may optionally appear, requesting fast-import to save a
reference to the newly created commit for future use by the frontend
(see below for format). It is very common for frontends to mark
every commit they create, thereby allowing future branch creation
from any imported commit.

The `data` command following `committer` must supply the commit
message (see below for `data` command syntax). To import an empty
commit message, use a zero-length `data`. Commit messages are free-form
and are not interpreted by Git. Currently they must be encoded in
UTF-8, as fast-import does not permit other encodings to be specified.

Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit. These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).

The `LF` after the command is optional (it used to be required).

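To make the grammar concrete, the Python sketch below assembles one `commit` command as bytes. It is illustrative only: `data_cmd` and `commit_cmd` are hypothetical helpers; the committer value is assumed to be already formatted as `<name> LT <email> GT SP <when>`; every file is written with the inline form of `filemodify` using mode `100644`; and paths are assumed to need no quoting. It relies on the exact-byte-count form of the `data` command (`data SP <count> LF` followed by the raw bytes), which this manual describes further below.

```python
from typing import Dict, Optional


# Hypothetical frontend helpers; not part of Git.
def data_cmd(payload: bytes) -> bytes:
    """Exact-byte-count `data` command: 'data' SP <count> LF <raw>, plus an optional LF."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit_cmd(ref: str, mark: Optional[int], committer: str, message: str,
               files: Dict[str, bytes], parent: Optional[str] = None) -> bytes:
    """Build one `commit` command; `author` is omitted, so it defaults to the committer."""
    out = [b"commit %s\n" % ref.encode("utf-8")]
    if mark is not None:
        out.append(b"mark :%d\n" % mark)
    out.append(b"committer %s\n" % committer.encode("utf-8"))
    out.append(data_cmd(message.encode("utf-8")))  # commit messages must be UTF-8
    if parent is not None:
        out.append(b"from %s\n" % parent.encode("utf-8"))
    for path, content in files.items():
        out.append(b"M 100644 inline %s\n" % path.encode("utf-8"))
        out.append(data_cmd(content))
    out.append(b"\n")  # the LF ending the commit is optional
    return b"".join(out)
```

A frontend would write the result to the standard input of `git fast-import`, e.g. via `sys.stdout.buffer.write(...)` in a process piped into it.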

`author`
^^^^^^^^
An `author` command may optionally appear, if the author information
might differ from the committer information. If `author` is omitted
then fast-import will automatically use the committer's information for
the author portion of the commit. See below for a description of
the fields in `author`, as they are identical to `committer`.

`committer`
^^^^^^^^^^^
The `committer` command indicates who made this commit, and when
they made it.

Here `<name>` is the person's display name (for example
``Com M Itter'') and `<email>` is the person's email address
(``cm@example.com''). `LT` and `GT` are the literal less-than (\x3c)
and greater-than (\x3e) symbols. These are required to delimit
the email address from the other fields in the line. Note that
`<name>` and `<email>` are free-form and may contain any sequence
of bytes, except `LT`, `GT` and `LF`. `<name>` is typically UTF-8 encoded.

The time of the change is specified by `<when>` using the date format
that was selected by the \--date-format=<fmt> command line option.
See ``Date Formats'' above for the set of supported formats, and
their syntax.

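Because `<name>` and `<email>` may contain anything except `LT`, `GT` and `LF`, a frontend should check those fields before writing them. The Python sketch below uses a hypothetical helper, `ident_line`, to build an `author`, `committer` or `tagger` line; it drops the optional `SP <name>` part when no name is known, and it expects `<when>` to be preformatted in whichever \--date-format the frontend selected (it does not validate that value).

```python
from typing import Optional

FORBIDDEN = ("<", ">", "\n")  # LT, GT and LF may not appear in <name> or <email>


# Hypothetical frontend helper; not part of Git.
def ident_line(kind: str, name: Optional[str], email: str, when: str) -> str:
    """Build an `author`, `committer` or `tagger` line; <when> is passed through as-is."""
    if kind not in ("author", "committer", "tagger"):
        raise ValueError(f"unknown identity command: {kind!r}")
    for field in (name or "", email):
        if any(ch in field for ch in FORBIDDEN):
            raise ValueError(f"LT, GT and LF are not allowed: {field!r}")
    # (SP <name>)? is optional in the grammar, so omit it when there is no name.
    prefix = f"{kind} {name}" if name else kind
    return f"{prefix} <{email}> {when}\n"
```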

`from`
^^^^^^
The `from` command is used to specify the commit to initialize
this branch from. This revision will be the first ancestor of the
new commit. The state of the tree built at this commit will begin
with the state at the `from` commit, and be altered by the content
modifications in this commit.

Omitting the `from` command in the first commit of a new branch
will cause fast-import to create that commit with no ancestor. This
tends to be desired only for the initial commit of a project.
If the frontend creates all files from scratch when making a new
branch, a `merge` command may be used instead of `from` to start
the commit with an empty tree.
Omitting the `from` command on existing branches is usually desired,
as the current commit on that branch is automatically assumed to
be the first ancestor of the new commit.

As `LF` is not valid in a Git refname or SHA-1 expression, no
quoting or escaping syntax is supported within `<committish>`.

Here `<committish>` is any of the following:

* The name of an existing branch already in fast-import's internal branch
  table. If fast-import doesn't know the name, it's treated as a SHA-1
  expression.

* A mark reference, `:<idnum>`, where `<idnum>` is the mark number.
+
The reason fast-import uses `:` to denote a mark reference is that this
character is not legal in a Git branch name. The leading `:` makes it easy
to distinguish between the mark 42 (`:42`) and the branch 42 (`42`
or `refs/heads/42`), or an abbreviated SHA-1 which happened to
consist only of base-10 digits.
+
Marks must be declared (via `mark`) before they can be used.

* A complete 40 byte or abbreviated commit SHA-1 in hex.

* Any valid Git SHA-1 expression that resolves to a commit. See
  ``SPECIFYING REVISIONS'' in linkgit:gitrevisions[7] for details.

The special case of restarting an incremental import from the
current branch value should be written as:
----
	from refs/heads/branch^0
----
The `^0` suffix is necessary as fast-import does not permit a branch to
start from itself, and the branch is created in memory before the
`from` command is even read from the input. Adding `^0` will force
fast-import to resolve the commit through Git's revision parsing library,
rather than its internal branch table, thereby loading in the
existing value of the branch.

`merge`
^^^^^^^
Includes one additional ancestor commit. The additional ancestry
link does not change the way the tree state is built at this commit.
If the `from` command is omitted when creating a new branch, the
first `merge` commit will be the first ancestor of the current
commit, and the branch will start out with no files. An unlimited
number of `merge` commands per commit are permitted by fast-import,
thereby establishing an n-way merge.
However Git's other tools never create commits with more than 15
additional ancestors (forming a 16-way merge). For this reason
it is suggested that frontends do not use more than 15 `merge`
commands per commit; 16, if starting a new, empty branch.

Here `<committish>` is any of the commit specification expressions
also accepted by `from` (see above).

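fast-import itself accepts any number of `merge` commands, so the 15-parent suggestion above has to be enforced by the frontend if it wants commits that Git's other tools would also produce. A minimal Python sketch of that check, using the hypothetical helper `ancestry_lines`; each committish is passed through verbatim, so it may be a branch name, a mark reference such as `:42`, or a SHA-1 expression.

```python
from typing import List, Optional

MAX_EXTRA_PARENTS = 15  # Git's other tools never create more than a 16-way merge


# Hypothetical frontend helper; not part of Git.
def ancestry_lines(first: Optional[str], merges: List[str]) -> str:
    """Emit the optional `from` line followed by `merge` lines, within the suggested limit."""
    # Without `from`, the first `merge` becomes the first ancestor, so allow one more.
    limit = MAX_EXTRA_PARENTS if first is not None else MAX_EXTRA_PARENTS + 1
    if len(merges) > limit:
        raise ValueError(f"{len(merges)} merge commands exceed the suggested limit of {limit}")
    lines = [f"from {first}\n"] if first is not None else []
    lines += [f"merge {c}\n" for c in merges]
    return "".join(lines)
```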

`filemodify`
^^^^^^^^^^^^
Included in a `commit` command to add a new file or change the
content of an existing file. This command has two different means
of specifying the content of the file.

External data format::
	The data content for the file was already supplied by a prior
	`blob` command. The frontend just needs to connect it.
+
....
	'M' SP <mode> SP <dataref> SP <path> LF
....
+
Here `<dataref>` is usually either a mark reference (`:<idnum>`)
set by a prior `blob` command, or a full 40-byte SHA-1 of an
existing Git blob object. If `<mode>` is `040000` then
`<dataref>` must be the full 40-byte SHA-1 of an existing
Git tree object or a mark reference set with `--import-marks`.

Inline data format::
	The data content for the file has not been supplied yet.
	The frontend wants to supply it as part of this modify
	command.
+
....
	'M' SP <mode> SP 'inline' SP <path> LF
	data
....
+
See below for a detailed description of the `data` command.

In both formats `<mode>` is the type of file entry, specified
in octal. Git only supports the following modes:

* `100644` or `644`: A normal (not-executable) file. The majority
  of files in most projects use this mode. If in doubt, this is
  what you want.
* `100755` or `755`: A normal, but executable, file.
* `120000`: A symlink; the content of the file will be the link target.
* `160000`: A gitlink; the SHA-1 of the object refers to a commit in
  another repository. Git links can only be specified by SHA or through
  a commit mark. They are used to implement submodules.
* `040000`: A subdirectory. Subdirectories can only be specified by
  SHA or through a tree mark set with `--import-marks`.

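Frontends that read files from a local checkout often need to map filesystem metadata onto the modes listed above. The Python sketch below (the helper `mode_for` is hypothetical) uses the standard `stat` module and covers only the blob modes; gitlinks and subdirectories are not derived from files this way. As an assumption of this sketch, only the owner-execute bit decides between `100644` and `100755`.

```python
import stat


# Hypothetical frontend helper; not part of Git.
def mode_for(st_mode: int) -> str:
    """Map a POSIX st_mode value to a fast-import <mode> for a blob entry."""
    if stat.S_ISLNK(st_mode):
        return "120000"  # symlink: the blob content is the link target
    if stat.S_ISREG(st_mode):
        # Git records only whether a regular file is executable, not full permissions.
        return "100755" if st_mode & stat.S_IXUSR else "100644"
    raise ValueError("only regular files and symlinks map to a blob mode")
```

Calling it as `mode_for(os.lstat(path).st_mode)` (rather than `os.stat`) keeps symlinks from being followed.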

In both formats `<path>` is the complete path of the file to be added
(if not already existing) or modified (if already existing).

A `<path>` string must use UNIX-style directory separators (forward
slash `/`), may contain any byte other than `LF`, and must not
start with double quote (`"`).

A path can use C-style string quoting; this is accepted in all cases
and mandatory if the filename starts with double quote or contains
`LF`. In C-style quoting, the complete name should be surrounded with
double quotes, and any `LF`, backslash, or double quote characters
must be escaped by preceding them with a backslash (e.g.,
`"path/with\n, \\ and \" in it"`).

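The quoting rules above translate directly into code. In this minimal Python sketch the helper `quote_path` is hypothetical; it leaves ordinary paths untouched and applies C-style quoting only when the rules make it mandatory, escaping exactly the three characters listed above (backslash first, so the escapes it adds are not escaped again).

```python
# Hypothetical frontend helper; not part of Git.
def quote_path(path: str) -> str:
    """Return <path> as it should appear in a command, quoting only when required."""
    if not path.startswith('"') and "\n" not in path:
        return path  # plain form: any byte except LF, no leading double quote
    escaped = (path.replace("\\", "\\\\")  # backslash first
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'
```

Applied to a name containing an LF, a backslash and a double quote, it reproduces the example above: `"path/with\n, \\ and \" in it"`.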

The value of `<path>` must be in canonical form. That is, it must not:

* contain an empty directory component (e.g. `foo//bar` is invalid),
* end with a directory separator (e.g. `foo/` is invalid),
* start with a directory separator (e.g. `/foo` is invalid),
* contain the special component `.` or `..` (e.g. `foo/./bar` and
  `foo/../bar` are invalid).

The root of the tree can be represented by an empty string as `<path>`.

It is recommended that `<path>` always be encoded using UTF-8.

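A frontend can catch non-canonical paths before fast-import rejects them. In the Python sketch below (the helper `is_canonical` is hypothetical) the empty string is accepted as the tree root; for any other path, splitting on `/` turns a leading, trailing or doubled separator into an empty component, so one check covers the first three rules and also rejects `.` and `..`.

```python
# Hypothetical frontend helper; not part of Git.
def is_canonical(path: str) -> bool:
    """Check <path> against the canonical-form rules; '' denotes the tree root."""
    if path == "":
        return True
    # "/foo", "foo/" and "foo//bar" all yield an empty component when split.
    return all(part not in ("", ".", "..") for part in path.split("/"))
```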

`filedelete`
^^^^^^^^^^^^
Included in a `commit` command to remove a file or recursively
delete an entire directory from the branch. If the file or directory
removal makes its parent directory empty, the parent directory will
be automatically removed too. This cascades up the tree until the
first non-empty directory or the root is reached.

....
	'D' SP <path> LF
....

here `<path>` is the complete path of the file or subdirectory to
be removed from the branch.
See `filemodify` above for a detailed description of `<path>`.

`filecopy`
^^^^^^^^^^
Recursively copies an existing file or subdirectory to a different
location within the branch. The source file or directory must
exist. If the destination exists it will be completely replaced
by the content copied from the source.

....
	'C' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination. See `filemodify` above for a detailed
description of what `<path>` may look like. To use a source path
that contains SP the path must be quoted.

A `filecopy` command takes effect immediately. Once the source
location has been copied to the destination any future commands
applied to the source location will not impact the destination of
the copy.

`filerename`
^^^^^^^^^^^^
Renames an existing file or subdirectory to a different location
within the branch. The source file or directory must exist. If
the destination exists it will be replaced by the source.

....
	'R' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination. See `filemodify` above for a detailed
description of what `<path>` may look like. To use a source path
that contains SP the path must be quoted.

A `filerename` command takes effect immediately. Once the source
location has been renamed to the destination any future commands
applied to the source location will create new files there and not
impact the destination of the rename.

Note that a `filerename` is the same as a `filecopy` followed by a
`filedelete` of the source location. There is a slight performance
advantage to using `filerename`, but the advantage is so small
that it is never worth trying to convert a delete/add pair in
source material into a rename for fast-import. This `filerename`
command is provided just to simplify frontends that already have
rename information and don't want to bother with decomposing it into a
`filecopy` followed by a `filedelete`.

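Since the source path in `filecopy` and `filerename` is terminated by SP, it must be quoted whenever it contains a space, while the destination runs to the end of the line. The Python sketch below uses hypothetical helpers (`_quote`, `copy_cmd`, `rename_cmd`, `rename_as_copy_delete`) and applies the C-style quoting rules from `filemodify`; the last function expresses the equivalence noted above between a rename and a copy followed by a delete.

```python
# Hypothetical frontend helpers; not part of Git.
def _quote(path: str, space_terminates: bool) -> str:
    """C-style quote `path` when required; SP forces quoting for a source path."""
    needs_quotes = (path.startswith('"') or "\n" in path
                    or (space_terminates and " " in path))
    if not needs_quotes:
        return path
    escaped = path.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def copy_cmd(src: str, dst: str) -> str:
    return f"C {_quote(src, True)} {_quote(dst, False)}\n"


def rename_cmd(src: str, dst: str) -> str:
    return f"R {_quote(src, True)} {_quote(dst, False)}\n"


def rename_as_copy_delete(src: str, dst: str) -> str:
    """Same result as rename_cmd(src, dst), expressed as filecopy + filedelete."""
    return copy_cmd(src, dst) + f"D {_quote(src, False)}\n"
```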

`filedeleteall`
^^^^^^^^^^^^^^^
Included in a `commit` command to remove all files (and also all
directories) from the branch. This command resets the internal
branch structure to have no files in it, allowing the frontend
to subsequently add all interesting files from scratch.

....
	'deleteall' LF
....

This command is extremely useful if the frontend does not know
(or does not care to know) what files are currently on the branch,
and therefore cannot generate the proper `filedelete` commands to
update the content.

Issuing a `filedeleteall` followed by the needed `filemodify`
commands to set the correct content will produce the same results
as sending only the needed `filemodify` and `filedelete` commands.
The `filedeleteall` approach may however require fast-import to use slightly
more memory per active branch (less than 1 MiB even for most large
projects); so frontends that can easily obtain only the affected
paths for a commit are encouraged to do so.

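For frontends that do know the previous and current file lists, the targeted alternative to `filedeleteall` is a simple difference of two snapshots. The sketch below is purely illustrative: `tree_changes` is a hypothetical helper, snapshots are modelled as dictionaries from path to content, every file gets mode `100644` and inline data, and paths are assumed to need no quoting. Deletions are emitted first, so a file that is replaced by a directory (or vice versa) is removed before the new entry is written.

```python
from typing import Dict


# Hypothetical frontend helper; not part of Git.
def tree_changes(old: Dict[str, bytes], new: Dict[str, bytes]) -> bytes:
    """Emit only the filedelete/filemodify commands that turn `old` into `new`."""
    out = []
    for path in sorted(old.keys() - new.keys()):  # files that disappeared
        out.append(b"D %s\n" % path.encode("utf-8"))
    for path in sorted(new):                       # files that were added or changed
        body = new[path]
        if old.get(path) != body:
            out.append(b"M 100644 inline %s\n" % path.encode("utf-8"))
            out.append(b"data %d\n%s\n" % (len(body), body))  # exact-byte-count data
    return b"".join(out)
```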
a8dd2e7d
JH
674`notemodify`
675^^^^^^^^^^^^
b421812b
DI
676Included in a `commit` `<notes_ref>` command to add a new note
677annotating a `<committish>` or change this annotation contents.
678Internally it is similar to filemodify 100644 on `<committish>`
679path (maybe split into subdirectories). It's not advised to
680use any other commands to write to the `<notes_ref>` tree except
681`filedeleteall` to delete all existing notes in this tree.
682This command has two different means of specifying the content
683of the note.
a8dd2e7d
JH
684
685External data format::
686 The data content for the note was already supplied by a prior
687 `blob` command. The frontend just needs to connect it to the
688 commit that is to be annotated.
689+
690....
691 'N' SP <dataref> SP <committish> LF
692....
693+
694Here `<dataref>` can be either a mark reference (`:<idnum>`)
695set by a prior `blob` command, or a full 40-byte SHA-1 of an
696existing Git blob object.
697
698Inline data format::
699 The data content for the note has not been supplied yet.
700 The frontend wants to supply it as part of this modify
701 command.
702+
703....
704 'N' SP 'inline' SP <committish> LF
705 data
706....
707+
708See below for a detailed description of the `data` command.
709
710In both formats `<committish>` is any of the commit specification
711expressions also accepted by `from` (see above).
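
A minimal Python sketch of a frontend attaching notes with the inline data format. It assumes the notes live on `refs/notes/commits` (Git's default notes ref) and that the annotated commits were marked earlier in the stream; `note_commit` and its fixed commit message are hypothetical.

```python
def _data(payload: bytes) -> bytes:
    # Exact byte count format: 'data' SP <count> LF <raw> LF
    return b"data %d\n" % len(payload) + payload + b"\n"


def note_commit(notes_ref: str, ident: str, when: str,
                annotations: dict) -> bytes:
    """Commit to `notes_ref`, attaching one note per annotated commit.

    `annotations` maps a committish (for example a mark such as ':12')
    to the note text, which is sent with the inline data format.
    """
    out = [b"commit " + notes_ref.encode() + b"\n",
           b"committer " + ident.encode() + b" " + when.encode() + b"\n",
           _data(b"Notes added by import\n")]
    for committish, text in annotations.items():
        # 'N' SP 'inline' SP <committish> LF, followed by a data command
        out.append(b"N inline " + committish.encode() + b"\n")
        out.append(_data(text.encode("utf-8")))
    return b"".join(out)
```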

`mark`
~~~~~~
Arranges for fast-import to save a reference to the current object, allowing
the frontend to recall this object at a future point in time, without
knowing its SHA-1. Here the current object is the object creation
command the `mark` command appears within. This can be `commit`,
`tag`, or `blob`, but `commit` is the most common usage.

....
	'mark' SP ':' <idnum> LF
....

where `<idnum>` is the number assigned by the frontend to this mark.
The value of `<idnum>` is expressed as an ASCII decimal integer.
The value 0 is reserved and cannot be used as
a mark. Only values greater than or equal to 1 may be used as marks.

New marks are created automatically. Existing marks can be moved
to another object simply by reusing the same `<idnum>` in another
`mark` command.
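
Frontends usually keep their own table from source revisions to mark numbers. The Python sketch below is one hypothetical way to do that: `MarkAllocator` hands out numbers densely starting at 1 (never the reserved 0) and returns the existing mark when a source revision is seen again. The keys are whatever identifiers the frontend uses for source revisions.

```python
class MarkAllocator:
    """Assign fast-import mark numbers to source revisions.

    Numbering starts at 1 because mark 0 is reserved, and grows
    densely so the marks stay between 1 and n.
    """

    def __init__(self) -> None:
        self._next = 1
        self._by_key = {}

    def mark_for(self, key) -> str:
        """Return ':<idnum>' for `key`, assigning a number on first use."""
        if key not in self._by_key:
            self._by_key[key] = self._next
            self._next += 1
        return ":%d" % self._by_key[key]

    def mark_command(self, key) -> bytes:
        # 'mark' SP ':' <idnum> LF
        return b"mark " + self.mark_for(key).encode() + b"\n"
```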

`tag`
~~~~~
Creates an annotated tag referring to a specific commit. To create
lightweight (non-annotated) tags see the `reset` command below.

....
	'tag' SP <name> LF
	'from' SP <committish> LF
	'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
	data
....

where `<name>` is the name of the tag to create.

Tag names are automatically prefixed with `refs/tags/` when stored
in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
corresponding ref as `refs/tags/RELENG-1_0-FINAL`.

The value of `<name>` must be a valid refname in Git and therefore
may contain forward slashes. As `LF` is not valid in a Git refname,
no quoting or escaping syntax is supported here.

The `from` command is the same as in the `commit` command; see
above for details.

The `tagger` command uses the same format as `committer` within
`commit`; again see above for details.

The `data` command following `tagger` must supply the annotated tag
message (see below for `data` command syntax). To import an empty
tag message use a zero-length `data` command. Tag messages are free-form and are
not interpreted by Git. Currently they must be encoded in UTF-8,
as fast-import does not permit other encodings to be specified.

Signing annotated tags during import from within fast-import is not
supported. Trying to include your own PGP/GPG signature is not
recommended, as the frontend does not (easily) have access to the
complete set of bytes which normally goes into such a signature.
If signing is required, create lightweight tags from within fast-import with
`reset`, then create the annotated versions of those tags offline
with the standard 'git tag' process.
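
A Python sketch of building a `tag` command, assuming the tagged commit was given a mark earlier and that `<when>` uses the raw date format; `annotated_tag` is a hypothetical helper. It rejects names containing `LF` because no quoting is available, and encodes the message as UTF-8 as required above.

```python
def _data(payload: bytes) -> bytes:
    # Exact byte count format: 'data' SP <count> LF <raw> LF
    return b"data %d\n" % len(payload) + payload + b"\n"


def annotated_tag(name: str, committish: str, tagger: str, when: str,
                  message: str) -> bytes:
    """Build a 'tag' command; fast-import writes refs/tags/<name>."""
    if "\n" in name:
        # LF is not valid in a refname and cannot be quoted here.
        raise ValueError("tag name must not contain LF")
    return (b"tag " + name.encode() + b"\n"
            + b"from " + committish.encode() + b"\n"
            + b"tagger " + tagger.encode() + b" " + when.encode() + b"\n"
            # Messages must be UTF-8; an empty message becomes 'data 0'.
            + _data(message.encode("utf-8")))
```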

`reset`
~~~~~~~
Creates (or recreates) the named branch, optionally starting from
a specific revision. The reset command allows a frontend to issue
a new `from` command for an existing branch, or to create a new
branch from an existing commit without creating a new commit.

....
	'reset' SP <ref> LF
	('from' SP <committish> LF)?
	LF?
....

For a detailed description of `<ref>` and `<committish>` see above
under `commit` and `from`.

The `LF` after the command is optional (it used to be required).

The `reset` command can also be used to create lightweight
(non-annotated) tags. For example:

====
	reset refs/tags/938
	from :938
====

would create the lightweight tag `refs/tags/938` referring to
whatever commit mark `:938` references.
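
The same lightweight-tag pattern as a small Python helper (hypothetical name `lightweight_tag`). Note that `reset` takes a full ref name, whereas the `tag` command adds `refs/tags/` itself; the helper assumes a bare name should be placed under `refs/tags/`.

```python
def lightweight_tag(name: str, committish: str) -> bytes:
    """Point refs/tags/<name> directly at a commit using 'reset'."""
    # Unlike 'tag', 'reset' takes a full ref name, so add the prefix here.
    ref = name if name.startswith("refs/") else "refs/tags/" + name
    return (b"reset " + ref.encode() + b"\n"
            + b"from " + committish.encode() + b"\n"
            + b"\n")  # the optional trailing LF
```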

`blob`
~~~~~~
Requests writing one file revision to the packfile. The revision
is not connected to any commit; this connection must be formed in
a subsequent `commit` command by referencing the blob through an
assigned mark.

....
	'blob' LF
	mark?
	data
....

The mark command is optional here as some frontends have chosen
to generate the Git SHA-1 for the blob on their own, and feed that
directly to `commit`. This is typically more work than it's worth
however, as marks are inexpensive to store and easy to use.

`data`
~~~~~~
Supplies raw data (for use as blob/file content, commit messages, or
annotated tag messages) to fast-import. Data can be supplied using an exact
byte count or delimited with a terminating line. Real frontends
intended for production-quality conversions should always use the
exact byte count format, as it is more robust and performs better.
The delimited format is intended primarily for testing fast-import.

Comment lines appearing within the `<raw>` part of `data` commands
are always taken to be part of the body of the data and are therefore
never ignored by fast-import. This makes it safe to import any
file/message content whose lines might start with `#`.
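
Because `<count>` in the exact byte count format (described below) counts bytes rather than characters, a frontend should encode text before measuring it. A minimal Python sketch, with `data_exact` as a hypothetical helper name and UTF-8 assumed for text input:

```python
def data_exact(content) -> bytes:
    """Encode `content` with the exact byte count 'data' format.

    <count> is a number of bytes, so str input is UTF-8 encoded before
    measuring. The final LF is the optional one: it is not counted in
    <count> and does not become part of the imported data.
    """
    raw = content.encode("utf-8") if isinstance(content, str) else bytes(content)
    return b"data " + str(len(raw)).encode("ascii") + b"\n" + raw + b"\n"
```

For example, `data_exact("héllo")` produces a count of 6 rather than 5, because `é` takes two bytes in UTF-8.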

Exact byte count format::
	The frontend must specify the number of bytes of data.
+
....
	'data' SP <count> LF
	<raw> LF?
....
+
where `<count>` is the exact number of bytes appearing within
`<raw>`. The value of `<count>` is expressed as an ASCII decimal
integer. The `LF` on either side of `<raw>` is not
included in `<count>` and will not be included in the imported data.
+
The `LF` after `<raw>` is optional (it used to be required) but
recommended. Always including it makes debugging a fast-import
stream easier as the next command always starts in column 0
of the next line, even if `<raw>` did not end with an `LF`.

Delimited format::
	A delimiter string is used to mark the end of the data.
	fast-import will compute the length by searching for the delimiter.
	This format is primarily useful for testing and is not
	recommended for real data.
+
....
	'data' SP '<<' <delim> LF
	<raw> LF
	<delim> LF
	LF?
....
+
where `<delim>` is the chosen delimiter string. The string `<delim>`
must not appear on a line by itself within `<raw>`, as otherwise
fast-import will think the data ends earlier than it really does. The `LF`
immediately trailing `<raw>` is part of `<raw>`. This is one of
the limitations of the delimited format: it is impossible to supply
a data chunk which does not have an LF as its last byte.
+
The `LF` after `<delim> LF` is optional (it used to be required).
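
For tests, a helper can enforce the two constraints above before emitting delimited data. A Python sketch, assuming the hypothetical name `data_delimited` and a default delimiter of `EOF`:

```python
def data_delimited(text: str, delim: str = "EOF") -> bytes:
    """Encode `text` with the delimited 'data' format (for tests only).

    The LF just before <delim> belongs to <raw>, so `text` must end
    with LF, and no line of `text` may consist of <delim> alone.
    """
    if not text.endswith("\n"):
        raise ValueError("delimited data must end with LF")
    if delim in text.split("\n"):
        raise ValueError("delimiter appears on a line by itself")
    return ("data <<%s\n%s%s\n" % (delim, text, delim)).encode("utf-8")
```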

`checkpoint`
~~~~~~~~~~~~
Forces fast-import to close the current packfile, start a new one, and to
save out all current branch refs, tags and marks.

....
	'checkpoint' LF
	LF?
....

Note that fast-import automatically switches packfiles when the current
packfile reaches \--max-pack-size, or 4 GiB, whichever limit is
smaller. During an automatic packfile switch fast-import does not update
the branch refs, tags or marks.

As a `checkpoint` can require a significant amount of CPU time and
disk IO (to compute the overall pack SHA-1 checksum, generate the
corresponding index file, and update the refs) it can easily take
several minutes for a single `checkpoint` command to complete.

Frontends may choose to issue checkpoints during extremely large
and long-running imports, or when they need to allow another Git
process access to a branch. However, given that a 30 GiB Subversion
repository can be loaded into Git through fast-import in about 3 hours,
explicit checkpointing may not be necessary.

The `LF` after the command is optional (it used to be required).

`progress`
~~~~~~~~~~
Causes fast-import to print the entire `progress` line unmodified to
its standard output channel (file descriptor 1) when the command is
processed from the input stream. The command otherwise has no impact
on the current import, or on any of fast-import's internal state.

....
	'progress' SP <any> LF
	LF?
....

The `<any>` part of the command may contain any sequence of bytes
that does not contain `LF`. The `LF` after the command is optional.
Callers may wish to process the output through a tool such as sed to
remove the leading part of the line, for example:

====
	frontend | git fast-import | sed 's/^progress //'
====

Placing a `progress` command immediately after a `checkpoint` will
inform the reader when the `checkpoint` has been completed and it
can safely access the refs that fast-import updated.
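
The checkpoint-then-progress pattern can be sketched in Python as follows. `checkpoint_barrier` is a hypothetical helper; it assumes `writer` and `reader` are binary pipes connected to fast-import's standard input and standard output (for example `proc.stdin` and `proc.stdout` from `subprocess.Popen(..., stdin=subprocess.PIPE, stdout=subprocess.PIPE)`), that the token is unique within the stream, and that no unread `cat-blob` or `ls` output is pending on the same pipe.

```python
def checkpoint_barrier(writer, reader, token: str) -> None:
    """Send 'checkpoint' and a 'progress' marker, then wait for the echo.

    fast-import prints the progress line only after processing the
    checkpoint before it, so seeing the marker means the refs, tags
    and marks have been written out.
    """
    marker = b"progress " + token.encode() + b"\n"
    writer.write(b"checkpoint\n\n" + marker)
    writer.flush()
    while True:
        line = reader.readline()
        if not line:
            raise EOFError("fast-import exited before echoing the marker")
        if line == marker:
            return
        # Anything else is earlier progress output; skip it.
```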

`cat-blob`
~~~~~~~~~~
Causes fast-import to print a blob to a file descriptor previously
arranged with the `--cat-blob-fd` argument. The command otherwise
has no impact on the current import; its main purpose is to
retrieve blobs that may be in fast-import's memory but not
accessible from the target repository.

....
	'cat-blob' SP <dataref> LF
....

The `<dataref>` can be either a mark reference (`:<idnum>`)
set previously or a full 40-byte SHA-1 of a Git blob, preexisting or
ready to be written.

Output uses the same format as `git cat-file --batch`:

====
	<sha1> SP 'blob' SP <size> LF
	<contents> LF
====

This command can be used anywhere in the stream that comments are
accepted. In particular, the `cat-blob` command can be used in the
middle of a commit but not in the middle of a `data` command.

See ``Responses To Commands'' below for details about how to read
this output safely.
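
A Python sketch of issuing `cat-blob` and reading its reply. The helper names are hypothetical, and `reader` is assumed to be a buffered binary stream attached to the `--cat-blob-fd` descriptor, so that `read(n)` blocks until `n` bytes or end of file arrive. The contents are read by length rather than by line because a blob may contain `LF` bytes.

```python
def cat_blob_command(dataref: str) -> bytes:
    # 'cat-blob' SP <dataref> LF
    return b"cat-blob " + dataref.encode() + b"\n"


def read_cat_blob(reader):
    """Read one reply: <sha1> SP 'blob' SP <size> LF <contents> LF.

    Returns (sha1, contents). The contents are read by length because
    a blob may itself contain LF bytes.
    """
    header = reader.readline()
    if not header.endswith(b"\n"):
        raise EOFError("truncated cat-blob header")
    sha1, kind, size_field = header[:-1].split(b" ")
    if kind != b"blob":
        raise ValueError("unexpected cat-blob reply: %r" % header)
    size = int(size_field.decode("ascii"))
    contents = reader.read(size)
    # The reply ends with one LF after the contents.
    if len(contents) != size or reader.read(1) != b"\n":
        raise EOFError("truncated cat-blob contents")
    return sha1.decode("ascii"), contents
```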

`ls`
~~~~
Prints information about the object at a path to a file descriptor
previously arranged with the `--cat-blob-fd` argument. This allows
printing a blob from the active commit (with `cat-blob`) or copying a
blob or tree from a previous commit for use in the current one (with
`filemodify`).

The `ls` command can be used anywhere in the stream that comments are
accepted, including the middle of a commit.

Reading from the active commit::
	This form can only be used in the middle of a `commit`.
	The path names a directory entry within fast-import's
	active commit. The path must be quoted in this case.
+
....
	'ls' SP <path> LF
....

Reading from a named tree::
	The `<dataref>` can be a mark reference (`:<idnum>`) or the
	full 40-byte SHA-1 of a Git tag, commit, or tree object,
	preexisting or waiting to be written.
	The path is relative to the top level of the tree
	named by `<dataref>`.
+
....
	'ls' SP <dataref> SP <path> LF
....

See `filemodify` above for a detailed description of `<path>`.

Output uses the same format as `git ls-tree <tree> -- <path>`:

====
	<mode> SP ('blob' | 'tree' | 'commit') SP <dataref> HT <path> LF
====

The `<dataref>` represents the blob, tree, or commit object at `<path>`
and can be used in later 'cat-blob', 'filemodify', or 'ls' commands.

If there is no file or subtree at that path, 'git fast-import' will
instead report

====
	missing SP <path> LF
====

See ``Responses To Commands'' below for details about how to read
this output safely.
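
A Python sketch of parsing one reply line, assuming the line has already been read from the `--cat-blob-fd` descriptor. `parse_ls_response` is hypothetical; it leaves a quoted path in its quoted form rather than unquoting it.

```python
def parse_ls_response(line: bytes):
    """Parse one reply to an 'ls' command.

    Returns None for 'missing SP <path> LF'; otherwise a tuple
    (mode, type, dataref, path). A quoted path is returned still quoted.
    """
    line = line.rstrip(b"\n")
    if line.startswith(b"missing "):
        return None
    # <mode> SP <type> SP <dataref> HT <path>
    meta, path = line.split(b"\t", 1)
    mode, kind, dataref = meta.split(b" ")
    return (mode.decode("ascii"), kind.decode("ascii"),
            dataref.decode("ascii"), path.decode("utf-8", "surrogateescape"))
```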

`feature`
~~~~~~~~~
Require that fast-import supports the specified feature, or abort if
it does not.

....
	'feature' SP <feature> ('=' <argument>)? LF
....

The `<feature>` part of the command may be any one of the following:

date-format::
export-marks::
relative-marks::
no-relative-marks::
force::
	Act as though the corresponding command-line option with
	a leading '--' was passed on the command line
	(see OPTIONS, above).

import-marks::
import-marks-if-exists::
	Like --import-marks except in three respects: first, only one
	"feature import-marks" or "feature import-marks-if-exists"
	command is allowed per stream; second, an --import-marks=
	or --import-marks-if-exists command-line option overrides
	any of these "feature" commands in the stream; third,
	"feature import-marks-if-exists", like a corresponding
	command-line option, silently skips a nonexistent file.

cat-blob::
ls::
	Require that the backend support the 'cat-blob' or 'ls' command.
	Versions of fast-import not supporting the specified command
	will exit with a message indicating so.
	This lets the import error out early with a clear message,
	rather than wasting time on the early part of an import
	before the unsupported command is detected.

notes::
	Require that the backend support the 'notemodify' (N)
	subcommand to the 'commit' command.
	Versions of fast-import not supporting notes will exit
	with a message indicating so.

done::
	Error out if the stream ends without a 'done' command.
	Without this feature, errors causing the frontend to end
	abruptly at a convenient point in the stream can go
	undetected. This may occur, for example, if an import
	front end dies in mid-operation without sending SIGTERM
	or SIGKILL to its subordinate git fast-import instance.
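
Since an unsupported feature aborts the import immediately, a frontend can request everything it relies on at the very start of the stream. A Python sketch with a hypothetical `feature_header` helper; keyword arguments map to `<feature>=<argument>` pairs, with underscores turned into dashes.

```python
def feature_header(*features, **with_args) -> bytes:
    """Build 'feature' commands for the start of a stream.

    Positional names become 'feature <name>'; keyword arguments become
    'feature <name>=<argument>', with '_' in the keyword mapped to '-'.
    """
    lines = [b"feature " + name.encode() + b"\n" for name in features]
    for key, value in with_args.items():
        name = key.replace("_", "-")
        lines.append(b"feature " + name.encode() + b"="
                     + str(value).encode() + b"\n")
    return b"".join(lines)
```

For example, `feature_header("done", "notes", date_format="raw")` requests the `done` and `notes` features and selects the raw date format.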

`option`
~~~~~~~~
Processes the specified option so that git fast-import behaves in a
way that suits the frontend's needs.
Note that options specified by the frontend are overridden by any
options the user may specify to git fast-import itself.

....
	'option' SP <option> LF
....

The `<option>` part of the command may contain any of the options
listed in the OPTIONS section that do not change import semantics,
without the leading '--' and is treated in the same way.

Option commands must be the first commands on the input (not counting
feature commands); giving an option command after any non-option
command is an error.

The following command-line options change import semantics and may therefore
not be passed as options:

* date-format
* import-marks
* export-marks
* cat-blob-fd
* force

`done`
~~~~~~
If the `done` feature is not in use, this command is treated as if EOF was read.
This can be used to tell fast-import to finish early.

If the `--done` command line option or `feature done` command is
in use, the `done` command is mandatory and marks the end of the
stream.
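
A Python sketch of framing a stream so that a truncated one is detected, assuming the frontend builds the body separately; `wrap_with_done` is a hypothetical helper. It places `feature done` before the body and avoids adding a stray blank line when the body is empty.

```python
def wrap_with_done(body: bytes) -> bytes:
    """Frame `body` so that fast-import notices if it is cut short.

    With 'feature done' the trailing 'done' is mandatory, so a stream
    that ends early is reported as an error instead of being treated
    as complete.
    """
    if body and not body.endswith(b"\n"):
        body += b"\n"
    return b"feature done\n" + body + b"done\n"
```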

Responses To Commands
---------------------
New objects written by fast-import are not available immediately.
Most fast-import commands have no visible effect until the next
checkpoint (or completion). The frontend can send commands to
fill fast-import's input pipe without worrying about how quickly
they will take effect, which improves performance by simplifying
scheduling.

For some frontends, though, it is useful to be able to read back
data from the current repository as it is being updated (for
example when the source material describes objects in terms of
patches to be applied to previously imported objects). This can
be accomplished by connecting the frontend and fast-import via
bidirectional pipes:

====
	mkfifo fast-import-output
	frontend <fast-import-output |
	git fast-import >fast-import-output
====

A frontend set up this way can use `progress`, `ls`, and `cat-blob`
commands to read information from the import in progress.

To avoid deadlock, such frontends must completely consume any
pending output from `progress`, `ls`, and `cat-blob` before
performing writes to fast-import that might block.

Crash Reports
-------------
If fast-import is supplied invalid input it will terminate with a
non-zero exit status and create a crash report in the top level of
the Git repository it was importing into. Crash reports contain
a snapshot of the internal fast-import state as well as the most
recent commands that led up to the crash.

All recent commands (including stream comments, file changes and
progress commands) are shown in the command history within the crash
report, but raw file data and commit messages are excluded from the
crash report. This exclusion saves space within the report file
and reduces the amount of buffering that fast-import must perform
during execution.

After writing a crash report fast-import will close the current
packfile and export the marks table. This allows the frontend
developer to inspect the repository state and resume the import from
the point where it crashed. The modified branches and tags are not
updated during a crash, as the import did not complete successfully.
Branch and tag information can be found in the crash report and
must be applied manually if the update is needed.

An example crash:

====
	$ cat >in <<END_OF_INPUT
	# my very first test commit
	commit refs/heads/master
	committer Shawn O. Pearce <spearce> 19283 -0400
	# who is that guy anyway?
	data <<EOF
	this is my commit
	EOF
	M 644 inline .gitignore
	data <<EOF
	.gitignore
	EOF
	M 777 inline bob
	END_OF_INPUT

	$ git fast-import <in
	fatal: Corrupt mode: M 777 inline bob
	fast-import: dumping crash report to .git/fast_import_crash_8434

	$ cat .git/fast_import_crash_8434
	fast-import crash report:
	    fast-import process: 8434
	    parent process     : 1391
	    at Sat Sep 1 00:58:12 2007

	fatal: Corrupt mode: M 777 inline bob

	Most Recent Commands Before Crash
	---------------------------------
	  # my very first test commit
	  commit refs/heads/master
	  committer Shawn O. Pearce <spearce> 19283 -0400
	  # who is that guy anyway?
	  data <<EOF
	  M 644 inline .gitignore
	  data <<EOF
	* M 777 inline bob

	Active Branch LRU
	-----------------
	    active_branches = 1 cur, 5 max

	  pos  clock name
	  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	   1)      0 refs/heads/master

	Inactive Branches
	-----------------
	refs/heads/master:
	  status      : active loaded dirty
	  tip commit  : 0000000000000000000000000000000000000000
	  old tree    : 0000000000000000000000000000000000000000
	  cur tree    : 0000000000000000000000000000000000000000
	  commit clock: 0
	  last pack   :


	-------------------
	END OF CRASH REPORT
====

Tips and Tricks
---------------
The following tips and tricks have been collected from various
users of fast-import, and are offered here as suggestions.

Use One Mark Per Commit
~~~~~~~~~~~~~~~~~~~~~~~
When doing a repository conversion, use a unique mark per commit
(`mark :<n>`) and supply the \--export-marks option on the command
line. fast-import will dump a file which lists every mark and the Git
object SHA-1 that corresponds to it. If the frontend can tie
the marks back to the source repository, it is easy to verify the
accuracy and completeness of the import by comparing each Git
commit to the corresponding source revision.

Coming from a system such as Perforce or Subversion this should be
quite simple, as the fast-import mark can also be the Perforce changeset
number or the Subversion revision number.
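
When marks equal source revision numbers, a first completeness check reduces to reading the exported marks file. A Python sketch, assuming the file holds one `:<idnum> <sha1>` entry per line (the format written by `--export-marks`) and that each source revision was given the mark equal to its number; both helper names are hypothetical.

```python
def load_marks(text: str) -> dict:
    """Parse an exported marks file into {idnum: sha1}."""
    marks = {}
    for line in text.splitlines():
        if not line:
            continue
        mark, sha1 = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError("bad marks line: %r" % line)
        marks[int(mark[1:])] = sha1
    return marks


def unmapped_revisions(marks: dict, source_revisions) -> list:
    """List source revision numbers that never received a mark."""
    return [rev for rev in source_revisions if rev not in marks]
```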

Freely Skip Around Branches
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Don't bother trying to optimize the frontend to stick to one branch
at a time during an import. Although doing so might be slightly
faster for fast-import, it tends to increase the complexity of the frontend
code considerably.

The branch LRU built into fast-import tends to behave very well, and the
cost of activating an inactive branch is so low that bouncing around
between branches has virtually no impact on import performance.

Handling Renames
~~~~~~~~~~~~~~~~
When importing a renamed file or directory, simply delete the old
name(s) and modify the new name(s) during the corresponding commit.
Git performs rename detection after-the-fact, rather than explicitly
during a commit.

Use Tag Fixup Branches
~~~~~~~~~~~~~~~~~~~~~~
Some other SCM systems let the user create a tag from multiple
files which are not from the same commit/changeset, or create
tags which are a subset of the files available in the repository.

Importing these tags as-is in Git is impossible without making at
least one commit which ``fixes up'' the files to match the content
of the tag. Use fast-import's `reset` command to reset a dummy branch
outside of your normal branch space to the base commit for the tag,
then commit one or more file fixup commits, and finally tag the
dummy branch.

For example, since all normal branches are stored under `refs/heads/`,
name the tag fixup branch `TAG_FIXUP`. This way it is impossible for
the fixup branch used by the importer to have namespace conflicts
with real branches imported from the source (the name `TAG_FIXUP`
is not `refs/heads/TAG_FIXUP`).

When committing fixups, consider using `merge` to connect the
commit(s) which are supplying file revisions to the fixup branch.
Doing so will allow tools such as 'git blame' to track
through the real commit history and properly annotate the source
files.

After fast-import terminates, the frontend will need to do `rm .git/TAG_FIXUP`
to remove the dummy branch.
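
The fixup sequence can be sketched in Python as below. `fixup_tag` is hypothetical; it assumes `base` and `sources` are committish expressions accepted by `from` and `merge` (marks, in the example), that changed files are supplied inline with mode 100644, and that an empty tag message is acceptable.

```python
def _data(payload: bytes) -> bytes:
    # Exact byte count format: 'data' SP <count> LF <raw> LF
    return b"data %d\n" % len(payload) + payload + b"\n"


def fixup_tag(tag: str, base: str, ident: str, when: str,
              files: dict, sources=()) -> bytes:
    """Tag content that matches no single source commit.

    Resets TAG_FIXUP to `base`, commits the differing `files`
    (path -> bytes) on top, then tags TAG_FIXUP. `sources` are the
    commits supplying those file revisions, recorded as merges.
    """
    out = [b"reset TAG_FIXUP\nfrom " + base.encode() + b"\n\n",
           b"commit TAG_FIXUP\n",
           b"committer " + ident.encode() + b" " + when.encode() + b"\n",
           _data(("Fixup for tag " + tag + "\n").encode("utf-8"))]
    for src in sources:
        out.append(b"merge " + src.encode() + b"\n")
    for path, content in sorted(files.items()):
        out.append(b"M 100644 inline " + path.encode() + b"\n" + _data(content))
    # Tag the fixup branch; fast-import stores it as refs/tags/<tag>.
    out.append(b"tag " + tag.encode() + b"\nfrom TAG_FIXUP\n")
    out.append(b"tagger " + ident.encode() + b" " + when.encode() + b"\n")
    out.append(_data(b""))
    return b"".join(out)
```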

Import Now, Repack Later
~~~~~~~~~~~~~~~~~~~~~~~~
As soon as fast-import completes, the Git repository is completely valid
and ready for use. Typically this takes only a very short time,
even for considerably large projects (100,000+ commits).

However, repacking the repository is necessary to improve data
locality and access performance. It can also take hours on extremely
large projects (especially if -f and a large \--window parameter are
used). Since repacking is safe to run alongside readers and writers,
run the repack in the background and let it finish when it finishes.
There is no reason to wait to explore your new Git project!

If you choose to wait for the repack, don't try to run benchmarks
or performance tests until repacking is completed. fast-import outputs
suboptimal packfiles that are simply never seen in real use
situations.

Repacking Historical Data
~~~~~~~~~~~~~~~~~~~~~~~~~
If you are repacking very old imported data (e.g. older than the
last year), consider expending some extra CPU time and supplying
\--window=50 (or higher) when you run 'git repack'.
This will take longer, but will also produce a smaller packfile.
You only need to expend the effort once, and everyone using your
project will benefit from the smaller repository.

Include Some Progress Messages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Every once in a while have your frontend emit a `progress` message
to fast-import. The contents of the messages are entirely free-form,
so one suggestion would be to output the current month and year
each time the current commit date moves into the next month.
Your users will feel better knowing how much of the data stream
has been processed.
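
A Python sketch of the month-by-month suggestion, assuming the frontend knows each commit's timestamp in seconds since the epoch (interpreted here in UTC) and has already encoded the commit; `with_month_progress` and the `importing` message text are hypothetical.

```python
from datetime import datetime, timezone


def with_month_progress(commits):
    """Yield stream chunks, adding a 'progress' line at each new month.

    `commits` yields (timestamp, chunk) pairs in import order, where
    timestamp is seconds since the epoch and chunk is an encoded commit.
    """
    last_month = None
    for timestamp, chunk in commits:
        month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
        if month != last_month:
            yield b"progress importing " + month.encode() + b"\n"
            last_month = month
        yield chunk
```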


Packfile Optimization
---------------------
When packing a blob fast-import always attempts to deltify against the last
blob written. Unless specifically arranged for by the frontend,
this will probably not be a prior version of the same file, so the
generated delta will not be the smallest possible. The resulting
packfile will be compressed, but will not be optimal.

Frontends which have efficient access to all revisions of a
single file (for example reading an RCS/CVS ,v file) can choose
to supply all revisions of that file as a sequence of consecutive
`blob` commands. This allows fast-import to deltify the different file
revisions against each other, saving space in the final packfile.
Marks can be used to later identify individual file revisions during
a sequence of `commit` commands.
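
A Python sketch of this ordering, assuming the frontend has already extracted every revision of one file (oldest first) and reserved a block of unused mark numbers; `emit_file_history` is hypothetical. The returned marks can later serve as the `<dataref>` in `filemodify` commands.

```python
def _data(payload: bytes) -> bytes:
    # Exact byte count format: 'data' SP <count> LF <raw> LF
    return b"data %d\n" % len(payload) + payload + b"\n"


def emit_file_history(revisions, first_mark: int):
    """Write every revision of one file as consecutive 'blob' commands.

    Returns (stream_bytes, marks), with marks in revision order.
    """
    out, marks = [], []
    for i, content in enumerate(revisions):
        mark = ":%d" % (first_mark + i)
        # Adjacent blobs let fast-import delta each one against the last.
        out.append(b"blob\nmark " + mark.encode() + b"\n" + _data(content))
        marks.append(mark)
    return b"".join(out), marks
```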

The packfile(s) created by fast-import do not encourage good disk access
patterns. This is caused by fast-import writing the data in the order
it is received on standard input, while Git typically organizes
data within packfiles to make the most recent (current tip) data
appear before historical data. Git also clusters commits together,
speeding up revision traversal through better cache locality.

For this reason it is strongly recommended that users repack the
repository with `git repack -a -d` after fast-import completes, allowing
Git to reorganize the packfiles for faster data access. If blob
deltas are suboptimal (see above) then also adding the `-f` option
to force recomputation of all deltas can significantly reduce the
final packfile size (30-50% smaller can be quite typical).


Memory Utilization
------------------
There are a number of factors which affect how much memory fast-import
requires to perform an import. Like critical sections of core
Git, fast-import uses its own memory allocators to amortize any overheads
associated with malloc. In practice fast-import tends to amortize any
malloc overheads to 0, due to its use of large block allocations.

per object
~~~~~~~~~~
fast-import maintains an in-memory structure for every object written in
this execution. On a 32 bit system the structure is 32 bytes,
on a 64 bit system the structure is 40 bytes (due to the larger
pointer sizes). Objects in the table are not deallocated until
fast-import terminates. Importing 2 million objects on a 32 bit system
will require approximately 64 MB (about 61 MiB) of memory.

The object table is actually a hashtable keyed on the object name
(the unique SHA-1). This storage configuration allows fast-import to reuse
an existing or already written object and avoid writing duplicates
to the output packfile. Duplicate blobs are surprisingly common
in an import, typically due to branch merges in the source.
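
The per-object estimate is simple arithmetic. A Python sketch using only the per-object sizes quoted above (the helper names are hypothetical, and nothing else fast-import allocates is counted): for 2 million objects on a 32 bit system it gives 64,000,000 bytes, which is about 61 MiB.

```python
# Per-object sizes quoted in this section, keyed by pointer width in bits.
OBJECT_ENTRY_BYTES = {32: 32, 64: 40}


def object_table_bytes(objects: int, pointer_bits: int = 32) -> int:
    """Estimate memory for fast-import's in-memory object table."""
    return objects * OBJECT_ENTRY_BYTES[pointer_bits]


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)
```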

per mark
~~~~~~~~
Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark. Although the array
is sparse, frontends are still strongly encouraged to use marks
between 1 and n, where n is the total number of marks required for
this import.

per branch
~~~~~~~~~~
Branches are classified as active and inactive. The memory usage
of the two classes is significantly different.

Inactive branches are stored in a structure which uses 96 or 120
bytes (32 bit or 64 bit systems, respectively), plus the length of
the branch name (typically under 200 bytes), per branch. fast-import will
easily handle as many as 10,000 inactive branches in under 2 MiB
of memory.

Active branches have the same overhead as inactive branches, but
also contain copies of every tree that has been recently modified on
that branch. If subtree `include` has not been modified since the
branch became active, its contents will not be loaded into memory,
but if subtree `src` has been modified by a commit since the branch
became active, then its contents will be loaded in memory.

As active branches store metadata about the files contained on that
branch, their in-memory storage size can grow to a considerable size
(see below).

fast-import automatically moves active branches to inactive status based on
a simple least-recently-used algorithm. The LRU chain is updated on
each `commit` command. The maximum number of active branches can be
increased or decreased on the command line with \--active-branches=.

per active tree
~~~~~~~~~~~~~~~
Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see ``per active file entry'' below).
The cost of a tree is virtually 0, as its overhead amortizes out
over the individual file entries.

per active file entry
~~~~~~~~~~~~~~~~~~~~~
Files (and pointers to subtrees) within active trees require 52 or 64
bytes (32/64 bit platforms) per entry. To conserve space, file and
tree names are pooled in a common string table, allowing the filename
``Makefile'' to use just 16 bytes (after including the string header
overhead) no matter how many times it occurs within the project.

The active branch LRU, when coupled with the filename string pool
and lazy loading of subtrees, allows fast-import to efficiently import
projects with 2,000+ branches and 45,114+ files in a very limited
memory footprint (less than 2.7 MiB per active branch).

Signals
-------
Sending *SIGUSR1* to the 'git fast-import' process ends the current
packfile early, simulating a `checkpoint` command. The impatient
operator can use this facility to peek at the objects and refs from an
import in progress, at the cost of some added running time and worse
compression.

GIT
---
Part of the linkgit:git[1] suite