Commit | Line | Data |
---|---|---|
5dc7bcc2 JH |
1 | At the core level, git is character encoding agnostic. |
2 | ||
3 | - The pathnames recorded in the index and in the tree objects | |
4 | are treated as uninterpreted sequences of non-NUL bytes. | |
5 | What readdir(2) returns is what is recorded and compared | |
6 | with the data git keeps track of, which in turn is expected | |
7 | to be what lstat(2) and creat(2) accept. There is no such | |
8 | thing as pathname encoding translation. | |
9 | ||
10 | - The contents of the blob objects are uninterpreted sequences | |
11 | of bytes. There is no encoding translation at the core | |
12 | level. | |
13 | ||
14 | - The commit log messages are uninterpreted sequences of non-NUL | |
15 | bytes. | |
16 | ||
17 | Although we encourage encoding the commit log messages | |
18 | in UTF-8, both the core and git Porcelain are designed not to | |
19 | force UTF-8 on projects. If all participants of a particular | |
20 | project find it more convenient to use legacy encodings, git | |
21 | does not forbid it. However, there are a few things to keep in | |
22 | mind. | |
23 | ||
73bae1dc | 24 | . 'git-commit' and 'git-commit-tree' issue |
790296fd | 25 | a warning if the commit log message given to them does not look |
5dc7bcc2 JH |
26 | like a valid UTF-8 string, unless you explicitly say your |
27 | project uses a legacy encoding. The way to say this is to | |
38eb9329 | 28 | have `i18n.commitencoding` in the `.git/config` file, like this: |
5dc7bcc2 JH |
29 | + |
30 | ------------ | |
38eb9329 | 31 | [i18n] |
5dc7bcc2 JH |
32 | commitencoding = ISO-8859-1 |
33 | ------------ | |
34 | + | |
35 | Commit objects created with the above setting record the value | |
38eb9329 | 36 | of `i18n.commitencoding` in their `encoding` header. This is to |
5dc7bcc2 JH |
37 | help other people who look at them later. Lack of this header |
38 | implies that the commit log message is encoded in UTF-8. | |
39 | ||
69cd8f63 AG |
40 | . 'git-log', 'git-show', 'git-blame' and friends look at the |
41 | `encoding` header of a commit object, and try to re-code the | |
42 | log message into UTF-8 unless otherwise specified. You can | |
5dc7bcc2 | 43 | specify the desired output encoding with |
38eb9329 | 44 | `i18n.logoutputencoding` in the `.git/config` file, like this: |
5dc7bcc2 JH |
45 | + |
46 | ------------ | |
38eb9329 | 47 | [i18n] |
5dc7bcc2 JH |
48 | logoutputencoding = ISO-8859-1 |
49 | ------------ | |
50 | + | |
51 | If you do not have this configuration variable, the value of | |
38eb9329 | 52 | `i18n.commitencoding` is used instead. |
5dc7bcc2 JH |
53 | |
54 | Note that we deliberately chose not to re-code the commit log | |
55 | message when a commit is made to force UTF-8 at the commit | |
56 | object level, because re-coding to UTF-8 is not necessarily a | |
57 | reversible operation. |
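
The behaviour described in the two numbered items above can be illustrated with a small Python sketch. It is not git's implementation. The function names (`looks_like_utf8`, `output_encoding`, `recode_log_message`) are hypothetical and exist only for this illustration. Leaving a message untouched when conversion fails is an assumption of the sketch, not something this document states.

```python
from typing import Optional


def looks_like_utf8(message: bytes) -> bool:
    """Return True if the raw log message decodes cleanly as UTF-8."""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def output_encoding(log_output_encoding: Optional[str],
                    commit_encoding: Optional[str]) -> str:
    """Pick the output encoding: i18n.logoutputencoding if set,
    otherwise i18n.commitencoding, otherwise UTF-8."""
    return log_output_encoding or commit_encoding or "UTF-8"


def recode_log_message(message: bytes,
                       encoding_header: Optional[str],
                       target: str) -> bytes:
    """Re-code a log message from the encoding named in the commit's
    `encoding` header (UTF-8 when the header is absent) to `target`.

    Assumption of this sketch: if the conversion is not possible,
    the original bytes are returned unchanged.
    """
    source = encoding_header or "UTF-8"
    try:
        return message.decode(source).encode(target)
    except (UnicodeError, LookupError):
        return message


# A message written by a project that uses ISO-8859-1.
latin1_message = "caf\u00e9".encode("iso-8859-1")

# Without a declared legacy encoding, this message would draw a warning.
needs_warning = not looks_like_utf8(latin1_message)

# Displaying it with neither variable set falls back to UTF-8.
shown = recode_log_message(latin1_message, "ISO-8859-1",
                           output_encoding(None, None))
```

In this sketch the stored bytes are never modified. Conversion happens only when a message is displayed, which mirrors the decision above not to re-code at commit time.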