Commit | Line | Data |
---|---|---|
5dc7bcc2 JH |
1 | At the core level, git is character encoding agnostic. |
2 | ||
3 | - The pathnames recorded in the index and in the tree objects | |
4 | are treated as uninterpreted sequences of non-NUL bytes. | |
5 | What readdir(2) returns is what is recorded and compared | |
6 | with the data git keeps track of, which in turn is expected | |
7 | to be what lstat(2) and creat(2) accept. There is no such | |
8 | thing as pathname encoding translation. | |
9 | ||
10 | - The contents of the blob objects are uninterpreted sequences | |
11 | of bytes. There is no encoding translation at the core | |
12 | level. | |
13 | ||
14 | - The commit log messages are uninterpreted sequences of non-NUL | |
15 | bytes. | |
16 | ||
17 | Although we encourage that the commit log messages be encoded | |
18 | in UTF-8, both the core and git Porcelain are designed not to | |
19 | force UTF-8 on projects. If all participants of a particular | |
20 | project find it more convenient to use legacy encodings, git | |
21 | does not forbid it. However, there are a few things to keep in | |
22 | mind. | |
23 | ||
24 | . `git-commit-tree` (hence, `git-commit` which uses it) issues | |
25 | a warning if the commit log message given to it does not look | |
26 | like a valid UTF-8 string, unless you explicitly say your | |
27 | project uses a legacy encoding. The way to say this is to | |
38eb9329 | 28 | have `i18n.commitencoding` in the `.git/config` file, like this: |
5dc7bcc2 JH |
29 | + |
30 | ------------ | |
38eb9329 | 31 | [i18n] |
5dc7bcc2 JH |
32 | commitencoding = ISO-8859-1 |
33 | ------------ | |
34 | + | |
35 | Commit objects created with the above setting record the value | |
38eb9329 | 36 | of `i18n.commitencoding` in their `encoding` header. This is to |
5dc7bcc2 JH |
37 | help other people who look at them later. Lack of this header |
38 | implies that the commit log message is encoded in UTF-8. | |
39 | ||
40 | . `git-log`, `git-show` and friends look at the `encoding` | |
41 | header of a commit object, and try to re-code the log | |
42 | message into UTF-8 unless otherwise specified. You can | |
43 | specify the desired output encoding with | |
38eb9329 | 44 | `i18n.logoutputencoding` in the `.git/config` file, like this: |
5dc7bcc2 JH |
45 | + |
46 | ------------ | |
38eb9329 | 47 | [i18n] |
5dc7bcc2 JH |
48 | logoutputencoding = ISO-8859-1 |
49 | ------------ | |
50 | + | |
51 | If you do not have this configuration variable, the value of | |
38eb9329 | 52 | `i18n.commitencoding` is used instead. |
5dc7bcc2 JH |
53 | |
54 | Note that we deliberately chose not to re-code the commit log | |
55 | message when a commit is made to force UTF-8 at the commit | |
56 | object level, because re-coding to UTF-8 is not necessarily a | |
57 | reversible operation. |
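
The behaviour described above can be illustrated with a small Python sketch. It is not git's implementation: the function names `looks_like_utf8` and `recode_log_message` are hypothetical, the UTF-8 check is approximated by a strict decode, and the re-coding relies on Python's own codecs rather than whatever conversion library git is built against. It shows a message in a legacy encoding failing the UTF-8 check, the same message being re-coded for output using its `encoding` header, and one case where re-coding to UTF-8 and back does not reproduce the original bytes.

```python
# Illustrative sketch only; these helpers are hypothetical, not git's API.

def looks_like_utf8(message: bytes) -> bool:
    """Approximate a 'looks like valid UTF-8' check with a strict decode."""
    try:
        message.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def recode_log_message(message: bytes, encoding_header=None,
                       output_encoding="UTF-8") -> bytes:
    """Re-code a log message for display.

    A missing `encoding` header is taken to mean the message is UTF-8.
    """
    source_encoding = encoding_header or "UTF-8"
    return message.decode(source_encoding).encode(output_encoding)


# "Gruesse" spelled with u-umlaut and sharp s, stored as ISO-8859-1 bytes.
latin1_message = "Gr\u00fc\u00dfe".encode("iso-8859-1")

# These bytes are not valid UTF-8, which is the situation that would draw
# a warning unless the project declares its legacy encoding.
latin1_is_utf8 = looks_like_utf8(latin1_message)

# With the header value known, the message can be shown as UTF-8.
shown_as_utf8 = recode_log_message(latin1_message, "ISO-8859-1")

# Why forcing UTF-8 at commit time could lose information: in ISO-2022-JP,
# different byte sequences can spell the same text. Here a redundant
# "switch to ASCII" escape is placed in front of the usual encoding of
# HIRAGANA LETTER A (U+3042).
canonical = "\u3042".encode("iso2022_jp")
redundant = b"\x1b(B" + canonical

as_utf8 = recode_log_message(redundant, "ISO-2022-JP")
round_tripped = recode_log_message(as_utf8, "UTF-8", "ISO-2022-JP")

# The text survives, but the original bytes do not.
bytes_preserved = (round_tripped == redundant)
```

In this sketch `bytes_preserved` ends up false even though both byte strings decode to the same text, which is the kind of irreversibility the note above refers to. Keeping the original bytes and recording the encoding in a header avoids the problem, because conversion then happens only at display time.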