Git is to some extent character encoding agnostic.

- The contents of the blob objects are uninterpreted sequences
  of bytes. There is no encoding translation at the core
  level.

- Path names are encoded in UTF-8 normalization form C. This
  applies to tree objects, the index file, ref names, as well as
  path names in command line arguments, environment variables
  and config files (`.git/config` (see linkgit:git-config[1]),
  linkgit:gitignore[5], linkgit:gitattributes[5] and
  linkgit:gitmodules[5]).
+
Note that Git at the core level treats path names simply as
sequences of non-NUL bytes; there are no path name encoding
conversions (except on Mac and Windows). Therefore, using
non-ASCII path names will mostly work even on platforms and file
systems that use legacy extended ASCII encodings. However,
repositories created on such systems will not work properly on
UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
Additionally, many Git-based tools simply assume path names to
be UTF-8 and will fail to display other encodings correctly.

- Commit log messages are typically encoded in UTF-8, but other
  extended ASCII encodings are also supported. This includes
  ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
  EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
  EUC-x, CP9xx etc.).

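The first point above, that blob contents are uninterpreted bytes, can be illustrated with a minimal Python sketch. It assumes a SHA-1 repository and the standard object header `blob <size>` followed by a NUL byte; the helper name `blob_id` is hypothetical. Because the content is hashed verbatim, the same text stored in two different encodings produces two different blobs.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Return the object ID a SHA-1 repository would assign to `data`
    stored as a blob. Hypothetical helper: the content is hashed
    byte-for-byte, with no encoding translation of any kind."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    text = "caf\u00e9\n"
    # Same characters, different byte sequences, therefore different blobs.
    print(blob_id(text.encode("utf-8")))
    print(blob_id(text.encode("iso-8859-1")))
    # The empty blob has a well-known ID.
    print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```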
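The path name rules in the second point can be made concrete with Python's standard `unicodedata` module. In this sketch the helper names are hypothetical. It shows that the precomposed (NFC) and decomposed (NFD) spellings of the same name, the latter being what HFS+ on macOS stores, are different byte sequences, so a core that compares raw bytes sees two distinct paths. It also shows that a path name written on a legacy Latin-1 system is not valid UTF-8, which is why tools that assume UTF-8 display it incorrectly.

```python
import unicodedata


def path_bytes_nfc(name: str) -> bytes:
    """Encode a path name as UTF-8 in normalization form C."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as UTF-8, as many tools assume."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    decomposed = "e\u0301cole.txt"  # 'e' + COMBINING ACUTE ACCENT (NFD)
    precomposed = "\u00e9cole.txt"  # single code point U+00E9 (NFC)

    # Same visible name, different bytes.
    print(decomposed.encode("utf-8") == precomposed.encode("utf-8"))  # False
    # After NFC normalization the two spellings agree.
    print(path_bytes_nfc(decomposed) == precomposed.encode("utf-8"))  # True

    # The Latin-1 bytes of the same name are not valid UTF-8.
    print(is_valid_utf8(precomposed.encode("iso-8859-1")))  # False
```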
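The list of supported log message encodings in the third point can be related to a simple heuristic. The Python sketch below is an illustrative assumption, not Git's actual check, and both helper names are hypothetical. It treats an encoding as usable if plain ASCII text such as a commit header line encodes to identical bytes, and it flags encodings whose multi-byte characters can contain bytes in the ASCII range. UTF-16/32 and EBCDIC (Python's `cp037`) fail the first test. Shift-JIS passes the first but fails the second. The heuristic does not explain every exclusion: EUC encodings, for instance, pass both tests.

```python
def ascii_compatible(encoding: str) -> bool:
    """Return True if ASCII text encodes to exactly its ASCII bytes.

    Assumption for illustration: commit headers such as "tree",
    "author" and "encoding" are plain ASCII lines."""
    sample = "tree author committer encoding\n"
    return sample.encode(encoding) == sample.encode("ascii")


def has_ascii_range_bytes(text: str, encoding: str) -> bool:
    """Return True if any non-ASCII character of `text` encodes to a
    byte sequence containing a byte below 0x80."""
    for ch in text:
        if ord(ch) < 0x80:
            continue
        if any(b < 0x80 for b in ch.encode(encoding)):
            return True
    return False


if __name__ == "__main__":
    for enc in ("utf-8", "iso-8859-1", "cp1252", "utf-16", "utf-32", "cp037"):
        print(enc, ascii_compatible(enc))
    # U+30BD KATAKANA LETTER SO is 0x83 0x5C in Shift-JIS; 0x5C is '\'.
    print(has_ascii_range_bytes("\u30bd", "shift_jis"))  # True
    print(has_ascii_range_bytes("caf\u00e9", "iso-8859-1"))  # False
```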
Although we encourage encoding commit log messages in UTF-8,
both the core and Git Porcelain are designed not to force UTF-8
on projects. If all participants of a particular project find it
more convenient to use legacy encodings, Git does not forbid it.
However, there are a few things to keep in mind.

. 'git commit' and 'git commit-tree' issue a warning if the
  commit log message given to them does not look like a valid
  UTF-8 string, unless you explicitly say that your project uses
  a legacy encoding. The way to say this is to set
  `i18n.commitEncoding` in the `.git/config` file, like this:
+
------------
[i18n]
	commitEncoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitEncoding` in their `encoding` header. This is to
help other people who look at them later. Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git log', 'git show', 'git blame' and friends look at the
  `encoding` header of a commit object and try to re-code the
  log message into UTF-8 unless otherwise specified. You can
  specify the desired output encoding with
  `i18n.logOutputEncoding` in the `.git/config` file, like this:
+
------------
[i18n]
	logOutputEncoding = ISO-8859-1
------------
+
If this configuration variable is not set, the value of
`i18n.commitEncoding` is used instead.

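The behaviour of 'git commit' described in the first numbered item above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not Git's implementation: the helper names are hypothetical, validity is tested with Python's strict UTF-8 decoder (Git's own check may differ in detail), the commit has no parents, and header lines are assumed to be ASCII. It shows the two effects of `i18n.commitEncoding`: the warning is suppressed, and an `encoding` header is written after the `committer` line.

```python
from typing import Optional


def is_utf8(encoding: Optional[str]) -> bool:
    """An unset encoding means UTF-8."""
    return encoding is None or encoding.lower() in ("utf-8", "utf8")


def should_warn(message: bytes, commit_encoding: Optional[str] = None) -> bool:
    """Return True if the message would trigger the "not UTF-8" warning."""
    if not is_utf8(commit_encoding):
        return False  # a legacy encoding was declared explicitly
    try:
        message.decode("utf-8")  # strict decoding stands in for Git's check
    except UnicodeDecodeError:
        return True
    return False


def commit_object_body(tree_id: str, ident: str, message: bytes,
                       commit_encoding: Optional[str] = None) -> bytes:
    """Assemble a parentless commit body with an optional encoding header."""
    lines = [f"tree {tree_id}", f"author {ident}", f"committer {ident}"]
    if not is_utf8(commit_encoding):
        # The declared encoding is recorded after the committer line.
        lines.append(f"encoding {commit_encoding}")
    # A blank line separates the headers from the message, which is
    # stored as the raw bytes it was given (no re-coding).
    return ("\n".join(lines) + "\n\n").encode("ascii") + message


if __name__ == "__main__":
    latin1_msg = "Caf\u00e9 au lait\n".encode("iso-8859-1")
    print(should_warn(latin1_msg))                # True
    print(should_warn(latin1_msg, "ISO-8859-1"))  # False
    ident = "A U Thor <author@example.com> 1700000000 +0000"
    empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # SHA-1 empty tree
    body = commit_object_body(empty_tree, ident, latin1_msg, "ISO-8859-1")
    print(body.decode("iso-8859-1"))
```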
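The output side described in the second numbered item can be sketched the same way. The helper names are hypothetical, and the use of `errors="replace"` is a choice made for this sketch only; Git's handling of messages that cannot be re-coded may differ. The fallback chain follows the text: `i18n.logOutputEncoding` if set, otherwise `i18n.commitEncoding`, otherwise UTF-8, and a commit without an `encoding` header is taken to be UTF-8.

```python
from typing import Optional


def output_encoding(log_output_encoding: Optional[str] = None,
                    commit_encoding: Optional[str] = None) -> str:
    """i18n.logOutputEncoding if set, else i18n.commitEncoding, else UTF-8."""
    return log_output_encoding or commit_encoding or "UTF-8"


def recode_log_message(message: bytes,
                       header_encoding: Optional[str] = None,
                       log_output_encoding: Optional[str] = None,
                       commit_encoding: Optional[str] = None) -> bytes:
    """Re-code a stored log message into the configured output encoding."""
    # A commit without an encoding header is taken to be UTF-8.
    source = header_encoding or "UTF-8"
    target = output_encoding(log_output_encoding, commit_encoding)
    if source.lower() == target.lower():
        return message  # already in the output encoding
    # errors="replace" is an assumption of this sketch.
    text = message.decode(source, errors="replace")
    return text.encode(target, errors="replace")


if __name__ == "__main__":
    stored = "Caf\u00e9\n".encode("iso-8859-1")  # commit with "encoding ISO-8859-1"
    print(recode_log_message(stored, "ISO-8859-1"))  # b'Caf\xc3\xa9\n'
    print(recode_log_message(stored, "ISO-8859-1",
                             commit_encoding="ISO-8859-1"))  # b'Caf\xe9\n'
```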
Note that we deliberately chose not to force UTF-8 at the commit
object level by re-coding the commit log message when a commit is
made, because re-coding to UTF-8 is not necessarily a reversible
operation.

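
Why re-coding is not necessarily reversible can be seen with a short Python round trip; the helper name is hypothetical. A byte that has no mapping in the declared encoding (here 0x81, which is undefined in Python's `cp1252` codec) has to be replaced on the way to UTF-8, and the original byte cannot be recovered on the way back. A well-formed ISO-8859-1 message, by contrast, round-trips exactly.

```python
def survives_round_trip(raw: bytes, encoding: str) -> bool:
    """Re-code `raw` from `encoding` to UTF-8 and back again, and report
    whether the original bytes are recovered unchanged."""
    # Bytes with no mapping in `encoding` become U+FFFD on the way in.
    utf8 = raw.decode(encoding, errors="replace").encode("utf-8")
    # In a legacy encoding U+FFFD usually has no mapping either, so it
    # becomes "?" on the way out and the original byte is lost.
    back = utf8.decode("utf-8").encode(encoding, errors="replace")
    return back == raw


if __name__ == "__main__":
    print(survives_round_trip("Caf\u00e9\n".encode("iso-8859-1"), "iso-8859-1"))  # True
    print(survives_round_trip(b"Caf\x81\n", "cp1252"))  # False
```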