Commit | Line | Data |
---|---|---|
f8b3a85b MS |
1 | |
2 | ||
3 | ||
4 | ||
5 | ||
6 | ||
7 | Network Working Group A. Phillips, Ed. | |
8 | Request for Comments: 4646 Yahoo! Inc. | |
9 | BCP: 47 M. Davis, Ed. | |
10 | Obsoletes: 3066 Google | |
11 | Category: Best Current Practice September 2006 | |
12 | ||
13 | ||
14 | Tags for Identifying Languages | |
15 | ||
16 | Status of This Memo | |
17 | ||
18 | This document specifies an Internet Best Current Practices for the | |
19 | Internet Community, and requests discussion and suggestions for | |
20 | improvements. Distribution of this memo is unlimited. | |
21 | ||
22 | Copyright Notice | |
23 | ||
24 | Copyright (C) The Internet Society (2006). | |
25 | ||
26 | Abstract | |
27 | ||
28 | This document describes the structure, content, construction, and | |
29 | semantics of language tags for use in cases where it is desirable to | |
30 | indicate the language used in an information object. It also | |
31 | describes how to register values for use in language tags and the | |
32 | creation of user-defined extensions for private interchange. This | |
33 | document, in combination with RFC 4647, replaces RFC 3066, which | |
34 | replaced RFC 1766. | |
35 | ||
58 | Phillips & Davis Best Current Practice [Page 1] | |
60 | RFC 4646 Tags for Identifying Languages September 2006 | |
61 | ||
62 | ||
63 | Table of Contents | |
64 | ||
65 | 1. Introduction ....................................................3 | |
66 | 2. The Language Tag ................................................4 | |
67 | 2.1. Syntax .....................................................4 | |
68 | 2.2. Language Subtag Sources and Interpretation .................7 | |
69 | 2.2.1. Primary Language Subtag .............................8 | |
70 | 2.2.2. Extended Language Subtags ..........................10 | |
71 | 2.2.3. Script Subtag ......................................11 | |
72 | 2.2.4. Region Subtag ......................................11 | |
73 | 2.2.5. Variant Subtags ....................................13 | |
74 | 2.2.6. Extension Subtags ..................................14 | |
75 | 2.2.7. Private Use Subtags ................................16 | |
76 | 2.2.8. Preexisting RFC 3066 Registrations .................16 | |
77 | 2.2.9. Classes of Conformance .............................17 | |
78 | 3. Registry Format and Maintenance ................................18 | |
79 | 3.1. Format of the IANA Language Subtag Registry ...............18 | |
80 | 3.2. Language Subtag Reviewer ..................................24 | |
81 | 3.3. Maintenance of the Registry ...............................24 | |
82 | 3.4. Stability of IANA Registry Entries ........................25 | |
83 | 3.5. Registration Procedure for Subtags ........................29 | |
84 | 3.6. Possibilities for Registration ............................32 | |
85 | 3.7. Extensions and Extensions Registry ........................34 | |
86 | 3.8. Initialization of the Registries ..........................37 | |
87 | 4. Formation and Processing of Language Tags ......................38 | |
88 | 4.1. Choice of Language Tag ....................................38 | |
89 | 4.2. Meaning of the Language Tag ...............................40 | |
90 | 4.3. Length Considerations .....................................41 | |
91 | 4.3.1. Working with Limited Buffer Sizes ..................42 | |
92 | 4.3.2. Truncation of Language Tags ........................43 | |
93 | 4.4. Canonicalization of Language Tags .........................44 | |
94 | 4.5. Considerations for Private Use Subtags ....................45 | |
95 | 5. IANA Considerations ............................................46 | |
96 | 5.1. Language Subtag Registry ..................................46 | |
97 | 5.2. Extensions Registry .......................................47 | |
98 | 6. Security Considerations ........................................48 | |
99 | 7. Character Set Considerations ...................................48 | |
100 | 8. Changes from RFC 3066 ..........................................49 | |
101 | 9. References .....................................................52 | |
102 | 9.1. Normative References ......................................52 | |
103 | 9.2. Informative References ....................................53 | |
104 | Appendix A. Acknowledgements ......................................55 | |
105 | Appendix B. Examples of Language Tags (Informative) ...............56 | |
106 | ||
118 | ||
119 | 1. Introduction | |
120 | ||
121 | Human beings on our planet have, past and present, used a number of | |
122 | languages. There are many reasons why one would want to identify the | |
123 | language used when presenting or requesting information. | |
124 | ||
125 | A user's language preferences often need to be identified so that | |
126 | appropriate processing can be applied. For example, the user's | |
127 | language preferences in a Web browser can be used to select Web pages | |
128 | appropriately. Language preferences can also be used to select among | |
129 | tools (such as dictionaries) to assist in the processing or | |
130 | understanding of content in different languages. | |
131 | ||
132 | In addition, knowledge about the particular language used by some | |
133 | piece of information content might be useful or even required by some | |
134 | types of processing; for example, spell-checking, computer- | |
135 | synthesized speech, Braille transcription, or high-quality print | |
136 | renderings. | |
137 | ||
138 | One means of indicating the language used is by labeling the | |
139 | information content with an identifier or "tag". These tags can be | |
140 | used to specify user preferences when selecting information content, | |
141 | or for labeling additional attributes of content and associated | |
142 | resources. | |
143 | ||
144 | Tags can also be used to indicate additional language attributes of | |
145 | content. For example, indicating specific information about the | |
146 | dialect, writing system, or orthography used in a document or | |
147 | resource may enable the user to obtain information in a form that | |
148 | they can understand, or it can be important in processing or | |
149 | rendering the given content into an appropriate form or style. | |
150 | ||
151 | This document specifies a particular identifier mechanism (the | |
152 | language tag) and a registration function for values to be used to | |
153 | form tags. It also defines a mechanism for private use values and | |
154 | future extension. | |
155 | ||
156 | This document, in combination with [RFC4647], replaces [RFC3066], | |
157 | which replaced [RFC1766]. For a list of changes in this document, | |
158 | see Section 8. | |
159 | ||
160 | The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |
161 | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |
162 | document are to be interpreted as described in [RFC2119]. | |
163 | ||
174 | ||
175 | 2. The Language Tag | |
176 | ||
177 | Language tags are used to help identify languages, whether spoken, | |
178 | written, signed, or otherwise signaled, for the purpose of | |
179 | communication. This includes constructed and artificial languages, | |
180 | but excludes languages not intended primarily for human | |
181 | communication, such as programming languages. | |
182 | ||
183 | 2.1. Syntax | |
184 | ||
185 | The language tag is composed of one or more parts, known as | |
186 | "subtags". Each subtag consists of a sequence of alphanumeric | |
187 | characters. Subtags are distinguished and separated from one another | |
188 | by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a | |
189 | "primary language" subtag and a (possibly empty) series of subsequent | |
190 | subtags, each of which refines or narrows the range of languages | |
191 | identified by the overall tag. | |
192 | ||
193 | Usually, each type of subtag is distinguished by length, position in | |
194 | the tag, and content: subtags can be recognized solely by these | |
195 | features. The only exception to this is a fixed list of | |
196 | grandfathered tags registered under RFC 3066 [RFC3066]. This makes | |
197 | it possible to construct a parser that can extract and assign some | |
198 | semantic information to the subtags, even if the specific subtag | |
199 | values are not recognized. Thus, a parser need not have an up-to- | |
200 | date copy (or any copy at all) of the subtag registry to perform most | |
201 | searching and matching operations. | |
202 | ||
203 | ||
230 | ||
231 | The syntax of the language tag in ABNF [RFC4234] is: | |
232 | ||
233 | Language-Tag = langtag | |
234 | / privateuse ; private use tag | |
235 | / grandfathered ; grandfathered registrations | |
236 | ||
237 | langtag = (language | |
238 | ["-" script] | |
239 | ["-" region] | |
240 | *("-" variant) | |
241 | *("-" extension) | |
242 | ["-" privateuse]) | |
243 | ||
244 | language = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code | |
245 | / 4ALPHA ; reserved for future use | |
246 | / 5*8ALPHA ; registered language subtag | |
247 | ||
248 | extlang = *3("-" 3ALPHA) ; reserved for future use | |
249 | ||
250 | script = 4ALPHA ; ISO 15924 code | |
251 | ||
252 | region = 2ALPHA ; ISO 3166 code | |
253 | / 3DIGIT ; UN M.49 code | |
254 | ||
255 | variant = 5*8alphanum ; registered variants | |
256 | / (DIGIT 3alphanum) | |
257 | ||
258 | extension = singleton 1*("-" (2*8alphanum)) | |
259 | ||
260 | singleton = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT | |
261 | ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9" | |
262 | ; Single letters: x/X is reserved for private use | |
263 | ||
264 | privateuse = ("x"/"X") 1*("-" (1*8alphanum)) | |
265 | ||
266 | grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum)) | |
267 | ; grandfathered registration | |
268 | ; Note: i is the only singleton | |
269 | ; that starts a grandfathered tag | |
270 | ||
271 | alphanum = (ALPHA / DIGIT) ; letters and numbers | |
272 | ||
273 | Figure 1: Language Tag ABNF | |
274 | ||
275 | Note: There is a subtlety in the ABNF for 'variant': variants | |
276 | starting with a digit MAY be four characters long, while those | |
277 | starting with a letter MUST be at least five characters long. | |
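The grammar in Figure 1 can be approximated with a regular expression. The following is a minimal sketch in Python, not a reference implementation: the names `LANGTAG` and `parse_langtag` are illustrative assumptions, and the 'grandfathered' production is deliberately left out because those tags form a fixed list that cannot be recognized by shape alone.

```python
import re

# Transcription of the 'langtag' and 'privateuse' productions of Figure 1.
# Matching is case insensitive, as language tags are.
LANGTAG = re.compile(
    r"""
    (?:
        (?P<language>
            [a-z]{2,3}(?:-[a-z]{3}){0,3}    # ISO 639 code plus reserved extlang
          | [a-z]{4}                        # reserved for future use
          | [a-z]{5,8}                      # registered language subtag
        )
        (?:-(?P<script>[a-z]{4}))?          # ISO 15924 code
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))? # ISO 3166 code or UN M.49 code
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
        (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
        (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
      |
        (?P<privateonly>x(?:-[a-z0-9]{1,8})+)  # tag made only of private use subtags
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_langtag(tag):
    """Split a tag into its parts by shape alone; return None if it is not well-formed."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    if m.group("privateonly"):
        return {"privateuse": m.group("privateonly")}
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
        "extensions": m.group("extensions").lstrip("-"),
        "privateuse": m.group("privateuse"),
    }
```

This sketch checks only well-formedness. Deciding whether each subtag is actually registered would require consulting the IANA registry.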
278 | ||
286 | ||
287 | All subtags have a maximum length of eight characters and whitespace | |
288 | is not permitted in a language tag. For examples of language tags, | |
289 | see Appendix B. | |
290 | ||
291 | Note that although [RFC4234] refers to octets, the language tags | |
292 | described in this document are sequences of characters from the | |
293 | US-ASCII [ISO646] repertoire. Language tags MAY be used in documents | |
294 | and applications that use other encodings, so long as these encompass | |
295 | the US-ASCII repertoire. An example of this would be an XML document | |
296 | that uses the UTF-16LE [RFC2781] encoding of [Unicode]. | |
297 | ||
298 | The tags and their subtags, including private use and extensions, are | |
299 | to be treated as case insensitive: there exist conventions for the | |
300 | capitalization of some of the subtags, but these MUST NOT be taken to | |
301 | carry meaning. | |
302 | ||
303 | For example: | |
304 | ||
305 | o [ISO639-1] recommends that language codes be written in lowercase | |
306 | ('mn' Mongolian). | |
307 | ||
308 | o [ISO3166-1] recommends that country codes be capitalized ('MN' | |
309 | Mongolia). | |
310 | ||
311 | o [ISO15924] recommends that script codes use lowercase with the | |
312 | initial letter capitalized ('Cyrl' Cyrillic). | |
313 | ||
314 | However, in the tags defined by this document, the uppercase US-ASCII | |
315 | letters in the range 'A' through 'Z' are considered equivalent and | |
316 | mapped directly to their US-ASCII lowercase equivalents in the range | |
317 | 'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from | |
318 | "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of | |
319 | these variations conveys the same meaning: Mongolian written in the | |
320 | Cyrillic script as used in Mongolia. | |
321 | ||
322 | Although case distinctions do not carry meaning in language tags, | |
323 | consistent formatting and presentation of the tags will aid users. | |
324 | The format of the tags and subtags in the registry is RECOMMENDED. | |
325 | In this format, all non-initial two-letter subtags are uppercase, all | |
326 | non-initial four-letter subtags are titlecase, and all other subtags | |
327 | are lowercase. | |
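The case rules above can be sketched in a few lines of Python. The helper names `same_tag` and `format_case` are illustrative assumptions. The formatter applies the RECOMMENDED convention exactly as worded in this section, uniformly to every subtag, and does not check that the tag is well-formed.

```python
def same_tag(a, b):
    """Case carries no meaning, so compare tags after ASCII lowercasing."""
    return a.lower() == b.lower()


def format_case(tag):
    """Apply the RECOMMENDED presentation format to a tag.

    Non-initial two-letter subtags become uppercase, non-initial
    four-letter subtags become titlecase, and everything else
    (including the initial subtag) becomes lowercase.
    """
    subtags = tag.split("-")
    formatted = [subtags[0].lower()]
    for subtag in subtags[1:]:
        if len(subtag) == 2:
            formatted.append(subtag.upper())
        elif len(subtag) == 4:
            formatted.append(subtag.capitalize())
        else:
            formatted.append(subtag.lower())
    return "-".join(formatted)
```

For instance, `format_case("MN-cYRL-mn")` yields `"mn-Cyrl-MN"`, the same form used for the Mongolian example above.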
328 | ||
329 | ||
342 | ||
343 | 2.2. Language Subtag Sources and Interpretation | |
344 | ||
345 | The namespace of language tags and their subtags is administered by | |
346 | the Internet Assigned Numbers Authority (IANA) [RFC2860] according to | |
347 | the rules in Section 5 of this document. The Language Subtag | |
348 | Registry maintained by IANA is the source for valid subtags: other | |
349 | standards referenced in this section provide the source material for | |
350 | that registry. | |
351 | ||
352 | Terminology in this section: | |
353 | ||
354 | o Tag or tags refers to a complete language tag, such as | |
355 | "fr-Latn-CA". Examples of tags in this document are enclosed in | |
356 | double-quotes ("en-US"). | |
357 | ||
358 | o Subtag refers to a specific section of a tag, delimited by hyphen, | |
359 | such as the subtag 'Latn' in "fr-Latn-CA". Examples of subtags in | |
360 | this document are enclosed in single quotes ('Latn'). | |
361 | ||
362 | o Code or codes refers to values defined in external standards (and | |
363 | that are used as subtags in this document). For example, 'Latn' | |
364 | is an [ISO15924] script code that was used to define the 'Latn' | |
365 | script subtag for use in a language tag. Examples of codes in | |
366 | this document are enclosed in single quotes ('en', 'Latn'). | |
367 | ||
368 | The definitions in this section apply to the various subtags within | |
369 | the language tags defined by this document, excepting those | |
370 | "grandfathered" tags defined in Section 2.2.8. | |
371 | ||
372 | Language tags are designed so that each subtag type has unique length | |
373 | and content restrictions. These make identification of the subtag's | |
374 | type possible, even if the content of the subtag itself is | |
375 | unrecognized. This allows tags to be parsed and processed without | |
376 | reference to the latest version of the underlying standards or the | |
377 | IANA registry and makes the associated exception handling when | |
378 | parsing tags simpler. | |
379 | ||
380 | Subtags in the IANA registry that do not come from an underlying | |
381 | standard can only appear in specific positions in a tag. | |
382 | Specifically, they can only occur as primary language subtags or as | |
383 | variant subtags. | |
384 | ||
385 | Note that sequences of private use and extension subtags MUST occur | |
386 | at the end of the sequence of subtags and MUST NOT be interspersed | |
387 | with subtags defined elsewhere in this document. | |
388 | ||
389 | Single-letter and single-digit subtags are reserved for current or | |
390 | future use. These include the following current uses: | |
391 | ||
399 | o The single-letter subtag 'x' is reserved to introduce a sequence | |
400 | of private use subtags. The interpretation of any private use | |
401 | subtags is defined solely by private agreement and is not defined | |
402 | by the rules in this section or in any standard or registry | |
403 | defined in this document. | |
404 | ||
405 | o All other single-letter subtags are reserved to introduce | |
406 | standardized extension subtag sequences as described in | |
407 | Section 3.7. | |
408 | ||
409 | The single-letter subtag 'i' is used by some grandfathered tags, such | |
410 | as "i-enochian", where it always appears in the first position and | |
411 | cannot be confused with an extension. | |
412 | ||
413 | 2.2.1. Primary Language Subtag | |
414 | ||
415 | The primary language subtag is the first subtag in a language tag | |
416 | (with the exception of private use and certain grandfathered tags) | |
417 | and cannot be omitted. The following rules apply to the primary | |
418 | language subtag: | |
419 | ||
420 | 1. All two-character language subtags were defined in the IANA | |
421 | registry according to the assignments found in the standard ISO | |
422 | 639 Part 1, "ISO 639-1:2002, Codes for the representation of | |
423 | names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using | |
424 | assignments subsequently made by the ISO 639 Part 1 maintenance | |
425 | agency or governing standardization bodies. | |
426 | ||
427 | 2. All three-character language subtags were defined in the IANA | |
428 | registry according to the assignments found in ISO 639 Part 2, | |
429 | "ISO 639-2:1998 - Codes for the representation of names of | |
430 | languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or | |
431 | assignments subsequently made by the ISO 639 Part 2 maintenance | |
432 | agency or governing standardization bodies. | |
433 | ||
434 | 3. The subtags in the range 'qaa' through 'qtz' are reserved for | |
435 | private use in language tags. These subtags correspond to codes | |
436 | reserved by ISO 639-2 for private use. These codes MAY be used | |
437 | for non-registered primary language subtags (instead of using | |
438 | private use subtags following 'x-'). Please refer to Section 4.5 | |
439 | for more information on private use subtags. | |
440 | ||
441 | 4. All four-character language subtags are reserved for possible | |
442 | future standardization. | |
443 | ||
444 | 5. All language subtags of 5 to 8 characters in length in the IANA | |
445 | registry were defined via the registration process in Section 3.5 | |
446 | and MAY be used to form the primary language subtag. At the time | |
455 | this document was created, there were no examples of this kind of | |
456 | subtag and future registrations of this type will be discouraged: | |
457 | primary languages are strongly RECOMMENDED for registration with | |
458 | ISO 639, and proposals rejected by ISO 639/RA will be closely | |
459 | scrutinized before they are registered with IANA. | |
460 | ||
461 | 6. The single-character subtag 'x' as the primary subtag indicates | |
462 | that the language tag consists solely of subtags whose meaning is | |
463 | defined by private agreement. For example, in the tag "x-fr-CH", | |
464 | the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the | |
465 | French language or the country of Switzerland (or any other value | |
466 | in the IANA registry) unless there is a private agreement in | |
467 | place to do so. See Section 4.5. | |
468 | ||
469 | 7. The single-character subtag 'i' is used by some grandfathered | |
470 | tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other | |
471 | grandfathered tags have a primary language subtag in their first | |
472 | position.) | |
473 | ||
474 | 8. Other values MUST NOT be assigned to the primary subtag except by | |
475 | revision or update of this document. | |
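The rules above let a processor classify a primary subtag from its length and content without consulting the registry. The following Python sketch illustrates this. The function name `classify_primary` and the returned labels are assumptions made for the example, not terms defined by this document.

```python
def classify_primary(subtag):
    """Classify the first subtag of a tag by shape, following rules 1 to 8."""
    s = subtag.lower()
    if not (s.isascii() and s.isalpha()):
        return "invalid"
    if s == "x":
        return "private use sequence"          # rule 6
    if s == "i":
        return "grandfathered prefix"          # rule 7
    if len(s) == 2:
        return "ISO 639-1 code"                # rule 1
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private use language"      # rule 3
        return "ISO 639-2 code"                # rule 2
    if len(s) == 4:
        return "reserved"                      # rule 4
    if 5 <= len(s) <= 8:
        return "registered language"           # rule 5
    return "invalid"                           # rule 8
```

A shape-based label says nothing about whether the subtag is actually present in the IANA registry. It only narrows down where the subtag would have to come from.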
476 | ||
477 | Note: For languages that have both an ISO 639-1 two-character code | |
478 | and an ISO 639-2 three-character code, only the ISO 639-1 two- | |
479 | character code is defined in the IANA registry. | |
480 | ||
481 | Note: For languages that have no ISO 639-1 two-character code and for | |
482 | which the ISO 639-2/T (Terminology) code and the ISO 639-2/B | |
483 | (Bibliographic) codes differ, only the Terminology code is defined in | |
484 | the IANA registry. At the time this document was created, all | |
485 | languages that had both kinds of three-character code were also | |
486 | assigned a two-character code; it is not expected that future | |
487 | assignments of this nature will occur. | |
488 | ||
489 | Note: To avoid problems with versioning and subtag choice as | |
490 | experienced during the transition between RFC 1766 and RFC 3066, as | |
491 | well as the canonical nature of subtags defined by this document, the | |
492 | ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ | |
493 | RA-JAC) has included the following statement in [iso639.prin]: | |
494 | ||
495 | "A language code already in ISO 639-2 at the point of freezing ISO | |
496 | 639-1 shall not later be added to ISO 639-1. This is to ensure | |
497 | consistency in usage over time, since users are directed in Internet | |
498 | applications to employ the alpha-3 code when an alpha-2 code for that | |
499 | language is not available." | |
500 | ||
511 | In order to avoid instability in the canonical form of tags, if a | |
512 | two-character code is added to ISO 639-1 for a language for which a | |
513 | three-character code was already included in ISO 639-2, the two- | |
514 | character code MUST NOT be registered. See Section 3.4. | |
515 | ||
516 | For example, if some content were tagged with 'haw' (Hawaiian), which | |
517 | currently has no two-character code, the tag would not be invalidated | |
518 | if ISO 639-1 were to assign a two-character code to the Hawaiian | |
519 | language at a later date. | |
520 | ||
521 | For example, one of the grandfathered IANA registrations is | |
522 | "i-enochian". The subtag 'enochian' could be registered in the IANA | |
523 | registry as a primary language subtag (assuming that ISO 639 does not | |
524 | register this language first), making tags such as "enochian-AQ" and | |
525 | "enochian-Latn" valid. | |
526 | ||
527 | 2.2.2. Extended Language Subtags | |
528 | ||
529 | The following rules apply to the extended language subtags: | |
530 | ||
531 | 1. Three-letter subtags immediately following the primary subtag are | |
532 | reserved for future standardization, anticipating work that is | |
533 | currently under way on ISO 639. | |
534 | ||
535 | 2. Extended language subtags MUST follow the primary subtag and | |
536 | precede any other subtags. | |
537 | ||
538 | 3. There MAY be up to three extended language subtags. | |
539 | ||
540 | 4. Extended language subtags MUST NOT be registered or used to form | |
541 | language tags. Their syntax is described here so that | |
542 | implementations can be compatible with any future revision of | |
543 | this document that does provide for their registration. | |
544 | ||
545 | Extended language subtag records, once they appear in the registry, | |
546 | MUST include exactly one 'Prefix' field indicating an appropriate | |
547 | language subtag or sequence of subtags that MUST always appear as a | |
548 | prefix to the extended language subtag. | |
549 | ||
550 | Example: In a future revision or update of this document, the tag | |
551 | "zh-gan" (registered under RFC 3066) might become a valid non- | |
552 | grandfathered (that is, redundant) tag in which the subtag 'gan' | |
553 | might represent the Chinese dialect 'Gan'. | |
554 | ||
566 | ||
567 | 2.2.3. Script Subtag | |
568 | ||
569 | Script subtags are used to indicate the script or writing system | |
570 | variations that distinguish the written forms of a language or its | |
571 | dialects. The following rules apply to the script subtags: | |
572 | ||
573 | 1. All four-character subtags were defined according to | |
574 | [ISO15924]--"Codes for the representation of names of scripts": | |
575 | alpha-4 script codes, or subsequently assigned by the ISO 15924 | |
576 | maintenance agency or governing standardization bodies, denoting | |
577 | the script or writing system used in conjunction with this | |
578 | language. | |
579 | ||
580 | 2. Script subtags MUST immediately follow the primary language | |
581 | subtag and all extended language subtags and MUST occur before | |
582 | any other type of subtag described below. | |
583 | ||
584 | 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private | |
585 | use in language tags. These subtags correspond to codes reserved | |
586 | by ISO 15924 for private use. These codes MAY be used for non- | |
587 | registered script values. Please refer to Section 4.5 for more | |
588 | information on private use subtags. | |
589 | ||
590 | 4. Script subtags MUST NOT be registered using the process in | |
591 | Section 3.5 of this document. Variant subtags MAY be considered | |
592 | for registration for that purpose. | |
593 | ||
594 | 5. There MUST be at most one script subtag in a language tag, and | |
595 | the script subtag SHOULD be omitted when it adds no | |
596 | distinguishing value to the tag or when the primary language | |
597 | subtag's record includes a Suppress-Script field listing the | |
598 | applicable script subtag. | |
599 | ||
600 | Example: "sr-Latn" represents Serbian written using the Latin script. | |
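As an illustration of rule 3, the private use script range can be tested with a simple string comparison. This is a sketch only, and the function name `is_private_use_script` is an assumption for the example.

```python
def is_private_use_script(subtag):
    """Return True for script subtags in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    if len(s) != 4 or not (s.isascii() and s.isalpha()):
        return False
    # For equal-length lowercase ASCII strings, lexicographic order
    # matches the alphabetical order of the codes.
    return "qaaa" <= s <= "qabx"
```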
601 | ||
602 | 2.2.4. Region Subtag | |
603 | ||
604 | Region subtags are used to indicate linguistic variations associated | |
605 | with or appropriate to a specific country, territory, or region. | |
606 | Typically, a region subtag is used to indicate regional dialects or | |
607 | usage, or region-specific spelling conventions. A region subtag can | |
608 | also be used to indicate that content is expressed in a way that is | |
609 | appropriate for use throughout a region, for instance, Spanish | |
610 | content tailored to be useful throughout Latin America. | |
611 | ||
623 | The following rules apply to the region subtags: | |
624 | ||
625 | 1. Region subtags MUST follow any language, extended language, or | |
626 | script subtags and MUST precede all other subtags. | |
627 | ||
628 | 2. All two-character subtags following the primary subtag were | |
629 | defined in the IANA registry according to the assignments found | |
630 | in [ISO3166-1] ("Codes for the representation of names of | |
631 | countries and their subdivisions -- Part 1: Country codes") using | |
632 | the list of alpha-2 country codes, or using assignments | |
633 | subsequently made by the ISO 3166 maintenance agency or governing | |
634 | standardization bodies. | |
635 | ||
636 | 3. All three-character subtags consisting of digit (numeric) | |
637 | characters following the primary subtag were defined in the IANA | |
638 | registry according to the assignments found in UN Standard | |
639 | Country or Area Codes for Statistical Use [UN_M.49] or | |
640 | assignments subsequently made by the governing standards body. | |
641 | Note that not all of the UN M.49 codes are defined in the IANA | |
642 | registry. The following rules define which codes are entered | |
643 | into the registry as valid subtags: | |
644 | ||
645 | A. UN numeric codes assigned to 'macro-geographical | |
646 | (continental)' or sub-regions MUST be registered in the | |
647 | registry. These codes are not associated with an assigned | |
648 | ISO 3166 alpha-2 code and represent supra-national areas, | |
649 | usually covering more than one nation, state, province, or | |
650 | territory. | |
651 | ||
652 | B. UN numeric codes for 'economic groupings' or 'other | |
653 | groupings' MUST NOT be registered in the IANA registry and | |
654 | MUST NOT be used to form language tags. | |
655 | ||
656 | C. UN numeric codes for countries or areas with ambiguous ISO | |
657 | 3166 alpha-2 codes, when entered into the registry, MUST be | |
658 | defined according to the rules in Section 3.4 and MUST be | |
659 | used to form language tags that represent the country or | |
660 | region for which they are defined. | |
661 | ||
662 | D. UN numeric codes for countries or areas for which there is an | |
663 | associated ISO 3166 alpha-2 code in the registry MUST NOT be | |
664 | entered into the registry and MUST NOT be used to form | |
665 | language tags. Note that the ISO 3166-based subtag in the | |
666 | registry MUST actually be associated with the UN M.49 code in | |
667 | question. | |
668 | ||
679 | E. UN numeric codes and ISO 3166 alpha-2 codes for countries or | |
680 | areas listed as eligible for registration in [RFC4645] but | |
681 | not presently registered MAY be entered into the IANA | |
682 | registry via the process described in Section 3.5. Once | |
683 | registered, these codes MAY be used to form language tags. | |
684 | ||
685 | F. All other UN numeric codes for countries or areas that do not | |
686 | have an associated ISO 3166 alpha-2 code MUST NOT be entered | |
687 | into the registry and MUST NOT be used to form language tags. | |
688 | For more information about these codes, see Section 3.4. | |
689 | ||
690 | 4. Note: The alphanumeric codes in Appendix X of the UN document | |
691 | MUST NOT be entered into the registry and MUST NOT be used to | |
692 | form language tags. (At the time this document was created, | |
693 | these values matched the ISO 3166 alpha-2 codes.) | |
694 | ||
695 | 5. There MUST be at most one region subtag in a language tag and the | |
696 | region subtag MAY be omitted, as when it adds no distinguishing | |
697 | value to the tag. | |
698 | ||
699 | 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are | |
700 | reserved for private use in language tags. These subtags | |
701 | correspond to codes reserved by ISO 3166 for private use. These | |
702 | codes MAY be used for private use region subtags (instead of | |
703 | using a private use subtag sequence). Please refer to | |
704 | Section 4.5 for more information on private use subtags. | |
705 | ||
706 | "de-CH" represents German ('de') as used in Switzerland ('CH'). | |
707 | ||
708 | "sr-Latn-CS" represents Serbian ('sr') written using Latin script | |
709 | ('Latn') as used in Serbia and Montenegro ('CS'). | |
710 | ||
711 | "es-419" represents Spanish ('es') appropriate to the UN-defined | |
712 | Latin America and Caribbean region ('419'). | |
713 | ||
714 | 2.2.5. Variant Subtags | |
715 | ||
716 | Variant subtags are used to indicate additional, well-recognized | |
717 | variations that define a language or its dialects that are not | |
718 | covered by other available subtags. The following rules apply to the | |
719 | variant subtags: | |
720 | ||
721 | 1. Variant subtags are not associated with any external standard. | |
722 | Variant subtags and their meanings are defined by the | |
723 | registration process defined in Section 3.5. | |
724 | ||
725 | 2. Variant subtags MUST follow all of the other defined subtags, but | |
726 | precede any extension or private use subtag sequences. | |
727 | ||
735 | 3. More than one variant MAY be used to form the language tag. | |
736 | ||
737 | 4. Variant subtags MUST be registered with IANA according to the | |
738 | rules in Section 3.5 of this document before being used to form | |
739 | language tags. In order to distinguish variants from other types | |
740 | of subtags, registrations MUST meet the following length and | |
741 | content restrictions: | |
742 | ||
743 | 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be | |
744 | at least five characters long. | |
745 | ||
746 | 2. Variant subtags that begin with a digit (0-9) MUST be at | |
747 | least four characters long. | |
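
The length and content restrictions above can be sketched as a shape check. This is a minimal illustration, not a validator: the name looks_like_variant is hypothetical, the eight-character upper bound is assumed from the general subtag limit in the ABNF of Section 2.1, and a subtag that passes is not thereby registered.

```python
import re

# Assumption: the upper bound of 8 characters comes from the general
# subtag length limit in Section 2.1, not from the two rules above.
_VARIANT_SHAPE = re.compile(
    r"(?:[A-Za-z][A-Za-z0-9]{4,7}"   # starts with a letter: 5 to 8 characters
    r"|[0-9][A-Za-z0-9]{3,7})"       # starts with a digit: 4 to 8 characters
)


def looks_like_variant(subtag):
    """Return True if subtag has the length and content of a variant."""
    return _VARIANT_SHAPE.fullmatch(subtag) is not None
```

A shape check of this kind only distinguishes variants from other subtag types; whether a given variant can be used still depends on the registry.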
748 | ||
749 | Variant subtag records in the language subtag registry MAY include | |
750 | one or more 'Prefix' fields, which indicate the language tag or tags | |
751 | that would make a suitable prefix (with other subtags, as | |
752 | appropriate) in forming a language tag with the variant. For | |
753 | example, the subtag 'nedis' has a Prefix of "sl", making it suitable | |
754 | to form language tags such as "sl-nedis" and "sl-IT-nedis", but not | |
755 | suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". | |
756 | ||
757 | "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. | |
758 | ||
759 | "de-CH-1996" represents German as used in Switzerland and as written | |
760 | using the spelling reform beginning in the year 1996 C.E. | |
761 | ||
762 | Most variants that share a prefix are mutually exclusive. For | |
763 | example, the German orthographic variations '1996' and '1901' SHOULD | |
764 | NOT be used in the same tag, as they represent the dates of different | |
765 | spelling reforms. A variant that can meaningfully be used in | |
766 | combination with another variant SHOULD include a 'Prefix' field in | |
767 | its registry record that lists that other variant. For example, if | |
768 | another German variant 'example' were created that made sense to use | |
769 | with '1996', then 'example' should include two Prefix fields: "de" | |
770 | and "de-1996". | |
771 | ||
772 | 2.2.6. Extension Subtags | |
773 | ||
774 | Extensions provide a mechanism for extending language tags for use in | |
775 | various applications. See Section 3.7. The following rules apply to | |
776 | extensions: | |
777 | ||
778 | 1. Extension subtags are separated from the other subtags defined | |
779 | in this document by a single-character subtag ("singleton"). | |
780 | The singleton MUST be one allocated to a registration authority | |
781 | via the mechanism described in Section 3.7 and MUST NOT be the | |
782 | letter 'x', which is reserved for private use subtag sequences. | |
783 | ||
791 | 2. Note: Private use subtag sequences starting with the singleton | |
792 | subtag 'x' are described in Section 2.2.7 below. | |
793 | ||
794 | 3. An extension MUST follow at least a primary language subtag. | |
795 | That is, a language tag cannot begin with an extension. | |
796 | Extensions extend language tags; they do not override or replace | |
797 | them. For example, "a-value" is not a well-formed language tag, | |
798 | while "de-a-value" is. | |
799 | ||
800 | 4. Each singleton subtag MUST appear at most one time in each tag | |
801 | (other than as a private use subtag). That is, singleton | |
802 | subtags MUST NOT be repeated. For example, the tag | |
803 | "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears | |
804 | twice. Note that the tag "en-a-bbb-x-a-ccc" is valid because | |
805 | the second appearance of the singleton 'a' is in a private use | |
806 | sequence. | |
807 | ||
808 | 5. Extension subtags MUST meet all of the requirements for the | |
809 | content and format of subtags defined in this document. | |
810 | ||
811 | 6. Extension subtags MUST meet whatever requirements are set by the | |
812 | document that defines their singleton prefix and whatever | |
813 | requirements are provided by the maintaining authority. | |
814 | ||
815 | 7. Each extension subtag MUST be from two to eight characters long | |
816 | and consist solely of letters or digits, with each subtag | |
817 | separated by a single '-'. | |
818 | ||
819 | 8. Each singleton MUST be followed by at least one extension | |
820 | subtag. For example, the tag "tlh-a-b-foo" is invalid because | |
821 | the first singleton 'a' is followed immediately by another | |
822 | singleton 'b'. | |
823 | ||
824 | 9. Extension subtags MUST follow all language, extended language, | |
825 | script, region, and variant subtags in a tag. | |
826 | ||
827 | 10. All subtags following the singleton and before another singleton | |
828 | are part of the extension. Example: In the tag "fr-a-Latn", the | |
829 | subtag 'Latn' does not represent the script subtag 'Latn' | |
830 | defined in the IANA Language Subtag Registry. Its meaning is | |
831 | defined by the extension 'a'. | |
832 | ||
833 | 11. In the event that more than one extension appears in a single | |
834 | tag, the tag SHOULD be canonicalized as described in | |
835 | Section 4.4. | |
836 | ||
847 | For example, if the prefix singleton 'r' and the shown subtags were | |
848 | defined, then the following tag would be a valid example: | |
849 | "en-Latn-GB-boont-r-extended-sequence-x-private". | |
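
The extension rules above might be checked along the following lines. This is a rough sketch under stated assumptions: the function name split_extensions and its return shape are hypothetical, tags are lowercased because language tags are case-insensitive, grandfathered tags are not handled, and no singleton is checked against the registry of extensions.

```python
def split_extensions(tag):
    """Split a tag into (base subtags, {singleton: subtags}, private use).

    Raises ValueError when a structural rule for extensions is broken.
    """
    subtags = tag.lower().split("-")
    for subtag in subtags:
        # Every subtag is 1 to 8 ASCII letters or digits.
        if not (subtag.isascii() and subtag.isalnum() and len(subtag) <= 8):
            raise ValueError(f"malformed subtag {subtag!r}")

    base, extensions, private = [], {}, []
    current = None  # singleton whose extension subtags are being collected
    for index, subtag in enumerate(subtags):
        if subtag == "x":
            # Everything after 'x' is private use, including singletons.
            private = subtags[index + 1:]
            if not private:
                raise ValueError("'x' must be followed by a subtag")
            break
        if len(subtag) == 1:
            if not base:
                raise ValueError("a tag cannot begin with an extension")
            if subtag in extensions:
                raise ValueError(f"singleton {subtag!r} is repeated")
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton {current!r} has no subtags")
            current = subtag
            extensions[current] = []
        elif current is not None:
            extensions[current].append(subtag)
        else:
            base.append(subtag)
    if current is not None and not extensions[current]:
        raise ValueError(f"singleton {current!r} has no subtags")
    return base, extensions, private
```

Because any one-character subtag is treated as a singleton, the subtags collected for an extension are necessarily two to eight characters long.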
850 | ||
851 | 2.2.7. Private Use Subtags | |
852 | ||
853 | Private use subtags are used to indicate distinctions in language | |
854 | important in a given context by private agreement. The following | |
855 | rules apply to private use subtags: | |
856 | ||
857 | 1. Private use subtags are separated from the other subtags defined | |
858 | in this document by the reserved single-character subtag 'x'. | |
859 | ||
860 | 2. Private use subtags MUST conform to the format and content | |
861 | constraints defined in the ABNF for all subtags. | |
862 | ||
863 | 3. Private use subtags MUST follow all language, extended language, | |
864 | script, region, variant, and extension subtags in the tag. | |
865 | Another way of saying this is that all subtags following the | |
866 | singleton 'x' MUST be considered private use. Example: The | |
867 | subtag 'US' in the tag "en-x-US" is a private use subtag. | |
868 | ||
869 | 4. A tag MAY consist entirely of private use subtags. | |
870 | ||
871 | 5. No source is defined for private use subtags. Use of private use | |
872 | subtags is by private agreement only. | |
873 | ||
874 | 6. Private use subtags are NOT RECOMMENDED where alternatives exist | |
875 | or for general interchange. See Section 4.5 for more information | |
876 | on private use subtag choice. | |
877 | ||
878 | For example: Users who wished to utilize codes from the Ethnologue | |
879 | publication of SIL International for language identification might | |
880 | agree to exchange tags such as "az-Arab-x-AZE-derbend". This example | |
881 | contains two private use subtags. The first is 'AZE' and the second | |
882 | is 'derbend'. | |
883 | ||
884 | 2.2.8. Preexisting RFC 3066 Registrations | |
885 | ||
886 | Existing IANA-registered language tags from RFC 1766 and/or RFC 3066 | |
887 | maintain their validity. These tags will be maintained in the | |
888 | registry in records of either the "grandfathered" or "redundant" | |
889 | type. Grandfathered tags contain one or more subtags that are not | |
890 | defined in the Language Subtag Registry (see Section 3). Redundant | |
891 | tags consist entirely of subtags defined above and whose independent | |
892 | registration is superseded by this document. For more information, | |
893 | see Section 3.8. | |
894 | ||
903 | It is important to note that all language tags formed under the | |
904 | guidelines in this document were either legal, well-formed tags or | |
905 | could have been registered under RFC 3066. | |
906 | ||
907 | 2.2.9. Classes of Conformance | |
908 | ||
909 | Implementations sometimes need to describe their capabilities with | |
910 | regard to the rules and practices described in this document. There | |
911 | are two classes of conforming implementations described by this | |
912 | document: "well-formed" processors and "validating" processors. | |
913 | Claims of conformance SHOULD explicitly reference one of these | |
914 | definitions. | |
915 | ||
916 | An implementation that claims to check for well-formed language tags | |
917 | MUST: | |
918 | ||
919 | o Check that the tag and all of its subtags, including extension and | |
920 | private use subtags, conform to the ABNF or that the tag is on the | |
921 | list of grandfathered tags. | |
922 | ||
923 | o Check that singleton subtags that identify extensions do not | |
924 | repeat. For example, the tag "en-a-xx-b-yy-a-zz" is not well- | |
925 | formed. | |
926 | ||
927 | Well-formed processors are strongly encouraged to implement the | |
928 | canonicalization rules contained in Section 4.4. | |
929 | ||
930 | An implementation that claims to be validating MUST: | |
931 | ||
932 | o Check that the tag is well-formed. | |
933 | ||
934 | o Specify the particular registry date for which the implementation | |
935 | performs validation of subtags. | |
936 | ||
937 | o Check that either the tag is a grandfathered tag, or that all | |
938 | language, script, region, and variant subtags consist of valid | |
939 | codes for use in language tags according to the IANA registry as | |
940 | of the particular date specified by the implementation. | |
941 | ||
942 | o Specify which, if any, extension RFCs as defined in Section 3.7 | |
943 | are supported, including version, revision, and date. | |
944 | ||
945 | o For any such extensions supported, check that all subtags used in | |
946 | that extension are valid. | |
947 | ||
948 | o For variant and extended language subtags, if the registry | |
949 | contains one or more 'Prefix' fields for that subtag, check that | |
950 | the tag matches at least one prefix. The tag matches if all the | |
959 | subtags in the 'Prefix' also appear in the tag. For example, the | |
960 | prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both | |
961 | the 'es' language subtag and 'CO' region subtag appear in the tag. | |
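
The prefix-matching check in the last item can be illustrated as follows. This is a literal reading of the rule as stated, offered as a sketch: the names matches_prefix and satisfies_prefixes are hypothetical, and comparison is done in lowercase because tags are case-insensitive.

```python
def matches_prefix(tag, prefix):
    """True if every subtag of prefix also appears in tag."""
    tag_subtags = set(tag.lower().split("-"))
    return all(sub in tag_subtags for sub in prefix.lower().split("-"))


def satisfies_prefixes(tag, prefixes):
    """True if the record has no 'Prefix' fields or at least one matches."""
    return not prefixes or any(matches_prefix(tag, p) for p in prefixes)
```

A stricter processor might choose to ignore subtags that follow the first singleton, since those belong to an extension or to private use; that refinement is not shown here.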
962 | ||
963 | 3. Registry Format and Maintenance | |
964 | ||
965 | This section defines the Language Subtag Registry and the maintenance | |
966 | and update procedures associated with it, as well as a registry for | |
967 | extensions to language tags (Section 3.7). | |
968 | ||
969 | The Language Subtag Registry contains a comprehensive list of all of | |
970 | the subtags valid in language tags. This allows implementers a | |
971 | straightforward and reliable way to validate language tags. The | |
972 | Language Subtag Registry will be maintained so that, except for | |
973 | extension subtags, it is possible to validate all of the subtags that | |
974 | appear in a language tag under the provisions of this document or its | |
975 | revisions or successors. In addition, the meaning of the various | |
976 | subtags will be unambiguous and stable over time. (The meaning of | |
977 | private use subtags, of course, is not defined by the IANA registry.) | |
978 | ||
979 | 3.1. Format of the IANA Language Subtag Registry | |
980 | ||
981 | The IANA Language Subtag Registry ("the registry") consists of a text | |
982 | file that is machine readable in the format described in this | |
983 | section, plus copies of the registration forms approved in accordance | |
984 | with the process described in Section 3.5. The existing registration | |
985 | forms for grandfathered and redundant tags taken from RFC 3066 will | |
986 | be maintained as part of the obsolete RFC 3066 registry. The | |
987 | remaining set of initial subtags will not have registration forms | |
988 | created for them. | |
989 | ||
990 | The registry is in the text format described below. This format was | |
991 | based on the record-jar format described in [record-jar]. | |
992 | ||
993 | Each line of text is limited to 72 characters, including all | |
994 | whitespace. Records are separated by lines containing only the | |
995 | sequence "%%" (%x25.25). | |
996 | ||
997 | Each field can be viewed as a single, logical line of ASCII | |
998 | characters, comprising a field-name and a field-body separated by a | |
999 | COLON character (%x3A). For convenience, the field-body portion of | |
1000 | this conceptual entity can be split into a multiple-line | |
1001 | representation; this is called "folding". The format of the registry | |
1002 | is described by the following ABNF (per [RFC4234]): | |
1003 | ||
1015 | registry = record *("%%" CRLF record) | |
1016 | record = 1*( field-name *SP ":" *SP field-body CRLF ) | |
1017 | field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)] | |
1018 | field-body = *(ASCCHAR/LWSP) | |
1019 | ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26 | |
1020 | UNICHAR = "&#x" 2*6HEXDIG ";" | |
1021 | ||
1022 | Figure 2: Registry Format ABNF | |
1023 | ||
1024 | The sequence '..' (%x2E.2E) in a field-body denotes a range of | |
1025 | values. Such a range represents all subtags of the same length that | |
1026 | are in alphabetic or numeric order within that range, including the | |
1027 | values explicitly mentioned. For example 'a..c' denotes the values | |
1028 | 'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and | |
1029 | '13'. | |
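
As an illustration of the range notation, the following sketch expands a field-body containing '..' into the values it denotes. The name expand_range is hypothetical, and the treatment of letter case (each value takes the case pattern of the first endpoint) is an assumption made for the example.

```python
def _letters_to_int(text):
    """Read ASCII letters as a base-26 number, ignoring case."""
    number = 0
    for ch in text.lower():
        number = number * 26 + (ord(ch) - ord("a"))
    return number


def _int_to_letters(number, width):
    """Inverse of _letters_to_int, producing width lowercase letters."""
    chars = []
    for _ in range(width):
        number, remainder = divmod(number, 26)
        chars.append(chr(ord("a") + remainder))
    return "".join(reversed(chars))


def expand_range(field_body):
    """Expand 'a..c' or '11..13' into the list of values it denotes."""
    if ".." not in field_body:
        return [field_body]
    start, end = field_body.split("..", 1)
    if not start or len(start) != len(end):
        raise ValueError("range endpoints must have the same length")
    if start.isdigit() and end.isdigit():
        width = len(start)
        return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    if not (start.isascii() and start.isalpha()
            and end.isascii() and end.isalpha()):
        raise ValueError("endpoints must be all letters or all digits")
    values = []
    for n in range(_letters_to_int(start), _letters_to_int(end) + 1):
        lower = _int_to_letters(n, len(start))
        # Assumption: reuse the case pattern of the first endpoint.
        values.append("".join(
            c.upper() if ref.isupper() else c for c, ref in zip(lower, start)
        ))
    return values
```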
1030 | ||
1031 | Characters from outside the US-ASCII [ISO646] repertoire, as well as | |
1032 | the AMPERSAND character ("&", %x26) when it occurs in a field-body, | |
1033 | are represented by a "Numeric Character Reference" using hexadecimal | |
1034 | notation in the style used by [XML10] (see | |
1035 | <http://www.w3.org/TR/REC-xml/#dt-charref>). This consists of the | |
1036 | sequence "&#x" (%x26.23.78) followed by a hexadecimal representation | |
1037 | of the character's code point in [ISO10646] followed by a closing | |
1038 | semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be | |
1039 | represented by the sequence "&#x20AC;". Note that the hexadecimal | |
1040 | notation MAY have between two and six digits. | |
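
The escaping convention can be sketched as a pair of helper functions. The names decode_field_body and encode_field_body are hypothetical; the sketch assumes that every escape names a valid code point and does no other validation of the field-body.

```python
import re

# Two to six hexadecimal digits, matching the UNICHAR production.
_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def decode_field_body(text):
    """Replace each hexadecimal character reference with its character."""
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


def encode_field_body(text):
    """Escape AMPERSAND and every character outside US-ASCII."""
    return "".join(
        ch if ch != "&" and ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )
```

Escaping the AMPERSAND itself is what keeps the two functions inverse to each other: a literal "&" in a description can never be mistaken for the start of a reference.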
1041 | ||
1042 | All fields whose field-body contains a date value use the "full-date" | |
1043 | format specified in [RFC3339]. For example: "2004-06-28" represents | |
1044 | June 28, 2004, in the Gregorian calendar. | |
1045 | ||
1046 | The first record in the file contains the single field whose field- | |
1047 | name is "File-Date" (see Figure 3). The field-body of this record | |
1048 | contains the last modification date of this copy of the registry, | |
1049 | making it possible to compare different versions of the registry. | |
1050 | The registry on the IANA website is the most current. Versions with | |
1051 | an older date than that one are not up-to-date. | |
1052 | ||
1053 | File-Date: 2004-06-28 | |
1054 | %% | |
1055 | ||
1056 | Figure 3: Example of the File-Date Record | |
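
A reader of the format described above might look like the following sketch. It assumes that folded continuation lines begin with whitespace (as implied by LWSP in the field-body production) and that records are kept as lists of (field-name, field-body) pairs so that repeated fields survive. The names parse_registry, file_date, and SAMPLE are hypothetical, and the sample is modeled on the 'nedis' record used as an example in this document.

```python
from datetime import date

SAMPLE = """\
File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""


def parse_registry(text):
    """Parse record-jar text into a list of records.

    Each record is a list of (field-name, field-body) pairs in file order.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "%%":
            if current:
                records.append(current)
            current = []
        elif line[0] in " \t" and current:
            # Folded line: append it to the body of the previous field.
            name, body = current[-1]
            current[-1] = (name, f"{body} {line.strip()}")
        else:
            name, _, body = line.partition(":")
            current.append((name.strip(), body.strip()))
    if current:
        records.append(current)
    return records


def file_date(records):
    """Return the File-Date of a parsed registry as a datetime.date."""
    name, body = records[0][0]
    if name != "File-Date":
        raise ValueError("first record must be the File-Date record")
    return date.fromisoformat(body)
```

Comparing the values returned by file_date for two copies of the registry is enough to tell which copy is newer.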
1057 | ||
1058 | Subsequent records represent subtags in the registry. Each of the | |
1059 | fields in each record MUST occur no more than once, unless otherwise | |
1060 | noted below. Each record MUST contain the following fields: | |
1061 | ||
1071 | o 'Type' | |
1072 | ||
1073 | * Type's field-value MUST consist of one of the following | |
1074 | strings: "language", "extlang", "script", "region", "variant", | |
1075 | "grandfathered", and "redundant" and denotes the type of tag or | |
1076 | subtag. | |
1077 | ||
1078 | o Either 'Subtag' or 'Tag' | |
1079 | ||
1080 | * Subtag's field-value contains the subtag being defined. This | |
1081 | field MUST only appear in records whose 'Type' has one of | |
1082 | these values: "language", "extlang", "script", "region", or | |
1083 | "variant". | |
1084 | ||
1085 | * Tag's field-value contains a complete language tag. This field | |
1086 | MUST only appear in records whose 'Type' has one of these | |
1087 | values: "grandfathered" or "redundant". Note that the field- | |
1088 | value will always follow the 'grandfathered' production in the | |
1089 | ABNF in Section 2.1. | |
1090 | ||
1091 | o Description | |
1092 | ||
1093 | * Description's field-value contains a non-normative description | |
1094 | of the subtag or tag. | |
1095 | ||
1096 | o Added | |
1097 | ||
1098 | * Added's field-value contains the date the record was added to | |
1099 | the registry. | |
1100 | ||
1101 | The 'Subtag' or 'Tag' field MUST use lowercase letters to form the | |
1102 | subtag or tag, with two exceptions. Subtags whose 'Type' field is | |
1103 | 'script' (in other words, subtags defined by ISO 15924) MUST use | |
1104 | titlecase. Subtags whose 'Type' field is 'region' (in other words, | |
1105 | subtags defined by ISO 3166) MUST use uppercase. These exceptions | |
1106 | mirror the use of case in the underlying standards. | |
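
The case conventions for the 'Subtag' field can be summarized in a few lines. This is an illustrative helper only; the name registry_case is hypothetical, and it covers the five subtag types but not whole tags in 'Tag' fields.

```python
_SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}


def registry_case(record_type, subtag):
    """Return subtag in the case used by the registry for its 'Type'."""
    if record_type not in _SUBTAG_TYPES:
        raise ValueError(f"not a subtag type: {record_type!r}")
    if record_type == "script":
        return subtag.capitalize()  # titlecase, as in ISO 15924
    if record_type == "region":
        return subtag.upper()       # uppercase, as in ISO 3166
    return subtag.lower()
```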
1107 | ||
1108 | The field 'Description' MAY appear more than one time and contains a | |
1109 | description of the tag or subtag in the record. At least one of the | |
1110 | 'Description' fields MUST be written or transcribed into the Latin | |
1111 | script; the same or additional fields MAY also include a description | |
1112 | in a non-Latin script. The 'Description' field is used for | |
1113 | identification purposes and SHOULD NOT be taken to represent the | |
1114 | actual native name of the language or variation or to be in any | |
1115 | particular language. Most descriptions are taken directly from | |
1116 | source standards such as ISO 639 or ISO 3166. | |
1117 | ||
1127 | Note: Descriptions in registry entries that correspond to ISO 639, | |
1128 | ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate | |
1129 | the meaning of that identifier as defined in the source standard at | |
1130 | the time it was added to the registry. The description does not | |
1131 | replace the content of the source standard itself. The descriptions | |
1132 | are not intended to be the English localized names for the subtags. | |
1133 | Localization or translation of language tag and subtag descriptions | |
1134 | is out of scope of this document. | |
1135 | ||
1136 | Each record MAY also contain the following fields: | |
1137 | ||
1138 | o Preferred-Value | |
1139 | ||
1140 | * For records of type 'language', 'extlang', 'script', 'region', | |
1141 | and 'variant', 'Preferred-Value' contains the subtag of the | |
1142 | same 'Type' that is preferred for forming the language tag. | |
1143 | ||
1144 | * For records of type 'grandfathered' and 'redundant', it contains | |
1145 | a canonical mapping to a complete language tag. | |
1146 | ||
1147 | o Deprecated | |
1148 | ||
1149 | * Deprecated's field-value contains the date the record was | |
1150 | deprecated. | |
1151 | ||
1152 | o Prefix | |
1153 | ||
1154 | * Prefix's field-value contains a language tag with which this | |
1155 | subtag MAY be used to form a new language tag, perhaps with | |
1156 | other subtags as well. This field MUST only appear in records | |
1157 | whose 'Type' field-value is 'variant' or 'extlang'. For | |
1158 | example, the 'Prefix' for the variant 'nedis' is 'sl', meaning | |
1159 | that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate | |
1160 | while the tag "is-nedis" is not. | |
1161 | ||
1162 | o Comments | |
1163 | ||
1164 | * Comments contains additional information about the subtag, as | |
1165 | deemed appropriate for understanding the registry and | |
1166 | implementing language tags using the subtag or tag. | |
1167 | ||
1168 | o Suppress-Script | |
1169 | ||
1170 | * Suppress-Script contains a script subtag that SHOULD NOT be | |
1171 | used to form language tags with the associated primary language | |
1172 | subtag. This field MUST only appear in records whose 'Type' | |
1173 | field-value is 'language'. See Section 4.1. | |
1174 | ||
1183 | The field 'Deprecated' MAY be added to any record via the maintenance | |
1184 | process described in Section 3.3 or via the registration process | |
1185 | described in Section 3.5. Usually, the addition of a 'Deprecated' | |
1186 | field is due to the action of one of the standards bodies, such as | |
1187 | ISO 3166, withdrawing a code. In some historical cases, it might not | |
1188 | have been possible to reconstruct the original deprecation date. For | |
1189 | these cases, an approximate date appears in the registry. Although | |
1190 | valid in language tags, subtags and tags with a 'Deprecated' field | |
1191 | are deprecated and validating processors SHOULD NOT generate these | |
1192 | subtags. Note that a record that contains a 'Deprecated' field and | |
1193 | no corresponding 'Preferred-Value' field has no replacement mapping. | |
1194 | ||
1195 | The field 'Preferred-Value' contains a mapping between the record in | |
1196 | which it appears and another tag or subtag. The value in this field | |
1197 | is STRONGLY RECOMMENDED as the best choice to represent the value of | |
1198 | this record when selecting a language tag. These values form three | |
1199 | groups: | |
1200 | ||
1201 | 1. ISO 639 language codes that were later withdrawn in favor of | |
1202 | other codes. These values are mostly a historical curiosity. | |
1203 | ||
1204 | 2. ISO 3166 region codes that have been withdrawn in favor of a new | |
1205 | code. This sometimes happens when a country changes its name or | |
1206 | administration in such a way that warrants a new region code. | |
1207 | ||
1208 | 3. Tags grandfathered from RFC 3066. In many cases, these tags have | |
1209 | become obsolete because the values they represent were later | |
1210 | encoded by ISO 639. | |
1211 | ||
1212 | Records that contain a 'Preferred-Value' field MUST also have a | |
1213 | 'Deprecated' field. This field contains a date of deprecation. | |
1214 | Thus, a language tag processor can use the registry to construct the | |
1215 | valid, non-deprecated set of subtags for a given date. In addition, | |
1216 | for any given tag, a processor can construct the set of valid | |
1217 | language tags that correspond to that tag for all dates up to the | |
1218 | date of the registry. The ability to do these mappings MAY be | |
1219 | beneficial to applications that are matching, selecting, or | |
1220 | filtering content based on its language tags. | |
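
One way a processor might construct the non-deprecated set of subtags for a given date is sketched below. The records and their dates are invented for illustration and do not appear in the registry; the name active_subtags is hypothetical, and treating a subtag as deprecated on the very date in its 'Deprecated' field is an assumption of this sketch.

```python
from datetime import date

# Hypothetical records with made-up subtags and dates, for illustration.
RECORDS = [
    {"Type": "variant", "Subtag": "example", "Added": "2001-01-01"},
    {"Type": "variant", "Subtag": "oldform", "Added": "2001-01-01",
     "Deprecated": "2004-06-01", "Preferred-Value": "example"},
    {"Type": "variant", "Subtag": "newform", "Added": "2005-03-01"},
]


def active_subtags(records, as_of):
    """Return the (Type, Subtag) pairs registered and not deprecated as_of."""
    active = set()
    for record in records:
        if "Subtag" not in record:
            continue  # skip File-Date, grandfathered, and redundant records
        if date.fromisoformat(record["Added"]) > as_of:
            continue  # not yet registered on that date
        deprecated = record.get("Deprecated")
        if deprecated is not None and date.fromisoformat(deprecated) <= as_of:
            continue  # assumption: deprecated from the listed date onward
        active.add((record["Type"], record["Subtag"]))
    return active
```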
1221 | ||
1222 | Note that 'Preferred-Value' mappings in records of type 'region' | |
1223 | sometimes do not represent exactly the same meaning as the original | |
1224 | value. There are many reasons for a country code to be changed, and | |
1225 | the effect this has on the formation of language tags will depend on | |
1226 | the nature of the change in question. | |
1227 | ||
1228 | In particular, the 'Preferred-Value' field does not imply retagging | |
1229 | content that uses the affected subtag. | |
1230 | ||
1239 | The field 'Preferred-Value' MUST NOT be modified once created in the | |
1240 | registry. The field MAY be added to records of type "grandfathered" | |
1241 | and "region" according to the rules in Section 3.3. Otherwise the | |
1242 | field MUST NOT be added to any record already in the registry. | |
1243 | ||
1244 | The 'Preferred-Value' field in records of type "grandfathered" and | |
1245 | "redundant" contains whole language tags that are strongly | |
1246 | RECOMMENDED for use in place of the record's value. In many cases, | |
1247 | the mappings were created by deprecation of the tags during the | |
1248 | period before this document was adopted. For example, the tag | |
1249 | "no-nyn" was deprecated in favor of the ISO 639-1-defined language | |
1250 | code 'nn'. | |
1251 | ||
1252 | Records of type 'variant' MAY have more than one field of type | |
1253 | 'Prefix'. Additional fields of this type MAY be added to a 'variant' | |
1254 | record via the registration process. | |
1255 | ||
1256 | Records of type 'extlang' MUST have _exactly_ one 'Prefix' field. | |
1257 | ||
1258 | The field-value of the 'Prefix' field consists of a language tag | |
1259 | whose subtags are appropriate to use with this subtag. For example, | |
1260 | the variant subtag '1996' has a 'Prefix' field of "de". This means | |
1261 | that tags starting with the sequence "de-" are appropriate with this | |
1262 | subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while | |
1263 | the tag "fr-1996" is an inappropriate choice. | |
1264 | ||
1265 | The field of type 'Prefix' MUST NOT be removed from any record. The | |
1266 | field-value for this type of field MUST NOT be modified. | |
1267 | ||
1268 | The field 'Comments' MAY appear more than once per record. This | |
1269 | field MAY be inserted or changed via the registration process and no | |
1270 | guarantee of stability is provided. The content of this field is not | |
1271 | restricted, except by the need to register the information, the | |
1272 | suitability of the request, and by reasonable practical size | |
1273 | limitations. | |
1274 | ||
1275 | The field 'Suppress-Script' MUST only appear in records whose 'Type' | |
1276 | field-value is 'language'. This field MUST NOT appear more than one | |
1277 | time in a record. This field indicates a script used to write the | |
1278 | overwhelming majority of documents for the given language and that | |
1279 | therefore adds no distinguishing information to a language tag. It | |
1280 | helps ensure greater compatibility between the language tags | |
1281 | generated according to the rules in this document and language tags | |
1282 | and tag processors or consumers based on RFC 3066. For example, | |
1283 | virtually all Icelandic documents are written in the Latin script, | |
1284 | making the subtag 'Latn' redundant in the tag "is-Latn". | |
1285 | ||
1295 | 3.2. Language Subtag Reviewer | |
1296 | ||
1297 | The Language Subtag Reviewer is appointed by the IESG for an | |
1298 | indefinite term, subject to removal or replacement at the IESG's | |
1299 | discretion. The Language Subtag Reviewer moderates the ietf- | |
1300 | languages mailing list, responds to requests for registration, and | |
1301 | performs the other registry maintenance duties described in | |
1302 | Section 3.3. Only the Language Subtag Reviewer is permitted to | |
1303 | request IANA to change, update, or add records to the Language Subtag | |
1304 | Registry. | |
1305 | ||
1306 | The performance or decisions of the Language Subtag Reviewer MAY be | |
1307 | appealed to the IESG under the same rules as other IETF decisions | |
1308 | (see [RFC2026]). The IESG can reverse or overturn the decision of | |
1309 | the Language Subtag Reviewer, provide guidance, or take other | |
1310 | appropriate actions. | |
1311 | ||
1312 | 3.3. Maintenance of the Registry | |
1313 | ||
1314 | Maintenance of the registry requires that as codes are assigned or | |
1315 | withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language | |
1316 | Subtag Reviewer MUST evaluate each change, determine whether it | |
1317 | conflicts with existing registry entries, and submit the information | |
1318 | to IANA for inclusion in the registry. If a change takes place and | |
1319 | the Language Subtag Reviewer does not do this in a timely manner, | |
1320 | then any interested party MAY use the procedure in Section 3.5 to | |
1321 | register the appropriate update. | |
1322 | ||
1323 | Note: The redundant and grandfathered entries together are the | |
1324 | complete list of tags registered under [RFC3066]. The redundant tags | |
1325 | are those that can now be formed using the subtags defined in the | |
1326 | registry together with the rules of Section 2.2. The grandfathered | |
1327 | entries include those that can never be legal under those same | |
1328 | provisions. | |
1329 | ||
1330 | The set of redundant and grandfathered tags is permanent and stable: | |
1331 | new entries in this section MUST NOT be added and existing entries | |
1332 | MUST NOT be removed. Records of type 'grandfathered' MAY have their | |
1333 | type converted to 'redundant'; see item 12 in Section 3.6 for more | |
1334 | information. The decision-making process about which tags were | |
1335 | initially grandfathered and which were made redundant is described in | |
1336 | [RFC4645]. | |
1337 | ||
1338 | RFC 3066 tags that were deprecated prior to the adoption of this | |
1339 | document are part of the list of grandfathered tags, and their | |
1340 | component subtags were not included as registered variants (although | |
1341 | they remain eligible for registration). For example, the tag | |
1342 | "art-lojban" was deprecated in favor of the language subtag 'jbo'. | |
1343 | ||
1351 | The Language Subtag Reviewer MUST ensure that new subtags meet the | |
1352 | requirements in Section 4.1 or submit an appropriate alternate subtag | |
1353 | as described in that section. When either a change or addition to | |
1354 | the registry is needed, the Language Subtag Reviewer MUST prepare the | |
1355 | complete record, including all fields, and forward it to IANA for | |
1356 | insertion into the registry. Each record being modified or inserted | |
1357 | MUST be forwarded in a separate message. | |
1358 | ||
1359 | If a record represents a new subtag that does not currently exist in | |
1360 | the registry, then the message's subject line MUST include the word | |
1361 | "INSERT". If the record represents a change to an existing subtag, | |
1362 | then the subject line of the message MUST include the word "MODIFY". | |
1363 | The message MUST contain both the record for the subtag being | |
1364 | inserted or modified and the new File-Date record. Here is an | |
1365 | example of what the body of the message might contain: | |
1366 | ||
1367 | LANGUAGE SUBTAG MODIFICATION | |
1368 | File-Date: 2005-01-02 | |
1369 | %% | |
1370 | Type: variant | |
1371 | Subtag: nedis | |
1372 | Description: Natisone dialect | |
1373 | Description: Nadiza dialect | |
1374 | Added: 2003-10-09 | |
1375 | Prefix: sl | |
1376 | Comments: This is a comment shown | |
1377 | as an example. | |
1378 | %% | |
1379 | ||
1380 | Figure 4: Example of a Language Subtag Modification Form | |
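
The message body in Figure 4 can be read mechanically. The following is a minimal sketch, not part of this specification, of one way to split such a body into records. It assumes the record-jar conventions of Section 3.1: records are separated by "%%" lines, each field is "Name: value", and a line that begins with whitespace continues the previous field. The function name parse_record_jar is hypothetical.

```python
def parse_record_jar(text):
    """Split a record-jar body into records (dicts of field name -> list of values)."""
    records, current, last_field = [], {}, None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        if raw.strip() == "%%":
            # "%%" closes the record collected so far
            if current:
                records.append(current)
            current, last_field = {}, None
            continue
        if raw[0] in " \t" and last_field is not None:
            # assumed folding rule: leading whitespace continues the previous field
            current[last_field][-1] += " " + raw.strip()
            continue
        name, sep, body = raw.partition(":")
        if not sep:
            continue  # a heading such as "LANGUAGE SUBTAG MODIFICATION"
        last_field = name.strip()
        current.setdefault(last_field, []).append(body.strip())
    if current:
        records.append(current)
    return records


MESSAGE = """\
LANGUAGE SUBTAG MODIFICATION
File-Date: 2005-01-02
%%
Type: variant
Subtag: nedis
Description: Natisone dialect
Description: Nadiza dialect
Added: 2003-10-09
Prefix: sl
Comments: This is a comment shown
  as an example.
%%
"""

records = parse_record_jar(MESSAGE)
```

Values are kept in lists because a field such as 'Description' can appear more than once in a record.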
1381 | ||
1382 | Whenever an entry is created or modified in the registry, the | |
1383 | 'File-Date' record at the start of the registry is updated to reflect | |
1384 | the most recent modification date in the [RFC3339] "full-date" | |
1385 | format. | |
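
As an illustration only, the message forwarded to IANA could be assembled as sketched below. The subject wording is an assumption (this document requires only that the subject contain the word "INSERT" or "MODIFY"), and build_registry_message is a hypothetical helper name. Python's date.isoformat() yields the YYYY-MM-DD form used by the [RFC3339] "full-date" production.

```python
from datetime import date


def build_registry_message(fields, exists_in_registry, file_date):
    """Return (subject, body) for one record forwarded to IANA.

    fields: list of (name, value) pairs in the order they should appear.
    exists_in_registry: True for a change to an existing subtag, False for a new one.
    file_date: datetime.date to place in the new File-Date record.
    """
    keyword = "MODIFY" if exists_in_registry else "INSERT"
    subject = f"Language Subtag Registry: {keyword}"  # hypothetical wording
    lines = [
        "LANGUAGE SUBTAG MODIFICATION",
        f"File-Date: {file_date.isoformat()}",  # YYYY-MM-DD
        "%%",
    ]
    lines += [f"{name}: {value}" for name, value in fields]
    lines.append("%%")
    return subject, "\n".join(lines) + "\n"


subject, body = build_registry_message(
    [("Type", "variant"), ("Subtag", "nedis"), ("Prefix", "sl")],
    exists_in_registry=True,
    file_date=date(2005, 1, 2),
)
```

Each call produces exactly one record, which matches the requirement that every record be forwarded in a separate message.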
1386 | ||
1387 | Before forwarding a new registration to IANA, the Language Subtag | |
1388 | Reviewer MUST ensure that values in the 'Subtag' field match case | |
1389 | according to the description in Section 3.1. | |
1390 | ||
1391 | 3.4. Stability of IANA Registry Entries | |
1392 | ||
1393 | The stability of entries and their meaning in the registry is | |
1394 | critical to the long-term stability of language tags. The rules in | |
1395 | this section guarantee that a specific language tag's meaning is | |
1396 | stable over time and will not change. | |
1397 | ||
1407 | These rules specifically deal with how changes to codes (including | |
1408 | withdrawal and deprecation of codes) maintained by ISO 639, ISO | |
1409 | 15924, ISO 3166, and UN M.49 are reflected in the IANA Language | |
1410 | Subtag Registry. Assignments to the IANA Language Subtag Registry | |
1411 | MUST follow these stability rules: | |
1412 | ||
1413 | 1. Values in the fields 'Type', 'Subtag', 'Tag', 'Added', | |
1414 | 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are | |
1415 | guaranteed to be stable over time. | |
1416 | ||
1417 | 2. Values in the 'Description' field MUST NOT be changed in a way | |
1418 | that would invalidate previously-existing tags. They MAY be | |
1419 | broadened somewhat in scope, changed to add information, or | |
1420 | adapted to the most common modern usage. For example, countries | |
1421 | occasionally change their official names; a historical example | |
1422 | of this would be "Upper Volta" changing to "Burkina Faso". | |
1423 | ||
1424 | 3. Values in the field 'Prefix' MAY be added to records of type | |
1425 | 'variant' via the registration process. | |
1426 | ||
1427 | 4. Values in the field 'Prefix' MAY be modified, so long as the | |
1428 | modifications broaden the set of prefixes. That is, a prefix | |
1429 | MAY be replaced by one of its own prefixes. For example, the | |
1430 | prefix "en-US" could be replaced by "en", but not by the | |
1431 | prefixes "en-Latn", "fr", or "en-US-boont". If one of those | |
1432 | prefixes were needed, a new Prefix SHOULD be registered. | |
1433 | ||
1434 | 5. Values in the field 'Prefix' MUST NOT be removed. | |
1435 | ||
1436 | 6. The field 'Comments' MAY be added, changed, or removed | |
1437 | via the registration process or any of the processes or | |
1438 | considerations described in this section. | |
1439 | ||
1440 | 7. The field 'Suppress-Script' MAY be added or removed via the | |
1441 | registration process. | |
1442 | ||
1443 | 8. Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not | |
1444 | conflict with existing subtags of the associated type and whose | |
1445 | meaning is not the same as an existing subtag of the same type | |
1446 | are entered into the IANA registry as new records. | |
1447 | ||
1448 | 9. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are | |
1449 | withdrawn by their respective maintenance or registration | |
1450 | authority remain valid in language tags. A 'Deprecated' field | |
1451 | containing the date of withdrawal is added to the record. If a | |
1452 | new record of the same type is added that represents a | |
1463 | replacement value, then a 'Preferred-Value' field MAY also be | |
1464 | added. The registration process MAY be used to add comments | |
1465 | about the withdrawal of the code by the respective standard. | |
1466 | ||
1467 | Example | |
1468 | The region code 'TL' was assigned to the country 'Timor- | |
1469 | Leste', replacing the code 'TP' (which was assigned to 'East | |
1470 | Timor' when it was under administration by Portugal). The | |
1471 | subtag 'TP' remains valid in language tags, but its record | |
1472 | contains a 'Preferred-Value' of 'TL' and its field | |
1473 | 'Deprecated' contains the date the new code was assigned | |
1474 | ('2004-07-06'). | |
1475 | ||
1476 | 10. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict | |
1477 | with existing subtags of the associated type, including subtags | |
1478 | that are deprecated, MUST NOT be entered into the registry. The | |
1479 | following additional considerations apply to subtag values that | |
1480 | are reassigned: | |
1481 | ||
1482 | A. For ISO 639 codes, if the newly assigned code's meaning is | |
1483 | not represented by a subtag in the IANA registry, the | |
1484 | Language Subtag Reviewer, as described in Section 3.5, SHALL | |
1485 | prepare a proposal for entering in the IANA registry as soon | |
1486 | as practical a registered language subtag as an alternate | |
1487 | value for the new code. The form of the registered language | |
1488 | subtag will be at the discretion of the Language Subtag | |
1489 | Reviewer and MUST conform to other restrictions on language | |
1490 | subtags in this document. | |
1491 | ||
1492 | B. For all subtags whose meaning is derived from an external | |
1493 | standard (i.e., ISO 639, ISO 15924, ISO 3166, or UN M.49), | |
1494 | if a new meaning is assigned to an existing code and the new | |
1495 | meaning broadens the meaning of that code, then the meaning | |
1496 | for the associated subtag MAY be changed to match. The | |
1497 | meaning of a subtag MUST NOT be narrowed, however, as this | |
1498 | can result in an unknown proportion of the existing uses of | |
1499 | a subtag becoming invalid. Note: The ISO 639 maintenance | |
1500 | agency/registration authority (MA/RA) has adopted a similar | |
1501 | stability policy. | |
1502 | ||
1503 | C. For ISO 15924 codes, if the newly assigned code's meaning is | |
1504 | not represented by a subtag in the IANA registry, the | |
1505 | Language Subtag Reviewer, as described in Section 3.5, SHALL | |
1506 | prepare a proposal for entering in the IANA registry as soon | |
1507 | as practical a registered variant subtag as an alternate | |
1508 | value for the new code. The form of the registered variant | |
1519 | subtag will be at the discretion of the Language Subtag | |
1520 | Reviewer and MUST conform to other restrictions on variant | |
1521 | subtags in this document. | |
1522 | ||
1523 | D. For ISO 3166 codes, if the newly assigned code's meaning is | |
1524 | associated with the same UN M.49 code as another 'region' | |
1525 | subtag, then the existing region subtag remains as the | |
1526 | preferred value for that region and no new entry is created. | |
1527 | A comment MAY be added to the existing region subtag | |
1528 | indicating the relationship to the new ISO 3166 code. | |
1529 | ||
1530 | E. For ISO 3166 codes, if the newly assigned code's meaning is | |
1531 | associated with a UN M.49 code that is not represented by an | |
1532 | existing region subtag, then the Language Subtag Reviewer, | |
1533 | as described in Section 3.5, SHALL prepare a proposal for | |
1534 | entering the appropriate UN M.49 country code as an entry in | |
1535 | the IANA registry. | |
1536 | ||
1537 | F. For ISO 3166 codes, if there is no associated UN numeric | |
1538 | code, then the Language Subtag Reviewer SHALL petition the | |
1539 | UN to create one. If there is no response from the UN | |
1540 | within ninety days of the request being sent, the Language | |
1541 | Subtag Reviewer SHALL prepare a proposal for entering in the | |
1542 | IANA registry as soon as practical a registered variant | |
1543 | subtag as an alternate value for the new code. The form of | |
1544 | the registered variant subtag will be at the discretion of | |
1545 | the Language Subtag Reviewer and MUST conform to other | |
1546 | restrictions on variant subtags in this document. This | |
1547 | situation is very unlikely to ever occur. | |
1548 | ||
1549 | 11. UN M.49 has codes for both countries and areas (such as '276' | |
1550 | for Germany) and geographical regions and sub-regions (such as | |
1551 | '150' for Europe). UN M.49 country or area codes for which | |
1552 | there is no corresponding ISO 3166 code SHOULD NOT be | |
1553 | registered, except as a surrogate for an ISO 3166 code that is | |
1554 | blocked from registration by an existing subtag. If such a code | |
1555 | becomes necessary, then the registration authority for ISO 3166 | |
1556 | SHOULD first be petitioned to assign a code to the region. If | |
1557 | the petition for a code assignment by ISO 3166 is refused or not | |
1558 | acted on in a timely manner, the registration process described | |
1559 | in Section 3.5 MAY then be used to register the corresponding UN | |
1560 | M.49 code. At the time this document was written, there were | |
1561 | only four such codes: 830 (Channel Islands), 831 (Guernsey), 832 | |
1562 | (Jersey), and 833 (Isle of Man). This way, UN M.49 codes remain | |
1563 | available as the value of last resort in cases where ISO 3166 | |
1564 | reassigns a deprecated value in the registry. | |
1565 | ||
1575 | 12. Stability provisions apply to grandfathered tags with this | |
1576 | exception: should all of the subtags in a grandfathered tag | |
1577 | become valid subtags in the IANA registry, then the field 'Type' | |
1578 | in that record is changed from 'grandfathered' to 'redundant'. | |
1579 | Note that this will not affect language tags that match the | |
1580 | grandfathered tag, since these tags will now match valid | |
1581 | generative subtag sequences. For example, if the subtag 'gan' | |
1582 | in the language tag "zh-gan" were to be registered as an | |
1583 | extended language subtag, then the grandfathered tag "zh-gan" | |
1584 | would be deprecated (but existing content or implementations | |
1585 | that use "zh-gan" would remain valid). | |
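
Some of the rules above lend themselves to a mechanical check. The sketch below is one possible reading, not a normative algorithm. It covers rule 1 (fields that do not change once present), the rule 12 exception for 'Type', and rules 4 and 5 (a prefix can be broadened but not removed or narrowed). Records are assumed to be dicts mapping a field name to a list of values, and the names is_broadening and check_update are hypothetical.

```python
IMMUTABLE = ("Type", "Subtag", "Tag", "Added", "Deprecated", "Preferred-Value")


def is_broadening(old_prefix, new_prefix):
    """True if new_prefix equals old_prefix or is a leading run of its subtags."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    return len(new) <= len(old) and old[:len(new)] == new


def check_update(old, new):
    """Return a list of apparent stability violations in a proposed record change."""
    problems = []
    for field in IMMUTABLE:
        if field in old and old[field] != new.get(field):
            # rule 12: 'grandfathered' may become 'redundant'
            if (field == "Type" and old[field] == ["grandfathered"]
                    and new.get(field) == ["redundant"]):
                continue
            problems.append(f"{field} must not change")
    for prefix in old.get("Prefix", []):
        # every existing prefix must survive, either as-is or broadened
        if not any(is_broadening(prefix, p) for p in new.get("Prefix", [])):
            problems.append(f"Prefix '{prefix}' removed or narrowed")
    return problems


# The examples given in rule 4:
assert is_broadening("en-US", "en")
assert not is_broadening("en-US", "en-Latn")
assert not is_broadening("en-US", "fr")
assert not is_broadening("en-US", "en-US-boont")
```

Adding a 'Deprecated' or 'Preferred-Value' field to a record that lacks one is not flagged, since rule 9 allows those fields to be added.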
1586 | ||
1587 | 3.5. Registration Procedure for Subtags | |
1588 | ||
1589 | The procedure given here MUST be used by anyone who wants to use a | |
1590 | subtag not currently in the IANA Language Subtag Registry. | |
1591 | ||
1592 | Only subtags of type 'language' and 'variant' will be considered for | |
1593 | independent registration of new subtags. Handling of subtags needed | |
1594 | for stability and subtags necessary to keep the registry synchronized | |
1595 | with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits | |
1596 | defined by this document is described in Section 3.3. Stability | |
1597 | provisions are described in Section 3.4. | |
1598 | ||
1599 | This procedure MAY also be used to register or alter the information | |
1600 | for the 'Description', 'Comments', 'Deprecated', or 'Prefix' fields | |
1601 | in a subtag's record as described in Section 3.4. Changes to all | |
1602 | other fields in the IANA registry are NOT permitted. | |
1603 | ||
1604 | Registering a new subtag or requesting modifications to an existing | |
1605 | tag or subtag starts with the requester filling out the registration | |
1606 | form reproduced below. Note that each response is not limited in | |
1607 | size so that the request can adequately describe the registration. | |
1608 | The fields in the "Record Requested" section SHOULD follow the | |
1609 | requirements in Section 3.1. | |
1610 | ||
1631 | LANGUAGE SUBTAG REGISTRATION FORM | |
1632 | 1. Name of requester: | |
1633 | 2. E-mail address of requester: | |
1634 | 3. Record Requested: | |
1635 | ||
1636 | Type: | |
1637 | Subtag: | |
1638 | Description: | |
1639 | Prefix: | |
1640 | Preferred-Value: | |
1641 | Deprecated: | |
1642 | Suppress-Script: | |
1643 | Comments: | |
1644 | ||
1645 | 4. Intended meaning of the subtag: | |
1646 | 5. Reference to published description | |
1647 | of the language (book or article): | |
1648 | 6. Any other relevant information: | |
1649 | ||
1650 | Figure 5: The Language Subtag Registration Form | |
1651 | ||
1652 | The subtag registration form MUST be sent to | |
1653 | <ietf-languages@iana.org> for a two-week review period before it can | |
1654 | be submitted to IANA. (This is an open list and can be joined by | |
1655 | sending a request to <ietf-languages-request@iana.org>.) | |
1656 | ||
1657 | Variant subtags are usually registered for use with a particular | |
1658 | range of language tags. For example, the subtag 'rozaj' is intended | |
1659 | for use with language tags that start with the primary language | |
1660 | subtag "sl", since Resian is a dialect of Slovenian. Thus, the | |
1661 | subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" | |
1662 | or "sl-IT-rozaj". This information is stored in the 'Prefix' field | |
1663 | in the registry. Variant registration requests SHOULD include at | |
1664 | least one 'Prefix' field in the registration form. | |
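
The relationship between a variant and its 'Prefix' values can be checked as sketched below. This is an advisory check under stated assumptions: comparison is case-insensitive, and a tag is taken to match when the subtags preceding the variant begin with the subtags of some registered prefix. The function name variant_matches_prefix is hypothetical.

```python
def variant_matches_prefix(tag, variant, prefixes):
    """True if `variant` appears in `tag` after subtags that start with a registered prefix."""
    parts = tag.lower().split("-")
    variant = variant.lower()
    if variant not in parts:
        return False
    before = parts[:parts.index(variant)]  # subtags to the left of the variant
    for prefix in prefixes:
        wanted = prefix.lower().split("-")
        if before[:len(wanted)] == wanted:
            return True
    return False


# 'rozaj' is registered for use with the prefix "sl"
assert variant_matches_prefix("sl-Latn-rozaj", "rozaj", ["sl"])
assert variant_matches_prefix("sl-IT-rozaj", "rozaj", ["sl"])
assert not variant_matches_prefix("de-rozaj", "rozaj", ["sl"])
```

Because the 'Prefix' field is described as a guide to usage, a failed match is better reported as a warning than treated as a syntax error.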
1665 | ||
1666 | Extended language subtags are reserved for future standardization. | |
1667 | These subtags will be REQUIRED to include exactly one 'Prefix' field | |
1668 | once they are allowed for registration. | |
1669 | ||
1670 | The 'Prefix' field for a given registered subtag exists in the IANA | |
1671 | registry as a guide to usage. Additional prefixes MAY be added by | |
1672 | filing an additional registration form. In that form, the "Any other | |
1673 | relevant information:" field MUST indicate that it is the addition of | |
1674 | a prefix. | |
1675 | ||
1676 | Requests to add a prefix to a variant subtag that imply a different | |
1677 | semantic meaning will probably be rejected. For example, a request | |
1678 | to add the prefix "de" to the subtag 'nedis' so that the tag | |
1687 | "de-nedis" represented some German dialect would be rejected. The | |
1688 | 'nedis' subtag represents a particular Slovenian dialect and the | |
1689 | additional registration would change the semantic meaning assigned to | |
1690 | the subtag. A separate subtag SHOULD be proposed instead. | |
1691 | ||
1692 | The 'Description' field MUST contain a description of the tag being | |
1693 | registered written or transcribed into the Latin script; it MAY also | |
1694 | include a description in a non-Latin script. Non-ASCII characters | |
1695 | MUST be escaped using the syntax described in Section 3.1. The | |
1696 | 'Description' field is used for identification purposes; it does not | |
1697 | necessarily represent the actual native name of the language or | |
1698 | variation, nor does it need to be in any particular language. | |
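
A sketch of the escaping step follows. It assumes that the syntax in Section 3.1 is the XML-style hexadecimal numeric character reference ("&#x", the code point in hexadecimal, then ";") and that the ampersand itself is escaped the same way. Section 3.1 remains the authority on both points. The function name escape_field_body is hypothetical.

```python
def escape_field_body(text):
    """Replace non-ASCII characters and '&' with hexadecimal character references."""
    out = []
    for ch in text:
        if ord(ch) > 0x7F or ch == "&":
            out.append(f"&#x{ord(ch):X};")  # e.g. U+00E5 becomes "&#xE5;"
        else:
            out.append(ch)
    return "".join(out)


escaped = escape_field_body("Norwegian Bokm\u00e5l")
# escaped == "Norwegian Bokm&#xE5;l"
```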
1699 | ||
1700 | While the 'Description' field itself is not guaranteed to be stable | |
1701 | and errata corrections MAY be undertaken from time to time, attempts | |
1702 | to provide translations or transcriptions of entries in the registry | |
1703 | itself will probably be frowned upon by the community or rejected | |
1704 | outright, as changes of this nature have an impact on the provisions | |
1705 | in Section 3.4. | |
1706 | ||
1707 | When the two-week period has passed, the Language Subtag Reviewer | |
1708 | either forwards the record to be inserted or modified to | |
1709 | iana@iana.org according to the procedure described in Section 3.3, or | |
1710 | rejects the request because of significant objections raised on the | |
1711 | list or due to problems with constraints in this document (which MUST | |
1712 | be explicitly cited). The Language Subtag Reviewer MAY also extend | |
1713 | the review period in two-week increments to permit further | |
1714 | discussion. The Language Subtag Reviewer MUST indicate on the list | |
1715 | whether the registration has been accepted, rejected, or extended | |
1716 | following each two-week period. | |
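
The review timeline can be computed as sketched below. The sketch assumes that each period is exactly fourteen days counted from the date the form was sent to the list. The function name review_deadlines is hypothetical.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)


def review_deadlines(submitted, extensions=0):
    """Dates on which the Reviewer announces acceptance, rejection, or extension.

    submitted: datetime.date the form was sent to the list.
    extensions: number of additional two-week periods granted.
    """
    return [submitted + REVIEW_PERIOD * (n + 1) for n in range(extensions + 1)]


deadlines = review_deadlines(date(2006, 9, 1), extensions=1)
# deadlines == [date(2006, 9, 15), date(2006, 9, 29)]
```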
1717 | ||
1718 | Note that the Language Subtag Reviewer MAY raise objections on the | |
1719 | list if he or she so desires. The important thing is that the | |
1720 | objection MUST be made publicly. | |
1721 | ||
1722 | The applicant is free to modify a rejected application with | |
1723 | additional information and submit it again; this restarts the two- | |
1724 | week comment period. | |
1725 | ||
1726 | Decisions made by the Language Subtag Reviewer MAY be appealed to the | |
1727 | IESG [RFC2028] under the same rules as other IETF decisions | |
1728 | [RFC2026]. | |
1729 | ||
1730 | All approved registration forms are available online in the directory | |
1731 | http://www.iana.org/numbers.html under "languages". | |
1732 | ||
1743 | Updates or changes to existing records follow the same procedure as | |
1744 | new registrations. The Language Subtag Reviewer decides whether | |
1745 | there is consensus to update the registration following the two-week | |
1746 | review period; normally, objections by the original registrant will | |
1747 | carry extra weight in forming such a consensus. | |
1748 | ||
1749 | Registrations are permanent and stable. Once registered, subtags | |
1750 | will not be removed from the registry and will remain a valid way in | |
1751 | which to specify a specific language or variant. | |
1752 | ||
1753 | Note: The purpose of the "Description" in the registration form is to | |
1754 | aid people trying to verify whether a language is registered or what | |
1755 | language or language variation a particular subtag refers to. In | |
1756 | most cases, reference to an authoritative grammar or dictionary of | |
1757 | that language will be useful; in cases where no such work exists, | |
1758 | other well-known works describing that language or in that language | |
1759 | MAY be appropriate. The Language Subtag Reviewer decides what | |
1760 | constitutes "good enough" reference material. This requirement is | |
1761 | not intended to exclude particular languages or dialects due to the | |
1762 | size of the speaker population or lack of a standardized orthography. | |
1763 | Minority languages will be considered equally on their own merits. | |
1764 | ||
1765 | 3.6. Possibilities for Registration | |
1766 | ||
1767 | Possibilities for registration of subtags or information about | |
1768 | subtags include: | |
1769 | ||
1770 | o Primary language subtags for languages not listed in ISO 639 that | |
1771 | are not variants of any listed or registered language MAY be | |
1772 | registered. At the time this document was created, there were no | |
1773 | examples of this form of subtag. Before attempting to register a | |
1774 | language subtag, there MUST be an attempt to register the language | |
1775 | with ISO 639. Subtags MUST NOT be registered for codes that exist | |
1776 | in ISO 639-1 or ISO 639-2, that are under consideration by the ISO | |
1777 | 639 maintenance or registration authorities, or that have never | |
1778 | been attempted for registration with those authorities. If ISO | |
1779 | 639 has previously rejected a language for registration, it is | |
1780 | reasonable to assume that there must be additional, very | |
1781 | compelling evidence of need before it will be registered in the | |
1782 | IANA registry (to the extent that it is very unlikely that any | |
1783 | subtags of this type will be registered). | |
1784 | ||
1785 | o Dialect or other divisions or variations within a language, its | |
1786 | orthography, writing system, regional or historical usage, | |
1787 | transliteration or other transformation, or distinguishing | |
1788 | variation MAY be registered as variant subtags. An example is the | |
1789 | 'rozaj' subtag (the Resian dialect of Slovenian). | |
1790 | ||
1799 | o The addition or maintenance of fields (generally of an | |
1800 | informational nature) in Tag or Subtag records as described in | |
1801 | Section 3.1 and subject to the stability provisions in | |
1802 | Section 3.4. This includes descriptions, comments, deprecation | |
1803 | and preferred values for obsolete or withdrawn codes, or the | |
1804 | addition of script or extlang information to primary language | |
1805 | subtags. | |
1806 | ||
1807 | o The addition of records and related field value changes necessary | |
1808 | to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and | |
1809 | UN M.49 as described in Section 3.4. | |
1810 | ||
1811 | Subtags proposed for registration that would cause all or part of a | |
1812 | grandfathered tag to become redundant but whose meaning conflicts | |
1813 | with or alters the meaning of the grandfathered tag MUST be rejected. | |
1814 | ||
1815 | This document leaves the decision on what subtags or changes to | |
1816 | subtags are appropriate (or not) to the registration process | |
1817 | described in Section 3.5. | |
1818 | ||
1819 | Note: four-character primary language subtags are reserved to allow | |
1820 | for the possibility of alpha4 codes in some future addition to the | |
1821 | ISO 639 family of standards. | |
1822 | ||
1823 | ISO 639 defines a maintenance agency for additions to and changes in | |
1824 | the list of languages in ISO 639. This agency is: | |
1825 | ||
1826 | International Information Centre for Terminology (Infoterm) | |
1827 | Aichholzgasse 6/12, AT-1120 | |
1828 | Wien, Austria | |
1829 | Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 | |
1830 | ||
1831 | ISO 639-2 defines a maintenance agency for additions to and changes | |
1832 | in the list of languages in ISO 639-2. This agency is: | |
1833 | ||
1834 | Library of Congress | |
1835 | Network Development and MARC Standards Office | |
1836 | Washington, D.C. 20540 USA | |
1837 | Phone: +1 202 707 6237 Fax: +1 202 707 0115 | |
1838 | URL: http://www.loc.gov/standards/iso639-2 | |
1839 | ||
1855 | The maintenance agency for ISO 3166 (country codes) is: | |
1856 | ||
1857 | ISO 3166 Maintenance Agency | |
1858 | c/o International Organization for Standardization | |
1859 | Case postale 56 | |
1860 | CH-1211 Geneva 20 Switzerland | |
1861 | Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 | |
1862 | URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html | |
1863 | ||
1864 | The registration authority for ISO 15924 (script codes) is: | |
1865 | ||
1866 | Unicode Consortium Box 391476 | |
1867 | Mountain View, CA 94039-1476, USA | |
1868 | URL: http://www.unicode.org/iso15924 | |
1869 | ||
1870 | The Statistics Division of the United Nations Secretariat maintains | |
1871 | the Standard Country or Area Codes for Statistical Use and can be | |
1872 | reached at: | |
1873 | ||
1874 | Statistical Services Branch | |
1875 | Statistics Division | |
1876 | United Nations, Room DC2-1620 | |
1877 | New York, NY 10017, USA | |
1878 | ||
1879 | Fax: +1-212-963-0623 | |
1880 | E-mail: statistics@un.org | |
1881 | URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm | |
1882 | ||
1883 | 3.7. Extensions and Extensions Registry | |
1884 | ||
1885 | Extension subtags are those introduced by single-character subtags | |
1886 | ("singletons") other than 'x'. They are reserved for the generation | |
1887 | of identifiers that contain a language component and are compatible | |
1888 | with applications that understand language tags. | |
1889 | ||
1890 | The structure and form of extensions are defined by this document so | |
1891 | that implementations can be created that are forward compatible with | |
1892 | applications that might be created using singletons in the future. | |
1893 | In addition, defining a mechanism for maintaining singletons will | |
1894 | lend stability to this document by reducing the likely need for | |
1895 | future revisions or updates. | |
1896 | ||
1897 | Single-character subtags are assigned by IANA using the "IETF | |
1898 | Consensus" policy defined by [RFC2434]. This policy requires the | |
1899 | development of an RFC, which SHALL define the name, purpose, | |
1900 | processes, and procedures for maintaining the subtags. The | |
1901 | maintaining or registering authority, including name, contact email, | |
1911 | discussion list email, and URL location of the registry, MUST be | |
1912 | indicated clearly in the RFC. The RFC MUST specify or include each | |
1913 | of the following: | |
1914 | ||
1915 | o The specification MUST reference the specific version or revision | |
1916 | of this document that governs its creation and MUST reference this | |
1917 | section of this document. | |
1918 | ||
1919 | o The specification and all subtags defined by the specification | |
1920 | MUST follow the ABNF and other rules for the formation of tags and | |
1921 | subtags as defined in this document. In particular, it MUST | |
1922 | specify that case is not significant and that subtags MUST NOT | |
1923 | exceed eight characters in length. | |
1924 | ||
1925 | o The specification MUST specify a canonical representation. | |
1926 | ||
1927 | o The specification of valid subtags MUST be available over the | |
1928 | Internet and at no cost. | |
1929 | ||
1930 | o The specification MUST be in the public domain or available via a | |
1931 | royalty-free license acceptable to the IETF and specified in the | |
1932 | RFC. | |
1933 | ||
1934 | o The specification MUST be versioned, and each version of the | |
1935 | specification MUST be numbered, dated, and stable. | |
1936 | ||
1937 | o The specification MUST be stable. That is, extension subtags, | |
1938 | once defined by a specification, MUST NOT be retracted or change | |
1939 | in meaning in any substantial way. | |
1940 | ||
1941 | o The specification MUST include in a separate section the | |
1942 | registration form reproduced in this section (below) to be used in | |
1943 | registering the extension upon publication as an RFC. | |
1944 | ||
1945 | o IANA MUST be informed of changes to the contact information and | |
1946 | URL for the specification. | |
1947 | ||
1948 | IANA will maintain a registry of allocated single-character | |
1949 | (singleton) subtags. This registry MUST use the record-jar format | |
1950 | described by the ABNF in Section 3.1. Upon publication of an | |
1951 | extension as an RFC, the maintaining authority defined in the RFC | |
1952 | MUST forward this registration form to iesg@ietf.org, who MUST | |
1953 | forward the request to iana@iana.org. The maintaining authority of | |
1954 | the extension MUST maintain the accuracy of the record by sending an | |
1955 | updated full copy of the record to iana@iana.org with the subject | |
1956 | line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only | |
1957 | the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY | |
1958 | be modified in these updates. | |
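
A maintaining authority could guard its updates with a check such as the following sketch. Records are assumed to be plain dicts of field name to value, and disallowed_changes is a hypothetical helper name.

```python
MODIFIABLE = {"Comments", "Contact_Email", "Mailing_List", "URL"}


def disallowed_changes(old, new):
    """Return the sorted names of fields that differ but are not open to modification."""
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    return sorted(changed - MODIFIABLE)


old = {"Identifier": "r", "Authority": "Example Org", "URL": "http://example.org/a"}
ok = dict(old, URL="http://example.org/b")
bad = dict(old, Authority="Someone Else")
assert disallowed_changes(old, ok) == []
assert disallowed_changes(old, bad) == ["Authority"]
```

The record values shown are placeholders, not entries from any real registry.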
1959 | ||
1967 | Failure to maintain this record, maintain the corresponding registry, | |
1968 | or meet other conditions imposed by this section of this document MAY | |
1969 | be appealed to the IESG [RFC2028] under the same rules as other IETF | |
1970 | decisions (see [RFC2026]) and MAY result in the authority to maintain | |
1971 | the extension being withdrawn or reassigned by the IESG. | |
1972 | ||
1973 | %% | |
1974 | Identifier: | |
1975 | Description: | |
1976 | Comments: | |
1977 | Added: | |
1978 | RFC: | |
1979 | Authority: | |
1980 | Contact_Email: | |
1981 | Mailing_List: | |
1982 | URL: | |
1983 | %% | |
1984 | ||
1985 | Figure 6: Format of Records in the Language Tag Extensions Registry | |
1986 | ||
1987 | 'Identifier' contains the single-character subtag (singleton) | |
1988 | assigned to the extension. The Internet-Draft submitted to define | |
1989 | the extension SHOULD specify which letter or digit to use, although | |
1990 | the IESG MAY change the assignment when approving the RFC. | |
1991 | ||
1992 | 'Description' contains the name and description of the extension. | |
1993 | ||
1994 | 'Comments' is an OPTIONAL field and MAY contain a broader description | |
1995 | of the extension. | |
1996 | ||
1997 | 'Added' contains the date the RFC was published in the "full-date" | |
1998 | format specified in [RFC3339]. For example: 2004-06-28 represents | |
1999 | June 28, 2004, in the Gregorian calendar. | |
2000 | ||
2001 | 'RFC' contains the RFC number assigned to the extension. | |
2002 | ||
2003 | 'Authority' contains the name of the maintaining authority for the | |
2004 | extension. | |
2005 | ||
2006 | 'Contact_Email' contains the email address used to contact the | |
2007 | maintaining authority. | |
2008 | ||
2009 | 'Mailing_List' contains the URL or subscription email address of the | |
2010 | mailing list used by the maintaining authority. | |
2011 | ||
2012 | 'URL' contains the URL of the registry for this extension. | |
2013 | ||
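The record layout in Figure 6 can be read mechanically. The following is a minimal sketch, not a normative parser. Several things in it are assumptions: the names parse_records and check_record are hypothetical, every value in SAMPLE is invented for illustration, every field other than 'Comments' is treated as required, and folded (multi-line) field values are not handled. The exclusion of 'x' as an identifier reflects its reservation for private use.

```python
import re
from datetime import date

# Field names in the order shown in Figure 6.
FIELDS = ["Identifier", "Description", "Comments", "Added", "RFC",
          "Authority", "Contact_Email", "Mailing_List", "URL"]

# Hypothetical record: every value below is invented for illustration.
SAMPLE = """\
%%
Identifier: r
Description: Hypothetical example extension
Added: 2004-06-28
RFC: 0000
Authority: Example Authority
Contact_Email: registrar@example.com
Mailing_List: ext-list@example.com
URL: http://example.com/registry
%%
"""


def parse_records(text):
    """Split '%%'-delimited text into a list of {field: value} dicts."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "%%":
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split on the first colon only, so URLs keep their own colons.
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name in FIELDS:
        # Assumption: only 'Comments' is optional.
        if name != "Comments" and not record.get(name):
            problems.append("missing " + name)
    identifier = record.get("Identifier", "")
    if not re.fullmatch(r"[0-9A-Za-z]", identifier) or identifier.lower() == "x":
        problems.append("Identifier is not a usable singleton")
    added = record.get("Added", "")
    try:
        if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", added):
            raise ValueError(added)
        date.fromisoformat(added)  # rejects impossible dates such as 2004-02-30
    except ValueError:
        problems.append("Added is not a full-date (YYYY-MM-DD)")
    return problems


for record in parse_records(SAMPLE):
    print(record["Identifier"], check_record(record))
```

A record that passes check_record is only well-shaped; whether its contents are accurate remains a matter for the maintaining authority and the IESG.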
2014 | ||
2023 | The determination of whether an Internet-Draft meets the above | |
2024 | conditions and the decision to grant or withhold such authority rests | |
2025 | solely with the IESG and is subject to the normal review and appeals | |
2026 | process associated with the RFC process. | |
2027 | ||
2028 | Extension authors are strongly cautioned that many (including most | |
2029 | well-formed) processors will be unaware of any special relationships | |
2030 | or meaning inherent in the order of extension subtags. Extension | |
2031 | authors SHOULD avoid subtag relationships or canonicalization | |
2032 | mechanisms that interfere with matching or with length restrictions | |
2033 | that sometimes exist in common protocols where the extension is used. | |
2034 | In particular, applications MAY truncate the subtags in doing | |
2035 | matching or in fitting into limited lengths, so it is RECOMMENDED | |
2036 | that the most significant information be in the most significant | |
2037 | (left-most) subtags and that the specification gracefully handle | |
2038 | truncated subtags. | |
2039 | ||
2040 | When a language tag is to be used in a specific, known protocol, it | |
2041 | is RECOMMENDED that the language tag not contain extensions not | |
2042 | supported by that protocol. In addition, note that some protocols | |
2043 | MAY impose upper limits on the length of the strings used to store or | |
2044 | transport the language tag. | |
2045 | ||
2046 | 3.8. Initialization of the Registries | |
2047 | ||
2048 | Upon adoption of this document, an initial version of the Language | |
2049 | Subtag Registry containing the various subtags initially valid in a | |
2050 | language tag is necessary. This collection of subtags, along with a | |
2051 | description of the process used to create it, is described by | |
2052 | [RFC4645]. IANA SHALL publish the initial version of the registry | |
2053 | described by this document from the content of [RFC4645]. Once | |
2054 | published by IANA, the maintenance procedures, rules, and | |
2055 | registration processes described in this document will be available | |
2056 | for new registrations or updates. | |
2057 | ||
2058 | Registrations that are in process under the rules defined in | |
2059 | [RFC3066] when this document is adopted MAY be completed under the | |
2060 | former rules, at the discretion of the Language Tag Reviewer (as | |
2061 | described in [RFC3066]). Until the IESG officially appoints a | |
2062 | Language Subtag Reviewer, the existing Language Tag Reviewer SHALL | |
2063 | serve as the Language Subtag Reviewer. | |
2064 | ||
2065 | Any new registrations submitted using the RFC 3066 forms or format | |
2066 | after the adoption of this document and publication of the registry | |
2067 | by IANA MUST be rejected. | |
2068 | ||
2079 | An initial version of the Language Tag Extensions Registry described | |
2080 | in Section 3.7 is also needed. The Language Tag Extensions Registry | |
2081 | SHALL be initialized with a single record containing a single field | |
2082 | of type "File-Date" as a placeholder for future assignments. | |
2083 | ||
2084 | 4. Formation and Processing of Language Tags | |
2085 | ||
2086 | This section addresses how to use the information in the registry | |
2087 | with the tag syntax to choose, form, and process language tags. | |
2088 | ||
2089 | 4.1. Choice of Language Tag | |
2090 | ||
2091 | One is sometimes faced with the choice between several possible tags | |
2092 | for the same body of text. | |
2093 | ||
2094 | Interoperability is best served when all users use the same language | |
2095 | tag in order to represent the same language. If an application has | |
2096 | requirements that make the rules here inapplicable, then that | |
2097 | application risks damaging interoperability. It is strongly | |
2098 | RECOMMENDED that users not define their own rules for language tag | |
2099 | choice. | |
2100 | ||
2101 | Subtags SHOULD only be used where they add useful distinguishing | |
2102 | information; extraneous subtags interfere with the meaning, | |
2103 | understanding, and processing of language tags. In particular, users | |
2104 | and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' | |
2105 | fields in the registry (defined in Section 3.1): these fields provide | |
2106 | guidance on when specific additional subtags SHOULD (and SHOULD NOT) | |
2107 | be used in a language tag. | |
2108 | ||
2109 | Of particular note, many applications can benefit from the use of | |
2110 | script subtags in language tags, as long as the use is consistent for | |
2111 | a given context. Script subtags were not formally defined in RFC | |
2112 | 3066 and their use can affect matching and subtag identification by | |
2113 | implementations of RFC 3066, as these subtags appear between the | |
2114 | primary language and region subtags. For example, if a user requests | |
2115 | content in an implementation of Section 2.5 of [RFC3066] using the | |
2116 | language range "en-US", content labeled "en-Latn-US" will not match | |
2117 | the request. Therefore, it is important to know when script subtags | |
2118 | will customarily be used and when they ought not be used. In the | |
2119 | registry, the Suppress-Script field helps ensure greater | |
2120 | compatibility between the language tags generated according to the | |
2121 | rules in this document and language tags and tag processors or | |
2122 | consumers based on RFC 3066 by defining when users SHOULD NOT include | |
2123 | a script subtag with a particular primary language subtag. | |
2124 | ||
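The "en-US" example can be reproduced with a short sketch of the prefix matching described in Section 2.5 of [RFC3066]: a range matches a tag if the two are equal, or if the range is a prefix of the tag and the next character in the tag is "-". The function name rfc3066_matches is hypothetical, and the sketch assumes both arguments are already well-formed.

```python
def rfc3066_matches(language_range, tag):
    """Prefix matching in the style of RFC 3066, Section 2.5."""
    if language_range == "*":
        return True  # the wildcard range matches any tag
    range_lower, tag_lower = language_range.lower(), tag.lower()
    # Equal, or a prefix that ends exactly on a subtag boundary.
    return tag_lower == range_lower or tag_lower.startswith(range_lower + "-")


print(rfc3066_matches("en-US", "en-US"))       # exact match
print(rfc3066_matches("en-US", "en-Latn-US"))  # script subtag breaks the prefix
print(rfc3066_matches("en", "en-Latn-US"))     # shorter range still matches
```

Only the second call fails to match, which is the interoperability problem that the Suppress-Script field is meant to reduce.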
2125 | ||
2135 | Extended language subtags (type 'extlang' in the registry; see | |
2136 | Section 3.1) also appear between the primary language and region | |
2137 | subtags and are reserved for future standardization. Applications | |
2138 | might benefit from their judicious use in forming language tags in | |
2139 | the future. Similar recommendations are expected to apply to their | |
2140 | use as apply to script subtags. | |
2141 | ||
2142 | Standards, protocols, and applications that reference this document | |
2143 | normatively but apply rules different from the ones given in this | |
2144 | section MUST specify how the procedure varies from the one given | |
2145 | here. | |
2146 | ||
2147 | The choice of subtags used to form a language tag SHOULD be guided by | |
2148 | the following rules: | |
2149 | ||
2150 | 1. Use as precise a tag as possible, but no more specific than is | |
2151 | justified. Avoid using subtags that are not important for | |
2152 | distinguishing content in an application. | |
2153 | ||
2154 | * For example, 'de' might suffice for tagging an email written | |
2155 | in German, while "de-CH-1996" is probably unnecessarily | |
2156 | precise for such a task. | |
2157 | ||
2158 | 2. The script subtag SHOULD NOT be used to form language tags unless | |
2159 | the script adds some distinguishing information to the tag. The | |
2160 | field 'Suppress-Script' in the primary language record in the | |
2161 | registry indicates which script subtags do not add distinguishing | |
2162 | information for most applications. | |
2163 | ||
2164 | * For example, the subtag 'Latn' should not be used with the | |
2165 | primary language 'en' because nearly all English documents are | |
2166 | written in the Latin script and it adds no distinguishing | |
2167 | information. However, if a document were written in English | |
2168 | mixing Latin script with another script such as Braille | |
2169 | ('Brai'), then it might be appropriate to choose to indicate | |
2170 | both scripts to aid in content selection, such as the | |
2171 | application of a style sheet. | |
2172 | ||
2173 | 3. If a tag or subtag has a 'Preferred-Value' field in its registry | |
2174 | entry, then the value of that field SHOULD be used to form the | |
2175 | language tag in preference to the tag or subtag in which the | |
2176 | preferred value appears. | |
2177 | ||
2178 | * For example, use 'he' for Hebrew in preference to 'iw'. | |
2179 | ||
2191 | 4. The 'und' (Undetermined) primary language subtag SHOULD NOT be | |
2192 | used to label content, even if the language is unknown. Omitting | |
2193 | the language tag altogether is preferred to using a tag with a | |
2194 | primary language subtag of 'und'. The 'und' subtag MAY be useful | |
2195 | for protocols that require a language tag to be provided. The | |
2196 | 'und' subtag MAY also be useful when matching language tags in | |
2197 | certain situations. | |
2198 | ||
2199 | 5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used | |
2200 | whenever the protocol allows separate tags for multiple | |
2201 | languages, as is the case for the Content-Language header in | |
2202 | HTTP. The 'mul' subtag conveys little useful information: | |
2203 | content in multiple languages SHOULD individually tag the | |
2204 | languages where they appear or otherwise indicate the actual | |
2205 | language in preference to the 'mul' subtag. | |
2206 | ||
2207 | 6. The same variant subtag SHOULD NOT be used more than once within | |
2208 | a language tag. | |
2209 | ||
2210 | * For example, do not use "de-DE-1901-1901". | |
2211 | ||
2212 | To ensure consistent backward compatibility, this document contains | |
2213 | several provisions to account for potential instability in the | |
2214 | standards used to define the subtags that make up language tags. | |
2215 | These provisions mean that no language tag created under the rules in | |
2216 | this document will become obsolete. | |
2217 | ||
2218 | 4.2. Meaning of the Language Tag | |
2219 | ||
2220 | The relationship between the tag and the information it relates to is | |
2221 | defined by the context in which the tag appears. Accordingly, this | |
2222 | section gives only possible examples of its usage. | |
2223 | ||
2224 | o For a single information object, the associated language tags | |
2225 | might be interpreted as the set of languages that is necessary for | |
2226 | complete comprehension of the whole object. Example: Plain | |
2227 | text documents. | |
2228 | ||
2229 | o For an aggregation of information objects, the associated language | |
2230 | tags could be taken as the set of languages used inside components | |
2231 | of that aggregation. Examples: Document stores and libraries. | |
2232 | ||
2233 | o For information objects whose purpose is to provide alternatives, | |
2234 | the associated language tags could be regarded as a hint that the | |
2235 | content is provided in several languages and that one has to | |
2236 | inspect each of the alternatives in order to find its language or | |
2237 | languages. In this case, the presence of multiple tags might not | |
2238 | mean that one needs to be multi-lingual to get complete | |
2247 | understanding of the document. Example: MIME multipart/ | |
2248 | alternative. | |
2249 | ||
2250 | o In markup languages, such as HTML and XML, language information | |
2251 | can be added to each part of the document identified by the markup | |
2252 | structure (including the whole document itself). For example, one | |
2253 | could write <span lang="fr">C'est la vie.</span> inside a | |
2254 | Norwegian document; the Norwegian-speaking user could then access | |
2255 | a French-Norwegian dictionary to find out what the marked section | |
2256 | meant. If the user were listening to that document through a | |
2257 | speech synthesis interface, this information could be used to signal | |
2258 | the synthesizer to appropriately apply French text-to-speech | |
2259 | pronunciation rules to that span of text, instead of applying the | |
2260 | inappropriate Norwegian rules. | |
2261 | ||
2262 | Language tags are related when they contain a similar sequence of | |
2263 | subtags. For example, if a language tag B contains language tag A as | |
2264 | a prefix, then B is typically "narrower" or "more specific" than A. | |
2265 | Thus, "zh-Hant-TW" is more specific than "zh-Hant". | |
2266 | ||
2267 | This relationship is not guaranteed in all cases: specifically, | |
2268 | languages that begin with the same sequence of subtags are NOT | |
2269 | guaranteed to be mutually intelligible, although they might be. For | |
2270 | example, the tag "az" shares a prefix with both "az-Latn" | |
2271 | (Azerbaijani written using the Latin script) and "az-Cyrl" | |
2272 | (Azerbaijani written using the Cyrillic script). A person fluent in | |
2273 | one script might not be able to read the other, even though the text | |
2274 | might be identical. Content tagged as "az" most probably is written | |
2275 | in just one script and thus might not be intelligible to a reader | |
2276 | familiar with the other script. | |
2277 | ||
2278 | 4.3. Length Considerations | |
2279 | ||
2280 | [RFC3066] did not provide an upper limit on the size of language | |
2281 | tags. While RFC 3066 did define the semantics of particular subtags | |
2282 | in such a way that most language tags consisted of language and | |
2283 | region subtags with a combined total length of up to six characters, | |
2284 | larger registered tags were not only possible but were actually | |
2285 | registered. | |
2286 | ||
2287 | Neither the language tag syntax nor other requirements in this | |
2288 | document impose a fixed upper limit on the number of subtags in a | |
2289 | language tag (and thus an upper bound on the size of a tag). The | |
2290 | language tag syntax suggests that, depending on the specific | |
2291 | language, more subtags (and thus a longer tag) are sometimes | |
2292 | necessary to completely identify the language for certain | |
2293 | applications; thus, it is possible to envision long or complex subtag | |
2294 | sequences. | |
2295 | ||
2303 | 4.3.1. Working with Limited Buffer Sizes | |
2304 | ||
2305 | Some applications and protocols are forced to allocate fixed buffer | |
2306 | sizes or otherwise limit the length of a language tag. A conformant | |
2307 | implementation or specification MAY refuse to support the storage of | |
2308 | language tags that exceed a specified length. Any such limitation | |
2309 | SHOULD be clearly documented, and such documentation SHOULD include | |
2310 | what happens to longer tags (for example, whether an error value is | |
2311 | generated or the language tag is truncated). A protocol that allows | |
2312 | tags to be truncated at an arbitrary limit, without giving any | |
2313 | indication of what that limit is, has the potential for causing harm | |
2314 | by changing the meaning of tags in substantial ways. | |
2315 | ||
2316 | In practice, most language tags do not require more than a few | |
2317 | subtags and will not approach reasonably sized buffer limitations; | |
2318 | see Section 4.1. | |
2319 | ||
2320 | Some specifications or protocols have limits on tag length but do not | |
2321 | have a fixed length limitation. For example, [RFC2231] has no | |
2322 | explicit length limitation: the length available for the language tag | |
2323 | is constrained by the length of other header components (such as the | |
2324 | charset's name) coupled with the 76-character limit in [RFC2047]. | |
2325 | Thus, the "limit" might be 50 or more characters, but it could | |
2326 | potentially be quite small. | |
2327 | ||
2328 | The considerations for assigning a buffer limit are: | |
2329 | ||
2330 | Implementations SHOULD NOT truncate language tags unless the | |
2331 | meaning of the tag is purposefully being changed, or unless the | |
2332 | tag does not fit into a limited buffer size specified by a | |
2333 | protocol for storage or transmission. | |
2334 | ||
2335 | Implementations SHOULD warn the user when a tag is truncated since | |
2336 | truncation changes the semantic meaning of the tag. | |
2337 | ||
2338 | Implementations of protocols or specifications that are space | |
2339 | constrained but do not have a fixed limit SHOULD use the longest | |
2340 | possible tag in preference to truncation. | |
2341 | ||
2342 | Protocols or specifications that specify limited buffer sizes for | |
2343 | language tags MUST allow for language tags of up to 33 characters. | |
2344 | ||
2345 | Protocols or specifications that specify limited buffer sizes for | |
2346 | language tags SHOULD allow for language tags of at least 42 | |
2347 | characters. | |
2348 | ||
2359 | The following illustration shows how the 42-character recommendation | |
2360 | was derived. The combination of language and extended language | |
2361 | subtags was chosen for future compatibility. At up to 15 characters, | |
2362 | this combination is longer than the longest possible primary language | |
2363 | subtag (8 characters): | |
2364 | ||
2365 | language = 3 (ISO 639-2; ISO 639-1 requires 2) | |
2366 | extlang1 = 4 (each subsequent subtag includes '-') | |
2367 | extlang2 = 4 (unlikely: needs prefix="language-extlang1") | |
2368 | extlang3 = 4 (extremely unlikely) | |
2369 | script = 5 (if not suppressed: see Section 4.1) | |
2370 | region = 4 (UN M.49; ISO 3166 requires 3) | |
2371 | variant1 = 9 (MUST have language as a prefix) | |
2372 | variant2 = 9 (MUST have language-variant1 as a prefix) | |
2373 | ||
2374 | total = 42 characters | |
2375 | ||
2376 | Figure 7: Derivation of the Limit on Tag Length | |
2377 | ||
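The arithmetic behind Figure 7 can be checked directly. The sketch below copies the per-subtag lengths from the figure; the variable names are illustrative only. It confirms both the 15-character figure for the language and extended language subtags and the 42-character total.

```python
# Maximum lengths from Figure 7. Every subtag after the first
# includes its leading '-' separator in the count.
FIGURE_7 = {
    "language": 3,
    "extlang1": 4,
    "extlang2": 4,
    "extlang3": 4,
    "script": 5,
    "region": 4,
    "variant1": 9,
    "variant2": 9,
}

language_and_extlang = sum(
    FIGURE_7[name] for name in ("language", "extlang1", "extlang2", "extlang3")
)
total = sum(FIGURE_7.values())

print(language_and_extlang)  # language plus three extlang subtags
print(total)                 # the recommended buffer size
```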
2378 | 4.3.2. Truncation of Language Tags | |
2379 | ||
2380 | Truncation of a language tag alters the meaning of the tag, and thus | |
2381 | SHOULD be avoided. However, truncation of language tags is sometimes | |
2382 | necessary due to limited buffer sizes. Such truncation MUST NOT | |
2383 | permit a subtag to be chopped off in the middle or the formation of | |
2384 | invalid tags (for example, one ending with the "-" character). | |
2385 | ||
2386 | This means that applications or protocols that truncate tags MUST do | |
2387 | so by progressively removing subtags along with their preceding "-" | |
2388 | from the right side of the language tag until the tag is short enough | |
2389 | for the given buffer. If the resulting tag ends with a single- | |
2390 | character subtag, that subtag and its preceding "-" MUST also be | |
2391 | removed. For example: | |
2392 | ||
2393 | Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1 | |
2394 | 1. zh-Latn-CN-variant1-a-extend1-x-wadegile | |
2395 | 2. zh-Latn-CN-variant1-a-extend1 | |
2396 | 3. zh-Latn-CN-variant1 | |
2397 | 4. zh-Latn-CN | |
2398 | 5. zh-Latn | |
2399 | 6. zh | |
2400 | ||
2401 | Figure 8: Example of Tag Truncation | |
2402 | ||
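The truncation procedure can be sketched as follows. This is one possible reading of the rules above, not a reference implementation: the names truncation_steps and truncate_tag are hypothetical, the input is assumed to be well-formed, and a result consisting of a lone single-character subtag is treated as unusable.

```python
def truncation_steps(tag):
    """Yield the tag, then each successively shorter valid form."""
    subtags = tag.split("-")
    yield "-".join(subtags)
    while len(subtags) > 1:
        subtags.pop()  # drop the right-most subtag and its '-'
        # A trailing single-character subtag must be removed as well.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
        if len(subtags) == 1 and len(subtags[0]) == 1:
            return  # a bare singleton such as "x" is not a usable tag
        yield "-".join(subtags)


def truncate_tag(tag, max_length):
    """Return the longest truncation of tag that fits in max_length."""
    for candidate in truncation_steps(tag):
        if len(candidate) <= max_length:
            return candidate
    raise ValueError("no valid truncation fits in %d characters" % max_length)


TAG = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for step in truncation_steps(TAG):
    print(step)
```

Run against the tag in Figure 8, truncation_steps yields the original tag followed by the six forms listed there; truncate_tag(TAG, 33) returns "zh-Latn-CN-variant1-a-extend1".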
2403 | ||
2415 | 4.4. Canonicalization of Language Tags | |
2416 | ||
2417 | Since a particular language tag is sometimes used by many processes, | |
2418 | language tags SHOULD always be created or generated in a canonical | |
2419 | form. | |
2420 | ||
2421 | A language tag is in canonical form when: | |
2422 | ||
2423 | 1. The tag is well-formed according to the rules in Section 2.1 and | |
2424 | Section 2.2. | |
2425 | ||
2426 | 2. Subtags of type 'Region' that have a Preferred-Value mapping in | |
2427 | the IANA registry (see Section 3.1) SHOULD be replaced with their | |
2428 | mapped value. Note: In rare cases, the mapped value will also | |
2429 | have a Preferred-Value. | |
2430 | ||
2431 | 3. Redundant or grandfathered tags that have a Preferred-Value | |
2432 | mapping in the IANA registry (see Section 3.1) MUST be replaced | |
2433 | with their mapped value. These items either are deprecated | |
2434 | mappings created before the adoption of this document (such as | |
2435 | the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are | |
2436 | the result of later registrations or additions to this document | |
2437 | (for example, "zh-guoyu" might be mapped to a language-extlang | |
2438 | combination such as "zh-cmn" by some future update of this | |
2439 | document). | |
2440 | ||
2441 | 4. Other subtags that have a Preferred-Value mapping in the IANA | |
2442 | registry (see Section 3.1) MUST be replaced with their mapped | |
2443 | value. These items consist entirely of clerical corrections to | |
2444 | ISO 639-1 in which the deprecated subtags have been maintained | |
2445 | for compatibility purposes. | |
2446 | ||
2447 | 5. If more than one extension subtag sequence exists, the extension | |
2448 | sequences are ordered into case-insensitive ASCII order by | |
2449 | singleton subtag. | |
2450 | ||
2451 | Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical | |
2452 | form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in | |
2453 | canonical form. | |
2454 | ||
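The ordering of extension sequences (item 5 above) can be sketched as shown below. The function name order_extensions is hypothetical and the input is assumed to be well-formed. The sketch covers only item 5: it leaves subtag case untouched, applies no Preferred-Value mappings, and keeps the private use sequence (introduced by 'x') at the end of the tag.

```python
def order_extensions(tag):
    """Reorder extension sequences by singleton, case-insensitively."""
    subtags = tag.split("-")
    # Subtags before the first singleton (language, script, region,
    # variants) keep their positions. A leading 'x' or 'i' belongs to
    # this head and is not treated as an extension singleton.
    i = 1
    while i < len(subtags) and len(subtags[i]) != 1:
        i += 1
    head = subtags[:i]

    sequences, private_use = [], []
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private_use = subtags[i:]  # everything after 'x' stays as-is
            break
        j = i + 1
        while j < len(subtags) and len(subtags[j]) != 1:
            j += 1
        sequences.append(subtags[i:j])  # one singleton plus its subtags
        i = j

    sequences.sort(key=lambda sequence: sequence[0].lower())
    ordered = head + [s for sequence in sequences for s in sequence] + private_use
    return "-".join(ordered)


print(order_extensions("en-B-ccc-bbb-A-aaa-X-xyz"))
```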
2455 | Example: The language tag "en-BU" (English as used in Burma) is not | |
2456 | canonical because the 'BU' subtag has a canonical mapping to 'MM' | |
2457 | (Myanmar), although the tag "en-BU" maintains its validity. | |
2458 | ||
2459 | Canonicalization of language tags does not imply anything about the | |
2460 | use of upper or lowercase letters when processing or comparing | |
2461 | subtags (as described in Section 2.1). All comparisons MUST be | |
2462 | performed in a case-insensitive manner. | |
2463 | ||
2471 | When performing canonicalization of language tags, processors MAY | |
2472 | regularize the case of the subtags (that is, this process is | |
2473 | OPTIONAL), following the case used in the registry. Note that this | |
2474 | corresponds to the following casing rules: uppercase all non-initial | |
2475 | two-letter subtags; titlecase all non-initial four-letter subtags; | |
2476 | lowercase everything else. | |
2477 | ||
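The casing rules can be sketched in a few lines. The function name regularize_case is hypothetical. The rules are applied purely by position and length, exactly as stated above, so the sketch does not consult the registry and makes no special case for private use or grandfathered tags. Python's str.upper(), str.lower(), and str.capitalize() do not depend on the process locale, so ASCII letters map only to ASCII letters here.

```python
def regularize_case(tag):
    """Apply the registry's conventional casing to each subtag."""
    result = []
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 2:
            result.append(subtag.upper())       # e.g. region: 'us' -> 'US'
        elif index > 0 and len(subtag) == 4:
            result.append(subtag.capitalize())  # e.g. script: 'latn' -> 'Latn'
        else:
            result.append(subtag.lower())       # everything else
    return "-".join(result)


print(regularize_case("ZH-hant-tw"))
print(regularize_case("DE-ch-1996"))
```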
2478 | Note: Case folding of ASCII letters in certain locales, unless | |
2479 | carefully handled, sometimes produces non-ASCII character values. | |
2480 | The Unicode Character Database file "SpecialCasing.txt" defines the | |
2481 | specific cases that are known to cause problems with this. In | |
2482 | particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is | |
2483 | uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). | |
2484 | Implementers SHOULD specify a locale-neutral casing operation to | |
2485 | ensure that case folding of subtags does not produce this value, | |
2486 | which is illegal in language tags. For example, if one were to | |
2487 | uppercase the region subtag 'in' using Turkish locale rules, the | |
2488 | sequence U+0130 U+004E would result instead of the expected 'IN'. | |
2489 | ||
2490 | Note: if the field 'Deprecated' appears in a registry record without | |
2491 | an accompanying 'Preferred-Value' field, then that tag or subtag is | |
2492 | deprecated without a replacement. Validating processors SHOULD NOT | |
2493 | generate tags that include these values, although the values are | |
2494 | canonical when they appear in a language tag. | |
2495 | ||
2496 | An extension MUST define any relationships that exist between the | |
2497 | various subtags in the extension and thus MAY define an alternate | |
2498 | canonicalization scheme for the extension's subtags. Extensions MAY | |
2499 | define how the order of the extension's subtags is interpreted. For | |
2500 | example, an extension could define that its subtags are in canonical | |
2501 | order when the subtags are placed into ASCII order: that is, | |
2502 | "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension | |
2503 | might define that the order of the subtags influences their semantic | |
2504 | meaning (so that "en-b-ccc-bbb-aaa" has a different value from | |
2505 | "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be | |
2506 | designed so that they are tolerant of the typical processes described | |
2507 | in Section 3.7. | |
2508 | ||
2509 | 4.5. Considerations for Private Use Subtags | |
2510 | ||
2511 | Private use subtags, like all other subtags, MUST conform to the | |
2512 | format and content constraints in the ABNF. Private use subtags have | |
2513 | no meaning outside the private agreement between the parties that | |
2514 | intend to use or exchange language tags that employ them. The same | |
2515 | subtags MAY be used with a different meaning under a separate private | |
2516 | agreement. They SHOULD NOT be used where alternatives exist and | |
2517 | SHOULD NOT be used in content or protocols intended for general use. | |
2518 | ||
2527 | Private use subtags are simply useless for information exchange | |
2528 | without prior arrangement. The value and semantic meaning of private | |
2529 | use tags and of the subtags used within such a language tag are not | |
2530 | defined by this document. | |
2531 | ||
2532 | Subtags defined in the IANA registry as having a specific private use | |
2533 | meaning convey more information than a purely private use tag | |
2534 | prefixed by the singleton subtag 'x'. For applications, this | |
2535 | additional information MAY be useful. | |
2536 | ||
2537 | For example, the region subtags 'AA', 'ZZ', and in the ranges | |
2538 | 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY | |
2539 | be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a | |
2540 | great deal of public, interchangeable information about the language | |
2541 | material (that it is Chinese in the simplified Chinese script and is | |
2542 | suitable for some geographic region 'XQ'). While the precise | |
2543 | geographic region is not known outside of private agreement, the tag | |
2544 | conveys far more information than an opaque tag such as "x-someLang", | |
2545 | which contains no information about the language subtag or script | |
2546 | subtag outside of the private agreement. | |
2547 | ||
2548 | However, in some cases content tagged with private use subtags MAY | |
2549 | interact with other systems in a different and possibly unsuitable | |
2550 | manner compared to tags that use opaque, privately defined subtags, | |
2551 | so the choice of the best approach sometimes depends on the | |
2552 | particular domain in question. | |
2553 | ||
2554 | 5. IANA Considerations | |
2555 | ||
2556 | This section deals with the processes and requirements necessary for | |
2557 | IANA to undertake to maintain the subtag and extension registries as | |
2558 | defined by this document and in accordance with the requirements of | |
2559 | [RFC2434]. | |
2560 | ||
2561 | The impact on the IANA maintainers of the two registries defined by | |
2562 | this document will be a small increase in the frequency of new | |
2563 | entries or updates. | |
2564 | ||
2565 | 5.1. Language Subtag Registry | |
2566 | ||
2567 | Upon adoption of this document, the registry will be initialized by a | |
2568 | companion document: [RFC4645]. The criteria and process for | |
2569 | selecting the initial set of records are described in that document. | |
2570 | The initial set of records has no impact on IANA, since the | |
2571 | work to create it will be performed externally. | |
2572 | ||
2583 | The new registry MUST be listed under "Language Tags" at | |
2584 | <http://www.iana.org/numbers.html>, replacing the existing | |
2585 | registrations defined by [RFC3066]. The existing set of registration | |
2586 | forms and RFC 3066 registrations MUST be relabeled as "Language Tags | |
2587 | (Obsolete)" and maintained (but not added to or modified). | |
2588 | ||
2589 | Future work on the Language Subtag Registry SHALL be limited to | |
2590 | inserting or replacing whole records preformatted for IANA by the | |
2591 | Language Subtag Reviewer as described in Section 3.3 of this document | |
2592 | and archiving the forwarded registration form. | |
2593 | ||
2594 | Each record MUST be sent to iana@iana.org with a subject line | |
2595 | indicating whether the enclosed record is an insertion of a new | |
2596 | record (indicated by the word "INSERT" in the subject line) or a | |
2597 | replacement of an existing record (indicated by the word "MODIFY" in | |
2598 | the subject line). Records MUST NOT be deleted from the registry. | |
2599 | IANA MUST place any inserted or modified records into the appropriate | |
2600 | section of the language subtag registry, grouping the records by | |
2601 | their 'Type' field. Inserted records MAY be placed anywhere in the | |
2602 | appropriate section; there is no guarantee of the order of the | |
2603 | records beyond grouping them together by 'Type'. Modified records | |
2604 | MUST overwrite the record they replace. | |
2605 | ||
2606 | Included in any request to insert or modify records MUST be a new | |
2607 | File-Date record. This record MUST be placed first in the registry. | |
2608 | In the event that the File-Date record present in the registry has a | |
2609 | later date than the record being inserted or modified, the existing | |
2610 | record MUST be preserved. | |
2611 | ||
2612 | 5.2. Extensions Registry | |
2613 | ||
2614 | The Language Tag Extensions Registry will also be generated and sent | |
2615 | to IANA as described in Section 3.7. This registry can contain at | |
2616 | most 35 records, and thus changes to this registry are expected to be | |
2617 | very infrequent. | |
2618 | ||
2619 | Future work by IANA on the Language Tag Extensions Registry is | |
2620 | limited to two cases. First, the IESG MAY request that new records | |
2621 | be inserted into this registry from time to time. These requests | |
2622 | MUST include the record to insert in the exact format described in | |
2623 | Section 3.7. In addition, there MAY be occasional requests from the | |
2624 | maintaining authority for a specific extension to update the contact | |
2625 | information or URLs in the record. These requests MUST include the | |
2626 | complete, updated record. IANA is not responsible for validating the | |
2627 | information provided, only for checking that it is properly formatted. | |
2628 | The request should reasonably be seen to come from the maintaining | |
2629 | authority named in the record present in the registry. | |
2630 | ||
2639 | 6. Security Considerations | |
2640 | ||
2641 | Language tags used in content negotiation, like any other information | |
2642 | exchanged on the Internet, might be a source of concern because they | |
2643 | might be used to infer the nationality of the sender, and thus | |
2644 | identify potential targets for surveillance. | |
2645 | ||
2646 | This is a special case of the general problem that anything sent is | |
2647 | visible to the receiving party and possibly to third parties as well. | |
2648 | It is useful to be aware that such concerns can exist in some cases. | |
2649 | ||
2650 | The evaluation of the exact magnitude of the threat, and any possible | |
2651 | countermeasures, is left to each application protocol (see BCP 72 | |
2652 | [RFC3552] for best current practice guidance on security threats and | |
2653 | defenses). | |
2654 | ||
2655 | The language tag associated with a particular information item is of | |
2656 | no consequence whatsoever in determining whether that content might | |
2657 | contain possible homographs. The fact that a text is tagged as being | |
2658 | in one language or using a particular script subtag provides no | |
2659 | assurance whatsoever that it does not contain characters from scripts | |
2660 | other than the one(s) associated with or specified by that language | |
2661 | tag. | |
2662 | ||
2663 | Since there is no limit to the number of variant, private use, and | |
2664 | extension subtags, and consequently no limit on the possible length | |
2665 | of a tag, implementations need to guard against buffer overflow | |
2666 | attacks. See Section 4.3 for details on language tag truncation, | |
2667 | which can occur as a consequence of defenses against buffer overflow. | |
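One way to bound tag length without cutting a subtag in half is sketched below in Python. It assumes the truncation rule is to drop whole subtags from the right until the tag fits, and never to leave a trailing single-character subtag (which would introduce an extension or private use sequence with nothing after it). The function name truncate_tag and the max_len parameter are hypothetical.

```python
def truncate_tag(tag, max_len):
    """Shorten a language tag to at most max_len characters on subtag boundaries."""
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_len:
        subtags.pop()  # remove the right-most subtag
        # Do not leave a dangling singleton such as 'a' or 'x' at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
print(truncate_tag(tag, 35))  # zh-Latn-CN-variant1-a-extend1
print(truncate_tag(tag, 20))  # zh-Latn-CN-variant1
```

Copying into a fixed-size buffer only after a step like this keeps the stored value a usable prefix of the original tag. If nothing fits, the sketch returns an empty string.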
2668 | ||
2669 | Although the specification of valid subtags for an extension (see | |
2670 | Section 3.7) MUST be available over the Internet, implementations | |
2671 | SHOULD NOT mechanically depend on it being always accessible, to | |
2672 | prevent denial-of-service attacks. | |
2673 | ||
2674 | 7. Character Set Considerations | |
2675 | ||
2676 | The syntax in this document requires that language tags use only the | |
2677 | characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most | |
2678 | character sets, so the composition of language tags should not have | |
2679 | any character set issues. | |
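A minimal Python sketch of the repertoire check follows. It tests only which characters appear, not whether the tag is well-formed; the names ALLOWED and uses_only_tag_characters are hypothetical. An explicit set is used because str.isalnum() also accepts non-ASCII letters and digits.

```python
import string

# A-Z, a-z, 0-9, and HYPHEN-MINUS.
ALLOWED = set(string.ascii_letters + string.digits + "-")


def uses_only_tag_characters(tag):
    """Return True if tag is non-empty and uses only the permitted characters."""
    return bool(tag) and all(ch in ALLOWED for ch in tag)


print(uses_only_tag_characters("sr-Latn-CS"))  # True
print(uses_only_tag_characters("en_US"))       # False
```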
2680 | ||
2681 | Rendering of characters based on the content of a language tag is not | |
2682 | addressed in this memo. Historically, some languages have relied on | |
2683 | the use of specific character sets or other information in order to | |
2684 | infer how a specific character should be rendered (notably this | |
2685 | applies to language- and culture-specific variations of Han | |
2686 | ideographs as used in Japanese, Chinese, and Korean). When language | |
2695 | tags are applied to spans of text, rendering engines sometimes use | |
2696 | that information in deciding which font to use in the absence of | |
2697 | other information, particularly where languages with distinct writing | |
2698 | traditions use the same characters. | |
2699 | ||
2700 | 8. Changes from RFC 3066 | |
2701 | ||
2702 | The main goals for this revision of language tags were the following: | |
2703 | ||
2704 | *Compatibility.* All RFC 3066 language tags (including those in the | |
2705 | IANA registry) remain valid in this specification. The changes in | |
2706 | this document represent additional constraints on language tags. | |
2707 | That is, in no case is the syntax more permissive, and processors | |
2708 | based on the ABNF and other provisions of RFC 3066 (such as those | |
2709 | described in [XMLSchema]) will be able to process the tags described | |
2710 | by this document. In addition, this document defines language tags | |
2711 | in such a way as to ensure future compatibility. | |
2712 | ||
2713 | *Stability.* Because of changes in the past in the underlying ISO | |
2714 | standards, a valid RFC 3066 language tag could become invalid or have | |
2715 | its meaning change. This has the potential of invalidating content | |
2716 | that may have an extensive shelf-life. In this specification, once a | |
2717 | language tag is valid, it remains valid forever. | |
2718 | ||
2719 | *Validity.* The structure of language tags defined by this document | |
2720 | makes it possible to determine if a particular tag is well-formed | |
2721 | without regard for the actual content or "meaning" of the tag as a | |
2722 | whole. This is important because the registry grows and underlying | |
2723 | standards change over time. In addition, it must be possible to | |
2724 | determine if a tag is valid (or not) for a given point in time in | |
2725 | order to provide reproducible, testable results. This process must | |
2726 | not be error-prone; otherwise implementations might give different | |
2727 | results. By having an authoritative registry with specific | |
2728 | versioning information, the validity of language tags at any point in | |
2729 | time can be precisely determined (instead of interpolating values | |
2730 | from many separate sources). | |
2731 | ||
2732 | *Utility.* It is sometimes important to be able to differentiate | |
2733 | between written forms of a language -- for many implementations this | |
2734 | is more important than distinguishing between the spoken variants of | |
2735 | a language. Languages are written in a wide variety of different | |
2736 | scripts, so this document provides for the generative use of ISO | |
2737 | 15924 script codes. Like the generative use of ISO language and | |
2738 | country codes in RFC 3066, this allows combinations to be produced | |
2739 | without resorting to the registration process. The addition of UN | |
2740 | M.49 codes provides for the generation of language tags with regional | |
2741 | scope, which is also required by some applications. | |
2742 | ||
2751 | Recasting the registry to contain subtags rather than whole language | |
2752 | tags is a key part of this. An important feature of RFC 3066 was | |
2753 | that it allowed generative use of subtags. This allows people to | |
2754 | meaningfully use generated tags, without the delays in registering | |
2755 | whole tags or the need to register all of the combinations that might | |
2756 | be useful. | |
2757 | ||
2758 | The choice of placing the extended language and script subtags | |
2759 | between the primary language and region subtags was widely debated. | |
2760 | This design was chosen because the prevalent matching and content | |
2761 | negotiation schemes rely on the subtags being arranged in order of | |
2762 | increasing specificity. That is, the subtags that mark a greater | |
2763 | barrier to mutual intelligibility appear left-most in a tag. For | |
2764 | example, when selecting content written in Azerbaijani, the script | |
2765 | (Arabic, Cyrillic, or Latin) represents a greater barrier to | |
2766 | understanding than any regional variations (those associated with | |
2767 | Azerbaijan or Iran, for example). Individuals who prefer documents | |
2768 | in a particular script, but can deal with the minor regional | |
2769 | differences, can therefore select appropriate content. Applications | |
2770 | that do not deal with written content will continue to omit these | |
2771 | subtags. | |
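The effect of this ordering on prefix-based matching can be sketched in Python. The function below is a simplified version of the basic filtering scheme described in RFC 4647, not a complete matcher; the name matches is hypothetical.

```python
def matches(language_range, tag):
    """Case-insensitive prefix match on whole subtags."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# A reader who wants Azerbaijani in Arabic script, from any region:
print(matches("az-Arab", "az-Arab-IR"))  # True
print(matches("az-Arab", "az-Cyrl-AZ"))  # False
```

Because the script subtag precedes the region subtag, the single range "az-Arab" covers every regional form written in that script. With the region first, the same request would need one range per region.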
2772 | ||
2773 | *Extensibility.* Because of the widespread use of language tags, it | |
2774 | is disruptive to have periodic revisions of the core specification, | |
2775 | even in the face of demonstrated need. The extension mechanism | |
2776 | provides a way for independent RFCs to define extensions to | |
2777 | language tags. These extensions have a very constrained, well- | |
2778 | defined structure that prevents extensions from interfering with | |
2779 | implementations of language tags defined in this document. | |
2780 | ||
2781 | The document also anticipates features of ISO 639-3 with the addition | |
2782 | of the extended language subtags, as well as the possibility of other | |
2783 | ISO 639 parts becoming useful for the formation of language tags in | |
2784 | the future. | |
2785 | ||
2786 | The use and definition of private use tags have also been modified, | |
2787 | to allow people to use private use subtags to extend or modify | |
2788 | defined tags and to move as much information as possible out of | |
2789 | private use and into the regular structure. | |
2790 | ||
2791 | The goal for each of these modifications is to reduce or eliminate | |
2792 | the need for future revisions of this document. | |
2793 | ||
2807 | The specific changes in this document to meet these goals are: | |
2808 | ||
2809 | o Defines the ABNF and rules for subtags so that the category of all | |
2810 | subtags can be determined without reference to the registry. | |
2811 | ||
2812 | o Adds the concept of well-formed vs. validating processors, | |
2813 | defining the rules by which an implementation can claim to be one | |
2814 | or the other. | |
2815 | ||
2816 | o Replaces the IANA language tag registry with a language subtag | |
2817 | registry that provides a complete list of valid subtags in the | |
2818 | IANA registry. This allows for robust implementation and ease of | |
2819 | maintenance. The language subtag registry becomes the canonical | |
2820 | source for forming language tags. | |
2821 | ||
2822 | o Provides a process that guarantees stability of language tags, by | |
2823 | handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in | |
2824 | the event that they register a previously used value for a new | |
2825 | purpose. | |
2826 | ||
2827 | o Allows ISO 15924 script code subtags and allows them to be used | |
2828 | generatively. Defines a method for indicating in the registry | |
2829 | when script subtags are necessary for a given language tag. | |
2830 | ||
2831 | o Adds the concept of a variant subtag and allows variants to be | |
2832 | used generatively. | |
2833 | ||
2834 | o Adds the ability to use a class of UN M.49 codes for supra-national | |
2835 | regions and to resolve conflicts in the assignment of ISO 3166 | |
2836 | codes. | |
2837 | ||
2838 | o Defines the private use codes in ISO 639, ISO 15924, and ISO 3166 | |
2839 | as the mechanism for creating private use language, script, and | |
2840 | region subtags, respectively. | |
2841 | ||
2842 | o Adds a well-defined extension mechanism. | |
2843 | ||
2844 | o Defines an extended language subtag, possibly for use with certain | |
2845 | anticipated features of ISO 639-3. | |
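The first item in the list above, determining a subtag's category without the registry, can be illustrated with a simplified Python sketch. It labels subtags purely by length, content, and position. It does not check that the subtags appear in a permitted order, and it does not recognize grandfathered tags such as "i-enochian", which need a fixed list. The names category_by_shape and classify are hypothetical.

```python
def category_by_shape(subtag):
    """Guess the category of a non-initial subtag from its length and content."""
    if not (subtag.isascii() and subtag.isalnum()):
        return None
    n = len(subtag)
    if n == 1:
        return "singleton"
    if n == 2 and subtag.isalpha():
        return "region"        # two letters, ISO 3166 style
    if n == 3 and subtag.isalpha():
        return "extlang"       # three letters
    if n == 3 and subtag.isdigit():
        return "region"        # three digits, UN M.49 style
    if n == 4 and subtag.isalpha():
        return "script"        # four letters, ISO 15924 style
    if n == 4 and subtag[0].isdigit():
        return "variant"       # four characters starting with a digit
    if 5 <= n <= 8:
        return "variant"
    return None


def classify(tag):
    """Label each subtag of tag; returns a list of (subtag, category) pairs."""
    labels = []
    mode = "langtag"
    for index, subtag in enumerate(tag.split("-")):
        if mode == "private":
            labels.append((subtag, "privateuse"))
        elif len(subtag) == 1:
            # 'x' starts private use; any other singleton starts an extension.
            mode = "private" if subtag.lower() == "x" else "extension"
            labels.append((subtag, "singleton"))
        elif mode == "extension":
            labels.append((subtag, "extension"))
        elif index == 0:
            labels.append((subtag, "language"))
        else:
            labels.append((subtag, category_by_shape(subtag)))
    return labels


print(classify("sl-Latn-IT-nedis"))
# [('sl', 'language'), ('Latn', 'script'), ('IT', 'region'), ('nedis', 'variant')]
```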
2846 | ||
2863 | 9. References | |
2864 | ||
2865 | 9.1. Normative References | |
2866 | ||
2867 | [ISO10646] International Organization for Standardization, | |
2868 | "ISO/IEC 10646:2003. Information technology -- | |
2869 | Universal Multiple-Octet Coded Character Set (UCS)", | |
2870 | 2003. | |
2871 | ||
2872 | [ISO15924] International Organization for Standardization, "ISO | |
2873 | 15924:2004. Information and documentation -- Codes for | |
2874 | the representation of names of scripts", January 2004. | |
2875 | ||
2876 | [ISO3166-1] International Organization for Standardization, "ISO | |
2877 | 3166-1:1997. Codes for the representation of names of | |
2878 | countries and their subdivisions -- Part 1: Country | |
2879 | codes", 1997. | |
2880 | ||
2881 | [ISO639-1] International Organization for Standardization, "ISO | |
2882 | 639-1:2002. Codes for the representation of names of | |
2883 | languages -- Part 1: Alpha-2 code", 2002. | |
2884 | ||
2885 | [ISO639-2] International Organization for Standardization, "ISO | |
2886 | 639-2:1998. Codes for the representation of names of | |
2887 | languages -- Part 2: Alpha-3 code, first edition", | |
2888 | 1998. | |
2889 | ||
2890 | [ISO646] International Organization for Standardization, | |
2891 | "ISO/IEC 646:1991, Information technology -- ISO 7-bit | |
2892 | coded character set for information interchange.", | |
2893 | 1991. | |
2894 | ||
2895 | [RFC2026] Bradner, S., "The Internet Standards Process -- | |
2896 | Revision 3", BCP 9, RFC 2026, October 1996. | |
2897 | ||
2898 | [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved | |
2899 | in the IETF Standards Process", BCP 11, RFC 2028, | |
2900 | October 1996. | |
2901 | ||
2902 | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |
2903 | Requirement Levels", BCP 14, RFC 2119, March 1997. | |
2904 | ||
2905 | [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing | |
2906 | an IANA Considerations Section in RFCs", BCP 26, | |
2907 | RFC 2434, October 1998. | |
2908 | ||
2919 | [RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum | |
2920 | of Understanding Concerning the Technical Work of the | |
2921 | Internet Assigned Numbers Authority", RFC 2860, | |
2922 | June 2000. | |
2923 | ||
2924 | [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the | |
2925 | Internet: Timestamps", RFC 3339, July 2002. | |
2926 | ||
2927 | [RFC4234] Crocker, D., Ed. and P. Overell, "Augmented BNF for | |
2928 | Syntax Specifications: ABNF", RFC 4234, October 2005. | |
2929 | ||
2930 | [UN_M.49] Statistics Division, United Nations, "Standard Country | |
2931 | or Area Codes for Statistical Use", UN Standard | |
2932 | Country or Area Codes for Statistical Use, Revision 4 | |
2933 | (United Nations publication, Sales No. 98.XVII.9), | |
2934 | June 1999. | |
2935 | ||
2936 | 9.2. Informative References | |
2937 | ||
2938 | [RFC1766] Alvestrand, H., "Tags for the Identification of | |
2939 | Languages", RFC 1766, March 1995. | |
2940 | ||
2941 | [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail | |
2942 | Extensions) Part Three: Message Header Extensions for | |
2943 | Non-ASCII Text", RFC 2047, November 1996. | |
2944 | ||
2945 | [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and | |
2946 | Encoded Word Extensions: Character Sets, Languages, | |
2947 | and Continuations", RFC 2231, November 1997. | |
2948 | ||
2949 | [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of | |
2950 | ISO 10646", RFC 2781, February 2000. | |
2951 | ||
2952 | [RFC3066] Alvestrand, H., "Tags for the Identification of | |
2953 | Languages", BCP 47, RFC 3066, January 2001. | |
2954 | ||
2955 | [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing | |
2956 | RFC Text on Security Considerations", BCP 72, | |
2957 | RFC 3552, July 2003. | |
2958 | ||
2959 | [RFC4645] Ewell, D., Ed., "Initial Language Subtag Registry", | |
2960 | RFC 4645, September 2006. | |
2961 | ||
2962 | [RFC4647] Phillips, A., Ed. and M. Davis, Ed., "Matching of | |
2963 | Language Tags", BCP 47, RFC 4647, September 2006. | |
2964 | ||
2975 | [Unicode] Unicode Consortium, "The Unicode Standard, Version | |
2976 | 5.0", Boston, MA, Addison-Wesley, 2007. | |
2977 | ISBN 0-321-48091-0. | |
2978 | ||
2979 | [XML10] Bray, T., et al., "Extensible Markup Language (XML) | |
2980 | 1.0", February 2004. | |
2981 | ||
2982 | [XMLSchema] Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part | |
2983 | 2: Datatypes Second Edition", October 2004, | |
2984 | <http://www.w3.org/TR/xmlschema-2/>. | |
2985 | ||
2986 | [iso639.prin] ISO 639 Joint Advisory Committee, "ISO 639 Joint | |
2987 | Advisory Committee: Working principles for ISO 639 | |
2988 | maintenance", March 2000, | |
2989 | <http://www.loc.gov/standards/iso639-2/iso639jac_n3r.html>. | |
2990 | ||
2991 | [record-jar] Raymond, E., "The Art of Unix Programming", 2003, | |
2992 | <urn:isbn:0-13-142901-9>. | |
2993 | ||
3031 | Appendix A. Acknowledgements | |
3032 | ||
3033 | Any list of contributors is bound to be incomplete; please regard the | |
3034 | following as only a selection from the group of people who have | |
3035 | contributed to make this document what it is today. | |
3036 | ||
3037 | The contributors to RFC 3066 and RFC 1766, the precursors of this | |
3038 | document, made enormous contributions directly or indirectly to this | |
3039 | document and are generally responsible for the success of language | |
3040 | tags. | |
3041 | ||
3042 | The following people (in alphabetical order) contributed to this | |
3043 | document or to RFCs 1766 and 3066: | |
3044 | ||
3045 | Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet, | |
3046 | Nathaniel Borenstein, Karen Broome, Eric Brunner, Sean M. Burke, M.T. | |
3047 | Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter | |
3048 | Constable, John Cowan, Mark Crispin, Dave Crocker, Elwyn Davies, | |
3049 | Martin Duerst, Frank Ellerman, Michael Everson, Doug Ewell, Ned | |
3050 | Freed, Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpern, | |
3051 | Elliotte Rusty Harold, Paul Hoffman, Scott Hollenbeck, Richard | |
3052 | Ishida, Olle Jarnefors, Kent Karlsson, John Klensin, Erkki | |
3053 | Kolehmainen, Alain LaBonte, Eric Mader, Ira McDonald, Keith Moore, | |
3054 | Chris Newman, Masataka Ohta, Dylan Pierce, Randy Presuhn, George | |
3055 | Rhoten, Felix Sasaki, Markus Scherer, Keld Jorn Simonsen, Thierry | |
3056 | Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys Weatherley, Misha | |
3057 | Wolf, Francois Yergeau and many, many others. | |
3058 | ||
3059 | Very special thanks must go to Harald Tveit Alvestrand, who | |
3060 | originated RFCs 1766 and 3066, and without whom this document would | |
3061 | not have been possible. Special thanks must go to Michael Everson, | |
3062 | who has served as Language Tag Reviewer for almost the complete | |
3063 | period since the publication of RFC 1766. Special thanks to Doug | |
3064 | Ewell, for his production of the first complete subtag registry, and | |
3065 | his work in producing a test parser for verifying language tags. | |
3066 | ||
3087 | Appendix B. Examples of Language Tags (Informative) | |
3088 | ||
3089 | Simple language subtag: | |
3090 | ||
3091 | de (German) | |
3092 | ||
3093 | fr (French) | |
3094 | ||
3095 | ja (Japanese) | |
3096 | ||
3097 | i-enochian (example of a grandfathered tag) | |
3098 | ||
3099 | Language subtag plus Script subtag: | |
3100 | ||
3101 | zh-Hant (Chinese written using the Traditional Chinese script) | |
3102 | ||
3103 | zh-Hans (Chinese written using the Simplified Chinese script) | |
3104 | ||
3105 | sr-Cyrl (Serbian written using the Cyrillic script) | |
3106 | ||
3107 | sr-Latn (Serbian written using the Latin script) | |
3108 | ||
3109 | Language-Script-Region: | |
3110 | ||
3111 | zh-Hans-CN (Chinese written using the Simplified script as used in | |
3112 | mainland China) | |
3113 | ||
3114 | sr-Latn-CS (Serbian written using the Latin script as used in | |
3115 | Serbia and Montenegro) | |
3116 | ||
3117 | Language-Variant: | |
3118 | ||
3119 | sl-rozaj (Resian dialect of Slovenian) | |
3120 | ||
3121 | sl-nedis (Nadiza dialect of Slovenian) | |
3122 | ||
3123 | Language-Region-Variant: | |
3124 | ||
3125 | de-CH-1901 (German as used in Switzerland using the 1901 variant | |
3126 | [orthography]) | |
3127 | ||
3128 | sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect) | |
3129 | ||
3143 | Language-Script-Region-Variant: | |
3144 | ||
3145 | sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the | |
3146 | Latin script as used in Italy. Note that this tag is NOT | |
3147 | RECOMMENDED because subtag 'sl' has a Suppress-Script value of | |
3148 | 'Latn') | |
3149 | ||
3150 | Language-Region: | |
3151 | ||
3152 | de-DE (German for Germany) | |
3153 | ||
3154 | en-US (English as used in the United States) | |
3155 | ||
3156 | es-419 (Spanish appropriate for the Latin America and Caribbean | |
3157 | region using the UN region code) | |
3158 | ||
3159 | Private use subtags: | |
3160 | ||
3161 | de-CH-x-phonebk | |
3162 | ||
3163 | az-Arab-x-AZE-derbend | |
3164 | ||
3165 | Extended language subtags (examples ONLY: extended languages MUST be | |
3166 | defined by revision or update to this document): | |
3167 | ||
3168 | zh-min | |
3169 | ||
3170 | zh-min-nan-Hant-CN | |
3171 | ||
3172 | Private use registry values: | |
3173 | ||
3174 | x-whatever (private use using the singleton 'x') | |
3175 | ||
3176 | qaa-Qaaa-QM-x-southern (all private tags) | |
3177 | ||
3178 | de-Qaaa (German, with a private script) | |
3179 | ||
3180 | sr-Latn-QM (Serbian, Latin-script, private region) | |
3181 | ||
3182 | sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro) | |
3183 | ||
3184 | Tags that use extensions (examples ONLY: extensions MUST be defined | |
3185 | by revision or update to this document or by RFC): | |
3186 | ||
3187 | en-US-u-islamCal | |
3188 | ||
3189 | zh-CN-a-myExt-x-private | |
3190 | ||
3199 | en-a-myExt-b-another | |
3200 | ||
3201 | Some Invalid Tags: | |
3202 | ||
3203 | de-419-DE (two region subtags) | |
3204 | ||
3205 | a-DE (use of a single-character subtag in primary position; note | |
3206 | that there are a few grandfathered tags that start with "i-" that | |
3207 | are valid) | |
3208 | ||
3209 | ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter | |
3210 | prefix) | |
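The last invalid example can be detected mechanically. The Python sketch below reports extension singletons that occur more than once. It deliberately ignores everything after an 'x' singleton, since those subtags are private use, and it does not detect the other two kinds of error shown above. The name repeated_singletons is hypothetical.

```python
def repeated_singletons(tag):
    """Return the extension singletons that appear more than once in tag."""
    subtags = tag.lower().split("-")
    if subtags[0] == "x":
        return []  # the whole tag is private use
    seen = set()
    repeated = []
    for subtag in subtags[1:]:
        if subtag == "x":
            break  # the remainder is private use
        if len(subtag) == 1:
            if subtag in seen and subtag not in repeated:
                repeated.append(subtag)
            seen.add(subtag)
    return repeated


print(repeated_singletons("ar-a-aaa-b-bbb-a-ccc"))  # ['a']
print(repeated_singletons("en-a-myExt-b-another"))  # []
```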
3211 | ||
3212 | Authors' Addresses | |
3213 | ||
3214 | Addison Phillips (Editor) | |
3215 | Yahoo! Inc. | |
3216 | ||
3217 | EMail: addison@inter-locale.com | |
3218 | ||
3219 | ||
3220 | Mark Davis (Editor) | |
3221 | ||
3222 | ||
3223 | EMail: mark.davis@macchiato.com or mark.davis@google.com | |
3224 | ||
3255 | Full Copyright Statement | |
3256 | ||
3257 | Copyright (C) The Internet Society (2006). | |
3258 | ||
3259 | This document is subject to the rights, licenses and restrictions | |
3260 | contained in BCP 78, and except as set forth therein, the authors | |
3261 | retain all their rights. | |
3262 | ||
3263 | This document and the information contained herein are provided on an | |
3264 | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |
3265 | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | |
3266 | ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | |
3267 | INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | |
3268 | INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | |
3269 | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |
3270 | ||
3271 | Intellectual Property | |
3272 | ||
3273 | The IETF takes no position regarding the validity or scope of any | |
3274 | Intellectual Property Rights or other rights that might be claimed to | |
3275 | pertain to the implementation or use of the technology described in | |
3276 | this document or the extent to which any license under such rights | |
3277 | might or might not be available; nor does it represent that it has | |
3278 | made any independent effort to identify any such rights. Information | |
3279 | on the procedures with respect to rights in RFC documents can be | |
3280 | found in BCP 78 and BCP 79. | |
3281 | ||
3282 | Copies of IPR disclosures made to the IETF Secretariat and any | |
3283 | assurances of licenses to be made available, or the result of an | |
3284 | attempt made to obtain a general license or permission for the use of | |
3285 | such proprietary rights by implementers or users of this | |
3286 | specification can be obtained from the IETF on-line IPR repository at | |
3287 | http://www.ietf.org/ipr. | |
3288 | ||
3289 | The IETF invites any interested party to bring to its attention any | |
3290 | copyrights, patents or patent applications, or other proprietary | |
3291 | rights that may cover technology that may be required to implement | |
3292 | this standard. Please address the information to the IETF at | |
3293 | ietf-ipr@ietf.org. | |
3294 | ||
3295 | Acknowledgement | |
3296 | ||
3297 | Funding for the RFC Editor function is provided by the IETF | |
3298 | Administrative Support Activity (IASA). | |