@node Message Translation, Searching and Sorting, Locales, Top
@c %MENU% How to make the program speak the user's language
@chapter Message Translation

The program's interface with the user should be designed to ease the user's
task. One way to ease the user's task is to use messages in whatever
language the user prefers.

Printing messages in different languages can be implemented in different
ways. One could add all the different languages in the source code and
choose among the variants every time a message has to be printed. This is
certainly not a good solution since extending the set of languages is
cumbersome (the code must be changed) and the code itself can become
really big with dozens of message sets.

A better solution is to keep the message sets for each language
in separate files which are loaded at runtime depending on the language
selection of the user.

@Theglibc{} provides two different sets of functions to support
message translation. The problem is that neither of the interfaces is
officially defined by the POSIX standard. The @code{catgets} family of
functions is defined in the X/Open standard but this is derived from
industry decisions and therefore not necessarily based on reasonable
decisions.

As mentioned above, the message catalog handling provides easy
extendability by using external data files which contain the message
translations. I.e., these files contain for each of the messages used
in the program a translation for the appropriate language. So the tasks
of the message handling functions are

@itemize @bullet
@item
locate the external data file with the appropriate translations
@item
load the data and make it possible to address the messages
@item
map a given key to the translated message
@end itemize

The two approaches mainly differ in the implementation of this last
step. Decisions made in the last step influence the rest of the design.

@menu
* Message catalogs a la X/Open::  The @code{catgets} family of functions.
* The Uniforum approach::         The @code{gettext} family of functions.
@end menu


@node Message catalogs a la X/Open
@section X/Open Message Catalog Handling

The @code{catgets} functions are based on the simple scheme:

@quotation
Associate every message to translate in the source code with a unique
identifier. To retrieve a message from a catalog file solely the
identifier is used.
@end quotation

This means for the author of the program that s/he will have to make
sure the meaning of the identifier in the program code and in the
message catalogs is always the same.

Before a message can be translated the catalog file must be located.
The user of the program must be able to guide the responsible function
to find whatever catalog the user wants. This is separated from what
the programmer had in mind.

All the types, constants and functions for the @code{catgets} functions
are defined/declared in the @file{nl_types.h} header file.

@menu
* The catgets Functions::      The @code{catgets} function family.
* The message catalog files::  Format of the message catalog files.
* The gencat program::         How to generate message catalog files which
                               can be used by the functions.
* Common Usage::               How to use the @code{catgets} interface.
@end menu


@node The catgets Functions
@subsection The @code{catgets} function family

@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
@standards{X/Open, nl_types.h}
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c catopen @mtsenv @ascuheap @acsmem
@c strchr ok
@c setlocale(,NULL) ok
@c getenv @mtsenv
@c strlen ok
@c alloca ok
@c stpcpy ok
@c malloc @ascuheap @acsmem
@c __open_catalog @ascuheap @acsmem
@c strchr ok
@c open_not_cancel_2 @acsfd
@c strlen ok
@c ENOUGH ok
@c alloca ok
@c memcpy ok
@c fxstat64 ok
@c __set_errno ok
@c mmap @acsmem
@c malloc dup @ascuheap @acsmem
@c read_not_cancel ok
@c free dup @ascuheap @acsmem
@c munmap ok
@c close_not_cancel_no_status ok
@c free @ascuheap @acsmem
The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it when found. The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.

The return value is @code{(nl_catd) -1} in case the function failed and
no catalog was loaded. The global variable @code{errno} contains a code
for the error causing the failure. But even if the function call
succeeded this does not mean that all messages can be translated.

Locating the catalog file must happen in a way which lets the user of
the program influence the decision. It is up to the user to decide
about the language to use and sometimes it is useful to use alternate
catalog files. All this can be specified by the user by setting some
environment variables.

The first problem is to find out where all the message catalogs are
stored. Every program could have its own place to keep all the
different files but usually the catalog files are grouped by languages
and the catalogs for all programs are kept in the same place.

@cindex NLSPATH environment variable
To tell the @code{catopen} function where the catalog for the program
can be found the user can set the environment variable @code{NLSPATH} to
a value which describes her/his choice. Since this value must be usable
for different languages and locales it cannot be a simple string.
Instead it is a format string (similar to @code{printf}'s). An example
is

@smallexample
/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
@end smallexample

First one can see that more than one directory can be specified (with
the usual syntax of separating them by colons). The next things to
observe are the format elements, @code{%L} and @code{%N} in this case.
The @code{catopen} function knows about several of them and each one
is replaced by a different value.

@table @code
@item %N
This format element is substituted with the name of the catalog file.
This is the value of the @var{cat_name} argument given to
@code{catopen}.

@item %L
This format element is substituted with the name of the currently
selected locale for translating messages. How this is determined is
explained below.

@item %l
(This is the lowercase ell.) This format element is substituted with the
language element of the locale name. The string describing the selected
locale is expected to have the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
first part @var{lang}.

@item %t
This format element is substituted by the territory part @var{terr} of
the name of the currently selected locale. See the explanation of the
format above.

@item %c
This format element is substituted by the codeset part @var{codeset} of
the name of the currently selected locale. See the explanation of the
format above.

@item %%
Since @code{%} is used as a meta character there must be a way to
express the @code{%} character in the result itself. Using @code{%%}
does this just like it works for @code{printf}.
@end table


Using @code{NLSPATH} allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the @code{NLSPATH} environment variable is not set, the default value
is

@smallexample
@var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
@end smallexample

@noindent
where @var{prefix} is given to @code{configure} while installing @theglibc{}
(this value is in many cases @code{/usr} or the empty string).

The remaining problem is to decide which locale name must be used. This
value determines the substitution of the format elements mentioned above.
First of all the catalog name passed to @code{catopen} can contain a path
(i.e., the name contains a slash character). In this situation the
@code{NLSPATH} environment variable is not used. The catalog must exist
as specified in the program, perhaps relative to the current working
directory. This situation is not desirable and catalog names should
never be written this way. Besides, this behavior is not portable
to all other platforms providing the @code{catgets} interface.

@cindex LC_ALL environment variable
@cindex LC_MESSAGES environment variable
@cindex LANG environment variable
Otherwise the values of environment variables from the standard
environment are examined (@pxref{Standard Environment}). Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined
in @file{nl_types.h}) then the @code{catopen} function uses the name of
the locale currently selected for the @code{LC_MESSAGES} category.

If @var{flag} is zero the @code{LANG} environment variable is examined.
This is a left-over from the early days when the concept of locales
had not even reached the level of POSIX locales.

The environment variable and the locale name should have a value of the
form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.
If no environment variable is set the @code{"C"} locale is used which
prevents any translation.

The return value of @code{catgets} is in any case a valid string. Either
it is a translation from a message catalog or it is the same as the
@var{string} parameter. So a piece of code to decide whether a
translation actually happened must look like this:

@smallexample
@{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    @{
      /* Something went wrong: no translation was found.  */
    @}
@}
@end smallexample

@noindent
When an error occurs the global variable @code{errno} is set to

@table @var
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message tuple does not name an existing element in the
message catalog.
@end table

While it sometimes can be useful to test for errors, programs normally
will avoid any test. If the translation is not available it is no big
problem if the original, untranslated message is printed. Either the
user understands this as well or s/he will look for the reason why the
messages are not translated.
@end deftypefun

Please note that the currently selected locale does not depend on a call
to the @code{setlocale} function. It is not necessary that the locale
data files for this locale exist and calling @code{setlocale} succeeds.
The @code{catopen} function directly reads the values of the environment
variables.


@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{catgets} has to be used to access the message catalog
previously opened using the @code{catopen} function. The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.

The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files. This will be
explained in detail below. For now it is interesting to know that a
catalog can consist of several sets and the messages in each set are
individually numbered. Neither the set numbers nor the
message numbers need to be consecutive. They can be arbitrarily chosen.
But each message (unless equal to another one) must have its own unique
pair of set and message numbers.

Since it is not guaranteed that the message catalog for the language
selected by the user exists the last parameter @var{string} helps to
handle this case gracefully. If no matching string can be found
@var{string} is returned. This means for the programmer that

@itemize @bullet
@item
the @var{string} parameters should contain reasonable text (this also
helps to understand the program since otherwise there would be no hint
on the string which is expected to be returned).
@item
all @var{string} arguments should be written in the same language.
@end itemize
@end deftypefun

It is somewhat uncomfortable to write a program using the @code{catgets}
functions if no supporting functionality is available. Since each
set/message number tuple must be unique the programmer must keep lists
of the messages at the same time the code is written. And the work
between several people working on the same project must be coordinated.
We will see how some of these problems can be relaxed a bit (@pxref{Common
Usage}).

@deftypefun int catclose (nl_catd @var{catalog_desc})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c catclose @ascuheap @acucorrupt @acsmem
@c __set_errno ok
@c munmap ok
@c free @ascuheap @acsmem
The @code{catclose} function can be used to free the resources
associated with a message catalog which previously was opened by a call
to @code{catopen}. If the resources can be successfully freed the
function returns @code{0}. Otherwise it returns @code{@minus{}1} and the
global variable @code{errno} is set. Errors can occur if the catalog
descriptor @var{catalog_desc} is not valid in which case @code{errno} is
set to @code{EBADF}.
@end deftypefun


@node The message catalog files
@subsection Format of the message catalog files

The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
@code{catopen} function is to hand all the message text to the
translator and let her/him translate them all. I.e., we must have a
file with entries which associate the set/message tuple with a specific
translation. This file format is specified in the X/Open standard and
is as follows:

@itemize @bullet
@item
Lines containing only whitespace characters or empty lines are ignored.

@item
Lines which contain as the first non-whitespace character a @code{$}
followed by a whitespace character are comments and are also ignored.

@item
If a line contains as the first non-whitespace characters the sequence
@code{$set} followed by a whitespace character an additional argument
is required to follow. This argument can either be:

@itemize @minus
@item
a number. In this case the value of this number determines the set
to which the following messages are added.

@item
an identifier consisting of alphanumeric characters plus the underscore
character. In this case the set is automatically assigned a number.
This value is one more than the largest set number which has appeared
so far.

How to use the symbolic names is explained in section @ref{Common Usage}.

It is an error if a symbol name appears more than once. All following
messages are placed in a set with this number.
@end itemize

@item
If a line contains as the first non-whitespace characters the sequence
@code{$delset} followed by a whitespace character an additional argument
is required to follow. This argument can either be:

@itemize @minus
@item
a number. In this case the value of this number determines the set
which will be deleted.

@item
an identifier consisting of alphanumeric characters plus the underscore
character. This symbolic identifier must match a name for a set which
previously was defined. It is an error if the name is unknown.
@end itemize

In both cases all messages in the specified set will be removed. They
will not appear in the output. But if this set is later selected again
with a @code{$set} command, messages can be added again and these
messages will appear in the output.

@item
If a line contains after leading whitespaces the sequence
@code{$quote}, the quoting character used for this input file is
changed to the first non-whitespace character following
@code{$quote}. If no non-whitespace character is present before the
line ends quoting is disabled.

By default no quoting character is used. In this mode strings are
terminated with the first unescaped line break. If there is a
@code{$quote} sequence present newline need not be escaped. Instead a
string is terminated with the first unescaped appearance of the quote
character.

A common usage of this feature would be to set the quote character to
@code{"}. Then any appearance of the @code{"} in the strings must
be escaped using the backslash (i.e., @code{\"} must be written).

@item
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included). The following characters
(starting after the first whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.

If the start of the line is a number the message number is obvious. It
is an error if the same message number already appeared for this set.

If the leading token was an identifier the message number gets
automatically assigned. The value is the current maximum message
number for this set plus one. It is an error if the identifier was
already used for a message in this set. It is OK to reuse the
identifier for a message in another set. How to use the symbolic
identifiers will be explained below (@pxref{Common Usage}). There is
one limitation with the identifier: it must not be @code{Set}. The
reason will be explained below.

The text of the messages can contain escape characters. The usual bunch
of characters known from the @w{ISO C} language are recognized
(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
a character code).
@end itemize

@strong{Important:} The handling of identifiers instead of numbers for
the set and messages is a GNU extension. Systems strictly following the
X/Open specification do not have this feature. An example for a message
catalog file is this:

@smallexample
$ This is a leading comment.
$quote "

$set SetOne
1 Message with ID 1.
two " Message with ID \"two\", which gets the value 2 assigned"

$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
@end smallexample

This small example shows various aspects:
@itemize @bullet
@item
Lines 1 and 9 are comments since they start with @code{$} followed by
a whitespace.
@item
The quoting character is set to @code{"}. Otherwise the quotes in the
message definition would have to be omitted and in this case the
message with the identifier @code{two} would lose its leading whitespace.
@item
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
@end itemize


While this file format is pretty easy it is not the best possible for
use in a running program. The @code{catopen} function would have to
parse the file and handle syntactic errors gracefully. This is not so
easy and the whole process is pretty slow. Therefore the @code{catgets}
functions expect the data in another more compact and ready-to-use file
format. Files in this format are produced by a special program,
@code{gencat}, which is explained in detail in the next section.

Files in this other format are not human readable. To be easy to use by
programs it is a binary file. But the format is byte order independent
so translation files can be shared by systems of arbitrary architecture
(as long as they use @theglibc{}).

Details about the binary file format are not important to know since
these files are always created by the @code{gencat} program. The
sources of @theglibc{} also provide the sources for the
@code{gencat} program and so the interested reader can look through
these source files to learn about the file format.


@node The gencat program
@subsection Generating Message Catalog Files

@cindex gencat
The @code{gencat} program is specified in the X/Open standard and the
GNU implementation follows this specification and so processes
all correctly formed input files. Additionally some extensions are
implemented which help to work in a more reasonable way with the
@code{catgets} functions.

The @code{gencat} program can be invoked in two ways:

@example
`gencat [@var{Option} @dots{}] [@var{Output-File} [@var{Input-File} @dots{}]]`
@end example

This is the interface defined in the X/Open standard. If no
@var{Input-File} parameter is given, input will be read from standard
input. Multiple input files will be read as if they were concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output. To provide the interface one is used to from other
programs a second interface is provided.

@smallexample
`gencat [@var{Option} @dots{}] -o @var{Output-File} [@var{Input-File} @dots{}]`
@end smallexample

The option @samp{-o} is used to specify the output file and all file
arguments are used as input files.

Besides this, one can use @file{-} or @file{/dev/stdin} for
@var{Input-File} to denote the standard input. Correspondingly one can
use @file{-} and @file{/dev/stdout} for @var{Output-File} to denote
standard output. Using @file{-} as a file name is allowed in X/Open
while using the device names is a GNU extension.

The @code{gencat} program works by concatenating all input files and
then @strong{merging} the resulting collection of message sets with a
possibly existing output file. This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages. To
regenerate a catalog file while ignoring the old contents therefore
requires removing the output file if it exists. If the output is
written to standard output no merging takes place.

@noindent
The following table shows the options understood by the @code{gencat}
program. The X/Open standard does not specify any options for the
program so all of these are GNU extensions.

@table @samp
@item -V
@itemx --version
Print the version information and exit.
@item -h
@itemx --help
Print a usage message listing all available options, then exit successfully.
@item --new
Do not merge the new messages from the input files with the old content
of the output file. The old content of the output file is discarded.
@item -H
@itemx --header=name
This option is used to emit the symbolic names given to sets and
messages in the input files for use in the program. Details about how
to use this are given in the next section. The @var{name} parameter to
this option specifies the name of the header file to write. It will
contain a number of C preprocessor @code{#define}s to associate a name
with a number.

Please note that the generated file only contains the symbols from the
input files. If the output is merged with the previous content of the
output file the possibly existing symbols from the file(s) which
generated the old output files are not in the generated header file.
@end table


@node Common Usage
@subsection How to use the @code{catgets} interface

The @code{catgets} functions can be used in two different ways: by
strictly following the X/Open specification without relying on the
extensions, or by using the GNU extensions. We will take a look at the
former method first to understand the benefits of extensions.

@subsubsection Not using symbolic names

Since the X/Open format of the message catalog files does not allow
symbol names we have to work with numbers all the time. When we start
writing a program we have to replace all appearances of translatable
strings with something like

@smallexample
catgets (catdesc, set, msg, "string")
@end smallexample

@noindent
@code{catdesc} is the descriptor returned by a call to @code{catopen}
which is normally done once at the program start. The @code{"string"} is
the string we want to translate. The problems start with the set and
message numbers.

In a bigger program several programmers usually work at the same time on
the program and so coordinating the number allocation is crucial.
Though no two different strings may be indexed by the same tuple of
numbers it is highly desirable to reuse the numbers for equal strings
with equal translations (please note that there might be strings which
are equal in one language but have different translations due to
different contexts).

The allocation process can be relaxed a bit by using different set
numbers for different parts of the program. So the number of developers
who have to coordinate the allocation can be reduced. But still lists
must be kept to track the allocation and errors can easily happen. These
errors cannot be discovered by the compiler or the @code{catgets}
functions. Only the user of the program might see wrong messages
printed. In the worst cases the messages are so misleading that they
cannot be recognized as wrong. Think about the translations for
@code{"true"} and @code{"false"} being exchanged. This could result in
a disaster.


@subsubsection Using symbolic names

The problems mentioned in the last section derive from the fact that:

@enumerate
@item
the numbers are allocated once and due to the possibly frequent use of
them it is difficult to change a number later.
@item
the numbers do not allow guessing anything about the string and
therefore collisions can easily happen.
@end enumerate

By consistently using symbolic names and by providing a method which maps
the string content to a symbolic name (however this will happen) one can
prevent both problems above. The cost of this is that the programmer
has to write a complete message catalog file while s/he is writing the
program itself.

This is necessary since the symbolic names must be mapped to numbers
before the program sources can be compiled. In the last section it was
described how to generate a header containing the mapping of the names.
E.g., for the example message file given in the last section we could
call the @code{gencat} program as follows (assume @file{ex.msg} contains
the sources).

@smallexample
gencat -H ex.h -o ex.cat ex.msg
@end smallexample

@noindent
This generates a header file with the following content:

@smallexample
#define SetTwoSet 0x2   /* ex.msg:8 */

#define SetOneSet 0x1   /* ex.msg:4 */
#define SetOnetwo 0x2   /* ex.msg:6 */
@end smallexample

As can be seen the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned. Reading the source file and knowing about the rules allows
one to predict the content of the header file (it is deterministic)
but this is not necessary. The @code{gencat} program can take care of
everything. All the programmer has to do is to put the generated header
file in the dependency list of the source files of her/his project and
add a rule to regenerate the header if any of the input files change.

One word about the symbol mangling. Every symbol consists of two parts:
the name of the message set plus the name of the message or the special
string @code{Set}. So @code{SetOnetwo} means this macro can be used to
access the translation with identifier @code{two} in the message set
@code{SetOne}.

The other names denote the names of the message sets. The special
string @code{Set} is used in the place of the message identifier.

If in the code the second string of the set @code{SetOne} is used the C
code should look like this:

@smallexample
catgets (catdesc, SetOneSet, SetOnetwo,
         " Message with ID \"two\", which gets the value 2 assigned")
@end smallexample

Writing the function call this way allows changing the message number
and even the set number without requiring any change in the C source
code. (The text of the string is normally not the same; this is only
for this example.)


@subsubsection How this helps development

To illustrate the usual way to work with the symbolic names
here is a little example. Assume we want to write the very complex and
famous greeting program. We start by writing the code as usual:

@smallexample
#include <stdio.h>
int
main (void)
@{
  printf ("Hello, world!\n");
  return 0;
@}
@end smallexample

Now we want to internationalize the message and therefore replace the
message with whatever the user wants.

692@smallexample
693#include <nl_types.h>
694#include <stdio.h>
695#include "msgnrs.h"
696int
697main (void)
698@{
699 nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
fed8f7f7 700 printf (catgets (catdesc, SetMainSet, SetMainHello,
838e5ffe 701 "Hello, world!\n"));
40a55d20
UD
702 catclose (catdesc);
703 return 0;
704@}
705@end smallexample
706
We see how the catalog object is opened and how the returned descriptor
is used in the other function calls. It is not really necessary to check
for failure of any of the functions since even in these situations the
functions will behave reasonably. If the catalog cannot be opened or the
message is not found, @code{catgets} simply returns the default string
passed as its last argument.
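
The fallback behavior can be sketched as follows. This is only an
illustration, not part of the library: the helper name
@code{fetch_message} is hypothetical, and the explicit check of the
descriptor is a defensive choice for code that may also be built against
other C libraries.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message MSG of set SET from the catalog
   CATALOG_NAME into OUT.  If the catalog cannot be opened or the
   message does not exist, OUT receives FALLBACK instead.
   Returns 1 if the text came from the catalog, 0 otherwise.  */
int
fetch_message (const char *catalog_name, int set, int msg,
               const char *fallback, char *out, size_t outsize)
{
  nl_catd catdesc = catopen (catalog_name, NL_CAT_LOCALE);
  const char *text = fallback;

  /* Defensive check; an unopened catalog yields the fallback.  */
  if (catdesc != (nl_catd) -1)
    text = catgets (catdesc, set, msg, fallback);

  int from_catalog = text != fallback;

  /* Copy before closing: the returned pointer may refer to storage
     owned by the catalog.  */
  snprintf (out, outsize, "%s", text);

  if (catdesc != (nl_catd) -1)
    catclose (catdesc);
  return from_catalog;
}
```

A caller would pass the catalog name, the set and message numbers, and
the untranslated text; whatever happens, the output buffer holds a
string that can be printed.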
712
What remains unspecified here are the constants @code{MainSet} and
@code{MainHello}. These are the symbolic names describing the
message. To get the actual definitions which match the information in
the catalog file we have to create the message catalog source file and
process it using the @code{gencat} program.
718
719@smallexample
720$ Messages for the famous greeting program.
721$quote "
722
723$set Main
724Hello "Hallo, Welt!\n"
725@end smallexample
726
727Now we can start building the program (assume the message catalog source
728file is named @file{hello.msg} and the program source file @file{hello.c}):
729
730@smallexample
731% gencat -H msgnrs.h -o hello.cat hello.msg
732% cat msgnrs.h
733#define MainSet 0x1 /* hello.msg:4 */
734#define MainHello 0x1 /* hello.msg:5 */
735% gcc -o hello hello.c -I.
736% cp hello.cat /usr/share/locale/de/LC_MESSAGES
737% echo $LC_ALL
738de
739% ./hello
740Hallo, Welt!
741%
742@end smallexample
743
The call of the @code{gencat} program creates the missing header file
@file{msgnrs.h} as well as the message catalog binary. The former is
used in the compilation of @file{hello.c} while the latter is placed in a
directory in which the @code{catopen} function will try to locate it.
Please check the @code{LC_ALL} environment variable and the default path
for @code{catopen} presented in the description above.
750
751
752@node The Uniforum approach
753@section The Uniforum approach to Message Translation
754
Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group. There never was a real standard
defined, but the interface was nevertheless used in Sun's operating
systems. Since this approach fits better into the development process of
free software, it is also used throughout the GNU project, and the GNU
@file{gettext} package provides support for this outside @theglibc{}.
761
The code of the @file{libintl} from GNU @file{gettext} is the same as
the code in @theglibc{}. So the documentation in the GNU
@file{gettext} manual is also valid for the functionality here. The
following text describes the library functions in detail, but the
numerous helper programs are not described in this manual. Instead,
people should read the GNU @file{gettext} manual
(@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
We will only give a short overview.
770
Though the @code{catgets} functions are available by default on more
systems, the @code{gettext} interface is at least as portable as the
former. The GNU @file{gettext} package can be used wherever the
functions are not available.
775
776
777@menu
778* Message catalogs with gettext:: The @code{gettext} family of functions.
779* Helper programs for gettext:: Programs to handle message catalogs
780 for @code{gettext}.
781@end menu
782
783
784@node Message catalogs with gettext
785@subsection The @code{gettext} family of functions
786
Although the paradigms underlying the @code{gettext} approach to message
translation differ from those of the @code{catgets} functions, the
basic functionality is equivalent. There are functions of the following
categories:
791
792@menu
793* Translation with gettext:: What has to be done to translate a message.
794* Locating gettext catalog:: How to determine which catalog to be used.
795* Advanced gettext functions:: Additional functions for more complicated
796 situations.
797* Charset conversion in gettext:: How to specify the output character set
798 @code{gettext} uses.
799* GUI program problems:: How to use @code{gettext} in GUI programs.
800* Using gettextized software:: The possibilities of the user to influence
801 the way @code{gettext} works.
802@end menu
803
804@node Translation with gettext
805@subsubsection What has to be done to translate a message?
806
807The @code{gettext} functions have a very simple interface. The most
808basic function just takes the string which shall be translated as the
809argument and it returns the translation. This is fundamentally
810different from the @code{catgets} approach where an extra key is
811necessary and the original string is only used for the error case.
812
If the string which has to be translated is the only argument, this of
course means the string itself is the key. I.e., the translation will
be selected based on the original string. The message catalogs must
therefore contain the original strings plus one translation for any such
string. The task of the @code{gettext} function is to compare the
argument string with the available strings in the catalog and return the
appropriate translation. Of course this process is optimized so that
it is not more expensive than an access using an atomic key
as in @code{catgets}.
822
823The @code{gettext} approach has some advantages but also some
824disadvantages. Please see the GNU @file{gettext} manual for a detailed
825discussion of the pros and cons.
826
827All the definitions and declarations for @code{gettext} can be found in
828the @file{libintl.h} header file. On systems where these functions are
829not part of the C library they can be found in a separate library named
830@file{libintl.a} (or accordingly different for shared libraries).
831
832@deftypefun {char *} gettext (const char *@var{msgid})
@standards{GNU, libintl.h}
834@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
835@c Wrapper for dcgettext.
836The @code{gettext} function searches the currently selected message
837catalogs for a string which is equal to @var{msgid}. If there is such a
838string available it is returned. Otherwise the argument string
839@var{msgid} is returned.
840
Please note that although the return value is @code{char *} the
returned string must not be changed. This broken type results from the
history of the function and does not reflect the way the function should
be used.
845
Please note that above we wrote ``message catalogs'' (plural). This is
a specialty of the GNU implementation of these functions and we will
say more about this when we talk about the ways message catalogs are
selected (@pxref{Locating gettext catalog}).
850
The @code{gettext} function does not modify the value of the global
@code{errno} variable. This is necessary to make it possible to write
something like
854
855@smallexample
856 printf (gettext ("Operation failed: %m\n"));
857@end smallexample
858
Here the @code{errno} value is used in the @code{printf} function while
processing the @code{%m} format element. If the @code{gettext}
function changed this value (it is called before @code{printf} is
called) we would get a wrong message.

So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result. But it is normally the
task of the user to react to missing catalogs. The program cannot guess
when a message catalog is really necessary since for a user who speaks
the language the program was developed in, the message does not need any
translation.
869@end deftypefun
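
The comparison of the argument string with the result, as mentioned in
the description of @code{gettext}, could be sketched like this. The
helper name @code{is_untranslated} is hypothetical, and the check cannot
tell a missing catalog from a translation that happens to be identical
to the original.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if gettext handed back a string
   equal to MSGID, i.e. no translation was found or the translation is
   identical to the original.  */
int
is_untranslated (const char *msgid)
{
  return strcmp (gettext (msgid), msgid) == 0;
}
```

Most programs never need such a check; it is shown only to make the
return value convention concrete.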
870
871The remaining two functions to access the message catalog add some
872functionality to select a message catalog which is not the default one.
873This is important if parts of the program are developed independently.
874Every part can have its own message catalog and all of them can be used
875at the same time. The C library itself is an example: internally it
876uses the @code{gettext} functions but since it must not depend on a
877currently selected default message catalog it must specify all ambiguous
878information.
879
880@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
@standards{GNU, libintl.h}
882@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
883@c Wrapper for dcgettext.
The @code{dgettext} function acts just like the @code{gettext}
function. It only takes an additional first argument @var{domainname}
which guides the selection of the message catalogs which are searched
for the translation. If the @var{domainname} parameter is the null
pointer the @code{dgettext} function is exactly equivalent to
@code{gettext} since the default value for the domain name is used.
890
As for @code{gettext} the return value type is @code{char *} which is an
anachronism. The returned string must never be modified.
893@end deftypefun
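
An independently developed program part can use @code{dgettext} with its
own domain so that it does not depend on the default domain. The
following is a minimal sketch; the domain name @code{mylib} and the
function @code{mylib_describe} are assumptions made up for this
illustration.

```c
#include <libintl.h>

/* "mylib" is a hypothetical domain name for an independently
   developed library.  */
#define MYLIB_DOMAIN "mylib"

/* Return a (possibly translated) description for CODE.  The lookup
   always uses the library's own domain, whatever default domain the
   main program has selected.  */
const char *
mylib_describe (int code)
{
  switch (code)
    {
    case 0:
      return dgettext (MYLIB_DOMAIN, "success");
    default:
      return dgettext (MYLIB_DOMAIN, "unknown error");
    }
}
```

If no catalog for the domain is found, the original English strings are
returned unchanged.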
894
895@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
@standards{GNU, libintl.h}
897@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
898@c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
899@c dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
900@c libc_rwlock_rdlock @asulock @aculock
901@c current_locale_name ok [protected from @mtslocale]
902@c tfind ok
903@c libc_rwlock_unlock ok
904@c plural_lookup ok
905@c plural_eval ok
906@c rawmemchr ok
907@c DETERMINE_SECURE ok, nothing
908@c strcmp ok
909@c strlen ok
910@c getcwd @ascuheap @acsmem @acsfd
911@c strchr ok
912@c stpcpy ok
913@c category_to_name ok
914@c guess_category_value @mtsenv
915@c getenv @mtsenv
916@c current_locale_name dup ok [protected from @mtslocale by dcigettext]
917@c strcmp ok
918@c ENABLE_SECURE ok
919@c _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
920@c libc_rwlock_rdlock dup @asulock @aculock
921@c _nl_make_l10nflist dup @ascuheap @acsmem
922@c libc_rwlock_unlock dup ok
923@c _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
924@c libc_lock_lock_recursive @aculock
925@c libc_lock_unlock_recursive @aculock
926@c open->open_not_cancel_2 @acsfd
927@c fstat ok
928@c mmap dup @acsmem
929@c close->close_not_cancel_no_status @acsfd
930@c malloc dup @ascuheap @acsmem
931@c read->read_not_cancel ok
932@c munmap dup @acsmem
933@c W dup ok
934@c strlen dup ok
935@c get_sysdep_segment_value ok
936@c memcpy dup ok
937@c hash_string dup ok
938@c free dup @ascuheap @acsmem
939@c libc_rwlock_init ok
940@c _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
941@c libc_rwlock_fini ok
942@c EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
943@c strstr dup ok
944@c isspace ok
945@c strtoul ok
946@c PLURAL_PARSE @ascuheap @acsmem
947@c malloc dup @ascuheap @acsmem
948@c free dup @ascuheap @acsmem
949@c INIT_GERMANIC_PLURAL ok, nothing
950@c the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
951@c _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
952@c _nl_explode_name dup @ascuheap @acsmem
953@c libc_rwlock_wrlock dup @asulock @aculock
954@c free dup @asulock @aculock @acsfd @acsmem
955@c _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
956@c _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
957@c strlen ok
958@c hash_string ok
959@c W ok
960@c SWAP ok
961@c bswap_32 ok
962@c strcmp ok
963@c get_output_charset @mtsenv @ascuheap @acsmem
964@c getenv dup @mtsenv
965@c strlen dup ok
966@c malloc dup @ascuheap @acsmem
967@c memcpy dup ok
968@c libc_rwlock_rdlock dup @asulock @aculock
969@c libc_rwlock_unlock dup ok
970@c libc_rwlock_wrlock dup @asulock @aculock
971@c realloc @ascuheap @acsmem
972@c strdup @ascuheap @acsmem
973@c strstr ok
974@c strcspn ok
975@c mempcpy dup ok
976@c norm_add_slashes dup ok
977@c gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
978@c [protected from @mtslocale by dcigettext locale lock]
979@c free dup @ascuheap @acsmem
980@c libc_lock_lock @asulock @aculock
981@c calloc @ascuheap @acsmem
982@c gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
983@c libc_lock_unlock ok
984@c malloc @ascuheap @acsmem
985@c mempcpy ok
986@c memcpy ok
987@c strcpy ok
988@c libc_rwlock_wrlock @asulock @aculock
989@c tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
990@c transcmp ok
991@c strmp dup ok
992@c free @ascuheap @acsmem
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes. This argument @var{category} specifies the last
piece of information needed to locate the message catalog. I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).
998
999The @code{dgettext} function can be expressed in terms of
1000@code{dcgettext} by using
1001
1002@smallexample
1003dcgettext (domain, string, LC_MESSAGES)
1004@end smallexample
1005
1006@noindent
1007instead of
1008
1009@smallexample
1010dgettext (domain, string)
1011@end smallexample
1012
This also shows which values are expected for the third parameter. One
has to use the selectors for the categories defined in
@file{locale.h}. Normally the available values are @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL}
must not be used and even though the names might suggest this, there is
no relation to the environment variable of this name.
1020
The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions. There is not really
any situation where it is necessary (or useful) to use a different value
than @code{LC_MESSAGES} for the @var{category} parameter. We are
dealing with messages here and any other choice can only be confusing.
1026
As for @code{gettext} the return value type is @code{char *} which is an
anachronism. The returned string must never be modified.
1029@end deftypefun
1030
When using the three functions above in a program it is a frequent case
that the @var{msgid} argument is a constant string. So it is worthwhile to
optimize this case. A moment's thought shows that
as long as no new message catalog is loaded the translation of a message
will not change. This optimization is actually implemented by the
@code{gettext}, @code{dgettext} and @code{dcgettext} functions.
1037
1038
1039@node Locating gettext catalog
1040@subsubsection How to determine which catalog to be used
1041
The functions to retrieve the translations for a given message have a
remarkably simple interface. But to give the user of the program
the opportunity to select exactly the translation s/he wants, and
to give the programmer the possibility to influence the way
catalog files are located, there is a quite complicated
underlying mechanism which controls all this. The code is complicated;
the use is easy.
1049
1050Basically we have two different tasks to perform which can also be
1051performed by the @code{catgets} functions:
1052
1053@enumerate
1054@item
Locate the set of message catalogs. There are a number of files for
different languages which all belong to the package. Usually they
are all stored in the filesystem below a certain directory.

There can be arbitrarily many packages installed and they can follow
different guidelines for the placement of their files.
1061
1062@item
1063Relative to the location specified by the package the actual translation
1064files must be searched, based on the wishes of the user. I.e., for each
1065language the user selects the program should be able to locate the
1066appropriate file.
1067@end enumerate
1068
1069This is the functionality required by the specifications for
1070@code{gettext} and this is also what the @code{catgets} functions are
1071able to do. But there are some problems unresolved:
1072
1073@itemize @bullet
1074@item
The language to be used can be specified in several different ways.
There is no generally accepted standard for this and the user always
expects the program to understand what s/he means. E.g., to select the
German translation one could write @code{de}, @code{german}, or
@code{deutsch} and the program should always react the same.
1080
1081@item
Sometimes the specification of the user is too detailed. If s/he, e.g.,
specifies @code{de_DE.ISO-8859-1}, which means German, spoken in Germany,
coded using the @w{ISO 8859-1} character set, there is the possibility
that a message catalog matching this exactly is not available. But
there could be a catalog matching @code{de} and if the character set
used on the machine is always @w{ISO 8859-1} there is no reason why this
latter message catalog should not be used. (We call this @dfn{message
inheritance}.)
1090
1091@item
If a catalog for a wanted language is not available it is not always the
second best choice to fall back on the language of the developer and
simply not translate any message. Instead a user might be better able
to read the messages in another language and so the user of the program
should be able to define a precedence order of languages.
1097@end itemize
1098
We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user. We will start with
the functions the programmer can use since the user configuration will
be based on this.

As the descriptions of the functions in the last sections already
mention, separate sets of messages can be selected by a @dfn{domain
name}. This is a
simple string which should be unique for each program part that uses a
separate domain. It is possible to use arbitrarily many domains in one
program at the same time. E.g., @theglibc{} itself uses a domain
named @code{libc} while the program using the C Library could use a
domain named @code{foo}. The important point is that at any time
exactly one domain is active. This is controlled with the following
function.
1113
1114@deftypefun {char *} textdomain (const char *@var{domainname})
@standards{GNU, libintl.h}
1116@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
1117@c textdomain @asulock @ascuheap @aculock @acsmem
1118@c libc_rwlock_wrlock @asulock @aculock
1119@c strcmp ok
1120@c strdup @ascuheap @acsmem
1121@c free @ascuheap @acsmem
1122@c libc_rwlock_unlock ok
1123The @code{textdomain} function sets the default domain, which is used in
1124all future @code{gettext} calls, to @var{domainname}. Please note that
1125@code{dgettext} and @code{dcgettext} calls are not influenced if the
1126@var{domainname} parameter of these functions is not the null pointer.
1127
Before the first call to @code{textdomain} the default domain is
@code{messages}. This is the name specified in the specification of
the @code{gettext} API. This name is as good as any other name. No
program should ever really use a domain with this name since this can
only lead to problems.

The function returns the value which is from now on taken as the default
domain. If the system ran out of memory the returned value is
@code{NULL} and the global variable @code{errno} is set to @code{ENOMEM}.
Despite the return value type being @code{char *} the returned string must
not be changed. It is allocated internally by the @code{textdomain}
function.
1140
1141If the @var{domainname} parameter is the null pointer no new default
1142domain is set. Instead the currently selected default domain is
1143returned.
1144
1145If the @var{domainname} parameter is the empty string the default domain
1146is reset to its initial value, the domain with the name @code{messages}.
1147This possibility is questionable to use since the domain @code{messages}
1148really never should be used.
1149@end deftypefun
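
The different uses of the @var{domainname} argument of
@code{textdomain} can be sketched as follows. The helper names are
hypothetical, and @code{foo} in the usage note is just the example
domain name used in the text.

```c
#include <libintl.h>
#include <string.h>

/* Select DOMAIN (a non-empty name) as the default domain.  Returns 1
   if the library now reports DOMAIN as the default, 0 otherwise, for
   instance when textdomain returned NULL because memory ran out.  */
int
select_default_domain (const char *domain)
{
  const char *now = textdomain (domain);
  return now != NULL && strcmp (now, domain) == 0;
}

/* Query the current default domain without changing it.  The result
   is owned by the library and must not be modified.  */
const char *
current_default_domain (void)
{
  return textdomain (NULL);
}
```

After @code{select_default_domain ("foo")} succeeds,
@code{current_default_domain} reports @code{foo}; a later call of
@code{textdomain ("")} resets the default to @code{messages}.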
1150
1151@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
@standards{GNU, libintl.h}
1153@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1154@c bindtextdomain @ascuheap @acsmem
1155@c set_binding_values @ascuheap @acsmem
1156@c libc_rwlock_wrlock dup @asulock @aculock
1157@c strcmp dup ok
1158@c strdup dup @ascuheap @acsmem
1159@c free dup @ascuheap @acsmem
1160@c malloc dup @ascuheap @acsmem
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages. To be correct, this is the directory where the
hierarchy of directories is expected. Details are explained below.

For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy starting
at, say, @file{/foo/bar}. Then the program should make a
@code{bindtextdomain} call to bind the domain for the current program to
this directory. This ensures that the catalogs are found. A correctly
running program does not depend on the user setting an environment
variable.

The @code{bindtextdomain} function can be used several times and if the
@var{domainname} argument is different the previously bound domains
will not be overwritten.

If a program that uses @code{bindtextdomain} at some point changes the
current working directory with the @code{chdir} function, it is
important that the @var{dirname} strings are absolute pathnames.
Otherwise the addressed directory might vary over time.

If the @var{dirname} parameter is the null pointer @code{bindtextdomain}
returns the currently selected directory for the domain with the name
@var{domainname}.

The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bindtextdomain} the return value is @code{NULL} and the global
variable @code{errno} is set accordingly.
1194@end deftypefun
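
A typical start-up sequence combining @code{bindtextdomain} and
@code{textdomain} might look like the following sketch. The function
name @code{init_translations} is hypothetical, and the call to
@code{setlocale} is an assumption about how the program selects the
user's locale.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical start-up helper.  Binds DOMAIN to the directory
   hierarchy below LOCALEDIR (an absolute path) and makes DOMAIN the
   default domain.  Returns 0 on success and -1 if either call
   failed.  */
int
init_translations (const char *domain, const char *localedir)
{
  /* Pick up the user's locale from the environment.  A failure here
     is not treated as fatal: messages then stay untranslated.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (domain, localedir) == NULL)
    return -1;
  if (textdomain (domain) == NULL)
    return -1;
  return 0;
}
```

A program would call, for example, @code{init_translations ("hello",
"/foo/bar")} once at the beginning of @code{main}. A later call of
@code{bindtextdomain ("hello", NULL)} then returns the bound directory.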
1195
1196
1197@node Advanced gettext functions
1198@subsubsection Additional functions for more complicated situations
1199
The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.
1204
1205Looking through Unix source code before the time anybody thought about
1206internationalization (and, sadly, even afterwards) one can often find
1207code similar to the following:
1208
1209@smallexample
1210 printf ("%d file%s deleted", n, n == 1 ? "" : "s");
1211@end smallexample
1212
1213@noindent
After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}. Both look unnatural and should be avoided. The first
attempts to solve the problem correctly looked like this:
1218
1219@smallexample
1220 if (n == 1)
1221 printf ("%d file deleted", n);
1222 else
1223 printf ("%d files deleted", n);
1224@end smallexample
1225
But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an `s', but
that is all. Once again people fell into the trap of believing the
rules their language uses are universal. But the handling of plural
forms differs widely between the language families. Two things can
differ between languages (even within one language family):
1232
1233@itemize @bullet
1234@item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.
1240
1241@item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this follows in an extra section.
1248@end itemize
1249
1250The consequence of this is that application writers should not try to
1251solve the problem in their code. This would be localization since it is
1252only usable for certain, hardcoded language environments. Instead the
1253extended @code{gettext} interface should be used.
1254
These extra functions take two strings and a numerical argument instead
of the one key string. The idea behind this is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since @theglibc{}
(as well as the GNU @code{gettext} package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
1272
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
@standards{GNU, libintl.h}
1275@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
1276@c Wrapper for dcngettext.
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.
1285
An example for the use of this function is:
1287
1288@smallexample
1289 printf (ngettext ("%d file removed", "%d files removed", n), n);
1290@end smallexample
1291
1292Please note that the numeric value @var{n} has to be passed to the
1293@code{printf} function as well. It is not sufficient to pass it only to
1294@code{ngettext}.
1295@end deftypefun
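
The need to pass the number twice, as described for @code{ngettext},
can be made explicit in a small wrapper. This is only a sketch; the name
@code{format_removed} is hypothetical, and the format uses @code{%lu}
because the count is an @code{unsigned long int} here.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical wrapper: format a message about N removed files into
   BUF.  N is passed twice: once to ngettext to select the form, and
   once to snprintf to fill in the %lu conversion.  Returns the value
   of snprintf.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

Without a message catalog the Germanic rule applies: the singular string
is used only for @code{n == 1}, so a count of zero is formatted with the
plural string.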
1296
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
@standards{GNU, libintl.h}
1299@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
1300@c Wrapper for dcngettext.
The @code{dngettext} function is similar to the @code{dgettext} function
in the way the message catalog is selected. The difference is that it
takes two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
1305@end deftypefun
1306
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
@standards{GNU, libintl.h}
1309@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
1310@c Wrapper for dcigettext.
The @code{dcngettext} function is similar to the @code{dcgettext} function
in the way the message catalog is selected. The difference is that it
takes two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
1315@end deftypefun
1316
1317@subsubheading The problem of plural forms
1318
A description of the problem can be found at the beginning of the last
section. The question now is how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
formed or whether the number can increase with every new supported
language.
1325
Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages). The
details are explained in the GNU @code{gettext} manual. Here only a
bit of information is provided.
1333
The information about the plural form selection has to be stored in the
header entry (the one with the empty @code{msgid} string). It looks
like this:
1337
1338@smallexample
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
1340@end smallexample

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following @code{plural} is an expression using the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}. This
expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression. The resulting value
must be greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.
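As a minimal sketch of how a program supplies the value for @code{n}, the following helper passes the same count to @code{ngettext} and to @code{snprintf}. The name @code{file_count_message} and the message texts are assumptions made for this illustration only.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format a count-dependent message into BUF.
   COUNT is handed to ngettext, which substitutes it for `n' in the
   Plural-Forms expression of the catalog to pick the translation.  */
int
file_count_message (char *buf, size_t size, unsigned long int count)
{
  const char *fmt = ngettext ("%lu file removed", "%lu files removed",
                              count);
  return snprintf (buf, size, fmt, count);
}
```

When no message catalog is found, @code{ngettext} returns the first string for a count of one and the second string otherwise, so the helper still produces sensible English output.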

@noindent
The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).@footnote{Additions are welcome. Send appropriate information to
@email{bug-glibc-manual@@gnu.org}.}

@table @asis
@item Only one form:
Some languages only require one single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:

@smallexample
Plural-Forms: nplurals=1; plural=0;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Finno-Ugric family
Hungarian
@item Asian family
Japanese, Korean
@item Turkic/Altaic family
Turkish
@end table

@item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
uses. A header entry would look like this:

@smallexample
Plural-Forms: nplurals=2; plural=n != 1;
@end smallexample

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

@noindent
Languages with this property include:

@table @asis
@item Germanic family
Danish, Dutch, English, German, Norwegian, Swedish
@item Finno-Ugric family
Estonian, Finnish
@item Latin/Greek family
Greek
@item Semitic family
Hebrew
@item Romance family
Italian, Portuguese, Spanish
@item Artificial
Esperanto
@end table

@item Two forms, singular used for zero and one
Exceptional case in the language family. The header entry would be:

@smallexample
Plural-Forms: nplurals=2; plural=n>1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Romance family
French, Brazilian Portuguese
@end table

@item Three forms, special case for zero
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Latvian
@end table

@item Three forms, special cases for one and two
The header entry would be:

@smallexample
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Celtic
Gaeilge (Irish)
@end table

@item Three forms, special case for numbers ending in 1[2-9]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Baltic family
Lithuanian
@end table

@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Croatian, Czech, Russian, Ukrainian
@end table

@item Three forms, special cases for 1 and 2, 3, 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovak
@end table

@item Three forms, special case for one and some numbers ending in 2, 3, or 4
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Polish
@end table

@item Four forms, special case for one and all numbers ending in 02, 03, or 04
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovenian
@end table
@end table
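To see what such expressions compute, two of the rules from the table can be written as ordinary C functions. This is only an illustration: the function names are hypothetical, and the @code{gettext} functions evaluate the expression from the catalog header themselves rather than calling code like this.

```c
/* Index selection for two of the rules listed in the table, written
   as plain C functions.  The names are hypothetical.  */

/* nplurals=3, rule given for Latvian.  */
unsigned long int
plural_latvian (unsigned long int n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0 : n != 0 ? 1 : 2;
}

/* nplurals=3, rule given for Polish.  */
unsigned long int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

With the Polish rule, 22 selects form 1 while 12 selects form 2, which is the exception for numbers in the teens that the expression encodes.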


@node Charset conversion in gettext
@subsubsection How to specify the output character set @code{gettext} uses

@code{gettext} not only looks up a translation in a message catalog, it
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of @code{nl_langinfo
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
locale. But programs which store strings in a locale-independent way
(e.g., UTF-8) can request that @code{gettext} and related functions
return the translations in that encoding, by use of the
@code{bind_textdomain_codeset} function.

Note that the @var{msgid} argument to @code{gettext} is not subject to
character set conversion. Also, when @code{gettext} does not find a
translation for @var{msgid}, it returns @var{msgid} unchanged,
independently of the current output character set. It is therefore
recommended that all @var{msgid}s be US-ASCII strings.

@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
@standards{GNU, libintl.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c bind_textdomain_codeset @ascuheap @acsmem
@c  set_binding_values dup @ascuheap @acsmem
The @code{bind_textdomain_codeset} function can be used to specify the
output character set for message catalogs for domain @var{domainname}.
The @var{codeset} argument must be a valid codeset name which can be used
for the @code{iconv_open} function, or a null pointer.

If the @var{codeset} parameter is the null pointer,
@code{bind_textdomain_codeset} returns the currently selected codeset
for the domain with the name @var{domainname}. It returns @code{NULL} if
no codeset has yet been selected.

The @code{bind_textdomain_codeset} function can be used several times.
If used multiple times with the same @var{domainname} argument, the
later call overrides the settings made by the earlier one.

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @code{errno} is set accordingly.
@end deftypefun
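The two modes of the function, setting and querying, can be sketched as follows. The helper name @code{request_utf8_output} is an assumption made for this example; only @code{bind_textdomain_codeset} itself comes from the library.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: ask for UTF-8 output for DOMAIN and return the
   codeset now in effect, or NULL if the request failed.  */
const char *
request_utf8_output (const char *domain)
{
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return NULL;
  /* A null CODESET argument queries the current selection.  */
  return bind_textdomain_codeset (domain, NULL);
}
```

A program which keeps all its strings in UTF-8 would typically make such a call once during startup, after @code{bindtextdomain}, and must not modify the returned string.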


@node GUI program problems
@subsubsection How to use @code{gettext} in GUI programs

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length. But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and might need different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and instead @code{catgets} should be used which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.

@noindent
As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:

@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
@end smallexample

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated there has to be
at some point in the code a call to a function of the @code{gettext}
family. But in two places the string passed into the function would be
@code{Open}. The translations might not be the same and therefore we
are in the dilemma described above.

One solution to this problem is to artificially extend the strings
to make them unambiguous. But what would the program do if no
translation is available? The extended string is not what should be
printed. So we should use a slightly modified version of the functions.

To extend the strings a uniform method should be used. E.g., in the
example above, the strings could be chosen as

@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
@end smallexample

Now all the strings are different, and if the following little wrapper
function is used instead of @code{gettext}, everything works just
fine:

@cindex sgettext
@smallexample
  char *
  sgettext (const char *msgid)
  @{
    char *msgval = gettext (msgid);
    /* No translation found: return the part after the last '|'.  */
    if (msgval == msgid)
      msgval = strrchr (msgid, '|') + 1;
    return msgval;
  @}
@end smallexample

What this little function does is to recognize the case when no
translation is available. This can be done very efficiently by a
pointer comparison since the return value is the input value. If there
is no translation we know that the input string is in the format we used
for the Menu entries and therefore contains a @code{|} character. We
simply search for the last occurrence of this character and return a
pointer to the character following it. That's it!

If one now consistently uses the extended string form and replaces
the @code{gettext} calls with calls to @code{sgettext} (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.

With advanced compilers (such as GNU C) one can write the
@code{sgettext} function as an inline function or as a macro like this:

@cindex sgettext
@smallexample
#define sgettext(msgid) \
  (@{ const char *__msgid = (msgid);            \
     char *__msgval = gettext (__msgid);       \
     if (__msgval == __msgid)                  \
       __msgval = strrchr (__msgid, '|') + 1;  \
     __msgval; @})
@end smallexample

The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
and the @code{ngettext} equivalents) can and should have corresponding
functions as well which look almost identical, except for the parameters
and the call to the underlying function.
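A counterpart for @code{dgettext} might look like the following sketch. The name @code{dsgettext} is an assumption made here, not a function provided by the library, and the extra check for a string without a @code{|} character is a defensive addition that the @code{sgettext} wrapper shown earlier does not have.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical dgettext counterpart of the sgettext wrapper.  */
char *
dsgettext (const char *domainname, const char *msgid)
{
  char *msgval = dgettext (domainname, msgid);
  if (msgval == msgid)
    {
      /* No translation: strip the prefix up to the last '|'.  A msgid
         without any '|' is returned as is.  */
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = (char *) (sep + 1);
    }
  return msgval;
}
```

Without a catalog, @code{dsgettext ("test-package", "Menu|Printer|Open")} yields @code{Open}, the text that should appear in the untranslated menu.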

Now there is of course the question why such functions do not exist in
@theglibc{}? There are two parts to the answer to this question.

@itemize @bullet
@item
They are easy to write and therefore can be provided by the project they
are used in. This is not an answer by itself and must be seen together
with the second part which is:

@item
There is no way the C library can contain a version which can work
everywhere. The problem is the selection of the character to separate
the prefix from the actual string in the extended string. The
examples above used @code{|} which is a quite good choice because it
resembles a notation frequently used in this context and it also is a
character not often used in message strings.

But what if the character is used in message strings? Or what if the chosen
character is not available in the character set of the machine one
compiles on (e.g., @code{|} is not required to exist for @w{ISO C}; this is
why the @file{iso646.h} file exists in @w{ISO C} programming environments)?
@end itemize

There is only one more comment left to make. The wrapper function above
requires that the translation strings are not extended themselves.
This is only logical. There is no need to disambiguate the strings
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.


@node Using gettextized software
@subsubsection User influence on @code{gettext}

The last sections described what the programmer can do to
internationalize the messages of the program. But it is ultimately up to
the user to select the messages s/he wants to see. S/He must be able to
understand them.

The POSIX locale model uses the environment variables @code{LC_COLLATE},
@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
and @code{LC_TIME} to select the locale which is to be used. This way
the user can influence lots of functions. As we mentioned above, the
@code{gettext} functions also take advantage of this.

To understand how this happens it is necessary to take a look at the
various components of the filename which gets computed to locate a
message catalog. It is composed as follows:

@smallexample
@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
@end smallexample

The default value for @var{dir_name} is system specific. It is computed
from the value given as the prefix while configuring the C library.
This value normally is @file{/usr} or @file{/}. For the former the
complete @var{dir_name} is:

@smallexample
/usr/share/locale
@end smallexample

We can use @file{/usr/share} since the @file{.mo} files containing the
message catalogs are system independent, so all systems can use the same
files. If the program executed the @code{bindtextdomain} function for
the message domain that is currently handled, the @var{dir_name}
component is exactly the value which was given to the function as
the second parameter. I.e., @code{bindtextdomain} allows overriding
the only system-dependent and fixed value to make it possible to
address files anywhere in the filesystem.

The @var{category} is the name of the locale category which was selected
in the program code. For @code{gettext} and @code{dgettext} this is
always @code{LC_MESSAGES}, for @code{dcgettext} this is selected by the
value of the third parameter. As said above, using a category other than
@code{LC_MESSAGES} should be avoided.

The @var{locale} component is computed based on the category used. Just
as for the @code{setlocale} function, this is where the user's selection
comes into play. Some environment variables are examined in a fixed order
and the first environment variable set determines the return value of
the lookup process. In detail, for the category @code{LC_xxx} the
following variables are examined in this order:

@table @code
@item LANGUAGE
@item LC_ALL
@item LC_xxx
@item LANG
@end table
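The lookup order can be modeled with a few lines of C. This is a simplified sketch and not the code the library uses internally: the function name @code{first_set_variable} is hypothetical, and treating an empty value like an unset variable is an assumption of this example.

```c
#include <stdlib.h>

/* Return the name of the first variable in the documented lookup
   order that has a non-empty value, or NULL if none has.
   CATEGORY_VAR is the LC_xxx name, e.g. "LC_MESSAGES".  Simplified
   model; empty values are treated as unset by assumption.  */
const char *
first_set_variable (const char *category_var)
{
  const char *order[4];
  order[0] = "LANGUAGE";
  order[1] = "LC_ALL";
  order[2] = category_var;
  order[3] = "LANG";

  for (int i = 0; i < 4; i++)
    {
      const char *value = getenv (order[i]);
      if (value != NULL && value[0] != '\0')
        return order[i];
    }
  return NULL;
}
```

The real implementation may apply further conditions before it honors a variable, so the sketch should be read only as a picture of the precedence.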

This looks very familiar. With the exception of the @code{LANGUAGE}
environment variable this is exactly the lookup order the
@code{setlocale} function uses. But why introduce the @code{LANGUAGE}
variable?

The reason is that the syntax of the values these variables can have
differs from what is expected by the @code{setlocale} function. If we
set @code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would no longer be able to use the
value of this variable. An additional variable removes this
problem, and in addition we can select the language independently of the locale
setting, which sometimes is useful.

While for the @code{LC_xxx} variables the value should consist of
exactly one specification of a locale, the @code{LANGUAGE} variable's
value can consist of a colon-separated list of locale names. The
attentive reader will realize that this is the way we manage to
implement one of our additional demands above: we want to be able to
specify an ordered list of languages.

Back to the constructed filename: we have only one component missing.
The @var{domain_name} part is the name which was either registered using
the @code{textdomain} function or which was given to @code{dgettext} or
@code{dcgettext} as the first parameter. Now it becomes obvious that a
good choice for the domain name in the program code is a string which is
closely related to the program/package name. E.g., for @theglibc{}
the domain name is @code{libc}.

@noindent
A limited piece of example code should show how the program is supposed
to work:

@smallexample
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
@}
@end smallexample

At the program start the default domain is @code{messages}, and the
default locale is "C". The @code{setlocale} call sets the locale
according to the user's environment variables; remember that correct
functioning of @code{gettext} relies on the correct setting of the
@code{LC_MESSAGES} locale (for looking up the message catalog) and
of the @code{LC_CTYPE} locale (for the character set conversion).
The @code{textdomain} call changes the default domain to
@code{test-package}. The @code{bindtextdomain} call specifies that
the message catalogs for the domain @code{test-package} can be found
below the directory @file{/usr/local/share/locale}.

If the user sets the variable @code{LANGUAGE} to @code{de} in her/his
environment, the @code{gettext} function will try to use the
translations from the file

@smallexample
/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
@end smallexample

From the above descriptions it should be clear which component of this
filename is determined by which source.
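To make the composition explicit, the following sketch joins the four components into a filename. The function @code{catalog_path} is a hypothetical helper written for this illustration; it takes the full category name (such as @code{LC_MESSAGES}) as one argument and does not reflect how the library builds the name internally.

```c
#include <stdio.h>

/* Hypothetical helper: compose the catalog filename from its four
   components.  CATEGORY is the full category name, for example
   "LC_MESSAGES".  Returns the result of snprintf.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  return snprintf (buf, size, "%s/%s/%s/%s.mo",
                   dir_name, locale, category, domain_name);
}
```

Called with @code{/usr/local/share/locale}, @code{de}, @code{LC_MESSAGES} and @code{test-package}, it reproduces the filename shown in the example.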

In the above example we assumed the @code{LANGUAGE} environment
variable to be @code{de}. This might be an appropriate selection but what
happens if the user wants to use @code{LC_ALL} because of the wider
usability and here the required value is @code{de_DE.ISO-8859-1}? We
already mentioned above that a situation like this is not infrequent.
E.g., a person might prefer reading a dialect and if this is not
available fall back on the standard language.

The @code{gettext} functions know about situations like this and can
handle them gracefully. The functions recognize the format of the value
of the environment variable. They can split the value into different pieces
and by leaving out one or the other part they can construct new
values. This happens of course in a predictable way. To understand
this one must know the format of the environment variable value. There
is one more or less standardized form, originally from the X/Open
specification:

@code{language[_territory[.codeset]][@@modifier]}

Less specific locale names are formed by stripping the parts in the
order of the following list:

@enumerate
@item
@code{codeset}
@item
@code{normalized codeset}
@item
@code{territory}
@item
@code{modifier}
@end enumerate

The @code{language} field will never be dropped for obvious reasons.

The only new thing is the @code{normalized codeset} entry. This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
codeset} value is generated from the user-provided character set name by
applying the following rules:

@enumerate
@item
Remove all characters other than digits and letters.
@item
Fold letters to lowercase.
@item
If the result contains only digits, prepend the string @code{"iso"}.
@end enumerate

@noindent
So all of the above names will be normalized to @code{iso88591}. This
allows the program user much more freedom in choosing the locale name.
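The three rules translate almost directly into C. The following is a sketch of the documented rules only, with a hypothetical function name and interface; it is not the routine the library uses internally.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Apply the three normalization rules to CODESET and store the result
   in BUF.  Returns 0 on success, -1 if BUF is too small.  Hypothetical
   interface, written to mirror the documented rules.  */
int
normalize_codeset (const char *codeset, char *buf, size_t size)
{
  size_t len = 0;
  int only_digits = 1;

  if (size == 0)
    return -1;

  /* Rules 1 and 2: keep only letters and digits, folded to lowercase.  */
  for (const char *p = codeset; *p != '\0'; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (!isalnum (c))
        continue;
      if (!isdigit (c))
        only_digits = 0;
      if (len + 1 >= size)
        return -1;
      buf[len++] = (char) tolower (c);
    }
  buf[len] = '\0';

  /* Rule 3: a purely numeric result gets the prefix "iso".  */
  if (only_digits)
    {
      if (len + 3 + 1 > size)
        return -1;
      memmove (buf + 3, buf, len + 1);
      memcpy (buf, "iso", 3);
    }
  return 0;
}
```

The character classification assumes the "C" locale, in which only ASCII letters and digits qualify.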

Even this extended functionality still does not help to solve the
problem that completely different names can be used to denote the same
locale (e.g., @code{de} and @code{german}). To be of help in this
situation the locale implementation and also the @code{gettext}
functions know about aliases.

The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
whatever prefix you used for configuring the C library) contains a
mapping of alternative names to more regular names. The system manager
is free to add new entries to meet her/his own needs. The selected
locale from the environment is compared with the entries in the first
column of this file ignoring case. If they match, the value of the
second column is used instead for the further handling.

In the description of the format of the environment variables we already
mentioned the character set as a factor in the selection of the message
catalog. In fact, only catalogs which contain text written using the
character set of the system/program can be used (directly; there will
come a solution for this some day). This means for the user that s/he
will always have to take care of this. If in the collection of the
message catalogs there are files for the same language but coded using
different character sets the user has to be careful.


@node Helper programs for gettext
@subsection Programs to handle message catalogs for @code{gettext}

@Theglibc{} does not contain the source code for the programs to
handle message catalogs for the @code{gettext} functions. As part of
the GNU project the GNU gettext package contains everything the
developer needs. The functionality provided by the tools in this
package by far exceeds the abilities of the @code{gencat} program
described above for the @code{catgets} functions.

There is a program @code{msgfmt} which is the equivalent of the
@code{gencat} program. From the human-readable and -editable form of
the message catalog it generates a binary file which can be used by
the @code{gettext} functions. But there are several more programs
available.

The @code{xgettext} program can be used to automatically extract the
translatable messages from a source file. I.e., the programmer need not
take care of the translations and the list of messages which have to be
translated. S/He will simply wrap the translatable strings in calls to
@code{gettext} et al.@: and the rest will be done by @code{xgettext}. This
program has a lot of options which help to customize the output or
help to understand the input better.
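Marking a string in practice often goes through a short macro. The sketch below uses the conventional name @code{_}, which is a widespread habit rather than anything the C library defines; the function @code{greeting} is a hypothetical example. The @code{--keyword} option of @code{xgettext} can be used to make it recognize such a macro name.

```c
#include <libintl.h>

/* Conventional shorthand for marking translatable strings; xgettext
   can be told to look for it with its --keyword=_ option.  */
#define _(msgid) gettext (msgid)

/* Hypothetical example function returning a marked message.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}
```

Because the macro expands to a plain @code{gettext} call, the program behaves exactly as if @code{gettext} had been written out.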

Other programs help to manage the development cycle when new messages appear
in the source files or when a new translation of the messages appears.
Here it should only be noted that using all the tools in GNU gettext it
is possible to @emph{completely} automate the handling of message
catalogs. Besides marking the translatable strings in the source code and
generating the translations, the developers do not have anything to do
themselves.