Commit | Line | Data |
---|---|---|
7a68c94a UD |
1 | @node Message Translation, Searching and Sorting, Locales, Top |
2 | @c %MENU% How to make the program speak the user's language | |
40a55d20 UD |
3 | @chapter Message Translation |
4 | ||
e8dd4791 CD |
5 | The program's interface with the user should be designed to ease the user's |
6 | task. One way to ease the user's task is to use messages in whatever | |
7 | language the user prefers. | |
40a55d20 UD |
8 | |
9 | Printing messages in different languages can be implemented in different | |
10 | ways. One could add all the different languages in the source code and | |
c430c4af BS |
11 | choose among the variants every time a message has to be printed. This is |
12 | certainly not a good solution since extending the set of languages is | |
13 | cumbersome (the code must be changed) and the code itself can become | |
40a55d20 UD |
14 | really big with dozens of message sets. |
15 | ||
c430c4af | 16 | A better solution is to keep the message sets for each language |
40a55d20 UD |
17 | in separate files which are loaded at runtime depending on the language |
18 | selection of the user. | |
19 | ||
1f77f049 | 20 | @Theglibc{} provides two different sets of functions to support |
40a55d20 UD |
21 | message translation. The problem is that neither of the interfaces is |
22 | officially defined by the POSIX standard. The @code{catgets} family of | |
f2ea0f5b UD |
23 | functions is defined in the X/Open standard but this is derived from |
24 | industry decisions and therefore not necessarily based on reasonable | |
40a55d20 UD |
25 | decisions. |
26 | ||
10b89412 | 27 | As mentioned above, the message catalog handling provides easy |
ef48b196 | 28 | extensibility by using external data files which contain the message
40a55d20 UD |
29 | translations. I.e., these files contain for each of the messages used |
30 | in the program a translation for the appropriate language. So the tasks | |
fed8f7f7 | 31 | of the message handling functions are |
40a55d20 UD |
32 | |
33 | @itemize @bullet | |
34 | @item | |
c430c4af | 35 | locate the external data file with the appropriate translations |
40a55d20 UD |
36 | @item |
37 | load the data and make it possible to address the messages | |
38 | @item | |
39 | map a given key to the translated message | |
40 | @end itemize | |
41 | ||
42 | The two approaches mainly differ in the implementation of this last | |
e8dd4791 | 43 | step. Decisions made in the last step influence the rest of the design. |
40a55d20 UD |
44 | |
45 | @menu | |
46 | * Message catalogs a la X/Open:: The @code{catgets} family of functions. | |
47 | * The Uniforum approach:: The @code{gettext} family of functions. | |
48 | @end menu | |
49 | ||
50 | ||
51 | @node Message catalogs a la X/Open | |
52 | @section X/Open Message Catalog Handling | |
53 | ||
54 | The @code{catgets} functions are based on the simple scheme: | |
55 | ||
56 | @quotation | |
57 | Associate every message to translate in the source code with a unique | |
58 | identifier. To retrieve a message from a catalog file solely the | |
59 | identifier is used. | |
60 | @end quotation | |
61 | ||
62 | This means for the author of the program that s/he will have to make | |
63 | sure the meaning of the identifier in the program code and in the | |
10b89412 | 64 | message catalogs is always the same. |
40a55d20 UD |
65 | |
66 | Before a message can be translated the catalog file must be located. | |
67 | The user of the program must be able to guide the responsible function | |
68 | to find whatever catalog the user wants. This is separated from what | |
69 | the programmer had in mind. | |
70 | ||
f2ea0f5b | 71 | All the types, constants and functions for the @code{catgets} functions |
40a55d20 UD |
72 | are defined/declared in the @file{nl_types.h} header file. |
73 | ||
74 | @menu | |
75 | * The catgets Functions:: The @code{catgets} function family. | |
76 | * The message catalog files:: Format of the message catalog files. | |
77 | * The gencat program:: How to generate message catalog files which | |
78 | can be used by the functions. | |
79 | * Common Usage:: How to use the @code{catgets} interface. | |
80 | @end menu | |
81 | ||
82 | ||
83 | @node The catgets Functions | |
84 | @subsection The @code{catgets} function family | |
85 | ||
40a55d20 | 86 | @deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag}) |
d08a7e4c | 87 | @standards{X/Open, nl_types.h} |
29e7e2df AO |
88 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
89 | @c catopen @mtsenv @ascuheap @acsmem | |
90 | @c strchr ok | |
91 | @c setlocale(,NULL) ok | |
92 | @c getenv @mtsenv | |
93 | @c strlen ok | |
94 | @c alloca ok | |
95 | @c stpcpy ok | |
96 | @c malloc @ascuheap @acsmem | |
97 | @c __open_catalog @ascuheap @acsmem | |
98 | @c strchr ok | |
99 | @c open_not_cancel_2 @acsfd | |
100 | @c strlen ok | |
101 | @c ENOUGH ok | |
102 | @c alloca ok | |
103 | @c memcpy ok | |
104 | @c fxstat64 ok | |
105 | @c __set_errno ok | |
106 | @c mmap @acsmem | |
107 | @c malloc dup @ascuheap @acsmem | |
108 | @c read_not_cancel ok | |
109 | @c free dup @ascuheap @acsmem | |
110 | @c munmap ok | |
111 | @c close_not_cancel_no_status ok | |
112 | @c free @ascuheap @acsmem | |
10b89412 | 113 | The @code{catopen} function tries to locate the message data file named |
40a55d20 UD |
114 | @var{cat_name} and loads it when found. The return value is of an |
115 | opaque type and can be used in calls to the other functions to refer to | |
116 | this loaded catalog. | |
117 | ||
118 | The return value is @code{(nl_catd) -1} in case the function failed and | |
010fe231 | 119 | no catalog was loaded. The global variable @code{errno} contains a code |
40a55d20 UD |
120 | for the error causing the failure. But even if the function call |
121 | succeeded this does not mean that all messages can be translated. | |
122 | ||
123 | Locating the catalog file must happen in a way which lets the user of | |
124 | the program influence the decision. It is up to the user to decide | |
125 | about the language to use and sometimes it is useful to use alternate | |
126 | catalog files. All this can be specified by the user by setting some | |
f2ea0f5b | 127 | environment variables. |
40a55d20 UD |
128 | |
129 | The first problem is to find out where all the message catalogs are | |
130 | stored. Every program could have its own place to keep all the | |
131 | different files but usually the catalog files are grouped by languages | |
132 | and the catalogs for all programs are kept in the same place. | |
133 | ||
134 | @cindex NLSPATH environment variable | |
135 | To tell the @code{catopen} function where the catalog for the program | |
136 | can be found the user can set the environment variable @code{NLSPATH} to | |
137 | a value which describes her/his choice. Since this value must be usable | |
138 | for different languages and locales it cannot be a simple string. | |
139 | Instead it is a format string (similar to @code{printf}'s). An example | |
140 | is | |
141 | ||
142 | @smallexample | |
143 | /usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N | |
144 | @end smallexample | |
145 | ||
146 | First one can see that more than one directory can be specified (with | |
147 | the usual syntax of separating them by colons). The next things to | |
148 | observe are the format elements, @code{%L} and @code{%N} in this case. | |
149 | The @code{catopen} function knows about several of them and the | |
150 | replacement for all of them is of course different. | |
151 | ||
152 | @table @code | |
153 | @item %N | |
154 | This format element is substituted with the name of the catalog file. | |
155 | This is the value of the @var{cat_name} argument given to | |
156 | @code{catopen}. | |
157 | ||
158 | @item %L | |
159 | This format element is substituted with the name of the currently | |
160 | selected locale for translating messages. How this is determined is | |
161 | explained below. | |
162 | ||
163 | @item %l | |
164 | (This is the lowercase ell.) This format element is substituted with the | |
f2ea0f5b | 165 | language element of the locale name. The string describing the selected |
40a55d20 UD |
166 | locale is expected to have the form |
167 | @code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the | |
168 | first part @var{lang}. | |
169 | ||
170 | @item %t | |
171 | This format element is substituted by the territory part @var{terr} of | |
172 | the name of the currently selected locale. See the explanation of the | |
173 | format above. | |
174 | ||
175 | @item %c | |
176 | This format element is substituted by the codeset part @var{codeset} of | |
177 | the name of the currently selected locale. See the explanation of the | |
178 | format above. | |
179 | ||
180 | @item %% | |
10b89412 | 181 | Since @code{%} is used as a meta character there must be a way to |
40a55d20 UD |
182 | express the @code{%} character in the result itself. Using @code{%%} |
183 | does this just like it works for @code{printf}. | |
184 | @end table | |
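The substitutions in the table above can be sketched in C. This is only an illustration, not the code @code{catopen} actually uses: the function name @code{expand_nlspath}, the helper @code{append}, and the choice to copy unknown @code{%} elements literally are all assumptions made for this example. It expands a single colon-separated element of @code{NLSPATH} and ignores locale modifiers.

```c
#include <stdio.h>
#include <string.h>

/* Append at most N bytes of S to OUT (capacity CAP, always terminated).  */
static void
append (char *out, size_t cap, const char *s, size_t n)
{
  size_t used = strlen (out);
  if (used + 1 >= cap)
    return;
  if (n > cap - used - 1)
    n = cap - used - 1;
  memcpy (out + used, s, n);
  out[used + n] = '\0';
}

/* Expand one NLSPATH element TMPL for the catalog NAME and a LOCALE of
   the form lang[_terr[.codeset]].  The result is written to OUT.
   Hypothetical helper; not part of any library.  */
void
expand_nlspath (const char *tmpl, const char *name,
                const char *locale, char *out, size_t cap)
{
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = locale[lang_len] == '_' ? locale + lang_len + 1 : "";
  size_t terr_len = strcspn (terr, ".");
  const char *dot = strchr (locale, '.');
  const char *codeset = dot != NULL ? dot + 1 : "";

  if (cap == 0)
    return;
  out[0] = '\0';

  for (const char *p = tmpl; *p != '\0'; ++p)
    {
      if (*p != '%' || p[1] == '\0')
        {
          /* Ordinary character (or a trailing lone '%').  */
          append (out, cap, p, 1);
          continue;
        }
      switch (*++p)
        {
        case 'N': append (out, cap, name, strlen (name)); break;
        case 'L': append (out, cap, locale, strlen (locale)); break;
        case 'l': append (out, cap, locale, lang_len); break;
        case 't': append (out, cap, terr, terr_len); break;
        case 'c': append (out, cap, codeset, strlen (codeset)); break;
        case '%': append (out, cap, "%", 1); break;
        default:
          /* Assumption: keep unknown elements unchanged.  */
          append (out, cap, p - 1, 2);
          break;
        }
    }
}
```

With the catalog name @code{ex.cat} and the locale name @code{de_DE.UTF-8}, the element @code{/usr/share/locale/%L/%N} expands to @code{/usr/share/locale/de_DE.UTF-8/ex.cat}.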
185 | ||
186 | ||
e8b1163e AJ |
187 | Using @code{NLSPATH} allows arbitrary directories to be searched for |
188 | message catalogs while still allowing different languages to be used. | |
189 | If the @code{NLSPATH} environment variable is not set, the default value | |
190 | is | |
40a55d20 UD |
191 | |
192 | @smallexample | |
193 | @var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N | |
194 | @end smallexample | |
195 | ||
196 | @noindent | |
1f77f049 JM |
197 | where @var{prefix} is given to @code{configure} while installing @theglibc{} |
198 | (this value is in many cases @code{/usr} or the empty string). | |
40a55d20 UD |
199 | |
200 | The remaining problem is to decide which locale name must be used. Its value | |
201 | determines the substitution of the format elements mentioned above. | |
202 | First of all the user can specify a path in the message catalog name | |
203 | (i.e., the name contains a slash character). In this situation the | |
204 | @code{NLSPATH} environment variable is not used. The catalog must exist | |
205 | as specified in the program, perhaps relative to the current working | |
206 | directory. This situation is not desirable and catalog names should never | |
608cc1f0 | 207 | be written this way. Besides, this behavior is not portable
40a55d20 UD |
208 | to all other platforms providing the @code{catgets} interface. |
209 | ||
210 | @cindex LC_ALL environment variable | |
211 | @cindex LC_MESSAGES environment variable | |
212 | @cindex LANG environment variable | |
213 | Otherwise the values of environment variables from the standard | |
f2ea0f5b | 214 | environment are examined (@pxref{Standard Environment}). Which |
40a55d20 UD |
215 | variables are examined is decided by the @var{flag} parameter of |
216 | @code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined | |
10b89412 | 217 | in @file{nl_types.h}) then the @code{catopen} function uses the name of |
4d76a0ec UD |
218 | the locale currently selected for the @code{LC_MESSAGES} category. |
219 | ||
220 | If @var{flag} is zero the @code{LANG} environment variable is examined. | |
10b89412 | 221 | This is a left-over from the early days when the concept of locales |
4d76a0ec UD |
222 | had not even reached the level of POSIX locales. |
223 | ||
224 | The environment variable and the locale name should have a value of the | |
225 | form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above. | |
226 | If no environment variable is set the @code{"C"} locale is used which | |
40a55d20 UD |
227 | prevents any translation. |
228 | ||
229 | The return value of @code{catgets} is in any case a valid string. Either | |
230 | it is a translation from a message catalog or it is the same as the | |
231 | @var{string} parameter. So a piece of code to decide whether a | |
232 | translation actually happened must look like this: | |
233 | ||
234 | @smallexample | |
235 | @{ | |
236 | char *trans = catgets (desc, set, msg, input_string); | |
237 | if (trans == input_string) | |
238 | @{ | |
239 | /* Something went wrong. */ | |
240 | @} | |
241 | @} | |
242 | @end smallexample | |
243 | ||
244 | @noindent | |
010fe231 | 245 | When an error occurs the global variable @code{errno} is set to |
40a55d20 UD |
246 | |
247 | @table @var | |
248 | @item EBADF | |
249 | The catalog does not exist. | |
250 | @item ENOMSG | |
b8a46c1d | 251 | The set/message tuple does not name an existing element in the |
40a55d20 UD |
252 | message catalog. |
253 | @end table | |
254 | ||
255 | While it can sometimes be useful to test for errors, programs normally | |
256 | will avoid any test. If the translation is not available it is no big | |
257 | problem if the original, untranslated message is printed. Either the | |
258 | user understands this as well or s/he will look for the reason why the | |
259 | messages are not translated. | |
260 | @end deftypefun | |
261 | ||
262 | Please note that the currently selected locale does not depend on a call | |
263 | to the @code{setlocale} function. It is not necessary that the locale | |
264 | data files for this locale exist and calling @code{setlocale} succeeds. | |
265 | The @code{catopen} function directly reads the values of the environment | |
266 | variables. | |
267 | ||
268 | ||
269 | @deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string}) | |
29e7e2df | 270 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
10b89412 | 271 | The function @code{catgets} has to be used to access the message catalog |
40a55d20 UD |
272 | previously opened using the @code{catopen} function. The |
273 | @var{catalog_desc} parameter must be a value previously returned by | |
274 | @code{catopen}. | |
275 | ||
276 | The next two parameters, @var{set} and @var{message}, reflect the | |
277 | internal organization of the message catalog files. This will be | |
278 | explained in detail below. For now it is interesting to know that a | |
10b89412 | 279 | catalog can consist of several sets and the messages in each set are
40a55d20 UD |
280 | individually numbered. Neither the set numbers nor the |
281 | message numbers need be consecutive. They can be arbitrarily chosen. | |
282 | But each message (unless equal to another one) must have its own unique | |
10b89412 | 283 | pair of set and message numbers. |
40a55d20 UD |
284 | |
285 | Since it is not guaranteed that the message catalog for the language | |
286 | selected by the user exists the last parameter @var{string} helps to | |
287 | handle this case gracefully. If no matching string can be found | |
288 | @var{string} is returned. This means for the programmer that | |
289 | ||
290 | @itemize @bullet | |
291 | @item | |
292 | the @var{string} parameters should contain reasonable text (this also | |
293 | helps to understand the program since otherwise there would be no hint | |
294 | on the string which is expected to be returned). | |
295 | @item | |
296 | all @var{string} arguments should be written in the same language. | |
297 | @end itemize | |
298 | @end deftypefun | |
299 | ||
300 | It is somewhat uncomfortable to write a program using the @code{catgets} | |
301 | functions if no supporting functionality is available. Since each | |
f2ea0f5b | 302 | set/message number tuple must be unique the programmer must keep lists |
40a55d20 UD |
303 | of the messages at the same time the code is written. And the work |
304 | between several people working on the same project must be coordinated. | |
10b89412 | 305 | We will see how some of these problems can be relaxed a bit (@pxref{Common |
8b7fb588 | 306 | Usage}). |
40a55d20 UD |
307 | |
308 | @deftypefun int catclose (nl_catd @var{catalog_desc}) | |
29e7e2df AO |
309 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} |
310 | @c catclose @ascuheap @acucorrupt @acsmem | |
311 | @c __set_errno ok | |
312 | @c munmap ok | |
313 | @c free @ascuheap @acsmem | |
40a55d20 UD |
314 | The @code{catclose} function can be used to free the resources |
315 | associated with a message catalog which previously was opened by a call | |
316 | to @code{catopen}. If the resources can be successfully freed the | |
10b89412 | 317 | function returns @code{0}. Otherwise it returns @code{@minus{}1} and the |
010fe231 FW |
318 | global variable @code{errno} is set. Errors can occur if the catalog |
319 | descriptor @var{catalog_desc} is not valid in which case @code{errno} is | |
40a55d20 UD |
320 | set to @code{EBADF}. |
321 | @end deftypefun | |
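The three functions described above are typically used together. The following is a minimal sketch, assuming a hypothetical helper named @code{copy_message}; it is not part of the library. It copies the text before calling @code{catclose} because the pointer returned by @code{catgets} may refer to data belonging to the catalog, which should not be relied upon once the catalog is closed.

```c
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/* Copy message MSG of set SET from the catalog CAT_NAME into BUF.
   FALLBACK is used when the catalog or the message is missing.
   Returns 1 if a translation was found, 0 otherwise.
   Hypothetical helper for illustration only.  */
int
copy_message (const char *cat_name, int set, int msg,
              const char *fallback, char *buf, size_t len)
{
  nl_catd desc = catopen (cat_name, NL_CAT_LOCALE);
  const char *text = fallback;
  int translated = 0;

  if (desc != (nl_catd) -1)
    {
      text = catgets (desc, set, msg, fallback);
      /* catgets returns its last argument if nothing was found.  */
      translated = text != fallback;
    }

  /* Copy before closing the catalog.  */
  snprintf (buf, len, "%s", text);

  if (desc != (nl_catd) -1)
    catclose (desc);
  return translated;
}
```

A real program would normally open the catalog once at startup and keep the descriptor for its whole lifetime instead of reopening it for every message.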
322 | ||
323 | ||
324 | @node The message catalog files | |
325 | @subsection Format of the message catalog files | |
326 | ||
10b89412 | 327 | The only reasonable way to translate all the messages of a program and
40a55d20 UD |
328 | store the result in a message catalog file which can be read by the |
329 | @code{catopen} function is to hand all the message text to the | |
330 | translator and let her/him translate them all. I.e., we must have a | |
f2ea0f5b | 331 | file with entries which associate the set/message tuple with a specific |
40a55d20 UD |
332 | translation. This file format is specified in the X/Open standard and |
333 | is as follows: | |
334 | ||
335 | @itemize @bullet | |
336 | @item | |
337 | Lines containing only whitespace characters or empty lines are ignored. | |
338 | ||
339 | @item | |
340 | Lines which contain as the first non-whitespace character a @code{$} | |
341 | followed by a whitespace character are comments and are also ignored. | |
342 | ||
343 | @item | |
344 | If a line contains as the first non-whitespace characters the sequence | |
345 | @code{$set} followed by a whitespace character an additional argument | |
346 | is required to follow. This argument can either be: | |
347 | ||
348 | @itemize @minus | |
349 | @item | |
350 | a number. In this case the value of this number determines the set | |
351 | to which the following messages are added. | |
352 | ||
353 | @item | |
354 | an identifier consisting of alphanumeric characters plus the underscore | |
355 | character. In this case the set is automatically assigned a number. | |
356 | This value is one more than the largest set number seen so far. | |
357 | ||
358 | How to use the symbolic names is explained in section @ref{Common Usage}. | |
359 | ||
360 | It is an error if a symbol name appears more than once. All following | |
361 | messages are placed in a set with this number. | |
362 | @end itemize | |
363 | ||
364 | @item | |
365 | If a line contains as the first non-whitespace characters the sequence | |
366 | @code{$delset} followed by a whitespace character an additional argument | |
367 | is required to follow. This argument can either be: | |
368 | ||
369 | @itemize @minus | |
370 | @item | |
371 | a number. In this case the value of this number determines the set | |
372 | which will be deleted. | |
373 | ||
374 | @item | |
375 | an identifier consisting of alphanumeric characters plus the underscore | |
376 | character. This symbolic identifier must match a name for a set which | |
377 | previously was defined. It is an error if the name is unknown. | |
378 | @end itemize | |
379 | ||
380 | In both cases all messages in the specified set will be removed. They | |
381 | will not appear in the output. But if this set is later selected again | |
382 | with a @code{$set} command, messages can be added again and these | |
383 | messages will appear in the output. | |
384 | ||
385 | @item | |
386 | If a line contains, after leading whitespace, the sequence | |
387 | @code{$quote}, the quoting character used for this input file is | |
10b89412 | 388 | changed to the first non-whitespace character following |
40a55d20 | 389 | @code{$quote}. If no non-whitespace character is present before the |
10b89412 | 390 | line ends quoting is disabled. |
40a55d20 UD |
391 | |
392 | By default no quoting character is used. In this mode strings are | |
393 | terminated with the first unescaped line break. If there is a | |
394 | @code{$quote} sequence present, newlines need not be escaped. Instead a | |
f2ea0f5b | 395 | string is terminated with the first unescaped appearance of the quote |
40a55d20 UD |
396 | character. |
397 | ||
398 | A common usage of this feature would be to set the quote character to | |
f2ea0f5b | 399 | @code{"}. Then any appearance of the @code{"} in the strings must |
40a55d20 UD |
400 | be escaped using the backslash (i.e., @code{\"} must be written). |
401 | ||
402 | @item | |
403 | Any other line must start with a number or an alphanumeric identifier | |
404 | (with the underscore character included). The following characters | |
a2d63612 | 405 | (starting after the first whitespace character) will form the string |
40a55d20 UD |
406 | which gets associated with the currently selected set and the message |
407 | number represented by the number or identifier, respectively. | |
408 | ||
409 | If the start of the line is a number the message number is obvious. It | |
410 | is an error if the same message number already appeared for this set. | |
411 | ||
412 | If the leading token was an identifier the message number gets | |
10b89412 | 413 | automatically assigned. The value is the current maximum message |
40a55d20 | 414 | number for this set plus one. It is an error if the identifier was |
608cc1f0 | 415 | already used for a message in this set. It is OK to reuse the |
40a55d20 UD |
416 | identifier for a message in another set. How to use the symbolic | |
417 | identifiers will be explained below (@pxref{Common Usage}). There is | |
418 | one limitation with the identifier: it must not be @code{Set}. The | |
419 | reason will be explained below. | |
420 | ||
40a55d20 UD |
421 | The text of the messages can contain escape characters. The usual bunch |
422 | of characters known from the @w{ISO C} language are recognized | |
423 | (@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f}, | |
424 | @code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of | |
425 | a character code). | |
426 | @end itemize | |
427 | ||
428 | @strong{Important:} The handling of identifiers instead of numbers for | |
429 | the set and messages is a GNU extension. Systems strictly following the | |
430 | X/Open specification do not have this feature. An example for a message | |
431 | catalog file is this: | |
432 | ||
433 | @smallexample | |
434 | $ This is a leading comment. | |
435 | $quote " | |
436 | ||
437 | $set SetOne | |
438 | 1 Message with ID 1. | |
439 | two " Message with ID \"two\", which gets the value 2 assigned" | |
440 | ||
441 | $set SetTwo | |
f2ea0f5b | 442 | $ Since the last set got the number 1 assigned this set has number 2. |
40a55d20 UD |
443 | 4000 "The numbers can be arbitrary, they need not start at one." |
444 | @end smallexample | |
445 | ||
446 | This small example shows various aspects: | |
447 | @itemize @bullet | |
448 | @item | |
449 | Lines 1 and 9 are comments since they start with @code{$} followed by | |
450 | a whitespace. | |
451 | @item | |
452 | The quoting character is set to @code{"}. Otherwise the quotes in the | |
10b89412 RJ |
453 | message definition would have to be omitted and in this case the |
454 | message with the identifier @code{two} would lose its leading whitespace. | |
40a55d20 | 455 | @item |
10b89412 | 456 | Mixing numbered messages with messages having symbolic names is no |
f2ea0f5b | 457 | problem and the numbering happens automatically. |
40a55d20 UD |
458 | @end itemize |
459 | ||
460 | ||
461 | While this file format is pretty easy it is not the best possible for | |
462 | use in a running program. The @code{catopen} function would have to | |
10b89412 | 463 | parse the file and handle syntactic errors gracefully. This is not so |
40a55d20 UD |
464 | easy and the whole process is pretty slow. Therefore the @code{catgets} |
465 | functions expect the data in another more compact and ready-to-use file | |
f2ea0f5b | 466 | format. There is a special program @code{gencat} which is explained in |
40a55d20 UD |
467 | detail in the next section. |
468 | ||
469 | Files in this other format are not human readable. To be easy to use by | |
470 | programs it is a binary file. But the format is byte order independent | |
471 | so translation files can be shared by systems of arbitrary architecture | |
1f77f049 | 472 | (as long as they use @theglibc{}). |
40a55d20 UD |
473 | |
474 | Details about the binary file format are not important to know since | |
475 | these files are always created by the @code{gencat} program. The | |
1f77f049 | 476 | sources of @theglibc{} also provide the sources for the |
f2ea0f5b | 477 | @code{gencat} program and so the interested reader can look through |
40a55d20 UD |
478 | these source files to learn about the file format. |
479 | ||
480 | ||
481 | @node The gencat program | |
482 | @subsection Generate Message Catalog files | |
483 | ||
484 | @cindex gencat | |
485 | The @code{gencat} program is specified in the X/Open standard and the | |
e8b1163e | 486 | GNU implementation follows this specification and so processes |
40a55d20 | 487 | all correctly formed input files. Additionally, some extensions are
3081378b | 488 | implemented which help to work in a more reasonable way with the |
40a55d20 UD |
489 | @code{catgets} functions. |
490 | ||
491 | The @code{gencat} program can be invoked in two ways: | |
492 | ||
493 | @example | |
10b89412 | 494 | `gencat [@var{Option} @dots{}] [@var{Output-File} [@var{Input-File} @dots{}]]` |
40a55d20 UD |
495 | @end example |
496 | ||
497 | This is the interface defined in the X/Open standard. If no | |
10b89412 RJ |
498 | @var{Input-File} parameter is given, input will be read from standard |
499 | input. Multiple input files will be read as if they were concatenated. | |
40a55d20 | 500 | If @var{Output-File} is also missing, the output will be written to |
b8a46c1d | 501 | standard output. To offer the kind of interface familiar from other
40a55d20 UD |
502 | programs, a second form is provided. |
503 | ||
504 | @smallexample | |
10b89412 | 505 | `gencat [@var{Option} @dots{}] -o @var{Output-File} [@var{Input-File} @dots{}]` |
40a55d20 UD |
506 | @end smallexample |
507 | ||
508 | The option @samp{-o} is used to specify the output file and all file | |
509 | arguments are used as input files. | |
510 | ||
511 | Besides this, one can use @file{-} or @file{/dev/stdin} for | |
512 | @var{Input-File} to denote the standard input. Correspondingly, one can | |
513 | use @file{-} or @file{/dev/stdout} for @var{Output-File} to denote | |
514 | standard output. Using @file{-} as a file name is allowed in X/Open | |
515 | while using the device names is a GNU extension. | |
516 | ||
517 | The @code{gencat} program works by concatenating all input files and | |
10b89412 | 518 | then @strong{merging} the resulting collection of message sets with a |
f2ea0f5b UD |
519 | possibly existing output file. This is done by removing all messages |
520 | with set/message number tuples matching any of the generated messages | |
40a55d20 UD |
521 | from the output file and then adding all the new messages. To |
522 | regenerate a catalog file while ignoring the old contents therefore | |
10b89412 | 523 | requires removing the output file if it exists. If the output is |
40a55d20 UD |
524 | written to standard output no merging takes place. |
525 | ||
526 | @noindent | |
527 | The following table shows the options understood by the @code{gencat} | |
10b89412 | 528 | program. The X/Open standard does not specify any options for the |
40a55d20 UD |
529 | program so all of these are GNU extensions. |
530 | ||
531 | @table @samp | |
532 | @item -V | |
533 | @itemx --version | |
534 | Print the version information and exit. | |
535 | @item -h | |
536 | @itemx --help | |
537 | Print a usage message listing all available options, then exit successfully. | |
538 | @item --new | |
10b89412 RJ |
539 | Do not merge the new messages from the input files with the old content |
540 | of the output file. The old content of the output file is discarded. | |
40a55d20 UD |
541 | @item -H |
542 | @itemx --header=name | |
543 | This option is used to emit the symbolic names given to sets and | |
544 | messages in the input files for use in the program. Details about how | |
545 | to use this are given in the next section. The @var{name} parameter to | |
546 | this option specifies the name of the output file. It will contain a | |
547 | number of C preprocessor @code{#define}s to associate a name with a | |
548 | number. | |
549 | ||
550 | Please note that the generated file only contains the symbols from the | |
551 | input files. If the output is merged with the previous content of the | |
552 | output file the possibly existing symbols from the file(s) which | |
553 | generated the old output files are not in the generated header file. | |
554 | @end table | |
555 | ||
556 | ||
557 | @node Common Usage | |
558 | @subsection How to use the @code{catgets} interface | |
559 | ||
560 | The @code{catgets} functions can be used in two different ways: by | |
561 | strictly following the X/Open specification without relying on any extension, | |
562 | or by using the GNU extensions. We will take a look at the former | |
563 | method first to understand the benefits of extensions. | |
564 | ||
fed8f7f7 | 565 | @subsubsection Not using symbolic names |
40a55d20 UD |
566 | |
567 | Since the X/Open format of the message catalog files does not allow | |
568 | symbol names we have to work with numbers all the time. When we start | |
f2ea0f5b UD |
569 | writing a program we have to replace all appearances of translatable |
570 | strings with something like | |
40a55d20 UD |
571 | |
572 | @smallexample | |
573 | catgets (catdesc, set, msg, "string") | |
574 | @end smallexample | |
575 | ||
576 | @noindent | |
577 | @var{catdesc} is obtained from a call to @code{catopen} which is | |
578 | normally done once at the program start. The @code{"string"} is the | |
579 | string we want to translate. The problems start with the set and | |
580 | message numbers. | |
581 | ||
582 | In a bigger program several programmers usually work at the same time on | |
583 | the program and so coordinating the number allocation is crucial. | |
f2ea0f5b UD |
584 | Though no two different strings may be indexed by the same tuple of |
585 | numbers it is highly desirable to reuse the numbers for equal strings | |
40a55d20 UD |
586 | with equal translations (please note that there might be strings which |
587 | are equal in one language but have different translations due to | |
588 | different contexts). | |
589 | ||
590 | The allocation process can be relaxed a bit by different set numbers for | |
591 | different parts of the program. So the number of developers who have to | |
592 | coordinate the allocation can be reduced. But lists must still be kept to | |
593 | track the allocation and errors can easily happen. These errors | |
594 | cannot be discovered by the compiler or the @code{catgets} functions. | |
595 | Only the user of the program might see wrong messages printed. In the | |
596 | worst cases the messages are so irritating that they cannot be | |
597 | recognized as wrong. Think about the translations for @code{"true"} and | |
f2ea0f5b | 598 | @code{"false"} being exchanged. This could result in a disaster. |
40a55d20 UD |
599 | |
600 | ||
601 | @subsubsection Using symbolic names | |
602 | ||
603 | The problems mentioned in the last section derive from the fact that: | |
604 | ||
605 | @enumerate | |
606 | @item | |
607 | the numbers are allocated once and due to the possibly frequent use of | |
608 | them it is difficult to change a number later. | |
609 | @item | |
10b89412 | 610 | the numbers do not allow guessing anything about the string and |
40a55d20 UD |
611 | therefore collisions can easily happen. |
612 | @end enumerate | |
613 | ||
614 | By consistently using symbolic names and by providing a method which maps | |
615 | the string content to a symbolic name (however this will happen) one can | |
616 | prevent both problems above. The cost of this is that the programmer | |
617 | has to write a complete message catalog file while s/he is writing the | |
618 | program itself. | |
619 | ||
620 | This is necessary since the symbolic names must be mapped to numbers | |
621 | before the program sources can be compiled. In the last section it was | |
622 | described how to generate a header containing the mapping of the names. | |
623 | E.g., for the example message file given in the last section we could | |
10b89412 | 624 | call the @code{gencat} program as follows (assume @file{ex.msg} contains |
40a55d20 UD |
625 | the sources). |
626 | ||
627 | @smallexample | |
628 | gencat -H ex.h -o ex.cat ex.msg | |
629 | @end smallexample | |
630 | ||
631 | @noindent | |
632 | This generates a header file with the following content: | |
633 | ||
634 | @smallexample | |
b8a46c1d | 635 | #define SetTwoSet 0x2 /* ex.msg:8 */ |
40a55d20 | 636 | |
b8a46c1d UD |
637 | #define SetOneSet 0x1 /* ex.msg:4 */ |
638 | #define SetOnetwo 0x2 /* ex.msg:6 */ | |
40a55d20 UD |
639 | @end smallexample |
640 | ||
641 | As can be seen the various symbols given in the source file are mangled | |
642 | to generate unique identifiers and these identifiers get numbers | |
643 | assigned. Reading the source file and knowing about the rules | |
644 | allows one to predict the content of the header file (it is deterministic) | |
645 | but this is not necessary. The @code{gencat} program can take care of | |
646 | everything. All the programmer has to do is to put the generated header | |
647 | file in the dependency list of the source files of her/his project and | |
10b89412 | 648 | add a rule to regenerate the header if any of the input files change. |
40a55d20 UD |
649 | |
650 | One word about the symbol mangling. Every symbol consists of two parts: | |
651 | the name of the message set plus the name of the message or the special | |
652 | string @code{Set}. So @code{SetOnetwo} means this macro can be used to | |
653 | access the translation with identifier @code{two} in the message set | |
654 | @code{SetOne}. | |
655 | ||
656 | The other names denote the names of the message sets. The special | |
657 | string @code{Set} is used in the place of the message identifier. | |
658 | ||
659 | If in the code the second string of the set @code{SetOne} is used the C | |
660 | code should look like this: | |
661 | ||
662 | @smallexample | |
663 | catgets (catdesc, SetOneSet, SetOnetwo, | |
664 | " Message with ID \"two\", which gets the value 2 assigned") | |
665 | @end smallexample | |
666 | ||
667 | Writing the call this way makes it possible to change the message number | |
668 | and even the set number without requiring any change in the C source | |
669 | code. (The text of the string is normally not the same; this is only | |
670 | for this example.) | |
671 | ||
672 | ||
673 | @subsubsection How this allows programs to be developed | |
674 | ||
675 | To illustrate the usual way to work with the symbolic version numbers | |
676 | here is a little example. Assume we want to write the very complex and | |
677 | famous greeting program. We start by writing the code as usual: | |
678 | ||
679 | @smallexample | |
680 | #include <stdio.h> | |
681 | int | |
682 | main (void) | |
683 | @{ | |
684 | printf ("Hello, world!\n"); | |
685 | return 0; | |
686 | @} | |
687 | @end smallexample | |
688 | ||
689 | Now we want to internationalize the message and therefore replace the | |
690 | message with whatever the user wants. | |
691 | ||
692 | @smallexample | |
693 | #include <nl_types.h> | |
694 | #include <stdio.h> | |
695 | #include "msgnrs.h" | |
696 | int | |
697 | main (void) | |
698 | @{ | |
699 | nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE); | |
fed8f7f7 | 700 | printf (catgets (catdesc, SetMainSet, SetMainHello, |
838e5ffe | 701 | "Hello, world!\n")); |
40a55d20 UD |
702 | catclose (catdesc); |
703 | return 0; | |
704 | @} | |
705 | @end smallexample | |
706 | ||
707 | We see how the catalog object is opened and the returned descriptor used | |
708 | in the other function calls. It is not really necessary to check for | |
709 | failure of any of the functions since even in these situations the | |
710 | functions will behave reasonably. @code{catgets} will simply return the | |
711 | default string passed as its last argument instead of a translation. | |
712 | ||
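The fallback behavior described above can be sketched as follows. This is only an illustration, not part of the greeting program: the function name @code{print_greeting} and the set and message numbers are placeholders, and the sketch assumes that a failed @code{catopen} returns @code{(nl_catd) -1} and that @code{catgets} then hands back its last argument.

```c
#include <nl_types.h>
#include <stdio.h>

/* Print the greeting from the catalog CATALOG_NAME (a name chosen by
   the caller).  Return 1 if the catalog could be opened and 0 if the
   built-in English text had to be used.  */
int
print_greeting (const char *catalog_name)
{
  nl_catd catdesc = catopen (catalog_name, NL_CAT_LOCALE);
  int have_catalog = catdesc != (nl_catd) -1;

  /* Set 1, message 1 are placeholders.  With a failed descriptor
     catgets returns the default string given as the last argument.  */
  fputs (catgets (catdesc, 1, 1, "Hello, world!\n"), stdout);

  /* Only a successfully opened catalog needs to be closed.  */
  if (have_catalog)
    catclose (catdesc);
  return have_catalog;
}
```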
713 | What remains unspecified here are the constants @code{SetMainSet} and | |
714 | @code{SetMainHello}. These are the symbolic names describing the | |
715 | message. To get the actual definitions which match the information in | |
716 | the catalog file we have to create the message catalog source file and | |
717 | process it using the @code{gencat} program. | |
718 | ||
719 | @smallexample | |
720 | $ Messages for the famous greeting program. | |
721 | $quote " | |
722 | ||
723 | $set SetMain | |
724 | Hello "Hallo, Welt!\n" | |
725 | @end smallexample | |
726 | ||
727 | Now we can start building the program (assume the message catalog source | |
728 | file is named @file{hello.msg} and the program source file @file{hello.c}): | |
729 | ||
730 | @smallexample | |
40a55d20 UD |
731 | % gencat -H msgnrs.h -o hello.cat hello.msg |
732 | % cat msgnrs.h | |
733 | #define SetMainSet 0x1 /* hello.msg:4 */ | |
734 | #define SetMainHello 0x1 /* hello.msg:5 */ | |
735 | % gcc -o hello hello.c -I. | |
736 | % cp hello.cat /usr/share/locale/de/LC_MESSAGES | |
737 | % echo $LC_ALL | |
738 | de | |
739 | % ./hello | |
740 | Hallo, Welt! | |
741 | % | |
40a55d20 UD |
742 | @end smallexample |
743 | ||
744 | The call of the @code{gencat} program creates the missing header file | |
745 | @file{msgnrs.h} as well as the message catalog binary. The former is | |
746 | used in the compilation of @file{hello.c} while the latter is placed in a | |
747 | directory in which the @code{catopen} function will try to locate it. | |
748 | Please check the @code{LC_ALL} environment variable and the default path | |
749 | for @code{catopen} presented in the description above. | |
750 | ||
751 | ||
752 | @node The Uniforum approach | |
753 | @section The Uniforum approach to Message Translation | |
754 | ||
755 | Sun Microsystems tried to standardize a different approach to message | |
756 | translation in the Uniforum group. There never was a real standard | |
6c55cda3 | 757 | defined but still the interface was used in Sun's operating systems. |
40a55d20 | 758 | Since this approach fits better in the development process of free |
1410e233 | 759 | software it is also used throughout the GNU project and the GNU |
1f77f049 | 760 | @file{gettext} package provides support for this outside @theglibc{}. |
40a55d20 UD |
761 | |
762 | The code of the @file{libintl} from GNU @file{gettext} is the same as | |
1f77f049 | 763 | the code in @theglibc{}. So the documentation in the GNU |
40a55d20 UD |
764 | @file{gettext} manual is also valid for the functionality here. The |
765 | following text will describe the library functions in detail. But the | |
766 | numerous helper programs are not described in this manual. Instead | |
767 | people should read the GNU @file{gettext} manual | |
768 | (@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}). | |
769 | We will only give a short overview. | |
770 | ||
771 | Though the @code{catgets} functions are available by default on more | |
772 | systems the @code{gettext} interface is at least as portable as the | |
773 | former. The GNU @file{gettext} package can be used wherever the | |
774 | functions are not available. | |
775 | ||
776 | ||
777 | @menu | |
778 | * Message catalogs with gettext:: The @code{gettext} family of functions. | |
779 | * Helper programs for gettext:: Programs to handle message catalogs | |
780 | for @code{gettext}. | |
781 | @end menu | |
782 | ||
783 | ||
784 | @node Message catalogs with gettext | |
785 | @subsection The @code{gettext} family of functions | |
786 | ||
787 | Although the paradigm underlying the @code{gettext} approach to message | |
788 | translation is different from that of the @code{catgets} functions, the | |
789 | basic functionality is equivalent. There are functions of the following | |
790 | categories: | |
791 | ||
792 | @menu | |
17c389fc UD |
793 | * Translation with gettext:: What has to be done to translate a message. |
794 | * Locating gettext catalog:: How to determine which catalog to be used. | |
795 | * Advanced gettext functions:: Additional functions for more complicated | |
796 | situations. | |
797 | * Charset conversion in gettext:: How to specify the output character set | |
798 | @code{gettext} uses. | |
799 | * GUI program problems:: How to use @code{gettext} in GUI programs. | |
800 | * Using gettextized software:: The possibilities of the user to influence | |
801 | the way @code{gettext} works. | |
40a55d20 UD |
802 | @end menu |
803 | ||
804 | @node Translation with gettext | |
805 | @subsubsection What has to be done to translate a message? | |
806 | ||
807 | The @code{gettext} functions have a very simple interface. The most | |
808 | basic function just takes the string which shall be translated as the | |
809 | argument and it returns the translation. This is fundamentally | |
810 | different from the @code{catgets} approach where an extra key is | |
811 | necessary and the original string is only used for the error case. | |
812 | ||
813 | If the string which has to be translated is the only argument this of | |
814 | course means the string itself is the key. I.e., the translation will | |
815 | be selected based on the original string. The message catalogs must | |
816 | therefore contain the original strings plus one translation for any such | |
10b89412 | 817 | string. The task of the @code{gettext} function is to compare the |
40a55d20 UD |
818 | argument string with the available strings in the catalog and return the |
819 | appropriate translation. Of course this lookup is optimized so that | |
820 | it is not more expensive than an access using an atomic key | |
821 | like in @code{catgets}. | |
822 | ||
823 | The @code{gettext} approach has some advantages but also some | |
824 | disadvantages. Please see the GNU @file{gettext} manual for a detailed | |
825 | discussion of the pros and cons. | |
826 | ||
827 | All the definitions and declarations for @code{gettext} can be found in | |
828 | the @file{libintl.h} header file. On systems where these functions are | |
829 | not part of the C library they can be found in a separate library named | |
830 | @file{libintl.a} (or accordingly different for shared libraries). | |
831 | ||
832 | @deftypefun {char *} gettext (const char *@var{msgid}) | |
d08a7e4c | 833 | @standards{GNU, libintl.h} |
29e7e2df AO |
834 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
835 | @c Wrapper for dcgettext. | |
40a55d20 UD |
836 | The @code{gettext} function searches the currently selected message |
837 | catalogs for a string which is equal to @var{msgid}. If there is such a | |
838 | string available it is returned. Otherwise the argument string | |
839 | @var{msgid} is returned. | |
840 | ||
29e7e2df | 841 | Please note that although the return value is @code{char *} the |
40a55d20 UD |
842 | returned string must not be changed. This broken type results from the |
843 | history of the function and does not reflect the way the function should | |
844 | be used. | |
845 | ||
846 | Please note that above we wrote ``message catalogs'' (plural). This is | |
608cc1f0 | 847 | a specialty of the GNU implementation of these functions and we will |
8b7fb588 UD |
848 | say more about this when we talk about the ways message catalogs are |
849 | selected (@pxref{Locating gettext catalog}). | |
40a55d20 UD |
850 | |
851 | The @code{gettext} function does not modify the value of the global | |
010fe231 | 852 | @code{errno} variable. This is necessary to make it possible to write |
40a55d20 UD |
853 | something like |
854 | ||
855 | @smallexample | |
856 | printf (gettext ("Operation failed: %m\n")); | |
857 | @end smallexample | |
858 | ||
010fe231 | 859 | Here the @code{errno} value is used in the @code{printf} function while |
40a55d20 UD |
860 | processing the @code{%m} format element and if the @code{gettext} |
861 | function changed this value (it is called before @code{printf} is | |
f2ea0f5b | 862 | called) we would get a wrong message. |
40a55d20 | 863 | |
10b89412 | 864 | So there is no easy way to detect a missing message catalog besides |
40a55d20 UD |
865 | comparing the argument string with the result. But it is normally the |
866 | task of the user to react to missing catalogs. The program cannot guess | |
1410e233 | 867 | when a message catalog is really necessary since for a user who speaks |
10b89412 | 868 | the language the program was developed in, the message does not need any translation. |
40a55d20 UD |
869 | @end deftypefun |
870 | ||
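A minimal sketch of how a program might wrap @code{gettext} is shown here. The helper names @code{translate} and @code{is_untranslated} are made up for this illustration. The first returns a @code{const char *} so that the compiler rejects attempts to modify the result; the second applies the comparison mentioned above. Note that it also reports a message whose translation happens to be identical to the original.

```c
#include <libintl.h>
#include <string.h>

/* Return the translation of MSGID from the current default domain.
   The result is const-qualified because the string gettext returns
   must not be modified, although gettext itself returns char *.  */
const char *
translate (const char *msgid)
{
  return gettext (msgid);
}

/* Return nonzero if the lookup yields text equal to MSGID.  This is
   the case when no catalog or no entry is found, but also when the
   translation is the same as the original string.  */
int
is_untranslated (const char *msgid)
{
  return strcmp (gettext (msgid), msgid) == 0;
}
```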
871 | The remaining two functions to access the message catalog add some | |
872 | functionality to select a message catalog which is not the default one. | |
873 | This is important if parts of the program are developed independently. | |
874 | Every part can have its own message catalog and all of them can be used | |
875 | at the same time. The C library itself is an example: internally it | |
876 | uses the @code{gettext} functions but since it must not depend on a | |
877 | currently selected default message catalog it must specify all ambiguous | |
878 | information. | |
879 | ||
880 | @deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid}) | |
d08a7e4c | 881 | @standards{GNU, libintl.h} |
29e7e2df AO |
882 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
883 | @c Wrapper for dcgettext. | |
10b89412 | 884 | The @code{dgettext} function acts just like the @code{gettext} |
40a55d20 UD |
885 | function. It only takes an additional first argument @var{domainname} |
886 | which guides the selection of the message catalogs which are searched | |
887 | for the translation. If the @var{domainname} parameter is the null | |
888 | pointer the @code{dgettext} function is exactly equivalent to | |
889 | @code{gettext} since the default value for the domain name is used. | |
890 | ||
891 | As for @code{gettext} the return value type is @code{char *} which is an | |
f2ea0f5b | 892 | anachronism. The returned string must never be modified. |
40a55d20 UD |
893 | @end deftypefun |
894 | ||
895 | @deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category}) | |
d08a7e4c | 896 | @standards{GNU, libintl.h} |
29e7e2df AO |
897 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
898 | @c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
899 | @c dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
900 | @c libc_rwlock_rdlock @asulock @aculock | |
901 | @c current_locale_name ok [protected from @mtslocale] | |
902 | @c tfind ok | |
903 | @c libc_rwlock_unlock ok | |
904 | @c plural_lookup ok | |
905 | @c plural_eval ok | |
906 | @c rawmemchr ok | |
907 | @c DETERMINE_SECURE ok, nothing | |
908 | @c strcmp ok | |
909 | @c strlen ok | |
910 | @c getcwd @ascuheap @acsmem @acsfd | |
911 | @c strchr ok | |
912 | @c stpcpy ok | |
913 | @c category_to_name ok | |
914 | @c guess_category_value @mtsenv | |
915 | @c getenv @mtsenv | |
916 | @c current_locale_name dup ok [protected from @mtslocale by dcigettext] | |
917 | @c strcmp ok | |
918 | @c ENABLE_SECURE ok | |
919 | @c _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
920 | @c libc_rwlock_rdlock dup @asulock @aculock | |
921 | @c _nl_make_l10nflist dup @ascuheap @acsmem | |
922 | @c libc_rwlock_unlock dup ok | |
923 | @c _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
924 | @c libc_lock_lock_recursive @aculock | |
925 | @c libc_lock_unlock_recursive @aculock | |
926 | @c open->open_not_cancel_2 @acsfd | |
927 | @c fstat ok | |
928 | @c mmap dup @acsmem | |
929 | @c close->close_not_cancel_no_status @acsfd | |
930 | @c malloc dup @ascuheap @acsmem | |
931 | @c read->read_not_cancel ok | |
932 | @c munmap dup @acsmem | |
933 | @c W dup ok | |
934 | @c strlen dup ok | |
935 | @c get_sysdep_segment_value ok | |
936 | @c memcpy dup ok | |
937 | @c hash_string dup ok | |
938 | @c free dup @ascuheap @acsmem | |
939 | @c libc_rwlock_init ok | |
940 | @c _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
941 | @c libc_rwlock_fini ok | |
942 | @c EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem | |
943 | @c strstr dup ok | |
944 | @c isspace ok | |
945 | @c strtoul ok | |
946 | @c PLURAL_PARSE @ascuheap @acsmem | |
947 | @c malloc dup @ascuheap @acsmem | |
948 | @c free dup @ascuheap @acsmem | |
949 | @c INIT_GERMANIC_PLURAL ok, nothing | |
950 | @c the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext] | |
951 | @c _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock | |
952 | @c _nl_explode_name dup @ascuheap @acsmem | |
953 | @c libc_rwlock_wrlock dup @asulock @aculock | |
954 | @c free dup @asulock @aculock @acsfd @acsmem | |
955 | @c _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
956 | @c _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem | |
957 | @c strlen ok | |
958 | @c hash_string ok | |
959 | @c W ok | |
960 | @c SWAP ok | |
961 | @c bswap_32 ok | |
962 | @c strcmp ok | |
963 | @c get_output_charset @mtsenv @ascuheap @acsmem | |
964 | @c getenv dup @mtsenv | |
965 | @c strlen dup ok | |
966 | @c malloc dup @ascuheap @acsmem | |
967 | @c memcpy dup ok | |
968 | @c libc_rwlock_rdlock dup @asulock @aculock | |
969 | @c libc_rwlock_unlock dup ok | |
970 | @c libc_rwlock_wrlock dup @asulock @aculock | |
971 | @c realloc @ascuheap @acsmem | |
972 | @c strdup @ascuheap @acsmem | |
973 | @c strstr ok | |
974 | @c strcspn ok | |
975 | @c mempcpy dup ok | |
976 | @c norm_add_slashes dup ok | |
977 | @c gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd | |
978 | @c [protected from @mtslocale by dcigettext locale lock] | |
979 | @c free dup @ascuheap @acsmem | |
980 | @c libc_lock_lock @asulock @aculock | |
981 | @c calloc @ascuheap @acsmem | |
982 | @c gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock] | |
983 | @c libc_lock_unlock ok | |
984 | @c malloc @ascuheap @acsmem | |
985 | @c mempcpy ok | |
986 | @c memcpy ok | |
987 | @c strcpy ok | |
988 | @c libc_rwlock_wrlock @asulock @aculock | |
989 | @c tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt] | |
990 | @c transcmp ok | |
991 | @c strmp dup ok | |
992 | @c free @ascuheap @acsmem | |
40a55d20 UD |
993 | The @code{dcgettext} function adds another argument to those which | |
994 | @code{dgettext} takes. This argument @var{category} specifies the last | |
995 | piece of information needed to locate the message catalog. I.e., the | |
996 | domain name and the locale category exactly specify which message | |
997 | catalog has to be used (relative to a given directory, see below). | |
998 | ||
999 | The @code{dgettext} function can be expressed in terms of | |
1000 | @code{dcgettext} by using | |
1001 | ||
1002 | @smallexample | |
1003 | dcgettext (domain, string, LC_MESSAGES) | |
1004 | @end smallexample | |
1005 | ||
1006 | @noindent | |
1007 | instead of | |
1008 | ||
1009 | @smallexample | |
1010 | dgettext (domain, string) | |
1011 | @end smallexample | |
1012 | ||
1013 | This also shows which values are expected for the third parameter. One | |
1014 | has to use the available selectors for the categories available in | |
1015 | @file{locale.h}. Normally the available values are @code{LC_CTYPE}, | |
1016 | @code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, | |
1017 | @code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL} | |
1018 | must not be used and even though the names might suggest this, there is | |
10b89412 | 1019 | no relation to the environment variable of this name. |
40a55d20 UD |
1020 | |
1021 | The @code{dcgettext} function is only implemented for compatibility with | |
1022 | other systems which have @code{gettext} functions. There is not really | |
1023 | any situation where it is necessary (or useful) to use a different value | |
10b89412 | 1024 | than @code{LC_MESSAGES} for the @var{category} parameter. We are |
40a55d20 UD |
1025 | dealing with messages here and any other choice can only be confusing. | |
1026 | ||
1027 | As for @code{gettext} the return value type is @code{char *} which is an | |
f2ea0f5b | 1028 | anachronism. The returned string must never be modified. |
40a55d20 UD |
1029 | @end deftypefun |
1030 | ||
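The relation between @code{dgettext} and @code{dcgettext} can be sketched with two small wrappers. The domain name @code{example-domain} and both function names are hypothetical and chosen only for this illustration; no catalog of that name is assumed to exist, in which case both calls return their @var{msgid} argument.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain name, used for illustration only.  */
#define EXAMPLE_DOMAIN "example-domain"

/* Look up MSGID in EXAMPLE_DOMAIN; the category is implicitly
   LC_MESSAGES.  */
const char *
lookup_short (const char *msgid)
{
  return dgettext (EXAMPLE_DOMAIN, msgid);
}

/* The same lookup with the category spelled out.  */
const char *
lookup_long (const char *msgid)
{
  return dcgettext (EXAMPLE_DOMAIN, msgid, LC_MESSAGES);
}
```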
1031 | When using the three functions above in a program it is a frequent case | |
10b89412 | 1032 | that the @var{msgid} argument is a constant string. So it is worthwhile to |
40a55d20 UD |
1033 | optimize this case. A moment's thought shows that | |
1034 | as long as no new message catalog is loaded the translation of a message | |
1410e233 UD |
1035 | will not change. This optimization is actually implemented by the |
1036 | @code{gettext}, @code{dgettext} and @code{dcgettext} functions. | |
40a55d20 UD |
1037 | |
1038 | ||
1039 | @node Locating gettext catalog | |
1040 | @subsubsection How to determine which catalog to be used | |
1041 | ||
f2ea0f5b | 1042 | The functions to retrieve the translations for a given message have a |
40a55d20 UD |
1043 | remarkably simple interface. But to still give the user of the program | |
1044 | the opportunity to select exactly the translation s/he wants, and | |
1045 | to give the programmer the possibility to influence how the | |
1046 | catalog files are located, there is a quite complicated | |
1047 | underlying mechanism which controls all this. The code is complicated, | |
1048 | but the use is easy. | |
1049 | ||
1050 | Basically we have two different tasks to perform which can also be | |
1051 | performed by the @code{catgets} functions: | |
1052 | ||
1053 | @enumerate | |
1054 | @item | |
1055 | Locate the set of message catalogs. There are a number of files for | |
10b89412 | 1056 | different languages which all belong to the package. Usually they |
40a55d20 UD |
1057 | are all stored in the filesystem below a certain directory. |
1058 | ||
10b89412 | 1059 | There can be arbitrarily many packages installed and they can follow |
40a55d20 UD |
1060 | different guidelines for the placement of their files. |
1061 | ||
1062 | @item | |
1063 | Relative to the location specified by the package the actual translation | |
1064 | files must be searched, based on the wishes of the user. I.e., for each | |
1065 | language the user selects the program should be able to locate the | |
1066 | appropriate file. | |
1067 | @end enumerate | |
1068 | ||
1069 | This is the functionality required by the specifications for | |
1070 | @code{gettext} and this is also what the @code{catgets} functions are | |
1071 | able to do. But there are some problems unresolved: | |
1072 | ||
1073 | @itemize @bullet | |
1074 | @item | |
1075 | The language to be used can be specified in several different ways. | |
1076 | There is no generally accepted standard for this and the user always | |
10b89412 | 1077 | expects the program to understand what s/he means. E.g., to select the |
40a55d20 UD |
1078 | German translation one could write @code{de}, @code{german}, or |
1079 | @code{deutsch} and the program should always react the same. | |
1080 | ||
1081 | @item | |
1082 | Sometimes the specification of the user is too detailed. If s/he, e.g., | |
1083 | specifies @code{de_DE.ISO-8859-1} which means German, spoken in Germany, | |
1084 | coded using the @w{ISO 8859-1} character set there is the possibility | |
1085 | that a message catalog matching this exactly is not available. But | |
1086 | there could be a catalog matching @code{de} and if the character set | |
1087 | used on the machine is always @w{ISO 8859-1} there is no reason why this | |
1088 | latter message catalog should not be used. (We call this @dfn{message | |
1089 | inheritance}.) | |
1090 | ||
1091 | @item | |
1092 | If a catalog for a wanted language is not available it is not always the | |
1093 | second best choice to fall back on the language of the developer and | |
1094 | simply not translate any message. Instead a user might be better able | |
1095 | to read the messages in another language and so the user of the program | |
9dcc8f11 | 1096 | should be able to define a precedence order of languages. |
40a55d20 UD |
1097 | @end itemize |
1098 | ||
f2ea0f5b | 1099 | We can divide the configuration actions in two parts: the one is |
40a55d20 UD |
1100 | performed by the programmer, the other by the user. We will start with |
1101 | the functions the programmer can use since the user configuration will | |
1102 | be based on this. | |
1103 | ||
1104 | As the functions described in the last sections already mention, separate | |
1105 | sets of messages can be selected by a @dfn{domain name}. This is a | |
10b89412 RJ |
1106 | simple string which should be unique for each program part that uses a |
1107 | separate domain. It is possible to use in one program arbitrarily many | |
1f77f049 | 1108 | domains at the same time. E.g., @theglibc{} itself uses a domain |
40a55d20 UD |
1109 | named @code{libc} while the program using the C Library could use a |
1110 | domain named @code{foo}. The important point is that at any time | |
1111 | exactly one domain is active. This is controlled with the following | |
1112 | function. | |
1113 | ||
1114 | @deftypefun {char *} textdomain (const char *@var{domainname}) | |
d08a7e4c | 1115 | @standards{GNU, libintl.h} |
29e7e2df AO |
1116 | @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
1117 | @c textdomain @asulock @ascuheap @aculock @acsmem | |
1118 | @c libc_rwlock_wrlock @asulock @aculock | |
1119 | @c strcmp ok | |
1120 | @c strdup @ascuheap @acsmem | |
1121 | @c free @ascuheap @acsmem | |
1122 | @c libc_rwlock_unlock ok | |
40a55d20 UD |
1123 | The @code{textdomain} function sets the default domain, which is used in |
1124 | all future @code{gettext} calls, to @var{domainname}. Please note that | |
1125 | @code{dgettext} and @code{dcgettext} calls are not influenced if the | |
1126 | @var{domainname} parameter of these functions is not the null pointer. | |
1127 | ||
1128 | Before the first call to @code{textdomain} the default domain is | |
f2ea0f5b | 1129 | @code{messages}. This is the name specified in the specification of |
40a55d20 UD |
1130 | the @code{gettext} API. This name is as good as any other name. No |
1131 | program should ever really use a domain with this name since this can | |
1132 | only lead to problems. | |
1133 | ||
1134 | The function returns the value which is from now on taken as the default | |
1135 | domain. If the system ran out of memory the returned value is | |
010fe231 | 1136 | @code{NULL} and the global variable @code{errno} is set to @code{ENOMEM}. |
40a55d20 UD |
1137 | Despite the return value type being @code{char *} the return string must |
1138 | not be changed. It is allocated internally by the @code{textdomain} | |
1139 | function. | |
1140 | ||
1141 | If the @var{domainname} parameter is the null pointer no new default | |
1142 | domain is set. Instead the currently selected default domain is | |
1143 | returned. | |
1144 | ||
1145 | If the @var{domainname} parameter is the empty string the default domain | |
1146 | is reset to its initial value, the domain with the name @code{messages}. | |
1147 | This possibility is questionable to use since the domain @code{messages} | |
1148 | really never should be used. | |
1149 | @end deftypefun | |
1150 | ||
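The three ways of calling @code{textdomain} described above (with a name, with the null pointer, and with the empty string) might be used as in the following sketch. The helper names @code{select_domain} and @code{domain_is_selected} are assumptions made for this example, as is any domain name such as @code{foo} passed to them.

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Make DOMAINNAME the default domain.  Passing the empty string
   resets the default to "messages".  Return 0 on success and -1 if
   textdomain failed, in which case errno has been set.  */
int
select_domain (const char *domainname)
{
  if (textdomain (domainname) == NULL)
    {
      perror ("textdomain");
      return -1;
    }
  return 0;
}

/* Return nonzero if DOMAINNAME is the current default domain.  A null
   pointer argument makes textdomain report the current setting
   without changing it.  */
int
domain_is_selected (const char *domainname)
{
  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, domainname) == 0;
}
```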
1151 | @deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname}) | |
d08a7e4c | 1152 | @standards{GNU, libintl.h} |
29e7e2df AO |
1153 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
1154 | @c bindtextdomain @ascuheap @acsmem | |
1155 | @c set_binding_values @ascuheap @acsmem | |
1156 | @c libc_rwlock_wrlock dup @asulock @aculock | |
1157 | @c strcmp dup ok | |
1158 | @c strdup dup @ascuheap @acsmem | |
1159 | @c free dup @ascuheap @acsmem | |
1160 | @c malloc dup @ascuheap @acsmem | |
9133b79b | 1161 | The @code{bindtextdomain} function can be used to specify the directory |
40a55d20 UD |
1162 | which contains the message catalogs for domain @var{domainname} for the |
1163 | different languages. To be correct, this is the directory where the | |
f2ea0f5b | 1164 | hierarchy of directories is expected. Details are explained below. |
40a55d20 UD |
1165 | |
1166 | For the programmer it is important to note that the translations which | |
10b89412 | 1167 | come with the program have to be placed in a directory hierarchy starting |
40a55d20 UD |
1168 | at, say, @file{/foo/bar}. Then the program should make a |
1169 | @code{bindtextdomain} call to bind the domain for the current program to | |
1170 | this directory. This ensures that the catalogs are found. A correctly | |
1171 | running program does not depend on the user setting an environment | |
1172 | variable. | |
1173 | ||
1174 | The @code{bindtextdomain} function can be used several times and if the | |
17c389fc | 1175 | @var{domainname} argument is different the previously bound domains |
40a55d20 UD |
1176 | will not be overwritten. |
1177 | ||
26b4d766 UD |
1178 | If a program which uses @code{bindtextdomain} at some point also | |
1179 | uses the @code{chdir} function to change the current working | |
1180 | directory it is important that the @var{dirname} string be an | |
1181 | absolute pathname. Otherwise the addressed directory might vary over | |
1182 | time. | |
1183 | ||
40a55d20 UD |
1184 | If the @var{dirname} parameter is the null pointer @code{bindtextdomain} |
1185 | returns the currently selected directory for the domain with the name | |
1186 | @var{domainname}. | |
1187 | ||
9133b79b | 1188 | The @code{bindtextdomain} function returns a pointer to a string |
40a55d20 UD |
1189 | containing the name of the selected directory. The string is | |
1190 | allocated internally in the function and must not be changed by the | |
1191 | user. If the system ran out of memory during the execution of | |
1192 | @code{bindtextdomain} the return value is @code{NULL} and the global | |
010fe231 | 1193 | variable @code{errno} is set accordingly. |
40a55d20 UD |
1194 | @end deftypefun |
1195 | ||
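A program's start-up code could combine @code{bindtextdomain} and @code{textdomain} roughly as follows. This is a sketch under assumptions: the function names @code{setup_catalogs} and @code{bound_directory} are invented for the example, and rejecting relative directory names is a design choice of the sketch that follows the advice about @code{chdir}, not a requirement of the library.

```c
#include <libintl.h>
#include <stdio.h>

/* Bind DOMAINNAME to the catalog hierarchy below DIRNAME and make it
   the default domain.  DIRNAME must be an absolute file name so that
   a later chdir cannot change its meaning.  Return 0 on success and
   -1 on failure.  */
int
setup_catalogs (const char *domainname, const char *dirname)
{
  if (dirname[0] != '/')
    {
      fprintf (stderr, "%s: not an absolute file name\n", dirname);
      return -1;
    }
  if (bindtextdomain (domainname, dirname) == NULL
      || textdomain (domainname) == NULL)
    {
      perror ("message catalog setup");
      return -1;
    }
  return 0;
}

/* Return the directory currently bound to DOMAINNAME.  The string
   belongs to the library and must not be modified.  */
const char *
bound_directory (const char *domainname)
{
  return bindtextdomain (domainname, NULL);
}
```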
1196 | ||
b8a46c1d UD |
1197 | @node Advanced gettext functions |
1198 | @subsubsection Additional functions for more complicated situations | |
1199 | ||
1200 | The functions of the @code{gettext} family described so far (and all the | |
1201 | @code{catgets} functions as well) have one problem in the real world | |
10b89412 | 1202 | which has been neglected completely in all existing approaches. What |
b8a46c1d UD |
1203 | is meant here is the handling of plural forms. |
1204 | ||
1205 | Looking through Unix source code before the time anybody thought about | |
1206 | internationalization (and, sadly, even afterwards) one can often find | |
1207 | code similar to the following: | |
1208 | ||
1209 | @smallexample | |
1210 | printf ("%d file%s deleted", n, n == 1 ? "" : "s"); | |
1211 | @end smallexample | |
1212 | ||
1213 | @noindent | |
c891b2df | 1214 | After the first complaints from people internationalizing the code people |
b8a46c1d UD |
1215 | either completely avoided formulations like this or used strings like |
1216 | @code{"file(s)"}. Both look unnatural and should be avoided. First | |
1217 | tries to solve the problem correctly looked like this: | |
1218 | ||
1219 | @smallexample | |
1220 | if (n == 1) | |
1221 | printf ("%d file deleted", n); | |
1222 | else | |
1223 | printf ("%d files deleted", n); | |
1224 | @end smallexample | |
1225 | ||
1226 | But this does not solve the problem. It helps languages where the | |
1227 | plural form of a noun is not simply constructed by adding an `s' but | |
1228 | that is all. Once again people fell into the trap of believing the | |
10b89412 | 1229 | rules their language uses are universal. But the handling of plural |
b8a46c1d UD |
1230 | forms differs widely between the language families. There are two |
1231 | things that can differ between (and even within) language families: | |
1232 | ||
1233 | @itemize @bullet | |
1234 | @item | |
1235 | The way plural forms are built differs. This is a problem with | |
1236 | languages which have many irregularities. German, for instance, is a | |
1237 | drastic case. Though English and German are part of the same language | |
1238 | family (Germanic), the almost regular forming of plural noun forms | |
608cc1f0 | 1239 | (appending an `s') is hardly found in German. |
b8a46c1d UD |
1240 | |
1241 | @item | |
1242 | The number of plural forms differs. This is somewhat surprising for | |
1243 | those who only have experience with Romance and Germanic languages | |
1244 | since here the number is the same (there are two). | |
1245 | ||
1246 | But other language families have only one form or many forms. More | |
1247 | information on this in an extra section. | |
1248 | @end itemize | |
1249 | ||
1250 | The consequence of this is that application writers should not try to | |
1251 | solve the problem in their code. This would be localization since it is | |
1252 | only usable for certain, hardcoded language environments. Instead the | |
1253 | extended @code{gettext} interface should be used. | |
1254 | ||
1255 | These extra functions take two strings and a numerical argument instead | |
9dcc8f11 | 1256 | of the single key string. The idea behind this is that using |
b8a46c1d UD |
1257 | the numerical argument and the first string as a key, the implementation |
1258 | can select using rules specified by the translator the right plural | |
1259 | form. The two string arguments then will be used to provide a return | |
1260 | value in case no message catalog is found (similar to the normal | |
608cc1f0 | 1261 | @code{gettext} behavior). In this case the rules for Germanic languages |
10b89412 | 1262 | are used and it is assumed that the first string argument is the singular |
b8a46c1d UD |
1263 | form, the second the plural form. |
1264 | ||
1265 | This has the consequence that programs without language catalogs can | |
1266 | display the correct strings only if the program itself is written using | |
1f77f049 | 1267 | a Germanic language. This is a limitation but since @theglibc{} |
10b89412 RJ |
1268 | (as well as the GNU @code{gettext} package) is written as part of the |
1269 | GNU package and the coding standards for the GNU project require programs | |
1270 | to be written in English, this solution nevertheless fulfills its | |
b8a46c1d UD |
1271 | purpose. |
1272 | ||
b8a46c1d | 1273 | @deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}) |
d08a7e4c | 1274 | @standards{GNU, libintl.h} |
29e7e2df AO |
1275 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
1276 | @c Wrapper for dcngettext. | |
b8a46c1d UD |
1277 | The @code{ngettext} function is similar to the @code{gettext} function |
1278 | as it finds the message catalogs in the same way. But it takes two | |
1279 | extra arguments. The @var{msgid1} parameter must contain the singular | |
1280 | form of the string to be converted. It is also used as the key for the | |
1281 | search in the catalog. The @var{msgid2} parameter is the plural form. | |
1282 | The parameter @var{n} is used to determine the plural form. If no | |
1283 | message catalog is found @var{msgid1} is returned if @code{n == 1}, | |
1284 | otherwise @var{msgid2}. | |
1285 | ||
10b89412 | 1286 | An example for the use of this function is: |
b8a46c1d UD |
1287 | |
1288 | @smallexample | |
1289 | printf (ngettext ("%d file removed", "%d files removed", n), n); | |
1290 | @end smallexample | |
1291 | ||
1292 | Please note that the numeric value @var{n} has to be passed to the | |
1293 | @code{printf} function as well. It is not sufficient to pass it only to | |
1294 | @code{ngettext}. | |
1295 | @end deftypefun | |
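A minimal sketch of the fallback behavior described above, assuming no message catalog is installed for the default domain. The function name @code{format_removed} is hypothetical; the conversion is @code{%lu} here because @var{n} has type @code{unsigned long int}.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format a "files removed" message into BUF.
   When no message catalog is found, ngettext returns the first
   string for n == 1 and the second string otherwise.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  const char *fmt = ngettext ("%lu file removed", "%lu files removed", n);
  /* N is passed twice: once to select the form, once to be printed.  */
  return snprintf (buf, size, fmt, n);
}
```

Without a catalog, calling this with 1 yields the singular string and any other value, including 0, yields the plural string.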
1296 | ||
b8a46c1d | 1297 | @deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}) |
d08a7e4c | 1298 | @standards{GNU, libintl.h} |
29e7e2df AO |
1299 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
1300 | @c Wrapper for dcngettext. | |
b8a46c1d UD |
1301 | The @code{dngettext} function is similar to the @code{dgettext} function in the | |
1302 | way the message catalog is selected. The difference is that it takes | |
10b89412 | 1303 | two extra parameters to provide the correct plural form. These two |
b8a46c1d UD |
1304 | parameters are handled in the same way @code{ngettext} handles them. |
1305 | @end deftypefun | |
1306 | ||
b8a46c1d | 1307 | @deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category}) |
d08a7e4c | 1308 | @standards{GNU, libintl.h} |
29e7e2df AO |
1309 | @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} |
1310 | @c Wrapper for dcigettext. | |
b8a46c1d UD |
1311 | The @code{dcngettext} function is similar to the @code{dcgettext} function in the | |
1312 | way the message catalog is selected. The difference is that it takes | |
10b89412 | 1313 | two extra parameters to provide the correct plural form. These two |
b8a46c1d UD |
1314 | parameters are handled in the same way @code{ngettext} handles them. |
1315 | @end deftypefun | |
1316 | ||
1317 | @subsubheading The problem of plural forms | |
1318 | ||
1319 | A description of the problem can be found at the beginning of the last | |
1320 | section. Now there is the question how to solve it. Without the input | |
1321 | of linguists (which was not available) it was not possible to determine | |
1322 | whether there are only a few different forms in which plural forms are | |
1323 | formed or whether the number can increase with every new supported | |
1324 | language. | |
1325 | ||
1326 | Therefore the solution implemented is to allow the translator to specify | |
1327 | the rules of how to select the plural form. Since the formula varies | |
1328 | with every language this is the only viable solution except for | |
608cc1f0 UD |
1329 | hardcoding the information in the code (which still would require the |
1330 | possibility of extensions to not prevent the use of new languages). The | |
a1286745 | 1331 | details are explained in the GNU @code{gettext} manual. Here only a |
b8a46c1d UD |
1332 | bit of information is provided. |
1333 | ||
1334 | The information about the plural form selection has to be stored in the | |
10b89412 | 1335 | header entry (the one with the empty @code{msgid} string). It looks |
c891b2df | 1336 | like this: |
b8a46c1d UD |
1337 | |
1338 | @smallexample | |
c891b2df | 1339 | Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1; |
b8a46c1d UD |
1340 | @end smallexample |
1341 | ||
1342 | The @code{nplurals} value must be a decimal number which specifies how | |
1343 | many different plural forms exist for this language. The string | |
10b89412 RJ |
1344 | following @code{plural} is an expression using the C language |
1345 | syntax. Exceptions are that no negative numbers are allowed, numbers | |
b8a46c1d UD |
1346 | must be decimal, and the only variable allowed is @code{n}. This |
1347 | expression will be evaluated whenever one of the functions | |
1348 | @code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The | |
1349 | numeric value passed to these functions is then substituted for all uses | |
1350 | of the variable @code{n} in the expression. The resulting value then | |
1351 | must be greater than or equal to zero and smaller than the value given as the | |
1352 | value of @code{nplurals}. | |
1353 | ||
1354 | @noindent | |
1355 | The following rules are known at this point. The languages are listed | |
1356 | with their families. But this does not necessarily mean the information can be | |
1357 | generalized for the whole family (as can be easily seen in the table | |
1358 | below).@footnote{Additions are welcome. Send appropriate information to | |
1359 | @email{bug-glibc-manual@@gnu.org}.} | |
1360 | ||
1361 | @table @asis | |
1362 | @item Only one form: | |
1363 | Some languages only require one single form. There is no distinction | |
c891b2df | 1364 | between the singular and plural form. An appropriate header entry |
b8a46c1d UD |
1365 | would look like this: |
1366 | ||
1367 | @smallexample | |
c891b2df | 1368 | Plural-Forms: nplurals=1; plural=0; |
b8a46c1d UD |
1369 | @end smallexample |
1370 | ||
1371 | @noindent | |
1372 | Languages with this property include: | |
1373 | ||
1374 | @table @asis | |
1375 | @item Finno-Ugric family | |
1376 | Hungarian | |
1377 | @item Asian family | |
3c945c44 | 1378 | Japanese, Korean |
b8a46c1d UD |
1379 | @item Turkic/Altaic family |
1380 | Turkish | |
1381 | @end table | |
1382 | ||
1383 | @item Two forms, singular used for one only | |
c934e1c0 | 1384 | This is the form used in most existing programs since it is what English |
10b89412 | 1385 | uses. A header entry would look like this: |
b8a46c1d UD |
1386 | |
1387 | @smallexample | |
c891b2df | 1388 | Plural-Forms: nplurals=2; plural=n != 1; |
b8a46c1d UD |
1389 | @end smallexample |
1390 | ||
1391 | (Note: this uses the feature of C expressions that boolean expressions | |
1392 | have the value zero or one.) | |
1393 | ||
1394 | @noindent | |
1395 | Languages with this property include: | |
1396 | ||
1397 | @table @asis | |
1398 | @item Germanic family | |
1399 | Danish, Dutch, English, German, Norwegian, Swedish | |
1400 | @item Finno-Ugric family | |
aa9e3c39 | 1401 | Estonian, Finnish |
b8a46c1d UD |
1402 | @item Latin/Greek family |
1403 | Greek | |
1404 | @item Semitic family | |
1405 | Hebrew | |
1406 | @item Romance family | |
3c945c44 | 1407 | Italian, Portuguese, Spanish |
b8a46c1d UD |
1408 | @item Artificial |
1409 | Esperanto | |
1410 | @end table | |
1411 | ||
1412 | @item Two forms, singular used for zero and one | |
1413 | Exceptional case in the language family. The header entry would be: | |
1414 | ||
1415 | @smallexample | |
c891b2df | 1416 | Plural-Forms: nplurals=2; plural=n>1; |
b8a46c1d UD |
1417 | @end smallexample |
1418 | ||
1419 | @noindent | |
1420 | Languages with this property include: | |
1421 | ||
1422 | @table @asis | |
1423 | @item Romance family | |
3c945c44 UD |
1424 | French, Brazilian Portuguese |
1425 | @end table | |
1426 | ||
1427 | @item Three forms, special case for zero | |
1428 | The header entry would be: | |
1429 | ||
1430 | @smallexample | |
1431 | Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2; | |
1432 | @end smallexample | |
1433 | ||
1434 | @noindent | |
1435 | Languages with this property include: | |
1436 | ||
1437 | @table @asis | |
1438 | @item Baltic family | |
1439 | Latvian | |
b8a46c1d UD |
1440 | @end table |
1441 | ||
1442 | @item Three forms, special cases for one and two | |
1443 | The header entry would be: | |
1444 | ||
1445 | @smallexample | |
c891b2df | 1446 | Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2; |
b8a46c1d UD |
1447 | @end smallexample |
1448 | ||
1449 | @noindent | |
1450 | Languages with this property include: | |
1451 | ||
1452 | @table @asis | |
1453 | @item Celtic | |
3c945c44 UD |
1454 | Gaeilge (Irish) |
1455 | @end table | |
1456 | ||
1457 | @item Three forms, special case for numbers ending in 1[2-9] | |
1458 | The header entry would look like this: | |
1459 | ||
1460 | @smallexample | |
1461 | Plural-Forms: nplurals=3; \ | |
1462 | plural=n%10==1 && n%100!=11 ? 0 : \ | |
1463 | n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2; | |
1464 | @end smallexample | |
1465 | ||
1466 | @noindent | |
1467 | Languages with this property include: | |
1468 | ||
1469 | @table @asis | |
1470 | @item Baltic family | |
1471 | Lithuanian | |
b8a46c1d UD |
1472 | @end table |
1473 | ||
aa9e3c39 | 1474 | @item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4] |
b8a46c1d UD |
1475 | The header entry would look like this: |
1476 | ||
1477 | @smallexample | |
c891b2df UD |
1478 | Plural-Forms: nplurals=3; \ |
1479 | plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1; | |
b8a46c1d UD |
1480 | @end smallexample |
1481 | ||
1482 | @noindent | |
1483 | Languages with this property include: | |
1484 | ||
1485 | @table @asis | |
1486 | @item Slavic family | |
3c945c44 | 1487 | Croatian, Czech, Russian, Ukrainian |
107d41a9 UD |
1488 | @end table |
1489 | ||
1490 | @item Three forms, special cases for 1 and 2, 3, 4 | |
1491 | The header entry would look like this: | |
1492 | ||
1493 | @smallexample | |
1494 | Plural-Forms: nplurals=3; \ | |
1495 | plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0; | |
1496 | @end smallexample | |
1497 | ||
1498 | @noindent | |
1499 | Languages with this property include: | |
1500 | ||
1501 | @table @asis | |
1502 | @item Slavic family | |
1503 | Slovak | |
b8a46c1d UD |
1504 | @end table |
1505 | ||
1506 | @item Three forms, special case for one and some numbers ending in 2, 3, or 4 | |
1507 | The header entry would look like this: | |
1508 | ||
1509 | @smallexample | |
c891b2df UD |
1510 | Plural-Forms: nplurals=3; \ |
1511 | plural=n==1 ? 0 : \ | |
1512 | n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; | |
b8a46c1d UD |
1513 | @end smallexample |
1514 | ||
b8a46c1d UD |
1515 | @noindent |
1516 | Languages with this property include: | |
1517 | ||
1518 | @table @asis | |
1519 | @item Slavic family | |
1520 | Polish | |
1521 | @end table | |
1522 | ||
3c945c44 | 1523 | @item Four forms, special case for one and all numbers ending in 02, 03, or 04 |
b8a46c1d UD |
1524 | The header entry would look like this: |
1525 | ||
1526 | @smallexample | |
c891b2df | 1527 | Plural-Forms: nplurals=4; \ |
3c945c44 | 1528 | plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3; |
b8a46c1d UD |
1529 | @end smallexample |
1530 | ||
1531 | @noindent | |
1532 | Languages with this property include: | |
1533 | ||
1534 | @table @asis | |
1535 | @item Slavic family | |
1536 | Slovenian | |
1537 | @end table | |
1538 | @end table | |
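Because the @code{plural} expressions use C syntax, the rules listed above can be tried out directly as C functions. The following sketch copies three of them; the function names and @code{select_form} are hypothetical and only illustrate how an index is computed and checked against @code{nplurals}. This is not how the library evaluates the header entry.

```c
#include <assert.h>

/* nplurals=2; plural=n != 1;  (two forms, singular used for one only)  */
unsigned long int
plural_english (unsigned long int n)
{
  return n != 1;
}

/* nplurals=2; plural=n>1;  (two forms, singular used for zero and one)  */
unsigned long int
plural_french (unsigned long int n)
{
  return n > 1;
}

/* nplurals=3; plural=n==1 ? 0 :
   n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;  (Polish)  */
unsigned long int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Select one of NPLURALS forms, enforcing the requirement that the
   computed index is smaller than NPLURALS.  */
const char *
select_form (const char *const forms[], unsigned long int nplurals,
             unsigned long int (*rule) (unsigned long int),
             unsigned long int n)
{
  unsigned long int idx = rule (n);
  assert (idx < nplurals);
  return forms[idx];
}
```

Trying a few values such as 1, 2, 5, 12, and 22 against a rule is a quick way for a translator to check an expression before putting it into the header entry.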
1539 | ||
1540 | ||
17c389fc UD |
1541 | @node Charset conversion in gettext |
1542 | @subsubsection How to specify the output character set @code{gettext} uses | |
1543 | ||
10b89412 | 1544 | @code{gettext} not only looks up a translation in a message catalog, it |
17c389fc UD |
1545 | also converts the translation on the fly to the desired output character |
1546 | set. This is useful if the user is working in a different character set | |
1547 | than the translator who created the message catalog, because it avoids | |
1548 | distributing variants of message catalogs which differ only in the | |
1549 | character set. | |
1550 | ||
1551 | The output character set is, by default, the value of @code{nl_langinfo | |
1552 | (CODESET)}, which depends on the @code{LC_CTYPE} part of the current | |
1553 | locale. But programs which store strings in a locale independent way | |
1554 | (e.g. UTF-8) can request that @code{gettext} and related functions | |
1555 | return the translations in that encoding, by use of the | |
1556 | @code{bind_textdomain_codeset} function. | |
1557 | ||
1558 | Note that the @var{msgid} argument to @code{gettext} is not subject to | |
1559 | character set conversion. Also, when @code{gettext} does not find a | |
1560 | translation for @var{msgid}, it returns @var{msgid} unchanged -- | |
1561 | independently of the current output character set. It is therefore | |
1562 | recommended that all @var{msgid}s be US-ASCII strings. | |
1563 | ||
17c389fc | 1564 | @deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset}) |
d08a7e4c | 1565 | @standards{GNU, libintl.h} |
29e7e2df AO |
1566 | @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
1567 | @c bind_textdomain_codeset @ascuheap @acsmem | |
1568 | @c set_binding_values dup @ascuheap @acsmem | |
17c389fc UD |
1569 | The @code{bind_textdomain_codeset} function can be used to specify the |
1570 | output character set for message catalogs for domain @var{domainname}. | |
1410e233 UD |
1571 | The @var{codeset} argument must be a valid codeset name which can be used |
1572 | for the @code{iconv_open} function, or a null pointer. | |
17c389fc UD |
1573 | |
1574 | If the @var{codeset} parameter is the null pointer, | |
1575 | @code{bind_textdomain_codeset} returns the currently selected codeset | |
cf822e3c | 1576 | for the domain with the name @var{domainname}. It returns @code{NULL} if |
17c389fc UD |
1577 | no codeset has yet been selected. |
1578 | ||
107d41a9 | 1579 | The @code{bind_textdomain_codeset} function can be used several times. |
17c389fc UD |
1580 | If used multiple times with the same @var{domainname} argument, the |
1581 | later call overrides the settings made by the earlier one. | |
1582 | ||
1583 | The @code{bind_textdomain_codeset} function returns a pointer to a | |
1584 | string containing the name of the selected codeset. The string is | |
1585 | allocated internally in the function and must not be changed by the | |
1586 | user. If the system runs out of memory during the execution of | |
1587 | @code{bind_textdomain_codeset}, the return value is @code{NULL} and the | |
010fe231 | 1588 | global variable @code{errno} is set accordingly. |
582a3cff | 1589 | @end deftypefun |
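As an illustration, a program that keeps all its strings in UTF-8 might use the function as sketched below. The helper names @code{request_utf8} and @code{selected_codeset} are hypothetical; only @code{bind_textdomain_codeset} itself is part of the interface described here.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: ask for all translations of DOMAIN to be
   returned in UTF-8, regardless of the LC_CTYPE locale.  Returns the
   selected codeset name, or NULL on failure.  */
const char *
request_utf8 (const char *domain)
{
  return bind_textdomain_codeset (domain, "UTF-8");
}

/* Hypothetical helper: return the codeset currently selected for
   DOMAIN, or NULL if none has been selected yet.  */
const char *
selected_codeset (const char *domain)
{
  return bind_textdomain_codeset (domain, NULL);
}
```

The returned strings belong to the library, so callers that need to keep them across later calls for the same domain should make their own copy.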
17c389fc UD |
1590 | |
1591 | ||
608cc1f0 UD |
1592 | @node GUI program problems |
1593 | @subsubsection How to use @code{gettext} in GUI programs | |
1594 | ||
1410e233 UD |
1595 | One place where the @code{gettext} functions, if used normally, have big |
1596 | problems is within programs with graphical user interfaces (GUIs). The | |
608cc1f0 UD |
1597 | problem is that many of the strings which have to be translated are very |
1598 | short. They have to appear in pull-down menus which restricts the | |
1599 | length. But strings which do not contain entire sentences or at | |
1600 | least large fragments of a sentence may appear in more than one | |
1601 | situation in the program but might have different translations. This is | |
1602 | especially true for the one-word strings which are frequently used in | |
1603 | GUI programs. | |
1604 | ||
1605 | As a consequence many people say that the @code{gettext} approach is | |
1606 | wrong and instead @code{catgets} should be used which indeed does not | |
1607 | have this problem. But there is a very simple and powerful method to | |
1608 | handle this kind of problem with the @code{gettext} functions. | |
1609 | ||
1610 | @noindent | |
bbf70ae9 | 1611 | As an example consider the following fictional situation. A GUI program |
608cc1f0 UD |
1612 | has a menu bar with the following entries: |
1613 | ||
1614 | @smallexample | |
1615 | +------------+------------+--------------------------------------+ | |
1616 | | File | Printer | | | |
1617 | +------------+------------+--------------------------------------+ | |
1618 | | Open | | Select | | |
1619 | | New | | Open | | |
1620 | +----------+ | Connect | | |
1621 | +----------+ | |
1622 | @end smallexample | |
1623 | ||
1624 | To have the strings @code{File}, @code{Printer}, @code{Open}, | |
1625 | @code{New}, @code{Select}, and @code{Connect} translated there has to be | |
1626 | at some point in the code a call to a function of the @code{gettext} | |
1627 | family. But in two places the string passed into the function would be | |
1628 | @code{Open}. The translations might not be the same and therefore we | |
1629 | are in the dilemma described above. | |
1630 | ||
ef48b196 | 1631 | One solution to this problem is to artificially extend the strings |
608cc1f0 | 1632 | to make them unambiguous. But what would the program do if no |
ef48b196 | 1633 | translation is available? The extended string is not what should be |
10b89412 | 1634 | printed. So we should use a slightly modified version of the functions. |
608cc1f0 | 1635 | |
ef48b196 | 1636 | To extend the strings a uniform method should be used. E.g., in the |
10b89412 | 1637 | example above, the strings could be chosen as |
608cc1f0 UD |
1638 | |
1639 | @smallexample | |
1640 | Menu|File | |
1641 | Menu|Printer | |
1642 | Menu|File|Open | |
1643 | Menu|File|New | |
1644 | Menu|Printer|Select | |
1645 | Menu|Printer|Open | |
1646 | Menu|Printer|Connect | |
1647 | @end smallexample | |
1648 | ||
1649 | Now all the strings are different and if now instead of @code{gettext} | |
1650 | the following little wrapper function is used, everything works just | |
1651 | fine: | |
1652 | ||
1653 | @cindex sgettext | |
1654 | @smallexample | |
1655 | char * | |
1656 | sgettext (const char *msgid) | |
1657 | @{ | |
1658 | char *msgval = gettext (msgid); | |
1659 | if (msgval == msgid) | |
1660 | msgval = strrchr (msgid, '|') + 1; | |
1661 | return msgval; | |
1662 | @} | |
1663 | @end smallexample | |
1664 | ||
1665 | What this little function does is to recognize the case when no | |
1666 | translation is available. This can be done very efficiently by a | |
1667 | pointer comparison since the return value is the input value. If there | |
1668 | is no translation we know that the input string is in the format we used | |
1669 | for the Menu entries and therefore contains a @code{|} character. We | |
1670 | simply search for the last occurrence of this character and return a | |
1671 | pointer to the character following it. That's it! | |
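The wrapper above assumes that every untranslated string contains a @code{|}; if one does not, @code{strrchr} returns a null pointer and the result is invalid. A slightly more defensive variant could look like the following sketch. The names @code{strip_context} and @code{sgettext_checked} are hypothetical.

```c
#include <libintl.h>
#include <string.h>

/* Return the part of MSGID after the last '|', or MSGID itself when
   it contains no separator.  */
const char *
strip_context (const char *msgid)
{
  const char *sep = strrchr (msgid, '|');
  return sep != NULL ? sep + 1 : msgid;
}

/* Variant of the wrapper that also tolerates strings without a
   prefix.  The pointer comparison detects the untranslated case.  */
const char *
sgettext_checked (const char *msgid)
{
  const char *msgval = gettext (msgid);
  if (msgval == msgid)
    msgval = strip_context (msgid);
  return msgval;
}
```

The extra check costs one comparison and allows the same wrapper to be used for both prefixed and ordinary message strings.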
1672 | ||
ef48b196 | 1673 | If one now consistently uses the extended string form and replaces |
608cc1f0 UD |
1674 | the @code{gettext} calls with calls to @code{sgettext} (this is normally |
1675 | limited to very few places in the GUI implementation) then it is | |
1676 | possible to produce a program which can be internationalized. | |
1677 | ||
1678 | With advanced compilers (such as GNU C) one can write the | |
1679 | @code{sgettext} function as an inline function or as a macro like this: | |
1680 | ||
1681 | @cindex sgettext | |
1682 | @smallexample | |
1683 | #define sgettext(msgid) \ | |
1684 | (@{ const char *__msgid = (msgid); \ | |
1685 | char *__msgval = gettext (__msgid); \ | |
1686 | if (__msgval == __msgid) \ | |
1687 | __msgval = strrchr (__msgid, '|') + 1; \ | |
1688 | __msgval; @}) | |
1689 | @end smallexample | |
1690 | ||
1691 | The other @code{gettext} functions (@code{dgettext}, @code{dcgettext} | |
1692 | and the @code{ngettext} equivalents) can and should have corresponding | |
1693 | functions as well which look almost identical, except for the parameters | |
1694 | and the call to the underlying function. | |
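As a sketch of what such a corresponding function might look like for @code{ngettext}, consider the following. The name @code{sngettext} is hypothetical; the only difference from the single-string wrapper is that the untranslated result can be either of the two input strings.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical plural-aware counterpart of the sgettext wrapper.
   If ngettext returned one of its arguments unchanged, no
   translation was found and the prefix up to the last '|' is
   stripped from that string.  */
const char *
sngettext (const char *msgid1, const char *msgid2, unsigned long int n)
{
  const char *msgval = ngettext (msgid1, msgid2, n);
  if (msgval == msgid1 || msgval == msgid2)
    {
      const char *sep = strrchr (msgval, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```

Both strings carry the same prefix here, although only the first one is used as the key for the catalog lookup.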
1695 | ||
1696 | Now there is of course the question why such functions do not exist in | |
1f77f049 | 1697 | @theglibc{}? There are two parts of the answer to this question. |
608cc1f0 UD |
1698 | |
1699 | @itemize @bullet | |
1700 | @item | |
1701 | They are easy to write and therefore can be provided by the project they | |
1702 | are used in. This is not an answer by itself and must be seen together | |
1703 | with the second part which is: | |
1704 | ||
1705 | @item | |
1706 | There is no way the C library can contain a version which can work | |
1707 | everywhere. The problem is the selection of the character to separate | |
ef48b196 | 1708 | the prefix from the actual string in the extended string. The |
608cc1f0 UD |
1709 | examples above used @code{|} which is a quite good choice because it |
1710 | resembles a notation frequently used in this context and it also is a | |
1711 | character not often used in message strings. | |
1712 | ||
1713 | But what if the character is used in message strings? Or what if the chosen | |
1714 | character is not available in the character set on the machine one | |
1715 | compiles on (e.g., @code{|} is not required to exist for @w{ISO C}; this is | |
1716 | why the @file{iso646.h} file exists in @w{ISO C} programming environments). | |
1717 | @end itemize | |
1718 | ||
1719 | There is only one more comment left to make. The wrapper function above | |
10b89412 | 1720 | requires that the translation strings are not extended themselves. |
608cc1f0 UD |
1721 | This is only logical. There is no need to disambiguate the strings |
1722 | (since they are never used as keys for a search) and one also saves | |
1723 | quite some memory and disk space by doing this. | |
1724 | ||
1725 | ||
40a55d20 UD |
1726 | @node Using gettextized software |
1727 | @subsubsection User influence on @code{gettext} | |
1728 | ||
1729 | The last sections described what the programmer can do to | |
1730 | internationalize the messages of the program. But it is finally up to | |
1731 | the user to select the message s/he wants to see. S/He must understand | |
1732 | them. | |
1733 | ||
1734 | The POSIX locale model uses the environment variables @code{LC_COLLATE}, | |
a1286745 | 1735 | @code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC}, |
40a55d20 | 1736 | and @code{LC_TIME} to select the locale which is to be used. This way |
10b89412 | 1737 | the user can influence lots of functions. As we mentioned above, the |
40a55d20 UD |
1738 | @code{gettext} functions also take advantage of this. |
1739 | ||
1740 | To understand how this happens it is necessary to take a look at the | |
1741 | various components of the filename which gets computed to locate a | |
1742 | message catalog. It is composed as follows: | |
1743 | ||
1744 | @smallexample | |
1745 | @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo | |
1746 | @end smallexample | |
1747 | ||
1748 | The default value for @var{dir_name} is system specific. It is computed | |
1749 | from the value given as the prefix while configuring the C library. | |
1750 | This value normally is @file{/usr} or @file{/}. For the former the | |
1751 | complete @var{dir_name} is: | |
1752 | ||
1753 | @smallexample | |
1754 | /usr/share/locale | |
1755 | @end smallexample | |
1756 | ||
1757 | We can use @file{/usr/share} since the @file{.mo} files containing the | |
e8b1163e | 1758 | message catalogs are system independent, so all systems can use the same |
40a55d20 | 1759 | files. If the program executed the @code{bindtextdomain} function for |
e8b1163e AJ |
1760 | the message domain that is currently handled, the @var{dir_name} | |
1761 | component is exactly the value which was given to the function as | |
1762 | the second parameter. I.e., @code{bindtextdomain} allows overwriting | |
f2ea0f5b | 1763 | the only system dependent and fixed value to make it possible to |
e8b1163e | 1764 | address files anywhere in the filesystem. |
40a55d20 UD |
1765 | |
1766 | The @var{category} is the name of the locale category which was selected | |
1767 | in the program code. For @code{gettext} and @code{dgettext} this is | |
1768 | always @code{LC_MESSAGES}, for @code{dcgettext} this is selected by the | |
1769 | value of the third parameter. As said above it should be avoided to | |
1770 | ever use a category other than @code{LC_MESSAGES}. | |
1771 | ||
1772 | The @var{locale} component is computed based on the category used. Just | |
1773 | like for the @code{setlocale} function, this is where the user selection | |
1774 | comes into play. Some environment variables are examined in a fixed order | |
1775 | and the first environment variable set determines the return value of | |
1776 | the lookup process. In detail, for the category @code{LC_xxx} the | |
1777 | following variables in this order are examined: | |
1778 | ||
1779 | @table @code | |
1780 | @item LANGUAGE | |
1781 | @item LC_ALL | |
1782 | @item LC_xxx | |
1783 | @item LANG | |
1784 | @end table | |
1785 | ||
1786 | This looks very familiar. With the exception of the @code{LANGUAGE} | |
1787 | environment variable this is exactly the lookup order the | |
10b89412 | 1788 | @code{setlocale} function uses. But why introduce the @code{LANGUAGE} |
40a55d20 UD |
1789 | variable? |
1790 | ||
1791 | The reason is that the syntax of the values these variables can have is | |
1792 | different from what is expected by the @code{setlocale} function. If we | |
1793 | were to set @code{LC_ALL} to a value following the extended syntax, that | |
1794 | would mean the @code{setlocale} function could never use the | |
1795 | value of this variable as well. An additional variable removes this | |
1796 | problem plus we can select the language independently of the locale | |
1797 | setting which sometimes is useful. | |
1798 | ||
1799 | While for the @code{LC_xxx} variables the value should consist of | |
1800 | exactly one specification of a locale the @code{LANGUAGE} variable's | |
1801 | value can consist of a colon separated list of locale names. The | |
1802 | attentive reader will realize that this is the way we manage to | |
1803 | implement one of our additional demands above: we want to be able to | |
10b89412 | 1804 | specify an ordered list of languages. |
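The lookup order and the colon-separated @code{LANGUAGE} syntax can be illustrated with the following simplified sketch. Both function names are hypothetical, and the code ignores any further rules the actual implementation in @theglibc{} applies; it only mirrors the order and the list format described above.

```c
#include <stdlib.h>
#include <string.h>

/* Return the value of the first variable that is set and non-empty,
   examined in the order LANGUAGE, LC_ALL, CATEGORY_VAR, LANG, or
   NULL if none is.  CATEGORY_VAR is e.g. "LC_MESSAGES".  */
const char *
first_locale_setting (const char *category_var)
{
  const char *names[] = { "LANGUAGE", "LC_ALL", category_var, "LANG" };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}

/* Copy the INDEX-th entry of a colon-separated list such as
   "de_AT:de:en" into BUF.  Returns 1 on success, 0 if the list has
   fewer entries or BUF is too small.  */
int
language_entry (const char *list, size_t index, char *buf, size_t size)
{
  while (index > 0)
    {
      list = strchr (list, ':');
      if (list == NULL)
        return 0;
      ++list;
      --index;
    }
  size_t len = strcspn (list, ":");
  if (len + 1 > size)
    return 0;
  memcpy (buf, list, len);
  buf[len] = '\0';
  return 1;
}
```

A caller would walk the entries with increasing index until @code{language_entry} returns 0, trying each locale name in turn.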
40a55d20 UD |
1805 | |
1806 | Back to the constructed filename we have only one component missing. | |
1807 | The @var{domain_name} part is the name which was either registered using | |
1808 | the @code{textdomain} function or which was given to @code{dgettext} or | |
1809 | @code{dcgettext} as the first parameter. Now it becomes obvious that a | |
1810 | good choice for the domain name in the program code is a string which is | |
1f77f049 JM |
1811 | closely related to the program/package name. E.g., for @theglibc{} |
1812 | the domain name is @code{libc}. | |
40a55d20 UD |
1813 | |
1814 | @noindent | |
10b89412 | 1815 | A limited piece of example code should show how the program is supposed |
40a55d20 UD |
1816 | to work: |
1817 | ||
1818 | @smallexample | |
1819 | @{ | |
1410e233 | 1820 | setlocale (LC_ALL, ""); |
40a55d20 UD |
1821 | textdomain ("test-package"); |
1822 | bindtextdomain ("test-package", "/usr/local/share/locale"); | |
17c389fc | 1823 | puts (gettext ("Hello, world!")); |
40a55d20 UD |
1824 | @} |
1825 | @end smallexample | |
1826 | ||
1410e233 UD |
1827 | At the program start the default domain is @code{messages}, and the |
1828 | default locale is "C". The @code{setlocale} call sets the locale | |
1829 | according to the user's environment variables; remember that correct | |
1830 | functioning of @code{gettext} relies on the correct setting of the | |
1831 | @code{LC_MESSAGES} locale (for looking up the message catalog) and | |
1832 | of the @code{LC_CTYPE} locale (for the character set conversion). | |
1833 | The @code{textdomain} call changes the default domain to | |
1834 | @code{test-package}. The @code{bindtextdomain} call specifies that | |
1835 | the message catalogs for the domain @code{test-package} can be found | |
1836 | below the directory @file{/usr/local/share/locale}. | |
40a55d20 | 1837 | |
10b89412 | 1838 | If the user sets the environment variable @code{LANGUAGE} |
40a55d20 UD |
1839 | to @code{de}, the @code{gettext} function will try to use the |
1840 | translations from the file | |
1841 | ||
1842 | @smallexample | |
1843 | /usr/local/share/locale/de/LC_MESSAGES/test-package.mo | |
1844 | @end smallexample | |
1845 | ||
1846 | From the above descriptions it should be clear which component of this | |
f41c8091 UD |
1847 | filename is determined by which source. |
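
As a rough illustration of how the components combine, the following sketch assembles a catalog file name from its four parts. The helper @code{catalog_path} is hypothetical and exists only for this example; it is not part of the C library, and the real lookup additionally tries fallback locale names and aliases.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only: build
   DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo in OUT.
   Returns 0 on success, -1 if OUT is too small.  */
int
catalog_path (char *out, size_t outlen, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int len = snprintf (out, outlen, "%s/%s/%s/%s.mo",
                      dir_name, locale, category, domain_name);
  return (len < 0 || (size_t) len >= outlen) ? -1 : 0;
}
```

Called with the directory given to @code{bindtextdomain}, the locale name @code{de}, the category @code{LC_MESSAGES}, and the domain @code{test-package}, this helper produces the file name shown above.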
1848 | ||
10b89412 RJ |
1849 | In the above example we assumed the @code{LANGUAGE} environment |
1850 | variable to be @code{de}. This might be an appropriate selection, but what | |
f41c8091 UD |
1851 | happens if the user wants to use @code{LC_ALL} because it applies more |
1852 | widely, and the value required there is @code{de_DE.ISO-8859-1}? We | |
1853 | already mentioned above that a situation like this is not infrequent. | |
1854 | E.g., a person might prefer reading a dialect and, if this is not | |
1855 | available, fall back on the standard language. | |
1856 | ||
1857 | The @code{gettext} functions know about situations like this and can | |
1858 | handle them gracefully. The functions recognize the format of the value | |
1859 | of the environment variable. They can split the value into different pieces | |
1860 | and, by leaving out one part or another, construct new | |
1861 | values. This happens, of course, in a predictable way. To understand | |
1862 | this one must know the format of the environment variable value. There | |
7a9a2681 UD |
1863 | is one more or less standardized form, originally from the X/Open |
1864 | specification: | |
f41c8091 | 1865 | |
f41c8091 UD |
1866 | @code{language[_territory[.codeset]][@@modifier]} |
1867 | ||
10b89412 | 1868 | To form less specific locale names, the components are stripped in the order of the |
7a9a2681 | 1869 | following list: |
40a55d20 | 1870 | |
f41c8091 UD |
1871 | @enumerate |
1872 | @item | |
f41c8091 UD |
1873 | @code{codeset} |
1874 | @item | |
1875 | @code{normalized codeset} | |
1876 | @item | |
1877 | @code{territory} | |
1878 | @item | |
7a9a2681 | 1879 | @code{modifier} |
f41c8091 UD |
1880 | @end enumerate |
1881 | ||
7a9a2681 | 1882 | The @code{language} field will never be dropped for obvious reasons. |
f41c8091 UD |
1883 | |
1884 | The only new thing is the @code{normalized codeset} entry. This is | |
10b89412 RJ |
1885 | another goodie which is introduced to help reduce the chaos which |
1886 | derives from the inability of people to standardize the names of | |
f41c8091 UD |
1887 | character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1}, |
1888 | @w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized | |
1889 | codeset} value is generated from the user-provided character set name by | |
1890 | applying the following rules: | |
1891 | ||
1892 | @enumerate | |
1893 | @item | |
10b89412 | 1894 | Remove all characters except digits and letters. |
f41c8091 UD |
1895 | @item |
1896 | Fold letters to lowercase. | |
1897 | @item | |
1898 | If the name contains only digits, prepend the string @code{"iso"}. | |
1899 | @end enumerate | |
1900 | ||
1901 | @noindent | |
10b89412 RJ |
1902 | So all of the above names will be normalized to @code{iso88591}. This |
1903 | allows the program user much more freedom in choosing the locale name. | |
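
The three rules above can be sketched in C as follows. This is a minimal illustration, assuming the rules apply exactly as listed; the function @code{normalize_codeset} is a hypothetical name chosen for this example and is not an interface of the C library.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: write the normalized
   form of CODESET into OUT, which has room for OUTLEN bytes.
   Returns 0 on success, -1 if OUT is too small.  */
int
normalize_codeset (const char *codeset, char *out, size_t outlen)
{
  size_t n = 0;
  int only_digits = 1;

  /* Count the characters that survive rule 1 and note whether any
     of them is a letter (needed for rule 3).  */
  for (const char *p = codeset; *p != '\0'; ++p)
    if (isalnum ((unsigned char) *p))
      {
        ++n;
        if (isalpha ((unsigned char) *p))
          only_digits = 0;
      }

  size_t prefix = only_digits ? 3 : 0;
  if (n + prefix + 1 > outlen)
    return -1;

  char *w = out;
  if (only_digits)
    {
      /* Rule 3: a purely numeric name gets the prefix "iso".  */
      memcpy (w, "iso", 3);
      w += 3;
    }

  /* Rules 1 and 2: keep only letters and digits, folded to lowercase.  */
  for (const char *p = codeset; *p != '\0'; ++p)
    if (isalnum ((unsigned char) *p))
      *w++ = (char) tolower ((unsigned char) *p);
  *w = '\0';
  return 0;
}
```

With this sketch, each of the spellings @w{ISO-8859-1}, @w{8859-1}, @w{88591}, @w{iso8859-1}, and @w{iso_8859-1} yields the string @code{iso88591}. The character classification functions depend on the current locale; the sketch assumes the names consist of ASCII characters.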
f41c8091 UD |
1904 | |
1905 | Even this extended functionality still does not help to solve the | |
1906 | problem that completely different names can be used to denote the same | |
1907 | locale (e.g., @code{de} and @code{german}). To be of help in this | |
1908 | situation the locale implementation and also the @code{gettext} | |
1909 | functions know about aliases. | |
1910 | ||
1911 | The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with | |
1912 | whatever prefix you used for configuring the C library) contains a | |
1913 | mapping of alternative names to more regular names. The system manager | |
1914 | is free to add new entries to meet her/his own needs. The selected | |
1915 | locale from the environment is compared with the entries in the first | |
10b89412 | 1916 | column of this file, ignoring case. If they match, the value of the |
f41c8091 UD |
1917 | second column is used instead for further handling. |
1918 | ||
1919 | In the description of the format of the environment variables we already | |
1920 | mentioned the character set as a factor in the selection of the message | |
1921 | catalog. In fact, only catalogs which contain text written using the | |
1922 | character set of the system/program can be used (directly; there will | |
1923 | come a solution for this some day). This means for the user that s/he | |
10b89412 | 1924 | will always have to take care of this. If in the collection of the |
f41c8091 UD |
1925 | message catalogs there are files for the same language but coded using |
1926 | different character sets the user has to be careful. | |
40a55d20 UD |
1927 | |
1928 | ||
1929 | @node Helper programs for gettext | |
1930 | @subsection Programs to handle message catalogs for @code{gettext} | |
1931 | ||
1f77f049 | 1932 | @Theglibc{} does not contain the source code for the programs to |
f41c8091 UD |
1933 | handle message catalogs for the @code{gettext} functions. As part of |
1934 | the GNU project the GNU gettext package contains everything the | |
1935 | developer needs. The functionality provided by the tools in this | |
1936 | package by far exceeds the abilities of the @code{gencat} program | |
1937 | described above for the @code{catgets} functions. | |
1938 | ||
1939 | The @code{msgfmt} program is the counterpart of the | |
1940 | @code{gencat} program. It generates from the human-readable and | |
1941 | -editable form of the message catalog a binary file which can be used by | |
1942 | the @code{gettext} functions. But there are several more programs | |
1943 | available. | |
1944 | ||
1945 | The @code{xgettext} program can be used to automatically extract the | |
1946 | translatable messages from a source file. I.e., the programmer need not | |
c430c4af | 1947 | take care of the translations and the list of messages which have to be |
f41c8091 UD |
1948 | translated. S/He will simply wrap the translatable string in calls to |
1949 | @code{gettext} et al.@: and the rest will be done by @code{xgettext}. This | |
c430c4af | 1950 | program has a lot of options which help to customize the output or |
f41c8091 UD |
1951 | help to understand the input better. |
1952 | ||
c430c4af BS |
1953 | Other programs help to manage the development cycle when new messages appear |
1954 | in the source files or when a new translation of the messages appears. | |
11bf311e UD |
1955 | Here it should only be noted that using all the tools in GNU gettext it |
1956 | is possible to @emph{completely} automate the handling of message | |
10b89412 | 1957 | catalogs. Besides marking the translatable strings in the source code and |
f41c8091 | 1958 | generating the translations, the developers do not have anything to do |
608cc1f0 | 1959 | themselves. |