@c %MENU% How to make the program speak the user's language
@chapter Message Translation
-The program's interface with the human should be designed in a way to
-ease the human the task. One of the possibilities is to use messages in
-whatever language the user prefers.
+The program's interface with the user should be designed to ease the user's
+task. One way to ease the user's task is to use messages in whatever
+language the user prefers.
Printing messages in different languages can be implemented in different
ways. One could add all the different languages in the source code and
-add among the variants every time a message has to be printed. This is
-certainly no good solution since extending the set of languages is
-difficult (the code must be changed) and the code itself can become
+choose among the variants every time a message has to be printed. This is
+certainly not a good solution since extending the set of languages is
+cumbersome (the code must be changed) and the code itself can become
really big with dozens of message sets.
-A better solution is to keep the message sets for each language are kept
+A better solution is to keep the message sets for each language
in separate files which are loaded at runtime depending on the language
selection of the user.
-The GNU C Library provides two different sets of functions to support
+@Theglibc{} provides two different sets of functions to support
message translation. The problem is that neither of the interfaces is
officially defined by the POSIX standard. The @code{catgets} family of
functions is defined in the X/Open standard but this is derived from
industry decisions and therefore not necessarily based on reasonable
decisions.
-As mentioned above the message catalog handling provides easy
-extendibility by using external data files which contain the message
+As mentioned above, the message catalog handling provides easy
+extensibility by using external data files which contain the message
translations. I.e., these files contain for each of the messages used
in the program a translation for the appropriate language. So the tasks
of the message handling functions are
@itemize @bullet
@item
-locate the external data file with the appropriate translations.
+locate the external data file with the appropriate translations
@item
load the data and make it possible to address the messages
@item
map a given key to the translated message
@end itemize
The two approaches mainly differ in the implementation of this last
-step. The design decisions made for this influences the whole rest.
+step. Decisions made in the last step influence the rest of the design.
@menu
* Message catalogs a la X/Open:: The @code{catgets} family of functions.
This means for the author of the program that s/he will have to make
sure the meaning of the identifier in the program code and in the
-message catalogs are always the same.
+message catalogs is always the same.
Before a message can be translated the catalog file must be located.
The user of the program must be able to guide the responsible function
@node The catgets Functions
@subsection The @code{catgets} function family
-@comment nl_types.h
-@comment X/Open
@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
-The @code{catgets} function tries to locate the message data file names
+@standards{X/Open, nl_types.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
+@c catopen @mtsenv @ascuheap @acsmem
+@c strchr ok
+@c setlocale(,NULL) ok
+@c getenv @mtsenv
+@c strlen ok
+@c alloca ok
+@c stpcpy ok
+@c malloc @ascuheap @acsmem
+@c __open_catalog @ascuheap @acsmem
+@c strchr ok
+@c open_not_cancel_2 @acsfd
+@c strlen ok
+@c ENOUGH ok
+@c alloca ok
+@c memcpy ok
+@c fxstat64 ok
+@c __set_errno ok
+@c mmap @acsmem
+@c malloc dup @ascuheap @acsmem
+@c read_not_cancel ok
+@c free dup @ascuheap @acsmem
+@c munmap ok
+@c close_not_cancel_no_status ok
+@c free @ascuheap @acsmem
+The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it when found. The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.
The return value is @code{(nl_catd) -1} in case the function failed and
-no catalog was loaded. The global variable @var{errno} contains a code
+no catalog was loaded. The global variable @code{errno} contains a code
for the error causing the failure. But even if the function call
succeeded this does not mean that all messages can be translated.
format above.
@item %%
-Since @code{%} is used in a meta character there must be a way to
+Since @code{%} is used as a meta character there must be a way to
express the @code{%} character in the result itself. Using @code{%%}
does this just like it works for @code{printf}.
@end table
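The substitution of format elements can be illustrated with a small sketch. The function name @code{expand_template} is hypothetical, and this is not how @code{catopen} is implemented. It handles only @code{%N} (the catalog name), @code{%L} (the locale name) and @code{%%}, assuming the usual X/Open meaning of these elements; every other element is rejected.

```c
#include <stddef.h>
#include <string.h>

/* Expand one NLSPATH-style template TMPL into OUT, which has room for
   OUTSIZE bytes including the terminating null byte.  This sketch only
   knows %N, %L and %%.  Returns 0 on success and -1 if the result does
   not fit or an unsupported element is seen.  */
static int
expand_template (const char *tmpl, const char *name, const char *locale,
                 char *out, size_t outsize)
{
  size_t used = 0;

  if (outsize == 0)
    return -1;

  for (const char *p = tmpl; *p != '\0'; ++p)
    {
      const char *piece = p;   /* By default copy the character itself.  */
      size_t len = 1;

      if (*p == '%')
        {
          ++p;
          if (*p == 'N')
            {
              piece = name;
              len = strlen (name);
            }
          else if (*p == 'L')
            {
              piece = locale;
              len = strlen (locale);
            }
          else if (*p == '%')
            piece = "%";
          else
            return -1;         /* Unknown element or trailing '%'.  */
        }

      /* Always leave room for the terminating null byte.  */
      if (used + len >= outsize)
        return -1;
      memcpy (out + used, piece, len);
      used += len;
    }

  out[used] = '\0';
  return 0;
}
```

Called with the template @code{/usr/share/locale/%L/%N.cat}, the name @code{hello} and the locale @code{de_DE}, the sketch produces @code{/usr/share/locale/de_DE/hello.cat}.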
@end smallexample
@noindent
-where @var{prefix} is given to @code{configure} while installing the GNU
-C Library (this value is in many cases @code{/usr} or the empty string).
+where @var{prefix} is given to @code{configure} while installing @theglibc{}
+(this value is in many cases @code{/usr} or the empty string).
The remaining problem is to decide which must be used. The value
decides about the substitution of the format elements mentioned above.
environment are examined (@pxref{Standard Environment}). Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined
-in @file{nl_types.h}) then the @code{catopen} function use the name of
+in @file{nl_types.h}) then the @code{catopen} function uses the name of
the locale currently selected for the @code{LC_MESSAGES} category.
If @var{flag} is zero the @code{LANG} environment variable is examined.
-This is a left-over from the early days where the concept of the locales
+This is a left-over from the early days when the concept of locales
had not even reached the level of POSIX locales.
The environment variable and the locale name should have a value of the
@end smallexample
@noindent
-When an error occurred the global variable @var{errno} is set to
+When an error occurs the global variable @code{errno} is set to
@table @var
@item EBADF
@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
-The function @code{catgets} has to be used to access the massage catalog
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+The function @code{catgets} has to be used to access the message catalog
previously opened using the @code{catopen} function. The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.
The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files. This will be
explained in detail below. For now it is interesting to know that a
-catalog can consists of several set and the messages in each thread are
+catalog can consist of several sets and the messages in each set are
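-individually numbered using numbers. Neither the set number nor the
-message number must be consecutive. They can be arbitrarily chosen.
+individually numbered. Neither the set numbers nor the message numbers
+need to be consecutive. They can be arbitrarily chosen.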
But each message (unless equal to another one) must have its own unique
-pair of set and message number.
+pair of set and message numbers.
Since it is not guaranteed that the message catalog for the language
selected by the user exists the last parameter @var{string} helps to
set/message number tuple must be unique the programmer must keep lists
of the messages at the same time the code is written. And the work
between several people working on the same project must be coordinated.
-We will see some how these problems can be relaxed a bit (@pxref{Common
+We will see how some of these problems can be relaxed a bit (@pxref{Common
Usage}).
@deftypefun int catclose (nl_catd @var{catalog_desc})
+@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
+@c catclose @ascuheap @acucorrupt @acsmem
+@c __set_errno ok
+@c munmap ok
+@c free @ascuheap @acsmem
The @code{catclose} function can be used to free the resources
associated with a message catalog which previously was opened by a call
to @code{catopen}. If the resources can be successfully freed the
-function returns @code{0}. Otherwise it return @code{@minus{}1} and the
-global variable @var{errno} is set. Errors can occur if the catalog
-descriptor @var{catalog_desc} is not valid in which case @var{errno} is
+function returns @code{0}. Otherwise it returns @code{@minus{}1} and the
+global variable @code{errno} is set. Errors can occur if the catalog
+descriptor @var{catalog_desc} is not valid in which case @code{errno} is
set to @code{EBADF}.
@end deftypefun
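How the three functions fit together might be sketched as follows. The helper name @code{lookup_message} and its parameters are hypothetical; only @code{catopen}, @code{catgets} and @code{catclose} come from @file{nl_types.h}. The sketch copies the message before closing the catalog because the string returned by @code{catgets} may point into the catalog data.

```c
#include <nl_types.h>
#include <stdio.h>

/* Copy message MESSAGE of set SET from the catalog CAT_NAME into OUT.
   If the catalog cannot be opened, or the message is missing, FALLBACK
   is copied instead.  Returns 1 if a catalog was opened, 0 otherwise.  */
static int
lookup_message (const char *cat_name, int set, int message,
                const char *fallback, char *out, size_t outsize)
{
  nl_catd catalog = catopen (cat_name, NL_CAT_LOCALE);

  if (catalog == (nl_catd) -1)
    {
      /* No catalog was loaded; errno describes why.  */
      snprintf (out, outsize, "%s", fallback);
      return 0;
    }

  /* catgets hands back FALLBACK if the message is not available.  */
  snprintf (out, outsize, "%s", catgets (catalog, set, message, fallback));

  /* The text has been copied, so the catalog can be released.  */
  catclose (catalog);
  return 1;
}
```

A real program would normally open the catalog once at startup and keep the descriptor around instead of reopening it for every message; the open-copy-close pattern here only keeps the example short.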
@node The message catalog files
@subsection Format of the message catalog files
-The only reasonable way the translate all the messages of a function and
+The only reasonable way to translate all the messages of a function and
store the result in a message catalog file which can be read by the
@code{catopen} function is to write all the message text to the
translator and let her/him translate them all. I.e., we must have a
@item
If a line contains after leading whitespaces the sequence
@code{$quote}, the quoting character used for this input file is
-changed to the first non-whitespace character following the
+changed to the first non-whitespace character following
@code{$quote}. If no non-whitespace character is present before the
-line ends quoting is disable.
+line ends quoting is disabled.
By default no quoting character is used. In this mode strings are
terminated with the first unescaped line break. If there is a
is an error if the same message number already appeared for this set.
If the leading token was an identifier the message number gets
-automatically assigned. The value is the current maximum messages
+automatically assigned. The value is the current maximum message
number for this set plus one. It is an error if the identifier was
already used for a message in this set. It is OK to reuse the
-identifier for a message in another thread. How to use the symbolic
+identifier for a message in another set. How to use the symbolic
a whitespace.
@item
The quoting character is set to @code{"}. Otherwise the quotes in the
-message definition would have to be left away and in this case the
-message with the identifier @code{two} would loose its leading whitespace.
+message definition would have to be omitted and in this case the
+message with the identifier @code{two} would lose its leading whitespace.
@item
-Mixing numbered messages with message having symbolic names is no
+Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
@end itemize
While this file format is pretty easy it is not the best possible for
use in a running program. The @code{catopen} function would have to
-parser the file and handle syntactic errors gracefully. This is not so
+parse the file and handle syntactic errors gracefully. This is not so
easy and the whole process is pretty slow. Therefore the @code{catgets}
functions expect the data in another more compact and ready-to-use file
format. There is a special program @code{gencat} which is explained in
Files in this other format are not human readable. To be easy to use by
programs it is a binary file. But the format is byte order independent
so translation files can be shared by systems of arbitrary architecture
-(as long as they use the GNU C Library).
+(as long as they use @theglibc{}).
Details about the binary file format are not important to know since
these files are always created by the @code{gencat} program. The
-sources of the GNU C Library also provide the sources for the
+sources of @theglibc{} also provide the sources for the
@code{gencat} program and so the interested reader can look through
these source files to learn about the file format.
The @code{gencat} program can be invoked in two ways:
@example
-`gencat [@var{Option}]@dots{} [@var{Output-File} [@var{Input-File}]@dots{}]`
+`gencat [@var{Option} @dots{}] [@var{Output-File} [@var{Input-File} @dots{}]]`
@end example
This is the interface defined in the X/Open standard. If no
-@var{Input-File} parameter is given input will be read from standard
-input. Multiple input files will be read as if they are concatenated.
+@var{Input-File} parameter is given, input will be read from standard
+input. Multiple input files will be read as if they were concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output. To provide the interface one is used to from other
programs a second interface is provided.
@smallexample
-`gencat [@var{Option}]@dots{} -o @var{Output-File} [@var{Input-File}]@dots{}`
+`gencat [@var{Option} @dots{}] -o @var{Output-File} [@var{Input-File} @dots{}]`
@end smallexample
The option @samp{-o} is used to specify the output file and all file
while using the device names is a GNU extension.
The @code{gencat} program works by concatenating all input files and
-then @strong{merge} the resulting collection of message sets with a
+then @strong{merging} the resulting collection of message sets with a
possibly existing output file. This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages. To
regenerate a catalog file while ignoring the old contents therefore
-requires to remove the output file if it exists. If the output is
+requires removing the output file if it exists. If the output is
written to standard output no merging takes place.
@noindent
The following table shows the options understood by the @code{gencat}
-program. The X/Open standard does not specify any option for the
+program. The X/Open standard does not specify any options for the
program so all of these are GNU extensions.
@table @samp
@itemx --help
Print a usage message listing all available options, then exit successfully.
@item --new
-Do never merge the new messages from the input files with the old content
-of the output files. The old content of the output file is discarded.
+Do not merge the new messages from the input files with the old content
+of the output file. The old content of the output file is discarded.
@item -H
@itemx --header=name
This option is used to emit the symbolic names given to sets and
the numbers are allocated once and due to the possibly frequent use of
them it is difficult to change a number later.
@item
-the numbers do not allow to guess anything about the string and
+the numbers do not allow guessing anything about the string and
therefore collisions can easily happen.
@end enumerate
before the program sources can be compiled. In the last section it was
described how to generate a header containing the mapping of the names.
E.g., for the example message file given in the last section we could
-call the @code{gencat} program as follow (assume @file{ex.msg} contains
+call the @code{gencat} program as follows (assume @file{ex.msg} contains
the sources).
@smallexample
-but this is not necessary. The @code{gencat} program can take care for
+but this is not necessary. The @code{gencat} program can take care of
everything. All the programmer has to do is to put the generated header
file in the dependency list of the source files of her/his project and
-to add a rules to regenerate the header of any of the input files
-change.
+add a rule to regenerate the header if any of the input files change.
One word about the symbol mangling. Every symbol consists of two parts:
the name of the message set plus the name of the message or the special
file is named @file{hello.msg} and the program source file @file{hello.c}):
@smallexample
-@cartouche
% gencat -H msgnrs.h -o hello.cat hello.msg
% cat msgnrs.h
#define MainSet 0x1 /* hello.msg:4 */
% ./hello
Hallo, Welt!
%
-@end cartouche
@end smallexample
The call of the @code{gencat} program creates the missing header file
Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group. There never was a real standard
-defined but still the interface was used in Sun's operation systems.
+defined but still the interface was used in Sun's operating systems.
Since this approach fits better in the development process of free
software it is also used throughout the GNU project and the GNU
-@file{gettext} package provides support for this outside the GNU C
-Library.
+@file{gettext} package provides support for this outside @theglibc{}.
The code of the @file{libintl} from GNU @file{gettext} is the same as
-the code in the GNU C Library. So the documentation in the GNU
+the code in @theglibc{}. So the documentation in the GNU
@file{gettext} manual is also valid for the functionality here. The
following text will describe the library functions in detail. But the
numerous helper programs are not described in this manual. Instead
course means the string itself is the key. I.e., the translation will
be selected based on the original string. The message catalogs must
therefore contain the original strings plus one translation for any such
-string. The task of the @code{gettext} function is it to compare the
+string. The task of the @code{gettext} function is to compare the
argument string with the available strings in the catalog and return the
appropriate translation. Of course this process is optimized so that
this process is not more expensive than an access using an atomic key
not part of the C library they can be found in a separate library named
@file{libintl.a} (or accordingly different for shared libraries).
-@comment libintl.h
-@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
+@c Wrapper for dcgettext.
The @code{gettext} function searches the currently selected message
catalogs for a string which is equal to @var{msgid}. If there is such a
string available it is returned. Otherwise the argument string
@var{msgid} is returned.
-Please note that all though the return value is @code{char *} the
+Please note that although the return value is @code{char *} the
returned string must not be changed. This broken type results from the
history of the function and does not reflect the way the function should
be used.
selected (@pxref{Locating gettext catalog}).
The @code{gettext} function does not modify the value of the global
-@var{errno} variable. This is necessary to make it possible to write
+@code{errno} variable. This is necessary to make it possible to write
something like
@smallexample
printf (gettext ("Operation failed: %m\n"));
@end smallexample
-Here the @var{errno} value is used in the @code{printf} function while
+Here the @code{errno} value is used in the @code{printf} function while
processing the @code{%m} format element and if the @code{gettext}
-function would change this value (it is called before @code{printf} is
+function changed this value (it is called before @code{printf} is
called) we would get a wrong message.
-So there is no easy way to detect a missing message catalog beside
+So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result. But it is normally the
-task of the user to react on missing catalogs. The program cannot guess
+task of the user to react to missing catalogs. The program cannot guess
when a message catalog is really necessary since for a user who speaks
-the language the program was developed in does not need any translation.
+the language the program was developed in, the message does not need any translation.
@end deftypefun
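The two properties described above, the unchanged @code{errno} and the comparison of argument and result, can be checked with a short sketch. Both helper names are hypothetical. The pointer comparison relies on the documented behavior that @var{msgid} itself is returned when no translation is found, so a translation that happens to equal the original text is still counted as translated.

```c
#include <errno.h>
#include <libintl.h>

/* Return nonzero if no translation was found for MSGID, i.e. gettext
   handed back its own argument.  */
static int
is_untranslated (const char *msgid)
{
  return gettext (msgid) == msgid;
}

/* Return nonzero if a call to gettext left errno untouched.  ENOENT is
   just an arbitrary value chosen to make a change visible.  */
static int
gettext_keeps_errno (const char *msgid)
{
  errno = ENOENT;
  (void) gettext (msgid);
  return errno == ENOENT;
}
```

In a program that never selected a locale or installed a catalog, @code{is_untranslated} reports every message as untranslated, which is exactly the situation where the original strings are printed.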
The remaining two functions to access the message catalog add some
currently selected default message catalog it must specify all ambiguous
information.
-@comment libintl.h
-@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
-The @code{dgettext} functions acts just like the @code{gettext}
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
+@c Wrapper for dcgettext.
+The @code{dgettext} function acts just like the @code{gettext}
function. It only takes an additional first argument @var{domainname}
which guides the selection of the message catalogs which are searched
for the translation. If the @var{domainname} parameter is the null
anachronism. The returned string must never be modified.
@end deftypefun
-@comment libintl.h
-@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
+@c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c libc_rwlock_rdlock @asulock @aculock
+@c current_locale_name ok [protected from @mtslocale]
+@c tfind ok
+@c libc_rwlock_unlock ok
+@c plural_lookup ok
+@c plural_eval ok
+@c rawmemchr ok
+@c DETERMINE_SECURE ok, nothing
+@c strcmp ok
+@c strlen ok
+@c getcwd @ascuheap @acsmem @acsfd
+@c strchr ok
+@c stpcpy ok
+@c category_to_name ok
+@c guess_category_value @mtsenv
+@c getenv @mtsenv
+@c current_locale_name dup ok [protected from @mtslocale by dcigettext]
+@c strcmp ok
+@c ENABLE_SECURE ok
+@c _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c libc_rwlock_rdlock dup @asulock @aculock
+@c _nl_make_l10nflist dup @ascuheap @acsmem
+@c libc_rwlock_unlock dup ok
+@c _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c libc_lock_lock_recursive @aculock
+@c libc_lock_unlock_recursive @aculock
+@c open->open_not_cancel_2 @acsfd
+@c fstat ok
+@c mmap dup @acsmem
+@c close->close_not_cancel_no_status @acsfd
+@c malloc dup @ascuheap @acsmem
+@c read->read_not_cancel ok
+@c munmap dup @acsmem
+@c W dup ok
+@c strlen dup ok
+@c get_sysdep_segment_value ok
+@c memcpy dup ok
+@c hash_string dup ok
+@c free dup @ascuheap @acsmem
+@c libc_rwlock_init ok
+@c _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c libc_rwlock_fini ok
+@c EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
+@c strstr dup ok
+@c isspace ok
+@c strtoul ok
+@c PLURAL_PARSE @ascuheap @acsmem
+@c malloc dup @ascuheap @acsmem
+@c free dup @ascuheap @acsmem
+@c INIT_GERMANIC_PLURAL ok, nothing
+@c the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
+@c _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
+@c _nl_explode_name dup @ascuheap @acsmem
+@c libc_rwlock_wrlock dup @asulock @aculock
+@c free dup @asulock @aculock @acsfd @acsmem
+@c _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
+@c strlen ok
+@c hash_string ok
+@c W ok
+@c SWAP ok
+@c bswap_32 ok
+@c strcmp ok
+@c get_output_charset @mtsenv @ascuheap @acsmem
+@c getenv dup @mtsenv
+@c strlen dup ok
+@c malloc dup @ascuheap @acsmem
+@c memcpy dup ok
+@c libc_rwlock_rdlock dup @asulock @aculock
+@c libc_rwlock_unlock dup ok
+@c libc_rwlock_wrlock dup @asulock @aculock
+@c realloc @ascuheap @acsmem
+@c strdup @ascuheap @acsmem
+@c strstr ok
+@c strcspn ok
+@c mempcpy dup ok
+@c norm_add_slashes dup ok
+@c gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
+@c [protected from @mtslocale by dcigettext locale lock]
+@c free dup @ascuheap @acsmem
+@c libc_lock_lock @asulock @aculock
+@c calloc @ascuheap @acsmem
+@c gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
+@c libc_lock_unlock ok
+@c malloc @ascuheap @acsmem
+@c mempcpy ok
+@c memcpy ok
+@c strcpy ok
+@c libc_rwlock_wrlock @asulock @aculock
+@c tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
+@c transcmp ok
+@c strcmp dup ok
+@c free @ascuheap @acsmem
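-The @code{dcgettext} adds another argument to those which
-@code{dgettext} takes. This argument @var{category} specifies the last
-piece of information needed to localize the message catalog. I.e., the
+The @code{dcgettext} function adds another argument to those which
+@code{dgettext} takes. This argument @var{category} specifies the last
+piece of information needed to locate the message catalog. I.e., the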
@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL}
must not be used and even though the names might suggest this, there is
-no relation to the environments variables of this name.
+no relation to the environment variables of the same names.
The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions. There is not really
any situation where it is necessary (or useful) to use a different value
-but @code{LC_MESSAGES} in for the @var{category} parameter. We are
+than @code{LC_MESSAGES} for the @var{category} parameter. We are
dealing with messages here and any other choice can only be irritating.
As for @code{gettext} the return value type is @code{char *} which is an
@end deftypefun
When using the three functions above in a program it is a frequent case
-that the @var{msgid} argument is a constant string. So it is worth to
+that the @var{msgid} argument is a constant string. So it is worthwhile to
-optimize this case. Thinking shortly about this one will realize that
+optimize this case. Thinking briefly about this, one will realize that
as long as no new message catalog is loaded the translation of a message
will not change. This optimization is actually implemented by the
@enumerate
@item
Locate the set of message catalogs. There are a number of files for
-different languages and which all belong to the package. Usually they
+different languages which all belong to the package. Usually they
are all stored in the filesystem below a certain directory.
-There can be arbitrary many packages installed and they can follow
+There can be arbitrarily many packages installed and they can follow
different guidelines for the placement of their files.
@item
@item
The language to be used can be specified in several different ways.
There is no generally accepted standard for this and the user always
-expects the program understand what s/he means. E.g., to select the
+expects the program to understand what s/he means. E.g., to select the
German translation one could write @code{de}, @code{german}, or
@code{deutsch} and the program should always react the same.
second best choice to fall back on the language of the developer and
simply not translate any message. Instead a user might be better able
to read the messages in another language and so the user of the program
-should be able to define an precedence order of languages.
+should be able to define a precedence order of languages.
@end itemize
We can divide the configuration actions in two parts: the one is
As the functions described in the last sections already mention separate
sets of messages can be selected by a @dfn{domain name}. This is a
-simple string which should be unique for each program part with uses a
-separate domain. It is possible to use in one program arbitrary many
-domains at the same time. E.g., the GNU C Library itself uses a domain
+simple string which should be unique for each program part that uses a
+separate domain. It is possible to use in one program arbitrarily many
+domains at the same time. E.g., @theglibc{} itself uses a domain
named @code{libc} while the program using the C Library could use a
domain named @code{foo}. The important point is that at any time
exactly one domain is active. This is controlled with the following
function.
-@comment libintl.h
-@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
+@c textdomain @asulock @ascuheap @aculock @acsmem
+@c libc_rwlock_wrlock @asulock @aculock
+@c strcmp ok
+@c strdup @ascuheap @acsmem
+@c free @ascuheap @acsmem
+@c libc_rwlock_unlock ok
The @code{textdomain} function sets the default domain, which is used in
all future @code{gettext} calls, to @var{domainname}. Please note that
@code{dgettext} and @code{dcgettext} calls are not influenced if the
The function returns the value which is from now on taken as the default
-domain. If the system went out of memory the returned value is
+domain. If the system ran out of memory the returned value is
-@code{NULL} and the global variable @var{errno} is set to @code{ENOMEM}.
+@code{NULL} and the global variable @code{errno} is set to @code{ENOMEM}.
Despite the return value type being @code{char *} the return string must
not be changed. It is allocated internally by the @code{textdomain}
function.
really never should be used.
@end deftypefun
-@comment libintl.h
-@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
+@c bindtextdomain @ascuheap @acsmem
+@c set_binding_values @ascuheap @acsmem
+@c libc_rwlock_wrlock dup @asulock @aculock
+@c strcmp dup ok
+@c strdup dup @ascuheap @acsmem
+@c free dup @ascuheap @acsmem
+@c malloc dup @ascuheap @acsmem
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages. To be correct, this is the directory where the
hierarchy of directories is expected. Details are explained below.
For the programmer it is important to note that the translations which
-come with the program have be placed in a directory hierarchy starting
+come with the program have to be placed in a directory hierarchy starting
at, say, @file{/foo/bar}. Then the program should make a
@code{bindtextdomain} call to bind the domain for the current program to
-this directory. So it is made sure the catalogs are found. A correctly
+this directory. This ensures the catalogs are found. A correctly
allocated internally in the function and must not be changed by the
-user. If the system went out of core during the execution of
+user. If the system ran out of memory during the execution of
@code{bindtextdomain} the return value is @code{NULL} and the global
-variable @var{errno} is set accordingly.
+variable @code{errno} is set accordingly.
@end deftypefun
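A typical startup sequence using @code{bindtextdomain} and @code{textdomain} might look like the following sketch. The helper name @code{init_translations} is hypothetical, and the domain @code{foo} and directory @file{/foo/bar} used in the note below are the placeholder values from the text above, not real installation paths.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Select the user's locale, tell the library where the catalogs of
   DOMAIN live and make DOMAIN the default for later gettext calls.
   Returns 0 on success, -1 if either call failed (errno is then set
   by the failing function).  */
static int
init_translations (const char *domain, const char *localedir)
{
  /* Without this call the program stays in the "C" locale and no
     translation is looked up at all.  */
  setlocale (LC_ALL, "");

  if (bindtextdomain (domain, localedir) == NULL)
    return -1;
  if (textdomain (domain) == NULL)
    return -1;
  return 0;
}
```

A program would call @code{init_translations ("foo", "/foo/bar")} once, early in @code{main}, before the first message is printed.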
The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
-which have been neglected completely in all existing approaches. What
+which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.
Looking through Unix source code before the time anybody thought about
But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an `s' but
that is all. Once again people fell into the trap of believing the
-rules their language is using are universal. But the handling of plural
+rules their language uses are universal. But the handling of plural
forms differs widely between the language families. There are two
-things we can differ between (and even inside language families);
+things which can differ between (and even inside) language families;
extended @code{gettext} interface should be used.
-These extra functions are taking instead of the one key string two
+These extra functions take instead of the one key string two
-strings and an numerical argument. The idea behind this is that using
+strings and a numerical argument. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select using rules specified by the translator the right plural
form. The two string arguments then will be used to provide a return
value in case no message catalog is found (similar to the normal
-@code{gettext} behavior). In this case the rules for Germanic language
+@code{gettext} behavior). In this case the rules for Germanic languages
-is used and it is assumed that the first string argument is the singular
+are used and it is assumed that the first string argument is the singular
form, the second the plural form.
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
-a Germanic language. This is a limitation but since the GNU C library
-(as well as the GNU @code{gettext} package) are written as part of the
-GNU package and the coding standards for the GNU project require program
-being written in English, this solution nevertheless fulfills its
+a Germanic language. This is a limitation but since @theglibc{}
+(as well as the GNU @code{gettext} package) is written as part of the
+GNU package and the coding standards for the GNU project require programs
+to be written in English, this solution nevertheless fulfills its
purpose.
-@comment libintl.h
-@comment GNU
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
+@c Wrapper for dcngettext.
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @code{msgid2}.
-An example for the us of this function is:
+An example of the use of this function is:
@smallexample
printf (ngettext ("%d file removed", "%d files removed", n), n);
@code{ngettext}.
@end deftypefun
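
The fallback behavior described above can be tried without any message
catalog installed. The following is only a minimal sketch: the helper
name @code{format_removed} is hypothetical, and it uses @code{%lu}
rather than @code{%d} on the assumption that the count is passed as an
@code{unsigned long int}, matching the type of the parameter @var{n}.

```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format a "files removed" message for N files
   into BUF.  ngettext selects the plural form; when no message catalog
   is found it returns the first string for n == 1 and the second
   string otherwise.  Returns the value of snprintf.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

Note that the count is passed twice: once to @code{ngettext} to select
the form and once to @code{snprintf} to fill in the number.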
-@comment libintl.h
-@comment GNU
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
+@c Wrapper for dcngettext.
The @code{dngettext} is similar to the @code{dgettext} function in the
way the message catalog is selected. The difference is that it takes
-two extra parameter to provide the correct plural form. These two
+two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun
-@comment libintl.h
-@comment GNU
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
+@c Wrapper for dcigettext.
The @code{dcngettext} is similar to the @code{dcgettext} function in the
way the message catalog is selected. The difference is that it takes
-two extra parameter to provide the correct plural form. These two
+two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages). The
-details are explained in the GNU @code{gettext} manual. Here only a a
+details are explained in the GNU @code{gettext} manual. Here only a
bit of information is provided.
The information about the plural form selection has to be stored in the
-header entry (the one with the empty (@code{msgid} string). It looks
+header entry (the one with the empty @code{msgid} string). It looks
like this:
@smallexample
The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
-following @code{plural} is an expression which is using the C language
-syntax. Exceptions are that no negative number are allowed, numbers
+following @code{plural} is an expression using the C language
+syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}. This
expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
@item Finno-Ugric family
Hungarian
@item Asian family
-Japanese
+Japanese, Korean
@item Turkic/Altaic family
Turkish
@end table
@item Two forms, singular used for one only
This is the form used in most existing programs since it is what English
-is using. A header entry would look like this:
+uses. A header entry would look like this:
@smallexample
Plural-Forms: nplurals=2; plural=n != 1;
@item Semitic family
Hebrew
@item Romance family
-Italian, Spanish
+Italian, Portuguese, Spanish
@item Artificial
Esperanto
@end table
@table @asis
@item Romanic family
-French
+French, Brazilian Portuguese
+@end table
+
+@item Three forms, special case for zero
+The header entry would be:
+
+@smallexample
+Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Baltic family
+Latvian
@end table
@item Three forms, special cases for one and two
@table @asis
@item Celtic
-Gaeilge
+Gaeilge (Irish)
+@end table
+
+@item Three forms, special case for numbers ending in 1[2-9]
+The header entry would look like this:
+
+@smallexample
+Plural-Forms: nplurals=3; \
+ plural=n%10==1 && n%100!=11 ? 0 : \
+ n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Baltic family
+Lithuanian
@end table
@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
@table @asis
@item Slavic family
-Czech, Russian
+Croatian, Czech, Russian, Ukrainian
@end table
@item Three forms, special cases for 1 and 2, 3, 4
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
@end smallexample
-(Continuation in the next line is possible.)
-
@noindent
Languages with this property include:
Polish
@end table
-@item Four forms, special case for one and all numbers ending in 2, 3, or 4
+@item Four forms, special case for one and all numbers ending in 02, 03, or 04
The header entry would look like this:
@smallexample
Plural-Forms: nplurals=4; \
- plural=n==1 ? 0 : n%10==2 ? 1 : n%10==3 || n%10==4 ? 2 : 3;
+ plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@end smallexample
@noindent
@node Charset conversion in gettext
@subsubsection How to specify the output character set @code{gettext} uses
-@code{gettext} not only looks up a translation in a message catalog. It
+@code{gettext} not only looks up a translation in a message catalog; it
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
independently of the current output character set. It is therefore
recommended that all @var{msgid}s be US-ASCII strings.
-@comment libintl.h
-@comment GNU
@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
+@standards{GNU, libintl.h}
+@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
+@c bind_textdomain_codeset @ascuheap @acsmem
+@c set_binding_values dup @ascuheap @acsmem
The @code{bind_textdomain_codeset} function can be used to specify the
output character set for message catalogs for domain @var{domainname}.
The @var{codeset} argument must be a valid codeset name which can be used
If the @var{codeset} parameter is the null pointer,
@code{bind_textdomain_codeset} returns the currently selected codeset
-for the domain with the name @var{domainname}. It returns @code{NULL} if
+for the domain with the name @var{domainname}. It returns @code{NULL} if
no codeset has yet been selected.
The @code{bind_textdomain_codeset} function can be used several times.
allocated internally in the function and must not be changed by the
user. If the system went out of core during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
-global variable @var{errno} is set accordingly. @end deftypefun
+global variable @code{errno} is set accordingly.
+@end deftypefun
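
As a rough illustration of the two uses of
@code{bind_textdomain_codeset} described above (selecting a codeset and
querying the current one with a null pointer), consider the following
sketch. The helper names @code{request_utf8_output} and
@code{output_codeset_is} are hypothetical, and @code{UTF-8} is assumed
to be a codeset name the system accepts.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ask gettext to deliver translations for DOMAIN
   in UTF-8.  Returns 0 on success and -1 if bind_textdomain_codeset
   returned NULL (errno then describes the failure).  */
int
request_utf8_output (const char *domain)
{
  assert (domain != NULL);
  return bind_textdomain_codeset (domain, "UTF-8") != NULL ? 0 : -1;
}

/* Hypothetical helper: return nonzero if the codeset currently
   selected for DOMAIN equals EXPECTED.  Passing a null pointer as the
   codeset only queries the setting; the result is NULL if no codeset
   has been selected yet.  */
int
output_codeset_is (const char *domain, const char *expected)
{
  const char *current = bind_textdomain_codeset (domain, NULL);
  return current != NULL && strcmp (current, expected) == 0;
}
```

The returned strings are owned by the library, so the sketch only
compares them and never modifies or frees them.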
@node GUI program problems
handle these kind of problems with the @code{gettext} functions.
@noindent
-As as example consider the following fictional situation. A GUI program
+As an example, consider the following fictional situation. A GUI program
has a menu bar with the following entries:
@smallexample
@code{Open}. The translations might not be the same and therefore we
are in the dilemma described above.
-One solution to this problem is to artificially enlengthen the strings
+One solution to this problem is to artificially extend the strings
to make them unambiguous. But what would the program do if no
-translation is available? The enlengthened string is not what should be
-printed. So we should use a little bit modified version of the functions.
+translation is available? The extended string is not what should be
+printed. So we should use a slightly modified version of the functions.
-To enlengthen the strings a uniform method should be used. E.g., in the
-example above the strings could be chosen as
+To extend the strings a uniform method should be used. E.g., in the
+example above, the strings could be chosen as
@smallexample
Menu|File
simply search for the last occurrence of this character and return a
pointer to the character following it. That's it!
-If one now consistently uses the enlengthened string form and replaces
+If one now consistently uses the extended string form and replaces
the @code{gettext} calls with calls to @code{sgettext} (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.
and the call to the underlying function.
Now there is of course the question why such functions do not exist in
-the GNU C library? There are two parts of the answer to this question.
+@theglibc{}? There are two parts of the answer to this question.
@itemize @bullet
@item
@item
There is no way the C library can contain a version which can work
everywhere. The problem is the selection of the character to separate
-the prefix from the actual string in the enlenghtened string. The
+the prefix from the actual string in the extended string. The
examples above used @code{|} which is a quite good choice because it
resembles a notation frequently used in this context and it also is a
character not often used in message strings.
@end itemize
There is only one more comment to make left. The wrapper function above
-require that the translations strings are not enlengthened themselves.
+requires that the translation strings are not extended themselves.
This is only logical. There is no need to disambiguate the strings
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.
them.
The POSIX locale model uses the environment variables @code{LC_COLLATE},
-@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{NUMERIC},
+@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
and @code{LC_TIME} to select the locale which is to be used. This way
-the user can influence lots of functions. As we mentioned above the
+the user can influence lots of functions. As we mentioned above, the
@code{gettext} functions also take advantage of this.
To understand how this happens it is necessary to take a look at the
This looks very familiar. With the exception of the @code{LANGUAGE}
environment variable this is exactly the lookup order the
-@code{setlocale} function uses. But why introducing the @code{LANGUAGE}
+@code{setlocale} function uses. But why introduce the @code{LANGUAGE}
variable?
The reason is that the syntax of the values these variables can have is
value can consist of a colon separated list of locale names. The
attentive reader will realize that this is the way we manage to
implement one of our additional demands above: we want to be able to
-specify an ordered list of language.
+specify an ordered list of languages.
Back to the constructed filename we have only one component missing.
The @var{domain_name} part is the name which was either registered using
the @code{textdomain} function or which was given to @code{dgettext} or
@code{dcgettext} as the first parameter. Now it becomes obvious that a
good choice for the domain name in the program code is a string which is
-closely related to the program/package name. E.g., for the GNU C
-Library the domain name is @code{libc}.
+closely related to the program/package name. E.g., for @theglibc{}
+the domain name is @code{libc}.
@noindent
-A limit piece of example code should show how the programmer is supposed
+A limited piece of example code should show how the program is supposed
to work:
@smallexample
the message catalogs for the domain @code{test-package} can be found
below the directory @file{/usr/local/share/locale}.
-If now the user set in her/his environment the variable @code{LANGUAGE}
+If the user sets the environment variable @code{LANGUAGE}
to @code{de} the @code{gettext} function will try to use the
translations from the file
From the above descriptions it should be clear which component of this
filename is determined by which source.
-In the above example we assumed that the @code{LANGUAGE} environment
-variable to @code{de}. This might be an appropriate selection but what
+In the above example we assumed the @code{LANGUAGE} environment
+variable to be @code{de}. This might be an appropriate selection but what
happens if the user wants to use @code{LC_ALL} because of the wider
usability and here the required value is @code{de_DE.ISO-8859-1}? We
already mentioned above that a situation like this is not infrequent.
@code{language[_territory[.codeset]][@@modifier]}
-Less specific locale names will be stripped of in the order of the
+Less specific locale names will be stripped in the order of the
following list:
@enumerate
The @code{language} field will never be dropped for obvious reasons.
The only new thing is the @code{normalized codeset} entry. This is
-another goodie which is introduced to help reducing the chaos which
-derives from the inability of the people to standardize the names of
+another goodie which is introduced to help reduce the chaos which
+derives from the inability of people to standardize the names of
character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
codeset} value is generated from the user-provided character set name by
@enumerate
@item
-Remove all characters beside numbers and letters.
+Remove all characters besides numbers and letters.
@item
Fold letters to lowercase.
@item
@end enumerate
@noindent
-So all of the above name will be normalized to @code{iso88591}. This
-allows the program user much more freely choosing the locale name.
+So all of the above names will be normalized to @code{iso88591}. This
+allows the program user much more freedom in choosing the locale name.
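
The first two normalization steps listed above can be sketched in a few
lines of C. This is not the code the library itself uses: the function
name @code{normalize_codeset_sketch} is hypothetical, and the sketch
deliberately covers only the removal of non-alphanumeric characters and
the folding to lowercase, not any further step.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: copy NAME into OUT (of SIZE bytes), keeping
   only letters and digits and folding letters to lowercase.  The
   result is always null-terminated and is truncated if OUT is too
   small.  Returns the length of the result.  */
size_t
normalize_codeset_sketch (const char *name, char *out, size_t size)
{
  size_t len = 0;

  assert (name != NULL && out != NULL && size > 0);
  for (; *name != '\0' && len + 1 < size; ++name)
    {
      unsigned char c = (unsigned char) *name;

      if (isalnum (c))                    /* Drop '-', '_', '.', ...  */
        out[len++] = (char) tolower (c);  /* Fold to lowercase.  */
    }
  out[len] = '\0';
  return len;
}
```

With this sketch, @w{ISO-8859-1}, @w{iso8859-1}, and @w{iso_8859-1} all
yield the same eight-character string @code{iso88591}.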
Even this extended functionality still does not help to solve the
problem that completely different names can be used to denote the same
mapping of alternative names to more regular names. The system manager
is free to add new entries to fill her/his own needs. The selected
locale from the environment is compared with the entries in the first
-column of this file ignoring the case. If they match the value of the
+column of this file ignoring the case. If they match, the value of the
second column is used instead for the further handling.
In the description of the format of the environment variables we already
catalog. In fact, only catalogs which contain text written using the
character set of the system/program can be used (directly; there will
come a solution for this some day). This means for the user that s/he
-will always have to take care for this. If in the collection of the
+will always have to take care of this. If in the collection of the
message catalogs there are files for the same language but coded using
different character sets the user has to be careful.
@node Helper programs for gettext
@subsection Programs to handle message catalogs for @code{gettext}
-The GNU C Library does not contain the source code for the programs to
+@Theglibc{} does not contain the source code for the programs to
handle message catalogs for the @code{gettext} functions. As part of
the GNU project the GNU gettext package contains everything the
developer needs. The functionality provided by the tools in this
The @code{xgettext} program can be used to automatically extract the
translatable messages from a source file. I.e., the programmer need not
-take care for the translations and the list of messages which have to be
+take care of the translations and the list of messages which have to be
translated. S/He will simply wrap the translatable string in calls to
@code{gettext} et.al and the rest will be done by @code{xgettext}. This
-program has a lot of option which help to customize the output or do
+program has a lot of options which help to customize the output or
help to understand the input better.
-Other programs help to manage development cycle when new messages appear
-in the source files or when a new translation of the messages appear.
-here it should only be noted that using all the tools in GNU gettext it
-is possible to @emph{completely} automize the handling of message
-catalog. Beside marking the translatable string in the source code and
+Other programs help to manage the development cycle when new messages appear
+in the source files or when a new translation of the messages appears.
+Here it should only be noted that using all the tools in GNU gettext it
+is possible to @emph{completely} automate the handling of message
+catalogs. Besides marking the translatable strings in the source code and
generating the translations the developers do not have anything to do
themselves.