@node Locales, Message Translation, Character Set Handling, Top
@c %MENU% The country and language can affect the behavior of library functions
@chapter Locales and Internationalization

Different countries and cultures have varying conventions for how to
communicate. These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as
the language spoken.

@cindex internationalization
@cindex locales
@dfn{Internationalization} of software means programming it to be able
to adapt to the user's favorite conventions. In @w{ISO C},
internationalization works by means of @dfn{locales}. Each locale
specifies a collection of conventions, one convention for each purpose.
The user chooses a set of conventions by specifying a locale (via
environment variables).

All programs inherit the chosen locale as part of their environment.
Provided the programs are written to obey the choice of locale, they
will follow the conventions preferred by the user.

@menu
* Effects of Locale::           Actions affected by the choice of
                                 locale.
* Choosing Locale::             How the user specifies a locale.
* Locale Categories::           Different purposes for which you can
                                 select a locale.
* Setting the Locale::          How a program specifies the locale
                                 with library functions.
* Standard Locales::            Locale names available on all systems.
* Locale Information::          How to access the information for the locale.
* Formatting Numbers::          A dedicated function to format numbers.
@end menu

@node Effects of Locale, Choosing Locale, , Locales
@section What Effects a Locale Has

Each locale specifies conventions for several purposes, including the
following:

@itemize @bullet
@item
What multibyte character sequences are valid, and how they are
interpreted (@pxref{Character Set Handling}).

@item
Classification of which characters in the local character set are
considered alphabetic, and upper- and lower-case conversion conventions
(@pxref{Character Handling}).

@item
The collating sequence for the local language and character set
(@pxref{Collation Functions}).

@item
Formatting of numbers and currency amounts (@pxref{General Numeric}).

@item
Formatting of dates and times (@pxref{Formatting Calendar Time}).

@item
What language to use for output, including error messages
(@pxref{Message Translation}).

@item
What language to use for user answers to yes-or-no questions.

@item
What language to use for more complex user input.
(The C library doesn't yet help you implement this.)
@end itemize

Some aspects of adapting to the specified locale are handled
automatically by the library subroutines. For example, all your program
needs to do in order to use the collating sequence of the chosen locale
is to use @code{strcoll} or @code{strxfrm} to compare strings.

Other aspects of locales are beyond the comprehension of the library.
For example, the library can't automatically translate your program's
output messages into other languages. The only way you can support
output in the user's favorite language is to program this more or less
by hand. The C library provides functions to handle translations for
multiple languages easily.

This chapter discusses the mechanism by which you can modify the current
locale. The effects of the current locale on specific library functions
are discussed in more detail in the descriptions of those functions.

@node Choosing Locale, Locale Categories, Effects of Locale, Locales
@section Choosing a Locale

The simplest way for the user to choose a locale is to set the
environment variable @code{LANG}. This specifies a single locale to use
for all purposes. For example, a user could specify a hypothetical
locale named @samp{espana-castellano} to use the standard conventions of
most of Spain.

The set of locales supported depends on the operating system you are
using, and so do their names. We can't make any promises about what
locales will exist, except for one standard locale called @samp{C} or
@samp{POSIX}. Later we will describe how to construct locales.
@comment (@pxref{Building Locale Files}).

@cindex combining locales
A user also has the option of specifying different locales for different
purposes---in effect, choosing a mixture of multiple locales.

For example, the user might specify the locale @samp{espana-castellano}
for most purposes, but specify the locale @samp{usa-english} for
currency formatting. This might make sense if the user is a
Spanish-speaking American, working in Spanish, but representing monetary
amounts in US dollars.

Note that both locales @samp{espana-castellano} and @samp{usa-english},
like all locales, would include conventions for all of the purposes to
which locales apply. However, the user can choose to use each locale
for a particular subset of those purposes.

@node Locale Categories, Setting the Locale, Choosing Locale, Locales
@section Categories of Activities that Locales Affect
@cindex categories for locales
@cindex locale categories

The purposes that locales serve are grouped into @dfn{categories}, so
that a user or a program can choose the locale for each category
independently. Here is a table of categories; each name is both an
environment variable that a user can set, and a macro name that you can
use as an argument to @code{setlocale}.

@vtable @code
@comment locale.h
@comment ISO
@item LC_COLLATE
This category applies to collation of strings (functions @code{strcoll}
and @code{strxfrm}); see @ref{Collation Functions}.

@comment locale.h
@comment ISO
@item LC_CTYPE
This category applies to classification and conversion of characters,
and to multibyte and wide characters;
see @ref{Character Handling}, and @ref{Character Set Handling}.

@comment locale.h
@comment ISO
@item LC_MONETARY
This category applies to formatting monetary values; see @ref{General Numeric}.

@comment locale.h
@comment ISO
@item LC_NUMERIC
This category applies to formatting numeric values that are not
monetary; see @ref{General Numeric}.

@comment locale.h
@comment ISO
@item LC_TIME
This category applies to formatting date and time values; see
@ref{Formatting Calendar Time}.

@comment locale.h
@comment XOPEN
@item LC_MESSAGES
This category applies to selecting the language used in the user
interface for message translation (@pxref{The Uniforum approach};
@pxref{Message catalogs a la X/Open}).

@comment locale.h
@comment ISO
@item LC_ALL
This is not a category; it is only a macro that you can use
with @code{setlocale} to set a single locale for all purposes. Setting
the environment variable of the same name overrides all selections made
by the other @code{LC_*} variables or @code{LANG}.

@comment locale.h
@comment ISO
@item LANG
If this environment variable is defined, its value specifies the locale
to use for all purposes except as overridden by the variables above.
@end vtable

@vindex LANGUAGE
When developing the message translation functions it was felt that the
functionality provided by the variables above is not sufficient. For
example, it should be possible to specify more than one locale name.
Take a Swedish user who speaks German better than English, and a program
whose messages are output in English by default. It should be possible
to specify that the first choice of language is Swedish, the second
German, and, if this also fails, to use English. This is
possible with the variable @code{LANGUAGE}. For further description of
this GNU extension see @ref{Using gettextized software}.

@node Setting the Locale, Standard Locales, Locale Categories, Locales
@section How Programs Set the Locale

A C program inherits its locale environment variables when it starts up.
This happens automatically. However, these variables do not
automatically control the locale used by the library functions, because
@w{ISO C} says that all programs start by default in the standard @samp{C}
locale. To use the locales specified by the environment, you must call
@code{setlocale}. Call it as follows:

@smallexample
setlocale (LC_ALL, "");
@end smallexample

@noindent
to select a locale based on the user choice of the appropriate
environment variables.
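
As a rough illustration of the call above, the following sketch adopts
the locale named by the environment when a program starts. The helper
name @code{use_environment_locale} and the fallback to the @samp{C}
locale are assumptions made for this example; the library does not
prescribe either.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: adopt the locale named by the environment.
   If the environment names a locale that is not installed,
   setlocale returns a null pointer and leaves the locale unchanged,
   so select the standard "C" locale explicitly in that case.  */
const char *
use_environment_locale (void)
{
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    name = setlocale (LC_ALL, "C");
  return name;
}
```

A program would typically call such a helper once, early in
@code{main}, before any locale-dependent library function is used.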

@cindex changing the locale
@cindex locale, changing
You can also use @code{setlocale} to specify a particular locale, for
general use or for a specific category.

@pindex locale.h
The symbols in this section are defined in the header file @file{locale.h}.

@comment locale.h
@comment ISO
@deftypefun {char *} setlocale (int @var{category}, const char *@var{locale})
The function @code{setlocale} sets the current locale for category
@var{category} to @var{locale}. A list of all the locales the system
provides can be created by running

@smallexample
  locale -a
@end smallexample

If @var{category} is @code{LC_ALL}, this specifies the locale for all
purposes. The other possible values of @var{category} specify a
single purpose (@pxref{Locale Categories}).

You can also use this function to find out the current locale by passing
a null pointer as the @var{locale} argument. In this case,
@code{setlocale} returns a string that is the name of the locale
currently selected for category @var{category}.

The string returned by @code{setlocale} can be overwritten by subsequent
calls, so you should make a copy of the string (@pxref{Copying and
Concatenation}) if you want to save it past any further calls to
@code{setlocale}. (The standard library is guaranteed never to call
@code{setlocale} itself.)

You should not modify the string returned by @code{setlocale}. It might
be the same string that was passed as an argument in a previous call to
@code{setlocale}. One requirement is that the @var{category} must be
the same in the call that returned the string and in the call where the
string is passed in as the @var{locale} parameter.

When you read the current locale for category @code{LC_ALL}, the value
encodes the entire combination of selected locales for all categories.
In this case, the value is not just a single locale name. In fact, we
don't make any promises about what it looks like. But if you specify
the same ``locale name'' with @code{LC_ALL} in a subsequent call to
@code{setlocale}, it restores the same combination of locale selections.

To be sure you can use the returned string encoding the currently selected
locale at a later time, you must make a copy of the string. It is not
guaranteed that the returned pointer remains valid over time.

When the @var{locale} argument is not a null pointer, the string returned
by @code{setlocale} reflects the newly-modified locale.

If you specify an empty string for @var{locale}, this means to read the
appropriate environment variable and use its value to select the locale
for @var{category}.

If a nonempty string is given for @var{locale}, then the locale of that
name is used if possible.

If you specify an invalid locale name, @code{setlocale} returns a null
pointer and leaves the current locale unchanged.
@end deftypefun
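
The query form described above (a null pointer as @var{locale}) can be
sketched as follows. The helper name @code{copy_locale_name} and its
return convention are hypothetical, chosen only for this illustration;
the point is that the name is copied out at once, because the returned
string may be overwritten by a later call.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the name of the current locale for
   CATEGORY into BUF.  Returns 0 on success, -1 if the name does
   not fit or CATEGORY is not valid.  */
int
copy_locale_name (int category, char *buf, size_t size)
{
  /* A null LOCALE argument only queries; it changes nothing.  */
  const char *name = setlocale (category, NULL);

  if (name == NULL || strlen (name) >= size)
    return -1;
  strcpy (buf, name);
  return 0;
}
```

For a single category such as @code{LC_NUMERIC} the copied string is an
ordinary locale name; for @code{LC_ALL} it may be an opaque encoding of
several names, as explained above.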

Here is an example showing how you might use @code{setlocale} to
temporarily switch to a new locale.

@smallexample
#include <stddef.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

void
with_other_locale (char *new_locale,
                   void (*subroutine) (int),
                   int argument)
@{
  char *old_locale, *saved_locale;

  /* @r{Get the name of the current locale.} */
  old_locale = setlocale (LC_ALL, NULL);

  /* @r{Copy the name so it won't be clobbered by @code{setlocale}.} */
  saved_locale = strdup (old_locale);
  if (saved_locale == NULL)
    /* @r{@code{fatal} is assumed to be the program's own error routine.} */
    fatal ("Out of memory");

  /* @r{Now change the locale and do some stuff with it.} */
  setlocale (LC_ALL, new_locale);
  (*subroutine) (argument);

  /* @r{Restore the original locale.} */
  setlocale (LC_ALL, saved_locale);
  free (saved_locale);
@}
@end smallexample

@strong{Portability Note:} Some @w{ISO C} systems may define additional
locale categories, and future versions of the library will do so. For
portability, assume that any symbol beginning with @samp{LC_} might be
defined in @file{locale.h}.

@node Standard Locales, Locale Information, Setting the Locale, Locales
@section Standard Locales

The only locale names you can count on finding on all operating systems
are these three standard ones:

@table @code
@item "C"
This is the standard C locale. The attributes and behavior it provides
are specified in the @w{ISO C} standard. When your program starts up, it
initially uses this locale by default.

@item "POSIX"
This is the standard POSIX locale. Currently, it is an alias for the
standard C locale.

@item ""
The empty name says to select a locale based on environment variables.
@xref{Locale Categories}.
@end table

Defining and installing named locales is normally a responsibility of
the system administrator at your site (or the person who installed the
GNU C library). It is also possible for the user to create private
locales. All this will be discussed later when describing the tool to
do so.
@comment (@pxref{Building Locale Files}).

If your program needs to use something other than the @samp{C} locale,
it will be more portable if you use whatever locale the user specifies
with the environment, rather than trying to specify some non-standard
locale explicitly by name. Remember, different machines might have
different sets of locales installed.

@node Locale Information, Formatting Numbers, Standard Locales, Locales
@section Accessing Locale Information

There are several ways to access locale information. The simplest
way is to let the C library itself do the work. Several of the
functions in this library implicitly access the locale data, and use
what information is provided by the currently selected locale. This is
how the locale model is meant to work normally.

As an example take the @code{strftime} function, which is meant to nicely
format date and time information (@pxref{Formatting Calendar Time}).
Part of the standard information contained in the @code{LC_TIME}
category is the names of the weekdays and months. Instead of requiring
the programmer to take care of providing the translations, the
@code{strftime} function does this all by itself. @code{%A}
in the format string is replaced by the appropriate weekday
name of the locale currently selected by @code{LC_TIME}. This is an
easy example, and wherever possible functions do things automatically
in this way.

But there are quite often situations when there is simply no function
to perform the task, or it is simply not possible to do the work
automatically. For these cases it is necessary to access the
information in the locale directly. To do this the C library provides
two functions: @code{localeconv} and @code{nl_langinfo}. The former is
part of @w{ISO C} and therefore portable, but has a brain-damaged
interface. The second is part of the Unix interface and is portable
insofar as the system follows the Unix standards.

@menu
* The Lame Way to Locale Data::   ISO C's @code{localeconv}.
* The Elegant and Fast Way::      X/Open's @code{nl_langinfo}.
@end menu

@node The Lame Way to Locale Data, The Elegant and Fast Way, ,Locale Information
@subsection @code{localeconv}: It is portable but @dots{}

Together with the @code{setlocale} function the @w{ISO C} people
invented the @code{localeconv} function. It is a masterpiece of poor
design. It is expensive to use, not extendable, and not generally
usable as it provides access to only @code{LC_MONETARY} and
@code{LC_NUMERIC} related information. Nevertheless, if it is
applicable to a given situation it should be used since it is very
portable. The function @code{strfmon} formats monetary amounts
according to the selected locale using this information.
@pindex locale.h
@cindex monetary value formatting
@cindex numeric value formatting

@comment locale.h
@comment ISO
@deftypefun {struct lconv *} localeconv (void)
The @code{localeconv} function returns a pointer to a structure whose
components contain information about how numeric and monetary values
should be formatted in the current locale.

You should not modify the structure or its contents. The structure might
be overwritten by subsequent calls to @code{localeconv}, or by calls to
@code{setlocale}, but no other function in the library overwrites this
value.
@end deftypefun

@comment locale.h
@comment ISO
@deftp {Data Type} {struct lconv}
@code{localeconv}'s return value is of this data type. Its elements are
described in the following subsections.
@end deftp

If a member of the structure @code{struct lconv} has type @code{char},
and the value is @code{CHAR_MAX}, it means that the current locale has
no value for that parameter.
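
A minimal sketch of reading the structure and honoring the
@code{CHAR_MAX} convention follows. The helper names
@code{lconv_char_is_set} and @code{print_numeric_conventions} are
hypothetical; the members used (@code{decimal_point},
@code{thousands_sep}, @code{frac_digits}) are among those described in
the following subsections.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if a char member of struct lconv
   carries a real value, zero if the locale leaves it unspecified.  */
int
lconv_char_is_set (char member)
{
  return member != CHAR_MAX;
}

/* Print a few numeric conventions of the current locale.  */
void
print_numeric_conventions (void)
{
  const struct lconv *lc = localeconv ();

  printf ("decimal point: \"%s\"\n", lc->decimal_point);
  printf ("thousands separator: \"%s\"\n", lc->thousands_sep);
  if (lconv_char_is_set (lc->frac_digits))
    printf ("fractional digits: %d\n", lc->frac_digits);
  else
    printf ("fractional digits: unspecified\n");
}
```

Because the structure may be overwritten by a later call to
@code{localeconv} or @code{setlocale}, the sketch reads every member it
needs right after the call instead of keeping the pointer around.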

@menu
* General Numeric::             Parameters for formatting numbers and
                                 currency amounts.
* Currency Symbol::             How to print the symbol that identifies an
                                 amount of money (e.g. @samp{$}).
* Sign of Money Amount::        How to print the (positive or negative) sign
                                 for a monetary amount, if one exists.
@end menu

@node General Numeric, Currency Symbol, , The Lame Way to Locale Data
@subsubsection Generic Numeric Formatting Parameters

These are the standard members of @code{struct lconv}; there may be
others.

@table @code
@item char *decimal_point
@itemx char *mon_decimal_point
These are the decimal-point separators used in formatting non-monetary
and monetary quantities, respectively. In the @samp{C} locale, the
value of @code{decimal_point} is @code{"."}, and the value of
@code{mon_decimal_point} is @code{""}.
@cindex decimal-point separator

@item char *thousands_sep
@itemx char *mon_thousands_sep
These are the separators used to delimit groups of digits to the left of
the decimal point in formatting non-monetary and monetary quantities,
respectively. In the @samp{C} locale, both members have a value of
@code{""} (the empty string).

@item char *grouping
@itemx char *mon_grouping
These are strings that specify how to group the digits to the left of
the decimal point. @code{grouping} applies to non-monetary quantities
and @code{mon_grouping} applies to monetary quantities. Use either
@code{thousands_sep} or @code{mon_thousands_sep} to separate the digit
groups.
@cindex grouping of digits

Each member of these strings is to be interpreted as an integer value of
type @code{char}. Successive numbers (from left to right) give the
sizes of successive groups (from right to left, starting at the decimal
point). The last member is either @code{0}, in which case the previous
member is used over and over again for all the remaining groups, or
@code{CHAR_MAX}, in which case there is no more grouping---or, put
another way, any remaining digits form one large group without
separators.

For example, if @code{grouping} is @code{"\04\03\02"}, the correct
grouping for the number @code{123456787654321} is @samp{12}, @samp{34},
@samp{56}, @samp{78}, @samp{765}, @samp{4321}. This uses a group of 4
digits at the end, preceded by a group of 3 digits, preceded by groups
of 2 digits (as many as needed). With a separator of @samp{,}, the
number would be printed as @samp{12,34,56,78,765,4321}.

A value of @code{"\03"} indicates repeated groups of three digits, as
normally used in the U.S.

In the standard @samp{C} locale, both @code{grouping} and
@code{mon_grouping} have a value of @code{""}. This value specifies no
grouping at all.

@item char int_frac_digits
@itemx char frac_digits
These are small integers indicating how many fractional digits (to the
right of the decimal point) should be displayed in a monetary value in
international and local formats, respectively. (Most often, both
members have the same value.)

In the standard @samp{C} locale, both of these members have the value
@code{CHAR_MAX}, meaning ``unspecified''. The ISO standard doesn't say
what to do when you find this value; we recommend printing no
fractional digits. (This locale also specifies the empty string for
@code{mon_decimal_point}, so printing any fractional digits would be
confusing!)
@end table
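
The grouping rule for @code{grouping} and @code{mon_grouping} can be
sketched as a small routine. This is only one possible reading of the
description above, not code taken from the library; the function name
@code{group_digits}, its argument order, and the fixed limit of 64
separators are assumptions made for the example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_CUTS 64             /* Arbitrary limit for this sketch.  */

/* Copy DIGITS (decimal digits only, no sign or fraction) to OUT,
   inserting SEP between the groups described by GROUPING.
   Returns 0 on success, -1 if OUT is too small.  */
int
group_digits (const char *digits, const char *grouping, const char *sep,
              char *out, size_t outsize)
{
  size_t len = strlen (digits);
  size_t seplen = strlen (sep);
  size_t cuts[MAX_CUTS];        /* Offsets that get a separator.  */
  size_t ncuts = 0;
  size_t pos = len;             /* Digits not yet assigned to a group.  */
  size_t size = 0;              /* Size of the current group.  */
  const char *g = grouping;
  char *p = out;
  size_t i;

  /* Walk leftward from the decimal point, one group at a time.  */
  while (ncuts < MAX_CUTS)
    {
      if (*g == CHAR_MAX)
        break;                  /* No further grouping.  */
      if (*g != 0)
        size = (size_t) (unsigned char) *g++;
      /* At the terminating 0 the previous size is repeated.  */
      if (size == 0 || pos <= size)
        break;
      pos -= size;
      cuts[ncuts++] = pos;
    }

  if (len + ncuts * seplen + 1 > outsize)
    return -1;

  /* The cuts were collected right to left, so consume them from
     the end of the array while copying left to right.  */
  for (i = 0; i < len; i++)
    {
      if (ncuts > 0 && cuts[ncuts - 1] == i)
        {
          memcpy (p, sep, seplen);
          p += seplen;
          ncuts--;
        }
      *p++ = digits[i];
    }
  *p = '\0';
  return 0;
}
```

Called with the digits @code{"123456787654321"}, the grouping
@code{"\04\03\02"} and the separator @code{","}, this sketch produces
@samp{12,34,56,78,765,4321}, matching the example in the table above.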

@node Currency Symbol, Sign of Money Amount, General Numeric, The Lame Way to Locale Data
@subsubsection Printing the Currency Symbol
@cindex currency symbols

These members of the @code{struct lconv} structure specify how to print
the symbol to identify a monetary value---the international analog of
@samp{$} for US dollars.

Each country has two standard currency symbols. The @dfn{local currency
symbol} is used commonly within the country, while the
@dfn{international currency symbol} is used internationally to refer to
that country's currency when it is necessary to indicate the country
unambiguously.

For example, many countries use the dollar as their monetary unit, and
when dealing with international currencies it's important to specify
that one is dealing with (say) Canadian dollars instead of U.S. dollars
or Australian dollars. But when the context is known to be Canada,
there is no need to make this explicit---dollar amounts are implicitly
assumed to be in Canadian dollars.

@table @code
@item char *currency_symbol
The local currency symbol for the selected locale.

In the standard @samp{C} locale, this member has a value of @code{""}
(the empty string), meaning ``unspecified''. The ISO standard doesn't
say what to do when you find this value; we recommend you simply print
the empty string as you would print any other string pointed to by this
variable.

@item char *int_curr_symbol
The international currency symbol for the selected locale.

The value of @code{int_curr_symbol} should normally consist of a
three-letter abbreviation determined by the international standard
@cite{ISO 4217 Codes for the Representation of Currency and Funds},
followed by a one-character separator (often a space).

In the standard @samp{C} locale, this member has a value of @code{""}
(the empty string), meaning ``unspecified''. We recommend you simply print
the empty string as you would print any other string pointed to by this
variable.

@item char p_cs_precedes
@itemx char n_cs_precedes
@itemx char int_p_cs_precedes
@itemx char int_n_cs_precedes
These members are @code{1} if the @code{currency_symbol} or
@code{int_curr_symbol} strings should precede the value of a monetary
amount, or @code{0} if the strings should follow the value. The
@code{p_cs_precedes} and @code{int_p_cs_precedes} members apply to
positive amounts (or zero), and the @code{n_cs_precedes} and
@code{int_n_cs_precedes} members apply to negative amounts.

In the standard @samp{C} locale, all of these members have a value of
@code{CHAR_MAX}, meaning ``unspecified''. The ISO standard doesn't say
what to do when you find this value. We recommend printing the
currency symbol before the amount, which is right for most countries.
In other words, treat all nonzero values alike in these members.

The members with the @code{int_} prefix apply to the
@code{int_curr_symbol} while the other two apply to
@code{currency_symbol}.

@item char p_sep_by_space
@itemx char n_sep_by_space
@itemx char int_p_sep_by_space
@itemx char int_n_sep_by_space
These members are @code{1} if a space should appear between the
@code{currency_symbol} or @code{int_curr_symbol} strings and the
amount, or @code{0} if no space should appear. The
@code{p_sep_by_space} and @code{int_p_sep_by_space} members apply to
positive amounts (or zero), and the @code{n_sep_by_space} and
@code{int_n_sep_by_space} members apply to negative amounts.

In the standard @samp{C} locale, all of these members have a value of
@code{CHAR_MAX}, meaning ``unspecified''. The ISO standard doesn't say
what you should do when you find this value; we suggest you treat it as
1 (print a space). In other words, treat all nonzero values alike in
these members.

The members with the @code{int_} prefix apply to the
@code{int_curr_symbol} while the other two apply to
@code{currency_symbol}. There is one peculiarity of
@code{int_curr_symbol}, though. Since all legal values contain a space
at the end of the string, one either prints this space (if the currency
symbol must appear in front and must be separated) or avoids
printing this character at all (especially when the symbol is at the
end of the string).
@end table

@node Sign of Money Amount, , Currency Symbol, The Lame Way to Locale Data
@subsubsection Printing the Sign of a Monetary Amount

These members of the @code{struct lconv} structure specify how to print
the sign (if any) of a monetary value.

@table @code
@item char *positive_sign
@itemx char *negative_sign
These are strings used to indicate positive (or zero) and negative
monetary quantities, respectively.

In the standard @samp{C} locale, both of these members have a value of
@code{""} (the empty string), meaning ``unspecified''.

The ISO standard doesn't say what to do when you find this value; we
recommend printing @code{positive_sign} as you find it, even if it is
empty. For a negative value, print @code{negative_sign} as you find it
unless both it and @code{positive_sign} are empty, in which case print
@samp{-} instead. (Failing to indicate the sign at all seems rather
unreasonable.)

@item char p_sign_posn
@itemx char n_sign_posn
@itemx char int_p_sign_posn
@itemx char int_n_sign_posn
These members are small integers that indicate how to
position the sign for nonnegative and negative monetary quantities,
respectively. (The string used for the sign is what was specified with
@code{positive_sign} or @code{negative_sign}.) The possible values are
as follows:

@table @code
@item 0
The currency symbol and quantity should be surrounded by parentheses.

@item 1
Print the sign string before the quantity and currency symbol.

@item 2
Print the sign string after the quantity and currency symbol.

@item 3
Print the sign string right before the currency symbol.

@item 4
Print the sign string right after the currency symbol.

@item CHAR_MAX
``Unspecified''. All of these members have this value in the standard
@samp{C} locale.
@end table

The ISO standard doesn't say what you should do when the value is
@code{CHAR_MAX}. We recommend you print the sign after the currency
symbol.

The members with the @code{int_} prefix apply to the
@code{int_curr_symbol} while the other two apply to
@code{currency_symbol}.
@end table
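
The recommendation for empty sign strings given under
@code{negative_sign} can be written down as a short helper. The name
@code{effective_negative_sign} is hypothetical, and the sketch covers
only the choice of the sign string, not its placement according to
@code{n_sign_posn}.

```c
#include <locale.h>

/* Hypothetical helper: the string to print for a negative monetary
   amount.  Use the locale's negative_sign as found, unless both
   sign strings are empty, in which case fall back to "-".  */
const char *
effective_negative_sign (const struct lconv *lc)
{
  if (lc->negative_sign[0] == '\0' && lc->positive_sign[0] == '\0')
    return "-";
  return lc->negative_sign;
}
```

In the standard @samp{C} locale both sign strings are empty, so the
helper returns @samp{-}.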

@node The Elegant and Fast Way, , The Lame Way to Locale Data, Locale Information
@subsection Pinpoint Access to Locale Data

When writing the X/Open Portability Guide the authors realized that the
@code{localeconv} function is not enough to provide reasonable access to
locale information. The information which was meant to be available
in the locale (as later specified in the POSIX.1 standard) requires more
ways to access it. Therefore the @code{nl_langinfo} function
was introduced.

@comment langinfo.h
@comment XOPEN
@deftypefun {char *} nl_langinfo (nl_item @var{item})
The @code{nl_langinfo} function can be used to access individual
elements of the locale categories. Unlike the @code{localeconv}
function, which returns all the information, @code{nl_langinfo}
lets the caller select what information it requires. This is very
fast and it is not a problem to call this function multiple times.

A second advantage is that in addition to the numeric and monetary
formatting information, information from the
@code{LC_TIME} and @code{LC_MESSAGES} categories is available.

The type @code{nl_item} is defined in @file{nl_types.h}. The argument
@var{item} is a numeric value defined in the header @file{langinfo.h}.
The X/Open standard defines the following values:

@vtable @code
@item ABDAY_1
@itemx ABDAY_2
@itemx ABDAY_3
@itemx ABDAY_4
@itemx ABDAY_5
@itemx ABDAY_6
@itemx ABDAY_7
@code{nl_langinfo} returns the abbreviated weekday name. @code{ABDAY_1}
corresponds to Sunday.
@item DAY_1
@itemx DAY_2
@itemx DAY_3
@itemx DAY_4
@itemx DAY_5
@itemx DAY_6
@itemx DAY_7
Similar to @code{ABDAY_1} etc., but here the return value is the
unabbreviated weekday name.
@item ABMON_1
@itemx ABMON_2
@itemx ABMON_3
@itemx ABMON_4
@itemx ABMON_5
@itemx ABMON_6
@itemx ABMON_7
@itemx ABMON_8
@itemx ABMON_9
@itemx ABMON_10
@itemx ABMON_11
@itemx ABMON_12
The return value is the abbreviated name of the month. @code{ABMON_1}
corresponds to January.
@item MON_1
@itemx MON_2
@itemx MON_3
@itemx MON_4
@itemx MON_5
@itemx MON_6
@itemx MON_7
@itemx MON_8
@itemx MON_9
@itemx MON_10
@itemx MON_11
@itemx MON_12
Similar to @code{ABMON_1} etc., but here the month names are not abbreviated.
Here the first value @code{MON_1} also corresponds to January.
@item AM_STR
@itemx PM_STR
The return values are strings which can be used in the representation of time
as an hour from 1 to 12 plus an am/pm specifier.

Note that in locales which do not use this time representation
these strings might be empty, in which case the am/pm format
cannot be used at all.
@item D_T_FMT
The return value can be used as a format string for @code{strftime} to
represent time and date in a locale-specific way.
@item D_FMT
The return value can be used as a format string for @code{strftime} to
represent a date in a locale-specific way.
@item T_FMT
The return value can be used as a format string for @code{strftime} to
represent time in a locale-specific way.
@item T_FMT_AMPM
The return value can be used as a format string for @code{strftime} to
represent time in the am/pm format.

Note that if the am/pm format does not make any sense for the
selected locale, the return value might be the same as the one for
@code{T_FMT}.
@item ERA
The return value represents the era used in the current locale.

Most locales do not define this value. An example of a locale which
does define this value is the Japanese one. In Japan, the traditional
representation of dates includes the name of the era corresponding to
the then-emperor's reign.

Normally it should not be necessary to use this value directly.
Specifying the @code{E} modifier in their format strings causes the
@code{strftime} functions to use this information. The format of the
returned string is not specified, and therefore you should not assume
knowledge of it on different systems.
@item ERA_YEAR
The return value gives the year in the relevant era of the locale.
As for @code{ERA} it should not be necessary to use this value directly.
@item ERA_D_T_FMT
This return value can be used as a format string for @code{strftime} to
represent dates and times in a locale-specific era-based way.
@item ERA_D_FMT
This return value can be used as a format string for @code{strftime} to
represent a date in a locale-specific era-based way.
@item ERA_T_FMT
This return value can be used as a format string for @code{strftime} to
represent time in a locale-specific era-based way.
@item ALT_DIGITS
The return value is a representation of up to @math{100} values used to
represent the values @math{0} to @math{99}. As for @code{ERA} this
value is not intended to be used directly, but instead indirectly
through the @code{strftime} function. When the modifier @code{O} is
used in a format which would otherwise use numerals to represent hours,
minutes, seconds, weekdays, months, or weeks, the appropriate value for
the locale is used instead.
@item INT_CURR_SYMBOL
The same as the value returned by @code{localeconv} in the
@code{int_curr_symbol} element of the @code{struct lconv}.
@item CURRENCY_SYMBOL
@itemx CRNCYSTR
The same as the value returned by @code{localeconv} in the
@code{currency_symbol} element of the @code{struct lconv}.

@code{CRNCYSTR} is a deprecated alias still required by Unix98.
@item MON_DECIMAL_POINT
The same as the value returned by @code{localeconv} in the
@code{mon_decimal_point} element of the @code{struct lconv}.
@item MON_THOUSANDS_SEP
The same as the value returned by @code{localeconv} in the
@code{mon_thousands_sep} element of the @code{struct lconv}.
@item MON_GROUPING
The same as the value returned by @code{localeconv} in the
@code{mon_grouping} element of the @code{struct lconv}.
@item POSITIVE_SIGN
The same as the value returned by @code{localeconv} in the
@code{positive_sign} element of the @code{struct lconv}.
@item NEGATIVE_SIGN
The same as the value returned by @code{localeconv} in the
@code{negative_sign} element of the @code{struct lconv}.
@item INT_FRAC_DIGITS
The same as the value returned by @code{localeconv} in the
@code{int_frac_digits} element of the @code{struct lconv}.
@item FRAC_DIGITS
The same as the value returned by @code{localeconv} in the
@code{frac_digits} element of the @code{struct lconv}.
@item P_CS_PRECEDES
The same as the value returned by @code{localeconv} in the
@code{p_cs_precedes} element of the @code{struct lconv}.
@item P_SEP_BY_SPACE
The same as the value returned by @code{localeconv} in the
@code{p_sep_by_space} element of the @code{struct lconv}.
@item N_CS_PRECEDES
The same as the value returned by @code{localeconv} in the
@code{n_cs_precedes} element of the @code{struct lconv}.
@item N_SEP_BY_SPACE
The same as the value returned by @code{localeconv} in the
@code{n_sep_by_space} element of the @code{struct lconv}.
@item P_SIGN_POSN
The same as the value returned by @code{localeconv} in the
@code{p_sign_posn} element of the @code{struct lconv}.
@item N_SIGN_POSN
The same as the value returned by @code{localeconv} in the
@code{n_sign_posn} element of the @code{struct lconv}.
@item DECIMAL_POINT
@itemx RADIXCHAR
The same as the value returned by @code{localeconv} in the
@code{decimal_point} element of the @code{struct lconv}.

The name @code{RADIXCHAR} is a deprecated alias still used in Unix98.
@item THOUSANDS_SEP
@itemx THOUSEP
The same as the value returned by @code{localeconv} in the
@code{thousands_sep} element of the @code{struct lconv}.

The name @code{THOUSEP} is a deprecated alias still used in Unix98.
@item GROUPING
The same as the value returned by @code{localeconv} in the
@code{grouping} element of the @code{struct lconv}.
@item YESEXPR
The return value is a regular expression which can be used with the
@code{regcomp} and @code{regexec} functions to recognize a positive
response to a yes/no question.
@item NOEXPR
The return value is a regular expression which can be used with the
@code{regcomp} and @code{regexec} functions to recognize a negative
response to a yes/no question.
@item YESSTR
The return value is a locale-specific translation of the positive response
to a yes/no question.

Using this value is deprecated since it is a very special case of
message translation, and is better handled by the message
translation functions (@pxref{Message Translation}).
@item NOSTR
The return value is a locale-specific translation of the negative response
to a yes/no question. What is said for @code{YESSTR} is also true here.
@end vtable

The file @file{langinfo.h} defines a lot more symbols but none of them
is official. Using them is not portable, and the format of the
return values might change. Therefore we recommend that you not use
them.

Note that the return value for any valid argument can be used
in all situations (with the possible exception of the am/pm time formatting
codes). If the user has not selected any locale for the
appropriate category, @code{nl_langinfo} returns the information from the
@code{"C"} locale. It is therefore possible to use this function as
shown in the example below.

If the argument @var{item} is not valid, a pointer to an empty string is
returned.
@end deftypefun
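
The @code{YESEXPR} item described above can be exercised with a short
sketch like the following. The helper name @code{is_affirmative} is
hypothetical and not part of any library. The sketch assumes that the
returned pattern is a POSIX extended regular expression, which is why
@code{REG_EXTENDED} is passed to @code{regcomp}.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if REPLY matches the affirmative
   pattern of the current locale, 0 if it does not, and -1 if the
   pattern could not be compiled.  */
int
is_affirmative (const char *reply)
{
  regex_t re;
  int result;

  /* Assumption: YESEXPR is an extended regular expression.  */
  if (regcomp (&re, nl_langinfo (YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  /* regexec returns 0 on a match.  */
  result = regexec (&re, reply, 0, NULL, 0) == 0;
  regfree (&re);
  return result;
}
```

A program that wants the user's conventions rather than those of the
@code{"C"} locale would call @code{setlocale (LC_ALL, "")} once before
using the helper. The same approach works for @code{NOEXPR}.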

An example of @code{nl_langinfo} usage is a function which has to
print a given date and time in a locale-specific way. At first one
might think that, since @code{strftime} internally uses the locale
information, writing something like the following is enough:

@smallexample
size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
@{
  return strftime (s, len, "%X %D", tp);
@}
@end smallexample

The format contains no weekday or month names and therefore is
internationally usable. Wrong! The output produced is something like
@code{"hh:mm:ss MM/DD/YY"}. This format is only recognizable in the
USA. Other countries use different formats. Therefore the function
should be rewritten like this:

@smallexample
size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
@{
  return strftime (s, len, nl_langinfo (D_T_FMT), tp);
@}
@end smallexample

Now it uses the date and time format of the locale
selected when the program runs. If the user selects the locale
correctly there should never be a misunderstanding over the time and
date format.
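
As a rough illustration of how such a function might be used, the
following sketch formats the current time into a fixed-size buffer and
checks the result of @code{strftime}. The helper name
@code{print_now_localized} and the buffer size are assumptions made for
this example only.

```c
#include <langinfo.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write the current local time to OUT using
   the date and time format of the current locale.  Returns 0 on
   success and -1 on failure.  */
int
print_now_localized (FILE *out)
{
  char buf[256];                /* Assumed to be large enough.  */
  time_t now = time (NULL);
  struct tm *tp = localtime (&now);

  if (tp == NULL)
    return -1;

  /* strftime returns 0 if the result does not fit into BUF.  */
  if (strftime (buf, sizeof buf, nl_langinfo (D_T_FMT), tp) == 0)
    return -1;

  return fprintf (out, "%s\n", buf) < 0 ? -1 : 0;
}
```

The helper deliberately does not change the locale itself. A program
would normally call @code{setlocale (LC_ALL, "")} once at startup so
that the user's selection takes effect. Without that call the
@code{"C"} locale format is used.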

@node Formatting Numbers, , Locale Information, Locales
@section A dedicated function to format numbers

We have seen that the structure returned by @code{localeconv} as well as
the values given to @code{nl_langinfo} allow you to retrieve the various
pieces of locale-specific information to format numbers and monetary
amounts. We have also seen that the underlying rules are quite complex.

Therefore the X/Open standards introduce a function which uses such
locale information, making it easier for the user to format
numbers according to these rules.

@deftypefun ssize_t strfmon (char *@var{s}, size_t @var{maxsize}, const char *@var{format}, @dots{})
The @code{strfmon} function is similar to the @code{strftime} function
in that it takes a buffer, its size, a format string,
and values to write into the buffer as text in a form specified
by the format string. Like @code{strftime}, the function
also returns the number of bytes written into the buffer.

There are two differences: @code{strfmon} can take more than one
argument, and, of course, the format specification is different. Like
@code{strftime}, the format string consists of normal text, which is
output as is, and format specifiers, which are indicated by a @samp{%}.
Immediately after the @samp{%}, you can optionally specify various flags
and formatting information before the main formatting character, in a
similar way to @code{printf}:

@itemize @bullet
@item
Immediately following the @samp{%} there can be one or more of the
following flags:
@table @asis
@item @samp{=@var{f}}
The single byte character @var{f} is used for this field as the numeric
fill character. By default this character is a space character.
Filling with this character is performed only if a left precision
is specified; it is not used to pad the output to the given field width.
@item @samp{^}
The number is printed without grouping the digits according to the rules
of the current locale. By default grouping is enabled.
@item @samp{+}, @samp{(}
At most one of these flags can be used. They select the format used to
represent the sign of a currency amount. By default, and if
@samp{+} is given, the locale equivalent of @math{+}/@math{-} is used. If
@samp{(} is given, negative amounts are enclosed in parentheses. The
exact format is determined by the values of the @code{LC_MONETARY}
category of the locale selected at program runtime.
@item @samp{!}
The output will not contain the currency symbol.
@item @samp{-}
The output will be formatted left-justified instead of right-justified if
it does not fill the entire field width.
@end table
@end itemize

The next part of a specification is an optional field width. If no
width is specified @math{0} is taken. During output, the function first
determines how much space is required. If it requires at least as many
characters as given by the field width, it is output using as much space
as necessary. Otherwise, it is extended to use the full width by
filling with the space character. The presence or absence of the
@samp{-} flag determines the side at which such padding occurs. If
present, the spaces are added at the right making the output
left-justified, and vice versa.

So far the format looks familiar, being similar to the @code{printf} and
@code{strftime} formats. However, the next two optional fields
introduce something new. The first one is a @samp{#} character followed
by a decimal digit string. The value of the digit string specifies the
number of @emph{digit} positions to the left of the decimal point (or
equivalent). This does @emph{not} include the grouping character when
the @samp{^} flag is not given. If the space needed to print the number
does not fill the whole width, the field is padded at the left side with
the fill character, which can be selected using the @samp{=} flag and by
default is a space. For example, if this left precision is 6, the
number is @math{123}, and the fill character is @samp{*}, the result
will be @samp{***123}.

The second optional field starts with a @samp{.} (period) and consists
of another decimal digit string. Its value describes the number of
characters printed after the decimal point. The default is selected
from the current locale (@code{frac_digits}, @code{int_frac_digits};
@pxref{General Numeric}). If the exact representation needs more digits
than given by this right precision, the displayed value is rounded. If the
number of fractional digits is selected to be zero, no decimal point is
printed.

As a GNU extension, the @code{strfmon} implementation in the GNU libc
allows an optional @samp{L} next as a format modifier. If this modifier
is given, the argument is expected to be a @code{long double} instead of
a @code{double} value.

Finally, the last component is a format specifier. There are three
specifiers defined:

@table @asis
@item @samp{i}
Use the locale's rules for formatting an international currency value.
@item @samp{n}
Use the locale's rules for formatting a national currency value.
@item @samp{%}
Place a @samp{%} in the output. There must be no flag, width
specifier or modifier given; only @samp{%%} is allowed.
@end table

As for @code{printf}, the function reads the format string
from left to right and uses the values passed to the function following
the format string. The values are expected to be either of type
@code{double} or @code{long double}, depending on the presence of the
modifier @samp{L}. The result is stored in the buffer pointed to by
@var{s}. At most @var{maxsize} characters are stored.

The return value of the function is the number of characters stored in
@var{s}, not including the terminating null byte. If the number of
characters stored would exceed @var{maxsize}, the function returns
@math{-1} and the content of the buffer @var{s} is unspecified. In this
case @code{errno} is set to @code{E2BIG}.
@end deftypefun
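
The error handling described above might be wrapped as in the following
sketch. The helper name @code{format_amount} is hypothetical. The
sketch relies on the POSIX specification of @code{strfmon}, under which
a successful return value counts the bytes stored without the
terminating null byte.

```c
#include <errno.h>
#include <monetary.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format AMOUNT as a national currency value
   into BUF, which has room for LEN bytes.  Returns the result of
   strfmon after reporting a buffer that is too small.  */
ssize_t
format_amount (char *buf, size_t len, double amount)
{
  ssize_t n = strfmon (buf, len, "%n", amount);

  /* On failure the contents of BUF are unspecified, so the caller
     must not use them.  */
  if (n == -1 && errno == E2BIG)
    fprintf (stderr, "buffer of %zu bytes is too small\n", len);

  return n;
}
```

A caller would test the result for @math{-1} before printing the
buffer, and could retry with a larger buffer if desired.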

A few examples should make clear how the function works. It is
assumed that all the following pieces of code are executed in a program
which uses the USA locale (@code{en_US}). The simplest
form of the format is this:

@smallexample
strfmon (buf, 100, "@@%n@@%n@@%n@@", 123.45, -567.89, 12345.678);
@end smallexample

@noindent
The output produced is
@smallexample
"@@$123.45@@-$567.89@@$12,345.68@@"
@end smallexample

We can notice several things here. First, the widths of the output
numbers are different. We have not specified a width in the format
string, and so this is no wonder. Second, the third number is printed
using thousands separators. The thousands separator for the
@code{en_US} locale is a comma. The number is also rounded.
@math{.678} is rounded to @math{.68} since the format does not specify a
precision and the default value in the locale is @math{2}. Finally,
note that the national currency symbol is printed since @samp{%n} was
used, not @samp{%i}. The next example shows how we can align the output.

@smallexample
strfmon (buf, 100, "@@%=*11n@@%=*11n@@%=*11n@@", 123.45, -567.89, 12345.678);
@end smallexample

@noindent
The output this time is:

@smallexample
"@@    $123.45@@   -$567.89@@ $12,345.68@@"
@end smallexample

Two things stand out. First, all fields have the same width (eleven
characters) since this is the width given in the format and since no
number required more characters to be printed. The second important
point is that the fill character is not used. This is correct since the
white space was not used to achieve a precision given by a @samp{#}
modifier, but instead to fill to the given width. The difference
becomes obvious if we now add a left precision.

@smallexample
strfmon (buf, 100, "@@%=*11#5n@@%=*11#5n@@%=*11#5n@@",
         123.45, -567.89, 12345.678);
@end smallexample

@noindent
The output is

@smallexample
"@@ $***123.45@@-$***567.89@@ $12,345.68@@"
@end smallexample

Here we can see that all the currency symbols are now aligned, and that
the space between the currency sign and the number is filled with the
selected fill character. Note that although the left precision is
selected to be @math{5} and @math{123.45} has three digits left of the
decimal point, the space is filled with three asterisks. This is correct
since, as explained above, the left precision does not include the
positions used to store thousands separators. One last example should
explain the remaining functionality.

@smallexample
strfmon (buf, 100, "@@%=0(16#5.3i@@%=0(16#5.3i@@%=0(16#5.3i@@",
         123.45, -567.89, 12345.678);
@end smallexample

@noindent
This rather complex format string produces the following output:

@smallexample
"@@ USD 000123.450 @@(USD 000567.890)@@ USD 12,345.678 @@"
@end smallexample

The most noticeable change is the alternative way of representing
negative numbers. In financial circles this is often done using
parentheses, and this is what the @samp{(} flag selected. The fill
character is now @samp{0}. Note that this @samp{0} character is not
regarded as a numeric zero, and therefore the first and second numbers
are not printed using a thousands separator. Since we used the format
specifier @samp{i} instead of @samp{n}, the international form of the
currency symbol is used. This is a four-letter string, in this case
@code{"USD "}. The last point is that since the precision right of the
decimal point is selected to be three, the first and second numbers are
printed with an extra zero at the end and the third number is printed
without rounding.
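
The examples above assume that the @code{en_US} locale is in effect. A
program has to select it first, for instance as in the following sketch.
The helper name @code{format_accounting} and the locale names it tries
are assumptions; the names of installed locales vary between systems,
and if neither is available the current locale simply stays in effect.

```c
#include <locale.h>
#include <monetary.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: try to select US monetary conventions, then
   format AMOUNT with parentheses for negative values, a left
   precision of 5 and two fractional digits.  Returns the result
   of strfmon.  */
ssize_t
format_accounting (char *buf, size_t len, double amount)
{
  /* Assumed locale names; setlocale returns NULL and leaves the
     locale unchanged if a name is not available.  */
  if (setlocale (LC_MONETARY, "en_US.UTF-8") == NULL)
    setlocale (LC_MONETARY, "en_US");

  return strfmon (buf, len, "%(#5.2n", amount);
}
```

Changing the locale inside a helper keeps the sketch short. A real
program would more likely call @code{setlocale} once at startup, since
the locale is global state shared by the whole process.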