@node Character Handling, String and Array Utilities, Memory, Top
@c %MENU% Character testing and conversion functions
@chapter Character Handling

Programs that work with characters and strings often need to classify a
character---is it alphabetic, is it a digit, is it whitespace, and so
on---and perform case conversion operations on characters. The
functions in the header file @file{ctype.h} are provided for this
purpose.
@pindex ctype.h

Since the choice of locale and character set can alter the
classifications of particular character codes, all of these functions
are affected by the current locale. (More precisely, they are affected
by the locale currently selected for character classification---the
@code{LC_CTYPE} category; see @ref{Locale Categories}.)

The @w{ISO C} standard specifies two different sets of functions. One
set works on @code{char} type characters, the other on
@code{wchar_t} wide characters (@pxref{Extended Char Intro}).

@menu
* Classification of Characters::       Testing whether characters are
                                        letters, digits, punctuation, etc.

* Case Conversion::                    Case mapping, and the like.
* Classification of Wide Characters::  Character class determination for
                                        wide characters.
* Using Wide Char Classes::            Notes on using the wide character
                                        classes.
* Wide Character Case Conversion::     Mapping of wide characters.
@end menu

@node Classification of Characters, Case Conversion, , Character Handling
@section Classification of Characters
@cindex character testing
@cindex classification of characters
@cindex predicates on characters
@cindex character predicates

This section explains the library functions for classifying characters.
For example, @code{isalpha} is the function to test for an alphabetic
character. It takes one argument, the character to test, and returns a
nonzero integer if the character is alphabetic, and zero otherwise. You
would use it like this:

@smallexample
if (isalpha (c))
  printf ("The character `%c' is alphabetic.\n", c);
@end smallexample

Each of the functions in this section tests for membership in a
particular class of characters; each has a name starting with @samp{is}.
Each of them takes one argument, which is a character to test, and
returns an @code{int} which is treated as a boolean value. The
character argument is passed as an @code{int}; it must be either a value
representable as an @code{unsigned char} or the constant value
@code{EOF}.

The attributes of any given character can vary between locales.
@xref{Locales}, for more information on locales.@refill

These functions are declared in the header file @file{ctype.h}.
@pindex ctype.h

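Since @w{ISO C} requires the argument of these functions to be either
@code{EOF} or a value representable as an @code{unsigned char}, a plain
@code{char} that may be negative should be converted before it is
passed. The following is only a minimal sketch of that idiom; the
function name @code{count_alpha} is made up for this illustration and
is not part of the library.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the alphabetic characters in the NUL-terminated string S.
   count_alpha is a hypothetical helper, not a library function.
   Each char is converted to unsigned char first, so that a negative
   char value is never passed to isalpha.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```

In the @code{"C"} locale, @code{count_alpha ("ab1 c!")} counts the
three letters and ignores the digit, the space and the punctuation.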
@cindex lower-case character
@deftypefun int islower (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The is* macros call __ctype_b_loc to get the ctype array from the
@c current locale, and then index it by c. __ctype_b_loc reads from
@c thread-local memory the (indirect) pointer to the ctype array, which
@c may involve one word access to the global locale object, if that's
@c the active locale for the thread, and the array, being part of the
@c locale data, is undeletable, so there's no thread-safety issue. We
@c might want to mark these with @mtslocale to flag to callers that
@c changing locales might affect them, even if not these simpler
@c functions.
Returns true if @var{c} is a lower-case letter. The letter need not be
from the Latin alphabet; any alphabet representable is valid.
@end deftypefun

@cindex upper-case character
@deftypefun int isupper (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an upper-case letter. The letter need not be
from the Latin alphabet; any alphabet representable is valid.
@end deftypefun

@cindex alphabetic character
@deftypefun int isalpha (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an alphabetic character (a letter). If
@code{islower} or @code{isupper} is true of a character, then
@code{isalpha} is also true.

In some locales, there may be additional characters for which
@code{isalpha} is true---letters which are neither upper case nor lower
case. But in the standard @code{"C"} locale, there are no such
additional characters.
@end deftypefun

@cindex digit character
@cindex decimal digit character
@deftypefun int isdigit (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a decimal digit (@samp{0} through @samp{9}).
@end deftypefun

@cindex alphanumeric character
@deftypefun int isalnum (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an alphanumeric character (a letter or
digit); in other words, if either @code{isalpha} or @code{isdigit} is
true of a character, then @code{isalnum} is also true.
@end deftypefun

@cindex hexadecimal digit character
@deftypefun int isxdigit (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a hexadecimal digit.
Hexadecimal digits include the normal decimal digits @samp{0} through
@samp{9} and the letters @samp{A} through @samp{F} and
@samp{a} through @samp{f}.
@end deftypefun

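As an illustration of how @code{isxdigit} and @code{isdigit} can be
combined, the following sketch converts one hexadecimal digit to its
numeric value. The name @code{hex_value} is hypothetical, and the
code assumes that the letters @samp{a} through @samp{f} have
consecutive codes, which is true for ASCII but is not promised by
@w{ISO C}.

```c
#include <ctype.h>

/* Return the numeric value (0 to 15) of the hexadecimal digit C, or -1
   if C is not a hexadecimal digit.  hex_value is a hypothetical helper.
   C must be an unsigned char value or EOF, as for the is* functions.
   Assumption: 'a' through 'f' have consecutive codes (true for ASCII).  */
int
hex_value (int c)
{
  if (!isxdigit (c))
    return -1;
  if (isdigit (c))
    return c - '0';
  return tolower (c) - 'a' + 10;
}
```

Folding the letter to lower case first means that a single subtraction
handles both @samp{A} through @samp{F} and @samp{a} through @samp{f}.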
@cindex punctuation character
@deftypefun int ispunct (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.
@end deftypefun

@cindex whitespace character
@deftypefun int isspace (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a @dfn{whitespace} character. In the standard
@code{"C"} locale, @code{isspace} returns true for only the standard
whitespace characters:

@table @code
@item ' '
space

@item '\f'
formfeed

@item '\n'
newline

@item '\r'
carriage return

@item '\t'
horizontal tab

@item '\v'
vertical tab
@end table
@end deftypefun

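A typical use of @code{isspace} is to skip over leading whitespace
before parsing a token. The sketch below shows one way to do that;
@code{skip_space} is a name invented for this example.

```c
#include <ctype.h>

/* Return a pointer to the first character of S that is not whitespace
   according to isspace.  skip_space is a hypothetical helper.  The
   terminating null character is not whitespace, so the loop always
   stops at the end of the string at the latest.  */
const char *
skip_space (const char *s)
{
  while (isspace ((unsigned char) *s))
    s++;
  return s;
}
```

Which characters are skipped depends on the @code{LC_CTYPE} locale in
effect; in the @code{"C"} locale they are exactly the six characters
listed above.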
@cindex blank character
@deftypefun int isblank (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a blank character; that is, a space or a tab.
This function was originally a GNU extension, but was added in @w{ISO C99}.
@end deftypefun

@cindex graphic character
@deftypefun int isgraph (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a graphic character; that is, a character
that has a glyph associated with it. The whitespace characters are not
considered graphic.
@end deftypefun

@cindex printing character
@deftypefun int isprint (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a printing character. Printing characters
include all the graphic characters, plus the space (@samp{ }) character.
@end deftypefun

@cindex control character
@deftypefun int iscntrl (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a control character (that is, a character that
is not a printing character).
@end deftypefun

@cindex ASCII character
@deftypefun int isascii (int @var{c})
@standards{SVID, ctype.h}
@standards{BSD, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a 7-bit @code{unsigned char} value that fits
into the US/UK ASCII character set. This function is a BSD extension
and is also an SVID extension.
@end deftypefun

@node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling
@section Case Conversion
@cindex character case conversion
@cindex case conversion of characters
@cindex converting case of characters

This section explains the library functions for performing conversions
such as case mappings on characters. For example, @code{toupper}
converts any character to upper case if possible. If the character
can't be converted, @code{toupper} returns it unchanged.

These functions take one argument of type @code{int}, which is the
character to convert, and return the converted character as an
@code{int}. If the conversion is not applicable to the argument given,
the argument is returned unchanged.

@strong{Compatibility Note:} In pre-@w{ISO C} dialects, instead of
returning the argument unchanged, these functions may fail when the
argument is not suitable for the conversion. Thus for portability, you
may need to write @code{islower(c) ? toupper(c) : c} rather than just
@code{toupper(c)}.

These functions are declared in the header file @file{ctype.h}.
@pindex ctype.h

@deftypefun int tolower (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The to* macros/functions call different functions that use different
@c arrays than those of __ctype_b_loc, but the access patterns and
@c thus safety guarantees are the same.
If @var{c} is an upper-case letter, @code{tolower} returns the corresponding
lower-case letter. If @var{c} is not an upper-case letter,
@var{c} is returned unchanged.
@end deftypefun

@deftypefun int toupper (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
If @var{c} is a lower-case letter, @code{toupper} returns the corresponding
upper-case letter. Otherwise @var{c} is returned unchanged.
@end deftypefun

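Because @code{toupper} returns characters it cannot convert unchanged,
it can be applied to every character of a string without testing each
one first. The following is a small sketch under that assumption;
@code{upcase_string} is a hypothetical name, not a library function.

```c
#include <ctype.h>

/* Convert the NUL-terminated string S to upper case in place.
   upcase_string is a hypothetical helper.  Characters without an
   upper-case counterpart are left alone, because toupper returns
   them unchanged.  The cast to unsigned char keeps negative char
   values away from toupper.  */
void
upcase_string (char *s)
{
  for (; *s != '\0'; s++)
    *s = (char) toupper ((unsigned char) *s);
}
```

The conversion follows the current @code{LC_CTYPE} locale, so the same
call may map additional letters in locales other than @code{"C"}.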
@deftypefun int toascii (int @var{c})
@standards{SVID, ctype.h}
@standards{BSD, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function converts @var{c} to a 7-bit @code{unsigned char} value
that fits into the US/UK ASCII character set, by clearing the high-order
bits. This function is a BSD extension and is also an SVID extension.
@end deftypefun

@deftypefun int _tolower (int @var{c})
@standards{SVID, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is identical to @code{tolower}, and is provided for compatibility
with the SVID. @xref{SVID}.@refill
@end deftypefun

@deftypefun int _toupper (int @var{c})
@standards{SVID, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is identical to @code{toupper}, and is provided for compatibility
with the SVID.
@end deftypefun


@node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling
@section Character class determination for wide characters

@w{Amendment 1} to @w{ISO C90} defines functions to classify wide
characters. Although the original @w{ISO C90} standard already defined
the type @code{wchar_t}, it defined no functions operating on it.

The design of the classification functions for wide characters is more
general. It allows extensions to the set of available
classifications, beyond those which are always available. The POSIX
standard specifies how extensions can be made, and this is already
implemented in the @glibcadj{} implementation of the @code{localedef}
program.

The character class functions are normally implemented with bitsets,
with a bitset per character. For a given character, the appropriate
bitset is read from a table and a test is performed as to whether a
certain bit is set. Which bit is tested for is determined by the
class.

For the wide character classification functions this is made visible.
A classification type is defined, along with a function to retrieve a
value of this type for a given class and a function to test whether a
given character is in this class, using the classification value. On
top of this the normal character classification functions as used for
@code{char} objects can be defined.

@deftp {Data type} wctype_t
@standards{ISO, wctype.h}
The @code{wctype_t} type can hold a value which represents a character
class. The only defined way to generate such a value is by using the
@code{wctype} function.

@pindex wctype.h
This type is defined in @file{wctype.h}.
@end deftp

@deftypefun wctype_t wctype (const char *@var{property})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Although the source code of wctype contains multiple references to
@c the locale, that could each reference different locale_data objects
@c should the global locale object change while active, the compiler can
@c and does combine them all into a single dereference that resolves
@c once to the LCTYPE locale object used throughout the function, so it
@c is safe in (optimized) practice, if not in theory, even when the
@c locale changes. Ideally we'd explicitly save the resolved
@c locale_data object to make it visibly safe instead of safe only under
@c compiler optimizations, but given the decision that setlocale is
@c MT-Unsafe, all this would afford us would be the ability to not mark
@c this function with @mtslocale.
@code{wctype} returns a value representing a class of wide
characters which is identified by the string @var{property}. In addition
to some standard properties, each locale can define its own. If
no property with the given name is known for the current locale
selected for the @code{LC_CTYPE} category, the function returns zero.

@noindent
The properties known in every locale are:

@multitable @columnfractions .25 .25 .25 .25
@item
@code{"alnum"} @tab @code{"alpha"} @tab @code{"cntrl"} @tab @code{"digit"}
@item
@code{"graph"} @tab @code{"lower"} @tab @code{"print"} @tab @code{"punct"}
@item
@code{"space"} @tab @code{"upper"} @tab @code{"xdigit"}
@end multitable

@pindex wctype.h
This function is declared in @file{wctype.h}.
@end deftypefun

To test whether a character belongs to one of the non-standard classes,
the @w{ISO C} standard defines a completely new function.

@deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The compressed lookup table returned by wctype is read-only.
This function returns a nonzero value if @var{wc} is in the character
class specified by @var{desc}. @var{desc} must have been returned
by a previous successful call to @code{wctype}.

@pindex wctype.h
This function is declared in @file{wctype.h}.
@end deftypefun

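The two functions are meant to be used together: @code{wctype} looks up
the class by name, and @code{iswctype} tests a character against the
result. The sketch below also checks for the zero return value that
signals an unknown class. The name @code{wc_in_class} and the
convention of returning @minus{}1 are assumptions made for this
example only.

```c
#include <wctype.h>

/* Return 1 if WC belongs to the class named PROPERTY, 0 if it does
   not, and -1 if the current LC_CTYPE locale knows no class of that
   name.  wc_in_class is a hypothetical helper, not a library
   function.  */
int
wc_in_class (wint_t wc, const char *property)
{
  wctype_t desc = wctype (property);

  if (desc == (wctype_t) 0)
    return -1;                  /* wctype did not recognize the name */
  return iswctype (wc, desc) != 0;
}
```

Checking the descriptor before calling @code{iswctype} matters because
only a value obtained from a successful @code{wctype} call is a valid
second argument.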
To make the commonly used classifications easier to use, the C library
defines a separate function for each of them. There is no need to use
@code{wctype} if the property string is one of the known character
classes. In some situations it is desirable to construct the property
strings, and then it is important that @code{wctype} can also handle the
standard classes.

@cindex alphanumeric character
@deftypefun int iswalnum (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c The implicit wctype call in the isw* functions is actually an
@c optimized version because the category has a known offset, but the
@c wctype is equally safe when optimized, unsafe with changing locales
@c if not optimized (thus @mtslocale). Since it's not a macro, we
@c always optimize, and the locale can't change in any MT-Safe way, it's
@c fine. The test whether wc is ASCII to use the non-wide is*
@c macro/function doesn't bring any other safety issues: the test does
@c not depend on the locale, and each path after the decision resolves
@c the locale object only once.
This function returns a nonzero value if @var{wc} is an alphanumeric
character (a letter or digit); in other words, if either @code{iswalpha}
or @code{iswdigit} is true of a character, then @code{iswalnum} is also
true.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("alnum"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex alphabetic character
@deftypefun int iswalpha (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is an alphabetic character (a letter). If
@code{iswlower} or @code{iswupper} is true of a character, then
@code{iswalpha} is also true.

In some locales, there may be additional characters for which
@code{iswalpha} is true---letters which are neither upper case nor lower
case. But in the standard @code{"C"} locale, there are no such
additional characters.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("alpha"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex control character
@deftypefun int iswcntrl (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a control character (that is, a character that
is not a printing character).

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("cntrl"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex digit character
@deftypefun int iswdigit (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}).
Note that this function returns a nonzero value not only for
@emph{decimal} digits, but for all kinds of digits. A consequence is
that code like the following will @strong{not} work unconditionally for
wide characters:

@smallexample
n = 0;
while (iswdigit (*wc))
  @{
    n *= 10;
    n += *wc++ - L'0';
  @}
@end smallexample

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("digit"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

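If the concern described for @code{iswdigit} applies, one conservative
alternative is to compare against the range @code{L'0'} through
@code{L'9'} directly, so that only those ten characters are accepted.
The following is a sketch rather than a recommended replacement: the
name @code{parse_decimal} is hypothetical, overflow is not checked, and
the code assumes that the wide digits have consecutive values, as they
do when @code{wchar_t} holds ISO 10646 code points.

```c
#include <wchar.h>

/* Parse the run of decimal digits L'0' through L'9' at the start of
   WC and return its value, or 0 if there is no such digit.
   parse_decimal is a hypothetical helper.  Overflow is not checked.
   Assumption: L'0' through L'9' have consecutive values.  */
long
parse_decimal (const wchar_t *wc)
{
  long n = 0;

  while (*wc >= L'0' && *wc <= L'9')
    {
      n *= 10;
      n += *wc++ - L'0';
    }
  return n;
}
```

With the explicit range test, the subtraction of @code{L'0'} is applied
only to characters for which it yields a value from 0 to 9.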
@cindex graphic character
@deftypefun int iswgraph (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a graphic character; that is, a character
that has a glyph associated with it. The whitespace characters are not
considered graphic.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("graph"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex lower-case character
@deftypefun int iswlower (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a lower-case letter. The letter need not be
from the Latin alphabet; any alphabet representable is valid.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("lower"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex printing character
@deftypefun int iswprint (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a printing character. Printing characters
include all the graphic characters, plus the space (@samp{ }) character.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("print"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex punctuation character
@deftypefun int iswpunct (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("punct"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex whitespace character
@deftypefun int iswspace (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a @dfn{whitespace} character. In the standard
@code{"C"} locale, @code{iswspace} returns true for only the standard
whitespace characters:

@table @code
@item L' '
space

@item L'\f'
formfeed

@item L'\n'
newline

@item L'\r'
carriage return

@item L'\t'
horizontal tab

@item L'\v'
vertical tab
@end table

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("space"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex upper-case character
@deftypefun int iswupper (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is an upper-case letter. The letter need not be
from the Latin alphabet; any alphabet representable is valid.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("upper"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex hexadecimal digit character
@deftypefun int iswxdigit (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a hexadecimal digit.
Hexadecimal digits include the normal decimal digits @samp{0} through
@samp{9} and the letters @samp{A} through @samp{F} and
@samp{a} through @samp{f}.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("xdigit"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@Theglibc{} also provides a function which is not part of
@w{Amendment 1} to @w{ISO C90} but which is available as a version for
single-byte characters as well.

@cindex blank character
@deftypefun int iswblank (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a blank character; that is, a space or a tab.
This function was originally a GNU extension, but was added in @w{ISO C99}.
It is declared in @file{wctype.h}.
@end deftypefun

@node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling
@section Notes on using the wide character classes

The first note is probably not astonishing but still occasionally a
cause of problems. The @code{isw@var{XXX}} functions can be implemented
using macros and in fact, @theglibc{} does this. They are still
available as real functions but when the @file{wctype.h} header is
included the macros will be used. This is the same as the
@code{char} type versions of these functions.

The second note covers something new. It can be best illustrated by a
(real-world) example. The first piece of code is an excerpt from the
original code. It is truncated a bit but the intention should be clear.

@smallexample
int
is_in_class (int c, const char *class)
@{
  if (strcmp (class, "alnum") == 0)
    return isalnum (c);
  if (strcmp (class, "alpha") == 0)
    return isalpha (c);
  if (strcmp (class, "cntrl") == 0)
    return iscntrl (c);
  @dots{}
  return 0;
@}
@end smallexample

Now, with the @code{wctype} and @code{iswctype} functions you can avoid
the @code{if} cascades, but rewriting the code as follows is wrong:

@smallexample
int
is_in_class (int c, const char *class)
@{
  wctype_t desc = wctype (class);
  return desc ? iswctype ((wint_t) c, desc) : 0;
@}
@end smallexample

The problem is that it is not guaranteed that the wide character
representation of a single-byte character can be found using casting.
In fact, usually this fails miserably. The correct solution to this
problem is to write the code as follows:

@smallexample
int
is_in_class (int c, const char *class)
@{
  wctype_t desc = wctype (class);
  return desc ? iswctype (btowc (c), desc) : 0;
@}
@end smallexample

@xref{Converting a Character}, for more information on @code{btowc}.
Note that this change probably does not improve the performance
of the program a lot since the @code{wctype} function still has to make
the string comparisons. The gain becomes noticeable when the
@code{is_in_class} function is called more than once for the
same class name. In this case the variable @var{desc} could be computed
once and reused for all the calls. Therefore the above form of the
function is probably not the final one.

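One way to compute the descriptor once and reuse it is to move the
@code{wctype} call out of the per-character test. The sketch below
does this for a whole string. The function name
@code{count_in_class} is invented for this illustration, and the code
assumes that the @code{LC_CTYPE} locale does not change while the loop
runs.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the single-byte characters of S that belong to the class
   named CLASS_NAME.  count_in_class is a hypothetical helper.
   wctype is called once; the descriptor is reused for every
   character.  Assumption: the LC_CTYPE locale does not change while
   this function runs.  */
size_t
count_in_class (const char *s, const char *class_name)
{
  wctype_t desc = wctype (class_name);
  size_t n = 0;

  if (desc == (wctype_t) 0)
    return 0;                   /* unknown class: nothing can match */

  for (; *s != '\0'; s++)
    {
      /* Convert the byte with btowc instead of casting it.  */
      wint_t wc = btowc ((unsigned char) *s);
      if (wc != WEOF && iswctype (wc, desc))
        n++;
    }
  return n;
}
```

The string comparisons inside @code{wctype} now happen once per call
of @code{count_in_class} rather than once per character.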

@node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling
@section Mapping of wide characters

The mapping functions are also generalized by the @w{ISO C}
standard. Instead of just allowing the two standard mappings, a
locale can contain others. Again, the @code{localedef} program
already supports generating such locale data files.

@deftp {Data Type} wctrans_t
@standards{ISO, wctype.h}
This data type is defined as a scalar type which can hold a value
representing the locale-dependent character mapping. There is no way to
construct such a value apart from using the return value of the
@code{wctrans} function.

@pindex wctype.h
@noindent
This type is defined in @file{wctype.h}.
@end deftp

@deftypefun wctrans_t wctrans (const char *@var{property})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Similar implementation, same caveats as wctype.
The @code{wctrans} function has to be used to find out whether a named
mapping is defined in the current locale selected for the
@code{LC_CTYPE} category. If the returned value is non-zero, you can use
it afterwards in calls to @code{towctrans}. If the return value is
zero, no such mapping is known in the current locale.

Besides locale-specific mappings there are two mappings which are
guaranteed to be available in every locale:

@multitable @columnfractions .5 .5
@item
@code{"tolower"} @tab @code{"toupper"}
@end multitable

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

@deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c Same caveats as iswctype.
@code{towctrans} maps the input character @var{wc}
according to the rules of the mapping for which @var{desc} is a
descriptor, and returns the value it finds. @var{desc} must be
obtained by a successful call to @code{wctrans}.

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

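As with @code{wctype} and @code{iswctype}, the descriptor returned by
@code{wctrans} should be checked before it is handed to
@code{towctrans}. The following sketch does that; the name
@code{map_wide} and the choice to return the character unchanged for
an unknown mapping are assumptions made for this example.

```c
#include <wctype.h>

/* Apply the mapping named MAPPING to WC.  If the current LC_CTYPE
   locale knows no mapping of that name, return WC unchanged.
   map_wide is a hypothetical helper, not a library function.  */
wint_t
map_wide (wint_t wc, const char *mapping)
{
  wctrans_t desc = wctrans (mapping);

  if (!desc)
    return wc;                  /* wctrans did not recognize the name */
  return towctrans (wc, desc);
}
```

With the two mappings that every locale provides,
@code{map_wide (wc, "toupper")} and @code{map_wide (wc, "tolower")}
perform the usual case conversions.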
For the generally available mappings, the @w{ISO C} standard defines
convenient shortcuts so that it is not necessary to call @code{wctrans}
for them.

@deftypefun wint_t towlower (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Same caveats as iswalnum, just using a wctrans rather than a wctype
@c table.
If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding
lower-case letter. If @var{wc} is not an upper-case letter,
@var{wc} is returned unchanged.

@noindent
@code{towlower} can be implemented using

@smallexample
towctrans (wc, wctrans ("tolower"))
@end smallexample

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

@deftypefun wint_t towupper (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding
upper-case letter. Otherwise @var{wc} is returned unchanged.

@noindent
@code{towupper} can be implemented using

@smallexample
towctrans (wc, wctrans ("toupper"))
@end smallexample

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

The same warnings given in the last section for the use of the wide
character classification functions apply here. It is not possible to
simply cast a @code{char} type value to a @code{wint_t} and use it as an
argument to @code{towctrans} calls.
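
To illustrate the warning, the sketch below converts a single-byte
character with @code{btowc} before mapping it and converts the result
back with @code{wctob}. The name @code{upcase_byte} is hypothetical,
and returning the input unchanged when either conversion fails is a
choice made for this example, not something the library prescribes.

```c
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* btowc, wctob, WEOF */
#include <wctype.h>  /* towupper */

/* Map the single-byte character C to upper case by way of the wide
   character functions.  upcase_byte is a hypothetical helper.  C must
   be an unsigned char value or EOF.  If C has no wide character
   representation, or the result has no single-byte representation,
   C is returned unchanged.  */
int
upcase_byte (int c)
{
  wint_t wc = btowc (c);
  int result;

  if (wc == WEOF)
    return c;
  result = wctob (towupper (wc));
  return result == EOF ? c : result;
}
```

For ordinary single-byte text, calling @code{toupper} directly is
simpler; the round trip is shown only to make the conversion steps
explicit.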