Commit | Line | Data |
---|---|---|
99a20616 | 1 | @node Character Handling, String and Array Utilities, Memory, Top |
7a68c94a | 2 | @c %MENU% Character testing and conversion functions |
28f540f4 RM |
3 | @chapter Character Handling |
4 | ||
5 | Programs that work with characters and strings often need to classify a | |
6 | character---is it alphabetic, is it a digit, is it whitespace, and so | |
7 | on---and perform case conversion operations on characters. The | |
8 | functions in the header file @file{ctype.h} are provided for this | |
9 | purpose. | |
10 | @pindex ctype.h | |
11 | ||
12 | Since the choice of locale and character set can alter the | |
13 | classifications of particular character codes, all of these functions | |
14 | are affected by the current locale. (More precisely, they are affected | |
15 | by the locale currently selected for character classification---the | |
16 | @code{LC_CTYPE} category; see @ref{Locale Categories}.) | |
17 | ||
390955cb UD |
18 | The @w{ISO C} standard specifies two different sets of functions. The |
19 | one set works on @code{char} type characters, the other one on | |
bc938d3d | 20 | @code{wchar_t} wide characters (@pxref{Extended Char Intro}). |
28f540f4 | 21 | |
390955cb UD |
22 | @menu |
23 | * Classification of Characters:: Testing whether characters are | |
24 | letters, digits, punctuation, etc. | |
25 | ||
26 | * Case Conversion:: Case mapping, and the like. | |
27 | * Classification of Wide Characters:: Character class determination for | |
28 | wide characters. | |
29 | * Using Wide Char Classes:: Notes on using the wide character | |
30 | classes. | |
31 | * Wide Character Case Conversion:: Mapping of wide characters. | |
28f540f4 RM |
32 | @end menu |
33 | ||
34 | @node Classification of Characters, Case Conversion, , Character Handling | |
35 | @section Classification of Characters | |
36 | @cindex character testing | |
37 | @cindex classification of characters | |
38 | @cindex predicates on characters | |
39 | @cindex character predicates | |
40 | ||
41 | This section explains the library functions for classifying characters. | |
42 | For example, @code{isalpha} is the function to test for an alphabetic | |
43 | character. It takes one argument, the character to test, and returns a | |
44 | nonzero integer if the character is alphabetic, and zero otherwise. You | |
45 | would use it like this: | |
46 | ||
47 | @smallexample | |
48 | if (isalpha (c)) | |
49 | printf ("The character `%c' is alphabetic.\n", c); | |
50 | @end smallexample | |
51 | ||
52 | Each of the functions in this section tests for membership in a | |
53 | particular class of characters; each has a name starting with @samp{is}. | |
54 | Each of them takes one argument, which is a character to test, and | |
55 | returns an @code{int} which is treated as a boolean value. The | |
56 | character argument is passed as an @code{int}, and it may be the | |
f65fd747 | 57 | constant value @code{EOF} instead of a real character. |
28f540f4 RM |
58 | |
59 | The attributes of any given character can vary between locales. | |
0005e54f | 60 | @xref{Locales}, for more information on locales. |
28f540f4 RM |
61 | |
62 | These functions are declared in the header file @file{ctype.h}. | |
63 | @pindex ctype.h | |
64 | ||
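The calling convention described above can be sketched as follows. The helper name @code{count_alpha} is hypothetical and not part of the library. The cast to @code{unsigned char} is an addition here: the @code{is*} functions expect either @code{EOF} or a value representable as @code{unsigned char}, so a plain @code{char} that may be negative should be converted first.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): count the alphabetic
   characters in a null-terminated string, as classified by the
   current LC_CTYPE locale.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    {
      /* Convert to unsigned char before the call; a negative plain
         char is not a valid argument for the is* functions.  */
      if (isalpha ((unsigned char) *s))
        n++;
    }
  return n;
}
```

In the standard @code{"C"} locale, a call such as @code{count_alpha ("ab1 c")} counts only the three letters.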
65 | @cindex lower-case character | |
28f540f4 | 66 | @deftypefun int islower (int @var{c}) |
d08a7e4c | 67 | @standards{ISO, ctype.h} |
c49130e3 AO |
68 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
69 | @c The is* macros call __ctype_b_loc to get the ctype array from the | |
70 | @c current locale, and then index it by c. __ctype_b_loc reads from | |
71 | @c thread-local memory the (indirect) pointer to the ctype array, which | |
72 | @c may involve one word access to the global locale object, if that's | |
73 | @c the active locale for the thread, and the array, being part of the | |
74 | @c locale data, is undeletable, so there's no thread-safety issue. We | |
75 | @c might want to mark these with @mtslocale to flag to callers that | |
76 | @c changing locales might affect them, even if not these simpler | |
77 | @c functions. | |
390955cb UD |
78 | Returns true if @var{c} is a lower-case letter. The letter need not be |
79 | from the Latin alphabet; any representable alphabet is valid. | |
28f540f4 RM |
80 | @end deftypefun |
81 | ||
82 | @cindex upper-case character | |
28f540f4 | 83 | @deftypefun int isupper (int @var{c}) |
d08a7e4c | 84 | @standards{ISO, ctype.h} |
c49130e3 | 85 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
390955cb UD |
86 | Returns true if @var{c} is an upper-case letter. The letter need not be |
87 | from the Latin alphabet; any representable alphabet is valid. | |
28f540f4 RM |
88 | @end deftypefun |
89 | ||
90 | @cindex alphabetic character | |
28f540f4 | 91 | @deftypefun int isalpha (int @var{c}) |
d08a7e4c | 92 | @standards{ISO, ctype.h} |
c49130e3 | 93 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
94 | Returns true if @var{c} is an alphabetic character (a letter). If |
95 | @code{islower} or @code{isupper} is true of a character, then | |
96 | @code{isalpha} is also true. | |
97 | ||
98 | In some locales, there may be additional characters for which | |
cc3fa755 | 99 | @code{isalpha} is true---letters which are neither upper case nor lower |
28f540f4 RM |
100 | case. But in the standard @code{"C"} locale, there are no such |
101 | additional characters. | |
102 | @end deftypefun | |
103 | ||
104 | @cindex digit character | |
105 | @cindex decimal digit character | |
28f540f4 | 106 | @deftypefun int isdigit (int @var{c}) |
d08a7e4c | 107 | @standards{ISO, ctype.h} |
c49130e3 | 108 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
109 | Returns true if @var{c} is a decimal digit (@samp{0} through @samp{9}). |
110 | @end deftypefun | |
111 | ||
112 | @cindex alphanumeric character | |
28f540f4 | 113 | @deftypefun int isalnum (int @var{c}) |
d08a7e4c | 114 | @standards{ISO, ctype.h} |
c49130e3 | 115 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
116 | Returns true if @var{c} is an alphanumeric character (a letter or |
117 | number); in other words, if either @code{isalpha} or @code{isdigit} is | |
118 | true of a character, then @code{isalnum} is also true. | |
119 | @end deftypefun | |
120 | ||
121 | @cindex hexadecimal digit character | |
28f540f4 | 122 | @deftypefun int isxdigit (int @var{c}) |
d08a7e4c | 123 | @standards{ISO, ctype.h} |
c49130e3 | 124 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
125 | Returns true if @var{c} is a hexadecimal digit. |
126 | Hexadecimal digits include the normal decimal digits @samp{0} through | |
127 | @samp{9} and the letters @samp{A} through @samp{F} and | |
128 | @samp{a} through @samp{f}. | |
129 | @end deftypefun | |
130 | ||
131 | @cindex punctuation character | |
28f540f4 | 132 | @deftypefun int ispunct (int @var{c}) |
d08a7e4c | 133 | @standards{ISO, ctype.h} |
c49130e3 | 134 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
135 | Returns true if @var{c} is a punctuation character. |
136 | This means any printing character that is not alphanumeric or a space | |
137 | character. | |
138 | @end deftypefun | |
139 | ||
140 | @cindex whitespace character | |
28f540f4 | 141 | @deftypefun int isspace (int @var{c}) |
d08a7e4c | 142 | @standards{ISO, ctype.h} |
c49130e3 | 143 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
144 | Returns true if @var{c} is a @dfn{whitespace} character. In the standard |
145 | @code{"C"} locale, @code{isspace} returns true for only the standard | |
146 | whitespace characters: | |
147 | ||
148 | @table @code | |
149 | @item ' ' | |
150 | space | |
151 | ||
152 | @item '\f' | |
153 | formfeed | |
154 | ||
155 | @item '\n' | |
156 | newline | |
157 | ||
158 | @item '\r' | |
159 | carriage return | |
160 | ||
161 | @item '\t' | |
162 | horizontal tab | |
163 | ||
164 | @item '\v' | |
165 | vertical tab | |
166 | @end table | |
167 | @end deftypefun | |
168 | ||
169 | @cindex blank character | |
28f540f4 | 170 | @deftypefun int isblank (int @var{c}) |
d08a7e4c | 171 | @standards{ISO, ctype.h} |
c49130e3 | 172 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 173 | Returns true if @var{c} is a blank character; that is, a space or a tab. |
d3466201 | 174 | This function was originally a GNU extension, but was added in @w{ISO C99}. |
28f540f4 RM |
175 | @end deftypefun |
176 | ||
177 | @cindex graphic character | |
28f540f4 | 178 | @deftypefun int isgraph (int @var{c}) |
d08a7e4c | 179 | @standards{ISO, ctype.h} |
c49130e3 | 180 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
181 | Returns true if @var{c} is a graphic character; that is, a character |
182 | that has a glyph associated with it. The whitespace characters are not | |
183 | considered graphic. | |
184 | @end deftypefun | |
185 | ||
186 | @cindex printing character | |
28f540f4 | 187 | @deftypefun int isprint (int @var{c}) |
d08a7e4c | 188 | @standards{ISO, ctype.h} |
c49130e3 | 189 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
190 | Returns true if @var{c} is a printing character. Printing characters |
191 | include all the graphic characters, plus the space (@samp{ }) character. | |
192 | @end deftypefun | |
193 | ||
194 | @cindex control character | |
28f540f4 | 195 | @deftypefun int iscntrl (int @var{c}) |
d08a7e4c | 196 | @standards{ISO, ctype.h} |
c49130e3 | 197 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
198 | Returns true if @var{c} is a control character (that is, a character that |
199 | is not a printing character). | |
200 | @end deftypefun | |
201 | ||
202 | @cindex ASCII character | |
28f540f4 | 203 | @deftypefun int isascii (int @var{c}) |
d08a7e4c RJ |
204 | @standards{SVID, ctype.h} |
205 | @standards{BSD, ctype.h} | |
c49130e3 | 206 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
207 | Returns true if @var{c} is a 7-bit @code{unsigned char} value that fits |
208 | into the US/UK ASCII character set. This function is a BSD extension | |
209 | and is also an SVID extension. | |
210 | @end deftypefun | |
211 | ||
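As a rough illustration of how the classes in this section relate to one another, the following sketch sorts a character into one coarse category. The enumeration and the function @code{classify_char} are hypothetical names invented for this example, and the order of the tests is only one possible choice. It assumes the argument is already an @code{unsigned char} value or @code{EOF}.

```c
#include <ctype.h>

/* Hypothetical categories, for illustration only.  */
enum char_kind
{
  KIND_UPPER,
  KIND_LOWER,
  KIND_DIGIT,
  KIND_PUNCT,
  KIND_SPACE,
  KIND_CNTRL,
  KIND_OTHER
};

/* Hypothetical helper: report the first matching class for C.
   C must be an unsigned char value or EOF.  */
enum char_kind
classify_char (int c)
{
  if (isupper (c))
    return KIND_UPPER;
  if (islower (c))
    return KIND_LOWER;
  if (isdigit (c))
    return KIND_DIGIT;
  if (ispunct (c))
    return KIND_PUNCT;
  /* Tested before iscntrl: characters such as '\n' are both
     whitespace and control characters.  */
  if (isspace (c))
    return KIND_SPACE;
  if (iscntrl (c))
    return KIND_CNTRL;
  return KIND_OTHER;
}
```

Because the results depend on the @code{LC_CTYPE} category, a letter that is neither upper case nor lower case in some locale would fall through to @code{KIND_OTHER} in this sketch.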
390955cb | 212 | @node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling |
28f540f4 RM |
213 | @section Case Conversion |
214 | @cindex character case conversion | |
215 | @cindex case conversion of characters | |
216 | @cindex converting case of characters | |
217 | ||
218 | This section explains the library functions for performing conversions | |
219 | such as case mappings on characters. For example, @code{toupper} | |
220 | converts any character to upper case if possible. If the character | |
221 | can't be converted, @code{toupper} returns it unchanged. | |
222 | ||
223 | These functions take one argument of type @code{int}, which is the | |
224 | character to convert, and return the converted character as an | |
225 | @code{int}. If the conversion is not applicable to the argument given, | |
226 | the argument is returned unchanged. | |
227 | ||
f65fd747 | 228 | @strong{Compatibility Note:} In pre-@w{ISO C} dialects, instead of |
28f540f4 RM |
229 | returning the argument unchanged, these functions may fail when the |
230 | argument is not suitable for the conversion. Thus for portability, you | |
231 | may need to write @code{islower(c) ? toupper(c) : c} rather than just | |
232 | @code{toupper(c)}. | |
233 | ||
234 | These functions are declared in the header file @file{ctype.h}. | |
235 | @pindex ctype.h | |
236 | ||
28f540f4 | 237 | @deftypefun int tolower (int @var{c}) |
d08a7e4c | 238 | @standards{ISO, ctype.h} |
c49130e3 AO |
239 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
240 | @c The to* macros/functions call different functions that use different | |
241 | @c arrays than those of __ctype_b_loc, but the access patterns and | |
242 | @c thus safety guarantees are the same. | |
28f540f4 RM |
243 | If @var{c} is an upper-case letter, @code{tolower} returns the corresponding |
244 | lower-case letter. If @var{c} is not an upper-case letter, | |
245 | @var{c} is returned unchanged. | |
246 | @end deftypefun | |
247 | ||
28f540f4 | 248 | @deftypefun int toupper (int @var{c}) |
d08a7e4c | 249 | @standards{ISO, ctype.h} |
c49130e3 | 250 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
390955cb | 251 | If @var{c} is a lower-case letter, @code{toupper} returns the corresponding |
28f540f4 RM |
252 | upper-case letter. Otherwise @var{c} is returned unchanged. |
253 | @end deftypefun | |
254 | ||
28f540f4 | 255 | @deftypefun int toascii (int @var{c}) |
d08a7e4c RJ |
256 | @standards{SVID, ctype.h} |
257 | @standards{BSD, ctype.h} | |
c49130e3 | 258 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
259 | This function converts @var{c} to a 7-bit @code{unsigned char} value |
260 | that fits into the US/UK ASCII character set, by clearing the high-order | |
261 | bits. This function is a BSD extension and is also an SVID extension. | |
262 | @end deftypefun | |
263 | ||
28f540f4 | 264 | @deftypefun int _tolower (int @var{c}) |
d08a7e4c | 265 | @standards{SVID, ctype.h} |
c49130e3 | 266 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 | 267 | This is identical to @code{tolower}, and is provided for compatibility |
0005e54f | 268 | with the SVID. @xref{SVID}. |
28f540f4 RM |
269 | @end deftypefun |
270 | ||
28f540f4 | 271 | @deftypefun int _toupper (int @var{c}) |
d08a7e4c | 272 | @standards{SVID, ctype.h} |
c49130e3 | 273 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
28f540f4 RM |
274 | This is identical to @code{toupper}, and is provided for compatibility |
275 | with the SVID. | |
276 | @end deftypefun | |
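A minimal sketch of applying a case mapping to a whole string is shown here. The helper @code{upcase_in_place} is a hypothetical name, not a library function. As with the classification functions, the cast to @code{unsigned char} is added so that a negative plain @code{char} is never passed to @code{toupper}.

```c
#include <ctype.h>

/* Hypothetical helper: convert every lower-case letter in S to
   upper case, in place.  Characters with no upper-case mapping
   are left unchanged, as toupper returns them as is.  */
void
upcase_in_place (char *s)
{
  for (; *s != '\0'; s++)
    *s = (char) toupper ((unsigned char) *s);
}
```

The mapping used is the one of the current @code{LC_CTYPE} locale, so the result for non-ASCII letters may differ between locales.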
390955cb UD |
277 | |
278 | ||
279 | @node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling | |
280 | @section Character class determination for wide characters | |
281 | ||
aaca11d8 UD |
282 | @w{Amendment 1} to @w{ISO C90} defines functions to classify wide |
283 | characters. Although the original @w{ISO C90} standard already defined | |
6dd5b57e | 284 | the type @code{wchar_t}, no functions operating on it were defined. |
390955cb UD |
285 | |
286 | The design of the classification functions for wide characters | |
6dd5b57e UD |
287 | is more general. It allows extensions to the set of available |
288 | classifications, beyond those which are always available. The POSIX | |
289 | standard specifies how extensions can be made, and this is already | |
1f77f049 | 290 | implemented in the @glibcadj{} implementation of the @code{localedef} |
bc938d3d | 291 | program. |
390955cb | 292 | |
6dd5b57e UD |
293 | The character class functions are normally implemented with bitsets, |
294 | with a bitset per character. For a given character, the appropriate | |
295 | bitset is read from a table and a test is performed as to whether a | |
296 | certain bit is set. Which bit is tested for is determined by the | |
297 | class. | |
390955cb UD |
298 | |
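The bitset scheme just described can be sketched in a few lines. This is a toy model only, not the actual implementation in the library: the table, the bit names, and both functions are hypothetical, only three classes are modeled, and the initialization assumes an ASCII-compatible character set in which the letters are contiguous.

```c
/* Hypothetical class bits; one bit per character class.  */
enum
{
  CLASS_DIGIT = 1 << 0,
  CLASS_LOWER = 1 << 1,
  CLASS_UPPER = 1 << 2
};

/* One bitset per character code (7-bit codes only in this sketch).  */
static unsigned char class_table[128];

/* Set BIT in the bitsets of all codes from FIRST to LAST.  */
static void
mark_range (int first, int last, unsigned char bit)
{
  for (int c = first; c <= last; c++)
    class_table[c] |= bit;
}

/* Fill the table.  Assumes ASCII, where each range is contiguous.  */
void
init_class_table (void)
{
  mark_range ('0', '9', CLASS_DIGIT);
  mark_range ('a', 'z', CLASS_LOWER);
  mark_range ('A', 'Z', CLASS_UPPER);
}

/* Read the bitset for C and test the bits selected by MASK.  */
int
in_class (int c, unsigned int mask)
{
  if (c < 0 || c >= 128)
    return 0;
  return (class_table[c] & mask) != 0;
}
```

Testing for membership in several classes at once is then a single lookup with a combined mask, for example @code{in_class (c, CLASS_LOWER | CLASS_UPPER)}.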
299 | For the wide character classification functions this is made visible. | |
6dd5b57e UD |
300 | There is a classification type defined, a function to retrieve this |
301 | value for a given class, and a function to test whether a given | |
302 | character is in this class, using the classification value. On top of | |
303 | this the normal character classification functions as used for | |
390955cb UD |
304 | @code{char} objects can be defined. |
305 | ||
390955cb | 306 | @deftp {Data type} wctype_t |
d08a7e4c | 307 | @standards{ISO, wctype.h} |
390955cb | 308 | The @code{wctype_t} can hold a value which represents a character class. |
6dd5b57e | 309 | The only defined way to generate such a value is by using the |
390955cb UD |
310 | @code{wctype} function. |
311 | ||
312 | @pindex wctype.h | |
313 | This type is defined in @file{wctype.h}. | |
314 | @end deftp | |
315 | ||
390955cb | 316 | @deftypefun wctype_t wctype (const char *@var{property}) |
d08a7e4c | 317 | @standards{ISO, wctype.h} |
c49130e3 AO |
318 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
319 | @c Although the source code of wctype contains multiple references to | |
320 | @c the locale, that could each reference different locale_data objects | |
321 | @c should the global locale object change while active, the compiler can | |
322 | @c and does combine them all into a single dereference that resolves | |
323 | @c once to the LCTYPE locale object used throughout the function, so it | |
324 | @c is safe in (optimized) practice, if not in theory, even when the | |
325 | @c locale changes. Ideally we'd explicitly save the resolved | |
326 | @c locale_data object to make it visibly safe instead of safe only under | |
327 | @c compiler optimizations, but given the decision that setlocale is | |
328 | @c MT-Unsafe, all this would afford us would be the ability to not mark | |
329 | @c this function with @mtslocale. | |
d17acc2b RJ |
330 | @code{wctype} returns a value representing a class of wide |
331 | characters which is identified by the string @var{property}. Besides | |
390955cb | 332 | some standard properties, each locale can define its own. In case |
6dd5b57e UD |
333 | no property with the given name is known for the current locale |
334 | selected for the @code{LC_CTYPE} category, the function returns zero. | |
390955cb UD |
335 | |
336 | @noindent | |
337 | The properties known in every locale are: | |
338 | ||
339 | @multitable @columnfractions .25 .25 .25 .25 | |
340 | @item | |
341 | @code{"alnum"} @tab @code{"alpha"} @tab @code{"cntrl"} @tab @code{"digit"} | |
342 | @item | |
343 | @code{"graph"} @tab @code{"lower"} @tab @code{"print"} @tab @code{"punct"} | |
344 | @item | |
345 | @code{"space"} @tab @code{"upper"} @tab @code{"xdigit"} | |
346 | @end multitable | |
347 | ||
348 | @pindex wctype.h | |
349 | This function is declared in @file{wctype.h}. | |
350 | @end deftypefun | |
351 | ||
352 | To test the membership of a character in one of the non-standard classes | |
353 | the @w{ISO C} standard defines a completely new function. | |
354 | ||
390955cb | 355 | @deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc}) |
d08a7e4c | 356 | @standards{ISO, wctype.h} |
c49130e3 AO |
357 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
358 | @c The compressed lookup table returned by wctype is read-only. | |
390955cb UD |
359 | This function returns a nonzero value if @var{wc} is in the character |
360 | class specified by @var{desc}. @var{desc} must have been returned | |
361 | by a successful call to @code{wctype}. | |
362 | ||
363 | @pindex wctype.h | |
364 | This function is declared in @file{wctype.h}. | |
365 | @end deftypefun | |
366 | ||
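The two functions are meant to be used as a pair. The following sketch wraps them in a hypothetical helper, @code{in_wclass}, which treats an unknown property name as "not a member". The zero return value of @code{wctype} for unknown properties is taken from the description above.

```c
#include <wctype.h>

/* Hypothetical helper: return 1 if WC belongs to the class named
   PROPERTY in the current LC_CTYPE locale, and 0 otherwise,
   including when the locale does not know the property.  */
int
in_wclass (wint_t wc, const char *property)
{
  wctype_t desc = wctype (property);

  if (desc == (wctype_t) 0)
    return 0;
  return iswctype (wc, desc) != 0;
}
```

A call such as @code{in_wclass (L'a', "alpha")} then behaves like @code{iswalpha (L'a') != 0}.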
6dd5b57e UD |
367 | To make it easier to use the commonly-used classification functions, |
368 | they are defined in the C library. There is no need to use | |
bc938d3d | 369 | @code{wctype} if the property string is one of the known character |
390955cb | 370 | classes. In some situations it is desirable to construct the property |
6dd5b57e | 371 | strings, and then it is important that @code{wctype} can also handle the |
390955cb UD |
372 | standard classes. |
373 | ||
374 | @cindex alphanumeric character | |
390955cb | 375 | @deftypefun int iswalnum (wint_t @var{wc}) |
d08a7e4c | 376 | @standards{ISO, wctype.h} |
c49130e3 AO |
377 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
378 | @c The implicit wctype call in the isw* functions is actually an | |
379 | @c optimized version because the category has a known offset, but the | |
380 | @c wctype is equally safe when optimized, unsafe with changing locales | |
381 | @c if not optimized (thus @mtslocale). Since it's not a macro, we | |
382 | @c always optimize, and the locale can't change in any MT-Safe way, it's | |
383 | @c fine. The test whether wc is ASCII to use the non-wide is* | |
384 | @c macro/function doesn't bring any other safety issues: the test does | |
385 | @c not depend on the locale, and each path after the decision resolves | |
386 | @c the locale object only once. | |
390955cb UD |
387 | This function returns a nonzero value if @var{wc} is an alphanumeric |
388 | character (a letter or number); in other words, if either @code{iswalpha} | |
389 | or @code{iswdigit} is true of a character, then @code{iswalnum} is also | |
390 | true. | |
391 | ||
392 | @noindent | |
393 | This function can be implemented using | |
394 | ||
395 | @smallexample | |
396 | iswctype (wc, wctype ("alnum")) | |
397 | @end smallexample | |
398 | ||
399 | @pindex wctype.h | |
18fd611b | 400 | It is declared in @file{wctype.h}. |
390955cb UD |
401 | @end deftypefun |
402 | ||
403 | @cindex alphabetic character | |
390955cb | 404 | @deftypefun int iswalpha (wint_t @var{wc}) |
d08a7e4c | 405 | @standards{ISO, wctype.h} |
c49130e3 | 406 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
407 | Returns true if @var{wc} is an alphabetic character (a letter). If |
408 | @code{iswlower} or @code{iswupper} is true of a character, then | |
409 | @code{iswalpha} is also true. | |
410 | ||
411 | In some locales, there may be additional characters for which | |
412 | @code{iswalpha} is true---letters which are neither upper case nor lower | |
413 | case. But in the standard @code{"C"} locale, there are no such | |
414 | additional characters. | |
415 | ||
416 | @noindent | |
417 | This function can be implemented using | |
418 | ||
419 | @smallexample | |
420 | iswctype (wc, wctype ("alpha")) | |
421 | @end smallexample | |
422 | ||
423 | @pindex wctype.h | |
18fd611b | 424 | It is declared in @file{wctype.h}. |
390955cb UD |
425 | @end deftypefun |
426 | ||
427 | @cindex control character | |
390955cb | 428 | @deftypefun int iswcntrl (wint_t @var{wc}) |
d08a7e4c | 429 | @standards{ISO, wctype.h} |
c49130e3 | 430 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
431 | Returns true if @var{wc} is a control character (that is, a character that |
432 | is not a printing character). | |
433 | ||
434 | @noindent | |
435 | This function can be implemented using | |
436 | ||
437 | @smallexample | |
438 | iswctype (wc, wctype ("cntrl")) | |
439 | @end smallexample | |
440 | ||
441 | @pindex wctype.h | |
18fd611b | 442 | It is declared in @file{wctype.h}. |
390955cb UD |
443 | @end deftypefun |
444 | ||
445 | @cindex digit character | |
390955cb | 446 | @deftypefun int iswdigit (wint_t @var{wc}) |
d08a7e4c | 447 | @standards{ISO, wctype.h} |
c49130e3 | 448 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
449 | Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}). |
450 | Please note that this function does not only return a nonzero value for | |
451 | @emph{decimal} digits, but for all kinds of digits. A consequence is | |
452 | that code like the following will @strong{not} work unconditionally for | |
453 | wide characters: | |
454 | ||
455 | @smallexample | |
456 | n = 0; | |
6dd5b57e | 457 | while (iswdigit (*wc)) |
390955cb UD |
458 | @{ |
459 | n *= 10; | |
460 | n += *wc++ - L'0'; | |
461 | @} | |
462 | @end smallexample | |
463 | ||
464 | @noindent | |
465 | This function can be implemented using | |
466 | ||
467 | @smallexample | |
468 | iswctype (wc, wctype ("digit")) | |
469 | @end smallexample | |
470 | ||
471 | @pindex wctype.h | |
18fd611b | 472 | It is declared in @file{wctype.h}. |
390955cb UD |
473 | @end deftypefun |
474 | ||
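One way to avoid the problem described for @code{iswdigit} is to compare against the range of the ASCII decimal digits directly. The sketch below uses a hypothetical helper, @code{parse_decimal}; it assumes that the wide characters @code{L'0'} through @code{L'9'} are contiguous and does not guard against overflow.

```c
#include <wchar.h>

/* Hypothetical helper: convert the leading decimal digits of WS to
   a number.  Only L'0' through L'9' are accepted, so the arithmetic
   below is valid for every character that passes the test.
   Overflow is not checked in this sketch.  */
unsigned long
parse_decimal (const wchar_t *ws)
{
  unsigned long n = 0;

  while (*ws >= L'0' && *ws <= L'9')
    {
      n = n * 10 + (unsigned long) (*ws - L'0');
      ws++;
    }
  return n;
}
```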
475 | @cindex graphic character | |
390955cb | 476 | @deftypefun int iswgraph (wint_t @var{wc}) |
d08a7e4c | 477 | @standards{ISO, wctype.h} |
c49130e3 | 478 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
479 | Returns true if @var{wc} is a graphic character; that is, a character |
480 | that has a glyph associated with it. The whitespace characters are not | |
481 | considered graphic. | |
482 | ||
483 | @noindent | |
484 | This function can be implemented using | |
485 | ||
486 | @smallexample | |
487 | iswctype (wc, wctype ("graph")) | |
488 | @end smallexample | |
489 | ||
490 | @pindex wctype.h | |
18fd611b | 491 | It is declared in @file{wctype.h}. |
390955cb UD |
492 | @end deftypefun |
493 | ||
494 | @cindex lower-case character | |
390955cb | 495 | @deftypefun int iswlower (wint_t @var{wc}) |
d08a7e4c | 496 | @standards{ISO, wctype.h} |
c49130e3 | 497 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
498 | Returns true if @var{wc} is a lower-case letter. The letter need not be |
499 | from the Latin alphabet; any representable alphabet is valid. | |
500 | ||
501 | @noindent | |
502 | This function can be implemented using | |
503 | ||
504 | @smallexample | |
505 | iswctype (wc, wctype ("lower")) | |
506 | @end smallexample | |
507 | ||
508 | @pindex wctype.h | |
18fd611b | 509 | It is declared in @file{wctype.h}. |
390955cb UD |
510 | @end deftypefun |
511 | ||
512 | @cindex printing character | |
390955cb | 513 | @deftypefun int iswprint (wint_t @var{wc}) |
d08a7e4c | 514 | @standards{ISO, wctype.h} |
c49130e3 | 515 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
516 | Returns true if @var{wc} is a printing character. Printing characters |
517 | include all the graphic characters, plus the space (@samp{ }) character. | |
518 | ||
519 | @noindent | |
520 | This function can be implemented using | |
521 | ||
522 | @smallexample | |
523 | iswctype (wc, wctype ("print")) | |
524 | @end smallexample | |
525 | ||
526 | @pindex wctype.h | |
18fd611b | 527 | It is declared in @file{wctype.h}. |
390955cb UD |
528 | @end deftypefun |
529 | ||
530 | @cindex punctuation character | |
390955cb | 531 | @deftypefun int iswpunct (wint_t @var{wc}) |
d08a7e4c | 532 | @standards{ISO, wctype.h} |
c49130e3 | 533 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
534 | Returns true if @var{wc} is a punctuation character. |
535 | This means any printing character that is not alphanumeric or a space | |
536 | character. | |
537 | ||
538 | @noindent | |
539 | This function can be implemented using | |
540 | ||
541 | @smallexample | |
542 | iswctype (wc, wctype ("punct")) | |
543 | @end smallexample | |
544 | ||
545 | @pindex wctype.h | |
18fd611b | 546 | It is declared in @file{wctype.h}. |
390955cb UD |
547 | @end deftypefun |
548 | ||
549 | @cindex whitespace character | |
390955cb | 550 | @deftypefun int iswspace (wint_t @var{wc}) |
d08a7e4c | 551 | @standards{ISO, wctype.h} |
c49130e3 | 552 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
553 | Returns true if @var{wc} is a @dfn{whitespace} character. In the standard |
554 | @code{"C"} locale, @code{iswspace} returns true for only the standard | |
555 | whitespace characters: | |
556 | ||
557 | @table @code | |
558 | @item L' ' | |
559 | space | |
560 | ||
561 | @item L'\f' | |
562 | formfeed | |
563 | ||
564 | @item L'\n' | |
565 | newline | |
566 | ||
567 | @item L'\r' | |
568 | carriage return | |
569 | ||
570 | @item L'\t' | |
571 | horizontal tab | |
572 | ||
573 | @item L'\v' | |
574 | vertical tab | |
575 | @end table | |
576 | ||
577 | @noindent | |
578 | This function can be implemented using | |
579 | ||
580 | @smallexample | |
581 | iswctype (wc, wctype ("space")) | |
582 | @end smallexample | |
583 | ||
584 | @pindex wctype.h | |
18fd611b | 585 | It is declared in @file{wctype.h}. |
390955cb UD |
586 | @end deftypefun |
587 | ||
588 | @cindex upper-case character | |
390955cb | 589 | @deftypefun int iswupper (wint_t @var{wc}) |
d08a7e4c | 590 | @standards{ISO, wctype.h} |
c49130e3 | 591 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
592 | Returns true if @var{wc} is an upper-case letter. The letter need not be |
593 | from the Latin alphabet; any representable alphabet is valid. | |
594 | ||
595 | @noindent | |
596 | This function can be implemented using | |
597 | ||
598 | @smallexample | |
599 | iswctype (wc, wctype ("upper")) | |
600 | @end smallexample | |
601 | ||
602 | @pindex wctype.h | |
18fd611b | 603 | It is declared in @file{wctype.h}. |
390955cb UD |
604 | @end deftypefun |
605 | ||
606 | @cindex hexadecimal digit character | |
390955cb | 607 | @deftypefun int iswxdigit (wint_t @var{wc}) |
d08a7e4c | 608 | @standards{ISO, wctype.h} |
c49130e3 | 609 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
610 | Returns true if @var{wc} is a hexadecimal digit. |
611 | Hexadecimal digits include the normal decimal digits @samp{0} through | |
612 | @samp{9} and the letters @samp{A} through @samp{F} and | |
613 | @samp{a} through @samp{f}. | |
614 | ||
615 | @noindent | |
616 | This function can be implemented using | |
617 | ||
618 | @smallexample | |
619 | iswctype (wc, wctype ("xdigit")) | |
620 | @end smallexample | |
621 | ||
622 | @pindex wctype.h | |
18fd611b | 623 | It is declared in @file{wctype.h}. |
390955cb UD |
624 | @end deftypefun |
625 | ||
1f77f049 | 626 | @Theglibc{} also provides a function which is not defined in the |
390955cb UD |
627 | original @w{ISO C} standard but which is available as a version for single-byte |
628 | characters as well. | |
629 | ||
630 | @cindex blank character | |
390955cb | 631 | @deftypefun int iswblank (wint_t @var{wc}) |
d08a7e4c | 632 | @standards{ISO, wctype.h} |
c49130e3 | 633 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb | 634 | Returns true if @var{wc} is a blank character; that is, a space or a tab. |
d3466201 RM |
635 | This function was originally a GNU extension, but was added in @w{ISO C99}. |
636 | It is declared in @file{wctype.h}. | |
390955cb UD |
637 | @end deftypefun |
638 | ||
639 | @node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling | |
640 | @section Notes on using the wide character classes | |
641 | ||
6dd5b57e | 642 | The first note is probably not astonishing but still occasionally a |
390955cb | 643 | cause of problems. The @code{isw@var{XXX}} functions can be implemented |
1f77f049 | 644 | using macros and in fact, @theglibc{} does this. They are still |
390955cb | 645 | available as real functions but when the @file{wctype.h} header is |
6dd5b57e | 646 | included the macros will be used. This is the same as the |
390955cb UD |
647 | @code{char} type versions of these functions. |
648 | ||
bc938d3d UD |
649 | The second note covers something new. It can be best illustrated by a |
650 | (real-world) example. The first piece of code is an excerpt from the | |
651 | original code. It is truncated a bit but the intention should be clear. | |
390955cb UD |
652 | |
653 | @smallexample | |
654 | int | |
655 | is_in_class (int c, const char *class) | |
656 | @{ | |
657 | if (strcmp (class, "alnum") == 0) | |
658 | return isalnum (c); | |
659 | if (strcmp (class, "alpha") == 0) | |
660 | return isalpha (c); | |
661 | if (strcmp (class, "cntrl") == 0) | |
662 | return iscntrl (c); | |
95fdc6a0 | 663 | @dots{} |
390955cb UD |
664 | return 0; |
665 | @} | |
666 | @end smallexample | |
667 | ||
6dd5b57e UD |
668 | Now, with @code{wctype} and @code{iswctype}, you can avoid the |
669 | @code{if} cascades, but rewriting the code as follows is wrong: | |
390955cb UD |
670 | |
671 | @smallexample | |
672 | int | |
673 | is_in_class (int c, const char *class) | |
674 | @{ | |
675 | wctype_t desc = wctype (class); | |
676 | return desc ? iswctype ((wint_t) c, desc) : 0; | |
677 | @} | |
678 | @end smallexample | |
679 | ||
bc938d3d | 680 | The problem is that it is not guaranteed that the wide character |
390955cb | 681 | representation of a single-byte character can be found using casting. |
6dd5b57e | 682 | In fact, usually this fails miserably. The correct solution to this |
390955cb UD |
683 | problem is to write the code as follows: |
684 | ||
685 | @smallexample | |
686 | int | |
687 | is_in_class (int c, const char *class) | |
688 | @{ | |
689 | wctype_t desc = wctype (class); | |
690 | return desc ? iswctype (btowc (c), desc) : 0; | |
691 | @} | |
692 | @end smallexample | |
693 | ||
e18db2b0 | 694 | @xref{Converting a Character}, for more information on @code{btowc}. |
6dd5b57e | 695 | Note that this change probably does not improve the performance |
390955cb | 696 | of the program a lot since the @code{wctype} function still has to make |
6dd5b57e UD |
697 | the string comparisons. It gets really interesting if the |
698 | @code{is_in_class} function is called more than once for the | |
390955cb UD |
699 | same class name. In this case the variable @var{desc} could be computed |
700 | once and reused for all the calls. Therefore the above form of the | |
701 | function is probably not the final one. | |
702 | ||
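As a minimal sketch of the reuse described above, the descriptor can be
looked up once and then passed to a helper for every character. The
names @code{is_in_desc} and @code{count_in_class} are hypothetical and
only serve this illustration; they are not part of any library.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: test the single-byte character C against a
   descriptor that the caller obtained earlier from wctype.  */
static int
is_in_desc (int c, wctype_t desc)
{
  wint_t wc;

  if (!desc)
    return 0;                   /* class unknown in this locale */
  wc = btowc (c);
  if (wc == WEOF)
    return 0;                   /* C has no wide representation */
  return iswctype (wc, desc) != 0;
}

/* Hypothetical helper: count the bytes of S that belong to the class
   named CLASS.  wctype is called once, not once per character.  */
static size_t
count_in_class (const char *s, const char *class)
{
  wctype_t desc = wctype (class);
  size_t n = 0;

  for (; *s != '\0'; ++s)
    if (is_in_desc ((unsigned char) *s, desc))
      ++n;
  return n;
}
```

Whether this pays off depends on how often the same class name is
queried; a descriptor is only valid while the @code{LC_CTYPE} locale
that produced it stays selected.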
703 | ||
704 | @node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling | |
705 | @section Mapping of wide characters |
706 | ||
6dd5b57e UD |
707 | The case mapping functions are also generalized by the @w{ISO C} |
708 | standard. Instead of just allowing the two standard mappings, a | |
709 | locale can contain others. Again, the @code{localedef} program | |
710 | already supports generating such locale data files. | |
390955cb | 711 | |
390955cb | 712 | @deftp {Data Type} wctrans_t |
d08a7e4c | 713 | @standards{ISO, wctype.h} |
390955cb UD |
714 | This data type is defined as a scalar type which can hold a value |
715 | representing the locale-dependent character mapping. There is no way to | |
b912ca11 | 716 | construct such a value apart from using the return value of the |
390955cb UD |
717 | @code{wctrans} function. |
718 | ||
719 | @pindex wctype.h | |
720 | @noindent | |
721 | This type is defined in @file{wctype.h}. | |
722 | @end deftp | |
723 | ||
464d646f | 724 | @deftypefun wctrans_t wctrans (const char *@var{property}) |
d08a7e4c | 725 | @standards{ISO, wctype.h} |
c49130e3 AO |
726 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
727 | @c Similar implementation, same caveats as wctype. | |
390955cb UD |
728 | The @code{wctrans} function has to be used to find out whether a named |
729 | mapping is defined in the current locale selected for the | |
6dd5b57e UD |
730 | @code{LC_CTYPE} category. If the returned value is non-zero, you can use |
731 | it afterwards in calls to @code{towctrans}. If the return value is | |
390955cb UD |
732 | zero, no such mapping is known in the current locale. |
733 | ||
734 | Besides locale-specific mappings there are two mappings which are |
735 | guaranteed to be available in every locale: | |
736 | ||
737 | @multitable @columnfractions .5 .5 | |
738 | @item | |
739 | @code{"tolower"} @tab @code{"toupper"} | |
740 | @end multitable | |
741 | ||
742 | @pindex wctype.h | |
743 | @noindent | |
6dd5b57e | 744 | This function is declared in @file{wctype.h}. |
390955cb UD |
745 | @end deftypefun |
746 | ||
390955cb | 747 | @deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc}) |
d08a7e4c | 748 | @standards{ISO, wctype.h} |
c49130e3 AO |
749 | @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
750 | @c Same caveats as iswctype. | |
6dd5b57e UD |
751 | @code{towctrans} maps the input character @var{wc} |
752 | according to the rules of the mapping for which @var{desc} is a | |
753 | descriptor, and returns the value it finds. @var{desc} must be | |
390955cb UD |
754 | obtained by a successful call to @code{wctrans}. |
755 | ||
756 | @pindex wctype.h | |
757 | @noindent | |
758 | This function is declared in @file{wctype.h}. | |
759 | @end deftypefun | |
760 | ||
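The two functions are meant to be used together. The following sketch
shows one possible way to combine them; the name @code{map_wide_char}
is hypothetical, and the choice to return the character unchanged when
the mapping is unknown is an assumption made for this example.

```c
#include <wctype.h>

/* Hypothetical helper: apply the mapping named NAME to WC.  If the
   current locale does not know a mapping of that name, WC is
   returned unchanged.  */
wint_t
map_wide_char (wint_t wc, const char *name)
{
  wctrans_t desc = wctrans (name);

  if (!desc)
    return wc;                  /* no such mapping in this locale */
  return towctrans (wc, desc);
}
```

A call such as @code{map_wide_char (wc, "toupper")} works in every
locale, since @code{"toupper"} and @code{"tolower"} are always
available.
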
6dd5b57e UD |
761 | For the generally available mappings, the @w{ISO C} standard defines |
762 | convenient shortcuts so that it is not necessary to call @code{wctrans} | |
390955cb UD |
763 | for them. |
764 | ||
390955cb | 765 | @deftypefun wint_t towlower (wint_t @var{wc}) |
d08a7e4c | 766 | @standards{ISO, wctype.h} |
c49130e3 AO |
767 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
768 | @c Same caveats as iswalnum, just using a wctrans rather than a wctype | |
769 | @c table. | |
390955cb UD |
770 | If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding |
771 | lower-case letter. If @var{wc} is not an upper-case letter, | |
772 | @var{wc} is returned unchanged. | |
773 | ||
18fd611b UD |
774 | @noindent |
775 | @code{towlower} can be implemented using | |
776 | ||
777 | @smallexample | |
778 | towctrans (wc, wctrans ("tolower")) | |
779 | @end smallexample | |
780 | ||
390955cb UD |
781 | @pindex wctype.h |
782 | @noindent | |
783 | This function is declared in @file{wctype.h}. | |
784 | @end deftypefun | |
785 | ||
390955cb | 786 | @deftypefun wint_t towupper (wint_t @var{wc}) |
d08a7e4c | 787 | @standards{ISO, wctype.h} |
c49130e3 | 788 | @safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} |
390955cb UD |
789 | If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding |
790 | upper-case letter. Otherwise @var{wc} is returned unchanged. | |
791 | ||
18fd611b UD |
792 | @noindent |
793 | @code{towupper} can be implemented using | |
794 | ||
795 | @smallexample | |
796 | towctrans (wc, wctrans ("toupper")) | |
797 | @end smallexample | |
798 | ||
390955cb UD |
799 | @pindex wctype.h |
800 | @noindent | |
801 | This function is declared in @file{wctype.h}. | |
802 | @end deftypefun | |
803 | ||
804 | The same warnings given in the last section for the use of the wide | |
6dd5b57e | 805 | character classification functions apply here. It is not correct to |
390955cb | 806 | simply cast a @code{char} type value to a @code{wint_t} and use it as an |
6dd5b57e | 807 | argument to @code{towctrans} calls; convert it with @code{btowc} first. |
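
A minimal sketch of mapping a single-byte character through the wide
character functions might look as follows. The name @code{upcase_byte}
is hypothetical, and falling back to the original byte when a
conversion fails is an assumption of this example, not a requirement
of the standard.

```c
#include <stdio.h>              /* EOF */
#include <wchar.h>              /* btowc, wctob, WEOF */
#include <wctype.h>             /* wctrans, towctrans */

/* Hypothetical helper: map the single-byte character C to upper case
   by way of its wide character representation.  C is returned
   unchanged if it has no wide representation or if the mapped
   character cannot be expressed as a single byte.  */
int
upcase_byte (unsigned char c)
{
  wint_t wc = btowc (c);
  int back;

  if (wc == WEOF)
    return c;
  back = wctob (towctrans (wc, wctrans ("toupper")));
  return back == EOF ? c : back;
}
```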