@node Character Handling, String and Array Utilities, Memory, Top
@c %MENU% Character testing and conversion functions
@chapter Character Handling

Programs that work with characters and strings often need to classify a
character---is it alphabetic, is it a digit, is it whitespace, and so
on---and perform case conversion operations on characters.  The
functions in the header file @file{ctype.h} are provided for this
purpose.
@pindex ctype.h

Since the choice of locale and character set can alter the
classifications of particular character codes, all of these functions
are affected by the current locale.  (More precisely, they are affected
by the locale currently selected for character classification---the
@code{LC_CTYPE} category; see @ref{Locale Categories}.)

The @w{ISO C} standard specifies two different sets of functions.  One
set works on @code{char} characters, the other on @code{wchar_t} wide
characters (@pxref{Extended Char Intro}).

@menu
* Classification of Characters::       Testing whether characters are
                                        letters, digits, punctuation, etc.

* Case Conversion::                    Case mapping, and the like.
* Classification of Wide Characters::  Character class determination for
                                        wide characters.
* Using Wide Char Classes::            Notes on using the wide character
                                        classes.
* Wide Character Case Conversion::     Mapping of wide characters.
@end menu

@node Classification of Characters, Case Conversion,  , Character Handling
@section Classification of Characters
@cindex character testing
@cindex classification of characters
@cindex predicates on characters
@cindex character predicates

This section explains the library functions for classifying characters.
For example, @code{isalpha} is the function to test for an alphabetic
character.  It takes one argument, the character to test, and returns a
nonzero integer if the character is alphabetic, and zero otherwise.  You
would use it like this:

@smallexample
if (isalpha (c))
  printf ("The character `%c' is alphabetic.\n", c);
@end smallexample

Each of the functions in this section tests for membership in a
particular class of characters; each has a name starting with @samp{is}.
Each of them takes one argument, which is a character to test, and
returns an @code{int} which is treated as a boolean value.  The
character argument is passed as an @code{int}, and it may be the
constant value @code{EOF} instead of a real character.  Any other
argument value must be representable as an @code{unsigned char}; the
behavior for other values, such as a negative @code{char}, is undefined.
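
Because the argument must be either @code{EOF} or a value representable
as an @code{unsigned char}, strings of type @code{char} need care: where
@code{char} is signed, bytes above 127 become negative.  A minimal
sketch, using a hypothetical helper @code{count_alpha}, that converts
each byte to @code{unsigned char} before classifying it:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the alphabetic bytes in the NUL-terminated
   string S, as classified by the current LC_CTYPE locale.  Each byte is
   converted to unsigned char first, because passing a negative value
   other than EOF to isalpha is undefined behavior.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```

For example, @code{count_alpha ("abc123")} returns 3 in the @code{"C"}
locale.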

The attributes of any given character can vary between locales.
@xref{Locales}, for more information on locales.@refill

These functions are declared in the header file @file{ctype.h}.
@pindex ctype.h

@cindex lower-case character
@comment ctype.h
@comment ISO
@deftypefun int islower (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The is* macros call __ctype_b_loc to get the ctype array from the
@c current locale, and then index it by c.  __ctype_b_loc reads from
@c thread-local memory the (indirect) pointer to the ctype array, which
@c may involve one word access to the global locale object, if that's
@c the active locale for the thread, and the array, being part of the
@c locale data, is undeletable, so there's no thread-safety issue.  We
@c might want to mark these with @mtslocale to flag to callers that
@c changing locales might affect them, even if not these simpler
@c functions.
Returns true if @var{c} is a lower-case letter.  The letter need not be
from the Latin alphabet; a lower-case letter from any alphabet
representable in the current locale qualifies.
@end deftypefun

@cindex upper-case character
@comment ctype.h
@comment ISO
@deftypefun int isupper (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an upper-case letter.  The letter need not be
from the Latin alphabet; an upper-case letter from any alphabet
representable in the current locale qualifies.
@end deftypefun

@cindex alphabetic character
@comment ctype.h
@comment ISO
@deftypefun int isalpha (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an alphabetic character (a letter).  If
@code{islower} or @code{isupper} is true of a character, then
@code{isalpha} is also true.

In some locales, there may be additional characters for which
@code{isalpha} is true---letters which are neither upper case nor lower
case.  But in the standard @code{"C"} locale, there are no such
additional characters.
@end deftypefun

@cindex digit character
@cindex decimal digit character
@comment ctype.h
@comment ISO
@deftypefun int isdigit (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a decimal digit (@samp{0} through @samp{9}).
@end deftypefun

@cindex alphanumeric character
@comment ctype.h
@comment ISO
@deftypefun int isalnum (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an alphanumeric character (a letter or
digit); in other words, if either @code{isalpha} or @code{isdigit} is
true of a character, then @code{isalnum} is also true.
@end deftypefun

@cindex hexadecimal digit character
@comment ctype.h
@comment ISO
@deftypefun int isxdigit (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a hexadecimal digit.
Hexadecimal digits include the normal decimal digits @samp{0} through
@samp{9} and the letters @samp{A} through @samp{F} and
@samp{a} through @samp{f}.
@end deftypefun
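
The relationship between @code{isdigit}, @code{isxdigit} and the case
conversion functions shows up in a small sketch that computes the
numeric value of a hexadecimal digit.  The helper @code{hex_value} is
hypothetical, and it assumes, as in ASCII, that the letters @samp{a}
through @samp{f} have consecutive codes (@w{ISO C} guarantees this only
for the decimal digits):

```c
#include <ctype.h>

/* Hypothetical helper: return the value (0 to 15) of the hexadecimal
   digit C, or -1 if isxdigit says C is not a hexadecimal digit.  As for
   isxdigit itself, C must be EOF or an unsigned char value.  Assumes
   'a'..'f' are consecutive codes, which holds in ASCII.  */
int
hex_value (int c)
{
  if (!isxdigit (c))
    return -1;
  if (isdigit (c))
    return c - '0';
  /* 'A'..'F' or 'a'..'f': fold to lower case, then offset from 'a'.  */
  return tolower (c) - 'a' + 10;
}
```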

@cindex punctuation character
@comment ctype.h
@comment ISO
@deftypefun int ispunct (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.
@end deftypefun

@cindex whitespace character
@comment ctype.h
@comment ISO
@deftypefun int isspace (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a @dfn{whitespace} character.  In the standard
@code{"C"} locale, @code{isspace} returns true for only the standard
whitespace characters:

@table @code
@item ' '
space

@item '\f'
formfeed

@item '\n'
newline

@item '\r'
carriage return

@item '\t'
horizontal tab

@item '\v'
vertical tab
@end table
@end deftypefun

@cindex blank character
@comment ctype.h
@comment ISO
@deftypefun int isblank (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a blank character; that is, a space or a tab.
This function was originally a GNU extension, but was added in @w{ISO C99}.
@end deftypefun

@cindex graphic character
@comment ctype.h
@comment ISO
@deftypefun int isgraph (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a graphic character; that is, a character
that has a glyph associated with it.  The whitespace characters are not
considered graphic.
@end deftypefun

@cindex printing character
@comment ctype.h
@comment ISO
@deftypefun int isprint (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a printing character.  Printing characters
include all the graphic characters, plus the space (@samp{ }) character.
@end deftypefun

@cindex control character
@comment ctype.h
@comment ISO
@deftypefun int iscntrl (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a control character (that is, a character that
is not a printing character).
@end deftypefun
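
The relationships stated above for @code{isalnum}, @code{ispunct},
@code{isprint} and @code{iscntrl} can be checked mechanically.  The
following sketch is a hypothetical self-test meant for the @code{"C"}
locale (the locale in effect at program startup); it examines only the
7-bit values 0 through 127, where the @code{"C"} locale follows ASCII:

```c
#include <ctype.h>

/* Hypothetical self-check: return 1 if the documented relationships
   between the classes hold for every value from 0 to 127 in the current
   locale, 0 otherwise.  Intended to run in the "C" locale.  */
int
check_class_relations (void)
{
  for (int c = 0; c <= 127; c++)
    {
      int alnum = isalnum (c) != 0;
      int graph = isgraph (c) != 0;
      int print = isprint (c) != 0;

      /* isalnum is true exactly when isalpha or isdigit is.  */
      if (alnum != (isalpha (c) || isdigit (c)))
        return 0;
      /* Printing characters are the graphic characters plus space.  */
      if (print != (graph || c == ' '))
        return 0;
      /* Punctuation is every graphic character that is not alphanumeric.  */
      if ((ispunct (c) != 0) != (graph && !alnum))
        return 0;
      /* In 7-bit ASCII, control characters are the non-printing ones.  */
      if ((iscntrl (c) != 0) != !print)
        return 0;
    }
  return 1;
}
```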

@cindex ASCII character
@comment ctype.h
@comment SVID, BSD
@deftypefun int isascii (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a 7-bit @code{unsigned char} value that fits
into the US/UK ASCII character set.  This function is a BSD extension
and is also an SVID extension.
@end deftypefun

@node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling
@section Case Conversion
@cindex character case conversion
@cindex case conversion of characters
@cindex converting case of characters

This section explains the library functions for performing conversions
such as case mappings on characters.  For example, @code{toupper}
converts any character to upper case if possible.  If the character
can't be converted, @code{toupper} returns it unchanged.

These functions take one argument of type @code{int}, which is the
character to convert, and return the converted character as an
@code{int}.  If the conversion is not applicable to the argument given,
the argument is returned unchanged.

@strong{Compatibility Note:} In pre-@w{ISO C} dialects, instead of
returning the argument unchanged, these functions may fail when the
argument is not suitable for the conversion.  Thus for portability, you
may need to write @code{islower(c) ? toupper(c) : c} rather than just
@code{toupper(c)}.

These functions are declared in the header file @file{ctype.h}.
@pindex ctype.h

@comment ctype.h
@comment ISO
@deftypefun int tolower (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The to* macros/functions call different functions that use different
@c arrays than those of __ctype_b_loc, but the access patterns and
@c thus safety guarantees are the same.
If @var{c} is an upper-case letter, @code{tolower} returns the corresponding
lower-case letter.  If @var{c} is not an upper-case letter,
@var{c} is returned unchanged.
@end deftypefun

@comment ctype.h
@comment ISO
@deftypefun int toupper (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
If @var{c} is a lower-case letter, @code{toupper} returns the corresponding
upper-case letter.  Otherwise @var{c} is returned unchanged.
@end deftypefun
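
A common use of @code{toupper} is converting a whole string.  A minimal
sketch, with the hypothetical helper @code{str_toupper}; it works byte
by byte, so it is only suitable for single-byte character sets:

```c
#include <ctype.h>

/* Hypothetical helper: convert the NUL-terminated string S to upper case
   in place.  Bytes without an upper-case mapping in the current locale
   are returned unchanged by toupper.  Multibyte text (for example UTF-8)
   needs the wide character functions instead.  */
void
str_toupper (char *s)
{
  for (; *s != '\0'; s++)
    *s = (char) toupper ((unsigned char) *s);
}
```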

@comment ctype.h
@comment SVID, BSD
@deftypefun int toascii (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function converts @var{c} to a 7-bit @code{unsigned char} value
that fits into the US/UK ASCII character set, by clearing the high-order
bits.  This function is a BSD extension and is also an SVID extension.
@end deftypefun
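
Because @code{toascii} simply discards the high-order bits, a Latin-1
e with acute accent (code 0xE9) turns into @samp{i} (0x69) without any
warning.  A sketch of a hypothetical alternative, @code{replace_non_ascii},
which uses @code{isascii} to make the loss of information visible
instead.  Since both functions are SVID/BSD extensions, the sketch
assumes @code{_DEFAULT_SOURCE} is enough to get their declarations from
@file{ctype.h}:

```c
/* isascii and toascii are not ISO C; with glibc they are declared when
   _DEFAULT_SOURCE is in effect (the default unless a strict -std= mode
   is used).  */
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1
#endif
#include <ctype.h>

/* Hypothetical helper: replace every non-ASCII byte of S with '?'.
   toascii would instead silently map 0xE9 to 0x69 ('i').  */
void
replace_non_ascii (char *s)
{
  for (; *s != '\0'; s++)
    if (!isascii ((unsigned char) *s))
      *s = '?';
}
```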

@comment ctype.h
@comment SVID
@deftypefun int _tolower (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is identical to @code{tolower}, and is provided for compatibility
with the SVID.  @xref{SVID}.@refill
@end deftypefun

@comment ctype.h
@comment SVID
@deftypefun int _toupper (int @var{c})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is identical to @code{toupper}, and is provided for compatibility
with the SVID.
@end deftypefun


@node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling
@section Character class determination for wide characters

@w{Amendment 1} to @w{ISO C90} defines functions to classify wide
characters.  Although the original @w{ISO C90} standard already defined
the type @code{wchar_t}, it defined no functions operating on it.

The design of the wide character classification functions is more
general.  It allows extensions to the set of available
classifications, beyond those which are always available.  The POSIX
standard specifies how such extensions can be made, and this is already
implemented in the @glibcadj{} implementation of the @code{localedef}
program.

The character class functions are normally implemented with bitsets,
with a bitset per character.  For a given character, the appropriate
bitset is read from a table and a test is performed as to whether a
certain bit is set.  Which bit is tested for is determined by the
class.

For the wide character classification functions this is made visible.
There is a type for classification values, a function to retrieve this
value for a given class, and a function to test whether a given
character is in this class, using the classification value.  On top of
this, the normal character classification functions as used for
@code{char} objects can be defined.

@comment wctype.h
@comment ISO
@deftp {Data type} wctype_t
The @code{wctype_t} type can hold a value which represents a character
class.  The only defined way to generate such a value is by using the
@code{wctype} function.

@pindex wctype.h
This type is defined in @file{wctype.h}.
@end deftp

@comment wctype.h
@comment ISO
@deftypefun wctype_t wctype (const char *@var{property})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Although the source code of wctype contains multiple references to
@c the locale, that could each reference different locale_data objects
@c should the global locale object change while active, the compiler can
@c and does combine them all into a single dereference that resolves
@c once to the LCTYPE locale object used throughout the function, so it
@c is safe in (optimized) practice, if not in theory, even when the
@c locale changes.  Ideally we'd explicitly save the resolved
@c locale_data object to make it visibly safe instead of safe only under
@c compiler optimizations, but given the decision that setlocale is
@c MT-Unsafe, all this would afford us would be the ability to not mark
@c this function with @mtslocale.
The @code{wctype} function returns a value representing a class of wide
characters which is identified by the string @var{property}.  Besides
some standard properties, each locale can define its own.  If no
property with the given name is known in the locale currently selected
for the @code{LC_CTYPE} category, the function returns zero.

@noindent
The properties known in every locale are:

@multitable @columnfractions .25 .25 .25 .25
@item
@code{"alnum"} @tab @code{"alpha"} @tab @code{"cntrl"} @tab @code{"digit"}
@item
@code{"graph"} @tab @code{"lower"} @tab @code{"print"} @tab @code{"punct"}
@item
@code{"space"} @tab @code{"upper"} @tab @code{"xdigit"}
@end multitable

@pindex wctype.h
This function is declared in @file{wctype.h}.
@end deftypefun

To test whether a character is a member of one of the non-standard
classes, the @w{ISO C} standard defines a completely new function.

@comment wctype.h
@comment ISO
@deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The compressed lookup table returned by wctype is read-only.
This function returns a nonzero value if @var{wc} is in the character
class specified by @var{desc}.  @var{desc} must have been returned
previously by a successful call to @code{wctype}.

@pindex wctype.h
This function is declared in @file{wctype.h}.
@end deftypefun
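
The three pieces described above (the @code{wctype_t} type,
@code{wctype}, and @code{iswctype}) combine as in the following sketch.
The helper @code{all_in_class} is hypothetical; it calls @code{wctype}
once, reuses the descriptor for every character, and reports an unknown
class name separately from a negative answer:

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return 1 if every wide character in WS belongs
   to the class named CLASS_NAME in the current LC_CTYPE locale, 0 if
   some character does not, and -1 if the locale does not know the
   class.  */
int
all_in_class (const wchar_t *ws, const char *class_name)
{
  wctype_t desc = wctype (class_name);

  if (desc == 0)
    return -1;                  /* Unknown property name.  */
  for (; *ws != L'\0'; ws++)
    if (!iswctype ((wint_t) *ws, desc))
      return 0;
  return 1;
}
```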

To make the commonly used classifications easier to use, the C library
provides a dedicated function for each of them, so there is no need to
use @code{wctype} if the property string is one of the standard
character classes.  In some situations it is desirable to construct the
property strings at run time, and then it is important that
@code{wctype} can also handle the standard classes.

@cindex alphanumeric character
@comment wctype.h
@comment ISO
@deftypefun int iswalnum (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c The implicit wctype call in the isw* functions is actually an
@c optimized version because the category has a known offset, but the
@c wctype is equally safe when optimized, unsafe with changing locales
@c if not optimized (thus @mtslocale).  Since it's not a macro, we
@c always optimize, and the locale can't change in any MT-Safe way, it's
@c fine.  The test whether wc is ASCII to use the non-wide is*
@c macro/function doesn't bring any other safety issues: the test does
@c not depend on the locale, and each path after the decision resolves
@c the locale object only once.
This function returns a nonzero value if @var{wc} is an alphanumeric
character (a letter or digit); in other words, if either @code{iswalpha}
or @code{iswdigit} is true of a character, then @code{iswalnum} is also
true.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("alnum"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex alphabetic character
@comment wctype.h
@comment ISO
@deftypefun int iswalpha (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is an alphabetic character (a letter).  If
@code{iswlower} or @code{iswupper} is true of a character, then
@code{iswalpha} is also true.

In some locales, there may be additional characters for which
@code{iswalpha} is true---letters which are neither upper case nor lower
case.  But in the standard @code{"C"} locale, there are no such
additional characters.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("alpha"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex control character
@comment wctype.h
@comment ISO
@deftypefun int iswcntrl (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a control character (that is, a character that
is not a printing character).

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("cntrl"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex digit character
@comment wctype.h
@comment ISO
@deftypefun int iswdigit (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}).
Note that this function returns a nonzero value not only for
@emph{decimal} digits but for all kinds of digits.  Consequently, code
like the following does @strong{not} work reliably for wide characters:

@smallexample
n = 0;
while (iswdigit (*wc))
  @{
    n *= 10;
    n += *wc++ - L'0';
  @}
@end smallexample

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("digit"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex graphic character
@comment wctype.h
@comment ISO
@deftypefun int iswgraph (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a graphic character; that is, a character
that has a glyph associated with it.  The whitespace characters are not
considered graphic.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("graph"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex lower-case character
@comment wctype.h
@comment ISO
@deftypefun int iswlower (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a lower-case letter.  The letter need not be
from the Latin alphabet; a lower-case letter from any alphabet
representable in the current locale qualifies.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("lower"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex printing character
@comment wctype.h
@comment ISO
@deftypefun int iswprint (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a printing character.  Printing characters
include all the graphic characters, plus the space (@samp{ }) character.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("print"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex punctuation character
@comment wctype.h
@comment ISO
@deftypefun int iswpunct (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("punct"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex whitespace character
@comment wctype.h
@comment ISO
@deftypefun int iswspace (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a @dfn{whitespace} character.  In the standard
@code{"C"} locale, @code{iswspace} returns true for only the standard
whitespace characters:

@table @code
@item L' '
space

@item L'\f'
formfeed

@item L'\n'
newline

@item L'\r'
carriage return

@item L'\t'
horizontal tab

@item L'\v'
vertical tab
@end table

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("space"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex upper-case character
@comment wctype.h
@comment ISO
@deftypefun int iswupper (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is an upper-case letter.  The letter need not be
from the Latin alphabet; an upper-case letter from any alphabet
representable in the current locale qualifies.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("upper"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex hexadecimal digit character
@comment wctype.h
@comment ISO
@deftypefun int iswxdigit (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a hexadecimal digit.
Hexadecimal digits include the normal decimal digits @samp{0} through
@samp{9} and the letters @samp{A} through @samp{F} and
@samp{a} through @samp{f}.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("xdigit"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@Theglibc{} also provides the following function, which was not part of
@w{Amendment 1} to @w{ISO C90} and which has a counterpart for
single-byte characters, @code{isblank}.

@cindex blank character
@comment wctype.h
@comment ISO
@deftypefun int iswblank (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a blank character; that is, a space or a tab.
This function was originally a GNU extension, but was added in @w{ISO C99}.
It is declared in @file{wctype.h}.
@end deftypefun

670 | @node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling | |
671 | @section Notes on using the wide character classes | |
672 | ||
6dd5b57e | 673 | The first note is probably not astonishing but still occasionally a |
390955cb | 674 | cause of problems. The @code{isw@var{XXX}} functions can be implemented |
1f77f049 | 675 | using macros and in fact, @theglibc{} does this. They are still |
390955cb | 676 | available as real functions but when the @file{wctype.h} header is |
6dd5b57e | 677 | included the macros will be used. This is the same as the |
390955cb UD |
678 | @code{char} type versions of these functions. |
679 | ||
bc938d3d UD |
680 | The second note covers something new. It can be best illustrated by a |
681 | (real-world) example. The first piece of code is an excerpt from the | |
682 | original code. It is truncated a bit but the intention should be clear. | |
390955cb UD |
683 | |
684 | @smallexample | |
685 | int | |
686 | is_in_class (int c, const char *class) | |
687 | @{ | |
688 | if (strcmp (class, "alnum") == 0) | |
689 | return isalnum (c); | |
690 | if (strcmp (class, "alpha") == 0) | |
691 | return isalpha (c); | |
692 | if (strcmp (class, "cntrl") == 0) | |
693 | return iscntrl (c); | |
95fdc6a0 | 694 | @dots{} |
390955cb UD |
695 | return 0; |
696 | @} | |
697 | @end smallexample | |
698 | ||
6dd5b57e UD |
699 | Now, with the @code{wctype} and @code{iswctype} you can avoid the |
700 | @code{if} cascades, but rewriting the code as follows is wrong: | |
390955cb UD |
701 | |
702 | @smallexample | |
703 | int | |
704 | is_in_class (int c, const char *class) | |
705 | @{ | |
706 | wctype_t desc = wctype (class); | |
707 | return desc ? iswctype ((wint_t) c, desc) : 0; | |
708 | @} | |
709 | @end smallexample | |
710 | ||
bc938d3d | 711 | The problem is that it is not guaranteed that the wide character |
390955cb | 712 | representation of a single-byte character can be found using casting. |
6dd5b57e | 713 | In fact, usually this fails miserably. The correct solution to this |
390955cb UD |
714 | problem is to write the code as follows: |
715 | ||
@smallexample
int
is_in_class (int c, const char *class)
@{
  wctype_t desc = wctype (class);
  return desc ? iswctype (btowc (c), desc) : 0;
@}
@end smallexample

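For reference, here is the corrected function as a self-contained sketch with the headers it needs: @file{wctype.h} for @code{wctype} and @code{iswctype}, and @file{wchar.h} for @code{btowc}. It assumes the program either runs in the default C locale or has already selected a locale with @code{setlocale}.

```c
#include <wchar.h>
#include <wctype.h>

int
is_in_class (int c, const char *class)
{
  /* wctype returns 0 if the current LC_CTYPE locale has no such class.  */
  wctype_t desc = wctype (class);
  /* btowc maps the single-byte character C to its wide character.
     It returns WEOF, which belongs to no class, for EOF and for bytes
     that are not complete characters in the locale.  */
  return desc ? iswctype (btowc (c), desc) : 0;
}
```

In the default C locale, @code{is_in_class ('7', "digit")} returns a non-zero value, while @code{is_in_class ('a', "no-such-class")} returns 0 because @code{wctype} does not know the name.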
@xref{Converting a Character}, for more information on @code{btowc}.
Note that this change probably does not improve the performance
of the program much, since the @code{wctype} function still has to perform
the string comparisons. It becomes worthwhile if the
@code{is_in_class} function is called more than once for the
same class name. In this case the variable @var{desc} could be computed
once and reused for all the calls. Therefore the above form of the
function is probably not the final one.

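A minimal sketch of that idea follows. The function @code{count_in_class} is hypothetical: it looks up the class descriptor once and then reuses it for every byte of a string, so the string comparisons inside @code{wctype} happen only once per call rather than once per character.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the bytes of S that belong to the
   character class named CLASS in the current LC_CTYPE locale.
   Returns -1 if the locale does not define that class.  */
long
count_in_class (const char *s, const char *class)
{
  /* Look the class up once; the string comparisons happen only here.  */
  wctype_t desc = wctype (class);
  if (desc == 0)
    return -1;

  long count = 0;
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; ++p)
    /* Reuse DESC for every byte; btowc converts the byte properly
       instead of casting it to wint_t.  */
    if (iswctype (btowc (*p), desc))
      ++count;
  return count;
}
```

Iterating through an @code{unsigned char} pointer keeps every byte value non-negative, which is what @code{btowc} expects for anything other than @code{EOF}.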
@node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling
@section Mapping of Wide Characters

The case conversion functions are also generalized by the @w{ISO C}
standard. Instead of just allowing the two standard mappings, a
locale can contain others. Again, the @code{localedef} program
already supports generating such locale data files.

@comment wctype.h
@comment ISO
@deftp {Data Type} wctrans_t
This data type is defined as a scalar type which can hold a value
representing the locale-dependent character mapping. There is no way to
construct such a value apart from using the return value of the
@code{wctrans} function.

@pindex wctype.h
@noindent
This type is defined in @file{wctype.h}.
@end deftp

@comment wctype.h
@comment ISO
@deftypefun wctrans_t wctrans (const char *@var{property})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Similar implementation, same caveats as wctype.
The @code{wctrans} function has to be used to find out whether a named
mapping is defined in the current locale selected for the
@code{LC_CTYPE} category. If the returned value is non-zero, you can use
it afterwards in calls to @code{towctrans}. If the return value is
zero, no such mapping is known in the current locale.

Besides locale-specific mappings, there are two mappings which are
guaranteed to be available in every locale:

@multitable @columnfractions .5 .5
@item
@code{"tolower"} @tab @code{"toupper"}
@end multitable

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

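Since a zero return value means the mapping is unknown, callers should check it before using the descriptor. The following sketch uses a hypothetical helper, @code{mapping_or_tolower}, that falls back to @code{"tolower"}, one of the two mappings guaranteed in every locale, when the requested name is not defined. The fallback policy is only an illustration.

```c
#include <wctype.h>

/* Hypothetical helper: returns a descriptor for the mapping NAME in
   the current LC_CTYPE locale, or the descriptor for "tolower" (which
   every locale provides) if NAME is not defined.  */
wctrans_t
mapping_or_tolower (const char *name)
{
  wctrans_t desc = wctrans (name);
  if (desc == 0)
    desc = wctrans ("tolower");
  return desc;
}
```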
@comment wctype.h
@comment ISO
@deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c Same caveats as iswctype.
@code{towctrans} maps the input character @var{wc}
according to the rules of the mapping for which @var{desc} is a
descriptor, and returns the value it finds. @var{desc} must be
obtained by a successful call to @code{wctrans}.

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

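As with classification, a mapping descriptor can be obtained once and applied to many characters. The hypothetical function @code{apply_mapping} in this sketch converts a wide string in place using any mapping the current locale defines. If @code{wctrans} does not know the name, it returns -1 and leaves the string untouched.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: replaces every character of the wide string S
   by its image under the mapping NAME of the current LC_CTYPE locale.
   Returns 0 on success, or -1 (leaving S unchanged) if the locale does
   not define the mapping.  */
int
apply_mapping (wchar_t *s, const char *name)
{
  /* Obtain the descriptor once, by a successful call to wctrans.  */
  wctrans_t desc = wctrans (name);
  if (desc == 0)
    return -1;

  for (; *s != L'\0'; ++s)
    *s = (wchar_t) towctrans ((wint_t) *s, desc);
  return 0;
}
```

Casting a @code{wchar_t} to @code{wint_t} is fine here, unlike casting a @code{char}, because both types represent the same wide character values.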
For the generally available mappings, the @w{ISO C} standard defines
convenient shortcuts so that it is not necessary to call @code{wctrans}
for them.

@comment wctype.h
@comment ISO
@deftypefun wint_t towlower (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Same caveats as iswalnum, just using a wctrans rather than a wctype
@c table.
If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding
lower-case letter. If @var{wc} is not an upper-case letter,
@var{wc} is returned unchanged.

@noindent
@code{towlower} can be implemented using

@smallexample
towctrans (wc, wctrans ("tolower"))
@end smallexample

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

@comment wctype.h
@comment ISO
@deftypefun wint_t towupper (wint_t @var{wc})
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding
upper-case letter. Otherwise @var{wc} is returned unchanged.

@noindent
@code{towupper} can be implemented using

@smallexample
towctrans (wc, wctrans ("toupper"))
@end smallexample

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

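The equivalence between the shortcuts and @code{towctrans} can be checked directly. The hypothetical function @code{shortcuts_agree} below compares both forms for a given wide character in the current locale. It is a test aid, not a suggested replacement for the shortcuts.

```c
#include <wctype.h>

/* Hypothetical check: returns 1 if towlower and towupper give the same
   results for WC as towctrans with the "tolower" and "toupper"
   descriptors of the current LC_CTYPE locale, and 0 otherwise.  */
int
shortcuts_agree (wint_t wc)
{
  return towlower (wc) == towctrans (wc, wctrans ("tolower"))
         && towupper (wc) == towctrans (wc, wctrans ("toupper"));
}
```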
The same warnings given in the last section for the use of the wide
character classification functions apply here. It is not possible to
simply cast a @code{char} type value to a @code{wint_t} and use it as an
argument to @code{towctrans}, @code{towlower}, or @code{towupper} calls;
convert it with @code{btowc} first.
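To convert a single-byte character through the wide character mappings, convert it with @code{btowc}, map it, and convert the result back with @code{wctob}. The sketch below does this in the hypothetical function @code{narrow_toupper}. Returning the input unchanged when either conversion fails is an assumption made for this example.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upper-cases the single-byte character C (an
   unsigned char value or EOF) using the wide character functions.
   C is converted with btowc rather than cast to wint_t, and the result
   is converted back with wctob.  If either conversion fails, C is
   returned unchanged.  */
int
narrow_toupper (int c)
{
  wint_t wc = btowc (c);
  if (wc == WEOF)
    return c;

  /* wctob returns EOF if the mapped character has no single-byte form.  */
  int result = wctob (towupper (wc));
  return result == EOF ? c : result;
}
```

A caller holding a plain @code{char} value @var{ch} should pass @code{(unsigned char) @var{ch}}, so that negative @code{char} values are not mistaken for @code{EOF}.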