@node String and Array Utilities, Character Set Handling, Character Handling, Top
@c %MENU% Utilities for copying and comparing strings and arrays
@chapter String and Array Utilities

Operations on strings (null-terminated byte sequences) are an important part of
many programs. @Theglibc{} provides an extensive set of string
utility functions, including functions for copying, concatenating,
comparing, and searching strings. Many of these functions can also
operate on arbitrary regions of storage; for example, the @code{memcpy}
function can be used to copy the contents of any kind of array.

It's fairly common for beginning C programmers to ``reinvent the wheel''
by duplicating this functionality in their own code, but it pays to
become familiar with the library functions and to make use of them,
since this offers benefits in maintenance, efficiency, and portability.

For instance, you could easily compare one string to another in two
lines of C code, but if you use the built-in @code{strcmp} function,
you're less likely to make a mistake. And, since these library
functions are typically highly optimized, your program may run faster
too.
22
@menu
* Representation of Strings::   Introduction to basic concepts.
* String/Array Conventions::    Whether to use a string function or an
                                 arbitrary array function.
* String Length::               Determining the length of a string.
* Copying Strings and Arrays::  Functions to copy strings and arrays.
* Concatenating Strings::       Functions to concatenate strings while copying.
* Truncating Strings::          Functions to truncate strings while copying.
* String/Array Comparison::     Functions for byte-wise and character-wise
                                 comparison.
* Collation Functions::         Functions for collating strings.
* Search Functions::            Searching for a specific element or substring.
* Finding Tokens in a String::  Splitting a string into tokens by looking
                                 for delimiters.
* Erasing Sensitive Data::      Clearing memory which contains sensitive
                                 data, after it's no longer needed.
* Shuffling Bytes::             Or how to flash-cook a string.
* Obfuscating Data::            Reversibly obscuring data from casual view.
* Encode Binary Data::          Encoding and Decoding of Binary Data.
* Argz and Envz Vectors::       Null-separated string vectors.
@end menu
44
b4012b75 45@node Representation of Strings
28f540f4
RM
46@section Representation of Strings
47@cindex string, representation of
48
49This section is a quick summary of string concepts for beginning C
2cc4b9cc 50programmers. It describes how strings are represented in C
28f540f4
RM
51and some common pitfalls. If you are already familiar with this
52material, you can skip this section.
53
54@cindex string
2cc4b9cc
PE
55A @dfn{string} is a null-terminated array of bytes of type @code{char},
56including the terminating null byte. String-valued
28f540f4
RM
57variables are usually declared to be pointers of type @code{char *}.
58Such variables do not include space for the text of a string; that has
59to be stored somewhere else---in an array variable, a string constant,
60or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
61you to store the address of the chosen memory space into the pointer
62variable. Alternatively you can store a @dfn{null pointer} in the
63pointer variable. The null pointer does not point anywhere, so
64attempting to reference the string it points to gets an error.
65
2cc4b9cc
PE
66@cindex multibyte character
67@cindex multibyte string
68@cindex wide string
69A @dfn{multibyte character} is a sequence of one or more bytes that
70represents a single character using the locale's encoding scheme; a
71null byte always represents the null character. A @dfn{multibyte
72string} is a string that consists entirely of multibyte
73characters. In contrast, a @dfn{wide string} is a null-terminated
74sequence of @code{wchar_t} objects. A wide-string variable is usually
75declared to be a pointer of type @code{wchar_t *}, by analogy with
76string variables and @code{char *}. @xref{Extended Char Intro}.
77
78@cindex null byte
8a2f1f5b 79@cindex null wide character
2cc4b9cc
PE
80By convention, the @dfn{null byte}, @code{'\0'},
81marks the end of a string and the @dfn{null wide character},
82@code{L'\0'}, marks the end of a wide string. For example, in
8a2f1f5b 83testing to see whether the @code{char *} variable @var{p} points to a
2cc4b9cc 84null byte marking the end of a string, you can write
8a2f1f5b 85@code{!*@var{p}} or @code{*@var{p} == '\0'}.
28f540f4 86
2cc4b9cc
PE
87A null byte is quite different conceptually from a null pointer,
88although both are represented by the integer constant @code{0}.
28f540f4
RM
89
90@cindex string literal
2cc4b9cc
PE
91A @dfn{string literal} appears in C program source as a multibyte
92string between double-quote characters (@samp{"}). If the
93initial double-quote character is immediately preceded by a capital
94@samp{L} (ell) character (as in @code{L"foo"}), it is a wide string
95literal. String literals can also contribute to @dfn{string
96concatenation}: @code{"a" "b"} is the same as @code{"ab"}.
97For wide strings one can use either
8a2f1f5b
UD
98@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is
99not allowed by the GNU C compiler, because literals are placed in
100read-only storage.
28f540f4 101
2cc4b9cc 102Arrays that are declared @code{const} cannot be modified
28f540f4
RM
103either. It's generally good style to declare non-modifiable string
104pointers to be of type @code{const char *}, since this often allows the
105C compiler to detect accidental modifications as well as providing some
106amount of documentation about what your program intends to do with the
107string.
108
2cc4b9cc
PE
109The amount of memory allocated for a byte array may extend past the null byte
110that marks the end of the string that the array contains. In this
dd7d45e8 111document, the term @dfn{allocated size} is always used to refer to the
2cc4b9cc
PE
112total amount of memory allocated for an array, while the term
113@dfn{length} refers to the number of bytes up to (but not including)
114the terminating null byte. Wide strings are similar, except their
115sizes and lengths count wide characters, not bytes.
28f540f4
RM
116@cindex length of string
117@cindex allocation size of string
118@cindex size of string
119@cindex string length
120@cindex string allocation
121
2cc4b9cc 122A notorious source of program bugs is trying to put more bytes into a
28f540f4 123string than fit in its allocated size. When writing code that extends
2cc4b9cc 124strings or moves bytes into a pre-allocated array, you should be
28f540f4
RM
125very careful to keep track of the length of the text and make explicit
126checks for overflowing the array. Many of the library functions
127@emph{do not} do this for you! Remember also that you need to allocate
2cc4b9cc 128an extra byte to hold the null byte that marks the end of the
28f540f4
RM
129string.
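
As a minimal sketch of the explicit check described above, the following
helper refuses to copy when the text plus its null byte would not fit.
The name @code{copy_if_fits} and its return convention are hypothetical,
chosen only for this illustration; they are not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Copy the string SRC into DST, whose allocated size is DSTSIZE bytes.
   Return 0 on success, or -1 (leaving DST untouched) if SRC plus its
   terminating null byte would not fit.  Hypothetical helper.  */
int
copy_if_fits (char *dst, size_t dstsize, const char *src)
{
  size_t len = strlen (src);

  /* LEN bytes of text plus one null byte must fit in DSTSIZE.  */
  if (len >= dstsize)
    return -1;

  memcpy (dst, src, len + 1);
  return 0;
}
```

Comparing with @code{>=} rather than @code{>} is what reserves the extra
byte for the terminating null byte.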
130
8a2f1f5b
UD
131@cindex single-byte string
132@cindex multibyte string
Originally strings were sequences of bytes where each byte represented a
single character. This is still true today if the strings are encoded
using a single-byte character encoding. Things are different if the
strings are encoded using a multibyte encoding (for more information on
encodings see @ref{Extended Char Intro}). There is no difference in
the programming interface for these two kinds of strings; the programmer
has to be aware of this and interpret the byte sequences accordingly.
140
But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
@code{memcpy} could cut a multibyte character in the middle and put an
incomplete (and therefore unusable) byte sequence in the target buffer.
146
2cc4b9cc 147@cindex wide string
To avoid these problems, later versions of the @w{ISO C} standard
introduced a second set of functions that operate on @dfn{wide
characters} (@pxref{Extended Char Intro}). These functions don't have
the problems the single-byte versions have, since every wide character is
a legal, interpretable value. This does not mean that cutting wide
strings at arbitrary points is without problems. It normally
is for alphabet-based languages (except for non-normalized text), but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit. This is a
higher-level problem which the @w{C library} functions are not designed
to solve. But it is at least good that no invalid byte sequences can be
created. Also, higher-level functions can operate much more easily
on wide characters than on multibyte characters, so a common strategy
is to use wide characters internally whenever text is more than simply
copied.
163
The remainder of this chapter discusses the functions for handling
wide strings in parallel with the discussion of
strings, since there is almost always an exact equivalent
available.
168
b4012b75 169@node String/Array Conventions
28f540f4
RM
170@section String and Array Conventions
171
172This chapter describes both functions that work on arbitrary arrays or
2cc4b9cc
PE
173blocks of memory, and functions that are specific to strings and wide
174strings.
28f540f4
RM
175
Functions that operate on arbitrary blocks of memory have names
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
@code{wmemcpy}) and invariably take an argument which specifies the size
(in bytes and wide characters respectively) of the block of memory to
operate on. The array arguments and return values for these functions
have type @code{void *} or @code{wchar_t *}. As a matter of style, the
elements of the arrays used with the @samp{mem} functions are referred
to as ``bytes''. You can pass any kind of pointer to these functions,
and the @code{sizeof} operator is useful in computing the value for the
size argument. Parameters to the @samp{wmem} functions must be of type
@code{wchar_t *}. These functions are not really usable with anything
but arrays of this type.
188
189In contrast, functions that operate specifically on strings and wide
2cc4b9cc 190strings have names beginning with @samp{str} and @samp{wcs}
8a2f1f5b 191respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
2cc4b9cc 192terminating null byte or null wide character instead of requiring an explicit
8a2f1f5b 193size argument to be passed. (Some of these functions accept a specified
2cc4b9cc
PE
194maximum length, but they also check for premature termination.)
195The array arguments and return values for these
8a2f1f5b 196functions have type @code{char *} and @code{wchar_t *} respectively, and
2cc4b9cc 197the array elements are referred to as ``bytes'' and ``wide
8a2f1f5b
UD
198characters''.
199
200In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
201versions of a function. The one that is more appropriate to use depends
202on the exact situation. When your program is manipulating arbitrary
203arrays or blocks of storage, then you should always use the @samp{mem}
2cc4b9cc 204functions. On the other hand, when you are manipulating
8a2f1f5b
UD
205strings it is usually more convenient to use the @samp{str}/@samp{wcs}
206functions, unless you already know the length of the string in advance.
207The @samp{wmem} functions should be used for wide character arrays with
208known size.
209
210@cindex wint_t
211@cindex parameter promotion
Some of the memory and string functions take single characters as
arguments. Since a value of type @code{char} is automatically promoted
to a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
In the case of the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}. For many implementations this would not be necessary,
since @code{wchar_t} is large enough not to be automatically
promoted, but since the @w{ISO C} standard does not require such a
choice of types, the @code{wint_t} type is used.
28f540f4 222
b4012b75 223@node String Length
28f540f4
RM
224@section String Length
225
226You can get the length of a string using the @code{strlen} function.
227This function is declared in the header file @file{string.h}.
228@pindex string.h
229
28f540f4 230@deftypefun size_t strlen (const char *@var{s})
d08a7e4c 231@standards{ISO, string.h}
11087373 232@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc 233The @code{strlen} function returns the length of the
8a2f1f5b 234string @var{s} in bytes. (In other words, it returns the offset of the
2cc4b9cc 235terminating null byte within the array.)
28f540f4
RM
236
237For example,
238@smallexample
239strlen ("hello, world")
240 @result{} 12
241@end smallexample
242
2cc4b9cc 243When applied to an array, the @code{strlen} function returns
dd7d45e8 244the length of the string stored there, not its allocated size. You can
2cc4b9cc 245get the allocated size of the array that holds a string using
28f540f4
RM
246the @code{sizeof} operator:
247
248@smallexample
a5113b14 249char string[32] = "hello, world";
28f540f4
RM
250sizeof (string)
251 @result{} 32
252strlen (string)
253 @result{} 12
254@end smallexample
dd7d45e8 255
2cc4b9cc 256But beware, this will not work unless @var{string} is the
dd7d45e8
UD
257array itself, not a pointer to it. For example:
258
259@smallexample
260char string[32] = "hello, world";
261char *ptr = string;
262sizeof (string)
263 @result{} 32
264sizeof (ptr)
265 @result{} 4 /* @r{(on a machine with 4 byte pointers)} */
266@end smallexample
267
268This is an easy mistake to make when you are working with functions that
269take string arguments; those arguments are always pointers, not arrays.
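
One common way around this, sketched here under the assumption that the
caller knows the allocated size, is to pass that size as a separate
argument. The function name @code{free_space} is hypothetical and used
only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Inside this function BUF is a pointer, so sizeof (buf) would yield
   the size of a pointer, not of the caller's array.  The caller
   therefore passes the allocated size explicitly.  BUF is assumed to
   hold a null-terminated string shorter than BUFSIZE.  */
size_t
free_space (const char *buf, size_t bufsize)
{
  /* Bytes still available for text, keeping one for the null byte.  */
  return bufsize - strlen (buf) - 1;
}
```

A caller that owns the array can write
@code{free_space (string, sizeof (string))}, because at that point
@code{string} is still an array.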
270
It must also be noted that for multibyte encoded strings the return
value does not have to correspond to the number of characters in the
string. To get this value the string can be converted to wide
characters and @code{wcslen} can be used, or something like the following
code can be used:

@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.} */
@{
  mbstate_t t;
  /* @r{@code{mbsrtowcs} takes a pointer to a @code{const char *}.} */
  const char *scopy = string;
  /* In initial state. */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters. */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
@end smallexample

This is cumbersome to do, so if the number of characters (as opposed to
bytes) is needed often, it is better to work with wide characters.
292@end deftypefun
293
294The wide character equivalent is declared in @file{wchar.h}.
295
8a2f1f5b 296@deftypefun size_t wcslen (const wchar_t *@var{ws})
d08a7e4c 297@standards{ISO, wchar.h}
11087373 298@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
299The @code{wcslen} function is the wide character equivalent to
300@code{strlen}. The return value is the number of wide characters in the
2cc4b9cc 301wide string pointed to by @var{ws} (this is also the offset of
8a2f1f5b
UD
302the terminating null wide character of @var{ws}).
303
2cc4b9cc 304Since there are no multi wide character sequences making up one wide
8a2f1f5b
UD
305character the return value is not only the offset in the array, it is
306also the number of wide characters.
307
308This function was introduced in @w{Amendment 1} to @w{ISO C90}.
28f540f4
RM
309@end deftypefun
310
4547c1a4 311@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
d08a7e4c 312@standards{GNU, string.h}
11087373 313@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc
PE
314If the array @var{s} of size @var{maxlen} contains a null byte,
315the @code{strnlen} function returns the length of the string @var{s} in
316bytes. Otherwise it
8a2f1f5b 317returns @var{maxlen}. Therefore this function is equivalent to
ebaf36eb
JM
318@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
319but it
2cc4b9cc
PE
320is more efficient and works even if @var{s} is not null-terminated so
321long as @var{maxlen} does not exceed the size of @var{s}'s array.
4547c1a4
UD
322
323@smallexample
324char string[32] = "hello, world";
325strnlen (string, 32)
326 @result{} 12
327strnlen (string, 5)
328 @result{} 5
329@end smallexample
330
8a2f1f5b
UD
331This function is a GNU extension and is declared in @file{string.h}.
332@end deftypefun
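
A typical use of @code{strnlen} is reading a fixed-width field that is
null-terminated only when its contents are shorter than the field. The
following is a sketch under that assumption; @code{NAME_FIELD} and
@code{field_to_string} are hypothetical names, not library interfaces.

```c
#include <stddef.h>
#include <string.h>

#define NAME_FIELD 8  /* hypothetical fixed field width, in bytes */

/* Copy a NAME_FIELD-byte field, which is null-terminated only when the
   name is shorter than the field, into OUT as a proper string.
   OUT must have room for NAME_FIELD + 1 bytes.  Return the length.  */
size_t
field_to_string (char *out, const char field[NAME_FIELD])
{
  /* Never reads past the field, even without a null byte.  */
  size_t len = strnlen (field, NAME_FIELD);

  memcpy (out, field, len);
  out[len] = '\0';
  return len;
}
```

Calling @code{strlen} on such a field would run past its end whenever
the field is completely full.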
333
8a2f1f5b 334@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
d08a7e4c 335@standards{GNU, wchar.h}
11087373 336@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
337@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
338@var{maxlen} parameter specifies the maximum number of wide characters.
339
340This function is a GNU extension and is declared in @file{wchar.h}.
4547c1a4
UD
341@end deftypefun
342
0a13c9e9
PE
343@node Copying Strings and Arrays
344@section Copying Strings and Arrays
28f540f4
RM
345
346You can use the functions described in this section to copy the contents
0a13c9e9
PE
347of strings, wide strings, and arrays. The @samp{str} and @samp{mem}
348functions are declared in @file{string.h} while the @samp{w} functions
349are declared in @file{wchar.h}.
28f540f4 350@pindex string.h
8a2f1f5b 351@pindex wchar.h
28f540f4
RM
352@cindex copying strings and arrays
353@cindex string copy functions
354@cindex array copy functions
355@cindex concatenating strings
356@cindex string concatenation functions
357
358A helpful way to remember the ordering of the arguments to the functions
359in this section is that it corresponds to an assignment expression, with
0a13c9e9
PE
360the destination array specified to the left of the source array. Most
361of these functions return the address of the destination array; a few
362return the address of the destination's terminating null, or of just
363past the destination.
28f540f4
RM
364
365Most of these functions do not work properly if the source and
366destination arrays overlap. For example, if the beginning of the
367destination array overlaps the end of the source array, the original
368contents of that part of the source array may get overwritten before it
369is copied. Even worse, in the case of the string functions, the null
2cc4b9cc 370byte marking the end of the string may be lost, and the copy
28f540f4
RM
371function might get stuck in a loop trashing all the memory allocated to
372your program.
373
374All functions that have problems copying between overlapping arrays are
375explicitly identified in this manual. In addition to functions in this
376section, there are a few others like @code{sprintf} (@pxref{Formatted
377Output Functions}) and @code{scanf} (@pxref{Formatted Input
378Functions}).
379
8a2f1f5b 380@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
d08a7e4c 381@standards{ISO, string.h}
11087373 382@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
383The @code{memcpy} function copies @var{size} bytes from the object
384beginning at @var{from} into the object beginning at @var{to}. The
385behavior of this function is undefined if the two arrays @var{to} and
386@var{from} overlap; use @code{memmove} instead if overlapping is possible.
387
388The value returned by @code{memcpy} is the value of @var{to}.
389
390Here is an example of how you might use @code{memcpy} to copy the
391contents of an array:
392
@smallexample
struct foo *oldarray, *newarray;
int arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
@end smallexample
399@end deftypefun
400
79827876 401@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 402@standards{ISO, wchar.h}
11087373 403@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
404The @code{wmemcpy} function copies @var{size} wide characters from the object
405beginning at @var{wfrom} into the object beginning at @var{wto}. The
406behavior of this function is undefined if the two arrays @var{wto} and
407@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.
408
409The following is a possible implementation of @code{wmemcpy} but there
410are more optimizations possible.
411
412@smallexample
413wchar_t *
414wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
415 size_t size)
416@{
417 return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
418@}
419@end smallexample
420
421The value returned by @code{wmemcpy} is the value of @var{wto}.
422
423This function was introduced in @w{Amendment 1} to @w{ISO C90}.
424@end deftypefun
425
8a2f1f5b 426@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
d08a7e4c 427@standards{GNU, string.h}
11087373 428@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{mempcpy} function is nearly identical to the @code{memcpy}
function. It copies @var{size} bytes from the object beginning at
@var{from} into the object pointed to by @var{to}. But instead of
returning the value of @var{to} it returns a pointer to the byte
following the last written byte in the object beginning at @var{to}.
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.

This function is useful in situations where a number of objects are to be
copied to consecutive memory positions.
438
439@smallexample
440void *
441combine (void *o1, size_t s1, void *o2, size_t s2)
442@{
443 void *result = malloc (s1 + s2);
444 if (result != NULL)
445 mempcpy (mempcpy (result, o1, s1), o2, s2);
446 return result;
447@}
448@end smallexample
449
450This function is a GNU extension.
451@end deftypefun
452
8a2f1f5b 453@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 454@standards{GNU, wchar.h}
11087373 455@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
function. It copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object pointed to by @var{wto}. But
instead of returning the value of @var{wto} it returns a pointer to the
wide character following the last written wide character in the object
beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}.

This function is useful in situations where a number of objects are to be
copied to consecutive memory positions.

The following is a possible implementation of @code{wmempcpy} but there
are more optimizations possible.
468
469@smallexample
470wchar_t *
471wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
472 size_t size)
473@{
474 return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
475@}
476@end smallexample
477
478This function is a GNU extension.
479@end deftypefun
480
28f540f4 481@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
d08a7e4c 482@standards{ISO, string.h}
11087373 483@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
484@code{memmove} copies the @var{size} bytes at @var{from} into the
485@var{size} bytes at @var{to}, even if those two blocks of space
486overlap. In the case of overlap, @code{memmove} is careful to copy the
487original values of the bytes in the block at @var{from}, including those
488bytes which also belong to the block at @var{to}.
8a2f1f5b
UD
489
490The value returned by @code{memmove} is the value of @var{to}.
491@end deftypefun
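
Deleting an element from the middle of an array is a case where source
and destination necessarily overlap. The following sketch shows it;
@code{remove_at} is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Remove the element at index POS from the COUNT-element array A by
   sliding the tail down one slot.  Source and destination overlap, so
   memmove is required here; memcpy would be undefined.  Return the new
   element count (unchanged if POS is out of range).  */
size_t
remove_at (int *a, size_t count, size_t pos)
{
  if (pos >= count)
    return count;

  memmove (&a[pos], &a[pos + 1], (count - pos - 1) * sizeof a[0]);
  return count - 1;
}
```

The size argument counts bytes, which is why the number of elements to
move is multiplied by @code{sizeof a[0]}.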
492
8ded91fb 493@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
d08a7e4c 494@standards{ISO, wchar.h}
11087373 495@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
into the @var{size} wide characters at @var{wto}, even if those two
blocks of space overlap. In the case of overlap, @code{wmemmove} is
careful to copy the original values of the wide characters in the block
at @var{wfrom}, including those wide characters which also belong to the
block at @var{wto}.

The following is a possible implementation of @code{wmemmove} but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemmove} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
28f540f4
RM
518@end deftypefun
519
8a2f1f5b 520@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
d08a7e4c 521@standards{SVID, string.h}
11087373 522@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
523This function copies no more than @var{size} bytes from @var{from} to
524@var{to}, stopping if a byte matching @var{c} is found. The return
525value is a pointer into @var{to} one byte past where @var{c} was copied,
526or a null pointer if no byte matching @var{c} appeared in the first
527@var{size} bytes of @var{from}.
528@end deftypefun
529
28f540f4 530@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 531@standards{ISO, string.h}
11087373 532@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
533This function copies the value of @var{c} (converted to an
534@code{unsigned char}) into each of the first @var{size} bytes of the
535object beginning at @var{block}. It returns the value of @var{block}.
536@end deftypefun
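
A frequent use of @code{memset} is clearing a whole object before it is
reused. The sketch below assumes a hypothetical @code{struct account};
the pattern is the same for any object type.

```c
#include <string.h>

struct account   /* hypothetical record type, for illustration only */
{
  char name[16];
  int balance;
};

/* Set every byte of *ACCT, including any padding, to zero.  */
void
account_clear (struct account *acct)
{
  /* The fill value comes second and the byte count last.  */
  memset (acct, 0, sizeof *acct);
}
```

Using @code{sizeof *acct} rather than a hand-written number keeps the
call correct if members are later added to the structure.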
537
8a2f1f5b 538@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
d08a7e4c 539@standards{ISO, wchar.h}
11087373 540@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
541This function copies the value of @var{wc} into each of the first
542@var{size} wide characters of the object beginning at @var{block}. It
543returns the value of @var{block}.
544@end deftypefun
545
8a2f1f5b 546@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 547@standards{ISO, string.h}
11087373 548@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc
PE
549This copies bytes from the string @var{from} (up to and including
550the terminating null byte) into the string @var{to}. Like
28f540f4
RM
551@code{memcpy}, this function has undefined results if the strings
552overlap. The return value is the value of @var{to}.
553@end deftypefun
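
Because @code{strcpy} does not check the size of the destination, the
caller has to size it first. The following sketch allocates exactly
enough space before copying; @code{with_prefix} is a hypothetical helper
used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding PREFIX followed by NAME, or
   a null pointer if allocation fails.  The caller frees the result.  */
char *
with_prefix (const char *prefix, const char *name)
{
  size_t plen = strlen (prefix);
  /* Both lengths plus one byte for the terminating null byte.  */
  char *result = malloc (plen + strlen (name) + 1);

  if (result == NULL)
    return NULL;

  strcpy (result, prefix);
  /* Start the second copy at the first string's null byte.  */
  strcpy (result + plen, name);
  return result;
}
```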
554
8a2f1f5b 555@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 556@standards{ISO, wchar.h}
11087373 557@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc 558This copies wide characters from the wide string @var{wfrom} (up to and
8a2f1f5b
UD
559including the terminating null wide character) into the string
560@var{wto}. Like @code{wmemcpy}, this function has undefined results if
561the strings overlap. The return value is the value of @var{wto}.
562@end deftypefun
563
28f540f4 564@deftypefun {char *} strdup (const char *@var{s})
a448ee41 565@standards{SVID, string.h}
11087373 566@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 567This function copies the string @var{s} into a newly
28f540f4
RM
568allocated string. The string is allocated using @code{malloc}; see
569@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
570for the new string, @code{strdup} returns a null pointer. Otherwise it
571returns a pointer to the new string.
572@end deftypefun
573
8a2f1f5b 574@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
d08a7e4c 575@standards{GNU, wchar.h}
11087373 576@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 577This function copies the wide string @var{ws}
8a2f1f5b
UD
578into a newly allocated string. The string is allocated using
579@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
580cannot allocate space for the new string, @code{wcsdup} returns a null
2cc4b9cc 581pointer. Otherwise it returns a pointer to the new wide string.
8a2f1f5b
UD
582
583This function is a GNU extension.
584@end deftypefun
585
8a2f1f5b 586@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 587@standards{Unknown origin, string.h}
11087373 588@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
589This function is like @code{strcpy}, except that it returns a pointer to
590the end of the string @var{to} (that is, the address of the terminating
2cc4b9cc 591null byte @code{to + strlen (from)}) rather than the beginning.
28f540f4
RM
592
593For example, this program uses @code{stpcpy} to concatenate @samp{foo}
594and @samp{bar} to produce @samp{foobar}, which it then prints.
595
596@smallexample
597@include stpcpy.c.texi
598@end smallexample
599
c30c3f46
RM
600This function is part of POSIX.1-2008 and later editions, but was
601available in @theglibc{} and other systems as an extension long before
602it was standardized.
28f540f4 603
8a2f1f5b
UD
604Its behavior is undefined if the strings overlap. The function is
605declared in @file{string.h}.
606@end deftypefun
607
8a2f1f5b 608@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 609@standards{GNU, wchar.h}
11087373 610@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
611This function is like @code{wcscpy}, except that it returns a pointer to
612the end of the string @var{wto} (that is, the address of the terminating
2cc4b9cc 613null wide character @code{wto + wcslen (wfrom)}) rather than the beginning.
8a2f1f5b
UD
614
615This function is not part of ISO or POSIX but was found useful while
1f77f049 616developing @theglibc{} itself.
8a2f1f5b
UD
617
618The behavior of @code{wcpcpy} is undefined if the strings overlap.
619
620@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}.
28f540f4
RM
621@end deftypefun
622
26b4d766 623@deftypefn {Macro} {char *} strdupa (const char *@var{s})
d08a7e4c 624@standards{GNU, string.h}
11087373 625@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
976780fd 626This macro is similar to @code{strdup} but allocates the new string
dd7d45e8
UD
627using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
628Automatic}). This means of course the returned string has the same
629limitations as any block of memory allocated using @code{alloca}.
706074a5 630
Because the memory must be allocated in the stack frame of the caller,
@code{strdupa} is implemented only as a macro;
you cannot take its address. Despite this limitation
it is useful. The following code shows a situation where
using @code{malloc} would be a lot more expensive.
635
636@smallexample
637@include strdupa.c.texi
638@end smallexample
639
Please note that calling @code{strtok} using @var{path} directly is
invalid. It is also not allowed to call @code{strdupa} in the argument
list of @code{strtok}, since @code{strdupa} uses @code{alloca}
(@pxref{Variable Size Automatic}), which can interfere with the parameter
passing.

This macro is only available if GNU CC is used.
26b4d766 647@end deftypefn
706074a5 648
0a13c9e9 649@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
d08a7e4c 650@standards{BSD, string.h}
11087373 651@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
652This is a partially obsolete alternative for @code{memmove}, derived from
653BSD. Note that it is not quite equivalent to @code{memmove}, because the
654arguments are not in the same order and there is no return value.
655@end deftypefun
706074a5 656
0a13c9e9 657@deftypefun void bzero (void *@var{block}, size_t @var{size})
d08a7e4c 658@standards{BSD, string.h}
659@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
660This is a partially obsolete alternative for @code{memset}, derived from
661BSD. Note that it is not as general as @code{memset}, because the only
662value it can store is zero.
663@end deftypefun
706074a5 664
665@node Concatenating Strings
666@section Concatenating Strings
667@pindex string.h
668@pindex wchar.h
669@cindex concatenating strings
670@cindex string concatenation functions
671
672The functions described in this section concatenate the contents of a
673string or wide string to another. They follow the string-copying
674functions in their conventions. @xref{Copying Strings and Arrays}.
675@samp{strcat} is declared in the header file @file{string.h} while
676@samp{wcscat} is declared in @file{wchar.h}.
706074a5 677
8a2f1f5b 678@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 679@standards{ISO, string.h}
11087373 680@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 681The @code{strcat} function is similar to @code{strcpy}, except that the
682bytes from @var{from} are concatenated or appended to the end of
683@var{to}, instead of overwriting it. That is, the first byte from
684@var{from} overwrites the null byte marking the end of @var{to}.
685
686An equivalent definition for @code{strcat} would be:
687
@smallexample
char *
strcat (char *restrict to, const char *restrict from)
@{
  strcpy (to + strlen (to), from);
  return to;
@}
@end smallexample
696
697This function has undefined results if the strings overlap.
698
699As noted below, this function has significant performance issues.
700@end deftypefun
701
8a2f1f5b 702@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 703@standards{ISO, wchar.h}
11087373 704@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 705The @code{wcscat} function is similar to @code{wcscpy}, except that the
706wide characters from @var{wfrom} are concatenated or appended to the end of
707@var{wto}, instead of overwriting it. That is, the first wide character from
708@var{wfrom} overwrites the null wide character marking the end of @var{wto}.
709
710An equivalent definition for @code{wcscat} would be:
711
@smallexample
wchar_t *
wcscat (wchar_t *wto, const wchar_t *wfrom)
@{
  wcscpy (wto + wcslen (wto), wfrom);
  return wto;
@}
@end smallexample
720
721This function has undefined results if the strings overlap.
722
723As noted below, this function has significant performance issues.
724@end deftypefun
725
Programmers using the @code{strcat} or @code{wcscat} functions (or the
@code{strncat} or @code{wcsncat} functions defined in
a later section, for that matter)
can easily be recognized as lazy and reckless. In almost all situations
the lengths of the participating strings are known (they had better be,
since how else can one ensure that the allocated buffer is
large enough?). Or at least, one could know them by keeping track of the
results of the various function calls. In that case it is very inefficient
to use @code{strcat}/@code{wcscat}: a lot of time is wasted finding the
end of the destination string so that the actual copying can start.
This is a common example:
ee2752ea 737
@cindex va_copy
@smallexample
/* @r{This function concatenates arbitrarily many strings. The last}
   @r{parameter must be @code{NULL}.} */
char *
concat (const char *str, @dots{})
@{
  va_list ap, ap2;
  size_t total = 1;
  const char *s;
  char *result;

  va_start (ap, str);
  va_copy (ap2, ap);

  /* @r{Determine how much space we need.} */
  for (s = str; s != NULL; s = va_arg (ap, const char *))
    total += strlen (s);

  va_end (ap);

  result = (char *) malloc (total);
  if (result != NULL)
    @{
      result[0] = '\0';

      /* @r{Copy the strings.} */
      for (s = str; s != NULL; s = va_arg (ap2, const char *))
        strcat (result, s);
    @}

  va_end (ap2);

  return result;
@}
@end smallexample
774
This looks quite simple, especially the second loop where the strings
are actually copied. But these innocent lines hide a major performance
penalty. Just imagine that ten strings of 100 bytes each have to be
concatenated. For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string.
For the third string we search 200 bytes, and so on. For all strings in
total, the comparisons necessary to find the end of the intermediate
results sum up to about 4500! If we combine the copying
with the length computation and the allocation, we can write this function more
efficiently:
784
@smallexample
char *
concat (const char *str, @dots{})
@{
  va_list ap;
  size_t allocated = 100;
  char *result = (char *) malloc (allocated);

  if (result != NULL)
    @{
      char *newp;
      char *wp;
      const char *s;

      va_start (ap, str);

      wp = result;
      for (s = str; s != NULL; s = va_arg (ap, const char *))
        @{
          size_t len = strlen (s);

          /* @r{Resize the allocated memory if necessary.} */
          if (wp + len + 1 > result + allocated)
            @{
              /* @r{Remember the offset before the old block goes away.} */
              size_t used = wp - result;

              allocated = (allocated + len) * 2;
              newp = (char *) realloc (result, allocated);
              if (newp == NULL)
                @{
                  va_end (ap);
                  free (result);
                  return NULL;
                @}
              wp = newp + used;
              result = newp;
            @}

          wp = mempcpy (wp, s, len);
        @}

      /* @r{Terminate the result string.} */
      *wp++ = '\0';

      /* @r{Resize memory to the optimal size.} */
      newp = realloc (result, wp - result);
      if (newp != NULL)
        result = newp;

      va_end (ap);
    @}

  return result;
@}
@end smallexample
837
With a bit more knowledge about the input strings one could fine-tune
the memory allocation. The important difference is that
we don't use @code{strcat} anymore. We always keep track of the length
of the current intermediate result, so we can save ourselves the search for the
end of the string and use @code{mempcpy}. Please note that we also
don't use @code{stpcpy}, which might seem more natural since we are handling
strings. But this is not necessary since we already know the
length of the string and therefore can use the faster memory copying
function. The example would work for wide characters the same way.

Whenever programmers feel the need to use @code{strcat}, they
should think twice and look through the program to see whether the code can
be rewritten to take advantage of already calculated results. Again: it
is almost always unnecessary to use @code{strcat}.
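
The same idea can be sketched without variadic arguments. The
function @code{join_strings} below is a hypothetical helper, not part
of the library. It scans each source string to size the buffer and
again to copy it, but it never rescans the growing result, because it
keeps a write pointer to the current end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a library function: concatenate the COUNT
   strings in PARTS into one newly allocated string.  Returns NULL if
   the allocation fails; the caller frees the result.  */
char *
join_strings (const char *const *parts, size_t count)
{
  size_t total = 1;             /* Room for the terminating null byte.  */
  size_t i;
  char *result;
  char *wp;

  assert (parts != NULL || count == 0);

  /* First pass: add up the lengths to size the buffer.  */
  for (i = 0; i < count; i++)
    total += strlen (parts[i]);

  result = malloc (total);
  if (result == NULL)
    return NULL;

  /* Second pass: copy each string at the tracked write position
     instead of searching for the end of the result.  */
  wp = result;
  for (i = 0; i < count; i++)
    {
      size_t len = strlen (parts[i]);
      memcpy (wp, parts[i], len);
      wp += len;
    }
  *wp = '\0';

  return result;
}
```

Storing the lengths from the first pass would avoid the second
@code{strlen} call per string, at the cost of extra bookkeeping.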
852
853@node Truncating Strings
854@section Truncating Strings while Copying
855@cindex truncating strings
856@cindex string truncation
857
858The functions described in this section copy or concatenate the
859possibly-truncated contents of a string or array to another, and
860similarly for wide strings. They follow the string-copying functions
861in their header conventions. @xref{Copying Strings and Arrays}. The
862@samp{str} functions are declared in the header file @file{string.h}
863and the @samp{wc} functions are declared in the file @file{wchar.h}.
864
0a13c9e9 865@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
a448ee41 866@standards{C90, string.h}
867@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
868This function is similar to @code{strcpy} but always copies exactly
869@var{size} bytes into @var{to}.
870
871If @var{from} does not contain a null byte in its first @var{size}
872bytes, @code{strncpy} copies just the first @var{size} bytes. In this
873case no null terminator is written into @var{to}.
874
875Otherwise @var{from} must be a string with length less than
876@var{size}. In this case @code{strncpy} copies all of @var{from},
877followed by enough null bytes to add up to @var{size} bytes in all.
878
879The behavior of @code{strncpy} is undefined if the strings overlap.
880
881This function was designed for now-rarely-used arrays consisting of
882non-null bytes followed by zero or more null bytes. It needs to set
883all @var{size} bytes of the destination, even when @var{size} is much
884greater than the length of @var{from}. As noted below, this function
885is generally a poor choice for processing text.
886@end deftypefun
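
The two cases described for @code{strncpy} can be illustrated with a
small sketch. The helper @code{fill_field} is hypothetical; it fills a
fixed-size field and reports whether the field was left without a
null terminator. It assumes @code{strnlen} is available (POSIX.1-2008).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: fill the SIZE-byte field TO from the string FROM
   using strncpy.  Returns true if TO ended up without a null
   terminator, that is, if FROM has no null byte among its first SIZE
   bytes.  */
bool
fill_field (char *to, size_t size, const char *from)
{
  assert (to != NULL && from != NULL);

  /* Writes exactly SIZE bytes: the bytes of FROM, then null padding
     if FROM is shorter than SIZE.  */
  strncpy (to, from, size);

  /* strnlen stops at SIZE, so the result equals SIZE exactly when
     strncpy wrote no null byte.  */
  return strnlen (from, size) == size;
}
```

A caller that needs a string rather than a padded field has to check
the return value, since the field may not be null-terminated.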
887
0a13c9e9 888@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 889@standards{ISO, wchar.h}
890@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
891This function is similar to @code{wcscpy} but always copies exactly
892@var{size} wide characters into @var{wto}.
893
894If @var{wfrom} does not contain a null wide character in its first
895@var{size} wide characters, then @code{wcsncpy} copies just the first
896@var{size} wide characters. In this case no null terminator is
897written into @var{wto}.
898
899Otherwise @var{wfrom} must be a wide string with length less than
900@var{size}. In this case @code{wcsncpy} copies all of @var{wfrom},
901followed by enough null wide characters to add up to @var{size} wide
902characters in all.
903
904The behavior of @code{wcsncpy} is undefined if the strings overlap.
905
906This function is the wide-character counterpart of @code{strncpy} and
907suffers from most of the problems that @code{strncpy} does. For
908example, as noted below, this function is generally a poor choice for
909processing text.
910@end deftypefun
911
0a13c9e9 912@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
d08a7e4c 913@standards{GNU, string.h}
914@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
915This function is similar to @code{strdup} but always copies at most
916@var{size} bytes into the newly allocated string.
917
918If the length of @var{s} is more than @var{size}, then @code{strndup}
919copies just the first @var{size} bytes and adds a closing null byte.
920Otherwise all bytes are copied and the string is terminated.
921
922This function differs from @code{strncpy} in that it always terminates
923the destination string.
924
925As noted below, this function is generally a poor choice for
926processing text.
927
928@code{strndup} is a GNU extension.
929@end deftypefun
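
A case where truncation is part of the program's logic, rather than a
way to squeeze text into a buffer, is extracting a prefix. The
following sketch uses a hypothetical helper @code{copy_key} that
duplicates the part of a line before the first @samp{=}.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   LINE before the first '=' (or of the whole line if there is none).
   Returns NULL if memory cannot be allocated; the caller frees the
   result.  */
char *
copy_key (const char *line)
{
  assert (line != NULL);

  /* strcspn returns the number of bytes before the first '=', or the
     length of LINE if it contains no '='.  */
  size_t key_len = strcspn (line, "=");

  /* strndup copies at most KEY_LEN bytes and always adds a null byte.  */
  return strndup (line, key_len);
}
```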
930
0a13c9e9 931@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
d08a7e4c 932@standards{GNU, string.h}
933@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strndup} but, like @code{strdupa}, it
allocates the new string using @code{alloca} (@pxref{Variable Size
Automatic}). The same advantages and limitations of @code{strdupa} are
valid for @code{strndupa}, too.

@code{strndupa} is implemented only as a macro, just like @code{strdupa}.
Like @code{strdupa}, it must not be used inside the
argument list of a function call.
942
943As noted below, this function is generally a poor choice for
944processing text.
945
946@code{strndupa} is only available if GNU CC is used.
947@end deftypefn
948
0a13c9e9 949@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 950@standards{GNU, string.h}
951@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{stpcpy} but always copies exactly
@var{size} bytes into @var{to}.

If the length of @var{from} is at least @var{size}, then @code{stpncpy}
copies just the first @var{size} bytes and returns a pointer to the
byte directly following the one which was copied last. Note that in
this case there is no null terminator written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null bytes to add up
to @var{size} bytes in all. This padding is rarely useful, but it
makes @code{stpncpy} usable in contexts where this behavior of
@code{strncpy} is relied upon. @code{stpncpy} returns a pointer to the
@emph{first} written null byte.
966
967This function is not part of ISO or POSIX but was found useful while
968developing @theglibc{} itself.
969
970Its behavior is undefined if the strings overlap. The function is
971declared in @file{string.h}.
972
973As noted below, this function is generally a poor choice for
974processing text.
975@end deftypefun
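
Because the return value points either at the first null byte written
or just past the buffer, it tells the caller both how many bytes were
copied and whether the result is terminated. A minimal sketch, using a
hypothetical helper @code{copy_and_measure}:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte buffer TO and store
   the number of non-null bytes written in *WRITTEN.  Returns true if
   TO is null-terminated afterwards.  */
bool
copy_and_measure (char *to, size_t size, const char *from, size_t *written)
{
  assert (to != NULL && from != NULL && written != NULL);

  char *end = stpncpy (to, from, size);
  *written = (size_t) (end - to);

  /* stpncpy returns TO + SIZE exactly when no null byte was written.  */
  return end != to + size;
}
```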
976
0a13c9e9 977@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 978@standards{GNU, wchar.h}
979@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If the length of @var{wfrom} is at least @var{size}, then
@code{wcpncpy} copies just the first @var{size} wide characters and
returns a pointer to the wide character directly following the
one which was copied last. Note that in this case
there is no null terminator written into @var{wto}.

If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
copies all of @var{wfrom}, followed by enough null wide characters to add up
to @var{size} wide characters in all. This padding is rarely useful, but it
makes @code{wcpncpy} usable in contexts where this behavior of
@code{wcsncpy} is relied upon. @code{wcpncpy} returns a pointer to the
@emph{first} written null wide character.
995
996This function is not part of ISO or POSIX but was found useful while
997developing @theglibc{} itself.
998
999Its behavior is undefined if the strings overlap.
1000
1001As noted below, this function is generally a poor choice for
1002processing text.
1003
1004@code{wcpncpy} is a GNU extension.
1005@end deftypefun
1006
8a2f1f5b 1007@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 1008@standards{ISO, string.h}
11087373 1009@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1010This function is like @code{strcat} except that not more than @var{size}
1011bytes from @var{from} are appended to the end of @var{to}, and
1012@var{from} need not be null-terminated. A single null byte is also
1013always appended to @var{to}, so the total
1014allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
1015longer than its initial length.
1016
1017The @code{strncat} function could be implemented like this:
1018
@smallexample
@group
char *
strncat (char *to, const char *from, size_t size)
@{
  size_t len = strlen (to);
  memcpy (to + len, from, strnlen (from, size));
  to[len + strnlen (from, size)] = '\0';
  return to;
@}
@end group
@end smallexample
1031
1032The behavior of @code{strncat} is undefined if the strings overlap.
1033
1034As a companion to @code{strncpy}, @code{strncat} was designed for
1035now-rarely-used arrays consisting of non-null bytes followed by zero
1036or more null bytes. As noted below, this function is generally a poor
1037choice for processing text. Also, this function has significant
1038performance issues. @xref{Concatenating Strings}.
1039@end deftypefun
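
A frequent mistake is to pass the total buffer size as @var{size}. The
argument limits the number of bytes appended, so the caller has to
subtract the current length and one byte for the null terminator. The
hypothetical helper @code{append_bounded} below sketches that
calculation; it still has the performance and truncation drawbacks
mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append as much of FROM as fits to the string in
   the BUFSIZE-byte buffer BUF, which must already hold a
   null-terminated string.  Returns the number of bytes of FROM that
   did not fit.  */
size_t
append_bounded (char *buf, size_t bufsize, const char *from)
{
  size_t used = strlen (buf);
  size_t from_len = strlen (from);

  assert (used < bufsize);

  /* Leave one byte for the null byte that strncat always appends.  */
  size_t room = bufsize - used - 1;
  strncat (buf, from, room);

  return from_len > room ? from_len - room : 0;
}
```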
1040
8a2f1f5b 1041@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 1042@standards{ISO, wchar.h}
11087373 1043@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto},
and @var{wfrom} need not be null-terminated. A single null wide
character is also always appended to @var{wto}, so the total allocated
size of @var{wto} must be at least @code{wcsnlen (@var{wfrom},
@var{size}) + 1} wide characters longer than its initial length.
1050
1051The @code{wcsncat} function could be implemented like this:
1052
@smallexample
@group
wchar_t *
wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  size_t len = wcslen (wto);
  memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t));
  wto[len + wcsnlen (wfrom, size)] = L'\0';
  return wto;
@}
@end group
@end smallexample
1066
1067The behavior of @code{wcsncat} is undefined if the strings overlap.
28f540f4 1068
1069As noted below, this function is generally a poor choice for
1070processing text. Also, this function has significant performance
1071issues. @xref{Concatenating Strings}.
1072@end deftypefun
1073
Because these functions can abruptly truncate strings or wide strings,
they are generally poor choices for processing text. When copying or
concatenating multibyte strings, they can truncate within a multibyte
character so that the result is not a valid multibyte string. When
copying or concatenating multibyte or wide strings, they may
truncate the output after a combining character, resulting in a
corrupted grapheme. They can cause bugs even when processing
single-byte strings: for example, when calculating an ASCII-only user
name, a truncated name can identify the wrong user.
1083
1084Although some buffer overruns can be prevented by manually replacing
1085calls to copying functions with calls to truncation functions, there
1086are often easier and safer automatic techniques that cause buffer
1087overruns to reliably terminate a program, such as GCC's
1088@option{-fcheck-pointer-bounds} and @option{-fsanitize=address}
1089options. @xref{Debugging Options,, Options for Debugging Your Program
1f6676d7 1090or GCC, gcc, Using GCC}. Because truncation functions can mask
1091application bugs that would otherwise be caught by the automatic
1092techniques, these functions should be used only when the application's
1093underlying logic requires truncation.
1094
1095@strong{Note:} GNU programs should not truncate strings or wide
1096strings to fit arbitrary size limits. @xref{Semantics, , Writing
1097Robust Programs, standards, The GNU Coding Standards}. Instead of
1098string-truncation functions, it is usually better to use dynamic
1099memory allocation (@pxref{Unconstrained Allocation}) and functions
1100such as @code{strdup} or @code{asprintf} to construct strings.
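
As a minimal sketch of that advice, the hypothetical helper
@code{make_path} below builds a file name with @code{asprintf}, so
neither component is cut off at an arbitrary limit. It assumes that
@code{_GNU_SOURCE} is defined so that @theglibc{} declares
@code{asprintf}.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "DIR/NAME" in newly allocated memory, so
   neither component is ever truncated.  Returns NULL on failure; the
   caller frees the result.  */
char *
make_path (const char *dir, const char *name)
{
  char *path;

  assert (dir != NULL && name != NULL);

  /* asprintf returns a negative value on failure; PATH must not be
     used in that case.  */
  if (asprintf (&path, "%s/%s", dir, name) < 0)
    return NULL;

  return path;
}
```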
28f540f4 1101
b4012b75 1102@node String/Array Comparison
1103@section String/Array Comparison
1104@cindex comparing strings and arrays
1105@cindex string comparison functions
1106@cindex array comparison functions
1107@cindex predicates on strings
1108@cindex predicates on arrays
1109
1110You can use the functions in this section to perform comparisons on the
1111contents of strings and arrays. As well as checking for equality, these
1112functions can also be used as the ordering functions for sorting
1113operations. @xref{Searching and Sorting}, for an example of this.
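
For instance, sorting an array of strings might be sketched as
follows. The names @code{compare_strings} and @code{sort_strings} are
illustrative only. Note that @code{qsort} passes pointers to the array
elements, so the comparison function receives pointers to
@code{const char *} rather than the strings themselves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: each argument points to an element
   of the array, that is, to a pointer to the first byte of a string.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Illustrative helper: sort COUNT strings into strcmp order.  */
void
sort_strings (const char **strings, size_t count)
{
  assert (strings != NULL || count == 0);

  /* Arrays of fewer than two elements are already sorted.  */
  if (count > 1)
    qsort (strings, count, sizeof *strings, compare_strings);
}
```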
1114
1115Unlike most comparison operations in C, the string comparison functions
1116return a nonzero value if the strings are @emph{not} equivalent rather
1117than if they are. The sign of the value indicates the relative ordering
2cc4b9cc 1118of the first part of the strings that are not equivalent: a
28f540f4 1119negative value indicates that the first string is ``less'' than the
a5113b14 1120second, while a positive value indicates that the first string is
1121``greater''.
1122
1123The most common use of these functions is to check only for equality.
1124This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
1125
1126All of these functions are declared in the header file @file{string.h}.
1127@pindex string.h
1128
28f540f4 1129@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
d08a7e4c 1130@standards{ISO, string.h}
11087373 1131@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1132The function @code{memcmp} compares the @var{size} bytes of memory
1133beginning at @var{a1} against the @var{size} bytes of memory beginning
1134at @var{a2}. The value returned has the same sign as the difference
1135between the first differing pair of bytes (interpreted as @code{unsigned
1136char} objects, then promoted to @code{int}).
1137
1138If the contents of the two blocks are equal, @code{memcmp} returns
1139@code{0}.
1140@end deftypefun
1141
8a2f1f5b 1142@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
d08a7e4c 1143@standards{ISO, wchar.h}
11087373 1144@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{wmemcmp} compares the @var{size} wide characters
beginning at @var{a1} against the @var{size} wide characters beginning
at @var{a2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{a1} is
smaller or larger than the corresponding wide character in @var{a2}.
1150
1151If the contents of the two blocks are equal, @code{wmemcmp} returns
1152@code{0}.
1153@end deftypefun
1154
1155On arbitrary arrays, the @code{memcmp} function is mostly useful for
1156testing equality. It usually isn't meaningful to do byte-wise ordering
1157comparisons on arrays of things other than bytes. For example, a
1158byte-wise comparison on the bytes that make up floating-point numbers
1159isn't likely to tell you anything about the relationship between the
1160values of the floating-point numbers.
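
A short sketch makes this concrete. It assumes that @code{float} uses
the IEEE 754 single-precision format, where the sign bit is the top
bit of the most significant byte. The helper name
@code{float_bytes_order} is hypothetical.

```c
#include <string.h>

/* Return the sign of memcmp applied to the object representations of
   A and B: -1, 0, or 1.  */
int
float_bytes_order (float a, float b)
{
  int r = memcmp (&a, &b, sizeof a);
  return (r > 0) - (r < 0);
}
```

Under that assumption, @code{float_bytes_order (-1.0f, 1.0f)} is
positive on both little-endian and big-endian machines, because the
only differing byte has its top bit set in the negative value. The
byte-wise order is therefore the opposite of the numeric order here.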
1161
1162@code{wmemcmp} is really only useful to compare arrays of type
1163@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
1164at a time and this number of bytes is system dependent.
1165
1166You should also be careful about using @code{memcmp} to compare objects
1167that can contain ``holes'', such as the padding inserted into structure
1168objects to enforce alignment requirements, extra space at the end of
2cc4b9cc 1169unions, and extra bytes at the ends of strings whose length is less
1170than their allocated size. The contents of these ``holes'' are
1171indeterminate and may cause strange behavior when performing byte-wise
1172comparisons. For more predictable results, perform an explicit
1173component-wise comparison.
1174
1175For example, given a structure type definition like:
1176
@smallexample
struct foo
  @{
    unsigned char tag;
    union
      @{
        double f;
        long i;
        char *p;
      @} value;
  @};
@end smallexample
1189
1190@noindent
1191you are better off writing a specialized comparison function to compare
1192@code{struct foo} objects instead of comparing them with @code{memcmp}.
1193
28f540f4 1194@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1195@standards{ISO, string.h}
11087373 1196@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1197The @code{strcmp} function compares the string @var{s1} against
1198@var{s2}, returning a value that has the same sign as the difference
2cc4b9cc 1199between the first differing pair of bytes (interpreted as
1200@code{unsigned char} objects, then promoted to @code{int}).
1201
1202If the two strings are equal, @code{strcmp} returns @code{0}.
1203
1204A consequence of the ordering used by @code{strcmp} is that if @var{s1}
1205is an initial substring of @var{s2}, then @var{s1} is considered to be
1206``less than'' @var{s2}.
1207
@code{strcmp} does not take into account the sorting conventions of the
language the strings are written in. For that, use
@code{strcoll}.
1211@end deftypefun
1212
8a2f1f5b 1213@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1214@standards{ISO, wchar.h}
11087373 1215@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1216
The @code{wcscmp} function compares the wide string @var{ws1}
against @var{ws2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{ws1} is
smaller or larger than the corresponding wide character in @var{ws2}.
1221
1222If the two strings are equal, @code{wcscmp} returns @code{0}.
1223
1224A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
1225is an initial substring of @var{ws2}, then @var{ws1} is considered to be
1226``less than'' @var{ws2}.
1227
@code{wcscmp} does not take into account the sorting conventions of the
language the strings are written in. For that, use
@code{wcscoll}.
1231@end deftypefun
1232
28f540f4 1233@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1234@standards{BSD, string.h}
1235@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1236@c Although this calls tolower multiple times, it's a macro, and
1237@c strcasecmp is optimized so that the locale pointer is read only once.
1238@c There are some asm implementations too, for which the single-read
1239@c from locale TLS pointers also applies.
4547c1a4 1240This function is like @code{strcmp}, except that differences in case are
1241ignored, and its arguments must be multibyte strings.
1242How uppercase and lowercase characters are related is
1243determined by the currently selected locale. In the standard @code{"C"}
1244locale the characters @"A and @"a do not match but in a locale which
dd7d45e8 1245regards these characters as parts of the alphabet they do match.
28f540f4 1246
85c165be 1247@noindent
1248@code{strcasecmp} is derived from BSD.
1249@end deftypefun
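
A typical use is matching a keyword regardless of case. In the sketch
below the helper @code{keyword_matches} is hypothetical. POSIX declares
@code{strcasecmp} in @file{strings.h}, so the sketch includes that
header as well as @file{string.h}.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: return true if NAME matches KEYWORD when
   differences in case are ignored according to the current locale.  */
bool
keyword_matches (const char *name, const char *keyword)
{
  assert (name != NULL && keyword != NULL);
  return strcasecmp (name, keyword) == 0;
}
```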
1250
8ded91fb 1251@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1252@standards{GNU, wchar.h}
1253@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1254@c Since towlower is not a macro, the locale object may be read multiple
1255@c times.
1256This function is like @code{wcscmp}, except that differences in case are
1257ignored. How uppercase and lowercase characters are related is
1258determined by the currently selected locale. In the standard @code{"C"}
1259locale the characters @"A and @"a do not match but in a locale which
1260regards these characters as parts of the alphabet they do match.
1261
1262@noindent
1263@code{wcscasecmp} is a GNU extension.
1264@end deftypefun
1265
8a2f1f5b 1266@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
d08a7e4c 1267@standards{ISO, string.h}
11087373 1268@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strcmp}, except that no more than
@var{size} bytes are compared. In other words, if the two
strings are the same in their first @var{size} bytes, the
return value is zero.
1273@end deftypefun
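
One common application is testing whether a string starts with a given
prefix. A minimal sketch, with the hypothetical helper
@code{has_prefix}:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if the string S begins with the
   string PREFIX.  An empty PREFIX matches every string.  */
bool
has_prefix (const char *s, const char *prefix)
{
  assert (s != NULL && prefix != NULL);

  /* Compare only as many bytes as PREFIX contains.  If S is shorter,
     the comparison stops at its null byte and reports a difference.  */
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```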
1274
8a2f1f5b 1275@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
d08a7e4c 1276@standards{ISO, wchar.h}
11087373 1277@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
f0f308c1 1278This function is similar to @code{wcscmp}, except that no more than
1279@var{size} wide characters are compared. In other words, if the two
1280strings are the same in their first @var{size} wide characters, the
1281return value is zero.
1282@end deftypefun
1283
28f540f4 1284@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
d08a7e4c 1285@standards{BSD, string.h}
11087373 1286@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
This function is like @code{strncmp}, except that differences in case
are ignored, and the compared parts of the arguments should consist of
valid multibyte characters.
As with @code{strcasecmp}, how uppercase and lowercase characters are
related depends on the locale.

@noindent
@code{strncasecmp} is derived from BSD.
1295@end deftypefun
1296
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
d08a7e4c 1298@standards{GNU, wchar.h}
11087373 1299@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
This function is like @code{wcsncmp}, except that differences in case
are ignored. As with @code{wcscasecmp}, how uppercase and lowercase
characters are related depends on the locale.
1303
1304@noindent
1305@code{wcsncasecmp} is a GNU extension.
1306@end deftypefun
1307
1308Here are some examples showing the use of @code{strcmp} and
1309@code{strncmp} (equivalent examples can be constructed for the wide
1310character functions). These examples assume the use of the ASCII
1311character set. (If some other character set---say, EBCDIC---is used
1312instead, then the glyphs are associated with different numeric codes,
1313and the return values and ordering may differ.)
1314
@smallexample
strcmp ("hello", "hello")
    @result{} 0    /* @r{These two strings are the same.} */
strcmp ("hello", "Hello")
    @result{} 32   /* @r{Comparisons are case-sensitive.} */
strcmp ("hello", "world")
    @result{} -15  /* @r{The byte @code{'h'} comes before @code{'w'}.} */
strcmp ("hello", "hello, world")
    @result{} -44  /* @r{Comparing a null byte against a comma.} */
strncmp ("hello", "hello, world", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
strncmp ("hello, world", "hello, stupid world!!!", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
@end smallexample
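
Only the sign of these results is specified. An implementation may
return any negative or positive value, so portable code should not
depend on magnitudes such as 32 or -15. When a fixed value is needed,
the result can be normalized, as in this sketch with the hypothetical
helper @code{strcmp_sign}:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return -1, 0, or 1 according to the sign of
   strcmp (S1, S2).  */
int
strcmp_sign (const char *s1, const char *s2)
{
  assert (s1 != NULL && s2 != NULL);

  int r = strcmp (s1, s2);
  return (r > 0) - (r < 0);
}
```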
1329
1f205a47 1330@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1331@standards{GNU, string.h}
1332@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1333@c Calls isdigit multiple times, locale may change in between.
1f205a47 1334The @code{strverscmp} function compares the string @var{s1} against
1335@var{s2}, considering them as holding indices/version numbers. The
1336return value follows the same conventions as found in the
1337@code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no
1338digits, @code{strverscmp} behaves like @code{strcmp}
1339(in the sense that the sign of the result is the same).
1f205a47 1340
1341The comparison algorithm which the @code{strverscmp} function implements
1342differs slightly from other version-comparison algorithms. The
1343implementation is based on a finite-state machine, whose behavior is
1344approximated below.
1f205a47
UD
1345
1346@itemize @bullet
1347@item
f4a36548
FW
1348The input strings are each split into sequences of non-digits and
1349digits. These sequences can be empty at the beginning and end of the
1350string. Digits are determined by the @code{isdigit} function and are
1351thus subject to the current locale.
1f205a47
UD
1352
1353@item
f4a36548
FW
1354Comparison starts with a (possibly empty) non-digit sequence. The first
1355non-equal sequences of non-digits or digits determine the outcome of
1356the comparison.
1f205a47
UD
1357
1358@item
f4a36548
FW
1359Corresponding non-digit sequences in both strings are compared
1360lexicographically if their lengths are equal. If the lengths differ,
1361the shorter non-digit sequence is extended with the input string
1362character immediately following it (which may be the null terminator),
1363the other sequence is truncated to be of the same (extended) length, and
1364these two sequences are compared lexicographically. In the last case,
1365the sequence comparison determines the result of the function because
1366the extension character (or some character before it) is necessarily
1367different from the character at the same offset in the other input
1368string.
1369
1370@item
1371For two sequences of digits, the number of leading zeros is counted (which
1372can be zero). If the count differs, the string with more leading zeros
1373in the digit sequence is considered smaller than the other string.
1374
1375@item
1376If the two sequences of digits have no leading zeros, they are compared
1377as integers, that is, the string with the longer digit sequence is
1378deemed larger, and if both sequences are of equal length, they are
1379compared lexicographically.
1380
1381@item
1382If both digit sequences start with a zero and have an equal number of
1383leading zeros, they are compared lexicographically if their lengths are
1384the same. If the lengths differ, the shorter sequence is extended with
1385the following character in its input string, and the other sequence is
1386truncated to the same length, and both sequences are compared
1387lexicographically (similar to the non-digit sequence case above).
1f205a47
UD
1388@end itemize
1389
f4a36548
FW
1390The treatment of leading zeros and the tie-breaking extension characters
1391(which in effect propagate across non-digit/digit sequence boundaries)
1392differs from other version-comparison algorithms.
1393
1f205a47
UD
1394@smallexample
1395strverscmp ("no digit", "no digit")
0bc93a2f 1396 @result{} 0 /* @r{same behavior as strcmp.} */
1f205a47
UD
1397strverscmp ("item#99", "item#100")
1398 @result{} <0 /* @r{same prefix, but 99 < 100.} */
1399strverscmp ("alpha1", "alpha001")
f4a36548 1400 @result{} >0 /* @r{different number of leading zeros (0 and 2).} */
1f205a47 1401strverscmp ("part1_f012", "part1_f01")
f4a36548 1402 @result{} >0 /* @r{lexicographical comparison with leading zeros.} */
1f205a47 1403strverscmp ("foo.009", "foo.0")
f4a36548 1404 @result{} <0 /* @r{different number of leading zeros (2 and 1).} */
1f205a47
UD
1405@end smallexample
1406
1f205a47
UD
1407@code{strverscmp} is a GNU extension.
1408@end deftypefun
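
As a minimal sketch of how @code{strverscmp} might be used in practice, the following wraps it in a comparison function for @code{qsort}. The names @code{compare_versions} and @code{sort_by_version} are hypothetical and are not part of the library; the sketch assumes a system where the GNU extension is available.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension */
#include <stdlib.h>
#include <string.h>

/* qsort comparison function: orders strings as version numbers.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical helper: sort the COUNT strings in NAMES so that
   embedded numbers are ordered by value rather than byte by byte.  */
void
sort_by_version (const char **names, size_t count)
{
  qsort (names, count, sizeof (const char *), compare_versions);
}
```

With plain @code{strcmp}, @code{"item#100"} would sort before @code{"item#99"}; with this comparison function it sorts after it.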
1409
28f540f4 1410@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
d08a7e4c 1411@standards{BSD, string.h}
11087373 1412@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1413This is an obsolete alias for @code{memcmp}, derived from BSD.
1414@end deftypefun
1415
b4012b75 1416@node Collation Functions
28f540f4
RM
1417@section Collation Functions
1418
1419@cindex collating strings
1420@cindex string collation functions
1421
1422In some locales, the conventions for lexicographic ordering differ from
1423the strict numeric ordering of character codes. For example, in Spanish
1424most glyphs with diacritical marks such as accents are not considered
1425distinct letters for the purposes of collation. On the other hand, the
1426two-character sequence @samp{ll} is treated as a single letter that is
1427collated immediately after @samp{l}.
1428
1429You can use the functions @code{strcoll} and @code{strxfrm} (declared in
8a2f1f5b
UD
1430the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
1431(declared in the header file @file{wchar.h}) to compare strings using a
1432collation ordering appropriate for the current locale. The locale used
1433by these functions in particular can be specified by setting the locale
1434for the @code{LC_COLLATE} category; see @ref{Locales}.
28f540f4 1435@pindex string.h
8a2f1f5b 1436@pindex wchar.h
28f540f4
RM
1437
1438In the standard C locale, the collation sequence for @code{strcoll} is
8a2f1f5b
UD
1439the same as that for @code{strcmp}. Similarly, @code{wcscoll} and
1440@code{wcscmp} are the same in this situation.
28f540f4
RM
1441
1442Effectively, the way these functions work is by applying a mapping to
2cc4b9cc
PE
1443transform the characters in a multibyte string to a byte
1444sequence that represents
28f540f4
RM
1445the string's position in the collating sequence of the current locale.
1446Comparing two such byte sequences in a simple fashion is equivalent to
1447comparing the strings with the locale's collating sequence.
1448
8a2f1f5b
UD
1449The functions @code{strcoll} and @code{wcscoll} perform this translation
1450implicitly, in order to do one comparison. By contrast, @code{strxfrm}
1451and @code{wcsxfrm} perform the mapping explicitly. If you are making
1452multiple comparisons using the same string or set of strings, it is
1453likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to
1454transform all the strings just once, and subsequently compare the
1455transformed strings with @code{strcmp} or @code{wcscmp}.
28f540f4 1456
28f540f4 1457@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1458@standards{ISO, string.h}
11087373
AO
1459@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1460@c Calls strcoll_l with the current locale, which dereferences only the
1461@c LC_COLLATE data pointer.
28f540f4
RM
1462The @code{strcoll} function is similar to @code{strcmp} but uses the
1463collating sequence of the current locale for collation (the
2cc4b9cc 1464@code{LC_COLLATE} locale). The arguments are multibyte strings.
28f540f4
RM
1465@end deftypefun
1466
8a2f1f5b 1467@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1468@standards{ISO, wchar.h}
11087373
AO
1469@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1470@c Same as strcoll, but calling wcscoll_l.
8a2f1f5b
UD
1471The @code{wcscoll} function is similar to @code{wcscmp} but uses the
1472collating sequence of the current locale for collation (the
1473@code{LC_COLLATE} locale).
1474@end deftypefun
1475
28f540f4
RM
1476Here is an example of sorting an array of strings, using @code{strcoll}
1477to compare them. The actual sort algorithm is not written here; it
1478comes from @code{qsort} (@pxref{Array Sort Function}). The job of the
1479code shown here is to say how to compare the strings while sorting them.
1480(Later on in this section, we will show a way to do this more
1481efficiently using @code{strxfrm}.)
1482
1483@smallexample
1484/* @r{This is the comparison function used with @code{qsort}.} */
1485
1486int
e39745ff 1487compare_elements (const void *v1, const void *v2)
28f540f4 1488@{
e39745ff 1489 char * const *p1 = v1;
a9f5ce09 1490 char * const *p2 = v2;
e39745ff 1491
28f540f4
RM
1492 return strcoll (*p1, *p2);
1493@}
1494
1495/* @r{This is the entry point---the function to sort}
1496 @r{strings using the locale's collating sequence.} */
1497
1498void
1499sort_strings (char **array, int nstrings)
1500@{
1501 /* @r{Sort @code{array} by comparing the strings.} */
9fc19e48
UD
1502 qsort (array, nstrings,
1503 sizeof (char *), compare_elements);
28f540f4
RM
1504@}
1505@end smallexample
1506
1507@cindex converting string to collation order
8a2f1f5b 1508@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 1509@standards{ISO, string.h}
11087373 1510@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc
PE
1511The function @code{strxfrm} transforms the multibyte string
1512@var{from} using the
8a2f1f5b 1513collation transformation determined by the locale currently selected for
28f540f4 1514collation, and stores the transformed string in the array @var{to}. Up
2cc4b9cc 1515to @var{size} bytes (including a terminating null byte) are
28f540f4
RM
1516stored.
1517
1518The behavior is undefined if the strings @var{to} and @var{from}
0a13c9e9 1519overlap; see @ref{Copying Strings and Arrays}.
28f540f4
RM
1520
1521The return value is the length of the entire transformed string. This
1522value is not affected by the value of @var{size}, but if it is greater
a5113b14
UD
1523than or equal to @var{size}, it means that the transformed string did not
1524entirely fit in the array @var{to}. In this case, only as much of the
1525string as actually fits was stored. To get the whole transformed
1526string, call @code{strxfrm} again with a bigger output array.
28f540f4
RM
1527
1528The transformed string may be longer than the original string, and it
1529may also be shorter.
1530
2cc4b9cc
PE
1531If @var{size} is zero, no bytes are stored in @var{to}. In this
1532case, @code{strxfrm} simply returns the number of bytes that would
28f540f4 1533be the length of the transformed string. This is useful for determining
8a2f1f5b
UD
1534what size the allocated array should be. It does not matter what
1535@var{to} is if @var{size} is zero; @var{to} may even be a null pointer.
1536@end deftypefun
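
The zero-size query described above suggests a two-call idiom: ask for the length first, then allocate exactly that much and transform. The following is a sketch only; @code{make_collation_key} is a hypothetical helper name, and plain @code{malloc} is used here so that the fragment does not depend on any other helper.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated collation key for STR
   in the current locale, or a null pointer if allocation fails.
   The caller is responsible for freeing the result.  */
char *
make_collation_key (const char *str)
{
  /* With a size of zero nothing is stored, so TO may be null.  */
  size_t needed = strxfrm (NULL, str, 0);
  char *key = malloc (needed + 1);   /* +1 for the terminating null byte */

  if (key != NULL)
    strxfrm (key, str, needed + 1);
  return key;
}
```

Two keys produced this way can be compared with @code{strcmp}, and the sign of the result matches what @code{strcoll} would report for the original strings, provided the locale does not change in between.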
1537
8a2f1f5b 1538@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
d08a7e4c 1539@standards{ISO, wchar.h}
11087373 1540@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 1541The function @code{wcsxfrm} transforms wide string @var{wfrom}
8a2f1f5b
UD
1542using the collation transformation determined by the locale currently
1543selected for collation, and stores the transformed string in the array
1544@var{wto}. Up to @var{size} wide characters (including a terminating null
2cc4b9cc 1545wide character) are stored.
8a2f1f5b
UD
1546
1547The behavior is undefined if the strings @var{wto} and @var{wfrom}
0a13c9e9 1548overlap; see @ref{Copying Strings and Arrays}.
8a2f1f5b 1549
2cc4b9cc 1550The return value is the length of the entire transformed wide
8a2f1f5b
UD
1551string. This value is not affected by the value of @var{size}, but if
1552it is greater than or equal to @var{size}, it means that the transformed
2cc4b9cc
PE
1553wide string did not entirely fit in the array @var{wto}. In
1554this case, only as much of the wide string as actually fits
1555was stored. To get the whole transformed wide string, call
8a2f1f5b
UD
1556@code{wcsxfrm} again with a bigger output array.
1557
2cc4b9cc
PE
1558The transformed wide string may be longer than the original
1559wide string, and it may also be shorter.
8a2f1f5b 1560
2cc4b9cc 1561If @var{size} is zero, no wide characters are stored in @var{wto}. In this
8a2f1f5b 1562case, @code{wcsxfrm} simply returns the number of wide characters that
2cc4b9cc 1563would be the length of the transformed wide string. This is
8a2f1f5b
UD
1564useful for determining what size the allocated array should be (remember
1565to multiply by @code{sizeof (wchar_t)}). It does not matter what
1566@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer.
28f540f4
RM
1567@end deftypefun
1568
1569Here is an example of how you can use @code{strxfrm} when
1570you plan to do many comparisons. It does the same thing as the previous
1571example, but much faster, because it has to transform each string only
1572once, no matter how many times it is compared with other strings. Even
1573the time needed to allocate and free storage is much less than the time
1574we save, when there are many strings.
1575
1576@smallexample
1577struct sorter @{ char *input; char *transformed; @};
1578
1579/* @r{This is the comparison function used with @code{qsort}}
1580 @r{to sort an array of @code{struct sorter}.} */
1581
1582int
e39745ff 1583compare_elements (const void *v1, const void *v2)
28f540f4 1584@{
e39745ff
AJ
1585 const struct sorter *p1 = v1;
1586 const struct sorter *p2 = v2;
1587
28f540f4
RM
1588 return strcmp (p1->transformed, p2->transformed);
1589@}
1590
1591/* @r{This is the entry point---the function to sort}
1592 @r{strings using the locale's collating sequence.} */
1593
1594void
1595sort_strings_fast (char **array, int nstrings)
1596@{
1597 struct sorter temp_array[nstrings];
1598 int i;
1599
1600 /* @r{Set up @code{temp_array}. Each element contains}
1601 @r{one input string and its transformed string.} */
1602 for (i = 0; i < nstrings; i++)
1603 @{
1604 size_t length = strlen (array[i]) * 2;
a5113b14 1605 char *transformed;
f2ea0f5b 1606 size_t transformed_length;
28f540f4
RM
1607
1608 temp_array[i].input = array[i];
1609
a5113b14
UD
1610 /* @r{First try a buffer perhaps big enough.} */
1611 transformed = (char *) xmalloc (length);
1612
1613 /* @r{Transform @code{array[i]}.} */
1614 transformed_length = strxfrm (transformed, array[i], length);
1615
1616 /* @r{If the buffer was not large enough, resize it}
1617 @r{and try again.} */
1618 if (transformed_length >= length)
28f540f4 1619 @{
a5113b14 1620 /* @r{Allocate the needed space. +1 for terminating}
2cc4b9cc 1621 @r{@code{'\0'} byte.} */
a5113b14
UD
1622 transformed = (char *) xrealloc (transformed,
1623 transformed_length + 1);
1624
1625 /* @r{The return value is not interesting because we know}
1626 @r{how long the transformed string is.} */
dd7d45e8
UD
1627 (void) strxfrm (transformed, array[i],
1628 transformed_length + 1);
28f540f4 1629 @}
a5113b14
UD
1630
1631 temp_array[i].transformed = transformed;
28f540f4
RM
1632 @}
1633
1634 /* @r{Sort @code{temp_array} by comparing transformed strings.} */
89e691f2
AM
1635 qsort (temp_array, nstrings,
1636 sizeof (struct sorter), compare_elements);
28f540f4
RM
1637
1638 /* @r{Put the elements back in the permanent array}
1639 @r{in their sorted order.} */
1640 for (i = 0; i < nstrings; i++)
1641 array[i] = temp_array[i].input;
1642
1643 /* @r{Free the strings we allocated.} */
1644 for (i = 0; i < nstrings; i++)
1645 free (temp_array[i].transformed);
1646@}
1647@end smallexample
1648
8a2f1f5b
UD
1649The interesting part of this code for the wide character version would
1650look like this:
1651
1652@smallexample
1653void
1654sort_strings_fast (wchar_t **array, int nstrings)
1655@{
1656 @dots{}
1657 /* @r{Transform @code{array[i]}.} */
1658 transformed_length = wcsxfrm (transformed, array[i], length);
1659
1660 /* @r{If the buffer was not large enough, resize it}
1661 @r{and try again.} */
1662 if (transformed_length >= length)
1663 @{
1664 /* @r{Allocate the needed space. +1 for terminating}
2cc4b9cc 1665 @r{@code{L'\0'} wide character.} */
8a2f1f5b
UD
1666 transformed = (wchar_t *) xrealloc (transformed,
1667 (transformed_length + 1)
1668 * sizeof (wchar_t));
1669
1670 /* @r{The return value is not interesting because we know}
1671 @r{how long the transformed string is.} */
1672 (void) wcsxfrm (transformed, array[i],
1673 transformed_length + 1);
1674 @}
1675 @dots{}
1676@end smallexample
1677
1678@noindent
1679Note the additional multiplication by @code{sizeof (wchar_t)} in the
1680@code{xrealloc} call.
1681
1682@strong{Compatibility Note:} The string collation functions are a new
976780fd 1683feature of @w{ISO C90}. Older C dialects have no equivalent feature.
8a2f1f5b
UD
1684The wide character versions were introduced in @w{Amendment 1} to @w{ISO
1685C90}.
28f540f4 1686
b4012b75 1687@node Search Functions
28f540f4
RM
1688@section Search Functions
1689
1690This section describes library functions which perform various kinds
1691of searching operations on strings and arrays. These functions are
1692declared in the header file @file{string.h}.
1693@pindex string.h
1694@cindex search functions (for strings)
1695@cindex string search functions
1696
28f540f4 1697@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 1698@standards{ISO, string.h}
11087373 1699@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1700This function finds the first occurrence of the byte @var{c} (converted
1701to an @code{unsigned char}) in the initial @var{size} bytes of the
1702object beginning at @var{block}. The return value is a pointer to the
1703located byte, or a null pointer if no match was found.
1704@end deftypefun
1705
8a2f1f5b 1706@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
d08a7e4c 1707@standards{ISO, wchar.h}
11087373 1708@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1709This function finds the first occurrence of the wide character @var{wc}
1710in the initial @var{size} wide characters of the object beginning at
1711@var{block}. The return value is a pointer to the located wide
1712character, or a null pointer if no match was found.
1713@end deftypefun
1714
87b56f36 1715@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c})
d08a7e4c 1716@standards{GNU, string.h}
11087373 1717@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
87b56f36
UD
1718Often the @code{memchr} function is used with the knowledge that the
1719byte @var{c} is available in the memory block specified by the
1720parameters. But this means that the @var{size} parameter is not really
1721needed and that the tests performed with it at runtime (to check whether
1722the end of the block is reached) are not needed.
1723
1724The @code{rawmemchr} function exists for just this situation which is
1725surprisingly frequent. The interface is similar to @code{memchr} except
1726that the @var{size} parameter is missing. The function will look beyond
1727the end of the block pointed to by @var{block} in case the programmer
6be569a4 1728made an error in assuming that the byte @var{c} is present in the block.
87b56f36
UD
1729In this case the result is unspecified. Otherwise the return value is a
1730pointer to the located byte.
1731
1732This function is of special interest when looking for the end of a
1733string. Since all strings are terminated by a null byte a call like
1734
1735@smallexample
1736 rawmemchr (str, '\0')
1737@end smallexample
1738
8a2f1f5b 1739@noindent
87b56f36
UD
1740will never go beyond the end of the string.
1741
1742This function is a GNU extension.
1743@end deftypefun
1744
ca747856 1745@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 1746@standards{GNU, string.h}
11087373 1747@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
ca747856
RM
1748The function @code{memrchr} is like @code{memchr}, except that it searches
1749backwards from the end of the block defined by @var{block} and @var{size}
1750(instead of forwards from the front).
4efcb713
UD
1751
1752This function is a GNU extension.
a2d63612 1753@end deftypefun
ca747856 1754
28f540f4 1755@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
d08a7e4c 1756@standards{ISO, string.h}
11087373 1757@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc
PE
1758The @code{strchr} function finds the first occurrence of the byte
1759@var{c} (converted to a @code{char}) in the string
28f540f4 1760beginning at @var{string}. The return value is a pointer to the located
2cc4b9cc 1761byte, or a null pointer if no match was found.
28f540f4
RM
1762
1763For example,
1764@smallexample
1765strchr ("hello, world", 'l')
1766 @result{} "llo, world"
1767strchr ("hello, world", '?')
1768 @result{} NULL
a5113b14 1769@end smallexample
28f540f4 1770
2cc4b9cc 1771The terminating null byte is considered to be part of the string,
28f540f4 1772so you can use this function to get a pointer to the end of a string by
2cc4b9cc 1773specifying zero as the value of the @var{c} argument.
0520adde
FB
1774
1775When @code{strchr} returns a null pointer, it does not let you know
2cc4b9cc 1776the position of the terminating null byte it has found. If you
0520adde
FB
1777need that information, it is better (but less portable) to use
1778@code{strchrnul} than to search for it a second time.
8a2f1f5b
UD
1779@end deftypefun
1780
8a2f1f5b 1781@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1782@standards{ISO, wchar.h}
11087373 1783@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1784The @code{wcschr} function finds the first occurrence of the wide
2cc4b9cc 1785character @var{wc} in the wide string
8a2f1f5b
UD
1786beginning at @var{wstring}. The return value is a pointer to the
1787located wide character, or a null pointer if no match was found.
1788
2cc4b9cc
PE
1789The terminating null wide character is considered to be part of the wide
1790string, so you can use this function to get a pointer to the end
1791of a wide string by specifying a null wide character as the
8a2f1f5b
UD
1792value of the @var{wc} argument. It would be better (but less portable)
1793to use @code{wcschrnul} in this case, though.
28f540f4
RM
1794@end deftypefun
1795
0e4ee106 1796@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c})
d08a7e4c 1797@standards{GNU, string.h}
11087373 1798@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106 1799@code{strchrnul} is the same as @code{strchr} except that if it does
2cc4b9cc
PE
1800not find the byte, it returns a pointer to the string's terminating
1801null byte rather than a null pointer.
8a2f1f5b
UD
1802
1803This function is a GNU extension.
1804@end deftypefun
1805
8a2f1f5b 1806@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1807@standards{GNU, wchar.h}
11087373 1808@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1809@code{wcschrnul} is the same as @code{wcschr} except that if it does not
2cc4b9cc 1810find the wide character, it returns a pointer to the wide string's
8a2f1f5b
UD
1811terminating null wide character rather than a null pointer.
1812
1813This function is a GNU extension.
28f540f4
RM
1814@end deftypefun
1815
ec28fc7c 1816One useful, but unusual, use of the @code{strchr}
2cc4b9cc 1817function is when one wants to have a pointer pointing to the null byte
ee2752ea
UD
1818terminating a string. This is often written in this way:
1819
1820@smallexample
1821 s += strlen (s);
1822@end smallexample
1823
1824@noindent
1825This is almost optimal but the addition operation duplicates a bit of
1826the work already done in the @code{strlen} function. A better solution
1827is this:
1828
1829@smallexample
1830 s = strchr (s, '\0');
1831@end smallexample
1832
1833There is no restriction on the second parameter of @code{strchr} so it
2cc4b9cc 1834could very well also be zero. Those readers thinking very
ee2752ea 1835hard about this might now point out that the @code{strchr} function is
8c474db5 1836more expensive than the @code{strlen} function since we have two abort
1f77f049 1837criteria. This is right. But in @theglibc{} the implementation of
0e4ee106 1838@code{strchr} is optimized in a special way so that @code{strchr}
8c474db5 1839actually is faster.
ee2752ea 1840
28f540f4 1841@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
d08a7e4c 1842@standards{ISO, string.h}
11087373 1843@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1844The function @code{strrchr} is like @code{strchr}, except that it searches
1845backwards from the end of the string @var{string} (instead of forwards
1846from the front).
1847
1848For example,
1849@smallexample
1850strrchr ("hello, world", 'l')
1851 @result{} "ld"
1852@end smallexample
1853@end deftypefun
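
A common use of @code{strrchr} is to find the last component of a file name. The sketch below assumes @samp{/} is the only directory separator; @code{last_component} is a hypothetical name chosen for illustration.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the part of PATH that
   follows the last '/' byte, or PATH itself if there is no '/'.
   The result points into PATH; nothing is copied.  */
const char *
last_component (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```

If @var{path} ends with a slash, the result is an empty string, which a caller may want to treat as a special case.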
1854
8a2f1f5b 1855@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{c})
d08a7e4c 1856@standards{ISO, wchar.h}
11087373 1857@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1858The function @code{wcsrchr} is like @code{wcschr}, except that it searches
1859backwards from the end of the string @var{wstring} (instead of forwards
1860from the front).
1861@end deftypefun
1862
28f540f4 1863@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
d08a7e4c 1864@standards{ISO, string.h}
11087373 1865@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1866This is like @code{strchr}, except that it searches @var{haystack} for a
2cc4b9cc 1867substring @var{needle} rather than just a single byte. It
28f540f4 1868returns a pointer into the string @var{haystack} that is the first
2cc4b9cc 1869byte of the substring, or a null pointer if no match was found. If
28f540f4
RM
1870@var{needle} is an empty string, the function returns @var{haystack}.
1871
1872For example,
1873@smallexample
1874strstr ("hello, world", "l")
1875 @result{} "llo, world"
1876strstr ("hello, world", "wo")
1877 @result{} "world"
1878@end smallexample
1879@end deftypefun
1880
8a2f1f5b 1881@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
d08a7e4c 1882@standards{ISO, wchar.h}
11087373 1883@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1884This is like @code{wcschr}, except that it searches @var{haystack} for a
1885substring @var{needle} rather than just a single wide character. It
1886returns a pointer into the string @var{haystack} that is the first wide
1887character of the substring, or a null pointer if no match was found. If
1888@var{needle} is an empty string, the function returns @var{haystack}.
1889@end deftypefun
1890
8a2f1f5b 1891@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
d08a7e4c 1892@standards{XPG, wchar.h}
11087373 1893@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
9dcc8f11 1894@code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the
8a2f1f5b
UD
1895name originally used in the X/Open Portability Guide before the
1896@w{Amendment 1} to @w{ISO C90} was published.
1897@end deftypefun
1898
28f540f4 1899
0e4ee106 1900@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
d08a7e4c 1901@standards{GNU, string.h}
11087373
AO
1902@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1903@c There may be multiple calls of strncasecmp, each accessing the locale
1904@c object independently.
0e4ee106
UD
1905This is like @code{strstr}, except that it ignores case in searching for
1906the substring. Like @code{strcasecmp}, it is locale dependent how
2cc4b9cc
PE
1907uppercase and lowercase characters are related, and arguments are
1908multibyte strings.
0e4ee106
UD
1909
1910
1911For example,
1912@smallexample
d6868416 1913strcasestr ("hello, world", "L")
0e4ee106 1914 @result{} "llo, world"
d6868416 1915strcasestr ("hello, World", "wo")
0e4ee106
UD
1916 @result{} "World"
1917@end smallexample
1918@end deftypefun
1919
1920
63551311 1921@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
d08a7e4c 1922@standards{GNU, string.h}
11087373 1923@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1924This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
2cc4b9cc 1925arrays rather than strings. @var{needle-len} is the
28f540f4
RM
1926length of @var{needle} and @var{haystack-len} is the length of
1927@var{haystack}.@refill
1928
1929This function is a GNU extension.
1930@end deftypefun
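
Because @code{memmem} takes explicit lengths, it can search data that contains null bytes. As an illustration only, the following sketch counts non-overlapping matches in a buffer; @code{count_occurrences} is a hypothetical helper, and the early return for an empty needle is a choice made here to keep the loop finite.

```c
#define _GNU_SOURCE   /* memmem is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of the
   NEEDLE_LEN bytes at NEEDLE within the HAYSTACK_LEN bytes at
   HAYSTACK.  Null bytes are treated like any other byte.  */
size_t
count_occurrences (const void *haystack, size_t haystack_len,
                   const void *needle, size_t needle_len)
{
  const unsigned char *pos = haystack;
  const unsigned char *end = pos + haystack_len;
  size_t count = 0;

  if (needle_len == 0)
    return 0;   /* Avoid looping forever on an empty needle.  */

  while ((pos = memmem (pos, (size_t) (end - pos),
                        needle, needle_len)) != NULL)
    {
      count++;
      pos += needle_len;   /* Resume just past this match.  */
    }
  return count;
}
```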
1931
28f540f4 1932@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
d08a7e4c 1933@standards{ISO, string.h}
11087373 1934@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1935The @code{strspn} (``string span'') function returns the length of the
2cc4b9cc 1936initial substring of @var{string} that consists entirely of bytes that
28f540f4 1937are members of the set specified by the string @var{skipset}. The order
2cc4b9cc 1938of the bytes in @var{skipset} is not important.
28f540f4
RM
1939
1940For example,
1941@smallexample
1942strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
1943 @result{} 5
1944@end smallexample
8a2f1f5b 1945
2cc4b9cc
PE
1946In a multibyte string, characters consisting of
1947more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
1948separately. The function is not locale-dependent.
1949@end deftypefun
1950
8a2f1f5b 1951@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
d08a7e4c 1952@standards{ISO, wchar.h}
11087373 1953@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1954The @code{wcsspn} (``wide character string span'') function returns the
1955length of the initial substring of @var{wstring} that consists entirely
1956of wide characters that are members of the set specified by the string
1957@var{skipset}. The order of the wide characters in @var{skipset} is not
1958important.
28f540f4
RM
1959@end deftypefun
1960
28f540f4 1961@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
d08a7e4c 1962@standards{ISO, string.h}
11087373 1963@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1964The @code{strcspn} (``string complement span'') function returns the length
2cc4b9cc 1965of the initial substring of @var{string} that consists entirely of bytes
28f540f4 1966that are @emph{not} members of the set specified by the string @var{stopset}.
2cc4b9cc 1967(In other words, it returns the offset of the first byte in @var{string}
28f540f4
RM
1968that is a member of the set @var{stopset}.)
1969
1970For example,
1971@smallexample
1972strcspn ("hello, world", " \t\n,.;!?")
1973 @result{} 5
1974@end smallexample
8a2f1f5b 1975
2cc4b9cc
PE
1976In a multibyte string, characters consisting of
1977more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
1978separately. The function is not locale-dependent.
1979@end deftypefun
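
Used together, @code{strspn} and @code{strcspn} can walk over the fields of a string without modifying it. The following sketch counts the fields separated by any of a set of delimiter bytes; @code{count_fields} is a hypothetical name, and consecutive delimiters are assumed to separate nothing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the maximal runs of bytes in STRING
   that contain no byte from DELIMITERS.  STRING is not modified.  */
size_t
count_fields (const char *string, const char *delimiters)
{
  size_t count = 0;

  for (;;)
    {
      /* Skip any leading delimiters.  */
      string += strspn (string, delimiters);
      if (*string == '\0')
        break;

      /* Skip the field itself, up to the next delimiter or the end.  */
      string += strcspn (string, delimiters);
      count++;
    }
  return count;
}
```

Like the functions it is built on, this operates on bytes, so it is only suitable for multibyte strings when the delimiters are single-byte characters that cannot appear inside a multibyte sequence.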
1980
8a2f1f5b 1981@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 1982@standards{ISO, wchar.h}
11087373 1983@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1984The @code{wcscspn} (``wide character string complement span'') function
1985returns the length of the initial substring of @var{wstring} that
1986consists entirely of wide characters that are @emph{not} members of the
1987set specified by the string @var{stopset}. (In other words, it returns
2cc4b9cc 1988the offset of the first wide character in @var{wstring} that is a member of
8a2f1f5b 1989the set @var{stopset}.)
28f540f4
RM
1990@end deftypefun
1991
28f540f4 1992@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
d08a7e4c 1993@standards{ISO, string.h}
11087373 1994@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1995The @code{strpbrk} (``string pointer break'') function is related to
2cc4b9cc 1996@code{strcspn}, except that it returns a pointer to the first byte
28f540f4
RM
1997in @var{string} that is a member of the set @var{stopset} instead of the
1998length of the initial substring. It returns a null pointer if no such
2cc4b9cc 1999byte from @var{stopset} is found.
28f540f4
RM
2000
2001@c @group Invalid outside the example.
2002For example,
2003
2004@smallexample
2005strpbrk ("hello, world", " \t\n,.;!?")
2006 @result{} ", world"
2007@end smallexample
2008@c @end group
8a2f1f5b 2009
2cc4b9cc
PE
2010In a multibyte string, characters consisting of
2011more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2012separately. The function is not locale-dependent.
2013@end deftypefun
2014
8a2f1f5b 2015@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 2016@standards{ISO, wchar.h}
11087373 2017@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2018The @code{wcspbrk} (``wide character string pointer break'') function is
2019related to @code{wcscspn}, except that it returns a pointer to the first
2020wide character in @var{wstring} that is a member of the set
2021@var{stopset} instead of the length of the initial substring. It
2cc4b9cc 2022returns a null pointer if no such wide character from @var{stopset} is found.
28f540f4
RM
2023@end deftypefun
2024
0e4ee106
UD
2025
2026@subsection Compatibility String Search Functions
2027
0e4ee106 2028@deftypefun {char *} index (const char *@var{string}, int @var{c})
d08a7e4c 2029@standards{BSD, string.h}
11087373 2030@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106
UD
@code{index} is another name for @code{strchr}; they are exactly the same.
New code should always use @code{strchr} since this name is defined in
@w{ISO C} while @code{index} is a BSD invention which was never available
on @w{System V} derived systems.
2035@end deftypefun
2036
0e4ee106 2037@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
d08a7e4c 2038@standards{BSD, string.h}
11087373 2039@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106
UD
@code{rindex} is another name for @code{strrchr}; they are exactly the same.
New code should always use @code{strrchr} since this name is defined in
@w{ISO C} while @code{rindex} is a BSD invention which was never available
on @w{System V} derived systems.
2044@end deftypefun
2045
b4012b75 2046@node Finding Tokens in a String
28f540f4
RM
2047@section Finding Tokens in a String
2048
28f540f4
RM
2049@cindex tokenizing strings
2050@cindex breaking a string into tokens
2051@cindex parsing tokens from a string
2052It's fairly common for programs to have a need to do some simple kinds
2053of lexical analysis and parsing, such as splitting a command string up
2054into tokens. You can do this with the @code{strtok} function, declared
2055in the header file @file{string.h}.
2056@pindex string.h
2057
8a2f1f5b 2058@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
d08a7e4c 2059@standards{ISO, string.h}
11087373 2060@safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}}
28f540f4
RM
2061A string can be split into tokens by making a series of calls to the
2062function @code{strtok}.
2063
2064The string to be split up is passed as the @var{newstring} argument on
2065the first call only. The @code{strtok} function uses this to set up
2066some internal state information. Subsequent calls to get additional
2067tokens from the same string are indicated by passing a null pointer as
2068the @var{newstring} argument. Calling @code{strtok} with another
2069non-null @var{newstring} argument reinitializes the state information.
2070It is guaranteed that no other library function ever calls @code{strtok}
2071behind your back (which would mess up this internal state information).
2072
2073The @var{delimiters} argument is a string that specifies a set of delimiters
2cc4b9cc
PE
2074that may surround the token being extracted. All the initial bytes
2075that are members of this set are discarded. The first byte that is
28f540f4
RM
2076@emph{not} a member of this set of delimiters marks the beginning of the
2077next token. The end of the token is found by looking for the next
2cc4b9cc
PE
2078byte that is a member of the delimiter set. This byte in the
2079original string @var{newstring} is overwritten by a null byte, and the
28f540f4
RM
2080pointer to the beginning of the token in @var{newstring} is returned.
2081
On the next call to @code{strtok}, the searching begins at the next
byte beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter bytes, @code{strtok} returns
a null pointer.
8a2f1f5b 2090
2cc4b9cc
PE
2091In a multibyte string, characters consisting of
2092more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2093separately. The function is not locale-dependent.
2094@end deftypefun
2095
1acd4371 2096@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr})
d08a7e4c 2097@standards{ISO, wchar.h}
11087373 2098@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2099A string can be split into tokens by making a series of calls to the
2100function @code{wcstok}.
2101
2102The string to be split up is passed as the @var{newstring} argument on
2103the first call only. The @code{wcstok} function uses this to set up
2104some internal state information. Subsequent calls to get additional
2cc4b9cc 2105tokens from the same wide string are indicated by passing a
1acd4371
AO
2106null pointer as the @var{newstring} argument, which causes the pointer
2107previously stored in @var{save_ptr} to be used instead.
8a2f1f5b 2108
2cc4b9cc 2109The @var{delimiters} argument is a wide string that specifies
8a2f1f5b
UD
2110a set of delimiters that may surround the token being extracted. All
2111the initial wide characters that are members of this set are discarded.
2112The first wide character that is @emph{not} a member of this set of
2113delimiters marks the beginning of the next token. The end of the token
2114is found by looking for the next wide character that is a member of the
2cc4b9cc 2115delimiter set. This wide character in the original wide
1acd4371
AO
2116string @var{newstring} is overwritten by a null wide character, the
2117pointer past the overwritten wide character is saved in @var{save_ptr},
2118and the pointer to the beginning of the token in @var{newstring} is
2119returned.
8a2f1f5b
UD
2120
On the next call to @code{wcstok}, the searching begins at the next
wide character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.

If the end of the wide string @var{newstring} is reached, or
if the remainder of the wide string consists only of delimiter wide characters,
@code{wcstok} returns a null pointer.
28f540f4
RM
2129@end deftypefun
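
The calling sequence described above for @code{wcstok} can be sketched as a small loop. This is only an illustration: the helper name `count_wide_tokens` is hypothetical and is not part of the library. It keeps the tokenizer state in a local `save` pointer, which is what makes the function usable from several callers at once.

```c
#include <stddef.h>
#include <wchar.h>

/* Count the tokens in BUF, which is modified in place: each delimiter
   that ends a token is overwritten with a null wide character.
   `count_wide_tokens' is a hypothetical helper for illustration.  */
size_t
count_wide_tokens (wchar_t *buf, const wchar_t *delims)
{
  wchar_t *save = NULL;   /* Tokenizer state, owned by the caller.  */
  size_t n = 0;

  /* First call passes the string; later calls pass a null pointer
     so that the position stored in SAVE is used instead.  */
  for (wchar_t *tok = wcstok (buf, delims, &save);
       tok != NULL;
       tok = wcstok (NULL, delims, &save))
    n++;

  return n;
}
```

Because the buffer is modified, it should be a writable copy rather than a wide string literal.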
2130
8a2f1f5b
UD
@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings
and Arrays}). If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.
28f540f4
RM
2139
2140The string that you are operating on might even be a constant. Then
8a2f1f5b
UD
2141when @code{strtok} or @code{wcstok} tries to modify it, your program
2142will get a fatal signal for writing in read-only memory. @xref{Program
2143Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
2144would not require a modification of the string (e.g., if there is
1f77f049 2145exactly one token) the string can (and in the @glibcadj{} case will) be
8a2f1f5b 2146modified.
28f540f4
RM
2147
2148This is a special case of a general principle: if a part of a program
2149does not have as its purpose the modification of a certain data
2150structure, then it is error-prone to modify the data structure
2151temporarily.
2152
1acd4371 2153The function @code{strtok} is not reentrant, whereas @code{wcstok} is.
8a2f1f5b
UD
2154@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
2155important.
28f540f4
RM
2156
2157Here is a simple example showing the use of @code{strtok}.
2158
2159@comment Yes, this example has been tested.
2160@smallexample
2161#include <string.h>
2162#include <stddef.h>
2163
2164@dots{}
2165
5649a1d6 2166const char string[] = "words separated by spaces -- and, punctuation!";
28f540f4 2167const char delimiters[] = " .,;:!-";
5649a1d6 2168char *token, *cp;
28f540f4
RM
2169
2170@dots{}
2171
5649a1d6
UD
2172cp = strdupa (string); /* Make writable copy. */
2173token = strtok (cp, delimiters); /* token => "words" */
28f540f4
RM
2174token = strtok (NULL, delimiters); /* token => "separated" */
2175token = strtok (NULL, delimiters); /* token => "by" */
2176token = strtok (NULL, delimiters); /* token => "spaces" */
2177token = strtok (NULL, delimiters); /* token => "and" */
2178token = strtok (NULL, delimiters); /* token => "punctuation" */
2179token = strtok (NULL, delimiters); /* token => NULL */
2180@end smallexample
a5113b14 2181
@Theglibc{} contains two more functions for tokenizing a string
which overcome the limitation of non-reentrancy. They are not
available for wide strings.
a5113b14 2185
a5113b14 2186@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
d08a7e4c 2187@standards{POSIX, string.h}
11087373 2188@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
dd7d45e8
UD
2189Just like @code{strtok}, this function splits the string into several
2190tokens which can be accessed by successive calls to @code{strtok_r}.
1acd4371
AO
2191The difference is that, as in @code{wcstok}, the information about the
2192next token is stored in the space pointed to by the third argument,
2193@var{save_ptr}, which is a pointer to a string pointer. Calling
2194@code{strtok_r} with a null pointer for @var{newstring} and leaving
2195@var{save_ptr} between the calls unchanged does the job without
2196hindering reentrancy.
a5113b14 2197
976780fd 2198This function is defined in POSIX.1 and can be found on many systems
a5113b14
UD
2199which support multi-threading.
2200@end deftypefun
2201
a5113b14 2202@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
d08a7e4c 2203@standards{BSD, string.h}
11087373 2204@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0050ad5f
UD
This function has functionality similar to @code{strtok_r}, with the
@var{newstring} argument replaced by the @var{save_ptr} argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token.

One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one byte from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of bytes
from @var{delimiter}. This means that a program normally should test
for @code{strsep} returning an empty string before processing it.
9afc8a59 2218
a5113b14
UD
2219This function was introduced in 4.3BSD and therefore is widely available.
2220@end deftypefun
2221
Here is how the above example looks when @code{strsep} is used.
2223
2224@comment Yes, this example has been tested.
2225@smallexample
2226#include <string.h>
2227#include <stddef.h>
2228
2229@dots{}
2230
5649a1d6 2231const char string[] = "words separated by spaces -- and, punctuation!";
a5113b14
UD
2232const char delimiters[] = " .,;:!-";
2233char *running;
2234char *token;
2235
2236@dots{}
2237
5649a1d6 2238running = strdupa (string);
a5113b14
UD
2239token = strsep (&running, delimiters); /* token => "words" */
2240token = strsep (&running, delimiters); /* token => "separated" */
2241token = strsep (&running, delimiters); /* token => "by" */
2242token = strsep (&running, delimiters); /* token => "spaces" */
9afc8a59
UD
2243token = strsep (&running, delimiters); /* token => "" */
2244token = strsep (&running, delimiters); /* token => "" */
2245token = strsep (&running, delimiters); /* token => "" */
a5113b14 2246token = strsep (&running, delimiters); /* token => "and" */
9afc8a59 2247token = strsep (&running, delimiters); /* token => "" */
a5113b14 2248token = strsep (&running, delimiters); /* token => "punctuation" */
9afc8a59 2249token = strsep (&running, delimiters); /* token => "" */
a5113b14
UD
2250token = strsep (&running, delimiters); /* token => NULL */
2251@end smallexample
b4012b75 2252
ec28fc7c 2253@deftypefun {char *} basename (const char *@var{filename})
d08a7e4c 2254@standards{GNU, string.h}
11087373 2255@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
ec28fc7c 2256The GNU version of the @code{basename} function returns the last
9442cd75 2257component of the path in @var{filename}. This function is the preferred
ec28fc7c
UD
2258usage, since it does not modify the argument, @var{filename}, and
2259respects trailing slashes. The prototype for @code{basename} can be
ef48b196 2260found in @file{string.h}. Note, this function is overridden by the XPG
ec28fc7c
UD
2261version, if @file{libgen.h} is included.
2262
2263Example of using GNU @code{basename}:
2264
2265@smallexample
#define _GNU_SOURCE     /* @r{Needed for the GNU @code{basename}.} */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
2281@end smallexample
2282
2283@strong{Portability Note:} This function may produce different results
2284on different systems.
2285
2286@end deftypefun
2287
af85ebcd 2288@deftypefun {char *} basename (char *@var{path})
d08a7e4c 2289@standards{XPG, libgen.h}
11087373 2290@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
cf822e3c 2291This is the standard XPG defined @code{basename}. It is similar in
ec28fc7c 2292spirit to the GNU version, but may modify the @var{path} by removing
2cc4b9cc
PE
2293trailing '/' bytes. If the @var{path} is made up entirely of '/'
2294bytes, then "/" will be returned. Also, if @var{path} is
ec28fc7c 2295@code{NULL} or an empty string, then "." is returned. The prototype for
e4a5f77d 2296the XPG version can be found in @file{libgen.h}.
ec28fc7c
UD
2297
2298Example of using XPG @code{basename}:
2299
2300@smallexample
#define _GNU_SOURCE     /* @r{Needed for @code{strdupa}.} */
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  /* @r{Copy the argument, since XPG @code{basename} may modify it.} */
  char *path = strdupa (argv[0]);

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}

@}
2320@end smallexample
2321@end deftypefun
2322
ec28fc7c 2323@deftypefun {char *} dirname (char *@var{path})
d08a7e4c 2324@standards{XPG, libgen.h}
11087373 2325@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
ec28fc7c
UD
The @code{dirname} function is the complement to the XPG version of
@code{basename}. It returns the parent directory of the file specified
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
contains no '/' bytes, then "." is returned. The prototype for this
function can be found in @file{libgen.h}.
2331@end deftypefun
0e4ee106 2332
ea1bd74d
ZW
2333@node Erasing Sensitive Data
2334@section Erasing Sensitive Data
2335
2336Sensitive data, such as cryptographic keys, should be erased from
2337memory after use, to reduce the risk that a bug will expose it to the
2338outside world. However, compiler optimizations may determine that an
2339erasure operation is ``unnecessary,'' and remove it from the generated
2340code, because no @emph{correct} program could access the variable or
2341heap object containing the sensitive data after it's deallocated.
2342Since erasure is a precaution against bugs, this optimization is
2343inappropriate.
2344
2345The function @code{explicit_bzero} erases a block of memory, and
2346guarantees that the compiler will not remove the erasure as
2347``unnecessary.''
2348
2349@smallexample
2350@group
2351#include <string.h>
2352
2353extern void encrypt (const char *key, const char *in,
2354 char *out, size_t n);
2355extern void genkey (const char *phrase, char *key);
2356
2357void encrypt_with_phrase (const char *phrase, const char *in,
2358 char *out, size_t n)
2359@{
2360 char key[16];
2361 genkey (phrase, key);
2362 encrypt (key, in, out, n);
2363 explicit_bzero (key, 16);
2364@}
2365@end group
2366@end smallexample
2367
2368@noindent
2369In this example, if @code{memset}, @code{bzero}, or a hand-written
2370loop had been used, the compiler might remove them as ``unnecessary.''
2371
2372@strong{Warning:} @code{explicit_bzero} does not guarantee that
2373sensitive data is @emph{completely} erased from the computer's memory.
2374There may be copies in temporary storage areas, such as registers and
2375``scratch'' stack space; since these are invisible to the source code,
2376a library function cannot erase them.
2377
2378Also, @code{explicit_bzero} only operates on RAM. If a sensitive data
2379object never needs to have its address taken other than to call
2380@code{explicit_bzero}, it might be stored entirely in CPU registers
2381@emph{until} the call to @code{explicit_bzero}. Then it will be
2382copied into RAM, the copy will be erased, and the original will remain
2383intact. Data in RAM is more likely to be exposed by a bug than data
2384in registers, so this creates a brief window where the data is at
2385greater risk of exposure than it would have been if the program didn't
2386try to erase it at all.
2387
2388Declaring sensitive variables as @code{volatile} will make both the
2389above problems @emph{worse}; a @code{volatile} variable will be stored
2390in memory for its entire lifetime, and the compiler will make
2391@emph{more} copies of it than it would otherwise have. Attempting to
2392erase a normal variable ``by hand'' through a
2393@code{volatile}-qualified pointer doesn't work at all---because the
2394variable itself is not @code{volatile}, some compilers will ignore the
2395qualification on the pointer and remove the erasure anyway.
2396
2397Having said all that, in most situations, using @code{explicit_bzero}
2398is better than not using it. At present, the only way to do a more
2399thorough job is to write the entire sensitive operation in assembly
2400language. We anticipate that future compilers will recognize calls to
2401@code{explicit_bzero} and take appropriate steps to erase all the
copies of the affected data, wherever they may be.
2403
ea1bd74d 2404@deftypefun void explicit_bzero (void *@var{block}, size_t @var{len})
d08a7e4c 2405@standards{BSD, string.h}
ea1bd74d
ZW
2406@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2407
2408@code{explicit_bzero} writes zero into @var{len} bytes of memory
2409beginning at @var{block}, just as @code{bzero} would. The zeroes are
2410always written, even if the compiler could determine that this is
2411``unnecessary'' because no correct program could read them back.
2412
2413@strong{Note:} The @emph{only} optimization that @code{explicit_bzero}
2414disables is removal of ``unnecessary'' writes to memory. The compiler
2415can perform all the other optimizations that it could for a call to
2416@code{memset}. For instance, it may replace the function call with
2417inline memory writes, and it may assume that @var{block} cannot be a
2418null pointer.
2419
2420@strong{Portability Note:} This function first appeared in OpenBSD 5.5
2421and has not been standardized. Other systems may provide the same
2422functionality under a different name, such as @code{explicit_memset},
2423@code{memset_s}, or @code{SecureZeroMemory}.
2424
2425@Theglibc{} declares this function in @file{string.h}, but on other
2426systems it may be in @file{strings.h} instead.
2427@end deftypefun
2428
b10a0acc
ZW
2429
2430@node Shuffling Bytes
2431@section Shuffling Bytes
0e4ee106
UD
2432
2433The function below addresses the perennial programming quandary: ``How do
2434I take good data in string form and painlessly turn it into garbage?''
b10a0acc
ZW
2435This is not a difficult thing to code for oneself, but the authors of
2436@theglibc{} wish to make it as convenient as possible.
0e4ee106 2437
b10a0acc
ZW
2438To @emph{erase} data, use @code{explicit_bzero} (@pxref{Erasing
2439Sensitive Data}); to obfuscate it reversibly, use @code{memfrob}
2440(@pxref{Obfuscating Data}).
0e4ee106 2441
ec28fc7c 2442@deftypefun {char *} strfry (char *@var{string})
d08a7e4c 2443@standards{GNU, string.h}
11087373
AO
2444@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2445@c Calls initstate_r, time, getpid, strlen, and random_r.
0e4ee106 2446
b10a0acc
ZW
2447@code{strfry} performs an in-place shuffle on @var{string}. Each
2448character is swapped to a position selected at random, within the
2449portion of the string starting with the character's original position.
2450(This is the Fisher-Yates algorithm for unbiased shuffling.)
2451
2452Calling @code{strfry} will not disturb any of the random number
2453generators that have global state (@pxref{Pseudo-Random Numbers}).
0e4ee106
UD
2454
2455The return value of @code{strfry} is always @var{string}.
2456
1f77f049 2457@strong{Portability Note:} This function is unique to @theglibc{}.
b10a0acc 2458It is declared in @file{string.h}.
0e4ee106
UD
2459@end deftypefun
2460
2461
b10a0acc
ZW
2462@node Obfuscating Data
2463@section Obfuscating Data
0e4ee106
UD
2464@cindex Rot13
2465
b10a0acc
ZW
2466The @code{memfrob} function reversibly obfuscates an array of binary
2467data. This is not true encryption; the obfuscated data still bears a
2468clear relationship to the original, and no secret key is required to
2469undo the obfuscation. It is analogous to the ``Rot13'' cipher used on
2470Usenet for obscuring offensive jokes, spoilers for works of fiction,
2471and so on, but it can be applied to arbitrary binary data.
0e4ee106 2472
b10a0acc
ZW
2473Programs that need true encryption---a transformation that completely
2474obscures the original and cannot be reversed without knowledge of a
2475secret key---should use a dedicated cryptography library, such as
2476@uref{https://www.gnu.org/software/libgcrypt/,,libgcrypt}.
2477
2478Programs that need to @emph{destroy} data should use
2479@code{explicit_bzero} (@pxref{Erasing Sensitive Data}), or possibly
2480@code{strfry} (@pxref{Shuffling Bytes}).
0e4ee106 2481
0e4ee106 2482@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
d08a7e4c 2483@standards{GNU, string.h}
11087373 2484@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106 2485
b10a0acc
ZW
2486The function @code{memfrob} obfuscates @var{length} bytes of data
2487beginning at @var{mem}, in place. Each byte is bitwise xor-ed with
2488the binary pattern 00101010 (hexadecimal 0x2A). The return value is
2489always @var{mem}.
0e4ee106 2490
b10a0acc
ZW
Calling @code{memfrob} a second time on the same data returns it to
its original state.
0e4ee106 2493
1f77f049 2494@strong{Portability Note:} This function is unique to @theglibc{}.
b10a0acc 2495It is declared in @file{string.h}.
0e4ee106
UD
2496@end deftypefun
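
The transformation described for @code{memfrob} is simple enough to sketch by hand. The function below is a portable stand-in written for illustration, not the library's implementation; the name `frob_sketch` is hypothetical. It shows why applying the operation twice restores the original data: XOR with a fixed pattern is its own inverse.

```c
#include <stddef.h>

/* XOR each of the LENGTH bytes starting at MEM with 0x2A (binary
   00101010) and return MEM.  A hand-written stand-in for
   illustration; `frob_sketch' is not a library function.  */
void *
frob_sketch (void *mem, size_t length)
{
  unsigned char *p = mem;

  for (size_t i = 0; i < length; i++)
    p[i] ^= 0x2A;

  return mem;
}
```

Note that the obfuscated bytes may include null bytes (any input byte equal to 0x2A, the character `*` in ASCII, becomes zero), so the result should be handled as an array with an explicit length rather than as a string.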
2497
b4012b75
UD
2498@node Encode Binary Data
2499@section Encode Binary Data
2500
2501To store or transfer binary data in environments which only support text
2502one has to encode the binary data by mapping the input bytes to
2cc4b9cc 2503bytes in the range allowed for storing or transferring. SVID
dd7d45e8
UD
2504systems (and nowadays XPG compliant systems) provide minimal support for
2505this task.
b4012b75 2506
b4012b75 2507@deftypefun {char *} l64a (long int @var{n})
d08a7e4c 2508@standards{XPG, stdlib.h}
11087373 2509@safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}}
2cc4b9cc
PE
2510This function encodes a 32-bit input value using bytes from the
2511basic character set. It returns a pointer to a 7 byte buffer which
dd7d45e8
UD
2512contains an encoded version of @var{n}. To encode a series of bytes the
2513user must copy the returned string to a destination buffer. It returns
2514the empty string if @var{n} is zero, which is somewhat bizarre but
2515mandated by the standard.@*
2516@strong{Warning:} Since a static buffer is used this function should not
5649a1d6 2517be used in multi-threaded programs. There is no thread-safe alternative
dd7d45e8
UD
2518to this function in the C library.@*
2519@strong{Compatibility Note:} The XPG standard states that the return
2520value of @code{l64a} is undefined if @var{n} is negative. In the GNU
2521implementation, @code{l64a} treats its argument as unsigned, so it will
2522return a sensible encoding for any nonzero @var{n}; however, portable
2523programs should not rely on this.
b4012b75 2524
dd7d45e8
UD
2525To encode a large buffer @code{l64a} must be called in a loop, once for
2526each 32-bit word of the buffer. For example, one could do something
2527like this:
5649a1d6
UD
2528
2529@smallexample
2530char *
2531encode (const void *buf, size_t len)
2532@{
2533 /* @r{We know in advance how long the buffer has to be.} */
2534 unsigned char *in = (unsigned char *) buf;
2535 char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
290639c3 2536 char *cp = out, *p;
5649a1d6
UD
2537
2538 /* @r{Encode the length.} */
dd7d45e8 2539 /* @r{Using `htonl' is necessary so that the data can be}
290639c3
UD
2540 @r{decoded even on machines with different byte order.}
2541 @r{`l64a' can return a string shorter than 6 bytes, so }
2542 @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
2543 @r{hand.} */
dd7d45e8 2544
290639c3
UD
2545 p = stpcpy (cp, l64a (htonl (len)));
2546 cp = mempcpy (p, "......", 6 - (p - cp));
5649a1d6
UD
2547
2548 while (len > 3)
2549 @{
2550 unsigned long int n = *in++;
2551 n = (n << 8) | *in++;
2552 n = (n << 8) | *in++;
2553 n = (n << 8) | *in++;
2554 len -= 4;
290639c3
UD
2555 p = stpcpy (cp, l64a (htonl (n)));
2556 cp = mempcpy (p, "......", 6 - (p - cp));
5649a1d6
UD
2557 @}
2558 if (len > 0)
2559 @{
2560 unsigned long int n = *in++;
2561 if (--len > 0)
2562 @{
2563 n = (n << 8) | *in++;
2564 if (--len > 0)
2565 n = (n << 8) | *in;
2566 @}
290639c3 2567 cp = stpcpy (cp, l64a (htonl (n)));
5649a1d6
UD
2568 @}
2569 *cp = '\0';
2570 return out;
2571@}
2572@end smallexample
2573
2574It is strange that the library does not provide the complete
dd7d45e8
UD
2575functionality needed but so be it.
2576
2577@end deftypefun
5649a1d6 2578
b4012b75
UD
2579To decode data produced with @code{l64a} the following function should be
2580used.
2581
2582@deftypefun {long int} a64l (const char *@var{string})
d08a7e4c 2583@standards{XPG, stdlib.h}
11087373 2584@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}. The function processes at most 6 bytes of
this string, and decodes the bytes it finds according to the table
below. It stops decoding when it finds a byte not in the table,
rather like @code{atoi}; if you have a buffer which has been broken into
lines, you must be careful to skip over the end-of-line bytes.
dd7d45e8
UD
2591
2592The decoded number is returned as a @code{long int} value.
b4012b75 2593@end deftypefun
b13927da 2594
dd7d45e8 2595The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
2cc4b9cc 2596which each byte of an encoded string represents six bits of an
dd7d45e8
UD
2597input word. These symbols are used for the base 64 digits:
2598
2599@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
2600@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
2601@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
2602 @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
2603@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
2604 @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
2605@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
2606 @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
2607@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
2608 @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
2609@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
2610 @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
2611@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
2612 @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
2613@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
2614 @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
2615@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
2616 @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
2617@end multitable
2618
2619This encoding scheme is not standard. There are some other encoding
2620methods which are much more widely used (UU encoding, MIME encoding).
2621Generally, it is better to use one of these encodings.
2622
b13927da
UD
2623@node Argz and Envz Vectors
2624@section Argz and Envz Vectors
2625
5649a1d6 2626@cindex argz vectors (string vectors)
2cc4b9cc
PE
2627@cindex string vectors, null-byte separated
2628@cindex argument vectors, null-byte separated
b13927da 2629@dfn{argz vectors} are vectors of strings in a contiguous block of
2cc4b9cc 2630memory, each element separated from its neighbors by null bytes
b13927da
UD
2631(@code{'\0'}).
2632
5649a1d6 2633@cindex envz vectors (environment vectors)
2cc4b9cc 2634@cindex environment vectors, null-byte separated
b13927da 2635@dfn{Envz vectors} are an extension of argz vectors where each element is a
2cc4b9cc 2636name-value pair, separated by a @code{'='} byte (as in a Unix
b13927da
UD
2637environment).
2638
2639@menu
2640* Argz Functions:: Operations on argz vectors.
2641* Envz Functions:: Additional operations on environment vectors.
2642@end menu
2643
2644@node Argz Functions, Envz Functions, , Argz and Envz Vectors
2645@subsection Argz Functions
2646
2647Each argz vector is represented by a pointer to the first element, of
2648type @code{char *}, and a size, of type @code{size_t}, both of which can
2649be initialized to @code{0} to represent an empty argz vector. All argz
2650functions accept either a pointer and a size argument, or pointers to
2651them, if they will be modified.
2652
2653The argz functions use @code{malloc}/@code{realloc} to allocate/grow
f0f308c1 2654argz vectors, and so any argz vector created using these functions may
b13927da
UD
2655be freed by using @code{free}; conversely, any argz function that may
2656grow a string expects that string to have been allocated using
2657@code{malloc} (those argz functions that only examine their arguments or
2658modify them in place will work on any sort of memory).
2659@xref{Unconstrained Allocation}.
2660
2661All argz functions that do memory allocation have a return type of
2662@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
2663allocation error occurs.
2664
2665@pindex argz.h
2666These functions are declared in the standard include file @file{argz.h}.
2667
2668@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
d08a7e4c 2669@standards{GNU, argz.h}
11087373 2670@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
5649a1d6 2671The @code{argz_create} function converts the Unix-style argument vector
b13927da
UD
2672@var{argv} (a vector of pointers to normal C strings, terminated by
2673@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
2674the same elements, which is returned in @var{argz} and @var{argz_len}.
2675@end deftypefun

@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_create_sep} function converts the string
@var{string} into an argz vector (returned in @var{argz} and
@var{argz_len}) by splitting it into elements at every occurrence of the
byte @var{sep}.
@end deftypefun

@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_count} function returns the number of elements in the
argz vector described by @var{argz} and @var{argz_len}.
@end deftypefun
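As a minimal sketch of how @code{argz_create_sep} and @code{argz_count}
fit together, the following helper splits a colon-separated search path
and reports how many components it holds. The name
@code{count_path_components} and its error convention are hypothetical
and not part of the library.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): count the components
   of a ':'-separated search path by converting it to an argz vector.
   Returns (size_t) -1 if the vector cannot be allocated.  */
size_t
count_path_components (const char *path)
{
  char *argz = NULL;
  size_t argz_len = 0;
  size_t count;

  assert (path != NULL);
  if (argz_create_sep (path, ':', &argz, &argz_len) != 0)
    return (size_t) -1;

  count = argz_count (argz, argz_len);
  free (argz);                  /* Vectors made by argz functions use malloc.  */
  return count;
}
```

A call such as @code{count_path_components ("/bin:/usr/bin")} would be
expected to yield 2.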

@deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_extract} function converts the argz vector @var{argz} and
@var{argz_len} into a Unix-style argument vector stored in @var{argv},
by putting pointers to every element in @var{argz} into successive
positions in @var{argv}, followed by a terminator of @code{0}.
The @var{argv} array must be pre-allocated with enough space to hold
pointers to all the elements in @var{argz} plus the terminating
@code{(char *)0}
(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
bytes should be enough). Note that the string pointers stored into
@var{argv} point into @var{argz}---they are not copies---and so
@var{argz} must be copied if it will be changed while @var{argv} is
still active. This function is useful for passing the elements in
@var{argz} to an exec function (@pxref{Executing a File}).
@end deftypefun
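The sizing rule above can be sketched as a small wrapper. The helper
name @code{argz_to_argv} is an assumption made for illustration; only
@code{argz_count} and @code{argz_extract} come from the library.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): allocate an argv
   array for ARGZ and fill it in.  The entries point into ARGZ, so
   ARGZ must stay alive and unmodified while the array is in use.
   Returns NULL if the array cannot be allocated; otherwise the
   caller releases the array (not the strings) with free.  */
char **
argz_to_argv (const char *argz, size_t argz_len)
{
  size_t count;
  char **argv;

  assert (argz != NULL || argz_len == 0);
  count = argz_count (argz, argz_len);

  /* One slot per element plus one for the terminating null pointer.  */
  argv = malloc ((count + 1) * sizeof (char *));
  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

Because the array only borrows the strings, freeing or growing the argz
vector before the array is discarded would leave dangling pointers.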

@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_stringify} function converts @var{argz} into a normal
string with the elements separated by the byte @var{sep}, by replacing each
@code{'\0'} inside @var{argz} (except the last one, which terminates the
string) with @var{sep}. This is handy for printing @var{argz} in a
readable manner.
@end deftypefun
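Since @code{argz_stringify} modifies the vector in place, a caller that
still needs the original vector can work on a copy. The following is
only a sketch under that assumption; the helper name
@code{argz_to_line} is hypothetical.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a library function): return a malloc'd,
   space-separated string built from a copy of ARGZ, leaving ARGZ
   itself untouched.  Returns NULL if allocation fails.  */
char *
argz_to_line (const char *argz, size_t argz_len)
{
  char *copy;

  assert (argz != NULL || argz_len == 0);
  if (argz_len == 0)
    return calloc (1, 1);       /* Empty vector: return an empty string.  */

  copy = malloc (argz_len);
  if (copy == NULL)
    return NULL;
  memcpy (copy, argz, argz_len);

  /* Turn every separator except the final terminator into a space.  */
  argz_stringify (copy, argz_len, ' ');
  return copy;
}
```

The result is an ordinary string that can be passed to @code{puts} or
@code{printf} and then released with @code{free}.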

@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls strlen and argz_append.
The @code{argz_add} function adds the string @var{str} to the end of the
argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
@code{*@var{argz_len}} accordingly.
@end deftypefun

@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_add_sep} function is similar to @code{argz_add}, but
@var{str} is split into separate elements in the result at occurrences of
the byte @var{delim}. This is useful, for instance, for
adding the components of a Unix search path to an argz vector, by using
a value of @code{':'} for @var{delim}.
@end deftypefun
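To illustrate the difference between the two functions, the sketch below
adds one string as a single element and then a search path as several
elements. The helper name @code{add_program_and_path} is hypothetical.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): append PROGRAM as a
   single element, then each component of the ':'-separated PATH as
   a separate element.  Returns 0 on success, or the nonzero error
   code reported by the failing argz call.  */
int
add_program_and_path (char **argz, size_t *argz_len,
                      const char *program, const char *path)
{
  int err;

  assert (argz != NULL && argz_len != NULL);
  err = argz_add (argz, argz_len, program);       /* One element.  */
  if (err == 0)
    err = argz_add_sep (argz, argz_len, path, ':'); /* One per component.  */
  return err;
}
```

Starting from an empty vector (a null pointer and a length of 0), adding
@code{"ls"} and @code{"/bin:/usr/bin"} this way would be expected to
produce three elements.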

@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_append} function appends @var{buf_len} bytes starting at
@var{buf} to the argz vector @code{*@var{argz}}, reallocating
@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
@code{*@var{argz_len}}.
@end deftypefun

@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls free if no argument is left.
If @var{entry} points to the beginning of one of the elements in the
argz vector @code{*@var{argz}}, the @code{argz_delete} function will
remove this entry and reallocate @code{*@var{argz}}, modifying
@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that
because destructive argz functions usually reallocate their argz
argument, pointers into argz vectors, such as @var{entry}, become
invalid afterwards.
@end deftypefun
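A typical use is to locate an element first and then delete it. The
sketch below does so with @code{argz_next} and @code{strcmp}; the helper
name @code{argz_remove_string} is hypothetical. It returns right after
the deletion because the @code{entry} pointer must not be used again.

```c
#include <argz.h>
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a library function): delete the first
   element of the vector that equals STR.  Returns 1 if an element
   was deleted and 0 if none matched.  */
int
argz_remove_string (char **argz, size_t *argz_len, const char *str)
{
  char *entry = NULL;

  assert (argz != NULL && argz_len != NULL && str != NULL);
  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    if (strcmp (entry, str) == 0)
      {
        argz_delete (argz, argz_len, entry);
        return 1;               /* ENTRY is invalid from here on.  */
      }
  return 0;
}
```

To remove every matching element, a caller could invoke the helper
repeatedly until it returns 0, restarting the search each time.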

@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls argz_add or realloc and memmove.
The @code{argz_insert} function inserts the string @var{entry} into the
argz vector @code{*@var{argz}} at a point just before the existing
element pointed to by @var{before}, reallocating @code{*@var{argz}} and
updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
is @code{0}, @var{entry} is added to the end instead (as if by
@code{argz_add}). Since the first element is in fact the same as
@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
@var{before} will result in @var{entry} being inserted at the beginning.
@end deftypefun
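The last point can be sketched as a one-line wrapper. The name
@code{argz_prepend} is hypothetical, and the sketch assumes that an
empty vector is represented by a null pointer, in which case the call
falls back to appending.

```c
#include <argz.h>
#include <assert.h>

/* Hypothetical helper (not a library function): insert ENTRY as the
   new first element.  If the vector is empty, *ARGZ is a null
   pointer, so argz_insert appends ENTRY instead, which gives the
   same result.  Returns the value of argz_insert.  */
int
argz_prepend (char **argz, size_t *argz_len, const char *entry)
{
  assert (argz != NULL && argz_len != NULL && entry != NULL);
  return argz_insert (argz, argz_len, *argz, entry);
}
```

Prepending @code{"b"} and then @code{"a"} to an empty vector would be
expected to leave the elements in the order @code{"a"}, @code{"b"}.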

@deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_next} function provides a convenient way of iterating
over the elements in the argz vector @var{argz}. It returns a pointer
to the next element in @var{argz} after the element @var{entry}, or
@code{0} if there are no elements following @var{entry}. If @var{entry}
is @code{0}, the first element of @var{argz} is returned.

This behavior suggests two styles of iteration:

@smallexample
    char *entry = 0;
    while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
      @var{action};
@end smallexample

(the double parentheses are needed to silence the warning that some C
compilers issue for an assignment used as a @code{while} condition) and:

@smallexample
    char *entry;
    for (entry = @var{argz};
         entry;
         entry = argz_next (@var{argz}, @var{argz_len}, entry))
      @var{action};
@end smallexample

Note that the latter depends on @var{argz} having a value of @code{0} if
it is empty (rather than a pointer to an empty block of memory); this
invariant is maintained for argz vectors created by the functions here.
@end deftypefun

@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_replace} function replaces any occurrences of the string
@var{str} in @code{*@var{argz}} with @var{with}, reallocating
@code{*@var{argz}} as necessary. If
@var{replace_count} is non-zero, @code{*@var{replace_count}} will be
incremented by the number of replacements performed.
@end deftypefun
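As a hedged illustration, the sketch below rewrites @code{".c"} to
@code{".o"} in a colon-separated list of file names. The helper name
@code{objects_from_sources} is hypothetical. Note that the match is a
plain substring match, so a name such as @code{foo.cpp} would also be
rewritten; the example input avoids that case.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): return a malloc'd,
   ':'-separated copy of SOURCES in which the substring ".c" has
   been replaced by ".o".  *CHANGED receives the count reported by
   argz_replace.  Returns NULL on allocation failure or if SOURCES
   is empty.  */
char *
objects_from_sources (const char *sources, unsigned int *changed)
{
  char *argz = NULL;
  size_t argz_len = 0;

  assert (sources != NULL && changed != NULL);
  *changed = 0;                 /* argz_replace only increments it.  */

  if (argz_create_sep (sources, ':', &argz, &argz_len) != 0)
    return NULL;
  if (argz_replace (&argz, &argz_len, ".c", ".o", changed) != 0)
    {
      free (argz);
      return NULL;
    }
  argz_stringify (argz, argz_len, ':');
  return argz;
}
```

The counter is cleared before the call because @code{argz_replace} adds
to whatever value it already holds.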

@node Envz Functions, , Argz Functions, Argz and Envz Vectors
@subsection Envz Functions

Envz vectors are just argz vectors with additional constraints on the form
of each element; as such, argz functions can also be used on them, where it
makes sense.

Each element in an envz vector is a name-value pair, separated by a @code{'='}
byte; if multiple @code{'='} bytes are present in an element, those
after the first are considered part of the value, and treated like all other
non-@code{'\0'} bytes.

If @emph{no} @code{'='} bytes are present in an element, that element is
considered the name of a ``null'' entry, as distinct from an entry with an
empty value: @code{envz_get} will return @code{0} if given the name of a null
entry, whereas an entry with an empty value would result in a value of
@code{""}; @code{envz_entry} will still find such entries, however. Null
entries can be removed with the @code{envz_strip} function.

As with argz functions, envz functions that may allocate memory (and thus
fail) have a return type of @code{error_t}, and return either @code{0} or
@code{ENOMEM}.

@pindex envz.h
These functions are declared in the standard include file @file{envz.h}.

@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_entry} function finds the entry in @var{envz} with the name
@var{name}, and returns a pointer to the whole entry---that is, the argz
element which begins with @var{name} followed by a @code{'='} byte. If
there is no entry with that name, @code{0} is returned.
@end deftypefun

@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_get} function finds the entry in @var{envz} with the name
@var{name} (like @code{envz_entry}), and returns a pointer to the value
portion of that entry (following the @code{'='}). If there is no entry with
that name (or only a null entry), @code{0} is returned.
@end deftypefun

@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls envz_remove, which calls envz_entry and argz_delete, and then
@c argz_add or equivalent code that reallocs and appends name=value.
The @code{envz_add} function adds an entry to @code{*@var{envz}}
(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
@var{name}, and value @var{value}. If an entry with the same name
already exists in @var{envz}, it is removed first. If @var{value} is
@code{0}, then the new entry will be the special null type of entry
(mentioned above).
@end deftypefun
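The distinction between a missing name, a null entry, and an entry with
a (possibly empty) value can be sketched by combining @code{envz_entry}
and @code{envz_get}. The helper name @code{envz_classify} and its return
codes are hypothetical.

```c
#include <envz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): classify NAME.
   Returns 0 if ENVZ has no entry called NAME, 1 if it has a null
   entry (no '=' byte), and 2 if it has an entry with a value, even
   an empty one.  */
int
envz_classify (const char *envz, size_t envz_len, const char *name)
{
  assert (name != NULL);
  if (envz_entry (envz, envz_len, name) == NULL)
    return 0;                   /* No such name at all.  */
  if (envz_get (envz, envz_len, name) == NULL)
    return 1;                   /* Found, but it is a null entry.  */
  return 2;                     /* Found with a value, possibly "".  */
}
```

Entries added with @code{envz_add} and a @var{value} of @code{""} would
be expected to yield 2, while those added with a @var{value} of @code{0}
would yield 1.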

@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
as if with @code{envz_add}, updating @code{*@var{envz}} and
@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
will supersede those with the same name in @var{envz}, otherwise not.

Null entries are treated just like other entries in this respect, so a null
entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
being added to @var{envz}, if @var{override} is false.
@end deftypefun
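One plausible use of a false @var{override} is to fill in defaults
without disturbing values that are already set. The sketch below assumes
that use; the helper name @code{envz_apply_defaults} is hypothetical.

```c
#include <envz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): add to the vector
   every entry of DEFAULTS whose name it does not already contain.
   Existing entries, including null entries, are left alone because
   OVERRIDE is passed as 0.  Returns the value of envz_merge.  */
int
envz_apply_defaults (char **envz, size_t *envz_len,
                     const char *defaults, size_t defaults_len)
{
  assert (envz != NULL && envz_len != NULL);
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```

Passing a nonzero @var{override} instead would give the opposite policy,
where the second vector wins whenever both define the same name.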

@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_strip} function removes any null entries from @var{envz},
updating @code{*@var{envz}} and @code{*@var{envz_len}}.
@end deftypefun
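A short sketch of the effect follows. It strips the null entries and
then uses @code{argz_count} to report how many name-value pairs remain;
the helper name @code{envz_strip_and_count} is hypothetical.

```c
#include <envz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function): remove all null
   entries from the vector and return the number of entries that
   remain.  Each remaining entry contains a '=' byte.  */
size_t
envz_strip_and_count (char **envz, size_t *envz_len)
{
  assert (envz != NULL && envz_len != NULL);
  envz_strip (envz, envz_len);
  return argz_count (*envz, *envz_len);
}
```

After the call, @code{envz_entry} would no longer be expected to find
the names of the removed null entries.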

@deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{envz_remove} function removes an entry named @var{name} from
@var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}.
@end deftypefun

@c FIXME these are undocumented:
@c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp