@node String and Array Utilities, Character Set Handling, Character Handling, Top
@c %MENU% Utilities for copying and comparing strings and arrays
@chapter String and Array Utilities

Operations on strings (null-terminated byte sequences) are an important part of
many programs. @Theglibc{} provides an extensive set of string
utility functions, including functions for copying, concatenating,
comparing, and searching strings. Many of these functions can also
operate on arbitrary regions of storage; for example, the @code{memcpy}
function can be used to copy the contents of any kind of array.

It's fairly common for beginning C programmers to ``reinvent the wheel''
by duplicating this functionality in their own code, but it pays to
become familiar with the library functions and to make use of them,
since this offers benefits in maintenance, efficiency, and portability.

For instance, you could easily compare one string to another in two
lines of C code, but if you use the built-in @code{strcmp} function,
you're less likely to make a mistake. And, since these library
functions are typically highly optimized, your program may run faster
too.

@menu
* Representation of Strings::   Introduction to basic concepts.
* String/Array Conventions::    Whether to use a string function or an
                                 arbitrary array function.
* String Length::               Determining the length of a string.
* Copying Strings and Arrays::  Functions to copy strings and arrays.
* Concatenating Strings::       Functions to concatenate strings while copying.
* Truncating Strings::          Functions to truncate strings while copying.
* String/Array Comparison::     Functions for byte-wise and character-wise
                                 comparison.
* Collation Functions::         Functions for collating strings.
* Search Functions::            Searching for a specific element or substring.
* Finding Tokens in a String::  Splitting a string into tokens by looking
                                 for delimiters.
* Erasing Sensitive Data::      Clearing memory which contains sensitive
                                 data, after it's no longer needed.
* Shuffling Bytes::             Or how to flash-cook a string.
* Obfuscating Data::            Reversibly obscuring data from casual view.
* Encode Binary Data::          Encoding and Decoding of Binary Data.
* Argz and Envz Vectors::       Null-separated string vectors.
@end menu

@node Representation of Strings
@section Representation of Strings
@cindex string, representation of

This section is a quick summary of string concepts for beginning C
programmers. It describes how strings are represented in C
and some common pitfalls. If you are already familiar with this
material, you can skip this section.

@cindex string
A @dfn{string} is a null-terminated array of bytes of type @code{char},
including the terminating null byte. String-valued
variables are usually declared to be pointers of type @code{char *}.
Such variables do not include space for the text of a string; that has
to be stored somewhere else---in an array variable, a string constant,
or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
you to store the address of the chosen memory space into the pointer
variable. Alternatively you can store a @dfn{null pointer} in the
pointer variable. The null pointer does not point anywhere, so
attempting to reference the string it points to is an error (the
behavior is undefined, and the program typically crashes).

@cindex multibyte character
@cindex multibyte string
@cindex wide string
A @dfn{multibyte character} is a sequence of one or more bytes that
represents a single character using the locale's encoding scheme; a
null byte always represents the null character. A @dfn{multibyte
string} is a string that consists entirely of multibyte
characters. In contrast, a @dfn{wide string} is a null-terminated
sequence of @code{wchar_t} objects. A wide-string variable is usually
declared to be a pointer of type @code{wchar_t *}, by analogy with
string variables and @code{char *}. @xref{Extended Char Intro}.

@cindex null byte
@cindex null wide character
By convention, the @dfn{null byte}, @code{'\0'},
marks the end of a string and the @dfn{null wide character},
@code{L'\0'}, marks the end of a wide string. For example, in
testing to see whether the @code{char *} variable @var{p} points to a
null byte marking the end of a string, you can write
@code{!*@var{p}} or @code{*@var{p} == '\0'}.

A null byte is quite different conceptually from a null pointer,
although both are represented by the integer constant @code{0}.
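
As a minimal sketch of the distinction (the function @code{classify} is hypothetical, used only for illustration), a test for a null pointer must come before any test of the byte it would point to, because a null pointer cannot be dereferenced:

@smallexample
#include <stddef.h>

/* Hypothetical helper: return 1 if P is a null pointer, 2 if P
   points to an empty string (its first byte is the null byte),
   and 0 otherwise.  */
int
classify (const char *p)
@{
  if (p == NULL)      /* no string at all; *p would be an error */
    return 1;
  if (*p == '\0')     /* a valid string of length zero */
    return 2;
  return 0;
@}
@end smallexample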

@cindex string literal
A @dfn{string literal} appears in C program source as a multibyte
string between double-quote characters (@samp{"}). If the
initial double-quote character is immediately preceded by a capital
@samp{L} (ell) character (as in @code{L"foo"}), it is a wide string
literal. String literals can also contribute to @dfn{string
concatenation}: @code{"a" "b"} is the same as @code{"ab"}.
For wide strings one can use either
@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is
not allowed by the GNU C compiler, because literals are placed in
read-only storage.
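
The following sketch (the names @code{greeting} and @code{capitalize} are only illustrative) shows both points: adjacent literals are joined into one array with a single terminating null byte, and initializing an array from a literal copies the bytes, so the array, unlike the literal, may be modified.

@smallexample
/* "hello, " "world" is joined at compile time, so GREETING holds
   12 characters plus one null byte: 13 bytes in all.  */
char greeting[] = "hello, " "world";

void
capitalize (void)
@{
  /* Valid: this modifies the array, not the string literal.
     Writing through a char * that points at a literal instead
     would be undefined.  */
  greeting[0] = 'H';
@}
@end smallexample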

Arrays that are declared @code{const} cannot be modified
either. It's generally good style to declare non-modifiable string
pointers to be of type @code{const char *}, since this often allows the
C compiler to detect accidental modifications as well as providing some
amount of documentation about what your program intends to do with the
string.

The amount of memory allocated for a byte array may extend past the null byte
that marks the end of the string that the array contains. In this
document, the term @dfn{allocated size} is always used to refer to the
total amount of memory allocated for an array, while the term
@dfn{length} refers to the number of bytes up to (but not including)
the terminating null byte. Wide strings are similar, except their
sizes and lengths count wide characters, not bytes.
@cindex length of string
@cindex allocation size of string
@cindex size of string
@cindex string length
@cindex string allocation

A notorious source of program bugs is trying to put more bytes into a
string than fit in its allocated size. When writing code that extends
strings or moves bytes into a pre-allocated array, you should be
very careful to keep track of the length of the text and make explicit
checks for overflowing the array. Many of the library functions
@emph{do not} do this for you! Remember also that you need to allocate
an extra byte to hold the null byte that marks the end of the
string.
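
For example, a copy into a fixed-size array might be guarded as in the following sketch; @code{checked_copy} is a hypothetical helper, not a library function. Note that it needs room for @code{strlen (@var{from}) + 1} bytes, not just @code{strlen (@var{from})}:

@smallexample
#include <string.h>

/* Hypothetical helper: copy FROM into the array TO, whose allocated
   size is TOSIZE.  Return 0 on success, or -1 (leaving TO unchanged)
   if the string plus its terminating null byte would not fit.  */
int
checked_copy (char *to, size_t tosize, const char *from)
@{
  size_t len = strlen (from);
  if (len >= tosize)              /* LEN + 1 bytes are needed */
    return -1;
  memcpy (to, from, len + 1);     /* copies the null byte too */
  return 0;
@}
@end smallexample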

@cindex single-byte string
@cindex multibyte string
Originally strings were sequences of bytes where each byte represented a
single character. This is still true today if the strings are encoded
using a single-byte character encoding. Things are different if the
strings are encoded using a multibyte encoding (for more information on
encodings see @ref{Extended Char Intro}). There is no difference in
the programming interface for these two kinds of strings; the programmer
has to be aware of this and interpret the byte sequences accordingly.

But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
@code{memcpy} could cut a multibyte character in the middle and put an
incomplete (and therefore unusable) byte sequence in the target buffer.
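
As an illustration, assuming the strings are encoded in UTF-8 (where every continuation byte has the bit pattern @samp{10xxxxxx}), a hypothetical helper @code{utf8_cut} can move a proposed cut point back to a character boundary before the bytes are copied. Other multibyte encodings need not have this property and require the locale-aware conversion functions instead (@pxref{Extended Char Intro}).

@smallexample
#include <stddef.h>

/* Hypothetical helper: return the largest N <= MAXLEN such that the
   first N bytes of S end on a character boundary.  Assumes S is valid
   UTF-8 and MAXLEN <= strlen (S), so S[MAXLEN] may be read.  */
size_t
utf8_cut (const char *s, size_t maxlen)
@{
  size_t n = maxlen;
  /* While the byte just past the cut is a continuation byte, the
     cut falls inside a character; move it back one byte.  */
  while (n > 0 && ((unsigned char) s[n] & 0xC0) == 0x80)
    n--;
  return n;
@}
@end smallexample

The first @var{n} bytes can then be copied with @code{memcpy} without leaving an incomplete character in the target buffer.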

@cindex wide string
To avoid these problems, later versions of the @w{ISO C} standard
introduced a second set of functions that operate on @dfn{wide
characters} (@pxref{Extended Char Intro}). These functions don't have
the problems the single-byte versions have since every wide character is
a legal, interpretable value. This does not mean that cutting wide
strings at arbitrary points is without problems. For alphabet-based
languages it usually is (except for non-normalized text), but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit. This is a
higher level problem which the @w{C library} functions are not designed
to solve. But it is at least good that no invalid byte sequences can be
created. Also, higher level functions can operate much more easily
on wide characters than on multibyte characters, so a common strategy
is to use wide characters internally whenever text is more than simply
copied.

The remainder of this chapter discusses the functions for handling
wide strings in parallel with the discussion of
strings, since there is almost always an exact equivalent
available.

@node String/Array Conventions
@section String and Array Conventions

This chapter describes both functions that work on arbitrary arrays or
blocks of memory, and functions that are specific to strings and wide
strings.

Functions that operate on arbitrary blocks of memory have names
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
@code{wmemcpy}) and invariably take an argument which specifies the size
(in bytes and wide characters respectively) of the block of memory to
operate on. The array arguments and return values for these functions
have type @code{void *} or @code{wchar_t *}. As a matter of style, the
elements of the arrays used with the @samp{mem} functions are referred
to as ``bytes''. You can pass any kind of pointer to these functions,
and the @code{sizeof} operator is useful in computing the value for the
size argument. Parameters to the @samp{wmem} functions must be of type
@code{wchar_t *}. These functions are not really usable with anything
but arrays of this type.
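
The difference in units can be seen in the following sketch (the array and function names are hypothetical): the size argument to @code{memset} is a byte count computed with @code{sizeof}, while the size argument to @code{wmemset} counts wide characters, so the byte size must be divided by @code{sizeof (wchar_t)}.

@smallexample
#include <string.h>
#include <wchar.h>

int nums[8];
wchar_t wbuf[8];

void
fill_arrays (void)
@{
  /* memset counts bytes: sizeof nums is 8 * sizeof (int).  */
  memset (nums, 0, sizeof nums);
  /* wmemset counts wide characters, so pass the element count;
     passing sizeof wbuf would overrun the array.  */
  wmemset (wbuf, L'x', sizeof wbuf / sizeof wbuf[0]);
@}
@end smallexample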

In contrast, functions that operate specifically on strings and wide
strings have names beginning with @samp{str} and @samp{wcs}
respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
terminating null byte or null wide character instead of requiring an explicit
size argument to be passed. (Some of these functions accept a specified
maximum length, but they also check for premature termination.)
The array arguments and return values for these
functions have type @code{char *} and @code{wchar_t *} respectively, and
the array elements are referred to as ``bytes'' and ``wide
characters''.

In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
versions of a function. The one that is more appropriate to use depends
on the exact situation. When your program is manipulating arbitrary
arrays or blocks of storage, then you should always use the @samp{mem}
functions. On the other hand, when you are manipulating
strings it is usually more convenient to use the @samp{str}/@samp{wcs}
functions, unless you already know the length of the string in advance.
The @samp{wmem} functions should be used for wide character arrays with
known size.

@cindex wint_t
@cindex parameter promotion
Some of the memory and string functions take single characters as
arguments. Since a value of type @code{char} is automatically promoted
into a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
In case of the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}. For many implementations this would not be necessary,
since @code{wchar_t} is large enough that it is not automatically
promoted, but since the @w{ISO C} standard does not require such a
choice of types the @code{wint_t} type is used.

@node String Length
@section String Length

You can get the length of a string using the @code{strlen} function.
This function is declared in the header file @file{string.h}.
@pindex string.h

@deftypefun size_t strlen (const char *@var{s})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strlen} function returns the length of the
string @var{s} in bytes. (In other words, it returns the offset of the
terminating null byte within the array.)

For example,
@smallexample
strlen ("hello, world")
    @result{} 12
@end smallexample

When applied to an array, the @code{strlen} function returns
the length of the string stored there, not its allocated size. You can
get the allocated size of the array that holds a string using
the @code{sizeof} operator:

@smallexample
char string[32] = "hello, world";
sizeof (string)
    @result{} 32
strlen (string)
    @result{} 12
@end smallexample

But beware, this will not work unless @var{string} is the
array itself, not a pointer to it. For example:

@smallexample
char string[32] = "hello, world";
char *ptr = string;
sizeof (string)
    @result{} 32
sizeof (ptr)
    @result{} 4    /* @r{(on a machine with 4 byte pointers)} */
@end smallexample

This is an easy mistake to make when you are working with functions that
take string arguments; those arguments are always pointers, not arrays.
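
The following sketch (@code{param_size} is a hypothetical function) shows the pitfall: a parameter declared with array syntax is adjusted to a pointer, so @code{sizeof} inside the function yields the size of a @code{char *} regardless of the declared bound. A function that needs the allocated size should therefore take it as a separate argument.

@smallexample
#include <stddef.h>

char string[32] = "hello, world";

/* Although BUF is written as an array, it is really a char *,
   so this returns sizeof (char *), not 32.  */
size_t
param_size (char buf[32])
@{
  return sizeof buf;
@}
@end smallexample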

It must also be noted that for multibyte encoded strings the return
value does not have to correspond to the number of characters in the
string. To get this value the string can be converted to wide
characters and @code{wcslen} can be used, or something like the following
code can be used:

@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.} */
@{
  mbstate_t t;
  const char *scopy = string;
  /* In initial state.  */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters.  */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
@end smallexample

If @code{string} contains an invalid multibyte sequence,
@code{mbsrtowcs} returns @code{(size_t) -1} instead of a count.
This is cumbersome, so if the number of characters (as opposed to
bytes) is needed often it is better to work with wide characters.
@end deftypefun

The wide character equivalent is declared in @file{wchar.h}.

@deftypefun size_t wcslen (const wchar_t *@var{ws})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcslen} function is the wide character equivalent to
@code{strlen}. The return value is the number of wide characters in the
wide string pointed to by @var{ws} (this is also the offset of
the terminating null wide character of @var{ws}).

Since no wide character is made up of a sequence of several wide
characters, the return value is not only the offset in the array, it is
also the number of wide characters.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
If the array @var{s} of size @var{maxlen} contains a null byte,
the @code{strnlen} function returns the length of the string @var{s} in
bytes. Otherwise it
returns @var{maxlen}. Therefore this function is equivalent to
@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
but it
is more efficient and works even if @var{s} is not null-terminated so
long as @var{maxlen} does not exceed the size of @var{s}'s array.

@smallexample
char string[32] = "hello, world";
strnlen (string, 32)
    @result{} 12
strnlen (string, 5)
    @result{} 5
@end smallexample
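
@code{strnlen} is particularly useful for fixed-width fields that are padded with null bytes when short but are not null-terminated when full. The following is a sketch under that assumption; @code{struct record} and @code{record_name_len} are hypothetical:

@smallexample
#include <string.h>

/* Hypothetical record layout: NAME is padded with null bytes when
   shorter than 8 bytes, but has no terminating null byte when full.  */
struct record
@{
  char name[8];
@};

/* Return the length of the name without reading past the field.  */
size_t
record_name_len (const struct record *r)
@{
  return strnlen (r->name, sizeof r->name);
@}
@end smallexample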

This function is a GNU extension and is declared in @file{string.h}.
@end deftypefun

@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
@var{maxlen} parameter specifies the maximum number of wide characters.

This function is a GNU extension and is declared in @file{wchar.h}.
@end deftypefun

@node Copying Strings and Arrays
@section Copying Strings and Arrays

You can use the functions described in this section to copy the contents
of strings, wide strings, and arrays. The @samp{str} and @samp{mem}
functions are declared in @file{string.h} while the @samp{w} functions
are declared in @file{wchar.h}.
@pindex string.h
@pindex wchar.h
@cindex copying strings and arrays
@cindex string copy functions
@cindex array copy functions
@cindex concatenating strings
@cindex string concatenation functions

A helpful way to remember the ordering of the arguments to the functions
in this section is that it corresponds to an assignment expression, with
the destination array specified to the left of the source array. Most
of these functions return the address of the destination array; a few
return the address of the destination's terminating null byte, or of just
past the destination.

Most of these functions do not work properly if the source and
destination arrays overlap. For example, if the beginning of the
destination array overlaps the end of the source array, the original
contents of that part of the source array may get overwritten before it
is copied. Even worse, in the case of the string functions, the null
byte marking the end of the string may be lost, and the copy
function might get stuck in a loop trashing all the memory allocated to
your program.

All functions that have problems copying between overlapping arrays are
explicitly identified in this manual. In addition to functions in this
section, there are a few others like @code{sprintf} (@pxref{Formatted
Output Functions}) and @code{scanf} (@pxref{Formatted Input
Functions}).

@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{memcpy} function copies @var{size} bytes from the object
beginning at @var{from} into the object beginning at @var{to}. The
behavior of this function is undefined if the two arrays @var{to} and
@var{from} overlap; use @code{memmove} instead if overlapping is possible.

The value returned by @code{memcpy} is the value of @var{to}.

Here is an example of how you might use @code{memcpy} to copy the
contents of an array:

@smallexample
struct foo *oldarray, *newarray;
size_t arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
@end smallexample
@end deftypefun

@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmemcpy} function copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object beginning at @var{wto}. The
behavior of this function is undefined if the two arrays @var{wto} and
@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.

The following is a possible implementation of @code{wmemcpy}, but there
are more optimizations possible.

@smallexample
wchar_t *
wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemcpy} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{mempcpy} function is nearly identical to the @code{memcpy}
function. It copies @var{size} bytes from the object beginning at
@var{from} into the object pointed to by @var{to}. But instead of
returning the value of @var{to} it returns a pointer to the byte
following the last written byte in the object beginning at @var{to}.
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.

This function is useful in situations where a number of objects are to be
copied to consecutive memory positions.

@smallexample
void *
combine (void *o1, size_t s1, void *o2, size_t s2)
@{
  void *result = malloc (s1 + s2);
  if (result != NULL)
    mempcpy (mempcpy (result, o1, s1), o2, s2);
  return result;
@}
@end smallexample

This function is a GNU extension.
@end deftypefun

@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
function. It copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object pointed to by @var{wto}. But
instead of returning the value of @var{wto} it returns a pointer to the
wide character following the last written wide character in the object
beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}.

This function is useful in situations where a number of objects are to be
copied to consecutive memory positions.

The following is a possible implementation of @code{wmempcpy}, but there
are more optimizations possible.

@smallexample
wchar_t *
wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
          size_t size)
@{
  return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

This function is a GNU extension.
@end deftypefun

@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{memmove} copies the @var{size} bytes at @var{from} into the
@var{size} bytes at @var{to}, even if those two blocks of space
overlap. In the case of overlap, @code{memmove} is careful to copy the
original values of the bytes in the block at @var{from}, including those
bytes which also belong to the block at @var{to}.

The value returned by @code{memmove} is the value of @var{to}.
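
For example, removing a prefix from a string in place requires @code{memmove}, because the source and destination overlap. In this sketch @code{drop_prefix} is a hypothetical helper, and @var{n} is assumed not to exceed the length of the string:

@smallexample
#include <string.h>

/* Hypothetical helper: remove the first N bytes of the string in BUF
   by shifting the rest, including the terminating null byte, down.
   The blocks overlap, so memcpy would be undefined here.  */
void
drop_prefix (char *buf, size_t n)
@{
  size_t len = strlen (buf);
  memmove (buf, buf + n, len - n + 1);
@}
@end smallexample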
491@end deftypefun
492
@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
into the @var{size} wide characters at @var{wto}, even if those two
blocks of space overlap. In the case of overlap, @code{wmemmove} is
careful to copy the original values of the wide characters in the block
at @var{wfrom}, including those wide characters which also belong to the
block at @var{wto}.

The following is a possible implementation of @code{wmemmove}, but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemmove} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
@standards{SVID, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies no more than @var{size} bytes from @var{from} to
@var{to}, stopping if a byte matching @var{c} is found. The return
value is a pointer into @var{to} one byte past where @var{c} was copied,
or a null pointer if no byte matching @var{c} appeared in the first
@var{size} bytes of @var{from}.
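
Passing the null byte as @var{c} turns @code{memccpy} into a bounded string copy that reports truncation, as in the following sketch; @code{copy_bounded} is a hypothetical helper, and @var{size} is assumed to be nonzero:

@smallexample
#include <string.h>

/* Hypothetical helper: copy the string FROM into the array TO of
   allocated size SIZE (which must be nonzero).  Return 0 if the whole
   string fit, or -1 if it was truncated.  TO is null-terminated in
   both cases.  */
int
copy_bounded (char *to, const char *from, size_t size)
@{
  /* memccpy returns non-null only if it copied the null byte
     within the first SIZE bytes.  */
  if (memccpy (to, from, '\0', size) != NULL)
    return 0;
  to[size - 1] = '\0';    /* truncated: terminate the copy */
  return -1;
@}
@end smallexample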
@end deftypefun

@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies the value of @var{c} (converted to an
@code{unsigned char}) into each of the first @var{size} bytes of the
object beginning at @var{block}. It returns the value of @var{block}.
@end deftypefun

@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies the value of @var{wc} into each of the first
@var{size} wide characters of the object beginning at @var{block}. It
returns the value of @var{block}.
@end deftypefun

@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This copies bytes from the string @var{from} (up to and including
the terminating null byte) into the string @var{to}. Like
@code{memcpy}, this function has undefined results if the strings
overlap. The return value is the value of @var{to}.
@end deftypefun

@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This copies wide characters from the wide string @var{wfrom} (up to and
including the terminating null wide character) into the string
@var{wto}. Like @code{wmemcpy}, this function has undefined results if
the strings overlap. The return value is the value of @var{wto}.
@end deftypefun

@deftypefun {char *} strdup (const char *@var{s})
@standards{SVID, string.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function copies the string @var{s} into a newly
allocated string. The string is allocated using @code{malloc}; see
@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
for the new string, @code{strdup} returns a null pointer. Otherwise it
returns a pointer to the new string.
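
Because the copy is allocated with @code{malloc}, the caller must check for a null pointer and eventually release the copy with @code{free}. A minimal sketch, using the hypothetical helper @code{make_upper_copy}:

@smallexample
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of NAME with its
   first letter in upper case, or a null pointer if memory is
   exhausted.  The caller must free the result.  */
char *
make_upper_copy (const char *name)
@{
  char *copy = strdup (name);
  if (copy == NULL)
    return NULL;
  /* The copy is writable, unlike NAME, which may be a literal.  */
  copy[0] = toupper ((unsigned char) copy[0]);
  return copy;
@}
@end smallexample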
572@end deftypefun
573
@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function copies the wide string @var{ws}
into a newly allocated string. The string is allocated using
@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
cannot allocate space for the new string, @code{wcsdup} returns a null
pointer. Otherwise it returns a pointer to the new wide string.

This function is a GNU extension.
@end deftypefun

@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
@standards{Unknown origin, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{strcpy}, except that it returns a pointer to
the end of the string @var{to} (that is, the address of the terminating
null byte @code{to + strlen (from)}) rather than the beginning.

For example, this program uses @code{stpcpy} to concatenate @samp{foo}
and @samp{bar} to produce @samp{foobar}, which it then prints.

@smallexample
@include stpcpy.c.texi
@end smallexample

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} and other systems as an extension long before
it was standardized.

Its behavior is undefined if the strings overlap. The function is
declared in @file{string.h}.
@end deftypefun

@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscpy}, except that it returns a pointer to
the end of the string @var{wto} (that is, the address of the terminating
null wide character @code{wto + wcslen (wfrom)}) rather than the beginning.

This function is not part of ISO or POSIX but was found useful while
developing @theglibc{} itself.

The behavior of @code{wcpcpy} is undefined if the strings overlap.

@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}.
@end deftypefun

@deftypefn {Macro} {char *} strdupa (const char *@var{s})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strdup} but allocates the new string
using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
Automatic}). This means, of course, that the returned string has the same
limitations as any block of memory allocated using @code{alloca}.

For obvious reasons @code{strdupa} is implemented only as a macro;
you cannot get the address of this function. Despite this limitation
it is a useful function. The following code shows a situation where
using @code{malloc} would be a lot more expensive.

@smallexample
@include strdupa.c.texi
@end smallexample

Please note that calling @code{strtok} using @var{path} directly is
invalid. It is also not allowed to call @code{strdupa} in the argument
list of @code{strtok}, since @code{strdupa} uses @code{alloca}
(@pxref{Variable Size Automatic}), which can interfere with the parameter
passing.

This function is only available if GNU CC is used.
@end deftypefn

0a13c9e9 649@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
d08a7e4c 650@standards{BSD, string.h}
11087373 651@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0a13c9e9
PE
652This is a partially obsolete alternative for @code{memmove}, derived from
653BSD. Note that it is not quite equivalent to @code{memmove}, because the
654arguments are not in the same order and there is no return value.
655@end deftypefun
706074a5 656
0a13c9e9 657@deftypefun void bzero (void *@var{block}, size_t @var{size})
d08a7e4c 658@standards{BSD, string.h}
0a13c9e9
PE
659@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
660This is a partially obsolete alternative for @code{memset}, derived from
661BSD. Note that it is not as general as @code{memset}, because the only
662value it can store is zero.
663@end deftypefun
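To make the differences concrete, here is a sketch of how the two BSD functions map onto their ISO C counterparts. The names @code{my_bcopy} and @code{my_bzero} are hypothetical, and new code would normally call @code{memmove} and @code{memset} directly:

```c
#include <string.h>

/* Hypothetical stand-in for bcopy.  Note the swapped argument order:
   bcopy takes the source first, memmove takes the destination first.  */
void
my_bcopy (const void *from, void *to, size_t size)
{
  memmove (to, from, size);  /* Overlapping regions are handled.  */
}

/* Hypothetical stand-in for bzero: memset with a fixed value of 0.  */
void
my_bzero (void *block, size_t size)
{
  memset (block, 0, size);
}
```

Because @code{memmove} handles overlapping regions, so does @code{my_bcopy}, matching the behavior of @code{bcopy}.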

@node Concatenating Strings
@section Concatenating Strings
@pindex string.h
@pindex wchar.h
@cindex concatenating strings
@cindex string concatenation functions

The functions described in this section concatenate the contents of a
string or wide string to another. They follow the string-copying
functions in their conventions. @xref{Copying Strings and Arrays}.
@samp{strcat} is declared in the header file @file{string.h} while
@samp{wcscat} is declared in @file{wchar.h}.

@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strcat} function is similar to @code{strcpy}, except that the
bytes from @var{from} are concatenated or appended to the end of
@var{to}, instead of overwriting it. That is, the first byte from
@var{from} overwrites the null byte marking the end of @var{to}.

An equivalent definition for @code{strcat} would be:

@smallexample
char *
strcat (char *restrict to, const char *restrict from)
@{
  strcpy (to + strlen (to), from);
  return to;
@}
@end smallexample

This function has undefined results if the strings overlap.

As noted below, this function has significant performance issues.
@end deftypefun

@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcscat} function is similar to @code{wcscpy}, except that the
wide characters from @var{wfrom} are concatenated or appended to the end of
@var{wto}, instead of overwriting it. That is, the first wide character from
@var{wfrom} overwrites the null wide character marking the end of @var{wto}.

An equivalent definition for @code{wcscat} would be:

@smallexample
wchar_t *
wcscat (wchar_t *wto, const wchar_t *wfrom)
@{
  wcscpy (wto + wcslen (wto), wfrom);
  return wto;
@}
@end smallexample

This function has undefined results if the strings overlap.

As noted below, this function has significant performance issues.
@end deftypefun

Programmers using the @code{strcat} or @code{wcscat} function (or the
@code{strncat} or @code{wcsncat} functions defined in
a later section, for that matter)
can easily be recognized as lazy and reckless. In almost all situations
the lengths of the participating strings are known (they had better be;
otherwise, how can one ensure the allocated size of the buffer is
sufficient?), or at least one could know them by keeping track of the
results of the various function calls. But then it is very inefficient
to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the
end of the destination string so that the actual copying can start.
This is a common example:

@cindex va_copy
@smallexample
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* @r{This function concatenates arbitrarily many strings. The last}
   @r{parameter must be @code{NULL}.} */
char *
concat (const char *str, @dots{})
@{
  va_list ap, ap2;
  size_t total = 1;

  va_start (ap, str);
  va_copy (ap2, ap);

  /* @r{Determine how much space we need.} */
  for (const char *s = str; s != NULL; s = va_arg (ap, const char *))
    total += strlen (s);

  va_end (ap);

  char *result = malloc (total);
  if (result != NULL)
    @{
      result[0] = '\0';

      /* @r{Copy the strings.} */
      for (const char *s = str; s != NULL; s = va_arg (ap2, const char *))
        strcat (result, s);
    @}

  va_end (ap2);

  return result;
@}
@end smallexample

This looks quite simple, especially the second loop where the strings
are actually copied. But these innocent lines hide a major performance
penalty. Just imagine that ten strings of 100 bytes each have to be
concatenated. For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string;
for the third string we search 200 bytes, and so on. In total, the
comparisons necessary to find the end of the intermediate results sum
up to 100 + 200 + @dots{} + 900 = 4500! If we combine the copying with
the computation of the size to allocate, we can write this function
more efficiently:

@smallexample
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

char *
concat (const char *str, @dots{})
@{
  size_t allocated = 100;
  char *result = malloc (allocated);

  if (result != NULL)
    @{
      va_list ap;
      size_t resultlen = 0;
      char *newp;

      va_start (ap, str);

      for (const char *s = str; s != NULL; s = va_arg (ap, const char *))
        @{
          size_t len = strlen (s);

          /* @r{Resize the allocated memory if necessary.} */
          if (resultlen + len + 1 > allocated)
            @{
              allocated += len;
              newp = reallocarray (result, allocated, 2);
              allocated *= 2;
              if (newp == NULL)
                @{
                  va_end (ap);
                  free (result);
                  return NULL;
                @}
              result = newp;
            @}

          memcpy (result + resultlen, s, len);
          resultlen += len;
        @}

      /* @r{Terminate the result string.} */
      result[resultlen++] = '\0';

      /* @r{Resize memory to the optimal size.} */
      newp = realloc (result, resultlen);
      if (newp != NULL)
        result = newp;

      va_end (ap);
    @}

  return result;
@}
@end smallexample

With a bit more knowledge about the input strings one could fine-tune
the memory allocation. The difference we are pointing to here is that
we don't use @code{strcat} anymore. We always keep track of the length
of the current intermediate result, so we can save ourselves the search
for the end of the string and copy each piece directly to its final
position with @code{memcpy}. Please note that we also don't use
@code{stpcpy}, which might seem more natural since we are handling
strings. This is not necessary because we already know the length of
each string and can therefore use the faster memory copying function.
The example would work for wide characters the same way.

Whenever a programmer feels the need to use @code{strcat}, she or he
should think twice and look through the program to see whether the code
can be rewritten to take advantage of already calculated results. Again:
it is almost always unnecessary to use @code{strcat}.
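The length-tracking idea carries over to wide strings. The following is a minimal sketch of a wide-character variant, where the name @code{wconcat} is hypothetical. Unlike the second @code{concat} above, it sizes the buffer in a first pass instead of growing it. Like that version, though, it never searches the partial result for its end, because each piece is placed with @code{wmemcpy} at a tracked offset:

```c
#include <stdarg.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wide-character counterpart of concat.  The last
   argument must be a null pointer.  Returns a newly allocated wide
   string, or a null pointer if allocation fails.  */
wchar_t *
wconcat (const wchar_t *str, ...)
{
  va_list ap, ap2;
  size_t total = 1;  /* Room for the terminating L'\0'.  */

  va_start (ap, str);
  va_copy (ap2, ap);

  /* First pass: determine how much space we need.  */
  for (const wchar_t *s = str; s != NULL; s = va_arg (ap, const wchar_t *))
    total += wcslen (s);
  va_end (ap);

  wchar_t *result = malloc (total * sizeof (wchar_t));
  if (result != NULL)
    {
      size_t len = 0;  /* Length of the result so far.  */

      /* Second pass: copy each piece to its known offset.  */
      for (const wchar_t *s = str; s != NULL;
           s = va_arg (ap2, const wchar_t *))
        {
          size_t n = wcslen (s);
          wmemcpy (result + len, s, n);
          len += n;
        }
      result[len] = L'\0';
    }
  va_end (ap2);

  return result;
}
```

As with @code{concat}, the final argument should be written as a null pointer of the right type, for example @code{(const wchar_t *) NULL}. For brevity the sketch does not check @code{total * sizeof (wchar_t)} for overflow.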

@node Truncating Strings
@section Truncating Strings while Copying
@cindex truncating strings
@cindex string truncation

The functions described in this section copy or concatenate the
possibly-truncated contents of a string or array to another, and
similarly for wide strings. They follow the string-copying functions
in their header conventions. @xref{Copying Strings and Arrays}. The
@samp{str} functions are declared in the header file @file{string.h}
and the @samp{wc} functions are declared in the file @file{wchar.h}.

@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@standards{C90, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strcpy} but always copies exactly
@var{size} bytes into @var{to}.

If @var{from} does not contain a null byte in its first @var{size}
bytes, @code{strncpy} copies just the first @var{size} bytes. In this
case no null terminator is written into @var{to}.

Otherwise @var{from} must be a string with length less than
@var{size}. In this case @code{strncpy} copies all of @var{from},
followed by enough null bytes to add up to @var{size} bytes in all.

The behavior of @code{strncpy} is undefined if the strings overlap.

This function was designed for now-rarely-used arrays consisting of
non-null bytes followed by zero or more null bytes. It needs to set
all @var{size} bytes of the destination, even when @var{size} is much
greater than the length of @var{from}. As noted below, this function
is generally a poor choice for processing text.
@end deftypefun
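The following sketch shows the kind of fixed-width, null-padded field that @code{strncpy} was designed for. The @code{struct record} type and its helpers are hypothetical, chosen only for illustration. Because the field may lack a terminator, reading it back uses @code{strnlen} bounded by the field width:

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen */
#include <string.h>

/* Hypothetical record with a fixed-width, null-padded name field.
   The field holds at most 8 non-null bytes and is not necessarily
   null-terminated.  */
struct record
{
  char name[8];
};

void
record_set_name (struct record *r, const char *name)
{
  /* Copies up to 8 bytes of NAME and pads any remainder with null
     bytes.  If NAME has 8 or more bytes, no terminator is stored.  */
  strncpy (r->name, name, sizeof r->name);
}

/* Length of the stored name: stop at the first null byte or at the
   end of the field, whichever comes first.  */
size_t
record_name_len (const struct record *r)
{
  return strnlen (r->name, sizeof r->name);
}
```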

@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcscpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If @var{wfrom} does not contain a null wide character in its first
@var{size} wide characters, then @code{wcsncpy} copies just the first
@var{size} wide characters. In this case no null terminator is
written into @var{wto}.

Otherwise @var{wfrom} must be a wide string with length less than
@var{size}. In this case @code{wcsncpy} copies all of @var{wfrom},
followed by enough null wide characters to add up to @var{size} wide
characters in all.

The behavior of @code{wcsncpy} is undefined if the strings overlap.

This function is the wide-character counterpart of @code{strncpy} and
suffers from most of the problems that @code{strncpy} does. For
example, as noted below, this function is generally a poor choice for
processing text.
@end deftypefun

@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function is similar to @code{strdup} but always copies at most
@var{size} bytes into the newly allocated string.

If the length of @var{s} is more than @var{size}, then @code{strndup}
copies just the first @var{size} bytes and adds a closing null byte.
Otherwise all bytes are copied and the string is terminated.

This function differs from @code{strncpy} in that it always terminates
the destination string.

As noted below, this function is generally a poor choice for
processing text.

@code{strndup} is a GNU extension.
@end deftypefun
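One case where @code{strndup} fits well is copying a prefix whose length has already been computed from the data, so nothing is cut off arbitrarily. Below is a minimal sketch, with @code{first_field} as a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L  /* for strndup */
#include <stdlib.h>              /* for free, used by callers */
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the first
   colon-separated field of LINE, or a null pointer if allocation
   fails.  The caller must free the result.  */
char *
first_field (const char *line)
{
  size_t n = strcspn (line, ":");  /* Bytes before the first ':'.  */
  return strndup (line, n);        /* Always null-terminated.  */
}
```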

@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strndup} but, like @code{strdupa}, it
allocates the new string using @code{alloca} (@pxref{Variable Size
Automatic}). The same advantages and limitations of @code{strdupa} are
valid for @code{strndupa}, too.

This function is implemented only as a macro, just like @code{strdupa}.
Like @code{strdupa}, this macro must not be used inside the
parameter list in a function call.

As noted below, this function is generally a poor choice for
processing text.

@code{strndupa} is only available if GNU CC is used.
@end deftypefn

@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{stpcpy} but always copies exactly
@var{size} bytes into @var{to}.

If the length of @var{from} is @var{size} or more, then @code{stpncpy}
copies just the first @var{size} bytes and returns a pointer to the
byte directly following the one which was copied last. Note that in
this case there is no null terminator written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null bytes to add up
to @var{size} bytes in all. This padding is rarely useful, but it
matches the behavior of @code{strncpy} for contexts that rely on it.
In this case @code{stpncpy} returns a pointer to the
@emph{first} written null byte.

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} as an extension before it was standardized.

Its behavior is undefined if the strings overlap. The function is
declared in @file{string.h}.

As noted below, this function is generally a poor choice for
processing text.
@end deftypefun
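Because @code{stpncpy} returns the end of the copied data in both cases, it can report how much of a padded field is in use without a separate length computation. Below is a minimal sketch, with @code{store_padded} as a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L  /* for stpncpy */
#include <string.h>

/* Hypothetical helper: store SRC in a null-padded field of WIDTH
   bytes and return how many non-null bytes were stored.  The result
   is WIDTH when SRC did not fit (no terminator is stored then).  */
size_t
store_padded (char *field, size_t width, const char *src)
{
  char *end = stpncpy (field, src, width);
  return (size_t) (end - field);
}
```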

@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If the length of @var{wfrom} is @var{size} or more, then
@code{wcpncpy} copies just the first @var{size} wide characters and
returns a pointer to the wide character directly following the last one
copied. Note that in this case there is no null terminator written into
@var{wto}.

If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
copies all of @var{wfrom}, followed by enough null wide characters to add up
to @var{size} wide characters in all. This padding is rarely useful, but it
matches the behavior of @code{wcsncpy} for contexts that rely on it.
In this case @code{wcpncpy} returns a pointer to the
@emph{first} written null wide character.

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} as an extension before it was standardized.

Its behavior is undefined if the strings overlap.

As noted below, this function is generally a poor choice for
processing text.

@code{wcpncpy} is declared in @file{wchar.h}.
@end deftypefun

@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{strcat} except that not more than @var{size}
bytes from @var{from} are appended to the end of @var{to}, and
@var{from} need not be null-terminated. A single null byte is also
always appended to @var{to}, so the total
allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
longer than its initial length.

The @code{strncat} function could be implemented like this:

@smallexample
@group
char *
strncat (char *to, const char *from, size_t size)
@{
  size_t len = strlen (to);
  memcpy (to + len, from, strnlen (from, size));
  to[len + strnlen (from, size)] = '\0';
  return to;
@}
@end group
@end smallexample

The behavior of @code{strncat} is undefined if the strings overlap.

As a companion to @code{strncpy}, @code{strncat} was designed for
now-rarely-used arrays consisting of non-null bytes followed by zero
or more null bytes. As noted below, this function is generally a poor
choice for processing text. Also, this function has significant
performance issues. @xref{Concatenating Strings}.
@end deftypefun

@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto},
and @var{wfrom} need not be null-terminated. A single null wide
character is also always appended to @var{wto}, so the total allocated
size of @var{wto} must be at least @code{wcsnlen (@var{wfrom},
@var{size}) + 1} wide characters longer than its initial length.

The @code{wcsncat} function could be implemented like this:

@smallexample
@group
wchar_t *
wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  size_t len = wcslen (wto);
  memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t));
  wto[len + wcsnlen (wfrom, size)] = L'\0';
  return wto;
@}
@end group
@end smallexample

The behavior of @code{wcsncat} is undefined if the strings overlap.

As noted below, this function is generally a poor choice for
processing text. Also, this function has significant performance
issues. @xref{Concatenating Strings}.
@end deftypefun

Because these functions can abruptly truncate strings or wide strings,
they are generally poor choices for processing text. When copying or
concatenating multibyte strings, they can truncate within a multibyte
character so that the result is not a valid multibyte string. When
combining or concatenating multibyte or wide strings, they may
truncate the output after a combining character, resulting in a
corrupted grapheme. They can cause bugs even when processing
single-byte strings: for example, when calculating an ASCII-only user
name, a truncated name can identify the wrong user.

Although some buffer overruns can be prevented by manually replacing
calls to copying functions with calls to truncation functions, there
are often easier and safer automatic techniques that cause buffer
overruns to reliably terminate a program, such as GCC's
@option{-fcheck-pointer-bounds} and @option{-fsanitize=address}
options. @xref{Debugging Options,, Options for Debugging Your Program
or GCC, gcc, Using GCC}. Because truncation functions can mask
application bugs that would otherwise be caught by the automatic
techniques, these functions should be used only when the application's
underlying logic requires truncation.

@strong{Note:} GNU programs should not truncate strings or wide
strings to fit arbitrary size limits. @xref{Semantics, , Writing
Robust Programs, standards, The GNU Coding Standards}. Instead of
string-truncation functions, it is usually better to use dynamic
memory allocation (@pxref{Unconstrained Allocation}) and functions
such as @code{strdup} or @code{asprintf} to construct strings.
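As an illustration of the dynamic-allocation approach, the following sketch builds a path name in memory sized from the inputs, so nothing needs to be truncated. The helper @code{make_path} is hypothetical. Like the second @code{concat} example earlier, it tracks lengths so that each piece is copied once with @code{memcpy}:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "DIR/FILE" in freshly allocated memory
   sized from the input lengths, instead of truncating into a fixed
   buffer.  Returns a null pointer if allocation fails; the caller
   must free the result.  */
char *
make_path (const char *dir, const char *file)
{
  size_t dlen = strlen (dir);
  size_t flen = strlen (file);
  char *p = malloc (dlen + 1 + flen + 1);  /* Room for '/' and '\0'.  */

  if (p != NULL)
    {
      memcpy (p, dir, dlen);
      p[dlen] = '/';
      memcpy (p + dlen + 1, file, flen + 1);  /* Includes the '\0'.  */
    }
  return p;
}
```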

@node String/Array Comparison
@section String/Array Comparison
@cindex comparing strings and arrays
@cindex string comparison functions
@cindex array comparison functions
@cindex predicates on strings
@cindex predicates on arrays

You can use the functions in this section to perform comparisons on the
contents of strings and arrays. As well as checking for equality, these
functions can also be used as the ordering functions for sorting
operations. @xref{Searching and Sorting}, for an example of this.

Unlike most comparison operations in C, the string comparison functions
return a nonzero value if the strings are @emph{not} equivalent rather
than if they are. The sign of the value indicates the relative ordering
of the first part of the strings that are not equivalent: a
negative value indicates that the first string is ``less'' than the
second, while a positive value indicates that the first string is
``greater''.

The most common use of these functions is to check only for equality.
This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.

All of these functions are declared in the header file @file{string.h},
except for the wide-character functions, which are declared in
@file{wchar.h}.
@pindex string.h

@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{memcmp} compares the @var{size} bytes of memory
beginning at @var{a1} against the @var{size} bytes of memory beginning
at @var{a2}. The value returned has the same sign as the difference
between the first differing pair of bytes (interpreted as @code{unsigned
char} objects, then promoted to @code{int}).

If the contents of the two blocks are equal, @code{memcmp} returns
@code{0}.
@end deftypefun

@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{wmemcmp} compares the @var{size} wide characters
beginning at @var{a1} against the @var{size} wide characters beginning
at @var{a2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{a1} is
smaller or larger than the corresponding wide character in @var{a2}.

If the contents of the two blocks are equal, @code{wmemcmp} returns
@code{0}.
@end deftypefun

On arbitrary arrays, the @code{memcmp} function is mostly useful for
testing equality. It usually isn't meaningful to do byte-wise ordering
comparisons on arrays of things other than bytes. For example, a
byte-wise comparison on the bytes that make up floating-point numbers
isn't likely to tell you anything about the relationship between the
values of the floating-point numbers.

@code{wmemcmp} is really only useful to compare arrays of type
@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
at a time and this number of bytes is system dependent.

You should also be careful about using @code{memcmp} to compare objects
that can contain ``holes'', such as the padding inserted into structure
objects to enforce alignment requirements, extra space at the end of
unions, and extra bytes at the ends of strings whose length is less
than their allocated size. The contents of these ``holes'' are
indeterminate and may cause strange behavior when performing byte-wise
comparisons. For more predictable results, perform an explicit
component-wise comparison.

For example, given a structure type definition like:

@smallexample
struct foo
  @{
    unsigned char tag;
    union
      @{
        double f;
        long i;
        char *p;
      @} value;
  @};
@end smallexample

@noindent
you are better off writing a specialized comparison function to compare
@code{struct foo} objects instead of comparing them with @code{memcmp}.
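Such a function might look like the following sketch. The definition above does not say how @code{tag} selects the active union member, so the sketch assumes hypothetical constants @code{FOO_DOUBLE}, @code{FOO_LONG}, and @code{FOO_POINTER}. Only the tag and the member in use are compared, so padding and unused union bytes are never examined:

```c
#include <stdbool.h>

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Assumed meaning of TAG (hypothetical; not part of the definition).  */
enum { FOO_DOUBLE, FOO_LONG, FOO_POINTER };

/* Compare the tag, then only the union member that is in use.  */
bool
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return false;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_POINTER:
      return a->value.p == b->value.p;
    default:
      return false;  /* Unknown tag: treat as unequal.  */
    }
}
```

Comparing the @code{double} member with @code{==} also differs from a byte-wise comparison: @code{0.0} and @code{-0.0} compare equal, while a NaN never compares equal to itself.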

@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strcmp} function compares the string @var{s1} against
@var{s2}, returning a value that has the same sign as the difference
between the first differing pair of bytes (interpreted as
@code{unsigned char} objects, then promoted to @code{int}).

If the two strings are equal, @code{strcmp} returns @code{0}.

A consequence of the ordering used by @code{strcmp} is that if @var{s1}
is an initial substring of @var{s2}, then @var{s1} is considered to be
``less than'' @var{s2}.

@code{strcmp} does not take into account the sorting conventions of the
language the strings are written in. To get that, use @code{strcoll}.
@end deftypefun
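Because its return value follows the ordering convention described above, @code{strcmp} is easily adapted as a comparison function for @code{qsort} (@pxref{Searching and Sorting}). Since @code{qsort} passes pointers to the array elements, an array of @code{char *} needs a small wrapper. Here is a sketch, using the hypothetical names @code{compare_strings} and @code{sort_strings}:

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the elements, which here are pointers to
   strings, so unwrap one level before calling strcmp.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Sort N strings into strcmp order (byte order, not locale order).  */
void
sort_strings (const char **v, size_t n)
{
  qsort (v, n, sizeof *v, compare_strings);
}
```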

@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcscmp} function compares the wide string @var{ws1}
against @var{ws2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{ws1} is
smaller or larger than the corresponding wide character in @var{ws2}.

If the two strings are equal, @code{wcscmp} returns @code{0}.

A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
is an initial substring of @var{ws2}, then @var{ws1} is considered to be
``less than'' @var{ws2}.

@code{wcscmp} does not take into account the sorting conventions of the
language the strings are written in. To get that, use @code{wcscoll}.
@end deftypefun

@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Although this calls tolower multiple times, it's a macro, and
@c strcasecmp is optimized so that the locale pointer is read only once.
@c There are some asm implementations too, for which the single-read
@c from locale TLS pointers also applies.
This function is like @code{strcmp}, except that differences in case are
ignored, and its arguments must be multibyte strings.
How uppercase and lowercase characters are related is
determined by the currently selected locale. In the standard @code{"C"}
locale the characters @"A and @"a do not match but in a locale which
regards these characters as parts of the alphabet they do match.

@noindent
@code{strcasecmp} is derived from BSD.
@end deftypefun
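A typical use is matching a keyword whose case is not significant. Below is a minimal sketch in which @code{is_content_type} is hypothetical. It includes @file{strings.h}, which is the header POSIX specifies for @code{strcasecmp}. The expected results assume the default @code{"C"} locale, in which only ASCII letters are case-folded:

```c
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: true if NAME is "Content-Type" in any mix of
   upper and lower case.  The case mapping depends on the current
   locale.  */
bool
is_content_type (const char *name)
{
  return strcasecmp (name, "Content-Type") == 0;
}
```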

@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Since towlower is not a macro, the locale object may be read multiple
@c times.
This function is like @code{wcscmp}, except that differences in case are
ignored. How uppercase and lowercase characters are related is
determined by the currently selected locale. In the standard @code{"C"}
locale the characters @"A and @"a do not match but in a locale which
regards these characters as parts of the alphabet they do match.

@noindent
@code{wcscasecmp} is a GNU extension.
@end deftypefun

@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strcmp}, except that no more than
@var{size} bytes are compared. In other words, if the two
strings are the same in their first @var{size} bytes, the
return value is zero.
@end deftypefun
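A common use of @code{strncmp} is a prefix test, where the limit is the length of the prefix. Below is a minimal sketch, with @code{starts_with} as a hypothetical helper:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if S begins with PREFIX.  */
bool
starts_with (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```

Because the comparison stops at a null byte in either string, the test also gives the right answer when @var{s} is shorter than the prefix.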

@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcscmp}, except that no more than
@var{size} wide characters are compared. In other words, if the two
strings are the same in their first @var{size} wide characters, the
return value is zero.
@end deftypefun

@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
This function is like @code{strncmp}, except that differences in case
are ignored, and the compared parts of the arguments should consist of
valid multibyte characters. No more than @var{n} bytes are compared.
Like @code{strcasecmp}, it is locale dependent how
uppercase and lowercase characters are related.

@noindent
@code{strncasecmp} is derived from BSD.
@end deftypefun

@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
This function is like @code{wcsncmp}, except that differences in case
are ignored. Like @code{wcscasecmp}, it is locale dependent how
uppercase and lowercase characters are related.

@noindent
@code{wcsncasecmp} is a GNU extension.
@end deftypefun

Here are some examples showing the use of @code{strcmp} and
@code{strncmp} (equivalent examples can be constructed for the wide
character functions). These examples assume the use of the ASCII
character set. (If some other character set, say EBCDIC, is used
instead, then the glyphs are associated with different numeric codes,
and the return values and ordering may differ.) Only the sign of each
nonzero result is specified; the magnitudes shown are the byte
differences, but an implementation may return other values with the
same sign.

@smallexample
strcmp ("hello", "hello")
    @result{} 0    /* @r{These two strings are the same.} */
strcmp ("hello", "Hello")
    @result{} 32   /* @r{Comparisons are case-sensitive.} */
strcmp ("hello", "world")
    @result{} -15  /* @r{The byte @code{'h'} comes before @code{'w'}.} */
strcmp ("hello", "hello, world")
    @result{} -44  /* @r{Comparing a null byte against a comma.} */
strncmp ("hello", "hello, world", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
strncmp ("hello, world", "hello, stupid world!!!", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
@end smallexample

@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Calls isdigit multiple times, locale may change in between.
The @code{strverscmp} function compares the string @var{s1} against
@var{s2}, considering them as holding indices/version numbers. The
return value follows the same conventions as found in the
@code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no
digits, @code{strverscmp} behaves like @code{strcmp}
(in the sense that the sign of the result is the same).

The comparison algorithm which the @code{strverscmp} function implements
differs slightly from other version-comparison algorithms. The
implementation is based on a finite-state machine, whose behavior is
approximated below.

1343@itemize @bullet
1344@item
f4a36548
FW
1345The input strings are each split into sequences of non-digits and
1346digits. These sequences can be empty at the beginning and end of the
1347string. Digits are determined by the @code{isdigit} function and are
1348thus subject to the current locale.
1f205a47
UD
1349
1350@item
f4a36548
FW
1351Comparison starts with a (possibly empty) non-digit sequence. The first
1352non-equal sequences of non-digits or digits determines the outcome of
1353the comparison.
1f205a47
UD
1354
1355@item
f4a36548
FW
1356Corresponding non-digit sequences in both strings are compared
1357lexicographically if their lengths are equal. If the lengths differ,
1358the shorter non-digit sequence is extended with the input string
1359character immediately following it (which may be the null terminator),
1360the other sequence is truncated to be of the same (extended) length, and
1361these two sequences are compared lexicographically. In the last case,
1362the sequence comparison determines the result of the function because
1363the extension character (or some character before it) is necessarily
1364different from the character at the same offset in the other input
1365string.
1366
1367@item
1368For two sequences of digits, the number of leading zeros is counted (which
1369can be zero). If the count differs, the string with more leading zeros
1370in the digit sequence is considered smaller than the other string.
1371
1372@item
1373If the two sequences of digits have no leading zeros, they are compared
1374as integers, that is, the string with the longer digit sequence is
1375deemed larger, and if both sequences are of equal length, they are
1376compared lexicographically.
1377
1378@item
1379If both digit sequences start with a zero and have an equal number of
1380leading zeros, they are compared lexicographically if their lengths are
1381the same. If the lengths differ, the shorter sequence is extended with
1382the following character in its input string, and the other sequence is
1383truncated to the same length, and both sequences are compared
1384lexicographically (similar to the non-digit sequence case above).
1f205a47
UD
1385@end itemize
1386
f4a36548
FW
1387The treatment of leading zeros and the tie-breaking extension characters
1388(which in effect propagate across non-digit/digit sequence boundaries)
1389differs from other version-comparison algorithms.
1390
1f205a47
UD
1391@smallexample
1392strverscmp ("no digit", "no digit")
0bc93a2f 1393 @result{} 0 /* @r{same behavior as strcmp.} */
1f205a47
UD
1394strverscmp ("item#99", "item#100")
1395 @result{} <0 /* @r{same prefix, but 99 < 100.} */
1396strverscmp ("alpha1", "alpha001")
f4a36548 1397 @result{} >0 /* @r{different number of leading zeros (0 and 2).} */
1f205a47 1398strverscmp ("part1_f012", "part1_f01")
f4a36548 1399 @result{} >0 /* @r{lexicographical comparison with leading zeros.} */
1f205a47 1400strverscmp ("foo.009", "foo.0")
f4a36548 1401 @result{} <0 /* @r{different number of leading zeros (2 and 1).} */
1f205a47
UD
1402@end smallexample
1403
1f205a47
UD
1404@code{strverscmp} is a GNU extension.
1405@end deftypefun
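A typical use of @code{strverscmp} is as a @code{qsort} comparison function, so that names with embedded numbers sort numerically (@samp{file9} before @samp{file10}).  The following is a sketch only; the helper names @code{compare_versions} and @code{sort_versions} are hypothetical, chosen for this illustration:

```c
#define _GNU_SOURCE           /* strverscmp is a GNU extension */
#include <stdlib.h>
#include <string.h>

/* qsort comparison function for an array of const char *.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;
  return strverscmp (*p1, *p2);
}

/* Sort the N strings in NAMES in place, ordering embedded
   digit sequences by numeric value.  */
void
sort_versions (const char **names, size_t n)
{
  qsort (names, n, sizeof *names, compare_versions);
}
```

With plain @code{strcmp}, @samp{file10} would sort before @samp{file9}, because @code{'1'} is less than @code{'9'}.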

@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is an obsolete alias for @code{memcmp}, derived from BSD.
@end deftypefun

@node Collation Functions
@section Collation Functions

@cindex collating strings
@cindex string collation functions

In some locales, the conventions for lexicographic ordering differ from
the strict numeric ordering of character codes.  For example, in Spanish
most glyphs with diacritical marks such as accents are not considered
distinct letters for the purposes of collation.  On the other hand, in
Czech the two-character sequence @samp{ch} is treated as a single letter
that is collated between @samp{h} and @samp{i}.

You can use the functions @code{strcoll} and @code{strxfrm} (declared in
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
collation ordering appropriate for the current locale.  The locale these
functions use can be selected by setting the locale for the
@code{LC_COLLATE} category; see @ref{Locales}.
@pindex string.h
@pindex wchar.h

In the standard C locale, the collation sequence for @code{strcoll} is
the same as that for @code{strcmp}.  Similarly, @code{wcscoll} and
@code{wcscmp} are the same in this situation.

Effectively, the way these functions work is by applying a mapping to
transform the characters in a multibyte string to a byte sequence that
represents the string's position in the collating sequence of the
current locale.  Comparing two such byte sequences in a simple fashion
is equivalent to comparing the strings with the locale's collating
sequence.

The functions @code{strcoll} and @code{wcscoll} perform this translation
implicitly, in order to do one comparison.  By contrast, @code{strxfrm}
and @code{wcsxfrm} perform the mapping explicitly.  If you are making
multiple comparisons using the same string or set of strings, it is
likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to
transform all the strings just once, and subsequently compare the
transformed strings with @code{strcmp} or @code{wcscmp}.

@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls strcoll_l with the current locale, which dereferences only the
@c LC_COLLATE data pointer.
The @code{strcoll} function is similar to @code{strcmp} but uses the
collating sequence of the current locale for collation (the
@code{LC_COLLATE} locale).  The arguments are multibyte strings.
@end deftypefun

@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Same as strcoll, but calling wcscoll_l.
The @code{wcscoll} function is similar to @code{wcscmp} but uses the
collating sequence of the current locale for collation (the
@code{LC_COLLATE} locale).
@end deftypefun

Here is an example of sorting an array of strings, using @code{strcoll}
to compare them.  The actual sort algorithm is not written here; it
comes from @code{qsort} (@pxref{Array Sort Function}).  The job of the
code shown here is to say how to compare the strings while sorting them.
(Later on in this section, we will show a way to do this more
efficiently using @code{strxfrm}.)

@smallexample
/* @r{This is the comparison function used with @code{qsort}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strcoll (*p1, *p2);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings (char **array, int nstrings)
@{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
@}
@end smallexample

@cindex converting string to collation order
@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The function @code{strxfrm} transforms the multibyte string
@var{from} using the
collation transformation determined by the locale currently selected for
collation, and stores the transformed string in the array @var{to}.  Up
to @var{size} bytes (including a terminating null byte) are
stored.

The behavior is undefined if the strings @var{to} and @var{from}
overlap; see @ref{Copying Strings and Arrays}.

The return value is the length of the entire transformed string.  This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
entirely fit in the array @var{to}.  In this case, only as much of the
string as actually fits was stored; note that @w{ISO C} leaves the
contents of @var{to} indeterminate in this case, so portable code should
not rely on them.  To get the whole transformed string, call
@code{strxfrm} again with a bigger output array.

The transformed string may be longer than the original string, and it
may also be shorter.

If @var{size} is zero, no bytes are stored in @var{to}.  In this
case, @code{strxfrm} simply returns the length in bytes of the
transformed string (not counting the terminating null byte).  This is
useful for determining what size the allocated array should be.  It does
not matter what @var{to} is if @var{size} is zero; @var{to} may even be a
null pointer.
@end deftypefun
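The size-zero behavior described above supports a two-call idiom: first query the length, then allocate exactly enough space and transform again.  A minimal sketch; the helper @code{xfrm_dup} is hypothetical, and it assumes the @code{LC_COLLATE} locale does not change between the two calls:

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of S transformed by strxfrm for the
   current LC_COLLATE locale, or a null pointer if allocation fails.
   The caller must free the result.  */
char *
xfrm_dup (const char *s)
{
  /* With SIZE zero nothing is stored, so TO may be a null pointer.  */
  size_t len = strxfrm (NULL, s, 0);
  char *buf = malloc (len + 1);      /* +1 for the terminating null byte */
  if (buf == NULL)
    return NULL;
  strxfrm (buf, s, len + 1);
  return buf;
}
```

Comparing two results of @code{xfrm_dup} with @code{strcmp} gives the same sign as calling @code{strcoll} on the original strings.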

@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The function @code{wcsxfrm} transforms wide string @var{wfrom}
using the collation transformation determined by the locale currently
selected for collation, and stores the transformed string in the array
@var{wto}.  Up to @var{size} wide characters (including a terminating null
wide character) are stored.

The behavior is undefined if the strings @var{wto} and @var{wfrom}
overlap; see @ref{Copying Strings and Arrays}.

The return value is the length of the entire transformed wide
string.  This value is not affected by the value of @var{size}, but if
it is greater than or equal to @var{size}, it means that the transformed
wide string did not entirely fit in the array @var{wto}.  In
this case, only as much of the wide string as actually fits
was stored.  To get the whole transformed wide string, call
@code{wcsxfrm} again with a bigger output array.

The transformed wide string may be longer than the original
wide string, and it may also be shorter.

If @var{size} is zero, no wide characters are stored in @var{wto}.  In this
case, @code{wcsxfrm} simply returns the length of the transformed wide
string in wide characters (not counting the terminating null wide
character).  This is useful for determining what size the allocated
array should be (remember to multiply by @code{sizeof (wchar_t)}).  It
does not matter what @var{wto} is if @var{size} is zero; @var{wto} may
even be a null pointer.
@end deftypefun
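The same size-query idiom works for @code{wcsxfrm}, except that the returned length counts wide characters, so the allocation must be scaled by @code{sizeof (wchar_t)}.  In this sketch the helper @code{wcsxfrm_dup} is hypothetical, @code{calloc} performs the scaling, and the @code{LC_COLLATE} locale is assumed not to change between the two calls:

```c
#include <stdlib.h>
#include <wchar.h>

/* Return a newly allocated copy of WS transformed by wcsxfrm, or a
   null pointer if allocation fails.  The caller must free the result.  */
wchar_t *
wcsxfrm_dup (const wchar_t *ws)
{
  /* Length in wide characters, excluding the terminator.  */
  size_t len = wcsxfrm (NULL, ws, 0);
  /* calloc multiplies the element count by sizeof (wchar_t).  */
  wchar_t *buf = calloc (len + 1, sizeof (wchar_t));
  if (buf == NULL)
    return NULL;
  wcsxfrm (buf, ws, len + 1);
  return buf;
}
```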

Here is an example of how you can use @code{strxfrm} when
you plan to do many comparisons.  It does the same thing as the previous
example, but much faster, because it has to transform each string only
once, no matter how many times it is compared with other strings.  Even
the time needed to allocate and free storage is much less than the time
we save, when there are many strings.

@smallexample
struct sorter @{ char *input; char *transformed; @};

/* @r{This is the comparison function used with @code{qsort}}
   @r{to sort an array of @code{struct sorter}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings_fast (char **array, int nstrings)
@{
  struct sorter temp_array[nstrings];
  int i;

  /* @r{Set up @code{temp_array}.  Each element contains}
     @r{one input string and its transformed string.} */
  for (i = 0; i < nstrings; i++)
    @{
      size_t length = strlen (array[i]) * 2;
      char *transformed;
      size_t transformed_length;

      temp_array[i].input = array[i];

      /* @r{First try a buffer perhaps big enough.} */
      transformed = (char *) xmalloc (length);

      /* @r{Transform @code{array[i]}.} */
      transformed_length = strxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{'\0'} byte.} */
          transformed = xrealloc (transformed,
                                  transformed_length + 1);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) strxfrm (transformed, array[i],
                          transformed_length + 1);
        @}

      temp_array[i].transformed = transformed;
    @}

  /* @r{Sort @code{temp_array} by comparing transformed strings.} */
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);

  /* @r{Put the elements back in the permanent array}
     @r{in their sorted order.} */
  for (i = 0; i < nstrings; i++)
    array[i] = temp_array[i].input;

  /* @r{Free the strings we allocated.} */
  for (i = 0; i < nstrings; i++)
    free (temp_array[i].transformed);
@}
@end smallexample

The interesting part of this code for the wide character version would
look like this:

@smallexample
void
sort_strings_fast (wchar_t **array, int nstrings)
@{
  @dots{}
      /* @r{Transform @code{array[i]}.} */
      transformed_length = wcsxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{L'\0'} wide character.} */
          transformed = xreallocarray (transformed,
                                       transformed_length + 1,
                                       sizeof *transformed);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) wcsxfrm (transformed, array[i],
                          transformed_length + 1);
        @}
  @dots{}
@end smallexample

@noindent
Note that the @code{xreallocarray} call passes @code{sizeof *transformed},
that is, @code{sizeof (wchar_t)}, as the element size, because the buffer
holds wide characters rather than bytes.

@strong{Compatibility Note:} The string collation functions are a new
feature of @w{ISO C90}.  Older C dialects have no equivalent feature.
The wide character versions were introduced in @w{Amendment 1} to @w{ISO
C90}.

@node Search Functions
@section Search Functions

This section describes library functions which perform various kinds
of searching operations on strings and arrays.  These functions are
declared in the header file @file{string.h}; the wide character
variants are declared in @file{wchar.h}.
@pindex string.h
@cindex search functions (for strings)
@cindex string search functions

@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function finds the first occurrence of the byte @var{c} (converted
to an @code{unsigned char}) in the initial @var{size} bytes of the
object beginning at @var{block}.  The return value is a pointer to the
located byte, or a null pointer if no match was found.
@end deftypefun
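Because @code{memchr} takes an explicit length, it can search data that contains null bytes.  As an illustration, this sketch counts the occurrences of a byte by restarting the search just after each match; the helper @code{count_byte} is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Count how many times byte C occurs in the first SIZE bytes of BLOCK.
   Unlike strchr, memchr does not stop at null bytes.  */
size_t
count_byte (const void *block, int c, size_t size)
{
  const unsigned char *p = block;
  const unsigned char *end = p + size;
  size_t count = 0;

  while (p < end)
    {
      const unsigned char *hit = memchr (p, c, end - p);
      if (hit == NULL)
        break;
      count++;
      p = hit + 1;          /* resume just past the match */
    }
  return count;
}
```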

@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function finds the first occurrence of the wide character @var{wc}
in the initial @var{size} wide characters of the object beginning at
@var{block}.  The return value is a pointer to the located wide
character, or a null pointer if no match was found.
@end deftypefun

@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Often the @code{memchr} function is used with the knowledge that the
byte @var{c} is available in the memory block specified by the
parameters.  But this means that the @var{size} parameter is not really
needed and that the tests performed with it at runtime (to check whether
the end of the block is reached) are not needed.

The @code{rawmemchr} function exists for just this situation, which is
surprisingly frequent.  The interface is similar to @code{memchr} except
that the @var{size} parameter is missing.  If the programmer was wrong in
assuming that the byte @var{c} is present in the block, the function
will look beyond the end of the block pointed to by @var{block}, and the
result is unspecified.  Otherwise the return value is a pointer to the
located byte.

When looking for the end of a string, use @code{strchr}.

This function is a GNU extension.
@end deftypefun
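One way to guarantee that the byte is present is to store it yourself as a sentinel just past the data.  The sketch below assumes the caller's buffer has at least one spare byte after the data; the helper @code{line_length} is hypothetical:

```c
#define _GNU_SOURCE           /* rawmemchr is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Return the length of the first line in BUF, which holds LEN bytes
   of data, not counting the newline.  BUF must have room for at least
   LEN + 1 bytes: a '\n' sentinel is stored at BUF[LEN] so that
   rawmemchr is guaranteed to find a match inside the buffer.  */
size_t
line_length (char *buf, size_t len)
{
  buf[len] = '\n';                      /* sentinel */
  char *nl = rawmemchr (buf, '\n');
  return (size_t) (nl - buf);
}
```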

@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{memrchr} is like @code{memchr}, except that it searches
backwards from the end of the block defined by @var{block} and @var{size}
(instead of forwards from the front).

This function is a GNU extension.
@end deftypefun
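For example, @code{memrchr} can locate the last path separator in a buffer that is not null-terminated, where @code{strrchr} cannot be used.  A minimal sketch; the helper @code{last_component_offset} is hypothetical:

```c
#define _GNU_SOURCE           /* memrchr is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Given a path of LEN bytes (not necessarily null-terminated), return
   the offset at which its last component starts: one past the final
   '/', or 0 if there is no '/'.  */
size_t
last_component_offset (const char *path, size_t len)
{
  const char *slash = memrchr (path, '/', len);
  return slash == NULL ? 0 : (size_t) (slash - path) + 1;
}
```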

@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strchr} function finds the first occurrence of the byte
@var{c} (converted to a @code{char}) in the string
beginning at @var{string}.  The return value is a pointer to the located
byte, or a null pointer if no match was found.

For example,
@smallexample
strchr ("hello, world", 'l')
    @result{} "llo, world"
strchr ("hello, world", '?')
    @result{} NULL
@end smallexample

The terminating null byte is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying zero as the value of the @var{c} argument.

When @code{strchr} returns a null pointer, it does not let you know
the position of the terminating null byte it has found.  If you
need that information, it is better (but less portable) to use
@code{strchrnul} than to search for it a second time.
@end deftypefun
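A common use of @code{strchr} is splitting a string at a delimiter.  The following sketch splits a @samp{key=value} string in place at the first @samp{=}; the helper @code{split_key_value} is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split a "key=value" string in place at the first '='.
   On success, *KEY and *VALUE point into LINE and true is returned;
   if LINE contains no '=', it is left unchanged and false is returned.  */
bool
split_key_value (char *line, char **key, char **value)
{
  char *eq = strchr (line, '=');
  if (eq == NULL)
    return false;
  *eq = '\0';               /* terminate the key */
  *key = line;
  *value = eq + 1;
  return true;
}
```

Only the first @samp{=} is used, so a value may itself contain @samp{=}.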

@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcschr} function finds the first occurrence of the wide
character @var{wc} in the wide string
beginning at @var{wstring}.  The return value is a pointer to the
located wide character, or a null pointer if no match was found.

The terminating null wide character is considered to be part of the wide
string, so you can use this function to get a pointer to the end
of a wide string by specifying a null wide character as the
value of the @var{wc} argument.  It would be better (but less portable)
to use @code{wcschrnul} in this case, though.
@end deftypefun

@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{strchrnul} is the same as @code{strchr} except that if it does
not find the byte, it returns a pointer to the string's terminating
null byte rather than a null pointer.

This function is a GNU extension.
@end deftypefun

@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wcschrnul} is the same as @code{wcschr} except that if it does not
find the wide character, it returns a pointer to the wide string's
terminating null wide character rather than a null pointer.

This function is a GNU extension.
@end deftypefun
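Because @code{strchrnul} never returns a null pointer, it is convenient for walking through a list of delimited fields, such as the directories in a @env{PATH} value, without special-casing the last field.  A minimal sketch; the helper @code{count_fields} is hypothetical:

```c
#define _GNU_SOURCE           /* strchrnul is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Count the fields of a colon-separated list.  Empty fields count
   too, so "a::b" has three fields.  The loop ends when strchrnul
   returns a pointer to the terminating null byte.  */
size_t
count_fields (const char *list)
{
  size_t n = 0;
  const char *p = list;

  for (;;)
    {
      const char *end = strchrnul (p, ':');
      n++;                  /* the field runs from P up to END */
      if (*end == '\0')
        break;
      p = end + 1;          /* skip the separator */
    }
  return n;
}
```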

One useful, but unusual, use of the @code{strchr}
function is when one wants to have a pointer pointing to the null byte
terminating a string.  This is often written in this way:

@smallexample
  s += strlen (s);
@end smallexample

@noindent
This is almost optimal but the addition operation duplicates a bit of
the work already done in the @code{strlen} function.  A better solution
is this:

@smallexample
  s = strchr (s, '\0');
@end smallexample

There is no restriction on the second parameter of @code{strchr} so it
could very well also be zero.  Those readers thinking very
hard about this might now point out that the @code{strchr} function is
more expensive than the @code{strlen} function since it has two
termination conditions to check.  This is true in principle.  But in
@theglibc{} the implementation of @code{strchr} is optimized in a special
way so that @code{strchr} actually is faster.

@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{strrchr} is like @code{strchr}, except that it searches
backwards from the end of the string @var{string} (instead of forwards
from the front).

For example,
@smallexample
strrchr ("hello, world", 'l')
    @result{} "ld"
@end smallexample
@end deftypefun
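A typical use of @code{strrchr} is finding the last occurrence of a separator, for example the extension of a file name.  The following sketch assumes @code{name} is a bare file name with no directory part (otherwise a dot in a directory name could be found); the helper @code{file_extension} is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the extension of file name NAME (the text after
   the last '.'), or to NAME's terminating null byte if there is none.
   A leading '.' (as in ".profile") is not treated as an extension.  */
const char *
file_extension (const char *name)
{
  const char *dot = strrchr (name, '.');
  if (dot == NULL || dot == name)
    return name + strlen (name);
  return dot + 1;
}
```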

@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{wc})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{wcsrchr} is like @code{wcschr}, except that it searches
backwards from the end of the string @var{wstring} (instead of forwards
from the front).
@end deftypefun

@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is like @code{strchr}, except that it searches @var{haystack} for a
substring @var{needle} rather than just a single byte.  It
returns a pointer into the string @var{haystack} that is the first
byte of the substring, or a null pointer if no match was found.  If
@var{needle} is an empty string, the function returns @var{haystack}.

For example,
@smallexample
strstr ("hello, world", "l")
    @result{} "llo, world"
strstr ("hello, world", "wo")
    @result{} "world"
@end smallexample
@end deftypefun
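To find every occurrence of a substring, restart @code{strstr} just past the previous match.  The sketch below counts non-overlapping occurrences; the helper @code{count_substrings} is hypothetical, and returning zero for an empty needle is a convention chosen here, since @code{strstr} returns @var{haystack} itself in that case:

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of NEEDLE in HAYSTACK.  */
size_t
count_substrings (const char *haystack, const char *needle)
{
  size_t step = strlen (needle);
  size_t n = 0;
  const char *p = haystack;

  if (step == 0)
    return 0;               /* an empty needle would match everywhere */
  while ((p = strstr (p, needle)) != NULL)
    {
      n++;
      p += step;            /* continue just after this match */
    }
  return n;
}
```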

@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is like @code{wcschr}, except that it searches @var{haystack} for a
substring @var{needle} rather than just a single wide character.  It
returns a pointer into the string @var{haystack} that is the first wide
character of the substring, or a null pointer if no match was found.  If
@var{needle} is an empty string, the function returns @var{haystack}.
@end deftypefun

@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
@standards{XPG, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wcswcs} is a deprecated alias for @code{wcsstr}.  This is the
name originally used in the X/Open Portability Guide before the
@w{Amendment 1} to @w{ISO C90} was published.
@end deftypefun

@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c There may be multiple calls of strncasecmp, each accessing the locale
@c object independently.
This is like @code{strstr}, except that it ignores case in searching for
the substring.  Like @code{strcasecmp}, it is locale dependent how
uppercase and lowercase characters are related, and arguments are
multibyte strings.

For example,
@smallexample
strcasestr ("hello, world", "L")
    @result{} "llo, world"
strcasestr ("hello, World", "wo")
    @result{} "World"
@end smallexample
@end deftypefun

@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
arrays rather than strings.  @var{needle-len} is the
length of @var{needle} and @var{haystack-len} is the length of
@var{haystack}.

This function is a GNU extension.
@end deftypefun
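Because both arrays carry explicit lengths, @code{memmem} can search binary data containing null bytes, which @code{strstr} cannot.  In this sketch the helper @code{find_marker} and its 4-byte marker value are hypothetical:

```c
#define _GNU_SOURCE           /* memmem is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of the 4-byte marker
   00 ff 00 ff in DATA (LEN bytes), or -1 if it does not occur.  */
long
find_marker (const void *data, size_t len)
{
  static const unsigned char marker[] = { 0x00, 0xff, 0x00, 0xff };
  const unsigned char *hit = memmem (data, len, marker, sizeof marker);
  return hit == NULL ? -1 : (long) (hit - (const unsigned char *) data);
}
```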

@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strspn} (``string span'') function returns the length of the
initial substring of @var{string} that consists entirely of bytes that
are members of the set specified by the string @var{skipset}.  The order
of the bytes in @var{skipset} is not important.

For example,
@smallexample
strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
    @result{} 5
@end smallexample

In a multibyte string, characters consisting of
more than one byte are not treated as single entities.  Each byte is treated
separately.  The function is not locale-dependent.
@end deftypefun

@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcsspn} (``wide character string span'') function returns the
length of the initial substring of @var{wstring} that consists entirely
of wide characters that are members of the set specified by the string
@var{skipset}.  The order of the wide characters in @var{skipset} is not
important.
@end deftypefun

@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strcspn} (``string complement span'') function returns the length
of the initial substring of @var{string} that consists entirely of bytes
that are @emph{not} members of the set specified by the string @var{stopset}.
(In other words, it returns the offset of the first byte in @var{string}
that is a member of the set @var{stopset}, or the length of @var{string}
if there is no such byte.)

For example,
@smallexample
strcspn ("hello, world", " \t\n,.;!?")
    @result{} 5
@end smallexample

In a multibyte string, characters consisting of
more than one byte are not treated as single entities.  Each byte is treated
separately.  The function is not locale-dependent.
@end deftypefun
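@code{strspn} and @code{strcspn} are often used together: the first skips a run of delimiter bytes, and the second measures the token that follows.  A minimal sketch that counts delimiter-separated words; the helper @code{count_words} is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Count the words in S, where a word is a maximal run of bytes that
   are not in the delimiter set DELIMS.  */
size_t
count_words (const char *s, const char *delims)
{
  size_t n = 0;

  for (;;)
    {
      s += strspn (s, delims);      /* skip a run of delimiters */
      if (*s == '\0')
        break;
      n++;
      s += strcspn (s, delims);     /* skip over the word itself */
    }
  return n;
}
```

Unlike @code{strtok}, this approach does not modify the string.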
1969
8a2f1f5b 1970@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 1971@standards{ISO, wchar.h}
11087373 1972@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1973The @code{wcscspn} (``wide character string complement span'') function
1974returns the length of the initial substring of @var{wstring} that
1975consists entirely of wide characters that are @emph{not} members of the
1976set specified by the string @var{stopset}. (In other words, it returns
2cc4b9cc 1977the offset of the first wide character in @var{string} that is a member of
8a2f1f5b 1978the set @var{stopset}.)
28f540f4
RM
1979@end deftypefun
1980
28f540f4 1981@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
d08a7e4c 1982@standards{ISO, string.h}
11087373 1983@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1984The @code{strpbrk} (``string pointer break'') function is related to
2cc4b9cc 1985@code{strcspn}, except that it returns a pointer to the first byte
28f540f4
RM
1986in @var{string} that is a member of the set @var{stopset} instead of the
1987length of the initial substring. It returns a null pointer if no such
2cc4b9cc 1988byte from @var{stopset} is found.
28f540f4
RM
1989
1990@c @group Invalid outside the example.
1991For example,
1992
1993@smallexample
1994strpbrk ("hello, world", " \t\n,.;!?")
1995 @result{} ", world"
1996@end smallexample
1997@c @end group
8a2f1f5b 1998
2cc4b9cc
PE
1999In a multibyte string, characters consisting of
2000more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2001separately. The function is not locale-dependent.
2002@end deftypefun

@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcspbrk} (``wide character string pointer break'') function is
related to @code{wcscspn}, except that it returns a pointer to the first
wide character in @var{wstring} that is a member of the set
@var{stopset} instead of the length of the initial substring. It
returns a null pointer if no such wide character from @var{stopset} is found.
@end deftypefun
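
A minimal sketch of @code{wcspbrk} that locates the first decimal digit
in a wide string. The helper name @code{find_first_digit} is
hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: return a pointer to the first decimal digit in
   WS, or a null pointer if WS contains no digit.  */
static const wchar_t *
find_first_digit (const wchar_t *ws)
{
  return wcspbrk (ws, L"0123456789");
}
```

For @code{L"abc42"} the result points at the @code{L'4'}, three wide
characters into the string.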

@subsection Compatibility String Search Functions

@deftypefun {char *} index (const char *@var{string}, int @var{c})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{index} is another name for @code{strchr}; they are exactly the same.
New code should always use @code{strchr} since this name is defined in
@w{ISO C} while @code{index} is a BSD invention that was never available
on @w{System V} derived systems.
@end deftypefun

@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{rindex} is another name for @code{strrchr}; they are exactly the same.
New code should always use @code{strrchr} since this name is defined in
@w{ISO C} while @code{rindex} is a BSD invention that was never available
on @w{System V} derived systems.
@end deftypefun
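
New code can express the same searches with the @w{ISO C} names. A
minimal sketch, with the hypothetical helper @code{file_extension}, uses
@code{strrchr} where older BSD code would have called @code{rindex}:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the text after the last '.' in NAME, or a
   null pointer if NAME contains no '.'.  strrchr is used rather than
   the BSD name rindex.  */
static const char *
file_extension (const char *name)
{
  const char *dot = strrchr (name, '.');
  return dot != NULL ? dot + 1 : NULL;
}
```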

@node Finding Tokens in a String
@section Finding Tokens in a String

@cindex tokenizing strings
@cindex breaking a string into tokens
@cindex parsing tokens from a string
It's fairly common for programs to have a need to do some simple kinds
of lexical analysis and parsing, such as splitting a command string up
into tokens. You can do this with the @code{strtok} function, declared
in the header file @file{string.h}.
@pindex string.h

@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
@standards{ISO, string.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}}
A string can be split into tokens by making a series of calls to the
function @code{strtok}.

The string to be split up is passed as the @var{newstring} argument on
the first call only. The @code{strtok} function uses this to set up
some internal state information. Subsequent calls to get additional
tokens from the same string are indicated by passing a null pointer as
the @var{newstring} argument. Calling @code{strtok} with another
non-null @var{newstring} argument reinitializes the state information.
It is guaranteed that no other library function ever calls @code{strtok}
behind your back (which would mess up this internal state information).

The @var{delimiters} argument is a string that specifies a set of delimiters
that may surround the token being extracted. All the initial bytes
that are members of this set are discarded. The first byte that is
@emph{not} a member of this set of delimiters marks the beginning of the
next token. The end of the token is found by looking for the next
byte that is a member of the delimiter set. This byte in the
original string @var{newstring} is overwritten by a null byte, and the
pointer to the beginning of the token in @var{newstring} is returned.

On the next call to @code{strtok}, the searching begins at the next
byte beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter bytes, @code{strtok} returns
a null pointer.

In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
separately. The function is not locale-dependent.
@end deftypefun
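
The series of calls is usually written as a loop that stops at the first
null pointer. A minimal sketch, with the hypothetical helper
@code{count_tokens}; the argument must be a writable array because
@code{strtok} stores null bytes into it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the tokens in S.  strtok writes null bytes
   into S, so S must be a writable array, not a string literal.  */
static size_t
count_tokens (char *s, const char *delimiters)
{
  size_t n = 0;
  for (char *tok = strtok (s, delimiters); tok != NULL;
       tok = strtok (NULL, delimiters))
    n++;
  return n;
}
```

After the call the array no longer holds the original text: its first
delimiter has been overwritten, so it now reads as just the first token.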

@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
A string can be split into tokens by making a series of calls to the
function @code{wcstok}.

The string to be split up is passed as the @var{newstring} argument on
the first call only. The @code{wcstok} function uses this to initialize
the state that it keeps in @var{save_ptr}. Subsequent calls to get additional
tokens from the same wide string are indicated by passing a
null pointer as the @var{newstring} argument, which causes the pointer
previously stored in @var{save_ptr} to be used instead.

The @var{delimiters} argument is a wide string that specifies
a set of delimiters that may surround the token being extracted. All
the initial wide characters that are members of this set are discarded.
The first wide character that is @emph{not} a member of this set of
delimiters marks the beginning of the next token. The end of the token
is found by looking for the next wide character that is a member of the
delimiter set. This wide character in the original wide
string @var{newstring} is overwritten by a null wide character, the
pointer past the overwritten wide character is saved in @var{save_ptr},
and the pointer to the beginning of the token in @var{newstring} is
returned.

On the next call to @code{wcstok}, the searching begins at the next
wide character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.

If the end of the wide string @var{newstring} is reached, or
if the remainder of the string consists only of delimiter wide characters,
@code{wcstok} returns a null pointer.
@end deftypefun
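
A minimal sketch of the same loop for wide strings. The hypothetical
helper @code{count_wide_tokens} keeps the position between calls in its
own @code{state} variable, which is what makes @code{wcstok} reentrant:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the tokens in the writable wide string WS.
   The position between calls lives in STATE, not in hidden global
   storage.  */
static size_t
count_wide_tokens (wchar_t *ws, const wchar_t *delimiters)
{
  wchar_t *state = NULL;
  size_t n = 0;
  for (wchar_t *tok = wcstok (ws, delimiters, &state); tok != NULL;
       tok = wcstok (NULL, delimiters, &state))
    n++;
  return n;
}
```

Adjacent delimiters, as in @code{L"beta,,gamma"}, do not produce empty
tokens.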

@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings
and Arrays}). If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.

The string that you are operating on might even be a constant. Then
when @code{strtok} or @code{wcstok} tries to modify it, your program
will get a fatal signal for writing in read-only memory. @xref{Program
Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
would not require a modification of the string (e.g., if there is
exactly one token) the string can (and in the @glibcadj{} case will) be
modified.

This is a special case of a general principle: if a part of a program
does not have as its purpose the modification of a certain data
structure, then it is error-prone to modify the data structure
temporarily.
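
The copy-first rule can be packaged in a small function that never lets
@code{strtok} touch its caller's string. A minimal sketch; the name
@code{first_token_copy} is hypothetical, and the choice to return a
freshly allocated string is an assumption made for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the first token
   of S, leaving S itself untouched; strtok only ever sees a private
   copy.  The caller frees the result.  A null pointer means there was
   no token or no memory.  */
static char *
first_token_copy (const char *s, const char *delimiters)
{
  size_t len = strlen (s);
  char *copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;
  memcpy (copy, s, len + 1);

  char *tok = strtok (copy, delimiters);
  if (tok == NULL)
    {
      free (copy);
      return NULL;
    }
  /* Slide the token to the start so the buffer can be freed normally.  */
  memmove (copy, tok, strlen (tok) + 1);
  return copy;
}
```

Because only the copy is modified, the argument may safely be a string
constant.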

The function @code{strtok} is not reentrant, whereas @code{wcstok} is.
@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
important.

Here is a simple example showing the use of @code{strtok}.

@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token, *cp;

@dots{}

cp = strdupa (string);                /* Make writable copy.  */
token = strtok (cp, delimiters);      /* token => "words" */
token = strtok (NULL, delimiters);    /* token => "separated" */
token = strtok (NULL, delimiters);    /* token => "by" */
token = strtok (NULL, delimiters);    /* token => "spaces" */
token = strtok (NULL, delimiters);    /* token => "and" */
token = strtok (NULL, delimiters);    /* token => "punctuation" */
token = strtok (NULL, delimiters);    /* token => NULL */
@end smallexample

@Theglibc{} contains two more functions for tokenizing a string
which overcome the limitation of non-reentrancy. They are not
available for wide strings.

@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
@standards{POSIX, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Just like @code{strtok}, this function splits the string into several
tokens which can be accessed by successive calls to @code{strtok_r}.
The difference is that, as in @code{wcstok}, the information about the
next token is stored in the space pointed to by the third argument,
@var{save_ptr}, which is a pointer to a string pointer. Calling
@code{strtok_r} with a null pointer for @var{newstring} and leaving
@var{save_ptr} between the calls unchanged does the job without
hindering reentrancy.

This function is defined in POSIX.1 and can be found on many systems
which support multi-threading.
@end deftypefun

@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strtok_r} with the
@var{newstring} argument replaced by the @var{save_ptr} argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token. When no delimiter remains, the last token extends to the end of
the string and @code{*@var{string_ptr}} is set to a null pointer; the
next call then returns a null pointer.

One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one byte from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of bytes
from @var{delimiter}. This means that a program normally should test
for @code{strsep} returning an empty string before processing it.

This function was introduced in 4.3BSD and therefore is widely available.
@end deftypefun

Here is how the above example looks when @code{strsep} is used.

@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;

@dots{}

running = strdupa (string);
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */
@end smallexample

@deftypefun {char *} basename (const char *@var{filename})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The GNU version of the @code{basename} function returns the last
component of the path in @var{filename}. This function is the preferred
usage, since it does not modify the argument, @var{filename}, and
respects trailing slashes: if @var{filename} ends in a @samp{/}, the
result is an empty string. The prototype for @code{basename} can be
found in @file{string.h}; it is declared only if @code{_GNU_SOURCE} is
defined. Note that this function is overridden by the XPG
version if @file{libgen.h} is included.

Example of using GNU @code{basename}:

@smallexample
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
@end smallexample

@strong{Portability Note:} This function may produce different results
on different systems.

@end deftypefun

@deftypefun {char *} basename (char *@var{path})
@standards{XPG, libgen.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is the standard XPG-defined @code{basename}. It is similar in
spirit to the GNU version, but may modify the @var{path} by removing
trailing '/' bytes. If the @var{path} is made up entirely of '/'
bytes, then "/" will be returned. Also, if @var{path} is
@code{NULL} or an empty string, then "." is returned. The prototype for
the XPG version can be found in @file{libgen.h}.

Example of using XPG @code{basename}:

@smallexample
#define _GNU_SOURCE
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  char *path = strdupa (argv[0]);

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}

@}
@end smallexample
@end deftypefun

@deftypefun {char *} dirname (char *@var{path})
@standards{XPG, libgen.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{dirname} function is the complement to the XPG version of
@code{basename}. It returns the parent directory of the file specified
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
contains no '/' bytes, then "." is returned. Like the XPG
@code{basename}, it may modify @var{path}. The prototype for this
function can be found in @file{libgen.h}.
@end deftypefun

@node Erasing Sensitive Data
@section Erasing Sensitive Data

Sensitive data, such as cryptographic keys, should be erased from
memory after use, to reduce the risk that a bug will expose it to the
outside world. However, compiler optimizations may determine that an
erasure operation is ``unnecessary,'' and remove it from the generated
code, because no @emph{correct} program could access the variable or
heap object containing the sensitive data after it's deallocated.
Since erasure is a precaution against bugs, this optimization is
inappropriate.

The function @code{explicit_bzero} erases a block of memory, and
guarantees that the compiler will not remove the erasure as
``unnecessary.''

@smallexample
@group
#include <string.h>

extern void encrypt (const char *key, const char *in,
                     char *out, size_t n);
extern void genkey (const char *phrase, char *key);

void encrypt_with_phrase (const char *phrase, const char *in,
                          char *out, size_t n)
@{
  char key[16];
  genkey (phrase, key);
  encrypt (key, in, out, n);
  explicit_bzero (key, sizeof key);
@}
@end group
@end smallexample

@noindent
In this example, if @code{memset}, @code{bzero}, or a hand-written
loop had been used, the compiler might remove them as ``unnecessary.''

@strong{Warning:} @code{explicit_bzero} does not guarantee that
sensitive data is @emph{completely} erased from the computer's memory.
There may be copies in temporary storage areas, such as registers and
``scratch'' stack space; since these are invisible to the source code,
a library function cannot erase them.

Also, @code{explicit_bzero} only operates on RAM. If a sensitive data
object never needs to have its address taken other than to call
@code{explicit_bzero}, it might be stored entirely in CPU registers
@emph{until} the call to @code{explicit_bzero}. Then it will be
copied into RAM, the copy will be erased, and the original will remain
intact. Data in RAM is more likely to be exposed by a bug than data
in registers, so this creates a brief window where the data is at
greater risk of exposure than it would have been if the program didn't
try to erase it at all.

Declaring sensitive variables as @code{volatile} will make both the
above problems @emph{worse}; a @code{volatile} variable will be stored
in memory for its entire lifetime, and the compiler will make
@emph{more} copies of it than it would otherwise have. Attempting to
erase a normal variable ``by hand'' through a
@code{volatile}-qualified pointer doesn't work at all---because the
variable itself is not @code{volatile}, some compilers will ignore the
qualification on the pointer and remove the erasure anyway.

Having said all that, in most situations, using @code{explicit_bzero}
is better than not using it. At present, the only way to do a more
thorough job is to write the entire sensitive operation in assembly
language. We anticipate that future compilers will recognize calls to
@code{explicit_bzero} and take appropriate steps to erase all the
copies of the affected data, wherever they may be.

@deftypefun void explicit_bzero (void *@var{block}, size_t @var{len})
@standards{BSD, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}

@code{explicit_bzero} writes zeros to @var{len} bytes of memory
beginning at @var{block}, just as @code{bzero} would. The zeros are
always written, even if the compiler could determine that this is
``unnecessary'' because no correct program could read them back.

@strong{Note:} The @emph{only} optimization that @code{explicit_bzero}
disables is removal of ``unnecessary'' writes to memory. The compiler
can perform all the other optimizations that it could for a call to
@code{memset}. For instance, it may replace the function call with
inline memory writes, and it may assume that @var{block} cannot be a
null pointer.

@strong{Portability Note:} This function first appeared in OpenBSD 5.5
and has not been standardized. Other systems may provide the same
functionality under a different name, such as @code{explicit_memset},
@code{memset_s}, or @code{SecureZeroMemory}.

@Theglibc{} declares this function in @file{string.h}, but on other
systems it may be in @file{strings.h} instead.
@end deftypefun
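
The same reasoning applies to heap storage: a @code{memset} immediately
before @code{free} is a typical ``unnecessary'' store that an optimizer
may drop. A minimal sketch, with the hypothetical helper
@code{free_secret}, that erases a buffer before releasing it:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: erase and release a heap buffer that held
   secret data.  A plain memset right before free could be dropped as
   a dead store; explicit_bzero is documented not to be.  */
static void
free_secret (void *secret, size_t len)
{
  if (secret == NULL)
    return;
  explicit_bzero (secret, len);
  free (secret);
}
```

The warnings above still apply: copies of the secret held in registers
or elsewhere are not reached by this call.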

@node Shuffling Bytes
@section Shuffling Bytes

The function below addresses the perennial programming quandary: ``How do
I take good data in string form and painlessly turn it into garbage?''
This is not a difficult thing to code for oneself, but the authors of
@theglibc{} wish to make it as convenient as possible.

To @emph{erase} data, use @code{explicit_bzero} (@pxref{Erasing
Sensitive Data}); to obfuscate it reversibly, use @code{memfrob}
(@pxref{Obfuscating Data}).

@deftypefun {char *} strfry (char *@var{string})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c Calls initstate_r, time, getpid, strlen, and random_r.

@code{strfry} performs an in-place shuffle on @var{string}. Each
character is swapped to a position selected at random, within the
portion of the string starting with the character's original position.
(This is the Fisher-Yates algorithm for unbiased shuffling.)

Calling @code{strfry} will not disturb any of the random number
generators that have global state (@pxref{Pseudo-Random Numbers}).

The return value of @code{strfry} is always @var{string}.

@strong{Portability Note:} This function is unique to @theglibc{}.
It is declared in @file{string.h}.
@end deftypefun
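
@code{strfry} chooses its own random numbers, so its result cannot be
predicted. To show the shuffling step by itself, here is a sketch of the
Fisher-Yates algorithm as described above. It is not the glibc
implementation: the generator @code{next_random} (a common 64-bit
linear congruential generator) and the helper @code{shuffle_string} are
hypothetical, and the caller owns the generator state, so no global
random state is touched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generator: a 64-bit linear congruential generator whose
   state belongs to the caller.  */
static uint32_t
next_random (uint64_t *state)
{
  *state = *state * 6364136223846793005u + 1442695040888963407u;
  return (uint32_t) (*state >> 33);
}

/* Fisher-Yates shuffle of the bytes of S, in place: position I is
   swapped with a position chosen from I through LEN-1.  The modulo
   introduces a slight bias; a careful version would remove it.  */
static char *
shuffle_string (char *s, uint64_t *state)
{
  size_t len = strlen (s);
  for (size_t i = 0; i + 1 < len; i++)
    {
      size_t j = i + next_random (state) % (len - i);
      char tmp = s[i];
      s[i] = s[j];
      s[j] = tmp;
    }
  return s;
}
```

Like @code{strfry}, the function returns its argument, and the result
is always a permutation of the original bytes.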

@node Obfuscating Data
@section Obfuscating Data
@cindex Rot13

The @code{memfrob} function reversibly obfuscates an array of binary
data. This is not true encryption; the obfuscated data still bears a
clear relationship to the original, and no secret key is required to
undo the obfuscation. It is analogous to the ``Rot13'' cipher used on
Usenet for obscuring offensive jokes, spoilers for works of fiction,
and so on, but it can be applied to arbitrary binary data.

Programs that need true encryption---a transformation that completely
obscures the original and cannot be reversed without knowledge of a
secret key---should use a dedicated cryptography library, such as
@uref{https://www.gnu.org/software/libgcrypt/,,libgcrypt}.

Programs that need to @emph{destroy} data should use
@code{explicit_bzero} (@pxref{Erasing Sensitive Data}), or possibly
@code{strfry} (@pxref{Shuffling Bytes}).

@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}

The function @code{memfrob} obfuscates @var{length} bytes of data
beginning at @var{mem}, in place. Each byte is bitwise xor-ed with
the binary pattern 00101010 (hexadecimal 0x2A). The return value is
always @var{mem}.

Calling @code{memfrob} a second time on the same data returns it to
its original state.

@strong{Portability Note:} This function is unique to @theglibc{}.
It is declared in @file{string.h}.
@end deftypefun
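
Since @code{memfrob} is unique to @theglibc{}, a program that must read
or write the same obfuscated data on other systems can reproduce the
transformation with a plain loop. A minimal sketch; the name
@code{frob_bytes} is hypothetical and this is not the glibc source:

```c
#include <stddef.h>

/* Hypothetical portable stand-in for memfrob: XOR each of the LENGTH
   bytes at MEM with 0x2A, in place.  Applying it twice restores the
   original bytes.  */
static void *
frob_bytes (void *mem, size_t length)
{
  unsigned char *p = mem;
  for (size_t i = 0; i < length; i++)
    p[i] ^= 0x2A;
  return mem;
}
```

For example, @samp{A} (0x41) becomes @samp{k} (0x6B), and a second pass
turns it back into @samp{A}.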

@node Encode Binary Data
@section Encode Binary Data

To store or transfer binary data in environments which only support text,
one has to encode the binary data by mapping the input bytes to
bytes in the range allowed for storing or transferring. SVID
systems (and nowadays XPG compliant systems) provide minimal support for
this task.

@deftypefun {char *} l64a (long int @var{n})
@standards{XPG, stdlib.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}}
This function encodes a 32-bit input value using bytes from the
basic character set. It returns a pointer to a 7-byte buffer which
contains an encoded version of @var{n}. To encode a series of bytes the
user must copy the returned string to a destination buffer. It returns
the empty string if @var{n} is zero, which is somewhat bizarre but
mandated by the standard.@*
@strong{Warning:} Since a static buffer is used this function should not
be used in multi-threaded programs. There is no thread-safe alternative
to this function in the C library.@*
@strong{Compatibility Note:} The XPG standard states that the return
value of @code{l64a} is undefined if @var{n} is negative. In the GNU
implementation, @code{l64a} treats its argument as unsigned, so it will
return a sensible encoding for any nonzero @var{n}; however, portable
programs should not rely on this.

To encode a large buffer @code{l64a} must be called in a loop, once for
each 32-bit word of the buffer. For example, one could do something
like this:

@smallexample
#define _GNU_SOURCE          /* @r{For mempcpy.} */
#include <arpa/inet.h>       /* @r{For htonl.} */
#include <stdlib.h>
#include <string.h>

char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  const unsigned char *in = buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  if (out == NULL)
    return NULL;

  /* @r{Encode the length.} */
  /* @r{Using `htonl' is necessary so that the data can be}
     @r{decoded even on machines with different byte order.}
     @r{`l64a' can return a string shorter than 6 bytes, so }
     @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
     @r{hand.} */

  p = stpcpy (cp, l64a (htonl (len)));
  cp = mempcpy (p, "......", 6 - (p - cp));

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      p = stpcpy (cp, l64a (htonl (n)));
      cp = mempcpy (p, "......", 6 - (p - cp));
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      cp = stpcpy (cp, l64a (htonl (n)));
    @}
  *cp = '\0';
  return out;
@}
@end smallexample

It is strange that the library does not provide the complete
functionality needed, but so be it.

@end deftypefun

To decode data produced with @code{l64a} the following function should be
used.

@deftypefun {long int} a64l (const char *@var{string})
@standards{XPG, stdlib.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}. The function processes at most 6 bytes of
this string, and decodes the bytes it finds according to the table
below. It stops decoding when it finds a byte not in the table,
rather like @code{atoi}; if you have a buffer which has been broken into
lines, you must be careful to skip over the end-of-line bytes.

The decoded number is returned as a @code{long int} value.
@end deftypefun

The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
which each byte of an encoded string represents six bits of an
input word, starting with the least significant six bits. These symbols
are used for the base 64 digits (the value of a digit is the sum of its
row and column labels):

@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
        @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
        @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
        @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
        @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
        @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
        @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
        @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
        @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
@end multitable

This encoding scheme is not standard. There are some other encoding
methods which are much more widely used (UU encoding, MIME encoding).
Generally, it is better to use one of these encodings.

@node Argz and Envz Vectors
@section Argz and Envz Vectors

@cindex argz vectors (string vectors)
@cindex string vectors, null-byte separated
@cindex argument vectors, null-byte separated
@dfn{argz vectors} are vectors of strings in a contiguous block of
memory, each element separated from its neighbors by null bytes
(@code{'\0'}).

@cindex envz vectors (environment vectors)
@cindex environment vectors, null-byte separated
@dfn{Envz vectors} are an extension of argz vectors where each element is a
name-value pair, separated by a @code{'='} byte (as in a Unix
environment).
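
The layout can be walked by hand: each element is a null-terminated
string stored directly after the previous one. A minimal sketch with the
hypothetical helper @code{count_elements}; the library function
@code{argz_count} provides this count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the elements of the argz vector ARGZ of
   ARGZ_LEN bytes.  The next element starts just past the null byte
   that ends the current one.  */
static size_t
count_elements (const char *argz, size_t argz_len)
{
  size_t n = 0;
  for (const char *p = argz; p < argz + argz_len; p += strlen (p) + 1)
    n++;
  return n;
}
```

The string literal @code{"ls\0-l\0/tmp"} is an 11-byte argz vector with
three elements, counting the final null byte that the compiler adds.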

@menu
* Argz Functions::              Operations on argz vectors.
* Envz Functions::              Additional operations on environment vectors.
@end menu

@node Argz Functions, Envz Functions, , Argz and Envz Vectors
@subsection Argz Functions

Each argz vector is represented by a pointer to the first element, of
type @code{char *}, and a size, of type @code{size_t}, both of which can
be initialized to @code{0} to represent an empty argz vector. All argz
functions accept either a pointer and a size argument, or pointers to
them, if they will be modified.

The argz functions use @code{malloc}/@code{realloc} to allocate/grow
argz vectors, and so any argz vector created using these functions may
be freed by using @code{free}; conversely, any argz function that may
grow a string expects that string to have been allocated using
@code{malloc} (those argz functions that only examine their arguments or
modify them in place will work on any sort of memory).
@xref{Unconstrained Allocation}.

All argz functions that do memory allocation have a return type of
@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
allocation error occurs.

@pindex argz.h
These functions are declared in the standard include file @file{argz.h}.

2657@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
d08a7e4c 2658@standards{GNU, argz.h}
11087373 2659@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
5649a1d6 2660The @code{argz_create} function converts the Unix-style argument vector
b13927da
UD
2661@var{argv} (a vector of pointers to normal C strings, terminated by
2662@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
2663the same elements, which is returned in @var{argz} and @var{argz_len}.
2664@end deftypefun

@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_create_sep} function converts the string
@var{string} into an argz vector (returned in @var{argz} and
@var{argz_len}) by splitting it into elements at every occurrence of the
byte @var{sep}.
@end deftypefun

@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns the number of elements in the argz vector @var{argz} and
@var{argz_len}.
@end deftypefun

@deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_extract} function converts the argz vector @var{argz} and
@var{argz_len} into a Unix-style argument vector stored in @var{argv},
by putting pointers to every element in @var{argz} into successive
positions in @var{argv}, followed by a terminator of @code{0}.
@var{Argv} must be pre-allocated with enough space to hold all the
elements in @var{argz} plus the terminating @code{(char *)0}
(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
bytes should be enough). Note that the string pointers stored into
@var{argv} point into @var{argz}---they are not copies---and so
@var{argz} must be copied if it will be changed while @var{argv} is
still active. This function is useful for passing the elements in
@var{argz} to an exec function (@pxref{Executing a File}).
@end deftypefun

@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_stringify} function converts @var{argz} into a normal
string with the elements separated by the byte @var{sep}, by replacing each
@code{'\0'} inside @var{argz} (except the last one, which terminates the
string) with @var{sep}. This is handy for printing @var{argz} in a
readable manner.
@end deftypefun

@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls strlen and argz_append.
The @code{argz_add} function adds the string @var{str} to the end of the
argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
@code{*@var{argz_len}} accordingly.
@end deftypefun

@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_add_sep} function is similar to @code{argz_add}, but
@var{str} is split into separate elements in the result at occurrences of
the byte @var{delim}. This is useful, for instance, for
adding the components of a Unix search path to an argz vector, by using
a value of @code{':'} for @var{delim}.
@end deftypefun

@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_append} function appends @var{buf_len} bytes starting at
@var{buf} to the argz vector @code{*@var{argz}}, reallocating
@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
@code{*@var{argz_len}}.
@end deftypefun

@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls free if no argument is left.
If @var{entry} points to the beginning of one of the elements in the
argz vector @code{*@var{argz}}, the @code{argz_delete} function will
remove this entry and reallocate @code{*@var{argz}}, modifying
@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
destructive argz functions usually reallocate their argz argument,
pointers into argz vectors such as @var{entry} will then become invalid.
@end deftypefun

@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls argz_add or realloc and memmove.
The @code{argz_insert} function inserts the string @var{entry} into the
argz vector @code{*@var{argz}} at a point just before the existing
element pointed to by @var{before}, reallocating @code{*@var{argz}} and
updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
is @code{0}, @var{entry} is added to the end instead (as if by
@code{argz_add}). Since the first element is in fact the same as
@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
@var{before} will result in @var{entry} being inserted at the beginning.
@end deftypefun

@deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_next} function provides a convenient way of iterating
over the elements in the argz vector @var{argz}. It returns a pointer
to the next element in @var{argz} after the element @var{entry}, or
@code{0} if there are no elements following @var{entry}. If @var{entry}
is @code{0}, the first element of @var{argz} is returned.

This behavior suggests two styles of iteration:

@smallexample
    char *entry = 0;
    while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
      @var{action};
@end smallexample

(the double parentheses suppress the warning some C compilers issue
about an assignment used as a @code{while}-test) and:

@smallexample
    char *entry;
    for (entry = @var{argz};
         entry;
         entry = argz_next (@var{argz}, @var{argz_len}, entry))
      @var{action};
@end smallexample

Note that the latter depends on @var{argz} having a value of @code{0} if
it is empty (rather than a pointer to an empty block of memory); this
invariant is maintained for argz vectors created by the functions here.
@end deftypefun

@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
Replace all occurrences of the string @var{str} in @code{*@var{argz}} with
@var{with}, reallocating @code{*@var{argz}} and updating
@code{*@var{argz_len}} as necessary. If @var{replace_count} is not a
null pointer, @code{*@var{replace_count}} will be incremented by the
number of replacements performed.
@end deftypefun

@node Envz Functions, , Argz Functions, Argz and Envz Vectors
@subsection Envz Functions

Envz vectors are just argz vectors with additional constraints on the form
of each element; as such, argz functions can also be used on them, where it
makes sense.

Each element in an envz vector is a name-value pair, separated by a @code{'='}
byte; if multiple @code{'='} bytes are present in an element, those
after the first are considered part of the value, and treated like all other
non-@code{'\0'} bytes.

If @emph{no} @code{'='} bytes are present in an element, that element is
considered the name of a ``null'' entry, as distinct from an entry with an
empty value: @code{envz_get} will return @code{0} if given the name of a null
entry, whereas an entry with an empty value would result in a value of
@code{""}; @code{envz_entry} will still find such entries, however. Null
entries can be removed with the @code{envz_strip} function.

As with argz functions, envz functions that may allocate memory (and thus
fail) have a return type of @code{error_t}, and return either @code{0} or
@code{ENOMEM}.

@pindex envz.h
These functions are declared in the standard include file @file{envz.h}.

@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_entry} function finds the entry in @var{envz} with the name
@var{name}, and returns a pointer to the whole entry---that is, the argz
element which begins with @var{name} followed by a @code{'='} byte. If
there is no entry with that name, @code{0} is returned.
@end deftypefun

@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_get} function finds the entry in @var{envz} with the name
@var{name} (like @code{envz_entry}), and returns a pointer to the value
portion of that entry (following the @code{'='}). If there is no entry with
that name (or only a null entry), @code{0} is returned.
@end deftypefun

@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls envz_remove, which calls envz_entry and argz_delete, and then
@c argz_add or equivalent code that reallocs and appends name=value.
The @code{envz_add} function adds an entry to @code{*@var{envz}}
(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
@var{name}, and value @var{value}. If an entry with the same name
already exists in @var{envz}, it is removed first. If @var{value} is
@code{0}, then the new entry will be the special null type of entry
(mentioned above).
@end deftypefun

@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
as if with @code{envz_add}, updating @code{*@var{envz}} and
@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
will supersede those with the same name in @var{envz}; otherwise, the
existing entries in @var{envz} are kept.

Null entries are treated just like other entries in this respect, so a null
entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
being added to @var{envz}, if @var{override} is false.
@end deftypefun

@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{envz_strip} function removes any null entries from @var{envz},
updating @code{*@var{envz}} and @code{*@var{envz_len}}.
@end deftypefun

@deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{envz_remove} function removes an entry named @var{name} from
@var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}.
@end deftypefun

@c FIXME these are undocumented:
@c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp