@node String and Array Utilities, Character Set Handling, Character Handling, Top
@c %MENU% Utilities for copying and comparing strings and arrays
@chapter String and Array Utilities

Operations on strings (null-terminated byte sequences) are an important part of
many programs. @Theglibc{} provides an extensive set of string
utility functions, including functions for copying, concatenating,
comparing, and searching strings. Many of these functions can also
operate on arbitrary regions of storage; for example, the @code{memcpy}
function can be used to copy the contents of any kind of array.

It's fairly common for beginning C programmers to ``reinvent the wheel''
by duplicating this functionality in their own code, but it pays to
become familiar with the library functions and to make use of them,
since this offers benefits in maintenance, efficiency, and portability.

For instance, you could easily compare one string to another in two
lines of C code, but if you use the built-in @code{strcmp} function,
you're less likely to make a mistake. And, since these library
functions are typically highly optimized, your program may run faster
too.

@menu
* Representation of Strings::   Introduction to basic concepts.
* String/Array Conventions::    Whether to use a string function or an
                                 arbitrary array function.
* String Length::               Determining the length of a string.
* Copying Strings and Arrays::  Functions to copy strings and arrays.
* Concatenating Strings::       Functions to concatenate strings while copying.
* Truncating Strings::          Functions to truncate strings while copying.
* String/Array Comparison::     Functions for byte-wise and character-wise
                                 comparison.
* Collation Functions::         Functions for collating strings.
* Search Functions::            Searching for a specific element or substring.
* Finding Tokens in a String::  Splitting a string into tokens by looking
                                 for delimiters.
* Erasing Sensitive Data::      Clearing memory which contains sensitive
                                 data, after it's no longer needed.
* Shuffling Bytes::             Or how to flash-cook a string.
* Obfuscating Data::            Reversibly obscuring data from casual view.
* Encode Binary Data::          Encoding and Decoding of Binary Data.
* Argz and Envz Vectors::       Null-separated string vectors.
@end menu

@node Representation of Strings
@section Representation of Strings
@cindex string, representation of

This section is a quick summary of string concepts for beginning C
programmers. It describes how strings are represented in C
and some common pitfalls. If you are already familiar with this
material, you can skip this section.

@cindex string
A @dfn{string} is a null-terminated array of bytes of type @code{char},
including the terminating null byte. String-valued
variables are usually declared to be pointers of type @code{char *}.
Such variables do not include space for the text of a string; that has
to be stored somewhere else---in an array variable, a string constant,
or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
you to store the address of the chosen memory space into the pointer
variable. Alternatively you can store a @dfn{null pointer} in the
pointer variable. The null pointer does not point anywhere, so
attempting to reference the string it points to gets an error.

@cindex multibyte character
@cindex multibyte string
@cindex wide string
A @dfn{multibyte character} is a sequence of one or more bytes that
represents a single character using the locale's encoding scheme; a
null byte always represents the null character. A @dfn{multibyte
string} is a string that consists entirely of multibyte
characters. In contrast, a @dfn{wide string} is a null-terminated
sequence of @code{wchar_t} objects. A wide-string variable is usually
declared to be a pointer of type @code{wchar_t *}, by analogy with
string variables and @code{char *}. @xref{Extended Char Intro}.

@cindex null byte
@cindex null wide character
By convention, the @dfn{null byte}, @code{'\0'},
marks the end of a string and the @dfn{null wide character},
@code{L'\0'}, marks the end of a wide string. For example, in
testing to see whether the @code{char *} variable @var{p} points to a
null byte marking the end of a string, you can write
@code{!*@var{p}} or @code{*@var{p} == '\0'}.

A null byte is quite different conceptually from a null pointer,
although both are represented by the integer constant @code{0}.

@cindex string literal
A @dfn{string literal} appears in C program source as a multibyte
string between double-quote characters (@samp{"}). If the
initial double-quote character is immediately preceded by a capital
@samp{L} (ell) character (as in @code{L"foo"}), it is a wide string
literal. String literals can also contribute to @dfn{string
concatenation}: @code{"a" "b"} is the same as @code{"ab"}.
For wide strings one can use either
@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is
not allowed by the GNU C compiler, because literals are placed in
read-only storage.

Arrays that are declared @code{const} cannot be modified
either. It's generally good style to declare non-modifiable string
pointers to be of type @code{const char *}, since this often allows the
C compiler to detect accidental modifications as well as providing some
amount of documentation about what your program intends to do with the
string.

The amount of memory allocated for a byte array may extend past the null byte
that marks the end of the string that the array contains. In this
document, the term @dfn{allocated size} is always used to refer to the
total amount of memory allocated for an array, while the term
@dfn{length} refers to the number of bytes up to (but not including)
the terminating null byte. Wide strings are similar, except their
sizes and lengths count wide characters, not bytes.
@cindex length of string
@cindex allocation size of string
@cindex size of string
@cindex string length
@cindex string allocation

A notorious source of program bugs is trying to put more bytes into a
string than fit in its allocated size. When writing code that extends
strings or moves bytes into a pre-allocated array, you should be
very careful to keep track of the length of the text and make explicit
checks for overflowing the array. Many of the library functions
@emph{do not} do this for you! Remember also that you need to allocate
an extra byte to hold the null byte that marks the end of the
string.
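
The explicit check described above can be sketched as a small helper.
This is only an illustration under stated assumptions, not a library
function: the name @code{copy_checked} and its return convention are
made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string SRC into DST, whose allocated
   size is DST_SIZE bytes.  Returns 0 on success, or -1 without
   touching DST if the text plus its null byte would not fit.  */
int
copy_checked (char *dst, size_t dst_size, const char *src)
{
  size_t len = strlen (src);    /* length, not counting the null byte */

  if (len >= dst_size)          /* need len + 1 bytes in total */
    return -1;
  memcpy (dst, src, len + 1);   /* copies the null byte as well */
  return 0;
}
```

A caller would typically pass @code{sizeof} of the destination array as
@code{dst_size}, so that the check compares the length against the
allocated size.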

@cindex single-byte string
@cindex multibyte string
Originally strings were sequences of bytes where each byte represented a
single character. This is still true today if the strings are encoded
using a single-byte character encoding. Things are different if the
strings are encoded using a multibyte encoding (for more information on
encodings see @ref{Extended Char Intro}). There is no difference in
the programming interface for these two kinds of strings; the programmer
has to be aware of this and interpret the byte sequences accordingly.

But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
@code{memcpy} could cut a multibyte character in the middle and put an
incomplete (and therefore unusable) byte sequence in the target buffer.
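
As an illustration of the problem, the following sketch picks a byte
count that does not split a character before the text is handed to
@code{memcpy}. It assumes the string is valid UTF-8, which is only one
possible multibyte encoding, and the helper name
@code{utf8_safe_prefix} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, assuming S is valid UTF-8: return the largest
   byte count not exceeding MAX at which S can be cut without
   splitting a multibyte character.  */
size_t
utf8_safe_prefix (const char *s, size_t max)
{
  size_t len = strlen (s);
  size_t n = max < len ? max : len;

  /* In UTF-8, continuation bytes have the bit pattern 10xxxxxx.  If
     the byte just past the cut is one, the cut falls inside a
     character, so back up to the start of that character.  */
  while (n > 0 && n < len && ((unsigned char) s[n] & 0xC0) == 0x80)
    n--;
  return n;
}
```

The returned count can then be used as the size argument of
@code{memcpy}. For other encodings a locale-aware approach would be
needed instead.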

@cindex wide string
To avoid these problems, later versions of the @w{ISO C} standard
introduce a second set of functions that operate on @dfn{wide
characters} (@pxref{Extended Char Intro}). These functions don't have
the problems the single-byte versions have since every wide character is
a legal, interpretable value. This does not mean that cutting wide
strings at arbitrary points is without problems. It normally
is for alphabet-based languages (except for non-normalized text) but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit. This is a
higher level problem which the @w{C library} functions are not designed
to solve. But it is at least good that no invalid byte sequences can be
created. Also, the higher level functions can much more easily operate
on wide characters than on multibyte characters, so a common strategy
is to use wide characters internally whenever text is more than simply
copied.

The remainder of this chapter will discuss the functions for handling
wide strings in parallel with the discussion of
strings since there is almost always an exact equivalent
available.

@node String/Array Conventions
@section String and Array Conventions

This chapter describes both functions that work on arbitrary arrays or
blocks of memory, and functions that are specific to strings and wide
strings.

Functions that operate on arbitrary blocks of memory have names
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
@code{wmemcpy}) and invariably take an argument which specifies the size
(in bytes and wide characters respectively) of the block of memory to
operate on. The array arguments and return values for these functions
have type @code{void *} or @code{wchar_t *}. As a matter of style, the
elements of the arrays used with the @samp{mem} functions are referred
to as ``bytes''. You can pass any kind of pointer to these functions,
and the @code{sizeof} operator is useful in computing the value for the
size argument. Parameters to the @samp{wmem} functions must be of type
@code{wchar_t *}. These functions are not really usable with anything
but arrays of this type.

In contrast, functions that operate specifically on strings and wide
strings have names beginning with @samp{str} and @samp{wcs}
respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
terminating null byte or null wide character instead of requiring an explicit
size argument to be passed. (Some of these functions accept a specified
maximum length, but they also check for premature termination.)
The array arguments and return values for these
functions have type @code{char *} and @code{wchar_t *} respectively, and
the array elements are referred to as ``bytes'' and ``wide
characters''.

In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
versions of a function. The one that is more appropriate to use depends
on the exact situation. When your program is manipulating arbitrary
arrays or blocks of storage, then you should always use the @samp{mem}
functions. On the other hand, when you are manipulating
strings it is usually more convenient to use the @samp{str}/@samp{wcs}
functions, unless you already know the length of the string in advance.
The @samp{wmem} functions should be used for wide character arrays with
known size.
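
The difference in how the @samp{mem} and @samp{wmem} functions count
their size argument can be seen by copying the same wide character
array both ways. The function name below is made up for this sketch.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Copy COUNT wide characters from SRC twice: once with memcpy, whose
   size argument counts bytes, and once with wmemcpy, whose size
   argument counts wide characters.  Both destinations must have room
   for COUNT elements.  */
void
copy_wide_both_ways (wchar_t *via_mem, wchar_t *via_wmem,
                     const wchar_t *src, size_t count)
{
  memcpy (via_mem, src, count * sizeof (wchar_t));
  wmemcpy (via_wmem, src, count);
}
```

Both calls leave identical contents in their destination; forgetting
the multiplication by @code{sizeof (wchar_t)} in the @code{memcpy} call
would copy only part of the array.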

@cindex wint_t
@cindex parameter promotion
Some of the memory and string functions take single characters as
arguments. Since a value of type @code{char} is automatically promoted
into a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
In the case of the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}. For many implementations this would not be necessary
since @code{wchar_t} is large enough to not be automatically
promoted, but since the @w{ISO C} standard does not require such a
choice of types, the @code{wint_t} type is used.

@node String Length
@section String Length

You can get the length of a string using the @code{strlen} function.
This function is declared in the header file @file{string.h}.
@pindex string.h

@deftypefun size_t strlen (const char *@var{s})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{strlen} function returns the length of the
string @var{s} in bytes. (In other words, it returns the offset of the
terminating null byte within the array.)

For example,
@smallexample
strlen ("hello, world")
    @result{} 12
@end smallexample

When applied to an array, the @code{strlen} function returns
the length of the string stored there, not its allocated size. You can
get the allocated size of the array that holds a string using
the @code{sizeof} operator:

@smallexample
char string[32] = "hello, world";
sizeof (string)
    @result{} 32
strlen (string)
    @result{} 12
@end smallexample

But beware, this will not work unless @var{string} is the
array itself, not a pointer to it. For example:

@smallexample
char string[32] = "hello, world";
char *ptr = string;
sizeof (string)
    @result{} 32
sizeof (ptr)
    @result{} 4  /* @r{(on a machine with 4 byte pointers)} */
@end smallexample

This is an easy mistake to make when you are working with functions that
take string arguments; those arguments are always pointers, not arrays.

It must also be noted that for multibyte encoded strings the return
value does not have to correspond to the number of characters in the
string. To get this value the string can be converted to wide
characters and @code{wcslen} can be used or something like the following
code can be used:

@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.}  */
@{
  mbstate_t t;
  const char *scopy = string;
  /* In initial state.  */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters.  */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
@end smallexample

This is cumbersome, so if the number of characters (as opposed to
bytes) is needed often, it is better to work with wide characters.
@end deftypefun

The wide character equivalent is declared in @file{wchar.h}.

@deftypefun size_t wcslen (const wchar_t *@var{ws})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcslen} function is the wide character equivalent to
@code{strlen}. The return value is the number of wide characters in the
wide string pointed to by @var{ws} (this is also the offset of
the terminating null wide character of @var{ws}).

Since there are no multi wide character sequences making up one wide
character the return value is not only the offset in the array, it is
also the number of wide characters.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
If the array @var{s} of size @var{maxlen} contains a null byte,
the @code{strnlen} function returns the length of the string @var{s} in
bytes. Otherwise it
returns @var{maxlen}. Therefore this function is equivalent to
@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
but it
is more efficient and works even if @var{s} is not null-terminated so
long as @var{maxlen} does not exceed the size of @var{s}'s array.

@smallexample
char string[32] = "hello, world";
strnlen (string, 32)
    @result{} 12
strnlen (string, 5)
    @result{} 5
@end smallexample

This function is a GNU extension and is declared in @file{string.h}.
@end deftypefun
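
Because @code{strnlen} never reads past @var{maxlen} bytes, it is
convenient for fixed-width fields that may lack a terminating null
byte. The following is a minimal sketch; the helper name
@code{field_to_string} is hypothetical, and defining
@code{_GNU_SOURCE} is an assumption about how the program is built.

```c
#define _GNU_SOURCE             /* assumed, so that strnlen is declared */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn a fixed-width field of WIDTH bytes, which
   may or may not contain a null byte, into a newly allocated string.
   Returns a null pointer if malloc fails.  */
char *
field_to_string (const char *field, size_t width)
{
  size_t len = strnlen (field, width);  /* never reads past WIDTH */
  char *s = malloc (len + 1);           /* one extra byte for the null */

  if (s != NULL)
    {
      memcpy (s, field, len);
      s[len] = '\0';
    }
  return s;
}
```

The caller is responsible for passing the result to @code{free}.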

@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
@var{maxlen} parameter specifies the maximum number of wide characters.

This function is a GNU extension and is declared in @file{wchar.h}.
@end deftypefun

@node Copying Strings and Arrays
@section Copying Strings and Arrays

You can use the functions described in this section to copy the contents
of strings, wide strings, and arrays. The @samp{str} and @samp{mem}
functions are declared in @file{string.h} while the @samp{w} functions
are declared in @file{wchar.h}.
@pindex string.h
@pindex wchar.h
@cindex copying strings and arrays
@cindex string copy functions
@cindex array copy functions
@cindex concatenating strings
@cindex string concatenation functions

A helpful way to remember the ordering of the arguments to the functions
in this section is that it corresponds to an assignment expression, with
the destination array specified to the left of the source array. Most
of these functions return the address of the destination array; a few
return the address of the destination's terminating null, or of just
past the destination.

Most of these functions do not work properly if the source and
destination arrays overlap. For example, if the beginning of the
destination array overlaps the end of the source array, the original
contents of that part of the source array may get overwritten before it
is copied. Even worse, in the case of the string functions, the null
byte marking the end of the string may be lost, and the copy
function might get stuck in a loop trashing all the memory allocated to
your program.

All functions that have problems copying between overlapping arrays are
explicitly identified in this manual. In addition to functions in this
section, there are a few others like @code{sprintf} (@pxref{Formatted
Output Functions}) and @code{scanf} (@pxref{Formatted Input
Functions}).

@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{memcpy} function copies @var{size} bytes from the object
beginning at @var{from} into the object beginning at @var{to}. The
behavior of this function is undefined if the two arrays @var{to} and
@var{from} overlap; use @code{memmove} instead if overlapping is possible.

The value returned by @code{memcpy} is the value of @var{to}.

Here is an example of how you might use @code{memcpy} to copy the
contents of an array:

@smallexample
struct foo *oldarray, *newarray;
int arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
@end smallexample
@end deftypefun

@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmemcpy} function copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object beginning at @var{wto}. The
behavior of this function is undefined if the two arrays @var{wto} and
@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.

The following is a possible implementation of @code{wmemcpy} but there
are more optimizations possible.

@smallexample
wchar_t *
wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemcpy} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{mempcpy} function is nearly identical to the @code{memcpy}
function. It copies @var{size} bytes from the object beginning at
@var{from} into the object pointed to by @var{to}. But instead of
returning the value of @var{to} it returns a pointer to the byte
following the last written byte in the object beginning at @var{to}.
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.

This function is useful in situations where a number of objects shall be
copied to consecutive memory positions.

@smallexample
void *
combine (void *o1, size_t s1, void *o2, size_t s2)
@{
  void *result = malloc (s1 + s2);
  if (result != NULL)
    mempcpy (mempcpy (result, o1, s1), o2, s2);
  return result;
@}
@end smallexample

This function is a GNU extension.
@end deftypefun

@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
function. It copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object pointed to by @var{wto}. But
instead of returning the value of @var{wto} it returns a pointer to the
wide character following the last written wide character in the object
beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}.

This function is useful in situations where a number of objects shall be
copied to consecutive memory positions.

The following is a possible implementation of @code{wmempcpy} but there
are more optimizations possible.

@smallexample
wchar_t *
wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
          size_t size)
@{
  return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

This function is a GNU extension.
@end deftypefun

@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{memmove} copies the @var{size} bytes at @var{from} into the
@var{size} bytes at @var{to}, even if those two blocks of space
overlap. In the case of overlap, @code{memmove} is careful to copy the
original values of the bytes in the block at @var{from}, including those
bytes which also belong to the block at @var{to}.

The value returned by @code{memmove} is the value of @var{to}.
@end deftypefun
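
A typical case where source and destination overlap is closing a gap
inside a single array, which calls for @code{memmove}. The sketch below
assumes an array of @code{int}; the helper name @code{remove_at} is
made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete element INDEX from the COUNT-element
   array A by sliding the tail down one slot.  Source and destination
   overlap, so memmove is required; memcpy would be undefined here.
   Returns the new element count.  */
size_t
remove_at (int *a, size_t count, size_t index)
{
  if (index >= count)
    return count;               /* nothing to remove */
  memmove (&a[index], &a[index + 1],
           (count - index - 1) * sizeof a[0]);
  return count - 1;
}
```

Note that the size argument is computed in bytes, using @code{sizeof}
on one element.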

@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
into the @var{size} wide characters at @var{wto}, even if those two
blocks of space overlap. In the case of overlap, @code{wmemmove} is
careful to copy the original values of the wide characters in the block
at @var{wfrom}, including those wide characters which also belong to the
block at @var{wto}.

The following is a possible implementation of @code{wmemmove} but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemmove} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
@end deftypefun

@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
@standards{SVID, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies no more than @var{size} bytes from @var{from} to
@var{to}, stopping if a byte matching @var{c} is found. The return
value is a pointer into @var{to} one byte past where @var{c} was copied,
or a null pointer if no byte matching @var{c} appeared in the first
@var{size} bytes of @var{from}.
@end deftypefun
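
One possible use of @code{memccpy} is extracting the first line of a
buffer with a bound on the number of bytes copied. This is a sketch
under stated assumptions: the helper name @code{copy_first_line} and
its conventions are hypothetical, and @code{_GNU_SOURCE} is defined
only so that the declaration is visible regardless of compiler mode.

```c
#define _GNU_SOURCE             /* assumed, so that memccpy is declared */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy bytes from SRC (SRC_LEN bytes long) into
   DST (allocated size DST_SIZE), stopping after the first newline.
   The result is always null-terminated when DST_SIZE is nonzero.
   Returns the number of bytes copied, not counting the null byte.  */
size_t
copy_first_line (char *dst, size_t dst_size,
                 const char *src, size_t src_len)
{
  if (dst_size == 0)
    return 0;

  /* Leave room for the terminating null byte.  */
  size_t limit = src_len < dst_size - 1 ? src_len : dst_size - 1;
  char *end = memccpy (dst, src, '\n', limit);

  /* A null pointer means no newline was found within LIMIT bytes.  */
  size_t copied = end != NULL ? (size_t) (end - dst) : limit;
  dst[copied] = '\0';
  return copied;
}
```

When a newline is found, the returned pointer is one past the copied
newline, so the newline itself is part of the result.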

@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies the value of @var{c} (converted to an
@code{unsigned char}) into each of the first @var{size} bytes of the
object beginning at @var{block}. It returns the value of @var{block}.
@end deftypefun
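
The following sketch shows two common uses of @code{memset}: clearing
a structure, and filling a byte buffer. The structure and function
names are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Example structure, invented for this sketch.  */
struct counters
{
  unsigned int hits;
  unsigned int misses;
};

/* Set every byte of *C to zero, which zeroes both unsigned members.  */
void
reset_counters (struct counters *c)
{
  memset (c, 0, sizeof *c);
}

/* Fill N bytes at BUF with VALUE.  memset converts VALUE to
   unsigned char before storing it, so larger values are reduced
   to fit in a single byte.  */
void
fill_bytes (unsigned char *buf, size_t n, int value)
{
  memset (buf, value, n);
}
```

Because the value is stored byte by byte, @code{memset} is not suitable
for setting the elements of an @code{int} array to a nonzero value.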

@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function copies the value of @var{wc} into each of the first
@var{size} wide characters of the object beginning at @var{block}. It
returns the value of @var{block}.
@end deftypefun

@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
@standards{ISO, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This copies bytes from the string @var{from} (up to and including
the terminating null byte) into the string @var{to}. Like
@code{memcpy}, this function has undefined results if the strings
overlap. The return value is the value of @var{to}.
@end deftypefun

@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@standards{ISO, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This copies wide characters from the wide string @var{wfrom} (up to and
including the terminating null wide character) into the string
@var{wto}. Like @code{wmemcpy}, this function has undefined results if
the strings overlap. The return value is the value of @var{wto}.
@end deftypefun

@deftypefun {char *} strdup (const char *@var{s})
@standards{SVID, string.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function copies the string @var{s} into a newly
allocated string. The string is allocated using @code{malloc}; see
@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
for the new string, @code{strdup} returns a null pointer. Otherwise it
returns a pointer to the new string.
@end deftypefun
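
Since @code{strdup} returns memory obtained from @code{malloc}, the
copy can be modified freely and must eventually be released with
@code{free}. The following is a minimal sketch; the helper name
@code{upper_copy} is hypothetical, and @code{_GNU_SOURCE} is defined
only so that the declaration is visible regardless of compiler mode.

```c
#define _GNU_SOURCE             /* assumed, so that strdup is declared */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated upper-case copy of S,
   or a null pointer if the allocation fails.  S itself is left
   untouched, so it may be a string literal.  */
char *
upper_copy (const char *s)
{
  char *copy = strdup (s);

  if (copy == NULL)
    return NULL;
  for (char *p = copy; *p != '\0'; p++)
    *p = (char) toupper ((unsigned char) *p);
  return copy;
}
```

The caller owns the returned string and should pass it to @code{free}
when it is no longer needed.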

@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
This function copies the wide string @var{ws}
into a newly allocated string. The string is allocated using
@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
cannot allocate space for the new string, @code{wcsdup} returns a null
pointer. Otherwise it returns a pointer to the new wide string.

This function is a GNU extension.
@end deftypefun

@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
@standards{Unknown origin, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{strcpy}, except that it returns a pointer to
the end of the string @var{to} (that is, the address of the terminating
null byte @code{to + strlen (from)}) rather than the beginning.

For example, this program uses @code{stpcpy} to concatenate @samp{foo}
and @samp{bar} to produce @samp{foobar}, which it then prints.

@smallexample
@include stpcpy.c.texi
@end smallexample

This function is part of POSIX.1-2008 and later editions, but was
available in @theglibc{} and other systems as an extension long before
it was standardized.

Its behavior is undefined if the strings overlap. The function is
declared in @file{string.h}.
@end deftypefun
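
Because @code{stpcpy} returns the address of the terminating null
byte, the next piece can be appended without rescanning what has
already been written. The sketch below joins two path components; the
helper name @code{join_path} is made up for this example, and
@code{_GNU_SOURCE} is defined only so that the declaration is visible
regardless of compiler mode.

```c
#define _GNU_SOURCE             /* assumed, so that stpcpy is declared */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding
   DIR, a slash, and NAME, or a null pointer if malloc fails.  */
char *
join_path (const char *dir, const char *name)
{
  /* Two lengths, one byte for the slash, one for the null byte.  */
  char *result = malloc (strlen (dir) + 1 + strlen (name) + 1);

  if (result != NULL)
    {
      char *p = stpcpy (result, dir);   /* p points at the null byte */
      *p++ = '/';
      strcpy (p, name);                 /* writes the final null byte */
    }
  return result;
}
```

The caller should release the result with @code{free}.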

@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
@standards{GNU, wchar.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscpy}, except that it returns a pointer to
the end of the string @var{wto} (that is, the address of the terminating
null wide character @code{wto + wcslen (wfrom)}) rather than the beginning.

This function is not part of ISO or POSIX but was found useful while
developing @theglibc{} itself.

The behavior of @code{wcpcpy} is undefined if the strings overlap.

@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}.
@end deftypefun

@deftypefn {Macro} {char *} strdupa (const char *@var{s})
@standards{GNU, string.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strdup} but allocates the new string
using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
Automatic}). This means of course the returned string has the same
limitations as any block of memory allocated using @code{alloca}.

For obvious reasons @code{strdupa} is implemented only as a macro;
you cannot get the address of this function. Despite this limitation
it is a useful function. The following code shows a situation where
using @code{malloc} would be a lot more expensive.

@smallexample
@include strdupa.c.texi
@end smallexample

Please note that calling @code{strtok} using @var{path} directly is
invalid. It is also not allowed to call @code{strdupa} in the argument
list of @code{strtok} since @code{strdupa} uses @code{alloca}
(@pxref{Variable Size Automatic}), which can interfere with the parameter
passing.

This macro is only available if GNU CC is used.
@end deftypefn

0a13c9e9 649@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
d08a7e4c 650@standards{BSD, string.h}
11087373 651@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0a13c9e9
PE
652This is a partially obsolete alternative for @code{memmove}, derived from
653BSD. Note that it is not quite equivalent to @code{memmove}, because the
654arguments are not in the same order and there is no return value.
655@end deftypefun
706074a5 656
0a13c9e9 657@deftypefun void bzero (void *@var{block}, size_t @var{size})
d08a7e4c 658@standards{BSD, string.h}
0a13c9e9
PE
659@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
660This is a partially obsolete alternative for @code{memset}, derived from
661BSD. Note that it is not as general as @code{memset}, because the only
662value it can store is zero.
663@end deftypefun
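
The relationship between the BSD functions and their ISO counterparts
can be sketched as follows. The names @code{sketch_bcopy} and
@code{sketch_bzero} are hypothetical stand-ins written only to show the
mapping; they are not how the library implements these functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for bcopy.  Note the swapped source and
   destination order relative to memmove.  */
static void
sketch_bcopy (const void *from, void *to, size_t size)
{
  assert (size == 0 || (from != NULL && to != NULL));
  memmove (to, from, size);   /* the return value is discarded */
}

/* Hypothetical stand-in for bzero.  */
static void
sketch_bzero (void *block, size_t size)
{
  assert (size == 0 || block != NULL);
  memset (block, 0, size);    /* zero is the only value available */
}
```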
706074a5 664
0a13c9e9
PE
665@node Concatenating Strings
666@section Concatenating Strings
667@pindex string.h
668@pindex wchar.h
669@cindex concatenating strings
670@cindex string concatenation functions
671
672The functions described in this section concatenate the contents of a
673string or wide string to another. They follow the string-copying
674functions in their conventions. @xref{Copying Strings and Arrays}.
675@samp{strcat} is declared in the header file @file{string.h} while
676@samp{wcscat} is declared in @file{wchar.h}.
706074a5 677
8a2f1f5b 678@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 679@standards{ISO, string.h}
11087373 680@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 681The @code{strcat} function is similar to @code{strcpy}, except that the
2cc4b9cc
PE
682bytes from @var{from} are concatenated or appended to the end of
683@var{to}, instead of overwriting it. That is, the first byte from
684@var{from} overwrites the null byte marking the end of @var{to}.
28f540f4
RM
685
686An equivalent definition for @code{strcat} would be:
687
@smallexample
char *
strcat (char *restrict to, const char *restrict from)
@{
  strcpy (to + strlen (to), from);
  return to;
@}
@end smallexample
696
697This function has undefined results if the strings overlap.
0a13c9e9
PE
698
699As noted below, this function has significant performance issues.
28f540f4
RM
700@end deftypefun
701
8a2f1f5b 702@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 703@standards{ISO, wchar.h}
11087373 704@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 705The @code{wcscat} function is similar to @code{wcscpy}, except that the
2cc4b9cc
PE
706wide characters from @var{wfrom} are concatenated or appended to the end of
707@var{wto}, instead of overwriting it. That is, the first wide character from
708@var{wfrom} overwrites the null wide character marking the end of @var{wto}.
8a2f1f5b
UD
709
710An equivalent definition for @code{wcscat} would be:
711
@smallexample
wchar_t *
wcscat (wchar_t *wto, const wchar_t *wfrom)
@{
  wcscpy (wto + wcslen (wto), wfrom);
  return wto;
@}
@end smallexample
720
721This function has undefined results if the strings overlap.
0a13c9e9
PE
722
723As noted below, this function has significant performance issues.
8a2f1f5b
UD
724@end deftypefun
725
Programmers using the @code{strcat} or @code{wcscat} functions (or the
@code{strncat} or @code{wcsncat} functions defined in
a later section, for that matter)
can easily be recognized as lazy and reckless. In almost all situations
the lengths of the participating strings are known (they had better be,
since how else can one ensure that the allocated size of the buffer is
sufficient?), or at least one could know them by keeping track of the
results of the various function calls. But then it is very inefficient
to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the
end of the destination string so that the actual copying can start.
This is a common example:
ee2752ea 737
ee2752ea
UD
@cindex va_copy
@smallexample
/* @r{This function concatenates arbitrarily many strings. The last}
   @r{parameter must be @code{NULL}.} */
char *
concat (const char *str, @dots{})
@{
  va_list ap, ap2;
  size_t total = 1;

  va_start (ap, str);
  va_copy (ap2, ap);

  /* @r{Determine how much space we need.} */
  for (const char *s = str; s != NULL; s = va_arg (ap, const char *))
    total += strlen (s);

  va_end (ap);

  char *result = malloc (total);
  if (result != NULL)
    @{
      result[0] = '\0';

      /* @r{Copy the strings.} */
      for (const char *s = str; s != NULL; s = va_arg (ap2, const char *))
        strcat (result, s);
    @}

  va_end (ap2);

  return result;
@}
@end smallexample
772
This looks quite simple, especially the second loop where the strings
are actually copied. But these innocent lines hide a major performance
penalty. Just imagine that ten strings of 100 bytes each have to be
concatenated. For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string.
For all strings in total, the comparisons necessary to find the end of
the intermediate results sum up to 5500! If we combine the copying
with the sizing of the allocation, we can write this function more
efficiently:
ee2752ea
UD
782
@smallexample
char *
concat (const char *str, @dots{})
@{
  size_t allocated = 100;
  char *result = malloc (allocated);

  if (result != NULL)
    @{
      va_list ap;
      size_t resultlen = 0;
      char *newp;

      va_start (ap, str);

      for (const char *s = str; s != NULL; s = va_arg (ap, const char *))
        @{
          size_t len = strlen (s);

          /* @r{Resize the allocated memory if necessary.} */
          if (resultlen + len + 1 > allocated)
            @{
              allocated += len;
              newp = reallocarray (result, allocated, 2);
              allocated *= 2;
              if (newp == NULL)
                @{
                  va_end (ap);
                  free (result);
                  return NULL;
                @}
              result = newp;
            @}

          memcpy (result + resultlen, s, len);
          resultlen += len;
        @}

      /* @r{Terminate the result string.} */
      result[resultlen++] = '\0';

      /* @r{Resize memory to the optimal size.} */
      newp = realloc (result, resultlen);
      if (newp != NULL)
        result = newp;

      va_end (ap);
    @}

  return result;
@}
@end smallexample
834
With a bit more knowledge about the input strings one could fine-tune
the memory allocation. The difference we are pointing to here is that
we don't use @code{strcat} anymore. We always keep track of the length
of the current intermediate result so we can save ourselves the search for the
end of the string and use @code{memcpy}. Please note that we also
don't use @code{stpcpy}, which might seem more natural since we are handling
strings. But this is not necessary since we already know the
length of the string and therefore can use the faster memory copying
function. The example would work for wide characters the same way.
ee2752ea
UD
844
845Whenever a programmer feels the need to use @code{strcat} she or he
f0f308c1 846should think twice and look through the program to see whether the code cannot
ee2752ea
UD
847be rewritten to take advantage of already calculated results. Again: it
848is almost always unnecessary to use @code{strcat}.
849
0a13c9e9
PE
850@node Truncating Strings
851@section Truncating Strings while Copying
852@cindex truncating strings
853@cindex string truncation
854
855The functions described in this section copy or concatenate the
856possibly-truncated contents of a string or array to another, and
857similarly for wide strings. They follow the string-copying functions
858in their header conventions. @xref{Copying Strings and Arrays}. The
859@samp{str} functions are declared in the header file @file{string.h}
860and the @samp{wc} functions are declared in the file @file{wchar.h}.
861
0a13c9e9 862@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
a448ee41 863@standards{C90, string.h}
0a13c9e9
PE
864@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
865This function is similar to @code{strcpy} but always copies exactly
866@var{size} bytes into @var{to}.
867
868If @var{from} does not contain a null byte in its first @var{size}
869bytes, @code{strncpy} copies just the first @var{size} bytes. In this
870case no null terminator is written into @var{to}.
871
872Otherwise @var{from} must be a string with length less than
873@var{size}. In this case @code{strncpy} copies all of @var{from},
874followed by enough null bytes to add up to @var{size} bytes in all.
875
876The behavior of @code{strncpy} is undefined if the strings overlap.
877
878This function was designed for now-rarely-used arrays consisting of
879non-null bytes followed by zero or more null bytes. It needs to set
880all @var{size} bytes of the destination, even when @var{size} is much
881greater than the length of @var{from}. As noted below, this function
882is generally a poor choice for processing text.
883@end deftypefun
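
The two cases described above can be sketched with a fixed-width,
null-padded field, which is the kind of array @code{strncpy} was
designed for. The helper @code{fill_field} and the constant
@code{FIELD_SIZE} are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { FIELD_SIZE = 8 };

/* Hypothetical helper: fill a fixed-width field of FIELD_SIZE bytes
   from name.  Returns true if all of name fit, false if it was cut
   off.  The field is NOT guaranteed to be a string: when name has
   FIELD_SIZE or more bytes, no null byte is stored.  */
static bool
fill_field (char field[FIELD_SIZE], const char *name)
{
  assert (field != NULL && name != NULL);
  strncpy (field, name, FIELD_SIZE);
  return strlen (name) <= FIELD_SIZE;
}
```

A caller that later treats the field as a string must supply the
terminator itself, for instance by printing it with a precision such
as @code{"%.8s"}.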
884
0a13c9e9 885@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 886@standards{ISO, wchar.h}
0a13c9e9
PE
887@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
888This function is similar to @code{wcscpy} but always copies exactly
889@var{size} wide characters into @var{wto}.
890
891If @var{wfrom} does not contain a null wide character in its first
892@var{size} wide characters, then @code{wcsncpy} copies just the first
893@var{size} wide characters. In this case no null terminator is
894written into @var{wto}.
895
896Otherwise @var{wfrom} must be a wide string with length less than
897@var{size}. In this case @code{wcsncpy} copies all of @var{wfrom},
898followed by enough null wide characters to add up to @var{size} wide
899characters in all.
900
901The behavior of @code{wcsncpy} is undefined if the strings overlap.
902
903This function is the wide-character counterpart of @code{strncpy} and
904suffers from most of the problems that @code{strncpy} does. For
905example, as noted below, this function is generally a poor choice for
906processing text.
907@end deftypefun
908
0a13c9e9 909@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
d08a7e4c 910@standards{GNU, string.h}
0a13c9e9
PE
911@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
912This function is similar to @code{strdup} but always copies at most
913@var{size} bytes into the newly allocated string.
914
915If the length of @var{s} is more than @var{size}, then @code{strndup}
916copies just the first @var{size} bytes and adds a closing null byte.
917Otherwise all bytes are copied and the string is terminated.
918
919This function differs from @code{strncpy} in that it always terminates
920the destination string.
921
922As noted below, this function is generally a poor choice for
923processing text.
924
925@code{strndup} is a GNU extension.
926@end deftypefun
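
As a small sketch of a case where truncation is what the application
logic actually wants, the hypothetical helper @code{first_component}
below duplicates only the part of a path name before the first slash.
The helper name and the path-splitting task are assumptions made for
this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   path before the first '/', or of all of path if there is none.
   The caller frees the result; NULL means allocation failed.  */
static char *
first_component (const char *path)
{
  assert (path != NULL);
  size_t len = strcspn (path, "/");
  return strndup (path, len);   /* always null-terminated */
}
```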
927
0a13c9e9 928@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
d08a7e4c 929@standards{GNU, string.h}
0a13c9e9
PE
930@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca} (@pxref{Variable Size
Automatic}). The same advantages and limitations of @code{strdupa} are
valid for @code{strndupa}, too.

This function is implemented only as a macro, just like @code{strdupa}.
Just like @code{strdupa}, this macro must not be used inside the
parameter list in a function call.
939
940As noted below, this function is generally a poor choice for
941processing text.
942
943@code{strndupa} is only available if GNU CC is used.
944@end deftypefn
945
0a13c9e9 946@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 947@standards{GNU, string.h}
0a13c9e9
PE
948@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{stpcpy} but always copies exactly
@var{size} bytes into @var{to}.

If the length of @var{from} is @var{size} or more, then @code{stpncpy}
copies just the first @var{size} bytes and returns a pointer to the
byte directly following the one which was copied last. Note that in
this case there is no null terminator written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null bytes to add up
to @var{size} bytes in all. This padding is rarely useful, but it
is provided for contexts that rely on the same behavior of
@code{strncpy}. @code{stpncpy} returns a pointer to the
@emph{first} written null byte.
963
964This function is not part of ISO or POSIX but was found useful while
965developing @theglibc{} itself.
966
967Its behavior is undefined if the strings overlap. The function is
968declared in @file{string.h}.
969
970As noted below, this function is generally a poor choice for
971processing text.
972@end deftypefun
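
In both cases the returned pointer marks the end of the non-null bytes
that were stored, so subtracting @var{to} from it gives their count.
The following sketch uses a hypothetical helper, @code{stored_length},
to make that visible.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy from into the size-byte array to and
   report how many non-null bytes were stored.  */
static size_t
stored_length (char *to, const char *from, size_t size)
{
  assert (to != NULL && from != NULL);
  char *end = stpncpy (to, from, size);
  return (size_t) (end - to);   /* equals strnlen (from, size) */
}
```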
973
0a13c9e9 974@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 975@standards{GNU, wchar.h}
0a13c9e9
PE
976@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If the length of @var{wfrom} is @var{size} or more, then
@code{wcpncpy} copies just the first @var{size} wide characters and
returns a pointer to the wide character directly following the last
wide character which was copied. Note that in this case
there is no null terminator written into @var{wto}.

If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
copies all of @var{wfrom}, followed by enough null wide characters to add up
to @var{size} wide characters in all. This padding is rarely useful, but it
is provided for contexts that rely on the same behavior of
@code{wcsncpy}. @code{wcpncpy} returns a pointer to the
@emph{first} written null wide character.
992
993This function is not part of ISO or POSIX but was found useful while
994developing @theglibc{} itself.
995
996Its behavior is undefined if the strings overlap.
997
998As noted below, this function is generally a poor choice for
999processing text.
1000
1001@code{wcpncpy} is a GNU extension.
1002@end deftypefun
1003
8a2f1f5b 1004@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 1005@standards{ISO, string.h}
11087373 1006@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{strcat} except that not more than @var{size}
bytes from @var{from} are appended to the end of @var{to}, and
@var{from} need not be null-terminated. A single null byte is also
always appended to @var{to}, so the total
allocated size of @var{to} must be at least @code{strnlen (@var{from},
@var{size}) + 1} bytes longer than its initial length.
1013
1014The @code{strncat} function could be implemented like this:
1015
@smallexample
@group
char *
strncat (char *to, const char *from, size_t size)
@{
  size_t len = strlen (to);
  memcpy (to + len, from, strnlen (from, size));
  to[len + strnlen (from, size)] = '\0';
  return to;
@}
@end group
@end smallexample
1028
1029The behavior of @code{strncat} is undefined if the strings overlap.
0a13c9e9
PE
1030
1031As a companion to @code{strncpy}, @code{strncat} was designed for
1032now-rarely-used arrays consisting of non-null bytes followed by zero
1033or more null bytes. As noted below, this function is generally a poor
1034choice for processing text. Also, this function has significant
1035performance issues. @xref{Concatenating Strings}.
28f540f4
RM
1036@end deftypefun
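
The following sketch shows the kind of input @code{strncat} was
designed for: a fixed-width field that may lack a null terminator.
The helper @code{append_field} is a hypothetical name, and the caller
is assumed to have sized the destination as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append the contents of a fixed-width field of
   field_size bytes, which may lack a null terminator, to the string
   in buf.  buf must have room for strlen (buf) + field_size + 1
   bytes.  */
static char *
append_field (char *buf, const char *field, size_t field_size)
{
  assert (buf != NULL && field != NULL);
  return strncat (buf, field, field_size);
}
```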
1037
8a2f1f5b 1038@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 1039@standards{ISO, wchar.h}
11087373 1040@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto},
and @var{wfrom} need not be null-terminated. A single null wide
character is also always appended to @var{wto}, so the total allocated
size of @var{wto} must be at least @code{wcsnlen (@var{wfrom},
@var{size}) + 1} wide characters longer than its initial length.
8a2f1f5b
UD
1047
1048The @code{wcsncat} function could be implemented like this:
1049
@smallexample
@group
wchar_t *
wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  size_t len = wcslen (wto);
  memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t));
  wto[len + wcsnlen (wfrom, size)] = L'\0';
  return wto;
@}
@end group
@end smallexample
1063
1064The behavior of @code{wcsncat} is undefined if the strings overlap.
28f540f4 1065
0a13c9e9
PE
1066As noted below, this function is generally a poor choice for
1067processing text. Also, this function has significant performance
1068issues. @xref{Concatenating Strings}.
1069@end deftypefun
1070
Because these functions can abruptly truncate strings or wide strings,
they are generally poor choices for processing text. When copying or
concatenating multibyte strings, they can truncate within a multibyte
character so that the result is not a valid multibyte string. When
combining or concatenating multibyte or wide strings, they may
truncate the output after a combining character, resulting in a
corrupted grapheme. They can cause bugs even when processing
single-byte strings: for example, when calculating an ASCII-only user
name, a truncated name can identify the wrong user.
1080
1081Although some buffer overruns can be prevented by manually replacing
1082calls to copying functions with calls to truncation functions, there
1083are often easier and safer automatic techniques that cause buffer
1084overruns to reliably terminate a program, such as GCC's
1085@option{-fcheck-pointer-bounds} and @option{-fsanitize=address}
1086options. @xref{Debugging Options,, Options for Debugging Your Program
1f6676d7 1087or GCC, gcc, Using GCC}. Because truncation functions can mask
0a13c9e9
PE
1088application bugs that would otherwise be caught by the automatic
1089techniques, these functions should be used only when the application's
1090underlying logic requires truncation.
1091
1092@strong{Note:} GNU programs should not truncate strings or wide
1093strings to fit arbitrary size limits. @xref{Semantics, , Writing
1094Robust Programs, standards, The GNU Coding Standards}. Instead of
1095string-truncation functions, it is usually better to use dynamic
1096memory allocation (@pxref{Unconstrained Allocation}) and functions
1097such as @code{strdup} or @code{asprintf} to construct strings.
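
As a minimal sketch of that advice, the hypothetical helper
@code{make_path} below lets @code{asprintf} size the result to fit
rather than truncating into a fixed buffer. The helper name and the
path-joining task are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "DIR/NAME" in freshly allocated storage
   that is exactly large enough.  Returns NULL on allocation failure;
   the caller frees the result.  */
static char *
make_path (const char *dir, const char *name)
{
  assert (dir != NULL && name != NULL);
  char *path;
  if (asprintf (&path, "%s/%s", dir, name) < 0)
    return NULL;
  return path;
}
```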
28f540f4 1098
b4012b75 1099@node String/Array Comparison
28f540f4
RM
1100@section String/Array Comparison
1101@cindex comparing strings and arrays
1102@cindex string comparison functions
1103@cindex array comparison functions
1104@cindex predicates on strings
1105@cindex predicates on arrays
1106
1107You can use the functions in this section to perform comparisons on the
1108contents of strings and arrays. As well as checking for equality, these
1109functions can also be used as the ordering functions for sorting
1110operations. @xref{Searching and Sorting}, for an example of this.
1111
1112Unlike most comparison operations in C, the string comparison functions
1113return a nonzero value if the strings are @emph{not} equivalent rather
1114than if they are. The sign of the value indicates the relative ordering
2cc4b9cc 1115of the first part of the strings that are not equivalent: a
28f540f4 1116negative value indicates that the first string is ``less'' than the
a5113b14 1117second, while a positive value indicates that the first string is
28f540f4
RM
1118``greater''.
1119
1120The most common use of these functions is to check only for equality.
1121This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
1122
1123All of these functions are declared in the header file @file{string.h}.
1124@pindex string.h
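
Both uses mentioned above, testing for equality and ordering for a
sort, can be sketched as follows. The names @code{streq},
@code{compare_strings} and @code{sort_strings} are hypothetical helpers
introduced for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Equality test: strcmp returns 0 when the strings match.  */
static bool
streq (const char *s1, const char *s2)
{
  assert (s1 != NULL && s2 != NULL);
  return strcmp (s1, s2) == 0;
}

/* Ordering function for qsort on an array of char * elements.
   qsort passes pointers to the elements, hence the extra
   indirection.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Sort the n strings in v by byte value.  */
static void
sort_strings (const char **v, size_t n)
{
  qsort (v, n, sizeof *v, compare_strings);
}
```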
1125
28f540f4 1126@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
d08a7e4c 1127@standards{ISO, string.h}
11087373 1128@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1129The function @code{memcmp} compares the @var{size} bytes of memory
1130beginning at @var{a1} against the @var{size} bytes of memory beginning
1131at @var{a2}. The value returned has the same sign as the difference
1132between the first differing pair of bytes (interpreted as @code{unsigned
1133char} objects, then promoted to @code{int}).
1134
1135If the contents of the two blocks are equal, @code{memcmp} returns
1136@code{0}.
1137@end deftypefun
1138
8a2f1f5b 1139@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
d08a7e4c 1140@standards{ISO, wchar.h}
11087373 1141@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
The function @code{wmemcmp} compares the @var{size} wide characters
beginning at @var{a1} against the @var{size} wide characters beginning
at @var{a2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{a1} is
smaller or larger than the corresponding wide character in @var{a2}.
8a2f1f5b
UD
1147
1148If the contents of the two blocks are equal, @code{wmemcmp} returns
1149@code{0}.
1150@end deftypefun
1151
28f540f4
RM
1152On arbitrary arrays, the @code{memcmp} function is mostly useful for
1153testing equality. It usually isn't meaningful to do byte-wise ordering
1154comparisons on arrays of things other than bytes. For example, a
1155byte-wise comparison on the bytes that make up floating-point numbers
1156isn't likely to tell you anything about the relationship between the
1157values of the floating-point numbers.
1158
8a2f1f5b
UD
1159@code{wmemcmp} is really only useful to compare arrays of type
1160@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
1161at a time and this number of bytes is system dependent.
1162
28f540f4
RM
1163You should also be careful about using @code{memcmp} to compare objects
1164that can contain ``holes'', such as the padding inserted into structure
1165objects to enforce alignment requirements, extra space at the end of
2cc4b9cc 1166unions, and extra bytes at the ends of strings whose length is less
28f540f4
RM
1167than their allocated size. The contents of these ``holes'' are
1168indeterminate and may cause strange behavior when performing byte-wise
1169comparisons. For more predictable results, perform an explicit
1170component-wise comparison.
1171
1172For example, given a structure type definition like:
1173
@smallexample
struct foo
  @{
    unsigned char tag;
    union
      @{
        double f;
        long i;
        char *p;
      @} value;
  @};
@end smallexample
1186
1187@noindent
1188you are better off writing a specialized comparison function to compare
1189@code{struct foo} objects instead of comparing them with @code{memcmp}.
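
Such a specialized comparison might look like the sketch below. The
text does not say how @code{tag} selects a union member, so the
enumeration constants and the function name @code{foo_equal} are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Assumed tag values; the text does not define them.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

/* Return true if a and b hold the same tag and the same active
   member.  Padding and inactive union bytes are never examined.  */
static bool
foo_equal (const struct foo *a, const struct foo *b)
{
  assert (a != NULL && b != NULL);
  if (a->tag != b->tag)
    return false;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return false;
    }
}
```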
1190
28f540f4 1191@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1192@standards{ISO, string.h}
11087373 1193@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1194The @code{strcmp} function compares the string @var{s1} against
1195@var{s2}, returning a value that has the same sign as the difference
2cc4b9cc 1196between the first differing pair of bytes (interpreted as
28f540f4
RM
1197@code{unsigned char} objects, then promoted to @code{int}).
1198
1199If the two strings are equal, @code{strcmp} returns @code{0}.
1200
1201A consequence of the ordering used by @code{strcmp} is that if @var{s1}
1202is an initial substring of @var{s2}, then @var{s1} is considered to be
1203``less than'' @var{s2}.
8a2f1f5b
UD
1204
1205@code{strcmp} does not take sorting conventions of the language the
1206strings are written in into account. To get that one has to use
1207@code{strcoll}.
1208@end deftypefun
1209
8a2f1f5b 1210@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1211@standards{ISO, wchar.h}
11087373 1212@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1213
The @code{wcscmp} function compares the wide string @var{ws1}
against @var{ws2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{ws1} is
smaller or larger than the corresponding wide character in @var{ws2}.
8a2f1f5b
UD
1218
1219If the two strings are equal, @code{wcscmp} returns @code{0}.
1220
1221A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
1222is an initial substring of @var{ws2}, then @var{ws1} is considered to be
1223``less than'' @var{ws2}.
1224
1225@code{wcscmp} does not take sorting conventions of the language the
1226strings are written in into account. To get that one has to use
1227@code{wcscoll}.
28f540f4
RM
1228@end deftypefun
1229
28f540f4 1230@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1231@standards{BSD, string.h}
11087373
AO
1232@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1233@c Although this calls tolower multiple times, it's a macro, and
1234@c strcasecmp is optimized so that the locale pointer is read only once.
1235@c There are some asm implementations too, for which the single-read
1236@c from locale TLS pointers also applies.
4547c1a4 1237This function is like @code{strcmp}, except that differences in case are
2cc4b9cc
PE
1238ignored, and its arguments must be multibyte strings.
1239How uppercase and lowercase characters are related is
4547c1a4
UD
1240determined by the currently selected locale. In the standard @code{"C"}
1241locale the characters @"A and @"a do not match but in a locale which
dd7d45e8 1242regards these characters as parts of the alphabet they do match.
28f540f4 1243
85c165be 1244@noindent
28f540f4
RM
1245@code{strcasecmp} is derived from BSD.
1246@end deftypefun
1247
8ded91fb 1248@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1249@standards{GNU, wchar.h}
11087373
AO
1250@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1251@c Since towlower is not a macro, the locale object may be read multiple
1252@c times.
8a2f1f5b
UD
1253This function is like @code{wcscmp}, except that differences in case are
1254ignored. How uppercase and lowercase characters are related is
1255determined by the currently selected locale. In the standard @code{"C"}
1256locale the characters @"A and @"a do not match but in a locale which
1257regards these characters as parts of the alphabet they do match.
1258
1259@noindent
1260@code{wcscasecmp} is a GNU extension.
1261@end deftypefun
1262
8a2f1f5b 1263@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
d08a7e4c 1264@standards{ISO, string.h}
11087373 1265@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strcmp}, except that no more than
@var{size} bytes are compared. In other words, if the two
strings are the same in their first @var{size} bytes, the
return value is zero.
1270@end deftypefun
1271
8a2f1f5b 1272@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
d08a7e4c 1273@standards{ISO, wchar.h}
11087373 1274@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
f0f308c1 1275This function is similar to @code{wcscmp}, except that no more than
8a2f1f5b
UD
1276@var{size} wide characters are compared. In other words, if the two
1277strings are the same in their first @var{size} wide characters, the
1278return value is zero.
1279@end deftypefun
1280
28f540f4 1281@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
d08a7e4c 1282@standards{BSD, string.h}
11087373 1283@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
28f540f4 1284This function is like @code{strncmp}, except that differences in case
2cc4b9cc
PE
1285are ignored, and the compared parts of the arguments should consist of
1286valid multibyte characters.
1287Like @code{strcasecmp}, it is locale dependent how
dd7d45e8 1288uppercase and lowercase characters are related.
28f540f4 1289
85c165be 1290@noindent
@code{strncasecmp} is derived from BSD.
1292@end deftypefun
1293
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
d08a7e4c 1295@standards{GNU, wchar.h}
11087373 1296@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
8a2f1f5b
UD
1297This function is like @code{wcsncmp}, except that differences in case
1298are ignored. Like @code{wcscasecmp}, it is locale dependent how
1299uppercase and lowercase characters are related.
1300
1301@noindent
1302@code{wcsncasecmp} is a GNU extension.
28f540f4
RM
1303@end deftypefun
1304
8a2f1f5b
UD
1305Here are some examples showing the use of @code{strcmp} and
1306@code{strncmp} (equivalent examples can be constructed for the wide
1307character functions). These examples assume the use of the ASCII
1308character set. (If some other character set---say, EBCDIC---is used
1309instead, then the glyphs are associated with different numeric codes,
1310and the return values and ordering may differ.)
28f540f4
RM
1311
@smallexample
strcmp ("hello", "hello")
    @result{} 0    /* @r{These two strings are the same.} */
strcmp ("hello", "Hello")
    @result{} 32   /* @r{Comparisons are case-sensitive.} */
strcmp ("hello", "world")
    @result{} -15  /* @r{The byte @code{'h'} comes before @code{'w'}.} */
strcmp ("hello", "hello, world")
    @result{} -44  /* @r{Comparing a null byte against a comma.} */
strncmp ("hello", "hello, world", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
strncmp ("hello, world", "hello, stupid world!!!", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
@end smallexample
1326
1f205a47 1327@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1328@standards{GNU, string.h}
11087373
AO
1329@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1330@c Calls isdigit multiple times, locale may change in between.
1f205a47 1331The @code{strverscmp} function compares the string @var{s1} against
f2282d42
RM
1332@var{s2}, considering them as holding indices/version numbers. The
1333return value follows the same conventions as found in the
1334@code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no
1335digits, @code{strverscmp} behaves like @code{strcmp}
1336(in the sense that the sign of the result is the same).
1f205a47 1337
1338The comparison algorithm which the @code{strverscmp} function implements
1339differs slightly from other version-comparison algorithms. The
1340implementation is based on a finite-state machine, whose behavior is
1341approximated below.
1f205a47
UD
1342
1343@itemize @bullet
1344@item
f4a36548
FW
1345The input strings are each split into sequences of non-digits and
1346digits. These sequences can be empty at the beginning and end of the
1347string. Digits are determined by the @code{isdigit} function and are
1348thus subject to the current locale.
1f205a47
UD
1349
1350@item
f4a36548
FW
Comparison starts with a (possibly empty) non-digit sequence. The first
pair of corresponding sequences (non-digit or digit) that differ
determines the outcome of the comparison.
1f205a47
UD
1354
1355@item
f4a36548
FW
1356Corresponding non-digit sequences in both strings are compared
1357lexicographically if their lengths are equal. If the lengths differ,
1358the shorter non-digit sequence is extended with the input string
1359character immediately following it (which may be the null terminator),
1360the other sequence is truncated to be of the same (extended) length, and
1361these two sequences are compared lexicographically. In the last case,
1362the sequence comparison determines the result of the function because
1363the extension character (or some character before it) is necessarily
1364different from the character at the same offset in the other input
1365string.
1366
1367@item
1368For two sequences of digits, the number of leading zeros is counted (which
1369can be zero). If the count differs, the string with more leading zeros
1370in the digit sequence is considered smaller than the other string.
1371
1372@item
1373If the two sequences of digits have no leading zeros, they are compared
1374as integers, that is, the string with the longer digit sequence is
1375deemed larger, and if both sequences are of equal length, they are
1376compared lexicographically.
1377
1378@item
1379If both digit sequences start with a zero and have an equal number of
1380leading zeros, they are compared lexicographically if their lengths are
1381the same. If the lengths differ, the shorter sequence is extended with
1382the following character in its input string, and the other sequence is
1383truncated to the same length, and both sequences are compared
1384lexicographically (similar to the non-digit sequence case above).
1f205a47
UD
1385@end itemize
1386
f4a36548
FW
1387The treatment of leading zeros and the tie-breaking extension characters
1388(which in effect propagate across non-digit/digit sequence boundaries)
1389differs from other version-comparison algorithms.
1390
1f205a47
UD
1391@smallexample
1392strverscmp ("no digit", "no digit")
0bc93a2f 1393 @result{} 0 /* @r{same behavior as strcmp.} */
1f205a47
UD
1394strverscmp ("item#99", "item#100")
1395 @result{} <0 /* @r{same prefix, but 99 < 100.} */
1396strverscmp ("alpha1", "alpha001")
f4a36548 1397 @result{} >0 /* @r{different number of leading zeros (0 and 2).} */
1f205a47 1398strverscmp ("part1_f012", "part1_f01")
f4a36548 1399 @result{} >0 /* @r{lexicographical comparison with leading zeros.} */
1f205a47 1400strverscmp ("foo.009", "foo.0")
f4a36548 1401 @result{} <0 /* @r{different number of leading zeros (2 and 1).} */
1f205a47
UD
1402@end smallexample
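
Because the return value follows the @code{strcmp} conventions,
@code{strverscmp} can be used in a comparison function for
@code{qsort}. The following is a sketch only; the names
@code{compare_versions} and @code{sort_by_version} are hypothetical and
chosen for illustration.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension.  */
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: order strings by version number.  */
static int
compare_versions (const void *v1, const void *v2)
{
  char *const *p1 = v1;
  char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical helper: sort the COUNT strings in NAMES in place,
   in version order.  */
static void
sort_by_version (char **names, size_t count)
{
  qsort (names, count, sizeof (char *), compare_versions);
}
```

With this ordering, @code{"item#9"} sorts before @code{"item#99"},
which in turn sorts before @code{"item#100"}.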
1403
1f205a47
UD
1404@code{strverscmp} is a GNU extension.
1405@end deftypefun
1406
28f540f4 1407@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
d08a7e4c 1408@standards{BSD, string.h}
11087373 1409@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1410This is an obsolete alias for @code{memcmp}, derived from BSD.
1411@end deftypefun
1412
b4012b75 1413@node Collation Functions
28f540f4
RM
1414@section Collation Functions
1415
1416@cindex collating strings
1417@cindex string collation functions
1418
1419In some locales, the conventions for lexicographic ordering differ from
1420the strict numeric ordering of character codes. For example, in Spanish
1421most glyphs with diacritical marks such as accents are not considered
a5177499
BS
1422distinct letters for the purposes of collation. On the other hand, in
1423Czech the two-character sequence @samp{ch} is treated as a single letter
1424that is collated between @samp{h} and @samp{i}.
28f540f4
RM
1425
You can use the functions @code{strcoll} and @code{strxfrm} (declared in
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
collation ordering appropriate for the current locale. The locale used
by these functions can be specified by setting the locale
for the @code{LC_COLLATE} category; see @ref{Locales}.
28f540f4 1432@pindex string.h
8a2f1f5b 1433@pindex wchar.h
28f540f4
RM
1434
1435In the standard C locale, the collation sequence for @code{strcoll} is
8a2f1f5b
UD
1436the same as that for @code{strcmp}. Similarly, @code{wcscoll} and
1437@code{wcscmp} are the same in this situation.
28f540f4
RM
1438
1439Effectively, the way these functions work is by applying a mapping to
2cc4b9cc
PE
1440transform the characters in a multibyte string to a byte
1441sequence that represents
28f540f4
RM
1442the string's position in the collating sequence of the current locale.
1443Comparing two such byte sequences in a simple fashion is equivalent to
1444comparing the strings with the locale's collating sequence.
1445
8a2f1f5b
UD
1446The functions @code{strcoll} and @code{wcscoll} perform this translation
1447implicitly, in order to do one comparison. By contrast, @code{strxfrm}
1448and @code{wcsxfrm} perform the mapping explicitly. If you are making
1449multiple comparisons using the same string or set of strings, it is
1450likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to
1451transform all the strings just once, and subsequently compare the
1452transformed strings with @code{strcmp} or @code{wcscmp}.
28f540f4 1453
28f540f4 1454@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1455@standards{ISO, string.h}
11087373
AO
1456@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1457@c Calls strcoll_l with the current locale, which dereferences only the
1458@c LC_COLLATE data pointer.
28f540f4
RM
1459The @code{strcoll} function is similar to @code{strcmp} but uses the
1460collating sequence of the current locale for collation (the
2cc4b9cc 1461@code{LC_COLLATE} locale). The arguments are multibyte strings.
28f540f4
RM
1462@end deftypefun
1463
8a2f1f5b 1464@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1465@standards{ISO, wchar.h}
11087373
AO
1466@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1467@c Same as strcoll, but calling wcscoll_l.
8a2f1f5b
UD
1468The @code{wcscoll} function is similar to @code{wcscmp} but uses the
1469collating sequence of the current locale for collation (the
1470@code{LC_COLLATE} locale).
1471@end deftypefun
1472
28f540f4
RM
1473Here is an example of sorting an array of strings, using @code{strcoll}
1474to compare them. The actual sort algorithm is not written here; it
1475comes from @code{qsort} (@pxref{Array Sort Function}). The job of the
1476code shown here is to say how to compare the strings while sorting them.
1477(Later on in this section, we will show a way to do this more
1478efficiently using @code{strxfrm}.)
1479
1480@smallexample
1481/* @r{This is the comparison function used with @code{qsort}.} */
1482
1483int
e39745ff 1484compare_elements (const void *v1, const void *v2)
28f540f4 1485@{
e39745ff 1486 char * const *p1 = v1;
a9f5ce09 1487 char * const *p2 = v2;
e39745ff 1488
28f540f4
RM
1489 return strcoll (*p1, *p2);
1490@}
1491
1492/* @r{This is the entry point---the function to sort}
1493 @r{strings using the locale's collating sequence.} */
1494
1495void
1496sort_strings (char **array, int nstrings)
1497@{
  /* @r{Sort @code{array} by comparing the strings.} */
9fc19e48
UD
1499 qsort (array, nstrings,
1500 sizeof (char *), compare_elements);
28f540f4
RM
1501@}
1502@end smallexample
1503
1504@cindex converting string to collation order
8a2f1f5b 1505@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 1506@standards{ISO, string.h}
11087373 1507@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc
PE
1508The function @code{strxfrm} transforms the multibyte string
1509@var{from} using the
8a2f1f5b 1510collation transformation determined by the locale currently selected for
28f540f4 1511collation, and stores the transformed string in the array @var{to}. Up
2cc4b9cc 1512to @var{size} bytes (including a terminating null byte) are
28f540f4
RM
1513stored.
1514
1515The behavior is undefined if the strings @var{to} and @var{from}
0a13c9e9 1516overlap; see @ref{Copying Strings and Arrays}.
28f540f4
RM
1517
The return value is the length of the entire transformed string. This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, the transformed string did not
entirely fit in the array @var{to}. In this case, only as much of the
string as actually fits was stored. To get the whole transformed
string, call @code{strxfrm} again with a bigger output array.
28f540f4
RM
1524
1525The transformed string may be longer than the original string, and it
1526may also be shorter.
1527
2cc4b9cc
PE
1528If @var{size} is zero, no bytes are stored in @var{to}. In this
1529case, @code{strxfrm} simply returns the number of bytes that would
28f540f4 1530be the length of the transformed string. This is useful for determining
8a2f1f5b
UD
1531what size the allocated array should be. It does not matter what
1532@var{to} is if @var{size} is zero; @var{to} may even be a null pointer.
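
The size-zero call lends itself to a two-pass idiom: ask for the length
first, then allocate exactly that much. The following is a minimal
sketch; @code{xfrm_dup} is a hypothetical helper name, and it uses
plain @code{malloc} so that the caller handles allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the
   transformed form of FROM, or NULL if allocation fails.  The caller
   frees the result.  */
static char *
xfrm_dup (const char *from)
{
  /* With a size of zero nothing is stored, so TO may be null.  The
     return value excludes the terminating null byte.  */
  size_t needed = strxfrm (NULL, from, 0);
  char *to = malloc (needed + 1);

  if (to != NULL)
    strxfrm (to, from, needed + 1);
  return to;
}
```

The sketch assumes that the @code{LC_COLLATE} locale does not change
between the two calls; if it could, the second return value should be
checked as well.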
1533@end deftypefun
1534
8a2f1f5b 1535@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
d08a7e4c 1536@standards{ISO, wchar.h}
11087373 1537@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 1538The function @code{wcsxfrm} transforms wide string @var{wfrom}
8a2f1f5b
UD
1539using the collation transformation determined by the locale currently
1540selected for collation, and stores the transformed string in the array
1541@var{wto}. Up to @var{size} wide characters (including a terminating null
2cc4b9cc 1542wide character) are stored.
8a2f1f5b
UD
1543
1544The behavior is undefined if the strings @var{wto} and @var{wfrom}
0a13c9e9 1545overlap; see @ref{Copying Strings and Arrays}.
8a2f1f5b 1546
The return value is the length of the entire transformed wide
string. This value is not affected by the value of @var{size}, but if
it is greater than or equal to @var{size}, the transformed
wide string did not entirely fit in the array @var{wto}. In
this case, only as much of the wide string as actually fits
was stored. To get the whole transformed wide string, call
@code{wcsxfrm} again with a bigger output array.
1554
2cc4b9cc
PE
1555The transformed wide string may be longer than the original
1556wide string, and it may also be shorter.
8a2f1f5b 1557
If @var{size} is zero, no wide characters are stored in @var{wto}. In this
case, @code{wcsxfrm} simply returns the number of wide characters that
would be the length of the transformed wide string. This is
useful for determining what size the allocated array should be (remember
to multiply by @code{sizeof (wchar_t)}). It does not matter what
@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer.
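
As a sketch of the sizing step for wide strings, the hypothetical
helper @code{wxfrm_dup} queries the length with a size of zero and then
converts the wide-character count into a byte count for @code{malloc}:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated copy of the
   transformed form of WFROM, or NULL if allocation fails.  The caller
   frees the result.  */
static wchar_t *
wxfrm_dup (const wchar_t *wfrom)
{
  /* The length is counted in wide characters, not bytes, and
     excludes the terminating null wide character.  */
  size_t needed = wcsxfrm (NULL, wfrom, 0);
  wchar_t *wto = malloc ((needed + 1) * sizeof (wchar_t));

  if (wto != NULL)
    wcsxfrm (wto, wfrom, needed + 1);
  return wto;
}
```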
28f540f4
RM
1564@end deftypefun
1565
1566Here is an example of how you can use @code{strxfrm} when
1567you plan to do many comparisons. It does the same thing as the previous
1568example, but much faster, because it has to transform each string only
1569once, no matter how many times it is compared with other strings. Even
1570the time needed to allocate and free storage is much less than the time
1571we save, when there are many strings.
1572
1573@smallexample
1574struct sorter @{ char *input; char *transformed; @};
1575
1576/* @r{This is the comparison function used with @code{qsort}}
1577 @r{to sort an array of @code{struct sorter}.} */
1578
1579int
e39745ff 1580compare_elements (const void *v1, const void *v2)
28f540f4 1581@{
e39745ff
AJ
1582 const struct sorter *p1 = v1;
1583 const struct sorter *p2 = v2;
1584
28f540f4
RM
1585 return strcmp (p1->transformed, p2->transformed);
1586@}
1587
1588/* @r{This is the entry point---the function to sort}
1589 @r{strings using the locale's collating sequence.} */
1590
1591void
1592sort_strings_fast (char **array, int nstrings)
1593@{
1594 struct sorter temp_array[nstrings];
1595 int i;
1596
1597 /* @r{Set up @code{temp_array}. Each element contains}
1598 @r{one input string and its transformed string.} */
1599 for (i = 0; i < nstrings; i++)
1600 @{
1601 size_t length = strlen (array[i]) * 2;
a5113b14 1602 char *transformed;
f2ea0f5b 1603 size_t transformed_length;
28f540f4
RM
1604
1605 temp_array[i].input = array[i];
1606
a5113b14
UD
1607 /* @r{First try a buffer perhaps big enough.} */
1608 transformed = (char *) xmalloc (length);
1609
1610 /* @r{Transform @code{array[i]}.} */
1611 transformed_length = strxfrm (transformed, array[i], length);
1612
1613 /* @r{If the buffer was not large enough, resize it}
1614 @r{and try again.} */
1615 if (transformed_length >= length)
28f540f4 1616 @{
a5113b14 1617 /* @r{Allocate the needed space. +1 for terminating}
2cc4b9cc 1618 @r{@code{'\0'} byte.} */
bdc674d9
PE
1619 transformed = xrealloc (transformed,
1620 transformed_length + 1);
a5113b14
UD
1621
1622 /* @r{The return value is not interesting because we know}
1623 @r{how long the transformed string is.} */
dd7d45e8
UD
1624 (void) strxfrm (transformed, array[i],
1625 transformed_length + 1);
28f540f4 1626 @}
a5113b14
UD
1627
1628 temp_array[i].transformed = transformed;
28f540f4
RM
1629 @}
1630
1631 /* @r{Sort @code{temp_array} by comparing transformed strings.} */
89e691f2
AM
1632 qsort (temp_array, nstrings,
1633 sizeof (struct sorter), compare_elements);
28f540f4
RM
1634
1635 /* @r{Put the elements back in the permanent array}
1636 @r{in their sorted order.} */
1637 for (i = 0; i < nstrings; i++)
1638 array[i] = temp_array[i].input;
1639
1640 /* @r{Free the strings we allocated.} */
1641 for (i = 0; i < nstrings; i++)
1642 free (temp_array[i].transformed);
1643@}
1644@end smallexample
1645
8a2f1f5b
UD
1646The interesting part of this code for the wide character version would
1647look like this:
1648
1649@smallexample
1650void
1651sort_strings_fast (wchar_t **array, int nstrings)
1652@{
1653 @dots{}
1654 /* @r{Transform @code{array[i]}.} */
1655 transformed_length = wcsxfrm (transformed, array[i], length);
1656
1657 /* @r{If the buffer was not large enough, resize it}
1658 @r{and try again.} */
1659 if (transformed_length >= length)
1660 @{
1661 /* @r{Allocate the needed space. +1 for terminating}
2cc4b9cc 1662 @r{@code{L'\0'} wide character.} */
bdc674d9
PE
1663 transformed = xreallocarray (transformed,
1664 transformed_length + 1,
1665 sizeof *transformed);
8a2f1f5b
UD
1666
1667 /* @r{The return value is not interesting because we know}
1668 @r{how long the transformed string is.} */
1669 (void) wcsxfrm (transformed, array[i],
1670 transformed_length + 1);
1671 @}
1672 @dots{}
1673@end smallexample
1674
1675@noindent
Note that the element size, @code{sizeof *transformed}, is passed to
@code{xreallocarray} because the length is counted in wide characters
rather than bytes.
1678
1679@strong{Compatibility Note:} The string collation functions are a new
976780fd 1680feature of @w{ISO C90}. Older C dialects have no equivalent feature.
8a2f1f5b
UD
1681The wide character versions were introduced in @w{Amendment 1} to @w{ISO
1682C90}.
28f540f4 1683
b4012b75 1684@node Search Functions
28f540f4
RM
1685@section Search Functions
1686
1687This section describes library functions which perform various kinds
1688of searching operations on strings and arrays. These functions are
1689declared in the header file @file{string.h}.
1690@pindex string.h
1691@cindex search functions (for strings)
1692@cindex string search functions
1693
28f540f4 1694@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 1695@standards{ISO, string.h}
11087373 1696@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1697This function finds the first occurrence of the byte @var{c} (converted
1698to an @code{unsigned char}) in the initial @var{size} bytes of the
1699object beginning at @var{block}. The return value is a pointer to the
1700located byte, or a null pointer if no match was found.
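
Unlike @code{strchr}, @code{memchr} does not stop at a null byte, so it
is suitable for buffers that contain embedded null bytes. A minimal
sketch follows; the helper name @code{byte_offset} is hypothetical, and
it converts the returned pointer into an offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first byte equal to
   C in the SIZE bytes at BLOCK, or SIZE if there is none.  */
static size_t
byte_offset (const void *block, int c, size_t size)
{
  const unsigned char *start = block;
  const unsigned char *hit = memchr (block, c, size);

  return hit != NULL ? (size_t) (hit - start) : size;
}
```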
1701@end deftypefun
1702
8a2f1f5b 1703@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
d08a7e4c 1704@standards{ISO, wchar.h}
11087373 1705@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1706This function finds the first occurrence of the wide character @var{wc}
1707in the initial @var{size} wide characters of the object beginning at
1708@var{block}. The return value is a pointer to the located wide
1709character, or a null pointer if no match was found.
1710@end deftypefun
1711
87b56f36 1712@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c})
d08a7e4c 1713@standards{GNU, string.h}
11087373 1714@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
87b56f36
UD
1715Often the @code{memchr} function is used with the knowledge that the
1716byte @var{c} is available in the memory block specified by the
1717parameters. But this means that the @var{size} parameter is not really
1718needed and that the tests performed with it at runtime (to check whether
1719the end of the block is reached) are not needed.
1720
The @code{rawmemchr} function exists for just this situation, which is
surprisingly frequent. The interface is similar to @code{memchr} except
that the @var{size} parameter is missing. The function will look beyond
the end of the block pointed to by @var{block} if the programmer
wrongly assumed that the byte @var{c} is present in the block.
In this case the result is unspecified. Otherwise the return value is a
pointer to the located byte.
1728
This function is of special interest when looking for the end of a
string. Since all strings are terminated by a null byte, a call like
1731
1732@smallexample
1733 rawmemchr (str, '\0')
1734@end smallexample
1735
8a2f1f5b 1736@noindent
87b56f36
UD
1737will never go beyond the end of the string.
1738
1739This function is a GNU extension.
1740@end deftypefun
1741
ca747856 1742@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 1743@standards{GNU, string.h}
11087373 1744@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
ca747856
RM
1745The function @code{memrchr} is like @code{memchr}, except that it searches
1746backwards from the end of the block defined by @var{block} and @var{size}
1747(instead of forwards from the front).
4efcb713
UD
1748
1749This function is a GNU extension.
a2d63612 1750@end deftypefun
ca747856 1751
28f540f4 1752@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
d08a7e4c 1753@standards{ISO, string.h}
11087373 1754@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc
PE
1755The @code{strchr} function finds the first occurrence of the byte
1756@var{c} (converted to a @code{char}) in the string
28f540f4 1757beginning at @var{string}. The return value is a pointer to the located
2cc4b9cc 1758byte, or a null pointer if no match was found.
28f540f4
RM
1759
1760For example,
1761@smallexample
1762strchr ("hello, world", 'l')
1763 @result{} "llo, world"
1764strchr ("hello, world", '?')
1765 @result{} NULL
a5113b14 1766@end smallexample
28f540f4 1767
The terminating null byte is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying zero as the value of the @var{c} argument.
0520adde
FB
1771
1772When @code{strchr} returns a null pointer, it does not let you know
2cc4b9cc 1773the position of the terminating null byte it has found. If you
0520adde
FB
1774need that information, it is better (but less portable) to use
1775@code{strchrnul} than to search for it a second time.
8a2f1f5b
UD
1776@end deftypefun
1777
f801cf7b 1778@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1779@standards{ISO, wchar.h}
11087373 1780@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1781The @code{wcschr} function finds the first occurrence of the wide
2cc4b9cc 1782character @var{wc} in the wide string
8a2f1f5b
UD
1783beginning at @var{wstring}. The return value is a pointer to the
1784located wide character, or a null pointer if no match was found.
1785
2cc4b9cc
PE
The terminating null wide character is considered to be part of the wide
string, so you can use this function to get a pointer to the end
of a wide string by specifying a null wide character as the
value of the @var{wc} argument. It would be better (but less portable)
to use @code{wcschrnul} in this case, though.
28f540f4
RM
1791@end deftypefun
1792
0e4ee106 1793@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c})
d08a7e4c 1794@standards{GNU, string.h}
11087373 1795@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{strchrnul} is the same as @code{strchr} except that if it does
not find the byte, it returns a pointer to the string's terminating
null byte rather than a null pointer.
8a2f1f5b
UD
1799
1800This function is a GNU extension.
1801@end deftypefun
1802
8a2f1f5b 1803@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1804@standards{GNU, wchar.h}
11087373 1805@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1806@code{wcschrnul} is the same as @code{wcschr} except that if it does not
2cc4b9cc 1807find the wide character, it returns a pointer to the wide string's
8a2f1f5b
UD
1808terminating null wide character rather than a null pointer.
1809
1810This function is a GNU extension.
28f540f4
RM
1811@end deftypefun
1812
One useful, but unusual, use of the @code{strchr}
function is to obtain a pointer to the null byte
terminating a string. This is often written in this way:
1816
1817@smallexample
1818 s += strlen (s);
1819@end smallexample
1820
1821@noindent
This is almost optimal, but the addition duplicates a bit of
the work already done in the @code{strlen} function. A better solution
is this:
1825
1826@smallexample
1827 s = strchr (s, '\0');
1828@end smallexample
1829
There is no restriction on the second parameter of @code{strchr}, so it
can very well be zero. Readers thinking very
hard about this might now point out that the @code{strchr} function is
more expensive than the @code{strlen} function since it has two
termination criteria. This is right. But in @theglibc{} the
implementation of @code{strchr} is optimized in a special way so that
@code{strchr} actually is faster.
ee2752ea 1837
28f540f4 1838@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
d08a7e4c 1839@standards{ISO, string.h}
11087373 1840@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1841The function @code{strrchr} is like @code{strchr}, except that it searches
1842backwards from the end of the string @var{string} (instead of forwards
1843from the front).
1844
1845For example,
1846@smallexample
1847strrchr ("hello, world", 'l')
1848 @result{} "ld"
1849@end smallexample
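
A common use is to pick out the last component of a file name. The
following is a minimal sketch; @code{last_component} is a hypothetical
helper that treats @samp{/} as the only separator and does no special
handling of trailing slashes.

```c
#include <string.h>

/* Hypothetical helper: return the part of PATH after the last '/',
   or PATH itself if it contains no '/'.  */
static const char *
last_component (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```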
1850@end deftypefun
1851
4315f45c 1852@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1853@standards{ISO, wchar.h}
11087373 1854@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1855The function @code{wcsrchr} is like @code{wcschr}, except that it searches
1856backwards from the end of the string @var{wstring} (instead of forwards
1857from the front).
1858@end deftypefun
1859
28f540f4 1860@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
d08a7e4c 1861@standards{ISO, string.h}
11087373 1862@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1863This is like @code{strchr}, except that it searches @var{haystack} for a
2cc4b9cc 1864substring @var{needle} rather than just a single byte. It
28f540f4 1865returns a pointer into the string @var{haystack} that is the first
2cc4b9cc 1866byte of the substring, or a null pointer if no match was found. If
28f540f4
RM
1867@var{needle} is an empty string, the function returns @var{haystack}.
1868
1869For example,
1870@smallexample
1871strstr ("hello, world", "l")
1872 @result{} "llo, world"
1873strstr ("hello, world", "wo")
1874 @result{} "world"
1875@end smallexample
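
Since the result points into @var{haystack}, the search can be resumed
from just past a match. As a sketch, the hypothetical helper
@code{count_occurrences} counts non-overlapping matches this way:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of NEEDLE
   in HAYSTACK.  An empty NEEDLE is counted as zero occurrences,
   because strstr would otherwise match at every position.  */
static size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;

  if (needle_len == 0)
    return 0;

  for (const char *p = strstr (haystack, needle); p != NULL;
       p = strstr (p + needle_len, needle))
    count++;

  return count;
}
```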
1876@end deftypefun
1877
8a2f1f5b 1878@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
d08a7e4c 1879@standards{ISO, wchar.h}
11087373 1880@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1881This is like @code{wcschr}, except that it searches @var{haystack} for a
1882substring @var{needle} rather than just a single wide character. It
1883returns a pointer into the string @var{haystack} that is the first wide
1884character of the substring, or a null pointer if no match was found. If
1885@var{needle} is an empty string, the function returns @var{haystack}.
1886@end deftypefun
1887
8a2f1f5b 1888@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
d08a7e4c 1889@standards{XPG, wchar.h}
11087373 1890@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
9dcc8f11 1891@code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the
8a2f1f5b
UD
1892name originally used in the X/Open Portability Guide before the
1893@w{Amendment 1} to @w{ISO C90} was published.
1894@end deftypefun
1895
28f540f4 1896
0e4ee106 1897@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
d08a7e4c 1898@standards{GNU, string.h}
11087373
AO
1899@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1900@c There may be multiple calls of strncasecmp, each accessing the locale
1901@c object independently.
0e4ee106
UD
1902This is like @code{strstr}, except that it ignores case in searching for
1903the substring. Like @code{strcasecmp}, it is locale dependent how
2cc4b9cc
PE
1904uppercase and lowercase characters are related, and arguments are
1905multibyte strings.
0e4ee106
UD
1906
1907
1908For example,
1909@smallexample
d6868416 1910strcasestr ("hello, world", "L")
0e4ee106 1911 @result{} "llo, world"
d6868416 1912strcasestr ("hello, World", "wo")
0e4ee106
UD
1913 @result{} "World"
1914@end smallexample
1915@end deftypefun
1916
1917
63551311 1918@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
d08a7e4c 1919@standards{GNU, string.h}
11087373 1920@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1921This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
2cc4b9cc 1922arrays rather than strings. @var{needle-len} is the
28f540f4
RM
1923length of @var{needle} and @var{haystack-len} is the length of
1924@var{haystack}.@refill
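
Because both lengths are given explicitly, @code{memmem} can search
data that contains null bytes. The following is a minimal sketch;
@code{find_bytes} is a hypothetical helper that reports the match as an
offset.

```c
#define _GNU_SOURCE   /* memmem is a GNU extension.  */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
   NEEDLE in HAYSTACK, or HAYSTACK_LEN if there is none.  */
static size_t
find_bytes (const void *haystack, size_t haystack_len,
            const void *needle, size_t needle_len)
{
  const unsigned char *start = haystack;
  const unsigned char *hit = memmem (haystack, haystack_len,
                                     needle, needle_len);

  return hit != NULL ? (size_t) (hit - start) : haystack_len;
}
```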
1925
1926This function is a GNU extension.
1927@end deftypefun
1928
28f540f4 1929@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
d08a7e4c 1930@standards{ISO, string.h}
11087373 1931@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1932The @code{strspn} (``string span'') function returns the length of the
2cc4b9cc 1933initial substring of @var{string} that consists entirely of bytes that
28f540f4 1934are members of the set specified by the string @var{skipset}. The order
2cc4b9cc 1935of the bytes in @var{skipset} is not important.
28f540f4
RM
1936
1937For example,
1938@smallexample
1939strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
1940 @result{} 5
1941@end smallexample
8a2f1f5b 1942
2cc4b9cc
PE
1943In a multibyte string, characters consisting of
1944more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
1945separately. The function is not locale-dependent.
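
One use of @code{strspn} is to check that a string is made up only of
bytes from a given set. As a sketch, the hypothetical helper
@code{all_digits} accepts only nonempty strings of ASCII digits:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if STRING is nonempty and consists only
   of the digits listed in the skipset.  */
static bool
all_digits (const char *string)
{
  size_t span = strspn (string, "0123456789");

  /* The span covers the whole string only if it ends at the
     terminating null byte.  */
  return span > 0 && string[span] == '\0';
}
```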
1946@end deftypefun
1947
8a2f1f5b 1948@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
d08a7e4c 1949@standards{ISO, wchar.h}
11087373 1950@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1951The @code{wcsspn} (``wide character string span'') function returns the
1952length of the initial substring of @var{wstring} that consists entirely
1953of wide characters that are members of the set specified by the string
1954@var{skipset}. The order of the wide characters in @var{skipset} is not
1955important.
28f540f4
RM
1956@end deftypefun
1957
28f540f4 1958@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
d08a7e4c 1959@standards{ISO, string.h}
11087373 1960@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1961The @code{strcspn} (``string complement span'') function returns the length
2cc4b9cc 1962of the initial substring of @var{string} that consists entirely of bytes
28f540f4 1963that are @emph{not} members of the set specified by the string @var{stopset}.
2cc4b9cc 1964(In other words, it returns the offset of the first byte in @var{string}
28f540f4
RM
1965that is a member of the set @var{stopset}.)
1966
1967For example,
1968@smallexample
1969strcspn ("hello, world", " \t\n,.;!?")
1970 @result{} 5
1971@end smallexample
8a2f1f5b 1972
2cc4b9cc
PE
In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
1975separately. The function is not locale-dependent.
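
When no byte from @var{stopset} occurs, the result is the length of the
whole string, that is, the offset of the terminating null byte. That
makes it safe to use the result as an index for truncation. A minimal
sketch, with the hypothetical helper name @code{chomp}:

```c
#include <string.h>

/* Hypothetical helper: cut LINE at its first carriage return or
   newline.  If there is none, the null byte is simply rewritten.  */
static void
chomp (char *line)
{
  line[strcspn (line, "\r\n")] = '\0';
}
```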
1976@end deftypefun
1977
8a2f1f5b 1978@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 1979@standards{ISO, wchar.h}
11087373 1980@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
The @code{wcscspn} (``wide character string complement span'') function
returns the length of the initial substring of @var{wstring} that
consists entirely of wide characters that are @emph{not} members of the
set specified by the wide string @var{stopset}. (In other words, it returns
the offset of the first wide character in @var{wstring} that is a member of
the set @var{stopset}.)
28f540f4
RM
1987@end deftypefun
1988
28f540f4 1989@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
d08a7e4c 1990@standards{ISO, string.h}
11087373 1991@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1992The @code{strpbrk} (``string pointer break'') function is related to
2cc4b9cc 1993@code{strcspn}, except that it returns a pointer to the first byte
28f540f4
RM
1994in @var{string} that is a member of the set @var{stopset} instead of the
1995length of the initial substring. It returns a null pointer if no such
2cc4b9cc 1996byte from @var{stopset} is found.
28f540f4
RM
1997
1998@c @group Invalid outside the example.
1999For example,
2000
2001@smallexample
2002strpbrk ("hello, world", " \t\n,.;!?")
2003 @result{} ", world"
2004@end smallexample
2005@c @end group
8a2f1f5b 2006
2cc4b9cc
PE
2007In a multibyte string, characters consisting of
2008more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2009separately. The function is not locale-dependent.
2010@end deftypefun
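
Since @code{strpbrk} returns a pointer into the string, it can be called
repeatedly to visit every byte that belongs to the stop set. The sketch
below counts such bytes; the function name @code{count_stop_bytes} is
hypothetical and only illustrates the calling pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the bytes of STRING that are members
   of STOPSET.  Each match is a non-null byte, so P + 1 is always a
   valid position at which to resume the search.  */
size_t
count_stop_bytes (const char *string, const char *stopset)
{
  size_t count = 0;
  const char *p = strpbrk (string, stopset);

  while (p != NULL)
    {
      count++;
      p = strpbrk (p + 1, stopset);
    }
  return count;
}
```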
2011
8a2f1f5b 2012@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 2013@standards{ISO, wchar.h}
11087373 2014@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2015The @code{wcspbrk} (``wide character string pointer break'') function is
2016related to @code{wcscspn}, except that it returns a pointer to the first
2017wide character in @var{wstring} that is a member of the set
2018@var{stopset} instead of the length of the initial substring. It
2cc4b9cc 2019returns a null pointer if no such wide character from @var{stopset} is found.
28f540f4
RM
2020@end deftypefun
2021
0e4ee106
UD
2022
2023@subsection Compatibility String Search Functions
2024
0e4ee106 2025@deftypefun {char *} index (const char *@var{string}, int @var{c})
d08a7e4c 2026@standards{BSD, string.h}
11087373 2027@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{index} is another name for @code{strchr}; they are exactly the same.
New code should always use @code{strchr} since this name is defined in
@w{ISO C} while @code{index} is a BSD invention that was never available
on @w{System V} derived systems.
2032@end deftypefun
2033
0e4ee106 2034@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
d08a7e4c 2035@standards{BSD, string.h}
11087373 2036@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{rindex} is another name for @code{strrchr}; they are exactly the same.
New code should always use @code{strrchr} since this name is defined in
@w{ISO C} while @code{rindex} is a BSD invention that was never available
on @w{System V} derived systems.
2041@end deftypefun
2042
b4012b75 2043@node Finding Tokens in a String
28f540f4
RM
2044@section Finding Tokens in a String
2045
@cindex tokenizing strings
@cindex breaking a string into tokens
@cindex parsing tokens from a string
Programs commonly need to do simple kinds of lexical analysis and
parsing, such as splitting a command string up into tokens. You can do
this with the @code{strtok} function, declared in the header file
@file{string.h}.
2053@pindex string.h
2054
8a2f1f5b 2055@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
d08a7e4c 2056@standards{ISO, string.h}
11087373 2057@safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}}
28f540f4
RM
2058A string can be split into tokens by making a series of calls to the
2059function @code{strtok}.
2060
2061The string to be split up is passed as the @var{newstring} argument on
2062the first call only. The @code{strtok} function uses this to set up
2063some internal state information. Subsequent calls to get additional
2064tokens from the same string are indicated by passing a null pointer as
2065the @var{newstring} argument. Calling @code{strtok} with another
2066non-null @var{newstring} argument reinitializes the state information.
2067It is guaranteed that no other library function ever calls @code{strtok}
2068behind your back (which would mess up this internal state information).
2069
2070The @var{delimiters} argument is a string that specifies a set of delimiters
2cc4b9cc
PE
2071that may surround the token being extracted. All the initial bytes
2072that are members of this set are discarded. The first byte that is
28f540f4
RM
2073@emph{not} a member of this set of delimiters marks the beginning of the
2074next token. The end of the token is found by looking for the next
2cc4b9cc
PE
2075byte that is a member of the delimiter set. This byte in the
2076original string @var{newstring} is overwritten by a null byte, and the
28f540f4
RM
2077pointer to the beginning of the token in @var{newstring} is returned.
2078
On the next call to @code{strtok}, the search begins at the next
byte beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter bytes, @code{strtok} returns
a null pointer.
8a2f1f5b 2087
2cc4b9cc
PE
2088In a multibyte string, characters consisting of
2089more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2090separately. The function is not locale-dependent.
2091@end deftypefun
2092
1acd4371 2093@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr})
d08a7e4c 2094@standards{ISO, wchar.h}
11087373 2095@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2096A string can be split into tokens by making a series of calls to the
2097function @code{wcstok}.
2098
2099The string to be split up is passed as the @var{newstring} argument on
2100the first call only. The @code{wcstok} function uses this to set up
2101some internal state information. Subsequent calls to get additional
2cc4b9cc 2102tokens from the same wide string are indicated by passing a
1acd4371
AO
2103null pointer as the @var{newstring} argument, which causes the pointer
2104previously stored in @var{save_ptr} to be used instead.
8a2f1f5b 2105
2cc4b9cc 2106The @var{delimiters} argument is a wide string that specifies
8a2f1f5b
UD
2107a set of delimiters that may surround the token being extracted. All
2108the initial wide characters that are members of this set are discarded.
2109The first wide character that is @emph{not} a member of this set of
2110delimiters marks the beginning of the next token. The end of the token
2111is found by looking for the next wide character that is a member of the
2cc4b9cc 2112delimiter set. This wide character in the original wide
1acd4371
AO
2113string @var{newstring} is overwritten by a null wide character, the
2114pointer past the overwritten wide character is saved in @var{save_ptr},
2115and the pointer to the beginning of the token in @var{newstring} is
2116returned.
8a2f1f5b
UD
2117
On the next call to @code{wcstok}, the search begins at the next
wide character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.

If the end of the wide string @var{newstring} is reached, or
if the remainder of the string consists only of delimiter wide characters,
@code{wcstok} returns a null pointer.
28f540f4
RM
2126@end deftypefun
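
The calling sequence for @code{wcstok} can be sketched as a loop that
passes the wide string on the first call and a null pointer afterwards,
with the same @var{save_ptr} object each time. The helper name
@code{count_wide_tokens} is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the tokens in BUF, which is modified
   in place.  STATE plays the role of the save_ptr argument and must
   be the same object for every call in the series.  */
size_t
count_wide_tokens (wchar_t *buf, const wchar_t *delimiters)
{
  wchar_t *state;
  size_t count = 0;

  for (wchar_t *token = wcstok (buf, delimiters, &state);
       token != NULL;
       token = wcstok (NULL, delimiters, &state))
    count++;

  return count;
}
```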
2127
@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings
and Arrays}). If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.
2136
2137The string that you are operating on might even be a constant. Then
8a2f1f5b
UD
2138when @code{strtok} or @code{wcstok} tries to modify it, your program
2139will get a fatal signal for writing in read-only memory. @xref{Program
2140Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
2141would not require a modification of the string (e.g., if there is
1f77f049 2142exactly one token) the string can (and in the @glibcadj{} case will) be
8a2f1f5b 2143modified.
28f540f4
RM
2144
2145This is a special case of a general principle: if a part of a program
2146does not have as its purpose the modification of a certain data
2147structure, then it is error-prone to modify the data structure
2148temporarily.
2149
1acd4371 2150The function @code{strtok} is not reentrant, whereas @code{wcstok} is.
8a2f1f5b
UD
2151@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
2152important.
28f540f4
RM
2153
2154Here is a simple example showing the use of @code{strtok}.
2155
2156@comment Yes, this example has been tested.
2157@smallexample
2158#include <string.h>
2159#include <stddef.h>
2160
2161@dots{}
2162
5649a1d6 2163const char string[] = "words separated by spaces -- and, punctuation!";
28f540f4 2164const char delimiters[] = " .,;:!-";
5649a1d6 2165char *token, *cp;
28f540f4
RM
2166
2167@dots{}
2168
5649a1d6
UD
2169cp = strdupa (string); /* Make writable copy. */
2170token = strtok (cp, delimiters); /* token => "words" */
28f540f4
RM
2171token = strtok (NULL, delimiters); /* token => "separated" */
2172token = strtok (NULL, delimiters); /* token => "by" */
2173token = strtok (NULL, delimiters); /* token => "spaces" */
2174token = strtok (NULL, delimiters); /* token => "and" */
2175token = strtok (NULL, delimiters); /* token => "punctuation" */
2176token = strtok (NULL, delimiters); /* token => NULL */
2177@end smallexample
a5113b14 2178
@Theglibc{} contains two more functions for tokenizing a string
which overcome the limitation of non-reentrancy. They are not
available for wide strings.
a5113b14 2182
a5113b14 2183@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
d08a7e4c 2184@standards{POSIX, string.h}
11087373 2185@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
dd7d45e8
UD
2186Just like @code{strtok}, this function splits the string into several
2187tokens which can be accessed by successive calls to @code{strtok_r}.
1acd4371
AO
2188The difference is that, as in @code{wcstok}, the information about the
2189next token is stored in the space pointed to by the third argument,
2190@var{save_ptr}, which is a pointer to a string pointer. Calling
2191@code{strtok_r} with a null pointer for @var{newstring} and leaving
2192@var{save_ptr} between the calls unchanged does the job without
2193hindering reentrancy.
a5113b14 2194
976780fd 2195This function is defined in POSIX.1 and can be found on many systems
a5113b14
UD
2196which support multi-threading.
2197@end deftypefun
2198
a5113b14 2199@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
d08a7e4c 2200@standards{BSD, string.h}
11087373 2201@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function has functionality similar to @code{strtok_r}, with the
@var{newstring} argument replaced by the @var{save_ptr} argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token.

One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one byte from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of bytes
from @var{delimiter}. This means that a program normally should test
for @code{strsep} returning an empty string before processing it.
9afc8a59 2215
a5113b14
UD
2216This function was introduced in 4.3BSD and therefore is widely available.
2217@end deftypefun
2218
Here is how the above example looks when @code{strsep} is used.
2220
2221@comment Yes, this example has been tested.
2222@smallexample
2223#include <string.h>
2224#include <stddef.h>
2225
2226@dots{}
2227
5649a1d6 2228const char string[] = "words separated by spaces -- and, punctuation!";
a5113b14
UD
2229const char delimiters[] = " .,;:!-";
2230char *running;
2231char *token;
2232
2233@dots{}
2234
5649a1d6 2235running = strdupa (string);
a5113b14
UD
2236token = strsep (&running, delimiters); /* token => "words" */
2237token = strsep (&running, delimiters); /* token => "separated" */
2238token = strsep (&running, delimiters); /* token => "by" */
2239token = strsep (&running, delimiters); /* token => "spaces" */
9afc8a59
UD
2240token = strsep (&running, delimiters); /* token => "" */
2241token = strsep (&running, delimiters); /* token => "" */
2242token = strsep (&running, delimiters); /* token => "" */
a5113b14 2243token = strsep (&running, delimiters); /* token => "and" */
9afc8a59 2244token = strsep (&running, delimiters); /* token => "" */
a5113b14 2245token = strsep (&running, delimiters); /* token => "punctuation" */
9afc8a59 2246token = strsep (&running, delimiters); /* token => "" */
a5113b14
UD
2247token = strsep (&running, delimiters); /* token => NULL */
2248@end smallexample
b4012b75 2249
ec28fc7c 2250@deftypefun {char *} basename (const char *@var{filename})
d08a7e4c 2251@standards{GNU, string.h}
11087373 2252@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
ec28fc7c 2253The GNU version of the @code{basename} function returns the last
9442cd75 2254component of the path in @var{filename}. This function is the preferred
ec28fc7c
UD
2255usage, since it does not modify the argument, @var{filename}, and
2256respects trailing slashes. The prototype for @code{basename} can be
ef48b196 2257found in @file{string.h}. Note, this function is overridden by the XPG
ec28fc7c
UD
2258version, if @file{libgen.h} is included.
2259
2260Example of using GNU @code{basename}:
2261
@smallexample
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
@end smallexample
2279
2280@strong{Portability Note:} This function may produce different results
2281on different systems.
2282
2283@end deftypefun
2284
af85ebcd 2285@deftypefun {char *} basename (char *@var{path})
d08a7e4c 2286@standards{XPG, libgen.h}
11087373 2287@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
cf822e3c 2288This is the standard XPG defined @code{basename}. It is similar in
ec28fc7c 2289spirit to the GNU version, but may modify the @var{path} by removing
2cc4b9cc
PE
2290trailing '/' bytes. If the @var{path} is made up entirely of '/'
2291bytes, then "/" will be returned. Also, if @var{path} is
ec28fc7c 2292@code{NULL} or an empty string, then "." is returned. The prototype for
e4a5f77d 2293the XPG version can be found in @file{libgen.h}.
ec28fc7c
UD
2294
2295Example of using XPG @code{basename}:
2296
@smallexample
#define _GNU_SOURCE
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  char *path = strdupa (argv[0]);   /* @r{Writable copy for} basename. */

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
@end smallexample
2318@end deftypefun
2319
ec28fc7c 2320@deftypefun {char *} dirname (char *@var{path})
d08a7e4c 2321@standards{XPG, libgen.h}
11087373 2322@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{dirname} function is the complement to the XPG version of
@code{basename}. It returns the parent directory of the file specified
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
contains no '/' bytes, then "." is returned. The prototype for this
function can be found in @file{libgen.h}.
2328@end deftypefun
0e4ee106 2329
ea1bd74d
ZW
2330@node Erasing Sensitive Data
2331@section Erasing Sensitive Data
2332
2333Sensitive data, such as cryptographic keys, should be erased from
2334memory after use, to reduce the risk that a bug will expose it to the
2335outside world. However, compiler optimizations may determine that an
2336erasure operation is ``unnecessary,'' and remove it from the generated
2337code, because no @emph{correct} program could access the variable or
2338heap object containing the sensitive data after it's deallocated.
2339Since erasure is a precaution against bugs, this optimization is
2340inappropriate.
2341
2342The function @code{explicit_bzero} erases a block of memory, and
2343guarantees that the compiler will not remove the erasure as
2344``unnecessary.''
2345
2346@smallexample
2347@group
2348#include <string.h>
2349
2350extern void encrypt (const char *key, const char *in,
2351 char *out, size_t n);
2352extern void genkey (const char *phrase, char *key);
2353
2354void encrypt_with_phrase (const char *phrase, const char *in,
2355 char *out, size_t n)
2356@{
2357 char key[16];
2358 genkey (phrase, key);
2359 encrypt (key, in, out, n);
2360 explicit_bzero (key, 16);
2361@}
2362@end group
2363@end smallexample
2364
2365@noindent
2366In this example, if @code{memset}, @code{bzero}, or a hand-written
2367loop had been used, the compiler might remove them as ``unnecessary.''
2368
2369@strong{Warning:} @code{explicit_bzero} does not guarantee that
2370sensitive data is @emph{completely} erased from the computer's memory.
2371There may be copies in temporary storage areas, such as registers and
2372``scratch'' stack space; since these are invisible to the source code,
2373a library function cannot erase them.
2374
2375Also, @code{explicit_bzero} only operates on RAM. If a sensitive data
2376object never needs to have its address taken other than to call
2377@code{explicit_bzero}, it might be stored entirely in CPU registers
2378@emph{until} the call to @code{explicit_bzero}. Then it will be
2379copied into RAM, the copy will be erased, and the original will remain
2380intact. Data in RAM is more likely to be exposed by a bug than data
2381in registers, so this creates a brief window where the data is at
2382greater risk of exposure than it would have been if the program didn't
2383try to erase it at all.
2384
2385Declaring sensitive variables as @code{volatile} will make both the
2386above problems @emph{worse}; a @code{volatile} variable will be stored
2387in memory for its entire lifetime, and the compiler will make
2388@emph{more} copies of it than it would otherwise have. Attempting to
2389erase a normal variable ``by hand'' through a
2390@code{volatile}-qualified pointer doesn't work at all---because the
2391variable itself is not @code{volatile}, some compilers will ignore the
2392qualification on the pointer and remove the erasure anyway.
2393
2394Having said all that, in most situations, using @code{explicit_bzero}
2395is better than not using it. At present, the only way to do a more
2396thorough job is to write the entire sensitive operation in assembly
2397language. We anticipate that future compilers will recognize calls to
2398@code{explicit_bzero} and take appropriate steps to erase all the
copies of the affected data, wherever they may be.
2400
ea1bd74d 2401@deftypefun void explicit_bzero (void *@var{block}, size_t @var{len})
d08a7e4c 2402@standards{BSD, string.h}
ea1bd74d
ZW
2403@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2404
2405@code{explicit_bzero} writes zero into @var{len} bytes of memory
2406beginning at @var{block}, just as @code{bzero} would. The zeroes are
2407always written, even if the compiler could determine that this is
2408``unnecessary'' because no correct program could read them back.
2409
2410@strong{Note:} The @emph{only} optimization that @code{explicit_bzero}
2411disables is removal of ``unnecessary'' writes to memory. The compiler
2412can perform all the other optimizations that it could for a call to
2413@code{memset}. For instance, it may replace the function call with
2414inline memory writes, and it may assume that @var{block} cannot be a
2415null pointer.
2416
2417@strong{Portability Note:} This function first appeared in OpenBSD 5.5
2418and has not been standardized. Other systems may provide the same
2419functionality under a different name, such as @code{explicit_memset},
2420@code{memset_s}, or @code{SecureZeroMemory}.
2421
2422@Theglibc{} declares this function in @file{string.h}, but on other
2423systems it may be in @file{strings.h} instead.
2424@end deftypefun
2425
b10a0acc
ZW
2426
2427@node Shuffling Bytes
2428@section Shuffling Bytes
0e4ee106
UD
2429
2430The function below addresses the perennial programming quandary: ``How do
2431I take good data in string form and painlessly turn it into garbage?''
b10a0acc
ZW
2432This is not a difficult thing to code for oneself, but the authors of
2433@theglibc{} wish to make it as convenient as possible.
0e4ee106 2434
b10a0acc
ZW
2435To @emph{erase} data, use @code{explicit_bzero} (@pxref{Erasing
2436Sensitive Data}); to obfuscate it reversibly, use @code{memfrob}
2437(@pxref{Obfuscating Data}).
0e4ee106 2438
ec28fc7c 2439@deftypefun {char *} strfry (char *@var{string})
d08a7e4c 2440@standards{GNU, string.h}
11087373
AO
2441@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2442@c Calls initstate_r, time, getpid, strlen, and random_r.
0e4ee106 2443
b10a0acc
ZW
2444@code{strfry} performs an in-place shuffle on @var{string}. Each
2445character is swapped to a position selected at random, within the
2446portion of the string starting with the character's original position.
2447(This is the Fisher-Yates algorithm for unbiased shuffling.)
2448
2449Calling @code{strfry} will not disturb any of the random number
2450generators that have global state (@pxref{Pseudo-Random Numbers}).
0e4ee106
UD
2451
2452The return value of @code{strfry} is always @var{string}.
2453
1f77f049 2454@strong{Portability Note:} This function is unique to @theglibc{}.
b10a0acc 2455It is declared in @file{string.h}.
0e4ee106
UD
2456@end deftypefun
2457
2458
b10a0acc
ZW
2459@node Obfuscating Data
2460@section Obfuscating Data
0e4ee106
UD
2461@cindex Rot13
2462
b10a0acc
ZW
2463The @code{memfrob} function reversibly obfuscates an array of binary
2464data. This is not true encryption; the obfuscated data still bears a
2465clear relationship to the original, and no secret key is required to
2466undo the obfuscation. It is analogous to the ``Rot13'' cipher used on
2467Usenet for obscuring offensive jokes, spoilers for works of fiction,
2468and so on, but it can be applied to arbitrary binary data.
0e4ee106 2469
b10a0acc
ZW
2470Programs that need true encryption---a transformation that completely
2471obscures the original and cannot be reversed without knowledge of a
2472secret key---should use a dedicated cryptography library, such as
2473@uref{https://www.gnu.org/software/libgcrypt/,,libgcrypt}.
2474
2475Programs that need to @emph{destroy} data should use
2476@code{explicit_bzero} (@pxref{Erasing Sensitive Data}), or possibly
2477@code{strfry} (@pxref{Shuffling Bytes}).
0e4ee106 2478
0e4ee106 2479@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
d08a7e4c 2480@standards{GNU, string.h}
11087373 2481@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106 2482
b10a0acc
ZW
2483The function @code{memfrob} obfuscates @var{length} bytes of data
2484beginning at @var{mem}, in place. Each byte is bitwise xor-ed with
2485the binary pattern 00101010 (hexadecimal 0x2A). The return value is
2486always @var{mem}.
0e4ee106 2487
Calling @code{memfrob} a second time on the same data returns it to
its original state.
0e4ee106 2490
1f77f049 2491@strong{Portability Note:} This function is unique to @theglibc{}.
b10a0acc 2492It is declared in @file{string.h}.
0e4ee106
UD
2493@end deftypefun
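
The transformation described above is simple enough to write out by
hand, which also shows why applying it twice restores the data. The
function below is an illustrative stand-in, not the library's
implementation; the name @code{frob_by_hand} is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for memfrob: xor each of the LENGTH bytes at
   MEM with 0x2A (binary 00101010) and return MEM.  Since
   (b ^ 0x2A) ^ 0x2A == b, a second call undoes the first.  */
void *
frob_by_hand (void *mem, size_t length)
{
  unsigned char *p = mem;

  for (size_t i = 0; i < length; i++)
    p[i] ^= 0x2A;
  return mem;
}
```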
2494
b4012b75
UD
2495@node Encode Binary Data
2496@section Encode Binary Data
2497
To store or transfer binary data in environments which only support text,
one has to encode the binary data by mapping the input bytes to
bytes in the range allowed for storing or transferring. SVID
systems (and nowadays XPG compliant systems) provide minimal support for
this task.
b4012b75 2503
b4012b75 2504@deftypefun {char *} l64a (long int @var{n})
d08a7e4c 2505@standards{XPG, stdlib.h}
11087373 2506@safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}}
This function encodes a 32-bit input value using bytes from the
basic character set. It returns a pointer to a 7-byte buffer which
2509contains an encoded version of @var{n}. To encode a series of bytes the
2510user must copy the returned string to a destination buffer. It returns
2511the empty string if @var{n} is zero, which is somewhat bizarre but
2512mandated by the standard.@*
2513@strong{Warning:} Since a static buffer is used this function should not
5649a1d6 2514be used in multi-threaded programs. There is no thread-safe alternative
dd7d45e8
UD
2515to this function in the C library.@*
2516@strong{Compatibility Note:} The XPG standard states that the return
2517value of @code{l64a} is undefined if @var{n} is negative. In the GNU
2518implementation, @code{l64a} treats its argument as unsigned, so it will
2519return a sensible encoding for any nonzero @var{n}; however, portable
2520programs should not rely on this.
b4012b75 2521
dd7d45e8
UD
2522To encode a large buffer @code{l64a} must be called in a loop, once for
2523each 32-bit word of the buffer. For example, one could do something
2524like this:
5649a1d6
UD
2525
2526@smallexample
2527char *
2528encode (const void *buf, size_t len)
2529@{
2530 /* @r{We know in advance how long the buffer has to be.} */
2531 unsigned char *in = (unsigned char *) buf;
2532 char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
290639c3 2533 char *cp = out, *p;
5649a1d6
UD
2534
2535 /* @r{Encode the length.} */
dd7d45e8 2536 /* @r{Using `htonl' is necessary so that the data can be}
290639c3
UD
2537 @r{decoded even on machines with different byte order.}
2538 @r{`l64a' can return a string shorter than 6 bytes, so }
2539 @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
2540 @r{hand.} */
dd7d45e8 2541
290639c3
UD
2542 p = stpcpy (cp, l64a (htonl (len)));
2543 cp = mempcpy (p, "......", 6 - (p - cp));
5649a1d6
UD
2544
2545 while (len > 3)
2546 @{
2547 unsigned long int n = *in++;
2548 n = (n << 8) | *in++;
2549 n = (n << 8) | *in++;
2550 n = (n << 8) | *in++;
2551 len -= 4;
290639c3
UD
2552 p = stpcpy (cp, l64a (htonl (n)));
2553 cp = mempcpy (p, "......", 6 - (p - cp));
5649a1d6
UD
2554 @}
2555 if (len > 0)
2556 @{
2557 unsigned long int n = *in++;
2558 if (--len > 0)
2559 @{
2560 n = (n << 8) | *in++;
2561 if (--len > 0)
2562 n = (n << 8) | *in;
2563 @}
290639c3 2564 cp = stpcpy (cp, l64a (htonl (n)));
5649a1d6
UD
2565 @}
2566 *cp = '\0';
2567 return out;
2568@}
2569@end smallexample
2570
2571It is strange that the library does not provide the complete
dd7d45e8
UD
2572functionality needed but so be it.
2573
2574@end deftypefun
5649a1d6 2575
b4012b75
UD
2576To decode data produced with @code{l64a} the following function should be
2577used.
2578
2579@deftypefun {long int} a64l (const char *@var{string})
d08a7e4c 2580@standards{XPG, stdlib.h}
11087373 2581@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b4012b75 2582The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}. The function processes at most 6 bytes of
this string, and decodes the bytes it finds according to the table
below. It stops decoding when it finds a byte not in the table,
dd7d45e8 2586rather like @code{atoi}; if you have a buffer which has been broken into
2cc4b9cc 2587lines, you must be careful to skip over the end-of-line bytes.
dd7d45e8
UD
2588
2589The decoded number is returned as a @code{long int} value.
b4012b75 2590@end deftypefun
b13927da 2591
dd7d45e8 2592The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
2cc4b9cc 2593which each byte of an encoded string represents six bits of an
dd7d45e8
UD
2594input word. These symbols are used for the base 64 digits:
2595
2596@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
2597@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
2598@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
2599 @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
2600@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
2601 @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
2602@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
2603 @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
2604@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
2605 @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
2606@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
2607 @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
2608@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
2609 @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
2610@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
2611 @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
2612@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
2613 @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
2614@end multitable
2615
2616This encoding scheme is not standard. There are some other encoding
2617methods which are much more widely used (UU encoding, MIME encoding).
2618Generally, it is better to use one of these encodings.
2619
b13927da
UD
2620@node Argz and Envz Vectors
2621@section Argz and Envz Vectors
2622
5649a1d6 2623@cindex argz vectors (string vectors)
2cc4b9cc
PE
2624@cindex string vectors, null-byte separated
2625@cindex argument vectors, null-byte separated
@dfn{Argz vectors} are vectors of strings in a contiguous block of
memory, each element separated from its neighbors by null bytes
(@code{'\0'}).
2629
5649a1d6 2630@cindex envz vectors (environment vectors)
2cc4b9cc 2631@cindex environment vectors, null-byte separated
b13927da 2632@dfn{Envz vectors} are an extension of argz vectors where each element is a
2cc4b9cc 2633name-value pair, separated by a @code{'='} byte (as in a Unix
b13927da
UD
2634environment).
2635
2636@menu
2637* Argz Functions:: Operations on argz vectors.
2638* Envz Functions:: Additional operations on environment vectors.
2639@end menu
2640
2641@node Argz Functions, Envz Functions, , Argz and Envz Vectors
2642@subsection Argz Functions
2643
2644Each argz vector is represented by a pointer to the first element, of
2645type @code{char *}, and a size, of type @code{size_t}, both of which can
2646be initialized to @code{0} to represent an empty argz vector. All argz
2647functions accept either a pointer and a size argument, or pointers to
2648them, if they will be modified.
2649
2650The argz functions use @code{malloc}/@code{realloc} to allocate/grow
f0f308c1 2651argz vectors, and so any argz vector created using these functions may
b13927da
UD
2652be freed by using @code{free}; conversely, any argz function that may
2653grow a string expects that string to have been allocated using
2654@code{malloc} (those argz functions that only examine their arguments or
2655modify them in place will work on any sort of memory).
2656@xref{Unconstrained Allocation}.
2657
2658All argz functions that do memory allocation have a return type of
2659@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
2660allocation error occurs.
2661
2662@pindex argz.h
2663These functions are declared in the standard include file @file{argz.h}.
2664
2665@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
d08a7e4c 2666@standards{GNU, argz.h}
11087373 2667@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
5649a1d6 2668The @code{argz_create} function converts the Unix-style argument vector
b13927da
UD
2669@var{argv} (a vector of pointers to normal C strings, terminated by
2670@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
2671the same elements, which is returned in @var{argz} and @var{argz_len}.
2672@end deftypefun
2673
2674@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
d08a7e4c 2675@standards{GNU, argz.h}
11087373 2676@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 2677The @code{argz_create_sep} function converts the string
b13927da 2678@var{string} into an argz vector (returned in @var{argz} and
49c091e5 2679@var{argz_len}) by splitting it into elements at every occurrence of the
2cc4b9cc 2680byte @var{sep}.
b13927da
UD
2681@end deftypefun
2682
f0f308c1 2683@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
d08a7e4c 2684@standards{GNU, argz.h}
11087373 2685@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da
UD
2686The @code{argz_count} function returns the number of elements in the argz
2687vector @var{argz} and @var{argz_len}.
2688@end deftypefun
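
As a minimal sketch of how @code{argz_create_sep} and @code{argz_count}
combine, the following assumes a glibc system. The name
@code{count_fields} is hypothetical and not part of @file{argz.h}, and
@code{_GNU_SOURCE} is defined on the assumption that it is needed to make
@code{error_t} visible.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>
#include <stdlib.h>

/* count_fields is a hypothetical helper, not part of argz.h.  It splits
   TEXT at every SEP byte and returns the number of resulting elements,
   or (size_t) -1 if the allocation fails.  */
size_t
count_fields (const char *text, int sep)
{
  char *argz = NULL;
  size_t argz_len = 0;

  if (argz_create_sep (text, sep, &argz, &argz_len) != 0)
    return (size_t) -1;

  size_t n = argz_count (argz, argz_len);
  free (argz);   /* The vector came from malloc, so free releases it.  */
  return n;
}
```

With this helper, @code{count_fields ("a:b:c", ':')} would yield 3, and
an empty string yields 0 because it produces an empty argz vector.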
2689
8ded91fb 2690@deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
d08a7e4c 2691@standards{GNU, argz.h}
11087373 2692@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da 2693The @code{argz_extract} function converts the argz vector @var{argz} and
5649a1d6 2694@var{argz_len} into a Unix-style argument vector stored in @var{argv},
b13927da
UD
2695by putting pointers to every element in @var{argz} into successive
2696positions in @var{argv}, followed by a terminator of @code{0}.
2697The array @var{argv} must be pre-allocated with enough space to hold all the
2698elements in @var{argz} plus the terminating @code{(char *)0}
2699(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
2700bytes should be enough). Note that the string pointers stored into
2701@var{argv} point into @var{argz}---they are not copies---and so
2702@var{argz} must be copied if it will be changed while @var{argv} is
2703still active. This function is useful for passing the elements in
2704@var{argz} to an exec function (@pxref{Executing a File}).
2705@end deftypefun
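
The sizing rule above can be sketched as follows. The helper name
@code{argz_to_argv} is hypothetical; it only wraps the allocation that
@code{argz_extract} expects the caller to perform.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>
#include <stdlib.h>

/* argz_to_argv is a hypothetical helper, not part of argz.h.  It
   allocates room for one pointer per element plus the terminating null
   pointer, then lets argz_extract fill it in.  The caller frees the
   returned array, but not the strings, which still live inside ARGZ.
   Returns NULL if the allocation fails.  */
char **
argz_to_argv (char *argz, size_t argz_len)
{
  size_t n = argz_count (argz, argz_len);
  char **argv = malloc ((n + 1) * sizeof (char *));

  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

Because the pointers alias the argz vector, the array returned here is
only usable until the vector is next reallocated or freed.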
2706
2707@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
d08a7e4c 2708@standards{GNU, argz.h}
11087373 2709@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da 2710The @code{argz_stringify} function converts @var{argz} into a normal string with
2cc4b9cc 2711the elements separated by the byte @var{sep}, by replacing each
b13927da
UD
2712@code{'\0'} inside @var{argz} (except the last one, which terminates the
2713string) with @var{sep}. This is handy for printing @var{argz} in a
2714readable manner.
2715@end deftypefun
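
One possible use, shown here only as a sketch, is to change the separator
of a delimited string by splitting it and joining it again. The helper
@code{replace_separator} is a hypothetical name introduced for
illustration.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>
#include <stdlib.h>

/* replace_separator is a hypothetical helper, not part of argz.h.  It
   splits TEXT at FROM, then rewrites the inner null bytes to TO, which
   leaves an ordinary null-terminated string in the same block.  The
   caller frees the result.  NULL is returned on allocation failure and
   also for an empty TEXT, which yields an empty argz vector.  */
char *
replace_separator (const char *text, int from, int to)
{
  char *argz = NULL;
  size_t argz_len = 0;

  if (argz_create_sep (text, from, &argz, &argz_len) != 0)
    return NULL;

  argz_stringify (argz, argz_len, to);
  return argz;
}
```

After the call the block is no longer a multi-element argz vector, so it
should be treated as a plain string from then on.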
2716
2717@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
d08a7e4c 2718@standards{GNU, argz.h}
11087373
AO
2719@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2720@c Calls strlen and argz_append.
b13927da
UD
2721The @code{argz_add} function adds the string @var{str} to the end of the
2722argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
2723@code{*@var{argz_len}} accordingly.
2724@end deftypefun
2725
2726@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
d08a7e4c 2727@standards{GNU, argz.h}
11087373 2728@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
b13927da 2729The @code{argz_add_sep} function is similar to @code{argz_add}, but
49c091e5 2730@var{str} is split into separate elements in the result at occurrences of
2cc4b9cc 2731the byte @var{delim}. This is useful, for instance, for
5649a1d6 2732adding the components of a Unix search path to an argz vector, by using
b13927da
UD
2733a value of @code{':'} for @var{delim}.
2734@end deftypefun
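
The search-path use mentioned above might look like the following sketch.
The helper @code{join_search_paths} and its parameter names are
hypothetical.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>
#include <stdlib.h>

/* join_search_paths is a hypothetical helper, not part of argz.h.  It
   builds one argz vector holding the directories of two
   colon-separated search paths, FIRST followed by SECOND.  On failure
   the result is reset to the empty vector.  */
error_t
join_search_paths (const char *first, const char *second,
                   char **argz, size_t *argz_len)
{
  *argz = NULL;
  *argz_len = 0;

  error_t err = argz_add_sep (argz, argz_len, first, ':');
  if (err == 0)
    err = argz_add_sep (argz, argz_len, second, ':');

  if (err != 0)
    {
      free (*argz);
      *argz = NULL;
      *argz_len = 0;
    }
  return err;
}
```

Starting from a null pointer and a length of zero is enough here, since
that pair already represents an empty argz vector.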
2735
2736@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
d08a7e4c 2737@standards{GNU, argz.h}
11087373 2738@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
b13927da
UD
2739The @code{argz_append} function appends @var{buf_len} bytes starting at
2740@var{buf} to the argz vector @code{*@var{argz}}, reallocating
2741@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
2742@code{*@var{argz_len}}.
2743@end deftypefun
2744
30aa5785 2745@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
d08a7e4c 2746@standards{GNU, argz.h}
11087373
AO
2747@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2748@c Calls free if no argument is left.
b13927da
UD
2749If @var{entry} points to the beginning of one of the elements in the
2750argz vector @code{*@var{argz}}, the @code{argz_delete} function will
2751remove this entry and reallocate @code{*@var{argz}}, modifying
2752@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
2753destructive argz functions usually reallocate their argz argument,
2754pointers into argz vectors such as @var{entry} will then become invalid.
2755@end deftypefun
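
The warning about invalidated pointers matters most when deleting while
iterating. The sketch below, built around the hypothetical helper
@code{argz_remove_string}, avoids the problem by returning immediately
after the deletion instead of continuing the loop with a stale pointer.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>
#include <string.h>

/* argz_remove_string is a hypothetical helper, not part of argz.h.  It
   deletes the first element equal to STR and returns 1, or returns 0
   if no element matches.  */
int
argz_remove_string (char **argz, size_t *argz_len, const char *str)
{
  char *entry = NULL;

  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    if (strcmp (entry, str) == 0)
      {
        argz_delete (argz, argz_len, entry);
        /* ENTRY must not be used again, so stop iterating here.  */
        return 1;
      }
  return 0;
}
```

To remove every match, a caller could invoke the helper repeatedly until
it returns 0, restarting the iteration each time.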
2756
2757@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
d08a7e4c 2758@standards{GNU, argz.h}
11087373
AO
2759@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2760@c Calls argz_add or realloc and memmove.
b13927da
UD
2761The @code{argz_insert} function inserts the string @var{entry} into the
2762argz vector @code{*@var{argz}} at a point just before the existing
2763element pointed to by @var{before}, reallocating @code{*@var{argz}} and
2764updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
2765is @code{0}, @var{entry} is added to the end instead (as if by
2766@code{argz_add}). Since the first element is in fact the same as
2767@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
2768@var{before} will result in @var{entry} being inserted at the beginning.
2769@end deftypefun
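
Inserting at the beginning, as described above, can be wrapped in a small
helper. The name @code{argz_prepend} is hypothetical and is used here
only for illustration.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>

/* argz_prepend is a hypothetical helper, not part of argz.h.  *ARGZ is
   the address of the first element, so inserting before it puts STR in
   front.  For an empty vector *ARGZ is a null pointer, and argz_insert
   then appends, which gives the same result.  */
error_t
argz_prepend (char **argz, size_t *argz_len, const char *str)
{
  return argz_insert (argz, argz_len, *argz, str);
}
```

Calling it with @code{"b"} and then @code{"a"} on an empty vector would
leave the elements in the order @code{"a"}, @code{"b"}.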
2770
8ded91fb 2771@deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
d08a7e4c 2772@standards{GNU, argz.h}
11087373 2773@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da
UD
2774The @code{argz_next} function provides a convenient way of iterating
2775over the elements in the argz vector @var{argz}. It returns a pointer
2776to the next element in @var{argz} after the element @var{entry}, or
2777@code{0} if there are no elements following @var{entry}. If @var{entry}
2778is @code{0}, the first element of @var{argz} is returned.
2779
2780This behavior suggests two styles of iteration:
2781
2782@smallexample
2783 char *entry = 0;
2784 while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
2785 @var{action};
2786@end smallexample
2787
2788(the double parentheses keep some C compilers from warning about
2789what they consider a questionable assignment in the @code{while} test) and:
2790
2791@smallexample
2792 char *entry;
2793 for (entry = @var{argz};
2794 entry;
2795 entry = argz_next (@var{argz}, @var{argz_len}, entry))
2796 @var{action};
2797@end smallexample
2798
2799Note that the latter depends on @var{argz} having a value of @code{0} if
2800it is empty (rather than a pointer to an empty block of memory); this
2801invariant is maintained for argz vectors created by the functions here.
2802@end deftypefun
2803
d705269e 2804@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
d08a7e4c 2805@standards{GNU, argz.h}
11087373 2806@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
49c091e5 2807The @code{argz_replace} function replaces any occurrences of the string @var{str} in @var{argz} with
d705269e
UD
2808@var{with}, reallocating @code{*@var{argz}} as necessary. If
2809@var{replace_count} is not a null pointer, @code{*@var{replace_count}} will be
f0f308c1 2810incremented by the number of replacements performed.
d705269e
UD
2811@end deftypefun
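
As an illustrative sketch, @code{argz_replace} can be combined with
@code{argz_create_sep} and @code{argz_stringify} to rewrite a prefix in
every component of a search path. The helper @code{replace_in_path} is a
hypothetical name.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <argz.h>
#include <stdlib.h>

/* replace_in_path is a hypothetical helper, not part of argz.h.  It
   splits the colon-separated PATH, replaces FROM with TO inside the
   elements, and joins the result again.  *COUNT is incremented by
   argz_replace if COUNT is not a null pointer.  The caller frees the
   result.  Returns NULL on failure or for an empty PATH.  */
char *
replace_in_path (const char *path, const char *from, const char *to,
                 unsigned *count)
{
  char *argz = NULL;
  size_t argz_len = 0;

  if (argz_create_sep (path, ':', &argz, &argz_len) != 0)
    return NULL;

  if (argz_replace (&argz, &argz_len, from, to, count) != 0)
    {
      free (argz);
      return NULL;
    }

  argz_stringify (argz, argz_len, ':');
  return argz;
}
```

Note that the count is added to whatever value the caller stored
beforehand, so it should normally be initialized to zero first.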
2812
b13927da
UD
2813@node Envz Functions, , Argz Functions, Argz and Envz Vectors
2814@subsection Envz Functions
2815
2816Envz vectors are just argz vectors with additional constraints on the form
2817of each element; as such, argz functions can also be used on them, where it
2818makes sense.
2819
2820Each element in an envz vector is a name-value pair, separated by a @code{'='}
2cc4b9cc 2821byte; if multiple @code{'='} bytes are present in an element, those
b13927da 2822after the first are considered part of the value, and treated like all other
2cc4b9cc 2823non-@code{'\0'} bytes.
b13927da 2824
2cc4b9cc 2825If @emph{no} @code{'='} bytes are present in an element, that element is
b13927da
UD
2826considered the name of a ``null'' entry, as distinct from an entry with an
2827empty value: @code{envz_get} will return @code{0} if given the name of a null
2828entry, whereas an entry with an empty value would result in a value of
2829@code{""}; @code{envz_entry} will still find such entries, however. Null
f0f308c1 2830entries can be removed with the @code{envz_strip} function.
b13927da
UD
2831
2832As with argz functions, envz functions that may allocate memory (and thus
2833fail) have a return type of @code{error_t}, and return either @code{0} or
2834@code{ENOMEM}.
2835
2836@pindex envz.h
2837These functions are declared in the standard include file @file{envz.h}.
2838
2839@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
d08a7e4c 2840@standards{GNU, envz.h}
11087373 2841@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da
UD
2842The @code{envz_entry} function finds the entry in @var{envz} with the name
2843@var{name}, and returns a pointer to the whole entry---that is, the argz
2cc4b9cc 2844element which begins with @var{name} followed by a @code{'='} byte. If
b13927da
UD
2845there is no entry with that name, @code{0} is returned.
2846@end deftypefun
2847
2848@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
d08a7e4c 2849@standards{GNU, envz.h}
11087373 2850@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da
UD
2851The @code{envz_get} function finds the entry in @var{envz} with the name
2852@var{name} (like @code{envz_entry}), and returns a pointer to the value
2853portion of that entry (following the @code{'='}). If there is no entry with
2854that name (or only a null entry), @code{0} is returned.
2855@end deftypefun
2856
2857@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
d08a7e4c 2858@standards{GNU, envz.h}
11087373
AO
2859@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2860@c Calls envz_remove, which calls envz_entry and argz_delete, and then
2861@c argz_add or equivalent code that reallocs and appends name=value.
b13927da
UD
2862The @code{envz_add} function adds an entry to @code{*@var{envz}}
2863(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
2864@var{name}, and value @var{value}. If an entry with the same name
2865already exists in @var{envz}, it is removed first. If @var{value} is
f0f308c1 2866@code{0}, then the new entry will be the special null type of entry
b13927da
UD
2867(mentioned above).
2868@end deftypefun
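
The difference between a missing entry, a null entry, and an entry with a
value (possibly empty) can be probed with @code{envz_entry} and
@code{envz_get} together. The following is a sketch only; the enumeration
and the helper @code{classify_entry} are hypothetical names.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <envz.h>
#include <stddef.h>

/* Hypothetical classification of a name within an envz vector.  */
enum entry_kind { ENTRY_MISSING, ENTRY_NULL, ENTRY_VALUE };

/* classify_entry is a hypothetical helper, not part of envz.h.
   envz_entry finds null entries too, while envz_get returns a null
   pointer for them, so the pair distinguishes all three cases.  */
enum entry_kind
classify_entry (const char *envz, size_t envz_len, const char *name)
{
  if (envz_entry (envz, envz_len, name) == NULL)
    return ENTRY_MISSING;
  if (envz_get (envz, envz_len, name) == NULL)
    return ENTRY_NULL;
  return ENTRY_VALUE;   /* The value may still be the empty string.  */
}
```

An entry added with a @var{value} of @code{""} is classified as having a
value, whereas one added with a @var{value} of @code{0} is a null entry.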
2869
2870@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
d08a7e4c 2871@standards{GNU, envz.h}
11087373 2872@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
b13927da
UD
2873The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
2874as if with @code{envz_add}, updating @code{*@var{envz}} and
2875@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
2876will supersede those with the same name in @var{envz}, otherwise not.
2877
2878Null entries are treated just like other entries in this respect, so a null
2879entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
2880being added to @var{envz}, if @var{override} is false.
2881@end deftypefun
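
A common use of a false @var{override} is filling in defaults without
disturbing settings that are already present. The sketch below assumes
that use case; @code{apply_defaults} is a hypothetical helper name.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <envz.h>

/* apply_defaults is a hypothetical helper, not part of envz.h.  It
   adds every entry of DEFAULTS whose name is not yet present in *ENVZ
   and leaves existing entries, including null entries, untouched.  */
error_t
apply_defaults (char **envz, size_t *envz_len,
                const char *defaults, size_t defaults_len)
{
  /* An override of 0 means entries already in *ENVZ win.  */
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```

Passing a non-zero @var{override} to @code{envz_merge} instead would let
the entries of the second vector replace those of the same name.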
2882
2883@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
d08a7e4c 2884@standards{GNU, envz.h}
11087373 2885@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
b13927da
UD
2886The @code{envz_strip} function removes any null entries from @var{envz},
2887updating @code{*@var{envz}} and @code{*@var{envz_len}}.
2888@end deftypefun
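
Since envz vectors are also argz vectors, @code{argz_count} can be used
to observe the effect of @code{envz_strip}. This is a sketch only, and
@code{strip_and_count} is a hypothetical helper name.

```c
#define _GNU_SOURCE   /* Assumption: requested so that error_t is declared.  */
#include <envz.h>

/* strip_and_count is a hypothetical helper, not part of envz.h.  It
   removes the null entries from *ENVZ and returns how many elements
   were removed, measured by counting before and after.  */
size_t
strip_and_count (char **envz, size_t *envz_len)
{
  size_t before = argz_count (*envz, *envz_len);

  envz_strip (envz, envz_len);
  return before - argz_count (*envz, *envz_len);
}
```

Entries with an empty value contain a @code{'='} byte and are therefore
kept; only names without one are removed.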
11087373 2889
920d7012 2890@deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name})
d08a7e4c 2891@standards{GNU, envz.h}
654055e0 2892@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
920d7012
SP
2893The @code{envz_remove} function removes an entry named @var{name} from
2894@var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}.
2895@end deftypefun
2896
11087373
AO
2897@c FIXME these are undocumented:
2898@c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp