git.ipfire.org Git - thirdparty/glibc.git/blame - manual/string.texi
manual: Manual update for strlcat, strlcpy, wcslcat, wclscpy
@node String and Array Utilities, Character Set Handling, Character Handling, Top
@c %MENU% Utilities for copying and comparing strings and arrays
@chapter String and Array Utilities

Operations on strings (null-terminated byte sequences) are an important part of
many programs.  @Theglibc{} provides an extensive set of string
utility functions, including functions for copying, concatenating,
comparing, and searching strings.  Many of these functions can also
operate on arbitrary regions of storage; for example, the @code{memcpy}
function can be used to copy the contents of any kind of array.
11
12It's fairly common for beginning C programmers to ``reinvent the wheel''
13by duplicating this functionality in their own code, but it pays to
14become familiar with the library functions and to make use of them,
15since this offers benefits in maintenance, efficiency, and portability.
16
17For instance, you could easily compare one string to another in two
18lines of C code, but if you use the built-in @code{strcmp} function,
19you're less likely to make a mistake. And, since these library
20functions are typically highly optimized, your program may run faster
21too.
22
23@menu
24* Representation of Strings:: Introduction to basic concepts.
25* String/Array Conventions:: Whether to use a string function or an
26 arbitrary array function.
27* String Length:: Determining the length of a string.
0a13c9e9
PE
28* Copying Strings and Arrays:: Functions to copy strings and arrays.
29* Concatenating Strings:: Functions to concatenate strings while copying.
30* Truncating Strings:: Functions to truncate strings while copying.
28f540f4
RM
31* String/Array Comparison:: Functions for byte-wise and character-wise
32 comparison.
33* Collation Functions:: Functions for collating strings.
34* Search Functions:: Searching for a specific element or substring.
35* Finding Tokens in a String:: Splitting a string into tokens by looking
36 for delimiters.
ea1bd74d
ZW
37* Erasing Sensitive Data:: Clearing memory which contains sensitive
38 data, after it's no longer needed.
b10a0acc
ZW
39* Shuffling Bytes:: Or how to flash-cook a string.
40* Obfuscating Data:: Reversibly obscuring data from casual view.
b4012b75 41* Encode Binary Data:: Encoding and Decoding of Binary Data.
b13927da 42* Argz and Envz Vectors:: Null-separated string vectors.
28f540f4
RM
43@end menu
44
b4012b75 45@node Representation of Strings
28f540f4
RM
46@section Representation of Strings
47@cindex string, representation of
48
49This section is a quick summary of string concepts for beginning C
2cc4b9cc 50programmers. It describes how strings are represented in C
28f540f4
RM
51and some common pitfalls. If you are already familiar with this
52material, you can skip this section.
53
54@cindex string
2cc4b9cc
PE
55A @dfn{string} is a null-terminated array of bytes of type @code{char},
56including the terminating null byte. String-valued
28f540f4 57variables are usually declared to be pointers of type @code{char *}.
1fb22592 58Such variables do not include space for the contents of a string; that has
28f540f4
RM
59to be stored somewhere else---in an array variable, a string constant,
60or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
61you to store the address of the chosen memory space into the pointer
62variable. Alternatively you can store a @dfn{null pointer} in the
63pointer variable. The null pointer does not point anywhere, so
64attempting to reference the string it points to gets an error.
65
2cc4b9cc
PE
66@cindex multibyte character
67@cindex multibyte string
68@cindex wide string
69A @dfn{multibyte character} is a sequence of one or more bytes that
70represents a single character using the locale's encoding scheme; a
71null byte always represents the null character. A @dfn{multibyte
72string} is a string that consists entirely of multibyte
73characters. In contrast, a @dfn{wide string} is a null-terminated
74sequence of @code{wchar_t} objects. A wide-string variable is usually
75declared to be a pointer of type @code{wchar_t *}, by analogy with
76string variables and @code{char *}. @xref{Extended Char Intro}.
77
78@cindex null byte
8a2f1f5b 79@cindex null wide character
2cc4b9cc
PE
80By convention, the @dfn{null byte}, @code{'\0'},
81marks the end of a string and the @dfn{null wide character},
82@code{L'\0'}, marks the end of a wide string. For example, in
8a2f1f5b 83testing to see whether the @code{char *} variable @var{p} points to a
2cc4b9cc 84null byte marking the end of a string, you can write
8a2f1f5b 85@code{!*@var{p}} or @code{*@var{p} == '\0'}.
28f540f4 86
2cc4b9cc
PE
87A null byte is quite different conceptually from a null pointer,
88although both are represented by the integer constant @code{0}.
28f540f4
RM
89
90@cindex string literal
2cc4b9cc
PE
91A @dfn{string literal} appears in C program source as a multibyte
92string between double-quote characters (@samp{"}). If the
93initial double-quote character is immediately preceded by a capital
94@samp{L} (ell) character (as in @code{L"foo"}), it is a wide string
95literal. String literals can also contribute to @dfn{string
96concatenation}: @code{"a" "b"} is the same as @code{"ab"}.
97For wide strings one can use either
8a2f1f5b
UD
98@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is
99not allowed by the GNU C compiler, because literals are placed in
100read-only storage.
28f540f4 101
2cc4b9cc 102Arrays that are declared @code{const} cannot be modified
28f540f4
RM
103either. It's generally good style to declare non-modifiable string
104pointers to be of type @code{const char *}, since this often allows the
105C compiler to detect accidental modifications as well as providing some
106amount of documentation about what your program intends to do with the
107string.
108
2cc4b9cc
PE
109The amount of memory allocated for a byte array may extend past the null byte
110that marks the end of the string that the array contains. In this
dd7d45e8 111document, the term @dfn{allocated size} is always used to refer to the
2cc4b9cc
PE
112total amount of memory allocated for an array, while the term
113@dfn{length} refers to the number of bytes up to (but not including)
114the terminating null byte. Wide strings are similar, except their
115sizes and lengths count wide characters, not bytes.
28f540f4
RM
116@cindex length of string
117@cindex allocation size of string
118@cindex size of string
119@cindex string length
120@cindex string allocation
121
2cc4b9cc 122A notorious source of program bugs is trying to put more bytes into a
28f540f4 123string than fit in its allocated size. When writing code that extends
2cc4b9cc 124strings or moves bytes into a pre-allocated array, you should be
1fb22592 125very careful to keep track of the length of the string and make explicit
28f540f4
RM
126checks for overflowing the array. Many of the library functions
127@emph{do not} do this for you! Remember also that you need to allocate
2cc4b9cc 128an extra byte to hold the null byte that marks the end of the
28f540f4
RM
129string.
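
As a minimal sketch of this bookkeeping, the following function joins
two strings in newly allocated storage.  The name @code{join} is
hypothetical and not part of the library; the point of the example is
the extra byte reserved for the terminating null byte.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding A followed by B, or a
   null pointer if allocation fails.  The caller frees the result.
   (join is a hypothetical helper, not a library function.)  */
char *
join (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  /* Contents of both strings, plus one byte for the null byte.  */
  char *result = malloc (alen + blen + 1);

  if (result == NULL)
    return NULL;
  memcpy (result, a, alen);
  /* Copying blen + 1 bytes brings along the null byte of B.  */
  memcpy (result + alen, b, blen + 1);
  return result;
}
```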
130
@cindex single-byte string
@cindex multibyte string
Originally strings were sequences of bytes where each byte represented a
single character.  This is still true today if the strings are encoded
using a single-byte character encoding.  Things are different if the
strings are encoded using a multibyte encoding (for more information on
encodings see @ref{Extended Char Intro}).  There is no difference in
the programming interface for these two kinds of strings; the programmer
has to be aware of this and interpret the byte sequences accordingly.

But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
@code{memcpy} could cut a multibyte character in the middle and put an
incomplete (and therefore unusable) byte sequence in the target buffer.
146
@cindex wide string
To avoid these problems, later versions of the @w{ISO C} standard
introduce a second set of functions that operate on @dfn{wide
characters} (@pxref{Extended Char Intro}).  These functions don't have
the problems the single-byte versions have since every wide character is
a legal, interpretable value.  This does not mean that cutting wide
strings at arbitrary points is without problems.  It normally
is for alphabet-based languages (except for non-normalized text) but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit.  This is a
higher level problem which the @w{C library} functions are not designed
to solve.  But it is at least good that no invalid byte sequences can be
created.  Also, the higher level functions can operate much more easily
on wide characters than on multibyte characters, so a common strategy
is to use wide characters internally whenever text is more than simply
copied.
163
The remainder of this chapter discusses the functions for handling
wide strings in parallel with the discussion of
strings, since there is almost always an exact equivalent
available.
168
b4012b75 169@node String/Array Conventions
28f540f4
RM
170@section String and Array Conventions
171
172This chapter describes both functions that work on arbitrary arrays or
2cc4b9cc
PE
173blocks of memory, and functions that are specific to strings and wide
174strings.
28f540f4
RM
175
Functions that operate on arbitrary blocks of memory have names
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
@code{wmemcpy}) and invariably take an argument which specifies the size
(in bytes and wide characters respectively) of the block of memory to
operate on.  The array arguments and return values for these functions
have type @code{void *} or @code{wchar_t *}.  As a matter of style, the
elements of the arrays used with the @samp{mem} functions are referred
to as ``bytes''.  You can pass any kind of pointer to these functions,
and the @code{sizeof} operator is useful in computing the value for the
size argument.  Parameters to the @samp{wmem} functions must be of type
@code{wchar_t *}.  These functions are not really usable with anything
but arrays of this type.
188
189In contrast, functions that operate specifically on strings and wide
2cc4b9cc 190strings have names beginning with @samp{str} and @samp{wcs}
8a2f1f5b 191respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
2cc4b9cc 192terminating null byte or null wide character instead of requiring an explicit
8a2f1f5b 193size argument to be passed. (Some of these functions accept a specified
2cc4b9cc
PE
194maximum length, but they also check for premature termination.)
195The array arguments and return values for these
8a2f1f5b 196functions have type @code{char *} and @code{wchar_t *} respectively, and
2cc4b9cc 197the array elements are referred to as ``bytes'' and ``wide
8a2f1f5b
UD
198characters''.
199
200In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
201versions of a function. The one that is more appropriate to use depends
202on the exact situation. When your program is manipulating arbitrary
203arrays or blocks of storage, then you should always use the @samp{mem}
2cc4b9cc 204functions. On the other hand, when you are manipulating
8a2f1f5b
UD
205strings it is usually more convenient to use the @samp{str}/@samp{wcs}
206functions, unless you already know the length of the string in advance.
207The @samp{wmem} functions should be used for wide character arrays with
208known size.
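
The difference between the two families can be sketched as follows.
The structure @code{struct point} and the function names
@code{copy_points} and @code{copy_name} are made up for this
illustration.

```c
#include <string.h>

/* Hypothetical element type, for illustration only.  */
struct point
{
  int x;
  int y;
};

/* Arbitrary array: the size is passed explicitly, in bytes.  */
void
copy_points (struct point *to, const struct point *from, size_t count)
{
  memcpy (to, from, count * sizeof (struct point));
}

/* String: the function looks for the terminating null byte itself.
   TO must have room for strlen (from) + 1 bytes.  */
void
copy_name (char *to, const char *from)
{
  strcpy (to, from);
}
```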
209
210@cindex wint_t
211@cindex parameter promotion
Some of the memory and string functions take single characters as
arguments.  Since a value of type @code{char} is automatically promoted
into a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
In the case of the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}.  For many implementations this would not be necessary,
since @code{wchar_t} is large enough not to be automatically
promoted, but since the @w{ISO C} standard does not require such a
choice of types, the @code{wint_t} type is used.
28f540f4 222
b4012b75 223@node String Length
28f540f4
RM
224@section String Length
225
226You can get the length of a string using the @code{strlen} function.
227This function is declared in the header file @file{string.h}.
228@pindex string.h
229
28f540f4 230@deftypefun size_t strlen (const char *@var{s})
d08a7e4c 231@standards{ISO, string.h}
11087373 232@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc 233The @code{strlen} function returns the length of the
8a2f1f5b 234string @var{s} in bytes. (In other words, it returns the offset of the
2cc4b9cc 235terminating null byte within the array.)
28f540f4
RM
236
237For example,
238@smallexample
239strlen ("hello, world")
240 @result{} 12
241@end smallexample
242
2cc4b9cc 243When applied to an array, the @code{strlen} function returns
dd7d45e8 244the length of the string stored there, not its allocated size. You can
2cc4b9cc 245get the allocated size of the array that holds a string using
28f540f4
RM
246the @code{sizeof} operator:
247
248@smallexample
a5113b14 249char string[32] = "hello, world";
28f540f4
RM
250sizeof (string)
251 @result{} 32
252strlen (string)
253 @result{} 12
254@end smallexample
dd7d45e8 255
2cc4b9cc 256But beware, this will not work unless @var{string} is the
dd7d45e8
UD
257array itself, not a pointer to it. For example:
258
259@smallexample
260char string[32] = "hello, world";
261char *ptr = string;
262sizeof (string)
263 @result{} 32
264sizeof (ptr)
265 @result{} 4 /* @r{(on a machine with 4 byte pointers)} */
266@end smallexample
267
268This is an easy mistake to make when you are working with functions that
269take string arguments; those arguments are always pointers, not arrays.
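
One common way around this, sketched here under the assumption that the
caller knows the allocated size, is to pass that size as a separate
argument.  The function name @code{free_space} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* BUF is a pointer here, whatever the caller declared, so
   sizeof (buf) would only give the size of a pointer.  The caller
   passes the allocated size in BUFSIZE instead.  BUF must hold a
   null-terminated string.  (free_space is a hypothetical helper.)  */
size_t
free_space (const char *buf, size_t bufsize)
{
  /* Bytes still unused, not counting the terminating null byte.  */
  return bufsize - strlen (buf) - 1;
}
```

A caller that owns the array can write
@code{free_space (string, sizeof (string))}.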
270
It must also be noted that for multibyte encoded strings the return
value does not have to correspond to the number of characters in the
string.  To get this value, the string can be converted to wide
characters and @code{wcslen} can be used, or something like the following
code can be used:

@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.}  */
@{
  mbstate_t t;
  const char *scopy = string;
  /* In initial state.  */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters.  */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
@end smallexample

This is cumbersome to do, so if the number of characters (as opposed to
bytes) is needed often, it is better to work with wide characters.
292@end deftypefun
293
294The wide character equivalent is declared in @file{wchar.h}.
295
8a2f1f5b 296@deftypefun size_t wcslen (const wchar_t *@var{ws})
d08a7e4c 297@standards{ISO, wchar.h}
11087373 298@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
299The @code{wcslen} function is the wide character equivalent to
300@code{strlen}. The return value is the number of wide characters in the
2cc4b9cc 301wide string pointed to by @var{ws} (this is also the offset of
8a2f1f5b
UD
302the terminating null wide character of @var{ws}).
303
Since no wide character is made up of a sequence of several wide
characters, the return value is not only the offset in the array, it is
also the number of wide characters.
307
308This function was introduced in @w{Amendment 1} to @w{ISO C90}.
28f540f4
RM
309@end deftypefun
310
4547c1a4 311@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
d08a7e4c 312@standards{GNU, string.h}
11087373 313@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc
PE
314If the array @var{s} of size @var{maxlen} contains a null byte,
315the @code{strnlen} function returns the length of the string @var{s} in
316bytes. Otherwise it
8a2f1f5b 317returns @var{maxlen}. Therefore this function is equivalent to
ebaf36eb
JM
318@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
319but it
2cc4b9cc
PE
320is more efficient and works even if @var{s} is not null-terminated so
321long as @var{maxlen} does not exceed the size of @var{s}'s array.
4547c1a4
UD
322
323@smallexample
324char string[32] = "hello, world";
325strnlen (string, 32)
326 @result{} 12
327strnlen (string, 5)
328 @result{} 5
329@end smallexample
330
8a2f1f5b
UD
331This function is a GNU extension and is declared in @file{string.h}.
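
A typical use is a fixed-width field that is not guaranteed to be
null-terminated.  The following sketch assumes a made-up record layout
(@code{struct record}) and a hypothetical helper @code{name_length}.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-width field: padded with null bytes when the
   name is short, but with no terminator when the name fills all
   eight bytes.  */
struct record
{
  char name[8];
};

/* Return the number of bytes of R->name that are in use.  Plain
   strlen could read past the array when the field is full.  */
size_t
name_length (const struct record *r)
{
  return strnlen (r->name, sizeof (r->name));
}
```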
332@end deftypefun
333
8a2f1f5b 334@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
d08a7e4c 335@standards{GNU, wchar.h}
11087373 336@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
337@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
338@var{maxlen} parameter specifies the maximum number of wide characters.
339
340This function is a GNU extension and is declared in @file{wchar.h}.
4547c1a4
UD
341@end deftypefun
342
0a13c9e9
PE
343@node Copying Strings and Arrays
344@section Copying Strings and Arrays
28f540f4
RM
345
346You can use the functions described in this section to copy the contents
0a13c9e9
PE
347of strings, wide strings, and arrays. The @samp{str} and @samp{mem}
348functions are declared in @file{string.h} while the @samp{w} functions
349are declared in @file{wchar.h}.
28f540f4 350@pindex string.h
8a2f1f5b 351@pindex wchar.h
28f540f4
RM
352@cindex copying strings and arrays
353@cindex string copy functions
354@cindex array copy functions
355@cindex concatenating strings
356@cindex string concatenation functions
357
358A helpful way to remember the ordering of the arguments to the functions
359in this section is that it corresponds to an assignment expression, with
0a13c9e9
PE
360the destination array specified to the left of the source array. Most
361of these functions return the address of the destination array; a few
362return the address of the destination's terminating null, or of just
363past the destination.
28f540f4
RM
364
365Most of these functions do not work properly if the source and
366destination arrays overlap. For example, if the beginning of the
367destination array overlaps the end of the source array, the original
368contents of that part of the source array may get overwritten before it
369is copied. Even worse, in the case of the string functions, the null
2cc4b9cc 370byte marking the end of the string may be lost, and the copy
28f540f4
RM
371function might get stuck in a loop trashing all the memory allocated to
372your program.
373
374All functions that have problems copying between overlapping arrays are
375explicitly identified in this manual. In addition to functions in this
376section, there are a few others like @code{sprintf} (@pxref{Formatted
377Output Functions}) and @code{scanf} (@pxref{Formatted Input
378Functions}).
379
8a2f1f5b 380@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
d08a7e4c 381@standards{ISO, string.h}
11087373 382@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
383The @code{memcpy} function copies @var{size} bytes from the object
384beginning at @var{from} into the object beginning at @var{to}. The
385behavior of this function is undefined if the two arrays @var{to} and
386@var{from} overlap; use @code{memmove} instead if overlapping is possible.
387
388The value returned by @code{memcpy} is the value of @var{to}.
389
390Here is an example of how you might use @code{memcpy} to copy the
391contents of an array:
392
393@smallexample
struct foo *oldarray, *newarray;
int arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
398@end smallexample
399@end deftypefun
400
79827876 401@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 402@standards{ISO, wchar.h}
11087373 403@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
404The @code{wmemcpy} function copies @var{size} wide characters from the object
405beginning at @var{wfrom} into the object beginning at @var{wto}. The
406behavior of this function is undefined if the two arrays @var{wto} and
407@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.
408
409The following is a possible implementation of @code{wmemcpy} but there
410are more optimizations possible.
411
412@smallexample
413wchar_t *
414wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
415 size_t size)
416@{
417 return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
418@}
419@end smallexample
420
421The value returned by @code{wmemcpy} is the value of @var{wto}.
422
423This function was introduced in @w{Amendment 1} to @w{ISO C90}.
424@end deftypefun
425
8a2f1f5b 426@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
d08a7e4c 427@standards{GNU, string.h}
11087373 428@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{mempcpy} function is nearly identical to the @code{memcpy}
function.  It copies @var{size} bytes from the object beginning at
@var{from} into the object pointed to by @var{to}.  But instead of
returning the value of @var{to} it returns a pointer to the byte
following the last written byte in the object beginning at @var{to}.
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.
435
436This function is useful in situations where a number of objects shall be
437copied to consecutive memory positions.
438
439@smallexample
440void *
441combine (void *o1, size_t s1, void *o2, size_t s2)
442@{
443 void *result = malloc (s1 + s2);
444 if (result != NULL)
445 mempcpy (mempcpy (result, o1, s1), o2, s2);
446 return result;
447@}
448@end smallexample
449
450This function is a GNU extension.
451@end deftypefun
452
8a2f1f5b 453@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 454@standards{GNU, wchar.h}
11087373 455@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
function.  It copies @var{size} wide characters from the object
beginning at @var{wfrom} into the object pointed to by @var{wto}.  But
instead of returning the value of @var{wto} it returns a pointer to the
wide character following the last written wide character in the object
beginning at @var{wto}.  I.e., the value is @code{@var{wto} + @var{size}}.

This function is useful in situations where a number of objects shall be
copied to consecutive memory positions.

The following is a possible implementation of @code{wmempcpy} but there
are more optimizations possible.
468
469@smallexample
470wchar_t *
471wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
472 size_t size)
473@{
474 return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
475@}
476@end smallexample
477
478This function is a GNU extension.
479@end deftypefun
480
28f540f4 481@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
d08a7e4c 482@standards{ISO, string.h}
11087373 483@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
484@code{memmove} copies the @var{size} bytes at @var{from} into the
485@var{size} bytes at @var{to}, even if those two blocks of space
486overlap. In the case of overlap, @code{memmove} is careful to copy the
487original values of the bytes in the block at @var{from}, including those
488bytes which also belong to the block at @var{to}.
8a2f1f5b
UD
489
490The value returned by @code{memmove} is the value of @var{to}.
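
As an illustration, the following sketch removes a prefix from a string
in place, a case where source and destination necessarily overlap.  The
function name @code{drop_prefix} is hypothetical.

```c
#include <string.h>

/* Remove the first N bytes of the string S in place.  N must not
   exceed strlen (s).  (drop_prefix is a hypothetical helper.)  */
void
drop_prefix (char *s, size_t n)
{
  /* Source and destination overlap, so memcpy and strcpy are not
     safe here.  The + 1 moves the terminating null byte as well.  */
  memmove (s, s + n, strlen (s + n) + 1);
}
```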
491@end deftypefun
492
8ded91fb 493@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
d08a7e4c 494@standards{ISO, wchar.h}
11087373 495@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
into the @var{size} wide characters at @var{wto}, even if those two
blocks of space overlap.  In the case of overlap, @code{wmemmove} is
careful to copy the original values of the wide characters in the block
at @var{wfrom}, including those wide characters which also belong to the
block at @var{wto}.

The following is a possible implementation of @code{wmemmove} but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample

The value returned by @code{wmemmove} is the value of @var{wto}.

This function was introduced in @w{Amendment 1} to @w{ISO C90}.
518@end deftypefun
519
8a2f1f5b 520@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
d08a7e4c 521@standards{SVID, string.h}
11087373 522@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
523This function copies no more than @var{size} bytes from @var{from} to
524@var{to}, stopping if a byte matching @var{c} is found. The return
525value is a pointer into @var{to} one byte past where @var{c} was copied,
526or a null pointer if no byte matching @var{c} appeared in the first
527@var{size} bytes of @var{from}.
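
The following sketch uses @code{memccpy} to extract the first line of a
text.  The function name @code{first_line} is hypothetical, and the
result is deliberately not null-terminated, because @code{memccpy}
itself does not add a null byte.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Copy the first line of the string TEXT, including its newline,
   into BUF, which has room for BUFSIZE bytes.  Return the number
   of bytes in the line, or 0 if no complete line fits.  The bytes
   in BUF are not null-terminated.  (first_line is a hypothetical
   helper.)  */
size_t
first_line (char *buf, size_t bufsize, const char *text)
{
  /* Never read past the end of TEXT or write past the end of BUF.  */
  size_t limit = strlen (text);
  if (limit > bufsize)
    limit = bufsize;

  char *end = memccpy (buf, text, '\n', limit);
  if (end == NULL)
    return 0;
  return (size_t) (end - buf);
}
```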
528@end deftypefun
529
28f540f4 530@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 531@standards{ISO, string.h}
11087373 532@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
533This function copies the value of @var{c} (converted to an
534@code{unsigned char}) into each of the first @var{size} bytes of the
535object beginning at @var{block}. It returns the value of @var{block}.
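
A frequent use is clearing a structure, sketched below with a made-up
type @code{struct counters}.  Setting all bytes to zero yields the value
zero for integer members; @w{ISO C} does not promise the same for
pointer or floating-point members.

```c
#include <string.h>

/* Hypothetical structure, for illustration only.  */
struct counters
{
  unsigned long hits;
  unsigned long misses;
};

/* Set every byte of *C to zero.  */
void
reset_counters (struct counters *c)
{
  memset (c, 0, sizeof (*c));
}
```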
536@end deftypefun
537
8a2f1f5b 538@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
d08a7e4c 539@standards{ISO, wchar.h}
11087373 540@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
541This function copies the value of @var{wc} into each of the first
542@var{size} wide characters of the object beginning at @var{block}. It
543returns the value of @var{block}.
544@end deftypefun
545
8a2f1f5b 546@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 547@standards{ISO, string.h}
11087373 548@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc
PE
549This copies bytes from the string @var{from} (up to and including
550the terminating null byte) into the string @var{to}. Like
28f540f4
RM
551@code{memcpy}, this function has undefined results if the strings
552overlap. The return value is the value of @var{to}.
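
Because @code{strcpy} does not know the allocated size of @var{to}, the
caller has to check that the string fits.  A minimal sketch, using a
made-up structure @code{struct user} and a hypothetical function
@code{set_name}:

```c
#include <string.h>

/* Hypothetical structure, for illustration only.  */
struct user
{
  char name[16];
};

/* Store NAME in U.  Return 0 on success, or -1 (leaving U
   unchanged) if NAME plus its null byte does not fit.  */
int
set_name (struct user *u, const char *name)
{
  if (strlen (name) >= sizeof (u->name))
    return -1;
  strcpy (u->name, name);
  return 0;
}
```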
553@end deftypefun
554
8a2f1f5b 555@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 556@standards{ISO, wchar.h}
11087373 557@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2cc4b9cc 558This copies wide characters from the wide string @var{wfrom} (up to and
8a2f1f5b
UD
559including the terminating null wide character) into the string
560@var{wto}. Like @code{wmemcpy}, this function has undefined results if
561the strings overlap. The return value is the value of @var{wto}.
562@end deftypefun
563
28f540f4 564@deftypefun {char *} strdup (const char *@var{s})
a448ee41 565@standards{SVID, string.h}
11087373 566@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 567This function copies the string @var{s} into a newly
28f540f4
RM
568allocated string. The string is allocated using @code{malloc}; see
569@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
570for the new string, @code{strdup} returns a null pointer. Otherwise it
571returns a pointer to the new string.
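
The caller owns the returned string and should eventually release it
with @code{free}.  The following sketch, with the hypothetical name
@code{upper_copy}, makes a private copy so that the original string is
left unmodified.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return an upper-case copy of S in newly allocated storage, or a
   null pointer if allocation fails.  The caller must free the
   result.  (upper_copy is a hypothetical helper.)  */
char *
upper_copy (const char *s)
{
  char *copy = strdup (s);

  if (copy == NULL)
    return NULL;
  for (char *p = copy; *p != '\0'; p++)
    *p = (char) toupper ((unsigned char) *p);
  return copy;
}
```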
572@end deftypefun
573
8a2f1f5b 574@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
d08a7e4c 575@standards{GNU, wchar.h}
11087373 576@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 577This function copies the wide string @var{ws}
8a2f1f5b
UD
578into a newly allocated string. The string is allocated using
579@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
580cannot allocate space for the new string, @code{wcsdup} returns a null
2cc4b9cc 581pointer. Otherwise it returns a pointer to the new wide string.
8a2f1f5b
UD
582
583This function is a GNU extension.
584@end deftypefun
585
8a2f1f5b 586@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 587@standards{Unknown origin, string.h}
11087373 588@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
589This function is like @code{strcpy}, except that it returns a pointer to
590the end of the string @var{to} (that is, the address of the terminating
2cc4b9cc 591null byte @code{to + strlen (from)}) rather than the beginning.
28f540f4
RM
592
593For example, this program uses @code{stpcpy} to concatenate @samp{foo}
594and @samp{bar} to produce @samp{foobar}, which it then prints.
595
596@smallexample
597@include stpcpy.c.texi
598@end smallexample
599
c30c3f46
RM
600This function is part of POSIX.1-2008 and later editions, but was
601available in @theglibc{} and other systems as an extension long before
602it was standardized.
28f540f4 603
8a2f1f5b
UD
604Its behavior is undefined if the strings overlap. The function is
605declared in @file{string.h}.
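
Because the return value points at the terminating null byte, several
pieces can be appended one after another without rescanning the
destination.  A minimal sketch, with the hypothetical function name
@code{make_path}:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Return "DIR/FILE" in newly allocated storage, or a null pointer
   if allocation fails.  The caller must free the result.
   (make_path is a hypothetical helper.)  */
char *
make_path (const char *dir, const char *file)
{
  /* Both strings, the slash, and the terminating null byte.  */
  char *path = malloc (strlen (dir) + 1 + strlen (file) + 1);

  if (path == NULL)
    return NULL;
  /* END always points at the position where writing continues.  */
  char *end = stpcpy (path, dir);
  *end++ = '/';
  stpcpy (end, file);
  return path;
}
```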
606@end deftypefun
607
8a2f1f5b 608@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 609@standards{GNU, wchar.h}
11087373 610@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
611This function is like @code{wcscpy}, except that it returns a pointer to
612the end of the string @var{wto} (that is, the address of the terminating
2cc4b9cc 613null wide character @code{wto + wcslen (wfrom)}) rather than the beginning.
8a2f1f5b
UD
614
615This function is not part of ISO or POSIX but was found useful while
1f77f049 616developing @theglibc{} itself.
8a2f1f5b
UD
617
618The behavior of @code{wcpcpy} is undefined if the strings overlap.
619
620@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}.
28f540f4
RM
621@end deftypefun
622
26b4d766 623@deftypefn {Macro} {char *} strdupa (const char *@var{s})
d08a7e4c 624@standards{GNU, string.h}
11087373 625@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
976780fd 626This macro is similar to @code{strdup} but allocates the new string
dd7d45e8
UD
627using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
628Automatic}). This means of course the returned string has the same
629limitations as any block of memory allocated using @code{alloca}.
706074a5 630
dd7d45e8 631For obvious reasons @code{strdupa} is implemented only as a macro;
40a55d20 632you cannot get the address of this function. Despite this limitation
706074a5
UD
633it is a useful function. The following code shows a situation where
634using @code{malloc} would be a lot more expensive.
635
636@smallexample
637@include strdupa.c.texi
638@end smallexample
639
Please note that calling @code{strtok} using @var{path} directly is
invalid.  It is also not allowed to call @code{strdupa} in the argument
list of @code{strtok}, since @code{strdupa} uses @code{alloca}
(@pxref{Variable Size Automatic}), which can interfere with the parameter
passing.
706074a5
UD
645
646This function is only available if GNU CC is used.
26b4d766 647@end deftypefn
706074a5 648
0a13c9e9 649@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
d08a7e4c 650@standards{BSD, string.h}
11087373 651@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0a13c9e9
PE
652This is a partially obsolete alternative for @code{memmove}, derived from
653BSD. Note that it is not quite equivalent to @code{memmove}, because the
654arguments are not in the same order and there is no return value.
655@end deftypefun
706074a5 656
0a13c9e9 657@deftypefun void bzero (void *@var{block}, size_t @var{size})
d08a7e4c 658@standards{BSD, string.h}
0a13c9e9
PE
659@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
660This is a partially obsolete alternative for @code{memset}, derived from
661BSD. Note that it is not as general as @code{memset}, because the only
662value it can store is zero.
663@end deftypefun
706074a5 664
0a13c9e9
PE
665@node Concatenating Strings
666@section Concatenating Strings
667@pindex string.h
668@pindex wchar.h
669@cindex concatenating strings
670@cindex string concatenation functions
671
672The functions described in this section concatenate the contents of a
673string or wide string to another. They follow the string-copying
674functions in their conventions. @xref{Copying Strings and Arrays}.
675@samp{strcat} is declared in the header file @file{string.h} while
676@samp{wcscat} is declared in @file{wchar.h}.
706074a5 677
1fb22592
PE
678As noted below, these functions are problematic as their callers may
679have performance issues.
680
8a2f1f5b 681@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
d08a7e4c 682@standards{ISO, string.h}
11087373 683@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 684The @code{strcat} function is similar to @code{strcpy}, except that the
2cc4b9cc
PE
685bytes from @var{from} are concatenated or appended to the end of
686@var{to}, instead of overwriting it. That is, the first byte from
687@var{from} overwrites the null byte marking the end of @var{to}.
28f540f4
RM
688
689An equivalent definition for @code{strcat} would be:
690
691@smallexample
692char *
8a2f1f5b 693strcat (char *restrict to, const char *restrict from)
28f540f4
RM
694@{
695 strcpy (to + strlen (to), from);
696 return to;
697@}
698@end smallexample
699
700This function has undefined results if the strings overlap.
0a13c9e9
PE
701
702As noted below, this function has significant performance issues.
28f540f4
RM
703@end deftypefun
704
8a2f1f5b 705@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
d08a7e4c 706@standards{ISO, wchar.h}
11087373 707@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 708The @code{wcscat} function is similar to @code{wcscpy}, except that the
2cc4b9cc
PE
709wide characters from @var{wfrom} are concatenated or appended to the end of
710@var{wto}, instead of overwriting it. That is, the first wide character from
711@var{wfrom} overwrites the null wide character marking the end of @var{wto}.
8a2f1f5b
UD
712
713An equivalent definition for @code{wcscat} would be:
714
715@smallexample
716wchar_t *
717wcscat (wchar_t *wto, const wchar_t *wfrom)
718@{
719 wcscpy (wto + wcslen (wto), wfrom);
720 return wto;
721@}
722@end smallexample
723
724This function has undefined results if the strings overlap.
0a13c9e9
PE
725
726As noted below, this function has significant performance issues.
8a2f1f5b
UD
727@end deftypefun
728
d2fda60e
PE
Programmers using the @code{strcat} or @code{wcscat} functions (or the
@code{strlcat}, @code{strncat} and @code{wcsncat} functions defined in
a later section, for that matter)
can easily be recognized as lazy and reckless.  In almost all situations
the lengths of the participating strings are known (they had better be,
since how else can one ensure that the allocated size of the buffer is
sufficient?).  Or at least, one could know them if one kept track of the
results of the various function calls.  But then it is very inefficient
to use @code{strcat}/@code{wcscat}.  A lot of time is wasted finding the
end of the destination string so that the actual copying can start.
This is a common example:
ee2752ea 740
ee2752ea
UD
741@cindex va_copy
742@smallexample
49c091e5 743/* @r{This function concatenates arbitrarily many strings. The last}
ee2752ea
UD
744 @r{parameter must be @code{NULL}.} */
745char *
8a2f1f5b 746concat (const char *str, @dots{})
ee2752ea
UD
747@{
748 va_list ap, ap2;
749 size_t total = 1;
ee2752ea
UD
750
751 va_start (ap, str);
b5982523 752 va_copy (ap2, ap);
ee2752ea
UD
753
754 /* @r{Determine how much space we need.} */
bdc674d9 755 for (const char *s = str; s != NULL; s = va_arg (ap, const char *))
ee2752ea
UD
756 total += strlen (s);
757
758 va_end (ap);
759
bdc674d9 760 char *result = malloc (total);
ee2752ea
UD
761 if (result != NULL)
762 @{
763 result[0] = '\0';
764
765 /* @r{Copy the strings.} */
      for (const char *s = str; s != NULL; s = va_arg (ap2, const char *))
767 strcat (result, s);
768 @}
769
770 va_end (ap2);
771
772 return result;
773@}
774@end smallexample
775
This looks quite simple, especially the second loop where the strings
are actually copied.  But these innocent lines hide a major performance
penalty.  Just imagine that ten strings of 100 bytes each have to be
concatenated.  For the second string we search the already stored 100
bytes for the end of the string so that we can append the next string.
For the third string we search 200 bytes, and so on.  For all strings in
total the comparisons necessary to find the end of the intermediate
results add up to about 4500 (100 + 200 + @dots{} + 900)!  If we combine
the copying with the computation of the required size we can write this
function more efficiently:
ee2752ea
UD
785
786@smallexample
787char *
8a2f1f5b 788concat (const char *str, @dots{})
ee2752ea 789@{
ee2752ea 790 size_t allocated = 100;
bdc674d9 791 char *result = malloc (allocated);
ee2752ea 792
623281e0 793 if (result != NULL)
ee2752ea 794 @{
bdc674d9
PE
795 va_list ap;
796 size_t resultlen = 0;
ee2752ea
UD
797 char *newp;
798
623281e0 799 va_start (ap, str);
ee2752ea 800
bdc674d9 801 for (const char *s = str; s != NULL; s = va_arg (ap, const char *))
ee2752ea
UD
802 @{
803 size_t len = strlen (s);
804
805 /* @r{Resize the allocated memory if necessary.} */
bdc674d9 806 if (resultlen + len + 1 > allocated)
ee2752ea 807 @{
bdc674d9
PE
808 allocated += len;
809 newp = reallocarray (result, allocated, 2);
810 allocated *= 2;
ee2752ea
UD
              if (newp == NULL)
                @{
                  va_end (ap);
                  free (result);
                  return NULL;
                @}
ee2752ea
UD
816 result = newp;
817 @}
818
bdc674d9
PE
819 memcpy (result + resultlen, s, len);
820 resultlen += len;
ee2752ea
UD
821 @}
822
823 /* @r{Terminate the result string.} */
bdc674d9 824 result[resultlen++] = '\0';
ee2752ea
UD
825
826 /* @r{Resize memory to the optimal size.} */
bdc674d9 827 newp = realloc (result, resultlen);
ee2752ea
UD
828 if (newp != NULL)
829 result = newp;
830
831 va_end (ap);
832 @}
833
834 return result;
835@}
836@end smallexample
837
With a bit more knowledge about the input strings one could fine-tune
the memory allocation.  The difference we are pointing to here is that
we don't use @code{strcat} anymore.  We always keep track of the length
of the current intermediate result so we can save ourselves the search
for the end of the string and use @code{memcpy}.  Please note that we
also don't use @code{stpcpy}, which might seem more natural since we are
handling strings.  But this is not necessary since we already know the
length of the string and therefore can use the faster memory copying
function.  The example would work for wide characters the same way.
ee2752ea
UD
847
848Whenever a programmer feels the need to use @code{strcat} she or he
f0f308c1 849should think twice and look through the program to see whether the code cannot
1fb22592 850be rewritten to take advantage of already calculated results.
d2fda60e
PE
851The related functions @code{strlcat}, @code{strncat},
852@code{wcscat} and @code{wcsncat}
1fb22592
PE
853are almost always unnecessary, too.
854Again: it is almost always unnecessary to use functions like @code{strcat}.
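
As a smaller illustration of the same idea, the following sketch joins
exactly two strings.  The function name @code{join2} is hypothetical and
not part of @theglibc{}; the point is only that each length is computed
once and then reused, so nothing has to search for the end of the
destination.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding A
   followed by B, or NULL if the allocation fails.  */
char *
join2 (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  char *result = malloc (alen + blen + 1);

  if (result != NULL)
    {
      /* The end of the first part is already known: result + alen.  */
      memcpy (result, a, alen);
      /* Copy B together with its terminating null byte.  */
      memcpy (result + alen, b, blen + 1);
    }

  return result;
}
```

The caller owns the returned string and should release it with
@code{free}.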
ee2752ea 855
0a13c9e9
PE
856@node Truncating Strings
857@section Truncating Strings while Copying
858@cindex truncating strings
859@cindex string truncation
860
861The functions described in this section copy or concatenate the
862possibly-truncated contents of a string or array to another, and
863similarly for wide strings. They follow the string-copying functions
864in their header conventions. @xref{Copying Strings and Arrays}. The
865@samp{str} functions are declared in the header file @file{string.h}
866and the @samp{wc} functions are declared in the file @file{wchar.h}.
867
1fb22592
PE
868As noted below, these functions are problematic as their callers may
869have truncation-related bugs and performance issues.
870
0a13c9e9 871@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
a448ee41 872@standards{C90, string.h}
0a13c9e9
PE
873@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
874This function is similar to @code{strcpy} but always copies exactly
875@var{size} bytes into @var{to}.
876
877If @var{from} does not contain a null byte in its first @var{size}
878bytes, @code{strncpy} copies just the first @var{size} bytes. In this
879case no null terminator is written into @var{to}.
880
881Otherwise @var{from} must be a string with length less than
882@var{size}. In this case @code{strncpy} copies all of @var{from},
883followed by enough null bytes to add up to @var{size} bytes in all.
884
885The behavior of @code{strncpy} is undefined if the strings overlap.
886
887This function was designed for now-rarely-used arrays consisting of
888non-null bytes followed by zero or more null bytes. It needs to set
889all @var{size} bytes of the destination, even when @var{size} is much
890greater than the length of @var{from}. As noted below, this function
1fb22592 891is generally a poor choice for processing strings.
0a13c9e9
PE
892@end deftypefun
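
The following sketch shows the kind of fixed-width, null-padded field
that @code{strncpy} was designed for.  The helper names
@code{set_field} and @code{field_length} are made up for this
illustration.  Note that a full field carries no null terminator, so
its contents must never be passed to functions that expect a string.

```c
#include <string.h>

/* Hypothetical helper: store NAME in the WIDTH-byte array FIELD.
   Short names are padded with null bytes; long names are cut off
   and the field then has no null terminator.  */
void
set_field (char *field, size_t width, const char *name)
{
  strncpy (field, name, width);
}

/* Hypothetical helper: return the number of non-null bytes at the
   start of such a field, without reading past WIDTH bytes.  */
size_t
field_length (const char *field, size_t width)
{
  const char *end = memchr (field, '\0', width);
  return end != NULL ? (size_t) (end - field) : width;
}
```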
893
0a13c9e9 894@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 895@standards{ISO, wchar.h}
0a13c9e9
PE
896@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
897This function is similar to @code{wcscpy} but always copies exactly
898@var{size} wide characters into @var{wto}.
899
900If @var{wfrom} does not contain a null wide character in its first
901@var{size} wide characters, then @code{wcsncpy} copies just the first
902@var{size} wide characters. In this case no null terminator is
903written into @var{wto}.
904
905Otherwise @var{wfrom} must be a wide string with length less than
906@var{size}. In this case @code{wcsncpy} copies all of @var{wfrom},
907followed by enough null wide characters to add up to @var{size} wide
908characters in all.
909
910The behavior of @code{wcsncpy} is undefined if the strings overlap.
911
912This function is the wide-character counterpart of @code{strncpy} and
913suffers from most of the problems that @code{strncpy} does. For
914example, as noted below, this function is generally a poor choice for
1fb22592 915processing strings.
0a13c9e9
PE
916@end deftypefun
917
0a13c9e9 918@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
d08a7e4c 919@standards{GNU, string.h}
0a13c9e9
PE
920@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
921This function is similar to @code{strdup} but always copies at most
922@var{size} bytes into the newly allocated string.
923
924If the length of @var{s} is more than @var{size}, then @code{strndup}
925copies just the first @var{size} bytes and adds a closing null byte.
926Otherwise all bytes are copied and the string is terminated.
927
928This function differs from @code{strncpy} in that it always terminates
929the destination string.
930
931As noted below, this function is generally a poor choice for
1fb22592 932processing strings.
0a13c9e9
PE
933
934@code{strndup} is a GNU extension.
935@end deftypefun
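
One case where @code{strndup} does not lose data is when the size limit
comes from the program's own logic rather than from a buffer size.  The
sketch below, whose function name @code{first_word} is hypothetical,
duplicates the part of a string that precedes the first space.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the bytes of
   S that precede the first space, or of all of S if it contains no
   space.  Returns NULL if the allocation fails.  */
char *
first_word (const char *s)
{
  return strndup (s, strcspn (s, " "));
}
```

The result is always null-terminated and should be released with
@code{free}.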
936
0a13c9e9 937@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
d08a7e4c 938@standards{GNU, string.h}
0a13c9e9
PE
939@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This macro is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca} (@pxref{Variable Size
Automatic}).  The same advantages and limitations of @code{strdupa} are
valid for @code{strndupa}, too.

This function is implemented only as a macro, just like @code{strdupa}.
Just like @code{strdupa}, this macro must not be used inside the
argument list of a function call.
948
949As noted below, this function is generally a poor choice for
1fb22592 950processing strings.
0a13c9e9
PE
951
952@code{strndupa} is only available if GNU CC is used.
953@end deftypefn
954
0a13c9e9 955@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 956@standards{GNU, string.h}
0a13c9e9
PE
957@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{stpcpy} but always copies exactly
@var{size} bytes into @var{to}.

If the length of @var{from} is greater than or equal to @var{size}, then
@code{stpncpy} copies just the first @var{size} bytes and returns a
pointer to the byte directly following the one which was copied last.
Note that in this case there is no null terminator written into
@var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null bytes to add up
to @var{size} bytes in all.  This padding is rarely useful in itself; it
is provided for contexts that rely on the same behavior of
@code{strncpy}.  @code{stpncpy} returns a pointer to the
@emph{first} written null byte.
972
973This function is not part of ISO or POSIX but was found useful while
974developing @theglibc{} itself.
975
976Its behavior is undefined if the strings overlap. The function is
977declared in @file{string.h}.
978
979As noted below, this function is generally a poor choice for
1fb22592 980processing strings.
0a13c9e9
PE
981@end deftypefun
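
The return value of @code{stpncpy} makes it possible to tell, without a
second scan, how many non-null bytes were stored and whether a null
terminator was written.  The following is a minimal sketch; the
function name @code{copy_field} and its @var{truncated} output
parameter are assumptions made for this illustration.

```c
#define _GNU_SOURCE
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte array TO and
   return the number of non-null bytes stored.  *TRUNCATED is set to
   1 if no null terminator was written, and to 0 otherwise.  */
size_t
copy_field (char *to, size_t size, const char *from, int *truncated)
{
  char *end = stpncpy (to, from, size);
  size_t len = (size_t) (end - to);

  /* stpncpy returns TO + SIZE exactly when FROM filled the array.  */
  *truncated = (len == size);
  return len;
}
```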
982
0a13c9e9 983@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 984@standards{GNU, wchar.h}
0a13c9e9
PE
985@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.

If the length of @var{wfrom} is greater than or equal to @var{size},
then @code{wcpncpy} copies just the first @var{size} wide characters and
returns a pointer to the wide character directly following the one
which was copied last.  Note that in this case there is no null
terminator written into @var{wto}.

If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
copies all of @var{wfrom}, followed by enough null wide characters to add up
to @var{size} wide characters in all.  This padding is rarely useful in
itself; it is provided for contexts that rely on the same behavior of
@code{wcsncpy}.  @code{wcpncpy} returns a pointer to the
@emph{first} written null wide character.
1001
1002This function is not part of ISO or POSIX but was found useful while
1003developing @theglibc{} itself.
1004
1005Its behavior is undefined if the strings overlap.
1006
1007As noted below, this function is generally a poor choice for
1fb22592 1008processing strings.
0a13c9e9
PE
1009
1010@code{wcpncpy} is a GNU extension.
1011@end deftypefun
1012
8a2f1f5b 1013@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 1014@standards{ISO, string.h}
11087373 1015@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1016This function is like @code{strcat} except that not more than @var{size}
2cc4b9cc
PE
1017bytes from @var{from} are appended to the end of @var{to}, and
1018@var{from} need not be null-terminated. A single null byte is also
1019always appended to @var{to}, so the total
28f540f4
RM
1020allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
1021longer than its initial length.
1022
1023The @code{strncat} function could be implemented like this:
1024
1025@smallexample
1026@group
1027char *
1028strncat (char *to, const char *from, size_t size)
1029@{
5d1d4918
PE
1030 size_t len = strlen (to);
1031 memcpy (to + len, from, strnlen (from, size));
1032 to[len + strnlen (from, size)] = '\0';
28f540f4
RM
1033 return to;
1034@}
1035@end group
1036@end smallexample
1037
1038The behavior of @code{strncat} is undefined if the strings overlap.
0a13c9e9
PE
1039
1040As a companion to @code{strncpy}, @code{strncat} was designed for
1041now-rarely-used arrays consisting of non-null bytes followed by zero
1042or more null bytes. As noted below, this function is generally a poor
1fb22592 1043choice for processing strings. Also, this function has significant
0a13c9e9 1044performance issues. @xref{Concatenating Strings}.
28f540f4
RM
1045@end deftypefun
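
As a sketch of the use that @code{strncat} was designed for, the
following hypothetical function @code{append_field} appends a
fixed-width, null-padded field (which need not be null-terminated) to a
string.  It checks beforehand that the result fits, instead of
truncating silently.

```c
#include <string.h>

/* Hypothetical helper: append the contents of the WIDTH-byte array
   FIELD to the string in BUF, an array of BUFSIZE bytes.  Return 0
   on success, or -1 without modifying BUF if the result would not
   fit.  BUF must already hold a null-terminated string.  */
int
append_field (char *buf, size_t bufsize, const char *field, size_t width)
{
  size_t len = strlen (buf);
  const char *end = memchr (field, '\0', width);
  size_t flen = end != NULL ? (size_t) (end - field) : width;

  /* Room is needed for FLEN bytes plus the terminating null byte.  */
  if (flen >= bufsize - len)
    return -1;

  strncat (buf, field, width);
  return 0;
}
```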
1046
8a2f1f5b 1047@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
d08a7e4c 1048@standards{ISO, wchar.h}
11087373 1049@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto},
and @var{wfrom} need not be null-terminated.  A single null wide
character is also always appended to @var{wto}, so the total allocated
size of @var{wto} must be at least @code{wcsnlen (@var{wfrom},
@var{size}) + 1} wide characters longer than its initial length.
8a2f1f5b
UD
1056
1057The @code{wcsncat} function could be implemented like this:
1058
1059@smallexample
1060@group
1061wchar_t *
1062wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
1063 size_t size)
1064@{
5d1d4918
PE
1065 size_t len = wcslen (wto);
1066 memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t));
1067 wto[len + wcsnlen (wfrom, size)] = L'\0';
8a2f1f5b
UD
1068 return wto;
1069@}
1070@end group
1071@end smallexample
1072
1073The behavior of @code{wcsncat} is undefined if the strings overlap.
28f540f4 1074
0a13c9e9 1075As noted below, this function is generally a poor choice for
1fb22592 1076processing strings. Also, this function has significant performance
0a13c9e9
PE
1077issues. @xref{Concatenating Strings}.
1078@end deftypefun
1079
d2fda60e
PE
1080@deftypefun size_t strlcpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
1081@standards{BSD, string.h}
1082@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1083This function copies the string @var{from} to the destination array
1084@var{to}, limiting the result's size (including the null terminator)
1085to @var{size}. The caller should ensure that @var{size} includes room
1086for the result's terminating null byte.
1087
1088If @var{size} is greater than the length of the string @var{from},
1089this function copies the non-null bytes of the string
1090@var{from} to the destination array @var{to},
1091and terminates the copy with a null byte. Like other
1092string functions such as @code{strcpy}, but unlike @code{strncpy}, any
1093remaining bytes in the destination array remain unchanged.
1094
If @var{size} is nonzero and less than or equal to the length of the string
1096@var{from}, this function copies only the first @samp{@var{size} - 1}
1097bytes to the destination array @var{to}, and writes a terminating null
1098byte to the last byte of the array.
1099
1100This function returns the length of the string @var{from}. This means
1101that truncation occurs if and only if the returned value is greater
1102than or equal to @var{size}.
1103
1104The behavior is undefined if @var{to} or @var{from} is a null pointer,
1105or if the destination array's size is less than @var{size}, or if the
1106string @var{from} overlaps the first @var{size} bytes of the
1107destination array.
1108
1109As noted below, this function is generally a poor choice for
1110processing strings. Also, this function has a performance issue,
1111as its time cost is proportional to the length of @var{from}
1112even when @var{size} is small.
1113
1114This function is derived from OpenBSD 2.4.
1115@end deftypefun
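
The semantics described above can be sketched in portable C as follows.
This is only an illustration of the documented behavior under the name
@code{sketch_strlcpy}, which is hypothetical; it is not the
implementation used by @theglibc{}.

```c
#include <string.h>

/* Hypothetical stand-in that follows the description of strlcpy:
   copy at most SIZE - 1 bytes of FROM, always terminate when SIZE is
   nonzero, and return the length of FROM.  */
size_t
sketch_strlcpy (char *to, const char *from, size_t size)
{
  size_t len = strlen (from);

  if (len < size)
    /* Everything fits, including the null terminator.  */
    memcpy (to, from, len + 1);
  else if (size > 0)
    {
      memcpy (to, from, size - 1);
      to[size - 1] = '\0';
    }

  return len;
}
```

A caller detects truncation by comparing the return value with
@var{size}: truncation happened if and only if the result is greater
than or equal to @var{size}.  The call to @code{strlen} also shows why
the cost depends on the length of @var{from}.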
1116
1117@deftypefun size_t wcslcpy (wchar_t *restrict @var{to}, const wchar_t *restrict @var{from}, size_t @var{size})
@standards{BSD, wchar.h}
1119@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1120This function is a variant of @code{strlcpy} for wide strings.
1121The @var{size} argument counts the length of the destination buffer in
1122wide characters (and not bytes).
1123
1124This function is derived from BSD.
1125@end deftypefun
1126
1127@deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
1128@standards{BSD, string.h}
1129@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1130This function appends the string @var{from} to the
1131string @var{to}, limiting the result's total size (including the null
1132terminator) to @var{size}. The caller should ensure that @var{size}
1133includes room for the result's terminating null byte.
1134
1135This function copies as much as possible of the string @var{from} into
1136the array at @var{to} of @var{size} bytes, starting at the terminating
1137null byte of the original string @var{to}. In effect, this appends
1138the string @var{from} to the string @var{to}. Although the resulting
1139string will contain a null terminator, it can be truncated (not all
1140bytes in @var{from} may be copied).
1141
1142This function returns the sum of the original length of @var{to} and
1143the length of @var{from}. This means that truncation occurs if and
1144only if the returned value is greater than or equal to @var{size}.
1145
1146The behavior is undefined if @var{to} or @var{from} is a null pointer,
1147or if the destination array's size is less than @var{size}, or if the
1148destination array does not contain a null byte in its first @var{size}
1149bytes, or if the string @var{from} overlaps the first @var{size} bytes
1150of the destination array.
1151
1152As noted below, this function is generally a poor choice for
1153processing strings. Also, this function has significant performance
1154issues. @xref{Concatenating Strings}.
1155
1156This function is derived from OpenBSD 2.4.
1157@end deftypefun
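
Similarly, the documented behavior of @code{strlcat} can be sketched as
follows.  The name @code{sketch_strlcat} is hypothetical, and the code
assumes the precondition stated above, namely that the destination
holds a null byte within its first @var{size} bytes.

```c
#include <string.h>

/* Hypothetical stand-in that follows the description of strlcat.
   TO must contain a null byte in its first SIZE bytes.  */
size_t
sketch_strlcat (char *to, const char *from, size_t size)
{
  size_t tolen = strlen (to);
  size_t fromlen = strlen (from);
  /* Bytes available for FROM, not counting the null terminator.  */
  size_t room = size - tolen - 1;

  if (fromlen <= room)
    memcpy (to + tolen, from, fromlen + 1);
  else
    {
      memcpy (to + tolen, from, room);
      to[size - 1] = '\0';
    }

  return tolen + fromlen;
}
```

Both calls to @code{strlen} scan their whole argument, which
illustrates the performance issue mentioned above.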
1158
1159@deftypefun size_t wcslcat (wchar_t *restrict @var{to}, const wchar_t *restrict @var{from}, size_t @var{size})
@standards{BSD, wchar.h}
1161@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1162This function is a variant of @code{strlcat} for wide strings.
1163The @var{size} argument counts the length of the destination buffer in
1164wide characters (and not bytes).
1165
1166This function is derived from BSD.
1167@end deftypefun
1168
0a13c9e9 1169Because these functions can abruptly truncate strings or wide strings,
1fb22592 1170they are generally poor choices for processing them. When copying or
0a13c9e9
PE
concatenating multibyte strings, they can truncate within a multibyte
1172character so that the result is not a valid multibyte string. When
1173combining or concatenating multibyte or wide strings, they may
1174truncate the output after a combining character, resulting in a
1175corrupted grapheme. They can cause bugs even when processing
1176single-byte strings: for example, when calculating an ASCII-only user
1177name, a truncated name can identify the wrong user.
1178
1179Although some buffer overruns can be prevented by manually replacing
1180calls to copying functions with calls to truncation functions, there
54ae6d81
PE
1181are often easier and safer automatic techniques, such as fortification
1182(@pxref{Source Fortification}) and AddressSanitizer
1183(@pxref{Instrumentation Options,, Program Instrumentation Options, gcc, Using GCC}).
1184Because truncation functions can mask
0a13c9e9
PE
1185application bugs that would otherwise be caught by the automatic
1186techniques, these functions should be used only when the application's
1187underlying logic requires truncation.
1188
1189@strong{Note:} GNU programs should not truncate strings or wide
1190strings to fit arbitrary size limits. @xref{Semantics, , Writing
1191Robust Programs, standards, The GNU Coding Standards}. Instead of
1192string-truncation functions, it is usually better to use dynamic
1193memory allocation (@pxref{Unconstrained Allocation}) and functions
1194such as @code{strdup} or @code{asprintf} to construct strings.
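
For instance, a path name can be built with @code{asprintf} so that the
result is exactly as long as it needs to be and nothing is cut off.  The
following is a minimal sketch; the function name @code{make_path} is
hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated string "DIR/NAME",
   or NULL if the string could not be constructed.  */
char *
make_path (const char *dir, const char *name)
{
  char *path;

  /* On failure asprintf returns a negative value and the contents
     of PATH are not meaningful.  */
  if (asprintf (&path, "%s/%s", dir, name) < 0)
    return NULL;

  return path;
}
```

The caller releases the result with @code{free}.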
28f540f4 1195
b4012b75 1196@node String/Array Comparison
28f540f4
RM
1197@section String/Array Comparison
1198@cindex comparing strings and arrays
1199@cindex string comparison functions
1200@cindex array comparison functions
1201@cindex predicates on strings
1202@cindex predicates on arrays
1203
1204You can use the functions in this section to perform comparisons on the
1205contents of strings and arrays. As well as checking for equality, these
1206functions can also be used as the ordering functions for sorting
1207operations. @xref{Searching and Sorting}, for an example of this.
1208
1209Unlike most comparison operations in C, the string comparison functions
1210return a nonzero value if the strings are @emph{not} equivalent rather
1211than if they are. The sign of the value indicates the relative ordering
2cc4b9cc 1212of the first part of the strings that are not equivalent: a
28f540f4 1213negative value indicates that the first string is ``less'' than the
a5113b14 1214second, while a positive value indicates that the first string is
28f540f4
RM
1215``greater''.
1216
1217The most common use of these functions is to check only for equality.
1218This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
1219
1220All of these functions are declared in the header file @file{string.h}.
1221@pindex string.h
1222
28f540f4 1223@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
d08a7e4c 1224@standards{ISO, string.h}
11087373 1225@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1226The function @code{memcmp} compares the @var{size} bytes of memory
1227beginning at @var{a1} against the @var{size} bytes of memory beginning
1228at @var{a2}. The value returned has the same sign as the difference
1229between the first differing pair of bytes (interpreted as @code{unsigned
1230char} objects, then promoted to @code{int}).
1231
1232If the contents of the two blocks are equal, @code{memcmp} returns
1233@code{0}.
1234@end deftypefun
1235
8a2f1f5b 1236@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
d08a7e4c 1237@standards{ISO, wchar.h}
11087373 1238@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
1239The function @code{wmemcmp} compares the @var{size} wide characters
1240beginning at @var{a1} against the @var{size} wide characters beginning
1241at @var{a2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{a1} is
2cc4b9cc 1243smaller or larger than the corresponding wide character in @var{a2}.
8a2f1f5b
UD
1244
1245If the contents of the two blocks are equal, @code{wmemcmp} returns
1246@code{0}.
1247@end deftypefun
1248
28f540f4
RM
1249On arbitrary arrays, the @code{memcmp} function is mostly useful for
1250testing equality. It usually isn't meaningful to do byte-wise ordering
1251comparisons on arrays of things other than bytes. For example, a
1252byte-wise comparison on the bytes that make up floating-point numbers
1253isn't likely to tell you anything about the relationship between the
1254values of the floating-point numbers.
1255
8a2f1f5b
UD
1256@code{wmemcmp} is really only useful to compare arrays of type
1257@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
1258at a time and this number of bytes is system dependent.
1259
28f540f4
RM
1260You should also be careful about using @code{memcmp} to compare objects
1261that can contain ``holes'', such as the padding inserted into structure
1262objects to enforce alignment requirements, extra space at the end of
2cc4b9cc 1263unions, and extra bytes at the ends of strings whose length is less
28f540f4
RM
1264than their allocated size. The contents of these ``holes'' are
1265indeterminate and may cause strange behavior when performing byte-wise
1266comparisons. For more predictable results, perform an explicit
1267component-wise comparison.
1268
1269For example, given a structure type definition like:
1270
1271@smallexample
1272struct foo
1273 @{
1274 unsigned char tag;
1275 union
1276 @{
1277 double f;
1278 long i;
1279 char *p;
1280 @} value;
1281 @};
1282@end smallexample
1283
1284@noindent
1285you are better off writing a specialized comparison function to compare
1286@code{struct foo} objects instead of comparing them with @code{memcmp}.
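
Such a comparison function might look like the following sketch.  The
structure above does not say which values of @code{tag} select which
union member, so the @code{TAG_} constants here are assumptions made
purely for illustration, as is the function name @code{foo_equal}.

```c
#include <stdbool.h>

/* Hypothetical tag values; the original structure does not define
   them.  */
enum { TAG_DOUBLE, TAG_LONG, TAG_POINTER };

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Compare only the tag and the union member that the tag selects,
   so padding bytes and unused union bytes are never examined.  */
bool
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return false;

  switch (a->tag)
    {
    case TAG_DOUBLE:
      return a->value.f == b->value.f;
    case TAG_LONG:
      return a->value.i == b->value.i;
    case TAG_POINTER:
      return a->value.p == b->value.p;
    default:
      return false;
    }
}
```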
1287
28f540f4 1288@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1289@standards{ISO, string.h}
11087373 1290@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4
RM
1291The @code{strcmp} function compares the string @var{s1} against
1292@var{s2}, returning a value that has the same sign as the difference
2cc4b9cc 1293between the first differing pair of bytes (interpreted as
28f540f4
RM
1294@code{unsigned char} objects, then promoted to @code{int}).
1295
1296If the two strings are equal, @code{strcmp} returns @code{0}.
1297
1298A consequence of the ordering used by @code{strcmp} is that if @var{s1}
1299is an initial substring of @var{s2}, then @var{s1} is considered to be
1300``less than'' @var{s2}.
8a2f1f5b
UD
1301
1302@code{strcmp} does not take sorting conventions of the language the
1303strings are written in into account. To get that one has to use
1304@code{strcoll}.
1305@end deftypefun
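
Because only the sign of the result matters, @code{strcmp} fits
directly into a comparison function for @code{qsort}.  The following
sketch sorts an array of string pointers; the names
@code{compare_strings} and @code{sort_names} are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, which are themselves
   pointers to strings.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Hypothetical helper: sort COUNT strings into strcmp order.  */
void
sort_names (const char **names, size_t count)
{
  qsort (names, count, sizeof *names, compare_strings);
}
```

With this ordering an initial substring such as @code{"app"} sorts
before @code{"apple"}.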
1306
8a2f1f5b 1307@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1308@standards{ISO, wchar.h}
11087373 1309@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1310
2cc4b9cc 1311The @code{wcscmp} function compares the wide string @var{ws1}
8a2f1f5b
UD
1312against @var{ws2}. The value returned is smaller than or larger than zero
depending on whether the first differing wide character in @var{ws1} is
2cc4b9cc 1314smaller or larger than the corresponding wide character in @var{ws2}.
8a2f1f5b
UD
1315
1316If the two strings are equal, @code{wcscmp} returns @code{0}.
1317
1318A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
1319is an initial substring of @var{ws2}, then @var{ws1} is considered to be
1320``less than'' @var{ws2}.
1321
1322@code{wcscmp} does not take sorting conventions of the language the
1323strings are written in into account. To get that one has to use
1324@code{wcscoll}.
28f540f4
RM
1325@end deftypefun
1326
28f540f4 1327@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1328@standards{BSD, string.h}
11087373
AO
1329@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1330@c Although this calls tolower multiple times, it's a macro, and
1331@c strcasecmp is optimized so that the locale pointer is read only once.
1332@c There are some asm implementations too, for which the single-read
1333@c from locale TLS pointers also applies.
4547c1a4 1334This function is like @code{strcmp}, except that differences in case are
2cc4b9cc
PE
1335ignored, and its arguments must be multibyte strings.
1336How uppercase and lowercase characters are related is
1337determined by the currently selected locale. In the standard @code{"C"}
1338locale the characters @"A and @"a do not match but in a locale which
dd7d45e8 1339regards these characters as parts of the alphabet they do match.
28f540f4 1340
85c165be 1341@noindent
1342@code{strcasecmp} is derived from BSD.
1343@end deftypefun
1344
8ded91fb 1345@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1346@standards{GNU, wchar.h}
1347@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1348@c Since towlower is not a macro, the locale object may be read multiple
1349@c times.
1350This function is like @code{wcscmp}, except that differences in case are
1351ignored. How uppercase and lowercase characters are related is
1352determined by the currently selected locale. In the standard @code{"C"}
1353locale the characters @"A and @"a do not match but in a locale which
1354regards these characters as parts of the alphabet they do match.
1355
1356@noindent
1357@code{wcscasecmp} is a GNU extension.
1358@end deftypefun
1359
8a2f1f5b 1360@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
d08a7e4c 1361@standards{ISO, string.h}
11087373 1362@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function is similar to @code{strcmp}, except that no more than
@var{size} bytes are compared. In other words, if the two
strings are the same in their first @var{size} bytes, the
return value is zero.
1367@end deftypefun
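
One frequent application of @code{strncmp} is testing whether a string
starts with a given prefix. The sketch below assumes a hypothetical
helper named @code{has_prefix}; it limits the comparison to the length
of the prefix, so the rest of the string is never examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if STRING begins with PREFIX.  */
bool
has_prefix (const char *string, const char *prefix)
{
  assert (string != NULL && prefix != NULL);
  /* If STRING is shorter than PREFIX, the comparison stops at the
     null byte of STRING and reports a difference.  */
  return strncmp (string, prefix, strlen (prefix)) == 0;
}
```

An empty prefix matches every string, because comparing zero bytes
always yields zero.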
1368
8a2f1f5b 1369@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
d08a7e4c 1370@standards{ISO, wchar.h}
11087373 1371@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
f0f308c1 1372This function is similar to @code{wcscmp}, except that no more than
1373@var{size} wide characters are compared. In other words, if the two
1374strings are the same in their first @var{size} wide characters, the
1375return value is zero.
1376@end deftypefun
1377
28f540f4 1378@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
d08a7e4c 1379@standards{BSD, string.h}
11087373 1380@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
28f540f4 1381This function is like @code{strncmp}, except that differences in case
1382are ignored, and the compared parts of the arguments should consist of
1383valid multibyte characters.
1384Like @code{strcasecmp}, it is locale dependent how
dd7d45e8 1385uppercase and lowercase characters are related.
28f540f4 1386
@noindent
@code{strncasecmp} is derived from BSD.
1389@end deftypefun
1390
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
d08a7e4c 1392@standards{GNU, wchar.h}
11087373 1393@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1394This function is like @code{wcsncmp}, except that differences in case
1395are ignored. Like @code{wcscasecmp}, it is locale dependent how
1396uppercase and lowercase characters are related.
1397
1398@noindent
1399@code{wcsncasecmp} is a GNU extension.
1400@end deftypefun
1401
1402Here are some examples showing the use of @code{strcmp} and
1403@code{strncmp} (equivalent examples can be constructed for the wide
1404character functions). These examples assume the use of the ASCII
1405character set. (If some other character set---say, EBCDIC---is used
1406instead, then the glyphs are associated with different numeric codes,
1407and the return values and ordering may differ.)
1408
@smallexample
strcmp ("hello", "hello")
    @result{} 0    /* @r{These two strings are the same.} */
strcmp ("hello", "Hello")
    @result{} 32   /* @r{Comparisons are case-sensitive.} */
strcmp ("hello", "world")
    @result{} -15  /* @r{The byte @code{'h'} comes before @code{'w'}.} */
strcmp ("hello", "hello, world")
    @result{} -44  /* @r{Comparing a null byte against a comma.} */
strncmp ("hello", "hello, world", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
strncmp ("hello, world", "hello, stupid world!!!", 5)
    @result{} 0    /* @r{The initial 5 bytes are the same.} */
@end smallexample
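
ISO C specifies only the sign of the value returned by @code{strcmp}
and @code{strncmp}, so portable code normally tests the result against
zero rather than relying on a particular magnitude. The following
sketch reduces the result to -1, 0 or 1; the helper name
@code{compare_sign} is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce the result of strcmp to -1, 0 or 1.  */
int
compare_sign (const char *s1, const char *s2)
{
  assert (s1 != NULL && s2 != NULL);
  int result = strcmp (s1, s2);
  /* Each comparison yields 0 or 1, so the difference is the sign.  */
  return (result > 0) - (result < 0);
}
```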
1423
1f205a47 1424@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1425@standards{GNU, string.h}
1426@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1427@c Calls isdigit multiple times, locale may change in between.
1f205a47 1428The @code{strverscmp} function compares the string @var{s1} against
1429@var{s2}, considering them as holding indices/version numbers. The
1430return value follows the same conventions as found in the
1431@code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no
1432digits, @code{strverscmp} behaves like @code{strcmp}
1433(in the sense that the sign of the result is the same).
1f205a47 1434
1435The comparison algorithm which the @code{strverscmp} function implements
1436differs slightly from other version-comparison algorithms. The
1437implementation is based on a finite-state machine, whose behavior is
1438approximated below.
1439
1440@itemize @bullet
1441@item
1442The input strings are each split into sequences of non-digits and
1443digits. These sequences can be empty at the beginning and end of the
1444string. Digits are determined by the @code{isdigit} function and are
1445thus subject to the current locale.
1446
@item
Comparison starts with a (possibly empty) non-digit sequence. The first
pair of corresponding sequences (of non-digits or of digits) that are
not equal determines the outcome of the comparison.
1451
1452@item
1453Corresponding non-digit sequences in both strings are compared
1454lexicographically if their lengths are equal. If the lengths differ,
1455the shorter non-digit sequence is extended with the input string
1456character immediately following it (which may be the null terminator),
1457the other sequence is truncated to be of the same (extended) length, and
1458these two sequences are compared lexicographically. In the last case,
1459the sequence comparison determines the result of the function because
1460the extension character (or some character before it) is necessarily
1461different from the character at the same offset in the other input
1462string.
1463
1464@item
1465For two sequences of digits, the number of leading zeros is counted (which
1466can be zero). If the count differs, the string with more leading zeros
1467in the digit sequence is considered smaller than the other string.
1468
1469@item
1470If the two sequences of digits have no leading zeros, they are compared
1471as integers, that is, the string with the longer digit sequence is
1472deemed larger, and if both sequences are of equal length, they are
1473compared lexicographically.
1474
1475@item
1476If both digit sequences start with a zero and have an equal number of
1477leading zeros, they are compared lexicographically if their lengths are
1478the same. If the lengths differ, the shorter sequence is extended with
1479the following character in its input string, and the other sequence is
1480truncated to the same length, and both sequences are compared
1481lexicographically (similar to the non-digit sequence case above).
1482@end itemize
1483
1484The treatment of leading zeros and the tie-breaking extension characters
1485(which in effect propagate across non-digit/digit sequence boundaries)
1486differs from other version-comparison algorithms.
1487
@smallexample
strverscmp ("no digit", "no digit")
    @result{} 0    /* @r{same behavior as strcmp.} */
strverscmp ("item#99", "item#100")
    @result{} <0   /* @r{same prefix, but 99 < 100.} */
strverscmp ("alpha1", "alpha001")
    @result{} >0   /* @r{different number of leading zeros (0 and 2).} */
strverscmp ("part1_f012", "part1_f01")
    @result{} >0   /* @r{lexicographical comparison with leading zeros.} */
strverscmp ("foo.009", "foo.0")
    @result{} <0   /* @r{different number of leading zeros (2 and 1).} */
@end smallexample
1500
1501@code{strverscmp} is a GNU extension.
1502@end deftypefun
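
Because @code{strverscmp} follows the return conventions of
@code{strcmp}, it can serve as the basis of a @code{qsort} comparison
function. The sketch below is one possible arrangement, not a
definitive implementation; the names @code{compare_versions} and
@code{sort_by_version} are hypothetical, and @code{_GNU_SOURCE} is
defined before any header is included so that the GNU extension is
declared.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparison function for qsort: each argument points
   to one element of an array of strings.  */
static int
compare_versions (const void *v1, const void *v2)
{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Hypothetical entry point: sort COUNT strings in version order.  */
void
sort_by_version (char **names, size_t count)
{
  assert (names != NULL);
  qsort (names, count, sizeof (char *), compare_versions);
}
```

With this ordering, names that differ only in an embedded number without
leading zeros are arranged by the numeric value of that number rather
than byte by byte.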
1503
28f540f4 1504@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
d08a7e4c 1505@standards{BSD, string.h}
11087373 1506@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1507This is an obsolete alias for @code{memcmp}, derived from BSD.
1508@end deftypefun
1509
b4012b75 1510@node Collation Functions
1511@section Collation Functions
1512
1513@cindex collating strings
1514@cindex string collation functions
1515
1516In some locales, the conventions for lexicographic ordering differ from
1517the strict numeric ordering of character codes. For example, in Spanish
1518most glyphs with diacritical marks such as accents are not considered
1519distinct letters for the purposes of collation. On the other hand, in
1520Czech the two-character sequence @samp{ch} is treated as a single letter
1521that is collated between @samp{h} and @samp{i}.
1522
You can use the functions @code{strcoll} and @code{strxfrm} (declared in
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
collation ordering appropriate for the current locale. The locale used
by these functions in particular can be specified by setting the locale
for the @code{LC_COLLATE} category; see @ref{Locales}.
28f540f4 1529@pindex string.h
8a2f1f5b 1530@pindex wchar.h
1531
1532In the standard C locale, the collation sequence for @code{strcoll} is
1533the same as that for @code{strcmp}. Similarly, @code{wcscoll} and
1534@code{wcscmp} are the same in this situation.
1535
1536Effectively, the way these functions work is by applying a mapping to
1537transform the characters in a multibyte string to a byte
1538sequence that represents
1539the string's position in the collating sequence of the current locale.
1540Comparing two such byte sequences in a simple fashion is equivalent to
1541comparing the strings with the locale's collating sequence.
1542
1543The functions @code{strcoll} and @code{wcscoll} perform this translation
1544implicitly, in order to do one comparison. By contrast, @code{strxfrm}
1545and @code{wcsxfrm} perform the mapping explicitly. If you are making
1546multiple comparisons using the same string or set of strings, it is
1547likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to
1548transform all the strings just once, and subsequently compare the
1549transformed strings with @code{strcmp} or @code{wcscmp}.
28f540f4 1550
28f540f4 1551@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
d08a7e4c 1552@standards{ISO, string.h}
1553@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1554@c Calls strcoll_l with the current locale, which dereferences only the
1555@c LC_COLLATE data pointer.
1556The @code{strcoll} function is similar to @code{strcmp} but uses the
1557collating sequence of the current locale for collation (the
2cc4b9cc 1558@code{LC_COLLATE} locale). The arguments are multibyte strings.
1559@end deftypefun
1560
8a2f1f5b 1561@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
d08a7e4c 1562@standards{ISO, wchar.h}
1563@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1564@c Same as strcoll, but calling wcscoll_l.
1565The @code{wcscoll} function is similar to @code{wcscmp} but uses the
1566collating sequence of the current locale for collation (the
1567@code{LC_COLLATE} locale).
1568@end deftypefun
1569
1570Here is an example of sorting an array of strings, using @code{strcoll}
1571to compare them. The actual sort algorithm is not written here; it
1572comes from @code{qsort} (@pxref{Array Sort Function}). The job of the
1573code shown here is to say how to compare the strings while sorting them.
1574(Later on in this section, we will show a way to do this more
1575efficiently using @code{strxfrm}.)
1576
@smallexample
/* @r{This is the comparison function used with @code{qsort}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strcoll (*p1, *p2);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings (char **array, int nstrings)
@{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
@}
@end smallexample
1600
1601@cindex converting string to collation order
8a2f1f5b 1602@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
d08a7e4c 1603@standards{ISO, string.h}
11087373 1604@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
1605The function @code{strxfrm} transforms the multibyte string
1606@var{from} using the
8a2f1f5b 1607collation transformation determined by the locale currently selected for
28f540f4 1608collation, and stores the transformed string in the array @var{to}. Up
2cc4b9cc 1609to @var{size} bytes (including a terminating null byte) are
1610stored.
1611
1612The behavior is undefined if the strings @var{to} and @var{from}
0a13c9e9 1613overlap; see @ref{Copying Strings and Arrays}.
1614
The return value is the length of the entire transformed string. This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
entirely fit in the array @var{to}. In this case, only as much of the
string as actually fits was stored. To get the whole transformed
string, call @code{strxfrm} again with a bigger output array.
1621
1622The transformed string may be longer than the original string, and it
1623may also be shorter.
1624
1625If @var{size} is zero, no bytes are stored in @var{to}. In this
1626case, @code{strxfrm} simply returns the number of bytes that would
28f540f4 1627be the length of the transformed string. This is useful for determining
1628what size the allocated array should be. It does not matter what
1629@var{to} is if @var{size} is zero; @var{to} may even be a null pointer.
1630@end deftypefun
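
The zero-size call described above leads to a two-step idiom: ask for
the required length first, then allocate exactly that much and
transform. The following is a minimal sketch under the assumption that
the collation locale does not change between the two calls; the helper
name @code{transform_string} is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy of
   FROM, or a null pointer if allocation fails.  The caller frees
   the result.  */
char *
transform_string (const char *from)
{
  assert (from != NULL);

  /* With a size of zero nothing is stored; the return value is the
     length the transformed string would have.  Add one for the
     terminating null byte.  */
  size_t needed = strxfrm (NULL, from, 0) + 1;

  char *to = malloc (needed);
  if (to != NULL)
    strxfrm (to, from, needed);
  return to;
}
```

The transformed strings can then be compared with @code{strcmp}, which
gives the same ordering that @code{strcoll} would give for the
originals.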
1631
8a2f1f5b 1632@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
d08a7e4c 1633@standards{ISO, wchar.h}
11087373 1634@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 1635The function @code{wcsxfrm} transforms wide string @var{wfrom}
1636using the collation transformation determined by the locale currently
1637selected for collation, and stores the transformed string in the array
1638@var{wto}. Up to @var{size} wide characters (including a terminating null
2cc4b9cc 1639wide character) are stored.
1640
1641The behavior is undefined if the strings @var{wto} and @var{wfrom}
0a13c9e9 1642overlap; see @ref{Copying Strings and Arrays}.
8a2f1f5b 1643
The return value is the length of the entire transformed wide
string. This value is not affected by the value of @var{size}, but if
it is greater than or equal to @var{size}, it means that the transformed
wide string did not entirely fit in the array @var{wto}. In
this case, only as much of the wide string as actually fits
was stored. To get the whole transformed wide string, call
@code{wcsxfrm} again with a bigger output array.
1651
1652The transformed wide string may be longer than the original
1653wide string, and it may also be shorter.
8a2f1f5b 1654
If @var{size} is zero, no wide characters are stored in @var{wto}. In this
case, @code{wcsxfrm} simply returns the number of wide characters that
would be the length of the transformed wide string. This is
useful for determining what size the allocated array should be (remember
to multiply by @code{sizeof (wchar_t)}). It does not matter what
@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer.
1661@end deftypefun
1662
1663Here is an example of how you can use @code{strxfrm} when
1664you plan to do many comparisons. It does the same thing as the previous
1665example, but much faster, because it has to transform each string only
1666once, no matter how many times it is compared with other strings. Even
1667the time needed to allocate and free storage is much less than the time
1668we save, when there are many strings.
1669
@smallexample
struct sorter @{ char *input; char *transformed; @};

/* @r{This is the comparison function used with @code{qsort}}
   @r{to sort an array of @code{struct sorter}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings_fast (char **array, int nstrings)
@{
  struct sorter temp_array[nstrings];
  int i;

  /* @r{Set up @code{temp_array}. Each element contains}
     @r{one input string and its transformed string.} */
  for (i = 0; i < nstrings; i++)
    @{
      size_t length = strlen (array[i]) * 2;
      char *transformed;
      size_t transformed_length;

      temp_array[i].input = array[i];

      /* @r{First try a buffer perhaps big enough.} */
      transformed = (char *) xmalloc (length);

      /* @r{Transform @code{array[i]}.} */
      transformed_length = strxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{'\0'} byte.} */
          transformed = xrealloc (transformed,
                                  transformed_length + 1);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) strxfrm (transformed, array[i],
                          transformed_length + 1);
        @}

      temp_array[i].transformed = transformed;
    @}

  /* @r{Sort @code{temp_array} by comparing transformed strings.} */
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);

  /* @r{Put the elements back in the permanent array}
     @r{in their sorted order.} */
  for (i = 0; i < nstrings; i++)
    array[i] = temp_array[i].input;

  /* @r{Free the strings we allocated.} */
  for (i = 0; i < nstrings; i++)
    free (temp_array[i].transformed);
@}
@end smallexample
1742
1743The interesting part of this code for the wide character version would
1744look like this:
1745
@smallexample
void
sort_strings_fast (wchar_t **array, int nstrings)
@{
  @dots{}
      /* @r{Transform @code{array[i]}.} */
      transformed_length = wcsxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space. +1 for terminating}
             @r{@code{L'\0'} wide character.} */
          transformed = xreallocarray (transformed,
                                       transformed_length + 1,
                                       sizeof *transformed);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) wcsxfrm (transformed, array[i],
                          transformed_length + 1);
        @}
  @dots{}
@end smallexample
1771
@noindent
Note the additional element-size argument, @code{sizeof *transformed},
in the @code{xreallocarray} call, which accounts for the size of
@code{wchar_t}.
1775
1776@strong{Compatibility Note:} The string collation functions are a new
976780fd 1777feature of @w{ISO C90}. Older C dialects have no equivalent feature.
1778The wide character versions were introduced in @w{Amendment 1} to @w{ISO
1779C90}.
28f540f4 1780
b4012b75 1781@node Search Functions
1782@section Search Functions
1783
1784This section describes library functions which perform various kinds
1785of searching operations on strings and arrays. These functions are
1786declared in the header file @file{string.h}.
1787@pindex string.h
1788@cindex search functions (for strings)
1789@cindex string search functions
1790
28f540f4 1791@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 1792@standards{ISO, string.h}
11087373 1793@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1794This function finds the first occurrence of the byte @var{c} (converted
1795to an @code{unsigned char}) in the initial @var{size} bytes of the
1796object beginning at @var{block}. The return value is a pointer to the
1797located byte, or a null pointer if no match was found.
1798@end deftypefun
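
Since @code{memchr} takes an explicit size, it can search data that
contains null bytes. The sketch below converts the returned pointer
into an offset; the helper name @code{find_byte} and its convention of
returning @var{size} when nothing is found are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to C in the
   SIZE-byte block at BLOCK, or SIZE if there is no such byte.  */
size_t
find_byte (const void *block, int c, size_t size)
{
  assert (block != NULL);
  const unsigned char *start = block;
  const unsigned char *found = memchr (block, c, size);

  return found != NULL ? (size_t) (found - start) : size;
}
```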
1799
8a2f1f5b 1800@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
d08a7e4c 1801@standards{ISO, wchar.h}
11087373 1802@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1803This function finds the first occurrence of the wide character @var{wc}
1804in the initial @var{size} wide characters of the object beginning at
1805@var{block}. The return value is a pointer to the located wide
1806character, or a null pointer if no match was found.
1807@end deftypefun
1808
87b56f36 1809@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c})
d08a7e4c 1810@standards{GNU, string.h}
11087373 1811@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1812Often the @code{memchr} function is used with the knowledge that the
1813byte @var{c} is available in the memory block specified by the
1814parameters. But this means that the @var{size} parameter is not really
1815needed and that the tests performed with it at runtime (to check whether
1816the end of the block is reached) are not needed.
1817
1818The @code{rawmemchr} function exists for just this situation which is
1819surprisingly frequent. The interface is similar to @code{memchr} except
1820that the @var{size} parameter is missing. The function will look beyond
1821the end of the block pointed to by @var{block} in case the programmer
6be569a4 1822made an error in assuming that the byte @var{c} is present in the block.
1823In this case the result is unspecified. Otherwise the return value is a
1824pointer to the located byte.
1825
32c7acd4 1826When looking for the end of a string, use @code{strchr}.
1827
1828This function is a GNU extension.
1829@end deftypefun
1830
ca747856 1831@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size})
d08a7e4c 1832@standards{GNU, string.h}
11087373 1833@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1834The function @code{memrchr} is like @code{memchr}, except that it searches
1835backwards from the end of the block defined by @var{block} and @var{size}
1836(instead of forwards from the front).
1837
1838This function is a GNU extension.
a2d63612 1839@end deftypefun
ca747856 1840
28f540f4 1841@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
d08a7e4c 1842@standards{ISO, string.h}
11087373 1843@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1844The @code{strchr} function finds the first occurrence of the byte
1845@var{c} (converted to a @code{char}) in the string
28f540f4 1846beginning at @var{string}. The return value is a pointer to the located
2cc4b9cc 1847byte, or a null pointer if no match was found.
1848
1849For example,
@smallexample
strchr ("hello, world", 'l')
    @result{} "llo, world"
strchr ("hello, world", '?')
    @result{} NULL
@end smallexample
28f540f4 1856
The terminating null byte is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying zero as the value of the @var{c} argument.
1860
1861When @code{strchr} returns a null pointer, it does not let you know
2cc4b9cc 1862the position of the terminating null byte it has found. If you
1863need that information, it is better (but less portable) to use
1864@code{strchrnul} than to search for it a second time.
1865@end deftypefun
1866
f801cf7b 1867@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1868@standards{ISO, wchar.h}
11087373 1869@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1870The @code{wcschr} function finds the first occurrence of the wide
2cc4b9cc 1871character @var{wc} in the wide string
1872beginning at @var{wstring}. The return value is a pointer to the
1873located wide character, or a null pointer if no match was found.
1874
The terminating null wide character is considered to be part of the wide
string, so you can use this function to get a pointer to the end
of a wide string by specifying a null wide character as the
value of the @var{wc} argument. It would be better (but less portable)
to use @code{wcschrnul} in this case, though.
1880@end deftypefun
1881
0e4ee106 1882@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c})
d08a7e4c 1883@standards{GNU, string.h}
11087373 1884@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@code{strchrnul} is the same as @code{strchr} except that if it does
not find the byte, it returns a pointer to the string's terminating
null byte rather than a null pointer.
1888
1889This function is a GNU extension.
1890@end deftypefun
1891
8a2f1f5b 1892@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1893@standards{GNU, wchar.h}
11087373 1894@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b 1895@code{wcschrnul} is the same as @code{wcschr} except that if it does not
2cc4b9cc 1896find the wide character, it returns a pointer to the wide string's
1897terminating null wide character rather than a null pointer.
1898
1899This function is a GNU extension.
1900@end deftypefun
1901
ec28fc7c 1902One useful, but unusual, use of the @code{strchr}
2cc4b9cc 1903function is when one wants to have a pointer pointing to the null byte
1904terminating a string. This is often written in this way:
1905
1906@smallexample
1907 s += strlen (s);
1908@end smallexample
1909
1910@noindent
This is almost optimal, but the addition operation duplicates a bit of
the work already done in the @code{strlen} function. A better solution
is this:
1914
1915@smallexample
1916 s = strchr (s, '\0');
1917@end smallexample
1918
1919There is no restriction on the second parameter of @code{strchr} so it
2cc4b9cc 1920could very well also be zero. Those readers thinking very
ee2752ea 1921hard about this might now point out that the @code{strchr} function is
8c474db5 1922more expensive than the @code{strlen} function since we have two abort
1f77f049 1923criteria. This is right. But in @theglibc{} the implementation of
0e4ee106 1924@code{strchr} is optimized in a special way so that @code{strchr}
8c474db5 1925actually is faster.
ee2752ea 1926
28f540f4 1927@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
d08a7e4c 1928@standards{ISO, string.h}
11087373 1929@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1930The function @code{strrchr} is like @code{strchr}, except that it searches
1931backwards from the end of the string @var{string} (instead of forwards
1932from the front).
1933
1934For example,
1935@smallexample
1936strrchr ("hello, world", 'l')
1937 @result{} "ld"
1938@end smallexample
1939@end deftypefun
1940
4315f45c 1941@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{wc})
d08a7e4c 1942@standards{ISO, wchar.h}
11087373 1943@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1944The function @code{wcsrchr} is like @code{wcschr}, except that it searches
1945backwards from the end of the string @var{wstring} (instead of forwards
1946from the front).
1947@end deftypefun
1948
28f540f4 1949@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
d08a7e4c 1950@standards{ISO, string.h}
11087373 1951@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 1952This is like @code{strchr}, except that it searches @var{haystack} for a
2cc4b9cc 1953substring @var{needle} rather than just a single byte. It
28f540f4 1954returns a pointer into the string @var{haystack} that is the first
2cc4b9cc 1955byte of the substring, or a null pointer if no match was found. If
1956@var{needle} is an empty string, the function returns @var{haystack}.
1957
1958For example,
1959@smallexample
1960strstr ("hello, world", "l")
1961 @result{} "llo, world"
1962strstr ("hello, world", "wo")
1963 @result{} "world"
1964@end smallexample
1965@end deftypefun
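
Because @code{strstr} returns a pointer into @var{haystack}, the search
can be resumed just past a match to find later occurrences. The
following sketch counts non-overlapping occurrences; the helper name
@code{count_occurrences} and its treatment of an empty needle (reported
as zero matches) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of non-overlapping occurrences of
   NEEDLE in HAYSTACK.  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  assert (haystack != NULL && needle != NULL);
  size_t needle_length = strlen (needle);
  size_t count = 0;

  /* strstr matches an empty needle at every position, which would
     make the loop below run forever; report zero instead.  */
  if (needle_length == 0)
    return 0;

  /* Resume each search immediately after the previous match.  */
  for (const char *p = strstr (haystack, needle); p != NULL;
       p = strstr (p + needle_length, needle))
    count++;

  return count;
}
```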
1966
8a2f1f5b 1967@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
d08a7e4c 1968@standards{ISO, wchar.h}
11087373 1969@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
1970This is like @code{wcschr}, except that it searches @var{haystack} for a
1971substring @var{needle} rather than just a single wide character. It
1972returns a pointer into the string @var{haystack} that is the first wide
1973character of the substring, or a null pointer if no match was found. If
1974@var{needle} is an empty string, the function returns @var{haystack}.
1975@end deftypefun
1976
8a2f1f5b 1977@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
d08a7e4c 1978@standards{XPG, wchar.h}
11087373 1979@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
9dcc8f11 1980@code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the
1981name originally used in the X/Open Portability Guide before the
1982@w{Amendment 1} to @w{ISO C90} was published.
1983@end deftypefun
1984
28f540f4 1985
0e4ee106 1986@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
d08a7e4c 1987@standards{GNU, string.h}
1988@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
1989@c There may be multiple calls of strncasecmp, each accessing the locale
1990@c object independently.
1991This is like @code{strstr}, except that it ignores case in searching for
1992the substring. Like @code{strcasecmp}, it is locale dependent how
1993uppercase and lowercase characters are related, and arguments are
1994multibyte strings.
1995
1996
1997For example,
@smallexample
strcasestr ("hello, world", "L")
    @result{} "llo, world"
strcasestr ("hello, World", "wo")
    @result{} "World"
@end smallexample
2004@end deftypefun
2005
2006
63551311 2007@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
d08a7e4c 2008@standards{GNU, string.h}
11087373 2009@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 2010This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
2cc4b9cc 2011arrays rather than strings. @var{needle-len} is the
28f540f4 2012length of @var{needle} and @var{haystack-len} is the length of
0005e54f 2013@var{haystack}.
28f540f4
RM
2014
2015This function is a GNU extension.
2016@end deftypefun
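
Unlike @code{strstr}, @code{memmem} can search data that contains null
bytes. The sketch below assumes a hypothetical helper named
@code{find_bytes} that converts the returned pointer into an offset; it
is an illustration, not a library interface.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* memmem is a GNU extension.  */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
   the PAT_LEN bytes at PAT within the BUF_LEN bytes at BUF, or
   (size_t) -1 if there is no match.  */
static size_t
find_bytes (const void *buf, size_t buf_len,
            const void *pat, size_t pat_len)
{
  const unsigned char *hit = memmem (buf, buf_len, pat, pat_len);

  if (hit == NULL)
    return (size_t) -1;
  return (size_t) (hit - (const unsigned char *) buf);
}
```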
2017
28f540f4 2018@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
d08a7e4c 2019@standards{ISO, string.h}
11087373 2020@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 2021The @code{strspn} (``string span'') function returns the length of the
2cc4b9cc 2022initial substring of @var{string} that consists entirely of bytes that
28f540f4 2023are members of the set specified by the string @var{skipset}. The order
2cc4b9cc 2024of the bytes in @var{skipset} is not important.
28f540f4
RM
2025
2026For example,
2027@smallexample
2028strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
2029 @result{} 5
2030@end smallexample
8a2f1f5b 2031
2cc4b9cc
PE
2032In a multibyte string, characters consisting of
2033more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2034separately. The function is not locale-dependent.
2035@end deftypefun
2036
8a2f1f5b 2037@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
d08a7e4c 2038@standards{ISO, wchar.h}
11087373 2039@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2040The @code{wcsspn} (``wide character string span'') function returns the
2041length of the initial substring of @var{wstring} that consists entirely
2042of wide characters that are members of the set specified by the string
2043@var{skipset}. The order of the wide characters in @var{skipset} is not
2044important.
28f540f4
RM
2045@end deftypefun
2046
28f540f4 2047@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
d08a7e4c 2048@standards{ISO, string.h}
11087373 2049@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 2050The @code{strcspn} (``string complement span'') function returns the length
2cc4b9cc 2051of the initial substring of @var{string} that consists entirely of bytes
28f540f4 2052that are @emph{not} members of the set specified by the string @var{stopset}.
2cc4b9cc 2053(In other words, it returns the offset of the first byte in @var{string}
28f540f4
RM
2054that is a member of the set @var{stopset}.)
2055
2056For example,
2057@smallexample
2058strcspn ("hello, world", " \t\n,.;!?")
2059 @result{} 5
2060@end smallexample

In a multibyte string, characters consisting of
more than one byte are not treated as single entities. Each byte is treated
separately. The function is not locale-dependent.
2065@end deftypefun
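
Since @code{strcspn} returns the offset of the first byte found in
@var{stopset} (or the length of the string if there is none), it is
often used to cut a line at its terminator. This is a small sketch; the
helper name @code{chomp} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncate LINE at the first carriage return or
   newline, if any, and return the resulting length.  When no such
   byte exists, strcspn returns the full length, so the store below
   merely rewrites the existing terminator.  */
static size_t
chomp (char *line)
{
  size_t len = strcspn (line, "\r\n");

  line[len] = '\0';
  return len;
}
```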
2066
8a2f1f5b 2067@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 2068@standards{ISO, wchar.h}
11087373 2069@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{wcscspn} (``wide character string complement span'') function
returns the length of the initial substring of @var{wstring} that
consists entirely of wide characters that are @emph{not} members of the
set specified by the string @var{stopset}. (In other words, it returns
the offset of the first wide character in @var{wstring} that is a member of
the set @var{stopset}.)
2076@end deftypefun
2077
28f540f4 2078@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
d08a7e4c 2079@standards{ISO, string.h}
11087373 2080@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
28f540f4 2081The @code{strpbrk} (``string pointer break'') function is related to
2cc4b9cc 2082@code{strcspn}, except that it returns a pointer to the first byte
28f540f4
RM
2083in @var{string} that is a member of the set @var{stopset} instead of the
2084length of the initial substring. It returns a null pointer if no such
2cc4b9cc 2085byte from @var{stopset} is found.
28f540f4
RM
2086
2087@c @group Invalid outside the example.
2088For example,
2089
2090@smallexample
2091strpbrk ("hello, world", " \t\n,.;!?")
2092 @result{} ", world"
2093@end smallexample
2094@c @end group
8a2f1f5b 2095
2cc4b9cc
PE
2096In a multibyte string, characters consisting of
2097more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2098separately. The function is not locale-dependent.
2099@end deftypefun
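
Since @code{strpbrk} never matches the terminating null byte, it is safe
to resume the search one byte past each result. The following sketch
uses that to visit every member of @var{stopset} in a string; the helper
name @code{count_members} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the bytes of STRING that are members of
   the set STOPSET.  */
static size_t
count_members (const char *string, const char *stopset)
{
  size_t count = 0;

  /* P always points at a non-null byte, so P + 1 stays in bounds.  */
  for (const char *p = strpbrk (string, stopset);
       p != NULL;
       p = strpbrk (p + 1, stopset))
    count++;

  return count;
}
```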
2100
8a2f1f5b 2101@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
d08a7e4c 2102@standards{ISO, wchar.h}
11087373 2103@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2104The @code{wcspbrk} (``wide character string pointer break'') function is
2105related to @code{wcscspn}, except that it returns a pointer to the first
2106wide character in @var{wstring} that is a member of the set
2107@var{stopset} instead of the length of the initial substring. It
2cc4b9cc 2108returns a null pointer if no such wide character from @var{stopset} is found.
28f540f4
RM
2109@end deftypefun
2110
0e4ee106
UD
2111
2112@subsection Compatibility String Search Functions
2113
0e4ee106 2114@deftypefun {char *} index (const char *@var{string}, int @var{c})
d08a7e4c 2115@standards{BSD, string.h}
11087373 2116@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106
UD
2117@code{index} is another name for @code{strchr}; they are exactly the same.
2118New code should always use @code{strchr} since this name is defined in
2119@w{ISO C} while @code{index} is a BSD invention which never was available
2120on @w{System V} derived systems.
2121@end deftypefun
2122
0e4ee106 2123@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
d08a7e4c 2124@standards{BSD, string.h}
11087373 2125@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106
UD
2126@code{rindex} is another name for @code{strrchr}; they are exactly the same.
2127New code should always use @code{strrchr} since this name is defined in
2128@w{ISO C} while @code{rindex} is a BSD invention which never was available
2129on @w{System V} derived systems.
2130@end deftypefun
2131
b4012b75 2132@node Finding Tokens in a String
28f540f4
RM
2133@section Finding Tokens in a String
2134
28f540f4
RM
2135@cindex tokenizing strings
2136@cindex breaking a string into tokens
2137@cindex parsing tokens from a string
Programs often need to do some simple kinds
of lexical analysis and parsing, such as splitting a command string up
into tokens. You can do this with the @code{strtok} function, declared
in the header file @file{string.h}.
2142@pindex string.h
2143
8a2f1f5b 2144@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
d08a7e4c 2145@standards{ISO, string.h}
11087373 2146@safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}}
28f540f4
RM
2147A string can be split into tokens by making a series of calls to the
2148function @code{strtok}.
2149
2150The string to be split up is passed as the @var{newstring} argument on
2151the first call only. The @code{strtok} function uses this to set up
2152some internal state information. Subsequent calls to get additional
2153tokens from the same string are indicated by passing a null pointer as
2154the @var{newstring} argument. Calling @code{strtok} with another
2155non-null @var{newstring} argument reinitializes the state information.
2156It is guaranteed that no other library function ever calls @code{strtok}
2157behind your back (which would mess up this internal state information).
2158
2159The @var{delimiters} argument is a string that specifies a set of delimiters
2cc4b9cc
PE
2160that may surround the token being extracted. All the initial bytes
2161that are members of this set are discarded. The first byte that is
28f540f4
RM
2162@emph{not} a member of this set of delimiters marks the beginning of the
2163next token. The end of the token is found by looking for the next
2cc4b9cc
PE
2164byte that is a member of the delimiter set. This byte in the
2165original string @var{newstring} is overwritten by a null byte, and the
28f540f4
RM
2166pointer to the beginning of the token in @var{newstring} is returned.
2167
On the next call to @code{strtok}, the searching begins at the next
byte beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.
2172
If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter bytes, @code{strtok} returns
a null pointer.
8a2f1f5b 2176
2cc4b9cc
PE
2177In a multibyte string, characters consisting of
2178more than one byte are not treated as single entities. Each byte is treated
8a2f1f5b
UD
2179separately. The function is not locale-dependent.
2180@end deftypefun
2181
1acd4371 2182@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr})
d08a7e4c 2183@standards{ISO, wchar.h}
11087373 2184@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
8a2f1f5b
UD
2185A string can be split into tokens by making a series of calls to the
2186function @code{wcstok}.
2187
2188The string to be split up is passed as the @var{newstring} argument on
2189the first call only. The @code{wcstok} function uses this to set up
2190some internal state information. Subsequent calls to get additional
2cc4b9cc 2191tokens from the same wide string are indicated by passing a
1acd4371
AO
2192null pointer as the @var{newstring} argument, which causes the pointer
2193previously stored in @var{save_ptr} to be used instead.
8a2f1f5b 2194
2cc4b9cc 2195The @var{delimiters} argument is a wide string that specifies
8a2f1f5b
UD
2196a set of delimiters that may surround the token being extracted. All
2197the initial wide characters that are members of this set are discarded.
2198The first wide character that is @emph{not} a member of this set of
2199delimiters marks the beginning of the next token. The end of the token
2200is found by looking for the next wide character that is a member of the
2cc4b9cc 2201delimiter set. This wide character in the original wide
1acd4371
AO
2202string @var{newstring} is overwritten by a null wide character, the
2203pointer past the overwritten wide character is saved in @var{save_ptr},
2204and the pointer to the beginning of the token in @var{newstring} is
2205returned.
8a2f1f5b
UD
2206
On the next call to @code{wcstok}, the searching begins at the next
wide character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{wcstok}.
2211
If the end of the wide string @var{newstring} is reached, or
if the remainder of the string consists only of delimiter wide characters,
@code{wcstok} returns a null pointer.
2215@end deftypefun
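
The calling pattern for @code{wcstok} can be sketched as a loop that
passes the wide string on the first call and a null pointer afterwards.
The helper name @code{split_wide} and its output-array convention are
assumptions made for this illustration only.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: split TEXT in place at any wide character in
   DELIMITERS and store up to MAX_TOKENS token pointers in TOKENS.
   Returns the number of tokens stored.  */
static size_t
split_wide (wchar_t *text, const wchar_t *delimiters,
            wchar_t **tokens, size_t max_tokens)
{
  wchar_t *save_ptr;            /* State kept between calls.  */
  size_t n = 0;

  for (wchar_t *token = wcstok (text, delimiters, &save_ptr);
       token != NULL && n < max_tokens;
       token = wcstok (NULL, delimiters, &save_ptr))
    tokens[n++] = token;

  return n;
}
```

As with @code{strtok}, the tokens point into the original array, so
@code{text} must be writable and must outlive the token pointers.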
2216
@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
they are parsing, you should always copy the string to a temporary buffer
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings
and Arrays}). If you allow @code{strtok} or @code{wcstok} to modify
a string that came from another part of your program, you are asking for
trouble; that string might be used for other purposes after
@code{strtok} or @code{wcstok} has modified it, and it would not have
the expected value.
2225
2226The string that you are operating on might even be a constant. Then
8a2f1f5b
UD
2227when @code{strtok} or @code{wcstok} tries to modify it, your program
2228will get a fatal signal for writing in read-only memory. @xref{Program
2229Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
2230would not require a modification of the string (e.g., if there is
1f77f049 2231exactly one token) the string can (and in the @glibcadj{} case will) be
8a2f1f5b 2232modified.
28f540f4
RM
2233
2234This is a special case of a general principle: if a part of a program
2235does not have as its purpose the modification of a certain data
2236structure, then it is error-prone to modify the data structure
2237temporarily.
2238
1acd4371 2239The function @code{strtok} is not reentrant, whereas @code{wcstok} is.
8a2f1f5b
UD
2240@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
2241important.
28f540f4
RM
2242
2243Here is a simple example showing the use of @code{strtok}.
2244
2245@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token, *cp;

@dots{}

cp = strdupa (string);                /* Make writable copy.  */
token = strtok (cp, delimiters);      /* token => "words" */
token = strtok (NULL, delimiters);    /* token => "separated" */
token = strtok (NULL, delimiters);    /* token => "by" */
token = strtok (NULL, delimiters);    /* token => "spaces" */
token = strtok (NULL, delimiters);    /* token => "and" */
token = strtok (NULL, delimiters);    /* token => "punctuation" */
token = strtok (NULL, delimiters);    /* token => NULL */
@end smallexample
a5113b14 2267
@Theglibc{} contains two more functions for tokenizing a string
which overcome the limitation of non-reentrancy. They are not
available for wide strings.
a5113b14 2271
a5113b14 2272@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
d08a7e4c 2273@standards{POSIX, string.h}
11087373 2274@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
dd7d45e8
UD
2275Just like @code{strtok}, this function splits the string into several
2276tokens which can be accessed by successive calls to @code{strtok_r}.
1acd4371
AO
2277The difference is that, as in @code{wcstok}, the information about the
2278next token is stored in the space pointed to by the third argument,
2279@var{save_ptr}, which is a pointer to a string pointer. Calling
2280@code{strtok_r} with a null pointer for @var{newstring} and leaving
2281@var{save_ptr} between the calls unchanged does the job without
2282hindering reentrancy.
a5113b14 2283
976780fd 2284This function is defined in POSIX.1 and can be found on many systems
a5113b14
UD
2285which support multi-threading.
2286@end deftypefun
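
Because each @code{strtok_r} scan keeps its state in a caller-supplied
pointer, two scans can be nested, which is not possible with
@code{strtok}. The sketch below splits text into lines and each line
into words; the helper name @code{count_words_by_line} is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L   /* strtok_r is a POSIX function.  */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the words in TEXT, which is modified in
   place.  The number of non-empty lines is stored in *LINES_OUT.  */
static size_t
count_words_by_line (char *text, size_t *lines_out)
{
  char *line_state;             /* State of the outer scan.  */
  char *word_state;             /* State of the inner scan.  */
  size_t words = 0;
  size_t lines = 0;

  for (char *line = strtok_r (text, "\n", &line_state);
       line != NULL;
       line = strtok_r (NULL, "\n", &line_state))
    {
      lines++;
      /* The inner scan only touches bytes inside the current line.  */
      for (char *word = strtok_r (line, " \t", &word_state);
           word != NULL;
           word = strtok_r (NULL, " \t", &word_state))
        words++;
    }

  *lines_out = lines;
  return words;
}
```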
2287
a5113b14 2288@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
d08a7e4c 2289@standards{BSD, string.h}
11087373 2290@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function has functionality similar to @code{strtok_r}, with the
@var{newstring} argument replaced by the @var{save_ptr} argument. The
initialization of the moving pointer has to be done by the user.
Successive calls to @code{strsep} move the pointer along the tokens
separated by @var{delimiter}, returning the address of the next token
and updating @var{string_ptr} to point to the beginning of the next
token.
2298
One difference between @code{strsep} and @code{strtok_r} is that if the
input string contains more than one byte from @var{delimiter} in a
row, @code{strsep} returns an empty string for each pair of adjacent bytes
from @var{delimiter}. This means that a program normally should test
for @code{strsep} returning an empty string before processing it.
9afc8a59 2304
a5113b14
UD
2305This function was introduced in 4.3BSD and therefore is widely available.
2306@end deftypefun
2307
Here is what the above example looks like when @code{strsep} is used.
2309
2310@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;

@dots{}

running = strdupa (string);
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */
@end smallexample
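
In practice @code{strsep} is usually called in a loop that skips the
empty tokens produced by adjacent delimiters. The following is one
possible sketch; the helper name @code{split_fields} and its
output-array convention are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE        /* strsep is a BSD function.  */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split TEXT in place at any byte in DELIMITERS,
   ignoring empty tokens, and store up to MAX_FIELDS token pointers
   in FIELDS.  Returns the number of tokens stored.  */
static size_t
split_fields (char *text, const char *delimiters,
              char **fields, size_t max_fields)
{
  char *running = text;         /* The moving pointer.  */
  char *token;
  size_t n = 0;

  while (n < max_fields
         && (token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')         /* Skip empty tokens.  */
      fields[n++] = token;

  return n;
}
```

Applied to the string from the example above, this loop yields the same
six tokens that the @code{strtok} version returns.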
b4012b75 2338
ec28fc7c 2339@deftypefun {char *} basename (const char *@var{filename})
d08a7e4c 2340@standards{GNU, string.h}
11087373 2341@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
ec28fc7c 2342The GNU version of the @code{basename} function returns the last
9442cd75 2343component of the path in @var{filename}. This function is the preferred
ec28fc7c
UD
2344usage, since it does not modify the argument, @var{filename}, and
2345respects trailing slashes. The prototype for @code{basename} can be
ef48b196 2346found in @file{string.h}. Note, this function is overridden by the XPG
ec28fc7c
UD
2347version, if @file{libgen.h} is included.
2348
2349Example of using GNU @code{basename}:
2350
@smallexample
#define _GNU_SOURCE   /* Needed for the GNU basename.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog = basename (argv[0]);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}
@}
@end smallexample
2368
2369@strong{Portability Note:} This function may produce different results
2370on different systems.
2371
2372@end deftypefun
2373
af85ebcd 2374@deftypefun {char *} basename (char *@var{path})
d08a7e4c 2375@standards{XPG, libgen.h}
11087373 2376@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
cf822e3c 2377This is the standard XPG defined @code{basename}. It is similar in
ec28fc7c 2378spirit to the GNU version, but may modify the @var{path} by removing
2cc4b9cc
PE
2379trailing '/' bytes. If the @var{path} is made up entirely of '/'
2380bytes, then "/" will be returned. Also, if @var{path} is
ec28fc7c 2381@code{NULL} or an empty string, then "." is returned. The prototype for
e4a5f77d 2382the XPG version can be found in @file{libgen.h}.
ec28fc7c
UD
2383
2384Example of using XPG @code{basename}:
2385
@smallexample
#define _GNU_SOURCE   /* Needed for strdupa.  */
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char *argv[])
@{
  char *prog;
  char *path = strdupa (argv[0]);   /* basename may modify its argument.  */

  prog = basename (path);

  if (argc < 2)
    @{
      fprintf (stderr, "Usage: %s <arg>\n", prog);
      exit (1);
    @}

  @dots{}

@}
@end smallexample
2407@end deftypefun
2408
ec28fc7c 2409@deftypefun {char *} dirname (char *@var{path})
d08a7e4c 2410@standards{XPG, libgen.h}
11087373 2411@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{dirname} function is the complement to the XPG version of
@code{basename}. It returns the parent directory of the file specified
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
contains no '/' bytes, then "." is returned. The prototype for this
function can be found in @file{libgen.h}.
2417@end deftypefun
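
Since the XPG @code{basename} and @code{dirname} may modify their
argument, and may return a pointer into it or into static storage, a
cautious caller works on copies and duplicates the results. The sketch
below does this; the helper name @code{split_path} is hypothetical.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700      /* For strdup and libgen.h.  */
#endif
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store newly allocated copies of the directory
   and file name parts of PATH in *DIR_OUT and *BASE_OUT.  Returns 0
   on success and -1 if memory could not be allocated.  The caller
   frees both results.  */
static int
split_path (const char *path, char **dir_out, char **base_out)
{
  char *copy1 = strdup (path);  /* Scratch copy for dirname.  */
  char *copy2 = strdup (path);  /* Scratch copy for basename.  */
  char *dir = NULL;
  char *base = NULL;

  if (copy1 != NULL && copy2 != NULL)
    {
      /* Duplicate the results before the scratch copies go away.  */
      dir = strdup (dirname (copy1));
      base = strdup (basename (copy2));
    }
  free (copy1);
  free (copy2);

  if (dir == NULL || base == NULL)
    {
      free (dir);
      free (base);
      return -1;
    }
  *dir_out = dir;
  *base_out = base;
  return 0;
}
```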
0e4ee106 2418
ea1bd74d
ZW
2419@node Erasing Sensitive Data
2420@section Erasing Sensitive Data
2421
2422Sensitive data, such as cryptographic keys, should be erased from
2423memory after use, to reduce the risk that a bug will expose it to the
2424outside world. However, compiler optimizations may determine that an
2425erasure operation is ``unnecessary,'' and remove it from the generated
2426code, because no @emph{correct} program could access the variable or
2427heap object containing the sensitive data after it's deallocated.
2428Since erasure is a precaution against bugs, this optimization is
2429inappropriate.
2430
2431The function @code{explicit_bzero} erases a block of memory, and
2432guarantees that the compiler will not remove the erasure as
2433``unnecessary.''
2434
2435@smallexample
2436@group
2437#include <string.h>
2438
2439extern void encrypt (const char *key, const char *in,
2440 char *out, size_t n);
2441extern void genkey (const char *phrase, char *key);
2442
2443void encrypt_with_phrase (const char *phrase, const char *in,
2444 char *out, size_t n)
2445@{
2446 char key[16];
2447 genkey (phrase, key);
2448 encrypt (key, in, out, n);
2449 explicit_bzero (key, 16);
2450@}
2451@end group
2452@end smallexample
2453
2454@noindent
2455In this example, if @code{memset}, @code{bzero}, or a hand-written
2456loop had been used, the compiler might remove them as ``unnecessary.''
2457
2458@strong{Warning:} @code{explicit_bzero} does not guarantee that
2459sensitive data is @emph{completely} erased from the computer's memory.
2460There may be copies in temporary storage areas, such as registers and
2461``scratch'' stack space; since these are invisible to the source code,
2462a library function cannot erase them.
2463
2464Also, @code{explicit_bzero} only operates on RAM. If a sensitive data
2465object never needs to have its address taken other than to call
2466@code{explicit_bzero}, it might be stored entirely in CPU registers
2467@emph{until} the call to @code{explicit_bzero}. Then it will be
2468copied into RAM, the copy will be erased, and the original will remain
2469intact. Data in RAM is more likely to be exposed by a bug than data
2470in registers, so this creates a brief window where the data is at
2471greater risk of exposure than it would have been if the program didn't
2472try to erase it at all.
2473
2474Declaring sensitive variables as @code{volatile} will make both the
2475above problems @emph{worse}; a @code{volatile} variable will be stored
2476in memory for its entire lifetime, and the compiler will make
2477@emph{more} copies of it than it would otherwise have. Attempting to
2478erase a normal variable ``by hand'' through a
2479@code{volatile}-qualified pointer doesn't work at all---because the
2480variable itself is not @code{volatile}, some compilers will ignore the
2481qualification on the pointer and remove the erasure anyway.
2482
2483Having said all that, in most situations, using @code{explicit_bzero}
2484is better than not using it. At present, the only way to do a more
2485thorough job is to write the entire sensitive operation in assembly
2486language. We anticipate that future compilers will recognize calls to
2487@code{explicit_bzero} and take appropriate steps to erase all the
8394b8c4 2488copies of the affected data, wherever they may be.
ea1bd74d 2489
ea1bd74d 2490@deftypefun void explicit_bzero (void *@var{block}, size_t @var{len})
d08a7e4c 2491@standards{BSD, string.h}
ea1bd74d
ZW
2492@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2493
2494@code{explicit_bzero} writes zero into @var{len} bytes of memory
2495beginning at @var{block}, just as @code{bzero} would. The zeroes are
2496always written, even if the compiler could determine that this is
2497``unnecessary'' because no correct program could read them back.
2498
2499@strong{Note:} The @emph{only} optimization that @code{explicit_bzero}
2500disables is removal of ``unnecessary'' writes to memory. The compiler
2501can perform all the other optimizations that it could for a call to
2502@code{memset}. For instance, it may replace the function call with
2503inline memory writes, and it may assume that @var{block} cannot be a
2504null pointer.
2505
2506@strong{Portability Note:} This function first appeared in OpenBSD 5.5
2507and has not been standardized. Other systems may provide the same
2508functionality under a different name, such as @code{explicit_memset},
2509@code{memset_s}, or @code{SecureZeroMemory}.
2510
2511@Theglibc{} declares this function in @file{string.h}, but on other
2512systems it may be in @file{strings.h} instead.
2513@end deftypefun
2514
b10a0acc
ZW
2515
2516@node Shuffling Bytes
2517@section Shuffling Bytes
0e4ee106
UD
2518
2519The function below addresses the perennial programming quandary: ``How do
2520I take good data in string form and painlessly turn it into garbage?''
b10a0acc
ZW
2521This is not a difficult thing to code for oneself, but the authors of
2522@theglibc{} wish to make it as convenient as possible.
0e4ee106 2523
b10a0acc
ZW
2524To @emph{erase} data, use @code{explicit_bzero} (@pxref{Erasing
2525Sensitive Data}); to obfuscate it reversibly, use @code{memfrob}
2526(@pxref{Obfuscating Data}).
0e4ee106 2527
ec28fc7c 2528@deftypefun {char *} strfry (char *@var{string})
d08a7e4c 2529@standards{GNU, string.h}
11087373
AO
2530@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2531@c Calls initstate_r, time, getpid, strlen, and random_r.
0e4ee106 2532
b10a0acc
ZW
2533@code{strfry} performs an in-place shuffle on @var{string}. Each
2534character is swapped to a position selected at random, within the
2535portion of the string starting with the character's original position.
2536(This is the Fisher-Yates algorithm for unbiased shuffling.)
2537
2538Calling @code{strfry} will not disturb any of the random number
2539generators that have global state (@pxref{Pseudo-Random Numbers}).
0e4ee106
UD
2540
2541The return value of @code{strfry} is always @var{string}.
2542
1f77f049 2543@strong{Portability Note:} This function is unique to @theglibc{}.
b10a0acc 2544It is declared in @file{string.h}.
0e4ee106
UD
2545@end deftypefun
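
Since @code{strfry} shuffles in place, a caller that needs the original
must shuffle a copy. This is a minimal sketch; the helper name
@code{shuffled_copy} is hypothetical.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* strfry is a GNU extension.  */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated, shuffled copy of
   TEXT, or a null pointer if memory could not be allocated.  The
   caller frees the result.  */
static char *
shuffled_copy (const char *text)
{
  char *copy = strdup (text);

  if (copy == NULL)
    return NULL;
  return strfry (copy);         /* strfry returns its argument.  */
}
```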
2546
2547
b10a0acc
ZW
2548@node Obfuscating Data
2549@section Obfuscating Data
0e4ee106
UD
2550@cindex Rot13
2551
b10a0acc
ZW
2552The @code{memfrob} function reversibly obfuscates an array of binary
2553data. This is not true encryption; the obfuscated data still bears a
2554clear relationship to the original, and no secret key is required to
2555undo the obfuscation. It is analogous to the ``Rot13'' cipher used on
2556Usenet for obscuring offensive jokes, spoilers for works of fiction,
2557and so on, but it can be applied to arbitrary binary data.
0e4ee106 2558
b10a0acc
ZW
2559Programs that need true encryption---a transformation that completely
2560obscures the original and cannot be reversed without knowledge of a
2561secret key---should use a dedicated cryptography library, such as
2562@uref{https://www.gnu.org/software/libgcrypt/,,libgcrypt}.
2563
2564Programs that need to @emph{destroy} data should use
2565@code{explicit_bzero} (@pxref{Erasing Sensitive Data}), or possibly
2566@code{strfry} (@pxref{Shuffling Bytes}).
0e4ee106 2567
0e4ee106 2568@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
d08a7e4c 2569@standards{GNU, string.h}
11087373 2570@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
0e4ee106 2571
b10a0acc
ZW
2572The function @code{memfrob} obfuscates @var{length} bytes of data
2573beginning at @var{mem}, in place. Each byte is bitwise xor-ed with
2574the binary pattern 00101010 (hexadecimal 0x2A). The return value is
2575always @var{mem}.
0e4ee106 2576
Calling @code{memfrob} a second time on the same data returns it to
its original state.
0e4ee106 2579
1f77f049 2580@strong{Portability Note:} This function is unique to @theglibc{}.
b10a0acc 2581It is declared in @file{string.h}.
0e4ee106
UD
2582@end deftypefun
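
The transformation described above can be written by hand, which also
shows why it is its own inverse. This is only a sketch of the documented
behavior, not the library's implementation; the helper name
@code{xor_with_42} is hypothetical.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* For memfrob in the usage below.  */
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: xor each of the LEN bytes at MEM with 0x2A
   (binary 00101010) and return MEM.  Because (x ^ 0x2A) ^ 0x2A == x,
   applying it twice restores the original data.  */
static void *
xor_with_42 (void *mem, size_t len)
{
  unsigned char *p = mem;

  for (size_t i = 0; i < len; i++)
    p[i] ^= 0x2A;
  return mem;
}
```

Under that description, data obfuscated with @code{memfrob} should be
restored by @code{xor_with_42}, and vice versa.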
2583
b4012b75
UD
2584@node Encode Binary Data
2585@section Encode Binary Data
2586
2587To store or transfer binary data in environments which only support text
2588one has to encode the binary data by mapping the input bytes to
2cc4b9cc 2589bytes in the range allowed for storing or transferring. SVID
dd7d45e8
UD
2590systems (and nowadays XPG compliant systems) provide minimal support for
2591this task.
b4012b75 2592
b4012b75 2593@deftypefun {char *} l64a (long int @var{n})
d08a7e4c 2594@standards{XPG, stdlib.h}
11087373 2595@safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}}
2cc4b9cc
PE
2596This function encodes a 32-bit input value using bytes from the
2597basic character set. It returns a pointer to a 7 byte buffer which
dd7d45e8
UD
2598contains an encoded version of @var{n}. To encode a series of bytes the
2599user must copy the returned string to a destination buffer. It returns
2600the empty string if @var{n} is zero, which is somewhat bizarre but
2601mandated by the standard.@*
2602@strong{Warning:} Since a static buffer is used this function should not
5649a1d6 2603be used in multi-threaded programs. There is no thread-safe alternative
dd7d45e8
UD
2604to this function in the C library.@*
2605@strong{Compatibility Note:} The XPG standard states that the return
2606value of @code{l64a} is undefined if @var{n} is negative. In the GNU
2607implementation, @code{l64a} treats its argument as unsigned, so it will
2608return a sensible encoding for any nonzero @var{n}; however, portable
2609programs should not rely on this.
b4012b75 2610
dd7d45e8
UD
2611To encode a large buffer @code{l64a} must be called in a loop, once for
2612each 32-bit word of the buffer. For example, one could do something
2613like this:
5649a1d6
UD
2614
@smallexample
char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  unsigned char *in = (unsigned char *) buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  /* @r{Encode the length.} */
  /* @r{Using `htonl' is necessary so that the data can be}
     @r{decoded even on machines with different byte order.}
     @r{`l64a' can return a string shorter than 6 bytes, so }
     @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
     @r{hand.} */

  p = stpcpy (cp, l64a (htonl (len)));
  cp = mempcpy (p, "......", 6 - (p - cp));

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      p = stpcpy (cp, l64a (htonl (n)));
      cp = mempcpy (p, "......", 6 - (p - cp));
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      cp = stpcpy (cp, l64a (htonl (n)));
    @}
  *cp = '\0';
  return out;
@}
@end smallexample
2659
2660It is strange that the library does not provide the complete
dd7d45e8
UD
2661functionality needed but so be it.
2662
2663@end deftypefun
5649a1d6 2664
b4012b75
UD
2665To decode data produced with @code{l64a} the following function should be
2666used.
2667
@deftypefun {long int} a64l (const char *@var{string})
@standards{XPG, stdlib.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}. The function processes at most 6 bytes of
this string, and decodes the bytes it finds according to the table
below. It stops decoding when it finds a byte not in the table,
rather like @code{atoi}; if you have a buffer which has been broken into
lines, you must be careful to skip over the end-of-line bytes.

The decoded number is returned as a @code{long int} value.
@end deftypefun

The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
which each byte of an encoded string represents six bits of an
input word. These symbols are used for the base 64 digits:
2684
2685@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
2686@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
2687@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
2688 @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
2689@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
2690 @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
2691@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
2692 @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
2693@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
2694 @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
2695@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
2696 @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
2697@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
2698 @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
2699@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
2700 @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
2701@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
2702 @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
2703@end multitable
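
As a rough illustration of the table above, the following sketch decodes
a string by hand so that the result can be compared with @code{a64l}.
It assumes an ASCII-compatible character set and relies on the glibc
behavior that the first byte of the string carries the least significant
six bits. The helper names @code{digit_value} and @code{decode_by_table}
are invented for this example and are not part of the library.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: map one symbol to its digit value (0..63)
   following the table above, or return -1 for any other byte.
   Assumes an ASCII-compatible character set.  */
static int
digit_value (char c)
{
  if (c == '.')
    return 0;
  if (c == '/')
    return 1;
  if (c >= '0' && c <= '9')
    return 2 + (c - '0');
  if (c >= 'A' && c <= 'Z')
    return 12 + (c - 'A');
  if (c >= 'a' && c <= 'z')
    return 38 + (c - 'a');
  return -1;
}

/* Hypothetical helper: decode at most 6 symbols, least significant
   digit first, stopping at the first byte that is not in the table.  */
static long int
decode_by_table (const char *s)
{
  unsigned long int result = 0;

  assert (s != NULL);
  for (int i = 0; i < 6; i++)
    {
      int d = digit_value (s[i]);
      if (d < 0)
        break;
      result |= (unsigned long int) d << (6 * i);
    }
  /* Keep only the low 32 bits, the range that l64a encodes.  */
  return (long int) (result & 0xffffffffUL);
}
```

For small non-negative values, @code{decode_by_table (l64a (n))} should
give back @code{n}, which is the same round trip that @code{a64l} performs.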
2704
2705This encoding scheme is not standard. There are some other encoding
2706methods which are much more widely used (UU encoding, MIME encoding).
2707Generally, it is better to use one of these encodings.
2708
2709@node Argz and Envz Vectors
2710@section Argz and Envz Vectors
2711
@cindex argz vectors (string vectors)
@cindex string vectors, null-byte separated
@cindex argument vectors, null-byte separated
@dfn{Argz vectors} are vectors of strings in a contiguous block of
memory, each element separated from its neighbors by null bytes
(@code{'\0'}).
2718
@cindex envz vectors (environment vectors)
@cindex environment vectors, null-byte separated
@dfn{Envz vectors} are an extension of argz vectors where each element is a
name-value pair, separated by a @code{'='} byte (as in a Unix
environment).
2724
2725@menu
2726* Argz Functions:: Operations on argz vectors.
2727* Envz Functions:: Additional operations on environment vectors.
2728@end menu
2729
2730@node Argz Functions, Envz Functions, , Argz and Envz Vectors
2731@subsection Argz Functions
2732
2733Each argz vector is represented by a pointer to the first element, of
2734type @code{char *}, and a size, of type @code{size_t}, both of which can
2735be initialized to @code{0} to represent an empty argz vector. All argz
2736functions accept either a pointer and a size argument, or pointers to
2737them, if they will be modified.
2738
2739The argz functions use @code{malloc}/@code{realloc} to allocate/grow
f0f308c1 2740argz vectors, and so any argz vector created using these functions may
2741be freed by using @code{free}; conversely, any argz function that may
2742grow a string expects that string to have been allocated using
2743@code{malloc} (those argz functions that only examine their arguments or
2744modify them in place will work on any sort of memory).
2745@xref{Unconstrained Allocation}.
2746
2747All argz functions that do memory allocation have a return type of
2748@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
2749allocation error occurs.
2750
2751@pindex argz.h
2752These functions are declared in the standard include file @file{argz.h}.
2753
2754@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
d08a7e4c 2755@standards{GNU, argz.h}
11087373 2756@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
5649a1d6 2757The @code{argz_create} function converts the Unix-style argument vector
2758@var{argv} (a vector of pointers to normal C strings, terminated by
2759@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
2760the same elements, which is returned in @var{argz} and @var{argz_len}.
2761@end deftypefun
2762
2763@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
d08a7e4c 2764@standards{GNU, argz.h}
11087373 2765@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2cc4b9cc 2766The @code{argz_create_sep} function converts the string
b13927da 2767@var{string} into an argz vector (returned in @var{argz} and
49c091e5 2768@var{argz_len}) by splitting it into elements at every occurrence of the
2cc4b9cc 2769byte @var{sep}.
2770@end deftypefun
2771
@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_count} function returns the number of elements in the
argz vector described by @var{argz} and @var{argz_len}.
2777@end deftypefun
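
A minimal sketch of how @code{argz_create_sep} and @code{argz_count}
might be combined is shown here. The helper name
@code{count_components} is hypothetical; only the @code{argz_*} calls
and @code{free} come from the library.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return the number of SEP-separated components
   in STRING, or -1 if the argz vector could not be allocated.  */
static long int
count_components (const char *string, int sep)
{
  char *argz = NULL;      /* An empty argz vector is a null pointer...  */
  size_t argz_len = 0;    /* ...together with a size of zero.  */
  size_t count;

  assert (string != NULL);
  if (argz_create_sep (string, sep, &argz, &argz_len) != 0)
    return -1;            /* The only documented error is ENOMEM.  */

  count = argz_count (argz, argz_len);
  free (argz);            /* Argz vectors are allocated with malloc.  */
  return (long int) count;
}
```

For instance, calling @code{count_components} on a search path such as
@code{"/bin:/usr/bin:/usr/local/bin"} with a separator of @code{':'}
counts its three directories.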
2778
@deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_extract} function converts the argz vector @var{argz} and
@var{argz_len} into a Unix-style argument vector stored in @var{argv},
by putting pointers to every element in @var{argz} into successive
positions in @var{argv}, followed by a terminator of @code{0}.
The array @var{argv} must be pre-allocated with enough space to hold all the
elements in @var{argz} plus the terminating @code{(char *)0}
(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
bytes should be enough). Note that the string pointers stored into
@var{argv} point into @var{argz} rather than at copies, and so
@var{argz} must be copied if it will be changed while @var{argv} is
still active. This function is useful for passing the elements in
@var{argz} to an exec function (@pxref{Executing a File}).
2794@end deftypefun
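
The sizing rule described for @code{argz_extract} can be sketched as
follows. The helper name @code{argz_to_argv} is made up for this
example; the caller is assumed to free the returned array but not the
strings, since those still live inside the argz vector.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate an argument vector that points into
   ARGZ.  Returns NULL if the allocation fails.  */
static char **
argz_to_argv (const char *argz, size_t argz_len)
{
  size_t count = argz_count (argz, argz_len);
  /* One slot per element plus one for the terminating null pointer.  */
  char **argv = malloc ((count + 1) * sizeof (char *));

  if (argv == NULL)
    return NULL;
  argz_extract (argz, argz_len, argv);
  assert (argv[count] == NULL);   /* argz_extract stored the terminator.  */
  return argv;
}
```

Because the pointers refer to the original block, freeing or growing the
argz vector invalidates every entry of the returned array.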
2795
@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_stringify} function converts @var{argz} into a normal string with
the elements separated by the byte @var{sep}, by replacing each
@code{'\0'} inside @var{argz} (except the last one, which terminates the
string) with @var{sep}. This is handy for printing @var{argz} in a
readable manner.
2804@end deftypefun
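
As a sketch of how @code{argz_stringify} can turn a vector back into an
ordinary string, the following example builds a vector with
@code{argz_add} and then joins it in place. The helper name
@code{join_words} is an assumption of this example, not a library
function.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join COUNT strings with the byte SEP.  Returns a
   malloc'd string, or NULL if COUNT is zero or an allocation fails.  */
static char *
join_words (const char *const words[], size_t count, int sep)
{
  char *argz = NULL;
  size_t argz_len = 0;

  for (size_t i = 0; i < count; i++)
    if (argz_add (&argz, &argz_len, words[i]) != 0)
      {
        free (argz);
        return NULL;
      }

  /* Replace the interior null bytes; the final one stays in place.  */
  argz_stringify (argz, argz_len, sep);
  assert (argz == NULL || strlen (argz) + 1 == argz_len);
  return argz;
}
```

The result occupies the same block as the argz vector, so it is released
with a single call to @code{free}.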
2805
@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls strlen and argz_append.
The @code{argz_add} function adds the string @var{str} to the end of the
argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
@code{*@var{argz_len}} accordingly.
2813@end deftypefun
2814
2815@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
d08a7e4c 2816@standards{GNU, argz.h}
11087373 2817@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
b13927da 2818The @code{argz_add_sep} function is similar to @code{argz_add}, but
49c091e5 2819@var{str} is split into separate elements in the result at occurrences of
2cc4b9cc 2820the byte @var{delim}. This is useful, for instance, for
5649a1d6 2821adding the components of a Unix search path to an argz vector, by using
2822a value of @code{':'} for @var{delim}.
2823@end deftypefun
2824
2825@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
d08a7e4c 2826@standards{GNU, argz.h}
11087373 2827@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2828The @code{argz_append} function appends @var{buf_len} bytes starting at
2829@var{buf} to the argz vector @code{*@var{argz}}, reallocating
2830@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
2831@code{*@var{argz_len}}.
2832@end deftypefun
2833
30aa5785 2834@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
d08a7e4c 2835@standards{GNU, argz.h}
2836@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2837@c Calls free if no argument is left.
2838If @var{entry} points to the beginning of one of the elements in the
2839argz vector @code{*@var{argz}}, the @code{argz_delete} function will
2840remove this entry and reallocate @code{*@var{argz}}, modifying
2841@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
2842destructive argz functions usually reallocate their argz argument,
2843pointers into argz vectors such as @var{entry} will then become invalid.
2844@end deftypefun
2845
2846@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
d08a7e4c 2847@standards{GNU, argz.h}
2848@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2849@c Calls argz_add or realloc and memmove.
2850The @code{argz_insert} function inserts the string @var{entry} into the
2851argz vector @code{*@var{argz}} at a point just before the existing
2852element pointed to by @var{before}, reallocating @code{*@var{argz}} and
2853updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
2854is @code{0}, @var{entry} is added to the end instead (as if by
2855@code{argz_add}). Since the first element is in fact the same as
2856@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
2857@var{before} will result in @var{entry} being inserted at the beginning.
2858@end deftypefun
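
Since the first element starts at @code{*@var{argz}}, inserting at the
front and deleting the front element can both be sketched in a few
lines. The helper names @code{prepend_entry} and
@code{drop_first_entry} are hypothetical and exist only for this
illustration.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <string.h>

/* Hypothetical helper: insert ENTRY before the current first element.
   If the vector is empty, *ARGZ is null and ENTRY is simply appended.  */
static error_t
prepend_entry (char **argz, size_t *argz_len, const char *entry)
{
  assert (argz != NULL && argz_len != NULL);
  return argz_insert (argz, argz_len, *argz, entry);
}

/* Hypothetical helper: delete the first element, if there is one.  */
static void
drop_first_entry (char **argz, size_t *argz_len)
{
  assert (argz != NULL && argz_len != NULL);
  if (*argz_len > 0)
    argz_delete (argz, argz_len, *argz);
}
```

Any pointer into the vector obtained before one of these calls should be
treated as invalid afterwards.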
2859
@deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{argz_next} function provides a convenient way of iterating
over the elements in the argz vector @var{argz}. It returns a pointer
to the next element in @var{argz} after the element @var{entry}, or
@code{0} if there are no elements following @var{entry}. If @var{entry}
is @code{0}, the first element of @var{argz} is returned.

This behavior suggests two styles of iteration:

@smallexample
    char *entry = 0;
    while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
      @var{action};
@end smallexample

(the double parentheses keep some C compilers from warning about what
they consider a questionable @code{while}-test) and:

@smallexample
    char *entry;
    for (entry = @var{argz};
         entry;
         entry = argz_next (@var{argz}, @var{argz_len}, entry))
      @var{action};
@end smallexample

Note that the latter depends on @var{argz} having a value of @code{0} if
it is empty (rather than a pointer to an empty block of memory); this
invariant is maintained for argz vectors created by the functions here.
2891@end deftypefun
2892
@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
@standards{GNU, argz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
The @code{argz_replace} function replaces any occurrences of the string
@var{str} in @code{*@var{argz}} with @var{with}, reallocating
@code{*@var{argz}} as necessary. If @var{replace_count} is not a null
pointer, @code{*@var{replace_count}} will be incremented by the number
of replacements performed.
2900@end deftypefun
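
Because the counter is incremented rather than assigned, it has to be
initialized before the call. The following sketch shows one way to wrap
that detail; the helper name @code{replace_and_count} and its
@code{changed} parameter are assumptions of this example.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace STR with WITH throughout the vector and
   store in *CHANGED the amount by which argz_replace advanced the
   counter.  */
static error_t
replace_and_count (char **argz, size_t *argz_len,
                   const char *str, const char *with, unsigned *changed)
{
  unsigned count = 0;   /* Must start at zero: it is incremented, not set.  */
  error_t err;

  assert (changed != NULL);
  err = argz_replace (argz, argz_len, str, with, &count);
  *changed = count;
  return err;
}
```

Passing a null pointer as the last argument of @code{argz_replace} is
also possible when the count is not needed.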
2901
2902@node Envz Functions, , Argz Functions, Argz and Envz Vectors
2903@subsection Envz Functions
2904
2905Envz vectors are just argz vectors with additional constraints on the form
2906of each element; as such, argz functions can also be used on them, where it
2907makes sense.
2908
Each element in an envz vector is a name-value pair, separated by a @code{'='}
byte; if multiple @code{'='} bytes are present in an element, those
after the first are considered part of the value, and treated like all other
non-@code{'\0'} bytes.

If @emph{no} @code{'='} bytes are present in an element, that element is
considered the name of a ``null'' entry, as distinct from an entry with an
empty value: @code{envz_get} will return @code{0} if given the name of a null
entry, whereas an entry with an empty value would result in a value of
@code{""}; @code{envz_entry} will still find such entries, however. Null
entries can be removed with the @code{envz_strip} function.
2920
2921As with argz functions, envz functions that may allocate memory (and thus
2922fail) have a return type of @code{error_t}, and return either @code{0} or
2923@code{ENOMEM}.
2924
2925@pindex envz.h
2926These functions are declared in the standard include file @file{envz.h}.
2927
2928@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
d08a7e4c 2929@standards{GNU, envz.h}
11087373 2930@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2931The @code{envz_entry} function finds the entry in @var{envz} with the name
2932@var{name}, and returns a pointer to the whole entry---that is, the argz
2cc4b9cc 2933element which begins with @var{name} followed by a @code{'='} byte. If
2934there is no entry with that name, @code{0} is returned.
2935@end deftypefun
2936
2937@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
d08a7e4c 2938@standards{GNU, envz.h}
11087373 2939@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2940The @code{envz_get} function finds the entry in @var{envz} with the name
2941@var{name} (like @code{envz_entry}), and returns a pointer to the value
2942portion of that entry (following the @code{'='}). If there is no entry with
2943that name (or only a null entry), @code{0} is returned.
2944@end deftypefun
2945
@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
@standards{GNU, envz.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c Calls envz_remove, which calls envz_entry and argz_delete, and then
@c argz_add or equivalent code that reallocs and appends name=value.
The @code{envz_add} function adds an entry to @code{*@var{envz}}
(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
@var{name}, and value @var{value}. If an entry with the same name
already exists in @var{envz}, it is removed first. If @var{value} is
@code{0}, then the new entry will be the special null type of entry
(mentioned above).
2957@end deftypefun
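
The difference between a null entry and an entry with an empty value can
be seen in a small lookup wrapper. This is only a sketch; the helper
name @code{value_or_default} and its @code{fallback} parameter are
invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <envz.h>
#include <string.h>

/* Hypothetical helper: return the value of NAME, or FALLBACK if NAME is
   missing or is a null entry (an element without a '=' byte).  */
static const char *
value_or_default (const char *envz, size_t envz_len,
                  const char *name, const char *fallback)
{
  const char *value;

  assert (name != NULL);
  value = envz_get (envz, envz_len, name);
  return value != NULL ? value : fallback;
}
```

An entry added with an empty string as its value yields @code{""} here,
while an entry added with a @var{value} of @code{0} yields the fallback,
even though @code{envz_entry} still finds it.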
2958
2959@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
d08a7e4c 2960@standards{GNU, envz.h}
11087373 2961@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2962The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
2963as if with @code{envz_add}, updating @code{*@var{envz}} and
2964@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
2965will supersede those with the same name in @var{envz}, otherwise not.
2966
2967Null entries are treated just like other entries in this respect, so a null
2968entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
2969being added to @var{envz}, if @var{override} is false.
2970@end deftypefun
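
One possible use of a false @var{override} is to fill in defaults
without disturbing entries that are already set. The sketch below
assumes a hypothetical helper named @code{apply_defaults}; only
@code{envz_merge} itself comes from the library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <envz.h>
#include <string.h>

/* Hypothetical helper: add every entry of DEFAULTS whose name is not
   yet present in *ENVZ, leaving existing entries untouched.  */
static error_t
apply_defaults (char **envz, size_t *envz_len,
                const char *defaults, size_t defaults_len)
{
  assert (envz != NULL && envz_len != NULL);
  /* An override of 0 keeps the entries already in *ENVZ.  */
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```

Calling @code{envz_merge} directly with a nonzero @var{override} gives
the opposite policy, where the second vector wins.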
2971
2972@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
d08a7e4c 2973@standards{GNU, envz.h}
11087373 2974@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
2975The @code{envz_strip} function removes any null entries from @var{envz},
2976updating @code{*@var{envz}} and @code{*@var{envz_len}}.
2977@end deftypefun
11087373 2978
920d7012 2979@deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name})
d08a7e4c 2980@standards{GNU, envz.h}
654055e0 2981@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
2982The @code{envz_remove} function removes an entry named @var{name} from
2983@var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}.
2984@end deftypefun
2985
@c FIXME these are undocumented:
2987@c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp