manual/string.texi

   1 @node String and Array Utilities, Extended Characters, Character Handling, Top
   2 @chapter String and Array Utilities
   3
   4 Operations on strings (or arrays of characters) are an important part of
   5 many programs.  The GNU C library provides an extensive set of string
   6 utility functions, including functions for copying, concatenating,
   7 comparing, and searching strings.  Many of these functions can also
   8 operate on arbitrary regions of storage; for example, the @code{memcpy}
   9 function can be used to copy the contents of any kind of array.
  10
  11 It's fairly common for beginning C programmers to ``reinvent the wheel''
  12 by duplicating this functionality in their own code, but it pays to
  13 become familiar with the library functions and to make use of them,
  14 since this offers benefits in maintenance, efficiency, and portability.
  15
  16 For instance, you could easily compare one string to another in two
  17 lines of C code, but if you use the built-in @code{strcmp} function,
  18 you're less likely to make a mistake.  And, since these library
  19 functions are typically highly optimized, your program may run faster
  20 too.
  21
  22 @menu
  23 * Representation of Strings::   Introduction to basic concepts.
  24 * String/Array Conventions::    Whether to use a string function or an
  25                                  arbitrary array function.
  26 * String Length::               Determining the length of a string.
  27 * Copying and Concatenation::   Functions to copy the contents of strings
  28                                  and arrays.
  29 * String/Array Comparison::     Functions for byte-wise and character-wise
  30                                  comparison.
  31 * Collation Functions::         Functions for collating strings.
  32 * Search Functions::            Searching for a specific element or substring.
  33 * Finding Tokens in a String::  Splitting a string into tokens by looking
  34                                  for delimiters.
  35 @end menu
  36
  37 @node Representation of Strings, String/Array Conventions,  , String and Array Utilities
  38 @section Representation of Strings
  39 @cindex string, representation of
  40
  41 This section is a quick summary of string concepts for beginning C
  42 programmers.  It describes how character strings are represented in C
  43 and some common pitfalls.  If you are already familiar with this
  44 material, you can skip this section.
  45
  46 @cindex string
  47 @cindex null character
  48 A @dfn{string} is an array of @code{char} objects.  But string-valued
  49 variables are usually declared to be pointers of type @code{char *}.
  50 Such variables do not include space for the text of a string; that has
  51 to be stored somewhere else---in an array variable, a string constant,
  52 or dynamically allocated memory (@pxref{Memory Allocation}).  It's up to
  53 you to store the address of the chosen memory space into the pointer
variable.  Alternatively, you can store a @dfn{null pointer} in the
pointer variable.  The null pointer does not point anywhere, so
attempting to access a string through it is an error that typically
crashes the program.
  57
  58 By convention, a @dfn{null character}, @code{'\0'}, marks the end of a
  59 string.  For example, in testing to see whether the @code{char *}
  60 variable @var{p} points to a null character marking the end of a string,
  61 you can write @code{!*@var{p}} or @code{*@var{p} == '\0'}.
  62
  63 A null character is quite different conceptually from a null pointer,
  64 although both are represented by the integer @code{0}.
  65
  66 @cindex string literal
  67 @dfn{String literals} appear in C program source as strings of
  68 characters between double-quote characters (@samp{"}).  In @w{ISO C},
  69 string literals can also be formed by @dfn{string concatenation}:
  70 @code{"a" "b"} is the same as @code{"ab"}.  Modification of string
  71 literals is not allowed by the GNU C compiler, because literals
  72 are placed in read-only storage.
  73
  74 Character arrays that are declared @code{const} cannot be modified
  75 either.  It's generally good style to declare non-modifiable string
  76 pointers to be of type @code{const char *}, since this often allows the
  77 C compiler to detect accidental modifications as well as providing some
  78 amount of documentation about what your program intends to do with the
  79 string.
  80
  81 The amount of memory allocated for the character array may extend past
  82 the null character that normally marks the end of the string.  In this
  83 document, the term @dfn{allocation size} is always used to refer to the
  84 total amount of memory allocated for the string, while the term
  85 @dfn{length} refers to the number of characters up to (but not
  86 including) the terminating null character.
  87 @cindex length of string
  88 @cindex allocation size of string
  89 @cindex size of string
  90 @cindex string length
  91 @cindex string allocation
  92
  93 A notorious source of program bugs is trying to put more characters in a
  94 string than fit in its allocated size.  When writing code that extends
  95 strings or moves characters into a pre-allocated array, you should be
  96 very careful to keep track of the length of the text and make explicit
  97 checks for overflowing the array.  Many of the library functions
  98 @emph{do not} do this for you!  Remember also that you need to allocate
  99 an extra byte to hold the null character that marks the end of the
 100 string.
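
As a minimal sketch of the difference between length and allocation
size, the hypothetical helper @code{join_strings} below (an
illustration, not a library function) allocates room for two strings
plus the terminating null character before copying them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding FIRST
   followed by SECOND, or a null pointer if allocation fails.  The
   caller is responsible for freeing the result.  */
char *
join_strings (const char *first, const char *second)
{
  assert (first != NULL && second != NULL);

  size_t first_len = strlen (first);
  size_t second_len = strlen (second);

  /* The allocation size is the combined length plus one extra byte
     for the terminating null character.  */
  char *result = malloc (first_len + second_len + 1);
  if (result == NULL)
    return NULL;

  memcpy (result, first, first_len);
  /* Copying SECOND_LEN + 1 bytes brings the null character along.  */
  memcpy (result + first_len, second, second_len + 1);
  return result;
}
```

Omitting the @code{+ 1} in the call to @code{malloc} would write the
null character one byte past the end of the allocated block.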
 101
 102 @node String/Array Conventions, String Length, Representation of Strings, String and Array Utilities
 103 @section String and Array Conventions
 104
 105 This chapter describes both functions that work on arbitrary arrays or
 106 blocks of memory, and functions that are specific to null-terminated
 107 arrays of characters.
 108
 109 Functions that operate on arbitrary blocks of memory have names
 110 beginning with @samp{mem} (such as @code{memcpy}) and invariably take an
 111 argument which specifies the size (in bytes) of the block of memory to
 112 operate on.  The array arguments and return values for these functions
 113 have type @code{void *}, and as a matter of style, the elements of these
 114 arrays are referred to as ``bytes''.  You can pass any kind of pointer
 115 to these functions, and the @code{sizeof} operator is useful in
 116 computing the value for the size argument.
 117
 118 In contrast, functions that operate specifically on strings have names
 119 beginning with @samp{str} (such as @code{strcpy}) and look for a null
 120 character to terminate the string instead of requiring an explicit size
 121 argument to be passed.  (Some of these functions accept a specified
 122 maximum length, but they also check for premature termination with a
 123 null character.)  The array arguments and return values for these
 124 functions have type @code{char *}, and the array elements are referred
 125 to as ``characters''.
 126
 127 In many cases, there are both @samp{mem} and @samp{str} versions of a
 128 function.  The one that is more appropriate to use depends on the exact
 129 situation.  When your program is manipulating arbitrary arrays or blocks of
 130 storage, then you should always use the @samp{mem} functions.  On the
 131 other hand, when you are manipulating null-terminated strings it is
 132 usually more convenient to use the @samp{str} functions, unless you
 133 already know the length of the string in advance.
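
The following sketch contrasts the two families.  The helper names
@code{copy_ints} and @code{copy_text} are invented for this
illustration; only @code{memcpy}, @code{strcpy} and @code{strlen} come
from the library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy COUNT ints from FROM to TO.  The arrays
   must not overlap.  The size argument of memcpy is in bytes, so it
   is computed with sizeof.  */
void
copy_ints (int *to, const int *from, size_t count)
{
  assert (to != NULL && from != NULL);
  memcpy (to, from, count * sizeof *from);
}

/* Hypothetical helper: copy the string FROM into TO, which has room
   for TO_SIZE bytes.  Return 1 on success, or 0 (copying nothing) if
   FROM and its terminating null character would not fit.  */
int
copy_text (char *to, size_t to_size, const char *from)
{
  assert (to != NULL && from != NULL);
  if (strlen (from) >= to_size)
    return 0;
  /* strcpy needs no size argument; it stops at the null character.  */
  strcpy (to, from);
  return 1;
}
```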
 134
 135 @node String Length, Copying and Concatenation, String/Array Conventions, String and Array Utilities
 136 @section String Length
 137
 138 You can get the length of a string using the @code{strlen} function.
 139 This function is declared in the header file @file{string.h}.
 140 @pindex string.h
 141
 142 @comment string.h
 143 @comment ISO
 144 @deftypefun size_t strlen (const char *@var{s})
 145 The @code{strlen} function returns the length of the null-terminated
 146 string @var{s}.  (In other words, it returns the offset of the terminating
 147 null character within the array.)
 148
 149 For example,
 150 @smallexample
 151 strlen ("hello, world")
 152     @result{} 12
 153 @end smallexample
 154
 155 When applied to a character array, the @code{strlen} function returns
 156 the length of the string stored there, not its allocation size.  You can
 157 get the allocation size of the character array that holds a string using
 158 the @code{sizeof} operator:
 159
 160 @smallexample
 161 char string[32] = "hello, world";
 162 sizeof (string)
 163     @result{} 32
 164 strlen (string)
 165     @result{} 12
 166 @end smallexample
 167 @end deftypefun
 168
 169 @node Copying and Concatenation, String/Array Comparison, String Length, String and Array Utilities
 170 @section Copying and Concatenation
 171
 172 You can use the functions described in this section to copy the contents
 173 of strings and arrays, or to append the contents of one string to
 174 another.  These functions are declared in the header file
 175 @file{string.h}.
 176 @pindex string.h
 177 @cindex copying strings and arrays
 178 @cindex string copy functions
 179 @cindex array copy functions
 180 @cindex concatenating strings
 181 @cindex string concatenation functions
 182
 183 A helpful way to remember the ordering of the arguments to the functions
 184 in this section is that it corresponds to an assignment expression, with
the destination array specified to the left of the source array.  Most
of these functions return the address of the destination array;
exceptions such as @code{memccpy} and @code{stpcpy} are noted in their
descriptions.
 187
 188 Most of these functions do not work properly if the source and
 189 destination arrays overlap.  For example, if the beginning of the
 190 destination array overlaps the end of the source array, the original
 191 contents of that part of the source array may get overwritten before it
 192 is copied.  Even worse, in the case of the string functions, the null
 193 character marking the end of the string may be lost, and the copy
 194 function might get stuck in a loop trashing all the memory allocated to
 195 your program.
 196
 197 All functions that have problems copying between overlapping arrays are
 198 explicitly identified in this manual.  In addition to functions in this
 199 section, there are a few others like @code{sprintf} (@pxref{Formatted
 200 Output Functions}) and @code{scanf} (@pxref{Formatted Input
 201 Functions}).
 202
 203 @comment string.h
 204 @comment ISO
 205 @deftypefun {void *} memcpy (void *@var{to}, const void *@var{from}, size_t @var{size})
 206 The @code{memcpy} function copies @var{size} bytes from the object
 207 beginning at @var{from} into the object beginning at @var{to}.  The
 208 behavior of this function is undefined if the two arrays @var{to} and
 209 @var{from} overlap; use @code{memmove} instead if overlapping is possible.
 210
 211 The value returned by @code{memcpy} is the value of @var{to}.
 212
 213 Here is an example of how you might use @code{memcpy} to copy the
 214 contents of an array:
 215
 216 @smallexample
 217 struct foo *oldarray, *newarray;
 218 int arraysize;
 219 @dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
 221 @end smallexample
 222 @end deftypefun
 223
 224 @comment string.h
 225 @comment ISO
 226 @deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
 227 @code{memmove} copies the @var{size} bytes at @var{from} into the
 228 @var{size} bytes at @var{to}, even if those two blocks of space
 229 overlap.  In the case of overlap, @code{memmove} is careful to copy the
 230 original values of the bytes in the block at @var{from}, including those
 231 bytes which also belong to the block at @var{to}.
 232 @end deftypefun
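
A common case of overlap is shifting data within a single buffer.  The
sketch below uses a hypothetical helper, @code{drop_prefix}, to remove
leading characters from a string in place; because source and
destination share storage, it calls @code{memmove} rather than
@code{memcpy}.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove the first COUNT characters of the
   string S in place and return S.  COUNT must not exceed the length
   of S.  */
char *
drop_prefix (char *s, size_t count)
{
  assert (s != NULL);
  size_t len = strlen (s);
  assert (count <= len);

  /* The blocks at S + COUNT and at S overlap, so memmove is required.
     Moving LEN - COUNT + 1 bytes includes the null character.  */
  memmove (s, s + count, len - count + 1);
  return s;
}
```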
 233
 234 @comment string.h
 235 @comment SVID
 236 @deftypefun {void *} memccpy (void *@var{to}, const void *@var{from}, int @var{c}, size_t @var{size})
 237 This function copies no more than @var{size} bytes from @var{from} to
 238 @var{to}, stopping if a byte matching @var{c} is found.  The return
 239 value is a pointer into @var{to} one byte past where @var{c} was copied,
 240 or a null pointer if no byte matching @var{c} appeared in the first
 241 @var{size} bytes of @var{from}.
 242 @end deftypefun
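
As an illustration of the return value, the sketch below extracts the
text before the first colon of a record.  The helper @code{copy_key}
and the record format are assumptions made for this example.

```c
#define _GNU_SOURCE 1   /* Ask glibc to declare memccpy.  */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the text before the first ':' in RECORD
   into TO, which has room for TO_SIZE bytes, and null-terminate it.
   Return 1 on success, or 0 if no ':' was copied, either because
   RECORD has none or because TO is too small.  */
int
copy_key (char *to, size_t to_size, const char *record)
{
  assert (to != NULL && record != NULL);

  /* Never read past the end of RECORD, never write past the end of TO.  */
  size_t limit = strlen (record) + 1;
  if (limit > to_size)
    limit = to_size;

  char *end = memccpy (to, record, ':', limit);
  if (end == NULL)
    return 0;

  /* END points one byte past the copied ':'; overwrite the ':'.  */
  end[-1] = '\0';
  return 1;
}
```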
 243
 244 @comment string.h
 245 @comment ISO
 246 @deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
 247 This function copies the value of @var{c} (converted to an
 248 @code{unsigned char}) into each of the first @var{size} bytes of the
 249 object beginning at @var{block}.  It returns the value of @var{block}.
 250 @end deftypefun
 251
 252 @comment string.h
 253 @comment ISO
 254 @deftypefun {char *} strcpy (char *@var{to}, const char *@var{from})
 255 This copies characters from the string @var{from} (up to and including
 256 the terminating null character) into the string @var{to}.  Like
 257 @code{memcpy}, this function has undefined results if the strings
 258 overlap.  The return value is the value of @var{to}.
 259 @end deftypefun
 260
 261 @comment string.h
 262 @comment ISO
 263 @deftypefun {char *} strncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
 264 This function is similar to @code{strcpy} but always copies exactly
 265 @var{size} characters into @var{to}.
 266
If the length of @var{from} is @var{size} or more, then @code{strncpy}
 268 copies just the first @var{size} characters.  Note that in this case
 269 there is no null terminator written into @var{to}.
 270
 271 If the length of @var{from} is less than @var{size}, then @code{strncpy}
 272 copies all of @var{from}, followed by enough null characters to add up
 273 to @var{size} characters in all.  This behavior is rarely useful, but it
 274 is specified by the @w{ISO C} standard.
 275
 276 The behavior of @code{strncpy} is undefined if the strings overlap.
 277
 278 Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs
 279 relating to writing past the end of the allocated space for @var{to}.
 280 However, it can also make your program much slower in one common case:
 281 copying a string which is probably small into a potentially large buffer.
 282 In this case, @var{size} may be large, and when it is, @code{strncpy} will
 283 waste a considerable amount of time copying null characters.
 284 @end deftypefun
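
Because @code{strncpy} may leave @var{to} without a null terminator,
callers usually add one themselves.  The following is a minimal sketch
of that idiom; @code{copy_truncated} is a hypothetical name, not a
library function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy FROM into TO, which has room for TO_SIZE
   bytes, truncating if necessary.  TO is always null-terminated.
   Return 1 if all of FROM fit, or 0 if it was truncated.  */
int
copy_truncated (char *to, size_t to_size, const char *from)
{
  assert (to != NULL && from != NULL && to_size > 0);

  /* Copy at most TO_SIZE - 1 characters, leaving room for the null.  */
  strncpy (to, from, to_size - 1);
  /* strncpy writes no terminator when FROM is too long, so add one.  */
  to[to_size - 1] = '\0';

  return strlen (from) < to_size;
}
```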
 285
 286 @comment string.h
 287 @comment SVID
 288 @deftypefun {char *} strdup (const char *@var{s})
 289 This function copies the null-terminated string @var{s} into a newly
 290 allocated string.  The string is allocated using @code{malloc}; see
 291 @ref{Unconstrained Allocation}.  If @code{malloc} cannot allocate space
 292 for the new string, @code{strdup} returns a null pointer.  Otherwise it
 293 returns a pointer to the new string.
 294 @end deftypefun
 295
 296 @comment string.h
 297 @comment GNU
 298 @deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
 299 This function is similar to @code{strdup} but always copies at most
 300 @var{size} characters into the newly allocated string.
 301
 302 If the length of @var{s} is more than @var{size}, then @code{strndup}
 303 copies just the first @var{size} characters and adds a closing null
 304 terminator.  Otherwise all characters are copied and the string is
 305 terminated.
 306
This function differs from @code{strncpy} in that it always
terminates the destination string.
 309 @end deftypefun
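
One plausible use of @code{strndup} is to copy a leading portion of a
longer string.  In the sketch below, the helper @code{first_word} is
hypothetical, and @code{strcspn} (also declared in @file{string.h}) is
used to measure the portion to copy.

```c
#define _GNU_SOURCE 1   /* Ask glibc to declare strndup.  */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the
   characters of TEXT that precede its first space (or all of TEXT if
   it contains no space).  Return a null pointer if allocation fails.
   The caller frees the result.  */
char *
first_word (const char *text)
{
  assert (text != NULL);

  /* Number of leading characters that are not a space.  */
  size_t word_len = strcspn (text, " ");

  /* strndup copies at most WORD_LEN characters and always adds the
     terminating null character.  */
  return strndup (text, word_len);
}
```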
 310
 311 @comment string.h
 312 @comment Unknown origin
 313 @deftypefun {char *} stpcpy (char *@var{to}, const char *@var{from})
 314 This function is like @code{strcpy}, except that it returns a pointer to
 315 the end of the string @var{to} (that is, the address of the terminating
 316 null character) rather than the beginning.
 317
 318 For example, this program uses @code{stpcpy} to concatenate @samp{foo}
 319 and @samp{bar} to produce @samp{foobar}, which it then prints.
 320
 321 @smallexample
 322 @include stpcpy.c.texi
 323 @end smallexample
 324
 325 This function is not part of the ISO or POSIX standards, and is not
 326 customary on Unix systems, but we did not invent it either.  Perhaps it
 327 comes from MS-DOG.
 328
 329 Its behavior is undefined if the strings overlap.
 330 @end deftypefun
 331
 332 @comment string.h
 333 @comment GNU
 334 @deftypefun {char *} stpncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
This function is similar to @code{stpcpy} but always copies exactly
@var{size} characters into @var{to}.

If the length of @var{from} is @var{size} or more, then @code{stpncpy}
copies just the first @var{size} characters and returns a pointer to the
character directly following the one which was copied last.  Note that in
this case there is no null terminator written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null characters to add up
to @var{size} characters in all.  This behavior is rarely useful, but it
is provided for contexts that rely on the same padding behavior of
@code{strncpy}.  @code{stpncpy} returns a pointer to the
@emph{first} written null character.

This function is not part of ISO or POSIX but was found useful while
developing the GNU C Library itself.

Its behavior is undefined if the strings overlap.
 354 @end deftypefun
 355
 356 @comment string.h
 357 @comment GNU
 358 @deftypefun {char *} strdupa (const char *@var{s})
This function is similar to @code{strdup} but allocates the new string
using @code{alloca} instead of @code{malloc}
(@pxref{Variable Size Automatic}).  This means of course the returned
string has the same limitations as any block of memory allocated using
@code{alloca}.

Because @code{alloca} must allocate space in the caller's stack frame,
@code{strdupa} is implemented only as a macro; you cannot take its
address.  Despite this limitation it is a useful function.  The
following code shows a situation where using @code{malloc} would be a
lot more expensive.
 369
 370 @smallexample
 371 @include strdupa.c.texi
 372 @end smallexample
 373
Please note that calling @code{strtok} using @var{path} directly is
invalid.
 376
 377 This function is only available if GNU CC is used.
 378 @end deftypefun
 379
 380 @comment string.h
 381 @comment GNU
 382 @deftypefun {char *} strndupa (const char *@var{s}, size_t @var{size})
This function is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca}
(@pxref{Variable Size Automatic}).  The same advantages and limitations
of @code{strdupa} are valid for @code{strndupa}, too.
 387
 388 This function is implemented only as a macro which means one cannot
 389 get the address of it.
 390
 391 @code{strndupa} is only available if GNU CC is used.
 392 @end deftypefun
 393
 394 @comment string.h
 395 @comment ISO
 396 @deftypefun {char *} strcat (char *@var{to}, const char *@var{from})
 397 The @code{strcat} function is similar to @code{strcpy}, except that the
 398 characters from @var{from} are concatenated or appended to the end of
 399 @var{to}, instead of overwriting it.  That is, the first character from
 400 @var{from} overwrites the null character marking the end of @var{to}.
 401
 402 An equivalent definition for @code{strcat} would be:
 403
 404 @smallexample
 405 char *
 406 strcat (char *to, const char *from)
 407 @{
 408   strcpy (to + strlen (to), from);
 409   return to;
 410 @}
 411 @end smallexample
 412
 413 This function has undefined results if the strings overlap.
 414 @end deftypefun
 415
 416 @comment string.h
 417 @comment ISO
 418 @deftypefun {char *} strncat (char *@var{to}, const char *@var{from}, size_t @var{size})
 419 This function is like @code{strcat} except that not more than @var{size}
 420 characters from @var{from} are appended to the end of @var{to}.  A
 421 single null character is also always appended to @var{to}, so the total
 422 allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
 423 longer than its initial length.
 424
 425 The @code{strncat} function could be implemented like this:
 426
 427 @smallexample
 428 @group
 429 char *
 430 strncat (char *to, const char *from, size_t size)
 431 @{
  to[strlen (to) + size] = '\0';
  strncpy (to + strlen (to), from, size);
 433   return to;
 434 @}
 435 @end group
 436 @end smallexample
 437
 438 The behavior of @code{strncat} is undefined if the strings overlap.
 439 @end deftypefun
 440
 441 Here is an example showing the use of @code{strncpy} and @code{strncat}.
 442 Notice how, in the call to @code{strncat}, the @var{size} parameter
 443 is computed to avoid overflowing the character array @code{buffer}.
 444
 445 @smallexample
 446 @include strncat.c.texi
 447 @end smallexample
 448
 449 @noindent
 450 The output produced by this program looks like:
 451
 452 @smallexample
 453 hello
 454 hello, wo
 455 @end smallexample
 456
 457 @comment string.h
 458 @comment BSD
@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
 460 This is a partially obsolete alternative for @code{memmove}, derived from
 461 BSD.  Note that it is not quite equivalent to @code{memmove}, because the
 462 arguments are not in the same order.
 463 @end deftypefun
 464
 465 @comment string.h
 466 @comment BSD
@deftypefun void bzero (void *@var{block}, size_t @var{size})
 468 This is a partially obsolete alternative for @code{memset}, derived from
 469 BSD.  Note that it is not as general as @code{memset}, because the only
 470 value it can store is zero.
 471 @end deftypefun
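
When porting code away from the BSD functions, the main thing to watch
is the argument order.  The sketch below expresses both operations in
terms of the ISO functions; the wrapper names @code{copy_bytes} and
@code{zero_bytes} are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for bcopy: the source comes first here,
   whereas memmove takes the destination first.  */
void
copy_bytes (const void *from, void *to, size_t size)
{
  assert (from != NULL && to != NULL);
  memmove (to, from, size);
}

/* Hypothetical stand-in for bzero: memset with a fill value of zero.  */
void
zero_bytes (void *block, size_t size)
{
  assert (block != NULL);
  memset (block, 0, size);
}
```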
 472
 473 @node String/Array Comparison, Collation Functions, Copying and Concatenation, String and Array Utilities
 474 @section String/Array Comparison
 475 @cindex comparing strings and arrays
 476 @cindex string comparison functions
 477 @cindex array comparison functions
 478 @cindex predicates on strings
 479 @cindex predicates on arrays
 480
 481 You can use the functions in this section to perform comparisons on the
 482 contents of strings and arrays.  As well as checking for equality, these
 483 functions can also be used as the ordering functions for sorting
 484 operations.  @xref{Searching and Sorting}, for an example of this.
 485
 486 Unlike most comparison operations in C, the string comparison functions
 487 return a nonzero value if the strings are @emph{not} equivalent rather
 488 than if they are.  The sign of the value indicates the relative ordering
 489 of the first characters in the strings that are not equivalent:  a
 490 negative value indicates that the first string is ``less'' than the
 491 second, while a positive value indicates that the first string is
 492 ``greater''.
 493
 494 The most common use of these functions is to check only for equality.
 495 This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
 496
 497 All of these functions are declared in the header file @file{string.h}.
 498 @pindex string.h
 499
 500 @comment string.h
 501 @comment ISO
 502 @deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
 503 The function @code{memcmp} compares the @var{size} bytes of memory
 504 beginning at @var{a1} against the @var{size} bytes of memory beginning
 505 at @var{a2}.  The value returned has the same sign as the difference
 506 between the first differing pair of bytes (interpreted as @code{unsigned
 507 char} objects, then promoted to @code{int}).
 508
 509 If the contents of the two blocks are equal, @code{memcmp} returns
 510 @code{0}.
 511 @end deftypefun
 512
 513 On arbitrary arrays, the @code{memcmp} function is mostly useful for
 514 testing equality.  It usually isn't meaningful to do byte-wise ordering
 515 comparisons on arrays of things other than bytes.  For example, a
 516 byte-wise comparison on the bytes that make up floating-point numbers
 517 isn't likely to tell you anything about the relationship between the
 518 values of the floating-point numbers.
 519
 520 You should also be careful about using @code{memcmp} to compare objects
 521 that can contain ``holes'', such as the padding inserted into structure
 522 objects to enforce alignment requirements, extra space at the end of
 523 unions, and extra characters at the ends of strings whose length is less
 524 than their allocated size.  The contents of these ``holes'' are
 525 indeterminate and may cause strange behavior when performing byte-wise
 526 comparisons.  For more predictable results, perform an explicit
 527 component-wise comparison.
 528
 529 For example, given a structure type definition like:
 530
 531 @smallexample
 532 struct foo
 533   @{
 534     unsigned char tag;
 535     union
 536       @{
 537         double f;
 538         long i;
 539         char *p;
 540       @} value;
 541   @};
 542 @end smallexample
 543
 544 @noindent
 545 you are better off writing a specialized comparison function to compare
 546 @code{struct foo} objects instead of comparing them with @code{memcmp}.
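
Such a comparison function might look like the sketch below.  It
assumes that @code{tag} records which union member is in use; the
@code{enum foo_tag} values and the name @code{foo_equal} are invented
for this illustration, since the structure definition does not say how
the tag is interpreted.

```c
#include <assert.h>
#include <string.h>

/* Assumed tag values, one per union member.  */
enum foo_tag { FOO_DOUBLE, FOO_LONG, FOO_STRING };

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Return nonzero if A and B have the same tag and the same value in
   the member selected by that tag.  Padding bytes and inactive union
   members are never examined.  */
int
foo_equal (const struct foo *a, const struct foo *b)
{
  assert (a != NULL && b != NULL);
  if (a->tag != b->tag)
    return 0;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      /* Compare the strings pointed to, not the pointers.  */
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    }
}
```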
 547
 548 @comment string.h
 549 @comment ISO
 550 @deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
 551 The @code{strcmp} function compares the string @var{s1} against
 552 @var{s2}, returning a value that has the same sign as the difference
 553 between the first differing pair of characters (interpreted as
 554 @code{unsigned char} objects, then promoted to @code{int}).
 555
 556 If the two strings are equal, @code{strcmp} returns @code{0}.
 557
 558 A consequence of the ordering used by @code{strcmp} is that if @var{s1}
 559 is an initial substring of @var{s2}, then @var{s1} is considered to be
 560 ``less than'' @var{s2}.
 561 @end deftypefun
 562
 563 @comment string.h
 564 @comment BSD
 565 @deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
 566 This function is like @code{strcmp}, except that differences in case
 567 are ignored.
 568
 569 @code{strcasecmp} is derived from BSD.
 570 @end deftypefun
 571
 572 @comment string.h
 573 @comment BSD
 574 @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
 575 This function is like @code{strncmp}, except that differences in case
 576 are ignored.
 577
 578 @code{strncasecmp} is a GNU extension.
 579 @end deftypefun
 580
 581 @comment string.h
 582 @comment ISO
 583 @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
This function is similar to @code{strcmp}, except that no more than
 585 @var{size} characters are compared.  In other words, if the two strings are
 586 the same in their first @var{size} characters, the return value is zero.
 587 @end deftypefun
 588
 589 Here are some examples showing the use of @code{strcmp} and @code{strncmp}.
 590 These examples assume the use of the ASCII character set.  (If some
 591 other character set---say, EBCDIC---is used instead, then the glyphs
 592 are associated with different numeric codes, and the return values
 593 and ordering may differ.)
 594
 595 @smallexample
 596 strcmp ("hello", "hello")
 597     @result{} 0    /* @r{These two strings are the same.} */
 598 strcmp ("hello", "Hello")
 599     @result{} 32   /* @r{Comparisons are case-sensitive.} */
 600 strcmp ("hello", "world")
 601     @result{} -15  /* @r{The character @code{'h'} comes before @code{'w'}.} */
 602 strcmp ("hello", "hello, world")
 603     @result{} -44  /* @r{Comparing a null character against a comma.} */
 604 strncmp ("hello", "hello, world", 5)
 605     @result{} 0    /* @r{The initial 5 characters are the same.} */
 606 strncmp ("hello, world", "hello, stupid world!!!", 5)
 607     @result{} 0    /* @r{The initial 5 characters are the same.} */
 608 @end smallexample
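
@w{ISO C} specifies only the sign of the value returned by
@code{strcmp} and @code{strncmp}, so portable code should test the
result against zero rather than against particular values such as
@code{32} or @code{-15}.  A minimal sketch, using the hypothetical
helper @code{string_order} to reduce the result to -1, 0, or 1:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return -1, 0, or 1 according to whether S1
   sorts before, equal to, or after S2 under strcmp.  */
int
string_order (const char *s1, const char *s2)
{
  assert (s1 != NULL && s2 != NULL);
  int result = strcmp (s1, s2);
  /* Keep only the sign, which is all the standard guarantees.  */
  return (result > 0) - (result < 0);
}
```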
 609
 610 @comment string.h
 611 @comment BSD
 612 @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
 613 This is an obsolete alias for @code{memcmp}, derived from BSD.
 614 @end deftypefun
 615
 616 @node Collation Functions, Search Functions, String/Array Comparison, String and Array Utilities
 617 @section Collation Functions
 618
 619 @cindex collating strings
 620 @cindex string collation functions
 621
 622 In some locales, the conventions for lexicographic ordering differ from
 623 the strict numeric ordering of character codes.  For example, in Spanish
 624 most glyphs with diacritical marks such as accents are not considered
 625 distinct letters for the purposes of collation.  On the other hand, the
 626 two-character sequence @samp{ll} is treated as a single letter that is
 627 collated immediately after @samp{l}.
 628
 629 You can use the functions @code{strcoll} and @code{strxfrm} (declared in
 630 the header file @file{string.h}) to compare strings using a collation
 631 ordering appropriate for the current locale.  The locale used by these
 632 functions in particular can be specified by setting the locale for the
 633 @code{LC_COLLATE} category; see @ref{Locales}.
 634 @pindex string.h
 635
 636 In the standard C locale, the collation sequence for @code{strcoll} is
 637 the same as that for @code{strcmp}.
 638
 639 Effectively, the way these functions work is by applying a mapping to
 640 transform the characters in a string to a byte sequence that represents
 641 the string's position in the collating sequence of the current locale.
 642 Comparing two such byte sequences in a simple fashion is equivalent to
 643 comparing the strings with the locale's collating sequence.
 644
 645 The function @code{strcoll} performs this translation implicitly, in
 646 order to do one comparison.  By contrast, @code{strxfrm} performs the
 647 mapping explicitly.  If you are making multiple comparisons using the
 648 same string or set of strings, it is likely to be more efficient to use
 649 @code{strxfrm} to transform all the strings just once, and subsequently
 650 compare the transformed strings with @code{strcmp}.
 651
 652 @comment string.h
 653 @comment ISO
 654 @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
 655 The @code{strcoll} function is similar to @code{strcmp} but uses the
 656 collating sequence of the current locale for collation (the
 657 @code{LC_COLLATE} locale).
 658 @end deftypefun
 659
 660 Here is an example of sorting an array of strings, using @code{strcoll}
 661 to compare them.  The actual sort algorithm is not written here; it
 662 comes from @code{qsort} (@pxref{Array Sort Function}).  The job of the
 663 code shown here is to say how to compare the strings while sorting them.
 664 (Later on in this section, we will show a way to do this more
 665 efficiently using @code{strxfrm}.)
 666
 667 @smallexample
 668 /* @r{This is the comparison function used with @code{qsort}.} */
 669
int
compare_elements (const void *v1, const void *v2)
@{
  char *const *p1 = v1;
  char *const *p2 = v2;
  return strcoll (*p1, *p2);
@}
 675
 676 /* @r{This is the entry point---the function to sort}
 677    @r{strings using the locale's collating sequence.} */
 678
 679 void
 680 sort_strings (char **array, int nstrings)
 681 @{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
 685 @}
 686 @end smallexample
 687
 688 @cindex converting string to collation order
 689 @comment string.h
 690 @comment ISO
 691 @deftypefun size_t strxfrm (char *@var{to}, const char *@var{from}, size_t @var{size})
The function @code{strxfrm} transforms @var{from} using the collation
 693 transformation determined by the locale currently selected for
 694 collation, and stores the transformed string in the array @var{to}.  Up
 695 to @var{size} characters (including a terminating null character) are
 696 stored.
 697
 698 The behavior is undefined if the strings @var{to} and @var{from}
 699 overlap; see @ref{Copying and Concatenation}.
 700
 701 The return value is the length of the entire transformed string.  This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
entirely fit in the array @var{to}.  In this case, only as much of the
 705 string as actually fits was stored.  To get the whole transformed
 706 string, call @code{strxfrm} again with a bigger output array.
 707
 708 The transformed string may be longer than the original string, and it
 709 may also be shorter.
 710
 711 If @var{size} is zero, no characters are stored in @var{to}.  In this
 712 case, @code{strxfrm} simply returns the number of characters that would
 713 be the length of the transformed string.  This is useful for determining
 714 what size string to allocate.  It does not matter what @var{to} is if
 715 @var{size} is zero; @var{to} may even be a null pointer.
 716 @end deftypefun
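
The size-zero convention described above can be used to allocate a
buffer of exactly the right size.  The following is a minimal sketch,
not a definitive implementation; the helper name @code{xfrm_dup} is
hypothetical, and it uses plain @code{malloc} rather than the
@code{xmalloc} wrapper used elsewhere in this chapter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of FROM,
   transformed for the locale currently selected for collation, or a
   null pointer if allocation fails.  The caller frees the result.  */
char *
xfrm_dup (const char *from)
{
  /* With a size of zero nothing is stored, so TO may be a null
     pointer.  The return value is the length of the transformed
     string, not counting the terminating null character.  */
  size_t needed = strxfrm (NULL, from, 0);
  char *to = malloc (needed + 1);

  if (to != NULL)
    strxfrm (to, from, needed + 1);
  return to;
}
```

Two strings transformed this way can then be compared with
@code{strcmp} as often as needed.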
 717
 718 Here is an example of how you can use @code{strxfrm} when
 719 you plan to do many comparisons.  It does the same thing as the previous
 720 example, but much faster, because it has to transform each string only
 721 once, no matter how many times it is compared with other strings.  Even
 722 the time needed to allocate and free storage is much less than the time
 723 we save, when there are many strings.
 724
 725 @smallexample
 726 struct sorter @{ char *input; char *transformed; @};
 727
 728 /* @r{This is the comparison function used with @code{qsort}}
 729    @r{to sort an array of @code{struct sorter}.} */
 730
int
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
@}
 736
 737 /* @r{This is the entry point---the function to sort}
 738    @r{strings using the locale's collating sequence.} */
 739
 740 void
 741 sort_strings_fast (char **array, int nstrings)
 742 @{
 743   struct sorter temp_array[nstrings];
 744   int i;
 745
 746   /* @r{Set up @code{temp_array}.  Each element contains}
 747      @r{one input string and its transformed string.} */
 748   for (i = 0; i < nstrings; i++)
 749     @{
 750       size_t length = strlen (array[i]) * 2;
 751       char *transformed;
      size_t transformed_length;
 753
 754       temp_array[i].input = array[i];
 755
 756       /* @r{First try a buffer perhaps big enough.}  */
 757       transformed = (char *) xmalloc (length);
 758
 759       /* @r{Transform @code{array[i]}.}  */
 760       transformed_length = strxfrm (transformed, array[i], length);
 761
 762       /* @r{If the buffer was not large enough, resize it}
 763          @r{and try again.}  */
 764       if (transformed_length >= length)
 765         @{
 766           /* @r{Allocate the needed space. +1 for terminating}
 767              @r{@code{NUL} character.}  */
 768           transformed = (char *) xrealloc (transformed,
 769                                            transformed_length + 1);
 770
 771           /* @r{The return value is not interesting because we know}
 772              @r{how long the transformed string is.}  */
 773           (void) strxfrm (transformed, array[i], transformed_length + 1);
 774         @}
 775
 776       temp_array[i].transformed = transformed;
 777     @}
 778
 779   /* @r{Sort @code{temp_array} by comparing transformed strings.} */
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);
 782
 783   /* @r{Put the elements back in the permanent array}
 784      @r{in their sorted order.} */
 785   for (i = 0; i < nstrings; i++)
 786     array[i] = temp_array[i].input;
 787
 788   /* @r{Free the strings we allocated.} */
 789   for (i = 0; i < nstrings; i++)
 790     free (temp_array[i].transformed);
 791 @}
 792 @end smallexample
 793
 794 @strong{Compatibility Note:}  The string collation functions are a new
 795 feature of @w{ISO C}.  Older C dialects have no equivalent feature.
 796
 797 @node Search Functions, Finding Tokens in a String, Collation Functions, String and Array Utilities
 798 @section Search Functions
 799
 800 This section describes library functions which perform various kinds
 801 of searching operations on strings and arrays.  These functions are
 802 declared in the header file @file{string.h}.
 803 @pindex string.h
 804 @cindex search functions (for strings)
 805 @cindex string search functions
 806
 807 @comment string.h
 808 @comment ISO
 809 @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
 810 This function finds the first occurrence of the byte @var{c} (converted
 811 to an @code{unsigned char}) in the initial @var{size} bytes of the
 812 object beginning at @var{block}.  The return value is a pointer to the
 813 located byte, or a null pointer if no match was found.
 814 @end deftypefun
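
Because @code{memchr} takes an explicit size, it can search buffers
that are not null-terminated or that contain null bytes.  As a small
sketch (the helper name @code{line_length} is an assumption made for
this illustration), the following finds the length of the first line
in a buffer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes in BUF before the
   first newline, or SIZE if the buffer contains no newline.  */
size_t
line_length (const char *buf, size_t size)
{
  const char *newline = memchr (buf, '\n', size);

  return newline != NULL ? (size_t) (newline - buf) : size;
}
```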
 815
 816 @comment string.h
 817 @comment ISO
 818 @deftypefun {char *} strchr (const char *@var{string}, int @var{c})
 819 The @code{strchr} function finds the first occurrence of the character
 820 @var{c} (converted to a @code{char}) in the null-terminated string
 821 beginning at @var{string}.  The return value is a pointer to the located
 822 character, or a null pointer if no match was found.
 823
 824 For example,
 825 @smallexample
 826 strchr ("hello, world", 'l')
 827     @result{} "llo, world"
 828 strchr ("hello, world", '?')
 829     @result{} NULL
 830 @end smallexample
 831
The terminating null character is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying a null character as the value of the @var{c} argument.
 835 @end deftypefun
 836
 837 @comment string.h
 838 @comment BSD
 839 @deftypefun {char *} index (const char *@var{string}, int @var{c})
 840 @code{index} is another name for @code{strchr}; they are exactly the same.
 841 @end deftypefun
 842
 843 @comment string.h
 844 @comment ISO
 845 @deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
 846 The function @code{strrchr} is like @code{strchr}, except that it searches
 847 backwards from the end of the string @var{string} (instead of forwards
 848 from the front).
 849
 850 For example,
 851 @smallexample
 852 strrchr ("hello, world", 'l')
 853     @result{} "ld"
 854 @end smallexample
 855 @end deftypefun
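
A common use of @code{strrchr} is to pick out the last component of a
file name.  The sketch below assumes @samp{/} as the directory
separator; the helper name @code{base_name} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the part of PATH that
   follows the last '/', or PATH itself if it contains no '/'.  */
const char *
base_name (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```

Note that a path ending in @samp{/} yields an empty string with this
approach.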
 856
 857 @comment string.h
 858 @comment BSD
 859 @deftypefun {char *} rindex (const char *@var{string}, int @var{c})
 860 @code{rindex} is another name for @code{strrchr}; they are exactly the same.
 861 @end deftypefun
 862
 863 @comment string.h
 864 @comment ISO
 865 @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
This is like @code{strchr}, except that it searches @var{haystack} for a
substring @var{needle} rather than just a single character.  It
returns a pointer to the first character of the first occurrence of
@var{needle} in @var{haystack}, or a null pointer if no match was found.  If
@var{needle} is an empty string, the function returns @var{haystack}.
 871
 872 For example,
 873 @smallexample
 874 strstr ("hello, world", "l")
 875     @result{} "llo, world"
 876 strstr ("hello, world", "wo")
 877     @result{} "world"
 878 @end smallexample
 879 @end deftypefun
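
Since @code{strstr} returns a pointer into @var{haystack}, you can
continue a search from just past the previous match.  The following
sketch counts non-overlapping occurrences; the helper name
@code{count_occurrences} is hypothetical, and treating an empty
@var{needle} as zero matches is a choice made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of
   NEEDLE in HAYSTACK.  An empty NEEDLE is counted as zero matches,
   since strstr would otherwise match at every position.  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t step = strlen (needle);
  size_t count = 0;

  if (step == 0)
    return 0;

  while ((haystack = strstr (haystack, needle)) != NULL)
    {
      count++;
      haystack += step;         /* Resume just past this match.  */
    }
  return count;
}
```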
 880
 881
 882 @comment string.h
 883 @comment GNU
 884 @deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
 885 This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
 886 arrays rather than null-terminated strings.  @var{needle-len} is the
 887 length of @var{needle} and @var{haystack-len} is the length of
 888 @var{haystack}.@refill
 889
 890 This function is a GNU extension.
 891 @end deftypefun
 892
 893 @comment string.h
 894 @comment ISO
 895 @deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
 896 The @code{strspn} (``string span'') function returns the length of the
 897 initial substring of @var{string} that consists entirely of characters that
 898 are members of the set specified by the string @var{skipset}.  The order
 899 of the characters in @var{skipset} is not important.
 900
 901 For example,
 902 @smallexample
 903 strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
 904     @result{} 5
 905 @end smallexample
 906 @end deftypefun
 907
 908 @comment string.h
 909 @comment ISO
 910 @deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
 911 The @code{strcspn} (``string complement span'') function returns the length
 912 of the initial substring of @var{string} that consists entirely of characters
 913 that are @emph{not} members of the set specified by the string @var{stopset}.
 914 (In other words, it returns the offset of the first character in @var{string}
 915 that is a member of the set @var{stopset}.)
 916
 917 For example,
 918 @smallexample
 919 strcspn ("hello, world", " \t\n,.;!?")
 920     @result{} 5
 921 @end smallexample
 922 @end deftypefun
 923
 924 @comment string.h
 925 @comment ISO
 926 @deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
The @code{strpbrk} (``string pointer break'') function is related to
@code{strcspn}, but instead of the length of the initial substring it
returns a pointer to the first character in @var{string} that is a
member of the set @var{stopset}.  It returns a null pointer if no
character from @var{stopset} is found.
 932
 933 @c @group  Invalid outside the example.
 934 For example,
 935
 936 @smallexample
 937 strpbrk ("hello, world", " \t\n,.;!?")
 938     @result{} ", world"
 939 @end smallexample
 940 @c @end group
 941 @end deftypefun
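
Used together, @code{strspn} and @code{strcspn} can step through the
words of a string without modifying it.  This is only a sketch under
that assumption; the helper name @code{count_words} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the maximal runs of characters in
   STRING that are not members of DELIMITERS.  STRING is not
   modified.  */
size_t
count_words (const char *string, const char *delimiters)
{
  size_t count = 0;

  /* Skip any leading delimiters.  */
  string += strspn (string, delimiters);
  while (*string != '\0')
    {
      count++;
      string += strcspn (string, delimiters);   /* Skip the word.  */
      string += strspn (string, delimiters);    /* Skip delimiters.  */
    }
  return count;
}
```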
 942
 943 @node Finding Tokens in a String,  , Search Functions, String and Array Utilities
 944 @section Finding Tokens in a String
 945
 946 @cindex tokenizing strings
 947 @cindex breaking a string into tokens
 948 @cindex parsing tokens from a string
Programs commonly need to do simple kinds of lexical analysis and
parsing, such as splitting a command string up into tokens.  You can do
this with the @code{strtok} function, declared in the header file
@file{string.h}.
 953 @pindex string.h
 954
 955 @comment string.h
 956 @comment ISO
 957 @deftypefun {char *} strtok (char *@var{newstring}, const char *@var{delimiters})
 958 A string can be split into tokens by making a series of calls to the
 959 function @code{strtok}.
 960
 961 The string to be split up is passed as the @var{newstring} argument on
 962 the first call only.  The @code{strtok} function uses this to set up
 963 some internal state information.  Subsequent calls to get additional
 964 tokens from the same string are indicated by passing a null pointer as
 965 the @var{newstring} argument.  Calling @code{strtok} with another
 966 non-null @var{newstring} argument reinitializes the state information.
 967 It is guaranteed that no other library function ever calls @code{strtok}
 968 behind your back (which would mess up this internal state information).
 969
 970 The @var{delimiters} argument is a string that specifies a set of delimiters
 971 that may surround the token being extracted.  All the initial characters
 972 that are members of this set are discarded.  The first character that is
 973 @emph{not} a member of this set of delimiters marks the beginning of the
 974 next token.  The end of the token is found by looking for the next
 975 character that is a member of the delimiter set.  This character in the
 976 original string @var{newstring} is overwritten by a null character, and the
 977 pointer to the beginning of the token in @var{newstring} is returned.
 978
 979 On the next call to @code{strtok}, the searching begins at the next
 980 character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.
 983
If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter characters, @code{strtok} returns
a null pointer.
 987 @end deftypefun
 988
@strong{Warning:} Since @code{strtok} alters the string it is parsing,
you should always copy the string to a temporary buffer before parsing it
with @code{strtok}.  If you allow @code{strtok} to modify a string that came
 992 from another part of your program, you are asking for trouble; that
 993 string may be part of a data structure that could be used for other
 994 purposes during the parsing, when alteration by @code{strtok} makes the
 995 data structure temporarily inaccurate.
 996
 997 The string that you are operating on might even be a constant.  Then
 998 when @code{strtok} tries to modify it, your program will get a fatal
 999 signal for writing in read-only memory.  @xref{Program Error Signals}.
1000
1001 This is a special case of a general principle: if a part of a program
1002 does not have as its purpose the modification of a certain data
1003 structure, then it is error-prone to modify the data structure
1004 temporarily.
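
The advice above can be sketched as follows: the function takes a
@code{const} string, copies it, and lets @code{strtok} modify only
the copy.  The helper name @code{count_tokens} is hypothetical, and
plain @code{malloc} is used here for the temporary buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the tokens in STRING without modifying
   it.  Returns -1 if the temporary copy cannot be allocated.  */
int
count_tokens (const char *string, const char *delimiters)
{
  char *copy = malloc (strlen (string) + 1);
  char *token;
  int count = 0;

  if (copy == NULL)
    return -1;
  strcpy (copy, string);

  /* strtok writes null characters into COPY, never into STRING.  */
  for (token = strtok (copy, delimiters);
       token != NULL;
       token = strtok (NULL, delimiters))
    count++;

  free (copy);
  return count;
}
```

Because only the copy is altered, this is safe to call even on a
string constant.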
1005
1006 The function @code{strtok} is not reentrant.  @xref{Nonreentrancy}, for
1007 a discussion of where and why reentrancy is important.
1008
1009 Here is a simple example showing the use of @code{strtok}.
1010
1011 @comment Yes, this example has been tested.
1012 @smallexample
1013 #include <string.h>
1014 #include <stddef.h>
1015
1016 @dots{}
1017
1018 char string[] = "words separated by spaces -- and, punctuation!";
1019 const char delimiters[] = " .,;:!-";
1020 char *token;
1021
1022 @dots{}
1023
1024 token = strtok (string, delimiters);  /* token => "words" */
1025 token = strtok (NULL, delimiters);    /* token => "separated" */
1026 token = strtok (NULL, delimiters);    /* token => "by" */
1027 token = strtok (NULL, delimiters);    /* token => "spaces" */
1028 token = strtok (NULL, delimiters);    /* token => "and" */
1029 token = strtok (NULL, delimiters);    /* token => "punctuation" */
1030 token = strtok (NULL, delimiters);    /* token => NULL */
1031 @end smallexample
1032
1033 The GNU C library contains two more functions for tokenizing a string
1034 which overcome the limitation of non-reentrancy.
1035
1036 @comment string.h
1037 @comment POSIX
1038 @deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
Just like @code{strtok}, this function splits the string into several
tokens which can be accessed by successive calls to @code{strtok_r}.
The difference is that the information about the next token is not kept
in internal state.  Instead, the caller has to provide another argument
@var{save_ptr}, which is a pointer to a string pointer.  Calling
@code{strtok_r} with a null pointer for @var{newstring} and leaving
@var{save_ptr} unchanged between the calls does the job without
limiting reentrancy.
1047
1048 This function was proposed for POSIX.1b and can be found on many systems
1049 which support multi-threading.
1050 @end deftypefun
1051
1052 @comment string.h
1053 @comment BSD
1054 @deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
A second reentrant approach does without the separate @var{newstring}
argument: @var{string_ptr} serves both as the string to split and as
the saved position.  The user has to initialize this moving pointer
before the first call.
1057 Successive calls of @code{strsep} move the pointer along the tokens
1058 separated by @var{delimiter}, returning the address of the next token
1059 and updating @var{string_ptr} to point to the beginning of the next
1060 token.
1061
1062 This function was introduced in 4.3BSD and therefore is widely available.
1063 @end deftypefun
1064
Here is how the above example looks when @code{strsep} is used.  Unlike
@code{strtok}, @code{strsep} returns an empty string for each pair of
adjacent delimiters, so a program normally should test for an empty
token before processing it.

@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;

@dots{}

running = string;
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */
@end smallexample