]> git.ipfire.org Git - thirdparty/glibc.git/blame - manual/search.texi
.
[thirdparty/glibc.git] / manual / search.texi
CommitLineData
40a55d20 1@node Searching and Sorting, Pattern Matching, Message Translation, Top
7a68c94a 2@c %MENU% General searching and sorting functions
f65fd747 3@chapter Searching and Sorting
28f540f4
RM
4
5This chapter describes functions for searching and sorting arrays of
6arbitrary objects. You pass the appropriate comparison function to be
7applied as an argument, along with the size of the objects in the array
8and the total number of elements.
9
10@menu
11* Comparison Functions:: Defining how to compare two objects.
12 Since the sort and search facilities
13 are general, you have to specify the
f65fd747 14 ordering.
28f540f4
RM
15* Array Search Function:: The @code{bsearch} function.
16* Array Sort Function:: The @code{qsort} function.
17* Search/Sort Example:: An example program.
2604afb1
UD
18* Hash Search Function:: The @code{hsearch} function.
19* Tree Search Function:: The @code{tsearch} function.
28f540f4
RM
20@end menu
21
2604afb1 22@node Comparison Functions
28f540f4
RM
23@section Defining the Comparison Function
24@cindex Comparison Function
25
26In order to use the sorted array library functions, you have to describe
27how to compare the elements of the array.
28
29To do this, you supply a comparison function to compare two elements of
30the array. The library will call this function, passing as arguments
31pointers to two array elements to be compared. Your comparison function
32should return a value the way @code{strcmp} (@pxref{String/Array
33Comparison}) does: negative if the first argument is ``less'' than the
34second, zero if they are ``equal'', and positive if the first argument
35is ``greater''.
36
37Here is an example of a comparison function which works with an array of
38numbers of type @code{double}:
39
40@smallexample
41int
68ef28ed 42compare_doubles (const void *a, const void *b)
28f540f4 43@{
68ef28ed
UD
44 const double *da = (const double *) a;
45 const double *db = (const double *) b;
46
47 return (*da > *db) - (*da < *db);
28f540f4
RM
48@}
49@end smallexample
50
51The header file @file{stdlib.h} defines a name for the data type of
52comparison functions. This type is a GNU extension.
53
54@comment stdlib.h
55@comment GNU
56@tindex comparison_fn_t
57@smallexample
58int comparison_fn_t (const void *, const void *);
59@end smallexample
60
2604afb1 61@node Array Search Function
28f540f4
RM
62@section Array Search Function
63@cindex search function (for arrays)
64@cindex binary search function (for arrays)
65@cindex array search function
66
2604afb1
UD
67Generally searching for a specific element in an array means that
68potentially all elements must be checked. The GNU C library contains
69functions to perform linear search. The prototypes for the following
70two functions can be found in @file{search.h}.
71
72@comment search.h
73@comment SVID
74@deftypefun {void *} lfind (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
75The @code{lfind} function searches in the array with @code{*@var{nmemb}}
76elements of @var{size} bytes pointed to by @var{base} for an element
77which matches the one pointed to by @var{key}. The function pointed to
78by @var{compar} is used decide whether two elements match.
79
80The return value is a pointer to the matching element in the array
81starting at @var{base} if it is found. If no matching element is
82available @code{NULL} is returned.
83
84The mean runtime of this function is @code{*@var{nmemb}}/2. This
32c075e1 85function should only be used elements often get added to or deleted from
2604afb1
UD
86the array in which case it might not be useful to sort the array before
87searching.
88@end deftypefun
89
90@comment search.h
91@comment SVID
92@deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
93The @code{lsearch} function is similar to the @code{lfind} function. It
94searches the given array for an element and returns it if found. The
95difference is that if no matching element is found the @code{lsearch}
96function adds the object pointed to by @var{key} (with a size of
97@var{size} bytes) at the end of the array and it increments the value of
98@code{*@var{nmemb}} to reflect this addition.
99
100This means for the caller that if it is not sure that the array contains
101the element one is searching for the memory allocated for the array
102starting at @var{base} must have room for at least @var{size} more
103bytes. If one is sure the element is in the array it is better to use
104@code{lfind} so having more room in the array is always necessary when
105calling @code{lsearch}.
106@end deftypefun
107
28f540f4
RM
108To search a sorted array for an element matching the key, use the
109@code{bsearch} function. The prototype for this function is in
110the header file @file{stdlib.h}.
111@pindex stdlib.h
112
113@comment stdlib.h
f65fd747 114@comment ISO
28f540f4
RM
115@deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
116The @code{bsearch} function searches the sorted array @var{array} for an object
117that is equivalent to @var{key}. The array contains @var{count} elements,
f65fd747 118each of which is of size @var{size} bytes.
28f540f4
RM
119
120The @var{compare} function is used to perform the comparison. This
121function is called with two pointer arguments and should return an
122integer less than, equal to, or greater than zero corresponding to
123whether its first argument is considered less than, equal to, or greater
124than its second argument. The elements of the @var{array} must already
125be sorted in ascending order according to this comparison function.
126
127The return value is a pointer to the matching array element, or a null
128pointer if no match is found. If the array contains more than one element
129that matches, the one that is returned is unspecified.
130
131This function derives its name from the fact that it is implemented
132using the binary search algorithm.
133@end deftypefun
134
2604afb1 135@node Array Sort Function
28f540f4
RM
136@section Array Sort Function
137@cindex sort function (for arrays)
138@cindex quick sort function (for arrays)
139@cindex array sort function
140
141To sort an array using an arbitrary comparison function, use the
142@code{qsort} function. The prototype for this function is in
143@file{stdlib.h}.
144@pindex stdlib.h
145
146@comment stdlib.h
f65fd747 147@comment ISO
28f540f4
RM
148@deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
149The @var{qsort} function sorts the array @var{array}. The array contains
150@var{count} elements, each of which is of size @var{size}.
151
152The @var{compare} function is used to perform the comparison on the
153array elements. This function is called with two pointer arguments and
154should return an integer less than, equal to, or greater than zero
155corresponding to whether its first argument is considered less than,
156equal to, or greater than its second argument.
157
158@cindex stable sorting
159@strong{Warning:} If two objects compare as equal, their order after
160sorting is unpredictable. That is to say, the sorting is not stable.
161This can make a difference when the comparison considers only part of
162the elements. Two elements with the same sort key may differ in other
163respects.
164
165If you want the effect of a stable sort, you can get this result by
166writing the comparison function so that, lacking other reason
167distinguish between two elements, it compares them by their addresses.
168Note that doing this may make the sorting algorithm less efficient, so
169do it only if necessary.
170
171Here is a simple example of sorting an array of doubles in numerical
172order, using the comparison function defined above (@pxref{Comparison
173Functions}):
174
175@smallexample
176@{
177 double *array;
178 int size;
179 @dots{}
180 qsort (array, size, sizeof (double), compare_doubles);
181@}
182@end smallexample
183
184The @code{qsort} function derives its name from the fact that it was
185originally implemented using the ``quick sort'' algorithm.
21ad6b26
AJ
186
187The implementation of @code{qsort} in this library might not be an
188in-place sort and might thereby use an extra amount of memory to store
189the array.
28f540f4
RM
190@end deftypefun
191
2604afb1 192@node Search/Sort Example
28f540f4
RM
193@section Searching and Sorting Example
194
195Here is an example showing the use of @code{qsort} and @code{bsearch}
196with an array of structures. The objects in the array are sorted
197by comparing their @code{name} fields with the @code{strcmp} function.
198Then, we can look up individual objects based on their names.
199
200@comment This example is dedicated to the memory of Jim Henson. RIP.
201@smallexample
202@include search.c.texi
203@end smallexample
204
205@cindex Kermit the frog
206The output from this program looks like:
207
208@smallexample
209Kermit, the frog
210Piggy, the pig
211Gonzo, the whatever
212Fozzie, the bear
213Sam, the eagle
214Robin, the frog
215Animal, the animal
216Camilla, the chicken
217Sweetums, the monster
218Dr. Strangepork, the pig
219Link Hogthrob, the pig
220Zoot, the human
221Dr. Bunsen Honeydew, the human
222Beaker, the human
223Swedish Chef, the human
224
225Animal, the animal
226Beaker, the human
227Camilla, the chicken
228Dr. Bunsen Honeydew, the human
229Dr. Strangepork, the pig
230Fozzie, the bear
231Gonzo, the whatever
232Kermit, the frog
233Link Hogthrob, the pig
234Piggy, the pig
235Robin, the frog
236Sam, the eagle
237Swedish Chef, the human
238Sweetums, the monster
239Zoot, the human
240
241Kermit, the frog
242Gonzo, the whatever
243Couldn't find Janice.
244@end smallexample
2604afb1
UD
245
246
247@node Hash Search Function
248@section The @code{hsearch} function.
249
32c075e1 250The functions mentioned so far in this chapter are searching in a sorted
2604afb1
UD
251or unsorted array. There are other methods to organize information
252which later should be searched. The costs of insert, delete and search
253differ. One possible implementation is using hashing tables.
32c075e1 254The following functions are declared in the the header file @file{search.h}.
2604afb1
UD
255
256@comment search.h
257@comment SVID
258@deftypefun int hcreate (size_t @var{nel})
259The @code{hcreate} function creates a hashing table which can contain at
260least @var{nel} elements. There is no possibility to grow this table so
32c075e1
JJ
261it is necessary to choose the value for @var{nel} wisely. The used
262methods to implement this function might make it necessary to make the
2604afb1 263number of elements in the hashing table larger than the expected maximal
32c075e1 264number of elements. Hashing tables usually work inefficient if they are
2604afb1
UD
265filled 80% or more. The constant access time guaranteed by hashing can
266only be achieved if few collisions exist. See Knuth's ``The Art of
267Computer Programming, Part 3: Searching and Sorting'' for more
268information.
269
270The weakest aspect of this function is that there can be at most one
f2ea0f5b 271hashing table used through the whole program. The table is allocated
2604afb1
UD
272in local memory out of control of the programmer. As an extension the
273GNU C library provides an additional set of functions with an reentrant
274interface which provide a similar interface but which allow to keep
49c091e5 275arbitrarily many hashing tables.
2604afb1
UD
276
277It is possible to use more than one hashing table in the program run if
278the former table is first destroyed by a call to @code{hdestroy}.
279
280The function returns a non-zero value if successful. If it return zero
281something went wrong. This could either mean there is already a hashing
282table in use or the program runs out of memory.
283@end deftypefun
284
285@comment search.h
286@comment SVID
287@deftypefun void hdestroy (void)
288The @code{hdestroy} function can be used to free all the resources
289allocated in a previous call of @code{hcreate}. After a call to this
290function it is again possible to call @code{hcreate} and allocate a new
291table with possibly different size.
292
293It is important to remember that the elements contained in the hashing
294table at the time @code{hdestroy} is called are @emph{not} freed by this
295function. It is the responsibility of the program code to free those
f2ea0f5b 296strings (if necessary at all). Freeing all the element memory is not
2604afb1
UD
297possible without extra, separately kept information since there is no
298function to iterate through all available elements in the hashing table.
299If it is really necessary to free a table and all elements the
300programmer has to keep a list of all table elements and before calling
301@code{hdestroy} s/he has to free all element's data using this list.
f2ea0f5b 302This is a very unpleasant mechanism and it also shows that this kind of
2604afb1
UD
303hashing tables is mainly meant for tables which are created once and
304used until the end of the program run.
305@end deftypefun
306
307Entries of the hashing table and keys for the search are defined using
308this type:
309
310@deftp {Data type} {struct ENTRY}
311Both elements of this structure are pointers to zero-terminated strings.
312This is a limiting restriction of the functionality of the
313@code{hsearch} functions. They can only be used for data sets which use
314the NUL character always and solely to terminate the records. It is not
315possible to handle general binary data.
316
317@table @code
318@item char *key
319Pointer to a zero-terminated string of characters describing the key for
320the search or the element in the hashing table.
321@item char *data
322Pointer to a zero-terminated string of characters describing the data.
323If the functions will be called only for searching an existing entry
324this element might stay undefined since it is not used.
325@end table
326@end deftp
327
328@comment search.h
329@comment SVID
330@deftypefun {ENTRY *} hsearch (ENTRY @var{item}, ACTION @var{action})
331To search in a hashing table created using @code{hcreate} the
332@code{hsearch} function must be used. This function can perform simple
333search for an element (if @var{action} has the @code{FIND}) or it can
afdef815
UD
334alternatively insert the key element into the hashing table. Entries
335are never replaced.
2604afb1
UD
336
337The key is denoted by a pointer to an object of type @code{ENTRY}. For
338locating the corresponding position in the hashing table only the
339@code{key} element of the structure is used.
340
afdef815
UD
341If an entry with matching key is found the @var{action} parameter is
342irrelevant. The found entry is returned. If no matching entry is found
343and the @var{action} parameter has the value @code{FIND} the function
344returns a @code{NULL} pointer. If no entry is found and the
345@var{action} parameter has the value @code{ENTER} a new entry is added
346to the hashing table which is initialized with the parameter @var{item}.
347A pointer to the newly added entry is returned.
2604afb1
UD
348@end deftypefun
349
350As mentioned before the hashing table used by the functions described so
351far is global and there can be at any time at most one hashing table in
352the program. A solution is to use the following functions which are a
353GNU extension. All have in common that they operate on a hashing table
354which is described by the content of an object of the type @code{struct
355hsearch_data}. This type should be treated as opaque, none of its
356members should be changed directly.
357
358@comment search.h
359@comment GNU
360@deftypefun int hcreate_r (size_t @var{nel}, struct hsearch_data *@var{htab})
f2ea0f5b 361The @code{hcreate_r} function initializes the object pointed to by
2604afb1
UD
362@var{htab} to contain a hashing table with at least @var{nel} elements.
363So this function is equivalent to the @code{hcreate} function except
364that the initialized data structure is controlled by the user.
365
0d1ef15f
UD
366This allows having more than one hashing table at one time. The memory
367necessary for the @code{struct hsearch_data} object can be allocated
368dynamically. It must be initialized with zero before calling this
369function.
2604afb1 370
32c075e1
JJ
371The return value is non-zero if the operation were successful. if the
372return value is zero something went wrong which probably means the
373programs runs out of memory.
2604afb1
UD
374@end deftypefun
375
376@comment search.h
377@comment GNU
378@deftypefun void hdestroy_r (struct hsearch_data *@var{htab})
379The @code{hdestroy_r} function frees all resources allocated by the
380@code{hcreate_r} function for this very same object @var{htab}. As for
381@code{hdestroy} it is the programs responsibility to free the strings
382for the elements of the table.
383@end deftypefun
384
385@comment search.h
386@comment GNU
387@deftypefun int hsearch_r (ENTRY @var{item}, ACTION @var{action}, ENTRY **@var{retval}, struct hsearch_data *@var{htab})
388The @code{hsearch_r} function is equivalent to @code{hsearch}. The
389meaning of the first two arguments is identical. But instead of
f2ea0f5b 390operating on a single global hashing table the function works on the
2604afb1
UD
391table described by the object pointed to by @var{htab} (which is
392initialized by a call to @code{hcreate_r}).
393
394Another difference to @code{hcreate} is that the pointer to the found
395entry in the table is not the return value of the functions. It is
396returned by storing it in a pointer variables pointed to by the
397@var{retval} parameter. The return value of the function is an integer
398value indicating success if it is non-zero and failure if it is zero.
49c091e5 399In the latter case the global variable @var{errno} signals the reason for
2604afb1
UD
400the failure.
401
402@table @code
403@item ENOMEM
404The table is filled and @code{hsearch_r} was called with an so far
405unknown key and @var{action} set to @code{ENTER}.
406@item ESRCH
407The @var{action} parameter is @code{FIND} and no corresponding element
408is found in the table.
409@end table
410@end deftypefun
411
412
413@node Tree Search Function
414@section The @code{tsearch} function.
415
416Another common form to organize data for efficient search is to use
417trees. The @code{tsearch} function family provides a nice interface to
418functions to organize possibly large amounts of data by providing a mean
419access time proportional to the logarithm of the number of elements.
420The GNU C library implementation even guarantees that this bound is
421never exceeded even for input data which cause problems for simple
422binary tree implementations.
423
f2ea0f5b 424The functions described in the chapter are all described in the @w{System
2604afb1
UD
425V} and X/Open specifications and are therefore quite portable.
426
427In contrast to the @code{hsearch} functions the @code{tsearch} functions
428can be used with arbitrary data and not only zero-terminated strings.
429
430The @code{tsearch} functions have the advantage that no function to
431initialize data structures is necessary. A simple pointer of type
432@code{void *} initialized to @code{NULL} is a valid tree and can be
ce460d04
RM
433extended or searched. The prototypes for these functions can be found
434in the header file @file{search.h}.
2604afb1
UD
435
436@comment search.h
437@comment SVID
438@deftypefun {void *} tsearch (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar})
439The @code{tsearch} function searches in the tree pointed to by
440@code{*@var{rootp}} for an element matching @var{key}. The function
f2ea0f5b 441pointed to by @var{compar} is used to determine whether two elements
8b7fb588 442match. @xref{Comparison Functions}, for a specification of the functions
2604afb1
UD
443which can be used for the @var{compar} parameter.
444
445If the tree does not contain a matching entry the @var{key} value will
446be added to the tree. @code{tsearch} does not make a copy of the object
447pointed to by @var{key} (how could it since the size is unknown).
448Instead it adds a reference to this object which means the object must
449be available as long as the tree data structure is used.
450
451The tree is represented by a pointer to a pointer since it is sometimes
452necessary to change the root node of the tree. So it must not be
453assumed that the variable pointed to by @var{rootp} has the same value
454after the call. This also shows that it is not safe to call the
455@code{tsearch} function more than once at the same time using the same
456tree. It is no problem to run it more than once at a time on different
457trees.
458
459The return value is a pointer to the matching element in the tree. If a
460new element was created the pointer points to the new data (which is in
461fact @var{key}). If an entry had to be created and the program ran out
462of space @code{NULL} is returned.
463@end deftypefun
464
465@comment search.h
466@comment SVID
467@deftypefun {void *} tfind (const void *@var{key}, void *const *@var{rootp}, comparison_fn_t @var{compar})
468The @code{tfind} function is similar to the @code{tsearch} function. It
469locates an element matching the one pointed to by @var{key} and returns
470a pointer to this element. But if no matching element is available no
471new element is entered (note that the @var{rootp} parameter points to a
472constant pointer). Instead the function returns @code{NULL}.
473@end deftypefun
474
475Another advantage of the @code{tsearch} function in contrast to the
476@code{hsearch} functions is that there is an easy way to remove
477elements.
478
479@comment search.h
480@comment SVID
481@deftypefun {void *} tdelete (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar})
482To remove a specific element matching @var{key} from the tree
483@code{tdelete} can be used. It locates the matching element using the
484same method as @code{tfind}. The corresponding element is then removed
2864e767
UD
485and a pointer to the parent of the deleted node is returned by the
486function. If there is no matching entry in the tree nothing can be
487deleted and the function returns @code{NULL}. If the root of the tree
488is deleted @code{tdelete} returns some unspecified value not equal to
489@code{NULL}.
2604afb1
UD
490@end deftypefun
491
492@comment search.h
493@comment GNU
494@deftypefun void tdestroy (void *@var{vroot}, __free_fn_t @var{freefct})
495If the complete search tree has to be removed one can use
496@code{tdestroy}. It frees all resources allocated by the @code{tsearch}
497function to generate the tree pointed to by @var{vroot}.
498
499For the data in each tree node the function @var{freefct} is called.
500The pointer to the data is passed as the argument to the function. If
501no such work is necessary @var{freefct} must point to a function doing
502nothing. It is called in any case.
503
504This function is a GNU extension and not covered by the @w{System V} or
505X/Open specifications.
506@end deftypefun
507
508In addition to the function to create and destroy the tree data
e8b1163e
AJ
509structure, there is another function which allows you to apply a
510function to all elements of the tree. The function must have this type:
2604afb1
UD
511
512@smallexample
8c479619 513void __action_fn_t (const void *nodep, VISIT value, int level);
2604afb1
UD
514@end smallexample
515
8c479619 516The @var{nodep} is the data value of the current node (once given as the
2604afb1
UD
517@var{key} argument to @code{tsearch}). @var{level} is a numeric value
518which corresponds to the depth of the current node in the tree. The
519root node has the depth @math{0} and its children have a depth of
520@math{1} and so on. The @code{VISIT} type is an enumeration type.
521
522@deftp {Data Type} VISIT
523The @code{VISIT} value indicates the status of the current node in the
524tree and how the function is called. The status of a node is either
525`leaf' or `internal node'. For each leaf node the function is called
526exactly once, for each internal node it is called three times: before
527the first child is processed, after the first child is processed and
f2ea0f5b 528after both children are processed. This makes it possible to handle all
2604afb1
UD
529three methods of tree traversal (or even a combination of them).
530
531@table @code
532@item preorder
533The current node is an internal node and the function is called before
534the first child was processed.
da2a3ca6 535@item postorder
2604afb1
UD
536The current node is an internal node and the function is called after
537the first child was processed.
da2a3ca6 538@item endorder
2604afb1
UD
539The current node is an internal node and the function is called after
540the second child was processed.
541@item leaf
542The current node is a leaf.
543@end table
544@end deftp
545
546@comment search.h
547@comment SVID
548@deftypefun void twalk (const void *@var{root}, __action_fn_t @var{action})
da2a3ca6 549For each node in the tree with a node pointed to by @var{root}, the
2604afb1
UD
550@code{twalk} function calls the function provided by the parameter
551@var{action}. For leaf nodes the function is called exactly once with
552@var{value} set to @code{leaf}. For internal nodes the function is
553called three times, setting the @var{value} parameter or @var{action} to
554the appropriate value. The @var{level} argument for the @var{action}
555function is computed while descending the tree with increasing the value
f2ea0f5b 556by one for the descend to a child, starting with the value @math{0} for
2604afb1
UD
557the root node.
558
559Since the functions used for the @var{action} parameter to @code{twalk}
da2a3ca6
UD
560must not modify the tree data, it is safe to run @code{twalk} in more
561than one thread at the same time, working on the same tree. It is also
2604afb1 562safe to call @code{tfind} in parallel. Functions which modify the tree
0bc93a2f 563must not be used, otherwise the behavior is undefined.
2604afb1 564@end deftypefun