Commit | Line | Data |
---|---|---|
fea681da MK | 1 | .\" Copyright (c) 1998 Andries Brouwer |
fea681da | 2 | .\" |
e4a74ca8 | 3 | .\" SPDX-License-Identifier: GPL-2.0-or-later |
fea681da MK | 4 | .\" |
fea681da | 5 | .\" 2003-08-24 fix for / by John Kristoff + joey |
fea681da | 6 | .\" |
ed6c69ca | 7 | .TH GLOB 7 2020-08-13 "Linux" "Linux Programmer's Manual" |
fea681da | 8 | .SH NAME |
f68512e9 | 9 | glob \- globbing pathnames |
fea681da | 10 | .SH DESCRIPTION |
b4112efb | 11 | Long ago, in UNIX\ V6, there was a program |
fea681da MK | 12 | .I /etc/glob |
fea681da | 13 | that would expand wildcard patterns. |
5fab2e7c | 14 | Soon afterward this became a shell built-in. |
a721e8b2 | 15 | .PP |
fea681da MK | 16 | These days there is also a library routine |
fea681da | 17 | .BR glob (3) |
fea681da | 18 | that will perform this function for a user program. |
a721e8b2 | 19 | .PP |
4dec66f9 | 20 | The rules are as follows (POSIX.2, 3.13). |
73d8cece | 21 | .SS Wildcard matching |
fea681da | 22 | A string is a wildcard pattern if it contains one of the |
735334d4 | 23 | characters \(aq?\(aq, \(aq*\(aq, or \(aq[\(aq. |
c13182ef | 24 | Globbing is the operation |
fea681da | 25 | that expands a wildcard pattern into the list of pathnames |
c13182ef MK | 26 | matching the pattern. |
c13182ef | 27 | Matching is defined by: |
a721e8b2 | 28 | .PP |
333a424b | 29 | A \(aq?\(aq (not between brackets) matches any single character. |
a721e8b2 | 30 | .PP |
333a424b | 31 | A \(aq*\(aq (not between brackets) matches any string, |
fea681da | 32 | including the empty string. |
1ce284ec MK | 33 | .PP |
1ce284ec | 34 | .B "Character classes" |
6545cc56 | 35 | .PP |
333a424b MK | 36 | An expression "\fI[...]\fP" where the first character after the |
333a424b | 37 | leading \(aq[\(aq is not an \(aq!\(aq matches a single character, |
fea681da MK | 38 | namely any of the characters enclosed by the brackets. |
fea681da | 39 | The string enclosed by the brackets cannot be empty; |
333a424b | 40 | therefore \(aq]\(aq can be allowed between the brackets, provided |
c13182ef | 41 | that it is the first character. |
333a424b | 42 | (Thus, "\fI[][!]\fP" matches the |
735334d4 | 43 | three characters \(aq[\(aq, \(aq]\(aq, and \(aq!\(aq.) |
1ce284ec MK | 44 | .PP |
1ce284ec | 45 | .B Ranges |
6545cc56 | 46 | .PP |
fea681da | 47 | There is one special convention: |
333a424b | 48 | two characters separated by \(aq\-\(aq denote a range. |
c45660d7 MK | 49 | (Thus, "\fI[A\-Fa\-f0\-9]\fP" |
c45660d7 | 50 | is equivalent to "\fI[ABCDEFabcdef0123456789]\fP".) |
333a424b | 51 | One may include \(aq\-\(aq in its literal meaning by making it the |
fea681da | 52 | first or last character between the brackets. |
333a424b MK | 53 | (Thus, "\fI[]\-]\fP" matches just the two characters \(aq]\(aq and \(aq\-\(aq, |
333a424b | 54 | and "\fI[\-\-0]\fP" matches the |
333a424b | 55 | three characters \(aq\-\(aq, \(aq.\(aq, \(aq0\(aq, since \(aq/\(aq |
fea681da | 56 | cannot be matched.) |
1ce284ec MK | 57 | .PP |
1ce284ec | 58 | .B Complementation |
6545cc56 | 59 | .PP |
333a424b | 60 | An expression "\fI[!...]\fP" matches a single character, namely |
fea681da | 61 | any character that is not matched by the expression obtained |
333a424b MK | 62 | by removing the first \(aq!\(aq from it. |
333a424b | 63 | (Thus, "\fI[!]a\-]\fP" matches any |
735334d4 | 64 | single character except \(aq]\(aq, \(aqa\(aq, and \(aq\-\(aq.) |
a721e8b2 | 65 | .PP |
735334d4 | 66 | One can remove the special meaning of \(aq?\(aq, \(aq*\(aq, and \(aq[\(aq by |
fea681da MK | 67 | preceding them by a backslash, or, in case this is part of |
fea681da | 68 | a shell command line, enclosing them in quotes. |
fea681da | 69 | Between brackets these characters stand for themselves. |
31a6818e | 70 | Thus, "\fI[[?*\e]\fP" matches the |
735334d4 | 71 | four characters \(aq[\(aq, \(aq?\(aq, \(aq*\(aq, and \(aq\e\(aq. |
1ce284ec | 72 | .SS Pathnames |
fea681da | 73 | Globbing is applied to each of the components of a pathname |
c13182ef | 74 | separately. |
333a424b MK | 75 | A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq |
333a424b | 76 | wildcard, or by a range like "\fI[.\-0]\fP". |
1bceaaee MK | 77 | A range containing an explicit \(aq/\(aq character is syntactically incorrect. |
1bceaaee | 78 | (POSIX requires that syntactically incorrect patterns are left unchanged.) |
a721e8b2 | 79 | .PP |
c45660d7 MK | 80 | If a filename starts with a \(aq.\(aq, |
c45660d7 | 81 | this character must be matched explicitly. |
333a424b MK | 82 | (Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not |
333a424b | 83 | archive all your files; \fItar\ c\ .\fP is better.) |
73d8cece | 84 | .SS Empty lists |
333a424b | 85 | The nice and simple rule given above: "expand a wildcard pattern |
008f1ecc | 86 | into the list of matching pathnames" was the original UNIX |
c13182ef MK | 87 | definition. |
c13182ef | 88 | It allowed one to have patterns that expand into |
fea681da | 89 | an empty list, as in |
a721e8b2 | 90 | .PP |
fea681da | 91 | .nf |
7295b7ed | 92 | xv \-wait 0 *.gif *.jpg |
fea681da | 93 | .fi |
a721e8b2 | 94 | .PP |
fea681da MK | 95 | where perhaps no *.gif files are present (and this is not |
fea681da | 96 | an error). |
fea681da | 97 | However, POSIX requires that a wildcard pattern is left |
fea681da | 98 | unchanged when it is syntactically incorrect, or the list of |
fea681da | 99 | matching pathnames is empty. |
fea681da | 100 | With |
fea681da | 101 | .I bash |
c998e004 | 102 | one can force the classical behavior using this command: |
a721e8b2 | 103 | .PP |
1ae6b2c7 AC | 104 | .in +4n |
1ae6b2c7 | 105 | .EX |
1ae6b2c7 | 106 | shopt \-s nullglob |
1ae6b2c7 | 107 | .EE |
1ae6b2c7 | 108 | .in |
c998e004 | 109 | .\" In Bash v1, by setting allow_null_glob_expansion=true |
a721e8b2 | 110 | .PP |
c13182ef | 111 | (Similar problems occur elsewhere. |
59dc509c | 112 | For example, where old scripts have |
a721e8b2 | 113 | .PP |
1ae6b2c7 AC | 114 | .in +4n |
1ae6b2c7 | 115 | .EX |
1ae6b2c7 | 116 | rm \`find . \-name "*\(ti"\` |
1ae6b2c7 | 117 | .EE |
1ae6b2c7 | 118 | .in |
a721e8b2 | 119 | .PP |
fea681da | 120 | new scripts require |
a721e8b2 | 121 | .PP |
1ae6b2c7 AC | 122 | .in +4n |
1ae6b2c7 | 123 | .EX |
1ae6b2c7 | 124 | rm \-f nosuchfile \`find . \-name "*\(ti"\` |
1ae6b2c7 | 125 | .EE |
1ae6b2c7 | 126 | .in |
a721e8b2 | 127 | .PP |
fea681da MK | 128 | to avoid error messages from |
fea681da | 129 | .I rm |
fea681da | 130 | called with an empty argument list.) |
fea681da MK | 131 | .SH NOTES |
fea681da | 132 | .SS Regular expressions |
fea681da | 133 | Note that wildcard patterns are not regular expressions, |
c13182ef MK | 134 | although they are a bit similar. |
c13182ef | 135 | First of all, they match |
fea681da | 136 | filenames, rather than text, and secondly, the conventions |
333a424b | 137 | are not the same: for example, in a regular expression \(aq*\(aq means zero or |
fea681da | 138 | more copies of the preceding thing. |
a721e8b2 | 139 | .PP |
fea681da | 140 | Now that regular expressions have bracket expressions where |
9ca13180 MK | 141 | the negation is indicated by a \(aq\(ha\(aq, POSIX has declared the |
9ca13180 | 142 | effect of a wildcard pattern "\fI[\(ha...]\fP" to be undefined. |
c634028a | 143 | .SS Character classes and internationalization |
fea681da | 144 | Of course ranges were originally meant to be ASCII ranges, |
333a424b | 145 | so that "\fI[\ \-%]\fP" stands for "\fI[\ !"#$%]\fP" and "\fI[a\-z]\fP" stands |
fea681da | 146 | for "any lowercase letter". |
008f1ecc | 147 | Some UNIX implementations generalized this so that a range X\-Y |
fea681da | 148 | stands for the set of characters with code between the codes for |
c13182ef MK | 149 | X and for Y. |
c13182ef | 150 | However, this requires the user to know the |
fea681da MK | 151 | character coding in use on the local system, and moreover, is |
fea681da | 152 | not convenient if the collating sequence for the local alphabet |
fea681da | 153 | differs from the ordering of the character codes. |
fea681da | 154 | Therefore, POSIX extended the bracket notation greatly, |
fea681da | 155 | both for wildcard patterns and for regular expressions. |
fea681da | 156 | In the above we saw three types of items that can occur in a bracket |
fea681da | 157 | expression: namely (i) the negation, (ii) explicit single characters, |
c13182ef MK | 158 | and (iii) ranges. |
c13182ef | 159 | POSIX specifies ranges in an internationally |
fea681da | 160 | more useful way and adds three more types: |
a721e8b2 | 161 | .PP |
4d9b6984 | 162 | (iii) Ranges X\-Y comprise all characters that fall between X |
9fdfa163 | 163 | and Y (inclusive) in the current collating sequence as defined |
097585ed MK | 164 | by the |
097585ed | 165 | .B LC_COLLATE |
097585ed | 166 | category in the current locale. |
a721e8b2 | 167 | .PP |
fea681da | 168 | (iv) Named character classes, like |
408731d4 | 169 | .PP |
fea681da MK | 170 | .nf |
fea681da | 171 | [:alnum:] [:alpha:] [:blank:] [:cntrl:] |
fea681da | 172 | [:digit:] [:graph:] [:lower:] [:print:] |
fea681da | 173 | [:punct:] [:space:] [:upper:] [:xdigit:] |
fea681da | 174 | .fi |
408731d4 | 175 | .PP |
333a424b MK | 176 | so that one can say "\fI[[:lower:]]\fP" instead of "\fI[a\-z]\fP", and have |
333a424b | 177 | things work in Denmark, too, where there are three letters past \(aqz\(aq |
fea681da | 178 | in the alphabet. |
1274071a MK | 179 | These character classes are defined by the |
1274071a | 180 | .B LC_CTYPE |
1274071a | 181 | category |
fea681da | 182 | in the current locale. |
a721e8b2 | 183 | .PP |
333a424b MK | 184 | (v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP", |
333a424b | 185 | where the string between "\fI[.\fP" and "\fI.]\fP" is a collating |
c13182ef MK | 186 | element defined for the current locale. |
c13182ef | 187 | Note that this may |
ae03dc66 | 188 | be a multicharacter element. |
a721e8b2 | 189 | .PP |
333a424b MK | 190 | (vi) Equivalence class expressions, like "\fI[=a=]\fP", |
333a424b | 191 | where the string between "\fI[=\fP" and "\fI=]\fP" is any collating |
fea681da | 192 | element from its equivalence class, as defined for the |
c13182ef | 193 | current locale. |
333a424b | 194 | For example, "\fI[[=a=]]\fP" might be equivalent |
7b97eb9f | 195 | to "\fI[a\('a\(\`a\(:a\(^a]\fP", that is, |
333a424b | 196 | to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP". |
47297adb | 197 | .SH SEE ALSO |
fea681da MK | 198 | .BR sh (1), |
fea681da | 199 | .BR fnmatch (3), |
fea681da | 200 | .BR glob (3), |
fea681da | 201 | .BR locale (7), |
fea681da | 202 | .BR regex (7) |
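
The matching rules in this page can be tried out directly from a shell. What follows is a minimal sketch, not a reference implementation. It assumes a POSIX-style shell (sh or bash) with default globbing options and a working mktemp; the helper name `matches` and the sample names `.profile`, `notes.txt`, and `sub/a.txt` are made up for illustration. The `case` construct is used for the bracket rules because it applies the same pattern notation without touching the filesystem. Real pathname expansion in a scratch directory then illustrates the leading-dot rule, the '/' rule, and the POSIX behavior of leaving an unmatched pattern unchanged.

```shell
# Use the C locale so that ranges such as [A-F] follow ASCII order.
LC_ALL=C
export LC_ALL

# matches STRING PATTERN: succeed if STRING matches the wildcard PATTERN.
# (Hypothetical helper; $2 is left unquoted so it is treated as a pattern.)
matches() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

matches "x" "?"           && echo "? matches a single character"
matches ""  "*"           && echo "* matches the empty string"
matches "]" "[][!]"       && echo "] is literal when first in brackets"
matches "c" "[A-Fa-f0-9]" && echo "c falls in the range a-f"
matches "-" "[]-]"        && echo "- is literal when last in brackets"
matches "a" "[!]a-]"      || echo "[!]a-] rejects a"

# Pathname expansion in a scratch directory (sample names are assumptions).
dir=$(mktemp -d) || exit 1
mkdir "$dir/sub"
touch "$dir/.profile" "$dir/notes.txt" "$dir/sub/a.txt"

# A leading '.' must be matched explicitly, so * skips .profile.
visible=$(cd "$dir" && set -- * && printf '%s' "$*")
# '?' cannot match '/', so this pattern finds nothing and stays as typed.
slash=$(cd "$dir" && set -- sub?a.txt && printf '%s' "$*")
# No *.gif files exist; without nullglob the pattern is left unchanged.
unmatched=$(cd "$dir" && set -- *.gif && printf '%s' "$*")

printf '%s\n' "$visible" "$slash" "$unmatched"
rm -rf "$dir"
```

With nullglob and dotglob unset, the last three lines printed should be `notes.txt sub`, `sub?a.txt`, and `*.gif`. Running the same expansions in bash after `shopt -s nullglob` would instead make the two unmatched patterns expand to nothing, which is the classical behavior described under "Empty lists".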