Commit | Line | Data |
---|---|---|
fea681da MK |
1 | .\" (C) Copyright 1999-2000 David A. Wheeler (dwheeler@dwheeler.com) |
2 | .\" | |
93015253 | 3 | .\" %%%LICENSE_START(VERBATIM) |
fea681da MK |
4 | .\" Permission is granted to make and distribute verbatim copies of this |
5 | .\" manual provided the copyright notice and this permission notice are | |
6 | .\" preserved on all copies. | |
7 | .\" | |
8 | .\" Permission is granted to copy and distribute modified versions of this | |
9 | .\" manual under the conditions for verbatim copying, provided that the | |
10 | .\" entire resulting derived work is distributed under the terms of a | |
11 | .\" permission notice identical to this one. | |
c13182ef | 12 | .\" |
fea681da MK |
13 | .\" Since the Linux kernel and libraries are constantly changing, this |
14 | .\" manual page may be incorrect or out-of-date. The author(s) assume no | |
15 | .\" responsibility for errors or omissions, or for damages resulting from | |
16 | .\" the use of the information contained herein. The author(s) may not | |
17 | .\" have taken the same level of care in the production of this manual, | |
18 | .\" which is licensed free of charge, as they might when working | |
19 | .\" professionally. | |
c13182ef | 20 | .\" |
fea681da MK |
21 | .\" Formatted or processed versions of this manual, if unaccompanied by |
22 | .\" the source, must acknowledge the copyright and authors of this work. | |
4b72fb64 | 23 | .\" %%%LICENSE_END |
fea681da MK |
24 | .\" |
25 | .\" Fragments of this document are directly derived from IETF standards. | |
26 | .\" For those fragments which are directly derived from such standards, | |
27 | .\" the following notice applies, which is the standard copyright and | |
28 | .\" rights announcement of The Internet Society: | |
29 | .\" | |
30 | .\" Copyright (C) The Internet Society (1998). All Rights Reserved. | |
31 | .\" This document and translations of it may be copied and furnished to | |
32 | .\" others, and derivative works that comment on or otherwise explain it | |
33 | .\" or assist in its implementation may be prepared, copied, published | |
34 | .\" and distributed, in whole or in part, without restriction of any | |
35 | .\" kind, provided that the above copyright notice and this paragraph are | |
36 | .\" included on all such copies and derivative works. However, this | |
37 | .\" document itself may not be modified in any way, such as by removing | |
38 | .\" the copyright notice or references to the Internet Society or other | |
39 | .\" Internet organizations, except as needed for the purpose of | |
40 | .\" developing Internet standards in which case the procedures for | |
41 | .\" copyrights defined in the Internet Standards process must be | |
42 | .\" followed, or as required to translate it into languages other than English. | |
43 | .\" | |
44 | .\" Modified Fri Jul 25 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com) | |
45 | .\" Modified Fri Aug 21 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com) | |
46 | .\" Modified Tue Mar 14 2000 by David A. Wheeler (dwheeler@dwheeler.com) | |
47 | .\" | |
52d06f48 | 48 | .TH URI 7 2014-03-18 "Linux" "Linux Programmer's Manual" |
fea681da MK |
49 | .SH NAME |
50 | uri, url, urn \- uniform resource identifier (URI), including a URL or URN | |
51 | .SH SYNOPSIS | |
52 | .nf | |
53 | .HP 0.2i | |
54 | URI = [ absoluteURI | relativeURI ] [ "#" fragment ] | |
55 | .HP | |
56 | absoluteURI = scheme ":" ( hierarchical_part | opaque_part ) | |
57 | .HP | |
58 | relativeURI = ( net_path | absolute_path | relative_path ) [ "?" query ] | |
fea681da | 59 | .HP |
c79efe90 MK |
60 | scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" | |
61 | "file" | "man" | "info" | "whatis" | "ldap" | "wais" | \&... | |
fea681da MK |
62 | .HP |
63 | hierarchical_part = ( net_path | absolute_path ) [ "?" query ] | |
fea681da MK |
64 | .HP |
65 | net_path = "//" authority [ absolute_path ] | |
66 | .HP | |
67 | absolute_path = "/" path_segments | |
68 | .HP | |
69 | relative_path = relative_segment [ absolute_path ] | |
70 | .fi | |
71 | .SH DESCRIPTION | |
72 | .PP | |
73 | A Uniform Resource Identifier (URI) is a short string of characters | |
74 | identifying an abstract or physical resource (for example, a web page). | |
75 | A Uniform Resource Locator (URL) is a URI | |
76 | that identifies a resource through its primary access | |
77 | mechanism (e.g., its network "location"), rather than | |
78 | by name or some other attribute of that resource. | |
79 | A Uniform Resource Name (URN) is a URI | |
80 | that must remain globally unique and persistent even when | |
81 | the resource ceases to exist or becomes unavailable. | |
82 | .PP | |
83 | URIs are the standard way to name hypertext link destinations | |
84 | for tools such as web browsers. | |
6116ff44 MK |
85 | The string "http://www.kernelnotes.org" is a URL (and thus it |
86 | is also a URI). | |
fea681da MK |
87 | Many people use the term URL loosely as a synonym for URI |
88 | (though technically URLs are a subset of URIs). | |
89 | .PP | |
90 | URIs can be absolute or relative. | |
91 | An absolute identifier refers to a resource independent of | |
92 | context, while a relative | |
93 | identifier refers to a resource by describing the difference | |
94 | from the current context. | |
95 | Within a relative path reference, the complete path segments "." and | |
96 | ".." have special meanings: "the current hierarchy level" and "the | |
97 | level above this hierarchy level", respectively, just like they do in | |
008f1ecc | 98 | UNIX-like systems. |
fea681da MK |
99 | A path segment which contains a colon |
100 | character can't be used as the first segment of a relative URI path | |
101 | (e.g., "this:that"), because it would be mistaken for a scheme name; | |
102 | precede such segments with ./ (e.g., "./this:that"). | |
b9560046 | 103 | Note that descendants of MS-DOS (e.g., Microsoft Windows) replace |
fea681da MK |
104 | devicename colons with the vertical bar ("|") in URIs, so "C:" becomes "C|". |
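The resolution of relative references described above can be sketched with Python's standard urllib.parse module. This is only an illustration: the base URI below is a made-up example, not something taken from this page.

```python
from urllib.parse import urljoin

# Hypothetical base URI, chosen only for illustration.
base = "http://example.com/a/b/c.html"

# "." names the current hierarchy level.
same_level = urljoin(base, "./d.html")

# ".." names the level above the current one.
one_up = urljoin(base, "../d.html")

# A leading "./" keeps "this:" from being mistaken for a scheme name.
colon_segment = urljoin(base, "./this:that")

print(same_level)
print(one_up)
print(colon_segment)
```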
105 | .PP | |
106 | A fragment identifier, if included, refers to a particular named portion | |
f81fb444 MK |
107 | (fragment) of a resource; text after a \(aq#\(aq identifies the fragment. |
108 | A URI beginning with \(aq#\(aq refers to that fragment in the current resource. | |
446a4bc8 | 109 | .SS Usage |
fea681da MK |
110 | There are many different URI schemes, each with specific |
111 | additional rules and meanings, but they are intentionally made to be | |
112 | as similar as possible. | |
113 | For example, many URL schemes | |
114 | permit the authority to be the following format, called here an | |
115 | .I ip_server | |
116 | (square brackets show what's optional): | |
117 | .HP | |
118 | .IR "ip_server = " [ user " [ : " password " ] @ ] " host " [ : " port ] | |
119 | .PP | |
18701562 | 120 | This format allows you to optionally insert a username, |
fea681da MK |
121 | a user plus password, and/or a port number. |
122 | The | |
123 | .I host | |
124 | is the name of the host computer, either its name as determined by DNS | |
125 | or an IP address (numbers separated by periods). | |
126 | Thus the URI | |
ffc3e08c JW |
127 | <http://fred:fredpassword@example.com:8080/> |
128 | logs into a web server on host example.com | |
fea681da MK |
129 | as fred (using fredpassword) using port 8080. |
130 | Avoid including a password in a URI if possible because of the many | |
131 | security risks of having a password written down. | |
18701562 | 132 | If the URL supplies a username but no password, and the remote |
fea681da MK |
133 | server requests a password, the program interpreting the URL |
134 | should request one from the user. | |
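As a rough sketch, the ip_server parts of the example URI above can be pulled apart with Python's urllib.parse.urlsplit. Other tools may parse the authority differently; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

# The example URI from the text above.
parts = urlsplit("http://fred:fredpassword@example.com:8080/")

user = parts.username      # text before the ":" in the user information
password = parts.password  # text between ":" and "@"
host = parts.hostname      # DNS name or IP address
port = parts.port          # integer, or None when no port is given

print(user, password, host, port)
```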
135 | .PP | |
008f1ecc | 136 | Here are some of the most common schemes in use on UNIX-like systems |
fea681da MK |
137 | that are understood by many tools. |
138 | Note that many tools using URIs also have internal schemes or specialized | |
139 | schemes; see those tools' documentation for information on those schemes. | |
446a4bc8 MK |
140 | .PP |
141 | .B "http \- Web (HTTP) server" | |
142 | .PP | |
fea681da MK |
143 | .RI http:// ip_server / path |
144 | .br | |
145 | .RI http:// ip_server / path ? query | |
146 | .PP | |
147 | This is a URL accessing a web (HTTP) server. | |
148 | The default port is 80. | |
149 | If the path refers to a directory, the web server will choose what | |
150 | to return; usually if there is a file named "index.html" or "index.htm" | |
151 | its content is returned; otherwise, a list of the files in the current | |
152 | directory (with appropriate links) is generated and returned. | |
153 | An example is <http://lwn.net>. | |
154 | .PP | |
155 | A query can be given in the archaic "isindex" format, consisting of a | |
156 | word or phrase and not including an equal sign (=). | |
157 | A query can also be in the longer "GET" format, which has one or more | |
158 | query entries of the form | |
159 | .IR key = value | |
160 | separated by the ampersand character (&). | |
161 | Note that | |
162 | .I key | |
163 | can be repeated more than once, though it's up to the web server | |
164 | and its application programs to determine if there's any meaning to that. | |
165 | There is an unfortunate interaction with HTML/XML/SGML and | |
166 | the GET query format; when such URIs with more than one key | |
167 | are embedded in SGML/XML documents (including HTML), the ampersand | |
168 | (&) has to be rewritten as &amp;. | |
169 | Note that not all queries use this format; larger forms | |
170 | may be too long to store as a URI, so they use a different | |
6116ff44 MK |
171 | interaction mechanism (called POST) which does |
172 | not include the data in the URI. | |
fea681da | 173 | See the Common Gateway Interface specification at |
608bf950 SK |
174 | .UR http://www.w3.org\:/CGI |
175 | .UE | |
176 | for more information. | |
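The "GET" query format and its interaction with HTML can be sketched as follows, using Python's standard library. The host name, path, and keys are hypothetical and serve only as an example.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit

# Build a query in which the same key appears more than once.
query = urlencode([("key", "a"), ("key", "b"), ("lang", "en")])

# Hypothetical server and path, used only for illustration.
uri = "http://example.com/search?" + query

# Parse the query back; repeated keys are collected into lists.
parsed = parse_qs(urlsplit(uri).query)

# Before embedding the URI in HTML, each "&" becomes "&amp;".
html_attr = escape(uri, quote=True)

print(query)
print(parsed)
print(html_attr)
```

Whether a repeated key has any meaning is still up to the server and its application programs; the parser merely preserves every value.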
446a4bc8 MK |
177 | .PP |
178 | .B "ftp \- File Transfer Protocol (FTP)" | |
179 | .PP | |
fea681da MK |
180 | .RI ftp:// ip_server / path |
181 | .PP | |
182 | This is a URL accessing a file through the file transfer protocol (FTP). | |
183 | The default port (for control) is 21. | |
18701562 | 184 | If no username is included, the username "anonymous" is supplied, and |
fea681da MK |
185 | in that case many clients provide as the password the requestor's |
186 | Internet email address. | |
187 | An example is | |
188 | <ftp://ftp.is.co.za/rfc/rfc1808.txt>. | |
446a4bc8 MK |
189 | .PP |
190 | .B "gopher \- Gopher server" | |
191 | .PP | |
fea681da MK |
192 | .RI gopher:// ip_server / "gophertype selector" |
193 | .br | |
194 | .RI gopher:// ip_server / "gophertype selector" %09 search | |
195 | .br | |
196 | .RI gopher:// ip_server / "gophertype selector" %09 search %09 gopher+_string | |
197 | .br | |
198 | .PP | |
199 | The default gopher port is 70. | |
200 | .I gophertype | |
201 | is a single-character field to denote the | |
202 | Gopher type of the resource to | |
203 | which the URL refers. | |
204 | The entire path may also be empty, in | |
205 | which case the delimiting "/" is also optional and the gophertype | |
206 | defaults to "1". | |
207 | .PP | |
208 | .I selector | |
c13182ef MK |
209 | is the Gopher selector string. |
210 | In the Gopher protocol, | |
fea681da MK |
211 | Gopher selector strings are a sequence of octets which may contain |
212 | any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal | |
213 | (US-ASCII character LF), and 0D (US-ASCII character CR). | |
446a4bc8 MK |
214 | .PP |
215 | .B "mailto \- Email address" | |
216 | .PP | |
fea681da MK |
217 | .RI mailto: email-address |
218 | .PP | |
219 | This is an email address, usually of the form | |
220 | .IR name @ hostname . | |
221 | See | |
222 | .BR mailaddr (7) | |
223 | for more information on the correct format of an email address. | |
224 | Note that any % character must be rewritten as %25. | |
225 | An example is <mailto:dwheeler@dwheeler.com>. | |
446a4bc8 MK |
226 | .PP |
227 | .B "news \- Newsgroup or News message" | |
228 | .PP | |
fea681da MK |
229 | .RI news: newsgroup-name |
230 | .br | |
231 | .RI news: message-id | |
232 | .PP | |
233 | A | |
234 | .I newsgroup-name | |
235 | is a period-delimited hierarchical name, such as | |
236 | "comp.infosystems.www.misc". | |
237 | If <newsgroup-name> is "*" (as in <news:*>), it is used to refer | |
238 | to "all available news groups". | |
239 | An example is <news:comp.lang.ada>. | |
240 | .PP | |
241 | A | |
242 | .I message-id | |
243 | corresponds to the Message-ID of | |
608bf950 | 244 | .UR http://www.ietf.org\:/rfc\:/rfc1036.txt |
331da7c3 | 245 | IETF RFC\ 1036, |
fea681da MK |
246 | .UE |
247 | without the enclosing "<" | |
248 | and ">"; it takes the form | |
249 | .IR unique @ full_domain_name . | |
250 | A message identifier may be distinguished from a news group name by the | |
251 | presence of the "@" character. | |
446a4bc8 MK |
252 | .PP |
253 | .B "telnet \- Telnet login" | |
254 | .PP | |
fea681da MK |
255 | .RI telnet:// ip_server / |
256 | .PP | |
257 | The Telnet URL scheme is used to designate interactive text services that | |
c13182ef MK |
258 | may be accessed by the Telnet protocol. |
259 | The final "/" character may be omitted. | |
fea681da MK |
260 | The default port is 23. |
261 | An example is <telnet://melvyl.ucop.edu/>. | |
446a4bc8 MK |
262 | .PP |
263 | .B "file \- Normal file" | |
264 | .PP | |
fea681da MK |
265 | .RI file:// ip_server / path_segments |
266 | .br | |
267 | .RI file: path_segments | |
268 | .PP | |
269 | This represents a file or directory accessible locally. | |
270 | As a special case, | |
7adfc6e1 | 271 | .I ip_server |
fea681da | 272 | can be the string "localhost" or the empty |
2d986c92 MK |
273 | string; this is interpreted as "the machine from which the URL is |
274 | being interpreted". | |
fea681da MK |
275 | If the path is to a directory, the viewer should display the |
276 | directory's contents with links to each containee; | |
277 | not all viewers currently do this. | |
278 | KDE supports generated files through the URL <file:/cgi-bin>. | |
279 | If the given file isn't found, browser writers may want to try to expand | |
280 | the filename via filename globbing | |
281 | (see | |
282 | .BR glob (7) | |
283 | and | |
284 | .BR glob (3)). | |
285 | .PP | |
286 | The second format (e.g., <file:/etc/passwd>) | |
287 | is a correct format for referring to | |
c13182ef MK |
288 | a local file. |
289 | However, older standards did not permit this format, | |
fea681da | 290 | and some programs don't recognize this as a URI. |
75b94dc3 MK |
291 | A more portable syntax is to use an empty string as the server name, |
292 | for example, | |
fea681da MK |
293 | <file:///etc/passwd>; this form does the same thing |
294 | and is easily recognized by pattern matchers and older programs as a URI. | |
295 | Note that if you really mean to say "start from the current location," don't | |
296 | specify the scheme at all; use a relative address like <../test.txt>, | |
297 | which has the side-effect of being scheme-independent. | |
298 | An example of this scheme is <file:///etc/passwd>. | |
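A small sketch, assuming Python's pathlib and urllib.parse, shows that the forms discussed above name the same local path and that the portable form uses an empty server name:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# pathlib produces the portable form with an empty server name.
portable = PurePosixPath("/etc/passwd").as_uri()

# Both forms carry the same path; only the authority part differs.
short_form = urlsplit("file:/etc/passwd")
empty_host = urlsplit("file:///etc/passwd")
local_host = urlsplit("file://localhost/etc/passwd")

print(portable)
print(short_form.path, empty_host.path, local_host.path)
print(repr(empty_host.netloc), repr(local_host.netloc))
```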
446a4bc8 MK |
299 | .PP |
300 | .B "man \- Man page documentation" | |
301 | .PP | |
fea681da MK |
302 | .RI man: command-name |
303 | .br | |
304 | .RI man: command-name ( section ) | |
305 | .PP | |
306 | This refers to local online manual (man) reference pages. | |
6116ff44 MK |
307 | The command name can optionally be followed by a |
308 | parenthesis and section number; see | |
fea681da MK |
309 | .BR man (7) |
310 | for more information on the meaning of the section numbers. | |
008f1ecc | 311 | This URI scheme is unique to UNIX-like systems (such as Linux) |
fea681da MK |
312 | and is not currently registered by the IETF. |
313 | An example is <man:ls(1)>. | |
446a4bc8 MK |
314 | .PP |
315 | .B "info \- Info page documentation" | |
316 | .PP | |
fea681da MK |
317 | .RI info: virtual-filename |
318 | .br | |
319 | .RI info: virtual-filename # nodename | |
320 | .br | |
321 | .RI info:( virtual-filename ) | |
322 | .br | |
323 | .RI info:( virtual-filename ) nodename | |
324 | .PP | |
325 | This scheme refers to online info reference pages (generated from | |
6116ff44 MK |
326 | texinfo files), |
327 | a documentation format used by programs such as the GNU tools. | |
008f1ecc | 328 | This URI scheme is unique to UNIX-like systems (such as Linux) |
fea681da MK |
329 | and is not currently registered by the IETF. |
330 | As of this writing, GNOME and KDE differ in their URI syntax | |
331 | and do not accept the other's syntax. | |
332 | The first two formats are the GNOME format; in nodenames all spaces | |
333 | are written as underscores. | |
334 | The second two formats are the KDE format; | |
335 | spaces in nodenames must be written as spaces, even though this | |
336 | is forbidden by the URI standards. | |
337 | It's hoped that in the future most tools will understand all of these | |
338 | formats and will always accept underscores for spaces in nodenames. | |
339 | In both GNOME and KDE, if the form without the nodename is used the | |
340 | nodename is assumed to be "Top". | |
341 | Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>. | |
342 | Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and GCC>. | |
446a4bc8 MK |
343 | .PP |
344 | .B "whatis \- Documentation search" | |
345 | .PP | |
fea681da MK |
346 | .RI whatis: string |
347 | .PP | |
6116ff44 MK |
348 | This scheme searches the database of short (one-line) descriptions of |
349 | commands and returns a list of descriptions containing that string. | |
fea681da MK |
350 | Only complete word matches are returned. |
351 | See | |
352 | .BR whatis (1). | |
008f1ecc | 353 | This URI scheme is unique to UNIX-like systems (such as Linux) |
fea681da | 354 | and is not currently registered by the IETF. |
446a4bc8 MK |
355 | .PP |
356 | .B "ghelp \- GNOME help documentation" | |
357 | .PP | |
fea681da MK |
358 | .RI ghelp: name-of-application |
359 | .PP | |
360 | This loads GNOME help for the given application. | |
361 | Note that not much documentation currently exists in this format. | |
446a4bc8 MK |
362 | .PP |
363 | .B "ldap \- Lightweight Directory Access Protocol" | |
364 | .PP | |
fea681da MK |
365 | .RI ldap:// hostport |
366 | .br | |
367 | .RI ldap:// hostport / | |
368 | .br | |
369 | .RI ldap:// hostport / dn | |
370 | .br | |
371 | .RI ldap:// hostport / dn ? attributes | |
372 | .br | |
373 | .RI ldap:// hostport / dn ? attributes ? scope | |
374 | .br | |
375 | .RI ldap:// hostport / dn ? attributes ? scope ? filter | |
376 | .br | |
377 | .RI ldap:// hostport / dn ? attributes ? scope ? filter ? extensions | |
378 | .PP | |
379 | This scheme supports queries to the | |
380 | Lightweight Directory Access Protocol (LDAP), a protocol for querying | |
3f624b93 | 381 | a set of servers for hierarchically organized information |
fea681da | 382 | (such as people and computing resources). |
034dbf3a | 383 | See |
608bf950 | 384 | .UR http://www.ietf.org\:/rfc\:/rfc2255.txt |
034dbf3a | 385 | RFC\ 2255 |
fea681da | 386 | .UE |
034dbf3a | 387 | for more information on the LDAP URL scheme. |
fea681da MK |
388 | The components of this URL are: |
389 | .IP hostport 12 | |
390 | the LDAP server to query, written as a hostname optionally followed by | |
391 | a colon and the port number. | |
c13182ef | 392 | The default LDAP port is TCP port 389. |
fea681da MK |
393 | If empty, the client determines which LDAP server to use. |
394 | .IP dn | |
395 | the LDAP Distinguished Name, which identifies | |
396 | the base object of the LDAP search (see | |
608bf950 | 397 | .UR http://www.ietf.org\:/rfc\:/rfc2253.txt |
331da7c3 | 398 | RFC\ 2253 |
fea681da MK |
399 | .UE |
400 | section 3). | |
401 | .IP attributes | |
402 | a comma-separated list of attributes to be returned; | |
c13182ef | 403 | see RFC\ 2251 section 4.1.5. |
331da7c3 | 404 | If omitted, all attributes should be returned. |
fea681da MK |
405 | .IP scope |
406 | specifies the scope of the search, which can be one of | |
407 | "base" (for a base object search), "one" (for a one-level search), | |
c13182ef MK |
408 | or "sub" (for a subtree search). |
409 | If scope is omitted, "base" is assumed. | |
fea681da MK |
410 | .IP filter |
411 | specifies the search filter (subset of entries | |
c13182ef MK |
412 | to return). |
413 | If omitted, all entries should be returned. | |
fea681da | 414 | See |
608bf950 | 415 | .UR http://www.ietf.org\:/rfc\:/rfc2254.txt |
331da7c3 | 416 | RFC\ 2254 |
fea681da MK |
417 | .UE |
418 | section 4. | |
419 | .IP extensions | |
420 | a comma-separated list of type=value | |
421 | pairs, where the =value portion may be omitted for options not | |
c13182ef | 422 | requiring it. |
f81fb444 | 423 | An extension prefixed with a \(aq!\(aq is critical |
24b74457 | 424 | (must be supported to be valid); otherwise it is noncritical (optional). |
fea681da MK |
425 | .PP |
426 | LDAP queries are easiest to explain by example. | |
427 | Here's a query that asks ldap.itd.umich.edu for information about | |
428 | the University of Michigan in the U.S.: | |
0dac954b MK |
429 | .PP |
430 | .nf | |
fea681da | 431 | ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US |
3ffdc54f | 432 | .fi |
fea681da MK |
433 | .PP |
434 | To just get its postal address attribute, request: | |
0dac954b MK |
435 | .PP |
436 | .nf | |
fea681da | 437 | ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress |
0dac954b | 438 | .fi |
fea681da MK |
439 | .PP |
440 | To ask a host.com at port 6666 for information about the person | |
441 | with common name (cn) "Babs Jensen" at University of Michigan, request: | |
0dac954b MK |
442 | .PP |
443 | .nf | |
fea681da | 444 | ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen) |
0dac954b | 445 | .fi |
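The component layout described above can be sketched in Python. The helper name split_ldap_url is hypothetical and this is not a complete RFC 2255 parser; it only splits on the "?" separators and applies the defaults stated in the component list.

```python
from urllib.parse import unquote, urlsplit

def split_ldap_url(url):
    """Split an LDAP URL into the components named in the text (sketch only)."""
    parts = urlsplit(url)
    # urlsplit keeps everything after the first "?" together, so the
    # remaining "?" separators are split by hand.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))  # pad omitted trailing components
    attributes, scope, ldap_filter, extensions = fields[:4]
    return {
        "hostport": parts.netloc,
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope or "base",  # "base" is assumed when scope is omitted
        "filter": unquote(ldap_filter),
        "extensions": extensions.split(",") if extensions else [],
    }

result = split_ldap_url(
    "ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)"
)
print(result)
```

An empty attributes list here stands for "all attributes", and an empty filter for "all entries", as described above.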
446a4bc8 MK |
446 | .PP |
447 | .B "wais \- Wide Area Information Servers" | |
448 | .PP | |
fea681da MK |
449 | .RI wais:// hostport / database |
450 | .br | |
451 | .RI wais:// hostport / database ? search | |
452 | .br | |
453 | .RI wais:// hostport / database / wtype / wpath | |
454 | .PP | |
455 | This scheme designates a WAIS database, search, or document | |
456 | (see | |
608bf950 | 457 | .UR http://www.ietf.org\:/rfc\:/rfc1625.txt |
331da7c3 | 458 | IETF RFC\ 1625 |
fea681da MK |
459 | .UE |
460 | for more information on WAIS). | |
461 | Hostport is the hostname, optionally followed by a colon and port number | |
462 | (the default port number is 210). | |
463 | .PP | |
464 | The first form designates a WAIS database for searching. | |
465 | The second form designates a particular search of the WAIS database | |
466 | .IR database . | |
467 | The third form designates a particular document within a WAIS | |
468 | database to be retrieved. | |
469 | .I wtype | |
470 | is the WAIS designation of the type of the object and | |
471 | .I wpath | |
472 | is the WAIS document-id. | |
446a4bc8 MK |
473 | .PP |
474 | .B "other schemes" | |
475 | .PP | |
fea681da MK |
476 | There are many other URI schemes. |
477 | Most tools that accept URIs support a set of internal URIs | |
478 | (e.g., Mozilla has the about: scheme for internal information, | |
479 | and the GNOME help browser has the toc: scheme for various starting | |
480 | locations). | |
481 | There are many schemes that have been defined but are not as widely | |
482 | used at the current time | |
483 | (e.g., prospero). | |
484 | The nntp: scheme is deprecated in favor of the news: scheme. | |
485 | URNs are to be supported by the urn: scheme, with a hierarchical name space | |
486 | (e.g., urn:ietf:... would identify IETF documents); at this time | |
487 | URNs are not widely implemented. | |
488 | Not all tools support all schemes. | |
73d8cece | 489 | .SS Character encoding |
fea681da MK |
490 | .PP |
491 | URIs use a limited number of characters so that they can be | |
492 | typed in and used in a variety of situations. | |
493 | .PP | |
494 | The following characters are reserved, that is, they may appear in a | |
495 | URI but their use is limited to their reserved purpose | |
496 | (conflicting data must be escaped before forming the URI): | |
497 | .IP | |
498 | ; / ? : @ & = + $ , | |
499 | .PP | |
500 | Unreserved characters may be included in a URI. | |
501 | Unreserved characters | |
efaef3da | 502 | include uppercase and lowercase English letters, |
fea681da MK |
503 | decimal digits, and the following |
504 | limited set of punctuation marks and symbols: | |
505 | .IP | |
4d9b6984 | 506 | \- _ . ! ~ * ' ( ) |
fea681da MK |
507 | .PP |
508 | All other characters must be escaped. | |
509 | An escaped octet is encoded as a character triplet, consisting of the | |
510 | percent character "%" followed by the two hexadecimal digits | |
efaef3da | 511 | representing the octet code (you can use uppercase or lowercase letters |
c13182ef MK |
512 | for the hexadecimal digits). |
513 | For example, a blank space must be escaped | |
fea681da MK |
514 | as "%20", a tab character as "%09", and the "&" as "%26". |
515 | Because the percent "%" character always has the reserved purpose of | |
516 | being the escape indicator, it must be escaped as "%25". | |
517 | It is common practice to escape space characters as the plus symbol (+) | |
518 | in query text; this practice isn't uniformly defined | |
519 | in the relevant RFCs (which recommend %20 instead) but any tool accepting | |
520 | URIs with query text should be prepared for them. | |
521 | A URI is always shown in its "escaped" form. | |
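The escaping rules above can be tried out with Python's urllib.parse module. This is a minimal sketch; the sample string is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Escape a space, a tab, "&" and "%"; safe="" leaves no extra character unescaped.
escaped = quote("a b\t&%", safe="")

# The common (but not uniformly defined) query convention: space becomes "+".
plus_form = quote_plus("a b")

# Unescaping restores the original text in both conventions.
round_trip = unquote(escaped)
plus_round_trip = unquote_plus(plus_form)

print(escaped)
print(plus_form)
```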
522 | .PP | |
523 | Unreserved characters can be escaped without changing the semantics | |
524 | of the URI, but this should not be done unless the URI is being used | |
525 | in a context that does not allow the unescaped character to appear. | |
763f0e47 MK |
526 | For example, "%7e" is sometimes used instead of "~" in an HTTP URL |
527 | path, but the two are equivalent for an HTTP URL. | |
fea681da MK |
528 | .PP |
529 | For URIs which must handle characters outside the US ASCII character set, | |
530 | the HTML 4.01 specification (section B.2) and | |
331da7c3 | 531 | IETF RFC\ 2718 (section 2.2.5) recommend the following approach: |
fea681da | 532 | .IP 1. 4 |
5503c85e MK |
533 | translate the character sequences into UTF-8 (IETF RFC\ 2279)\(emsee |
534 | .BR utf-8 (7)\(emand | |
535 | then | |
fea681da MK |
536 | .IP 2. |
537 | use the URI escaping mechanism, that is, | |
538 | use the %HH encoding for unsafe octets. | |
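The two steps above can be sketched in Python as follows. The sample word is an arbitrary choice for illustration.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Step 1: translate the character sequence into UTF-8 octets.
octets = text.encode("utf-8")

# Step 2: apply the %HH escaping to the octets that need it.
escaped = quote(octets)

print(escaped)
print(unquote(escaped) == text)
```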
73d8cece | 539 | .SS Writing a URI |
eb1af896 | 540 | When written, URIs should be placed inside double quotes |
fea681da MK |
541 | (e.g., "http://www.kernelnotes.org"), |
542 | enclosed in angle brackets (e.g., <http://lwn.net>), | |
543 | or placed on a line by themselves. | |
544 | A warning for those who use double-quotes: | |
545 | .B never | |
546 | move extraneous punctuation (such as the period ending a sentence or the | |
547 | comma in a list) | |
548 | inside a URI, since this will change the value of the URI. | |
549 | Use angle brackets instead, or | |
550 | switch to a quoting system that never includes extraneous characters | |
551 | inside quotation marks. | |
552 | This latter system, called the 'new' or 'logical' quoting system by | |
553 | "Hart's Rules" and the "Oxford Dictionary for Writers and Editors", | |
554 | is preferred practice in Great Britain and among hackers worldwide | |
555 | (see the | |
defcceb3 | 556 | Jargon File's section on Hacker Writing Style, |
608bf950 SK |
557 | .UR http://www.fwi.uva.nl\:/~mes\:/jargon\:/h\:/HackerWritingStyle.html |
558 | .UE , | |
fea681da | 559 | for more information). |
c13182ef | 560 | Older documents suggested inserting the prefix "URL:" |
fea681da MK |
561 | just before the URI, but this form has never caught on. |
562 | .PP | |
563 | The URI syntax was designed to be unambiguous. | |
564 | However, as URIs have become commonplace, traditional media | |
565 | (television, radio, newspapers, billboards, etc.) have increasingly | |
566 | used abbreviated URI references consisting of | |
567 | only the authority and path portions of the identified resource | |
568 | (e.g., <www.w3.org/Addressing>). | |
569 | Such references are primarily | |
570 | intended for human interpretation rather than machine, with the | |
571 | assumption that context-based heuristics are sufficient to complete | |
572 | the URI (e.g., hostnames beginning with "www" are likely to have | |
573 | a URI prefix of "http://" and hostnames beginning with "ftp" likely | |
574 | to have a prefix of "ftp://"). | |
575 | Many client implementations heuristically resolve these references. | |
576 | Such heuristics may | |
577 | change over time, particularly when new schemes are introduced. | |
578 | Since an abbreviated URI has the same syntax as a relative URL path, | |
579 | abbreviated URI references cannot be used where relative URIs are | |
33a0ccb2 | 580 | permitted, and can be used only when there is no defined base |
fea681da MK |
581 | (such as in dialog boxes). |
582 | Don't use abbreviated URIs as hypertext links inside a document; | |
583 | use the standard format as described here. | |
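A context-based heuristic of the kind mentioned above might look like the following sketch. The function name guess_full_uri is hypothetical, and real clients use their own, more elaborate rules that may change over time.

```python
def guess_full_uri(abbreviated):
    """Guess a scheme for an abbreviated URI reference (illustrative heuristic only)."""
    host = abbreviated.split("/", 1)[0]
    if host.startswith("www"):
        return "http://" + abbreviated
    if host.startswith("ftp"):
        return "ftp://" + abbreviated
    return None  # no guess; a real client might ask the user instead

print(guess_full_uri("www.w3.org/Addressing"))
print(guess_full_uri("ftp.is.co.za/rfc/rfc1808.txt"))
```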
47297adb | 584 | .SH CONFORMING TO |
2b2581ee | 585 | .PP |
608bf950 SK |
586 | .UR http://www.ietf.org\:/rfc\:/rfc2396.txt |
587 | (IETF RFC\ 2396) | |
588 | .UE , | |
589 | .UR http://www.w3.org\:/TR\:/REC-html40 | |
590 | (HTML 4.0) | |
591 | .UE . | |
fea681da MK |
592 | .SH NOTES |
593 | Any tool accepting URIs (e.g., a web browser) on a Linux system should | |
6116ff44 MK |
594 | be able to handle (directly or indirectly) all of the |
595 | schemes described here, including the man: and info: schemes. | |
596 | Handling them by invoking some other program is | |
597 | fine and in fact encouraged. | |
fea681da MK |
598 | .PP |
599 | Technically the fragment isn't part of the URI. | |
600 | .PP | |
601 | For information on how to embed URIs (including URLs) in a data format, | |
602 | see documentation on that format. | |
603 | HTML uses the format <A HREF="\fIuri\fP"> | |
604 | .I text | |
605 | </A>. | |
606 | Texinfo files use the format @uref{\fIuri\fP}. | |
3f624b93 | 607 | Man and mdoc have the recently added UR macro, or just include the |
fea681da MK |
608 | URI in the text (viewers should be able to detect :// as part of a URI). |
609 | .PP | |
6116ff44 MK |
610 | The GNOME and KDE desktop environments currently vary in the URIs |
611 | they accept, in particular in their respective help browsers. | |
fea681da MK |
612 | To list man pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and |
613 | to list info pages, GNOME uses <toc:info> while KDE uses <info:(dir)> | |
614 | (the author of this man page prefers the KDE approach here, though a more | |
615 | regular format would be even better). | |
616 | In general, KDE uses <file:/cgi-bin/> as a prefix to a set of generated | |
617 | files. | |
618 | KDE prefers documentation in HTML, accessed via the | |
619 | <file:/cgi-bin/helpindex>. | |
620 | GNOME prefers the ghelp scheme to store and find documentation. | |
621 | Neither browser handles file: references to directories at the time | |
622 | of this writing, making it difficult to refer to an entire directory with | |
623 | a browsable URI. | |
6116ff44 MK |
624 | As noted above, these environments differ in how they handle the |
625 | info: scheme, probably the most important variation. | |
fea681da MK |
626 | It is expected that GNOME and KDE |
627 | will converge to common URI formats, and a future | |
628 | version of this man page will describe the converged result. | |
629 | Efforts to aid this convergence are encouraged. | |
2b2581ee | 630 | .SS Security |
fea681da MK |
631 | .PP |
632 | A URI does not in itself pose a security threat. | |
633 | There is no general guarantee that a URL, which at one time | |
c13182ef MK |
634 | located a given resource, will continue to do so. |
635 | Nor is there any | |
fea681da | 636 | guarantee that a URL will not locate a different resource at some |
33a0ccb2 MK |
637 | later point in time; such a guarantee can be |
638 | obtained only from the person(s) controlling that namespace and the | |
fea681da MK |
639 | resource in question. |
640 | .PP | |
641 | It is sometimes possible to construct a URL such that an attempt to | |
642 | perform a seemingly harmless operation, such as the | |
643 | retrieval of an entity associated with the resource, will in fact | |
c13182ef MK |
644 | cause a possibly damaging remote operation to occur. |
645 | The unsafe URL | |
fea681da | 646 | is typically constructed by specifying a port number other than that |
c13182ef MK |
647 | reserved for the network protocol in question. |
648 | The client unwittingly contacts a site that is in fact | |
649 | running a different protocol. | |
650 | The content of the URL contains instructions that, when | |
fea681da | 651 | interpreted according to this other protocol, cause an unexpected |
c13182ef MK |
652 | operation. |
653 | An example has been the use of a gopher URL to cause an | |
fea681da MK |
654 | unintended or impersonating message to be sent via an SMTP server. |
655 | .PP | |
656 | Caution should be used when using any URL that specifies a port | |
657 | number other than the default for the protocol, especially when it is | |
658 | a number within the reserved space. | |
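The caution above can be sketched as a small check. This is a minimal illustration, not part of any standard tool: the function name port_warning and the DEFAULT_PORTS table are hypothetical, the table is deliberately incomplete, and "reserved space" is assumed here to mean ports below 1024.

```python
from urllib.parse import urlsplit

# Illustrative table only (an assumption); a real program would list
# the default port of every scheme it supports.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gopher": 70, "telnet": 23}


def port_warning(url):
    """Return a warning string if url names a non-default port, else None."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL gives no explicit port
    default = DEFAULT_PORTS.get(parts.scheme)
    if port is None or port == default:
        return None
    # Schemes missing from the table have no known default, so any
    # explicit port is flagged for them.
    if port < 1024:  # assumed meaning of "reserved space"
        return f"port {port} is non-default for {parts.scheme} and in the reserved range"
    return f"port {port} is non-default for {parts.scheme}"
```

A caller might show the returned text to the user before following the link, for example for a gopher URL that names port 25.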
659 | .PP | |
660 | When a URI contains escaped delimiters for a given protocol | |
661 | (for example, CR and LF characters for telnet | |
c13182ef MK |
662 | protocols), take care that these are not unescaped before transmission. |
663 | This might violate the protocol, but avoids the potential for such | |
fea681da MK |
664 | characters to be used to simulate an extra operation or parameter in |
665 | that protocol, which might lead to an unexpected and possibly harmful | |
666 | remote operation being performed. | |
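One way to follow this advice is to refuse to unescape a component that contains an escaped CR or LF. The sketch below assumes that rejecting such input is acceptable to the caller; the names has_escaped_crlf and safe_unquote are hypothetical, and only CR and LF are checked, although other protocols may have further delimiters.

```python
import re
from urllib.parse import unquote

# Matches %0A (LF) or %0D (CR) in either letter case.
_ESCAPED_CRLF = re.compile(r"%0[ad]", re.IGNORECASE)


def has_escaped_crlf(uri):
    """Return True if uri contains a percent-encoded CR or LF."""
    return _ESCAPED_CRLF.search(uri) is not None


def safe_unquote(component):
    """Unescape component, but raise ValueError rather than emit CR or LF."""
    if has_escaped_crlf(component):
        raise ValueError("refusing to unescape CR or LF")
    return unquote(component)
```

Raising an error is only one possible policy; a program could instead transmit the component still escaped, as the text above suggests.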
667 | .PP | |
668 | It is clearly unwise to use a URI that contains a password which is | |
c13182ef MK |
669 | intended to be secret. |
670 | In particular, the use of a password within | |
84c517a4 MK |
671 | the "userinfo" component of a URI is strongly discouraged except |
672 | in those rare cases where the "password" parameter is intended to be public. | |
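A program that logs or displays URIs may want to drop any password from the userinfo component first. The following is a minimal sketch under that assumption; the function name strip_password is hypothetical, and it relies only on urlsplit and urlunsplit from the Python standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(uri):
    """Return uri with any password removed from its userinfo component."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # no password present, nothing to change
    # netloc has the form "user:password@host[:port]"; split at the last "@".
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}@{hostport}"))
```

For example, the placeholder URI ftp://alice:secret@example.com/pub would be reduced to ftp://alice@example.com/pub.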
fea681da MK |
673 | .SH BUGS |
674 | .PP | |
675 | Documentation may be placed in a variety of locations, so there | |
676 | currently isn't a good URI scheme for general online documentation | |
677 | in arbitrary formats. | |
678 | References of the form | |
679 | <file:///usr/doc/ZZZ> don't work because different distributions and | |
680 | local installation requirements may place the files in different | |
681 | directories | |
6116ff44 MK |
682 | (they may be in /usr/doc, or /usr/local/doc, or /usr/share, |
683 | or somewhere else). | |
fea681da MK |
684 | Also, the directory ZZZ usually changes when a version changes |
685 | (though filename globbing could partially overcome this). | |
6116ff44 MK |
686 | Finally, using the file: scheme doesn't easily support people |
687 | who dynamically load documentation from the Internet (instead of | |
9ee4a2b6 | 688 | loading the files onto a local filesystem). |
fea681da | 689 | A future URI scheme may be added (e.g., "userdoc:") to permit |
6116ff44 MK |
690 | programs to include cross-references to more detailed documentation |
691 | without having to know the exact location of that documentation. | |
9ee4a2b6 | 692 | Alternatively, a future version of the filesystem specification may |
fea681da MK |
693 | specify file locations sufficiently so that the file: scheme will |
694 | be able to locate documentation. | |
695 | .PP | |
696 | Many programs and file formats don't include a way to incorporate | |
697 | or implement links using URIs. | |
698 | .PP | |
699 | Many programs can't handle all of these different URI formats; there | |
700 | should be a standard mechanism to load an arbitrary URI that automatically | |
6116ff44 | 701 | detects the user's environment (e.g., text or graphics,
3f624b93 | 702 | desktop environment, local user preferences, and currently executing |
6116ff44 | 703 | tools) and invokes the right tool for any URI. |
fd7f0a7f MK |
704 | .\" .SH AUTHOR |
705 | .\" David A. Wheeler (dwheeler@dwheeler.com) wrote this man page. | |
47297adb | 706 | .SH SEE ALSO |
fea681da MK |
707 | .BR lynx (1), |
708 | .BR man2html (1), | |
709 | .BR mailaddr (7), | |
173fe7e7 DP |
710 | .BR utf-8 (7) |
711 | ||
608bf950 | 712 | .UR http://www.ietf.org\:/rfc\:/rfc2255.txt |
baf17bc4 | 713 | IETF RFC\ 2255 |
fea681da | 714 | .UE |