1'\"
2.\" (C) Copyright 1999-2000 David A. Wheeler (dwheeler@dwheeler.com)
3.\"
4.\" Permission is granted to make and distribute verbatim copies of this
5.\" manual provided the copyright notice and this permission notice are
6.\" preserved on all copies.
7.\"
8.\" Permission is granted to copy and distribute modified versions of this
9.\" manual under the conditions for verbatim copying, provided that the
10.\" entire resulting derived work is distributed under the terms of a
11.\" permission notice identical to this one.
12.\"
13.\" Since the Linux kernel and libraries are constantly changing, this
14.\" manual page may be incorrect or out-of-date. The author(s) assume no
15.\" responsibility for errors or omissions, or for damages resulting from
16.\" the use of the information contained herein. The author(s) may not
17.\" have taken the same level of care in the production of this manual,
18.\" which is licensed free of charge, as they might when working
19.\" professionally.
20.\"
21.\" Formatted or processed versions of this manual, if unaccompanied by
22.\" the source, must acknowledge the copyright and authors of this work.
23.\"
24.\" Fragments of this document are directly derived from IETF standards.
25.\" For those fragments which are directly derived from such standards,
26.\" the following notice applies, which is the standard copyright and
27.\" rights announcement of The Internet Society:
28.\"
29.\" Copyright (C) The Internet Society (1998). All Rights Reserved.
30.\" This document and translations of it may be copied and furnished to
31.\" others, and derivative works that comment on or otherwise explain it
32.\" or assist in its implementation may be prepared, copied, published
33.\" and distributed, in whole or in part, without restriction of any
34.\" kind, provided that the above copyright notice and this paragraph are
35.\" included on all such copies and derivative works. However, this
36.\" document itself may not be modified in any way, such as by removing
37.\" the copyright notice or references to the Internet Society or other
38.\" Internet organizations, except as needed for the purpose of
39.\" developing Internet standards in which case the procedures for
40.\" copyrights defined in the Internet Standards process must be
41.\" followed, or as required to translate it into languages other than English.
42.\"
43.\" Modified Fri Jul 25 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com)
44.\" Modified Fri Aug 21 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com)
45.\" Modified Tue Mar 14 2000 by David A. Wheeler (dwheeler@dwheeler.com)
46.\"
47.TH URI 7 2000-03-14 "Linux" "Linux Programmer's Manual"
48.SH NAME
49uri, url, urn \- uniform resource identifier (URI), including a URL or URN
50.SH SYNOPSIS
51.nf
52.HP 0.2i
53URI = [ absoluteURI | relativeURI ] [ "#" fragment ]
54.HP
55absoluteURI = scheme ":" ( hierarchical_part | opaque_part )
56.HP
57relativeURI = ( net_path | absolute_path | relative_path ) [ "?" query ]
58.HP
59scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" | "file" | "man" | "info" | "whatis" | "ldap" | "wais" | \&...
60.HP
61hierarchical_part = ( net_path | absolute_path ) [ "?" query ]
62.HP
63net_path = "//" authority [ absolute_path ]
64.HP
65absolute_path = "/" path_segments
66.HP
67relative_path = relative_segment [ absolute_path ]
68.fi
69.SH DESCRIPTION
70.PP
71A Uniform Resource Identifier (URI) is a short string of characters
72identifying an abstract or physical resource (for example, a web page).
73A Uniform Resource Locator (URL) is a URI
74that identifies a resource through its primary access
75mechanism (e.g., its network "location"), rather than
76by name or some other attribute of that resource.
77A Uniform Resource Name (URN) is a URI
78that must remain globally unique and persistent even when
79the resource ceases to exist or becomes unavailable.
80.PP
81URIs are the standard way to name hypertext link destinations
82for tools such as web browsers.
83The string "http://www.kernelnotes.org" is a URL (and thus it's a URI).
84Many people use the term URL loosely as a synonym for URI
85(though technically URLs are a subset of URIs).
86.PP
87URIs can be absolute or relative.
88An absolute identifier refers to a resource independent of
89context, while a relative
90identifier refers to a resource by describing the difference
91from the current context.
92Within a relative path reference, the complete path segments "." and
93".." have special meanings: "the current hierarchy level" and "the
94level above this hierarchy level", respectively, just like they do in
95Unix-like systems.
96A path segment which contains a colon
97character can't be used as the first segment of a relative URI path
98(e.g., "this:that"), because it would be mistaken for a scheme name;
99precede such segments with ./ (e.g., "./this:that").
100Note that descendants of MS-DOS (e.g., Microsoft Windows) replace
101devicename colons with the vertical bar ("|") in URIs, so "C:" becomes "C|".
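.PP
The resolution rules above can be sketched with Python's standard urllib.parse module.
This is only an illustration: the base URI and the helper name resolve are assumptions made for the example, not part of any standard.

```python
from urllib.parse import urljoin

# Hypothetical base URI, used only for illustration.
BASE = "http://example.org/a/b/c.html"


def resolve(reference, base=BASE):
    """Resolve a (possibly relative) reference against a base URI."""
    return urljoin(base, reference)


if __name__ == "__main__":
    # "." is the current hierarchy level, ".." the level above it.
    # "this:that" is read as scheme "this", so it is returned unchanged;
    # prefixing it with "./" makes it a relative path again.
    for ref in ("d.html", "./d.html", "../d.html", "this:that", "./this:that"):
        print(f"{ref!r:16} -> {resolve(ref)}")
```

Note how the bare "this:that" is not resolved against the base at all, which is the ambiguity the "./" prefix avoids.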
102.PP
103A fragment identifier, if included, refers to a particular named portion
104(fragment) of a resource; text after a '#' identifies the fragment.
105A URI beginning with '#' refers to that fragment in the current resource.
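.PP
As a small sketch of fragment handling, again using Python's urllib.parse: the helper names and example.org URIs below are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin


def split_fragment(uri):
    """Return (uri_without_fragment, fragment) for a URI reference."""
    url, fragment = urldefrag(uri)
    return url, fragment


def same_document_reference(current, reference):
    """Resolve a reference such as "#notes" against the current resource."""
    return urljoin(current, reference)


if __name__ == "__main__":
    print(split_fragment("http://example.org/doc.html#section2"))
    print(same_document_reference("http://example.org/doc.html", "#notes"))
```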
106.SH USAGE
107There are many different URI schemes, each with specific
108additional rules and meanings, but they are intentionally made to be
109as similar as possible.
110For example, many URL schemes
111permit the authority to be the following format, called here an
112.I ip_server
113(square brackets show what's optional):
114.HP
115.IR "ip_server = " [ user " [ : " password " ] @ ] " host " [ : " port ]
116.PP
117This format allows you to optionally insert a user name,
118a user plus password, and/or a port number.
119The
120.I host
121is the name of the host computer, either its name as determined by DNS
122or an IP address (numbers separated by periods).
123Thus the URI
124<http://fred:fredpassword@xyz.com:8080/>
125logs into a web server on host xyz.com
126as fred (using fredpassword) using port 8080.
127Avoid including a password in a URI if possible because of the many
128security risks of having a password written down.
129If the URL supplies a user name but no password, and the remote
130server requests a password, the program interpreting the URL
131should request one from the user.
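.PP
One way to pull the ip_server fields out of a URI is sketched below with Python's urllib.parse.
The function name split_ip_server is an assumption for this example; the URI is the one used above.

```python
from urllib.parse import urlsplit


def split_ip_server(uri):
    """Return the user, password, host, and port found in the authority.

    Fields that are absent from the URI come back as None.
    """
    parts = urlsplit(uri)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


if __name__ == "__main__":
    print(split_ip_server("http://fred:fredpassword@xyz.com:8080/"))
    print(split_ip_server("http://xyz.com/"))
```

A program that finds a user but no password would, as described above, prompt for the password only if the server asks for one.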
132.PP
133Here are some of the most common schemes in use on Unix-like systems
134that are understood by many tools.
135Note that many tools using URIs also have internal schemes or specialized
136schemes; see those tools' documentation for information on those schemes.
137.SS "http \- Web (HTTP) server"
138.RI http:// ip_server / path
139.br
140.RI http:// ip_server / path ? query
141.PP
142This is a URL accessing a web (HTTP) server.
143The default port is 80.
144If the path refers to a directory, the web server chooses what to return.
145Usually, if there is a file named "index.html" or "index.htm",
146its content is returned; otherwise, a list of the files in the current
147directory (with appropriate links) is generated and returned.
148An example is <http://lwn.net>.
149.PP
150A query can be given in the archaic "isindex" format, consisting of a
151word or phrase and not including an equal sign (=).
152A query can also be in the longer "GET" format, which has one or more
153query entries of the form
154.IR key = value
155separated by the ampersand character (&).
156Note that
157.I key
158can be repeated more than once, though it's up to the web server
159and its application programs to determine if there's any meaning to that.
160There is an unfortunate interaction with HTML/XML/SGML and
161the GET query format; when such URIs with more than one key
162are embedded in SGML/XML documents (including HTML), the ampersand
163(&) has to be rewritten as &amp;.
164Note that not all queries use this format; larger forms
165may be too long to store as a URI, so they use a different
166interaction mechanism (called POST) which does not include the data in the URI.
167See the Common Gateway Interface specification at
168<http://www.w3.org/CGI> for more information.
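.PP
The "GET" query format and the ampersand rewriting can be illustrated with a short Python sketch.
The URL and the helper names are hypothetical; parse_qs and html.escape are standard library functions.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def parse_query(url):
    """Return the key=value entries of a GET-style query as a dict of lists.

    A key that appears more than once keeps all of its values, in order.
    """
    return parse_qs(urlsplit(url).query)


def embed_in_html(url):
    """Rewrite "&" as "&amp;" so the URL can be placed in HTML/XML/SGML."""
    return escape(url)


if __name__ == "__main__":
    url = "http://example.org/search?key=a&key=b&lang=en"
    print(parse_query(url))
    print(embed_in_html(url))
```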
169.SS "ftp \- File Transfer Protocol (FTP)"
170.RI ftp:// ip_server / path
171.PP
172This is a URL accessing a file through the file transfer protocol (FTP).
173The default port (for control) is 21.
174If no username is included, the user name "anonymous" is supplied, and
175in that case many clients provide as the password the requestor's
176Internet email address.
177An example is
178<ftp://ftp.is.co.za/rfc/rfc1808.txt>.
179.SS "gopher \- Gopher server"
180.RI gopher:// ip_server / "gophertype selector"
181.br
182.RI gopher:// ip_server / "gophertype selector" %09 search
183.br
184.RI gopher:// ip_server / "gophertype selector" %09 search %09 gopher+_string
185.br
186.PP
187The default gopher port is 70.
188.I gophertype
189is a single-character field to denote the
190Gopher type of the resource to
191which the URL refers.
192The entire path may also be empty, in
193which case the delimiting "/" is also optional and the gophertype
194defaults to "1".
195.PP
196.I selector
197is the Gopher selector string.
198In the Gopher protocol,
199Gopher selector strings are a sequence of octets which may contain
200any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
201(US-ASCII character LF), and 0D (US-ASCII character CR).
202.SS "mailto \- Email address"
203.RI mailto: email-address
204.PP
205This is an email address, usually of the form
206.IR name @ hostname .
207See
208.BR mailaddr (7)
209for more information on the correct format of an email address.
210Note that any % character must be rewritten as %25.
211An example is <mailto:dwheeler@dwheeler.com>.
212.SS "news \- Newsgroup or News message"
213.RI news: newsgroup-name
214.br
215.RI news: message-id
216.PP
217A
218.I newsgroup-name
219is a period-delimited hierarchical name, such as
220"comp.infosystems.www.misc".
221If <newsgroup-name> is "*" (as in <news:*>), it is used to refer
222to "all available news groups".
223An example is <news:comp.lang.ada>.
224.PP
225A
226.I message-id
227corresponds to the Message-ID of
228.UR http://www.ietf.org/rfc/rfc1036.txt
229IETF RFC\ 1036,
230.UE
231without the enclosing "<"
232and ">"; it takes the form
233.IR unique @ full_domain_name .
234A message identifier may be distinguished from a news group name by the
235presence of the "@" character.
236.SS "telnet \- Telnet login"
237.RI telnet:// ip_server /
238.PP
239The Telnet URL scheme is used to designate interactive text services that
240may be accessed by the Telnet protocol.
241The final "/" character may be omitted.
242The default port is 23.
243An example is <telnet://melvyl.ucop.edu/>.
244.SS "file \- Normal file"
245.RI file:// ip_server / path_segments
246.br
247.RI file: path_segments
248.PP
249This represents a file or directory accessible locally.
250As a special case,
251.I host
252can be the string "localhost" or the empty
253string; this is interpreted as `the machine from which the URL is
254being interpreted'.
255If the path is to a directory, the viewer should display the
256directory's contents with links to each containee;
257not all viewers currently do this.
258KDE supports generated files through the URL <file:/cgi-bin>.
259If the given file isn't found, browser writers may want to try to expand
260the filename via filename globbing
261(see
262.BR glob (7)
263and
264.BR glob (3)).
265.PP
266The second format (e.g., <file:/etc/passwd>)
267is a correct format for referring to
268a local file.
269However, older standards did not permit this format,
270and some programs don't recognize this as a URI.
271A more portable syntax is to use an empty string as the server name, e.g.,
272<file:///etc/passwd>; this form does the same thing
273and is easily recognized by pattern matchers and older programs as a URI.
274Note that if you really mean to say "start from the current location," don't
275specify the scheme at all; use a relative address like <../test.txt>,
276which has the side-effect of being scheme-independent.
277An example of this scheme is <file:///etc/passwd>.
278.SS "man \- Man page documentation"
279.RI man: command-name
280.br
281.RI man: command-name ( section )
282.PP
283This refers to local online manual (man) reference pages.
284The command name can optionally be followed by a section number in parentheses;
285see
286.BR man (7)
287for more information on the meaning of the section numbers.
288This URI scheme is unique to Unix-like systems (such as Linux)
289and is not currently registered by the IETF.
290An example is <man:ls(1)>.
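.PP
Since the man: scheme is not formally registered, the following Python sketch simply mirrors the two forms shown above.
The regular expression and the function name parse_man_uri are assumptions for illustration, not a specification.

```python
import re

# Matches "man:name" and "man:name(section)"; parentheses may not appear
# in the name or section (an assumption made to keep the sketch simple).
MAN_URI = re.compile(r"^man:([^()]+)(?:\(([^()]+)\))?$")


def parse_man_uri(uri):
    """Return (command_name, section) or None if the URI does not match.

    section is None when the URI gives no section.
    """
    match = MAN_URI.match(uri)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_man_uri("man:ls(1)"))
    print(parse_man_uri("man:ls"))
```

A handler could then pass the section (if any) and the name to the man(1) command.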
291.SS "info \- Info page documentation"
292.RI info: virtual-filename
293.br
294.RI info: virtual-filename # nodename
295.br
296.RI info:( virtual-filename )
297.br
298.RI info:( virtual-filename ) nodename
299.PP
300This scheme refers to online info reference pages (generated from
301texinfo files), a documentation format used by programs such as the GNU tools.
302This URI scheme is unique to Unix-like systems (such as Linux)
303and is not currently registered by the IETF.
304As of this writing, GNOME and KDE differ in their URI syntax
305and do not accept the other's syntax.
306The first two formats are the GNOME format; in nodenames all spaces
307are written as underscores.
308The second two formats are the KDE format;
309spaces in nodenames must be written as spaces, even though this
310is forbidden by the URI standards.
311It's hoped that in the future most tools will understand all of these
312formats and will always accept underscores for spaces in nodenames.
313In both GNOME and KDE, if the form without the nodename is used the
314nodename is assumed to be "Top".
315Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
316Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and GCC>.
317.SS "whatis \- Documentation search"
318.RI whatis: string
319.PP
320This scheme searches the database of short (one-line) descriptions of commands
321and returns a list of descriptions containing that string.
322Only complete word matches are returned.
323See
324.BR whatis (1).
325This URI scheme is unique to Unix-like systems (such as Linux)
326and is not currently registered by the IETF.
327.SS "ghelp \- GNOME help documentation"
328.RI ghelp: name-of-application
329.PP
330This loads GNOME help for the given application.
331Note that not much documentation currently exists in this format.
332.SS "ldap \- Lightweight Directory Access Protocol"
333.RI ldap:// hostport
334.br
335.RI ldap:// hostport /
336.br
337.RI ldap:// hostport / dn
338.br
339.RI ldap:// hostport / dn ? attributes
340.br
341.RI ldap:// hostport / dn ? attributes ? scope
342.br
343.RI ldap:// hostport / dn ? attributes ? scope ? filter
344.br
345.RI ldap:// hostport / dn ? attributes ? scope ? filter ? extensions
346.PP
347This scheme supports queries to the
348Lightweight Directory Access Protocol (LDAP), a protocol for querying
349a set of servers for hierarchically-organized information
350(such as people and computing resources).
351More information on the LDAP URL scheme is available in
352.UR http://www.ietf.org/rfc/rfc2255.txt
353RFC\ 2255.
354.UE
355The components of this URL are:
356.IP hostport 12
357the LDAP server to query, written as a hostname optionally followed by
358a colon and the port number.
359The default LDAP port is TCP port 389.
360If empty, the client determines which LDAP server to use.
361.IP dn
362the LDAP Distinguished Name, which identifies
363the base object of the LDAP search (see
364.UR http://www.ietf.org/rfc/rfc2253.txt
365RFC\ 2253
366.UE
367section 3).
368.IP attributes
369a comma-separated list of attributes to be returned;
370see RFC\ 2251 section 4.1.5.
371If omitted, all attributes should be returned.
372.IP scope
373specifies the scope of the search, which can be one of
374"base" (for a base object search), "one" (for a one-level search),
375or "sub" (for a subtree search).
376If scope is omitted, "base" is assumed.
377.IP filter
378specifies the search filter (subset of entries
379to return).
380If omitted, all entries should be returned.
381See
382.UR http://www.ietf.org/rfc/rfc2254.txt
383RFC\ 2254
384.UE
385section 4.
386.IP extensions
387a comma-separated list of type=value
388pairs, where the =value portion may be omitted for options not
389requiring it.
390An extension prefixed with a '!' is critical
391(must be supported to be valid), otherwise it's non-critical (optional).
392.PP
393LDAP queries are easiest to explain by example.
394Here's a query that asks ldap.itd.umich.edu for information about
395the University of Michigan in the U.S.:
396.RS
397ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
398.RE
399.PP
400To just get its postal address attribute, request:
401.RS
402ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
403.RE
404.PP
405To ask host.com at port 6666 for information about the person
406with common name (cn) "Babs Jensen" at University of Michigan, request:
407.RS
408ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
409.RE
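.PP
To make the component layout concrete, here is a rough Python sketch that splits an LDAP URL into the fields listed above and applies the stated defaults (port 389, scope "base").
The function parse_ldap_url is hypothetical and does no validation; a real client should use an LDAP library.

```python
from urllib.parse import unquote, urlsplit


def parse_ldap_url(url):
    """Split an ldap: URL into hostport, dn, attributes, scope, filter, extensions."""
    parts = urlsplit(url)
    # Everything after the first "?" holds up to four "?"-separated fields.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))
    attributes, scope, search_filter, extensions = fields[:4]
    return {
        "host": parts.hostname,
        "port": parts.port or 389,              # default LDAP port
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope or "base",               # default scope
        "filter": unquote(search_filter),
        "extensions": extensions,
    }


if __name__ == "__main__":
    print(parse_ldap_url(
        "ldap://host.com:6666/o=University%20of%20Michigan,c=US"
        "??sub?(cn=Babs%20Jensen)"
    ))
```

An empty attributes list here stands for "all attributes", and an empty filter for "all entries", matching the defaults described above.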
410.SS "wais \- Wide Area Information Servers"
411.RI wais:// hostport / database
412.br
413.RI wais:// hostport / database ? search
414.br
415.RI wais:// hostport / database / wtype / wpath
416.PP
417This scheme designates a WAIS database, search, or document
418(see
419.UR http://www.ietf.org/rfc/rfc1625.txt
420IETF RFC\ 1625
421.UE
422for more information on WAIS).
423Hostport is the hostname, optionally followed by a colon and port number
424(the default port number is 210).
425.PP
426The first form designates a WAIS database for searching.
427The second form designates a particular search of the WAIS database
428.IR database .
429The third form designates a particular document within a WAIS
430database to be retrieved.
431.I wtype
432is the WAIS designation of the type of the object and
433.I wpath
434is the WAIS document-id.
435.SS "other schemes"
436There are many other URI schemes.
437Most tools that accept URIs support a set of internal URIs
438(e.g., Mozilla has the about: scheme for internal information,
439and the GNOME help browser has the toc: scheme for various starting
440locations).
441There are many schemes that have been defined but are not as widely
442used at the current time
443(e.g., prospero).
444The nntp: scheme is deprecated in favor of the news: scheme.
445URNs are to be supported by the urn: scheme, with a hierarchical name space
446(e.g., urn:ietf:... would identify IETF documents); at this time
447URNs are not widely implemented.
448Not all tools support all schemes.
449.SH "CHARACTER ENCODING"
450.PP
451URIs use a limited number of characters so that they can be
452typed in and used in a variety of situations.
453.PP
454The following characters are reserved, that is, they may appear in a
455URI but their use is limited to their reserved purpose
456(conflicting data must be escaped before forming the URI):
457.IP
458 ; / ? : @ & = + $ ,
459.PP
460Unreserved characters may be included in a URI.
461Unreserved characters
462include upper and lower case English letters,
463decimal digits, and the following
464limited set of punctuation marks and symbols:
465.IP
466 \- _ . ! ~ * ' ( )
467.PP
468All other characters must be escaped.
469An escaped octet is encoded as a character triplet, consisting of the
470percent character "%" followed by the two hexadecimal digits
471representing the octet code (you can use upper or lower case letters
472for the hexadecimal digits).
473For example, a blank space must be escaped
474as "%20", a tab character as "%09", and the "&" as "%26".
475Because the percent "%" character always has the reserved purpose of
476being the escape indicator, it must be escaped as "%25".
477It is common practice to escape space characters as the plus symbol (+)
478in query text; this practice isn't uniformly defined
479in the relevant RFCs (which recommend %20 instead) but any tool accepting
480URIs with query text should be prepared for them.
481A URI is always shown in its "escaped" form.
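.PP
The escaping rules can be tried out with Python's urllib.parse, as in this brief sketch.
The sample strings are arbitrary; quote, quote_plus, unquote, and unquote_plus are standard library functions.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Space, tab, "&", and "%" become %20, %09, %26, and %25.
escaped = quote("a b\t&%")

# Query text often writes a space as "+" instead of "%20".
plus_style = quote_plus("two words")

# A tool reading query text should accept both spellings of a space.
from_plus = unquote_plus("two+words")
from_percent = unquote("two%20words")

if __name__ == "__main__":
    print(escaped)
    print(plus_style)
    print(from_plus, "|", from_percent)
    # Hexadecimal digits may be upper or lower case when unescaping.
    print(unquote("%7e") == unquote("%7E"))
```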
482.PP
483Unreserved characters can be escaped without changing the semantics
484of the URI, but this should not be done unless the URI is being used
485in a context that does not allow the unescaped character to appear.
486For example, "%7e" is sometimes used instead of "~" in an http URL
487path, but the two are equivalent for an http URL.
488.PP
489For URIs which must handle characters outside the US ASCII character set,
490the HTML 4.01 specification (section B.2) and
491IETF RFC\ 2718 (section 2.2.5) recommend the following approach:
492.IP 1. 4
493translate the character sequences into UTF-8 (IETF RFC\ 2279) \(em see
494.BR utf-8 (7)
495\(em and then
496.IP 2.
497use the URI escaping mechanism, that is,
498use the %HH encoding for unsafe octets.
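.PP
The two steps above might look as follows in Python.
The function name to_escaped is an assumption for this sketch, and "café" is just a sample string.

```python
from urllib.parse import quote, unquote


def to_escaped(text):
    """Escape text containing non-ASCII characters for use in a URI."""
    utf8_octets = text.encode("utf-8")   # step 1: translate to UTF-8
    return quote(utf8_octets)            # step 2: %HH-escape unsafe octets


if __name__ == "__main__":
    escaped = to_escaped("café")
    print(escaped)
    # Unescaping and decoding as UTF-8 recovers the original text.
    print(unquote(escaped, encoding="utf-8"))
```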
499.SH "WRITING A URI"
500When written, URIs should be placed inside doublequotes
501(e.g., "http://www.kernelnotes.org"),
502enclosed in angle brackets (e.g., <http://lwn.net>),
503or placed on a line by themselves.
504A warning for those who use double-quotes:
505.B never
506move extraneous punctuation (such as the period ending a sentence or the
507comma in a list)
508inside a URI, since this will change the value of the URI.
509Instead, use angle brackets instead, or
510switch to a quoting system that never includes extraneous characters
511inside quotation marks.
512This latter system, called the 'new' or 'logical' quoting system by
513"Hart's Rules" and the "Oxford Dictionary for Writers and Editors",
514is preferred practice in Great Britain and hackers worldwide
515(see the
516Jargon File's section on Hacker Writing Style,
517.IR http://www.fwi.uva.nl/~mes/jargon/h/HackerWritingStyle.html ,
518for more information).
519Older documents suggested inserting the prefix "URL:"
520just before the URI, but this form has never caught on.
521.PP
522The URI syntax was designed to be unambiguous.
523However, as URIs have become commonplace, traditional media
524(television, radio, newspapers, billboards, etc.) have increasingly
525used abbreviated URI references consisting of
526only the authority and path portions of the identified resource
527(e.g., <www.w3.org/Addressing>).
528Such references are primarily
529intended for human interpretation rather than machine, with the
530assumption that context-based heuristics are sufficient to complete
531the URI (e.g., hostnames beginning with "www" are likely to have
532a URI prefix of "http://" and hostnames beginning with "ftp" likely
533to have a prefix of "ftp://").
534Many client implementations heuristically resolve these references.
535Such heuristics may
536change over time, particularly when new schemes are introduced.
537Since an abbreviated URI has the same syntax as a relative URL path,
538abbreviated URI references cannot be used where relative URIs are
539permitted, and can only be used when there is no defined base
540(such as in dialog boxes).
541Don't use abbreviated URIs as hypertext links inside a document;
542use the standard format as described here.
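.PP
A client-side heuristic of the kind described above could be sketched like this in Python.
Only the two hostname rules mentioned in the text are implemented; the function name and the choice to return None for anything else are assumptions of this example, and real clients use richer (and changing) rules.

```python
def complete_abbreviated(reference):
    """Guess a full URI for an abbreviated reference, or return None."""
    if "://" in reference:
        return reference                    # already has a scheme and authority
    host = reference.split("/", 1)[0].lower()
    if host.startswith("www"):
        return "http://" + reference        # "www..." hosts: assume http
    if host.startswith("ftp"):
        return "ftp://" + reference         # "ftp..." hosts: assume ftp
    return None                             # no guess; ask the user instead


if __name__ == "__main__":
    print(complete_abbreviated("www.w3.org/Addressing"))
    print(complete_abbreviated("ftp.is.co.za/rfc/rfc1808.txt"))
    print(complete_abbreviated("example.org/page"))
```

Such guessing belongs in places with no defined base, such as a dialog box, not in the resolution of links inside a document.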
543.SH NOTES
544Any tool accepting URIs (e.g., a web browser) on a Linux system should
545be able to handle (directly or indirectly) all of the schemes described here,
546including the man: and info: schemes.
547Handling them by invoking some other program is fine and in fact encouraged.
548.PP
549Technically the fragment isn't part of the URI.
550.PP
551For information on how to embed URIs (including URLs) in a data format,
552see documentation on that format.
553HTML uses the format <A HREF="\fIuri\fP">
554.I text
555</A>.
556Texinfo files use the format @uref{\fIuri\fP}.
557Man and mdoc have the recently-added UR macro, or just include the
558URI in the text (viewers should be able to detect :// as part of a URI).
559.PP
560The GNOME and KDE desktop environments currently vary in the URIs they accept,
561in particular in their respective help browsers.
562To list man pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and
563to list info pages, GNOME uses <toc:info> while KDE uses <info:(dir)>
564(the author of this man page prefers the KDE approach here, though a more
565regular format would be even better).
566In general, KDE uses <file:/cgi-bin/> as a prefix to a set of generated
567files.
568KDE prefers documentation in HTML, accessed via
569<file:/cgi-bin/helpindex>.
570GNOME prefers the ghelp scheme to store and find documentation.
571Neither browser handles file: references to directories at the time
572of this writing, making it difficult to refer to an entire directory with
573a browsable URI.
574As noted above, these environments differ in how they handle the info: scheme,
575probably the most important variation.
576It is expected that GNOME and KDE
577will converge to common URI formats, and a future
578version of this man page will describe the converged result.
579Efforts to aid this convergence are encouraged.
580.SH SECURITY
581.PP
582A URI does not in itself pose a security threat.
583There is no general guarantee that a URL, which at one time
584located a given resource, will continue to do so.
585Nor is there any
586guarantee that a URL will not locate a different resource at some
587later point in time; such a guarantee can only be
588obtained from the person(s) controlling that namespace and the
589resource in question.
590.PP
591It is sometimes possible to construct a URL such that an attempt to
592perform a seemingly harmless operation, such as the
593retrieval of an entity associated with the resource, will in fact
594cause a possibly damaging remote operation to occur.
595The unsafe URL
596is typically constructed by specifying a port number other than that
597reserved for the network protocol in question.
598The client unwittingly contacts a site that is in fact
599running a different protocol.
600The content of the URL contains instructions that, when
601interpreted according to this other protocol, cause an unexpected
602operation.
603An example has been the use of a gopher URL to cause an
604unintended or impersonating message to be sent via an SMTP server.
605.PP
606Caution should be used when using any URL that specifies a port
607number other than the default for the protocol, especially when it is
608a number within the reserved space.
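.PP
A tool might flag such URLs before following them.
The sketch below is one possible check in Python; the table holds only the default ports given in this page, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Default ports as listed in this manual page.
DEFAULT_PORTS = {
    "http": 80,
    "ftp": 21,
    "gopher": 70,
    "telnet": 23,
    "ldap": 389,
    "wais": 210,
}


def uses_nondefault_port(url):
    """Return True if the URL names a port other than its scheme's default.

    Schemes missing from DEFAULT_PORTS are not flagged by this sketch.
    """
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    return (
        parts.port is not None
        and default is not None
        and parts.port != default
    )


if __name__ == "__main__":
    # Port 25 is normally SMTP, so a gopher URL aimed at it deserves caution.
    print(uses_nondefault_port("gopher://example.org:25/"))
    print(uses_nondefault_port("http://lwn.net"))
```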
609.PP
610Care should be taken when a URI contains escaped delimiters for a
611given protocol (for example, CR and LF characters for telnet
612protocols) that these are not unescaped before transmission.
613This might violate the protocol, but avoids the potential for such
614characters to be used to simulate an extra operation or parameter in
615that protocol, which might cause an unexpected and possibly harmful
616remote operation to be performed.
617.PP
618It is clearly unwise to use a URI that contains a password which is
619intended to be secret.
620In particular, the use of a password within
621the 'userinfo' component of a URI is strongly discouraged except
622in those rare cases where the 'password' parameter is intended to be public.
623.SH "CONFORMING TO"
624.PP
625.IR http://www.ietf.org/rfc/rfc2396.txt
626(IETF RFC\ 2396),
627.I http://www.w3.org/TR/REC-html40
628(HTML 4.0).
629.UE
630.SH BUGS
631.PP
632Documentation may be placed in a variety of locations, so there
633currently isn't a good URI scheme for general online documentation
634in arbitrary formats.
635References of the form
636<file:///usr/doc/ZZZ> don't work because different distributions and
637local installation requirements may place the files in different
638directories
639(it may be in /usr/doc, or /usr/local/doc, or /usr/share, or somewhere else).
640Also, the directory ZZZ usually changes when a version changes
641(though filename globbing could partially overcome this).
642Finally, using the file: scheme doesn't easily support people who dynamically
643load documentation from the Internet (instead of loading the files
644onto a local filesystem).
645A future URI scheme may be added (e.g., "userdoc:") to permit
646programs to include cross-references to more detailed documentation without
647having to know the exact location of that documentation.
648Alternatively, a future version of the filesystem specification may
649specify file locations sufficiently so that the file: scheme will
650be able to locate documentation.
651.PP
652Many programs and file formats don't include a way to incorporate
653or implement links using URIs.
654.PP
655Many programs can't handle all of these different URI formats; there
656should be a standard mechanism to load an arbitrary URI that automatically
657detects the users' environment (e.g., text or graphics, desktop environment,
658local user preferences, and currently-executing tools) and invokes the
659right tool for any URI.
660.SH AUTHOR
661David A. Wheeler (dwheeler@dwheeler.com) wrote this man page.
662.SH "SEE ALSO"
663.BR lynx (1),
664.BR man2html (1),
665.BR mailaddr (7),
666.BR utf-8 (7),
667.UR http://www.ietf.org/rfc/rfc2255.txt
668IETF RFC\ 2255.
669.UE