From f2b85e19f261bfe602313304e7964df75807f79a Mon Sep 17 00:00:00 2001 From: hno <> Date: Tue, 11 Jan 2005 00:03:28 +0000 Subject: [PATCH] CGI and ICAP specifications --- doc/rfc/1-index.txt | 9 + doc/rfc/draft-coar-cgi-v11-04.txt | 1904 ++++++++++++++++++++ doc/rfc/rfc3507.txt | 2747 +++++++++++++++++++++++++++++ 3 files changed, 4660 insertions(+) create mode 100644 doc/rfc/draft-coar-cgi-v11-04.txt create mode 100644 doc/rfc/rfc3507.txt diff --git a/doc/rfc/1-index.txt b/doc/rfc/1-index.txt index 98afb1c9bb..6a7b567222 100644 --- a/doc/rfc/1-index.txt +++ b/doc/rfc/1-index.txt @@ -11,6 +11,11 @@ draft-wilson-wrec-wccp-v2-01.txt draft-vinod-carp-v1-03.txt Microsoft CARP peering algorithm +draft-coar-cgi-v11-04.txt + CGI/1.1 specification + used by cachemgr to get its request arguments from the + web server where it is hosted + rfc1738.txt Uniform Resource Locators (URL) @@ -37,3 +42,7 @@ rfc3310.txt Updated Digest specification Most likely not in use for HTTP. Title says HTTP but all examples is SIP. + +rfc3507.txt + Internet Content Adaptation Protocol (ICAP/1.0) + Common protocol for plugging into the datastream of an HTTP proxy diff --git a/doc/rfc/draft-coar-cgi-v11-04.txt b/doc/rfc/draft-coar-cgi-v11-04.txt new file mode 100644 index 0000000000..3e26f5b833 --- /dev/null +++ b/doc/rfc/draft-coar-cgi-v11-04.txt @@ -0,0 +1,1904 @@ + + + +INTERNET-DRAFT David Robinson +draft-coar-cgi-v11-04.txt Apache Software Foundation +Expires 18 April 2004 Ken A.L. Coar + IBM Corporation + 19 October 2003 + + + The Common Gateway Interface (CGI) Version 1.1 + + +Status of this Memo + + This document is an Internet-Draft and is in full conformance with + all provisions of Section 10 of RFC2026. + + Internet-Drafts are working documents of the Internet Engineering + Task Force (IETF), its areas, and its working groups. Note that + other groups may also distribute working documents as + Internet-Drafts. 
+ + Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as 'work in progress'. + + The list of current Internet-Drafts can be accessed at + http://www.ietf.org/ietf/1id-abstracts.txt. + + The list of Internet-Draft Shadow Directories can be accessed at + http://www.ietf.org/shadow.html. + + Distribution of this document is unlimited. Please send comments to + the authors, or via the CGI-WG mailing list; see the project Web page + at . + +Abstract + + The Common Gateway Interface (CGI) is a simple interface for running + external programs, software or gateways under an information server + in a platform-independent manner. Currently, the supported + information servers are HTTP servers. + + The interface has been in use by the World-Wide Web since 1993. This + specification defines the 'current practice' parameters of the + 'CGI/1.1' interface developed and documented at the U.S. National + Centre for Supercomputing Applications. This document also defines + the use of the CGI/1.1 interface on UNIX(R) and other, similar + systems. + + + +Robinson & Coar Expires 18 April 2004 [Page 1] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +Contents + + 1 Introduction 4 + 1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 1.2 Requirements . . . . . . . . . . . . . . . . . . . . . . . 4 + 1.3 Specifications . . . . . . . . . . . . . . . . . . . . . . 4 + 1.4 Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 + + 2 Notational Conventions and Generic Grammar 5 + 2.1 Augmented BNF . . . . . . . . . . . . . . . . . . . . . . 5 + 2.2 Basic Rules . . . . . . . . . . . . . . . . . . . . . . . 6 + 2.3 URL Encoding . . . . . . . . . . . . . . . . . . . . . . . 7 + + 3 Invoking the Script 8 + 3.1 Server Responsibilities . . . . . . . . . . . . . . . . . 
8 + 3.2 Script Selection . . . . . . . . . . . . . . . . . . . . . 8 + 3.3 The Script-URI . . . . . . . . . . . . . . . . . . . . . . 9 + 3.4 Execution . . . . . . . . . . . . . . . . . . . . . . . . 10 + + 4 The CGI Request 10 + 4.1 Request Meta-Variables . . . . . . . . . . . . . . . . . . 10 + 4.1.1 AUTH_TYPE . . . . . . . . . . . . . . . . . . . . . 11 + 4.1.2 CONTENT_LENGTH . . . . . . . . . . . . . . . . . . 11 + 4.1.3 CONTENT_TYPE . . . . . . . . . . . . . . . . . . . 12 + 4.1.4 GATEWAY_INTERFACE . . . . . . . . . . . . . . . . . 13 + 4.1.5 PATH_INFO . . . . . . . . . . . . . . . . . . . . . 13 + 4.1.6 PATH_TRANSLATED . . . . . . . . . . . . . . . . . . 14 + 4.1.7 QUERY_STRING . . . . . . . . . . . . . . . . . . . 15 + 4.1.8 REMOTE_ADDR . . . . . . . . . . . . . . . . . . . . 15 + 4.1.9 REMOTE_HOST . . . . . . . . . . . . . . . . . . . . 16 + 4.1.10 REMOTE_IDENT . . . . . . . . . . . . . . . . . . . 16 + 4.1.11 REMOTE_USER . . . . . . . . . . . . . . . . . . . . 16 + 4.1.12 REQUEST_METHOD . . . . . . . . . . . . . . . . . . 16 + 4.1.13 SCRIPT_NAME . . . . . . . . . . . . . . . . . . . . 17 + 4.1.14 SERVER_NAME . . . . . . . . . . . . . . . . . . . . 17 + 4.1.15 SERVER_PORT . . . . . . . . . . . . . . . . . . . . 17 + 4.1.16 SERVER_PROTOCOL . . . . . . . . . . . . . . . . . . 18 + 4.1.17 SERVER_SOFTWARE . . . . . . . . . . . . . . . . . . 18 + 4.1.18 Protocol-Specific Meta-Variables . . . . . . . . . 18 + 4.2 Request Message-Body . . . . . . . . . . . . . . . . . . . 19 + 4.3 Request Methods . . . . . . . . . . . . . . . . . . . . . 20 + 4.3.1 GET . . . . . . . . . . . . . . . . . . . . . . . . 20 + 4.3.2 POST . . . . . . . . . . . . . . . . . . . . . . . 20 + 4.3.3 HEAD . . . . . . . . . . . . . . . . . . . . . . . 20 + 4.3.4 Protocol-Specific Methods . . . . . . . . . . . . . 20 + 4.4 The Script Command Line . . . . . . . . . . . . . . . . . 21 + + 5 NPH Scripts 21 + 5.1 Identification . . . . . . . . . . . . . . . . . . . . . . 
21 + + +Robinson & Coar Expires 18 April 2004 [Page 2] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + 5.2 NPH Response . . . . . . . . . . . . . . . . . . . . . . . 22 + + 6 CGI Response 22 + 6.1 Response Handling . . . . . . . . . . . . . . . . . . . . 22 + 6.2 Response Types . . . . . . . . . . . . . . . . . . . . . . 22 + 6.2.1 Document Response . . . . . . . . . . . . . . . . . 23 + 6.2.2 Local Redirect Response . . . . . . . . . . . . . . 23 + 6.2.3 Client Redirect Response . . . . . . . . . . . . . 23 + 6.2.4 Client Redirect Response with Document . . . . . . 24 + 6.3 Response Header Fields . . . . . . . . . . . . . . . . . . 24 + 6.3.1 Content-Type . . . . . . . . . . . . . . . . . . . 24 + 6.3.2 Location . . . . . . . . . . . . . . . . . . . . . 25 + 6.3.3 Status . . . . . . . . . . . . . . . . . . . . . . 26 + 6.3.4 Protocol-Specific Header Fields . . . . . . . . . . 26 + 6.3.5 Extension Header Fields . . . . . . . . . . . . . . 27 + 6.4 Response Message-Body . . . . . . . . . . . . . . . . . . 27 + + 7 System Specifications 27 + 7.1 AmigaDOS . . . . . . . . . . . . . . . . . . . . . . . . . 27 + 7.2 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 + 7.3 EBCDIC/POSIX . . . . . . . . . . . . . . . . . . . . . . . 28 + + 8 Implementation 29 + 8.1 Recommendations for Servers . . . . . . . . . . . . . . . 29 + 8.2 Recommendations for Scripts . . . . . . . . . . . . . . . 29 + + 9 Security Considerations 30 + 9.1 Safe Methods . . . . . . . . . . . . . . . . . . . . . . . 30 + 9.2 Header Fields Containing Sensitive Information . . . . . . 30 + 9.3 Data Privacy . . . . . . . . . . . . . . . . . . . . . . . 30 + 9.4 Information Security Model . . . . . . . . . . . . . . . . 30 + 9.5 Script Interference with the Server . . . . . . . . . . . 30 + 9.6 Data Length and Buffering Considerations . . . . . . . . . 31 + 9.7 Stateless Processing . . . . . . . . . . . . . . . . . . . 31 + 9.8 Relative Paths . . . . . . . . . . . . . . . 
. . . . . . . 32 + 9.9 Non-parsed Header Output . . . . . . . . . . . . . . . . . 32 + + 10 Acknowledgements 32 + + 11 References 32 + + 12 Authors' Addresses 34 + + + + + + + + + +Robinson & Coar Expires 18 April 2004 [Page 3] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +1 Introduction + +1.1 Purpose + + The Common Gateway Interface (CGI) [21] allows an HTTP [2], [8] + server and a CGI script to share responsibility for responding to + client requests. The client request comprises a Universal Resource + Identifier (URI) [1], a request method and various ancillary + information about the request provided by the transport protocol. + + The CGI defines the abstract parameters, known as meta-variables, + which describe the client's request. Together with a concrete + programmer interface this specifies a platform-independent interface + between the script and the HTTP server. + + The server is responsible for managing connection, data transfer, + transport and network issues related to the client request, whereas + the CGI script handles the application issues, such as data access + and document processing. + +1.2 Requirements + + The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', + 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY' and 'OPTIONAL' in this + document are to be interpreted as described in RFC 2119 [5]. + + An implementation is not compliant if it fails to satisfy one or more + of the 'must' requirements for the protocols it implements. An + implementation that satisfies all of the 'must' and all of the + 'should' requirements for its features is said to be 'unconditionally + compliant'; one that satisfies all of the 'must' requirements but not + all of the 'should' requirements for its features is said to be + 'conditionally compliant'. + +1.3 Specifications + + Not all of the functions and features of the CGI are defined in the + main part of this specification. 
The following phrases are used to + describe the features that are not specified: + + 'system defined' + The feature may differ between systems, but must be the same for + different implementations using the same system. A system will + usually identify a class of operating-systems. Some systems are + defined in section 7 of this document. New systems may be defined + by new specifications without revision of this document. + + 'implementation defined' + + + +Robinson & Coar Expires 18 April 2004 [Page 4] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + The behaviour of the feature may vary from implementation to + implementation; a particular implementation must document its + behaviour. + +1.4 Terminology + + This specification uses many terms defined in the HTTP/1.1 + specification [8]; however, the following terms are used here in a + sense which may not accord with their definitions in that document, + or with their common meaning. + + 'meta-variable' + A named parameter which carries information from the server to the + script. It is not necessarily a variable in the operating- + system's environment, although that is the most common + implementation. + + 'script' + The software that is invoked by the server according to this + interface. It need not be a standalone program, but could be a + dynamically-loaded or shared library, or even a subroutine in the + server. It might be a set of statements interpreted at run-time, + as the term 'script' is frequently understood, but that is not a + requirement and within the context of this specification the term + has the broader definition stated. + + 'server' + The application program that invokes the script in order to + service requests from the client. + +2 Notational Conventions and Generic Grammar + +2.1 Augmented BNF + + All of the mechanisms specified in this document are described in + both prose and an augmented Backus-Naur Form (BNF) similar to that + used by RFC 822 [6]. 
Unless stated otherwise, the elements are + case-sensitive. This augmented BNF contains the following + constructs: + + name = definition + The name of a rule and its definition are separated by the equals + character ('='). Whitespace is only significant in that + continuation lines of a definition are indented. + + "literal" + Double quotation marks (") surround literal text, except for a + literal quotation mark, which is surrounded by angle-brackets ('<' + + + +Robinson & Coar Expires 18 April 2004 [Page 5] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + and '>'). + + rule1 | rule2 + Alternative rules are separated by a vertical bar ('|'). + + (rule1 rule2 rule3) + Elements enclosed in parentheses are treated as a single element. + + *rule + A rule preceded by an asterisk ('*') may have zero or more + occurrences. The full form is 'n*m rule' indicating at least n + and at most m occurrences of the rule. n and m are optional + decimal values with default values of 0 and infinity respectively. + + [rule] + An element enclosed in square brackets ('[' and ']') is optional, + and is equivalent to '*1 rule'. + + N rule + A rule preceded by a decimal number represents exactly N + occurrences of the rule. It is equivalent to 'N*N rule'. + +2.2 Basic Rules + + This specification uses a BNF-like grammar defined in terms of + characters. Unlike many specifications which define the bytes + allowed by a protocol, here each literal in the grammar corresponds + to the character it represents. How these characters are represented + in terms of bits and bytes within a system is either + system-defined or specified in the particular context. The single + exception is the rule 'OCTET', defined below. + + The following rules are used throughout this specification to + describe basic parsing constructs. 
+ + alpha = lowalpha | hialpha + lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | + "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | + "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | + "y" | "z" + hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | + "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | + "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | + "Y" | "Z" + digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | + "8" | "9" + alphanum = alpha | digit + OCTET = <any 8-bit byte> + + + +Robinson & Coar Expires 18 April 2004 [Page 6] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + CHAR = alpha | digit | separator | "!" | "#" | "$" | + "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" | + "^" | "_" | "{" | "|" | "}" | "~" | CTL + CTL = <any control character> + SP = <space character> + HT = <horizontal tab character> + NL = <newline> + LWSP = SP | HT | NL + separator = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | + "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" | + "}" | SP | HT + token = 1*<any CHAR except CTLs or separators> + quoted-string = <"> *qdtext <"> + qdtext = <any CHAR except <"> and CTLs but including LWSP> + TEXT = <any printable character> + + Note that newline (NL) need not be a single control character, but + can be a sequence of control characters. A system MAY define TEXT to + be a larger set of characters than <any CHAR excluding CTLs but including LWSP>. + +2.3 URL Encoding + + Some variables and constructs used here are described as being + 'URL-encoded'. This encoding is described in section 2 of RFC 2396 + [3]. In a URL-encoded string an escape sequence consists of a + percent character ("%") followed by two hexadecimal digits, where the + two hexadecimal digits form an octet. An escape sequence represents + the graphic character that has the octet as its code within the + US-ASCII [20] coded character set, if it exists. Currently there is + no provision within the URI syntax to identify which character set + non-ASCII codes represent, so CGI handles this issue on an ad-hoc + basis. + + Note that some unsafe (reserved) characters may have different + semantics when encoded. 
The definition of which characters are + unsafe depends on the context; see section 2 of RFC 2396 [3], updated + by RFC 2732 [11], for an authoritative treatment. These reserved + characters are generally used to provide syntactic structure to the + character string, for example as field separators. In all cases, the + string is first processed with regard to any reserved characters + present, and then the resulting data can be URL-decoded by replacing + "%" escapes by their character values. + + To encode a character string, all reserved and forbidden characters + are replaced by the corresponding "%" escapes. The string can then + be used in assembling a URI. The reserved characters will vary from + context to context, but will always be drawn from this set: + + + +Robinson & Coar Expires 18 April 2004 [Page 7] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | + "," | "[" | "]" + + The last two characters were added by RFC 2732 [11]. In any + particular context, a sub-set of these characters will be reserved; + the other characters from this set MUST NOT be encoded when a string + is URL-encoded in that context. Other basic rules used to describe + URI syntax are: + + hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" + | "c" | "d" | "e" | "f" + escaped = "%" hex hex + unreserved = alpha | digit | mark + mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")" + +3 Invoking the Script + +3.1 Server Responsibilities + + The server acts as an application gateway. It receives the request + from the client, selects a CGI script to handle the request, converts + the client request to a CGI request, executes the script and converts + the CGI response into a response for the client. When processing the + client request, it is responsible for implementing any protocol or + transport level authentication and security. 
The server MAY also + function in a 'non-transparent' manner, modifying the request or + response in order to provide some additional service, such as media + type transformation or protocol reduction. + + The server MUST perform translations and protocol conversions on the + client request data required by this specification. Furthermore, the + server retains its responsibility to the client to conform to the + relevant network protocol even if the CGI script fails to conform to + this specification. + + If the server is applying authentication to the request, then it MUST + NOT execute the script unless the request passes all defined access + controls. + +3.2 Script Selection + + The server determines which CGI script is to be executed based on a + generic-form URI supplied by the client. This URI includes a + hierarchical path with components separated by "/". For any + particular request, the server will identify all or a leading part of + this path with an individual script, thus placing the script at a + particular point in the path hierarchy. The remainder of the path, + if any, is a resource or sub-resource identifier to be interpreted by + + + +Robinson & Coar Expires 18 April 2004 [Page 8] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + the script. + + Information about this split of the path is available to the script + in the meta-variables, described below. Support for non-hierarchical + URI schemes is outside the scope of this specification. + +3.3 The Script-URI + + The mapping from client request URI to choice of script is defined by + the particular server implementation and its configuration. The + server may allow the script to be identified with a set of several + different URI path hierarchies, and therefore is permitted to replace + the URI by other members of this set during processing and generation + of the meta-variables. The server + + 1. MAY preserve the URI in the particular client request; or + + 2. 
MAY select a canonical URI from the set of possible values for + each script; or + + 3. can implement any other selection of URI from the set. + + From the meta-variables thus generated, a URI, the 'Script-URI', can + be constructed. This MUST have the property that if the client had + accessed this URI instead, then the script would have been executed + with the same values for the SCRIPT_NAME, PATH_INFO and QUERY_STRING + meta-variables. The Script-URI has the structure of a generic URI as + defined in section 3 of RFC 2396 [3], with the exception that object + parameters and fragment identifiers are not permitted. The various + components of the Script-URI are defined by some of the + meta-variables (see below); + + script-URI = <scheme> "://" <server-name> ":" <server-port> + <script-path> <extra-path> "?" <query-string> + + where <scheme> is found from SERVER_PROTOCOL, <server-name>, + <server-port> and <query-string> are the values of the respective + meta-variables. The SCRIPT_NAME and PATH_INFO values, URL-encoded + with ";", "=" and "?" reserved, give <script-path> and <extra-path>. + See section 4.1.5 for more information about the PATH_INFO + meta-variable. + + The scheme and the protocol are not identical as the scheme + identifies the access method in addition to the protocol. For + example, a resource accessed using Transport Layer Security (TLS) [7] + would have a request URI with a scheme of https when using the HTTP + protocol [16]. CGI/1.1 provides no generic means for the script to + reconstruct this, and therefore the Script-URI as defined includes + + + +Robinson & Coar Expires 18 April 2004 [Page 9] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + the base protocol used. However, a script MAY make use of + scheme-specific meta-variables to better deduce the URI scheme. + + Note that this definition also allows URIs to be constructed which + would invoke the script with any permitted values for the path-info + or query-string, by modifying the appropriate components. + +3.4 Execution + + The script is invoked in a system defined manner. 
Unless specified + otherwise, the file containing the script will be invoked as an + executable program. The server prepares the CGI request as described + in section 4; this comprises the request meta-variables (immediately + available to the script on execution) and request message data. The + request data need not be immediately available to the script; the + script can be executed before all this data has been received by the + server from the client. The response from the script is returned to + the server as described in sections 5 and 6. + + In the event of an error condition, the server can interrupt or + terminate script execution at any time and without warning. That + could occur, for example, in the event of a transport failure between + the server and the client; so the script SHOULD be prepared to handle + abnormal termination. + +4 The CGI Request + + Information about a request comes from two different sources; the + request meta-variables and any associated message-body. + +4.1 Request Meta-Variables + + Meta-variables contain data about the request passed from the server + to the script, and are accessed by the script in a system defined + manner. Meta-variables are identified by case-insensitive names; + there cannot be two different variables whose names differ in case + only. Here they are shown using a canonical representation of + capitals plus underscore ("_"). A particular system can define a + different representation. 
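As an illustration of the naming rule above, the following Python sketch gathers meta-variables under their canonical upper-case names. It assumes they are exposed through the process environment, which is a common convention and not something the text above requires. The helper name read_meta_variables is hypothetical.

```python
import os


def read_meta_variables(environ=None):
    """Return meta-variables keyed by canonical (upper-case) name.

    Assumption: the server exposes meta-variables as environment
    variables. A mapping can be passed in instead for testing.
    """
    source = os.environ if environ is None else environ
    result = {}
    for name, value in source.items():
        canonical = name.upper()
        # Names are case-insensitive, so two names that differ only in
        # case would denote the same meta-variable; treat that as an error.
        if canonical in result:
            raise ValueError(f"names differ only in case: {name!r}")
        result[canonical] = value
    return result
```

A script would normally call read_meta_variables() with no argument; passing a dictionary keeps the sketch testable without a server.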
+ + meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" | + "CONTENT_TYPE" | "GATEWAY_INTERFACE" | + "PATH_INFO" | "PATH_TRANSLATED" | + "QUERY_STRING" | "REMOTE_ADDR" | + "REMOTE_HOST" | "REMOTE_IDENT" | + "REMOTE_USER" | "REQUEST_METHOD" | + "SCRIPT_NAME" | "SERVER_NAME" | + "SERVER_PORT" | "SERVER_PROTOCOL" | + + + +Robinson & Coar Expires 18 April 2004 [Page 10] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + "SERVER_SOFTWARE" | scheme | + protocol-var-name | extension-var-name + protocol-var-name = ( protocol | scheme ) "_" var-name + scheme = alpha *( alpha | digit | "+" | "-" | "." ) + var-name = token + extension-var-name = token + + Meta-variables with the same name as a scheme, and names beginning + with the name of a protocol or scheme (e.g. HTTP_ACCEPT) may also be + specified. The number and meaning of these variables may change + independently of this specification. (See also section 4.1.18.) + + The server MAY define additional implementation-specific extension + meta-variables, whose names SHOULD be prefixed with "X_". + + This specification does not distinguish between zero-length (NULL) + values and missing values. For example, a script cannot distinguish + between the two requests http://host/script and http://host/script? + as in both cases the QUERY_STRING meta-variable would be NULL. + + meta-variable-value = "" | 1*<TEXT, CHAR or tokens of value> + + An optional meta-variable may be omitted (left unset) if its value is + NULL. Meta-variable values MUST be considered case-sensitive except + as noted otherwise. The representation of the characters in the + meta-variables is system defined; the server MUST convert values to + that representation. + +4.1.1 AUTH_TYPE + + The AUTH_TYPE variable identifies any mechanism used by the server to + authenticate the user. It contains a case-insensitive value defined + by the client protocol or server implementation. 
+ + For HTTP, if the client request required authentication for external + access, then the server MUST set the value of this variable from the + 'auth-scheme' token in the request Authorization header field. + + AUTH_TYPE = "" | auth-scheme + auth-scheme = "Basic" | "Digest" | extension-auth + extension-auth = token + + HTTP access authentication schemes are described in RFC 2617 [9]. + +4.1.2 CONTENT_LENGTH + + The CONTENT_LENGTH variable contains the size of the message-body + attached to the request, if any, in decimal number of octets. If no + + + +Robinson & Coar Expires 18 April 2004 [Page 11] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + data is attached, then NULL (or unset). + + CONTENT_LENGTH = "" | 1*digit + + The server MUST set this meta-variable if and only if the request is + accompanied by a message-body entity. The CONTENT_LENGTH value must + reflect the length of the message-body after the server has removed + any transfer-codings or content-codings. + +4.1.3 CONTENT_TYPE + + If the request includes a message-body, the CONTENT_TYPE variable is + set to the Internet Media Type [10] of the message-body. + + CONTENT_TYPE = "" | media-type + media-type = type "/" subtype *( ";" parameter ) + type = token + subtype = token + parameter = attribute "=" value + attribute = token + value = token | quoted-string + + The type, subtype and parameter attribute names are not case- + sensitive. Parameter values may be case sensitive. Media types and + their use in HTTP are described in section 3.7 of the HTTP/1.1 + specification [8]. + + There is no default value for this variable. If and only if it is + unset, then the script MAY attempt to determine the media type from + the data received. If the type remains unknown, then the script MAY + choose to assume a type of application/octet-stream or it may reject + the request with an error (as described in section 6.3.3). 
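The CONTENT_TYPE grammar and its case rules can be sketched in Python as follows. This is a minimal illustration and not a reference parser: the function name parse_content_type is hypothetical, the sketch tolerates optional whitespace around ";" (a leniency the grammar itself does not grant), and it takes the permitted application/octet-stream fallback when the value is NULL or unset without first inspecting the data.

```python
import re

# token: one or more CHARs that are neither CTLs nor separators.
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+"
MEDIA_TYPE = re.compile(rf"({TOKEN})/({TOKEN})")
# parameter = attribute "=" value, where value is a token or quoted-string.
PARAMETER = re.compile(rf'\s*;\s*({TOKEN})=(?:({TOKEN})|"([^"]*)")')


def parse_content_type(value):
    """Split a CONTENT_TYPE value into (type, subtype, parameters)."""
    if not value:
        # NULL and unset are equivalent; assume the permitted fallback type.
        return ("application", "octet-stream", {})
    head = MEDIA_TYPE.match(value)
    if head is None:
        raise ValueError(f"not a media-type: {value!r}")
    params = {}
    pos = head.end()
    while pos < len(value):
        found = PARAMETER.match(value, pos)
        if found is None:
            raise ValueError(f"bad parameter at offset {pos}: {value!r}")
        token_value, quoted_value = found.group(2), found.group(3)
        # Attribute names are case-insensitive; values keep their case.
        params[found.group(1).lower()] = (
            token_value if token_value is not None else quoted_value
        )
        pos = found.end()
    # Type and subtype are case-insensitive, so normalise them.
    return (head.group(1).lower(), head.group(2).lower(), params)
```

For instance, parse_content_type("Text/HTML; Charset=ISO-8859-1") yields the lower-cased type and subtype with the parameter value left untouched.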
+ + Each media-type defines a set of optional and mandatory parameters. + This may include a charset parameter with a case-insensitive value + defining the coded character set for the message-body. If the + charset parameter is omitted, then the default value should be + derived according to whichever of the following rules is the first to + apply: + + 1. There MAY be a system-defined default charset for some + media-types. + + 2. The default for media-types of type "text" is ISO-8859-1 [8]. + + 3. Any default defined in the media-type specification. + + 4. The default is US-ASCII. + + + +Robinson & Coar Expires 18 April 2004 [Page 12] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + The server MUST set this meta-variable if an HTTP Content-Type field + is present in the client request header. If the server receives a + request with an attached entity but no Content-Type header field, it + MAY attempt to determine the correct content type, otherwise it + should omit this meta-variable. + +4.1.4 GATEWAY_INTERFACE + + The GATEWAY_INTERFACE variable MUST be set to the dialect of CGI + being used by the server to communicate with the script. Syntax: + + GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit + + Note that the major and minor numbers are treated as separate + integers and hence each may be incremented higher than a single + digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in turn + is lower than CGI/12.3. Leading zeros MUST be ignored by the script + and MUST NOT be generated by the server. + + This document defines the 1.1 version of the CGI interface. + +4.1.5 PATH_INFO + + The PATH_INFO variable specifies a path to be interpreted by the CGI + script. It identifies the resource or sub-resource to be returned by + the CGI script, and is derived from the portion of the URI path + hierarchy following the part that identifies the script itself. 
+ Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot + contain path-segment parameters. A PATH_INFO of "/" represents a + single void path segment. + + PATH_INFO = "" | ( "/" path ) + path = lsegment *( "/" lsegment ) + lsegment = *lchar + lchar = <any TEXT or CTL except "/"> + + The value is considered case-sensitive and the server MUST preserve + the case of the path as presented in the request URI. The server MAY + impose restrictions and limitations on what values it permits for + PATH_INFO, and MAY reject the request with an error if it encounters + any values considered objectionable. That MAY include any requests + that would result in an encoded "/" being decoded into PATH_INFO, as + this might represent a loss of information to the script. Similarly, + treatment of non US-ASCII characters in the path is system defined. + + URL-encoded, the PATH_INFO string forms the extra-path component of + the Script-URI (see section 3.3) which follows the SCRIPT_NAME part + of that path. + + + +Robinson & Coar Expires 18 April 2004 [Page 13] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +4.1.6 PATH_TRANSLATED + + The PATH_TRANSLATED variable is derived by taking the PATH_INFO + value, parsing it as a local URI in its own right, and performing any + virtual-to-physical translation appropriate to map it onto the + server's document repository structure. The set of characters + permitted in the result is system defined. + + PATH_TRANSLATED = *<any character> + + This is the file location that would be accessed by a request for + + <scheme> "://" <server-name> ":" <server-port> <extra-path> + + where <scheme> is the scheme for the original client request and + <extra-path> is a URL-encoded version of PATH_INFO, with ";", "=" and + "?" reserved. 
For example, a request such as the following: + + http://somehost.com/cgi-bin/somescript/this%2eis%2ethe%2epath%3binfo + + would result in a PATH_INFO value of + + /this.is.the.path;info + + An internal URI is constructed from the scheme, server location and + the URL-encoded PATH_INFO: + + http://somehost.com/this.is.the.path%3binfo + + This would then be translated to a location in the server's document + repository, perhaps a filesystem path something like this: + + /usr/local/www/htdocs/this.is.the.path;info + + The result of the translation is the value of PATH_TRANSLATED. + + The value of PATH_TRANSLATED is derived in this way irrespective of + whether it maps to a valid repository location. The server MUST + preserve the case of the extra-path segment unless the underlying + repository supports case-insensitive names. If the repository is + only case-aware, case-preserving, or case-blind with regard to + document names, the server is not required to preserve the case of + the original segment through the translation. + + The translation algorithm the server uses to derive PATH_TRANSLATED + is implementation defined; CGI scripts which use this variable may + suffer limited portability. + + + + +Robinson & Coar Expires 18 April 2004 [Page 14] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + The server SHOULD set this meta-variable if the request URI includes + a path-info component. If PATH_INFO is NULL, then the + PATH_TRANSLATED variable MUST be set to NULL (or unset). + +4.1.7 QUERY_STRING + + The QUERY_STRING variable contains a URL-encoded search or parameter + string; it provides information to the CGI script to affect or refine + the document to be returned by the script. + + The URL syntax for a search string is described in section 3 of RFC + 2396 [3]. The QUERY_STRING value is case-sensitive. 
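Because QUERY_STRING arrives URL-encoded, a script has to follow the order given in section 2.3: split on the reserved characters first, then decode the escapes. The Python sketch below assumes a name=value layout with "&" and "=" reserved, which is a common convention and not something this specification mandates. The function name split_then_decode is hypothetical, "+" is deliberately left untouched, and the decoding of non-ASCII escapes as UTF-8 is this sketch's own choice.

```python
from urllib.parse import unquote


def split_then_decode(query_string):
    """Split on reserved characters first, then URL-decode each piece."""
    pairs = []
    for field in query_string.split("&"):
        if not field:
            continue  # nothing between two separators, or an empty query
        name, _, value = field.partition("=")
        # Decoding happens only after splitting, so an escaped "&" or "="
        # stays part of the data instead of acting as a separator.
        pairs.append((unquote(name), unquote(value)))
    return pairs
```

Decoding before splitting would turn the escaped "%26" in "b=x%26y" into a field separator and corrupt the value, which is why the order matters. Names keep their case, in line with the case-sensitivity rule above.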
+ + QUERY_STRING = query-string + query-string = *uric + uric = reserved | unreserved | escaped + + When parsing and decoding the query string, the details of the + parsing, reserved characters and support for non US-ASCII characters + depends on the context. For example, form submission from an HTML + document [15] uses application/x-www-form-urlencoded encoding, in + which the characters "+", "&" and "=" are reserved, and the ISO + 8859-1 encoding may be used for non US-ASCII characters. + + The QUERY_STRING value provides the query-string part of the + Script-URI. (See section 3.3). + + The server MUST set this variable; if the Script-URI does not include + a query component, the QUERY_STRING MUST be defined as an empty + string (""). + +4.1.8 REMOTE_ADDR + + The REMOTE_ADDR variable MUST be set to the network address of the + client sending the request to the server. + + REMOTE_ADDR = hostnumber + hostnumber = ipv4-address | ipv6-address + ipv4-address = 1*3digit "." 1*3digit "." 1*3digit "." 1*3digit + ipv6-address = hexpart [ ":" ipv4-address ] + hexpart = hexseq | ( [ hexseq ] "::" [ hexseq ] ) + hexseq = 1*4hex *( ":" 1*4hex ) + + The format of IPv6 addresses is defined in RFC 2373 [12]. + + + + + + + +Robinson & Coar Expires 18 April 2004 [Page 15] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +4.1.9 REMOTE_HOST + + The REMOTE_HOST variable contains the fully qualified domain name of + the client sending the request to the server, if available, otherwise + NULL. Fully qualified domain names take the form as described in + section 3.5 of RFC 1034 [14] and section 2.1 of RFC 1123 [4]. Domain + names are not case sensitive. + + REMOTE_HOST = "" | hostname | hostnumber + hostname = *( domainlabel "." ) toplabel [ "." ] + domainlabel = alphanum [ *alphahypdigit alphanum ] + toplabel = alpha [ *alphahypdigit alphanum ] + alphahypdigit = alphanum | "-" + + The server SHOULD set this variable. 
If the hostname is not + available for performance reasons or otherwise, the server MAY + substitute the REMOTE_ADDR value. + +4.1.10 REMOTE_IDENT + + The REMOTE_IDENT variable MAY be used to provide identity information + reported about the connection by an RFC 1413 [17] request to the + remote agent, if available. The server may choose not to support + this feature, or not to request the data for efficiency reasons, or + not to return available identity data. + + REMOTE_IDENT = *TEXT + + The data returned may be used for authentication purposes, but the + level of trust reposed in it should be minimal. + +4.1.11 REMOTE_USER + + The REMOTE_USER variable provides a user identification string + supplied by the client as part of user authentication. + + REMOTE_USER = *TEXT + + If the client request required HTTP Authentication [9] (e.g. the + AUTH_TYPE meta-variable is set to "Basic" or "Digest"), then the + value of the REMOTE_USER meta-variable MUST be set to the user-ID + supplied. + +4.1.12 REQUEST_METHOD + + The REQUEST_METHOD meta-variable MUST be set to the method which + should be used by the script to process the request, as described in + section 4.3. + + + +Robinson & Coar Expires 18 April 2004 [Page 16] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + REQUEST_METHOD = method + method = "GET" | "POST" | "HEAD" | extension-method + extension-method = "PUT" | "DELETE" | token + + The method is case sensitive. The HTTP methods are described in + section 5.1.1 of the HTTP/1.0 specification [2] and section 5.1.1 of + the HTTP/1.1 specification [8]. + +4.1.13 SCRIPT_NAME + + The SCRIPT_NAME variable MUST be set to a URI path (not URL-encoded) + which could identify the CGI script (rather than the script's + output). The syntax is the same as for PATH_INFO (section 4.1.5) + + SCRIPT_NAME = "" | ( "/" path ) + + The leading "/" is not part of the path. It is optional if the path + is NULL; however, the variable MUST still be set in that case.
+ + The SCRIPT_NAME string forms some leading part of the path component + of the Script-URI derived in some implementation defined manner. No + PATH_INFO segment (see section 4.1.5) is included in the SCRIPT_NAME + value. + +4.1.14 SERVER_NAME + + The SERVER_NAME variable MUST be set to the name of the server host + to which the client request is directed. It is a case-insensitive + hostname or network address. It forms the host part of the + Script-URI. The syntax for an IPv6 address in a URI is defined in + RFC 2373 [12]. + + SERVER_NAME = server-name + server-name = hostname | ipv4-address | ( "[" ipv6-address "]" ) + + A deployed server can have more than one possible value for this + variable, where several HTTP virtual hosts share the same IP address. + In that case, the server uses the contents of the Host header field + to select the correct virtual host. + +4.1.15 SERVER_PORT + + The SERVER_PORT variable MUST be set to the TCP/IP port number on + which this request is received from the client. This value is used + in the port part of the Script-URI. + + SERVER_PORT = server-port + server-port = 1*digit + + + +Robinson & Coar Expires 18 April 2004 [Page 17] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + Note that this variable MUST be set, even if the port is the default + port for the scheme and could otherwise be omitted from a URI. + +4.1.16 SERVER_PROTOCOL + + The SERVER_PROTOCOL variable MUST be set to the name and version of + the application protocol used for this CGI request. This is not + necessarily the same as the protocol version used by the server in + its communication with the client. + + SERVER_PROTOCOL = HTTP-Version | "INCLUDED" | extension-version + HTTP-Version = "HTTP" "/" 1*digit "." 1*digit + extension-version = protocol [ "/" 1*digit "." 1*digit ] + protocol = token + + 'protocol' is a version of the scheme part of the Script-URI, and is + not case sensitive. By convention, 'protocol' is in upper case. 
The + protocol may not be identical to the scheme of the request; for + example, the request may have scheme "https", whilst the protocol is + "HTTP". + + A well-known value for SERVER_PROTOCOL which the server MAY use is + "INCLUDED", which signals that the current document is being included + as part of a composite document, rather than being the direct target + of the client request. The script should treat this as an HTTP/1.0 + request. + +4.1.17 SERVER_SOFTWARE + + The SERVER_SOFTWARE meta-variable MUST be set to the name and version + of the information server software making the CGI request (and + running the gateway). It SHOULD be the same as the server + description reported to the client, if any. + + SERVER_SOFTWARE = 1*( product | comment ) + product = token [ "/" product-version ] + product-version = token + comment = "(" *( ctext | comment ) ")" + ctext = <any TEXT excluding "(" and ")"> + +4.1.18 Protocol-Specific Meta-Variables + + The server SHOULD set meta-variables specific to the protocol and + scheme for the request. Interpretation of protocol-specific + variables depends on the protocol version in SERVER_PROTOCOL. The + server MAY set a meta-variable with the name of the scheme to a + non-NULL value if the scheme is not the same as the protocol. The + presence of such a variable indicates to a script which scheme is + + + +Robinson & Coar Expires 18 April 2004 [Page 18] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + used by the request. + + Meta-variables with names beginning with "HTTP_" contain values read + from the client request header fields, if the protocol used is HTTP. + The HTTP header field name is converted to upper case, has all + occurrences of "-" replaced with "_" and has "HTTP_" prepended to + give the meta-variable name. The header data can be presented as + sent by the client, or can be rewritten in ways which do not change + its semantics.
If multiple header fields with the same field-name + are received then the server MUST rewrite them as a single value + having the same semantics. Similarly, a header field that spans + multiple lines must be merged onto a single line. The server MUST, + if necessary, change the representation of the data (for example, the + character set) to be appropriate for a CGI meta-variable. + + The server is not required to create meta-variables for all the + header fields that it receives. In particular, it SHOULD remove any + header fields carrying authentication information, such as + 'Authorization'; or that are available to the script in other + variables, such as 'Content-Length' and 'Content-Type'. The server + MAY remove header fields that relate solely to client-side + communication issues, such as 'Connection'. + +4.2 Request Message-Body + + Request data is accessed by the script in a system-defined method; + unless defined otherwise, this will be by reading the 'standard + input' file descriptor or file handle. + + Request-Data = [ request-body ] [ extension-data ] + request-body = OCTET + extension-data = *OCTET + + A request-body is supplied with the request if the CONTENT_LENGTH is + not NULL. The server MUST make at least that many bytes available + for the script to read. The server MAY signal an end-of-file + condition after CONTENT_LENGTH bytes have been read or it MAY supply + extension data. Therefore, the script MUST NOT attempt to read more + than CONTENT_LENGTH bytes, even if more data is available. However, + it is not obliged to read any of the data. + + For non-parsed header (NPH) scripts (section 5), the server SHOULD + attempt to ensure that the data supplied to the script is precisely + as supplied by the client and is unaltered by the server. + + As transfer-codings are not supported on the request-body, the server + MUST remove any such codings from the message-body, and recalculate + the CONTENT_LENGTH. 
If this is not possible (for example, because of + + + +Robinson & Coar Expires 18 April 2004 [Page 19] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + large buffering requirements), the server SHOULD reject the client + request. It MAY also remove content-codings from the message-body. + +4.3 Request Methods + + The Request Method, as supplied in the REQUEST_METHOD meta-variable, + identifies the processing method to be applied by the script in + producing a response. The script author can choose to implement the + methods most appropriate for the particular application. If the + script receives a request with a method it does not support, it SHOULD + reject it with an error (see section 6.3.3). + +4.3.1 GET + + The GET method indicates that the script should produce a + document based on the meta-variable values. By convention, the GET + method is 'safe' and 'idempotent' and SHOULD NOT have the + significance of taking an action other than producing a document. + + The meaning of the GET method may be modified and refined by + protocol-specific meta-variables. + +4.3.2 POST + + The POST method is used to request the script perform processing and + produce a document based on the data in the request message-body, in + addition to meta-variable values. A common use is form submission in + HTML [15], intended to initiate processing by the script that has a + permanent effect, such as a change in a database. + + The script MUST check the value of the CONTENT_LENGTH variable before + reading the attached message-body, and SHOULD check the CONTENT_TYPE + value before processing it. + +4.3.3 HEAD + + The HEAD method requests the script to do sufficient processing to + return the response header fields, without providing a response + message-body. The script MUST NOT provide a response message-body + for a HEAD request. If it does, then the server MUST discard the + message-body when reading the response.
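The request-body and request-method rules of sections 4.2 and 4.3 can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the specification: the function names read_request_body and handle are hypothetical, the environment is passed in as a plain dictionary, and the response text is a placeholder. It reads at most CONTENT_LENGTH bytes, produces no message-body for HEAD, and answers unsupported methods with status 501.

```python
import io


def read_request_body(environ, stream):
    """Read at most CONTENT_LENGTH bytes from stream (section 4.2)."""
    raw = environ.get("CONTENT_LENGTH", "")
    if raw == "":
        return b""  # CONTENT_LENGTH is NULL: no request-body was supplied
    length = int(raw)  # raises ValueError for a malformed value
    if length < 0:
        raise ValueError("negative CONTENT_LENGTH")
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # early end-of-file; return what was received
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def handle(environ, stream):
    """Dispatch on REQUEST_METHOD and return a CGI response as bytes."""
    method = environ.get("REQUEST_METHOD", "")
    header = b"Content-Type: text/plain; charset=us-ascii\n\n"
    if method == "HEAD":
        return header  # header fields only, no message-body
    if method == "GET":
        return header + b"document\n"
    if method == "POST":
        body = read_request_body(environ, stream)
        return header + b"received %d bytes\n" % len(body)
    return b"Status: 501 Not Implemented\n" + header + b"unsupported method\n"


if __name__ == "__main__":
    env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "5"}
    # Bytes beyond CONTENT_LENGTH (extension data) are left unread.
    print(handle(env, io.BytesIO(b"helloEXTRA")).decode("ascii"))
```

In a real script the dictionary would typically be os.environ and the stream sys.stdin.buffer; both are kept as parameters here so that the sketch can be exercised without a server.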
+ +4.3.4 Protocol-Specific Methods + + The script MAY implement any protocol-specific method, such as + HTTP/1.1 PUT and DELETE; it SHOULD check the value of SERVER_PROTOCOL + when doing so. + + + + +Robinson & Coar Expires 18 April 2004 [Page 20] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + The server MAY decide that some methods are not appropriate or + permitted for a script, and may handle the methods itself or return + an error to the client. + +4.4 The Script Command Line + + Some systems support a method for supplying an array of strings to + the CGI script. This is only used in the case of an 'indexed' HTTP + query, which is identified by a 'GET' or 'HEAD' request with a URI + query string that does not contain any unencoded "=" characters. For + such a request, the server SHOULD treat the query-string as a + search-string and parse it into words, using the rules + + search-string = search-word *( "+" search-word ) + search-word = 1*schar + schar = unreserved | escaped | xreserved + xreserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "," | + "$" + + After parsing, each search-word is URL-decoded, optionally encoded in + a system defined manner and then added to the argument list. + + If the server cannot create any part of the argument list, then the + server MUST NOT generate any command line information. For example, + the number of arguments may be greater than operating system or + server limits, or one of the words may not be representable as an + argument. + + The script SHOULD check to see if the QUERY_STRING value contains an + unencoded "=" character, and SHOULD NOT use the command line + arguments if it does. + +5 NPH Scripts + +5.1 Identification + + The server MAY support NPH (Non-Parsed Header) scripts; these are + scripts to which the server passes all responsibility for response + processing. + + This specification provides no mechanism for an NPH script to be + identified on the basis of its output data alone. 
By convention, + therefore, any particular script can only ever provide output of one + type (NPH or CGI) and hence the script itself is described as an 'NPH + script'. A server with NPH support MUST provide an implementation- + defined mechanism for identifying NPH scripts, perhaps based on the + name or location of the script. + + + + +Robinson & Coar Expires 18 April 2004 [Page 21] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +5.2 NPH Response + + There MUST be a system defined method for the script to send data + back to the server or client; a script MUST always return some data. + Unless defined otherwise, this will be the same as for conventional + CGI scripts. + + Currently, NPH scripts are only defined for HTTP client requests. An + (HTTP) NPH script MUST return a complete HTTP response message, + currently described in section 6 of the HTTP specifications [2], [8]. + The script MUST use the SERVER_PROTOCOL variable to determine the + appropriate format for a response. It MUST also take account of any + generic or protocol-specific meta-variables in the request as might + be mandated by the particular protocol specification. + + The server MUST ensure that the script output is sent to the client + unmodified. Note that this requires the script to use the correct + character set (US-ASCII [20] and ISO 8859-1 [21] for HTTP) in the + header fields. The server SHOULD attempt to ensure that the script + output is sent directly to the client, with minimal internal and no + transport-visible buffering. + + Unless the implementation defines otherwise, the script MUST NOT + indicate in its response that the client can send further requests + over the same connection. + +6 CGI Response + +6.1 Response Handling + + A script MUST always provide a non-empty response, and so there is a + system defined method for it to send this data back to the server. + Unless defined otherwise, this will be via the 'standard output' file + descriptor. 
+ + The script MUST check the REQUEST_METHOD variable when processing the + request and preparing its response. + + The server MAY implement a timeout period within which data must be + received from the script. If a server implementation defines such a + timeout and receives no data from a script within the timeout period, + the server MAY terminate the script process. + +6.2 Response Types + + The response comprises a message-header and a message-body, separated + by a blank line. The message-header contains one or more header + fields. The body may be NULL. + + + +Robinson & Coar Expires 18 April 2004 [Page 22] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + generic-response = 1*header-field NL [ response-body ] + + The script MUST return one of either a document response, a local + redirect response or a client redirect (with optional document) + response. In the response definitions below, the order of header + fields in a response is not significant (despite appearing so in the + BNF). The header fields are defined in section 6.3. + + CGI-Response = document-response | local-redir-response | + client-redir-response | client-redirdoc-response + +6.2.1 Document Response + + The CGI script can return a document to the user in a document + response, with an optional error code indicating the success status + of the response. + + document-response = Content-Type [ Status ] *other-field NL + response-body + + The script MUST return a Content-Type header field. A Status header + field is optional, and status 200 'OK' is assumed if it is omitted. + The server MUST make any appropriate modifications to the script's + output to ensure that the response to the client complies with the + response protocol version. + +6.2.2 Local Redirect Response + + The CGI script can return a URI path and query-string + ('local-pathquery') for a local resource in a Location header field.
+ This indicates to the server that it should reprocess the request + using the path specified. + + local-redir-response = local-Location NL + + The script MUST NOT return any other header fields or a message-body, + and the server MUST generate the response that it would have produced + in response to a request containing the URL + + scheme "://" server-name ":" server-port local-pathquery + +6.2.3 Client Redirect Response + + The CGI script can return an absolute URI path in a Location header + field, to indicate to the client that it should reprocess the request + using the URI specified. + + client-redir-response = client-Location *extension-field NL + + + +Robinson & Coar Expires 18 April 2004 [Page 23] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + The script MUST NOT provide any other header fields, except for + server-defined CGI extension fields. For an HTTP client request, the + server MUST generate a 302 'Found' HTTP response message. + +6.2.4 Client Redirect Response with Document + + The CGI script can return an absolute URI path in a Location header + field together with an attached document, to indicate to the client + that it should reprocess the request using the URI specified. + + client-redirdoc-response = client-Location Status Content-Type + *other-field NL response-body + + The Status header field MUST be supplied and MUST contain a status + value of 302 'Found'. The server MUST make any appropriate + modifications to the script's output to ensure that the response to + the client complies with the response protocol version. + +6.3 Response Header Fields + + The response header fields are either CGI or extension header fields + to be interpreted by the server, or protocol-specific headers to be + included in the response returned to the client. At least one CGI + field MUST be supplied; each CGI field MUST NOT appear more than once + in the response.
The response header fields have the syntax: + + header-field = CGI-field | other-field + CGI-field = Content-Type | Location | Status + other-field = protocol-field | extension-field + protocol-field = generic-field + extension-field = generic-field + generic-field = field-name ":" [ field-value ] NL + field-name = token + field-value = *( field-content | LWSP ) + field-content = *( token | separator | quoted-string ) + + The field-name is not case sensitive. A NULL field value is + equivalent to a field not being sent. Note that each header field in + a CGI-Response MUST be specified on a single line; CGI/1.1 does not + support continuation lines. Whitespace is permitted between the ":" + and the field-value (but not between the field-name and the ":"), and + also between tokens in the field-value. + +6.3.1 Content-Type + + The Content-Type response field sets the Internet Media Type [10] of + the entity body. + + + + +Robinson & Coar Expires 18 April 2004 [Page 24] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + Content-Type = "Content-Type:" media-type NL + + If an entity body is returned, the script MUST supply a Content-Type + field in the response. If it fails to do so, the server SHOULD NOT + attempt to determine the correct content type. The value SHOULD be + sent unmodified to the client, except for any charset parameter + changes. + + Unless it is otherwise system-defined, the default charset assumed by + the client for text media-types is ISO-8859-1 if the protocol is HTTP + and US-ASCII otherwise. Hence the script SHOULD include a charset + parameter. See section 3.4.1 of the HTTP/1.1 specification [8] for a + discussion of this issue. + +6.3.2 Location + + The Location header field is used to specify to the server that the + script is returning a reference to a document rather than an actual + document. 
It is either an absolute URI (with fragment), indicating + that the client is to fetch the referenced document, or a local URI + path (with query string), indicating that the server is to fetch the + referenced document. + + Location = local-Location | client-Location + client-Location = "Location:" fragment-URI NL + local-Location = "Location:" local-pathquery NL + fragment-URI = absoluteURI [ "#" fragment ] + fragment = *uric + local-pathquery = abs-path [ "?" query-string ] + abs-path = "/" path-segments + path-segments = segment *( "/" segment ) + segment = *pchar + pchar = unreserved | escaped | extra + extra = ":" | "@" | "&" | "=" | "+" | "$" | "," + + The syntax of an absoluteURI is incorporated into this document from + that specified in RFC 2396 [3] and RFC 2732 [11]. A valid + absoluteURI always starts with the name of scheme followed by ":"; + scheme names start with a letter and continue with alphanumerics, + "+", "-" or ".". The local URI path and query must be an absolute + path, and not a relative path or NULL, and hence must start with a + "/". + + Note that any message-body attached to the request (such as for a + POST request) may not be available to the resource that is the target + of the redirect. + + + + + +Robinson & Coar Expires 18 April 2004 [Page 25] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +6.3.3 Status + + The Status header field contains a 3-digit integer result code that + indicates the level of success of the script's attempt to handle the + request. + + Status = "Status:" status-code SP reason-phrase NL + status-code = "200" | "302" | "400" | "501" | 3digit + reason-phrase = *TEXT + + Status code 200 'OK' indicates success, and is the default value + assumed for a document response. Status code 302 'Found' is used + with a Location header field and response message-body. Status code + 400 'Bad Request' may be used for an unknown request format, such as + a missing CONTENT_TYPE. 
Status code 501 'Not Implemented' may be + returned by a script if it receives an unsupported REQUEST_METHOD. + + Other valid status codes are listed in section 6.1.1 of the HTTP + specifications [2], [8], and also the IANA HTTP Status Code Registry + [18], and can be used in addition to or instead of the ones listed + above. The script SHOULD check the value of SERVER_PROTOCOL before + using HTTP/1.1 status codes. The script MAY reject with error 405 + 'Method Not Allowed' HTTP/1.1 requests made using a method it does + not support. + + Note that returning an error status code does not have to mean an + error condition with the script itself. For example, a script that + is invoked as an error handler by the server should return the code + appropriate to the server's error condition. + + The reason-phrase is a textual description of the error to be + returned to the client for human consumption. + +6.3.4 Protocol-Specific Header Fields + + The script MAY return any other header fields that relate to the + response message defined by the specification for the SERVER_PROTOCOL + (HTTP/1.0 [2] or HTTP/1.1 [8]). The server MUST translate the header + data from the CGI header syntax to the HTTP header syntax if these + differ. For example, the character sequence for newline (such as + UNIX's US-ASCII LF) used by CGI scripts may not be the same as that + used by HTTP (US-ASCII CR followed by LF). + + The script MUST NOT return any header fields that relate to + client-side communication issues and could affect the server's + ability to send the response to the client. The server MAY remove + any such header fields returned by the client. It SHOULD resolve any + conflicts between headers returned by the script and headers that it + + + +Robinson & Coar Expires 18 April 2004 [Page 26] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + would otherwise send itself. 
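As an illustration of the document response described in sections 6.2.1 and 6.3, the following Python sketch assembles the header fields and message-body into a single byte string. It is a sketch under stated assumptions, not normative text: the function name format_document_response and its parameters are hypothetical, the newline sequence is taken to be LF as on UNIX systems (section 7.2), and header text is encoded as ISO 8859-1. Because CGI/1.1 does not support continuation lines, any field containing CR or LF is rejected.

```python
def format_document_response(body, content_type, status=None, reason="",
                             other_fields=()):
    """Build a CGI document response (header fields, blank line, body).

    body is bytes; other_fields is an iterable of (name, value) string
    pairs for protocol-specific or extension header fields.
    """
    lines = ["Content-Type: " + content_type]
    if status is not None:
        # Status = "Status:" status-code SP reason-phrase NL
        lines.append("Status: %03d %s" % (status, reason))
    for name, value in other_fields:
        lines.append(name + ": " + value)
    for line in lines:
        # Each header field must occupy a single line.
        if "\r" in line or "\n" in line:
            raise ValueError("header field spans more than one line")
    header = "\n".join(lines) + "\n\n"
    return header.encode("iso-8859-1") + body


if __name__ == "__main__":
    response = format_document_response(
        b"method not supported\n",
        "text/plain; charset=us-ascii",
        status=501,
        reason="Not Implemented",
    )
    print(response.decode("ascii"), end="")
```

Omitting the status argument leaves out the Status field, in which case the server assumes 200 'OK'.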
+ +6.3.5 Extension Header Fields + + The server may define additional implementation-specific CGI header + fields, whose field names SHOULD begin with "X-CGI-". It MAY ignore + (and delete) any unrecognised header fields with names beginning + "X-CGI-". + +6.4 Response Message-Body + + The response message-body is an attached document to be returned to + the client by the server. The server MUST read all the data provided + by the script, until the script signals the end of the message-body + by way of an end-of-file condition. The message-body SHOULD be sent + unmodified to the client, except for HEAD requests or any required + transfer-codings, content-codings or charset conversions. + + response-body = *OCTET + +7 System Specifications + +7.1 AmigaDOS + + Meta-Variables + Meta-variables are passed to the script in identically named + environment variables. These are accessed by the DOS library + routine GetVar(). The flags argument SHOULD be 0. Case is + ignored, but upper case is recommended for compatibility with + case-sensitive systems. + + The current working directory + The current working directory for the script is set to the + directory containing the script. + + Character set + The US-ASCII character set [20] is used for the definition of + meta-variables, header fields and values; the newline (NL) + sequence is LF; servers SHOULD also accept CR LF as a newline. + +7.2 UNIX + + For UNIX compatible operating systems, the following are defined: + + Meta-Variables + Meta-variables are passed to the script in identically named + environment variables. These are accessed by the C library + routine getenv() or variable environ. + + + +Robinson & Coar Expires 18 April 2004 [Page 27] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + The command line + This is accessed using the argc and argv arguments to main(). + The words have any characters which are 'active' in the Bourne + shell escaped with a backslash.
+ + The current working directory + The current working directory for the script SHOULD be set to the + directory containing the script. + + Character set + The US-ASCII character set [20], excluding NUL, is used for the + definition of meta-variables, header fields and CHAR values; TEXT + values use ISO-8859-1. The PATH_TRANSLATED value can contain any + 8-bit byte except NUL. The newline (NL) sequence is LF; servers + should also accept CR LF as a newline. + +7.3 EBCDIC/POSIX + + For POSIX compatible operating systems using the EBCDIC character + set, the following are defined: + + Meta-Variables + Meta-variables are passed to the script in identically named + environment variables. These are accessed by the C library + routine getenv(). + + The command line + This is accessed using the argc and argv arguments to main(). + The words have any characters which are 'active' in the Bourne + shell escaped with a backslash. + + The current working directory + The current working directory for the script SHOULD be set to the + directory containing the script. + + Character set + The IBM1047 character set [19], excluding NUL, is used for the + definition of meta-variables, header fields, values, TEXT strings + and the PATH_TRANSLATED value. The newline (NL) sequence is LF; + servers should also accept CR LF as a newline. + + media-type charset default + The default charset value for text (and other + implementation-defined) media types is IBM1047. + + + + + + + +Robinson & Coar Expires 18 April 2004 [Page 28] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +8 Implementation + +8.1 Recommendations for Servers + + Although the server and the CGI script need not be consistent in + their handling of URL paths (client URLs and the PATH_INFO data, + respectively), server authors may wish to impose consistency. So the + server implementation should specify its behaviour for the following + cases: + + 1.
define any restrictions on allowed path segments, in particular + whether non-terminal NULL segments are permitted; + + 2. define the behaviour for "." or ".." path segments; i.e. + whether they are prohibited, treated as ordinary path segments + or interpreted in accordance with the relative URL + specification [3]; + + 3. define any limits of the implementation, including limits on + path or search string lengths, and limits on the volume of + header fields the server will parse. + +8.2 Recommendations for Scripts + + If the script does not intend processing the PATH_INFO data, then it + should reject the request with 404 Not Found if PATH_INFO is not + NULL. + + If the output of a form is being processed, check that CONTENT_TYPE + is "application/x-www-form-urlencoded" [15] or "multipart/form-data" + [13]. If CONTENT_TYPE is blank, the script can reject the request + with a 415 'Unsupported Media Type' error, where supported by the + protocol. + + When parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME the script + should be careful of void path segments ("//") and special path + segments ("." and ".."). They should either be removed from the path + before use in OS system calls, or the request should be rejected with + 404 'Not Found'. + + When returning header fields, the script should try to send the CGI + headers as soon as possible, and should send them before any HTTP + headers. This may help reduce the server's memory requirements. + + + + + + + + +Robinson & Coar Expires 18 April 2004 [Page 29] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + +9 Security Considerations + +9.1 Safe Methods + + As discussed in the security considerations of the HTTP + specifications [2], [8], the convention has been established that the + GET and HEAD methods should be 'safe' and 'idempotent' (repeated + requests have the same effect as a single request). See section 9.1 + of RFC 2616 [8] for a full discussion. 
+ +9.2 Header Fields Containing Sensitive Information + + Some HTTP header fields may carry sensitive information which the + server should not pass on to the script unless explicitly configured + to do so. For example, if the server protects the script using the + Basic authentication scheme, then the client will send an + Authorization header field containing a username and password. The + server validates this information and so it should not pass on the + password via the HTTP_AUTHORIZATION meta-variable without careful + consideration. This also applies to the Proxy-Authorization header + field and the corresponding HTTP_PROXY_AUTHORIZATION meta-variable. + +9.3 Data Privacy + + Confidential data in a request should be placed in a message-body as + part of a POST request, and not placed in the URI or message headers. + On some systems, the environment used to pass meta-variables to a + script may be visible to other scripts or users. In addition, many + existing servers, proxies and clients will permanently record the URI + where it might be visible to third parties. + +9.4 Information Security Model + + For a client connection using TLS, the security model applies between + the client and the server, and not between the client and the script. + It is the server's responsibility to handle the TLS session, and thus + it is the server which is authenticated to the client, not the CGI + script. + + This specification provides no mechanism for the script to + authenticate the server which invoked it. There is no enforced + integrity on the CGI request and response messages. + +9.5 Script Interference with the Server + + The most common implementation of CGI invokes the script as a child + process using the same user and group as the server process. 
It + should therefore be ensured that the script cannot interfere with the + + + +Robinson & Coar Expires 18 April 2004 [Page 30] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + server process, its configuration, documents or log files. + + If the script is executed by calling a function linked in to the + server software (either at compile-time or run-time) then precautions + should be taken to protect the core memory of the server, or to + ensure that untrusted code cannot be executed. + +9.6 Data Length and Buffering Considerations + + This specification places no limits on the length of the message-body + presented to the script. The script should not assume that + statically allocated buffers of any size are sufficient to contain + the entire submission at one time. Use of a fixed length buffer + without careful overflow checking may result in an attacker + exploiting 'stack-smashing' or 'stack-overflow' vulnerabilities of + the operating system. The script may spool large submissions to disk + or other buffering media, but a rapid succession of large submissions + may result in denial of service conditions. If the CONTENT_LENGTH of + a message-body is larger than resource considerations allow, scripts + should respond with an error status appropriate for the protocol + version; potentially applicable status codes include 503 'Service + Unavailable' (HTTP/1.0 and HTTP/1.1), 413 'Request Entity Too Large' + (HTTP/1.1), and 414 'Request-URI Too Large' (HTTP/1.1). + + Similar considerations apply to the server's handling of the CGI + response from the script. There is no limit on the length of the + header or message-body returned by the script; the server should not + assume that statically allocated buffers of any size are sufficient + to contain the entire response. 
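The buffering advice in section 9.6 can be illustrated with a bounded read of the message-body. This is a sketch under stated assumptions: the limit MAX_BODY_BYTES, the exception BodyTooLarge and the function read_body are hypothetical names, and the choice of 413 as the error status is only one of the codes the text lists as potentially applicable.

```python
import io

MAX_BODY_BYTES = 1_000_000  # hypothetical limit; choose one that suits the host


class BodyTooLarge(Exception):
    """Raised when CONTENT_LENGTH exceeds what the script will accept."""

    status = "413 Request Entity Too Large"


def read_body(stream, content_length, max_bytes=MAX_BODY_BYTES, chunk_size=65536):
    """Read at most `content_length` bytes from `stream` in bounded chunks."""
    expected = int(content_length or 0)  # ValueError if not a number
    if expected < 0:
        raise ValueError("CONTENT_LENGTH must not be negative")
    if expected > max_bytes:
        # Refuse before reading anything rather than spooling the body.
        raise BodyTooLarge(f"{expected} bytes exceeds limit of {max_bytes}")

    chunks = []
    remaining = expected
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # the client sent less than it announced
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for sys.stdin.buffer and os.environ["CONTENT_LENGTH"].
    print(read_body(io.BytesIO(b"name=value"), "10"))
```

Checking the announced length before reading keeps a single oversized submission from consuming memory or disk, although it does not by itself prevent the rapid succession of large submissions that the text warns about.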
+ +9.7 Stateless Processing + + The stateless nature of the Web makes each script execution and + resource retrieval independent of all others even when multiple + requests constitute a single conceptual Web transaction. Because of + this, a script should not make any assumptions about the context of + the user-agent submitting a request. In particular, scripts should + examine data obtained from the client and verify that they are valid, + both in form and content, before allowing them to be used for + sensitive purposes such as input to other applications, commands, or + operating system services. These uses include (but are not limited + to) system call arguments, database writes, dynamically evaluated + source code, and input to billing or other secure processes. It is + important that applications be protected from invalid input + regardless of whether the invalidity is the result of user error, + logic error, or malicious action. + + Authors of scripts involved in multi-request transactions should be + + + +Robinson & Coar Expires 18 April 2004 [Page 31] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + particularly cautious about validating the state information; + undesirable effects may result from the substitution of dangerous + values for portions of the submission which might otherwise be + presumed safe. Subversion of this type occurs when alterations are + made to data from a prior stage of the transaction that were not + meant to be controlled by the client (e.g., hidden HTML form + elements, cookies, embedded URLs, etc.). + +9.8 Relative Paths + + The server should be careful of ".." path segments in the request + URI. These should be removed or resolved in the request URI before + it is split into the script-path and extra-path. Alternatively, when + the extra-path is used to find the PATH_TRANSLATED, care should be + taken to avoid the path resolution from providing translated paths + outside an expected path hierarchy. 
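The handling of ".." segments described in section 9.8, together with the advice on void and special path segments in section 8.2, can be sketched as follows. The function name safe_translate and the document root used in the usage note are hypothetical. The sketch takes the stricter of the two options in the text (reject rather than resolve), tolerates one trailing "/" as an assumption, and does not address symbolic links.

```python
import posixpath


def safe_translate(doc_root, extra_path):
    """Map an extra-path onto `doc_root`, or return None to reject it.

    `extra_path` is expected to be already URL-decoded, as PATH_INFO is.
    """
    if not extra_path:
        return doc_root
    if not extra_path.startswith("/"):
        return None

    segments = extra_path[1:].split("/")
    # Tolerate a single trailing "/" (assumption); any other void segment,
    # for example from "//", is rejected below.
    if segments and segments[-1] == "":
        segments.pop()

    # Reject void and special segments instead of trying to resolve them.
    if any(segment in ("", ".", "..") for segment in segments):
        return None

    return posixpath.join(doc_root, *segments)
```

For example, safe_translate("/srv/www", "/a/b") yields "/srv/www/a/b", while "/a/../b" and "/a//b" are both rejected; a caller would answer a None result with 404 'Not Found'.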
 + +9.9 Non-parsed Header Output + + If a script returns a non-parsed header output, to be interpreted by + the client in its native protocol, then the script must address all + security considerations relating to that protocol. + +10 Acknowledgements + + This work is based on the original CGI interface that arose out of + discussions on the 'www-talk' mailing list. In particular, Rob + McCool, John Franks, Ari Luotonen, George Phillips and Tony Sanders + deserve special recognition for their efforts in defining and + implementing the early versions of this interface. + + This document has also greatly benefited from the comments and + suggestions made by Chris Adie, Dave Kristol and Mike Meyer; also David + Morris, Jeremy Madea, Patrick McManus, Adam Donahue, Ross Patterson + and Harald Alvestrand. + +11 References + + [1] Berners-Lee, T., 'Universal Resource Identifiers in WWW: A + Unifying Syntax for the Expression of Names and Addresses of + Objects on the Network as used in the World-Wide Web', RFC 1630, + CERN, June 1994. + + [2] Berners-Lee, T., Fielding, R. T. and Frystyk, H., 'Hypertext + Transfer Protocol -- HTTP/1.0', RFC 1945, MIT/LCS, UC Irvine, + May 1996. + + [3] Berners-Lee, T., Fielding, R. and Masinter, L., 'Uniform + + + +Robinson & Coar Expires 18 April 2004 [Page 32] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + Resource Identifiers (URI): Generic Syntax', RFC 2396, MIT/LCS, + U.C. Irvine, Xerox Corporation, August 1998. + + [4] Braden, R. (Editor), 'Requirements for Internet Hosts -- + Application and Support', STD 3, RFC 1123, IETF, October 1989. + + [5] Bradner, S., 'Key words for use in RFCs to Indicate Requirement + Levels', BCP 14, RFC 2119, Harvard University, March 1997. + + [6] Crocker, D.H., 'Standard for the Format of ARPA Internet Text + Messages', STD 11, RFC 822, University of Delaware, August 1982. + + [7] Dierks, T. and Allen, C., 'The TLS Protocol Version 1.0', RFC + 2246, Certicom, January 1999.
 + + [8] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., + Leach, P. and Berners-Lee, T., 'Hypertext Transfer Protocol -- + HTTP/1.1', RFC 2616, UC Irvine, Compaq/W3C, Compaq, W3C/MIT, + Xerox, Microsoft, W3C/MIT, June 1999. + + [9] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S., + Leach, P., Luotonen, A. and Stewart, L., 'HTTP Authentication: + Basic and Digest Access Authentication', RFC 2617, Northwestern + University, Verisign Inc., AbiSource, Inc., Agranat Systems, + Inc., Microsoft Corporation, Netscape Communications + Corporation, Open Market, Inc., June 1999. + + [10] Freed, N. and Borenstein, N., 'Multipurpose Internet Mail + Extensions (MIME) Part Two: Media Types', RFC 2046, Innosoft, + First Virtual, November 1996. + + [11] Hinden, R., Carpenter, B. and Masinter, L., 'Format for Literal + IPv6 Addresses in URL's', RFC 2732, Nokia, IBM, AT&T, December + 1999. + + [12] Hinden, R. and Deering, S., 'IP Version 6 Addressing + Architecture', RFC 2373, Nokia, Cisco Systems, July 1998. + + [13] Masinter, L., 'Returning Values from Forms: + multipart/form-data', RFC 2388, Xerox Corporation, August 1998. + + [14] Mockapetris, P., 'Domain Names - Concepts and Facilities', STD + 13, RFC 1034, ISI, November 1987. + + [15] Raggett, D., Le Hors, A. and Jacobs, I. (eds), 'HTML 4.01 + Specification', W3C Recommendation December 1999, + http://www.w3.org/TR/html401/. + + [16] Rescorla, E., 'HTTP Over TLS', RFC 2818, RTFM, May 2000. + + +Robinson & Coar Expires 18 April 2004 [Page 33] + +INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003 + + + [17] St. Johns, M., 'Identification Protocol', RFC 1413, US + Department of Defense, February 1993. + + [18] 'HTTP Status Code Registry', + http://www.iana.org/assignments/http-status-codes, IANA. + + [19] IBM National Language Support Reference Manual Volume 2, + SE09-8002-01, March 1990.
+ + [20] 'Information Systems -- Coded Character Sets -- 7-bit American + Standard Code for Information Interchange (7-Bit ASCII)', ANSI + INCITS.4-1986 (R2002). + + [21] 'Information technology -- 8-bit single-byte coded graphic + character sets -- Part 1: Latin alphabet No. 1', ISO/IEC + 8859-1:1998. + + [22] 'The Common Gateway Interface', + http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois. + + +12 Authors' Addresses + + David Robinson + Apache Software Foundation + Email: drtr@apache.org + + Ken A. L. Coar + MeepZor Consulting + 7824 Mayfaire Crest Lane, Suite 202 + Raleigh, NC 27615-4875 + USA + Tel: +1 (919) 254 4237 + Fax: +1 (919) 254 5420 + Email: Ken.Coar@Golux.com + + + + + + + + + + + + + + + + +Robinson & Coar Expires 18 April 2004 [Page 34] + diff --git a/doc/rfc/rfc3507.txt b/doc/rfc/rfc3507.txt new file mode 100644 index 0000000000..ceff70759c --- /dev/null +++ b/doc/rfc/rfc3507.txt @@ -0,0 +1,2747 @@ + + + + + + +Network Working Group J. Elson +Request for Comments: 3507 A. Cerpa +Category: Informational UCLA + April 2003 + + + Internet Content Adaptation Protocol (ICAP) + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2003). All Rights Reserved. + +IESG Note + + The Open Pluggable Services (OPES) working group has been chartered + to produce a standards track protocol specification for a protocol + intended to perform the same of functions as ICAP. However, since + ICAP is already in widespread use the IESG believes it is appropriate + to document existing usage by publishing the ICAP specification as an + informational document. The IESG also notes that ICAP was developed + before the publication of RFC 3238 and therefore does not address the + architectural and policy issues described in that document. 
+ +Abstract + + ICAP, the Internet Content Adaption Protocol, is a protocol aimed at + providing simple object-based content vectoring for HTTP services. + ICAP is, in essence, a lightweight protocol for executing a "remote + procedure call" on HTTP messages. It allows ICAP clients to pass + HTTP messages to ICAP servers for some sort of transformation or + other processing ("adaptation"). The server executes its + transformation service on messages and sends back responses to the + client, usually with modified messages. Typically, the adapted + messages are either HTTP requests or HTTP responses. + + + + + + + + + + + +Elson & Cerpa Informational [Page 1] + +RFC 3507 ICAP April 2003 + + +Table of Contents + + 1. Introduction............................................3 + 2. Terminology.............................................5 + 3. ICAP Overall Operation..................................8 + 3.1 Request Modification..............................8 + 3.2 Response Modification............................10 + 4. 
Protocol Semantics.....................................11 + 4.1 General Operation................................11 + 4.2 ICAP URIs........................................11 + 4.3 ICAP Headers.....................................12 + 4.3.1 Headers Common to Requests and + Responses................................12 + 4.3.2 Request Headers..........................13 + 4.3.3 Response Headers.........................14 + 4.3.4 ICAP-Related Headers in HTTP + Messages.................................15 + 4.4 ICAP Bodies: Encapsulation of HTTP + Messages.........................................16 + 4.4.1 Expected Encapsulated Sections...........16 + 4.4.2 Encapsulated HTTP Headers................18 + 4.5 Message Preview..................................18 + 4.6 "204 No Content" Responses outside of + Previews.........................................22 + 4.7 ISTag Response Header............................22 + 4.8 Request Modification Mode........................23 + 4.8.1 Request..................................23 + 4.8.2 Response.................................24 + 4.8.3 Examples.................................24 + 4.9 Response Modification Mode.......................27 + 4.9.1 Request..................................27 + 4.9.2 Response.................................27 + 4.9.3 Examples.................................28 + 4.10 OPTIONS Method...................................29 + 4.10.1 OPTIONS request..........................29 + 4.10.2 OPTIONS response.........................30 + 4.10.3 OPTIONS examples.........................33 + 5. Caching................................................33 + 6. Implementation Notes...................................34 + 6.1 Vectoring Points.................................34 + 6.2 Application Level Errors.........................35 + 6.3 Use of Chunked Transfer-Encoding.................37 + 6.4 Distinct URIs for Distinct Services..............37 + 7. 
Security Considerations................................37 + 7.1 Authentication...................................37 + 7.2 Encryption.......................................38 + 7.3 Service Validation...............................38 + 8. Motivations and Design Alternatives....................39 + + + +Elson & Cerpa Informational [Page 2] + +RFC 3507 ICAP April 2003 + + + 8.1 To Be HTTP, or Not to Be.........................39 + 8.2 Mandatory Use of Chunking........................39 + 8.3 Use of the null-body directive in the + Encapsulated header..............................40 + 9. References.............................................40 + 10. Contributors...........................................41 + Appendix A BNF Grammar for ICAP Messages..................45 + Authors' Addresses..........................................48 + Full Copyright Statement....................................49 + +1. Introduction + + As the Internet grows, so does the need for scalable Internet + services. Popular web servers are asked to deliver content to + hundreds of millions of users connected at ever-increasing + bandwidths. The model of centralized, monolithic servers that are + responsible for all aspects of every client's request seems to be + reaching the end of its useful life. + + To keep up with the growth in the number of clients, there has been a + move towards architectures that scale better through the use of + replication, distribution, and caching. On the content provider + side, replication and load-balancing techniques allow the burden of + client requests to be spread out over a myriad of servers. Content + providers have also begun to deploy geographically diverse content + distribution networks that bring origin-servers closer to the "edge" + of the network where clients are attached. 
These networks of + distributed origin-servers or "surrogates" allow the content provider + to distribute their content whilst retaining control over the + integrity of that content. The distributed nature of this type of + deployment and the proximity of a given surrogate to the end-user + enables the content provider to offer additional services to a user + which might be based, for example, on geography where this would have + been difficult with a single, centralized service. + + ICAP, the Internet Content Adaption Protocol, is a protocol aimed at + providing simple object-based content vectoring for HTTP services. + ICAP is, in essence, a lightweight protocol for executing a "remote + procedure call" on HTTP messages. It allows ICAP clients to pass + HTTP messages to ICAP servers for some sort of transformation or + other processing ("adaptation"). The server executes its + transformation service on messages and sends back responses to the + client, usually with modified messages. The adapted messages may be + either HTTP requests or HTTP responses. Though transformations may + be possible on other non-HTTP content, they are beyond the scope of + this document. + + + + + +Elson & Cerpa Informational [Page 3] + +RFC 3507 ICAP April 2003 + + + This type of Remote Procedure Call (RPC) is useful in a number of + ways. For example: + + o Simple transformations of content can be performed near the edge + of the network instead of requiring an updated copy of an object + from an origin server. For example, a content provider might want + to provide a popular web page with a different advertisement every + time the page is viewed. Currently, content providers implement + this policy by marking such pages as non-cachable and tracking + user cookies. This imposes additional load on the origin server + and the network. In our architecture, the page could be cached + once near the edges of the network. 
These edge caches can then + use an ICAP call to a nearby ad-insertion server every time the + page is served to a client. + + Other such transformations by edge servers are possible, either + with cooperation from the content provider (as in a content + distribution network), or as a value-added service provided by a + client's network provider (as in a surrogate). Examples of these + kinds of transformations are translation of web pages to different + human languages or to different formats that are appropriate for + special physical devices (e.g., PDA-based or cell-phone-based + browsers). + + o Surrogates or origin servers can avoid performing expensive + operations by shipping the work off to other servers instead. + This helps distribute load across multiple machines. For example, + consider a user attempting to download an executable program via a + surrogate (e.g., a caching proxy). The surrogate, acting as an + ICAP client, can ask an external server to check the executable + for viruses before accepting it into its cache. + + o Firewalls or surrogates can act as ICAP clients and send outgoing + requests to a service that checks to make sure the URI in the + request is allowed (for example, in a system that allows parental + control of web content viewed by children). In this case, it is a + *request* that is being adapted, not an object returned by a + response. + + In all of these examples, ICAP is helping to reduce or distribute the + load on origin servers, surrogates, or the network itself. In some + cases, ICAP facilitates transformations near the edge of the network, + allowing greater cachability of the underlying content. In other + examples, devices such as origin servers or surrogates are able to + reduce their load by distributing expensive operations onto other + machines. 
In all cases, ICAP has also created a standard interface + for content adaptation to allow greater flexibility in content + distribution or the addition of value added services in surrogates. + + + +Elson & Cerpa Informational [Page 4] + +RFC 3507 ICAP April 2003 + + + There are two major components in our architecture: + + 1. Transaction semantics -- "How do I ask for adaptation?" + + 2. Control of policy -- "When am I supposed to ask for adaptation, + what kind of adaptation do I ask for, and from where?" + + Currently, ICAP defines only the transaction semantics. For example, + this document specifies how to send an HTTP message from an ICAP + client to an ICAP server, specify the URI of the ICAP resource + requested along with other resource-specific parameters, and receive + the adapted message. + + Although a necessary building-block, this wire-protocol defined by + ICAP is of limited use without the second part: an accompanying + application framework in which it operates. The more difficult + policy issue is beyond the scope of the current ICAP protocol, but is + planned in future work. + + In initial implementations, we expect that implementation-specific + manual configuration will be used to define policy. This includes + the rules for recognizing messages that require adaptation, the URIs + of available adaptation resources, and so on. For ICAP clients and + servers to interoperate, the exact method used to define policy need + not be consistent across implementations, as long as the policy + itself is consistent. + + IMPORTANT: + Note that at this time, in the absence of a policy-framework, it + is strongly RECOMMENDED that transformations SHOULD only be + performed on messages with the explicit consent of either the + content-provider or the user (or both). Deployment of + transformation services without the consent of either leads to, at + best, unpredictable results. For more discussion of these issues, + see Section 7. 
+ + Once the full extent of the typical policy decisions are more fully + understood through experience with these initial implementations, + later follow-ons to this architecture may define an additional policy + control protocol. This future protocol may allow a standard policy + definition interface complementary to the ICAP transaction interface + defined here. + +2. Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in BCP 14, RFC 2119 [2]. + + + +Elson & Cerpa Informational [Page 5] + +RFC 3507 ICAP April 2003 + + + The special terminology used in this document is defined below. The + majority of these terms are taken as-is from HTTP/1.1 [4] and are + reproduced here for reference. A thorough understanding of HTTP/1.1 + is assumed on the part of the reader. + + connection: + A transport layer virtual circuit established between two programs + for the purpose of communication. + + message: + The basic unit of HTTP communication, consisting of a structured + sequence of octets matching the syntax defined in Section 4 of + HTTP/1.1 [4] and transmitted via the connection. + + request: + An HTTP request message, as defined in Section 5 of HTTP/1.1 [4]. + + response: + An HTTP response message, as defined in Section 6 of HTTP/1.1 [4]. + + resource: + A network data object or service that can be identified by a URI, + as defined in Section 3.2 of HTTP/1.1 [4]. Resources may be + available in multiple representations (e.g., multiple languages, + data formats, size, resolutions) or vary in other ways. + + client: + A program that establishes connections for the purpose of sending + requests. + + server: + An application program that accepts connections in order to + service requests by sending back responses. 
Any given program may + be capable of being both a client and a server; our use of these + terms refers only to the role being performed by the program for a + particular connection, rather than to the program's capabilities + in general. Likewise, any server may act as an origin server, + surrogate, gateway, or tunnel, switching behavior based on the + nature of each request. + + origin server: + The server on which a given resource resides or is to be created. + + + + + + + + + +Elson & Cerpa Informational [Page 6] + +RFC 3507 ICAP April 2003 + + + proxy: + An intermediary program which acts as both a server and a client + for the purpose of making requests on behalf of other clients. + Requests are serviced internally or by passing them on, with + possible translation, to other servers. A proxy MUST implement + both the client and server requirements of this specification. + + cache: + A program's local store of response messages and the subsystem + that controls its message storage, retrieval, and deletion. A + cache stores cachable responses in order to reduce the response + time and network bandwidth consumption on future, equivalent + requests. Any client or server may include a cache, though a + cache cannot be used by a server that is acting as a tunnel. + + cachable: + A response is cachable if a cache is allowed to store a copy of + the response message for use in answering subsequent requests. + The rules for determining the cachability of HTTP responses are + defined in Section 13 of [4]. Even if a resource is cachable, + there may be additional constraints on whether a cache can use the + cached copy for a particular request. + + surrogate: + A gateway co-located with an origin server, or at a different + point in the network, delegated the authority to operate on behalf + of, and typically working in close co-operation with, one or more + origin servers. Responses are typically delivered from an + internal cache. 
Surrogates may derive cache entries from the + origin server or from another of the origin server's delegates. + In some cases a surrogate may tunnel such requests. + + Where close co-operation between origin servers and surrogates + exists, this enables modifications of some protocol requirements, + including the Cache-Control directives in [4]. Such modifications + have yet to be fully specified. + + Devices commonly known as "reverse proxies" and "(origin) server + accelerators" are both more properly defined as surrogates. + + New definitions: + + ICAP resource: + Similar to an HTTP resource as described above, but the URI refers + to an ICAP service that performs adaptations of HTTP messages. + + + + + + +Elson & Cerpa Informational [Page 7] + +RFC 3507 ICAP April 2003 + + + ICAP server: + Similar to an HTTP server as described above, except that the + application services ICAP requests. + + ICAP client: + A program that establishes connections to ICAP servers for the + purpose of sending requests. An ICAP client is often, but not + always, a surrogate acting on behalf of a user. + +3. ICAP Overall Operation + + Before describing ICAP's semantics in detail, we will first give a + general overview of the protocol's major functions and expected uses. + As described earlier, ICAP focuses on modification of HTTP requests + (Section 3.1), and modification of HTTP responses (Section 3.2). + +3.1 Request Modification + + In "request modification" (reqmod) mode, an ICAP client sends an HTTP + request to an ICAP server. The ICAP server may then: + + 1) Send back a modified version of the request. The ICAP client may + then perform the modified request by contacting an origin server; + or, pipeline the modified request to another ICAP server for + further modification. + + 2) Send back an HTTP response to the request. This is used to + provide information useful to the user in case of an error (e.g., + "you sent a request to view a page you are not allowed to see"). 
+ + 3) Return an error. + + ICAP clients MUST be able to handle all three types of responses. + However, in line with the guidance provided for HTTP surrogates in + Section 13.8 of [4], ICAP client implementors do have flexibility in + handling errors. If the ICAP server returns an error, the ICAP + client may (for example) return the error to the user, execute the + unadapted request as it arrived from the client, or re-try the + adaptation again. + + We will illustrate this method with an example application: content + filtering. Consider a surrogate that receives a request from a + client for a web page on an origin server. The surrogate, acting as + an ICAP client, sends the client's request to an ICAP server that + performs URI-based content filtering. If access to the requested URI + is allowed, the request is returned to the ICAP client unmodified. + However, if the ICAP server chooses to disallow access to the + requested resources, it may either: + + + +Elson & Cerpa Informational [Page 8] + +RFC 3507 ICAP April 2003 + + + 1) Modify the request so that it points to a page containing an error + message instead of the original URI. + + 2) Return an encapsulated HTTP response that indicates an HTTP error. + + This method can be used for a variety of other applications; for + example, anonymization, modification of the Accept: headers to handle + special device requirements, and so forth. + + Typical data flow: + + origin-server + | /|\ + | | + 5 | | 4 + | | + \|/ | 2 + ICAP-client --------------> ICAP-resource + (surrogate) <-------------- on ICAP-server + | /|\ 3 + | | + 6 | | 1 + | | + \|/ | + client + + 1. A client makes a request to a ICAP-capable surrogate (ICAP client) + for an object on an origin server. + + 2. The surrogate sends the request to the ICAP server. + + 3. The ICAP server executes the ICAP resource's service on the + request and sends the possibly modified request, or a response to + the request back to the ICAP client. 
+ + If Step 3 returned a request: + + 4. The surrogate sends the request, possibly different from original + client request, to the origin server. + + 5. The origin server responds to request. + + 6. The surrogate sends the reply (from either the ICAP server or the + origin server) to the client. + + + + + + + +Elson & Cerpa Informational [Page 9] + +RFC 3507 ICAP April 2003 + + +3.2 Response Modification + + In the "response modification" (respmod) mode, an ICAP client sends + an HTTP response to an ICAP server. (The response sent by the ICAP + client typically has been generated by an origin server.) The ICAP + server may then: + + 1) Send back a modified version of the response. + + 2) Return an error. + + The response modification method is intended for post-processing + performed on an HTTP response before it is delivered to a client. + Examples include formatting HTML for display on special devices, + human language translation, virus checking, and so forth. + + Typical data flow: + + origin-server + | /|\ + | | + 3 | | 2 + | | + \|/ | 4 + ICAP-client --------------> ICAP-resource + (surrogate) <-------------- on ICAP-server + | /|\ 5 + | | + 6 | | 1 + | | + \|/ | + client + + 1. A client makes a request to a ICAP-capable surrogate (ICAP client) + for an object on an origin server. + + 2. The surrogate sends the request to the origin server. + + 3. The origin server responds to request. + + 4. The ICAP-capable surrogate sends the origin server's reply to the + ICAP server. + + 5. The ICAP server executes the ICAP resource's service on the origin + server's reply and sends the possibly modified reply back to the + ICAP client. + + + + + +Elson & Cerpa Informational [Page 10] + +RFC 3507 ICAP April 2003 + + + 6. The surrogate sends the reply, possibly modified from the original + origin server's reply, to the client. + +4. Protocol Semantics + +4.1 General Operation + + ICAP is a request/response protocol similar in semantics and usage to + HTTP/1.1 [4]. 
Despite the similarity, ICAP is not HTTP, nor is it an + application protocol that runs over HTTP. This means, for example, + that ICAP messages can not be forwarded by HTTP surrogates. Our + reasons for not building directly on top of HTTP are discussed in + Section 8.1. + + ICAP uses TCP/IP as a transport protocol. The default port is 1344, + but other ports may be used. The TCP flow is initiated by the ICAP + client to a passively listening ICAP server. + + ICAP messages consist of requests from client to server and responses + from server to client. Requests and responses use the generic + message format of RFC 2822 [3] -- that is, a start-line (either a + request line or a status line), a number of header fields (also known + as "headers"), an empty line (i.e., a line with nothing preceding the + CRLF) indicating the end of the header fields, and a message-body. + + The header lines of an ICAP message specify the ICAP resource being + requested as well as other meta-data such as cache control + information. The message body of an ICAP request contains the + (encapsulated) HTTP messages that are being modified. + + As in HTTP/1.1, a single transport connection MAY (perhaps even + SHOULD) be re-used for multiple request/response pairs. The rules + for doing so in ICAP are the same as described in Section 8.1.2.2 of + [4]. Specifically, requests are matched up with responses by + allowing only one outstanding request on a transport connection at a + time. Multiple parallel connections MAY be used as in HTTP. + +4.2 ICAP URIs + + All ICAP requests specify the ICAP resource being requested from the + server using an ICAP URI. This MUST be an absolute URI that + specifies both the complete hostname and the path of the resource + being requested. For definitive information on URL syntax and + semantics, see "Uniform Resource Identifiers (URI): Generic Syntax + and Semantics," RFC 2396 [1], Section 3. 
 The URI structure defined + by ICAP is roughly: + + + + + +Elson & Cerpa Informational [Page 11] + +RFC 3507 ICAP April 2003 + + + ICAP_URI = Scheme ":" Net_Path [ "?" Query ] + + Scheme = "icap" + + Net_Path = "//" Authority [ Abs_Path ] + + Authority = [ userinfo "@" ] host [ ":" port ] + + ICAP adds the new scheme "icap" to the ones defined in RFC 2396. If + the port is empty or not given, port 1344 is assumed. An example + ICAP URI line might look like this: + + icap://icap.example.net:2000/services/icap-service-1 + + An ICAP server MUST be able to recognize all of its host names, + including any aliases, local variations, and numeric IP addresses of + its interfaces. + + Any arguments that an ICAP client wishes to pass to an ICAP service + to modify the nature of the service MAY be passed as part of the + ICAP-URI, using the standard "?"-encoding of attribute-value pairs + used in HTTP. For example: + + icap://icap.net/service?mode=translate&lang=french + +4.3 ICAP Headers + + The following sections define the valid headers for ICAP messages. + Section 4.3.1 describes headers common to both requests and + responses. Request-specific and response-specific headers are + described in Sections 4.3.2 and 4.3.3, respectively. + + User-defined header extensions are allowed. In compliance with the + precedent established by the Internet mail format [3] and later + adopted by HTTP [4], all user-defined headers MUST follow the "X-" + naming convention ("X-Extension-Header: Foo"). ICAP implementations + MAY ignore any "X-" headers without loss of compliance with the + protocol as defined in this document. + + Each header field consists of a name followed by a colon (":") and + the field value. Field names are case-insensitive. ICAP follows the + rules described in section 4.2 of [4].
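The ICAP URI rules of Section 4.2 (the "icap" scheme, a mandatory host, a default port of 1344, and optional "?"-encoded arguments) can be sketched with a generic URI parser. This is an illustration only: the function name parse_icap_uri and the shape of the returned dictionary are hypothetical, and the sketch relies on Python's urllib.parse rather than on the grammar given in this document.

```python
from urllib.parse import parse_qs, urlsplit

ICAP_DEFAULT_PORT = 1344  # assumed when the port is empty or not given


def parse_icap_uri(uri):
    """Split an absolute ICAP URI into host, port, path and arguments."""
    parts = urlsplit(uri)
    if parts.scheme != "icap":  # urlsplit lowercases the scheme
        raise ValueError("not an ICAP URI: %r" % uri)
    if not parts.hostname:
        raise ValueError("an ICAP URI must name a host: %r" % uri)

    # parts.port is None both for "host" and for "host:" (empty port).
    port = parts.port if parts.port is not None else ICAP_DEFAULT_PORT

    return {
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        # Attribute-value pairs; each value is a list, as parse_qs returns.
        "args": parse_qs(parts.query),
    }
```

Applied to the two example URIs above, the sketch yields port 2000 for the first and the default port 1344, with the arguments mode=translate and lang=french, for the second.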
+ +4.3.1 Headers Common to Requests and Responses + + The headers of all ICAP messages MAY include the following + directives, defined in ICAP the same as they are in HTTP: + + + + +Elson & Cerpa Informational [Page 12] + +RFC 3507 ICAP April 2003 + + + Cache-Control + Connection + Date + Expires + Pragma + Trailer + Upgrade + + Note in particular that the "Transfer-Encoding" option is not + allowed. The special transfer-encoding requirements of ICAP bodies + are described in Section 4.4. + + The Upgrade header MAY be used to negotiate Transport-Layer Security + on an ICAP connection, exactly as described for HTTP/1.1 in [4]. + + The ICAP-specific headers defined are: + + Encapsulated (See Section 4.4) + +4.3.2 Request Headers + + Similar to HTTP, ICAP requests MUST start with a request line that + contains a method, the complete URI of the ICAP resource being + requested, and an ICAP version string. The current version number of + ICAP is "1.0". + + This version of ICAP defines three methods: + + REQMOD - for Request Modification (Section 4.8) + RESPMOD - for Response Modification (Section 4.9) + OPTIONS - to learn about configuration (Section 4.10) + + The OPTIONS method MUST be implemented by all ICAP servers. All + other methods are optional and MAY be implemented. + + User-defined extension methods are allowed. Before attempting to use + an extension method, an ICAP client SHOULD use the OPTIONS method to + query the ICAP server's list of supported methods; see Section 4.10. + (If an ICAP server receives a request for an unknown method, it MUST + give a 501 error response as described in the next section.) 
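The request-line and method rules of this section can be sketched as follows. This is only an illustration under stated assumptions: the helper names parse_request_line, parse_header_fields, and status_for_unknown_method are hypothetical, the request line is assumed to use single spaces as separators, and header continuation lines are not handled.

```python
ICAP_METHODS = frozenset({"REQMOD", "RESPMOD", "OPTIONS"})


def parse_request_line(line):
    """Return (method, uri, version) from an ICAP request line."""
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) != 3 or not fields[2].startswith("ICAP/"):
        raise ValueError("malformed ICAP request line: %r" % line)
    method, uri, version = fields
    return method, uri, version[len("ICAP/"):]


def parse_header_fields(lines):
    """Map lower-cased field names to values; names are case-insensitive."""
    fields = {}
    for line in lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("header line lacks a colon: %r" % line)
        fields[name.strip().lower()] = value.strip()
    return fields


def status_for_unknown_method(method, supported=ICAP_METHODS):
    """Return 501 for a method the server does not know, else None."""
    return None if method in supported else 501
```

A server that implements only a subset of the methods could pass its own set as the "supported" argument; OPTIONS would always belong to that set.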
+ + Given the URI rules described in Section 4.2, a well-formed ICAP + request line looks like the following example: + + RESPMOD icap://icap.example.net/translate?mode=french ICAP/1.0 + + + + + + +Elson & Cerpa Informational [Page 13] + +RFC 3507 ICAP April 2003 + + + A number of request-specific headers are allowed in ICAP requests, + following the same semantics as the corresponding HTTP request + headers (Section 5.3 of [4]). These are: + + Authorization + Allow (see Section 4.6) + From (see Section 14.22 of [4]) + Host (REQUIRED in ICAP as it is in HTTP/1.1) + Referer (see Section 14.36 of [4]) + User-Agent + + In addition to HTTP-like headers, there are also request headers + unique to ICAP defined: + + Preview (see Section 4.5) + +4.3.3 Response Headers + + ICAP responses MUST start with an ICAP status line, similar in form + to that used by HTTP, including the ICAP version and a status code. + For example: + + ICAP/1.0 200 OK + + Semantics of ICAP status codes in ICAP match the status codes defined + by HTTP (Section 6.1.1 and 10 of [4]), except where otherwise + indicated in this document; n.b. 100 (Section 4.5) and 204 (Section + 4.6). + + ICAP error codes that differ from their HTTP counterparts are: + + 100 - Continue after ICAP Preview (Section 4.5). + + 204 - No modifications needed (Section 4.6). + + 400 - Bad request. + + 404 - ICAP Service not found. + + 405 - Method not allowed for service (e.g., RESPMOD requested for + service that supports only REQMOD). + + 408 - Request timeout. ICAP server gave up waiting for a request + from an ICAP client. + + 500 - Server error. Error on the ICAP server, such as "out of disk + space". + + + + +Elson & Cerpa Informational [Page 14] + +RFC 3507 ICAP April 2003 + + + 501 - Method not implemented. This response is illegal for an + OPTIONS request since implementation of OPTIONS is mandatory. + + 502 - Bad Gateway. This is an ICAP proxy and proxying produced an + error. + + 503 - Service overloaded. 
The ICAP server has exceeded a maximum + connection limit associated with this service; the ICAP client + should not exceed this limit in the future. + + 505 - ICAP version not supported by server. + + As in HTTP, the 4xx class of error codes indicates client errors, and + the 5xx class indicates server errors. + + ICAP's response-header fields allow the server to pass additional + information in the response that cannot be placed in the ICAP + status line. + + A response-specific header is allowed in ICAP responses, following the + same semantics as the corresponding HTTP response headers (Section + 6.2 of [4]). This is: + + Server (see Section 14.38 of [4]) + + In addition to HTTP-like headers, there is also a response header + unique to ICAP defined: + + ISTag (see Section 4.7) + +4.3.4 ICAP-Related Headers in HTTP Messages + + When an ICAP-enabled HTTP surrogate makes an HTTP request to an + origin server, it is often useful to advise the origin server of the + surrogate's ICAP capabilities. Origin servers can use this + information to modify their responses accordingly. For example, an + origin server may choose not to insert an advertisement into a page + if it knows that a downstream ICAP server can insert the ad instead. + + Although this ICAP specification can not mandate how HTTP is used in + communication between HTTP clients and servers, we do suggest a + convention: such headers (if used) SHOULD start with "X-ICAP". HTTP + clients with ICAP services SHOULD minimally include an + "X-ICAP-Version: 1.0" header along with their application-specific + headers. + +4.4 ICAP Bodies: Encapsulation of HTTP Messages + + The ICAP encapsulation model is a lightweight means of packaging any + number of HTTP message sections into an encapsulating ICAP + message-body, in order to allow the vectoring of requests, responses, + and request/response pairs to an ICAP server.
+ + This is accomplished by concatenating interesting message parts + (encapsulatED sections) into a single ICAP message-body (the + encapsulatING message). The encapsulated sections may be the headers + or bodies of HTTP messages. + + Encapsulated bodies MUST be transferred using the "chunked" + transfer-coding described in Section 3.6.1 of [4]. However, + encapsulated headers MUST NOT be chunked. In other words, an ICAP + message-body switches from being non-chunked to chunked as the body + passes from the encapsulated header to encapsulated body section. + (See Examples in Sections 4.8.3 and 4.9.3.). The motivation behind + this decision is described in Section 8.2. + +4.4.1 The "Encapsulated" Header + + The offset of each encapsulated section's start relative to the start + of the encapsulating message's body is noted using the "Encapsulated" + header. This header MUST be included in every ICAP message. For + example, the header + + Encapsulated: req-hdr=0, res-hdr=45, res-body=100 + + indicates a message that encapsulates a group of request headers, a + group of response headers, and then a response body. Each of these + is included at the byte-offsets listed. The byte-offsets are in + decimal notation for consistency with HTTP's Content-Length header. + + The special entity "null-body" indicates there is no encapsulated + body in the ICAP message. 
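To make the role of the byte-offsets concrete, here is a minimal sketch of reading an Encapsulated header value and cutting the ICAP message-body at the listed offsets. The names parse_encapsulated and split_sections are hypothetical. The sketch assumes the whole message-body is already in memory, and it enforces only the ordering rule of this section (offsets must not decrease), not the per-method restrictions.

```python
import re

# Entity names used by the Encapsulated header in this document.
_ENTITY = re.compile(
    r"(req-hdr|res-hdr|req-body|res-body|opt-body|null-body)=([0-9]+)")


def parse_encapsulated(value):
    """Parse an Encapsulated header value into [(entity, offset), ...]."""
    sections = []
    for item in value.split(","):
        match = _ENTITY.fullmatch(item.strip())
        if match is None:
            raise ValueError("bad Encapsulated entity: %r" % item)
        sections.append((match.group(1), int(match.group(2))))
    offsets = [offset for _, offset in sections]
    if offsets != sorted(offsets):
        raise ValueError("Encapsulated offsets must not decrease")
    return sections


def split_sections(body, sections):
    """Cut the message-body into one byte string per listed entity."""
    # Each section ends where the next begins; the last runs to the end.
    ends = [offset for _, offset in sections[1:]] + [len(body)]
    return {name: body[start:end]
            for (name, start), end in zip(sections, ends)}
```

For the header "Encapsulated: req-hdr=0, res-hdr=45, res-body=100", the request headers would occupy bytes 0 through 44, the response headers bytes 45 through 99, and the (chunked) response body everything from byte 100 onward.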
+ + The syntax of an Encapsulated header is: + + encapsulated_header: "Encapsulated: " encapsulated_list + encapsulated_list: encapsulated_entity | + encapsulated_entity ", " encapsulated_list + encapsulated_entity: reqhdr | reshdr | reqbody | resbody | optbody + reqhdr = "req-hdr" "=" (decimal integer) + reshdr = "res-hdr" "=" (decimal integer) + reqbody = { "req-body" | "null-body" } "=" (decimal integer) + resbody = { "res-body" | "null-body" } "=" (decimal integer) + optbody = { "opt-body" | "null-body" } "=" (decimal integer) + + + +Elson & Cerpa Informational [Page 16] + +RFC 3507 ICAP April 2003 + + + There are semantic restrictions on Encapsulated headers beyond the + syntactic restrictions. The order in which the encapsulated parts + appear in the encapsulating message-body MUST be the same as the + order in which the parts are named in the Encapsulated header. In + other words, the offsets listed in the Encapsulated line MUST be + monotonically increasing. In addition, the legal forms of the + Encapsulated header depend on the method being used (REQMOD, RESPMOD, + or OPTIONS). Specifically: + + REQMOD request encapsulated_list: [reqhdr] reqbody + REQMOD response encapsulated_list: {[reqhdr] reqbody} | + {[reshdr] resbody} + RESPMOD request encapsulated_list: [reqhdr] [reshdr] resbody + RESPMOD response encapsulated_list: [reshdr] resbody + OPTIONS response encapsulated_list: optbody + + In the above grammar, note that encapsulated headers are always + optional. At most one body per encapsulated message is allowed. If + no encapsulated body is presented, the "null-body" header is used + instead; this is useful because it indicates the length of the header + section. + + Examples of legal Encapsulated headers: + + /* REQMOD request: This encapsulated HTTP request's headers start + * at offset 0; the HTTP request body (e.g., in a POST) starts + * at 412. 
*/ + Encapsulated: req-hdr=0, req-body=412 + + /* REQMOD request: Similar to the above, but no request body is + * present (e.g., a GET). We use the null-body directive instead. + * In both this case and the previous one, we can tell from the + * Encapsulated header that the request headers were 412 bytes + * long. */ + Encapsulated: req-hdr=0, null-body=412 + + /* REQMOD response: ICAP server returned a modified request, + * with body */ + Encapsulated: req-hdr=0, req-body=512 + + /* RESPMOD request: Request headers at 0, response headers at 822, + * response body at 1655. Note that no request body is allowed in + * RESPMOD requests. */ + Encapsulated: req-hdr=0, res-hdr=822, res-body=1655 + + /* RESPMOD or REQMOD response: header and body returned */ + Encapsulated: res-hdr=0, res-body=749 + + + + +Elson & Cerpa Informational [Page 17] + +RFC 3507 ICAP April 2003 + + + /* OPTIONS response when there IS an options body */ + Encapsulated: opt-body=0 + + /* OPTIONS response when there IS NOT an options body */ + Encapsulated: null-body=0 + +4.4.2 Encapsulated HTTP Headers + + By default, ICAP messages may encapsulate HTTP message headers and + entity bodies. HTTP headers MUST start with the request-line or + status-line for requests and responses, respectively, followed by + interesting HTTP headers. + + The encapsulated headers MUST be terminated by a blank line, in order + to make them human readable, and in order to terminate line-by-line + HTTP parsers. + + HTTP/1.1 makes a distinction between end-to-end headers and hop-by- + hop headers (see Section 13.5.1 of [4]). End-to-end headers are + meaningful to the ultimate recipient of a message, whereas hop-by-hop + headers are meaningful only for a single transport-layer connection. + Hop-by-hop headers include Connection, Keep-Alive, and so forth. All + end-to-end HTTP headers SHOULD be encapsulated, and all hop-by-hop + headers MUST NOT be encapsulated. 
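The header selection described above can be sketched as a simple filter. This is an illustration only: the name end_to_end_headers is hypothetical, the hop-by-hop set is the list given in Section 13.5.1 of [4], and the sketch does not handle additional header names nominated by a Connection header.

```python
# Hop-by-hop header names listed in Section 13.5.1 of [4], lower-cased.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailers", "transfer-encoding",
    "upgrade",
})


def end_to_end_headers(headers):
    """Keep only the (name, value) pairs that may be encapsulated."""
    return [(name, value) for name, value in headers
            if name.lower() not in HOP_BY_HOP]
```

The comparison is made on lower-cased names because header field names are case-insensitive.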
+ + Despite the above restrictions on encapsulation, the hop-by-hop + Proxy-Authenticate and Proxy-Authorization headers MUST be forwarded + to the ICAP server in the ICAP header section (not the encapsulated + message). This allows propagation of client credentials that might + have been sent to the ICAP client in cases where the ICAP client is + also an HTTP surrogate. Note that this does not contradict HTTP/1.1, + which explicitly states "A proxy MAY relay the credentials from the + client request to the next proxy if that is the mechanism by which + the proxies cooperatively authenticate a given request." (Section + 14.34). + + The Via header of an encapsulated message SHOULD be modified by an + ICAP server as if the encapsulated message were traveling through an + HTTP surrogate. The Via header added by an ICAP server MUST specify + protocol as ICAP/1.0. + +4.5 Message Preview + + ICAP REQMOD or RESPMOD requests sent by the ICAP client to the ICAP + server may include a "preview". This feature allows an ICAP server + to see the beginning of a transaction, then decide if it wants to + + + + + +Elson & Cerpa Informational [Page 18] + +RFC 3507 ICAP April 2003 + + + opt-out of the transaction early instead of receiving the remainder + of the request message. Previewing can yield significant performance + improvements in a variety of situations, such as the following: + + - Virus-checkers can certify a large fraction of files as "clean" + just by looking at the file type, file name extension, and the + first few bytes of the file. Only the remaining files need to be + transmitted to the virus-checking ICAP server in their entirety. + + - Content filters can use Preview to decide if an HTTP entity needs + to be inspected (the HTTP file type alone is not enough in cases + where "text" actually turns out to be graphics data). The magic + numbers at the front of the file can identify a file as a JPEG or + GIF. 
+ + - If an ICAP server wants to transcode all GIF87 files into GIF89 + files, then the GIF87 files could quickly be detected by looking + at the first few body bytes of the file. + + - If an ICAP server wants to force all cacheable files to expire in + 24 hours or less, then this could be implemented by selecting HTTP + messages with expiries more than 24 hours in the future. + + ICAP servers SHOULD use the OPTIONS method (see Section 4.10) to + specify how many bytes of preview are needed for a particular ICAP + application on a per-resource basis. Clients SHOULD be able to + provide Previews of at least 4096 bytes. Clients furthermore SHOULD + provide a Preview when using any ICAP resource that has indicated a + Preview is useful. (This indication might be provided via the + OPTIONS method, or some other "out-of-band" configuration.) Clients + SHOULD NOT provide a larger Preview than a server has indicated it is + willing to accept. + + To effect a Preview, an ICAP client MUST add a "Preview:" header to + its request headers indicating the length of the preview. The ICAP + client then sends: + + - all of the encapsulated header sections, and + + - the beginning of the encapsulated body section, if any, up to the + number of bytes advertised in the Preview (possibly 0). + + After the Preview is sent, the client stops and waits for an + intermediate response from the ICAP server before continuing. This + mechanism is similar to the "100-Continue" feature found in HTTP, + except that the stop-and-wait point can be within the message body. + In contrast, HTTP requires that the point must be the boundary + between the headers and body. 
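The preview step might be sketched as follows. This is a simplified illustration, not a definitive implementation: the names chunk and encode_preview are hypothetical, the HTTP body is assumed to be fully available in memory (a real surrogate usually does not know the total length in advance), and the preview is sent as a single chunk. The zero-length terminating chunk and the "ieof" chunk-extension used here are the ones this section goes on to define.

```python
def chunk(data):
    """Encode one chunk: hexadecimal length, CRLF, data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def encode_preview(body, preview_size):
    """Return (wire_bytes, remainder) for the preview of an HTTP body.

    wire_bytes is the chunked preview, ending in a zero-length chunk.
    remainder is what would be sent only after a 100 Continue.
    """
    preview, remainder = body[:preview_size], body[preview_size:]
    wire = chunk(preview) if preview else b""
    if remainder:
        wire += b"0\r\n\r\n"        # more data may follow the preview
    else:
        wire += b"0; ieof\r\n\r\n"  # the whole body fit in the preview
    return wire, remainder
```

The caller would send the encapsulated header sections first, then wire_bytes, and then stop and wait for the intermediate response before sending any of the remainder.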
+ + + +Elson & Cerpa Informational [Page 19] + +RFC 3507 ICAP April 2003 + + + For example, to effect a Preview consisting of only encapsulated HTTP + headers, the ICAP client would add the following header to the ICAP + request: + + Preview: 0 + + This indicates that the ICAP client will send only the encapsulated + header sections to the ICAP server, then it will send a zero-length + chunk and stop and wait for a "go ahead" to send more encapsulated + body bytes to the ICAP server. + + Similarly, the ICAP header: + + Preview: 4096 + + Indicates that the ICAP client will attempt to send 4096 bytes of + origin server data in the encapsulated body of the ICAP request to + the ICAP server. It is important to note that the actual transfer + may be less, because the ICAP client is acting like a surrogate and + is not looking ahead to find the total length of the origin server + response. The entire ICAP encapsulated header section(s) will be + sent, followed by up to 4096 bytes of encapsulated HTTP body. The + chunk body terminator "0\r\n\r\n" is always included in these + transactions. + + After sending the preview, the ICAP client will wait for a response + from the ICAP server. The response MUST be one of the following: + + - 204 No Content. The ICAP server does not want to (or can not) + modify the ICAP client's request. The ICAP client MUST treat this + the same as if it had sent the entire message to the ICAP server + and an identical message was returned. + + - ICAP reqmod or respmod response, depending what method was the + original request. See Section 4.8.2 and 4.9.2 for the format of + reqmod and respmod responses. + + - 100 Continue. If the entire encapsulated HTTP body did not fit + in the preview, the ICAP client MUST send the remainder of its + ICAP message, starting from the first chunk after the preview. 
If + the entire message fit in the preview (detected by the "EOF" + symbol explained below), then the ICAP server MUST NOT respond + with 100 Continue. + + When an ICAP client is performing a preview, it may not yet know how + many bytes will ultimately be available in the arriving HTTP message + that it is relaying to the HTTP server. Therefore, ICAP defines a + way for ICAP clients to indicate "EOF" to ICAP servers if one + + + +Elson & Cerpa Informational [Page 20] + +RFC 3507 ICAP April 2003 + + + unexpectedly arrives during the preview process. This is a + particularly useful optimization if a header-only HTTP response + arrives at the ICAP client (i.e., zero bytes of body); only a single + round trip will be needed for the complete ICAP server response. + + We define an HTTP chunk-extension of "ieof" to indicate that an ICAP + chunk is the last chunk (see [4]). The ICAP server MUST strip this + chunk extension before passing the chunk data to an ICAP application + process. + + For example, consider an ICAP client that has just received HTTP + response headers from an origin server and initiates an ICAP RESPMOD + transaction to an ICAP server. It does not know yet how many body + bytes will be arriving from the origin server because the server is + not using the Content-Length header. The ICAP client informs the + ICAP server that it will be sending a 1024-byte preview using a + "Preview: 1024" request header. If the HTTP origin server then + closes its connection to the ICAP client before sending any data + (i.e., it provides a zero-byte body), the corresponding zero-byte + preview for that zero-byte origin response would appear as follows: + + \r\n + 0; ieof\r\n\r\n + + If an ICAP server sees this preview, it knows from the presence of + "ieof" that the client will not be sending any more chunk data. In + this case, the server MUST respond with the modified response or a + 204 No Content message right away. 
It MUST NOT send a 100-Continue + response in this case. (In contrast, if the origin response had been + 1 byte or larger, the "ieof" would not have appeared. In that case, + an ICAP server MAY reply with 100-Continue, a modified response, or + 204 No Content.) + + In another example, if the preview is 1024 bytes and the origin + response is 1024 bytes in two chunks, then the encapsulation would + appear as follows: + + 200\r\n + <512 bytes of data>\r\n + 200\r\n + <512 bytes of data>\r\n + 0; ieof\r\n\r\n + + <204 or modified response> (100 Continue disallowed due to ieof) + + If the preview is 1024 bytes and the origin response is 1025 bytes + (and the ICAP server responds with 100-continue), then these chunks + would appear on the wire: + + + +Elson & Cerpa Informational [Page 21] + +RFC 3507 ICAP April 2003 + + + 200\r\n + <512 bytes of data>\r\n + 200\r\n + <512 bytes of data>\r\n + 0\r\n + + <100 Continue Message> + + 1\r\n + <1 byte of data>\r\n + 0\r\n\r\n + + Once the ICAP server receives the eof indicator, it finishes reading + the current chunk stream. + + Note that when offering a Preview, the ICAP client is committing to + temporarily buffer the previewed portion of the message so that it + can honor a "204 No Content" response. The remainder of the message + is not necessarily buffered; it might be pipelined directly from + another source to the ICAP server after a 100-Continue. + +4.6 "204 No Content" Responses outside of Previews + + An ICAP client MAY choose to honor "204 No Content" responses for an + entire message. This is the decision of the client because it + imposes a burden on the client of buffering the entire message. + + An ICAP client MAY include "Allow: 204" in its request headers, + indicating that the server MAY reply to the message with a "204 No + Content" response if the object does not need modification. + + If an ICAP server receives a request that does not have "Allow: 204", + it MUST NOT reply with a 204. 
In this case, an ICAP server MUST + return the entire message back to the client, even though it is + identical to the message it received. + + The ONLY EXCEPTION to this rule is in the case of a message preview, + as described in the previous section. If this is the case, an ICAP + server can respond with a 204 No Content message in response to a + message preview EVEN if the original request did not have the "Allow: + 204" header. + +4.7 ISTag Response Header + + The ISTag ("ICAP Service Tag") response-header field provides a way + for ICAP servers to send a service-specific "cookie" to ICAP clients + that represents a service's current state. It is a 32-byte-maximum + alphanumeric string of data (not including the null character) that + + + +Elson & Cerpa Informational [Page 22] + +RFC 3507 ICAP April 2003 + + + may, for example, be a representation of the software version or + configuration of a service. An ISTag validates that previous ICAP + server responses can still be considered fresh by an ICAP client that + may be caching them. If a change on the ICAP server invalidates + previous responses, the ICAP server can invalidate portions of the + ICAP client's cache by changing its ISTag. The ISTag MUST be + included in every ICAP response from an ICAP server. + + For example, consider a virus-scanning ICAP service. The ISTag might + be a combination of the virus scanner's software version and the + release number of its virus signature database. When the database is + updated, the ISTag can be changed to invalidate all previous + responses that had been certified as "clean" and cached with the old + ISTag. + + ISTag is similar, but not identical, to the HTTP ETag. While an ETag + is a validator for a particular entity (object), an ISTag validates + all entities generated by a particular service (URI). 
A change in + the ISTag invalidates all the other entities provided by a service with + the old ISTag, not just the entity whose response contained the + updated ISTag. + + The syntax of an ISTag is simply: + ISTag = "ISTag: " quoted-string + + In this document we use the quoted-string definition given in + section 2.2 of [4]. + + For example: + ISTag: "874900-1994-1c02798" + +4.8 Request Modification Mode + + In this method, described in Section 3.1, an ICAP client sends an + HTTP request to an ICAP server. The ICAP server returns a modified + version of the request, an HTTP response, or (if the client indicates + it supports 204 responses) an indication that no modification is + required. + +4.8.1 Request + + In REQMOD mode, the ICAP request MUST contain an encapsulated HTTP + request. The headers and body (if any) MUST both be encapsulated, + except that hop-by-hop headers are not encapsulated. + +4.8.2 Response + + The response from the ICAP server back to the ICAP client may take + one of four forms: + + - An error indication, + + - A 204 indicating that the ICAP client's request requires no + adaptation (see Section 4.6 for limitations of this response), + + - An encapsulated, adapted version of the ICAP client's request, or + + - An encapsulated HTTP error response. Note that Request + Modification requests may only be satisfied with HTTP responses in + cases when the HTTP response is an error (e.g., 403 Forbidden). + + The first line of the response message MUST be a status line as + described in Section 4.3.3. If the return code is a 2XX, the ICAP + client SHOULD continue its normal execution of the request. If the + ICAP client is a surrogate, this may include serving an object from + its cache or forwarding the modified request to an origin server.
+ Note it is valid for a 2XX ICAP response to contain an encapsulated + HTTP error response, which in turn should be returned to the + downstream client by the ICAP client. + + For other return codes that indicate an error, the ICAP client MAY + (for example) return the error to the downstream client or user, + execute the unadapted request as it arrived from the client, or re- + try the adaptation again. + + The modified request headers, if any, MUST be returned to the ICAP + client using appropriate encapsulation as described in Section 4.4. + +4.8.3 Examples + + Consider the following example, in which a surrogate receives a + simple GET request from a client. The surrogate, acting as an ICAP + client, then forwards this request to an ICAP server for + modification. The ICAP server modifies the request headers and sends + them back to the ICAP client. Our hypothetical ICAP server will + modify several headers and strip the cookie from the original + request. + + In all of our examples, we include the extra meta-data added to the + message due to chunking the encapsulated message body (if any). We + assume that end-of-line terminations, and blank lines, are two-byte + "CRLF" sequences. 
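The offsets that appear in the Encapsulated headers of these examples can be checked with a few lines of code. The sketch below is only an illustration: the names header_block and encapsulated_for_reqmod are hypothetical, and the header lines are those of the encapsulated GET request in Example 1, joined with two-byte CRLF terminators as assumed in this section.

```python
def header_block(lines):
    """Join header lines with CRLF and add the terminating blank line."""
    text = "".join(line + "\r\n" for line in lines)
    return text.encode("ascii") + b"\r\n"


def encapsulated_for_reqmod(headers, has_body):
    """Build the Encapsulated header for a REQMOD message."""
    body_name = "req-body" if has_body else "null-body"
    # The body entity starts right after the encapsulated headers.
    return "Encapsulated: req-hdr=0, %s=%d" % (body_name, len(headers))


# Encapsulated HTTP request headers of Example 1.
request_headers = header_block([
    "GET / HTTP/1.1",
    "Host: www.origin-server.com",
    "Accept: text/html, text/plain",
    "Accept-Encoding: compress",
    "Cookie: ff39fk3jur@4ii0e02i",
    'If-None-Match: "xyzzy", "r2d2xxxx"',
])
```

With these six header lines and the blank line that terminates them, the block is 170 bytes long, which matches the "null-body=170" entity in the ICAP request of Example 1.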
+ + + + +Elson & Cerpa Informational [Page 24] + +RFC 3507 ICAP April 2003 + + + ICAP Request Modification Example 1 - ICAP Request + ---------------------------------------------------------------- + REQMOD icap://icap-server.net/server?arg=87 ICAP/1.0 + Host: icap-server.net + Encapsulated: req-hdr=0, null-body=170 + + GET / HTTP/1.1 + Host: www.origin-server.com + Accept: text/html, text/plain + Accept-Encoding: compress + Cookie: ff39fk3jur@4ii0e02i + If-None-Match: "xyzzy", "r2d2xxxx" + + ---------------------------------------------------------------- + ICAP Request Modification Example 1 - ICAP Response + ---------------------------------------------------------------- + ICAP/1.0 200 OK + Date: Mon, 10 Jan 2000 09:55:21 GMT + Server: ICAP-Server-Software/1.0 + Connection: close + ISTag: "W3E4R7U9-L2E4-2" + Encapsulated: req-hdr=0, null-body=231 + + GET /modified-path HTTP/1.1 + Host: www.origin-server.com + Via: 1.0 icap-server.net (ICAP Example ReqMod Service 1.1) + Accept: text/html, text/plain, image/gif + Accept-Encoding: gzip, compress + If-None-Match: "xyzzy", "r2d2xxxx" + + ---------------------------------------------------------------- + + The second example is similar to the first, except that the request + being modified in this case is a POST instead of a GET. Note that + the encapsulated Content-Length argument has been modified to reflect + the modified body of the POST message. The outer ICAP message does + not need a Content-Length header because it uses chunking (not + shown). + + In this second example, the Encapsulated header shows the division + between the forwarded header and forwarded body, for both the request + and the response. 
+ + ICAP Request Modification Example 2 - ICAP Request + ---------------------------------------------------------------- + REQMOD icap://icap-server.net/server?arg=87 ICAP/1.0 + Host: icap-server.net + Encapsulated: req-hdr=0, req-body=147 + + + +Elson & Cerpa Informational [Page 25] + +RFC 3507 ICAP April 2003 + + + POST /origin-resource/form.pl HTTP/1.1 + Host: www.origin-server.com + Accept: text/html, text/plain + Accept-Encoding: compress + Pragma: no-cache + + 1e + I am posting this information. + 0 + + ---------------------------------------------------------------- + ICAP Request Modification Example 2 - ICAP Response + ---------------------------------------------------------------- + ICAP/1.0 200 OK + Date: Mon, 10 Jan 2000 09:55:21 GMT + Server: ICAP-Server-Software/1.0 + Connection: close + ISTag: "W3E4R7U9-L2E4-2" + Encapsulated: req-hdr=0, req-body=244 + + POST /origin-resource/form.pl HTTP/1.1 + Host: www.origin-server.com + Via: 1.0 icap-server.net (ICAP Example ReqMod Service 1.1) + Accept: text/html, text/plain, image/gif + Accept-Encoding: gzip, compress + Pragma: no-cache + Content-Length: 45 + + 2d + I am posting this information. ICAP powered! + 0 + + ---------------------------------------------------------------- + Finally, this third example shows an ICAP server returning an error + response when it receives a Request Modification request. 
+ + ICAP Request Modification Example 3 - ICAP Request + ---------------------------------------------------------------- + REQMOD icap://icap-server.net/content-filter ICAP/1.0 + Host: icap-server.net + Encapsulated: req-hdr=0, null-body=119 + + GET /naughty-content HTTP/1.1 + Host: www.naughty-site.com + Accept: text/html, text/plain + Accept-Encoding: compress + + ---------------------------------------------------------------- + + + +Elson & Cerpa Informational [Page 26] + +RFC 3507 ICAP April 2003 + + + ICAP Request Modification Example 3 - ICAP Response + ---------------------------------------------------------------- + ICAP/1.0 200 OK + Date: Mon, 10 Jan 2000 09:55:21 GMT + Server: ICAP-Server-Software/1.0 + Connection: close + ISTag: "W3E4R7U9-L2E4-2" + Encapsulated: res-hdr=0, res-body=213 + + HTTP/1.1 403 Forbidden + Date: Wed, 08 Nov 2000 16:02:10 GMT + Server: Apache/1.3.12 (Unix) + Last-Modified: Thu, 02 Nov 2000 13:51:37 GMT + ETag: "63600-1989-3a017169" + Content-Length: 58 + Content-Type: text/html + + 3a + Sorry, you are not allowed to access that naughty content. + 0 + + ---------------------------------------------------------------- + +4.9 Response Modification Mode + + In this method, described in Section 3.2, an ICAP client sends an + origin server's HTTP response to an ICAP server, and (if available) + the original client request that caused that response. Similar to + Request Modification method, the response from the ICAP server can be + an adapted HTTP response, an error, or a 204 response code indicating + that no adaptation is required. + +4.9.1 Request + + Using encapsulation described in Section 4.4, the header and body of + the HTTP response to be modified MUST be included in the ICAP body. + If available, the header of the original client request SHOULD also + be included. As with the other method, the hop-by-hop headers of the + encapsulated messages MUST NOT be forwarded. 
The Encapsulated header + MUST indicate the byte-offsets of the beginning of each of these four + parts. + +4.9.2 Response + + The response from the ICAP server looks just like a reply in the + Request Modification method (Section 4.8); that is, + + - An error indication, + + - An encapsulated and potentially modified HTTP response header and + response body, or + + - A 204 response indicating that the ICAP client's request + requires no adaptation. + + The first line of the response message MUST be a status line as + described in Section 4.3.3. If the return code is a 2XX, the ICAP + client SHOULD continue its normal execution of the response. The + ICAP client MAY re-examine the response's message headers in order + to make further decisions about the response (e.g., its + cacheability). + + For other return codes that indicate an error, the ICAP client SHOULD + NOT return these directly to the downstream client, since these errors + only make sense in the ICAP client/server transaction. + + The modified response headers, if any, MUST be returned to the ICAP + client using appropriate encapsulation as described in Section 4.4. + +4.9.3 Examples + + In Example 4, an ICAP client is requesting modification of an entity + that was returned as a result of a client GET. The original client + GET was to an origin server at "www.origin-server.com"; the ICAP + server is at "icap.example.org".
+ + ICAP Response Modification Example 4 - ICAP Request + ---------------------------------------------------------------- + RESPMOD icap://icap.example.org/satisf ICAP/1.0 + Host: icap.example.org + Encapsulated: req-hdr=0, res-hdr=137, res-body=296 + + GET /origin-resource HTTP/1.1 + Host: www.origin-server.com + Accept: text/html, text/plain, image/gif + Accept-Encoding: gzip, compress + + HTTP/1.1 200 OK + Date: Mon, 10 Jan 2000 09:52:22 GMT + Server: Apache/1.3.6 (Unix) + ETag: "63840-1ab7-378d415b" + Content-Type: text/html + Content-Length: 51 + + + + + + + +Elson & Cerpa Informational [Page 28] + +RFC 3507 ICAP April 2003 + + + 33 + This is data that was returned by an origin server. + 0 + + ---------------------------------------------------------------- + + ICAP Response Modification Example 4 - ICAP Response + ---------------------------------------------------------------- + ICAP/1.0 200 OK + Date: Mon, 10 Jan 2000 09:55:21 GMT + Server: ICAP-Server-Software/1.0 + Connection: close + ISTag: "W3E4R7U9-L2E4-2" + Encapsulated: res-hdr=0, res-body=222 + + HTTP/1.1 200 OK + Date: Mon, 10 Jan 2000 09:55:21 GMT + Via: 1.0 icap.example.org (ICAP Example RespMod Service 1.1) + Server: Apache/1.3.6 (Unix) + ETag: "63840-1ab7-378d415b" + Content-Type: text/html + Content-Length: 92 + + 5c + This is data that was returned by an origin server, but with + value added by an ICAP server. + 0 + + ---------------------------------------------------------------- + +4.10 OPTIONS Method + + The ICAP "OPTIONS" method is used by the ICAP client to retrieve + configuration information from the ICAP server. In this method, the + ICAP client sends a request addressed to a specific ICAP resource and + receives back a response with options that are specific to the + service named by the URI. All OPTIONS requests MAY also return + options that are global to the server (i.e., apply to all services). 
+ +4.10.1 OPTIONS Request + + The OPTIONS method consists of a request-line, as described in + Section 4.3.2, such as the following example: + + OPTIONS icap://icap.server.net/sample-service ICAP/1.0 User-Agent: + ICAP-client-XYZ/1.001 + + + + + +Elson & Cerpa Informational [Page 29] + +RFC 3507 ICAP April 2003 + + + Other headers are also allowed as described in Section 4.3.1 and + Section 4.3.2 (for example, Host). + +4.10.2 OPTIONS Response + + The OPTIONS response consists of a status line as described in + section 4.3.3 followed by a series of header field names-value pairs + optionally followed by an opt-body. Multiple values in the value + field MUST be separated by commas. If an opt-body is present in the + OPTIONS response, the Opt-body-type header describes the format of + the opt-body. + + The OPTIONS headers supported in this version of the protocol are: + + -- Methods: + + The method that is supported by this service. This header MUST be + included in the OPTIONS response. The OPTIONS method MUST NOT be + in the Methods' list since it MUST be supported by all the ICAP + server implementations. Each service should have a distinct URI + and support only one method in addition to OPTIONS (see Section + 6.4). + + For example: + Methods: RESPMOD + + -- Service: + + A text description of the vendor and product name. This header + MAY be included in the OPTIONS response. + + For example: + Service: XYZ Technology Server 1.0 + + -- ISTag: + + See section 4.7 for details. This header MUST be included in the + OPTIONS response. + + For example: + ISTag: "5BDEEEA9-12E4-2" + + -- Encapsulated: + + This header MUST be included in the OPTIONS response; see Section + 4.4. + + + + + +Elson & Cerpa Informational [Page 30] + +RFC 3507 ICAP April 2003 + + + For example: + Encapsulated: opt-body=0 + + -- Opt-body-type: + + A token identifying the format of the opt-body. (Valid opt-body + types are not defined by ICAP.) 
This header MUST be included in + the OPTIONS response ONLY if an opt-body type is present. + + For example: + Opt-body-type: XML-Policy-Table-1.0 + + -- Max-Connections: + + The maximum number of ICAP connections the server is able to + support. This header MAY be included in the OPTIONS response. + + For example: + Max-Connections: 1500 + + -- Options-TTL: + + The time (in seconds) for which this OPTIONS response is valid. + If none is specified, the OPTIONS response does not expire. This + header MAY be included in the OPTIONS response. The ICAP client + MAY reissue an OPTIONS request once the Options-TTL expires. + + For example: + Options-TTL: 3600 + + -- Date: + + The server's clock, specified as an RFC 1123 compliant date/time + string. This header MAY be included in the OPTIONS response. + + For example: + Date: Fri, 15 Jun 2001 04:33:55 GMT + + -- Service-ID: + + A short label identifying the ICAP service. It MAY be used in + attribute header names. This header MAY be included in the + OPTIONS response. + + For example: + Service-ID: xyztech + + + + + +Elson & Cerpa Informational [Page 31] + +RFC 3507 ICAP April 2003 + + + -- Allow: + + A directive declaring a list of optional ICAP features that this + server has implemented. This header MAY be included in the + OPTIONS response. In this document we define the value "204" to + indicate that the ICAP server supports a 204 response. + + For example: + Allow: 204 + + -- Preview: + + The number of bytes to be sent by the ICAP client during a + preview. This header MAY be included in the OPTIONS response. + + For example: + Preview: 1024 + + -- Transfer-Preview: + + A list of file extensions that should be previewed to the ICAP + server before sending them in their entirety. This header MAY be + included in the OPTIONS response. Multiple file extensions values + should be separated by commas. 
The wildcard value "*" specifies + the default behavior for all the file extensions not specified in + any other Transfer-* header (see below). + + For example: + Transfer-Preview: * + + -- Transfer-Ignore: + + A list of file extensions that should NOT be sent to the ICAP + server. This header MAY be included in the OPTIONS response. + Multiple file extensions should be separated by commas. + + For example: + Transfer-Ignore: html + + -- Transfer-Complete: + + A list of file extensions that should be sent in their entirety + (without preview) to the ICAP server. This header MAY be included + in the OPTIONS response. Multiple file extensions values should + be separated by commas. + + For example: + Transfer-Complete: asp, bat, exe, com, ole + + + +Elson & Cerpa Informational [Page 32] + +RFC 3507 ICAP April 2003 + + + Note: If any of Transfer-* are sent, exactly one of them MUST contain + the wildcard value "*" to specify the default. If no Transfer-* are + sent, all responses will be sent in their entirety (without Preview). + +4.10.3 OPTIONS Examples + + In example 5, an ICAP Client sends an OPTIONS Request to an ICAP + Service named icap.server.net/sample-service in order to get + configuration information for the service provided. 
+ + ICAP OPTIONS Example 5 - ICAP OPTIONS Request + ---------------------------------------------------------------- + OPTIONS icap://icap.server.net/sample-service ICAP/1.0 + Host: icap.server.net + User-Agent: BazookaDotCom-ICAP-Client-Library/2.3 + + ---------------------------------------------------------------- + + ICAP OPTIONS Example 5 - ICAP OPTIONS Response + ---------------------------------------------------------------- + ICAP/1.0 200 OK + Date: Mon, 10 Jan 2000 09:55:21 GMT + Methods: RESPMOD + Service: FOO Tech Server 1.0 + ISTag: "W3E4R7U9-L2E4-2" + Encapsulated: null-body=0 + Max-Connections: 1000 + Options-TTL: 7200 + Allow: 204 + Preview: 2048 + Transfer-Complete: asp, bat, exe, com + Transfer-Ignore: html + Transfer-Preview: * + + ---------------------------------------------------------------- + +5. Caching + + ICAP servers' responses MAY be cached by ICAP clients, just as any + other surrogate might cache HTTP responses. Similar to HTTP, ICAP + clients MAY always store a successful response (see sections 4.8.2 + and 4.9.2) as a cache entry, and MAY return it without validation if + it is fresh. ICAP servers use the caching directives described in + HTTP/1.1 [4]. + + In Request Modification mode, the ICAP server MAY include caching + directives in the ICAP header section of the ICAP response (NOT in + the encapsulated HTTP request of the ICAP message body). In Response + + + +Elson & Cerpa Informational [Page 33] + +RFC 3507 ICAP April 2003 + + + Modification mode, the ICAP server MAY add or modify the HTTP caching + directives located in the encapsulated HTTP response (NOT in the ICAP + header section). Consequently, the ICAP client SHOULD look for + caching directives in the ICAP headers in case of REQMOD, and in the + encapsulated HTTP response in case of RESPMOD. 
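The rule in the preceding paragraph can be sketched as a small lookup. This is an illustration under stated assumptions, not a definitive implementation: the function name is hypothetical, headers are assumed to be already parsed into dictionaries, and the choice of Cache-Control, Expires and Pragma as the directives of interest is an assumption made for the example.

```python
# Assumption: these are the header names treated as caching directives here.
CACHING_HEADERS = {"cache-control", "expires", "pragma"}


def caching_directives(method, icap_headers, http_response_headers):
    """Return the caching directives an ICAP client would consult.

    REQMOD  -> look in the ICAP header section of the ICAP response.
    RESPMOD -> look in the encapsulated HTTP response headers.
    """
    if method == "REQMOD":
        source = icap_headers
    elif method == "RESPMOD":
        source = http_response_headers
    else:
        raise ValueError(f"no caching rule sketched for method {method!r}")
    # Header names are compared case-insensitively.
    return {
        name: value
        for name, value in source.items()
        if name.lower() in CACHING_HEADERS
    }


if __name__ == "__main__":
    icap = {"ISTag": '"W3E4R7U9-L2E4-2"', "Cache-Control": "max-age=60"}
    http = {"Content-Type": "text/html", "Expires": "Mon, 10 Jan 2000 10:55:21 GMT"}
    print(caching_directives("REQMOD", icap, http))
    print(caching_directives("RESPMOD", icap, http))
```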
+ + In cases where an ICAP server returns a modified version of an object + created by an origin server, such as in Response Modification mode, + the expiration of the ICAP-modified object MUST NOT be longer than + that of the origin object. In other words, ICAP servers MUST NOT + extend the lifetime of origin server objects, but MAY shorten it. + + In cases where the ICAP server is the authoritative source of an ICAP + response, such as in Request Modification mode, the ICAP server is + not restricted in its expiration policy. + + Note that the ISTag response-header may also be used to provide + caching hints to clients; see Section 4.7. + +6. Implementation Notes + +6.1 Vectoring Points + + The definition of the ICAP protocol itself only describes two + different adaptation channels: modification (and satisfaction) of + requests, and modifications of replies. However, an ICAP client + implementation is likely to actually distinguish among four different + classes of adaptation: + + 1. Adaptation of client requests. This is adaptation done every + time a request arrives from a client. This is adaptation done + when a request is "on its way into the cache". Factors such as + the state of the objects currently cached will determine whether + or not this request actually gets forwarded to an origin server + (instead of, say, getting served off the cache's disk). An + example of this type of adaptation would be special access + control or authentication services that must be performed on a + per-client basis. + + 2. Adaptation of requests on their way to an origin server. + Although this type of adaptation is also an adaptation of + requests similar to (1), it describes requests that are "on their + way out of the cache"; i.e., if a request actually requires that + an origin server be contacted. These adaptation requests are not + necessarily specific to particular clients. 
An example would be + addition of "Accept:" headers for special devices; these + adaptations can potentially apply to many clients. + + + + +Elson & Cerpa Informational [Page 34] + +RFC 3507 ICAP April 2003 + + + 3. Adaptations of responses coming from an origin server. This is + the adaptation of an object "on its way into the cache". In + other words, this is adaptation that a surrogate might want to + perform on an object before caching it. The adapted object may + subsequently be served to many clients. An example of this type of + adaptation is virus checking: a surrogate will want to check an + incoming origin reply for viruses once, before allowing it into + the cache -- not every time the cached object is served to a + client. + + 4. Adaptation of responses coming from the surrogate, heading back + to the client. Although this type of adaptation, like (3), is + the adaptation of a response, it is client-specific. Client + reply adaptation is adaptation that is required every time an + object is served to a client, even if all the replies come from + the same cached object off of disk. Ad insertion is a common + form of this kind of adaptation; e.g., if a popular (cached) + object that rarely changes needs a different ad inserted into it + every time it is served off disk to a client. Note that the + relationship between adaptations of type (3) and (4) is analogous + to the relationship between types (2) and (1). + + Although the distinction among these four adaptation points is + critical for ICAP client implementations, the distinction is not + significant for the ICAP protocol itself. From the point of view of + an ICAP server, a request is a request -- the ICAP server doesn't + care what policy led the ICAP client to generate the request. We + therefore did not make these four channels explicit in ICAP for + simplicity. 
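As an illustration of the point above, an ICAP client might track the four vectoring points internally and collapse them onto the two adaptation channels that appear on the wire. The enum and function names below are hypothetical; the mapping assumes request-side points use REQMOD and response-side points use RESPMOD.

```python
from enum import Enum


class VectoringPoint(Enum):
    """Client-side adaptation points (hypothetical names for this sketch)."""
    CLIENT_REQUEST = 1   # (1) request on its way into the cache
    ORIGIN_REQUEST = 2   # (2) request on its way out of the cache
    ORIGIN_RESPONSE = 3  # (3) response on its way into the cache
    CLIENT_RESPONSE = 4  # (4) response heading back to the client


def icap_method_for(point: VectoringPoint) -> str:
    """Map a vectoring point to the ICAP method seen by the ICAP server."""
    if point in (VectoringPoint.CLIENT_REQUEST, VectoringPoint.ORIGIN_REQUEST):
        return "REQMOD"
    return "RESPMOD"


if __name__ == "__main__":
    for point in VectoringPoint:
        print(point.name, "->", icap_method_for(point))
```

The ICAP server only ever sees the method; which of the four points triggered the request stays a matter of client policy.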
+ +6.2 Application Level Errors + + Section 4 described "on the wire" protocol errors that MUST be + standardized across implementations to ensure interoperability. In + this section, we describe errors that are communicated between ICAP + software and the clients and servers on which they are implemented. + Although such errors are implementation dependent and do not + necessarily need to be standardized because they are "within the + box", they are presented here as advice to future implementors based + on past implementation experience. + + + + + + + + + + + +Elson & Cerpa Informational [Page 35] + +RFC 3507 ICAP April 2003 + + + Error name Value + ==================================================== + ICAP_CANT_CONNECT 1000 + ICAP_SERVER_RESPONSE_CLOSE 1001 + ICAP_SERVER_RESPONSE_RESET 1002 + ICAP_SERVER_UNKNOWN_CODE 1003 + ICAP_SERVER_UNEXPECTED_CLOSE_204 1004 + ICAP_SERVER_UNEXPECTED_CLOSE 1005 + + 1000 ICAP_CANT_CONNECT: + "Cannot connect to ICAP server". + + The ICAP client cannot establish a connection to the ICAP server. + The ICAP server may be dead or may not be connected on the socket. + + 1001 ICAP_SERVER_RESPONSE_CLOSE: + "ICAP Server closed connection while reading response". + + The ICAP server shuts down the TCP connection before the ICAP + client can send all the body data. + + 1002 ICAP_SERVER_RESPONSE_RESET: + "ICAP Server reset connection while reading response". + + The ICAP server resets the TCP connection before the ICAP client + can send all the body data. + + 1003 ICAP_SERVER_UNKNOWN_CODE: + "ICAP Server sent unknown response code". + + An unknown ICAP response code (see Section 4.x) was received by + the ICAP client. + + 1004 ICAP_SERVER_UNEXPECTED_CLOSE_204: + "ICAP Server closed connection on 204 without 'Connection: close' + header". + + An ICAP server MUST send the "Connection: close" header if it + intends to close after the current transaction. + + 1005 ICAP_SERVER_UNEXPECTED_CLOSE: + "ICAP Server closed connection as ICAP client wrote body + preview". 
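The error table above could be carried in client code roughly as follows. The error names, numeric values and message strings are taken from this section; the class name, the dictionary and the describe() helper are hypothetical and only illustrate one possible arrangement.

```python
from enum import IntEnum


class IcapClientError(IntEnum):
    """Application-level error values listed in Section 6.2."""
    ICAP_CANT_CONNECT = 1000
    ICAP_SERVER_RESPONSE_CLOSE = 1001
    ICAP_SERVER_RESPONSE_RESET = 1002
    ICAP_SERVER_UNKNOWN_CODE = 1003
    ICAP_SERVER_UNEXPECTED_CLOSE_204 = 1004
    ICAP_SERVER_UNEXPECTED_CLOSE = 1005


# Message strings as given in Section 6.2.
ERROR_MESSAGES = {
    IcapClientError.ICAP_CANT_CONNECT:
        "Cannot connect to ICAP server",
    IcapClientError.ICAP_SERVER_RESPONSE_CLOSE:
        "ICAP Server closed connection while reading response",
    IcapClientError.ICAP_SERVER_RESPONSE_RESET:
        "ICAP Server reset connection while reading response",
    IcapClientError.ICAP_SERVER_UNKNOWN_CODE:
        "ICAP Server sent unknown response code",
    IcapClientError.ICAP_SERVER_UNEXPECTED_CLOSE_204:
        "ICAP Server closed connection on 204 without 'Connection: close' header",
    IcapClientError.ICAP_SERVER_UNEXPECTED_CLOSE:
        "ICAP Server closed connection as ICAP client wrote body preview",
}


def describe(code: int) -> str:
    """Format an error value as '<value> <name>: <message>'."""
    err = IcapClientError(code)  # raises ValueError for unlisted values
    return f"{err.value} {err.name}: {ERROR_MESSAGES[err]}"


if __name__ == "__main__":
    for err in IcapClientError:
        print(describe(err))
```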
+ + + + + + + + +Elson & Cerpa Informational [Page 36] + +RFC 3507 ICAP April 2003 + + +6.3 Use of Chunked Transfer-Encoding + + For simplicity, ICAP messages MUST use the "chunked" transfer- + encoding within the encapsulated body section as defined in HTTP/1.1 + [4]. This requires that ICAP client implementations convert incoming + objects "on the fly" to chunked from whatever transfer-encoding on + which they arrive. However, the transformation is simple: + + - For objects arriving using "Content-Length" headers, one big chunk + can be created of the same size as indicated in the Content-Length + header. + + - For objects arriving using a TCP close to signal the end of the + object, each incoming group of bytes read from the OS can be + converted into a chunk (by writing the length of the bytes read, + followed by the bytes themselves) + + - For objects arriving using chunked encoding, they can be + retransmitted as is (without re-chunking). + +6.4 Distinct URIs for Distinct Services + + ICAP servers SHOULD assign unique URIs to each service they provide, + even if such services might theoretically be differentiated based on + their method. In other words, a REQMOD and RESPMOD service should + never have the same URI, even if they do something that is + conceptually the same. + + This situation in ICAP is similar to that found in HTTP where it + might, in theory, be possible to perform a GET or a POST to the same + URI and expect two different results. This kind of overloading of + URIs only causes confusion and should be avoided. + +7. Security Considerations + +7.1 Authentication + + Authentication in ICAP is very similar to proxy authentication in + HTTP as specified in RFC 2617. Specifically, the following rules + apply: + + - WWW-Authenticate challenges and responses are for end-to-end + authentication between a client (user) and an origin server. As + any proxy, ICAP clients and ICAP servers MUST forward these + headers without modification. 
+ + + + + + +Elson & Cerpa Informational [Page 37] + +RFC 3507 ICAP April 2003 + + + - If authentication is required between an ICAP client and ICAP + server, hop-by-hop Proxy Authentication as described in RFC 2617 + MUST be used. + + There are potential applications where a user (as opposed to ICAP + client) might have rights to access an ICAP service. In this version + of the protocol, we assume that ICAP clients and ICAP servers are + under the same administrative domain, and contained in a single trust + domain. Therefore, in these cases, we assume that it is sufficient + for users to authenticate themselves to the ICAP client (which is a + surrogate from the point of view from the user). This type of + authentication will also be Proxy Authentication as described in RFC + 2617. + + This standard explicitly excludes any method for a user to + authenticate directly to an ICAP server; the ICAP client MUST be + involved as described above. + +7.2 Encryption + + Users of ICAP should note well that ICAP messages are not encrypted + for transit by default. In the absence of some other form of + encryption at the link or network layers, eavesdroppers may be able + to record the unencrypted transactions between ICAP clients and + servers. As described in Section 4.3.1, the Upgrade header MAY be + used to negotiate transport-layer security for an ICAP connection + [5]. + + Note also that end-to-end encryption between a client and origin + server is likely to preclude the use of value-added services by + intermediaries such as surrogates. An ICAP server that is unable to + decrypt a client's messages will, of course, be unable to perform any + transformations on it. + +7.3 Service Validation + + Normal HTTP surrogates, when operating correctly, should not affect + the end-to-end semantics of messages that pass through them. 
This + forms a well-defined criterion to validate that a surrogate is + working correctly: a message should look the same before the + surrogate as it does after the surrogate. + + In contrast, ICAP is meant to cause changes in the semantics of + messages on their way from origin servers to users. The criteria for + a correctly operating surrogate are no longer as easy to define. + This will make validation of ICAP services significantly more + difficult. Incorrect adaptations may lead to security + vulnerabilities that were not present in the unadapted content. + + + +Elson & Cerpa Informational [Page 38] + +RFC 3507 ICAP April 2003 + + +8. Motivations and Design Alternatives + + This section describes some of our design decisions in more detail, + and describes the ideas and motivations behind them. This section + does not define protocol requirements, but hopefully sheds light on + the requirements defined in previous sections. Nothing in this + section carries the "force of law" or is part of the formal protocol + specification. + + In general, our guiding principle was to make ICAP the simplest + possible protocol that would do the job, and no simpler. Some + features were rejected where alternative (non-protocol-based) + solutions could be found. In addition, we have intentionally left a + number of issues at the discretion of the implementor, where we + believe that doing so does not compromise interoperability. + +8.1 To Be HTTP, or Not To Be + + ICAP was initially designed as an application-layer protocol built to + run on top of HTTP. This was desirable for a number of reasons. + HTTP is well-understood in the community and has enjoyed significant + investments in software infrastructure (clients, servers, parsers, + etc.). Our initial designs focused on leveraging that existing work; + we hoped that it would be possible to implement ICAP services simply, + using CGI scripts run by existing web servers. 
+ + However, the devil (as always) proved to be in the details. Certain + features that we considered important were impossible to implement + with HTTP. For example, ICAP clients can stop and wait for a "100 + Continue" message in the midst of a message-body; HTTP clients may + only wait between the header and body. In addition, certain + transformations of HTTP messages by surrogates are legal (and + harmless for HTTP), but caused problems with ICAP's "header-in- + header" encapsulation and other features. + + Ultimately, we decided that the tangle of workarounds required to fit + ICAP into HTTP was more complex and confusing than moving away from + HTTP and defining a new (but similar) protocol. + +8.2 Mandatory Use of Chunking + + Chunking is mandatory in ICAP encapsulated bodies for three reasons. + First, efficiency is important, and the chunked encoding allows both + the client and server to keep the transport-layer connection open for + later reuse. Second, ICAP servers (and their developers) should be + encouraged to produce "incremental" responses where possible, to + reduce the latency perceived by users. Chunked encoding is the only + way to support this type of implementation. Finally, by + + + +Elson & Cerpa Informational [Page 39] + +RFC 3507 ICAP April 2003 + + + standardizing on a single encapsulation mechanism, we avoid the + complexity that would be required in client and server software to + support multiple mechanisms. This simplifies ICAP, particularly in + the "body preview" feature described in Section 4.5. + + While chunking of encapsulated bodies is mandatory, encapsulated + headers are not chunked. There are two reasons for this decision. + First, in cases where a chunked HTTP message body is being + encapsulated in an ICAP message, the ICAP client (HTTP server) can + copy it directly from the HTTP client to the ICAP server without un- + chunking and then re-chunking it. 
Second, many header-parser + implementations have difficulty dealing with headers that come in + multiple chunks. Earlier drafts of this document mandated that a + chunk boundary not come within a header. For clarity, chunking of + encapsulated headers has simply been disallowed. + +8.3 Use of the null-body directive in the Encapsulated header + + There is a disadvantage to not using the chunked transfer-encoding + for encapsulated header part of an ICAP message. Specifically, + parsers do not know in advance how much header data is coming (e.g., + for buffer allocation). ICAP does not allow chunking in the header + part for reasons described in Section 8.2. To compensate, the + "null-body" directive allows the final header's length to be + determined, despite it not being chunked. + +9. References + + [1] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource + Identifiers (URI): Generic Syntax and Semantics", RFC 2396, + August 1998. + + [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + + [3] Resnick, P., "Internet Message Format", RFC 2822, April 2001. + + [4] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., + Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- + HTTP/1.1", RFC 2616, June 1999. + + [5] Khare, R. and S. Lawrence, "Upgrading to TLS Within HTTP/1.1", + RFC 2817, May 2000. + + + + + + + + +Elson & Cerpa Informational [Page 40] + +RFC 3507 ICAP April 2003 + + +10. Contributors + + ICAP is based on an original idea by John Martin and Peter Danzig. + Many individuals and organizations have contributed to the + development of ICAP, including the following contributors (past and + present): + + Lee Duggs + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: lee.duggs@netapp.com + + Paul Eastham + Network Appliance, Inc. + 495 East Java Dr. 
+ Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: eastham@netapp.com + + Debbie Futcher + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: deborah.futcher@netapp.com + + Don Gillies + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: gillies@netapp.com + + Steven La + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: steven.la@netapp.com + + + + + +Elson & Cerpa Informational [Page 41] + +RFC 3507 ICAP April 2003 + + + John Martin + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: jmartin@netapp.com + + Jeff Merrick + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: jeffrey.merrick@netapp.com + + John Schuster + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: john.schuster@netapp.com + + Edward Sharp + Network Appliance, Inc. + 495 East Java Dr. + Sunnyvale, CA 94089 USA + + Phone: (408) 822-6000 + EMail: edward.sharp@netapp.com + + Peter Danzig + Akamai Technologies + 1400 Fashion Island Blvd + San Mateo, CA 94404 USA + + Phone: (650) 372-5757 + EMail: danzig@akamai.com + + Mark Nottingham + Akamai Technologies + 1400 Fashion Island Blvd + San Mateo, CA 94404 USA + + Phone: (650) 372-5757 + EMail: mnot@akamai.com + + + + +Elson & Cerpa Informational [Page 42] + +RFC 3507 ICAP April 2003 + + + Nitin Sharma + Akamai Technologies + 1400 Fashion Island Blvd + San Mateo, CA 94404 USA + + Phone: (650) 372-5757 + EMail: nitin@akamai.com + + Hilarie Orman + Novell, Inc. + 122 East 1700 South + Provo, UT 84606 USA + + Phone: (801) 861-7021 + EMail: horman@novell.com + + Craig Blitz + Novell, Inc. + 122 East 1700 South + Provo, UT 84606 USA + + Phone: (801) 861-7021 + EMail: cblitz@novell.com + + Gary Tomlinson + Novell, Inc. 
+ 122 East 1700 South + Provo, UT 84606 USA + + Phone: (801) 861-7021 + EMail: garyt@novell.com + + Andre Beck + Bell Laboratories / Lucent Technologies + 101 Crawfords Corner Road + Holmdel, New Jersey 07733-3030 + + Phone: (732) 332-5983 + EMail: abeck@bell-labs.com + + Markus Hofmann + Bell Laboratories / Lucent Technologies + 101 Crawfords Corner Road + Holmdel, New Jersey 07733-3030 + + Phone: (732) 332-5983 + EMail: hofmann@bell-labs.com + + + + +Elson & Cerpa Informational [Page 43] + +RFC 3507 ICAP April 2003 + + + David Bryant + CacheFlow, Inc. + 650 Almanor Avenue + Sunnyvale, California 94086 + + Phone: (888) 462-3568 + EMail: david.bryant@cacheflow.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Elson & Cerpa Informational [Page 44] + +RFC 3507 ICAP April 2003 + + +Appendix A BNF Grammar for ICAP Messages + + This grammar is specified in terms of the augmented Backus-Naur Form + (BNF) similar to that used by the HTTP/1.1 specification (See Section + 2.1 of [4]). Implementors will need to be familiar with the notation + in order to understand this specification. + + Many header values (where noted) have exactly the same grammar and + semantics as in HTTP/1.1. We do not reproduce those grammars here. + + ICAP-Version = "ICAP/1.0" + + ICAP-Message = Request | Response + + Request = Request-Line + *(Request-Header CRLF) + CRLF + [ Request-Body ] + + Request-Line = Method SP ICAP_URI SP ICAP-Version CRLF + + Method = "REQMOD" ; Section 4.8 + | "RESPMOD" ; Section 4.9 + | "OPTIONS" ; Section 4.10 + | Extension-Method ; Section 4.3.2 + + Extension-Method = token + + ICAP_URI = Scheme ":" Net_Path [ "?" 
Query ] ; Section 4.2 + + Scheme = "icap" + + Net_Path = "//" Authority [ Abs_Path ] + + Authority = [ userinfo "@" ] host [ ":" port ] + + + Request-Header = Request-Fields ":" [ Generic-Field-Value ] + + Request-Fields = Request-Field-Name + | Common-Field-Name + + ; Header fields specific to requests + Request-Field-Name = "Authorization" ; Section 4.3.2 + | "Allow" ; Section 4.3.2 + | "From" ; Section 4.3.2 + | "Host" ; Section 4.3.2 + | "Referer" ; Section 4.3.2 + + + +Elson & Cerpa Informational [Page 45] + +RFC 3507 ICAP April 2003 + + + | "User-Agent" ; Section 4.3.2 + | "Preview" ; Section 4.5 + + ; Header fields common to both requests and responses + Common-Field-Name = "Cache-Control" ; Section 4.3.1 + | "Connection" ; Section 4.3.1 + | "Date" ; Section 4.3.1 + | "Expires" ; Section 4.3.1 + | "Pragma" ; Section 4.3.1 + | "Trailer" ; Section 4.3.1 + | "Upgrade" ; Section 4.3.1 + | "Encapsulated" ; Section 4.4 + | Extension-Field-Name ; Section 4.3 + + Extension-Field-Name = "X-" token + + Generic-Field-Value = *( Generic-Field-Content | LWS ) + Generic-Field-Content = + + Request-Body = *OCTET ; See Sections 4.4 and 4.5 for semantics + + Response = Status-Line + *(Response-Header CRLF) + CRLF + [ Response-Body ] + + Status-Line = ICAP-Version SP Status-Code SP Reason-Phrase CRLF + + Status-Code = "100" ; Section 4.5 + | "101" ; Section 10.1.2 of [4] + | "200" ; Section 10.2.1 of [4] + | "201" ; Section 10.2.2 of [4] + | "202" ; Section 10.2.3 of [4] + | "203" ; Section 10.2.4 of [4] + | "204" ; Section 4.6 + | "205" ; Section 10.2.6 of [4] + | "206" ; Section 10.2.7 of [4] + | "300" ; Section 10.3.1 of [4] + | "301" ; Section 10.3.2 of [4] + | "302" ; Section 10.3.3 of [4] + | "303" ; Section 10.3.4 of [4] + | "304" ; Section 10.3.5 of [4] + | "305" ; Section 10.3.6 of [4] + | "306" ; Section 10.3.7 of [4] + | "307" ; Section 10.3.8 of [4] + + + +Elson & Cerpa Informational [Page 46] + +RFC 3507 ICAP April 2003 + + + | "400" ; Section 4.3.3 + | "401" ; 
Section 10.4.2 of [4] + | "402" ; Section 10.4.3 of [4] + | "403" ; Section 10.4.4 of [4] + | "404" ; Section 4.3.3 + | "405" ; Section 4.3.3 + | "406" ; Section 10.4.7 of [4] + | "407" ; Section 10.4.8 of [4] + | "408" ; Section 4.3.3 + | "409" ; Section 10.4.10 of [4] + | "410" ; Section 10.4.11 of [4] + | "411" ; Section 10.4.12 of [4] + | "412" ; Section 10.4.13 of [4] + | "413" ; Section 10.4.14 of [4] + | "414" ; Section 10.4.15 of [4] + | "415" ; Section 10.4.16 of [4] + | "416" ; Section 10.4.17 of [4] + | "417" ; Section 10.4.18 of [4] + | "500" ; Section 4.3.3 + | "501" ; Section 4.3.3 + | "502" ; Section 4.3.3 + | "503" ; Section 4.3.3 + | "504" ; Section 10.5.5 of [4] + | "505" ; Section 4.3.3 + | Extension-Code + + Extension-Code = 3DIGIT + + Reason-Phrase = * + + Response-Header = Response-Fields ":" [ Generic-Field-Value ] + + Response-Fields = Response-Field-Name + | Common-Field-Name + + Response-Field-Name = "Server" ; Section 4.3.3 + | "ISTag" ; Section 4.7 + + Response-Body = *OCTET ; See Sections 4.4 and 4.5 for semantics + + + + + + + + + + + + +Elson & Cerpa Informational [Page 47] + +RFC 3507 ICAP April 2003 + + +Authors' Addresses + + Jeremy Elson + University of California Los Angeles + Department of Computer Science + 3440 Boelter Hall + Los Angeles CA 90095 + + Phone: (310) 206-3925 + EMail: jelson@cs.ucla.edu + + + Alberto Cerpa + University of California Los Angeles + Department of Computer Science + 3440 Boelter Hall + Los Angeles CA 90095 + + Phone: (310) 206-3925 + EMail: cerpa@cs.ucla.edu + + + ICAP discussion currently takes place at + icap-discussions@yahoogroups.com. + For more information, see + http://groups.yahoo.com/group/icap-discussions/. + + + + + + + + + + + + + + + + + + + + + + + + + +Elson & Cerpa Informational [Page 48] + +RFC 3507 ICAP April 2003 + + +Full Copyright Statement + + Copyright (C) The Internet Society (2003). All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Elson & Cerpa Informational [Page 49] + -- 2.47.2