--- /dev/null
+
+
+
+INTERNET-DRAFT David Robinson
+draft-coar-cgi-v11-04.txt Apache Software Foundation
+Expires 18 April 2004 Ken A.L. Coar
+ IBM Corporation
+ 19 October 2003
+
+
+ The Common Gateway Interface (CGI) Version 1.1
+
+
+Status of this Memo
+
+ This document is an Internet-Draft and is in full conformance with
+ all provisions of Section 10 of RFC2026.
+
+ Internet-Drafts are working documents of the Internet Engineering
+ Task Force (IETF), its areas, and its working groups. Note that
+ other groups may also distribute working documents as
+ Internet-Drafts.
+
+ Internet-Drafts are draft documents valid for a maximum of six months
+ and may be updated, replaced, or obsoleted by other documents at any
+ time. It is inappropriate to use Internet-Drafts as reference
+ material or to cite them other than as 'work in progress'.
+
+ The list of current Internet-Drafts can be accessed at
+ http://www.ietf.org/ietf/1id-abstracts.txt.
+
+ The list of Internet-Draft Shadow Directories can be accessed at
+ http://www.ietf.org/shadow.html.
+
+ Distribution of this document is unlimited. Please send comments to
+ the authors, or via the CGI-WG mailing list; see the project Web page
+ at <http://cgi-spec.golux.com>.
+
+Abstract
+
+ The Common Gateway Interface (CGI) is a simple interface for running
+ external programs, software or gateways under an information server
+ in a platform-independent manner. Currently, the supported
+ information servers are HTTP servers.
+
+ The interface has been in use by the World-Wide Web since 1993. This
+ specification defines the 'current practice' parameters of the
+ 'CGI/1.1' interface developed and documented at the U.S. National
+ Centre for Supercomputing Applications. This document also defines
+ the use of the CGI/1.1 interface on UNIX(R) and other, similar
+ systems.
+
+
+
+Robinson & Coar Expires 18 April 2004 [Page 1]
+\f
+INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003
+
+
+Contents
+
+ 1 Introduction 4
+ 1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.2 Requirements . . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.3 Specifications . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.4 Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
+
+ 2 Notational Conventions and Generic Grammar 5
+ 2.1 Augmented BNF . . . . . . . . . . . . . . . . . . . . . . 5
+ 2.2 Basic Rules . . . . . . . . . . . . . . . . . . . . . . . 6
+ 2.3 URL Encoding . . . . . . . . . . . . . . . . . . . . . . . 7
+
+ 3 Invoking the Script 8
+ 3.1 Server Responsibilities . . . . . . . . . . . . . . . . . 8
+ 3.2 Script Selection . . . . . . . . . . . . . . . . . . . . . 8
+ 3.3 The Script-URI . . . . . . . . . . . . . . . . . . . . . . 9
+ 3.4 Execution . . . . . . . . . . . . . . . . . . . . . . . . 10
+
+ 4 The CGI Request 10
+ 4.1 Request Meta-Variables . . . . . . . . . . . . . . . . . . 10
+ 4.1.1 AUTH_TYPE . . . . . . . . . . . . . . . . . . . . . 11
+ 4.1.2 CONTENT_LENGTH . . . . . . . . . . . . . . . . . . 11
+ 4.1.3 CONTENT_TYPE . . . . . . . . . . . . . . . . . . . 12
+ 4.1.4 GATEWAY_INTERFACE . . . . . . . . . . . . . . . . . 13
+ 4.1.5 PATH_INFO . . . . . . . . . . . . . . . . . . . . . 13
+ 4.1.6 PATH_TRANSLATED . . . . . . . . . . . . . . . . . . 14
+ 4.1.7 QUERY_STRING . . . . . . . . . . . . . . . . . . . 15
+ 4.1.8 REMOTE_ADDR . . . . . . . . . . . . . . . . . . . . 15
+ 4.1.9 REMOTE_HOST . . . . . . . . . . . . . . . . . . . . 16
+ 4.1.10 REMOTE_IDENT . . . . . . . . . . . . . . . . . . . 16
+ 4.1.11 REMOTE_USER . . . . . . . . . . . . . . . . . . . . 16
+ 4.1.12 REQUEST_METHOD . . . . . . . . . . . . . . . . . . 16
+ 4.1.13 SCRIPT_NAME . . . . . . . . . . . . . . . . . . . . 17
+ 4.1.14 SERVER_NAME . . . . . . . . . . . . . . . . . . . . 17
+ 4.1.15 SERVER_PORT . . . . . . . . . . . . . . . . . . . . 17
+ 4.1.16 SERVER_PROTOCOL . . . . . . . . . . . . . . . . . . 18
+ 4.1.17 SERVER_SOFTWARE . . . . . . . . . . . . . . . . . . 18
+ 4.1.18 Protocol-Specific Meta-Variables . . . . . . . . . 18
+ 4.2 Request Message-Body . . . . . . . . . . . . . . . . . . . 19
+ 4.3 Request Methods . . . . . . . . . . . . . . . . . . . . . 20
+ 4.3.1 GET . . . . . . . . . . . . . . . . . . . . . . . . 20
+ 4.3.2 POST . . . . . . . . . . . . . . . . . . . . . . . 20
+ 4.3.3 HEAD . . . . . . . . . . . . . . . . . . . . . . . 20
+ 4.3.4 Protocol-Specific Methods . . . . . . . . . . . . . 20
+ 4.4 The Script Command Line . . . . . . . . . . . . . . . . . 21
+
+ 5 NPH Scripts 21
+ 5.1 Identification . . . . . . . . . . . . . . . . . . . . . . 21
+ 5.2 NPH Response . . . . . . . . . . . . . . . . . . . . . . . 22
+
+ 6 CGI Response 22
+ 6.1 Response Handling . . . . . . . . . . . . . . . . . . . . 22
+ 6.2 Response Types . . . . . . . . . . . . . . . . . . . . . . 22
+ 6.2.1 Document Response . . . . . . . . . . . . . . . . . 23
+ 6.2.2 Local Redirect Response . . . . . . . . . . . . . . 23
+ 6.2.3 Client Redirect Response . . . . . . . . . . . . . 23
+ 6.2.4 Client Redirect Response with Document . . . . . . 24
+ 6.3 Response Header Fields . . . . . . . . . . . . . . . . . . 24
+ 6.3.1 Content-Type . . . . . . . . . . . . . . . . . . . 24
+ 6.3.2 Location . . . . . . . . . . . . . . . . . . . . . 25
+ 6.3.3 Status . . . . . . . . . . . . . . . . . . . . . . 26
+ 6.3.4 Protocol-Specific Header Fields . . . . . . . . . . 26
+ 6.3.5 Extension Header Fields . . . . . . . . . . . . . . 27
+ 6.4 Response Message-Body . . . . . . . . . . . . . . . . . . 27
+
+ 7 System Specifications 27
+ 7.1 AmigaDOS . . . . . . . . . . . . . . . . . . . . . . . . . 27
+ 7.2 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
+ 7.3 EBCDIC/POSIX . . . . . . . . . . . . . . . . . . . . . . . 28
+
+ 8 Implementation 29
+ 8.1 Recommendations for Servers . . . . . . . . . . . . . . . 29
+ 8.2 Recommendations for Scripts . . . . . . . . . . . . . . . 29
+
+ 9 Security Considerations 30
+ 9.1 Safe Methods . . . . . . . . . . . . . . . . . . . . . . . 30
+ 9.2 Header Fields Containing Sensitive Information . . . . . . 30
+ 9.3 Data Privacy . . . . . . . . . . . . . . . . . . . . . . . 30
+ 9.4 Information Security Model . . . . . . . . . . . . . . . . 30
+ 9.5 Script Interference with the Server . . . . . . . . . . . 30
+ 9.6 Data Length and Buffering Considerations . . . . . . . . . 31
+ 9.7 Stateless Processing . . . . . . . . . . . . . . . . . . . 31
+ 9.8 Relative Paths . . . . . . . . . . . . . . . . . . . . . . 32
+ 9.9 Non-parsed Header Output . . . . . . . . . . . . . . . . . 32
+
+ 10 Acknowledgements 32
+
+ 11 References 32
+
+ 12 Authors' Addresses 34
+
+
+1 Introduction
+
+1.1 Purpose
+
+ The Common Gateway Interface (CGI) [21] allows an HTTP [2], [8]
+ server and a CGI script to share responsibility for responding to
+ client requests. The client request comprises a Uniform Resource
+ Identifier (URI) [1], a request method and various ancillary
+ information about the request provided by the transport protocol.
+
+ The CGI defines the abstract parameters, known as meta-variables,
+ which describe the client's request. Together with a concrete
+ programmer interface this specifies a platform-independent interface
+ between the script and the HTTP server.
+
+ The server is responsible for managing connection, data transfer,
+ transport and network issues related to the client request, whereas
+ the CGI script handles the application issues, such as data access
+ and document processing.
+
+1.2 Requirements
+
+ The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT',
+ 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY' and 'OPTIONAL' in this
+ document are to be interpreted as described in RFC 2119 [5].
+
+ An implementation is not compliant if it fails to satisfy one or more
+ of the 'must' requirements for the protocols it implements. An
+ implementation that satisfies all of the 'must' and all of the
+ 'should' requirements for its features is said to be 'unconditionally
+ compliant'; one that satisfies all of the 'must' requirements but not
+ all of the 'should' requirements for its features is said to be
+ 'conditionally compliant'.
+
+1.3 Specifications
+
+ Not all of the functions and features of the CGI are defined in the
+ main part of this specification. The following phrases are used to
+ describe the features that are not specified:
+
+ 'system defined'
+ The feature may differ between systems, but must be the same for
+ different implementations using the same system. A system will
+ usually identify a class of operating-systems. Some systems are
+ defined in section 7 of this document. New systems may be defined
+ by new specifications without revision of this document.
+
+ 'implementation defined'
+ The behaviour of the feature may vary from implementation to
+ implementation; a particular implementation must document its
+ behaviour.
+
+1.4 Terminology
+
+ This specification uses many terms defined in the HTTP/1.1
+ specification [8]; however, the following terms are used here in a
+ sense which may not accord with their definitions in that document,
+ or with their common meaning.
+
+ 'meta-variable'
+ A named parameter which carries information from the server to the
+ script. It is not necessarily a variable in the operating-
+ system's environment, although that is the most common
+ implementation.
+
+ 'script'
+ The software that is invoked by the server according to this
+ interface. It need not be a standalone program, but could be a
+ dynamically-loaded or shared library, or even a subroutine in the
+ server. It might be a set of statements interpreted at run-time,
+ as the term 'script' is frequently understood, but that is not a
+ requirement and within the context of this specification the term
+ has the broader definition stated.
+
+ 'server'
+ The application program that invokes the script in order to
+ service requests from the client.
+
+2 Notational Conventions and Generic Grammar
+
+2.1 Augmented BNF
+
+ All of the mechanisms specified in this document are described in
+ both prose and an augmented Backus-Naur Form (BNF) similar to that
+ used by RFC 822 [6]. Unless stated otherwise, the elements are
+ case-sensitive. This augmented BNF contains the following
+ constructs:
+
+ name = definition
+ The name of a rule and its definition are separated by the equals
+ character ('='). Whitespace is only significant in that
+ continuation lines of a definition are indented.
+
+ "literal"
+ Double quotation marks (") surround literal text, except for a
+ literal quotation mark, which is surrounded by angle-brackets ('<'
+ and '>').
+
+ rule1 | rule2
+ Alternative rules are separated by a vertical bar ('|').
+
+ (rule1 rule2 rule3)
+ Elements enclosed in parentheses are treated as a single element.
+
+ *rule
+ A rule preceded by an asterisk ('*') may have zero or more
+ occurrences. The full form is 'n*m rule' indicating at least n
+ and at most m occurrences of the rule. n and m are optional
+ decimal values with default values of 0 and infinity respectively.
+
+ [rule]
+ An element enclosed in square brackets ('[' and ']') is optional,
+ and is equivalent to '*1 rule'.
+
+ N rule
+ A rule preceded by a decimal number represents exactly N
+ occurrences of the rule. It is equivalent to 'N*N rule'.
+
+2.2 Basic Rules
+
+ This specification uses a BNF-like grammar defined in terms of
+ characters. Unlike many specifications which define the bytes
+ allowed by a protocol, here each literal in the grammar corresponds
+ to the character it represents. How these characters are represented
+ in terms of bits and bytes within a system is either
+ system-defined or specified in the particular context. The single
+ exception is the rule 'OCTET', defined below.
+
+ The following rules are used throughout this specification to
+ describe basic parsing constructs.
+
+ alpha = lowalpha | hialpha
+ lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
+ "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
+ "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
+ "y" | "z"
+ hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
+ "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
+ "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
+ "Y" | "Z"
+ digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
+ "8" | "9"
+ alphanum = alpha | digit
+ OCTET = <any 8-bit byte>
+ CHAR = alpha | digit | separator | "!" | "#" | "$" |
+ "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" |
+ "^" | "_" | "{" | "|" | "}" | "~" | CTL
+ CTL = <any control character>
+ SP = <space character>
+ HT = <horizontal tab character>
+ NL = <newline>
+ LWSP = SP | HT | NL
+ separator = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" |
+ "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" |
+ "}" | SP | HT
+ token = 1*<any CHAR except CTLs or separators>
+ quoted-string = <"> *qdtext <">
+ qdtext = <any CHAR except <"> and CTLs but including LWSP>
+ TEXT = <any printable character>
+
+ Note that newline (NL) need not be a single control character, but
+ can be a sequence of control characters. A system MAY define TEXT to
+ be a larger set of characters than <any CHAR excluding CTLs but
+ including LWSP>.
+
+2.3 URL Encoding
+
+ Some variables and constructs used here are described as being
+ 'URL-encoded'. This encoding is described in section 2 of RFC 2396
+ [3]. In a URL-encoded string an escape sequence consists of a
+ percent character ("%") followed by two hexadecimal digits, where the
+ two hexadecimal digits form an octet. An escape sequence represents
+ the graphic character that has the octet as its code within the
+ US-ASCII [20] coded character set, if it exists. Currently there is
+ no provision within the URI syntax to identify which character set
+ non-ASCII codes represent, so CGI handles this issue on an ad-hoc
+ basis.
+
+ Note that some unsafe (reserved) characters may have different
+ semantics when encoded. The definition of which characters are
+ unsafe depends on the context; see section 2 of RFC 2396 [3], updated
+ by RFC 2732 [11], for an authoritative treatment. These reserved
+ characters are generally used to provide syntactic structure to the
+ character string, for example as field separators. In all cases, the
+ string is first processed with regard to any reserved characters
+ present, and then the resulting data can be URL-decoded by replacing
+ "%" escapes by their character values.
+
+ To encode a character string, all reserved and forbidden characters
+ are replaced by the corresponding "%" escapes. The string can then
+ be used in assembling a URI. The reserved characters will vary from
+ context to context, but will always be drawn from this set:
+
+ reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" |
+ "," | "[" | "]"
+
+ The last two characters were added by RFC 2732 [11]. In any
+ particular context, a sub-set of these characters will be reserved;
+ the other characters from this set MUST NOT be encoded when a string
+ is URL-encoded in that context. Other basic rules used to describe
+ URI syntax are:
+
+ hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b"
+ | "c" | "d" | "e" | "f"
+ escaped = "%" hex hex
+ unreserved = alpha | digit | mark
+ mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
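+
+ As a minimal sketch of the decoding order described in this section,
+ the following Python fragment splits an encoded path on the reserved
+ "/" character first and only then replaces the "%" escapes. The
+ function name split_path_segments is an assumption made for
+ illustration and is not defined by this specification.
+

```python
from urllib.parse import unquote


def split_path_segments(encoded_path):
    """Split an encoded path on "/" first, then URL-decode each segment."""
    # Splitting happens while "/" still carries its reserved meaning;
    # only afterwards are the "%" escapes replaced by character values.
    return [unquote(segment) for segment in encoded_path.split("/")]


if __name__ == "__main__":
    # The escaped "%2F" stays inside one segment instead of adding a new one.
    print(split_path_segments("/a%2Fb/c"))
```

+
+ Decoding the whole string before splitting would turn the escaped
+ "%2F" into a separator and yield an extra path segment.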
+
+3 Invoking the Script
+
+3.1 Server Responsibilities
+
+ The server acts as an application gateway. It receives the request
+ from the client, selects a CGI script to handle the request, converts
+ the client request to a CGI request, executes the script and converts
+ the CGI response into a response for the client. When processing the
+ client request, it is responsible for implementing any protocol or
+ transport level authentication and security. The server MAY also
+ function in a 'non-transparent' manner, modifying the request or
+ response in order to provide some additional service, such as media
+ type transformation or protocol reduction.
+
+ The server MUST perform translations and protocol conversions on the
+ client request data required by this specification. Furthermore, the
+ server retains its responsibility to the client to conform to the
+ relevant network protocol even if the CGI script fails to conform to
+ this specification.
+
+ If the server is applying authentication to the request, then it MUST
+ NOT execute the script unless the request passes all defined access
+ controls.
+
+3.2 Script Selection
+
+ The server determines which CGI script is to be executed based on a
+ generic-form URI supplied by the client. This URI includes a
+ hierarchical path with components separated by "/". For any
+ particular request, the server will identify all or a leading part of
+ this path with an individual script, thus placing the script at a
+ particular point in the path hierarchy. The remainder of the path,
+ if any, is a resource or sub-resource identifier to be interpreted by
+ the script.
+
+ Information about this split of the path is available to the script
+ in the meta-variables, described below. Support for non-hierarchical
+ URI schemes is outside the scope of this specification.
+
+3.3 The Script-URI
+
+ The mapping from client request URI to choice of script is defined by
+ the particular server implementation and its configuration. The
+ server may allow the script to be identified with a set of several
+ different URI path hierarchies, and therefore is permitted to replace
+ the URI by other members of this set during processing and generation
+ of the meta-variables. The server
+
+ 1. MAY preserve the URI in the particular client request; or
+
+ 2. MAY select a canonical URI from the set of possible values for
+ each script; or
+
+ 3. can implement any other selection of URI from the set.
+
+ From the meta-variables thus generated, a URI, the 'Script-URI', can
+ be constructed. This MUST have the property that if the client had
+ accessed this URI instead, then the script would have been executed
+ with the same values for the SCRIPT_NAME, PATH_INFO and QUERY_STRING
+ meta-variables. The Script-URI has the structure of a generic URI as
+ defined in section 3 of RFC 2396 [3], with the exception that object
+ parameters and fragment identifiers are not permitted. The various
+ components of the Script-URI are defined by some of the
+ meta-variables (see below):
+
+ script-URI = <scheme> "://" <server-name> ":" <server-port>
+ <script-path> <extra-path> "?" <query-string>
+
+ where <scheme> is found from SERVER_PROTOCOL, and <server-name>,
+ <server-port> and <query-string> are the values of the respective
+ meta-variables. The SCRIPT_NAME and PATH_INFO values, URL-encoded
+ with ";", "=" and "?" reserved, give <script-path> and <extra-path>.
+ See section 4.1.5 for more information about the PATH_INFO
+ meta-variable.
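+
+ The construction above might be sketched in Python as follows. This
+ is an illustration only: the function name script_uri, the SAFE
+ constant and the use of a plain mapping for the meta-variables are
+ assumptions, and the scheme is simply the lower-cased protocol name
+ taken from SERVER_PROTOCOL.
+

```python
from urllib.parse import quote

# Characters left unescaped in the path: the members of the reserved set
# other than ";", "=" and "?", plus the unreserved marks that quote()
# does not already treat as safe.
SAFE = "/:@&+$,[]" + "!*'()"


def script_uri(env):
    """Assemble a Script-URI from a mapping of CGI meta-variables."""
    # SERVER_PROTOCOL looks like "HTTP/1.1"; keep the protocol name only.
    scheme = env["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    # SCRIPT_NAME and PATH_INFO are not URL-encoded, so encode them here
    # with ";", "=" and "?" treated as reserved.
    script_path = quote(env.get("SCRIPT_NAME", ""), safe=SAFE)
    extra_path = quote(env.get("PATH_INFO", ""), safe=SAFE)
    # QUERY_STRING is already URL-encoded and is appended unchanged.
    query = env.get("QUERY_STRING", "")
    return (f"{scheme}://{env['SERVER_NAME']}:{env['SERVER_PORT']}"
            f"{script_path}{extra_path}?{query}")
```

+
+ The sketch follows the grammar literally and always emits the "?"
+ separator, even when QUERY_STRING is NULL.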
+
+ The scheme and the protocol are not identical as the scheme
+ identifies the access method in addition to the protocol. For
+ example, a resource accessed using Transport Layer Security (TLS) [7]
+ would have a request URI with a scheme of https when using the HTTP
+ protocol [16]. CGI/1.1 provides no generic means for the script to
+ reconstruct this, and therefore the Script-URI as defined includes
+ the base protocol used. However, a script MAY make use of
+ scheme-specific meta-variables to better deduce the URI scheme.
+
+ Note that this definition also allows URIs to be constructed which
+ would invoke the script with any permitted values for the path-info
+ or query-string, by modifying the appropriate components.
+
+3.4 Execution
+
+ The script is invoked in a system defined manner. Unless specified
+ otherwise, the file containing the script will be invoked as an
+ executable program. The server prepares the CGI request as described
+ in section 4; this comprises the request meta-variables (immediately
+ available to the script on execution) and request message data. The
+ request data need not be immediately available to the script; the
+ script can be executed before all this data has been received by the
+ server from the client. The response from the script is returned to
+ the server as described in sections 5 and 6.
+
+ In the event of an error condition, the server can interrupt or
+ terminate script execution at any time and without warning. That
+ could occur, for example, in the event of a transport failure between
+ the server and the client; so the script SHOULD be prepared to handle
+ abnormal termination.
+
+4 The CGI Request
+
+ Information about a request comes from two different sources: the
+ request meta-variables and any associated message-body.
+
+4.1 Request Meta-Variables
+
+ Meta-variables contain data about the request passed from the server
+ to the script, and are accessed by the script in a system defined
+ manner. Meta-variables are identified by case-insensitive names;
+ there cannot be two different variables whose names differ in case
+ only. Here they are shown using a canonical representation of
+ capitals plus underscore ("_"). A particular system can define a
+ different representation.
+
+ meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" |
+ "CONTENT_TYPE" | "GATEWAY_INTERFACE" |
+ "PATH_INFO" | "PATH_TRANSLATED" |
+ "QUERY_STRING" | "REMOTE_ADDR" |
+ "REMOTE_HOST" | "REMOTE_IDENT" |
+ "REMOTE_USER" | "REQUEST_METHOD" |
+ "SCRIPT_NAME" | "SERVER_NAME" |
+ "SERVER_PORT" | "SERVER_PROTOCOL" |
+ "SERVER_SOFTWARE" | scheme |
+ protocol-var-name | extension-var-name
+ protocol-var-name = ( protocol | scheme ) "_" var-name
+ scheme = alpha *( alpha | digit | "+" | "-" | "." )
+ var-name = token
+ extension-var-name = token
+
+ Meta-variables with the same name as a scheme, and names beginning
+ with the name of a protocol or scheme (e.g. HTTP_ACCEPT) are also
+ defined. The number and meaning of these variables may change
+ independently of this specification. (See also section 4.1.18.)
+
+ The server MAY define additional implementation-specific extension
+ meta-variables, whose names SHOULD be prefixed with "X_".
+
+ This specification does not distinguish between zero-length (NULL)
+ values and missing values. For example, a script cannot distinguish
+ between the two requests http://host/script and http://host/script?
+ as in both cases the QUERY_STRING meta-variable would be NULL.
+
+ meta-variable-value = "" | 1*<TEXT, CHAR or tokens of value>
+
+ An optional meta-variable may be omitted (left unset) if its value is
+ NULL. Meta-variable values MUST be considered case-sensitive except
+ as noted otherwise. The representation of the characters in the
+ meta-variables is system defined; the server MUST convert values to
+ that representation.
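+
+ Because NULL and missing values are not distinguished, a script can
+ read every meta-variable through one helper that maps both cases to
+ the empty string. The sketch below assumes a system in which
+ meta-variables are passed as environment variables; the helper name
+ meta_variable is hypothetical.
+

```python
import os


def meta_variable(name, environ=None):
    """Return a meta-variable value, treating unset and empty alike."""
    # Assumption: meta-variables arrive as process environment variables.
    env = os.environ if environ is None else environ
    # A missing variable and a zero-length value both become "".
    return env.get(name, "")
```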
+
+4.1.1 AUTH_TYPE
+
+ The AUTH_TYPE variable identifies any mechanism used by the server to
+ authenticate the user. It contains a case-insensitive value defined
+ by the client protocol or server implementation.
+
+ For HTTP, if the client request required authentication for external
+ access, then the server MUST set the value of this variable from the
+ 'auth-scheme' token in the request Authorization header field.
+
+ AUTH_TYPE = "" | auth-scheme
+ auth-scheme = "Basic" | "Digest" | extension-auth
+ extension-auth = token
+
+ HTTP access authentication schemes are described in RFC 2617 [9].
+
+4.1.2 CONTENT_LENGTH
+
+ The CONTENT_LENGTH variable contains the size of the message-body
+ attached to the request, if any, in decimal number of octets. If no
+ data is attached, then NULL (or unset).
+
+ CONTENT_LENGTH = "" | 1*digit
+
+ The server MUST set this meta-variable if and only if the request is
+ accompanied by a message-body entity. The CONTENT_LENGTH value must
+ reflect the length of the message-body after the server has removed
+ any transfer-codings or content-codings.
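+
+ A script might use CONTENT_LENGTH as in the following sketch, which
+ reads exactly the stated number of octets from a byte stream. The
+ function name read_body and the stream argument are assumptions; how
+ the message-body is actually delivered to the script is system
+ defined.
+

```python
import io


def read_body(environ, stream):
    """Read the request message-body as described by CONTENT_LENGTH."""
    raw = environ.get("CONTENT_LENGTH", "")
    if raw == "":
        # NULL or unset: no message-body is attached to the request.
        return b""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("CONTENT_LENGTH is not a decimal number")
    length = int(raw)
    body = stream.read(length)
    if len(body) != length:
        raise EOFError("message-body shorter than CONTENT_LENGTH")
    return body


if __name__ == "__main__":
    demo = io.BytesIO(b"hello world")
    print(read_body({"CONTENT_LENGTH": "5"}, demo))
```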
+
+4.1.3 CONTENT_TYPE
+
+ If the request includes a message-body, the CONTENT_TYPE variable is
+ set to the Internet Media Type [10] of the message-body.
+
+ CONTENT_TYPE = "" | media-type
+ media-type = type "/" subtype *( ";" parameter )
+ type = token
+ subtype = token
+ parameter = attribute "=" value
+ attribute = token
+ value = token | quoted-string
+
+ The type, subtype and parameter attribute names are not case-
+ sensitive. Parameter values may be case sensitive. Media types and
+ their use in HTTP are described in section 3.7 of the HTTP/1.1
+ specification [8].
+
+ There is no default value for this variable. If and only if it is
+ unset, then the script MAY attempt to determine the media type from
+ the data received. If the type remains unknown, then the script MAY
+ choose to assume a type of application/octet-stream or it may reject
+ the request with an error (as described in section 6.3.3).
+
+ Each media-type defines a set of optional and mandatory parameters.
+ This may include a charset parameter with a case-insensitive value
+ defining the coded character set for the message-body. If the
+ charset parameter is omitted, then the default value should be
+ derived according to whichever of the following rules is the first to
+ apply:
+
+ 1. There MAY be a system-defined default charset for some
+ media-types.
+
+ 2. The default for media-types of type "text" is ISO-8859-1 [8].
+
+ 3. Any default defined in the media-type specification.
+
+ 4. The default is US-ASCII.
+
+ The server MUST set this meta-variable if an HTTP Content-Type field
+ is present in the client request header. If the server receives a
+ request with an attached entity but no Content-Type header field, it
+ MAY attempt to determine the correct content type, otherwise it
+ should omit this meta-variable.
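+
+ The charset defaulting rules of this section can be sketched as
+ follows. The function name effective_charset and the two optional
+ lookup tables are assumptions made for illustration. The parameter
+ parsing is deliberately naive and does not handle a ";" inside a
+ quoted-string value.
+

```python
def effective_charset(content_type, system_defaults=None,
                      media_type_defaults=None):
    """Return the charset of a CONTENT_TYPE value, applying the defaults."""
    system_defaults = system_defaults or {}
    media_type_defaults = media_type_defaults or {}
    media_type, _, rest = content_type.partition(";")
    media_type = media_type.strip().lower()
    # An explicit charset parameter always wins; its value is
    # case-insensitive, so it is normalised to lower case.
    for param in rest.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    # Rule 1: a system-defined default for this media-type, if any.
    if media_type in system_defaults:
        return system_defaults[media_type]
    # Rule 2: media-types of type "text" default to ISO-8859-1.
    if media_type.startswith("text/"):
        return "iso-8859-1"
    # Rule 3: a default taken from the media-type's own specification.
    if media_type in media_type_defaults:
        return media_type_defaults[media_type]
    # Rule 4: otherwise US-ASCII.
    return "us-ascii"
```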
+
+4.1.4 GATEWAY_INTERFACE
+
+ The GATEWAY_INTERFACE variable MUST be set to the dialect of CGI
+ being used by the server to communicate with the script. Syntax:
+
+ GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit
+
+ Note that the major and minor numbers are treated as separate
+ integers and hence each may be incremented higher than a single
+ digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in turn
+ is lower than CGI/12.3. Leading zeros MUST be ignored by the script
+ and MUST NOT be generated by the server.
+
+ This document defines the 1.1 version of the CGI interface.
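+
+ Since the major and minor numbers are separate integers, versions
+ should be compared as pairs of integers and not as strings or
+ decimal fractions. A minimal sketch, in which the function name
+ parse_gateway_interface is hypothetical:
+

```python
import re

_GATEWAY_INTERFACE = re.compile(r"CGI/([0-9]+)\.([0-9]+)")


def parse_gateway_interface(value):
    """Parse a GATEWAY_INTERFACE value into a (major, minor) tuple."""
    match = _GATEWAY_INTERFACE.fullmatch(value)
    if match is None:
        raise ValueError("not a GATEWAY_INTERFACE value: %r" % (value,))
    # int() discards leading zeros, as a script is required to do.
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Tuples compare element by element, so 2.4 sorts below 2.13.
    print(parse_gateway_interface("CGI/2.4") < parse_gateway_interface("CGI/2.13"))
```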
+
+4.1.5 PATH_INFO
+
+ The PATH_INFO variable specifies a path to be interpreted by the CGI
+ script. It identifies the resource or sub-resource to be returned by
+ the CGI script, and is derived from the portion of the URI path
+ hierarchy following the part that identifies the script itself.
+ Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot
+ contain path-segment parameters. A PATH_INFO of "/" represents a
+ single void path segment.
+
+ PATH_INFO = "" | ( "/" path )
+ path = lsegment *( "/" lsegment )
+ lsegment = *lchar
+ lchar = <any TEXT or CTL except "/">
+
+ The value is considered case-sensitive and the server MUST preserve
+ the case of the path as presented in the request URI. The server MAY
+ impose restrictions and limitations on what values it permits for
+ PATH_INFO, and MAY reject the request with an error if it encounters
+ any values considered objectionable. That MAY include any requests
+ that would result in an encoded "/" being decoded into PATH_INFO, as
+ this might represent a loss of information to the script. Similarly,
+ treatment of non US-ASCII characters in the path is system defined.
+
+ URL-encoded, the PATH_INFO string forms the extra-path component of
+ the Script-URI (see section 3.3) which follows the SCRIPT_NAME part
+ of that path.
+
+4.1.6 PATH_TRANSLATED
+
+ The PATH_TRANSLATED variable is derived by taking the PATH_INFO
+ value, parsing it as a local URI in its own right, and performing any
+ virtual-to-physical translation appropriate to map it onto the
+ server's document repository structure. The set of characters
+ permitted in the result is system defined.
+
+ PATH_TRANSLATED = *<any character>
+
+ This is the file location that would be accessed by a request for
+
+ <scheme> "://" <server-name> ":" <server-port> <extra-path>
+
+ where <scheme> is the scheme for the original client request and
+ <extra-path> is a URL-encoded version of PATH_INFO, with ";", "=" and
+ "?" reserved. For example, a request such as the following:
+
+ http://somehost.com/cgi-bin/somescript/this%2eis%2ethe%2epath%3binfo
+
+ would result in a PATH_INFO value of
+
+ /this.is.the.path;info
+
+ An internal URI is constructed from the scheme, server location and
+ the URL-encoded PATH_INFO:
+
+ http://somehost.com/this.is.the.path%3binfo
+
+ This would then be translated to a location in the server's document
+ repository, perhaps a filesystem path something like this:
+
+ /usr/local/www/htdocs/this.is.the.path;info
+
+ The result of the translation is the value of PATH_TRANSLATED.
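+
+ The worked example above can be mirrored by the following sketch. It
+ is not the translation algorithm of any particular server: the
+ document root, the SAFE constant and the function name
+ path_translated are assumptions, and the mapping shown simply
+ appends the decoded path to the root without any further checks.
+

```python
from urllib.parse import quote, unquote

# Hypothetical document repository root, taken from the example above.
DOCUMENT_ROOT = "/usr/local/www/htdocs"

# Reserved characters other than ";", "=" and "?" stay unescaped.
SAFE = "/:@&+$,[]" + "!*'()"


def path_translated(path_info, document_root=DOCUMENT_ROOT):
    """Return (internal URI path, translated location) for a PATH_INFO."""
    if path_info == "":
        # A NULL PATH_INFO gives a NULL PATH_TRANSLATED.
        return "", ""
    # Step 1: URL-encode PATH_INFO to form the path of the internal URI.
    internal_path = quote(path_info, safe=SAFE)
    # Step 2: map that path onto the repository. A real server applies
    # its own configuration here; this sketch only joins it to the root.
    location = document_root.rstrip("/") + unquote(internal_path)
    return internal_path, location
```

+
+ Note that quote() emits upper-case hexadecimal digits ("%3B"); the
+ "hex" rule in section 2.3 permits either case.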
+
+ The value of PATH_TRANSLATED is derived in this way irrespective of
+ whether it maps to a valid repository location. The server MUST
+ preserve the case of the extra-path segment unless the underlying
+ repository supports case-insensitive names. If the repository is
+ only case-aware, case-preserving, or case-blind with regard to
+ document names, the server is not required to preserve the case of
+ the original segment through the translation.
+
+ The translation algorithm the server uses to derive PATH_TRANSLATED
+ is implementation defined; CGI scripts which use this variable may
+ suffer limited portability.
+
+ The server SHOULD set this meta-variable if the request URI includes
+ a path-info component. If PATH_INFO is NULL, then the
+ PATH_TRANSLATED variable MUST be set to NULL (or unset).
+
+4.1.7 QUERY_STRING
+
+ The QUERY_STRING variable contains a URL-encoded search or parameter
+ string; it provides information to the CGI script to affect or refine
+ the document to be returned by the script.
+
+ The URL syntax for a search string is described in section 3 of RFC
+ 2396 [3]. The QUERY_STRING value is case-sensitive.
+
+ QUERY_STRING = query-string
+ query-string = *uric
+ uric = reserved | unreserved | escaped
+
+ When parsing and decoding the query string, the details of the
+ parsing, reserved characters and support for non US-ASCII characters
+ depends on the context. For example, form submission from an HTML
+ document [15] uses application/x-www-form-urlencoded encoding, in
+ which the characters "+", "&" and "=" are reserved, and the ISO
+ 8859-1 encoding may be used for non US-ASCII characters.
+
+ The QUERY_STRING value provides the query-string part of the
+ Script-URI (see section 3.3).
+
+ The server MUST set this variable; if the Script-URI does not include
+ a query component, the QUERY_STRING MUST be defined as an empty
+ string ("").
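+
+ As an illustration of the form-submission case described above, the
+ following Python sketch decodes QUERY_STRING as
+ application/x-www-form-urlencoded data. The function name
+ parse_form_query and the choice of ISO 8859-1 decoding are
+ assumptions made for this example; other contexts may need
+ different parsing rules.
+

```python
from urllib.parse import parse_qs


def parse_form_query(environ):
    """Decode QUERY_STRING as application/x-www-form-urlencoded data.

    `environ` is any mapping of meta-variable names to values, for
    example os.environ. ISO 8859-1 decoding is an assumption taken
    from the form-submission example in the text.
    """
    # The server MUST set QUERY_STRING; a missing variable is treated
    # like the empty string so the sketch stays robust.
    query = environ.get("QUERY_STRING", "")
    return parse_qs(query, keep_blank_values=True, encoding="iso-8859-1")


fields = parse_form_query(
    {"QUERY_STRING": "name=J%F6rg&tag=a+b&tag=c&empty="}
)
```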
+
+4.1.8 REMOTE_ADDR
+
+ The REMOTE_ADDR variable MUST be set to the network address of the
+ client sending the request to the server.
+
+ REMOTE_ADDR = hostnumber
+ hostnumber = ipv4-address | ipv6-address
+ ipv4-address = 1*3digit "." 1*3digit "." 1*3digit "." 1*3digit
+ ipv6-address = hexpart [ ":" ipv4-address ]
+ hexpart = hexseq | ( [ hexseq ] "::" [ hexseq ] )
+ hexseq = 1*4hex *( ":" 1*4hex )
+
+ The format of IPv6 addresses is defined in RFC 2373 [12].
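+
+ A script that needs to distinguish IPv4 from IPv6 clients could
+ parse REMOTE_ADDR as in the following Python sketch. The helper
+ name remote_addr_version is hypothetical, and the standard
+ ipaddress module is stricter than the grammar above (it rejects
+ out-of-range octets such as "999").
+

```python
import ipaddress


def remote_addr_version(environ):
    """Return 4 or 6 depending on the address family of REMOTE_ADDR.

    Raises KeyError if the variable is missing and ValueError if the
    value is not a valid IPv4 or IPv6 address.
    """
    address = ipaddress.ip_address(environ["REMOTE_ADDR"])
    return address.version


v4 = remote_addr_version({"REMOTE_ADDR": "192.0.2.10"})
v6 = remote_addr_version({"REMOTE_ADDR": "2001:db8::1"})
```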
+
+4.1.9 REMOTE_HOST
+
+ The REMOTE_HOST variable contains the fully qualified domain name of
+ the client sending the request to the server, if available, otherwise
+ NULL. Fully qualified domain names take the form as described in
+ section 3.5 of RFC 1034 [14] and section 2.1 of RFC 1123 [4]. Domain
+ names are not case sensitive.
+
+ REMOTE_HOST = "" | hostname | hostnumber
+ hostname = *( domainlabel "." ) toplabel [ "." ]
+ domainlabel = alphanum [ *alphahypdigit alphanum ]
+ toplabel = alpha [ *alphahypdigit alphanum ]
+ alphahypdigit = alphanum | "-"
+
+ The server SHOULD set this variable. If the hostname is not
+ available for performance reasons or otherwise, the server MAY
+ substitute the REMOTE_ADDR value.
+
+4.1.10 REMOTE_IDENT
+
+ The REMOTE_IDENT variable MAY be used to provide identity information
+ reported about the connection by an RFC 1413 [17] request to the
+ remote agent, if available. The server may choose not to support
+ this feature, or not to request the data for efficiency reasons, or
+ not to return available identity data.
+
+ REMOTE_IDENT = *TEXT
+
+ The data returned may be used for authentication purposes, but the
+ level of trust reposed in it should be minimal.
+
+4.1.11 REMOTE_USER
+
+ The REMOTE_USER variable provides a user identification string
+ supplied by the client as part of user authentication.
+
+ REMOTE_USER = *TEXT
+
+ If the client request required HTTP Authentication [9] (e.g. the
+ AUTH_TYPE meta-variable is set to "Basic" or "Digest"), then the
+ value of the REMOTE_USER meta-variable MUST be set to the user-ID
+ supplied.
+
+4.1.12 REQUEST_METHOD
+
+ The REQUEST_METHOD meta-variable MUST be set to the method which
+ should be used by the script to process the request, as described in
+ section 4.3.
+
+ REQUEST_METHOD = method
+ method = "GET" | "POST" | "HEAD" | extension-method
+ extension-method = "PUT" | "DELETE" | token
+
+ The method is case sensitive. The HTTP methods are described in
+ section 5.1.1 of the HTTP/1.0 specification [2] and section 5.1.1 of
+ the HTTP/1.1 specification [8].
+
+4.1.13 SCRIPT_NAME
+
+ The SCRIPT_NAME variable MUST be set to a URI path (not URL-encoded)
+ which could identify the CGI script (rather than the script's
+ output). The syntax is the same as for PATH_INFO (section 4.1.5).
+
+ SCRIPT_NAME = "" | ( "/" path )
+
+ The leading "/" is not part of the path. It is optional if the path
+ is NULL; however, the variable MUST still be set in that case.
+
+ The SCRIPT_NAME string forms some leading part of the path component
+ of the Script-URI derived in some implementation defined manner. No
+ PATH_INFO segment (see section 4.1.5) is included in the SCRIPT_NAME
+ value.
+
+4.1.14 SERVER_NAME
+
+ The SERVER_NAME variable MUST be set to the name of the server host
+ to which the client request is directed. It is a case-insensitive
+ hostname or network address. It forms the host part of the
+ Script-URI. The syntax for an IPv6 address in a URI is defined in
+ RFC 2732 [11].
+
+ SERVER_NAME = server-name
+ server-name = hostname | ipv4-address | ( "[" ipv6-address "]" )
+
+ A deployed server can have more than one possible value for this
+ variable, where several HTTP virtual hosts share the same IP address.
+ In that case, the server uses the contents of the Host header field
+ to select the correct virtual host.
+
+4.1.15 SERVER_PORT
+
+ The SERVER_PORT variable MUST be set to the TCP/IP port number on
+ which this request is received from the client. This value is used
+ in the port part of the Script-URI.
+
+ SERVER_PORT = server-port
+ server-port = 1*digit
+
+ Note that this variable MUST be set, even if the port is the default
+ port for the scheme and could otherwise be omitted from a URI.
+
+4.1.16 SERVER_PROTOCOL
+
+ The SERVER_PROTOCOL variable MUST be set to the name and version of
+ the application protocol used for this CGI request. This is not
+ necessarily the same as the protocol version used by the server in
+ its communication with the client.
+
+ SERVER_PROTOCOL = HTTP-Version | "INCLUDED" | extension-version
+ HTTP-Version = "HTTP" "/" 1*digit "." 1*digit
+ extension-version = protocol [ "/" 1*digit "." 1*digit ]
+ protocol = token
+
+ 'protocol' is a version of the scheme part of the Script-URI, and is
+ not case sensitive. By convention, 'protocol' is in upper case. The
+ protocol may not be identical to the scheme of the request; for
+ example, the request may have scheme "https", whilst the protocol is
+ "HTTP".
+
+ A well-known value for SERVER_PROTOCOL which the server MAY use is
+ "INCLUDED", which signals that the current document is being included
+ as part of a composite document, rather than being the direct target
+ of the client request. The script should treat this as an HTTP/1.0
+ request.
+
+4.1.17 SERVER_SOFTWARE
+
+ The SERVER_SOFTWARE meta-variable MUST be set to the name and version
+ of the information server software making the CGI request (and
+ running the gateway). It SHOULD be the same as the server
+ description reported to the client, if any.
+
+ SERVER_SOFTWARE = 1*( product | comment )
+ product = token [ "/" product-version ]
+ product-version = token
+ comment = "(" *( ctext | comment ) ")"
+ ctext = <any TEXT excluding "(" and ")">
+
+4.1.18 Protocol-Specific Meta-Variables
+
+ The server SHOULD set meta-variables specific to the protocol and
+ scheme for the request. Interpretation of protocol-specific
+ variables depends on the protocol version in SERVER_PROTOCOL. The
+ server MAY set a meta-variable with the name of the scheme to a
+ non-NULL value if the scheme is not the same as the protocol. The
+ presence of such a variable indicates to a script which scheme is
+ used by the request.
+
+ Meta-variables with names beginning with "HTTP_" contain values read
+ from the client request header fields, if the protocol used is HTTP.
+ The HTTP header field name is converted to upper case, has all
+ occurrences of "-" replaced with "_" and has "HTTP_" prepended to
+ give the meta-variable name. The header data can be presented as
+ sent by the client, or can be rewritten in ways which do not change
+ its semantics. If multiple header fields with the same field-name
+ are received then the server MUST rewrite them as a single value
+ having the same semantics. Similarly, a header field that spans
+ multiple lines must be merged onto a single line. The server MUST,
+ if necessary, change the representation of the data (for example, the
+ character set) to be appropriate for a CGI meta-variable.
+
+ The server is not required to create meta-variables for all the
+ header fields that it receives. In particular, it SHOULD remove any
+ header fields carrying authentication information, such as
+ 'Authorization'; or that are available to the script in other
+ variables, such as 'Content-Length' and 'Content-Type'. The server
+ MAY remove header fields that relate solely to client-side
+ communication issues, such as 'Connection'.
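+
+ The name conversion and merging rules above can be sketched in
+ Python as follows. This is a minimal illustration from the server's
+ point of view, not a definitive implementation: the function names,
+ the OMITTED set and the use of ", " to combine repeated fields are
+ assumptions of this example.
+

```python
import re

# Illustrative omission list: fields carrying credentials, fields already
# exposed as CONTENT_LENGTH / CONTENT_TYPE, and a connection-level field.
OMITTED = {"authorization", "proxy-authorization",
           "content-length", "content-type", "connection"}


def header_to_meta_variable(field_name):
    """Upper-case the name, replace "-" with "_", prepend "HTTP_"."""
    return "HTTP_" + field_name.upper().replace("-", "_")


def build_http_meta_variables(header_fields):
    """Build HTTP_* meta-variables from (name, value) header pairs."""
    meta = {}
    for name, value in header_fields:
        if name.lower() in OMITTED:
            continue
        # Merge a folded (multi-line) value onto a single line.
        value = re.sub(r"\r?\n[ \t]+", " ", value).strip()
        key = header_to_meta_variable(name)
        # Repeated fields are combined as one comma-separated value.
        meta[key] = f"{meta[key]}, {value}" if key in meta else value
    return meta


meta = build_http_meta_variables([
    ("Accept", "text/html"),
    ("accept", "text/plain"),
    ("Authorization", "Basic xyz"),
    ("X-Note", "a\r\n b"),
])
```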
+
+4.2 Request Message-Body
+
+ Request data is accessed by the script in a system-defined method;
+ unless defined otherwise, this will be by reading the 'standard
+ input' file descriptor or file handle.
+
+ Request-Data = [ request-body ] [ extension-data ]
+ request-body = <CONTENT_LENGTH>OCTET
+ extension-data = *OCTET
+
+ A request-body is supplied with the request if the CONTENT_LENGTH is
+ not NULL. The server MUST make at least that many bytes available
+ for the script to read. The server MAY signal an end-of-file
+ condition after CONTENT_LENGTH bytes have been read or it MAY supply
+ extension data. Therefore, the script MUST NOT attempt to read more
+ than CONTENT_LENGTH bytes, even if more data is available. However,
+ it is not obliged to read any of the data.
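+
+ A script might honour the CONTENT_LENGTH limit as in the following
+ Python sketch. The helper name read_request_body is hypothetical;
+ a real script would typically pass sys.stdin.buffer as the stream.
+

```python
import io


def read_request_body(environ, stream):
    """Read at most CONTENT_LENGTH bytes of request-body from `stream`."""
    raw_length = environ.get("CONTENT_LENGTH", "")
    if raw_length == "":
        return b""  # NULL CONTENT_LENGTH: no request-body was supplied
    length = int(raw_length)  # ValueError on a malformed value
    if length < 0:
        raise ValueError("CONTENT_LENGTH must not be negative")
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # premature end-of-file; return what was received
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Any bytes after CONTENT_LENGTH (extension data) are left unread.
body = read_request_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"helloEXTRA"))
```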
+
+ For non-parsed header (NPH) scripts (section 5), the server SHOULD
+ attempt to ensure that the data supplied to the script is precisely
+ as supplied by the client and is unaltered by the server.
+
+ As transfer-codings are not supported on the request-body, the server
+ MUST remove any such codings from the message-body, and recalculate
+ the CONTENT_LENGTH. If this is not possible (for example, because of
+ large buffering requirements), the server SHOULD reject the client
+ request. It MAY also remove content-codings from the message-body.
+
+4.3 Request Methods
+
+ The Request Method, as supplied in the REQUEST_METHOD meta-variable,
+ identifies the processing method to be applied by the script in
+ producing a response. The script author can choose to implement the
+ methods most appropriate for the particular application. If the
+ script receives a request with a method it does not support it SHOULD
+ reject it with an error (see section 6.3.3).
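+
+ One possible way for a script to reject unsupported methods is
+ sketched below in Python. The names handle_request, show_page and
+ handlers are hypothetical, and the choice of status 501 follows
+ section 6.3.3; the response is built as a string using LF newlines.
+

```python
def handle_request(environ, handlers):
    """Dispatch on REQUEST_METHOD; `handlers` maps method names to callables."""
    method = environ.get("REQUEST_METHOD", "")
    handler = handlers.get(method)  # exact match: the method is case sensitive
    if handler is not None:
        return handler(environ)
    response = "Status: 501 Not Implemented\nContent-Type: text/plain\n\n"
    if method != "HEAD":
        # A HEAD request must not be answered with a message-body.
        response += "Method not supported by this script.\n"
    return response


def show_page(environ):
    """Hypothetical GET handler returning a small document response."""
    return "Content-Type: text/plain\n\nhello\n"


handlers = {"GET": show_page}
```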
+
+4.3.1 GET
+
+ The GET method indicates that the script should produce a
+ document based on the meta-variable values. By convention, the GET
+ method is 'safe' and 'idempotent' and SHOULD NOT have the
+ significance of taking an action other than producing a document.
+
+ The meaning of the GET method may be modified and refined by
+ protocol-specific meta-variables.
+
+4.3.2 POST
+
+ The POST method is used to request the script perform processing and
+ produce a document based on the data in the request message-body, in
+ addition to meta-variable values. A common use is form submission in
+ HTML [15], intended to initiate processing by the script that has a
+ permanent effect, such as a change in a database.
+
+ The script MUST check the value of the CONTENT_LENGTH variable before
+ reading the attached message-body, and SHOULD check the CONTENT_TYPE
+ value before processing it.
+
+4.3.3 HEAD
+
+ The HEAD method requests the script to do sufficient processing to
+ return the response header fields, without providing a response
+ message-body. The script MUST NOT provide a response message-body
+ for a HEAD request. If it does, then the server MUST discard the
+ message-body when reading the response.
+
+4.3.4 Protocol-Specific Methods
+
+ The script MAY implement any protocol-specific method, such as
+ HTTP/1.1 PUT and DELETE; it SHOULD check the value of SERVER_PROTOCOL
+ when doing so.
+
+ The server MAY decide that some methods are not appropriate or
+ permitted for a script, and may handle the methods itself or return
+ an error to the client.
+
+4.4 The Script Command Line
+
+ Some systems support a method for supplying an array of strings to
+ the CGI script. This is only used in the case of an 'indexed' HTTP
+ query, which is identified by a 'GET' or 'HEAD' request with a URI
+ query string that does not contain any unencoded "=" characters. For
+ such a request, the server SHOULD treat the query-string as a
+ search-string and parse it into words, using the rules
+
+ search-string = search-word *( "+" search-word )
+ search-word = 1*schar
+ schar = unreserved | escaped | xreserved
+ xreserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "," |
+ "$"
+
+ After parsing, each search-word is URL-decoded, optionally encoded in
+ a system defined manner and then added to the argument list.
+
+ If the server cannot create any part of the argument list, then the
+ server MUST NOT generate any command line information. For example,
+ the number of arguments may be greater than operating system or
+ server limits, or one of the words may not be representable as an
+ argument.
+
+ The script SHOULD check to see if the QUERY_STRING value contains an
+ unencoded "=" character, and SHOULD NOT use the command line
+ arguments if it does.
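+
+ The indexed-query rules above can be sketched in Python as shown
+ below. The function name command_line_words is hypothetical, and
+ two choices are assumptions of this example: words are URL-decoded
+ as ISO 8859-1, and a query containing an empty word (which does not
+ match the search-word rule) yields no arguments at all.
+

```python
from urllib.parse import unquote


def command_line_words(environ):
    """Return the argument list for an 'indexed' query, else []."""
    method = environ.get("REQUEST_METHOD", "")
    query = environ.get("QUERY_STRING", "")
    # QUERY_STRING is still URL-encoded here, so any "=" it contains
    # is an unencoded one (an encoded "=" appears as "%3D").
    if method not in ("GET", "HEAD") or query == "" or "=" in query:
        return []
    words = query.split("+")
    if any(word == "" for word in words):
        return []  # search-word requires at least one character
    return [unquote(word, encoding="iso-8859-1") for word in words]


words = command_line_words(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "hello+w%3Drld"}
)
```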
+
+5 NPH Scripts
+
+5.1 Identification
+
+ The server MAY support NPH (Non-Parsed Header) scripts; these are
+ scripts to which the server passes all responsibility for response
+ processing.
+
+ This specification provides no mechanism for an NPH script to be
+ identified on the basis of its output data alone. By convention,
+ therefore, any particular script can only ever provide output of one
+ type (NPH or CGI) and hence the script itself is described as an 'NPH
+ script'. A server with NPH support MUST provide an implementation-
+ defined mechanism for identifying NPH scripts, perhaps based on the
+ name or location of the script.
+
+5.2 NPH Response
+
+ There MUST be a system defined method for the script to send data
+ back to the server or client; a script MUST always return some data.
+ Unless defined otherwise, this will be the same as for conventional
+ CGI scripts.
+
+ Currently, NPH scripts are only defined for HTTP client requests. An
+ (HTTP) NPH script MUST return a complete HTTP response message,
+ currently described in section 6 of the HTTP specifications [2], [8].
+ The script MUST use the SERVER_PROTOCOL variable to determine the
+ appropriate format for a response. It MUST also take account of any
+ generic or protocol-specific meta-variables in the request as might
+ be mandated by the particular protocol specification.
+
+ The server MUST ensure that the script output is sent to the client
+ unmodified. Note that this requires the script to use the correct
+ character set (US-ASCII [20] and ISO 8859-1 [21] for HTTP) in the
+ header fields. The server SHOULD attempt to ensure that the script
+ output is sent directly to the client, with minimal internal and no
+ transport-visible buffering.
+
+ Unless the implementation defines otherwise, the script MUST NOT
+ indicate in its response that the client can send further requests
+ over the same connection.
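+
+ A minimal NPH response might be assembled as in the following
+ Python sketch. The function name nph_response is hypothetical.
+ Echoing the SERVER_PROTOCOL version in the status line, falling
+ back to HTTP/1.0 for non-HTTP values, and sending 'Connection:
+ close' are assumptions of this example rather than requirements
+ stated above.
+

```python
def nph_response(environ, body, content_type="text/plain; charset=ISO-8859-1"):
    """Build a complete HTTP response message as bytes; `body` is bytes."""
    protocol = environ.get("SERVER_PROTOCOL", "HTTP/1.0")
    if not protocol.startswith("HTTP/"):
        protocol = "HTTP/1.0"  # e.g. "INCLUDED" is treated as HTTP/1.0
    head = [
        f"{protocol} 200 OK",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
        # Do not invite further requests on this connection.
        "Connection: close",
    ]
    # HTTP header lines end in CR LF; header fields use ISO 8859-1.
    return ("\r\n".join(head) + "\r\n\r\n").encode("iso-8859-1") + body


out = nph_response({"SERVER_PROTOCOL": "HTTP/1.1"}, b"hi\n")
```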
+
+6 CGI Response
+
+6.1 Response Handling
+
+ A script MUST always provide a non-empty response, and so there is a
+ system defined method for it to send this data back to the server.
+ Unless defined otherwise, this will be via the 'standard output' file
+ descriptor.
+
+ The script MUST check the REQUEST_METHOD variable when processing the
+ request and preparing its response.
+
+ The server MAY implement a timeout period within which data must be
+ received from the script. If a server implementation defines such a
+ timeout and receives no data from a script within the timeout period,
+ the server MAY terminate the script process.
+
+6.2 Response Types
+
+ The response comprises a message-header and a message-body, separated
+ by a blank line. The message-header contains one or more header
+ fields. The body may be NULL.
+
+ generic-response = 1*header-field NL [ response-body ]
+
+ The script MUST return one of either a document response, a local
+ redirect response or a client redirect (with optional document)
+ response. In the response definitions below, the order of header
+ fields in a response is not significant (despite appearing so in the
+ BNF). The header fields are defined in section 6.3.
+
+ CGI-Response = document-response | local-redir-response |
+ client-redir-response | client-redirdoc-response
+
+6.2.1 Document Response
+
+ The CGI script can return a document to the user in a document
+ response, with an optional error code indicating the success status
+ of the response.
+
+ document-response = Content-Type [ Status ] *other-field NL
+ response-body
+
+ The script MUST return a Content-Type header field. A Status header
+ field is optional, and status 200 'OK' is assumed if it is omitted.
+ The server MUST make any appropriate modifications to the script's
+ output to ensure that the response to the client complies with the
+ response protocol version.
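+
+ As an illustration, a document response could be built as in the
+ Python sketch below. The function name document_response and its
+ parameters are hypothetical; LF is used as the newline, as on UNIX
+ systems.
+

```python
def document_response(body, content_type="text/html; charset=ISO-8859-1",
                      status=None):
    """Build a CGI document response as text.

    `status` is an optional (code, reason) pair; when it is omitted the
    server assumes 200 'OK'.
    """
    lines = [f"Content-Type: {content_type}"]
    if status is not None:
        code, reason = status
        lines.append(f"Status: {code} {reason}")
    # A blank line separates the header fields from the body.
    return "\n".join(lines) + "\n\n" + body


ok = document_response("<p>hello</p>\n")
missing = document_response("gone\n", "text/plain", (404, "Not Found"))
```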
+
+6.2.2 Local Redirect Response
+
+ The CGI script can return a URI path and query-string
+ ('local-pathquery') for a local resource in a Location header field.
+ This indicates to the server that it should reprocess the request
+ using the path specified.
+
+ local-redir-response = local-Location NL
+
+ The script MUST NOT return any other header fields or a message-body,
+ and the server MUST generate the response that it would have produced
+ in response to a request containing the URL
+
+ scheme "://" server-name ":" server-port local-pathquery
+
+6.2.3 Client Redirect Response
+
+ The CGI script can return an absolute URI path in a Location header
+ field, to indicate to the client that it should reprocess the request
+ using the URI specified.
+
+ client-redir-response = client-Location *extension-field NL
+
+ The script MUST NOT provide any other header fields, except for
+ server-defined CGI extension fields. For an HTTP client request, the
+ server MUST generate a 302 'Found' HTTP response message.
+
+6.2.4 Client Redirect Response with Document
+
+ The CGI script can return an absolute URI path in a Location header
+ field together with an attached document, to indicate to the client
+ that it should reprocess the request using the URI specified.
+
+ client-redirdoc-response = client-Location Status Content-Type
+ *other-field NL response-body
+
+ The Status header field MUST be supplied and MUST contain a status
+ value of 302 'Found'. The server MUST make any appropriate
+ modifications to the script's output to ensure that the response to
+ the client complies with the response protocol version.
+
+6.3 Response Header Fields
+
+ The response header fields are either CGI or extension header fields
+ to be interpreted by the server, or protocol-specific headers to be
+ included in the response returned to the client. At least one CGI
+ field MUST be supplied; each CGI field MUST NOT appear more than once
+ in the response. The response header fields have the syntax:
+
+ header-field = CGI-field | other-field
+ CGI-field = Content-Type | Location | Status
+ other-field = protocol-field | extension-field
+ protocol-field = generic-field
+ extension-field = generic-field
+ generic-field = field-name ":" [ field-value ] NL
+ field-name = token
+ field-value = *( field-content | LWSP )
+ field-content = *( token | separator | quoted-string )
+
+ The field-name is not case sensitive. A NULL field value is
+ equivalent to a field not being sent. Note that each header field in
+ a CGI-Response MUST be specified on a single line; CGI/1.1 does not
+ support continuation lines. Whitespace is permitted between the ":"
+ and the field-value (but not between the field-name and the ":"), and
+ also between tokens in the field-value.
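+
+ A server-side check of these rules might look like the following
+ Python sketch. The function name parse_cgi_header_fields is
+ hypothetical; the input is assumed to be the header section as
+ text, without the terminating blank line, and the token check is
+ simplified to rejecting whitespace in the field-name.
+

```python
CGI_FIELDS = {"content-type", "location", "status"}


def parse_cgi_header_fields(header_block):
    """Return a list of (field-name, field-value) pairs.

    Raises ValueError for continuation lines, malformed fields,
    repeated CGI fields, or a response with no CGI field at all.
    """
    fields = []
    seen_cgi = set()
    for line in header_block.splitlines():
        if line[:1] in (" ", "\t"):
            raise ValueError("continuation lines are not supported")
        name, sep, value = line.partition(":")
        if not sep or not name or any(ch.isspace() for ch in name):
            raise ValueError("malformed header field")
        value = value.strip()
        if value == "":
            continue  # a NULL value is equivalent to the field not being sent
        key = name.lower()  # the field-name is not case sensitive
        if key in CGI_FIELDS:
            if key in seen_cgi:
                raise ValueError("CGI field appears more than once")
            seen_cgi.add(key)
        fields.append((name, value))
    if not seen_cgi:
        raise ValueError("at least one CGI field is required")
    return fields


fields = parse_cgi_header_fields(
    "Content-Type: text/plain\nX-Empty:\nstatus:200 OK\n"
)
```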
+
+6.3.1 Content-Type
+
+ The Content-Type response field sets the Internet Media Type [10] of
+ the entity body.
+
+ Content-Type = "Content-Type:" media-type NL
+
+ If an entity body is returned, the script MUST supply a Content-Type
+ field in the response. If it fails to do so, the server SHOULD NOT
+ attempt to determine the correct content type. The value SHOULD be
+ sent unmodified to the client, except for any charset parameter
+ changes.
+
+ Unless it is otherwise system-defined, the default charset assumed by
+ the client for text media-types is ISO-8859-1 if the protocol is HTTP
+ and US-ASCII otherwise. Hence the script SHOULD include a charset
+ parameter. See section 3.4.1 of the HTTP/1.1 specification [8] for a
+ discussion of this issue.
+
+6.3.2 Location
+
+ The Location header field is used to specify to the server that the
+ script is returning a reference to a document rather than an actual
+ document. It is either an absolute URI (with fragment), indicating
+ that the client is to fetch the referenced document, or a local URI
+ path (with query string), indicating that the server is to fetch the
+ referenced document.
+
+ Location = local-Location | client-Location
+ client-Location = "Location:" fragment-URI NL
+ local-Location = "Location:" local-pathquery NL
+ fragment-URI = absoluteURI [ "#" fragment ]
+ fragment = *uric
+ local-pathquery = abs-path [ "?" query-string ]
+ abs-path = "/" path-segments
+ path-segments = segment *( "/" segment )
+ segment = *pchar
+ pchar = unreserved | escaped | extra
+ extra = ":" | "@" | "&" | "=" | "+" | "$" | ","
+
+ The syntax of an absoluteURI is incorporated into this document from
+ that specified in RFC 2396 [3] and RFC 2732 [11]. A valid
+ absoluteURI always starts with the name of the scheme followed by ":";
+ scheme names start with a letter and continue with alphanumerics,
+ "+", "-" or ".". The local URI path and query must be an absolute
+ path, and not a relative path or NULL, and hence must start with a
+ "/".
+
+ Note that any message-body attached to the request (such as for a
+ POST request) may not be available to the resource that is the target
+ of the redirect.
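+
+ A server could distinguish the two kinds of Location value roughly
+ as in the Python sketch below. The function name classify_location
+ is hypothetical, and only the leading characters are examined; the
+ remainder of the value is not validated against the grammar.
+

```python
import re

# scheme ":" where scheme starts with a letter and continues with
# alphanumerics, "+", "-" or "."
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def classify_location(value):
    """Return "local" for an absolute path, "client" for an absolute URI."""
    if value.startswith("/"):
        return "local"   # the server reprocesses the request itself
    if _SCHEME.match(value):
        return "client"  # the client is told to fetch the URI
    raise ValueError("Location must be an absolute URI or an absolute path")


kinds = [
    classify_location("/cgi-bin/other?x=1"),
    classify_location("http://example.org/doc#top"),
]
```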
+
+6.3.3 Status
+
+ The Status header field contains a 3-digit integer result code that
+ indicates the level of success of the script's attempt to handle the
+ request.
+
+ Status = "Status:" status-code SP reason-phrase NL
+ status-code = "200" | "302" | "400" | "501" | 3digit
+ reason-phrase = *TEXT
+
+ Status code 200 'OK' indicates success, and is the default value
+ assumed for a document response. Status code 302 'Found' is used
+ with a Location header field and response message-body. Status code
+ 400 'Bad Request' may be used for an unknown request format, such as
+ a missing CONTENT_TYPE. Status code 501 'Not Implemented' may be
+ returned by a script if it receives an unsupported REQUEST_METHOD.
+
+ Other valid status codes are listed in section 6.1.1 of the HTTP
+ specifications [2], [8], and also the IANA HTTP Status Code Registry
+ [18], and can be used in addition to or instead of the ones listed
+ above. The script SHOULD check the value of SERVER_PROTOCOL before
+ using HTTP/1.1 status codes. The script MAY reject with error 405
+ 'Method Not Allowed' HTTP/1.1 requests made using a method it does
+ not support.
+
+ Note that returning an error status code does not have to mean an
+ error condition with the script itself. For example, a script that
+ is invoked as an error handler by the server should return the code
+ appropriate to the server's error condition.
+
+ The reason-phrase is a textual description of the error to be
+ returned to the client for human consumption.
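+
+ The Status field value can be split into its parts as in the
+ following Python sketch. The function name parse_status is
+ hypothetical. The sketch follows the grammar strictly and therefore
+ requires the space after the code; a lenient server might also
+ accept a bare 3-digit code.
+

```python
import re

_STATUS = re.compile(r"([0-9]{3}) (.*)")


def parse_status(value):
    """Split a Status field value into (status-code, reason-phrase)."""
    match = _STATUS.fullmatch(value)
    if match is None:
        raise ValueError("malformed Status value")
    return int(match.group(1)), match.group(2)


status = parse_status("404 Not Found")
```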
+
+6.3.4 Protocol-Specific Header Fields
+
+ The script MAY return any other header fields that relate to the
+ response message defined by the specification for the SERVER_PROTOCOL
+ (HTTP/1.0 [2] or HTTP/1.1 [8]). The server MUST translate the header
+ data from the CGI header syntax to the HTTP header syntax if these
+ differ. For example, the character sequence for newline (such as
+ UNIX's US-ASCII LF) used by CGI scripts may not be the same as that
+ used by HTTP (US-ASCII CR followed by LF).
+
+ The script MUST NOT return any header fields that relate to
+ client-side communication issues and could affect the server's
+ ability to send the response to the client. The server MAY remove
+ any such header fields returned by the script. It SHOULD resolve any
+ conflicts between headers returned by the script and headers that it
+ would otherwise send itself.
+
+6.3.5 Extension Header Fields
+
+ The server may define additional implementation-specific CGI header
+ fields, whose field names SHOULD begin with "X-CGI-". It MAY ignore
+ (and delete) any unrecognised header fields with names beginning
+ "X-CGI-".
+
+6.4 Response Message-Body
+
+ The response message-body is an attached document to be returned to
+ the client by the server. The server MUST read all the data provided
+ by the script, until the script signals the end of the message-body
+ by way of an end-of-file condition. The message-body SHOULD be sent
+ unmodified to the client, except for HEAD requests or any required
+ transfer-codings, content-codings or charset conversions.
+
+ response-body = *OCTET
+
+7 System Specifications
+
+7.1 AmigaDOS
+
+ Meta-Variables
+ Meta-variables are passed to the script in identically named
+ environment variables. These are accessed by the DOS library
+ routine GetVar(). The flags argument SHOULD be 0. Case is
+ ignored, but upper case is recommended for compatibility with
+ case-sensitive systems.
+
+ The current working directory
+ The current working directory for the script is set to the
+ directory containing the script.
+
+ Character set
+ The US-ASCII character set [20] is used for the definition of
+ meta-variables, header fields and values; the newline (NL)
+ sequence is LF; servers SHOULD also accept CR LF as a newline.
+
+7.2 UNIX
+
+ For UNIX compatible operating systems, the following are defined:
+
+ Meta-Variables
+ Meta-variables are passed to the script in identically named
+ environment variables. These are accessed by the C library
+ routine getenv() or variable environ.
+
+ The command line
+ This is accessed using the argc and argv arguments to main().
+ The words have any characters which are 'active' in the Bourne
+ shell escaped with a backslash.
+
+ The current working directory
+ The current working directory for the script SHOULD be set to the
+ directory containing the script.
+
+ Character set
+ The US-ASCII character set [20], excluding NUL, is used for the
+ definition of meta-variables, header fields and CHAR values; TEXT
+ values use ISO-8859-1. The PATH_TRANSLATED value can contain any
+ 8-bit byte except NUL. The newline (NL) sequence is LF; servers
+ should also accept CR LF as a newline.
+
+7.3 EBCDIC/POSIX
+
+ For POSIX compatible operating systems using the EBCDIC character
+ set, the following are defined:
+
+ Meta-Variables
+ Meta-variables are passed to the script in identically named
+ environment variables. These are accessed by the C library
+ routine getenv().
+
+ The command line
+ This is accessed using the argc and argv arguments to main().
+ The words have any characters which are 'active' in the Bourne
+ shell escaped with a backslash.
+
+ The current working directory
+ The current working directory for the script SHOULD be set to the
+ directory containing the script.
+
+ Character set
+ The IBM1047 character set [19], excluding NUL, is used for the
+ definition of meta-variables, header fields, values, TEXT strings
+ and the PATH_TRANSLATED value. The newline (NL) sequence is LF;
+ servers should also accept CR LF as a newline.
+
+ media-type charset default
+ The default charset value for text (and other
+ implementation-defined) media types is IBM1047.
+
+8 Implementation
+
+8.1 Recommendations for Servers
+
+ Although the server and the CGI script need not be consistent in
+ their handling of URL paths (client URLs and the PATH_INFO data,
+ respectively), server authors may wish to impose consistency. So the
+ server implementation should specify its behaviour for the following
+ cases:
+
+ 1. define any restrictions on allowed path segments, in particular
+ whether non-terminal NULL segments are permitted;
+
+ 2. define the behaviour for "." or ".." path segments; i.e.
+ whether they are prohibited, treated as ordinary path segments
+ or interpreted in accordance with the relative URL
+ specification [3];
+
+ 3. define any limits of the implementation, including limits on
+ path or search string lengths, and limits on the volume of
+ header fields the server will parse.
+
+8.2 Recommendations for Scripts
+
+ If the script does not intend to process the PATH_INFO data, then it
+ should reject the request with 404 'Not Found' if PATH_INFO is not
+ NULL.
+
+ If the output of a form is being processed, check that CONTENT_TYPE
+ is "application/x-www-form-urlencoded" [15] or "multipart/form-data"
+ [13]. If CONTENT_TYPE is blank, the script can reject the request
+ with a 415 'Unsupported Media Type' error, where supported by the
+ protocol.
+
+ When parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME the script
+ should be careful of void path segments ("//") and special path
+ segments ("." and ".."). They should either be removed from the path
+ before use in OS system calls, or the request should be rejected with
+ 404 'Not Found'.
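+
+ The rejection approach can be sketched in Python as follows. The
+ function name safe_path_segments is hypothetical, and the sketch is
+ deliberately strict: it also rejects a trailing "/", since that
+ produces a void final segment. A caller would map the ValueError to
+ a 404 'Not Found' response.
+

```python
def safe_path_segments(path_info):
    """Split PATH_INFO into segments, rejecting void and special ones."""
    if path_info == "":
        return []  # NULL PATH_INFO: nothing to check
    if not path_info.startswith("/"):
        raise ValueError("PATH_INFO must start with '/'")
    segments = path_info[1:].split("/")
    for segment in segments:
        if segment in ("", ".", ".."):
            raise ValueError("void or special path segment")
    return segments


segments = safe_path_segments("/docs/report.txt")
```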
+
+ When returning header fields, the script should try to send the CGI
+ headers as soon as possible, and should send them before any HTTP
+ headers. This may help reduce the server's memory requirements.
+
+9 Security Considerations
+
+9.1 Safe Methods
+
+ As discussed in the security considerations of the HTTP
+ specifications [2], [8], the convention has been established that the
+ GET and HEAD methods should be 'safe' and 'idempotent' (repeated
+ requests have the same effect as a single request). See section 9.1
+ of RFC 2616 [8] for a full discussion.
+
+9.2 Header Fields Containing Sensitive Information
+
+ Some HTTP header fields may carry sensitive information which the
+ server should not pass on to the script unless explicitly configured
+ to do so. For example, if the server protects the script using the
+ Basic authentication scheme, then the client will send an
+ Authorization header field containing a username and password. The
+ server validates this information and so it should not pass on the
+ password via the HTTP_AUTHORIZATION meta-variable without careful
+ consideration. This also applies to the Proxy-Authorization header
+ field and the corresponding HTTP_PROXY_AUTHORIZATION meta-variable.
+
+9.3 Data Privacy
+
+ Confidential data in a request should be placed in a message-body as
+ part of a POST request, and not placed in the URI or message headers.
+ On some systems, the environment used to pass meta-variables to a
+ script may be visible to other scripts or users. In addition, many
+ existing servers, proxies and clients will permanently record the URI
+ where it might be visible to third parties.
+
+9.4 Information Security Model
+
+ For a client connection using TLS, the security model applies between
+ the client and the server, and not between the client and the script.
+ It is the server's responsibility to handle the TLS session, and thus
+ it is the server which is authenticated to the client, not the CGI
+ script.
+
+ This specification provides no mechanism for the script to
+ authenticate the server which invoked it. There is no enforced
+ integrity on the CGI request and response messages.
+
+9.5 Script Interference with the Server
+
+ The most common implementation of CGI invokes the script as a child
+ process using the same user and group as the server process. It
+ should therefore be ensured that the script cannot interfere with the
+ server process, its configuration, documents or log files.
+
+ If the script is executed by calling a function linked in to the
+ server software (either at compile-time or run-time) then precautions
+ should be taken to protect the core memory of the server, or to
+ ensure that untrusted code cannot be executed.
+
+9.6 Data Length and Buffering Considerations
+
+ This specification places no limits on the length of the message-body
+ presented to the script. The script should not assume that
+ statically allocated buffers of any size are sufficient to contain
+ the entire submission at one time. Use of a fixed length buffer
+ without careful overflow checking may result in an attacker
+ exploiting 'stack-smashing' or 'stack-overflow' vulnerabilities of
+ the operating system. The script may spool large submissions to disk
+ or other buffering media, but a rapid succession of large submissions
+ may result in denial of service conditions. If the CONTENT_LENGTH of
+ a message-body is larger than resource considerations allow, scripts
+ should respond with an error status appropriate for the protocol
+ version; potentially applicable status codes include 503 'Service
+ Unavailable' (HTTP/1.0 and HTTP/1.1), 413 'Request Entity Too Large'
+ (HTTP/1.1), and 414 'Request-URI Too Large' (HTTP/1.1).
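+
+ A script could enforce such a limit along the following lines. This
+ is a minimal sketch under stated assumptions: MAX_BODY_BYTES is an
+ arbitrary local limit (the specification sets none), read_body is a
+ hypothetical helper, and the use of 400 for a malformed
+ CONTENT_LENGTH is a choice made for the example.
+

```python
import io

MAX_BODY_BYTES = 1_000_000  # assumed local limit; not taken from the specification


def read_body(environ, stream, limit=MAX_BODY_BYTES, chunk_size=8192):
    """Return (status, body); status is None when the body was read."""
    raw = environ.get("CONTENT_LENGTH", "")
    try:
        length = int(raw) if raw else 0
    except ValueError:
        return "400 Bad Request", b""
    if length < 0:
        return "400 Bad Request", b""
    if length > limit:
        # Refuse before reading anything, rather than buffering the body.
        return "413 Request Entity Too Large", b""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # fewer bytes arrived than CONTENT_LENGTH announced
        chunks.append(chunk)
        remaining -= len(chunk)
    return None, b"".join(chunks)
```
+
+ Reading in bounded chunks and never past CONTENT_LENGTH avoids both
+ fixed-size buffers and unbounded reads.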
+
+ Similar considerations apply to the server's handling of the CGI
+ response from the script. There is no limit on the length of the
+ header or message-body returned by the script; the server should not
+ assume that statically allocated buffers of any size are sufficient
+ to contain the entire response.
+
+9.7 Stateless Processing
+
+ The stateless nature of the Web makes each script execution and
+ resource retrieval independent of all others even when multiple
+ requests constitute a single conceptual Web transaction. Because of
+ this, a script should not make any assumptions about the context of
+ the user-agent submitting a request. In particular, scripts should
+ examine data obtained from the client and verify that they are valid,
+ both in form and content, before allowing them to be used for
+ sensitive purposes such as input to other applications, commands, or
+ operating system services. These uses include (but are not limited
+ to) system call arguments, database writes, dynamically evaluated
+ source code, and input to billing or other secure processes. It is
+ important that applications be protected from invalid input
+ regardless of whether the invalidity is the result of user error,
+ logic error, or malicious action.
+
+ Authors of scripts involved in multi-request transactions should be
+ particularly cautious about validating the state information;
+ undesirable effects may result from the substitution of dangerous
+ values for portions of the submission which might otherwise be
+ presumed safe. Subversion of this type occurs when alterations are
+ made to data from a prior stage of the transaction that were not
+ meant to be controlled by the client (e.g., hidden HTML form
+ elements, cookies, embedded URLs, etc.).
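+
+ One possible way to detect such alterations, which this document
+ does not prescribe, is to attach a keyed hash (HMAC) to state that
+ is handed to the client and to verify it on the next request. In the
+ sketch below SECRET_KEY, sign_state and verify_state are hypothetical
+ names, and the "state.tag" layout is an assumption of the example.
+

```python
import hashlib
import hmac

# Assumption: a secret known only to the server side, never sent to the client.
SECRET_KEY = b"replace-with-a-server-side-secret"


def _tag(state: str) -> str:
    return hmac.new(SECRET_KEY, state.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_state(state: str) -> str:
    """Return the state with an HMAC-SHA256 tag appended after a dot."""
    return state + "." + _tag(state)


def verify_state(token: str):
    """Return the original state, or None if the token was altered."""
    state, sep, tag = token.rpartition(".")
    if not sep:
        return None  # no tag present at all
    # Compare as bytes in constant time; the tag is client-supplied.
    if hmac.compare_digest(tag.encode("utf-8"), _tag(state).encode("ascii")):
        return state
    return None
```
+
+ A valid tag shows only that the state was not modified; the script
+ should still check that the values are acceptable for their use.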
+
+9.8 Relative Paths
+
+ The server should be careful of ".." path segments in the request
+ URI. These should be removed or resolved in the request URI before
+ it is split into the script-path and extra-path. Alternatively, when
+ the extra-path is used to find the PATH_TRANSLATED, care should be
+ taken to prevent the path resolution from yielding translated paths
+ outside the expected path hierarchy.
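+
+ The second approach can be sketched as follows. This is an
+ illustration, not a required algorithm: translate_extra_path and its
+ document_root parameter are hypothetical, POSIX-style paths are
+ assumed, and symbolic links are not considered.
+

```python
import posixpath


def translate_extra_path(document_root, extra_path):
    """Return a translated path under document_root, or None if it escapes."""
    root = posixpath.normpath(document_root)
    # normpath resolves "." and ".." segments lexically.
    candidate = posixpath.normpath(posixpath.join(root, extra_path.lstrip("/")))
    if candidate == root or candidate.startswith(root.rstrip("/") + "/"):
        return candidate
    return None  # resolution would leave the expected hierarchy
```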
+
+9.9 Non-parsed Header Output
+
+ If a script returns a non-parsed header output, to be interpreted by
+ the client in its native protocol, then the script must address all
+ security considerations relating to that protocol.
+
+10 Acknowledgements
+
+ This work is based on the original CGI interface that arose out of
+ discussions on the 'www-talk' mailing list. In particular, Rob
+ McCool, John Franks, Ari Luotonen, George Phillips and Tony Sanders
+ deserve special recognition for their efforts in defining and
+ implementing the early versions of this interface.
+
+ This document has also greatly benefited from the comments and
+ suggestions made by Chris Adie, Dave Kristol and Mike Meyer; also David
+ Morris, Jeremy Madea, Patrick McManus, Adam Donahue, Ross Patterson
+ and Harald Alvestrand.
+
+11 References
+
+ [1] Berners-Lee, T., 'Universal Resource Identifiers in WWW: A
+ Unifying Syntax for the Expression of Names and Addresses of
+ Objects on the Network as used in the World-Wide Web', RFC 1630,
+ CERN, June 1994.
+
+ [2] Berners-Lee, T., Fielding, R. T. and Frystyk, H., 'Hypertext
+ Transfer Protocol -- HTTP/1.0', RFC 1945, MIT/LCS, UC Irvine,
+ May 1996.
+
+ [3] Berners-Lee, T., Fielding, R. and Masinter, L., 'Uniform
+ Resource Identifiers (URI): Generic Syntax', RFC 2396, MIT/LCS,
+ U.C. Irvine, Xerox Corporation, August 1998.
+
+ [4] Braden, R. (Editor), 'Requirements for Internet Hosts --
+ Application and Support', STD 3, RFC 1123, IETF, October 1989.
+
+ [5] Bradner, S., 'Key words for use in RFCs to Indicate Requirements
+ Levels', BCP 14, RFC 2119, Harvard University, March 1997.
+
+ [6] Crocker, D.H., 'Standard for the Format of ARPA Internet Text
+ Messages', STD 11, RFC 822, University of Delaware, August 1982.
+
+ [7] Dierks, T. and Allen, C., 'The TLS Protocol Version 1.0', RFC
+ 2246, Certicom, January 1999.
+
+ [8] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
+ Leach, P. and Berners-Lee, T., 'Hypertext Transfer Protocol --
+ HTTP/1.1', RFC 2616, UC Irvine, Compaq/W3C, Compaq, W3C/MIT,
+ Xerox, Microsoft, W3C/MIT, June 1999.
+
+ [9] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S.,
+ Leach, P., Luotonen, A. and Stewart, L., 'HTTP Authentication:
+ Basic and Digest Access Authentication', RFC 2617, Northwestern
+ University, Verisign Inc., AbiSource, Inc., Agranat Systems,
+ Inc., Microsoft Corporation, Netscape Communications
+ Corporation, Open Market, Inc., June 1999.
+
+ [10] Freed, N. and Borenstein, N., 'Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types', RFC 2046, Innosoft,
+ First Virtual, November 1996.
+
+ [11] Hinden, R., Carpenter, B. and Masinter, L., 'Format for Literal
+ IPv6 Addresses in URL's', RFC 2732, Nokia, IBM, AT&T, December
+ 1999.
+
+ [12] Hinden, R. and Deering, S., 'IP Version 6 Addressing
+ Architecture', RFC 2373, Nokia, Cisco Systems, July 1998.
+
+ [13] Masinter, L., 'Returning Values from Forms:
+ multipart/form-data', RFC 2388, Xerox Corporation, August 1998.
+
+ [14] Mockapetris, P., 'Domain Names - Concepts and Facilities', STD
+ 13, RFC 1034, ISI, November 1987.
+
+ [15] Raggett, D., Le Hors, A. and Jacobs, I. (eds), 'HTML 4.01
+ Specification', W3C Recommendation December 1999,
+ http://www.w3.org/TR/html401/.
+
+ [16] Rescorla, E., 'HTTP Over TLS', RFC 2818, RTFM, May 2000.
+
+
+Robinson & Coar Expires 18 April 2004 [Page 33]
+\f
+INTERNET-DRAFT Common Gateway Interface -- 1.1 19 October 2003
+
+
+ [17] St. Johns, M., 'Identification Protocol', RFC 1413, US
+ Department of Defense, February 1993.
+
+ [18] 'HTTP Status Code Registry',
+ http://www.iana.org/assignments/http-status-codes, IANA.
+
+ [19] IBM National Language Support Reference Manual Volume 2,
+ SE09-8002-01, March 1990.
+
+ [20] 'Information Systems -- Coded Character Sets -- 7-bit American
+ Standard Code for Information Interchange (7-Bit ASCII)', ANSI
+ INCITS.4-1986 (R2002).
+
+ [21] 'Information technology -- 8-bit single-byte coded graphic
+ character sets -- Part 1: Latin alphabet No. 1', ISO/IEC
+ 8859-1:1998.
+
+ [22] 'The Common Gateway Interface',
+ http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois.
+
+
+12 Authors' Addresses
+
+ David Robinson
+ Apache Software Foundation
+ Email: drtr@apache.org
+
+ Ken A. L. Coar
+ MeepZor Consulting
+ 7824 Mayfaire Crest Lane, Suite 202
+ Raleigh, NC 27615-4875
+ USA
+ Tel: +1 (919) 254 4237
+ Fax: +1 (919) 254 5420
+ Email: Ken.Coar@Golux.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Robinson & Coar Expires 18 April 2004 [Page 34]
+\f
--- /dev/null
+
+
+
+
+
+
+Network Working Group J. Elson
+Request for Comments: 3507 A. Cerpa
+Category: Informational UCLA
+ April 2003
+
+
+ Internet Content Adaptation Protocol (ICAP)
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2003). All Rights Reserved.
+
+IESG Note
+
+ The Open Pluggable Services (OPES) working group has been chartered
+ to produce a standards track protocol specification for a protocol
+ intended to perform the same functions as ICAP. However, since
+ ICAP is already in widespread use, the IESG believes it is appropriate
+ to document existing usage by publishing the ICAP specification as an
+ informational document. The IESG also notes that ICAP was developed
+ before the publication of RFC 3238 and therefore does not address the
+ architectural and policy issues described in that document.
+
+Abstract
+
+ ICAP, the Internet Content Adaptation Protocol, is a protocol aimed at
+ providing simple object-based content vectoring for HTTP services.
+ ICAP is, in essence, a lightweight protocol for executing a "remote
+ procedure call" on HTTP messages. It allows ICAP clients to pass
+ HTTP messages to ICAP servers for some sort of transformation or
+ other processing ("adaptation"). The server executes its
+ transformation service on messages and sends back responses to the
+ client, usually with modified messages. Typically, the adapted
+ messages are either HTTP requests or HTTP responses.
+
+
+
+
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 1]
+\f
+RFC 3507 ICAP April 2003
+
+
+Table of Contents
+
+ 1. Introduction............................................3
+ 2. Terminology.............................................5
+ 3. ICAP Overall Operation..................................8
+ 3.1 Request Modification..............................8
+ 3.2 Response Modification............................10
+ 4. Protocol Semantics.....................................11
+ 4.1 General Operation................................11
+ 4.2 ICAP URIs........................................11
+ 4.3 ICAP Headers.....................................12
+ 4.3.1 Headers Common to Requests and
+ Responses................................12
+ 4.3.2 Request Headers..........................13
+ 4.3.3 Response Headers.........................14
+ 4.3.4 ICAP-Related Headers in HTTP
+ Messages.................................15
+ 4.4 ICAP Bodies: Encapsulation of HTTP
+ Messages.........................................16
+ 4.4.1 Expected Encapsulated Sections...........16
+ 4.4.2 Encapsulated HTTP Headers................18
+ 4.5 Message Preview..................................18
+ 4.6 "204 No Content" Responses outside of
+ Previews.........................................22
+ 4.7 ISTag Response Header............................22
+ 4.8 Request Modification Mode........................23
+ 4.8.1 Request..................................23
+ 4.8.2 Response.................................24
+ 4.8.3 Examples.................................24
+ 4.9 Response Modification Mode.......................27
+ 4.9.1 Request..................................27
+ 4.9.2 Response.................................27
+ 4.9.3 Examples.................................28
+ 4.10 OPTIONS Method...................................29
+ 4.10.1 OPTIONS request..........................29
+ 4.10.2 OPTIONS response.........................30
+ 4.10.3 OPTIONS examples.........................33
+ 5. Caching................................................33
+ 6. Implementation Notes...................................34
+ 6.1 Vectoring Points.................................34
+ 6.2 Application Level Errors.........................35
+ 6.3 Use of Chunked Transfer-Encoding.................37
+ 6.4 Distinct URIs for Distinct Services..............37
+ 7. Security Considerations................................37
+ 7.1 Authentication...................................37
+ 7.2 Encryption.......................................38
+ 7.3 Service Validation...............................38
+ 8. Motivations and Design Alternatives....................39
+
+
+
+Elson & Cerpa Informational [Page 2]
+\f
+RFC 3507 ICAP April 2003
+
+
+ 8.1 To Be HTTP, or Not to Be.........................39
+ 8.2 Mandatory Use of Chunking........................39
+ 8.3 Use of the null-body directive in the
+ Encapsulated header..............................40
+ 9. References.............................................40
+ 10. Contributors...........................................41
+ Appendix A BNF Grammar for ICAP Messages..................45
+ Authors' Addresses..........................................48
+ Full Copyright Statement....................................49
+
+1. Introduction
+
+ As the Internet grows, so does the need for scalable Internet
+ services. Popular web servers are asked to deliver content to
+ hundreds of millions of users connected at ever-increasing
+ bandwidths. The model of centralized, monolithic servers that are
+ responsible for all aspects of every client's request seems to be
+ reaching the end of its useful life.
+
+ To keep up with the growth in the number of clients, there has been a
+ move towards architectures that scale better through the use of
+ replication, distribution, and caching. On the content provider
+ side, replication and load-balancing techniques allow the burden of
+ client requests to be spread out over a myriad of servers. Content
+ providers have also begun to deploy geographically diverse content
+ distribution networks that bring origin-servers closer to the "edge"
+ of the network where clients are attached. These networks of
+ distributed origin-servers or "surrogates" allow the content provider
+ to distribute their content whilst retaining control over the
+ integrity of that content. The distributed nature of this type of
+ deployment and the proximity of a given surrogate to the end-user
+ enables the content provider to offer additional services to a user
+ which might be based, for example, on geography where this would have
+ been difficult with a single, centralized service.
+
+ ICAP, the Internet Content Adaptation Protocol, is a protocol aimed at
+ providing simple object-based content vectoring for HTTP services.
+ ICAP is, in essence, a lightweight protocol for executing a "remote
+ procedure call" on HTTP messages. It allows ICAP clients to pass
+ HTTP messages to ICAP servers for some sort of transformation or
+ other processing ("adaptation"). The server executes its
+ transformation service on messages and sends back responses to the
+ client, usually with modified messages. The adapted messages may be
+ either HTTP requests or HTTP responses. Though transformations may
+ be possible on other non-HTTP content, they are beyond the scope of
+ this document.
+
+
+
+
+
+Elson & Cerpa Informational [Page 3]
+\f
+RFC 3507 ICAP April 2003
+
+
+ This type of Remote Procedure Call (RPC) is useful in a number of
+ ways. For example:
+
+ o Simple transformations of content can be performed near the edge
+ of the network instead of requiring an updated copy of an object
+ from an origin server. For example, a content provider might want
+ to provide a popular web page with a different advertisement every
+ time the page is viewed. Currently, content providers implement
+ this policy by marking such pages as non-cachable and tracking
+ user cookies. This imposes additional load on the origin server
+ and the network. In our architecture, the page could be cached
+ once near the edges of the network. These edge caches can then
+ use an ICAP call to a nearby ad-insertion server every time the
+ page is served to a client.
+
+ Other such transformations by edge servers are possible, either
+ with cooperation from the content provider (as in a content
+ distribution network), or as a value-added service provided by a
+ client's network provider (as in a surrogate). Examples of these
+ kinds of transformations are translation of web pages to different
+ human languages or to different formats that are appropriate for
+ special physical devices (e.g., PDA-based or cell-phone-based
+ browsers).
+
+ o Surrogates or origin servers can avoid performing expensive
+ operations by shipping the work off to other servers instead.
+ This helps distribute load across multiple machines. For example,
+ consider a user attempting to download an executable program via a
+ surrogate (e.g., a caching proxy). The surrogate, acting as an
+ ICAP client, can ask an external server to check the executable
+ for viruses before accepting it into its cache.
+
+ o Firewalls or surrogates can act as ICAP clients and send outgoing
+ requests to a service that checks to make sure the URI in the
+ request is allowed (for example, in a system that allows parental
+ control of web content viewed by children). In this case, it is a
+ *request* that is being adapted, not an object returned by a
+ response.
+
+ In all of these examples, ICAP is helping to reduce or distribute the
+ load on origin servers, surrogates, or the network itself. In some
+ cases, ICAP facilitates transformations near the edge of the network,
+ allowing greater cachability of the underlying content. In other
+ examples, devices such as origin servers or surrogates are able to
+ reduce their load by distributing expensive operations onto other
+ machines. In all cases, ICAP has also created a standard interface
+ for content adaptation to allow greater flexibility in content
+ distribution or the addition of value added services in surrogates.
+
+
+
+Elson & Cerpa Informational [Page 4]
+\f
+RFC 3507 ICAP April 2003
+
+
+ There are two major components in our architecture:
+
+ 1. Transaction semantics -- "How do I ask for adaptation?"
+
+ 2. Control of policy -- "When am I supposed to ask for adaptation,
+ what kind of adaptation do I ask for, and from where?"
+
+ Currently, ICAP defines only the transaction semantics. For example,
+ this document specifies how to send an HTTP message from an ICAP
+ client to an ICAP server, specify the URI of the ICAP resource
+ requested along with other resource-specific parameters, and receive
+ the adapted message.
+
+ Although a necessary building-block, this wire-protocol defined by
+ ICAP is of limited use without the second part: an accompanying
+ application framework in which it operates. The more difficult
+ policy issue is beyond the scope of the current ICAP protocol, but is
+ planned in future work.
+
+ In initial implementations, we expect that implementation-specific
+ manual configuration will be used to define policy. This includes
+ the rules for recognizing messages that require adaptation, the URIs
+ of available adaptation resources, and so on. For ICAP clients and
+ servers to interoperate, the exact method used to define policy need
+ not be consistent across implementations, as long as the policy
+ itself is consistent.
+
+ IMPORTANT:
+ Note that at this time, in the absence of a policy-framework, it
+ is strongly RECOMMENDED that transformations only be
+ performed on messages with the explicit consent of either the
+ content-provider or the user (or both). Deployment of
+ transformation services without the consent of either leads to, at
+ best, unpredictable results. For more discussion of these issues,
+ see Section 7.
+
+ Once the full extent of the typical policy decisions is more fully
+ understood through experience with these initial implementations,
+ later follow-ons to this architecture may define an additional policy
+ control protocol. This future protocol may allow a standard policy
+ definition interface complementary to the ICAP transaction interface
+ defined here.
+
+2. Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in BCP 14, RFC 2119 [2].
+
+
+
+Elson & Cerpa Informational [Page 5]
+\f
+RFC 3507 ICAP April 2003
+
+
+ The special terminology used in this document is defined below. The
+ majority of these terms are taken as-is from HTTP/1.1 [4] and are
+ reproduced here for reference. A thorough understanding of HTTP/1.1
+ is assumed on the part of the reader.
+
+ connection:
+ A transport layer virtual circuit established between two programs
+ for the purpose of communication.
+
+ message:
+ The basic unit of HTTP communication, consisting of a structured
+ sequence of octets matching the syntax defined in Section 4 of
+ HTTP/1.1 [4] and transmitted via the connection.
+
+ request:
+ An HTTP request message, as defined in Section 5 of HTTP/1.1 [4].
+
+ response:
+ An HTTP response message, as defined in Section 6 of HTTP/1.1 [4].
+
+ resource:
+ A network data object or service that can be identified by a URI,
+ as defined in Section 3.2 of HTTP/1.1 [4]. Resources may be
+ available in multiple representations (e.g., multiple languages,
+ data formats, size, resolutions) or vary in other ways.
+
+ client:
+ A program that establishes connections for the purpose of sending
+ requests.
+
+ server:
+ An application program that accepts connections in order to
+ service requests by sending back responses. Any given program may
+ be capable of being both a client and a server; our use of these
+ terms refers only to the role being performed by the program for a
+ particular connection, rather than to the program's capabilities
+ in general. Likewise, any server may act as an origin server,
+ surrogate, gateway, or tunnel, switching behavior based on the
+ nature of each request.
+
+ origin server:
+ The server on which a given resource resides or is to be created.
+
+
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 6]
+\f
+RFC 3507 ICAP April 2003
+
+
+ proxy:
+ An intermediary program which acts as both a server and a client
+ for the purpose of making requests on behalf of other clients.
+ Requests are serviced internally or by passing them on, with
+ possible translation, to other servers. A proxy MUST implement
+ both the client and server requirements of this specification.
+
+ cache:
+ A program's local store of response messages and the subsystem
+ that controls its message storage, retrieval, and deletion. A
+ cache stores cachable responses in order to reduce the response
+ time and network bandwidth consumption on future, equivalent
+ requests. Any client or server may include a cache, though a
+ cache cannot be used by a server that is acting as a tunnel.
+
+ cachable:
+ A response is cachable if a cache is allowed to store a copy of
+ the response message for use in answering subsequent requests.
+ The rules for determining the cachability of HTTP responses are
+ defined in Section 13 of [4]. Even if a resource is cachable,
+ there may be additional constraints on whether a cache can use the
+ cached copy for a particular request.
+
+ surrogate:
+ A gateway co-located with an origin server, or at a different
+ point in the network, delegated the authority to operate on behalf
+ of, and typically working in close co-operation with, one or more
+ origin servers. Responses are typically delivered from an
+ internal cache. Surrogates may derive cache entries from the
+ origin server or from another of the origin server's delegates.
+ In some cases a surrogate may tunnel such requests.
+
+ Where close co-operation between origin servers and surrogates
+ exists, this enables modifications of some protocol requirements,
+ including the Cache-Control directives in [4]. Such modifications
+ have yet to be fully specified.
+
+ Devices commonly known as "reverse proxies" and "(origin) server
+ accelerators" are both more properly defined as surrogates.
+
+ New definitions:
+
+ ICAP resource:
+ Similar to an HTTP resource as described above, but the URI refers
+ to an ICAP service that performs adaptations of HTTP messages.
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 7]
+\f
+RFC 3507 ICAP April 2003
+
+
+ ICAP server:
+ Similar to an HTTP server as described above, except that the
+ application services ICAP requests.
+
+ ICAP client:
+ A program that establishes connections to ICAP servers for the
+ purpose of sending requests. An ICAP client is often, but not
+ always, a surrogate acting on behalf of a user.
+
+3. ICAP Overall Operation
+
+ Before describing ICAP's semantics in detail, we will first give a
+ general overview of the protocol's major functions and expected uses.
+ As described earlier, ICAP focuses on modification of HTTP requests
+ (Section 3.1), and modification of HTTP responses (Section 3.2).
+
+3.1 Request Modification
+
+ In "request modification" (reqmod) mode, an ICAP client sends an HTTP
+ request to an ICAP server. The ICAP server may then:
+
+ 1) Send back a modified version of the request. The ICAP client may
+ then perform the modified request by contacting an origin server;
+ or, pipeline the modified request to another ICAP server for
+ further modification.
+
+ 2) Send back an HTTP response to the request. This is used to
+ provide information useful to the user in case of an error (e.g.,
+ "you sent a request to view a page you are not allowed to see").
+
+ 3) Return an error.
+
+ ICAP clients MUST be able to handle all three types of responses.
+ However, in line with the guidance provided for HTTP surrogates in
+ Section 13.8 of [4], ICAP client implementors do have flexibility in
+ handling errors. If the ICAP server returns an error, the ICAP
+ client may (for example) return the error to the user, execute the
+ unadapted request as it arrived from the client, or retry the
+ adaptation.
+
+ We will illustrate this method with an example application: content
+ filtering. Consider a surrogate that receives a request from a
+ client for a web page on an origin server. The surrogate, acting as
+ an ICAP client, sends the client's request to an ICAP server that
+ performs URI-based content filtering. If access to the requested URI
+ is allowed, the request is returned to the ICAP client unmodified.
+ However, if the ICAP server chooses to disallow access to the
+ requested resources, it may either:
+
+ 1) Modify the request so that it points to a page containing an error
+ message instead of the original URI.
+
+ 2) Return an encapsulated HTTP response that indicates an HTTP error.
+
+ This method can be used for a variety of other applications; for
+ example, anonymization, modification of the Accept: headers to handle
+ special device requirements, and so forth.
+
+ Typical data flow:
+
+      origin-server
+          | /|\
+          |  |
+       5  |  |  4
+          |  |
+         \|/ |              2
+      ICAP-client    -------------->   ICAP-resource
+      (surrogate)    <--------------   on ICAP-server
+          | /|\             3
+          |  |
+       6  |  |  1
+          |  |
+         \|/ |
+         client
+
+ 1. A client makes a request to an ICAP-capable surrogate (ICAP client)
+ for an object on an origin server.
+
+ 2. The surrogate sends the request to the ICAP server.
+
+ 3. The ICAP server executes the ICAP resource's service on the
+ request and sends the possibly modified request, or a response to
+ the request, back to the ICAP client.
+
+ If Step 3 returned a request:
+
+ 4. The surrogate sends the request, possibly different from the
+ original client request, to the origin server.
+
+ 5. The origin server responds to the request.
+
+ 6. The surrogate sends the reply (from either the ICAP server or the
+ origin server) to the client.
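+
+ The steps above can be sketched from the surrogate's point of view.
+ The code is illustrative only: HttpRequest, HttpResponse,
+ handle_reqmod and filter_service are hypothetical names, the ICAP
+ exchange is modelled as a plain function call rather than a network
+ transaction, and the error case is not handled.
+

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HttpRequest:
    uri: str


@dataclass
class HttpResponse:
    status: int
    body: str


def handle_reqmod(request, icap_reqmod: Callable, origin_fetch: Callable):
    """Steps 2 to 6: consult the ICAP service, then forward or answer."""
    result = icap_reqmod(request)          # steps 2 and 3
    if isinstance(result, HttpResponse):   # the ICAP server answered directly
        return result                      # step 6
    return origin_fetch(result)            # steps 4 and 5, then step 6


def filter_service(request):
    """Toy URI-based content filter standing in for an ICAP resource."""
    if "blocked" in request.uri:
        return HttpResponse(403, "access denied")
    return request  # allowed: return the request unmodified
```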
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 9]
+\f
+RFC 3507 ICAP April 2003
+
+
+3.2 Response Modification
+
+ In the "response modification" (respmod) mode, an ICAP client sends
+ an HTTP response to an ICAP server. (The response sent by the ICAP
+ client typically has been generated by an origin server.) The ICAP
+ server may then:
+
+ 1) Send back a modified version of the response.
+
+ 2) Return an error.
+
+ The response modification method is intended for post-processing
+ performed on an HTTP response before it is delivered to a client.
+ Examples include formatting HTML for display on special devices,
+ human language translation, virus checking, and so forth.
+
+ Typical data flow:
+
+      origin-server
+          | /|\
+          |  |
+       3  |  |  2
+          |  |
+         \|/ |              4
+      ICAP-client    -------------->   ICAP-resource
+      (surrogate)    <--------------   on ICAP-server
+          | /|\             5
+          |  |
+       6  |  |  1
+          |  |
+         \|/ |
+         client
+
+ 1. A client makes a request to an ICAP-capable surrogate (ICAP client)
+ for an object on an origin server.
+
+ 2. The surrogate sends the request to the origin server.
+
+ 3. The origin server responds to the request.
+
+ 4. The ICAP-capable surrogate sends the origin server's reply to the
+ ICAP server.
+
+ 5. The ICAP server executes the ICAP resource's service on the origin
+ server's reply and sends the possibly modified reply back to the
+ ICAP client.
+
+ 6. The surrogate sends the reply, possibly modified from the original
+ origin server's reply, to the client.
+
+4. Protocol Semantics
+
+4.1 General Operation
+
+ ICAP is a request/response protocol similar in semantics and usage to
+ HTTP/1.1 [4]. Despite the similarity, ICAP is not HTTP, nor is it an
+ application protocol that runs over HTTP. This means, for example,
+ that ICAP messages cannot be forwarded by HTTP surrogates. Our
+ reasons for not building directly on top of HTTP are discussed in
+ Section 8.1.
+
+ ICAP uses TCP/IP as a transport protocol. The default port is 1344,
+ but other ports may be used. The TCP flow is initiated by the ICAP
+ client to a passively listening ICAP server.
+
+ ICAP messages consist of requests from client to server and responses
+ from server to client. Requests and responses use the generic
+ message format of RFC 2822 [3] -- that is, a start-line (either a
+ request line or a status line), a number of header fields (also known
+ as "headers"), an empty line (i.e., a line with nothing preceding the
+ CRLF) indicating the end of the header fields, and a message-body.
+
+ The header lines of an ICAP message specify the ICAP resource being
+ requested as well as other meta-data such as cache control
+ information. The message body of an ICAP request contains the
+ (encapsulated) HTTP messages that are being modified.
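+
+ The generic message layout described above can be split apart as in
+ the following sketch. It is a simplification under stated
+ assumptions: parse_icap_head is a hypothetical helper, folded header
+ lines and repeated fields are not handled, and the body is returned
+ undecoded.
+

```python
def parse_icap_head(raw: bytes):
    """Split an ICAP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) ends the header fields.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line terminating the header fields")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed header field: %r" % line)
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```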
+
+ As in HTTP/1.1, a single transport connection MAY (perhaps even
+ SHOULD) be re-used for multiple request/response pairs. The rules
+ for doing so in ICAP are the same as described in Section 8.1.2.2 of
+ [4]. Specifically, requests are matched up with responses by
+ allowing only one outstanding request on a transport connection at a
+ time. Multiple parallel connections MAY be used as in HTTP.
+
+4.2 ICAP URIs
+
+ All ICAP requests specify the ICAP resource being requested from the
+ server using an ICAP URI. This MUST be an absolute URI that
+ specifies both the complete hostname and the path of the resource
+ being requested. For definitive information on URI syntax and
+ semantics, see "Uniform Resource Identifiers (URI): Generic Syntax
+ and Semantics," RFC 2396 [1], Section 3. The URI structure defined
+ by ICAP is roughly:
+
+ ICAP_URI = Scheme ":" Net_Path [ "?" Query ]
+
+ Scheme = "icap"
+
+ Net_Path = "//" Authority [ Abs_Path ]
+
+ Authority = [ userinfo "@" ] host [ ":" port ]
+
+ ICAP adds the new scheme "icap" to the ones defined in RFC 2396. If
+ the port is empty or not given, port 1344 is assumed. An example
+ ICAP URI line might look like this:
+
+ icap://icap.example.net:2000/services/icap-service-1
+
+ An ICAP server MUST be able to recognize all of its host names,
+ including any aliases, local variations, and numeric IP addresses of
+ its interfaces.
+
+ Any arguments that an ICAP client wishes to pass to an ICAP service
+ to modify the nature of the service MAY be passed as part of the
+ ICAP-URI, using the standard "?"-encoding of attribute-value pairs
+ used in HTTP. For example:
+
+ icap://icap.net/service?mode=translate&lang=french
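+
+ As an illustration only, the URI rules above can be sketched in
+ Python. The function name parse_icap_uri and the shape of its return
+ value are assumptions made for this sketch; they are not defined by
+ ICAP. The sketch relies on the standard urllib.parse module and
+ applies the default port of 1344 when no port is given.
+
```python
from urllib.parse import parse_qsl, urlsplit

DEFAULT_ICAP_PORT = 1344  # assumed when the port is empty or not given


def parse_icap_uri(uri):
    """Split an absolute ICAP URI into (host, port, path, args).

    Hypothetical helper for illustration; not part of the ICAP spec.
    """
    parts = urlsplit(uri)
    if parts.scheme != "icap":
        raise ValueError("not an ICAP URI: %r" % uri)
    if not parts.hostname:
        raise ValueError("an ICAP URI must be absolute and name a host")
    port = parts.port if parts.port is not None else DEFAULT_ICAP_PORT
    # Arguments use the "?"-encoding of attribute-value pairs.
    args = dict(parse_qsl(parts.query))
    return parts.hostname, port, parts.path, args
```
+
+ Applied to the second example above, this sketch would yield the
+ host "icap.net", port 1344, path "/service", and the arguments
+ mode=translate and lang=french.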
+
+4.3 ICAP Headers
+
+ The following sections define the valid headers for ICAP messages.
+ Section 4.3.1 describes headers common to both requests and
+ responses. Request-specific and response-specific headers are
+ described in Sections 4.3.2 and 4.3.3, respectively.
+
+ User-defined header extensions are allowed. In compliance with the
+ precedent established by the Internet mail format [3] and later
+ adopted by HTTP [4], all user-defined headers MUST follow the "X-"
+ naming convention ("X-Extension-Header: Foo"). ICAP implementations
+ MAY ignore any "X-" headers without loss of compliance with the
+ protocol as defined in this document.
+
+ Each header field consists of a name followed by a colon (":") and
+ the field value. Field names are case-insensitive. ICAP follows the
+ rules described in Section 4.2 of [4].
+
+4.3.1 Headers Common to Requests and Responses
+
+ The headers of all ICAP messages MAY include the following
+ directives, defined in ICAP the same as they are in HTTP:
+
+ Cache-Control
+ Connection
+ Date
+ Expires
+ Pragma
+ Trailer
+ Upgrade
+
+ Note in particular that the "Transfer-Encoding" option is not
+ allowed. The special transfer-encoding requirements of ICAP bodies
+ are described in Section 4.4.
+
+ The Upgrade header MAY be used to negotiate Transport-Layer Security
+ on an ICAP connection, exactly as described for HTTP/1.1 in [4].
+
+ The ICAP-specific headers defined are:
+
+ Encapsulated (See Section 4.4)
+
+4.3.2 Request Headers
+
+ Similar to HTTP, ICAP requests MUST start with a request line that
+ contains a method, the complete URI of the ICAP resource being
+ requested, and an ICAP version string. The current version number of
+ ICAP is "1.0".
+
+ This version of ICAP defines three methods:
+
+ REQMOD - for Request Modification (Section 4.8)
+ RESPMOD - for Response Modification (Section 4.9)
+ OPTIONS - to learn about configuration (Section 4.10)
+
+ The OPTIONS method MUST be implemented by all ICAP servers. All
+ other methods are optional and MAY be implemented.
+
+ User-defined extension methods are allowed. Before attempting to use
+ an extension method, an ICAP client SHOULD use the OPTIONS method to
+ query the ICAP server's list of supported methods; see Section 4.10.
+ (If an ICAP server receives a request for an unknown method, it MUST
+ give a 501 error response as described in the next section.)
+
+ Given the URI rules described in Section 4.2, a well-formed ICAP
+ request line looks like the following example:
+
+ RESPMOD icap://icap.example.net/translate?mode=french ICAP/1.0
+
+ A number of request-specific headers are allowed in ICAP requests,
+ following the same semantics as the corresponding HTTP request
+ headers (Section 5.3 of [4]). These are:
+
+ Authorization
+ Allow (see Section 4.6)
+ From (see Section 14.22 of [4])
+ Host (REQUIRED in ICAP as it is in HTTP/1.1)
+ Referer (see Section 14.36 of [4])
+ User-Agent
+
+ In addition to the HTTP-like headers, ICAP defines a request header
+ of its own:
+
+ Preview (see Section 4.5)
+
+4.3.3 Response Headers
+
+ ICAP responses MUST start with an ICAP status line, similar in form
+ to that used by HTTP, including the ICAP version and a status code.
+ For example:
+
+ ICAP/1.0 200 OK
+
+ The semantics of ICAP status codes match those of the status codes
+ defined by HTTP (Sections 6.1.1 and 10 of [4]), except where
+ otherwise indicated in this document; n.b. 100 (Section 4.5) and 204
+ (Section 4.6).
+
+ ICAP status codes that differ from their HTTP counterparts are:
+
+ 100 - Continue after ICAP Preview (Section 4.5).
+
+ 204 - No modifications needed (Section 4.6).
+
+ 400 - Bad request.
+
+ 404 - ICAP Service not found.
+
+ 405 - Method not allowed for service (e.g., RESPMOD requested for
+ service that supports only REQMOD).
+
+ 408 - Request timeout. ICAP server gave up waiting for a request
+ from an ICAP client.
+
+ 500 - Server error. Error on the ICAP server, such as "out of disk
+ space".
+
+ 501 - Method not implemented. This response is illegal for an
+ OPTIONS request since implementation of OPTIONS is mandatory.
+
+ 502 - Bad Gateway. This is an ICAP proxy and proxying produced an
+ error.
+
+ 503 - Service overloaded. The ICAP server has exceeded a maximum
+ connection limit associated with this service; the ICAP client
+ should not exceed this limit in the future.
+
+ 505 - ICAP version not supported by server.
+
+ As in HTTP, the 4xx class of status codes indicates client errors,
+ and the 5xx class indicates server errors.
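+
+ A minimal sketch of how a client might read an ICAP status line and
+ classify the code follows. The names parse_status_line and
+ status_class are hypothetical and chosen only for this example; the
+ classification covers just the 4xx and 5xx classes described above.
+
```python
def parse_status_line(line):
    """Return (version, code, reason) from a line like 'ICAP/1.0 200 OK'."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("ICAP/"):
        raise ValueError("malformed ICAP status line: %r" % line)
    version = parts[0][len("ICAP/"):]
    code = int(parts[1])  # raises ValueError if the code is not numeric
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


def status_class(code):
    """Classify a status code by its leading digit."""
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"
```
+
+ Under these assumptions, "ICAP/1.0 200 OK" parses to version "1.0",
+ code 200, and reason "OK"; 404 is classed as a client error and 503
+ as a server error.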
+
+ ICAP's response-header fields allow the server to pass additional
+ information in the response that cannot be placed in the ICAP
+ status line.
+
+ A response-specific header is allowed in ICAP responses, following
+ the same semantics as the corresponding HTTP response header
+ (Section 6.2 of [4]). This is:
+
+ Server (see Section 14.38 of [4])
+
+ In addition to the HTTP-like header, ICAP defines a response header
+ of its own:
+
+ ISTag (see Section 4.7)
+
+4.3.4 ICAP-Related Headers in HTTP Messages
+
+ When an ICAP-enabled HTTP surrogate makes an HTTP request to an
+ origin server, it is often useful to advise the origin server of the
+ surrogate's ICAP capabilities. Origin servers can use this
+ information to modify their responses accordingly. For example, an
+ origin server may choose not to insert an advertisement into a page
+ if it knows that a downstream ICAP server can insert the ad instead.
+
+ Although this ICAP specification can not mandate how HTTP is used in
+ communication between HTTP clients and servers, we do suggest a
+ convention: such headers (if used) SHOULD start with "X-ICAP". HTTP
+ clients with ICAP services SHOULD minimally include an
+ "X-ICAP-Version: 1.0" header along with their application-specific
+ headers.
+
+4.4 ICAP Bodies: Encapsulation of HTTP Messages
+
+ The ICAP encapsulation model is a lightweight means of packaging any
+ number of HTTP message sections into an encapsulating ICAP message-
+ body, in order to allow the vectoring of requests, responses, and
+ request/response pairs to an ICAP server.
+
+ This is accomplished by concatenating interesting message parts
+ (encapsulatED sections) into a single ICAP message-body (the
+ encapsulatING message). The encapsulated sections may be the headers
+ or bodies of HTTP messages.
+
+ Encapsulated bodies MUST be transferred using the "chunked"
+ transfer-coding described in Section 3.6.1 of [4]. However,
+ encapsulated headers MUST NOT be chunked. In other words, an ICAP
+ message-body switches from being non-chunked to chunked as the body
+ passes from the encapsulated header to encapsulated body section.
+ (See the examples in Sections 4.8.3 and 4.9.3.) The motivation behind
+ this decision is described in Section 8.2.
+
+4.4.1 The "Encapsulated" Header
+
+ The offset of each encapsulated section's start relative to the start
+ of the encapsulating message's body is noted using the "Encapsulated"
+ header. This header MUST be included in every ICAP message. For
+ example, the header
+
+ Encapsulated: req-hdr=0, res-hdr=45, res-body=100
+
+ indicates a message that encapsulates a group of request headers, a
+ group of response headers, and then a response body. Each of these
+ is included at the byte-offsets listed. The byte-offsets are in
+ decimal notation for consistency with HTTP's Content-Length header.
+
+ The special entity "null-body" indicates there is no encapsulated
+ body in the ICAP message.
+
+ The syntax of an Encapsulated header is:
+
+ encapsulated_header: "Encapsulated: " encapsulated_list
+ encapsulated_list: encapsulated_entity |
+ encapsulated_entity ", " encapsulated_list
+ encapsulated_entity: reqhdr | reshdr | reqbody | resbody | optbody
+ reqhdr = "req-hdr" "=" (decimal integer)
+ reshdr = "res-hdr" "=" (decimal integer)
+ reqbody = { "req-body" | "null-body" } "=" (decimal integer)
+ resbody = { "res-body" | "null-body" } "=" (decimal integer)
+ optbody = { "opt-body" | "null-body" } "=" (decimal integer)
+
+ There are semantic restrictions on Encapsulated headers beyond the
+ syntactic restrictions. The order in which the encapsulated parts
+ appear in the encapsulating message-body MUST be the same as the
+ order in which the parts are named in the Encapsulated header. In
+ other words, the offsets listed in the Encapsulated line MUST be
+ monotonically increasing. In addition, the legal forms of the
+ Encapsulated header depend on the method being used (REQMOD, RESPMOD,
+ or OPTIONS). Specifically:
+
+ REQMOD request encapsulated_list: [reqhdr] reqbody
+ REQMOD response encapsulated_list: {[reqhdr] reqbody} |
+ {[reshdr] resbody}
+ RESPMOD request encapsulated_list: [reqhdr] [reshdr] resbody
+ RESPMOD response encapsulated_list: [reshdr] resbody
+ OPTIONS response encapsulated_list: optbody
+
+ In the above grammar, note that encapsulated headers are always
+ optional. At most one body per encapsulated message is allowed. If
+ no encapsulated body is present, the "null-body" entity is used
+ instead; this is useful because its offset indicates the length of
+ the header section.
+
+ Examples of legal Encapsulated headers:
+
+ /* REQMOD request: This encapsulated HTTP request's headers start
+ * at offset 0; the HTTP request body (e.g., in a POST) starts
+ * at 412. */
+ Encapsulated: req-hdr=0, req-body=412
+
+ /* REQMOD request: Similar to the above, but no request body is
+ * present (e.g., a GET). We use the null-body directive instead.
+ * In both this case and the previous one, we can tell from the
+ * Encapsulated header that the request headers were 412 bytes
+ * long. */
+ Encapsulated: req-hdr=0, null-body=412
+
+ /* REQMOD response: ICAP server returned a modified request,
+ * with body */
+ Encapsulated: req-hdr=0, req-body=512
+
+ /* RESPMOD request: Request headers at 0, response headers at 822,
+ * response body at 1655. Note that no request body is allowed in
+ * RESPMOD requests. */
+ Encapsulated: req-hdr=0, res-hdr=822, res-body=1655
+
+ /* RESPMOD or REQMOD response: header and body returned */
+ Encapsulated: res-hdr=0, res-body=749
+
+ /* OPTIONS response when there IS an options body */
+ Encapsulated: opt-body=0
+
+ /* OPTIONS response when there IS NOT an options body */
+ Encapsulated: null-body=0
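+
+ The syntactic and ordering rules above can be checked mechanically.
+ The following Python sketch is illustrative only: the function name
+ parse_encapsulated is an assumption, it operates on the header value
+ (the text after "Encapsulated: "), and it does not enforce the
+ per-method forms listed earlier. It treats equal adjacent offsets as
+ acceptable, which is an interpretation of "monotonically increasing"
+ rather than something this document states.
+
```python
HEADER_PARTS = {"req-hdr", "res-hdr"}
BODY_PARTS = {"req-body", "res-body", "opt-body", "null-body"}


def parse_encapsulated(value):
    """Parse an Encapsulated header value into (name, offset) pairs."""
    entries = []
    for item in value.split(","):
        name, sep, offset = item.strip().partition("=")
        if not sep or name not in HEADER_PARTS | BODY_PARTS:
            raise ValueError("bad encapsulated entity: %r" % item)
        if not (offset.isascii() and offset.isdigit()):
            raise ValueError("offset must be a decimal integer: %r" % item)
        entries.append((name, int(offset)))

    # Parts must be named in the order they appear in the message-body,
    # so the offsets may never decrease.
    offsets = [off for _, off in entries]
    if any(later < earlier for earlier, later in zip(offsets, offsets[1:])):
        raise ValueError("offsets must not decrease")

    # Every legal form ends with exactly one body (or null-body) entity.
    bodies = [name for name, _ in entries if name in BODY_PARTS]
    if len(bodies) != 1 or entries[-1][0] not in BODY_PARTS:
        raise ValueError("exactly one body entity is required, listed last")
    return entries
```
+
+ With a single header section followed by "null-body", the offset of
+ the last entry is the length of that header section, as in the
+ "null-body=412" example above.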
+
+4.4.2 Encapsulated HTTP Headers
+
+ By default, ICAP messages may encapsulate HTTP message headers and
+ entity bodies. HTTP headers MUST start with the request-line or
+ status-line for requests and responses, respectively, followed by
+ interesting HTTP headers.
+
+ The encapsulated headers MUST be terminated by a blank line, in order
+ to make them human readable, and in order to terminate line-by-line
+ HTTP parsers.
+
+ HTTP/1.1 makes a distinction between end-to-end headers and hop-by-
+ hop headers (see Section 13.5.1 of [4]). End-to-end headers are
+ meaningful to the ultimate recipient of a message, whereas hop-by-hop
+ headers are meaningful only for a single transport-layer connection.
+ Hop-by-hop headers include Connection, Keep-Alive, and so forth. All
+ end-to-end HTTP headers SHOULD be encapsulated, and all hop-by-hop
+ headers MUST NOT be encapsulated.
+
+ Despite the above restrictions on encapsulation, the hop-by-hop
+ Proxy-Authenticate and Proxy-Authorization headers MUST be forwarded
+ to the ICAP server in the ICAP header section (not the encapsulated
+ message). This allows propagation of client credentials that might
+ have been sent to the ICAP client in cases where the ICAP client is
+ also an HTTP surrogate. Note that this does not contradict HTTP/1.1,
+ which explicitly states "A proxy MAY relay the credentials from the
+ client request to the next proxy if that is the mechanism by which
+ the proxies cooperatively authenticate a given request." (Section
+ 14.34).
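+
+ One way an ICAP client could apply these rules before encapsulating
+ a message is sketched below. The name split_http_headers is
+ hypothetical. The hop-by-hop set is the list given in Section 13.5.1
+ of [4]; headers additionally named by a Connection header are not
+ handled in this sketch.
+
```python
# Hop-by-hop headers listed in Section 13.5.1 of HTTP/1.1 [4].
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate",
    "proxy-authorization", "te", "trailers", "transfer-encoding",
    "upgrade",
}

# Hop-by-hop headers that are nevertheless forwarded to the ICAP
# server, in the ICAP header section rather than the encapsulated one.
FORWARD_IN_ICAP_HEADERS = {"proxy-authenticate", "proxy-authorization"}


def split_http_headers(headers):
    """Split (name, value) pairs into (encapsulate, icap_level) lists."""
    encapsulate, icap_level = [], []
    for name, value in headers:
        key = name.lower()  # header field names are case-insensitive
        if key in FORWARD_IN_ICAP_HEADERS:
            icap_level.append((name, value))
        elif key not in HOP_BY_HOP:
            encapsulate.append((name, value))
        # All remaining hop-by-hop headers are dropped.
    return encapsulate, icap_level
```
+
+ The first list is what would be written into the encapsulated header
+ section; the second is what would be added to the ICAP headers.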
+
+ The Via header of an encapsulated message SHOULD be modified by an
+ ICAP server as if the encapsulated message were traveling through an
+ HTTP surrogate. The Via header added by an ICAP server MUST specify
+ protocol as ICAP/1.0.
+
+4.5 Message Preview
+
+ ICAP REQMOD or RESPMOD requests sent by the ICAP client to the ICAP
+ server may include a "preview". This feature allows an ICAP server
+ to see the beginning of a transaction, then decide if it wants to
+ opt-out of the transaction early instead of receiving the remainder
+ of the request message. Previewing can yield significant performance
+ improvements in a variety of situations, such as the following:
+
+ - Virus-checkers can certify a large fraction of files as "clean"
+ just by looking at the file type, file name extension, and the
+ first few bytes of the file. Only the remaining files need to be
+ transmitted to the virus-checking ICAP server in their entirety.
+
+ - Content filters can use Preview to decide if an HTTP entity needs
+ to be inspected (the HTTP file type alone is not enough in cases
+ where "text" actually turns out to be graphics data). The magic
+ numbers at the front of the file can identify a file as a JPEG or
+ GIF.
+
+ - If an ICAP server wants to transcode all GIF87 files into GIF89
+ files, then the GIF87 files could quickly be detected by looking
+ at the first few body bytes of the file.
+
+ - If an ICAP server wants to force all cacheable files to expire in
+ 24 hours or less, then this could be implemented by selecting HTTP
+ messages with expiries more than 24 hours in the future.
+
+ ICAP servers SHOULD use the OPTIONS method (see Section 4.10) to
+ specify how many bytes of preview are needed for a particular ICAP
+ application on a per-resource basis. Clients SHOULD be able to
+ provide Previews of at least 4096 bytes. Clients furthermore SHOULD
+ provide a Preview when using any ICAP resource that has indicated a
+ Preview is useful. (This indication might be provided via the
+ OPTIONS method, or some other "out-of-band" configuration.) Clients
+ SHOULD NOT provide a larger Preview than a server has indicated it is
+ willing to accept.
+
+ To effect a Preview, an ICAP client MUST add a "Preview:" header to
+ its request headers indicating the length of the preview. The ICAP
+ client then sends:
+
+ - all of the encapsulated header sections, and
+
+ - the beginning of the encapsulated body section, if any, up to the
+ number of bytes advertised in the Preview (possibly 0).
+
+ After the Preview is sent, the client stops and waits for an
+ intermediate response from the ICAP server before continuing. This
+ mechanism is similar to the "100-Continue" feature found in HTTP,
+ except that the stop-and-wait point can be within the message body.
+ In contrast, HTTP requires that the point must be the boundary
+ between the headers and body.
+
+ For example, to effect a Preview consisting of only encapsulated HTTP
+ headers, the ICAP client would add the following header to the ICAP
+ request:
+
+ Preview: 0
+
+ This indicates that the ICAP client will send only the encapsulated
+ header sections to the ICAP server, then it will send a zero-length
+ chunk and stop and wait for a "go ahead" to send more encapsulated
+ body bytes to the ICAP server.
+
+ Similarly, the ICAP header:
+
+ Preview: 4096
+
+ indicates that the ICAP client will attempt to send 4096 bytes of
+ origin server data in the encapsulated body of the ICAP request to
+ the ICAP server. It is important to note that the actual transfer
+ may be less, because the ICAP client is acting like a surrogate and
+ is not looking ahead to find the total length of the origin server
+ response. The entire ICAP encapsulated header section(s) will be
+ sent, followed by up to 4096 bytes of encapsulated HTTP body. The
+ chunk body terminator "0\r\n\r\n" is always included in these
+ transactions.
+
+ After sending the preview, the ICAP client will wait for a response
+ from the ICAP server. The response MUST be one of the following:
+
+ - 204 No Content. The ICAP server does not want to (or can not)
+ modify the ICAP client's request. The ICAP client MUST treat this
+ the same as if it had sent the entire message to the ICAP server
+ and an identical message was returned.
+
+ - ICAP reqmod or respmod response, depending on the method of the
+ original request. See Sections 4.8.2 and 4.9.2 for the format of
+ reqmod and respmod responses.
+
+ - 100 Continue. If the entire encapsulated HTTP body did not fit
+ in the preview, the ICAP client MUST send the remainder of its
+ ICAP message, starting from the first chunk after the preview. If
+ the entire message fit in the preview (detected by the "EOF"
+ symbol explained below), then the ICAP server MUST NOT respond
+ with 100 Continue.
+
+ When an ICAP client is performing a preview, it may not yet know how
+ many bytes will ultimately be available in the arriving HTTP message
+ that it is relaying to the HTTP server. Therefore, ICAP defines a
+ way for ICAP clients to indicate "EOF" to ICAP servers if one
+ unexpectedly arrives during the preview process. This is a
+ particularly useful optimization if a header-only HTTP response
+ arrives at the ICAP client (i.e., zero bytes of body); only a single
+ round trip will be needed for the complete ICAP server response.
+
+ We define an HTTP chunk-extension of "ieof" to indicate that an ICAP
+ chunk is the last chunk (see [4]). The ICAP server MUST strip this
+ chunk extension before passing the chunk data to an ICAP application
+ process.
+
+ For example, consider an ICAP client that has just received HTTP
+ response headers from an origin server and initiates an ICAP RESPMOD
+ transaction to an ICAP server. It does not know yet how many body
+ bytes will be arriving from the origin server because the server is
+ not using the Content-Length header. The ICAP client informs the
+ ICAP server that it will be sending a 1024-byte preview using a
+ "Preview: 1024" request header. If the HTTP origin server then
+ closes its connection to the ICAP client before sending any data
+ (i.e., it provides a zero-byte body), the corresponding zero-byte
+ preview for that zero-byte origin response would appear as follows:
+
+ \r\n
+ 0; ieof\r\n\r\n
+
+ If an ICAP server sees this preview, it knows from the presence of
+ "ieof" that the client will not be sending any more chunk data. In
+ this case, the server MUST respond with the modified response or a
+ 204 No Content message right away. It MUST NOT send a 100-Continue
+ response in this case. (In contrast, if the origin response had been
+ 1 byte or larger, the "ieof" would not have appeared. In that case,
+ an ICAP server MAY reply with 100-Continue, a modified response, or
+ 204 No Content.)
+
+ In another example, if the preview is 1024 bytes and the origin
+ response is 1024 bytes in two chunks, then the encapsulation would
+ appear as follows:
+
+ 200\r\n
+ <512 bytes of data>\r\n
+ 200\r\n
+ <512 bytes of data>\r\n
+ 0; ieof\r\n\r\n
+
+ <204 or modified response> (100 Continue disallowed due to ieof)
+
+ If the preview is 1024 bytes and the origin response is 1025 bytes
+ (and the ICAP server responds with 100-continue), then these chunks
+ would appear on the wire:
+
+ 200\r\n
+ <512 bytes of data>\r\n
+ 200\r\n
+ <512 bytes of data>\r\n
+ 0\r\n\r\n
+
+ <100 Continue Message>
+
+ 1\r\n
+ <1 byte of data>\r\n
+ 0\r\n\r\n <no ieof because we are no longer in preview mode>
+
+ Once the ICAP server receives the eof indicator, it finishes reading
+ the current chunk stream.
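+
+ The client side of the preview encoding can be sketched as follows.
+ This is an illustration under stated assumptions, not a reference
+ implementation: the name build_preview and its parameters are
+ hypothetical, the previewed bytes are sent as a single chunk rather
+ than two, and the body_complete flag stands for whatever means the
+ client has of knowing that no more body bytes will arrive. The
+ sketch follows the statement that the chunk terminator "0\r\n\r\n"
+ is always included, with the "ieof" extension added only when the
+ entire body fit in the preview.
+
```python
def build_preview(body, preview_size, body_complete):
    """Return (preview_bytes, remainder) for an encapsulated body.

    body          -- body bytes the client has buffered so far
    preview_size  -- the value advertised in the "Preview:" header
    body_complete -- True if `body` is known to be the whole HTTP body
    """
    head, remainder = body[:preview_size], body[preview_size:]
    out = b""
    if head:
        # One chunk: hexadecimal size, CRLF, data, CRLF.
        out += b"%x\r\n" % len(head) + head + b"\r\n"
    if body_complete and not remainder:
        out += b"0; ieof\r\n\r\n"  # everything fit; no 100 Continue allowed
    else:
        out += b"0\r\n\r\n"        # more may follow after a 100 Continue
    return out, remainder
```
+
+ For a zero-byte body this produces only "0; ieof\r\n\r\n". For a
+ 1025-byte body with a 1024-byte preview, it produces one 0x400-byte
+ chunk, a plain zero-length chunk, and a one-byte remainder to be
+ sent after the server's 100 Continue.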
+
+ Note that when offering a Preview, the ICAP client is committing to
+ temporarily buffer the previewed portion of the message so that it
+ can honor a "204 No Content" response. The remainder of the message
+ is not necessarily buffered; it might be pipelined directly from
+ another source to the ICAP server after a 100-Continue.
+
+4.6 "204 No Content" Responses outside of Previews
+
+ An ICAP client MAY choose to honor "204 No Content" responses for an
+ entire message. This is the decision of the client because it
+ imposes a burden on the client of buffering the entire message.
+
+ An ICAP client MAY include "Allow: 204" in its request headers,
+ indicating that the server MAY reply to the message with a "204 No
+ Content" response if the object does not need modification.
+
+ If an ICAP server receives a request that does not have "Allow: 204",
+ it MUST NOT reply with a 204. In this case, an ICAP server MUST
+ return the entire message back to the client, even though it is
+ identical to the message it received.
+
+ The ONLY EXCEPTION to this rule is in the case of a message preview,
+ as described in the previous section. If this is the case, an ICAP
+ server can respond with a 204 No Content message in response to a
+ message preview EVEN if the original request did not have the "Allow:
+ 204" header.
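+
+ The rule and its single exception reduce to a short predicate on the
+ server side. In this sketch the name may_send_204 is hypothetical,
+ and request_headers is assumed to be a mapping of lower-cased ICAP
+ header names to their values.
+
```python
def may_send_204(request_headers, responding_to_preview):
    """Decide whether a "204 No Content" reply is permitted."""
    if responding_to_preview:
        # The only exception: a 204 is always allowed after a preview.
        return True
    allow = request_headers.get("allow", "")
    return "204" in [token.strip() for token in allow.split(",")]
```
+
+ When this predicate is false and no modification is needed, the
+ server returns the entire, unmodified message instead.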
+
+4.7 ISTag Response Header
+
+ The ISTag ("ICAP Service Tag") response-header field provides a way
+ for ICAP servers to send a service-specific "cookie" to ICAP clients
+ that represents a service's current state. It is a 32-byte-maximum
+ alphanumeric string of data (not including the null character) that
+ may, for example, be a representation of the software version or
+ configuration of a service. An ISTag validates that previous ICAP
+ server responses can still be considered fresh by an ICAP client that
+ may be caching them. If a change on the ICAP server invalidates
+ previous responses, the ICAP server can invalidate portions of the
+ ICAP client's cache by changing its ISTag. The ISTag MUST be
+ included in every ICAP response from an ICAP server.
+
+ For example, consider a virus-scanning ICAP service. The ISTag might
+ be a combination of the virus scanner's software version and the
+ release number of its virus signature database. When the database is
+ updated, the ISTag can be changed to invalidate all previous
+ responses that had been certified as "clean" and cached with the old
+ ISTag.
+
+ ISTag is similar, but not identical, to the HTTP ETag. While an ETag
+ is a validator for a particular entity (object), an ISTag validates
+ all entities generated by a particular service (URI). A change in
+ the ISTag invalidates all the other entities provided by a service with
+ the old ISTag, not just the entity whose response contained the
+ updated ISTag.
+
+ The syntax of an ISTag is simply:
+ ISTag = "ISTag: " quoted-string
+
+ In this document we use the quoted-string definition given in
+ Section 2.2 of [4].
+
+ For example:
+ ISTag: "874900-1994-1c02798"
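+
+ To illustrate the invalidation rule, the sketch below shows one way
+ an ICAP client might tie its cache for a single service to that
+ service's ISTag. The class name ServiceCache and its methods are
+ assumptions made for this example, as are the ISTag values used in
+ the usage note; ICAP does not prescribe a cache design.
+
```python
class ServiceCache:
    """Client-side cache of ICAP responses for one service URI."""

    def __init__(self):
        self.istag = None
        self.entries = {}

    def observe(self, istag):
        """Record the ISTag of any response; flush the cache on change."""
        if istag != self.istag:
            # A new ISTag invalidates every entry cached under the old one.
            self.entries.clear()
            self.istag = istag

    def put(self, key, response, istag):
        """Cache a response under the ISTag it arrived with."""
        self.observe(istag)
        self.entries[key] = response

    def get(self, key):
        """Return a cached response, or None if absent or invalidated."""
        return self.entries.get(key)
```
+
+ For example, an entry stored with ISTag "v1-db7" remains available
+ while responses keep carrying "v1-db7", and is discarded as soon as
+ any response from the service carries "v1-db8".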
+
+4.8 Request Modification Mode
+
+ In this method, described in Section 3.1, an ICAP client sends an
+ HTTP request to an ICAP server. The ICAP server returns a modified
+ version of the request, an HTTP response, or (if the client indicates
+ it supports 204 responses) an indication that no modification is
+ required.
+
+4.8.1 Request
+
+ In REQMOD mode, the ICAP request MUST contain an encapsulated HTTP
+ request. The headers and body (if any) MUST both be encapsulated,
+ except that hop-by-hop headers are not encapsulated.
+
+4.8.2 Response
+
+ The response from the ICAP server back to the ICAP client may take
+ one of four forms:
+
+ - An error indication,
+
+ - A 204 indicating that the ICAP client's request requires no
+ adaptation (see Section 4.6 for limitations of this response),
+
+ - An encapsulated, adapted version of the ICAP client's request, or
+
+ - An encapsulated HTTP error response. Note that Request
+ Modification requests may only be satisfied with HTTP responses in
+ cases when the HTTP response is an error (e.g., 403 Forbidden).
+
+ The first line of the response message MUST be a status line as
+ described in Section 4.3.3. If the return code is a 2XX, the ICAP
+ client SHOULD continue its normal execution of the request. If the
+ ICAP client is a surrogate, this may include serving an object from
+ its cache or forwarding the modified request to an origin server.
+ Note it is valid for a 2XX ICAP response to contain an encapsulated
+ HTTP error response, which in turn should be returned to the
+ downstream client by the ICAP client.
+
+ For other return codes that indicate an error, the ICAP client MAY
+ (for example) return the error to the downstream client or user,
+ execute the unadapted request as it arrived from the client, or retry
+ the adaptation.
+
+ The modified request headers, if any, MUST be returned to the ICAP
+ client using appropriate encapsulation as described in Section 4.4.
+
+4.8.3 Examples
+
+ Consider the following example, in which a surrogate receives a
+ simple GET request from a client. The surrogate, acting as an ICAP
+ client, then forwards this request to an ICAP server for
+ modification. The ICAP server modifies the request headers and sends
+ them back to the ICAP client. Our hypothetical ICAP server will
+ modify several headers and strip the cookie from the original
+ request.
+
+ In all of our examples, we include the extra meta-data added to the
+ message due to chunking the encapsulated message body (if any). We
+ assume that end-of-line terminations, and blank lines, are two-byte
+ "CRLF" sequences.
+
+ ICAP Request Modification Example 1 - ICAP Request
+ ----------------------------------------------------------------
+ REQMOD icap://icap-server.net/server?arg=87 ICAP/1.0
+ Host: icap-server.net
+ Encapsulated: req-hdr=0, null-body=170
+
+ GET / HTTP/1.1
+ Host: www.origin-server.com
+ Accept: text/html, text/plain
+ Accept-Encoding: compress
+ Cookie: ff39fk3jur@4ii0e02i
+ If-None-Match: "xyzzy", "r2d2xxxx"
+
+ ----------------------------------------------------------------
+ ICAP Request Modification Example 1 - ICAP Response
+ ----------------------------------------------------------------
+ ICAP/1.0 200 OK
+ Date: Mon, 10 Jan 2000 09:55:21 GMT
+ Server: ICAP-Server-Software/1.0
+ Connection: close
+ ISTag: "W3E4R7U9-L2E4-2"
+ Encapsulated: req-hdr=0, null-body=231
+
+ GET /modified-path HTTP/1.1
+ Host: www.origin-server.com
+ Via: 1.0 icap-server.net (ICAP Example ReqMod Service 1.1)
+ Accept: text/html, text/plain, image/gif
+ Accept-Encoding: gzip, compress
+ If-None-Match: "xyzzy", "r2d2xxxx"
+
+ ----------------------------------------------------------------
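+
+ The offsets in the Encapsulated header of Example 1 can be
+ reproduced by counting bytes, assuming two-byte CRLF line endings as
+ stated above. The helper names header_section and encapsulated_value
+ in this Python sketch are hypothetical.
+
```python
CRLF = "\r\n"


def header_section(lines):
    """Join header lines with CRLF and append the terminating blank line."""
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def encapsulated_value(sections, body_name):
    """Build an Encapsulated header value.

    sections  -- list of (name, bytes) header sections, in message order
    body_name -- "req-body", "res-body", "opt-body", or "null-body"
    """
    parts, offset = [], 0
    for name, data in sections:
        parts.append("%s=%d" % (name, offset))
        offset += len(data)
    # The body (or null-body) starts where the last header section ends.
    parts.append("%s=%d" % (body_name, offset))
    return ", ".join(parts)


request = header_section([
    "GET / HTTP/1.1",
    "Host: www.origin-server.com",
    "Accept: text/html, text/plain",
    "Accept-Encoding: compress",
    "Cookie: ff39fk3jur@4ii0e02i",
    'If-None-Match: "xyzzy", "r2d2xxxx"',
])
print(encapsulated_value([("req-hdr", request)], "null-body"))
# prints: req-hdr=0, null-body=170
```
+
+ The same helper, given a request header section followed by a
+ response header section, would produce the three-entry form used in
+ RESPMOD requests.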
+
+ The second example is similar to the first, except that the request
+ being modified in this case is a POST instead of a GET. Note that
+ the encapsulated Content-Length argument has been modified to reflect
+ the modified body of the POST message. The outer ICAP message does
+ not need a Content-Length header because it uses chunking (not
+ shown).
+
+ In this second example, the Encapsulated header shows the division
+ between the forwarded header and forwarded body, for both the request
+ and the response.
+
+ ICAP Request Modification Example 2 - ICAP Request
+ ----------------------------------------------------------------
+ REQMOD icap://icap-server.net/server?arg=87 ICAP/1.0
+ Host: icap-server.net
+ Encapsulated: req-hdr=0, req-body=147
+
+ POST /origin-resource/form.pl HTTP/1.1
+ Host: www.origin-server.com
+ Accept: text/html, text/plain
+ Accept-Encoding: compress
+ Pragma: no-cache
+
+ 1e
+ I am posting this information.
+ 0
+
+ ----------------------------------------------------------------
+ ICAP Request Modification Example 2 - ICAP Response
+ ----------------------------------------------------------------
+ ICAP/1.0 200 OK
+ Date: Mon, 10 Jan 2000 09:55:21 GMT
+ Server: ICAP-Server-Software/1.0
+ Connection: close
+ ISTag: "W3E4R7U9-L2E4-2"
+ Encapsulated: req-hdr=0, req-body=244
+
+ POST /origin-resource/form.pl HTTP/1.1
+ Host: www.origin-server.com
+ Via: 1.0 icap-server.net (ICAP Example ReqMod Service 1.1)
+ Accept: text/html, text/plain, image/gif
+ Accept-Encoding: gzip, compress
+ Pragma: no-cache
+ Content-Length: 45
+
+ 2d
+ I am posting this information. ICAP powered!
+ 0
+
+ ----------------------------------------------------------------
+
+ Finally, this third example shows an ICAP server returning an error
+ response when it receives a Request Modification request.
+
+ ICAP Request Modification Example 3 - ICAP Request
+ ----------------------------------------------------------------
+ REQMOD icap://icap-server.net/content-filter ICAP/1.0
+ Host: icap-server.net
+ Encapsulated: req-hdr=0, null-body=119
+
+ GET /naughty-content HTTP/1.1
+ Host: www.naughty-site.com
+ Accept: text/html, text/plain
+ Accept-Encoding: compress
+
+ ----------------------------------------------------------------
+ ICAP Request Modification Example 3 - ICAP Response
+ ----------------------------------------------------------------
+ ICAP/1.0 200 OK
+ Date: Mon, 10 Jan 2000 09:55:21 GMT
+ Server: ICAP-Server-Software/1.0
+ Connection: close
+ ISTag: "W3E4R7U9-L2E4-2"
+ Encapsulated: res-hdr=0, res-body=213
+
+ HTTP/1.1 403 Forbidden
+ Date: Wed, 08 Nov 2000 16:02:10 GMT
+ Server: Apache/1.3.12 (Unix)
+ Last-Modified: Thu, 02 Nov 2000 13:51:37 GMT
+ ETag: "63600-1989-3a017169"
+ Content-Length: 58
+ Content-Type: text/html
+
+ 3a
+ Sorry, you are not allowed to access that naughty content.
+ 0
+
+ ----------------------------------------------------------------
+
+4.9 Response Modification Mode
+
+ In this method, described in Section 3.2, an ICAP client sends an
+ origin server's HTTP response to an ICAP server, and (if available)
+ the original client request that caused that response. Similar to the
+ Request Modification method, the response from the ICAP server can be
+ an adapted HTTP response, an error, or a 204 response code indicating
+ that no adaptation is required.
+
+4.9.1 Request
+
+ Using encapsulation described in Section 4.4, the header and body of
+ the HTTP response to be modified MUST be included in the ICAP body.
+ If available, the header of the original client request SHOULD also
+ be included. As with the other method, the hop-by-hop headers of the
+ encapsulated messages MUST NOT be forwarded. The Encapsulated header
+ MUST indicate the byte-offsets of the beginning of each of these
+ parts.
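+
+ The following non-normative sketch shows one way an ICAP client
+ might compute the Encapsulated offsets for a RESPMOD request. The
+ helper names (header_block, chunked, build_respmod) are
+ illustrative assumptions and are not defined by ICAP. The sketch
+ assumes the caller has already removed hop-by-hop headers from the
+ encapsulated messages. It rebuilds the request of Example 4.
+
```python
CRLF = b"\r\n"


def header_block(lines):
    """Join header lines with CRLF and end the block with an empty line."""
    return CRLF.join(line.encode("ascii") for line in lines) + CRLF + CRLF


def chunked(body):
    """Encode a non-empty body as one chunk plus the terminating zero chunk."""
    return b"%x" % len(body) + CRLF + body + CRLF + b"0" + CRLF + CRLF


def build_respmod(service_uri, host, req_hdr, res_hdr, res_body):
    """Assemble a RESPMOD request (hypothetical helper, not part of ICAP).

    Offsets in the Encapsulated header count bytes from the start of the
    ICAP message body, so the first encapsulated part is always at 0.
    """
    entries, offset = [], 0
    if req_hdr:  # the original client request header is optional
        entries.append("req-hdr=%d" % offset)
        offset += len(req_hdr)
    entries.append("res-hdr=%d" % offset)
    offset += len(res_hdr)
    # An absent body is announced with null-body instead of res-body.
    entries.append("%s=%d" % ("res-body" if res_body else "null-body", offset))
    icap_hdr = header_block([
        "RESPMOD %s ICAP/1.0" % service_uri,
        "Host: %s" % host,
        "Encapsulated: " + ", ".join(entries),
    ])
    body = chunked(res_body) if res_body else b""
    return icap_hdr + (req_hdr or b"") + res_hdr + body


# Rebuild the request of Example 4 from its parts.
req_hdr = header_block([
    "GET /origin-resource HTTP/1.1",
    "Host: www.origin-server.com",
    "Accept: text/html, text/plain, image/gif",
    "Accept-Encoding: gzip, compress",
])
res_hdr = header_block([
    "HTTP/1.1 200 OK",
    "Date: Mon, 10 Jan 2000 09:52:22 GMT",
    "Server: Apache/1.3.6 (Unix)",
    'ETag: "63840-1ab7-378d415b"',
    "Content-Type: text/html",
    "Content-Length: 51",
])
message = build_respmod(
    "icap://icap.example.org/satisf",
    "icap.example.org",
    req_hdr,
    res_hdr,
    b"This is data that was returned by an origin server.",
)
print(message.decode("ascii"))
```
+
+ With these inputs the computed header is "Encapsulated: req-hdr=0,
+ res-hdr=137, res-body=296", which matches Example 4.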
+
+4.9.2 Response
+
+ The response from the ICAP server looks just like a reply in the
+ Request Modification method (Section 4.8); that is,
+
+ - An error indication,
+
+
+
+Elson & Cerpa Informational [Page 27]
+\f
+RFC 3507 ICAP April 2003
+
+
+ - An encapsulated and potentially modified HTTP response header and
+ response body, or
+
+ - An ICAP 204 response indicating that the ICAP client's request
+ requires no adaptation.
+
+ The first line of the response message MUST be a status line as
+ described in Section 4.3.3. If the return code is a 2XX, the ICAP
+ client SHOULD continue its normal execution of the response. The
+ ICAP client MAY re-examine the headers of the encapsulated response
+ in order to make further decisions about the response (e.g., its
+ cacheability).
+
+ For other return codes that indicate an error, the ICAP client SHOULD
+ NOT return these directly to the downstream client, since these errors
+ only make sense in the ICAP client/server transaction.
+
+ The modified response headers, if any, MUST be returned to the ICAP
+ client using appropriate encapsulation as described in Section 4.4.
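+
+ As a non-normative illustration, an ICAP client could map the
+ status line of a final response to one of the actions described
+ above. The function name and the action labels are assumptions
+ made for this sketch. Interim codes such as 100 are outside its
+ scope and are reported as "unhandled".
+
```python
import re

# Status-Line = ICAP-Version SP Status-Code SP Reason-Phrase
STATUS_LINE = re.compile(r"ICAP/1\.0 (\d{3}) ([^\r\n]*)")


def classify_icap_response(status_line):
    """Map an ICAP status line to a client action (labels are illustrative)."""
    match = STATUS_LINE.fullmatch(status_line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not an ICAP/1.0 status line: %r" % status_line)
    code = int(match.group(1))
    if code == 204:
        return "use-original"   # no adaptation required
    if 200 <= code <= 299:
        return "use-adapted"    # continue with the encapsulated response
    if code >= 400:
        return "local-error"    # handle locally, do not pass downstream
    return "unhandled"          # 1xx and 3xx are not covered by this sketch


print(classify_icap_response("ICAP/1.0 200 OK"))
print(classify_icap_response("ICAP/1.0 204 No Content"))
print(classify_icap_response("ICAP/1.0 500 Server error"))
```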
+
+4.9.3 Examples
+
+ In Example 4, an ICAP client is requesting modification of an entity
+ that was returned as a result of a client GET. The original client
+ GET was to an origin server at "www.origin-server.com"; the ICAP
+ server is at "icap.example.org".
+
+ ICAP Response Modification Example 4 - ICAP Request
+ ----------------------------------------------------------------
+ RESPMOD icap://icap.example.org/satisf ICAP/1.0
+ Host: icap.example.org
+ Encapsulated: req-hdr=0, res-hdr=137, res-body=296
+
+ GET /origin-resource HTTP/1.1
+ Host: www.origin-server.com
+ Accept: text/html, text/plain, image/gif
+ Accept-Encoding: gzip, compress
+
+ HTTP/1.1 200 OK
+ Date: Mon, 10 Jan 2000 09:52:22 GMT
+ Server: Apache/1.3.6 (Unix)
+ ETag: "63840-1ab7-378d415b"
+ Content-Type: text/html
+ Content-Length: 51
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 28]
+\f
+RFC 3507 ICAP April 2003
+
+
+ 33
+ This is data that was returned by an origin server.
+ 0
+
+ ----------------------------------------------------------------
+
+ ICAP Response Modification Example 4 - ICAP Response
+ ----------------------------------------------------------------
+ ICAP/1.0 200 OK
+ Date: Mon, 10 Jan 2000 09:55:21 GMT
+ Server: ICAP-Server-Software/1.0
+ Connection: close
+ ISTag: "W3E4R7U9-L2E4-2"
+ Encapsulated: res-hdr=0, res-body=222
+
+ HTTP/1.1 200 OK
+ Date: Mon, 10 Jan 2000 09:55:21 GMT
+ Via: 1.0 icap.example.org (ICAP Example RespMod Service 1.1)
+ Server: Apache/1.3.6 (Unix)
+ ETag: "63840-1ab7-378d415b"
+ Content-Type: text/html
+ Content-Length: 92
+
+ 5c
+ This is data that was returned by an origin server, but with
+ value added by an ICAP server.
+ 0
+
+ ----------------------------------------------------------------
+
+4.10 OPTIONS Method
+
+ The ICAP "OPTIONS" method is used by the ICAP client to retrieve
+ configuration information from the ICAP server. In this method, the
+ ICAP client sends a request addressed to a specific ICAP resource and
+ receives back a response with options that are specific to the
+ service named by the URI. All OPTIONS requests MAY also return
+ options that are global to the server (i.e., apply to all services).
+
+4.10.1 OPTIONS Request
+
+ The OPTIONS method consists of a request-line, as described in
+ Section 4.3.2, such as the following example:
+
+ OPTIONS icap://icap.server.net/sample-service ICAP/1.0
+ User-Agent: ICAP-client-XYZ/1.001
+
+
+
+
+
+Elson & Cerpa Informational [Page 29]
+\f
+RFC 3507 ICAP April 2003
+
+
+ Other headers are also allowed as described in Section 4.3.1 and
+ Section 4.3.2 (for example, Host).
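+
+ A minimal, non-normative sketch of how a client might serialize
+ such a request follows. The function name is an assumption; the
+ header set mirrors the OPTIONS request shown in Example 5.
+
```python
def build_options_request(service_uri, host, user_agent=None):
    """Serialize an OPTIONS request with no body (illustrative helper)."""
    lines = [
        "OPTIONS %s ICAP/1.0" % service_uri,  # request-line
        "Host: %s" % host,
    ]
    if user_agent:
        lines.append("User-Agent: %s" % user_agent)
    # Every line ends with CRLF; an empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_options_request(
    "icap://icap.server.net/sample-service",
    "icap.server.net",
    "ICAP-client-XYZ/1.001",
)
print(request.decode("ascii"))
```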
+
+4.10.2 OPTIONS Response
+
+ The OPTIONS response consists of a status line as described in
+ Section 4.3.3 followed by a series of header field name-value pairs
+ optionally followed by an opt-body. Multiple values in the value
+ field MUST be separated by commas. If an opt-body is present in the
+ OPTIONS response, the Opt-body-type header describes the format of
+ the opt-body.
+
+ The OPTIONS headers supported in this version of the protocol are:
+
+ -- Methods:
+
+ The method that is supported by this service. This header MUST be
+ included in the OPTIONS response. The OPTIONS method MUST NOT be
+ in the Methods list since it MUST be supported by all the ICAP
+ server implementations. Each service should have a distinct URI
+ and support only one method in addition to OPTIONS (see Section
+ 6.4).
+
+ For example:
+ Methods: RESPMOD
+
+ -- Service:
+
+ A text description of the vendor and product name. This header
+ MAY be included in the OPTIONS response.
+
+ For example:
+ Service: XYZ Technology Server 1.0
+
+ -- ISTag:
+
+ See section 4.7 for details. This header MUST be included in the
+ OPTIONS response.
+
+ For example:
+ ISTag: "5BDEEEA9-12E4-2"
+
+ -- Encapsulated:
+
+ This header MUST be included in the OPTIONS response; see Section
+ 4.4.
+
+
+
+
+
+Elson & Cerpa Informational [Page 30]
+\f
+RFC 3507 ICAP April 2003
+
+
+ For example:
+ Encapsulated: opt-body=0
+
+ -- Opt-body-type:
+
+ A token identifying the format of the opt-body. (Valid opt-body
+ types are not defined by ICAP.) This header MUST be included in
+ the OPTIONS response ONLY if an opt-body is present.
+
+ For example:
+ Opt-body-type: XML-Policy-Table-1.0
+
+ -- Max-Connections:
+
+ The maximum number of ICAP connections the server is able to
+ support. This header MAY be included in the OPTIONS response.
+
+ For example:
+ Max-Connections: 1500
+
+ -- Options-TTL:
+
+ The time (in seconds) for which this OPTIONS response is valid.
+ If none is specified, the OPTIONS response does not expire. This
+ header MAY be included in the OPTIONS response. The ICAP client
+ MAY reissue an OPTIONS request once the Options-TTL expires.
+
+ For example:
+ Options-TTL: 3600
+
+ -- Date:
+
+ The server's clock, specified as an RFC 1123 compliant date/time
+ string. This header MAY be included in the OPTIONS response.
+
+ For example:
+ Date: Fri, 15 Jun 2001 04:33:55 GMT
+
+ -- Service-ID:
+
+ A short label identifying the ICAP service. It MAY be used in
+ attribute header names. This header MAY be included in the
+ OPTIONS response.
+
+ For example:
+ Service-ID: xyztech
+
+
+
+
+
+Elson & Cerpa Informational [Page 31]
+\f
+RFC 3507 ICAP April 2003
+
+
+ -- Allow:
+
+ A directive declaring a list of optional ICAP features that this
+ server has implemented. This header MAY be included in the
+ OPTIONS response. In this document we define the value "204" to
+ indicate that the ICAP server supports a 204 response.
+
+ For example:
+ Allow: 204
+
+ -- Preview:
+
+ The number of bytes to be sent by the ICAP client during a
+ preview. This header MAY be included in the OPTIONS response.
+
+ For example:
+ Preview: 1024
+
+ -- Transfer-Preview:
+
+ A list of file extensions that should be previewed to the ICAP
+ server before sending them in their entirety. This header MAY be
+ included in the OPTIONS response. Multiple file extension values
+ should be separated by commas. The wildcard value "*" specifies
+ the default behavior for all the file extensions not specified in
+ any other Transfer-* header (see below).
+
+ For example:
+ Transfer-Preview: *
+
+ -- Transfer-Ignore:
+
+ A list of file extensions that should NOT be sent to the ICAP
+ server. This header MAY be included in the OPTIONS response.
+ Multiple file extensions should be separated by commas.
+
+ For example:
+ Transfer-Ignore: html
+
+ -- Transfer-Complete:
+
+ A list of file extensions that should be sent in their entirety
+ (without preview) to the ICAP server. This header MAY be included
+ in the OPTIONS response. Multiple file extension values should
+ be separated by commas.
+
+ For example:
+ Transfer-Complete: asp, bat, exe, com, ole
+
+
+
+Elson & Cerpa Informational [Page 32]
+\f
+RFC 3507 ICAP April 2003
+
+
+ Note: If any of Transfer-* are sent, exactly one of them MUST contain
+ the wildcard value "*" to specify the default. If no Transfer-* are
+ sent, all responses will be sent in their entirety (without Preview).
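+
+ The Transfer-* rules above can be sketched as a small lookup. This
+ is a non-normative illustration: the function name, the mode
+ labels, and the convention of passing headers as a dictionary
+ keyed by lower-case header name are assumptions of this sketch.
+
```python
TRANSFER_HEADERS = {
    "transfer-preview": "preview",
    "transfer-ignore": "ignore",
    "transfer-complete": "complete",
}


def transfer_mode(extension, options):
    """Return 'preview', 'ignore' or 'complete' for a file extension.

    options maps lower-case header names to raw header values.
    """
    present = {h: m for h, m in TRANSFER_HEADERS.items() if h in options}
    if not present:
        return "complete"  # no Transfer-* headers: send everything whole
    ext = extension.lower().lstrip(".")
    default = None
    for header, mode in present.items():
        values = [v.strip().lower() for v in options[header].split(",")]
        if ext in values:
            return mode        # an explicit listing wins over the wildcard
        if "*" in values:
            default = mode     # remember which header holds the default
    if default is None:
        raise ValueError("no Transfer-* header carries the '*' wildcard")
    return default


# Header values taken from Example 5.
options = {
    "transfer-complete": "asp, bat, exe, com",
    "transfer-ignore": "html",
    "transfer-preview": "*",
}
for ext in ("exe", "html", "pdf"):
    print(ext, transfer_mode(ext, options))
```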
+
+4.10.3 OPTIONS Examples
+
+ In Example 5, an ICAP Client sends an OPTIONS Request to an ICAP
+ Service named icap.server.net/sample-service in order to get
+ configuration information for the service provided.
+
+ ICAP OPTIONS Example 5 - ICAP OPTIONS Request
+ ----------------------------------------------------------------
+ OPTIONS icap://icap.server.net/sample-service ICAP/1.0
+ Host: icap.server.net
+ User-Agent: BazookaDotCom-ICAP-Client-Library/2.3
+
+ ----------------------------------------------------------------
+
+ ICAP OPTIONS Example 5 - ICAP OPTIONS Response
+ ----------------------------------------------------------------
+ ICAP/1.0 200 OK
+ Date: Mon, 10 Jan 2000 09:55:21 GMT
+ Methods: RESPMOD
+ Service: FOO Tech Server 1.0
+ ISTag: "W3E4R7U9-L2E4-2"
+ Encapsulated: null-body=0
+ Max-Connections: 1000
+ Options-TTL: 7200
+ Allow: 204
+ Preview: 2048
+ Transfer-Complete: asp, bat, exe, com
+ Transfer-Ignore: html
+ Transfer-Preview: *
+
+ ----------------------------------------------------------------
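+
+ As a non-normative sketch, a client could read the header section
+ of such a response as follows. The function names and the shape of
+ the summary dictionary are assumptions. Folded (multi-line) header
+ values are not handled.
+
```python
def parse_icap_headers(raw):
    """Split an ICAP header section into (status_line, {name: value})."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    status_line, *lines = head.split("\r\n")
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


def summarize_options(headers):
    """Pull out the fields a client typically acts on (illustrative)."""
    allow = [v.strip() for v in headers.get("allow", "").split(",")]
    return {
        "methods": [m.strip() for m in headers["methods"].split(",")],
        "istag": headers["istag"].strip('"'),
        "allow_204": "204" in allow,
        "preview": int(headers["preview"]) if "preview" in headers else None,
        "ttl": int(headers["options-ttl"]) if "options-ttl" in headers else None,
    }


# The response of Example 5.
raw = (
    b"ICAP/1.0 200 OK\r\n"
    b"Date: Mon, 10 Jan 2000 09:55:21 GMT\r\n"
    b"Methods: RESPMOD\r\n"
    b"Service: FOO Tech Server 1.0\r\n"
    b'ISTag: "W3E4R7U9-L2E4-2"\r\n'
    b"Encapsulated: null-body=0\r\n"
    b"Max-Connections: 1000\r\n"
    b"Options-TTL: 7200\r\n"
    b"Allow: 204\r\n"
    b"Preview: 2048\r\n"
    b"Transfer-Complete: asp, bat, exe, com\r\n"
    b"Transfer-Ignore: html\r\n"
    b"Transfer-Preview: *\r\n"
    b"\r\n"
)
status_line, headers = parse_icap_headers(raw)
summary = summarize_options(headers)
print(status_line)
print(summary)
```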
+
+5. Caching
+
+ ICAP servers' responses MAY be cached by ICAP clients, just as any
+ other surrogate might cache HTTP responses. Similar to HTTP, ICAP
+ clients MAY always store a successful response (see sections 4.8.2
+ and 4.9.2) as a cache entry, and MAY return it without validation if
+ it is fresh. ICAP servers use the caching directives described in
+ HTTP/1.1 [4].
+
+ In Request Modification mode, the ICAP server MAY include caching
+ directives in the ICAP header section of the ICAP response (NOT in
+ the encapsulated HTTP request of the ICAP message body). In Response
+
+
+
+Elson & Cerpa Informational [Page 33]
+\f
+RFC 3507 ICAP April 2003
+
+
+ Modification mode, the ICAP server MAY add or modify the HTTP caching
+ directives located in the encapsulated HTTP response (NOT in the ICAP
+ header section). Consequently, the ICAP client SHOULD look for
+ caching directives in the ICAP headers in case of REQMOD, and in the
+ encapsulated HTTP response in case of RESPMOD.
+
+ In cases where an ICAP server returns a modified version of an object
+ created by an origin server, such as in Response Modification mode,
+ the expiration of the ICAP-modified object MUST NOT be longer than
+ that of the origin object. In other words, ICAP servers MUST NOT
+ extend the lifetime of origin server objects, but MAY shorten it.
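+
+ The two rules above (where a client looks for caching directives,
+ and the limit on the lifetime of adapted objects) can be sketched
+ as follows. This is a non-normative illustration; both function
+ names are assumptions, and expiry times are modelled as datetime
+ values for simplicity.
+
```python
from datetime import datetime, timezone


def caching_directive_source(method, icap_headers, http_response_headers):
    """Pick the header set in which an ICAP client looks for caching directives."""
    if method == "REQMOD":
        return icap_headers            # ICAP header section of the response
    if method == "RESPMOD":
        return http_response_headers   # encapsulated HTTP response header
    raise ValueError("no caching rule for method %r" % method)


def clamp_expiry(origin_expires, adapted_expires):
    """Return an expiry for the adapted object that never exceeds the origin's.

    Shortening the lifetime is allowed; extending it is not.
    """
    return min(origin_expires, adapted_expires)


origin = datetime(2000, 1, 10, 12, 0, tzinfo=timezone.utc)
wanted = datetime(2000, 1, 11, 12, 0, tzinfo=timezone.utc)
print(clamp_expiry(origin, wanted))
```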
+
+ In cases where the ICAP server is the authoritative source of an ICAP
+ response, such as in Request Modification mode, the ICAP server is
+ not restricted in its expiration policy.
+
+ Note that the ISTag response-header may also be used to provide
+ caching hints to clients; see Section 4.7.
+
+6. Implementation Notes
+
+6.1 Vectoring Points
+
+ The definition of the ICAP protocol itself only describes two
+ different adaptation channels: modification (and satisfaction) of
+ requests, and modifications of replies. However, an ICAP client
+ implementation is likely to actually distinguish among four different
+ classes of adaptation:
+
+ 1. Adaptation of client requests. This is adaptation done every
+ time a request arrives from a client, that is, when a request
+ is "on its way into the cache". Factors such as
+ the state of the objects currently cached will determine whether
+ or not this request actually gets forwarded to an origin server
+ (instead of, say, getting served off the cache's disk). An
+ example of this type of adaptation would be special access
+ control or authentication services that must be performed on a
+ per-client basis.
+
+ 2. Adaptation of requests on their way to an origin server.
+ Although this type of adaptation is also an adaptation of
+ requests similar to (1), it describes requests that are "on their
+ way out of the cache"; i.e., if a request actually requires that
+ an origin server be contacted. These adaptation requests are not
+ necessarily specific to particular clients. An example would be
+ addition of "Accept:" headers for special devices; these
+ adaptations can potentially apply to many clients.
+
+
+
+
+Elson & Cerpa Informational [Page 34]
+\f
+RFC 3507 ICAP April 2003
+
+
+ 3. Adaptations of responses coming from an origin server. This is
+ the adaptation of an object "on its way into the cache". In
+ other words, this is adaptation that a surrogate might want to
+ perform on an object before caching it. The adapted object may
+ subsequently be served to many clients. An example of this type of
+ adaptation is virus checking: a surrogate will want to check an
+ incoming origin reply for viruses once, before allowing it into
+ the cache -- not every time the cached object is served to a
+ client.
+
+ 4. Adaptation of responses coming from the surrogate, heading back
+ to the client. Although this type of adaptation, like (3), is
+ the adaptation of a response, it is client-specific. Client
+ reply adaptation is adaptation that is required every time an
+ object is served to a client, even if all the replies come from
+ the same cached object off of disk. Ad insertion is a common
+ form of this kind of adaptation; e.g., if a popular (cached)
+ object that rarely changes needs a different ad inserted into it
+ every time it is served off disk to a client. Note that the
+ relationship between adaptations of type (3) and (4) is analogous
+ to the relationship between types (2) and (1).
+
+ Although the distinction among these four adaptation points is
+ critical for ICAP client implementations, the distinction is not
+ significant for the ICAP protocol itself. From the point of view of
+ an ICAP server, a request is a request -- the ICAP server doesn't
+ care what policy led the ICAP client to generate the request. For
+ simplicity, we therefore did not make these four channels explicit
+ in ICAP.
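+
+ One way an ICAP client implementation might model the four
+ adaptation points, and their collapse onto the two ICAP methods,
+ is sketched below. The enumeration and its member names are
+ assumptions made for illustration only.
+
```python
from enum import Enum


class VectoringPoint(Enum):
    CLIENT_REQUEST = 1    # request on its way into the cache
    ORIGIN_REQUEST = 2    # request on its way out to an origin server
    ORIGIN_RESPONSE = 3   # response on its way into the cache
    CLIENT_RESPONSE = 4   # response heading back to the client


def icap_method(point):
    """Map a client-side vectoring point to the ICAP method used on the wire."""
    if point in (VectoringPoint.CLIENT_REQUEST, VectoringPoint.ORIGIN_REQUEST):
        return "REQMOD"
    return "RESPMOD"


for point in VectoringPoint:
    print(point.name, icap_method(point))
```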
+
+6.2 Application Level Errors
+
+ Section 4 described "on the wire" protocol errors that MUST be
+ standardized across implementations to ensure interoperability. In
+ this section, we describe errors that are communicated between ICAP
+ software and the clients and servers on which they are implemented.
+ Although such errors are implementation dependent and do not
+ necessarily need to be standardized because they are "within the
+ box", they are presented here as advice to future implementors based
+ on past implementation experience.
+
+
+
+
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 35]
+\f
+RFC 3507 ICAP April 2003
+
+
+ Error name Value
+ ====================================================
+ ICAP_CANT_CONNECT 1000
+ ICAP_SERVER_RESPONSE_CLOSE 1001
+ ICAP_SERVER_RESPONSE_RESET 1002
+ ICAP_SERVER_UNKNOWN_CODE 1003
+ ICAP_SERVER_UNEXPECTED_CLOSE_204 1004
+ ICAP_SERVER_UNEXPECTED_CLOSE 1005
+
+ 1000 ICAP_CANT_CONNECT:
+ "Cannot connect to ICAP server".
+
+ The ICAP client could not connect to the ICAP server. The server
+ may be down, or it may not be listening on the socket.
+
+ 1001 ICAP_SERVER_RESPONSE_CLOSE:
+ "ICAP Server closed connection while reading response".
+
+ The ICAP server shuts down the TCP connection before the ICAP
+ client can send all the body data.
+
+ 1002 ICAP_SERVER_RESPONSE_RESET:
+ "ICAP Server reset connection while reading response".
+
+ The ICAP server resets the TCP connection before the ICAP client
+ can send all the body data.
+
+ 1003 ICAP_SERVER_UNKNOWN_CODE:
+ "ICAP Server sent unknown response code".
+
+ An unknown ICAP response code (see Section 4.x) was received by
+ the ICAP client.
+
+ 1004 ICAP_SERVER_UNEXPECTED_CLOSE_204:
+ "ICAP Server closed connection on 204 without 'Connection: close'
+ header".
+
+ An ICAP server MUST send the "Connection: close" header if it
+ intends to close the connection after the current transaction.
+
+ 1005 ICAP_SERVER_UNEXPECTED_CLOSE:
+ "ICAP Server closed connection as ICAP client wrote body
+ preview".
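+
+ For implementors who want these codes in software, the table above
+ could be represented as follows. This is a non-normative sketch;
+ the class name and the message dictionary are assumptions, while
+ the names, values and message strings are those listed above.
+
```python
from enum import IntEnum


class IcapClientError(IntEnum):
    ICAP_CANT_CONNECT = 1000
    ICAP_SERVER_RESPONSE_CLOSE = 1001
    ICAP_SERVER_RESPONSE_RESET = 1002
    ICAP_SERVER_UNKNOWN_CODE = 1003
    ICAP_SERVER_UNEXPECTED_CLOSE_204 = 1004
    ICAP_SERVER_UNEXPECTED_CLOSE = 1005


# Human-readable messages, copied from the descriptions above.
MESSAGES = {
    IcapClientError.ICAP_CANT_CONNECT:
        "Cannot connect to ICAP server",
    IcapClientError.ICAP_SERVER_RESPONSE_CLOSE:
        "ICAP Server closed connection while reading response",
    IcapClientError.ICAP_SERVER_RESPONSE_RESET:
        "ICAP Server reset connection while reading response",
    IcapClientError.ICAP_SERVER_UNKNOWN_CODE:
        "ICAP Server sent unknown response code",
    IcapClientError.ICAP_SERVER_UNEXPECTED_CLOSE_204:
        "ICAP Server closed connection on 204 without "
        "'Connection: close' header",
    IcapClientError.ICAP_SERVER_UNEXPECTED_CLOSE:
        "ICAP Server closed connection as ICAP client wrote body preview",
}

for error in IcapClientError:
    print(int(error), error.name, "-", MESSAGES[error])
```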
+
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 36]
+\f
+RFC 3507 ICAP April 2003
+
+
+6.3 Use of Chunked Transfer-Encoding
+
+ For simplicity, ICAP messages MUST use the "chunked" transfer-
+ encoding within the encapsulated body section as defined in HTTP/1.1
+ [4]. This requires that ICAP client implementations convert incoming
+ objects "on the fly" to chunked from whatever transfer-encoding they
+ arrive in. However, the transformation is simple:
+
+ - For objects arriving using "Content-Length" headers, one big chunk
+ can be created of the same size as indicated in the Content-Length
+ header.
+
+ - For objects arriving using a TCP close to signal the end of the
+ object, each incoming group of bytes read from the OS can be
+ converted into a chunk (by writing the number of bytes read,
+ followed by the bytes themselves).
+
+ - For objects arriving using chunked encoding, they can be
+ retransmitted as is (without re-chunking).
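+
+ The first two conversions above can be sketched as follows. This
+ is a non-normative illustration with assumed function names. Chunk
+ sizes are written in hexadecimal, as required by the chunked
+ transfer-encoding of HTTP/1.1 [4].
+
```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk followed by an empty trailer


def chunk(data):
    """Encode one chunk: hexadecimal size, CRLF, the data, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def chunk_sized_object(body):
    """An object that arrived with Content-Length becomes one big chunk."""
    return (chunk(body) if body else b"") + LAST_CHUNK


def chunk_stream(reads):
    """An object ended by TCP close: each group of bytes read becomes a chunk."""
    for data in reads:
        if data:  # skip empty reads; a zero-length chunk would end the body
            yield chunk(data)
    yield LAST_CHUNK


body = b"Sorry, you are not allowed to access that naughty content."
print(chunk_sized_object(body))
print(b"".join(chunk_stream([b"abc", b"", b"de"])))
```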
+
+6.4 Distinct URIs for Distinct Services
+
+ ICAP servers SHOULD assign unique URIs to each service they provide,
+ even if such services might theoretically be differentiated based on
+ their method. In other words, a REQMOD and RESPMOD service should
+ never have the same URI, even if they do something that is
+ conceptually the same.
+
+ This situation in ICAP is similar to that found in HTTP where it
+ might, in theory, be possible to perform a GET or a POST to the same
+ URI and expect two different results. This kind of overloading of
+ URIs only causes confusion and should be avoided.
+
+7. Security Considerations
+
+7.1 Authentication
+
+ Authentication in ICAP is very similar to proxy authentication in
+ HTTP as specified in RFC 2617. Specifically, the following rules
+ apply:
+
+ - WWW-Authenticate challenges and responses are for end-to-end
+ authentication between a client (user) and an origin server. Like
+ any proxy, ICAP clients and ICAP servers MUST forward these
+ headers without modification.
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 37]
+\f
+RFC 3507 ICAP April 2003
+
+
+ - If authentication is required between an ICAP client and ICAP
+ server, hop-by-hop Proxy Authentication as described in RFC 2617
+ MUST be used.
+
+ There are potential applications where a user (as opposed to ICAP
+ client) might have rights to access an ICAP service. In this version
+ of the protocol, we assume that ICAP clients and ICAP servers are
+ under the same administrative domain, and contained in a single trust
+ domain. Therefore, in these cases, we assume that it is sufficient
+ for users to authenticate themselves to the ICAP client (which is a
+ surrogate from the point of view of the user). This type of
+ authentication will also be Proxy Authentication as described in RFC
+ 2617.
+
+ This standard explicitly excludes any method for a user to
+ authenticate directly to an ICAP server; the ICAP client MUST be
+ involved as described above.
+
+7.2 Encryption
+
+ Users of ICAP should note well that ICAP messages are not encrypted
+ for transit by default. In the absence of some other form of
+ encryption at the link or network layers, eavesdroppers may be able
+ to record the unencrypted transactions between ICAP clients and
+ servers. As described in Section 4.3.1, the Upgrade header MAY be
+ used to negotiate transport-layer security for an ICAP connection
+ [5].
+
+ Note also that end-to-end encryption between a client and origin
+ server is likely to preclude the use of value-added services by
+ intermediaries such as surrogates. An ICAP server that is unable to
+ decrypt a client's messages will, of course, be unable to perform any
+ transformations on them.
+
+7.3 Service Validation
+
+ Normal HTTP surrogates, when operating correctly, should not affect
+ the end-to-end semantics of messages that pass through them. This
+ forms a well-defined criterion to validate that a surrogate is
+ working correctly: a message should look the same before the
+ surrogate as it does after the surrogate.
+
+ In contrast, ICAP is meant to cause changes in the semantics of
+ messages on their way from origin servers to users. The criteria for
+ a correctly operating surrogate are no longer as easy to define.
+ This will make validation of ICAP services significantly more
+ difficult. Incorrect adaptations may lead to security
+ vulnerabilities that were not present in the unadapted content.
+
+
+
+Elson & Cerpa Informational [Page 38]
+\f
+RFC 3507 ICAP April 2003
+
+
+8. Motivations and Design Alternatives
+
+ This section describes some of our design decisions in more detail,
+ and describes the ideas and motivations behind them. This section
+ does not define protocol requirements, but hopefully sheds light on
+ the requirements defined in previous sections. Nothing in this
+ section carries the "force of law" or is part of the formal protocol
+ specification.
+
+ In general, our guiding principle was to make ICAP the simplest
+ possible protocol that would do the job, and no simpler. Some
+ features were rejected where alternative (non-protocol-based)
+ solutions could be found. In addition, we have intentionally left a
+ number of issues at the discretion of the implementor, where we
+ believe that doing so does not compromise interoperability.
+
+8.1 To Be HTTP, or Not To Be
+
+ ICAP was initially designed as an application-layer protocol built to
+ run on top of HTTP. This was desirable for a number of reasons.
+ HTTP is well-understood in the community and has enjoyed significant
+ investments in software infrastructure (clients, servers, parsers,
+ etc.). Our initial designs focused on leveraging that existing work;
+ we hoped that it would be possible to implement ICAP services simply,
+ using CGI scripts run by existing web servers.
+
+ However, the devil (as always) proved to be in the details. Certain
+ features that we considered important were impossible to implement
+ with HTTP. For example, ICAP clients can stop and wait for a "100
+ Continue" message in the midst of a message-body; HTTP clients may
+ only wait between the header and body. In addition, certain
+ transformations of HTTP messages by surrogates are legal (and
+ harmless for HTTP), but caused problems with ICAP's "header-in-
+ header" encapsulation and other features.
+
+ Ultimately, we decided that the tangle of workarounds required to fit
+ ICAP into HTTP was more complex and confusing than moving away from
+ HTTP and defining a new (but similar) protocol.
+
+8.2 Mandatory Use of Chunking
+
+ Chunking is mandatory in ICAP encapsulated bodies for three reasons.
+ First, efficiency is important, and the chunked encoding allows both
+ the client and server to keep the transport-layer connection open for
+ later reuse. Second, ICAP servers (and their developers) should be
+ encouraged to produce "incremental" responses where possible, to
+ reduce the latency perceived by users. Chunked encoding is the only
+ way to support this type of implementation. Finally, by
+
+
+
+Elson & Cerpa Informational [Page 39]
+\f
+RFC 3507 ICAP April 2003
+
+
+ standardizing on a single encapsulation mechanism, we avoid the
+ complexity that would be required in client and server software to
+ support multiple mechanisms. This simplifies ICAP, particularly in
+ the "body preview" feature described in Section 4.5.
+
+ While chunking of encapsulated bodies is mandatory, encapsulated
+ headers are not chunked. There are two reasons for this decision.
+ First, in cases where a chunked HTTP message body is being
+ encapsulated in an ICAP message, the ICAP client (HTTP server) can
+ copy it directly from the HTTP client to the ICAP server without un-
+ chunking and then re-chunking it. Second, many header-parser
+ implementations have difficulty dealing with headers that come in
+ multiple chunks. Earlier drafts of this document mandated that a
+ chunk boundary not come within a header. For clarity, chunking of
+ encapsulated headers has simply been disallowed.
+
+8.3 Use of the null-body directive in the Encapsulated header
+
+ There is a disadvantage to not using the chunked transfer-encoding
+ for the encapsulated header part of an ICAP message. Specifically,
+ parsers do not know in advance how much header data is coming (e.g.,
+ for buffer allocation). ICAP does not allow chunking in the header
+ part for reasons described in Section 8.2. To compensate, the
+ "null-body" directive allows the final header's length to be
+ determined, despite it not being chunked.
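+
+ To illustrate, a parser could derive the length of every
+ encapsulated header section from the Encapsulated header alone,
+ because the final entry (a body or "null-body") marks where the
+ last header ends. The sketch below is non-normative and its
+ function names are assumptions.
+
```python
def parse_encapsulated(value):
    """Return [(name, offset), ...] in the order given in the header."""
    entries = []
    for item in value.split(","):
        name, _, offset = item.strip().partition("=")
        entries.append((name, int(offset)))
    offsets = [offset for _, offset in entries]
    if offsets != sorted(offsets):
        raise ValueError("offsets must not decrease: %r" % value)
    return entries


def header_lengths(value):
    """Length in bytes of each section that is followed by another entry."""
    entries = parse_encapsulated(value)
    return {
        name: next_offset - offset
        for (name, offset), (_, next_offset) in zip(entries, entries[1:])
    }


print(header_lengths("req-hdr=0, null-body=119"))
print(header_lengths("req-hdr=0, res-hdr=137, res-body=296"))
```
+
+ For the second value the sketch reports 137 bytes of request
+ header and 159 bytes of response header, so buffers for both can
+ be sized before any header data is read.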
+
+9. References
+
+ [1] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource
+ Identifiers (URI): Generic Syntax", RFC 2396,
+ August 1998.
+
+ [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [3] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
+
+ [4] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
+ Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
+ HTTP/1.1", RFC 2616, June 1999.
+
+ [5] Khare, R. and S. Lawrence, "Upgrading to TLS Within HTTP/1.1",
+ RFC 2817, May 2000.
+
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 40]
+\f
+RFC 3507 ICAP April 2003
+
+
+10. Contributors
+
+ ICAP is based on an original idea by John Martin and Peter Danzig.
+ Many individuals and organizations have contributed to the
+ development of ICAP, including the following contributors (past and
+ present):
+
+ Lee Duggs
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: lee.duggs@netapp.com
+
+ Paul Eastham
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: eastham@netapp.com
+
+ Debbie Futcher
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: deborah.futcher@netapp.com
+
+ Don Gillies
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: gillies@netapp.com
+
+ Steven La
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: steven.la@netapp.com
+
+
+
+
+
+Elson & Cerpa Informational [Page 41]
+\f
+RFC 3507 ICAP April 2003
+
+
+ John Martin
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: jmartin@netapp.com
+
+ Jeff Merrick
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: jeffrey.merrick@netapp.com
+
+ John Schuster
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: john.schuster@netapp.com
+
+ Edward Sharp
+ Network Appliance, Inc.
+ 495 East Java Dr.
+ Sunnyvale, CA 94089 USA
+
+ Phone: (408) 822-6000
+ EMail: edward.sharp@netapp.com
+
+ Peter Danzig
+ Akamai Technologies
+ 1400 Fashion Island Blvd
+ San Mateo, CA 94404 USA
+
+ Phone: (650) 372-5757
+ EMail: danzig@akamai.com
+
+ Mark Nottingham
+ Akamai Technologies
+ 1400 Fashion Island Blvd
+ San Mateo, CA 94404 USA
+
+ Phone: (650) 372-5757
+ EMail: mnot@akamai.com
+
+
+
+
+Elson & Cerpa Informational [Page 42]
+\f
+RFC 3507 ICAP April 2003
+
+
+ Nitin Sharma
+ Akamai Technologies
+ 1400 Fashion Island Blvd
+ San Mateo, CA 94404 USA
+
+ Phone: (650) 372-5757
+ EMail: nitin@akamai.com
+
+ Hilarie Orman
+ Novell, Inc.
+ 122 East 1700 South
+ Provo, UT 84606 USA
+
+ Phone: (801) 861-7021
+ EMail: horman@novell.com
+
+ Craig Blitz
+ Novell, Inc.
+ 122 East 1700 South
+ Provo, UT 84606 USA
+
+ Phone: (801) 861-7021
+ EMail: cblitz@novell.com
+
+ Gary Tomlinson
+ Novell, Inc.
+ 122 East 1700 South
+ Provo, UT 84606 USA
+
+ Phone: (801) 861-7021
+ EMail: garyt@novell.com
+
+ Andre Beck
+ Bell Laboratories / Lucent Technologies
+ 101 Crawfords Corner Road
+ Holmdel, New Jersey 07733-3030
+
+ Phone: (732) 332-5983
+ EMail: abeck@bell-labs.com
+
+ Markus Hofmann
+ Bell Laboratories / Lucent Technologies
+ 101 Crawfords Corner Road
+ Holmdel, New Jersey 07733-3030
+
+ Phone: (732) 332-5983
+ EMail: hofmann@bell-labs.com
+
+
+
+
+Elson & Cerpa Informational [Page 43]
+\f
+RFC 3507 ICAP April 2003
+
+
+ David Bryant
+ CacheFlow, Inc.
+ 650 Almanor Avenue
+ Sunnyvale, California 94086
+
+ Phone: (888) 462-3568
+ EMail: david.bryant@cacheflow.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Elson & Cerpa Informational [Page 44]
+\f
+RFC 3507 ICAP April 2003
+
+
+Appendix A BNF Grammar for ICAP Messages
+
+ This grammar is specified in terms of the augmented Backus-Naur Form
+ (BNF) similar to that used by the HTTP/1.1 specification (see Section
+ 2.1 of [4]). Implementors will need to be familiar with the notation
+ in order to understand this specification.
+
+ Many header values (where noted) have exactly the same grammar and
+ semantics as in HTTP/1.1. We do not reproduce those grammars here.
+
+ ICAP-Version = "ICAP/1.0"
+
+ ICAP-Message = Request | Response
+
+ Request = Request-Line
+ *(Request-Header CRLF)
+ CRLF
+ [ Request-Body ]
+
+ Request-Line = Method SP ICAP_URI SP ICAP-Version CRLF
+
+ Method = "REQMOD" ; Section 4.8
+ | "RESPMOD" ; Section 4.9
+ | "OPTIONS" ; Section 4.10
+ | Extension-Method ; Section 4.3.2
+
+ Extension-Method = token
+
+ ICAP_URI = Scheme ":" Net_Path [ "?" Query ] ; Section 4.2
+
+ Scheme = "icap"
+
+ Net_Path = "//" Authority [ Abs_Path ]
+
+ Authority = [ userinfo "@" ] host [ ":" port ]
+
+
+ Request-Header = Request-Fields ":" [ Generic-Field-Value ]
+
+ Request-Fields = Request-Field-Name
+ | Common-Field-Name
+
+ ; Header fields specific to requests
+ Request-Field-Name = "Authorization" ; Section 4.3.2
+ | "Allow" ; Section 4.3.2
+ | "From" ; Section 4.3.2
+ | "Host" ; Section 4.3.2
+ | "Referer" ; Section 4.3.2
+
+
+
+Elson & Cerpa Informational [Page 45]
+\f
+RFC 3507 ICAP April 2003
+
+
+ | "User-Agent" ; Section 4.3.2
+ | "Preview" ; Section 4.5
+
+ ; Header fields common to both requests and responses
+ Common-Field-Name = "Cache-Control" ; Section 4.3.1
+ | "Connection" ; Section 4.3.1
+ | "Date" ; Section 4.3.1
+ | "Expires" ; Section 4.3.1
+ | "Pragma" ; Section 4.3.1
+ | "Trailer" ; Section 4.3.1
+ | "Upgrade" ; Section 4.3.1
+ | "Encapsulated" ; Section 4.4
+ | Extension-Field-Name ; Section 4.3
+
+ Extension-Field-Name = "X-" token
+
+ Generic-Field-Value = *( Generic-Field-Content | LWS )
+ Generic-Field-Content = <the OCTETs making up the field-value
+ and consisting of either *TEXT or
+ combinations of token, separators,
+ and quoted-string>
+
+ Request-Body = *OCTET ; See Sections 4.4 and 4.5 for semantics
+
+ Response = Status-Line
+ *(Response-Header CRLF)
+ CRLF
+ [ Response-Body ]
+
+ Status-Line = ICAP-Version SP Status-Code SP Reason-Phrase CRLF
+
+ Status-Code = "100" ; Section 4.5
+ | "101" ; Section 10.1.2 of [4]
+ | "200" ; Section 10.2.1 of [4]
+ | "201" ; Section 10.2.2 of [4]
+ | "202" ; Section 10.2.3 of [4]
+ | "203" ; Section 10.2.4 of [4]
+ | "204" ; Section 4.6
+ | "205" ; Section 10.2.6 of [4]
+ | "206" ; Section 10.2.7 of [4]
+ | "300" ; Section 10.3.1 of [4]
+ | "301" ; Section 10.3.2 of [4]
+ | "302" ; Section 10.3.3 of [4]
+ | "303" ; Section 10.3.4 of [4]
+ | "304" ; Section 10.3.5 of [4]
+ | "305" ; Section 10.3.6 of [4]
+ | "306" ; Section 10.3.7 of [4]
+ | "307" ; Section 10.3.8 of [4]
+
+
+
+Elson & Cerpa Informational [Page 46]
+\f
+RFC 3507 ICAP April 2003
+
+
+ | "400" ; Section 4.3.3
+ | "401" ; Section 10.4.2 of [4]
+ | "402" ; Section 10.4.3 of [4]
+ | "403" ; Section 10.4.4 of [4]
+ | "404" ; Section 4.3.3
+ | "405" ; Section 4.3.3
+ | "406" ; Section 10.4.7 of [4]
+ | "407" ; Section 10.4.8 of [4]
+ | "408" ; Section 4.3.3
+ | "409" ; Section 10.4.10 of [4]
+ | "410" ; Section 10.4.11 of [4]
+ | "411" ; Section 10.4.12 of [4]
+ | "412" ; Section 10.4.13 of [4]
+ | "413" ; Section 10.4.14 of [4]
+ | "414" ; Section 10.4.15 of [4]
+ | "415" ; Section 10.4.16 of [4]
+ | "416" ; Section 10.4.17 of [4]
+ | "417" ; Section 10.4.18 of [4]
+ | "500" ; Section 4.3.3
+ | "501" ; Section 4.3.3
+ | "502" ; Section 4.3.3
+ | "503" ; Section 4.3.3
+ | "504" ; Section 10.5.5 of [4]
+ | "505" ; Section 4.3.3
+ | Extension-Code
+
+ Extension-Code = 3DIGIT
+
+ Reason-Phrase = *<TEXT, excluding CR, LF>
+
+ Response-Header = Response-Fields ":" [ Generic-Field-Value ]
+
+ Response-Fields = Response-Field-Name
+ | Common-Field-Name
+
+ Response-Field-Name = "Server" ; Section 4.3.3
+ | "ISTag" ; Section 4.7
+
+ Response-Body = *OCTET ; See Sections 4.4 and 4.5 for semantics
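+
+ As an informal illustration, not part of the grammar itself, the
+ Response rules above can be exercised with a short Python sketch.
+ The names parse_response_head and classify_field are hypothetical.
+ The sketch assumes that ICAP-Version has the form "ICAP/" digits "."
+ digits, that field names are compared case-insensitively as in HTTP,
+ and that header values are not folded across lines (LWS
+ continuation is not handled). The sample ISTag value is made up.
+

```python
import re

# Field names copied from the Response-Field-Name and Common-Field-Name rules,
# lowercased because this sketch assumes case-insensitive comparison.
RESPONSE_FIELD_NAMES = {"server", "istag"}
COMMON_FIELD_NAMES = {
    "cache-control", "connection", "date", "expires",
    "pragma", "trailer", "upgrade", "encapsulated",
}

# Extension-Field-Name = "X-" token (token characters as in HTTP/1.1).
EXTENSION_FIELD = re.compile(r"X-[!#$%&'*+\-.^_`|~0-9A-Za-z]+", re.IGNORECASE)

# Status-Line = ICAP-Version SP Status-Code SP Reason-Phrase CRLF
# Assumption: ICAP-Version looks like "ICAP/" digits "." digits.
STATUS_LINE = re.compile(r"(ICAP/\d+\.\d+) (\d{3}) ([^\r\n]*)")


def classify_field(name):
    """Return the grammar rule ('response', 'common', 'extension') for a field name."""
    lowered = name.lower()
    if lowered in RESPONSE_FIELD_NAMES:
        return "response"
    if lowered in COMMON_FIELD_NAMES:
        return "common"
    if EXTENSION_FIELD.fullmatch(name):
        return "extension"
    raise ValueError("field name not allowed by the Response grammar: %r" % name)


def parse_response_head(raw):
    """Parse the Status-Line and Response-Header lines of a raw ICAP response.

    Returns (version, status_code, reason_phrase, headers), where headers is
    a list of (name, rule, value) tuples. The Response-Body is ignored.
    """
    head, separator, _body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("missing empty line after the header block")
    lines = head.decode("latin-1").split("\r\n")

    match = STATUS_LINE.fullmatch(lines[0])
    if match is None:
        raise ValueError("malformed Status-Line: %r" % lines[0])
    version, code, reason = match.group(1), int(match.group(2)), match.group(3)

    headers = []
    for line in lines[1:]:
        # Response-Header = Response-Fields ":" [ Generic-Field-Value ]
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("header line without ':': %r" % line)
        headers.append((name, classify_field(name), value.strip()))
    return version, code, reason, headers
```

+ For example, calling parse_response_head on a response whose head
+ carries ISTag, Encapsulated and X-Example fields labels them
+ "response", "common" and "extension" respectively. A field such as
+ Content-Length, which none of the rules above admit, is rejected
+ with ValueError.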
+
+
+Authors' Addresses
+
+ Jeremy Elson
+ University of California Los Angeles
+ Department of Computer Science
+ 3440 Boelter Hall
+ Los Angeles CA 90095
+
+ Phone: (310) 206-3925
+ EMail: jelson@cs.ucla.edu
+
+
+ Alberto Cerpa
+ University of California Los Angeles
+ Department of Computer Science
+ 3440 Boelter Hall
+ Los Angeles CA 90095
+
+ Phone: (310) 206-3925
+ EMail: cerpa@cs.ucla.edu
+
+
+ ICAP discussion currently takes place at
+ icap-discussions@yahoogroups.com.
+ For more information, see
+ http://groups.yahoo.com/group/icap-discussions/.
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2003). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+