From: wessels <> Date: Thu, 20 May 1999 02:48:47 +0000 (+0000) Subject: yucky formatting X-Git-Tag: SQUID_3_0_PRE1~2199 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=af30250328e15abd9cd17e98e960e52ddc7e1afb;p=thirdparty%2Fsquid.git yucky formatting --- diff --git a/doc/Programming-Guide/prog-guide.sgml b/doc/Programming-Guide/prog-guide.sgml index 705c5710cf..5e37b74e62 100644 --- a/doc/Programming-Guide/prog-guide.sgml +++ b/doc/Programming-Guide/prog-guide.sgml @@ -18,175 +18,184 @@ or improve it. Introduction -

-The Squid source code has evolved more from empirical observation and -tinkering, rather than a solid design process. It carries a legacy of -being ``touched'' by numerous individuals, each with somewhat different -techniques and terminology. - -

-Squid is a single-process proxy server. Every request is handled by -the main process, with the exception of FTP. However, Squid does not -use a ``threads package'' such as Pthreads. While this might be -easier to code, it suffers from portability and performance problems. -Instead Squid maintains data structures and state information for -each active request. - -

-The code is often difficult to follow because there are no explicit -state variables for the active requests. Instead, thread execution -progresses as a sequence of ``callback functions'' which get executed -when I/O is ready to occur, or some other event has happened. As -a callback function completes, it is responsible for registering the -next callback function for subsequent I/O. - -

-Note there is only a pseudo-consistent naming scheme. In most -cases functions are named like -Note that the Squid source changes rapidly, and some parts of this -document may become out-of-date. If you find any inconsistencies, please -feel free to notify -. +

+ The Squid source code has evolved more from empirical + observation and tinkering, rather than a solid design + process. It carries a legacy of being ``touched'' by + numerous individuals, each with somewhat different techniques + and terminology. + +

+ Squid is a single-process proxy server. Every request is + handled by the main process, with the exception of FTP. + However, Squid does not use a ``threads package'' such as + Pthreads. While this might be easier to code, it suffers + from portability and performance problems. Instead Squid + maintains data structures and state information for each + active request. + 

+ The code is often difficult to follow because there are no + explicit state variables for the active requests. Instead, + thread execution progresses as a sequence of ``callback + functions'' which get executed when I/O is ready to occur, + or some other event has happened. As a callback function + completes, it is responsible for registering the next + callback function for subsequent I/O. + +
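The callback chaining described in this paragraph can be illustrated with a toy model. This is only a sketch under stated assumptions: every name below (toy_set_handler, toy_dispatch, toy_request, and so on) is hypothetical and not taken from the Squid source, and readiness is simulated instead of coming from select(). It shows per-request state held in a structure, one-shot handlers that are cleared before they run, and each handler registering the next step before it returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: a toy model, not Squid's code. */
#define TOY_MAX_FD 8

typedef void TOYHANDLER(int fd, void *data);

struct toy_slot {
    TOYHANDLER *handler;   /* NULL when nothing is registered */
    void *data;            /* callback data handed back to the handler */
};

static struct toy_slot toy_table[TOY_MAX_FD];

/* Register a one-shot handler for fd; passing NULL clears it. */
static void toy_set_handler(int fd, TOYHANDLER *handler, void *data)
{
    assert(fd >= 0 && fd < TOY_MAX_FD);
    toy_table[fd].handler = handler;
    toy_table[fd].data = data;
}

/* Pretend fd became ready: clear the slot first, then call the handler.
 * Returns 1 if a handler ran, 0 if none was registered. */
static int toy_dispatch(int fd)
{
    struct toy_slot slot = toy_table[fd];
    if (slot.handler == NULL)
        return 0;
    toy_table[fd].handler = NULL;
    toy_table[fd].data = NULL;
    slot.handler(fd, slot.data);
    return 1;
}

/* Per-request state lives in a structure, not on a thread's stack. */
struct toy_request {
    char trace[32];
};

static void toy_write_reply(int fd, void *data);

static void toy_read_request(int fd, void *data)
{
    struct toy_request *req = data;
    strcat(req->trace, "read;");
    /* Before returning, register the next step of the request. */
    toy_set_handler(fd, toy_write_reply, req);
}

static void toy_write_reply(int fd, void *data)
{
    struct toy_request *req = data;
    (void) fd;
    strcat(req->trace, "write;");
    /* Nothing is re-registered, so the chain ends here. */
}

/* Drive one request through both steps; returns how many callbacks ran. */
static int toy_run_request(int fd, struct toy_request *req)
{
    int calls = 0;
    req->trace[0] = '\0';
    toy_set_handler(fd, toy_read_request, req);
    while (toy_dispatch(fd))
        calls++;
    return calls;
}
```

Calling toy_run_request(3, &req) from a main function runs the read step and then the write step. The order of execution is visible only through which handler is registered at each moment, which is why code written this way can be hard to follow.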

+ Note there is only a pseudo-consistent naming scheme. In + most cases functions are named like + Note that the Squid source changes rapidly, and some parts + of this document may become out-of-date. If you find any + inconsistencies, please feel free to notify . Conventions -

-Function names and file names will be written in a courier font, such -as + Function names and file names will be written in a courier + font, such as Source Code Overview -

+Overview of Squid Components + +

Squid consists of the following major components Client Side -

- Here new client connections are accepted, parsed, and processed. - This is where we determine if the request is a cache HIT, - REFRESH, MISS, etc. With HTTP/1.1 we may have multiple requests - from a single TCP connection. Per-connection state information - is held in a data structure called + Here new client connections are accepted, parsed, and + processed. This is where we determine if the request is + a cache HIT, REFRESH, MISS, etc. With HTTP/1.1 we may have + multiple requests from a single TCP connection. Per-connection + state information is held in a data structure called + Server Side -

- These routines are responsible for forwarding cache misses - to other servers, depending on the protocol. Cache misses - may be forwarded to either origin servers, or other proxy caches. - Note that all requests (FTP, Gopher) to other - proxies are sent as HTTP requests. - + These routines are responsible for forwarding cache misses + to other servers, depending on the protocol. Cache misses + may be forwarded to either origin servers, or other proxy + caches. Note that all requests (FTP, Gopher) to other + proxies are sent as HTTP requests. Storage Manager -

- The Storage Manager is the glue between client and server sides. - Every object saved in the cache is allocated a - Squid can quickly locate cached objects because it keeps (in memory) a hash - table of all - Objects are saved to disk in a two-level directory structure. For - each object the - Client-side requests register themselves with a + The Storage Manager is the glue between client and server + sides. Every object saved in the cache is allocated a + + Squid can quickly locate cached objects because it keeps + (in memory) a hash table of all + Objects are saved to disk in a two-level directory structure. + For each object the + Client-side requests register themselves with a Request Forwarding Peer Selection -

- These functions are responsible for selecting - one (or none) of the neighbor caches as the appropriate forwarding - location. +

+ These functions are responsible for selecting one (or none) + of the neighbor caches as the appropriate forwarding + location. Access Control -

- These functions are responsible for allowing - or denying a request, based on a number of different parameters. - These parameters include the client's IP address, the hostname - of the requested resource, the request method, etc. - Some of the necessary information may not be immediately available, - for example the origin server's IP address. In these cases, - the ACL routines initiate lookups for the necessary information and - continue the access control checks when the information is - available. +

+ These functions are responsible for allowing or denying a + request, based on a number of different parameters. These + parameters include the client's IP address, the hostname + of the requested resource, the request method, etc. Some + of the necessary information may not be immediately available, + for example the origin server's IP address. In these cases, + the ACL routines initiate lookups for the necessary + information and continue the access control checks when + the information is available. Network Communication -

- These are the routines for communicating over - TCP and UDP network sockets. Here is where sockets are opened, - closed, read, and written. In addition, note that the heart of - Squid ( + These are the routines for communicating over TCP and UDP + network sockets. Here is where sockets are opened, closed, + read, and written. In addition, note that the heart of + Squid (File/Disk I/O -

- Routines for reading and writing disk files (and FIFOs). - Reasons for separating network and - disk I/O functions are partly historical, and partly because of - different behaviors. For example, we don't worry about getting a - ``No space left on device'' error for network sockets. The disk - I/O routines support queuing of multiple blocks for writing. - In some cases, it is possible to merge multiple blocks into - a single write request. The write callback does not necessarily - occur for every write request. +

+ Routines for reading and writing disk files (and FIFOs). + Reasons for separating network and disk I/O functions are + partly historical, and partly because of different behaviors. + For example, we don't worry about getting a ``No space left + on device'' error for network sockets. The disk I/O routines + support queuing of multiple blocks for writing. In some + cases, it is possible to merge multiple blocks into a single + write request. The write callback does not necessarily + occur for every write request. Neighbors -

- Maintains the list of neighbor caches. Sends and receives - ICP messages to neighbors. Decides which neighbors to - query for a given request. File: + Maintains the list of neighbor caches. Sends and receives + ICP messages to neighbors. Decides which neighbors to + query for a given request. File: IP/FQDN Cache -

- A cache of name-to-address and address-to-name lookups. These are - hash tables keyed on the names and addresses. - + A cache of name-to-address and address-to-name lookups. + These are hash tables keyed on the names and addresses. + Cache Manager -

+

This provides access to certain information needed by the cache administrator. A companion program, Network Measurement Database -

+

In a number of situations, Squid finds it useful to know the estimated network round-trip time (RTT) between itself and origin servers. A particularly useful example is @@ -220,7 +229,7 @@ Squid consists of the following major components Redirectors -

+

Squid has the ability to rewrite requests from clients. After checking the access controls, but before checking for cache hits, requested URLs may optionally be written to an external @@ -231,7 +240,7 @@ Squid consists of the following major components Autonomous System Numbers -

+

Squid supports Autonomous System (AS) numbers as another access control element. The routines in Configuration File Parsing -

+

The primary configuration file specification is in the file Callback Data Database -

+

Squid's extensive use of callback functions makes it very susceptible to memory access errors. Care must be taken so that the Debugging -

+

Squid includes extensive debugging statements to assist in tracking down bugs and strange behavior. Every debug statement is assigned a section and level. Usually, every debug statement @@ -276,23 +285,23 @@ Squid consists of the following major components Error Generation -

+

The routines in Event Queue -

+

The routines in Filedescriptor Management -

+

Here we track the number of filedescriptors in use, and the number of bytes which have been read from or written to each file descriptor. @@ -300,14 +309,14 @@ Squid consists of the following major components Hashtable Support -

+

These routines implement generic hash tables. A hash table is created with a function for hashing the key values, and a function for comparing the key values. HTTP Anonymization -

+

These routines support anonymizing of HTTP requests leaving the cache. Either specific request headers will be removed (the ``standard'' mode), or only specific request headers @@ -316,7 +325,7 @@ Squid consists of the following major components Internet Cache Protocol -

+

Here we implement the Internet Cache Protocol. This protocol is documented in RFC 2186 and RFC 2187. The bulk of the code is in the Ident Lookups -

+

These routines support RFC 931 ``Ident'' lookups. An ident server running on a host will report the user name associated with a connected TCP socket. Some sites use this facility for @@ -335,7 +344,7 @@ Squid consists of the following major components Memory Management -

+

These routines allocate and manage pools of memory for frequently-used data structures. When the Multicast Support -

+

Currently, multicast is only used for ICP queries. The routines in this file implement joining a UDP socket to a multicast group (or groups), and setting @@ -353,7 +362,7 @@ Squid consists of the following major components Persistent Server Connections -

+

These routines manage idle, persistent HTTP connections to origin servers and neighbor caches. Idle sockets are indexed in a hash table by their socket address @@ -364,7 +373,7 @@ Squid consists of the following major components Refresh Rules -

+

These routines decide whether a cached object is stale or fresh, based on the SNMP Support -

+

These routines implement SNMP for Squid. At the present time, we have made almost all of the cachemgr information available via SNMP. URN Support -

-We are experimenting with URN support in Squid version 1.2. Note, -we're not talking full-blown generic URN's here. This is primarily -targeted towards using URN's as a smart way of handling lists of -mirror sites. For more details, please see - + We are experimenting with URN support in Squid version 1.2. + Note, we're not talking full-blown generic URN's here. This + is primarily targeted towards using URN's as a smart way + of handling lists of mirror sites. For more details, please + see . @@ -393,18 +402,19 @@ mirror sites. For more details, please see dnsserver -

- Because the standard + Because the standard pinger -

+

Although it would be possible for Squid to send and receive ICMP messages directly, we use an external process for two important reasons: @@ -417,10 +427,10 @@ mirror sites. For more details, please see we prefer to have the smaller and simpler - + unlinkd -

+

The redirector -

+

A redirector process reads URLs on stdin and writes (possibly changed) URLs on stdout. It is implemented as an external process to maximize flexibility. Flow of a Typical Request -

- - -A client connection is accepted by the -The access controls are checked. The client-side builds an -ACL state data structure and registers a callback function -for notification when access control checking is completed. - - -After the access controls have been verified, the client-side looks for -the requested object in the cache. If it is a cache hit, then the -client-side registers its interest in the -The request-forwarding process begins with -When the ICP replies (if any) have been processed, we end up -at -The HTTP module first opens a connection to the origin server -or cache peer. If there is no idle persistent socket available, -a new connection request is given to the Network Communication -module with a callback function. The -When a TCP connection has been established, HTTP builds a request -buffer and submits it for writing on the socket. It then registers -a read handler to receive and process the HTTP reply. - - -As the reply is initially received, the HTTP reply headers are -parsed and placed into a reply data structure. As reply data -is read, it is appended to the -As the client-side is notified of new data, it copies the data -from the StoreEntry and submits it for writing on the client socket. - - -As data is appended to the -When the HTTP module finishes reading the reply from the upstream -server, it marks the -When the client-side has written all of the object data, it unregisters -itself from the +

+ + + A client connection is accepted by the + The access controls are checked. The client-side builds + an ACL state data structure and registers a callback function + for notification when access control checking is completed. + + + After the access controls have been verified, the client-side + looks for the requested object in the cache. If it is a cache + hit, then the client-side registers its interest in the + + The request-forwarding process begins with + When the ICP replies (if any) have been processed, we end + up at + The HTTP module first opens a connection to the origin + server or cache peer. If there is no idle persistent socket + available, a new connection request is given to the Network + Communication module with a callback function. The + + When a TCP connection has been established, HTTP builds a + request buffer and submits it for writing on the socket. + It then registers a read handler to receive and process + the HTTP reply. + + + As the reply is initially received, the HTTP reply headers + are parsed and placed into a reply data structure. As + reply data is read, it is appended to the + As the client-side is notified of new data, it copies the + data from the StoreEntry and submits it for writing on the + client socket. + + + As data is appended to the + When the HTTP module finishes reading the reply from the + upstream server, it marks the + When the client-side has written all of the object data, + it unregisters itself from the Callback Functions The Main Loop: -At the core of Squid is the -The + At the core of Squid is the + The commSetSelect(fd, COMM_SELECT_READ, clientReadRequest, conn, 0); -In this example, clientReadRequest(fd, conn); -

-The I/O handlers are reset every time they are called. In other words, -a handler function must re-register itself with + The I/O handlers are reset every time they are called. In + other words, a handler function must re-register itself + with commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0); -

-These I/O handlers (and others) and their associated callback data -pointers are saved in the + These I/O handlers (and others) and their associated callback + data pointers are saved in the struct _fde { ... @@ -558,75 +576,77 @@ pointers are saved in the - -In some situations we want to defer reading from a filedescriptor, -even though it has data for us to read. This may be the case -when data arrives from the server-side faster than it can -be written to the client-side. -Before adding a filedescriptor to the ``read set'' for select, -we call -These handlers are stored in the + In some situations we want to defer reading from a + filedescriptor, even though it has data for us to read. + This may be the case when data arrives from the server-side + faster than it can be written to the client-side. Before + adding a filedescriptor to the ``read set'' for select, we + call + These handlers are stored in the typedef void (*PF) (int, void *); -The close handler is really a linked list of handler functions. -Each handler also has an associated pointer - -After each handler is called, -Typical read handlers are - -The close handlers are normally called from -The timeout and lifetime handlers are called for file descriptors which -have been idle for too long. They are further discussed in a following -chapter. + The close handler is really a linked list of handler + functions. Each handler also has an associated pointer + + + After each handler is called, + Typical read handlers are + + The close handlers are normally called from + The timeout and lifetime handlers are called for file + descriptors which have been idle for too long. They are + further discussed in a following chapter. Processing Client Requests @@ -642,73 +662,81 @@ chapter. Introduction -

-The IP cache is a built-in component of squid providing -Hostname to IP-Number translation functionality and managing -the involved data-structures. Efficiency concerns require -mechanisms that allow non-blocking access to these mappings. -The IP cache usually doesn't block on a request except for special -cases where this is desired (see below). +

+ The IP cache is a built-in component of squid providing + Hostname to IP-Number translation functionality and managing + the involved data-structures. Efficiency concerns require + mechanisms that allow non-blocking access to these mappings. + The IP cache usually doesn't block on a request except for + special cases where this is desired (see below). Data Structures -

-The data structure used for storing name-address mappings -is a small hashtable (static hash_table *ip_table), -which holds structures of type ipcache_entry, whose most -interesting members are: +

+ The data structure used for storing name-address mappings + is a small hashtable (static hash_table *ip_table), + which holds structures of type ipcache_entry, whose most + interesting members are: -struct _ipcache_entry { -char *name; -time_t lastref; -ipcache_addrs addrs; -struct _ip_pending *pending_head; -char *error_message; -unsigned char locks; -ipcache_status_t status:3; -} + struct _ipcache_entry { + char *name; + time_t lastref; + ipcache_addrs addrs; + struct _ip_pending *pending_head; + char *error_message; + unsigned char locks; + ipcache_status_t status:3; + } External overview -

-Main functionality -is provided through calls to: - -ipcache_nbgethostbyname(const char *name, IPH *handler, void *handlerdata) - -where ipcache_gethostbyname(const char *name,int flags) -is different in that -it only checks if an entry exists in its data-structures and does not by -default contact the DNS, unless this is requested, by setting the ipcache_init() is called from ipcache_restart() is called to clear the IP cache's data structures, -cancel all pending requests. Currently, it is only called from - +

+ Main functionality + is provided through calls to: + + + ipcache_nbgethostbyname(const char *name, IPH *handler, + void *handlerdata) + where ipcache_gethostbyname(const char *name,int flags) + is different in that it only checks if an entry exists in + its data-structures and does not by default contact the + DNS, unless this is requested, by setting the ipcache_init() is called from ipcache_restart() is called to clear the IP + cache's data structures, cancel all pending requests. + Currently, it is only called from Internal Operation -

-Internally, the execution flow is as follows: On a miss, - + Internally, the execution flow is as follows: On a miss, + Server Protocols @@ -739,7 +767,7 @@ according to the Callback Data Database -

+

Squid's extensive use of callback functions makes it very susceptible to memory access errors. For a blocking operation with callback functions, the normal sequence of events is as @@ -758,7 +786,7 @@ according to the +

The callback data database lets us do this in a uniform and safe manner. Every callback_data pointer must be added to the database. It is then locked while the blocking operation executes @@ -778,7 +806,7 @@ according to the -

+

With this scheme, nothing bad happens if @@ -807,16 +835,18 @@ according to the HTTP Headers -

- + +

+ General remarks -

+

+

Most operations on Life cycle -

+

-

+

Prior to use, an +

Once initialized, the +

Note that there are no methods for "creating" or "destroying" a "dynamic" Header Manipulation. -

+

The most common operations on HTTP headers are testing for a particular header-field ( +

+

+

Special care must be taken when several header-fields with the same id are present in the header. If the HTTP protocol allows only one copy of the specified field per header @@ -910,13 +940,13 @@ according to the +

It is prohibited to ask for a List of values when only one value is permitted, and vice versa. This restriction prevents a programmer from processing one value of a header-field while ignoring other valid values. -

+

+

The value being put using one of the +

Example: @@ -939,7 +969,7 @@ according to the -

+

There are two ways to delete a field from a header. To delete a "known" field (a field with "id" other than +

The I/O and Headers. -

+

To store a header in a file or socket, pack it using Adding new header-field ids. -

+

Adding new ids is simple. First add a new HDR_ entry to the http_hdr_type enumeration in enums.h. Then describe the new header-field's attributes in the HeadersAttrs array located @@ -998,7 +1028,7 @@ according to the +

Finally, add the new id to one of the following arrays: +

Also, if the new field is a "list" header, add it to the +

In most cases, if you forget to include a new field id in one of the required arrays, you will get a run-time assertion. For rarely used fields, however, it may take a long time for an assertion to be triggered. -

+

There is virtually no limit on the number of fields supported by Squid. If current mask sizes cannot fit all the ids (you will get an assertion if that happens), simply enlarge @@ -1033,20 +1063,20 @@ according to the A Word on Efficiency. -

+

+

Adding new fields is somewhat expensive if they require complex conversions to a string. -

+

Deleting existing fields requires a scan of all the entries, comparing their "id"s (faster) or "names" (slower) with the one specified for deletion. -

+

Most of the operations are faster than their "ascii string" equivalents.
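
The cost difference between deleting by "id" and deleting by "name" can be sketched as follows. This is a minimal illustration, not Squid's header code: the types and functions (toy_header, toy_hdr_add, toy_del_by_id, toy_del_by_name) and the fixed-size entry array are assumptions made for the example. Both deletions scan every entry, but the id version does one integer comparison per entry while the name version does a case-insensitive string comparison.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical types and names; Squid's real header structures differ. */
#define TOY_MAX_ENTRIES 16

enum toy_hdr_id { TOY_HDR_HOST, TOY_HDR_VIA, TOY_HDR_OTHER };

struct toy_hdr_entry {
    enum toy_hdr_id id;
    const char *name;
    const char *value;
};

struct toy_header {
    struct toy_hdr_entry entries[TOY_MAX_ENTRIES];
    size_t count;
};

/* Append a field; returns 1 on success, 0 if the header is full. */
static int toy_hdr_add(struct toy_header *h, enum toy_hdr_id id,
                       const char *name, const char *value)
{
    if (h->count >= TOY_MAX_ENTRIES)
        return 0;
    h->entries[h->count].id = id;
    h->entries[h->count].name = name;
    h->entries[h->count].value = value;
    h->count++;
    return 1;
}

/* HTTP field names are case-insensitive, so compare them that way. */
static int toy_name_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Delete by id: one integer comparison per entry. Returns fields removed. */
static size_t toy_del_by_id(struct toy_header *h, enum toy_hdr_id id)
{
    size_t kept = 0, removed, i;
    for (i = 0; i < h->count; i++)
        if (h->entries[i].id != id)
            h->entries[kept++] = h->entries[i];
    removed = h->count - kept;
    h->count = kept;
    return removed;
}

/* Delete by name: one string comparison per entry. Returns fields removed. */
static size_t toy_del_by_name(struct toy_header *h, const char *name)
{
    size_t kept = 0, removed, i;
    for (i = 0; i < h->count; i++)
        if (!toy_name_eq(h->entries[i].name, name))
            h->entries[kept++] = h->entries[i];
    removed = h->count - kept;
    h->count = kept;
    return removed;
}
```

Both functions compact the array in place and remove every matching entry, so a header carrying several fields with the same id loses all of them in one pass.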