/* * Copyright (C) 1996-2014 The Squid Software Foundation and contributors * * Squid software is distributed under GPLv2+ license and includes * contributions from numerous individuals and organizations. * Please see the COPYING and CONTRIBUTORS files for details. */ /** \ingroup Component \section Overview of Squid Components \par Squid consists of the following major components \section ClientSideSocket Client Side Socket \par Here new client connections are accepted, parsed, and reply data sent. Per-connection state information is held in a data structure called ConnStateData. Per-request state information is stored in the clientSocketContext structure. With HTTP/1.1 we may have multiple requests from a single TCP connection. \todo DOCS: find out what has replaced clientSocketContext since it seems to not exist now. \section ClientSideRequest Client Side Request \par This is where requests are processed. We determine if the request is to be redirected, if it passes access lists, and setup the initial client stream for internal requests. Temporary state for this processing is held in a clientRequestContext. \todo DOCS: find out what has replaced clientRequestContext since it seems not to exist now. \section ClientSideReply Client Side Reply \par This is where we determine if the request is cache HIT, REFRESH, MISS, etc. This involves querying the store (possibly multiple times) to work through Vary lists and the list. Per-request state information is stored in the clientReplyContext. \section StorageManager Storage Manager \par The Storage Manager is the glue between client and server sides. Every object saved in the cache is allocated a StoreEntry structure. While the object is being accessed, it also has a MemObject structure. \par Squid can quickly locate cached objects because it keeps (in memory) a hash table of all StoreEntry's. The keys for the hash table are MD5 checksums of the objects URI. In addition there is also a storage policy such as LRU that keeps track of the objects and determines the removal order when space needs to be reclaimed. For the LRU policy this is implemented as a doubly linked list. \par For each object the StoreEntry maps to a cache_dir and location via sdirno and sfileno. For the "ufs" store this file number (sfileno) is converted to a disk pathname by a simple modulo of L2 and L1, but other storage drivers may map sfilen in other ways. A cache swap file consists of two parts: the cache metadata, and the object data. Note the object data includes the full HTTP reply---headers and body. The HTTP reply headers are not the same as the cache metadata. \par Client-side requests register themselves with a StoreEntry to be notified when new data arrives. Multiple clients may receive data via a single StoreEntry. For POST and PUT request, this process works in reverse. Server-side functions are notified when additional data is read from the client. \section RequestForwarding Request Forwarding \section PeerSelection Peer Selection \par These functions are responsible for selecting one (or none) of the neighbor caches as the appropriate forwarding location. \section AccessControl Access Control \par These functions are responsible for allowing or denying a request, based on a number of different parameters. These parameters include the client's IP address, the hostname of the requested resource, the request method, etc. Some of the necessary information may not be immediately available, for example the origin server's IP address. In these cases, the ACL routines initiate lookups for the necessary information and continues the access control checks when the information is available. \section AuthenticationFramework Authentication Framework \par These functions are responsible for handling HTTP authentication. They follow a modular framework allow different authentication schemes to be added at will. For information on working with the authentication schemes See the chapter Authentication Framework. \section NetworkCommunication Network Communication \par These are the routines for communicating over TCP and UDP network sockets. Here is where sockets are opened, closed, read, and written. In addition, note that the heart of Squid (comm_select() or comm_poll()) exists here, even though it handles all file descriptors, not just network sockets. These routines do not support queuing multiple blocks of data for writing. Consequently, a callback occurs for every write request. \todo DOCS: decide what to do for comm_poll() since its either obsolete or uses other names. \section FileDiskIO File/Disk I/O \par Routines for reading and writing disk files (and FIFOs). Reasons for separating network and disk I/O functions are partly historical, and partly because of different behaviors. For example, we don't worry about getting a "No space left on device" error for network sockets. The disk I/O routines support queuing of multiple blocks for writing. In some cases, it is possible to merge multiple blocks into a single write request. The write callback does not necessarily occur for every write request. \section Neighbors Neighbors \par Maintains the list of neighbor caches. Sends and receives ICP messages to neighbors. Decides which neighbors to query for a given request. File: neighbors.c. \section FQDNCache IP/FQDN Cache \par A cache of name-to-address and address-to-name lookups. These are hash tables keyed on the names and addresses. ipcache_nbgethostbyname() and fqdncache_nbgethostbyaddr() implement the non-blocking lookups. Files: ipcache.c, fqdncache.c. \section CacheManager Cache Manager \par This provides access to certain information needed by the cache administrator. A companion program, cachemgr.cgi can be used to make this information available via a Web browser. Cache manager requests to Squid are made with a special URL of the form \code cache_object://hostname/operation \endcode The cache manager provides essentially "read-only" access to information. It does not provide a method for configuring Squid while it is running. \todo DOCS: get cachemgr.cgi documenting \section NetworkMeasurementDB Network Measurement Database \par In a number of situation, Squid finds it useful to know the estimated network round-trip time (RTT) between itself and origin servers. A particularly useful is example is the peer selection algorithm. By making RTT measurements, a Squid cache will know if it, or one if its neighbors, is closest to a given origin server. The actual measurements are made with the pinger program, described below. The measured values are stored in a database indexed under two keys. The primary index field is the /24 prefix of the origin server's IP address. Secondly, a hash table of fully-qualified host names that have data structures with links to the appropriate network entry. This allows Squid to quickly look up measurements when given either an IP address, or a host name. The /24 prefix aggregation is used to reduce the overall database size. File: net_db.c. \section Redirectors Redirectors \par Squid has the ability to rewrite requests from clients. After checking the ACL access controls, but before checking for cache hits, requested URLs may optionally be written to an external redirector process. This program, which can be highly customized, may return a new URL to replace the original request. Common applications for this feature are extended access controls and local mirroring. File: redirect.c. \section ASN Autonomous System Numbers \par Squid supports Autonomous System (AS) numbers as another access control element. The routines in asn.c query databases which map AS numbers into lists of CIDR prefixes. These results are stored in a radix tree which allows fast searching of the AS number for a given IP address. \section ConfigurationFileParsing Configuration File Parsing \par The primary configuration file specification is in the file cf.data.pre. A simple utility program, cf_gen, reads the cf.data.pre file and generates cf_parser.c and squid.conf. cf_parser.c is included directly into cache_cf.c at compile time. \todo DOCS: get cf.data.pre documenting \todo DOCS: get squid.conf documenting \todo DOCS: get cf_gen documenting and linking. \section Callback Data Allocator \par Squid's extensive use of callback functions makes it very susceptible to memory access errors. Care must be taken so that the callback_data memory is still valid when the callback function is executed. The routines in cbdata.c provide a uniform method for managing callback data memory, canceling callbacks, and preventing erroneous memory accesses. \todo DOCS: get callback_data (object?) linking or repalcement named. \section RefCountDataAllocator Refcount Data Allocator \since Squid 3.0 \par Manual reference counting such as cbdata uses is error prone, and time consuming for the programmer. C++'s operator overloading allows us to create automatic reference counting pointers, that will free objects when they are no longer needed. With some care these objects can be passed to functions needed Callback Data pointers. \todo DOCS: get cbdata documenting and linking. \section Debugging Debugging \par Squid includes extensive debugging statements to assist in tracking down bugs and strange behavior. Every debug statement is assigned a section and level. Usually, every debug statement in the same source file has the same section. Levels are chosen depending on how much output will be generated, or how useful the provided information will be. The \em debug_options line in the configuration file determines which debug statements will be shown and which will not. The \em debug_options line assigns a maximum level for every section. If a given debug statement has a level less than or equal to the configured level for that section, it will be shown. This description probably sounds more complicated than it really is. File: debug.c. Note that debugs() itself is a macro. \todo DOCS: get debugs() documenting as if it was a function. \section ErrorGeneration Error Generation \par The routines in errorpage.c generate error messages from a template file and specific request parameters. This allows for customized error messages and multilingual support. \section EventQueue Event Queue \par The routines in event.c maintain a linked-list event queue for functions to be executed at a future time. The event queue is used for periodic functions such as performing cache replacement, cleaning swap directories, as well as one-time functions such as ICP query timeouts. \section FiledescriptorManagement Filedescriptor Management \par Here we track the number of filedescriptors in use, and the number of bytes which has been read from or written to each file descriptor. \section HashtableSupport Hashtable Support \par These routines implement generic hash tables. A hash table is created with a function for hashing the key values, and a function for comparing the key values. \section HTTPAnonymization HTTP Anonymization \par These routines support anonymizing of HTTP requests leaving the cache. Either specific request headers will be removed (the "standard" mode), or only specific request headers will be allowed (the "paranoid" mode). \section DelayPools Delay Pools \par Delay pools provide bandwidth regulation by restricting the rate at which squid reads from a server before sending to a client. They do not prevent cache hits from being sent at maximal capacity. Delay pools can aggregate the bandwidth from multiple machines and users to provide more or less general restrictions. \section ICPSupport Internet Cache Protocol \par Here we implement the Internet Cache Protocol. This protocol is documented in the RFC 2186 and RFC 2187. The bulk of code is in the icp_v2.c file. The other, icp_v3.c is a single function for handling ICP queries from Netcache/Netapp caches; they use a different version number and a slightly different message format. \todo DOCS: get RFCs linked from ietf \section IdentLookups Ident Lookups \par These routines support RFC 931 (http://www.ietf.org/rfc/rfc931.txt) "Ident" lookups. An ident server running on a host will report the user name associated with a connected TCP socket. Some sites use this facility for access control and logging purposes. \section MemoryManagement Memory Management \par These routines allocate and manage pools of memory for frequently-used data structures. When the \em memory_pools configuration option is enabled, unused memory is not actually freed. Instead it is kept for future use. This may result in more efficient use of memory at the expense of a larger process size. \section MulticastSupport Multicast Support \par Currently, multicast is only used for ICP queries. The routines in this file implement joining a UDP socket to a multicast group (or groups), and setting the multicast TTL value on outgoing packets. \section PresistentConnections Persistent Server Connections \par These routines manage idle, persistent HTTP connections to origin servers and neighbor caches. Idle sockets are indexed in a hash table by their socket address (IP address and port number). Up to 10 idle sockets will be kept for each socket address, but only for 15 seconds. After 15 seconds, idle socket connections are closed. \section RefreshRules Refresh Rules \par These routines decide whether a cached object is stale or fresh, based on the \em refresh_pattern configuration options. If an object is fresh, it can be returned as a cache hit. If it is stale, then it must be revalidated with an If-Modified-Since request. \section SNMPSupport SNMP Support \par These routines implement SNMP for Squid. At the present time, we have made almost all of the cachemgr information available via SNMP. \section URNSupport URN Support \par We are experimenting with URN support in Squid version 1.2. Note, we're not talking full-blown generic URN's here. This is primarily targeted toward using URN's as an smart way of handling lists of mirror sites. For more details, please see (http://squid.nlanr.net/Squid/urn-support.html) URN Support in Squid . \section ESI ESI \par ESI is an implementation of Edge Side Includes (http://www.esi.org). ESI is implemented as a client side stream and a small modification to client_side_reply.c to check whether ESI should be inserted into the reply stream or not. */