doc/Programming-Guide/03_MajorComponents.dox

/*
 * Copyright (C) 1996-2015 The Squid Software Foundation and contributors
 *
 * Squid software is distributed under GPLv2+ license and includes
 * contributions from numerous individuals and organizations.
 * Please see the COPYING and CONTRIBUTORS files for details.
 */

/**
\ingroup Component

\section Overview Overview of Squid Components

\par Squid consists of the following major components

\section ClientSideSocket Client Side Socket

\par
        Here new client connections are accepted and parsed, and
        reply data is sent. Per-connection state information is held
        in a data structure called ConnStateData.  Per-request
        state information is stored in the clientSocketContext
        structure. With HTTP/1.1 persistent connections, a single
        TCP connection may carry multiple requests.
\todo DOCS: find out what has replaced clientSocketContext since it seems to not exist now.

\section ClientSideRequest Client Side Request
\par
        This is where requests are processed. We determine whether the
        request is to be redirected and whether it passes access lists,
        and set up the initial client stream for internal requests.
        Temporary state for this processing is held in a
        clientRequestContext.
\todo DOCS: find out what has replaced clientRequestContext since it seems not to exist now.

\section ClientSideReply Client Side Reply
\par
        This is where we determine whether the request is a cache HIT,
        REFRESH, MISS, etc. This involves querying the store
        (possibly multiple times) to work through Vary lists and
        the list. Per-request state information is stored
        in the clientReplyContext.

\section StorageManager Storage Manager
\par
        The Storage Manager is the glue between the client and server
        sides.  Every object saved in the cache is allocated a
        StoreEntry structure.  While the object is being
        accessed, it also has a MemObject structure.
\par
        Squid can quickly locate cached objects because it keeps
        (in memory) a hash table of all StoreEntry objects.  The
        keys for the hash table are MD5 checksums of the objects'
        URIs.  In addition, a storage policy such as LRU keeps
        track of the objects and determines the removal order
        when space needs to be reclaimed.  For the LRU policy
        this is implemented as a doubly linked list.
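\par
        As an illustration of the doubly linked list behind the LRU policy,
        here is a minimal C++ sketch; it is not Squid's actual removal
        policy code. The class name LruPolicy and the 64-bit key type are
        hypothetical stand-ins for StoreEntry handles, and a hash map gives
        constant-time access to each object's position in the list.
```cpp
#include <cstdint>
#include <list>
#include <optional>
#include <unordered_map>

// Hypothetical LRU removal policy: a doubly linked list ordered from
// most recently used (front) to least recently used (back), plus an
// index from object key to list position for O(1) updates.
class LruPolicy {
public:
    using Key = std::uint64_t;  // stand-in for a StoreEntry handle

    // Record an access: move (or insert) the key at the front.
    void touch(Key key) {
        auto it = index_.find(key);
        if (it != index_.end())
            order_.erase(it->second);
        order_.push_front(key);
        index_[key] = order_.begin();
    }

    // Stop tracking an object, for example after it is released.
    void forget(Key key) {
        auto it = index_.find(key);
        if (it == index_.end())
            return;
        order_.erase(it->second);
        index_.erase(it);
    }

    // Next removal candidate when space must be reclaimed: the tail.
    std::optional<Key> victim() const {
        if (order_.empty())
            return std::nullopt;
        return order_.back();
    }

private:
    std::list<Key> order_;
    std::unordered_map<Key, std::list<Key>::iterator> index_;
};
```
\par
        For example, after touching keys 1, 2 and 3 and then touching 1
        again, victim() returns 2, the least recently used key.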
\par
        For each object, the StoreEntry maps to a cache_dir
        and location via sdirno and sfileno. For the "ufs" store
        this file number (sfileno) is converted to a disk pathname
        by a simple modulo of L2 and L1, but other storage drivers may
        map sfileno in other ways.  A cache swap file consists
        of two parts: the cache metadata and the object data.
        Note that the object data includes the full HTTP reply, both
        headers and body.  The HTTP reply headers are not the same as
        the cache metadata.
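\par
        The "simple modulo of L2 and L1" can be sketched as follows. This
        assumes the usual two-level ufs layout (a first-level directory, a
        second-level directory, and an 8-digit hexadecimal file name); the
        function name ufsPath is hypothetical, and the ufs driver's exact
        formula may differ in detail.
```cpp
#include <cstdio>
#include <string>

// Map a ufs file number to its path below the cache_dir root:
//   root/<L1 dir>/<L2 dir>/<file>, all in upper-case hexadecimal.
std::string ufsPath(const std::string &root, unsigned l1, unsigned l2,
                    unsigned sfileno)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "/%02X/%02X/%08X",
                  ((sfileno / l2) / l2) % l1,  // first-level directory
                  (sfileno / l2) % l2,         // second-level directory
                  sfileno);                    // file name
    return root + buf;
}
```
\par
        With L1 = 16 and L2 = 256, file number 0x12345 lands in
        directory 01/23 under the name 00012345, so consecutive file
        numbers fill one second-level directory before moving on.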
\par
        Client-side requests register themselves with a StoreEntry
        to be notified when new data arrives.  Multiple clients
        may receive data via a single StoreEntry.  For POST
        and PUT requests, this process works in reverse: server-side
        functions are notified when additional data is read from
        the client.

\section RequestForwarding Request Forwarding

\section PeerSelection Peer Selection
\par
        These functions are responsible for selecting one (or none)
        of the neighbor caches as the appropriate forwarding
        location.

\section AccessControl Access Control
\par
        These functions are responsible for allowing or denying a
        request, based on a number of different parameters.  These
        parameters include the client's IP address, the hostname
        of the requested resource, the request method, etc.  Some
        of the necessary information may not be immediately available,
        for example the origin server's IP address.  In these cases,
        the ACL routines initiate lookups for the necessary
        information and continue the access control checks when
        the information is available.

\section AuthenticationFramework Authentication Framework
\par
        These functions are responsible for handling HTTP
        authentication.  They follow a modular framework that allows
        different authentication schemes to be added at will. For
        information on working with the authentication schemes, see
        the chapter Authentication Framework.

\section NetworkCommunication Network Communication
\par
        These are the routines for communicating over TCP and UDP
        network sockets.  Here is where sockets are opened, closed,
        read, and written.  In addition, note that the heart of
        Squid (comm_select() or comm_poll()) exists here,
        even though it handles all file descriptors, not just
        network sockets.  These routines do not support queuing
        multiple blocks of data for writing.  Consequently, a
        callback occurs for every write request.
\todo DOCS: decide what to do for comm_poll() since it's either obsolete or uses other names.

\section FileDiskIO File/Disk I/O
\par
        Routines for reading and writing disk files (and FIFOs).
        Reasons for separating network and disk I/O functions are
        partly historical, and partly because of different behaviors.
        For example, we don't worry about getting a "No space left
        on device" error for network sockets.  The disk I/O routines
        support queuing of multiple blocks for writing.  In some
        cases, it is possible to merge multiple blocks into a single
        write request.  The write callback does not necessarily
        occur for every write request.

\section Neighbors Neighbors
\par
        Maintains the list of neighbor caches.  Sends and receives
        ICP messages to neighbors.  Decides which neighbors to
        query for a given request.  File: neighbors.c.

\section FQDNCache IP/FQDN Cache
\par
        A cache of name-to-address and address-to-name lookups.
        These are hash tables keyed on the names and addresses.
        ipcache_nbgethostbyname() and fqdncache_nbgethostbyaddr()
        implement the non-blocking lookups.  Files: ipcache.c,
        fqdncache.c.

\section CacheManager Cache Manager
\par
        This provides access to certain information needed by the
        cache administrator.  A companion program, cachemgr.cgi,
        can be used to make this information available via a Web
        browser.  Cache manager requests to Squid are made with a
        special URL of the form
\code
        cache_object://hostname/operation
\endcode
        The cache manager provides essentially "read-only" access
        to information.  It does not provide a method for configuring
        Squid while it is running.
\todo DOCS: get cachemgr.cgi documenting

\section NetworkMeasurementDB Network Measurement Database
\par
        In a number of situations, Squid finds it useful to know the
        estimated network round-trip time (RTT) between itself and
        origin servers.  A particularly useful example is
        the peer selection algorithm.  By making RTT measurements, a
        Squid cache will know if it, or one of its neighbors, is closest
        to a given origin server.  The actual measurements are made
        with the pinger program, described below.  The measured
        values are stored in a database indexed under two keys.  The
        primary index field is the /24 prefix of the origin server's
        IP address.  The secondary index is a hash table of
        fully-qualified host names whose entries link to the
        appropriate network entry.  This allows Squid to quickly look
        up measurements when given either an IP address or a host
        name.  The /24 prefix aggregation is used to reduce the
        overall database size.  File: net_db.c.
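\par
        The /24 aggregation amounts to clearing the last octet of an IPv4
        address. A minimal sketch using the POSIX inet_pton() and
        inet_ntop() functions follows; netdbNetworkKey is a hypothetical
        name, and IPv6 handling is deliberately left out.
```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

// Hypothetical helper: reduce an IPv4 address to its /24 network,
// the aggregation key described above. Returns an empty string if
// the input is not a dotted-quad IPv4 address.
std::string netdbNetworkKey(const std::string &addr)
{
    in_addr a{};
    if (inet_pton(AF_INET, addr.c_str(), &a) != 1)
        return "";
    a.s_addr &= htonl(0xFFFFFF00u);  // clear the host octet
    char out[INET_ADDRSTRLEN];
    if (!inet_ntop(AF_INET, &a, out, sizeof(out)))
        return "";
    return out;
}
```
\par
        netdbNetworkKey("192.0.2.77") returns "192.0.2.0", so every host
        in that /24 shares a single database entry.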

\section Redirectors Redirectors
\par
        Squid has the ability to rewrite requests from clients.  After
        checking the ACL access controls, but before checking for cache hits,
        requested URLs may optionally be written to an external
        redirector process.  This program, which can be highly
        customized, may return a new URL to replace the original request.
        Common applications for this feature are extended access controls
        and local mirroring.  File: redirect.c.
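\par
        A redirector is an ordinary program that reads one request per
        line on its standard input and writes one reply per line. The
        sketch below shows only the rewrite decision for the local
        mirroring case. It assumes that the URL is the first
        whitespace-separated field of each request line and that an
        empty reply leaves the URL unchanged, as in the classic
        redirector interface; newer Squid releases use a keyword-based
        reply format, so check the url_rewrite_program documentation for
        the version at hand. The host names and the function name
        redirectReply are hypothetical.
```cpp
#include <sstream>
#include <string>

// Hypothetical rewrite rule for a local mirror. The request line is
// assumed to start with the URL, followed by other whitespace-separated
// fields (client address, user, method, ...). An empty reply means
// "leave the URL alone" under the assumed classic convention.
std::string redirectReply(const std::string &requestLine)
{
    static const std::string from = "http://downloads.example.com/";
    static const std::string to = "http://mirror.example.net/";

    std::istringstream in(requestLine);
    std::string url;
    in >> url;  // first token is the requested URL
    if (url.compare(0, from.size(), from) == 0)
        return to + url.substr(from.size());
    return "";
}
```
\par
        A complete helper would loop over std::getline() on standard
        input, print redirectReply(line) followed by a newline, and flush
        after every reply so that Squid is never left waiting.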

\section ASN Autonomous System Numbers
\par
        Squid supports Autonomous System (AS) numbers as another
        access control element.  The routines in asn.c
        query databases which map AS numbers into lists of CIDR
        prefixes.  These results are stored in a radix tree which
        allows fast searching of the AS number for a given IP address.
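\par
        The longest-prefix lookup that the radix tree provides can be
        illustrated with a simplified binary trie over IPv4 prefixes. This
        is not Squid's radix tree code; the class name AsnTrie is
        hypothetical, addresses are host-order 32-bit integers, and a
        result of -1 means that no prefix matched.
```cpp
#include <cstdint>
#include <memory>

// Simplified binary trie keyed on IPv4 prefixes, standing in for the
// radix tree: each CIDR prefix stores its AS number, and a lookup
// returns the AS of the longest matching prefix (or -1 if none).
class AsnTrie {
public:
    void insert(std::uint32_t prefix, int len, int asn) {
        if (len < 0 || len > 32)
            return;  // not a valid IPv4 prefix length
        Node *n = &root_;
        for (int i = 0; i < len; ++i) {
            const int bit = (prefix >> (31 - i)) & 1;
            if (!n->child[bit])
                n->child[bit] = std::make_unique<Node>();
            n = n->child[bit].get();
        }
        n->asn = asn;
    }

    int lookup(std::uint32_t addr) const {
        const Node *n = &root_;
        int best = n->asn;
        for (int i = 0; i < 32 && n; ++i) {
            n = n->child[(addr >> (31 - i)) & 1].get();
            if (n && n->asn >= 0)
                best = n->asn;  // remember the deepest match so far
        }
        return best;
    }

private:
    struct Node {
        std::unique_ptr<Node> child[2];
        int asn = -1;
    };
    Node root_;
};
```
\par
        After inserting 10.0.0.0/8 for AS 64500 and 10.1.0.0/16 for
        AS 64501, a lookup of 10.1.2.3 yields 64501 (the more specific
        prefix) and a lookup of 10.2.0.1 yields 64500.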

\section ConfigurationFileParsing Configuration File Parsing
\par
        The primary configuration file specification is in the file
        cf.data.pre.  A simple utility program, cf_gen,
        reads the cf.data.pre file and generates cf_parser.c
        and squid.conf.  cf_parser.c is included directly
        into cache_cf.c at compile time.
\todo DOCS: get cf.data.pre documenting
\todo DOCS: get squid.conf documenting
\todo DOCS: get cf_gen documenting and linking.

\section Callback Callback Data Allocator
\par
        Squid's extensive use of callback functions makes it very
        susceptible to memory access errors.  Care must be taken
        so that the callback_data memory is still valid when
        the callback function is executed.  The routines in cbdata.c
        provide a uniform method for managing callback data memory,
        canceling callbacks, and preventing erroneous memory accesses.
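\par
        The idea behind callback data validity checks can be shown with
        the C++ standard library. This sketch swaps in std::weak_ptr for
        Squid's own cbdata routines, which it does not reproduce; the
        function name guardedCallback is hypothetical. The callback keeps
        only a weak reference to the caller's state and silently does
        nothing if that state was destroyed before the operation
        completed.
```cpp
#include <functional>
#include <memory>

// Not Squid's cbdata API: a standard-library sketch of the same idea.
// The caller hands out a weak reference to its state; when the
// asynchronous operation completes, the handler runs only if that
// state is still alive.
template <class T>
std::function<void(int)> guardedCallback(std::weak_ptr<T> data,
                                         void (*handler)(T &, int))
{
    return [data, handler](int result) {
        if (auto alive = data.lock())  // still valid?
            handler(*alive, result);
        // otherwise the owner went away: drop the callback silently
    };
}
```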
\todo DOCS: get callback_data (object?) linking or replacement named.

\section RefCountDataAllocator Refcount Data Allocator
\since Squid 3.0
\par
        Manual reference counting, such as cbdata uses, is error-prone
        and time-consuming for the programmer. C++ operator overloading
        allows us to create automatic reference-counting pointers that
        free objects when they are no longer needed. With some care, these
        objects can be passed to functions that need Callback Data pointers.
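\par
        A minimal intrusive reference-counting pointer of the kind
        described above might look like the sketch below. The names
        RefCountable and Ref are hypothetical and the code is not Squid's
        RefCount template; the count lives inside the object, the last
        pointer to go away deletes it, and the count is not thread-safe.
```cpp
#include <utility>

// Objects that can be shared carry their own reference count.
class RefCountable {
public:
    void addRef() const { ++count_; }
    bool release() const { return --count_ == 0; }  // true when last ref
    int refs() const { return count_; }

protected:
    virtual ~RefCountable() = default;

private:
    mutable int count_ = 0;
};

// Smart pointer that adjusts the count on copy and destruction.
template <class T>
class Ref {
public:
    Ref() = default;
    explicit Ref(T *p) : p_(p) { if (p_) p_->addRef(); }
    Ref(const Ref &o) : p_(o.p_) { if (p_) p_->addRef(); }
    Ref(Ref &&o) noexcept : p_(std::exchange(o.p_, nullptr)) {}
    Ref &operator=(Ref o) noexcept { std::swap(p_, o.p_); return *this; }
    ~Ref() { if (p_ && p_->release()) delete p_; }

    T *operator->() const { return p_; }
    T &operator*() const { return *p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T *p_ = nullptr;
};
```
\par
        Copying a Ref raises the count, destroying a copy lowers it, and
        the object is deleted only when the final Ref is destroyed.
        Keeping the count inside the object lets the pointer stay the
        size of a raw pointer, whereas std::shared_ptr typically also
        carries a pointer to a separate control block.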
\todo DOCS: get cbdata documenting and linking.

\section Debugging Debugging
\par
        Squid includes extensive debugging statements to assist in
        tracking down bugs and strange behavior.  Every debug statement
        is assigned a section and level.  Usually, every debug statement
        in the same source file has the same section.  Levels are chosen
        depending on how much output will be generated, or how useful the
        provided information will be.  The \em debug_options line
        in the configuration file determines which debug statements will
        be shown and which will not.  The \em debug_options line
        assigns a maximum level for every section.  If a given debug
        statement has a level less than or equal to the configured
        level for that section, it will be shown.  This description
        probably sounds more complicated than it really is.
        File: debug.c.  Note that debugs() itself is a macro.
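\par
        The section and level rule can be made concrete with a small model.
        Assuming a debug_options value such as "ALL,1 33,2", every section
        is first set to level 1 and section 33 is then raised to level 2,
        so a call such as debugs(33, 2, ...) is shown while
        debugs(33, 3, ...) is not. The class DebugLevels and the limit of
        100 sections are assumptions of this sketch, not Squid's parser.
```cpp
#include <array>
#include <sstream>
#include <string>

// Model of debug_options: items are "SECTION,LEVEL" or "ALL,LEVEL",
// applied left to right. A statement is shown when its level is less
// than or equal to the level configured for its section.
class DebugLevels {
public:
    explicit DebugLevels(const std::string &options) {
        levels_.fill(0);
        std::istringstream in(options);
        std::string item;
        while (in >> item) {
            const auto comma = item.find(',');
            if (comma == std::string::npos)
                continue;  // ignore malformed items in this sketch
            const std::string sect = item.substr(0, comma);
            const int level = std::stoi(item.substr(comma + 1));
            if (sect == "ALL") {
                levels_.fill(level);
            } else {
                const int s = std::stoi(sect);
                if (s >= 0 && s < static_cast<int>(levels_.size()))
                    levels_[s] = level;
            }
        }
    }

    bool shown(int section, int level) const {
        return section >= 0 && section < static_cast<int>(levels_.size())
            && level <= levels_[section];
    }

private:
    std::array<int, 100> levels_;  // assumed number of sections
};
```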
\todo DOCS: get debugs() documenting as if it was a function.

\section ErrorGeneration Error Generation
\par
        The routines in errorpage.c generate error messages from
        a template file and specific request parameters.  This allows
        for customized error messages and multilingual support.
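\par
        Template expansion of this kind can be sketched as a single pass
        that replaces percent-codes with request parameters. The code
        letters and the function name expandErrorTemplate are
        illustrative assumptions, not the codes Squid's error templates
        actually define.
```cpp
#include <map>
#include <string>

// Hypothetical template expansion: each "%X" code in the template is
// replaced by the matching request parameter, "%%" yields a literal
// percent sign, and unknown codes are copied through unchanged.
std::string expandErrorTemplate(const std::string &tmpl,
                                const std::map<char, std::string> &params)
{
    std::string out;
    for (std::string::size_type i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] != '%' || i + 1 == tmpl.size()) {
            out += tmpl[i];
            continue;
        }
        const char code = tmpl[++i];
        if (code == '%') {
            out += '%';
        } else if (auto it = params.find(code); it != params.end()) {
            out += it->second;
        } else {
            out += '%';
            out += code;
        }
    }
    return out;
}
```
\par
        Because every language variant shares the same codes, translating
        an error page only requires translating the template text.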

\section EventQueue Event Queue
\par
        The routines in event.c maintain a linked-list event
        queue for functions to be executed at a future time.  The
        event queue is used for periodic functions such as performing
        cache replacement, cleaning swap directories, as well as one-time
        functions such as ICP query timeouts.
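\par
        A linked-list event queue can be kept sorted by due time, so that
        running due events only ever inspects the head of the list. The
        sketch below uses std::list and std::function; the class and
        method names are hypothetical and do not match Squid's event.c
        interface.
```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>

// Sketch of a time-ordered event queue kept as a linked list. Events
// are inserted in order of their due time; runDue() pops and runs
// every event whose time has arrived.
class EventQueue {
public:
    using Handler = std::function<void()>;

    void add(const std::string &name, Handler fn, double when) {
        auto pos = events_.begin();
        while (pos != events_.end() && pos->when <= when)
            ++pos;  // keep FIFO order among events due at the same time
        events_.insert(pos, Event{name, std::move(fn), when});
    }

    // Run all events due at or before `now`; returns how many ran.
    int runDue(double now) {
        int ran = 0;
        while (!events_.empty() && events_.front().when <= now) {
            Event ev = std::move(events_.front());
            events_.pop_front();
            ev.fn();  // may schedule further events
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return events_.size(); }

private:
    struct Event {
        std::string name;
        Handler fn;
        double when;
    };
    std::list<Event> events_;
};
```
\par
        Events scheduled for times 5, 1 and 3 run in the order 1, 3 when
        runDue(3.0) is called, leaving the event at time 5 pending.
        Insertion is linear in the queue length, which stays cheap while
        only a handful of events are pending.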

\section FiledescriptorManagement Filedescriptor Management
\par
        Here we track the number of file descriptors in use, and the
        number of bytes that have been read from or written to each
        file descriptor.

\section HashtableSupport Hashtable Support
\par
        These routines implement generic hash tables.  A hash table
        is created with a function for hashing the key values, and a
        function for comparing the key values.
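\par
        A minimal chained hash table built from a hashing function and a
        comparison function might look like this. It is a sketch, not
        Squid's hash.c interface; ChainedHash is a hypothetical name, the
        hash function receives the (nonzero) bucket count so it can reduce
        its result, and duplicate keys are not merged, so the newest entry
        is found first.
```cpp
#include <cstddef>
#include <forward_list>
#include <string>
#include <utility>
#include <vector>

// Generic chained hash table parameterized by two plain functions:
// one hashes a key into [0, buckets), the other tests keys for equality.
template <class Key, class Value>
class ChainedHash {
public:
    using HashFn = std::size_t (*)(const Key &, std::size_t buckets);
    using CmpFn = bool (*)(const Key &, const Key &);  // true if equal

    ChainedHash(HashFn h, CmpFn c, std::size_t buckets)
        : hash_(h), equal_(c), table_(buckets) {}

    void insert(const Key &k, const Value &v) {
        table_[hash_(k, table_.size())].emplace_front(k, v);
    }

    const Value *find(const Key &k) const {
        for (const auto &entry : table_[hash_(k, table_.size())])
            if (equal_(entry.first, k))
                return &entry.second;
        return nullptr;
    }

private:
    HashFn hash_;
    CmpFn equal_;
    std::vector<std::forward_list<std::pair<Key, Value>>> table_;
};
```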

\section HTTPAnonymization HTTP Anonymization
\par
        These routines support anonymizing HTTP requests leaving
        the cache.  Either specific request headers will be removed
        (the "standard" mode), or only specific request headers
        will be allowed (the "paranoid" mode).
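\par
        The two modes differ only in which list is consulted. In the sketch
        below, standard mode drops headers named on a deny list and paranoid
        mode keeps only headers named on an allow list; the header lists and
        the function name anonymize are illustrative assumptions, not
        Squid's built-in lists. Header names are compared
        case-insensitively, as HTTP requires.
```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Header = std::pair<std::string, std::string>;  // name, value

enum class AnonMode { Standard, Paranoid };

static std::string lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

// Standard: remove denied headers. Paranoid: keep only allowed headers.
std::vector<Header> anonymize(const std::vector<Header> &in, AnonMode mode)
{
    static const std::set<std::string> denied = {"from", "referer", "cookie"};
    static const std::set<std::string> allowed = {"host", "accept", "cache-control"};

    std::vector<Header> out;
    for (const auto &h : in) {
        const std::string name = lower(h.first);
        const bool keep = (mode == AnonMode::Standard)
            ? denied.count(name) == 0
            : allowed.count(name) != 0;
        if (keep)
            out.push_back(h);
    }
    return out;
}
```
\par
        Paranoid mode is safer against headers nobody anticipated, at the
        cost of breaking sites that depend on headers missing from the
        allow list.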

\section DelayPools Delay Pools
\par
        Delay pools provide bandwidth regulation by restricting the rate
        at which Squid reads from a server before sending to a client. They
        do not prevent cache hits from being sent at maximal capacity. Delay
        pools can aggregate the bandwidth from multiple machines and users
        to provide more or less general restrictions.
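\par
        One way to model a delay pool is as a token bucket that refills at
        a fixed restore rate up to a maximum size. The sketch assumes that
        model and byte-granular reads; DelayBucket is a hypothetical name,
        and the delay_pools and delay_parameters documentation describes
        the pool classes Squid actually provides.
```cpp
#include <algorithm>

// Token-bucket model of one delay pool bucket. The bucket refills at
// restoreRate bytes per second up to maxBytes; a server-side read may
// take only as many bytes as the bucket currently holds.
class DelayBucket {
public:
    DelayBucket(double restoreRate, double maxBytes)
        : rate_(restoreRate), max_(maxBytes), level_(maxBytes) {}

    // Add tokens for `seconds` of elapsed time, capped at the maximum.
    void refill(double seconds) {
        level_ = std::min(max_, level_ + rate_ * seconds);
    }

    // Grant up to `wanted` bytes and consume them from the bucket.
    long take(long wanted) {
        const long granted = std::min<long>(wanted, static_cast<long>(level_));
        level_ -= granted;
        return granted;
    }

private:
    double rate_;
    double max_;
    double level_;
};
```
\par
        A bucket with a restore rate of 100 bytes per second and a maximum
        of 1000 bytes allows an initial burst of 1000 bytes, then 200 bytes
        after a further 2 seconds.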

\section ICPSupport Internet Cache Protocol
\par
        Here we implement the Internet Cache Protocol.  This
        protocol is documented in RFC 2186 and RFC 2187.
        The bulk of the code is in the icp_v2.c file.  The
        other file, icp_v3.c, contains a single function for handling
        ICP queries from Netcache/Netapp caches; they use
        a different version number and a slightly different message
        format.
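\par
        To make the message format concrete, the sketch below builds an
        ICP_QUERY message following the layout in RFC 2186: a 20-byte
        header (opcode, version, message length, request number, options,
        option data, sender host address), then a 4-byte requester host
        address and the NUL-terminated URL, with multi-byte fields in
        network byte order. The function name icpQuery is hypothetical,
        the optional fields are left as zero, and URLs long enough to
        overflow the 16-bit length field are not handled.
```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Build an ICP version 2 ICP_QUERY message (opcode 1, version 2).
std::vector<std::uint8_t> icpQuery(std::uint32_t reqnum, const std::string &url)
{
    const std::uint8_t ICP_OP_QUERY = 1;
    const std::uint8_t ICP_VERSION_2 = 2;

    // header (20) + requester host address (4) + URL + NUL
    std::vector<std::uint8_t> msg(20 + 4 + url.size() + 1, 0);
    const std::size_t len = msg.size();

    msg[0] = ICP_OP_QUERY;
    msg[1] = ICP_VERSION_2;
    msg[2] = static_cast<std::uint8_t>(len >> 8);   // message length,
    msg[3] = static_cast<std::uint8_t>(len & 0xFF); // big-endian
    for (int i = 0; i < 4; ++i)                     // request number
        msg[4 + i] = static_cast<std::uint8_t>(reqnum >> (24 - 8 * i));
    // bytes 8..19: options, option data, sender host address (zero)
    // bytes 20..23: requester host address (zero)
    std::memcpy(msg.data() + 24, url.c_str(), url.size() + 1);
    return msg;
}
```
\par
        The request number lets a cache match each ICP_HIT or ICP_MISS
        reply to the query that caused it.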
\todo DOCS: get RFCs linked from ietf

\section IdentLookups Ident Lookups
\par
        These routines support RFC 931 (http://www.ietf.org/rfc/rfc931.txt)
        "Ident" lookups.  An ident server running on a host will report
        the user name associated with a connected TCP socket.  Some sites
        use this facility for access control and logging purposes.
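\par
        A successful ident reply names the port pair, the keyword USERID,
        an operating-system field and the user id, separated by colons;
        an ERROR keyword signals failure. The sketch below parses such a
        reply, following the format described in RFC 1413, the successor
        to RFC 931. The function name parseIdentReply is hypothetical, and
        everything after the third colon is treated as the user id.
```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Parse a reply such as "6193, 23 : USERID : UNIX : alice".
// Returns the user name, or nothing for ERROR or malformed replies.
std::optional<std::string> parseIdentReply(const std::string &reply)
{
    auto trim = [](const std::string &s) {
        const auto b = s.find_first_not_of(" \t\r\n");
        const auto e = s.find_last_not_of(" \t\r\n");
        return b == std::string::npos ? std::string() : s.substr(b, e - b + 1);
    };

    std::vector<std::string> fields;
    std::istringstream in(reply);
    std::string part;
    // Split on the first three colons only; the user id may contain ':'.
    for (int i = 0; i < 3 && std::getline(in, part, ':'); ++i)
        fields.push_back(trim(part));

    std::string rest;
    if (fields.size() != 3 || fields[1] != "USERID" || !std::getline(in, rest))
        return std::nullopt;
    const std::string user = trim(rest);
    if (user.empty())
        return std::nullopt;
    return user;
}
```
\par
        parseIdentReply("6193, 23 : USERID : UNIX : alice") yields "alice",
        while a reply whose keyword is ERROR yields no value.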

\section MemoryManagement Memory Management
\par
        These routines allocate and manage pools of memory for
        frequently-used data structures.  When the \em memory_pools
        configuration option is enabled, unused memory is not actually
        freed.  Instead it is kept for future use.  This may result
        in more efficient use of memory at the expense of a larger
        process size.
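\par
        The effect of memory_pools can be illustrated with a fixed-size pool
        that keeps released chunks on a free list instead of returning them
        to the system. FixedPool is a hypothetical name and the sketch is
        not Squid's MemPool implementation; chunks are freed only when the
        pool itself is destroyed.
```cpp
#include <cstddef>
#include <new>
#include <vector>

// Fixed-size memory pool: released chunks go onto a free list and are
// handed out again, trading a larger footprint for fewer allocations.
class FixedPool {
public:
    explicit FixedPool(std::size_t chunkSize) : size_(chunkSize) {}

    ~FixedPool() {
        for (void *p : owned_)
            ::operator delete(p);  // memory goes back only here
    }

    FixedPool(const FixedPool &) = delete;
    FixedPool &operator=(const FixedPool &) = delete;

    void *alloc() {
        if (!free_.empty()) {
            void *p = free_.back();  // reuse an idle chunk
            free_.pop_back();
            return p;
        }
        void *p = ::operator new(size_);
        owned_.push_back(p);
        return p;
    }

    // Keep the chunk for reuse rather than freeing it.
    void release(void *p) { free_.push_back(p); }

    std::size_t idle() const { return free_.size(); }

private:
    std::size_t size_;
    std::vector<void *> free_;
    std::vector<void *> owned_;
};
```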

\section MulticastSupport Multicast Support
\par
        Currently, multicast is only used for ICP queries.  These
        routines implement joining a UDP socket to a multicast
        group (or groups), and setting the multicast TTL value on
        outgoing packets.
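\par
        Both operations map onto standard socket options. The sketch below
        uses the BSD sockets API (IP_ADD_MEMBERSHIP and IP_MULTICAST_TTL)
        on an IPv4 UDP socket; the helper names are hypothetical and error
        reporting is reduced to a boolean result.
```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Join an IPv4 multicast group on the given UDP socket.
bool joinMulticastGroup(int fd, const char *group)
{
    ip_mreq mreq{};
    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1)
        return false;  // not a valid dotted-quad address
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);  // let the kernel pick
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq)) == 0;
}

// Set the TTL used for outgoing multicast packets on the socket.
bool setMulticastTtl(int fd, unsigned char ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL,
                      &ttl, sizeof(ttl)) == 0;
}
```
\par
        A small TTL keeps ICP queries from travelling beyond the intended
        part of the network.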

\section PresistentConnections Persistent Server Connections
\par
        These routines manage idle, persistent HTTP connections
        to origin servers and neighbor caches.  Idle sockets
        are indexed in a hash table by their socket address
        (IP address and port number).  Up to 10 idle sockets
        will be kept for each socket address, but only for
        15 seconds.  After 15 seconds, idle socket connections
        are closed.
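\par
        The per-address limit and the idle timeout can be modeled as
        follows. IdlePool is a hypothetical name, file descriptors are
        plain integers, expired connections are only counted rather than
        closed, and the limits of 10 sockets and 15 seconds are taken from
        the description above.
```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Idle-connection pool indexed by "ip:port": at most kMaxPerAddress
// idle sockets per address, each kept for at most kIdleTimeout seconds.
class IdlePool {
public:
    static constexpr std::size_t kMaxPerAddress = 10;
    static constexpr double kIdleTimeout = 15.0;  // seconds

    // Park an idle connection; returns false if the address is full
    // (the caller should close the socket instead).
    bool push(const std::string &addr, int fd, double now) {
        auto &q = idle_[addr];
        if (q.size() >= kMaxPerAddress)
            return false;
        q.push_back({fd, now});
        return true;
    }

    // Reuse the most recently parked connection, or -1 if none.
    int pop(const std::string &addr) {
        auto it = idle_.find(addr);
        if (it == idle_.end() || it->second.empty())
            return -1;
        const int fd = it->second.back().fd;
        it->second.pop_back();
        return fd;
    }

    // Drop connections idle for kIdleTimeout or longer; returns count.
    std::size_t expire(double now) {
        std::size_t closed = 0;
        for (auto &entry : idle_) {
            auto &q = entry.second;
            while (!q.empty() && now - q.front().since >= kIdleTimeout) {
                q.pop_front();  // oldest first
                ++closed;
            }
        }
        return closed;
    }

private:
    struct Idle {
        int fd;
        double since;  // time the connection became idle
    };
    std::map<std::string, std::deque<Idle>> idle_;
};
```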

\section RefreshRules Refresh Rules
\par
        These routines decide whether a cached object is stale or fresh,
        based on the \em refresh_pattern configuration options.
        If an object is fresh, it can be returned as a cache hit.
        If it is stale, then it must be revalidated with an
        If-Modified-Since request.
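\par
        A simplified version of the freshness decision, using the min,
        percent and max parameters of a refresh_pattern rule, is sketched
        below. It assumes an order of checks (maximum age first, then the
        last-modified factor when a Last-Modified time is known, then the
        minimum age) and ignores Expires, Cache-Control and the
        refresh_pattern options; RefreshRule and isFresh are hypothetical
        names, and Squid's refresh.c may differ in detail.
```cpp
// Parameters of one refresh_pattern rule, with all times in seconds.
struct RefreshRule {
    double min;      // minimum age below which the object counts as fresh
    double percent;  // last-modified factor, e.g. 0.20 for 20 percent
    double max;      // maximum age above which the object is stale
};

// age:          seconds since the object was fetched (or revalidated)
// date:         time of that fetch
// lastModified: Last-Modified time, or a negative value if unknown
bool isFresh(const RefreshRule &r, double age, double date, double lastModified)
{
    if (age > r.max)
        return false;  // too old regardless of other hints
    if (lastModified >= 0 && date >= lastModified) {
        // Stay fresh for a fraction of the age the object already had
        // when it was fetched.
        const double staleAge = (date - lastModified) * r.percent;
        return age < staleAge;
    }
    return age < r.min;  // no Last-Modified: fall back to the minimum
}
```
\par
        With percent set to 20 percent, an object that was 10 days old when
        fetched stays fresh for about 2 days, unless that exceeds max.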

\section SNMPSupport SNMP Support
\par
        These routines implement SNMP for Squid.  At the present time,
        we have made almost all of the cachemgr information available
        via SNMP.

\section URNSupport URN Support
\par
        We are experimenting with URN support in Squid version 1.2.
        Note that we are not talking about full-blown generic URNs here.
        This is primarily targeted toward using URNs as a smart way
        of handling lists of mirror sites.  For more details, please
        see URN Support in Squid (http://squid.nlanr.net/Squid/urn-support.html).

\section ESI ESI
\par
        ESI is an implementation of Edge Side Includes (http://www.esi.org).
        ESI is implemented as a client side stream and a small
        modification to client_side_reply.c to check whether
        ESI should be inserted into the reply stream or not.

 */