/*
 * Copyright (C) 1996-2015 The Squid Software Foundation and contributors
 *
 * Squid software is distributed under GPLv2+ license and includes
 * contributions from numerous individuals and organizations.
 * Please see the COPYING and CONTRIBUTORS files for details.
 */

/**
\ingroup Component

\section Overview Overview of Squid Components

\par
Squid consists of the following major components.

\section ClientSideSocket Client Side Socket

\par
Here new client connections are accepted, requests are parsed, and
reply data is sent. Per-connection state information is held
in a data structure called ConnStateData. Per-request
state information is stored in the clientSocketContext
structure. With HTTP/1.1 we may have multiple requests on
a single TCP connection.
\todo DOCS: find out what has replaced clientSocketContext since it seems to not exist now.

\section ClientSideRequest Client Side Request
\par
This is where requests are processed. We determine if the
request is to be redirected, if it passes access lists,
and set up the initial client stream for internal requests.
Temporary state for this processing is held in a
clientRequestContext.
\todo DOCS: find out what has replaced clientRequestContext since it seems not to exist now.

\section ClientSideReply Client Side Reply
\par
This is where we determine if the request is a cache HIT,
REFRESH, MISS, etc. This involves querying the store
(possibly multiple times) to work through Vary lists and
the list. Per-request state information is stored
in the clientReplyContext.

\section StorageManager Storage Manager
\par
The Storage Manager is the glue between client and server
sides. Every object saved in the cache is allocated a
StoreEntry structure. While the object is being
accessed, it also has a MemObject structure.
\par
Squid can quickly locate cached objects because it keeps
(in memory) a hash table of all StoreEntry objects. The
keys for the hash table are MD5 checksums of the object's
URI. In addition, there is a storage policy such
as LRU that keeps track of the objects and determines
the removal order when space needs to be reclaimed.
For the LRU policy this is implemented as a doubly linked
list.
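\par
The combination of a hash table and a doubly linked LRU list can be
sketched as follows. This is a minimal illustration, not Squid's store
code: LruIndex and touch() are hypothetical names, and plain strings
stand in for the MD5 keys.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a store index: hash lookup plus LRU ordering.
class LruIndex {
public:
    explicit LruIndex(std::size_t capacity) : capacity_(capacity) {}

    // Record an access to key. Returns the key evicted to make room,
    // or an empty string if nothing was evicted.
    std::string touch(const std::string &key) {
        auto found = index_.find(key);
        if (found != index_.end()) {
            // Known key: relink its node at the front (most recently used).
            order_.splice(order_.begin(), order_, found->second);
            return "";
        }
        order_.push_front(key);
        index_[key] = order_.begin();
        if (index_.size() <= capacity_)
            return "";
        // Over capacity: the tail of the list is the least recently used.
        const std::string victim = order_.back();
        index_.erase(victim);
        order_.pop_back();
        return victim;
    }

    bool contains(const std::string &key) const { return index_.count(key) != 0; }

private:
    std::size_t capacity_;
    std::list<std::string> order_; // doubly linked list, front = most recent
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

Because the hash table stores list iterators, both the lookup and the
move to the front take constant time, and eviction only ever looks at
the tail of the list.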
\par
For each object the StoreEntry maps to a cache_dir
and location via sdirno and sfileno. For the "ufs" store
this file number (sfileno) is converted to a disk pathname
by a simple modulo of L2 and L1, but other storage drivers may
map sfileno in other ways. A cache swap file consists
of two parts: the cache metadata, and the object data.
Note the object data includes the full HTTP reply, both headers
and body. The HTTP reply headers are not the same as the
cache metadata.
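\par
As a sketch of the sfileno mapping, the function below spreads file
numbers over L1 first-level and L2 second-level directories. The name
swapFilePath, the exact arithmetic, and the path format are assumptions
made for illustration; the ufs store code is the authoritative source.

```cpp
#include <cstdio>
#include <string>

// Map a file number to a two-level directory path under cacheDir.
// Assumption: l1 and l2 are the first- and second-level directory counts,
// and directory and file names are written as upper-case hexadecimal.
std::string swapFilePath(const std::string &cacheDir, unsigned sfileno,
                         unsigned l1, unsigned l2) {
    const unsigned level1 = ((sfileno / l2) / l2) % l1;
    const unsigned level2 = (sfileno / l2) % l2;
    char name[32];
    std::snprintf(name, sizeof(name), "/%02X/%02X/%08X", level1, level2, sfileno);
    return cacheDir + name;
}
```

With 16 first-level and 256 second-level directories, file number
0x1234 would land in /cache/00/12/00001234 under this scheme.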
\par
Client-side requests register themselves with a StoreEntry
to be notified when new data arrives. Multiple clients
may receive data via a single StoreEntry. For POST
and PUT requests, this process works in reverse. Server-side
functions are notified when additional data is read from
the client.

\section RequestForwarding Request Forwarding

\section PeerSelection Peer Selection
\par
These functions are responsible for selecting one (or none)
of the neighbor caches as the appropriate forwarding
location.

\section AccessControl Access Control
\par
These functions are responsible for allowing or denying a
request, based on a number of different parameters. These
parameters include the client's IP address, the hostname
of the requested resource, the request method, etc. Some
of the necessary information may not be immediately available,
for example the origin server's IP address. In these cases,
the ACL routines initiate lookups for the necessary
information and continue the access control checks when
the information is available.

\section AuthenticationFramework Authentication Framework
\par
These functions are responsible for handling HTTP
authentication. They follow a modular framework that allows
different authentication schemes to be added at will. For
information on working with the authentication schemes, see
the chapter Authentication Framework.

\section NetworkCommunication Network Communication
\par
These are the routines for communicating over TCP and UDP
network sockets. Here is where sockets are opened, closed,
read, and written. In addition, note that the heart of
Squid (comm_select() or comm_poll()) exists here,
even though it handles all file descriptors, not just
network sockets. These routines do not support queuing
multiple blocks of data for writing. Consequently, a
callback occurs for every write request.
\todo DOCS: decide what to do for comm_poll() since it's either obsolete or uses other names.

\section FileDiskIO File/Disk I/O
\par
Routines for reading and writing disk files (and FIFOs).
Reasons for separating network and disk I/O functions are
partly historical, and partly because of different behaviors.
For example, we don't worry about getting a "No space left
on device" error for network sockets. The disk I/O routines
support queuing of multiple blocks for writing. In some
cases, it is possible to merge multiple blocks into a single
write request. The write callback does not necessarily
occur for every write request.

\section Neighbors Neighbors
\par
Maintains the list of neighbor caches. Sends and receives
ICP messages to neighbors. Decides which neighbors to
query for a given request. File: neighbors.c.

\section FQDNCache IP/FQDN Cache
\par
A cache of name-to-address and address-to-name lookups.
These are hash tables keyed on the names and addresses.
ipcache_nbgethostbyname() and fqdncache_nbgethostbyaddr()
implement the non-blocking lookups. Files: ipcache.c,
fqdncache.c.

\section CacheManager Cache Manager
\par
This provides access to certain information needed by the
cache administrator. A companion program, cachemgr.cgi,
can be used to make this information available via a Web
browser. Cache manager requests to Squid are made with a
special URL of the form
\code
cache_object://hostname/operation
\endcode
The cache manager provides essentially "read-only" access
to information. It does not provide a method for configuring
Squid while it is running.
\todo DOCS: get cachemgr.cgi documenting

\section NetworkMeasurementDB Network Measurement Database
\par
In a number of situations, Squid finds it useful to know the
estimated network round-trip time (RTT) between itself and
origin servers. A particularly useful example is
the peer selection algorithm. By making RTT measurements, a
Squid cache will know if it, or one of its neighbors, is closest
to a given origin server. The actual measurements are made
with the pinger program, described below. The measured
values are stored in a database indexed under two keys. The
primary index field is the /24 prefix of the origin server's
IP address. The secondary index is a hash table of fully-qualified
host names whose entries link to the appropriate
network entry. This allows Squid to quickly look up measurements
when given either an IP address or a host name. The /24 prefix
aggregation is used to reduce the overall database size. File:
net_db.c.
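\par
The two-key indexing can be illustrated with a small sketch. It is not
the net_db.c implementation: NetDb, network24(), and the member names
are hypothetical, only dotted-quad IPv4 addresses are handled, and
ordered maps stand in for the hash tables.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Return the /24 network of a dotted-quad IPv4 address as text,
// or an empty string if the input is not a valid dotted quad.
std::string network24(const std::string &ip) {
    unsigned a, b, c, d;
    char extra;
    if (std::sscanf(ip.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return "";
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return "";
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%u.%u.%u.0", a, b, c);
    return buf;
}

class NetDb {
public:
    void record(const std::string &host, const std::string &ip, double rttMs) {
        const std::string net = network24(ip);
        if (net.empty())
            return; // ignore addresses we cannot parse
        rttByNetwork_[net] = rttMs;  // primary index: /24 network
        networkByHost_[host] = net;  // secondary index: host name
    }

    // Both lookups return the RTT in milliseconds, or -1 if nothing is known.
    double rttForAddress(const std::string &ip) const {
        auto it = rttByNetwork_.find(network24(ip));
        return it == rttByNetwork_.end() ? -1.0 : it->second;
    }

    double rttForHost(const std::string &host) const {
        auto it = networkByHost_.find(host);
        if (it == networkByHost_.end())
            return -1.0;
        auto net = rttByNetwork_.find(it->second);
        return net == rttByNetwork_.end() ? -1.0 : net->second;
    }

private:
    std::map<std::string, double> rttByNetwork_;
    std::map<std::string, std::string> networkByHost_;
};
```

Every address in the same /24 shares one entry, which is what keeps
the database small: a measurement recorded for 192.0.2.17 also answers
a later lookup for 192.0.2.200.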

\section Redirectors Redirectors
\par
Squid has the ability to rewrite requests from clients. After
checking the ACL access controls, but before checking for cache hits,
requested URLs may optionally be written to an external
redirector process. This program, which can be highly
customized, may return a new URL to replace the original request.
Common applications for this feature are extended access controls
and local mirroring. File: redirect.c.

\section ASN Autonomous System Numbers
\par
Squid supports Autonomous System (AS) numbers as another
access control element. The routines in asn.c
query databases which map AS numbers into lists of CIDR
prefixes. These results are stored in a radix tree which
allows fast searching of the AS number for a given IP address.

\section ConfigurationFileParsing Configuration File Parsing
\par
The primary configuration file specification is in the file
cf.data.pre. A simple utility program, cf_gen,
reads the cf.data.pre file and generates cf_parser.c
and squid.conf. cf_parser.c is included directly
into cache_cf.c at compile time.
\todo DOCS: get cf.data.pre documenting
\todo DOCS: get squid.conf documenting
\todo DOCS: get cf_gen documenting and linking.

\section CallbackDataAllocator Callback Data Allocator
\par
Squid's extensive use of callback functions makes it very
susceptible to memory access errors. Care must be taken
so that the callback_data memory is still valid when
the callback function is executed. The routines in cbdata.c
provide a uniform method for managing callback data memory,
canceling callbacks, and preventing erroneous memory accesses.
\todo DOCS: get callback_data (object?) linking or replacement named.
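\par
The idea of checking callback data before use can be illustrated as
follows. This sketch deliberately swaps in std::weak_ptr from the C++
standard library in place of the cbdata routines, so it shows the
principle rather than the cbdata API. Request and GuardedCallback are
hypothetical names.

```cpp
#include <functional>
#include <memory>

// Hypothetical callback data.
struct Request {
    int bytes = 0;
};

// Holds a weak reference to the callback data and refuses to run the
// handler once that data has been freed.
class GuardedCallback {
public:
    GuardedCallback(const std::shared_ptr<Request> &data,
                    std::function<void(Request &)> handler)
        : data_(data), handler_(handler) {}

    // Returns true if the handler ran, false if the data was already gone.
    bool fire() {
        if (std::shared_ptr<Request> alive = data_.lock()) {
            handler_(*alive);
            return true;
        }
        return false; // skip the call instead of touching freed memory
    }

private:
    std::weak_ptr<Request> data_;
    std::function<void(Request &)> handler_;
};
```

Freeing the data before the callback fires turns the call into a
no-op, which is the same effect as canceling the callback.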

\section RefCountDataAllocator Refcount Data Allocator
\since Squid 3.0
\par
Manual reference counting such as cbdata uses is error prone
and time consuming for the programmer. C++'s operator overloading
allows us to create automatic reference counting pointers that will
free objects when they are no longer needed. With some care these
objects can be passed to functions needing Callback Data pointers.
\todo DOCS: get cbdata documenting and linking.
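\par
A minimal sketch of an automatic reference counting pointer built on
operator overloading is shown below. Counted, Ref, and Tracked are
hypothetical names chosen for this example, not Squid's classes, and
the sketch is single-threaded only.

```cpp
#include <utility>

// Base class that carries the reference count inside the object itself.
class Counted {
public:
    Counted() : refs_(0) {}
    virtual ~Counted() {}
    void lock() { ++refs_; }
    int unlock() { return --refs_; }
    int refs() const { return refs_; }

private:
    Counted(const Counted &);            // not copyable: a count belongs
    Counted &operator=(const Counted &); // to exactly one object
    int refs_;
};

// Smart pointer that locks on copy and frees the object on the last unlock.
template <class T>
class Ref {
public:
    explicit Ref(T *p = nullptr) : p_(p) { if (p_) p_->lock(); }
    Ref(const Ref &other) : p_(other.p_) { if (p_) p_->lock(); }
    Ref &operator=(Ref other) { std::swap(p_, other.p_); return *this; }
    ~Ref() { if (p_ && p_->unlock() == 0) delete p_; }
    T *operator->() const { return p_; }
    T &operator*() const { return *p_; }

private:
    T *p_;
};

// Example payload that reports its own destruction through a flag.
struct Tracked : public Counted {
    explicit Tracked(bool *flag) : destroyed(flag) {}
    ~Tracked() { *destroyed = true; }
    bool *destroyed;
};
```

Copying a Ref increments the count and destroying one decrements it,
so the programmer never writes the lock and unlock calls by hand.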

\section Debugging Debugging
\par
Squid includes extensive debugging statements to assist in
tracking down bugs and strange behavior. Every debug statement
is assigned a section and level. Usually, every debug statement
in the same source file has the same section. Levels are chosen
depending on how much output will be generated, or how useful the
provided information will be. The \em debug_options line
in the configuration file determines which debug statements will
be shown and which will not. The \em debug_options line
assigns a maximum level for every section. If a given debug
statement has a level less than or equal to the configured
level for that section, it will be shown. This description
probably sounds more complicated than it really is.
File: debug.c. Note that debugs() itself is a macro.
\todo DOCS: get debugs() documenting as if it was a function.
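\par
The section and level rule fits in a few lines of code. The class
below is a hypothetical model of the filtering decision only; it is
not how debug.c stores its levels.

```cpp
#include <map>

// Hypothetical model of per-section maximum debug levels.
class DebugLevels {
public:
    // defaultLevel applies to every section without its own setting.
    explicit DebugLevels(int defaultLevel) : default_(defaultLevel) {}

    void set(int section, int maxLevel) { levels_[section] = maxLevel; }

    // A statement is shown when its level does not exceed the maximum
    // configured for its section.
    bool shown(int section, int level) const {
        auto it = levels_.find(section);
        const int maxLevel = (it == levels_.end()) ? default_ : it->second;
        return level <= maxLevel;
    }

private:
    int default_;
    std::map<int, int> levels_;
};
```

For example, a configuration line such as "debug_options ALL,1 33,2"
corresponds to DebugLevels(1) followed by set(33, 2): level 2 messages
appear for section 33 only, and level 1 messages appear everywhere.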

\section ErrorGeneration Error Generation
\par
The routines in errorpage.c generate error messages from
a template file and specific request parameters. This allows
for customized error messages and multilingual support.

\section EventQueue Event Queue
\par
The routines in event.c maintain a linked-list event
queue for functions to be executed at a future time. The
event queue is used for periodic functions such as performing
cache replacement, cleaning swap directories, as well as one-time
functions such as ICP query timeouts.
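\par
A time-ordered linked list of pending functions might look like the
sketch below. EventQueue, add(), and runDue() are hypothetical names,
and the caller supplies the clock value; this is not the event.c
interface.

```cpp
#include <functional>
#include <list>

class EventQueue {
public:
    // Schedule fn to run at absolute time when, keeping the list sorted
    // so that the earliest event is always at the front.
    void add(double when, std::function<void()> fn) {
        auto pos = events_.begin();
        while (pos != events_.end() && pos->when <= when)
            ++pos;
        events_.insert(pos, Event{when, fn});
    }

    // Run every event that is due at time now. Returns how many ran.
    int runDue(double now) {
        int ran = 0;
        while (!events_.empty() && events_.front().when <= now) {
            Event ev = events_.front();
            events_.pop_front(); // unlink first so the handler may call add()
            ev.fn();
            ++ran;
        }
        return ran;
    }

private:
    struct Event {
        double when;
        std::function<void()> fn;
    };
    std::list<Event> events_;
};
```

A periodic function is then simply a handler that calls add() again
with a later time before it returns, while a one-time function such as
a timeout does not.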

\section FiledescriptorManagement Filedescriptor Management
\par
Here we track the number of file descriptors in use, and the
number of bytes that have been read from or written to each
file descriptor.

\section HashtableSupport Hashtable Support
\par
These routines implement generic hash tables. A hash table
is created with a function for hashing the key values, and a
function for comparing the key values.

\section HTTPAnonymization HTTP Anonymization
\par
These routines support anonymizing of HTTP requests leaving
the cache. Either specific request headers will be removed
(the "standard" mode), or only specific request headers
will be allowed (the "paranoid" mode).
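\par
The difference between the two modes is a deny list versus an allow
list, as the sketch below shows. AnonMode and filterHeaders() are
hypothetical names, and header names are compared exactly for brevity
even though HTTP header names are case-insensitive.

```cpp
#include <set>
#include <string>
#include <vector>

enum class AnonMode { Standard, Paranoid };

// Return the header names that survive filtering.
// Standard: drop every name found in listed (deny list).
// Paranoid: keep only the names found in listed (allow list).
std::vector<std::string> filterHeaders(const std::vector<std::string> &names,
                                       const std::set<std::string> &listed,
                                       AnonMode mode) {
    std::vector<std::string> kept;
    for (const std::string &name : names) {
        const bool isListed = listed.count(name) != 0;
        const bool keep = (mode == AnonMode::Standard) ? !isListed : isListed;
        if (keep)
            kept.push_back(name);
    }
    return kept;
}
```

The paranoid mode is the stricter of the two because a header that
nobody thought to list is dropped rather than passed through.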

\section DelayPools Delay Pools
\par
Delay pools provide bandwidth regulation by restricting the rate
at which Squid reads from a server before sending to a client. They
do not prevent cache hits from being sent at maximal capacity. Delay
pools can aggregate the bandwidth from multiple machines and users
to provide more or less general restrictions.
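\par
One common way to restrict a read rate is a token bucket, sketched
below. That the delay pool code is structured exactly like this is an
assumption; Bucket, refill(), and take() are hypothetical names.

```cpp
#include <algorithm>

// A bucket of read allowance, measured in bytes.
class Bucket {
public:
    // restorePerSec: bytes of allowance regained per second.
    // maxBytes: the most allowance the bucket can hold; it starts full.
    Bucket(double restorePerSec, double maxBytes)
        : rate_(restorePerSec), max_(maxBytes), level_(maxBytes) {}

    // Add the allowance earned over the given number of seconds,
    // capped at the bucket size.
    void refill(double seconds) {
        level_ = std::min(max_, level_ + rate_ * seconds);
    }

    // Return how many of the wanted bytes may be read from the server now.
    double take(double wanted) {
        const double granted = std::min(wanted, level_);
        level_ -= granted;
        return granted;
    }

private:
    double rate_;
    double max_;
    double level_;
};
```

Sharing one Bucket among many clients gives an aggregate limit, while
one Bucket per client gives an individual limit. Cache hits would
bypass take() entirely, which is why they are not slowed down.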

\section ICPSupport Internet Cache Protocol
\par
Here we implement the Internet Cache Protocol. This
protocol is documented in RFC 2186 and RFC 2187.
The bulk of the code is in the icp_v2.c file. The
other file, icp_v3.c, is a single function for handling
ICP queries from Netcache/Netapp caches; they use
a different version number and a slightly different message
format.
\todo DOCS: get RFCs linked from ietf

\section IdentLookups Ident Lookups
\par
These routines support RFC 931 (http://www.ietf.org/rfc/rfc931.txt)
"Ident" lookups. An ident server running on a host will report
the user name associated with a connected TCP socket. Some sites
use this facility for access control and logging purposes.

\section MemoryManagement Memory Management
\par
These routines allocate and manage pools of memory for
frequently-used data structures. When the \em memory_pools
configuration option is enabled, unused memory is not actually
freed. Instead it is kept for future use. This may result
in more efficient use of memory at the expense of a larger
process size.

\section MulticastSupport Multicast Support
\par
Currently, multicast is only used for ICP queries. The
routines in this file implement joining a UDP
socket to a multicast group (or groups), and setting
the multicast TTL value on outgoing packets.

\section PresistentConnections Persistent Server Connections
\par
These routines manage idle, persistent HTTP connections
to origin servers and neighbor caches. Idle sockets
are indexed in a hash table by their socket address
(IP address and port number). Up to 10 idle sockets
will be kept for each socket address, but only for
15 seconds. After 15 seconds, idle socket connections
are closed.
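\par
The per-address limit and idle timeout can be sketched as follows.
IdlePool, push(), and pop() are hypothetical names, file descriptors
are plain integers, and the sketch only forgets expired descriptors
where real code would also close them.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

class IdlePool {
private:
    struct Idle {
        int fd;
        double since; // time at which the connection became idle
    };

public:
    IdlePool(std::size_t maxPerAddress, double timeoutSec)
        : max_(maxPerAddress), timeout_(timeoutSec) {}

    // Park an idle connection. Returns false if the address already
    // holds the maximum number of idle connections.
    bool push(const std::string &addr, int fd, double now) {
        std::deque<Idle> &q = idle_[addr];
        expire(q, now);
        if (q.size() >= max_)
            return false;
        q.push_back(Idle{fd, now});
        return true;
    }

    // Take the most recently parked connection for addr, or -1 if none.
    int pop(const std::string &addr, double now) {
        auto it = idle_.find(addr);
        if (it == idle_.end())
            return -1;
        expire(it->second, now);
        if (it->second.empty())
            return -1;
        const int fd = it->second.back().fd;
        it->second.pop_back();
        return fd;
    }

private:
    // Drop connections that have been idle for the timeout or longer.
    // The oldest entries sit at the front of the queue.
    void expire(std::deque<Idle> &q, double now) const {
        while (!q.empty() && now - q.front().since >= timeout_)
            q.pop_front();
    }

    std::size_t max_;
    double timeout_;
    std::unordered_map<std::string, std::deque<Idle> > idle_;
};
```

With the values from the text, the pool would be created as
IdlePool(10, 15.0) and keyed on strings such as "192.0.2.1:80".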

\section RefreshRules Refresh Rules
\par
These routines decide whether a cached object is stale or fresh,
based on the \em refresh_pattern configuration options.
If an object is fresh, it can be returned as a cache hit.
If it is stale, then it must be revalidated with an
If-Modified-Since request.
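\par
A simplified sketch of such a freshness decision is shown below. It
assumes a rule with a minimum age, a percentage of the object's age at
retrieval, and a maximum age, and it ignores explicit expiry headers
and every other special case. RefreshRule and check() are hypothetical
names; the real routines consider more inputs than this.

```cpp
enum class Freshness { Fresh, Stale };

struct RefreshRule {
    long minAge;    // seconds
    double percent; // fraction of the object's age at retrieval, e.g. 0.2
    long maxAge;    // seconds
};

// ageSec: seconds since the object was retrieved.
// lmAgeSec: how old the object already was when retrieved (retrieval time
// minus last modification time), or a value <= 0 if that is unknown.
Freshness check(const RefreshRule &rule, long ageSec, long lmAgeSec) {
    if (ageSec > rule.maxAge)
        return Freshness::Stale;
    if (lmAgeSec > 0) {
        // Objects that were unchanged for a long time before retrieval
        // are assumed to stay fresh for proportionally longer.
        const double lmFactor = static_cast<double>(ageSec) / lmAgeSec;
        return lmFactor < rule.percent ? Freshness::Fresh : Freshness::Stale;
    }
    return ageSec < rule.minAge ? Freshness::Fresh : Freshness::Stale;
}
```

A Stale result is the case in which the object has to be revalidated
before it can be served.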

\section SNMPSupport SNMP Support
\par
These routines implement SNMP for Squid. At the present time,
we have made almost all of the cachemgr information available
via SNMP.

\section URNSupport URN Support
\par
We are experimenting with URN support in Squid version 1.2.
Note, we're not talking full-blown generic URNs here. This
is primarily targeted toward using URNs as a smart way
of handling lists of mirror sites. For more details, please
see URN Support in Squid (http://squid.nlanr.net/Squid/urn-support.html).

\section ESI ESI
\par
ESI is an implementation of Edge Side Includes (http://www.esi.org).
ESI is implemented as a client side stream and a small
modification to client_side_reply.c to check whether
ESI should be inserted into the reply stream or not.

 */