Commit | Line | Data |
---|---|---|
f2a134b9 | 1 | /* |
77b1029d | 2 | * Copyright (C) 1996-2020 The Squid Software Foundation and contributors |
f2a134b9 AJ |
3 | * |
4 | * Squid software is distributed under GPLv2+ license and includes | |
5 | * contributions from numerous individuals and organizations. | |
6 | * Please see the COPYING and CONTRIBUTORS files for details. | |
7 | */ | |
8 | ||
8b651fb3 | 9 | /** |
10 | \ingroup Component | |
11 | ||
12 | \section Overview Overview of Squid Components | |
13 | ||
14 | \par Squid consists of the following major components | |
15 | ||
16 | \section ClientSideSocket Client Side Socket | |
17 | ||
18 | \par | |
19 | Here new client connections are accepted, requests are parsed, | |
20 | and reply data is sent. Per-connection state information is held | |
21 | in a data structure called ConnStateData. Per-request | |
22 | state information is stored in the clientSocketContext | |
23 | structure. With HTTP/1.1 we may have multiple requests from | |
24 | a single TCP connection. | |
9837567d | 25 | TODO: find out what has replaced clientSocketContext since it seems to not exist now. |
8b651fb3 | 26 | |
27 | \section ClientSideRequest Client Side Request | |
28 | \par | |
29 | This is where requests are processed. We determine if the | |
30 | request is to be redirected, if it passes access lists, | |
31 | and set up the initial client stream for internal requests. | |
32 | Temporary state for this processing is held in a | |
33 | clientRequestContext. | |
9837567d | 34 | TODO: find out what has replaced clientRequestContext since it seems not to exist now. |
8b651fb3 | 35 | |
36 | \section ClientSideReply Client Side Reply | |
37 | \par | |
38 | This is where we determine if the request is cache HIT, | |
39 | REFRESH, MISS, etc. This involves querying the store | |
40 | (possibly multiple times) to work through Vary lists and | |
41 | the list. Per-request state information is stored | |
42 | in the clientReplyContext. | |
43 | ||
44 | \section StorageManager Storage Manager | |
45 | \par | |
46 | The Storage Manager is the glue between client and server | |
47 | sides. Every object saved in the cache is allocated a | |
48 | StoreEntry structure. While the object is being | |
49 | accessed, it also has a MemObject structure. | |
50 | \par | |
51 | Squid can quickly locate cached objects because it keeps | |
52 | (in memory) a hash table of all StoreEntry objects. The | |
53 | keys for the hash table are MD5 checksums of the object's | |
54 | URI. In addition there is also a storage policy such | |
55 | as LRU that keeps track of the objects and determines | |
56 | the removal order when space needs to be reclaimed. | |
57 | For the LRU policy this is implemented as a doubly linked | |
58 | list. | |
59 | \par | |
60 | For each object the StoreEntry maps to a cache_dir | |
61 | and location via sdirno and sfileno. For the "ufs" store | |
62 | this file number (sfileno) is converted to a disk pathname | |
63 | by a simple modulo of L2 and L1, but other storage drivers may | |
64 | map sfileno in other ways. A cache swap file consists | |
65 | of two parts: the cache metadata, and the object data. | |
66 | Note the object data includes the full HTTP reply---headers | |
67 | and body. The HTTP reply headers are not the same as the | |
68 | cache metadata. | |
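\par
The modulo mapping for the "ufs" store can be sketched as follows. This is only an illustration: the function name, the two-level hexadecimal directory layout, and the exact arithmetic are assumptions, and the real driver may differ.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: map a swap file number to a path under a cache_dir.
// l1 and l2 stand for the configured numbers of first- and second-level
// subdirectories (the L1 and L2 values mentioned above).
std::string ufsPathSketch(const std::string &root, int fileNumber, int l1, int l2)
{
    // Assumed layout: <root>/<level 1 dir>/<level 2 dir>/<file number in hex>
    const int level1 = ((fileNumber / l2) / l2) % l1;
    const int level2 = (fileNumber / l2) % l2;

    char tail[32];
    std::snprintf(tail, sizeof(tail), "/%02X/%02X/%08X",
                  static_cast<unsigned>(level1),
                  static_cast<unsigned>(level2),
                  static_cast<unsigned>(fileNumber));
    return root + tail;
}
```

With l1 = 16 and l2 = 256, file number 0x1234 lands in directory 00/12 under the cache_dir root.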
69 | \par | |
70 | Client-side requests register themselves with a StoreEntry | |
71 | to be notified when new data arrives. Multiple clients | |
72 | may receive data via a single StoreEntry. For POST | |
73 | and PUT requests, this process works in reverse. Server-side | |
74 | functions are notified when additional data is read from | |
75 | the client. | |
76 | ||
77 | \section RequestForwarding Request Forwarding | |
78 | ||
79 | \section PeerSelection Peer Selection | |
80 | \par | |
81 | These functions are responsible for selecting one (or none) | |
82 | of the neighbor caches as the appropriate forwarding | |
83 | location. | |
84 | ||
85 | \section AccessControl Access Control | |
86 | \par | |
87 | These functions are responsible for allowing or denying a | |
88 | request, based on a number of different parameters. These | |
89 | parameters include the client's IP address, the hostname | |
90 | of the requested resource, the request method, etc. Some | |
91 | of the necessary information may not be immediately available, | |
92 | for example the origin server's IP address. In these cases, | |
93 | the ACL routines initiate lookups for the necessary | |
94 | information and continue the access control checks when | |
95 | the information is available. | |
96 | ||
97 | \section AuthenticationFramework Authentication Framework | |
98 | \par | |
99 | These functions are responsible for handling HTTP | |
100 | authentication. They follow a modular framework that allows | |
101 | different authentication schemes to be added at will. For | |
102 | information on working with the authentication schemes, see | |
103 | the chapter Authentication Framework. | |
104 | ||
105 | \section NetworkCommunication Network Communication | |
106 | \par | |
107 | These are the routines for communicating over TCP and UDP | |
108 | network sockets. Here is where sockets are opened, closed, | |
109 | read, and written. In addition, note that the heart of | |
110 | Squid (comm_select() or comm_poll()) exists here, | |
111 | even though it handles all file descriptors, not just | |
112 | network sockets. These routines do not support queuing | |
113 | multiple blocks of data for writing. Consequently, a | |
114 | callback occurs for every write request. | |
9837567d | 115 | TODO: decide what to do for comm_poll() since it is either obsolete or uses other names. |
8b651fb3 | 116 | |
117 | \section FileDiskIO File/Disk I/O | |
118 | \par | |
119 | Routines for reading and writing disk files (and FIFOs). | |
120 | Reasons for separating network and disk I/O functions are | |
121 | partly historical, and partly because of different behaviors. | |
122 | For example, we don't worry about getting a "No space left | |
123 | on device" error for network sockets. The disk I/O routines | |
124 | support queuing of multiple blocks for writing. In some | |
125 | cases, it is possible to merge multiple blocks into a single | |
126 | write request. The write callback does not necessarily | |
127 | occur for every write request. | |
128 | ||
129 | \section Neighbors Neighbors | |
130 | \par | |
131 | Maintains the list of neighbor caches. Sends and receives | |
132 | ICP messages to neighbors. Decides which neighbors to | |
133 | query for a given request. File: neighbors.c. | |
134 | ||
135 | \section FQDNCache IP/FQDN Cache | |
136 | \par | |
137 | A cache of name-to-address and address-to-name lookups. | |
138 | These are hash tables keyed on the names and addresses. | |
139 | ipcache_nbgethostbyname() and fqdncache_nbgethostbyaddr() | |
140 | implement the non-blocking lookups. Files: ipcache.c, | |
141 | fqdncache.c. | |
142 | ||
143 | \section CacheManager Cache Manager | |
144 | \par | |
145 | This provides access to certain information needed by the | |
146 | cache administrator. A companion program, cachemgr.cgi, | |
147 | can be used to make this information available via a Web | |
148 | browser. Cache manager requests to Squid are made with a | |
149 | special URL of the form | |
150 | \code | |
151 | cache_object://hostname/operation | |
152 | \endcode | |
153 | The cache manager provides essentially "read-only" access | |
154 | to information. It does not provide a method for configuring | |
155 | Squid while it is running. | |
9837567d | 156 | TODO: get cachemgr.cgi documenting |
8b651fb3 | 157 | |
158 | \section NetworkMeasurementDB Network Measurement Database | |
159 | \par | |
160 | In a number of situations, Squid finds it useful to know the | |
161 | estimated network round-trip time (RTT) between itself and | |
162 | origin servers. A particularly useful example is | |
163 | the peer selection algorithm. By making RTT measurements, a | |
164 | Squid cache will know if it, or one of its neighbors, is closest | |
165 | to a given origin server. The actual measurements are made | |
166 | with the pinger program, described below. The measured | |
167 | values are stored in a database indexed under two keys. The | |
168 | primary index field is the /24 prefix of the origin server's | |
169 | IP address. The secondary index is a hash table of fully-qualified | |
170 | host names whose entries link to the appropriate | |
171 | network entry. This allows Squid to quickly look up measurements | |
172 | when given either an IP address, or a host name. The /24 prefix | |
173 | aggregation is used to reduce the overall database size. File: | |
174 | net_db.c. | |
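\par
The two-key lookup can be sketched as below. This is a simplified illustration that assumes IPv4 addresses in host byte order; the type and member names are hypothetical, and standard containers stand in for Squid's own tables.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record; a real entry would hold more than a single RTT.
struct NetEntrySketch {
    double rttMsec = 0.0;
};

// Reduce an IPv4 address (host byte order) to its /24 network prefix.
inline std::uint32_t prefix24(std::uint32_t addr)
{
    return addr & 0xFFFFFF00u;
}

class NetDbSketch {
public:
    // Store a measurement under the /24 network and remember the host name.
    void record(std::uint32_t addr, const std::string &host, double rttMsec) {
        const std::uint32_t net = prefix24(addr);
        byNetwork_[net].rttMsec = rttMsec;
        byHost_[host] = net;
    }

    // Primary index: look up by IP address via its /24 prefix.
    const NetEntrySketch *findByAddress(std::uint32_t addr) const {
        const auto it = byNetwork_.find(prefix24(addr));
        return it == byNetwork_.end() ? nullptr : &it->second;
    }

    // Secondary index: a host name links to its network entry.
    const NetEntrySketch *findByHost(const std::string &host) const {
        const auto it = byHost_.find(host);
        if (it == byHost_.end())
            return nullptr;
        const auto net = byNetwork_.find(it->second);
        return net == byNetwork_.end() ? nullptr : &net->second;
    }

private:
    std::map<std::uint32_t, NetEntrySketch> byNetwork_;
    std::map<std::string, std::uint32_t> byHost_;
};
```

Because all hosts in the same /24 share one entry, a measurement for one origin server also answers lookups for its neighbors in that network.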
175 | ||
176 | \section Redirectors Redirectors | |
177 | \par | |
178 | Squid has the ability to rewrite requests from clients. After | |
179 | checking the ACL access controls, but before checking for cache hits, | |
180 | requested URLs may optionally be written to an external | |
181 | redirector process. This program, which can be highly | |
182 | customized, may return a new URL to replace the original request. | |
183 | Common applications for this feature are extended access controls | |
184 | and local mirroring. File: redirect.c. | |
185 | ||
186 | \section ASN Autonomous System Numbers | |
187 | \par | |
188 | Squid supports Autonomous System (AS) numbers as another | |
189 | access control element. The routines in asn.c | |
190 | query databases which map AS numbers into lists of CIDR | |
191 | prefixes. These results are stored in a radix tree which | |
192 | allows fast searching of the AS number for a given IP address. | |
193 | ||
194 | \section ConfigurationFileParsing Configuration File Parsing | |
195 | \par | |
196 | The primary configuration file specification is in the file | |
197 | cf.data.pre. A simple utility program, cf_gen, | |
9837567d AJ |
198 | reads the cf.data.pre file and generates cf_parser.cci |
199 | and squid.conf. cf_parser.cci is included directly | |
200 | into cache_cf.cc at compile time. | |
201 | TODO: get cf.data.pre documenting | |
202 | TODO: get squid.conf documenting | |
203 | TODO: get cf_gen documenting and linking. | |
8b651fb3 | 204 | |
205 | \section Callback Callback Data Allocator | |
206 | \par | |
207 | Squid's extensive use of callback functions makes it very | |
208 | susceptible to memory access errors. Care must be taken | |
209 | so that the callback_data memory is still valid when | |
210 | the callback function is executed. The routines in cbdata.c | |
211 | provide a uniform method for managing callback data memory, | |
212 | canceling callbacks, and preventing erroneous memory accesses. | |
9837567d | 213 | TODO: get callback_data (object?) linking or replacement named. |
8b651fb3 | 214 | |
215 | \section RefCountDataAllocator Refcount Data Allocator | |
216 | \since Squid 3.0 | |
217 | \par | |
218 | Manual reference counting, such as cbdata uses, is error prone | |
219 | and time consuming for the programmer. C++'s operator overloading | |
220 | allows us to create automatic reference counting pointers that will | |
221 | free objects when they are no longer needed. With some care these | |
222 | objects can be passed to functions needing Callback Data pointers. | |
9837567d | 223 | TODO: get cbdata documenting and linking. |
8b651fb3 | 224 | |
225 | \section Debugging Debugging | |
226 | \par | |
227 | Squid includes extensive debugging statements to assist in | |
228 | tracking down bugs and strange behavior. Every debug statement | |
229 | is assigned a section and level. Usually, every debug statement | |
230 | in the same source file has the same section. Levels are chosen | |
231 | depending on how much output will be generated, or how useful the | |
232 | provided information will be. The \em debug_options line | |
233 | in the configuration file determines which debug statements will | |
234 | be shown and which will not. The \em debug_options line | |
235 | assigns a maximum level for every section. If a given debug | |
236 | statement has a level less than or equal to the configured | |
237 | level for that section, it will be shown. This description | |
238 | probably sounds more complicated than it really is. | |
239 | File: debug.c. Note that debugs() itself is a macro. | |
9837567d | 240 | TODO: get debugs() documenting as if it was a function. |
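\par
The section and level rule can be sketched as below. This is an illustration only: the class name, the section limit, and the simplified parsing of a debug_options value such as "ALL,1 33,2" are assumptions, and the code does not reproduce what debugs() does internally.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical upper bound on section numbers, chosen for the sketch.
constexpr int kMaxSectionsSketch = 100;

class DebugLevelsSketch {
public:
    // Parse space-separated "section,level" pairs; "ALL" sets every section.
    // Malformed numbers make std::stoi throw, which the sketch does not handle.
    explicit DebugLevelsSketch(const std::string &options) {
        levels_.fill(0);
        std::istringstream in(options);
        std::string token;
        while (in >> token) {
            const std::size_t comma = token.find(',');
            if (comma == std::string::npos)
                continue; // ignore tokens without a level
            const std::string section = token.substr(0, comma);
            const int level = std::stoi(token.substr(comma + 1));
            if (section == "ALL") {
                levels_.fill(level);
            } else {
                const int s = std::stoi(section);
                if (s >= 0 && s < kMaxSectionsSketch)
                    levels_[static_cast<std::size_t>(s)] = level;
            }
        }
    }

    // A statement is shown when its level does not exceed the section maximum.
    bool shown(int section, int level) const {
        return section >= 0 && section < kMaxSectionsSketch &&
               level <= levels_[static_cast<std::size_t>(section)];
    }

private:
    std::array<int, kMaxSectionsSketch> levels_;
};
```

With the value "ALL,1 33,2", a level 2 statement in section 33 is shown, while a level 2 statement in any other section is suppressed.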
8b651fb3 | 241 | |
242 | \section ErrorGeneration Error Generation | |
243 | \par | |
244 | The routines in errorpage.c generate error messages from | |
245 | a template file and specific request parameters. This allows | |
246 | for customized error messages and multilingual support. | |
247 | ||
248 | \section EventQueue Event Queue | |
249 | \par | |
250 | The routines in event.c maintain a linked-list event | |
251 | queue for functions to be executed at a future time. The | |
252 | event queue is used for periodic functions such as performing | |
253 | cache replacement, cleaning swap directories, as well as one-time | |
254 | functions such as ICP query timeouts. | |
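\par
A time-ordered linked-list queue of this kind can be sketched as follows. The class and member names are hypothetical, and std::list and std::function stand in for the hand-written list and callback types in event.c.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>

class EventQueueSketch {
public:
    using Callback = std::function<void()>;

    // Insert the event so that the list stays sorted by execution time.
    void add(const std::string &name, double when, Callback cb) {
        auto pos = events_.begin();
        while (pos != events_.end() && pos->when <= when)
            ++pos;
        events_.insert(pos, Event{name, when, std::move(cb)});
    }

    // Run every event whose time has arrived; returns how many ran.
    // A periodic function would call add() again from inside its callback
    // with a later time.
    int runDue(double now) {
        int ran = 0;
        while (!events_.empty() && events_.front().when <= now) {
            Event ev = std::move(events_.front());
            events_.pop_front(); // remove first so the callback may re-add
            ev.callback();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return events_.size(); }

private:
    struct Event {
        std::string name;
        double when;
        Callback callback;
    };
    std::list<Event> events_;
};
```

Keeping the list sorted means only the head needs to be inspected to decide whether any work is due.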
255 | ||
256 | \section FiledescriptorManagement Filedescriptor Management | |
257 | \par | |
258 | Here we track the number of file descriptors in use, and the | |
259 | number of bytes which have been read from or written to each | |
260 | file descriptor. | |
261 | ||
262 | ||
263 | \section HashtableSupport Hashtable Support | |
264 | \par | |
265 | These routines implement generic hash tables. A hash table | |
266 | is created with a function for hashing the key values, and a | |
267 | function for comparing the key values. | |
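\par
The idea of supplying a hashing function and a comparison function when a table is created can be illustrated with the standard library. In this sketch std::unordered_map stands in for Squid's own hash table routines, and the case-insensitive host name key is an assumed example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hashing function: case-insensitive, so "Example.COM" and "example.com"
// fall into the same bucket.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::string &key) const {
        std::size_t h = 0;
        for (const unsigned char c : key)
            h = h * 31 + static_cast<std::size_t>(std::tolower(c));
        return h;
    }
};

// Comparison function: must agree with the hashing function on equality.
struct CaseInsensitiveEqual {
    bool operator()(const std::string &a, const std::string &b) const {
        if (a.size() != b.size())
            return false;
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i])))
                return false;
        }
        return true;
    }
};

// The table type is fixed by the key type plus the two functions.
using HostTableSketch =
    std::unordered_map<std::string, int, CaseInsensitiveHash, CaseInsensitiveEqual>;
```

The two functions must be consistent: keys that compare equal have to hash to the same value, otherwise lookups miss entries that are present.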
268 | ||
269 | \section HTTPAnonymization HTTP Anonymization | |
270 | \par | |
271 | These routines support anonymizing of HTTP requests leaving | |
272 | the cache. Either specific request headers will be removed | |
273 | (the "standard" mode), or only specific request headers | |
274 | will be allowed (the "paranoid" mode). | |
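\par
The difference between the two modes can be sketched as a filter over a header map. All names here are hypothetical, and the sketch compares header names exactly, whereas HTTP header names are case-insensitive.

```cpp
#include <map>
#include <set>
#include <string>

using HeadersSketch = std::map<std::string, std::string>;

enum class AnonymizeMode { Standard, Paranoid };

// Standard mode: drop the listed headers and keep everything else.
// Paranoid mode: keep only the listed headers and drop everything else.
HeadersSketch anonymizeSketch(const HeadersSketch &in, AnonymizeMode mode,
                              const std::set<std::string> &names)
{
    HeadersSketch out;
    for (const auto &header : in) {
        const bool listed = names.count(header.first) > 0;
        const bool keep = (mode == AnonymizeMode::Standard) ? !listed : listed;
        if (keep)
            out.insert(header);
    }
    return out;
}
```

The same list type serves both modes; only its meaning flips between a deny list and an allow list.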
275 | ||
276 | \section DelayPools Delay Pools | |
277 | \par | |
278 | Delay pools provide bandwidth regulation by restricting the rate | |
279 | at which Squid reads from a server before sending to a client. They | |
280 | do not prevent cache hits from being sent at maximal capacity. Delay | |
281 | pools can aggregate the bandwidth from multiple machines and users | |
282 | to provide more or less general restrictions. | |
283 | ||
284 | \section ICPSupport Internet Cache Protocol | |
285 | \par | |
286 | Here we implement the Internet Cache Protocol. This | |
287 | protocol is documented in RFC 2186 and RFC 2187. | |
288 | The bulk of the code is in the icp_v2.c file. The | |
289 | other file, icp_v3.c, contains a single function for handling | |
290 | ICP queries from Netcache/Netapp caches; they use | |
291 | a different version number and a slightly different message | |
292 | format. | |
9837567d | 293 | TODO: get RFCs linked from ietf |
8b651fb3 | 294 | |
295 | \section IdentLookups Ident Lookups | |
296 | \par | |
297 | These routines support RFC 931 (http://www.ietf.org/rfc/rfc931.txt) | |
298 | "Ident" lookups. An ident | |
299 | server running on a host will report the user name associated | |
300 | with a connected TCP socket. Some sites use this facility for | |
301 | access control and logging purposes. | |
302 | ||
303 | \section MemoryManagement Memory Management | |
304 | \par | |
305 | These routines allocate and manage pools of memory for | |
306 | frequently-used data structures. When the \em memory_pools | |
307 | configuration option is enabled, unused memory is not actually | |
308 | freed. Instead it is kept for future use. This may result | |
309 | in more efficient use of memory at the expense of a larger | |
310 | process size. | |
311 | ||
312 | \section MulticastSupport Multicast Support | |
313 | \par | |
314 | Currently, multicast is only used for ICP queries. The | |
315 | routines in this file implement joining a UDP | |
316 | socket to a multicast group (or groups), and setting | |
317 | the multicast TTL value on outgoing packets. | |
318 | ||
319 | \section PresistentConnections Persistent Server Connections | |
320 | \par | |
321 | These routines manage idle, persistent HTTP connections | |
322 | to origin servers and neighbor caches. Idle sockets | |
323 | are indexed in a hash table by their socket address | |
324 | (IP address and port number). Up to 10 idle sockets | |
325 | will be kept for each socket address, but only for | |
326 | 15 seconds. After 15 seconds, idle socket connections | |
327 | are closed. | |
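\par
The limits described above (10 idle sockets per address, 15 seconds of idle time) can be sketched as below. The class and member names are hypothetical, the address is simplified to an "ip:port" string, and expired descriptors are only dropped where real code would also close them.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

constexpr std::size_t kMaxIdlePerAddressSketch = 10;
constexpr double kIdleTimeoutSecSketch = 15.0;

class IdlePoolSketch {
public:
    // Park an idle descriptor; refuse it when the address already holds 10.
    bool push(const std::string &address, int fd, double now) {
        std::deque<IdleSocket> &idle = pools_[address];
        if (idle.size() >= kMaxIdlePerAddressSketch)
            return false;
        idle.push_back(IdleSocket{fd, now});
        return true;
    }

    // Hand back the most recently parked descriptor, or -1 if none is usable.
    int pop(const std::string &address, double now) {
        const auto it = pools_.find(address);
        if (it == pools_.end())
            return -1;
        std::deque<IdleSocket> &idle = it->second;
        // Oldest entries sit at the front; discard those idle for too long.
        while (!idle.empty() && now - idle.front().idleSince > kIdleTimeoutSecSketch)
            idle.pop_front();
        if (idle.empty())
            return -1;
        const int fd = idle.back().fd;
        idle.pop_back();
        return fd;
    }

private:
    struct IdleSocket {
        int fd;
        double idleSince;
    };
    std::unordered_map<std::string, std::deque<IdleSocket>> pools_;
};
```

Expiry is checked lazily on pop() in this sketch; a timer-driven sweep would be needed to close sockets that are never requested again.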
328 | ||
329 | \section RefreshRules Refresh Rules | |
330 | \par | |
331 | These routines decide whether a cached object is stale or fresh, | |
332 | based on the \em refresh_pattern configuration options. | |
333 | If an object is fresh, it can be returned as a cache hit. | |
334 | If it is stale, then it must be revalidated with an | |
335 | If-Modified-Since request. | |
336 | ||
337 | \section SNMPSupport SNMP Support | |
338 | \par | |
339 | These routines implement SNMP for Squid. At the present time, | |
340 | we have made almost all of the cachemgr information available | |
341 | via SNMP. | |
342 | ||
343 | \section URNSupport URN Support | |
344 | \par | |
345 | We are experimenting with URN support in Squid version 1.2. | |
346 | Note, we're not talking full-blown generic URNs here. This | |
347 | is primarily targeted toward using URNs as a smart way | |
348 | of handling lists of mirror sites. For more details, please | |
349 | see URN Support in Squid | |
350 | (http://squid.nlanr.net/Squid/urn-support.html). | |
351 | ||
352 | \section ESI ESI | |
353 | \par | |
354 | ESI is an implementation of Edge Side Includes (http://www.esi.org). | |
355 | ESI is implemented as a client side stream and a small | |
356 | modification to client_side_reply.c to check whether | |
357 | ESI should be inserted into the reply stream or not. | |
358 | ||
359 | */ |