From: amosjeffries <>
Date: Sun, 20 Jan 2008 16:48:41 +0000 (+0000)
Subject: Add major additional information pages.
X-Git-Tag: BASIC_TPROXY4~177
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=8b651fb31fea43cef240f7dd77095e417a81254a;p=thirdparty%2Fsquid.git

Add major additional information pages.

* These pages are for discourses on major components not suitable for
  writing into the code pages.
---

diff --git a/doc/Programming-Guide/01_Main.dox b/doc/Programming-Guide/01_Main.dox
new file mode 100644
index 0000000000..0692fa8798
--- /dev/null
+++ b/doc/Programming-Guide/01_Main.dox
@@ -0,0 +1,55 @@
+/**
+\mainpage Squid 3.x Developer Programming Guide
+
+\section Abstract Abstract
+
+\par
+     Squid is a WWW Cache application developed by the National Laboratory
+     for Applied Network Research and members of the Web Caching community.
+     Squid is implemented as a single, non-blocking process based around
+     a BSD select() loop.  This document describes the operation of the Squid
+     source code and is intended to be used by others who wish to customize
+     or improve it.
+
+
+\section Introduction Introduction
+
+\par
+        The Squid source code has evolved more from empirical
+        observation and tinkering than from a solid design
+        process.  It carries a legacy of being "touched" by
+        numerous individuals, each with somewhat different techniques
+        and terminology.
+
+\par
+        Squid is a single-process proxy server.  Every request is
+        handled by the main process, with the exception of FTP.
+        However, Squid does not use a "threads package" such as
+        Pthreads.  While this might be easier to code, it suffers
+        from portability and performance problems.  Instead Squid
+        maintains data structures and state information for each
+        active request.
+
+\par
+        The code is often difficult to follow because there are no
+        explicit state variables for the active requests.  Instead,
+        thread execution progresses as a sequence of "callback
+        functions" which get executed when I/O is ready to occur,
+        or some other event has happened.  As a callback function
+        completes, it is responsible for registering the next
+        callback function for subsequent I/O.
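+
+\par
+        The callback style described above can be illustrated with a
+        minimal sketch. This is not Squid code: EventLoop, Request,
+        startRequest(), onReadable() and onWritable() are hypothetical
+        names, and the loop simply runs queued callbacks in order
+        instead of waiting on select().
+
+\code
+#include <functional>
+#include <queue>
+#include <string>
+#include <utility>
+
+// Hypothetical stand-in for the select() loop: runs callbacks in FIFO order.
+struct EventLoop {
+    std::queue<std::function<void()>> ready;
+
+    void schedule(std::function<void()> cb) { ready.push(std::move(cb)); }
+
+    void run() {
+        while (!ready.empty()) {
+            std::function<void()> cb = std::move(ready.front());
+            ready.pop();
+            cb();
+        }
+    }
+};
+
+// Per-request state lives in a structure, not on a thread stack.
+struct Request {
+    std::string log;
+};
+
+// Last step of the chain: nothing further is registered.
+void onWritable(EventLoop &, Request &req)
+{
+    req.log += "write;";
+}
+
+// First step: does its work, then registers the next callback.
+void onReadable(EventLoop &loop, Request &req)
+{
+    req.log += "read;";
+    loop.schedule([&loop, &req] { onWritable(loop, req); });
+}
+
+// Starting a request only registers the first callback.
+void startRequest(EventLoop &loop, Request &req)
+{
+    loop.schedule([&loop, &req] { onReadable(loop, req); });
+}
+\endcode
+
+\par
+        Calling startRequest() does no work by itself. Only when
+        EventLoop::run() is called do the two callbacks execute, one
+        after the other, each one registered by the previous step.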
+
+\par
+        Note there is only a pseudo-consistent naming scheme.  In
+        most cases functions are named like \c moduleFooBar() .
+        However, there are also some functions named like
+        \c module_foo_bar() .
+
+\par
+        Note that the Squid source changes rapidly, and while we
+        do make some effort to document code as we go, some parts
+        of the documentation may be left out.  If you find any
+        inconsistencies, please feel free to notify the Squid Developers
+        (http://www.squid-cache.org/Support/contact.dyn).
+
+ */
diff --git a/doc/Programming-Guide/02_CodingConventions.dox b/doc/Programming-Guide/02_CodingConventions.dox
new file mode 100644
index 0000000000..abee9694cb
--- /dev/null
+++ b/doc/Programming-Guide/02_CodingConventions.dox
@@ -0,0 +1,191 @@
+/**
+\page Conventions Coding and Other Conventions used in Squid
+
+\section Coding Code Conventions
+\par
+        Most custom types and tools are documented in the code or the relevant
+        portions of this manual. Some key points apply globally however.
+
+\section FWT Fixed Width types
+
+\par
+        If you need a specific-width type, such as a 16-bit unsigned
+        integer, use one of the following types. To access
+        them simply include "config.h".
+
+\verbatim
+        int16_t    -  16 bit signed.
+        u_int16_t  -  16 bit unsigned.
+        int32_t    -  32 bit signed.
+        u_int32_t  -  32 bit unsigned.
+        int64_t    -  64 bit signed.
+        u_int64_t  -  64 bit unsigned.
+\endverbatim
+
+\section Documentation Documentation Conventions
+\par
+	Now that documentation is generated automatically from the sources
+	some common comment conventions need to be adopted.
+
+
+\subsection CommentComponents	API vs Internal Component Commenting
+
+\par
+	First among these is a clear separation between the component API
+	and Internal operations. API functions and objects should always be
+	commented, in the *.h file for the component. Internal logic and
+	objects should be commented in the *.cc file where they are defined.
+	The group is to be defined in the component's main files with the
+	overview paragraphs about the API usage or component structure.
+
+\par
+	With C++ classes it is easy to separate API and Internals with the C++
+	public: and private: distinctions on whichever class defines the
+	component API. An Internal group may not be required if there are no
+	additional items in the Internals (rare as globals are common in Squid).
+
+\par
+	With unconverted modules still coded in plain C, the task is harder.
+	In these cases two sub-groups must be defined, *API and *Internal, into
+	which individual functions, variables, etc. are grouped using
+	the \b \\ingroup tag. The API group is usually a sub-group of Components
+	and the Internal group is always a sub-group of the API.
+
+\par	Rule of thumb:
+	For both items, if it is referenced from elsewhere in the code or
+	defined in the .h file it should be part of the API.
+	Everything else should be in the Internals group and kept out of the .h file.
+
+\subsection FunctionComments	Function/Method Comments
+
+\par
+	All descriptions may be more than one line, and while whitespace formatting is
+	ignored by doxygen, it is good to keep it clear for manual reading of the code.
+
+\par
+	Any text directly following a \b \\par tag will be highlighted in bold
+	automatically (like all the 'For Examples' below) so be careful what is placed
+	there.
+
+
+\subsubsection PARAM    Function Parameters
+
+\par
+	Function and Method parameters MUST be named in both the definition and
+	the declaration, and the names MUST be identical. The doxygen parser
+	needs them to match to accurately link the two with documentation,
+	in particular to link each parameter with the documentation of its name.
+
+\par
+	Each function that takes parameters should have the possible range of values
+	commented in the pre-function descriptor. For API functions this is, as usual,
+	in the .h file; for Internal functions it is in the .(cc|cci) file.
+
+\par
+	The \b \\param tag is used to describe these. It takes two required parts:
+	the direction, one of [in], [out], or [in,out], attached directly to the
+	tag, followed by the name of the function parameter being documented.
+	These are followed by an optional description of what the parameter represents.
+
+\par	For Example:
+\verbatim
+/**
+ \param[out] g		Buffer to receive something
+ \param[in] glen	Length of buffer available to write
+ */
+void
+X::getFubar(char *g, int glen)
+...
+\endverbatim
+
+
+\subsubsection RETVAL   Return Values
+
+\par
+	Each function that returns a value should have the possible range of values
+	commented in the pre-function descriptor.
+\par
+	The \b \\retval tag is used to describe these. It takes one required parameter:
+	the value or range of values returned,
+	followed by an optional description of what that value means or why it is returned.
+
+\par	For Example:
+\verbatim
+/**
+ \retval 0	when FUBAR does not start with 'F'
+ \retval 1	when FUBAR starts with 'F'
+ */
+int
+X::saidFubar()
+...
+\endverbatim
+
+\par	Alternatively
+	when a state or other context-dependent object is returned the \b \\return
+	tag is used. It is followed by a description of the object and ideally its
+	content.
+
+
+\subsubsection FLOW     Function Actions / Internal Flows
+
+\par	Simple functions
+	do not need a detailed description of their operation.
+	The \link PARAM input parameters \endlink and \link RETVAL Return \endlink
+	value should be enough for any developer to understand the function.
+
+\par	Long or Complex Functions
+	do however need some commenting.
+	A well-designed function does all its operations in distinct blocks:
+	\arg Input validation
+	\arg Processing on some state
+	\arg Processing on the output of that earlier processing
+	\arg etc, etc.
+
+\par
+	Each of these design blocks inside the function should be given a comment
+	indicating what they do. The comments should begin with
+	\verbatim /** \par \endverbatim
+	The resulting function description will then contain a paragraph on each of the
+	blocks in the order they occur in the function.
+
+\par	For example:
+\verbatim
+/**
+ \param g	The buffer to be used
+ \param glen	Length of buffer provided
+ \param state	Object of type X storing foo
+ */
+void
+fubar(char *g, int glen, X *state) {
+\endverbatim
+	The validation part of the function
+\verbatim
+    /** \par
+     * When g is NULL or glen is less than 1 nothing is done */
+    if(g == NULL || glen < 1)
+        return;
+
+   /** \par
+     * When glen is longer than the accepted length it gets truncated */
+   if(glen > MAX_FOO) glen = MAX_FOO;
+\endverbatim
+	Now we get on to the active part of the function
+\verbatim
+   /** \par
+     * Appends up to MAX_FOO bytes from g onto the end of state->foo
+     * then passes the state off to FUBAR.
+     * No check for null-termination is done.
+     */
+   xmemcpy(state->foo_end_ptr, g, glen);
+   state->foo_end_ptr += glen;
+   fubar(state);
+}
+\endverbatim
+
+\par
+	Of course, this is a very simple example. This type of comment should only be
+	needed in the larger functions with many side effects.
+	A function this small could reasonably have all its commenting done just ahead of
+	the parameter description.
+
+ */
diff --git a/doc/Programming-Guide/03_MajorComponents.dox b/doc/Programming-Guide/03_MajorComponents.dox
new file mode 100644
index 0000000000..0216a8bd73
--- /dev/null
+++ b/doc/Programming-Guide/03_MajorComponents.dox
@@ -0,0 +1,351 @@
+/**
+\ingroup Components
+
+\section Overview Overview of Squid Components
+
+\par Squid consists of the following major components
+
+\section ClientSideSocket Client Side Socket
+
+\par
+	Here new client connections are accepted, parsed, and
+	reply data sent. Per-connection state information is held
+	in a data structure called ConnStateData.  Per-request 
+	state information is stored in the clientSocketContext
+	structure. With HTTP/1.1 we may have multiple requests from
+	a single TCP connection.
+\todo DOCS: find out what has replaced clientSocketContext since it seems to not exist now.
+
+\section ClientSideRequest Client Side Request
+\par
+	This is where requests are processed. We determine if the
+	request is to be redirected, if it passes access lists,
+	and set up the initial client stream for internal requests.
+	Temporary state for this processing is held in a
+	clientRequestContext.
+\todo DOCS: find out what has replaced clientRequestContext since it seems not to exist now.
+
+\section ClientSideReply Client Side Reply	
+\par
+	This is where we determine if the request is a cache HIT,
+	REFRESH, MISS, etc. This involves querying the store 
+	(possibly multiple times) to work through Vary lists and
+	the list. Per-request state information is stored
+	in the clientReplyContext.
+
+\section StorageManager Storage Manager
+\par
+	The Storage Manager is the glue between client and server
+	sides.  Every object saved in the cache is allocated a
+	StoreEntry structure.  While the object is being
+	accessed, it also has a MemObject structure.
+\par
+	Squid can quickly locate cached objects because it keeps
+	(in memory) a hash table of all StoreEntry objects.  The
+	keys for the hash table are MD5 checksums of the object's
+	URI.  In addition there is also a storage policy such
+	as LRU that keeps track of the objects and determines
+	the removal order when space needs to be reclaimed.
+	For the LRU policy this is implemented as a doubly linked
+	list.
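+\par
+	The combination of a hash index and an LRU list can be sketched as
+	follows. This is an illustration only, not the store implementation:
+	LruIndex and its methods are hypothetical names, and a plain string
+	key stands in for the MD5 checksum of the URI.
+
+\code
+#include <list>
+#include <string>
+#include <unordered_map>
+
+// Hypothetical index: finds entries by key and tracks removal order.
+class LruIndex {
+public:
+    // Mark key as most recently used, inserting it if it is new.
+    void touch(const std::string &key) {
+        auto it = pos_.find(key);
+        if (it != pos_.end())
+            order_.erase(it->second);
+        order_.push_front(key);
+        pos_[key] = order_.begin();
+    }
+
+    bool contains(const std::string &key) const { return pos_.count(key) != 0; }
+
+    // Remove and return the least recently used key (empty string if none).
+    std::string evict() {
+        if (order_.empty())
+            return std::string();
+        std::string victim = order_.back();
+        order_.pop_back();
+        pos_.erase(victim);
+        return victim;
+    }
+
+private:
+    std::list<std::string> order_;   // front = most recent, back = next to evict
+    std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
+};
+\endcode
+
+\par
+	The hash map gives constant-time lookup, while the doubly linked list
+	lets an accessed entry move to the front without scanning, so the
+	entry at the back is always the next candidate for removal.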
+\par
+	For each object the StoreEntry maps to a cache_dir
+	and location via sdirno and sfileno. For the "ufs" store
+	this file number (sfileno) is converted to a disk pathname
+	by a simple modulo of L2 and L1, but other storage drivers may
+	map sfileno in other ways.  A cache swap file consists
+	of two parts: the cache metadata, and the object data.
+	Note the object data includes the full HTTP reply, both headers
+	and body.  The HTTP reply headers are not the same as the
+	cache metadata.
+\par
+	Client-side requests register themselves with a StoreEntry
+	to be notified when new data arrives.  Multiple clients
+	may receive data via a single StoreEntry.  For POST
+	and PUT requests, this process works in reverse.  Server-side
+	functions are notified when additional data is read from
+	the client.
+
+\section RequestForwarding Request Forwarding
+
+\section PeerSelection Peer Selection
+\par
+	These functions are responsible for selecting one (or none)
+	of the neighbor caches as the appropriate forwarding
+	location.
+
+\section AccessControl Access Control
+\par
+	These functions are responsible for allowing or denying a
+	request, based on a number of different parameters.  These
+	parameters include the client's IP address, the hostname
+	of the requested resource, the request method, etc.  Some
+	of the necessary information may not be immediately available,
+	for example the origin server's IP address.  In these cases,
+	the ACL routines initiate lookups for the necessary
+	information and continue the access control checks when
+	the information is available.
+
+\section AuthenticationFramework Authentication Framework
+\par
+	These functions are responsible for handling HTTP
+	authentication.  They follow a modular framework allowing
+	different authentication schemes to be added at will. For
+	information on working with the authentication schemes see
+	the chapter Authentication Framework.
+
+\section NetworkCommunication Network Communication
+\par
+	These are the routines for communicating over TCP and UDP
+	network sockets.  Here is where sockets are opened, closed,
+	read, and written.  In addition, note that the heart of
+	Squid (comm_select() or comm_poll()) exists here,
+	even though it handles all file descriptors, not just
+	network sockets.  These routines do not support queuing
+	multiple blocks of data for writing.  Consequently, a
+	callback occurs for every write request.
+\todo DOCS: decide what to do for comm_poll() since it's either obsolete or uses other names.
+
+\section FileDiskIO File/Disk I/O
+\par
+	Routines for reading and writing disk files (and FIFOs).
+	Reasons for separating network and disk I/O functions are
+	partly historical, and partly because of different behaviors.
+	For example, we don't worry about getting a "No space left
+	on device" error for network sockets.  The disk I/O routines
+	support queuing of multiple blocks for writing.  In some
+	cases, it is possible to merge multiple blocks into a single
+	write request.  The write callback does not necessarily
+	occur for every write request.
+
+\section Neighbors Neighbors
+\par
+	Maintains the list of neighbor caches.  Sends and receives
+	ICP messages to neighbors.  Decides which neighbors to
+	query for a given request.  File: neighbors.c.
+
+\section FQDNCache IP/FQDN Cache
+\par
+	A cache of name-to-address and address-to-name lookups.
+	These are hash tables keyed on the names and addresses.
+	ipcache_nbgethostbyname() and fqdncache_nbgethostbyaddr()
+	implement the non-blocking lookups.  Files: ipcache.c,
+	fqdncache.c.
+
+\section CacheManager Cache Manager
+\par
+	This provides access to certain information needed by the
+	cache administrator.  A companion program, cachemgr.cgi
+	can be used to make this information available via a Web
+	browser.  Cache manager requests to Squid are made with a
+	special URL of the form
+\code
+	cache_object://hostname/operation
+\endcode
+	The cache manager provides essentially "read-only" access
+	to information.  It does not provide a method for configuring
+	Squid while it is running.
+\todo DOCS: get cachemgr.cgi documenting
+
+\section NetworkMeasurementDB Network Measurement Database
+\par
+	In a number of situations, Squid finds it useful to know the
+	estimated network round-trip time (RTT) between itself and
+	origin servers.  A particularly useful example is
+	the peer selection algorithm.  By making RTT measurements, a
+	Squid cache will know if it, or one of its neighbors, is closest
+	to a given origin server.  The actual measurements are made
+	with the pinger program, described below.  The measured
+	values are stored in a database indexed under two keys.  The
+	primary index field is the /24 prefix of the origin server's
+	IP address.  The secondary index is a hash table of fully-qualified
+	host names whose data structures link to the appropriate
+	network entry.  This allows Squid to quickly look up measurements
+	when given either an IP address, or a host name.  The /24 prefix
+	aggregation is used to reduce the overall database size.  File:
+	net_db.c.
+
+\section Redirectors Redirectors
+\par
+	Squid has the ability to rewrite requests from clients.  After
+	checking the ACL access controls, but before checking for cache hits,
+	requested URLs may optionally be written to an external
+	redirector process.  This program, which can be highly
+	customized, may return a new URL to replace the original request.
+	Common applications for this feature are extended access controls
+	and local mirroring.  File: redirect.c.
+
+\section ASN Autonomous System Numbers
+\par
+	Squid supports Autonomous System (AS) numbers as another
+	access control element.  The routines in asn.c
+	query databases which map AS numbers into lists of CIDR
+	prefixes.  These results are stored in a radix tree which
+	allows fast searching of the AS number for a given IP address.
+
+\section ConfigurationFileParsing Configuration File Parsing
+\par
+	The primary configuration file specification is in the file
+	cf.data.pre.  A simple utility program, cf_gen,
+	reads the cf.data.pre file and generates cf_parser.c
+	and squid.conf.  cf_parser.c is included directly
+	into cache_cf.c at compile time.
+\todo DOCS: get cf.data.pre documenting
+\todo DOCS: get squid.conf documenting
+\todo DOCS: get cf_gen documenting and linking.
+
+\section CallbackDataAllocator Callback Data Allocator
+\par
+	Squid's extensive use of callback functions makes it very
+	susceptible to memory access errors.  Care must be taken
+	so that the callback_data memory is still valid when
+	the callback function is executed.  The routines in cbdata.c
+	provide a uniform method for managing callback data memory,
+	canceling callbacks, and preventing erroneous memory accesses.
+\todo DOCS: get callback_data (object?) linking or replacement named.
+
+\section RefCountDataAllocator Refcount Data Allocator
+\since Squid 3.0
+\par
+	Manual reference counting, such as cbdata uses, is error prone
+	and time consuming for the programmer. C++'s operator overloading
+	allows us to create automatic reference counting pointers that will
+	free objects when they are no longer needed. With some care these
+	objects can be passed to functions needing Callback Data pointers.
+\todo DOCS: get cbdata documenting and linking.
+
+\section Debugging Debugging
+\par
+	Squid includes extensive debugging statements to assist in
+	tracking down bugs and strange behavior.  Every debug statement
+	is assigned a section and level.  Usually, every debug statement
+	in the same source file has the same section.  Levels are chosen
+	depending on how much output will be generated, or how useful the
+	provided information will be.  The \em debug_options line
+	in the configuration file determines which debug statements will
+	be shown and which will not.  The \em debug_options line
+	assigns a maximum level for every section.  If a given debug
+	statement has a level less than or equal to the configured
+	level for that section, it will be shown.  This description
+	probably sounds more complicated than it really is.
+	File: debug.c.  Note that debugs() itself is a macro.
+\todo DOCS: get debugs() documenting as if it was a function.
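+\par
+	The level check described above can be sketched as follows. This is
+	an illustration only, not the code in debug.c: DebugConfig,
+	setLevel() and shouldShow() are hypothetical names, and the default
+	level of 1 is an assumption made for the example.
+
+\code
+#include <map>
+
+// Hypothetical model of debug_options: a maximum level for every section.
+struct DebugConfig {
+    int defaultLevel = 1;             // assumed level for unlisted sections
+    std::map<int, int> sectionLevel;  // section number -> configured maximum level
+
+    void setLevel(int section, int level) { sectionLevel[section] = level; }
+
+    // A statement is shown when its level is less than or equal to
+    // the level configured for its section.
+    bool shouldShow(int section, int level) const {
+        auto it = sectionLevel.find(section);
+        const int maxLevel = (it != sectionLevel.end()) ? it->second : defaultLevel;
+        return level <= maxLevel;
+    }
+};
+\endcode
+
+\par
+	With section 33 configured at level 3, a level 2 statement in that
+	section is shown and a level 5 statement is not.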
+
+\section ErrorGeneration Error Generation
+\par
+	The routines in errorpage.c generate error messages from
+	a template file and specific request parameters.  This allows
+	for customized error messages and multilingual support.
+
+\section EventQueue Event Queue
+\par
+	The routines in event.c maintain a linked-list event
+	queue for functions to be executed at a future time.  The
+	event queue is used for periodic functions such as performing
+	cache replacement, cleaning swap directories, as well as one-time
+	functions such as ICP query timeouts.
+
+\section FiledescriptorManagement Filedescriptor Management
+\par
+	Here we track the number of filedescriptors in use, and the
+	number of bytes which have been read from or written to each
+	file descriptor.
+
+
+\section HashtableSupport Hashtable Support
+\par
+	These routines implement generic hash tables.  A hash table
+	is created with a function for hashing the key values, and a
+	function for comparing the key values.
+
+\section HTTPAnonymization HTTP Anonymization
+\par
+	These routines support anonymizing of HTTP requests leaving
+	the cache.  Either specific request headers will be removed
+	(the "standard" mode), or only specific request headers
+	will be allowed (the "paranoid" mode).
+
+\section DelayPools Delay Pools
+\par
+	Delay pools provide bandwidth regulation by restricting the rate
+	at which squid reads from a server before sending to a client. They
+	do not prevent cache hits from being sent at maximal capacity. Delay
+	pools can aggregate the bandwidth from multiple machines and users
+	to provide more or less general restrictions.
+
+\section ICPSupport Internet Cache Protocol
+\par
+	Here we implement the Internet Cache Protocol.  This
+	protocol is documented in RFC 2186 and RFC 2187.
+	The bulk of code is in the icp_v2.c file.  The
+	other, icp_v3.c is a single function for handling
+	ICP queries from Netcache/Netapp caches; they use
+	a different version number and a slightly different message
+	format.
+\todo DOCS: get RFCs linked from ietf
+
+\section IdentLookups Ident Lookups
+\par
+	These routines support RFC 931 (http://www.ietf.org/rfc/rfc931.txt)
+        "Ident" lookups.   An ident
+	server running on a host will report the user name associated
+	with a connected TCP socket.  Some sites use this facility for
+	access control and logging purposes.
+
+\section MemoryManagement Memory Management
+\par
+	These routines allocate and manage pools of memory for
+	frequently-used data structures.  When the \em memory_pools
+	configuration option is enabled, unused memory is not actually
+	freed.  Instead it is kept for future use.  This may result
+	in more efficient use of memory at the expense of a larger
+	process size.
+
+\section MulticastSupport Multicast Support
+\par
+	Currently, multicast is only used for ICP queries.   The
+	routines in this file implement joining a UDP
+	socket to a multicast group (or groups), and setting
+	the multicast TTL value on outgoing packets.
+
+\section PresistentConnections Persistent Server Connections
+\par
+	These routines manage idle, persistent HTTP connections
+	to origin servers and neighbor caches.  Idle sockets
+	are indexed in a hash table by their socket address
+	(IP address and port number).  Up to 10 idle sockets
+	will be kept for each socket address, but only for
+	15 seconds.  After 15 seconds, idle socket connections
+	are closed.
+
+\section RefreshRules Refresh Rules
+\par
+	These routines decide whether a cached object is stale or fresh,
+	based on the \em refresh_pattern configuration options.
+	If an object is fresh, it can be returned as a cache hit.
+	If it is stale, then it must be revalidated with an	
+	If-Modified-Since request.
+
+\section SNMPSupport SNMP Support
+\par
+	These routines implement SNMP for Squid.  At the present time,
+	we have made almost all of the cachemgr information available
+	via SNMP.
+
+\section URNSupport URN Support
+\par
+	We are experimenting with URN support in Squid version 1.2.
+	Note, we're not talking full-blown generic URNs here. This
+	is primarily targeted toward using URNs as a smart way
+	of handling lists of mirror sites.  For more details, please
+	see URN Support in Squid
+	(http://squid.nlanr.net/Squid/urn-support.html).
+
+\section ESI ESI
+\par
+	ESI is an implementation of Edge Side Includes (http://www.esi.org).
+	ESI is implemented as a client side stream and a small 
+	modification to client_side_reply.c to check whether
+	ESI should be inserted into the reply stream or not.
+
+ */
diff --git a/doc/Programming-Guide/05_TypicalRequestFlow.dox b/doc/Programming-Guide/05_TypicalRequestFlow.dox
new file mode 100644
index 0000000000..9cce99e0b5
--- /dev/null
+++ b/doc/Programming-Guide/05_TypicalRequestFlow.dox
@@ -0,0 +1,72 @@
+/**
+\page 05_TypicalRequestFlow Flow of a Typical Request
+
+\par
+\li	A client connection is accepted by the client-side socket
+	support and parsed, or is directly created via
+	clientBeginRequest().
+
+\li	The access controls are checked.  The client-side-request builds
+	an ACL state data structure and registers a callback function
+	for notification when access control checking is completed.
+
+\li	After the access controls have been verified, the request
+	may be redirected. 
+
+\li	The client-side-request is forwarded up the client stream
+	to GetMoreData() which looks for the requested object in the
+	cache, and/or Vary: versions of the same. If it is a cache hit,
+	then the client-side registers its interest in the
+	StoreEntry. Otherwise, Squid needs to forward the request,
+	perhaps with an If-Modified-Since header.
+
+\li	The request-forwarding process begins with protoDispatch().
+	This function begins the peer selection procedure, which
+	may involve sending ICP queries and receiving ICP replies.
+	The peer selection procedure also involves checking
+	configuration options such as \em never_direct and
+	\em always_direct.
+
+\li	When the ICP replies (if any) have been processed, we end
+	up at protoStart().  This function calls an appropriate
+	protocol-specific function for forwarding the request.
+	Here we will assume it is an HTTP request.
+
+\li	The HTTP module first opens a connection to the origin
+	server or cache peer.  If there is no idle persistent socket
+	available, a new connection request is given to the Network
+	Communication module with a callback function.  The
+	comm.c routines may try establishing a connection
+	multiple times before giving up.
+
+\li	When a TCP connection has been established, HTTP builds a
+	request buffer and submits it for writing on the socket.
+	It then registers a read handler to receive and process
+	the HTTP reply.
+
+\li	As the reply is initially received, the HTTP reply headers
+	are parsed and placed into a reply data structure.  As
+	reply data is read, it is appended to the StoreEntry.
+	Every time data is appended to the StoreEntry, the
+	client-side is notified of the new data via a callback
+	function. The rate at which reading occurs is regulated by
+	the delay pools routines, via the deferred read mechanism.
+
+\li	As the client-side is notified of new data, it copies the
+	data from the StoreEntry and submits it for writing on the
+	client socket.
+
+\li	As data is appended to the StoreEntry, and the client(s)
+	read it, the data may be submitted for writing to disk.
+
+\li	When the HTTP module finishes reading the reply from the
+	upstream server, it marks the StoreEntry as "complete".
+	The server socket is either closed or given to the persistent
+	connection pool for future use.
+
+\li	When the client-side has written all of the object data,
+	it unregisters itself from the StoreEntry.  At the
+	same time it either waits for another request from the
+	client, or closes the client connection.
+
+*/
diff --git a/doc/Programming-Guide/AccessControls.dox b/doc/Programming-Guide/AccessControls.dox
new file mode 100644
index 0000000000..acd3ffe0ab
--- /dev/null
+++ b/doc/Programming-Guide/AccessControls.dox
@@ -0,0 +1,16 @@
+/**
+\defgroup ACLAPI Access Controls
+\ingroup Components
+
+\par
+        These functions are responsible for allowing or denying a
+        request, based on a number of different parameters.  These
+        parameters include the client's IP address, the hostname
+        of the requested resource, the request method, etc.  Some
+        of the necessary information may not be immediately available,
+        for example the origin server's IP address.  In these cases,
+        the ACL routines initiate lookups for the necessary
+        information and continue the access control checks when
+        the information is available.
+
+ */
diff --git a/doc/Programming-Guide/BasicAuthentication.dox b/doc/Programming-Guide/BasicAuthentication.dox
new file mode 100644
index 0000000000..45ce6b3a94
--- /dev/null
+++ b/doc/Programming-Guide/BasicAuthentication.dox
@@ -0,0 +1,43 @@
+/**
+\defgroup AuthAPIBasic Basic Authentication
+\ingroup AuthAPI
+
+\par
+Basic authentication provides a username and password.  These
+are written to the authentication module processes on a single
+line, separated by a space:
+\code
+<USERNAME> <PASSWORD>
+\endcode
+
+\par
+	The authentication module process reads username, password pairs
+	on stdin and returns either "OK" or "ERR" on stdout for
+	each input line.
+
+\par
+	The following simple perl script demonstrates how the
+	authentication module works.  This script allows any
+	user named "Dirk" (without checking the password)
+	and allows any user that uses the password "Sekrit":
+
+\code
+#!/usr/bin/perl -w
+$|=1;		# no buffering, important!
+while (<>) {
+        chomp;
+        ($u,$p) = split;
+        $ans = &check($u,$p);
+        print "$ans\n";
+}
+
+sub check {
+        local($u,$p) = @_;
+        return 'ERR' unless (defined $p && defined $u);
+        return 'OK' if ('Dirk' eq $u);
+        return 'OK' if ('Sekrit' eq $p);
+        return 'ERR';
+}
+\endcode
+
+ */
diff --git a/doc/Programming-Guide/DelayPools.dox b/doc/Programming-Guide/DelayPools.dox
new file mode 100644
index 0000000000..cea810d0e9
--- /dev/null
+++ b/doc/Programming-Guide/DelayPools.dox
@@ -0,0 +1,49 @@
+/**
+\page 10_DelayPools Delay Pools
+
+\section Introduction Introduction
+\par
+	A DelayPool is a Composite used to manage bandwidth for any request
+	assigned to the pool by an access expression. DelayIds are used
+	to manage the bandwidth on a given request, whereas a DelayPool
+	manages the bandwidth availability and assigned DelayIds.
+
+\section ExtendingDelayPools Extending Delay Pools
+\par
+	A CompositePoolNode is the base type for all members of a DelayPool.
+	Any child must implement the RefCounting primitives, as well as five
+	delay pool functions:
+	\li	stats() - provide cachemanager statistics for itself.
+	\li	dump() - generate squid.conf syntax for the current configuration of the item.
+	\li	update() - allocate more bandwidth to all buckets in the item.
+	\li	parse() - accept squid.conf syntax for the item, and configure for use appropriately.
+	\li	id() - return a DelayId entry for the current item.
+
+\par
+	A DelayIdComposite is the base type for all delay Id's. Concrete
+	Delay Id's must implement the refcounting primitives, as well as two
+	delay id functions:
+	\li	bytesWanted() - return the largest number of bytes that this delay id allows by policy.
+	\li	bytesIn() - record the use of bandwidth by the request(s) that this delayId is monitoring.
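+
+\par
+	The two delay id functions can be sketched with a single token bucket.
+	This is an illustration only: BucketDelayId is a hypothetical class that
+	does not derive from DelayIdComposite and omits the refcounting
+	primitives; the (minimum, maximum) parameters and the refill() helper,
+	which stands in for the pool's update(), are assumptions.
+
+\code
+#include <algorithm>
+
+// Hypothetical single-bucket delay id.
+class BucketDelayId {
+public:
+    explicit BucketDelayId(int initialBytes) : level_(initialBytes) {}
+
+    // Largest number of bytes allowed right now, clamped to [minimum, maximum].
+    int bytesWanted(int minimum, int maximum) const {
+        return std::max(minimum, std::min(maximum, level_));
+    }
+
+    // Record that qty bytes were actually read for the request.
+    void bytesIn(int qty) { level_ -= qty; }
+
+    // Add bandwidth to the bucket, never exceeding its capacity.
+    void refill(int bytes, int capacity) {
+        level_ = std::min(capacity, level_ + bytes);
+    }
+
+private:
+    int level_;   // bytes currently available in the bucket
+};
+\endcode
+
+\par
+	A reader asks bytesWanted() before each read, reads at most that many
+	bytes, and reports the amount actually read through bytesIn(), so the
+	bucket level always reflects the bandwidth already consumed.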
+
+\par
+	Composite creation is currently under design review, so see the
+	DelayPool class and follow the parse() code path for details.
+
+\section NeatExtensions Neat things that could be done.
+\par
+	With the composite structure, some neat things have become possible.
+	For instance:
+
+\par	Dynamically defined pool arrangements.
+	For instance an aggregate (class 1) combined with the per-class-C-net tracking of a 
+	class 3 pool, without the individual host tracking. This differs
+	from a class 3 pool with -1/-1 in the host bucket, because no memory
+	or cpu would be used on hosts, whereas with a class 3 pool, they are
+	allocated and used.
+
+\par	Per request bandwidth limits.
+	A delayId that contains its own bucket could limit each request
+	independently to a given policy, with no aggregate restrictions.
+
+ */
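
Note on the DelayPools page above: the bucket behaviour it describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Squid's implementation. The class name BucketSketch, its constructor parameters, and the simplified single-argument signatures are hypothetical; only the method names update(), bytesWanted() and bytesIn() come from the page.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical single-bucket model; NOT the real CompositePoolNode or
// DelayIdComposite interface. It only illustrates the roles of the three
// operations named in the documentation.
class BucketSketch
{
public:
    BucketSketch(int restoreBytesPerSecond, int maxBytes)
        : restore_(restoreBytesPerSecond), max_(maxBytes), level_(maxBytes)
    {
        assert(restoreBytesPerSecond >= 0 && maxBytes >= 0);
    }

    // update(): allocate more bandwidth to the bucket, capped at its maximum.
    void update(int elapsedSeconds)
    {
        level_ = std::min(max_, level_ + restore_ * elapsedSeconds);
    }

    // bytesWanted(): the largest number of bytes policy currently allows,
    // never more than the caller asked for and never negative.
    int bytesWanted(int maxRequested) const
    {
        return std::max(0, std::min(maxRequested, level_));
    }

    // bytesIn(): record bandwidth actually used by the monitored request(s).
    void bytesIn(int quantity)
    {
        level_ -= quantity;
    }

private:
    int restore_; // bytes added per second
    int max_;     // bucket capacity in bytes
    int level_;   // bytes currently available
};
```

In this sketch a request would call bytesWanted() before reading, read at most that many bytes, and then report the amount actually read through bytesIn(). A periodic timer would call update().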
diff --git a/doc/Programming-Guide/Groups.dox b/doc/Programming-Guide/Groups.dox
new file mode 100644
index 0000000000..b77eac85c4
--- /dev/null
+++ b/doc/Programming-Guide/Groups.dox
@@ -0,0 +1,93 @@
+/**
+ \defgroup POD              POD Classes
+ *
+ \par
+ *     Classes which encapsulate POD (plain old data) in such a way
+ *     that they can be used as POD themselves and passed around Squid.
+ *     These objects should have a formal API for safe handling of their
+ *     content, but it MUST NOT depend on anything other than itself
+ *     or the standard C++ libraries.
+ */
+
+/**
+ \defgroup Components		Squid Components
+ */
+
+/**
+ \defgroup ServerProtocol	Server-Side Protocols
+ \ingroup Components
+ \par
+ *   These routines are responsible for forwarding cache misses
+ *   to other servers, depending on the protocol.  Cache misses
+ *   may be forwarded to either origin servers, or other proxy
+ *   caches.
+ *   All requests to other proxies are sent as HTTP requests.
+ *   All requests to origin servers are sent in that server's protocol.
+ *
+ \par
+ *   WAIS and Gopher don't receive much
+ *   attention because they comprise a relatively insignificant
+ *   portion of Internet traffic.
+ */
+
+/**
+ \defgroup libsquid         Squid Library
+ * 
+ \par
+ *     These objects are provided publicly through libsquid.la
+ */
+
+/**
+ \defgroup Tests            Unit Testing
+ *
+ \par
+ *      Any good application has a set of tests to ensure it stays
+ *      in a good condition. Squid tends to use cppunit tests.
+ \par
+ *      It is preferable to automate tests for units of functionality. There
+ *      is a boilerplate for tests in "src/tests/testBoilerplate.[cc|h]". New
+ *      tests need to be added to src/Makefile.am to build and run them during
+ *      "make check". To add a new test script, copy the references to
+ *      testBoilerplate in Makefile.am, adjusting the name, and likewise copy the
+ *      source files. If you are testing an already tested area you may be able
+ *      to just add new test cases to an existing script. For example, to test
+ *      the store some more, edit tests/testStore.h and add a new unit test
+ *      method name.
+ */
+
+/**
+ \defgroup Callbacks         Event Callback Functions
+ * 
+ \par
+ *      Squid uses events to process asynchronous actions.
+ *      These methods are registered as callbacks to receive notice whenever a
+ *      specific event occurs.
+ */
+
+/**
+ \defgroup Timeouts		Timeouts
+ \todo DOCS: document Timeouts.
+ */
+
+/**
+ \defgroup ServerProtocolHTTP HTTP
+ \ingroup ServerProtocol
+ \todo Write Documentation about HTTP
+ */
+
+/**
+ \defgroup ServerProtocolFTPAPI Server-Side FTP API
+ \ingroup ServerProtocol
+ */
+
+/**
+ \defgroup ServerProtocolWAIS WAIS
+ \ingroup ServerProtocol
+ \todo Write Documentation about Wais
+ */
+
+/**
+ \defgroup ServerProtocolPassthru Passthru
+ \ingroup ServerProtocol
+ \todo Write Documentation about Passthru
+ */
diff --git a/doc/Programming-Guide/Makefile.dox b/doc/Programming-Guide/Makefile.dox
new file mode 100644
index 0000000000..98f501a2d5
--- /dev/null
+++ b/doc/Programming-Guide/Makefile.dox
@@ -0,0 +1,55 @@
+/**
+ \page 03_Makefile Altering Squid Makefiles
+ *
+ \section MakefileWhich1 Which file to edit.
+ \par
+ *   Each directory in the Squid sources is largely self-sufficient.
+ *   \b Makefile.in is auto-generated by autotools based on the
+ *   \b configure.in and \b Makefile.am files.
+ *
+ \par
+ *   In general your additions should go in \b Makefile.am
+ *
+ *
+ \section MakefileUnitTests Adding new Unit Tests
+ *
+ \par
+ *   To alter or add new tests for a class where a set of tests
+ *   already exists, you should simply edit the \b tests/testX.(h|cc) files
+ *   for that class.
+ *
+ \par
+ *   When a new class needs testing you will need to add some variables
+ *   to Makefile.am telling autotools what to build. These variables are:
+ *
+ \subsection _SOURCES tests_testX_SOURCES= ...
+ \par
+ *      The list of .(h|cc) files that need linking to the class.
+ *      Most tests \b should use the actual Squid code, though there are \b stub_X.cc
+ *      files available that simplify some of the more complex optional components.
+ *
+ \subsection _LDFLAGS tests_testX_LDFLAGS= ...
+ \par
+ *      In most cases it should be just \b \$(LIBADD_DL).
+ *
+ \subsection _DEPENDENCIES tests_testX_DEPENDENCIES= ...
+ \par
+ *      This is a list of the additional module *.a files that need linking.
+ *      All unit tests require: \b \@SQUID_CPPUNIT_LA\@
+ *
+ \subsection _LDADD tests_testX_LDADD= ...
+ \par
+ *      This is a list of the additional module libraries that need linking.
+ *      All unit tests require: \b \@SQUID_CPPUNIT_LIBS\@
+ \par
+ *
+ \subsection LIBS Modules available for *_DEPENDENCIES and *_LDADD
+ *
+ \par Linking ~/lib/* code:
+ \li *_LDADD= \b -L../lib \b -lmiscutil ...
+ \li *_DEPENDENCIES= \b \$(top_builddir)/lib/libmiscutil.a ...
+ *
+ \par Linking ~/src/auth/* code:
+ \li *_LDADD= \b libauth.la ...
+ *
+ */
diff --git a/doc/Programming-Guide/StorageManager.dox b/doc/Programming-Guide/StorageManager.dox
new file mode 100644
index 0000000000..96e443cff4
--- /dev/null
+++ b/doc/Programming-Guide/StorageManager.dox
@@ -0,0 +1,40 @@
+/**
+ \defgroup StorageManager Storage Manager
+ \ingroup Components
+ *
+ \par
+	The Storage Manager is the glue between client and server
+	sides.  Every object saved in the cache is allocated a
+	StoreEntry structure.  While the object is being
+	accessed, it also has a MemObject structure.
+
+\par
+	Squid can quickly locate cached objects because it keeps
+	(in memory) a hash table of all StoreEntry objects.  The
+	keys for the hash table are MD5 checksums of the object's
+	URI.  In addition there is also a storage policy such
+	as LRU that keeps track of the objects and determines
+	the removal order when space needs to be reclaimed.
+	For the LRU policy this is implemented as a doubly linked
+	list.
+
+\par
+	For each object the StoreEntry maps to a cache_dir
+	and location via sdirn and sfilen. For the "ufs" store
+	this file number (sfilen) is converted to a disk pathname
+	by a simple modulo of L2 and L1, but other storage drivers may
+	map sfilen in other ways.  A cache swap file consists
+	of two parts: the cache metadata, and the object data.
+	Note the object data includes the full HTTP reply---headers
+	and body.  The HTTP reply headers are not the same as the
+	cache metadata.
+
+\par
+	Client-side requests register themselves with a StoreEntry
+	to be notified when new data arrives.  Multiple clients
+	may receive data via a single StoreEntry.  For POST
+	and PUT requests, this process works in reverse.  Server-side
+	functions are notified when additional data is read from
+	the client.
+
+ */
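
Note on the StorageManager page above: the "simple modulo of L2 and L1" mapping from sfilen to a disk pathname can be sketched as below. This is an illustration under assumptions rather than the store's actual code. The function name ufsPathSketch, the parameter names, and the exact arithmetic and hexadecimal path layout are assumptions chosen to match the description of a two-level directory tree.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not Squid's API: maps a swap file number to a path
// inside a cache_dir that has l1 first-level directories, each holding l2
// second-level directories. The layout is an assumption for illustration.
std::string ufsPathSketch(const std::string &cacheDir, int sfilen, int l1, int l2)
{
    assert(sfilen >= 0 && l1 > 0 && l2 > 0);

    // Second-level directory: which group of l2 files the number falls in,
    // modulo l2. First-level directory: the next group up, modulo l1.
    const unsigned level1 = static_cast<unsigned>((sfilen / l2 / l2) % l1);
    const unsigned level2 = static_cast<unsigned>((sfilen / l2) % l2);

    char suffix[64];
    std::snprintf(suffix, sizeof(suffix), "/%02X/%02X/%08X",
                  level1, level2, static_cast<unsigned>(sfilen));
    return cacheDir + suffix;
}
```

Under these assumptions the path is a pure function of sfilen, L1 and L2, so no lookup table is needed. A different storage driver could replace this function with any other mapping.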