author: Alex Rousskov <rousskov@measurement-factory.com>
cache_peer standby=N implementation.
The feature focus is to instantly provide a ready-to-use connection to a
cooperating cache peer, virtually at all times. This is useful when connection
establishment is "too slow" and/or when infrequent peer use prevents Squid from
combating slow connection establishment with the regular idle connection pool.
The feature is similar to the Squid2 idle=N feature, but there are key differences:
* Standby connections are available virtually at all times, while Squid2 unused
"idle" connections are available only for a short time after a peer request.
* The N standby connections are not all opened at once, reducing the chance of
the feature being mistaken for a DoS attack on a peer.
* More consistent support for peers with multiple IP addresses (peer IPs are
cycled through, just like during regular Squid request forwarding).
Besides, "idle" is a poor choice of adjective for an unused connection pool
name because the same term is used for used persistent connections, which have
somewhat different properties, are stored in a different pool, may need a
distinct set of tuning options, etc. It is better to use a dedicated term for
the new feature.
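For illustration, the feature might be configured with a squid.conf line like the following (the peer hostname and pool size are hypothetical; only the standby=N option comes from this description):

```
# keep 10 ready-to-use connections to this parent open at (virtually) all times
cache_peer parent.example.com parent 3128 0 standby=10
```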
The relationship between the max-conn limit and standby/idle connections is a
complex one. After several rewrites and tests, Squid now obeys max-conn limit
when opening new standby connections and accounts for standby connections when
checking whether to allow peer use. This often works OK, but leads to standby
guarantee violations when non-standby connections approach the limit. The
alternative design where standby code ignores max-conn works better, but is
really difficult to explain and advocate because an admin expects max-conn to
cover all connections and because of the idle connections accounting and
maintenance bugs. We may come back to this when the idle connections code is
fixed.
Fixed max-conn documentation and XXXed a peerHTTPOkay() bug (now in
peerHasConnAvailable()) that results in the max-conn limit preventing the use of a
peer with idle persistent connections.
Decided to use standby connections for non-retriable requests. Avoiding
standby connections for POSTs and such would violate the main purpose of the
feature: providing an instant ready-to-use connection. A user does not care
whether they are waiting too long for a GET or a POST request. Actually, a user may
care more when their POST requests are delayed (because canceling and
retrying them is often scary from the user point of view). The idea behind
standby connections is that the admin is responsible for avoiding race
conditions by properly configuring the peering Squids. If such proper
configuration is not possible or the consequences of rare races (e.g., due to
peer shutdown) are more severe than the consequences of slow requests, the
admin should not use standby=N. This choice may become configurable in the
future.
TODO: Teach peer probing code to push successful probing connections into the
standby pool (when enabled). Should be done as a followup project because of
the differences in standby and probe connection opening code, especially when
SSL peers are supported. Will require some discussion.
A standby pool uses a full-blown PconnPool object for storage instead of
the smaller IdleConnList that the ICAP code uses. The primary reasons for
this design were:
* A peer may have multiple addresses and those addresses may change. PconnPool
has code to deal with multiple addresses while IdleConnList does not. I do not
think this difference is really used in this implementation, but I did not
want to face an unknown limitation. Note that ICAP does not support multiple
ICAP server addresses.
* PconnPool has reporting (and cache manager integration) code that we should
eventually improve to report standby-specific stats. When this happens,
PconnPool will probably become abstract and spawn two kids, one for pconn and
one for standby pools.
Seemingly unrelated changes triggered by standby=N addition:
* Removed PconnPool from fde.h. We used to create immortal PconnPool objects.
Now, standby pools are destroyed when their peer is destroyed. Sharing raw
pointers to such pools is too dangerous. We could use smart pointers, but
PconnPools do not really belong to such a low-level object like fde IMO.
* Added FwdState::closeServerConnection() to encapsulate server connection
closing code, including the new noteUses() maintenance. Also updated
FwdState::serverClosed() to do the same maintenance.
* Close all connections in IdleConnList upon deletion. The old code did
not care because we never deleted PconnPools (although I am not sure
there were no bugs related to ICAP service pools which use IdleConnList
directly and do get destroyed).
* Fixed PconnPool::dumpHash(). It was listing the first entry twice because
the code misused the (misnamed) hash_next().
* Removed unnecessary hard-coded limit on the number of PconnPools. Use
std::set for their storage.
* Fixed very stale PconnPool::pop() documentation and polished its code.
* Added RegisteredRunner::sync() method to use during Squid reconfiguration:
The existing run() method and destructor are great for the initial
configuration and final shutdown, but do not work well for reconfiguration
when you do not want to completely destroy and then recreate the state.
The sync() method (called via SyncRegistered) can be used for that.
Eventually, the reconfiguration API should present the old "saved" config
and the new "current" config to RegisteredRunners so that they can update
their modules/features intelligently. For now, they just see the new config.
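The run()/sync() lifecycle split described above might be sketched like this (an illustrative Python model of the C++ API; the StandbyPoolRunner name and its behavior are hypothetical, only the run()/sync()/SyncRegistered roles come from the description):

```python
class RegisteredRunner:
    def run(self):
        """Initial configuration, after the first squid.conf parse."""
    def sync(self):
        """Reconfiguration: adjust existing state instead of
        destroying and recreating it."""

registry = []

def sync_registered():
    # models SyncRegistered: let every runner adapt to the new config
    for runner in registry:
        runner.sync()

class StandbyPoolRunner(RegisteredRunner):
    def __init__(self):
        self.synced = 0
    def sync(self):
        self.synced += 1  # e.g., resize standby pools to match the new config

r = StandbyPoolRunner()
registry.append(r)
sync_registered()
print(r.synced)  # 1
```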
Previously, note values printed with the "%note" formatting code that contained
non-alphanumeric characters were quoted, and the quotes were then escaped,
resulting in a bizarre logged rendition of empty or simple values (often
received from various helpers):
%22-%22
%22Default_Google%22
%22pg13,US%22
This patch:
- does not use quotes to print annotations
- allows the system admin to define a separator to use for logged
annotations. The %note logformat code accepts the following argument:
[name][:separator]
The separator can be one of ',', ';', or ':'.
By default, multiple note values are separated with "," and multiple
notes are separated with "\r\n". When logging named notes with
%{name}note, the explicitly configured separator is used between note
values. When logging all notes with %note, the explicitly configured
separator is used between individual notes. There is currently no way to
specify both value and notes separators when logging all notes with %note.
- makes Format::Token::data a struct (it is currently a union) and initializes
the Format::Token::data members in the Format::Token::Token constructor.
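For example, a logformat line using the new argument might look like this (the note name and logformat name are hypothetical; the [name][:separator] syntax comes from the description above):

```
# log the values of the "group" note separated by ';' rather than the default ','
logformat mylog ... %{group:;}note
```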
CbcPointer<> is used from code outside of Job protection where it is
safe to use Must(). In order to get a useful backtrace, we need to assert
immediately at the point of failure. This is particularly necessary since
these are generic operators used "everywhere" in the code.
Anatoli [Tue, 29 Apr 2014 18:08:35 +0000 (06:08 +1200)]
Fix order dependency between cache_dir and maximum_object_size
parse_cachedir() has a call to update_maxobjsize(), which limits the
store_maxobjsize variable (used as the internal maximum_object_size
variable of the store data structure) to the value of maximum_object_size
defined at the moment this function executes, for all stores (all
store directories). So if cache_dir is parsed before
maximum_object_size, we get the effect of the default 4 MB limit.
BUT, when maximum_object_size line(s) are parsed after the last
cache_dir, the maximum_object_size option is processed and only shown on
the cachemgr config page, without store_maxobjsize having been updated.
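Until this fix, the practical workaround was to order the directives so that maximum_object_size precedes every cache_dir it should affect, e.g. (values hypothetical):

```
# parsed first, so the cache_dir below inherits this limit
maximum_object_size 512 MB
cache_dir ufs /var/spool/squid 10000 16 256
```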
Replace the HttpMsg::protocol member (only used by HttpRequest) with a
class URL member HttpRequest::url. To do this we adjust the class URL
scheme_ member to be mutable via the makeScheme() setter, and add a
clear() method to reset its internal state. These are necessary for
HttpRequest init() and initHTTP() mechanisms, but fiddling with the
scheme should be avoided as much as possible.
Remove the hack of forcing internal requests to http:// for processing
and cache lookup, then to internal:// for FwdState. Instead use the
available flags.internal for requests identified to be served by this proxy.
Drop the non-standard and now meaningless "internal://" scheme.
Add debugging to display what internal indicators are detected and when
the internal*() server and CacheManager are used by FwdState.
Also document HttpMsg::http_ver which is a copy of the HTTP-Version
field in the "on-wire" message syntax. It is unrelated to the socket
transport protocol and the URL scheme protocol.
Alex Rousskov [Sat, 26 Apr 2014 17:30:33 +0000 (11:30 -0600)]
Document counter-intuitive round-robin cache_dir selection bias; decrease it.
Many squid.confs have at least two groups of cache_dir lines. For example,
rare "large" objects go to larger/slower HDDs while popular "small" objects go
to smaller/fast SSDs:
# HDDs
cache_dir rock /hdd1 ... min-size=large
cache_dir rock /hdd2 ... min-size=large
cache_dir rock /hdd3 ... min-size=large
# SSDs
cache_dir rock /ssd1 ... max-size=large-1
cache_dir rock /ssd2 ... max-size=large-1
cache_dir rock /ssd3 ... max-size=large-1
# rock store does not support least-load yet
store_dir_select_algorithm round-robin
Since round-robin selects the first suitable disk during a sequential scan,
the probability of /hdd1 (/ssd1) selection is higher than other HDDs (SSDs).
Consider a large object that needs an HDD: /hdd1 is selected whenever scan
starts at /ssd1, /ssd2, /ssd3, and /hdd1 while /hdd2 is selected only when the
scan starts at /hdd2 itself! Documentation now warns against the above
cache_dir configuration approach and suggests to interleave cache_dirs:
cache_dir rock /hdd1 ... min-size=large
cache_dir rock /ssd1 ... max-size=large-1
cache_dir rock /hdd2 ... min-size=large
cache_dir rock /ssd2 ... max-size=large-1
cache_dir rock /hdd3 ... min-size=large
cache_dir rock /ssd3 ... max-size=large-1
store_dir_select_algorithm round-robin
Squid's historic implementation of round-robin made its natural bias even
worse because it made the starting point of the sequential scan to be the last
selected dir. In the first configuration example above, it boosted the
probability that the scan will start at one of the SSDs because smaller
objects are more popular and, hence, their dirs are selected more often. With
the starting point usually at an SSD, even more _large_ objects were sent to
/hdd1 compared to other HDDs! The code change avoids this artificial boost
(but the cache_dir lines should still be interleaved to avoid the natural
round-robin bias discussed earlier).
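The bias is easy to reproduce with a toy model (a hypothetical sketch, not Squid code): scan sequentially from every possible starting point and record which dir a large object lands on, for both the grouped and the interleaved layouts above.

```python
def first_suitable(dirs, start, want_large):
    """Pick the first dir suitable for the object, scanning from `start`."""
    n = len(dirs)
    for i in range(n):
        name, holds_large = dirs[(start + i) % n]
        if holds_large == want_large:
            return name

def selection_counts(dirs):
    """Count how often each dir is picked for a large object,
    over all possible scan starting points."""
    counts = {}
    for start in range(len(dirs)):
        name = first_suitable(dirs, start, want_large=True)
        counts[name] = counts.get(name, 0) + 1
    return counts

grouped = [("hdd1", True), ("hdd2", True), ("hdd3", True),
           ("ssd1", False), ("ssd2", False), ("ssd3", False)]
interleaved = [("hdd1", True), ("ssd1", False), ("hdd2", True),
               ("ssd2", False), ("hdd3", True), ("ssd3", False)]

print(selection_counts(grouped))      # {'hdd1': 4, 'hdd2': 1, 'hdd3': 1}
print(selection_counts(interleaved))  # {'hdd1': 2, 'hdd2': 2, 'hdd3': 2}
```

With the grouped layout, /hdd1 wins from four of the six starting points; interleaving makes every HDD equally likely.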
author: Alex Rousskov <rousskov@measurement-factory.com>, Christos Tsantilas <chtsanti@users.sourceforge.net>
Ssl::PeerConnector class
This patch introduces the new Ssl::PeerConnector class. This class connects
the Squid client side to an SSL cache_peer or SSL server. It is used by
TunnelStateData and FwdState to initiate and establish the SSL connection.
This class handles peer certificate validation.
The caller receives a callback with a PeerConnectorAnswer. If the SSL
connection is not established because of an error, an error object suitable
for error response generation is attached to the PeerConnectorAnswer.
The Ssl::PeerConnector class includes the old SSL initialization code from
FwdState class.
The squid.conf parser does not accept whitespace in the IFDEF: parameter
values. This was hiding parse issues when high_memory_warning was used
without gnumalloc.h availability.
Also, sort the cf_gen_defines array alphabetically by IFDEF: value for
easier maintenance.
Docs: mention byte units and maximum_object_size/cache_dir relationship
* We are having users unnecessarily configure round n MB or n GB values as
KB numbers because they misunderstand the unit measures. Document which
units Squid understands, to help simplify these configs.
* The maximum_object_size directive has been the default for cache_dir
max-size for some time, but the requirement that it be placed before the
cache_dir lines it is supposed to affect has not been documented. Add the
documentation, and re-order the default config directive positions.
Alex Rousskov [Mon, 21 Apr 2014 18:09:06 +0000 (12:09 -0600)]
Stop wasting 96 RAM bytes per slot for high-offset slots in large shared caches
with more than 16777216 slots.
Ipc::StoreMap was using the same structure for all db slots. However, slots at
offsets exceeding SwapFilenMax (16777215) could not contain store entry
anchors and the anchor part of the structure was wasting RAM for those slots.
This change splits a single array of StoreMapSlots into two arrays, one
storing StoreMapAnchors and one storing StoreMapSlices. The anchors array is
shorter for caches with more than 16777216 slots.
For example, a StoreMap for a 1TB shared cache with default 16KB slot sizes
(67108864 slots) occupied about 6.5GB of RAM. After this change, the same
information is stored in about 2.0GB because unused anchors are not stored.
32-bit environments were wasting 72 (instead of 96) bytes per high-offset slot.
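The quoted savings follow directly from the numbers above (a back-of-the-envelope check; 96 bytes is the per-slot anchor overhead for 64-bit builds):

```python
SLOT_COUNT = 67108864     # 1TB cache with 16KB slots
ANCHOR_LIMIT = 16777216   # SwapFilenMax + 1: slots that can hold anchors
ANCHOR_BYTES = 96         # anchor bytes wasted per high-offset slot (64-bit)

high_offset_slots = SLOT_COUNT - ANCHOR_LIMIT
saved_gib = high_offset_slots * ANCHOR_BYTES / 2**30
print(saved_gib)  # 4.5, consistent with the drop from ~6.5GB to ~2.0GB
```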
Also simplified Ipc::StoreMap API by removing its StoreMapWithExtras part.
The added complexity caused bugs and was not worth saving a few extra lines of
callers code. With the StoreMap storage array split in two, the extras may
belong to each part (although the current code only adds extras to slices),
further complicating the WithExtras part of the StoreMap API. These extras
are now stored in dedicated shared memory segments (*_ex.shm).
Added Ipc::Mem::Segment::Name() function to standardize segment name
formation. TODO: Attempt to convert shm_new/old API to use SBuf instead of
char* to simplify callers, most of which have to form Segment IDs by
concatenating strings.
Alex Rousskov [Sat, 19 Apr 2014 23:05:51 +0000 (17:05 -0600)]
Allow HITs on entries backed by a shared memory cache only.
A typo in r12501.1.59 "Do not become a store_client for entries that are not
backed by Store" prevented such entries from being used for HITs and possibly
even purged them from the memory cache.
Alex Rousskov [Fri, 18 Apr 2014 16:57:01 +0000 (10:57 -0600)]
Restored Squid's ability to cache (in memory) when no disk caches are configured,
which was lost during r12662 "Bug 3686: cache_dir max-size default fails"
The bug was hidden until the memory cache code started calling
StoreEntry::checkCachable() in branch r13315, exposing the entry size check to
a broken limit.
r12662 converted store_maxobjsize from a sometimes present disk-only limit to
an always set Squid-global limit. However, store_maxobjsize value was only
calculated when parsing cache_dirs. A config without cache_dirs would leave
store_maxobjsize at -1, triggering "StoreEntry::checkCachable: NO: too big"
prohibition for all responses.
This change moves store_maxobjsize calculation from parser to storeConfigure()
where some other Store globals are computed after squid.conf parsing.
Also honored memory cache size limit (just in case cache_mem is smaller than
maximum_object_size_in_memory) and removed leftover checks for
store_maxobjsize being set (it should always be set, at least to zero).
Alex Rousskov [Thu, 10 Apr 2014 04:56:25 +0000 (22:56 -0600)]
Significantly reduced Large Rock (and slightly shared memory) RAM requirements
by not allocating 40 (and 12) bytes of unused RAM per cache slot.
Rock: Stale code inherited from the Small Rock implementation allocated 40
bytes of unused memory per rock db slot (2.5GB of RAM for a 1TB disk with 16KB
db slots). The current (Large Rock) code stores the same kind of information
(and more) in a different location (Ipc::StoreMap).
Shared memory: Similarly, the Large Rock-based shared memory cache allocated
12 bytes of unused memory per shared memory cache slot (3.8MB of RAM for a
10GB shared RAM cache with 32KB slots).
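The quoted figures check out arithmetically (a quick verification, taking 1 TB and 10 GB as binary units):

```python
# Rock: 40 unused bytes per db slot
rock_slots = (1 * 2**40) // (16 * 2**10)   # 1TB disk, 16KB db slots
print(rock_slots * 40 / 2**30)             # 2.5 GiB of wasted RAM

# Shared memory: 12 unused bytes per cache slot
shm_slots = (10 * 2**30) // (32 * 2**10)   # 10GB cache, 32KB slots
print(shm_slots * 12 / 2**20)              # 3.75 MiB (~3.8MB as quoted)
```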
This may cause problems in cases where the code assumes that the MemBuf
is always NULL-terminated, for example when an ErrorState object tries to
use an empty errorpage template.
This patch terminates the (empty) MemBuf in the MemBuf::init() method.
Cleanup: Make crypt(3) detection dependent on the helpers that use it
Only run detection for crypt(3) support when the NCSA and getpwnam
helpers which use it are to be built.
Also, move shadow.h detection to depend on getpwnam helper which is the
only code using it.
Also, shuffle the libcrypt and libmd5 detection up into the section with
other crypto libraries so they are known in advance of helper detections
which may require libcrypt.
Fix OpenSSL detection when an explicit path is given
The previous OpenSSL detection skipped the library checks when an
explicit path was presented, resulting in no -lssl flag being passed to
the linker.
Rationale for this fix:
pkg-config presents location-neutral details.
The explicit checks are likewise neutral provided the LIBS environment
variable has been set with the explicit path.
The user-presented path must be used regardless of which library checks
are used in detection.
So...
Always perform the checks with optionally set LIBS and keep the
user provided path explicitly separate from the pkg-config *_LIBS
variable. Only assemble the parts into SSLLIB once all have been
identified.
Alex Rousskov [Sun, 6 Apr 2014 21:02:18 +0000 (15:02 -0600)]
Lifted 16777216 slot limit from rock cache_dirs and shared memory caches.
Also fixed rock disk space waste warning.
Rock store and shared memory cache code used entry and slot limit as if
the two were the same. In large caches, the number of entry slots may exceed
the number of entries supported by Squid (16777216 entries per store). This
is especially likely with large caches storing large entries (many slots
per entry with maximum number of entries) or large caches using small slot
sizes.
AFAICT, the bug led to the "tail" of cache storage being unused and cache
entries being purged even though there was still space to store them. Also,
Squid was allocating smaller shared memory tables (and temporary cache index
build tables) than actually needed for the configured cache sizes.
Note that this change does not remove the per-store 16777216 entry limit.
The old [small] rock warning about wasted disk space assumed that a cache
entry occupies exactly one slot. The updated warnings do not.
Also fixed and simplified db scanning condition during cache index build.
Alex Rousskov [Fri, 4 Apr 2014 16:10:40 +0000 (10:10 -0600)]
Report IpcIo file name with errors and warnings
to inform admin which cache_dir needs troubleshooting or tuning.
Some level-0/1 messages did not mention the disker ID or file name at all,
while others relied on the disker process ID, which is too low-level from
most admins' point of view.
Also improved error and warning messages consistency.
The old parser tracked completed states, which led to several bugs in
resuming the parse of a 'block' of data and in end-of-block detection.
Instead, track the state/block currently being processed.
Thanks to Alex Rousskov for identifying this as the source of several
bugs.
Amos Jeffries [Mon, 31 Mar 2014 04:46:50 +0000 (21:46 -0700)]
Cleanup: make loadable modules build variables follow Squid3 coding guidelines
Squid-3 coding guidelines mandate that AM_CONDITIONAL variables begin
with ENABLE_* and AC_DEFINE macros begin with USE_* to resolve confusion
over which is relevant.
Amos Jeffries [Sun, 30 Mar 2014 12:00:34 +0000 (05:00 -0700)]
Cleanup: replace USE_SSL wrapper macro with USE_OPENSSL
Squid-3 currently only supports OpenSSL for SSL/TLS components. This
makes the support type explicit and prepares for alternative SSL
libraries to be added in future with different macro wrappers.
Amos Jeffries [Sun, 30 Mar 2014 06:46:34 +0000 (23:46 -0700)]
Fix buffer overruns in generated NTLM tokens
The NTLM token payload (string value) encoding was not protecting fully
against 16-bit and 8-bit wrap errors and signedness conversions.
This protects against the wrap and conversion errors by truncating at
the maximum field length. That length limit is vastly larger than NTLM
protocol specified field sizes and permitted HTTP header sizes so is not
expected to cause issues with existing implementations.
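The truncation idea can be sketched as follows (a hypothetical Python model, not Squid's actual C code; the real NTLM field layout is simplified here to a single little-endian length prefix):

```python
MAX_FIELD = 0xFFFF  # NTLM field lengths are 16-bit; never exceed this

def encode_field(value: bytes) -> bytes:
    # Truncate at the maximum field length so the 16-bit length field
    # cannot wrap around to a small value for oversized payloads.
    data = value[:MAX_FIELD]
    length = len(data).to_bytes(2, "little")
    return length + data

oversized = b"x" * 70000      # 70000 % 65536 == 4464: the wrap hazard
field = encode_field(oversized)
print(int.from_bytes(field[:2], "little"))  # 65535: truncated, not wrapped
```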
Amos Jeffries [Sun, 30 Mar 2014 06:41:27 +0000 (23:41 -0700)]
crypto-ng: Drop --enable-ssl build option
This configure option fully overlapped --with-openssl.
Simplify the build options and cleanup in preparation for crypto-ng as
SSL functionality will be enabled by default in future when any of the
supported SSL/TLS libraries is available.
Amos Jeffries [Sat, 29 Mar 2014 11:15:13 +0000 (04:15 -0700)]
C++11: Upgrade auto-detection to use the formal -std=c++11
When the latest compilers added support for -std=c++11 they also dropped
the temporary -std=c++0x option without backward-compatible support. So
for the newest compilers we have not been testing the C++11 code.
As a result of this change Squid will no longer attempt to enable the
partial support in older compilers with -std=c++0x.
Also, update the compiler option test macro from autoconf project.
Amos Jeffries [Mon, 24 Mar 2014 04:57:32 +0000 (21:57 -0700)]
Parser-NG: Convert the ConnStateData input buffer to SBuf
Prepare the way to efficiently parse client requests using SBuf based
parser-ng.
IoCallback stores a raw-pointer to the ConnStateData::In::buf member
object rather than an SBuf reference to the backing MemBlob or char*
store so that only the short (blocking) FD_READ_METHOD() call needs to
provide any synchronous guarantees. We also particularly need a direct
(raw) pointer to the ConnStateData member to prevent the possible
read/consume collisions causing problems with the ConnStateData callback
and avoid having to merge two separate SBufs.
This patch fixes the following bug:
1) A user sends a CONNECT request with valid credentials
2) Squid checks the credentials and adds the user to the user cache
3) The same user sends a CONNECT request with invalid credentials
4) Squid overwrites the entry in the user cache and denies the second
CONNECT request
5) The user sends a GET request on the first SSL connection which is
established by now
6) Squid knows that it does not need to check the credentials on the
bumped connection but still somehow checks again whether the user is
successfully authenticated
7) Due to the second CONNECT request the user is regarded as not
successfully authenticated
8) Squid denies the GET request of the first SSL connection with 403
ERR_CACHE_ACCESS_DENIED
On proxies with Basic authentication and SSL bumping, this can be used
to prevent a legitimate user from making any HTTPS requests.
Amos Jeffries [Sun, 23 Mar 2014 05:17:14 +0000 (23:17 -0600)]
Portability: invert the basic_nis_auth header check
The autoconf macro sets its action-if-found if *any* of the headers is
found. Since these are mandatory headers being tested for, we need to
disable the helper if any are missing, rather than enable it when one is
found.
Alex Rousskov [Wed, 19 Mar 2014 04:04:52 +0000 (22:04 -0600)]
Avoid "FATAL: Squid has attempted to read data from memory that is not present"
crashes. Improve related code.
Shared memory caching code was not checking whether the response is generally
cachable and, hence, tried to store responses that Squid already refused to
cache (e.g., release-requested entries). The shared memory cache should also
check that the response is contiguous because the disk store may not check
that (e.g., when disk caching is disabled).
The problem was exacerbated by the broken entry dump code accompanying the
FATAL message. The Splay tree iterator is evidently not iterating a portion of
the tree. I added visitor pattern code to work around that, but did not try
to fix the Splay iterator itself because, in part, the whole iterator
design felt inappropriate (storing and flattening the tree before iterating
it?) for its intended purpose. I could not check quickly enough where the
broken iterator is used besides mem_hdr::debugDump() so more bugs like this
are possible.
Optimized StoreEntry::checkCachable() a little and removed wrong TODO comment:
Disk-only mayStartSwapOut() should not accumulate "general" cachability checks
which are used by RAM caches as well.
Added more mem_hdr debugging and polished method descriptions.
Fixed NullStoreEntry::mayStartSwapout() spelling/case. It should override
StoreEntry::mayStartSwapOut().