Alex Rousskov [Thu, 17 Nov 2011 09:34:09 +0000 (02:34 -0700)]
Fix Comm::Write closing() assertion when retrying a failed UDP DNS query.
When we receive a UDP DNS response with a truncation (TC) bit set, we retry
using TCP. Since the retry trigger has nothing to do with the TCP connection,
it is possible that the TCP connection is being closed when we are about to
write to it: A call to our connection close callback has been scheduled but
has not fired yet. We must check for and avoid such race conditions.
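The check described above can be sketched as follows. This is a minimal illustration, not Squid's actual code: TcpConn, its fields, and writeQueryIfUsable() are hypothetical names standing in for the real connection object and the DNS retry path.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a TCP connection; not Squid's Comm::Connection.
struct TcpConn {
    bool open = false;     // the descriptor is valid
    bool closing = false;  // a close callback is scheduled but has not fired
    std::string outbox;    // bytes "written" so far
};

// Returns true if the query was written. Returns false if the caller must
// open a fresh connection (or queue the query) instead.
bool writeQueryIfUsable(TcpConn &conn, const std::string &query)
{
    if (!conn.open || conn.closing)
        return false; // writing now would trigger the closing() assertion
    conn.outbox += query;
    return true;
}
```

The point of the sketch is only that the retry path asks the connection about its state right before writing, rather than assuming that a connection it opened earlier is still usable.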
Amos Jeffries [Thu, 17 Nov 2011 09:26:50 +0000 (02:26 -0700)]
Document and alter the pconn idle timeout directives.
Alters the directive names to clarify what they do and adds some more
description to the config file documentation.
Alters the internal config variables to match the new directive names.
Also alters the well known messages in mgr:filedescriptors report a little
to indicate client/server type and adds a standard "Idle " prefix for
easy automated scanning.
This patch allows Squid to provide details for the %D macro on the secure
connect failed error page when an SSL handshake with the origin server fails.
The default %D text is "Handshake with SSL server failed: XYZ" where XYZ is the
corresponding error string/description returned by OpenSSL if there is any.
Andrew Beverley [Sun, 6 Nov 2011 01:38:17 +0000 (19:38 -0600)]
Add a mask on the qos_flows miss configuration value.
The reason for this is to allow the preserved mark/TOS value from the
server to be altered slightly rather than overwritten completely.
Example usage. The following will preserve the netfilter mark, but will
ensure that the (9th) bit specified in the miss value will be set to 1
in the preserved mark:
Amos Jeffries [Sun, 6 Nov 2011 00:03:50 +0000 (18:03 -0600)]
Fixed two more cases of outdated shared memory cache detection
... which led to "STORE_DISK_CLIENT == getType()" assertions
when running SMP Squid with non-shared memory caching.
UsingSmp() is not the right condition to detect whether we are using a shared
memory cache because shared memory caching may be disabled and because
Coordinator does not use a shared memory cache even if shared caching is
enabled.
This option allows the Squid administrator to add custom ICAP request
headers or eCAP options to Squid ICAP requests or eCAP transactions.
Use it to pass custom authentication tokens and other
transaction-state related meta information to an ICAP/eCAP service.
The addition of a meta header is ACL-driven:
adaptation_meta name value [!]aclname ...
Processing for a given header name stops after the first ACL list match.
Thus, it is impossible to add two headers with the same name. If no ACL
lists match for a given header name, no such header is added. For example:
# do not debug transactions except for those that need debugging
adaptation_meta X-Debug 1 needs_debugging
# log all transactions except for those that must remain secret
adaptation_meta X-Log 1 !keep_secret
# mark transactions from users in the "G 1" group
adaptation_meta X-Authenticated-Groups "G 1" authed_as_G1
The "value" parameter may be a regular squid.conf token or a "double
quoted string". Within the quoted string, use backslash (\) to escape
any character, which is currently only useful for escaping backslashes
and double quotes. For example,
"this string has one backslash (\\) and two \"quotes\""
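The quoting rule can be illustrated with a small decoder. This is a sketch under assumptions: unquoteValue() is a hypothetical helper, not the squid.conf parser, and it only models the "backslash escapes the next character" rule stated above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a "double quoted string" in which a
// backslash escapes whatever single character follows it.
std::string unquoteValue(const std::string &token)
{
    if (token.size() < 2 || token.front() != '"' || token.back() != '"')
        return token; // a regular (unquoted) token is used verbatim

    std::string result;
    const std::string::size_type last = token.size() - 1; // closing quote
    for (std::string::size_type i = 1; i < last; ++i) {
        char c = token[i];
        if (c == '\\') {
            if (++i == last) // the closing quote itself was escaped
                throw std::invalid_argument("unterminated quoted string");
            c = token[i];
        }
        result += c;
    }
    return result;
}
```

Fed the quoted example above, the decoder yields a value containing a single backslash inside the parentheses and plain double quotes around the word quotes.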
Dmitry Kurochkin [Sat, 29 Oct 2011 11:49:34 +0000 (05:49 -0600)]
Provide "fake" AtomicWordT implementation for non-SMP configurations.
While we cannot provide a real AtomicWordT implementation on systems where
atomic operations are not available, we can use a "fake" one if Squid is
running in non-SMP mode. Before the change, the "fake" implementation was
always asserting, which is too restrictive and leads to test failures on
systems without atomic operations.
The new implementation works under conditions similar to "fake" shared memory
segments and allows SMP-using code (e.g. Rock store) to work in non-SMP mode.
In particular, it allows tests to pass on such systems.
AtomicWordT was renamed to WordT and moved to Ipc::Atomic namespace to allow
Ipc::Atomic::Enabled() to be declared outside of the AtomicWordT template
class. This lets us define the Enabled() method in AtomicWord.cc which avoids
dragging protos.h #include into the AtomicWord.h header.
Dmitry Kurochkin [Sat, 29 Oct 2011 11:47:24 +0000 (05:47 -0600)]
Bug 3150: do not start useless unlinkd.
Unlinkd may be used only by UFS storage but, before the change, Squid
always started unlinkd if it was built, even if it was not needed.
Whether a SwapDir may use unlinkd depends on the SwapDir
implementation and DiskIO strategy it uses. The patch adds
unlinkdUseful() method to SwapDir and DiskIOStrategy to decide if
unlinkd should be started.
After the change, unlinkd may be started during reconfiguration and
unlinkdInit() may be called multiple times.
After the change, unlinkdClose() may be called when unlinkd was never
started. The patch removes a warning which was printed in this case
on Windows.
Dmitry Kurochkin [Sat, 29 Oct 2011 11:46:10 +0000 (05:46 -0600)]
Optimization: Make read requests in [Rock] IpcIo bypass max-swap-rate limit.
Before the change, IpcIoFile::WaitBeforePop() delayed both swap ins
(hits) and swap outs (misses). This is suboptimal because reads do
not usually accumulate unfinished I/O requests in OS buffers and,
hence, do not eventually require the OS to block all I/O.
Ideally, a disker should probably dequeue all pending disker requests,
satisfy reads ASAP, and then handle writes, but that is difficult for
several reasons. The patch implements a simpler approach: peek the
next request to be popped, and if it is a swap in (i.e., read or hit),
then pop it without any delay.
When a read is popped, we still adjust the balance member and LastIo,
because we do want to maintain the configured average I/O rate. When a
write request comes in, it will be delayed [longer] if needed.
In the extreme case of a very long stream of read requests (no writes
at all), there will be essentially no I/O rate limit and that is what
we want.
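A simplified model of this policy is sketched below. It is not the IpcIoFile code: RateLimitedQueue, its members, and the millisecond bookkeeping are assumptions chosen to show the idea that reads skip the delay but are still charged against the rate budget.

```cpp
#include <cassert>
#include <deque>

enum class IoKind { Read, Write };

struct Request { IoKind kind; };

// Hypothetical, simplified model of a rate-limited disker queue.
// Times are in milliseconds and `now` is supplied by the caller.
class RateLimitedQueue {
public:
    explicit RateLimitedQueue(int maxSwapsPerSec)
        : msPerIo(1000.0 / maxSwapsPerSec) { assert(maxSwapsPerSec > 0); }

    void push(const Request &r) { queue.push_back(r); }

    // How long the caller should wait before popping the next request.
    double waitBeforePop(double now) const {
        if (queue.empty() || queue.front().kind == IoKind::Read)
            return 0; // reads (hits) bypass the limit
        const double debt = balance - (now - lastIo);
        return debt > 0 ? debt : 0;
    }

    // Pops the next request and charges it against the rate budget,
    // even when it was a read that skipped the delay.
    Request pop(double now) {
        assert(!queue.empty());
        const double debt = balance - (now - lastIo);
        balance = (debt > 0 ? debt : 0) + msPerIo;
        lastIo = now;
        const Request r = queue.front();
        queue.pop_front();
        return r;
    }

private:
    std::deque<Request> queue;
    double msPerIo;     // time budget consumed by one I/O
    double balance = 0; // I/O time already spent ahead of schedule
    double lastIo = 0;  // when the last request was popped
};
```

With a limit of 10 swaps per second, two reads popped back to back are not delayed, but the write queued behind them has to wait for the budget that both reads consumed.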
Dmitry Kurochkin [Sat, 29 Oct 2011 11:43:41 +0000 (05:43 -0600)]
Do not allow max-swap-rate and swap-timeout reconfiguration for Rock Store.
These options are used to configure DiskIO module during Rock SwapDir
initialization. During reconfiguration, the values are updated in Rock
SwapDir, but they do not reach the DiskIO module. Thus, while Squid says that
an option has a new value, the new value is never really used. This patch fixes
this inconsistency.
In the future, we may support reconfiguration for max-swap-rate and
swap-timeout, but that would require adding reconfiguration support
to DiskIO modules.
Dmitry Kurochkin [Sat, 29 Oct 2011 11:41:43 +0000 (05:41 -0600)]
Independent shared I/O page limit.
Shared memory pages are used for shared memory cache and IPC I/O module.
Before this change, the number of shared memory pages needed for IPC I/O
was calculated from the size of shared memory cache. Moreover, shared
memory cache was required for IPC I/O.
The patch makes the limit for shared I/O pages independent from the
shared memory cache size and presence. IPC I/O pages limit is calculated
from the number of workers and diskers; it does not depend on cache_dir
configuration. This may change in the future if we learn how to compute
it (e.g., by multiplying max-swap-rate and swap-timeout if both are
available).
UsingSmp() is not the right condition to detect whether we are using a shared
cache because shared memory caching may be disabled and because Coordinator
does not use a shared memory cache even if shared caching is enabled.
The assertion was triggered by icons being added to Coordinator local memory
cache. TODO: Coordinator does not need to cache [icons] at all.
SslBump code assumed that it is signing generated certificates with a root CA
certificate. Root certificates are usually not sent along with the server
certificates because clients must have them independently installed or
built-in. Squid was not sending the signing certificate.
In many environments, Squid signing certificate is intermediate (i.e., it
belongs to a non-root CA). If Squid does not send that intermediate signing
certificate with the generated one, the client will not be able to establish a
complete chain of trust from the generated fake to the root CA certificate,
leading to errors.
With this change, Squid may send the signing certificate (along with the
generated one) using the following rules:
* If the configured signing certificate is self-signed,
then just send the generated certificate alone.
Note that root CA certificates are self-signed (by root CA).
* Otherwise (i.e., if the configured signing certificate is an intermediate
CA certificate), send both the intermediate CA and the generated fake
certificate.
* If Squid sends the intermediate CA certificate, Squid also sends
all other certificates from the "cert=" file. Sending a chain with
multiple intermediate CA certificates may be required when the Squid
signing certificate was signed by another intermediate CA.
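The rules above can be expressed as a small decision function. This is a sketch only: Cert and chainToSend() are hypothetical, and comparing subject with issuer is a simplification of a real self-signed check, which would also verify the signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified certificate: only the fields the rules need.
struct Cert {
    std::string subject;
    std::string issuer;
    bool selfSigned() const { return subject == issuer; }
};

// Returns the certificates to send, in order, for a generated certificate.
// `fileCerts` models the contents of the "cert=" file: the signing
// certificate first, followed by any other certificates.
std::vector<Cert> chainToSend(const Cert &generated, const std::vector<Cert> &fileCerts)
{
    assert(!fileCerts.empty());
    std::vector<Cert> chain;
    chain.push_back(generated);
    const Cert &signing = fileCerts.front();
    if (signing.selfSigned())
        return chain; // root CA: clients are expected to have it already
    // Intermediate CA: send it and everything else from the file.
    chain.insert(chain.end(), fileCerts.begin(), fileCerts.end());
    return chain;
}
```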
Fix: The multi-language support is broken for Ssl error details
Currently, Ssl::ErrorDetail::useRequest never sets the ErrorDetail::request
member. The ErrorDetail::request member is used to select the correct language
for the web client from the Accept-Language header.
Alex Rousskov [Sat, 29 Oct 2011 04:52:37 +0000 (22:52 -0600)]
Bug 3383: unhandled exception: theGroupBSize > 0
Do not create shared queue for IpcIoFile if there are no diskers. The queue
code requires at least one queue reader and writer, and SMP does not imply the
existence of diskers.
Amos Jeffries [Sat, 29 Oct 2011 04:51:06 +0000 (22:51 -0600)]
Produce full list of peer options on maybe-direct forwarding case
Now that we are generating the set of possible peers before attempting to
connect, the retry_on_error directive is no longer repeatedly cycling
through the peer selection process, failing on one peer until that peer
works.
The maybe-direct case seems to have been depending on this behaviour to
locate alternative peers before going DIRECT. The attached patch seeks
to make the maybe-direct case produce a list of all available parents
with the specific algorithm choice first.
Alex Rousskov [Sat, 29 Oct 2011 04:49:50 +0000 (22:49 -0600)]
SMP shared memory cache stats were not collected.
"Hot Object" stats were not reported for shared memory cache.
Mean disk object size stats were aggregated inaccurately for SMP.
Moved Store-related stats into a dedicated StoreStats class,
encapsulating memory cache-related (mem), disk cache-related (swap), and
global store (number of objects) stats. Used consistent naming scheme
and a common parent class to make memory and disk stats more alike.
Moved Store stats collection into corresponding Store classes rather
than forcing GetInfo() in stat.cc to know how to deal with all Store stats.
Alex Rousskov [Sat, 29 Oct 2011 04:29:41 +0000 (22:29 -0600)]
Add basic "make check" tests for Rock Store, based on UFS and COSS tests.
Also, name shared memory segments in a more portable way
to make shm_open() work on FreeBSD and some other OSes.
Linux and friends use "/slashless-name" template for shared memory segment IDs.
HPUX and friends use "/full/path/to/some/file".
FreeBSD uses the former or the latter, depending on version and jail context.
We now distinguish the above cases and prefix the internal segment ID
accordingly. The above analysis and its implementation are based on the
boost::interprocess code.
To make matters worse, the right prefix for path-based OSes depends on whether
we are running an [installed] Squid binary or just a "make check" test case.
For test cases, we cannot use PREFIX-based paths because they may not exist.
Instead, we use the current directory. This is consistent with TESTDIR (i.e.,
cache_dir location) which each fs test case defines to be in the current
directory.
Finally, the segment name may clash with cache_dir name on path-based OSes. We
now append ".shm" to the segment name to reduce the likelihood of a collision.
TODO: Should StoreMap/etc (or their creators) append "map"/etc to their IDs?
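The naming scheme can be sketched like this. ShmNameStyle and segmentName() are hypothetical names, and appending ".shm" for both styles is an assumption made for simplicity; the message above only motivates the suffix for path-based OSes.

```cpp
#include <cassert>
#include <string>

// How the OS expects shared memory segment names to look.
enum class ShmNameStyle { Slashless, FullPath };

// Hypothetical helper. `baseDir` is only used for path-based OSes: an
// installation directory for installed binaries or "." for test cases.
std::string segmentName(ShmNameStyle style, const std::string &baseDir, const std::string &id)
{
    assert(!id.empty());
    std::string name;
    if (style == ShmNameStyle::FullPath)
        name = baseDir;
    name += "/";
    name += id;
    name += ".shm"; // reduces the chance of clashing with a cache_dir name
    return name;
}
```

For a "make check" test case on a path-based OS, passing "." as baseDir keeps the segment next to TESTDIR instead of under a PREFIX-based path that may not exist.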
Andrew Beverley [Mon, 24 Oct 2011 02:35:47 +0000 (20:35 -0600)]
ext_session_acl: version 1.2
This patch makes the following changes to the session helper:
- Removes support for Berkeley DB 1.85
- Adds support for the current Berkeley DB (db.h) 4.x or later
- Adds support for a DB environment (if a directory is specified as
the path then an environment is created). This gives better
synchronisation within multiple processes
- Fixes a bug with active mode where LOGIN/LOGOUT did not write to the DB
This patch finishes the conversion of ServerStateData into AsyncJob by properly
implementing the doneAll() method and by removing calls to deleteThis() or
replacing with mustStop() calls as appropriate.
Adaptation::AccessCheck is modified to schedule an AsyncJobCall when
the access check finishes.
The ServerStateData and ClientHttpRequest classes are modified to work with the new
Adaptation::AccessCheck.
Alex Rousskov [Mon, 24 Oct 2011 02:29:21 +0000 (20:29 -0600)]
Account for max-swap-rate in swap-timeout handling for Rock.
Current swap-timeout code does not know about max-swap-rate. It
simply finds the longest-waiting I/O in disker queues (incoming and
outgoing) and then assumes that the new I/O will wait at least that
long. The assumption is likely to be wrong when the queue contains
lots of freshly queued requests to disker: Those requests have not
waited long yet, but a max-swap-rate limit will slow them down
shortly.
The patch changes the swap-timeout code to account for max-swap-rate
when dealing with the workers-to-disker queue: If there are N requests
pending, the new one will wait at least N/max-swap-rate seconds. Also
expected wait time is adjusted based on the queue "balance" member, in
case we have been borrowing time against future I/O already.
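The arithmetic can be sketched as follows. The function names, the millisecond units, and the way the rate-based estimate is combined with the older "longest waiting I/O" estimate (taking the larger of the two) are assumptions, not the actual Rock code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical estimate, in milliseconds, of how long a new request will
// wait in the workers-to-disker queue.
//   pending       - requests already queued (N)
//   maxSwapRate   - configured max-swap-rate, in swaps per second
//   balanceMs     - time already borrowed against future I/O
//   longestWaitMs - age of the longest-waiting queued I/O (the old estimate)
double expectedWaitMs(int pending, int maxSwapRate, double balanceMs, double longestWaitMs)
{
    assert(pending >= 0 && maxSwapRate > 0);
    const double rateWaitMs = 1000.0 * pending / maxSwapRate + balanceMs;
    return std::max(rateWaitMs, longestWaitMs);
}

// A swap is only worth starting if it is expected to beat swap-timeout.
bool worthSwapping(double expectedMs, double swapTimeoutMs)
{
    return expectedMs <= swapTimeoutMs;
}
```

For example, 50 freshly queued requests under max-swap-rate=100 imply a wait of at least half a second, even though none of those requests has waited long yet.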
This patch:
- converts type of the Token::[width|precision] members from "unsigned int" to "int"
- renames the Token::[width|precision] members to Token::[widthMin/widthMax]
- removes unneeded typecasts
Alex Rousskov [Thu, 13 Oct 2011 04:41:46 +0000 (22:41 -0600)]
Allow non-shared memory caching when there are no cache_dirs.
Before this change, we destroyed unused/idle StoreEntries if nobody was voting
to keep them in store_table. That blocked non-shared memory cache from getting
new entries if there were no cache_dirs to vote for them, which is wrong. The
new code keeps unused/idle StoreEntries in store_table if nobody objects.
Amos Jeffries [Mon, 10 Oct 2011 08:30:48 +0000 (02:30 -0600)]
Add directive dns_v4_first to make IPv4 connections before IPv6 is tried.
Default off, to prefer the faster protocol.
The use-case for this is networks which are IPv6-enabled but stuck
behind slow tunnels and whose upstream is not supporting full transit
services over IP.
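The effect of the directive on connection order can be sketched as below. orderForConnect() is a hypothetical helper, and telling IPv6 from IPv4 by looking for a colon in the textual address is a shortcut for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the destination addresses in the order
// they would be tried. With v4First set, IPv4 addresses move to the front
// while the relative order within each family is preserved.
std::vector<std::string> orderForConnect(std::vector<std::string> addrs, bool v4First)
{
    if (v4First) {
        std::stable_partition(addrs.begin(), addrs.end(),
            [](const std::string &a) { return a.find(':') == std::string::npos; });
    }
    return addrs;
}
```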
Henrik Nordstrom [Mon, 10 Oct 2011 08:22:34 +0000 (02:22 -0600)]
Properly parse HTTP list headers with embedded 8-bit characters
MSIE and maybe other browsers sometimes send 8-bit high characters
in HTTP headers (and URLs). These were mistakenly read as CTL characters
on platforms with a signed char type (e.g., x86).
One visible effect of this was that HTTP Digest authentication failed
in MSIE when following a link with embedded 8-bit or UTF-8 characters.
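The class of bug is easy to reproduce in isolation. The sketch below is not the Squid parser: both function names are hypothetical, and the buggy variant takes a signed char explicitly to model platforms where plain char is signed.

```cpp
#include <cassert>

// Models the bug: when the character type is signed, bytes 0x80..0xFF are
// negative and therefore compare as "less than 32".
bool looksLikeCtlBuggy(signed char c)
{
    return c < 32 || c == 127;
}

// Converting to unsigned char first gives the intended 0..255 range.
bool isCtl(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return u < 32 || u == 127;
}
```

With the conversion in place, a byte such as 0xE9 is treated as an ordinary character, while TAB and DEL are still recognized as CTL.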
Currently, we lock every file that the CertificateDB code is going to access,
even when that is not really needed because a more general lock is held.
This patch:
- Replaces the old FileLocker class with the Lock/Locker pair of classes
- Replaces most of the locks in CertificateDB with only two locks: one
for the main database and one for the file containing the
current serial number.
Bug 3349: Bad support for adaptation service URIs with '='
Currently using URIs which include "=" is not supported by
ecap_service/icap_service configuration parameters. Also the squid.conf
documentation about ecap_service is outdated.
This patch:
- Fixes the [e|i]cap_service line parser to allow URIs with "="
- Changes the [e|i]cap_service configuration parameter to use the following syntax:
ecap_service id vectoring_point uri [name=value ...]
- Checks for duplicated options
- Fixes the related documentation
- Still supports the older [e|i]cap_service syntax forms:
ecap_service id vectoring_point [1|0] uri
ecap_service id vectoring_point [name=value ...] uri
- Supports the "uri" option, which is not documented.
Polished SMP caching code, primarily to stay out of the way in non-SMP mode.
Do not start useless diskers. Do not assume Rock cache_dirs are present.
Do not require IpcIo DiskIO module to build Rock store.
Check IPC I/O pages limits in Rock store only when using a disker.
Warn about Rock cache_dir disk space waste.
Warn if shared memory cache is enabled in non-SMP mode.
Fake shared memory segments if needed (e.g., we are using Rock cache_dirs with
no POSIX shared memory support) and possible (e.g., no SMP).
Alex Rousskov [Sat, 8 Oct 2011 08:14:51 +0000 (02:14 -0600)]
Added max-swap-rate=swaps/sec option to Rock cache_dir.
The option limits the rate of Rock disk access to smooth out OS disk commit
activity and to avoid blocking Rock diskers (or even other processes) on I/O.
Should be used when swap demand exceeds disk performance limits but the
underlying file system does not slow down incoming I/Os until the situation
gets out of control.
Bug 3190: Large HTTP POST stuck after early ICAP 400 error response
When an ICAP REQMOD service responds with an error to
(or the REQMOD transaction aborts while processing) a large HTTP
request, the HTTP request may get stuck because the request body
buffer gets full and nobody consumes the no-longer-needed content.
The ICAP code quits but leaves the body buffer intact in case the
client-side code wants to bypass the error. After that, nobody consumes
the request body because the buggy client side does not inform the body
pipe that there will be no other consumers, which would have triggered
a noteBodyConsumerAborted() callback and enabled auto-consumption or closed
the client connection.
While the server writes the response to Store, the client side may
synchronously abort the entry. This happens, for example, when the
server receives a 304 response and handleIMSReply calls
sendClientOldEntry, which calls storeUnregister with our entry,
resulting in CheckQuickAbort.
Once server store write returns, if the server is done, it calls
FwdState::completed(). At that time, the server does not know that (and
should not care whether) the entry was aborted. Thus, we need to handle
aborted entries in FwdState::completed.
Remove SwapDir::reconfigure() arguments since they are not used.
Before the change, SwapDir::reconfigure() took index and path
arguments, but neither was actually used: neither index nor path
can be changed during reconfigure. And both index and path are
available as SwapDir members so there is no reason to have these
arguments.
Ignore and warn about attempts to reconfigure static Rock store options.
Some Rock store options cannot be changed dynamically: path, size, and
max-size. Before the change, there were no checks during reconfigure
to prevent changing these options. This may lead to Rock cache
corruption and other bugs. The patch adds necessary checks to Rock
store code. If the user tries to change an option that cannot be updated
dynamically, a warning is reported and the value is left unchanged.
Fix: "(ssl_crtd): Cannot add certificate to db" when updating expired cert
When ssl_crtd helper needs to add a fresh certificate to the database but
finds an expired certificate already stored, ssl_crtd deletes the expired
certificate file from disk before adding the fresh one. However, the addition
still fails because the expired certificate was not removed from database
indexes.
This fix:
- Adds code to update database indexes upon deletion of a row.
- Polishes certificates deletion code to avoid duplication.
TODO: Report failure details to Squid and make certificate-specific failures
not fatal for the ssl_crtd helper.
Alex Rousskov [Mon, 3 Oct 2011 11:07:46 +0000 (05:07 -0600)]
Temporary fix: Avoid killing Coordinator with unregistered cache mgr actions
that cause isOpen() assertions.
If a worker forwards a cache manager request to Coordinator and Coordinator
does not have that action registered, CacheManager::createRequestedAction()
throws (as it should) and Mgr::Request cleanup asserts when its half-baked
connection tries to close a not-yet-imported socket descriptor.
This workaround catches the exception, reports it, and manually closes the
socket descriptor. It also prevents an ACK response from being sent to the
worker, which triggers a worker timeout.
Mid-term TODO: Coordinator should register all actions that are known to kids.
Should Coordinator respond with an error instead of relying on a timeout?
Long-term TODO: Consider an API where cache manager responses can be
aggregated and formatted by Coordinator without knowing action-specific
details. After all, there are not so many types of action information (size,
count, rate, etc.) and most actions have simple reporting formats. Currently,
it is awkward to guarantee that Coordinator and all workers know all actions,
especially when some actions may be specific to non-worker kids such as
Coordinator and diskers.
Alex Rousskov [Thu, 22 Sep 2011 03:42:11 +0000 (21:42 -0600)]
SMP Caching: Core changes, IPC primitives, Shared memory cache, and Rock Store
Core changes
------------
* Added MemObject::expectedReplySize() and used it instead of object_sz.
When deciding whether an object with a known content length can be
swapped out, do not wait until the object is completely received and its
size (mem_obj->object_sz) becomes known (while asking the store to
recheck in vain with every incoming chunk). Instead, use the known
content length, if any, to make the decision.
This optimizes the common case where the complete object is eventually
received and swapped out, preventing accumulating potentially large
objects in RAM while waiting for the end of the response. Should not
affect objects with unknown content length.
Side-effect1: probably fixes several cases of unknowingly using negative
(unknown) mem_obj->object_sz in calculations. I added a few assertions
to double check some of the remaining object_sz/objectLen() uses.
Side-effect2: When expectedReplySize() is stored on disk as StoreEntry
metadata, it may help to detect truncated entries when the writer
process dies before completing the swapout.
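The idea behind expectedReplySize() can be sketched as follows. MemObjectSketch and mayFit() are hypothetical and much simpler than the real MemObject; the sketch only shows how a known content length allows an early decision.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for MemObject.
struct MemObjectSketch {
    std::int64_t object_sz = -1;     // total size; negative until fully received
    std::int64_t headerSize = 0;     // size of the reply headers
    std::int64_t contentLength = -1; // from Content-Length; negative if unknown

    // Best available size estimate, or -1 if nothing is known yet.
    std::int64_t expectedReplySize() const {
        if (object_sz >= 0)
            return object_sz; // complete object: exact size
        if (contentLength >= 0)
            return headerSize + contentLength; // still receiving
        return -1;
    }
};

// Decide as early as possible whether the object can be swapped out.
// An unknown size cannot rule the object out, so we keep waiting.
bool mayFit(const MemObjectSketch &m, std::int64_t maxObjectSize)
{
    const std::int64_t expected = m.expectedReplySize();
    return expected < 0 || expected <= maxObjectSize;
}
```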
* Removed mem->swapout.memnode in favor of mem->swapout.queue_offset.
The code used swapout.memnode pointer to keep track of the last page
that was swapped out. The code was semi-buggy because it could reset the
pointer to NULL if no new data came in before the call to doPages().
Perhaps the code relied on the assumption that the caller will never
doPages if there is no new data, but I am not sure that assumption was
correct in all cases (it could be that I broke the calling code, of course).
Moreover, the page pointer was kept without any protection from page
disappearing during asynchronous swapout. There were "Evil hack time"
comments discussing how the page might disappear.
Fortunately, we already have mem->swapout.queue_offset that can be fed
to getBlockContainingLocation to find the page that needs to be swapped
out. There is no need to keep the page pointer around. The
queue_offset-based math is the same so we are not adding any overheads
by using that offset (in fact, we are removing some minor computations).
* Added "close how?" parameter to storeClose() and friends.
The old code would follow the same path when closing swapout activity
for an aborted entry and when completing a perfectly healthy swapout. In
non-shared case, that could have been OK because the abort code would
then release the entry, removing any half-written entry from the index
and the disk (but I am not sure that release happened fast enough in
100% of cases).
When the index and disk storage is shared among workers, such
"temporary" inconsistencies result in truncated responses being
delivered by other workers to the user because once the swapout activity
is closed, other workers can start using the entry.
By adding the "close how?" parameter to closing methods we allow the
core and SwapDir-specific code to handle aborted swapouts appropriately.
Since swapin code is "read only", we do not currently distinguish
between aborted and fully satisfied readers: The readerGone enum value
applies to both cases. If needed, the SwapDir reading code can make that
distinction by analyzing how much was actually swapped in.
* Moved "can you store this entry?" code to virtual SwapDir::canStore().
The old code had some of the tests in SwapDir-specific canStore()
methods and some in storeDirSelect*() methods. This resulted in
inconsistencies, code duplication, and extra calculation overheads.
Making this call virtual allows individual cache_dir types to do custom
access controls.
The same method is used for cache_dir load reporting (if it returns
true). Load management needs more work, but the current code is no worse
than the old one in this aspect, and further improvements are outside
this change scope.
Moved common (and often rather complex!) code from store modules into
storeRebuildLoadEntry, storeRebuildParseEntry, and storeRebuildKeepEntry.
* Do not set object_sz when the entry is aborted because the true object
size (HTTP reply headers + body) is not known in this case. Setting
object_sz may fool client-side code into believing that the object is
complete.
This addresses an old complaint from RBC.
* When swapout initiation fails, mark swapout decision as
MemObject::SwapOut::swImpossible. This prevents the caller code from trying to
swap out again and again because swap_status becomes SWAPOUT_NONE.
TODO: Consider adding SWAPOUT_ERROR, STORE_ERROR, and similar states. It
may solve several problems where the code sees _NONE or _OK and thinks
everything is peachy when in fact there was an error.
* Call haveParsedReplyHeaders() before entry->replaceHttpReply().
HaveParsedReplyHeaders() sets the entry public key and various flags (at
least). ReplaceHttpReply() packs reply headers, starting swapout process.
It feels natural to adjust the entry _before_ we pack/swap it, but I may be
missing some side-effects here.
The change was necessary because we started calling checkCachable() from
swapoutPossible(). If haveParsedReplyHeaders() is not called before the swapout
checks, the entry will still have the private key and will be declared
impossible to cache.
* Extracted the write-to-store step from StoreEntry::replaceHttpReply().
This allows the caller to set the reply for the entry and then update the
entry and the reply before writing them to store. For example, the server-side
haveParsedReplyHeaders() code needs to set the entry timestamps and make the
entry key public before the entry starts swapping out, but the same code also
needs access to entry->getReply() and such for timestampsSet() and similar
code to work correctly.
TODO: Calls to StoreEntry::replaceHttpReply() do not have to be modified
because replaceHttpReply() does write by default. However, it is likely that
callers other than ServerStateData::setFinalReply() should take advantage of
the new split interface because they call timestampsSet() and such after
replaceHttpReply().
* Moved SwapDir::cur_size and n_disk_objects to specific SwapDirs. Removed
updateSize(). Some cache_dirs maintain their own maps and size statistics,
making the one-size-fits-all SwapDir members inappropriate.
* A new SwapDir public method swappedOut() added. It is called from
storeSwapOutFileClosed() to notify SwapDir that an object was swapped
out.
* Change SwapDir::max_size to bytes, make it protected, use maxSize() instead.
Change SwapDir::cur_size to bytes, make it private, use currentSize() instead.
Store Config.Store.avgObjectSize in bytes to avoid repeated and error-prone
KB<->bytes conversions.
* Change Config.cacheSwap.swapDirs and StoreEntry::store() type to SwapDir.
This allows using SwapDir API without dynamic_cast.
* Always call StoreEntry::abort() instead of setting ENTRY_ABORTED manually.
* Rely on entry->abort() side-effects if ENTRY_ABORTED was set.
* Added or updated comments to better document current code.
* Added operator << for dumping StoreEntry summary into the debugging
log. Needs more work to report more info (and not report yet-unknown info).
* Fixed blocking reads that were sometimes reading from random file offsets.
Core "disk file" reading code assumed that if the globally stored disk.offset
matches the desired offset, there is no reason to seek. This was probably done
to reduce seek overhead between consecutive reads. Unfortunately, the disk
writing code did not know about that optimization and left F->disk.offset
unchanged after writing.
This may have worked OK for UFS if it never writes to the file it reads from,
but it does not work for store modules that do both kinds of I/O at different
offsets of the same disk file.
Eventually, implement this optimization correctly or remove disk.offset.
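The failure mode can be modeled with an in-memory "file". SketchFile and its members are hypothetical; the sketch only shows why a cached offset must be updated by writes as well as reads if the "skip the seek" shortcut is to stay correct.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory "disk file" with a cached position, mimicking the
// "skip the seek if the offset already matches" optimization.
class SketchFile {
public:
    explicit SketchFile(const std::string &bytes) : data(bytes) {}

    std::string readAt(std::size_t offset, std::size_t len) {
        if (cachedOffset != offset)
            seek(offset); // only seek when we believe we must
        const std::string out = data.substr(pos, len);
        pos += out.size();
        cachedOffset = pos;
        return out;
    }

    void writeAt(std::size_t offset, const std::string &bytes) {
        seek(offset);
        data.replace(pos, bytes.size(), bytes);
        pos += bytes.size();
        cachedOffset = pos; // without this line, later reads may skip a needed seek
    }

    int seeks = 0; // number of seeks performed so far

private:
    void seek(std::size_t offset) {
        assert(offset <= data.size());
        pos = offset;
        cachedOffset = offset;
        ++seeks;
    }

    std::string data;
    std::size_t pos = 0;          // real file position
    std::size_t cachedOffset = 0; // where the code believes the position is
};
```

If writeAt() left cachedOffset untouched, a read at the previously cached offset would skip the seek and return bytes from wherever the write had left the real position.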
IPC primitives
--------------
To make SMP disk and memory caching non-blocking and correct, worker and
disker processes must asynchronously communicate with each other. We are
adding a collection of classes that support such communication.
At the base of the collection is the AtomicWordT template that uses GCC atomic
primitives such as __sync_add_and_fetch() to perform atomic operations on
integral values in memory shared by multiple Squid kids. AtomicWordT is used
to implement non-blocking shared locks, queues, store tables, and page pools.
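A minimal sketch of such a wrapper is shown below. It is not the AtomicWordT source: the class name and its methods are assumptions, and the code relies on the GCC __sync builtins, so it needs GCC or a compatible compiler.

```cpp
#include <cassert>

// Minimal sketch of an atomic word built on GCC's __sync builtins.
// In real use, the object would live in memory shared by several processes.
template <class ValueType>
class AtomicWordSketch {
public:
    explicit AtomicWordSketch(ValueType v = 0) : value(v) {}

    // Both operators return the new value.
    ValueType operator+=(ValueType delta) { return __sync_add_and_fetch(&value, delta); }
    ValueType operator-=(ValueType delta) { return __sync_sub_and_fetch(&value, delta); }

    // Sets the value to `desired` only if it still equals `expected`.
    bool swapIf(ValueType expected, ValueType desired) {
        return __sync_bool_compare_and_swap(&value, expected, desired);
    }

    // Atomic read implemented as "add zero and return the old value".
    ValueType get() const {
        return __sync_fetch_and_add(const_cast<ValueType *>(&value), 0);
    }

private:
    ValueType value;
};
```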
To avoid blocking or very long searches, many operations are "optimistic" in
nature. For example, it is possible that an atomic store map will refuse to
allocate an entry for two processes even though a blocking implementation
would have allowed one of the processes to get the map slot. We speculate that
such conflict resolution is better than blocking locks when it comes to
caching, especially if the conflicts are rare due to large number of cache
entries, fast operations, and relatively small number of kids.
TODO: Eventually, consider breaking locks left by dead kids.
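A much simplified flavor of such optimistic behavior is sketched below. OptimisticMapSketch is hypothetical and is not Ipc::StoreMap: it shows only that a claim either succeeds immediately or is refused, without blocking or searching for another slot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size map whose slots are claimed optimistically.
// A slot state is 0 (free) or 1 (claimed by some process).
template <std::size_t Capacity>
class OptimisticMapSketch {
public:
    OptimisticMapSketch() {
        for (std::size_t i = 0; i < Capacity; ++i)
            state[i] = 0;
    }

    // Tries to claim the single slot that `key` maps to. Never blocks: on a
    // conflict, the caller simply does not cache the entry.
    bool tryClaim(std::size_t key) {
        return __sync_bool_compare_and_swap(&state[key % Capacity], 0, 1);
    }

    void release(std::size_t key) {
        __sync_bool_compare_and_swap(&state[key % Capacity], 1, 0);
    }

private:
    int state[Capacity];
};
```

A refused claim costs one cache miss later, which is the trade the text above accepts in exchange for never waiting on a lock.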
The shared memory cache keeps its own compact index of cached entries using
extended Ipc::StoreMap class (MemStoreMap). The cache also strives to keep its
Root.get() results out of the store_table except during transit.
Eventually, the non-shared/local memory cache should also be implemented
using a MemStore-like class, I think. This will allow us to clearly isolate
local from shared memory cache code.
Allow the user to explicitly disable shared memory caching in SMP mode via
the memory_cache_shared directive in squid.conf. Report whether mem_cache is shared.
Disable shared memory caching by default if atomic operations are not
supported. Prohibit shared memory caching if atomic operations are not
supported.
TODO: Better limits/separation for cache and I/O shared memory pages.
Eventually, support shared memory caching of multi-page entries.
Rock Store
----------
Rock Store uses a single [large] database-style file per cache_dir to store
cached responses and metadata. This part of the design is similar to COSS.
Rock Store does not maintain or rely on swap.state "log" for recovery.
Instead, the database is scanned in the background to load entries when Squid
starts. Rock Store maintains its own index of cached entries and avoids global
store_table. All entries must be max-size or smaller.
In SMP mode, each Rock cache_dir is given a dedicated kid process called
"disker". All SMP workers communicate with diskers to store misses and load
hits, using shared memory pages and atomic shared memory queues. A disker blocks
when doing disk I/O but workers do not. Any Diskers:Workers ratio is supported
so that the user can find and configure the optimal number of workers and
diskers for a given number of disks and CPU cores.
In non-SMP mode, Rock Store should use good old blocking disk I/O, without any
diskers, but this has not been tested recently and probably needs more work.
Alex Rousskov [Fri, 16 Sep 2011 09:27:32 +0000 (03:27 -0600)]
Do not let cache manager requests kill SMP Squid using isOpen() assertion.
As the comment above the close call implies, we have not imported the foreign
socket descriptor into our fd_table yet. We must use raw close(2), just like
the corresponding Mgr::Request::Request(msg) code that allocates request.conn
uses raw assignment to give that half-baked connection a descriptor.
TODO: This direct manipulation of Connection::fd is ugly, and this half-baked
connection will most likely cause more [hidden] problems down the road. For
example, Mgr::Request destructor will assert in a similar way if the request
object is destroyed before Action::respond() is called.
Adjust format code %la for intercepted connections
This patch adjusts the %la logformat code handling for intercepted connections
based on the following rules:
- If the corresponding http_port or https_port option has an explicit
listening host name or IP address, then log the IP address.
- Otherwise, log a dash character.
Also adjusts %lp logformat code handling for intercepted connections to always
log the port number from the corresponding http_port or https_port option.
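The rules above can be sketched as follows. This is an illustration under
stated assumptions, not Squid's logging code: PortConfig, formatLa(), and
formatLp() are hypothetical names, and the configured host name is assumed to
have been resolved to an IP address already.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of one http_port/https_port line (illustration only).
struct PortConfig {
    std::string explicitAddress; // empty when no listening host/IP was given
    unsigned short port = 0;     // port number from the same directive
};

// %la for an intercepted connection: the configured IP address or a dash.
inline std::string formatLa(const PortConfig &cfg)
{
    return cfg.explicitAddress.empty() ? std::string("-") : cfg.explicitAddress;
}

// %lp for an intercepted connection: always the configured port number.
inline std::string formatLp(const PortConfig &cfg)
{
    return std::to_string(cfg.port);
}
```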
Amos comments about %la formatting code:
For the record these are the permutations we seek to cover...
Scenario 1: client 192.168.0.3 connects to google (74.125.237.81). Gets intercepted into Squid.
Alex Rousskov [Wed, 14 Sep 2011 06:30:55 +0000 (00:30 -0600)]
Fixed max-stale check. Entities not exceeding max-stale were marked as stale.
Since the fixed check is performed for entities already suspected of being
stale by refreshCheck(), it is difficult to describe exactly which entities
were affected by the bug. A rough description would be "entities which would
otherwise qualify for a FRESH_OVERRIDE_EXPIRES or FRESH_OVERRIDE_LASTMOD
exception located below the fixed check".
Other concerns about staleness checks have been discussed on squid-dev's
"max_stale broken?" email thread.
Alex Rousskov [Wed, 14 Sep 2011 06:28:54 +0000 (00:28 -0600)]
Add RunnersRegistry, an API to register and, later, run a group of actions.
Useful for keeping general initialization/cleanup management code (e.g.,
main.cc) independent from specific initialization/cleanup code (e.g.,
Store file systems or memory cache) during staged initialization and
cleaning.
Designed with Rock Store needs in mind. Currently unused. Should eventually be
used for most module initialization and cleanup, removing the main.cc
dependency on those modules and perfecting [de]initialization order.
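A minimal sketch of the register-then-run idea, assuming a hypothetical
ActionRegistry class. The names and signatures below are illustrative and are
not the RunnersRegistry API added by this commit:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry; names and signatures are illustrative only.
class ActionRegistry {
public:
    using Action = std::function<void()>;

    // Remember an action under a named group (e.g., "init" or "cleanup").
    void add(const std::string &group, Action action) {
        groups_[group].push_back(std::move(action));
    }

    // Run all actions of a group in registration order.
    // Returns the number of actions that ran.
    std::size_t run(const std::string &group) const {
        const auto it = groups_.find(group);
        if (it == groups_.end())
            return 0;
        for (const auto &action : it->second)
            action();
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<Action>> groups_;
};
```

With such a registry, general code only needs to call run() for each stage,
while each module registers its own actions and stays unknown to the caller.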
The %I code must print the server IP address, but it currently displays the
host name of the server on Squid error pages. This patch fixes %I to print the
server IP address when it is available, or "[unknown]" otherwise.
Alex Rousskov [Fri, 9 Sep 2011 10:15:37 +0000 (04:15 -0600)]
Temporarily fixed core dumps when isOpen() is called during shutdown cleanup.
For a permanent fix, we need to avoid deleting fd_table while it is still
in use by others, such as DeferredReads, possibly by allowing event loop
to run during shutdown.
Alex Rousskov [Fri, 9 Sep 2011 10:14:48 +0000 (04:14 -0600)]
Docs: Adjusted cf.data.pre to reflect cross-vectoring point adaptation plan support.
These documentation changes were somehow missed from the r11327 commit that
introduced support for dynamic adaptation plans that cover multiple vectoring
points.
Alex Rousskov [Fri, 9 Sep 2011 10:13:14 +0000 (04:13 -0600)]
Support maximum field width for string access.log fields.
Some standard command-line and some log processing tools have trouble
handling URLs or other logged fields exceeding 8KB in length. Moreover,
Squid violates its own log line format and truncates the entire log line
if, for example, the URL is 8KB long. By supporting the .precision format
argument, we allow the administrator to specify the logged URL size and
avoid these problems.
Limiting logged field width has no effect on traffic on the wire, with
the exception of log records if they are sent over the network, of course.
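The printf(3) behavior that the option name borrows from can be sketched as
follows. limitField() is a hypothetical helper for illustration, not Squid's
logformat code. It relies only on the standard "%.*s" conversion, where the
precision caps the number of characters written and a negative precision
means no limit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a string field with a printf-style maximum
// width. A negative maxWidth leaves the field untruncated.
inline std::string limitField(const std::string &value, int maxWidth)
{
    // First pass: ask snprintf how many characters the result needs.
    const int len = std::snprintf(nullptr, 0, "%.*s", maxWidth, value.c_str());
    if (len <= 0)
        return std::string();

    // Second pass: write the (possibly truncated) field into the string.
    std::string out(static_cast<std::size_t>(len), '\0');
    std::snprintf(&out[0], out.size() + 1, "%.*s", maxWidth, value.c_str());
    return out;
}
```

Only the logged copy of the field is shortened here; the original value is
left untouched.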
TODO: The name comes from the printf(3) "precision" format part. It may
be a good idea to rename our "precision" into max_width or similar,
especially if we do not support floating point precision logging.
TODO: Old code used chars to store user-configured field width and
precision. That does not work for URLs, headers, and other entries
longer than 256 characters. This patch changes the storage type to int.
The code should probably be polished further to remove unsigned->signed
conversions.