Amos Jeffries [Fri, 11 Mar 2011 15:11:11 +0000 (08:11 -0700)]
Expand Makefile sources macros
Expand several macros used in earlier attempts to optimize the Makefile
content. With the SourceLayout and modular changes underway these are
proving to be more of a problem than they are worth.
At some future time when the convenience libraries are settled it may be
worth revisiting some shared lists. But not yet.
Apply uri_whitespace before logging malformed requests
This patch tries to implement the first option from those described in the
squid-dev thread with the subject "Request URI logging for malformed requests":
http://www.squid-cache.org/mail-archive/squid-dev/201101/0004.html
Currently the logged URI is set using the setLogUri method (in the
client_side.cc and client_side_reply.cc files). Also, setLogUri is called at
least twice for every normal request. Moreover, setLogUri always checks
whether the URI contains characters that must be escaped, which is not needed
for normal requests because urlCanonicalClean is always applied before the
URI is passed to setLogUri.
This patch:
- Add a parameter to setLogUri saying whether the URI must be cleaned and
the uri_whitespace filtering applied.
- Remove the setLogUri call from parseHttpRequest.
- Call setLogUri in clientProcessRequest in all cases (HTTP request error
or not).
- When the URL is not a valid HTTP request, apply the uri_whitespace
filtering.
- When the URI is valid, the uri_whitespace filtering is not required
because it is already applied by the urlParse function.
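The flag-based approach above can be sketched roughly as follows. This is an illustrative assumption, not Squid's actual code: the policy names, `filterWhitespace` helper, and function shapes are invented for the sketch, mirroring the idea of the uri_whitespace directive and the new cleaning flag on setLogUri.

```cpp
#include <cassert>
#include <string>

// Hypothetical policies mirroring squid.conf's uri_whitespace directive.
enum UriWhitespace { wsStrip, wsEncode, wsChop };

// Apply a whitespace policy to a raw URI (sketch only).
std::string filterWhitespace(const std::string &uri, UriWhitespace policy) {
    std::string out;
    for (char c : uri) {
        if (c == ' ' || c == '\t') {
            switch (policy) {
            case wsStrip:  continue;            // drop the character
            case wsEncode: out += "%20"; break; // percent-encode it
            case wsChop:   return out;          // truncate at first whitespace
            }
        } else {
            out += c;
        }
    }
    return out;
}

// setLogUri with the new flag: clean only when the caller asks for it,
// i.e. for malformed requests that never went through urlParse().
std::string setLogUri(const std::string &uri, bool cleanUrl,
                      UriWhitespace policy) {
    return cleanUrl ? filterWhitespace(uri, policy) : uri;
}
```

Normal requests pass cleanUrl=false (their URI is already canonical); only the error path pays for the extra scan.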
Alex Rousskov [Wed, 9 Mar 2011 19:26:18 +0000 (12:26 -0700)]
Support libecap v0.2.0; fixed eCAP body handling and logging.
Summary of changes:
libecap v0.2.0 support: accept/update/log eCAP transaction meta-info.
libecap v0.2.0 support: supply client IP and username to eCAP adapter.
libecap v0.1.0 support: Support blockVirgin() API with ERR_ACCESS_DENIED.
Use pkg-config's PKG_CHECK_MODULES to check for and link with libecap.
Support adapter-specific parameters as a part of ecap_service configuration.
Allow uri=value parameter when specifying adaptation service URIs.
Fixed virgin body handling in our eCAP transaction wrapper (Ecap::XactionRep).
Fixed BodyPipe.cc:144 "!theConsumer" assertion.
Log "important" messages from eCAP adapters with DBG_IMPORTANT not DBG_DATA!
Added XXXs to identify old unrelated problems to be fixed separately.
Alex Rousskov [Wed, 9 Mar 2011 17:52:15 +0000 (10:52 -0700)]
Give full Request-URI to eCAP adapters.
Implement libecap::RequestLine::uri() to return full Request-URI instead
of URL path.
Neither the full URL nor the URL path is perfect because the actual request
may have carried a full URI or just a path, but Squid does not really keep
that information. This change makes our eCAP implementation consistent with
our ICAP implementation.
Eventually, eCAP may have an API that is guaranteed to return full
Request-URI and Squid may remember what kind of URI it got in the virgin
request, allowing for a more truthful implementation.
Alex Rousskov [Wed, 9 Mar 2011 17:44:55 +0000 (10:44 -0700)]
Bug 2621: Provide request headers to RESPMOD when using cache_peer.
A short-term fix.
When FwdServer::_peer is set, the HttpStateData constructor creates a new
special HttpRequest, overwriting the request pointer that the parent
(ServerStateData) constructor set to fwd->request.
To make matters worse, this special peer request has no headers at all (even
though flags and some cached/computed header values are copied). We initialize
it with the right URL, method, and protocol. We copy flags and a few other
random properties from the original request. We never copy the original
headers.
Furthermore, regardless of the peering, when we create the headers to send to
the next hop, those headers are temporary and not stored in any request
structure (see HttpStateData::buildRequestPrefix). The non-peering code
survives this because the request member points to fwd->request, which has the
headers. The peering code fails as illustrated by this bug.
I believe both cases are buggy because server-side adaptation and core code
should have access to the request headers we sent rather than the request
headers we received and adapted (or no headers at all). After all, it is the
sent headers that determine the next hop view of our Squid and adaptation
services should see a pair of _matching_ request and response headers.
I am pretty sure there are other bugs related to HttpStateData using a special
peer request structure instead of fwd->request. Please note that FwdState has
no idea that this substitution is going on.
This quick short-term fix uses the original request and its headers when
checking RESPMOD ACLs. This is what the patch in bug #2562 did for Squid v3.0.
For the reasons described above, this patch may be either insufficient or
wrong for the long-term fix.
Alex Rousskov [Wed, 9 Mar 2011 17:39:02 +0000 (10:39 -0700)]
Import external libecap in adaptation/ecap/, where it belongs.
Either we finally found the working combination of libtool variables or the
previous attempts were failing due to libtool confusion over leftovers in
the build directory (or we will discover more build failures later!).
Polished libecap-related Makefile variable names based on squid-dev review.
Amos Jeffries [Sat, 5 Mar 2011 06:00:08 +0000 (23:00 -0700)]
Make DNS report failure on all packet construction errors
The attached patch alters the DNS lookup behaviour to abort with an error
in ALL cases where the rfc1035 library generates an error (negative result).
I'm not sure there is any noticeable effect other than better code. In the
old code, the error case *should* be picked up on the initial packet
construction rather than on the repeat packet. This may have been incorrect
given that the packet type changes between A and AAAA.
Amos Jeffries [Sat, 5 Mar 2011 02:00:32 +0000 (19:00 -0700)]
Bug 2976: invalid URL on intercepted requests during reconfigure
Listening ports abuse the cbdata type as a pseudo refcount. This breaks
during reconfigure when the config is erased and the active request
handles all become invalid pointers.
Interception only works for the HTTP protocol. We can hard-code the scheme
and avoid this problem until a complete fix is written.
Add support for parameterized Cache Manager queries.
Currently, one sends mgr queries to the "whole" Squid. Kid responses may get
aggregated by Coordinator, and we, in general, want to aggregate all responses
that can be aggregated.
This patch allows us to aggregate stats for a subset of kids. For example,
the following query aggregates stats for just the first and the third workers:
mgr:info?workers=1,3
When query response information cannot be aggregated (or at least is not
aggregated right now), then a parameterized query will result in several
matching "byKid { ..." blocks.
This patch supports the following scope variants:
* raw interface with access to any kid process or groups of kids; similar
to ${process_number} macro we already support in squid.conf:
mgr:foo?processes=id,id,id...
* higher-level interface to isolate workers by their numbers, starting
with 1 for the first worker:
mgr:foo?workers=num,num,num...
Currently, all kids except Coordinator are workers, but that will change in
the future as we get more kinds of kids.
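Extracting the scope parameter from such a query can be sketched as below. This is a hypothetical helper written for illustration; Squid's real parser for these queries is not shown in this log, and the `parseWorkers` name is an assumption.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Parse the worker numbers out of a query like "mgr:info?workers=1,3".
// Illustrative sketch, not Squid's actual cache manager code.
std::set<int> parseWorkers(const std::string &query) {
    std::set<int> workers;
    const std::string key = "workers=";
    const auto pos = query.find(key);
    if (pos == std::string::npos)
        return workers; // no scope parameter: query targets the whole Squid
    std::istringstream list(query.substr(pos + key.size()));
    std::string num;
    while (std::getline(list, num, ','))
        workers.insert(std::stoi(num)); // worker numbers start at 1
    return workers;
}
```

An empty result would mean "aggregate across all kids", matching the existing unparameterized behaviour.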
Amos Jeffries [Wed, 2 Mar 2011 07:27:24 +0000 (00:27 -0700)]
SourceLayout: protocol_t upgrade
This begins the libanyp.la SourceLayout changes by moving the protocol_t
type code to stand-alone files inside its namespace.
For the most part there are no behaviour changes. The boundaries between
the two semi-related types of protocol and protocol scheme are now clear:
* URLScheme is to be used where the protocol name is related to a URI
scheme description.
* AnyP::ProtocolType is to be used for other non-URI places requiring
the protocol to be named or manipulated as a concept.
Textual representations of these two concepts differ, and the output of the
two types likewise differs to accommodate that. Abusing them will result
in visibly unusual output.
Amos Jeffries [Fri, 25 Feb 2011 03:38:04 +0000 (16:38 +1300)]
Move EUI and NAT results into Comm::Connection
This makes NAT lookup use Comm::Connection as both the data source and the
storage location for results. The net result is that the active
Comm::Connection object stores accurate data regardless of NAT.
Also moves the EUI lookup results into Comm::Connection.
Removed ConnStateData fields which are available via its Comm::Connection.
Removed HttpRequest fields which are available via its ConnStateData Comm::Connection.
Also fixes several build issues uncovered during the transition.
Amos Jeffries [Thu, 24 Feb 2011 13:52:27 +0000 (02:52 +1300)]
Display correct information on dstdomain clashes
Abort with an error when a wildcard entry is going to be discarded
because of a sub-domain entry, and also whenever there is a mixup
between a domain and its sub-domain wildcard alternative.
Raise a non-fatal warning when a useless subdomain entry is being
discarded and its super-set wildcard kept. Care is taken to present
the singular subdomain for possible removal and keep the wildcard.
Amos Jeffries [Tue, 22 Feb 2011 23:39:33 +0000 (12:39 +1300)]
Fix for revno11239. Perform detection where it is not going to be erased
The kerberos library detection is wrapped inside a state block which erases
the LIBS results and flags found by AC_SEARCH_LIBS.
Do the search outside that block.
Alex Rousskov [Tue, 22 Feb 2011 23:13:13 +0000 (16:13 -0700)]
Fixed blocking reads that were sometimes reading from random file offsets.
Core "disk file" reading code assumed that if the globally stored disk.offset
matches the desired offset, there is no reason to seek. This was probably done
to reduce seek overhead between consecutive reads. Unfortunately, the disk
writing code did not know about that optimization and left F->disk.offset
unchanged after writing.
This may have worked OK for UFS if it never writes to the file it reads from,
but it does not work for store modules that do both kinds of I/O at different
offsets of the same disk file.
TODO: Implement this optimization correctly or remove disk.offset.
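The fixed bookkeeping described above can be sketched as follows: both the read and the write side must update the cached offset, or the "skip the seek" optimization later reads from the wrong place. The `DiskFile` struct and method names are illustrative assumptions, not Squid's actual DiskIO classes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Sketch of offset caching for blocking disk I/O (illustrative only).
struct DiskFile {
    FILE *fp;
    long cachedOffset = 0;

    size_t readAt(long offset, void *buf, size_t len) {
        if (offset != cachedOffset)      // seek only when actually needed
            fseek(fp, offset, SEEK_SET);
        const size_t n = fread(buf, 1, len, fp);
        cachedOffset = offset + (long)n; // keep the cache honest
        return n;
    }

    size_t writeAt(long offset, const void *buf, size_t len) {
        if (offset != cachedOffset)
            fseek(fp, offset, SEEK_SET);
        const size_t n = fwrite(buf, 1, len, fp);
        fflush(fp);                      // stdio requires this before a read
        cachedOffset = offset + (long)n; // the step the old writing code skipped
        return n;
    }
};
```

With the old behaviour (writeAt leaving cachedOffset unchanged), a read at the stale cached offset would skip the seek and consume bytes from wherever the write left the file position.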
Alex Rousskov [Mon, 21 Feb 2011 04:31:06 +0000 (21:31 -0700)]
Code cleanup: Proper assignment and copying of HttpHeader.
Besides being the Right Thing, having correct assignment operator and
copy constructor helps classes that have HttpHeader data members to
avoid defining explicit assignment operators and copy constructors.
Also adds forgotten reset of "len" in the clean() method.
Other polishing touches. HttpHeader::reset() is now a tiny bit faster.
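A minimal rule-of-three sketch of the idea: a header container that owns its entry list and a cached length needs a copy constructor and assignment operator that deep-copy together, and a clean() that also resets the length. The `Header` class here is an illustration under those assumptions, not Squid's actual HttpHeader.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a header container with a cached total length.
class Header {
public:
    Header() = default;
    Header(const Header &other) { update(other); }
    Header &operator=(const Header &other) {
        if (this != &other) {
            clean();        // drop our entries first
            update(other);  // then deep-copy the source's
        }
        return *this;
    }
    void add(const std::string &name, const std::string &value) {
        entries.push_back(name + ": " + value);
        len += entries.back().size();
    }
    void clean() {
        entries.clear();
        len = 0;            // the forgotten "len" reset mentioned above
    }
    size_t length() const { return len; }
private:
    void update(const Header &other) {
        entries = other.entries;
        len = other.len;
    }
    std::vector<std::string> entries;
    size_t len = 0;
};
```

With these defined correctly, classes holding a Header member get working copies for free instead of each defining their own.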
Adapter transaction options are now extracted and can be logged in HTTP
transaction log (access.log): Moved ICAP-specific logging-related adaptation
history features to the general adaptation history class and used them in eCAP
code.
Support adaptation_masterx_shared_names option for eCAP transactions. eCAP
transactions can now forward meta information to subsequent ICAP or eCAP
transactions (within the same master transaction scope).
Allow a routing eCAP service to determine the list of adaptation services to
be applied next to the same master transaction (same as the X-Next-Services
ICAP extension).
Identified a couple of %adapt::<last_h logging bugs and history maintenance
inconsistencies but left them alone for now.
Alex Rousskov [Thu, 17 Feb 2011 19:27:54 +0000 (12:27 -0700)]
libecap v0.2.0 options support: supply client IP and user name to eCAP.
Squid now uses libecap::Options API to send client IP and user name
meta-information to the eCAP adapter transaction, just like ICAP code
does when talking to an ICAP service transaction.
Renamed related icap_* options to their more general adaptation_*
equivalents because they now control both eCAP and ICAP behavior.
Old icap_* names are deprecated but still available.
Converted eCAP service configuration code to support the new Options
API, polished.
Alex Rousskov [Thu, 17 Feb 2011 19:11:06 +0000 (12:11 -0700)]
Fixed linking with libecap after PKG_CHECK_MODULES changes.
Previous commit (r11126) made sense and seemed to work fine until I
discovered that touching a source file in src/adaptation/ecap and
running "make" there leads to libtool's "file not found" errors for some
libtool-generated internal file. However, "make clean all" would still
work.
To better diagnose the issue, I renamed PKG_CHECK_MODULES prefix to
EXTLIBECAP and our ecap/libecap to ecap/libxecap. The difference in
names allowed us to distinguish external libecap-related names/failures
from internal ones, but it did not solve the problem.
Moving EXTLIBECAP_LIBS from adaptation/ecap/ to adaptation/ Makefile.am
helped. Older code included external libecap in adaptation/Makefile.am
as well, perhaps to avoid similar problems. It does not make sense to
put external libecap there instead of the ecap-specific directory, but
it works.
Amos Jeffries [Thu, 17 Feb 2011 14:57:33 +0000 (03:57 +1300)]
squidclient: send cachemgr password via -w option
Preparation for internal cachemgr updates to use real proxy-auth.
The cachemgr password may now be sent in three ways:
Deprecated: mgr:info@password
Current Option: -w password mgr:info
Preferred: -u username -w password mgr:info
The old explicit @ syntax is now deprecated for visible use. The background
systems will still send it that way for cache_object: URLs. Use of this
overrides any -w option set, so it is still possible to log in to a proxy
with one set of credentials and pass a separate password to the cachemgr.
The long-term plan is to drop @ completely in the future.
The current option of just -w will convert the password to @ syntax in the
background but not add Proxy-Authentication headers. This may die in future.
The preferred alternative is to use -u and -w which triggers addition of real
Proxy-Authenticate headers. The username is not yet used by cachemgr but
may be required by the proxy ACL configuration.
Alex Rousskov [Wed, 16 Feb 2011 17:57:09 +0000 (10:57 -0700)]
Use pkg-config's PKG_CHECK_MODULES to check for and link with libecap.
Side-effect: we can and do check whether Squid supports the installed libecap
package version. This check avoids accidentally building Squid with the wrong
libecap release. TODO: check whether the loaded eCAP adapter was built with a
supported libecap version as well.
This simplifies the header parser for basic auth, working towards a more
generalized model of AuthUser children. It removes two memory allocations,
two leaks, and several unnecessary functions.
Alex Rousskov [Tue, 15 Feb 2011 04:09:58 +0000 (21:09 -0700)]
* Removed old Rock Store hack to update swap_file_sz before it is packed into
the store entry disk "header". Prepend our own disk header to each db cell.
Eventually, the same header may be used to store "next page" or other info.
Modern Squid code should not use the packed swap_file_sz or object_sz because
neither may be known at packing time. The storage code should either
record its own size information (Rock Store does that now) or rely on
3rd-party metadata (UFS uses the fstat(2) system call).
* Do not promise to read an entry that is still being written. Use
peekAtReader() to check that we can read the presumably locked (by us) entry.
It is probably possible to queue a reader somehow and wait for the entry to
become readable or even to read initial pages as we write (when we start doing
incremental writing) but it would be rather complex and expensive to
synchronize such shared read/write access across workers.
* Fixed Rock maintenance of ::StoreIOState::offset_. The old code probably
misinterpreted the meaning of offset_ and kept the disk offset there. It
should instead have been just the distance from the beginning of the db cell
(not the db file). It probably worked because we write everything at once and
the offset_ values were usually big enough to pass "wrote everything" checks.
* Extracted storeRebuildParseEntry from storeRebuildLoadEntry.
This allows Rock Store to analyze and skip disk cell header before giving
the packed store entry info to the common core code.
* Report more disk stats. Needs more work, especially in SMP reporting case.
Map stats are only reported for tiny maps because scanning the entire map may
be too expensive for bigger maps (is it worth the overhead to maintain these
stats all the time instead of computing them from scratch on-demand?).
* Moved StoreEntryBasics into Rock namespace.
* Be more consistent in setting Rock::IoState's swap_filen, swap_dirn,
diskOffset, and payloadEnd.
* Renamed vague (and confusing after the header was added to the db cell)
entrySize to more specific payloadEnd.
* Synced with r11228 changes (making Store work with SMP-shared max-size
cache_dirs).
Alex Rousskov [Tue, 15 Feb 2011 04:02:28 +0000 (21:02 -0700)]
Changes revolving around making Store work with SMP-shared max-size cache_dirs:
* Added MemObject::expectedReplySize() and used it instead of object_sz.
When deciding whether an object with a known content length can be swapped
out, do not wait until the object is completely received and its size
(mem_obj->object_sz) becomes known (while asking the store to recheck in vain
with every incoming chunk). Instead, use the known content length, if any, to
make the decision.
This optimizes the common case where the complete object is eventually
received and swapped out, preventing accumulating potentially large objects in
RAM while waiting for the end of the response. Should not affect objects with
unknown content length.
Side-effect 1: probably fixes several cases of unknowingly using a negative
(unknown) mem_obj->object_sz in calculations. I added a few assertions to
double-check some of the remaining object_sz/objectLen() uses.
Side-effect 2: when expectedReplySize() is stored on disk as StoreEntry
metadata, it may help to detect truncated entries when the writer process dies
before completing the swapout.
* Removed mem->swapout.memnode in favor of mem->swapout.queue_offset.
The code used swapout.memnode pointer to keep track of the last page that was
swapped out. The code was semi-buggy because it could reset the pointer to
NULL if no new data came in before the call to doPages(). Perhaps the code
relied on the assumption that the caller will never doPages if there is no new
data, but I am not sure that assumption was correct in all cases (it could be
that I broke the calling code, of course).
Moreover, the page pointer was kept without any protection from page
disappearing during asynchronous swapout. There were "Evil hack time" comments
discussing how the page might disappear.
Fortunately, we already have mem->swapout.queue_offset that can be fed to
getBlockContainingLocation to find the page that needs to be swapped out.
There is no need to keep the page pointer around. The queue_offset-based math
is the same so we are not adding any overheads by using that offset (in fact,
we are removing some minor computations).
* Added "close how?" parameter to storeClose() and friends.
The old code would follow the same path when closing swapout activity for an
aborted entry and when completing a perfectly healthy swapout. In non-shared
case, that could have been OK because the abort code would then release the
entry, removing any half-written entry from the index and the disk (but I am
not sure that release happened fast enough in 100% of cases).
When the index and disk storage is shared among workers, such "temporary"
inconsistencies result in truncated responses being delivered by other workers
to the user because once the swapout activity is closed, other workers can
start using the entry.
By adding the "close how?" parameter to closing methods we allow the core and
SwapDir-specific code to handle aborted swapouts appropriately.
Since swapin code is "read only", we do not currently distinguish between
aborted and fully satisfied readers: The readerGone enum value applies to both
cases. If needed, the SwapDir reading code can make that distinction by
analyzing how much was actually swapped in.
* Moved "can you store this entry?" code to virtual SwapDir::canStore().
The old code had some of the tests in SwapDir-specific canStore() methods and
some in storeDirSelect*() methods. This resulted in inconsistencies, code
duplication, and extra calculation overheads. Making this call virtual allows
individual cache_dir types to do custom access controls.
The same method is used for cache_dir load reporting (if it returns true).
Load management needs more work, but the current code is no worse than the old
one in this aspect, and further improvements are outside this change scope.
* Moved common (and often rather complex!) code from store modules into
storeRebuildLoadEntry, storeRebuildParseEntry, and storeRebuildKeepEntry.
* Do not set object_sz when the entry is aborted because the true object size
(HTTP reply headers + body) is not known in this case. Setting object_sz may
fool client-side code into believing that the object is complete.
This addresses an old RBC's complaint.
* When swapout initiation fails, release StoreEntry. This prevents the caller
code from trying to swap out again and again because swap_status becomes
SWAPOUT_NONE.
TODO: Consider adding SWAPOUT_ERROR, STORE_ERROR, and similar states. They may
solve several problems where the code sees _NONE or _OK and thinks everything
is peachy when in fact there was an error.
* Always call StoreEntry::abort() instead of setting ENTRY_ABORTED manually.
* Rely on entry->abort() side-effects if ENTRY_ABORTED was set.
* Added or updated comments to better document current code.
* Added operator << for dumping StoreEntry summary into the debugging log.
Needs more work to report more info (and not report yet-unknown info).
Alex Rousskov [Mon, 14 Feb 2011 04:56:59 +0000 (21:56 -0700)]
Call StoreEntry::abort() instead of setting ENTRY_ABORTED flag.
This may be necessary because the abort() method does more than just setting
the flag and releasing the request. It guarantees, for example, that the
swapout I/O, if any, is closed.
Alex Rousskov [Mon, 14 Feb 2011 04:48:35 +0000 (21:48 -0700)]
Polished mgr query handoff from the original recipient worker to Coordinator.
When the worker receives a cache manager query, gives it to Coordinator, and
receives an ACK from Coordinator, the worker should stop handling the
originating transaction without declaring the associated StoreEntry complete,
because doing so triggers store client activity on the client side and might
cause undesirable output on the now-shared HTTP client socket.
Besides, declaring an empty entry as complete is kind of wrong.