rousskov [Thu, 2 Aug 2007 02:51:06 +0000 (02:51 +0000)]
Polished ICAP debugging so that users can see fatal ICAP service failures
(and not much more) at debug level 2. This may help identify why a service
is suspended.
rousskov [Wed, 1 Aug 2007 10:36:47 +0000 (10:36 +0000)]
Author: Henrik Nordstrom <henrik@henriknordstrom.net>
Bug #1872: Date parsing error causing HTTP objects to get unexpectedly cached
and fresh ICAP OPTIONS rejected.
Applied Henrik's patch from Squid2 patchset #11161. The bug was affecting ICAP
Options TTLs, marking some ICAP services with soon-to-expire options as being
down. Original patch text is below.
An unexpected and undesired sideeffect of the earlier change to accept dates
far in the future was that the date calculations suddenly got daylight savings
sensitive due to mktime() automatically adjusting the supplied date. This has the
unfortunate effect that if Squid is running in a timezone with daylight savings
things may get cached longer than they should.
rousskov [Wed, 1 Aug 2007 10:18:08 +0000 (10:18 +0000)]
Bug 2024 fix: Do not optimize StoreEntry::write call away for zero-size
data because StoreEntry::write invokes client-side handlers for response
headers, even if there is no body data to write.
rousskov [Wed, 1 Aug 2007 10:16:00 +0000 (10:16 +0000)]
Bug #2029 fix: When ICAP produces a response before the entire
virgin response is read, do not abort the transaction.
The code needed to be re-arranged to make sure that virgin response body
is not appended to the store when adapted body is written there. The
changes resulted in a cleaner code, where virgin data never enters the
store if ICAP adaptation was started, and adapted data is always
written to the store. More polishing work is needed here.
Some debugging logs seem to indicate that asynchronous calls are fired
out of order. The code cannot (and probably should not) handle that.
This out-of-order calls may be happening because async calls are implemented
using time-based events, and events are using wall clock (current_dtime)
when looking for the right place in the event queue. The wall clock may
go backwards, placing a later call earlier in the event queue.
This change forces async call-looking events to use absolute timestamp of
zero, so nothing (including a later async call) can get in front of them.
Remember the ICAP OPTIONS fetching transaction so that we can clear/destroy
it when it is done, to avoid ICAPOptXactLauncher leak. Use the transaction
pointer instead of the boolean 'waiting' flag.
Author: Christos Tsantilas <chtsanti@users.sourceforge.net>
Bug #1971 fix, part 2, take B, v4: Polish HttpStateData::readReply() layout
to weed out bugs.
Moved processReplyHeader post-processing from readReply into newly added
continueAfterParsingHeader() method. Either deleted unreachable code in
readReply or re-arranged it to become reachable in continueAfterParsingHeader.
Start ICAP ACL checks only after the response header has been successfully
processed. The old code would start ICAP ACL checks for transactions that
would be aborted immediately after the check is scheduled.
Set eof when we read zero bytes. Doing so as a stand-alone step and referring
to 'eof' member from there on makes readReply code a little more clear.
Set reply member when HTTP header parser fails in processReplyHeader() so that
the caller has more information about the parsing error.
persistentConnStatus() now returns COMPLETE_NONPERSISTENT_MSG if we reach eof.
The old code would just call serverComplete() in readReply() instead, making
it a yet another special case there.
We are not done with ICAP if an ICAP ACL check is pending. This change
affects FTP as well and allows us to process more errors with ICAP.
Commented that persistentConnStatus() may be called for ICAP-adapted responses
that do not have a notion of a persistent connection. Why does this work?
These changes were inspired by Christos Tsantilas comments and patches but
the new bugs are mine.
Reversed bug #2011 fix because it may slow down ICAP, BodyPipe, and other code
using zero-delay events to implement "asynchronous" calls.
The code should probably be rewritten (a) to avoid any waiting/blocking when
there are ready events and (b) to allow waiting longer when there are no ready
events.
Removed the "reply" parameter and "this->reply = reply" assignment from
Server::setReply() because the actual parameter was always "reply". The
code setReply() replaced did not have the assignment either.
Eventually, the setReply() method needs to be renamed because the reply is
already set when we call it.
Bug #1921: ftp.cc(3309) ICAP cannot keep up with FTP; lost 1448/1448 bytes.
This patch adds a little buffering before the BodyPipe, allowing more
data to be temporarily stuffed than what fits in the pipe. Needed for
FTP directory listings and other sources with no byte-exact flow control.
The patch also moves most ICAP dependencies from the protocol implementations
to the shared ServerStateData parent, allowing them all to share logics.
Accept filters support will be enabled if the system defines the
SO_ACCEPTFILTER socket option. A filter will be applied to Squid's
listen socket(s) if set with the 'accept_filter' directive in
squid.conf.
Typical use will be to apply the 'httpready' filter, which requires
loading the 'accf_http' kernel module. The 'httpready' filter delays
passing new connection descriptors to Squid until a full HTTP request
is received.
Call leave_suid() in fatal() to make sure that swap.state files are written as the effective user, rather than root.
Squid may take on root privs during reconfigure. If squid.conf
contains a "Bungled" line, fatal() will be called when the process
still has root privs.
Author: Henrik Nordstrom <hno@squid-cache.org>
Fix bug 2010. Buffer overread at a loop termination.
Boolean logic typo caused an endcase read of an array-counting loop
to over-read the array in its termination condition.
Became visible with additional g++-4.2 code-safety checks.
Updated the Negotiate implementation to match current NTLM implementation
which gets rid of this. Looks like there has been some patches to NTLM
not applied on Negotiate. The two is pretty much the same code, and any
bugs hitting NTLM will hit Negotiate as well..
wessels [Sat, 30 Jun 2007 12:33:05 +0000 (12:33 +0000)]
./configure needs to use $CRYPTLIB when checking for crypt()
When ./configure checks for the crypt() function, it should have -lcrypt
in $LIBS (if -lcrypt was found). Created a special AC_CHECK_FUNCS
for crypt() that temporarily adds $CRYPTLIB to $LIBS.
This became a problem when #if HAVE_CRYPT was added to
helpers/basic_auth/NCSA/ncsa_auth.c. The crypt() function was always
there, but not always surrounded by the #ifdef.
rousskov [Thu, 28 Jun 2007 21:28:59 +0000 (21:28 +0000)]
Explicitly disabled ICAPConfig copying and assignment to prevent such things
from happening because the default/generated constructor and operator do not
do the right thing.
rousskov [Thu, 28 Jun 2007 21:18:16 +0000 (21:18 +0000)]
Fixed coredumps when requesting /config cache_object with ICAP enabled.
Various dump_icap_*() functions were copying ICAPConfig instances. ICAPConfig
class does not support copying and copying is not really needed (and may be
expensive) here.
rousskov [Thu, 28 Jun 2007 20:31:58 +0000 (20:31 +0000)]
Partial bug #1921 fix: Limit the amount of data read by FTP to avoid
overflowing ICAP buffers. This fix is partial because it applies to
reading response bodies (the data stream). A complete fix should also
address cases where the ICAP buffers are overwhelmed by content produced
by FTP code (e.g., a directory listing).
rousskov [Thu, 28 Jun 2007 20:20:05 +0000 (20:20 +0000)]
Bug #1986 fix: Do not abandon ICAP when doing FTP HEAD.
Also, do not forget to adjust cachability of FTP responses when doing ICAP.
FTP HEAD processing code was calling appendSuccessHeader() but was not
checking whether the latter set icapAccessCheckPending. This resulted
in aborted FTP HEAD transactions.
Also, appendSuccessHeader was adjusting entry cachability based on
authentication flags but _not_ when ICAP was enabled. I moved that code
to newly added haveParsedReplyHeaders() which is called with or without
ICAP, just like in HttpState.
TODO: rename appendSuccessHeader because it is doing a lot more than that.
rousskov [Tue, 26 Jun 2007 06:35:24 +0000 (06:35 +0000)]
Polish HttpStateData::readReply() layout to weed out bugs, step 4:
Polished comments and debugging messages.
This change should have no runtime effect.
rousskov [Tue, 26 Jun 2007 06:24:33 +0000 (06:24 +0000)]
Polish HttpStateData::readReply() layout to weed out bugs, step 3:
Handle I/O errors first so that we do not have to compare flag
with COMM_OK in all the if statements that follow.
This change should have no runtime effect.
rousskov [Tue, 26 Jun 2007 06:16:00 +0000 (06:16 +0000)]
Polish HttpStateData::readReply() layout to weed out bugs, step 2:
Use the xerrno parameter instead of the global errno. The latter has
nothing to do with the old system call we are post-processing now.
rousskov [Tue, 26 Jun 2007 06:11:08 +0000 (06:11 +0000)]
Polish HttpStateData::readReply() layout to weed out bugs, step 1:
1996 (revision 1.8) code was setting errno to zero before calling read(2).
We no longer make system calls in HttpStateData::readReply(). Resetting
errno here is a little bit too late.
rousskov [Tue, 26 Jun 2007 04:34:24 +0000 (04:34 +0000)]
Author: Christos Tsantilas <chtsanti@users.sourceforge.net>
Bug #1971 fix, part 1: Check whether the store entry got aborted while
we were waiting for an asynchronous ICAP ACL check.
Also moved common icapAclCheckDone() code from FtpStateData and
HttpStateData into Server. Removed "if (eof) serverComplete()" checks from
HttpStateData::processReplyHeader and HttpStateData::failReply methods as
eof can no longer be true there.
hno [Mon, 25 Jun 2007 20:38:14 +0000 (20:38 +0000)]
Bug #1798: Temporary shortage of system filedescriptors may cause Squid to perma
nently stop accepting connections
To adjust to system limits Squid automatically reduces the amount of
filedescriptors when it detects a system wide shortage. However, this
reduction might go so far that Squid completely stops accepting new
connections.
This patch adds a limit causing Squid to restart if the limit gets way
too low, in an attempt to recover if the situation is temporary.
hno [Sun, 24 Jun 2007 03:08:39 +0000 (03:08 +0000)]
Author: Manu Garg <manugarg@gmail.com>
Bug #1968: Squid hangs occasionally when using DNS search paths
Squid 2.6 (tested with squid 2.6.1 and 2.6.5) hangs after running for some time
(could be from 5 min to 40 min on a busy server). Squid becomes unresponsive
and CPU usage becomes 90-100%. ltrace shows that it's continuously comparing
strings in an infinite loop:
The problem seems to be in the way squid's internal DNS system (dns_internal.c)
keeps record of looked up but not yet answered DNS queries. This bug is hit
specifically when multiple search paths are used in /etc/resolv.conf.
Squid caches all dns queries before sending them to avoid duplicate queries for
the same name. (look at: idnsCacheQuery(q) and hash_table *idns_lookup_hash, in
dns_internal.c). This mechanism works well unless multiple search paths are
defined in /etc/resolv.conf. When multiple dns search paths are defined, same
query object is modified and next search path is concatenated to it's name.
This query is cached again and resent.
Problem is that the query is not unlinked before being cached and thus linked
again. Only the key of hash object (that's actually name) changes this time;
object itself remains same. This corrupts the hash table of looked up queries.
rousskov [Wed, 20 Jun 2007 03:19:04 +0000 (03:19 +0000)]
The last http.cc commit message is missing the following important change:
HttpStateData may get stuck if request body producer aborts while the
request body is being sent. The code already had a related XXX comment.
This change is meant to abort the server transaction when that happens,
but I cannot reproduce the problem and, hence, is not sure the change is
correct and sufficient.
When handling request body producer abort, create an error before
terminating the server transaction because FwdState asserts there is an
error. Unfortunately, we do not know why the body was aborted and have
to guess. The response is probably irrelevant if it was the HTTP client
that aborted the request. Thus, we are using ICAP_ERROR in hope that all
other cases are ICAP-related. If there are exceptions, we may confuse
admins and developers.
rousskov [Wed, 20 Jun 2007 03:13:49 +0000 (03:13 +0000)]
Partial bug #1973 fix: Try to revive a suspended service after the
icap_service_revival_delay.
When a service gets suspended, schedule an OPTIONS update after the
icap_service_revival_delay. If there is already an OPTIONS fetch scheduled,
that outdated event will be deleted using eventDelete().
Eventually, we may want to have a separate option to control suspended
service revival.
rousskov [Wed, 20 Jun 2007 03:12:15 +0000 (03:12 +0000)]
Bug #1974 fix: Bypass failures of optional ICAP services.
To bypass various failures, ICAPModXact catches its own exceptions and enables
the "echo" mode instead of quitting. Exceptions are not overruled if the
transaction is retriable. The code disables bypass for essential ICAP services
and when something makes clean echoing impossible (e.g., some buffered HTTP
body content was consumed).
This design allows the same bypass mechanism to be used for moth REQMOD and
RESPMOD, regardless of the vectoring point. Its implementation did not require
changing any Squid core files.
Previously many if not most ICAP failures were not bypassed and users were
receiving cryptic "ICAP protocol error" messages when an optional ICAP
service went down. With the current code, the service may go up and down many
times but only the transactions with large message bodies (that were executing
when the service went down) should be affected.
When deciding on whether to consume HTTP content written to the ICAP server,
Squid has to resolve a trade-off: postponing consumption longer increases
process footprint and may slow the HTTP side down, but consuming sooner
increases the chance of that "ICAP protocol" error being returned to the user.
Since Squid cannot buffer very large messages, some errors are inevitable. We
may want to add knobs to control this tradeoff.
The entries below are only indirectly related to the bug #1974 fix.
virginConsume() does not call or imply checkConsuming(). We must call
checkConsuming() even if we called virginConsume(). Otherwise, we may leave
and get destroyed while the pipe object still holds a [consumer] pointer to
us.
Polished VirginBodyAct so that we can distinguish disabled from undecided
states without using magical negative size constants.
Do not stop writing before throwing an exception. If the exception kills the
transaction, the transaction should cleanup the writer anyway. To bypass an
exception, we need all virgin content intact. Stopping writes before throwing
an exception may consume virgin content because the code does not know that
the exception is about to be thrown and may perceive content as "no longer
needed".
rousskov [Wed, 20 Jun 2007 03:08:33 +0000 (03:08 +0000)]
Ignore comm_write notifications if the newly added
ICAPXaction::ignoreLastWrite member is set. This allows kids to more safely
work around comm inability to cancel a pending write request.
When connect() times out, throw an exception instead of calling mustStop()
because we can bypass the former and not the latter. We cannot bypass mustStop
because the code may use it for legitimate stopping conditions. Eventually,
we may want to have two kinds of exceptions: bypassable and fatal.
Support icap_connect_timeout and icap_io_timeout squid.conf options.
Count I/O timeout as a service failure. This helps suspend broken services
faster and avoid HTTP processing delays when the service is optional.
Eventually, we will probably count all (or most) exceptions as service
failures.
Removed ICAPXaction members that were copied to AsyncJob some time ago.
rousskov [Wed, 20 Jun 2007 03:03:45 +0000 (03:03 +0000)]
Added icap_connect_timeout and icap_io_timeout squid.conf options to control
how long an ICAP transaction should wait for the ICAP server to accept its
connection or process the next I/O. These options are needed for many optional
ICAP services that are poorly reachable or otherwise delay network I/Os.
Waiting for a service is harmful to user experience, especially when the
service failure or lack of connectivity can be bypassed.
The two knobs use HTTP option values as defaults and have different defaults
for essential and optional services. This may be a bad idea and will change
depending on user feedback.
All timeouts are currently global. Eventually, we will need per-service or
service group timeouts and, possibly, even an OPTIONS-specific timeout.
rousskov [Wed, 20 Jun 2007 02:58:26 +0000 (02:58 +0000)]
Do not retry server transactions aborted due to request body supply
failures because the problem is not with the server and will not go away
if FwdState tries again.
rousskov [Wed, 20 Jun 2007 02:27:00 +0000 (02:27 +0000)]
Bug #1981 fix: Tell FwdState that an unregistered and failed Server is
gone.
When a Server unregisters its fd (by calling FwdState::unregister) and
then fails, call FwdState::handleUnregisteredServerEnd to tell FwdState
that the server is done working, so that the transaction does not get
stuck waiting for the server to call FwdState::complete. The old code
would just set Server's FwdState pointer to NULL which was only enough
if FwdState::self was already NULL due to aborts.
This change fixed the bug in my limited ICAP tests.
The whole FwdState relationship with Servers needs a serious revision as
similar bugs probably do (or will) exist. We probably need to maintain a
state variable representing the relationship. The following Server
states are possible, at least: got valid FD and working, closed FD but
still working, stopped working (failure or success). FwdState needs to
be notified when we stopped working. Currently, when Server closes the
FD, deep down the Server stack, it may not be clear whether the Server
is still working or not.
Finally, it may be a good idea for FwdState to notify Server when
FwdState is aborted. Currently, the Server must check that the store
entry is still valid to notice the abort, and it sometimes forgets to do
that, causing store assertions.
hno [Sun, 10 Jun 2007 18:13:31 +0000 (18:13 +0000)]
Kill old stale code dealing with deferred reads and delay pools.
Squid-3 already have a deferred read infrastructure dealing with
buffering and delay pools, taking filedescriptors in/out from the
active set when needed. The comm loops don't need to emulate this.
hno [Sun, 10 Jun 2007 18:07:28 +0000 (18:07 +0000)]
Clean up configure magics selecting which comm loop to use, promote epoll to stable
This is basically a copy from Squid-2, making configure a bit smarter
in selecting which comm loop to use, and automatically enabling epoll
if it seems usable.
kqueue is still not activated automatically even if defected as the
comm_kqueue implementation is still experimental and known to have issues
hno [Sun, 10 Jun 2007 17:02:23 +0000 (17:02 +0000)]
Bug #1939: --enable-epoll causes SSL to occationally hang
This patch makes comm_epoll support the "read_pending" flag, indicating
data has been buffered at the I/O layer and is immediately available for
processing without having to wait for an I/O event.
hno [Sun, 10 Jun 2007 16:43:17 +0000 (16:43 +0000)]
Restore the code making helpers run in their own sessions.
Having the helpers in the same session / process group as Squid broke
debugging with Squid running in foreground, which is worth more than
to have the process associations entirely correct.
hno [Sat, 2 Jun 2007 17:50:32 +0000 (17:50 +0000)]
Make tunnel.cc identify itself as tunnel and not ssl
tunnel.cc is about tunnel operation as needed for CONNECT, not really SSL.
This change renames everything in tunnel.cc to use tunnel instead of ssl,
this to avoid confusion with the SSL code.
rousskov [Thu, 31 May 2007 02:22:30 +0000 (02:22 +0000)]
Bug #1976 fix: Do not check OPTIONS response specifics if the response
is invalid in some way. Squid should now handle 404 Not Found and other
"invalid" ICAP OPTIONS responses by marking the ICAP service down.
OPTIONS error setting needs more work, but reporting the last error should
work for now.
Thanks to Christos Tsantilas for reporting and investigating this bug.