The security fix in v5 r14979 had a negative effect on collapsed
forwarding. All "private" entries were considered automatically
non-shareable among collapsed clients. However, this is not always
true: there are many situations in which collapsed forwarding should
work despite a "private" entry status; 304 and 5xx responses are good
examples of that.
This patch fixes that by means of a new StoreEntry::shareableWhenPrivate
flag.
The suggested fix is not complete: To cover all possible situations, we
need to decide whether StoreEntry::shareableWhenPrivate is true or not
for all contexts where StoreEntry::setPrivateKey() is used. This patch
fixes only a few important cases inside http.cc, making CF (as well as
collapsed revalidation) work for some [non-cacheable] response status
codes, including 3xx, 5xx, and some others.
The original support for internal revalidation requests collapsing
was in trunk r14755 and referred to Squid bugs 2833, 4311, and 4471.
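The intended semantics can be sketched as follows (a hypothetical
simplification with made-up names, not Squid's actual StoreEntry code):

```cpp
#include <cassert>

// Hypothetical sketch (illustrative names, not Squid's actual StoreEntry):
// a "private" entry may still be shared among collapsed clients when
// shareableWhenPrivate says so, e.g., for 304 or 5xx responses.
struct EntrySketch {
    bool hasPrivateKey = false;
    bool shareableWhenPrivate = false;

    // Collapsed forwarding may use the entry unless it is private
    // and explicitly non-shareable.
    bool shareableAmongCollapsedClients() const {
        return !hasPrivateKey || shareableWhenPrivate;
    }
};
```

Before the fix, the second condition effectively did not exist, so every
private entry was treated as non-shareable.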
Amos Jeffries [Mon, 29 May 2017 00:06:55 +0000 (12:06 +1200)]
Add OpenSSL library details to -v output
This is partially to meet the OpenSSL copyright requirement that binaries
mention when they are using the library, and partially for admins to see
which library their Squid is using when multiple are present on the system.
Alex Rousskov [Thu, 25 May 2017 15:35:25 +0000 (09:35 -0600)]
Fixed Windows-specific code in r15148. Polished r15148 code.
The Windows-specific part of File::synchronize() missed a curly brace.
Besides breaking compilation on Windows, it broke non-Windows code
formatting (--ignore-all-space is advised when looking at this commit).
Also polished r15148 code without changing its functionality. These
polishing touches were meant to be done during r15148 commit.
Squid crashes when server-first bumping mode is used with an OpenSSL 1.1.0 release
When OpenSSL-1.1.0 or later is used:
- The SQUID_USE_SSLGETCERTIFICATE_HACK configure test is false
- The SQUID_SSLGETCERTIFICATE_BUGGY configure test is true
- Squid hits an assert(0) inside Ssl::verifySslCertificate when trying to
retrieve a generated certificate from cache.
Create PID file ASAP, before the shared memory segments.
The PID file is now created right after configuration finalization,
before any shared memory segments are allocated.
Late PID file creation allowed N+1 concurrent Squid instances to create
the same set of shared segments (overwriting each other segments),
resulting in extremely confusing havoc because the N instances would
later lose the race for the PID file (or some other critical resource)
creation and remove the segments. If that removal happened before a kid
of the single surviving instance started, that kid would fail to start
with open() errors in Segment.cc because the shared segment it tries to
open would be gone. Otherwise, that kid would fail to _restart_ after
any unrelated failures (possibly many days after the conflict), with
the same errors, for the same reason.
Shared state corruption was also possible if different kids (of the
winning instance) opened (and started using) segments created (and
initialized) by different instances.
Situations with N+1 concurrent Squid instances are not uncommon because
many Squid service management scripts (or manual admin commands!)
* do not check whether another Squid is already running and/or
* incorrectly assume that "squid -z" does not daemonize.
This change finally makes starting N+1 Squid instances safe (AFAIK).
Also made daemonized and non-daemonized Squid create the PID file at the
same startup stage, reducing inconsistencies between the two modes.
Make PID file check/creation atomic to avoid associated race conditions.
After this change, if N Squid instances are concurrently started shortly
after time TS, then exactly one Squid instance (X) will run (and have
the corresponding PID file). If another Squid instance has already been
running (with the corresponding PID file) at TS, then X will be that
"old" Squid instance. If no Squid instances were running at TS, then X
will be one of those new N Squids started after TS.
Lack of atomic PID file operations caused unexpected Squid behavior:
* Mismatch between started Squid instance and stored PID file.
* Unexpected crashes due to failed allocation of shared resources,
such as listening TCP ports or shared memory segments.
A new File class guarantees atomic PID file operations using locks. We
tried to generalize/reuse Ssl::Lock from the certificate generation
helper, but that was a bad idea: Helpers cannot use a lot of Squid code
(e.g., debugs(), TextException, SBuf, and enter_suid()), and the old
Ssl::Lock class cannot support shared locking without a major rewrite.
File locks on Solaris cannot work well (see bug #4212 comment #14), but
those problems do not affect PID file management code. Solaris- and
Windows-specific File code has not been tested and may not build.
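The locking idea can be sketched like this (hypothetical code using
POSIX flock(2), not Squid's actual File class):

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical sketch of atomic PID file creation using POSIX flock(2)
// (not Squid's actual File class). Every instance opens-or-creates the
// file, but only the one that wins the exclusive lock writes its PID;
// the losers detect the conflict and give up.
bool writePidFileAtomically(const char *path)
{
    const int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return false; // cannot even open the file
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd); // another instance holds the lock; we lost the race
        return false;
    }
    char buf[32];
    const int len = snprintf(buf, sizeof(buf), "%d\n", static_cast<int>(getpid()));
    const bool ok = ftruncate(fd, 0) == 0 && write(fd, buf, len) == len;
    close(fd); // NB: closing releases the lock; a real daemon keeps fd open
    return ok;
}
```

Because the open-then-lock pair is atomic with respect to other
instances, exactly one of N concurrent starters can proceed past the
flock() call, which is the property the commit describes.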
Failure to write a PID file is now fatal. It used to be fatal only when
Squid was started with the -C command line option. In the increasingly
SMP world, running without a PID file leads to difficult-to-triage
errors. An admin who does not care about PID files should disable them.
Squid now exits with a non-zero error code if another Squid is running.
Also removed PID file rewriting during reconfiguration in non-daemon
mode. Squid daemons do not support PID file reconfiguration since trunk
r13867, but that revision (accidentally?) left behind half-broken
reconfiguration code for non-daemon mode. Fixing that code is difficult,
and supporting PID reconfigure in non-daemons is probably unnecessary.
Also fixed "is Squid running?" check when kill(0) does not have
permissions to signal the other instance. This does happen when Squid is
started (e.g., on the command line) by a different user than the user
Squid normally runs as or, perhaps, when the other Squid instance enters
a privileged section at the time of the check (untested). The bug could
result in undelivered signals or multiple running Squid instances.
These changes do not alter partially broken enter/leave_suid() behavior
of main.cc. That old code will need to be fixed separately!
PID file-related cache.log messages have changed slightly to improve
consistency with other DBG_IMPORTANT messages and to simplify code.
Squid no longer lies about creating a non-configured PID file. TODO:
Consider lowering the importance of these benign/boring messages.
* Terminal errors should throw instead of calling exit()
Squid used to call exit() in many PID-related error cases. Using exit()
as an error handling mechanism creates several problems:
1. exit() does not unwind the stack, possibly executing atexit()
handlers in the wrong (e.g., privileged) context, possibly leaving
some RAII-controlled resources in a bad state, and complicating triage;
2. Using exit() complicates code by adding yet another error handling
mechanism to the (appropriate) exceptions and assertions;
3. Spreading exit() calls around the code obscures unreachable code
areas, complicates unifying exit codes, and confuses code checkers.
Long-term, it is best to use exceptions for nearly all error handling.
Reaching that goal will take time, but we can and should move in that
direction: The adjusted SquidMainSafe() treats exceptions as fatal
errors, without dumping core or assuming that no exception can reach
SquidMainSafe() on purpose. This trivial-looking change significantly
simplified (and otherwise improved) PID-file handling code!
The fatal()-related code suffers from similar (and other) problems, but
we did not need to touch it.
TODO: Audit catch(...) and exit() cases [in main.cc] to take advantage
of the new SquidMainSafe() code supporting the throw-on-errors approach.
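The throw-on-errors approach can be sketched like this (a hypothetical
simplification with made-up names, not the actual SquidMainSafe() code):

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch of the throw-on-errors approach (illustrative
// names, not the actual SquidMainSafe() code): a terminal error throws;
// the stack unwinds, running RAII destructors; and a single top-level
// catch converts the failure into a non-zero process exit code.
static void createPidFileOrThrow(bool creationFails)
{
    if (creationFails)
        throw std::runtime_error("cannot create PID file");
}

int mainSafeSketch(bool simulateFailure)
{
    try {
        createPidFileOrThrow(simulateFailure);
        return 0; // normal termination
    } catch (const std::exception &) {
        // one place to log the error and pick the exit code
        return 1;
    }
}
```

Compared to scattered exit() calls, this keeps error handling in one
place and guarantees destructors run before the process terminates.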
Alex Rousskov [Mon, 22 May 2017 16:49:36 +0000 (10:49 -0600)]
Do not unconditionally revive dead peers after a DNS refresh.
Every hour, peerRefreshDNS() performs a DNS lookup of all cache_peer
addresses. Before this patch, even if the lookup results did not change,
the associated peerDNSConfigure() code silently cleared dead peer
marking (CachePeer::tcp_up counter), if any. Forcefully reviving dead
peers every hour can lead to transaction delays (and delays may lead to
failures) due to connection timeouts when using a still dead peer.
This patch starts standard TCP probing (instead of pointless dead peer
reviving), correctly refreshing peer state. The primary goal is to
cover a situation where a DNS refresh changes the peer address list.
However, TCP probing may be useful for other situations as well and has
low overhead (that is why it starts unconditionally). For example,
probing may be useful when the DNS refresh changes the order of IP
addresses. It also helps detect dead idle peers.
Also delay and later resume peer probing if peerDNSConfigure() is
invoked when peers are being probed. Squid should re-probe because the
current probes may use stale IP addresses and produce wrong results.
xstrndup() does not work like strndup(3), and some callers got confused:
1. When n is the str length or less, standard strndup(str,n) copies all
n bytes but our xstrndup(str,n) drops the last one. Thus, all callers
must add one to the desired result length when calling xstrndup().
Most already do, but it is often hard to see due to low code quality
(e.g., one must remember that MAX_URL is not the maximum URL length).
2. xstrndup() also assumes that the source string is 0-terminated. This
dangerous assumption does not contradict many official strndup(3)
descriptions, but that lack of contradiction is actually a recently
fixed POSIX documentation bug (i.e., correct implementations must not
assume 0-termination): http://austingroupbugs.net/view.php?id=1019
The OutOfBoundsException bug led to truncated exception messages.
The ESI bug led to truncated 'literal strings', but I do not know what
that means in terms of user impact. That ESI fix is untested.
The cachemgr.cc bug was masked by the fact that the buffer ends with a
\n that is unused and stripped by the custom xstrtok() implementation.
TODO. Fix xstrndup() implementation (and rename the function so that
fixed callers do not misbehave if carelessly ported to older Squids).
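The off-by-one difference can be modeled as follows (a hypothetical
reimplementation for illustration only, not Squid's actual xstrndup()
code):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical model of the historical xstrndup() behavior described
// above (not the real Squid code): unlike strndup(3), it reserves one
// of the n bytes for the 0 terminator, and it assumes the source
// string is 0-terminated.
static char *
model_xstrndup(const char *s, size_t n)
{
    size_t len = strlen(s); // dangerous: assumes 0-termination
    if (n > 0 && len > n - 1)
        len = n - 1;        // drops one byte relative to strndup(3)
    char *p = static_cast<char *>(malloc(len + 1));
    memcpy(p, s, len);
    p[len] = '\0';
    return p;
}
```

A caller wanting the first n bytes of str must therefore pass n+1,
which matches the "+1" pattern seen in most existing callers.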
This ACL detects the presence of request, response, or ALE transaction
components. Since many ACLs require some of these components, their
absence in a transaction may spoil the check and confuse admins with
warnings like "... ACL is used in context without an HTTP request".
Using the 'has' ACL should help deal with these problems caused by
component-less transactions.
Also: addressed TODO in item #3 of v4 revision 14752.
bug 4321: ssl_bump terminate does not terminate at step1
The following trivial configuration should terminate all connections that
are subject to SslBumping:
ssl_bump terminate all
but Squid either splices or bumps instead.
This patch fixes Squid to immediately close the connection.
Also this patch:
- solves wrong use of Ssl::bumpNone in cases where the Ssl::bumpEnd
(do not bump) or Ssl::bumpSplice (splice after peek/stare at step1)
must be used.
- updates the %ssl::bump_mode documentation.
- fixes the %ssl::bump_mode formatting code to print the last bumping action.
Squid does not forward HTTP transactions to dead peers except when a
dead peer was idle for some time (ten peer connect timeouts or longer).
When the idle peer is still dead, this exception leads to transaction
delays (at best) or client disconnects/errors (at worst), depending on
Squid and client configurations/state. I am removing this exception.
The "use dead idle peer" heuristic was introduced as a small part of a
much bigger bug #14 fix (trunk r6631). AFAICT, the stated goal of the
feature was speeding up failure recovery: The heuristic may result in
HTTP transactions sent to a previously dead (but now alive) idle peer
earlier, before the peer is proven to be alive (using peer revival
mechanisms such as TCP probes). However, the negative side effects of
this heuristic outweigh its accidental benefits. If somebody needs Squid
to detect revived idle peers earlier, they need to add a different
probing mechanism that does not jeopardize HTTP transactions.
Nobody has spoken in defense of this feature on Squid mailing lists:
http://lists.squid-cache.org/pipermail/squid-users/2017-March/014785.html
http://lists.squid-cache.org/pipermail/squid-dev/2017-March/008308.html
The removed functionality was not used to detect revived peers. All peer
revival mechanisms (such as TCP probes) remain intact.
bug 4711: SubjectAlternativeNames is missing in some generated certificates
Squid may generate certificates which have a Common Name but do not have
a subjectAltName extension, for example when Squid-generated certificates
do not mimic an origin certificate, or when the certificate adaptation
algorithm sslproxy_cert_adapt/setCommonName is used.
This causes problems for some browsers, which validate a certificate
using the subjectAltName extension but ignore the CommonName field.
This patch fixes Squid to always add a subjectAltName extension to
generated certificates which do not mimic an origin certificate.
Squid still will not add a subjectAltName extension when mimicking an
origin server certificate, even if that origin server certificate does
not include the subjectAltName extension. Such an origin server may
have problems when talking directly to browsers, and patched Squid does
not try to fix those problems.
Bug 4682: When client-first bumping mode is used, Squid can ignore
http_access denials
Squid fails to identify HTTP requests which are tunneled inside an
already established client-first bumped tunnel, which results in
http_access denials being ignored for these requests.
Also fixes the Squid documentation to correctly describe Squid's
behavior when the "bump" action is selected at step SslBump1: in this
case Squid selects the client-first bumping mode.
Bug 4659 - sslproxy_foreign_intermediate_certs does not work
The sslproxy_foreign_intermediate_certs directive does not work after
r14769. The bug is caused by incorrect use of the X509_check_issued()
OpenSSL API call.
Now that Squid is sending an explicit '-' for the trailing %DATA parameter
if there were no acl parameters this helper needs to cope with it on
'active mode' session lookups when login/logout are not being performed.
Squid does not send CONNECT request to adaptation services
if the "ssl_bump splice" rule matched at step 2. This adaptation
is important because the CONNECT request gains SNI information during
the second SslBump step. This is a regression bug, possibly caused by
the Squid bug 4529 fix (trunk commits r14913 and r14914).
Count failures and use peer-specific connect timeouts when tunneling.
Fixed two bugs with tunneling CONNECT requests (or equivalent traffic)
through a cache_peer:
1. Not detecting dead cache_peers due to missing code to count peer
connect failures. TLS/SSL-level failures were detected (for "tls"
cache_peers) but TCP/IP connect(2) failures were not (for all peers).
2. Origin server connect_timeout used instead of peer_connect_timeout or
a peer-specific connect-timeout=N (where configured).
The regular forwarding code path does not have the above bugs. This
change reduces code duplication across the two code paths (that
duplication probably caused these bugs in the first place), but a lot
more work is needed in that direction.
The 5-second forwarding timeout hack has been in Squid since
forward_timeout inception (r6733). It is not without problems (now
marked with an XXX), but I left it as is to avoid opening another
Pandora box. The hack now applies to the tunneling code path as well.
Cleanup: remove redundant IntRange class from StoreMeta.cc
Use the Range<> template we have for generic ranges.
Move the Range.h template definition to src/base/. It is only used by
code in src/.
Also, include a small performance improvement for
StoreMeta::validLength(): storing the valid TLV length limits in a
static instead of generating a new object instance on each call.
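The idea can be sketched like this (a hypothetical simplification of
the Range<> template and the static limits object; all names and bounds
below are illustrative, not Squid's exact code):

```cpp
#include <cassert>

// Hypothetical simplification of a generic half-open range, in the
// spirit of the Range<> template (not Squid's exact class):
template <class T>
struct RangeSketch {
    T start, end; // half-open interval [start, end)
    RangeSketch(T s, T e) : start(s), end(e) {}
    T size() const { return end - start; }
    bool contains(const T &v) const { return start <= v && v < end; }
};

// Storing the valid TLV length limits in one static const object avoids
// constructing a fresh range object on every validLength() call.
static const RangeSketch<int> TlvValidLengths(1, 1 << 16);

bool validTlvLength(int len)
{
    return TlvValidLengths.contains(len);
}
```

A generic template like this replaces the redundant IntRange class
while serving any element type, not just int.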
QA: allow test-suite to be run without a full build
The squid.conf processing tests have been assuming a full 'make check' was
run and generated a squid binary in the build directory.
This change allows callers to also run these tests on an arbitrary 'squid'
binary by using the command:
make --eval="BIN_DIR=/path" -C test-suite squid-conf-tests
where /path is the path under which a squid binary already exists.
Amos Jeffries [Fri, 31 Mar 2017 18:43:20 +0000 (07:43 +1300)]
Bug 4610: cleanup of BerkeleyDB related checks
Most of the logic seems to be leftovers from when the session helper was
using the BerkeleyDB v1.85 compatibility interface. Some of it is
possibly still necessary for the time_quota helper, but that helper has
not been using it so far and needs an upgrade to match what happened to
session helper.
Changes:
* The helpers needing -ldb will not be built unless the library and
headers are available. So we can drop the Makefile LIB_DB substitutions
and always just link -ldb explicitly to these helpers.
NP: Anyone who needs small minimal binaries can build with the
--as-needed linker flag, or without these helpers. This change has no
effect on other helpers or the main squid binary.
* Since we no longer need to check if -ldb is necessary, we can drop the
configure.ac and acinclude logic detecting that.
* Remove unused AC_CHECK_DECL(dbopen, ...)
- resolves one "FIXME"
* Fix the time_quota helper check to only scan db.h header file contents
if that file exists and the db_185.h file is not being used instead.
* Fix the session helper check to only try compiling with the db.h
header if that header actually exists.
* De-duplicate the library header file detection shared by configure.ac
and the helpers required.m4 files (after the above two changes).
Amos Jeffries [Sat, 18 Mar 2017 04:25:24 +0000 (17:25 +1300)]
Add move semantics to the remaining HTTP Parser hierarchy
A destructor is required because this hierarchy contains virtuals, which
in turn means the compiler will not add a move constructor by default.
So we must add the default ones ourselves.
Squid may fail to load cache entry metadata for several very different
reasons, including the following two relatively common ones:
* A cache_dir entry corruption.
* Huge cache_dir entry metadata that does not fit into the I/O buffer
used for loading entry metadata.
Knowing the exact failure reason may help triage and guide development.
We refactored existing checks to distinguish various error cases,
including the two above. Refactoring also reduced code duplication.
These improvements also uncovered and fixed a null pointer dereference
inside ufsdump.cc (but ufsdump does not even build right now for reasons
unrelated to these changes).
Amos Jeffries [Wed, 15 Mar 2017 15:41:41 +0000 (04:41 +1300)]
Cleanup: Migrate Http1::Parser child classes to C++11 initialization
Also, add move semantics to Http1::RequestParser. This apparently will
make the clear() operators faster as they no longer have to data-copy,
at least once the base Parser class supports move as well.
It also contains a small experiment to see if a virtual destructor
alone allows an automatic move constructor to be added by the compiler.
Alex Rousskov [Fri, 3 Mar 2017 23:18:25 +0000 (16:18 -0700)]
Fixed URI scheme case-sensitivity treatment broken since r14802.
A parsed value for the AnyP::UriScheme image constructor parameter was
stored without toLower() canonicalization for known protocols (e.g.,
Squid would store "HTTP" instead of "http" after successfully parsing
"HTTP://EXAMPLE.COM/" in urlParseFinish()). Without that
canonicalization step, Squid violated various HTTP caching rules related
to URI comparison (and served fewer hits) when dealing with absolute
URLs containing a non-lowercase HTTP scheme.
According to my limited tests, URL-based ACLs are not affected by this
bug, but I have not investigated how URL-based ACL code differs from
caching code when it comes to stored URL access and whether some ACLs
are actually affected in some environments.
Fix two read-ahead problems related to delay pools (or lack thereof).
1. Honor EOF on Squid-to-server connections with full read ahead buffers
and no clients when --enable-delay-pools is used without any delay
pools configured in squid.conf.
Since trunk r6150.
Squid delays reading from the server after buffering read_ahead_gap
bytes that are not yet sent to the client. A delayed read is normally
resumed after Squid sends more buffered bytes to the client. See
readAheadPolicyCanRead() and kickReads().
However, Squid was not resuming the delayed read after all Store clients
were gone. If quick_abort prevents Squid from immediately closing the
corresponding Squid-to-server connection, then the connection gets stuck
until read_timeout (15m), even if the server closes much sooner:
without reading from the server, Squid cannot detect the connection
closure. The affected connections enter the CLOSE_WAIT state.
Kicking delayed read when the last client leaves fixes the problem. The
removal of any client, including the last one, may change
readAheadPolicyCanRead() answer and, hence, deserves a kickReads() call.
Why "without any delay pools configured in squid.conf"? When classic
(i.e., delay_pool_*) delay pools are configured, Squid kicks all delayed
reads every second. That periodic kicking is an old design bug, but it
resumes stuck reads when all Store clients are gone. Without classic
delay pools, there is no periodic kicking. This fix does not address
that old bug but removes Squid hidden dependence on its side effect.
Note that the Squid-to-server connections with full read-ahead buffers
still remain "stuck" if there are non-reading clients. There is nothing
Squid can do about them because we cannot reliably detect EOF without
reading at least one byte and such reading is not allowed by the read
ahead gap. In other words, non-reading clients still stall server
connections.
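The gating logic can be sketched like this (hypothetical names; a
simplification of what readAheadPolicyCanRead() and kickReads()
coordinate, not Squid's actual code):

```cpp
#include <cassert>

// Hypothetical sketch of the read-ahead gate (illustrative names, not
// Squid's actual readAheadPolicyCanRead()/kickReads() code): reading
// from the server is allowed while the slowest remaining client keeps
// the unread span under read_ahead_gap; with no clients at all, reads
// must be allowed so that Squid can detect server EOF (the fix above).
struct ReadAheadGate {
    long endOffset = 0;          // bytes received from the server so far
    long lowestClientOffset = 0; // offset of the slowest remaining client
    long clients = 0;
    long readAheadGap = 16384;   // squid.conf read_ahead_gap

    bool canRead() const {
        if (clients == 0)
            return true; // no one to wait for; keep reading to see EOF
        return endOffset - lowestClientOffset < readAheadGap;
    }
};
```

When a client is removed, the answer may flip from false to true, which
is why the removal of any client, including the last one, deserves a
re-check of this gate.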
While fixing this, I moved all CheckQuickAbort() tests into
CheckQuickAbortIsReasonable() because we need a boolean function to
avoid kicking aborted entries and because the old separation was rather
awkward -- CheckQuickAbort() contained "reasonable" tests that were not
in CheckQuickAbortIsReasonable(). All the aborting tests and their order
were preserved during this move. The moved tests gained debugging.
According to the existing test order in CheckQuickAbortIsReasonable(),
the above problem can be caused by:
* non-private responses with a known content length
* non-private responses with unknown content length, having quick_abort_min
set to -1 KB.
2. Honor read_ahead_gap with --disable-delay-pools.
Since trunk r13954.
This fix also addresses "Perhaps these two calls should both live
in MemObject" comment and eliminates existing code duplication.
Amos Jeffries [Fri, 3 Mar 2017 11:52:37 +0000 (00:52 +1300)]
Bug 4671 pt4: refactor Format::assemble()
* replace the String local with an SBuf to get appendf()
* overdue removal of empty lines and '!= NULL' conditions
* further reduce the scope of many 'out' assignments
* use sizeof(tmp) instead of '1024'
* Fixes many GCC 7 compile errors from snprintf() being called with a
too-small buffer.
* update the for-loops in Adaptation::History to C++11 and produce output
in an SBuf, removing the need for iterator typedefs and resolving more
GCC 7 warnings about too-small buffers for snprintf().
Amos Jeffries [Fri, 3 Mar 2017 11:41:07 +0000 (00:41 +1300)]
Bug 4671 pt3: remove limit on FTP realm strings
Convert ftpRealm() from generating char* to SBuf. This fixes issues
identified by GCC 7 where the realm string may be longer than the
available buffer and gets truncated.
The size of the buffer made the occurrence rather rare, but it is still
possible.