At high levels of build parallelism on systems using GNU coreutils,
sometimes "make check" hangs on a request to confirm
removal of a read-only file temporary file.
Force-remove temporary test files to ensure removal is noniteractive.
Do not use raw pointers to index sourcehash CachePeers (#1474)
Simplified and improved code safety by using CbcPointers for sourcehash
cache_peers, as we have done for CARP peers in recent commit e7959b5.
Also fixed mgr:sourcehash Cache Manager reports to detail relevant
cache_peers instead of all cache_peers. This problem existed since
inception (2008 commit f7e1d9c) as detailed in recent commit e7959b5.
Also applied the new "no new globals" policy to CARP peering code, to
keep improved CARP and sourcehash peering code in sync.
Bug 5301: cachemgr.cgi not showing new manager interface URLs (#1479)
Also fix several related UI issues uncovered during testing:
* Prune the list of servers accessible via the CGI tool login.
Their responses would be badly mangled if accessed via
the old tools parse logic.
Also, hide the old login form if all servers use the new
manager interface.
* Ensure the 'menu' report is always used by default after
the CGI tool login. This prevents errors about MGR_INDEX
not being available on recent Squid releases. Restoring the
expected CGI tool behavior.
Extend cache_log_message to problematic from-helper annotations (#1481)
WARNING: Unsupported or unexpected from-helper annotation
with a name reserved for Squid use
The above message is emitted for every helper response containing
problematic annotations. Let admins control this reporting using
cache_log_message directive and message id 69.
This bug is specific to "half_closed_clients on" configurations.
assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
location: comm.cc:1583 in commStartHalfClosedMonitor()
Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.
Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because
* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"
We worked around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests(). That
fixed the bug but further increased poor code duplication across
ConnStateData::afterClientRead() and ConnStateData::kick() methods. We
then refactored by merging and moving that duplicated code into
clientParseRequests() and renamed that method to make backports safer.
Alex Rousskov [Sat, 9 Sep 2023 03:01:52 +0000 (03:01 +0000)]
Log %err_code for ERR_RELAY_REMOTE transactions (#1472)
For ERR_RELAY_REMOTE transactions, Squid was logging %err_code as "-"
because BuildHttpReply() was not updating HttpRequest::error or ALE.
That update was missing because the pre-computed response in those
transactions triggered a premature exit from BuildHttpReply().
BuildHttpReply() should not be updating errors at all, but significant
code refactoring required to fix that problem needs a dedicated change.
Also enabled regression testing for the fixed bug and the bug fixed in
recent commit ea3f56e.
Alex Rousskov [Fri, 8 Sep 2023 06:05:36 +0000 (06:05 +0000)]
Restore errno in %err_detail for ERR_CONNECT_FAIL (#1368)
Squid was sometimes logging %err_code/%err_detail as
ERR_CONNECT_FAIL/WITH_SERVER. It now logs
ERR_CONNECT_FAIL/WITH_SERVER+errno=111 (or similar).
When dealing with two error details, Squid was ignoring the latter one.
The new ErrorDetails code combines multiple details, reusing the
existing Security::ErrorDetail::brief() "a+b+c" syntax.
The new detail accumulation functionality may help detail other errors.
At least some of the logged errno details were lost in commit ba3fe8d.
Do not use invasive lists to store CachePeers (#1424)
Using invasive lists for CachePeer objects gives no advantages, but
requires maintaining non-standard error-prone code that leads to
excessive locking and associated memory overheads.
Also fixed a neighbors_init() bug that resulted in accessing an already
deleted "looks like this host" CachePeer object.
Amos Jeffries [Thu, 31 Aug 2023 14:09:50 +0000 (14:09 +0000)]
Allow creation of RefCountable objects via Make() (#1458)
Compared to calling new() directly, using a Pointer-returning Make()
reduces memory leaks related to freshly allocated heap objects. Perfect
forwarding of constructor parameters through Make() minimizes overhead.
const auto foo1 = new Foo(...); // XXX: leak-prone
const auto foo2 = Foo::Pointer::Make(...); // OK
Future changes will prohibit RefCountable object creation via direct
new() calls and require using Make() or an equivalent safety API.
Alex Rousskov [Wed, 16 Aug 2023 17:53:41 +0000 (17:53 +0000)]
Cover OnTerminate() calls unrelated to exception handling (#1430)
The C++ standard lists many reasons[^1] for calling std::terminate().
All of them deal with an "exception handling failure". However, when
Squid bugs lead to an undefined behavior (e.g., by calling a pure
virtual function), some compilers also call std::terminate(). In those
cases, there may be no active exception, and our std::terminate()
handler's report about an "exception handling failure" (with "no active
exception") was confusing and misleading.
Also do not describe an exception when there is none. Callers that know
that the exception exists may continue to use CurrentException(),
especially if they do not need to report that exception on a dedicated
debugs() line. Other callers, including OnTerminate(), should use the
new CurrentExceptionExtra() manipulator.
Keep ::helper objects alive while in use by helper_servers (#1389)
Squid creates ::helper objects to manage a given configured helper. For
each ::helper object, Squid may create one or more helper_server objects
to manage communication with individual helper processes. A
helper_server object uses its so called "parent" ::helper object to
access configuration and for helper process death notification purposes.
There are no checks that the "parent" object is still alive when used.
The same problem applies to statefulhelper and helper_stateful_server.
helper_server code evidently attempted to extend their "parent" lifetime
using cbdata, but that does not work, especially in C++ world where the
object is destructed (and, hence, enters an undefined state) even though
its top-level memory is preserved by cbdata. Non-trivial members like
helper::queue (std::queue) and statefulhelper::reservations
(std::unordered_map) release their internal memory when destructed.
We now refcount ::helper objects to keep them alive until the last
object using them is gone. This does not result in reference loops
because the ::helper object uses raw dlink_node pointers to store its
helper_servers.
The following helpers (listed here by their container names) were
destructed while possibly still in use by their helper_server objects:
external_acl::theHelper, Ssl::CertValidationHelper::ssl_crt_validator,
Ssl::Helper::ssl_crtd. The following helpers are not destructed during
reconfiguration: redirectors, storeIds, basicauthenticators,
ntlmauthenticators, negotiateauthenticators, and digestauthenticators
(even though casual reading of relevant code may suggest otherwise).
This bug fix does not address or mark many remaining helper bugs.
We cannot test cbdata validity of CachePeer pointers stored in
Config.peers because Config.peers own CachePeer objects. By definition,
owners should not lock their objects and do not need to test object
validity: Owners determine the lifetime of those objects! Naturally,
unlocked objects must not be tested for validity by others because to
test an object validity one has to have a lock that preserves object
metadata even after the object is invalidated (by its owner).
Alex Rousskov [Sat, 12 Aug 2023 15:40:12 +0000 (15:40 +0000)]
Bug 5294: ERR_CANNOT_FORWARD returned instead of ERR_DNS_FAIL (#1453)
Since 2017 commit fd9c47d, peer selection code stopped reporting
ERR_DNS_FAIL cases because PeerSelector::noteIps() treated DNS answers
without IP addresses as if at least one IP address was received. Without
seeing a DNS resolution error, the ultimate recipient of the DNS
resolution results (e.g., CONNECT tunneling or regular forwarding code)
used ERR_CANNOT_FORWARD to indicate a failure to find a forwarding path.
PeerSelector::noteIps() code mimicked legacy IPH code with regard to
handling of the addresses parameter. However, IPH caller had a special
emptyIsNil adjustment that was missing from the noteIps() call! We now
apply that adjustment to both noteIps() and IPH code paths.
Long-term, we should probably remove nil address container pointers.
Having two different ways to signal lack of IPs is dangerous. Currently,
there is only one known supplier of nil address container:
IpcacheStats.invalid code that validates ipcache_nbgethostbyname() name
parameter. Either the corresponding nil/empty name check should be
converted into an assertion (blaming the ipcache_nbgethostbyname()
caller for not validating the name) OR that checking code should supply
an empty address container to finalCallback().
Shmaya [Fri, 11 Aug 2023 17:41:30 +0000 (17:41 +0000)]
Bug 4981: Work around in-call job invalidation bugs (#1428)
Bug 4981 is one known case of such invalidation, but this workaround is
much broader than that bug context. We can speculate that architectural
problems described in commit e3b6f15 are behind (some of) these bugs.
Alex Rousskov [Tue, 8 Aug 2023 13:36:09 +0000 (13:36 +0000)]
Replaced ACLStrategised, enabling other ACL improvements (#1392)
The ACLStrategised class stands in the way of several ACL bug fixes and
improvements because its design forces us to place ACL-like match()
methods inside non-ACL classes, creating two parallel matching
hierarchies: one rooted in the ACL class and one rooted in the Strategy
template. The two APIs have similar methods, but Strategy-based objects
lie outside the ACL hierarchy and cannot be treated as ACL objects.
ACLStrategised pairs an ACL-like matching algorithm (a Strategy-derived
class) with an ACLData-derived class. The need to combine the two is
genuine, but the same combination is best supported without creating a
parallel hierarchy of Strategy classes. The new ACL-derived
ParameterizedNode base class accomplishes that, addressing the old
ACLStrategised design XXX.
Strategy-derived classes were not pooled at all! With these changes, all
formerly ACLStrategised classes get individual memory pools, typically
one per acltype, with proper acltype-based naming in mgr:mem reports. No
other functionality changes intended.
Do not use raw pointers to index CARP CachePeers (#1381)
Simplified and improved code safety by using CbcPointers instead.
Also fixed mgr:carp Cache Manager reports to detail relevant
cache_peers instead of all cache_peers. When mgr:carp report was added
in 2000 commit 8ee9b49, Squid did not index (or even distinguish!)
CARP cache_peers, and the reporting loop naturally iterated through all
cache_peers. 2002 commit b399543 added identification and indexing of
CARP peers but forgot to adjust the reporting loop.
Very similar changes will be applied to userhash and sourcehash
cache_peers. 2008 commit 63104e2 simply copied problematic CARP code to
add userhash and sourcehash cache_peer support. This change adds a few
reusable types with those upcoming improvements in mind.
Alex Rousskov [Wed, 2 Aug 2023 05:00:49 +0000 (05:00 +0000)]
CI: More HTTP caching and revalidation tests (#1440)
Use Daft cache-response test to monitor for bugs in basic caching code,
including bugs like the one fixed by commit c203754. Unlike the old
proxy-collapsed-forwarding test that uses concurrent requests, this test
varies basic response properties.
Use Daft accumulate-headers-after-304 test to monitor for bugs like the
one fixed by commit 55e1c6e. Unlike old proxy-update-headers-after-304,
this test focuses on certain _problematic_ HTTP 304 header updates.
Due to poor code duplication, commit 92a5adb accidentally classified
URLs without a trailing slash in the magical prefix as valid cache
manager URLs, triggering the above ERRORs. We were denying such
"slashless" cache manager URLs (as invalid internal URLs) prior to that
commit. Since that commit, the ERRORs triggered by that commit
effectively denied them as well. Denying them properly results in
simpler/smaller code (than allowing them would), so we should avoid a UI
change and continue to deny them, at least for now.
This change also reduces duplication of magic prefix definitions. Other
pending work will completely eliminate that duplication in src/ code.
Alex Rousskov [Sun, 30 Jul 2023 02:44:04 +0000 (02:44 +0000)]
Remove dead "String debugging" code (#1436)
The corresponding src/String.cc code does not build since 2013 commit 5082718 (at least): That commit removed StringRegistry class declaration
the debugging code relied on. Since inception, the code could not be
enabled using ./configure options or make flags -- one had to modify
Squid sources to alter hard-coded `#define DEBUGSTRINGS 0` setting.
Alex Rousskov [Sat, 29 Jul 2023 17:44:59 +0000 (17:44 +0000)]
Bug 5290: pure virtual call in Ftp::Client constructor (#1429)
FATAL: Dying from an exception handling failure;
exception: [no active exception]
Converting `this` to CbcPointer in a constructor of an abstract class
like Ftp::Client does not work because our virtual toCbdata() method
remains pure until the final/child class constructor runs.
Conceptually, the bug was probably introduced in 2013 commit 434a79b,
when FTP class hierarchy grew, making Ftp::Client an abstract class, but
the trigger was recent commit 337b9aa that removed CBDATA_CLASS() from
Ftp::Client class declaration. We discovered, described, and addressed
several such bugs in that commit, but we missed this case.
Alex Rousskov [Sun, 23 Jul 2023 01:38:19 +0000 (01:38 +0000)]
Fix memory leak when reconfiguring multiline all-of ACLs (#1425)
Normally, Acl::InnerNode::add() automatically registers stored ACL nodes
for future cleanup, but when we find the second all-of rule/line (with
the same ACL name), we do not add() the newly created OrNode and have to
explicitly register it to avoid memory leaks on reconfiguration.
Leaking since all-of ACL support was added in 2013 commit 6f58d7d.
Alex Rousskov [Sat, 22 Jul 2023 22:30:39 +0000 (22:30 +0000)]
Allow unit tests to customize their initialization code (#1423)
CppUnit provides TestFixture::setUp() and tearDown() methods, but those
methods are executed before and after _each_ test case. Many single
TestFixture-derived classes register many test cases that need to run
some test program setup/initialization code just once. Counting past
setUp() calls would increase noise and likely introduce bugs.
Some of our test cases may actually need before/after cleanup -- the
functionality already provided by TestFixture::setUp()/tearDown()
methods. Moreover, when multiple TestFixture-derived classes share the
same test executable, their collections of test cases are executed in
essentially random order (see commit 27685ef). In those use cases,
insuring one-time initialization via setUp() hacks would be especially
awkward and error-prone.
With this solution, a test program with custom needs just needs to
define a TestProgram-derived class that implements its one-time startup
logic. The same TestProgram class hierarchy might also prove useful in
future custom test execution adjustments.
Also migrated away from the "include main()" unit test design to avoid
adding risky hacks that register custom TestProgram-derived classes and
to reduce "magic" in test code (while following our style guidelines).
Every test programs now declares its own (trivial) main() rather than
getting it from include/unitTestMain.h.
MinGW produces warnings about not supporting the printf '%zu' format
code when the 'printf' format archetype is used to validate code.
Checking against the 'gnu_printf' format archetype avoids this bad
warning. We accept the lower rate of error detection since other OS
builds verify against the 'printf' format archetype.
We also removed undocumented, inconsistent, and presumably unused
support for providing custom PRINTF_FORMAT_ARG macros during Squid
build. This removal simplifies code.
... to run validation for a library after SQUID_AUTO_LIB detection.
A typical use of this macro could be:
```
SQUID_AUTO_LIB(foo,[Foo],[LIBFOO])
SQUID_CHECK_LIB_WORKS(foo,[
PKG_CHECK_MODULES([LIBFOO],[foo >= 1.0],[],[
LIBFOO_LIBS=""
])
AC_CHECK_HEADERS([foo.h],[],[LIBFOO_LIBS=""])
])
```
Update of configure.ac to use this macro uncovered and
fixed a bug which may have broken libnettle detection on
some systems.
This code has quite a lot of bitrot compared to
the POSIX select(2) code in ModSelect.cc.
Modern Windows and MinGW apparently support
the POSIX API. So we should be able to just
use the normal ModSelect now. Any adjustments
should be added to ModSelect.cc when later
testing proves a need.
Miss if a 304 update would exceed reply_header_max_size (#1420)
Fetch the resource unconditionally when a 304 (Not Modified) response to
an internal cache revalidation request grows cached HTTP response
headers beyond the reply_header_max_size limit.
Alex Rousskov [Fri, 14 Jul 2023 17:46:30 +0000 (17:46 +0000)]
Do not use static initialization to register modules (#1422)
ERROR: ... Unknown authentication scheme 'ntlm'.
When a translation unit does not contain main() and its code is not used
by the rest of the executable, the linker may exclude that translation
unit from the executable. This exclusion by LTO may happen even if that
code _is_ used to initialize static variables in that translation unit:
"If no variable or function is odr-used from a given translation unit,
the non-local variables defined in that translation unit may never be
initialized"[^1].
For example, src/auth/ntlm/Scheme.o translation unit contains nothing
but NtlmAuthRr class definition and static initialization code. The
linker knows that the rest of Squid does not use NtlmAuthRr and excludes
that translation unit from the squid executable, effectively disabling
NTLM module registration required to parse "auth_param ntlm" directives.
The problem does affect existing NTLM module, and may affect any future
module code as we reduce module's external footprint. This change
converts all RegisteredRunner registrations from using side effects of
static initialization to explicit registration calls from SquidMain().
Relying on "life before main()" is a design bug. This PR fixes that bug
with respect to RegisteredRunner registrations.
Due to indeterminate C++ static initialization order, no module can
require registration before main() starts. Thus, moving registration
timing to the beginning of SquidMain() should have no negative effects.
The new registration API still does not expose main.cc to any module
details (other than the name of the registration function itself). We
still do not need to #include any module headers into main.cc. Compiler
or linker does catch most typos in RegisteredRunner names.
Unfortunately, explicit registration still cannot prevent bugs where we
forget to register a module or fail to register a module due to wrong
registration code guards. Eventually, CI will expose such bugs.
Handle helper program startup failure as its death (#1395)
Squid quits when started helper programs die too fast without responding
to requests[^1] because such a Squid instance is unlikely to provide
acceptable service (and a full restart may actually fix the problem or,
at the very least, is more likely to bring the needed admin attention).
The same logic now applies when Squid fails to start a helper (i.e. when
ipcCreate() fails). There is no conceptual difference between those two
kinds of failures as far as helper handling code is concerned, so we now
treat them the same way.
Without these changes, helper start failures may result in an unusable
(but running) Squid instance, especially if no helpers can be started at
all, because new transactions get stuck waiting in the queue until
clients timeout. Such persistent ipcCreate() failures may be caused, for
example, by its fork() hitting an overcommit memory limits.
[^1]: The actual condition excludes cases where at least startup=N
helpers are still running. That exclusion and other helper failure
handling details are problematic, but adjusting that code is outside
_this_ fix scope: Here, we only apply _existing_ handling logic to a
missed case.
Andrew Novikov [Sun, 9 Jul 2023 02:05:56 +0000 (02:05 +0000)]
Bug 5187: Work around REQMOD satisfaction regression (#1400)
Commit ba3fe8d broke ICAP REQMOD satisfaction transactions. In some
cases, this workaround may resurrect Squid Bug 5187. Triage available at
https://bugs.squid-cache.org/show_bug.cgi?id=5187#c6
Support read-only ClpMap iteration by ClpMap users (#1409)
This change makes it possible for anticipated future ClpMap users (e.g.,
stat_ipcache_get()) to report their cache contents. No changes to
ClpMap::Entry and related old types except making them public.
The alternative solution -- a visitor design pattern -- was rejected
because `for` loops are a bit easier to read than for_each() loops.
Alex Rousskov [Wed, 5 Jul 2023 14:41:47 +0000 (14:41 +0000)]
Do not leak memory when handling cache manager requests (#1408)
Also adjusted Cache-Control APIs to prevent similar bugs. These changes
also speed up processing a bit and simplify most of the affected code.
The now-gone "just remove the old CC" putCc() misfeature was unused.
The leak was introduced by commit 92a5adb: PutCommonResponseHeaders()
incorrectly assumed that putCc(pointerToX) takes ownership of X.
Detected by Coverity. CID 1534779: Resource leak (RESOURCE_LEAK).
Alex Rousskov [Wed, 5 Jul 2023 01:55:59 +0000 (01:55 +0000)]
CI: Remove unnecessary test-functionality test wrappers (#1393)
These workarounds are not needed for the current and future code in this
branch. Other branches get their own test-functionality.sh files that
can be used to maintain a branch-specific collection of test wrappers.
The (now unused) has_commit_by_message() function was left in the script
because that function is likely to be used by future workarounds. Unlike
specific test workarounds that only apply to a subset of old code, this
and similar functions can be viewed as a reusable code "library".
Removing this non-standard protocol (already mentioned as deprecated
in Squid sources) helps eliminate duplication and simplifies the
existing error-prone forwarding logic (causing CVEs).
Separate commits replace cache_object URLs sent by squidclient and
cachemgr.cgi tools with URLs using an http scheme.
Alex Rousskov [Wed, 28 Jun 2023 16:15:49 +0000 (16:15 +0000)]
Maintenance: Remove dead Multicast Miss Stream feature (#1320)
This feature was added in 1998 commit e66d792 for special "you are
absolutely certain you understand what you are doing" experiments and
considered "largely undocumented and unsupported" by its primary
author[^1]. The corresponding MULTICAST_MISS_STREAM code in
access_log.cc has not built since 2007 commit cc192b5 (at least):
* 'class Ip::Address' has no member named 's_addr'
* 'comm_open' was not declared in this scope
* 'comm_udp_sendto' was not declared in this scope
* 'mcastSetTtl' was not declared in this scope
* 'METHOD_GET' was not declared in this scope
* 'no_addr' was not declared in this scope
* no match for 'operator!=' (operand types are LogTags and LogTags_ot)
[^1]: Duane Wessels. 2004. Squid: The Definitive Guide. O'Reilly &
Associates, Inc., USA. See mcast_miss_addr documentation on page 390.
Alex Rousskov [Wed, 28 Jun 2023 09:28:59 +0000 (09:28 +0000)]
Reject more CONNECT requests with malformed targets (#1253)
Squid silently ignored many syntax violations, interpreting a malformed
target as a valid host:port address of some origin server and
establishing a CONNECT tunnel with that origin server. While being
"tolerant" probably did not compromise the Squid instance itself, some
known attacks abuse the _difference_ in treatment of malformed requests.
Rejecting malformed requests and closing the connection prevents many
such attacks.
Among other syntax violations, this change rejects bracketed pure IPv4
addresses and bracketed domain names in CONNECT targets. Bracketed IPv6
addresses that include an IPv4address suffix are still accepted (e.g.,
look for ABNF containing ls32 in RFC 3986, Section 3.2.2).
CONNECT target hosts are no longer covered by uri_whitespace: CONNECT
targets containing whitespace are now rejected with in ERR_INVALID_URL
regardless of uri_whitespace and check_hostnames settings. With the
exception of whitespaces in host names when uri_whitespace is set to
"strip" (default), the uri_whitespace directive was already ignored for
all request targets. For example, whitespaces in the port component were
always handled without checking uri_whitespace. In CONNECT context,
stripping whitespace is not only risky but probably is not needed in
practice because user typos should not lead to spaces in CONNECT
targets, and even if they do, virtually all legitimate use cases ought
to fail during certificate validation and similar post-CONNECT checks.
Alex Rousskov [Tue, 27 Jun 2023 22:47:36 +0000 (22:47 +0000)]
Fix store_client caller memory leak on certain errors (#1347)
When a storeUnregister() code path destroys store_client before the
latter has a chance to deliver the answer, the cbdataReferenceDone()
call in store_client::finishCallback() is not reached, keeping the
callback data (e.g., clientReplyContext) alive forever. These
storeClientCopy() "cancellations" may happen, for example, when the
client-to-Squid connection is closed while store_client waits for Store.
Use CallbackData to guarantee cbdataReferenceDone() when store_client is
destructed before it can finishCallback(). These synchronous callbacks
will be replaced with AsyncCalls. For now, we use the "discouraged"
CallbackData API to accommodate the existing legacy callbacks.
Alex Rousskov [Tue, 27 Jun 2023 11:58:16 +0000 (11:58 +0000)]
Do not cache (and do not serve cached) cache manager responses (#1185)
The fixed bug affected cache manager transactions that were using
/squid-internal-mgr URL path prefix with http(s) URL scheme. It did not
affect transactions that were using legacy cache_object URL scheme.
Stale cache manager responses had their Age response header set to the
number of seconds since Unix epoch. If disk and memory caches were
disabled, then cache manager requests just triggered "found KEY_PRIVATE"
WARNINGs in cache.log (for reasons that remain unclear).
Probably broken since 2011 commit e37bd29 that did not expand
HttpRequest::maybeCacheable() (called cacheable() back then)
PROTO_CACHE_OBJECT check to include /squid-internal-mgr requests.
Also added missing Access-Control-* response headers to cache manager
responses in SMP mode and reduced code duplication related to sending
those headers (which led to them missing in SMP Squids).
Alex Rousskov [Sat, 24 Jun 2023 08:18:55 +0000 (08:18 +0000)]
Remove serialized HTTP headers from storeClientCopy() (#1335)
Do not send serialized HTTP response header bytes in storeClientCopy()
answers. Ignore serialized header size when calling storeClientCopy().
This complex change adjusts storeClientCopy() API to addresses several
related problems with storeClientCopy() and its callers. The sections
below summarize storeClientCopy() changes and then move on to callers.
### storeClientCopy() changes
Squid incorrectly assumed that serialized HTTP response headers are read
from disk in a single storeRead() request. In reality, many situations
lead to store_client::readBody() receiving partial HTTP headers,
resulting in parseCharBuf() failure and a level-0 cache.log message:
Could not parse headers from on disk object
Inadequate handling of this failure resulted in a variety of problems.
Squid now accumulates storeRead() results to parse larger headers and
also handles parsing failures better, but we could not just stop there.
With the storeRead() accumulation in place, it is no longer possible to
send parsed serialized HTTP headers to storeClientCopy() callers because
those callers do not provide enough buffer space to fit larger headers.
Increasing caller buffer capacity does not work well because the actual
size of the serialized header is unknown in advance and may be quite
large. Always allocating large buffers "just in case" is bad for
performance. Finally, larger buffers may jeopardize hard-to-find code
that uses hard-coded 4KB buffers without using HTTP_REQBUF_SZ macro.
Fortunately, storeClientCopy() callers either do not care about
serialized HTTP response headers or should not care about them! The API
forced callers to deal with serialized headers, but callers could (and
some did) just use the parsed headers available in the corresponding
MemObject. With this API change, storeClientCopy() callers no longer
receive serialized headers and do not need to parse or skip them.
Consequently, callers also do not need to account for response headers
size when computing offsets for subsequent storeClientCopy() requests.
Restricting storeClientCopy() API to HTTP _body_ bytes removed a lot of
problematic caller code. Caller changes are summarized further below.
A similar HTTP response header parsing problem existed in shared memory
cache code. That code was actually aware that headers may span multiple
cache slices but incorrectly assumed that httpMsgParseStep() accumulates
input as needed (to make another parsing "step"). It does not. Large
response headers cached in shared memory triggered a level-1 message:
Corrupted mem-cached headers: e:...
Fixed MemStore code now accumulates serialized HTTP response headers as
needed to parse them, sharing high-level parsing code with store_client.
Old clientReplyContext methods worked hard to skip received serialized
HTTP headers. The code contained dangerous and often complex/unreadable
manipulation of various raw offsets and buffer pointers, aggravated by
the perceived need to save/restore those offsets across asynchronous
checks (see below). That header skipping code is gone now. Several stale
and misleading comments related to Store buffers management were also
removed or updated.
We replaced reqofs/reqsize with simpler/safer lastStreamBufferedBytes,
while becoming more consistent with that "cached" info invalidation. We
still need this info to resume HTTP body processing after asynchronous
http_reply_access checks and cache hit validations, but we no longer
save/restore this info for hit validation: No need to save/restore
information about the buffer that hit validation does not use and must
never touch!
The API change also moved from-Store StoreIOBuffer usage closer to
StoreIOBuffers manipulated by Clients Streams code. Buffers in both
categories now contain just the body bytes, and both now treat zero
length as EOF only _after_ processing the response headers.
These changes improve overall code quality, but this code path and these
changes still suffer from utterly unsafe legacy interfaces like
StoreIOBuffer and clientStreamNode. We cannot rely on the compiler to
check our work. The risk of these changes exposing/causing bugs is high.
### AS number WHOIS lookup
asHandleReply() expected WHOIS response body bytes where serialized HTTP
headers were! The code also had multiple problems typical for manually
written C parsers dealing with raw input buffers. Now replaced with a
Tokenizer-based code.
### Cache Digests
To skip received HTTP response headers, peerDigestHandleReply() helper
functions called headersEnd() on the received buffer. Twice. We have now
merged those two parsing helper functions into one (that just checks the
already parsed headers). This merger preserved "304s must come with
fetch->pd->cd" logic that was hidden/spread across those two functions.
### URN resolver
urnHandleReply() re-parsed received HTTP response headers. We left its
HTTP body parsing code unchanged except for polishing NUL-termination.
### NetDB exchange
netdbExchangeHandleReply() re-parsed received HTTP response headers to
find where they end (via headersEnd()). We improved handing of corner
cases and replaced some "tricky bits" code, reusing the new
Store::ParsingBuffer class. The net_db record parsing code is unchanged.
### SMP Cache Manager
Mgr::StoreToCommWriter::noteStoreCopied() is a very special case. It
actually worked OK because, unlike all other storeClientCopy() callers,
this code does not get serialized HTTP headers from Store: The code
adding bytes to the corresponding StoreEntry does not write serialized
HTTP headers at all. StoreToCommWriter is used to deliver kid-specific
pieces of an HTTP body of an SMP cache manager response. The HTTP
headers of that response are handled elsewhere. We left this code
unchanged, but the existence of the special no-headers case does
complicate storeClientCopy() API documentation, implementation, and
understanding.
Co-authored-by: Eduard Bagdasaryan <eduard.bagdasaryan@measurement-factory.com>
Alex Rousskov [Fri, 23 Jun 2023 23:05:27 +0000 (23:05 +0000)]
Documentation: Update stale SMP cache_dir caveats (#1394)
The requirement to specify "workers" before "cache_dir" was added in
2010 commit acf69d7. It became obsolete since 2011 commit 095ec2b.
The "dedicated cache directory" hack for UFS-based stores has always led
to HTTP violations, but the increased complexity of worker-to-worker
synchronization code (required to improve HTTP support) also increased
the probability of crashes or worse outcomes when SMP conditionals are
used. Those hacks violate the "all processes see the same configuration"
and similar basic code assumptions. We do not test (and usually do not
even consider the needs of) such unsupported configurations.
Alex Rousskov [Mon, 19 Jun 2023 01:48:38 +0000 (01:48 +0000)]
Honor DNS RR TTLs larger than negative_dns_ttl (#1380)
Since 2017 commit fd9c47d, Squid was effectively ignoring DNS RR TTLs
that exceeded negative_dns_ttl (i.e. 60 seconds by default) because the
"find the smallest TTL across the DNS records seen so far" code in
ipcache_entry::updateTtl() mistook the "default" ipcache_entry::expires
value as the one based on an earlier seen DNS record.
In most cases, this bug decreased IP cache hit ratio.
Existing fqdncache code does not suffer from the same bug because
fqdncacheParse() always resets fqdncache_entry::expires instead of
updating it incrementally. ipcacheParse() has to update incrementally
because it is called twice per entry, once with an A answer and once
with an AAAA answer.
Ideally, ipcache_entry::expires should be made optional to eliminate
awkward "first updateTtl() call" detection, but doing so well requires
significant code changes, so that entries without a known expiration
value are not cached forever _unless_ they were loaded from /etc/hosts.
And those changes should probably be propagated to fqdncache.cc.
Alex Rousskov [Sun, 18 Jun 2023 00:30:35 +0000 (00:30 +0000)]
CI: Remove pass-through test-functionality test wrappers (#1383)
Instead of requiring a custom test wrapper for each test and, hence,
creating an ever-increasing number of pass-through wrappers that do
nothing useful, use a custom test wrapper if and only if it exists. By
default (i.e. when there is no custom wrapper), just run the named test.
As a positive side effect, this change also simplifies running tests
that are not on the $default_tests list hard-coded in main():
Alex Rousskov [Sat, 17 Jun 2023 16:34:11 +0000 (16:34 +0000)]
Bug 5278: Log %err_code for "early" request handling errors (#1382)
Certain "early" request handling errors (e.g., ERR_TOO_BIG,
ERR_UNSUP_REQ, ERR_UNSUP_HTTPVERSION, ERR_PROTOCOL_UNKNOWN, and
ERR_INVALID_URL) are handled with nil ClientHttpRequest::request object.
ErrorState relied on that object to record error type and details, so
%err_code expanded to "-" in the corresponding access.log records.
Now, ErrorState also uses ALE (where available). ALE already handles the
combination of an early error and late HttpRequest (e.g., the fake one
clientReplyContext::createStoreEntry() creates when handling the error).
This minimal fix does not address known problems with ErrorState::detail
and ErrorState::BuildHttpReply(). It also duplicates related xerrno
detailing code. We will address some of these problems shortly,
including the increased code duplication problem.
We were using the CachePeer class to record the address of each non-peer
sending us certain unwanted ICP responses, along with the number of such
responses and a histogram of associated ICP opcodes. Since 1997 commit e102ebd, these historical records were accumulated without a limit and
linearly searched, endangering a Squid instance that used ICP. Using
CachePeer to store this non-cache_peer information also complicated
cache_peer code.
This change removes these records and the corresponding mgr:non_peers
report, leaving just the cache.log warning about unexpected messages.
The warning is useful because these ICP messages from non-peers indicate
a cache hierarchy misconfiguration or a fairly sophisticated attack.
This is the simplest fix that minimizes Squid and developer resources
spent on handling these errors.
Alex Rousskov [Wed, 14 Jun 2023 14:09:38 +0000 (14:09 +0000)]
Fix key equality comparison in LookupTable map (#1379)
An std::unordered_map with case-insensitive keys must use a
case-insensitive key equality comparison. lookupTable_t used (the
default) std::equal_to<SBuf> comparison which is case-sensitive.
The full extent of this bug effects is unknown, but Squid was
mishandling Cache-Control and Surrogate-Control directives with
non-canonical (i.e. other than all-lower-case) spelling. Similar
problems affected WWW-Authenticate Digest parameter names, but the
related code remains inconsistent, with both case-sensitive and
case-insensitive checks applied to some of the key names in
Auth::Digest::Config::decode().
Also removed a similarly buggy (and, technically, unused) "typical use"
example from the CaseInsensitiveSBufHash class description. C++
documentation and Squid code are better sources of usage examples when
it comes to STL-used concepts like hash function objects.
This minimal fix excludes LookupTable class polishing.
Stop leaking PeerDigests on reconfiguration (#1371)
peerDigestCreate() cbdata-locked the digest twice. The second lock was a
"self lock" -- a lock without storing the locked pointer somewhere. That
self lock was introduced (with an XXX) in 2002 commit fa80a8e. It did
not have the corresponding unlock, and Squid leaked the digest object.
That leak became conditional in 2018 commit b56b37c: Since then, Squid
was missing one digest unlock only when the CachePeer object was gone
before a peerDigestDestroy() call, as happens during reconfiguration.
Also removed a pair of excessive cbdata locks/unlocks (that triggered
this leak): Long-term users should lock when they start storing a
pointer to the cbdata-protected object and unlock when they stop.
Also converted PeerDigest::peer to CbcPointer, addressing a TODO.
Also fixed subtle problems with detecting gone CachePeer and gone
PeerDigest objects in peerDigestFetchedEnough().
Also removed stale peerNoteDigestGone() declaration.
Alex Rousskov [Sun, 4 Jun 2023 19:41:19 +0000 (19:41 +0000)]
Do not report DNS answers without A/AAAA records by default (#1369)
Default (i.e. level-0/1) cache.log error reports should focus on
problems that may affect Squid components or transaction security rather
than on individual transaction failures due to external agents (that
most Squid admins cannot prevent or fix). The latter should be detailed
in access.log, where individual transactions are reported.
We were already using this principle for ipcache_entry::latestError()
calls except for one call devoted to DNS responses that have no A/AAAA
records in the answer section _but_ have other records in that section.
Removing that exceptional treatment simplifies Squid code and addresses
admin complaints about cache.log pollution with minor error messages
that they often cannot prevent or stop.
Destroy an idle PeerDigest after its CachePeer disappears (#1372)
Delaying this destruction until the next peerDigestCheck() event results
in accumulation of PeerDigests (and related objects) during frequent
reconfigurations.
Alex Rousskov [Thu, 25 May 2023 02:10:28 +0000 (02:10 +0000)]
Fix userinfo percent-encoding (#1367)
%X expects an unsigned int, and that is what we were giving it. However,
to get to the correct unsigned int value from a (signed) char, one has
to cast to an unsigned char (or equivalent) first.
* web server has not been the direct release destination for
many years. Remove mentions of $dst/changesets/
* web server references github repository for basic
information. Do not push CONTRIBUTORS, COPYING, README,
CREDITS and SPONSORS files to staging area.
* release notes no longer link to local ChangeLog file.
No need to adjust the hyperlink.
When Squid failed to open a TCP connection to an origin server (or
through a cache peer) while handling a CONNECT request, Squid logged %Ss
as NONE_NONE because TunnelStateData waited for a successful TCP
connection to update the log tag. That unnecessary wait (modeled after
the necessary HTTP status code wait) complicated code. Squid now logs
TCP_TUNNEL to indicate the request handling path chosen by Squid.
We already use the same "handling path" approach for most other request
status codes. For example, TCP_MISS does not mean the miss transaction
was successful. It only means that Squid satisfied the request using the
cache miss forwarding logic. Even the LOG_TCP_REFRESH_FAIL_ERR status
code does not imply that the error response was successfully forwarded
to the client; only that the request was satisfied on a particular
(albeit very detailed in this case!) handling path.
Apply the same logic to tunneling attempts blocked by miss_access. They
were also logged as NONE_NONE and are now logged as TCP_TUNNEL.
The actual outcome of a tunneling attempt can usually be determined
using %err_code/%err_detail and %>Hs logformat codes.
Alex Rousskov [Thu, 18 May 2023 16:28:35 +0000 (16:28 +0000)]
Do not apply custom debugs() format to Debug::Extra lines (#1345)
Some debugs() callers set and leave custom std::ostream formats like
std::hex. Such behavior should be considered natural/correct because,
from those callers point of view, they see and control the entire
debugging message. Stream formats obviously must not cross debugs()
boundaries, and, from those callers point of view, those boundaries are
the message they explicitly call with. Here is a hypothetical example:
When we automatically add extra information to debugging messages (e.g.,
to detail which transaction has failed, what listening port is
misconfigured, or where the exception was thrown from), the "extra"
lines are added in a completely different code context. That detailing
code also sees a seemingly isolated printing context and should not be
expected to remember to reset debugging stream flags "just in case":
os << Debug::Extra << "listening port: " << s.port();
The combination of natural code expectations, may result in very
misleading messages because, for example, hexadecimal numbers may look
just like decimal ones. We have seen that happening to transaction IDs
and here is an example based on the above debugs() that complains about
an invalid QoS flag on http_port 80 (but logs port 50):
This change would break any _deliberate_ format "bleeds" from primary
debugs() messages to Debug::Extra lines, but no such bleeding cases are
known, and they are too confusing/fragile to be encouraged/supported.
Alex Rousskov [Wed, 17 May 2023 19:33:51 +0000 (19:33 +0000)]
Do not erase aborted StoreMap entries that are still being read (#1344)
The old `!s.lock.readers` precondition for calling freeChain() in
StoreMap::abortWriting() was not broad enough: It covered "old" readers
that have incremented ReadWriteLock::readers already while missing those
lockShared() callers that have fully qualified for becoming a reader but
have not incremented the `readers` counter yet. A correct test is now
implemented by ReadWriteLock::stopAppendingAndRestoreExclusive().
This code was broken since appending mode was introduced in 2013 commit ce49546. Evidently, the mishandled race condition is short-lived enough
and abortWriting() calls are rare enough, that there are no (known)
relevant bug reports. This bug was accidentally discovered while reading
the code to explain another SMP bug.
Controller::markedForDeletionAndAbandoned() similarly misjudges
readers(), but that bug does not lead to serious problems, and we would
like to find a fix that identifies such entries without false positives
(that looking at ReadWriteLock::readLevel instead of readers creates).
Also polished Ipc::StoreMap::abortWriting() debugging a little.
Since commit 24c9378, Transients are not supposed to maintain "basic"
cache entry information (i.e. StoreMapAnchor::Basics a.k.a. StoreEntry
STORE_META_STD fields). Using often-at-their-defaults Transient basics
may unexpectedly erase valuable StoreEntry information obtained from one
of the caches (e.g., swap_file_sz) and lead to assertions.
Consequently, do not populate unused Transients entry basics either.
Eventually, we may find a good way to parameterize StoreMapAnchor to
avoid keeping those unused fields in Transients StoreMap.
Alex Rousskov [Wed, 10 May 2023 20:45:27 +0000 (20:45 +0000)]
Do not leak Security::CertErrors created in X509_verify_cert() (#1346)
ACLFilledChecklist was using a raw C pointer for handling cbdata-managed
Security::CertErrors. Some sslErrors users knew about hidden cbdata
requirements, some did not, resulting in inconsistent locking/unlocking
and associated memory leaks. Upgrading ACLFilledChecklist::sslErrors to
smart CbcPointer fixes those leaks (and simplifies code).
Alex Rousskov [Wed, 3 May 2023 22:20:27 +0000 (22:20 +0000)]
Do not check store_status when checking ENTRY_BAD_LENGTH (#1342)
Before 1997 commit b34ed72 (that introduced the current ENTRY_BAD_LENGTH
"cache"), checking store_status before calling storeEntryValidLength()
kind of made sense because storeEntryValidLength() did not support
STORE_PENDING entries with their unset object_len. Since that commit,
checking store_status does not cause any visible problems (i.e. the code
"works") because ENTRY_BAD_LENGTH implies STORE_OK. However, that
implication is not backed by some fundamental invariant and might
suddenly disappear. Checking two flags instead of one is also wasteful.
Fix default build with libgnutls but absent GnuTLS headers (#1332)
Ensure that when libgnutls is found but any of the required GnuTLS
headers cannot be used, GnuTLS support is not enabled by default and an
explicit request for GnuTLS support is fatally rejected during
./configure. Otherwise, the build fails later anyway, during "make".