Alex Bason [Sun, 15 Oct 2023 13:04:47 +0000 (13:04 +0000)]
Fix stack buffer overflow when parsing Digest Authorization (#1517)
The bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/digest-overflow.html
where it was filed as "Stack Buffer Overflow in Digest Authentication".
Recognize internal requests created by adaptation/redirection (#1504)
Before this fix, Squid set flags.internal for virgin requests but not
for adapted/redirected requests, leaving post-adaptation request
processing code in an inconsistent state, forwarding the
internalCheck()-compliant adapted/redirected requests as regular
requests, triggering forwarding loops and complicating code refactoring.
Martin Grimm [Mon, 9 Oct 2023 17:10:43 +0000 (17:10 +0000)]
Rewrite SplayNode to eliminate recursive calls (#1431)
Recursive method calls in SplayNode can lead to a stack overflow with
large (degenerate) trees, e.g. after creating a large dst acl from a
sorted ip list.
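
As a rough illustration of why removing recursion helps, the sketch below walks a degenerate (chain-shaped) binary tree with an explicit heap-allocated stack instead of the call stack. Node, makeDegenerateTree(), and visitInOrder() are hypothetical names invented for this example; this is not the SplayNode code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified tree node (not Squid's SplayNode).
struct Node {
    int key = 0;
    Node *left = nullptr;
    Node *right = nullptr;
};

// Links the given nodes into a right-leaning chain, the shape an
// unbalanced tree takes after insertions in sorted order.
Node *makeDegenerateTree(std::vector<Node> &storage)
{
    for (std::size_t i = 0; i + 1 < storage.size(); ++i)
        storage[i].right = &storage[i + 1];
    return storage.empty() ? nullptr : &storage[0];
}

// In-order traversal without recursion: pending ancestors are kept in a
// heap-allocated vector, so tree depth does not consume call stack space.
template <class Visitor>
void visitInOrder(const Node *root, Visitor visit)
{
    std::vector<const Node *> pending;
    const Node *current = root;
    while (current || !pending.empty()) {
        while (current) {
            pending.push_back(current);
            current = current->left;
        }
        current = pending.back();
        pending.pop_back();
        visit(*current);
        current = current->right;
    }
}
```

A recursive traversal of the same chain would need one call frame per node, which is what makes very deep trees risky.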
A "make check" testHeaders target is supposed to check that C/C++ header
files can be compiled "autonomously" (i.e. without any other code except
the required squid.h). Since 2010 commit a0fdc9b, existing testHeaders
were not detecting problems in any headers due to a use of a shell
heredoc with echo, a command that does not read from standard input.
This fix ensures that all C/C++ header files used by "make" are tested
in the corresponding "make check".
This change also improves parallel testing of individual header files.
Avoid truncation errors when printing time_t-based squid.conf directives
on platforms with 32-bit int and 64-bit time_t. Also avoid similar
errors when printing time_msec-based directives on platforms with 32-bit
int.
Detected by Coverity. CID 1529622: Use of 32-bit time_t (Y2K38_SAFETY).
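
A minimal sketch of this kind of truncation, assuming a 64-bit seconds value that passes through a 32-bit integer before being printed. describeTimeout() and describeTimeoutNarrowed() are hypothetical helpers, not Squid functions; fixed-width types stand in for time_t and int so that the example behaves the same on every platform.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Prints the full 64-bit value; no narrowing takes place.
std::string describeTimeout(const std::int64_t seconds)
{
    return std::to_string(seconds) + " seconds";
}

// Reproduces the problematic pattern: the value is converted to a 32-bit
// integer before printing, so large values lose their upper bits.
std::string describeTimeoutNarrowed(const std::int64_t seconds)
{
    const auto narrowed = static_cast<std::int32_t>(seconds);
    return std::to_string(narrowed) + " seconds";
}
```

Small values print identically through both helpers; values that do not fit in 32 bits (e.g., 4102444800) only survive the first one.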
Do not use raw pointers to index userhash CachePeers (#1496)
Simplified and improved code safety by using CbcPointers for userhash
cache_peers, as we have done for CARP peers in recent commit e7959b5.
Also fixed mgr:userhash Cache Manager reports to detail relevant
cache_peers instead of all cache_peers. This problem existed since
inception (2008 commit f7e1d9c) as detailed in recent commit e7959b5.
Y2038: Use time_t for commSetConnTimeout() timeout parameter (#1492)
Change commSetConnTimeout() "timeout" parameter from int to time_t, to
match the common caller type and improve Year 2038-safety on systems
with 32-bit int.
Detected by Coverity. CID 1545129: Use of 32-bit time_t (Y2K38_SAFETY).
Kill helpers that speak without being spoken to (#1488)
ERROR: helperHandleRead: unexpected read from ...
ERROR: helperStatefulHandleRead: unexpected read from ...
Squid ignored bytes received from both stateful and stateless helper
processes that had no outstanding helper requests at the time of
read(2). In stateful helpers, the implementation also resulted in
undefined behavior: Calling std::list::front() with an empty list.
Ignoring these "early" bytes also complicates code improvements.
Detecting early bytes cannot be done reliably because Squid cannot know
whether some early bytes were sent just before Squid created a helper
request and, hence, could be mistaken for a helper response to that
request. Incorrectly mapping helper responses could lead to serious
problems. When Squid is lucky enough to detect a buggy helper that sends early
bytes, the safest and simplest action is to kill the helper process.
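
The decision described above can be sketched as follows. ReadOutcome and classifyHelperRead() are hypothetical names, and a list of strings stands in for the real queue of outstanding helper requests; the only point is that the emptiness check must come before any std::list::front() call.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical outcome of a read(2) from a helper process.
enum class ReadOutcome { deliverToRequest, killHelper };

// Decides what to do with bytes received from a helper, given the
// requests that are still waiting for a response from that helper.
ReadOutcome classifyHelperRead(const std::list<std::string> &pendingRequests)
{
    // Calling front() on an empty list is undefined behavior, so test first.
    if (pendingRequests.empty())
        return ReadOutcome::killHelper; // the helper spoke without being spoken to

    const std::string &oldestRequest = pendingRequests.front();
    (void)oldestRequest; // a real implementation would match the response to it
    return ReadOutcome::deliverToRequest;
}
```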
At high levels of build parallelism on systems using GNU coreutils,
"make check" sometimes hangs on a request to confirm removal of a
read-only temporary file.
Force-remove temporary test files to ensure removal is noninteractive.
Do not use raw pointers to index sourcehash CachePeers (#1474)
Simplified and improved code safety by using CbcPointers for sourcehash
cache_peers, as we have done for CARP peers in recent commit e7959b5.
Also fixed mgr:sourcehash Cache Manager reports to detail relevant
cache_peers instead of all cache_peers. This problem existed since
inception (2008 commit f7e1d9c) as detailed in recent commit e7959b5.
Also applied the new "no new globals" policy to CARP peering code, to
keep improved CARP and sourcehash peering code in sync.
Bug 5301: cachemgr.cgi not showing new manager interface URLs (#1479)
Also fix several related UI issues uncovered during testing:
* Prune the list of servers accessible via the CGI tool login.
Their responses would be badly mangled if accessed via
the old tool's parse logic.
Also, hide the old login form if all servers use the new
manager interface.
* Ensure the 'menu' report is always used by default after
the CGI tool login. This prevents errors about MGR_INDEX
not being available on recent Squid releases, restoring the
expected CGI tool behavior.
Extend cache_log_message to problematic from-helper annotations (#1481)
WARNING: Unsupported or unexpected from-helper annotation
with a name reserved for Squid use
The above message is emitted for every helper response containing
problematic annotations. Let admins control this reporting using
cache_log_message directive and message id 69.
This bug is specific to "half_closed_clients on" configurations.
assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
location: comm.cc:1583 in commStartHalfClosedMonitor()
Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.
Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because
* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"
We worked around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests(). That
fixed the bug but further increased poor code duplication across
ConnStateData::afterClientRead() and ConnStateData::kick() methods. We
then refactored by merging and moving that duplicated code into
clientParseRequests() and renamed that method to make backports safer.
Alex Rousskov [Sat, 9 Sep 2023 03:01:52 +0000 (03:01 +0000)]
Log %err_code for ERR_RELAY_REMOTE transactions (#1472)
For ERR_RELAY_REMOTE transactions, Squid was logging %err_code as "-"
because BuildHttpReply() was not updating HttpRequest::error or ALE.
That update was missing because the pre-computed response in those
transactions triggered a premature exit from BuildHttpReply().
BuildHttpReply() should not be updating errors at all, but significant
code refactoring required to fix that problem needs a dedicated change.
Also enabled regression testing for the fixed bug and the bug fixed in
recent commit ea3f56e.
Alex Rousskov [Fri, 8 Sep 2023 06:05:36 +0000 (06:05 +0000)]
Restore errno in %err_detail for ERR_CONNECT_FAIL (#1368)
Squid was sometimes logging %err_code/%err_detail as
ERR_CONNECT_FAIL/WITH_SERVER. It now logs
ERR_CONNECT_FAIL/WITH_SERVER+errno=111 (or similar).
When dealing with two error details, Squid was ignoring the latter one.
The new ErrorDetails code combines multiple details, reusing the
existing Security::ErrorDetail::brief() "a+b+c" syntax.
The new detail accumulation functionality may help detail other errors.
At least some of the logged errno details were lost in commit ba3fe8d.
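
The "a+b+c" accumulation can be pictured with the following sketch. joinBriefDetails() is a hypothetical helper written for this note; it is not the ErrorDetails API and only assumes that each detail already has a brief textual form.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Combines brief error details into a single "a+b+c" string,
// skipping empty entries so that no stray '+' characters appear.
std::string joinBriefDetails(const std::vector<std::string> &details)
{
    std::string combined;
    for (const auto &detail : details) {
        if (detail.empty())
            continue;
        if (!combined.empty())
            combined += '+';
        combined += detail;
    }
    return combined;
}
```

With the two details from the example above, the sketch yields WITH_SERVER+errno=111 instead of dropping the second detail.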
Do not use invasive lists to store CachePeers (#1424)
Using invasive lists for CachePeer objects gives no advantages, but
requires maintaining non-standard error-prone code that leads to
excessive locking and associated memory overheads.
Also fixed a neighbors_init() bug that resulted in accessing an already
deleted "looks like this host" CachePeer object.
Amos Jeffries [Thu, 31 Aug 2023 14:09:50 +0000 (14:09 +0000)]
Allow creation of RefCountable objects via Make() (#1458)
Compared to calling new() directly, using a Pointer-returning Make()
reduces memory leaks related to freshly allocated heap objects. Perfect
forwarding of constructor parameters through Make() minimizes overhead.
const auto foo1 = new Foo(...); // XXX: leak-prone
const auto foo2 = Foo::Pointer::Make(...); // OK
Future changes will prohibit RefCountable object creation via direct
new() calls and require using Make() or an equivalent safety API.
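
For readers unfamiliar with the idiom, here is a self-contained sketch of a Pointer-returning Make() with perfect forwarding. Counted, Pointer, and Foo below are simplified stand-ins invented for this example; they only assume an intrusive reference count and are not Squid's RefCountable or RefCount classes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for an intrusive reference-counting base class.
class Counted
{
public:
    void lock() const { ++count_; }
    unsigned unlock() const { return --count_; }

private:
    mutable unsigned count_ = 0;
};

// Simplified stand-in for a smart pointer to Counted-derived objects.
template <class C>
class Pointer
{
public:
    Pointer() = default;
    explicit Pointer(C *raw) : raw_(raw) { if (raw_) raw_->lock(); }
    Pointer(const Pointer &other) : raw_(other.raw_) { if (raw_) raw_->lock(); }
    Pointer &operator=(Pointer other) { std::swap(raw_, other.raw_); return *this; }
    ~Pointer() { if (raw_ && raw_->unlock() == 0) delete raw_; }

    // Forwards constructor arguments unchanged; the new object is owned
    // by a Pointer from the moment it exists, so it cannot be leaked.
    template <class... Args>
    static Pointer Make(Args &&... args)
    {
        return Pointer(new C(std::forward<Args>(args)...));
    }

    C *operator->() const { return raw_; }
    explicit operator bool() const { return raw_ != nullptr; }

private:
    C *raw_ = nullptr;
};

// Hypothetical user class that tracks how many instances are alive.
struct Foo : public Counted {
    Foo(std::string aName, const int aSize) : name(std::move(aName)), size(aSize) { ++Alive; }
    ~Foo() { --Alive; }
    std::string name;
    int size;
    static int Alive;
};
int Foo::Alive = 0;
```

In this sketch, Pointer<Foo>::Make("example", 3) constructs the Foo and destroys it automatically when the last Pointer goes away.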
Alex Rousskov [Wed, 16 Aug 2023 17:53:41 +0000 (17:53 +0000)]
Cover OnTerminate() calls unrelated to exception handling (#1430)
The C++ standard lists many reasons[^1] for calling std::terminate().
All of them deal with an "exception handling failure". However, when
Squid bugs lead to undefined behavior (e.g., by calling a pure
virtual function), some compilers also call std::terminate(). In those
cases, there may be no active exception, and our std::terminate()
handler's report about an "exception handling failure" (with "no active
exception") was confusing and misleading.
Also do not describe an exception when there is none. Callers that know
that the exception exists may continue to use CurrentException(),
especially if they do not need to report that exception on a dedicated
debugs() line. Other callers, including OnTerminate(), should use the
new CurrentExceptionExtra() manipulator.
Keep ::helper objects alive while in use by helper_servers (#1389)
Squid creates ::helper objects to manage a given configured helper. For
each ::helper object, Squid may create one or more helper_server objects
to manage communication with individual helper processes. A
helper_server object uses its so-called "parent" ::helper object to
access configuration and for helper process death notification purposes.
There are no checks that the "parent" object is still alive when used.
The same problem applies to statefulhelper and helper_stateful_server.
helper_server code evidently attempted to extend their "parent" lifetime
using cbdata, but that does not work, especially in C++ world where the
object is destructed (and, hence, enters an undefined state) even though
its top-level memory is preserved by cbdata. Non-trivial members like
helper::queue (std::queue) and statefulhelper::reservations
(std::unordered_map) release their internal memory when destructed.
We now refcount ::helper objects to keep them alive until the last
object using them is gone. This does not result in reference loops
because the ::helper object uses raw dlink_node pointers to store its
helper_servers.
The following helpers (listed here by their container names) were
destructed while possibly still in use by their helper_server objects:
external_acl::theHelper, Ssl::CertValidationHelper::ssl_crt_validator,
Ssl::Helper::ssl_crtd. The following helpers are not destructed during
reconfiguration: redirectors, storeIds, basicauthenticators,
ntlmauthenticators, negotiateauthenticators, and digestauthenticators
(even though casual reading of relevant code may suggest otherwise).
This bug fix does not address or mark many remaining helper bugs.
We cannot test cbdata validity of CachePeer pointers stored in
Config.peers because Config.peers own CachePeer objects. By definition,
owners should not lock their objects and do not need to test object
validity: Owners determine the lifetime of those objects! Naturally,
unlocked objects must not be tested for validity by others because to
test an object's validity one has to have a lock that preserves object
metadata even after the object is invalidated (by its owner).
Alex Rousskov [Sat, 12 Aug 2023 15:40:12 +0000 (15:40 +0000)]
Bug 5294: ERR_CANNOT_FORWARD returned instead of ERR_DNS_FAIL (#1453)
Since 2017 commit fd9c47d, peer selection code stopped reporting
ERR_DNS_FAIL cases because PeerSelector::noteIps() treated DNS answers
without IP addresses as if at least one IP address was received. Without
seeing a DNS resolution error, the ultimate recipient of the DNS
resolution results (e.g., CONNECT tunneling or regular forwarding code)
used ERR_CANNOT_FORWARD to indicate a failure to find a forwarding path.
PeerSelector::noteIps() code mimicked legacy IPH code with regard to
handling of the addresses parameter. However, IPH caller had a special
emptyIsNil adjustment that was missing from the noteIps() call! We now
apply that adjustment to both noteIps() and IPH code paths.
Long-term, we should probably remove nil address container pointers.
Having two different ways to signal lack of IPs is dangerous. Currently,
there is only one known supplier of nil address container:
IpcacheStats.invalid code that validates ipcache_nbgethostbyname() name
parameter. Either the corresponding nil/empty name check should be
converted into an assertion (blaming the ipcache_nbgethostbyname()
caller for not validating the name) OR that checking code should supply
an empty address container to finalCallback().
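
The hazard of having two ways to signal "no IPs" can be sketched like this. LookupResult and classifyLookup() are hypothetical names, and a vector of strings stands in for the real address container; the sketch merely treats a nil container and an empty container the same way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification of a finished DNS lookup.
enum class LookupResult { haveAddresses, dnsFailure };

// Treats a nil container pointer and an empty container identically,
// so neither form can be mistaken for a successful lookup.
LookupResult classifyLookup(const std::vector<std::string> *addresses)
{
    if (!addresses || addresses->empty())
        return LookupResult::dnsFailure;
    return LookupResult::haveAddresses;
}
```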
Shmaya [Fri, 11 Aug 2023 17:41:30 +0000 (17:41 +0000)]
Bug 4981: Work around in-call job invalidation bugs (#1428)
Bug 4981 is one known case of such invalidation, but this workaround is
much broader than that bug context. We can speculate that architectural
problems described in commit e3b6f15 are behind (some of) these bugs.
Alex Rousskov [Tue, 8 Aug 2023 13:36:09 +0000 (13:36 +0000)]
Replaced ACLStrategised, enabling other ACL improvements (#1392)
The ACLStrategised class stands in the way of several ACL bug fixes and
improvements because its design forces us to place ACL-like match()
methods inside non-ACL classes, creating two parallel matching
hierarchies: one rooted in the ACL class and one rooted in the Strategy
template. The two APIs have similar methods, but Strategy-based objects
lie outside the ACL hierarchy and cannot be treated as ACL objects.
ACLStrategised pairs an ACL-like matching algorithm (a Strategy-derived
class) with an ACLData-derived class. The need to combine the two is
genuine, but the same combination is best supported without creating a
parallel hierarchy of Strategy classes. The new ACL-derived
ParameterizedNode base class accomplishes that, addressing the old
ACLStrategised design XXX.
Strategy-derived classes were not pooled at all! With these changes, all
formerly ACLStrategised classes get individual memory pools, typically
one per acltype, with proper acltype-based naming in mgr:mem reports. No
other functionality changes intended.
Do not use raw pointers to index CARP CachePeers (#1381)
Simplified and improved code safety by using CbcPointers instead.
Also fixed mgr:carp Cache Manager reports to detail relevant
cache_peers instead of all cache_peers. When mgr:carp report was added
in 2000 commit 8ee9b49, Squid did not index (or even distinguish!)
CARP cache_peers, and the reporting loop naturally iterated through all
cache_peers. 2002 commit b399543 added identification and indexing of
CARP peers but forgot to adjust the reporting loop.
Very similar changes will be applied to userhash and sourcehash
cache_peers. 2008 commit 63104e2 simply copied problematic CARP code to
add userhash and sourcehash cache_peer support. This change adds a few
reusable types with those upcoming improvements in mind.
Alex Rousskov [Wed, 2 Aug 2023 05:00:49 +0000 (05:00 +0000)]
CI: More HTTP caching and revalidation tests (#1440)
Use Daft cache-response test to monitor for bugs in basic caching code,
including bugs like the one fixed by commit c203754. Unlike the old
proxy-collapsed-forwarding test that uses concurrent requests, this test
varies basic response properties.
Use Daft accumulate-headers-after-304 test to monitor for bugs like the
one fixed by commit 55e1c6e. Unlike old proxy-update-headers-after-304,
this test focuses on certain _problematic_ HTTP 304 header updates.
Due to poor code duplication, commit 92a5adb accidentally classified
URLs without a trailing slash in the magical prefix as valid cache
manager URLs, triggering the above ERRORs. We were denying such
"slashless" cache manager URLs (as invalid internal URLs) prior to that
commit. Since that commit, the ERRORs triggered by that commit
effectively denied them as well. Denying them properly results in
simpler/smaller code (than allowing them would), so we should avoid a UI
change and continue to deny them, at least for now.
This change also reduces duplication of magic prefix definitions. Other
pending work will completely eliminate that duplication in src/ code.
Alex Rousskov [Sun, 30 Jul 2023 02:44:04 +0000 (02:44 +0000)]
Remove dead "String debugging" code (#1436)
The corresponding src/String.cc code has not built since 2013 commit
5082718 (at least): That commit removed the StringRegistry class
declaration the debugging code relied on. Since inception, the code could
not be enabled using ./configure options or make flags -- one had to
modify Squid sources to alter the hard-coded `#define DEBUGSTRINGS 0`
setting.
Alex Rousskov [Sat, 29 Jul 2023 17:44:59 +0000 (17:44 +0000)]
Bug 5290: pure virtual call in Ftp::Client constructor (#1429)
FATAL: Dying from an exception handling failure;
exception: [no active exception]
Converting `this` to CbcPointer in a constructor of an abstract class
like Ftp::Client does not work because our virtual toCbdata() method
remains pure until the final/child class constructor runs.
Conceptually, the bug was probably introduced in 2013 commit 434a79b,
when FTP class hierarchy grew, making Ftp::Client an abstract class, but
the trigger was recent commit 337b9aa that removed CBDATA_CLASS() from
Ftp::Client class declaration. We discovered, described, and addressed
several such bugs in that commit, but we missed this case.
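
One common way to sidestep this class of bug (not necessarily the approach taken in this commit) is to postpone the virtual call until the object is fully constructed. Client, Gateway, start(), and describe() below are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract base class. Calling describe() from this
// constructor would be a pure virtual call, so the constructor avoids it.
class Client
{
public:
    virtual ~Client() = default;

    // Safe: called by the owner after the most derived constructor ran.
    void start() { id_ = describe(); }

    const std::string &id() const { return id_; }

protected:
    virtual std::string describe() const = 0;

private:
    std::string id_;
};

// Hypothetical final class providing the missing virtual method.
class Gateway : public Client
{
protected:
    std::string describe() const override { return "gateway"; }
};
```

The caller creates a Gateway and then calls start(); by that time, virtual dispatch reaches the final class.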
Alex Rousskov [Sun, 23 Jul 2023 01:38:19 +0000 (01:38 +0000)]
Fix memory leak when reconfiguring multiline all-of ACLs (#1425)
Normally, Acl::InnerNode::add() automatically registers stored ACL nodes
for future cleanup, but when we find the second all-of rule/line (with
the same ACL name), we do not add() the newly created OrNode and have to
explicitly register it to avoid memory leaks on reconfiguration.
Leaking since all-of ACL support was added in 2013 commit 6f58d7d.
Alex Rousskov [Sat, 22 Jul 2023 22:30:39 +0000 (22:30 +0000)]
Allow unit tests to customize their initialization code (#1423)
CppUnit provides TestFixture::setUp() and tearDown() methods, but those
methods are executed before and after _each_ test case. Many single
TestFixture-derived classes register many test cases that need to run
some test program setup/initialization code just once. Counting past
setUp() calls would increase noise and likely introduce bugs.
Some of our test cases may actually need before/after cleanup -- the
functionality already provided by TestFixture::setUp()/tearDown()
methods. Moreover, when multiple TestFixture-derived classes share the
same test executable, their collections of test cases are executed in
essentially random order (see commit 27685ef). In those use cases,
ensuring one-time initialization via setUp() hacks would be especially
awkward and error-prone.
With this solution, a test program with custom needs just needs to
define a TestProgram-derived class that implements its one-time startup
logic. The same TestProgram class hierarchy might also prove useful in
future custom test execution adjustments.
Also migrated away from the "include main()" unit test design to avoid
adding risky hacks that register custom TestProgram-derived classes and
to reduce "magic" in test code (while following our style guidelines).
Every test program now declares its own (trivial) main() rather than
getting it from include/unitTestMain.h.
MinGW produces warnings about not supporting the printf '%zu' format
code when the 'printf' format archetype is used to validate code.
Checking against the 'gnu_printf' format archetype avoids this bad
warning. We accept the lower rate of error detection since other OS
builds verify against the 'printf' format archetype.
We also removed undocumented, inconsistent, and presumably unused
support for providing custom PRINTF_FORMAT_ARG macros during Squid
build. This removal simplifies code.
... to run validation for a library after SQUID_AUTO_LIB detection.
A typical use of this macro could be:
```
SQUID_AUTO_LIB(foo,[Foo],[LIBFOO])
SQUID_CHECK_LIB_WORKS(foo,[
PKG_CHECK_MODULES([LIBFOO],[foo >= 1.0],[],[
LIBFOO_LIBS=""
])
AC_CHECK_HEADERS([foo.h],[],[LIBFOO_LIBS=""])
])
```
Update of configure.ac to use this macro uncovered and
fixed a bug which may have broken libnettle detection on
some systems.
This code has quite a lot of bitrot compared to
the POSIX select(2) code in ModSelect.cc.
Modern Windows and MinGW apparently support
the POSIX API. So we should be able to just
use the normal ModSelect now. Any adjustments
should be added to ModSelect.cc when later
testing proves a need.
Miss if a 304 update would exceed reply_header_max_size (#1420)
Fetch the resource unconditionally when a 304 (Not Modified) response to
an internal cache revalidation request grows cached HTTP response
headers beyond the reply_header_max_size limit.
Alex Rousskov [Fri, 14 Jul 2023 17:46:30 +0000 (17:46 +0000)]
Do not use static initialization to register modules (#1422)
ERROR: ... Unknown authentication scheme 'ntlm'.
When a translation unit does not contain main() and its code is not used
by the rest of the executable, the linker may exclude that translation
unit from the executable. This exclusion by LTO may happen even if that
code _is_ used to initialize static variables in that translation unit:
"If no variable or function is odr-used from a given translation unit,
the non-local variables defined in that translation unit may never be
initialized"[^1].
For example, src/auth/ntlm/Scheme.o translation unit contains nothing
but NtlmAuthRr class definition and static initialization code. The
linker knows that the rest of Squid does not use NtlmAuthRr and excludes
that translation unit from the squid executable, effectively disabling
NTLM module registration required to parse "auth_param ntlm" directives.
The problem affects the existing NTLM module and may affect any future
module code as we reduce each module's external footprint. This change
converts all RegisteredRunner registrations from using side effects of
static initialization to explicit registration calls from SquidMain().
Relying on "life before main()" is a design bug. This PR fixes that bug
with respect to RegisteredRunner registrations.
Due to indeterminate C++ static initialization order, no module can
require registration before main() starts. Thus, moving registration
timing to the beginning of SquidMain() should have no negative effects.
The new registration API still does not expose main.cc to any module
details (other than the name of the registration function itself). We
still do not need to #include any module headers into main.cc. The
compiler or linker catches most typos in RegisteredRunner names.
Unfortunately, explicit registration still cannot prevent bugs where we
forget to register a module or fail to register a module due to wrong
registration code guards. Eventually, CI will expose such bugs.
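
The difference between the two registration styles can be sketched as follows. Registry(), RegisterNtlmModule(), RegisterDigestModule(), RegisterAllModules(), and IsRegistered() are hypothetical names for this illustration; they are not the RegisteredRunner API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Function-local static: constructed on first use, so it does not
// depend on the initialization order of other translation units.
std::vector<std::string> &Registry()
{
    static std::vector<std::string> registry;
    return registry;
}

// In a real program, each of these would live in its module's own .cc
// file; the main program only needs to know the function names.
void RegisterNtlmModule() { Registry().push_back("ntlm"); }
void RegisterDigestModule() { Registry().push_back("digest"); }

// Called explicitly at the start of the main function. Because these
// calls odr-use the module functions, the linker must keep the modules.
void RegisterAllModules()
{
    RegisterNtlmModule();
    RegisterDigestModule();
}

bool IsRegistered(const std::string &name)
{
    const auto &registry = Registry();
    return std::find(registry.begin(), registry.end(), name) != registry.end();
}
```

A mistyped function name in RegisterAllModules() fails at compile or link time, while a forgotten call is only noticed when the module is needed.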
Handle helper program startup failure as its death (#1395)
Squid quits when started helper programs die too fast without responding
to requests[^1] because such a Squid instance is unlikely to provide
acceptable service (and a full restart may actually fix the problem or,
at the very least, is more likely to bring the needed admin attention).
The same logic now applies when Squid fails to start a helper (i.e. when
ipcCreate() fails). There is no conceptual difference between those two
kinds of failures as far as helper handling code is concerned, so we now
treat them the same way.
Without these changes, helper start failures may result in an unusable
(but running) Squid instance, especially if no helpers can be started at
all, because new transactions get stuck waiting in the queue until
clients time out. Such persistent ipcCreate() failures may be caused, for
example, by its fork() hitting an overcommit memory limit.
[^1]: The actual condition excludes cases where at least startup=N
helpers are still running. That exclusion and other helper failure
handling details are problematic, but adjusting that code is outside
_this_ fix scope: Here, we only apply _existing_ handling logic to a
missed case.
Andrew Novikov [Sun, 9 Jul 2023 02:05:56 +0000 (02:05 +0000)]
Bug 5187: Work around REQMOD satisfaction regression (#1400)
Commit ba3fe8d broke ICAP REQMOD satisfaction transactions. In some
cases, this workaround may resurrect Squid Bug 5187. Triage available at
https://bugs.squid-cache.org/show_bug.cgi?id=5187#c6
Support read-only ClpMap iteration by ClpMap users (#1409)
This change makes it possible for anticipated future ClpMap users (e.g.,
stat_ipcache_get()) to report their cache contents. No changes to
ClpMap::Entry and related old types except making them public.
The alternative solution -- a visitor design pattern -- was rejected
because `for` loops are a bit easier to read than for_each() loops.
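
A minimal sketch of the read-only iteration idea, assuming a cache that exposes only const iterators to its users. TinyCache and sumValues() are hypothetical stand-ins and do not mirror the ClpMap interface.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical cache that lets users inspect, but not modify, its entries.
class TinyCache
{
public:
    struct Entry {
        std::string key;
        int value;
    };
    using ConstIterator = std::list<Entry>::const_iterator;

    void add(const std::string &key, const int value)
    {
        entries_.push_front(Entry{key, value});
    }

    // Only const iteration is offered, so callers cannot alter entries.
    ConstIterator begin() const { return entries_.cbegin(); }
    ConstIterator end() const { return entries_.cend(); }

private:
    std::list<Entry> entries_;
};

// A reporting function can then use a plain range-based for loop.
int sumValues(const TinyCache &cache)
{
    int sum = 0;
    for (const auto &entry : cache)
        sum += entry.value;
    return sum;
}
```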
Alex Rousskov [Wed, 5 Jul 2023 14:41:47 +0000 (14:41 +0000)]
Do not leak memory when handling cache manager requests (#1408)
Also adjusted Cache-Control APIs to prevent similar bugs. These changes
also speed up processing a bit and simplify most of the affected code.
The now-gone "just remove the old CC" putCc() misfeature was unused.
The leak was introduced by commit 92a5adb: PutCommonResponseHeaders()
incorrectly assumed that putCc(pointerToX) takes ownership of X.
Detected by Coverity. CID 1534779: Resource leak (RESOURCE_LEAK).
Alex Rousskov [Wed, 5 Jul 2023 01:55:59 +0000 (01:55 +0000)]
CI: Remove unnecessary test-functionality test wrappers (#1393)
These workarounds are not needed for the current and future code in this
branch. Other branches get their own test-functionality.sh files that
can be used to maintain a branch-specific collection of test wrappers.
The (now unused) has_commit_by_message() function was left in the script
because that function is likely to be used by future workarounds. Unlike
specific test workarounds that only apply to a subset of old code, this
and similar functions can be viewed as a reusable code "library".
Removing this non-standard protocol (already mentioned as deprecated
in Squid sources) helps eliminate duplication and simplifies the
existing error-prone forwarding logic (causing CVEs).
Separate commits replace cache_object URLs sent by squidclient and
cachemgr.cgi tools with URLs using an http scheme.
Alex Rousskov [Wed, 28 Jun 2023 16:15:49 +0000 (16:15 +0000)]
Maintenance: Remove dead Multicast Miss Stream feature (#1320)
This feature was added in 1998 commit e66d792 for special "you are
absolutely certain you understand what you are doing" experiments and
considered "largely undocumented and unsupported" by its primary
author[^1]. The corresponding MULTICAST_MISS_STREAM code in
access_log.cc has not built since 2007 commit cc192b5 (at least):
* 'class Ip::Address' has no member named 's_addr'
* 'comm_open' was not declared in this scope
* 'comm_udp_sendto' was not declared in this scope
* 'mcastSetTtl' was not declared in this scope
* 'METHOD_GET' was not declared in this scope
* 'no_addr' was not declared in this scope
* no match for 'operator!=' (operand types are LogTags and LogTags_ot)
[^1]: Duane Wessels. 2004. Squid: The Definitive Guide. O'Reilly &
Associates, Inc., USA. See mcast_miss_addr documentation on page 390.
Alex Rousskov [Wed, 28 Jun 2023 09:28:59 +0000 (09:28 +0000)]
Reject more CONNECT requests with malformed targets (#1253)
Squid silently ignored many syntax violations, interpreting a malformed
target as a valid host:port address of some origin server and
establishing a CONNECT tunnel with that origin server. While being
"tolerant" probably did not compromise the Squid instance itself, some
known attacks abuse the _difference_ in treatment of malformed requests.
Rejecting malformed requests and closing the connection prevents many
such attacks.
Among other syntax violations, this change rejects bracketed pure IPv4
addresses and bracketed domain names in CONNECT targets. Bracketed IPv6
addresses that include an IPv4address suffix are still accepted (e.g.,
look for ABNF containing ls32 in RFC 3986, Section 3.2.2).
CONNECT target hosts are no longer covered by uri_whitespace: CONNECT
targets containing whitespace are now rejected with ERR_INVALID_URL
regardless of uri_whitespace and check_hostnames settings. With the
exception of whitespaces in host names when uri_whitespace is set to
"strip" (default), the uri_whitespace directive was already ignored for
all request targets. For example, whitespaces in the port component were
always handled without checking uri_whitespace. In CONNECT context,
stripping whitespace is not only risky but probably is not needed in
practice because user typos should not lead to spaces in CONNECT
targets, and even if they do, virtually all legitimate use cases ought
to fail during certificate validation and similar post-CONNECT checks.
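
To make the listed cases concrete, here is a deliberately crude approximation of such checks. looksLikeAcceptableConnectTarget() is a hypothetical function, not Squid's parser: it does not implement the RFC 3986 grammar, does not validate port ranges, and only assumes the host:port shape and the bracket rules described above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Crude illustration of strict CONNECT target checks (not a full parser).
bool looksLikeAcceptableConnectTarget(const std::string &target)
{
    // Whitespace anywhere in the target is rejected outright.
    for (const unsigned char c : target) {
        if (std::isspace(c))
            return false;
    }

    // Split at the last colon; both host and port must be present.
    const auto colon = target.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == target.size())
        return false;
    const std::string host = target.substr(0, colon);
    const std::string port = target.substr(colon + 1);

    for (const unsigned char c : port) {
        if (!std::isdigit(c))
            return false;
    }

    if (host.front() == '[') {
        if (host.size() < 4 || host.back() != ']')
            return false;
        const std::string inner = host.substr(1, host.size() - 2);
        // Brackets are for IPv6 only: no colon means a bracketed IPv4
        // address or a bracketed domain name.
        if (inner.find(':') == std::string::npos)
            return false;
        // Dots are allowed so that an IPv4 suffix can still pass.
        for (const unsigned char c : inner) {
            if (!std::isxdigit(c) && c != ':' && c != '.')
                return false;
        }
        return true;
    }

    // Unbracketed hosts must not contain brackets or extra colons.
    return host.find_first_of("[]:") == std::string::npos;
}
```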
Alex Rousskov [Tue, 27 Jun 2023 22:47:36 +0000 (22:47 +0000)]
Fix store_client caller memory leak on certain errors (#1347)
When a storeUnregister() code path destroys store_client before the
latter has a chance to deliver the answer, the cbdataReferenceDone()
call in store_client::finishCallback() is not reached, keeping the
callback data (e.g., clientReplyContext) alive forever. These
storeClientCopy() "cancellations" may happen, for example, when the
client-to-Squid connection is closed while store_client waits for Store.
Use CallbackData to guarantee cbdataReferenceDone() when store_client is
destructed before it can finishCallback(). These synchronous callbacks
will be replaced with AsyncCalls. For now, we use the "discouraged"
CallbackData API to accommodate the existing legacy callbacks.
Alex Rousskov [Tue, 27 Jun 2023 11:58:16 +0000 (11:58 +0000)]
Do not cache (and do not serve cached) cache manager responses (#1185)
The fixed bug affected cache manager transactions that were using
/squid-internal-mgr URL path prefix with http(s) URL scheme. It did not
affect transactions that were using legacy cache_object URL scheme.
Stale cache manager responses had their Age response header set to the
number of seconds since Unix epoch. If disk and memory caches were
disabled, then cache manager requests just triggered "found KEY_PRIVATE"
WARNINGs in cache.log (for reasons that remain unclear).
Probably broken since 2011 commit e37bd29 that did not expand
HttpRequest::maybeCacheable() (called cacheable() back then)
PROTO_CACHE_OBJECT check to include /squid-internal-mgr requests.
Also added missing Access-Control-* response headers to cache manager
responses in SMP mode and reduced code duplication related to sending
those headers (which led to them missing in SMP Squids).
Alex Rousskov [Sat, 24 Jun 2023 08:18:55 +0000 (08:18 +0000)]
Remove serialized HTTP headers from storeClientCopy() (#1335)
Do not send serialized HTTP response header bytes in storeClientCopy()
answers. Ignore serialized header size when calling storeClientCopy().
This complex change adjusts storeClientCopy() API to address several
related problems with storeClientCopy() and its callers. The sections
below summarize storeClientCopy() changes and then move on to callers.
### storeClientCopy() changes
Squid incorrectly assumed that serialized HTTP response headers are read
from disk in a single storeRead() request. In reality, many situations
lead to store_client::readBody() receiving partial HTTP headers,
resulting in parseCharBuf() failure and a level-0 cache.log message:
Could not parse headers from on disk object
Inadequate handling of this failure resulted in a variety of problems.
Squid now accumulates storeRead() results to parse larger headers and
also handles parsing failures better, but we could not just stop there.
With the storeRead() accumulation in place, it is no longer possible to
send parsed serialized HTTP headers to storeClientCopy() callers because
those callers do not provide enough buffer space to fit larger headers.
Increasing caller buffer capacity does not work well because the actual
size of the serialized header is unknown in advance and may be quite
large. Always allocating large buffers "just in case" is bad for
performance. Finally, larger buffers may jeopardize hard-to-find code
that uses hard-coded 4KB buffers without using HTTP_REQBUF_SZ macro.
Fortunately, storeClientCopy() callers either do not care about
serialized HTTP response headers or should not care about them! The API
forced callers to deal with serialized headers, but callers could (and
some did) just use the parsed headers available in the corresponding
MemObject. With this API change, storeClientCopy() callers no longer
receive serialized headers and do not need to parse or skip them.
Consequently, callers also do not need to account for response headers
size when computing offsets for subsequent storeClientCopy() requests.
Restricting storeClientCopy() API to HTTP _body_ bytes removed a lot of
problematic caller code. Caller changes are summarized further below.
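To illustrate why body-only offsets simplify callers, here is a toy
Python model (not Squid code). ToyStore, copy_with_headers(),
copy_body(), and read_body() are hypothetical names invented for this
sketch; the real API is asynchronous and uses StoreIOBuffer.

```python
class ToyStore:
    """Toy stand-in for a stored response: serialized headers plus body."""

    def __init__(self, serialized_headers, body):
        self.headers = serialized_headers
        self.body = body

    def copy_with_headers(self, offset, size):
        # Old-style view: offsets index into headers followed by body, so
        # a caller must know the header size to locate body bytes.
        data = self.headers + self.body
        return data[offset:offset + size]

    def copy_body(self, offset, size):
        # New-style view: offsets index into body bytes only.
        return self.body[offset:offset + size]


def read_body(store, chunk_size):
    """Read the whole body in chunks using body-only offsets."""
    offset = 0
    parts = []
    while True:
        chunk = store.copy_body(offset, chunk_size)
        if not chunk:
            break  # in this toy model, an empty chunk means end of body
        parts.append(chunk)
        offset += len(chunk)  # no header size adjustment is needed
    return b"".join(parts)
```

With the old-style view, the same loop would have to start at
len(store.headers) and keep that adjustment consistent across every
subsequent request.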
A similar HTTP response header parsing problem existed in shared memory
cache code. That code was actually aware that headers may span multiple
cache slices but incorrectly assumed that httpMsgParseStep() accumulates
input as needed (to make another parsing "step"). It does not. Large
response headers cached in shared memory triggered a level-1 message:
Corrupted mem-cached headers: e:...
Fixed MemStore code now accumulates serialized HTTP response headers as
needed to parse them, sharing high-level parsing code with store_client.
Old clientReplyContext methods worked hard to skip received serialized
HTTP headers. The code contained dangerous and often complex/unreadable
manipulation of various raw offsets and buffer pointers, aggravated by
the perceived need to save/restore those offsets across asynchronous
checks (see below). That header skipping code is gone now. Several stale
and misleading comments related to Store buffer management were also
removed or updated.
We replaced reqofs/reqsize with simpler/safer lastStreamBufferedBytes,
while becoming more consistent with that "cached" info invalidation. We
still need this info to resume HTTP body processing after asynchronous
http_reply_access checks and cache hit validations, but we no longer
save/restore this info for hit validation: No need to save/restore
information about the buffer that hit validation does not use and must
never touch!
The API change also moved from-Store StoreIOBuffer usage closer to
StoreIOBuffers manipulated by Client Streams code. Buffers in both
categories now contain just the body bytes, and both now treat zero
length as EOF only _after_ processing the response headers.
These changes improve overall code quality, but this code path and these
changes still suffer from utterly unsafe legacy interfaces like
StoreIOBuffer and clientStreamNode. We cannot rely on the compiler to
check our work. The risk of these changes exposing/causing bugs is high.
### AS number WHOIS lookup
asHandleReply() expected WHOIS response body bytes where serialized HTTP
headers were! The code also had multiple problems typical for manually
written C parsers dealing with raw input buffers. Now replaced with
Tokenizer-based code.
### Cache Digests
To skip received HTTP response headers, peerDigestHandleReply() helper
functions called headersEnd() on the received buffer. Twice. We have now
merged those two parsing helper functions into one (that just checks the
already parsed headers). This merger preserved "304s must come with
fetch->pd->cd" logic that was hidden/spread across those two functions.
### URN resolver
urnHandleReply() re-parsed received HTTP response headers. We left its
HTTP body parsing code unchanged except for polishing NUL-termination.
### NetDB exchange
netdbExchangeHandleReply() re-parsed received HTTP response headers to
find where they end (via headersEnd()). We improved handling of corner
cases and replaced some "tricky bits" code, reusing the new
Store::ParsingBuffer class. The net_db record parsing code is unchanged.
### SMP Cache Manager
Mgr::StoreToCommWriter::noteStoreCopied() is a very special case. It
actually worked OK because, unlike all other storeClientCopy() callers,
this code does not get serialized HTTP headers from Store: The code
adding bytes to the corresponding StoreEntry does not write serialized
HTTP headers at all. StoreToCommWriter is used to deliver kid-specific
pieces of an HTTP body of an SMP cache manager response. The HTTP
headers of that response are handled elsewhere. We left this code
unchanged, but the existence of the special no-headers case does
complicate storeClientCopy() API documentation, implementation, and
understanding.
Co-authored-by: Eduard Bagdasaryan <eduard.bagdasaryan@measurement-factory.com>
Alex Rousskov [Fri, 23 Jun 2023 23:05:27 +0000 (23:05 +0000)]
Documentation: Update stale SMP cache_dir caveats (#1394)
The requirement to specify "workers" before "cache_dir" was added in
2010 commit acf69d7. It has been obsolete since 2011 commit 095ec2b.
The "dedicated cache directory" hack for UFS-based stores has always led
to HTTP violations, but the increased complexity of worker-to-worker
synchronization code (required to improve HTTP support) also increased
the probability of crashes or worse outcomes when SMP conditionals are
used. Those hacks violate the "all processes see the same configuration"
and similar basic code assumptions. We do not test (and usually do not
even consider the needs of) such unsupported configurations.
Alex Rousskov [Mon, 19 Jun 2023 01:48:38 +0000 (01:48 +0000)]
Honor DNS RR TTLs larger than negative_dns_ttl (#1380)
Since 2017 commit fd9c47d, Squid was effectively ignoring DNS RR TTLs
that exceeded negative_dns_ttl (i.e. 60 seconds by default) because the
"find the smallest TTL across the DNS records seen so far" code in
ipcache_entry::updateTtl() mistook the "default" ipcache_entry::expires
value as the one based on an earlier seen DNS record.
In most cases, this bug decreased IP cache hit ratio.
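The following Python sketch models the described mistake; it is not the
actual ipcache code. BuggyEntry, FixedEntry, the saw_record flag, and
DEFAULT_TTL (standing in for negative_dns_ttl) are hypothetical names,
and the flag is only one possible way to detect the first update.

```python
DEFAULT_TTL = 60  # stands in for negative_dns_ttl, in seconds (assumption)


class BuggyEntry:
    def __init__(self, now):
        # Placeholder expiration that is not based on any DNS record.
        self.expires = now + DEFAULT_TTL

    def update_ttl(self, now, rr_ttl):
        # Bug: the placeholder competes with real record TTLs, so any
        # record TTL above DEFAULT_TTL is effectively ignored.
        self.expires = min(self.expires, now + rr_ttl)


class FixedEntry:
    def __init__(self, now):
        self.expires = now + DEFAULT_TTL
        self.saw_record = False  # hypothetical "first update" marker

    def update_ttl(self, now, rr_ttl):
        candidate = now + rr_ttl
        if not self.saw_record:
            # The first real record replaces the placeholder.
            self.expires = candidate
            self.saw_record = True
        else:
            # Later records (e.g. AAAA after A) can only lower the value.
            self.expires = min(self.expires, candidate)
```

For example, with an A record TTL of 300 seconds and an AAAA record TTL
of 600 seconds, the buggy model keeps the 60-second placeholder, while
the fixed model expires the entry after 300 seconds.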
Existing fqdncache code does not suffer from the same bug because
fqdncacheParse() always resets fqdncache_entry::expires instead of
updating it incrementally. ipcacheParse() has to update incrementally
because it is called twice per entry, once with an A answer and once
with an AAAA answer.
Ideally, ipcache_entry::expires should be made optional to eliminate
awkward "first updateTtl() call" detection, but doing so well requires
significant code changes, so that entries without a known expiration
value are not cached forever _unless_ they were loaded from /etc/hosts.
And those changes should probably be propagated to fqdncache.cc.
Alex Rousskov [Sun, 18 Jun 2023 00:30:35 +0000 (00:30 +0000)]
CI: Remove pass-through test-functionality test wrappers (#1383)
Instead of requiring a custom test wrapper for each test and, hence,
creating an ever-increasing number of pass-through wrappers that do
nothing useful, use a custom test wrapper if and only if it exists. By
default (i.e. when there is no custom wrapper), just run the named test.
As a positive side effect, this change also simplifies running tests
that are not on the $default_tests list hard-coded in main():