Avoid file name conflict with Windows WinSvc.h (#1637)
There is a conflict between our WinSvc.h header file and
Windows system include <winsvc.h>, which is one of the
factors preventing our ability to build native windows helpers.
Alex Rousskov [Thu, 11 Jan 2024 07:10:15 +0000 (07:10 +0000)]
Fix dupe handling in Splay ACLs: src, dst, http_status, etc. (#1632)
Squid was dangerously mishandling duplicate[^1] values in ACLs with
Splay tree storage: src, dst, localip, http_status, dstdomain,
srcdomain, and ssl::server_name. These problems were all rooted in the
same old code but had diverged across two ACL groups, as detailed in the
corresponding sections below. Fortunately, all known problems were
accompanied with Squid cache.log WARNINGs: Squid configurations that do
not emit cited or similar warnings are not affected by these bugs.
[^1]: Squid Splay trees can only store unique values. Uniqueness is
defined in terms of a configurable comparison function that returns zero
when the two values are "not unique" (i.e. are considered to be
"duplicated" or "equal" in that Splay context). Those two "equal" values
may differ a lot in other contexts! For example, the following two
status code ranges are equal from acl_httpstatus_data::compare() point
of view, but are obviously very different from an admin and http_access
rules points of view: 200-200 and 200-299.
### Group 1: src, dst, localip, and http_status ACLs
These ACLs ignored (i.e. never matched) some configured ACL values in
problematic use cases. They also gave wrong "remove X" advice and
incorrectly classified values as being subranges or subsets:
Processing: acl testA http_status 200 200-300
WARNING: '200-300' is a subrange of '200'
WARNING: because of this '200-300' is ignored ...
Processing: acl testB http_status 300-400 200-300
WARNING: '200-300' is a subrange of '300-400'
WARNING: because of this '200-300' is ignored ...
WARNING: You should probably remove '300-400' from the ACL...
Processing: acl testC src 10.0.0.1 10.0.0.0-10.0.0.255
WARNING: (B) 10.0.0.1 is a subnetwork of (A) 10.0.0.0-10.0.0.255
WARNING: because of this 10.0.0.0-10.0.0.255 is ignored...
Processing: acl testD src 10.0.0.0-10.0.0.1 10.0.0.1-10.0.0.255
WARNING: (A) 10.0.0.1-10.0.0.255 is a subnetwork of (B) 10.0.0.0-...
WARNING: because of this 10.0.0.1-10.0.0.255 is ignored...
WARNING: You should probably remove 10.0.0.1-10.0.0.255 from the ACL
Since 2002 commit 96da6e8, IP-based ACLs like src, dst, and localip
eliminate duplicates[^1] among configured ACL values. That elimination
code was buggy since inception. Those bugs were later duplicated in
http_status code (2005 commit a0ec9f6). This change fixes those bugs.
To correctly eliminate duplicates when facing two (fully or partially)
overlapping ranges -- new A and old B -- we must pick the right
corrective action depending on the kind of overlap:
* A is a subrange of B: Ignore fully duplicated new range A. Keep B.
* B is a subrange of A: Remove fully duplicated old range B. Add A.
* A partially overlaps with B: Add a union of A and B. Remove B.
Both acl_httpstatus_data::compare() and acl_ip_data::NetworkCompare()
mishandled the last two cases because these functions effectively
implemented the following buggy logic instead:
- in all three cases: Ignore new range A. Keep B.
Their WARNINGs also suggested wrong corrective actions in two mishandled
cases (see the last WARNING lines in testB and testD output above).
### Group 2: dstdomain, srcdomain, and ssl::server_name ACLs
Processing: acl testE dstdomain .example.com example.com
ERROR: '.example.com' is a subdomain of 'example.com'
ERROR: You need to remove '.example.com' from the ACL named 'testE'
2002 commit 96da6e8 bugs mentioned in Group 1 section stopped affecting
domain-based ACLs (i.e. dstdomain, srcdomain, and ssl::server_name) in
2011 commit 14e563c that adjusted aclDomainCompare() to self_destruct()
in problematic cases. However, that adjustment emitted wrong advice and
incorrect subdomain classification in cases where strlen() checks do not
work for determining which of the two configured ACL values should be
removed (see testE ERRORs above).
This change improves that handling by replacing the call to
self_destruct() with proper duplicate[^1] resolution code (and fixing
cache.log messages). We (now) support duplicate values instead of
rejecting configurations containing them because duplicate values do not
invalidate an ACL -- an ACL with duplicates could match as expected. It
may be difficult for some admins to avoid duplication, especially when
ACL values come from multiple sources. Squid should continue to warn
about duplicates (because they waste resources and may indicate a deeper
problem), but killing Squid or otherwise rejecting ACLs with duplicates
is bad UX.
N.B. Domain-based ACLs use sets of values rather than "ranges" discussed
in Group 1 section, but domain sets follow the same basic principles.
### All of the above seven ACLs
The problems in both ACL groups were fixed by factoring out "insert ACL
values while correctly handling duplicates" algorithm (see the three
bullets in Group 1 section) into Acl::SplayInserter::Merge() function.
Without duplicates, the new ACL value insertion code has the same cost
as the old one. The vast majority of duplicate cases incur a constant
additional overhead because Splay tree dynamic reorganization makes the
right tree nodes immediately available. A few cases duplicate double the
number of comparisons during tree searches (when Splay reorganization
cannot cope with a particularly "bad" ACL value order), but since these
"bad" cases ought to be very rare, and since all problematic cases are
accompanied by WARNINGs, this extra cost is deemed acceptable.
This change also fixes memory leaks associated with ignored ACL values.
Also added test-suite/test-squid-conf.sh support for matching multiple
stderr messages. Without this, testing a variety of closely-related
cases requires creating lots of test configuration files (two per test
case), increasing noise and, more importantly, making it difficult to
handle related test cases as one coherent collection. The new
"expect-messages" feature is barely sufficient for testing these
changes, but we are now at (or perhaps even beyond) the limit of what
can be reasonably done using shell scripts and test instruction files:
The next step is to convert instruction files (and likely some test
cases themselves!) to scripts written in Perl or a better language.
Amos Jeffries [Sun, 7 Jan 2024 18:37:09 +0000 (18:37 +0000)]
Docs: update doxygen support (#1622)
Doxygen v1.7.5 changed HTML_HEADER requirements: We now need to add an
open div element to the header that Doxygen then closes for us. This
change also fixes the corresponding generated content on Squid website.
Also, disable IDL_PROPERTY_SUPPORT which has been causing some errors
with output getter/setter matching in the generated output.
Bug 5329: cbdata.cc:276 "c->locks > 0" assertion on reconfigure (#1625)
Recent commit 0f78379 correctly removed an excessive cbdata lock of a
CachePeer::digest object in peerDigestCreate() but accidentally lost
another digest lock while inlining peerDigestCreate(). The resulting
excessive unlocking triggered reconfiguration assertions.
This change restores the lost lock as a short-term fix.
Long-term, CachePeer code should be fixed to become an exclusive[^1]
PeerDigest owner (i.e. creating and deleting its cbdata-protected digest
object without locking, unlocking, or checking locks). That improvement
is already in the works, but it requires significant code refactoring.
[^1]: Shared PeerDigest ownership (i.e. reference counting instead of
explicit delete and cbdata) does not work well in this context due to
circular references.
Ben Kallus [Mon, 18 Dec 2023 18:43:03 +0000 (18:43 +0000)]
Bug 5119: Null pointer dereference in makeMemNodeDataOffset() (#1623)
UndefinedBehaviorSanitizer: undefined-behavior mem_node.cc:27:26 in
runtime error: member access within null pointer of type 'mem_node'
Since only the address of the data member is computed, a compiler is
likely to perform pointer arithmetic rather than dereference a nullptr,
but it is best to replace this UB with a safe and clearer alternative.
Alex Rousskov [Sun, 17 Dec 2023 14:48:41 +0000 (14:48 +0000)]
Bug 5254, part 1: Do not leak master process' cache.log to kids (#1222)
The fork()ed kids unknowingly inherited cache.log file descriptor and,
hence, prevented the underlying file from being deleted on log rotation.
This change does not fully fix the bug because the file is still being
held open by the master process itself. This change was isolated because
it lacks bad/controversial side effects -- it is a simple step forward.
This fix may also help stop leaking kid's cache.log descriptor to
helpers, but that requires more work -- replacing descriptor duping
trick in ipcCreate() with a proper servicing of helper stderr descriptor
via Comm. For now, each helper process still keeps cache.log alive.
Alex Rousskov [Sun, 17 Dec 2023 02:03:22 +0000 (02:03 +0000)]
Docs: Describe surprising side effects of auth_param basic (#1612)
acl badGuys proxy_auth Bob
http_access deny badGuys
Admins may be surprised that their proxy_auth ACLs do not match users
with logins identical to those listed as proxy_auth ACL values. For
example, a user logged in as "Bob" will no match the above ACL if Basic
authentication is used without an explicit "casesensitive on" setting.
In fact, the above ACL cannot match any user in that environment!
Amos Jeffries [Wed, 13 Dec 2023 18:51:47 +0000 (18:51 +0000)]
Stop zeroing huge memAllocBuf() buffers (#1592)
memAllocBuf() buffers smaller than 64KB were not zeroed before this
change. Larger buffers (a.k.a. "huge buffers") were zeroed with
xcalloc(). We believe it is safe to stop zeroing those huge buffers
because all known code that allocates huge buffers also allocates
smaller ones for the same purpose. If some of that code relied on
zeroing for years, we would expect to see problems with smaller buffers.
Removing xcalloc() allows removal of mem*Rigid(), memBufStats() and all
string pools API complications.
The cache manager mgr:mem report section for string
statistics is also removed.
We just wanted to remove legacy printf()-like calls from Notes.cc, but
realized that finding correct replacement for that code is complicated
because some of the calls were broken, and the true meaning or purpose
of the affected annotation reporting methods was elusive. This change
combines several related fixes and improvements detailed below.
### Fix reporting method names and their descriptions
Humans could not easily figure out the difference between Note::dump(),
Note::toString(), Notes::dump(), Notes::toString(), and
NotePairs::toString() methods, especially since Note and Notes classes
had both, implying some important difference. The toString() name is
very generic. The dump() name is used (differently) in Configuration
code and ACL class hierarchy; these classes are used by that code, but
they do not belong to that hierarchy. Bugs and the variety of
annotation-related use cases increased doubts and confusion.
The new task/format-specific names for Note and Notes methods fix this.
### Fix annotate_client and annotate_transaction mgr:config reporting
Multi-name annotations were split across several lines and used an
incorrect name/value separator. The "acl" directive line was followed by
an extra new line.
Each "note" directive was followed by bogus "(note...)" suffix instead
of ACL names (if any). The "note" directive line was followed by an
extra new line.
### Improve debugging of annotations in helper responses
Alex Rousskov [Sat, 9 Dec 2023 04:46:55 +0000 (04:46 +0000)]
Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 (#1608)
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.
Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.
The client-Squid connection closure callback, clientClosed(), already
uses the same logic.
This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.
[^1]: The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.
Alex Rousskov [Wed, 6 Dec 2023 05:25:53 +0000 (05:25 +0000)]
Docs: Describe more ACL effects on (re)authentication (#1611)
Existing documentation was
* silent about %ul, max_user_ip, ident, and ident_regex side effects;
* silent about adapted_http_access context triggering authentication;
* vague about (re)authentication triggers.
Amos Jeffries [Tue, 5 Dec 2023 12:31:57 +0000 (12:31 +0000)]
Remove ADD_X_REQUEST_URI (#1607)
This mechanism was supposed to add the client requested URI to
response messages. But in modern Squid the string it uses can
now be modified by adaptation and redirectors.
The following squid.conf setting does a better job of adding the
intended value and does not require a custom build:
reply_header_add X-Request-URI "%>ru" all
Also, other %ru related format codes can be added at the admin
choice for more flexible troubleshooting.
Alex [Wed, 29 Nov 2023 20:29:36 +0000 (20:29 +0000)]
Avoid UB when copying AnyP::Uri (#1604)
Without a self-assignment check, a memcpy(3) call in Uri copy assignment
operator could result in undefined behavior. Let the compiler generate
all this code instead.
Add %byte{value} logformat code for logging or sending any byte (#1588)
This feature is needed by at least the statsd tool receiving TCP log
info: https://github.com/statsd/statsd/blob/7c07eec/docs/server.md
No support for zero byte values yet because existing Format::assemble()
code does not support that out of the box, and there is no known need
for such support. It can be added later (without backward compatibility
problems) if needed.
Thomas Leroy [Tue, 28 Nov 2023 07:35:46 +0000 (07:35 +0000)]
Limit the number of allowed X-Forwarded-For hops (#1589)
Squid will ignore all X-Forwarded-For elements listed after the first 64
addresses allowed by the follow_x_forwarded_for directive. A different
limit can be specified by defining a C++ SQUID_X_FORWARDED_FOR_HOP_MAX
macro, but that macro is not a supported Squid configuration interface
and may change or disappear at any time.
Squid will log a cache.log ERROR if the hop limit has been reached.
This change works around problematic ACLChecklist and/or slow ACLs
implementation that results in immediate nonBlockingCheck() callbacks.
Such callbacks have caused many bugs and development complications. In
clientFollowXForwardedForCheck() context, they lead to indirect
recursion that was bound only by the number of allowed XFF entries,
which could reach thousands and exhaust Squid process call stack.
This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/xff-stackoverflow.html
where it was filed as "X-Forwarded-For Stack Overflow".
Recent commit 122a6e3 removed HTTP response headers from store_client
responses. That removal created the possibility of an empty
StoreIOBuffer at the beginning of the feeding sequence. Pending Bug 5317
fix will make such buffers even more frequent. Existing store_client
recipients have varying requirements with regard to empty response
buffers, as documented in store_client::finishCallback(). We missed this
requirement conflict in Cache Digest code. This fix adjusts Cache
Digests code to be compatible with empty StoreIOBuffer representation in
current store_client code.
Amos Jeffries [Sun, 19 Nov 2023 10:04:11 +0000 (10:04 +0000)]
Move memory initialization check (#1591)
mainInitialize() is far too late to be validating that memory
pools created with memDataInit() have been correctly initialized.
They will have already been used by debugging, and many other
components early-setup logic.
Instead run the check at the last possible moment before
declaring the memory pools to be fully initialized.
Alan Coopersmith [Sun, 19 Nov 2023 03:17:07 +0000 (03:17 +0000)]
Bug 5312: Startup aborts if OPEN_MAX exceeds RLIMIT_NOFILE (#1551)
FATAL: Event loop exited with failure
The DP_POLL ioctl on Solaris fails with EINVAL when the number of
supplied descriptors (OPEN_MAX) is higher than the current RLIMIT_NOFILE
setting. When it comes to the maximum number of descriptors, we should
use Squid_MaxFD (which already reflects RLIMIT_NOFILE setting).
When a helper has no server programs left (e.g., because all started
programs, if any, have quit and no new programs could be started), drop
all queued helper requests, so that their corresponding master
transactions do not get stuck.
Existing handleFewerServers() already handles this situation by killing
Squid, but we must drop queued requests in other places (that can be
found by locating code that adjusts childs.n_active) because:
* that existing Squid-killing code effectively excludes several use
cases (e.g., it ignores common startup=0 helper configurations);
* handleFewerServers() may not be called in some relevant cases
(e.g., when no new servers were needed after a reconfiguration);
* handleFewerServers() may be followed by a helperOpenServers() call
(that may open new servers for handling queued requests); we do
not want to kill queued requests in those cases.
Squid may recover and successfully start another helper later, but
transaction X progress must not rely on an unrelated transaction Y
actions: If transaction Y does not show up or does not result in a
successful helper program start, then transaction X will continue to be
queued at the helper level, possibly forever (even request timeouts are
handled by individual helper server objects that we lack here).
Also significantly simplified HelperServerClosed() implementations.
Reduce ACLChecklist::AsyncState to a function pointer (#1576)
ExternalACLLookup::checkForAsync() -- the code that starts communication
with the external ACL helper -- relies on two implicit assumptions:
* AclMatchedName global matches this->name during ACLExternal::match().
* FindByName("x") result never changes during same-Checklist evaluation.
The first assumption might hold[^1], but the second assumption will fail
once we finally start refcounting ACLs instead of allowing
reconfiguration to break in-flight transactions waiting for a slow ACL
match with ACCESS_DUNNO answers. Today, reconfiguration invalidates all
ACL objects, making repeated same-Checklist checkForAsync() calls
impossible across reconfiguration. Tomorrow, such calls will happen and
violation of the second assumption will lead to assertions or worse[^2].
Thus, removing that assumption is a precondition on ACL refcounting and
smooth reconfiguration support.
ExternalACLLookup used FindByName() to find the ACL being matched. The
source code comment suggested that we "have a pointer to this around
somewhere". That pointer is "this" pointer of an ACL calling goAsync().
It is available in the call stack two frames higher, but is not stored
by the Checklist in any usable form.
To remove that FindByName() call, we could just add a second goAsync()
parameter to pass the ACL pointer to Checklist. Doing that would require
changing all the goAsync() callers and the corresponding
ACLChecklist::AsyncState API. Since that very complex and confusing API
was actually used as a basic function pointer, we decided it is best to
replace the has-to-be-changed-anyway API with a function pointer. That
plan worked well, removing a lot of unnecessary and confusing code!
[^1]: It is difficult to be sure: AclMatchedName global was named and
added for a very different purpose, and no API enforces that invariant.
[^2]: When ACLs are refcountered, ACLExternal::match() will continue to
set AclMatchedName global to the name (e.g., "x") of an ACL object that
resumes matching after Squid reconfigures and receives a helper reply.
However, the same FindByName("x") call may now return nil or a
completely different ACL object (still named "x" in the new
configuration, but possibly no longer an ACLExternal object)!
Nicolai Moore [Wed, 15 Nov 2023 00:59:59 +0000 (00:59 +0000)]
Bug 5319: QOS Netfilter MARK preservation is always disabled (#1585)
Default ./configure options and explicit --enable-zph-qos enabled ZPH
QOS support (USE_QOS_TOS) as expected but did not enable QOS Netfilter
MARK preservation support (USE_LIBNETFILTERCONNTRACK). For example,
qos_flows directive became available, but clientside_mark and
client_connection_mark ACL types were not recognized.
The missing opening bracket before AS_IF() condition injected a trailing
closing bracket into that condition, resulting in an always-false
condition for setting USE_LIBNETFILTERCONNTRACK.
Alex Rousskov [Tue, 14 Nov 2023 18:40:37 +0000 (18:40 +0000)]
Bug 5317: FATAL attempt to read data from memory (#1579)
FATAL: Squid has attempted to read data ... that is not present.
Recent commit 122a6e3 attempted to deliver in-memory response body bytes
to a Store-reading client that requested (at least) response headers.
That optimization relied on the old canReadFromMemory() logic, but that
logic results in false positives when the checked read offset falls into
a gap between stored headers and the first body byte of a Content-Range.
In that case, a false positive leads to a readFromMemory() call and a
FATAL mem_hdr::copy() error.
This workaround disables the above optimization without fixing
canReadFromMemory(). We believe that a readFromMemory() call that comes
right after response headers are delivered to the Store-reading client
will not suffer from the same problem because the client will supply the
read offset of the first body byte, eliminating the false positive.
Andreas Weigel [Tue, 14 Nov 2023 15:17:09 +0000 (15:17 +0000)]
FTP: Ignore credentials with a NUL-prefixed username (#1557)
FATAL: FTP login parsing destroyed username info
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/ftp-fatal.html
where it was filed as "FTP Authentication Crash".
Alex Rousskov [Sun, 12 Nov 2023 09:33:20 +0000 (09:33 +0000)]
Do not update StoreEntry expiration after errorAppendEntry() (#1580)
errorAppendEntry() is responsible for setting entry expiration times,
which it does by calling StoreEntry::storeErrorResponse() that calls
StoreEntry::negativeCache().
This change was triggered by a vulnerability report by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/cache-uaf.html where
it was filed as "Use-After-Free in Cache Manager Errors". The reported
"use after free" vulnerability was unknowingly addressed by 2022 commit 1fa761a that removed excessively long "reentrant" store_client calls
responsible for the disappearance of the properly locked StoreEntry in
this (and probably other) contexts.
2015 commit 21cd322 started to continue ClientStream processing after
socket write(2) failures. In most cases, the code still "worked". For
example, initiateClose() would close the client-Squid connection, and
connStateClosed() would be called before Store has a chance to deliver
response body data requested by pullData() in writeComplete().
However, that response body data could sometimes reach Server, and
handleReply() would assert because startOfOutput() says that we have not
written the headers, but ClientStream state (i.e. a nil `rep` parameter)
says that we have. These assertion can be triggered by disabling
initiateClose(), and they can probably be triggered by traffic as well.
Now, after a Comm::Write() error, we terminateAll() client transactions
on the failed connection[^1] and do not call afterClientWrite() that is
not equipped to handle I/O errors and would continue ClientStream
processing if called.
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/stream-assert.html
where it was filed as "Implicit Assertion in Stream Handling".
[^1]: We terminateAll() instead of potentially postponing closure with
initiateClose() because the failed client-Squid connection most likely
cannot be salvaged for, say, reading the remainder of the request body.
Once store_dirs_rebuilding reaches 1, we now use counts.started() to
check whether the kid was responsible for indexing any cache_dirs. We no
longer report bogus/empty indexing stats if the kid was not responsible.
TODO: This change preserves a storeDigestNoteStoreReady() call at the
end of indexing. Cache Digests probably do not work "as is" in SMP mode,
but fixing that is difficult and deserves a dedicated project.
Some of the serious cache_dir indexing errors (such as job-stopping
exceptions) were not logged at level 1. After such an error, Squid would
end up using a small (possibly empty) subset of cache_dir slots, usually
without admin knowledge. Other serious errors were reported and treated
as fatal. We now report all serious errors and treat them as fatal.
Also added more rebuild information to reported errors.
Also removed the "Do you need to run 'squid -z' ..." hint. "squid -z" is
only useful for ENOENT, but SwapDir::init() essentially prevents ENOENT
during Rebuild by not starting a Rebuild job when cache_dir is missing.
Alex Rousskov [Tue, 7 Nov 2023 02:50:47 +0000 (02:50 +0000)]
Fix ipv4 and expand ipv6 ACL parameter matching (#1568)
The undocumented "ipv4", "ipv6", and "all" parameters can be used in
src, dst, and localip ACLs. This refactoring fixes "ipv4" matching and
expands "ipv6" scope as detailed further below. These parameters remain
undocumented.
Together, the fixed ipv4 parameter and the adjusted ipv6 parameter now
match any IP address known to Squid. So does the undocumented "all"
parameter (which is currently used to implement the "all" dst ACL).
Also speed up matching of ACLs containing "ipv4", "ipv6", and "all"
parameters (at the cost of adding two boolean tests to other src, dst,
and localip ACLs). This performance improvement is a side effect of the
corresponding code improvements and is not the focus of these changes.
Also improved mgr:config reporting of ACLs containing these special
parameters. They are now reported using their names (e.g., "all") rather
than IP addresses (e.g., `::/0` instead of "all").
### Fix ipv4 ACL parameter to match (any IPv4 address)
The "ipv4" parameter is supposed to match any IPv4 address, but it
matched none because an unmasked[^1] IPv4 input address was compared to
an all-0s address[^2], while any IPv4 address in Squid has some 1s due
to IPv4-to-IPv6 mapping used by Ip::Address. Those 1s could not match.
The "ipv4" parameter did match one address -- an IPv6 address `::`[^2].
This matching was possibly broken since inception (2009 commit 7764e92).
[^1]: Calling mask.setNoAddr() sets all sin6_addr bits to 1 while
calling mask.applyMask(0, AF_INET) does nothing at all.
[^2]: We did not set q->addr1. Ip::Address default constructor sets all
Ip::Address bits to 0. For ACL IP comparison purposes, that is
equivalent to an IPv6 `::` address (all sin6_addr bits are zero).
### Expand ipv6 ACL parameter scope to match all IPv6 addresses
The "ipv6" parameter was not matching many IPv6 addresses, including
some frequently seen ones like `::1` (i.e. IPv6 localhost). Now it
matches any IP address that Squid code itself considers an IPv6 address,
reducing surprises among admins that need to treat all IPv6 addresses
specially.
Squids receives IPv6 addresses from traffic, configuration, or
hard-coded constants even when ./configured with --disable-ipv6 or when
IPv6 support was automatically disabled at startup after failing IPv6
tests. To handle IPv6 correctly, such Squids must support basic IPv6
operations like recognizing an IPv6 address in a request-target or
reporting an unsolicited IPv6 DNS record. At least for now, such Squids
must also correctly parse configuration-related IPv6 addresses.
All those activities rely on various low-level operations like filling
addrinfo structure with IP address information. Since 2012 commit c5fbbc7, Ip::Address::getAddrInfo() was failing for IPv6 addresses when
Ip::EnableIpv6 was falsy. That change correctly recognized[^1] the need
for such Squids to handle IPv6, but to support basic operations, we need
to reject IPv6 addresses at a higher level and without asserting.
That high-level rejection work is ongoing, but initial attempts have
exposed difficult problems that will take time to address. For now, we
just avoid the assertion while protecting IPv6-disabled Squid from
listening on or opening connections to IPv6 addresses. Since Squid
already expects (and usually correctly handles) socket opening failures,
disabling those operations is better than failing in low-level IP
manipulation code.
The overall IPv6 posture of IPv6-disabled Squids that lack http_access
or other rules to deny IPv6 requests will change: This fix exposes more
of IPv6-disabled Squid code to IPv6 addresses. It is possible that such
exposure will make some IPv6 resources inside Squid (e.g., a previously
cached HTTP response) accessible to external requests. Squids will not
open or accept IPv6 connections but may forward requests with raw IPv6
targets to IPv4 cache_peers. Whether these and similar behavior changes
are going to be permanent is open for debate, but even if they are
temporary, they are arguably better than the corresponding assertions.
These changes do not effect IPv6-enabled Squids.
The assertion in IPv6-disabled Squid was reported by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/ipv6-assert.html where
it was filed as "Assertion on IPv6 Host Requests with --disable-ipv6".
Alex Rousskov [Wed, 1 Nov 2023 03:16:12 +0000 (03:16 +0000)]
Remove mem_hdr::freeDataUpto() assertion (#1562)
stmem.cc:98: "lowestOffset () <= target_offset"
The assertion is conceptually wrong: The given target_offset parameter
may have any value; that value does not have to correlate with mem_hdr
state in any way. It is freeDataUpto() job to preserve nodes at or above
the given offset and (arguably optionally) remove nodes below it, but
the assertion does not actually validate that freeDataUpdo() did that.
The old mem_hdr::freeDataUpto() assertion incorrectly assumed that,
after zero or more unneeded memory nodes were freed, the remaining
memory area never starts after the given target_offset parameter. That
assumption fails in at least two use cases, both using target_offset
values that do not belong to any existing or future mem_hdr node:
1. target_offset is points to the left of the first node. freeDataUpto()
correctly keeps all memory nodes in such calls, but then asserts. For
example, calling freeDataUpto(0) when mem_hdr has bytes [100,199)
triggers this incorrect assertion.
2. target_offset is in the gap between two nodes. For example, calling
freeDataUpto(2000) when mem_hdr contains two nodes: [0,1000) and
[3000,3003) will trigger this assertion (as happened in Bug 5309).
Such gaps are very common for HTTP 206 responses with a Content-Range
header because such responses often specify a range that does not
start with zero and create a gap after the node(s) with HTTP headers.
Bugs notwithstanding, it is unlikely that relevant calls exist today,
but they certainly could be added, especially when freeDataUpto() stops
preserving the last unused node. The current "avoid change to [some
unidentified] part of code" hoarding excuse should not last forever.
Prior to commit 122a6e3, Squid did not (frequently) assert in gap cases:
Callers first give target_offset 0 (which results in freeDataUpto()
doing nothing, keeping the header node(s)) and then they give
target_offset matching the beginning of the first body node (which
results in freeDataUpto() freeing the header nodes(s) and increasing
lowerOffset() from zero to target_offset). A bug in commit 122a6e3
lowered target_offset a bit, placing target_offset in the gap and
triggering frequent (and incorrect) assertions (Bug 5309).
Recent commit 122a6e3 left store_client::readOffset() unchanged but
should have adjusted it to match changed copyInto.offset semantics:
Starting with that commit, storeClientCopy() callers supply HTTP
response _body_ offset rather than HTTP response offset.
This bug decreased readOffset() values (by the size of stored HTTP
response headers), effectively telling Store that we are not yet done
with some of the MemObject/mem_hdr bytes. This bug could cause slightly
higher transaction memory usage because the same response bytes are
trimmed later. This bug should not have caused any assertions.
However, the old mem_hdr::freeDataUpto() code that uses readOffset() is
also broken -- the assertion in that method only "works" when
readOffset() returns values matching a memory node boundary. The smaller
values returned by buggy readOffset() triggered buggy assertions.
This minimal fix removes the recent store_client::readOffset() bug
described above. We will address old mem_hdr problems separately.
Alex Rousskov [Tue, 31 Oct 2023 11:35:02 +0000 (11:35 +0000)]
Fix infinite recursion when parsing HTTP chunks (#1553)
This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:
* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()
HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.
maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.
We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.
maybeReadVirginBody() mishandled cases where progress was possible, but
not _immediately_ -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.
maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.
This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".
Alex Rousskov [Fri, 27 Oct 2023 21:27:20 +0000 (21:27 +0000)]
Exit without asserting when helper process startup fails (#1543)
... to dup() after fork() and before execvp().
Assertions are for handling program logic errors. Helper initialization
code already handled system call errors correctly (i.e. by exiting the
newly created helper process with an error), except for a couple of
assert()s that could be triggered by dup(2) failures.
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/ipc-assert.html
where it was filed as 'Assertion in Squid "Helper" Process Creator'.
Alex Rousskov [Wed, 25 Oct 2023 19:41:45 +0000 (19:41 +0000)]
RFC 1123: Fix date parsing (#1538)
The bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/datetime-overflow.html
where it was filed as "1-Byte Buffer OverRead in RFC 1123 date/time
Handling".
Alex Rousskov [Wed, 25 Oct 2023 11:47:19 +0000 (11:47 +0000)]
Improve handling of expanding HTTP header values (#1536)
Squid manipulations often increase HTTP header value length compared to
the corresponding raw value received by Squid. Raw header length is
checked against request_header_max_size and reply_header_max_size that
default to 64KB, making the raw value safe to store in a String object
(by default). However, when the increased length of a manipulated value
exceeds String class limits, Squid leaks memory, asserts, or possibly
stalls affected transactions. The long-term fix for this problem is a
complete String elimination from Squid sources, but that takes time.
Known manipulations may effectively concatenate headers and/or increase
header value length by 50%. This workaround makes such known increases
safe by essentially tripling String class limits:
(64KB + 64KB) * 150% = 3 * 64KB
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/response-memleaks.html
where it was filed as "Memory Leak in HTTP Response Parsing".
Since recent commit a4e35bd removed cache_object support, popular
clients like wget and curl can do everything squidclient can, with one
exception: They cannot expand `mgr:foo` macros into
`http://localhost:3128/squid-internal-mgr/foo` URLs. That single feature
is easily emulated and not worth keeping (fairly heavy) squidclient for,
especially since recent Squid security improvements often require
customizing squidclient commands with more and more options (to address
problems tracked in Squid bug 5283).
Alex Rousskov [Fri, 20 Oct 2023 22:24:45 +0000 (22:24 +0000)]
negotiate_wrapper_auth: protect from responses over 64KB (#1530)
... received from NTLM and Kerberos helpers.
This code uses MAX_AUTHTOKEN_LEN (~64KB) buffers to read response lines.
fgets(3) guarantees to terminate the supplied buffer, but it does not
return nil when the input line is larger than the buffer. We have
already detected such "Oversized message" cases for fgets(stdin) calls,
but not for fgets(FDNOUT) and fgets(FDKOUT) calls.
Alex Rousskov [Thu, 19 Oct 2023 02:34:28 +0000 (02:34 +0000)]
Fix "overriding recipe" gmake warnings in src/html/Makefile.am (#1528)
Makefile:1254: warning: overriding recipe for target 'testHeaders'
Makefile:1233: warning: ignoring old recipe for target 'testHeaders'
Makefile:1257: warning: overriding recipe for target '.h.hdrtest'
Makefile:1236: warning: ignoring old recipe for target '.h.hdrtest'
Makefile.am included a second TestHeaders.am since inception in recent
commit 1d0bc8e: Common.am already includes TestHeaders.am.
Andreas Weigel [Wed, 18 Oct 2023 04:14:31 +0000 (04:14 +0000)]
Fix validation of certificates with CN=* (#1523)
The bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Buffer UnderRead in SSL CN Parsing".
Alex Bason [Sun, 15 Oct 2023 13:04:47 +0000 (13:04 +0000)]
Fix stack buffer overflow when parsing Digest Authorization (#1517)
The bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/digest-overflow.html
where it was filed as "Stack Buffer Overflow in Digest Authentication".
Recognize internal requests created by adaptation/redirection (#1504)
Before this fix, Squid set flags.internal for virgin requests but not
for adapted/redirected requests, leaving post-adaptation request
processing code in an inconsistent state, forwarding the
internalCheck()-compliant adapted/redirected requests as regular
requests, triggering forwarding loops and complicating code refactoring.
Martin Grimm [Mon, 9 Oct 2023 17:10:43 +0000 (17:10 +0000)]
Rewrite SplayNode to eliminate recursive calls (#1431)
Recursive method calls in SplayNode can lead to a stack overflow with
large (degenerate) trees, e.g. after creating a large dst acl from a
sorted ip list.