Amos Jeffries [Sun, 22 Jun 2025 00:48:05 +0000 (00:48 +0000)]
Fix missing CONTRIBUTOR name (#2091)
Despite CONTRIBUTORS file being UTF-8 Ture's name
was dropped by our conversion script which does not
handle non-ASCII characters nicely. Re-add it manually.
Do not duplicate received Surrogate-Capability in sent requests (#2087)
When computing Surrogate-Capability header while forwarding an
accelerated request, Squid duplicated old (i.e. received) header entries
(if any). For example, this outgoing request shows an extra hop1 entry:
GET / HTTP/1.1
...
Surrogate-Capability: hop1="Surrogate/1.0"
Surrogate-Capability: hop1="Surrogate/1.0", hop2="Surrogate/1.0"
Carl Vuosalo [Wed, 28 May 2025 19:17:38 +0000 (19:17 +0000)]
Fix SNMP cacheNumObjCount -- number of cached objects (#2053)
SNMP counter cacheNumObjCount used StoreEntry::inUseCount() stats. For
Squid instances using a rock cache_dirs or a shared memory cache, the
number of StoreEntry objects in use is usually very different from the
number of cached objects because these caches do not use StoreEntry
objects as a part of their index. For all instances, inUseCount() also
includes ongoing transactions and internal tasks that are not related to
cached objects at all.
We now use the sum of the counters already reported on "on-disk objects"
and "Hot Object Cache Items" lines in "Internal Data Structures" section
of `mgr:info` cache manager report. Due to floating-point arithmetic,
these stats are approximate, but it is best to keep SNMP and cache
manager reports consistent.
This change does not fix SNMP Gauge32 overflow bug: Caches with 2^32 or
more objects continue to report wrong/smaller cacheNumObjCount values.
### On MemStore::getStats() and StoreInfoStats changes
To include the number of memory-cached objects while supporting SMP
configurations with shared memory caches, we had to change how cache
manager code aggregates StoreInfoStats::mem data collected from SMP
worker processes. Before these changes, `StoreInfoStats::operator +=()`
used a mem.shared data member to trigger special aggregation code hack,
but
* SNMP-specific code cannot benefit from that StoreInfoStats aggregation
because SNMP code exchanges simple counters rather than StoreInfoStats
objects. `StoreInfoStats::operator +=()` is never called by SNMP code.
Instead, SNMP uses Snmp::Pdu::aggregate() and friends.
* We could not accommodate SNMP by simply adding special aggregation
hacks directly to MemStore::getStats() because that would break
critical "all workers report about the same stats" expectations of the
special hack in `StoreInfoStats::operator +=()`.
To make both SNMP and cache manager use cases work, we removed the hack
from StoreInfoStats::operator +=() and hacked MemStore::getStats()
instead, making the first worker responsible for shared memory cache
stats reporting (unlike SMP rock diskers, there is no single kid process
dedicated to managing a shared memory cache). StoreInfoStats operator
now uses natural aggregation logic without hacks.
TODO: After these changes, StoreInfoStats::mem.shared becomes
essentially unused because it was only used to enable special
aggregation hack in StoreInfoStats that no longer exists. Remove?
Bug 5352: Do not get stuck in RESPMOD after pausing peer read(2) (#2065)
The transaction gets stuck if Squid, while sending virgin body bytes to
an ICAP RESPMOD service, temporary stops reading additional virgin body
bytes from cache_peer or origin server. Squid pauses reading (with
readSizeWanted becoming zero) if reading more virgin bytes is temporary
prohibited by delay pools and/or read_ahead_gap limits:
readReply: avoid delayRead() to give adaptation a chance to drain...
HttpStateData::readReply() starts waiting for ModXact to drain the
BodyPipe buffer, but that draining may not happen, either because
ModXact::virginConsume() is not called at all[^1] or because it is
"postponing consumption" when BodyPipe still has some unused space[^2].
With HttpStateData not reading more virgin bytes, Squid may not write
more virgin body bytes to the ICAP service, and the ICAP service may not
start or continue responding to the RESPMOD request. Without that ICAP
activity, ModXact does not consume, the virgin BodyPipe buffer is not
drained, HttpStateData is not reading, and no progress is possible.
HttpStateData::readReply() should start waiting for adaptation to drain
BodyPipe only when the buffer becomes completely full (instead of when
it is not empty). This change may increase virgin response body bytes
accumulation but not the buffer capacity because existing buffer
space-increasing logic in maybeMakeSpaceAvailable() remains intact.
To prevent stalling, both BodyPipe ends (i.e. HttpStateData and
Icap::ModXact) must use matching "progress is possible" conditions, but
* HttpStateData used hasContent()
* Icap::ModXact used spaceSize()
* Ftp::Client used potentialSpaceSize()
Now, all three use matching potentialSpaceSize()-based conditions.
Squid eCAP code is unaffected by this bug, because it does not postpone
BodyPipe consumption. eCAP API does not expose virgin body buffer
capacity, so an eCAP adapter that postpones consumption risks filling
the virgin body buffer and stalling. This is an eCAP API limitation.
[^1]: Zero readSizeWanted is reachable without delay pools, but only if
Squid receives an adapted response (that makes readAheadPolicyCanRead()
false by filling StoreEntry). Ideally, receiving an adapted response
should result in a virginConsume() calls (that would trigger BodyPipe
draining), but currently it may not. Reliably starting virgin data
consumption sooner is not trivial and deserves a dedicated change.
[^2]: ModXact postpones consumption to preserve virgin bytes for ICAP
retries and similar purposes. ModXact believes it is safe to postpone
because there is still space left in the buffer for HttpStateData to
continue to make progress. ModXact would normally start or resume
draining the buffer when sending more virgin bytes to the ICAP service.
Nil request dereference in ACLExtUser and SourceDomainCheck ACLs (#1931)
ACLExtUser-based ACLs (i.e. ext_user and ext_user_regex) dereferenced a
nil request pointer when they were used in a context without a request
(e.g., when honoring on_unsupported_protocol).
SourceDomainCheck-based ACLs (i.e. srcdomain and srcdom_regex) have a
similar bug, although we do not know whether broken slow ACL code is
reachable without a request (e.g., on_unsupported_protocol tests cannot
reach that code until that directive starts supporting slow ACLs). This
change does not start to require request presence for these two ACLs to
avoid breaking any existing configurations that "work" without one.
Portability: remove explicit check for libdl (#1963)
OpenBSD does not have libdl, as it has dlopen() in libc.
It is not really needed, and force-requiring the presence of libdl
causes ./configure to fail on openbsd:
checking for dlopen in -ldl... no
configure: error: Required library 'dl' not found
Alex Rousskov [Sat, 13 Apr 2024 08:15:00 +0000 (08:15 +0000)]
Bug 5352: Do not get stuck when RESPMOD is slower than read(2) (#1777)
... RESPMOD BodyPipe buffer becomes full ...
maybeMakeSpaceAvailable: will not read up to 0
The AsyncCall Client::noteDelayAwareReadChance constructed
... RESPMOD consumes some buffered virgin body data ...
entering BodyProducer::noteMoreBodySpaceAvailable
leaving BodyProducer::noteMoreBodySpaceAvailable
When RESPMOD does not empty its adaptation BodyPipe buffer fast enough,
readReply() may eventually fill that buffer and call delayRead(),
anticipating a noteDelayAwareReadChance() callback from Store or Server
delay pools. That callback never happens if Store and Server are not
getting any data -- they do not even start working until RESPMOD service
starts releasing adapted/echoed response back to Squid! Meanwhile, our
flags.do_next_read (cleared by readReply() caller) remains false.
When/if RESPMOD service eventually frees some BodyPipe buffer space,
triggering noteMoreBodySpaceAvailable() notification, nothing changes
because maybeReadVirginBody() quits when flags.do_next_read is false.
noteMoreBodySpaceAvailable() could not just make flags.do_next_read true
because that flag may be false for a variety of other/permanent reasons.
Instead, we replaced that one-size-fits-all flag with more specific
checks so that reading can resume if it is safe to resume it. This
change addresses a couple of flag-related XXXs.
The bug was introduced in 2023 commit 50c5af88. Prior that that change,
delayRead() was not called when RESPMOD BodyPipe buffer became full
because maybeMakeSpaceAvailable() returned false in that case, blocking
maybeReadVirginBody() from triggering readReply() via Comm::Read(). We
missed flags.do_next_read dependency and that Store-specific delayRead()
cannot be used to wait for adaptation buffer space to become available.
XXX: To reduce risks, this change duplicates a part of
calcBufferSpaceToReserve() logic. Removing that duplication requires
significant (and risky) refactoring of several related methods.
Alex Rousskov [Sun, 1 Sep 2024 00:39:34 +0000 (00:39 +0000)]
Bug 5405: Large uploads fill request buffer and die (#1887)
maybeMakeSpaceAvailable: request buffer full
ReadNow: ... size 0, retval 0, errno 0
terminateAll: 1/1 after ERR_CLIENT_GONE/WITH_CLIENT
%Ss=TCP_MISS_ABORTED
This bug is triggered by a combination of the following two conditions:
* HTTP client upload fills Squid request buffer faster than it is
drained by an origin server, cache_peer, or REQMOD service. The buffer
accumulates 576 KB (default 512 KB client_request_buffer_max_size + 64
KB internal "pipe" buffer).
* The affected server or service consumes a few bytes after the critical
accumulation is reached. In other words, the bug cannot be triggered
if nothing is consumed after the first condition above is met.
Comm::ReadNow() must not be called with a full buffer: Related
FD_READ_METHOD() code cannot distinguish "received EOF" from "had no
buffer space" outcomes. Server::readSomeData() tried to prevent such
calls, but the corresponding check had two problems:
* The check had an unsigned integer underflow bug[^1] that made it
ineffective when inBuf length exceeded Config.maxRequestBufferSize.
That length could exceed the limit due to reconfiguration and when
inBuf space size first grew outside of maybeMakeSpaceAvailable()
protections (e.g., during an inBuf.c_str() call) and then got filled
with newly read data. That growth started happening after 2020 commit 1dfbca06 optimized SBuf::cow() to merge leading and trailing space.
Prior to that commit, Bug 5405 could probably only affect Squid
reconfigurations that lower client_request_buffer_max_size.
* The check was separated from the ReadNow() call it was meant to
protect. While ConnStateData was waiting for the socket to become
ready for reading, various asynchronous events could alter inBuf or
Config.maxRequestBufferSize.
This change fixes both problems.
This change also fixes Squid Bug 5214.
[^1]: That underflow bug was probably introduced in 2015 commit 4d1376d7
while trying to emulate the original "do not read less than two bytes"
ConnStateData::In::maybeMakeSpaceAvailable() condition. That condition
itself looks like a leftover from manual zero-terminated input buffer
days that ended with 2014 commit e7287625. It is now removed.
Alex Rousskov [Tue, 31 Dec 2024 20:40:46 +0000 (20:40 +0000)]
Clarify --enable-ecap failure on missing shared library support (#1968)
checking if libtool supports shared libraries... no
checking whether to build shared libraries... no
configure: error: eCAP support requires loadable modules.
Please do not use --disable-shared with --enable-ecap.
After 2022 commit 5a2409b7, our advice for handling the above error
became misleading in environments that do not --disable-shared
explicitly but lack shared libraries support for other reasons
Bug 5093: List http_port params that https_port/ftp_port lack (#1977)
To avoid documentation duplication, current https_port and ftp_port
directive descriptions reference http_port directive instead of
detailing their own supported parameters. For https_port, this solution
creates a false impression that the directive supports all http_port
options. Our ftp_port documentation is better but still leaves the
reader guessing which options are actually supported.
This change starts enumerating http_port configuration parameters that
ftp_port and https_port directives do _not_ support. Eventually, Squid
should reject configurations with unsupported listening port options.
Alex Rousskov [Wed, 13 Nov 2024 18:16:08 +0000 (18:16 +0000)]
Annotate PoolMalloc memory in valgrind builds (#1946)
MemPoolMalloc code (i.e. memory_pools code used by default) was missing
VALGRIND_MAKE_MEM_*() calls. Similar calls do exist in MemPoolChunked
code (i.e. code enabled by setting MEMPOOLS environment variable to 1).
Even with these markings, "memory_pools on" configuration is still not
quite compatible with Valgrind leak reporting suppressions: In some
cases, Valgrind may still incorrectly suppress leak reporting (or report
leaks that should have been suppressed) because Valgrind associates leak
suppressions with memory _allocators_ while buggy code may leak memory
allocated by others. The long-term solution (if it exists) requires
upgrading these markings to VALGRIND_MEMPOOL_*() API targeting memory
pools, but that requires a serious effort, especially when dealing with
MemPoolChunked complexities. The added markings help detect non-leak
violations and improve PoolMalloc/MemPoolChunked code symmetry.
Amos Jeffries [Tue, 5 Nov 2024 16:46:41 +0000 (16:46 +0000)]
Fix systemd startup sequence to require active Local Filesystem (#1937)
Squid requires Local Filesystem to be active. While uncommon, it
may in some cases be incomplete or delayed. Ensure that the
dependency is explicitly listed to prevent failure from Squid
early initialization.
This change resolves Debian Bug 956581:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956581
CI: Sync GitHub Actions config with master as of commit 240efcb (#1922)
Test staged commits using our docker images from Jenkins tests.
The added tests (see slow.yaml) were easier to adopt; we are also
working on enabling GitHub Actions for FreeBSD and some other images.
We do not run these added tests for PR commits because existing Ubuntu
tests (defined in quick.yaml) already expose the vast majority of build
problems, and we are worried that running a lot more tests for each PR
push event would consume too much GitHub resources and significantly
increase noise in PR checks summary, obscuring often-easier-to-handle
failures detected by quick.yaml tests.
----
Backported all v7 GitHub Actions changes as of master commit 240efcb,
except those related to master-specific CodeQL and MacOS tests.
Supporting logging to a file complicates upgrading helper code to use
debugs() because DebugFile code calls commSetCloseOnExec(), and our
comm/libminimal does not currently provide a functioning implementation
for that API: The existing implementation is an unconditional assert.
To save development time while upgrading helpers, we are dropping this
feature. It can probably be emulated using shell redirection tricks.
Fix validation of Digest auth header parameters (#1906)
Insufficient validation of Digest authentication parameters resulted in
a DigestCalcHA1() call that dereferenced a nil pointer.
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/ where it was filed as
"strlen(NULL) Crash Using Digest Authentication".
Bootstrap libltdl to fix libtool v2.4 + automake v1.17 (#1900)
Bootstrap libltdl to fix libtool v2.4 + automake v1.17 build
gmake[3]: Entering directory .../libltdl
.../cfgaux/missing: line 85: aclocal-1.16: command not found
gmake[3]: *** [Makefile:561: .././../libltdl/aclocal.m4]
Error 127
During bootstrap.sh run, libtoolize copies prepackaged
configure and Makefile.in files into our libltdl directory:
* libltdl/configure from libtool v2.4 has aclocal version set
to 1.16;
* libltdl/Makefile.in from libtool v2.4 uses
configure-set aclocal version to build aclocal.m4
Thus, libltdl/Makefile (generated from libltdl/Makefile.in
above) runs aclocal-1.16 if "make" needs to build
libltdl/aclocal.m4.
Normally, "make" does not need to build libltdl/aclocal.m4
because that file was created by libtoolize. However, libtool
v2.4 is packaged with (generated by packaging folks)
libltdl/Makefile.in that makes libltdl/aclocal.m4 target
dependent on files in libltld/../m4 directory. Squid does not
have that ./m4 directory, so "make" attempts to re-generate
libltdl/aclocal.m4. When it does, it uses aclocal-1.16.
Our bootstrap.sh generated new ./configure but preserved
copied libltdl/configure with its aclocal version set to
1.16. In other words, our bootstrap.sh did not bootstrap
libltdl sub-project. In build environments without
aclocal-1.16, Squid build failed.
Several solutions or workarounds have been tried or
considered:
* Adjust bootstrap.sh to bootstrap libltdl (this change).
2008 attempt to do that was reverted in commit bfd6b6a9 with
"better to fix libtool installation" rationale. Another
potential argument against this option is that packages
should be bootstrapped by their distributors, not "users". We
are not distributing libtool, but this is a gray area because
we do distribute files that libtoolize creates. Finally,
libtool itself does not provide a bootstrapping script and
does not explicitly recommend bootstrapping in documentation.
* "Fix libtool installation". We failed to find a good way to
do that on MacOS (without building and installing newer
libtool from sources).
* Place m4 files where libtool v2.4 expects to find them.
That change fixes MacOS builds that use automake v1.17, but
breaks Gentoo builds because Gentoo libtool installs a buggy
libltdl/Makefile.in that must be regenerated by automake
before it can work. Fixing m4 files location prevents that
regeneration.
We picked the first option despite its drawbacks because the
third option did not work on Gentoo, and asking Squid
developers to install libtool from sources (i.e. the second
option) felt like a greater evil.
This old problem was exposed by recently introduced CI MacOS
tests that started to fail when MacOS brew updated automake
package from v1.16 without the corresponding libtoolize
package changes.
Also work around what seems to be a libtool packaging bug
affecting MacOS/brew environments, including GitHub Actions
runners we use for CI:
That workaround will be removed after libtool package is
fixed.
Also removed a single-iteration "for dir" loop with several
stale hacks from bootstrap.sh: With only two directories to
bootstrap and with a directory-specific mkdir command, source
comments, and progress messages, it is best to unroll that
loop.
Use git to extract default build-info (when enabled) (#1892)
Have configure option --enable-build-info[=yes]
refer to git instead of bazaar to extract
information about what is being built to include
in the output of `squid -v`
This change fixes a few manually picked cases in sources that are not
normally compiled on our CI nodes. There are probably more missing
declarations, but we do not know how to find them automatically.
Also removed a few seemingly unused functions and declarations.
This flag became effectively unused in 2010 commit cfd66529 when
copyFDFlags() -- the only function checking the flag -- became unused.
That unused function was removed a bit later in commit 5ae21d99.
Alex Rousskov [Wed, 10 Jul 2024 10:30:33 +0000 (10:30 +0000)]
Recover after failing to open a TCP connection to DNS server (#1861)
ERROR: Failed to connect to nameserver 127.0.0.1 using TCP.
After failing to establish a TCP connection to a DNS server, all DNS
queries that needed a TCP connection to that DNS server would timeout
because the nsvc object representing TCP connectivity got stuck in a
"queuing new queries but too busy to send any right now" state. Such
timeouts typically lead to HTTP 503 ERR_DNS_FAIL responses. This bug was
introduced when Comm closure handler registration was moved/delayed in
2010 commit cfd66529.
With this change, the affected nsvc object is destroyed, and Squid
attempts to open another TCP connection to the DNS server (when needed).
The original query is typically retried (subject to dns_timeout and
dns_retransmit_interval idiosyncrasies).
XXX: This fix increases the surface of reconfiguration and shutdown
problems documented in nsvc class destructor XXX.
Alex Rousskov [Mon, 15 Jul 2024 21:59:54 +0000 (21:59 +0000)]
Do not drop queued TCP DNS queries when queuing a new one (#1863)
Since 2005 commit 032785bf, previously queued serialized DNS query
(waiting for a TCP connection to the nameserver or for a write on an
already established TCP connection) was erased every time a new query
was serialized and appended to the `vc->queue` MemBuf. Thus, at most one
TCP query was submitted when the nameserver became available.
Non-serialized versions of erased queries remained on the DNS lru_list
and would eventually be retried or timeout. This change restores
conditional MemBuf initialization that was replaced with an
unconditional MemBuf reset in that 2005 commit.
It would take more than 5 hours of 1000/s unique request load for
100-byte DNS queries to overflow a 2GB MemBuf in a properly functioning
Squid, but this change does not rely on MEM_BUF_MAX_SIZE always being
large enough. New code prevents MemBuf::grow() assertions and informs
the admin about dropped queries that may indicate a Squid bug or a
persistent problem with nameserver communication.
Alex Rousskov [Thu, 11 Jul 2024 20:04:06 +0000 (20:04 +0000)]
Be more careful when updating nameservers global (#1862)
Due to I/O delays and timeouts, DNS nsvc objects may be deleted after
the `nameservers` global pointing to them was cleared and then populated
with new ns objects. Thus, nsvc destructor should check that nsvc and
`nameservers` states are still in sync before manipulating the latter.
DNS code should use similar validation in a few other places, but they
are all about read-only debugging that requires a rather noisy cleanup.
TrieNode::add() incorrectly computed an offset of an internal data
structure, resulting in out-of-bounds memory accesses that could cause
corruption or crashes.
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/esi-underflow.html
where it was filed as "Buffer Underflow in ESI".
Alex Rousskov [Mon, 20 May 2024 14:50:19 +0000 (14:50 +0000)]
Docs: REQUIRED in ident_regex, proxy_auth_regex, ext_user_regex (#1818)
The three ACLs were documented as matching any username when configured
with a parameter spelled "REQUIRED". Neither actually treated that
parameter specially -- all interpreted it as an ordinary regex.
This dangerous documentation bug was introduced in 2000 commit 145cf92
that added ident_regex and proxy_auth_regex support. It was then
duplicated in 2003 commit abb929f that added ext_user_regex support.
This minimal documentation fix does not imply that these ACLs should not
treat REQUIRED values specially. Enabling such special treatment
requires significant code changes, especially if we want to do that well
and without duplicating the corresponding code.
squidadm [Tue, 21 May 2024 20:46:47 +0000 (08:46 +1200)]
Fix build with clang v18 [-Wvla-cxx-extension] (#1813) (#1817)
src/fs/rock/RockRebuild.cc:356:17: error: variable length arrays
in C++ are a Clang extension [-Werror,-Wvla-cxx-extension]
char hdrBuf[SwapDir::HeaderSize];
note: initializer of 'HeaderSize' is unknown
Co-authored-by: Francesco Chemolli <5175948+kinkie@users.noreply.github.com>
When FwdState decides to re-forward a request, it forgets the original
response[^1] but does not create an error object. Since commit e2bbd3b,
FwdState::noteDestinationsEnd() correctly assumed that we only idly wait
for additional destinations after encountering a problem, but
incorrectly asserted that past problems imply error object existence.
Now Squid generates an HTTP 502 (Bad Gateway) response while setting
%err_code/%err_detail to ERR_CANNOT_FORWARD/REFORWARD_TO_NONE.
TunnelStateData::noteDestinationsEnd() code is similar, but it probably
does not suffer from the same bug because an error object is created
before every retryOrBail() call, and there is no re-forwarding code that
forgets an HTTP error response without creating an error. Those
invariants are not well tracked, and this change mimics FwdState changes
in TunnelStateData just in case and to keep the two methods in sync. In
those hypothetical cases, ERR_CANNOT_FORWARD/RETRY_TO_NONE is logged.
[^1]: Long-term we probably want to preserve that original response so
that we do not have to replace it with a generated error, but doing so
requires significant refactoring and is outside this minimal fix scope.
Alex Rousskov [Wed, 5 Jul 2023 14:41:47 +0000 (14:41 +0000)]
Do not leak memory when handling cache manager requests (#1408)
Also adjusted Cache-Control APIs to prevent similar bugs. These changes
also speed up processing a bit and simplify most of the affected code.
The now-gone "just remove the old CC" putCc() misfeature was unused.
The leak was introduced by commit 92a5adb: PutCommonResponseHeaders()
incorrectly assumed that putCc(pointerToX) takes ownership of X.
Detected by Coverity. CID 1534779: Resource leak (RESOURCE_LEAK).
Alex Rousskov [Fri, 1 Mar 2024 22:20:20 +0000 (22:20 +0000)]
Bug 5069: Keep listening after getsockname() error (#1713)
ERROR: Stopped accepting connections:
error: getsockname() failed to locate local-IP on ...
In many cases, these failures are intermittent client-triggered errors
(e.g., client shut down the accepted socket); Squid will successfully
accept other connections and, hence, should keep listening for them.
POSIX.1-2001 marks vfork(2) OBSOLETE.
POSIX.1-2008 removes the specification of vfork(2).
MacOS system headers declare vfork(2) as deprecated.
We only use vfork(2) in negotiate_wrapper, where it is not necessary.
Alex Rousskov [Fri, 23 Feb 2024 04:26:38 +0000 (04:26 +0000)]
Fix marking of problematic cached IP addresses (#1691)
Since inception in 2017 commit fd9c47d, Dns::CachedIps::have() always
returned position zero after finding a matching IP address (at zero or
positive position). The bug affected two callers:
* markAsBad() always marked the first stored address (as bad);
* forgetMarking() always cleared the first stored address marking.
Buggy markings led to Squid sometimes not attempting to use a working
address (e.g., IPv4) while using a known problematic one (e.g., IPv6).
Alex Rousskov [Sun, 18 Feb 2024 00:45:41 +0000 (00:45 +0000)]
Fix debugging for responses that Expire at check time (#1683)
Since 2000 commit 65fa5c6, our level-3 debugging mislead about Expires
being less than the check time when the two times were identical. The
actual checked conditions are correct: Roughly speaking, the response
with Expires value T is considered expired at that time T.
Also dropped extra (and inconsistent) trailing space on debugs() lines.
This space was added by the same 2000 commit, probably accidentally.
Alex Rousskov [Fri, 16 Feb 2024 04:03:40 +0000 (04:03 +0000)]
Bug 5344: mgr:config segfaults without logformat (#1680)
Since 2011 commit 38e16f9, Log::LogConfig::dumpFormats() dereferenced a
nil `logformats` pointer while reporting a non-existent logformat
configuration (e.g., squid.conf.default): `logformats->dump(e, name)`.
In most environments, that code "worked" because the corresponding
Format::Format::dump() method happens to do nothing if "this" is nil.
However, in some environments, Squid segfaulted.
Alex Rousskov [Thu, 8 Feb 2024 22:03:44 +0000 (22:03 +0000)]
Fix max-stale in default refresh_pattern (#1664)
RefreshPattern constructor must set data fields to honor refresh_pattern
defaults promised in squid.conf.documented. For max-stale, that implies
making max_stale negative. A negative value allows refreshCheck() to use
a max_stale directive value (i.e. Config.maxStale that defaults to 1
week). The buggy constructor set max_stale to 0 instead and, hence,
refreshCheck() ignored max_stale directive when no refresh_pattern rules
were configured.
The fixed bug did not affect Squids configured using explicit
refresh_pattern rules because those rules are handled by
parse_refreshpattern() which sets max_stale to -1 by default. Our
squid.conf.default does have explicit refresh_pattern rules.
Alex Rousskov [Tue, 31 Oct 2023 11:35:02 +0000 (11:35 +0000)]
Fix infinite recursion when parsing HTTP chunks (#1553)
This change stops infinite HttpStateData recursion with at-max-capacity
inBuf. Such inBuf prevents progress in the following call chain:
* processReply()
* processReplyBody() and decodeAndWriteReplyBody()
* maybeReadVirginBody()
* maybeMakeSpaceAvailable() -- tries but fails to quit processing
* processReply()
HttpStateData::maybeMakeSpaceAvailable() no longer calls processReply(),
preventing recursion.
maybeReadVirginBody() now aborts transactions that would otherwise get
stalled due to full read buffer at its maximum capacity. This change
requires that all maybeReadVirginBody() callers do actually need more
response data to make progress. AFAICT, that (natural) invariant holds.
We moved transaction stalling check from maybeMakeSpaceAvailable() into
its previous callers. Without that move, maybeMakeSpaceAvailable() would
have to handle both abortTransaction() and delayRead() cases. Besides
increased code complexity, that would trigger some premature delayRead()
calls (at maybeReadVirginBody() time). Deciding whether to delay socket
reads is complicated, the delay mechanism is expensive, and delaying may
become unnecessary by the time the socket becomes readable, so it is
best to continue to only delayRead() at readReply() time, when there is
no other choice left.
maybeReadVirginBody() mishandled cases where progress was possible, but
not _immediately_ -- it did nothing in those cases, probably stalling
transactions when maybeMakeSpaceAvailable() returned false but did not
call processReply(). This is now fixed: maybeReadVirginBody() now starts
waiting for the socket to be ready for reading in those cases,
effectively passing control to readReply() that handles them.
maybeReadVirginBody() prematurely grew buffer for future socket reads.
As a (positive) side effect of the above refactoring, we now delay
buffer growth until the actual read(2) time, which is best for
performance. Most likely, this premature buffer growth was an accident:
maybeReadVirginBody() correctly called maybeMakeSpaceAvailable() with
doGrow set to false. However, maybeMakeSpaceAvailable() misinterpreted
doGrow as a "do not actually do it" parameter. That bug is now gone.
This recursion bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/
where it was filed as "Chunked Encoding Stack Overflow".
Alexey [Mon, 29 Jan 2024 19:47:41 +0000 (19:47 +0000)]
Fix memory leak in ssl/gadgets/mimicAuthorityKeyId() (#1651)
An unnecessary std::unique_ptr::release() call prevented temporary
extOct string from being automatically deallocated. The leak usually
happened when SslBump mimicked certificates with an Authority Key
Identifier extension. The leak was added in 2016 commit 5f1318b.
Alexey [Sun, 21 Jan 2024 16:24:57 +0000 (16:24 +0000)]
NTLM/Negotiate: Fix crash on bad helper TT responses (#1645)
Helper lookup may be made without a client HTTP Request,
(stored in lm_request->request). But in Helper::TT cases the
lm_request->request was dereferenced without any checks.
Alex Rousskov [Sat, 9 Dec 2023 04:46:55 +0000 (04:46 +0000)]
Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 (#1608)
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.
Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.
The client-Squid connection closure callback, clientClosed(), already
uses the same logic.
This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.
[^1]: The fact that severClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.