aafbsd [Thu, 26 Jun 2025 18:57:19 +0000 (18:57 +0000)]
Bug 5497: Fix detection of duped IPs returned by getaddrinfo() (#2100)
WARNING: Ignoring <IP X> because it is already covered by <IP X>
Affects `src`, `dst`, and `localip` ACLs, especially those that use
domain names with multiple DNS A or AAAA records.
IP addresses returned by getaddrinfo(3) may not be sorted (e.g., when a
host has multiple DNS A RRs on FreeBSD). Instead of comparing the
current address with just the previous one, we now check all previously
added addresses (while processing a single getaddrinfo() call output).
This surgical fix minimizes changes without improving surrounding code.
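The difference between comparing with just the previous address and checking all previously added addresses can be illustrated with a minimal sketch. It uses plain strings instead of Squid address classes; `AddIfNew()` and its `added` parameter are hypothetical names, not Squid code:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends ip unless it was already added while
// processing the same lookup result. Returns true if ip was added.
inline bool
AddIfNew(std::vector<std::string> &added, const std::string &ip)
{
    assert(!ip.empty());
    // Check every previously added address, not just added.back():
    // resolver output may be unsorted, so duplicates need not be adjacent.
    if (std::find(added.begin(), added.end(), ip) != added.end())
        return false; // duplicate; the caller may skip it
    added.push_back(ip);
    return true;
}
```

With unsorted input such as A, B, A, comparing only with the previous address would miss the second A, while the loop-free `std::find()` check above rejects it.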
Amos Jeffries [Sun, 22 Jun 2025 00:48:05 +0000 (00:48 +0000)]
Fix missing CONTRIBUTOR name (#2091)
Despite the CONTRIBUTORS file being UTF-8, Ture's name
was dropped by our conversion script, which does not
handle non-ASCII characters nicely. Re-add it manually.
Do not duplicate received Surrogate-Capability in sent requests (#2087)
When computing Surrogate-Capability header while forwarding an
accelerated request, Squid duplicated old (i.e. received) header entries
(if any). For example, this outgoing request shows an extra hop1 entry:
GET / HTTP/1.1
...
Surrogate-Capability: hop1="Surrogate/1.0"
Surrogate-Capability: hop1="Surrogate/1.0", hop2="Surrogate/1.0"
Amos Jeffries [Mon, 9 Jun 2025 04:49:04 +0000 (04:49 +0000)]
CI: Fix gperf 3.2 output filter (#2081)
gperf 3.2 now provides fallthrough attributes that are properly scoped
by compiler and C++ version. Our filter to convert the gperf 3.1 and
older output for C++17 attribute requirements is now broken and
produces compiler errors due to listing '[[fallthrough]];' on two
consecutive lines.
Fix OpenSSL build with GCC v15.1.1 [-Wformat-truncation=] (#2077)
On arm64 Fedora 42:
src/ssl/crtd_message.cc:132:39: error: '%zd' directive output may be
truncated writing between 1 and 19 bytes into a region
of size 10 [-Werror=format-truncation=]
snprintf(buffer, sizeof(buffer), "%zd", body.length());
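For context, a 64-bit size_t value may need up to 20 decimal digits plus a terminating NUL, which is why the compiler flags a 10-byte buffer. The sketch below shows one possible way to size such a buffer from the type itself. `LengthToText()` is a hypothetical helper, and the actual fix in this commit may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper: formats a length as decimal text.
inline std::string
LengthToText(const std::size_t length)
{
    // digits10 + 1 is the maximum number of decimal digits; +1 more for NUL
    char buffer[std::numeric_limits<std::size_t>::digits10 + 2];
    const int written = std::snprintf(buffer, sizeof(buffer), "%zu", length);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buffer));
    return std::string(buffer, static_cast<std::size_t>(written));
}
```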
Carl Vuosalo [Wed, 28 May 2025 19:17:38 +0000 (19:17 +0000)]
Fix SNMP cacheNumObjCount -- number of cached objects (#2053)
SNMP counter cacheNumObjCount used StoreEntry::inUseCount() stats. For
Squid instances using rock cache_dirs or a shared memory cache, the
number of StoreEntry objects in use is usually very different from the
number of cached objects because these caches do not use StoreEntry
objects as a part of their index. For all instances, inUseCount() also
includes ongoing transactions and internal tasks that are not related to
cached objects at all.
We now use the sum of the counters already reported on "on-disk objects"
and "Hot Object Cache Items" lines in "Internal Data Structures" section
of `mgr:info` cache manager report. Due to floating-point arithmetic,
these stats are approximate, but it is best to keep SNMP and cache
manager reports consistent.
This change does not fix SNMP Gauge32 overflow bug: Caches with 2^32 or
more objects continue to report wrong/smaller cacheNumObjCount values.
### On MemStore::getStats() and StoreInfoStats changes
To include the number of memory-cached objects while supporting SMP
configurations with shared memory caches, we had to change how cache
manager code aggregates StoreInfoStats::mem data collected from SMP
worker processes. Before these changes, `StoreInfoStats::operator +=()`
used a mem.shared data member to trigger special aggregation code hack,
but
* SNMP-specific code cannot benefit from that StoreInfoStats aggregation
because SNMP code exchanges simple counters rather than StoreInfoStats
objects. `StoreInfoStats::operator +=()` is never called by SNMP code.
Instead, SNMP uses Snmp::Pdu::aggregate() and friends.
* We could not accommodate SNMP by simply adding special aggregation
hacks directly to MemStore::getStats() because that would break
critical "all workers report about the same stats" expectations of the
special hack in `StoreInfoStats::operator +=()`.
To make both SNMP and cache manager use cases work, we removed the hack
from StoreInfoStats::operator +=() and hacked MemStore::getStats()
instead, making the first worker responsible for shared memory cache
stats reporting (unlike SMP rock diskers, there is no single kid process
dedicated to managing a shared memory cache). StoreInfoStats operator
now uses natural aggregation logic without hacks.
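The "first worker reports shared stats" idea can be sketched as follows. This is a simplified illustration under stated assumptions: `ToyMemStats` and `ReportMemStats()` are hypothetical stand-ins, not the actual StoreInfoStats or MemStore::getStats() code:

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for per-worker memory cache stats.
struct ToyMemStats {
    double count = 0; // number of memory-cached objects

    // "natural" aggregation: plain addition without special cases
    ToyMemStats &operator +=(const ToyMemStats &other) {
        count += other.count;
        return *this;
    }
};

// Hypothetical reporter: with a shared memory cache, only the first worker
// reports the shared totals. Other workers report nothing, so a plain sum
// across workers does not count the same shared objects repeatedly.
inline ToyMemStats
ReportMemStats(const int workerId, const bool sharedCache, const double cachedObjects)
{
    assert(workerId >= 1);
    ToyMemStats stats;
    if (!sharedCache || workerId == 1)
        stats.count = cachedObjects;
    return stats;
}
```

Because each reporter decides what to contribute, the same plain summation works both for aggregated objects and for simple per-worker counters.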
TODO: After these changes, StoreInfoStats::mem.shared becomes
essentially unused because it was only used to enable special
aggregation hack in StoreInfoStats that no longer exists. Remove?
Bug 5352: Do not get stuck in RESPMOD after pausing peer read(2) (#2065)
The transaction gets stuck if Squid, while sending virgin body bytes to
an ICAP RESPMOD service, temporarily stops reading additional virgin body
bytes from cache_peer or origin server. Squid pauses reading (with
readSizeWanted becoming zero) if reading more virgin bytes is temporarily
prohibited by delay pools and/or read_ahead_gap limits:
readReply: avoid delayRead() to give adaptation a chance to drain...
HttpStateData::readReply() starts waiting for ModXact to drain the
BodyPipe buffer, but that draining may not happen, either because
ModXact::virginConsume() is not called at all[^1] or because it is
"postponing consumption" when BodyPipe still has some unused space[^2].
With HttpStateData not reading more virgin bytes, Squid may not write
more virgin body bytes to the ICAP service, and the ICAP service may not
start or continue responding to the RESPMOD request. Without that ICAP
activity, ModXact does not consume, the virgin BodyPipe buffer is not
drained, HttpStateData is not reading, and no progress is possible.
HttpStateData::readReply() should start waiting for adaptation to drain
BodyPipe only when the buffer becomes completely full (instead of when
it is not empty). This change may increase virgin response body bytes
accumulation but not the buffer capacity because existing buffer
space-increasing logic in maybeMakeSpaceAvailable() remains intact.
To prevent stalling, both BodyPipe ends (i.e. HttpStateData and
Icap::ModXact) must use matching "progress is possible" conditions, but
* HttpStateData used hasContent()
* Icap::ModXact used spaceSize()
* Ftp::Client used potentialSpaceSize()
Now, all three use matching potentialSpaceSize()-based conditions.
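A toy model may help show why mismatched conditions can stall. The sketch assumes that spaceSize() means free space in the current allocation and that potentialSpaceSize() means free space if the buffer grows to its limit. `ToyPipeBuffer`, `ProducerMustWait()`, and `ConsumerMayPostpone()` are hypothetical names, not Squid code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of a growable pipe buffer.
struct ToyPipeBuffer {
    std::size_t size = 0;        // bytes currently stored
    std::size_t capacity = 0;    // bytes currently allocated
    std::size_t maxCapacity = 0; // allocation limit

    bool hasContent() const { return size > 0; }
    // free space without growing the allocation
    std::size_t spaceSize() const { return capacity - size; }
    // free space if the buffer grows to its limit
    std::size_t potentialSpaceSize() const { return maxCapacity - size; }
};

// Producer side: pause reading only when no more bytes can be accepted.
inline bool
ProducerMustWait(const ToyPipeBuffer &buf)
{
    assert(buf.size <= buf.capacity && buf.capacity <= buf.maxCapacity);
    return buf.potentialSpaceSize() == 0;
}

// Consumer side: postponing consumption is safe only while the producer
// can still add bytes.
inline bool
ConsumerMayPostpone(const ToyPipeBuffer &buf)
{
    return !ProducerMustWait(buf);
}
```

In this model, a buffer with 10 stored bytes and 32 allocated bytes satisfies both hasContent() and spaceSize() > 0, so a producer waiting on the former and a consumer postponing on the latter would wait for each other. The two functions above share one condition, so at least one side can always proceed.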
Squid eCAP code is unaffected by this bug, because it does not postpone
BodyPipe consumption. eCAP API does not expose virgin body buffer
capacity, so an eCAP adapter that postpones consumption risks filling
the virgin body buffer and stalling. This is an eCAP API limitation.
[^1]: Zero readSizeWanted is reachable without delay pools, but only if
Squid receives an adapted response (that makes readAheadPolicyCanRead()
false by filling StoreEntry). Ideally, receiving an adapted response
should result in virginConsume() calls (that would trigger BodyPipe
draining), but currently it may not. Reliably starting virgin data
consumption sooner is not trivial and deserves a dedicated change.
[^2]: ModXact postpones consumption to preserve virgin bytes for ICAP
retries and similar purposes. ModXact believes it is safe to postpone
because there is still space left in the buffer for HttpStateData to
continue to make progress. ModXact would normally start or resume
draining the buffer when sending more virgin bytes to the ICAP service.
Amos Jeffries [Sun, 18 May 2025 06:39:04 +0000 (06:39 +0000)]
Maintenance: Remove shared LDADD (#2058)
Most built binaries have a distinct set of dependencies and already have
their own foo_LDADD variables. Add a few variables to cover the
remaining binaries and stop setting an (incomplete) LDADD global.
Also removed unnecessary EXTRA_PROGRAMS because mem_node_test and splay
binaries are built unconditionally.
The FreeBSD project has promoted version 14.2 to stable.
Some packages we use are not compatible with version 14.1.
Upgrade the reference version we use; the action supports it.
store/Disks.cc:690: error: argument 1 value 18446744073709551615
exceeds maximum object size 9223372036854775807
[-Werror=alloc-size-larger-than=]
const auto tmp = new SwapDir::Pointer[swap->n_allocated];
pconn.cc:43:53: error: argument 1 value 18446744073709551615 ...
theList_ = new Comm::ConnectionPointer[capacity_];
Andreas Weigel [Thu, 13 Mar 2025 11:30:28 +0000 (11:30 +0000)]
Fix tls-dh support for DHE parameters with OpenSSL v3+ (#1949)
# When applying tls-dh=prime256v1:dhparams.pem configuration:
WARNING: Failed to decode EC parameters 'dhparams.pem'
# When forcing the use of FFDHE with something like
# openssl s_client -tls1_2 -cipher DHE-RSA-AES256-SHA256 -connect...
ERROR: failure while accepting a TLS connection on:
SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=A0000C1+TLS_IO_ERR=1
Squid `https_port ... tls-dh=curve:dhparams.pem` configuration is
supposed to support _both_ ECDHE and FFDHE key exchange mechanisms (and
their cipher suites), depending on client-supported cipher suites. ECDHE
mechanism should use the named curve (e.g., `prime256v1`), and FFDHE
mechanism should use key exchange parameters loaded from the named PEM
file (e.g., `ffdhe4096` named group specified in RFC 7919).
When 2022 commit 742236c added support for OpenSSL v3 APIs, new
loadDhParams() code misinterpreted curve name presence in `tls-dh` value
as an indication that the named parameters file contains ECDHE
parameters, setting OSSL_DECODER_CTX_new_for_pkey() type parameter to
"EC", and (when parameter file specified FFDHE details) triggering the
WARNING message quoted above.
Squid should not expect additional ECDHE parameters when the elliptic
curve group is already fully specified by naming it at the start of
`tls-dh` value. Squid now reverts to earlier (v4) behavior, where
the two mechanisms can coexist and can be configured separately as
described above.
Furthermore, updateContextEecdh() code in commit 742236c continued to
load parsed parameters using old SSL_CTX_set_tmp_dh() call but should
have used SSL_CTX_set0_tmp_dh_pkey() API because the type of parsed
parameters (i.e. DhePointer) has changed from DH to EVP_PKEY pointer.
This second bug affected configurations with and without an explicit
curve name in `tls-dh` value.
Also report a failure to load parsed parameters into TLS context.
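The intended `tls-dh=[curve:]file` interpretation can be sketched as a simple split. This is an illustration only: `ToyTlsDh` and `SplitTlsDh()` are hypothetical names, and actual Squid parsing code is not shown here:

```cpp
#include <cassert>
#include <string>

// Hypothetical result of splitting a tls-dh=[curve:]file value.
struct ToyTlsDh {
    std::string curve;      // ECDHE named curve; empty if not configured
    std::string paramsFile; // PEM file with FFDHE parameters
};

inline ToyTlsDh
SplitTlsDh(const std::string &value)
{
    assert(!value.empty());
    ToyTlsDh result;
    const auto colon = value.find(':');
    if (colon == std::string::npos) {
        result.paramsFile = value; // no curve name: the value names the file
    } else {
        // The curve name configures ECDHE by itself; its presence says
        // nothing about the type of parameters stored in the file.
        result.curve = value.substr(0, colon);
        result.paramsFile = value.substr(colon + 1);
    }
    return result;
}
```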
CI: Do not classify "no failures" stats as test-build errors (#2001)
CppUnit tests emit a lot of "FAIL: 0" and "XFAIL: 0" lines, which are
incorrectly classified as errors by the test-builds.sh. Filter these
messages out as they are not indicative of problems.
MinGW: use nameless unions in ext_ad_group_acl (#2004)
ext_ad_group_acl was written in 2008 in C, and
it used the C variant of the Win32 API.
It was then ported to C++, but the API callers were
not updated to the C++ version of the API.
With more modern compilers, and
Squid enforcing more strict types and error handling,
it is no longer compiling.
This is part 1 of 2 of the fixes to make the helper build
again. The scope is to update Win32 API callers so that they
use the C++ version of the API.
Examples of fixed errors:
error: 'IADs' {aka 'struct IADs'} has no member named 'lpVtbl'
error: 'VARIANT' {aka 'struct tagVARIANT'} has no member named 'n1'
On Windows, mkdir only takes one argument.
compat/mswindows.h has an adapter, add it to
compat/mingw.h as well.
Solves error:
```
UFSSwapDir.cc:617:26: error: too many arguments
to function 'int mkdir(const char*)'
mingw/include/io.h:282:15: note: declared here
int __cdecl mkdir (const char *);
```
The AIO Windows compatibility layer is also
necessary on MinGW.
Problems fixed:
```
DiskIO/AIO/async_io.h:58:18:
error: field 'aq_e_aiocb' has incomplete type 'aiocb'
DiskIO/AIO/async_io.h:58:12:
note: forward declaration of 'struct aiocb'
DiskIO/AIO/AIODiskFile.cc:
In member function
'virtual void AIODiskFile::read(ReadRequest*)':
src/DiskIO/AIO/AIODiskFile.cc:134:9:
error: 'aio_read' was not declared in this scope;
did you mean 'file_read' ?
```
ntlm_sspi_auth: Fix missing base64 symbol linkage (#2031)
Solve build error:
```
ld: ntlm_sspi_auth.o: in function `token_decode':
undefined reference to `nettle_base64_decode_init'
undefined reference to `nettle_base64_decode_update'
undefined reference to `nettle_base64_decode_final'
```
Level-2 "Tunnel Server RESPONSE:..." debugs() incorrectly assumed that
its readBuf parameter contained hdr_sz header bytes. In reality, by the
time code reached that debugs(), readBuf no longer had any header bytes
(and often had no bytes at all). Besides broken header dumps, this bug
could lead to problems that Valgrind reports as "Conditional jump or
move depends on uninitialised value" in DebugChannel::writeToStream().
This fix mimics HttpStateData::processReplyHeader() reporting code,
including its known problems. Future changes should address those
problems and reduce code duplication across at least ten functions
containing similar "decorated" level-2 message dumps.
Bug 5093: List http_port params that https_port/ftp_port lack (#1977)
To avoid documentation duplication, current https_port and ftp_port
directive descriptions reference http_port directive instead of
detailing their own supported parameters. For https_port, this solution
creates a false impression that the directive supports all http_port
options. Our ftp_port documentation is better but still leaves the
reader guessing which options are actually supported.
This change starts enumerating http_port configuration parameters that
ftp_port and https_port directives do _not_ support. Eventually, Squid
should reject configurations with unsupported listening port options.
Alex Rousskov [Tue, 31 Dec 2024 21:59:05 +0000 (21:59 +0000)]
Work around some mgr:forward accounting/reporting bugs (#1969)
In modern code, FwdReplyCodes[0][i] is usually zero because n_tries is
usually at least one at logReplyStatus() call time. This leads to
mgr:forward report showing nothing but table heading (i.e. no stats)
Also improve `try#N` heading:data match by skipping FwdReplyCodes[0]
reporting (there is still no `try#0` heading) and adding a previously
missing `try#9` heading
Alex Rousskov [Tue, 31 Dec 2024 20:40:46 +0000 (20:40 +0000)]
Clarify --enable-ecap failure on missing shared library support (#1968)
checking if libtool supports shared libraries... no
checking whether to build shared libraries... no
configure: error: eCAP support requires loadable modules.
Please do not use --disable-shared with --enable-ecap.
After 2022 commit 5a2409b7, our advice for handling the above error
became misleading in environments that do not --disable-shared
explicitly but lack shared libraries support for other reasons.
Alex Rousskov [Tue, 31 Dec 2024 19:22:21 +0000 (19:22 +0000)]
Do not lookup IP addresses of X509 certificate subject CNs (#1967)
A true-vs-false `nodns` parameter value bug in a recent commit 22b2a7a0
caused, in some environments, significant startup delays and/or runtime
stalls because getaddrinfo(3) performed blocking DNS lookups when
parsing common names of X509 certificate subjects. Squid parses CNs when
loading configured and validating received certificates. Other side
effects may have included Squid-generated certificates having wrong
alternative subject names and/or wrong certificate validation results.
Negative names and context-disassociated boolean constants strike again!
Fortunately, associated problematic Ip::Address::lookupHostIP() will be
replaced when the existing Ip::Address::Parse() TODO is addressed.
Alex Rousskov [Tue, 31 Dec 2024 17:27:40 +0000 (17:27 +0000)]
REQMOD stuck when adapted request (body) is not forwarded (#1966)
Squid forwards request bodies using BodyPipe objects. A BodyPipe object
has two associated agents: producer and consumer. Those agents are set
independently, at different processing stages. If BodyPipe consumer is
not set, the producer may get stuck waiting for BodyPipe buffer space.
When producer creates a BodyPipe, it effectively relies on some code
somewhere to register a consumer (or declare a registration failure),
but automatically tracking that expectation fulfillment is impractical.
For REQMOD transactions involving adapted request bodies, including ICAP
204 transactions, Client::startRequestBodyFlow() sets body consumer. If
that method is not called, there will be no consumer, and REQMOD may get
stuck. Many `if` statements can block request forwarding altogether or
block a being-forwarded request from reaching that method. For example,
adapted_http_access and miss_access denials block request forwarding.
Without REQMOD, request processing can probably get stuck for similar
lack-of-consumer reasons, but regular requests ought to be killed by
various I/O or forwarding timeouts. There are no such timeouts for those
REQMOD transactions that are only waiting for request body consumer to
clear adapted BodyPipe space (e.g., after all ICAP 204 I/Os are over).
Relying on timeouts is also very inefficient.
For a `mgr:mem` observer, stuck REQMOD transactions look like a ModXact
memory leak. A `mgr:jobs` report shows ModXact jobs with RBS(1) status.
Report cache_peer context in probe and standby pool messages (#1960)
The absence of the usual "current master transaction:..." detail in
certain errors raises "Has Squid lost the current transaction context?"
red flags:
ERROR: Connection to peerXyz failed
In some cases, Squid may have lost that context, but for cache_peer TCP
probes, Squid has not because those probes are not associated with
master transactions. It is difficult to distinguish the two cases
because no processing context is reported. To address those concerns,
we now report current cache_peer probing context (instead of just not
reporting absent master transaction context):
ERROR: Connection to peerXyz failed
current cache_peer probe: peerXyzIP
When maintaining a cache_peer standby=N connection pool, Squid has and
now reports both contexts, attributing messages to pool maintenance:
ERROR: Connection to peerXyz failed
current cache_peer standby pool: peerXyz
current master transaction: master1234
The new PrecomputedCodeContext class handles both reporting cases and
can be reused whenever the cost of pre-computing detailCodeContext()
output is acceptable.
CI: Add OpenBSD build tests for staged commits (#1964)
Use a GitHub-hosted VM to create OpenBSD test environment. Requires
GitHub repository configuration that permits the use of
`vmactions/openbsd-vm@v1` Action.
We have not enabled ccache optimization for OpenBSD tests because we
do not know how to copy updated ccache files from VM back into runner.
CI: Add GitHub Actions workflow for periodic Coverity Scan (#1958)
Implement a weekly scheduled GitHub Actions workflow to run Coverity
Scan (i.e. cov-build). Currently, we run Coverity Scan using Jenkins.
The new job uses the Squid Project pre-made docker image because
installing the tools required to use free Coverity Scan service cannot
be easily automated at the moment.
The job only runs for the official Squid Project repository.
Portability: remove explicit check for libdl (#1963)
OpenBSD does not have libdl, as it has dlopen() in libc.
The explicit check is not really needed, and force-requiring the
presence of libdl causes ./configure to fail on OpenBSD:
checking for dlopen in -ldl... no
configure: error: Required library 'dl' not found
stat5minClientRequests() was meant to return the number of recent
client requests. However, the function did not provide the implied 5-minute
precision. It returned, roughly speaking, the number of requests during
the last 0-6 minutes. The new, less strict function name and boolean
type avoid this false precision implication.
Squid may crash when the SquidConfig global is auto-destructed after
main() ends. Since SquidConfig global is used by cleanup code, we should
keep its fields alive, essentially emulating "No New Globals" policy
effects. This surgical fix will be followed up with more changes to
address general OpenSSL cleanup problems exposed by this bug.
This bug fix facilitates backporting by using FuturePeerContext shim.
Alex Rousskov [Wed, 27 Nov 2024 02:32:02 +0000 (02:32 +0000)]
Treat responses to collapsed requests as fresh (#1927)
Squid transactions involving collapsed requests receive their responses
as Store cache hits. Cache hits undergo mandatory freshness validation
checks and, if those checks detect a stale response, may trigger
revalidation (e.g., an If-Modified-Since request to the origin server).
This logic results in a still-being-delivered-to-Squid response
triggering its own revalidation if that response is deemed stale on
arrival by collapsed request (e.g., has a Cache-Control: max-age=0
header field).
HTTP RFC 9111 Section 4.7 briefly mentions collapsed requests but is
ambiguous with regard to validation of responses to collapsed requests.
IETF HTTP Working Group chair believes that specs are unclear, and that
these responses should not be treated as cache hits (in primary cases):
https://lists.w3.org/Archives/Public/ietf-http-wg/2024JanMar/0095.html
This change follows the above advice and prevents arguably excessive
freshness checks for responses to collapsed requests. This change is
collapsed-forwarding specific: It does not prevent freshness checks for
responses that were, at the time of a hit request, either fully cached
or still receiving response body bytes.
After this change, clientReplyContext can no longer collapse twice, once
after initial Store lookup and then again during refresh, because the
first collapse now precludes refresh.
Simplified quick_abort_pct code and improved its docs (#1921)
Instead of ignoring quick_abort_pct settings that would, together with
other conditions, abort a pending download of a 99-byte or smaller
response, Squid now honors quick_abort_pct for all response sizes. Most
Squids are not going to be affected by this change because default
quick_abort_min settings (16KB) prevent aborts of 99-byte responses even
before quick_abort_pct is checked.
Due to conversion from integer to floating point math, this change may
affect responses larger than 99 bytes as well, but these effects ought
to be limited to cases where the decision is based on a tiny difference
(e.g., receiving 1% more bytes would have triggered full download). In
most such cases, the decision could probably go either way due to
response header size fluctuations anyway.
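The integer-vs-floating-point difference can be illustrated with a sketch. Both functions are hypothetical approximations written for this note, not the actual Squid code before or after this change:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer-math check: needs a special case for small responses
// because expectedSize / 100 is zero when expectedSize is below 100.
inline bool
MostlyReceivedIntegerMath(const std::int64_t received, const std::int64_t expectedSize, const int pct)
{
    if (expectedSize < 100)
        return false; // percentage setting is ignored for tiny responses
    return received / (expectedSize / 100) > pct;
}

// Hypothetical floating-point check: works for all positive response sizes.
inline bool
MostlyReceivedFloatingMath(const std::int64_t received, const std::int64_t expectedSize, const int pct)
{
    assert(expectedSize > 0);
    return static_cast<double>(received) / expectedSize > pct / 100.0;
}
```

For a 99-byte response with 98 bytes received and a 95% threshold, the integer variant ignores the percentage while the floating-point variant honors it.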
Also updated quick_abort_pct documentation, primarily to clarify a
misleading statement: Squid did not and does not treat 16KB or smaller
responses specially in this context. The original statement was probably
based on quick_abort_min _default_ setting of 16KB, but statement
phrasing and placement hid that connection.
Tony Walker [Sat, 16 Nov 2024 22:10:39 +0000 (22:10 +0000)]
Bug 5363: Handle IP-based X.509 SANs better (#1793)
Most X.509 Subject Alternative Name extensions encountered by Squid are
based on DNS domain names. However, real-world servers (including
publicly available servers that use vanity IP addresses) also use
IP-based SANs. Squid mishandled IP-based SANs in several ways:
* When generating certificates for servers targeted by their IP
addresses, addAltNameWithSubjectCn() used that target IP as a
DNS-based SAN, resulting in a frankenstein DNS:[ip] SAN value that
clients ignored when validating a Squid-generated certificate.
* When validating a received certificate, Squid was ignoring IP-based
SANs. When Subject CN did not match the requested IP target, Squid
only looked at DNS-based SANs, incorrectly failing validation.
* When checking certificate-related ACLs like ssl::server_name,
matchX509CommonNames() ignored IP-based SANs, not matching
certificates containing ACL-listed IP addresses.
Squid now recognizes and generates IP-based SANs.
Squid now attempts to match IP-based SANs with ACL-listed IP addresses,
but the success of that attempt depends on whether ACL IP parameters are
formatted the same way inet_ntop(3) formats those IP addresses: Matching
is still done using c-string/domain-based ::matchDomainName() (for
ssl::server_name) and string-based regexes (for ssl::server_name_regex).
Similar problems affect dstdomain and dstdomain_regex ACLs. A dedicated
fix is needed to stop treating IPs as domain names in those contexts.
This change introduces partial support for preserving IP-vs-domain
distinction in parsed/internal Squid state rather than converting both
to a string and then assuming that string is a DNS domain name.
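The formatting caveat above can be illustrated with a sketch that tells IP addresses from domain names and produces the inet_ntop(3) spelling of an address. `CanonicalIp()` is a hypothetical helper written for this note (assuming a POSIX environment), not a Squid function:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <string>

// Hypothetical helper: returns the inet_ntop(3) spelling of an IP address
// or an empty string if text is not an IPv4 or IPv6 address.
inline std::string
CanonicalIp(const std::string &text)
{
    char out[INET6_ADDRSTRLEN]; // large enough for IPv4 and IPv6 text

    in_addr v4;
    if (inet_pton(AF_INET, text.c_str(), &v4) == 1) {
        const char *printed = inet_ntop(AF_INET, &v4, out, sizeof(out));
        assert(printed); // the buffer is large enough
        return printed;
    }

    in6_addr v6;
    if (inet_pton(AF_INET6, text.c_str(), &v6) == 1) {
        const char *printed = inet_ntop(AF_INET6, &v6, out, sizeof(out));
        assert(printed); // the buffer is large enough
        return printed;
    }

    return std::string(); // not an IP address; treat as a domain name
}
```

An ACL parameter spelled differently from its inet_ntop(3) form (e.g., with uppercase hex digits or uncompressed zero groups) would not equal that form in a string-based comparison, even though both spellings denote the same address.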
Alex Rousskov [Wed, 13 Nov 2024 18:16:08 +0000 (18:16 +0000)]
Annotate PoolMalloc memory in valgrind builds (#1946)
MemPoolMalloc code (i.e. memory_pools code used by default) was missing
VALGRIND_MAKE_MEM_*() calls. Similar calls do exist in MemPoolChunked
code (i.e. code enabled by setting MEMPOOLS environment variable to 1).
Even with these markings, "memory_pools on" configuration is still not
quite compatible with Valgrind leak reporting suppressions: In some
cases, Valgrind may still incorrectly suppress leak reporting (or report
leaks that should have been suppressed) because Valgrind associates leak
suppressions with memory _allocators_ while buggy code may leak memory
allocated by others. The long-term solution (if it exists) requires
upgrading these markings to VALGRIND_MEMPOOL_*() API targeting memory
pools, but that requires a serious effort, especially when dealing with
MemPoolChunked complexities. The added markings help detect non-leak
violations and improve PoolMalloc/MemPoolChunked code symmetry.
CI: Cancel obsolete concurrent GitHub Actions workflow jobs (#1940)
If a new push happens to a staging branch or a PR branch, continuing to
run now-obsolete tests is pointless and wasteful. However, we do want to
finish any jobs running on previous master branch commits, so that every
master branch commit has full test results.
Alex Rousskov [Mon, 11 Nov 2024 16:17:09 +0000 (16:17 +0000)]
CI: Use ccache-action repo maintained by its original author (#1943)
squid-cache/ccache-action@v1.2.14 is not allowed to be used in ...
For recent commit 627cca6d, we cloned hendrikmuhs/ccache-action
repository because GitHub Actions prohibited usage of that repository in
Squid Project CI tests. Cloning worked around that restriction, but we
did not realize that there are other solutions, and that cloning forces
all other Squid repositories to either clone squid-cache/ccache-action
or permit squid-cache/ccache-action execution by other means.
To reduce the number of obscure repositories Squid Project and others
have to deal with, it is better to adjust repository configuration to
allow well-known hendrikmuhs/ccache-action in "Actions permissions" at
https://github.com/.../squid/settings/actions
Also prevent nil connection pointer dereference and setsockopt() calls
with negative FD in comm_reset_close() and old_comm_reset_close().
It is unknown whether such bugs could be triggered before these changes.
Also removed fde::flags::nolinger as unused since 1999 commit 2391a162.
Amos Jeffries [Tue, 5 Nov 2024 16:46:41 +0000 (16:46 +0000)]
Fix systemd startup sequence to require active Local Filesystem (#1937)
Squid requires Local Filesystem to be active. While uncommon, it
may in some cases be incomplete or delayed. Ensure that the
dependency is explicitly listed to prevent failures during early
Squid initialization.
This change resolves Debian Bug 956581:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956581
Nil request dereference in ACLExtUser and SourceDomainCheck ACLs (#1931)
ACLExtUser-based ACLs (i.e. ext_user and ext_user_regex) dereferenced a
nil request pointer when they were used in a context without a request
(e.g., when honoring on_unsupported_protocol).
SourceDomainCheck-based ACLs (i.e. srcdomain and srcdom_regex) have a
similar bug, although we do not know whether broken slow ACL code is
reachable without a request (e.g., on_unsupported_protocol tests cannot
reach that code until that directive starts supporting slow ACLs). This
change does not start to require request presence for these two ACLs to
avoid breaking any existing configurations that "work" without one.
Writing a nil c-string to an std::ostream object results in undefined
behavior. When Security::IoResult::errorDescription is nil, that UB
leads to crashes for some STL implementations. These changes avoid UB
while making higher-level reporting code simpler and safer.
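A minimal sketch of avoiding that UB follows. `OrPlaceholder()`, `ReportIoError()`, and the "[no description]" text are hypothetical and only illustrate the general technique. They are not the WithExtras() mechanism or other code from this change:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: substitutes a placeholder for a nil c-string so that
// the result is always safe to write to an std::ostream.
inline const char *
OrPlaceholder(const char *text, const char *placeholder)
{
    assert(placeholder); // the placeholder itself must be printable
    return text ? text : placeholder;
}

// Hypothetical reporting function that tolerates a nil description.
inline void
ReportIoError(std::ostream &os, const char *errorDescription)
{
    os << "ERROR: " << OrPlaceholder(errorDescription, "[no description]");
}
```

Calling `ReportIoError()` with an `std::ostringstream` and a nil description writes the placeholder instead of passing the nil pointer to the stream.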
This change alters affected level-1 ERROR test lines a little, including
removing duplicated connection information from clientNegotiateSSL()
message (cache_log_message id=62). That duplication existed because
Squid reports the same Connection info automatically via CodeContext.
New WithExtras() mechanism may be useful for other "low-level debugging
and level-0/1 details for admins ought to differ" cases as well. Today,
the only known debugging context for Security::IoResult is
Security::PeerConnector::suspendNegotiation(), but that is likely to
change as we upgrade TLS callers to library-independent wrappers beyond
the current Security::Accept() and Security::Connect() pair.
Ftp::Gateway may segfault in level-3 double-complete debugs() (#1923)
Ftp::Gateway::completeForwarding() must check data.conn pointer before
dereferencing it. Long-term, we should improve Comm::ConnectionPointer
printing to safely report Connection::id (where available). This minimal
fix just mimics existing Ftp::Relay::abortOnData() solution.
Use GitHub Actions for more OS-specific build tests (#1917)
Test staged commits using some of our docker images from Jenkins tests.
The added tests (see staged.yaml) were easier to adopt; we are also
working on enabling GitHub Actions for FreeBSD and some other images.
We do not run these added tests for PR commits because existing Ubuntu
tests (defined in pr.yaml) already expose the vast majority of build
problems, and we are worried that running a lot more tests for each PR
push event would consume too much GitHub resources and significantly
increase noise in PR checks summary, obscuring often-easier-to-handle
failures detected by Ubuntu tests.
Also postpone MacOS tests until PR staging. On GitHub, MacOS runners are
10x more expensive than Linux runners. We use cheaper runners for fast
feedback while still checking MacOS build before each merged commit.
This change does not increase GitHub CI wait time for PR push tests
because those checks are dominated by the unchanged 35min CodeQL-tests
job. However, it reduces total CPU time used (from 2h 30m to 2h) because
we no longer perform 3 MacOS tests for PR push events.
This change doubles GitHub CI wait time for staged commit tests (from
35m to 1h) and drastically increases total CPU time used (from 2h to
17h), primarily due to 84 added docker-based linux-distros checks.
GitHub does not provide any riscv64 runners or free Linux/arm64
runners. Our initial attempts to virtualize Linux/arm64 tests on
MacOS/arm64 runners and to make virtualized FreeBSD and OpenBSD tests
work on Linux/x64 were not successful. Thus, we still rely on Jenkins
for Linux/riscv64, Linux/arm64, FreeBSD/x64, and OpenBSD/x64 tests.
uhliarik [Fri, 11 Oct 2024 03:31:19 +0000 (03:31 +0000)]
Bug 5449: Ignore SP and HTAB chars after chunk-size (#1914)
Prior to 2023 commit 951013d0, Squid accepted Transfer-Encoding chunks
with chunk-size followed by spaces or tabs (before CRLF). This HTTP
syntax violation was allowed to address Bug 4492 (fixed in 2017 commit
26f0a359). This change restores that fix functionality. FWIW, our
research shows that nginx and httpd also accept similar input.
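The tolerated syntax can be illustrated with a simplified sketch. It handles only a hexadecimal chunk-size, optional trailing SP/HTAB characters, and CRLF. Chunk extensions are deliberately not supported, and `ParseChunkSizeLine()` is a hypothetical function, not the Squid parser:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical, simplified chunk-size line parser (no chunk extensions).
// On success, stores the parsed size and returns true.
inline bool
ParseChunkSizeLine(const std::string &line, std::uint64_t &size)
{
    const auto maxValue = std::numeric_limits<std::uint64_t>::max();
    std::size_t pos = 0;
    std::uint64_t value = 0;
    while (pos < line.size() && std::isxdigit(static_cast<unsigned char>(line[pos]))) {
        const int c = static_cast<unsigned char>(line[pos]);
        const unsigned digit = static_cast<unsigned>(
            std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
        assert(digit < 16);
        if (value > (maxValue - digit) / 16)
            return false; // chunk-size does not fit into 64 bits
        value = value * 16 + digit;
        ++pos;
    }
    if (pos == 0)
        return false; // chunk-size needs at least one hex digit

    // tolerate (but ignore) SP and HTAB between chunk-size and CRLF
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t'))
        ++pos;

    if (line.compare(pos, std::string::npos, "\r\n") != 0)
        return false; // anything else before CRLF is rejected in this sketch
    size = value;
    return true;
}
```

Whitespace is accepted only after the chunk-size digits. Leading whitespace still fails the "at least one hex digit" check.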
Fix validation of Digest auth header parameters (#1906)
Insufficient validation of Digest authentication parameters resulted in
a DigestCalcHA1() call that dereferenced a nil pointer.
This bug was discovered and detailed by Joshua Rogers at
https://megamansec.github.io/Squid-Security-Audit/ where it was filed as
"strlen(NULL) Crash Using Digest Authentication".
Alex Rousskov [Sat, 5 Oct 2024 16:18:33 +0000 (16:18 +0000)]
Prohibit bad --enable-linux-netfilter combinations (#1893)
Since 2009 commit 51f4d36b, Squid declared that "all transparent build
options are independent, and may be used in any combination". That
declaration was accurate from a "successful build" point of view, but
Ip::Intercept::LookupNat() (and its precursors) started mishandling at
least two combinations of options as detailed below.
LookupNat() tries up to four lookup methods (until one succeeds):
The first method -- NetfilterInterception() -- fails to look up the true
destination address of an intercepted connection when, for example, the
client goes away just before the lookup. In those (relatively common in
busy environments) cases, the intended destination address cannot be
obtained via getsockname(), but LookupNat() proceeds calling other
methods, including the two methods that may rely on getsockname().
Methods 2 and 3 may rely on a previous getsockname() call to provide
true destination address, but getsockname() answers are not compatible
with what NetfilterInterception() must provide -- the destination
address returned by getsockname() is Squid's own http(s)_port address.
When Squid reaches problematic code now encapsulated in a dedicated
UseInterceptionAddressesLookedUpEarlier() function, Squid may end up
connecting to its own http(s)_port! Such connections may cause
forwarding loops and other problems. In SslBump deployments, these loops
form _before_ Forwarded-For protection can detect and break them.
These problems can be prevented if an admin does not enable incompatible
combinations of interception lookup methods. The relevant code is
correctly documented as "Trust the user configured properly", but that
statement essentially invalidates our "may be used in any combination"
design assertion and leads to runtime failures when the user configures
the build improperly. Those runtime failures are difficult to triage
because they lack signs pointing to a build misconfiguration.
This change bans incompatible NetfilterInterception()+getsockname()
combinations at compile time: Squid ./configured with
--enable-linux-netfilter cannot use --enable-ipfw-transparent or
(--enable-pf-transparent --without-nat-devpf).
TODO: Ban incompatible combinations at ./configure time as well!
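A compile-time ban of this kind can be sketched as follows. This is an
illustration only, not the code added by this change: the BuildHas*
constants and compatibleLookupMethods() are hypothetical stand-ins for
whatever macros ./configure actually defines.

```cpp
// Hypothetical stand-ins for ./configure results (not Squid's real macros).
constexpr bool BuildHasLinuxNetfilter = true;   // --enable-linux-netfilter
constexpr bool BuildHasIpfwTransparent = false; // --enable-ipfw-transparent
constexpr bool BuildHasPfTransparent = false;   // --enable-pf-transparent
constexpr bool BuildHasNatDevPf = false;        // false models --without-nat-devpf

// Returns false for the combinations banned above: Netfilter lookups
// combined with a method that relies on getsockname() answers.
constexpr bool compatibleLookupMethods(const bool netfilter, const bool ipfw,
                                       const bool pf, const bool natDevPf)
{
    const bool pfWithoutDevPf = pf && !natDevPf;
    return !(netfilter && (ipfw || pfWithoutDevPf));
}

// A bad combination stops the build here instead of failing at runtime.
static_assert(compatibleLookupMethods(BuildHasLinuxNetfilter,
                                      BuildHasIpfwTransparent,
                                      BuildHasPfTransparent,
                                      BuildHasNatDevPf),
              "--enable-linux-netfilter cannot be combined with "
              "--enable-ipfw-transparent or with "
              "--enable-pf-transparent --without-nat-devpf");
```

Flipping BuildHasIpfwTransparent to true in this sketch makes the
static_assert fail, which is the intended "fail early, with a pointer to
the build misconfiguration" behavior.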
We have considered and rejected an alternative solution where all
./configure option combinations are still allowed, but LookupNat()
returns immediately on NetfilterInterception()/SO_ORIGINAL_DST failures.
That solution is inferior to build-time bans because an admin may think
that their Squid uses other configured lookup method(s) if/as needed,
but Squid would never reach them in --enable-linux-netfilter builds.
The only viable alternative is to specify lookup methods in squid.conf,
similar to the existing "tproxy" vs. "intercept" http(s)_port modes. In
that case, squid.conf will be validated for incompatible method
combinations (if combinations are supported at all), and LookupNat()
will only use configured method(s).
Do not mark successful FTP PUT entries with ENTRY_BAD_LENGTH (#1904)
Since 2021 commit ba3fe8d, we explicitly mark complete responses and
treat all other responses as truncated. That commit missed a case where
the FTP server responds with 226 or 250 code after receiving the upload.
The bug affects HTTP PUT requests using ftp URI scheme.
Incorrect truncation marking adds an unwanted WITH_CLIENT %err_detail to
ERR_FTP_PUT_CREATED transaction records in access.log:
201 PUT ftp://... ERR_FTP_PUT_CREATED/FTP_REPLY_CODE=226+WITH_CLIENT
Fixed code logs:
201 PUT ftp://... ERR_FTP_PUT_CREATED/FTP_REPLY_CODE=226
Squid builds without pkg-config, but results are likely to surprise
administrators because many optional features will not be
default-enabled despite properly installed libraries.
ESI feature has a number of bugs and security vulnerabilities.
It is also rarely used and a survey of active community members
has not revealed a need to keep maintaining this code.
Alex Rousskov [Fri, 20 Sep 2024 09:35:04 +0000 (09:35 +0000)]
Use ERR_ACCESS_DENIED for HTTP 403 (Forbidden) errors (#1899)
... when request authentication fails. Do not use
ERR_CACHE_ACCESS_DENIED for those "permanent" errors.
Default ERR_CACHE_ACCESS_DENIED is meant for cases where the user is
likely to eventually gain access (e.g., by supplying credentials). Its
default text says "not currently allowed... until you have authenticated
yourself". When the error page was added in 1998 commit cb69b4c7, it was
only used for HTTP 407 errors. The same logic was preserved when that
code was refactored in 1999 commit 1cfdbcf0, but exceptions started to
creep in, perhaps accidentally, since 2011 when HTTP 403 case was added
in commit 2f1431ea that introduced USE_AUTH macro. 2011 commit 21512911
added a similar "not possible to authenticate" SslBump case.
Other HTTP 403 (Forbidden) cases already use ERR_ACCESS_DENIED or a
similar "permanent" error (e.g., ERR_FORWARDING_DENIED or ERR_TOO_BIG).
It is still possible to customize the returned error page via deny_info.
The replaced assertion was wrong because a stale entry may be aborted
while we are revalidating it. The exact real-world conditions that
triggered this assertion are unknown, but many conditions lead to
aborted entries. The assertion can be triggered in lab tests using a hit
transaction that collapses on a miss transaction that runs into storage
problems (e.g., a memory cache that ran out of usable space).
Also adjusted cacheHit() to avoid similar problems. We have not
reproduced them, but no code maintains the asserted invariant on the
cacheHit() path either. Moreover, async-called cacheHit() initiates
processExpired() that leads to problematic sendClientOldEntry() call, so
seeing an aborted StoreEntry at cacheHit() time is probably a matter of
sufficient concurrency levels and asynchronous callback delays.
When establishing a TLS connection to an origin server _through_ a
cache_peer, Security::CreateClientSession() used cache_peer's
Security::PeerOptions instead of global ProxyOutgoingConfig (i.e.
tls_outgoing_options). Used cache_peer's PeerOptions are unrelated to
the (tunneled) TLS connection in question (and are currently empty
because Squid does not support TLS inside TLS -- the cache_peer accepts
plain HTTP connections).
This TLS context:options mismatch exists in both OpenSSL and GnuTLS
builds, but it currently does not affect OpenSSL builds where
CreateSession() gets TLS options from its Security::Context parameter
rather than its (unused) Security::PeerOptions parameter.
The mismatch exists on both PeekingPeerConnector/SslBump and
BlindPeerConnector code paths.
This minimal change pairs TLS context with its TLS options. Long-term,
the added FuturePeerContext shim (that stores references to independent
context and options objects) should be replaced with a PeerContext class
that encapsulates those two objects. We may also be able to avoid
re-computing GnuTLS context from options and to simplify code by
preventing PeerConnector construction in Squid builds that do not
support TLS. That refactoring should be done separately because it
triggers many changes unrelated to Bug 5293.
Also removed updateSessionOptions() from
PeekingPeerConnector::initialize() because Security::CreateSession(),
called by our parent initialize() method, already sets session options.
It is easier to remove that call/code than keep it up to date.
Security::BlindPeerConnector does not updateSessionOptions().
After a ReadNow() call, the buffer length must not exceed accumulation
limits (e.g., client_request_buffer_max_size). SBuf::reserve() alone
cannot reliably enforce those limits because it does not decrease SBuf
space; various SBuf manipulations may lead to excessive SBuf space. When
filled by ReadNow(), that space exceeds the limit.
This change uses the documented CommIoCbParams::size trick to limit how
much Comm::ReadNow() may read, obeying SQUID_TCP_SO_RCVBUF
(server-to-Squid) and client_request_buffer_max_size (client-to-Squid)
accumulation limits.
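The arithmetic behind such a cap can be sketched as below. The sketch
does not show Squid's Comm API: allowedReadSize() and its parameters are
hypothetical names, and the caller is assumed to pass the result as the
maximum size of the next read.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not Squid code): returns how many bytes the next
// read may add without pushing the buffer past its accumulation limit.
//   bufferedNow       -- bytes already accumulated in the buffer
//   freeSpace         -- spare buffer space, which may exceed the limit
//   accumulationLimit -- e.g., a client_request_buffer_max_size value
constexpr std::size_t allowedReadSize(const std::size_t bufferedNow,
                                      const std::size_t freeSpace,
                                      const std::size_t accumulationLimit)
{
    if (bufferedNow >= accumulationLimit)
        return 0; // already at or over the limit; read nothing
    return std::min(freeSpace, accumulationLimit - bufferedNow);
}
```

The point of the sketch is that the limit is applied to the read size
itself, so excessive spare buffer space cannot translate into excessive
accumulated data.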
Alex Rousskov [Thu, 5 Sep 2024 17:46:20 +0000 (17:46 +0000)]
Bug 5417: An empty annotation value does not match (#1896)
Helpers may return annotations with empty values:
OK team_=""
A note ACL may be configured to match an annotation with an empty value:
configuration_includes_quoted_values on
acl emptyTeam note team_ ""
However, that emptyTeam ACL did not match the above helper annotation:
* AppendTokens() split an empty annotation value into an empty vector
instead of a vector with a single empty entry. That "never match an
empty value received from the helper" bug was probably introduced in
2017 commit 75d47340 when it replaced an "always try to match an empty
value, even when it was not received from the helper" bug in
ACLNoteStrategy::matchNotes().
* ACLStringData::match(SBuf v) never matched an empty value v. That bug
was probably introduced in 2015 commit 76ee67ac that mistook a nil
c-string pointer for an empty c-string.
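Both classes of bugs can be illustrated with a short sketch. It is not
the Squid code: splitAnnotationValue() and matchesValue() are
hypothetical helpers, and the delimiter parameter is an assumption made
for illustration.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical splitter (not Squid's AppendTokens()): an empty value
// yields a vector with a single empty token rather than an empty vector.
std::vector<std::string> splitAnnotationValue(const std::string &value, const char delimiter)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const auto end = value.find(delimiter, start);
        if (end == std::string::npos) {
            tokens.push_back(value.substr(start)); // last (possibly empty) token
            break;
        }
        tokens.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}

// Hypothetical matcher: a nil pointer means "no value" and never matches,
// but an empty c-string is a real value that can match another empty one.
bool matchesValue(const char * const configured, const char * const received)
{
    if (!configured || !received)
        return false; // nil is not the same as empty
    return std::strcmp(configured, received) == 0;
}
```

In this sketch, splitting "" produces one empty token, and
matchesValue("", "") is true while matchesValue("", nullptr) is false.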