Alex Rousskov [Mon, 10 Oct 2022 04:24:22 +0000 (04:24 +0000)]
Start using Github Actions for CI tests (#1160)
This change imports the existing Semaphore CI functionality test harness
and migrates it to GitHub Actions. Using a stand-alone harness offered
many advantages, but that approach goes against the current custom of
importing CI configurations into being-tested repositories. Semaphore CI
itself switched to such invasive configurations with v2. We have to
start using invasive CI configurations to be able to evolve Squid CI and
solve known CI problems.
While importing the test harness, we made basic CI functionality tests
available from the command line so that developers can run them even
before submitting a PR and while addressing reviewer concerns. They do
require internet access because the tests are downloaded from GitHub,
but they avoid permanent service installations in developer environments.
The tests do create/overwrite logs and start services like Squid. A lot
more work is still required to make these tests user friendly.
The new test configuration/harness consists of two primary parts:
* GitHub Actions "workflow" configuration, including embedded scripts.
* Various test scripts used by the "workflow" configuration above.
There are many ways to split the test harness among those two parts.
This particular split is based on these split-specific principles/ideas:
1. When running tests locally, developers do not want to risk
disruptive, permanent, potentially dangerous actions such as globally
installing new packages or overwriting checked out Squid sources.
These actions belong to the workflow configuration file that is only
interpreted by GitHub Actions, on GitHub virtual machines. The test
scripts have only one sudo command (to start a Squid manipulating
agent as nobody), which is bypassed if the agent is already running.
2. Developers should be able to repeat basic tests from the command
line, against their version of Squid sources, their Squid builds, and
without learning the ever-changing testing details. If their PR has
failed a CI test, they may simply want to run the corresponding
script locally to reproduce the issue. This means that nearly all
testing logic belongs to the test scripts, not the Actions
configuration file (that developers cannot easily interpret/apply).
We also thought that some developers would want to separate
functionality from code quality tests (because those are often applied
at different development stages) and to repeat (many times) just the
failed test case (because that often helps when triaging and fixing
bugs). These considerations resulted in the creation of two scripts,
each accepting individual test case names as optional parameters.
When a cache_peer is (re)configured, Squid probes its address using a
throw-away TCP connection. If and only if that TCP probe succeeds, peer
selection algorithms may pick the peer. Until then, the cache_peer is
treated as if it was DEAD.
This change preserves the logic summarized above but starts _reporting_
the initial probe failure and the fact that it marks the cache_peer
DEAD. Without these reports, admins often naturally assume that the
cache_peer is alive, especially if traffic can be served without it.
De-duplicate the #include sequence into a libcompat header to
ensure all uses of the hack(s), including autoconf test macros,
use the same logic. This fixes some macros which were
using inconsistent logic for Solaris.
Simplify the if-elif use of __cplusplus detection. All the
extern C hacks apply only when C++ is used.
Wrapping the entire file peer_proxy_negotiate_auth.cc with
'extern C {...}' is unnecessary (it is a .cc) and breaks the
Solaris workaround.
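A minimal sketch of the kind of shared libcompat wrapper described
above (macro and header names here are hypothetical, not the actual
libcompat contents):

    #if defined(__cplusplus)
    /* the extern "C" hacks apply only when C++ is used */
    #define SQUID_EXTERN_C_BEGIN extern "C" {
    #define SQUID_EXTERN_C_END }
    #else
    #define SQUID_EXTERN_C_BEGIN
    #define SQUID_EXTERN_C_END
    #endif

    /* every user, including autoconf test macros, gets the same logic */
    SQUID_EXTERN_C_BEGIN
    #include <krb5.h> /* or whichever C header needs the workaround */
    SQUID_EXTERN_C_END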
Add --with-ldap and pkg-config support to speed up Squid builds
where LDAP is not to be used. This also adds support for custom
LDAP library locations like other --with-foo options.
'pkg-config --libs ldap' finds both -lldap and -llber. Stop using
different variables for them in Makefile.am.
Extract LDAP API tests into simpler dedicated macros and stop
polluting LIBS when running LDAP tests.
Alex Rousskov [Thu, 22 Sep 2022 03:41:10 +0000 (03:41 +0000)]
Fix ./configure detection of libxml2 headers (#1153)
checking for LIBXML2... yes
checking libxml/parser.h usability... yes
checking libxml/parser.h presence... no
configure: WARNING: libxml/parser.h: accepted by the compiler,
rejected by the preprocessor!
configure: WARNING: libxml/parser.h: proceeding with the compiler's
result
PKG_CHECK_MODULES() documentation warns that it uses a "misleading" name
for the foo_CFLAGS variable it sets: "the variable provides the flags to
pass to the preprocessor, rather than the compiler". Lots of Squid code
has been misled by that macro, but this fix is specific to a recent
regression introduced in commit 866a092. TODO: Fix all others.
Alex Rousskov [Wed, 21 Sep 2022 23:53:12 +0000 (23:53 +0000)]
Fix ./configure tests (#1151)
Many tests were broken by recent commit a1c2236, which supplied empty
input to the AC_LINK_IFELSE() macro in SQUID_CC_CHECK_ARGUMENT. Empty
input results in the macro creating no test file (and, presumably, a
failed linking test). Observable symptoms (in some environments) include
many repeated
errors on ./configure stderr:
sed: can't read conftest.cpp: No such file or directory
Also fixed a similar SQUID_SEARCH_LIBS() bug. That macro had been broken
since inception (commit 391f0ba). It is currently only used for the
mingw-specific SQUID_CHECK_WINSOCK_LIB check. This change is untested.
Alex Rousskov [Wed, 21 Sep 2022 13:53:22 +0000 (13:53 +0000)]
Fix escaping leading regex dashes in ./configure sources (#1152)
grep: invalid option -- 't'
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
./configure: line 27524: test: too many arguments
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
./configure: line 27698: test: too many arguments
Configure scripts are written in m4, not shell. M4 treats [square
brackets] specially. We could use quadrigraphs instead (as documented in
autoconf manual) but `(-)` looks a lot simpler than `@<:@-@:>@`!
IPC code creating a TCP socket pair for communication with helpers used
comm_open() to create a _listening_ socket, misleading Comm into
applying IP_BIND_ADDRESS_NO_PORT optimization that delays port
assignment until connect(2) (which is never called for listening
sockets). Listening sockets must be opened using comm_open_listener(),
but the IPC use case was missed in 2009 commit 31be869 that introduced
comm_open_listener(). IP_BIND_ADDRESS_NO_PORT changes in recent commit 74d7d19 hit that bug.
Disclaimer: Windows-specific code adjusted without testing.
... to use pkg-config compatible naming and take better advantage of
Squid-provided macros.
Remove unnecessary path detection of /usr vs /usr/local
locations. Admin can use --with-foo=PATH to provide any needed
customization of those from whichever is standard for the build
environment.
... to use pkg-config compatible variable naming and break out
the code test into a dedicated macro, per the style used by other
libraries.
Also, add support for pkg-config library auto-detection.
Also, our libcompat provides local replacement for Base64 on
builds with older libnettle. Fix the library link order to
ensure our functions get linked properly.
Alex Rousskov [Thu, 15 Sep 2022 07:16:50 +0000 (07:16 +0000)]
Do not lock an SMP hit twice for the same transaction (#1134)
Before commit 4310f8b, anchorToCache() (known then as anchorCollapsed())
was only called for not-yet-anchored entries. For anchored entries, we
called updateCollapsed(). In that commit, a new anchorToCache() caller
(i.e. allowSharing() after peek()) started calling anchorToCache() for
all entries, but anchorToCache() code was not updated to skip anchoring
of anchored entries, leading to an extra read lock placed on a peek()ed
hit entry. The extra lock prevented the entry slot from being reused for
other entries when the entry was later purged (i.e. marked for removal),
drastically decreasing hit ratio in some environments.
The fixed anchorToCache() no longer anchors already-anchored entries. To make sure
we covered all callers/cases, especially in patches, we changed the call
parameters, addressing an old code simplification TODO.
This warning is a false positive. Introduced in GCC v12, use-after-free
checks have attracted dozens of bug reports, with several reports still
open as of GCC v12.2, including one that looks very similar to our case:
https://gcc.gnu.org/bugzilla//show_bug.cgi?id=106776
Removing (excessive) "inline" works around this GCC bug. We do not have
a better workaround right now. If we see more false positives like this,
we may disable -Wuse-after-free until its support matures.
Fix 'AC_LINK_IFELSE was called before AC_USE_SYSTEM_EXTENSIONS' (#1139)
... produced during bootstrap.sh by autoconf 2.71
AC_USE_SYSTEM_EXTENSIONS needs to be placed after detecting
which OS and compiler are being used, but before any of the
checks detecting the capabilities of those OSes and compilers.
AC_CANONICAL_HOST is a prerequisite of the simple_host_os logic.
Attach it to that code block instead of the compiler detection
block.
Replace deprecated '$GCC' variable with autoconf cached
'$ac_cv_c_compiler_gnu' variable used as the real source for the
yes/no value of GCC usability after AC_PROG_CC.
This macro performs a very specific validation check but was
passed a mix of different error messages to display on failure,
many of which were unclear or wrong.
Move the text of the error inside the macro to enforce
consistency and accuracy about what failed. It actually only
needs to be passed the name of the ./configure option which
was configured badly.
Also, the failing option can only take the positive form
('--with-foo=X' or '--enable-foo=X'), so update the second
argument to always show that form, regardless of whether the
configure.ac logic has the feature enabled or disabled by default.
Also, fixed a bug in --with-nat-devpf validation which was
testing the wrong variable (enableval). This bug only affects
the errors displayed, so no builds were significantly affected.
Replaced FDECB callback with a basic async call (#1135)
The "FD passing callback" was misleading because it was not used as a
callback. We just need an async call with a known-to-the-caller
parameter instead. Simplifying CommCalls (by removing this "special
callback") helps improve that API.
Alex Rousskov [Sun, 4 Sep 2022 23:40:41 +0000 (23:40 +0000)]
Replaced most custom high-level callbacks with a unified API (#1094)
Regular asynchronous calls remember call arguments at call creation
time, when the type of the call is well known. Service callbacks do not
know call arguments until it is time for the service to (asynchronously)
call the previously supplied callback. Thus, the service has to get
access to call parameters inside the stored generic (from service
p.o.v.) async call pointer. Services got that access by declaring a
custom callback dialer class and expecting the service user code to use
that dialer when creating an async call object. That was problematic:
1. Each service class had to create nearly identical dialer classes, at
least one for each unique answer type. Code duplication was rampant.
2. Each service dialer class was specific to one service user type
(e.g., HappyConnOpener answer dialer did not support AsyncJob users).
3. A compiler could not check whether the service got the right
callback: A dynamic_cast operation converted a generic dialer to the
service-declared dialer class, asserting the right type at runtime.
This change replaces all but one custom service callback with three
general service-independent dialers, one for each callback style: a job
call, a call to a cbdata-protected class method, and a stand-alone
function call. This unified API solves problem 1 mentioned above. A new
Callback template supports all three dialers (addressing problem 2) and
checks the callback answer type at compile time (addressing problem 3).
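A simplified, self-contained sketch of the idea (names are
illustrative; the real Callback supports three dialer kinds and
integrates with AsyncCall scheduling):

    #include <functional>
    #include <utility>

    // a service callback that knows its Answer type at compile time,
    // unlike a generic AsyncCall pointer checked via dynamic_cast
    template <class Answer>
    class Callback {
    public:
        explicit Callback(std::function<void(Answer &)> dialer):
            dialer_(std::move(dialer)) {}

        // the service fills in the answer when it becomes known
        void answer(Answer &a) { dialer_(a); }

    private:
        std::function<void(Answer &)> dialer_; // job/cbdata/function call
    };

    struct OpenedConnection { int fd = -1; };

    // passing a callback that expects a different answer type now fails
    // to compile instead of asserting at runtime
    void startConnecting(Callback<OpenedConnection> cb)
    {
        OpenedConnection answer;
        answer.fd = 42; // pretend the service opened a connection
        cb.answer(answer);
    }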
Replaced custom HappyConnOpener, Security::PeerConnector, Downloader,
CertValidationHelper, Http::Tunneler, Ipc::Inquirer, and most
Ipc::StartListening() callbacks (and more such callbacks would have been
added by upcoming PRs!). Kept ListeningStartedDialer as detailed below.
Replaced Ipc::Inquirer callback with a regular call: The class does not
need a callback mechanism because it can (and should and now does)
create an async call from scratch when the call arguments become known.
Kept ListeningStartedDialer because client_side.cc relies on delivering
additional, non-answer parameters with the callback. Normally, this
additional information is stored inside the job object. In this case, no
job awaits the port opening, so we use a custom dialer hack to keep that
information during the wait. Improving that code is out of scope. This
may be the only custom high-level callback left in the code!
The asyncCall() return type improvement preserves the exact AsyncCall
type for the Callback constructor to pick up. It will also remove
explicit/imprecise AsyncCall::Pointer uses throughout the code.
Also fixed reporting of small responses received by Downloader.
Unfortunately, we cannot easily unify the old JobCallback() with the new
asyncCallback() because we do not yet know how to detect that we need to
use one of the Comm-specific dialers (e.g., CommCbMemFunT) that feed the
job pointer to the CommCommonCbParams constructor. CommCommonCbParams do
not have a default constructor because they require, probably
incorrectly, that the recipient (e.g., job) pointer is stored together
with the future callback argument(s). This is why JobCallback() requires
a Dialer as a parameter -- it does not know how to guess it either. Comm
callback improvements will need a dedicated PR.
Alex Rousskov [Fri, 2 Sep 2022 17:14:12 +0000 (17:14 +0000)]
Stop (ab)using Transient entry flags for CF requirement mngmt (#1127)
The ENTRY_REQUIRES_COLLAPSING flag was used for relaying "hitting this
entry carries Collapsed Forwarding risks right now" info to other
workers. That flag is not necessary to relay that info because remote
workers already know whether the entry is attached to one of the shared
cache stores. Moreover, relaying that flag correctly is difficult and
would require a different implementation:
* The flag is changed in relatively high-level Store code, possibly
_after_ we stopped writing, when the transients entry is already
read-only. This bug leads to an isWriter() assertion in
Transients::clearCollapsingRequirement() called via a quick or aborted
StoreEntry::startWriting().
* Basics.flags is not an atomic field. It lacks an "[app]" mark -- its
readers will not get a reliable value due to race conditions when
Squid needs to update the Transient entry flags in "append" mode.
* The flag is changed assuming that simply sending stuff to disk is
sufficient for other workers to read it. Since 18102f7, this is
inaccurate because the store index is updated only after the slot is
written to disk.
Remote workers now use Store attachment info to set the local StoreEntry
object flag, ignoring the no-longer-used Transients entry flags.
The ENTRY_REQUIRES_COLLAPSING flag is still used to tell transactions
which StoreEntry objects are subject to CF risks. It would be nice to
use the presence of MemObject::reply_ to relay the same info for
store-unattached entries (and attachment info for others), but we could
not find a way to do that (during commit d2a6dcb work and then again
during this project) without a lot of essentially out of scope rewrites.
Also removed misleading Ipc::StoreMapAnchor::exportInto() debugging.
Alex Rousskov [Fri, 2 Sep 2022 11:08:48 +0000 (11:08 +0000)]
Fix printing Security::ErrorDetail (#1129)
Squid dumps RefCount pointer details instead of the intended TLS error
details like "SQUID_TLS_ERR_ACCEPT+TLS_LIB_ERR=14094418":
ERROR: failure while accepting a TLS connection...: 0x564017165ef0*1
We overload operator "<<" for ErrorDetail::Pointer to report errors, but
the compiler picks our generic operator "<<" overload for RefCount<C>
instead. I believe this happens because the actual type of the
being-printed handshakeResult.errorDetail object (i.e.
Security::IoResult::ErrorDetailPointer) is not ErrorDetail::Pointer but
Security::ErrorDetail::Pointer; there is no overload for the latter.
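A self-contained illustration of that overload resolution trap (class
names simplified; reference counting omitted):

    #include <iostream>

    template <class C>
    class RefCount {
    public:
        explicit RefCount(C *p): p_(p) {}
        template <class Other> // converting constructor (Derived -> Base)
        RefCount(const RefCount<Other> &o): p_(o.getRaw()) {}
        C *getRaw() const { return p_; }
    private:
        C *p_;
    };

    struct ErrorDetail { virtual ~ErrorDetail() {} };
    struct TlsErrorDetail: ErrorDetail {};

    // generic overload: prints the pointer value
    template <class C>
    std::ostream &operator<<(std::ostream &os, const RefCount<C> &p)
    { return os << p.getRaw(); }

    // intended overload; only an exact RefCount<ErrorDetail> match
    std::ostream &operator<<(std::ostream &os, const RefCount<ErrorDetail> &)
    { return os << "SQUID_TLS_ERR_..."; }

    int main()
    {
        TlsErrorDetail detail;
        RefCount<TlsErrorDetail> p(&detail);
        // the exact template match (C = TlsErrorDetail) beats the
        // non-template overload that needs a user-defined conversion,
        // so this prints a raw address, not the error details
        std::cout << p << "\n";
    }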
This simple solution "works" but it does not solve the underlying
problem: Other ErrorDetail kids (existing and future) and other similar
class hierarchies using RefCount pointers (e.g., Http::Message) are or
will be mishandled because of missing kid-specific overloads.
To properly address this XXX, we probably should add traits/interfaces
to all RefCountable classes that want their Pointers to be printed
specially (e.g., ErrorDetail) _and_ teach our generic operator "<<"
overload for RefCount<C> to honor those. Also, an InstanceId data member
might be recognized/printed by default instead of the memory address; it
may even be added to RefCountable.
Preparing for shutdown after 186724 requests
Waiting 1 seconds for active connections to finish
Closing HTTP(S) port [::]:3128
FATAL: assertion failed: Transients.cc:192:
"oldE->mem_obj && oldE->mem_obj->xitTable.index == index"
This change removes two steps at the end of the manual shutdown sequence
(FreeAllFs() and Store::FreeMemory()) and associated deadly Store hacks,
adjusting a few Store checks as needed. Also, StoreFileSystem hierarchy
declarations were slightly updated when half of its virtual methods were
removed. Key changes are detailed below.
### Do not disable Store [by dropping Store::Root()] during shutdown
Disabling Store (i.e. calling Store::FreeMemory()) resulted in "do not
use Root() during shutdown" assertion-avoiding hacks that bypassed code
maintaining key invariants. Those hacks were probably accommodating
Store-using destruction sequences triggered by SquidShutdown() or later
code, _but_ the hacks also ran during the graceful shutdown period!
Violating those invariants in MemObject destructor led to the quoted
assertions when CollapsedForwarding::HandleNotification() ran during
graceful shutdown.
We could add more state to narrow down the scope of those hacks, or we could disable
IPC notifications that triggered the known deadly shutdown sequence, but
* We should not disable assertions during shutdown because many are
about essential (e.g., preventing data corruption) invariants and
because it is not practical to disable all assertions that should be
disabled (and keep disabling the new ones correctly). The idea leads
to unsafe code that is also more difficult to write and maintain.
* We should not disable kid-to-kid notifications because some are
required for finishing transactions (that can still be successfully
finished) during graceful shutdown. Increasing successful completion
chances is the whole idea behind that graceful shutdown feature!
Instead, Store should provide (essential) services during shutdown.
TODO: Update those unit test cases that use Store::FreeMemory() but
should not. That requires careful analysis because the next test cases
may rely on the previous case removing its Store object/statistics/etc.
The same dedicated project might (help) address commit 27685ef TODO
about unwanted Store unit test order dependencies.
### Do not disable StoreFileSystem service during shutdown
Similar to the Store service, manually killing FS service is a bad idea.
The underlying code was also essentially unused and buggy:
* Nobody called RegisterAllFsWithCacheManager(). A caller would assert.
* Nobody called Fs::Clean(). A call could crash Squid because deleted
objects could still be in use by others (they are not refcounted).
* SetupAllFs() was called but did essentially nothing.
* FreeAllFs() was called but had only harmful effects: It leaked memory
and removed usable FS registrations, increasing shutdown dangers.
See also: Commit 3a85184 documents why we should avoid explicit "delete
essentialService" calls (and why some crashes may still occur if we do).
LEAK_CHECK_MODE had not worked since before 2006 commit afec404. The
disabled code does not compile since 2012 commit b65ce00 and would crash
current Squids during shutdown. We should not fix it because the
underlying "delete essentialService" idea is deeply flawed.
Writing correct code that uses essential modules/services that may
"suddenly" disappear or crash is nearly impossible. The trickle of
assertions due to missing Store::Root() is a case in point. These risks
and efforts are also unnecessary: We can and should build APIs that
provide essential services without disappearing or crashing. Keeping
heap-allocated service objects during shutdown helps with that.
Valgrind and other modern leak detection tools are capable of
distinguishing still-reachable at-exit memory from runtime memory leaks.
We do not need to delete all still-reachable at-exit objects to enable
that leak detection functionality.
The OS will reclaim allocated heap memory anyway.
N.B. Despite the legacy code implications, we should distinguish
low-level new/delete (a.k.a. fooInit() and fooFree()) memory management
code (which is an internal service matter that essential service users
should not be exposed to!) from configure-reconfigure-shutdown
events. There is value in notifying services about configuration
(changes) and various shutdown stages, of course. We already have the
RunnersRegistry API for handling such notifications.
Even without explicit "delete essentialService" calls, we may still have
problems during C++ post-exit() cleanup where destructors of various
globals are called "concurrently", with few order guarantees. We should
avoid non-heap-allocated globals with "complicated" destructors, but
their elimination is out of scope here.
Also added initial support for test-suite/squidconf/ tests that depend
on ./configure options/results. This framework will be extended to
support testing with invalid configuration files that Squid must reject.
Existing Squid configuration files in test-suite/squidconf/ were given a
.conf filename extension for ease of auto-distinguishing them from
.instructions files that contain testing "rules". The customary .conf
extension also improves handling of those files by humans.
Alex Rousskov [Tue, 23 Aug 2022 03:13:29 +0000 (03:13 +0000)]
Maintenance: Remove an always-false condition (#1122)
The being-removed condition was added by beae59b, likely to compensate
for another bug that was later discovered and fixed in 619da1e. That fix
made the condition unnecessary and always false.
There are other problems in this code, including the misleading comment
describing the affected if statement. They will be fixed separately, and
the removal of this effectively unused code may simplify that work.
pponakan [Mon, 22 Aug 2022 22:28:34 +0000 (22:28 +0000)]
Optimize ephemeral port reuse with IP_BIND_ADDRESS_NO_PORT (#1115)
commBind: Cannot bind socket ... : (98) Address already in use
When opening a TCP connection from a specific IP address (e.g., set by
tcp_outgoing_address), Squid usually lets the OS select the source port
by performing the traditional bind(2)+connect(2) sequence with zero
source port. This sequence consumes ephemeral ports at the connection
opening rate even if many destinations are unique (and, hence, could
reuse the same source port) because the OS must pick a source port at
bind(2) time, before connect(2) supplies the destination address.
Starting with v4.2, Linux supports IP_BIND_ADDRESS_NO_PORT socket option
designed to decrease ephemeral port consumption by delaying port binding
until connect(2) time. If available, Squid now uses that option when
opening any outgoing TCP connection bound to a source IP address.
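A hedged sketch of the resulting system call sequence (the actual Squid
implementation lives in Comm and is structured differently):

    #include <netinet/in.h> // IP_BIND_ADDRESS_NO_PORT (Linux 4.2+)
    #include <sys/socket.h>
    #include <unistd.h>

    int openOutgoingTcp(const struct sockaddr_in *src,
                        const struct sockaddr_in *dst)
    {
        const int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
    #if defined(IP_BIND_ADDRESS_NO_PORT)
        // delay source port selection until connect(2), when the kernel
        // knows the destination and can reuse one ephemeral port for
        // many unique destinations
        const int on = 1;
        (void)setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                         &on, sizeof(on));
    #endif
        // src carries the configured address (e.g., tcp_outgoing_address)
        // and a zero port, so the OS picks the source port
        if (bind(fd, (const struct sockaddr *)src, sizeof(*src)) != 0 ||
            connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) != 0) {
            close(fd);
            return -1;
        }
        return fd;
    }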
Squid asserted because the code could switch a transient entry from
writing to reading while the corresponding store entry was still being
(or could still be) written to the shared memory cache. For example:
1. We start as a collapsed writer.
2. We get a response and start writing to disk and shared memory caches.
3. Disk swapout fails (for any reason, including out-of-slots errors).
4. storeSwapOutFileClosed() calls transientsCompleteWriting().
5. transientsCompleteWriting() switches the entry into reading mode
... but we are still writing to the memory cache!
There was a somewhat related XXX in transientsCompleteWriting(), but
what that comment did not say is that if we are writing to two stores in
parallel, then the first transientsCompleteWriting() call (made when one
of the two swapouts ends) makes us a reader prematurely.
An incorrect reader status allows a Controller::syncCollapsed()
notification (to, say, a shared memory cache writer) to slip past the
transients->isWriter() guard, eventually reaching the reading()
assertion.
Properly detecting the end of all store writing is difficult because the
two mostly independent writing "threads" may start/finish at seemingly
random times, in many code places, and even in different workers. To
simplify this task, Squid now limits cache writing to the worker
transaction that made the StoreEntry object public. That transaction
creates a writing Transients entry. Other transactions start as
Transients readers. The writer switches to reader when neither memory
nor disk caching can start or continue.
Also simplified Transients entry state. Squid relayed swapout errors via
the Transient entries themselves, but that is not necessary (and it
prevented the cache entry from recovering by writing to another store).
Each store entry can take care of its own swapout status/results. The
readers just need to know whether somebody may start (or is still)
writing, and we relay that info by keeping the Transients entry locked
for writing (appending, to be precise) while that condition is true.
Also fixed shared caches to recognize that no more data will be coming
in because the remote entry writer is gone. Readers still try to deliver
what they have, even if they know that the response will be truncated.
Also tried to follow the "broadcast after change, in the same context as
the change" principle in the modified code instead of relying on the
caller to broadcast after all changes. This approach may increase the
number of broadcasts, but it reduces the probability that we will miss
an important Broadcast() call. We can (and should) optimize repeated,
useless broadcasts, but that work is outside this project scope.
Also improved StoreEntry::swapOut() shutdown safety and
mayStartSwapOut() checks descriptions/order.
Alex Rousskov [Fri, 19 Aug 2022 18:57:50 +0000 (18:57 +0000)]
Honor httpd_suppress_version_string in more contexts (#1121)
* HTML "signature" part of generated HTTP responses to URN requests
* Server header in generated HTTP redirect responses
* boundary strings in generated HTTP 206 multipart/byteranges responses
Also document known contexts outside the configuration directive reach.
We may assert that all of them are outside the current "httpd" scope
(implied by the directive name), but admins may not know how we define
that "httpd" scope; being more explicit may reduce surprises.
Alex Rousskov [Fri, 19 Aug 2022 15:44:09 +0000 (15:44 +0000)]
Match ./configure --help parameter names with their defaults (#1120)
It is customary[^1] for "./configure --help" to use "--enable-foo"
spelling for default-disabled features and "--disable-foo" spelling for
default-enabled features. This custom allows admins to quickly discover
defaults without developers painstakingly documenting that default for
every parameter.
These changes are limited to "./configure --help" output. They do not
alter any defaults and do not alter non-help ./configure run outcomes.
[^1]: Example 1.1 at https://autotools.info/autoconf/arguments.html
Alex Rousskov [Thu, 18 Aug 2022 23:28:05 +0000 (23:28 +0000)]
Make COMM_ORPHANED and COMM_REUSEPORT values distinct (#1116)
Broken since inception (commit 1c2b446). The exact runtime effects (if
any) of this bug are difficult to ascertain, but it is possible that
some COMM_REUSEPORT (i.e. worker-queues port) listening connections were
not reported as incorrectly orphaned when they should have been.
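For illustration only (these are not the actual Squid values), this bug
class looks like:

    // two supposedly distinct flags accidentally sharing one bit, so
    // setting or testing one flag also affects the other
    enum {
        COMM_REUSEPORT = 0x40,
        COMM_ORPHANED  = 0x40 // bug: should have been a distinct bit
    };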
Alex Rousskov [Fri, 12 Aug 2022 16:06:12 +0000 (16:06 +0000)]
Do not tell admins that we are replacing their .* with our .* (#1119)
acl hasHeaderFoo req_header Foo .*
2022/08/12 11:43:27| WARNING: regular expression '.*' has only
wildcards and matches all strings. Using '.*' instead.
Broken since RE optimization inception (commit 6564dae), but the
excessive WARNING was "invisible" to many admins until 61be1d8 started
saving early level-0/1 messages to cache.log.
Alex Rousskov [Tue, 9 Aug 2022 03:22:48 +0000 (03:22 +0000)]
Do not "open" Comm::Connection on oldAccept() errors (#1103)
When Comm::TcpAcceptor::oldAccept() discovers an error, it should close
the socket it just accepted. We should use RAII to avoid leaking the
open socket, especially in the presence of multiple error detection
areas and C++ exceptions. Using the available Connection object for
controlling socket lifetime does not work well because that Connection
object often lingers well past oldAccept() -- the object delivers
various low-level error details (e.g., the remote HTTP client address of
the failed attempt) to TcpAcceptor users.
Instead of "opening" the Connection object ASAP to avoid FD leaks and
then struggling to find the right time to close it, we now delay that
opening until oldAccept() succeeds. Meanwhile, the socket lifetime and
Comm registration are controlled by a simple RAII Descriptor object.
Eventually, we will reuse Comm::Descriptor in Connection and other
descriptor-owning code. One known prerequisite for those improvements is
Optional<> support for non-trivial types (a work in progress).
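A minimal sketch of that RAII idea (the real Comm::Descriptor also
handles Comm registration):

    #include <unistd.h>

    // closes the just-accepted socket unless ownership is released
    class Descriptor
    {
    public:
        explicit Descriptor(const int fd): fd_(fd) {}
        Descriptor(const Descriptor &) = delete;
        ~Descriptor() { if (fd_ >= 0) close(fd_); }

        int get() const { return fd_; }

        // called only after oldAccept() succeeds, when the Connection
        // object is finally ready to take over the socket
        int release() { const int fd = fd_; fd_ = -1; return fd; }

    private:
        int fd_;
    };

With this shape, every early return or exception inside oldAccept()
closes the descriptor automatically, while the success path calls
release() exactly once.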
Alex Rousskov [Mon, 8 Aug 2022 16:10:06 +0000 (16:10 +0000)]
Unify r.n.g. seed generation (#1109)
Using time-based expressions for seeding pseudo random number generators
often results in multiple SMP workers getting the same seed. That, in
turn, may result in unwanted correlation of supposed-to-be-independent
worker activities: making artificially similar decisions (that are
supposed to be "random") and/or concentrating rather than
spreading/sharing load.
The recently added RandomUuid code did not use time-based seeds in
anticipation of this unification.
This change converts all std::mt19937 users in src/ except the random
ACL code (that probably deserves a dedicated change).
C++ standard does not guarantee that std::random_device does not force
users to wait while more entropy is accumulated. Based on public
comments and source code analysis of popular STL implementations, we
should be getting the desired no-wait behavior from popular STLs, in
popular environments. If Squid encounters a special environment where
the default std::random_device waits for entropy, more code and/or
configuration options will be needed to accommodate that special case.
The biggest drawback with this approach is that it may be difficult for
the admins to discover that std::random_device is waiting in their
environment. This is mitigated by the low probability of that happening.
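A hedged sketch of the non-time-based seeding that this unification
implies (function names are illustrative):

    #include <random>

    // unlike time(nullptr)-based expressions, std::random_device gives
    // each SMP worker an independent seed, avoiding correlated
    // supposed-to-be-random decisions across workers
    static std::mt19937::result_type RandomSeed()
    {
        std::random_device dev;
        return dev();
    }

    std::mt19937 &LocalRng()
    {
        static std::mt19937 rng(RandomSeed());
        return rng;
    }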
Alex Rousskov [Sun, 7 Aug 2022 14:59:20 +0000 (14:59 +0000)]
Maintenance: Improve stability of generated debug-sections.txt (#1101)
Pure "sort" may not produce stable results across different locales.
This may explain why the official order changed in commit 502bcc7 and
why current runs in some unofficial environments reorder sections.
Also fixed two minor problems:
* "sort -n" (i.e. numeric sort) was applied to non-numeric fields
* "sort -u" (i.e. unique sort) was pointlessly applied multiple times
Alex Rousskov [Tue, 2 Aug 2022 23:24:40 +0000 (23:24 +0000)]
Maintenance: Improve stability of gperf-generated sources (#1099)
Control gperf from source-maintenance.sh: Existing Makefile rule did not
use (and should not use) primary "make" functionality -- we do not want
to automatically trigger RegisteredHeadersHash.cci regeneration based on
file timestamps alone because most users will not have (the right
version of) gperf and (the right version of) astyle available. The rule
did use Makefile variables, but that resulted in unstable gperf output
because variables like $src change depending on the developer build
environment. This change finishes the move started by commit 3d50bea.
Simplify formater.pl, making its interface similar to other formatting
tools used by source-maintenance.sh: The script now sends its results to
stdout. Renamed the script to reduce surprises, fix the old naming typo,
and to improve consistency with format-makefile-am.pl. The script needs
more polishing; we avoided touching otherwise unmodified lines except
removing an unused, buggy $pfx line that triggered warnings.
Move "astyle did not corrupt code" checks from source-maintenance.sh to
formater.pl, where they belong and are easier to implement. This move
simplifies reuse of the Perl script inside source-maintenance.sh. It
also removes dependency on md5sum/etc. $CppFormatter now contains the
exact command to (re)use.
Fix "Generated by.." marker. Its old location in .cci was hard to
notice. The marker was also present in hand-edited .gperf file!
Report RegisteredHeadersHash.cci changes.
Also stopped stripping "register" from gperf output. Gperf v3.1 stopped
adding "register" to C++ output. We (should) use that gperf version.
Removing "register" complicated gperf error handling because the pipe
to sed ate the gperf exit code.
Alex Rousskov [Sat, 30 Jul 2022 01:12:17 +0000 (01:12 +0000)]
Support non-trivial Optional values (#1106)
Commit 7224761 skipped this std::optional feature when adding initial
Optional implementation because that feature is difficult to support.
However, several in-progress projects now need to make non-trivial
objects Optional. This implementation maintains key Optional invariants,
matching those of std::optional:
* A valueless Optional object does not construct or destruct the value.
* A valued Optional object supports move/copy ops supported by value.
Maintaining the first invariant is tricky:
* Union prevents value construction/destruction in a valueless Optional.
* Explicit destructor call destructs the value in a _valued_ Optional.
* A dummy union member works around a C++ requirement for constexpr
unions to have at least one active (i.e. initialized) member. Since a
valueless Optional cannot initialize the value, we need another union
member that we can initialize.
* We use an _anonymous_ union because C++ places more requirements on
named unions, triggering a more complex implementation with
placement-new and to-be-deprecated std::aligned_storage_t tricks!
XXX: This simplified implementation violates the following std::optional
invariant. We believe that this violation does not hurt the current and
foreseeable Squid code. In our tests, optimizing compilers still
optimize Optional<Trivial> destructors away.
* std::optional<Value> is trivially destructible if Value is.
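A minimal sketch of the union technique described above (the real
Optional also provides copy/move operations and value accessors):

    template <class Value>
    class Optional
    {
    public:
        // the anonymous union lets this constexpr constructor
        // initialize the dummy member instead of a value it lacks
        constexpr Optional() noexcept: dummy_() {}

        explicit Optional(const Value &v): value_(v), hasValue_(true) {}

        ~Optional()
        {
            if (hasValue_)
                value_.~Value(); // explicit destructor call when valued
        }

        Optional(const Optional &) = delete; // copy/move omitted here

    private:
        struct Dummy {};
        union {
            Dummy dummy_; // the active member of a valueless Optional
            Value value_; // constructed/destructed only while hasValue_
        };
        bool hasValue_ = false;
    };

As the XXX above notes, even this sketch defines a destructor, so it is
never trivially destructible, regardless of Value.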
Alex Rousskov [Wed, 27 Jul 2022 18:11:52 +0000 (18:11 +0000)]
Maintenance: Trigger -Wswitch in more build environments (#1104)
When a developer forgets to update RawSwapMetaTypeTop() after adding a
new SwapMetaType item, the function should produce a -Wswitch compiler
warning (at least). However, we discovered that many compilers remain
silent, apparently confused by the constant switch expression:
* GCC warns only starting with v9.1;
* clang does not warn at all (up to v14, inclusive).
Adding a non-constant variable triggers the expected -Wswitch warnings
in clang and earlier GCC versions. Optimizing compilers (-O1) remove the
otherwise unused variable so adding it has no runtime cost.
Declaring the variable outside the switch statement is required to avoid
an unwanted -Wunused-variable warning with GCC v4.8 and clang v3.0.0.
A post-switch return statement is required to avoid an unwanted
-Wreturn-type warning with GCC v4.9.
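An illustrative reconstruction of the workaround (not the actual
SwapMeta code):

    enum SwapMetaType { STORE_META_VOID, STORE_META_KEY_MD5, STORE_META_URL };

    int RawSwapMetaTypeTop()
    {
        // a non-constant variable, declared outside the switch to avoid
        // -Wunused-variable with GCC v4.8 and clang v3.0.0, so that
        // compilers do not skip the -Wswitch completeness check
        auto top = STORE_META_URL;
        switch (top) { // warns here if an enum value lacks a case
        case STORE_META_VOID:
        case STORE_META_KEY_MD5:
        case STORE_META_URL:
            return top;
        }
        return STORE_META_URL; // avoids -Wreturn-type with GCC v4.9
    }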
Alex Rousskov [Tue, 26 Jul 2022 15:05:54 +0000 (15:05 +0000)]
Remove support for Gopher protocol (#1092)
Gopher code quality remains too low for production use in most
environments. The code is a persistent source of vulnerabilities and
fixing it requires significant effort. We should not be spending scarce
Project resources on improving that code, especially given the lack of
strong demand for Gopher support.
With this change, Gopher requests will be handled like any other request
with an unknown (to Squid) protocol. For example, HTTP requests with
Gopher URI scheme result in ERR_UNSUP_REQ.
Default Squid configuration still considers TCP port 70 "safe". The
corresponding Safe_ports ACL rule has not been removed for consistency's
sake: We consider the WAIS port safe even though Squid refuses to forward
WAIS requests:
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
Alex Rousskov [Sun, 24 Jul 2022 18:56:53 +0000 (18:56 +0000)]
Fix handling of bad SNMP variable type when parsing IP_* vars (#1097)
AFAICT, we never associate snmp_netIpFn() with the wrong variable type,
so we may be dealing with a bug in effectively unreachable code here.
1999 commit 736eb6a correctly removed snmp_var_free() call from most
refactored SNMP functions but missed snmp_netIpFn() and
snmp_meshCtblFn() cases. The latter was fixed in 2000 commit d20b1cd.
RFC 9111 obsoletes the Warning header, removing all specification
requirements about it. Likewise, this update changes Squid behaviour
with regard to that header:
1) Squid no longer adds Warning headers to generated or forwarded
messages. Miss responses from servers/peers and hits cached by an
older version of Squid may still have Warning headers.
2) On 304 revalidation, Warning headers are treated the same as any
other/generic header. They are added or replaced according to their
presence in the 304 reply. Absent any Warning update by a 304, Squid
may still deliver cached content with old Warning headers.
3) Squid no longer validates received Warning headers. RFC 7234 placed
syntax requirements and limits on how old some Warning values could
be (when dated). Those checks are no longer being performed. The
header value is now treated as an opaque string.
4) Warning header usage and types are no longer tracked in message
statistics available through cache manager.
Updated documentation references to reflect RFC 7234 becoming obsolete.
The TPROXY lookup part was misleading -- the method only checked that
TPROXY was enabled by the configuration and did not change the accepted
connection settings.
Also do not assume that a listening port can have both COMM_TRANSPARENT
and COMM_INTERCEPTION flags set. Squid treats TPROXY and NAT modes as
mutually exclusive and rejects such configurations.
Also forbid invalid port configurations with tproxy mode where TPROXY is
prohibited by ./configure.
Also removed unused Ip::Intercept::StopInterception().
* started adding -Werror in --disable-strict-error-checking builds
* stopped adding -Wall to CXXFLAGS
* forgot that -Werror is spelled $squid_cv_cc_option_werror
* forgot that "=" is converted to "_" in $squid_cv_cc_arg... name
* forgot that "W" is present in $squid_cv_cc_arg... name
* checked squid_cv_cc_... name but set squid_cv_cc_arg... name
* used warning-generating SQUID_CC_CHECK_ARGUMENT code for testing
compiler support for new -W options (while also using -Werror)
Correctly detecting support for a compiler warning option is difficult:
* Some compilers (e.g., clang) exit successfully when given an
unsupported command line option unless we also give -Werror. With
those compilers, we must use "-Werror -Wx" to test for -Wx support.
However, that does not mean we should add -Werror to all Squid builds.
* Adding -Werror does not work for detecting supported GCC warnings: GCC
needs -Werror=no-x to fail on unsupported -Wno-x warnings.
* We do not even know how to detect (via compiler exit code) supported
warnings for compilers other than clang and GCC. We still add -Werror
to all compilers other than GCC when testing -Wx, but, based on quick
godbolt.org tests, -Werror does not make icc fail on unsupported -Wx.
Also delayed adding warnings until we are done adding such C++ compiler
options as -std=c++11 (and, for Irix, optimization level) because those
options may affect the set of warnings supported by the compiler. We
also (now) need SQUID_CC_GUESS_VARIANT and SQUID_CC_GUESS_OPTIONS. We
might need AC_PROG_CPP. All those things are settled after the official
code attempted to add warnings. The best code location probably needs
more work, but at least all warnings are handled in the same code now.
These changes also fix Bug 5167 (GCC -Wno-unused-private-field
miscategorized as supported).
Also fixed clang (-Wall => -Wmost) warnings exposed by the above fixes.
Alex Rousskov [Thu, 21 Jul 2022 18:03:58 +0000 (18:03 +0000)]
Avoid spurious translation warnings in default "make install" (#1095)
Squid defaults to --disable-translation, but the corresponding "make
install" produces dozens of alias-link.sh warnings like these:
WARNING: az translations do not exist. Nothing to do for: az-az
WARNING: bg translations do not exist. Nothing to do for: bg-bg
...
WARNING: vi translations do not exist. Nothing to do for: vi-vn
With this change, --disable-translation builds do not attempt to
"locate" and install/link translation-related files. The following
warning is still shown, even in explicit --disable-translation builds:
WARNING: Translation is disabled.
This change also stops exposing --disable-translation builds to
alias-link.sh bugs that generate warnings for boilerplate statements.
Alex Rousskov [Tue, 19 Jul 2022 21:25:32 +0000 (21:25 +0000)]
Discarded connections do not contribute to forward_max_tries (#1093)
The forward_max_tries directive is said to limit the number of
"high-level request forwarding attempts", but its documentation does not
specify whether each Happy Eyeballs connection opening attempt
constitutes such a forwarding attempt (because commit 3eebd26 that
clarified forward_max_tries meaning came before Happy Eyeballs code
started opening multiple concurrent connections in commit 5562295).
Official Squid counts each Happy Eyeballs connection attempt as a
request forwarding attempt. For example, if forward_max_tries is 2, then
Squid, to a likely admin surprise, may refuse to re-forward the request
after an HTTP failure because the second request forwarding attempt has
been used up by the (canceled!) spare TCP connection opening attempt.
This change stops counting discarded connections as request forwarding
attempts. For example, if forward_max_tries is 2, Squid will re-forward
the request (if needed and possible) even if Squid must open up to 3
connections to do that (discarding the second one in the process). In
this context, discarding the race-losing connection and going with the
winning one may be viewed as a low-level retry activity that
forward_max_tries is already documented to ignore.
Eventually, Squid may stop discarding connections that lost the Happy
Eyeballs race. When/if that complicated improvement is implemented,
those canceled connection opening attempts should be counted, but the
then-saved connections should be used for request re-forwarding,
preserving the same overall outcome: With forward_max_tries set to 2,
Squid will still re-forward the request (if needed and possible).
Before this change, setting forward_max_tries to 1 prevented all spare
connection attempts: The first DNS response received (A or AAAA) would
be used for the primary connection attempt, increasing n_tries, making
ranOutOfTimeOrAttempts() true, and blocking any spare attempt (both
concurrent and sequential). That low-level side effect is as wrong as
the blocked retries described in the "likely admin surprise" example
above. Now, admins that really want to disable spare connection attempts
may set forward_max_tries to 1 _and_ happy_eyeballs_connect_limit to 0.
* Fix build with OpenSSL v3.
* Refactor RSA key generation to avoid deprecated RSA_*() APIs.
* Refactor DH parameter and key config to avoid deprecated DH_*() APIs.
* Refactor ECDH key creation to avoid deprecated EC_*() APIs.
* Deprecate ssl_engine support in builds with OpenSSL v1-.
* Disable ssl_engine support in builds with OpenSSL v3+.
We deprecated/removed ssl_engine support (as summarized in the last two
bullets above) without providing an OpenSSL Providers-based alternative
because of the following factors:
1. We do not have the resources to update ssl_engine code to build
(without deprecation warnings) with OpenSSL v3 when the feature is
unused.
2. We do not have the resources to create an OpenSSL v3 Provider-based
replacement for ssl_engine code that uses deprecated Engine APIs.
3. OpenSSL v3 deprecated Engine support (triggering deprecation warnings
in applications that use Engine APIs with OpenSSL v3). Since Squid
default builds use -Werror, doing nothing would break such builds.
4. Squid ssl_engine does not appear to be a popular feature.
Before this fix, acceptOne() was losing FTP DATA TcpAcceptor context
by unconditionally resetting the context to a nil listenPort_.
Also, TcpAcceptor methods should not (be expected to) explicitly set
their job code context. Instead, the job should be created in the
right code context, allowing job-created async calls to auto-manage code
context. The updated port-iterating clientHttpConnectionsOpen() and
Ftp::StartListening() loops now set the right code context when creating
async calls that result in TcpAcceptor job creation.
Store swap metadata consists of a handful of Squid-specific TLV fields.
In 69c01cd, field handling was converted to use lists of TLV objects.
In 528b2c6, the fields were promoted to a hierarchy of C++ classes, each
representing a specific T in TLV, but the transition was not complete:
While trying to add another class to the hierarchy, we discovered that
it would require violating hierarchy design again, adding another XXX.
While trying to address those hierarchy problems, we concluded that the
hierarchy and, in fact, the TLV lists themselves should be removed:
* A class hierarchy with polymorphic methods works well when the same
method can be used regardless of the specific object type. In swap
metadata case, the user code had to know the type (i.e. T in TLV) to
handle each object correctly because only certain types could be
processed at each metadata loading stage. Moreover, some processing
can benefit from using type-specific APIs (e.g., extracting a URL).
Several `switch (type)` statements (and XXXs about doing something a
method should not be doing) illustrate that design conflict.
* A class hierarchy without polymorphic methods can still be useful if
leaf objects can reuse a lot of parent code. However, StoreMeta class
had virtually no code reusable by its derived classes -- it was a
collection of static methods and their equivalents. Metadata leaf
classes were essentially used as global functions.
* Creating a list is useful when the listed objects are traversed/used
multiple times. In swap metadata case, each TLV object is effectively
used once: Official code allocates each object, copies data into it,
and then uses the object either to interpret the just-read value or to
write just-copied bytes into another buffer. The only (minor)
exception to that pattern was when the list was also scanned to
calculate the total buffer size, but that extra scan is unnecessary
and undesirable, especially given modern serialization techniques.
This change converts the problematic StoreMeta class hierarchy into a
few stand-alone functions. The SwapMetaView class implements a view of
the serialized metadata, significantly reducing allocations and copying.
Metadata parser creates a cheap view for the being-parsed field, with
view lifetime confined to that single parsing loop iteration.
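A hedged sketch of such a non-owning view and one parsing step (names
and validation details differ from the real SwapMetaView):

    #include <cstddef>
    #include <cstring>
    #include <stdexcept>

    struct SwapMetaView {
        char type = 0;               // T: the swap meta type ID
        size_t length = 0;           // L: size of the value, in bytes
        const void *value = nullptr; // V: points into the buffer; not owned
    };

    // extracts the next field without allocating or copying the value
    SwapMetaView ParseOneField(const char * &pos, const char * const end)
    {
        SwapMetaView view;
        int rawLength = 0;
        if (end - pos < static_cast<ptrdiff_t>(1 + sizeof(rawLength)))
            throw std::runtime_error("truncated swap meta field");
        view.type = *pos++;
        memcpy(&rawLength, pos, sizeof(rawLength));
        pos += sizeof(rawLength);
        if (rawLength < 0 || end - pos < rawLength)
            throw std::runtime_error("bad swap meta field length");
        view.length = static_cast<size_t>(rawLength);
        view.value = pos;
        pos += view.length;
        return view; // valid only while the underlying buffer lives
    }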
Also removed unused STORE_META_* type IDs, addressing several related
XXXs and TODOs. This change was necessary to let the compiler tell us
when we forgot a swap metadata type in a `switch (type)` statement. No
changes to type IDs and their deficient assignment algorithm.
This change attempts to preserve arbitrary metadata validation decisions
that do not violate the overall design, except the following two:
* More problematic Store entries may now be rejected due to conversion
of serious validation errors (e.g., framing problems) to exceptions.
This change decreases the risks of infinite metadata parsing loops and
invalid entry acceptance.
* Some error messages are now logged at level 1 instead of 0.
The Rock-specific ZeroedSlot() check was moved to Rock. Unlike Rock
fixed-size one-file storage, UFS-based cache_dir files are not zeroed.
Alex Rousskov [Sun, 10 Jul 2022 17:04:14 +0000 (17:04 +0000)]
Maintenance: Removed most NULLs using modernize-use-nullptr (#1075)
Applying clang-tidy modernize-use-nullptr also converted many
"anonymous" 0s into nullptr.
Clang-tidy uses the clang AST, so we can only modernize code that clang can
compile. Fortunately, it is fairly easy to compile most of Squid code,
eliminating ~90% of NULLs in actively maintained code (excluding
comments and Windows-specific code) and ~73% of all NULLs.
TODO: Consider eliminating all remaining C++ NULLs using a simple script
that does not understand C++ syntax, assuming that the remaining cases
are simple enough to avoid problems that a compiler cannot detect.
Alex Rousskov [Fri, 8 Jul 2022 13:48:36 +0000 (13:48 +0000)]
Maintenance: Remove unused data members (#1082)
These handlerSubscription fields blocked quite useful static assertions.
They are in the way of callback improvements; that work has discovered
that they are unused (since inception in merge commit 1c8f25b).
Alex Rousskov [Tue, 5 Jul 2022 22:58:13 +0000 (22:58 +0000)]
Maintenance: Improve stability of generated automake Makefiles (#1081)
Our source-maintenance.sh script generates five automake Makefiles using
"git ls-files <pattern>" output as the source of filename entries. These
entries were piped through "sort -u". That problematic step is now gone:
* "sort" may not produce stable results across different locales.
* "sort" does not use the order enforced by our format-makefile-am.pl.
That Perl script itself did not fix the order because it only ordered
entries in *_SOURCES groups that these generated files do not use.
* "sort -u" could silently drop duplicate Makefile entries (that should
not exist) instead of letting that Perl script warn us about them.
The first two bullets made automake Makefile generation unstable (e.g., 502bcc7 reversed the ASCII order of en.lang and en_AU.lang entries, but
current runs in unofficial environments could reorder those entries).
format-makefile-am.pl now recognizes the generated group name patterns,
properly formatting entries in those five generated automake Makefiles.
That fix alone created persistent misleading "File ... changed: by ...
format-makefile-am.pl" NOTICEs because the Perl script was (always)
changing the _generated_ file. We now pipeline the generation and
formatting steps, applying NOTICE-generation check to the final result.
(Positive) side effect: The generated automake Makefiles now include a
notice that they are generated.
Alex Rousskov [Sun, 3 Jul 2022 22:59:32 +0000 (22:59 +0000)]
Maintenance: Fix formater.pl application after a failure (#1078)
Our scripts/source-maintenance.sh relies on formater.pl to create
foo.astylebak -- a fresh copy of the being-formatted file foo. The Perl
script is creating that copy, but since formater.pl could be executed
outside the "all source files are staged" protections offered by
source-maintenance.sh, formater.pl never overwrites old copies. Instead,
the Perl script numbers copies: foo.astylebak.1, foo.astylebak.2, ...
Unfortunately, the numbering code was copying into the highest available
index instead of shifting existing copies by one to always name the
fresh copy foo.astylebak, as expected by source-maintenance.sh! Thus, if
formatting failed (e.g., was interrupted by the user) leaving a backup
file on disk, then all subsequent attempts would start to fail once
a branch switch or local edits made the backup copy stale:
ERROR: File src/store.cc not formatting well
... with the following files on disk:
src/store.cc <--- original file that should be backed up
src/store.cc.astylebak.2 <--- fresh copy just made by formater.pl
src/store.cc.astylebak.1
src/store.cc.astylebak.0
src/store.cc.astylebak <--- stale copy used by source-maintenance.sh
The FILO numbering order was also unusual/surprising to users; logrotate
and Squid itself use FIFO ordering, shifting numbered backup copies.
Implement FIFO ordering required by source-maintenance.sh and expected
by most users. TODO: Adjust source-maintenance.sh to stop relying on
formater.pl-created backup (and to stop guessing its name).
Alex Rousskov [Sun, 3 Jul 2022 14:48:50 +0000 (14:48 +0000)]
Maintenance: Fix cf_gen build warnings with clang v12 (#1076)
cf_gen.cc:... warning: comparison of array 'buff' not equal to a
null pointer is always true [-Wtautological-pointer-compare]
These conditions are not just always true but a bit odd or misleading
because they check the pointer after it was dereferenced. They were
added in c1f8bbd which misinterpreted the "else" conditions. The two
"regular DOC/CONFIG line" conditions are self-documented well enough.
Segmentation fault in tests/testStore is another symptom.
testStoreHashIndex calls Store::Root().init() which eventually gets to
MemStore::Requested() which correctly requires Config.memShared to be
configured. Hack the missing configuration replacement, following the
inconsistent pattern in three other test suites.
TODO: Reduce configuration code duplication and such bugs by providing
initialization method(s) for all Store::Root()-using tests to (re)use.
What does LTO have to do with any of it? CPPUnit [allegedly] randomizes
execution of test suites using raw C++ pointers, and a test suite in the
testStore collection probably contains the right initialization code.
Unfortunately, Config initialization is evidently shared across test
suites (XXX). LTO evidently changes those raw pointer values enough to
change the order in an unfortunate way.
When built with Samba TrivialDB, ext_session_acl would never
successfully look a session up due to an uninitialized key argument. As
a result, the helper would be unable to validate that an user had logged
in. Broken since TrivialDB support was added in acd207a.
Detected by Coverity. CID 1441979: Uninitialized scalar variable
(UNINIT).
Alex Rousskov [Tue, 14 Jun 2022 15:24:28 +0000 (15:24 +0000)]
Avoid reading tls-cert=bundle twice when loading certificates (#1073)
After opening and reading the certificate bundle to load the required
traffic-signing certificate, Squid was opening and reading the same file
again to load the optional intermediate certificates. Loading the
traffic-signing certificate twice triggered bugs and further
complicating naturally tricky code.
No functionality changes expected except minor cache.log message changes
and certificate reporting improvements in "squid -k parse" mode.
Amos Jeffries [Mon, 13 Jun 2022 15:55:06 +0000 (15:55 +0000)]
Maintenance: Improve CONTRIBUTORS and its updates (#980)
The Squid Project used a script to collect CONTRIBUTORS entries, but it
was not called from source-maintenance.sh, it did not really understand
the structure of those entries, and its results required significant
manual polishing efforts. CONTRIBUTORS file kept deteriorating.
This change consists of three major parts detailed further below:
* a major (semi-manual) CONTRIBUTORS update and cleanup;
* scripts/update-contributors.pl: Merges new entries into CONTRIBUTORS;
* collectAuthors() in source-maintenance.sh: Finds new entries.
Part 1: CONTRIBUTORS update: We collected (and then pruned/polished) all
contributors from the following (master and v3+ branches) sources:
* all commit authors;
* all commit co-authors (from Co-authored-by trailer and note entries);
* all CONTRIBUTORS file revisions (the latest one was missing entries).
Part 2: update-contributors.pl understands and enforces a more formal
CONTRIBUTORS structure. After a non-indented preamble text ending with
an empty line, indented CONTRIBUTORS entries now use these formats:
name <email>
name
<email>
The entries are case-insensitively sorted by the two fields (name,
email), with several conflict-resolution rules aimed at achieving
natural and stable order. Non-ASCII entries are still banned (for now)
because properly supporting them requires a serious dedicated effort.
The program merges current CONTRIBUTORS and all well-formed contributor
entries (read from the standard input) except these:
* entries with an already represented email
* entries with an already represented name
* entries containing "squidadm" in name or email
The matching is case-insensitive. These filtering rules appear to work
better in the Squid CONTRIBUTORS context than more accurate/strict rules do.
Part 3: collectAuthors() feeds update-contributors.pl with new
contributors collected from "git log" output. The function only looks at
the current git branch commits made after a "vetted" point. That point
is updated by certain CONTRIBUTORS commits, as detailed in the
collectAuthors() description inside source-maintenance.sh. It can also
be specified manually via the new --update-contributors-since option.
It is not critical (and is probably impossible) for CONTRIBUTORS
automation to properly detect and merge every single new contributor.
When the scripts get it wrong, a human can always update CONTRIBUTORS
manually. Rare mistakes are not a big deal in this context. For example,
if a past contributor now needs to be listed with another email (e.g.
representing a new employer), we manually add a second entry.
This change is a reference point for automated CONTRIBUTORS updates.
Alex Rousskov [Sun, 12 Jun 2022 23:13:23 +0000 (23:13 +0000)]
Do not send more than one self-signed certificate (#1058)
Configured chain L-M-R was sent as L-M-R-R instead of just L-M-R.
Configured chain R was sent as R-R instead of just R!
... where R is a self-signed (i.e. "Root CA") certificate that signed
(possibly indirectly, via an intermediate certificate M) the traffic
signing (i.e. "Leaf" or "end-entity") certificate L (if any).
Security::KeyData::loadX509ChainFromFile() has other problems, but
properly solving those problems needs significant code refactoring. It
is best done in a dedicated no-functionality-changes PR. The same
refactoring is likely to simplify the existing code logic, handling all
root certificates in one place.
Also removed disabled code so that we do not have to keep maintaining
it. Incorrectly disabling this code at the very end of the commit
540e296 work created the bugs we are fixing in this branch. For some
developers, it is also easier to comprehend active code without the
misdirection of disabled code.
This disabled code removal does _not_ mean that Squid should keep
sending Root CA certificates (when they are not the signing/end-entity
certificates). However, stopping that behavior deserves a dedicated PR.
Amos Jeffries [Fri, 10 Jun 2022 07:57:46 +0000 (07:57 +0000)]
RFC 9218 Priority header registration (#1070)
This registration allows reverse-proxy administrators to add, remove,
and replace the Priority header more efficiently using the
reply_header_add, reply_header_access, and reply_header_replace
directives, as sketched below.
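For example (an illustrative squid.conf sketch; the lowPriority ACL is
hypothetical):

    # strip upstream Priority headers from all replies
    reply_header_access Priority deny all

    # or advertise a low urgency for bulk content
    acl lowPriority urlpath_regex -i \.iso$
    reply_header_add Priority "u=5" lowPriority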
Break long store_client call chains with async calls (#1056)
The store_client class design created very long call chains spanning
Squid-client and Squid-server processing and multiple transactions.
These call chains also created ideal conditions for dangerous recursive
relationships between communicating classes (a.k.a. "reentrancy" among
Squid developers). For example, storeClientCopy() enters store_client
and triggers disk I/O that triggers invokeHandlers() that re-enters the
same store_client object and starts competing with the original
storeClientCopy() processing state.
The official code prevented the worst recursion cases with three(!)
boolean flags and time-based events abused to break some of the call
chains, but that approach did not solve all of the problems and also
lost transaction context information across time-based events.
This change effectively makes STCB storeClientCopy() callbacks
asynchronous, eliminating the need for time-based events and one of the
flags. It shortens many call chains and preserves transaction context.
The remaining problems can and should be eliminated by converting
store_client into AsyncJob, but those changes deserve a dedicated PR.
store_client orchestrates cooperation of multiple asynchronous players:
* Sink: A Store client requests a STCB callback via a
storeClientCopy()/copy() call. A set _callback.callback_handler
implies that the client is waiting for this callback.
* Source1: A Store disk reading subsystem activated by the storeRead()
call "spontaneously" delivers response bytes via storeClientRead*()
callbacks. The disk_io_pending flag implies waiting for them.
* Source2: Store memory subsystem activated by storeClientListAdd()
"spontaneously" delivers response bytes via invokeHandlers().
* Source3: Store disk subsystem activated by storeSwapInStart()
"spontaneously" notifies of EOF/error by calling noteSwapInDone().
* Source4: A store_client object owner may delete the object by
"spontaneously" calling storeUnregister(). The official code was
converting this event into an error-notifying callback.
We continue to answer each storeClientCopy() request with the first
available information even though several SourceN calls are possible
while we are waiting to complete the STCB callback. The StoreIOBuffer
API and STCB recipients do not support data+error/eof combinations, and
future code will move this wait to the main event loop anyway. This
first-available approach means that the creation of the notifier call
effectively ends answer processing -- store_client just waits for that
call to fire so that it can relay the answer to still-synchronous STCB.
When STCB itself becomes asynchronous, this logic will continue to work.
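The resulting notifier shape, roughly (a sketch with approximate names;
the actual PR code differs in detail):

    // a sketch with approximate names; not the exact PR code
    void
    store_client::noteNews()
    {
        if (!_callback.pending())
            return; // no storeClientCopy() caller is waiting for an answer
        if (_callback.notifier)
            return; // the first-available answer is already scheduled
        // the asynchronous step that effectively ends answer processing
        _callback.notifier = asyncCall(90, 5, "store_client::FinishCallback",
                                       cbdataDialer(FinishCallback, this));
        ScheduleCallHere(_callback.notifier);
    }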
Also stopped calling STCB from storeUnregister(). Conceptually, the
storeUnregister() and storeClientCopy() callers ought to represent the
same get-content-from-Store task; there should be no need to notify that
task about what it is doing. Technically, analysis of STCB callbacks
showed that many such notifications would be dangerous (if they are or
become reachable). At the time of the storeUnregister() call, the STCB
callbacks are usually unset (e.g., when storeUnregister() is called from
the destructor, after that object has finished copying -- a very common
case) or do not do anything (useful).
Also removed callback_data from the Callback::pending() condition. It
is conceptually wrong to require a non-nil callback parameter, and
callback_data is never cleared separately from the callback_handler
data member anyway.
Also hid copyInto into the private store_client section to make sure it
is not modified while we are waiting to complete the STCB callback. This
move required adding a couple of read-only wrapper methods like
bytesWanted() and noteSwapInDone().
Also simplified error/EOF/bytes handling on copy()-STCB path using
dedicated methods (e.g., store_client::callback() API is no longer
mixing EOF and error signals).
Use SBuf for Fs::Ufs::RebuildState file paths (#1063)
In theory, a combination of a non-terminating-on-overflows snprintf()
and an incorrectly guessed MAXPATHLEN (or a cache_dir root path being at
the edge of a correctly guessed MAXPATHLEN) may result in a
non-terminated fullpath. Use SBuf to avoid these particular headaches.
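With SBuf, the path assembly cannot overflow or lose termination (a
sketch assuming SBuf::appendf(); the actual RebuildState field names
differ):

    // a sketch assuming SBuf::appendf(); field names are illustrative
    SBuf fullpath(sd->path); // cache_dir root of any length
    fullpath.appendf("/%02X/%02X/%08X", l1dir, l2dir, fileNumber);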
Detected by Coverity. CID 1461166: Copy of overlapping memory
(OVERLAPPING_COPY).
Use added X509_check_issued() replacements. The only case left is in
src/ssl/gadgets.cc which is used by certificate helpers that cannot be
linked with libsecurity yet.
Use added X509_NAME_oneline() replacements, where feasible. This change
speeds up ssl_verify_cb() and other functions that used to extract and
copy certificate name into a buffer even when that name was unused
because debugging levels were not elevated enough, including by default.
Also fixes memory leak when debugging section 83 at level 3+ of an
OpenSSL-using Squid (missing name cleanup in clientNegotiateSSL()).
Also fixes a (usually symptom-free) sslcrtd bug: C strings allocated by
OpenSSL were freed by xfree() instead of OPENSSL_free().
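Illustrating the allocator mismatch (OpenSSL API calls as documented;
the surrounding sslcrtd code is omitted):

    // a minimal illustration: OpenSSL-allocated strings must be released
    // with OPENSSL_free(), not xfree()/free()
    if (char *name = X509_NAME_oneline(X509_get_subject_name(cert), nullptr, 0)) {
        debugs(83, 4, "subject: " << name);
        OPENSSL_free(name); // the buggy code used xfree(name)
    }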
The old level-1 error message was malformed and incomplete:
  ERROR: failure while establishing TLS connection on FD: 70x123456*1
* The FD value was malformed. Now report all Connection details instead.
* Error details were not reported (only a pointer value was).
Also, for error message prefix, use a single/constant phrase instead of
error-dependent phrases. This principle is for level-0/1 messages, but
it is OK to apply it here as well, especially given the related TODO.
Also improved RawPointer reporting so that we can use it for this fix.
Some of these changes are not necessary (for the final code state), but
all of them except the last one were necessary at some point during the
work on this fix, and they all may come in handy in the future:
* Print nothing if ptr is nil. Both existing callers benefit from this.
* Allow the caller to set the label/object delimiter. Also faster.
* Support reporting object values on a dedicated Debug::Extra line.
* Support (and do not attempt to print) nil RawPointer() labels.
One master transaction may encounter several certificate
validation-related errors because Squid may need to validate several
server certificates for one connection and may even need to open several
server connections. Since 2012 commit 4fb72cb, ssl_verify_cb()
incorrectly assumed that no past errors were possible. The callback
started to assert when recent code changes (commit e227da8?) improved
delivery of past errors to ACL checks.
While fixing this bug, we discovered that ServerBump::serverSession
ignored the new TLS connection when Squid connected to another
destination after a failed TLS handshake. That stale value was then
used to extract
stale information about past TLS errors. In some cases, that would
prevent Squid from hitting the now-fixed assertion. In other cases, it
could result in wrong sslproxy_cert_sign and sslproxy_cert_adapt
decisions. Now also fixed; see Ssl::ServerBump::attachServerSession().
A nil PeerConnector::check->sslErrors value may get stale because
PeerConnector::check is filled just once: sslErrors will remain nil
during the second validation pass (after PeerConnector fetches missing
intermediate certificates and revalidates), despite validation errors
discovered during the first pass. Today, ssl_verify_cb() does not really
use FilledChecklist::sslErrors preset by its PeerConnector, so a stale
value currently has no effect. Fixing this waiting-to-happen problem is
not trivial (we tried). A proper fix deserves a dedicated PR. Added XXX.
While working on these changes, we discovered that Squid implements two
mutually exclusive ssl_error (FilledChecklist::sslErrors) definitions:
* In sslproxy_cert_error context, ssl_error matches the last or current
error being examined by the sslproxy_cert_error code.
* In sslproxy_cert_sign and sslproxy_cert_adapt context, ssl_error
matches any previously observed validation error.
This change updates Squid documentation to reflect the above findings.
Unfortunately, it is not feasible to remove this unwanted directive
"sensitivity" without deprecating/replacing sslproxy_cert_error and
ssl_error: Selecting any one of the two ssl_error definitions above may
seriously (and silently) break existing configurations:
# If all-errors semantics is adopted for sslproxy_cert_error, then
# this configuration will start allowing certificates with _any_
# error as long as that certificate also has a minorError:
acl minorError ssl_error ...
sslproxy_cert_error allow minorError
# If last-error semantics is adopted for sslproxy_cert_sign, then
# this configuration will stop properly signing certificates that
# mimic self-signed real certificates if the very last error during
# that real certificate validation was _not_ certSelfSigned:
sslproxy_cert_sign signSelf ssl::certSelfSigned
Alex Rousskov [Mon, 16 May 2022 12:53:26 +0000 (12:53 +0000)]
Fixed (and renamed) Ssl::ReadX509Certificate() API (#1048)
This pointer-returning non-throwing function was used to load an unknown
number of certificates from a file. A nil result normally indicated the
end of the certificate sequence, but it could also signal a certificate
loading failure. The caller could not distinguish the two nil-result
outcomes without
OpenSSL error stack analysis, which is difficult to do correctly, and
which none of the callers bothered to do, resulting in some certificate
configuration errors silently ignored by Squid.
The adjusted API uses exceptions to report errors. A nil result is still
used to report the end of the certificate sequence. To help callers that
must have at least one certificate, one function is now dedicated to
loading required certificate(s) while a companion function is added to
load optional ones.
To avoid exponential growth in the number of changed functions, callers
inside not-ready-to-throw code chains (that cannot be easily converted)
are wrapped with try/catch-return-false blocks, as sketched below.
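A wrapper sketch (loadRequiredCertificates() here is a stand-in name
for the new throwing loader; CurrentException is Squid's
exception-reporting manipulator):

    // an illustrative try/catch-return-false wrapper; the wrapped
    // loadRequiredCertificates() call is a stand-in name
    static bool
    loadCertificatesOrReport(Security::KeyData &keys)
    {
        try {
            loadRequiredCertificates(keys); // may throw on loading failures
            return true;
        } catch (...) {
            debugs(83, DBG_IMPORTANT, "ERROR: " << CurrentException);
            return false; // preserve the old boolean API for callers
        }
    }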
Also removed the need for Ssl::readCertFromMemory() by adding
Ssl::ReadOnlyBioTiedTo(). Functions reading things (from various input
sources) should be kept separately from functions creating input sources
(for various reading functions). Otherwise, we will end up creating a
lot more functions (at least readers*creators) with virtually no gain in
performance or convenience. OpenSSL BIO API exists specifically to
separate I/O mechanisms from I/O users. Use it to our advantage.
We now also clear old/stale OpenSSL error stack before performing the
certificate loading operation that may trigger one or more errors.
Also forget stale errno when forgetting OpenSSL errors. This reset
avoids misleading error reports based on stale errno values.
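The reset shape, roughly (the real Ssl::ForgetErrors() details may
differ):

    // an illustrative shape using <openssl/err.h> and <cerrno>
    void
    Ssl::ForgetErrors()
    {
        ERR_clear_error(); // drop stale OpenSSL error stack entries
        errno = 0;         // and forget any stale errno along with them
    }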
Also detail certificate loading errors, now that they are not ignored.
ReportSysError was added to avoid SysErrorDetail::NewIfAny() memory
allocations when reporting basic system call errors. Perhaps there is a
way to reuse SysErrorDetail here via (enhanced) Optional, but the
resulting complexity and the current SBuf-based reporting overheads do
not justify the reuse of "if errno then strerror(errno)" logic. We may
revisit this when Optional supports non-trivial classes.
Also moved most SysErrorDetail implementation details to the newly added
.cc file (to reduce user exposure), replacing strerror() with xstrerr().
ForgetErrors guts were moved from Security to Ssl because we cannot link
helpers with src/security (yet). We may have to refactor Security to
make such reuse possible in the future.
Preserve caller context across (and improve) deferred reads (#1025)
The transaction context was not saved/restored when dealing with
deferred reads initiated by events like the DelayPools::Update() event.
To fix this, we refactored MemObject::delayRead() and its descendants to
use an AsyncCall, which automatically stores/restores code context.
Using explicit async callbacks highlighted the danger of passing a
Connection object via CommRead, which does not maintain a closure
callback. There was also a related "stuck transaction" suspicion
documented in DeferredReadManager::kickARead(). Fortunately, all these
problems could now be solved by removing DeferredRead and CommRead
classes! The delayed readers already store the Connection object,
maintain closure callbacks, and have to check stored Connection validity
before reading anyway. The general/centralized delayed reading logic is
not really about reading and Connections (those parts are handled by
transaction-specific code) but about triggering reading attempts.
Asynchronous calls are perfect (and sufficient) for doing that.
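Conceptually, the replacement stores deferred readers as plain
AsyncCalls (a sketch with approximate names; the actual class and
method names may differ):

    // a sketch with approximate names: kicking a deferred read is now
    // just scheduling a stored call, which restores its creator's
    // CodeContext automatically
    void
    DelayedAsyncCalls::delay(const AsyncCall::Pointer &call)
    {
        deferredReads.add(*call);
    }

    void
    DelayedAsyncCalls::schedule()
    {
        while (const auto call = deferredReads.extract())
            ScheduleCallHere(call);
    }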
Also fixed Delay Pools for Gopher: delayAwareRead() was initiated only
once from gopherSendComplete() and the subsequent read calls were
delay-unaware (i.e. immediate) reads.
Also fixed a Delay Pools problem with active transactions: A transaction
started with Delay Pools on becomes stuck if a reconfiguration turns
Delay Pools off.
Also refactored the existing AsyncCall FIFO intrusive storage, making
its reuse possible (and marked one candidate with a TODO).
Ensure initClient MasterXactions have listening ports (#993)
MasterXaction creation code that forgets to supply a port for an
initClient transaction will now be rejected by the compiler.
This safety improvement hit two problems described below.
The TcpAcceptor class was incorrectly creating a MasterXaction object
for FTP DATA connections. We moved MasterXaction creation to TcpAcceptor
callbacks because only they know whether accepting a TCP connection
implies a new master transaction start (it does not for FTP DATA cases).
The Downloader class was implemented to spawn its own MasterXaction
using XactionInitiator info supplied by the download requestor. That
design is incompatible with the new static assertion because each
MasterXaction creator must now hard-code the XactionInitiator value for
the assertion to work at _compile_ time. All other creation contexts
naturally hard-code that value, as the sketch below illustrates.
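For illustration, the enforcement mechanism has roughly this shape
(factory and assertion details are approximations of the actual Squid
code):

    // a rough sketch: the initiator is a template argument, so the
    // port requirement can be checked at compile time
    template <XactionInitiator::Initiator anInitiator>
    static MasterXaction::Pointer
    MakePortful(const AnyP::PortCfgPointer &aPort)
    {
        static_assert(anInitiator == XactionInitiator::initClient,
                      "only initClient transactions take a listening port");
        return new MasterXaction(anInitiator, aPort);
    }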
We saw two ways to resolve this conflict:
a) Let the download requestor create the MasterXaction object for the
Downloader. The primary advantage of this (implemented) approach is
that it allows (future) client request processing code to satisfy
client requests using Downloader. Such request satisfaction requires
sharing the master transaction between the client transaction and the
downloader transaction, and this change enables such sharing. This
change moves the "Which master transaction does this download belong
to?" question from the Downloader into download requestors.
b) Move initClient-has-port enforcement (and squidPort) from
MasterXaction into XactionInitiator. The primary advantage of this
(not implemented) approach is that it places that enforcement inside
the type it is meant to police (essentially) -- XactionInitiator.
The two changes are complementary. We did not implement (b) because it
requires significant XactionInitiator redesign, moving _all_ originating
client information from MasterXaction to XactionInitiator (currently
squidPort and tcpClient).