git.ipfire.org Git - thirdparty/squid.git/log

Remove CPU profiler mechanism (#931)

The old CPU profiler has not been updated in many years. As a result,
the statistics provided are deceptively incomplete and not sufficient
for their intended purpose of profiling Squids CPU usage. External tools
such as oprofile do a better job despite their differences and some
limitations.

Bug 5090: Must(!request->pinnedConnection()) violation (#930)

The bug may be asymptomatic. Visible bug symptoms, if any, may include:

    FATAL: check failed: !request->pinnedConnection()
    exception location: FwdState.cc(1124) connectStart

    FATAL: check failed: transportWait
    exception location: FwdState.cc(675) noteDestinationsEnd

FwdState::usingDestination() should cover 3 post-transportWait periods:

1. peerWait: opening a CONNECT tunnel through the cache_peer
2. encryptionWait: TLS negotiations to secure the connection
3. Comm::IsConnOpen(serverConn): a Squid-peer transaction

The condition for the last period did not account for the time between
FwdState::unregister() and FwdState::complete() (or their equivalents),
when the transport connection is either closed or moved to the pconn
pool, but FwdState is still waiting for complete() and must not attempt
to connect to another destination. The bug is usually hidden because
complete() is usually called immediately after unregister(). However,
RESPMOD adaptation (at least) may delay complete(). If peer selection
news comes during that delay, usingDestination() lies, and various
Must()s may fail, depending on Squid version and HTTP request details.

Now, FwdState does not rely on the to-peer connection state for the
third period condition. Instead, we explicitly track whether the last
dispatch()ed activity has ended. This tracking is tricky because
complete() may be called without dispatching an asynchronous activity,
and because there are at least three ways for an activity to end:
explicit complete(), explicit handleUnregisteredServerEnd(), and
implicit serverClosed() connection closure callback. This tracking will
become easier and more reliable when FwdState no longer co-owns/stores
the to-peer connection. This change simplifies the future removal of
that connection ownership.

Also reordered usingDestination() conditions to match the chronological
order of the corresponding three periods.

Also reordered transportWait-vs-usingDestination() checks to match the
chronological order of those two forwarding stages.

Fix build on openbsd 7.0 (#929)

OpenBSD doesn't offer CPU_SET and CPU_ISSET. Implement their stubs as
inline functions to give the compiler proper hints about arguments (non)
use.

We have a const-correctness bug in std::unordered_map when supplying an
allocator that OpenBSD is strict about. Fix it.

Update buildtest.sh to try and use relative paths first. This prevents
autoconf complaining and failing if the directory path includes
characters from an unsafe set.

Bug 5060: Parallel builds are not reliable (#927)

Create tests directory before using it. Needed since commits 44e802f and
9ba9313.

    cp ../../src/tests/stub_debug.cc tests/stub_debug.cc
    cp ../../src/tests/stub_libmem.cc tests/stub_libmem.cc
    cp: cannot create regular file 'tests/stub_debug.cc':
        No such file or directory

BUG: Unexpected state while connecting to ... server, part 1 (#916)

These BUG messages (discussed and removed in a recent commit 2b6b1bc)
exposed several bugs. This change fixes a case where a BUG message was
correctly triggered by a Must() violation in Ssl::ServerBio::write():

check failed: buf[0] == 22
exception location: bio.cc(478)

The code expectations reflected in that Must() were wrong: Instead of
sending ClientHello, OpenSSL may also send a TLS Alert (Level: Fatal,
Description: Internal Error), at least. We believe that alert is sent
when SslBump configures OpenSSL to negotiate using unsupported ciphers
or something like that. This change relaxes ServerBio code expectations,
preventing the Must() violation.

The Must() violation was causing OpenSSL-related memory leaks. A more
comprehensive solution is needed to avoid similar leaks, but this small
fix helps in a specific (and a fairly common) case.

This is a Measurement Factory project.

Remove step2+ stare-and-splice and peek-and-bump support (#926)

Support for these features was successfully disabled (at build time) for
five years. See commit 88a300c for the list of reasons. We did not even
provide an ./configure --enable-... option for it. The corresponding
code does not compile with modern OpenSSL versions, but its mere
presence complicates related code logic and significantly increases
related development efforts. This code is not worth keeping.

This is a Measurement Factory project.

Fix HappyConnOpener::checkForNewConnection Must(prime) violation (#923)

This change addresses a known problem that triggered unwanted C++
exceptions every time Squid selected a to-server persistent connection
as the primary Happy Eyeballs destination/answer. The bug existed since
HappyConnOpener inception (commit 5562295). It did not seem to affect
the connection requestor directly because the HappyConnOpener job sends
the selected pconn to the requestor _before_ throwing.

Also adjusted Happy Eyeballs state documentation to reflect a successful
termination state. Before and after this change, we may enter that state
in the middle of checkForNewConnection(). The less we think about done()
as an _exceptional_ "only on error" or "only at the end of processing"
state, the fewer similar bugs we will create.

The code also improved after we abandoned the idea of documenting all
primary state changes in checkForNewConnection(). There are too many
nuances/changes to document everything anyway, and moving primary track
handling into a dedicated function significantly improves readability.

Docs: %adapt::sum_trs entries may well exceed %icap::tt (#914)

%icap::tt documentation incorrectly implied that the measurement
includes the entire ICAP transaction(s) lifetime. In reality, individual
ICAP transaction contribution stops with
Adaptation::Icap::ModXactLauncher::swanSong(), which is normally
triggered by Adaptation::Icap::Launcher::noteAdaptationAnswer(). Here,
the "answer" does not include the entire ICAP response, but just enough
information to form adapted HTTP message headers (echoed or received).
Thus, a large or slow ICAP response body may result in %adapt::sum_trs
values that far exceed the corresponding %icap::tt "total".

This change does not imply that %icap::tt should (not) work differently.

Also fixed a typo in %adapt::all_trs and polished %adapt::sum_trs docs.

Bug 5158: AnyP::Uri::host() mishandles [escaped] IPv6 addresses (#920)

When an address is expected to be a string in the form "Host:Port", the
Host MUST be wrapped in square brackets iff it is a numeric IPv6
address. The ":Port" part is usually optional. i.e. "[2001:db8::1]:443"
or "[2001:db8::1]".

There are 2 bugs relating to the handling of numeric IPv6 addresses:

* Bug 1: AnyP::Uri::host(const char *) is supposed to accept the host
  part of a URI, but just copied it into hostAddr_ instead of using the
  fromHost() method to set hostAddr_.  Since the argument is the host
  part of a URI, numeric IPv6 addresses are wrapped in square brackets,
  which would never be stripped and hostIsNumeric was therefore not set.
  We now use the fromHost() method to process the host name correctly.

* Bug 2: Conversely, AnyP::Uri::hostOrIp() is supposed to return the
  host name, suitable for use in a URI or Host header, which means that
  numeric IPv6 addresses need to be wrapped in square brackets.
  However, that wrapping was never done.  We now use the toHostStr()
  method to correctly format the string.

As far as I know, neither of these bugs actually break anything at the
moment, but they have the potential to break future developments, so
applying this fix should have no operational impact.

Also removed wrong IP::Address::toStr() description from .cc file that
(poorly) duplicated correct description of that method in the header.

Bug 5169: StoreMap.cc:517 "!s.reading()" assertion (#918)

unlockSharedAndSwitchToExclusive() was not properly failing when future
readers violated preconditions for obtaining an exclusive lock:

- `if (!readers)` means no old readers
+ `if (!readLevel)` means no old readers and nobody is becoming a reader

That missing "becoming a reader" condition covers a lockShared() caller
that had passed their writeLevel test before we incremented writeLevel
to lock other lockShared() callers out. That caller increments `readers`
while unlockSharedAndSwitchToExclusive() makes `writing` true.

Introduced in commit 1af789e due to poor code duplication. This change
also removes that code duplication by adding finalizeExclusive().

BUG: Unexpected state while connecting to ... server, part 2 (#917)

These BUG messages (discussed and removed in a recent commit 2b6b1bc)
exposed several bugs. This change fixes a case where a BUG message was
triggered by the following Must() violation:

    check failed: !csd->serverBump() ||
        csd->serverBump()->at(...tlsBump1, XactionStep::tlsBump2)
    exception location: PeekingPeerConnector.cc(173) initialize

The above Must() assumed that PeekingPeerConnector always changes the
SslBump step to step3. However, that assumption was wrong because
PeekingPeerConnector may run multiple times (and the step is recorded
outside the connector object). When FwdState reforwarded a failed
attempt, PeekingPeerConnector would find itself at step3.

Instead of fixing the Must(), we fixed a bigger bug: SslBump step3 must
start when we decide to communicate with the server, not in the middle
of that communication. This fix may affect some esoteric configurations
that use at_step ACLs outside ssl_bump _and_ rely on the wrong step
change timing, but, technically, such configurations are not officially
supported.

More step boundary fixes are needed. There is a (much bigger) ongoing
project dedicated to those changes.

This is a Measurement Factory project.

Fix reconfiguration leaking tls-cert=... memory (#911)

Refactored ReadX509Certificate() API for safe use in more contexts,
including in leaking Security::KeyData::loadX509ChainFromFile().

Abstract direct calls to the dangerous PEM_read_bio_X509() API.

Leaking (under certain conditions) since master/v5 commit 540e296.

Remove HPUX compiler handling in configure.ac (#890)

HPUX is not a testable nor a primary target platform for us

Properly track (and mark) truncated store entries (#909)

## Responses with truncated bodies

Squid used an error-prone approach to identifying truncated responses:
The response is treated as whole[^1] unless somebody remembers to mark
it as truncated. This dangerous default naturally resulted in bugs where
truncated responses are treated as complete under various conditions.

This change reverses that approach: Responses not explicitly marked as
whole are treated as truncated. This change affects all Squid-server
FwsState-dispatched communications: HTTP, FTP, Gopher, and WHOIS. It
also affects responses received from the adaptation services.

Squid still tries to deliver responses with truncated bodies to clients
in most cases (no changes are expected/intended in that area).

[^1]: A better word to describe a "whole" response would be "complete",
    but several key Squid APIs use "complete" to mean "no more content
    is coming", misleading developers into thinking that a "completed"
    response has all the expected bytes and may be cached/shared/etc.

## Related access-logging improvements

Transactions that failed due to origin server or peer timeout (a common
source of truncation) are now logged with a _TIMEOUT %Ss suffix and
ERR_READ_TIMEOUT/WITH_SRV %err_code/%err_detail.

Transactions prematurely canceled by Squid during client-Squid
communication (usually due to various timeouts) now have WITH_CLT
default %err_detail. This detail helps distinguish otherwise
similarly-logged problems that may happen when talking to the client or
to the origin server/peer.

## Other

FwdState now (indirectly) complete()s truncated entries _after_
releasing/adjusting them so that invokeHandlers() and others do not get
a dangerous glimpse at a seemingly OK entry before its release().

Remove HTTP reply header completion hack (#910)

Treat responses with truncated HTTP headers (i.e. no CRLF after all the
field-lines) as malformed, replacing them with an HTTP 502
ERR_INVALID_RESP error (same as any other HTTP response with malformed
headers would get).

Since Bug 2879 (commit da6c841 and earlier v2-only changes), Squid was
"fixing" such truncated headers by adding an extra CRLF sequence and
re-parsing them. Depending on the truncation circumstances, that old
workaround could result in rather bad outcomes for Squid and its
clients. Hopefully, we no longer need to work around such bugs. If we
do, we should do it in a safer manner and with admin permission.

Deduplicate transaction LogTags (#906)

...using ALE cache.code instead of its partial
ClientSideRequest::logType duplicate.

The need to de-duplicate these two LogTag members became apparent when
we tried to update/detail LogTags in forwarding code (e.g., to report
peer timeouts) and noticed that ALE cache.code value could be
overwritten by (stale) ClientSideRequest::logType information. Those
updates needed other significant changes, so we factored this small
stand-alone improvement out.

Fortunately, ClientSideRequest::logType lifetime does not exceed ALE's,
making ClientHttpRequest::logTags removal straightforward.

Also removed a paranoid TunnelStateData::al check after verifying that
the ALE pointer is never nil.

Prep for 4.17 and 5.2 (#905)

Improved const-correctness of ALE-driven APIs (#904)

Also made ClientHttpRequest::al constant. This work actually started
with changing this data member, to make some ALE-related hack safer.
That change required the API fixes. We have since got rid of the hack,
but all these const-correctness changes are valuable on their own.

No functionality changes.

Fixed a copy-paste typo in HttpHdrCc::hasMinFresh() (#901)

This bug could result in unexpected hits/misses because Squid used the
cached CC:max-stale value (or -1) instead of CC:min-fresh when
calculating entry freshness.

Broken since 810d879.

WCCP: Validate packets better (#899)

Update WCCP to support exception based error handling for
parsing and processing we are moving Squid to for protocol
handling.

Update the main WCCPv2 parsing checks to throw meaningful
exceptions when detected.

Fix X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY handling (#898)

Fix "mem_obj->inmem_lo == 0" assertion in StoreEntry::swapOut() (#896)

Squid may stop writing to disk well before receiving the entire miss
response (e.g., because of swapout errors). When that happens, Squid
continues to call swapOut() to mem-cache the incoming response body.
Avoid the inapplicable disk-writing part of swapOut() in those cases.

Fix MAX_PKT{4,6}_SZ to account for icmpEchoData padding (#887)

The bug was exposed by GCC v11 on Ubuntu Impish:

    Icmp4.cc:116:11: error: array subscript icmpEchoData[0] is partly
        outside array bounds of char[282] [-Werror=array-bounds]
        echo->opcode = (unsigned char) opcode;

The array the compiler is talking about is the pkt buffer. That buffer
size (i.e. MAX_PKT4_SZ) was calculated under the faulty assumption that
a compiler cannot add padding after icmphdr (when doing "icmp+1") and/or
between icmpEchoData data members. When compiler padded, the old
MAX_PKT4_SZ math stopped working.

Same for ICMPv6.

Support Gopher type "w" (#889)

Known uses and libwww imply that "w" selector is an absolute URI.

Also updated gopher item-type code list and codes documentation.

Cleanup HERE step 1 (#881)

Step 1:
Add script to replace HERE symbol in a way that does
not break branches during the remainder of the transition.

The script is added first as a separate commit for large
branches to merge and run before attempting to merge onto a
cleaned master branch.

The Enforcement commit is provided for review purposes to show
the changes made by the script. It should be removed before
merge.

Step 2:
Master branch has a regular automated Enforcement commit.

Step 3:
Then full removal of the HERE symbol as a third PR followup to
drop it completely.

Fix "make check" -Wsign-compare on arm32 broken by 1a51cf7 (#891)

MemBuf::contentSize() returns mb_size_t which is signed.

Bug 5055: FATAL FwdState::noteDestinationsEnd exception: opening (#877)

The bug was caused by commit 25b0ce4. Other known symptoms are:

    assertion failed: store.cc:1793: "isEmpty()"
    assertion failed: FwdState.cc:501: "serverConnection() == conn"
    assertion failed: FwdState.cc:1037: "!opening()"

This change has several overlapping parts. Unfortunately, merging
individual parts is both difficult and likely to cause crashes.

## Part 1: Bug 5055.

FwdState used to check serverConn to decide whether to open a connection
to forward the request. Since commit 25b0ce4, a nil serverConn pointer
no longer implies that a new connection should be opened: FwdState
helper jobs may already be working on preparing an existing open
connection (e.g., sending a CONNECT request or negotiating encryption).

Bad serverConn checks in both FwdState::noteDestination() and
FwdState::noteDestinationsEnd() methods led to extra connectStart()
calls creating two conflicting concurrent helper jobs.

To fix this, we replaced direct serverConn inspection with a
usingDestination() call which also checks whether we are waiting for a
helper job. Testing that fix exposed another set of bugs: The helper job
pointers or in-job connections were left stale/set after forwarding
failures. The changes described below addressed most of those problems.

## Part 2: Connection establishing helper jobs and their callbacks

A proper fix for Bug 5055 required answering a difficult question: When
should a dying job call its callbacks? We only found one answer which
required cooperation from the job creator and led to the following
rules:

* AsyncJob destructors must not call any callbacks.

* AsyncJob::swanSong() is responsible for async-calling any remaining
  (i.e. set, uncalled, and uncancelled) callbacks.

* AsyncJob::swanSong() is called (only) for started jobs.

* AsyncJob destructing sequence should validate that no callbacks remain
  uncalled for started jobs.

... where an AsyncJob x is considered "started" if AsyncJob::Start(x)
has returned without throwing.

A new JobWait class helps job creators follow these rules while keeping
track on in-progress helper jobs and killing no-longer-needed helpers.

Also fixed very similar bugs in tunnel.cc code.

## Part 3: ConnOpener fixes

1. Many ConnOpener users are written to keep a ConnectionPointer to the
   destination given to ConnOpener. This means that their connection
   magically opens when ConnOpener successfully connects, before
   ConnOpener has a chance to notify the user about the changes. Having
   multiple concurrent connection owners is always dangerous, and the
   user cannot even have a close handler registered for its now-open
   connection. When something happens to ConnOpener or its answer, the
   user job may be in trouble. Now, ConnOpener callers no longer pass
   Connection objects they own, cloning them as needed. That adjustment
   required adjustment 2:

2. Refactored ConnOpener users to stop assuming that the answer contains
   a pointer to their connection object. After adjustment 1 above, it
   does not. HappyConnOpener relied on that assumption quite a bit so we
   had to refactor to use two custom callback methods instead of one
   with a complicated if-statement distinguishing prime from spare
   attempts. This refactoring is an overall improvement because it
   simplifies the code. Other ConnOpener users just needed to remove a
   few no longer valid paranoid assertions/Musts.

3. ConnOpener users were forced to remember to close params.conn when
   processing negative answers. Some, naturally, forgot, triggering
   warnings about orphaned connections (e.g., Ident and TcpLogger).
   ConnOpener now closes its open connection before sending a negative
   answer.

4. ConnOpener would trigger orphan connection warnings if the job ended
   after opening the connection but without supplying the connection to
   the requestor (e.g., because the requestor has gone away). Now,
   ConnOpener explicitly closes its open connection if it has not been
   sent to the requestor.

Also fixed Comm::ConnOpener::cleanFd() debugging that was incorrectly
saying that the method closes the temporary descriptor.

Also fixed ConnOpener callback's syncWithComm(): The stale
CommConnectCbParams override was testing unused (i.e. always negative)
CommConnectCbParams::fd and was trying to cancel the callback that most
(possibly all) recipients rely on: ConnOpener users expect a negative
answer rather than no answer at all.

Also, after comparing the needs of two old/existing and a temporary
added ("clone everything") Connection cloning method callers, we decided
there is no need to maintain three different methods. All existing
callers should be fine with a single method because none of them suffers
from "extra" copying of members that others need. Right now, the new
cloneProfile() method copies everything except FD and a few
special-purpose members (with documented reasons for not copying).

Also added Comm::Connection::cloneDestinationDetails() debugging to
simplify tracking dependencies between half-baked Connection objects
carrying destination/flags/other metadata and open Connection objects
created by ConnOpener using that metadata (which are later delivered to
ConnOpener users and, in some cases, replace those half-baked
connections mentioned earlier. Long-term, we need to find a better way
to express these and other Connection states/stages than comments and
debugging messages.

## Part 4: Comm::Connection closure callbacks

We improved many closure callbacks to make sure (to the extent possible)
that Connection and other objects are in sync with Comm. There are lots
of small bugs, inconsistencies, and other problems in Connection closure
handlers. It is not clear whether any of those problems could result in
serious runtime errors or leaks. In theory, the rest of the code could
neutralize their negative side effects. However, even in that case, it
would only be a matter of time before the next bug bites us due to stale
Connection::fd and such. These changes themselves carry elevated risk,
but they get us closer to reliable code as far as Connection maintenance
is concerned. Without them, we will keep chasing deadly side effects of
poorly implemented closure callbacks.

Long-term, all these manual efforts to keep things in sync should become
unnecessary with the introduction of appropriate Connection ownership
APIs that automatically maintain the corresponding environments (TODO).

## Part 5: Other notable improvements in the adjusted code

Improved Security::PeerConnector::serverConn and
Http::Tunneler::connection management, especially when sending negative
answers. When sending a negative answer, we would set answer().conn to
an open connection, async-send that answer, and then hurry to close the
connection using our pointer to the shared Connection object. If
everything went according to the plan, the recipient would get a non-nil
but closed Connection object. Now, a negative answer simply means no
connection at all. Same for a tunneled answer.

Refactored ICAP connection-establishing code to to delay Connection
ownership until the ICAP connection is fully ready. This change
addresses primary Connection ownership concerns (as they apply to this
ICAP code) except orphaning of the temporary Connection object by helper
job start exceptions (now an explicit XXX). For example, the transaction
no longer shares a Connection object with ConnOpener and
IcapPeerConnector jobs.

Probably fixed a bug where PeerConnector::negotiate() assumed that
sslFinalized() does not return true after callBack(). It may (e.g., when
CertValidationHelper::Submit() throws). Same for
PeekingPeerConnector::checkForPeekAndSpliceMatched().

Fixed FwdState::advanceDestination() bug that did not save
ERR_GATEWAY_FAILURE details and "lost" the address of that failed
destination, making it unavailable to future retries (if any).

Polished PeerPoolMgr, Ident, and Gopher code to be able to fix similar
job callback and connection management issues.

Polished AsyncJob::Start() API. Start() used to return a job pointer,
but that was a bad idea:

* It implies that a failed Start() will return a nil pointer, and that
  the caller should check the result. Neither is true.

* It encourages callers to dereference the returned pointer to further
  adjust the job. That technically works (today) but violates the rules
  of communicating with an async job. The Start() method is the boundary
  after which the job is deemed asynchronous.

Also removed old "and wait for..." post-Start() comments because the
code itself became clear enough, and those comments were becoming
increasingly stale (because they duplicated the callback names above
them).

Fix Tunneler and PeerConnector handling of last-resort callback
requirements. Events like handleStopRequest() and callException() stop
the job but should not be reported as a BUG (e.g., it would be up to the
callException() to decide how to report the caught exception). There
might (or there will) be other, similar cases where the job is stopped
prematurely for some non-BUG reason beyond swanSong() knowledge. The
existence of non-bug cases does not mean there could be no bugs worth
reporting here, but until they can be identified more reliably than all
these benign/irrelevant cases, reporting no BUGs is a (much) lesser
evil.

TODO: Revise AsyncJob::doneAll(). Many of its overrides are written to
check for both positive (i.e. mission accomplished) and negative (i.e.
mission cancelled or cannot be accomplished) conditions, but the latter
is usually unnecessary, especially after we added handleStopRequest()
API to properly support external job cancellation events. Many doneAll()
overrides can probably be greatly simplified.

Reduce dependencies of testConfigParser (#888)

Move several globals used by the legacy parse logic into cache_cf.cc
and remove unnecessary objects linked to testConfigParser.

Cleanup dependencies for testEvent unit test (#886)

Many objects linked to the testEvent unit test were there for
legacy reasons and no longer needed.

Converting Event::dump() to Packable interface additionally
removes all dependency on the Store and StoreEntry API. Vastly
simplifying the complexity of the test setup and clarifying
(lack of) code coverage by this test.

Fix build on OpenBSD (#885)

On OpenBSD 6.9, SOL_SOCKET is defined in sys/socket.h.

ext_lm_group_acl: Improved username handling (#884)

Ensure that NTDomain is terminated even if UserName is long.
It happened to be terminated in the current code, but only by accident.

Bug 4922: Improve ftp://... filename extraction (#879)

Symptoms include: Unexpected ERR_FTP_NOT_FOUND/FTP_REPLY_CODE=550
outcome for GET ftp://... requests with URLs containing `;type=...`.

Broken since commit 51b5dcf.

Activate extra compiler checks (#667)

If the compiler supports it, add the `-Wextra` compiler flag to enable
extra checks. Do not enable extra checks with GCC v4 because that old
GCC version reports too many false positives.

Explicitly disable those `-Wextra` checks that offer few insights while
creating a lot of noise which cannot be easily avoided.

Fix warnings reported by GCC or clang with the new settings.

Rework syslog-related code in debug.cc, making it clearer and with less
dependencies on some syslog levels.

Support "file" syntax for 'squid_error' and 'has' ACL parameters (#874)

The 'squid_error' and 'has' are the only ACLs that do not support
loading their parameters from a file using the "path/filename" syntax.
We see no justification for this exception, and it stands in the way of
(unrelated) configuration improvements we are working on (that will,
among other things, prevent such accidents from happening again).

This change causes no upgrade problems because it does not change
handling of previously accepted configurations. It only expands the set
of acceptable configurations, better matching documentation and admin
expectations with regard to universal "path/filename" syntax support.

Fix TCP keepalive (#853)

Setting TCP keep-alive flags at accept(2) time resolves issues with
client sockets timing out while waiting for the ::Server handler to run.

Also resolves a bug with FTP DATA connections not having keep-alive set.
These connections would truncate objects if the data transfer connection
paused for too long and became timed out by the network routing system.

Fix doc/debug-messages.dox generation (#871)

cache_log_message ID 62 was missing from doc/debug-messages.dox because
the code generating that file did not recognize debugs() statements with
Critical() or Important() macro inside a conditional operator.

Also fixed a related sed regex: sed does not support non-greedy matches.

No runtime functionality changes.

Prep for 5.1 (#868)

documentation only.

Remove legacy context-based debugging in favor of CodeContext (#866)

Added in 1998, context-based debugging has been neglected and

- only covers two relatively small contexts
- unsafe in the presence of exceptions
- produces noise (e.g., "ctx: exit level 0" messages)
- delayed "ctx: exit" messages confuse admins
- uses deprecated urlXXX() API
- difficult to extend to more contexts without performance overheads
- usually provides less info in fewer contexts (than CodeContext)

Cleanup macros in src/defines.h (#860)

Reduce compile unit dependencies on src/defines.h by moving some
src/defines.h macros to their most-relevant header.

Also remove all src/defines.h macros known to be unused.

Allow sending "squid -k ..." signals to PID 1 (#863)

Inside a container, Squid usually runs with PID=1. Since "squid -k"
feature was added in 1996 (commit 7690e8e), Squid refused to signal a
PID=1 instance. We do not know why, and it could have been an accident.
We cannot find a good reason to keep prohibiting such signalling.

Reject different HTTP requests with unusual framing (#753)

... and remove support for request_entities.

Squid now follows the following (approximate) rules when checking HTTP
request framing. The first matching rule wins.

* HTTP requests with a Transfer-Encoding:chunked header, including GET
  and HEAD requests with that header, are accepted. No changes here.

* HTTP requests with unsupported Transfer-Encoding values are rejected
  (Squid replies with HTTP 501 "Not Implemented"). No changes here.

* HTTP requests having conflicting Content-Length values are rejected
  (Squid replies with HTTP 400 "Bad Request"). No changes here.

* HTTP/1.0 and HTTP/0.9 POST and PUT requests without a valid
  Content-Length header are now rejected (Squid replies with HTTP 411
  "Length Required"). All of these were allowed before.

* HTTP/1.0 GET and HEAD requests with a Content-Length:0 header are now
  rejected (Squid replies with HTTP 400 "Bad Request"). All of these
  were allowed before.

* HTTP/1.0 GET and HEAD requests with a positive Content-Length header
  are now rejected (Squid replies with HTTP 400 "Bad Request"). All of
  these were allowed before if and only if the request_entities
  directive was explicitly set to "on".

There are no other framing-related HTTP request restrictions. Prior to
these changes, HTTP/1.1 GET and HEAD requests with a positive
Content-Length header were rejected unless the request_entities
directive was explicitly set to "on". The following configuration sketch
keeps rejecting those requests:

    acl getOrHead method GET HEAD
    acl withContentLength req_header Content-Length .
    http_access deny getOrHead withContentLength

The new restrictions were added due to possibility of cache corruption
attacks and other security issues related to HTTP request framing.

The request_entities directive was removed to simplify decision logic.

Some developers believe that these changes should be accompanied by
configuration options that allow admins to bypass (most of) the
previously absent restrictions. However, these developers do not know of
any important use cases that these changes break, and such cases may not
even exist. The authors insist on these security-driven changes.

Refactor X-Forwarded-For stats (#834)

Use std::unordered_map instead of hash_table
Clean up unused code
Rename a few functions to better reflect name

Test plan:

$ squidclient mgr:forw_headers
HTTP/1.1 200 OK
Server: squid/6.0.0-VCS

       10 foo
       15 baz
        7 bar

Remove bogus "found KEY_PRIVATE" WARNINGs (#862)

... triggered by private bumped StoreEntry unlock()ed in ~ServerBump().

The WARNINGs were added long time ago (commit fc8b9fc) because, AFAICT,
earlier Store code expected StoreEntry owners to release() uncachable
entries, including KEY_PRIVATE ones, right before unlocking them.
However, there is no compile-time enforcement of that expectation, and
unlocking code does not always know whether the entry is cachable (as
ServerBump constructor/destructor RAII code illustrates).

This change stops tying release and unlocking decisions/actions together
but makes sure that idle KEY_PRIVATE entries are still released (because
we do not want to index unneeded/unusable Store entries).

Fix SslBump reconfiguration leaking public key memory (#861)

X509_get_pubkey() increments key reference count.

Probably leaking since commit 2a268a0.

Fix ACL-related reconfiguration memory leak (#859)

Leaking since commit 4eac340 that migrated from statically-allocated
ACLStrategy<...>::Instance_ objects to dynamically allocated ones.

Most ACLStrategy objects have no data members, probably leaking just one
virtual table pointer (per named ACL), but strategies that support ACL
(data) --options, like ACLDestinationDomainStrategy and
ACLServerNameStrategy, leak memory used for options storage.

Bug 4696: Fix leaky String move assignment operator (#858)

The original attempt at fixing String move assignment operator (i.e.
commit 20a04c1) leaked the assigned-to String object memory.

These leaks are measurable even in --disable-optimizations builds.

Fix build on RISC-V (#856)

Compiling on RISC-V (without an explicit -latomic) fails with

/usr/riscv64-linux-gnu/include/c++/10/ostream:611:
undefined reference to __atomic_compare_exchange_1

Use std::atomic<uint8_t>::exchange() to detect whether -latomic
implements 1-byte compare-and-exchange API used by Squid.

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>

Fix build on Ubuntu 21 (#855)

The `-Wunused-result` warning for setuid(0) call was observed on Ubuntu
21.04 (when cross compiling Squid for Ubuntu 21.10).

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>

Fix socket accounting for TCP accept() (#854)

The incoming_sockets_accepted counter is used by Comm I/O modules to
count the number of successfully accept()ed FDs during a single
select(2) (or equivalent) scan of ready descriptors. For TCP
connections, the official code incremented that counter in the async
TcpAcceptor callbacks. Those callbacks run outside the counting Comm I/O
context, rendering counter increments useless and resulting in
under-reporting of actual acceptance activity.

Also, some successfully accepted TCP connections fire no callbacks, so
it is conceptually wrong to count accepted sockets in callbacks,
regardless of the timing problems.

Relese Prep for 4.16 and 5.0.7 (#850)

Documentation only

SSL_SESSION::master_key cannot be falsy (#848)

SSL_SESSION::master_key data member is an array. It cannot be falsy.

This fixes clang -Wpointer-bool-conversion.

Source Format Enforcement (#845)

Fix ClpMap build error on arm32 (#844)

static_assert ... "Sum() return type can fit its (unsigned) result"
in ClpMap member instantiation inside sslCrtvdHandleReplyWrapper()

Honor Sum<T>() caller's explicit request to use a specific accumulation
type T instead of guessing the right type for inner sum iterations
(based on the second, third, etc. arguments) and hitting static
assertions when those guesses were wrong because some of those arguments
used types smaller than T.

This fix also allows Sum() callers to avoid explicit T when the first
argument is already the largest one. Callers should not be forced to be
explicit at all, but computing the largest type is a complex known TODO
outside this fix scope.

Fix sslcrtd and external_acl helper name lifetimes (#843)

helperShutdown: <binary garbage> #Hlpr523 shutting down.
helperShutdown: See example.net/legal-terms #Hlpr4456 shutting down.

Busy "class helper" objects often outlive configuration that was used to
create them. Ideally, the supplied helper category name should be copied
and maintained by the helper object itself, but that long-term solution
requires a lot more work (TODO) due to out-of-scope bugs.

Reconfiguration orphans stateless helper connection (#842)

The helper.cc code (ab)uses two different Connection objects with the
same FD. Properly fixing that problem requires significant work, but the
official code almost gets away with it, in part, because helper.cc, with
one exception, uses heavily customized code to "safely" close its
connections (instead of just calling conn->close() as many Squid jobs
still do). This fix replaces that single exceptional case with a
closeWritePipeSafely() call which prevents orphaning readPipe and
triggering BUG 3329 warnings. It also restores the symmetry between the
corresponding (previously buggy) stateless and (OK) statefull code.

A code merge mistake in commit b038892 created this bug AFAICT.

cache_log_message directive (#775)

This directive controls cache.log printing of important messages.

Rationale: In any given deployment environment, many of the cache.log
messages that Squid logs by default are noise. Some of them should
probably never be logged by default, but some are useful or even
critical in certain other environments. No single approach to logging is
going to satisfy all needs. The existing debug_options directive
operates on large message groups and is not suitable for controlling
individual critical or important messages.

The current implementation uses hard-coded manually-assigned opaque
message IDs. We have also experimented with an automated generation of
message IDs, including meaningful IDs and compiler-generated IDs created
by various advanced C++ preprocessing and template tricks. All of the
automation considered (and some partially implemented) were rejected
for being overall worse than the current manual approach.

TODO: The usefulness of ID ranges will diminish with time. We could
support named (hard-coded) sets of message IDs (e.g., "startup",
"shutdown", and "configuration"). However, sets make it easier to
accidentally suppress an important new message during an upgrade.

Maintenance: Update astyle version to 3.1 (#841)

Version 2.04 is quite outdated now, and there are only minor
formatting differences with v3.1. All changes look to be good syntax.

Also, pass the astyle executable with command line option to
formater.pl. Avoiding environment variables which do not work on servers
with sanitized sub-shell environments such as our main server.

Remove custom global new/delete operators (#839)

These custom operators did not cover all new/delete cases (e.g., array
allocations), were not declared according to C++ standards (triggering
compiler warnings), and were not enabled in clang builds.

These customizations enabled custom OOM handling (for covered cases),
but it is not clear whether that feature is desirable overall, and C++
has better ways to implement such handling (i.e. set_new_handler()).

These customizations participated in collection of optional statistics
(--enable-xmalloc-statistics), but it is not clear whether that feature
implementation is good enough, and, even if it is, providing these
partial stats does not outweigh recurring customization problems.

Remove unused code silencing intercept errors (#836)

The removed code has not been actively used almost since it was added.

It is now widely accepted that NAT and TPROXY can only be done on the
machine running Squid. The corresponding address lookup errors are an
indication of either a system misconfiguration or an adverse external
event such as flushing of conntrack tables. Since these errors should be
fatal to the affected transactions and the admin usually has the power
to address them, Squid should report them at level 1.

Avoid "BUG #3329: Lost orphan ..." during accept problems (#780)

Comm::TcpAcceptor creates a Comm::Connection with an open FD. Lots of
things could go wrong while that connection object travels to its
intended owner (e.g., Ftp::Server). A Connection object abandoned during
that travel will auto-close, triggering level-4 "BUG #3329" messages.

TODO: The manual enter/leaveOrphanage() tracking is unreliable. We need
to design and implement a true connection ownership concept that does
not require such tracking. These changes highlight handover spots.

Also made a few TcpAcceptor members protected to reduce the chance of
missing a Connection recipient (i.e. TcpAcceptor subscriber). Most of
these members should be (at least) protected for other reasons as well.

Refactor Via stats to replace hash_table with std::unordered_map (#829)

Remove a lot of unwanted ifdef'd out code (#826)

We examined most `#if...` clauses and removed those we decided are
clearly unwanted now. Most of the removed code snippets were unused for
many years or represented stale ideas. We tried to check (at least
superficially) all clauses, but most likely missed some good candidates.

Add tls_key_log to report TLS communication secrets (#759)

The new log is meant for triage using traffic inspection tools like
Wireshark. Also known as SSLKEYLOGFILE, it is a common feature across
TLS-capable applications. By default, the new feature is disabled and
has negligible performance impact.

We have rejected the idea of adding an access_log %code instead, as
detailed in squid-dev thread titled "RFC: tls_key_log: report TLS
pre-master secrets, other key material". Such %code can still be added.
http://lists.squid-cache.org/pipermail/squid-dev/2020-July/009605.html

----

Generalized StoreClient::fillChecklist() API into Acl::ChecklistFiller

Code that establishes TLS connections outsources logging of secrets to
Security::KeyLogger. Security::KeyLogger::shouldLog() must consult
tls_key_log ACLs before logging the secrets but it does not know much
about the code/context that delegated this logging to KeyLogger. That
code could fill the checklist in advance, but that is an inferior
solution because it

* prevents outsourcing a rather mundane responsibility to KeyLogger
* creates a potentially stale Checklist, not reflecting TLS progress
* wastes resources on filling a checklist that may not be needed

While adding ConnStateData::fillChecklist() and looking through
ConnStateData code for examples on how to fill a checklist, it became
clear that there is a lot of code duplication in this area, including
some subtle problems with various checklist.foo assignments possibly
missing in some places (it is difficult to know for sure because some
checklist methods may extract the same information from _multiple_
sources). For example,

* on_unsupported_protocol and ssl_bump ignored acl_uses_indirect_client
* ident_lookup_access and delay_pool_access lacked port info and
other ConnStateData details beyond src/dst addresses

Now, most of that duplicated code is replaced with unified
fillChecklist() calls. This unification is a significant improvement
that provides additional justification for the fillChecklist() design.

Also renamed ACLFilledChecklist::conn() to setConn(). Renaming is not
necessary for the compiler but will help detect no longer necessary
conn() setting calls when this code is ported to other Squid versions.

Bug 4528: ICAP transactions quit on async DNS lookups (#795)

The bug directly affected some ICAP OPTIONS transactions and indirectly
affected some ICAP REQMOD/RESPMOD transactions:

* OPTIONS: When a transaction needed to look up an IP address of the
  ICAP service, and that address was not cached by Squid, it ended
  prematurely because Adaptation::Icap::Xaction::doneAll() was unaware
  of ipcache_nbgethostbyname()'s async nature. This bug is fixed now.

* REQMOD/RESPMOD: Adaptation::Icap::ModXact masked the _direct_ effects
  of the bug: ModXact::startWriting() sets state.writing before calling
  openConnection() which schedules the DNS lookup. That "I am still
  writing" state makes ModXact::doneAll() false while a REQMOD or
  RESPMOD transaction waits for the DNS lookup.

  However, REQMOD and RESPMOD transactions that require an OPTIONS
  transaction (because the service options have never been fetched
  before or have expired) could still fail because the OPTIONS
  transaction they trigger could fail as described in the first bullet.
  For example, the first few REQMOD and RESPMOD transactions for a given
  service -- all those started before the DNS lookup completes and Squid
  caches its result -- could fail this way. With the OPTIONS now fixed,
  these REQMOD and RESPMOD transactions should work correctly.

Broken since inception (commit fb505fa).

Make CustomLog more reusable (#824)

The future tls_key_log directive supports most access_log features but
should not inherit deprecated configuration styles, multi-directive
support complications, and legacy top-level configuration parsing code.
The shared/reusable bits were extracted from CustomLog and cache_cf.cc
functions into the new FormattedLog class. Some of them were polished.

All existing access_log/etc. directives continue to use CustomLog, so
nothing in the code justifies FormattedLog existence right now.

Fixed at least these old bugs:

* access_log config dumps were missing the destination parameter
* "access_log none" config dump included a bogus "logformat=none" option
* "access_log buffer-size=..." config dump were missing size units
* built-in logformat re-definition check missed the "icap_squid" format
* "access_log rotate=N" accepted (but then ignored) negative N values
* "access_log rotate=N" was stored as short but dumped as full integer

Admin-visible changes:

* access_log key=value strings are now case sensitive. This change
  improves consistency with squid.conf directives that are all case
  sensitive and most other directive options. The logformat name was
  already case sensitive. I see no good reason to keep this undocumented
  exception.

Bug 4832: '!schemeAccess' assertion on exit (#819)

* Add test to detect the bug and prevent regressions

* delete the access list if not cleared by the time
  Auth::Config instance is destructed. This is what the
  free_AuthSchemes() code does when the function call
  nesting and debugs are pruned away.

Bug 5129 pt1: remove Lock use from HttpRequestMethod (#825)

Removes the need for a custom assignment operator with a questionable
implementation, addressing compiler and static analysis warnings.

Simplified Http::Message::parse() (#814)

size_t hdr_len cannot be negative, and sanityCheckStartLine() already
rejects zero values. We now use (and clearly document) the latter fact.

Also replaced a level-1 warning about "Too large" headers with a level-3
debugging message because huge headers is not a Squid problem, and the
problem should already be visible in access.logs records.

Fix --with-valgrind-debug build broken by commit 02f5357 (#822)

error: cbdata_htable was not declared in this scope

Bug 5128: Translation: Fix % i typo in es/ERR_FORWARDING_DENIED (#821)

| ERROR: .../es/ERR_FORWARDING_DENIED: Unsupported error page %code
near % i es un...

Typo added in bbeb83f.

API to separate Config.x users/parsers from Config.y details (#811)

The new API avoids several kinds of unwanted code dependencies:

* Config.x users do not need to know the type of any Config.y component.
  Forward declarations of Config.y types are sufficient.

* Config.x parsers do not need to know about Config.y parsers. Config.x
  parsers can also be hidden from Config.x users.

* The centralized parsing code does not need to contain Config.x parsing
  code (for any x) and does not maintain "generic" parser registration.
  The compiler/linker do that for us _without_ losing C++ type safety.

Correct API implementation may also help separate the active Config
object from the being-parsed configuration, to eventually support safe
"hot" reconfiguration.

This change does not convert any existing Config fields and does not
imply a policy that existing fields must be converted when touched.

The API was tested on a typical new Config.x component (to be merged
separately). It is likely to evolve, especially when support for
multi-directive components is added.

Activated by using TYPE names with two colons (e.g., TYPE: Security::X).

Source Format Enforcement (#763)

Prep for 4.15 and 5.0.6 (#798)

Documentation only.

Fix GCC -Werror=range-loop-construct (#808)

This warning detects unnecessary object copying in C++ range loops, with
a focus on large objects and copies triggered by implicit type
conversions. Included in -Wall.

error: loop variable ... creates a copy from type ...

Fixed typo in SPONSORS.list (#812)

Fixed spellings of Digital in DigitalOcean.

Replace cbdata::Offset hack with offsetof() (#809)

Also remove unused OFFSET_OF macro.

Stop processing a response if the Store entry is gone (#806)

HttpStateData::processReply() is usually called synchronously, after
checking the Store entry status, but there are other call chains.

StoreEntry::isAccepting() adds STORE_PENDING check to the ENTRY_ABORTED
check. An accepting entry is required for writing into Store. In theory,
an entry may stop accepting new writes (without being aborted) if
FwdState or another entry co-owner writes an error response due to a
timeout or some other problem that happens while we are waiting for an
I/O callback or some such.

N.B. HTTP and FTP code cannot use StoreEntry::isAccepting() directly
because their network readers may not be the ones writing into Store --
the content may go through the adaptation layer first and that layer
might complete the store entry before the entire peer response is
received. For example, imagine an adaptation service that wants to log
the whole response containing a virus but also replaces that (large)
response with a small error reply.

Bug 5106: Broken cache manager URL parsing (#788)

Use already parsed request-target URL in cache manager and
update CacheManager to Tokanizer based URL parse

Removing use of sscan() and regex string processing which have
proven to be problematic on many levels. Most particularly with
regards to tolerance of normally harmless garbage syntax in URLs
received.

Support for generic URI schemes is added possibly resolving some
issues reported with ftp:// URL and manager access via ftp_port
sockets.

Truly generic support for /squid-internal-mgr/ path prefix is
added, fixing some user confusion about its use on cache_object:
scheme URLs.

TODO: support for single-name parameters and URL #fragments
are left to future updates. As is refactoring the QueryParams
data storage to avoid SBuf data copying.

Fix a few level-2+ debugs(); improve ErrorState debugging (#804)

When possible, report object contents rather than its memory address.

%ssl::<negotiated_version, %ssl::>negotiated_version for TLS/1.3 (#803)

Both %codes were logged as dashes when Squid negotiated TLS v1.3.

This is a Measurement Factory project.

Fix various Clang-12.0 compiler warnings (#800)

Fixes -Wrange-loop-construct, -Wmismatched-tags, and -Wshadow.

Fix GCC 10.2.0 build on Ubuntu Hirsute s390x (#796)

GCC reports
error: free-nonheap-object: snmp_core.cc(950): snmpCreateOidFromStr

Removed redundant Http::StatusLine::protocol (#794)

Also polished Http::StatusLine initialization, but that part needs a
lot more work to get rid of Http::StatusLine::init().

Handle more partial responses (#791)

Handle more Range requests (#790)

Also removed some effectively unused code.

Maintenance: Start following Inclusive Naming Initiative advice (#786)

... as detailed at https://inclusivenaming.org/

This change is limited to the words "whitelist" and "blacklist". Those
two words are easier to replace with, arguably, better ones.

No functionality changes.

Bug 5112: Excessively loud chunked reply parsing error reporting (#789)

Traffic parsing errors should be reported at level 2 (or below) because
Squid admins can usually do nothing about them and a noisy cache.log
hides important problems that they can and should do something about.

TODO: Detail this and similar parsing errors for %err_detail logging.

Also removed an unnecessary used-once macro.

Merge pull request from GHSA-jjq6-mh2h-g39h

Replace defective Must2(c, "not c") API (#785)

... with Must3(c, "c", Here())

* We should not make a Must*() API different from the standard C++
  static_assert() API. It is just too confusing, especially since the
  two calls are very closely related.

* We should not tell Must*() writers to specify one condition (i.e. what
  must happen) but then describe the opposite condition (i.e. what went
  wrong). When the text describes the opposite condition, it is easy for
  a human writing or thinking about the text to type the wrong
  condition: Must2(got < need, "incomplete message") looks correct!

We should not keep the same macro name when changing the meaning of the
second parameter. Fortunately, adding a third argument to the macro fits
nicely into how modern Squid code should pass the source code location.

Must3() does not support SBuf descriptions. I tried to support them, but
doing so without duplicating non-trivial code is too difficult, and
since the current code does not actually use them, wasteful.

If future (complex) code needs SBuf conditions, then it should probably
use an explicit throw statement instead, especially since Must*() is,
like assert(), supposed to quickly check source code sanity rather than
validate unsafe input. A compile-time condition description and the
source code location is usually enough for sanity checks, while proper
reporting of invalid input usually also requires parsing context info.

Delete a few unnecessary checks for standard C++ features (#779)

* nullptr ought to be provided by any C++ compiler we support
* HAVE_NULLPTR_T was unused while std::nullptr_t is used w/o it
* HAVE_UNIQUE_PTR was almost unused while std::unique_ptr is used w/o it

Fix HttpHeaderStats definition to include hoErrorDetail (#787)

... when Squid is built --with-openssl.

We were "lucky" that the memory area after HttpHeaderStats was not,
apparently, used for anything important enough when HttpHeader::parse(),
indirectly called from errorInitialize() during initial Squid
configuration, was writing to it.

Detected by using AddressSanitizer.

The bug was created in commit 02259ff and cemented by commit 2673511.

Maintenance: Make boilerplate copyright years update optional (#782)

Running with `--check-and-update-copyright no` allows developers to
format sources and/or perform other essential source maintenance actions
without changing the copyright year in 2000+ files.

No change to default ./scripts/source-maintenance.sh operation.

Also improved/updated scripts/source-maintenance.sh usage reporting and
added support for --help/-h options.

Maintenance: Fix two misspellings (#784)

These typos were caught by scripts/spell-check.sh but then merged anyway
due to a `git diff` status code misinterpretation bug in the CI scripts.

Fixed a couple of minor typos (#783)

Bug 5104: Memory leak in RFC 2169 response parsing (#778)

A temporary parsing buffer was not being released when
parsing completed.

Handling missing issuer certificates for TLSv1.3 (#766)

Prior to TLS v1.3 Squid could detect and fetch missing intermediate
server certificates by parsing TLS ServerHello. TLS v1.3 encrypts the
relevant part of the handshake, making such "prefetch" impossible.

Instead of looking for certificates in TLS ServerHello, Squid now waits
for the OpenSSL built-in certificate validation to fail with an
X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY error. Squid-supplied
verify_callback function tells OpenSSL to ignore that error. Squid
SSL_connect()-calling code detects that the error was ignored and, if
possible, fetches the missing certificates and orchestrates certificate
chain validation outside the SSL_connect() sequence. If that validation
is successful, Squid continues with SSL_connect(). See comments inside
Security::PeerConnector::negotiate() for low-level details.

In some cases, OpenSSL is able to complete SSL_connect() with an ignored
X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY error. If Squid validation
fails afterwards, the TLS connection is closed (before any payload is
exchanged). Ideally, the negotiation with an untrusted server should not
complete, but complexity BIO changes required to prevent such premature
completion is probably not worth reaching that ideal, especially since
all of this is just a workaround.

The long-term solution is adding SSL_ERROR_WANT_RETRY_VERIFY to OpenSSL,
giving an application a chance to download the missing certificates
during SSL_connect() negotiations. We assist OpenSSL team with that
change, but it will not be available at least until OpenSSL v3.0.

This description and changes are not specific to SslBump code paths.

This is a Measurement Factory project.

Bug 3556: "FD ... is not an open socket" for accept() problems (#777)

Many things could go wrong after Squid successfully accept(2)ed a socket
and before that socket was registered with Comm. During that window, the
socket is stored in a refcounted Connection object. When that object was
auto-destroyed on the error handling path, its attempt to auto-close the
socket would trigger level-1 BUG 3556 errors because the socket was not
yet opened from Comm point of view. This change eliminates that "already
in Connection but not yet in Comm" window.

The fixed BUG 3556 errors stalled affected clients and leaked their FDs.

TODO: Keeping that window closed should not require a human effort, but
achieving that goal probably requires significant changes. We are
investigating.

Fix build on Fedora Rawhide (#772)

* add SYSTEMD_LIBS to all binaries using client_side.cc, fixing linking
* add `<limits>` to all sources using std::numeric_limits, fixing gcc-11
builds

Protect SMP kids from unsolicited IPC answers (#771)

When an SMP kid restarts, it recreates its IPC socket to flush any old
messages, but it might still receive IPC messages intended for its
previous OS process because the sender may write the message _after_
kid's IPC socket is flushed. Squid IPC communication is connectionless,
so the sender cannot easily detect the reopening of the recipient socket
to prevent this race condition. Some notifications are desirable across
kid restarts, so properly switching to connection-oriented IPC
communication would only complicate things further.

This change protects kids from, for the lack of a better term, an
"unsolicited" answer: An answer to a question the recipient did not ask.
When allowed to reach regular message-handling code, unsolicited answers
result in misleading diagnostics, may trigger assertions, and might even
result in a bad recipient state. For example, a kid might think it has
been successfully registered with Coordinator while its registration
attempt was actually dropped due to a Coordinator restart.

Our protection targets one specific use case: Responses that were born
(or became) "unsolicited" due to a recipient restart. Other problematic
cases may not require any protection, may require a very different
protection mechanism (e.g. cryptography), may deal with requests rather
than responses, or even cannot be reliably detected. For example:

* messages sent by a malicious attacker
* requests sent by a misconfigured Squid instance
* requests sent by a previous Squid instance
* messages sent by a no-longer-running kid process
* messages sent by buggy Squid code

----

Also marked a few out-of-scope problems/improvements, including a bug:

Improved handling of Coordinator and Strand exceptions exposed and
partially addressed an old problem: When configured to listen on an
inaccessible port, Squid used to kill the Coordinator job, resulting in
subsequent kid registration timeouts. Squid now correctly keeps the
Coordinator job running, logging a detailed report:

    WARNING: Ignoring problematic IPC message
        message type: 5
        problem: check failed: fd >= 0
        exception location: TypedMsgHdr.cc(198) putFd

Still, the affected kids do not report the port opening problem and
continue running, listening on the problem-free ports (if any).
Depending on the exception timing, similar behavior was possible before
these changes. The correct action here is to send the port opening error
to the kid process without throwing the putFd() exception, but we should
decide how Squid should handle such inaccessible configured ports first.

Detail certificate validation errors during TLS handshake (#770)

Fix certificate validation error handling in Security::Connect/Accept().
The existing validation details were not propagated/copied to IoResult,
requiring the caller to extract them via ssl_ex_index_ssl_error_detail.
The clunky approach even required a special "ErrorDetail generations"
API to figure out which error detail is "primary": the one received in
IoResult or the just extracted one. That API is removed now.

This change is used by the upcoming improvements that fetch missing TLS
v1.3 server certificates, but it also has an immediate positive effect
on the existing reporting of the client certificate validation errors.
Currently, only a general TLS error is reported for those cases because
Security::Accept() code forgot to check ssl_ex_index_ssl_error_detail.

This is a Measurement Factory project.