git.ipfire.org Git - thirdparty/squid.git/log

Bug 5186: noteDestinationsEnd check failed: transportWait (#985)

When the "no more destinations to try" notification comes after the last
forwarding/tunneling attempt has failed, the !destinationsFound block
does not run (because destinations were found), the usingDestination()
block does not run (because we are done with that last/failed
destination), but transportWait is false (for the same reason).

Also applied Bug 5090 (master/v6 commit 15bde30) FwdState protections to
tunnel.cc code. Tunnels may continue writing to the client while the
to-server connection is closing or closed, so TunnelStateData can be
potentially exposed to bug 5090 "no server connection but still
transporting" concerns. TunnelStateData never _retries_ successfully
established tunnels and, hence, can (and now does) stop receiving spare
destinations after committedToServer becomes true, but future tunnels
may start reforwarding in many cases, and most of the code does not need
to know about this (temporary?) simplification.

Also re-unified and polished related FwdState and TunnelStateData code,
including fixing lying source code comments and debug messages.

Improve debugs() handling in helpers (#1011)

This change also reduces libdebug dependency on globals.cc, improving
libdebug reusability.

Also do not default-reset debug sections after they were explicitly set:

* In most sbin/squid contexts, DebugModule constructor is called before
  Debug::parseOptions(). That call order results in the Levels array
  being reset to default values before it is reset to configured values.
  No problem.

* In sbin/squid -X context, DebugModule constructor is called after
  Debug::parseOptions(), but the override_X flag protected the Levels
  array from being reset to default values in this case. No problem.

* In helper contexts, DebugModule constructor may be called after
  Debug::parseOptions(), and the override_X flag stays false. This order
  results in the parseOptions() effects erased by ResetSections() called
  from the constructor.

This bug was detected while trying to understand why pinger's hard-coded
(and wrong) ALL,10 default has no effect on pinger's debugging. The two
bugs cancelled each other.

The following changes affect pinger (now) and other helpers that will
eventually use libdebug and its new NameThisHelper() API:

* Label helper debugs() lines with the helper name (e.g., "pinger"),
  similar to how we already label SMP debugs() lines with "kidN". This
  change improves cache.log readability, distinguishes output from
  different helpers, and distinguishes helper output from sbin/squid
  output in non-SMP logs.

* Make sure level-1 debugs() messages are logged.

* Stop pointless accumulation of cache.log and syslog channel messages.

* Automatically honor SQUID_DEBUG environment variable set by the parent
  Squid process. The pinger helper was already honoring it.

The following changes are specific to pinger:

* Removed the now-duplicated references to "pinger" in pinger debugs().

* Do not default pinger debugging levels to ALL,10. The default pinger
  debugging levels should be the same as the default Squid debugging
  levels (i.e. "ALL,1"). Bugs in the debugging module prevented the
  hard-coded "ALL,10" (or any other elevated setting) from having an
  effect, but we now fixed the last of those bugs. AFAICT, the change of
  default from ALL,1 to ALL,10 in commit cc192b5 was accidental.

Prep for 5.5 (#1014)

Remove failing-to-build and not-built-by-default ufsdump (#1013)

The program was not built by default since 2010 (see commit 2d94e2d).
Its build has been failing since before 2017 (see commit 4c2f8b7).

Judging by commit log, other ufsdump problems were discovered and often
left unaddressed throughout the years. Addressing some of them probably
requires serious work. The program source code and its dependencies have
been neglected for a long time and are a distraction. If we decide the
Project should have this tool, it should be rewritten from scratch.

The program does not support newer STORE_META_ TLVs and rock cache_dirs.

There are no signs that the program is in demand. No official bug
reports mention ufsdump since 2013. I could only find irrelevant
squid-dev references and no squid-users references at all since 2012.

comm/libminimal.la to facilitate helper use of convenience libs (#1009)

Helpers may not (want to) use fd_open() and fd_close() directly, but
they (want to) use libdebug which does use fd_open() and fd_close().

Currently, the Comm "importing" fd_open() and Comm "delisting"
fd_close() APIs are not implemented inside src/comm/ but they do belong
to the Comm module and use its internals. Moving fd.h to src/comm/ will
require a lot of noisy out-of-scope changes deserving a dedicated PR.

Despite their names, the minimally-implemented functions do not open(2)
and close(2) file descriptors in their full implementations either: That
full implementation just updates fd-associated Squid-specific metadata
that Squid helpers do not need/use.

Do not use stub_fd.cc in the deployed/non-test pinger program.

Avoid build-breaking copying of stub_fd.cc source file in test-suite/.

Maintenance: Removed a few known unused globals from src/tests (#1008)

Do not ApplyTcpKeepAlive() to PortCfg-unrelated traffic (#1006)

Honor assertions during shutdown (#1007)

From code correctness/guarantees point of view, it is much better to
assert than to exhibit undefined behavior, especially since the
asserting code is usually not shutdown-specific and the shutdown state
often lasts through hundreds of transactions.

We are not aware of any frequent assertions during shutdown, and we want
to fix the ones that do exist (instead of not knowing about them). Thus,
this change is unlikely to introduce a lot of problems and might trigger
other positive changes.

Bypassing assertion failures does not guarantee the code will keep
running: In many (most?) cases, the asserting code will still crash or
seriously misbehave. In those cases, this change clearly improves Squid.

We ignored assertion failures during shutdown since Squid started
customizing assert() in commit 54f742e. Back in 1998, Squid was known to
often crash while shutting down, the crashes were often "benign" (the
code was just mishandling disappearing modules), and Squid could not
always start after a crash, complicating startup scripts. With a focus
on adding new features, we probably felt it is best to ignore these
usually minor but often annoying failures.

This change reduces libdebug dependency on globals.cc, addressing commit
6249367 TODO, and improving libdebug reusability.

Logformat %lp expands to "-" in wildcard listening port configs (#997)

FindListeningPortAddress() and its helpers look for "local" address of
an accepted connection. When FindListeningPortAddress() is called in %la
context, we must skip Ip::Address objects with INADDR_ANY IP addresses
because we are looking for a specific IP address, not a wildcard.
However, when called in %lp context, skipping those Ip::Address objects
may, in some cases, result in skipping the only object that actually
contains the port information, resulting in %lp expanding to "-".

Similarly, zero-port Ip::Address objects (with non-any IPs) could be, in
theory, returned instead of continuing the search for an object with a
non-zero port number, although this case was not observed in tests.

Now we configure the address searching helpers with a "good Ip::Address
object" filter so that each %code code path can customize its search.

The problem was introduced in commit ea35939 that expanded INADDR_ANY
check scope from %la to %lp.

Remove SCO 3.2 support (#1005)

This OS is now obsolete. Users wishing to build for this OS
can use build time options:
./configure --disable-poll CFLAGS="-lintl"

also add to lib/util.c:
#define rint(X) floor((X) + 0.5)

mem/libminimal.la to facilitate helper reuse of convenience libs (#1004)

Squid helpers/tools cannot reuse features like SBuf because, in part,
the corresponding convenience libraries depend on libmem.la which drags
in heavy dependencies, including Store and time-based Event modules. The
new mem/libminimal convenience library implements enough of src/mem/
APIs to make the new memory library usable by helpers/tools without
causing a dependency explosion.

Do not use tests/stub_libmem.cc in the deployed/non-test squidclient,
cachemgr.cgi, and pinger programs.

Kid restart leads to persistent queue overflows, delays/timeouts (#706)

    WARNING: communication with ... may be too slow or disrupted...
    WARNING: abandoning ... I/Os
    ERROR: worker I/O push queue for ... overflow...
    ERROR: Collapsed forwarding queue overflow...

SMP queues rely on the shared memory QueueReader::popSignal flag to
reduce the number of UDS messages that queue readers and writers need to
send each other. If the flag is true but there is no corresponding "wake
up, you have new queued items to read" UDS message for the reader, the
reader may stall. This happens when the reader restarts (e.g., after
hitting an assertion) while the flag is true. A stalled queue reader
leads to delays and queue overflows:

* When the problem affects worker-disker queues, disk I/O delays under
  the hard-coded 7-second timeout are not reported to the admin but may
  affect user experience. Larger delays trigger level-1 WARNINGs. Push
  queue overflows trigger level-1 ERRORs.

* Transient worker-worker queue problems may stall concurrent
  transactions that are reading from the cache entry being written by
  another process. Overflows trigger level-1 ERRORs.

The restarted worker usually starts working just fine because it does
not expect any messages. A busy restarted worker may also appear to
continue working fine because workers always pop queued items before
pushing new ones -- as long as the worker queues new items, it will see
and pop responses to earlier requests, masking the problem. However, the
"stuck popSignal" problem never goes away: Squid only clears the flag
when receiving a notification, but sending new notifications is blocked
by that stuck flag.

Upon kid start, we now clear popSignal (to reflect the actual
communication state) and empty the queue (to reduce overflows). Since
commit 4c21861, any corresponding in-flight UDS queue notification is
ignored because it was sent to the previous process playing the same kid
role. The queue writer will see the false popSignal flag and send a new
notification when queuing a new item, preventing queue stalls.

Also properly ignore stale disker responses about cache_dirs we have not
opened yet, especially since we are now trying to empty the queues ASAP
after a restart, before Coordinator has a chance to inform us about
available diskers, populating the IpcIoFiles container. We already have
similar protection from stale UDS messages and from stale disker queue
messages about _opened_ cache_dirs ("LATE disker response to...").

Also report SMP queue flags in mgr:store_queues.

Detach libsbuf from StatHist to facilitate SBuf reuse (#1003)

The stated purpose of sbuf/DetailedStats was to "avoid adding external
dependencies to the SBuf code". DetailedStats failed to accomplish that
because it introduced a dependency on StatHist (which depends on Store).

To break the unwanted dependency, we outsource at-destruction-time size
statistics collection to external-to-SBuf code, configurable via the new
SBufStats fields. This also allows to avoid non-trivial SBuf statistics
collection in programs that do not need that statistics.

Since the new SBufStats fields are set at cache manager configuration
time in mainInitialize(), earlier SBuf and MemBlob destructions are not
accounted for. With more code changes/complications, we could initialize
the fields much earlier, but this delay may be considered a _positive_
change because it removes unusual SBuf stats from long-term histograms.

Also do not use the SBuf STUB in the deployed/non-test pinger program.
TODO: Convert more helpers to use SBuf. This change only adjusts pinger
because that adjustment did not require other significant changes.

Bug 4946: client_side_request.cc: "request != newRequest" (#1000)

... assertion when preceded by the following error message:

    ERROR: Inconsistent service method ... in dynamic adaptation chain

The assertion is triggered by the following chain of events. During
SslBump step1, a REQMOD adaptation service returns a dynamic
(X-Next-Services) plan containing a RESPMOD service P. Then, during
SslBump step2 (after obtaining TLS client SNI):

* Adaptation::AccessCheck::start() discovers P in the "future services"
  storage (Adaptation::History::theFutureServices) and returns it.

* The adaptation routing code correctly concludes that P is not
  applicable to the current vectoring point, logs the above ERROR, and
  returns the untouched virgin message object to the adaptation
  initiator. See thePlan.exhausted() in Adaptation::Iterator::step().

* ClientHttpRequest asserts because it expects a new message object.

Fixed Adaptation::AccessCheck code no longer assumes that it cannot be
activated twice for the same vectoring point. It leaves services
applicable to future vectoring points in theFutureServices instead of
always suggesting them for the current vectoring point.

TODO: We can and should optimize adaptation requesting code to stop
requiring a new message object when no adaptation is necessary, but that
change is difficult (we tried!) and independent from the bug fixed here.

Fix ignore-cc/act-as-origin in wildcard split-stack ports (#994)

The PortCfg::clone() hack (and clone_http_port_list() before it) forgot
to copy those two flags to the IPv4 port variant.

Compilers will now be able to warn us if copying misses future members.

Also prohibited other forms of copying, nearly restricting copying to
the parsing code with special needs.

Add RegisteredRunners::bootstrapConfig event hook (#992)

Remove --disable-loadable-modules build option (#990)

... in favour of libtool --enable/disable-shared option which
provides the same functionality.

This has the nice side effect of simplifying the LT_INIT and
related autoconf sequences which were being modified by
--disable-loadable-modules.

Drop redundant LIBADD_DL overwrite (#989)

The removed code questions whether it is needed.

The answer is no. LIBADD_DL is already set by one of the LT_INIT
macros to the same value we are overwriting it to.

Except, in the case where the user has supplied a custom libdl
location this code broke their build.

Improve modularity of IDENT protocol code (#991)

Simple changes to further isolate the IDENT code
in ident/libident.la and redux the mainInitialize() sequence.

Ident::Start() is already checking for and triggering ident_hash
initialization. No need to do so in main.cc if there is never
any use of IDENT. Removing one unnecessary memory allocation.

Fix Sum() by replacing it with a safer NaturalSum() (#869)

While testing future cache entry expiration computation code, we found a
Sum() bug: Sum(0l, -1lu) returns -1 instead of overflowing (long cannot
hold the maximum unsigned long value). This discovery triggered an
investigation that discovered several Sum() flaws, even inside the
supposed-to-be-trivial implementation branch for unsigned operands:

0. AllUnsigned-based optimization path selection was based on the
   detection of unsigned raw S and T types, but the actual s+t sum
   used integral-promoted types that could be signed!

1. "sizeof(T) >= sizeof(U)" assertion missed that same-size types have
   very different maximums when exactly one of the types is signed.

2. "auto sum = a + b" assignment missed that "auto" may be bigger than
   the return type "T" and, hence, the sum may not overflow when it does
   not fit T. The overflow would go undetected and the result will be
   truncated (to fit T) in the return statement. This automatic sum type
   enlargement may be due to signed->unsigned integral conversion
   related to the previous bullet, but it can also happen because of
   integral _promotions_ (e.g., those that convert "char" into "int").

3. Sum() silently truncated its arguments to fit T. Before commit
   1fba9ab, that silent truncation only applied to the first argument
   (if T did not match its actual type). After that commit, all other
   arguments could be silently truncated (except the last one).

4. It is trivial for the caller to do everything right when calling
   Sum() but then assign the result to a variable that cannot hold Sum()
   value, essentially undoing all the overflow detection work.

Fortunately, none of these bugs affected two existing Sum() callers.

Some of these problems were very difficult to fix! Eventually, a new,
simplified concept emerged that was easier to implement and that was a
better match for Squid current and foreseeable needs: NaturalSum().

NaturalSum() is designed for cases where we want an exact sum of
arguments but do not consider negative arguments as valid numbers. This
both simplifies the implementation and protects typical callers from
adding "-1" (e.g., a special "no delay" option setting) to "3600" (e.g.,
configured TTL) and getting a meaningless result of 3599.

NaturalSum() requires the caller to specify the summation type that will
be used to accumulate the sum value as Squid iterates over arguments.
Sum() wanted to automatically use the largest type that can accommodate
(partial) sums, but that complicated task was not implemented and
becomes unnecessary when dealing with only natural numbers -- there is
no need to temporary inflate the partial sum (beyond what the resulting
type can hold) in case some negative operand will decrease it later.

Also added SetToNaturalSumOrMax() to allow the caller to reset a
variable without guessing its type, avoiding silent sum truncation at
assignment time.

Also added unit test cases.

Maintenance: code style updates in libcompat (#880)

The libcompat code is targeted here specifically because it is
rarely touched so is not likely to be soon updated by our
"upgrade when modified" policy for source style changes.

Doing this part of the code semi-manually also allows us
to trivially exclude the directory from automated conversion
attempts and not bother with automating the detection of
whether a .h is included in C files.

Fix double-free segmentation fault on shutdown (#913)

SourceLayout: Move debugs()-related code into src/debug/ (#984)

The new library is needed to facilitate debugs() reuse and, perhaps more
importantly, reuse of other Squid code that already uses debugs() -- all
without build-breaking hacks of copying debug.cc source files and very
dangerous stub use in production running code.

The Raw class, asHex(), and other I/O manipulators are not specific to
debugs() streams. They are now in src/base/.

Also removed test-suite/debug, addressing an XXX. That test was
misplaced and not really doing anything useful -- the compiler already
tests what test-suite/debug was testing (and much more).

Also removed the no longer necessary xassert() stub from test-suite/.

TODO: Adjust the moved code to use Debug namespace. Those adjustments
are likely to trigger debug/Stream.h reorganization, simplifying exposed
debugging API. They deserve a dedicated PR.

TODO: Convert more helpers to use the new debugging library. This change
only adjusts pinger because that adjustment did not require other
significant changes.

ESI: Drop incorrect and unnecessary xmlSetFeature() call (#988)

xmlSetFeature() has been deprecated for 10+ years and will eventually be
removed from libxml2. Squid calls xmlSetFeature() with the wrong
argument: a nil `value` pointer instead of a pointer to a zero value.
When called with a nil `value`, the function does nothing but returning
an error. Squid does not check whether xmlSetFeature() call is
successful, and the bug went unnoticed since libxml2 support was added
in commit 964b44c.

Since libxml2 does not substitute entities by default, the call can be
removed to achieve the intended effect.

Fix build on Illumos (#983)

Do not stop listening after "ERROR: NAT/TPROXY lookup failed..." (#837)

_Intermittent_ NAT/TPROXY lookup failures can happen for several reasons
outside Squid control. For example, flushing conntrack tables under load
is likely to trigger such errors. Squid should not stop listening if
there is a good change that the next accept(2) will be fully successful.
We already apply the same logic to several other accept problems today.

Also added an XXX that TcpAcceptor::mustStop() does not work as expected
on error.

Do not skip problematic regexes in ACLs (#979)

This change has two partially overlapping parts:

* Reject configurations with ACLs containing failed-to-compile regexes.
* Do not skip ACL regexes with BUFSIZ or more characters.

Invalid or long ACL regexes were skipped (with an ERROR message),
resulting in a dangerous mismatch between admin (mis)interpretation of
their squid.conf and the actual configuration of the running instance.
Given the volume of ERRORs busy Squids are often reporting, including
transaction errors that admins may consider irrelevant, admins are not
paying enough attention to non-fatal configuration ERRORs, especially
reconfiguration ERRORs. Squid should reject erroneous configs instead.

And until we find (and this time document!) a specific reason to
artificially restrict individual regex length, let the Squid
configuration tokenizer and the regex library limit RE lengths. KISS.

Also deliver what commit 0fa036e promised but failed to do: "Squid no
longer reports REs _optimization_ failure when it is an individual RE
that is broken (and reported as such)". In that commit, I overlooked the
fact that Squid still ignored broken REs at the time, fooling the
higher-level code into thinking that they were OK (and triggering
unnecessary reporting of misleading optimization errors).

Besides rejection of invalid REs, there are two known side effects:

1. Squid may silently start using previously skipped long REs[^1].
2. Squid may fail/succeed/change regex optimization because the
previously skipped long REs now participate in optimization attempts.
In nearly all cases, this does not affect the ACL matching outcome.

[^1]: The regex library is unlikely to reject regexes based on their
length alone because libraries ought to accept regexes much longer than
longest Squid configuration tokens. For example, glibc accepts 4MB-long
regexes on 32-bit systems AFAICT based on a quick scan of regcomp code.

Prep for 5.4.1 (#978)

Improve coredump_dir on FreeBSD and Solaris based OS (#974)

Disclose that coredump_dir may also make the process "traceable",
including enabling core dumps and ptrace(2) attachments in certain
environments.

Add support for making the process traceable in FreeBSD- and
Solaris-like environments.

When Squid does not know how to make the process traceable in a given
environment, it does nothing under the assumption that the process is
already traceable enough (by default) to dump cores. Alternatively, we
could warn the admin about the lack of tracing support for that
environment. It is not clear which option is better, but we are betting
on the processes being traceable by default in most not explicitly
covered environments.

Fix FreeBSD 14 build (#975)

FreeBSD 14 defines 3-parameter CPU_AND() macro as a `do {} while` loop.
Our (void) in front of that loop creates a syntax error.

CpuAffinitySet.cc:41:16: error: expected expression
(void) CPU_AND(&cpuSet, &cpuSet, &theOrigCpuSet);

That (void) was added in commit 7ec6d51 to "remove GNU-specific syntax",
but we cannot tell what specific problem that 10-year old change solved.
Known 3-parameter CPU_AND(3) documentation says the call returns void.

Also included a missing header providing IPPROTO_TCP definition.

Prep for 5.4 (#973)

client_side_request.cc:2028 "request->method.id()" assertion (#971)

ConnStateData::tunnelOnError() ignored its method parameter and always
called initiateTunneledRequest() with METHOD_NONE. buildFakeRequest()
then set HttpRequest::method to METHOD_NONE. Squid does not support such
HttpRequest objects well because quite a bit of code assumes that
HttpRequest::method must be known. For example, depending on
configuration and other factors, Squid may assert.

Moreover, many Squids using the on_unsupported_protocol directive also
have special rules for handling tunnels and those rules may not work as
intended for these METHOD_NONE transactions.

Squid now uses CONNECT method when it creates a CONNECT-like request
that facilitates on_unsupported_protocol tunneling. This helps meet code
expectations about HttpRequest::method being defined and natural admin
expectations about tunneled requests having a CONNECT method.

Admins that want to distinguish on_unsupported_protocol tunnels from
other tunnels can use ACL annotations (for now). If needed, one can add
a better/dedicated way of identifying on_unsupported_protocol tunnels.

Also removed the method parameter from clientTunnelOnError() and related
methods. That method was extracted from a low-level parser field and

- for cases where the higher-level code deemed input to be non-HTTP, it
was wrong to use essentially garbage/non-HTTP chars as a method name;
- for other cases, the method is available via HttpRequest::method.

TODO: We may be able to remove more duplicated parameters or unnecessary
checks on this code path: Perhaps clientReplyContext::setReplyToError()
method parameter can be retrieved from errstate->request? Perhaps
errstate->request itself is always the same as http->request?

This is a Measurement Factory project.

Moved regcomp(3)-specific RegexPattern code inside RegexPattern (#972)

Also moved non-ACL regex configuration parsing code inside ConfigParser.
It is possible to move ACL (data) regex configuration parsing code as
well, but that move is a lot more complex due to regex-pattern-joining
optimizations, and there are no immediate plans to support
non-regcomp(3) regexes in ACL data. We may do that move later as we get
more experience with non-regcomp(3) regexes and decide to join them too.

These moves clean up existing regex-using code and allow adding support
for non-regcomp(3) regexes (e.g., regexes based on C++11 <regex>)
without adjusting ConfigParser::regex() callers. Such support would also
require "virtualizing" RegexPattern. To avoid increasing complexity and
hurting performance, that (simpler) step should be done only if we
decide to actually add support for non-regcomp(3) regexes.

The above changes allowed us to improve RegexData error reporting: Squid
no longer reports REs _optimization_ failure when it is an individual RE
that is broken (and reported as such). Squid still ignores the fact that
broken REs can be "optimized" into a completely "different" valid
combined RE: We do not compile individual REs unless optimization fails.

Also simplified and polished affected code.

Log early level-0/1 debugs() messages to cache_log (#950)

Commit d7ca82e dropped cache.log-recording of debugs() messages produced
by `finalizeConfig` runners (e.g., `WARNING: mem-cache size is...`).
This change restores that functionality (by buffering early messages
until the cache.log file is opened) and improves early debugs() handling
as detailed below.

## Squid has three channels for debugs() messages:

* cache.log (`cache_log`): Settles as squid.conf settings take effect.
* stderr (mostly `-d`): Settles when command line options take effect.
* syslog (mostly `-s`): Settles when command line options take effect.

Squid always ignores debugging messages with section/level mismatching
Debug::Levels configuration (driven by a combination of the `-k debug`,
`-X`, and `debug_options` directives). _Beyond_ that ever-present
top-level filter, each debugging channel has its own set of rules that
determine which filtered debugs() messages the channel accepts; the
following approximate summary is based on the changes in this commit:

* cache.log: all messages;
* stderr: either messages satisfying explicit `-d` level restrictions or
  messages that Squid failed to write to cache.log (if no `-d`);
* syslog: level-0/1 messages and `ForceAllert` messages.

This change encapsulates channel-specific logic in dedicated classes.

## Guiding design principles

* **no repetition**: A channel must record each message at most once.
* **no reordering**: Each channel must preserve same-process debugs()
  call order across all recorded messages.
* **no loss**: Each channel must log all messages matching the channel
  configuration/filters.
* **no delay**: Each channel must record each message ASAP.
* **cache.log primacy**: Admins want messages logged to cache.log if
  possible and to stderr/syslog only if necessary or explicitly
  requested.
* **cache.log locking**: No cache.log updates without a PID lock.

## The fix

Commit d7ca82e dropped level-0/1 messages produced by `finalizeConfig`
runners because, since that commit, the cache.log channel was opened
_after_ the `finalizeConfig` event. Official code also dropped other,
even earlier level-0/1 messages, violating the "cache.log primacy"
principle. We now save early level-0/1 messages into a buffer. This
buffering can be misinterpreted as violating the "no [artificial] delay"
principle, but the messages are actually written to cache.log ASAP;
without the buffer the messages would be missing from the "primary"
cache.log and, in many use cases without stderr capturing, completely.

If Squid fails to open cache.log, early messages saved for the cache.log
channel are given to the stderr channel (following the "no loss"
principle). The stderr channel logs those messages that obey explicit
`-d` restrictions and do not violate the "no reordering" and "no
repetition" principles. Violations are tracked by assigning each message
its debugs() call sequence number.

The early messages buffer is currently dedicated to level-0/1 messages
because we were worried that level-2+ messages (if enabled via `-X`)
would overflow any reasonably-sized buffer[^1]. Correctly handling such
overflows is very difficult (we tried), so we avoid them instead.

## Side effects and surprises

To allow admins to see early level-1 cache.log messages (without adding
an `if early` check to the debugs() macro[^2]), we changed the _initial_
Debug::Levels value from `ALL,0` to `ALL,1`, matching the debug_options
default set later. This fix uncovered a few early level-1 messages that
were previously hidden[^3]:

    09:46:36| Startup: Initializing Authentication Schemes ...
    09:46:36| Startup: Initialized Authentication Scheme 'basic'
    09:46:36| Startup: Initialized Authentication Scheme 'digest'
    09:46:36| Startup: Initialized Authentication Scheme 'negotiate'
    09:46:36| Startup: Initialized Authentication Scheme 'ntlm'
    09:46:36| Startup: Initialized Authentication.
    09:46:36| Processing Configuration File: .../squid.conf ...
    09:46:36| Initializing https:// proxy context
    09:46:36| Set Current Directory to /usr/local/squid/var

The fact that many debugs() messages happen before cache_log can be
opened is fairly obvious, but it is also true that stderr and syslog
channels cannot write messages immediately. Both channels need to wait
for the command line options to be parsed. Even the global `stderr`
variable may not be available during very early debugs() calls! Each
channel now buffers level-0/1 messages until it settles.

Since the early message buffers are limited to level-0/1 messages,
initial cache.log records logged by `squid -X` are level-0/1 messages,
followed by true ALL,9 debugging. The admins can get early ALL,9
messages via stderr, of course.

The `-z` command-line option no longer overrides `-d` settings.

Squid no longer writes to cache_log after removing the PID file.

fatal() text and the `Squid Cache...: Terminated abnormally.` message
are no longer dropped during certain early process terminations.

`squid -k ...` logs some new level-0/1 messages to stderr.

Improved support for assert()/debugs() triggered from within the Debug
module: Besides crashes, such "internal" debugs() could be logged before
earlier "external" messages and some assert() messages could be lost.

[^1]: The restriction to only buffer level-0/1 messages can be easily
removed (after research and discussion) if there is consensus that the
actual memory required to accumulate all typical level-2+ early messages
is worth spending on making `squid -X` write all messages to cache.log.

[^2]: Squid has thousands of debugs() calls (and counting), including
many calls on performance-sensitive paths. Most debugs() calls should do
nothing by default. Thus, the speed at which the debugs() macro can skip
logging is an important common case on the performance sensitive path.
Similarly, disruption to CPU processing pipeline due to top-level
debugs() checks may be important. Actually writing the message to the
cache.log may not be that important -- at that time, the performance
battle can be considered lost -- but the initial rejection is.

[^3]: This change is not about the levels of any specific messages.
Wrong message levels (if any) should be fixed separately. This change
does not imply that the newly discovered messages have wrong levels.

Support reliable zeroing of sensitive buffers (#758)

TODO: Use the new API for more sensible buffers, possibly adding a
wrapper class for sensitive content to automate cleanup.

Source Format Enforcement (#963)

Besides routine formatting enforcement, this change contains flag-day
updates and code polishing from code removals performed in the latter
half of 2021 (e.g., HERE removal).

(summary) log from the script execution:

UPDATE COPYRIGHT for ...
NOTICE: File ... changed: by scripts/maintenance/HERE-obsolete
NOTICE: File src/Makefile.am changed: by
   scripts/format-makefile-am.pl
NOTICE: File src/sbuf/Stream.h changed: by
   scripts/maintenance/sort-includes.pl
NOTICE: File src/tests/stub_store_client.cc changed: by
   scripts/maintenance/sort-includes.pl

If you are worried about hitting all this at once when rebasing a
large patch you can split the rebase into steps as follows:

    git fetch --all
    git rebase [ commit ID prior to this one ]
    git rebase [ this commit's ID ]
    git rebase master

Use RegisteredRunners to initialize/clean the ESI module (#965)

Bug 5192: esi_parser default is incorrect (#968)

Since commit f5f7786 reversed the internal list order of ESI parser
registry, the esi_parser documentation and code comment in esi/Module.cc
have been incorrect.

Return ESI parsers to the documented default behaviour and make that
default explicit in the code selecting which Parser to initialize.

Also fixed an inverted test that resulted in the esi_parser configured
library _not_ to be the one used.

Bug 5194: Remove all unused debug sections (#969)

... that we can find quickly.

Naming/documenting debugging sections is a good idea, but that should
not be done in every source code file that relies on that section. We
cannot remove all such DEBUG: declarations without developing a proper
way to name/document sections, but we can (and, given Bug 5194
existence, probably should) remove the unused ones -- their removal does
not remove any immediately usable info.

Bug 5194 report was specific to Section 56: Folks misinterpret section
56 "HTTP Message Body" title as promising to dump message bodies to
cache.log, which is not a functionality that should be driven by a
debugging section. Currently, Squid lacks such functionality.

Avoid reverse DNS lookups when logformat %>A is unused (#912)

Initially, the log_fqdn directive decided whether to do reverse DNS
lookups of client IP addresses ASAP, to improve our chances of logging
the resolved-by-then client FQDN.

Since commit 7684c4b, Squid started violating log_fqdn configuration,
enabling early reverse lookups if %>A was used in some logformat. Seven
years later, commit c581e96 completely removed the log_fqdn directive.

Unfortunately, the idea that seeing %>A somewhere means early lookups
are needed is flawed because

1. Some logformats containing %>A are never parsed. For example, the
   icap_squid logformat is still hard-coded as a printf()-based code.
   Using that format should enable early reverse lookups, but does not.

2. Some parsed logformats containing %>A may be unused. This is
   especially true for _default_ logformats that admins cannot control
   at all. Initially, no default logformats were parsed, but four years
   later, commit b11724b converted some hard-coded printf()s into parsed
   default logformats, inadvertently enabling reverse lookups in all
   Squid configurations!

   There is no way an admin could turn those lookups off because our
   DEFAULT: lines are parsed unconditionally; overwriting those defaults
   simply means that the corresponding logformat directive is parsed
   twice, once when parsing defaults (enabling the lookups) and then
   when parsing admin settings (the lookups state remains unchanged).

Transactions do not wait for these DNS lookup queries to be answered
and, due to DNS caching, not every client connection triggers a new DNS
query, but the total volume of these lookups may be significant in
environments with many client IP addresses, especially after a fresh
Squid start (or during a flash crowd arrival), when Squid DNS cache is
empty (or has not been primed with a lot of new addresses yet).

This change replaces the "parsed %>A enables early lookups" logic with
the "used %>A enables early lookups" approach. Logging of FQDNs is still
not guaranteed -- the lookups may not be enabled early enough and may
not complete by the time we use %>A -- but Squid instances not using %>A
are now guaranteed to avoid useless reverse DNS lookups of client IPs.

Also fixed logging of client FQDNs in the default ICAP access log format
(a.k.a. icap_squid): That format has the equivalent of a %>A field.
Thus, the icap_squid logging code must _always_ attempt to log FQDN and
enable early reverse DNS lookups. Neither was happening.

Categorize level-0/1 messages; part 1: "easy" problem messages (#946)

Today, it is impractical to reliably identify important cache.log
messages in automated sysadmin notification systems. No regular
expression is guaranteed to match all important messages and none of the
unimportant ones. While message importance is naturally subjective (with
cache_log_message and notification system regexes available to customize
handling of specific messages), even reliable detection of _potentially_
important messages (to bootstrap admin customization efforts) is
impractical today. Any notification configuration is likely to miss an
important message (in today's or future code) and/or spam admins with
unimportant messages.

Fully addressing this problem requires many changes. This change lays a
foundation for that work by standardizing three well-known message
prefixes (`FATAL:`, `ERROR:`, and `WARNING:`) for debugs() messages
reporting various problems. Eventually, all problem-reporting messages
(and only those) will use one of the standard prefixes. This change
converts 300+ debugs(), bringing the total to 900+ messages using a
a standard prefix. There are more than 5'000 debugs() in Squid.

The scope of this change are problem-reporting debugs() statements that
can be detected and converted by a simple script. We also applied a few
easy-to-automate polishing touches to modified debugs(), like replacing
debug level 0 with DBG_CRITICAL, expanding some contractions (e.g.,
"can't"), and fixing grammar (e.g., "is has").

Maintenance: De-duplicate PackableStreamBuf and SBufStreamBuf (#959)

SBufStreamBuf was almost identical to PackableStreamBuf. Both classes
are pass-through write-only streambufs that lack their own put area. Now
their common code lives in AppendingStreamBuf.

No functionality changes intended.

The removed std::streambuf method descriptions were imprecise and some
were misleading. STL documentation describes this standard/parent API.

While also similar, PackableStream and SBufStream have important
differences in API and implementation. For example, unlike
PackableStream, SBufStream owns the sink buffer and does not modify the
buffer it was created with, providing a safe content extraction method
instead. The two classes cannot be merged without changing their users,
and it may be impossible to justify SBufStream users exposure to the
dangers of forgetting to sync() the streambuf before accessing the sink.

Fix GCC v5.5.0 build after 82fe21f (#962)

Old problems revealed by recent commits:

    XactionRep.cc:174:20: error: declaration of 'String name' shadows
    a parameter [-Werror=shadow]

    XactionRep.cc:486:33: error: declaration of 'services' shadows
    a previous local [-Werror=shadow]

Remove ConfigParser::Undo() hack to improve ACL flags parsing (#960)

Since `acl ... -n` support was added in commit 33810b1, flag-agnostic
parseFlags() extracts ACL flags, applies supported ones, and rejects the
rest. However, that extraction code does not apply the supported `-i`
flag! Instead, the flag is put "back" via Undo() as if we did not see it
at all. Later, the ACL data parser re-parses and applies it.

That "undo" hack avoided ACL data parsing changes but caused several
problems, including:

* The global Undo_ storage could "leak" the stored flag to the wrong ACL
  on certain ACL parsing errors. The global nature of that storage also
  blocked serious preprocessing/reconfiguration support improvements.

* AclData::parse() did not distinguish (previously undone and now
  "redone") flags from (post-"--") ACL data, blindly assuming that the
  first token is a flag and treating the remaining tokens as ACL data.

* Increasingly inconsistent handling of `-i` and `+i` flags.

* Related `ident -i` parsing code had severe performance problems: Some
  tests timed out due to exponential growth of `-i` parsing delays (with
  the number of parsed ACLs) caused by excessive userDataNames copying.

This change removes the "undo" hack for good. We now parse all leading
ACL options once, using a single Acl::Option API for both "global" (i.e.
applicable to the entire named ACL object) and "line" scoped options.
The parsed line-scoped flags (e.g., `-i`) are reset before parsing each
ACL directive line. They are delivered to the (ACL data) parsing code
using the existing Acl::Option linkedWith() mechanism.

TODO: This change does not fix all ACL data flag handling problems. For
example, ACL data parsing methods should be refactored to reuse the
now-generalized Acl::Option API for handling flags located _between_ ACL
parameters. Those non-trivial fixes are unrelated to Undo() removal and
will fix/improve ACL data handling, so they deserve dedicated commits.

Fix gawk v5 warning (#961)

gawk: ./mk-globals-c.awk:24:
warning: regexp escape sequence \" is not a known regexp operator

Some awk variants complain about what they perceive as invalid escape
sequences. Gawk v5+ works but complains about us escaping double quotes.

The corresponding awk statement (added in commit 42c674f) is unnecessary
since commit 582c2af. Removing that statement is better than trying to
guess its portable spelling in the gray zone of awk escape sequences.

Preserve configured order of intermediate CA certificate chain (#956)

https_port ... tls-cert=signing,itsIssuer,itsIssuerIssuer.pem

The order was reversed in commit cf48712, probably by accident. Wrong
order violates TLS protocol and breaks TLS clients that are incapable of
reordering received intermediate CAs. Squid deployments that use
wrong-order bundles (to compensate for this bug) should reorder their
bundles when deploying this fix (or wait for Squid to order certificates
correctly, regardless of the bundle order -- a work in progress).

This is a Measurement Factory project.

Add missing CXXFLAGS to several objects (#952)

Target-specific CXXFLAGS customizations should include AM_CXXFLAGS.
Otherwise, their targets are built without SQUID_CXXFLAGS like -Wextra.

Bug 5132: Close the tunnel if to-server conn closes after client (#957)

Since commit 25d2603, blind CONNECT tunnel "jobs" (and equivalent) were
not destroyed upon a "lonely" to-server connection closure, leading to
memory leaks. And when a from-client connection was still present at the
time of the to-server connection closure, we did not try to reforward,
violating the spirit of commit 25d2603 changes. Calling retryOrBail() is
sufficient to handle both cases.

Bug 5177: clientca certificates sent to https_port clients (#955)

When sending an https_port _server_ certificate chain to the client,
Squid may send intermediate CA certificates found in clientca=... or
tls-cafile=... client certificate bundles. This "leak" of client CAs
surprises admins, may trigger traffic monitoring alarms, and might even
break https_port certificate validation in some TLS clients.

This surprising "leak" of client CAs is triggered by OpenSSL default
behavior of auto-completing server certificate chains using whatever CA
certificates happened to be in the TLS context certificate store. When
client certificate authentication is enabled, that store may contain
clientca CAs (or equivalent). OpenSSL CHANGES file acknowledges that
this aggressive default behavior can be a problem and introduces
SSL_MODE_NO_AUTO_CHAIN as a way to disable it.

This fix breaks misconfigured Squid deployments that (usually
unknowingly) rely on the OpenSSL clientca "leak" to build a complete
https_port server certificate chain sent to TLS clients. Such
deployments should add the right intermediate CA certificate(s) to their
https_port tls-cert=... bundle (or equivalent).

This is a Measurement Factory project.

Cleanup doc/ directory (#949)

Remove unnecessary copies of publicly available RFC documents
and lists of standard information which is available from
external sources.

See https://wiki.squid-cache.org/StandardsCompliance for a
list of RFC documents applicable to Squid.

See https://www.iana.org/assignments/http-status-codes/ for the
official list of HTTP status codes and document references.

Draft documents are kept for now since they may become
unavailable. As several already have.

Fix GCC v10.3 build after commit 8b082ed (#951)

basic_smb_auth.cc:57:1: warning: no previous declaration for
`void print_esc(FILE*, char*)` [-Wmissing-declarations]

langpack: Fix typo in Russian texts (#948)

Missing whitespace between two words in ERR_READ_TIMEOUT

Remove m88k-specific support (#944)

Modern support for m88k hardware is only available through
OpenBSD and NetBSD which have their own compatibility handling.
We no longer need to detect m88k ourselves.

Remove NeXTSTEP support (#943)

NeXT architecture hardware is no longer available and with that
hardware any support by Apple ceased.

QNX and MacOS descendant systems which still depend on some of
the workarounds already are or can be supported under their own
compatibility section.

Remove unused/broken ACL copying support (#941)

This code is unused. The clone() methods were the only use of
copy construction and operator. Most ACL related classes lack
implementation of the copy and/or assert so even if used this
code would be quite dangerous.

Explicitly forbid copy/move at the hierarchy base classes
ACL and ACLData. Removing all child specific copy
implementations and prohibitions (now unnecessary).

Prep for 5.3 (#940)

Fix FATAL ServiceRep::putConnection exception: theBusyConns > 0 (#939)

FATAL: check failed: theBusyConns > 0
exception location: ServiceRep.cc(163) putConnection

Since master/v6 commit 2b6b1bc, a timeout on a ready-to-shovel
Squid-service ICAP connection was decrementing theBusyConns level one
extra time because Adaptation::Icap::Xaction::noteCommTimedout() started
calling both noteConnectionFailed() and closeConnection(). Depending on
the actual theBusyConns level, the extra decrement could result in FATAL
errors later, when putConnection() was called (for a different ICAP
transaction) with zero theBusyConns in an exception-unprotected context.

Throughout these changes, Xaction still counts the above timeouts as a
service failure. That is done by calling ServiceRep::noteFailure() from
Xaction::callException(), including in timeout cases described above.

Fix enabling of -Woverloaded-virtual attempted in commit 8b082ed (#936)

    configure: checking whether compiler accepts -Woverloaded_virtual
    config.log: g++: error: unrecognized command-line option
    '-Woverloaded_virtual'; did you mean '-Woverloaded-virtual'?

Fix GCC v10 --with-openssl build after commit 030a9b3 (#935)

bio.cc:377:18: warning: unused variable 'ssl' [-Wunused-variable]

I removed the whole "we have a valid TLS connection" condition because
the code in question no longer uses the TLS connection object, and the
condition itself may change _after_ this code runs. The condition is
important to higher-level code (that calls the code that uses
allowSplice/allowBump state set in the affected low-level BIO method),
but there are many such conditions on that code path; there is no good
reason to single out this specific condition in this low-level code.

Remove recv-announce tool (#933)

This daemon for receiving send-announce traffic has not
built for many years.

Remove CPU profiler mechanism (#931)

The old CPU profiler has not been updated in many years. As a result,
the statistics provided are deceptively incomplete and not sufficient
for their intended purpose of profiling Squids CPU usage. External tools
such as oprofile do a better job despite their differences and some
limitations.

Bug 5090: Must(!request->pinnedConnection()) violation (#930)

The bug may be asymptomatic. Visible bug symptoms, if any, may include:

    FATAL: check failed: !request->pinnedConnection()
    exception location: FwdState.cc(1124) connectStart

    FATAL: check failed: transportWait
    exception location: FwdState.cc(675) noteDestinationsEnd

FwdState::usingDestination() should cover 3 post-transportWait periods:

1. peerWait: opening a CONNECT tunnel through the cache_peer
2. encryptionWait: TLS negotiations to secure the connection
3. Comm::IsConnOpen(serverConn): a Squid-peer transaction

The condition for the last period did not account for the time between
FwdState::unregister() and FwdState::complete() (or their equivalents),
when the transport connection is either closed or moved to the pconn
pool, but FwdState is still waiting for complete() and must not attempt
to connect to another destination. The bug is usually hidden because
complete() is usually called immediately after unregister(). However,
RESPMOD adaptation (at least) may delay complete(). If peer selection
news comes during that delay, usingDestination() lies, and various
Must()s may fail, depending on Squid version and HTTP request details.

Now, FwdState does not rely on the to-peer connection state for the
third period condition. Instead, we explicitly track whether the last
dispatch()ed activity has ended. This tracking is tricky because
complete() may be called without dispatching an asynchronous activity,
and because there are at least three ways for an activity to end:
explicit complete(), explicit handleUnregisteredServerEnd(), and
implicit serverClosed() connection closure callback. This tracking will
become easier and more reliable when FwdState no longer co-owns/stores
the to-peer connection. This change simplifies the future removal of
that connection ownership.

Also reordered usingDestination() conditions to match the chronological
order of the corresponding three periods.

Also reordered transportWait-vs-usingDestination() checks to match the
chronological order of those two forwarding stages.

Fix build on openbsd 7.0 (#929)

OpenBSD doesn't offer CPU_SET and CPU_ISSET. Implement their stubs as
inline functions to give the compiler proper hints about arguments (non)
use.

We have a const-correctness bug in std::unordered_map when supplying an
allocator that OpenBSD is strict about. Fix it.

Update buildtest.sh to try and use relative paths first. This prevents
autoconf complaining and failing if the directory path includes
characters from an unsafe set.

Bug 5060: Parallel builds are not reliable (#927)

Create tests directory before using it. Needed since commits 44e802f and
9ba9313.

    cp ../../src/tests/stub_debug.cc tests/stub_debug.cc
    cp ../../src/tests/stub_libmem.cc tests/stub_libmem.cc
    cp: cannot create regular file 'tests/stub_debug.cc':
        No such file or directory

BUG: Unexpected state while connecting to ... server, part 1 (#916)

These BUG messages (discussed and removed in a recent commit 2b6b1bc)
exposed several bugs. This change fixes a case where a BUG message was
correctly triggered by a Must() violation in Ssl::ServerBio::write():

check failed: buf[0] == 22
exception location: bio.cc(478)

The code expectations reflected in that Must() were wrong: Instead of
sending ClientHello, OpenSSL may also send a TLS Alert (Level: Fatal,
Description: Internal Error), at least. We believe that alert is sent
when SslBump configures OpenSSL to negotiate using unsupported ciphers
or something like that. This change relaxes ServerBio code expectations,
preventing the Must() violation.

The Must() violation was causing OpenSSL-related memory leaks. A more
comprehensive solution is needed to avoid similar leaks, but this small
fix helps in a specific (and a fairly common) case.

This is a Measurement Factory project.

Remove step2+ stare-and-splice and peek-and-bump support (#926)

Support for these features was successfully disabled (at build time) for
five years. See commit 88a300c for the list of reasons. We did not even
provide an ./configure --enable-... option for it. The corresponding
code does not compile with modern OpenSSL versions, but its mere
presence complicates related code logic and significantly increases
related development efforts. This code is not worth keeping.

This is a Measurement Factory project.

Fix HappyConnOpener::checkForNewConnection Must(prime) violation (#923)

This change addresses a known problem that triggered unwanted C++
exceptions every time Squid selected a to-server persistent connection
as the primary Happy Eyeballs destination/answer. The bug existed since
HappyConnOpener inception (commit 5562295). It did not seem to affect
the connection requestor directly because the HappyConnOpener job sends
the selected pconn to the requestor _before_ throwing.

Also adjusted Happy Eyeballs state documentation to reflect a successful
termination state. Before and after this change, we may enter that state
in the middle of checkForNewConnection(). The less we think about done()
as an _exceptional_ "only on error" or "only at the end of processing"
state, the fewer similar bugs we will create.

The code also improved after we abandoned the idea of documenting all
primary state changes in checkForNewConnection(). There are too many
nuances/changes to document everything anyway, and moving primary track
handling into a dedicated function significantly improves readability.

Docs: %adapt::sum_trs entries may well exceed %icap::tt (#914)

%icap::tt documentation incorrectly implied that the measurement
includes the entire ICAP transaction(s) lifetime. In reality, individual
ICAP transaction contribution stops with
Adaptation::Icap::ModXactLauncher::swanSong(), which is normally
triggered by Adaptation::Icap::Launcher::noteAdaptationAnswer(). Here,
the "answer" does not include the entire ICAP response, but just enough
information to form adapted HTTP message headers (echoed or received).
Thus, a large or slow ICAP response body may result in %adapt::sum_trs
values that far exceed the corresponding %icap::tt "total".

This change does not imply that %icap::tt should (not) work differently.

Also fixed a typo in %adapt::all_trs and polished %adapt::sum_trs docs.

Bug 5158: AnyP::Uri::host() mishandles [escaped] IPv6 addresses (#920)

When an address is expected to be a string in the form "Host:Port", the
Host MUST be wrapped in square brackets iff it is a numeric IPv6
address. The ":Port" part is usually optional. i.e. "[2001:db8::1]:443"
or "[2001:db8::1]".

There are 2 bugs relating to the handling of numeric IPv6 addresses:

* Bug 1: AnyP::Uri::host(const char *) is supposed to accept the host
  part of a URI, but just copied it into hostAddr_ instead of using the
  fromHost() method to set hostAddr_.  Since the argument is the host
  part of a URI, numeric IPv6 addresses are wrapped in square brackets,
  which would never be stripped and hostIsNumeric was therefore not set.
  We now use the fromHost() method to process the host name correctly.

* Bug 2: Conversely, AnyP::Uri::hostOrIp() is supposed to return the
  host name, suitable for use in a URI or Host header, which means that
  numeric IPv6 addresses need to be wrapped in square brackets.
  However, that wrapping was never done.  We now use the toHostStr()
  method to correctly format the string.

As far as I know, neither of these bugs actually break anything at the
moment, but they have the potential to break future developments, so
applying this fix should have no operational impact.

Also removed wrong IP::Address::toStr() description from .cc file that
(poorly) duplicated correct description of that method in the header.

Bug 5169: StoreMap.cc:517 "!s.reading()" assertion (#918)

unlockSharedAndSwitchToExclusive() was not properly failing when future
readers violated preconditions for obtaining an exclusive lock:

- `if (!readers)` means no old readers
+ `if (!readLevel)` means no old readers and nobody is becoming a reader

That missing "becoming a reader" condition covers a lockShared() caller
that had passed their writeLevel test before we incremented writeLevel
to lock other lockShared() callers out. That caller increments `readers`
while unlockSharedAndSwitchToExclusive() makes `writing` true.

Introduced in commit 1af789e due to poor code duplication. This change
also removes that code duplication by adding finalizeExclusive().

BUG: Unexpected state while connecting to ... server, part 2 (#917)

These BUG messages (discussed and removed in a recent commit 2b6b1bc)
exposed several bugs. This change fixes a case where a BUG message was
triggered by the following Must() violation:

    check failed: !csd->serverBump() ||
        csd->serverBump()->at(...tlsBump1, XactionStep::tlsBump2)
    exception location: PeekingPeerConnector.cc(173) initialize

The above Must() assumed that PeekingPeerConnector always changes the
SslBump step to step3. However, that assumption was wrong because
PeekingPeerConnector may run multiple times (and the step is recorded
outside the connector object). When FwdState reforwarded a failed
attempt, PeekingPeerConnector would find itself at step3.

Instead of fixing the Must(), we fixed a bigger bug: SslBump step3 must
start when we decide to communicate with the server, not in the middle
of that communication. This fix may affect some esoteric configurations
that use at_step ACLs outside ssl_bump _and_ rely on the wrong step
change timing, but, technically, such configurations are not officially
supported.

More step boundary fixes are needed. There is a (much bigger) ongoing
project dedicated to those changes.

This is a Measurement Factory project.

Fix reconfiguration leaking tls-cert=... memory (#911)

Refactored ReadX509Certificate() API for safe use in more contexts,
including in leaking Security::KeyData::loadX509ChainFromFile().

Abstract direct calls to the dangerous PEM_read_bio_X509() API.

Leaking (under certain conditions) since master/v5 commit 540e296.

Remove HPUX compiler handling in configure.ac (#890)

HPUX is not a testable nor a primary target platform for us

Properly track (and mark) truncated store entries (#909)

## Responses with truncated bodies

Squid used an error-prone approach to identifying truncated responses:
The response is treated as whole[^1] unless somebody remembers to mark
it as truncated. This dangerous default naturally resulted in bugs where
truncated responses are treated as complete under various conditions.

This change reverses that approach: Responses not explicitly marked as
whole are treated as truncated. This change affects all Squid-server
FwsState-dispatched communications: HTTP, FTP, Gopher, and WHOIS. It
also affects responses received from the adaptation services.

Squid still tries to deliver responses with truncated bodies to clients
in most cases (no changes are expected/intended in that area).

[^1]: A better word to describe a "whole" response would be "complete",
    but several key Squid APIs use "complete" to mean "no more content
    is coming", misleading developers into thinking that a "completed"
    response has all the expected bytes and may be cached/shared/etc.

## Related access-logging improvements

Transactions that failed due to origin server or peer timeout (a common
source of truncation) are now logged with a _TIMEOUT %Ss suffix and
ERR_READ_TIMEOUT/WITH_SRV %err_code/%err_detail.

Transactions prematurely canceled by Squid during client-Squid
communication (usually due to various timeouts) now have WITH_CLT
default %err_detail. This detail helps distinguish otherwise
similarly-logged problems that may happen when talking to the client or
to the origin server/peer.

## Other

FwdState now (indirectly) complete()s truncated entries _after_
releasing/adjusting them so that invokeHandlers() and others do not get
a dangerous glimpse at a seemingly OK entry before its release().

Remove HTTP reply header completion hack (#910)

Treat responses with truncated HTTP headers (i.e. no CRLF after all the
field-lines) as malformed, replacing them with an HTTP 502
ERR_INVALID_RESP error (same as any other HTTP response with malformed
headers would get).

Since Bug 2879 (commit da6c841 and earlier v2-only changes), Squid was
"fixing" such truncated headers by adding an extra CRLF sequence and
re-parsing them. Depending on the truncation circumstances, that old
workaround could result in rather bad outcomes for Squid and its
clients. Hopefully, we no longer need to work around such bugs. If we
do, we should do it in a safer manner and with admin permission.

Deduplicate transaction LogTags (#906)

...using ALE cache.code instead of its partial
ClientSideRequest::logType duplicate.

The need to de-duplicate these two LogTag members became apparent when
we tried to update/detail LogTags in forwarding code (e.g., to report
peer timeouts) and noticed that ALE cache.code value could be
overwritten by (stale) ClientSideRequest::logType information. Those
updates needed other significant changes, so we factored this small
stand-alone improvement out.

Fortunately, ClientSideRequest::logType lifetime does not exceed ALE's,
making ClientHttpRequest::logTags removal straightforward.

Also removed a paranoid TunnelStateData::al check after verifying that
the ALE pointer is never nil.

Prep for 4.17 and 5.2 (#905)

Improved const-correctness of ALE-driven APIs (#904)

Also made ClientHttpRequest::al constant. This work actually started
with changing this data member, to make some ALE-related hack safer.
That change required the API fixes. We have since got rid of the hack,
but all these const-correctness changes are valuable on their own.

No functionality changes.

Fixed a copy-paste typo in HttpHdrCc::hasMinFresh() (#901)

This bug could result in unexpected hits/misses because Squid used the
cached CC:max-stale value (or -1) instead of CC:min-fresh when
calculating entry freshness.

Broken since 810d879.

WCCP: Validate packets better (#899)

Update WCCP to support exception based error handling for
parsing and processing we are moving Squid to for protocol
handling.

Update the main WCCPv2 parsing checks to throw meaningful
exceptions when detected.

Fix X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY handling (#898)

Fix "mem_obj->inmem_lo == 0" assertion in StoreEntry::swapOut() (#896)

Squid may stop writing to disk well before receiving the entire miss
response (e.g., because of swapout errors). When that happens, Squid
continues to call swapOut() to mem-cache the incoming response body.
Avoid the inapplicable disk-writing part of swapOut() in those cases.

Fix MAX_PKT{4,6}_SZ to account for icmpEchoData padding (#887)

The bug was exposed by GCC v11 on Ubuntu Impish:

    Icmp4.cc:116:11: error: array subscript icmpEchoData[0] is partly
        outside array bounds of char[282] [-Werror=array-bounds]
        echo->opcode = (unsigned char) opcode;

The array the compiler is talking about is the pkt buffer. That buffer
size (i.e. MAX_PKT4_SZ) was calculated under the faulty assumption that
a compiler cannot add padding after icmphdr (when doing "icmp+1") and/or
between icmpEchoData data members. When compiler padded, the old
MAX_PKT4_SZ math stopped working.

Same for ICMPv6.

Support Gopher type "w" (#889)

Known uses and libwww imply that "w" selector is an absolute URI.

Also updated gopher item-type code list and codes documentation.

Cleanup HERE step 1 (#881)

Step 1:
Add script to replace HERE symbol in a way that does
not break branches during the remainder of the transition.

The script is added first as a separate commit for large
branches to merge and run before attempting to merge onto a
cleaned master branch.

The Enforcement commit is provided for review purposes to show
the changes made by the script. It should be removed before
merge.

Step 2:
Master branch has a regular automated Enforcement commit.

Step 3:
Then full removal of the HERE symbol as a third PR followup to
drop it completely.

Fix "make check" -Wsign-compare on arm32 broken by 1a51cf7 (#891)

MemBuf::contentSize() returns mb_size_t which is signed.

Bug 5055: FATAL FwdState::noteDestinationsEnd exception: opening (#877)

The bug was caused by commit 25b0ce4. Other known symptoms are:

    assertion failed: store.cc:1793: "isEmpty()"
    assertion failed: FwdState.cc:501: "serverConnection() == conn"
    assertion failed: FwdState.cc:1037: "!opening()"

This change has several overlapping parts. Unfortunately, merging
individual parts is both difficult and likely to cause crashes.

## Part 1: Bug 5055.

FwdState used to check serverConn to decide whether to open a connection
to forward the request. Since commit 25b0ce4, a nil serverConn pointer
no longer implies that a new connection should be opened: FwdState
helper jobs may already be working on preparing an existing open
connection (e.g., sending a CONNECT request or negotiating encryption).

Bad serverConn checks in both FwdState::noteDestination() and
FwdState::noteDestinationsEnd() methods led to extra connectStart()
calls creating two conflicting concurrent helper jobs.

To fix this, we replaced direct serverConn inspection with a
usingDestination() call which also checks whether we are waiting for a
helper job. Testing that fix exposed another set of bugs: The helper job
pointers or in-job connections were left stale/set after forwarding
failures. The changes described below addressed most of those problems.

## Part 2: Connection establishing helper jobs and their callbacks

A proper fix for Bug 5055 required answering a difficult question: When
should a dying job call its callbacks? We only found one answer which
required cooperation from the job creator and led to the following
rules:

* AsyncJob destructors must not call any callbacks.

* AsyncJob::swanSong() is responsible for async-calling any remaining
  (i.e. set, uncalled, and uncancelled) callbacks.

* AsyncJob::swanSong() is called (only) for started jobs.

* AsyncJob destructing sequence should validate that no callbacks remain
  uncalled for started jobs.

... where an AsyncJob x is considered "started" if AsyncJob::Start(x)
has returned without throwing.

A new JobWait class helps job creators follow these rules while keeping
track on in-progress helper jobs and killing no-longer-needed helpers.

Also fixed very similar bugs in tunnel.cc code.

## Part 3: ConnOpener fixes

1. Many ConnOpener users are written to keep a ConnectionPointer to the
   destination given to ConnOpener. This means that their connection
   magically opens when ConnOpener successfully connects, before
   ConnOpener has a chance to notify the user about the changes. Having
   multiple concurrent connection owners is always dangerous, and the
   user cannot even have a close handler registered for its now-open
   connection. When something happens to ConnOpener or its answer, the
   user job may be in trouble. Now, ConnOpener callers no longer pass
   Connection objects they own, cloning them as needed. That adjustment
   required adjustment 2:

2. Refactored ConnOpener users to stop assuming that the answer contains
   a pointer to their connection object. After adjustment 1 above, it
   does not. HappyConnOpener relied on that assumption quite a bit so we
   had to refactor to use two custom callback methods instead of one
   with a complicated if-statement distinguishing prime from spare
   attempts. This refactoring is an overall improvement because it
   simplifies the code. Other ConnOpener users just needed to remove a
   few no longer valid paranoid assertions/Musts.

3. ConnOpener users were forced to remember to close params.conn when
   processing negative answers. Some, naturally, forgot, triggering
   warnings about orphaned connections (e.g., Ident and TcpLogger).
   ConnOpener now closes its open connection before sending a negative
   answer.

4. ConnOpener would trigger orphan connection warnings if the job ended
   after opening the connection but without supplying the connection to
   the requestor (e.g., because the requestor has gone away). Now,
   ConnOpener explicitly closes its open connection if it has not been
   sent to the requestor.

Also fixed Comm::ConnOpener::cleanFd() debugging that was incorrectly
saying that the method closes the temporary descriptor.

Also fixed ConnOpener callback's syncWithComm(): The stale
CommConnectCbParams override was testing unused (i.e. always negative)
CommConnectCbParams::fd and was trying to cancel the callback that most
(possibly all) recipients rely on: ConnOpener users expect a negative
answer rather than no answer at all.

Also, after comparing the needs of two old/existing and a temporary
added ("clone everything") Connection cloning method callers, we decided
there is no need to maintain three different methods. All existing
callers should be fine with a single method because none of them suffers
from "extra" copying of members that others need. Right now, the new
cloneProfile() method copies everything except FD and a few
special-purpose members (with documented reasons for not copying).

Also added Comm::Connection::cloneDestinationDetails() debugging to
simplify tracking dependencies between half-baked Connection objects
carrying destination/flags/other metadata and open Connection objects
created by ConnOpener using that metadata (which are later delivered to
ConnOpener users and, in some cases, replace those half-baked
connections mentioned earlier. Long-term, we need to find a better way
to express these and other Connection states/stages than comments and
debugging messages.

## Part 4: Comm::Connection closure callbacks

We improved many closure callbacks to make sure (to the extent possible)
that Connection and other objects are in sync with Comm. There are lots
of small bugs, inconsistencies, and other problems in Connection closure
handlers. It is not clear whether any of those problems could result in
serious runtime errors or leaks. In theory, the rest of the code could
neutralize their negative side effects. However, even in that case, it
would only be a matter of time before the next bug bites us due to stale
Connection::fd and such. These changes themselves carry elevated risk,
but they get us closer to reliable code as far as Connection maintenance
is concerned. Without them, we will keep chasing deadly side effects of
poorly implemented closure callbacks.

Long-term, all these manual efforts to keep things in sync should become
unnecessary with the introduction of appropriate Connection ownership
APIs that automatically maintain the corresponding environments (TODO).

## Part 5: Other notable improvements in the adjusted code

Improved Security::PeerConnector::serverConn and
Http::Tunneler::connection management, especially when sending negative
answers. When sending a negative answer, we would set answer().conn to
an open connection, async-send that answer, and then hurry to close the
connection using our pointer to the shared Connection object. If
everything went according to the plan, the recipient would get a non-nil
but closed Connection object. Now, a negative answer simply means no
connection at all. Same for a tunneled answer.

Refactored ICAP connection-establishing code to to delay Connection
ownership until the ICAP connection is fully ready. This change
addresses primary Connection ownership concerns (as they apply to this
ICAP code) except orphaning of the temporary Connection object by helper
job start exceptions (now an explicit XXX). For example, the transaction
no longer shares a Connection object with ConnOpener and
IcapPeerConnector jobs.

Probably fixed a bug where PeerConnector::negotiate() assumed that
sslFinalized() does not return true after callBack(). It may (e.g., when
CertValidationHelper::Submit() throws). Same for
PeekingPeerConnector::checkForPeekAndSpliceMatched().

Fixed FwdState::advanceDestination() bug that did not save
ERR_GATEWAY_FAILURE details and "lost" the address of that failed
destination, making it unavailable to future retries (if any).

Polished PeerPoolMgr, Ident, and Gopher code to be able to fix similar
job callback and connection management issues.

Polished AsyncJob::Start() API. Start() used to return a job pointer,
but that was a bad idea:

* It implies that a failed Start() will return a nil pointer, and that
  the caller should check the result. Neither is true.

* It encourages callers to dereference the returned pointer to further
  adjust the job. That technically works (today) but violates the rules
  of communicating with an async job. The Start() method is the boundary
  after which the job is deemed asynchronous.

Also removed old "and wait for..." post-Start() comments because the
code itself became clear enough, and those comments were becoming
increasingly stale (because they duplicated the callback names above
them).

Fix Tunneler and PeerConnector handling of last-resort callback
requirements. Events like handleStopRequest() and callException() stop
the job but should not be reported as a BUG (e.g., it would be up to the
callException() to decide how to report the caught exception). There
might (or there will) be other, similar cases where the job is stopped
prematurely for some non-BUG reason beyond swanSong() knowledge. The
existence of non-bug cases does not mean there could be no bugs worth
reporting here, but until they can be identified more reliably than all
these benign/irrelevant cases, reporting no BUGs is a (much) lesser
evil.

TODO: Revise AsyncJob::doneAll(). Many of its overrides are written to
check for both positive (i.e. mission accomplished) and negative (i.e.
mission cancelled or cannot be accomplished) conditions, but the latter
is usually unnecessary, especially after we added handleStopRequest()
API to properly support external job cancellation events. Many doneAll()
overrides can probably be greatly simplified.

Reduce dependencies of testConfigParser (#888)

Move several globals used by the legacy parse logic into cache_cf.cc
and remove unnecessary objects linked to testConfigParser.

Cleanup dependencies for testEvent unit test (#886)

Many objects linked to the testEvent unit test were there for
legacy reasons and no longer needed.

Converting Event::dump() to Packable interface additionally
removes all dependency on the Store and StoreEntry API. Vastly
simplifying the complexity of the test setup and clarifying
(lack of) code coverage by this test.

Fix build on OpenBSD (#885)

On OpenBSD 6.9, SOL_SOCKET is defined in sys/socket.h.

ext_lm_group_acl: Improved username handling (#884)

Ensure that NTDomain is terminated even if UserName is long.
It happened to be terminated in the current code, but only by accident.

Bug 4922: Improve ftp://... filename extraction (#879)

Symptoms include: Unexpected ERR_FTP_NOT_FOUND/FTP_REPLY_CODE=550
outcome for GET ftp://... requests with URLs containing `;type=...`.

Broken since commit 51b5dcf.

Activate extra compiler checks (#667)

If the compiler supports it, add the `-Wextra` compiler flag to enable
extra checks. Do not enable extra checks with GCC v4 because that old
GCC version reports too many false positives.

Explicitly disable those `-Wextra` checks that offer few insights while
creating a lot of noise which cannot be easily avoided.

Fix warnings reported by GCC or clang with the new settings.

Rework syslog-related code in debug.cc, making it clearer and with less
dependencies on some syslog levels.

Support "file" syntax for 'squid_error' and 'has' ACL parameters (#874)

The 'squid_error' and 'has' are the only ACLs that do not support
loading their parameters from a file using the "path/filename" syntax.
We see no justification for this exception, and it stands in the way of
(unrelated) configuration improvements we are working on (that will,
among other things, prevent such accidents from happening again).

This change causes no upgrade problems because it does not change
handling of previously accepted configurations. It only expands the set
of acceptable configurations, better matching documentation and admin
expectations with regard to universal "path/filename" syntax support.

Fix TCP keepalive (#853)

Setting TCP keep-alive flags at accept(2) time resolves issues with
client sockets timing out while waiting for the ::Server handler to run.

Also resolves a bug with FTP DATA connections not having keep-alive set.
These connections would truncate objects if the data transfer connection
paused for too long and became timed out by the network routing system.

Fix doc/debug-messages.dox generation (#871)

cache_log_message ID 62 was missing from doc/debug-messages.dox because
the code generating that file did not recognize debugs() statements with
Critical() or Important() macro inside a conditional operator.

Also fixed a related sed regex: sed does not support non-greedy matches.

No runtime functionality changes.

Prep for 5.1 (#868)

documentation only.

Remove legacy context-based debugging in favor of CodeContext (#866)

Added in 1998, context-based debugging has been neglected and

- only covers two relatively small contexts
- unsafe in the presence of exceptions
- produces noise (e.g., "ctx: exit level 0" messages)
- delayed "ctx: exit" messages confuse admins
- uses deprecated urlXXX() API
- difficult to extend to more contexts without performance overheads
- usually provides less info in fewer contexts (than CodeContext)

Cleanup macros in src/defines.h (#860)

Reduce compile unit dependencies on src/defines.h by moving some
src/defines.h macros to their most-relevant header.

Also remove all src/defines.h macros known to be unused.

Allow sending "squid -k ..." signals to PID 1 (#863)

Inside a container, Squid usually runs with PID=1. Since "squid -k"
feature was added in 1996 (commit 7690e8e), Squid refused to signal a
PID=1 instance. We do not know why, and it could have been an accident.
We cannot find a good reason to keep prohibiting such signalling.