ankor2023 [Tue, 5 Mar 2024 11:13:19 +0000 (11:13 +0000)]
negotiate_kerberos_auth: Support Kerberos PAC-ResourceGroups (#1597)
Parse the ResourceGroupIds pac-data structure to have information
about the user's membership in AD Domain Local groups.
Previously, the helper obtained user groups information only from
GroupIds and ExtraSids pac-data structures (of the
KERB_VALIDATION_INFO structure).
The patch extends the functionality of the helper.
Now it additionally parses the ResourceGroupIds pac-data structure,
where Domain Local AD-group rids are located.
It appends these groups to the list generated by parsing
GroupIds and ExtraSids.
No changes in existing helper deployments are required.
The new parsing functions are similar to those already used for
parsing GroupIds and ExtraSids.
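The appending step described above can be sketched as follows. This is a simplified illustration, not the helper's code: the GroupMembership struct, the appendResourceGroups() name, and the textual "S-...-RID" output form are assumptions. It only shows the idea that each resource group RID is combined with the resource group domain SID and added after the groups already collected from GroupIds and ExtraSids.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one ResourceGroupIds entry
// (a GROUP_MEMBERSHIP record in MS-PAC terms).
struct GroupMembership {
    uint32_t relativeId; // the Domain Local group RID
    uint32_t attributes; // SE_GROUP_* flags (ignored in this sketch)
};

// Appends "<domainSid>-<rid>" strings for every resource group to the
// list already built from GroupIds and ExtraSids (existing entries are kept).
void appendResourceGroups(const std::string &resourceGroupDomainSid,
                          const std::vector<GroupMembership> &resourceGroupIds,
                          std::vector<std::string> &groups)
{
    for (const auto &g : resourceGroupIds)
        groups.push_back(resourceGroupDomainSid + "-" + std::to_string(g.relativeId));
}
```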
Alex Rousskov [Fri, 1 Mar 2024 22:20:20 +0000 (22:20 +0000)]
Bug 5069: Keep listening after getsockname() error (#1713)
ERROR: Stopped accepting connections:
error: getsockname() failed to locate local-IP on ...
In many cases, these failures are intermittent client-triggered errors
(e.g., client shut down the accepted socket); Squid will successfully
accept other connections and, hence, should keep listening for them.
Store is an essential service, used by a lot of Squid code. As was
already established in 2022 commit 23b7963, this service should be
available during shutdown. That commit correctly removed explicit Store
service termination, but missed the fact that the reference-counting
TheRoot pointer (that provides access to the Store Controller singleton)
gets _automatically_ destroyed during C++ cleanup. This change removes
TheRoot reference counting, making that Controller singleton immortal.
Squid asserted when exiting with active entries in the shared memory
cache because TheRoot destruction leads to Controller destruction,
Controller destructor cleans up cache index, and that cleanup code may
result in calls to Store::Root() that dereferences destroyed TheRoot.
These assertions were seen when Squid was shut down cleanly (e.g., using
SIGTERM) and when a kid process was exiting due to a fatal() error.
Making Store::Controller singleton immortal means that the class
destructor is never called. Fortunately, the destructor did nothing
particularly useful; Store flushing is performed by Controller::sync()
which is explicitly called during early stages of clean shutdown. The
now-unused destructor code was removed: Implementing this destructor in
a meaningful way (while avoiding accessing a being-destructed global
Store!) requires heroic efforts (which would be wasted since the
destructor is never actually called).
Also made Store Controller singleton available on the first use,
complying with Store's "essential, widely used service" status and
removing the need for an explicit Store::Init(void) call.
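A common C++ idiom for an "immortal", created-on-first-use singleton is a function-local static pointer to a heap object that is never deleted. The sketch below uses a hypothetical stand-in Controller class and Root() function; it illustrates the idiom the text describes and is not claimed to be Squid's exact code.

```cpp
#include <cassert>

// Hypothetical stand-in for Store::Controller.
class Controller {
public:
    void sync() { ++syncCalls; } // explicit flushing, called early during clean shutdown
    int syncCalls = 0;
};

// Created on first use and intentionally never deleted: no destructor
// runs during C++ static cleanup, so late callers still get a valid object.
Controller &Root()
{
    static Controller *const root = new Controller();
    return *root;
}
```

Because the pointer itself is never destroyed or reset, code running during static destruction can still call Root() safely; the trade-off is that any cleanup must happen in explicit calls such as sync().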
Also removed unit tests that required support for dynamic replacement of
Store Controller singleton. The minuscule value of the removed tests did
not justify the costs of supporting a replaceable Store Controller.
Also removed Store::FreeMemory(). Its implementation contradicted Store
status as an essential service. The function was only used by unit
tests, and its removal addresses the corresponding commit 23b7963 TODO.
POSIX.1-2001 marks vfork(2) OBSOLETE.
POSIX.1-2008 removes the specification of vfork(2).
MacOS system headers declare vfork(2) as deprecated.
We only use vfork(2) in negotiate_wrapper, where it is not necessary.
On MacOS / Homebrew, libcppunit is not part of the system path.
pkg-config will report it; use it.
This change also ensures squid honours the promise made
in configure's help text to let administrators specify a
LIBCPPUNIT_LIBS environment variable to
override automatic detection.
Alex Rousskov [Fri, 23 Feb 2024 04:26:38 +0000 (04:26 +0000)]
Fix marking of problematic cached IP addresses (#1691)
Since inception in 2017 commit fd9c47d, Dns::CachedIps::have() always
returned position zero after finding a matching IP address (at zero or
positive position). The bug affected two callers:
* markAsBad() always marked the first stored address (as bad);
* forgetMarking() always cleared the first stored address marking.
Buggy markings led to Squid sometimes not attempting to use a working
address (e.g., IPv4) while using a known problematic one (e.g., IPv6).
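The essence of the fix is that a lookup helper must report the position where it actually found the match. The sketch below uses a std::vector and simplified, hypothetical CachedIp/have()/markAsBad() stand-ins rather than the real Dns::CachedIps types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a cached IP entry.
struct CachedIp {
    std::string ip;
    bool bad = false;
};

// Returns true if ip is cached and, if so, stores the index where it was
// found. The bug described above amounted to always reporting zero here.
bool have(const std::vector<CachedIp> &ips, const std::string &ip, std::size_t *position)
{
    for (std::size_t i = 0; i < ips.size(); ++i) {
        if (ips[i].ip == ip) {
            if (position)
                *position = i; // report the actual match position
            return true;
        }
    }
    return false;
}

void markAsBad(std::vector<CachedIp> &ips, const std::string &ip)
{
    std::size_t pos = 0;
    if (have(ips, ip, &pos))
        ips[pos].bad = true; // marks the matched entry, not ips[0]
}
```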
Maintenance: Remove Red Hat Linux workarounds predating RHEL (#1698)
The last Red Hat Linux release went EOL in 2004, replaced by Red Hat
Enterprise Linux and Fedora Linux. We no longer support Red Hat Linux
releases and expect that these hacks are no longer necessary in
supported environments.
squid-conf-tests: Ignore tests with mismatching autoconf macro (#1648)
The 'skip-unless-autoconf-defines' directive should be able to
distinguish autoconf macro values such as '0' (not defined) from '1'
(defined). For example, the --disable-ipv6 configuration option
defines USE_IPV6 as '0'. This change allows activation of IPv6 tests,
addressing a TODO.
Alex Rousskov [Sun, 18 Feb 2024 00:45:41 +0000 (00:45 +0000)]
Fix debugging for responses that Expire at check time (#1683)
Since 2000 commit 65fa5c6, our level-3 debugging misleadingly reported
Expires as being less than the check time when the two times were identical. The
actual checked conditions are correct: Roughly speaking, the response
with Expires value T is considered expired at that time T.
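Roughly, the checked condition can be expressed as the comparison below. The expiredAt() name is hypothetical, and the real refresh logic considers more inputs; the sketch only captures the boundary case the text discusses, where Expires equals the check time.

```cpp
#include <cassert>
#include <ctime>

// Sketch of the boundary rule described above: a response whose Expires
// value is T counts as expired at check time T (not only after T).
bool expiredAt(std::time_t expires, std::time_t checkTime)
{
    return expires <= checkTime;
}
```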
Also dropped extra (and inconsistent) trailing space on debugs() lines.
This space was added by the same 2000 commit, probably accidentally.
Alex Rousskov [Fri, 16 Feb 2024 13:02:54 +0000 (13:02 +0000)]
Maintenance: Removed unused bits of Format::FmtConfig code (#1681)
This code was probably accidentally copied from Log::LogConfig when
FmtConfig was created in 2011 commit 31971e6. It mentions logfile_format
directive that never existed. The code also duplicates a dangerous
Log::LogConfig snippet (see Bug 5344).
Alex Rousskov [Fri, 16 Feb 2024 04:03:40 +0000 (04:03 +0000)]
Bug 5344: mgr:config segfaults without logformat (#1680)
Since 2011 commit 38e16f9, Log::LogConfig::dumpFormats() dereferenced a
nil `logformats` pointer while reporting a non-existent logformat
configuration (e.g., squid.conf.default): `logformats->dump(e, name)`.
In most environments, that code "worked" because the corresponding
Format::Format::dump() method happens to do nothing if "this" is nil.
However, in some environments, Squid segfaulted.
The code loading a response from the shared memory cache incorrectly
assumed that the being-loaded response could not have been updated by an
HTTP 304 (Not Modified) reply and called adjustableBaseReply() that is
banned for updated responses. The goal of that call was to determine
whether the cached response header has been parsed. That determination
can be made without using a method that is banned for updated responses.
StoreClient and clientReplyContext had very similar checks that used the
right/safe approach because their current code did not need an immediate
access to an "adjustable" response. We have updated all external
psParsed checks to reduce chances that this bug resurfaces.
Alex Rousskov [Thu, 8 Feb 2024 22:03:44 +0000 (22:03 +0000)]
Fix max-stale in default refresh_pattern (#1664)
RefreshPattern constructor must set data fields to honor refresh_pattern
defaults promised in squid.conf.documented. For max-stale, that implies
making max_stale negative. A negative value allows refreshCheck() to use
a max_stale directive value (i.e. Config.maxStale that defaults to 1
week). The buggy constructor set max_stale to 0 instead and, hence,
refreshCheck() ignored max_stale directive when no refresh_pattern rules
were configured.
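The role of the negative sentinel can be illustrated with a small sketch. The RefreshPattern struct and effectiveMaxStale() function here are simplified, hypothetical stand-ins; they show why a default of 0 (instead of a negative value) silently overrides the max_stale directive.

```cpp
#include <cassert>

// Hypothetical simplified stand-in for a refresh_pattern rule.
struct RefreshPattern {
    int max_stale = -1; // negative: "not set by this rule", use the max_stale directive
};

// Chooses the effective max-stale limit (in seconds).
int effectiveMaxStale(const RefreshPattern &rule, int configMaxStale)
{
    return rule.max_stale >= 0 ? rule.max_stale : configMaxStale;
}
```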
The fixed bug did not affect Squids configured using explicit
refresh_pattern rules because those rules are handled by
parse_refreshpattern() which sets max_stale to -1 by default. Our
squid.conf.default does have explicit refresh_pattern rules.
Alexey [Mon, 29 Jan 2024 19:47:41 +0000 (19:47 +0000)]
Fix memory leak in ssl/gadgets/mimicAuthorityKeyId() (#1651)
An unnecessary std::unique_ptr::release() call prevented temporary
extOct string from being automatically deallocated. The leak usually
happened when SslBump mimicked certificates with an Authority Key
Identifier extension. The leak was added in 2016 commit 5f1318b.
Alex Rousskov [Sun, 28 Jan 2024 09:51:37 +0000 (09:51 +0000)]
Remove AclMatchedName from ACL::ParseAclLine() (#1642)
ACL parsing code needs to know the aclname parameter of the being-parsed
acl directive to report various errors. Most admins recognize their ACLs
by these names so reporting aclnames improves UX. Since before 1999
commit b6a2f15, Squid used a "temporary" and "ugly" trick that supplied
aclname via the unrelated global variable called AclMatchedName (which
has its own set of problems). Some ACL parsing code used AclMatchedName
in cache.log messages, but most ACL-related problems were still reported
without that information.
Passing ACL::name to each parsing-related function via an extra
parameter is not just ugly but impractical because some of the low-level
parsing functions should not really know about ACLs. Instead, we reuse
existing CodeContext mechanism to report parsing context information (in
this case -- aclname).
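The general pattern of reporting parsing context without threading an extra parameter through every function can be sketched with a scoped (RAII) context holder. Everything below (CurrentContext, ContextSaver, reportError()) is hypothetical and much simpler than Squid's CodeContext; it only illustrates how a low-level helper can mention the aclname without knowing about ACLs.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical global "current parsing context" (e.g., the aclname).
static std::string CurrentContext;

// RAII helper: sets the context for the duration of a scope and restores
// the previous one afterwards, so nested parsers need no extra parameters.
class ContextSaver {
public:
    explicit ContextSaver(std::string ctx):
        saved_(std::exchange(CurrentContext, std::move(ctx))) {}
    ~ContextSaver() { CurrentContext = std::move(saved_); }
    ContextSaver(const ContextSaver &) = delete;
    ContextSaver &operator=(const ContextSaver &) = delete;

private:
    std::string saved_;
};

// A low-level parsing helper that knows nothing about ACLs yet can still
// report which directive it was parsing.
std::string reportError(const std::string &problem)
{
    return CurrentContext.empty() ? problem : CurrentContext + ": " + problem;
}
```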
We plan to enhance parsing context to cover directives other than "acl"
(without modifying every directive parser, of course), but this first
small step significantly reduces configuration code exposure to
AclMatchedName, unblocking ACL-related smooth reconfiguration
improvements.
Amos Jeffries [Tue, 23 Jan 2024 22:03:43 +0000 (22:03 +0000)]
Translation: Drop deprecated language links (#1643)
Support for full-name languages has been deprecated since
Squid-3.1; it is long overdue to remove these symlinks and
simplify the default install footprint.
Alexey [Sun, 21 Jan 2024 16:24:57 +0000 (16:24 +0000)]
NTLM/Negotiate: Fix crash on bad helper TT responses (#1645)
A helper lookup may be made without a client HTTP request
(stored in lm_request->request), but in Helper::TT cases
lm_request->request was dereferenced without any checks.
Avoid file name conflict with Windows WinSvc.h (#1637)
There is a conflict between our WinSvc.h header file and
Windows system include <winsvc.h>, which is one of the
factors preventing our ability to build native windows helpers.
Alex Rousskov [Thu, 11 Jan 2024 07:10:15 +0000 (07:10 +0000)]
Fix dupe handling in Splay ACLs: src, dst, http_status, etc. (#1632)
Squid was dangerously mishandling duplicate[^1] values in ACLs with
Splay tree storage: src, dst, localip, http_status, dstdomain,
srcdomain, and ssl::server_name. These problems were all rooted in the
same old code but had diverged across two ACL groups, as detailed in the
corresponding sections below. Fortunately, all known problems were
accompanied by Squid cache.log WARNINGs: Squid configurations that do
not emit cited or similar warnings are not affected by these bugs.
[^1]: Squid Splay trees can only store unique values. Uniqueness is
defined in terms of a configurable comparison function that returns zero
when the two values are "not unique" (i.e. are considered to be
"duplicated" or "equal" in that Splay context). Those two "equal" values
may differ a lot in other contexts! For example, the following two
status code ranges are equal from acl_httpstatus_data::compare() point
of view, but are obviously very different from the admin's and http_access
rules' points of view: 200-200 and 200-299.
### Group 1: src, dst, localip, and http_status ACLs
These ACLs ignored (i.e. never matched) some configured ACL values in
problematic use cases. They also gave wrong "remove X" advice and
incorrectly classified values as being subranges or subsets:
Processing: acl testA http_status 200 200-300
WARNING: '200-300' is a subrange of '200'
WARNING: because of this '200-300' is ignored ...
Processing: acl testB http_status 300-400 200-300
WARNING: '200-300' is a subrange of '300-400'
WARNING: because of this '200-300' is ignored ...
WARNING: You should probably remove '300-400' from the ACL...
Processing: acl testC src 10.0.0.1 10.0.0.0-10.0.0.255
WARNING: (B) 10.0.0.1 is a subnetwork of (A) 10.0.0.0-10.0.0.255
WARNING: because of this 10.0.0.0-10.0.0.255 is ignored...
Processing: acl testD src 10.0.0.0-10.0.0.1 10.0.0.1-10.0.0.255
WARNING: (A) 10.0.0.1-10.0.0.255 is a subnetwork of (B) 10.0.0.0-...
WARNING: because of this 10.0.0.1-10.0.0.255 is ignored...
WARNING: You should probably remove 10.0.0.1-10.0.0.255 from the ACL
Since 2002 commit 96da6e8, IP-based ACLs like src, dst, and localip
eliminate duplicates[^1] among configured ACL values. That elimination
code was buggy since inception. Those bugs were later duplicated in
http_status code (2005 commit a0ec9f6). This change fixes those bugs.
To correctly eliminate duplicates when facing two (fully or partially)
overlapping ranges -- new A and old B -- we must pick the right
corrective action depending on the kind of overlap:
* A is a subrange of B: Ignore fully duplicated new range A. Keep B.
* B is a subrange of A: Remove fully duplicated old range B. Add A.
* A partially overlaps with B: Add a union of A and B. Remove B.
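The three corrective actions above can be sketched with inclusive integer ranges. This sketch stores ranges in a std::vector instead of a Splay tree, and the Range type and mergeRange() function are hypothetical; it is not Acl::SplayInserter::Merge() itself, only an illustration of the same duplicate-resolution rules.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inclusive range such as an http_status range "200-299".
struct Range {
    int lo;
    int hi;
};

static bool overlaps(const Range &a, const Range &b) { return a.lo <= b.hi && b.lo <= a.hi; }
static bool contains(const Range &outer, const Range &inner) { return outer.lo <= inner.lo && inner.hi <= outer.hi; }

// Inserts new range a into a set of pairwise non-overlapping ranges.
void mergeRange(std::vector<Range> &ranges, Range a)
{
    for (auto it = ranges.begin(); it != ranges.end();) {
        const Range b = *it;
        if (!overlaps(a, b)) {
            ++it;
            continue;
        }
        if (contains(b, a))
            return; // A is a subrange of B: ignore A, keep B
        // B is a subrange of A, or A partially overlaps B: remove B and
        // continue with the union (which equals A in the former case)
        a = Range{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
        it = ranges.erase(it);
    }
    ranges.push_back(a);
}
```

Because overlapping ranges have a contiguous union, growing A never makes it overlap a range that was already skipped, so a single pass suffices.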
Both acl_httpstatus_data::compare() and acl_ip_data::NetworkCompare()
mishandled the last two cases because these functions effectively
implemented the following buggy logic instead:
- in all three cases: Ignore new range A. Keep B.
Their WARNINGs also suggested wrong corrective actions in two mishandled
cases (see the last WARNING lines in testB and testD output above).
### Group 2: dstdomain, srcdomain, and ssl::server_name ACLs
Processing: acl testE dstdomain .example.com example.com
ERROR: '.example.com' is a subdomain of 'example.com'
ERROR: You need to remove '.example.com' from the ACL named 'testE'
2002 commit 96da6e8 bugs mentioned in Group 1 section stopped affecting
domain-based ACLs (i.e. dstdomain, srcdomain, and ssl::server_name) in
2011 commit 14e563c that adjusted aclDomainCompare() to self_destruct()
in problematic cases. However, that adjustment emitted wrong advice and
incorrect subdomain classification in cases where strlen() checks do not
work for determining which of the two configured ACL values should be
removed (see testE ERRORs above).
This change improves that handling by replacing the call to
self_destruct() with proper duplicate[^1] resolution code (and fixing
cache.log messages). We (now) support duplicate values instead of
rejecting configurations containing them because duplicate values do not
invalidate an ACL -- an ACL with duplicates could match as expected. It
may be difficult for some admins to avoid duplication, especially when
ACL values come from multiple sources. Squid should continue to warn
about duplicates (because they waste resources and may indicate a deeper
problem), but killing Squid or otherwise rejecting ACLs with duplicates
is bad UX.
N.B. Domain-based ACLs use sets of values rather than "ranges" discussed
in Group 1 section, but domain sets follow the same basic principles.
### All of the above seven ACLs
The problems in both ACL groups were fixed by factoring out "insert ACL
values while correctly handling duplicates" algorithm (see the three
bullets in Group 1 section) into Acl::SplayInserter::Merge() function.
Without duplicates, the new ACL value insertion code has the same cost
as the old one. The vast majority of duplicate cases incur a constant
additional overhead because Splay tree dynamic reorganization makes the
right tree nodes immediately available. A few duplicate cases double the
number of comparisons during tree searches (when Splay reorganization
cannot cope with a particularly "bad" ACL value order), but since these
"bad" cases ought to be very rare, and since all problematic cases are
accompanied by WARNINGs, this extra cost is deemed acceptable.
This change also fixes memory leaks associated with ignored ACL values.
Also added test-suite/test-squid-conf.sh support for matching multiple
stderr messages. Without this, testing a variety of closely-related
cases requires creating lots of test configuration files (two per test
case), increasing noise and, more importantly, making it difficult to
handle related test cases as one coherent collection. The new
"expect-messages" feature is barely sufficient for testing these
changes, but we are now at (or perhaps even beyond) the limit of what
can be reasonably done using shell scripts and test instruction files:
The next step is to convert instruction files (and likely some test
cases themselves!) to scripts written in Perl or a better language.
Amos Jeffries [Sun, 7 Jan 2024 18:37:09 +0000 (18:37 +0000)]
Docs: update doxygen support (#1622)
Doxygen v1.7.5 changed HTML_HEADER requirements: We now need to add an
open div element to the header that Doxygen then closes for us. This
change also fixes the corresponding generated content on Squid website.
Also, disable IDL_PROPERTY_SUPPORT which has been causing some errors
with output getter/setter matching in the generated output.
Bug 5329: cbdata.cc:276 "c->locks > 0" assertion on reconfigure (#1625)
Recent commit 0f78379 correctly removed an excessive cbdata lock of a
CachePeer::digest object in peerDigestCreate() but accidentally lost
another digest lock while inlining peerDigestCreate(). The resulting
excessive unlocking triggered reconfiguration assertions.
This change restores the lost lock as a short-term fix.
Long-term, CachePeer code should be fixed to become an exclusive[^1]
PeerDigest owner (i.e. creating and deleting its cbdata-protected digest
object without locking, unlocking, or checking locks). That improvement
is already in the works, but it requires significant code refactoring.
[^1]: Shared PeerDigest ownership (i.e. reference counting instead of
explicit delete and cbdata) does not work well in this context due to
circular references.
Ben Kallus [Mon, 18 Dec 2023 18:43:03 +0000 (18:43 +0000)]
Bug 5119: Null pointer dereference in makeMemNodeDataOffset() (#1623)
UndefinedBehaviorSanitizer: undefined-behavior mem_node.cc:27:26 in
runtime error: member access within null pointer of type 'mem_node'
Since only the address of the data member is computed, a compiler is
likely to perform pointer arithmetic rather than dereference a nullptr,
but it is best to replace this UB with a safe and clearer alternative.
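One standard, well-defined way to compute a data member's offset is the offsetof macro from <cstddef>. The mem_node struct and memNodeDataOffset() function below are hypothetical stand-ins (offsetof is only guaranteed for standard-layout types); whether the actual fix uses offsetof is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout stand-in for mem_node.
struct mem_node {
    std::size_t length;
    char data[4096];
};

// UB version (for contrast, not used):
//   return (std::size_t) &(((mem_node *) nullptr)->data);
// Well-defined alternative: ask the compiler for the member offset.
std::size_t memNodeDataOffset()
{
    return offsetof(mem_node, data);
}
```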
Alex Rousskov [Sun, 17 Dec 2023 14:48:41 +0000 (14:48 +0000)]
Bug 5254, part 1: Do not leak master process' cache.log to kids (#1222)
The fork()ed kids unknowingly inherited cache.log file descriptor and,
hence, prevented the underlying file from being deleted on log rotation.
This change does not fully fix the bug because the file is still being
held open by the master process itself. This change was isolated because
it lacks bad/controversial side effects -- it is a simple step forward.
This fix may also help stop leaking kid's cache.log descriptor to
helpers, but that requires more work -- replacing descriptor duping
trick in ipcCreate() with a proper servicing of helper stderr descriptor
via Comm. For now, each helper process still keeps cache.log alive.
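For background, one general POSIX mechanism for keeping a descriptor out of exec()ed child programs is the FD_CLOEXEC flag. The text does not say which mechanism this change uses, and setCloseOnExec() below is a hypothetical helper; the sketch only shows how the flag is set and checked.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Marks fd close-on-exec so that programs started via exec() do not
// inherit it. Returns false if fcntl() fails (e.g., fd is invalid).
bool setCloseOnExec(int fd)
{
    const int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}
```

Note that FD_CLOEXEC only takes effect at exec() time; a process that merely fork()s still shares the descriptor until it closes it.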
Alex Rousskov [Sun, 17 Dec 2023 02:03:22 +0000 (02:03 +0000)]
Docs: Describe surprising side effects of auth_param basic (#1612)
acl badGuys proxy_auth Bob
http_access deny badGuys
Admins may be surprised that their proxy_auth ACLs do not match users
with logins identical to those listed as proxy_auth ACL values. For
example, a user logged in as "Bob" will not match the above ACL if Basic
authentication is used without an explicit "casesensitive on" setting.
In fact, the above ACL cannot match any user in that environment!
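The surprise can be modeled with a small sketch, assuming that without "casesensitive on" the Basic login is lowercased before proxy_auth ACL values (compared as-is) are checked. The normalizeLogin() and proxyAuthMatches() names are hypothetical; this models the documented behavior rather than reproducing Squid code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Assumption: a case-insensitive setup lowercases the login before ACL checks.
std::string normalizeLogin(std::string login, bool caseSensitive)
{
    if (!caseSensitive)
        std::transform(login.begin(), login.end(), login.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return login;
}

// Exact comparison against a configured proxy_auth ACL value.
bool proxyAuthMatches(const std::string &aclValue, const std::string &login, bool caseSensitive)
{
    return aclValue == normalizeLogin(login, caseSensitive);
}
```

Under these assumptions, an ACL value containing an uppercase letter can never equal a lowercased login, which is why listing "bob" (not "Bob") is the value that matches.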
Amos Jeffries [Wed, 13 Dec 2023 18:51:47 +0000 (18:51 +0000)]
Stop zeroing huge memAllocBuf() buffers (#1592)
memAllocBuf() buffers smaller than 64KB were not zeroed before this
change. Larger buffers (a.k.a. "huge buffers") were zeroed with
xcalloc(). We believe it is safe to stop zeroing those huge buffers
because all known code that allocates huge buffers also allocates
smaller ones for the same purpose. If some of that code relied on
zeroing for years, we would expect to see problems with smaller buffers.
Removing xcalloc() allows removal of mem*Rigid(), memBufStats() and all
string pools API complications.
The cache manager mgr:mem report section for string
statistics is also removed.
We just wanted to remove legacy printf()-like calls from Notes.cc, but
realized that finding the correct replacement for that code is complicated
because some of the calls were broken, and the true meaning or purpose
of the affected annotation reporting methods was elusive. This change
combines several related fixes and improvements detailed below.
### Fix reporting method names and their descriptions
Humans could not easily figure out the difference between Note::dump(),
Note::toString(), Notes::dump(), Notes::toString(), and
NotePairs::toString() methods, especially since Note and Notes classes
had both, implying some important difference. The toString() name is
very generic. The dump() name is used (differently) in Configuration
code and ACL class hierarchy; these classes are used by that code, but
they do not belong to that hierarchy. Bugs and the variety of
annotation-related use cases increased doubts and confusion.
The new task/format-specific names for Note and Notes methods fix this.
### Fix annotate_client and annotate_transaction mgr:config reporting
Multi-name annotations were split across several lines and used an
incorrect name/value separator. The "acl" directive line was followed by
an extra new line.
Each "note" directive was followed by a bogus "(note...)" suffix instead
of ACL names (if any). The "note" directive line was followed by an
extra new line.
### Improve debugging of annotations in helper responses
Alex Rousskov [Sat, 9 Dec 2023 04:46:55 +0000 (04:46 +0000)]
Bug 5274: Successful tunnels logged as TCP_TUNNEL/500 (#1608)
Stop calling retryOrBail() when the tunneled Squid-server connection
(that we have committed to use) closes. Our retryOrBail() is dedicated
to handling errors. Most[^1] serverClosed() calls are _not_ related to
errors because our tunneling code abuses asynchronous connection closure
callbacks for TunnelStateData work termination. Depending on the
transaction details (e.g., TLS interception vs. true CONNECT), calling
retryOrBail() on these no-error code paths may result in retryOrBail()
"catch all other errors" code creating bogus ERR_CANNOT_FORWARD errors.
Most tunneling errors are already detailed, and retryOrBail() does not
have enough information to correctly detail the remaining ones anyway.
Removing this retryOrBail() call selects the arguably lesser evil.
The client-Squid connection closure callback, clientClosed(), already
uses the same logic.
This change does not resurrect Bug 5132 fixed by commit 752fa20 that
added the now-replaced retryOrBail() call to serverClosed(). That commit
fixed the leak by calling deleteThis() (via retryOrBail()). Our
finishWritingAndDelete() call preserves that logic. That commit also
claimed to allow more retries, but that claim was a mistake: To-server
closure callback registration (e.g. commitToServer()) bans retries.
[^1]: The fact that serverClosed() is called for both successful and
problematic outcomes prevents TunnelStateData from properly handling
certain (rare) errors. We tried to fix that as well, but the changes
quickly snowballed, so we left a few XXXs instead.
Alex Rousskov [Wed, 6 Dec 2023 05:25:53 +0000 (05:25 +0000)]
Docs: Describe more ACL effects on (re)authentication (#1611)
Existing documentation was
* silent about %ul, max_user_ip, ident, and ident_regex side effects;
* silent about adapted_http_access context triggering authentication;
* vague about (re)authentication triggers.