Packing reply headers into StoreEntry/ShmWriter directly means numerous
tiny append() calls which involve expensive mem_node/slice searches. For
example, every two-byte ": " and CRLF delimiter is packed separately.
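A minimal sketch of the batching idea. It assumes a hypothetical SlowSink type that stands in for StoreEntry/ShmWriter and simply counts append() calls. Header fields are packed into a local std::string first and then handed to the sink in a single append(). This illustrates the approach; it is not Squid's actual packing code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for StoreEntry/ShmWriter: every append() is assumed
// to be expensive (e.g., it must locate the right mem_node or slice).
struct SlowSink {
    std::string stored;
    std::size_t appendCalls = 0;
    void append(const char *buf, std::size_t len) {
        ++appendCalls;
        stored.append(buf, len);
    }
};

using Fields = std::vector<std::pair<std::string, std::string>>;

// Naive packing: one append() per name, ": " delimiter, value, and CRLF.
void packDirect(SlowSink &sink, const Fields &fields) {
    for (const auto &f : fields) {
        sink.append(f.first.data(), f.first.size());
        sink.append(": ", 2);
        sink.append(f.second.data(), f.second.size());
        sink.append("\r\n", 2);
    }
}

// Buffered packing: assemble the whole header block locally, then append once.
void packBuffered(SlowSink &sink, const Fields &fields) {
    std::string buf;
    for (const auto &f : fields) {
        buf += f.first;
        buf += ": ";
        buf += f.second;
        buf += "\r\n";
    }
    sink.append(buf.data(), buf.size());
}
```

For two fields, packDirect() makes 8 append() calls and packBuffered() makes 1. Both produce identical bytes.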
Allow use of Samba TrivialDB instead of outdated Berkeley DB in
the session helper.
Require TrivialDB support for use of the time_quota helper.
libdb v1.85 is no longer supported by distributors and
upgrading to v5 only to deprecate use does not seem to be
worthwhile.
When dealing with an HTTP request header that Squid can parse but that
contains a request URI exceeding the 8K length limit, Squid should log the
URL (prefix) instead of a dash. Logging the URL helps with triaging
these unusual requests. The older %ru (LFT_REQUEST_URI) was already
logging these huge URLs, but %>ru (LFT_CLIENT_REQ_URI) was logging a
dash. Now both log the URL (or its prefix).
As a side effect, %>ru now also logs error:request-too-large,
error:transaction-end-before-headers and other Squid-specific
pseudo-URLs, as appropriate.
Also refactored request- and URI-recording code to reduce chances of
similar inconsistencies reappearing in the future.
Also, honor strip_query_terms in %ru for large URLs. Not stripping the query
string in %ru was a security problem.
Also fixed a bug with "redirected" flag calculation in
ClientHttpRequest::handleAdaptedHeader(). In general, http->url and
request->url should not be compared directly, because the latter always
passes through uri_whitespace cleanup, while the former does not.
Also fixed a bug with possibly wrong %ru after redirection:
ClientHttpRequest::log_uri was not updated in this case.
Also initialize AccessLogEntry::request and AccessLogEntry::notes ASAP.
Before this change, these fields were initialized in
ClientHttpRequest::doCallouts(). It is better to initialize them just
after the request object is created so that ACLs running before
doCallouts() can have them at hand. There are at least three such
ACL-driven directives, including force_request_body_continuation and
spoof_client_ip.
Also synced %ru and %>ru documentation with the current code.
A nil pointer is the proper way to indicate a missing heap-allocated
object in C++. Removing NullStoreEntry simplifies and optimizes code.
This removal also brings us one step closer to removing all virtual
methods from StoreEntry, further optimizing code and even saving 8 bytes
per non-shared memory cache entry on most platforms.
Also un-virtualized a few StoreEntry-only methods to optimize their
callers.
Optimization: Fewer memory (re)allocations for HTTP headers (#239)
Tests revealed multiple fresh memory allocations/deallocations while
storing small (few fields) HTTP headers. Many popular sites use larger
headers (15-30 fields). To avoid expensive memory operations:
1. Pool all std::vector<HttpHeaderEntry*> memory allocations.
2. Prevent reallocations (for HTTP headers with fewer than 32 fields).
This optimization deals with storing the header index. It does not
affect how individual header fields are stored.
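A sketch of the no-reallocation part only. It assumes a hypothetical HeaderIndex wrapper and a placeholder HeaderEntry type, and it leaves out the pooling allocator. Reserving 32 slots up front lets a header index with up to 32 fields grow without any reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HeaderEntry { int id; }; // hypothetical placeholder for HttpHeaderEntry

// Hypothetical header index: the vector stores pointers only; the entries
// themselves live elsewhere. reserve() happens once, at construction.
class HeaderIndex {
public:
    static constexpr std::size_t InitialCapacity = 32;
    HeaderIndex() { entries.reserve(InitialCapacity); }
    void add(HeaderEntry *e) { entries.push_back(e); }
    std::size_t capacity() const { return entries.capacity(); }
    std::size_t size() const { return entries.size(); }
private:
    std::vector<HeaderEntry *> entries;
};
```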
Logging client "handshake" bytes is useful in at least two contexts:
* Runtime traffic bypass and bumping/splicing decisions. Identifying
popular clients like Skype for Business (that uses a TLS handshake but
then may not speak TLS) is critical for handling their traffic
correctly. Squid does not have enough ACLs to interrogate most TLS
handshake aspects. Adding more ACLs may still be a good idea, but
initial sketches for SfB handshakes showed rather complex
ACLs/configurations, _and_ no reasonable ACLs would be able to handle
non-TLS handshakes. An external ACL receiving the handshake is in a
much better position to analyze/fingerprint it according to custom
admin needs.
* A logged handshake can be used to analyze new/unusual traffic or even
trigger security-related alarms.
The current support is limited to cases where Squid was already saving the
handshake for other reasons. With enough demand, this initial support can be
extended to all protocols and port configurations.
Alex Rousskov [Fri, 6 Jul 2018 23:58:22 +0000 (23:58 +0000)]
Bug 4865: Unexpected exception on startup in TypedMsgHdr::sync() (#242)
Commit b56b37c broke Ipc::TypedMsgHdr copying by incorrectly assuming
that sync() sets name and ios members. The sync() method sets _other_
(low level) members based on name and ios.
Optimization: Fewer epoll(2) system calls when closing a socket (#235)
Squid was calling epoll(2) twice to clear a socket interest. One call is
more than enough: Technically, close(2) is supposed to clear epoll(2)
registration for us, but I did not risk relying on that.
In other environments, socket interest changes are pooled together
before being submitted to the OS, so Squid was doing a bit of extra
work, but not making (many) extra system calls AFAICT.
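A Linux-only sketch of clearing an epoll interest with a single epoll_ctl(EPOLL_CTL_DEL) call. The registerThenClear() helper is hypothetical and is not Squid's Comm code. The sketch also shows why relying on close(2) alone is risky. The kernel removes the registration only after every descriptor that refers to the same open file description has been closed.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: register a read interest, then clear it with exactly
// one epoll_ctl(EPOLL_CTL_DEL) call, without waiting for close(2).
bool registerThenClear(int epfd, int fd) {
    struct epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0)
        return false;
    // A non-null event pointer keeps very old kernels (before 2.6.9) happy.
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &ev) == 0;
}
```

After the single DEL, a second DEL on the same descriptor fails because the interest is already gone.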
Also fixed (previously unused) Comm::ResetSelect() on these platforms:
* epoll(2): The old resetting code did not clear our interest AFAICT.
* kqueue(2): The old resetting code made no sense to me at all.
* poll(2): There was no code at all.
* select(Win32): There was no code at all.
Even though Comm::ResetSelect() implementation is now the same for all
platforms, I did not make that code platform-agnostic because it is
possible to optimize it further in platform-specific ways.
Alex Rousskov [Wed, 4 Jul 2018 15:59:26 +0000 (15:59 +0000)]
Documented when helper requests get queued (#238)
I had to change introductory paragraphs in several directives so that
the new documentation can refer to "numberofchildren". I fixed a few
spelling/grammar problems in changed paragraphs and edited them a bit
for consistency, but they need more work.
When an HTTPS or SSL-Bump port is configured without a cert=
parameter, Squid crashes with a segmentation fault. Detect that
occurrence and print the required FATAL error message instead for
these configurations, where cert= is a parameter rather than an
option.
Our project terminology for config settings is:
"parameter"
- a required setting. Print a FATAL error message if missing.
"option"
- an optional setting. Ignored or default value if missing.
GCC-8 enables a lot more warnings related to unsafe coding
practices. The old Squid code contains many risky buffer size
assumptions and implicit assumptions about how the C-string functions
strcat, strncat and snprintf change the buffers they operate on; many
of these can result in output truncation. Because Squid builds with
-Werror, all of these warnings become outright compile failures.
Rather than just enlarging the char* buffers to avoid truncation,
this work seeks to remove the issues permanently by converting the
code to SBuf and updated Squid coding styles.
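A small illustration of the truncation-checking habit these warnings push toward. Both helpers are hypothetical. snprintf() returns the length the complete output would have had, so a return value at or above the buffer size signals truncation. Here std::string stands in for SBuf as a growable buffer that avoids the fixed-size limit altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Fixed buffer: report truncation instead of silently producing a cut-off value.
bool formatInto(char *buf, std::size_t size, const char *host, int port) {
    const int n = std::snprintf(buf, size, "%s:%d", host, port);
    return n >= 0 && static_cast<std::size_t>(n) < size;
}

// Growable buffer (std::string standing in for SBuf): no size limit to exceed.
std::string formatString(const std::string &host, int port) {
    return host + ":" + std::to_string(port);
}
```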
The C++1z compilers (GCC-8 and Clang 4.0) are beginning to warn
about C functions memset/memcpy/memmove being used on class
objects which lack "trivial copy" constructor or assignment
operator - their use is potentially unsafe where anything more
complex than trivial copy/blit is required. A number of classes
in Squid are safely copied or initialized with those functions
for now but again the -Werror makes these hard errors.
Completing the conversion of the affected objects from C to C++ code
avoids both deeply hidden issues and the need to add compiler exceptions
that silence the warnings.
See individual commit messages for details on the particular
changes each one makes.
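A sketch of the kind of C-to-C++ conversion described above, using hypothetical types. memset() remains acceptable for a trivially copyable struct. Once a class gains a non-trivial member such as std::string, GCC-8 flags such calls with -Wclass-memaccess. In that case, default member initializers and the compiler-generated copy operations take over the job of memset().

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// C-style struct: trivially copyable, so memset()/memcpy() are still safe.
struct PlainCounters {
    int hits;
    int misses;
};
static_assert(std::is_trivially_copyable<PlainCounters>::value, "safe for memset");

// C++-style class: the std::string member makes memset()/memcpy() unsafe,
// so it is initialized and reset through ordinary C++ construction.
struct NamedCounters {
    std::string name;
    int hits = 0;
    int misses = 0;
};
static_assert(!std::is_trivially_copyable<NamedCounters>::value, "must not be memset");

void resetPlain(PlainCounters &c) { std::memset(&c, 0, sizeof(c)); }
void resetNamed(NamedCounters &c) { c = NamedCounters(); } // replaces memset()
```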
Optimization: Do not create/configure ACLFilledChecklist in vain (#232)
While client_db is required for client-side pools to work, it may be
enabled for other reasons, without any client-side pools configured. We
should not create and configure useless ACLFilledChecklist objects
because those operations are already non-trivial today and tend to
become more expensive with time.
This change disassociates Transients from collapsed forwarding, enabling
Transients for SMP caching configurations. Before this change, an SMP Squid
worker could not read an entry being written by another worker. Besides
unexpected misses, there could be another (worse) negative effect: The
reader worker could get stuck because it did not get updates via
the Transients mechanism.
Also deprecate the collapsed_forwarding_shared_entries_limit directive
name in favor of shared_transient_entries_limit.
Also removed top-level Storage::smpAware() because memory cache SMP
awareness is determined by configuration and is now computed before we
create the memory cache Storage object. This ability to assess SMP
awareness earlier helps decide whether to create Transients segments.
Also eliminated code duplication in a couple of MemStoreRr methods.
Added a new CF tag to the Squid request status %Ss access log field.
This tag marks transactions that have waited for a CF initiator
transaction. This wait may happen in two cases (or their combination):
1. Classic collapsing: A client request gets collapsed on arrival
(e.g., TCP_CF_HIT or TCP_CF_MISS).
2. Collapsed revalidation: An internal revalidation request is collapsed
(e.g., TCP_CF_REFRESH_MODIFIED).
A CF tag approach is simple but the resulting access.log records cannot
distinguish some cases. For example, a pure collapsed revalidation
transaction (case 2) cannot be distinguished from these transactions:
* a collapsed client that got collapsed on revalidation (case 1+2);
* a collapsed client that initiated revalidation.
We may want to log more collapsing details in the future.
These changes do not affect CF initiating code.
In order to track collapsed transactions, a new CollapsingHistory class
was introduced. Since more and more non-logging code relies on ALE, this
history is kept in ALE. ClientHttpRequest uses its logType field instead
of the LogTags in ALE, so we also use logType for storing
ClientHttpRequest's CollapsingHistory. Eventually, ClientHttpRequest
should eliminate logType in favor of direct ALE use.
Also: ICP code fixing/refactoring:
* htcpSyncAle() and icpSyncAle() should not require the caller to supply
correct LogTags because callers like fillChecklist() do not have
access to that information (it is not stored in the transaction object
unlike the other pieces of info that these functions copy to ALE).
* Added icpUdpData::ale to preserve master transaction info when
messages are queued. Several icpUdpData improvements were triggered by
this change because ale is a (second!) non-POD member and icpUdpData
was mistreated as a POD. They include:
- Removed icpUdpData::start as unused.
- Removed icpUdpData::len as set but otherwise unused.
- Removed icpUdpData::logcode as essentially duplicating msg->opcode.
* Update ICP ALE, if any, as soon as the transaction tags become known
(instead of sometimes waiting for the ICP message to be logged). The
ICP message may be dropped and/or never be logged, but we should keep
ALE up to date because it is used in an increasing number of contexts.
Also found and marked an ICP memory leak. It is best to fix that in a
dedicated commit.
Also supplied URN code with ALE. Full-featured Client-based classes
already use ALE. We have not tested with URNs, but these changes may
improve logging of transactions that involve URN resolution.
Also fixed problematic StoreEntry::collapsingInitiator(). It could
return true if the entry had transients but had nothing to do with
collapsing. It also incorrectly assumed that a collapsed entry is always
marked with ENTRY_FWD_HDR_WAIT. That assumption is wrong because
Controller::allowCollapsing() does not set this flag for the entry.
We did not find a better way to track StoreEntry objects associated with
CF initiators than to add a new StoreEntry flag. Hitting an entry
flagged with ENTRY_REQUIRES_COLLAPSING requires collapsing the request.
Bug 4223: fixed retries of failed re-forwardable transactions (#211)
This change fixes Store to delay writing (and, therefore, sharing) of
re-triable errors (e.g., 504 Gateway Timeout responses) to clients:
once we start sharing the response with a client, we cannot re-try the
transaction. Since ENTRY_FWD_HDR_WAIT flag purpose is to delay response
sharing with Store clients, this patch fixes and clarifies its usage:
1. Removes unconditional clearing from startWriting().
2. Adds a conditional clearing to StoreEntry::write().
3. Sets it only for may-be-rejected responses.
(2) adds ENTRY_FWD_HDR_WAIT clearing to detect responses that filled the
entire read buffer and must be shared now with the clients because
they can no longer be retried with the current re-forwarding mechanisms
(which rely on completing the bad response transaction first) and will
get stuck. Such large re-triable error responses (>16KB with default
read_ahead_gap) should be uncommon. They were getting stuck prior to
master r12501.1.48. That revision started clearing ENTRY_FWD_HDR_WAIT in
StoreEntry startWriting() unconditionally, allowing all errors to be
sent to Store clients without a delay, and effectively disabling
retries.
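A simplified model of the delayed-sharing rules above. It assumes a hypothetical DelayedEntry class, with a boolean fwdHdrWait standing in for ENTRY_FWD_HDR_WAIT and a byte limit standing in for the read buffer size. Written bytes stay invisible to clients while the flag is set. Filling the buffer or calling complete() clears the flag. The real Store code is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model: while fwdHdrWait is set, written bytes stay private,
// so the transaction can still be retried.
class DelayedEntry {
public:
    DelayedEntry(bool mayBeRejected, std::size_t limit)
        : fwdHdrWait(mayBeRejected), readAheadLimit(limit) {}

    void write(const std::string &bytes) {
        body += bytes;
        // Conditional clearing: a full buffer cannot be retried with the
        // current re-forwarding approach, so it must be shared now.
        if (fwdHdrWait && body.size() >= readAheadLimit)
            fwdHdrWait = false;
    }

    // The last call when forming the entry; sharing is then unconditional.
    void complete() { fwdHdrWait = false; }

    bool retriable() const { return fwdHdrWait; }
    std::size_t visibleBytes() const { return fwdHdrWait ? 0 : body.size(); }

private:
    bool fwdHdrWait;
    std::size_t readAheadLimit;
    std::string body;
};
```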
Mgr::Forwarder was always setting ENTRY_FWD_HDR_WAIT, probably mimicking
similarly aggressive FwdState behavior that we are now removing. Since
the forwarder never rewrites the entry content, it should not need to
set that flag. The forwarder and associated Server classes must not
communicate with the mgr client during the client-Squid connection
descriptor handoff to Coordinator, but ENTRY_FWD_HDR_WAIT is not the
right mechanism to block such Squid-client communication. The flag
does not work well for this purpose because those Server classes may
(and do?) substitute the "blocked" StoreEntry with another one to
write an error message to the client.
Also moved ENTRY_FWD_HDR_WAIT clearance from many StoreEntry::complete()
callers to that method itself. StoreEntry::complete() is meant to be the
last call when forming the entry. Any post-complete entry modifications
such as retries are prohibited.
Amos Jeffries [Tue, 5 Jun 2018 06:11:29 +0000 (06:11 +0000)]
Bug 4831: filter chain certificates for validity when loading (#187)
Commit 51e09c08a5e6c582e7d93af99a8f2cfcb14ea9e6, which added
GnuTLS support, required splitting the way
certificate chains were loaded. This resulted in the
leaf certificate being added twice at the prefix of a
chain in the serverHello.
It turns out that some recipients validate strictly that the
chain delivered by a serverHello does not contain extra
certificates and reject the handshake if they do.
This patch implements the XXX about filtering certificates
for chain sequence order and self-sign properties, added
in the initial PR. This resolves the bug 4831 regression and also
reports failures at startup/reconfigure to admins.
Also, display certificate names in debug output so that admins
can more easily detect and fix loaded files that fail
these tests.
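A name-only sketch of the filtering rules. It assumes a hypothetical Cert struct that compares subject and issuer strings. Real code must verify issuance cryptographically; OpenSSL provides X509_check_issued() for that check. Each kept certificate must have issued the previously kept one, and self-signed roots are skipped. As a result, a second copy of the leaf is dropped automatically. This single pass assumes the file lists intermediates in chain order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, name-only certificate model.
struct Cert {
    std::string subject;
    std::string issuer;
    bool selfSigned() const { return subject == issuer; }
};

// Keep only certificates that extend the chain upward from the leaf.
std::vector<Cert> filterChain(const Cert &leaf, const std::vector<Cert> &loaded) {
    std::vector<Cert> chain;
    std::string expectedIssuer = leaf.issuer;
    for (const auto &ca : loaded) {
        if (ca.selfSigned())
            continue; // self-signed roots are not sent in the chain
        if (ca.subject != expectedIssuer)
            continue; // out of order, unrelated, or a repeated leaf
        chain.push_back(ca);
        expectedIssuer = ca.issuer;
    }
    return chain;
}
```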
Bug 4855: re-enable querying private entries for HTCP/ICP (#214)
This was broken since 4310f8b: HTCP/ICP misused Store as a storage of
private entries during queries (e.g., see
neighborsUdpPing()/neighborsUdpAck()). A smarter HTCP/ICP
implementation would maintain its own StoreEntry cache for this purpose
(just like the existing queried_keys array for cache keys). However,
fixing this is beyond the scope of this issue.
Amos Jeffries [Tue, 22 May 2018 12:55:35 +0000 (12:55 +0000)]
Bug 4707: purge tool does not obey --sysconfdir= build option (#210)
The purge tool was still using DEFAULT_SQUID_CONF macro from
before it was bundled with Squid, which had no connection to our
./configure script.
Update it to the DEFAULT_CONFIG_FILE macro used by other Squid
binaries and fix some existing issues with that macro's use by
binaries outside 'squid'.
Amos Jeffries [Sun, 20 May 2018 15:46:50 +0000 (15:46 +0000)]
Bug 4843 pt2: squidclient refactoring for GCC-8 (#208)
Replace fixed size buffers for mime header block and additional
custom headers. This fixes long standing issues with buffer
overflow from large custom header values which have become a
hard error in GCC-8.
Also improve snprintf() URL buffer limit handling and const
correctness for Transport::Write().
Amish [Wed, 16 May 2018 19:09:52 +0000 (19:09 +0000)]
New function to find local listening IP and add new %A macro (#198)
Implemented a new function to find local listening IP:Port in a
consistent way and added support for %A macro in error pages and
deny_info URL using the same function.
Amos Jeffries [Sun, 13 May 2018 06:57:41 +0000 (06:57 +0000)]
Bug 4843 pt1: ext_edirectory_userip_acl refactoring for GCC-8 (#204)
Proposed changes to this helper to fix strcat / strncat buffer
overread / overflow issues.
The approach takes three parts:
* adds a makeHexString function to replace many for-loops
catenating bits of strings together with hex conversion into a
second buffer, using snprintf() with buffer overflow handling
instead.
* a copy of Ip::Address::lookupHostIp to convert the input
string into IP address binary format, then generate the hex
string using the above new hex function instead of looped
sub-string concatenations across several buffers.
This removes all the "00" and "0000" strncat() calls and
allows far simpler code even with added buffer overflow
handling.
* replace multiple string part concatenations with a few simpler
calls to snprintf() for all the search_ip buffer constructions.
Adding buffer overflow handling as needed for the new calls.
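A possible shape for the makeHexString() helper. The signature is hypothetical, and the choice of uppercase hex digits is an assumption. The function hex-encodes raw bytes, such as an IPv4 address in binary form, into a caller-supplied buffer. When the buffer is too small, it reports the overflow instead of truncating silently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns false (writing nothing) if `out` cannot hold
// two hex digits per input byte plus the terminating NUL.
bool makeHexString(char *out, std::size_t outSize,
                   const unsigned char *in, std::size_t inLen) {
    if (outSize < inLen * 2 + 1)
        return false; // would overflow
    for (std::size_t i = 0; i < inLen; ++i) {
        // Each byte becomes exactly two hex digits (plus a NUL that the
        // next iteration overwrites).
        std::snprintf(out + i * 2, 3, "%02X", static_cast<unsigned>(in[i]));
    }
    out[inLen * 2] = '\0';
    return true;
}
```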
Supply AccessLogEntry (ALE) for more fast ACL checks. (#182)
Supplying ALE for fast ACL checks allows those checks to use ACLs that
assemble values from logformat %codes. Today, such ACLs are limited to
misplaced external ACLs (that should not be used with "fast"
directives!), but it is likely that fast ACLs like annotate_client will
eventually require ALE.
The "has" ACL documentation promises ALE for every transaction, but our
code does not deliver on that promise. This change fixes a dozen
easy cases where ALE was available nearby. Also a non-trivial
cache_peer_access case was fixed, which proved to be more complex
because of the significant call depth of the peerAllowedToUse() check,
which is a known design problem of its own.
More cases need fixing, and the whole concept of ALE probably needs to
be revised because logformat %code expansion is needed in the
increasing number of contexts that have nothing to do with access
logging.
Also fixed triggering of (probably pointless) level-1 warnings:
* ALE missing adapted HttpRequest object
* ALE missing URL
With this fix applied, any ACLChecklist with ALE synchronizes it at
'pre-check' stage without logging level-1 warnings. Warnings are
triggered only if for some reason this 'pre-check' synchronization was
bypassed.
huaraz [Sun, 6 May 2018 16:06:42 +0000 (16:06 +0000)]
Bug 4042: ext_kerberos_ldap_group: add -P principal option (#195)
Added a -P principal option to ext_kerberos_ldap_group to
select a principal from the keytab, overriding the automated
method, which may make the helper more responsive.
Bug 4845: NegotiateSsl crash on aborting transaction (#201)
Security::PeerConnector::NegotiateSsl() might be called after the
Security::PeerConnector object is gone. This race condition is present
on both regular SSL and SslBump code paths, but sightings are rare.
This bug shares the underlying cause (and the solution) with bug 3505.
TODO: Adjust Comm::SetSelect() API to prevent future bugs like this.
Amos Jeffries [Sat, 5 May 2018 14:42:12 +0000 (14:42 +0000)]
Bug 4847 pt1: regression in proxy_auth ACL flags (#191)
r15058 "Support for --long-acl-options" in Squid-4.0.21
unintentionally removed the proxy_auth ACL support for -i/+i
flags. See bug report for details.
Fix proxy_auth ACL -i and +i flags no longer working by copying
RegexData flags registration, since ACLs for UserData all use
the same names and meanings.
Add documentation to indicate that ident and ext_user ACLs do
support -i/+i just like proxy_auth ACLs.
TODO: fix server_cert_fingerprint ACL which is still broken.
squidadm [Sat, 5 May 2018 11:58:34 +0000 (11:58 +0000)]
Revert incorrect changes in PR189 (#199)
Commit a85f0df5d9226a613c14219a26c53f9a9e6c3a5f was supposed
to be documentation only, but somehow included changes from
another PR branch. Remove so the other PR can be applied with
correct commit details.
Amos Jeffries [Thu, 3 May 2018 15:52:04 +0000 (15:52 +0000)]
Bug 4852: regression in deny_info %R macro (#193)
SBuf::c_str() produces a temporary c-string which is not
guaranteed to survive, and does not survive as long as required
to print the deny_info URL. The HttpRequest::url path SBuf has
a much longer lifetime, so use a const reference to it instead.
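A sketch of the lifetime rule behind this fix. It uses std::string as a stand-in for SBuf, along with a hypothetical Request type and a hypothetical %R expander. The commented-out pattern keeps a pointer into a temporary that is destroyed at the end of the full expression. The working code instead binds a const reference to the long-lived member.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical request type; std::string stands in for SBuf.
struct Request {
    std::string path;                             // long-lived, like the HttpRequest::url path
    std::string pathCopy() const { return path; } // returns a temporary
};

// Unsafe (undefined behavior, so shown only as a comment):
//   const char *p = req.pathCopy().c_str(); // temporary destroyed at ';'
//   out += p;                               // p now dangles
//
// Safe: expand %R using a reference to the member that outlives this call.
std::string expandPath(const std::string &tmpl, const Request &req) {
    const std::string &path = req.path; // valid for as long as req is
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() && tmpl[i + 1] == 'R') {
            out += path;
            ++i; // skip the 'R'
        } else {
            out += tmpl[i];
        }
    }
    return out;
}
```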
Do not abuse argv[0] to supply roles and IDs to SMP kids (#176)
Use a newly added "--kid role-ID" command line option instead. Just like
argv[0], the new option is not meant for direct/human use.
This change allows exec(3)-wrapping tools like Valgrind to work with SMP
Squid: When launching kid processes, Valgrind does not pass Squid-formed
argv[0] to kid processes, breaking old kid role and ID detection code.
This change does not alter argv[0] of Squid processes. There is nothing
wrong with Squid-formed argv[0] values for Squid kids.
Also added a CommandLine class to support command line parsing without
code duplication. Squid needs to handle the new --kid option way before
the old mainParseOptions() handles the other options. The new class also
encapsulates argv manipulations, reducing main.cc pollution.
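A hypothetical sketch of scanning argv for the --kid option before regular option parsing takes place. The role-ID value format used here, for example squid-coord-3 split at the last '-', is an assumption made for illustration. Squid's CommandLine class may parse the value differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical kid identity extracted from "--kid role-ID".
struct KidIdentity {
    std::string role;
    int id = 0;
};

// Returns true and fills `kid` if a well-formed "--kid <role-ID>" is found.
bool findKidOption(int argc, const char *const argv[], KidIdentity &kid) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) != "--kid")
            continue;
        const std::string value = argv[i + 1];
        const auto dash = value.rfind('-');
        if (dash == std::string::npos || dash == 0 || dash + 1 == value.size())
            return false; // malformed: need both a role and an ID
        char *end = nullptr;
        const long id = std::strtol(value.c_str() + dash + 1, &end, 10);
        if (*end != '\0' || id <= 0)
            return false; // ID must be a positive integer
        kid.role = value.substr(0, dash);
        kid.id = static_cast<int>(id);
        return true;
    }
    return false;
}
```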
Avoid ssl/helper.cc "ssl_crtd" assertions on reconfiguration (#186)
Reconfiguration process consists of mainReconfigureStart() and
mainReconfigureFinish() steps separated by at least one main loop
iteration. Clearing a Squid global variable in mainReconfigureStart()
creates two problems for transactions that were started before
reconfiguration:
1. Transactions accessing that global _during_ reconfiguration loop
iteration(s) may be confused by the variable's sudden disappearance.
2. Transactions accessing that global _after_ mainReconfigureFinish()
may be confused by the variable disappearance if reconfiguration
resulted in the global variable becoming nil.
To remove the first problem for ssl_crtd, external_acl, and redirecting
helpers, all of them are now reconfigured "instantly", during
mainReconfigureFinish().
To prevent crashes due to the second problem, Squid now generates helper
errors if the disappeared ssl_crtd or external_acl helpers are accessed
after reconfiguration. The admin is warned about such problems via
level-1 cache.log ERROR messages.
The second problem cannot be fully solved without storing (refcounted)
configuration globals inside each transaction that uses them. Such
serious changes are outside this small assertion-fixing project scope.
Reliable timestamp information is often critical for triage. We can use
the existing debugs() interface to add timestamps to FATAL messages. The
affected code already calls such risky functions as
storeDirWriteCleanLogs() so calling debugs() instead of printing
directly to files/syslog should not make things worse.
FATAL messages that were also logged to syslog (at LOG_ALERT level) are
still logged to syslog (at that same level, but now with the usual
Squid-generated prefix). Such syslog alerts can now be easily triggered
via a new ForceAlert() API.
Also treat segmentation faults, bus errors, and other signal-based
sudden deaths the same as most other FATAL errors -- log them to syslog.
Alexander Gozman [Fri, 13 Apr 2018 02:28:57 +0000 (02:28 +0000)]
Reworked packet/connection marking (#170)
The handling of packet and connection marks was odd: The clientside_mark
ACL worked with connection marks, but the directive of the same name
supported only packet marks. Also, the client's packet MARK (if set)
overwrote its CONNMARK and, as a result, broke ACL checking.
To minimize confusion, connection and packet marks are now separated:
* renamed the clientside_mark ACL to client_connection_mark
* renamed the clientside_mark directive to mark_client_packet
* added a mark_client_connection directive
While the first two points just clarify things, the last one introduces
new functionality: it allows setting or changing a client's CONNMARK.
Both clientside_mark ACL and directive are now deprecated.
Alex Rousskov [Thu, 12 Apr 2018 22:12:39 +0000 (22:12 +0000)]
Fixed Transient reader locking broken by 4310f8b (#161)
The closeForWriting() call comment is correct -- we must keep the lock,
but the optional argument was accidentally lost (when undoing the failed
attempt to remove all long-term transient locks in the feature branch).
Also polished stale comments (which led to the above bug discovery!).
Also polished addEntry() aspects that I had missed in 4310f8b.
Support selective CF: collapsed_forwarding_access (#151)
The new directive controls whether individual requests (including
ICP/HTCP and revalidation requests) should participate in collapsed
forwarding. Admins want to limit collapsed forwarding because it carries
significant transaction-specific risks (and benefits!).
The fixed leak was accompanied by these cache.log errors:
ERROR: worker I/O push queue for ... overflow: ...
I/O queue overflows during disk read requests log the same error but do
not leak memory. Repeated overflows during disk write requests could
eventually exhaust IPC shared memory:
ERROR: ... exception: run out of shared memory pages for IPC I/O
With IPC memory exhausted due to leaks, rock disk I/O stops forever.
Amos Jeffries [Thu, 1 Mar 2018 23:48:28 +0000 (12:48 +1300)]
Use va_copy() on all platforms; fixed a dangerous low-level bug (#160)
To improve cross-compilation support and to simplify code, rely on C++11
cstdarg header instead of ./configure-time va_copy() detection.
Using ./configure-time detection for va_copy() is dangerous because when
it does not work (e.g., during a poorly configured cross-compilation
attempt), Squid may crash if va_copy() was needed but was not detected.
See also: Bug 4821 and bug 753.
Also found and fixed a low-level bug: StoreEntry::vappendf() was not
using va_copy() because store.cc lacked VA_COPY #defines. The affected
code (900+ callers!) is used for cache manager responses and Gopher
gateway response compilation. If any of those calls required a buffer
larger than 4KB, the lack of those va_copy() calls could lead to crashes
and/or data corruption issues on platforms where va_copy() is required.
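A sketch of why va_copy() matters in a vappendf()-style helper. The function is hypothetical, and its first buffer is deliberately tiny (16 bytes instead of a realistic size such as the 4KB mentioned above) so that the retry path is easy to exercise. The first vsnprintf() attempt may consume the va_list, so each attempt works on its own copy made with va_copy().

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical vappendf()-style helper: try a small buffer, retry with a
// buffer of the exact size if the output did not fit.
void vappendf(std::string &out, const char *fmt, va_list args) {
    char small[16];
    va_list first;
    va_copy(first, args);
    const int needed = std::vsnprintf(small, sizeof(small), fmt, first);
    va_end(first);
    if (needed < 0)
        return; // encoding error
    if (static_cast<std::size_t>(needed) < sizeof(small)) {
        out.append(small, needed);
        return;
    }
    std::vector<char> big(needed + 1);
    va_list second;
    va_copy(second, args); // reusing `args` directly here would be undefined
    std::vsnprintf(big.data(), big.size(), fmt, second);
    va_end(second);
    out.append(big.data(), needed);
}

void appendf(std::string &out, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vappendf(out, fmt, args);
    va_end(args);
}
```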
Alexander Gozman [Fri, 16 Feb 2018 10:52:58 +0000 (13:52 +0300)]
Fix clientside_mark and client port logging in TPROXY mode (#150)
The clientside_mark ACL was not working with TPROXY because a
conntrack query could not find connmark without a true client port.
Ip::Intercept::Lookup() must return true client address, but its
TproxyTransparent() component was resetting the client port. We should
use zero port when we compute the source address for the Squid-to-peer
connection instead.
Amos Jeffries [Mon, 12 Feb 2018 15:05:25 +0000 (04:05 +1300)]
Fix loading certificates after tls-cert= changes (#144)
* Remove self-signed CA check
This check is not needed when loading the initial cert portion of a PEM file
as it will be performed later when loading the chain and was causing
self-signed CA to be rejected incorrectly.
* Fix a typo in debugs output
* Always generate static context from tls-cert= parameter
... if a cert= is provided. SSL-Bump still (for now) requires a static context as fallback when generation fails.
* Revert tlsAttemptHandshake to Squid_SSL_Accept API
Bug 4505: SMP caches sometimes do not purge entries (#46)
When Squid finds a requested entry in the memory cache, it does not
check whether the same entry is also stored in a cache_dir. The
StoreEntry object may become associated with its store entry in the
memory cache but not with its store entry on disk. This inconsistency
causes two known problems:
1. Squid may needlessly swap out the memory hit to disk, either
overwriting an existing (and identical) disk entry or, worse,
creating a duplicate entry on another disk. In the second case, the
two disk entries are not synchronized and may eventually start to
differ if one of them is removed or updated.
2. Squid may not delete a stale disk entry when needed, violating
various HTTP MUSTs, and eventually serving stale [disk] cache entries
to clients.
Another purging problem is not caused by the above inconsistency:
3. A DELETE request or equivalent may come for the entry which is still
locked for writing. Squid fails to get a lock for such an entry (in
order to purge it) and the entry remains on disk and/or in the memory cache.
To solve the first two problems:
* StoreEntry::mayStartSwapout() now avoids needless swapouts by checking
whether StoreEntry was fully loaded, is being loaded, or could have
been loaded from disk. To be able to reject swapouts in the last case,
we now require that the newer (disk) entries explicitly delete their
older variants instead of relying on the Store to overwrite the older
(unlocked) variant. That explicit delete should already be happening
in higher-level code (that knows which entry is newer and must mark
any stale entries for deletion anyway).
To fix problem #3:
* A new Store::Controller::evictIfFound(key) method purges (or marks for
deletion if purging is impossible) all the matching store entries,
without loading the StoreEntry information from stores. Avoiding
StoreEntry creation reduces waste of resources (the StoreEntry object
would have to be deleted anyway) _and_ allows us to mark being-created
entries (that are locked for writing and, hence, cannot be loaded into
a StoreEntry object).
XXX: SMP cache purges may continue to malfunction when the Transients
table is missing. Currently, Transients are created only when the
collapsed_forwarding is on. After Squid bug 4579 is fixed, every public
StoreEntry will have the corresponding Transients entry and vice versa,
extending these fixes to all SMP environments.
Note that even if Squid properly avoids storing duplicate disk entries,
some cache_dir manipulations by humans and Squid crashes may lead to
such duplicates being present. This patch leaves dealing with potential
duplicates out of scope except it guarantees that if an entry is
deleted, then all [possible] duplicates are deleted as well.
Fixing the above problems required (and/or benefited from) many related
improvements, including some Store API changes. It is impractical to
detail each change here, but several are highlighted below.
To propagate DELETEs across workers, every public StoreEntry now has a
Transients entry.
Prevented concurrent cache readers from aborting when their entry is
release()d. Unlike abort, release should not affect current readers.
Fixed store.log code to avoid "Bug: Missing MemObject::storeId value".
Removed Transients extras used to initialize MemObject StoreID/method in
StoreEntry objects created by Transients::get() for collapsed requests.
Controlled::get() and related Controller APIs do not _require_ setting
those MemObject details: get() methods for all cache stores return
StoreEntry objects without them (because entry basics lack Store ID and
request method). The caller is responsible for cache key collision
detection. Controlled::get() parameters could include Store ID and
request method for early cache key collision detection, but adding a
StoreQuery class and improving collision detection code is outside this
project scope (and requires many changes).
Found more cases where release() should not prevent sharing.
Remaining cases need further analysis as discussed in master 39fe14b2.
Greatly simplified UFS store rebuilding, possibly fixing subtle bug(s).
Clarified the RELEASE_REQUEST flag meaning: it now marks 'a private
StoreEntry which can't become public anymore'. Refactored the related
code, combining two related notions: 'a private entry' and 'an entry
marked for removal'.
Do not abort collapsed StoreEntries during syncing just because the
corresponding shared entry being stored was marked for deletion. Abort
them if the shared entry has also been aborted.
Added StoreEntry helper methods to prevent direct manipulation of
individual disk-related data members (swap_dirn, swap_filen, and
swap_status). These methods help keep these related data members in a
coherent state and minimize code duplication.
Amos Jeffries [Thu, 1 Feb 2018 09:51:54 +0000 (22:51 +1300)]
TLS: GnuTLS implementation for listening ports and client connections (#81)
Move the http_port cert= and key= options logic to libsecurity and add GnuTLS implementation for PEM file loading. Also adds some extra debugging to clarify listening port initialization problems with the PEM files.
Enable most of the http(s)_port listening socket logic to always build, except where an OpenSSL-specific dependency still exists. It may seem reasonable to leave it optionally excluded for minimal builds; however, a minimal proxy that does not support HTTPS in any way is increasingly useless on the modern web, so preference is given to building the generic TLS-related code. This also simplifies the testing required to detect code portability issues.
A GnuTLS implementation is added for https_port configured with static cert=/key= parameters and for the resulting TLS handshake behaviour. Squid built with GnuTLS can now act as a useful parent proxy behind an SSL-Bump'ing frontend or for other clients that require an explicit TLS proxy.
Also fixes the definitions for the CertPointer and PrivateKeyPointer.
Fix 889fc47 for SSL bumping with an authentication type other than Basic (#104)
Commit 889fc47 was made to fix an issue with Basic authentication and SSL bumping. However, after that commit, http_access rules with proxy_auth/proxy_auth_regex ACLs no longer work properly, because that type of ACL always returns 1 (match) regardless of the conditions in the rules.
Use the cached authentication results (if any) instead of a fixed 1 (match) result.
Alex Rousskov [Tue, 23 Jan 2018 21:08:02 +0000 (14:08 -0700)]
Fixed store.cc "!mem_obj" assertion via peerDigestRequest (#134)
Broken by commit 76d61119, which (correctly) made createMemObject() assert
but missed one case where the old code should have been converted to
call the new ensureMemObject() instead.
peerDigestRequest() is called every 5 minutes, triggered by the
peerDigestCheck event. Most calls find the old digest entry that has the
same method and URIs.
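A minimal sketch of the distinction, using hypothetical simplified Entry and MemObject types (not the Squid classes or their real signatures): createMemObject() asserts that no MemObject exists yet, while ensureMemObject() reuses an existing one. The lenient form is what a repeated peerDigestRequest() call needs when it finds the old digest entry.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for MemObject and StoreEntry.
struct MemObject {
    std::string storeId;
    std::string method;
};

class Entry {
public:
    // Strict: the caller promises that no MemObject exists yet.
    void createMemObject(const std::string &storeId, const std::string &method) {
        assert(!memObj_); // mirrors the "!mem_obj" assertion
        memObj_ = std::make_unique<MemObject>(MemObject{storeId, method});
    }

    // Lenient: create a MemObject only if the entry lacks one.
    void ensureMemObject(const std::string &storeId, const std::string &method) {
        if (!memObj_)
            createMemObject(storeId, method);
    }

    const MemObject *memObject() const { return memObj_.get(); }

private:
    std::unique_ptr<MemObject> memObj_;
};
```

Calling ensureMemObject() twice on the same entry keeps the first MemObject, whereas calling createMemObject() twice trips the assertion.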
Amos Jeffries [Sat, 20 Jan 2018 04:54:16 +0000 (17:54 +1300)]
Fixed Ip::Address copying (#126)
Explicit copy construction was slow and unnecessary.
Explicit copy assignment mishandled self copying and was unnecessary.
The remaining memcpy() calls mishandled self copying.
There are no known cases of Ip::Address self copying.
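A minimal sketch, assuming a hypothetical Address class that wraps a sockaddr_in6: relying on the compiler-generated copy operations avoids all three problems listed above. Member-wise copying is safe for self-assignment, whereas memcpy() with identical source and destination pointers is undefined behavior because the regions overlap.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>

// Hypothetical, simplified address class (not Ip::Address).
class Address {
public:
    Address() { addr_.sin6_family = AF_INET6; }

    // Defaulted copies: member-wise, fast, and self-assignment safe.
    Address(const Address &) = default;
    Address &operator=(const Address &) = default;

    void setPort(unsigned short port) { addr_.sin6_port = htons(port); }
    unsigned short port() const { return ntohs(addr_.sin6_port); }

private:
    sockaddr_in6 addr_ {}; // zero-initialized, then family set in the ctor
};
```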
Amos Jeffries [Thu, 18 Jan 2018 20:54:50 +0000 (09:54 +1300)]
ESI: remove custom parser (#128)
Alex Rousskov:
let's consider removing the custom ESI parser from Squid. It is of
terrible quality and "nobody" is testing ESI code when things change. Is
the CVE risk worth supporting few platforms that do not have the right
parser libraries?
Andrey [Tue, 16 Jan 2018 23:28:59 +0000 (02:28 +0300)]
Added clientside_mark ACL for checking CONNMARK (#111)
Matches CONNMARK of accepted connections. Takes into account
clientside_mark and qos_flows mark changes (because Squid-set marks are
cached by Squid in conn->nfmark). Ignores 3rd-party marks set after
Squid has accepted the connection from a client (because Squid never
re-queries the connection to update/sync conn->nfmark).
Also added a debugs()-friendly API to print hex values.
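A minimal sketch of the matching logic, assuming a mark[/mask] ACL parameter compared against the mark cached for the accepted connection. MarkConfig, markMatches(), and asHex() are hypothetical names, not the Squid code. The hex printer restores the stream flags so that later values in the same debugging statement print in decimal again.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical parsed form of a "mark[/mask]" ACL value.
struct MarkConfig {
    std::uint32_t mark = 0;
    std::uint32_t mask = 0xffffffff; // no mask given: compare all bits
};

// Compares the cached connection mark (never re-queried from the kernel).
bool markMatches(std::uint32_t cachedNfmark, const MarkConfig &cfg) {
    return (cachedNfmark & cfg.mask) == (cfg.mark & cfg.mask);
}

// Stream helper: prints 0x-prefixed hex without leaking stream flags.
template <class Integer>
struct AsHex { Integer value; };

template <class Integer>
std::ostream &operator<<(std::ostream &os, const AsHex<Integer> &h) {
    const auto oldFlags = os.flags();
    os << "0x" << std::hex << h.value;
    os.flags(oldFlags);
    return os;
}

template <class Integer>
AsHex<Integer> asHex(Integer value) { return AsHex<Integer>{value}; }
```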
Amos Jeffries [Mon, 15 Jan 2018 18:59:37 +0000 (07:59 +1300)]
Bug 3911: clang -fsanitize warnings (#125)
Fixes warnings from clang when -fsanitize is used. Many of these are also part of the bug 4738 issues.
error: private field 'callback' is not used [-Werror,-Wunused-private-field]
error: private field 'cbdata' is not used [-Werror,-Wunused-private-field]
error: private field 'IO' is not used [-Werror,-Wunused-private-field]
error: variable 'wccp2_router_id_element' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
We cannot set these warnings as default options yet because the STUB code intentionally does not use any private class members, so it would error on every unit test.
* Convert Store::LocalSearch to C++ initialization
* DiskThreadsDiskFile::IO is unused after being set by the constructor
Also, take the opportunity to redo the constructor using C++11 initialization
* Remove currently unused wccp2_router_id_element
This resolves clang warnings until the WCCP redesign is completed.
Alex Rousskov [Wed, 10 Jan 2018 15:45:43 +0000 (08:45 -0700)]
Report exception locations and exception-related polish (#119)
Without location, many exceptions look identical: A growing number of
Must(entry != NULL) and Must(request) complicate triage. The location
info was already stored in TextException but was not reported.
Reporting exception location on a separate line makes admin-visible
FATAL/ERROR/WARNING messages easier to comprehend, and their primary
text becomes more "stable", which is good for documentation. Also, some
future exceptions will probably report multiple details, possibly even
context details collected as a low-level exception bubbles up to its
high-level handling/reporting code.
Also simplified/optimized TextException:
* TextException now reuses std::runtime_error message memory management
code, including its CoW optimizations/guarantees.
* Debug and TextException code now share the source location reporting
code (including Squid build prefix elision) in base/Here.{cc,h}.
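A minimal sketch of an exception that reuses std::runtime_error for message storage and carries its throw location separately, so that reporting code can print the location on its own line. LocatedException, SourceLocation, HERE_LOCATION(), MUST(), and the report format are hypothetical; they only approximate the TextException and base/Here.{cc,h} design described above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical source-location record, captured at the throw site.
struct SourceLocation {
    const char *file;
    int line;
};
#define HERE_LOCATION() SourceLocation{__FILE__, __LINE__}

// Message memory management comes from std::runtime_error.
class LocatedException : public std::runtime_error {
public:
    LocatedException(const std::string &message, const SourceLocation &where):
        std::runtime_error(message), where_(where) {}

    const SourceLocation &where() const { return where_; }

    // Primary text first; location on a separate line.
    std::string report() const {
        return std::string(what()) + "\n    exception location: " +
            where_.file + "(" + std::to_string(where_.line) + ")";
    }

private:
    SourceLocation where_;
};

// Hypothetical Must()-like check that records where it failed.
#define MUST(cond) \
    do { if (!(cond)) throw LocatedException("Must(" #cond ")", HERE_LOCATION()); } while (false)
```

Two failed MUST(request) checks in different files now produce the same stable primary text but distinguishable locations.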
Also simplified and polished SBuf-related exceptions, removing a few:
* Removed InvalidParamException as unused.
* Replaced SBufTooBigException with generic exceptions.
SBufTooBigException was misused (by SBuf::plength) and not useful. No
need to create a whole class just to parameterize an object!
* Replaced OutOfBoundsException with a generic exception.
OutOfBoundsException was not very useful (see SBufTooBigException). It
was used by only one test case, which did not justify adding a whole class.
Also added SWALLOW_EXCEPTIONS() API to protect any code that may throw
unwanted exceptions. Reworked a few destructors after Must() changes
made it easier for GCC v6 to detect (and warn about) throwing code:
* Polished Ipc::Forwarder cleanup sequence. For Forwarders, I see no
reason to split/duplicate swanSong() functionality via a cleanup()
method. The swanSong() API exists so that job destructors do not need
to make confusing virtual method calls!
* Hid the AsyncJob destructor because all jobs should be "automatically"
deleted by the internal job code that guarantees a swanSong() call.
* Removed a bad (pair-less) StoreEntry::unregisterAbort() call from
Mgr::Forwarder destructor, possibly left behind in or around 51ea090.
* Removed ctor/dtor entrance debugging from the classes affected by the
"throwing destructor issue". AsyncJob covers that debugging need.
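A minimal sketch of both ideas, with hypothetical names: a macro that logs and discards exceptions (destructors are implicitly noexcept, so an escaping exception calls std::terminate()), and a job base class with a protected destructor whose only deletion path runs swanSong() first. This is not the Squid AsyncJob or SWALLOW_EXCEPTIONS() implementation.

```cpp
#include <cassert>
#include <exception>
#include <iostream>

// Hypothetical macro: run code, report and discard any exception.
#define SWALLOW_EXCEPTIONS_SKETCH(code) \
    try { code; } \
    catch (const std::exception &e) { std::cerr << "ERROR: swallowing exception: " << e.what() << '\n'; } \
    catch (...) { std::cerr << "ERROR: swallowing unknown exception\n"; }

class Job {
public:
    // The only deletion path: guarantees swanSong() before destruction.
    static void Delete(Job *job) {
        SWALLOW_EXCEPTIONS_SKETCH(job->swanSong());
        delete job;
    }

protected:
    virtual ~Job() = default; // hidden: outside code cannot "delete job"
    virtual void swanSong() {}
};

class Forwarder : public Job {
public:
    explicit Forwarder(bool &cleanedUp): cleanedUp_(cleanedUp) {}

protected:
    ~Forwarder() override = default;

    // Cleanup lives here instead of a separate cleanup() called from the
    // destructor, so no virtual calls happen during destruction.
    void swanSong() override {
        cleanedUp_ = true;
        Job::swanSong();
    }

private:
    bool &cleanedUp_;
};
```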
Amos Jeffries [Sun, 7 Jan 2018 14:31:16 +0000 (03:31 +1300)]
Bug 4631: security_file_certgen helper without disk cache (#95)
* Disable the certificate DB disk cache if the -s and -M command line options are omitted.
E.g. with this you can change squid.conf from:
sslcrtd_program security_file_certgen -s /var/lib/ssl_db -M 32MB
...to...
sslcrtd_program security_file_certgen
...and it will operate without the disk cache, generating certs fresh every time.
* Remove Ssl::CertificateDb::IsEnabledDiskStore()
Make the CertificateDb temporary objects dynamically allocated instead.
* Do command line checks in main() not the CertificateDb object.
This avoids a risky constructor exception and simplifies validity testing of parameters.
* Update man(8) documentation
The helper version is now 1.1. This is a minor version bump because the helper remains compatible
with installations that properly use 1.0, while adding a new feature.
Also simplify the command line SYNOPSIS and the incomplete mention of sslcrtd_* squid.conf directives.
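A minimal sketch of the command line decision made in main() rather than in a CertificateDb constructor, assuming an empty string means the option was omitted. CertgenOptions and useDiskCache() are hypothetical, and treating a lone -s or -M as an error is an assumption, not something stated above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical result of command line parsing; empty means omitted.
struct CertgenOptions {
    std::string dbPath;  // -s
    std::string maxSize; // -M
};

// Returns true when the on-disk certificate database should be used.
bool useDiskCache(const CertgenOptions &opts) {
    const bool haveDb = !opts.dbPath.empty();
    const bool haveSize = !opts.maxSize.empty();
    if (haveDb && haveSize)
        return true;
    if (!haveDb && !haveSize)
        return false; // no disk cache: generate certificates fresh every time
    // assumption: a partial disk cache configuration is rejected up front
    throw std::invalid_argument("both -s and -M are required to enable the disk cache");
}
```

Validating parameters before any database object exists keeps a failing check from surfacing as a constructor exception.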
Alex Rousskov [Tue, 2 Jan 2018 16:16:52 +0000 (09:16 -0700)]
Moved peer*(ps_state) functions into ps_state renamed PeerSelector (#113)
No functionality changes intended (other than debug message variations).
Also polished related documentation and debug messages.
Also converted "struct _icp_common_t" into an icp_common_t class, to
make its forward declarations simple. As a side effect, removed
__cplusplus ifdefs, addressing an old TODO.
Alex Rousskov [Thu, 28 Dec 2017 16:35:32 +0000 (09:35 -0700)]
Bug 2378: Duplicates in selected peer destinations (#112)
Duplicates in FwdServers lead to excessive peer connection retries, skew
in round-robin peer selection, and probably other problems.
This bug was fixed in 2008 but that v2 fix was never ported to v3. This
fix includes a bug 2408 fix for the original (bug 2378) fix, although I
adjusted bug 2408 logic to explicitly reject duplicate PINNED
destinations and to clarify why PINNED connection handling is "special".
I also centralized and improved peerAddFwdServer-related debugging,
removing duplicated and slightly inconsistent code.
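A minimal sketch of duplicate-rejecting destination selection, with hypothetical Destination and addDestination() names. It assumes that at most one PINNED destination is allowed and that other destinations are duplicates when both the peer name and the kind match.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical destination record; kind distinguishes pinned connections.
enum class DestKind { Pinned, Parent, Origin };

struct Destination {
    std::string peer;
    DestKind kind;
};

// Adds dest unless it duplicates an existing entry; returns whether added.
bool addDestination(std::vector<Destination> &servers, const Destination &dest) {
    const bool duplicate = std::any_of(servers.begin(), servers.end(),
        [&dest](const Destination &d) {
            if (dest.kind == DestKind::Pinned)
                return d.kind == DestKind::Pinned; // explicit PINNED rule
            return d.peer == dest.peer && d.kind == dest.kind;
        });
    if (duplicate)
        return false;
    servers.push_back(dest);
    return true;
}
```

Rejecting duplicates at insertion time means retry loops and round-robin counters never see the same destination twice.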
Amos Jeffries [Fri, 15 Dec 2017 02:50:53 +0000 (15:50 +1300)]
Convert Acl::InnerNode to C++11 for-each loop (#101)
This also fixes a bug in some STL implementations where passing &ACL::prepareForUse to for_each
results in the ACL class nil method being called explicitly instead of the child ACL class
virtual method.
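A minimal sketch with hypothetical Acl and CountingAcl classes: inside a C++11 range-for loop, node->prepareForUse() is an ordinary virtual call, so the child class override runs without passing a member function pointer to std::for_each.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical ACL hierarchy with a virtual preparation step.
class Acl {
public:
    virtual ~Acl() = default;
    virtual void prepareForUse() {} // base "nil" method
};

class CountingAcl : public Acl {
public:
    void prepareForUse() override { ++prepared; }
    int prepared = 0;
};

// Range-for: plain virtual dispatch to each child's override.
void prepareAll(const std::vector<std::unique_ptr<Acl>> &nodes) {
    for (const auto &node : nodes)
        node->prepareForUse();
}
```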
Squid FTP server dying because of an unhandled exception. (#102)
Related message in cache.log:
FATAL: Dying from an exception handling failure; exception: reply
Unfortunately, Squid does not report the exact place where the exception
was thrown; however, the most likely reason is a "Must(reply)" failure inside
Ftp::Server::writeErrorReply.