Bug 4864: !Comm::MonitorsRead assertion in maybeReadVirginBody() (#351)
This assertion is probably triggered when Squid retries/reforwards
server-first or step2+ bumped connections (after they fail).
Retrying/reforwarding such pinned connections is wrong because the
corresponding client-to-Squid TLS connection was negotiated based on the
now-failed Squid-to-server TLS connection, and there is no mechanism to
ensure that the new Squid-to-server TLS connection will have exactly the
same properties. Squid should forward the error to client instead.
Also fixed peer selection code that could return more than one PINNED
path, with only the first path having the destination of the actual
pinned connection.
This is a Measurement Factory project.
This is a limited equivalent to master branch commit 3dde9e52.
Supply ALE to request_header_add/reply_header_add (#564)
Supply ALE to request_header_add and reply_header_add ACLs that need it
(e.g., external, annotate_client, and annotate_transaction ACLs). Fixes
"ACL is used in context without an ALE state" errors when external ACLs
are used in the same context (other ACLs do not yet properly disclose
that they need ALE).
Also provides HTTP reply to reply_header_add ACLs.
DrDaveD [Mon, 30 Dec 2019 20:43:33 +0000 (20:43 +0000)]
Bug 4735: Truncated chunked responses cached as whole (#528)
Mark responses received without the last chunk as responses that have
bad (and, hence, unknown) message body length (i.e. ENTRY_BAD_LENGTH).
If they were being cached, such responses will be released and will stop
being shareable.
Fix server_cert_fingerprint on cert validator-reported errors (#522)
The server_cert_fingerprint ACL mismatched when sslproxy_cert_error
directive was applied to validation errors reported by the certificate
validator because the ACL could not find the server certificate.
Fix the parsing of the received listing from FTP services.
Also relaxed size/filename grammar used for DOS listings: Tolerate
multiple spaces between the size and the filename.
Fix shared memory size calculation on 64-bit systems (#520)
Since commit 2253ee0, the wrong type (uint32 instead of size_t) was used
to calculate the PagePool::theLevels size. theLevels memory (positioned
by different and correct code) did not overlap with the raw pages
buffer, but the raw pages buffer could, in some cases, be 32 bits short,
placing the last 4 bytes of the last page outside of allocated memory.
In practice, shared memory allocations are page-aligned, and the
difference in 4 bytes was probably compensated by the extra allocated
bytes in most (or perhaps even all) cases.
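A minimal sketch of how sizing with the wrong type leaves a segment a few
bytes short. The single-counter layout and the names segmentSize() and
shortfall() are illustrative assumptions, not the actual PagePool code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical segment layout: one level counter followed by raw pages.
// Counter is the type assumed when sizing the counter area.
template <typename Counter>
constexpr std::size_t segmentSize(const std::size_t pageCount, const std::size_t pageSize)
{
    return sizeof(Counter) + pageCount * pageSize;
}

// Bytes by which the segment is undersized when the counter area is sized
// with uint32_t although a size_t is stored there (4 on typical 64-bit systems).
constexpr std::size_t shortfall()
{
    return sizeof(std::size_t) - sizeof(std::uint32_t);
}
```

On 32-bit systems both types usually have the same size, so the shortfall
is zero there.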
jijiwawa [Sat, 23 Nov 2019 10:24:41 +0000 (10:24 +0000)]
Bug 5008: SIGBUS in PagePool::level() with custom rock slot size (#515)
SMP Squids were crashing on arm64 due to incorrect memory alignment of
Ipc::Mem::PagePool::theLevels array. The relative position of the array
depends on the number of workers and the number of pages (influenced by
the cache capacity and slot size), so some configurations worked OK.
We have to manually align manually positioned fields inside shared
memory segments. Thankfully, C++11 provides alignment-computing APIs.
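A minimal sketch of aligning a manually positioned field using the C++11
alignof operator. The helper names alignUp() and alignedOffsetFor() are
hypothetical and this is not necessarily how the fix computes the offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds offset up to the next multiple of alignment (a power of two).
inline std::size_t alignUp(const std::size_t offset, const std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Returns the first offset at or after `used` bytes where a T may be placed.
template <typename T>
inline std::size_t alignedOffsetFor(const std::size_t used)
{
    return alignUp(used, alignof(T));
}
```

The returned offset is relative to the segment start, so the segment itself
must start at an address that is suitably aligned for T.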
Alex Rousskov [Sat, 23 Nov 2019 09:18:24 +0000 (09:18 +0000)]
Bug 5009: Build failure with older clang libc++ (#514)
Older clang libc++ implementations correctly reject implicit usage of an
explicit (in C++11) std::map copy constructor with "chosen constructor
is explicit in copy-initialization" errors. The same code becomes legal
in C++14[1], so newer libc++ implementations allow implicit usage (even
in C++11), but there is no need for copy-initialization here at all.
Evidently, libstdc++ has never declared constructors explicit.
The bug was seen with Apple clang in Xcode 5.1.1 (roughly upstream clang
3.4) and Xcode 6.2 (roughly upstream clang 3.5), both using libc++.
Amos Jeffries [Mon, 18 Nov 2019 12:06:56 +0000 (01:06 +1300)]
Fix detection of sys/sysctl.h (#511)
Make sure we test the EUI-specific headers using the same flags
chosen for the final build operations. This should make the
test detect the header as unavailable if the user options
would later turn the compiler #warning into a fatal error.
squidcontrib [Sun, 20 Oct 2019 18:59:08 +0000 (18:59 +0000)]
Hash Digest noncedata (#491)
These commits together:
1. Hash the noncedata for Digest nonces before encoding,
to match the documentation.
2. Encode Digest nonces using hex, rather than base64.
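A minimal sketch of the hex encoding step only. The hashing step is omitted,
and the function name hexEncoded() and the choice of lowercase digits are
assumptions rather than details taken from these commits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encodes each input byte as two lowercase hexadecimal digits.
inline std::string hexEncoded(const std::vector<std::uint8_t> &bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (const auto b : bytes) {
        out += digits[b >> 4];   // high nibble
        out += digits[b & 0x0F]; // low nibble
    }
    return out;
}
```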
Fix rock disk entry contamination related to aborted swapouts (#444)
Also probably fixed hit response corruption related to aborted rock
swapins.
The following disk entry contamination sequence was observed in detailed
cache logs during high-load Polygraph tests. Some of the observed
level-1 errors matched those in real-world deployments.
1. Worker A schedules AX – a request to write a piece of entry A to disk
slot X. The disker is busy writing and reading other slots for worker
A and other workers so AX stays in the A-to-Disker queue for a while.
2. Entry A aborts swapout (for any reason, including network errors
while receiving the being-stored response). Squid makes disk slot X
available for other entries to use. AX stays queued.
3. Another worker B picks up disk slot X (from the shared free disk slot
pool) and schedules BX, a request to write a piece of entry B to disk
slot X. BX gets queued in a B-to-Disker queue. AX stays queued.
4. The disker satisfies write request BX. Disk slot X contains entry B
contents now. AX stays queued.
5. The disker satisfies write request AX. Disk slot X is a part of entry
B slot chain but contains former entry A contents now! HTTP requests
for entry B now read entry A leftovers from disk and complain about
metadata mismatches (at best) or get wrong response contents (at
worst).
To prevent premature disk slot reuse, we now keep disk slots reserved
while they are in the disk queue, even if the corresponding cache entry
is long gone: Individual disk write requests now "own" the slot they are
writing. The Rock::IoState object owns reserved but not yet used slots
so that they can be freed when the object is gone. The disk entry owns
the (successfully written) slots added to its chain in the map.
The new slot ownership scheme required changes in what metadata the
writing code has to maintain. For example, we now keep the address of
the previous slot in the entry chain so that we can update its .next
field after a successful disk write. Also, the old code detecting
dropped write requests could no longer rely on the now-empty .next field
in the previous map entry. The rewritten code numbers I/O transactions
so that out-of-order replies can be detected without using the map.
I tried to simplify the metadata maintenance code by shortening
reservation lifetimes and using just-in-time [first] slot reservations.
The new code may also leak fewer slots when facing C++ exceptions.
As for reading, I realized that we had no protection from dropped rock
read requests. If the first read request is dropped, the metadata
decoding would probably fail but if subsequent reads are skipped, the
client may be fed with data that is missing those skipped blocks. I did
not try to reproduce these problems, but they are now fixed using the
same I/O transaction numbering mechanism that the writing code now uses.
Negative length checks in store_client.cc treat dropped reads as errors.
I also removed commented out "partial writing" code because IoState
class member changes should expose any dangerous merge problems.
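A minimal sketch of numbering I/O transactions so that dropped or
out-of-order replies can be detected without consulting the map. The
IoSequence class and its method names are hypothetical, not the actual
Rock::IoState interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-entry tracker of request and reply sequence numbers.
class IoSequence
{
public:
    // Called when a request is queued; returns the ID to attach to it.
    std::uint64_t noteRequestSent() { return ++lastSent_; }

    // Called when a reply arrives; true only if it answers the next expected request.
    bool noteReplyReceived(const std::uint64_t id)
    {
        if (id != lastReceived_ + 1)
            return false; // an earlier request was dropped or replies were reordered
        lastReceived_ = id;
        return true;
    }

private:
    std::uint64_t lastSent_ = 0;
    std::uint64_t lastReceived_ = 0;
};
```

A caller would treat a false result as an I/O error for the entry instead of
using the reply.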
urnHandleReply() may be called several times while copying the entry
from the store. Each time it must use the buffer length that is left
(from the previous call).
Also do not abandon a URN entry that still has clients attached.
Also allow urnHandleReply() to produce a reply if it receives a
zero-sized buffer. This may happen after the entry has been fully
stored.
Initial replacement of URI/URL parse method internals with
SBuf and Tokenizer based parse.
For now, this parsing only handles the scheme section of the
URL. With this, we add the missing check that the first
character of an unknown scheme name is alphabetic and
prohibit URLs without any scheme (previously accepted).
Also polishes the documentation, URN and asterisk-form
URI parsing.
Also, adds validation of URN NID portion characters to
ensure valid authority host names are generated for
THTTP lookup URLs.
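A minimal sketch of the scheme name check, following the RFC 3986 grammar
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ). It uses std::string
rather than SBuf and Tokenizer, and the name validSchemeName() is
hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if name is a non-empty scheme that starts with a letter and
// contains only letters, digits, '+', '-', and '.' (assuming the "C" locale).
inline bool validSchemeName(const std::string &name)
{
    if (name.empty() || !std::isalpha(static_cast<unsigned char>(name[0])))
        return false;
    for (const char c : name) {
        const auto u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}
```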
Fix the SQUID_CC_REQUIRE_ARGUMENT autoconf function (#478)
Inside AC_DEFUN(), autoconf replaces `$1` with the first argument of the
function. In this case, the first argument is a variable name. To get
the _value_ of that variable, one has to use `$$1`.
One known effect of this fix (in many build environments) is the
disappearance of the following annoying extra error when a build fails
for some other reason:
unrecognized command line option -Wno-deprecated-register
RFC 7230: server MUST reject messages with BWS after field-name (#445)
Obey the RFC requirement to reject HTTP requests with whitespace
between field-name and the colon delimiter. Rejection is
critical in the presence of broken HTTP agents that mishandle
malformed messages.
Also obey requirement to always strip such whitespace from HTTP
response messages. The relaxed parser is no longer necessary for
this response change.
For now non-HTTP protocols retain the old behaviour of removal
only when using the relaxed parser.
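A minimal sketch of detecting whitespace between the field-name and the
colon in one header line. The enum and the function checkFieldName() are
hypothetical, and the sketch only looks at the character right before the
first colon.

```cpp
#include <cassert>
#include <string>

enum class FieldNameCheck { ok, malformed, whitespaceBeforeColon };

// Classifies a single "name: value" header line.
inline FieldNameCheck checkFieldName(const std::string &line)
{
    const auto colon = line.find(':');
    if (colon == std::string::npos || colon == 0)
        return FieldNameCheck::malformed; // no colon or an empty field-name
    const char last = line[colon - 1];
    if (last == ' ' || last == '\t')
        return FieldNameCheck::whitespaceBeforeColon;
    return FieldNameCheck::ok;
}
```

Per the text above, a request parser would reject the message on
whitespaceBeforeColon, while a response parser would strip that whitespace.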
FX Coudert [Wed, 11 Sep 2019 05:12:04 +0000 (05:12 +0000)]
Fix detection of OpenSSL built w/o deprecated features support (#470)
SSL_library_init() is deprecated since OpenSSL v1.1 and is absent in
OpenSSL built without deprecated features. Several distributions (e.g.
Homebrew) ship OpenSSL built without deprecated features.
Instead of tunneling traffic, a matching on_unsupported_protocol
"tunnel" action resulted in a Squid error response sent to the client
(or, where an error response was not possible, in a connection closure).
The following three cases were fixed:
Also, when on_unsupported_protocol was configured, Squid wasted RAM and
CPU cycles to buffer client HTTP requests beyond the point of no return
(i.e., roughly, beyond the first HTTP request on a connection or in a
tunnel), when on_unsupported_protocol settings no longer apply.
Client handshake accumulation is now driven by preservingClientData_. We
set that data member when the connection is accepted (because we may
decide to start preserving bytes right away) and reset it whenever that
decision may change, including when switching to a new protocol inside
CONNECT tunnel and confirming the expected/supported protocol by
successfully parsing its handshake.
Squid does not stop handshake preservation when on_unsupported_protocol
gets disabled during reconfiguration, but Squid will not tunnel
preserved bytes if that happens (and will not tunnel a partial handshake
if on_unsupported_protocol configuration keeps changing).
Also changed how IPv6-based certificates are generated. Their CN field
value is no longer surrounded by [square brackets]. This change was done
to improve Squid code that had to be modified to fix
on_unsupported_protocol. It affects certificate cache key so old
IPv6-based certificates will never be found (and will eventually be
purged) while new ones will be generated and cached instead. We believe
these IPv6-based certificates are rare and untrusted by browsers so the
change in their CN should not have a significant effect on users.
Bug 4918: Crashes when using OpenSSL prior to v1.0.2 (#465)
The implementation of x509_get0_signature() replacement in 24b30fd was
based on OpenSSL v1.1.0 where `signature` and `sig_alg` members of
`x509_st` structure stopped being raw pointers and became structures.
The mismatch caused segfaults when using OpenSSL versions that lacked
x509_get0_signature() -- anything earlier than OpenSSL v1.0.2.
Fixed parsing of TLS messages that span multiple records (#457)
Squid fed the TLS message parser with one TLS record fragment
at a time but allowed InsufficientInput exceptions to bubble up
beyond the TLS message parsing code. If a server handshake
message spans multiple TLS records, and Squid reads all those
records together with the end of the TLS server handshake, then
the higher-level code interprets InsufficientInput as the need
for more TLS records for the record parser (rather than more
fragments for the TLS message parser). The affected transaction
would then time out or otherwise fail while waiting for those
non-existent TLS records to come from the server.
We now parse TLS messages only after accumulating all same-type
TLS records. For truncated handshakes, this may reduce the
level of information extracted by Squid in some cases, but
this approach keeps the code simple. The handshake is still
available for logging if that partial info is needed for triage.
Test case: 1000-sans.badssl.com which sends a huge server certificate.
* The "mem-loaded all" message was printing -1 instead of the
accumulated object size. It also deserves a lower debugging level
because it happens at most once per transaction.
Fix parsing of certificate validator responses (#452)
If a certificate validator did not end its response with an end-of-line
or whitespace character, then Squid, while parsing the response,
accessed the bytes after the end of the buffer where the response is
stored.
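A minimal sketch of a token scan that stops at the end of the buffer even
when the input lacks a trailing whitespace or end-of-line character. The
function tokenLength() is hypothetical and is not the validator response
parser itself.

```cpp
#include <cassert>
#include <cstddef>

// Returns the length of the leading token in buf[0..size), stopping at the
// first whitespace character or at the end of the buffer, whichever is first.
inline std::size_t tokenLength(const char *buf, const std::size_t size)
{
    assert(buf || size == 0);
    std::size_t i = 0;
    while (i < size && buf[i] != ' ' && buf[i] != '\t' && buf[i] != '\r' && buf[i] != '\n')
        ++i;
    return i;
}
```

Checking the index against size before reading buf[i] is what keeps the scan
inside the buffer.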
GCC-9 with Squid's use of -Werror makes these warnings hard
errors which can no longer be ignored. We are thus required
to alter this third-party code when built for Squid.
Truncation of these strings is fine. Rather than suppress
GCC warnings, switch to xstrncpy() which has similar
behaviour but guarantees c-string terminator exists within
the copied range limit (removing need for two -1 hacks).
This change will add terminators on path and device_type
values in the rare case of overly long configured values.
It is not clear what ancient Domain Controllers would do
when handed an unterminated c-string in those cases, but it
was unlikely to be good.
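A minimal sketch of a copy that truncates but always terminates the
destination. boundedCopy() is an illustrative stand-in with an assumed
signature; it is not Squid's xstrncpy() implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies at most dstSize-1 bytes of src into dst and always terminates dst.
inline char *boundedCopy(char *dst, const char *src, const std::size_t dstSize)
{
    assert(dst && src && dstSize > 0);
    std::size_t i = 0;
    for (; i + 1 < dstSize && src[i] != '\0'; ++i)
        dst[i] = src[i];
    dst[i] = '\0'; // the terminator always fits within dstSize
    return dst;
}
```

Because the terminator is written inside the size limit, callers can pass
sizeof(buffer) directly instead of subtracting one.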
Partial disk writes may be useful for CF disk slaves and SMP disk hit
readers, but their correct implementation requires a lot of additional
work, the current implementation is insufficient/buggy, and partially
written entries were probably never read because Rock writers do not
enable shared/appending locks.
Here is a brief (but complicated) history of the issue, for the record:
1. 807feb1 adds partial disk writes "in order to propagate data from
the hit writer to the collapsed hit readers". The readers
probably could not read any data though because the disk entry
was still exclusively locked for writing. The developers either
did not realize that or intended to address it later, but did not
document their intentions -- all this development was happening
on a fast-moving CF development branch.
2. 0b8be48 makes those partial disk writes conditional on CF being
enabled. It is not clear whether the developers wanted to reduce
the scope of a risky feature or did not realize that non-CF use
cases also benefit from partial writes (when fully supported).
3. ce49546 adds low-level appending lock support but does not
actually use append locks for any caches.
4. 4475555 adds appending support to the shared memory cache.
5. 4976925 explicitly disables partial disk writes, acknowledging
that they were probably never used (by readers) anyway due to the
lack of an Ipc::StoreMap::startAppending() call. The same commit
documents that partial writes caused problems (for writers) in
performance tests.
6. 5296bbd re-enables partial disk writes (for writers) after fixing
problems detected earlier in performance tests. This commit does
not add the critical (for readers) startAppending() call. It
looks like the lack of that call was overlooked, again!
When parsing entries from the /etc/hosts file, Squid lowercases
all host names (see bug 3040). If a cache_peer hostname is
uppercase, DNS resolution fails. Lowercasing the cache_peer host
fixes this issue.
This change may expose broken Squid configurations that
incorrectly relied on non-lowercase peer host names to
bypass Squid's "is this cache_peer different from me?"
check, though such configurations should encounter
forwarding loop errors later anyway.
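A minimal sketch of lowercasing a peer host name before it is compared or
resolved. The function lowercasedHost() is hypothetical and assumes an
ASCII host name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns a lowercase copy of host.
inline std::string lowercasedHost(std::string host)
{
    std::transform(host.begin(), host.end(), host.begin(),
                   [](const unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return host;
}
```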
Bug 4957: Multiple XSS issues in cachemgr.cgi (#429)
The cachemgr.cgi web module of the Squid proxy is vulnerable
to XSS issues. The vulnerable parameters "user_name" and "auth"
have insufficient sanitization in place.
FreeBSD defines FD_NONE in /usr/include/fcntl.h to be magic to
the system. We are not using that name explicitly anywhere, but
it may make sense to keep it around as a default value for
fd_type. Rename the symbol to avoid the clash and fix the build
on FreeBSD.
Bug 4842: Memory leak when http_reply_access uses external_acl (#424)
Http::One::Server::handleReply() sets AccessLogEntry::reply which may
already be set. It is already set, for example, when the ACL code
has already called syncAle() because external ACLs require an ALE.
Squid converted any invalid response shorter than 4 bytes into an
invalid "HTTP/1.1 0 Init" response (with those received characters and a
CRLFCRLF suffix as a body). In some cases (e.g., with ICAP RESPMOD), the
resulting body was not sent to the client at all.
Now Squid handles such responses the same way it handles any non-HTTP/1
(and non-ICY) response, converting it into a valid HTTP/200 response
with an X-Transformed-From:HTTP/0.9 header and received bytes as
a message body.
Amos Jeffries [Sat, 8 Jun 2019 11:40:40 +0000 (11:40 +0000)]
Fix GCC-9 build issues (#413)
GCC-9 continues the development track started with GCC-8,
producing more warnings and errors about possible code issues,
which Squid's use of "-Wall -Werror" turns into hard build
failures:
error: 'strncpy' output may be truncated copying 6 bytes from a
string of length 6 [-Werror=stringop-truncation]
error: '%s' directive argument is null
[-Werror=format-overflow=]
error: 'void* memset(void*, int, size_t)' clearing an object of
type ... with no trivial copy-assignment; use assignment or
value-initialization instead [-Werror=class-memaccess]
error: 'void* memset(void*, int, size_t)' clearing an object of
non-trivial type ...; use assignment or value-initialization
instead [-Werror=class-memaccess]
Also, segmentation faults with minimal builds have been
identified as std::string template differences between
optimized and non-optimized object binaries. This results in
cppunit (built with optimizations) crashing unit tests when
freeing memory. Work around that temporarily by removing the use
of --disable-optimizations from minimal builds.
Amos Jeffries [Thu, 6 Jun 2019 12:06:41 +0000 (00:06 +1200)]
Bug 4953: to_localhost does not include :: (#410)
Some OSes treat an unspecified destination address as an implicit
localhost connection attempt. Add ::/128 alongside the
to_localhost 0.0.0.0/32 address to let admins forbid these
connections when DNS entries wrongly contain [::].
Also, adjust ::1 to ::1/128 to match IPv4 range-based definition
and clarify that IPv6 localhost is /128 rather than /127.
Amos Jeffries [Sat, 10 Nov 2018 04:00:12 +0000 (17:00 +1300)]
Fix tls-min-version= being ignored
Audit required change to make PeerOptions::parse() call
parseOptions() when 'options=' altered sslOptions instead of
delaying the parse to context creation.
This missed the fact that for GnuTLS the tlsMinVersion was
also updating the sslOptions string rather than the
parsedOptions variable later in the configuration process.
Call parseOptions() to reset the parsedOptions value whenever
sslOptions string is altered.
Amos Jeffries [Tue, 21 May 2019 21:31:31 +0000 (21:31 +0000)]
Replace uudecode with libnettle base64 decoder (#406)
Since RFC 7235 updated the HTTP Authentication credentials token
to the token68 characterset it is possible that characters
uudecode cannot cope with are received.
The Nettle decoder better handles characters which are valid but
not to be used for Basic auth token.
Matthieu Herrb [Mon, 13 May 2019 08:45:57 +0000 (08:45 +0000)]
Bug 4889: Ignore ECONNABORTED in accept(2) (#404)
An aborted connection attempt does not affect listening socket's
ability to accept other connections. If the error is not ignored, Squid
gets stuck after logging an oldAccept error like this one:
This bug fix was motivated by accept(2) changes in OpenBSD v6.5 that
resulted in new ECONNABORTED errors under regular deployment conditions:
https://github.com/openbsd/src/commit/c255b5a
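A minimal sketch of classifying accept(2) errors so that an aborted
connection attempt does not stop the listener. The function name
ignorableAcceptError() is hypothetical; treating EINTR the same way is an
assumption of this sketch, not something stated above.

```cpp
#include <cassert>
#include <cerrno>

// Returns true if an accept(2) failure with this errno value leaves the
// listening socket usable, so the caller may simply try accepting again.
inline bool ignorableAcceptError(const int error)
{
    switch (error) {
    case ECONNABORTED: // the peer gave up before we accepted the connection
    case EINTR:        // assumption: interrupted calls are also retried
        return true;
    default:
        return false;
    }
}
```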
Amos Jeffries [Sat, 4 May 2019 06:53:45 +0000 (06:53 +0000)]
Bug 4942: --with-filedescriptors does not do anything (#395)
SQUID_CHECK_MAXFD has been unconditionally overwriting any
user-defined limit with an auto-detected limit from the build
machine. The change causing this was an incomplete fix for
bug 3970 added to v3.3 and later releases.
Fixing that problem has two notable side effects:
* the user-defined value now has the FD property checks applied
to it (multiple of 64, too-few, etc). This means warnings will
start to appear in build logs for a number of custom
configurations. We should expect an increase in questions
about that.
* builds which have previously been passing in outrageous values
will actually start to use those values as the SQUID_MAXFD
limit. This may result in surprising memory consumption or
performance issues. Hopefully the warnings and new messages
displaying auto-detected limit separate from the value used
will reduce the admin surprise, but may not.
This PR also includes cleanup of the autoconf syntax within the
SQUID_CHECK_MAXFD macro and moves the ./configure warnings about
possible issues into that check macro.
When MIT or Heimdal Kerberos libraries are installed at a custom
location, there may be several krb5-config installed. The one
located at the user-provided path (if any) needs to have preference.
This assertion could be triggered by various swapout failures for
ufs/aufs/diskd cache_dir entries.
The bug was caused by 4310f8b change related to storeSwapOutFileClosed()
method. Before that change, swapout failures resulted in
StoreEntry::swap_status set to SWAPOUT_NONE, preventing
another/asserting iteration of StoreEntry::swapOut().
This fix adds SWAPOUT_FAILED swap status for marking swapout failures
(instead of reviving and abusing SWAPOUT_NONE), making the code more
reliable.
Also removed storeSwapOutFileNotify() implementation. We should not
waste time on maintaining an unused method that now contains conflicting
assertions: swappingOut() and !hasDisk().
Alex Rousskov [Mon, 1 Apr 2019 16:58:36 +0000 (16:58 +0000)]
Bug 4796: comm.cc !isOpen(conn->fd) assertion when rotating logs (#382)
Squid abandoned cache.log file descriptor maintenance, calling fd_open()
but then closing the descriptor without fd_close(). If the original file
descriptor value was reused for another purpose, Squid would either hit
the reported assertion or log a "Closing open FD" WARNING (depending on
the new purpose). The cache.log file descriptor is closed on log
rotation and reconfiguration events.
This short-term solution avoids assertions and WARNINGs but sacrifices
cache.log listing in fd_table and, hence, mgr:filedescriptors reports.
The correct long-term solution is to properly maintain descriptor meta
information across cache.log closures/openings, but doing so from inside
of debug.cc is technically difficult due to linking boundaries/problems.
Alex Rousskov [Tue, 19 Mar 2019 20:30:55 +0000 (20:30 +0000)]
When using OpenSSL, trust intermediate CAs from trusted stores (#383)
According to [1], GnuTLS and NSS do that by default.
Use case: Chrome and Mozilla no longer trust Symantec root CAs _but_
still trust several whitelisted Symantec intermediate CAs[2]. Squid
built with OpenSSL cannot do that without X509_V_FLAG_PARTIAL_CHAIN.
Amos Jeffries [Thu, 7 Mar 2019 13:50:38 +0000 (13:50 +0000)]
Bug 4928: Cannot convert non-IPv4 to IPv4 (#379)
... when reaching client_ip_max_connections
The client_ip_max_connections limit is checked before the TCP dst-IP is
located for the newly received TCP connection. This leaves Squid unable
to fetch the NFMARK or similar details later on (they do not exist for
[::]).
Move the client_ip_max_connections test later in the TCP accept process
to ensure dst-IP is known when the error is produced.
Alex Rousskov [Sun, 24 Feb 2019 03:28:47 +0000 (03:28 +0000)]
Fixed squidclient authentication after 4b19fa9 (Bug 4843 pt2) (#373)
* squidclient -U sent Proxy-Authorization instead of Authorization.
Code duplication bites again.
* squidclient -U and -u could send random garbage after the correct
[Proxy-]Authorization value as exposed by Coverity CID 1441999: Unused
value (UNUSED_VALUE). Coverity missed this deeper problem, but
analyzing its report led to the discovery of the two bugs fixed here.
Also reduced authentication-related code duplication.
Conflicts:
tools/squidclient/squidclient.cc
mahdi1001 [Sun, 24 Feb 2019 09:24:14 +0000 (12:54 +0330)]
Add support for buffer-size= to UDP logging #359 (#377)
Allow admin control of buffering for log outputs written to UDP
receivers using the buffer-size= parameter.
buffer-size=0byte disables buffering and sends UDP packets
immediately regardless of line size.
When non-0 values are used lines shorter than the buffer may be
delayed and aggregated into a later UDP packet.
Log lines larger than the buffer size will be sent immediately
and may trigger delivery of previously buffered content to
retain log order (at time of send, not UDP arrival).
To avoid truncation problems known with common recipients
the buffer size remains capped at 1400 bytes.
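A minimal simulation of the buffering rules described above. The class
UdpLogBuffer and its write() method are hypothetical, and the sketch
returns the packets that would be sent instead of writing to a socket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of buffer-size= handling for UDP log lines.
class UdpLogBuffer
{
public:
    // The requested size is capped at 1400 bytes; 0 disables buffering.
    explicit UdpLogBuffer(const std::size_t requested):
        capacity_(std::min<std::size_t>(requested, 1400)) {}

    // Accepts one log line and returns the packets to send now, in log order.
    std::vector<std::string> write(const std::string &line)
    {
        std::vector<std::string> packets;
        if (!pending_.empty() && pending_.size() + line.size() > capacity_) {
            packets.push_back(pending_); // flush first to preserve line order
            pending_.clear();
        }
        if (line.size() > capacity_)
            packets.push_back(line); // too large to buffer, or buffering is off
        else
            pending_ += line; // may be aggregated into a later packet
        return packets;
    }

private:
    std::size_t capacity_;
    std::string pending_;
};
```

The sketch has no timer or shutdown flush, so buffered lines leave only when
a later line pushes them out.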
Restored the natural order of the following two notifications:
* BodyConsumer::noteMoreBodyDataAvailable() and
* BodyConsumer::noteBodyProductionEnded() or noteBodyProducerAborted().
Commit b599471 unintentionally reordered those two notifications. Client
kids (and possibly other BodyConsumers) relied on the natural order to
end their work. If an HttpStateData job was done with the Squid-to-peer
connection and only waiting for the last adapted body bytes, it would
get stuck and leak many objects. This use case was not tested during
b599471 work.
Amish [Wed, 2 Jan 2019 11:51:45 +0000 (11:51 +0000)]
basic_ldap_auth: Return BH on internal errors; polished messages (#347)
Basic LDAP auth helper now returns BH instead of ERR in case of errors
other than LDAP_SECURITY_ERROR, per helper guidelines.
Motivation: I have a wrapper around Basic LDAP auth helper. If an LDAP
server is down, then the helper returns BH, and the wrapper uses
a fallback authentication source.
Also converted printf() to SEND_*() macros and reduced message
verbosity.
Systems which have been partially 'IPv6 disabled' may allow
sockets to be opened and used but lack the IPv6 loopback
address.
Implement the outstanding TODO to detect such failures and
disable IPv6 support properly within Squid when they are found.
This should fix bug 4915 auth_param helper startup and similar
external_acl_type helper issues. For security such helpers are
not permitted to use the machine default IP address which is
globally accessible.
Fail Rock swapout if the disk dropped some of the write requests (#352)
Detecting dropped writes earlier is more than a TODO: If the last entry
write was successful, the whole entry becomes available for hits
immediately. IpcIoFile::checkTimeouts() that runs every 7 seconds
(IpcIoFile::Timeout) would eventually notify Rock about the timeout,
allowing Rock to release the failed entry, but that notification may
be too late.
The precise outcome of hitting an entry with a missing on-disk slice is
unknown (because the bug was detected by temporary hit validation code
that turned such hits into misses), but SWAPFAIL is the best we could
hope for.
Initialize StoreMapSlice when reserving a new cache slot (#350)
Rock sets the StoreMapSlice::next field when sending a slice to disk. To
avoid writing slice A twice, Rock allocates a new slice B to prime
A.next right before writing A. Scheduling A's writing and, sometimes,
lack of data to fill B create a gap between B's allocation and B's
writing (which sets B.next). During that time, A.next points to B, but
B.next is untouched.
If writing slice A or swapout in general fails, the chain of failed
entry slices (now containing both A and B) is freed. If untouched B.next
contains garbage, then freeChainAt() adds "random" slices after B to the
free slice pool. Subsequent swapouts use those incorrectly freed slices,
effectively overwriting portions of random cache entries, corrupting the
cache.
How did B.next get dirty in the first place? freeChainAt() cleans the
slices it frees, but Rock also makes direct noteFreeMapSlice() calls.
Shared memory cache may have avoided this corruption because it makes no
such calls.
Ipc::StoreMap::prepFreeSlice() now clears allocated slices. Long-term,
we may be able to move free slice management into StoreMap to automate
this cleanup.
Also simplified and polished slot allocation code a little, removing the
Rock::IoState::reserveSlotForWriting() middleman. This change also
improves the symmetry between Rock and shared memory cache code.
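A minimal sketch of why a reserved slice must be cleared: a stale next link
makes the chain walk (and, hence, chain freeing) reach unrelated slices.
The Slice and SlicePool types are hypothetical simplifications, not
Ipc::StoreMap code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slice record; -1 in next marks the end of a chain.
struct Slice {
    int next = -1;
};

class SlicePool
{
public:
    explicit SlicePool(const std::size_t count): slices_(count) {}

    // Direct access, e.g. for code that links slices into a chain.
    Slice &at(const std::size_t index) { return slices_.at(index); }

    // Hands out a slice for reuse, clearing whatever its last user left behind.
    Slice &reserve(const std::size_t index)
    {
        Slice &slice = slices_.at(index);
        slice = Slice();
        return slice;
    }

    // Counts the slices reachable from first by following next links.
    std::size_t chainLength(const int first) const
    {
        std::size_t length = 0;
        for (int i = first; i >= 0; i = slices_.at(static_cast<std::size_t>(i)).next)
            ++length;
        return length;
    }

private:
    std::vector<Slice> slices_;
};
```

If reserve() did not reset the slice, a leftover link in B would extend the
A-to-B chain into slices that belong to other entries.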
Before this fix, Squid sometimes logged the following error:
BUG: Worker I/O pop queue for ... overflow: ...
The bug could result in truncated hit responses, reduced hit ratio, and,
combined with buggy lost I/O handling code (GitHub PR #352), even cache
corruption.
The bug could be triggered by the following sequence of events:
* Disker dequeues one I/O request from the worker push queue.
* Worker pushes more I/O requests to that disker, reaching 1024 requests
in its push queue (QueueCapacity or just "N" below). No overflow here!
* Worker process is suspended (or is just too busy to pop I/O results).
* Disker satisfies all 1+N requests, adding each to the worker pop queue
and overflows that queue when adding the last processed request.
This fix limits worker push so that the sum of all pending requests
never exceeds (pop) queue capacity. This approach will continue to work
even if diskers are enhanced to dequeue multiple requests for seek
optimization and/or priority-based scheduling.
Pop queue and push queue can still accommodate N requests each. The fix
appears to reduce supported disker "concurrency" levels from 2N down to
N pending I/O requests, reducing queue memory utilization. However, the
actual reduction is from N+1 to N: Since a worker pops all its satisfied
requests before queuing a new one, there could never be more than N+1
pending requests (N in the push queue and 1 worked on by the disker).
We left the BUG reporting and handling intact. There are no known bugs
in that code now. If the bug never surfaces again, it can be replaced
with code that translates low-level queue overflow exception into a
user-friendly TextException.
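A minimal sketch of the push limit: the worker counts all of its requests
that have not been popped yet and refuses to push once that count reaches
the queue capacity. The PendingLimiter class is hypothetical and does not
model the actual shared queues.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter of requests pushed by a worker but not yet popped.
class PendingLimiter
{
public:
    explicit PendingLimiter(const std::size_t capacity): capacity_(capacity) {}

    // Returns true (and counts the request) only if pushing cannot overflow
    // the pop queue later.
    bool tryPush()
    {
        if (pending_ >= capacity_)
            return false;
        ++pending_;
        return true;
    }

    // Called after the worker pops one satisfied request.
    void notePop()
    {
        assert(pending_ > 0);
        --pending_;
    }

    std::size_t pending() const { return pending_; }

private:
    std::size_t capacity_;
    std::size_t pending_ = 0;
};
```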
Alex Rousskov [Tue, 8 Jan 2019 15:14:18 +0000 (15:14 +0000)]
Fix BodyPipe/Sink memory leaks associated with auto-consumption (#348)
Auto-consumption happens (and could probably leak memory) in many cases,
but this leak was exposed by an eCAP service that blocked or replaced
virgin messages.
The BodySink job termination algorithm relies on body production
notifications. A BodySink job created after the body production had
ended can never stop and, hence, leaks (leaking the associated BodyPipe
object with it). Such a job is also useless: If production is over,
there is no need to free space for more body data! This change avoids
creating such leaking and useless jobs.
Amos Jeffries [Sun, 6 Jan 2019 13:22:19 +0000 (13:22 +0000)]
Bug 4875 pt2: GCC-8 compile errors with -O3 optimization (#288)
At -O3 optimization, the GCC-8 static analyzer warns that
optimized code elides initialization on paths that do not
use the configuration variables.
Refactor the parseTimeLine() API to return the parsed
values so that there is no need to initialize anything prior
to parsing.
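A minimal sketch of a parser that returns its results instead of filling
caller-provided variables, so callers have nothing to initialize
beforehand. ParsedTime and parseTimeSpec() are hypothetical and do not
reproduce the parseTimeLine() signature or grammar.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical result type: a number followed by a unit name.
struct ParsedTime {
    long value;
    std::string unit;
};

// Parses "<number> <unit>" and returns both parts; throws on malformed input.
inline ParsedTime parseTimeSpec(const std::string &text)
{
    std::istringstream in(text);
    long value = 0;
    std::string unit;
    if (!(in >> value >> unit))
        throw std::invalid_argument("expected '<number> <unit>'");
    return ParsedTime{value, unit};
}
```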
Fixed forward_max_tries documentation and implementation (#277)
Before 1c8f25b, FwdState::n_tries counted the total number of forwarding
attempts, including pinned and persistent connection retries. Since that
revision, it started counting just those retries. What should n_tries
count? The counter is used to honor the forward_max_tries directive, but
that directive was documented to limit the number of _different_ paths
to try. Neither 1c8f25b~1 nor 1c8f25b code matched that documentation!
Continuing to count just pinned and persistent connection retries (as in
1c8f25b) would violate any reasonable forward_max_tries intent and admin
expectations. There are two ways to fix this problem, synchronizing code
and documentation:
* Count just the attempts to use a different forwarding path, matching
forward_max_tries documentation but not what Squid has ever done. This
approach makes it difficult for an admin to limit the total number of
forwarding attempts in environments where, say, the second attempt is
unlikely to succeed and will just incur wasteful delays (Squid bug
4788 report is probably about one of such use cases). Also,
implementing this approach may be more difficult because it requires
adding a new counter for retries and, for some interpretations of
"different", even a container of previously visited paths.
* Count all forwarding attempts (as before 1c8f25b) and adjust
forward_max_tries documentation to match this historical behavior.
This approach does not have known unique flaws.
Also fixed FwdState::n_tries off-by-one comparison bug discussed during
Squid bug 4788 triage.
Also fixed admin concern behind Squid bug 4788 "forward_max_tries 1 does
not prevent some retries": While the old forward_max_tries documentation
actually excluded pconn retries, technically invalidating the bug
report, the admin now has a knob to limit those retries.
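A minimal sketch of counting all forwarding attempts against a single
limit, including retries of pinned and persistent connections. The
ForwardingBudget class is hypothetical and is not FwdState code.

```cpp
#include <cassert>

// Hypothetical attempt counter honoring a forward_max_tries-like limit.
class ForwardingBudget
{
public:
    explicit ForwardingBudget(const int maxTries): maxTries_(maxTries) {}

    // True while another forwarding attempt (of any kind) is still allowed.
    bool mayTry() const { return tries_ < maxTries_; }

    // Called for every attempt, whether on a new path or a retried connection.
    void noteAttempt() { ++tries_; }

private:
    int maxTries_;
    int tries_ = 0;
};
```

With a limit of 1, the first attempt is allowed and any retry is refused.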