Amos Jeffries [Tue, 27 Oct 2015 22:21:38 +0000 (15:21 -0700)]
Avoid errors when parsing manager ACL in old squid.conf
ACL manager is now a built-in definition and has a different type. That
has been causing FATAL errors when parsing old squid.conf. We can be
nicer and just ignore the obsolete config lines.
Amos Jeffries [Wed, 14 Oct 2015 05:29:29 +0000 (22:29 -0700)]
Bug 3574: crashes on reconfigure and startup
When Squid receives a reconfigure signal before its signal handler
has been registered on startup it will crash with unhandled signal
exceptions. This can be triggered on system boot when a resolv.conf
alteration signal wins a race with the daemon service initialization.
Fix:
Register the reconfigure signal handler early and ignore signals
until the initial squid.conf load has completed.
When Squid receives a reconfigure signal while it is already in the
process of reconfiguring, the two async sequences can interfere and
result in confusing fatal errors or crashes.
Fix:
Only allow one reconfigure sequence to be initiated at a time.
Also, if a shutdown signal has been received while waiting for a
reconfigure to finish, let shutdown take precedence over any pending
reconfigure repeats.
Based on work by Clint Byrum and D J Gardner, Ubuntu
Bug 4330: Do not use SSL_METHOD::put_cipher_by_char to determine size
... of cipher on hello messages
The use of these methods can cause many problems in Squid:
- In earlier OpenSSL libraries, calling SSL_METHOD::put_cipher_by_char with
NULL arguments returned the size of a cipher in the SSL hello message.
In newer OpenSSL releases, calling this method with NULL arguments is no
longer valid and can result in segfaults.
- In newer LibreSSL releases, SSLv23_method is used to produce TLS
messages and does not return the size of a cipher in a v2 HELLO
message.
Fix cache_peer login=PASS(THRU) after CVE-2015-5400
The patch for CVE-2015-5400 converts all non-200 peer responses
into 502 Bad Gateway responses when relaying a CONNECT to a peer.
This happens to break login=PASS and login=PASSTHRU behaviour
which relies on the 401 and 407 status being relayed transparently.
We need to relay the auth server responses as-is when login= is
set to PASS or PASSTHRU but then unconditionally close the
connections to prevent CVE-2015-5400 from occurring.
After the exception is thrown, Squid attempts to wind down the affected
transaction (as it should), but the code either quits with an unhandled
exception error or hits the !callback assertion, depending on whether
the async job processing was in place when the exception was hit (which
depends on whether non-blocking/slow ssl_bump ACLs were active).
The attached patch does three things:
1. Teaches Squid to guess the final ssl_bump action when no ssl_bump
rules match. The final guessed action is "bump" if the last non-final
action was "stare" and "splice" otherwise. I suspect that the older
Squid code attempted to do something like that, but that code may have
been lost when we taught Squid to ignore impossible ssl_bump actions.
2. Protects ssl_bump-checking code from quitting with an unhandled
exception error.
3. Converts the fatal !callback assertion into a [hopefully less damaging]
transaction error, with a BUG message logged to cache.log.
More work may be needed to investigate other exceptions, especially
Must(!csd->serverBump() || csd->serverBump()->step <= Ssl::bumpStep2);
As an historic optimization StoreEntry uses a custom pool chunk size of 2MB.
Knowledge of the actual benefits from this optimization has been lost in time,
and it's not possible to accurately measure its actual impact in all load
scenarios; at the same time this optimization is blocking other potentially
useful developments.
This change is therefore considered a potential performance regression in
some load scenarios.
Bug 4309: Fix the presence of extensions detection in SSL Hello messages
RFC5246 section 7.4.1.3 (Server Hello) says:
The presence of extensions can be detected by determining whether
there are bytes following the compression_method field at the end
of the ServerHello.
The current Hello-parsing code checks whether there are bytes left in the
whole SSL message. It does not account for the fact that the message may
contain more than just the ServerHello.
This patch fixes this issue and tries to improve the related code to
avoid related problems in the future.
Using MemBuf::buf directly is not great, but it does hold a properly
terminated C string in this instance. We cannot use the Raw() interface
because that is for output at DBG_DATA levels and would only display the
buffer name as if that were the raw traffic bytes at 11,2, which negates
the entire purpose of this 11,2 output.
Alex Rousskov [Tue, 1 Sep 2015 09:25:57 +0000 (02:25 -0700)]
Support splice for SSLv3 and TLSv1 sessions that start with an SSLv2 Hello
Such sessions are created, for example, by some SSL clients using OpenSSL
v0.9.8 with default options. This does _not_ relate to SSLv2 sessions.
Just enacts the permitted exception for Hello messages in RFC 6176.
Amos Jeffries [Sat, 29 Aug 2015 20:21:33 +0000 (13:21 -0700)]
Bug 3553: cache_swap_high ignored and maxCapacity used instead
Also, to make matters worse, the number of objects (max 70) being purged on
each of the 1-second maintenance loops was far too small for the traffic
speeds of up to 20k RPS now being processed by proxies.
This fixes the cache_swap_high behaviour to more closely match what is
documented at present, although some documentation says it cleans all the
way down to the low-water mark. That appears never to have been true within
a single cycle, but it would occur over several cycles if the proxy speed
was not too high.
With this updated algorithm there is almost no limit to how far the
aggressiveness can scale, but it is linear at 300 objects per multiple of
the gap between low- and high- watermark.
SwapDir::maintain is now fairly well documented and debug traces have been
added, with several TODO ideas for future improvement also documented in
the method code.
Alex Rousskov [Sat, 29 Aug 2015 20:11:19 +0000 (13:11 -0700)]
When a RESPMOD service aborts, mark the body it produced as truncated.
Without these changes, the recipient of the truncated body often
cannot tell that the body was actually truncated (e.g., when Squid
uses chunked encoding for body delivery). Lying about truncation
may result in rather serious user-level problems.
Amos Jeffries [Sat, 29 Aug 2015 18:51:19 +0000 (11:51 -0700)]
Cleanup: fix assertion in Store unit tests
The old Squid String implementation cannot handle appending nullptr or
negative lengths, so if the test code using CapturingStoreEntry ever
tries to append such input it will crash instead of working like a
StoreEntry should.
Amos Jeffries [Tue, 25 Aug 2015 15:27:18 +0000 (08:27 -0700)]
Docs: auto-build release notes for snapshots
This adds conditional build support to generate release notes whenever
a tarball is being created, regardless of what the code branch status
is. All that is required is the linuxdoc tool chain.
Formal release branch snapshots have been publishing the notes files
built for their previous release. But development versions of Squid
have not been getting documented at all which can be annoying for
testers.
The release-3.N.html file is also removed from the repository. With this
update it is no longer needed by the snapshot machinery.
Handle nil HttpReply pointer inside various handlers called from
Ftp::Server::handleReply(). For example, when the related StoreEntry
object is aborted, the client_side_reply.cc code may call the
Ftp::Server::handleReply() method with a nil reply pointer.
The Ftp::Server::handleReply() method itself cannot handle nil replies
because they are valid in many states. Only state-specific handlers know
whether they need the reply.
The Ftp::Server::handleReply() method is called [via Store] from Client code.
Thus, exceptions in handleReply() are handled by the Ftp::Client job. That job
does not have enough information to know whether the client-to-Squid connection
should be closed; the job keeps the connection open. When the reply is nil,
that open connection becomes unusable, leading to more problems.
This patch fixes Ftp::Server::handleReply() to handle exceptions,
including closing the connections in the case of an exception. It also
adds Must(reply) checks to check for nil HttpReply pointers where the
reply is required. Eventually, Store should start using async calls to
protect jobs waiting for Store updates. Meanwhile, this should help.
Ignore impossible SSL bumping actions, as intended and documented.
According to Squid wiki: "Some actions are not possible during
certain processing steps. During a given processing step, Squid
ignores ssl_bump lines with impossible actions". The distributed
squid.conf.documented has similar text.
Current Squid violates the above rule. Squid considers all actions,
and if an impossible action matches first, Squid guesses what the
true configuration intent was. Squid may guess wrong. For example,
depending on the transaction, Squid may guess that a matching
stare or peek action during bumping step3 means "bump", breaking
peeked connections that cannot be bumped.
This unintended but gross configuration semantics violation remained
invisible until bug 4237, probably because most configurations in
most environments either worked around the problem (where admins
experimented to "make it work") or did not result in visible
errors (where Squid guesses did not lead to terminated connections).
While configuration workarounds are possible, the current
implementation is very wrong and leads to overly complex and, hence,
often wrong configurations. It is also nearly impossible to document
accurately because the guessing logic depends on too many factors.
To fix this, we add an action filtering/banning mechanism to Squid
ACL code. This mechanism is then used to:
- ban client-first and server-first on bumping steps 2 and 3.
- ban peek and stare actions on bumping step 3.
- ban splice on step3 if stare is selected on step2 and
Squid cannot splice the SSL connection any more.
- ban bump on step3 if peek is selected on step2 and
Squid cannot bump the connection any more.
The same action filtering mechanism may be useful for other
ACL-driven directives with state-dependent custom actions.
This change adds a runtime performance overhead of a single virtual
method call to all ORed ACLs that do not use banned actions.
That method itself just returns false unless the ACL represents
a whole directive rule. In the latter case, an std::vector size()
is also checked. It is possible to avoid this overhead by adding
a boolean "I may ban actions" flag to Acl::OrNode, but we decided
the small performance harm is not worth the extra code to set
that flag.
Amos Jeffries [Thu, 20 Aug 2015 13:44:21 +0000 (06:44 -0700)]
Polish: add debug section,level to cache.log
cache.log output produced at level ALL,9 is very verbose, and tracking
down which specific section,level details to log for a shorter trace
without losing details can sometimes be tricky and time-consuming,
particularly when multiple sections are involved.
This patch adds a column containing the relevant debug_options
SECTION,LEVEL value on each line right after the kidN number for debug
levels 2+.
Alex Rousskov [Thu, 20 Aug 2015 13:42:51 +0000 (06:42 -0700)]
Reject non-chunked HTTP messages with conflicting Content-Length values.
Squid used to trust and forward the largest Content-Length header. This
behavior violated an RFC 7230 MUST in Section 3.3.3 item #4. It also confused
some ICAP services and probably some HTTP agents. Squid now refuses to forward
the badly framed message to the ICAP service and HTTP agent, responding with
an HTTP 411 or 502 (depending on the message direction) error instead.
This is a quick-and-dirty implementation. A polished version should reject
responses with invalid Content-Length values as well (per RFC 7230 MUST) and
should behave the same regardless of the relaxed_header_parser setting (this
is not a header parsing issue).
Amos Jeffries [Sat, 8 Aug 2015 04:09:13 +0000 (21:09 -0700)]
Boilerplate: add Foundation details to rfcnb and smblib documentation files
We had hoped to be removing this old library code by now. But it appears
that there is no alternative and users are still requesting the helpers
that depend on them.
Amos Jeffries [Sat, 8 Aug 2015 04:04:45 +0000 (21:04 -0700)]
Cleanup: de-duplicate fake-CONNECT code
Over the course of the peek-n-splice development and followup patches
the code generating fake CONNECT requests to tunnel various intercepted
traffic has been copy-n-pasted several times.
Add a new method fakeAConnectRequest() that takes a debug reason and
SBuf containing any payload to preserve from the original I/O buffer.
Amos Jeffries [Sat, 8 Aug 2015 02:18:24 +0000 (19:18 -0700)]
Use automake subdir-objects feature
The auto* toolchain warns that automake future versions will be enabling
subdir-objects mechanism by default.
Some unit tests were moved into per-library subdirs with the plan of
keeping all convenience library code together. However the current
layout state of Squid means that most still require some objects in other
libraries or at the top level. This does not build happily with the
auto-tools subdir-objects feature. In particular the distclean target has
a tendency to erase objects twice and die on the second attempt.
Temporarily undo that SourceLayout shuffling in order to be more
compatible with automake 1.1n versions.
Now that there are no longer cross-directory collisions in the built
binaries or libraries, we can enable subdir-objects from ./configure
instead of on a per-Makefile basis.
basic_smb_auth.sh delivers the credentials via the environment in
the form "$USER%$PASSWORD", which is not what smbclient expects. This seems
to result from obsolete or inferior smbclient documentation. While it is
perfectly valid to deliver the credentials in this form via the command-line
parameter -U, for example in
Jeff Licquia [Fri, 31 Jul 2015 20:13:45 +0000 (13:13 -0700)]
basic_smb_auth: doesn't handle passwords with backslashes
From: Jeff Licquia <jlicquia@scinet.springfieldclinic.com>
Subject: squid: SMB auth proxy has problems with some passwords
Date: Tue, 18 Jul 2000 12:45:01 -0500 (CDT)
The SMB authenticator doesn't handle passwords with backslashes in them
correctly. The fix appears to be easy: just put a -r in the "read SMBPASS"
line in smb_auth.sh.
John M Cooper [Fri, 31 Jul 2015 20:12:12 +0000 (13:12 -0700)]
basic_smb_auth: nmblookup fails when smb.conf contains WINS servers
From: John M Cooper
To: Debian Bug Tracking System
Subject: squid: smb_auth does not work with a wins server defined in smb.conf
Date: 28 Jan 2002 17:46:13 +0000
If you define a WINS server in the file /etc/samba/smb.conf then the
smb_auth script gets the wrong Domain Controller IP address.
There should be a change to smb_auth.sh at line 50:
basically, adding in the extra "\..+" stops the list of WINS servers
from being returned by the nmblookup command.
Increasingly, code used inside squid.conf parsing is capable of throwing
exceptions to signal errors. Catch any unexpected exceptions that reach
the config parse initiator(s) and report them as a FATAL event before
self-destructing.
Bug 3345: Support %un (any available user name) format code for external ACLs.
The same %un code, with the same meaning is already supported in access.log.
In an external ACL request, it expands to the first available user name
from the following list of information sources:
- authenticated user name, like %ul or %LOGIN
- user name supplied by an external ACL to Squid via the "user=..."
key=value pair, like %ue or %EXT_USER
- SSL client name, like %us
- ident user name, like %ui
Based on Amos Jeffries 2011 patch and "arronax28" design:
http://www.squid-cache.org/mail-archive/squid-dev/201112/0080.html
with TODO completion by Measurement Factory
... from 8 to 8196 before initial congestion message appears.
Modern networks can be quite busy and even amateur installations have a
much higher I/O throughput than Squid was originally designed for. This
often results in a series of "Queue congestion" warnings appearing on
startup before Squid learns what the local environment requires.
The new limit helps to cater for this and reduces the frequency of
unnecessary warnings. They may still occur, so the debug output is also
updated to show what the queue length has grown to with each warning.
Also, the congestion counter is updated from a 32-bit to a 64-bit unsigned
integer, since the new limit already consumes half the available growth
bits in a 32-bit integer.
NP: this update was triggered by reports from admins with proxies needing
to expand AIO queues to over 4K entries on startup.
Improve handling of client connections on shutdown
Squid instances which are processing a lot of traffic, using persistent
client connections, or dealing with long-duration requests can exit with
a lot of connections still open when shut down. The shutdown_lifetime
directive exists to allow time for existing transactions to complete, but
this is not always possible and has no effect on idle connections.
The result is a large dump of aborted FD entries being logged as the TCP
sockets get abruptly reset, with potentially active transactions' cache
objects being "corrupted" in the process.
This makes ConnStateData and its children implement Runner API callbacks
to receive signals about Squid shutdown, which allows their close()
handlers to be run properly and to make use of the AsyncCalls API. Idle
client connections are closed immediately on the startShutdown() signal,
so their closure CPU cycles happen during the shutdown grace period.
An extra 0-delay event step is added to the SignalEngine shutdown sequence,
with a new Runner registry hook 'endingShutdown' added to signal that the
shutdown_lifetime grace period is over for closure of active transactions.
All network FD sockets should be considered unusable for read()/write() at
that point, since close handlers may have already been scheduled by other
Runners. AsyncCalls may still be scheduled to release resources.
Also adds a DeregisterRunner() API action to remove Runners dynamically
from the registered set.
* shutdown grace period ends:
- remaining client connections closed
* shutdown finishes:
- main signal and Async loop halted
- all memory free'd
Server connections which are PINNED or in active use during the
endingShutdown execution will be closed cleanly as a side-effect of the
client closures. Otherwise there is no change (yet) to server connections
or other FD sockets behaviour on shutdown.
Avoid SSL certificate db corruption with empty index.txt as a symptom.
* Detect cases where the size file is corrupted or has a clearly wrong
value. Automatically rebuild the database in such cases.
* Teach ssl_crtd to keep running if it is unable to store the generated
certificate in the database. Return the generated certificate to Squid
and log an error message in such cases.
Background:
There are cases where ssl_crtd may corrupt its certificate database.
The known cases manifest themselves with an empty db index file. When
that happens, ssl_crtd helpers quit, SSL bumping does not work any more,
and the certificate DB has to be deleted and re-initialized.
We do not know exactly what causes corruption in deployments, but one
known trigger that is easy to reproduce in a lab is the block size
change in the ssl_crtd configuration. That change has the following
side-effects:
1. When ssl_crtd removes certificates, it computes their size using a
different block size than the one used to store the certificates.
This may result in negative database sizes.
2. Signed/unsigned conversion results in a huge number near LONG_MAX,
which is then written to the "size" file.
3. The ssl_crtd helper removes all certificates from the database trying to
make space for new certificates.
4. The ssl_crtd helper refuses to store new certificates because the
database size (as described by the "size" file) still exceeds the
configured limit.
5. The ssl_crtd helper exits because it cannot store a new certificate
to the database. No helper response is sent to Squid in this case.
Most likely, there are other corruption triggers -- the database
management code is of an overall poor quality. This change resolves some
of the underlying problems in hope to address at least some of the
unknown triggers as well as the known one.
Errors served using invalid certificates when dealing with SSL server errors.
When bumping, if Squid needs to send a Squid-generated error "page" over a
secure connection, it needs to generate a certificate for that connection.
Prior to these changes, several scenarios could lead to Squid generating
a certificate that clients could not validate. In those cases, the user would
get a cryptic and misleading browser error instead of a Squid-generated
error page with useful details about the problem.
One example is a server certificate that is rejected by the certificate
validation helper. Squid no longer uses the CN from that certificate to
generate a fake certificate.
Another example is a user accessing an origin server using one of its
"alternative names" and getting a Squid-generated certificate containing just
the server common name (CN).
These changes make sure that the certificate for error pages is generated
using SNI (when peeking or staring, if available) or the CONNECT host name
(including in server-first bumping mode). We now update the
ConnStateData::sslCommonName
field (used as CN field for generated certificates) only _after_ the server
certificate is successfully validated.
Currently, Squid cannot redirect intercepted connections that are subject to
SslBump rules to _originserver_ cache_peer. For example, consider Squid that
enforces "safe search" by redirecting clients to forcesafesearch.example.com.
Consider a TLS client that tries to connect to www.example.com. Squid needs to
send that client to forcesafesearch.example.com (without changing the host
header and SNI information; those would still point to www.example.com for
safe search to work as intended!).
The admin may configure Squid to send intercepted clients to an originserver
cache_peer with the forcesafesearch.example.com address. Such a configuration
does not currently work together with ssl_bump peek/splice rules.
This patch:
* Fixes a src/neighbors.cc bug which prevented CONNECT requests from going
to originserver cache peers. This bug affects both true CONNECT requests
and intercepted SSL/TLS connections (with fake CONNECT requests). Squid
used CachePeer::in_addr.port, which is not meant to be used for the HTTP
port. HTTP checks should use CachePeer::http_port instead.
* Changes Squid to not initiate SSL/TLS connection to cache_peer for
true CONNECT requests.
* Allows forwarding being-peeked-at (or stared-at) connections to
originserver cache_peers.
The bug fix described in the first bullet makes the last two changes
necessary.
Alex Rousskov [Wed, 1 Jul 2015 06:26:38 +0000 (23:26 -0700)]
Do not blindly forward cache peer CONNECT responses.
Squid blindly forwards cache peer CONNECT responses to clients. This
may break things if the peer responds with something like HTTP 403
(Forbidden) and keeps the connection with Squid open:
- The client application issues a CONNECT request.
- Squid forwards this request to a cache peer.
- Cache peer correctly responds back with a "403 Forbidden".
- Squid does not parse cache peer response and
just forwards it as if it was a Squid response to the client.
- The TCP connections are not closed.
At this stage, Squid is unaware that the CONNECT request has failed. All
subsequent requests on the user agent TCP connection are treated as
tunnelled traffic. Squid is forwarding these requests to the peer on the
TCP connection previously used for the 403-ed CONNECT request, without
proper processing. The additional headers which should have been applied
by Squid to these requests are not applied, and the requests are being
forwarded to the cache peer even though the Squid configuration may
state that these requests must go directly to the origin server.
This fixes Squid to parse cache peer responses and, if an error response
is found, respond with "502 Bad Gateway" to the client and close the
connections.
Amos Jeffries [Sun, 28 Jun 2015 10:13:58 +0000 (03:13 -0700)]
Use relative-URL in errorpage.css for SN.png
Modern browsers now seem to accept relative URLs in CSS, and Squid
global_internal_static non-https:// URLs are working (bug 4132), so we
can do this now without as many failures.
Amos Jeffries [Sun, 28 Jun 2015 10:09:15 +0000 (03:09 -0700)]
Fix CONNECT failover to IPv4 after trying broken IPv6 servers
This makes CONNECT tunnel connection attempts obey forward_timeout
and continue retrying instead of aborting with a client error when one
possible server hits a connect_timeout.
Alex Rousskov [Sun, 28 Jun 2015 10:05:58 +0000 (03:05 -0700)]
Fixed segfault when freeing https_port clientca on reconfigure or exit.
AnyP::PortCfg::clientCA list was double-freed because the SSL context takes
ownership of the STACK_OF(X509_NAME) supplied via SSL_CTX_set_client_CA_list(),
but Squid was not aware of that. Squid now supplies a clone of clientCA.
This bug can be triggered by certificates that do not contain a CN field.
In this case the Ssl::ErrorDetail::cn method may return NULL, causing this
assertion somewhere inside the Ssl::ErrorDetail::buildDetail method, which
always expects a non-NULL value from Ssl::ErrorDetail::cn and similar
methods.
This patch tries to harden the Ssl::ErrorDetail error formatting functions
to always check for NULL values, and also to avoid sending wrong information
for various certificate fields in the case of an error while extracting the
information from the certificate.
Fix assertion comm.cc:759: "Comm::IsConnOpen(conn)" in ConnStateData::getSslContextDone
This is an assertion inside ConnStateData::getSslContextDone while
setting a timeout. The reason is that ConnStateData::clientConnection
may be closed while waiting for a response from the ssl_crtd helper.
Amos Jeffries [Fri, 5 Jun 2015 23:38:34 +0000 (16:38 -0700)]
Bug 3875: bad mimeLoadIconFile error handling
Improve the MimeIcon reliability when filesystem I/O errors or others
cause the icon data to not be loadable.
The loading process is re-worked to guarantee that once the
MimeIcon::created callback occurs it will result in a valid StoreEntry in
the cache representing the wanted icon.
* If the image can be loaded without any issues it will be placed in
the cache as a 200 response.
* If errors prevent the image being loaded, or the necessary parameters
(size and mtime) being known, a 204 object will be placed into the cache.
NP: There is no clear agreement on 204 being 'the best' status for this
case. 500 Internal Error is also appropriate. I have used 204 because:
* the bug is not in the client's request (eliminating 400, 404, etc),
* a 500 would reveal details about server internals unnecessarily
often and incur extra complexity creating the error page,
* 204 also avoids needing to send Content-Length, Cache-Control headers
and a body object (a bandwidth saving over a 500 status).
NP: This started with just correcting the errno usage, but other bugs
promptly started appearing once I got to seriously testing this load
process. So far it fixes:
* several assertions resulting from StoreEntry being left invalid in
cache limbo between created hash entries and valid mem_obj data,
* repeated attempts on startup to load absent icon files which don't
exist in the filesystem,
* buffer overflow on misconfigured or corrupt mime.conf file entries,
* incorrect debugs messages about file I/O errors,
* large error pages delivered when icons are not installed (when it does
not assert from the StoreEntry).
This patch allows the user_cert and ca_cert ACLs to match arbitrary
stand-alone OIDs (not DN/C/O/CN/L/ST objects or their substrings).
For example, it should be able to match certificates that have the
1.3.6.1.4.1.1814.3.1.14 OID in the certificate Subject or Issuer field.
Squid configuration would look like this:
acl User_Cert-TrustedCustomerNum user_cert 1.3.6.1.4.1.1814.3.1.14 1001
Bug 3329: The server side pinned connection is not closed properly
... in ConnStateData::clientPinnedConnectionClosed CommClose handler.
Squid enters a buggy state when an idle connection pinned to a peer closes:
- The ConnStateData::clientPinnedConnectionRead, the pinned peer
connection read handler, is called with the io.flag set to
Comm::ERR_CLOSING. The read handler does not close the peer
Comm::Connection object. This is correct and expected -- the I/O
handler must exit on ERR_CLOSING without doing anything.
- The ConnStateData::clientPinnedConnectionClosed close handler is called,
but it does not close the peer Comm::Connection object either. Again,
this is correct and expected -- the close handler is not the place to
close a being-closed connection.
- The corresponding fde object is marked as closed (fde::flags.open
is false), but the peer Comm::Connection object is still open
(Comm::Connection.fd >= 0)! From this point on, we have an inconsistency
between the peer Comm::Connection object state and the real world.
- When the ConnStateData::pinning::serverConnection object is later
destroyed (by refcounting), it will try to close its fd. If that fd
is already in use (e.g., by another Comm::Connection), bad things
happen (crashes, segfaults, etc). Otherwise (i.e., if that fd is
not open), comm_close may cry about BUG 3556 (or worse).
To fix this problem, we must not allow Comm::Connections to get out
of sync with fd_table, even when a descriptor is closed without going
through Connection::close(). There are two ways to accomplish that:
* Change Comm to always store Comm::Connections and similar high-level
objects instead of fdes. This is a huge change that has been long on
the TODO list (those "other high-level objects" are one of the primary
obstacles there, because not everything with an FD is a Connection).
* Notify Comm::Connections about closure in their closing handlers
(this change). This design relies on every Comm::Connection having
a close handler that notifies it. It may take us some time to reach
that goal, but this change is the first step providing the necessary
API, a known bug fix, and a few preventive changes.
This change:
- Adds a new Comm::Connection::noteClosure() method to inform the
Comm::Connection object that somebody is closing its FD.
- Uses the new method inside ConnStateData::clientPinnedConnectionClosed
handler to inform the ConnStateData::pinning::serverConnection object
that its FD is being closed.
- Replaces comm_close calls in other places which may cause bug #3329
with Comm::Connection->close() calls.
Initially based on Nathan Hoad research for bug 3329.