git.ipfire.org Git - thirdparty/squid.git/log

]> git.ipfire.org Git - thirdparty/squid.git/log

Eduard Bagdasaryan [Tue, 23 May 2017 08:31:54 +0000 (20:31 +1200)]

Make PID file check/creation atomic to avoid associated race conditions.

After this change, if N Squid instances are concurrently started shortly
after time TS, then exactly one Squid instance (X) will run (and have
the corresponding PID file). If another Squid instance has already been
running (with the corresponding PID file) at TS, then X will be that
"old" Squid instance. If no Squid instances were running at TS, then X
will be one of those new N Squids started after TS.

Lack of atomic PID file operations caused unexpected Squid behavior:
* Mismatch between started Squid instance and stored PID file.
* Unexpected crashes due to failed allocation of shared resources,
  such as listening TCP ports or shared memory segments.

A new File class guarantees atomic PID file operations using locks. We
tried to generalize/reuse Ssl::Lock from the certificate generation
helper, but that was a bad idea: Helpers cannot use a lot of Squid code
(e.g., debugs(), TextException, SBuf, and enter_suid()), and the old
Ssl::Lock class cannot support shared locking without a major rewrite.

File locks on Solaris cannot work well (see bug #4212 comment #14), but
those problems do not affect PID file management code. Solaris- and
Windows-specific File code has not been tested and may not build.

Failure to write a PID file is now fatal. It used to be fatal only when
Squid was started with the -C command line option. In the increasingly
SMP world, running without a PID file leads to difficult-to-triage
errors. An admin who does not care about PID files should disable them.

Squid now exits with a non-zero error code if another Squid is running.

Also removed PID file rewriting during reconfiguration in non-daemon
mode. Squid daemons do not support PID file reconfiguration since trunk
r13867, but that revision (accidentally?) left behind half-broken
reconfiguration code for non-daemon mode. Fixing that code is difficult,
and supporting PID reconfigure in non-daemons is probably unnecessary.

Also fixed "is Squid running?" check when kill(0) does not have
permissions to signal the other instance. This does happen when Squid is
started (e.g., on the command line) by a different user than the user
Squid normally runs as or, perhaps, when the other Squid instance enters
a privileged section at the time of the check (untested). The bug could
result in undelivered signals or multiple running Squid instances.

These changes do not alter partially broken enter/leave_suid() behavior
of main.cc. That old code will need to be fixed separately!

PID file-related cache.log messages have changed slightly to improve
consistency with other DBG_IMPORTANT messages and to simplify code.
Squid no longer lies about creating a non-configured PID file. TODO:
Consider lowering the importance of these benign/boring messages.

* Terminal errors should throw instead of calling exit()

Squid used to call exit() in many PID-related error cases. Using exit()
as an error handling mechanism creates several problems:

1. exit() does not unwind the stack, possibly executing atexit()
   handlers in the wrong (e.g., privileged) context, possibly leaving
   some RAII-controller resources in bad state, and complicating triage;
2. Using exit() complicates code by adding a yet another error handling
   mechanism to the (appropriate) exceptions and assertions.
3. Spreading exit() calls around the code obscures unreachable code
   areas, complicates unifying exit codes, and confuses code checkers.

Long-term, it is best to use exceptions for nearly all error handling.
Reaching that goal will take time, but we can and should move in that
direction: The adjusted SquidMainSafe() treats exceptions as fatal
errors, without dumping core or assuming that no exception can reach
SquidMainSafe() on purpose. This trivial-looking change significantly
simplified (and otherwise improved) PID-file handling code!

The fatal()-related code suffers from similar (and other) problems, but
we did not need to touch it.

TODO: Audit catch(...) and exit() cases [in main.cc] to take advantage
of the new SquidMainSafe() code supporting the throw-on-errors approach.