Rework shared queue for IpcIoFile, further optimize IpcIo notifications.
The patch implements a FewToFewBiQueue class that allows
communication between two groups of processes. The queue is used
in IpcIoFile and lets each process (both diskers and workers)
have a single shared queue reader state. This continues the
optimization started in r11279; see that commit log for more
details.
The patch also decreases the number of shared memory segments used
by queues. Before the change, FewToOneBiQueue used
(2*workerCount + 1) segments. Now FewToFewBiQueue uses
just three: one for shared metadata, one for the array of one-to-one
queues, and one for the array of queue readers.
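As a quick illustration of the segment arithmetic above, the two counts can be written as functions. The function names are made up for this sketch and do not come from the patch.

```cpp
#include <cassert>

// Segments used by the old FewToOneBiQueue layout, per the formula above.
int oldSegmentCount(int workerCount)
{
    return 2 * workerCount + 1;
}

// Segments used by FewToFewBiQueue: shared metadata, the array of
// one-to-one queues, and the array of queue readers.
int newSegmentCount()
{
    return 3;
}
```

With four workers, the old layout would need nine segments while the new one still needs three.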
Before the patch, each shared object was responsible for allocating
and deallocating the shared memory it uses. As a result, each object
had a shared and a non-shared portion. Shared classes provided a pair
of static methods for creating and attaching to existing shared
segments. This is against how normal objects behave: normal objects
are not responsible for managing the memory they use, they use the
memory they are given. Besides, the old approach mixed shared memory
management and object initialization logic. The patch tries to
improve this.
On the user side, the patch provides two functions for managing shared
objects:
* shm_new - allocates/deallocates shared memory, initializes the object
* shm_old - gives refcounted access to the object created by shm_new
The shm_new function returns a so-called Owner object. It is not used
for working with the shared object, but for shared memory
allocation/deallocation and object initialization. This function will
typically be used in the Squid master process to allocate shared
memory on startup. On exit, the Owner object is deleted and the shared
object is deallocated.
The shm_old function returns a refcounted smart pointer to the shared
object. It does not allocate shared memory or initialize the object,
but just points to the object owned by the Owner. The smart pointer
provides a simple way of working with the shared object.
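The Owner/pointer split described above can be sketched as follows. This is only an illustration under loud assumptions: heap buffers keyed by name stand in for real shared memory segments, std::shared_ptr stands in for the refcounted pointer, and the signatures of shm_new and shm_old, the Segments() helper, and the Counter class are all invented here rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Stand-in for named shared segments: heap buffers keyed by name.
// Real code would create and map OS shared memory here instead.
static std::map<std::string, std::vector<unsigned char> > &Segments()
{
    static std::map<std::string, std::vector<unsigned char> > segments;
    return segments;
}

// Owner allocates the segment and initializes the object in it. Its
// destructor releases the segment. It is not used to access the object.
template <class T>
class Owner
{
public:
    explicit Owner(const std::string &id): id_(id) {
        std::vector<unsigned char> &buf = Segments()[id_];
        buf.resize(T::SharedMemorySize());
        new (buf.data()) T(); // the only place the object is initialized
    }
    ~Owner() { Segments().erase(id_); }
    Owner(const Owner &) = delete;
    Owner &operator=(const Owner &) = delete;

private:
    std::string id_;
};

// Hypothetical signature: allocates, initializes, and returns the Owner.
template <class T>
std::unique_ptr<Owner<T> > shm_new(const std::string &id)
{
    return std::unique_ptr<Owner<T> >(new Owner<T>(id));
}

// Hypothetical signature: attaches to an object created by shm_new.
template <class T>
std::shared_ptr<T> shm_old(const std::string &id)
{
    auto it = Segments().find(id);
    if (it == Segments().end())
        return std::shared_ptr<T>(); // no Owner has created this segment
    T *object = reinterpret_cast<T *>(it->second.data());
    return std::shared_ptr<T>(object, [](T *) {}); // never frees the object
}

// A trivial "shared" class: plain data plus a static size method.
struct Counter
{
    int hits;
    static std::size_t SharedMemorySize() { return sizeof(Counter); }
};
```

In this sketch, every shm_old caller sees the same object, and attaching fails once the Owner is gone.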
On the internal side, the patch removes shared memory
allocation/deallocation from the shared object class. There are no
more local/shared parts. A shared object class implementation is now
similar to an ordinary class. The additional requirements for "shared"
classes are: the object must be a POD with no pointers or
references; the class must provide a static SharedMemorySize method
for shared memory size calculation; and the class may need to use
atomic primitives for safe updates of data members.
All existing "shared" classes and code were converted to the new API.
Alex Rousskov [Thu, 21 Apr 2011 15:19:31 +0000 (09:19 -0600)]
Temporary fix for coredumps during shutdown cleanup.
For a permanent fix, we need to avoid deleting fd_table while it is still
in use by others, such as DeferredReads, possibly by allowing event loop
to run during shutdown.
Alex Rousskov [Tue, 19 Apr 2011 04:31:53 +0000 (22:31 -0600)]
Optimized the number of "queue is no longer empty" IpcIo notifications.
The original code relied on the writer (pusher) knowledge to decide when a
notification is needed. That code was simpler but it resulted in many
pointless notifications because the reader could have been busy processing the
last popped item and would have checked the queue after that processing
anyway. This would become especially wasteful when the reader pops multiple
requests before processing them (e.g. to do "elevator" seek optimization).
The intermediate implementation (not committed) placed the reader state in
each queue. That was still fairly simple and worked OK, but it was not
addressing the needs of the disker readers. Diskers have many incoming
queues. If at least one incoming queue has requests, the disker is not
blocked and does not need a notification.
The last implementation allows all incoming queues of a single disker to share
the reader/disker state. The reader state is disassociated from the single
queue. There are still some wasteful state updates when multiple queues are
iterated in FewToOneBiQueue::pop(), but their overheads should be very minor.
We need to figure out whether a single shared reader state can also be used
for workers though (each worker also has many incoming queues...).
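The shared reader state idea can be sketched like this: all incoming queues of one reader point to a single "blocked" flag, and a pusher asks for a notification only when that flag is set. The class and member names below are assumptions for illustration, and the std::deque is a single-process stand-in for a shared ring buffer.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <vector>

// State shared by all incoming queues of one reader (hypothetical layout).
struct ReaderState
{
    std::atomic<bool> blocked{false}; // true: reader waits for a notification
};

// One incoming queue. Many queues may refer to the same ReaderState.
class IncomingQueue
{
public:
    explicit IncomingQueue(ReaderState &aReader): reader(aReader) {}

    // Returns true if the caller must send a "no longer empty" notification.
    bool push(int item) {
        items.push_back(item);
        // Notify only a blocked reader, and clear the flag so that other
        // pushers do not send duplicate notifications.
        return reader.blocked.exchange(false);
    }

    bool pop(int &item) {
        if (items.empty())
            return false;
        item = items.front();
        items.pop_front();
        return true;
    }

private:
    ReaderState &reader;
    std::deque<int> items; // not thread-safe; stands in for a shared buffer
};

// Pops from the first non-empty queue. Marks the reader as blocked only
// when every incoming queue is empty. Concurrent code would also need to
// recheck the queues after setting the flag to avoid a lost wakeup.
bool popAny(std::vector<IncomingQueue *> &queues, ReaderState &reader, int &item)
{
    for (IncomingQueue *q : queues) {
        if (q->pop(item))
            return true;
    }
    reader.blocked = true;
    return false;
}
```

In this sketch, a push to any queue of a busy reader costs no notification, and only the first push after the reader blocks triggers one.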
Also added debugging and a few XXXs/TODOs to mark future work items.
Improve statistic reporting for shared Rock caches.
Rock cache is shared between multiple processes. We need to make sure
cache related stats are not counted twice by different processes. The
patch makes the disker process the only one to report Rock store
statistics.
Some global variables for cache related stats are replaced with Store
class methods. This is needed because it may be difficult or
impossible to correctly update these variables for shared caches.
The patch also fixes cache manager output for some requests like
mgr:storedir. Before the change, stats from disker processes were not
surrounded with "by kidN".
These macros are required for ./configure to run on an OS such as MingW.
The macro to detect pkg-config being present is usually only bundled with
pkg-config. When there is no pkg-config installed ./configure will fail.
This allows our configure to detect the absence and mark some components
as unavailable or unusable.
Fixes NTLM and Negotiate auth assertion "RefCountCount() == 2"
It turns out the replay cache and invalid RefCount cases this was added to
protect against are not present anyway. After some minor cleanup to remove
double-calls in Negotiate, things appear to run nicely.
NOTE:
There is still a risk that these problem cases may in future occur, but
meanwhile we need NTLM and Negotiate to be usable and efficient.
The bugs resulting from those can be dealt with if/when they do occur.
Markus Moeller [Fri, 15 Apr 2011 11:51:15 +0000 (05:51 -0600)]
negotiate_wrapper_auth: version 1.0.1
A helper to perform Negotiate authentication in both its Negotiate/NTLM
and Negotiate/Kerberos forms.
Makes use of additional Squid helpers after unwrapping the header token.
Alex Rousskov [Thu, 14 Apr 2011 22:20:55 +0000 (16:20 -0600)]
Call haveParsedReplyHeaders() before entry->replaceHttpReply().
HaveParsedReplyHeaders() sets the entry public key and various flags (at
least). ReplaceHttpReply() packs reply headers, starting swapout process.
It feels natural to adjust the entry _before_ we pack/swap it, but I may be
missing some side-effects here.
The change was necessary because we started calling checkCachable() from
swapoutPossible(). If haveParsedReplyHeaders() is not called before we swap
out checks, the entry will still have the private key and will be declared
impossible to cache.
Alex Rousskov [Thu, 14 Apr 2011 04:25:35 +0000 (22:25 -0600)]
Polished shared memory initialization sequence, using RunnersRegistry API.
The master process is now responsible for initializing all shared memory
segments before starting kids. The kids do not create new segments and attach
to the already initialized segments instead. This approach may not scale
forever, but it avoids more complex initialization synchronization via
Coordinator.
Do not use Strings for globals because current string memory pools do not
support early initialization.
Alex Rousskov [Thu, 14 Apr 2011 04:22:25 +0000 (22:22 -0600)]
Added RunnersRegistry, an API to register and, later, run a group of actions.
Useful for keeping general initialization management code (e.g., main.cc)
independent from specific initialization code (e.g., Ipc::Mem::Init) during
staged initialization and cleaning.
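A registry of this kind can be sketched as a map from a stage name to a list of actions. The function names and the use of string stage names below are assumptions made for this example, not the RunnersRegistry interface itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

typedef std::function<void()> Action;

// All registries, keyed by a stage name chosen by the caller.
static std::map<std::string, std::vector<Action> > &Registries()
{
    static std::map<std::string, std::vector<Action> > registries;
    return registries;
}

// Called by specific initialization code to add its action to a stage.
void registerAction(const std::string &stage, const Action &action)
{
    Registries()[stage].push_back(action);
}

// Called by general code (e.g., a main function) to run one stage.
// Returns the number of actions that were run.
std::size_t runRegistered(const std::string &stage)
{
    auto it = Registries().find(stage);
    if (it == Registries().end())
        return 0;
    for (const Action &action : it->second)
        action(); // actions run in registration order
    return it->second.size();
}
```

The general code only needs to know stage names, while each module decides what its registered action does.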
Also, shuffle the resulting classes into their own compilation units.
No logic changes.
Have omitted shuffling or altering two Auth::Basic::User methods handling
the validation short-circuit since these should not be part of that class.
Followup patch will move them appropriately.
Alex Rousskov [Wed, 13 Apr 2011 17:03:08 +0000 (11:03 -0600)]
Fixed shared memory cleanup code -- we were not returning freed pages to Pages.
Added Ipc::StoreMapCleaner API so that map users are notified when the slot is
about to be overwritten or freed. Users need a chance to update their state
(e.g., return the no longer used shared page) before the extra information in
the slot disappears.
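The notification idea can be sketched with a small callback interface: the map calls the cleaner just before a slot is overwritten or freed, so the user can still look up and return the page tied to that slot. All class and member names below are hypothetical and only loosely modeled on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Callback interface implemented by map users (hypothetical name).
class SlotCleaner
{
public:
    virtual ~SlotCleaner() {}
    // Called while the extra information for slotId is still available.
    virtual void cleanSlot(std::size_t slotId) = 0;
};

// A tiny map that tracks only which slots are in use.
class SlotMap
{
public:
    explicit SlotMap(std::size_t slotCount):
        used(slotCount, false), cleaner(nullptr) {}

    void setCleaner(SlotCleaner *aCleaner) { cleaner = aCleaner; }

    // Overwriting a used slot frees it first, which notifies the cleaner.
    void occupy(std::size_t slotId) {
        freeSlot(slotId);
        used[slotId] = true;
    }

    void freeSlot(std::size_t slotId) {
        if (!used[slotId])
            return;
        if (cleaner)
            cleaner->cleanSlot(slotId); // notify before the slot is reset
        used[slotId] = false;
    }

private:
    std::vector<bool> used;
    SlotCleaner *cleaner;
};

// A map user that keeps one page number per slot (-1 means no page).
class PageUser: public SlotCleaner
{
public:
    explicit PageUser(std::size_t slotCount): pageOfSlot(slotCount, -1) {}

    void cleanSlot(std::size_t slotId) override {
        if (pageOfSlot[slotId] >= 0) {
            freePages.push_back(pageOfSlot[slotId]); // return the page
            pageOfSlot[slotId] = -1;
        }
    }

    std::vector<int> pageOfSlot;
    std::vector<int> freePages;
};
```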
Alex Rousskov [Wed, 13 Apr 2011 05:19:00 +0000 (23:19 -0600)]
Fixed how storeSwapOutStart() prevents repeated calls on failures.
We used to release the entry to signal that swapout is not possible. That hack
worked for disk caching, but it prevents nearly all memory caching because
released entries cannot be cached in memory.
A polished solution is to explicitly remember whether we made the decision to
allow or reject a swapout. The decision is now stored in MemObject::SwapOut.
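Remembering the decision explicitly could look like the sketch below. The enum values, the Entry structure, and its fields are assumptions for illustration; they are not the actual MemObject::SwapOut members.

```cpp
#include <cassert>

// Hypothetical three-state decision: not yet made, allowed, or rejected.
enum SwapOutDecision { swNeedsCheck, swPossible, swImpossible };

// Minimal stand-in for a store entry.
struct Entry
{
    SwapOutDecision decision = swNeedsCheck;
    bool cachable = false; // stands in for the real cachability checks
    int checksMade = 0;    // counts how often the checks actually ran
};

// Runs the checks at most once per entry and then reuses the stored
// decision, so a failure does not require releasing the entry.
bool swapoutPossible(Entry &e)
{
    if (e.decision == swNeedsCheck) {
        ++e.checksMade;
        e.decision = e.cachable ? swPossible : swImpossible;
    }
    return e.decision == swPossible;
}
```

Repeated calls for a rejected entry return false without repeating the checks, and the entry itself is left untouched and can still be cached in memory.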
Call StoreEntry::checkCachable() from StoreEntry::swapoutPossible(). This
allows us to make the decision sooner in some cases. Needs more work because
some checks in the two functions overlap and "too many files" checks in
checkCachable() should not be there at all.
Added an XXX for the checkCachable() call at the end of swapout. Out of this
project scope.
Uses hard-coded string "cachemgr.cgi/" instead of progname to avoid
complications from alternative names and when running under a browser.
May be elided in transit; however, the VERSION sent here will help the
queried proxy respond appropriately to the CGI capabilities as we extend
the types and content of reports coming back from future releases.
Alex Rousskov [Tue, 12 Apr 2011 00:33:41 +0000 (18:33 -0600)]
Added initial shared memory cache implementation (MemStore) and integrated it.
Like Rock Store, shared memory cache keeps its own compact index of cached
entries using extended Ipc::StoreMap class (MemStoreMap). Like Rock Store, the
cache also struggles to keep its Root.get() results out of the store_table
except during transit.
There are several XXXs and TODOs that still need to be addressed for a more
polished implementation.
Eventually, the non-shared/local memory cache should also be implemented
using a MemStore-like class, I think. This will allow us to clearly isolate
local from shared memory cache code.
Alex Rousskov [Mon, 11 Apr 2011 23:50:50 +0000 (17:50 -0600)]
Avoid creating unlocked store_table entries when handling rebuild conflicts.
Such StoreEntry objects persist until a hit locks and unlocks them (or the
replacement policy removes them?), creating SMP synchronization problems
because they are treated as in-transit objects even though their store slot
may be gone already.
Also, no code shuffling which should normally have been done with namespace.
Config children are currently too intertwined with UserRequest children and
helper management. Logic changes are required before that can be done.
Alex Rousskov [Sat, 9 Apr 2011 04:24:06 +0000 (22:24 -0600)]
Split Rock-only Rock::DirMap into Rock::DirMap and reusable Ipc pieces
which a shared memory cache implementation can use:
Ipc::StoreMap is responsible for maintaining a collection of lockable slots,
each with readable/writeable/free state and a "waiting to be free" flag. Kids
of this class can add more metadata (in parallel structures using the same
index as primary slots). I tried extending the slots themselves, but that
turned out to be more complex/messy.
Ipc::ReadWriteLock is a basic multiple readers, single writer lock. Its
earlier implementation inside Rock::DirMap mixed slot locking and slot
state/flags. That simplified the caller code a little, but the current simpler
class is easier to understand and reuse.
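A basic multiple-readers, single-writer lock that keeps no slot state can be sketched with two atomic counters. This is a non-blocking "try lock" sketch written for this log; the method names are assumptions and may differ from the real Ipc::ReadWriteLock.

```cpp
#include <atomic>
#include <cassert>

class ReadWriteLock
{
public:
    // Succeeds unless a writer holds or is acquiring the lock.
    bool lockShared() {
        readers.fetch_add(1); // announce the reader first
        if (writers.load() == 0)
            return true;
        readers.fetch_sub(1); // back off: a writer is present
        return false;
    }

    // Succeeds only if there are no readers and no other writers.
    bool lockExclusive() {
        if (writers.fetch_add(1) == 0 && readers.load() == 0)
            return true;
        writers.fetch_sub(1); // back off: the lock is busy
        return false;
    }

    void unlockShared() { readers.fetch_sub(1); }
    void unlockExclusive() { writers.fetch_sub(1); }

private:
    std::atomic<int> readers{0}; // current shared holders
    std::atomic<int> writers{0}; // current or attempting exclusive holders
};
```

Each side announces itself before checking the other counter, so a reader and a writer cannot both succeed at the same time. The price is that two simultaneous attempts may both fail and have to be retried by the caller.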
Rock::DirMap now just adds Rock::DbCellHeader metadata to Ipc::StoreMap slots.
Simplified mapping API by reducing the number of similar-but-different
methods. For example, instead of putAt, the caller can use an
openForWriting/closeForWriting pair. This helps with moving custom metadata
manipulations outside of the reusable Ipc::StoreMap.
It would be possible to split Ipc::StoreMap further by moving Store-specific
bits outside of its slots. Currently, there is no need for that though.
Alex Rousskov [Sat, 9 Apr 2011 04:20:21 +0000 (22:20 -0600)]
Added reserve() method to allow nested classes or similar related users of
the same segment to safely bite off pieces of the same shared segment. Still
need to convert the callers.
The reserve() method is useful for single users as well because it allows
checking that a segment has enough bytes allocated for its single user.
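The reserve() idea can be sketched as a bump allocator over one segment. The Segment class below uses a heap buffer as a stand-in for shared memory, and returning a null pointer on failure is an assumption of this sketch (real code might assert instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Segment
{
public:
    explicit Segment(std::size_t size): mem(size), reserved(0) {}

    // Bites off chunkSize bytes from the unreserved tail of the segment.
    // Returns the chunk start, or nullptr if not enough bytes are left.
    void *reserve(std::size_t chunkSize) {
        if (chunkSize > mem.size() - reserved)
            return nullptr;
        void *chunk = mem.data() + reserved;
        reserved += chunkSize;
        return chunk;
    }

private:
    std::vector<unsigned char> mem; // stand-in for the mapped segment
    std::size_t reserved;           // bytes already handed out
};
```

A single user can call reserve() once with its full size to confirm that the segment is large enough, while nested users get consecutive, non-overlapping chunks.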
Changed theSize type from int to "size of any single object in RAM" size_t.
ConnStateData::flags.readMoreRequests, do_next_read variables, and
ClientSocketContext::mayUseConnection() methods were used (or unused!)
incorrectly or inconsistently.
This change removes all do_next_read variables to simplify the state. Instead,
the renamed ConnStateData::flags.readMore indicates whether client_side.cc
should call comm_read. The mayUseConnection() methods are now used to indicate
whether the next client-sent byte (buffered or read) should be reserved for
the current request rather than being interpreted as the beginning of the next
request.
Portability Fix: getrlimit() / setrlimit() incompatible type 'struct rlimit'
On Linux (at least) with large file support but not a full 64-bit
environment, getrlimit / setrlimit are #define'd to getrlimit64 / setrlimit64,
BUT the struct rlimit internal fields are updated to 64-bit types individually
instead of a matching #define to struct rlimit64 as a whole.
One can only assume that GCC is casting to void* or some such major voodoo
which hides this type collision.
ICC: support 64-bit environments dirent definitions
struct dirent is not consistently defined for 32-bit and 64-bit enabled
environments. Provide a dirent_t type defined appropriately for the environment
for use instead.
This npending test bug was preventing any poll() errors from being
noticed and displayed, possibly leading to some of the weird hanging
reports we have been unable to replicate.
Alex Rousskov [Wed, 6 Apr 2011 16:25:36 +0000 (10:25 -0600)]
Fixed chunked request forwarding in ICAP REQMOD presence.
ICAP prohibits forwarding of hop-by-hop headers in HTTP headers. If the virgin
request has a "Transfer-Encoding: chunked" header, the ICAP server will not
receive it. Thus, when the ICAP server responds with a 200 OK and what it
thinks is a copy of the HTTP request, the adapted request will be missing the
Transfer-Encoding header.
On the server side, Squid used to test whether the request had a
Transfer-Encoding header to determine whether request chunking is needed when
talking to the next HTTP hop. That test would fail in ICAP REQMOD presence.
This change implements a more direct/robust check: if we do not know the
request content length, we chunk the request.
We also no longer forward the Content-Length header if we are chunking. It
should not really be there in most cases, but an explicit check is safer and
may also prevent request smuggling attacks via Connection: Content-Length
tricks.
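The new check can be sketched as a function that prepares headers for the next hop. The function name, the use of a negative value for an unknown content length, and the std::map header container are assumptions of this sketch. Real header names are case-insensitive, which a plain map does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Headers;

// Builds next-hop headers for a request that has a body.
// A negative contentLength means the body length is unknown.
Headers headersForNextHop(const Headers &adapted, std::int64_t contentLength)
{
    Headers out = adapted;
    const bool chunk = contentLength < 0; // unknown length: must chunk
    if (chunk) {
        // Do not forward Content-Length together with chunked encoding.
        out.erase("Content-Length");
        out["Transfer-Encoding"] = "chunked";
    }
    return out;
}
```

The decision no longer depends on whether the adapted request still carries a Transfer-Encoding header, so it keeps working when the ICAP server never saw that header.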
Portability: Provide stdio wrappers for 64-bit in cstdio C++ builds
stdio.h in that case only provides fgetpos64, fopen64 if
__USE_FILE_OFFSET64 is defined. It then checks whether a gcc-specific
__REDIRECT macro is available (defined in sys/cdefs.h, depending on
__GNUC__ being available).
If it is not available, it does a preprocessor #define.
Which <cstdio> undefines, with this comment:
"// Get rid of those macros defined in <stdio.h> in lieu of real functions.".
When it does a namespace redirection ("namespace std { using ::fgetpos; }")
it goes blam, as fgetpos64 is available, while fgetpos is not.
To fix it, we need to supply global functions matching those
signatures (not macros).
Enable string mempools to work correctly during initialization phase
Makes string mempools work before Mem::Init() is called, as may happen
during global variable initialization or early main.cc processing. If
needed, strings allocated before the Mem::Init() call are given an extra
buffer space to make sure the allocated buffer size will not match any
string pool size during deallocation.
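One possible way to keep early buffers distinguishable at deallocation time is to bump any size that coincides with a pool size. The pool sizes and function names below are invented for this sketch, since the commit message does not list them, and the actual patch may pad early allocations differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical string pool buffer sizes, not taken from the patch.
static const std::size_t PoolSizes[] = {36, 128, 512};

// True if a buffer of this size would be treated as a pooled buffer.
bool matchesPoolSize(std::size_t size)
{
    for (std::size_t poolSize : PoolSizes) {
        if (size == poolSize)
            return true;
    }
    return false;
}

// Buffer size for a string allocated before the pools are initialized.
// The result never equals a pool size, so deallocation code can tell
// from the size alone that the buffer did not come from a pool.
std::size_t earlyBufferSize(std::size_t needed)
{
    std::size_t size = needed;
    while (matchesPoolSize(size))
        ++size; // extra space: some RAM is wasted on purpose
    return size;
}
```

Sizes above the largest pool size never match a pool, so such strings need no extra space in this sketch.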
Shortcomings: We now waste RAM on buffer increase for early allocated
strings unless they are already bigger than the maximum supported string
pool size. Statistics for early allocations are broken. Non-string
mempools still do not support early allocations.