From: adrian <>
Date: Wed, 19 Sep 2001 19:23:30 +0000 (+0000)
Subject: Add some initial COSS information for people.
X-Git-Tag: SQUID_3_0_PRE1~1394
X-Git-Url: http://git.ipfire.org/gitweb/gitweb.cgi?a=commitdiff_plain;h=6d80fd3a319087506c3f92158bae350c38cb3dce;p=thirdparty%2Fsquid.git

Add some initial COSS information for people.

From: diskio branch
---

diff --git a/src/fs/coss/coss-notes.txt b/src/fs/coss/coss-notes.txt
new file mode 100644
index 0000000000..6dceaf7a5b
--- /dev/null
+++ b/src/fs/coss/coss-notes.txt
@@ -0,0 +1,91 @@
+COSS notes
+
+Adrian Chadd
+
+$Id: coss-notes.txt,v 1.1 2001/09/19 13:23:30 adrian Exp $
+
+
+COSS is a Cyclic Object Storage System originally designed by
+Eric Stern . I have extended the idea
+and worked it into the current framework.
+
+In these notes I'll discuss the current implementation of COSS,
+note where it differs from Eric's original idea, and explain why
+those design changes were made.
+
+
+COSS basics
+-----------
+
+COSS works with a single file. Eventually the file may actually be
+a raw disk device, but since squid doesn't cache disk reads in
+memory, the OS buffer cache is needed for reasonable performance.
+For the purposes of this discussion the COSS storage device will be
+referred to as a file.
+
+The file is divided into stripes. Each stripe has a fixed size and
+a fixed position in the file; the stripe size is a compile-time option.
+
+As objects are written to a COSS stripe, their space is pre-reserved
+and their data is copied into an in-memory copy of the stripe. Because
+of this, the object size must be known before the object can be stored
+in a COSS filesystem (hence the max-size requirement on a coss cache_dir).
+
+When a stripe fills up, it is written to disk and a new memory stripe
+is created.
+
+When objects are read back from the COSS file, they come either from a
+stripe in memory (the current one, or one being written) or from disk.
+If the object is still in a memory stripe, it is copied from memory
+rather than read off disk.
+
+If an object is read from disk, it is re-written at the head of
+the current stripe (just as if it were a new object). This is required
+for correct operation of the replacement policy, detailed below.
+
+When the entire COSS file is full, the current stripe wraps around to
+become the first stripe in the file again, and the objects in that
+stripe are released. Since the on-disk order of objects strictly
+matches the replacement policy's LRU list linking the StoreEntry
+structures together, this simply involves walking the tail of the
+LRU and freeing entries until we hit an entry in the next stripe.
+
+
+COSS implementation details
+---------------------------
+
+* The stripe size is fixed. In the original COSS code, Eric optimised
+  this a little by allowing stripes to be truncated so as not to
+  waste disk space at the end of a stripe. This was removed
+  to simplify the allocation code slightly and to make things easier
+  once the store log and checksums are combined in the stripe
+  for faster rebuilds.
+
+* COSS currently copies object memory around WAY too much. This needs
+  to be fixed eventually.
+
+* It would be nice if the storeRead() interface were a little smarter
+  and allowed the filesystem to return as much of an object as possible.
+  This would help COSS, since the read from disk could then be done
+  with a single OS read() call, which would work really well for
+  the object types COSS is designed to cache.
+
+* The original coss code used file_read() and file_write() for disk IO.
+  The file_* routines were initially used to implement async disk IO,
+  and Eric probably wrote some async disk code for Windows.
+  I've written a very simple async_io.c module which uses POSIX
+  AIO to implement the async IO. POSIX AIO is well suited to the
+  disk IO COSS performs.
+
+COSS direction
+--------------
+
+Eventually, when more of squid is rewritten, I'm going to replace
+the replacement policy with something a little more flexible.
+A shortcut would be to use a slab allocator with one slab per
+stripe for the StoreEntry structures. When it comes time to replace a
+stripe, you can just walk that stripe's slab as an array. This would not
+work well in the current squid codebase, but it would work well in the
+planned rewrite. It would also allow alternate replacement policies
+to be used, and it would cut the storage needed per StoreEntry by
+two pointers (8 bytes on i386).
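The write, read and wrap-around cycle described under "COSS basics" can be illustrated with a minimal, synchronous C sketch. It assumes a tiny in-memory array in place of the COSS file and does not model stripes that are still being written asynchronously. All names (`coss_dir`, `coss_write`, `coss_read`, `STRIPE_SIZE`, `NUM_STRIPES`) are hypothetical and are not squid's actual API. The sketch shows space being pre-reserved in the memory stripe, the stripe being flushed when it is full, disk hits being re-written at the head, and a stripe being reclaimed by walking the LRU tail.

```c
/* Hypothetical sketch of the COSS write/read/wrap cycle; not squid's API. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRIPE_SIZE 64   /* assumed; squid makes this a compile-time option */
#define NUM_STRIPES 4    /* assumed number of stripes in the COSS file */
#define MAX_ENTRIES 64   /* assumed capacity of the metadata pool */

typedef struct entry {
    long offset;                /* byte offset of the object in the file */
    size_t size;
    int in_use;
    struct entry *prev, *next;  /* LRU links: head is newest, tail oldest */
} entry;

typedef struct {
    unsigned char disk[STRIPE_SIZE * NUM_STRIPES]; /* stands in for the file */
    unsigned char membuf[STRIPE_SIZE];  /* memory copy of the current stripe */
    int cur_stripe;
    size_t cur_off;                     /* next free byte in membuf */
    entry pool[MAX_ENTRIES];
    entry *head, *tail;
} coss_dir;

void coss_init(coss_dir *d) { memset(d, 0, sizeof *d); }

static void lru_unlink(coss_dir *d, entry *e) {
    if (e->prev) e->prev->next = e->next; else d->head = e->next;
    if (e->next) e->next->prev = e->prev; else d->tail = e->prev;
    e->prev = e->next = NULL;
}

static void lru_push_head(coss_dir *d, entry *e) {
    e->prev = NULL;
    e->next = d->head;
    if (d->head) d->head->prev = e; else d->tail = e;
    d->head = e;
}

/* LRU order matches on-disk order, so the live objects in the stripe
 * being reused are exactly those at the LRU tail. */
static void release_stripe(coss_dir *d, int stripe) {
    while (d->tail && d->tail->offset / STRIPE_SIZE == stripe) {
        entry *victim = d->tail;
        lru_unlink(d, victim);
        victim->in_use = 0;
    }
}

/* Write the full memory stripe to its fixed position, then move to the
 * next stripe, wrapping to the first one at the end of the file. */
static void flush_and_advance(coss_dir *d) {
    memcpy(d->disk + (size_t)d->cur_stripe * STRIPE_SIZE, d->membuf, STRIPE_SIZE);
    d->cur_stripe = (d->cur_stripe + 1) % NUM_STRIPES;
    d->cur_off = 0;
    memset(d->membuf, 0, STRIPE_SIZE);
    release_stripe(d, d->cur_stripe);
}

/* Pre-reserve space in the current stripe; the size must be known. */
static long reserve(coss_dir *d, size_t size) {
    assert(size > 0 && size <= STRIPE_SIZE);
    if (d->cur_off + size > STRIPE_SIZE)
        flush_and_advance(d);
    long off = (long)d->cur_stripe * STRIPE_SIZE + (long)d->cur_off;
    d->cur_off += size;
    return off;
}

entry *coss_write(coss_dir *d, const unsigned char *data, size_t size) {
    entry *e = NULL;
    for (int i = 0; i < MAX_ENTRIES && e == NULL; i++)
        if (!d->pool[i].in_use) e = &d->pool[i];
    if (e == NULL) return NULL;
    e->offset = reserve(d, size);
    e->size = size;
    e->in_use = 1;
    memcpy(d->membuf + e->offset % STRIPE_SIZE, data, size);
    lru_push_head(d, e);
    return e;
}

void coss_read(coss_dir *d, entry *e, unsigned char *out) {
    if (e->offset / STRIPE_SIZE == d->cur_stripe) {
        /* Still in the memory stripe: copy from memory, no relocation. */
        memcpy(out, d->membuf + e->offset % STRIPE_SIZE, e->size);
        return;
    }
    memcpy(out, d->disk + e->offset, e->size);
    /* Disk hit: re-write at the head of the current stripe, as if new,
     * so that LRU order keeps matching on-disk order. */
    lru_unlink(d, e);
    e->offset = reserve(d, e->size);
    memcpy(d->membuf + e->offset % STRIPE_SIZE, out, e->size);
    lru_push_head(d, e);
}
```

A memory-stripe hit deliberately leaves the LRU list untouched. Moving such an entry to the head without also moving its bytes would break the "LRU order equals disk order" invariant that lets `release_stripe()` stop at the first entry belonging to another stripe.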
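The "one slab per stripe" idea from "COSS direction" might look roughly like the following C sketch. Here each stripe owns a fixed array of per-object records that carry no LRU prev/next pointers, so reclaiming a stripe becomes a single pass over that array instead of an LRU tail walk. The slot count, the `live` flag, and every name (`slab_dir`, `slab_add`, `slab_release_stripe`) are assumptions for illustration. They are not part of squid and do not represent how the planned rewrite would do it.

```c
/* Hypothetical per-stripe slab sketch; names and sizes are assumptions. */
#include <stddef.h>

#define NUM_STRIPES 4        /* assumed number of stripes */
#define SLOTS_PER_STRIPE 8   /* assumed objects per stripe slab */

/* Per-object metadata with no LRU prev/next pointers. */
typedef struct {
    long offset;   /* byte offset of the object in the COSS file */
    size_t size;
    int live;      /* cleared when the object is relocated or released */
} slab_entry;

/* One fixed array of entries per stripe. */
typedef struct {
    slab_entry slots[SLOTS_PER_STRIPE];
    int used;
} stripe_slab;

typedef struct {
    stripe_slab slabs[NUM_STRIPES];
} slab_dir;

/* Record an object placed in `stripe`; returns NULL if the slab is full. */
slab_entry *slab_add(slab_dir *d, int stripe, long offset, size_t size) {
    stripe_slab *s = &d->slabs[stripe];
    if (s->used == SLOTS_PER_STRIPE)
        return NULL;
    slab_entry *e = &s->slots[s->used++];
    e->offset = offset;
    e->size = size;
    e->live = 1;
    return e;
}

/* Reclaim a stripe by treating its slab as an array: release every
 * still-live entry and empty the slab. Returns the number released. */
int slab_release_stripe(slab_dir *d, int stripe) {
    stripe_slab *s = &d->slabs[stripe];
    int released = 0;
    for (int i = 0; i < s->used; i++) {
        if (s->slots[i].live) {
            s->slots[i].live = 0;
            released++;
        }
    }
    s->used = 0;
    return released;
}
```

When an object is re-written after a disk read, a caller would clear `live` on the old record and call `slab_add()` for the current stripe. The stale slot is then skipped when its stripe is reclaimed.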