From: adrian <>
Date: Wed, 19 Sep 2001 19:23:30 +0000 (+0000)
Subject: Add some initial COSS information for people.
X-Git-Tag: SQUID_3_0_PRE1~1394
X-Git-Url: http://git.ipfire.org/gitweb/gitweb.cgi?a=commitdiff_plain;h=6d80fd3a319087506c3f92158bae350c38cb3dce;p=thirdparty%2Fsquid.git

Add some initial COSS information for people.

From: diskio branch
---

diff --git a/src/fs/coss/coss-notes.txt b/src/fs/coss/coss-notes.txt
new file mode 100644
index 0000000000..6dceaf7a5b
--- /dev/null
+++ b/src/fs/coss/coss-notes.txt
@@ -0,0 +1,91 @@
+COSS notes
+
+Adrian Chadd <adrian@creative.net.au>
+
+$Id: coss-notes.txt,v 1.1 2001/09/19 13:23:30 adrian Exp $
+
+
+COSS is a Cyclic Object storage system originally designed by
+Eric Stern <estern@logisense.com>. The idea has been extended
+and worked into the current framework by myself.
+
+In these notes I'll discuss the current implementation of COSS
+and note where the implementation differed from Eric's original
+idea and why the design changes were made.
+
+
+COSS basics
+-----------
+
+COSS works with a single file. Eventually the file may actually be
+a raw disk device, but since squid doesn't cache the disk reads
+in memory the OS buffer cache will need to be employed for reasonable
+performance. For the purposes of this discussion the COSS storage
+device will be referred to as a file.
+
+Each stripe is a fixed size an in a fixed position in the file. The
+stripe size is a compile-time option.
+
+As objects are written to a COSS stripe, their place is pre-reserved
+and data is copied into a memory copy of the stripe. Because of this,
+the object size must be known before it can be stored in a COSS
+filesystem. (Hence the max-size requirement with a coss cache_dir.)
+
+When a stripe is filled, the stripe is written to disk, and a new
+memory stripe is created.
+
+When objects are read back from the COSS file, they can either come
+from a stripe in-memory (the current one, or one being written),
+or from the disk. If the object is still in a memory stripe, then
+it is copied from memory rather than read of disk.
+
+If an object is read from disk, it is re-written to the head of
+the current stripe (just as if it were a new object.) This is required
+for correct operation of the replacement policy, detailed below.
+
+When the entire COSS file is full, the current stripe again becomes the
+fist stripe in the file, and the objects in that stripe are released.
+Since the objects on disk are kept in a strict LRU representing the
+replacement policy LRU linking the StoreEntry's together, this simply
+involves walking the tail of the LRU and freeing entries until we
+hit an entry in the next stripe.
+
+
+COSS implementation details
+---------------------------
+
+* The stripe size is fixed. In the original COSS code, Eric optimised
+  this a little by allowing the stripes to be truncated to not
+  waste disk space at the end of the stripe. This was removed
+  to simplify the allocation code slightly and make things easier
+  when the store log and checksums are combined in the stripe
+  for faster rebuilds.
+
+* COSS currently copies object memory around WAY too much. This needs
+  to be fixed eventually.
+
+* It would be nice if the storeRead() interface were a little smarter
+  and allowed the filesystem to return as much of an object as possible.
+  This would be good for COSS since the read from disk could be simplified
+  to use a single OS read() call - this would work really well for
+  the object types COSS is designed to cache.
+
+* The original coss code used file_read() and file_write() for disk IO.
+  The file_* routines were initially used to implement async disk IO,
+  and Eric probably wrote some async disk code for windows.
+  I've written a very very simple async_io.c module which uses POSIX
+  AIO to implement the async IO. POSIX AIO is well-suited to the
+  disk IO COSS performs.
+
+COSS direction
+--------------
+
+Eventually, when more of squid is rewritten, I'm going to replace
+the replacement policy with something a little more flexible.
+A shortcut would be to use a slab allocator and have one slab per
+stripe for the StoreEntry's. When it comes time to replace a stripe,
+you can just treat the stripe as an array. This would not work
+well in the current squid codebase, but it would work well in the
+planned rewrite. This would also allow alternate replacement policies
+to be used. Oh, it'd cut down the storage requirements per
+StoreEntry by two pointers (8 bytes on the i386.)