COSS notes

Adrian Chadd <adrian@creative.net.au>

$Id: coss-notes.txt,v 1.3 2003/08/27 21:19:38 wessels Exp $


COSS is a Cyclic Object Storage System originally designed by
Eric Stern <estern@logisense.com>. The idea has been extended
and worked into the current framework by myself.

In these notes I'll discuss the current implementation of COSS
and note where the implementation differs from Eric's original
idea and why the design changes were made.


COSS basics
-----------

COSS works with a single file, which is divided into stripes.
Eventually the file may actually be a raw disk device, but since
Squid doesn't cache the disk reads in memory the OS buffer cache will
need to be employed for reasonable performance. For the purposes of
this discussion the COSS storage device will be referred to as a file.

Each stripe is a fixed size and at a fixed position in the file. The
stripe size is a compile-time option.
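
Because stripes have a fixed size and a fixed position, the file
offset of a stripe follows directly from its index. A minimal sketch,
assuming a hypothetical STRIPE_SIZE constant standing in for the
compile-time stripe size; the helper names are illustrative and not
taken from the COSS code:

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-in for the compile-time stripe size (1 MB here). */
#define STRIPE_SIZE ((off_t)1048576)

/* Byte offset in the COSS file at which stripe number `index` begins. */
static off_t
stripe_start(int index)
{
    assert(index >= 0);
    return (off_t)index * STRIPE_SIZE;
}

/* Index of the stripe that contains byte offset `offset`. */
static int
stripe_of(off_t offset)
{
    assert(offset >= 0);
    return (int)(offset / STRIPE_SIZE);
}
```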

As objects are written to a COSS stripe, their place is pre-reserved
and data is copied into a memory copy of the stripe. Because of this,
the object size must be known before it can be stored in a COSS
filesystem. (Hence the max-size requirement with a coss cache_dir.)

When a stripe is filled, the stripe is written to disk, and a new
memory stripe is created.
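
The reserve-then-copy behaviour can be sketched as follows. This is
an illustration only: the struct layout, the tiny STRIPE_SIZE and
the function names are assumptions, not the real COSS structures. A
return value of -1 stands for "the stripe is full, so write it to
disk and start a new one".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRIPE_SIZE 4096        /* hypothetical; small for illustration */

struct mem_stripe {
    size_t file_start;          /* offset of this stripe in the COSS file */
    size_t used;                /* bytes already reserved in this stripe */
    char buf[STRIPE_SIZE];      /* in-memory copy of the stripe */
};

/*
 * Reserve `size` bytes for an object whose size is known up front.
 * Returns the file offset of the reservation, or -1 if the object does
 * not fit and the stripe must be flushed first.
 */
static long
stripe_reserve(struct mem_stripe *s, size_t size)
{
    if (size > STRIPE_SIZE - s->used)
        return -1;
    long offset = (long)(s->file_start + s->used);
    s->used += size;
    return offset;
}

/* Copy object data into a slot previously returned by stripe_reserve(). */
static void
stripe_copy_in(struct mem_stripe *s, long offset, const void *data, size_t size)
{
    size_t pos = (size_t)offset - s->file_start;
    assert(pos + size <= s->used);
    memcpy(s->buf + pos, data, size);
}
```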

When objects are read back from the COSS file, they can either come
from a stripe in-memory (the current one, or one being written),
or from the disk. If the object is still in a memory stripe, then
it is copied from memory rather than read off disk.
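
The memory-or-disk decision amounts to a range check against each
stripe still held in memory. A minimal sketch, assuming a
hypothetical linked list of memory stripes that each cover the file
offsets [diskstart, diskend); the real lookup in COSS may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stripe covering file offsets [diskstart, diskend). */
struct membuf {
    long diskstart;
    long diskend;
    const char *data;
    struct membuf *next;
};

/*
 * Return a pointer to the object's bytes if `offset` lies in a stripe
 * that is still in memory, or NULL if the caller must read from disk.
 */
static const char *
find_in_memory(const struct membuf *list, long offset)
{
    for (; list != NULL; list = list->next) {
        if (offset >= list->diskstart && offset < list->diskend)
            return list->data + (offset - list->diskstart);
    }
    return NULL;
}
```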

If an object is read from disk, it is re-written to the head of
the current stripe, just as if it were a new object. This is required
for correct operation of the replacement policy, detailed below.

When the entire COSS file is full, the current stripe again becomes the
first stripe in the file, and the objects in that stripe are released.
Since the objects on disk are kept in strict LRU order, matching the
replacement policy's LRU list that links the StoreEntry objects
together, this simply involves walking the tail of the LRU and freeing
entries until we hit an entry in the next stripe.
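
The wrap-around and release step might look like the sketch below.
The entry and list types are hypothetical simplifications of the
StoreEntry LRU, and "releasing" is reduced to setting a flag:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a StoreEntry on the LRU list. */
struct entry {
    long offset;            /* where the object lives in the COSS file */
    struct entry *newer;    /* next entry towards the head of the LRU */
    int released;
};

struct lru {
    struct entry *tail;     /* least recently used entry */
};

/* Stripe that follows `cur`, wrapping to the first stripe at the end. */
static int
next_stripe(int cur, int nstripes)
{
    assert(nstripes > 0);
    return (cur + 1) % nstripes;
}

/*
 * Walk the LRU from the tail, releasing entries stored in the stripe
 * [start, end), and stop at the first entry that lies outside it.
 * Returns the number of entries released.
 */
static int
release_stripe(struct lru *l, long start, long end)
{
    int n = 0;
    while (l->tail != NULL && l->tail->offset >= start && l->tail->offset < end) {
        struct entry *e = l->tail;
        l->tail = e->newer;
        e->released = 1;
        n++;
    }
    return n;
}
```

The early stop only works because disk order and LRU order agree,
which is why objects read from disk are re-written to the current
stripe.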


COSS implementation details
---------------------------

* The stripe size is fixed. In the original COSS code, Eric optimised
  this a little by allowing the stripes to be truncated so as not to
  waste disk space at the end of the stripe. This was removed
  to simplify the allocation code slightly and make things easier
  when the store log and checksums are combined in the stripe
  for faster rebuilds.

* COSS currently copies object memory around WAY too much. This needs
  to be fixed eventually.

* It would be nice if the storeRead() interface were a little smarter
  and allowed the filesystem to return as much of an object as possible.
  This would be good for COSS since the read from disk could be simplified
  to use a single OS read() call. This would work really well for
  the object types COSS is designed to cache.

* The original COSS code used file_read() and file_write() for disk IO.
  The file_* routines were initially used to implement async disk IO,
  and Eric probably wrote some async disk code for Windows.
  I've written a very simple async_io.c module which uses POSIX
  AIO to implement the async IO. POSIX AIO is well-suited to the
  disk IO COSS performs.

COSS direction
--------------

Eventually, when more of Squid is rewritten, I'm going to replace
the replacement policy with something a little more flexible.
A shortcut would be to use a slab allocator and have one slab per
stripe for the StoreEntry objects. When it comes time to replace a
stripe, you can just treat the stripe as an array. This would not work
well in the current Squid codebase, but it would work well in the
planned rewrite. This would also allow alternate replacement policies
to be used. It would also cut down the storage requirements per
StoreEntry by two pointers (8 bytes on i386).

Notes by DW July 23, 2003
-------------------------

Fixed up swap_filen -> offset implementation.  The user can now use a
block-size setting to determine the maximum COSS cache_dir size.
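
The relationship between block-size and the maximum cache_dir size
can be sketched like this. It assumes that swap_filen is used as a
block number and that 24 bits of it are usable. Both the 24-bit
figure and the function names are assumptions made for illustration,
not values taken from these notes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: number of usable bits in swap_filen. */
#define FILENO_BITS 24

/* File offset of the block numbered `swap_filen`. */
static uint64_t
coss_offset(uint32_t swap_filen, uint32_t block_size)
{
    assert(swap_filen < ((uint32_t)1 << FILENO_BITS));
    return (uint64_t)swap_filen * block_size;
}

/* Largest COSS file addressable with the given block size, in bytes. */
static uint64_t
coss_max_size(uint32_t block_size)
{
    return ((uint64_t)1 << FILENO_BITS) * block_size;
}
```

Under that assumption, a block-size of 512 bytes allows a cache_dir
of up to 8 GB, and each doubling of block-size doubles the limit.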

Fixed bug when cached response is larger than COSS stripe size.
max-size is now required to be less than COSS_MEMBUF_SZ.

Fixed a lockcount bug.  Some aborted requests for cache hits failed
to unlock the CossMemBuf because storeCossReadDone isn't called again.
The solution is to add a locked_membuf pointer to the CossState
structure and always unlock it if set.  This is probably more reliable
than unlocking based on diskstart/diskend offsets.
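
The shape of that fix, reduced to a sketch: the structs below are
hypothetical cut-down stand-ins for CossMemBuf and CossState, and
the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down stand-ins for CossMemBuf and CossState. */
struct membuf {
    int lockcount;
};

struct coss_state {
    struct membuf *locked_membuf;   /* membuf this request holds, if any */
};

/* Lock a membuf for a read and remember it in the per-request state. */
static void
state_lock(struct coss_state *cs, struct membuf *m)
{
    assert(cs->locked_membuf == NULL);
    m->lockcount++;
    cs->locked_membuf = m;
}

/*
 * Called on every close path, including aborted requests: unlock the
 * membuf if one is recorded, and clear the pointer so that a second
 * call cannot unlock twice.
 */
static void
state_close(struct coss_state *cs)
{
    if (cs->locked_membuf != NULL) {
        cs->locked_membuf->lockcount--;
        cs->locked_membuf = NULL;
    }
}
```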

I'm worried that COSS is susceptible to a denial-of-service attack.  If
the user can create N cache misses for responses about as large as
COSS_MEMBUF_SZ, then COSS probably allocates N membufs (stripes)
at the same time.  For large enough values of N, this should cause
a malloc failure.  A solution may be to refuse to allocate new stripes
(thus returning failure for cache misses and hits) after so many
have already been allocated.
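
One way to express such a limit, as a sketch only: MAX_MEMBUFS and
the function names are hypothetical, and the real code would need to
turn a NULL return into a failed cache miss or hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap on the number of simultaneously allocated membufs. */
#define MAX_MEMBUFS 4

static int membuf_count = 0;

/* Allocate a membuf, or return NULL once the cap has been reached. */
static void *
membuf_alloc(size_t size)
{
    if (membuf_count >= MAX_MEMBUFS)
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        membuf_count++;
    return p;
}

/* Free a membuf and give its slot back. */
static void
membuf_free(void *p)
{
    if (p != NULL) {
        assert(membuf_count > 0);
        free(p);
        membuf_count--;
    }
}
```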

Adrian's code has this comment:

    /* Since we're not supporting NOTIFY anymore, lets fail */
    assert(which != COSS_ALLOC_NOTIFY);

However, COSS_ALLOC_NOTIFY was still present in the store_dir_coss.c
rebuild routines.  To avoid assertions during rebuild, I commented
out the storeCossAllocate(SD, e, COSS_ALLOC_NOTIFY) call.