Bump the blocksize up from 62 to 64 to speed up the modulo calculation.
Remove the old comment suggesting that it was desireable to have
blocksize+2 as a multiple of the cache line length. That would
have made sense only if the block structure start point was always
aligned to a cache line boundary. However, the memory allocations
are 16 byte aligned, so we don't really have control over whether
the struct spills across cache line boundaries.