OPTIM: server: start to use aligned allocs in server
This currently covers the per-thread arrays such as idle connections.
These arrays are now cache-aligned so as to put an end to false sharing
between threads. A comparative test on a simple config doing round
robin between 4 servers showed an average rate of 1.75M/s with
alignment vs 1.72M/s without, over 100M requests. More commonly the
gain seems to be below 1%, however. This should mostly help make
measurements more reproducible across multiple runs.
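
For illustration only, the idea of cache-aligning a per-thread array can
be sketched as below. This is not the code from this patch: the struct,
its fields, the helper names and the 64-byte line size are all
assumptions, and the standard C11 aligned_alloc() stands in for whatever
allocator the project really uses.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64  /* assumed cache line size, not probed at runtime */

/* Hypothetical per-thread slot. Aligning the first member to a cache
 * line raises the alignment of the whole struct, so its size is padded
 * up to a multiple of CACHE_LINE and two array entries never share a
 * line.
 */
struct per_thread_slot {
    alignas(CACHE_LINE) unsigned int idle_conns;
    unsigned long long served;
};

static_assert(sizeof(struct per_thread_slot) % CACHE_LINE == 0,
              "slot size must be a multiple of the cache line size");

/* Allocates a zeroed array of <nbthread> slots starting on a cache line
 * boundary. Returns NULL on failure. Release it with free().
 */
static struct per_thread_slot *alloc_per_thread(size_t nbthread)
{
    size_t size = nbthread * sizeof(struct per_thread_slot);
    struct per_thread_slot *arr = aligned_alloc(CACHE_LINE, size);

    if (arr)
        memset(arr, 0, size);
    return arr;
}

/* Returns non-zero if <p> sits on a cache line boundary. */
static int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With such a layout, each thread only writes to its own entry and thus
to its own cache line, which is what removes the false sharing. The
cost is some padding per entry.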