When trying to work out why a non-CRC filesystem took 1m57s to repair
and the same filesystem with CRCs enabled took 11m35s to repair, I
noticed that there was far too much CRC checking going on. Prefetched
buffers should not be CRCed, yet shortly after the repair started this
began to happen. perf profiling also showed an awful lot of time spent
doing buffer cache lookups, and the cache profile output indicated that
the hit rate was way below 3%. IOWs, the readahead was getting so far
ahead of the processing that it was thrashing the cache.

That there is a difference in processing rate between CRC and non-CRC
filesystems is not surprising. What is surprising is the readahead
behaviour - it basically just keeps reading ahead until it has read
everything in an AG, then it moves on to the next AG and reads
everything in that one, and then the next, and so on. This continues
until it has pushed all the buffers the processing threads need out of
the cache, at which point they start re-reading from disk with the
various CRC checking verifiers enabled, and we end up going -really-
slow. Yes, threading made up for it a bit, but it's just wrong.

Basically, the code assumes that IO is always going to be slower than
processing, so it doesn't throttle prefetch across AGs to match the
prefetch rate to the processing rate.

So, to fix this, don't let a prefetch thread get more than a single
AG ahead of its processing thread, just as happens for single
threaded (i.e. -o ag_stride=-1) operation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>