DEBUG: thread: report the wait time buckets for lock classes
In addition to the total/average wait time, we now also store the
wait time in 2^N buckets. There are 32 buckets for each type (read,
seek, write), allowing to store wait times from 1-2ns to 2.1-4.3s,
which is quite sufficient, even if we'd want to switch from NS to
CPU cycles in the future. The counters are only reported for non-
zero buckets so as not to visually pollute the output.
This significantly inflates the lock_stat struct, which is now
aligned to 256 bytes and rounded up to 1kB. But that's not really
a problem, given that there's only one per lock label.