Restructure the loop, which gives about a 3% speedup in run time. I
believe the speedup arises from:
o. Removing the conditional branch from the loop.
o. Removing some indirect memory accesses:
The memory accesses to "s->prev_length" and "s->strstart" cannot be
promoted to registers because the compiler is not able to disambiguate
them from the store operations in INSERT_STRING().
o. Converting a non-countable loop to a countable loop.
I'm not sure whether this change really contributes, but in general a
countable loop is much easier to optimize than a non-countable one.
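
As a rough illustration of the three points above, here is a minimal
sketch in C. It is not the actual patch: the "state" struct,
insert_string(), advance_before(), advance_after() and versions_agree()
are hypothetical stand-ins, and the loop shape is only assumed to
resemble the one being restructured. The "before" version re-reads and
re-writes the struct fields around every store and tests a condition on
each iteration; the "after" version copies the fields into locals,
computes the trip count up front, and runs a branch-free counted loop.

```c
#include <assert.h>
#include <string.h>

#define HASH_SIZE 64u /* hypothetical table size, for illustration only */

/* Hypothetical stand-in for the deflate state; only the fields discussed. */
typedef struct {
    unsigned strstart;
    unsigned prev_length;
    unsigned *head; /* hash table, reached through a pointer */
} state;

/* Hypothetical stand-in for INSERT_STRING(): a store through a pointer
 * that the compiler must assume may alias the fields of *s. */
static void insert_string(state *s, unsigned pos) {
    s->head[pos % HASH_SIZE] = pos;
}

/* Before: non-countable loop, a branch per iteration, and the fields
 * s->strstart and s->prev_length reloaded after every store. */
static void advance_before(state *s, unsigned max_insert) {
    assert(s->prev_length >= 3); /* otherwise the counter would wrap */
    s->prev_length -= 2;
    do {
        if (++s->strstart <= max_insert) {
            insert_string(s, s->strstart);
        }
    } while (--s->prev_length != 0);
}

/* After: fields copied to locals once, trip count computed up front,
 * and no conditional branch inside the loop body. */
static void advance_after(state *s, unsigned max_insert) {
    assert(s->prev_length >= 3);
    unsigned start = s->strstart;
    unsigned steps = s->prev_length - 2; /* total positions to advance */
    unsigned inserts = 0;                /* how many of them get inserted */
    if (max_insert > start) {
        inserts = max_insert - start;
        if (inserts > steps) inserts = steps;
    }
    for (unsigned i = 1; i <= inserts; i++) {
        insert_string(s, start + i);
    }
    s->strstart = start + steps;
    s->prev_length = 0;
}

/* Runs both versions on identical inputs; returns 1 if the final state
 * and the hash table contents match. */
int versions_agree(unsigned strstart, unsigned prev_length,
                   unsigned max_insert) {
    unsigned ta[HASH_SIZE] = {0}, tb[HASH_SIZE] = {0};
    state a = { strstart, prev_length, ta };
    state b = { strstart, prev_length, tb };
    advance_before(&a, max_insert);
    advance_after(&b, max_insert);
    return a.strstart == b.strstart && a.prev_length == b.prev_length
        && memcmp(ta, tb, sizeof ta) == 0;
}
```

Whether a given compiler can keep the fields in registers in the
"before" version depends on its alias analysis and flags, so the sketch
only shows the shape of the transformation, not a guaranteed speedup.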