On a benchmark that uses zlib to decompress a PNG image, this change shows a
20% speedup. Special-casing distance = 1 for read-after-write dependences makes
sense for two reasons: the loop kernel can be replaced with a memset, which
libc usually implements in assembly, and distance = 1 is by far the most
frequent distance during PNG decompression:
Distance  Frequency
       1    1009001
       6      64500
       9      29000
       3      25500
     144      14500
      12      10000
      15       3500
       7       2000
      24       1000
      21       1000
      18       1000
      87        500
      22        500
     192        500
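
As a rough illustration of why the distance = 1 case reduces to a memset: each
byte copied from out - 1 is the byte written on the previous iteration, so the
whole match is one repeated byte. The sketch below is not zlib code; the helper
names copy_match_bytewise and copy_match_memset are hypothetical, and the
byte-wise loop is a simplified stand-in for the unrolled loop in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte-wise match copy from `dist` bytes behind `out`.
   The regions may overlap, so every read can depend on an earlier write. */
static void copy_match_bytewise(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;
    while (len--)
        *out++ = *from++;
}

/* Hypothetical helper: same result, but dist == 1 is handled with memset,
   because the match is then `len` repetitions of the byte at out[-1]. */
static void copy_match_memset(unsigned char *out, size_t dist, size_t len)
{
    if (dist == 1)
        memset(out, out[-1], len);
    else
        copy_match_bytewise(out, dist, len);
}
```

Both helpers assume the caller guarantees at least dist valid bytes before out
and len writable bytes at out, as the inflate window does for the real loop.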
}
} else {
from = out - dist; /* copy direct from output */
- do { /* minimum length is three */
- *out++ = *from++;
- *out++ = *from++;
- *out++ = *from++;
- len -= 3;
- } while (len > 2);
- if (len) {
- *out++ = *from++;
- if (len > 1)
+ if (dist == 1) {
+ memset (out, *from, len);
+ out += len;
+ } else {
+ do { /* minimum length is three */
+ *out++ = *from++;
+ *out++ = *from++;
*out++ = *from++;
+ len -= 3;
+ } while (len > 2);
+ if (len) {
+ *out++ = *from++;
+ if (len > 1)
+ *out++ = *from++;
+ }
}
}
} else if ((op & 64) == 0) { /* 2nd level distance code */