This is a patched version of zlib modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

README.586
match.S

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which has
no real workaround short of rewriting the basic algorithm. The speedup
on average is around 5-10% (which is generally less than the amount of
variance between subsequent executions).
However, when used at level 9 compression, the cache contention can
drop enough for the assembly version to achieve 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that better improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
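
The copy-and-build steps above can be wrapped in a small shell function.
This is only a sketch: the function name build_zlib_586 and its two
directory arguments are hypothetical and not part of zlib or of this
patch. Only the configure and make invocations come from the
instructions above.

```shell
#!/bin/sh
# Sketch: build zlib with the Pentium-optimized match.S.
# build_zlib_586, zlib_dir and patch_dir are illustrative names (assumptions).

build_zlib_586() {
    zlib_dir=$1     # top of the zlib source tree (assumed layout)
    patch_dir=$2    # directory that holds match.S (assumed layout)

    # Refuse to continue if the assembly source is missing.
    if [ ! -f "$patch_dir/match.S" ]; then
        echo "match.S not found in $patch_dir" >&2
        return 1
    fi

    # Step 1: copy match.S into the zlib directory.
    cp "$patch_dir/match.S" "$zlib_dir/" || return 1

    # Step 2: configure and build inside the zlib directory, using the
    # same commands given in the instructions above.
    ( cd "$zlib_dir" && CFLAGS="-O3 -DASMV" ./configure && make OBJA=match.o )
}
```

Called as, for example, build_zlib_586 ./zlib ./patch, it returns a
non-zero status if match.S is missing or if any build step fails.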