Commit | Line | Data |
---|---|---|
a2c95a72 SR |
1 | AMCC suggested setting the PMU bit to 0 for best performance on the |
2 | PPC440 DDR controller. The common 440 DDR setup files (sdram.c and | |
3 | spd_sdram.c) have been changed accordingly, so all 440 boards using | |
4 | these setup routines automatically receive this performance | |
5 | increase. | |
6 | ||
7 | The benchmarks below, run by AMCC, demonstrate this performance | |
8 | change: | |
9 | ||
10 | ||
11 | ---------------------------------------- | |
a187559e | 12 | SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone) |
a2c95a72 SR |
13 | ---------------------------------------- |
14 | Stream benchmark results | |
15 | ------------------------------------------------------------- | |
16 | This system uses 8 bytes per DOUBLE PRECISION word. | |
17 | ------------------------------------------------------------- | |
18 | Array size = 2000000, Offset = 0 | |
19 | Total memory required = 45.8 MB. | |
20 | Each test is run 10 times, but only | |
21 | the *best* time for each is used. | |
22 | ------------------------------------------------------------- | |
23 | Your clock granularity/precision appears to be 1 microseconds. | |
24 | Each test below will take on the order of 112345 microseconds. | |
25 | (= 112345 clock ticks) | |
26 | Increase the size of the arrays if this shows that you are not getting | |
27 | at least 20 clock ticks per test. | |
28 | ------------------------------------------------------------- | |
29 | WARNING -- The above is only a rough guideline. | |
30 | For best results, please be sure you know the precision of your system | |
31 | timer. | |
32 | ------------------------------------------------------------- | |
33 | Function Rate (MB/s) RMS time Min time Max time | |
34 | Copy: 256.7683 0.1248 0.1246 0.1250 | |
35 | Scale: 246.0157 0.1302 0.1301 0.1302 | |
36 | Add: 255.0316 0.1883 0.1882 0.1885 | |
37 | Triad: 253.1245 0.1897 0.1896 0.1899 | |
38 | ||
39 | ||
40 | TTCP Benchmark Results | |
41 | ttcp-t: socket | |
42 | ttcp-t: connect | |
43 | ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp -> | |
44 | localhost | |
45 | ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++ | |
46 | ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57 | |
47 | ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw | |
48 | ||
49 | ---------------------------------------- | |
50 | SDRAM0_CFG0[PMU] = 0 (Suggested modification) | |
51 | Setting PMU = 0 provides a noticeable performance improvement: | |
52 | * 2% to 5% improvement in memory performance. | |
53 | * Improves the Mbit/sec for the TTCP benchmark by about 77%. | |
54 | ---------------------------------------- | |
55 | Stream benchmark results | |
56 | ------------------------------------------------------------- | |
57 | This system uses 8 bytes per DOUBLE PRECISION word. | |
58 | ------------------------------------------------------------- | |
59 | Array size = 2000000, Offset = 0 | |
60 | Total memory required = 45.8 MB. | |
61 | Each test is run 10 times, but only | |
62 | the *best* time for each is used. | |
63 | ------------------------------------------------------------- | |
64 | Your clock granularity/precision appears to be 1 microseconds. | |
65 | Each test below will take on the order of 120066 microseconds. | |
66 | (= 120066 clock ticks) | |
67 | Increase the size of the arrays if this shows that you are not getting | |
68 | at least 20 clock ticks per test. | |
69 | ------------------------------------------------------------- | |
70 | WARNING -- The above is only a rough guideline. | |
71 | For best results, please be sure you know the precision of your system | |
72 | timer. | |
73 | ------------------------------------------------------------- | |
74 | Function Rate (MB/s) RMS time Min time Max time | |
75 | Copy: 262.5167 0.1221 0.1219 0.1223 | |
76 | Scale: 258.4856 0.1238 0.1238 0.1240 | |
77 | Add: 262.5404 0.1829 0.1828 0.1831 | |
78 | Triad: 266.8594 0.1800 0.1799 0.1802 | |
79 | ||
80 | TTCP Benchmark Results | |
81 | ttcp-t: socket | |
82 | ttcp-t: connect | |
83 | ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp -> | |
84 | localhost | |
85 | ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++ | |
86 | ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89 | |
87 | ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw | |
88 | ||
89 | ||
90 | 2006-07-28, Stefan Roese <sr@denx.de> |
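
The improvement figures quoted for PMU = 0 can be cross-checked against the two benchmark runs above. The following is a minimal sketch, not part of the original note: the helper name `percent_gain` and the dictionary layout are illustrative assumptions, and the numbers are copied from the Stream "Rate (MB/s)" column and the TTCP "Mbit/sec" lines.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

# Values copied from the benchmark output above.
# Stream entries are rates in MB/s; the TTCP entry is throughput in Mbit/sec.
PMU_1 = {"Copy": 256.7683, "Scale": 246.0157, "Add": 255.0316,
         "Triad": 253.1245, "TTCP": 454.29}
PMU_0 = {"Copy": 262.5167, "Scale": 258.4856, "Add": 262.5404,
         "Triad": 266.8594, "TTCP": 804.06}

# Gain of the PMU = 0 run over the PMU = 1 run, per benchmark.
gains = {name: percent_gain(PMU_1[name], PMU_0[name]) for name in PMU_1}

for name, gain in gains.items():
    print(f"{name:6s} {gain:6.1f} %")
```

The Stream gains land in the range of roughly 2% to 5.5%, and the TTCP gain just under 77%, which matches the summary given with the PMU = 0 results.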