Commit | Line | Data |
---|---|---|
997358a6 MW |
1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> |
2 | <HTML> | |
3 | <HEAD> | |
4 | <TITLE>Introduction to FreeS/WAN</TITLE> | |
5 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1"> | |
6 | <STYLE TYPE="text/css"><!-- | |
7 | BODY { font-family: serif } | |
8 | H1 { font-family: sans-serif } | |
9 | H2 { font-family: sans-serif } | |
10 | H3 { font-family: sans-serif } | |
11 | H4 { font-family: sans-serif } | |
12 | H5 { font-family: sans-serif } | |
13 | H6 { font-family: sans-serif } | |
14 | SUB { font-size: smaller } | |
15 | SUP { font-size: smaller } | |
16 | PRE { font-family: monospace } | |
17 | --></STYLE> | |
18 | </HEAD> | |
19 | <BODY> | |
20 | <A HREF="toc.html">Contents</A> | |
21 | <A HREF="interop.html">Previous</A> | |
22 | <A HREF="testing.html">Next</A> | |
23 | <HR> | |
24 | <H1><A name="performance">Performance of FreeS/WAN</A></H1> | |
25 | The performance of FreeS/WAN is adequate for most applications. | |
26 | <P>In normal operation, the main concern is the overhead for encryption, | |
27 | decryption and authentication of the actual IPsec (<A href="glossary.html#ESP"> | |
28 | ESP</A> and/or<A href="glossary.html#AH"> AH</A>) data packets. Tunnel | |
29 | setup and rekeying occur so much less frequently than packet processing | |
30 | that, in general, their overheads are not worth worrying about.</P> | |
31 | <P>At startup, however, tunnel setup overheads may be significant. If | |
32 | you reboot a gateway and it needs to establish many tunnels, expect | |
33 | some delay. This and other issues for large gateways are discussed<A href="#biggate"> | |
34 | below</A>.</P> | |
35 | <H2><A name="pub.bench">Published material</A></H2> | |
36 | <P>The University of Wales at Aberystwyth has done quite detailed speed | |
37 | tests and put<A href="http://tsc.llwybr.org.uk/public/reports/SWANTIME/"> | |
38 | their results</A> on the web.</P> | |
39 | <P>Davide Cerri's<A href="http://www.linux.it/~davide/doc/"> thesis (in | |
40 | Italian)</A> includes performance results for FreeS/WAN and for<A href="glossary.html#TLS"> | |
41 | TLS</A>. He posted an<A href="http://lists.freeswan.org/pipermail/users/2001-December/006303.html"> | |
42 | English summary</A> on the mailing list.</P> | |
43 | <P>Steve Bellovin used one of AT&amp;T Research's FreeS/WAN gateways as his | |
44 | data source for an analysis of the cache sizes required for key | |
45 | swapping in IPsec. His talk on the topic is available as<A href="http://www.research.att.com/~smb/talks/key-agility.email.txt"> | |
46 | text</A> or as<A href="http://www.research.att.com/~smb/talks/key-agility.pdf"> | |
47 | PDF slides</A>.</P> | |
48 | <P>See also the NAI work mentioned in the next section.</P> | |
49 | <H2><A name="perf.estimate">Estimating CPU overheads</A></H2> | |
50 | <P>We can come up with a formula that roughly relates CPU speed to the | |
51 | rate of IPsec processing possible. It is far from exact, but should be | |
52 | usable as a first approximation.</P> | |
53 | <P>An analysis of authentication overheads for high-speed networks, | |
54 | including some tests using FreeS/WAN, is on the<A href="http://www.pgp.com/research/nailabs/cryptographic/adaptive-cryptographic.asp"> | |
55 | NAI Labs site</A>. In particular, see figure 3 in this<A href="http://download.nai.com/products/media/pgp/pdf/acsa_final_report.pdf"> | |
56 | PDF document</A>. Their estimates of overheads, measured in Pentium II | |
57 | cycles per byte processed, are:</P> | |
58 | <TABLE align="center" border="1"><TBODY></TBODY> | |
59 | <TR><TH></TH><TH>IPsec</TH><TH>authentication</TH><TH>encryption</TH><TH> | |
60 | cycles/byte</TH></TR> | |
61 | <TR><TD>Linux IP stack alone</TD><TD>no</TD><TD>no</TD><TD>no</TD><TD align="right"> | |
62 | 5</TD></TR> | |
63 | <TR><TD>IPsec without crypto</TD><TD>yes</TD><TD>no</TD><TD>no</TD><TD align="right"> | |
64 | 11</TD></TR> | |
65 | <TR><TD>IPsec, authentication only</TD><TD>yes</TD><TD>SHA-1</TD><TD>no</TD><TD | |
66 | align="right">24</TD></TR> | |
67 | <TR><TD>IPsec with encryption</TD><TD>yes</TD><TD>yes</TD><TD>yes</TD><TD | |
68 | align="right">not tested</TD></TR> | |
69 | </TABLE> | |
70 | <P>Overheads for IPsec with encryption were not tested in the NAI work, | |
71 | but Antoon Bosselaers'<A href="http://www.esat.kuleuven.ac.be/~bosselae/fast.html"> | |
72 | web page</A> gives the cost of his optimised Triple DES implementation as | |
73 | 928 Pentium cycles per 8-byte block, or 116 per byte. Adding that to the 24 | |
74 | above, we get 140 cycles per byte for IPsec with encryption.</P> | |
75 | <P>At 140 cycles per byte, a 140 MHz machine can handle a megabyte -- 8 | |
76 | megabits -- per second. Speeds for other machines will be proportional | |
77 | to this. To saturate a link with capacity C megabits per second, you | |
78 | need a machine running at<VAR> C * 140/8 = C * 17.5</VAR> MHz.</P> | |
79 | <P>However, that estimate is not precise. It ignores the differences | |
80 | between:</P> | |
81 | <UL> | |
82 | <LI>NAI's test packets and real traffic</LI> | |
83 | <LI>NAI's Pentium II cycles, Bosselaers' Pentium cycles, and your | |
84 | machine's cycles</LI> | |
85 | <LI>different 3DES implementations</LI> | |
86 | <LI>SHA-1 and MD5</LI> | |
87 | </UL> | |
88 | <P>and does not account for some overheads you will almost certainly | |
89 | have:</P> | |
90 | <UL> | |
91 | <LI>communication on the client-side interface</LI> | |
92 | <LI>switching between multiple tunnels -- re-keying, cache reloading and | |
93 | so on</LI> | |
94 | </UL> | |
95 | <P>so we suggest using<VAR> C * 25</VAR> to get an estimate with a bit | |
96 | of a built-in safety factor.</P> | |
97 | <P>This covers only IP and IPsec processing. If you have other loads on | |
98 | your gateway -- for example if it is also working as a firewall -- then | |
99 | you will need to add your own safety factor atop that.</P> | |
100 | <P>This estimate matches empirical data reasonably well. For example, | |
101 | Metheringham's tests, described<A href="#klips.bench"> below</A>, show | |
102 | a 733 MHz machine topping out between 32 and 36 Mbit/second, pushing data as fast | |
103 | as it can down a 100 Mbit link. Our formula suggests you need at least | |
104 | an 800 MHz machine to handle a fully loaded 32 Mbit link. The two results are | |
105 | consistent.</P> | |
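<P>As a rough illustration of the arithmetic above, the following Python sketch turns a link speed into the two clock-speed estimates. The function and constant names are hypothetical, chosen for this example; they are not part of FreeS/WAN, and the results are only the first approximation described in this section.</P>

```python
# Rough sizing sketch for a 3DES + SHA-1 IPsec gateway.
# All names here are illustrative; they are not part of FreeS/WAN.

CYCLES_PER_BYTE_AUTH = 24    # NAI figure: IPsec with SHA-1 authentication
CYCLES_PER_BYTE_3DES = 116   # Bosselaers: 928 Pentium cycles per 8-byte block
SAFETY_MULTIPLIER = 25       # suggested MHz per Mbit/s, with a safety margin


def raw_mhz(link_mbit):
    """MHz needed to saturate the link, with no safety margin (C * 17.5)."""
    cycles_per_byte = CYCLES_PER_BYTE_AUTH + CYCLES_PER_BYTE_3DES  # 140
    # Mbit/s divided by 8 is Mbyte/s; times cycles per byte gives
    # millions of cycles per second, that is, MHz.
    return link_mbit * cycles_per_byte / 8


def padded_mhz(link_mbit):
    """MHz estimate using the suggested C * 25 rule."""
    return link_mbit * SAFETY_MULTIPLIER


if __name__ == "__main__":
    for name, mbit in [("Ethernet", 10), ("T3 or E3", 45)]:
        print(name, raw_mhz(mbit), padded_mhz(mbit))
```

<P>For a 10 Mbit Ethernet link this gives 175 MHz without the margin and 250 MHz with it. For the 32 Mbit case discussed above, the padded estimate is 800 MHz.</P>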
106 | <P>Some examples using this estimation method:</P> | |
107 | <TABLE align="center" border="1"><TBODY></TBODY> | |
108 | <TR><TH colspan="2">Interface</TH><TH colspan="3">Machine speed in MHz</TH> | |
109 | </TR> | |
110 | <TR><TH>Type</TH><TH>Mbit per | |
111 | <BR> second</TH><TH>Estimate | |
112 | <BR> Mbit*25</TH><TH>Minimum IPsec gateway</TH><TH>Minimum with other | |
113 | load | |
114 | <P>(e.g. firewall)</P> | |
115 | </TH></TR> | |
116 | <TR><TD>DSL</TD><TD align="right">1</TD><TD align="right">25 MHz</TD><TD rowspan="2"> | |
117 | whatever you have</TD><TD rowspan="2">133, or better if you have it</TD></TR> | |
118 | <TR><TD>cable modem</TD><TD align="right">3</TD><TD align="right">75 MHz</TD> | |
119 | </TR> | |
120 | <TR><TD><STRONG>any link, light load</STRONG></TD><TD align="right"><STRONG> | |
121 | 5</STRONG></TD><TD align="right">125 MHz</TD><TD>133</TD><TD>200+,<STRONG> | |
122 | almost any surplus machine</STRONG></TD></TR> | |
123 | <TR><TD>Ethernet</TD><TD align="right">10</TD><TD align="right">250 MHz</TD><TD> | |
124 | surplus 266 or 300</TD><TD>500+</TD></TR> | |
125 | <TR><TD><STRONG>fast link, moderate load</STRONG></TD><TD align="right"><STRONG> | |
126 | 20</STRONG></TD><TD align="right">500 MHz</TD><TD>500</TD><TD>800+,<STRONG> | |
127 | any current off-the-shelf PC</STRONG></TD></TR> | |
128 | <TR><TD>T3 or E3</TD><TD align="right">45</TD><TD align="right">1125 MHz</TD><TD> | |
129 | 1200</TD><TD>1500+</TD></TR> | |
130 | <TR><TD>fast Ethernet</TD><TD align="right">100</TD><TD align="right"> | |
131 | 2500 MHz</TD><TD align="center" colspan="2" rowspan="2">// not feasible | |
132 | with 3DES in software on current machines //</TD></TR> | |
133 | <TR><TD>OC3</TD><TD align="right">155</TD><TD align="right">3875 MHz</TD> | |
134 | </TR> | |
135 | </TABLE> | |
136 | <P>Such an estimate is far from exact, but should be usable as a minimum | |
137 | requirement for planning. The key observations are:</P> | |
138 | <UL> | |
139 | <LI>older<STRONG> surplus machines</STRONG> are fine for IPsec gateways | |
140 | at loads up to<STRONG> 5 megabits per second</STRONG> or so</LI> | |
141 | <LI>a<STRONG> mid-range new machine</STRONG> can handle IPsec at rates | |
142 | up to<STRONG> 20 megabits per second</STRONG> or more</LI> | |
143 | </UL> | |
144 | <H3><A name="perf.more">Higher performance alternatives</A></H3> | |
145 | <P><A href="glossary.html#AES">AES</A> is a new US government block | |
146 | cipher standard, designed to replace the obsolete<A href="glossary.html#DES"> | |
147 | DES</A>. If FreeS/WAN using<A href="glossary.html#3DES"> 3DES</A> is | |
148 | not fast enough for your application, the AES<A href="web.html#patch"> | |
149 | patch</A> may help.</P> | |
150 | <P>To date (March 2002) we have had only one<A href="http://lists.freeswan.org/pipermail/users/2002-February/007771.html"> | |
151 | mailing list report</A> of measurements with the patch applied. It | |
152 | indicates that, at least for the tested load on that user's network,<STRONG> | |
153 | AES roughly doubles IPsec throughput</STRONG>. If further testing | |
154 | confirms this, it may prove possible to saturate an OC3 link in | |
155 | software on a high-end box.</P> | |
156 | <P>Also, some work is being done toward support of<A href="compat.html#hardware"> | |
157 | hardware IPsec acceleration</A> which might extend the range of | |
158 | requirements FreeS/WAN could meet.</P> | |
159 | <H3><A NAME="11_2_2">Other considerations</A></H3> | |
160 | <P>CPU speed may be the main issue for IPsec performance, but of course | |
161 | it isn't the only one.</P> | |
162 | <P>You need good ethernet cards or other network interface hardware to | |
163 | get the best performance. See this<A href="http://www.ethermanage.com/ethernet/ethernet.html"> | |
164 | ethernet information</A> page and this<A href="http://www.scyld.com/diag"> | |
165 | Linux network driver</A> page.</P> | |
166 | <P>The current FreeS/WAN kernel code is largely single-threaded. It is | |
167 | SMP safe, and will run just fine on a multiprocessor machine (<A href="compat.html#multiprocessor"> | |
168 | discussion</A>), but the load within the kernel is not shared | |
169 | effectively. This means that, for example, to saturate a T3 -- which | |
170 | needs about a 1200 MHz machine -- you cannot expect something like a | |
171 | dual 800 to do the job.</P> | |
172 | <P>On the other hand, SMP machines do tend to share loads well so -- | |
173 | provided one CPU is fast enough for the IPsec work -- a multiprocessor | |
174 | machine may be ideal for a gateway with a mixed load.</P> | |
175 | <H2><A name="biggate">Many tunnels from a single gateway</A></H2> | |
176 | <P>FreeS/WAN allows a single gateway machine to build tunnels to many | |
177 | others. There may, however, be some problems with large numbers of tunnels, as | |
178 | indicated in this message from the mailing list:</P> | |
179 | <PRE>Subject: Re: Maximum number of ipsec tunnels? | |
180 | Date: Tue, 18 Apr 2000 | |
181 | From: "John S. Denker" &lt;jsd@research.att.com&gt; | |
182 | ||
183 | Christopher Ferris wrote: | |
184 | ||
185 | >> What are the maximum number ipsec tunnels FreeS/WAN can handle?? | |
186 | ||
187 | Henry Spencer wrote: | |
188 | ||
189 | >There is no particular limit. Some of the setup procedures currently | |
190 | >scale poorly to large numbers of connections, but there are (clumsy) | |
191 | >workarounds for that now, and proper fixes are coming. | |
192 | ||
193 | 1) "Large" numbers means anything over 50 or so. I routinely run boxes | |
194 | with about 200 tunnels. Once you get more than 50 or so, you need to worry | |
195 | about several scalability issues: | |
196 | ||
197 | a) You need to put a "-" sign in syslogd.conf, and rotate the logs daily | |
198 | not weekly. | |
199 | ||
200 | b) Processor load per tunnel is small unless the tunnel is not up, in which | |
201 | case a new half-key gets generated every 90 seconds, which can add up if | |
202 | you've got a lot of down tunnels. | |
203 | ||
204 | c) There's other bits of lore you need when running a large number of | |
205 | tunnels. For instance, systematically keeping the .conf file free of | |
206 | conflicts requires tools that aren't shipped with the standard freeswan | |
207 | package. | |
208 | ||
209 | d) The pluto startup behavior is quadratic. With 200 tunnels, this eats up | |
210 | several minutes at every restart. I'm told fixes are coming soon. | |
211 | ||
212 | 2) Other than item (1b), the CPU load depends mainly on the size of the | |
213 | pipe attached, not on the number of tunnels. | |
214 | </PRE> | |
215 | <P>It is worth noting that item (1b) applies only to repeated attempts | |
216 | to re-key a data connection (IPsec SA, Phase 2) over an established | |
217 | keying connection (ISAKMP SA, Phase 1). There are two ways to reduce | |
218 | this overhead using settings in<A href="manpage.d/ipsec.conf.5.html"> | |
219 | ipsec.conf(5)</A>:</P> | |
220 | <UL> | |
221 | <LI>set<VAR> keyingtries</VAR> to some small value to limit repetitions</LI> | |
222 | <LI>set<VAR> keylife</VAR> to a short time so that a failing data | |
223 | connection will be cleaned up when the keying connection is reset.</LI> | |
224 | </UL> | |
225 | <P>The overheads for establishing keying connections (ISAKMP SAs, Phase | |
226 | 1) are lower because for these Pluto does not perform expensive | |
227 | operations before receiving a reply from the peer.</P> | |
228 | <P>A gateway that does a lot of rekeying -- many tunnels and/or low | |
229 | settings for tunnel lifetimes -- will also need a lot of<A href="glossary.html#random"> | |
230 | random numbers</A> from the random(4) driver.</P> | |
231 | <H2><A name="low-end">Low-end systems</A></H2> | |
232 | <P><EM>Even a 486 can handle a T1 line</EM>, according to this mailing | |
233 | list message:</P> | |
234 | <PRE>Subject: Re: linux-ipsec: IPSec Masquerade | |
235 | Date: Fri, 15 Jan 1999 11:13:22 -0500 | |
236 | From: Michael Richardson | |
237 | ||
238 | . . . A 486/66 has been clocked by Phil Karn to do | |
239 | 10Mb/s encryption.. that uses all the CPU, so half that to get some CPU, | |
240 | and you have 5Mb/s. 1/3 that for 3DES and you get 1.6Mb/s....</PRE> | |
241 | <P>and a piece of mail from project technical lead Henry Spencer:</P> | |
242 | <PRE>Oh yes, and a new timing point for Sandy's docs... A P60 -- yes, a 60MHz | |
243 | Pentium, talk about antiques -- running a host-to-host tunnel to another | |
244 | machine shows an FTP throughput (that is, end-to-end results with a real | |
245 | protocol) of slightly over 5Mbit/s either way. (The other machine is much | |
246 | faster, the network is 100Mbps, and the ether cards are good ones... so | |
247 | the P60 is pretty definitely the bottleneck.)</PRE> | |
248 | <P>From the above, and from general user experience as reported on the | |
249 | list, it seems clear that a cheap surplus machine -- a reasonable 486, | |
250 | a minimal Pentium box, a Sparc 5, ... -- can easily handle a home | |
251 | office or a small company connection using any of:</P> | |
252 | <UL> | |
253 | <LI>ADSL service</LI> | |
254 | <LI>cable modem</LI> | |
255 | <LI>T1</LI> | |
256 | <LI>E1</LI> | |
257 | </UL> | |
258 | <P>If available, we suggest using a Pentium 133 or better. This should | |
259 | ensure that, even under maximum load, IPsec will use less than half the | |
260 | CPU cycles. You then have enough left for other things you may want on | |
261 | your gateway -- firewalling, web caching, DNS and such.</P> | |
262 | <H2><A name="klips.bench">Measuring KLIPS</A></H2> | |
263 | <P>Here is some additional data from the mailing list.</P> | |
264 | <PRE>Subject: FreeSWAN (specically KLIPS) performance measurements | |
265 | Date: Thu, 01 Feb 2001 | |
266 | From: Nigel Metheringham &lt;Nigel.Metheringham@intechnology.co.uk&gt; | |
267 | ||
268 | I've spent a happy morning attempting performance tests against KLIPS | |
269 | (this is due to me not being able to work out the CPU usage of KLIPS so | |
270 | resorting to the crude measurements of maximum throughput to give a | |
271 | baseline to work out loading of a box). | |
272 | ||
273 | Measurements were done using a set of 4 boxes arranged in a line, each | |
274 | connected to the next by 100Mbit duplex ethernet. The inner 2 had an | |
275 | ipsec tunnel between them (shared secret, but I was doing measurements | |
276 | when the tunnel was up and running - keying should not be an issue | |
277 | here). The outer pair of boxes were traffic generators or traffic sink. | |
278 | ||
279 | The crypt boxes are Compaq DL380s - Uniprocessor PIII/733 with 256K | |
280 | cache. They have 128M main memory. Nothing significant was running on | |
281 | the boxes other than freeswan. The kernel was a 2.2.19pre7 patched | |
282 | with freeswan and ext3. | |
283 | ||
284 | Without an ipsec tunnel in the chain (ie the 2 inner boxes just being | |
285 | 100BaseT routers), throughput (measured with ttcp) was between 10644 | |
286 | and 11320 KB/sec | |
287 | ||
288 | With an ipsec tunnel in place, throughput was between 3268 and 3402 | |
289 | KB/sec | |
290 | ||
291 | These measurements are for data pushed across a TCP link, so the | |
292 | traffic on the wire between the 2 ipsec boxes would have been higher | |
293 | than this.... | |
294 | ||
295 | vmstat (run during some other tests, so not affecting those figures) on | |
296 | the encrypting box shows approx 50% system &amp; 50% idle CPU - which I | |
297 | don't believe at all. Interactive feel of the box was significantly | |
298 | sluggish. | |
299 | ||
300 | I also tried running the kernel profiler (see man readprofile) during | |
301 | test runs. | |
302 | ||
303 | A box doing primarily decrypt work showed basically nothing happening - | |
304 | I assume interrupts were off. | |
305 | A box doing encrypt work showed the following:- | |
306 | Ticks Function Load | |
307 | ~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~ | |
308 | 956 total 0.0010 | |
309 | 532 des_encrypt2 0.1330 | |
310 | 110 MD5Transform 0.0443 | |
311 | 97 kmalloc 0.1880 | |
312 | 39 des_encrypt3 0.1336 | |
313 | 23 speedo_interrupt 0.0298 | |
314 | 14 skb_copy_expand 0.0250 | |
315 | 13 ipsec_tunnel_start_xmit 0.0009 | |
316 | 13 Decode 0.1625 | |
317 | 11 handle_IRQ_event 0.1019 | |
318 | 11 .des_ncbc_encrypt_end 0.0229 | |
319 | 10 speedo_start_xmit 0.0188 | |
320 | 9 satoa 0.0225 | |
321 | 8 kfree 0.0118 | |
322 | 8 ip_fragment 0.0121 | |
323 | 7 ultoa 0.0365 | |
324 | 5 speedo_rx 0.0071 | |
325 | 5 .des_encrypt2_end 5.0000 | |
326 | 4 _stext 0.0140 | |
327 | 4 ip_fw_check 0.0035 | |
328 | 2 rj_match 0.0034 | |
329 | 2 ipfw_output_check 0.0200 | |
330 | 2 inet_addr_type 0.0156 | |
331 | 2 eth_copy_and_sum 0.0139 | |
332 | 2 dev_get 0.0294 | |
333 | 2 addrtoa 0.0143 | |
334 | 1 speedo_tx_buffer_gc 0.0024 | |
335 | 1 speedo_refill_rx_buf 0.0022 | |
336 | 1 restore_all 0.0667 | |
337 | 1 number 0.0020 | |
338 | 1 net_bh 0.0021 | |
339 | 1 neigh_connected_output 0.0076 | |
340 | 1 MD5Final 0.0083 | |
341 | 1 kmem_cache_free 0.0016 | |
342 | 1 kmem_cache_alloc 0.0022 | |
343 | 1 __kfree_skb 0.0060 | |
344 | 1 ipsec_rcv 0.0001 | |
345 | 1 ip_rcv 0.0014 | |
346 | 1 ip_options_fragment 0.0071 | |
347 | 1 ip_local_deliver 0.0023 | |
348 | 1 ipfw_forward_check 0.0139 | |
349 | 1 ip_forward 0.0011 | |
350 | 1 eth_header 0.0040 | |
351 | 1 .des_encrypt3_end 0.0833 | |
352 | 1 des_decrypt3 0.0034 | |
353 | 1 csum_partial_copy_generic 0.0045 | |
354 | 1 call_out_firewall 0.0125 | |
355 | ||
356 | Hope this data is helpful to someone... however the lack of visibility | |
357 | into the decrypt side makes things less clear</PRE> | |
358 | <H2><A name="speed.compress">Speed with compression</A></H2> | |
359 | <P>Another user reported some results for connections with and without | |
360 | IP compression:</P> | |
361 | <PRE>Subject: [Users] Speed with compression | |
362 | Date: Fri, 29 Jun 2001 | |
363 | From: John McMonagle &lt;johnm@advocap.org&gt; | |
364 | ||
365 | Did a couple tests with compression using the new 1.91 freeswan. | |
366 | ||
367 | Running between 2 sites with cable modems. Both using approximately | |
368 | 130 mhz pentium. | |
369 | ||
370 | Transferred files with ncftp. | |
371 | ||
372 | Compressed file was a 6mb compressed installation file. | |
373 | Non compressed was 18mb /var/lib/rpm/packages.rpm | |
374 | ||
375 | Compressed vpn regular vpn | |
376 | Compress file 42.59 kBs 42.08 kBs | |
377 | regular file 110.84 kBs 41.66 kBs | |
378 | ||
379 | Load was about 0 either way. | |
380 | Ping times were very similar a bit above 9 ms. | |
381 | ||
382 | Compression looks attractive to me.</PRE> | |
383 | Later in the same thread, project technical lead Henry Spencer added: | |
384 | <PRE>> is there a reason not to switch compression on? I have large gateway boxes | |
385 | > connecting 3 connections, one of them with a measly DS1 link... | |
386 | ||
387 | Run some timing tests with and without, with data and loads representative | |
388 | of what you expect in production. That's the definitive way to decide. | |
389 | If compression is a net loss, then obviously, leave it turned off. If it | |
390 | doesn't make much difference, leave it off for simplicity and hence | |
391 | robustness. If there's a substantial gain, by all means turn it on. | |
392 | ||
393 | If both ends support compression and can successfully negotiate a | |
394 | compressed connection (trivially true if both are FreeS/WAN 1.91), then | |
395 | the crucial question is CPU cycles. | |
396 | ||
397 | Compression has some overhead, so one question is whether *your* data | |
398 | compresses well enough to save you more CPU cycles (by reducing the volume | |
399 | of data going through CPU-intensive encryption/decryption) than it costs | |
400 | you. Last time I ran such tests on data that was reasonably compressible | |
401 | but not deliberately contrived to be so, this generally was not true -- | |
402 | compression cost extra CPU cycles -- so compression was worthwhile only if | |
403 | the link, not the CPU, was the bottleneck. However, that was before the | |
404 | slow-compression bug was fixed. I haven't had a chance to re-run those | |
405 | tests yet, but it sounds like I'd probably see a different result. </PRE> | |
406 | The bug he refers to was a problem with the compression libraries that | |
407 | had us using C code, rather than assembler, for compression. It was | |
408 | fixed before 1.91. | |
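<P>When comparing such measurements, it can help to express them as a ratio. The Python sketch below does this for the throughput figures in John McMonagle's report above. The function name is hypothetical, chosen for this example, and the figures describe only his two files on his link.</P>

```python
# Throughput figures (kB/s) copied from John McMonagle's report above.
MEASURED = {
    "compressed file": {"with_compression": 42.59, "without": 42.08},
    "regular file": {"with_compression": 110.84, "without": 41.66},
}


def speedup(with_compression, without):
    """Ratio of throughput with IP compression to throughput without it."""
    return with_compression / without


if __name__ == "__main__":
    for label, rates in MEASURED.items():
        print(label, round(speedup(**rates), 2))
```

<P>On these numbers, compression made almost no difference for the already-compressed file and gave a gain of roughly 2.7 times for the uncompressed one. Results for other data and loads may differ, which is why timing tests with representative traffic are the better guide.</P>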
409 | <H2><A name="methods">Methods of measuring</A></H2> | |
410 | <P>If you want to measure the loads FreeS/WAN puts on a system, note | |
411 | that tools such as top or measurements such as load average are | |
412 | more-or-less useless for this. They are not designed to measure | |
413 | something that does most of its work inside the kernel.</P> | |
414 | <P>Here is a message from FreeS/WAN kernel programmer Richard Guy Briggs | |
415 | on this:</P> | |
416 | <PRE>> I have a batch of boxes doing Freeswan stuff. | |
417 | > I want to measure the CPU loading of the Freeswan tunnels, but am | |
418 | > having trouble seeing how I get some figures out... | |
419 | > | |
420 | > - Keying etc is in userspace so will show up on the per-process | |
421 | > and load average etc (ie pluto's load) | |
422 | ||
423 | Correct. | |
424 | ||
425 | > - KLIPS is in the kernel space, and does not show up in load average | |
426 | > I think also that the KLIPS per-packet processing stuff is running | |
427 | > as part of an interrupt handler so it does not show up in the | |
428 | > /proc/stat system_cpu or even idle_cpu figures | |
429 | ||
430 | It is not running in interrupt handler. It is in the bottom half. | |
431 | This is somewhere between user context (careful, this is not | |
432 | userspace!) and hardware interrupt context. | |
433 | ||
434 | > Is this correct, and is there any means of instrumenting how much the | |
435 | > cpu is being loaded - I don't like the idea of a system running out of | |
436 | > steam whilst still showing 100% idle CPU :-) | |
437 | ||
438 | vmstat seems to do a fairly good job, but use a running tally to get a | |
439 | good idea. A one-off call to vmstat gives different numbers than a | |
440 | running stat. To do this, put an interval on your vmstat command | |
441 | line.</PRE> | |
442 | Another suggestion from the same thread: | |
443 | <PRE>Subject: Re: Measuring the CPU usage of Freeswan | |
444 | Date: Mon, 29 Jan 2001 | |
445 | From: Patrick Michael Kane &lt;modus@pr.es.to&gt; | |
446 | ||
447 | The only truly accurate way to accurately track FreeSWAN CPU usage is to use | |
448 | a CPU soaker. You run it on an unloaded system as a benchmark, then start up | |
449 | FreeSWAN and take the difference to determine how much FreeSWAN is eating. | |
450 | I believe someone has done this in the past, so you may find something in | |
451 | the FreeSWAN archives. If not, someone recently posted a URL to a CPU | |
452 | soaker benchmark tool on linux-kernel.</PRE> | |
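<P>The CPU soaker idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the tool mentioned in the message. The function names are hypothetical, and it assumes a single-CPU machine with nothing else competing for the processor.</P>

```python
import time


def soak(seconds):
    """Spin for about `seconds` and return how many loop passes completed.

    Any CPU time taken by other work, including kernel packet processing,
    reduces the number of passes this loop manages to complete.
    """
    deadline = time.perf_counter() + seconds
    passes = 0
    while time.perf_counter() < deadline:
        passes += 1
    return passes


def cpu_share_taken(baseline_passes, loaded_passes):
    """Fraction of the CPU the soaker lost between the idle and loaded runs."""
    return 1.0 - loaded_passes / baseline_passes


if __name__ == "__main__":
    # Run once on the idle system and note the count, then run again
    # while the tunnel is carrying traffic and compare the two counts.
    print(soak(1.0))
```

<P>For example, if the idle run completes 1000 passes and the run under IPsec load completes 750, <VAR>cpu_share_taken(1000, 750)</VAR> reports that about a quarter of the CPU went elsewhere. Longer runs and several repetitions should give steadier numbers.</P>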
453 | <HR> | |
454 | <A HREF="toc.html">Contents</A> | |
455 | <A HREF="interop.html">Previous</A> | |
456 | <A HREF="testing.html">Next</A> | |
457 | </BODY> | |
458 | </HTML> |