<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<TITLE>Introduction to FreeS/WAN</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
<STYLE TYPE="text/css"><!--
BODY { font-family: serif }
H1 { font-family: sans-serif }
H2 { font-family: sans-serif }
H3 { font-family: sans-serif }
H4 { font-family: sans-serif }
H5 { font-family: sans-serif }
H6 { font-family: sans-serif }
SUB { font-size: smaller }
SUP { font-size: smaller }
PRE { font-family: monospace }
--></STYLE>
</HEAD>
<BODY>
<A HREF="toc.html">Contents</A>
<A HREF="interop.html">Previous</A>
<A HREF="testing.html">Next</A>
<HR>
<H1><A name="performance">Performance of FreeS/WAN</A></H1>
 The performance of FreeS/WAN is adequate for most applications.
<P>In normal operation, the main concern is the overhead for encryption,
 decryption and authentication of the actual IPsec (<A href="glossary.html#ESP">
ESP</A> and/or<A href="glossary.html#AH"> AH</A>) data packets. Tunnel
 setup and rekeying occur so much less frequently than packet processing
 that, in general, their overheads are not worth worrying about.</P>
<P>At startup, however, tunnel setup overheads may be significant. If
 you reboot a gateway and it needs to establish many tunnels, expect
 some delay. This and other issues for large gateways are discussed<A href="#biggate">
 below</A>.</P>
<H2><A name="pub.bench">Published material</A></H2>
<P>The University of Wales at Aberystwyth has done quite detailed speed
 tests and put<A href="http://tsc.llwybr.org.uk/public/reports/SWANTIME/">
 their results</A> on the web.</P>
<P>Davide Cerri's<A href="http://www.linux.it/~davide/doc/"> thesis (in
 Italian)</A> includes performance results for FreeS/WAN and for<A href="glossary.html#TLS">
 TLS</A>. He posted an<A href="http://lists.freeswan.org/pipermail/users/2001-December/006303.html">
 English summary</A> on the mailing list.</P>
<P>Steve Bellovin used one of AT&amp;T Research's FreeS/WAN gateways as his
 data source for an analysis of the cache sizes required for key
 swapping in IPsec. The analysis is available as<A href="http://www.research.att.com/~smb/talks/key-agility.email.txt">
 text</A> or as<A href="http://www.research.att.com/~smb/talks/key-agility.pdf">
 PDF slides</A> for a talk on the topic.</P>
<P>See also the NAI work mentioned in the next section.</P>
<H2><A name="perf.estimate">Estimating CPU overheads</A></H2>
<P>We can derive a formula that roughly relates CPU speed to the
 achievable rate of IPsec processing. It is far from exact, but should be
 usable as a first approximation.</P>
<P>An analysis of authentication overheads for high-speed networks,
 including some tests using FreeS/WAN, is on the<A href="http://www.pgp.com/research/nailabs/cryptographic/adaptive-cryptographic.asp">
 NAI Labs site</A>. In particular, see figure 3 in this<A href="http://download.nai.com/products/media/pgp/pdf/acsa_final_report.pdf">
 PDF document</A>. Their estimates of overheads, measured in Pentium II
 cycles per byte processed, are:</P>
<TABLE align="center" border="1"><TBODY></TBODY>
<TR><TH></TH><TH>IPsec</TH><TH>authentication</TH><TH>encryption</TH><TH>cycles/byte</TH></TR>
<TR><TD>Linux IP stack alone</TD><TD>no</TD><TD>no</TD><TD>no</TD><TD align="right">5</TD></TR>
<TR><TD>IPsec without crypto</TD><TD>yes</TD><TD>no</TD><TD>no</TD><TD align="right">11</TD></TR>
<TR><TD>IPsec, authentication only</TD><TD>yes</TD><TD>SHA-1</TD><TD>no</TD><TD align="right">24</TD></TR>
<TR><TD>IPsec with encryption</TD><TD>yes</TD><TD>yes</TD><TD>yes</TD><TD align="right">not tested</TD></TR>
</TABLE>
<P>Overheads for IPsec with encryption were not tested in the NAI work,
 but Antoon Bosselaers'<A href="http://www.esat.kuleuven.ac.be/~bosselae/fast.html">
 web page</A> gives the cost of his optimised Triple DES implementation as
 928 Pentium cycles per 8-byte block, or 116 per byte. Adding that to the 24
 above, we get 140 cycles per byte for IPsec with encryption.</P>
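<P>The arithmetic above can be checked with a short Python sketch. It assumes only what the text states: a 3DES block is 8 bytes (64 bits), and the cost of IPsec with encryption is NAI's authentication-only figure plus Bosselaers' cipher cost, simply added together. The function names are illustrative and are not part of FreeS/WAN.</P>

```python
DES_BLOCK_BYTES = 8  # 3DES works on 64-bit (8-byte) blocks


def cycles_per_byte(cycles_per_block, block_bytes=DES_BLOCK_BYTES):
    """Convert a per-block cipher cost into a per-byte cost."""
    return cycles_per_block / block_bytes


def ipsec_with_encryption(auth_only_cpb, cipher_cpb):
    """Naive additive model: authenticated IPsec cost plus cipher cost."""
    return auth_only_cpb + cipher_cpb


triple_des = cycles_per_byte(928)              # Bosselaers' 3DES figure
total = ipsec_with_encryption(24, triple_des)  # NAI's authentication-only figure
print(f"3DES: {triple_des:.0f} cycles/byte; IPsec total: {total:.0f} cycles/byte")
# prints: 3DES: 116 cycles/byte; IPsec total: 140 cycles/byte
```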
<P>At 140 cycles per byte, a 140 MHz machine can handle a megabyte -- 8
 megabits -- per second. Speeds for other machines will be proportional
 to this. To saturate a link with capacity C megabits per second, you
 need a machine running at<VAR> C * 140/8 = C * 17.5</VAR> MHz.</P>
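<P>The same rule can be written as a small function. This is a sketch of the text's formula only: it treats a megabyte as 8 megabits, so that megabytes per second multiplied by cycles per byte comes out directly in millions of cycles per second, that is, MHz. The name <VAR>min_mhz</VAR> is hypothetical.</P>

```python
def min_mhz(link_mbit, cycles_per_byte=140):
    """Raw CPU speed (MHz) needed to push link_mbit Mbit/s through IPsec.

    link_mbit / 8 is millions of bytes per second; multiplying by the
    cost per byte gives millions of cycles per second, i.e. MHz.
    """
    return link_mbit / 8 * cycles_per_byte


print(min_mhz(8))   # 140.0: a 140 MHz machine handles 8 Mbit/s (1 MB/s)
print(min_mhz(1))   # 17.5: the per-Mbit factor in C * 17.5
```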
<P>However, that estimate is not precise. It ignores the differences
 between:</P>
<UL>
<LI>NAI's test packets and real traffic</LI>
<LI>NAI's Pentium II cycles, Bosselaers' Pentium cycles, and your
 machine's cycles</LI>
<LI>different 3DES implementations</LI>
<LI>SHA-1 and MD5</LI>
</UL>
<P>and does not account for some overheads you will almost certainly
 have:</P>
<UL>
<LI>communication on the client-side interface</LI>
<LI>switching between multiple tunnels -- re-keying, cache reloading and
 so on</LI>
</UL>
<P>so we suggest using<VAR> C * 25</VAR> MHz to get an estimate with a
 modest built-in safety factor.</P>
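<P>To see how much headroom the rounded-up figure buys, compare it with the raw 17.5 MHz per Mbit. In this sketch (hypothetical names, and covering only IP and IPsec processing, as the text says), the planning factor of 25 works out to roughly 43% more CPU than the bare estimate.</P>

```python
RAW_MHZ_PER_MBIT = 140 / 8    # 17.5, from the cycles-per-byte estimate
PLANNING_MHZ_PER_MBIT = 25    # rounded-up rule of thumb suggested above


def planning_mhz(link_mbit):
    """Suggested CPU speed (MHz) for a gateway doing only IP and IPsec work."""
    return link_mbit * PLANNING_MHZ_PER_MBIT


headroom = PLANNING_MHZ_PER_MBIT / RAW_MHZ_PER_MBIT - 1
print(f"headroom over the raw estimate: {headroom:.0%}")  # 43%
print(planning_mhz(10))                                    # 250
```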
<P>This covers only IP and IPsec processing. If you have other loads on
 your gateway -- for example, if it is also working as a firewall -- then
 you will need to add your own safety factor on top of that.</P>
<P>This estimate matches empirical data reasonably well. For example,
 Metheringham's tests, described<A href="#klips.bench"> below</A>, show
 a 733 MHz machine topping out between 32 and 36 Mbit/second, pushing data
 as fast as it can down a 100 Mbit link. Our formula suggests you need at
 least an 800 MHz machine to handle a fully loaded 32 Mbit link. The two
 results are consistent.</P>
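<P>Running the estimate backwards gives a quick check against Metheringham's numbers. This sketch simply divides the CPU speed by the MHz-per-Mbit factor (the function name is hypothetical). For his 733 MHz machines, the raw factor predicts about 41.9 Mbit/s and the padded factor about 29.3 Mbit/s; the measured 32 to 36 Mbit/s falls between the two.</P>

```python
def max_mbit(cpu_mhz, mhz_per_mbit):
    """Invert the estimate: the link rate (Mbit/s) a CPU can sustain."""
    return cpu_mhz / mhz_per_mbit


# Metheringham's IPsec gateways were uniprocessor PIII/733s.
print(f"raw (C * 17.5):    {max_mbit(733, 17.5):.1f} Mbit/s")  # 41.9
print(f"planning (C * 25): {max_mbit(733, 25):.1f} Mbit/s")    # 29.3
print(f"MHz for 32 Mbit/s at C * 25: {32 * 25}")               # 800
```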
<P>Some examples using this estimation method:</P>
<TABLE align="center" border="1"><TBODY></TBODY>
<TR><TH colspan="2">Interface</TH><TH colspan="3">Machine speed in MHz</TH></TR>
<TR><TH>Type</TH><TH>Mbit per<BR> second</TH><TH>Estimate<BR> Mbit*25</TH><TH>Minimum IPsec gateway</TH><TH>Minimum with other load<BR> (e.g. firewall)</TH></TR>
<TR><TD>DSL</TD><TD align="right">1</TD><TD align="right">25 MHz</TD><TD rowspan="2">whatever you have</TD><TD rowspan="2">133, or better if you have it</TD></TR>
<TR><TD>cable modem</TD><TD align="right">3</TD><TD align="right">75 MHz</TD></TR>
<TR><TD><STRONG>any link, light load</STRONG></TD><TD align="right"><STRONG>5</STRONG></TD><TD align="right">125 MHz</TD><TD>133</TD><TD>200+,<STRONG> almost any surplus machine</STRONG></TD></TR>
<TR><TD>Ethernet</TD><TD align="right">10</TD><TD align="right">250 MHz</TD><TD>surplus 266 or 300</TD><TD>500+</TD></TR>
<TR><TD><STRONG>fast link, moderate load</STRONG></TD><TD align="right"><STRONG>20</STRONG></TD><TD align="right">500 MHz</TD><TD>500</TD><TD>800+,<STRONG> any current off-the-shelf PC</STRONG></TD></TR>
<TR><TD>T3 or E3</TD><TD align="right">45</TD><TD align="right">1125 MHz</TD><TD>1200</TD><TD>1500+</TD></TR>
<TR><TD>fast Ethernet</TD><TD align="right">100</TD><TD align="right">2500 MHz</TD><TD align="center" colspan="2" rowspan="2">// not feasible with 3DES in software on current machines //</TD></TR>
<TR><TD>OC3</TD><TD align="right">155</TD><TD align="right">3875 MHz</TD></TR>
</TABLE>
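<P>Only the "Estimate" column of the table is mechanical: it is the link rate multiplied by 25. The minimum-machine columns are judgment calls and are not reproduced here. A few lines of Python regenerate the estimates; the dictionary simply restates the table.</P>

```python
# Link rates in Mbit/s, copied from the table above.
LINKS = {
    "DSL": 1,
    "cable modem": 3,
    "any link, light load": 5,
    "Ethernet": 10,
    "fast link, moderate load": 20,
    "T3 or E3": 45,
    "fast Ethernet": 100,
    "OC3": 155,
}

# The C * 25 rule applied to each row.
ESTIMATES = {name: mbit * 25 for name, mbit in LINKS.items()}

for name, mhz in ESTIMATES.items():
    print(f"{name:26} {LINKS[name]:4} Mbit/s -> {mhz:5} MHz")
```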
<P>Such an estimate is far from exact, but should be usable as a minimum
 requirement for planning. The key observations are:</P>
<UL>
<LI>older<STRONG> surplus machines</STRONG> are fine for IPsec gateways
 at loads up to<STRONG> 5 megabits per second</STRONG> or so</LI>
<LI>a<STRONG> mid-range new machine</STRONG> can handle IPsec at rates
 up to<STRONG> 20 megabits per second</STRONG> or more</LI>
</UL>
<H3><A name="perf.more">Higher performance alternatives</A></H3>
<P><A href="glossary.html#AES">AES</A> is a new US government block
 cipher standard, designed to replace the obsolete<A href="glossary.html#DES">
 DES</A>. If FreeS/WAN using<A href="glossary.html#3DES"> 3DES</A> is
 not fast enough for your application, the AES<A href="web.html#patch">
 patch</A> may help.</P>
<P>To date (March 2002) we have had only one<A href="http://lists.freeswan.org/pipermail/users/2002-February/007771.html">
 mailing list report</A> of measurements with the patch applied. It
 indicates that, at least for the tested load on that user's network,<STRONG>
 AES roughly doubles IPsec throughput</STRONG>. If further testing
 confirms this, it may prove possible to saturate an OC3 link in
 software on a high-end box.</P>
<P>Also, some work is being done toward support of<A href="compat.html#hardware">
 hardware IPsec acceleration</A>, which might extend the range of
 requirements FreeS/WAN could meet.</P>
<H3><A NAME="11_2_2">Other considerations</A></H3>
<P>CPU speed may be the main issue for IPsec performance, but of course
 it isn't the only one.</P>
<P>You need good Ethernet cards or other network interface hardware to
 get the best performance. See this<A href="http://www.ethermanage.com/ethernet/ethernet.html">
 Ethernet information</A> page and this<A href="http://www.scyld.com/diag">
 Linux network driver</A> page.</P>
<P>The current FreeS/WAN kernel code is largely single-threaded. It is
 SMP-safe, and will run just fine on a multiprocessor machine (<A href="compat.html#multiprocessor">
discussion</A>), but the load within the kernel is not shared
 effectively. This means that, for example, to saturate a T3 -- which
 needs about a 1200 MHz machine -- you cannot expect something like a
 dual 800 MHz machine to do the job.</P>
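<P>The consequence for SMP can be expressed as a rough check. This sketch assumes, as the text does, that the kernel IPsec work effectively runs on one CPU, so it compares the link's C * 25 requirement with the speed of a single processor and deliberately ignores the processor count. The function name is hypothetical.</P>

```python
def klips_keeps_up(link_mbit, cpu_mhz_each, n_cpus=1):
    """Rough check, assuming KLIPS effectively uses only one CPU.

    n_cpus is accepted but deliberately not used: extra processors can
    absorb other work on the gateway, but (per the text) they do not add
    to the cycles available for IPsec processing itself.
    """
    return cpu_mhz_each >= link_mbit * 25


print(klips_keeps_up(45, 1200))           # True: one 1200 MHz CPU, T3
print(klips_keeps_up(45, 800, n_cpus=2))  # False: dual 800 MHz, T3
```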
<P>On the other hand, SMP machines do tend to share other loads well so,
 provided one CPU is fast enough for the IPsec work, a multiprocessor
 machine may be ideal for a gateway with a mixed load.</P>
<H2><A name="biggate">Many tunnels from a single gateway</A></H2>
<P>FreeS/WAN allows a single gateway machine to build tunnels to many
 others. There may, however, be some problems with large numbers of
 tunnels, as indicated in this message from the mailing list:</P>
<PRE>Subject: Re: Maximum number of ipsec tunnels?
 Date: Tue, 18 Apr 2000
 From: &quot;John S. Denker&quot; &lt;jsd@research.att.com&gt;

Christopher Ferris wrote:

&gt;&gt; What are the maximum number ipsec tunnels FreeS/WAN can handle??

Henry Spencer wrote:

&gt;There is no particular limit. Some of the setup procedures currently
&gt;scale poorly to large numbers of connections, but there are (clumsy)
&gt;workarounds for that now, and proper fixes are coming.

1) &quot;Large&quot; numbers means anything over 50 or so. I routinely run boxes
with about 200 tunnels. Once you get more than 50 or so, you need to worry
about several scalability issues:

a) You need to put a &quot;-&quot; sign in syslogd.conf, and rotate the logs daily
not weekly.

b) Processor load per tunnel is small unless the tunnel is not up, in which
case a new half-key gets generated every 90 seconds, which can add up if
you've got a lot of down tunnels.

c) There's other bits of lore you need when running a large number of
tunnels. For instance, systematically keeping the .conf file free of
conflicts requires tools that aren't shipped with the standard freeswan
package.

d) The pluto startup behavior is quadratic. With 200 tunnels, this eats up
several minutes at every restart. I'm told fixes are coming soon.

2) Other than item (1b), the CPU load depends mainly on the size of the
pipe attached, not on the number of tunnels.
</PRE>
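<P>Item (1b) is easy to quantify. Assuming, as the message says, one new half-key per down tunnel every 90 seconds, the rate of half-key generation grows linearly with the number of tunnels that are down. This sketch (hypothetical names) only counts generations; it makes no attempt to estimate their cost in CPU cycles.</P>

```python
HALF_KEY_INTERVAL_S = 90  # one new half-key per down tunnel every 90 s (item 1b)


def half_keys_per_minute(down_tunnels, interval_s=HALF_KEY_INTERVAL_S):
    """Rate of fresh half-key generations caused by tunnels that are down."""
    return down_tunnels * 60 / interval_s


for down in (10, 50, 200):
    print(f"{down:3} down tunnels -> {half_keys_per_minute(down):6.1f} half-keys/minute")
```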
<P>It is worth noting that item (1b) applies only to repeated attempts
 to re-key a data connection (IPsec SA, Phase 2) over an established
 keying connection (ISAKMP SA, Phase 1). There are two ways to reduce
 this overhead using settings in<A href="manpage.d/ipsec.conf.5.html">
 ipsec.conf(5)</A>:</P>
<UL>
<LI>set<VAR> keyingtries</VAR> to some small value to limit repetitions</LI>
<LI>set<VAR> keylife</VAR> to a short time so that a failing data
 connection will be cleaned up when the keying connection is reset.</LI>
</UL>
<P>The overheads for establishing keying connections (ISAKMP SAs, Phase
 1) are lower because, for these, Pluto does not perform expensive
 operations before receiving a reply from the peer.</P>
<P>A gateway that does a lot of rekeying -- many tunnels and/or low
 settings for tunnel lifetimes -- will also need a lot of<A href="glossary.html#random">
 random numbers</A> from the random(4) driver.</P>
<H2><A name="low-end">Low-end systems</A></H2>
<P><EM>Even a 486 can handle a T1 line</EM>, according to this mailing
 list message:</P>
<PRE>Subject: Re: linux-ipsec: IPSec Masquerade
 Date: Fri, 15 Jan 1999 11:13:22 -0500
 From: Michael Richardson

. . . A 486/66 has been clocked by Phil Karn to do
10Mb/s encryption.. that uses all the CPU, so half that to get some CPU,
and you have 5Mb/s. 1/3 that for 3DES and you get 1.6Mb/s....</PRE>
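<P>Richardson's back-of-the-envelope figure can be reproduced directly. The sketch follows his steps (halve the 10 Mb/s DES figure to leave CPU for other work, then divide by three for 3DES) and compares the result with the standard T1 rate of 1.544 Mbit/s; the variable names are illustrative.</P>

```python
T1_MBIT = 1.544  # North American T1 line rate

des_all_cpu = 10.0               # Mb/s: Karn's 486/66 DES figure, whole CPU
des_half_cpu = des_all_cpu / 2   # leave half the CPU for other work
triple_des = des_half_cpu / 3    # 3DES does three DES passes per block

print(f"3DES on half a 486/66: {triple_des:.2f} Mb/s (T1: {T1_MBIT} Mb/s)")
print("fits a T1:", triple_des >= T1_MBIT)   # True
```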
<P>and a piece of mail from project technical lead Henry Spencer:</P>
<PRE>Oh yes, and a new timing point for Sandy's docs... A P60 -- yes, a 60MHz
Pentium, talk about antiques -- running a host-to-host tunnel to another
machine shows an FTP throughput (that is, end-to-end results with a real
protocol) of slightly over 5Mbit/s either way. (The other machine is much
faster, the network is 100Mbps, and the ether cards are good ones... so
the P60 is pretty definitely the bottleneck.)</PRE>
<P>From the above, and from general user experience as reported on the
 list, it seems clear that a cheap surplus machine -- a reasonable 486,
 a minimal Pentium box, a Sparc 5, ... -- can easily handle a home
 office or a small company connection using any of:</P>
<UL>
<LI>ADSL service</LI>
<LI>cable modem</LI>
<LI>T1</LI>
<LI>E1</LI>
</UL>
<P>If available, we suggest using a Pentium 133 or better. This should
 ensure that, even under maximum load, IPsec will use less than half the
 CPU cycles. You then have enough left for other things you may want on
 your gateway -- firewalling, web caching, DNS and such.</P>
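<P>Applying the C * 25 rule to a Pentium 133 shows where the "less than half" claim comes from for T1 and E1 lines. The sketch uses the standard line rates (1.544 Mbit/s for T1, 2.048 Mbit/s for E1); ADSL and cable-modem rates vary too much between providers to pick a single number. The function name is hypothetical.</P>

```python
LINE_RATES_MBIT = {"T1": 1.544, "E1": 2.048}  # standard line rates


def ipsec_cpu_share(link_mbit, cpu_mhz=133, mhz_per_mbit=25):
    """Fraction of a CPU IPsec would need at full line rate (C * 25 rule)."""
    return link_mbit * mhz_per_mbit / cpu_mhz


for line, rate in LINE_RATES_MBIT.items():
    print(f"{line}: {ipsec_cpu_share(rate):.0%} of a Pentium 133")
```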
<H2><A name="klips.bench">Measuring KLIPS</A></H2>
<P>Here is some additional data from the mailing list.</P>
<PRE>Subject: FreeSWAN (specically KLIPS) performance measurements
 Date: Thu, 01 Feb 2001
 From: Nigel Metheringham &lt;Nigel.Metheringham@intechnology.co.uk&gt;

I've spent a happy morning attempting performance tests against KLIPS
(this is due to me not being able to work out the CPU usage of KLIPS so
resorting to the crude measurements of maximum throughput to give a
baseline to work out loading of a box).

Measurements were done using a set of 4 boxes arranged in a line, each
connected to the next by 100Mbit duplex ethernet. The inner 2 had an
ipsec tunnel between them (shared secret, but I was doing measurements
when the tunnel was up and running - keying should not be an issue
here). The outer pair of boxes were traffic generators or traffic sink.

The crypt boxes are Compaq DL380s - Uniprocessor PIII/733 with 256K
cache. They have 128M main memory. Nothing significant was running on
the boxes other than freeswan. The kernel was a 2.2.19pre7 patched
with freeswan and ext3.

Without an ipsec tunnel in the chain (ie the 2 inner boxes just being
100BaseT routers), throughput (measured with ttcp) was between 10644
and 11320 KB/sec

With an ipsec tunnel in place, throughput was between 3268 and 3402
KB/sec

These measurements are for data pushed across a TCP link, so the
traffic on the wire between the 2 ipsec boxes would have been higher
than this....

vmstat (run during some other tests, so not affecting those figures) on
the encrypting box shows approx 50% system &amp; 50% idle CPU - which I
don't believe at all. Interactive feel of the box was significantly
sluggish.

I also tried running the kernel profiler (see man readprofile) during
test runs.

A box doing primarily decrypt work showed basically nothing happening -
I assume interrupts were off.
A box doing encrypt work showed the following:-
Ticks Function                     Load
~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~
  956 total                      0.0010
  532 des_encrypt2               0.1330
  110 MD5Transform               0.0443
   97 kmalloc                    0.1880
   39 des_encrypt3               0.1336
   23 speedo_interrupt           0.0298
   14 skb_copy_expand            0.0250
   13 ipsec_tunnel_start_xmit    0.0009
   13 Decode                     0.1625
   11 handle_IRQ_event           0.1019
   11 .des_ncbc_encrypt_end      0.0229
   10 speedo_start_xmit          0.0188
    9 satoa                      0.0225
    8 kfree                      0.0118
    8 ip_fragment                0.0121
    7 ultoa                      0.0365
    5 speedo_rx                  0.0071
    5 .des_encrypt2_end          5.0000
    4 _stext                     0.0140
    4 ip_fw_check                0.0035
    2 rj_match                   0.0034
    2 ipfw_output_check          0.0200
    2 inet_addr_type             0.0156
    2 eth_copy_and_sum           0.0139
    2 dev_get                    0.0294
    2 addrtoa                    0.0143
    1 speedo_tx_buffer_gc        0.0024
    1 speedo_refill_rx_buf       0.0022
    1 restore_all                0.0667
    1 number                     0.0020
    1 net_bh                     0.0021
    1 neigh_connected_output     0.0076
    1 MD5Final                   0.0083
    1 kmem_cache_free            0.0016
    1 kmem_cache_alloc           0.0022
    1 __kfree_skb                0.0060
    1 ipsec_rcv                  0.0001
    1 ip_rcv                     0.0014
    1 ip_options_fragment        0.0071
    1 ip_local_deliver           0.0023
    1 ipfw_forward_check         0.0139
    1 ip_forward                 0.0011
    1 eth_header                 0.0040
    1 .des_encrypt3_end          0.0833
    1 des_decrypt3               0.0034
    1 csum_partial_copy_generic  0.0045
    1 call_out_firewall          0.0125

Hope this data is helpful to someone... however the lack of visibility
into the decrypt side makes things less clear</PRE>
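<P>Metheringham's figures are easier to compare with the rest of this document in Mbit/s. The sketch below assumes that the KB/sec reported by ttcp means 1024 bytes; these are TCP payload rates and, as the message says, traffic on the wire between the IPsec boxes was higher. It also works out what share of the 956 profiler ticks on the encrypting box went to the three busiest crypto functions. The names are illustrative.</P>

```python
def kb_per_s_to_mbit(kb_per_s, bytes_per_kb=1024):
    """Convert a KB/s figure to Mbit/s (assumes 1 KB = 1024 bytes)."""
    return kb_per_s * bytes_per_kb * 8 / 1_000_000


for label, (low, high) in (("routing only", (10644, 11320)),
                           ("through IPsec", (3268, 3402))):
    print(f"{label}: {kb_per_s_to_mbit(low):.1f} to {kb_per_s_to_mbit(high):.1f} Mbit/s")

# Share of the 956 profiler ticks on the encrypting box spent in crypto code.
TOTAL_TICKS = 956
CRYPTO_TICKS = {"des_encrypt2": 532, "MD5Transform": 110, "des_encrypt3": 39}
for fn, ticks in CRYPTO_TICKS.items():
    print(f"{fn}: {ticks / TOTAL_TICKS:.1%}")
print(f"these three together: {sum(CRYPTO_TICKS.values()) / TOTAL_TICKS:.1%}")
```

<P>On these numbers, traffic through the tunnel ran at roughly 30% of the plain routing rate, and about 71% of the profiled ticks were spent in DES and MD5 code.</P>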
<H2><A name="speed.compress">Speed with compression</A></H2>
<P>Another user reported some results for connections with and without
 IP compression:</P>
<PRE>Subject: [Users] Speed with compression
 Date: Fri, 29 Jun 2001
 From: John McMonagle &lt;johnm@advocap.org&gt;

Did a couple tests with compression using the new 1.91 freeswan.

Running between 2 sites with cable modems. Both using approximately
130 mhz pentium.

Transferred files with ncftp.

Compressed file was a 6mb compressed installation file.
Non compressed was 18mb /var/lib/rpm/packages.rpm

              Compressed vpn  regular vpn
Compress file      42.59 kBs    42.08 kBs
regular file      110.84 kBs    41.66 kBs

Load was about 0 either way.
Ping times were very similar a bit above 9 ms.

Compression looks attractive to me.</PRE>
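<P>The relative gains in McMonagle's test can be computed from his table. This sketch just divides the compressed-VPN rate by the regular-VPN rate for each file: it shows essentially no change for the already-compressed file and roughly a 2.7x speedup for the regular file. Since the load was about 0 either way, the cable-modem link rather than the CPU was the bottleneck in this test.</P>

```python
# ncftp transfer rates in kB/s, from the message above.
RATES = {
    "compressed file": {"compressed vpn": 42.59, "regular vpn": 42.08},
    "regular file": {"compressed vpn": 110.84, "regular vpn": 41.66},
}

for data, r in RATES.items():
    gain = r["compressed vpn"] / r["regular vpn"]
    print(f"{data}: {gain:.2f}x with IP compression")
```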
 Later in the same thread, project technical lead Henry Spencer added:
<PRE>&gt; is there a reason not to switch compression on? I have large gateway boxes
&gt; connecting 3 connections, one of them with a measly DS1 link...

Run some timing tests with and without, with data and loads representative
of what you expect in production. That's the definitive way to decide.
If compression is a net loss, then obviously, leave it turned off. If it
doesn't make much difference, leave it off for simplicity and hence
robustness. If there's a substantial gain, by all means turn it on.

If both ends support compression and can successfully negotiate a
compressed connection (trivially true if both are FreeS/WAN 1.91), then
the crucial question is CPU cycles.

Compression has some overhead, so one question is whether *your* data
compresses well enough to save you more CPU cycles (by reducing the volume
of data going through CPU-intensive encryption/decryption) than it costs
you. Last time I ran such tests on data that was reasonably compressible
but not deliberately contrived to be so, this generally was not true --
compression cost extra CPU cycles -- so compression was worthwhile only if
the link, not the CPU, was the bottleneck. However, that was before the
slow-compression bug was fixed. I haven't had a chance to re-run those
tests yet, but it sounds like I'd probably see a different result. </PRE>
 The bug he refers to was a problem with the compression libraries that
 had us using C code, rather than assembler, for compression. It was
 fixed before 1.91.
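<P>Henry Spencer's CPU argument can be stated as a simple per-byte model. Assume that IPsec costs <VAR>crypt</VAR> cycles per byte actually encrypted, that compression costs <VAR>comp</VAR> cycles per byte of original data, and that compression shrinks data to a fraction <VAR>ratio</VAR> of its size. Compressing then saves CPU only when <VAR>comp</VAR> + <VAR>ratio</VAR> * <VAR>crypt</VAR> is smaller than <VAR>crypt</VAR>. The 60 cycles/byte compression cost in the example is a made-up illustration, not a measurement, and the model says nothing about the case where the link, not the CPU, is the bottleneck.</P>

```python
def cycles_per_input_byte(crypt_cpb, comp_cpb=0.0, ratio=1.0):
    """CPU cycles spent per byte of original data.

    crypt_cpb: IPsec cost per byte that is actually encrypted (e.g. 140)
    comp_cpb:  compression cost per byte of original data (hypothetical)
    ratio:     compressed size / original size; 1.0 means no compression
    """
    return comp_cpb + ratio * crypt_cpb


def compression_saves_cpu(crypt_cpb, comp_cpb, ratio):
    """True if compressing costs fewer cycles overall than not compressing."""
    without = cycles_per_input_byte(crypt_cpb)
    with_comp = cycles_per_input_byte(crypt_cpb, comp_cpb, ratio)
    return without > with_comp


def break_even_ratio(crypt_cpb, comp_cpb):
    """Compression ratio below which compressing starts to save CPU."""
    return 1 - comp_cpb / crypt_cpb


# Hypothetical compressor costing 60 cycles per input byte, 3DES IPsec at 140:
print(compression_saves_cpu(140, 60, 0.5))   # True: 60 + 70 = 130 cycles
print(compression_saves_cpu(140, 60, 0.8))   # False: 60 + 112 = 172 cycles
print(round(break_even_ratio(140, 60), 2))   # 0.57
```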
<H2><A name="methods">Methods of measuring</A></H2>
<P>If you want to measure the loads FreeS/WAN puts on a system, note
 that tools such as top or measurements such as load average are
 more-or-less useless for this. They are not designed to measure
 something that does most of its work inside the kernel.</P>
<P>Here is a message from FreeS/WAN kernel programmer Richard Guy Briggs
 on this:</P>
<PRE>&gt; I have a batch of boxes doing Freeswan stuff.
&gt; I want to measure the CPU loading of the Freeswan tunnels, but am
&gt; having trouble seeing how I get some figures out...
&gt;
&gt; - Keying etc is in userspace so will show up on the per-process
&gt; and load average etc (ie pluto's load)

Correct.

&gt; - KLIPS is in the kernel space, and does not show up in load average
&gt; I think also that the KLIPS per-packet processing stuff is running
&gt; as part of an interrupt handler so it does not show up in the
&gt; /proc/stat system_cpu or even idle_cpu figures

It is not running in interrupt handler. It is in the bottom half.
This is somewhere between user context (careful, this is not
userspace!) and hardware interrupt context.

&gt; Is this correct, and is there any means of instrumenting how much the
&gt; cpu is being loaded - I don't like the idea of a system running out of
&gt; steam whilst still showing 100% idle CPU :-)

vmstat seems to do a fairly good job, but use a running tally to get a
good idea. A one-off call to vmstat gives different numbers than a
running stat. To do this, put an interval on your vmstat command
line.</PRE>
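<P>A running tally of the kind Richard Guy Briggs describes can also be taken by hand from the kernel's cumulative CPU counters in /proc/stat, which, as far as we know, is where vmstat gets its CPU figures. This is a Linux-only sketch with hypothetical function names. It counts idle and, where the kernel reports it, iowait as idle time; how bottom-half work is attributed to the other counters depends on the kernel version, and the KLIPS measurements above suggest treating the result with some suspicion.</P>

```python
import time


def cpu_ticks(stat_text):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle [iowait irq softirq steal]; later
            # fields (guest time) are already included in user/nice.
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def busy_fraction(before, after):
    """Share of non-idle ticks between two snapshots (idle + iowait = idle)."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 1 - idle / total


def sample(interval_s=5.0):
    """Take two /proc/stat snapshots interval_s apart, like 'vmstat 5'."""
    with open("/proc/stat") as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = cpu_ticks(f.read())
    return busy_fraction(before, after)
```

<P>For example, <VAR>print(f"{sample(5):.0%} busy")</VAR> reports the busy share over a five-second window.</P>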
 and another suggestion from the same thread:
<PRE>Subject: Re: Measuring the CPU usage of Freeswan
 Date: Mon, 29 Jan 2001
 From: Patrick Michael Kane &lt;modus@pr.es.to&gt;

The only truly accurate way to accurately track FreeSWAN CPU usage is to use
a CPU soaker. You run it on an unloaded system as a benchmark, then start up
FreeSWAN and take the difference to determine how much FreeSWAN is eating.
I believe someone has done this in the past, so you may find something in
the FreeSWAN archives. If not, someone recently posted a URL to a CPU
soaker benchmark tool on linux-kernel.</PRE>
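<P>A minimal CPU soaker along the lines Kane suggests might look like this. It is a sketch, not the tool mentioned on linux-kernel: it counts busy-loop iterations for a fixed time, once on the idle gateway and once under IPsec load, and reports the drop as the share of CPU taken by other work. It should run at the lowest priority, and on a multiprocessor machine you would need one soaker process per CPU. The function names are hypothetical.</P>

```python
import time


def soak(seconds):
    """Spin for `seconds` of wall-clock time and count loop iterations.

    Run it at the lowest priority (for example under `nice -n 19`) so
    that it only gets the cycles nothing else wants.
    """
    deadline = time.monotonic() + seconds
    count = 0
    while deadline > time.monotonic():
        count += 1
    return count


def overhead(baseline_iters, loaded_iters):
    """Fraction of the CPU taken by other work during the loaded run."""
    return 1 - loaded_iters / baseline_iters


# Usage sketch:
#   baseline = soak(30)   # on the idle gateway
#   ...start pushing traffic through the tunnel...
#   loaded = soak(30)
#   print(f"other work took about {overhead(baseline, loaded):.0%} of the CPU")
```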
</BODY>
</HTML>