<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<TITLE>Introduction to FreeS/WAN</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
<STYLE TYPE="text/css"><!--
BODY { font-family: serif }
H1 { font-family: sans-serif }
H2 { font-family: sans-serif }
H3 { font-family: sans-serif }
H4 { font-family: sans-serif }
H5 { font-family: sans-serif }
H6 { font-family: sans-serif }
SUB { font-size: smaller }
SUP { font-size: smaller }
PRE { font-family: monospace }
--></STYLE>
</HEAD>
<BODY>
<A HREF="toc.html">Contents</A>
<A HREF="config.html">Previous</A>
<A HREF="user_examples.html">Next</A>
<HR>
<H1><A name="background">Linux FreeS/WAN background</A></H1>
<P>This section discusses a number of issues which have three things in
 common:</P>
<UL>
<LI>They are not specifically FreeS/WAN problems</LI>
<LI>You may have to understand them to get FreeS/WAN working right</LI>
<LI>They are not simple questions</LI>
</UL>
<P>Grouping them here lets us provide the explanations some users will
 need without unduly complicating the main text.</P>
<P>The explanations here are intended to be adequate for FreeS/WAN
 purposes (please comment to the<A href="mail.html"> users mailing list</A>
 if you don't find them so), but they are not trying to be complete or
 definitive. If you need more information, see the references provided
 in each section.</P>
<H2><A name="dns.background">Some DNS background</A></H2>
<P><A href="glossary.html#carpediem">Opportunistic encryption</A>
 requires that the gateway systems be able to fetch public keys, and
 other IPsec-related information, from each other's DNS (Domain Name
 System) records.</P>
<P><A href="glossary.html#DNS">DNS</A> is a distributed database that
 maps names to IP addresses and vice versa.</P>
<P>Much good reference material is available for DNS, including:</P>
<UL>
<LI>the<A href="http://www.linuxdoc.org/HOWTO/DNS-HOWTO.html"> DNS HowTo</A>
</LI>
<LI>the standard<A href="biblio.html#DNS.book"> DNS reference</A> book</LI>
<LI><A href="http://www.linuxdoc.org/LDP/nag2/index.html">Linux Network
 Administrator's Guide</A></LI>
<LI><A href="http://www.nominum.com/resources/whitepapers/bind-white-paper.html">
BIND overview</A></LI>
<LI><A href="http://www.nominum.com/resources/documentation/Bv9ARM.pdf">
BIND 9 Administrator's Reference</A></LI>
</UL>
<P>We give only a brief overview here, intended to help you use DNS for
 FreeS/WAN purposes.</P>
<H3><A name="forward.reverse">Forward and reverse maps</A></H3>
<P>Although the implementation is distributed, it is often useful to
 speak of DNS as if it were just two enormous tables:</P>
<UL>
<LI>the forward map: look up a name, get an IP address</LI>
<LI>the reverse map: look up an IP address, get a name</LI>
</UL>
<P>Both maps can optionally contain additional data. For opportunistic
 encryption, we insert the data needed for IPsec authentication.</P>
<P>A system named gateway.example.com with IP address 10.20.30.40 should
 have at least two DNS records, one in each map:</P>
<DL>
<DT>gateway.example.com. IN A 10.20.30.40</DT>
<DD>used to look up the name and get an IP address</DD>
<DT>40.30.20.10.in-addr.arpa. IN PTR gateway.example.com.</DT>
<DD>used for reverse lookups, looking up an address to get the
 associated name. Notice that the digits here are in reverse order; the
 actual address is 10.20.30.40 but we use 40.30.20.10 here.</DD>
</DL>
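<P>As a minimal sketch of the digit reversal just described, the
 following Python derives the reverse-map name for an IPv4 address using
 the standard<VAR> ipaddress</VAR> module. The function name<VAR>
 reverse_name</VAR> is ours, chosen only for illustration.</P>

```python
import ipaddress

def reverse_name(address: str) -> str:
    """Return the fully qualified in-addr.arpa name for an IPv4 address."""
    ip = ipaddress.IPv4Address(address)
    # reverse_pointer gives the reversed octets plus "in-addr.arpa",
    # without the final period, so we append it here.
    return ip.reverse_pointer + "."

print(reverse_name("10.20.30.40"))
```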
<H3><A NAME="17_1_2">Hierarchy and delegation</A></H3>
<P>For both maps there is a hierarchy of DNS servers and a system of
 delegating authority so that, for example:</P>
<UL>
<LI>the DNS administrator for example.com can create entries of the form<VAR>
 name</VAR>.example.com</LI>
<LI>the example.com admin cannot create an entry for counterexample.com;
 only someone with authority for .com can do that</LI>
<LI>an admin might have authority for 20.10.in-addr.arpa.</LI>
<LI>in either map, authority can be delegated
<UL>
<LI>the example.com admin could give you authority for
 westcoast.example.com</LI>
<LI>the 20.10.in-addr.arpa admin could give you authority for
 30.20.10.in-addr.arpa</LI>
</UL>
</LI>
</UL>
<P>DNS zones are the units of delegation. There is a hierarchy of zones.</P>
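<P>The hierarchy works on whole labels, not on string suffixes, which is
 why counterexample.com is not under example.com. A rough sketch in
 Python, with helper names<VAR> labels</VAR> and<VAR> in_zone</VAR> that
 are ours rather than part of any DNS library, and which ignores the
 possibility that part of the zone has been delegated further down:</P>

```python
def labels(name: str) -> list:
    """Split a domain name into lowercase labels, ignoring a final period."""
    return [part for part in name.lower().rstrip(".").split(".") if part]

def in_zone(name: str, zone: str) -> bool:
    """True if name is the zone apex or lies below it in the hierarchy."""
    name_labels, zone_labels = labels(name), labels(zone)
    if len(zone_labels) > len(name_labels):
        return False
    # Compare whole labels from the right-hand end, not raw string suffixes.
    return name_labels[len(name_labels) - len(zone_labels):] == zone_labels

print(in_zone("westcoast.example.com", "example.com"))
print(in_zone("counterexample.com", "example.com"))
print(in_zone("30.20.10.in-addr.arpa", "20.10.in-addr.arpa"))
```

<P>The first and third calls report True and the second False, matching
 the examples in the list above.</P>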
<H3><A NAME="17_1_3">Syntax of DNS records</A></H3>
<P>Returning to the example records:</P>
<PRE>        gateway.example.com. IN A 10.20.30.40
        40.30.20.10.in-addr.arpa. IN PTR gateway.example.com.</PRE>
<P>some syntactic details are:</P>
<UL>
<LI>the IN indicates that these records are for<STRONG> In</STRONG>
ternet addresses</LI>
<LI>The final periods in '.com.' and '.arpa.' are required. They
 indicate the root of the domain name system.</LI>
</UL>
<P>The capitalised strings after IN indicate the type of record.
 Possible types include:</P>
<UL>
<LI><STRONG>A</STRONG>ddress, for forward lookups</LI>
<LI><STRONG>P</STRONG>oin<STRONG>T</STRONG>e<STRONG>R</STRONG>, for
 reverse lookups</LI>
<LI><STRONG>C</STRONG>anonical<STRONG> NAME</STRONG>, records to support
 aliasing, multiple names for one address</LI>
<LI><STRONG>M</STRONG>ail e<STRONG>X</STRONG>change, used in mail
 routing</LI>
<LI><STRONG>SIG</STRONG>nature, used in<A href="glossary.html#SDNS">
 secure DNS</A></LI>
<LI><STRONG>KEY</STRONG>, used in<A href="glossary.html#SDNS"> secure
 DNS</A></LI>
<LI><STRONG>T</STRONG>e<STRONG>XT</STRONG>, a multi-purpose record type</LI>
</UL>
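<P>To make the field layout concrete, here is a small Python sketch that
 splits a record of the simple form shown above into its parts. It is
 not a zone file parser: it assumes exactly the layout name, class,
 type, data, with no TTL field or comments, and it treats a missing
 final period as an error, which is stricter than real zone files, where
 such names are taken relative to the zone origin. The names<VAR> Record</VAR>
 and<VAR> parse_record</VAR> are illustrative.</P>

```python
from typing import NamedTuple

class Record(NamedTuple):
    name: str
    rclass: str
    rtype: str
    data: str

def parse_record(line: str) -> Record:
    """Parse a 'name class type data' line; TTLs and comments are not handled."""
    name, rclass, rtype, data = line.split(None, 3)
    if not name.endswith("."):
        raise ValueError("name must be fully qualified (end with a period)")
    return Record(name, rclass.upper(), rtype.upper(), data.strip())

for text in (
    "gateway.example.com. IN A 10.20.30.40",
    "40.30.20.10.in-addr.arpa. IN PTR gateway.example.com.",
):
    print(parse_record(text))
```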
<P>To set up for opportunistic encryption, you add some TXT records to
 your DNS data. Details are in our<A href="quickstart.html"> quickstart</A>
 document.</P>
<H3><A NAME="17_1_4">Caching, TTL and propagation delay</A></H3>
<P>DNS information is extensively cached. With no caching, a lookup by
 your system of &quot;www.freeswan.org&quot; might involve:</P>
<UL>
<LI>your system asks your nameserver for &quot;www.freeswan.org&quot;</LI>
<LI>local nameserver asks root server about &quot;.org&quot;, gets reply</LI>
<LI>local nameserver asks .org nameserver about &quot;freeswan.org&quot;, gets
 reply</LI>
<LI>local nameserver asks freeswan.org nameserver about
 &quot;www.freeswan.org&quot;, gets reply</LI>
</UL>
<P>However, this can be quite inefficient. For example, if you are in
 the Philippines, the closest root server might be in Japan. That might
 send you to a .org server in the US, and then to freeswan.org in
 Holland. If everyone did all those lookups every time they clicked on a
 web link, the net would grind to a halt.</P>
<P>Nameservers therefore cache information they look up. When you click
 on another link at www.freeswan.org, your local nameserver has the IP
 address for that server in its cache, and no further lookups are
 required.</P>
<P>Intermediate results are also cached. If you next go to
 lists.freeswan.org, your nameserver can just ask the freeswan.org
 nameserver for that address; it does not need to query the root or .org
 nameservers because it has a cached address for the freeswan.org zone
 server.</P>
<P>Of course, like any caching mechanism, this can create problems of
 consistency. What if the administrator for freeswan.org changes the IP
 address, or the authentication key, for www.freeswan.org? If you use
 old information from the cache, you may get it wrong. On the other
 hand, you cannot afford to look up fresh information every time. Nor
 can you expect the freeswan.org server to notify you; that isn't in the
 protocols.</P>
<P>The solution that is in the protocols is fairly simple. Cacheable
 records are marked with Time To Live (TTL) information. When the time
 expires, the caching server discards the record. The next time someone
 asks for it, the server fetches a fresh copy. Of course, a server may
 also discard records before their TTL expires if it is running out of
 cache space.</P>
<P>This implies that there will be some delay before the new version of
 a changed record propagates around the net. Until the TTLs on all
 copies of the old record expire, some users will see it because that is
 what is in their cache. Other users may see the new record immediately
 because they don't have an old one cached.</P>
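<P>The TTL mechanism can be sketched in a few lines of Python. This is a
 toy model, not how any particular nameserver is implemented; the class
 name<VAR> TTLCache</VAR>, its methods, and the address 192.0.2.1 (a
 documentation-only address) are all assumptions made for the example.
 The clock is passed in so that expiry can be shown without waiting.</P>

```python
import time

class TTLCache:
    """Toy record cache keyed by (name, rtype); entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (expiry time, data)

    def put(self, name, rtype, data, ttl):
        """Store data and remember when its TTL (in seconds) runs out."""
        self._entries[(name, rtype)] = (self._clock() + ttl, data)

    def get(self, name, rtype):
        """Return cached data, or None if it is absent or has expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires, data = entry
        if self._clock() >= expires:
            # TTL has run out: discard, so the next lookup fetches a fresh copy.
            del self._entries[(name, rtype)]
            return None
        return data

# Demonstration with a fake clock so the example does not need to sleep.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.freeswan.org.", "A", "192.0.2.1", ttl=3600)
print(cache.get("www.freeswan.org.", "A"))  # within the TTL: cached value
now[0] = 3600.0
print(cache.get("www.freeswan.org.", "A"))  # TTL expired: None
```

<P>Between the two lookups, a change made by the freeswan.org
 administrator would be invisible to users of this cache, which is the
 propagation delay described above.</P>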
<H2><A name="MTU.trouble">Problems with packet fragmentation</A></H2>
<P>Mailing list reports suggest that it is moderately common for small
 packets to pass through the IPsec tunnels just fine while larger
 packets fail.</P>
<P>These problems are caused by various devices along the way
 mis-handling either packet fragments or<A href="glossary.html#pathMTU">
 path MTU discovery</A>.</P>
<P>IPsec makes packets larger by adding an ESP or AH header. This can
 tickle assorted bugs in fragment handling in routers and firewalls, or
 in path MTU discovery mechanisms, and cause a variety of symptoms which
 are both annoying and, often, quite hard to diagnose.</P>
<P>An explanation from project technical lead Henry Spencer:</P>
<PRE>The problem is IP fragmentation; more precisely, the problem is that the
second, third, etc. fragments of an IP packet are often difficult for
filtering mechanisms to classify.

Routers cannot rely on reassembling the packet, or remembering what was in
earlier fragments, because the fragments may be out of order or may even
follow different routes.  So any general, worst-case filtering decision
pretty much has to be made on each fragment independently.  (If the router
knows that it is the only route to the destination, so all fragments
*must* pass through it, reassembly would be possible... but most routers
don't want to bother with the complications of that.)

All fragments carry roughly the original IP header, but any higher-level
header is (for IP purposes) just the first part of the packet data... so
only the first fragment carries that.  So, for example, on examining the
second fragment of a TCP packet, you could tell that it's TCP, but not
what port number it is destined for -- that information is in the TCP
header, which appears in the first fragment only. 

The result of this classification difficulty is that stupid routers and
over-paranoid firewalls may just throw fragments away.  To get through
them, you must reduce your MTU enough that fragmentation will not occur.
(In some cases, they might be willing to attempt reassembly, but have very
limited resources to devote to it, meaning that packets must be small and
fragments few in number, leading to the same conclusion:  smaller MTU.)</PRE>
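<P>The classification problem can be illustrated with a short Python
 sketch. It models only the payload split, assuming a 20-byte IP header
 with no options; the function names and the port numbers are invented
 for the example. Only the fragment at offset 0 contains the TCP header,
 so only there can a filter read the ports.</P>

```python
def fragment(payload: bytes, mtu: int, ip_header_len: int = 20):
    """Split an IP payload into (offset, more_fragments, data) tuples.

    Every fragment but the last carries a multiple of 8 data bytes,
    because the IP fragment offset field counts 8-byte units.
    """
    chunk = (mtu - ip_header_len) // 8 * 8
    if not chunk > 0:
        raise ValueError("MTU too small to carry any data")
    pieces = []
    for offset in range(0, len(payload), chunk):
        data = payload[offset:offset + chunk]
        more = len(payload) > offset + chunk
        pieces.append((offset, more, data))
    return pieces

def visible_ports(offset: int, data: bytes):
    """Return (source port, destination port) if this fragment shows them."""
    if offset != 0 or not len(data) >= 4:
        return None  # the TCP header is not in this fragment
    return (int.from_bytes(data[0:2], "big"), int.from_bytes(data[2:4], "big"))

# Hypothetical TCP segment: 20-byte header (ports 40000 and 80) plus 3000 data bytes.
tcp_header = (40000).to_bytes(2, "big") + (80).to_bytes(2, "big") + bytes(16)
segment = tcp_header + bytes(3000)
for offset, more, data in fragment(segment, mtu=1500):
    print(offset, more, len(data), visible_ports(offset, data))
```

<P>With an MTU of 1500 the segment splits into three fragments, and the
 ports are readable only in the first.</P>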
<P>In addition to the problem Henry describes, you may also have trouble
 with<A href="glossary.html#pathMTU"> path MTU discovery</A>.</P>
<P>By default, FreeS/WAN uses a large<A href="glossary.html#MTU"> MTU</A>
 for the ipsec device. This avoids some problems, but may complicate
 others. Here's an explanation from Claudia:</P>
<PRE>Here are a couple of pieces of background information. Apologies if you
have seen these already. An excerpt from one of my old posts:

    An MTU of 16260 on ipsec0 is usual. The IPSec device defaults to this 
    high MTU so that it does not fragment incoming packets before encryption 
    and encapsulation. If after IPSec processing packets are larger than 1500,
    [ie. the mtu of eth0] then eth0 will fragment them. 

    Adding IPSec headers adds a certain number of bytes to each packet. 
    The MTU of the IPSec interface refers to the maximum size of the packet
    before the IPSec headers are added. In some cases, people find it helpful 
    to set ipsec0's MTU to 1500-(IPSec header size), which IIRC is about 1430.

    That way, the resulting encapsulated packets don't exceed 1500. On most 
    networks, packets less than 1500 will not need to be fragmented.

and... (from Henry Spencer)

    The way it *ought* to work is that the MTU advertised by the ipsecN
    interface should be that of the underlying hardware interface, less a
    pinch for the extra headers needed. 

    Unfortunately, in certain situations this breaks many applications.
    There is a widespread implicit assumption that the smallest MTUs are 
    at the ends of paths, not in the middle, and another that MTUs are 
    never less than 1500.  A lot of code is unprepared to handle paths 
    where there is an &quot;interior minimum&quot; in the MTU, especially when it's 
    less than 1500. So we advertise a big MTU and just let the resulting 
    big packets fragment.

This usually works, but we do get bitten in cases where some intermediate
point can't handle all that fragmentation.  We can't win on this one.</PRE>
<P>The MTU can be changed with an<VAR> overridemtu=</VAR> statement in
 the<VAR> config setup</VAR> section of<A href="manpage.d/ipsec.conf.5.html">
 ipsec.conf(5)</A>.</P>
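<P>The arithmetic behind these figures can be worked through in Python.
 This is a sketch under stated assumptions: the overhead of 70 bytes is
 simply inferred from the 1500 and 1430 figures quoted above, and the
 real value depends on the IPsec mode and algorithms in use. The
 function names are ours, and a 20-byte outer IP header without options
 is assumed.</P>

```python
def ipsec_mtu(link_mtu: int, ipsec_overhead: int) -> int:
    """Largest pre-IPsec packet that still fits the link after encapsulation."""
    return link_mtu - ipsec_overhead

def fragments_needed(packet_len: int, link_mtu: int, ip_header_len: int = 20) -> int:
    """Number of IP fragments needed to send packet_len bytes over link_mtu."""
    if link_mtu >= packet_len:
        return 1
    per_fragment = (link_mtu - ip_header_len) // 8 * 8  # data bytes per fragment
    # Integer ceiling division of the data length by the per-fragment capacity.
    return -(-(packet_len - ip_header_len) // per_fragment)

# Assumption: about 70 bytes of IPsec overhead (1500 minus the quoted 1430).
OVERHEAD = 70
print(ipsec_mtu(1500, OVERHEAD))                 # 1430
print(fragments_needed(1430 + OVERHEAD, 1500))   # 1: fits without fragmenting
print(fragments_needed(16260 + OVERHEAD, 1500))  # 12: a full-size ipsec0 packet
```

<P>Under these assumptions, a packet that fills the default 16260-byte
 ipsec0 MTU leaves eth0 as twelve fragments, while one limited to 1430
 bytes leaves as a single unfragmented packet.</P>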
<P>For a discussion of MTU issues and some possible solutions using
 Linux advanced routing facilities, see the<A href="http://www.linuxguruz.org/iptables/howto/2.4routing-15.html#ss15.6">
 Linux 2.4 Advanced Routing HOWTO</A>. For a discussion of MTU and NAT
 (Network Address Translation), see<A HREF="http://harlech.math.ucla.edu/services/ipsec.html">
 James Carter's MTU notes</A>.</P>
<H2><A name="nat.background">Network address translation (NAT)</A></H2>
<P><STRONG>N</STRONG>etwork<STRONG> A</STRONG>ddress<STRONG> T</STRONG>
ranslation is a service provided by some gateway machines. Calling it
 NAPT (adding the word<STRONG> P</STRONG>ort) would be more precise, but
 we will follow the widespread usage.</P>
<P>A gateway doing NAT rewrites the headers of packets it is forwarding,
 changing one or more of:</P>
<UL>
<LI>source address</LI>
<LI>source port</LI>
<LI>destination address</LI>
<LI>destination port</LI>
</UL>
<P>On Linux 2.4, NAT services are provided by the<A href="http://netfilter.samba.org">
 netfilter(8)</A> firewall code. There are several<A href="http://netfilter.samba.org/documentation/index.html#HOWTO">
 Netfilter HowTos</A> including one on NAT.</P>
<P>For older versions of Linux, this was referred to as &quot;IP masquerade&quot;
 and different tools were used. See this<A href="http://www.e-infomax.com/ipmasq/">
 resource page</A>.</P>
<P>Putting an IPsec gateway behind a NAT gateway is not recommended. See
 our<A href="firewall.html#NAT"> firewalls document</A>.</P>
<H3><A NAME="17_3_1">NAT to non-routable addresses</A></H3>
<P>The most common application of NAT uses private<A href="glossary.html#non-routable">
 non-routable</A> addresses.</P>
<P>Often a home or small office network will have:</P>
<UL>
<LI>one connection to the Internet</LI>
<LI>one assigned publicly visible IP address</LI>
<LI>several machines that all need access to the net</LI>
</UL>
<P>Of course this poses a problem since several machines cannot use one
 address. The best solution might be to obtain more addresses, but often
 this is impractical or uneconomical.</P>
<P>A common solution is to have:</P>
<UL>
<LI><A href="glossary.html#non-routable">non-routable</A> addresses on
 the local network</LI>
<LI>the gateway machine doing NAT</LI>
<LI>all packets going outside the LAN rewritten to have the gateway as
 their source address</LI>
</UL>
<P>The client machines are set up with reserved<A href="glossary.html#non-routable">
 non-routable</A> IP addresses defined in RFC 1918. The masquerading
 gateway, the machine with the actual link to the Internet, rewrites
 packet headers so that all packets going onto the Internet appear to
 come from one IP address, that of its Internet interface. It then gets
 all the replies, does some table lookups and more header rewriting, and
 delivers the replies to the appropriate client machines.</P>
<P>As far as anyone else on the Internet is concerned, the systems
 behind the gateway are completely hidden. Only one machine with one IP
 address is visible.</P>
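<P>The table lookups and header rewriting can be sketched as follows.
 This toy Python model is not how netfilter is implemented: it ignores
 protocols, timeouts and port exhaustion, and the class name, the
 starting port 61000, and the addresses (192.168.1.x from RFC 1918,
 203.0.113.5 from a documentation range) are assumptions for
 illustration only.</P>

```python
class Masquerader:
    """Toy NAPT table for one public address; ignores timeouts and protocols."""

    def __init__(self, public_addr: str, first_port: int = 61000):
        self.public_addr = public_addr
        self._next_port = first_port
        self._out = {}   # (private addr, private port) -> public port
        self._back = {}  # public port -> (private addr, private port)

    def outbound(self, src_addr: str, src_port: int):
        """Rewrite the source of a packet leaving the LAN."""
        key = (src_addr, src_port)
        if key not in self._out:
            # First packet of this flow: allocate a fresh public port.
            self._out[key] = self._next_port
            self._back[self._next_port] = key
            self._next_port += 1
        return (self.public_addr, self._out[key])

    def inbound(self, dst_port: int):
        """Find the client a reply belongs to, or None if no mapping exists."""
        return self._back.get(dst_port)

nat = Masquerader("203.0.113.5")
print(nat.outbound("192.168.1.10", 1025))  # ('203.0.113.5', 61000)
print(nat.outbound("192.168.1.11", 1025))  # ('203.0.113.5', 61001)
print(nat.inbound(61001))                  # ('192.168.1.11', 1025)
```

<P>Two clients using the same source port are kept apart by the public
 port the gateway assigns, which is why NAPT is the more precise name
 for this service.</P>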
<P>For IPsec on such a gateway, you can entirely ignore the NAT in:</P>
<UL>
<LI><A href="manpage.d/ipsec.conf.5.html">ipsec.conf(5)</A></LI>
<LI>firewall rules affecting your Internet-side interface</LI>
</UL>
<P>Those can be set up exactly as they would be if your gateway had no
 other systems behind it.</P>
<P>You do, however, have to take account of the NAT in firewall rules
 which affect packet forwarding.</P>
<H3><A NAME="17_3_2">NAT to routable addresses</A></H3>
<P>NAT to routable addresses is also possible, but is less common and
 may make for rather tricky routing problems. We will not discuss it
 here. See the<A href="http://netfilter.samba.org/documentation/index.html#HOWTO">
 Netfilter HowTos</A>.</P>
<HR>
<A HREF="toc.html">Contents</A>
<A HREF="config.html">Previous</A>
<A HREF="user_examples.html">Next</A>
</BODY>
</HTML>