doc/opportunism.nr

   1 .DA "3 May 2001"
   2 .ds LH "
   3 .ds CH "Opportunistic Encryption
   4 .ds RH "
   5 .ds LF "Draft 4+
   6 .ds CF "\\*(DY
   7 .ds RF %
   8 .de P
   9 .LP
  10 ..
  11 .de R
  12 .LP
  13 \fBRationale:\fR
  14 ..
  15 .de A
  16 .LP
  17 \fBAhem:\fR
  18 ..
  19 .TL
  20 Opportunistic Encryption
  21 .AU
  22 Henry Spencer
  23 D. Hugh Redelmeier
  24 .AI
  25 henry@spsystems.net
  26 hugh@mimosa.com
  27 Linux FreeS/WAN Project
  28 .AB no
  29 xxx cases where reverses not controlled, all possibilities.
  30 xxx DHR suggests okay if gateway doesn't control reverse but destination does.
  31 xxx level of patience where Responder just doesn't answer the phone.
  32 xxx IKE finger to get basic keying info, to be confirmed via DNSSEC?
  33 xxx packets from some OE connections might get special status,
  34 if the other end is definitely someone we trust.
  35 Opportunistic encryption permits secure (encrypted, authenticated)
  36 communication via IPsec without connection-by-connection prearrangement,
  37 either explicitly between hosts (when the hosts are capable of it) or
  38 transparently via packet-intercepting security gateways.
  39 It uses DNS records (authenticated with DNSSEC) to provide
  40 the necessary information for gateway discovery and gateway authentication,
  41 and constrains negotiation enough to guarantee success.
  42 .sp
  43 Substantive changes since draft 3:
  44 write off inverse queries as a lost cause;
  45 use Invalid-SPI rather than Delete as notification of unknown SA;
  46 minor wording improvements and clarifications.
  47 This document takes over from the older ``Implementing Opportunistic
  48 Encryption'' document.
  49 .AE
  50 .NH 1
  51 Introduction
  52 .P
  53 A major goal of the FreeS/WAN project is opportunistic encryption:
  54 a (security) gateway intercepts an outgoing packet aimed at a
  55 remote host, and quickly attempts to negotiate an IPsec tunnel to that
  56 host's security gateway.
  57 If the attempt succeeds, traffic can then be secure,
  58 transparently (without changes to the host software).
  59 If the attempt fails,
  60 the packet (or a retry thereof) passes through in clear or is dropped,
  61 depending on local policy.
  62 Prearranged tunnels bypass the packet interception etc., so static VPNs
  63 can coexist with opportunistic encryption.
  64 .P
  65 This generalizes trivially to the end-to-end case:
  66 host and security gateway simply are one and the same.
  67 Some optimizations are possible in that case,
  68 but the basic scheme need not change.
  69 .P
  70 The objectives for security systems need to be explicitly stated.
  71 Opportunistic encryption is meant to achieve secure communication,
  72 without prearrangement of the individual connection
  73 (although some prearrangement on a per-host basis is required),
  74 between any two hosts which implement the protocol
  75 (and, if they act as security gateways,
  76 between hosts behind them).
  77 Here ``secure'' means strong encryption and authentication of packets,
  78 with authentication of participants\(emto prevent man-in-the-middle
  79 and impersonation attacks\(emdependent on several factors.
  80 The biggest factor is the authentication of DNS records,
  81 via DNSSEC or equivalent means.
  82 A lesser factor is which exact variant
  83 of the setup procedure (see section 2.2) is used,
  84 because there is a tradeoff between strong authentication of the other end
  85 and ability
  86 to negotiate opportunistic encryption with hosts which have limited
  87 or no control of their reverse-map DNS records:
  88 without reverse-map information,
  89 we can verify that the host has the right to use a particular FQDN
  90 (Fully Qualified Domain Name),
  91 but not whether that FQDN is authorized to use that IP address.
  92 Local policy must decide whether authentication
  93 or connectivity has higher priority.
  94 .P
  95 Apart from careful attention to detail in various areas,
  96 there are three crucial design problems for opportunistic encryption.
  97 It needs a way to quickly identify the remote host's security gateway.
  98 It needs a way to quickly obtain an authentication key for the
  99 security gateway.
 100 And the numerous options which can be specified with IKE
 101 must be constrained sufficiently that two independent implementations are
 102 guaranteed to reach agreement,
 103 without any explicit prearrangement or preliminary negotiation.
 104 The first two problems are solved using DNS,
 105 with DNSSEC ensuring that the data obtained is reliable;
 106 the third is solved by specifying a minimum standard which must be supported.
 107 .P
 108 A note on philosophy:
 109 we have deliberately avoided providing six different
 110 ways to do each job, in favor of specifying one good one.
 111 Choices are
 112 provided only when they appear to be necessary,
 113 or at least important.
 114 .P
 115 A note on terminology:
 116 to avoid constant circumlocutions,
 117 an ISAKMP/IKE SA, possibly recreated occasionally by rekeying,
 118 will be referred to as a ``keying channel'',
 119 and a set of IPsec SAs providing bidirectional communication between
 120 two IPsec hosts,
 121 possibly recreated occasionally by rekeying,
 122 will be referred to as a ``tunnel''
 123 (it could conceivably use transport mode in the host-to-host case,
 124 but we advocate using tunnel mode even there).
 125 The word ``connection'' is here used in a more generic sense.
 126 The word ``lifetime'' will be avoided in favor of ``rekeying interval'',
 127 since many of the connections will have useful lives far shorter
 128 than any reasonable rekeying interval,
 129 and hence the two concepts must be separated.
 130 .P
 131 A note on document structure:
 132 discussions of \fIwhy\fR things were done a particular way,
 133 or not done a particular way,
 134 are broken out in paragraphs headed ``Rationale:''
 135 (to preserve the flow of the text, many such paragraphs are deferred
 136 to the ends of sections).
 137 Paragraphs headed ``Ahem:'' are discussions of where the problem is being
 138 made significantly harder by problems elsewhere,
 139 and how that might be corrected.
 140 Some meta-comments are enclosed in [].
 141 .R
 142 The motive is to get the Internet encrypted.
 143 That requires encryption without connection-by-connection prearrangement:
 144 a system must be able to
 145 reliably negotiate an encrypted, authenticated
 146 connection with a total stranger.
 147 While end-to-end encryption is preferable,
 148 doing opportunistic encryption in security gateways
 149 gives enormous leverage for quick deployment of this technology,
 150 in a world where end-host software is often primitive, rigid, and outdated.
 151 .R
 152 Speed is of the essence in tunnel setup:
 153 a connection-establishment delay longer than about 10 seconds
 154 begins to cause problems for users and applications.
 155 Thus the emphasis on rapidity in gateway discovery and key fetching.
 156 .A
 157 Host-to-host opportunistic encryption
 158 would be utterly trivial if a fast public-key
 159 encryption/signature
 160 algorithm were available.
 161 You would do a reverse lookup on the destination address to obtain a
 162 public key for that address,
 163 and simply encrypt all packets going to it with that key,
 164 signing them with your own private key.
 165 Alas, this is impractical with current CPU speeds and current algorithms
 166 (although as noted later, it might be of some use for limited purposes).
 167 Nevertheless, it is a useful model.
 168 .NH 1
 169 Connection Setup
 170 .P
 171 For purposes of discussion, the network is taken to look like this:
 172 .DS
 173 Source----Initiator----...----Responder----Destination
 174 .DE
 175 The intercepted packet comes from the Source,
 176 bound for the Destination,
 177 and is intercepted at the Initiator.
 178 The Initiator communicates over the insecure Internet to the Responder.
 179 The Source and the Initiator might be the same host,
 180 or the Source might be an end-user host and the Initiator a
 181 security gateway (SG).
 182 Likewise for the Responder and the Destination.
 183 .P
 184 Given an intercepted packet,
 185 whose useful information (for our purposes)
 186 is essentially only the Destination's IP address,
 187 the Initiator
 188 must quickly determine the Responder (the Destination's SG) and
 189 fetch everything needed to authenticate it.
 190 The Responder must do likewise for the Initiator.
 191 Both must eventually also confirm that the other is authorized to act
 192 on behalf of the client host behind it (if any).
 193 .P
 194 An important subtlety here is that if the alternative to an IPsec tunnel
 195 is plaintext transmission, negative results must be obtained quickly.
 196 That is,
 197 the decision that \fIno\fR tunnel can be established must also be made rapidly.
 198 .NH 2
 199 Packet Interception
 200 .P
 201 Interception of outgoing packets is relatively straightforward
 202 in principle.
 203 It is preferable to put the intercepted packet on hold rather than
 204 dropping it, since higher-level retries are not necessarily well-timed.
 205 There is a problem of hosts and applications retrying during negotiations.
 206 ARP implementations, which face the same problem,
 207 use the approach of keeping the \fImost recent\fR
 208 packet for an as-yet-unresolved address,
 209 and throwing away older ones.
 210 (Incrementing of request numbers etc. means that replies to older ones may no
 211 longer be accepted.)
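.P
The ARP-style holding policy described above can be sketched as follows.
This is only an illustration, not part of any implementation:
the class name \fIHoldQueue\fR and its methods are hypothetical,
and packets are represented as plain byte strings.

```python
from typing import Dict, Optional


class HoldQueue:
    """Holds at most one pending packet per unresolved destination address."""

    def __init__(self) -> None:
        self._held: Dict[str, bytes] = {}

    def hold(self, destination: str, packet: bytes) -> None:
        # A newer packet silently replaces any older one for the same address.
        self._held[destination] = packet

    def release(self, destination: str) -> Optional[bytes]:
        # Called when negotiation ends; returns the held packet, if any.
        return self._held.pop(destination, None)


queue = HoldQueue()
queue.hold("192.0.2.7", b"first attempt")
queue.hold("192.0.2.7", b"retry")
released = queue.release("192.0.2.7")  # only the retry survives
```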
 212 .P
 213 Is it worth intercepting \fIincoming\fR packets, from the outside world, and
 214 attempting tunnel setup based on them?
 215 No, unless and until a way can be devised to initiate opportunistic encryption
 216 to a non-opportunistic responder,
 217 because
 218 if the other end has not initiated tunnel setup itself, it will not be
 219 prepared to do so at our request.
 220 .R
 221 Note, however, that most incoming packets will promptly be followed by
 222 an outgoing packet in response!
 223 Conceivably it might be useful to start early stages of negotiation,
 224 at least as far as looking up information,
 225 in response to an incoming packet.
 226 .R
 227 If a plaintext incoming packet indicates that the other
 228 end is not prepared to do opportunistic encryption,
 229 it might seem that this fact should be noted, to
 230 avoid consuming resources and delaying
 231 traffic in an attempt at opportunistic setup which is doomed to fail.
 232 However, this would be a major security hole,
 233 since the plaintext packet is not authenticated;
 234 see section 2.5.
 235 .NH 2
 236 Algorithm
 237 .P
 238 For clarity,
 239 the following defers most discussion of error handling to the end.
 240 .nr x \w'Step 3A.'u+1n
 241 .de S
 242 .IP "Step \\$1." \nxu
 243 ..
 244 .S 1
 245 Initiator does a DNS reverse lookup on the Destination address,
 246 asking not for the usual PTR records,
 247 but for TXT records.
 248 Meanwhile, Initiator also sends a ping to the Destination,
 249 to cause any other dynamic setup actions to start happening.
 250 (Ping replies are disregarded;
 251 the host might not be reachable with plaintext pings.)
 252 .S 2A
 253 If at least one suitable TXT record (see section 2.3) comes back,
 254 each contains a potential Responder's IP address
 255 and that Responder's public key (or where to find it).
 256 Initiator picks one TXT record, based on priority (see 2.3),
 257 thus picking a Responder.
 258 If there was no public key in the TXT record,
 259 the Initiator also starts a DNS lookup (as specified by the TXT record)
 260 to get KEY records.
 261 .S 2B
 262 If no suitable TXT record is available,
 263 and policy permits,
 264 Initiator designates the Destination itself as the Responder
 265 (see section 2.4).
 266 If policy does not permit,
 267 or the Destination is unresponsive to the negotiation,
 268 then opportunistic encryption is not possible,
 269 and Initiator gives up (see section 2.5).
 270 .S 3
 271 If there already is a keying channel to the Responder's IP address,
 272 the Initiator uses the existing keying channel;
 273 skip to step 10.
 274 Otherwise, the Initiator starts an IKE Phase 1 negotiation
 275 (see section 2.7 for details)
 276 with the Responder.
 277 The address family of the Responder's IP address dictates whether
 278 the keying channel and the outside of the tunnel should be IPv4 or IPv6.
 279 .S 4
 280 Responder gets the first IKE message,
 281 and responds.
 282 It also starts a DNS reverse lookup on the Initiator's IP address,
 283 for KEY records, on speculation.
 284 .S 5
 285 Initiator gets Responder's reply,
 286 and sends first message of IKE's D-H exchange (see 2.4).
 287 .S 6
 288 Responder gets Initiator's D-H message,
 289 and responds with a matching one.
 290 .S 7
 291 Initiator gets Responder's D-H message;
 292 encryption is now established, authentication remains to be done.
 293 Initiator sends IKE authentication message,
 294 with an FQDN identity if a reverse lookup on its address will not yield a
 295 suitable KEY record.
 296 (Note, an FQDN need not
 297 actually correspond to a host\(eme.g., the DNS data for it need not
 298 include an A record.)
 299 .S 8
 300 Responder gets Initiator's authentication message.
 301 If there is no identity included,
 302 Responder waits for step 4's speculative DNS lookup to finish;
 303 it should yield a suitable KEY record (see 2.3).
 304 If there is an FQDN identity,
 305 Responder discards any data obtained from step 4's DNS lookup;
 306 does a forward lookup on the FQDN, for a KEY record;
 307 waits for that lookup to return;
 308 it should yield a suitable KEY record.
 309 Either way, Responder uses the KEY data to verify the message's hash.
 310 Responder replies with an authentication message,
 311 with an FQDN identity if a reverse lookup on its address will not yield a
 312 suitable KEY record.
 313 .S 9A
 314 (If step 2A was used.)
 315 The Initiator gets the Responder's authentication message.
 316 Step 2A has provided a key (from the TXT record or via DNS lookup).
 317 Verify message's hash.
 318 Encrypted and authenticated keying channel established,
 319 man-in-middle attack precluded.
 320 .S 9B
 321 (If step 2B was used.)
 322 The Initiator gets the Responder's authentication message,
 323 which must contain an FQDN identity (if the Responder can't put a TXT in his
 324 reverse map he presumably can't do a KEY either).
 325 Do forward lookup on the FQDN,
 326 get suitable KEY record, verify hash.
 327 Encrypted keying channel established,
 328 man-in-middle attack precluded,
 329 but authentication weak (see 2.4).
 330 .S 10
 331 Initiator initiates IKE Phase 2 negotiation (see 2.7) to establish tunnel,
 332 specifying Source and Destination identities as IP addresses (see 2.6).
 333 The address family of those addresses also determines whether the inside
 334 of the tunnel should be IPv4 or IPv6.
 335 .S 11
 336 Responder gets first Phase 2 message.
 337 Now the Responder finally knows what's going on!
 338 Unless the specified Source is identical to the Initiator,
 339 Responder initiates DNS reverse lookup on Source IP address,
 340 for TXT records;
 341 waits for result;
 342 gets suitable TXT record(s) (see 2.3),
 343 which should contain either the Initiator's IP address
 344 or an FQDN identity identical to that supplied by the Initiator in step 7.
 345 This verifies that the Initiator is authorized
 346 to act as SG for the Source.
 347 Responder replies with second Phase 2 message,
 348 selecting acceptable details (see 2.7),
 349 and establishes tunnel.
 350 .S 12
 351 Initiator gets second Phase 2 message,
 352 establishes tunnel (if he didn't already),
 353 and releases the intercepted packet into it, finally.
 354 .S 13
 355 Communication proceeds.
 356 See section 3 for what happens later.
 357 .P
 358 As additional information becomes available,
 359 notably in steps 1, 2, 4, 8, 9, 11, and 12,
 360 there is always a possibility that local policy
 361 (e.g., access limitations) might prevent further progress.
 362 Whenever possible,
 363 at least attempt to inform the other end of this.
 364 .P
 365 At any time, there is a possibility of the negotiation failing due to
 366 unexpected responses, e.g. the Responder not responding at all
 367 or rejecting all of the Initiator's proposals.
 368 If multiple SGs were found as possible Responders,
 369 the Initiator should try at least one more before giving up.
 370 The number tried should be influenced by what the alternative is:
 371 if the traffic will otherwise be discarded, trying the full list is
 372 probably appropriate,
 373 while if the alternative is plaintext transmission,
 374 it might be based on how long the tries are taking.
 375 The Initiator should try as many as it reasonably can,
 376 ideally all of them.
 377 .P
 378 There is a sticky problem with timeouts.
 379 If the Responder is down
 380 or otherwise inaccessible, in the worst case we won't hear about this
 381 except by not getting responses.
 382 Some other, more pathological or even
 383 evil, failure cases can have the same result.
 384 The problem is that in the
 385 case where plaintext is permitted, we want to decide quickly whether a tunnel is
 386 possible.
 387 There is no good solution to this, alas;
 388 we just have to take the time and do it right.
 389 (Passing plaintext meanwhile
 390 looks attractive at first glance... but exposing
 391 the first few seconds of a connection is often almost as bad as exposing
 392 the whole thing.
 393 Worse, if the user checks the status of the connection,
 394 after that brief window it looks secure!)
 395 .P
 396 The flip side of waiting for a timeout is that all other forms of
 397 feedback, e.g. ``host not reachable'',
 398 arguably should be \fIignored\fR,
 399 because in the absence of authenticated ICMP,
 400 you cannot trust them!
 401 .R
 402 An alternative, sometimes suggested, to the use of explicit DNS records
 403 for SG discovery is to directly attempt IKE negotiation with the
 404 destination host,
 405 and assume that any relevant SG will be on the packet path,
 406 will intercept the IKE packets,
 407 and will impersonate the destination host for the IKE negotiation.
 408 This is superficially attractive but is a very bad idea.
 409 It assumes that routing is stable throughout negotiation,
 410 that the SG is on the plaintext-packets path,
 411 and that the destination host is routable
 412 (yes, it is possible to have (private) DNS data for an unroutable host).
 413 Playing extra games in the plaintext-packet path hurts performance and
 414 can be expected to be unpopular.
 415 Various difficulties ensue when there are multiple SGs along the path
 416 (there is already bad experience with this, in RSVP),
 417 and the presence of even one can make it impossible
 418 to do IKE direct to the host when that is what's wanted.
 419 Worst of all, such impersonation breaks the IP network model badly,
 420 making problems difficult to diagnose and impossible to work around
 421 (and there is already bad experience with this, in areas like web caching).
 422 .R
 423 (Step 1.)
 424 Dynamic setup actions might include establishment of demand-dialed links.
 425 These might be present anywhere along the path,
 426 so one cannot rely on out-of-band communication at the Initiator to
 427 trigger them.
 428 Hence the ping.
 429 .R
 430 (Step 2.)
 431 In many cases, the IP address on the intercepted packet will be the
 432 result of a name lookup just done.
 433 Inverse queries, an obscure DNS feature from the distant past,
 434 in theory can be used to ask a DNS server to reverse that lookup,
 435 giving the name that produced the address.
 436 This is not the same as a reverse lookup,
 437 and the difference can matter a great deal in cases where a host
 438 does not control its reverse map
 439 (e.g., when the host's IP address is dynamically assigned).
 440 Unfortunately, inverse queries were never widely implemented and
 441 are now considered obsolete.
 442 Phooey.
 443 .A
 444 Support for a small subset of this admittedly-obscure feature
 445 would be useful.
 446 Unfortunately, it seems unlikely.
 447 .R
 448 (Step 3.)
 449 Using only IP addresses to decide whether there is already a relevant
 450 keying channel avoids some
 451 difficult problems.
 452 In particular, it might seem that this should be based on identities,
 453 but those are not known until very late in IKE Phase 1 negotiations.
 454 .R
 455 (Step 4.)
 456 The DNS lookup is done on speculation
 457 because the data will probably be useful and the lookup can be done
 458 in parallel with IKE activity,
 459 potentially speeding things up.
 460 .R
 461 (Steps 7 and 8.)
 462 If an SG does not control its reverse map,
 463 there is no way it can prove its right to use an IP address,
 464 but it can nevertheless supply both an identity (as an FQDN) and
 465 proof of its right to use that identity.
 466 This is somewhat better than nothing,
 467 and may be quite useful if the SG is representing a client host
 468 which \fIcan\fR prove its right to \fIits\fR IP address.
 469 (For example, a fixed-address subnet might live behind an SG with
 470 a dynamically-assigned address;
 471 such an SG has to be the Initiator, not the Responder,
 472 so the subnet's TXT records can contain FQDN identities,
 473 but with that restriction, this works.)
 474 It might sound like this would permit some man-in-the-middle attacks
 475 in important cases like Road Warrior,
 476 but the RW can still do full authentication of the home base,
 477 so a man in the middle cannot successfully impersonate home base,
 478 and the D-H exchange doesn't work unless the man in the middle
 479 impersonates \fIboth\fR ends.
 480 .R
 481 (Steps 7 and 8.)
 482 Another situation where proof of the right to use an identity can be
 483 very useful is when access is deliberately limited.
 484 While opportunistic encryption is intended as a general-purpose
 485 connection mechanism between strangers,
 486 it may well be convenient for prearranged connections to use
 487 the same mechanism.
 488 .R
 489 (Steps 7 and 8.)
 490 FQDNs as identities are avoided where possible,
 491 since they can involve synchronous DNS lookups.
 492 .R
 493 (Step 11.)
 494 Note that only here, in Phase 2,
 495 does the Responder actually learn who the
 496 Source and Destination hosts are.
 497 This unfortunately demands a synchronous DNS lookup to verify that the
 498 Initiator is authorized to represent the Source,
 499 unless they are one and the same.
 500 This and the initial TXT lookup are the only synchronous DNS lookups
 501 absolutely required by the algorithm,
 502 and they appear to be unavoidable.
 503 .R
 504 While it might seem unlikely that a refusal to cooperate from one SG
 505 could be remedied by trying another\(empresumably they all use the
 506 same policies\(emit's conceivable that one might be misconfigured.
 507 Preferably they should all be tried,
 508 but it may be necessary to set some limits on this
 509 if alternatives exist.
 510 .NH 2
 511 DNS Records
 512 .P
 513 Gateway discovery and key lookup are based on TXT and KEY DNS records.
 514 The TXT record specifies the IP address or other identity of a host's SG,
 515 and possibly supplies its public key as well,
 516 while the KEY record supplies public keys not found in TXT records.
 517 .NH 3
 518 TXT
 519 .P
 520 Opportunistic-encryption SG discovery uses TXT records with the content:
 521 .DS
 522 X-IPsec-Gateway(\fInnn\fR)=\fIiii\fR\ \fIkkk\fR
 523 .DE
 524 following RFC 1464 attribute/value
 525 notation.
 526 Records which
 527 do not contain an ``='',
 528 or which do not have exactly the specified form to the left of it,
 529 are ignored.
 530 (Near misses perhaps should be reported.)
 531 .P
 532 The \fInnn\fR is an unsigned integer which will fit in 16 bits,
 533 specifying an MX-style preference
 534 (lower number = stronger preference) to
 535 control the order in which multiple SGs are tried.
 536 If there are ties, pick one,
 537 randomly enough that the choice will probably be different each time.
 538 xxx rollover.
 539 The preference field is not optional;
 540 use ``0'' if there is no meaningful preference ordering.
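.P
A minimal sketch of the preference rule, assuming the candidate SGs have
already been extracted from the TXT records as (preference, identity) pairs.
The function name \fIpick_gateway\fR and the pair representation are
hypothetical, chosen only for illustration.

```python
import random
from typing import Optional, Sequence, Tuple

Candidate = Tuple[int, str]  # (preference, gateway identity)


def pick_gateway(candidates: Sequence[Candidate],
                 rng: Optional[random.Random] = None) -> Optional[Candidate]:
    """Pick a lowest-preference candidate, breaking ties at random."""
    if not candidates:
        return None
    rng = rng or random.Random()
    best = min(preference for preference, _ in candidates)
    tied = [c for c in candidates if c[0] == best]
    # Random choice among ties, so repeated attempts spread across the SGs.
    return rng.choice(tied)


candidates = [(10, "192.0.2.1"), (0, "192.0.2.2"), (0, "192.0.2.3")]
chosen = pick_gateway(candidates)
```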
 541 .P
 542 The \fIiii\fR part identifies the SG.
 543 Normally this is a dotted-decimal IPv4 address or
 544 a colon-hex IPv6 address.
 545 The sole exception is if the SG has no fixed address (see 2.4) but
 546 the host(s) behind it do,
 547 in which case \fIiii\fR is of the form ``@fqdn'',
 548 where \fIfqdn\fR is the FQDN that the SG will use to
 549 identify itself (in step 7 of section 2.2);
 550 such a record cannot be used for SG discovery by an Initiator,
 551 but can be used for
 552 SG verification (step 11 of 2.2) by a Responder.
 553 .P
 554 The \fIkkk\fR part is optional.
 555 If it is present,
 556 it is an RSA/MD5 public key in base-64 notation, as in the text
 557 form of an RFC 2535 KEY record.
 558 If it is not present,
 559 this specifies that the public key can be found in a KEY
 560 record located based on the SG's identification:
 561 if \fIiii\fR is an IP address,
 562 do a reverse lookup on that address,
 563 else do a forward lookup on the FQDN.
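.P
The record format can be illustrated with a small parser.
This is a sketch under stated assumptions, not a reference implementation:
the names \fIGatewayRecord\fR and \fIparse_gateway_txt\fR are hypothetical,
the attribute name is matched exactly as written above,
and whitespace inside the key text is assumed to be insignificant.

```python
import ipaddress
from typing import NamedTuple, Optional

PREFIX = "X-IPsec-Gateway("


class GatewayRecord(NamedTuple):
    preference: int
    identity: str               # IP address text, or "@fqdn"
    is_fqdn: bool
    public_key: Optional[str]   # None means: fetch it from a KEY record


def parse_gateway_txt(text: str) -> Optional[GatewayRecord]:
    """Return the parsed record, or None if the record should be ignored."""
    left, sep, right = text.partition("=")
    if not sep or not (left.startswith(PREFIX) and left.endswith(")")):
        return None
    digits = left[len(PREFIX):-1]
    if not (digits.isascii() and digits.isdigit()):
        return None
    preference = int(digits)
    if preference > 0xFFFF:     # must fit in 16 bits
        return None
    fields = right.split(None, 1)
    if not fields:
        return None
    identity = fields[0]
    is_fqdn = identity.startswith("@")
    if is_fqdn:
        if len(identity) == 1:
            return None
    else:
        try:
            ipaddress.ip_address(identity)   # accepts IPv4 and IPv6 text
        except ValueError:
            return None
    # Assumption: blanks inside the base-64 key text carry no meaning.
    key = "".join(fields[1].split()) if len(fields) > 1 else None
    return GatewayRecord(preference, identity, is_fqdn, key or None)


with_key = parse_gateway_txt("X-IPsec-Gateway(10)=192.0.2.1 AQOF8tZ2m")
fqdn_only = parse_gateway_txt("X-IPsec-Gateway(0)=@sg.example.com")
```
.P
When the key field is absent, the caller would then look for a KEY record,
by reverse lookup if the identity is an address and by forward lookup if it is an FQDN.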
 564 .R
 565 While it is unusual for a reverse lookup to go for records other than PTR
 566 records (or possibly CNAME records, for RFC 2317 classless delegation),
 567 there's no reason why it can't.
 568 The TXT record is a temporary stand-in
 569 for (we hope, someday) a new DNS record for SG identification and keying.
 570 Keeping the setup process fast requires minimizing the number of DNS
 571 lookups, hence the desire to put all the information in one place.
 572 .R
 573 The use of RFC 1464 notation avoids collisions with other uses of TXT
 574 records.
 575 The ``X-'' in the attribute name
 576 indicates that this format is tentative and experimental;
 577 this design will probably need modification after initial experiments.
 578 The format is chosen with an eye on eventual binary encoding.
 579 Note, in particular,
 580 that the TXT record normally contains the \fIaddress\fR of the SG,
 581 not (repeat, not) its name.
 582 Name-to-address conversion is the job of
 583 whatever generates the TXT record,
 584 which is expected to be a program, not a human\(emthis is conceptually
 585 a \fIbinary\fR record, temporarily using a text encoding.
 586 The ``@fqdn'' form of the SG identity is
 587 for specialized uses and is never mapped to an address.
 588 .A
 589 A DNS TXT record contains one or more character strings,
 590 but RFC 1035 does not describe exactly how
 591 a multi-string TXT record is interpreted.
 592 This is relevant because a string can be at most 255 characters,
 593 and public keys can exceed this.
 594 Empirically, the standard pattern is that
 595 each string which is
 596 both less than 255 characters \fIand\fR not the final string of the
 597 record should have a blank appended to it,
 598 and the strings of the record
 599 should then be concatenated.
 600 (This observation is based on how BIND 8 transforms a TXT record
 601 from text to DNS binary.)
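.P
The empirical concatenation rule just described can be written down directly.
The function name \fIjoin_txt_strings\fR is hypothetical,
and the sketch assumes the strings of the record have already been decoded to text.

```python
from typing import Sequence

MAX_STRING = 255  # longest character string a TXT record can carry


def join_txt_strings(strings: Sequence[str]) -> str:
    """Rebuild the logical text of a multi-string TXT record."""
    parts = []
    last = len(strings) - 1
    for index, piece in enumerate(strings):
        # A short string that is not the final one gets a blank appended.
        if len(piece) < MAX_STRING and index != last:
            piece = piece + " "
        parts.append(piece)
    return "".join(parts)


joined = join_txt_strings(["a" * 255, "bc", "d"])
```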
 602 .NH 3
 603 KEY
 604 .P
 605 An opportunistic-encryption KEY record
 606 is an Authentication-permitted,
 607 Entity (host),
 608 non-Signatory,
 609 IPsec,
 610 RSA/MD5 record
 611 (that is, its first four bytes are 0x42000401),
 612 as per RFCs 2535 and 2537.
 613 KEY records with other \fIflags\fR, \fIprotocol\fR, or \fIalgorithm\fR
 614 values are ignored.
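.P
As an illustration, the four-byte test can be applied to the raw KEY data
as sketched below.
The function name \fIis_oe_key_rdata\fR is hypothetical;
the field layout (16-bit flags, 8-bit protocol, 8-bit algorithm) is that of RFC 2535.

```python
import struct

OE_FLAGS = 0x4200      # authentication only, entity (host), non-signatory
OE_PROTOCOL = 4        # IPsec
OE_ALGORITHM = 1       # RSA/MD5


def is_oe_key_rdata(rdata: bytes) -> bool:
    """True if the KEY data starts with the bytes 0x42000401."""
    if len(rdata) < 4:
        return False
    flags, protocol, algorithm = struct.unpack("!HBB", rdata[:4])
    return (flags, protocol, algorithm) == (OE_FLAGS, OE_PROTOCOL, OE_ALGORITHM)


usable = is_oe_key_rdata(bytes.fromhex("42000401") + b"public key bytes")
ignored = is_oe_key_rdata(bytes.fromhex("01000301") + b"public key bytes")
```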
 615 .R
 616 Unfortunately, the public key has to be
 617 associated with the SG, not the client host behind it.
 618 The Responder does not know which client it is supposed to be representing,
 619 or which client the Initiator is representing,
 620 until far too late.
 621 .A
 622 Per-client keys would reduce vulnerability to key compromise,
 623 and simplify key changes,
 624 but they would require changes to IKE Phase 1, to separately identify
 625 the SG and its initial client(s).
 626 (At present, the client identities are not known to the Responder
 627 until IKE Phase 2.)
 628 While the current IKE standard does not actually specify (!) who is
 629 being identified by identity payloads,
 630 the overwhelming consensus is that they identify the SG,
 631 and as seen earlier,
 632 this has important uses.
 633 .NH 3
 634 Summary
 635 .P
 636 For reference, the minimum set of DNS records needed to make this
 637 all work is either:
 638 .IP 1. \w'1.'u+2n
 639 TXT in Destination reverse map, identifying Responder and providing public key.
 640 .IP 2.
 641 KEY in Initiator reverse map, providing public key.
 642 .IP 3.
 643 TXT in Source reverse map, verifying relationship to Initiator.
 644 .P
 645 or:
 646 .IP 1. \w'1.'u+2n
 647 TXT in Destination reverse map, identifying Responder.
 648 .IP 2.
 649 KEY in Responder reverse map, providing public key.
 650 .IP 3.
 651 KEY in Initiator reverse map, providing public key.
 652 .IP 4.
 653 TXT in Source reverse map, verifying relationship to Initiator.
 654 .P
 655 Slight complications ensue for dynamic addresses,
 656 lack of control over reverse maps, etc.
 657 .NH 3
 658 Implementation
 659 .P
 660 In the long run, we need either a tree of trust or a web of trust,
 661 so we can trust our DNS data.
 662 The obvious approach for DNS is a tree of trust,
 663 but there are various practical problems with running all of this
 664 through the root servers,
 665 and a web of trust is arguably more robust anyway.
 666 This is logically independent of opportunistic encryption,
 667 and a separate design proposal will be prepared.
 668 .P
 669 Interim stages of implementation of this will require a bit of thought.
 670 Notably, we need some way of dealing with the lack of fully signed DNSSEC
 671 records right away.
 672 Without user interaction, probably the best we can do is to
 673 remember the results of old fetches, compare them to the results of new
 674 fetches, and complain and disbelieve all of it if there's a mismatch.
 675 This does mean that somebody who gets fake data into our very first fetch
 676 will fool us, at least for a while, but that seems an acceptable tradeoff.
 677 (Obviously there needs to be a way to manually flush the remembered results
 678 for a specific host, to permit deliberate changes.)
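.P
The interim remember-and-compare approach might look roughly like this.
It is a sketch only: the class \fIFetchMemory\fR and its methods are hypothetical,
records are represented as text strings,
and the memory is kept in a dictionary rather than on disk.

```python
from typing import Dict, Iterable, Tuple


class FetchMemory:
    """Remembers earlier DNS answers and flags any later disagreement."""

    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[str, ...]] = {}

    def check(self, name: str, records: Iterable[str]) -> bool:
        current = tuple(sorted(records))
        previous = self._seen.get(name)
        if previous is None:
            # First fetch: nothing to compare against, so it is believed.
            self._seen[name] = current
            return True
        # On a mismatch the caller should complain and trust neither answer.
        return previous == current

    def flush(self, name: str) -> None:
        # Manual reset, to permit deliberate changes for one host.
        self._seen.pop(name, None)


memory = FetchMemory()
first = memory.check("1.2.0.192.in-addr.arpa", ["key-A"])
same = memory.check("1.2.0.192.in-addr.arpa", ["key-A"])
changed = memory.check("1.2.0.192.in-addr.arpa", ["key-B"])
memory.flush("1.2.0.192.in-addr.arpa")
after_flush = memory.check("1.2.0.192.in-addr.arpa", ["key-B"])
```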
 679 .NH 2
 680 Responders Without Credentials
 681 .P
 682 In cases where the Destination simply does not control its
 683 DNS reverse-map entries,
 684 there is no verifiable way to determine a suitable SG.
 685 This does not make communication utterly impossible, though.
 686 .P
 687 Simply attempting negotiation directly with the host is a last resort.
 688 (An aggressive implementation might wish to attempt it in parallel,
 689 rather than waiting until other options are known to be unavailable.)
 690 In particular, in many cases involving dynamic addresses, it will work.
 691 It has the disadvantage of delaying the discovery that opportunistic
 692 encryption is entirely impossible,
 693 but the case seems common enough to justify the overhead.
 694 .P
 695 However, there are policy issues here either way, because
 696 it is possible to impersonate such a host.
 697 The host can supply an FQDN identity and verify its right to use that
 698 identity,
 699 but except by prearrangement,
 700 there is no way to verify that the FQDN is the right one for that
 701 IP address.
 702 (The data from forward lookups may be controlled by people
 703 who do not own the address, so it cannot be trusted.)
 704 The encryption is still solid, though,
 705 so in many cases this may be useful.
 706 .NH 2
 707 Failure of Opportunism
 708 .P
 709 When there is no way to do opportunistic encryption, a policy issue arises:
 710 whether to put in a bypass (which allows plaintext traffic through)
 711 or a block (which discards it, perhaps with notification back to the sender).
 712 The choice is very much a matter of local policy,
 713 and may depend on details such as the higher-level protocol being used.
 714 For example,
 715 an SG might well permit plaintext HTTP but forbid plaintext Telnet,
 716 in which case \fIboth\fR a block and a bypass would be set up if
 717 opportunistic encryption failed.
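.P
As an illustration only, the HTTP/Telnet example could be expressed as a
small per-port policy table.
The names are hypothetical, and the choice of block as the default for
unlisted ports is an assumption, not something this design prescribes.

```python
BYPASS, BLOCK = "bypass", "block"

# Local policy, keyed by destination port: 80 is HTTP, 23 is Telnet.
FALLBACK_BY_PORT = {80: BYPASS, 23: BLOCK}

def fallback_action(dst_port, default=BLOCK):
    """What to install for this traffic when opportunism has failed."""
    return FALLBACK_BY_PORT.get(dst_port, default)
```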
 718 .P
 719 A bypass/block must, in practice,
 720 be treated much like an IPsec tunnel.
 721 It should persist for a while,
 722 so that high-overhead processing doesn't have to be done for every packet,
 723 but should go away eventually to return resources.
 724 It may be simplest to treat it as a degenerate tunnel.
 725 It should have a relatively long lifetime (say 6h) to keep the frequency
 726 of negotiation attempts down,
 727 except in the case where the other SG simply did not respond to IKE packets,
 728 where the lifetime should be short (say 10min) because
 729 the other SG is presumably down and might come back up again.
 730 (Cases where the other SG responded to IKE with unauthenticated error
 731 reports like ``port unreachable'' are borderline,
 732 and might deserve to be treated as an intermediate case:
 733 while such reports cannot be trusted unreservedly,
 734 in the absence of any other response,
 735 they do give some reason to suspect that the other SG is unable or
 736 unwilling to participate in opportunistic encryption.)
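.P
The lifetime choice just described might be sketched like this.
The 6h and 10min figures are the ones suggested above;
the enumeration names are hypothetical,
and the 1h value for the borderline case is purely an assumed intermediate.

```python
from enum import Enum

class Failure(Enum):
    NOT_POSSIBLE = 1    # opportunistic encryption was ruled out
    NO_RESPONSE = 2     # other SG simply did not respond to IKE packets
    UNAUTH_ERROR = 3    # only unauthenticated errors, e.g. port unreachable

LIFETIME_SECONDS = {
    Failure.NOT_POSSIBLE: 6 * 3600,   # long: keep negotiation attempts rare
    Failure.NO_RESPONSE: 10 * 60,     # short: the SG might come back up
    Failure.UNAUTH_ERROR: 3600,       # assumed intermediate value
}

def bypass_block_lifetime(failure):
    """Lifetime, in seconds, of the bypass/block to install."""
    return LIFETIME_SECONDS[failure]
```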
 737 .P
 738 As noted in section 2.1, one might think that
 739 arrival of a plaintext incoming packet should cause a
 740 bypass/block to be set up for its source host:
 741 such a packet is almost always followed by an outgoing reply packet;
 742 the incoming packet is clear evidence that opportunistic encryption is
 743 not available at the other end;
 744 attempting it will waste resources and delay traffic to no good purpose.
 745 Unfortunately, this means that anyone out on the Internet
 746 who can forge a source address can prevent encrypted communication!
 747 Since their source addresses are not authenticated,
 748 plaintext packets cannot be taken as evidence of anything,
 749 except perhaps that communication from that host is likely to occur soon.
 750 .P
 751 There needs to be a way for local administrators to remove a bypass/block
 752 ahead of its normal expiry time,
 753 to force a retry after a problem at the other end is known to have been fixed.
 754 .NH 2
 755 Subnet Opportunism
 756 .P
 757 In principle, when the Source or Destination host belongs to a subnet
 758 and the corresponding SG is willing to provide tunnels to the whole subnet,
 759 this should be done.
 760 There is no extra overhead,
 761 and considerable potential for avoiding later overhead if
 762 similar communication occurs with other members of the subnet.
 763 Unfortunately,
 764 at the moment,
 765 opportunistic tunnels can only have degenerate subnets (single hosts)
 766 at their ends.
 767 (This does, at least, set up the keying channel,
 768 so that negotiations for tunnels to other hosts in the same subnets
 769 will be considerably faster.)
 770 .P
 771 The crucial problem is step 11 of section 2.2:
 772 the Responder must verify that the Initiator is authorized to represent
 773 the Source,
 774 and this is impossible for a subnet because
 775 there is no way to do a reverse lookup on it.
 776 Information in DNS
 777 records for a name or a single address cannot be trusted,
 778 because they may be controlled by people who do not control the whole subnet.
 779 .A
 780 Except in the special case of a subnet masked on a
 781 byte boundary (in which case RFC 1035's convention of an incomplete
 782 in-addr.arpa name could be used), subnet lookup would need extensions to the
 783 reverse-map name space, perhaps along the lines of what is commonly done for
 784 RFC 2317 delegation.
 785 IPv6 already has suitable name syntax, as in RFC 2874,
 786 but has no specific provisions for subnet entries in its reverse maps.
 787 Fixing all this is not conceptually difficult,
 788 but is logically independent of opportunistic encryption,
 789 and will be proposed separately.
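.P
For the byte-boundary special case, the incomplete in-addr.arpa name can
be computed as in the sketch below.
The function name is hypothetical;
only IPv4 subnets whose mask ends on a byte boundary are handled,
and everything else is rejected rather than guessed at.

```python
import ipaddress

def byte_boundary_reverse_name(subnet):
    """Incomplete in-addr.arpa name for an IPv4 subnet masked on a byte boundary."""
    net = ipaddress.ip_network(subnet)   # raises ValueError if host bits are set
    if net.version != 4 or net.prefixlen == 0 or net.prefixlen % 8 != 0:
        raise ValueError("subnet is not IPv4 masked on a byte boundary")
    # Keep only the octets covered by the mask, then reverse their order.
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"
```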
 790 .P
 791 A less-troublesome problem is that the Initiator,
 792 in step 10 of 2.2,
 793 must know exactly what subnet is present on the Responder's end
 794 so he can propose a tunnel to it.
 795 This information could be included in the TXT record
 796 of the Destination
 797 (it would have to be verified with a subnet lookup,
 798 but that could be done in parallel with other operations).
 799 The Initiator presumably
 800 can be configured to know what subnet(s) are present on its end.
 801 .NH 2
 802 Option Settings
 803 .P
 804 IPsec and IKE have far too many useless options, and a few useful ones.
 805 IKE negotiation is quite simplistic, and cannot handle even simple
 806 discrepancies between the two SGs.
 807 So it is necessary to be quite specific about what should be done and
 808 what should be proposed,
 809 to guarantee interoperability without prearrangement or
 810 other negotiation protocols.
 811 .R
 812 The prohibition of other negotiations is simply because there is no time.
 813 The setup algorithm (section 2.2) is lengthy already.
 814 .P
 815 [Open question:
 816 should opportunistic IKE use a different port than normal IKE?]
 817 .P
 818 Somewhat arbitrarily and
 819 tentatively, opportunistic SGs must support Main Mode, Oakley group 5 for
 820 D-H, 3DES encryption and MD5 authentication for both ISAKMP and IPsec SAs,
 821 RSA/MD5 digital-signature authentication with keys between 2048 and 8192 bits,
 822 and ESP doing both encryption and authentication.
 823 They must do key PFS
 824 in Quick Mode, but not identity PFS.
 825 They may support IPComp, preferably using Deflate,
 826 but must not insist on it.
 827 They may support AES as an alternative to 3DES,
 828 but must not insist on it.
 829 .R
 830 Identity PFS essentially requires establishing
 831 a complete new keying channel for each new tunnel,
 832 but key PFS just does a new Diffie-Hellman exchange for each rekeying,
 833 which is relatively cheap.
 834 .P
 835 Keying channels must remain in existence at least as long as any
 836 tunnel created with them remains (they are not costly, and keeping
 837 the management path up and available simplifies various issues).
 838 See section 3.1 for related issues.
 839 Given the use of key PFS,
 840 frequent rekeying does not seem critical here.
 841 In the absence of strong reason to do otherwise,
 842 the Initiator should propose rekeying at 8hr-or-1MB.
 843 The Responder must accept any proposal which specifies
 844 a rekeying time between 1hr and 24hr inclusive
 845 and a rekeying volume between 100KB and 10MB inclusive.
 846 .P
 847 Given the short expected useful life of most tunnels (see section 3.1),
 848 very few of them will survive long enough to be rekeyed.
 849 In the absence of strong reason to do otherwise,
 850 the Initiator should propose rekeying at 1hr-or-100MB.
 851 The Responder must accept any proposal which specifies
 852 a rekeying time between 10min and 8hr inclusive
 853 and a rekeying volume between 1MB and 1000MB inclusive.
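.P
The Responder's acceptance ranges can be written down as a simple check.
This sketch reads the first set of numbers as applying to keying channels
and the second to tunnels;
the names are hypothetical, and decimal units (1KB = 1000 bytes) are assumed.

```python
from dataclasses import dataclass

KB, MB = 1000, 1000 * 1000   # assumption: decimal units
MIN, HR = 60, 3600           # seconds

@dataclass(frozen=True)
class RekeyLimits:
    min_seconds: int
    max_seconds: int
    min_bytes: int
    max_bytes: int

    def must_accept(self, seconds, volume_bytes):
        """True if the proposal lies within the inclusive mandatory ranges."""
        return (self.min_seconds <= seconds <= self.max_seconds
                and self.min_bytes <= volume_bytes <= self.max_bytes)

KEYING_CHANNEL = RekeyLimits(1 * HR, 24 * HR, 100 * KB, 10 * MB)
TUNNEL = RekeyLimits(10 * MIN, 8 * HR, 1 * MB, 1000 * MB)
```

Both suggested Initiator proposals (8hr-or-1MB and 1hr-or-100MB) fall
inside the corresponding ranges.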
 854 .P
 855 It is highly desirable to add some random jitter
 856 to the times of actual rekeying attempts,
 857 to break up ``convoys'' of rekeying events;
 858 this and certain other aspects of robust rekeying practice will be the subject
 859 of a separate design proposal.
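.P
One possible form of jitter, pending that proposal, is sketched below.
The function name, the 10% default and the choice to rekey only early
(never later than the negotiated time) are all assumptions.

```python
import random

def jittered_rekey_delay(nominal_seconds, fraction=0.1, rng=random):
    """Pick a rekeying delay up to `fraction` earlier than the nominal one."""
    return nominal_seconds * (1.0 - fraction * rng.random())
```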
 860 .R
 861 The numbers used here for rekeying intervals are chosen quite arbitrarily
 862 and should be re-assessed after some implementation experience is gathered.
 863 .NH 1
 864 Renewal and Teardown
 865 .NH 2
 866 Aging
 867 .P
 868 When to tear tunnels down is a bit problematic, but if we're setting up a
 869 potentially unbounded number of them,
 870 we have to tear them down \fIsomehow sometime\fR.
 871 .P
 872 Set a short initial tentative lifespan, say 1min,
 873 since most net flows in fact last only a few seconds.
 874 When that expires, look to see if
 875 the tunnel is still in use (definition:
 876 has had traffic, in either direction,
 877 in the last half of the tentative lifespan).
 878 If so, assign it a somewhat longer tentative lifespan, say 20min,
 879 after which, look again.
 880 If not, close it down.
 881 (This tentative lifespan is
 882 independent of rekeying; it is just the time when the tunnel's future
 883 is next considered.
 884 This should happen reasonably frequently, unlike
 885 rekeying, which is costly and shouldn't be too frequent.)
 886 Multi-step backoff algorithms are not worth the trouble; looking every
 887 20min doesn't seem onerous.
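.P
The aging rule amounts to one decision each time a tentative lifespan
expires, roughly as follows.
The function and constant names are hypothetical;
times are in seconds.

```python
INITIAL_LIFESPAN = 60          # say 1min
EXTENDED_LIFESPAN = 20 * 60    # say 20min

def review_tunnel(now, last_traffic, lifespan):
    """Called when a tentative lifespan expires.

    Returns the next tentative lifespan, or None to close the tunnel.
    """
    # In use: traffic in either direction in the last half of the lifespan.
    in_use = (now - last_traffic) <= lifespan / 2
    return EXTENDED_LIFESPAN if in_use else None
```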
 888 .P
 889 If the security gateway and the client host are one and the same,
 890 tunnel teardown decisions might wish to pay attention to TCP connection status,
 891 as reported by the local TCP layer.
 892 A still-open
 893 TCP connection is almost a guarantee that more traffic is coming, while
 894 the demise of the only TCP connection through a tunnel is a strong hint
 895 that none is.
 896 If the SG and the client host are separate machines,
 897 though, tracking TCP connection status requires packet snooping,
 898 which is complicated and probably not worthwhile.
 899 .P
 900 IKE keying channels likewise are torn down when it appears the need has
 901 passed.
 902 They always linger longer than the last tunnel they administer,
 903 in case they are needed again; the cost of retaining them is low.
 904 Other than that,
 905 unless the number of keying channels on the SG gets large,
 906 the SG should simply retain all of them until rekeying time,
 907 since rekeying is the only costly event.
 908 When about to rekey a keying channel which has no current tunnels,
 909 note when the last actual keying-channel traffic occurred,
 910 and close the keying channel down if it wasn't in the last, say, 30min.
 911 When rekeying a keying channel (or perhaps shortly before rekeying is expected),
 912 Initiator and Responder should re-fetch the public keys used for
 913 SG authentication,
 914 against the possibility that they have changed or disappeared.
 915 .P
 916 See section 2.7 for discussion of rekeying intervals.
 917 .P
 918 Given the low user impact of tearing down and rebuilding a connection
 919 (a tunnel or a keying channel),
 920 rekeying attempts should not be too persistent:
 921 one can always just rebuild when needed,
 922 so heroic efforts to preserve an existing connection are unnecessary.
 923 Say, try every 10s for a minute and every minute for 5min,
 924 and then give up and declare the connection
 925 (and all other connections to that IKE peer) dead.
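.P
The suggested retry schedule, read as offsets in seconds from the first
failed rekeying attempt, might look like this.
The function name is hypothetical, and treating the two phases as
back-to-back is one interpretation of the wording above.

```python
def rekey_retry_offsets():
    """Retry times in seconds after the first failed rekeying attempt."""
    fast = list(range(10, 61, 10))                # every 10s for a minute
    slow = [60 + 60 * k for k in range(1, 6)]     # then every minute for 5min
    return fast + slow                            # after the last one, give up
```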
 926 .R
 927 In future, more sophisticated, versions of this protocol,
 928 examining the initial packet might permit a more intelligent guess at
 929 the tunnel's useful life.
 930 HTTP connections in particular are
 931 notoriously bursty and repetitive.
 932 .R
 933 Note that rekeying a keying connection basically consists of building a
 934 new keying connection from scratch,
 935 using IKE Phase 1,
 936 and abandoning the old one.
 937 .NH 2
 938 Teardown and Cleanup
 939 .P
 940 Teardown should always be coordinated with the other end.
 941 This means interpreting and sending Delete notifications.
 942 .P
 943 On receiving a Delete for the outbound SAs of a tunnel
 944 (or some subset of them),
 945 tear down the inbound ones too, and notify the other end
 946 with a Delete.
 947 Tunnels need to be considered as bidirectional entities,
 948 even though the low-level protocols don't think of them that way.
 949 .P
 950 When the deletion is initiated locally,
 951 rather than as a response to a received Delete,
 952 send a Delete for (all) the inbound SAs of a tunnel.
 953 If no responding Delete is received for the outbound SAs,
 954 try re-sending the original Delete.
 955 Three tries spaced 10s apart seems a reasonable level of effort.
 956 (Indefinite persistence is not necessary;
 957 whether the other end isn't cooperating because it doesn't feel like
 958 it, or because it is down/disconnected/etc.,
 959 the problem will eventually be cleared up by other means.)
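.P
A locally initiated deletion with limited persistence could be sketched
as below.
The two callables are hypothetical hooks supplied by the caller:
one sends the Delete, the other waits up to the given number of seconds
for the responding Delete and reports whether it arrived.

```python
def send_delete_with_retries(send_delete, wait_for_reply, tries=3, spacing=10.0):
    """Send a Delete up to `tries` times; True if the peer answered."""
    for _ in range(tries):
        send_delete()
        if wait_for_reply(spacing):
            return True
    return False   # give up; the problem will be cleared up by other means
```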
 960 .P
 961 After rekeying,
 962 transmission should switch to using the new SAs (ISAKMP or IPsec)
 963 immediately,
 964 and the old leftover SAs should be cleared out promptly
 965 (and Deletes sent) rather than waiting for them to expire.
 966 This reduces clutter and minimizes confusion.
 967 .P
 968 Since there is only one keying channel per remote IP address,
 969 the question of whether a Delete notification has appeared on a
 970 ``suitable'' keying channel does not arise.
 971 .R
 972 The pairing of Delete notifications effectively constitutes an
 973 acknowledged Delete, which is highly desirable.
 974 .NH 2
 975 Outages and Reboots
 976 .P
 977 Tunnels sometimes go down because the other
 978 end crashes, or disconnects, or has a network link break,
 979 and there is no notice of this in the general case.
 980 (Even in the event of a crash and
 981 successful reboot, other SGs don't hear about it unless the
 982 rebooted SG has specific reason to talk to them immediately.)
 983 Over-quick response to temporary network outages is undesirable...
 984 but note that a tunnel can be torn
 985 down and then re-established without any user-visible effect except
 986 a pause in traffic,
 987 whereas if one end does reboot,
 988 the other end can't get packets to it \fIat all\fR (except via IKE)
 989 until the situation is noticed.
 990 So a bias toward quick response is appropriate,
 991 even at the cost of occasional false alarms.
 992 .P
 993 Heartbeat mechanisms are somewhat unsatisfactory for this.
 994 Unless they are very frequent, which causes other problems,
 995 they do not detect the problem promptly.
 996 .A
 997 What is really wanted is authenticated ICMP.
 998 This might be a case where public-key encryption/authentication
 999 of network packets is the right thing to do,
1000 despite the expense.
1001 .P
1002 In the absence of that, a two-part approach seems warranted.
1003 .P
1004 First,
1005 when an SG receives an IPsec packet that is addressed to it,
1006 and otherwise appears healthy,
1007 but specifies an unknown SA and is from a host that the receiver currently
1008 has no keying channel to,
1009 the receiver must attempt to inform the sender
1010 via an IKE Initial-Contact notification
1011 (necessarily sent in plaintext,
1012 since there is no suitable keying channel).
1013 This must be severely rate-limited on \fIboth\fR ends;
1014 one notification per SG pair per minute seems ample.
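.P
The rate limit can be enforced with a per-peer timestamp, as in this sketch.
The class name is hypothetical;
the same limiter could be used on the sending end and on the receiving end.

```python
class NotificationLimiter:
    """Allow at most one notification per peer per `min_interval` seconds."""

    def __init__(self, min_interval=60.0):
        self.min_interval = min_interval
        self._last = {}   # peer address -> time of last allowed notification

    def allow(self, peer, now):
        last = self._last.get(peer)
        if last is not None and now - last < self.min_interval:
            return False
        self._last[peer] = now
        return True
```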
1015 .P
1016 Second, there is an obvious difficulty with this:
1017 the Initial-Contact notification is unauthenticated
1018 and cannot be trusted.
1019 So it must be taken as a hint only:
1020 there must be a way to confirm it.
1021 .P
1022 What is needed here is something that's desirable for
1023 debugging and testing anyway:
1024 an IKE-level ping mechanism.
1025 Pinging directly at the IP level instead will not tell us about a
1026 crash/reboot event.
1027 Sending pings through tunnels has
1028 various complications (they should stop at the far mouth of the tunnel
1029 instead of going on to a subnet; they should not count against idle
1030 timers; etc.).
1031 What is needed is a continuity check on a keying channel.
1032 (This could also be used as a heartbeat,
1033 should that seem useful.)
1034 .P
1035 IKE Ping delivery need not be reliable, since the whole point of a ping is
1036 simply to provoke an acknowledgement.
1037 Pings should preferably be authenticated,
1038 but it is not clear that this is absolutely necessary,
1039 although if they are not they need
1040 encryption plus a timestamp or a nonce,
1041 to foil replay mischief.
1042 How they are implemented is a secondary issue,
1043 and a separate design proposal will be prepared.
1044 .A
1045 Some existing implementations are already using
1046 (private) notify value 30000 (``LIKE_HELLO'') as ping
1047 and (private) notify value 30002 (``SHUT_UP'') as ping reply.
1048 .P
1049 If an IKE Ping gets no response, try some (say 8) IP pings,
1050 spaced a few seconds apart, to check IP connectivity;
1051 if one comes back, try another IKE Ping;
1052 if that gets no response,
1053 the other end probably has rebooted, or otherwise been re-initialized,
1054 and its tunnels and keying channel(s) should be torn down.
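.P
The check sequence above can be sketched as a small decision procedure.
The two ping callables are hypothetical hooks that return true when a reply
arrives, and the 3s spacing is an assumed value for ``a few seconds''.
What to conclude when no IP ping comes back is not specified here;
the sketch assumes a network outage and draws no conclusion about a reboot.

```python
import time

def check_peer(ike_ping, ip_ping, ip_tries=8, spacing=3.0, sleep=time.sleep):
    """Classify the peer as 'alive', 'rebooted' or 'unreachable'."""
    if ike_ping():
        return "alive"
    for attempt in range(ip_tries):
        if attempt:
            sleep(spacing)        # space the IP pings a few seconds apart
        if ip_ping():
            # IP connectivity exists, so give IKE one more chance.
            return "alive" if ike_ping() else "rebooted"
    return "unreachable"          # assumption: an outage, so no verdict
```

Only the 'rebooted' result calls for tearing down the peer's tunnels
and keying channel(s).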
1055 .P
1056 In a similar vein,
1057 given limited rekeying persistence,
1058 a short network outage could take some tunnels down without
1059 disrupting others.
1060 On receiving a packet for an unknown SA from a host that a keying
1061 channel is currently open to,
1062 send that host an Invalid-SPI notification for that SA.
1063 xxx that's not what Invalid-SPI is for.
1064 The other host can then tear down the half-torn-down tunnel,
1065 and negotiate a new tunnel for the traffic
1066 it presumably still wants to send.
1067 .P
1068 Finally,
1069 it would be helpful if SGs made some attempt to deal intelligently
1070 with crashes and reboots.
1071 A deliberate shutdown should include an attempt to notify all other SGs
1072 currently connected by keying channels,
1073 using Deletes,
1074 that communication is about to fail.
1075 (Again, these will be taken as teardowns;
1076 attempts by the other SGs to negotiate new tunnels as replacements
1077 should be ignored at this point.)
1078 And when possible, SGs should attempt to preserve information
1079 about currently-connected SGs in non-volatile storage,
1080 so that after a crash,
1081 an Initial-Contact can be sent to previous partners to
1082 indicate loss of all previously-established connections.
1083 .NH 1
1084 Conclusions
1085 .P
1086 This design appears to achieve the objective of setting up encryption
1087 with strangers.
1088 The authentication aspects also seem adequately addressed if the
1089 destination controls its reverse-map DNS entries
1090 and the DNS data itself can be reliably authenticated
1091 as having originated from the legitimate administrators of that
1092 subnet/FQDN.
1093 The authentication situation is less satisfactory when DNS is less helpful,
1094 but it is difficult to see what else could be done about it.
1095 .NH 1
1096 References
1097 .P
1098 [TBW]
1099 .NH 1
1100 Appendix:  Separate Design Proposals TBW
1101 .IP \(bu \w'\(bu'u+2n
1102 How can we build a web of trust with DNSSEC?
1103 (See section 2.3.4.)
1104 .IP \(bu
1105 How can we extend DNS reverse lookups to permit reverse lookup
1106 on a subnet?
1107 (Both address and mask must appear in the name to be looked up.)
1108 (See section 2.6.)
1109 .IP \(bu
1110 How can rekeying be done as robustly as possible?
1111 (At least partly, this is just documenting current FreeS/WAN practice.)
1112 (See section 2.7.)
1113 .IP \(bu
1114 How should IKE Pings be implemented?
1115 (See section 3.3.)