doc/draft-ietf-dhc-failover-12.txt

   1
   2
   3
   4
   5
   6
   7 Network Working Group                                        Ralph Droms
   8 INTERNET DRAFT                                               Kim Kinnear
   9                                                               Mark Stapp
  10                                                            Cisco Systems
  11
  12                                                              Bernie Volz
  13                                                                 Ericsson
  14
  15                                                             Steve Gonczi
  16                                                                 Relicore
  17
  18                                                               Greg Rabil
  19                                                      Lucent Technologies
  20
  21                                                           Michael Dooley
  22                                                  Diamond IP Technologies
  23
  24                                                               Arun Kapur
  25                                                              K5 Networks
  26
  27                                                               March 2003
  28                                                   Expires September 2003
  29
  30
  31                          DHCP Failover Protocol
  32                     <draft-ietf-dhc-failover-12.txt>
  33
  34 Status of this Memo
  35
  36    This document is an Internet-Draft and is in full conformance with
  37    all provisions of Section 10 of RFC2026.
  38
  39    Internet-Drafts are working documents of the Internet Engineering
  40    Task Force (IETF), its areas, and its working groups.  Note that
  41    other groups may also distribute working documents as Internet-
  42    Drafts.
  43
  44    Internet-Drafts are draft documents valid for a maximum of six months
  45    and may be updated, replaced, or obsoleted by other documents at any
  46    time.  It is inappropriate to use Internet-Drafts as reference
  47    material or to cite them other than as "work in progress."
  48
  49    The list of current Internet-Drafts can be accessed at
  50    http://www.ietf.org/ietf/1id-abstracts.txt
  51
  52    The list of Internet-Draft Shadow Directories can be accessed at
  53    http://www.ietf.org/shadow.html.
  54
  55
  56
  57
  58 Droms, et. al.           Expires September 2003                 [Page 1]
  59 \f
  60 Internet Draft           DHCP Failover Protocol              March 2003
  61
  62
  63 Copyright Notice
  64
  65    Copyright (C) The Internet Society (2003). All Rights Reserved.
  66
  67 Abstract
  68
  69    DHCP [RFC 2131] allows for multiple servers to be operating on a
  70    single network.  Some sites are interested in running multiple
  71    servers in such a way as to provide redundancy in case of server
  72    failure.  In order for this to work reliably, the cooperating primary
  73    and secondary servers must maintain a consistent database of the
  74    lease information.  This implies that servers will need to coordinate
  75    any and all lease activity so that this information is synchronized
  76    in case of failover.
  77
  78    This document defines a protocol to provide such synchronization
  79    between two servers.  One server is designated the "primary" server,
  80    the other is the "secondary" server.  This document also describes a
  81    way to integrate the failover protocol with the DHCP load balancing
  82    approach.
  83
  84
  85 Table of Contents
  86
  87
  88     1.  Introduction................................................. 4
  89     2.  Terminology.................................................. 5
  90     2.1.  Requirements terminology................................... 5
  91     2.2.  DHCP and failover terminology.............................. 5
  92     3.  Background and External Requirements......................... 9
  93     3.1.  Key aspects of the DHCP protocol........................... 9
  94     3.2.  BOOTP relay agent implementation........................... 11
  95     3.3.  What does it mean if a server can't communicate with its partner? 12
  96     3.4.  Challenging scenarios for a Failover protocol.............. 13
  97     3.5.  Using TCP to detect partner server failure................. 14
  98     4.  Design Goals................................................. 15
  99     4.1.  Design goals for this protocol............................. 15
 100     4.2.  Limitations of this protocol............................... 17
 101     5.  Protocol Overview............................................ 17
 102     5.1.  Messages and States........................................ 18
 103     5.2.  Fundamental guarantees..................................... 20
 104     5.3.  Load balancing............................................. 27
 105     5.4.  IP address allocations between servers..................... 28
 106     5.5.  Operating in NORMAL state.................................. 30
 107     5.6.  Operating in COMMUNICATIONS-INTERRUPTED state.............. 31
 108     5.7.  Operating in PARTNER-DOWN state............................ 31
 120     5.8.  Operating in RECOVER state................................. 31
 121     5.9.  Operating in STARTUP state................................. 31
 122     5.10.  Time synchronization between servers...................... 32
 123     5.11.  IP address binding-status................................. 33
 124     5.12.  DNS dynamic update considerations......................... 36
 125     5.13.  Reservations and failover................................. 41
 126     5.14.  Dynamic BOOTP and failover................................ 42
 127     5.15.  Guidelines for selecting MCLT............................. 43
 128     5.16.  What is sent in response to an UPDREQ or UPDREQALL message? 43
 129     5.17.  How do you determine that your partner is "up to date" for 45
 130     6.  Common Message Format........................................ 45
 131     6.1.  Message header format...................................... 46
 132     6.2.  Common option format....................................... 48
 133     6.3.  Batching multiple binding update transactions in one BNDUPD mes- 49
 134     7.  Protocol Messages............................................ 51
 135     7.1.  BNDUPD message [3]......................................... 51
 136     7.2.  BNDACK message [4]......................................... 62
 137     7.3.  UPDREQ message [9]......................................... 65
 138     7.4.  UPDREQALL message [7]...................................... 66
 139     7.5.  UPDDONE message [8]........................................ 67
 140     7.6.  POOLREQ message [1]........................................ 68
 141     7.7.  POOLRESP message [2]....................................... 69
 142     7.8.  CONNECT message [5]........................................ 70
 143     7.9.  CONNECTACK message [6]..................................... 74
 144     7.10.  STATE message [10]........................................ 78
 145     7.11.  CONTACT message [11]...................................... 79
 146     7.12.  DISCONNECT message [12]................................... 80
 147     8.  Connection Management........................................ 81
 148     8.1.  Connection granularity..................................... 81
 149     8.2.  Creating the TCP connection................................ 81
 150     8.3.  Using the TCP connection for determining communications status 83
 151     8.4.  Using the TCP connection for binding data.................. 85
 152     8.5.  Using the TCP connection for control messages.............. 85
 153     8.6.  Losing the TCP connection.................................. 85
 154     9.  Failover Endpoint States..................................... 86
 155     9.1.  Server Initialization...................................... 86
 156     9.2.  Server State Transitions................................... 86
 157     9.3.  STARTUP state.............................................. 90
 158     9.4.  PARTNER-DOWN state......................................... 93
 159     9.5.  RECOVER state.............................................. 95
 160     9.6.  RECOVER-WAIT state......................................... 97
 161     9.7.  RECOVER-DONE state......................................... 98
 162     9.9.  COMMUNICATIONS-INTERRUPTED State........................... 101
 163     9.10.  POTENTIAL-CONFLICT state.................................. 105
 164     9.11.  RESOLUTION-INTERRUPTED state.............................. 107
 165     9.12.  CONFLICT-DONE state....................................... 108
 166     9.13.  PAUSED state.............................................. 108
 175     9.14.  SHUTDOWN state............................................ 109
 176     10.  Safe Period................................................. 110
 177     11.  Security.................................................... 111
 178     11.1.  Simple shared secret...................................... 112
 179     11.2.  TLS....................................................... 113
 180     12.  Failover Options............................................ 113
 181     12.1.  addresses-transferred..................................... 114
 182     12.2.  assigned-IP-address....................................... 114
 183     12.3.  binding-status............................................ 114
 184     12.4.  client-identifier......................................... 115
 185     12.5.  client-hardware-address................................... 115
 186     12.6.  client-last-transaction-time.............................. 115
 187     12.7.  client-reply-options...................................... 116
 188     12.8.  client-request-options.................................... 116
 189     12.9.  DDNS...................................................... 117
 190     12.10.  delayed-service-parameter................................ 118
 191     12.11.  hash-bucket-assignment................................... 118
 192     12.12.  IP-flags................................................. 119
 193     12.13.  lease-expiration-time.................................... 120
 194     12.14.  max-unacked-bndupd....................................... 120
 195     12.15.  MCLT..................................................... 120
 196     12.16.  message.................................................. 121
 197     12.17.  message-digest........................................... 121
 198     12.18.  potential-expiration-time................................ 122
 199     12.19.  receive-timer............................................ 122
 200     12.20.  protocol-version......................................... 122
 201     12.21.  reject-reason............................................ 123
 202     12.22.  relationship-name........................................ 124
 203     12.23.  server-flags............................................. 124
 204     12.24.  server-state............................................. 125
 205     12.25.  start-time-of-state...................................... 125
 206     12.26.  TLS-reply................................................ 126
 207     12.27.  TLS-request.............................................. 126
 208     12.28.  vendor-class-identifier.................................. 126
 209     12.29.  vendor-specific-options.................................. 127
 210     13.  IANA Considerations......................................... 127
 211     14.  Acknowledgments............................................. 127
 212     15.  References.................................................. 129
 213     16.  Author's information........................................ 131
 214     17.  Full Copyright Statement.................................... 132
 215
 216
 217 1.  Introduction
 218
 219    DHCP [RFC 2131] allows for multiple servers to be operating on a sin-
 220    gle network.  Some sites are interested in running multiple servers
 221    in such a way as to provide redundancy in case of server failure,
 222    since the DHCP subsystem is in many cases a critical part of the
 231    network infrastructure.
 232
 233    This document defines a protocol to provide synchronization between
 234    two servers in order that each can take over for the other should
 235    either one fail or become unreachable.
 236
 237    One server is designated the "primary" server and the other the
 238    "secondary" server, and most DHCP client requests are sent to both
 239    servers (see section 3.1.1 for details).
 240
 241    In order to provide a high-availability DHCP service, these
 242    cooperating primary and secondary servers must maintain a consistent
 243    database of lease information.  This implies that servers will need
 244    to coordinate all lease activity so that this information is syn-
 245    chronized in case failover is required.  The protocol messages and
 246    processing techniques required to maintain a consistent database are
 247    specified in the protocol described here.
 248
 249    The failover protocol also contains a way to integrate the DHCP load-
 250    balancing algorithm described in [RFC 3074] with the failover proto-
 251    col.
 252
 253 2.  Terminology
 254
 255    This section discusses both the generic requirements terminology com-
 256    mon to many IETF protocol specifications as well as specialized DHCP
 257    and failover protocol specific terminology.
 258
 259 2.1.  Requirements terminology
 260
 261    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
 262    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
 263    document are to be interpreted as described in RFC 2119 [RFC 2119].
 264
 265
 266 2.2.  DHCP and failover terminology
 267
 268    This document uses the following terms:
 269
 270       o  "available IP address"
 271
 272          An IP address is "available" if it may be allocated by a
 273          specific DHCP server.  An IP address is considered (for the
 274          purposes of this document) to be available to a single server
 275          for allocation unless otherwise noted.  An IP address available
 276          for allocation on a primary server has state FREE, and an IP
 277          address available for allocation on a secondary server has
 278          state BACKUP.
 279
 287       o  "binding"
 288
 289          A binding is a collection of configuration parameters, includ-
 290          ing at least an IP address, associated with or "bound to" a
 291          DHCP client.  Bindings are managed by DHCP servers.
 292
 293       o  "binding database"
 294
 295          The collection of bindings managed by a primary and secondary server.
 296
 297       o  "binding update transaction"
 298
 299          A binding update transaction refers to the set of information
 300          (contained in options) necessary to perform a binding update
 301          for a single IP address.  It comprises the assigned-IP-address
 302          option and the binding-status option, along with other options
 303          as appropriate.
 304
 305       o  "binding-status"
 306
 307          The binding-status is the status of an IP address with respect
 308          to its association with a client.  There are specific binding-
 309          status values defined for use by the failover protocol, e.g.,
 310          ACTIVE, FREE, RELEASED, ABANDONED, etc.  These are designed to
 311          map more or less directly onto the binding-status values used
 312          internally in most DHCP server implementations.  The term
 313          binding-status refers to the concept also sometimes known as
 314          "lease state" or "IP address state", but in this document the
 315          term "state" is reserved for the failover state of a failover
 316          endpoint, and binding-status is always used to refer to the
 317          state associated with an IP address or lease.
 318
 319       o "DHCP client" or "client"
 320
 321         A DHCP client is an Internet host using DHCP to obtain confi-
 322         guration parameters such as a network address.  The term
 323         "client" used within this document always means a DHCP client,
 324         and never one of the two failover servers.
 325
 326       o "DHCP server" or "server"
 327
 328         A DHCP server is an Internet host that returns configuration
 329         parameters to DHCP clients.
 330
 331       o "DDNS"
 332
 333         An abbreviation for "Dynamic DNS", which refers to the capabil-
 334         ity to update a DNS server's name (actually resource record)
 343         database using an on-the-wire protocol defined in [RFC 2136].
 344
 345       o "DNS"
 346
 347         An abbreviation for "Domain Name System", a scheme where a cen-
 348         tral name repository is used to map names to IP addresses and IP
 349         addresses to names.
 350
 351       o "failover endpoint"
 352
 353         The failover protocol allows for there to be a unique failover
 354         endpoint per partner per role per relationship (where role is
 355         primary or secondary and the relationship is defined by the
 356         relationship-name option).  This failover endpoint can take
 357         actions and hold unique states.  Typically, there is one
 358         failover endpoint per partner, although there may be more.
 359
 360       o "FQDN"
 361
 362         An FQDN is a "fully qualified domain name".  A fully qualified
 363         domain name generally is a host name with at least one zone
 364         name; for example, "www.dhcp.org" is a fully qualified domain
 365         name.
 366
 367       o "lazy update"
 368
 369         Lazy update refers to the requirement placed on a server imple-
 370         menting a failover protocol to update its failover partner when-
 371         ever the binding database changes.  A failover protocol which
 372         didn't support lazy update would require the failover partner
 373         update to be complete before a DHCP server could respond to a
 374         DHCP client request with a DHCPACK.  A failover protocol which
 375         does support lazy update places no such restriction on the
 376         update of the failover partner server, and so a server can allo-
 377         cate an IP address or extend a lease on an IP address and then
 378         update its failover partner as time permits.  A failover proto-
 379         col which supports lazy update not only removes the requirement
 380         to update the failover partner prior to responding to a DHCP
 381         client with a DHCPACK, but also allows gathering up batches of
 382         updates from one failover server to its partner.
 383
 384       o "MCLT"
 385
 386         MCLT stands for "maximum client lead time".  This time is
 387         configured on the primary server and transmitted from the primary
 388         to the secondary server in the CONNECT message.  It is the max-
 389         imum amount of time that one server can extend a lease for a
 390         client's binding beyond the time known by the partner server.
 399         See section 5.2.1 for details.
 400
 401       o "partner"
 402
 403         A "partner", for the purposes of this document, refers to a
 404         failover server, typically the other failover server.  In many
 405         (if not most) cases, the failover protocol is symmetric with
 406         respect to the primary or secondary nature of the servers, and
 407         so it is often appropriate to discuss "updating the partner
 408         server", since it could be a primary server updating a secondary
 409         server or a secondary server updating a primary server.
 410
 411       o "Primary server" or "Primary"
 412
 413         A DHCP server configured to provide primary service to a set of
 414         DHCP clients for a particular set of subnet address pools.
 415
 416       o "RR"
 417
 418         "RR" is an abbreviation for "resource record".  All records in
 419         the DNS are resource records.  The resource records of most
 420         relevance to this document are the "A" resource record, which
 421         maps a DNS name to a particular IP address, the "PTR" resource
 422         record, which allows a "reverse map", from the IP address back
 423         to a DNS name, and the "KEY" resource record, which is used in
 424         ways defined in [FQDN] to tag a DNS name with the identity of
 425         the DHCP client with which it is associated.
 426
 427       o "Secondary server" or "Secondary"
 428
 429         A DHCP server configured to act as backup to a primary server
 430         for a particular set of subnet address pools.
 431
 432       o "stable storage"
 433
 434         Every DHCP server is assumed to have some form of what is called
 435         "stable storage".  Stable storage is used to hold information
 436         concerning IP address bindings (among other things) so that this
 437         information is not lost in the event of a server failure which
 438         requires restart of the server.
 439
 440       o "state"
 441
 442         In this document, the term "state" refers exclusively to the
 443         state of a failover endpoint, for example: NORMAL,
 444         COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN.  It is not used to
 445         refer to any attributes of an IP address or a binding of an IP
 446         address.  See "binding-status".
 447
 455       o "subnet address pool"
 456
 457         A subnet address pool is the set of IP addresses which is asso-
 458         ciated with a particular network number and subnet mask.  In the
 459         simple case, there is a single network number and subnet mask
 460         and a set of IP addresses.  In the more complex case (sometimes
 461         called "secondary subnets", sometimes "superscopes"), several
 462         (apparently unrelated) network number and subnet mask combina-
 463         tions with their associated IP addresses may all be configured
 464         together into one subnet address pool.
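
   The MCLT definition above can be illustrated with a short sketch.
   It is not taken from this document: the Binding class, its field
   names, and both function names are hypothetical, and treating an
   unknown or already-passed partner time as "now" is an assumption
   made for the example.  Section 5.2.1 remains the authoritative
   description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    """One IP address as seen by one server (hypothetical model)."""
    ip: str
    # Latest expiration time the partner is known to have recorded, if any.
    partner_known_expiry: Optional[float] = None


def max_grantable_expiry(binding: Binding, now: float, mclt: float) -> float:
    """Latest expiry this server may hand out without exceeding the MCLT.

    Assumption: if the partner knows nothing about the binding, or the time
    it knows has already passed, the time known by the partner is `now`.
    """
    known = binding.partner_known_expiry
    base = now if known is None else max(now, known)
    return base + mclt


def grant_expiry(binding: Binding, now: float,
                 desired_lease: float, mclt: float) -> float:
    """Cap the lease the server would like to give at the MCLT limit."""
    return min(now + desired_lease, max_grantable_expiry(binding, now, mclt))
```

   In this sketch a client the partner has never heard of receives at
   most MCLT seconds, however long the configured lease is; once the
   partner has recorded a later time, longer grants become possible.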
 465
 466
 467 3.  Background and External Requirements
 468
 469    This section highlights key aspects of the DHCP protocol on which the
 470    failover protocol depends.  It also discusses the requirements that
 471    the failover protocol places on other aspects of the network infras-
 472    tructure, and some general issues surrounding server failure detec-
 473    tion.  Some failure scenarios that provide particular challenges to a
 474    failover protocol are discussed.  Finally, the challenges inherent in
 475    using a TCP connection as a means to detect failure of a partner
 476    server are elaborated.
 477
 478 3.1.  Key aspects of the DHCP protocol
 479
 480    The failover protocol is designed to augment the DHCP protocol as
 481    described in RFC 2131 [RFC 2131].  There are several key aspects of
 482    the DHCP protocol which are required by the failover protocol in
 483    order to successfully meet its design goals.
 484
 485 3.1.1.  Broadcast behavior
 486
 487    There are two aspects of the broadcast behavior of the DHCP protocol
 488    which are key to making the failover protocol operate successfully.
 489    The first is simply that the DHCP protocol requires a DHCP client to
 490    broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
 491    Because of this requirement, a DHCP client that was communicating with
 492    one server will automatically be able to communicate with another
 493    server if one is available.
 494
 495    The second aspect of broadcast behavior is similar to the first, but
 496    involves the distinction between a DHCPREQUEST/RENEW and
 497    DHCPREQUEST/REBINDING.  A DHCPREQUEST/RENEW is the message that a
 498    DHCP client uses to extend its lease.  It is unicast to the DHCP
 499    server from which it acquired the lease.  However, the DHCP protocol
 500    (in a farsighted move) was explicitly designed so that in the event
 501    that a DHCP client cannot contact the server from which it received a
 502    lease on an IP address using a DHCPREQUEST/RENEW, the client is
 511    required to broadcast its renewal using a DHCPREQUEST/REBINDING to
 512    any available DHCP server.  Because all DHCP clients are required to
 513    implement this algorithm, the failover protocol can rely on a server
 514    other than the one that initially granted a lease to renew that
 515    lease.  Thus, one server can take over for another with no
 516    interruption in the service as experienced by the DHCP client or its
 517    associated applications software.
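
   The distinctions among the DHCPREQUEST variants discussed above
   can be sketched as a small classifier.  The field rules follow
   RFC 2131 (server identifier present only when SELECTING, ciaddr
   zero for INIT-REBOOT, ciaddr filled in for RENEW and REBINDING).
   The DhcpRequest class, its field names, and the was_broadcast flag
   are hypothetical; how a real server learns whether a message was
   broadcast is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

ZERO = "0.0.0.0"


@dataclass
class DhcpRequest:
    """Fields of a DHCPREQUEST relevant to classification (hypothetical)."""
    ciaddr: str = ZERO
    server_identifier: Optional[str] = None
    requested_address: Optional[str] = None
    was_broadcast: bool = True  # assumption: known to the caller


def classify_request(req: DhcpRequest) -> str:
    """Return which kind of DHCPREQUEST this is."""
    if req.server_identifier is not None:
        # Client is answering a DHCPOFFER from one particular server.
        return "SELECTING"
    if req.ciaddr == ZERO:
        # Address is carried in the requested-address option, not ciaddr.
        return "INIT-REBOOT" if req.requested_address else "INVALID"
    # ciaddr is filled in: unicast means RENEW, broadcast means REBINDING.
    return "REBINDING" if req.was_broadcast else "RENEW"
```

   Only the RENEW case is tied to one server.  The other three reach
   every server that hears broadcasts, which is what allows a partner
   to answer in place of a failed server.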
 518
 519 3.1.2.  Client responsibility
 520
 521    In the DHCP protocol the DHCP clients are entrusted with a consider-
 522    able responsibility.  In particular, after they are granted a lease
 523    on an IP address, they are enjoined to only use that IP address while
 524    their lease is valid.  Every DHCP client is expected to stop using an
 525    IP address if the expiration time on the lease has passed and if it
 526    cannot get an extension on the lease for that IP address from some
 527    DHCP server.  Thus, the correct behavior of every DHCP client in this
 528    regard is required to ensure the integrity of the DHCP service.  On
 529    the other hand, incorrect behavior by a client in this area will tend
 530    to adversely affect at most one other DHCP client.
 531
 532    Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
 533    DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or
 534    broadcast for a REBINDING) MUST still have time to run on the lease
 535    for that IP address.  The DHCP server sends the DHCPACK back unicast
 536    to the IP address from which the RENEW or REBINDING originated.
 537
 538    Given the existing responsibility placed on the client to only use an
 539    IP address when the lease is valid, and to only send in a RENEW or
 540    REBINDING if the lease is valid, the failover protocol relies on DHCP
 541    clients to perform responsibly and will, in the absence of conflict-
 542    ing information, believe a DHCP client that is attempting to RENEW or
 543    REBIND a lease on an IP address is the legitimate owner of that IP
 544    address.
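
   The client obligations described above can be sketched from the
   client's side.  The thresholds used are the RFC 2131 defaults for
   the renewal (T1) and rebinding (T2) timers, 0.5 and 0.875 of the
   lease duration; a server may override them with options, which
   this sketch ignores.  The function names are hypothetical.

```python
def client_phase(elapsed: float, lease_duration: float) -> str:
    """Return the client's lease phase after `elapsed` seconds.

    Uses the RFC 2131 default timers: T1 = 0.5 and T2 = 0.875 of the lease.
    """
    t1 = 0.5 * lease_duration
    t2 = 0.875 * lease_duration
    if elapsed >= lease_duration:
        return "EXPIRED"      # client must stop using the address
    if elapsed >= t2:
        return "REBINDING"    # broadcast DHCPREQUEST to any server
    if elapsed >= t1:
        return "RENEWING"     # unicast DHCPREQUEST to the granting server
    return "BOUND"


def may_use_address(elapsed: float, lease_duration: float) -> bool:
    """A well-behaved client uses the address only while the lease is valid."""
    return client_phase(elapsed, lease_duration) != "EXPIRED"
```

   A client in the RENEWING or REBINDING phase still has time left on
   its lease, which is the property the failover protocol relies on
   when it trusts such a request.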
 545
 546    If clients do not follow these rules, it is possible for an address
 547    to be in use by more than one client. For a single server, this hap-
 548    pens because the server has leased the expired address to another
 549    client and the original client is also attempting to use the address.
 550    The server would NAK the renewal request. This is made slightly worse
 551    in the failover protocol if the two servers are unable to communicate
 552    with each other and one server leases an available address to a new
 553    client while the other server receives a renewal from a different
 554    client.  In this case, both servers lease the same address to dif-
 555    ferent clients for the MCLT time.
 556
 557    One troublesome issue is that of the DHCP client responsibility when
 558    sending in DHCPREQUEST/INIT-REBOOT requests.  While the original DHCP
 567    RFC was written to require a DHCP client to have time left to run on
 568    the lease for an IP address if the client is sending an INIT-REBOOT
 569    request, it was sufficiently unclear that some client vendors didn't
 570    realize this until recently.  Since the INIT-REBOOT request was sent
 571    with the IP address in the dhcp-requested-address option and not in
 572    the ciaddr (for perfectly good reasons), the similarity to the RENEW
 573    and REBINDING case was lost on many people.
 574
 575    At present, the failover protocol does not assume that a client send-
 576    ing in an INIT-REBOOT request necessarily has a valid lease on the IP
 577    address appearing in the dhcp-requested-address option in the INIT-
 578    REBOOT request.
 579
 580    The implications of this are as follows: Assume that there is a DHCP
 581    client that gets a lease from one server while that server is unable
 582    to communicate with its failover partner.  Then, assume that after
 583    that client reboots it is able only to communicate with the other
 584    failover server.  If the failover servers have not been able to com-
 585    municate with each other during this process, then the DHCP client
 586    will get a new IP address instead of being able to continue to use
 587    its existing IP address. This will affect no applications on the DHCP
 588    client, since it is rebooting.  However, it will use up an additional
 589    IP address in this marginal case.
 590
 591 3.1.3.  Stable storage update before DHCPACK
 592
 593    The DHCP protocol allocates resources, and in order to operate
 594    correctly it requires that a DHCP server update some form of stable
 595    storage prior to sending a DHCPACK to a DHCP client in order to grant
 596    that client a lease on an IP address.
 597
 598    One of the goals of the failover protocol is that it not add signifi-
 599    cant additional time to this already time-consuming requirement to
 600    update stable storage prior to a DHCPACK.  In particular, adding a
 601    requirement to communicate with another server prior to sending a
 602    DHCPACK would greatly simplify the failover protocol, but it would
 603    unacceptably limit the potential scalability of any DHCP server which
 604    employed the failover protocol.
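
   The ordering constraint described above can be sketched as
   follows: commit the binding to stable storage, queue the partner
   update without waiting for it, and only then answer the client.
   The LeaseStore class, the JSON-lines journal format, the queue,
   and handle_request are all hypothetical and stand in for whatever
   storage and update mechanism a real server uses.

```python
import json
import os
from collections import deque


class LeaseStore:
    """Append-only lease journal forced to disk on every commit."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "a", encoding="utf-8")

    def commit(self, binding: dict) -> None:
        self._file.write(json.dumps(binding) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # do not return until data is on disk

    def close(self) -> None:
        self._file.close()


def handle_request(store: LeaseStore, partner_queue: deque, binding: dict) -> dict:
    """Grant a lease without waiting on the failover partner."""
    store.commit(binding)           # 1. stable storage first
    partner_queue.append(binding)   # 2. partner is updated later, in batches
    return {"type": "DHCPACK", "yiaddr": binding["ip"]}  # 3. reply to client
```

   The client's wait is bounded by the local disk write alone.  The
   queued updates can be drained to the partner as time permits,
   which is the behavior the term "lazy update" describes.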
 605
 606 3.2.  BOOTP relay agent implementation
 607
 608    Many DHCP clients are not resident on the same network segment as a
 609    DHCP server.  In order to support this form of network architecture,
 610    most contemporary routers implement something known as a BOOTP Relay
 611    Agent.  This capability inside of a router listens for all broadcasts
 612    at the DHCP port, port 67, and will relay any broadcasts that it
 613    receives on to a DHCP server.  The IP address of the DHCP server must
 614    have been previously configured into the router.  As part of the
 623    relay process, the relay agent will place the address of the inter-
 624    face on which it received the broadcast into the giaddr field of the
 625    DHCP packet.
 626
 627    Since the failover protocol requires two DHCP servers to receive any
 628    broadcast DHCP messages, in order to work with DHCP clients which are
 629    not local to the DHCP server, the BOOTP relay agent on the router
 630    closest to the DHCP client must be configured to point at more than
 631    one DHCP server.
 632
 633    Most BOOTP relay agent implementations allow this duplication of
 634    packets.
 635
 636    If this is not possible, an administrator might be able to configure
 637    the relay agent with a subnet broadcast address, but in this case the
 638    primary and secondary DHCP servers in a failover pair must both
 639    reside on the same subnet.
 640
 641 3.3.  What does it mean if a server can't communicate with its partner?
 642
 643    In any protocol designed to allow one server to take over some
 644    responsibilities from a partner server in the event of "failure" of
 645    that partner server, there is an inherent difficulty in determining
 646    when that partner server has failed.
 647
 648    In fact, it is fundamentally impossible for one server to distinguish
 649    a network communications failure from the outright failure of the
 650    server to which it is trying to communicate.  In the case where each
 651    server is handing out resources (in this case IP addresses) to a
 652    client community, mistaking an inability to communicate with a
 653    partner server for failure of that partner server could easily cause
 654    both servers to be handing out the same IP addresses to different
 655    clients.
 656
 657    One way that this is sometimes handled is for there to be more than
 658    two servers.  In the case of an odd number of servers, the servers
 659    that can still communicate with a majority of other servers will con-
 660    sider themselves operational, and any server which can't communicate
 661    to a majority of other servers must immediately cease operations.
 662
 663    While this technique works in some domains, having the only server to
 664    which a DHCP client can communicate voluntarily shut itself down
 665    seems like something worth avoiding.
 666
 667    The failover protocol will operate correctly while the two servers
 668    cannot communicate, whether or not both are running.  At some
 669    point there may be resource contention, and if one of the servers is
 670    actually down, then the operator can inform the operational server
 679    and the operational server will be able to use all of the failed
 680    server's resources.
 681
 682    The protocol also allows detection of an orderly shutdown of a parti-
 683    cipating server.
 684
 685 3.4.  Challenging scenarios for a Failover protocol
 686
 687    There exist two failure scenarios which provide particular challenges
 688    to the correctness guarantees of a failover protocol.
 689
 690 3.4.1.  Primary Server crash before "lazy" update:
 691
 692    In the case where the primary server sends a DHCPACK to a client for
 693    a newly allocated IP address and then crashes prior to sending the
 694    corresponding update to the secondary server, the secondary server
 695    will have no record of the IP address allocation.  When the secondary
 696    server takes over, it may well try to allocate that IP address to a
 697    different client.  If the first client to receive the IP address is
 698    not on the network at that time (even though its lease has not yet
 699    expired), an ICMP echo (i.e., ping) will go unanswered and so will
 700    not prevent the secondary server from allocating that IP address to
 701    a different client.
 702
 703    The failover protocol deals with this situation by having the primary
 704    and secondary servers allocate addresses for new clients from dis-
 705    joint address pools.  See section 5.5 for details.
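
        The effect of disjoint pools can be sketched in Python as
        follows.  This is an illustration under stated assumptions, not
        the mechanism of section 5.5: split_pool, Allocator, and the
        secondary_count parameter are hypothetical names, and the
        addresses come from a documentation range.

```python
import ipaddress


def split_pool(addresses, secondary_count):
    """Split a list of free addresses into two disjoint sets.

    secondary_count is a hypothetical knob; the draft does not prescribe
    how the split is chosen.
    """
    cut = len(addresses) - secondary_count
    return set(addresses[:cut]), set(addresses[cut:])


class Allocator:
    """Allocates addresses for new clients from its own pool only."""

    def __init__(self, pool):
        self.free = sorted(pool)
        self.bindings = {}

    def allocate(self, client_id):
        if client_id in self.bindings:
            return self.bindings[client_id]
        if not self.free:
            return None  # own pool exhausted
        address = self.free.pop(0)
        self.bindings[client_id] = address
        return address


base = ipaddress.IPv4Address("192.0.2.10")
subnet_free = [base + i for i in range(10)]
primary_pool, secondary_pool = split_pool(subnet_free, secondary_count=3)
primary = Allocator(primary_pool)
secondary = Allocator(secondary_pool)

# Neither server consults the other, yet the results cannot collide.
addr_a = primary.allocate("client-a")
addr_b = secondary.allocate("client-b")
```

        Because the two sets share no address, a server that has no
        record of its partner's most recent allocation still cannot hand
        the same address to a different new client.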
 706
 707    A more likely (in that DHCPREQUEST/RENEWs are presumably more common
 708    than DHCPDISCOVERs) and more subtle version of this problem is where
 709    the primary server crashes after extending a client's lease time, and
 710    before updating the secondary with a new time using a lazy update.
 711    After the secondary takes over, if the client is not connected to the
 712    network the secondary will believe the client's lease has expired
 713    when, in fact, it has not.  In this case as well, the IP address
 714    might be reallocated to a different client while the first client is
 715    still using it.
 716
 717    This scenario is handled by the failover protocol through control of
 718    the lease time and the use of the maximum client lead time (MCLT).
 719    See section 5.2.1  for details.
 720
 721 3.4.2.  Network partition where DHCP servers can't communicate but each
 722 can talk to clients:
 723
 724    Several conditions are required for this situation to occur.  First,
 725    due to a network failure, the primary and secondary servers cannot
 726    communicate.  As well, some of the DHCP clients must be able to
 735    communicate with the primary server, and some of the clients must now
 736    only be able to communicate with the secondary server.  When this
 737    condition occurs, both primary and secondary servers could attempt to
 738    allocate IP addresses for new clients from the same pool of available
 739    addresses.  At some point, then, two clients will end up being allo-
 740    cated the same IP address.  This will cause problems when the network
 741    failure that created this situation is corrected.
 742
 743    The failover protocol deals with this situation by having the primary
 744    and secondary servers allocate addresses for new clients from dis-
 745    joint address pools.  See section 5.5 for details.
 746
 747 3.5.  Using TCP to detect partner server failure
 748
 749    There are several characteristics of TCP that are important to the
 750    functioning of the failover protocol, which uses one TCP connection
 751    for both bulk data transfer as well as to assess communications
 752    integrity with the other server.  Reliable and ordered message
 753    delivery are chief among these important characteristics.
 754
 755    It would be convenient to rely on TCP's built-in capabilities to
 756    determine whether communications integrity exists with the failover
 757    partner, but this strategy has some problems which require
 758    analysis.  There exist three fundamental cases for an open TCP con-
 759    nection that must be examined.
 760
 761       1.  When no data is being sent on a TCP connection, the TCP layer
 762           also does not exchange any signaling messages to assure that
 763           the peer is still up.
 764
 765       2.  When data is queued to be sent, and the receiver has not
 766           blocked the sending of additional data, then messages are
 767           flowing across the TCP connection containing the applications
 768           data.
 769
 770       3.  When data is queued to be sent, and the receiver has blocked
 771           the transmission of additional data, then persist messages are
 772           flowing from the sender to the receiver to ensure that the
 773           sender doesn't miss the receiver opening the window for
 774           further transmissions.
 775
 776    The first case can be turned into the second case by sending
 777    application-level keep-alive messages periodically when there is no
 778    other data queued to be sent.  Note that TCP keep-alive messages
 779    might be used as well, but they present additional problems.
 780
 781    Thus, we can ensure that the TCP connection has messages flowing
 782    periodically across the connection fairly easily.  The question
 791    remains as to what TCP will do if the other end of the connection
 792    fails to respond (either because of network partition or because the
 793    receiving server crashes). TCP will attempt to retransmit a message
 794    with an exponential backoff, and will eventually time out that
 795    retransmission.  However, the length of that timeout cannot, in gen-
 796    eral, be set on a per-connection basis, and is frequently as long as
 797    nine minutes, though in some cases it may be as short as two minutes.
 798    On some systems it can be set system-wide, while on other systems it
 799    cannot be changed at all.
 800
 801    A value for this timeout that would be appropriate for the failover
 802    protocol, say less than 1 minute, could have unpleasant side-effects
 803    on other applications running on the same server, assuming that it
 804    could be changed at all on the host operating system.
 805
 806    Nine minutes is a long time for the DHCP service to be unavailable to
 807    any new clients that were being served by the server which has
 808    crashed, when there is another server running that could respond to
 809    them as soon as it determines that its partner is not operational.
 810
 811    The conclusion drawn from this analysis is that TCP provides very
 812    useful support for the failover protocol in the areas of reliable and
 813    ordered message delivery, but cannot by itself be relied upon to
 814    detect partner server failure in a fashion acceptable to the needs of
 815    the failover protocol.  Additional failover protocol capabilities
 816    have been created to support timely detection of partner server
 817    failure.  See section 8.3 for details on this mechanism.
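
        An application-level liveness check of the general kind argued
        for above might look like the following Python sketch.  It does
        not reproduce the mechanism of section 8.3.  The class name, the
        method names, and the 60 second limit are assumptions made for
        illustration; the limit merely echoes the "less than 1 minute"
        figure discussed above.

```python
import time


class PartnerLiveness:
    """Tracks how long it has been since anything arrived from the partner."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence_seconds = max_silence_seconds
        self.clock = clock
        self.last_heard = clock()

    def message_received(self):
        # Any message from the partner counts, not only keep-alives.
        self.last_heard = self.clock()

    def communications_ok(self):
        return self.clock() - self.last_heard <= self.max_silence_seconds


# A fake clock keeps the example deterministic.
now = [0.0]
liveness = PartnerLiveness(max_silence_seconds=60, clock=lambda: now[0])

now[0] = 30.0
ok_at_30 = liveness.communications_ok()   # 30 s of silence
liveness.message_received()

now[0] = 100.0
ok_at_100 = liveness.communications_ok()  # 70 s since the last message
```

        The limit is held by the application and applies to this one
        connection, so it can be far shorter than the system-wide TCP
        retransmission timeout without affecting other applications on
        the host.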
 818
 819 4.  Design Goals
 820
 821    This section lists the design goals and the limitations of the fail-
 822    over protocol.
 823
 824 4.1.  Design goals for this protocol
 825
 826    The following is a list of goals that are met by this protocol.  They
 827    are listed in priority order.
 828
 829       1.  Implementations of this protocol must work with existing DHCP
 830           client implementations based on the DHCP protocol [RFC 2131].
 831
 832       2.  Implementations of the protocol must work with existing BOOTP
 833           relay agent implementations.
 834
 835       3.  The protocol must provide failover redundancy between servers
 836           that are not located on the same subnet.
 837
 838       4.  Provide for continued service to DHCP clients through an
 847           automated mechanism in the event of failure of the primary
 848           server.
 849
 850       5.  Avoid binding an IP address to a client while that binding is
 851           currently valid for another client.  In other words, do not
 852           allocate the same IP address to two clients.
 853
 854       6.  Minimize any need for manual administrative intervention.
 855
 856       7.  Introduce no additional delays in server response time as a
 857           result of the network communications required to implement the
 858           failover protocol, i.e., don't require communications with the
 859           partner between the receipt of a DHCPREQUEST and the
 860           corresponding DHCPACK.
 861
 862       8.  Share IP address ranges between primary and secondary servers;
 863           i.e., impose no requirement that the pool of available
 864           addresses be manually or permanently divided between servers.
 865
 866       9.  Continue to meet the goals and objectives of this protocol in
 867           the event of server failure or network partition.
 868
 869       10. Provide graceful reintegration of full protocol service after
 870           server failure or network partition.
 871
 872       11. Allow for one computer to act as a secondary server for multi-
 873           ple primary servers.  The protocol must allow failover primary
 874           and secondary configuration choices to be made at a granular-
 875           ity smaller than "all of the subnets served by a single
 876           server", though individual implementations may choose not to
 877           allow such flexibility.
 878
 879       12. Ensure that an existing client can keep its existing IP
 880           address binding if it can communicate with either the primary
 881           or secondary DHCP server implementing this protocol - not just
 882           the server that originally offered it the binding.
 883
 884       13. Ensure that a new client can get an IP address from some
 885           server.  Ensure that in the face of partition, where servers
 886           continue to run but cannot communicate with each other, the
 887           above goals and requirements may be met.  In addition, when
 888           the partition condition is removed, allow graceful automatic
 889           re-integration without requiring human intervention.
 890
 891       14. If either primary or secondary server loses all of the infor-
 892           mation that it has stored in stable storage, ensure that it be
 893           able to refresh its stable storage from the other server.
 894
 903       15. Support load balancing between the primary and secondary
 904           servers, and allow configuration of the percentage of the
 905           client population served by each with a moderately fine granu-
 906           larity.
 907
 908
 909 4.2.  Limitations of this protocol
 910
 911    The following are explicit limitations of this protocol.
 912
 913       1.  This protocol provides only one level of redundancy through a
 914           single secondary server for each primary server.
 915
 916       2.  A subset of the address pool is reserved for secondary server
 917           use.  In order to handle the failure case where both servers
 918           are able to communicate with DHCP clients, but unable to com-
 919           municate with each other, a subset of the IP address pool must
 920           be set aside as a private address pool for the secondary
 921           server.  The secondary can use these to service newly arrived
 922           DHCP clients during such a period.  The required size of this
 923           private pool is based only on the arrival rate of new DHCP
 924           clients and the length of expected downtime, and is not influ-
 925           enced in any way by the total number of DHCP clients supported
 926           by the server pair.
 927
 928           The failover protocol can be used in a mode where both the
 929           primary and secondary servers can share the load between them
 930           when both are operating.  In this load balancing mode, the
 931           addresses allocated by the primary server to the secondary
 932           server are not unused, but are used instead to service the
 933           portion of the client base to which the secondary server is
 934           required to respond.  See section 5.3 for more information on
 935           load balancing.
 936
 937       3.  The primary and secondary servers do not respond to client
 938           requests at all while recovering from a failure that could
 939           have resulted in duplicate IP assignments (that is, while
 940           synchronizing in POTENTIAL-CONFLICT state).
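
        The sizing rule in limitation 2 above can be made concrete with
        a small Python sketch.  The draft states only that the private
        pool size depends on the arrival rate of new clients and on the
        expected downtime; the simple product used here, and the
        function name, are assumptions made for illustration.

```python
import math


def private_pool_size(new_clients_per_hour, expected_outage_hours):
    """Estimate the addresses the secondary needs to serve new clients alone."""
    return math.ceil(new_clients_per_hour * expected_outage_hours)


busy_site = private_pool_size(20, 4)     # 20 new clients/hour, 4 hour outage
quiet_site = private_pool_size(2.5, 3)   # 2.5 new clients/hour, 3 hour outage
```

        Note that the total number of clients served by the pair does
        not appear anywhere in the calculation.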
 941
 942
 943 5.  Protocol Overview
 944
 945    This section discusses the failover protocol at a relatively high
 946    level.  In the event that a description in this section
 947    conflicts (or appears to conflict due to the overview nature of this
 948    section) with information in later sections of this draft, the infor-
 949    mation in the later sections should be considered authoritative.
 950
 959 5.1.  Messages and States
 960
 961    This protocol is centered around the message exchange used by one
 962    server to update the other server of binding database changes result-
 963    ing from DHCP client activity:
 964
 965       o Communication of binding database changes
 966
 967         The binding update (BNDUPD) message is used to send the binding
 968         database changes to the partner server, and the partner server
 969         responds with a binding acknowledgement (BNDACK) message when it
 970         has successfully committed those changes to its own stable
 971         storage.
 972
 973    All of the other messages involve ancillary issues:
 974
 975       o Management of available IP addresses
 976
 977         The pool request (POOLREQ) message is used by the secondary
 978         server to request an allocation of IP addresses from the primary
 979         server.  The pool response (POOLRESP) message is used by the
 980         primary server to inform the secondary server how many IP
 981         addresses were allocated to the secondary server as the result
 982         of the pool request.
 983
 984       o Synchronization of the binding databases between the servers
 985         after they've been out of communications
 986
 987         The update request (UPDREQ) message is used by one server to
 988         request that its partner send it all binding database informa-
 989         tion that it has not already seen.  The update request all
 990         (UPDREQALL) message is used by one server to request that all
 991         binding database information be sent in order to recover from a
 992         total loss of its binding database by the requesting server.
 993         The update done (UPDDONE) message is used by the responding
 994         server to indicate that all requested updates have been sent by
 995         the responding server and acked by the requesting server.
 996
 997       o Connection establishment
 998
 999         The connect (CONNECT) message is used by the primary server to
1000         establish a high level connection with the other server, and to
1001         transmit several important configuration data items between the
1002         servers.  The connect acknowledgement message (CONNECTACK) is
1003         used by the secondary server to respond to a CONNECT message
1004         from the primary server.  The disconnect (DISCONNECT) message is
1005         used by either server when closing a connection.
1006
1015       o Server synchronization
1016
1017         The state change (STATE) message is used by either server to
1018         inform the other server of a change of failover state.
1019
1020       o Connection integrity management
1021
1022         The contact (CONTACT) message is used by either server to ensure
1023         that the other server continues to see the connection as opera-
1024         tional.  It MUST be transmitted periodically over every esta-
1025         blished connection if other message traffic is not flowing, and
1026         it MAY be sent at any time.
1027
1028 5.1.1.  Failover endpoints
1029
1030    The proper operation of the failover protocol requires more than the
1031    transmission of messages between one server and the other.  Each end-
1032    point might seem to be a single DHCP server, but in fact there are
1033    many situations where additional flexibility in configuration is use-
1034    ful.
1035
1036    For instance, there might be several servers which are each primary
1037    for a distinct set of address pools, and one server which is secon-
1038    dary for all of those address pools.  The situation with the pri-
1039    maries is straightforward, but the secondary will need to maintain a
1040    separate failover state, partner state, and communications up/down
1041    status for each of the separate primary servers for which it is act-
1042    ing as a secondary.
1043
1044    The failover protocol SHOULD be configured with one failover rela-
1045    tionship between each pair of failover servers. In this case there is
1046    one failover endpoint for that relationship on each partner.  This
1047    failover relationship MUST have a unique name, which is communicated
1048    using the relationship-name option in the CONNECT and CONNECTACK mes-
1049    sages.
1050
1051    There is typically little need for additional relationships between
1052    any two servers, but there MAY be more than one failover relationship
1053    between two servers; however, each MUST have a unique relationship
1054    name (stored in the relationship-name option).
1055
1056    Any failover endpoint can take actions and hold unique states.
1057
1058    Thus, in the case where there are two primary servers A and B each
1059    backed up by a single common secondary server C, there is one fail-
1060    over endpoint on each of A and B, and two different failover end-
1061    points on C.  The two different failover endpoints on C each have
1062    unique states, unique relationship names, and independent TCP
1071    connections.
1072
1073    This document frequently describes the behavior of the protocol in
1074    terms of primary and secondary servers, not primary and secondary
1075    failover endpoints.  However, it is important to remember that every
1076    'server' described in this document is in reality a failover endpoint
1077    that resides in a particular process, and that many failover end-
1078    points may reside in the same server process.
1079
1080    It is not the case that there is a unique failover endpoint for each
1081    subnet address pool that participates in a failover relationship.  On
1082    one server, there is (typically) one failover endpoint per partner,
1083    regardless of how many subnet address pools are managed by that com-
1084    bination of partner and role.  Conversely, on a particular server,
1085    any given subnet address pool will be associated with exactly one
1086    failover endpoint.
1087
1088    When a connection is received from the partner, the unique failover
1089    endpoint to which the message is directed is determined solely by the
1090    IP address of the partner, the relationship-name, and the role of the
1091    receiving server. See section 8.2.
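
        The endpoint selection described above amounts to a lookup keyed
        on three values.  A minimal Python sketch follows; the class and
        field names, the relationship names, and the addresses are all
        hypothetical, and section 8.2 remains the authoritative
        description.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


@dataclass
class FailoverEndpoint:
    partner_ip: str
    relationship_name: str
    local_role: Role
    state: str = "unset"  # placeholder; real state names are defined elsewhere


class EndpointTable:
    """Maps (partner IP, relationship name, local role) to one endpoint."""

    def __init__(self):
        self._endpoints = {}

    def add(self, endpoint):
        key = (endpoint.partner_ip, endpoint.relationship_name, endpoint.local_role)
        if key in self._endpoints:
            raise ValueError("duplicate failover endpoint: %r" % (key,))
        self._endpoints[key] = endpoint

    def lookup(self, partner_ip, relationship_name, local_role):
        return self._endpoints.get((partner_ip, relationship_name, local_role))


# Server C is secondary for two primaries, A and B.
table_on_c = EndpointTable()
table_on_c.add(FailoverEndpoint("192.0.2.1", "a-and-c", Role.SECONDARY))
table_on_c.add(FailoverEndpoint("192.0.2.2", "b-and-c", Role.SECONDARY))

endpoint_for_a = table_on_c.lookup("192.0.2.1", "a-and-c", Role.SECONDARY)
unknown = table_on_c.lookup("192.0.2.3", "a-and-c", Role.SECONDARY)
```

        Each endpoint carries its own state, which matches the example
        of server C holding separate state for its relationships with A
        and with B.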
1092
1093 5.2.  Fundamental guarantees
1094
1095    There are several fundamental restrictions this protocol places on
1096    what one server can do in the absence of knowledge of the other
1097    server.  Operating within these restrictions allows certain
1098    guarantees to be made to the partner server, and these are key to
1099    the correct operation of the protocol.
1100
1101 5.2.1.  Control of lease time
1102
1103    The key problem with lazy update is that when a server fails after
1104    updating a client with a particular lease time and before updating
1105    its partner, the partner will believe that a lease has expired even
1106    though the client still retains a valid lease on that IP address.
1107
1108    In order to handle this problem, a period of time known as the "Max-
1109    imum Client Lead Time" (MCLT) is defined and must be known to both
1110    the primary and secondary servers.  Proper use of this time interval
1111    places an upper bound on the difference allowed between the lease
1112    time provided to a DHCP client by a server and the lease time known
1113    by that server's partner.  However, the MCLT is typically much less
1114    than the lease time that a server has been configured to offer a
1115    client, and so some strategy must exist to allow a server to offer
1116    the configured lease time to a client.  During a lazy update the
1117    updating server typically updates its partner with a potential
1118    expiration time which is longer than the lease time previously given
1127    to the client and which is longer than the lease time that the server
1128    has been configured to give a client.  This allows that server to
1129    give a longer lease time to the client the next time the client
1130    renews its lease, since the time that it will give to the client will
1131    not exceed the potential expiration time acknowledged by its partner
1132    by more than the MCLT.
1133
1134    The PARTNER-DOWN state exists so that a server can be sure that its
1135    partner is, indeed, down.  Correct operation while in that state
1136    requires (generally) that the server wait the MCLT after anything
1137    that happened prior to its transition into PARTNER-DOWN state (or,
1138    more accurately, when the other server went down if that is known).
1139    Thus, the server MUST wait the MCLT after the partner server went
1140    down before allocating any of the partner's addresses which were
1141    available for allocation.  In the event the partner was not in com-
1142    munication prior to going down, it might have allocated one or more
1143    of its FREE addresses to a DHCP client and been unable to inform the
1144    server entering PARTNER-DOWN prior to going down itself.  By waiting
1145    the MCLT after the time the partner went down, the server in
1146    PARTNER-DOWN state ensures that any clients which have a lease on one
1147    of the partner's FREE addresses will either time out or contact the
1148    server in PARTNER-DOWN by the time that period ends.
1149
1150    In addition, once a server has made a transition to PARTNER-DOWN
1151    state, it MUST NOT reallocate an IP address from one client to
1152    another client until the longer of the following two times:
1153
1154       o The MCLT after the time the partner server went down (see
1155         above).
1156
1157       o An additional MCLT interval after the lease by the original
1158         client expires.  (Actually, until the maximum client lead time
1159         after what it believes to be the lease expiration time of the
1160         client.)
1161
1162    Some optimizations exist for this restriction, in that it only
1163    applies to leases that were issued BEFORE entering PARTNER-DOWN. Once
1164    a server has entered PARTNER-DOWN and it leases out an address, it
1165    need not wait this time as long as it has not communicated with the
1166    partner since the lease was given out.
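
        For a lease issued before entering PARTNER-DOWN, the waiting
        rule above reduces to taking the later of two times.  The
        following Python sketch assumes that all times are expressed in
        seconds on a common clock; the function and parameter names are
        hypothetical.

```python
def earliest_reallocation_time(partner_down_time, believed_lease_expiry, mclt):
    """Return the earliest time an address may go to a different client.

    All arguments are in seconds on a common clock.
    """
    return max(partner_down_time + mclt, believed_lease_expiry + mclt)


MCLT = 3600

# Lease believed to expire after the partner went down.
lease_outlives_partner = earliest_reallocation_time(1000, 5000, MCLT)

# Lease believed to have expired before the partner went down.
lease_already_expired = earliest_reallocation_time(1000, 400, MCLT)
```

        In the first case the lease expiration governs; in the second
        the time at which the partner went down governs.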
1167
1168    The fundamental relationship on which much of the correctness of this
1169    protocol depends is that the lease expiration time known to a DHCP
1170    client MUST NOT be more than the maximum client lead time greater
1171    than the potential expiration time known to a server's partner.
1172
1173    The remainder of this section makes the above fundamental relation-
1174    ship more explicit.
1175
1183    This protocol requires a DHCP server to deal with several different
1184    lease intervals and places specific restrictions on their relation-
1185    ships. The purpose of these restrictions is to allow the other server
1186    in the pair to be able to make certain assumptions in the absence of
1187    an ability to communicate between servers.
1188
1189    The different lease times are:
1190
1191    o desired lease interval
1192
1193      The desired lease interval is the lease interval that a DHCP server
1194      would like to give to a DHCP client in the absence of any restric-
1195      tions imposed by the Failover protocol.  Its determination is out-
1196      side of the scope of this protocol. Typically this is the result of
1197      external configuration of a DHCP server.
1198
1199    o actual lease interval
1200
1201      The actual lease interval is the lease interval that a DHCP server
1202      gives out to a DHCP client in the dhcp-lease-time option of a
1203      DHCPACK packet.  It may be shorter than the desired client lease
1204      interval (as explained below).
1205
1206    o potential lease interval
1207
1208      The potential lease interval is the lease expiration interval the
1209      local server tells to its partner in the potential-expiration-time
1210      option of a BNDUPD message.
1211
1212    o acknowledged potential lease interval
1213
1214      The acknowledged potential lease interval is the potential lease
1215      interval the partner server has most recently acknowledged in the
1216      potential-expiration-time option of a BNDACK message.
1217
1218    The key restriction (and guarantee) that any server makes with
1219    respect to lease intervals is that the actual client lease interval
1220    never exceeds the acknowledged potential lease interval (if any) by
1221    more than a fixed amount.  This fixed amount is called the "Maximum
1222    Client Lead Time" (MCLT).
1223
1224    The MCLT MAY be configurable on the primary server, but for correct
1225    server operation it MUST be the same and known to both the primary
1226    and secondary servers.  The secondary server determines the MCLT from
1227    the MCLT option sent from the primary server to the secondary server
1228    in the CONNECT message.
1229
1230    A server MUST record in its stable storage both the actual lease
1239    interval and the most recently acknowledged potential lease interval
1240    for each IP address binding.  It is assumed that the desired client
1241    lease interval can be determined through techniques outside of the
1242    scope of this protocol.  See section 7.1.5 for more details concern-
1243    ing the times that the server MUST record in its stable storage and
1244    the way that they interact with the lease time that may be offered to
1245    a DHCP client.
1246
1247    Again, the fundamental relationship among these times which MUST be
1248    maintained is:
1249
1250        actual lease interval <=
1251        ( acknowledged potential lease interval + MCLT )
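
        The relationship can be turned into a small calculation, shown
        here as a Python sketch.  It treats the bound as inclusive,
        which is consistent with the "never exceeds ... by more than"
        wording above and with the initial lease of exactly MCLT in
        Figure 5.2.1-1.  The function names, the use of seconds, and the
        example values are assumptions made for illustration.

```python
def max_actual_lease_interval(now, acked_potential_expiration, mclt):
    """Upper bound, in seconds, on the lease interval that may be given now.

    acked_potential_expiration is an absolute time, or None if the partner
    has not acknowledged any potential expiration for this binding.
    """
    if acked_potential_expiration is None:
        acked_interval = 0
    else:
        acked_interval = max(0, acked_potential_expiration - now)
    return acked_interval + mclt


def actual_lease_interval(desired, now, acked_potential_expiration, mclt):
    """Give the desired interval when allowed, otherwise the bound."""
    bound = max_actual_lease_interval(now, acked_potential_expiration, mclt)
    return min(desired, bound)


MCLT = 3600                # one hour
DESIRED = 3 * 24 * 3600    # three days

# First lease: nothing has been acknowledged, so the client gets MCLT.
first_lease = actual_lease_interval(
    DESIRED, now=0, acked_potential_expiration=None, mclt=MCLT)

# Renewal after the partner acknowledged a potential expiration that
# lies a full desired interval beyond the time of the renewal.
renewal = actual_lease_interval(
    DESIRED, now=1800, acked_potential_expiration=1800 + DESIRED, mclt=MCLT)
```

        The first lease is limited to the MCLT, while the renewal can
        carry the full desired interval because the partner has already
        acknowledged a sufficiently late potential expiration.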
1252
1253
1254    Figure 5.2.1-1 illustrates an initial lease to a client using the
1255    rules discussed in the example which follows it.  Note that this is
1256    only one example -- as long as the fundamental relationship is
1257    preserved, the actual times used could be quite different.
1258
1296               DHCP                 Primary             Secondary
1297        time   Client               Server               Server
1298
1299                 | (time in intervals) |  (absolute time)   |
1300                 |                     |                    |
1301                 | >-DHCPDISCOVER->    |                    |
1302                 |     <---DHCPOFFER-< |                    |
1303                 |  lease-time=MCLT    |                    |
1304                 |                     |                    |
1305                 | >-DHCPREQUEST->     |                    |
1306                 |   (selecting)       |                    |
1307                 |                     |                    |
1308          t      |  <--------DHCPACK-< |                    |
1309                 |  lease-time=MCLT    |                    |
1310                 |                     |    >-BNDUPD-->     |
1311                 |                     |  lease-expiration=t+MCLT
1312                 |                     |  potential-expiration=t+(MCLT/2)+X
1313                 |                     |                    |
1314                 |                     |     <-BNDACK-<     |
1315                 |                     |  potential-expiration=t+(MCLT/2)+X
1316                ...                   ...                  ...
1317                 |                     |                    |
1318       t+MCLT/2  | >-DHCPREQUEST->     |                    |
1319                 |      (renew)        |                    |
1320                 |                     |                    |
1321          t1     |  <--------DHCPACK-< |                    |
1322                 |   lease-time=X      |                    |
1323                 |                     |    >-BNDUPD-->     |
1324                 |                     |  lease-expiration=t1+X
1325                 |                     |  potential-expiration=t1+(X/2)+X
1326                 |                     |                    |
1327                 |                     |     <-BNDACK-<     |
1328                 |                     |  potential-expiration=t1+(X/2)+X
1329                ...                   ...                  ...
1330
1331            Figure 5.2.1-1:  Lazy Update Message Traffic
1332                           X = Desired Lease Interval
1333                           Assumes renewal interval = lease interval / 2
1334
1335
1336    DISCUSSION:
1337
1338       This protocol mandates only that the above fundamental relation-
1339       ship concerning lease intervals is preserved.
1340
1341       In the interests of clarity, however, let's examine a specific
1342       example.  The MCLT in this case is 1 hour.  The desired lease
1351       interval is 3 days, and its renewal time is half the lease inter-
1352       val.
1353
1354       The rules for this example are:
1355
1356       o What to tell the client:
1357
1358         Take the remainder of the acknowledged potential lease interval.
1359         If this is a new lease, then this value will be zero.  If this
1360         remainder plus the MCLT is greater than the desired lease inter-
1361         val, give the client the desired lease interval else give the
1362         client the remainder plus the MCLT.
1363
1364       o What to tell the failover partner server:
1365
1366         Take the renewal interval (typically half of the actual client
1367         lease interval), add to it the desired lease interval, and add
1368         it to the current time to yield the value that goes into the
1369         potential-expiration-time option.
1370
1371         Also tell the failover partner the actual lease interval by
1372         adding it to the current time to yield the value that goes into
1373         the lease-expiration option.
1374
1375       In operation this might work as follows:
1376
1377       When a server makes an offer for a new lease on an IP address to a
1378       DHCP client, it determines the desired lease interval (in this
1379       case, 3 days).  It then examines the acknowledged potential lease
1380       interval (which in this case is zero) and determines the remainder
1381       of the time left to run, which is also zero.  To this it adds the
1382       MCLT.  Since the actual lease interval cannot be allowed to exceed
1383       the remainder of the current acknowledged potential lease interval
1384       plus the MCLT, the offer made to the client is for the remainder
1385       of the current acknowledged potential lease interval (i.e., zero)
1386       plus the MCLT.  Thus, the actual lease interval is 1 hour.
1387
1388       Once the server has performed the DHCPACK to the DHCP client, it
1389       will update the secondary server with the lease information.
1390       However, the desired potential lease interval will be composed of
1391       one half of the current actual lease interval added to the
1392       desired lease interval.  Thus, the secondary server is updated
1393       with a BNDUPD with a potential lease interval of 3 days + 1/2 hour
1394       specified in the potential-expiration-time option.
1395
1396       When the primary server receives a BNDACK to its update of the
1397       secondary server's (partner's) potential lease interval, it
1398       records that as the acknowledged potential lease interval.  A
1407       server MUST NOT send a BNDACK in response to a BNDUPD message
1408       until it is sure that the information in the BNDUPD message
1409       resides in its stable storage.  Thus, the primary server in this
1410       case can be sure that the secondary server has recorded the poten-
1411       tial lease interval in its stable storage when the primary server
1412       receives a BNDACK message from the secondary server.
1413
1414       When the DHCP client attempts to renew at T1 (approximately one
1415       half hour from the start of the lease), the primary server
1416       again determines the desired lease interval, which is still 3
1417       days.  It then compares this with the remaining acknowledged
1418       potential lease interval (3 days + 1/2 hour) and adjusts for the
1419       time passed since the secondary was last updated (1/2 hour).  Thus
1420       the time remaining of the acknowledged potential lease interval is
1421       3 days.  Adding the MCLT to this yields 3 days plus 1 hour, which
1422       is more than the desired lease interval of 3 days.  So the client
1423       is renewed for the desired lease interval -- 3 days.
1424
1425       When the primary DHCP server updates the secondary DHCP server
1426       after the DHCP client's renewal ACK is complete, it will calculate
1427       the desired potential lease interval as the T1 fraction of the
1428       actual client lease interval (1/2 of 3 days this time = 1.5 days).
1429       To this it will add the desired client lease interval of 3 days,
1430       yielding a total desired partner server lease interval of 4.5
1431       days.  In this way, the primary attempts to have the secondary
1432       always "lead" the client in its understanding of the client's
1433       lease interval so as to be able to always offer the client the
1434       desired client lease interval.
1435
1436       Once the initial actual client lease interval of the MCLT is past,
1437       the protocol operates effectively like the DHCP protocol does
1438       today in its behavior concerning lease intervals. However, the
1439       guarantee that the actual client lease interval will never exceed
1440       the remaining acknowledged partner server lease interval by more
1441       than the MCLT allows full recovery from a variety of failures.
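
   The example rules in the discussion above can be sketched as
   follows.  This is only an illustration under stated assumptions:
   the names Binding, client_lease_interval, partner_update and
   replay_example are hypothetical, times are in seconds, and the
   renewal interval is taken to be half of the actual lease interval.
   The protocol itself mandates only the fundamental relationship.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class Binding:
    """Times kept per IP address (illustrative names, absolute seconds)."""
    acked_potential_expiry: float = 0.0  # from the most recent BNDACK
    lease_expiry: float = 0.0            # what the client was told


def client_lease_interval(now, binding, desired, mclt):
    """What to tell the client: min(desired, remainder + MCLT)."""
    # Remainder of the acknowledged potential lease interval; zero for
    # a new lease or when the acknowledged time is already in the past.
    remainder = max(0.0, binding.acked_potential_expiry - now)
    return min(desired, remainder + mclt)


def partner_update(now, actual, desired, renew_fraction=0.5):
    """What to tell the partner: (lease-expiration, potential-expiration)."""
    lease_expiration = now + actual
    potential_expiration = now + renew_fraction * actual + desired
    return lease_expiration, potential_expiration


def replay_example():
    """Replay the discussion with MCLT = 1 hour and desired = 3 days."""
    mclt, desired = 1 * HOUR, 3 * DAY
    b = Binding()

    t = 0.0  # time of the initial DHCPACK
    first = client_lease_interval(t, b, desired, mclt)
    b.lease_expiry, potential = partner_update(t, first, desired)
    b.acked_potential_expiry = potential  # BNDACK received

    t1 = t + first / 2  # the client renews at T1
    second = client_lease_interval(t1, b, desired, mclt)
    _, potential2 = partner_update(t1, second, desired)

    # Intervals: first lease, first potential, renewed lease, second potential
    return first, potential - t, second, potential2 - t1
```

   With these inputs replay_example() yields 1 hour, 3 days + 1/2
   hour, 3 days, and 4.5 days, matching the figures in the discussion.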
1442
1443 5.2.2.  Controlled re-allocation of IP addresses
1444
1445    When in PARTNER-DOWN state there is a waiting period after which an
1446    IP address can be re-allocated to another client.  For IP addresses
1447    which are available when the server enters PARTNER-DOWN state, the
1448    period is the MCLT from entry into PARTNER-DOWN state.  For IP
1449    addresses which are not available when the server enters PARTNER-DOWN
1450    state, the period is the MCLT after the IP address becomes available.
1451    See section 9.4.2 for more details.
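
   As a minimal sketch of the waiting period just described, assuming
   times in seconds and a hypothetical function name; section 9.4.2 may
   impose further conditions that are not modeled here:

```python
def reallocation_time(partner_down_entry, mclt, became_available=None):
    """Earliest time an address may be re-allocated to another client.

    became_available is None for an address that was already available
    when the server entered PARTNER-DOWN state.
    """
    if became_available is None or became_available <= partner_down_entry:
        # Available on entry: wait the MCLT from entry into PARTNER-DOWN.
        return partner_down_entry + mclt
    # Not available on entry: wait the MCLT after it becomes available.
    return became_available + mclt
```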
1452
1453    In any other state, a server cannot reallocate an address from one
1454    client to another without first notifying its partner (through a
1463    BNDUPD message) and receiving acknowledgement (through a BNDACK mes-
1464    sage) that its partner is aware that that first client is not using
1465    the address.
1466
1467    This could be modeled in the following way.  Though this specific
1468    implementation is in no way required, it may serve to better illus-
1469    trate the concept.
1470
1471    An "available" IP address on a server may be allocated to any client.
1472    An IP address which was leased to a client and which expired or was
1473    released by that client would take on a new state, EXPIRED or
1474    RELEASED respectively.  The partner server would then be notified
1475    that this IP address was EXPIRED or RELEASED through a BNDUPD.  When
1476    the sending server received the BNDACK for that IP address showing it
1477    was FREE, it would move the IP address from EXPIRED or RELEASED to
1478    FREE, and it would be available for allocation by the primary server
1479    to any clients.
1480
1481    A server MAY reallocate an IP address in the EXPIRED or RELEASED
1482    state to the same client with no restrictions provided it has not
1483    sent a BNDUPD message to its partner.  This situation would exist if
1484    the lease expired or was released after the transition into PARTNER-
1485    DOWN state, for instance.
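
   The model above might be sketched as a small state holder.  The
   class and method names are hypothetical, and, like the model itself,
   this is in no way a required implementation:

```python
class Address:
    """One IP address in the illustrative EXPIRED/RELEASED/FREE model."""

    def __init__(self):
        self.state = "FREE"
        self.client = None
        self.bndupd_sent = False

    def lease(self, client):
        """Try to lease the address; return True on success."""
        same_client_ok = (
            self.state in ("EXPIRED", "RELEASED")
            and client == self.client
            and not self.bndupd_sent  # partner not yet told of the change
        )
        if self.state == "FREE" or same_client_ok:
            self.state, self.client = "ACTIVE", client
            return True
        return False

    def end_lease(self, released=False):
        """The lease expired, or the client released it."""
        self.state = "RELEASED" if released else "EXPIRED"
        self.bndupd_sent = False

    def send_bndupd(self):
        """Record that the partner has been sent the new state."""
        self.bndupd_sent = True

    def on_bndack(self):
        """Partner acknowledged: the address may go to any client."""
        if self.state in ("EXPIRED", "RELEASED") and self.bndupd_sent:
            self.state, self.client = "FREE", None
```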
1486
1487
1488 5.3.  Load balancing
1489
1490    In order to implement load balancing between a primary and secondary
1491    server pair, each server must respond to DHCPDISCOVER requests from
1492    some clients and not from other clients.  In order to do this suc-
1493    cessfully, each server must be able to determine immediately upon
1494    receipt of a DHCP client request whether it is to service this
1495    request or to ignore it in order to allow the other server to service
1496    the request.
1497
1498    In addition, it should be possible to configure the percentage of
1499    clients which will be serviced by either the primary or secondary
1500    server.  This configuration should be more or less continuous, from
1501    all clients serviced by the primary through an even split with half
1502    serviced by each, to all clients serviced by the secondary.
1503
1504    The technique chosen to support these goals is described in [RFC
1505    3074].
1506
1507    A bitmap-style Hash Bucket Assignment (as described in [RFC 3074]) is
1508    used to determine which DHCP clients can be processed.  There are two
1509    potential HBA's in a failover server -- a server HBA and a failover
1510    HBA.   The way that a server acquires a server HBA is outside of the
1519    scope of the failover protocol, but both servers in a failover pair
1520    MUST have the same server HBA. The failover HBA (which specifies the
1521    clients that the secondary is supposed to process) is sent by the
1522    primary server to the secondary server whenever a connection is esta-
1523    blished, using the hash-bucket-assignment option defined in section
1524    12.11.
1525
1526    When the server HBA (if any) and the failover HBA (if any) are used
1527    to decide whether to process a DHCP request, the server HBA applies
1528    in every failover state, and the failover HBA (which MUST be a subset
1529    of the server HBA) is used by the secondary server to decide which
1530    packets to process when in NORMAL state.
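
   The decision might be sketched as below.  Several points are
   assumptions made only for illustration: the hash is a stand-in (the
   real hash function is defined in [RFC 3074]), the HBA is taken to be
   a 32-byte bitmap over 256 buckets with the bit order shown, and the
   primary is assumed to process the buckets of the server HBA that are
   not set in the failover HBA when in NORMAL state.  All function
   names are hypothetical.

```python
import hashlib


def bucket(client_id: bytes) -> int:
    """Stand-in hash to 0..255; NOT the hash specified by RFC 3074."""
    return hashlib.sha256(client_id).digest()[0]


def make_hba(percent: float) -> bytes:
    """Bitmap with roughly `percent` of the 256 buckets set."""
    count = round(256 * percent / 100)
    bits = bytearray(32)
    for b in range(count):
        bits[b // 8] |= 1 << (b % 8)  # bit order is an assumption
    return bytes(bits)


def in_hba(hba: bytes, b: int) -> bool:
    return bool(hba[b // 8] & (1 << (b % 8)))


def should_process(role, state, server_hba, failover_hba, client_id):
    """role is 'primary' or 'secondary'; an HBA of None means 'not configured'."""
    b = bucket(client_id)
    # The server HBA (if any) applies in every failover state.
    if server_hba is not None and not in_hba(server_hba, b):
        return False
    # The failover HBA splits the clients only in NORMAL state.
    if state == "NORMAL" and failover_hba is not None:
        secondary_bucket = in_hba(failover_hba, b)
        return secondary_bucket if role == "secondary" else not secondary_bucket
    return True
```

   For instance, make_hba(50) assigns half of the buckets to the
   secondary, while make_hba(0) and make_hba(100) give the two extremes
   of the configurable split.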
1531
1532 5.4.  IP address allocations between servers
1533
1534    The failover protocol allows a DHCP server which implements it to
1535    operate correctly in spite of the uncertainty over whether its
1536    partner has failed or whether the communications link to its partner
1537    has failed.  This is made possible in part by the existence of
1538    separate address pools on each server for allocation to newly arrived
1539    DHCP clients.
1540
1541    Thus, each server has its own pool of available IP addresses.  Note
1542    that an IP address is not "owned" by a particular server throughout
1543    its entire lifetime.  Only an IP address which is available is
1544    "owned" by a particular server -- once it has been leased to a DHCP
1545    client, it is not owned by either failover partner.  When it finally
1546    becomes available again, it will be owned initially by the primary
1547    server, and it may or may not be allocated to the secondary server by
1548    the primary server.
1549
1550    So, the flow of IP address ownership is as follows: initially an IP
1551    address is owned by the primary server.  It may be allocated to the
1552    secondary server if it is available, and then it is owned by the
1553    secondary server.  Either server can allocate available IP addresses
1554    which it owns to DHCP clients, in which case it ceases to own them.
1555    When the DHCP client releases the address or the lease on it expires,
1556    it will again become available and will be owned by the primary.
1557
1558    An IP address will not become owned by the server which allocated it
1559    initially when it is released or the lease expires because, in gen-
1560    eral, that server will have had to replenish its pool of available
1561    addresses well in advance of any likely lease expirations.  Thus,
1562    having a particular IP address cycle back to the secondary might well
1563    put the secondary more out of balance with respect to the primary
1564    instead of enhancing the balance of available addresses between them.
1565
1566    These address pools are used when in COMMUNICATIONS-INTERRUPTED state
1575    and while waiting for the MCLT expiration in PARTNER-DOWN state.  In
1576    addition, when using load balancing, these pools are used when in
1577    NORMAL state as well.
1578
1579    The allocation and maintenance of these address pools is an area of
1580    some sensitivity, since the goal is to maintain a more or less
1581    constant ratio of available addresses between the two servers.
1582
1583    The initial allocation when the servers first integrate is triggered
1584    by the POOLREQ message from the secondary to the primary.  This is
1585    followed by the POOLRESP message where the primary tells the secon-
1586    dary how many IP addresses it allocated to the secondary.  Then, the
1587    primary sends the allocated IP addresses to the secondary via BNDUPD
1588    messages.  The POOLREQ/POOLRESP message is a trigger to the primary
1589    to perform a scan of its database and to ensure that the secondary
1590    has enough IP addresses (based on some configured ratio).
1591
1592    The actual IP addresses are sent to the secondary using the BNDUPD
1593    message with a state of BACKUP, which indicates the IP address is now
1594    available for allocation by the secondary.  Once the message is sent,
1595    the primary MUST NOT use these addresses for allocation to DHCP
1596    clients.
1597
1598    The POOLREQ/POOLRESP message exchange initiated by the secondary is
1599    valid at any time, and the primary server SHOULD, whenever it
1600    receives the POOLREQ message, scan its database of address pools and
1601    determine if the secondary needs more IP addresses from any of the IP
1602    address pools.
1603
1604    However, in order to support a reasonably dynamic balance of the IP
1605    addresses between the failover partners, the primary server needs to
1606    do additional work to ensure that the secondary server has as many IP
1607    addresses as it needs (but that it doesn't have *more* than it needs
1608    either).
1609
1610    The primary server SHOULD examine the balance of available addresses
1611    between the primary and secondary for a particular address pool when-
1612    ever the number of available addresses for either the primary or
1613    secondary changes.  The primary server SHOULD adjust the available
1614    address balance as required to maintain the configured address
1615    balance, except that the primary server SHOULD apply some threshold
1616    mechanism to such balance adjustments in order to minimize the
1617    overhead of maintaining this balance.
1618
1619    An example of a threshold approach is: do not attempt to re-balance
1620    the available pools on the primary and secondary until the out of
1621    balance value exceeds a configured value.
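
   One way such a threshold might be applied is sketched below.  The
   function name, the representation of the configured ratio as the
   secondary's share, and the use of a simple count as the threshold
   are all assumptions for illustration:

```python
def addresses_to_move(primary_free, secondary_free, secondary_share, threshold):
    """How many available addresses the primary should move.

    Positive: send that many to the secondary (BNDUPD, state BACKUP).
    Negative: try to take that many back (BNDUPD, state FREE).
    Zero: the imbalance is within the threshold, so do nothing.
    """
    total = primary_free + secondary_free
    target = round(total * secondary_share)
    delta = target - secondary_free
    return delta if abs(delta) > threshold else 0
```

   With a share of 0.5 and a threshold of 5, pools of 90 and 10
   available addresses call for moving 40 to the secondary, while
   pools of 52 and 48 are left alone.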
1622
1631    The primary server can, at any time, send an available IP address to
1632    the secondary using a BNDUPD with the state BACKUP.  The primary
1633    server can attempt to take an available IP address away from the
1634    secondary by sending a BNDUPD with the state FREE.  If the secondary
1635    accepts the BNDUPD, the IP address becomes available to the primary
1636    and unavailable to the secondary.  The secondary MUST reject that
1637    BNDUPD if it has already used that IP address for a DHCP client.
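
   The secondary's side of this exchange could look roughly like the
   following, where the dictionary of bindings and the function name
   are hypothetical and only the ACTIVE case is treated as "already
   used":

```python
def handle_free_bndupd(bindings, address):
    """Secondary receives a BNDUPD with state FREE for `address`.

    Returns True to accept (BNDACK) and False to reject the update.
    """
    if bindings.get(address) == "ACTIVE":
        return False  # already leased to a DHCP client: MUST reject
    bindings[address] = "FREE"  # the primary may now allocate it
    return True
```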
1638
1639    Whenever the primary server examines the possible available IP
1640    addresses which it could send to the secondary server, the primary
1641    server SHOULD take into account whether load balancing is in use, and
1642    it SHOULD attempt to send to the secondary any IP addresses whose
1643    most recent client would be processed by the secondary under the
1644    current load balancing regime in use.  Likewise, when removing avail-
1645    able IP addresses from the secondary server when load balancing is in
1646    use, the primary server SHOULD first remove those IP addresses whose
1647    most recent client would be processed by the primary server under the
1648    current load balancing regime in use.
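
   The preference described above for sending addresses to the
   secondary might be sketched as follows.  The function name and the
   secondary_serves callback (which stands for the current load
   balancing decision for a client) are hypothetical:

```python
def pick_for_secondary(free_addresses, count, secondary_serves):
    """Choose `count` available addresses to send to the secondary.

    free_addresses is a list of (ip, last_client_id or None).  Addresses
    whose most recent client would be processed by the secondary come
    first; the remaining addresses fill any shortfall.
    """
    preferred = [ip for ip, cid in free_addresses
                 if cid is not None and secondary_serves(cid)]
    others = [ip for ip, cid in free_addresses
              if cid is None or not secondary_serves(cid)]
    return (preferred + others)[:count]
```

   Removal from the secondary would apply the same ordering with the
   test inverted, taking first the addresses whose most recent client
   the primary would process.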
1649
1650 5.5.  Operating in NORMAL state
1651
1652    When in NORMAL state, each server services DHCPDISCOVER's and all
1653    other DHCP requests other than DHCPREQUEST/RENEWAL or
1654    DHCPREQUEST/REBINDING from the client set defined by the load
1655    balancing algorithm [RFC 3074].  Each server services
1656    DHCPREQUEST/RENEWAL or DHCPREQUEST/REBINDING from any client.
1657
1658    In general, whenever the binding database is changed in stable
1659    storage (other than a change resulting from receiving a BNDUPD from
1660    the failover partner), then a BNDUPD message is sent with the con-
1661    tents of that change to the partner server.  The partner server then
1662    writes the information about that binding in its bindings database in
1663    stable storage and replies with a BNDACK message.
1664
1665    The binding database in a DHCP server would normally be changed as a
1666    result of DHCP protocol activity with a DHCP client  (e.g., granting
1667    a lease to a DHCP client through the familiar
1668    DISCOVER/OFFER/REQUEST/ACK cycle or extending a lease due to a
1669    renewal from a DHCP client) or possibly (on some servers) because a
1670    lease has expired or undergone another state change that must be
1671    recorded in the DHCP binding database.  These are the state changes
1672    that would be communicated to the partner server using a BNDUPD mes-
1673    sage.  Of course, receipt of a BNDUPD message itself will normally
1674    cause an update of the binding database for all of the IP addresses
1675    contained in the BNDUPD, and a binding database change such as this
1676    MUST NOT trigger a corresponding BNDUPD message to the partner.
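
   A minimal sketch of this update rule follows, with a dictionary
   standing in for stable storage and a list standing in for the
   connection to the partner.  The class and method names are
   hypothetical:

```python
class BindingDB:
    def __init__(self):
        self.stable = {}  # stands in for the binding database on disk
        self.outbox = []  # BNDUPD messages queued for the partner

    def local_change(self, ip, binding):
        """Change caused by DHCP client activity or a local state change."""
        self.stable[ip] = binding
        self.outbox.append(("BNDUPD", ip, binding))

    def on_bndupd(self, ip, binding):
        """Change received from the partner.

        The binding is written first and only then acknowledged.  No
        BNDUPD is queued in return, so updates do not echo back.
        """
        self.stable[ip] = binding
        return ("BNDACK", ip)
```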
1677
1687 5.6.  Operating in COMMUNICATIONS-INTERRUPTED state
1688
1689    When operating in COMMUNICATIONS-INTERRUPTED state, each server is
1690    operating independently, but does not assume that its partner is not
1691    operating.  The partner server might be operating and simply unable
1692    to communicate with this server, or might not be operating.
1693
1694    Each server responds to the full range of DHCP client messages that
1695    it receives (subject to server load balancing [RFC 3074]), but in
1696    such a way that graceful reintegration is always possible when its
1697    partner comes back into contact with it.
1698
1699 5.7.  Operating in PARTNER-DOWN state
1700
1701    When operating in PARTNER-DOWN state, a server assumes that its
1702    partner is not currently operating, but does make allowances for the
1703    possibility that that server was operating in the past, though possi-
1704    bly out of communications with this server.  It responds to all DHCP
1705    client requests in PARTNER-DOWN state (subject to server load balanc-
1706    ing [RFC 3074]).
1707
1708 5.8.  Operating in RECOVER state
1709
1710    A server operating in RECOVER state assumes that it is reintegrating
1711    with a server that has been operating in PARTNER-DOWN state, and that
1712    it needs to update its bindings database before it services DHCP
1713    client requests.
1714
1715    A server may also operate in RECOVER state in order to fully recover
1716    its bindings database from its partner server.
1717
1718 5.9.  Operating in STARTUP state
1719
1720    A server operating in STARTUP state assumes that failover is opera-
1721    tional, and it spends a short time whenever it comes up attempting to
1722    contact the partner.  During this short time, the server is unrespon-
1723    sive to DHCP client requests.  This period exists in order to give a
1724    server a chance to determine that its partner has changed state since
1725    it was last in communications, and to react to that changed state (if
1726    any) prior to responding to DHCP client requests.
1727
1728    The startup period SHOULD be conditioned on the length of time the
1729    server has been down (if that can be determined).  If the server has
1730    been down less than the MCLT then it can wait only a few (say 5 or
1731    10) seconds.  If it has been down a longer time (such that the
1732    partner may well have moved to PARTNER-DOWN state), a considerably
1733    longer startup period of 30 to 60 seconds may be warranted, since the
1734    consequences of running while the partner is in PARTNER-DOWN state
1743    are unpleasant.
1744
1745    The period of time a server remains in STARTUP state SHOULD be long
1746    enough to ensure that it will connect to the other server if that
1747    server is available for connections.
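
   One possible way to condition the startup period on downtime is
   sketched here.  The default values are picked from the ranges
   suggested above, and treating an unknown downtime like a long one is
   an assumption of this sketch, not a requirement:

```python
def startup_wait_seconds(downtime, mclt, short=10, long=60):
    """Seconds to remain in STARTUP before serving DHCP clients.

    downtime is how long the server was down, in seconds, or None if
    that cannot be determined.
    """
    if downtime is None:
        return long  # assumption: be conservative when downtime is unknown
    return short if downtime < mclt else long
```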
1748
1749 5.10.  Time synchronization between servers
1750
1751    The failover protocol is designed to operate between two servers
1752    which have time values which differ by an arbitrarily large amount.
1753    A particular implementation MAY choose to only support servers whose
1754    time values differ by an arbitrarily small amount.
1755
1756    Note that if an implementation that requires time synchronization
1757    between servers encounters a case where the time is not synchronized
1758    to its satisfaction between two servers, then this failure will prob-
1759    ably prevent the two servers from reaching communications OK status.
1760    If this occurs, and if both servers continue to operate and deal with
1761    clients, potentially troublesome things can happen.  For instance, if
1762    there is a safe period configured on either server, then it will
1763    eventually go into PARTNER-DOWN state, but in this case the partner
1764    will not be down.  This will almost certainly create problems.  Thus,
1765    some method to prevent this sort of situation SHOULD exist in imple-
1766    mentations that can be configured to require time synchronization.
1767
1768    In any event, whether large or only small differences in time values
1769    are supported, every message that is sent MUST include the time and
1770    every packet that is received MUST be tagged with a time value as
1771    soon as possible after receipt.  This time value is used along with
1772    the time value that is sent in every message between the failover
1773    partners to develop a delta time between the servers.  This delta
1774    time is used during the connection process to establish a baseline
1775    delta time between the servers, and upon receipt of each message, the
1776    delta time for that message is used to refine the delta time for the
1777    server pair.
1778
1779    While the algorithm for this refinement of delta time is not speci-
1780    fied as part of this protocol, a server SHOULD allow the delta time
1781    value for a pair of failover servers to be periodically updated to
1782    account for time drift.  In addition, the delta time value between
1783    servers SHOULD be smoothed in some fashion, so that transient network
1784    delays will not cause it to vary wildly.
1785
1786    A server SHOULD treat a drastic change in the delta time value as an
1787    event to be signaled to a network administrator, and SHOULD reset the
1788    delta time between the failover partners when it occurs.
1789
1790    The specific definitions of a minor or drastic change in delta time
1799    as well as the algorithm used to smooth minor changes into the run-
1800    ning delta time are implementation issues and are not further
1801    addressed in this document.
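
   Since the refinement algorithm is left to the implementation, the
   following is only one possibility: an exponentially weighted moving
   average with a fixed threshold for "drastic" changes.  The class
   name, the alpha and drastic parameters, and the choice of averaging
   method are all assumptions of this sketch.

```python
class DeltaTime:
    """Running estimate of (local receive time - partner send time)."""

    def __init__(self, alpha=0.125, drastic=30.0):
        self.alpha = alpha      # weight given to each new sample
        self.drastic = drastic  # seconds; larger jumps reset the estimate
        self.delta = None

    def observe(self, sent_time, received_time):
        """Fold in one message; return True if the change was drastic."""
        sample = received_time - sent_time
        if self.delta is None:
            self.delta = sample  # baseline from the connection process
            return False
        if abs(sample - self.delta) > self.drastic:
            self.delta = sample  # reset; the caller should alert an admin
            return True
        # Minor change: smooth it into the running delta time.
        self.delta += self.alpha * (sample - self.delta)
        return False
```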
1802
1803 5.11.  IP address binding-status
1804
1805    In most DHCP servers an IP address can take on several different
1806    binding-status values, sometimes also called states.  While probably
1807    no two DHCP servers have exactly the same set of binding-status
1808    values, the DHCP RFC enforces some commonality among the general
1809    semantics of the binding-status values used by various DHCP server
1810    implementations.
1811
1812    In order to transmit binding database updates between one server and
1813    another using the failover protocol, some common denominator
1814    binding-status values must be defined.  It is not expected that these
1815    binding-status values correspond with any actual implementation of
1816    the DHCP protocol in a DHCP server, but rather that the binding-
1817    status values defined in this document should be a common denominator
1818    of those in use by many DHCP server implementations.  It is a goal of
1819    this protocol that any DHCP server can map the various IP address
1820    binding-status values that it uses internally into these failover IP
1821    address binding-status values on transmission of binding database
1822    updates to its partner, and likewise that it can map any failover IP
1823    address binding-status values it received in a binding update into
1824    its internal IP address binding-status values.
1825
1826    The IP address binding-status values defined for the failover proto-
1827    col are listed below.  Unless otherwise noted below, there MAY be
1828    client information associated with each of these binding-status
1829    values.
1830
1831       o ACTIVE -- Lease is assigned to a client. Client identification
1832         MUST appear.
1833
1834       o EXPIRED -- indicates that a client's binding on an IP address
1835         has expired. When the partner server ACK's the BNDUPD of an
1836         EXPIRED IP address, the server sets its internal state to FREE.
1837         It is then available for allocation to any client of the primary
1838         server.  It may be allocated to the same client on the server
1839         where the lease expired if a BNDUPD containing the EXPIRED state
1840         has not yet been sent to the partner (e.g., in the event that
1841         the servers are not in communication).  Client identification
1842         SHOULD appear.
1843
1844       o RELEASED -- indicates that a DHCP client sent in a DHCPRELEASE
1845         message.  When the partner server ACK's the BNDUPD of a
1846         RELEASED IP address, the server sets its internal state to FREE,
1855         and it is available for allocation by the primary server to any
1856         DHCP client.  It may be allocated to the same client if a BNDUPD
1857         has not yet been sent to the partner.  Client identification
1858         SHOULD appear.
1859
1860       o FREE -- is used when a DHCP server needs to communicate that an
1861         IP address is unused by any DHCP client, but it was not just
1862         released, expired, or reset by a network administrator.  When
1863         the partner server ACK's the BNDUPD of a FREE IP address, the
1864         server sets its internal state such that it is available for
1865         allocation by the primary DHCP server to any DHCP client.  (Note
1866         that in PARTNER-DOWN state, after waiting the MCLT, the IP
1867         address MAY be allocated to a DHCP client by the secondary
1868         server.)
1869
1870         Note that when an IP address that was allocated by the secondary
1871         reverts to the FREE state, it must (like any other IP address)
1872         be assigned to the secondary through the POOLREQ/BNDUPD process
1873         before the secondary can reallocate it.
1874
1875         Client identification MAY appear.
1876
1877       o ABANDONED -- indicates that an IP address is considered unusable
1878         by the DHCP subsystem.  An IP address for which a valid PING
1879         response was received SHOULD be set to ABANDONED.  An IP address
1880         for which a DHCPDECLINE was received should be set to ABANDONED.
1881         Client identification MUST NOT appear.
1882
1883       o RESET -- indicates that this IP address was made available by
1884         operator command.  This is a distinct state so that the reason
1885         that the IP address became FREE can be determined.  Client iden-
1886         tification MAY appear.
1887
1888       o BACKUP -- indicates that this IP address can be allocated by the
1889         secondary server to a DHCP client at any time. When the MCLT has
1890         passed after its time of entry into PARTNER-DOWN state, the IP
1891         address may be allocated by the primary to any DHCP client.
1892         Client identification MAY appear.
1893
1894    These binding-status values are communicated from one failover
1895    partner to another using the binding-status option; see section 12.3
1896    for details of this option.  Unless otherwise noted above there MAY
1897    be client information associated with each of these binding-status
1898    values.
1899
1900    An IP address will move between these binding-status values using the
1901    following state transition diagram:
1902
1913                                         DHCP client DECLINE or
1914                                         server detected problem
1915                                         from any state
1916                                                   |
1917                                                   V
1918                           +----------+         +--+------+
1919          External   >---->|   RESET  |   (3)   |ABANDONED|
1920          command          |          +<--------+         |
1921                           +----------+         +---------+
1922                                |
1923                            Comm w/Partner(1)
1924                                V
1925      +---------+  Comm(1) +----------+   Comm(1) +---------+
1926      | EXPIRED |--------->|  FREE    |<----------| RELEASED|
1927      |         | w/Partner|          | w/Partner |         |
1928      +---------+          +----------+           +---------+
1929        ^     ^             |    |  +-----------+       ^
1930        |     |             |    |              |       |
1931        | Exp. grace     IP |  IP addr alloc.  IP addr  |
1932        | period ends  address  to sec.(2)     reserved |
1933        |     |        leased    V              |       |
1934        |     |        by   |   +----------+    |       |
1935        |     |        primary  |  BACKUP  |<---+       |
1936        |   wait for        |   |          |            |
1937        |  grace period     |   +----------+            |
1938        |     |             |       |                   |
1939        |     |             |    IP addr leased by      |
1940        |  Expired grace    |       secondary           |
1941        |  period exists    V       V                   |
1942        |     |           +----------+                  |
1943        |     | Lease on  |  ACTIVE  | DHCPRELEASE      |
1944        +-----+-IP addr---|          |------------------+
1945                expires   +----------+
1946
1947
1948        Figure 5.11-1:  Transitions between binding-status values.
1949
1950        (1) This transition MAY also occur if the server is in
1951        PARTNER-DOWN state and the MCLT has passed since the entry
1952        in the RELEASED, EXPIRED, or RESET states.
1953
1954        (2) This transition MAY occur if the server is the secondary
1955        and the MCLT has passed since its entry into PARTNER-DOWN state.
1956
1957        (3) This transition MAY occur due to implementation-specific
1958        handling of ABANDONED IP addresses.
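
        The transitions in Figure 5.11-1 can be written down as a lookup
        table.  The following is a minimal sketch, not part of the
        protocol: the event names are hypothetical labels for the arrows
        in the figure, and treating "External command" as applicable
        from any state is an assumption, since the figure does not show
        a source state for that arrow.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "ACTIVE"
    EXPIRED = "EXPIRED"
    RELEASED = "RELEASED"
    FREE = "FREE"
    ABANDONED = "ABANDONED"
    RESET = "RESET"
    BACKUP = "BACKUP"


# Event names are hypothetical labels for the arrows in Figure 5.11-1.
TRANSITIONS = {
    (Status.FREE, "leased_by_primary"): Status.ACTIVE,
    (Status.FREE, "allocated_to_secondary"): Status.BACKUP,    # note (2)
    (Status.FREE, "reserved"): Status.BACKUP,
    (Status.BACKUP, "leased_by_secondary"): Status.ACTIVE,
    (Status.ACTIVE, "dhcprelease"): Status.RELEASED,
    (Status.ACTIVE, "lease_expired"): Status.EXPIRED,
    (Status.EXPIRED, "comm_with_partner"): Status.FREE,        # note (1)
    (Status.RELEASED, "comm_with_partner"): Status.FREE,       # note (1)
    (Status.RESET, "comm_with_partner"): Status.FREE,          # note (1)
    (Status.ABANDONED, "implementation_reset"): Status.RESET,  # note (3)
}

# Arrows that the figure draws without a specific source state.
# Applying "external_command" from any state is an assumption.
FROM_ANY_STATE = {
    "decline_or_server_detected_problem": Status.ABANDONED,
    "external_command": Status.RESET,
}


def next_status(current, event):
    """Return the binding-status that follows `current` on `event`."""
    if event in FROM_ANY_STATE:
        return FROM_ANY_STATE[event]
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"no transition from {current.name} on {event!r}") from None
```

        A server with a different internal state machine would only
        need a mapping from its own states onto these values.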
1959
1970    Again, note that a DHCP server implementing the failover protocol
1971    does not have to implement either this state machine or use these
1972    particular binding-status values in its normal operation of allocat-
1973    ing IP addresses to DHCP clients.  It only needs to map its internal
1974    binding-status values onto these "standard" binding-status values,
1975    and map these "standard" binding-status values back into its internal
1976    binding-status values.  For example, a server which implements a
1977    grace period for an IP address binding SHOULD simply wait to update
1978    its partner server until the grace period on that binding has run
1979    out.
1980
1981    The process of setting an IP address to FREE deserves some detailed
1982    discussion.  When an IP address is moved to the EXPIRED, RELEASED, or
1983    RESET binding-status on a server, it will send a BNDUPD with the
1984    binding-status of EXPIRED, RELEASED, or RESET to its partner.  If its
1985    partner agrees that this is acceptable (see sections 7.1.2 and 7.1.3
1986    concerning why a server might not accept a BNDUPD) it will return a
1987    BNDACK with no reject-reason, signifying that it accepted the update.
1988    As part of the BNDUPD processing, the server returning the BNDACK
1989    will set the binding-status of the IP address to FREE, and upon
1990    receipt of the BNDACK the server which sent the BNDUPD will set the
1991    binding-status of the IP address to FREE.  Thus, the EXPIRED,
1992    RELEASED, or RESET binding-status is something of a transitory state.
1993    This process is encoded in the transition diagram above by "Comm
1994    w/Partner".
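
        The exchange just described can be sketched as two in-memory
        servers.  This is an illustration under stated assumptions: the
        class and function names are hypothetical, messages are modelled
        as plain dicts rather than wire formats, and the `acceptable`
        argument stands in for the acceptance checks of sections 7.1.2
        and 7.1.3.

```python
from dataclasses import dataclass, field

TRANSITORY = {"EXPIRED", "RELEASED", "RESET"}


@dataclass
class Server:
    name: str
    bindings: dict = field(default_factory=dict)  # IP address -> binding-status

    def handle_bndupd(self, ip, status, acceptable=True):
        """Process a partner's BNDUPD and return a BNDACK (a dict here).

        `acceptable` is a stand-in for the checks of sections 7.1.2/7.1.3.
        """
        if not acceptable:
            # Placeholder value; real reject-reasons are defined elsewhere.
            return {"ip": ip, "reject_reason": "not-accepted"}
        # An accepted EXPIRED/RELEASED/RESET update is recorded as FREE.
        self.bindings[ip] = "FREE" if status in TRANSITORY else status
        return {"ip": ip, "reject_reason": None}

    def handle_bndack(self, ack):
        """On a BNDACK with no reject-reason, finish the move to FREE."""
        ip = ack["ip"]
        if ack["reject_reason"] is None and self.bindings.get(ip) in TRANSITORY:
            self.bindings[ip] = "FREE"


def send_update(sender, receiver, ip, status, acceptable=True):
    """Set `status` locally, send the BNDUPD, and process the BNDACK."""
    sender.bindings[ip] = status
    sender.handle_bndack(receiver.handle_bndupd(ip, status, acceptable))
```

        With two `Server` objects, `send_update(a, b, ip, "EXPIRED")`
        leaves the address FREE on both; if the update is rejected, the
        sender keeps the transitory status.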
1995
1996 5.12.  DNS dynamic update considerations
1997
1998    DHCP servers (and clients) can use DNS Dynamic Updates as described
1999    in [RFC 2136] to maintain DNS name-mappings as they maintain DHCP
2000    leases.  Many different administrative models for DHCP-DNS integra-
2001    tion are possible.  Descriptions of several of these models, and
2002    guidelines that DHCP servers and clients should follow in carrying
2003    them out, are laid out in [FQDN].  The nature of the DHCP failover
2004    protocol introduces some issues concerning dynamic DNS updates that
2005    are not part of non-failover DHCP environments.  This section
2006    describes these issues, and defines the information which failover
2007    partners should exchange and the protocol which they should follow in
2008    order to ensure consistent behavior.  The presence of this section
2009    should not be interpreted as requiring that implementations of the
2010    DHCP failover protocol must also support DDNS updates.  The purpose
2011    of this discussion is to clarify the areas where the DHCP failover
2012    and DHCP-DDNS protocols intersect for the benefit of implementations
2013    which support both protocols, not to introduce a new requirement into
2014    the DHCP failover protocol.  Thus, a DHCP server which implements the
2023    failover protocol MAY also support dynamic DNS updates, but if it
2024    does support dynamic DNS updates it SHOULD utilize the techniques
2025    described here in order to correctly distribute them between the
2026    failover partners.  See [FQDN], [DNSRES], and [DHCID] for details of
2027    how DHCP servers update DNS.
2028
2029    From the standpoint of the failover protocol, there is no reason why
2030    a server which is utilizing the DDNS protocol to update a DNS server
2031    should not be a partner with a server which is not utilizing the DDNS
2032    protocol to update a DNS server.  However, a server which is not able
2033    to support DDNS or is not configured to support DDNS SHOULD output a
2034    warning message when it receives BNDUPD messages which indicate that
2035    its failover partner is configured to support the DDNS protocol to
2036    update a DNS server.  An implementation MAY consider this an error
2037    and refuse to operate, or it MAY choose to operate anyway, having
2038    warned the user of the problem in some way.
2039
2040 5.12.1.  Relationship between failover and dynamic DNS update
2041
2042    The failover protocol describes the conditions under which each fail-
2043    over server may renew a lease to its current DHCP client, and
2044    describes the conditions under which it may grant a lease to a new
2045    DHCP client.  An analogous set of conditions determines when a fail-
2046    over server should initiate a DDNS update, and when it should attempt
2047    to remove records from the DNS. The failover protocol's conditions
2048    are based on the desired external behavior: avoiding duplicate
2049    address assignments; allowing clients to continue using leases which
2050    they obtained from one failover partner even if they can only commun-
2051    icate with the other partner; allowing the backup DHCP server to
2052    grant new leases even if it is unable to communicate with the primary
2053    server.  The desired external DDNS behavior for DHCP failover servers
2054    is:
2055
2056       1.  Allow timely DDNS updates from the server which grants a
2057           client a lease. Recognize that there is often a DDNS update
2058           lifecycle which parallels the DHCP lease lifecycle. This is
2059           likely to include the addition of records when the lease is
2060           granted, and the removal of DNS records when the lease is sub-
2061           sequently made available for allocation to a different client.
2062
2063       2.  Communicate enough information between the two failover
2064           servers to allow one to complete the DDNS update 'lifecycle'
2065           even if the other server originally granted the lease.
2066
2067       3.  Avoid redundant or overlapping DDNS updates, where both fail-
2068           over servers are attempting to perform DDNS updates for the
2069           same lease-client binding. Avoid situations where one partner
2070           is attempting to add RRs related to a lease binding while the
2079           other partner is attempting to remove RRs related to the same
2080           lease binding.
2081
2082 5.12.2.  Use of the DDNS option
2083
2084    In order for either server to be able to complete a DDNS update, or
2085    to remove DNS records which were added by its partner, both servers
2086    need to know the FQDN associated with the lease-client binding. The
2087    FQDN associated with the client's A RR and PTR RR SHOULD be communi-
2088    cated from the server which adds records into the DNS to its partner.
2089    The initiating server SHOULD use the DDNS option in the BNDUPD mes-
2090    sages to inform the partner server of the status of any DDNS updates
2091    associated with a lease binding. Failover servers MAY choose not to
2092    include the DDNS option in BNDUPD messages if there has been no
2093    change in the status of any DDNS update related to the lease binding.
2094    The partner server receiving BNDUPD messages containing the DDNS
2095    option SHOULD compare the status flags and the FQDN contained in the
2096    option data with the current DDNS information it has associated with
2097    the lease binding, and update its notion of the DDNS status accord-
2098    ingly.
2099
2100    The initiating server MAY send a BNDUPD to its partner before the
2101    DDNS update has been successfully completed. If it does so, it SHOULD
2102    leave the 'C' bit in the Flags field clear, to indicate to the
2103    partner that the DDNS update may not be complete. When the DDNS
2104    update has been successfully acknowledged by the DNS server, the ini-
2105    tiating DHCP server SHOULD include the DDNS option in its next BNDUPD
2106    message about the binding, so that the partner server will be able to
2107    record the final status of the DDNS update. The initiating server
2108    SHOULD set the 'C' bit in the DDNS option if the DDNS update was suc-
2109    cessfully accepted by the DNS server.
2110
2111    Some implementations will choose to send a BNDUPD without waiting for
2112    the DDNS update to complete, and then will send a second BNDUPD once
2113    the DDNS update is complete. Other implementations will delay sending
2114    the partner a BNDUPD until the DDNS update has been acknowledged by
2115    the DNS server, or until some time-limit has elapsed, in order to
2116    avoid sending a second BNDUPD.
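
        The handling of the DDNS option described above can be sketched
        as follows.  The names are hypothetical, and the 'C' bit is
        modelled as a boolean field; the actual option encoding is
        defined elsewhere in this document and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DdnsInfo:
    fqdn: str        # Domain Name field of the DDNS option
    completed: bool  # models the 'C' bit; the wire encoding is not shown


def ddns_option_to_send(current: DdnsInfo,
                        last_sent: Optional[DdnsInfo]) -> Optional[DdnsInfo]:
    """Return the DDNS data for the next BNDUPD, or None if unchanged.

    Omitting an unchanged option is permitted (MAY), not required.
    """
    return None if current == last_sent else current


def apply_received(stored: Optional[DdnsInfo],
                   received: Optional[DdnsInfo]) -> Optional[DdnsInfo]:
    """Partner side: compare the received option with stored state."""
    if received is None or received == stored:
        return stored
    return received
```

        An implementation that sends two BNDUPDs would call
        `ddns_option_to_send` once with `completed=False` and again
        with `completed=True` after the DNS server acknowledges the
        update.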
2117
2118    The Domain Name field in the DDNS option contains the FQDN that will
2119    be associated with the A RR (if the server is performing an A RR
2120    update for the client) and the PTR RR. This FQDN may be composed in
2121    any of several ways, depending on server configuration and the infor-
2122    mation provided by the client in its DHCP messages. The client may
2123    supply a hostname which it would like the server to use in forming
2124    the FQDN, or it may supply the entire FQDN. The server may be config-
2125    ured to attempt to use the information the client supplies, it may be
2126    configured with an FQDN to use for the client, or it may be
2135    configured to synthesize an FQDN. The responsive server SHOULD
2136    include the FQDN that it will be using in DDNS updates it initiates
2137    when it sends the DDNS option.
2138
2139    Since the responsive server may not have completed the DDNS update at
2140    the time it sends the first BNDUPD about the lease binding, there may
2141    be cases where the FQDN in later BNDUPD messages does not match the
2142    FQDN included in earlier messages.  For example, the responsive
2143    server may be configured to handle situations where two or more DHCP
2144    client FQDNs are identical by modifying the most-specific label in
2145    the FQDNs of some of the clients in an attempt to generate unique
2146    FQDNs for them (a process sometimes called "disambiguation").  Alter-
2147    natively, at sites which use some or all of the information which
2148    clients supply to form the FQDN, it's possible that a client's confi-
2149    guration may be changed so that it begins to supply new data.  The
2150    responsive server may react by removing the DNS records which it ori-
2151    ginally added for the client, and replacing them with records that
2152    refer to the client's new FQDN. In such cases, the responsive server
2153    SHOULD include the actual FQDN that was used in subsequent DDNS
2154    options.  The responsive server SHOULD include relevant client-option
2155    data in the client-request-options option in its BNDUPD messages.
2156    This information may be necessary in order to allow the non-
2157    responsive partner to detect client configuration changes that change
2158    the hostname or FQDN data which the client includes in its DHCP
2159    requests.
2160
2161 5.12.3.  Adding RRs to the DNS
2162
2163    A failover server which is going to perform DDNS updates SHOULD ini-
2164    tiate the DDNS update when it grants a new lease to a client. The
2165    non-responsive partner SHOULD NOT initiate a DDNS update when it
2166    receives the BNDUPD after the lease has been granted. The failover
2167    protocol ensures that only one of the partners will grant a lease to
2168    any individual client, so it follows that this requirement will
2169    prevent both partners from initiating updates simultaneously. The
2170    server initiating the update SHOULD follow the protocol in [FQDN].
2171    The server may be configured to perform an A RR update on behalf of
2172    its clients, or not. Ordinarily, a failover server will not initiate
2173    DDNS updates when it renews leases. In two cases, however, a failover
2174    server MAY initiate a DDNS update when it renews a lease to its
2175    existing client:
2176
2177       1.  When the lease was granted before the server was configured to
2178           perform DDNS updates, the server MAY be configured to perform
2179           updates when it next renews existing leases. Since both
2180           servers are responsive to renewals in NORMAL state, it is not
2181           enough to simply require the non-responsive server to avoid a
2182           DNS update in this case.  The server which would be responsive
2191           to a DHCPDISCOVER from this client (even though the current
2192           request is a DHCPREQUEST/RENEW) is the server which should
2193           initiate the DDNS update.
2194
2195       2.  If a server is in PARTNER-DOWN state, it can conclude that its
2196           partner is no longer attempting to perform an update for the
2197           existing client. If the remaining server has not recorded that
2198           an update for the binding has been successfully completed, the
2199           server MAY initiate a DDNS update.  It MAY initiate this
2200           update immediately upon entry to PARTNER-DOWN state, it MAY
2201           perform this in the background, or it MAY initiate this update
2202           upon next hearing from the DHCP client.
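
        The rules of this section can be condensed into a single
        decision function.  This is a sketch only: the function, event
        names, and flags are hypothetical, and it covers the renewal
        trigger for case 2 but not an update started immediately on
        entry to PARTNER-DOWN state.

```python
def may_initiate_ddns_add(event, *, server_state="NORMAL",
                          update_recorded_complete=False,
                          lease_predates_ddns_config=False,
                          would_answer_discover=False):
    """Return True if this server may start a DDNS add for the binding."""
    if event == "granted_new_lease":
        return True   # the granting server initiates the update
    if event == "received_bndupd":
        return False  # the partner SHOULD NOT initiate an update
    if event == "renewed_lease":
        if server_state == "PARTNER-DOWN":
            # Case 2: the partner is no longer attempting an update.
            return not update_recorded_complete
        if lease_predates_ddns_config:
            # Case 1: only the server that would answer a DHCPDISCOVER.
            return would_answer_discover
        return False  # ordinarily no DDNS update on renewal
    raise ValueError(f"unknown event: {event!r}")
```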
2203
2204 5.12.4.  Deleting RRs from the DNS
2205
2206    The failover server which makes an IP address FREE SHOULD initiate
2207    any DDNS deletes, if it has recorded that DNS records were added on
2208    behalf of the client.
2209
2210    A server not in PARTNER-DOWN state "makes an IP address FREE" when it
2211    initiates a BNDUPD with a binding-status of FREE, EXPIRED, or
2212    RELEASED.  Its partner confirms this status by acking that BNDUPD,
2213    and upon receipt of the ACK the server has "made the IP address
2214    FREE".  Conversely, a server in PARTNER-DOWN state "makes an IP
2215    address FREE" when it sets the binding-status to FREE, since in
2216    PARTNER-DOWN state no communication with the partner is required.
2217
2218    It is at this point that it should initiate the DDNS operations to
2219    delete RRs from the DNS.  Its partner SHOULD NOT initiate DDNS
2220    deletes for DNS records related to the lease binding as part of send-
2221    ing the BNDACK message.   The partner MAY have issued BNDUPD messages
2222    with a binding-status of FREE, EXPIRED, or RELEASED previously, but
2223    the other server will have NAKed these BNDUPD messages.
2224
2225    The failover protocol ensures that only one of the two partner
2226    servers will be able to make a lease FREE. The server making the
2227    lease FREE may be doing so while it is in NORMAL communication with
2228    its partner, or it may be in PARTNER-DOWN state. If a server is in
2229    PARTNER-DOWN state, it may be performing DDNS deletes for RRs which
2230    its partner added originally. This allows a single remaining partner
2231    server to assume responsibility for all of the DDNS activity which
2232    the two servers were undertaking.
2233
2234    Another implication of this approach is that no DDNS RR deletes will
2235    be performed while either server is in COMMUNICATIONS-INTERRUPTED
2236    state, since no IP addresses are moved into the FREE state during
2237    that period.
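
        The question of which server deletes the RRs can be sketched as
        a predicate.  The function and parameter names are hypothetical;
        the logic is one reading of the text above, not a normative
        algorithm.

```python
FREEING_STATUSES = {"FREE", "EXPIRED", "RELEASED"}


def should_delete_rrs(server_state, *, records_were_added,
                      sent_status=None, partner_acked=False,
                      set_free_locally=False):
    """Return True if this server is the one that should delete the RRs."""
    if not records_were_added:
        return False             # nothing recorded, nothing to delete
    if server_state == "PARTNER-DOWN":
        return set_free_locally  # no partner communication is required
    if server_state == "COMMUNICATIONS-INTERRUPTED":
        return False             # no address becomes FREE in this state
    # Otherwise this server sent the freeing BNDUPD and the partner acked it.
    return sent_status in FREEING_STATUSES and partner_acked
```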
2238
2247 5.13.  Reservations and failover
2248
2249    Some DHCP servers support a capability to offer specific pre-
2250    configured IP addresses to DHCP clients.  These are real DHCP clients
2251    that carry out the entire DHCP protocol, but these servers always
2252    offer the client a specific pre-configured IP address -- and they
2253    offer that IP address to no other clients.  Such a capability has
2254    several names, but it is sometimes called a "reservation", in that
2255    the IP address is reserved for a particular DHCP client.
2256
2257    In a situation where there are two DHCP servers serving the same sub-
2258    net without using failover, the two DHCP servers need to have dis-
2259    joint IP address pools, but identical reservations for the DHCP
2260    clients.
2261
2262    In a failover context, both servers need to be configured with the
2263    proper reservations in an identical manner, but if we stop there
2264    problems can occur around the edge conditions where reservations are
2265    made for an IP address that has already been leased to a different
2266    client.  Different servers handle this conflict in different ways,
2267    but the goal of the failover protocol is to allow correct operation
2268    with any server's approach to the normal processing of the DHCP pro-
2269    tocol.
2270
2271    The general solution with regard to reservations is as follows.
2272    Whenever a reserved IP address becomes FREE (i.e., when first config-
2273    ured or whenever a client frees it or it expires or is reset), the
2274    primary server MUST show that IP address as FREE (and thus available
2275    for its own allocation) and it MUST send it to the secondary server
2276    with the R bit set in the IP-flags option and the binding-status
2277    BACKUP.
2278
2279    Note that this implies that a reserved IP address goes through the
2280    normal state changes from FREE to ACTIVE (and possibly back to FREE).
2281    The failover protocol supports this approach to reservations, i.e.,
2282    where the IP address undergoes the normal state changes of any IP
2283    address, but it can only be offered to the client for which it is
2284    reserved.  Other approaches to the support of reservations exist in
2285    some DHCP server implementations (e.g., where the IP address is
2286    apparently leased to a particular client forever, without any expira-
2287    tion).  The goal is for the failover protocol to support any of the
2288    usual approaches to reservations, both those that allow an IP address
2289    to go through different states when reserved, and those that don't.
2290
2291    From the above, it follows that a reservation solely on the secondary
2292    will not necessarily allow the secondary to offer that address to the
2293    client for whom it is reserved.  The reservation must also appear on
2294    the primary for the secondary to be able to offer the IP
2303    address to the client for which it is reserved.
2304
2305    When the reservation on an IP address is cancelled, if the IP address
2306    is currently FREE and the server is the primary, or BACKUP and the
2307    server is the secondary, the server MUST send a BNDUPD to the other
2308    server with the binding-status FREE and the R bit clear.
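
        The two reservation rules in this section (a reserved IP
        address becoming FREE, and a reservation being cancelled) can be
        sketched as follows.  The function names and dict keys are
        hypothetical; `r_bit` models the R bit of the IP-flags option
        without showing its encoding.

```python
def on_reserved_address_freed(role):
    """Primary's actions when a reserved IP address becomes FREE.

    Returns None for the secondary, whose side the text does not spell out.
    """
    if role != "primary":
        return None
    return {
        "local_status": "FREE",  # available for the primary's own allocation
        "bndupd": {"binding_status": "BACKUP", "r_bit": True},
    }


def on_reservation_cancelled(role, status):
    """Return the BNDUPD fields to send on cancellation, or None."""
    if (role, status) in {("primary", "FREE"), ("secondary", "BACKUP")}:
        return {"binding_status": "FREE", "r_bit": False}
    return None
```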
2309
2310 5.14.  Dynamic BOOTP and failover
2311
2312    Some DHCP servers support a capability to offer IP addresses to BOOTP
2313    clients without having a particular address previously allocated for
2314    those clients.  This capability is often called something like
2315    "dynamic BOOTP".  It is discussed briefly in RFC 1534 [RFC 1534].
2316
2317    This capability has a negative interaction with the fundamental ele-
2318    ments of the failover protocol, in that an address handed out to a
2319    BOOTP device has no term (or effectively no term, in that usually
2320    they are considered leases for "forever").  There is no opportunity
2321    to hand out a lease which is only the MCLT long when first hearing
2322    from a BOOTP device, because they may only interact once with the
2323    DHCP server and they have no notion of a lease expiration time.  Thus
2324    the entire concept of the MCLT and waiting the MCLT after entering
2325    PARTNER-DOWN state is defeated when dealing with BOOTP devices.
2326
2327    With some restrictions, however, dynamic BOOTP devices can be sup-
2328    ported in a server on a subnet where failover is supported.  The only
2329    restriction (and it is not small) is that on any portion of the sub-
2330    net (in any address pool) where dynamic BOOTP devices can be allo-
2331    cated IP addresses, a DHCP server MUST NOT ever use any of the IP
2332    addresses which were previously available for allocation by its fail-
2333    over partner.  Thus, the addresses allocated by the primary to the
2334    secondary for allocation that might have been allocated to BOOTP dev-
2335    ices MUST NOT ever be used by the primary server even if it is in
2336    PARTNER-DOWN state and has waited the MCLT after entering that state.
2337    Conversely, addresses available for allocation by the primary MUST
2338    NOT be used by the secondary even if it is in PARTNER-DOWN state.
2339    The reason for this is that one of those IP addresses could have been
2340    allocated by the secondary server to a BOOTP device, and the primary
2341    server would have no way of ever knowing that happened.
2342
2343    Whenever a server sends a BNDUPD message to its partner, if the
2344    client associated with the IP address is a BOOTP client, then the
2345    server MUST set the B bit in the IP-flags option.
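
        The dynamic BOOTP restriction and the B bit rule can be sketched
        as two small functions.  The names are hypothetical; the
        non-BOOTP branch reflects the usual rule that a server in
        PARTNER-DOWN state may use its partner's addresses once the MCLT
        has passed.

```python
def may_use_partner_addresses(pool_allows_dynamic_bootp, server_state,
                              seconds_in_partner_down, mclt_seconds):
    """Return True if this server may allocate its partner's addresses."""
    if pool_allows_dynamic_bootp:
        # MUST NOT ever use them: the partner may have given one to a
        # BOOTP device without this server ever learning of it.
        return False
    return (server_state == "PARTNER-DOWN"
            and seconds_in_partner_down >= mclt_seconds)


def ip_flags_for_bndupd(client_is_bootp):
    """Model the B bit of the IP-flags option as a boolean."""
    return {"B": bool(client_is_bootp)}
```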
2346
2347    There is a very slight possibility that a BOOTP client could get an
2348    IP address on each server of a failover pair.  When these two servers
2349    eventually attempt to resolve this conflict, they SHOULD agree to
2350    disagree, since it is not possible to know which IP address the BOOTP
2359    client will actually use -- indeed, it could use both.  Operator
2360    intervention will, in general, be required to rectify this situation.
2361    Fortunately, it is extremely unlikely to ever actually occur.
2362
2363 5.15.  Guidelines for selecting MCLT
2364
2365    There is no one correct value for the MCLT.  There is an explicit
2366    tradeoff between various factors in selecting an MCLT value.
2367
2368 5.15.1.  Short MCLT
2369
2370    A short MCLT value will mean that after entering PARTNER-DOWN state,
2371    a server will only have to wait a short time before it can start
2372    allocating its partner's IP addresses to DHCP clients.  Furthermore,
2373    it will only have to wait a short time after the expiration of a
2374    lease on an IP address before it can reallocate that IP address to
2375    another DHCP client.
2376
2377    However, the downside of a short MCLT value is that the initial lease
2378    interval that will be offered to every new DHCP client will be short,
2379    which will cause increased traffic as those clients will need to send
2380    in their first renewal after half of a short MCLT.  In addition,
2381    the lease extensions that a server in COMMUNICATIONS-INTERRUPTED
2382    state can give will be only the MCLT after the server has been in
2383    COMMUNICATIONS-INTERRUPTED for around the desired client lease
2384    period.  If a server stays in COMMUNICATIONS-INTERRUPTED for that
2385    long, then the leases it hands out will be short and that will
2386    increase the load on that server, possibly causing difficulty.
2387
2388 5.15.2.  Long MCLT
2389
2390    A long MCLT value will mean that the initial lease period will be
2391    longer and the time that a server in COMMUNICATIONS-INTERRUPTED state
2392    will be able to extend leases (after it has been in COMMUNICATIONS-
2393    INTERRUPTED state for around the desired client lease period) will be
2394    longer.
2395
2396    However, a server entering PARTNER-DOWN state will have to wait the
2397    longer MCLT before being able to allocate its partner's IP addresses
2398    to new DHCP clients.  This may mean that additional IP addresses are
2399    required in order to cover this time period.  Further, the server in
2400    PARTNER-DOWN will have to wait the longer MCLT from every lease
2401    expiration before it can reallocate an IP address to a different DHCP
2402    client.
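
        The tradeoff can be made concrete with a little arithmetic.  In
        the sketch below the function name is hypothetical and the two
        MCLT values (600 and 3600 seconds) are illustrative only, not
        recommendations.

```python
def mclt_consequences(mclt_seconds):
    """Quantities from sections 5.15.1 and 5.15.2 that scale with MCLT."""
    return {
        # A new client's first lease is only the MCLT long.
        "initial_lease_seconds": mclt_seconds,
        # The client sends its first renewal after half of that lease.
        "first_renew_after_seconds": mclt_seconds / 2,
        # Wait after entering PARTNER-DOWN (or after a lease expires)
        # before the address can be given to a different client.
        "partner_down_wait_seconds": mclt_seconds,
    }


# Illustrative values only.
for label, mclt in (("short", 600), ("long", 3600)):
    print(label, mclt_consequences(mclt))
```

        A shorter MCLT lowers the waiting times but proportionally
        raises the rate of first renewals.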
2403
2404 5.16.  What is sent in response to an UPDREQ or UPDREQALL message?
2405
2406    In section 7.3, the UPDREQ message is defined, and it says that the
2415    receiving server sends to the requesting server "all of the binding
2416    database information that it has not already seen".  In section
2417    7.4.2, the UPDREQALL message is defined, and it says that the receiv-
2418    ing server sends to the requesting server "all binding database
2419    information".
2420
2421    Both of these statements need further elaboration.
2422
2423    First, for the UPDREQ message, the information to be sent in BNDUPD
2424    messages concerns "all of the binding database information it has not
2425    already seen".  Since every BNDUPD is acked by the receiving server,
2426    the sending server need only keep track of which IP addresses have
2427    binding database changes not yet seen by the partner, and when they
2428    are finally acked by the partner it can record that.  Thus, at any
2429    time, it knows which IP addresses have unacked binding database
2430    information.  This is less simple when, across reconfigurations of
2431    the servers, an IP address can change the failover partner with which
2432    it is associated.  In that case, it is important to reset the indica-
2433    tion that the partner has seen this binding information.  See section
2434    5.17, below, for a more complete discussion of this issue.
2435
2436    Second, in the event that a failover server's binding database infor-
2437    mation is restored from a backup, it will be partially out of date.
2438    In this case, its partner's indication of which binding database
2439    information the restored server has seen will also be out of date.
2440
2441    The solution to this problem is for a server which is connecting with
2442    its partner to check the partner's last communicated time, and if it
2443    is very much ahead of its own last communicated time, go into
2444    RECOVER state and transmit an UPDREQALL to allow it to refresh its
2445    state.  See section 9.3.2, step 5.  If the partner's last communi-
2446    cated time is very much behind its own record of when it last commun-
2447    icated with the partner, then it SHOULD invalidate its information on
2448    which binding database information the partner server knows, so that
2449    it will send all of its relevant binding database information to the
2450    partner.
2451
2452    Third, in the event that a server receives an UPDREQALL message, what
2453    constitutes "all binding database information"?  At first glance this
2454    would seem to be information on every configured IP address in the
2455    server.  While this would be technically correct, it may impose a
2456    serious and unacceptable performance penalty on servers which have
2457    millions of configured IP addresses.  What can be done to lessen the
2458    data that must be sent for an UPDREQALL?
2459
2460    When sending "all binding database information", if the sending
2461    server sends only information concerning IP addresses which have been
2462    at some time associated with clients, it will send enough information
2471    to satisfy the needs of the failover protocol.  It need not send
2472    information on any IP addresses that have never been used, since
2473    presumably they will be initialized as available to the primary
2474    server (i.e., FREE) on any server employing failover.
2475
2476 5.17.  How do you determine that your partner is "up to date" for
2477 a specific binding?
2478
2479    Throughout this document, one server is assumed to know for each IP
2480    address binding whether or not its partner is "up to date" for that
2481    binding.  There are some subtle issues involved in recording this "up
2482    to date" information about a specific binding.
2483
2484    In a steady state world, it would suffice to have a single bit in the
2485    binding database to represent the information about whether the
2486    partner was or was not up to date.
2487
2488    In a more complex environment a configuration change affecting a par-
2489    ticular IP address may change the failover endpoint with which it is
2490    associated, and if this should happen, any "up to date" bit which is
2491    written into the bindings database will be accurate for only the pre-
2492    vious failover endpoint, but not the current failover endpoint.  If
2493    failover is disabled and then re-enabled (and the "up to date" bits,
2494    if used, are not cleared) problems can also occur.
2495
2496    A server MUST be able to relate the "up to date" condition to a
2497    particular failover endpoint and even a particular instantiation of
2498    that failover endpoint.  The techniques to do this are implementation
2499    dependent.
2500
2501    In addition, section 7.4 requires that a server be able to remember
2502    that an UPDREQALL message has been received and to treat every UPDREQ
2503    message as an UPDREQALL message until the first UPDDONE message is
2504    sent.  One way to do this is to clear all of the "up to date" indica-
2505    tions for an entire failover endpoint upon receipt of an UPDREQALL
2506    message, thereby ensuring that every active binding will be sent to
2507    the partner whether through the completion of this UPDREQALL or
2508    through processing of a subsequent UPDREQ message.  This is actually
2509    better than remembering that an UPDREQALL was received and turning
2510    every UPDREQ into an UPDREQALL, since any information sent in an
2511    incomplete UPDREQALL (or subsequent UPDREQ messages turned into "all"
2512    messages) will be remembered and not re-sent.
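
        One possible way to relate the "up to date" condition to a
        particular instantiation of a failover endpoint is to store an
        instantiation counter with each acknowledgement.  The Python
        sketch below is only an illustration of that idea; the class,
        its method names, and the counter are hypothetical and are not
        defined by this protocol.

```python
class UpToDateTracker:
    """Tracks, per IP address, whether the partner has acked our data."""

    def __init__(self):
        self._generation = {}  # endpoint name -> instantiation counter
        self._acked = {}       # ip -> (endpoint name, counter at ack time)

    def generation(self, endpoint):
        return self._generation.setdefault(endpoint, 0)

    def record_ack(self, ip, endpoint):
        # Called when a BNDACK for this address arrives from this endpoint.
        self._acked[ip] = (endpoint, self.generation(endpoint))

    def record_change(self, ip):
        # Any local binding change makes the partner stale for this address.
        self._acked.pop(ip, None)

    def is_up_to_date(self, ip, endpoint):
        return self._acked.get(ip) == (endpoint, self.generation(endpoint))

    def reset_endpoint(self, endpoint):
        # For example on receipt of an UPDREQALL, or when failover is
        # re-enabled: every earlier ack for this endpoint stops matching.
        self._generation[endpoint] = self.generation(endpoint) + 1
```

        Because an acknowledgement only counts when both the endpoint
        and its counter match, moving an address to another endpoint or
        resetting an endpoint clears the indication without touching
        every binding record.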
2513
2514 6.  Common Message Format
2515
2516    This section discusses the message format common to all failover
2517    messages, including the message header format as well as the common
2518    option format.  See section 12 for the definitions
2527    of the specific options used in the failover protocol.
2528
2529 6.1.  Message header format
2530
2531    The options contained in the payload data section of the failover
2532    message all use a two byte option number and two byte length format.
2533
2534    All failover protocol messages are sent over the TCP connection
2535    between failover endpoints and encoded using a message format
2536    specific to the failover protocol.
2537
2538    There exists a common message format for all failover messages, which
2539    utilizes the options in a way similar to the DHCP protocol.  For each
2540    message type, some options are required and some are optional.  In
2541    addition, when a message is received any options that are not under-
2542    stood by the receiving server MUST be ignored.
2543
2544    All of the fields in the fixed portion of the message MUST be filled
2545    with correct data in every message sent.
2546
2547    0                   1                   2                   3
2548    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2549    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2550    |        message length (2)     | msg type (1)  |payload off (1)|
2551    +---------------+---------------+---------------+---------------+
2552    |                            time (4)                           |
2553    +---------------------------------------------------------------+
2554    |                            xid (4)                            |
2555    +---------------------------------------------------------------+
2556    |     0 or more additional header bytes  (variable)             |
2557    +---------------------------------------------------------------+
2558    |                    payload data  (variable)                   |
2559    |                                                               |
2560    |               formatted as DHCP-style options                 |
2561    |           using a two byte option code and two byte length    |
2562    |                  See section 6.2 for details.                 |
2563    +---------------------------------------------------------------+
2564
2565
2566
2567    message length - 2 bytes, network byte order
2568
2569    This is the length of the message in bytes. It includes the two byte
2570    message length itself.  The maximum length is 2048 bytes.  The
2571    minimum length is 12.
2572
2583    msg type - 1 byte
2584
2585    The message type field is used to distinguish between messages.
2586
2587    The following message types are defined:
2588
2589    Value   Message Type
2590    -----   ------------
2591    0       reserved    not used
2592    1       POOLREQ     request allocation of addresses
2593    2       POOLRESP    respond with allocation count
2594    3       BNDUPD      update partner with binding info
2595    4       BNDACK      acknowledge receipt of binding update
2596    5       CONNECT     establish connection with the secondary
2597    6       CONNECTACK  respond to attempt to establish connection with partner
2598    7       UPDREQALL   request full transfer of binding info
2599    8       UPDDONE     signal that req'd binding info was sent and acked
2600    9       UPDREQ      request transfer of un-acked binding info
2601    10      STATE       inform partner of current state or state change
2602    11      CONTACT     probe communications integrity with partner
2603    12      DISCONNECT  close a connection
2604
2605
2606    New message types should be defined in one of two ranges, 0-127 or
2607    128-255.  The range of 0-127 is used for messages that MUST be
2608    supported by every server, and if a server receives a message in the
2609    range of 0-127 that it doesn't understand, it MUST close the TCP con-
2610    nection.  The range of 128-255 is used for messages which MAY be sup-
2611    ported but are not required, and if a server receives a message in
2612    this range that it does not understand it SHOULD ignore the message.
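
        The handling of message types that are not understood can be
        sketched as follows.  This Python fragment is illustrative only;
        the function name and the returned strings are assumptions of
        the example, and KNOWN_TYPES simply lists the values 1 through
        12 from the table above.

```python
KNOWN_TYPES = frozenset(range(1, 13))  # POOLREQ (1) through DISCONNECT (12)


def disposition(msg_type, known_types=KNOWN_TYPES):
    """Say what a receiver does with a message of the given type."""
    if not 0 <= msg_type <= 255:
        raise ValueError("msg type is a single byte")
    if msg_type in known_types:
        return "process"
    # 0-127: MUST be supported, so an unknown type closes the connection.
    # 128-255: optional, so an unknown type SHOULD simply be ignored.
    return "close-connection" if msg_type <= 127 else "ignore"
```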
2613
2614    payload offset - 1 byte
2615
2616    The byte offset of the Payload Data, from the beginning of the
2617    failover message header. The value for the current protocol version
2618    (version 1) is 8.
2619
2620    time - 4 bytes, network byte order
2621
2622    The absolute time in GMT when the message was transmitted,
2623    represented as seconds elapsed since Jan 1, 1970 (i.e., similar to
2624    the ANSI C time_t time value representation).  While the ANSI C
2625    time_t value is signed, the value used in this specification is
2626    unsigned.
2627
2628    A server SHOULD set this time as close to the actual transmission of
2629    the message as possible.
2630
2639    xid - 4 bytes, network byte order
2640
2641    This is the transaction id of the failover message. The sender of a
2642    failover protocol message is responsible for setting this number, and
2643    the receiver of the message copies the number over into any response
2644    message, treating it as opaque data. The sender MUST ensure that
2645    every message sent from a particular failover endpoint over the
2646    associated TCP connection has a unique transaction id.
2647
2648    For failover messages that have no corresponding response message,
2649    the XID value is meaningless, but MUST be supplied. The XID value is
2650    used solely by the receiver of a response message to determine the
2651    corresponding request message.
2652
2653    Request messages where the XID is used in the corresponding response
2654    messages are: POOLREQ, BNDUPD, CONNECT, UPDREQALL, and UPDREQ. The
2655    corresponding response messages are POOLRESP, BNDACK, CONNECTACK,
2656    UPDDONE, and UPDDONE, respectively.
2657
2658    As requests/responses don't survive connection reestablishment, XIDs
2659    only need to be unique during a specific connection.
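
        A sender might keep track of outstanding requests by XID along
        the following lines.  This is a sketch under stated assumptions:
        the XidTable class and its methods are hypothetical, message
        types are represented by their names rather than their numeric
        values, and a simple counter is used to keep XIDs unique for the
        life of one connection (wrap-around after 2^32 messages is not
        handled).

```python
import itertools

# Request type -> response type, as listed in the text above.
RESPONSE_FOR = {
    "POOLREQ": "POOLRESP",
    "BNDUPD": "BNDACK",
    "CONNECT": "CONNECTACK",
    "UPDREQALL": "UPDDONE",
    "UPDREQ": "UPDDONE",
}


class XidTable:
    """Per-connection XID allocation and request/response matching."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._pending = {}  # xid -> request type awaiting a response

    def next_xid(self, msg_type):
        xid = next(self._counter) & 0xFFFFFFFF
        if msg_type in RESPONSE_FOR:
            self._pending[xid] = msg_type
        return xid

    def match(self, xid, response_type):
        """Return the request type answered by this response, or None."""
        request_type = self._pending.get(xid)
        if request_type is None:
            return None
        if RESPONSE_FOR[request_type] != response_type:
            return None
        del self._pending[xid]
        return request_type
```

        A new XidTable would be created for each connection, since
        requests and responses do not survive connection
        reestablishment.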
2660
2661
2662    payload data - variable length
2663
2664    The options are placed after the header, after skipping payload
2665    offset bytes from the beginning of the message.  The payload data
2666    options are not preceded by a "cookie" value.
2667
2668    The payload data is formatted as DHCP style options using two byte
2669    option codes and two byte option lengths.  The option codes are in a
2670    namespace which is unique to the failover protocol.
2671
2672    The maximum length of the payload data in octets is 2048 less the
2673    size of the header, i.e., the maximum message length is 2048 octets.
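
        The fixed fields of the header can be packed and unpacked with a
        few lines of code.  The following Python sketch is illustrative
        only: the function names and the returned dictionary are
        assumptions, and pack_message places the payload directly after
        the 12 bytes of fixed fields drawn in the diagram, which is a
        choice made for this example.  parse_message uses whatever
        payload offset the message itself carries.

```python
import struct

# message length (2), msg type (1), payload offset (1), time (4), xid (4),
# all in network byte order with no padding.
HEADER = struct.Struct("!HBBII")
MAX_MESSAGE_LENGTH = 2048


def pack_message(msg_type, time, xid, payload=b""):
    length = HEADER.size + len(payload)
    if length > MAX_MESSAGE_LENGTH:
        raise ValueError("message longer than 2048 bytes")
    # Assumption of this sketch: payload starts right after the fixed fields.
    return HEADER.pack(length, msg_type, HEADER.size, time, xid) + payload


def parse_message(data):
    """Parse one complete failover message held in a bytes object."""
    if len(data) < HEADER.size:
        raise ValueError("message shorter than the fixed header")
    length, msg_type, payload_off, time, xid = HEADER.unpack_from(data)
    if length != len(data) or length > MAX_MESSAGE_LENGTH:
        raise ValueError("bad message length")
    if payload_off > length:
        raise ValueError("payload offset beyond end of message")
    return {
        "type": msg_type,
        "time": time,
        "xid": xid,
        "payload": data[payload_off:length],
    }
```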
2674
2675 6.2.  Common option format
2676
2677    The options contained in the payload data section of the failover
2678    message all use a two byte option number and two byte length format.
2679
2680    The option numbers are drawn from an option number space unique to
2681    the failover protocol.  All of the message types share a common
2682    option number space and common options definitions, though not all
2683    options are required or meaningful for every message.
2684
2685    In contrast to the options which appear in DHCP client and server
2686    messages, the options in failover messages are ordered.  That is, for
2695    some messages the order in which the options appear in the payload
2696    data area is significant.  The messages for which option ordering is
2697    significant explicitly describe the ordering requirements.  If no
2698    ordering requirements are mentioned, then the order is not signifi-
2699    cant for that message.
2700
2701    All options which refer to time use an absolute time in GMT.  Time
2702    synchronization has already been achieved between the source and
2703    the target server using the CONNECT message and is updated and
2704    refined using the time in every packet.
2705
2706    The time value is an unsigned 32 bit integer in network byte order
2707    giving the number of seconds since 00:00 UTC, 1st January 1970. This
2708    can be converted to an NTP timestamp by adding decimal 2208988800.
2709    This time format will not wrap until the year 2106.  Until sometime
2710    in 2038, it is equal to the ANSI C time_t value (which is a signed 32
2711    bit value and will overflow into a negative number in 2038).
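
        The arithmetic in the preceding paragraph can be checked with a
        short Python sketch.  The function names are assumptions of the
        example.  The NTP value is returned as a plain integer and is
        not reduced to the 32 bit seconds field of an NTP timestamp.

```python
from datetime import datetime, timedelta, timezone

NTP_OFFSET = 2208988800  # seconds from 1 January 1900 to 1 January 1970
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def failover_to_ntp_seconds(t):
    """Convert an unsigned 32 bit failover time to NTP-era seconds."""
    if not 0 <= t <= 0xFFFFFFFF:
        raise ValueError("failover time is an unsigned 32 bit value")
    return t + NTP_OFFSET


def failover_to_datetime(t):
    """Convert an unsigned 32 bit failover time to an aware datetime."""
    if not 0 <= t <= 0xFFFFFFFF:
        raise ValueError("failover time is an unsigned 32 bit value")
    return EPOCH + timedelta(seconds=t)
```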
2712
2713    Each option should appear only once in each message (except in BNDUPD
2714    and BNDACK messages where batching is used; see section 6.3 for
2715    details).  An option that appears twice is not concatenated, but is
2716    treated as an error.
2717
2718    Specific option values are described in section 12.
2719
2720    See section 13 for how to define additional options.
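
        A parser for the common option format might look like the
        following Python sketch.  It assumes, as in DHCP, that the
        length field counts only the bytes of the option value.  The
        function name and the allow_repeats parameter (intended for
        batched BNDUPD and BNDACK messages) are assumptions of this
        example.

```python
import struct


def parse_options(payload, allow_repeats=False):
    """Return a list of (code, value) pairs in the order they appear."""
    options = []
    seen = set()
    pos = 0
    while pos < len(payload):
        if pos + 4 > len(payload):
            raise ValueError("truncated option header")
        # Two byte option code and two byte length, network byte order.
        code, length = struct.unpack_from("!HH", payload, pos)
        pos += 4
        if pos + length > len(payload):
            raise ValueError("truncated option value")
        if code in seen and not allow_repeats:
            raise ValueError("option %d appears more than once" % code)
        seen.add(code)
        options.append((code, payload[pos:pos + length]))
        pos += length
    return options
```

        The list preserves the order of the options, which matters for
        those messages whose option ordering is significant.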
2721
2722 6.3.  Batching multiple binding update transactions in one BNDUPD
2723 message
2724
2725    Implementations of this protocol MAY send multiple binding update
2726    transactions in one BNDUPD message, where a binding update transac-
2727    tion is defined as the set of options which are associated with the
2728    update of a single IP address.  All implementations of this protocol
2729    MUST be prepared to receive BNDUPD messages which contain multiple
2730    binding update transactions and respond correctly to them, including
2731    replying with a BNDACK message which contains status for the multiple
2732    binding update transactions contained in the BNDUPD message.
2733
2734    In the discussion of sending and receiving BNDUPD messages in section
2735    7.1 and BNDACK messages in section 7.2, each BNDUPD message and
2736    BNDACK message is assumed to contain a single binding update transac-
2737    tion in order to reduce the complexity of the discussions in section
2738    7.
2739
2740    Multiple binding update transactions MAY be batched together in one
2741    BNDUPD protocol message with the data sets for the individual tran-
2742    sactions delimited by the assigned-IP-address option, which MUST
2751    appear first in the option set for each transaction.  Ordering of
2752    options between the assigned-IP-address options is not significant.
2753    This is illustrated in the following schematic representation:
2754
2755
2756        Non-IP Address/Non-client specific options first
2757        assigned-IP-address option for the first IP address
2758            Options pertaining to first address, including at least the
2759            binding-status option and others as required.
2760        assigned-IP-address option for the second IP address
2761            Options pertaining to second address, including at least the
2762            binding-status option and others as required.
2763        ...
2764        Trailing options (message digest).
2765
2766
2767    There MUST be a one-to-one correspondence between BNDUPD and BNDACK
2768    messages, and every BNDACK message MUST contain status for all of the
2769    binding update transactions in the corresponding BNDUPD message.
2770
2771    The BNDACK message corresponding to a BNDUPD message MUST contain
2772    assigned-IP-address options for all of the binding update transac-
2773    tions in the BNDUPD message.  Thus, every BNDACK message contains
2774    exactly the same assigned-IP-address options as does its correspond-
2775    ing BNDUPD message.  The order of the assigned-IP-address options
2776    MAY, however, be different.  Here is a schematic representation of a
2777    BNDACK:
2778
2779
2780        Non-IP Address/Non-client specific options first
2781        assigned-IP-address option for the first IP address
2782            If rejected, reject-reason option and message option.
2783        assigned-IP-address option for the second IP address
2784            If rejected, reject-reason option and message option.
2785        ...
2786        Trailing options (message digest).
2787
2788
2789    If the server chooses to reject some or all of the binding update
2790    transactions in a BNDUPD message, the corresponding BNDACK
2791    message MUST contain a reject-reason option following every failed
2792    assigned-IP-address option in order to indicate that the binding
2793    update transaction for that IP address was not accepted and why.  As
2794    with a BNDACK message containing a single binding update transaction,
2795    an assigned-IP-address option without any associated reject-reason
2796    option indicates a successful binding update transaction.
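
        The delimiting rule for batched transactions can be sketched in
        Python as follows.  Options are represented here as (name,
        value) pairs already decoded from the payload; the function
        names, the dictionary layout, and the use of "message-digest" as
        the name of the trailing option are assumptions of this example.

```python
TRAILING = {"message-digest"}  # assumed name of the trailing digest option


def split_transactions(options):
    """Split decoded BNDUPD options into per-address transactions."""
    leading, transactions, trailing = [], [], []
    for name, value in options:
        if name in TRAILING:
            trailing.append((name, value))
        elif name == "assigned-IP-address":
            # Each assigned-IP-address option starts a new transaction.
            transactions.append({"ip": value, "options": []})
        elif transactions:
            transactions[-1]["options"].append((name, value))
        else:
            leading.append((name, value))
    return leading, transactions, trailing


def build_bndack_options(transactions, rejects):
    """Build the per-address part of a BNDACK.

    rejects maps an IP address to a (reject-reason, message) pair; an
    address that is absent from rejects is acknowledged as successful.
    """
    ack = []
    for tx in transactions:
        ack.append(("assigned-IP-address", tx["ip"]))
        if tx["ip"] in rejects:
            reason, text = rejects[tx["ip"]]
            ack.append(("reject-reason", reason))
            ack.append(("message", text))
    return ack
```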
2797
2807 7.  Protocol Messages
2808
2809    This section contains the detailed definition of the protocol mes-
2810    sages, including the information to include when sending the message,
2811    as well as the actions to take upon receiving the message.  The mes-
2812    sage type for each message appears as [n] in the heading for the mes-
2813    sage (see section 6.1).
2814
2815 7.1.  BNDUPD message [3]
2816
2817    The binding update (BNDUPD) message is used to send the binding data-
2818    base changes (known as binding update transactions) to the partner
2819    server, and the partner server responds with a binding acknowledge-
2820    ment (BNDACK) message when it has successfully committed those
2821    changes to its own stable storage.
2822
2823    The rest of the failover protocol exists to determine whether the
2824    partner server is able to communicate or not, and to enable the
2825    partners to exchange BNDUPD/BNDACK messages in order to keep their
2826    binding databases in stable storage synchronized.
2827
2828    The rest of this section is written as though every BNDUPD message
2829    contains only a single binding update transaction in order to reduce
2830    the complexity of the discussion.  See section 6.3 for information on
2831    how to create and process BNDUPD and BNDACK messages which contain
2832    multiple binding update transactions.  Note that while a server MAY
2833    generate BNDUPD messages with multiple binding update transactions,
2834    every server MUST be able to process a BNDUPD message which contains
2835    multiple binding update transactions and generate the corresponding
2836    BNDACK messages with status for multiple binding update transactions.
2837
2838    The following table summarizes the various options for the BNDUPD
2839    message.
2840
2865                                         binding-status            BACKUP
2866                                                                   RESET
2867                                                                   ABANDONED
2868    Option                        ACTIVE     EXPIRED    RELEASED   FREE
2869    ------                        ------     -------    --------   ----
2870    assigned-IP-address (3)       MUST       MUST       MUST       MUST
2871    IP-flags                      MUST(4)    MUST(4)    MUST(4)    MUST(4)
2872    binding-status                MUST       MUST       MUST       MUST
2873    client-identifier             MAY        MAY        MAY        MAY(2)
2874    client-hardware-address       MUST       MUST       MUST       MAY(2)
2875    lease-expiration-time         MUST       MUST NOT   MUST NOT   MUST NOT
2876    potential-expiration-time     MUST       MUST NOT   MUST NOT   MUST NOT
2877    start-time-of-state           SHOULD     SHOULD     SHOULD     SHOULD
2878    client-last-trans.-time       MUST       SHOULD     MUST       MAY
2879    DDNS(1)                       SHOULD     SHOULD     SHOULD     SHOULD
2880    client-request-options        SHOULD     SHOULD NOT SHOULD     SHOULD NOT
2881    client-reply-options          SHOULD     SHOULD NOT SHOULD NOT SHOULD NOT
2882
2883    (1) MUST if server is performing dynamic DNS for this IP address, else
2884        MUST NOT.
2885    (2) MUST NOT if binding-status is ABANDONED.
2886    (3) assigned-IP-address MUST be the first option for an IP address
2887    (4) IP-flags option MUST appear if any flags are non-zero, else it
2888        MAY appear.
2889
2890              Table 7.1-1: Options used in a BNDUPD message
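
        Part of Table 7.1-1 can be expressed as a check on a single
        binding update transaction.  The Python sketch below covers only
        the MUST and MUST NOT entries for a handful of options and
        follows the table rather than any other part of the text; the
        function name, the use of a dictionary keyed by option name, and
        the wording of the returned problems are assumptions of this
        example.

```python
ALWAYS_REQUIRED = ("assigned-IP-address", "binding-status")
ACTIVE_ONLY = ("lease-expiration-time", "potential-expiration-time")
CLIENT_OPTIONS = ("client-identifier", "client-hardware-address")


def check_bndupd_transaction(options):
    """Return a list of problems found in one binding update transaction.

    options maps option names to already decoded values.
    """
    problems = ["missing " + n for n in ALWAYS_REQUIRED if n not in options]
    status = options.get("binding-status")
    if status is None:
        return problems
    for name in ACTIVE_ONLY:
        # MUST for ACTIVE, MUST NOT for every other binding-status.
        if status == "ACTIVE" and name not in options:
            problems.append("missing " + name)
        elif status != "ACTIVE" and name in options:
            problems.append("unexpected " + name)
    if status in ("ACTIVE", "EXPIRED", "RELEASED"):
        if "client-hardware-address" not in options:
            problems.append("missing client-hardware-address")
    if status == "ABANDONED":
        # Note (2) of the table: no client information when ABANDONED.
        for name in CLIENT_OPTIONS:
            if name in options:
                problems.append("unexpected " + name)
    return problems
```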
2891
2892
2893 7.1.1.  Sending the BNDUPD message
2894
2895    A BNDUPD message SHOULD be generated whenever any binding changes.  A
2896    change might be in the binding-status, the lease-expiration-time, or
2897    even just the last-transaction-time.  In general, any time a DHCP
2898    server writes to its stable storage, a BNDUPD message SHOULD be
2899    generated.  This will often be the result of the processing of a DHCP
2900    client request, but it might also be the result of a successful
2901    dynamic DNS update operation.  Stable storage updates due to BNDUPD
2902    or BNDACK messages SHOULD NOT result in additional BNDUPD messages.
2903
2904    BNDUPD (and BNDACK) messages refer to the binding-status of the IP
2905    address, and this protocol defines a series of binding-statuses, dis-
2906    cussed in more detail below.  Some servers may not support all of
2907    these binding-statuses, and so in those cases they will not be sent.
2908    Upon receipt of a BNDUPD message which contains an unsupported
2909    binding-status, a reasonable interpretation should be made (see sec-
2910    tion 5.10).
2911
2912
2919    All BNDUPD messages MUST contain the IP address of the binding update
2920    transaction in the assigned-IP-address option.
2921
2922    All binding update transactions MUST contain an IP-flags option if
2923    the value of any of the flags would be non-zero.  The IP-flags option
2924    MAY be omitted if all of the flags that it contains are zero.  The
2925    IP-flags option contains a flag which indicates if the IP address is
2926    currently reserved on the server sending the BNDUPD message.  It also
2927    contains a flag which indicates that the lease is associated with a
2928    client that used the BOOTP protocol (as opposed to the DHCP protocol)
2929    to interact with the DHCP server.
2930
2931    All binding update transactions contain a binding-status option, and
2932    it will have one of the values found in section 5.11.  Client infor-
2933    mation consists of client-hardware-address and possibly a client-
2934    identifier, and is explained in more detail later in this section.
2935    The following table indicates whether client information should or
2936    should not appear with each binding-status in a binding update tran-
2937    saction:
2938
2939
2940        binding-status       includes client information
2941        ------------------------------------------------
2942        ACTIVE                      MUST
2943        EXPIRED                     SHOULD
2944        RELEASED                    SHOULD
2945        FREE                        MAY
2946        ABANDONED                   MUST NOT
2947        RESET                       MAY
2948        BACKUP                      MAY
2949
2950          Table 7.1.1-1: Client information required by various
2951          binding-status values.
2952
2953
2954    The ACTIVE binding-status requires some options to indicate the
2955    length of the binding:
2956
2957
2958       o lease-expiration-time
2959
2960         The lease-expiration-time option MUST appear, and be set to the
2961         expiration time most recently ACKed to the DHCP client.  Note
2962         that the time ACKed to a DHCP client is a lease duration in
2963         seconds, while the lease-expiration-time option in a BNDUPD mes-
2964         sage is an absolute time value.
2965
2966       o potential-expiration-time
2967
2975         The potential-expiration-time option MUST appear, and be set to
2976         a value beyond that of the lease-expiration time.  This is the
2977         value that is ACKed by the BNDACK message.  A server sending a
2978         BNDUPD message MUST be able to recover the potential-
2979         expiration-time sent in every BNDUPD, not just those that
2980         receive a corresponding BNDACK, in order to be able to protect
2981         against possible duplicate allocation of IP addresses after
2982         transitioning to PARTNER-DOWN state. See section 5.2.1 for
2983         details as to why the potential-expiration-time exists and
2984         guidelines for how to decide on the value.
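
        The relationship between the two time options for an ACTIVE
        binding can be illustrated with a small Python sketch.  The
        function name and the extra_seconds parameter are assumptions;
        in particular, extra_seconds is only a placeholder for whatever
        value the guidelines in section 5.2.1 produce.

```python
def expiration_options(ack_time, lease_seconds, extra_seconds):
    """Compute the two absolute times sent for an ACTIVE binding.

    ack_time is the absolute time (seconds since 1970) at which the
    lease was ACKed to the DHCP client; lease_seconds is the lease
    duration that was ACKed.
    """
    if extra_seconds <= 0:
        raise ValueError("potential-expiration-time must be beyond "
                         "lease-expiration-time")
    # The client was told a duration; the BNDUPD carries an absolute time.
    lease_expiration = ack_time + lease_seconds
    return {
        "lease-expiration-time": lease_expiration,
        "potential-expiration-time": lease_expiration + extra_seconds,
    }
```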
2985
2986    The following option information applies to all BNDUPD messages,
2987    regardless of the value of the binding-status, unless otherwise
2988    noted.
2989
2990    o Identifying the client
2991
2992      For many of the binding-status values a client MUST appear while
2993      for others a client MAY appear, and for some a client MUST NOT
2994      appear.
2995
2996      A client is identified in a BNDUPD message by at least one and pos-
2997      sibly two options.   The client-hardware-address option MUST appear
2998      any time that a client appears in a BNDUPD message, and contains
2999      the hardware type and chaddr information from the DHCP request
3000      packet.  A failover client-identifier option MUST appear any time
3001      that a client appears in a BNDUPD message if and only if that
3002      client used a DHCP client-identifier option when communicating with
3003      the DHCP server.  See sections 12.5 and 12.4 for details of how to
3004      construct these two options from a DHCP request packet.
3005
3006    o start-time-of-state
3007
3008      The start-time-of-state SHOULD appear.  It is set to the time at
3009      which this IP address first took on the state that corresponds to
3010      the current value of binding-status.
3011
3012    o last-transaction-time
3013
3014      The last-transaction-time value SHOULD appear.  This is the time at
3015      which this DHCP server last received a packet from the DHCP client
3016      referenced by the client-identifier or client-hardware-address that
3017      was associated with the IP address referenced by the assigned-IP-
3018      address.
3019
3020    o DDNS
3021
3022      If the DHCP server is performing dynamic DNS operations on behalf
3031      of the DHCP client represented by the client-identifier or client-
3032      hardware-address, then it should include a DDNS option containing
3033      the domain name and status of any dynamic DNS operations enabled.
3034
3035    o client-request-options
3036
3037      If the BNDUPD was triggered by a request from a DHCP client (typi-
3038      cally those with binding-status of ACTIVE and RELEASED), then the
3039      server SHOULD include options of interest to a failover partner
3040      from the client's request packet in the client-request-options for
3041      transmission to its partner (see section 12.8).
3042
3043      A server sending a BNDUPD SHOULD remember the "interesting" options
3044      or the information that would appear in an "interesting" option for
3045      transmission at a time when the BNDUPD is not closely associated
3046      with a DHCP client request.
3047
3048      A server SHOULD send the following "interesting" options.  It MAY
3049      send any DHCP client options.  As new options are defined, the RFC
3050      defining these options SHOULD include information that they are
3051      "interesting to failover servers" if they should be sent as part of
3052      a BNDUPD.
3053
3054
3055          option          option
3056          number          name
3057          -----------------------------------------
3058
3059          12              host-name
3060          81              client-FQDN [FQDN]
3061          82              relay-agent-information [RFC 3046]
3062          77              user-class [RFC 3004]
3063          60              vendor-class-identifier
3064          118             subnet-selection [RFC 3011]
3065
3066            Table 7.1.1-2: Options which SHOULD be sent in
3067            the client-request-options option in a BNDUPD message.
3068
3069
3070    o client-reply-options
3071
3072      If the BNDUPD was triggered by a request from a DHCP client (typi-
3073      cally those with binding-status of ACTIVE and RELEASED), then the
3074      server SHOULD include options of interest to a failover partner
3075      from the server's DHCP reply packet in the client-reply-options for
3076      transmission to its partner (see section 12.7).
3077
3078      A server sending a BNDUPD SHOULD remember the "interesting" options
3087      or the information that would appear in an "interesting" option for
3088      transmission at a time when the BNDUPD is not closely associated
3089      with a DHCP client request.
3090
3091      A server SHOULD send the following "interesting" options.  It MAY
3092      send any DHCP client options.  As new options are defined, the RFC
3093      defining these options SHOULD include information that they are
3094      "interesting to failover servers" if they should be sent as part of
3095      a BNDUPD.
3096
3097
3098          option          option
3099          number          name
3100          -----------------------------------------
3101
3102          58              renewal-time
3103          59              rebinding-time
3104
3105            Table 7.1.1-3: Options which SHOULD be sent in
3106            the client-reply-options option in a BNDUPD message.
3107
3108
3109    The BNDUPD message SHOULD be sent as soon as possible after the DHCP
3110    client has received a response and the lease bindings database has
3111    been written to stable storage.
3112
3113 7.1.2.  Receiving the BNDUPD message
3114
3115    When a server receives a BNDUPD message, it needs to decide how to
3116    process the binding update transaction it contains and whether that
3117    transaction represents a conflict of any sort. The conflict resolu-
3118    tion process MUST be used on the receipt of every BNDUPD message, not
3119    just those that are received while in POTENTIAL-CONFLICT state, in
3120    order to increase the robustness of the protocol.
3121
3122    There are three sorts of conflicts:
3123
3124       o Two clients, one IP address conflict
3125
3126         This is the duplicate IP address allocation conflict. There are
3127         two different clients each allocated the same address.  See sec-
3128         tion 7.1.3 for how to resolve this conflict.
3129
3130       o Two IP addresses, one client conflict
3131
3132         This conflict exists when a client on one server is associated
3133         with one IP address, and on the other server with a different
3134         IP address in the same or a related subnet. This does not refer
3143         to the case where a single client has addresses in multiple
3144         different subnets or administrative domains, but rather to the
3145         case where, on the same subnet, the client has a lease on one IP
3146         address on one server and on a different IP address on the other
3147         server.
3148
3149         This conflict may or may not be a problem for a given DHCP
3150         server implementation.  In the event that a DHCP server requires
3151         that a DHCP client have only one outstanding lease for an IP
3152         address on one subnet, this conflict should be resolved by
3153         accepting the lease information which has the latest client-
3154         last-transaction-time.
3155
3156       o binding-status conflict
3157
3158         This is the normal conflict, where one server is updating the other
3159         with newer information.  See section 7.1.3 for details of how to
3160         resolve these conflicts.
3161
3162 7.1.3.  Deciding whether to accept the binding update transaction in a
3163 BNDUPD message
3164
3165    When analyzing a BNDUPD message from a partner server, if there is
3166    insufficient information in the BNDUPD to process it, then reject the
3167    BNDUPD with reject-reason 3: "Missing binding information".
3168
3169    If the IP address in the BNDUPD is not an IP address associated with
3170    the failover endpoint which received the BNDUPD message, then reject
3171    it with reject-reason 1: "Illegal IP address (not part of any address
3172    pool)".
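
        As an illustration only, the two preliminary checks above might be
        coded as shown below.  The function name precheck_bndupd, the
        dictionary layout of a binding update transaction, and the choice
        of fields treated as the minimum "sufficient information" are
        assumptions of this sketch and are not defined by the failover
        protocol.

```python
import ipaddress

# Reject-reason codes quoted from the text above.
REJECT_ILLEGAL_IP = 1        # "Illegal IP address (not part of any address pool)"
REJECT_MISSING_BINDING = 3   # "Missing binding information"

# Assumption: the fields this example treats as the minimum needed to
# process a transaction.  The option tables of the draft govern the real set.
REQUIRED_FIELDS = ("assigned-IP-address", "binding-status")


def precheck_bndupd(update, endpoint_pools):
    """Return a reject-reason code, or None if processing may continue.

    `update` is a dict of option name -> value (hypothetical layout);
    `endpoint_pools` is an iterable of ip_network objects associated with
    the failover endpoint that received the message.
    """
    if any(update.get(field) is None for field in REQUIRED_FIELDS):
        return REJECT_MISSING_BINDING
    address = ipaddress.ip_address(update["assigned-IP-address"])
    if not any(address in pool for pool in endpoint_pools):
        return REJECT_ILLEGAL_IP
    return None
```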
3173
3174    IP addresses undergo binding status changes for several reasons,
3175    including receipt and processing of DHCP client requests, administra-
3176    tive inputs and receipt of BNDUPD messages.  Every DHCP server needs
3177    to respond to DHCP client requests and administrative inputs with
3178    changes to its internal record of the binding-status of an IP
3179    address, and this response is not in the scope of the failover proto-
3180    col.  However, the receipt of BNDUPD messages implies at least a pos-
3181    sible change of the binding-status for an IP address, and must be
3182    discussed here.  See section 7.1.2 for general actions to take upon
3183    receipt of a BNDUPD message.
3184
3185    Every BNDUPD message SHOULD contain a client-last-transaction-time
3186    option, which MUST, if it appears, be the time that the server last
3187    interacted with the DHCP client.  It MUST NOT be, for instance, the
3188    time that the lease on an IP address expired.  If there has been no
3189    interaction with the DHCP client in question (or there is no DHCP
3190    client presently associated with this IP address), then there will be
3199    no client-last-transaction-time option in the BNDUPD message.
3200
3201    The list in Figure 7.1.3-1 is indexed by the binding-status that a
3202    server receives in a BNDUPD message.  In many cases, the binding-
3203    status of an IP address within the receiving server's data storage
3204    will have an effect upon the checks performed prior to accepting the
3205    new binding-status in a BNDUPD message.
3206
3207    In Figure 7.1.3-1, to "accept" a BNDUPD means to update the server's
3208    bindings database with the information contained in the BNDUPD and
3209    once that update is complete, send a BNDACK message corresponding to
3210    the BNDUPD message.  To "reject" a BNDUPD means to respond to the
3211    BNDUPD with a BNDACK with a reject-reason option included.
3212
3213    The following applies to the rules marked "time" in the table below
3214    (Figure 7.1.3-1).  If a BNDUPD doesn't have a client-last-
3215    transaction-time value, then it MUST NOT be considered later than
3216    the client-last-transaction-time in the receiving server's binding.
3217    If the BNDUPD contains a client-last-transaction-time value and the
3218    receiving server's binding does not, then the client-last-
3219    transaction-time value in the BNDUPD MUST be considered later than
3220    the server's.
3221
3256                         binding-status in received BNDUPD
3257        binding-status
3258        in receiving                                  FREE/      RESET/
3259        server          ACTIVE   EXPIRED   RELEASED   BACKUP   ABANDONED
3260
3261        ACTIVE          accept(5) time(2)   time(1)    time(2)   accept
3262        EXPIRED         time(1)   accept    accept     accept    accept
3263        RELEASED        time(1)   time(1)   accept     accept    accept
3264        FREE/BACKUP     accept    accept    accept     accept    accept
3265        RESET           time(3)   accept    accept     accept    accept
3266        ABANDONED       reject(4) reject(4) reject(4)  reject(4) accept
3267
3268        time(1): If the client-last-transaction-time in the BNDUPD
3269        is later than the client-last-transaction-time in the
3270        receiving server's binding, accept it, else reject it.
3271
3272        time(2): If the current time is later than the receiving
3273        server's lease-expiration-time, accept it, else reject it.
3274
3275        time(3): If the client-last-transaction-time in the BNDUPD
3276        is later than the start-time-of-state in the receiving server's
3277        binding, accept it, else reject it.
3278
3279        (1,2,3): If rejecting, use reject reason 15: "Outdated binding
3280        information".
3281
3282        (4): Use reject reason 16: "Less critical binding information".
3283
3284        (5): If the clients in a BNDUPD message and in a receiving
3285        server's binding differ, then if the receiving server is a
3286        secondary accept it, else reject it with a reject reason of 2:
3287        "Fatal conflict exists: address in use by other client".
3288
3289                 Figure 7.1.3-1:  Accepting BNDUPD messages
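
        One possible way to encode Figure 7.1.3-1 is as a lookup table
        plus a small decision function, sketched below.  The sketch reads
        the figure as having five columns (ACTIVE, EXPIRED, RELEASED, FREE
        or BACKUP, and RESET or ABANDONED).  The dictionary keys ("status",
        "client", "clt", "lease_expiration", "start_time_of_state") and
        the function names are assumptions made for the example, as is the
        treatment of a missing start-time-of-state in time(3).

```python
REJECT_CLIENT_CONFLICT = 2   # "Fatal conflict exists: address in use by other client"
REJECT_OUTDATED = 15         # "Outdated binding information"
REJECT_LESS_CRITICAL = 16    # "Less critical binding information"

# Column index for the binding-status carried in the BNDUPD.
COLUMN = {"ACTIVE": 0, "EXPIRED": 1, "RELEASED": 2,
          "FREE": 3, "BACKUP": 3, "RESET": 4, "ABANDONED": 4}

# Rows keyed by the receiving server's current binding-status.
RULES = {
    "ACTIVE":    ("accept5", "time2", "time1", "time2", "accept"),
    "EXPIRED":   ("time1", "accept", "accept", "accept", "accept"),
    "RELEASED":  ("time1", "time1", "accept", "accept", "accept"),
    "FREE":      ("accept",) * 5,
    "BACKUP":    ("accept",) * 5,
    "RESET":     ("time3", "accept", "accept", "accept", "accept"),
    "ABANDONED": ("reject4",) * 4 + ("accept",),
}


def later(update_time, local_time):
    """True if the time from the BNDUPD counts as later than the local one.

    A missing time in the BNDUPD is never later; a present time is always
    later than a missing local time.
    """
    if update_time is None:
        return False
    if local_time is None:
        return True
    return update_time > local_time


def decide(local, update, now, is_secondary):
    """Return None to accept the BNDUPD, or a reject-reason code."""
    rule = RULES[local["status"]][COLUMN[update["status"]]]
    if rule == "accept":
        return None
    if rule == "reject4":
        return REJECT_LESS_CRITICAL
    if rule == "accept5":
        # Note (5): differing clients are accepted only by a secondary.
        if update.get("client") != local.get("client") and not is_secondary:
            return REJECT_CLIENT_CONFLICT
        return None
    if rule == "time1":
        ok = later(update.get("clt"), local.get("clt"))
    elif rule == "time2":
        ok = now > local["lease_expiration"]
    else:  # time3
        ok = later(update.get("clt"), local.get("start_time_of_state"))
    return None if ok else REJECT_OUTDATED
```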
3290
3291
3292
3293    If the IP address in the BNDUPD message has the R flag set in the
3294    IP-flags option, indicating it is a reserved IP address, and if the
3295    binding-status in the BNDUPD is BACKUP, then if the receiving server
3296    does not show the IP address as reserved, the receiving server SHOULD
3297    reject the BNDUPD using reject reason 19: "IP not reserved on this
3298    server".
3299
3300 7.1.4.  Accepting the BNDUPD message
3301
3302    When accepting a BNDUPD message, the information contained in the
3311    client-request-options and client-reply-options SHOULD be examined
3312    for any information of interest to this server.  For instance, a
3313    server which wished to detect changes in client specified host names
3314    might want to examine and save information from the host-name or
3315    client-FQDN options.  Servers which expect to utilize information
3316    from the relay-agent-information option SHOULD store this informa-
3317    tion.
3318
3319 7.1.5.  Time values related to the BNDUPD message
3320
3321    There are four time values that MAY be sent in a BNDUPD message.
3322
3323       o lease-expiration-time
3324
3325         The time that the server gave to the client, i.e., the time that
3326         the server believes that the client's lease will expire.
3327
3328       o potential-expiration-time
3329
3330         The time that the server wants to be sure its partner waits
3331         (added to the MCLT) before assuming that this lease has expired.
3332         Typically some time beyond the desired client lease time.
3333
3334       o client-last-transaction-time
3335
3336         The time that the client last interacted with this server.
3337
3338       o start-time-of-state
3339
3340         The time at which the binding first went into the current state.
3341
3342    As discussed in section 5.2, each server knows what its partner has
3343    ACKed with regard to potential-expiration-time.  In addition, each
3344    server needs to remember what it has told its partner as the
3345    potential-expiration-time.  Moreover, each server must remember what
3346    it has ACKed to the *other* server as the most recent potential-
3347    expiration-time from that server.
3348
3349    That is, each server both sends a potential-expiration-time and
3350    receives an ACK for it, and receives a potential-expiration-time
3351    from its partner and must remember what it has ACKed for that one.
3352
3353    While they don't have to be named in any particular way, the times
3354    that a server needs to remember for every IP address in order to
3355    implement the failover protocol are:
3356
3357       o lease-expiration-time
3358
3367         The time that a server gave to the DHCP client.  A DHCP server
3368         needs to remember this time already, just to be a DHCP server.
3369         A server SHOULD update this time with the lease-expiration-time
3370         received from a partner in a BNDUPD if the received lease-
3371         expiration-time is later than the lease-expiration-time recorded
3372         for this binding.
3373
3374       o sent-potential-expiration-time
3375
3376         The latest time sent to the partner for a potential-expiration-
3377         time.
3378
3379       o acked-potential-expiration-time
3380
3381         The latest time that the partner has acked for a potential
3382         expiration time.  Typically the same as sent-potential-
3383         expiration-time if there is not a BNDUPD outstanding.
3384
3385       o received-potential-expiration-time
3386
3387         The latest time that this server has ever received as a
3388         potential-expiration-time from its partner in a BNDUPD that this
3389         server ACKed.
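
        The four times listed above could be kept in a small per-address
        record such as the one sketched here.  The class and method names
        are hypothetical, zero is used to mean "never set", and "latest"
        is interpreted as the maximum value seen so far; none of this is
        mandated by the protocol.

```python
from dataclasses import dataclass


@dataclass
class BindingTimes:
    """Per-address times; 0 means "never set" (an assumption of this sketch)."""
    lease_expiration: int = 0
    sent_potential_expiration: int = 0
    acked_potential_expiration: int = 0
    received_potential_expiration: int = 0

    def bndupd_sent(self, potential_expiration):
        # Latest potential-expiration-time told to the partner.
        self.sent_potential_expiration = max(
            self.sent_potential_expiration, potential_expiration)

    def bndack_received(self, potential_expiration):
        # The partner accepted a BNDUPD that carried this time.
        self.acked_potential_expiration = max(
            self.acked_potential_expiration, potential_expiration)

    def bndupd_accepted(self, potential_expiration, lease_expiration):
        # Called once this server has ACKed the partner's BNDUPD.
        self.received_potential_expiration = max(
            self.received_potential_expiration, potential_expiration)
        # Adopt the partner's lease-expiration-time only if it is later.
        self.lease_expiration = max(self.lease_expiration, lease_expiration)
```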
3390
3391    So, a server has to remember two additional times concerning BNDUPD
3392    messages that it has initiated, and one additional time concerning
3393    BNDUPD messages that it has received.  How are these times used?
3394
3395    First, let's look at the time that a DHCP server can offer to a DHCP
3396    client.  A server can offer to a DHCP client a time that is no longer
3397    than the MCLT beyond the max( received-potential-expiration-time,
3398    acked-potential-expiration-time).  One might think that the server
3399    should be able to offer only the MCLT beyond the acked-potential-
3400    expiration-time, and while that is certainly simple and easy to
3401    understand, it has negative consequences in actual operation.
3402
3403    To illustrate this, in the simple case where the primary updates the
3404    secondary for a while and then fails, if the secondary can then renew
3405    the client for only the MCLT beyond the acked-potential-expiration-
3406    time, then the secondary will only be able to renew the client for
3407    the MCLT, because the secondary has never sent a BNDUPD packet to the
3408    primary concerning this IP address and client, and so its acked-
3409    potential-expiration-time is zero.
3410
3411    However, since the secondary is allowed to renew the client with the
3412    MCLT beyond the max( received-potential-expiration-time, acked-
3413    potential-expiration-time), then the secondary can usually renew the
3414    client for the full lease period, at least for the first renew it
3423    sees from the client, since the received-potential-expiration-time is
3424    generally longer than the client's desired lease interval.  The
3425    difference in renew times could make a big difference in server load
3426    on the secondary in this case.
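
        The limit described above can be written as a short calculation.
        The sketch below is illustrative only: the function and parameter
        names are assumptions, all times are absolute seconds, and the
        current time is included in the max() as an interpretation of the
        example above, in which a zero acked-potential-expiration-time
        still permits a renewal of the MCLT.

```python
def lease_expiration_to_offer(now, desired_lease, mclt, received, acked):
    """Return the absolute expiration time a server may give the client.

    `received` and `acked` are the received- and acked-potential-
    expiration-time for the address (0 if never set); `desired_lease`
    and `mclt` are durations in seconds.
    """
    # Assumption: a limit that lies in the past is measured from `now`,
    # matching the text's "only be able to renew the client for the MCLT".
    limit = mclt + max(now, received, acked)
    return min(now + desired_lease, limit)
```

        With now=10000, a desired lease of 86400 and an MCLT of 3600, a
        server with no recorded times can extend the lease only to 13600,
        while one holding a received-potential-expiration-time of 120000
        can grant the full lease, to 96400.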
3427
3428    What are the consequences of allowing a server to offer a DHCP client
3429    a lease term of the MCLT beyond the max( received-potential-
3430    expiration-time, acked-potential-expiration-time)?  The consequences
3431    appear whenever a server enters PARTNER-DOWN state, and affect how
3432    long that server has to wait before reallocating expired leases.
3433    With this approach, when a server goes into PARTNER-DOWN state, it
3434    must wait the MCLT beyond the max( lease-expiration-time, sent-
3435    potential-expiration-time, acked-potential-expiration-time,
3436    received-potential-expiration-time ) for each IP address before it
3437    can reallocate that IP address to another DHCP client.   One might
3438    normally think that it needed to wait only the MCLT beyond the max(
3439    lease-expiration-time, received-potential-expiration-time ), i.e.,
3440    beyond what it has told the client and what it has explicitly acked
3441    to the other server.  But with the optimization discussed above,
3442    where either server can offer the DHCP client a lease term of the
3443    MCLT beyond the max( received-potential-expiration-time, acked-
3444    potential-expiration-time), the additional times sent-
3445    potential-expiration-time and acked-potential-expiration-time must be
3446    added into the expression, since the partner could have used those
3447    times as part of its own lease time calculation.
3448
3449    Thus this optimization may require a longer waiting time when enter-
3450    ing PARTNER-DOWN state, but will generally allow servers to operate
3451    considerably more effectively when running in COMMUNICATIONS-
3452    INTERRUPTED state.
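
        The PARTNER-DOWN waiting rule above reduces to a single
        expression, sketched here with hypothetical function and parameter
        names.  All four times are absolute seconds, with zero for a time
        that was never set.

```python
def earliest_reallocation_time(mclt, lease_expiration, sent_potential,
                               acked_potential, received_potential):
    """Earliest absolute time an address may be given to another client
    by a server that is in PARTNER-DOWN state."""
    return mclt + max(lease_expiration, sent_potential,
                      acked_potential, received_potential)
```

        For example, with an MCLT of 3600 and times of 1000, 5000, 4000
        and 2000 respectively, the address becomes reusable at 8600,
        rather than at the 5600 that max( lease-expiration-time,
        received-potential-expiration-time ) alone would give.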
3453
3454 7.2.  BNDACK message [4]
3455
3456    A server sends a binding acknowledgement (BNDACK) message when it has
3457    processed a BNDUPD message and after it has successfully committed to
3458    stable storage any binding database changes made as a result of
3459    processing the BNDUPD message.  A BNDACK message is used either to
3460    accept or to reject a BNDUPD message.  A BNDACK message which
3461    contains a reject-reason option is a rejection of the corresponding
3462    BNDUPD message.
3463
3464    In order to reduce the complexity of the discussion, the rest of this
3465    section is written as though every BNDUPD message contains only a
3466    single binding update transaction and thus every corresponding BNDACK
3467    message would also contain reply information about only a single
3468    binding update transaction.  See section 6.3 for information on how
3469    to create and process BNDUPD and BNDACK messages which contain multi-
3470    ple binding update transactions.
3471
3479    Note that while a server MAY generate BNDUPD messages with multiple
3480    binding update transactions, every server MUST be able to process a
3481    BNDUPD message which contains multiple binding update transactions
3482    and generate the corresponding BNDACK messages with status for multi-
3483    ple binding update transactions.  If a server does not ever create
3484    BNDUPD messages which contain multiple binding update transactions,
3485    then it does not need to be able to process a received BNDACK message
3486    with multiple binding update transactions.  However, all servers MUST
3487    be able to create BNDACK messages which deal with multiple binding
3488    update transactions received in a BNDUPD message.
3489
3490    Every BNDUPD message that is received by a server MUST be responded
3491    to with a corresponding BNDACK message.  The receiving server SHOULD
3492    respond quickly to every BNDUPD message but it MAY choose to respond
3493    preferentially to DHCP client requests instead of BNDUPD messages,
3494    since there is no absolute time period within which a BNDACK must be
3495    sent in response to a BNDUPD message, while DHCP clients frequently
3496    have strict time constraints.
3497
3498    A BNDACK message can only be sent in response to a BNDUPD message
3499    using the same TCP connection from which the BNDUPD message was
3500    received, since the XIDs in BNDUPD messages are guaranteed unique
3501    only during the life of a single TCP connection.  When a connection
3502    to a partner server goes down, a server with unprocessed BNDUPD mes-
3503    sages MAY simply drop all of those messages, since it can be sure
3504    that the partner will resend them when they are next in communica-
3505    tions (albeit with a different XID), or it MAY instead choose to pro-
3506    cess those BNDUPD messages, but it MUST NOT send any BNDACK messages
3507    in response.
3508
3509    The following table summarizes the options for the BNDACK message.
3510
3537    Option                        accept       reject
3538    ------                        ------       ------
3539    assigned-IP-address  (1)      MUST         MUST
3540    IP-flags                      SHOULD NOT   SHOULD NOT
3541    binding-status                SHOULD NOT   SHOULD NOT
3542    client-identifier             SHOULD NOT   SHOULD NOT
3543    client-hardware-address       SHOULD NOT   SHOULD NOT
3544    reject-reason                 SHOULD NOT   MUST
3545    message                       SHOULD NOT   SHOULD
3546    lease-expiration-time         SHOULD NOT   SHOULD NOT
3547    potential-expiration-time     SHOULD NOT   SHOULD NOT
3548    start-time-of-state           SHOULD NOT   SHOULD NOT
3549    client-last-trans.-time       SHOULD NOT   SHOULD NOT
3550    DDNS(1)                       SHOULD NOT   SHOULD NOT
3551
3552    (1) assigned-IP-address MUST be the first option for an IP address
3553
3554               Table 7.2-1: Options used in a BNDACK message
3555
3556
3557 7.2.1.  Sending the BNDACK message
3558
3559    The BNDACK message MUST contain the same xid as the corresponding
3560    BNDUPD message.
3561
3562    The assigned-IP-address option from the BNDUPD message MUST be
3563    included in the BNDACK message.  Any additional options from the
3564    BNDUPD message SHOULD NOT appear in the BNDACK message.  Note that
3565    any information sent in options (e.g., a later lease-expiration-time)
3566    in the BNDACK message MUST NOT be assumed to be recorded in the
3567    stable storage of the server that receives the BNDACK message,
3568    because there is no corresponding ACK of the BNDACK message.  Any
3569    information that needs to be recorded in the partner server's stable
3570    storage MUST be transmitted in a subsequent BNDUPD.
3571
3572    If the server is accepting the BNDUPD, the BNDACK message includes
3573    only the assigned-IP-address option.  If the server is rejecting the
3574    BNDUPD, the additional option reject-reason MUST appear in the BNDACK
3575    message, and the message option SHOULD appear in this case containing
3576    a human-readable error message describing in some detail the reason
3577    for the rejection of the BNDUPD message.
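
        The option rules above for a single binding update transaction
        can be sketched as follows.  The function name build_bndack and
        the representation of a message as an xid plus an ordered list of
        (name, value) pairs are assumptions of this example, not a wire
        format.

```python
def build_bndack(xid, assigned_ip, reject_reason=None, message=None):
    """Return (xid, options) for a BNDACK answering one transaction.

    Options are an ordered list so that assigned-IP-address stays first.
    A reject-reason of None means the BNDUPD is being accepted.
    """
    options = [("assigned-IP-address", assigned_ip)]
    if reject_reason is not None:
        options.append(("reject-reason", reject_reason))
        if message:
            # Human-readable explanation; SHOULD accompany a rejection.
            options.append(("message", message))
    return xid, options
```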
3578
3579    If the server rejects the BNDUPD message with a BNDACK and a reject-
3580    reason option, it may be because the server believes that it has
3581    binding information that the other server should know.  A server
3582    which is rejecting a BNDUPD may initiate a BNDUPD of its own in order
3591    to update its partner with what it believes is better binding infor-
3592    mation, but it MUST ensure through some means that it will not end up
3593    in a situation where each server is sending BNDUPD messages as fast
3594    as possible because they can't agree on which server has better bind-
3595    ing data.  Placing a considerable delay on the initiation of a BNDUPD
3596    message after sending a BNDACK with a reject-reason would be one way
3597    to ensure this situation doesn't occur.
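
        One simple way to impose such a delay is a per-address hold-down
        timer, sketched below.  The class name, its methods, and the delay
        value are all hypothetical; the protocol does not specify how long
        the delay should be.

```python
class CounterUpdateLimiter:
    """Delays a BNDUPD for an address after this server rejected an
    update for that address from its partner."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self._not_before = {}  # IP address -> earliest permitted send time

    def rejected(self, address, now):
        # Called when a BNDACK with a reject-reason has been sent.
        self._not_before[address] = now + self.delay_seconds

    def may_send(self, address, now):
        # Addresses with no recorded rejection may be sent immediately.
        return now >= self._not_before.get(address, now)
```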
3598
3599 7.2.2.  Receiving the BNDACK message
3600
3601    When a server receives a BNDACK message, if it doesn't contain a
3602    reject-reason option that means that the BNDUPD message was accepted,
3603    and the server which sent the BNDUPD SHOULD update its stable storage
3604    with the potential-expiration-time value sent in the BNDUPD message.
3605
3606    If the BNDACK message contains a reject-reason option, that means
3607    that the BNDUPD was rejected.  There SHOULD be a message option in
3608    the BNDACK giving a text reason for the rejection, and the server
3609    SHOULD log the message in some way.  The server MUST NOT immediately
3610    try to resend the BNDUPD message as there is no reason to believe the
3611    partner won't reject it a second time.  However, a server MAY choose
3612    to send another BNDUPD at some future time, for instance when the
3613    server next processes an update request from its partner.
3614
3615 7.3.  UPDREQ message [9]
3616
3617    The update request (UPDREQ) message is used by one server to request
3618    that its partner send it all of the binding database information that
3619    it has not already seen.   Since each server is required to keep
3620    track at all times of the binding information the other server has
3621    ACKed, one server can request transmission of all un-ACKed binding
3622    database information held by the other server by using the UPDREQ
3623    message.
3624
3625    The UPDREQ message is used whenever the sending server cannot proceed
3626    before it has processed all previously un-ACKed binding update infor-
3627    mation, since the UPDREQ message should yield a corresponding UPDDONE
3628    message.  The UPDDONE message is not sent until the server that sent
3629    the UPDREQ message has responded to all of the BNDUPD messages gen-
3630    erated by the UPDREQ message with BNDACK messages (they may either be
3631    accepted or rejected by the BNDACK messages, but they MUST have been
3632    responded to). Thus, the sender of the UPDREQ message can be sure
3633    upon receipt of an UPDDONE message that it has received and committed
3634    to stable storage all outstanding binding database updates.
3635
3636    See section 9, Failover Endpoint States, for the details of when the
3637    UPDREQ message is sent.
3638
3647 7.3.1.  Sending the UPDREQ message
3648
3649    The UPDREQ message has no message specific options.
3650
3651 7.3.2.  Receiving the UPDREQ message
3652
3653    A server receiving an UPDREQ message MUST send all binding database
3654    changes that have not yet been ACKed by the sending server.   These
3655    changes are sent as undistinguished BNDUPD messages.
3656
3657    However, the server which received and is processing the UPDREQ mes-
3658    sage MUST track the BNDACK messages that correspond to the BNDUPD
3659    messages triggered by the UPDREQ message and, when they are all
3660    received, the server MUST send an UPDDONE message.
3661
3662    The server processing the UPDREQ message and sending BNDUPD messages
3663    to its partner SHOULD only track the BNDUPD and BNDACK message pairs
3664    for un-ACKed binding database changes that were present upon the
3665    receipt of the UPDREQ message.  A server which has received an UPDREQ
3666    message SHOULD send BNDUPD messages for binding database changes that
3667    occur after receipt of the UPDREQ message, but it SHOULD NOT include
3668    those additional BNDUPD messages and their corresponding BNDACK mes-
3669    sages in the accounting necessary to consider the UPDREQ complete and
3670    subsequently send the UPDDONE message.  If some additional binding
3671    database changes end up becoming part of the set of BNDUPD messages
3672    considered as part of the UPDREQ (due to whatever algorithm the
3673    server uses to scan its bindings database for un-ACKed changes), it
3674    will probably not cause any difficulty, but a server MUST NOT attempt
3675    to include all such later BNDUPD messages in the accounting for the
3676    UPDREQ in order to be able to transmit an UPDDONE message.
3677
3678    When queuing up the BNDUPD messages for transmission to the sender of
3679    the UPDREQ message, the server processing the UPDREQ message MUST
3680    honor the value returned in the max-unacked-bndupd option in the CON-
3681    NECT or CONNECTACK message that set up the connection with the send-
3682    ing server.  It MUST NOT send more BNDUPD messages without receiving
3683    corresponding BNDACKs than the value returned in max-unacked-bndupd.
3684    (See section 8 for more details.)
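
        The accounting described in this section might be organized as
        in the following sketch.  The class and method names are
        hypothetical.  As a simplification, only the BNDUPD messages
        triggered by the UPDREQ are counted against max-unacked-bndupd; a
        real server would also have to count any other BNDUPD messages
        outstanding on the connection.

```python
class UpdateRequest:
    """Tracks one UPDREQ on the server that received it."""

    def __init__(self, updreq_xid, unacked_addresses, max_unacked_bndupd):
        self.updreq_xid = updreq_xid
        # Snapshot of un-ACKed changes present when the UPDREQ arrived;
        # later changes are sent too, but are not tracked here.
        self.queue = list(unacked_addresses)
        self.in_flight = set()            # BNDUPD xids awaiting a BNDACK
        self.window = max_unacked_bndupd

    def next_to_send(self, new_xid):
        """Return the (xid, address) BNDUPDs that fit in the window."""
        out = []
        while self.queue and len(self.in_flight) < self.window:
            xid = new_xid()
            self.in_flight.add(xid)
            out.append((xid, self.queue.pop(0)))
        return out

    def bndack(self, xid):
        """Record a BNDACK, whether it accepts or rejects the BNDUPD."""
        self.in_flight.discard(xid)
        return self.upddone_xid()

    def upddone_xid(self):
        """XID to use in the UPDDONE once everything is answered, else None."""
        if self.queue or self.in_flight:
            return None
        return self.updreq_xid
```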
3685
3686 7.4.  UPDREQALL message [7]
3687
3688    The update request all (UPDREQALL) message is used by one server to
3689    request that its partner send it all of the binding database informa-
3690    tion.  This message is used to allow one server to recover from a
3691    failure of stable storage and to restore its binding database in its
3692    entirety from the other server.
3693
3694    A server which sends an UPDREQALL message cannot proceed until all of
3703    its binding update information is restored, and it knows that all of
3704    that information is restored when an UPDDONE message is received.
3705
3706    See section 9, Failover Endpoint States, for the details of when the
3707    UPDREQALL message is sent.
3708
3709    The UPDREQALL message has no message specific options.
3710
3711 7.4.1.  Sending the UPDREQALL message
3712
3713    The UPDREQALL is sent.
3714
3715 7.4.2.  Receiving the UPDREQALL message
3716
3717    A server receiving an UPDREQALL message MUST send all binding data-
3718    base information to the sending server.  See section 5.16 for details
3719    of what might actually comprise "all binding database information".
3720
3721    A server receiving an UPDREQALL message MUST remember that such a
3722    message has been received, and ensure that all binding information
3723    extant at that point is sent to the partner prior to any UPDDONE
3724    message being sent to that partner.  One way to do this is to
3725    remember the receipt of an UPDREQALL message and to treat every
3726    subsequent UPDREQ message as an UPDREQALL message until it sends the
3727    first UPDDONE message after receipt of the UPDREQALL message.  This
3728    requirement exists because communications may fail and become
3729    re-established between the two servers, and the specific conditions
3730    which provoked the UPDREQALL message may no longer exist even though
3731    the UPDREQALL message may not yet have completed.  See section 5.17
3732    for information on a more efficient way to meet the above require-
3733    ment.
3734
3735    This information is sent as undistinguished BNDUPD messages. Otherwise
3736    the processing is the same as for the UPDREQ message.  See section
3737    7.3.2 for details.
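
        The "remember the UPDREQALL" approach described above amounts to
        a single sticky flag, sketched below.  The class and method names
        are hypothetical, and the flag is assumed to be kept somewhere
        that survives loss and re-establishment of the connection.

```python
class UpdateRequestMode:
    """Remembers an UPDREQALL until the first UPDDONE has been sent."""

    def __init__(self):
        # Assumption: this flag is stored so that it outlives a
        # connection failure between the two servers.
        self.updreqall_pending = False

    def request_received(self, message_type):
        """Return the message type to act on for an UPDREQ or UPDREQALL."""
        if message_type == "UPDREQALL":
            self.updreqall_pending = True
        return "UPDREQALL" if self.updreqall_pending else message_type

    def upddone_sent(self):
        # The full transfer completed; later UPDREQs are handled normally.
        self.updreqall_pending = False
```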
3738
3739 7.5.  UPDDONE message [8]
3740
3741    The update done (UPDDONE) message is used by a server receiving an
3742    UPDREQ or UPDREQALL message to signify that it has sent all of the
3743    BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
3744    it has received a BNDACK for each of those messages.
3745
3746    While a BNDACK message MUST have been received for each BNDUPD mes-
3747    sage prior to the transmission of the UPDDONE message, this doesn't
3748    necessarily mean that all of the BNDUPD messages were accepted, only
3749    that all of them were responded to with a BNDACK message.  Thus, a
3750    NAK (comprised of a BNDACK message containing a reject-reason option)
3759    could be used to reject a BNDUPD, but for the purposes of the UPDDONE
3760    message, such a NAK would count as a response to the associated BNDUPD
3761    message, and would not block the eventual transmission of the UPDDONE
3762    message.
3763
3764    The xid in an UPDDONE message MUST be identical to the xid in the
3765    UPDREQ or UPDREQALL message that initiated the update process.
3766
3767    The UPDDONE message has no message specific options.
3768
3769 7.5.1.  Sending the UPDDONE message
3770
3771    The UPDDONE message SHOULD be sent as soon as the last BNDACK message
3772    corresponding to a BNDUPD message requested by the UPDREQ or
3773    UPDREQALL is received from the server which sent the UPDREQ or
3774    UPDREQALL.  The XID of the UPDDONE message MUST be the same as the
3775    XID of the corresponding UPDREQ or UPDREQALL message.
3776
3777 7.5.2.  Receiving the UPDDONE message
3778
3779    A server receiving the UPDDONE message knows that all of the informa-
3780    tion that it requested by sending an UPDREQ or UPDREQALL message has
3781    now been sent and that it has recorded this information in its stable
3782    storage.  It typically uses the receipt of an UPDDONE message to move
3783    to a different failover state.  See sections 9.5.2 and 9.8.3 for
3784    details.
3785
3786 7.6.  POOLREQ message [1]
3787
3788    The pool request (POOLREQ) message is used by the secondary server to
3789    request an allocation of IP addresses from the primary server.   It
3790    MUST be sent by a secondary server to a primary server to request IP
3791    address allocation by the primary.  The IP addresses allocated are
3792    transmitted using normal BNDUPD messages from the primary to the
3793    secondary.
3794
3795    The POOLREQ message SHOULD be sent from the secondary to the primary
3796    whenever the secondary makes a transition into NORMAL state.  It
3797    SHOULD periodically be resent in order that any change in the number
3798    of available IP addresses on the primary be reflected in the pool on
3799    the secondary.  The period may be influenced by the secondary
3800    server's leasing activity.
3801
3802    The POOLREQ message has no message specific options.
3803
3804 7.6.1.  Sending the POOLREQ message
3805
3806    The POOLREQ message is sent.
3807
3815 7.6.2.  Receiving the POOLREQ message
3816
3817    When a primary server receives a POOLREQ message it SHOULD examine
3818    the binding database and determine how many IP addresses the secon-
3819    dary server should have, and set these IP addresses to BACKUP state.
3820    It SHOULD then send BNDUPD messages concerning all of these IP
3821    addresses to the secondary server.
3822
3823    Servers frequently have several kinds of IP addresses available on a
3824    particular network segment.  The failover protocol assumes that both
3825    primary and secondary servers are configured in such a way that each
3826    knows the type and number of IP addresses on every network segment
3827    participating in the failover protocol.  The primary server is
3828    responsible for allocating the secondary server the correct propor-
3829    tion of available IP addresses of each kind, and the secondary server
3830    is responsible for being configured in such a way that it can tell
3831    the kind of every IP address based solely on the IP address itself.
3832
3833    A primary server MUST keep track of how many IP addresses were allo-
3834    cated as a result of processing the POOLREQ message, and send that
3835    number in the POOLRESP message.
3836
3837    A primary server MAY choose to defer processing a POOLREQ message
3838    until a more convenient time to process it, but it should not depend
3839    on the secondary server to resend the POOLREQ message in that case.
3840
3841    If a secondary server receives a POOLREQ message it SHOULD report an
3842    error.
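
        The primary's handling of a POOLREQ described above can be
        sketched as follows.  This is an illustration under stated
        assumptions, not part of the protocol: the Binding class, the
        process_poolreq name, and the 50 percent secondary_share default
        are hypothetical, and the sketch ignores the distinction between
        kinds of IP addresses.

```python
from dataclasses import dataclass

FREE, BACKUP = "FREE", "BACKUP"  # only the two states this sketch needs


@dataclass
class Binding:
    address: str
    state: str = FREE


def process_poolreq(bindings, secondary_share=0.5):
    """Move FREE addresses to BACKUP until the secondary holds its share.

    Returns the bindings that were moved.  Their count is the value for
    the addresses-transferred option of the POOLRESP, and each of them
    still needs a BNDUPD sent to the secondary.
    """
    free = [b for b in bindings if b.state == FREE]
    backup = [b for b in bindings if b.state == BACKUP]
    # secondary_share is an assumed configuration value, not from the draft.
    target = int((len(free) + len(backup)) * secondary_share)
    needed = max(0, target - len(backup))
    moved = free[:needed]
    for binding in moved:
        binding.state = BACKUP
    return moved
```

        A second call on an unchanged pool moves nothing, which yields an
        addresses-transferred value of zero.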
3843
3844 7.7.  POOLRESP message [2]
3845
3846    A primary server sends a POOLRESP message to a secondary server after
3847    the allocation process for available addresses to the secondary
3848    server is complete.  Typically this message will precede some of the
3849    BNDUPD messages that the primary uses to send the actual allocated IP
3850    addresses to the secondary.
3851
3852    The xid in the POOLRESP message MUST be identical to the xid in the
3853    POOLREQ message for which this POOLRESP is a response.
3854
3855
3856 7.7.1.  Sending the POOLRESP message
3857
3858    The POOLRESP message MUST contain the same xid as the corresponding
3859    POOLREQ message.
3860
3861    Exactly one option MUST appear in a POOLRESP message:
3862
3871       o addresses-transferred
3872
3873         The number of addresses allocated to the secondary server by the
3874         primary server as a result of a POOLREQ is contained in the
3875         addresses-transferred option in a POOLRESP message.  Note this
3876         is the number of addresses that are transferred to the secondary
3877         in the primary's binding database as a result of the correspond-
3878         ing POOLREQ message, and that it may be some time before they
3879         can all be transmitted to the secondary server through the use
3880         of BNDUPD messages.
3881
3882 7.7.2.  Receiving the POOLRESP message
3883
3884    When a secondary server receives a POOLRESP message, it SHOULD send
3885    another POOLREQ message if the value of the addresses-transferred
3886    option is non-zero.
3887
3888    Typically, no other action is taken on the reception of a POOLRESP
3889    message.
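
        A minimal sketch of the secondary's reaction to a POOLRESP,
        assuming a hypothetical send_poolreq callback supplied by the
        implementation:

```python
def handle_poolresp(addresses_transferred, send_poolreq):
    """Ask for more addresses while the primary is still handing them out.

    Returns True if another POOLREQ was sent.
    """
    if addresses_transferred != 0:
        send_poolreq()
        return True
    return False
```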
3890
3891 7.8.  CONNECT message [5]
3892
3893    The connect message is used to establish an application-level con-
3894    nection over a newly created TCP connection.  It gives the source
3895    information for the connection and critical configuration informa-
3896    tion.  It MUST be sent only by the primary server.  Either server can
3897    initiate a TCP connection, but the CONNECT message is only sent by
3898    the primary server.
3899
3900    The CONNECT message MUST be the first message sent down a newly esta-
3901    blished connection, and it MUST be sent only by the primary server.
3902
3903    The following table summarizes the options that are associated with
3904    the CONNECT message:
3905
3906
3929    Option
3930    ------
3931    relationship-name           MUST
3932    max-unacked-bndupd          MUST
3933    receive-timer               MUST
3934    vendor-class-identifier     MUST
3935    protocol-version            MUST
3936    TLS-request                 MUST (1)
3937    MCLT                        MUST
3938    hash-bucket-assignment      MUST
3939
3940    (1) MUST NOT if CONNECT is being sent over a TLS connection
3941
3942               Table 7.8-1: Options used in a CONNECT message
3943
3944
3945 7.8.1.  Sending the CONNECT message
3946
3947    The CONNECT message MUST be the first message sent by the primary
3948    server after the establishment of a new TCP connection with a secon-
3949    dary server participating in the failover protocol.
3950
3951    The xid of the CONNECT message is not related to any previous xid
3952    sequence, but initiates the sequence for this connection.
3953
3954    The name of the failover relationship MUST be placed in the
3955    relationship-name option.  This information is placed in an option
3956    inside of the message in order to allow the identity of the sender to
3957    be covered by a shared secret.
3958
3959    The number of BNDUPD messages the primary server can accept without
3960    blocking the TCP connection MUST be placed in the max-unacked-bndupd
3961    option.  This MUST be a number equal to or greater than 1, SHOULD be
3962    a number greater than 10, and SHOULD be a number less than 100.
3963
3964    The length of the receive timer (tReceive, see section 8.3) MUST be
3965    placed in the receive-timer option.
3966
3967    The MCLT MUST be placed in the MCLT option.
3968
3969    The hash-bucket-assignment option MUST be included in the CONNECT
3970    message.  In the event that load balancing is not configured for this
3971    server, the hash-bucket-assignment option will indicate that.  The
3972    value of the hash-bucket-assignment option is determined from the
3973    specific buckets that the primary server has determined that the
3974    secondary server MUST service as part of the load-balancing
3983    algorithm.  The way in which the primary server determines this
3984    information is outside the scope of this protocol definition.  The
3985    primary server SHOULD be configured with a percentage of clients that
3986    the secondary server will be instructed to service, and the primary
3987    server SHOULD use the algorithm in [RFC 3074] to generate a Hash
3988    Bucket Assignment which it sends to the secondary server.
3989
3990    The vendor class identifier MUST be placed in the vendor-class-
3991    identifier option.
3992
3993    The protocol-version option MUST be included in every CONNECT mes-
3994    sage.  The current value of the protocol version is 1.
3995
3996    The TLS-request option MUST be sent and contains the desired TLS con-
3997    nection request as well as information concerning whether TLS is sup-
3998    ported.  If this CONNECT message is being sent over an already
3999    created TLS connection, the TLS-request option MUST NOT appear.
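
        The option requirements of this section can be collected in one
        place, as in the sketch below.  The dictionary representation and
        the build_connect_options name are assumptions made for
        illustration; this is not the wire encoding of the options.

```python
def build_connect_options(relationship_name, max_unacked_bndupd,
                          receive_timer, vendor_class_id, mclt,
                          hash_bucket_assignment, tls_request=None,
                          over_tls=False, protocol_version=1):
    """Return the options a primary places in a CONNECT message."""
    if max_unacked_bndupd < 1:
        raise ValueError("max-unacked-bndupd MUST be at least 1")
    options = {
        "relationship-name": relationship_name,
        "max-unacked-bndupd": max_unacked_bndupd,
        "receive-timer": receive_timer,
        "vendor-class-identifier": vendor_class_id,
        "protocol-version": protocol_version,
        "MCLT": mclt,
        "hash-bucket-assignment": hash_bucket_assignment,
    }
    if over_tls:
        # TLS-request MUST NOT appear on an already created TLS connection.
        if tls_request is not None:
            raise ValueError("TLS-request MUST NOT appear over TLS")
    else:
        if tls_request is None:
            raise ValueError("TLS-request MUST appear on a non-TLS connection")
        options["TLS-request"] = tls_request
    return options
```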
4000
4001 7.8.2.  Receiving the CONNECT message
4002
4003    When a server establishes a TCP connection on a failover port, if it
4004    is a primary server it should send a CONNECT message, and if it is a
4005    secondary server it should wait for a CONNECT message before sending
4006    any messages.  To avoid denial of service attacks, a secondary should
4007    only wait for a CONNECT message on a new connection for a limited
4008    amount of time and close the connection if none is received during
4009    that time.
4010
4011    When a secondary server receives a CONNECT message it should:
4012
4013       1.  Record the time at which the message was received.
4014
4015       2.  Examine the protocol-version option, and decide if this server
4016           is capable of interoperating with another server running that
4017           protocol version.  If not, send the CONNECTACK message with
4018           the reject reason 14: "Protocol version mismatch".  The server
4019           MUST include its protocol-version in the CONNECTACK message.
4020
4021       3.  Examine the TLS-request option.  Figure out the TLS-reply
4022           value based on the capabilities and configuration of this
4023           server.  If the result for the TLS-reply value is a 1 and the
4024           connection is accepted, indicating use of TLS, then immedi-
4025           ately send the CONNECTACK message and go into TLS negotiation.
4026           If the TLS-reply value implies rejection of the connection,
4027           then immediately send the CONNECTACK message with the TLS-
4028           reply value and the appropriate reject-reason option value.
4029           In all other cases, save the TLS-reply option information for
4030           the eventual CONNECTACK message.
4031
4039           The possibilities for TLS-request and TLS-reply are:
4040
4041           CONNECT CONNECTACK
4042             TLS     TLS
4043           request  reply
4044                         Reject
4045             t1      t1  Reason   Comments
4046             --      --  ------   --------
4047             0       0           no TLS used
4048             0       1    11     primary won't use TLS, secondary requires TLS
4049             1       0           primary desires TLS, secondary doesn't
4050             1       1           primary desires TLS, secondary will use TLS
4051             2       0    9, 10  primary requires TLS and secondary won't
4052             2       1           primary requires TLS and secondary will use TLS
4053
4054
4055
4056       4.  Check to see if there is a message-digest option in the CON-
4057           NECT message.  If there was, and the server does not support
4058           message-digests, then reject the connection with reject reason
4059           12: "Message digest not supported" in the CONNECTACK.  If the
4060           server does support message-digests, then check this message
4061           for validity based on the message-digest, and reject it if the
4062           digest indicates the message was altered with reject reason
4063           20: "Message digest failed to compare".
4064
4065       5.  Determine if the sender (from the relationship-name option)
4066           and the implicit role of the sender (i.e., primary) represents
4067           a server with which the receiver was configured to engage in
4068           failover activity.  This is performed after any TLS or message
4069           digest processing so that it occurs after a secure connection
4070           is created, to ensure that there is no tampering with the
4071           relationship name of the partner.  In the absence of any other
4072           security capability (i.e., when TLS or a message digest is not
4073           used), the server MAY wish to be configured with the IP
4074           address of the partner and check the source-ip of the CONNECT
4075           message against that IP address as a weak form of security.
4076
4077           If not, then the receiving server should reject the CONNECT
4078           request by sending a CONNECTACK message with a reject-reason
4079           value of: 8, invalid failover partner.
4080
4081           If it is, then the receiving failover endpoint should be
4082           determined.
4083
4084       6.  Decide if the time delta between the sending of the message,
4085           in the time field, and the receipt of the message, recorded in
4086           step 1 above, is acceptable.  A server MAY require an
4095           arbitrarily small delta in time values in order to set up a
4096           failover connection with another server.  See section 5.10 for
4097           information on time synchronization.
4098
4099           If the delta between the time values is too great, the server
4100           should reject the CONNECT request by sending a CONNECTACK mes-
4101           sage with a reject-reason of 4, time mismatch too great.
4102
4103           If the time mismatch is not considered too great then the
4104           receiving server MUST record the delta between the servers.
4105           The receiving server MUST use this delta to correct all of the
4106           absolute times received from the other server in all time-
4107           valued options.  Note that servers can participate in failover
4108           with arbitrarily great time mismatches, as long as the
4109           mismatch is more or less constant.
4110
4111       7.  Examine the MCLT option in the CONNECT request and use the
4112           value of the MCLT as the MCLT for this failover endpoint.
4113
4114           The secondary server SHOULD be able to operate with any MCLT
4115           sent by the primary,  but if it cannot, then it should send a
4116           CONNECTACK with a reject-reason of 5, MCLT mismatch.  In the
4117           event that the MCLT from the primary does not match that con-
4118           figured on the secondary, and the secondary will run with the
4119           primary's value, then the secondary MUST save the MCLT in
4120           secondary storage since it will need it even if it cannot con-
4121           tact the primary.  The secondary MUST NOT use a different MCLT
4122           value than it received from the primary even if it cannot con-
4123           tact the primary.
4124
4125       8.  The server MUST store the hash-bucket-assignment option for
4126           use during processing in NORMAL state.  If this hash bucket
4127           assignment conflicts with the secondary server's configured
4128           hash bucket assignment for use in other than NORMAL state, the
4129           secondary server should send a CONNECTACK with a reject reason
4130           of 19, Hash bucket assignment conflict.
4131
4132       9.  The receiving server MAY use the vendor-class-identifier to do
4133           vendor specific processing.
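
        The TLS-request and TLS-reply combinations listed in step 3 above
        can be expressed as a lookup table.  The sketch below transcribes
        that table; the tls_outcome name and the tuple layout are
        assumptions, and the meanings of the reject reasons are those
        given elsewhere in this document.

```python
# (TLS-request, TLS-reply) -> (use_tls, reject_reasons), from step 3.
TLS_OUTCOMES = {
    (0, 0): (False, ()),       # no TLS used
    (0, 1): (False, (11,)),    # primary won't use TLS, secondary requires it
    (1, 0): (False, ()),       # primary desires TLS, secondary doesn't
    (1, 1): (True, ()),        # primary desires TLS, secondary will use it
    (2, 0): (False, (9, 10)),  # primary requires TLS, secondary won't
    (2, 1): (True, ()),        # primary requires TLS, secondary will use it
}


def tls_outcome(tls_request, tls_reply):
    """Return (use_tls, reject_reasons) for one request/reply pair.

    An empty reject_reasons tuple means the pair does not by itself
    cause the connection to be rejected.
    """
    try:
        return TLS_OUTCOMES[(tls_request, tls_reply)]
    except KeyError:
        raise ValueError(
            f"unknown TLS-request/TLS-reply pair: {tls_request}, {tls_reply}"
        ) from None
```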
4134
4135 7.9.  CONNECTACK message [6]
4136
4137    The CONNECTACK message is sent to accept or reject a CONNECT message.
4138    It is sent by the secondary server which received a CONNECT message.
4139
4140    Attempting immediately to reconnect after either receiving a CONNEC-
4141    TACK with a reject-reason or after sending a CONNECTACK with a
4142    reject-reason could yield unwanted looping behavior, since the reason
4151    that the connection was rejected may well not have changed since the
4152    last attempt.  A simple suggested solution is to wait a minute or two
4153    after sending or receiving a CONNECTACK message with a reject-reason
4154    before attempting to reestablish communication.
4155
4156    The following table summarizes the options associated with the CON-
4157    NECTACK message:
4158
4159
4160    Option                     accept       reject
4161    ------
4162    relationship-name           MUST        MUST
4163    max-unacked-bndupd          MUST        MUST NOT
4164    receive-timer               MUST        MUST NOT
4165    vendor-class-identifier     MUST        MUST NOT
4166    protocol-version            MUST        MUST
4167    TLS-reply                   (1)         (2)
4168    reject-reason               MUST NOT    MUST
4169    message                     MUST NOT    SHOULD
4170    MCLT                        MUST NOT    MUST NOT
4171    hash-bucket-assignment      MUST NOT    MUST NOT
4172
4173    (1) MUST NOT if sending CONNECTACK after TLS negotiation, MUST
4174    if TLS-request in CONNECT, else MUST NOT.
4175    (2) MUST if TLS-request in CONNECT message, else MUST NOT.
4176
4177               Table 7.9-1: Options used in a CONNECTACK message
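
        Table 7.9-1 lends itself to a mechanical check.  The sketch below
        is one possible validator; the check_connectack name and the
        string-set representation of options are assumptions.  It folds
        notes (1) and (2) into a single rule, on the assumption that a
        CONNECT received over TLS never carries a TLS-request option.

```python
MUST, MUST_NOT, SHOULD = "MUST", "MUST NOT", "SHOULD"

# (accept, reject) columns of Table 7.9-1; TLS-reply is handled separately.
CONNECTACK_RULES = {
    "relationship-name": (MUST, MUST),
    "max-unacked-bndupd": (MUST, MUST_NOT),
    "receive-timer": (MUST, MUST_NOT),
    "vendor-class-identifier": (MUST, MUST_NOT),
    "protocol-version": (MUST, MUST),
    "reject-reason": (MUST_NOT, MUST),
    "message": (MUST_NOT, SHOULD),
    "MCLT": (MUST_NOT, MUST_NOT),
    "hash-bucket-assignment": (MUST_NOT, MUST_NOT),
}


def check_connectack(options, rejecting, tls_request_in_connect):
    """Return a list of violations for the given set of option names."""
    present = set(options)
    column = 1 if rejecting else 0
    problems = []
    for name, rule in CONNECTACK_RULES.items():
        level = rule[column]
        if level == MUST and name not in present:
            problems.append(f"missing {name}")
        elif level == MUST_NOT and name in present:
            problems.append(f"unexpected {name}")
        # SHOULD-level options are not treated as violations here.
    if tls_request_in_connect and "TLS-reply" not in present:
        problems.append("missing TLS-reply")
    elif not tls_request_in_connect and "TLS-reply" in present:
        problems.append("unexpected TLS-reply")
    return problems
```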
4178
4179
4180 7.9.1.  Sending the CONNECTACK message
4181
4182    The xid of the CONNECTACK message MUST be that of the corresponding
4183    CONNECT message.
4184
4185    The name of the relationship MUST be placed in the relationship-name
4186    option.  This information is placed in an option inside of the mes-
4187    sage in order to allow the identity of the sender to be covered by a
4188    shared secret.
4189
4190    The protocol-version option MUST be included in every CONNECTACK mes-
4191    sage.  The current value of the protocol version is 1.
4192
4193    If the connection has been rejected, the reject-reason option MUST be
4194    placed in the CONNECTACK message with an appropriate reason, and a
4195    message option SHOULD be included with a human-readable error message
4196    describing the reason for the rejection in some detail.  If the
4197    reject-reason option appears, then the remaining options listed below
4198    do not appear.  The sending server should close the connection after
4207    sending the CONNECTACK if the connection was rejected.
4208
4209    The results of the TLS negotiation MUST be placed in the TLS-reply
4210    option.  If this CONNECTACK message is being sent over an already TLS
4211    secured connection, then there MUST NOT be a TLS-reply option.
4212
4213    If there was a message-digest option in the CONNECT message, then
4214    there MUST be a message-digest in the CONNECTACK message and any sub-
4215    sequent messages if the CONNECTACK does not contain a reject-reason.
4216
4217    The number of BNDUPD messages the server can accept without blocking
4218    the TCP connection MUST be placed in the max-unacked-bndupd option.
4219    This SHOULD be a number greater than 10, and SHOULD be a number less
4220    than 100.
4221
4222    The length of the receive timer (tReceive, see section 8.3) MUST be
4223    placed in the receive-timer option.
4224
4225    The vendor class identifier MUST be placed in the vendor-class-
4226    identifier option.
4227
4228    After a connection is created (either by sending a CONNECTACK message
4229    to the first CONNECT message, or sending a CONNECTACK message to a
4230    CONNECT message received over a TLS connection), the server MUST send
4231    a STATE message.
4232
4233    After a connection is created, the server MUST start two timers for
4234    the connection: tSend and tReceive.   The tSend timer SHOULD be
4235    approximately 33 percent of the time in the receive-timer option in
4236    the corresponding CONNECT message.  The tReceive timer SHOULD be the
4237    time sent in the receive-timer option in the CONNECTACK message.
4238
4239    The tReceive timer is reset whenever a message is received from this
4240    TCP connection.  If it ever expires, the TCP connection is dropped
4241    and communications with this partner is considered not ok.  The
4242    reject reason 17: "No traffic within sufficient time" is placed in
4243    the DISCONNECT message sent prior to dropping the TCP connection.
4244
4245    The tSend timer is reset whenever a message is sent over this connec-
4246    tion. When it expires, a CONTACT message MUST be sent.
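
        The interaction of the two timers can be sketched with explicit
        timestamps instead of real timers.  The ConnectionTimers class,
        its method names, and the polling style are assumptions made for
        illustration; the 0.33 default follows the 33 percent figure
        given in this section.

```python
class ConnectionTimers:
    """Track tSend and tReceive for one failover connection."""

    def __init__(self, own_receive_timer, partner_receive_timer, now,
                 send_fraction=0.33):
        # tReceive is the value this server sent in its receive-timer option.
        self.t_receive = own_receive_timer
        # tSend is a fraction of the partner's advertised receive-timer.
        self.t_send = partner_receive_timer * send_fraction
        self.last_received = now
        self.last_sent = now

    def message_received(self, now):
        self.last_received = now  # resets tReceive

    def message_sent(self, now):
        self.last_sent = now  # resets tSend

    def poll(self, now):
        """Return the action that is due at time `now`, if any."""
        if now - self.last_received >= self.t_receive:
            return "DISCONNECT"  # sent with reject reason 17, then drop TCP
        if now - self.last_sent >= self.t_send:
            return "CONTACT"
        return None
```

        The caller is expected to invoke message_sent after actually
        sending the CONTACT message, which restarts tSend.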
4247
4248 7.9.2.  Receiving the CONNECTACK message
4249
4250    If a CONNECTACK message is received with a different XID from the one
4251    in the CONNECT that was sent, it SHOULD be ignored.  To avoid denial
4252    of service attacks, a primary should only wait for a CONNECTACK mes-
4253    sage on a new connection for a limited amount of time and close the
4254    connection if none is received during that time.
4255
4263    When a CONNECTACK message is received, the following actions should
4264    be taken:
4265
4266       1.  Record the time the message was received.
4267
4268       2.  Check to see if the xid on the CONNECTACK matches an outstand-
4269           ing CONNECT message on this TCP connection.
4270
4271       3.  Check to see if there is a reject-reason option in the CONNEC-
4272           TACK message.  If not, continue with step 4.  If there is a
4273           reject-reason option, the server SHOULD report the error code.
4274           If a message option appears a server SHOULD display the string
4275           from the message option in a user visible way.  The server
4276           MUST close the connection if a reject-reason option appears.
4277
4278       4.  Check the value of the TLS-reply option (if any, which there
4279           won't be if this CONNECT is taking place utilizing TLS), and
4280           if it was 1, then skip processing of the rest of the CONNEC-
4281           TACK message, and immediately enter into TLS connection setup.
4282
4283           This step occurs prior to steps 5 and 6 in order to allow
4284           creation of a secure connection (if required) prior to pro-
4285           cessing the protocol version and IP address information.
4286
4287       5.  Examine the value of the protocol-version option.  If this
4288           server is able to establish connections with another server
4289           running this protocol version, then continue, else close the
4290           connection.
4291
4292       6.  Decide if the time delta between the sending of the message,
4293           in the time field, and the receipt of the message, recorded in
4294           step 1 above, is acceptable.  A server MAY require an arbi-
4295           trarily small delta in time values in order to set up a fail-
4296           over connection with another server.
4297
4298           If the delta between the time values is too great, the server
4299           should drop the TCP connection (see section 7.12).
4300
4301           If the time mismatch is not considered too great then the
4302           receiving server MUST record the delta between the servers.
4303           The receiving server MUST use this delta to correct all of the
4304           absolute times received from the other server in all time-
4305           valued options.  Note that the failover protocol is con-
4306           structed so that two servers can be failover partners with
4307           arbitrarily great time mismatches.
4308
4309       7.  The receiving server MAY use the vendor-class-identifier to do
4310           vendor specific processing.
4311
4319       8.  After accepting a CONNECTACK message, the server MUST send a
4320           STATE message.
4321
4322           After receiving a CONNECTACK message, the server MUST start
4323           two timers for the connection: tSend and tReceive.   The tSend
4324           timer SHOULD be approximately 20 percent of the time in the
4325           receive-timer option in the corresponding CONNECTACK message.
4326           The tReceive timer SHOULD be set to the time sent in the
4327           receive-timer option in the CONNECT message.
4328
4329           The tReceive timer is reset whenever a message is received
4330           from this TCP connection.  If it ever expires, the TCP connec-
4331           tion is dropped and communications with this partner is con-
4332           sidered not ok.  The reject reason 17: "No traffic within suf-
4333           ficient time" is placed in the DISCONNECT message sent prior
4334           to dropping the TCP connection.
4335
4336           The tSend timer is reset whenever a message is sent over this
4337           connection. When it expires, a CONTACT message MUST be sent.
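
        The time-delta handling in step 6 above (and in step 6 of section
        7.8.2) can be sketched as follows.  The function names and the
        optional max_delta limit are assumptions; the sketch treats the
        whole difference as clock offset and ignores network transit
        time.

```python
def clock_delta(sent_time, received_time, max_delta=None):
    """Return the offset to add to the partner's absolute times.

    sent_time is the time field of the message (partner's clock);
    received_time is the local time recorded in step 1.
    """
    delta = received_time - sent_time
    if max_delta is not None and abs(delta) > max_delta:
        # The caller rejects the CONNECT or drops the connection.
        raise ValueError("time mismatch too great")
    return delta


def to_local_time(partner_time, delta):
    """Correct an absolute time received in a time-valued option."""
    return partner_time + delta
```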
4338
4339 7.10.  STATE message [10]
4340
4341    The state (STATE) message is used to communicate the current failover
4342    state to the partner server.
4343
4344    The STATE message MUST be sent after sending a CONNECTACK message
4345    that didn't contain a reject-reason option, and MUST be sent after
4346    receiving a CONNECTACK message without a reject-reason option.
4347
4348    A STATE message MUST be sent whenever the failover endpoint changes
4349    its failover state and a connection exists to the partner.
4350
4351    The STATE message requires no response from the failover partner.
4352
4353    The following table shows the options that MUST appear in a STATE
4354    message:
4355
4356
4357    Option
4358    ------
4359    sending-state               MUST
4360    server-flags                MUST
4361    start-time-of-state         MUST
4362
4363               Table 7.10-1: Options used in a STATE message
4364
4365
4375 7.10.1.  Sending the STATE message
4376
4377    The current failover state is placed in the server-state option and
4378    the current state of the STARTUP flag is placed in the server-flags
4379    option.
4380
4381    The message is sent with a unique xid.
4382
4383    A server SHOULD only send the STATE message either when the connec-
4384    tion is created (i.e., after sending or receiving a CONNECTACK message
4385    with no reject-reason option), or when there is a change from the
4386    values sent in a previous STATE message.
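
        One way to send STATE messages only on connection creation or on
        a change is to remember the last values sent.  The StateSender
        class and its send callback are hypothetical; the option names
        are those of Table 7.10-1, and the incrementing counter is only
        one possible way to obtain a unique xid.

```python
class StateSender:
    """Send a STATE message only when state or flags have changed."""

    def __init__(self, send):
        self._send = send  # callback that transmits one message
        self._last = None  # (state, flags) from the previous STATE message
        self._xid = 0

    def connection_created(self, state, flags, start_time):
        self._last = None  # always send on a new connection
        return self.update(state, flags, start_time)

    def update(self, state, flags, start_time):
        if (state, flags) == self._last:
            return False
        self._xid += 1
        self._send({
            "xid": self._xid,
            "sending-state": state,
            "server-flags": flags,
            "start-time-of-state": start_time,
        })
        self._last = (state, flags)
        return True
```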
4387
4388 7.10.2.  Receiving the STATE message
4389
4390    Every STATE message SHOULD indicate a change in state or a change in
4391    the flags.
4392
4393    When a STATE message is received, any state transitions specified in
4394    section 9 are taken.
4395
4396    No response to a STATE message is required.
4397
4398 7.11.  CONTACT message [11]
4399
4400    The contact (CONTACT) message is sent to verify communications
4401    integrity with a failover partner.  The CONTACT message is sent when
4402    no messages have been sent to the failover partner for a specified
4403    period of time.  This is determined by the tSend timer expiring (see
4404    section 8.3).
4405
4406    The CONTACT message has no message specific options.
4407
4408 7.11.1.  Sending the CONTACT message
4409
4410    The CONTACT message is sent.
4411
4412 7.11.2.  Receiving the CONTACT message
4413
4414    When a CONTACT message is received, the tReceive timer is reset (as
4415    it is with any message that is received).
4416
4417    A server SHOULD use the time in the time field and the time the mes-
4418    sage was received to refine the delta time calculations between the
4419    servers.
4420
4431 7.12.  DISCONNECT message [12]
4432
4433    The DISCONNECT is the last message sent over a connection before
4434    dropping an established connection (note that an established connec-
4435    tion is one where a CONNECTACK has been sent without a reject rea-
4436    son).
4437
4438    After sending or receiving a DISCONNECT message, a server needs to
4439    have some mechanism to prevent an error loop. Simply reconnecting to
4440    the partner immediately is not the best option, especially after
4441    several consecutive attempts.
4442
4443    A simple suggested solution is to wait a minute or two after sending
4444    or receiving a DISCONNECT before attempting to reestablish communica-
4445    tion.
4446
4447    The DISCONNECT message MUST be the last message sent down a connec-
4448    tion before it is closed.
4449
4450    The following table summarizes the options that are associated with
4451    the DISCONNECT message:
4452
4453
4454    Option
4455    ------
4456    reject-reason               MUST
4457    message                     SHOULD
4458
4459               Table 7.12-1: Options used in a DISCONNECT message
4460
4461
4462
4463 7.12.1.  Sending the DISCONNECT message
4464
4465    The DISCONNECT message MUST be the last message sent by a server
4466    which is dropping a TCP connection.
4467
4468    The xid of the DISCONNECT message must be unique.
4469
4470    The reject-reason option MUST appear giving a reason why the connec-
4471    tion was dropped.  A message option SHOULD appear giving a human
4472    readable error message with possibly more details.
4473
4474 7.12.2.  Receiving the DISCONNECT message
4475
4476    When a server receives a DISCONNECT message it should log the message
4477    if there was one and possibly raise an alarm of some sort if the
4478    reject reason was one that was sufficiently serious.
4479
4487 8.  Connection Management
4488
4489    Servers participating in the failover protocol communicate over TCP
4490    connections.   These TCP connections are used both to transmit bind-
4491    ing information from one server to another as well as to allow each
4492    server to determine whether communications is possible with the other
4493    server.
4494
4495    Central to the operation of the failover protocol is a notion of
4496    "communications okay" or "communications failed".  Failover state
4497    transitions are taken in many cases when the status of communications
4498    with the partner changes, and the existence or non-existence of a TCP
4499    connection between failover endpoints is used to determine if com-
4500    munications is "okay" or "failed".
4501
4502    A single TCP connection exists which connects two failover endpoints.
4503
4504 8.1.  Connection granularity
4505
4506    There exists one TCP connection between each set of failover end-
4507    points.  See section 5.1.1 for an explanation of failover endpoints.
4508
4509    Typically there is one failover endpoint for each end of a failover
4510    relationship between two servers, and only a single relationship
4511    between any two servers.  Given the integration of load balancing into
4512    the failover protocol, there is little value in having more than one
4513    failover relationship between two servers, though the protocol will
4514    support multiple relationships between two servers.
4515
4516    Each failover relationship MUST have a unique relationship-name, and
4517    the relationship-name option is used to communicate this name in the
4518    CONNECT and CONNECTACK messages.
4519
4520 8.2.  Creating the TCP connection
4521
4522    All failover TCP connections are initiated over port 647.  Every
4523    server implementing the failover protocol MUST listen on port 647.
4524
4525    Every server implementing the failover protocol SHOULD attempt to
4526    connect to all of its partners periodically, where the period is
4527    implementation dependent and SHOULD be configurable.  In the event
4528    that a connection has been rejected by a CONNECTACK message with a
4529    reject-reason option contained in it or a DISCONNECT message, a
4530    server SHOULD reduce the frequency with which it attempts to connect
4531    to that server but it SHOULD continue to attempt to connect periodi-
4532    cally.
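
        The retry behavior described above can be sketched as follows.
        This is only an illustration: the draft leaves the period
        implementation dependent, so the class name, the 10-second base
        period, the doubling factor, and the 300-second ceiling are all
        assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ReconnectPolicy:
    """Hypothetical retry schedule for connecting to one failover partner."""
    base_period: float = 10.0    # assumed default; SHOULD be configurable
    max_period: float = 300.0    # assumed ceiling after repeated rejections
    period: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.period = self.base_period

    def on_rejected(self) -> float:
        """Partner rejected us (CONNECTACK with reject-reason, or DISCONNECT).

        Retry less often, but never stop retrying.
        """
        self.period = min(self.period * 2, self.max_period)
        return self.period

    def on_connected(self) -> float:
        """A connection was accepted: return to the base period."""
        self.period = self.base_period
        return self.period
```

        The ceiling keeps a rejected server probing its partner, so the
        pair can recover once the cause of the rejection is corrected.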
4533
4534    When a connection attempt succeeds, if the server generating the
4543    connection attempt is a primary server for that relationship, then it
4544    MUST send a CONNECT message down the connection.  If it is not a pri-
4545    mary server for the relationship, then it MUST just drop the connec-
4546    tion and wait for the primary server to connect to it.
4547
4548    When a connection attempt is received on port 647, the only informa-
4549    tion that the receiving server has is the IP address of the partner
4550    initiating a connection.  It also knows whether it has the primary
4551    role for any failover relationships with the connecting server.  If
4552    it has any relationships for which it is a primary server, it should
4553    initiate a connection of its own to port 647 of the partner server,
4554    one for each primary relationship it has with that server.
4555
4556    If it has any relationships with the connecting server for which it
4557    is a secondary server, it should just await the CONNECT message to
4558    determine which relationship this connection is to serve.
4559
4560    If it has no secondary relationships with the connecting server, it
4561    SHOULD drop the connection.
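
        The handling of an inbound connection described in the last
        three paragraphs might be sketched as shown below.  The function
        name, the shape of the "roles" mapping, and the action strings
        are hypothetical and exist only for this illustration.

```python
from typing import Dict, List


def on_inbound_connection(roles: Dict[str, str]) -> List[str]:
    """Decide what to do when a partner connects to port 647.

    `roles` maps each relationship-name shared with the connecting
    server to this server's role in it: "primary" or "secondary".
    """
    actions = []
    # One outbound connection per relationship in which we are primary.
    for name in sorted(n for n, r in roles.items() if r == "primary"):
        actions.append(f"connect-out:{name}")
    if any(r == "secondary" for r in roles.values()):
        # Keep the inbound connection; the CONNECT message will say
        # which relationship it is to serve.
        actions.append("await-CONNECT")
    else:
        # No secondary relationship: the inbound connection was only a
        # stimulus and can be dropped.
        actions.append("drop-inbound")
    return actions
```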
4562
4563    To summarize -- a primary server MUST use a connection that it has
4564    initiated in order to send a CONNECT message.  Every server that is a
4565    secondary server in a relationship attempts to create a connection to
4566    the server which is primary in the relationship, but that connection
4567    is only used to stimulate the primary server into recognizing that
4568    the secondary server is ready for operation.  The reason behind this
4569    is that the secondary server has no way to communicate to the primary
4570    server which relationship a connection is designed to serve.
4571
4572    A server which has multiple secondary relationships with a primary
4573    server SHOULD only send one stimulus connection attempt to the pri-
4574    mary server.
4575
4576    Once a connection is established, the primary server MUST send a CON-
4577    NECT message across the connection.  A secondary server MUST wait for
4578    the CONNECT message from a primary server.  If the secondary server
4579    doesn't receive a CONNECT message from the primary server in an
4580    installation-dependent amount of time, it MAY drop the connection and
4581    send another stimulus connection attempt to the primary server.
4582
4583    Every CONNECT message includes a TLS-request option, and if the CON-
4584    NECTACK message does not reject the CONNECT message and the TLS-reply
4585    option says TLS MUST be used, then the servers will immediately enter
4586    into TLS negotiation.
4587
4588    Once TLS negotiation is complete, the primary server MUST resend the
4589    CONNECT message on the newly secured TLS connection and then wait for
4590    the CONNECTACK message in response.  The TLS-request and TLS-reply
4599    options MUST NOT appear in either this second CONNECT or its
4600    associated CONNECTACK message, although they appeared in the first.
4601
4602    The second message sent over a new connection (either a bare TCP con-
4603    nection or a connection utilizing TLS) is a STATE message.  Upon the
4604    receipt of this message, the receiver can consider communications up.
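
        As an illustration only, the order of messages on a newly
        created connection can be listed as below.  The function name
        and the tuple layout are assumptions, and the relative order of
        the two STATE messages is not implied by the text above.

```python
from typing import List, Tuple


def connection_setup_messages(use_tls: bool) -> List[Tuple[str, str]]:
    """Return (sender, message) pairs in the order they cross the wire."""
    seq = [
        ("primary", "CONNECT [TLS-request]"),
        ("secondary", "CONNECTACK [TLS-reply]"),
    ]
    if use_tls:
        seq += [
            ("both", "TLS negotiation"),
            # Repeated on the secured connection, without the TLS options.
            ("primary", "CONNECT"),
            ("secondary", "CONNECTACK"),
        ]
    # Each endpoint sends STATE next; its receipt marks communications up.
    seq += [("primary", "STATE"), ("secondary", "STATE")]
    return seq
```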
4605
4606    It is entirely possible that two servers will attempt to make connec-
4607    tions to each other essentially simultaneously, and in this case the
4608    secondary server will be waiting for a CONNECT message on each con-
4609    nection.  The primary server MUST send a CONNECT message over one
4610    connection and it MUST close the other connection.
4611
4612    A secondary server MUST NOT respond to the closing of a TCP connec-
4613    tion with a blind attempt to reconnect -- there may be another TCP
4614    connection to the same failover partner already in use.
4615
4616 8.3.  Using the TCP connection for determining communications status
4617
4618    The TCP connection is used to determine the communications status of
4619    the other server, i.e., communications-ok, or communications-
4620    interrupted.
4621
4622    Three things must happen for a server to consider that communications
4623    are ok with respect to another server:
4624
4625
4626       1.  A TCP connection must be established to the other server.
4627
4628       2.  A CONNECT message must be received and a CONNECTACK message
4629           sent in response.  The CONNECT message is used to determine
4630           the identity of the failover endpoint at the other end of the
4631           TCP connection -- without it, the failover endpoint cannot be
4632           uniquely determined.  Without knowledge of the failover
4633           endpoint, the entity with which communications is ok is
4634           undetermined.
4635
4636       3.  A STATE message must be received from the other server over
4637           the connection.  This STATE message initializes important
4638           information necessary to the operation of the state machine
4639           that governs the behavior of this failover endpoint.
4640
4641    There are two ways that a server can determine that communications
4642    has failed:
4643
4644
4645       1.  The TCP connection can go down, yielding an error when
4646           attempting to send or receive a message. This will happen at
4655           least as often as the period of the tSend timer.
4656
4657       2.  The tReceive timer can expire.
4658
4659    In either of these cases, communications is considered interrupted.
4660
4661    If the tReceive timer expires, the connection MUST be dropped.  The
4662    reject reason 17: "No traffic within sufficient time" is placed in
4663    the DISCONNECT message sent prior to dropping the TCP connection.
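
        A minimal sketch of the bookkeeping implied by the two lists
        above follows.  The class and attribute names are hypothetical;
        only the three conditions, the two failure causes, and reject
        reason 17 come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CommsStatus:
    """Tracks the three conditions for communications being "ok"."""
    tcp_established: bool = False
    connect_exchanged: bool = False   # CONNECT / CONNECTACK completed
    state_received: bool = False      # partner's STATE message arrived

    @property
    def ok(self) -> bool:
        return (self.tcp_established and self.connect_exchanged
                and self.state_received)

    def on_failure(self, treceive_expired: bool) -> List[Tuple[str, int]]:
        """Handle a TCP error or tReceive expiry; return messages to send."""
        to_send = []
        if treceive_expired and self.tcp_established:
            # Reject reason 17: "No traffic within sufficient time".
            to_send.append(("DISCONNECT", 17))
        # Either way, communications is now interrupted.
        self.tcp_established = False
        self.connect_exchanged = False
        self.state_received = False
        return to_send
```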
4664
4665    Several difficulties arise when trying to use one TCP connection
4666    both for bulk data transfer and to sense the communications status
4667    of the other server.  One aspect of the problem stems from the
4668    differing requirements of the two uses.  The bulk data transfer is of
4669    course critically important to the protocol, but the speed with which
4670    it is processed is not terribly significant.  It might well be
4671    minutes before a BNDUPD message is processed, and while not optimal,
4672    such an occasional delay doesn't compromise the correctness of the
4673    protocol. However, the speed with which one server detects the other
4674    server is up (or, more importantly, down) is more highly constrained.
4675    Generally one server should be able to detect that the other server
4676    is not communicating within a minute or less.
4677
4678    These differing time constraints make it difficult to use the same
4679    TCP connection both for data transfer and to sense communications
4680    integrity.  See section 3.5 for additional details on TCP.
4681
4682    The solution to this problem is to require that some message be
4683    received by each end of the connection within a limited time;
4684    otherwise the connection is considered down.  If no messages have
4685    been sent recently, then a CONTACT message is sent.
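
        The decision described in this paragraph might be sketched as
        shown below.  The function name and return strings are
        hypothetical, and the tSend and tReceive periods are passed in as
        parameters because their values are not given in this section.

```python
def keepalive_action(now: float, last_sent: float, last_received: float,
                     t_send: float, t_receive: float) -> str:
    """Pick the next keepalive step for one failover connection."""
    if now - last_received >= t_receive:
        # Nothing heard within tReceive: the connection is considered down.
        return "drop-connection"
    if now - last_sent >= t_send:
        # Nothing sent recently: keep the partner's tReceive from expiring.
        return "send-CONTACT"
    return "wait"
```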
4686
4687    In the case where there is no data queued to be sent, this is not a
4688    problem, but in the case where there is data queued to be sent to the
4689    partner, then the CONTACT message will not actually be transmitted
4690    until the queued data is sent.  Section 3.5 explains why waiting for
4691    TCP to determine that the connection is down is not acceptable, and
4692    leads to a requirement that the receiving server never block the
4693    sending server from sending CONTACT messages.
4694
4695    In order to meet this requirement, each server tells the other server
4696    the number of outstanding BNDUPD messages that it will accept.  The
4697    receiving server is required to always be able to accept that many
4698    BNDUPD messages off of the connection's input queue even if it cannot
4699    process them immediately, and to accept all other messages immedi-
4700    ately.
4701
4702    Thus, the sending server's TCP is never blocked from sending a
4711    message except for very short periods (less than a few seconds), unless
4712    the network connection itself has problems.  In this case, if the
4713    CONTACT messages don't make it to the partner then the partner will
4714    close the connection.
4715
4716    DISCUSSION:
4717
4718       When implementing this capability, one needs to be careful when
4719       sending any message on the TCP connection as TCP can easily block
4720       the server if the local TCP send buffers are full.  This can't be
4721       prevented because if the receiver is not reachable (via the net-
4722       work), the sending TCP can't send and thus it will be unable to
4723       empty the local TCP send buffers.  So, all send operations either
4724       need to assume they may block for some time or non-blocking sends
4725       must be used carefully.
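
        One way to use non-blocking sends carefully is sketched below.
        This is an assumption about a possible implementation, not part
        of the protocol: the class name and its queueing policy are
        hypothetical.  Unsent bytes stay queued, so the server is never
        stalled by a full TCP send buffer.

```python
import socket
from collections import deque


class NonBlockingSender:
    """Queue outgoing bytes and write only what the kernel will accept."""

    def __init__(self, sock: socket.socket) -> None:
        sock.setblocking(False)
        self.sock = sock
        self.pending = deque()

    def queue(self, data: bytes) -> None:
        self.pending.append(data)

    def flush(self) -> bool:
        """Send as much as possible; return True once nothing is pending."""
        while self.pending:
            chunk = self.pending[0]
            try:
                sent = self.sock.send(chunk)
            except BlockingIOError:
                return False              # send buffer full; retry later
            if sent < len(chunk):
                self.pending[0] = chunk[sent:]   # keep the unsent tail
            else:
                self.pending.popleft()
        return True
```

        A caller would invoke flush() again when the socket is reported
        writable, for example by select().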
4726
4727 8.4.  Using the TCP connection for binding data
4728
4729    Binding data, in the form of BNDUPD messages and BNDACK messages to
4730    respond to them, are sent across the TCP connection.
4731
4732    In order to support timely detection of any failure in the partner
4733    server, the TCP connection MUST NOT block for more than a very short
4734    time, on the order of a few seconds.  Therefore, a server that is
4735    sending BNDUPD messages MUST send only a restricted number before
4736    receiving BNDACK messages about previous messages sent.
4737
4738    The number of outstanding BNDUPD messages that each server will
4739    accept without causing TCP to block transmission of additional data
4740    (i.e., CONTACT messages) is sent by each server in the CONNECT and
4741    CONNECTACK messages in the max-unacked-bndupd option.
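
        The restriction on outstanding BNDUPD messages amounts to a
        simple send window.  The sketch below is illustrative; the class
        and method names are hypothetical, and only the limit itself
        (the partner's max-unacked-bndupd value) comes from the text.

```python
class BndupdWindow:
    """Limits unacknowledged BNDUPD messages on one connection."""

    def __init__(self, max_unacked: int) -> None:
        self.max_unacked = max_unacked   # partner's max-unacked-bndupd
        self.unacked = set()             # XIDs still awaiting a BNDACK

    def can_send(self) -> bool:
        return len(self.unacked) < self.max_unacked

    def on_send(self, xid: int) -> None:
        if not self.can_send():
            raise RuntimeError("would exceed partner's max-unacked-bndupd")
        self.unacked.add(xid)

    def on_bndack(self, xid: int) -> None:
        # A BNDACK frees one slot in the window.
        self.unacked.discard(xid)
```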
4742
4743 8.5.  Using the TCP connection for control messages
4744
4745    The TCP connection is used for control messages (POOLREQ, UPDREQ,
4746    UPDREQALL, STATE, and CONTACT) and the corresponding reply messages
4747    (POOLRESP and UPDDONE).  A server MUST immediately accept all of these
4748    messages from the TCP connection.  A server MUST also immediately
4749    accept any BNDACK that is received.
4750
4751 8.6.  Losing the TCP connection
4752
4753    When the TCP connection is lost, then communications is not ok with
4754    the other server.  A server which has lost communications SHOULD
4755    immediately attempt to reconnect to the other server, and should
4756    retry these connection attempts periodically.
4757
4758    An acknowledgement message (BNDACK, POOLRESP, UPDDONE) can
4767    only be sent in response to a request message (BNDUPD, POOLREQ,
4768    UPDREQ, UPDREQALL) on the same TCP connection from which the request
4769    was received, in part since the XIDs in the request messages are
4770    guaranteed unique only during the life of a single TCP connection.
4771
4772    When a connection to a partner server goes down, a server with unpro-
4773    cessed request messages MAY simply drop all of those messages, since
4774    it can be sure that the partner will resend them when they are next
4775    in communications.  A server with unprocessed BNDUPD messages when a
4776    TCP connection goes down MAY instead choose to process those BNDUPD
4777    messages, but it MUST NOT send any BNDACK messages in response (again
4778    because of the issues surrounding XID uniqueness).
4779
4780    When the TCP connection is closed explicitly, the DISCONNECT message
4781    with a reject-reason option (and, ideally, a message option) MUST be
4782    sent over the TCP connection.
4783
4784 9.  Failover Endpoint States
4785
4786    This section discusses the various states that a failover endpoint
4787    may take, and the server actions required when entering the state,
4788    operating in the state, and leaving the state, as well as the events
4789    that cause transitions out of the state into another state.
4790
4791    The state transition diagram in Figure 9.2-1 is relevant for this
4792    section. This is the common state transition diagram for both servers
4793    in a failover pair.  In the event that the textual description of a
4794    state differs from the state transition diagram, the textual descrip-
4795    tion is to be considered authoritative.
4796
4797 9.1.  Server Initialization
4798
4799    When a server starts, it begins in STARTUP state.  See section 9.3
4800    below for details.
4801
4802 9.2.  Server State Transitions
4803
4804    Whenever a server makes a transition into a new state, it MUST record
4805    the state and the time at which it entered that state in stable
4806    storage.  If communications is "ok", it MUST also send a STATE mes-
4807    sage to its failover partner.
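
        The requirement above might be met along the following lines.
        This is a sketch under stated assumptions: the JSON file format,
        the function name, and the "send_state" callback are all
        hypothetical.  The record is written to a temporary file and
        renamed into place so that a crash cannot leave a partial record.

```python
import json
import os
import tempfile
import time


def record_state(path, state, comms_ok, send_state, now=None):
    """Persist the new state, then notify the partner if reachable."""
    entry = {"state": state,
             "entered": time.time() if now is None else now}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(entry, f)
        f.flush()
        os.fsync(f.fileno())     # force the record out to the disk
    os.replace(tmp, path)        # atomically replace the old record
    if comms_ok:
        send_state(state)        # hypothetical callback sending STATE
    return entry
```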
4808
4809    Figure 9.2-1 is the diagram of the server state transitions. The
4810    remainder of this section contains information important to the
4811    understanding of that diagram.
4812
4813    The server stays in the current state until all of the actions speci-
4814    fied on the state transition are complete.  If communications fails
4823    during one of the actions, the server simply stays in the current
4824    state and attempts a transition whenever the conditions for a transi-
4825    tion are later fulfilled.
4826
4827    In the state transition diagram below, the "+" or "-" in the upper
4828    right corner of each state is a notation about whether communication
4829    is ongoing with the other server.
4830
4831    The legend "responsive", "balanced", or "unresponsive" in each state
4832    indicates whether the server is responsive to all DHCP client
4833    requests, running in load balanced mode, or totally unresponsive in
4834    the respective state.  The terms "responsive" and "unresponsive" have
4835    the obvious meanings, while "balanced" means that a DHCP server may
4836    respond to all DHCPREQUEST messages that are RENEWAL or REBINDING,
4837    and to all other messages from clients for which the load balancing
4838    algorithm indicates that it MUST respond.  See sections 5.3 and
4839    9.8.2 for details on load balancing.
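
        The three legends can be read as a simple predicate, sketched
        below.  The function and parameter names are hypothetical, and
        "lb_selects_me" stands in for the result of the load balancing
        algorithm, which is not reproduced here.

```python
def should_respond(mode, msg_type, client_state=None, lb_selects_me=False):
    """Decide whether a server answers one DHCP client message.

    `mode` is the legend of the current failover state: "responsive",
    "balanced", or "unresponsive".
    """
    if mode == "responsive":
        return True
    if mode == "unresponsive":
        return False
    if mode != "balanced":
        raise ValueError(f"unknown mode: {mode}")
    # Balanced: renewing and rebinding clients may always be served.
    if msg_type == "DHCPREQUEST" and client_state in ("RENEWAL", "REBINDING"):
        return True
    # Everything else only when load balancing assigns the client to us.
    return lb_selects_me
```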
4840
4841    Note that in situations where a server does not respond to a DHCP
4842    client message, it MUST NOT remember any of the information from that
4843    message.
4844
4845    In the state transition diagram below, when communication is reesta-
4846    blished between the two servers, each must record the state of the
4847    partner when communication was restored.  State transitions on one
4848    server in some cases imply state transitions on the partner server,
4849    so a record of the current state of the partner server must be kept
4850    by each server.
4851
4852    If the state of the partner changes while communicating, a server
4853    moves through the communications-failed transition and into whatever
4854    state results.  It then immediately moves through whatever state
4855    transition is appropriate given the current state of the partner
4856    server.  A server performing this operation SHOULD NOT close the TCP
4857    connection to its partner.
4858
4859    DISCUSSION:
4860
4861       The point of this technique is simplicity, both in explanation of
4862       the protocol and in its implementation.  The alternative to this
4863       technique of memory of partner state and automatic state transi-
4864       tion on change of partner state is to have every state in the fol-
4865       lowing diagram have a state transition for every possible state of
4866       the partner.  With the approach adopted, only the states in which
4867       communications are reestablished require a state transition for
4868       each possible partner state.
4869
4870    The current state of a server MUST be recorded in stable storage and
4879    thus be available to the server after a server restart.
4880
4881    A transition into SHUTDOWN or PAUSED state is not represented in the
4882    following figure, since other than sending that state to its partner,
4883    the remaining actions involved look just like the server halting in
4884    its otherwise current state, which then becomes the previous state
4885    upon server restart.
4886
4936         +---------------+  V  +--------------+
4937         |    RECOVER -|+|  |  |   STARTUP  - |
4938         |(unresponsive) |  +->+(unresponsive)|
4939         +------+--------+     +--------------+
4940         +-Comm. OK             +-----------------+
4941         |     Other State:     |  PARTNER DOWN - +<----------------------+
4942         |    RESOLUTION-INTER. | (responsive)    |                       ^
4943        All     POTENTIAL-      +----+------------+                       |
4944       Others   CONFLICT------------ | --------+                          |
4945         |      CONFLICT-DONE     Comm. OK     |     +--------------+     |
4946      UPDREQ or                 Other State:   |  +--+ RESOLUTION - |     |
4947      UPDREQALL                  |       |     |  |  | INTERRUPTED  |     |
4948      Rcv UPDDONE             RECOVER    All   |  |  | (responsive) |     |
4949         |  +---------------+    |      Others |  |  +------------+-+     |
4950         +->+RECOVER-WAIT +-| RECOVER    |     |  |         ^     |       |
4951            |(unresponsive) |  WAIT or   |     |  Comm.     |    Ext.     |
4952            +-----------+---+  DONE      |     |  OK     Comm.   Cmd----->+
4953     Comm.---+     Wait MCLT     |       V     V  V     Failed            |
4954     Changed |          V    +---+   +---+-----+--+-+       |             |
4955      |  +---+----------++   |       |  POTENTIAL + +-------+             |
4956      |  |RECOVER-DONE +-|  Wait     |  CONFLICT    +------+              |
4957      +->+(unresponsive) |  for      |(unresponsive)|   Primary           |
4958         +------+--------+  Other  +>+----+--------++   resolve     Comm. |
4959          Comm. OK          State: |      |        ^    conflict  Changed |
4960     +---Other State:-+   RECOVER  |   Secondary   |       V       V   |  |
4961     |    |           |     DONE   |    resolve    |   ++----------+---++ |
4962     | All Others:  POTENT.  |     |   conflict    |   |CONFLICT-DONE-|+| |
4963     | Wait for    CONFLICT- | ----+    see (9.10) |   | (responsive)   | |
4964     | Other State:          V            V        |   +------+---------+ |
4965     | NORMAL or RECOVER    ++------------+---+      Other State: NORMAL  |
4966     |    |       DONE      |     NORMAL    + +<--------------+           |
4967     |    +--+----------+-->+   (balanced)    +-------External Command--->+
4968     |       ^          ^   +--------+--------+       or Other State:     |
4969     |       |          |            |             |  SHUTDOWN            |
4970     |   Wait for   Comm. OK  Comm. Failed or      |                      |
4971     |    Other      Other    Other State: PAUSED  |               External
4972     |    State:     State:          |             |                Command
4973     | RECOVER-DONE  NORMAL     Start Safe      Comm. OK                or
4974     |       |     COMM. INT.  Period Timer    Other State:            Safe
4975     |    Comm. OK.     |            V          All Others           Period
4976     |   Other State:   |  +---------+--------+    |             expiration
4977     |     RECOVER      +--+ COMMUNICATIONS - +----+                      |
4978     |       +-------------+   INTERRUPTED    |                           |
4979     RECOVER               |  (responsive)    +-------------------------->+
4980     RECOVER-WAIT--------->+------------------+
4981                     Figure 9.2-1:  Server state diagram.
4982
4992 9.3.  STARTUP state
4993
4994    The STARTUP state affords an opportunity for a server to probe its
4995    partner server, before starting to service DHCP clients.
4996
4997    DISCUSSION:
4998
4999       Without the STARTUP state, a server would likely start in a state
5000       derived from its previously stored state (held in stable storage),
5001       if any.  However, this may be inconsistent with the current state
5002       of the partner.  The STARTUP state affords the opportunity for a
5003       server to potentially learn the partner's state and determine if
5004       that state is consistent with its derived starting state or
5005       whether some significant state change has occurred at the partner
5006       that forces the server to start in another state.  This is
5007       especially critical if significant time has elapsed while the
5008       server was down.
5009
5010
5011 9.3.1.  Operation while in STARTUP state
5012
5013    Whenever a server is in STARTUP state, it MUST be unresponsive to
5014    DHCP client requests, and so the time spent in the STARTUP state is
5015    necessarily short, typically on the order of a few seconds to a few
5016    tens of seconds.  The exact time spent in the STARTUP state is imple-
5017    mentation dependent, and the primary and secondary server are not
5018    required to spend the same amount of time in the STARTUP state.  See
5019    section 5.9 for some guidelines on the time to spend in STARTUP
5020    state.
5021
5022    Whenever a STATE message is sent to the partner while in STARTUP
5023    state the STARTUP bit MUST be set in the server-flags option and the
5024    previously recorded failover state MUST be placed in the server-state
5025    option.
5026
5027
5028 9.3.2.  Transition out of STARTUP state
5029
5030    Each server starts out in STARTUP state every time it initializes
5031    itself, and performs the following algorithm as part of its initiali-
5032    zation:
5033
5034       1.  Is there any record in stable storage of a previous failover
5035           state?  If yes, set previous-state to the last recorded state
5036           in stable storage, and continue with step 2.
5037
5038           Is there any configuration information that indicates that
5047           this server was previously running but lost its stable
5048           storage?  Such information must typically come from some
5049           administrative intervention, since it is difficult for a
5050           server to distinguish first startup from a startup after it
5051           has lost its stable storage.  If yes, then set the previous-
5052           state to RECOVER, and set the time-of-failure to whatever time
5053           was configured, and go on to step 2.  This time-of-failure
5054           will be used in the transition out of the RECOVER-WAIT state
5055           into the RECOVER-DONE state, below.
5056
5057           If there is no record of any previous failover state in stable
5058           storage for this server, then set the previous-state to
5059           RECOVER and set the time-of-failure to a time more than the
5060           maximum-client-lead-time before now.  If using standard POSIX
5061           times, 0 would typically do quite well.  This will allow two
5062           servers which already have lease information to synchronize
5063           themselves prior to operating.
5064
5065           Note that neither server is responsive to DHCP client requests
5066           while in the RECOVER state.  If both servers can communicate,
5067           however, they will come out of the RECOVER state and progress
5068           through RECOVER-WAIT to RECOVER-DONE and thence to NORMAL or
5069           COMMUNICATIONS-INTERRUPTED state quickly.  If both have state,
5070           then they will exchange information.  If only one has state,
5071           then the one that does not will complete its update of its
5072           partner quickly (since it has nothing to send).
5073
5074           In some cases, an existing server will be commissioned as a
5075           failover server and brought back into operation where its
5076           partner is not yet available.  In this case, the newly
5077           commissioned failover server will not operate until its partner
5078           comes online -- but it has operational responsibilities as a
5079           DHCP server nonetheless.  To properly handle this situation, a
5080           server SHOULD be configurable in such a way as to move
5081           directly into PARTNER-DOWN state after the startup period
5082           expires if it has been unable to contact its partner during
5083           the startup period.
5084
5085       2.  If the previous state is one where communications was "OK",
5086           then set the previous state to the state that is the result of
5087           the communications failed state transition in Figure 9.2-1 (if
5088           such a transition is shown -- some states don't have a
5089           communications failed state transition, since they allow both
5090           communications OK and failed).
5091
5092       3.  Start the STARTUP state timer.  The time that a server remains
5093           in the STARTUP state (absent any communications with its
5094           partner) is implementation dependent and SHOULD be
5103           configurable.  It SHOULD be long enough for a TCP connection
5104           to be created to a heavily loaded partner across a slow net-
5105           work.
5106
5107       4.  Attempt to create a TCP connection to the failover partner.
5108           See section 8.2.
5109
5110       5.  Wait for "communications okay", i.e., the process discussed in
5111           section 8.2 "Creating the TCP Connection", to complete,
5112           including the receipt of a STATE message from the partner.
5113
5114           When and if communications become "okay", clear the STARTUP
5115           flag, and set the current state to the previous-state.
5116
5117           If the partner is in PARTNER-DOWN state, and if the time at
5118           which it entered PARTNER-DOWN state (as received in the
5119           start-time-of-state option in the STATE message) is later than
5120           the last recorded time of operation of this server, then set
5121           the current state to RECOVER.  If the time at which it entered
5122           PARTNER-DOWN state is earlier than the last recorded time of
5123           operation of this server, then set the current state to
5124           POTENTIAL-CONFLICT.
5125
5126           Then, transition to the current state and take the "communica-
5127           tions okay" state transition based on the current state of
5128           this server and the partner.
5129
5130       6.  If the startup time expires, take an implementation dependent
5131           action:  The server MAY go to the previous-state, or the
5132           server MAY wait.
5133
5134           Reasons to go to previous-state and begin processing:
5135
5136           If the current server is the only operational server, then if
5137           it waits, there will be no operational DHCP servers.  This
5138           situation could occur very easily where one server fails and
5139           then the other crashes and reboots.  If the rebooting server
5140           doesn't start processing DHCP client requests without first
5141           being in communication with the other server, then the level
5142           of DHCP redundancy is not particularly high.  This is an
5143           appropriate approach if the possibility of partition is low,
5144           or if the safe period expiration time is well beyond the time
5145           at which an operator would notice and react to a partition
5146           situation.  It is also quite appropriate if the safe period
5147           will never expire.
5148
5149           Reasons to wait:
5150
5159           If the current server has been down for longer than the
5160           maximum-client-lead-time, and it is partitioned from the other
5161           server, then when it returns it will attempt to use its own
5162           available addresses to allocate to new DHCP clients, and the
5163           other server may well be in PARTNER-DOWN state and may have
5164           already allocated some of those available addresses to DHCP
5165           clients.  In cases where the possibility of partition is high,
5166           and the safe period expiration time is less than the likely
5167           operator reaction time, this is a good approach to use.
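
        Steps 1, 2, and 5 of the algorithm above can be sketched as
        follows.  The function names and arguments are hypothetical, and
        the table of communications-failed transitions is an
        interpretation of Figure 9.2-1 rather than a normative list.  The
        case where the two times in step 5 are equal is not covered by
        the text, so the sketch leaves the previous-state unchanged.

```python
# Communications-failed transitions as read from Figure 9.2-1.  States
# that allow both communications OK and failed are absent from the table.
COMM_FAILED = {
    "NORMAL": "COMMUNICATIONS-INTERRUPTED",
    "POTENTIAL-CONFLICT": "RESOLUTION-INTERRUPTED",
}


def initial_previous_state(stored_state, lost_storage_time=None):
    """Steps 1 and 2: return (previous-state, time-of-failure or None)."""
    if stored_state is not None:
        previous, time_of_failure = stored_state, None
    elif lost_storage_time is not None:
        # An operator configured that stable storage was lost.
        previous, time_of_failure = "RECOVER", lost_storage_time
    else:
        # First startup: POSIX time 0 is long before now minus the MCLT.
        previous, time_of_failure = "RECOVER", 0
    return COMM_FAILED.get(previous, previous), time_of_failure


def state_after_contact(previous_state, partner_state,
                        partner_state_start, last_operation_time):
    """Step 5: choose the current state once communications are okay."""
    if partner_state == "PARTNER-DOWN":
        if partner_state_start > last_operation_time:
            return "RECOVER"
        if partner_state_start < last_operation_time:
            return "POTENTIAL-CONFLICT"
    return previous_state
```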
5168
5169 9.4.  PARTNER-DOWN state
5170
5171    PARTNER-DOWN state is a state either server can enter.  When in this
5172    state, the server does not assume that the other server could still
5173    be operating and servicing a different set of clients, but instead
5174    assumes that it is the only server operating. If one server is in
5175    PARTNER-DOWN state, the other server MUST NOT be operating.
5176
5177
5178 9.4.1.  Upon entry to PARTNER-DOWN state
5179
5180    No special actions are required when entering PARTNER-DOWN state.
5181
5182    The server should continue to attempt to connect to the partner
5183    periodically.
5184
5185
5186 9.4.2.  Operation while in PARTNER-DOWN state
5187
5188    A server in PARTNER-DOWN state MUST respond to DHCP client requests.
5189    It will allow renewal of all outstanding leases on IP addresses, and
5190    will allocate IP addresses from its own pool, and after a fixed
5191    period of time (the MCLT interval) has elapsed from entry into
5192    PARTNER-DOWN state, it will allocate IP addresses from the set of all
5193    available IP addresses.
5194
5195    Once a server has entered NORMAL state, the PARTNER-DOWN state is
5196    entered only on command of an external agency (typically an adminis-
5197    trator of some sort) or after the expiration of an externally config-
5198    ured minimum safe-time after the beginning of COMMUNICATIONS-
5199    INTERRUPTED state.
5200
5201    Any IP address tagged as available for allocation by the other server
5202    (at entry to PARTNER-DOWN state) MUST NOT be allocated to a new
5203    client until the maximum-client-lead-time beyond the entry into
5204    PARTNER-DOWN state has elapsed.
5205
5206    A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
5215    DHCP client different from that to which it was allocated at the
5216    entrance to PARTNER-DOWN state until the maximum-client-lead-time
5217    beyond the maximum of the following times: client expiration time,
5218    most recently transmitted potential-expiration-time, most recently
5219    received ack of potential-expiration-time from the partner, and most
5220    recently acked potential-expiration-time to the partner.  See section
5221    7.1.5 for details.  If this time would be earlier than the current
5222    time plus the maximum-client-lead-time, then the time the server
5223    entered PARTNER-DOWN state plus the maximum-client-lead-time is used.
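
        The reallocation rule above can be sketched as follows.  This is a
        minimal illustration of one literal reading of the paragraph, not
        a normative algorithm; the function name, the parameter names, and
        the use of plain numeric timestamps are assumptions made here.

```python
def earliest_reallocation_time(client_expiration, sent_pet,
                               partner_acked_pet, acked_to_partner_pet,
                               partner_down_entry, mclt, now):
    """Earliest time an address may be given to a different client.

    All arguments are numeric timestamps (seconds), except mclt, which
    is a duration in seconds.  The names are hypothetical.
    """
    # MCLT beyond the maximum of the four times listed in the text.
    candidate = max(client_expiration, sent_pet,
                    partner_acked_pet, acked_to_partner_pet) + mclt
    if candidate < now + mclt:
        # Fallback named in the text: entry into PARTNER-DOWN plus MCLT.
        candidate = partner_down_entry + mclt
    return candidate
```

        Section 7.1.5 remains the authoritative description of which times
        are stored for each IP address.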
5224
5225    Two options exist for lease times given out while in PARTNER-DOWN
5226    state, with different ramifications flowing from each.
5227
5228    If the server wishes the Failover protocol to protect it from loss of
5229    stable storage in PARTNER-DOWN state, then it should ensure that the
5230    MCLT based lease time restrictions in section 5.1 are maintained,
5231    even in PARTNER-DOWN state.
5232
5233    If the server wishes to forgo the protection of the Failover proto-
5234    col in the event of loss of stable storage, then it need recognize no
5235    restrictions on actual client lease times while in PARTNER-DOWN
5236    state.
5237
5238    A server in PARTNER-DOWN state MUST continue to attempt to establish
5239    communications and synchronization with its partner.
5240
5241 9.4.3.  Transitions out of PARTNER-DOWN state
5242
5243    When a server in PARTNER-DOWN state succeeds in establishing a con-
5244    nection to its partner, its actions are conditional on the state and
5245    flags received in the STATE message from the other server as part of
5246    the process of establishing the connection.
5247
5248    If the STARTUP bit is set in the server-flags option of a received
5249    STATE message, a server in PARTNER-DOWN state MUST NOT take any state
5250    transitions based on reestablishing communications. Essentially, if a
5251    server is in PARTNER-DOWN state, it ignores all STATE messages from
5252    its partner that have the STARTUP bit set in the server-flags option
5253    of the STATE message.
5254
5255    If the STARTUP bit is not set in the server-flags option of a STATE
5256    message received from its partner, then a server in PARTNER-DOWN
5257    state takes the following actions based on the value of the server-
5258    state option in the received STATE message (either immediately after
5259    establishing communications or at any time later when a new state is
5260    received):
5261
5262       o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN,
5271         POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE
5272         state
5273
5274         transition to POTENTIAL-CONFLICT state
5275
5276       o partner in RECOVER, RECOVER-WAIT, SHUTDOWN, or PAUSED state
5277
5278         stay in PARTNER-DOWN state
5279
5280       o partner in RECOVER-DONE state
5281
5282         transition into NORMAL state
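
        The transition list above can be expressed as a lookup table.  The
        sketch below is illustrative only; the dictionary and function
        names are assumptions, and states are represented as plain
        strings.

```python
# Partner state (from the received STATE message) -> next local state,
# for a server currently in PARTNER-DOWN state.
PARTNER_DOWN_TRANSITIONS = {
    "NORMAL": "POTENTIAL-CONFLICT",
    "COMMUNICATIONS-INTERRUPTED": "POTENTIAL-CONFLICT",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "RESOLUTION-INTERRUPTED": "POTENTIAL-CONFLICT",
    "CONFLICT-DONE": "POTENTIAL-CONFLICT",
    "RECOVER": "PARTNER-DOWN",
    "RECOVER-WAIT": "PARTNER-DOWN",
    "SHUTDOWN": "PARTNER-DOWN",
    "PAUSED": "PARTNER-DOWN",
    "RECOVER-DONE": "NORMAL",
}


def next_state_from_partner_down(partner_state, startup_bit_set):
    """Return the next state for a server in PARTNER-DOWN state."""
    if startup_bit_set:
        # STATE messages with the STARTUP bit set cause no transition.
        return "PARTNER-DOWN"
    return PARTNER_DOWN_TRANSITIONS[partner_state]
```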
5283
5284 9.5.  RECOVER state
5285
5286    This state indicates that the server has no information in its stable
5287    storage or that it is re-integrating with a server in PARTNER-DOWN
5288    state after it has been down.  A server in this state MUST attempt to
5289    refresh its stable storage from the other server.
5290
5291 9.5.1.  Operation in RECOVER state
5292
5293    A server in RECOVER state MUST NOT respond to DHCP client requests.
5294
5295    A server in RECOVER state will attempt to reestablish communications
5296    with the other server.
5297
5298 9.5.2.  Transitions out of RECOVER state
5299
5300    If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED,
5301    or CONFLICT-DONE state when communications are reestablished, then
5302    the server in RECOVER state will move to POTENTIAL-CONFLICT state
5303    itself.
5304
5305    If the other server is in any other state, then the server in RECOVER
5306    state will request an update of missing binding information by send-
5307    ing an UPDREQ or UPDREQALL message.  If the server has been
5308    instructed (through configuration or other external agency) that it
5309    has lost its stable storage, or if it has deduced that from the fact
5310    that it has no record of ever having talked to its partner, while its
5311    partner does have a record of communicating with it, it MUST send an
5312    UPDREQALL message; otherwise it MUST send an UPDREQ message.  See
5313    Figure 9.5.2-1.
5314
5315    It will wait for an UPDDONE message, and upon receipt of that message
5316    it will transition to RECOVER-WAIT state.
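
        The choice between UPDREQ and UPDREQALL described above can be
        sketched as a small decision function.  The function and parameter
        names are hypothetical; how a server learns each of the three
        inputs is outside the scope of this sketch.

```python
def choose_update_request(told_storage_lost, has_record_of_partner,
                          partner_has_record_of_us):
    """Pick the message a server in RECOVER state sends to its partner."""
    # Loss of stable storage is deduced when this server has no record
    # of the partner while the partner does have a record of this server.
    deduced_loss = (not has_record_of_partner) and partner_has_record_of_us
    if told_storage_lost or deduced_loss:
        return "UPDREQALL"
    return "UPDREQ"
```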
5317
5318    If communications fails during the reception of the results of the
5327    UPDREQ or UPDREQALL message, the server will remain in RECOVER state,
5328    and will re-issue the UPDREQ or UPDREQALL when communications are
5329    re-established.  (See section 5.17).
5330
5331    If an UPDDONE message isn't received within an implementation depen-
5332    dent amount of time, and no BNDUPD messages are being received, the
5333    connection SHOULD be dropped.
5334
5335
5336
5337
5338                 A                                        B
5339               Server                                  Server
5340
5341                 |                                        |
5342              RECOVER                               PARTNER-DOWN
5343                 |                                        |
5344                 | >--UPDREQ-------------------->         |
5345                 |                                        |
5346                 |        <---------------------BNDUPD--< |
5347                 | >--BNDACK-------------------->         |
5348                ...                                      ...
5349                 |                                        |
5350                 |        <---------------------BNDUPD--< |
5351                 | >--BNDACK-------------------->         |
5352                 |                                        |
5353                 |        <--------------------UPDDONE--< |
5354                 |                                        |
5355            RECOVER-WAIT                                  |
5356                 |                                        |
5357                 | >--STATE-(RECOVER-WAIT)------>         |
5358                 |                                        |
5359                 |                                        |
5360        Wait MCLT from last known                         |
5361           time of failover operation                     |
5362                 |                                        |
5363            RECOVER-DONE                                  |
5364                 |                                        |
5365                 | >--STATE-(RECOVER-DONE)------>         |
5366                 |                                     NORMAL
5367                 |        <-------------(NORMAL)-STATE--< |
5368              NORMAL                                      |
5369                 | >--STATE-(NORMAL)------------>         |
5370                 |                                        |
5371                 |                                        |
5372
5373               Figure 9.5.2-1:  Transition out of RECOVER state
5374
5384    If, at any time while a server is in RECOVER state, communications
5385    fails, the server will stay in RECOVER state.  When communications
5386    are restored, it will restart the process of transitioning out of
5387    RECOVER state.
5388
5389 9.6.  RECOVER-WAIT state
5390
5391    This state indicates that the server has done an UPDREQ or UPDREQALL
5392    and has received the UPDDONE message indicating that it has received
5393    all outstanding binding update information.  In the RECOVER-WAIT
5394    state the server will wait for the MCLT in order to ensure that any
5395    processing that this server might have done prior to losing its
5396    stable storage will not cause future difficulties.
5397
5398 9.6.1.  Operation in RECOVER-WAIT state
5399
5400    A server in RECOVER-WAIT MUST NOT respond to DHCP client requests.
5401
5402 9.6.2.  Transitions out of RECOVER-WAIT state
5403
5404    Upon entry to RECOVER-WAIT state the server MUST start a timer whose
5405    expiration is set to a time equal to the time the server went down
5406    (if known) or the time the server started (if the down-time is
5407    unknown) plus the maximum-client-lead-time.  When this timer goes
5408    off, the server will transition into RECOVER-DONE state.
5409
5410    This is to allow clients holding IP addresses allocated by this
5411    server prior to loss of its client binding information in stable
5412    storage to contact the other server, or their leases to time out.
5413
5414    If this is the first time this server has run failover -- as
5415    determined by the information received from the partner, not
5416    necessarily only as determined by this server's stable storage (as
5417    that may have been lost) -- then the waiting time discussed above may
5418    be skipped, and the server may transition immediately to RECOVER-DONE
5419    state.
5420
5421    See Figure 9.5.2-1.
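
        The timer rule in this section can be sketched as follows.  This
        is an illustration under stated assumptions: timestamps are plain
        numbers, the names are hypothetical, and a return value of None is
        used here to mean that no wait is needed.

```python
def recover_wait_expiry(start_time, mclt, down_time=None,
                        first_failover_run=False):
    """Time at which a server may leave RECOVER-WAIT, or None to skip."""
    if first_failover_run:
        # Never ran failover before (as reported by the partner):
        # the wait may be skipped.
        return None
    # Use the time the server went down if known, else its start time.
    base = down_time if down_time is not None else start_time
    return base + mclt
```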
5422
5423    DISCUSSION:
5424
5425       The actual requirement on this wait period in RECOVER-WAIT is that
5426       it start no earlier than the time the recovering server went down,
5427       not necessarily when it came back up.  If the time of failure is
5428       known, it could be communicated to the recovering server (perhaps
5429       through actions of the network administrator), and the wait
5430       period could be reduced to the maximum-client-lead-time less
5439       the difference between the current time and the time the server
5440       failed.  In this way, the waiting period could be minimized.
5441       Various heuristics could be used to estimate this time.  For
5442       example, if the recovering server periodically updates stable
5443       storage with a time stamp, the wait period could be calculated to
5444       start at the time of the last update of stable storage plus the
5445       time required for the next update (which never occurred).  This
5446       estimate is later than the time the server went down, but probably
5447       not too much later.
5448
5449       If the server has never before run failover, then there is no need
5450       to wait in this state -- but, again, to determine if this server
5451       has run failover it is vital that the information provided by the
5452       partner be utilized, since the stable storage of this server may
5453       have been lost.
5454
5455    If communications fails while a server is in RECOVER-WAIT state, it
5456    has no effect on the operation of this state.  The server SHOULD
5457    continue to operate its timer, and if the timer goes off during the
5458    period where communications with the other server have failed, then
5459    the server SHOULD transition to RECOVER-DONE state.  This is rare --
5460    failover state transitions are not usually made while communications
5461    are interrupted, but in this case there is no reason to inhibit the
5462    timer.  A server MAY stay in RECOVER-WAIT state even after expiry of
5463    the timer and transition to RECOVER-DONE state upon re-establishing
5464    communications with the partner if desired.  The key point here is to
5465    allow the timer to continue to operate, not whether or not the state
5466    transition is made before or after communications are re-established.
5467
5468
5469 9.7.  RECOVER-DONE state
5470
5471    This state exists to allow an interlocked transition for one server
5472    from RECOVER state and another server from PARTNER-DOWN or
5473    COMMUNICATIONS-INTERRUPTED state into NORMAL state.
5474
5475 9.7.1.  Operation in RECOVER-DONE state
5476
5477    A server in RECOVER-DONE state MUST respond only to
5478    DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.
5479
5480 9.7.2.  Transitions out of RECOVER-DONE state
5481
5482    When a server in RECOVER-DONE state determines that its partner
5483    server has entered NORMAL or RECOVER-DONE state, then it will transi-
5484    tion into NORMAL state.
5485
5486    If communications fails while in RECOVER-DONE state, a server will
5495    stay in RECOVER-DONE state.
5496
5497
5498 9.8.  NORMAL state
5499
5500    NORMAL state is the state used by a server when it is communicating
5501    with the other server, and any required resynchronization has been
5502    performed.  While some bindings database synchronization is performed
5503    in NORMAL state, potential conflicts are resolved prior to entry into
5504    NORMAL state, as is any binding database data loss.
5505
5506
5507 9.8.1.  Upon entry to NORMAL state
5508
5509    When entering NORMAL state, a server will send to the other server
5510    all currently unacknowledged binding updates as BNDUPD messages.
5511
5512    When the above process is complete, if the server entering NORMAL
5513    state is a secondary server, then it will request IP addresses for
5514    allocation using the POOLREQ message.
5515
5516
5517 9.8.2.  Processing DHCP client requests and load balancing
5518
5519    In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or
5520    DHCPREQUEST/REBINDING request it receives.  It processes other
5521    requests only for those clients dictated by the load balancing
5522    algorithm specified in [RFC 3074].
5523
5524    As discussed in section 5.3, each server will take the client-
5525    identifier from each DHCP client request (or the client-hardware-
5526    address, i.e., the chaddr if no client-identifier is present in the
5527    request) and use it as the 'Request ID' specified in [RFC 3074].
5528    After applying the algorithm specified in [RFC 3074] and comparing
5529    the result with the hash bucket assignment (performed during connect
5530    processing between failover servers), each failover server will be
5531    able to unambiguously determine if it should process the DHCP client
5532    request.
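
        The selection of the Request ID and the bucket check can be
        sketched as follows.  This is only an illustration: the hash
        function is passed in as a parameter, and placeholder_hash below
        is a stand-in that is NOT the hash defined in [RFC 3074].  The
        function names and the representation of the bucket assignment as
        a set of integers are assumptions.

```python
def request_id(client_identifier, chaddr):
    """Use the client-identifier if present, otherwise the chaddr."""
    return client_identifier if client_identifier else chaddr


def placeholder_hash(data):
    """Stand-in hash into 0..255; NOT the [RFC 3074] hash function."""
    return sum(data) % 256


def should_process(client_identifier, chaddr, my_buckets, hash_fn):
    """True if the hashed Request ID falls in a bucket assigned to us."""
    bucket = hash_fn(request_id(client_identifier, chaddr))
    return bucket in my_buckets
```

        A real implementation would pass the hash specified in [RFC 3074]
        as hash_fn and derive my_buckets from the hash bucket assignment
        made during connect processing.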
5533
5534 9.8.3.  Operation in NORMAL state
5535
5536    When in NORMAL state, for every DHCP client request that it
5537    processes, as determined by the algorithm described in section 9.8.2,
5538    above, a server will operate in the following manner:
5539
5540       o Lease time calculations
5541
5542         As discussed in section 5.2.1, "Control of lease time", the
5551         lease interval given to a DHCP client can never be more than the
5552         MCLT greater than the most recently received potential-
5553         expiration-time from the failover partner or the current time,
5554         whichever is later.
5555
5556         As long as a server adheres to this constraint, the specifics of
5557         the lease interval that it gives to a DHCP client or the value
5558         of the potential-expiration-time sent to its failover partner
5559         are implementation dependent.  One possible approach is dis-
5560         cussed in section 5.2.1, but that particular approach is in no
5561         way required by this protocol.
5562
5563         See section 7.1.5 for details concerning the storage of time
5564         associated with IP addresses and how to use these times when
5565         calculating lease times for DHCP clients.
5566
5567       o Lazy update of partner server
5568
5569         After a DHCPACK of an IP address binding, the server servicing a
5570         DHCP client request attempts to update its partner with the new
5571         binding information.  The lease time used in the update of the
5572         partner MUST be at least that given to the DHCP client in the
5573         DHCPACK, and the potential-expiration-time MUST be at least the
5574         lease time, and SHOULD be considerably longer.
5575
5576       o Reallocation of IP addresses between clients
5577
5578         Whenever a client binding is released or expires, a BNDUPD mes-
5579         sage must be sent to the partner, setting the binding state to
5580         RELEASED or EXPIRED.  However, until a BNDACK is received for
5581         this message, the IP address cannot be allocated to another
5582         client.  If a BNDUPD was sent, it cannot be allocated to the
5583         same client again either; otherwise it can.  See section 5.2.2.
5584
5585    In NORMAL state, each server receives binding updates from its
5586    partner server in BNDUPD messages.  It records these in its client
5587    binding database in stable storage and then sends a corresponding
5588    BNDACK message to its partner server.  It MUST ensure that the infor-
5589    mation is recorded in stable storage prior to sending the BNDACK mes-
5590    sage back to its partner.
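
        The lease time constraint described under "Lease time
        calculations" above can be sketched as follows.  The function and
        parameter names are hypothetical, and times are plain numeric
        timestamps with the MCLT given in the same units.

```python
def max_lease_expiration(received_pet, now, mclt):
    """Latest lease expiration a server may give a client in NORMAL state.

    received_pet is the most recently received potential-expiration-time
    from the failover partner for this IP address.
    """
    return max(received_pet, now) + mclt


def clamp_lease_expiration(desired_expiration, received_pet, now, mclt):
    """Limit the desired expiration to what the constraint permits."""
    return min(desired_expiration,
               max_lease_expiration(received_pet, now, mclt))
```

        Any policy for choosing the desired expiration is acceptable as
        long as the result stays within this limit.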
5591
5592
5593 9.8.4.  Transitions out of NORMAL state
5594
5595    If an external command is received by a server in NORMAL state
5596    informing it that its partner is down, it transitions into PARTNER-
5597    DOWN state.  Generally, this would be an unusual situation, where
5598    some external agency knew the partner server was down.  Using the
5607    command in this case would be appropriate if the polling interval and
5608    timeout were long.
5609
5610    If a server in NORMAL state fails to receive acks to messages sent to
5611    its partner for an implementation dependent period of time, it MAY
5612    move into COMMUNICATIONS-INTERRUPTED state.  This situation might
5613    occur if the partner server was capable of maintaining the TCP con-
5614    nection between the servers and also capable of sending a CONTACT
5615    message every tSend seconds, but was (for some reason) incapable of
5616    processing BNDUPD messages.
5617
5618    If communications is determined not to be "ok" (as defined in section
5619    8), the server transitions into COMMUNICATIONS-INTERRUPTED state.
5620
5621    If a server in NORMAL state receives any messages from its partner
5622    where the partner has changed state from that expected by the server
5623    in NORMAL state, then the server should transition into
5624    COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
5625    sition from there.  For example, it would be expected for the partner
5626    to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
5627    the partner to transition from NORMAL into POTENTIAL-CONFLICT state.
5628
5629    If a server in NORMAL state receives any messages from its partner
5630    where the partner has changed into PAUSED state, the server should
5631    transition into COMMUNICATIONS-INTERRUPTED state.  If a server in
5632    NORMAL state receives any messages from its partner where the partner
5633    has changed into SHUTDOWN state, the server should transition into
5634    PARTNER-DOWN state.
5635
5636 9.9.  COMMUNICATIONS-INTERRUPTED State
5637
5638    A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
5639    unable to communicate with the other server.  Primary and secondary
5640    servers cycle automatically (without administrative intervention)
5641    between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
5642    connection between them fails and recovers, or as the partner server
5643    cycles between operational and non-operational.  No duplicate IP
5644    address allocation can occur while the servers cycle between these
5645    states.
5646
5647
5648 9.9.1.  Upon entry to COMMUNICATIONS-INTERRUPTED state
5649
5650    When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
5651    configured to support an automatic transition out of COMMUNICATIONS-
5652    INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period"
5653    has been configured, see section 10), then a timer MUST be started
5654    for the length of the configured safe period.
5655
5663    A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
5664    the NORMAL state SHOULD raise some alarm condition to alert adminis-
5665    trative staff to a potential problem in the DHCP subsystem.
5666
5667
5668 9.9.2.  Operation in COMMUNICATIONS-INTERRUPTED State
5669
5670    In this state a server MUST respond to all DHCP client requests, and
5671    the algorithm for load balancing described in section 5.3 MUST NOT be
5672    used.  When allocating new IP addresses, each server allocates from
5673    its own IP address pool, where the primary MUST allocate only FREE IP
5674    addresses, and the secondary MUST allocate only BACKUP IP addresses.
5675    When responding to renewal requests, each server will allow continued
5676    renewal of a DHCP client's current lease on an IP address irrespec-
5677    tive of whether that lease was given out by the receiving server or
5678    not, although the renewal period MUST NOT exceed the maximum client
5679    lead time (MCLT) beyond the latest of: 1) the potential-expiration-
5680    time already acknowledged by the other server, or 2) the lease-
5681    expiration-time, or 3) the potential-expiration-time received from
5682    the partner server.
5683
5684    However, since the server cannot communicate with its partner in this
5685    state, the acknowledged-potential-expiration time will not be updated
5686    in any new bindings.  This is likely to eventually cause the actual-
5687    client-lease-times to be the current time plus the maximum-client-
5688    lead-time (unless this is greater than the desired-client-lease-
5689    time).
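
        The renewal limit described in this section can be sketched as
        follows.  This is an illustration only; the names are
        hypothetical, and times are plain numeric timestamps.

```python
def renewal_limit(acked_pet, lease_expiration, received_pet, mclt):
    """Latest expiration allowed for a renewal in COMMUNICATIONS-INTERRUPTED.

    acked_pet:        potential-expiration-time already acknowledged
                      by the other server
    lease_expiration: the lease-expiration-time
    received_pet:     potential-expiration-time received from the partner
    """
    return max(acked_pet, lease_expiration, received_pet) + mclt


def renewed_expiration(desired_expiration, acked_pet, lease_expiration,
                       received_pet, mclt):
    """Grant the desired expiration unless it exceeds the limit."""
    return min(desired_expiration,
               renewal_limit(acked_pet, lease_expiration, received_pet, mclt))
```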
5690
5691    The server should continue to try to establish a connection with its
5692    partner.
5693
5694
5695 9.9.3.  Transition out of COMMUNICATIONS-INTERRUPTED State
5696
5697    If the safe period timer expires while a server is in the
5698    COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
5699    PARTNER-DOWN state.
5700
5701    If an external command is received by a server in COMMUNICATIONS-
5702    INTERRUPTED state informing it that its partner is down, it will
5703    transition immediately into PARTNER-DOWN state.
5704
5705    If communications is restored with the other server, then the server
5706    in COMMUNICATIONS-INTERRUPTED state will transition into another
5707    state based on the state of the partner:
5708
5709       o partner in NORMAL or COMMUNICATIONS-INTERRUPTED
5710
5719         The partner SHOULD NOT be in NORMAL state here, since upon res-
5720         toration of communications it MUST have created a new TCP con-
5721         nection which would have forced it into COMMUNICATIONS-
5722         INTERRUPTED state.  Still, we should account for every state
5723         just in case.
5724
5725         Transition into the NORMAL state.
5726
5727       o partner in RECOVER
5728
5729         Stay in COMMUNICATIONS-INTERRUPTED state.
5730
5731       o partner in RECOVER-DONE
5732
5733         Transition into NORMAL state.
5734
5735       o partner in PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
5736         RESOLUTION-INTERRUPTED
5737
5738         Transition into POTENTIAL-CONFLICT state.
5739
5740       o partner in PAUSED
5741
5742         Stay in COMMUNICATIONS-INTERRUPTED state.
5743
5744       o partner in SHUTDOWN
5745
5746         Transition into PARTNER-DOWN state.
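
        The list above can be expressed as a lookup table.  The sketch
        below is illustrative; the names are assumptions, states are plain
        strings, and partner states that the list does not mention (such
        as RECOVER-WAIT) are deliberately left out, so the function
        returns None for them.

```python
# Partner state on restored communications -> next local state, for a
# server currently in COMMUNICATIONS-INTERRUPTED state.
COMM_INTERRUPTED_TRANSITIONS = {
    "NORMAL": "NORMAL",
    "COMMUNICATIONS-INTERRUPTED": "NORMAL",
    "RECOVER": "COMMUNICATIONS-INTERRUPTED",
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "CONFLICT-DONE": "POTENTIAL-CONFLICT",
    "RESOLUTION-INTERRUPTED": "POTENTIAL-CONFLICT",
    "PAUSED": "COMMUNICATIONS-INTERRUPTED",
    "SHUTDOWN": "PARTNER-DOWN",
}


def next_state_from_comm_interrupted(partner_state):
    """Return the next state, or None if the list does not cover it."""
    return COMM_INTERRUPTED_TRANSITIONS.get(partner_state)
```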
5747
5748    The following figure illustrates the transition from NORMAL to
5749    COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.
5750
5776              Primary                                Secondary
5777               Server                                  Server
5778
5779               NORMAL                                  NORMAL
5780                 | >--CONTACT------------------->         |
5781                 |        <--------------------CONTACT--< |
5782                 |         [TCP connection broken]        |
5783            COMMUNICATIONS          :              COMMUNICATIONS
5784              INTERRUPTED           :                INTERRUPTED
5785                 |      [attempt new TCP connection]      |
5786                 |         [connection succeeds]          |
5787                 |                                        |
5788                 | >--CONNECT------------------->         |
5789                 |        <-----------------CONNECTACK--< |
5790                 |                                     NORMAL
5791                 |        <-------------------STATE-----< |
5792               NORMAL                                     |
5793                 | >--STATE--------------------->         |
5794                 |                                        |
5795                 | >--BNDUPD-------------------->         |
5796                 |        <---------------------BNDACK--< |
5797                 |                                        |
5798                 |        <---------------------BNDUPD--< |
5799                 | >--BNDACK-------------------->         |
5800                ...                                      ...
5801                 |                                        |
5802                 |        <--------------------POOLREQ--< |
5803                 | >--POOLRESP-(2)-------------->         |
5804                 |                                        |
5805                 | >--BNDUPD-(#1)--------------->         |
5806                 |        <---------------------BNDACK--< |
5807                 |                                        |
5808                 |        <--------------------POOLREQ--< |
5809                 | >--POOLRESP-(0)-------------->         |
5810                 |                                        |
5811                 | >--BNDUPD-(#2)--------------->         |
5812                 |        <---------------------BNDACK--< |
5813                 |                                        |
5814
5815        Figure 9.9.3-1:  Transition from NORMAL to COMMUNICATIONS-
5816                         INTERRUPTED and back (example with 2
5817                         addresses allocated to secondary)
5818
5832 9.10.  POTENTIAL-CONFLICT state
5833
5834    This state indicates that the two servers are attempting to re-
5835    integrate with each other, but at least one of them was running in a
5836    state that did not guarantee automatic reintegration would be
5837    possible.  In POTENTIAL-CONFLICT state the servers may determine that
5838    the same IP address has been offered and accepted by two different
5839    DHCP clients.
5840
5841    It is a goal of this protocol to minimize the possibility that
5842    POTENTIAL-CONFLICT state is ever entered.
5843
5844 9.10.1.  Upon entry to POTENTIAL-CONFLICT state
5845
5846    When a primary server enters POTENTIAL-CONFLICT state it should
5847    request that the secondary send it all updates of which it is
5848    currently unaware by sending an UPDREQ message to the secondary
5849    server.
5850
5851    A secondary server entering POTENTIAL-CONFLICT state will wait for
5852    the primary to send it an UPDREQ message.
5853
5854 9.10.2.  Operation in POTENTIAL-CONFLICT state
5855
5856    Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
5857    DHCP requests.
5858
5859
5860 9.10.3.  Transitions out of POTENTIAL-CONFLICT state
5861
5862    If communications fails with the partner while in POTENTIAL-CONFLICT
5863    state, then the server will transition to RESOLUTION-INTERRUPTED
5864    state.
5865
5866    Whenever either server receives an UPDDONE message from its partner
5867    while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
5868    The primary MUST transition to CONFLICT-DONE state, and the secondary
5869    MUST transition to NORMAL state.  This will cause the primary server
5870    to leave POTENTIAL-CONFLICT state prior to the secondary, since the
5871    primary sends an UPDREQ message and receives an UPDDONE before the
5872    secondary sends an UPDREQ message and receives its UPDDONE message.
5873
5874    When a secondary server receives an indication that the primary
5875    server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
5876    state, it SHOULD send an UPDREQ message to the primary server.
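
        The transitions out of POTENTIAL-CONFLICT state described above
        can be sketched as follows.  The function name, the role strings,
        and the event strings are hypothetical labels chosen for this
        illustration.

```python
def next_state_from_potential_conflict(role, event):
    """Next state for a server in POTENTIAL-CONFLICT state.

    role is "primary" or "secondary"; event is "UPDDONE" when an UPDDONE
    message arrives from the partner, or "COMMUNICATIONS-FAILED".
    """
    if event == "COMMUNICATIONS-FAILED":
        return "RESOLUTION-INTERRUPTED"
    if event == "UPDDONE":
        # The primary finishes first and moves to CONFLICT-DONE; the
        # secondary goes to NORMAL once its own UPDREQ has completed.
        return "CONFLICT-DONE" if role == "primary" else "NORMAL"
    # Any other event leaves the server where it is.
    return "POTENTIAL-CONFLICT"
```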
5877
5889               Primary                                Secondary
5890               Server                                  Server
5891
5892                 |                                        |
5893          POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
5894                 |                                        |
5895                 | >--UPDREQ-------------------->         |
5896                 |                                        |
5897                 |        <---------------------BNDUPD--< |
5898                 | >--BNDACK-------------------->         |
5899                ...                                      ...
5900                 |                                        |
5901                 |        <---------------------BNDUPD--< |
5902                 | >--BNDACK-------------------->         |
5903                 |                                        |
5904                 |        <--------------------UPDDONE--< |
5905           CONFLICT-DONE                                  |
5906                 | >--STATE--(CONFLICT-DONE)---->         |
5907                 |        <---------------------UPDREQ--< |
5908                 |                                        |
5909                 | >--BNDUPD-------------------->         |
5910                 |        <---------------------BNDACK--< |
5911                ...                                      ...
5912                 | >--BNDUPD-------------------->         |
5913                 |        <---------------------BNDACK--< |
5914                 |                                        |
5915                 | >--UPDDONE------------------->         |
5916                 |                                     NORMAL
5917                 |        <------------STATE--(NORMAL)--< |
5918              NORMAL                                      |
5919                 | >--STATE--(NORMAL)----------->         |
5920                 |                                        |
5921                 |        <--------------------POOLREQ--< |
5922                 | >------POOLRESP-(n)---------->         |
5923                 |              addresses                 |
5924
5925            Figure 9.10.3-1:  Transition out of POTENTIAL-CONFLICT
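
        The transitions described in section 9.10.3 can be sketched as a
        small function.  This is an illustrative sketch only: the names
        State, Event and leave_potential_conflict are hypothetical and do
        not appear in the protocol, and only the two events named in the
        text are modelled.

```python
from enum import Enum, auto


class State(Enum):
    POTENTIAL_CONFLICT = auto()
    RESOLUTION_INTERRUPTED = auto()
    CONFLICT_DONE = auto()
    NORMAL = auto()


class Event(Enum):
    COMMS_FAILED = auto()      # communications with the partner failed
    UPDDONE_RECEIVED = auto()  # partner finished answering our UPDREQ


def leave_potential_conflict(is_primary: bool, event: Event) -> State:
    """Return the next state for a server currently in POTENTIAL-CONFLICT."""
    if event is Event.COMMS_FAILED:
        return State.RESOLUTION_INTERRUPTED
    if event is Event.UPDDONE_RECEIVED:
        # The primary receives its UPDDONE first and moves to CONFLICT-DONE;
        # the secondary receives its UPDDONE later and moves to NORMAL.
        return State.CONFLICT_DONE if is_primary else State.NORMAL
    # Any other event (not modelled here) leaves the state unchanged.
    return State.POTENTIAL_CONFLICT
```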
5926
5944 9.11.  RESOLUTION-INTERRUPTED state
5945
5946    This state indicates that the two servers were attempting to re-
5947    integrate with each other in POTENTIAL-CONFLICT state, but
5948    communications failed prior to completion of re-integration.
5949
5950    If the servers remained in POTENTIAL-CONFLICT while communications
5951    was interrupted, neither server would be responsive to DHCP client
5952    requests, and if one server had crashed, then there might be no
5953    server able to process DHCP requests.
5954
5955 9.11.1.  Upon entry to RESOLUTION-INTERRUPTED state
5956
5957    When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an
5958    alarm condition to alert administrative staff of a problem in the
5959    DHCP subsystem.
5960
5961 9.11.2.  Operation in RESOLUTION-INTERRUPTED state
5962
5963    In this state a server MUST respond to all DHCP client requests, and
5964    any load balancing (described in section 5.3) MUST NOT be used.  When
5965    allocating new IP addresses, each server SHOULD allocate from its own
5966    IP address pool (if that can be determined), where the primary SHOULD
5967    allocate only FREE IP addresses, and the secondary SHOULD allocate
5968    only BACKUP IP addresses.  When responding to renewal requests, each
5969    server will allow continued renewal of a DHCP client's current lease
5970    on an IP address irrespective of whether that lease was given out by
5971    the receiving server or not, although the renewal period MUST NOT
5972    exceed the maximum client lead time (MCLT) beyond the latest of: 1)
5973    the potential-expiration-time already acknowledged by the other
5974    server, 2) the lease-expiration-time, or 3) the potential-expiration-
5975    time received from the partner server.
5976
5977    However, since the server cannot communicate with its partner in this
5978    state, the acknowledged-potential-expiration time will not be updated
5979    in any new bindings.
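
        The renewal limit above is a small calculation: take the latest of
        the three times and add the MCLT.  The following sketch assumes all
        times are absolute times in seconds; the function and parameter
        names are hypothetical.

```python
def latest_renewal_expiration(mclt: int,
                              acked_potential_expiration: int,
                              lease_expiration: int,
                              partner_potential_expiration: int) -> int:
    """Latest absolute expiration time a renewal may be granted."""
    baseline = max(acked_potential_expiration,
                   lease_expiration,
                   partner_potential_expiration)
    return baseline + mclt


def clamp_renewal(requested_expiration: int, limit: int) -> int:
    """Shorten a requested expiration so it never exceeds the limit."""
    return min(requested_expiration, limit)
```

        For example, with an MCLT of 3600 seconds and the three times equal
        to 1000, 2000 and 1500, the limit is 2000 + 3600 = 5600.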
5980
5981
5982 9.11.3.  Transitions out of RESOLUTION-INTERRUPTED state
5983
5984    If an external command is received by a server in RESOLUTION-
5985    INTERRUPTED state informing it that its partner is down, it will
5986    transition immediately into PARTNER-DOWN state.
5987
5988    If communications is restored with the other server, then the server
5989    in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-
5990    CONFLICT state.
5991
6000 9.12.  CONFLICT-DONE state
6001
6002    This state indicates that during the process where the two servers
6003    are attempting to re-integrate with each other, the primary server
6004    has received all of the updates from the secondary server.  It makes a
6005    transition into CONFLICT-DONE state in order that it may be totally
6006    responsive to the client load, as opposed to NORMAL state where it
6007    would be in a "balanced" responsive state, running the load balancing
6008    algorithm.
6009
6010 9.12.1.  Upon entry to CONFLICT-DONE state
6011
6012    A secondary server should never enter CONFLICT-DONE state.
6013
6014 9.12.2.  Operation in CONFLICT-DONE state
6015
6016    A primary server in CONFLICT-DONE state is fully responsive to all
6017    DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
6018    state).
6019
6020    If communications fails, the server remains in CONFLICT-DONE state.
6021    If communications is restored, the server remains in CONFLICT-DONE
6022    state until the conditions for transition out are satisfied.
6023
6024
6025 9.12.3.  Transitions out of CONFLICT-DONE state
6026
6027    If communications fails with the partner while in CONFLICT-DONE
6028    state, then the server will remain in CONFLICT-DONE state.
6029
6030    When a primary server determines that the secondary server has made a
6031    transition into NORMAL state, the primary server will also transition
6032    into NORMAL state.
6033
6034 9.13.  PAUSED state
6035
6036    This state exists to allow one server to inform another that it will
6037    be out of service for what is predicted to be a relatively short
6038    time, and to allow the other server to transition to COMMUNICATIONS-
6039    INTERRUPTED state immediately and to begin servicing all DHCP clients
6040    with no interruption in service to new DHCP clients.
6041
6042    A server which is aware that it is shutting down temporarily SHOULD
6043    send a STATE message with the server-state option containing PAUSED
6044    state and close the TCP connection.
6045
6046    While a server may or may not transition internally into PAUSED
6055    state, the 'previous' state determined when it is restarted MUST be
6056    the state the server was in prior to receiving the command to shut-
6057    down and restart and which precedes its entry into the PAUSED state.
6058    See section 9.3.2 concerning the use of the previous state upon
6059    server restart.
6060
6061 9.13.1.  Upon entry to PAUSED state
6062
6063    When entering PAUSED state, the server MUST store the previous state
6064    in stable storage, and use that state as the previous state when it
6065    is restarted.
6066
6067 9.13.2.  Transitions out of PAUSED state
6068
6069    A server makes a transition out of PAUSED state by being restarted.
6070    At that time, the previous state MUST be the state the server was in
6071    prior to entering the PAUSED state.
6072
6073
6074 9.14.  SHUTDOWN state
6075
6076    This state exists to allow one server to inform another that it will
6077    be out of service for what is predicted to be a relatively long time,
6078    and to allow the other server to transition immediately to PARTNER-
6079    DOWN state, and take over completely for the server going down.
6080
6081 9.14.1.  Upon entry to SHUTDOWN state
6082
6083    When entering SHUTDOWN state, the server MUST record the previous
6084    state in stable storage for use when the server is restarted.  It
6085    also MUST record the current time as the last time operational.
6086
6087    A server which is aware that it is shutting down SHOULD send a STATE
6088    message with the server-state field containing SHUTDOWN.
6089
6090 9.14.2.  Operation in SHUTDOWN state
6091
6092    A server in SHUTDOWN state MUST NOT respond to any DHCP client input.
6093
6094    If a server receives any message indicating that the partner has
6095    moved to PARTNER-DOWN state while it is in SHUTDOWN state then it
6096    MUST record RECOVER state as the previous state to be used when it is
6097    restarted.
6098
6099    A server SHOULD wait for a few seconds after informing the partner of
6100    entry into SHUTDOWN state (if communications are okay) to determine
6101    if the partner entered PARTNER-DOWN state.
6102
6111 9.14.3.  Transitions out of SHUTDOWN state
6112
6113    A server makes a transition out of SHUTDOWN state by being restarted.
6114
6115 10.  Safe Period
6116
6117    Due to the restrictions imposed on each server while in
6118    COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
6119    is not feasible for either server.  One reason that this state
6120    exists at all is to allow the servers to easily survive transient
6121    network communications failures of a few minutes to a few days
6122    (although the actual time periods will depend a great deal on the
6123    DHCP activity of the network in terms of arrival and departure of
6124    DHCP clients on the network).
6125
6126    Eventually, when the servers are unable to communicate, they will
6127    have to move into a state where they no longer can re-integrate
6128    without some possibility of a duplicate IP address allocation.  There
6129    are two ways that they can move into this state (known as PARTNER-
6130    DOWN).
6131
6132    First, they can be informed by external command that the partner
6133    server is, indeed, down.  In this case, there is no difficulty in
6134    moving into the PARTNER-DOWN state, since it is an accurate
6135    reflection of reality and the protocol has been designed to operate
6136    correctly (even during reintegration) as long as the partner is,
6137    indeed, down whenever the server is in PARTNER-DOWN state.
6138
6139    The more difficult scenario is when the servers are running unat-
6140    tended for extended periods, and in this case an option is provided
6141    to configure something called a "safe-period" into each server.  This
6142    OPTIONAL safe-period is the period after which either the primary or
6143    secondary server will automatically transition to PARTNER-DOWN from
6144    COMMUNICATIONS-INTERRUPTED state.  If this transition is completed
6145    and the partner is not down, then the possibility of duplicate IP
6146    address allocations will exist.
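
        A minimal sketch of the safe-period check follows.  It assumes the
        server records the time at which it entered COMMUNICATIONS-
        INTERRUPTED state and that an unconfigured safe-period is
        represented as None; the function name and the choice of ">=" at
        the boundary are assumptions, not requirements of this document.

```python
from typing import Optional


def safe_period_expired(entered_comm_interrupted_at: float,
                        now: float,
                        safe_period: Optional[float]) -> bool:
    """True when the automatic move to PARTNER-DOWN is due."""
    if safe_period is None:
        # The safe-period is OPTIONAL; without it only an external
        # command can move the server to PARTNER-DOWN.
        return False
    return now - entered_comm_interrupted_at >= safe_period
```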
6147
6148    The goal of the "safe-period" is to allow network operations staff
6149    some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
6150    state.  During the safe-period the only requirement is that the net-
6151    work operations staff determine if both servers are still running --
6152    and if they are, to either fix the network communications failure
6153    between them, or to take one of the servers down before the
6154    expiration of the safe-period.
6155
6156    The length of the safe-period is installation dependent, and depends
6157    in large part on the number of unallocated IP addresses within the
6158    subnet address pool and the expected frequency of arrival of
6167    previously unknown DHCP clients requiring IP addresses.  Many
6168    environments should be able to support safe-periods of several days.
6169
6170    During this safe period, either server will allow renewals from any
6171    existing client.  The only limitation concerns the need for IP
6172    addresses for the DHCP server to hand out to new DHCP clients and the
6173    need to re-allocate IP addresses to different DHCP clients.
6174
6175    The number of "extra" IP addresses required is equal to the expected
6176    total number of new DHCP clients encountered during the safe period.
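
        As a worked illustration of this estimate, the sketch below
        multiplies an assumed constant arrival rate by the safe period,
        and also inverts the relation to find the longest safe period a
        given number of spare addresses can cover.  The function names
        and the constant-rate assumption are hypothetical.

```python
import math


def extra_addresses_needed(new_clients_per_hour: float,
                           safe_period_hours: float) -> int:
    """Expected number of new clients during the safe period, rounded up."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_safe_period_hours(spare_addresses: int,
                              new_clients_per_hour: float) -> float:
    """Longest safe period the spare addresses can cover."""
    if new_clients_per_hour <= 0:
        return math.inf  # no new clients arrive, so addresses never run out
    return spare_addresses / new_clients_per_hour
```

        For example, 5 new clients per hour over a 72 hour safe period
        calls for 360 extra addresses.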
6177    This is dependent only on the arrival rate of new DHCP clients, not
6178    the total number of outstanding leases on IP addresses.
6179
6180    In the unlikely event that a relatively short safe period of an hour
6181    is all that can be used (given a dearth of IP addresses or a very
6182    high arrival rate of new DHCP clients), even that can provide sub-
6183    stantial benefits in allowing the DHCP subsystem to ride through
6184    minor problems that could occur and be fixed within that hour.  In
6185    these cases, no possibility of duplicate IP address allocation
6186    exists, and re-integration after the failure is solved will be
6187    automatic and require no operator intervention.
6188
6189 11.  Security
6190
6191    The Failover protocol communicates DHCP lease activity and this data
6192    is generally easily discovered via other means, such as by pinging
6193    addresses and doing DNS lookups. Therefore, the need to encrypt the
6194    data over the wire is likely not great (though some sites may feel
6195    differently).
6196
6197    However, it is very desirable to assure the integrity of failover
6198    partners and to thus ensure proper operation of the servers. For
6199    example, denial of service attacks are possible by the communication
6200    of invalid state information to one or both servers.
6201
6202    Therefore, the Failover protocol MUST be capable of being secured by
6203    using a simple shared secret message digest which covers each mes-
6204    sage.  This provides authentication of the servers, but does not pro-
6205    vide encryption of the data exchange.
6206
6207    The Failover protocol MAY also be secured by using TLS [RFC 2246]
6208    (Transport Layer Security) if encryption of the data exchange is
6209    desired.  The use of the shared secret or TLS will not protect
6210    against TCP or IP layer attacks (such as someone sending fake TCP RST
6211    segments). IPsec [RFC 2401] SHOULD be used to protect against most
6212    (if not all) of these kinds of attacks.
6213
6223 11.1.  Simple shared secret
6224
6225    Messages between the failover partners can be authenticated through
6226    the use of a shared secret, which is never sent over the network and
6227    must be known by each server. How each server is told about this
6228    shared secret and secures its storage of the shared secret is outside
6229    the scope of this document.  If a server is configured with a shared
6230    secret for a partner, it MUST send the message-digest option in ALL
6231    messages to that partner and it MUST treat any messages received from
6232    that partner without a message-digest option as failing authentica-
6233    tion and reject them with reject reason 21: "Missing message digest".
6234    Note that the message digest option MUST be the first option in the
6235    message.
6236
6237    If a server is not configured with a shared secret for a partner, it
6238    MUST NOT send the message-digest option in any message to that
6239    partner and it MUST treat any messages received from that partner
6240    with a message-digest option as failing authentication with reject
6241    reason 13: "Message digest not configured".
6242
6243    The shared secret is used to calculate a 16 octet message-digest
6244    which is sent in every failover message in the message-digest option.
6245    See section 12.16. The message-digest contains a one-way 16 octet
6246    HMAC-MD5 [RFC 2104] hash calculated over a stream of octets consist-
6247    ing of the entire message concatenated with the shared secret.
6248
6249    For calculation, the message includes the message-digest option with
6250    the message-digest data zeroed (16-octets of zero). Once the calcula-
6251    tion is complete, these 16 octets of zero are replaced by the 16-
6252    octet HMAC-MD5 hash and the message is sent.
6253
6254    For verification, the 16-octet message-digest is saved and replaced
6255    with 16-octets of zero and calculated per above. The resulting HMAC-
6256    MD5 hash is compared to the received hash and if they match, the mes-
6257    sage is assumed authenticated.
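
        The calculation and verification steps above can be sketched with
        a standard HMAC library.  This sketch makes two assumptions that
        are not taken from this document: the shared secret is used as the
        HMAC key as defined in [RFC 2104], and the caller already knows
        the offset of the 16 octets of message-digest data within the
        message (the digest_offset parameter is hypothetical).

```python
import hashlib
import hmac

DIGEST_LEN = 16  # octets in an HMAC-MD5 result


def compute_digest(secret: bytes, message: bytes, digest_offset: int) -> bytes:
    """HMAC-MD5 over the message with the digest octets replaced by zeros."""
    zeroed = (message[:digest_offset]
              + bytes(DIGEST_LEN)
              + message[digest_offset + DIGEST_LEN:])
    return hmac.new(secret, zeroed, hashlib.md5).digest()


def sign(secret: bytes, message: bytes, digest_offset: int) -> bytes:
    """Return the message with its digest octets filled in."""
    digest = compute_digest(secret, message, digest_offset)
    return (message[:digest_offset]
            + digest
            + message[digest_offset + DIGEST_LEN:])


def verify(secret: bytes, message: bytes, digest_offset: int) -> bool:
    """Recompute the digest and compare it with the received one."""
    received = message[digest_offset:digest_offset + DIGEST_LEN]
    expected = compute_digest(secret, message, digest_offset)
    # Constant-time comparison avoids leaking how many octets matched.
    return hmac.compare_digest(received, expected)
```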
6258
6259    A failover partner that fails to authenticate a received message or
6260    receives a message without a message-digest option when configured
6261    with a shared secret MUST close the connection immediately and take
6262    steps to notify operators.
6263
6264    Every time a CONNECT message is received, the time at which that mes-
6265    sage was sent by the partner (i.e., the time that actually appears in
6266    the message itself) MUST be saved.  If a CONNECT message is ever
6267    received containing that time or containing a time before that time,
6268    it MUST be rejected.
6269
6270    The XID (see section 6.1) of every message received at a failover
6279    endpoint MUST be greater than that of the previous message received
6280    on that failover endpoint or the message just received MUST be
6281    rejected.
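
        The two replay checks above (CONNECT time and XID) can be sketched
        as a small per-endpoint guard.  The class and method names are
        hypothetical, and the sketch does not address XID wrap-around,
        which this section does not discuss.

```python
from typing import Optional


class ReplayGuard:
    """Tracks the last CONNECT time and last XID seen on one endpoint."""

    def __init__(self) -> None:
        self.last_connect_time: Optional[int] = None
        self.last_xid: Optional[int] = None

    def accept_connect(self, sent_time: int) -> bool:
        """Reject a CONNECT whose time is not later than the saved one."""
        if (self.last_connect_time is not None
                and sent_time <= self.last_connect_time):
            return False
        self.last_connect_time = sent_time
        return True

    def accept_xid(self, xid: int) -> bool:
        """Reject a message whose XID is not greater than the previous one."""
        if self.last_xid is not None and xid <= self.last_xid:
            return False
        self.last_xid = xid
        return True
```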
6282
6283    A server MAY operate with arbitrary time skew between servers (see
6284    section 5.10), but when using a shared secret administrators may wish
6285    to configure a maximum allowable time skew between a failover server
6286    and its partner(s).  Servers SHOULD allow an administrator to config-
6287    ure a maximum allowable time skew between two failover partners.
6288
6289 11.2.  TLS
6290
6291    TLS, Transport Layer Security, as specified in [RFC 2246] MAY be
6292    used.  The use of TLS would be similar to the way it is used with
6293    SMTP [RFC 2487] and IMAP/POP3/ACAP [RFC 2595].
6294
6295    To request the use of TLS, the primary MUST send the TLS-request
6296    option as part of the CONNECT message.  The secondary receiving the
6297    TLS-request option MUST respond, in its CONNECTACK message, with a
6298    TLS-reply option indicating its acceptance or rejection of the
6299    TLS-request.
6300
6301    If the CONNECTACK message contained a TLS-reply of 1, then both
6302    servers immediately begin TLS negotiation.
6303
6304    Upon completion of this negotiation, the primary server sends another
6305    CONNECT message without any TLS-request option, and must wait for a
6306    corresponding CONNECTACK.
6307
6308    Implementation of the TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA [RFC 2246]
6309    cipher suite is REQUIRED in Failover servers supporting TLS. This is
6310    important as it assures that any two compliant implementations can be
6311    configured to interoperate.
6312
6313 12.  Failover Options
6314
6315    This section lists all of the options that are currently defined to
6316    be used with the failover protocol.  See section 6.2 for details con-
6317    cerning time values.
6318
6336 12.1.  addresses-transferred
6337
6338    A 32 bit unsigned long in network byte order. Reports the number of
6339    addresses transferred by the primary to the secondary server
6340    (addresses to be used for the secondary server's private address
6341    pool).
6342
6343         Code        Len       Number of Addresses
6344    +-----+-----+-----+-----+----+-----+-----+-----+
6345    |  0  |  1  |  0  |  4  | n1 |  n2 |  n3 |  n4 |
6346    +-----+-----+-----+-----+----+-----+-----+-----+
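
        The layout shown above (two octets of option code, two octets of
        length, then the data) can be encoded and decoded with a few
        lines.  This is a sketch only; the helper names are hypothetical
        and the layout is read directly from the diagram.

```python
import struct


def encode_option(code: int, data: bytes) -> bytes:
    """Two-octet code, two-octet length, then the option data."""
    return struct.pack("!HH", code, len(data)) + data


def decode_option(buf: bytes, offset: int = 0):
    """Return (code, data, offset of the next option)."""
    code, length = struct.unpack_from("!HH", buf, offset)
    start = offset + 4
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError("truncated option")
    return code, data, start + length


def encode_addresses_transferred(count: int) -> bytes:
    """Option 1: a 32 bit unsigned count in network byte order."""
    return encode_option(1, struct.pack("!I", count))
```

        For example, a count of 300 is encoded as the eight octets
        00 01 00 04 00 00 01 2c.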
6347
6348
6349 12.2.  assigned-IP-address
6350
6351    The DHCP managed IP address to which this message refers.
6352
6353         Code        Len          Address
6354    +-----+-----+-----+-----+----+-----+-----+-----+
6355    |  0  |  2  |  0  |  4  | a1 |  a2 |  a3 |  a4 |
6356    +-----+-----+-----+-----+----+-----+-----+-----+
6357
6358
6359 12.3.  binding-status
6360
6361    This option is used to convey the current state of a binding.
6362
6363        Code         Len     Type
6364    +-----+-----+-----+-----+-----+
6365    |  0  |  3  |  0  |  1  | 1-7 |
6366    +-----+-----+-----+-----+-----+
6367
6368    Legal values for this option are:
6369
6370    Value Binding Status
6371    ----- ------------------------------------------------
6372    1     FREE           Lease is currently available to the primary
6373    2     ACTIVE         Lease is assigned to a client
6374    3     EXPIRED        Lease has expired
6375    4     RELEASED       Lease has been released by client
6376    5     ABANDONED      A server, or client flagged address as unusable
6377    6     RESET          Lease was freed by some external agent
6378    7     BACKUP         Lease belongs to secondary's private address pool
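
        A receiver will usually want to reject values outside the table
        above.  The sketch below maps the one-octet value onto an
        enumeration; the names BindingStatus and parse_binding_status are
        hypothetical.

```python
from enum import IntEnum


class BindingStatus(IntEnum):
    FREE = 1
    ACTIVE = 2
    EXPIRED = 3
    RELEASED = 4
    ABANDONED = 5
    RESET = 6
    BACKUP = 7


def parse_binding_status(data: bytes) -> BindingStatus:
    """Decode the one-octet option data; ValueError on an illegal value."""
    if len(data) != 1:
        raise ValueError("binding-status data must be exactly one octet")
    return BindingStatus(data[0])
```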
6379
6392 12.4.  client-identifier
6393
6394    This is the client-identifier for the client associated with a
6395    binding.  The client-identifier data is subject to the same
6396    conventions as DHCP option 61 [RFC 2132].
6397
6398         Code        Len       Client Identifier
6399    +-----+-----+-----+-----+----+-----+---
6400    |  0  |  4  |  0  |  n  | i1 |  i2 | ...
6401    +-----+-----+-----+-----+----+-----+--
6402
6403
6404 12.5.  client-hardware-address
6405
6406    This is the hardware address for the client associated with a
6407    binding.  Byte t1 (type) MUST be set to the proper ARP hardware
6408    address code, as defined in the ARP section of RFC 1700 (it MUST NOT
6409    be zero!)
6410
6411         Code        Len     htype   chaddr
6412    +-----+-----+-----+-----+----+-----+-----+---
6413    |  0  |  5  |  0  |  n  | t1 |  c1 |  c2 | ...
6414    +-----+-----+-----+-----+----+-----+-----+---
6415
6416
6417 12.6.  client-last-transaction-time
6418
6419    The time at which this server last received a DHCP request from a
6420    particular client expressed as an absolute time (see section 6.2).
6421
6422
6423         Code        Len    client last transaction time
6424    +-----+-----+-----+-----+----+-----+-----+-----+
6425    |  0  |  6  |  0  |  4  | t1 |  t2 |  t3 |  t4 |
6426    +-----+-----+-----+-----+----+-----+-----+-----+
6427
6448 12.7.  client-reply-options
6449
6450    This option contains options from a DHCP server's reply to a DHCP
6451    client request.  It is sent in a BNDUPD message.  The first 4 bytes
6452    of the option contain the "magic number" of the option area from
6453    which the DHCP reply options were taken and serve to define the
6454    format of the rest of the sub-options contained in this option.
6455    After the magic number, the options included are in the normal
6456    options format appropriate for that magic number.
6457
6458    A server SHOULD NOT include all of the options in a DHCP server's
6459    reply to a client's request in this option, but rather a server
6460    SHOULD include only those options which are of likely interest to its
6461    partner server.  See section 7.1 for details.
6462
6463         Code        Len         Magic Number      Embedded options
6464    +-----+-----+-----+-----+----+----+----+----+----+----+--
6465    |  0  |  7  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 |  ...
6466    +-----+-----+-----+-----+----+----+----+----+----+----+--
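
        As an illustration, the sketch below builds this option for the
        common case where the embedded options come from the standard
        DHCP options area, whose magic cookie is 99.130.83.99 as defined
        in RFC 2131.  The helper names are hypothetical, and the embedded
        option used in the example (option 51, IP address lease time) is
        chosen only for illustration.

```python
import struct

DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])  # RFC 2131 options cookie


def encode_client_reply_options(embedded: bytes) -> bytes:
    """Option 7: magic number followed by options in that area's format."""
    payload = DHCP_MAGIC_COOKIE + embedded
    return struct.pack("!HH", 7, len(payload)) + payload


def split_magic(data: bytes):
    """Split option data into (magic number, embedded options)."""
    if len(data) < 4:
        raise ValueError("option data shorter than the magic number")
    return data[:4], data[4:]


# Example: embed DHCP option 51 carrying a lease time of 3600 seconds.
lease_time = bytes([51, 4]) + struct.pack("!I", 3600)
encoded = encode_client_reply_options(lease_time)
```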
6467
6468
6469 12.8.  client-request-options
6470
6471    This option contains options from a DHCP client's request.  It is
6472    sent in a BNDUPD message.  The first 4 bytes of the option contain
6473    the "magic number" of the option area from which the DHCP client's
6474    request options were taken and serve to define the format of the
6475    rest of the sub-options contained in this option.  After the magic
6476    number, the options included are in the normal options format
6477    appropriate for that magic number.
6478
6479    A server SHOULD NOT include all of the options in a DHCP client
6480    request in this option, but rather a server SHOULD include only those
6481    options which are of likely interest to its partner server.  See
6482    section 7.1 for details.
6483
6484         Code        Len         Magic Number      Embedded options
6485    +-----+-----+-----+-----+----+----+----+----+----+----+--
6486    |  0  |  8  |  0  |  n  | m1 | m2 | m3 | m4 | b1 | b2 |  ...
6487    +-----+-----+-----+-----+----+----+----+----+----+----+--
6488
6504 12.9.  DDNS
6505
6506    If an implementation supports Dynamic DNS updates, this option is
6507    used to communicate the status of the DDNS update associated with a
6508    particular lease binding.  The Flags field conveys the types of DNS
6509    RRs that are to be updated by the DHCP server, and the status of the
6510    DDNS update.  The Domain Name field conveys the DNS FQDN that the
6511    DHCP server is using to refer to the client, in DNS encoding as
6512    specified in [RFC 1035].
6513
6514        Code        Len        Flags      Domain Name
6515    +-----+-----+-----+-----+-----+------+------+-----+------
6516    |  0  |  9  |  0  |  n  |   flags    |  d1  |  d2 | ...
6517    +-----+-----+-----+-----+-----+------+------+-----+------
6518
6519    The Flags field is a 16-bit field; several bit positions are
6520    specified here.
6521
6522                         1 1 1 1 1 1
6523     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
6524    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
6525    |C|A|D|P|       MBZ             |
6526    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
6527
6528    The bits (numbered from the least-significant bit in network
6529    byte-order) are used as follows:
6530
6531    0 (C): name to address (such as A RR) update successfully completed
6532    1 (A): Server is controlling A RR on behalf of the client
6533    2 (D): address to name (such as PTR RR) update successfully completed (Done)
6534    3 (P): Server is controlling PTR RR on behalf of the client
6535    4-15 : Must be zero
6536
6537    All of the unspecified bit positions SHOULD be set to 0 by servers
6538    sending the Failover-DDNS option, and they MUST be ignored by servers
6539    receiving the option.
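
        A sketch of decoding the Flags field follows.  It assumes, per the
        text above, that bit 0 is the least-significant bit of the 16-bit
        value (0x0001); a reader who takes the diagram's left-to-right
        order as most-significant first would need different masks.  The
        names DdnsFlags and parse_ddns are hypothetical.

```python
import struct
from enum import IntFlag


class DdnsFlags(IntFlag):
    C = 1 << 0  # name to address update successfully completed
    A = 1 << 1  # server controls the A RR on behalf of the client
    D = 1 << 2  # address to name update successfully completed
    P = 1 << 3  # server controls the PTR RR on behalf of the client


def parse_ddns(data: bytes):
    """Return (flags, DNS-encoded domain name) from the option data."""
    if len(data) < 2:
        raise ValueError("DDNS option data shorter than the Flags field")
    (raw,) = struct.unpack_from("!H", data, 0)
    # Unspecified bit positions are ignored on receipt.
    flags = DdnsFlags(raw & 0x000F)
    return flags, data[2:]
```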
6540
6560 12.10.  delayed-service-parameter
6561
6562    The delayed-service-parameter is an optional load balancing tuning
6563    parameter, defined in [RFC 3074].  If it is used, it MUST be sent in
6564    the same message as the hash-bucket-assignment option (see section
6565    12.11).
6566
6567    Format :
6568
6569
6570        Code        Len    Seconds
6571    +-----+-----+-----+-----+----+
6572    |  0  |  10 |  0  |  1  | S  |
6573    +-----+-----+-----+-----+----+
6574
6575    S is a one byte value, 1..255.
6576
6577
6578 12.11.  hash-bucket-assignment
6579
6580    A set of load balancing hash values for the secondary server.  A one
6581    bit in the hash buckets indicates that the secondary is to service
6582    that set of clients.  See section 5.3 for more information on how
6583    this option is used.  This option is only sent from the primary to
6584    the secondary.
6585
6586    The format and usage of the data in this option is defined in [RFC
6587    3074].
6588
6589         Code        Len        Hash Buckets
6590    +-----+-----+-----+-----+-----+-----+-----+-----+
6591    |  0  |  11 |  0  |  32 |  b1 |  b2 | ... | b32 |
6592    +-----+-----+-----+-----+-----+-----+-----+-----+
6593
6616 12.12.  IP-flags
6617
6618    This option is used to convey the current flags of the assigned-IP-
6619    address option preceding it.
6620
6621        Code         Len       IP Flags
6622    +-----+-----+-----+-----+-----+-----+
6623    |  0  |  12 |  0  |  2  |  f1 |  f2 |
6624    +-----+-----+-----+-----+-----+-----+
6625
6626    The IP-flags field is a 16-bit field; two bit positions are
6627    specified here.
6628
6629                         1 1 1 1 1 1
6630     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
6631    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
6632    |R|B|           MBZ             |
6633    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
6634
6635    The bits (numbered from the least-significant bit in network
6636    byte-order) are used as follows:
6637
6638    0 (R): RESERVED  (this bit allocated and in use and named "RESERVED")
6639           Bit 0 MUST be set to 1 whenever the IP address in the preceding
6640           assigned-IP-address option is reserved on the server sending the
6641           packet.
6642           assigned-IP-address option is an IP address which has been
6643           Bit 1 MUST be set to 1 whenever the IP address in the preceding
6644           assigned-IP-address option is a an IP address which has been
6645           allocated due to an interaction with a BOOTP client (as opposed
6646           to a DHCP client).
6647    2-15  : Must be zero
6648
6672 12.13.  lease-expiration-time
6673
6674    The lease expiration time is the lease interval that a DHCP server
6675    has ACKed to a DHCP client, added to the time at which that ACK was
6676    transmitted, expressed as an absolute time (see section 6.2).
6677
6678
6679         Code        Len          Time
6680    +-----+-----+-----+-----+----+-----+-----+-----+
6681    |  0  |  13 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
6682    +-----+-----+-----+-----+----+-----+-----+-----+
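
        The addition described above can be sketched as follows, assuming
        that an absolute time is carried as an unsigned 32-bit count of
        seconds in network byte order (section 6.2 defines the actual time
        base).  The function name is hypothetical.

```python
import struct

LEASE_EXPIRATION_TIME_CODE = 13   # option code from the diagram above


def lease_expiration_option(ack_sent_at: int, lease_interval: int) -> bytes:
    """Encode (ACK transmit time + lease interval) as an absolute time.

    Both arguments are in seconds; ack_sent_at is assumed to already be
    expressed in the time base that section 6.2 defines.
    """
    expiration = ack_sent_at + lease_interval
    if not 0 <= expiration <= 0xFFFFFFFF:
        raise ValueError("expiration does not fit in 32 bits")
    return struct.pack("!HHI", LEASE_EXPIRATION_TIME_CODE, 4, expiration)
```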
6683
6684
6685 12.14.  max-unacked-bndupd
6686
6687    The maximum number of BNDUPD messages that this server is prepared to
6688    accept over the TCP connection without causing the TCP connection to
6689    block.  A 32 bit unsigned integer value, in network byte order.
6690
6691
6692         Code        Len     Maximum Unacked BNDUPD
6693    +-----+-----+-----+-----+----+-----+-----+-----+
6694    |  0  |  14 |  0  |  4  | n1 |  n2 |  n3 |  n4 |
6695    +-----+-----+-----+-----+----+-----+-----+-----+
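
        One way a sender might honor its partner's limit is to count BNDUPD
        messages that have been sent but not yet acknowledged.  The sketch
        below assumes that each BNDACK acknowledges exactly one BNDUPD; the
        class name BndupdWindow is hypothetical.

```python
class BndupdWindow:
    """Tracks BNDUPD messages sent but not yet acknowledged."""

    def __init__(self, partner_max_unacked: int) -> None:
        # Value received from the partner in max-unacked-bndupd.
        self.limit = partner_max_unacked
        self.outstanding = 0

    def can_send(self) -> bool:
        """True while another BNDUPD stays within the partner's limit."""
        return self.outstanding < self.limit

    def on_send(self) -> None:
        if not self.can_send():
            raise RuntimeError("would exceed partner's max-unacked-bndupd")
        self.outstanding += 1

    def on_ack(self) -> None:
        # Assumption: one BNDACK acknowledges one BNDUPD.
        if self.outstanding > 0:
            self.outstanding -= 1
```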
6696
6697
6698 12.15.  MCLT
6699
6700    Maximum Client Lead Time, an interval, in seconds.  A 32 bit unsigned
6701    integer value, in network byte order.
6702
6703         Code        Len             Time
6704    +-----+-----+-----+-----+----+-----+-----+-----+
6705    |  0  |  15 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
6706    +-----+-----+-----+-----+----+-----+-----+-----+
6707
6708
6728 12.16.  message
6729
6730    This option is used to supply a human readable message text.  It may
6731    be used in association with the reject-reason option to provide a
6732    human readable error message for the reject.
6733
6734
6735         Code        Len         Text
6736    +-----+-----+-----+-----+------+-----+--
6737    |  0  |  16 |  0  |  n  |  c1  | c2  | ...
6738    +-----+-----+-----+-----+------+-----+--
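
        A short sketch of encoding and decoding the text.  It assumes ASCII
        (this section does not name a character set) and the two-byte length
        shown above; the function names are illustrative only.

```python
import struct

MESSAGE_CODE = 16   # option code from the diagram above


def encode_message(text: str) -> bytes:
    """Build a message option: 2-byte code, 2-byte length, then the text."""
    data = text.encode("ascii")          # character set is an assumption
    if len(data) > 0xFFFF:
        raise ValueError("text too long for a two-byte length")
    return struct.pack("!HH", MESSAGE_CODE, len(data)) + data


def decode_message(option: bytes) -> str:
    """Return the text carried in a message option."""
    code, length = struct.unpack("!HH", option[:4])
    if code != MESSAGE_CODE or length != len(option) - 4:
        raise ValueError("not a well-formed message option")
    return option[4:].decode("ascii")
```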
6739
6740
6741 12.17.  message-digest
6742
6743    The message digest for this message.
6744
6745    This option consists of a variable number of bytes which contain the
6746    message digest of the message prior to the inclusion of this option.
6747
6748    When this option appears in a message, it MUST appear as the first
6749    option in the message.  It MUST appear in every message if message
6750    digests are required.  The Type MUST be configurable (once additional
6751    types are defined).  When additional types are defined, they MUST be
6752    specified as either optional (MAY be supported) or required (MUST be
6753    supported).  See the section on IANA considerations for more details.
6754
6755         Code        Len      Type   Message Digest
6756    +-----+-----+-----+-----+-----+-----+-----+--
6757    |  0  |  17 |  0  |  n  |  t  |  d1 |  d2 | ...
6758    +-----+-----+-----+-----+-----+-----+-----+--
6759
6760
6761       Type:    0      Not Allowed
6762                1      HMAC-MD5
6763                2-255  Not Allowed
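
        A hedged sketch of producing and checking a Type 1 (HMAC-MD5, see
        [RFC 2104]) digest with Python's standard hmac module.  Exactly which
        bytes of the message are covered, and how the shared key is
        configured, are outside this section; here `message` is simply
        assumed to be the serialized message before this option is added.
        The function names are hypothetical.

```python
import hashlib
import hmac
import struct

MESSAGE_DIGEST_CODE = 17     # option code from the diagram above
DIGEST_TYPE_HMAC_MD5 = 1     # the only Type allowed by the table above


def build_digest_option(key: bytes, message: bytes) -> bytes:
    """Digest option for `message` (the bytes before this option is added)."""
    digest = hmac.new(key, message, hashlib.md5).digest()   # 16 bytes
    body = bytes([DIGEST_TYPE_HMAC_MD5]) + digest
    return struct.pack("!HH", MESSAGE_DIGEST_CODE, len(body)) + body


def verify_digest_option(key: bytes, message: bytes, option: bytes) -> bool:
    """Check a received digest option against a locally computed digest."""
    code, length = struct.unpack("!HH", option[:4])
    if code != MESSAGE_DIGEST_CODE or length != len(option) - 4 or length < 1:
        return False
    if option[4] != DIGEST_TYPE_HMAC_MD5:
        return False            # types 0 and 2-255 are not allowed
    expected = hmac.new(key, message, hashlib.md5).digest()
    return hmac.compare_digest(option[5:], expected)
```

        The comparison uses hmac.compare_digest so that the check takes the
        same time whether or not the digests match.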
6764
6765
6784 12.18.  potential-expiration-time
6785
6786    The potential expiration time is the time that one server tells
6787    another server that it may wish to grant in a lease to a DHCP client.
6788    It is an absolute time.  See section 6.2.
6789
6790
6791         Code        Len          Time
6792    +-----+-----+-----+-----+----+-----+-----+-----+
6793    |  0  |  18 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
6794    +-----+-----+-----+-----+----+-----+-----+-----+
6795
6796
6797 12.19.  receive-timer
6798
6799    The number of seconds (an interval) within which the server must
6800    receive a message from its partner, or it will assume that
6801    communication with the partner is not OK.  An unsigned 32 bit
6802    integer in network byte order.
6803
6804         Code        Len         Receive Timer
6805    +-----+-----+-----+-----+----+-----+-----+-----+
6806    |  0  |  19 |  0  |  4  | s1 |  s2 |  s3 |  s4 |
6807    +-----+-----+-----+-----+----+-----+-----+-----+
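
        The interval could be tracked as sketched below.  The clock value is
        passed in by the caller so that the example does not depend on any
        particular time source; the class name ReceiveTimer is hypothetical.

```python
class ReceiveTimer:
    """Tracks whether the partner has been heard from recently enough."""

    def __init__(self, interval_seconds: int, now: float) -> None:
        self.interval = interval_seconds   # value of the receive-timer option
        self.last_received = now

    def on_message(self, now: float) -> None:
        """Record that a message from the partner arrived at time `now`."""
        self.last_received = now

    def expired(self, now: float) -> bool:
        """True once more than `interval` seconds pass with no message."""
        return now - self.last_received > self.interval
```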
6808
6809
6810 12.20.  protocol-version
6811
6812    The protocol version being used by the server. It is only sent in the
6813    CONNECT and CONNECTACK messages.  The current value for the version
6814    is 1.
6815
6816         Code        Len    Version
6817    +-----+-----+-----+-----+-----+
6818    |  0  |  20 |  0  |  1  |  1  |
6819    +-----+-----+-----+-----+-----+
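
        A receiver's version check might look like the sketch below.  It
        assumes that only an exact match with version 1 is acceptable and
        that a mismatch maps to reject-reason 14 ("Protocol version
        mismatch"); the function name is hypothetical.

```python
import struct
from typing import Optional

PROTOCOL_VERSION_CODE = 20     # option code from the diagram above
CURRENT_VERSION = 1
REJECT_VERSION_MISMATCH = 14   # "Protocol version mismatch" reason code


def check_protocol_version(option: bytes) -> Optional[int]:
    """Return None if the version is acceptable, else a reject-reason code."""
    code, length, version = struct.unpack("!HHB", option)
    if code != PROTOCOL_VERSION_CODE or length != 1:
        raise ValueError("not a protocol-version option")
    # Assumption: anything other than an exact match is rejected.
    return None if version == CURRENT_VERSION else REJECT_VERSION_MISMATCH
```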
6820
6821
6840 12.21.  reject-reason
6841
6842    This option is used to selectively reject binding updates. It MAY be
6843    used in a BNDACK message or a CONNECTACK message, always associated
6844    with an assigned-IP-address option, which contains the IP address of
6845    the update being rejected.
6846
6847         Code        Len   Reason Code
6848    +-----+-----+-----+-----+-----+
6849    |  0  |  21 |  0  |  1  |  R1 |
6850    +-----+-----+-----+-----+-----+
6851
6852    Reason codes (section where referenced in parentheses):
6853
6854    0   Reserved
6855    1   Illegal IP address (not part of any address pool). (7.1.3)
6856    2   Fatal conflict exists: address in use by other client. (7.1.3)
6857    3   Missing binding information. (7.1.3)
6858    4   Connection rejected, time mismatch too great. (7.8.2)
6859    5   Connection rejected, invalid MCLT. (7.8.2)
6860    6   Connection rejected, unknown reason. (not specifically referenced)
6861    7   Connection rejected, duplicate connection. (unused)
6862    8   Connection rejected, invalid failover partner. (7.8.2)
6863    9   TLS not supported. (7.8.2)
6864    10  TLS supported but not configured. (7.8.2)
6865    11  TLS required but not supported by partner. (7.8.2)
6866    12  Message digest not supported. (11.1)
6867    13  Message digest not configured. (11.1)
6868    14  Protocol version mismatch. (7.8.2)
6869    15  Outdated binding information. (7.1.3)
6870    16  Less critical binding information. (7.1.3)
6871    17  No traffic within sufficient time. (8.6)
6872    18  Hash bucket assignment conflict. (7.8.2)
6873    19  IP not reserved on this server. (7.1.3)
6874    20  Message digest failed to compare. (7.8.2)
6875    21  Missing message digest. (7.1.3)
6876    22-253  Reserved.
6877    254 Unknown: Error occurred but does not match any reason code.
6878    255 Reserved for code expansion.
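
        The table above maps naturally onto an enumeration.  In the sketch
        below the member names are invented for illustration, and mapping
        the reserved values onto 254 (Unknown) is a choice of this sketch
        rather than a rule stated here.

```python
import struct
from enum import IntEnum

REJECT_REASON_CODE = 21   # option code from the diagram above


class RejectReason(IntEnum):
    ILLEGAL_IP_ADDRESS = 1
    FATAL_CONFLICT = 2
    MISSING_BINDING_INFO = 3
    TIME_MISMATCH = 4
    INVALID_MCLT = 5
    CONNECTION_REJECTED_UNKNOWN = 6
    DUPLICATE_CONNECTION = 7
    INVALID_FAILOVER_PARTNER = 8
    TLS_NOT_SUPPORTED = 9
    TLS_NOT_CONFIGURED = 10
    TLS_REQUIRED_NOT_SUPPORTED = 11
    DIGEST_NOT_SUPPORTED = 12
    DIGEST_NOT_CONFIGURED = 13
    VERSION_MISMATCH = 14
    OUTDATED_BINDING_INFO = 15
    LESS_CRITICAL_BINDING_INFO = 16
    NO_TRAFFIC = 17
    HASH_BUCKET_CONFLICT = 18
    IP_NOT_RESERVED = 19
    DIGEST_MISMATCH = 20
    MISSING_DIGEST = 21
    UNKNOWN = 254


def decode_reject_reason(option: bytes) -> RejectReason:
    """Return the reason carried in a 5-byte reject-reason option."""
    code, length, reason = struct.unpack("!HHB", option)
    if code != REJECT_REASON_CODE or length != 1:
        raise ValueError("not a reject-reason option")
    try:
        return RejectReason(reason)
    except ValueError:
        # 0, 22-253 and 255 are reserved; this sketch reports them as UNKNOWN.
        return RejectReason.UNKNOWN
```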
6879
6880
6896 12.22.  relationship-name
6897
6898    A string which is a unique identifier for the failover relationship.
6899
6900         Code        Len       Relationship Name
6901    +-----+-----+-----+-----+----+-----+---
6902    |  0  |  22 |  0  |  n  | c1 |  c2 |  ...
6903    +-----+-----+-----+-----+----+-----+---
6904
6905
6906 12.23.  server-flags
6907
6908    This option is used to convey the current flags of the failover
6909    endpoint in the sending server.
6910
6911        Code         Len     Server Flags
6912    +-----+-----+-----+-----+-------+
6913    |  0  |  23 |  0  |  1  | flags |
6914    +-----+-----+-----+-----+-------+
6915
6916    The flags field is an 8-bit field; one bit position is
6917    specified here.
6918
6919
6920     0 1 2 3 4 5 6 7
6921    +-+-+-+-+-+-+-+-+
6922    |S|   MBZ       |
6923    +-+-+-+-+-+-+-+-+
6924
6925    The bits (numbered from the least-significant bit in network
6926    byte-order) are used as follows:
6927
6928    0 (S): STARTUP
6929           Bit 0 MUST be set to 1 whenever the server is in STARTUP state,
6930           and set to 0 otherwise.  (Note that when in STARTUP state, the
6931           state transmitted in the server-state option is usually the last
6932           recorded state from stable storage, but see section 9.3 for
6933           details.)
6934    1-7  : Must be zero
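
        A minimal sketch of setting and testing the STARTUP bit, taking bit 0
        to be the least-significant bit of the flags byte as the text above
        states.  The names are illustrative only.

```python
import struct

SERVER_FLAGS_CODE = 23   # option code from the diagram above
FLAG_STARTUP = 0x01      # bit 0 (S), assumed least-significant


def encode_server_flags(in_startup: bool) -> bytes:
    """Build the option with bit 0 set only while in STARTUP state."""
    flags = FLAG_STARTUP if in_startup else 0
    return struct.pack("!HHB", SERVER_FLAGS_CODE, 1, flags)


def partner_in_startup(option: bytes) -> bool:
    """True if the sending server reported that it is in STARTUP state."""
    code, length, flags = struct.unpack("!HHB", option)
    if code != SERVER_FLAGS_CODE or length != 1:
        raise ValueError("not a server-flags option")
    return bool(flags & FLAG_STARTUP)
```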
6935
6936
6952 12.24.  server-state
6953
6954    This option is used to convey the current state of the failover
6955    endpoint in the sending server.
6956
6957        Code         Len   Server State
6958    +-----+-----+-----+-----+-----+
6959    |  0  |  24 |  0  |  1  |1-11 |
6960    +-----+-----+-----+-----+-----+
6961
6962    Legal values for this option are:
6963
6964    Value   Server State
6965    -----   -------------------------------------------------------------
6966    0       reserved
6967    1       STARTUP                      Startup state (1)
6968    2       NORMAL                       Normal state
6969    3       COMMUNICATIONS-INTERRUPTED   Communication interrupted (safe)
6970    4       PARTNER-DOWN                 Partner down (unsafe mode)
6971    5       POTENTIAL-CONFLICT           Synchronizing
6972    6       RECOVER                      Recovering bindings from partner
6973    7       PAUSED                       Shutting down for a short period.
6974    8       SHUTDOWN                     Shutting down for an extended
6975                                         period.
6976    9       RECOVER-DONE                 Interlock state prior to NORMAL
6977    10      RESOLUTION-INTERRUPTED       Comm. failed during resolution
6978    11      CONFLICT-DONE                Primary has resolved its conflicts
6979
6980    (1) The STARTUP state is never sent to the partner server; it is
6981    indicated by the STARTUP bit in the server-flags option (see section
6982    12.23).
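
        The state table and note (1) can be sketched as an enumeration plus
        an encoder that refuses to send STARTUP.  The Python names are
        invented for illustration; only the numeric values come from the
        table above.

```python
import struct
from enum import IntEnum

SERVER_STATE_CODE = 24   # option code from the diagram above


class ServerState(IntEnum):
    STARTUP = 1
    NORMAL = 2
    COMMUNICATIONS_INTERRUPTED = 3
    PARTNER_DOWN = 4
    POTENTIAL_CONFLICT = 5
    RECOVER = 6
    PAUSED = 7
    SHUTDOWN = 8
    RECOVER_DONE = 9
    RESOLUTION_INTERRUPTED = 10
    CONFLICT_DONE = 11


def encode_server_state(state: ServerState) -> bytes:
    """Build the option; STARTUP is signalled via server-flags instead."""
    if state is ServerState.STARTUP:
        raise ValueError("STARTUP is never sent in server-state")
    return struct.pack("!HHB", SERVER_STATE_CODE, 1, int(state))


def decode_server_state(option: bytes) -> ServerState:
    """Return the state carried in a 5-byte server-state option."""
    code, length, value = struct.unpack("!HHB", option)
    if code != SERVER_STATE_CODE or length != 1:
        raise ValueError("not a server-state option")
    return ServerState(value)   # raises ValueError for 0 or values above 11
```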
6983
6984
6985 12.25.  start-time-of-state
6986
6987    This option is used for different states in different messages.  In a
6988    BNDUPD message it represents the start time of the state of the lease
6989    in the BNDUPD message.  In a STATE message, it represents the start
6990    time of the partner server's failover state.  In all cases it is an
6991    absolute time.
6992
6993
6994         Code        Len      Start Time of State
6995    +-----+-----+-----+-----+----+-----+-----+-----+
6996    |  0  |  25 |  0  |  4  | t1 |  t2 |  t3 |  t4 |
6997    +-----+-----+-----+-----+----+-----+-----+-----+
6998
6999
7008 12.26.  TLS-reply
7009
7010    This option contains information relating to TLS security
7011    negotiation.  It is sent in a CONNECTACK message.
7012
7013    A t1 value of 0 indicates no TLS operation; a value of 1 indicates
7014    that TLS operation is required.
7015
7016         Code        Len      TLS
7017    +-----+-----+-----+-----+-----+
7018    |  0  |  26 |  0  |  1  |  t1 |
7019    +-----+-----+-----+-----+-----+
7020
7021
7022 12.27.  TLS-request
7023
7024    This option contains information relating to TLS security
7025    negotiation.  It is sent in a CONNECT message.
7026
7027    The t1 byte is the TLS request from the primary server.  A value of 0
7028    indicates no TLS operation (to communicate, the secondary server MUST
7029    NOT require TLS), a value of 1 indicates that TLS operation is
7030    desired but not required (to communicate, the secondary server MAY
7031    utilize TLS), and a value of 2 indicates that TLS operation is
7032    required to establish communications with the primary server (to
7033    communicate, the secondary server MUST utilize TLS).
7034
7035         Code        Len      TLS
7036    +-----+-----+-----+-----+-----+
7037    |  0  |  27 |  0  |  1  |  t1 |
7038    +-----+-----+-----+-----+-----+
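
        The secondary's choice of a TLS-reply value could be sketched as
        below.  The mapping from local policy (whether the secondary can use
        TLS, and whether it insists on it) to a reply is an assumption of
        this sketch; section 7.8.2 is authoritative.  A result of None
        stands for rejecting the connection with one of the TLS-related
        reject-reason codes.  All names are hypothetical.

```python
from typing import Optional

TLS_REQUEST_NONE = 0       # no TLS operation
TLS_REQUEST_DESIRED = 1    # TLS desired but not required
TLS_REQUEST_REQUIRED = 2   # TLS required
TLS_REPLY_NONE = 0         # TLS-reply: no TLS operation
TLS_REPLY_REQUIRED = 1     # TLS-reply: TLS operation is required


def choose_tls_reply(request: int, can_use_tls: bool,
                     requires_tls: bool) -> Optional[int]:
    """Return the TLS-reply value, or None if the connection is rejected."""
    if request == TLS_REQUEST_NONE:
        # The secondary MUST NOT require TLS in order to communicate.
        return None if requires_tls else TLS_REPLY_NONE
    if request == TLS_REQUEST_DESIRED:
        # The secondary MAY utilize TLS; this sketch does so when it can.
        return TLS_REPLY_REQUIRED if can_use_tls else TLS_REPLY_NONE
    if request == TLS_REQUEST_REQUIRED:
        # The secondary MUST utilize TLS in order to communicate.
        return TLS_REPLY_REQUIRED if can_use_tls else None
    raise ValueError("unknown TLS-request value")
```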
7039
7040
7041 12.28.  vendor-class-identifier
7042
7043    A string which identifies the vendor of the failover protocol
7044    implementation.
7045
7046         Code        Len    vendor class string
7047    +-----+-----+-----+-----+----+-----+---
7048    |  0  |  28 |  0  |  n  | c1 |  c2 |  ...
7049    +-----+-----+-----+-----+----+-----+---
7050
7051
7064 12.29.  vendor-specific-options
7065
7066    This option is used to convey options specific to a particular
7067    vendor's implementation.  The vendor class identifier is used to
7068    specify which option space the embedded options are drawn from.
7069    Every message that uses vendor specific options MUST have a vendor-
7070    class-identifier option in it.
7071
7072    It functions similarly to the vendor class identifier and vendor
7073    specific options in the DHCP protocol.
7074
7075    This option contains other options in the same two byte code, two
7076    byte length format.  If this option appears in a message without a
7077    corresponding vendor class identifier, it MUST be ignored.
7078
7079         Code        Len     Embedded options
7080    +-----+-----+-----+-----+----+-----+---
7081    |  0  |  29 |  0  |  n  | c1 |  c2 |  ...
7082    +-----+-----+-----+-----+----+-----+---
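
        Because the embedded options use the same two byte code, two byte
        length format, one parser can handle both levels.  The sketch below
        also applies the rule that the option is ignored when no vendor-
        class-identifier is present.  It assumes the caller passes only the
        option bytes of a message (not any message header); the function
        names are illustrative.

```python
import struct

VENDOR_CLASS_IDENTIFIER_CODE = 28
VENDOR_SPECIFIC_OPTIONS_CODE = 29


def parse_options(data: bytes) -> list[tuple[int, bytes]]:
    """Split a byte string into (code, value) pairs."""
    options = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated option header")
        code, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if len(data) - offset < length:
            raise ValueError("truncated option value")
        options.append((code, data[offset:offset + length]))
        offset += length
    return options


def vendor_options(message_options: bytes) -> list[tuple[int, bytes]]:
    """Return the embedded vendor options, or [] if they must be ignored."""
    top = parse_options(message_options)
    if not any(code == VENDOR_CLASS_IDENTIFIER_CODE for code, _ in top):
        return []   # no vendor-class-identifier: the option MUST be ignored
    embedded = []
    for code, value in top:
        if code == VENDOR_SPECIFIC_OPTIONS_CODE:
            embedded.extend(parse_options(value))
    return embedded
```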
7083
7084
7085
7086
7087 13.  IANA Considerations
7088
7089    This document defines several number spaces (failover options, fail-
7090    over message types, message digest types, and failover reject reason
7091    codes). For all of these number spaces, certain values are defined in
7092    this specification.  New values may only be defined by IETF Con-
7093    sensus, as described in [RFC 2434]. Basically, this means that they
7094    are defined by RFCs approved by the IESG.
7095
7096
7097 14.  Acknowledgments
7098
7099    Ralph Droms started it all, by sketching out an initial interserver
7100    draft that embodied ideas from several past IETF meetings.  In that
7101    draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
7102    Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.
7103
7104    Kim Kinnear and Bob Cole each extended that draft, separately and
7105    then together, until they created an interserver draft that supported
7106    any number of servers.  The complexity of that approach was just too
7107    great, and that draft wasn't greeted with enthusiasm by many, includ-
7108    ing its authors.
7109
7110    It did however lead to a much simpler approach embodied in the first
7119    Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
7120    Droms.  This draft posited only two servers -- a primary and a secon-
7121    dary.
7122
7123    Kim Kinnear then wrote the Safe Failover draft to layer on top of the
7124    Failover Draft and increase its robustness in the face of certain
7125    rare network failures.
7126
7127    At the spring 1998 IETF meeting in LA, the DHC working group said
7128    that they wanted a merged Failover and Safe Failover draft.  Steve
7129    Gonczi and Bernie Volz stepped up and produced the raw material for
7130    such a merged draft, along with a new message format designed around
7131    DHCP options and other extensions and clarifications.  Kim Kinnear
7132    edited their work into draft format and made other changes in time
7133    for the Summer Chicago IETF meeting.
7134
7135    Many people have reviewed the various earlier drafts that went into
7136    this result.  At American Internet, ideas were contributed by Brad
7137    Parker.  At Cisco Systems Paul Fox and Ellen Garvey contributed to
7138    the design of the protocol.
7139
7140    During the summer and fall of 1998, two groups worked on separate
7141    implementations of the UDP failover draft.  Bernie Volz and Steve
7142    Gonczi constituted one group, and Kim Kinnear, Mark Stapp and Paul
7143    Fox made up the other.  These two groups worked together to produce
7144    considerable changes and simplifications of the protocol during that
7145    period, and Steve Gonczi and Kim Kinnear edited those changes into
7146    the -03 draft in time for submission to the December 1998 Orlando
7147    IETF meeting.
7148
7149    In February of 1999 Kim Kinnear and Mark Stapp hosted a meeting of
7150    people interested in the failover draft.  During that meeting a gen-
7151    eral agreement was reached to recast the failover protocol to use TCP
7152    instead of UDP.  In addition, the group together brainstormed a work-
7153    able load-balancing technique.  Kim Kinnear rewrote the entire draft
7154    to include the changes made at that meeting as well as to restructure
7155    the draft along guidelines suggested by Thomas Narten.  The result
7156    was the -04 draft, submitted prior to the Oslo IETF meeting.
7157
7158    The initial idea for a hash-based load balancing approach was offered
7159    by Ted Lemon, and the determination of an algorithm and its integra-
7160    tion into the draft was done by Steve Gonczi.  The security section
7161    was spearheaded by Bernie Volz.  Both contributed considerably to the
7162    ideas and text in the rest of the draft with several reviews.
7163
7164    In early October of 1999, three conference calls were held to discuss
7165    the -04 draft.  The -05 draft includes changes as a result of those
7166    calls, perhaps the largest of which was to move the load balancing
7175    approach into a separate draft.  Thanks to all of the many people
7176    who participated in the conference calls.  Changes were made because
7177    of contributions by: Ted Lemon, David Erdmann, Richard Jones, Rob
7178    Stevens, Thomas Narten, Diana Lane, and Andre Kostur.
7179
7180    Another conference call was held in mid-January of 2000, and the -06
7181    draft was produced to tighten up the -05 draft both technically
7182    and editorially.
7183
7184    The -07 draft was edited by Kim Kinnear and was based in part on
7185    reviews by Richard Jones, Bernie Volz, and Steve Gonczi.  It embodies
7186    several technical updates as well as numerous editorial revisions
7187    that enhanced both correctness and clarity.
7188
7189    The -08 draft was edited by Kim Kinnear and was based on the results
7190    of two conference calls held in October and November of 2000.  It
7191    includes the correct second port number, a new state to synchronize
7192    conflict resolution with load balancing, a generally accepted
7193    approach to secondary pool allocation, and many other updates based
7194    on both operational and implementation experience.
7195
7196    The -09 draft was edited by Kim Kinnear based on discussions held at
7197    the Minneapolis IETF in December of 2000, as well as issues raised by
7198    Ted Lemon based on implementation and deployment.  The specific
7199    changes were mailed to the dhcp-v4 list.
7200
7201    The -10 draft differed from the -09 draft in that figure 9.8.3-1 was
7202    correctly relabeled figure 9.10.3-1, and it was updated to include
7203    the CONFLICT-DONE message.  One of the authors' affiliations was also
7204    updated.
7205
7206    The -11 draft differed only slightly from the -10 draft in
7207    correcting another author affiliation.
7208
7209    These most recent changes have not been widely circulated among the
7210    other authors prior to submission to the IETF.
7211
7212    Glenn Waters of Nortel Networks contributed ideas and enthusiasm to
7213    make a Failover protocol that was both "safe" and "lazy".
7214
7215
7216 15.  References
7217
7218
7219    [DHCID] Stapp, M., Lemon, T., Gustafsson, A., "draft-ietf-dnsext-
7220       dhcid-rr-02.txt", March, 2001.
7221
7222    [DNSRES] Stapp, M., "draft-ietf-dhc-dns-resolution-01.txt", March,
7231       2001.
7232
7233    [FQDN] Rekhter, Y., Stapp, M., "draft-ietf-dhc-fqdn-option-01.txt",
7234       March, 2001.
7235
7236    [RFC 1035] Mockapetris, P., "Domain Names - Implementation and
7237       Specification", RFC 1035, November 1987.
7238
7239    [RFC 1534] Droms, R., "Interoperation between DHCP and BOOTP", RFC
7240       1534, October 1993.
7241
7242    [RFC 2104] Krawczyk, H., Bellare, M., and Canetti, R., "HMAC: Keyed
7243       Hashing for Message Authentication", RFC 2104, IBM T.J. Watson
7244       Research Center, University of California at San Diego, February
7245       1997.
7246
7247    [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
7248       Requirement Levels", BCP 14, RFC 2119, March 1997.
7249
7250    [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
7251       2131, March 1997.
7252
7253    [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
7254       Extensions", RFC 2132, March 1997.
7255
7256    [RFC 2136] Vixie, P., Thomson, S., Rekhter, Y., Bound, J., "Dynamic
7257       Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April
7258       1997.
7259
7260    [RFC 2139] Rigney, C., "RADIUS Accounting", RFC 2139, Livingston
7261       Enterprises, April 1997.
7262
7263    [RFC 2246] Dierks, T., "The TLS Protocol, Version 1.0", RFC 2246,
7264       January 1999.
7265
7266    [RFC 2401] Kent, S., Atkinson, R., "Security Architecture for the
7267       Internet Protocol", RFC 2401, November 1998.
7268
7269    [RFC 2434] Alvestrand, H. and T. Narten, "Guidelines for Writing an
7270       IANA Considerations Section in RFCs", BCP 26, RFC 2434, October
7271       1998.
7272
7273    [RFC 2487] Hoffman, P., "SMTP Service Extension for Secure SMTP over
7274       TLS", RFC 2487, January 1999.
7275
7276    [RFC 2595] Newman, C., "Using TLS with IMAP, POP3, and ACAP", RFC
7277       2595, June 1999.
7278
7287    [RFC 3004] Stump, G., Droms, R., Gu, Y., Vyaghrapuri, R., Demirtjis,
7288       A., Privat, J., "The User Class Option for DHCP", November 2000.
7289
7290    [RFC 3011] Waters, G., "The IPv4 Subnet Selection Option for DHCP",
7291       RFC 3011, November 2000.
7292
7293    [RFC 3046] Patrick, M., "DHCP Relay Agent Information Option", RFC
7294       3046, January 2001.
7295
7296    [RFC 3074] Volz, B., Gonczi, S., Lemon, T., Stevens, R., "DHC Load-
7297       balancing Algorithm", RFC 3074, February 2001.
7298
7299 16.  Authors' information
7300
7301       Ralph Droms
7302       Kim Kinnear
7303       Mark Stapp
7304       Cisco Systems
7305       250 Apollo Drive
7306       Chelmsford, MA  01824
7307
7308       Phone: (978) 497-0000
7309
7310       EMail: rdroms@cisco.com
7311              kkinnear@cisco.com
7312              mjs@cisco.com
7313
7314
7315
7316       Bernie Volz
7317       Ericsson
7318       959 Concord St.
7319       Framingham, MA  01701
7320
7321       Phone: (508) 875-3162
7322
7323       EMail: bernie.volz@ericsson.com
7324
7325
7326       Steve Gonczi
7327       Relicore, Inc.
7328       One Wall Street
7329       Burlington, MA 01803
7330
7331       Phone: (781) 229-1122
7332
7333       EMail: steve@relicore.com
7334
7335
7343       Greg Rabil
7344       Lucent Technologies
7345       400 Lapp Road
7346       Malvern, PA 19355
7347
7348       Phone: (800) 208-2747
7349
7350       EMail: grabil@lucent.com
7351
7352
7353
7354
7355       Michael Dooley
7356       Diamond IP Technologies
7357       One E Uwchlan Ave, Suite 112
7358       Exton, PA 19341
7359
7360       EMail: mdooley@diamondip.com
7361
7362
7363
7364
7365       Arun Kapur
7366       K5 Networks
7367       2 Toll House Lane
7368       Colts Neck, NJ 07722
7369
7370       Phone: (732) 817-9475
7371
7372 17.  Full Copyright Statement
7373
7374 Copyright (C) The Internet Society (2003). All Rights Reserved.
7375
7376 This document and translations of it may be copied and furnished to oth-
7377 ers, and derivative works that comment on or otherwise explain it or
7378 assist in its implementation may be prepared, copied, published and dis-
7379 tributed, in whole or in part, without restriction of any kind, provided
7380 that the above copyright notice and this paragraph are included on all
7381 such copies and derivative works.  However, this document itself may not
7382 be modified in any way, such as by removing the copyright notice or
7383 references to the Internet Society or other Internet organizations,
7384 except as needed for the  purpose of developing Internet standards in
7385 which case the procedures for copyrights defined in the Internet Stan-
7386 dards process must be followed, or as required to translate it into
7387 languages other than English.
7388
7389 The limited permissions granted above are perpetual and will not be
7390 revoked by the Internet Society or its successors or assigns.
7391
7399 This document and the information contained herein is provided on an "AS
7400 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
7401 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
7402 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
7403 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT-
7404 NESS FOR A PARTICULAR PURPOSE.