From: Marcin Siodelski Date: Wed, 9 May 2018 15:13:14 +0000 (+0200) Subject: [5478] High Availability lib section moved to another xml file. X-Git-Tag: trac5549a_base~34^2~8 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=1d95e86cdbc5bc5fe6444777f2fda1e4b7f74620;p=thirdparty%2Fkea.git [5478] High Availability lib section moved to another xml file. --- diff --git a/doc/guide/Makefile.am b/doc/guide/Makefile.am index 7dca16a228..8383ca2c90 100644 --- a/doc/guide/Makefile.am +++ b/doc/guide/Makefile.am @@ -7,9 +7,9 @@ dist_html_DATA = $(HTMLDOCS) kea-guide.css kea-logo-100x70.png DOCBOOK = kea-guide.xml intro.xml quickstart.xml install.xml admin.xml config.xml DOCBOOK += keactrl.xml dhcp4-srv.xml dhcp6-srv.xml lease-expiration.xml logging.xml -DOCBOOK += ddns.xml hooks.xml libdhcp.xml lfc.xml stats.xml ctrl-channel.xml faq.xml -DOCBOOK += hooks-host-cache.xml hooks-radius.xml -DOCBOOK += classify.xml shell.xml agent.xml +DOCBOOK += ddns.xml hooks.xml hooks-ha.xml hooks-host-cache.xml hooks-radius.xml +DOCBOOK += libdhcp.xml lfc.xml stats.xml ctrl-channel.xml faq.xml classify.xml +DOCBOOK += shell.xml agent.xml EXTRA_DIST = $(DOCBOOK) diff --git a/doc/guide/hooks-ha.xml b/doc/guide/hooks-ha.xml new file mode 100644 index 0000000000..9d98ac1fb7 --- /dev/null +++ b/doc/guide/hooks-ha.xml @@ -0,0 +1,915 @@ + + +
+ libdhcp_ha: High Availability + + High Availability (HA) of the DHCP service is provided by running multiple + cooperating server instances. If any of these instances becomes + unavailable for whatever reason (DHCP software crash, control agent + software crash, power failure, hardware + failure), a surviving + server instance can continue providing reliable service to the clients. Many + DHCP server implementations include the "DHCP Failover" protocol, whose most + significant features are: communication between the servers, partner + failure detection and lease synchronization between the servers. + However, the DHCPv4 failover standardization process was never completed + at IETF. The DHCPv6 failover standard (RFC 8156) was published, but it + is complex, difficult to use, has significant operational constraints + and differs from its v4 counterpart. + Although it may be useful for some users to use a "standard" failover + protocol, it seems that most Kea users are simply interested in + a working solution which guarantees high availability of the DHCP + service. Therefore, the Kea HA hook library derives major concepts from the + DHCP Failover protocol but uses its own solutions for communication, + configuration and its own state machine, which greatly simplifies its + implementation and generally better fits into Kea. Also, it provides the + same features in both DHCPv4 and DHCPv6. This document purposely + uses the term "High Availability" rather than "Failover" to emphasize that + it is not an implementation of the Failover protocol. + + + The following sections describe the configuration and operation of the Kea + HA hook library. + + +
+ Supported Configurations + The Kea HA hook library supports two configurations also known as HA + modes: load balancing and hot standby. In the load balancing mode, there + are two servers responding to the DHCP requests. The load balancing function + is implemented as described in RFC3074, with each server responding to + 1/2 of received DHCP queries. When one of the servers allocates a lease + for a client, it notifies the partner server over the control channel + (RESTful API), so that the partner can save the lease information in its + own database. If the communication with the partner is unsuccessful, + the DHCP query is dropped and the response is not returned to the DHCP + client. If the lease update is successful, the response is returned to + the DHCP client by the server which has allocated the lease. By + exchanging the lease updates, both servers get a copy of all leases + allocated by the entire HA setup and any of the servers can be switched + to handle the entire DHCP traffic if its partner becomes unavailable. + + In the load balancing configuration, one of the servers must be + designated as "primary" and the other server is designated as "secondary". + Functionally, there is no difference between the two during the normal + operation. This distinction is required when the two servers are + started at (nearly) the same time and have to synchronize their + lease databases. The primary server synchronizes the database first. + The secondary server waits for the primary server to complete the + lease database synchronization before it starts the synchronization. + + + In the hot standby configuration one of the servers is designated as + "primary" and the second server is designated as "standby". During the + normal operation, the primary server is the only one that responds to + the DHCP requests. The standby server receives lease updates from the + primary over the control channel.
However, it does not respond to any + DHCP queries as long as the primary is running or, more accurately, + until the standby considers the primary to be offline. When the + standby server detects the failure of the primary, it starts + responding to all DHCP queries. + + + In the configurations described above, the primary, secondary and + standby are referred to as "active" servers, because they receive + lease updates and can automatically react to the partner's failures by + responding to the DHCP queries which would normally be handled by the + partner. The HA hook library supports another server type (role) - + backup server. The use of the backup servers is optional. They can be used + in both load balancing and hot standby setups, in addition to the active + servers. There is no limit on the number of backup servers in the HA + setup. However, the presence of the backup servers increases latency + of the DHCP responses, because the active servers send lease + updates not only to each other but also to the backup servers. + +
+ +
+ Server States + The DHCP server operating within an HA setup runs a state machine + and the state of the server can be retrieved by its peers using the + ha-heartbeat command sent over the RESTful API. If + the partner server doesn't respond to the ha-heartbeat + command longer than configured amount of time, the communication is + considered interrupted and the server may (depending on the configuration) + use additional measures to verify if the partner is still operating. + If it finds that the partner is not operating, the server transitions + to the partner-down state to handle the entire + DHCP traffic directed to the system. + + In this case, the surviving server continues to send the + ha-heartbeat command to detect when the partner wakes + up. The partner synchronizes the lease database and when it is finally + ready to operate, the surviving server returns to the normal operation, + i.e. load-balancing or hot-standby + state. + + The following is the list of all possible states into which the + servers may transition: + + + backup - normal operation of the + backup server. In this state it receives lease updates from the active + servers. + + hot-standby - normal operation of + the active server running in the hot standby mode. Both primary and + standby server are in this state during their normal operation. + The primary server is responding to the DHCP queries and sends lease updates + to the standby server and to the backup servers, if any backup servers + are present. + + load-balancing - normal operation + of the active server running in the load balancing mode. Both primary + and secondary server are in this state during their normal operation. + Both servers are responding to the DHCP queries and send lease updates + to each other and to the backup servers, if any backup servers are + present. + + partner-down - an active server + transitions to this state after detecting that its partner (another + active server) is offline. 
The server doesn't transition to this state + if any of the backup servers is unavailable. In the + partner-down state the server responds to all DHCP queries, + including those queries which are normally handled by the active server + which is now unavailable. + + ready - an active server transitions + to this state after synchronizing its lease database with an active + partner. This state is to indicate to the partner (likely being in the + partner-down state) that it may return to the + normal operation. When it does, the server being in the + ready state will also start normal operation. + + + syncing - an active server + transitions to this state to fetch leases from the active partner + and update the local lease database. When in this state, it + issues the dhcp-disable command to disable the DHCP + service of the partner from which the leases are fetched. The DHCP + service is disabled for the maximum time of 60 seconds, after which + it is automatically enabled, in case the syncing partner has died + again and failed to re-enable the service. If the synchronization is + completed the syncing server issues the dhcp-enable + command to re-enable the DHCP service of the partner. The + syncing operation is synchronous. The server is waiting for an + answer from the partner and is not doing anything else while the + lease synchronization takes place. + + waiting - each started server + instance enters this state. The backup server will transition + directly from this state to the backup state. + An active server will send a heartbeat to its partner to check its + state. If the partner appears to be unavailable the server will + transition to the partner-down state; otherwise it + will transition to the syncing state and attempt + to synchronize the lease database. If both servers appear to be + in this state (concurrent startup) the primary server will + synchronize first. The secondary or standby server will remain + in the waiting state until the primary + synchronizes the database.
+ + + Whether the server responds to the DHCP queries and which + queries it responds to is a matter of the server's state, if no + administrative action is performed to configure the server + otherwise. The following table provides the default behavior for + various states. + + + + Default behavior of the server in various HA states + + + + + + + + State + Server Type + DHCP Service + DHCP Service Scopes + + + + + backup + backup server + disabled + none + + + hot-standby + primary or standby (hot standby mode) + enabled + ha_server1 if primary, none otherwise + + + load-balancing + primary or secondary (load balancing mode) + enabled + ha_server1 or ha_server2 + + + partner-down + active server + enabled + all scopes + + + ready + active server + disabled + none + + + syncing + active server + disabled + none + + + waiting + any server + disabled + none + + + +
+
+ +
+ + The DHCP service scopes require some explanation. The HA + configuration must specify a unique name for each server within + the HA setup. This document uses the following convention within + provided examples: server1 for a primary server, + server2 for the secondary or standby server and + server3 for the backup server. In the real life + any names can be used as long as they remain unique. + + In the load balancing mode there are two scopes named after + the active servers: ha_server1 and + ha_server2. The DHCP queries load balanced to the + server1 belong to the ha_server1 + scope and the queries load balanced to the server2 + belong to the ha_server2 scope. If any of the + servers is in the partner-down state, it is + responsible for serving both scopes. + + In the hot standby mode, there is only one scope + ha_server1 because only the server1 + is responding to the DHCP queries. If that server becomes unavailable, + the server2 becomes responsible for this scope. + + + The backup servers do not have their own scopes. In some + cases they can be used to respond to the queries belonging to + the scopes of the active servers. Also, a server which is neither + in the partner-down state nor in the normal operation serves + no scopes. + + The scope names can be used to associate pools, subnets + and networks with certain servers, so as only these servers + can allocate addresses or prefixes from those pools, subnets + or network. This is done via the client classification mechanism + (see below). +
+ +
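The default behavior table and the scope rules described above can be summarized as a small lookup. The following is only a sketch in Python, not code from Kea: the function name `default_behavior` and the `Behavior` type are hypothetical, and the scope names assume the server1/server2 naming convention used in the examples of this document.

```python
from typing import List, NamedTuple


class Behavior(NamedTuple):
    dhcp_enabled: bool   # "DHCP Service" column of the table
    scopes: List[str]    # "DHCP Service Scopes" column of the table


def default_behavior(mode: str, role: str, state: str) -> Behavior:
    """Return the default DHCP service status and scopes for an HA state.

    mode:  "load-balancing" or "hot-standby"
    role:  "primary", "secondary", "standby" or "backup"
    state: one of the HA states listed in the table
    """
    # Hot standby uses a single scope; load balancing uses one per active server.
    if mode == "load-balancing":
        all_scopes = ["ha_server1", "ha_server2"]
    else:
        all_scopes = ["ha_server1"]

    if state == "partner-down":
        # The surviving active server serves all scopes.
        return Behavior(True, all_scopes)
    if state == "load-balancing":
        own = "ha_server1" if role == "primary" else "ha_server2"
        return Behavior(True, [own])
    if state == "hot-standby":
        # Only the primary serves a scope during normal operation.
        return Behavior(True, ["ha_server1"] if role == "primary" else [])
    if state in ("backup", "ready", "syncing", "waiting"):
        return Behavior(False, [])
    raise ValueError(f"unknown HA state: {state}")
```

For example, `default_behavior("hot-standby", "standby", "partner-down")` yields an enabled service with the single scope `ha_server1`, which matches the takeover described for the hot standby mode.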
+ Load Balancing Configuration + The following is the configuration snippet which enables + high availability on the primary server within the load balancing + configuration. The same configuration should be applied on the + secondary and the backup server, with the only difference that + the this-server-name should be set to + server2 and server3 + on those servers respectively. + +{ +"Dhcp4": { + + ... + + "hooks-libraries": [ + { + "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", + "parameters": { } + }, + { + "library": "/usr/lib/hooks/libdhcp_ha.so", + "parameters": { + "high-availability": [ { + "this-server-name": "server1", + "mode": "load-balancing", + "heartbeat-delay": 10, + "max-response-delay": 10, + "max-ack-delay": 5, + "max-unacked-clients": 5, + "peers": [ + { + "name": "server1", + "url": "http://192.168.56.33:8080/", + "role": "primary", + "auto-failover": true + }, + { + "name": "server2", + "url": "http://192.168.56.66:8080/", + "role": "secondary", + "auto-failover": true + }, + { + "name": "server3", + "url": "http://192.168.56.99:8080/", + "role": "backup", + "auto-failover": false + } + ] + } ] + } + } + ], + + "subnet4": [ + { + "subnet": "192.0.3.0/24", + "pools": [ + { + "pool": "192.0.3.100 - 192.0.3.150", + "client-class": "ha_server1" + }, + { + "pool": "192.0.3.200 - 192.0.3.250", + "client-class": "ha_server2" + } + ], + + "option-data": [ + { + "name": "routers", + "data": "192.0.3.1" + } + ], + + "relay": { "ip-address": "10.1.2.3" } + } + ], + + ... + +} + +} + + + + Two hook libraries must be loaded to enable HA: + libdhcp_lease_cmds.so and + libdhcp_ha.so. The latter provides the + implementation of the HA feature. The former enables control + commands required by HA to fetch and manipulate leases on the + remote servers. In the example provided above, it is assumed that + Kea libraries are installed in the /usr/lib + directory.
If Kea is not installed in the /usr directory, the + hook library locations must be updated accordingly. + + + The HA configuration is specified within the scope of the + libdhcp_ha.so. Note that the top level + parameter high-availability is a list, even + though it currently contains only one entry. In the future this + configuration is likely to be extended to contain more entries, + if the particular server can participate in more than one + HA relationship. + + The following are the global parameters which control the server's + behavior with respect to HA: + + this-server-name - is a unique + identifier of the server within this HA setup. It must match one + of the servers specified within the peers list. + + + mode - specifies an HA mode + of operation. Currently supported modes are load-balancing + and hot-standby. + + heartbeat-delay - specifies + a duration in seconds between the last heartbeat (or other command sent + to the partner) and sending the next heartbeat. The heartbeats are sent + periodically to gather the status of the partner and to verify whether + the partner is still operating. The default value of this parameter is + 10. + + max-response-delay - specifies a + duration in seconds since the last successful communication with the + partner, after which the server assumes that the communication with + the partner is interrupted. This duration should be greater than + the heartbeat-delay. Usually it is greater than + the duration of multiple heartbeat-delay values. + When the server detects that the communication is interrupted, it + may transition to the partner-down state (when + max-unacked-clients is 0) or trigger the failure + detection procedure using the values of the two parameters below. + The default value of this parameter is 60. + + + max-ack-delay - is one of + the parameters controlling partner failure detection.
When the + communication with the partner is interrupted, the server examines values + of the secs field (DHCPv4) or Elapsed Time + option (DHCPv6) which denote how long the DHCP client has been + trying to communicate with the DHCP server. This parameter specifies the + maximum time for the client to try to communicate with the DHCP server, + after which this server assumes that the client failed to communicate + with the DHCP server (is "unacked"). The default value of this parameter + is 10. + + max-unacked-clients - specifies + how many "unacked" clients are allowed (see max-ack-delay) + before this server assumes that the partner is offline and transitions + to the partner-down state. The special value of 0 + is allowed for this parameter which disables failure detection + mechanism. In this case, the server which can't communicate with the + partner over the control channel assumes that the partner server is + down and transitions to the partner-down state + immediately. The default value of this parameter is 10. + + + + + + + The values of max-ack-delay and + max-unacked-clients must be selected carefully, taking + into account specifics of the network in which DHCP servers are + operating. Note that the server in question may not respond to some + of the DHCP clients because these clients are not to be serviced + by this server (per administrative policy). The server may also + drop malformed queries from the clients. Therefore, selecting too + low value for the max-unacked-clients may + result in transitioning to the partner-down + state even though the partner is still operating. On the other + hand, selecting too high value may result in never transitioning + to the partner-down state if the DHCP + traffic in the network is very low (e.g. night time), because the + number of distinct clients trying to communicate with the server + could be lower than max-unacked-clients. 
+ + + In some cases it may be useful to disable the failure detection + mechanism altogether, if the servers are located very close to each + other and the network partitioning is unlikely, i.e. failure to + respond to heartbeats is only possible when the partner is offline. + In such cases, set the max-unacked-clients to 0. + + + The peers parameter contains a list of servers + within this HA setup. In this configuration it must contain at least + one primary and one secondary server. It may also contain unlimited + number of backup servers. In this example there is one backup server + which receives lease updates from the active servers. + + There are the following parameters specified for each of the + peers within this list: + + + name - specifies unique name for + the server. + + url - specifies URL to be used to + contact this server over the control channel. Other servers use this + URL to send control commands to that server. + + role - denotes the role of the + server in the HA setup. The following roles are supported in the + load balancing configuration: primary, + secondary and backup. + There must be exactly one primary and one secondary server in the + load balancing setup. + + auto-failover - a boolean value + which denotes whether the server detecting a partner's failure should + automatically start serving its clients. + + + + + In our example configuration, both active servers can allocate + leases from the subnet "192.0.3.0/24". This subnet contains two + address pools: "192.0.3.100 - 192.0.3.150" and "192.0.3.200 - 192.0.3.250", + which are associated with HA servers scopes using client classification. + When the server1 processes a DHCP query it will use + the first pool for the lease allocation. Conversely, when the + server2 is processing the DHCP query it will use the + second pool. 
When any of the servers is in the partner-down + state, it can serve leases from both pools and it will + select the pool which is appropriate for the received query. In + other words, if the query would normally be processed by the + server2, but this server is not available, the + server1 will allocate the lease from the pool of + "192.0.3.200 - 192.0.3.250". + + +
+ +
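The interplay of the failure detection parameters described in this section (max-response-delay, max-ack-delay and max-unacked-clients) can be illustrated with a short Python sketch. It is not taken from the HA library: the function name and the `client_secs` mapping (client identifier to the latest "secs" or Elapsed Time value seen from that client) are hypothetical. The strict "greater than" comparisons and the assumption that all values share one time unit are also assumptions rather than documented behavior. The default values are the ones quoted in the parameter descriptions.

```python
from typing import Dict


def partner_considered_down(
    secs_since_last_response: float,
    client_secs: Dict[str, float],
    max_response_delay: float = 60,
    max_ack_delay: float = 10,
    max_unacked_clients: int = 10,
) -> bool:
    """Decide whether to transition to the partner-down state (sketch)."""
    # Communication is only considered interrupted after max-response-delay.
    if secs_since_last_response <= max_response_delay:
        return False

    # The special value 0 disables the client-based failure detection:
    # an interrupted control channel alone is treated as partner failure.
    if max_unacked_clients == 0:
        return True

    # A client is "unacked" when it has been retrying longer than max-ack-delay.
    unacked = sum(1 for secs in client_secs.values() if secs > max_ack_delay)

    # Assumption: the partner is considered offline once the number of
    # distinct unacked clients exceeds max-unacked-clients.
    return unacked > max_unacked_clients
```

With the default values, eleven distinct clients that have each been retrying for 20 seconds would trigger the transition, while ten such clients would not. This is the low-traffic caveat described above: on a quiet network the threshold may never be reached.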
+ Load Balancing with Advanced Classification + In the previous section we have provided an example which demonstrated + the load balancing configuration with the client classification limited + to the use of ha_server1 and ha_server2 + classes, which are dynamically assigned to the received DHCP queries. + In many cases it will be required to use HA in deployments which already + use some client classification. + + + Suppose there is a system which classifies devices into two groups: + phones and laptops, based on some classification criteria specified in + Kea configuration file. Both types of devices are allocated leases + from different address pools. Introducing HA in the load balancing mode + is expected to result in further split of each of those pools, so as + each of the servers can allocate leases for some part of the phones + and part of the laptops. This requires that each of the existing pools + should be split between the ha_server1 and + ha_server2, so we end up with the following classes: + + + phones_server1 + laptops_server1 + phones_server2 + laptops_server2 + + + + The corresponding server configuration using advanced classification + (and member expression) is provided below. For brevity + the HA hook library configuration has been removed from this example. + +{ +"Dhcp4": { + + "client-classes": [ + { + // No test expression for this class. Incoming packets will be + // assigned to that class dynamically by the HA Hook library. + "name": "ha_server1" + }, + { + // No test expression for this class. Incoming packets will be + // assigned to that class dynamically by the HA Hook library. 
+ "name": "ha_server2" + }, + { + "name": "phones", + "test": "substring(option[60].hex,0,6) == 'Aastra'" + }, + { + "name": "laptops", + "test": "not member('phones')" + }, + { + "name": "phones_server1", + "test": "member('phones') and member('ha_server1')" + }, + { + "name": "phones_server2", + "test": "member('phones') and member('ha_server2')" + }, + { + "name": "laptops_server1", + "test": "member('laptops') and member('ha_server1')" + }, + { + "name": "laptops_server2", + "test": "member('laptops') and member('ha_server2')" + } + ], + + "hooks-libraries": [ + { + "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", + "parameters": { } + }, + { + "library": "/usr/lib/hooks/libdhcp_ha.so", + "parameters": { + "high-availability": [ { + + ... + + } ] + } + } + ], + + "subnet4": [ + { + "subnet": "192.0.3.0/24", + "pools": [ + { + "pool": "192.0.3.100 - 192.0.3.125", + "client-class": "phones_server1" + }, + { + "pool": "192.0.3.126 - 192.0.3.150", + "client-class": "laptops_server1" + }, + { + "pool": "192.0.3.200 - 192.0.3.225", + "client-class": "phones_server2" + }, + { + "pool": "192.0.3.226 - 192.0.3.250", + "client-class": "laptops_server2" + } + ], + + "option-data": [ + { + "name": "routers", + "data": "192.0.3.1" + } + ], + + "relay": { "ip-address": "10.1.2.3" } + } + ], + + ... + +} + +} + + + + The configuration provided above splits the address range into + four pools. Two pools are dedicated to server1 and two are dedicated to + server2. Each server can assign leases to both phones and laptops. + Both groups of devices are assigned addresses from different pools. + Note that the definitions of classes ha_server1 and + ha_server2 are required because other classes + refer to them via the member expression. These classes + do not include the test parameter because they are + not evaluated with other classes. They are assigned dynamically + by the HA hook library as a result of the load balancing algorithm.
+ The phones_* and laptops_* classes + evaluate to "true" when the query belongs to a given combination + of other classes, e.g. ha_server1 and + phones. The pool will be selected accordingly + as a result of such evaluation. + + + Consult for details on how to use + the member expression and about class dependencies. + +
+ +
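The class dependencies in the example above can be traced with a small Python sketch. It only mimics the evaluation order of the example classes and is not how Kea evaluates expressions internally. The helper names `classify` and `select_pool` are hypothetical, and the `ha_scope` argument stands in for the class that the HA hook library assigns dynamically.

```python
from typing import Set

# Pools from the example configuration, keyed by the combined class name.
POOLS = {
    "phones_server1": "192.0.3.100 - 192.0.3.125",
    "laptops_server1": "192.0.3.126 - 192.0.3.150",
    "phones_server2": "192.0.3.200 - 192.0.3.225",
    "laptops_server2": "192.0.3.226 - 192.0.3.250",
}


def classify(vendor_class_id: bytes, ha_scope: str) -> Set[str]:
    """Return the classes a query belongs to (sketch of the example tests)."""
    classes = {ha_scope}  # assigned by the HA hook library, no test expression

    # "substring(option[60].hex,0,6) == 'Aastra'"
    if vendor_class_id[:6] == b"Aastra":
        classes.add("phones")
    # "not member('phones')"
    if "phones" not in classes:
        classes.add("laptops")

    # "member('<device>') and member('ha_serverN')"
    for device in ("phones", "laptops"):
        for scope in ("ha_server1", "ha_server2"):
            if device in classes and scope in classes:
                classes.add(f"{device}_{scope[3:]}")  # e.g. phones_server1
    return classes


def select_pool(classes: Set[str]) -> str:
    """Pick the pool whose client-class the query belongs to."""
    return next(pool for name, pool in POOLS.items() if name in classes)
```

A query from an Aastra phone that is load balanced to server2 ends up in `phones_server2` and is therefore served from "192.0.3.200 - 192.0.3.225".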
+ Hot Standby Configuration + The following is the example configuration of the primary server + in the hot standby configuration: + +{ +"Dhcp4": { + + ... + + "hooks-libraries": [ + { + "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", + "parameters": { } + }, + { + "library": "/usr/lib/hooks/libdhcp_ha.so", + "parameters": { + "high-availability": [ { + "this-server-name": "server1", + "mode": "hot-standby", + "heartbeat-delay": 10, + "max-response-delay": 10, + "max-ack-delay": 5, + "max-unacked-clients": 5, + "peers": [ + { + "name": "server1", + "url": "http://192.168.56.33:8080/", + "role": "primary", + "auto-failover": true + }, + { + "name": "server2", + "url": "http://192.168.56.66:8080/", + "role": "standby", + "auto-failover": true + }, + { + "name": "server3", + "url": "http://192.168.56.99:8080/", + "role": "backup", + "auto-failover": false + } + ] + } ] + } + } + ], + + "subnet4": [ + { + "subnet": "192.0.3.0/24", + "pools": [ + { + "pool": "192.0.3.100 - 192.0.3.250", + "client-class": "ha_server1" + } + ], + + "option-data": [ + { + "name": "routers", + "data": "192.0.3.1" + } + ], + + "relay": { "ip-address": "10.1.2.3" } + } + ], + + ... + +} + +} + + + + This configuration is very similar to the load balancing + configuration described , + with a few notable differences. + + The mode is now set to hot-standby, + in which only one server is responding to the DHCP clients. + If the primary server is online, the primary server is responding to + all DHCP queries. The standby server takes over the + entire DHCP traffic when it discovers that the primary is unavailable. + + + In this mode, the non-primary active server is called + standby and that's what the role of the second + active server is set to. + + Finally, because there is always one server responding to the + DHCP queries, there is only one scope ha_server1 + in use within pools definitions. 
In fact, the client-class + parameter could be removed from this configuration without harm, + because there are no conflicts in lease allocations by different + servers as they do not allocate leases concurrently. The + client-class is left in this example mostly for + demonstration purposes, to highlight the differences between the + hot standby and load balancing mode of operation. +
+ +
+ Control Agent Configuration + The describes in detail the + Kea daemon which provides a RESTful interface to control Kea servers. + The same functionality is used by the High Availability hook library to + establish communication between the HA peers. Therefore, the HA + library requires that the Control Agent is started for each DHCP + instance within the HA setup. If the Control Agent is not started + the peers will not be able to communicate with the particular DHCP + server (even if the DHCP server itself is online) and may eventually + consider this server to be offline. + + + The following is the example configuration for the Control Agent (CA) running + on the same machine as the primary server. This configuration is + valid for both load balancing and hot standby cases presented in + previous sections. + + +{ +"Control-agent": { + "http-host": "192.168.56.33", + "http-port": 8080, + + "control-sockets": { + "dhcp4": { + "socket-type": "unix", + "socket-name": "/tmp/kea-dhcp4-ctrl.sock" + }, + "dhcp6": { + "socket-type": "unix", + "socket-name": "/tmp/kea-dhcp6-ctrl.sock" + } + } +} +} + + +
+ +
+ Control Commands for High Availability + Even though the HA hook library is designed to automatically + resolve issues with DHCP service interruptions by redirecting the + DHCP traffic to a surviving server and synchronizing the lease + database when required, it may be useful for the administrator to + have control over the server behavior. In particular, it may be + useful to be able to trigger lease database synchronization on demand. + It may also be useful to manually set the HA scopes that are being + served. + + Note that the backup server can sometimes be used to handle + the DHCP traffic if both active servers are down. The backup + servers do not perform the failover function automatically. Hence, in + order to use the backup server to respond to the DHCP queries, + the server administrator must enable this function manually. + + + The following sections describe commands supported by the + HA hook library which are available to the administrator. + + +
+ ha-sync command + The ha-sync command is issued to instruct the + server to synchronize the local lease database with the + selected peer. The database synchronization may be triggered for + both active and backup server types. The ha-sync + command has the following structure (DHCPv4 server case): + +{ + "command": "ha-sync", + "service": [ "dhcp4" ], + "arguments": { + "server-name": "server2", + "max-period": 60 + } +} + + + + + When the server receives this command it first disables the + DHCP service of the server from which it will be fetching leases, + i.e. sends the dhcp-disable command to that server. + The max-period parameter specifies the maximum + duration (in seconds) for which the DHCP service should be disabled. + If the DHCP service is successfully disabled, the synchronizing + server will fetch leases from the remote server by issuing the + lease4-get-all command. When the lease database + synchronization is complete, the synchronizing server sends the + dhcp-enable command to the peer to re-enable its + DHCP service. + +
+ +
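The sequence of commands that the synchronizing server sends to its peer, as described above, can be written down as data. The Python sketch below only builds the three command bodies in order and sends nothing. The function name is hypothetical. Placing `max-period` in the `arguments` of `dhcp-disable` and including a `service` list in each command are assumptions modeled on the structure of the `ha-sync` example.

```python
from typing import Any, Dict, List


def sync_command_sequence(max_period: int = 60) -> List[Dict[str, Any]]:
    """Commands issued to the peer during lease synchronization (DHCPv4 case)."""
    service = ["dhcp4"]
    return [
        # 1. Disable the peer's DHCP service for at most max_period seconds.
        {
            "command": "dhcp-disable",
            "service": service,
            "arguments": {"max-period": max_period},
        },
        # 2. Fetch the leases from the peer.
        {"command": "lease4-get-all", "service": service},
        # 3. Re-enable the peer's DHCP service once synchronization is complete.
        {"command": "dhcp-enable", "service": service},
    ]
```

The `max-period` value acts as a safety net: if the synchronizing server fails before step 3, the peer's DHCP service is re-enabled automatically after that period.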
+ ha-scopes command + This command allows for modifying the HA scopes that the + server is serving. Consult + and to learn what scopes + are available for different HA modes of operation. The + ha-scopes command has the following structure + (DHCPv4 server case): + +{ + "command": "ha-scopes", + "service": [ "dhcp4" ], + "arguments": { + "scopes": [ "ha_server1", "ha_server2" ] + } +} + + + + This command configures the server to handle traffic from + both ha_server1 and ha_server2 + scopes. In order to disable all scopes specify an empty list: + + +{ + "command": "ha-scopes", + "service": [ "dhcp4" ], + "arguments": { + "scopes": [ ] + } +} + + +
+ +
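As an illustration of how an administrator might submit the ha-scopes command to the Control Agent, the following Python sketch prepares an HTTP POST request using only the standard library. The function name `ha_scopes_request` is hypothetical, and the URL in the usage note is the primary server's Control Agent address from the examples in this document. Sending JSON with a `Content-Type: application/json` header is an assumption about the RESTful interface.

```python
import json
import urllib.request
from typing import Iterable


def ha_scopes_request(agent_url: str, scopes: Iterable[str],
                      service: str = "dhcp4") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the ha-scopes command."""
    command = {
        "command": "ha-scopes",
        "service": [service],
        "arguments": {"scopes": list(scopes)},
    }
    return urllib.request.Request(
        agent_url,
        data=json.dumps(command).encode("utf-8"),  # presence of data makes it a POST
        headers={"Content-Type": "application/json"},
    )
```

To enable both scopes on the primary server, one could build `ha_scopes_request("http://192.168.56.33:8080/", ["ha_server1", "ha_server2"])` and pass the result to `urllib.request.urlopen`. Passing an empty list disables all scopes. Actually sending the request requires a running Control Agent at that address.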
+ +
diff --git a/doc/guide/hooks.xml b/doc/guide/hooks.xml index cd60b4c24c..71f1512b04 100644 --- a/doc/guide/hooks.xml +++ b/doc/guide/hooks.xml @@ -2851,913 +2851,8 @@ both the command and the response. -
- libdhcp_ha: High Availability - - High Availability (HA) of the DHCP service is provided by running multiple - cooperating server instances. If any of these instances becomes - unavailable for whatever reason (DHCP software crash, control agent - software crash, power failure, hardware - failure), a surviving - server instance can continue providing the reliable service to the clients. Many - DHCP servers implementations include "DHCP Failover" protocol, which most - significant features are: communication between the servers, partner - failure detection and leases synchronization between the servers. - However, the DHCPv4 failover standardization process was never completed - at IETF. The DHCPv6 failover standard (RFC 8156) was published, but it - is complex, difficult to use, has significant operational constraints - and is different than its v4 counterpart. - Although it may be useful for some users to use a "standard" failover - protocol, it seems that most of the Kea users are simply interested in - a working solution which guarantees high availability of the DHCP - service. Therefore, Kea HA hook library derives major concepts from the - DHCP Failover protocol but uses its own solutions for communication, - configuration and its own state machine, which greatly simplifies its - implementation and generally better fits into Kea. Also, it provides the - same features in both DHCPv4 and DHCPv6. This document purposely - uses the term "High Availability" rather than "Failover" to emphasize that - it is not the Failover protocol implementation. - - - The following sections describe the configuration and operation of the Kea - HA hook library. - - -
- Supported Configurations - The Kea HA hook library supports two configurations also known as HA - modes: load balancing and hot standby. In the load balancing mode, there - are two servers responding to the DHCP requests. The load balancing function - is implemented as described in RFC3074, with each server responding to - 1/2 of received DHCP queries. When one of the servers allocates a lease - for a client, it notifies the partner server over the control channel - (RESTful API), so as the partner can save the lease information in its - own database. If the communication with the partner is unsuccessful, - the DHCP query is dropped and the response is not returned to the DHCP - client. If the lease update is successful, the response is returned to - the DHCP client by the server which has allocated the lease. By - exchanging the lease updates, both servers get a copy of all leases - allocated by the entire HA setup and any of the servers can be switched - to handle the entire DHCP traffic if its partner becomes unavailable. - - In the load balancing configuration, one of the servers must be - designated as "primary" and the other server is designated as "secondary". - Functionally, there is no difference between the two during the normal - operation. This distniction is required when the two servers are - started at (nearly) the same time and have to synchronize their - lease databases. The primary server synchronizes the database first. - The secondary server waits for the primary server to complete the - lease database synchronization before it starts the synchronization. - - - In the hot standby configuration one of the servers is designated as - "primary" and the second server is designated as "secondary". During the - normal operation, the primary server is the only one that responds to - the DHCP requests. The secodary server receives lease updates from the - primary over the control channel. 
However, it does not respond to any - DHCP queries as long as the primary is running or, more accurately, - until the secondary considers the primary to be offline. When the - secondary server detects the failure of the primary, it starts - responding to all DHCP queries. - - - In the configurations described above, the primary, secondary and - standby are referred to as "active" servers, because they receive - lease updates and can automatically react to the partner's failures by - responding to the DHCP queries which would normally be handled by the - partner. The HA hook library supports another server type (role) - - backup server. The use of the backup servers is optional. They can be used - in both load balancing and hot standby setup, in addition to the active - servers. There is no limit on the number of backup servers in the HA - setup. However, the presence of the backup servers increases latency - of the DHCP responses, because not only do active servers send lease - updates to each other, but also to the backup servers. - -
- -
- Server States - The DHCP server operating within an HA setup runs a state machine - and the state of the server can be retrieved by its peers using the - ha-heartbeat command sent over the RESTful API. If - the partner server doesn't respond to the ha-heartbeat - command longer than configured amount of time, the communication is - considered interrupted and the server may (depending on the configuration) - use additional measures to verify if the partner is still operating. - If it finds that the partner is not operating, the server transitions - to the partner-down state to handle the entire - DHCP traffic directed to the system. - - In this case, the surviving server continues to send the - ha-heartbeat command to detect when the partner wakes - up. The partner synchronizes the lease database and when it is finally - ready to operate, the surviving server returns to the normal operation, - i.e. load-balancing or hot-standby - state. - - The following is the list of all possible states into which the - servers may transition: - - - backup - normal operation of the - backup server. In this state it receives lease updates from the active - servers. - - hot-standby - normal operation of - the active server running in the hot standby mode. Both primary and - standby server are in this state during their normal operation. - The primary server is responding to the DHCP queries and sends lease updates - to the standby server and to the backup servers, if any backup servers - are present. - - load-balancing - normal operation - of the active server running in the load balancing mode. Both primary - and secondary server are in this state during their normal operation. - Both servers are responding to the DHCP queries and send lease updates - to each other and to the backup servers, if any backup servers are - present. - - partner-down - an active server - transitions to this state after detecting that its partner (another - active server) is offline. 
The server doesn't transition to this state - if any of the backup servers is unavailable. In the - partner-down state the server responds to all DHCP queries, - so also those queries which are normally handled by the active server - which is now unavailable. - - ready - an active server transitions - to this state after synchronizing its lease database with an active - partner. This state is to indicate to the partner (likely being in the - partner-down state that it may return to the - normal operation. When it does, the server being in the - ready state will also start normal operation. - - - syncing - an active server - transitions to this state to fetch leases from the active partner - and update the local lease database. When it this state, it - issues the dhcp-disable to disable the DHCP - service of the partner from which the leases are fetched. The DHCP - servie is disabled for the maximum time of 60 seconds, after which - it is automatically enabled, in case the syncing partner has died - again failing to re-enable the service. If the synchronization is - completed the syncing server issues the dhcp-enable - to re-enable the DHCP service of the partner. The - syncing operation is synchronous. The server is waiting for an - answer from the partner and is not doing anything else while the - leases synchronization takes place. - - waiting - each started server - instance enters this state. The backup server will transition - directly from this state to the backup state. - An active server will send heartbeat to its partner to check its - state. If the partner appears to be unavailable the server will - transition to the partner-down, otherwise it - will transition to the syncing state and attempt - to synchronize the lease database. If both servers appear to be - in this state (concurrent startup) the primary server will - synchronize first. The secondary or standby server will remain - in the waiting state until the primary - synchronizes the database.. 
- - - Whether the server responds to the DHCP queries and which - queries it responds to is a matter of the server's state, if no - administrative action is performed to configure the server - otherwise. The following table provides the default behavior for - various states. - - - - Default behavior of the server in various HA states - - - - - - - - State - Server Type - DHCP Service - DHCP Service Scopes - - - - - backup - backup server - disabled - none - - - hot-standby - primary or standby (hot standby mode) - enabled - ha_server1 if primary, none otherwise - - - load-balancing - primary or secondary (load balancing mode) - enabled - ha_server1 or ha_server2 - - - partner-down - active server - enabled - all scopes - - - ready - active server - disabled - none - - - syncing - active server - disabled - none - - - waiting - any server - disabled - none - - - -
-
- -
- - The DHCP service scopes require some explanation. The HA - configuration must specify a unique name for each server within - the HA setup. This document uses the following convention within - provided examples: server1 for a primary server, - server2 for the secondary or standby server and - server3 for the backup server. In the real life - any names can be used as long as they remain unique. - - In the load balancing mode there are two scopes named after - the active servers: ha_server1 and - ha_server2. The DHCP queries load balanced to the - server1 belong to the ha_server1 - scope and the queries load balanced to the server2 - belong to the ha_server2 scope. If any of the - servers is in the partner-down state, it is - responsible for serving both scopes. - - In the hot standby mode, there is only one scope - ha_server1 because only the server1 - is responding to the DHCP queries. If that server becomes unavailable, - the server2 becomes responsible for this scope. - - - The backup servers do not have their own scopes. In some - cases they can be used to respond to the queries belonging to - the scopes of the active servers. Also, a server which is neither - in the partner-down state nor in the normal operation serves - no scopes. - - The scope names can be used to associate pools, subnets - and networks with certain servers, so as only these servers - can allocate addresses or prefixes from those pools, subnets - or network. This is done via the client classification mechanism - (see below). -
- -
- Load Balancing Configuration - The following is the configuration snippet which enables - high availability on the primary server within the load balancing - configuration. The same configuration should be applied on the - secondary and the backup server, with the only difference that - the this-server-name should be set to - server2 and server3 - on those servers respectively. - -{ -"Dhcp4": { - - ... - - "hooks-libraries": [ - { - "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", - "parameters": { } - }, - { - "library": "/usr/lib/hooks/libdhcp_ha.so", - "parameters": { - "high-availability": [ { - "this-server-name": "server1", - "mode": "load-balancing", - "heartbeat-delay": 10, - "max-response-delay": 10, - "max-ack-delay": 5, - "max-unacked-clients": 5, - "peers": [ - { - "name": "server1", - "url": "http://192.168.56.33:8080/", - "role": "primary", - "auto-failover": true - }, - { - "name": "server2", - "url": "http://192.168.56.66:8080/", - "role": "secondary", - "auto-failover": true - }, - { - "name": "server3", - "url": "http://192.168.56.99:8080/", - "role": "backup", - "auto-failover": false - } - ] - } ] - } - } - ], - - "subnet4": [ - { - "subnet": "192.0.3.0/24", - "pools": [ - { - "pool": "192.0.3.100 - 192.0.3.150", - "client-class": "ha_server1" - }, - { - "pool": "192.0.3.200 - 192.0.3.250", - "client-class": "ha_server2" - } - ], - - "option-data": [ - { - "name": "routers", - "data": "192.0.3.1" - } - ], - - "relay": { "ip-address": "10.1.2.3" } - } - ], - - ... - -} - -} - - - - Two hook libraries must be loaded to enable HA: - libdhcp_lease_cmds.so and - libdhcp_ha.so. The latter provides the - implemenation of the HA feature. The former enables control - commands required by HA to fetch and manipulate leases on the - remote servers. In the example provided above, it is assumed that - Kea libraries are installed in the /usr/lib - directory. 
If Kea is not installed in the /usr directory, the - hook libraries locations must be updated accordingly. - - - The HA configuration is specified within the scope of the - libdhcp_ha.so. Note that the top level - parameter high-availability is a list, even - though it currently contains only one entry. In the future this - configuration is likely to be extended to contain more entries, - if the particular server can participate in more than one - HA relationships. - - The following are the global parameters which control the server's - behavior with respect to HA: - - this-server-name - is a unique - identifier of the server within this HA setup. It must match with one - of the servers specified within peers list. - - - mode - specifies a HA mode - of operation. Currently supported modes are load-balancing - and hot-standby. - - heartbeat-delay - specifies - a duration in seconds between the last heartbeat (or other command sent - to the partner) and sending the next heartbeat. The heartbeats are sent - periodically to gather the status of the partner and to verify whether - the partner is still operating. The default value of this parameter is - 10. - - max-response-delay - specifies a - duration in seconds since the last successful communication with the - partner, after which the server assumes that the communication with - the partner is interrupted. This duration should be greater than - the heartbeat-delay. Usually it is a greater than - the duration of multiple heartbeat-delay values. - When the server detects that the communication is interrupted, it - may transition to the partner-down state (when - max-unacked-clients is 0) or trigger failure - detection procedure using the values of the two parameters below. - The default value of this parameter is 60. - - - max-ack-delay - is one of - the parameters controlling partner failure detection. 
When the - communication with the partner is interrupted, the server examines values - of the secs field (DHCPv4) or Elapsed Time - option (DHCPv6) which denote how long the DHCP client has been - trying to communicate with the DHCP server. This parameter specifies the - maximum time for the client to try to communicate with the DHCP server, - after which this server assumes that the client failed to communicate - with the DHCP server (is "unacked"). The default value of this parameter - is 10. - - max-unacked-clients - specifies - how many "unacked" clients are allowed (see max-ack-delay) - before this server assumes that the partner is offline and transitions - to the partner-down state. The special value of 0 - is allowed for this parameter which disables failure detection - mechanism. In this case, the server which can't communicate with the - partner over the control channel assumes that the partner server is - down and transitions to the partner-down state - immediately. The default value of this parameter is 10. - - - - - - - The values of max-ack-delay and - max-unacked-clients must be selected carefully, taking - into account specifics of the network in which DHCP servers are - operating. Note that the server in question may not respond to some - of the DHCP clients because these clients are not to be serviced - by this server (per administrative policy). The server may also - drop malformed queries from the clients. Therefore, selecting too - low value for the max-unacked-clients may - result in transitioning to the partner-down - state even though the partner is still operating. On the other - hand, selecting too high value may result in never transitioning - to the partner-down state if the DHCP - traffic in the network is very low (e.g. night time), because the - number of distinct clients trying to communicate with the server - could be lower than max-unacked-clients. 
- - - In some cases it may be useful to disable the failure detection - mechanism altogether, if the servers are located very close to each - other and the network partitioning is unlikely, i.e. failure to - respond to heartbeats is only possible when the partner is offline. - In such cases, set the max-unacked-clients to 0. - - - The peers parameter contains a list of servers - within this HA setup. In this configuration it must contain at least - one primary and one secondary server. It may also contain unlimited - number of backup servers. In this example there is one backup server - which receives lease updates from the active servers. - - There are the following parameters specified for each of the - peers within this list: - - - name - specifies unique name for - the server. - - url - specifies URL to be used to - contact this server over the control channel. Other servers use this - URL to send control commands to that server. - - role - denotes the role of the - server in the HA setup. The following roles are supported in the - load balancing configuration: primary, - secondary and backup. - There must be exactly one primary and one secondary server in the - load balancing setup. - - auto-failover - a boolean value - which denotes whether the server detecting a partner's failure should - automatically start serving its clients. - - - - - In our example configuration, both active servers can allocate - leases from the subnet "192.0.3.0/24". This subnet contains two - address pools: "192.0.3.100 - 192.0.3.150" and "192.0.3.200 - 192.0.3.250", - which are associated with HA servers scopes using client classification. - When the server1 processes a DHCP query it will use - the first pool for the lease allocation. Conversely, when the - server2 is processing the DHCP query it will use the - second pool. 
When any of the servers is in the partner-down - state, it can serve leases from both pools and it will - select the pool which is appropriate for the received query. In - other words, if the query would normally be processed by the - server2, but this server is not available, the - server1 will allocate the lease from the pool of - "192.0.3.200 - 192.0.3.250". - - -
- -
- Load Balancing with Advanced Classification - In the previous section we have provided an example which demonstrated - the load balancing configuration with the client classification limited - to the use of ha_server1 and ha_server2 - classes, which are dynamically assigned to the received DHCP queries. - In many cases it will be required to use HA in deployments which already - use some client classification. - - - Suppose there is a system which classifies devices into two groups: - phones and laptops, based on some classification criteria specified in - Kea configuration file. Both types of devices are allocated leases - from different address pools. Introducing HA in the load balancing mode - is expected to result in further split of each of those pools, so as - each of the servers can allocate leases for some part of the phones - and part of the laptops. This requires that each of the existing pools - should be split between the ha_server1 and - ha_server2, so we end up with the following classes: - - - phones_server1 - laptops_server1 - phones_server2 - laptops_server2 - - - - The corresponding server configuration using advanced classification - (and member expression) is provided below. For brevity - the HA hook library configuration has been removed from this example. - -{ -"Dhcp4": { - - "client-classes": [ - { - // No test expression for this class. Incoming packets will be - // assigned to that class dynamically by the HA Hook library. - "name": "ha_server1" - }, - { - // No test expression for this class. Incoming packets will be - // assigned to that class dynamically by the HA Hook library. 
- "name": "ha_server2" - }, - { - "name": "phones", - "test": "substring(option[60].hex,0,6) == 'Aastra'", - }, - { - "name": "laptops", - "test": "not member('phones')" - }, - { - "name": "phones_server1", - "test": "member('phones') and member('ha_server1')" - }, - { - "name": "phones_server2", - "test": "member('phones') and member('ha_server2')" - }, - { - "name": "laptops_server1", - "test": "member('laptops') and member('ha_server1')" - }, - { - "name": "laptops_server2", - "test": "member('laptops') and member('ha_server2')" - } - ], - - "hooks-libraries": [ - { - "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", - "parameters": { } - }, - { - "library": "/usr/lib/hooks/libdhcp_ha.so", - "parameters": { - "high-availability": [ { - - ... - - } ] - } - } - ], - - "subnet4": [ - { - "subnet": "192.0.3.0/24", - "pools": [ - { - "pool": "192.0.3.100 - 192.0.3.125", - "client-class": "phones_server1" - }, - { - "pool": "192.0.3.126 - 192.0.3.150", - "client-class": "laptops_server1" - }, - { - "pool": "192.0.3.200 - 192.0.3.225", - "client-class": "phones_server2" - }, - { - "pool": "192.0.3.226 - 192.0.3.250", - "client-class": "laptops_server2" - } - ], - - "option-data": [ - { - "name": "routers", - "data": "192.0.3.1" - } - ], - - "relay": { "ip-address": "10.1.2.3" } - } - ], - - ... - -} - -} - - - - The configuration provided above splits the address range into - four pools. Two pools are dedicated to server1 and two are dedicated for - server2. Each server can assign leases to both phones and laptops. - Both groups of devices are assigned addresses from different pools. - Note that definition of classes ha_server1 and - ha_server2 is required because other classes - refer to them via member expression. These classes - do not include test parameter because they are - not evaluated with other classes. They are assigned dynamically - by the HA hook library as a result of load balancing algorithm. 
- The phones_* and laptop_* - evaluate to "true" when the query belongs to a given combination - of other classes, e.g. ha_server1 and - phones. The pool will be selected accordingly - as a result of such evaluation. - - - Consult for details on how to use - member expression and about class dependencies. - -
- -
- Hot Standby Configuration - The following is the example configuration of the primary server - in the hot standby configuration: - -{ -"Dhcp4": { - - ... - - "hooks-libraries": [ - { - "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", - "parameters": { } - }, - { - "library": "/usr/lib/hooks/libdhcp_ha.so", - "parameters": { - "high-availability": [ { - "this-server-name": "server1", - "mode": "hot-standby", - "heartbeat-delay": 10, - "max-response-delay": 10, - "max-ack-delay": 5, - "max-unacked-clients": 5, - "peers": [ - { - "name": "server1", - "url": "http://192.168.56.33:8080/", - "role": "primary", - "auto-failover": true - }, - { - "name": "server2", - "url": "http://192.168.56.66:8080/", - "role": "standby", - "auto-failover": true - }, - { - "name": "server3", - "url": "http://192.168.56.99:8080/", - "role": "backup", - "auto-failover": false - } - ] - } ] - } - } - ], - - "subnet4": [ - { - "subnet": "192.0.3.0/24", - "pools": [ - { - "pool": "192.0.3.100 - 192.0.3.250", - "client-class": "ha_server1" - } - ], - - "option-data": [ - { - "name": "routers", - "data": "192.0.3.1" - } - ], - - "relay": { "ip-address": "10.1.2.3" } - } - ], - - ... - -} - -} - - - - This configuration is very similar to the load balancing - configuration described , - with a few notable differences. - - The mode is now set to hot-standby, - in which only one server is responding to the DHCP clients. - If the primary server is online, the primary server is responding to - all DHCP queries. The standby server takes over the - entire DHCP traffic when it discovers that the primary is unavailable. - - - In this mode, the non-primary active server is called - standby and that's what the role of the second - active server is set to. - - Finally, because there is always one server responding to the - DHCP queries, there is only one scope ha_server1 - in use within pools definitions. 
In fact, the client-class - parameter could be removed from this configuration without harm, - because there are no conflicts in lease allocations by different - servers as they do not allocate leases concurrently. The - client-class is left in this example mostly for - demonstration purposes, to highlight the differences between the - hot standby and load balancing mode of operation. -
- -
- Control Agent Configuration - The describes in detail the - Kea deamon which provides RESTful interface to control Kea servers. - The same functionality is used by High Availability hook library to - establish communication between the HA peers. Therefore, the HA - library requires that Control Agent is started for each DHCP - instance within HA setup. If the Control Agent is not started - the peers will not be able to communicate with the particular DHCP - server (even if the DHCP server itself is online) and may eventually - consider this server to be offline. - - - The following is the example configuration for the CA running - on the same machine as the primary server. This configuration is - valid for both load balancing and hot standby cases presented in - previous sections. - - -{ -"Control-agent": { - "http-host": "192.168.56.33", - "http-port": 8080, - - "control-sockets": { - "dhcp4": { - "socket-type": "unix", - "socket-name": "/tmp/kea-dhcp4-ctrl.sock" - }, - "dhcp6": { - "socket-type": "unix", - "socket-name": "/tmp/kea-dhcp6-ctrl.sock" - } - } -} -} - - -
- -
- Control Commands for High Availability - Even though the HA hook library is designed to automatically - resolve issues with DHCP service interruptions by redirecting the - DHCP traffic to a surviving server and synchronizing the lease - database when required, it may be useful for the administrator to - have control over the server behavior. In particular, it may be - useful be able to trigger lease database synchronization on demand. - It may also be useful to manually set the HA scopes that are being - served. - - Note that the backup server can sometimes be used to handle - the DHCP traffic in case if both active servers are down. The backup - servers do not perform failover function automatically. Hence, in - order to use the backup server to respond to the DHCP queries, - the server administrator must enable this function manually. - - - The following sections describe commands supported by the - HA hook library which are available for the administrator. - - -
- ha-sync command - The ha-sync is issued to instruct the - server to synchronize the local lease database with the - selected peer. The database synchronization may be triggered for - both active and backup server type. The ha-sync - has the following structure (DHCPv4 server case): - -{ - "command": "ha-sync", - "service": [ "dhcp4 "], - "arguments": { - "server-name": "server2", - "max-period": 60 - } -} - - - - - When the server receives this command it first disables the - DHCP service of the server from which it will be fetching leases, - i.e. sends dhcp-disable command to that server. - The max-period parameter specifies the maximum - duration (in seconds) for which the DHCP service should be disabled. - If the DHCP service is successfully disabled, the synchronizing - server will fetch leases from the remote server by issuing the - lease4-get-all command. When the lease database - synchronization is complete, the synchronizing server sends the - dhcp-enable to the peer to re-enable its - DHCP service. - -
- -
- ha-scopes command - This command allows for modifying the HA scopes that the - server is serving. Consult - and to learn what scopes - are available for different HA modes of operation. The - ha-scopes command has the following structure - (DHCPv4 server case): - -{ - "command": "ha-scopes", - "service": [ "dhcp4 "], - "arguments": { - "scopes": [ "ha_server1", "ha_server2" ] - } -} - - - - This command configures the server to handle traffic from - both ha_server1 and ha_server2 - scopes. In order to disable all scopes specify an empty list: - - -{ - "command": "ha-scopes", - "service": [ "dhcp4 "], - "arguments": { - "scopes": [ ] - } -} - - -
- -
- -
+ +