From: Marcin Siodelski Date: Mon, 9 Apr 2018 09:53:40 +0000 (+0200) Subject: [5478] Load balancing configuration described. X-Git-Tag: trac5549a_base~34^2~15 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=25ced3639b000355b5bc61c3bee1befca01f4215;p=thirdparty%2Fkea.git [5478] Load balancing configuration described. --- diff --git a/doc/guide/hooks.xml b/doc/guide/hooks.xml index bb7768dfbf..340811910c 100644 --- a/doc/guide/hooks.xml +++ b/doc/guide/hooks.xml @@ -2906,11 +2906,427 @@ both the command and the response. Server States The DHCP server operating within an HA setup runs a state machine and the state of the server can be retrieved by its peers using the - 'ha-heartbeat' command sent over the RESTful API. If the partner server - doesn't respond to the 'ha-heartbeat' command longer than configured - amount of time, the communication is considered interrupted and the - server may (depending on the configuration) use additional measures to - verify if the partner is still operating. + ha-heartbeat command sent over the RESTful API. If + the partner server doesn't respond to the ha-heartbeat + command for longer than the configured amount of time, the communication is + considered interrupted and the server may (depending on the configuration) + use additional measures to verify if the partner is still operating. + If it finds that the partner is not operating, the server transitions + to the partner-down state to handle all + DHCP traffic directed to the system. + + In this case, the surviving server continues to send the + ha-heartbeat command to detect when the partner wakes + up. The partner synchronizes the lease database and when it is finally + ready to operate, the surviving server returns to normal operation, + i.e. the load-balancing or hot-standby + state. + + The following is the list of all possible states into which the + servers may transition: + + + backup - normal operation of the + backup server.
In this state it receives lease updates from the active + servers. + + hot-standby - normal operation of + the active server running in the hot standby mode. Both the primary and + standby servers are in this state during their normal operation. + The primary server responds to DHCP queries and sends lease updates + to the standby server and to the backup servers, if any backup servers + are present. + + load-balancing - normal operation + of the active server running in the load balancing mode. Both the primary + and secondary servers are in this state during their normal operation. + Both servers respond to DHCP queries and send lease updates + to each other and to the backup servers, if any backup servers are + present. + + partner-down - an active server + transitions to this state after detecting that its partner (another + active server) is offline. The server doesn't transition to this state + if any of the backup servers is unavailable. In the + partner-down state the server responds to all DHCP queries, + including those queries which are normally handled by the active server + which is now unavailable. + + ready - an active server transitions + to this state after synchronizing its lease database with an active + partner. This state indicates to the partner (likely in the + partner-down state) that it may return to + normal operation. When it does, the server in the + ready state will also start normal operation. + + + syncing - an active server + transitions to this state to fetch leases from the active partner + and update the local lease database. When in this state, it + issues the dhcp-disable command to disable the DHCP + service of the partner from which the leases are fetched. The DHCP + service is disabled for a maximum of 60 seconds, after which + it is automatically re-enabled, in case the syncing partner has died + again and failed to re-enable the service.
If the synchronization + completes, the syncing server issues the dhcp-enable + command to re-enable the DHCP service of the partner. The + syncing operation is synchronous. The server waits for an + answer from the partner and does nothing else while the + lease synchronization takes place. + + waiting - each started server + instance enters this state. The backup server will transition + directly from this state to the backup state. + An active server will send a heartbeat to its partner to check its + state. If the partner appears to be unavailable the server will + transition to the partner-down state; otherwise it + will transition to the syncing state and attempt + to synchronize the lease database. If both servers appear to be + in this state (concurrent startup) the primary server will + synchronize first. The secondary or standby server will remain + in the waiting state until the primary + synchronizes the database. + + + Whether the server responds to the DHCP queries and which + queries it responds to is a matter of the server's state, if no + administrative action is performed to configure the server + otherwise. The following table provides the default behavior for + various states. + + + + Default behavior of the server in various HA states + + + + + + + + State + Server Type + DHCP Service + DHCP Service Scopes + + + + + backup + backup server + disabled + none + + + hot-standby + primary or standby (hot standby mode) + enabled + ha_server1 if primary, none otherwise + + + load-balancing + primary or secondary (load balancing mode) + enabled + ha_server1 or ha_server2 + + + partner-down + active server + enabled + all scopes + + + ready + active server + disabled + none + + + syncing + active server + disabled + none + + + waiting + any server + disabled + none + + + +
+
+ +
+ + The DHCP service scopes require some explanation. The HA + configuration must specify a unique name for each server within + the HA setup. This document uses the following convention within + provided examples: server1 for a primary server, + server2 for the secondary or standby server and + server3 for the backup server. In real life, + any names can be used as long as they remain unique. + + In the load balancing mode there are two scopes named after + the active servers: ha_server1 and + ha_server2. The DHCP queries load balanced to the + server1 belong to the ha_server1 + scope and the queries load balanced to the server2 + belong to the ha_server2 scope. If any of the + servers is in the partner-down state, it is + responsible for serving both scopes. + + In the hot standby mode, there is only one scope + ha_server1 because only the server1 + is responding to the DHCP queries. If that server crashes, the + server2 becomes responsible for this scope. + + + The backup servers do not have their own scopes. In some + cases they can be used to respond to the queries belonging to + the scopes of the active servers. Also, a server which is neither + in the partner-down state nor in the normal operation serves + no scopes. + + The scope names can be used to associate pools, subnets + and networks with certain servers, so that only these servers + can allocate addresses or prefixes from those pools, subnets + or networks. This is done via the client classification mechanism + (see below). + + +
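The default scope assignment described in the table and the paragraphs above can be sketched in a few lines of Python. This is only an illustration, not code from Kea: the function name `default_scopes`, the `mode` argument, and the role-to-scope mapping (which relies on the document's convention that `server1` is the primary) are assumptions made for this example.

```python
from typing import List

# Scopes that exist in each mode, following the naming convention used in
# this document (server1 = primary, server2 = secondary or standby).
ALL_SCOPES = {
    "load-balancing": ["ha_server1", "ha_server2"],
    "hot-standby": ["ha_server1"],
}


def default_scopes(state: str, role: str, mode: str = "load-balancing") -> List[str]:
    """Return the scopes a server serves by default in the given HA state.

    Hypothetical helper: it only mirrors the table of default behaviors.
    """
    if state == "partner-down":
        # The surviving active server takes over every scope of the setup.
        return list(ALL_SCOPES[mode])
    if state == "load-balancing":
        return {"primary": ["ha_server1"], "secondary": ["ha_server2"]}.get(role, [])
    if state == "hot-standby":
        return ["ha_server1"] if role == "primary" else []
    # backup, ready, syncing and waiting: DHCP service disabled, no scopes.
    return []
```

For instance, `default_scopes("partner-down", "primary")` yields both scopes, while a server in the `waiting` state serves none.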
+ Load Balancing Configuration + The following is the configuration snippet which enables + high availability on the primary server within the load balancing + configuration. The same configuration should be applied on the + secondary and the backup server, with the only difference that + the this-server-name should be set to + server2 and server3 + on those servers, respectively. + +{ +"Dhcp4": { + + ... + + "hooks-libraries": [ + { + "library": "/usr/lib/hooks/libdhcp_lease_cmds.so", + "parameters": { } + }, + { + "library": "/usr/lib/hooks/libdhcp_ha.so", + "parameters": { + "high-availability": [ { + "this-server-name": "server1", + "mode": "load-balancing", + "heartbeat-delay": 10, + "max-response-delay": 10, + "max-ack-delay": 5, + "max-unacked-clients": 5, + "peers": [ + { + "name": "server1", + "url": "http://192.168.56.33:8080/", + "role": "primary", + "auto-failover": true + }, + { + "name": "server2", + "url": "http://192.168.56.66:8080/", + "role": "secondary", + "auto-failover": true + }, + { + "name": "server3", + "url": "http://192.168.56.99:8080/", + "role": "backup", + "auto-failover": false + } + ] + } ] + } + } + ], + + "subnet4": [ + { + "subnet": "192.0.3.0/24", + "pools": [ + { + "pool": "192.0.3.100 - 192.0.3.150", + "client-class": "ha_server1" + }, + { + "pool": "192.0.3.200 - 192.0.3.250", + "client-class": "ha_server2" + } + ], + + "option-data": [ + { + "name": "routers", + "data": "192.0.3.1" + } + ], + + "relay": { "ip-address": "10.1.2.3" } + } + ], + + ... + +} + +} + + + Two hook libraries must be loaded to enable HA: + libdhcp_lease_cmds.so and + libdhcp_ha.so. The former enables control + commands required by HA to fetch and manipulate leases on the + remote servers. The latter provides the + implementation of the HA feature. In the example provided above, it is assumed that + Kea libraries are installed in the /usr/lib + directory.
If Kea is not installed in the /usr directory, the + hook library locations must be updated accordingly. + + + The HA configuration is specified within the scope of the + libdhcp_ha.so. Note that the top level + parameter high-availability is a list, even + though it currently contains only one entry. In the future this + configuration is likely to be extended to contain more entries, + if the particular server can participate in more than one + HA relationship. + + The following are the global parameters which control the server's + behavior with respect to HA: + + this-server-name - is a unique + identifier of the server within this HA setup. It must match one + of the servers specified within the peers list. + + + mode - specifies an HA mode + of operation. Currently supported modes are load-balancing + and hot-standby. + + heartbeat-delay - specifies + a duration in seconds between the last heartbeat (or other command sent + to the partner) and sending the next heartbeat. The heartbeats are sent + periodically to gather the status of the partner and to verify whether + the partner is still operating. + + max-response-delay - specifies a + duration in seconds since the last successful communication with the + partner, after which the server assumes that the communication with + the partner is interrupted. This duration should be greater than + the heartbeat-delay. Usually it is greater than + the duration of multiple heartbeat-delay values. + When the server detects that the communication is interrupted, it + may transition to the partner-down state (when + max-unacked-clients is 0) or trigger the failure + detection procedure using the values of the two parameters below. + + + max-ack-delay - is one of + the parameters controlling partner failure detection.
When the + communication with the partner is interrupted, the server examines values + of the secs field (DHCPv4) or Elapsed Time + option (DHCPv6) which denote how long the DHCP client has been + trying to communicate with the DHCP server. This parameter specifies the + maximum time for the client to try to communicate with the DHCP server, + after which this server assumes that the client failed to communicate + with the DHCP server (is "unacked"). + + max-unacked-clients - specifies + how many "unacked" clients are allowed (see max-ack-delay) + before this server assumes that the partner is offline and transitions + to the partner-down state. The special value of 0 + is allowed for this parameter which disables the failure detection + mechanism. In this case, the server which can't communicate with the + partner over the control channel assumes that the partner server is + down and transitions to the partner-down state + immediately. + + + + + + The values of max-ack-delay and + max-unacked-clients must be selected carefully, taking + into account specifics of the network in which DHCP servers are + operating. Note that the server in question may not respond to some + of the DHCP clients because these clients are not to be serviced + by this server (per administrative policy). The server may also + drop malformed queries from the clients. Therefore, selecting too + low a value for max-unacked-clients may + result in transitioning to the partner-down + state even though the partner is still operating. On the other + hand, selecting too high a value may result in never transitioning + to the partner-down state if the DHCP + traffic in the network is very low (e.g. night time), because the + number of distinct clients trying to communicate with the server + could be lower than max-unacked-clients.
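The interplay of max-ack-delay and max-unacked-clients described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the hook library's implementation: the function name `partner_considered_down` is hypothetical, the per-client values are assumed to be in the same unit as max-ack-delay, and the strict "greater than" comparisons are one reading of "how many unacked clients are allowed".

```python
from typing import Dict


def partner_considered_down(
    communication_interrupted: bool,
    client_secs: Dict[str, int],
    max_ack_delay: int,
    max_unacked_clients: int,
) -> bool:
    """Decide whether to transition to the partner-down state.

    client_secs maps a client identifier to the latest 'secs' field
    (DHCPv4) or Elapsed Time value (DHCPv6) seen from that client.
    """
    if not communication_interrupted:
        # Failure detection only starts once max-response-delay has elapsed
        # without successful communication with the partner.
        return False
    if max_unacked_clients == 0:
        # Failure detection disabled: assume the partner is down immediately.
        return True
    # A client counts as "unacked" once it has been trying for longer than
    # max-ack-delay (assumed comparison).
    unacked = sum(1 for secs in client_secs.values() if secs > max_ack_delay)
    return unacked > max_unacked_clients
```

With the example values max-ack-delay of 5 and max-unacked-clients of 5, six distinct clients reporting more than 5 would trigger the transition in this model, while three would not. This also shows why a quiet network may never reach the threshold.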
+ + + In some cases it may be useful to disable the failure detection + mechanism altogether, if the servers are located very close to each + other and network partitioning is unlikely, i.e. failure to + respond to heartbeats is only possible when the partner is offline. + In such cases, set the max-unacked-clients to 0. + + + The peers parameter contains a list of servers + within this HA setup. In this configuration it must contain at least + one primary and one secondary server. It may also contain an unlimited + number of backup servers. In this example there is one backup server + which receives lease updates from the active servers. + + There are the following parameters specified for each of the + peers within this list: + + + name - specifies a unique name for + the server. + + url - specifies the URL to be used to + contact this server over the control channel. Other servers use this + URL to send control commands to that server. + + role - denotes the role of the + server in the HA setup. The following roles are supported in the + load balancing configuration: primary, + secondary and backup. + There must be exactly one primary and one secondary server in the + load balancing setup. + + auto-failover - a boolean value + which denotes whether the server detecting a partner's failure should + automatically start serving its clients. + + + + + In our example configuration, both active servers can allocate + leases from the subnet "192.0.3.0/24". This subnet contains two + address pools: "192.0.3.100 - 192.0.3.150" and "192.0.3.200 - 192.0.3.250", + which are associated with HA server scopes using client classification. + When the server1 processes a DHCP query it will use + the first pool for the lease allocation. Conversely, when the + server2 is processing the DHCP query it will use the + second pool.
When any of the servers is in the partner-down + state, it can serve leases from both pools and it will + select the pool which is appropriate for the received query. In + other words, if the query would normally be processed by the + server2, but this server has crashed, the + server1 will allocate the lease from the pool of + "192.0.3.200 - 192.0.3.250". + +
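The pool selection described above can be pictured with the following Python sketch. It is illustrative only: `POOLS` copies the two pools and client classes from the example configuration, while the function `select_pool` and the idea of passing the query's scope as a string are assumptions made for this example, not part of Kea.

```python
from typing import List, Optional

# Pools from the example subnet "192.0.3.0/24", keyed by the client class
# (HA scope) they are associated with.
POOLS = {
    "ha_server1": "192.0.3.100 - 192.0.3.150",
    "ha_server2": "192.0.3.200 - 192.0.3.250",
}


def select_pool(query_scope: str, served_scopes: List[str]) -> Optional[str]:
    """Return the pool for a query, or None if this server does not
    currently serve the scope the query belongs to (hypothetical helper)."""
    if query_scope not in served_scopes:
        return None
    return POOLS[query_scope]


# server1 in normal load-balancing operation serves only its own scope.
normal = select_pool("ha_server2", ["ha_server1"])
# server1 in the partner-down state serves both scopes.
failover = select_pool("ha_server2", ["ha_server1", "ha_server2"])
```

In this model `normal` is `None`, because the query belongs to the partner, whereas `failover` is the second pool, matching the case where server2 has crashed.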