WORKSHOP
Maximizing Uptime With Redundant DHCP
October 2, 2000
By Kevin Philpot

If you're using DHCP in your network, no doubt you've considered the consequences of DHCP server failure. Simply put, if your workstations don't have IP addresses, they cannot communicate. With the many other bindings available via DHCP, IP is just the start of the services you'll lose if DHCP goes down.

DHCP redundancy is needed, but without RFC 2131 (www.dhcp.org/rfc2131.html), the options are limited. You could deploy another DHCP server to serve nonoverlapping IP addresses, so if one server fails, clients would receive addresses from the other. However, this solution requires you to double the size of your address scopes so you can assign half to each server. Because the servers can't communicate, you can't predict which one would assign the lease.

The DHCP protocol also has a form of built-in redundancy: If you lease IP addresses for 12 days, the connection with the server is verified every six days. If a server fails, you have at least six days to bring the machine back online before stations with standing leases are affected. An outage would affect only IP addresses for newly deployed workstations.

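The half-life arithmetic above can be sketched in a few lines of Python. This is an illustration only, not code from any DHCP implementation: the function names are made up for this example, and the 0.5 renewal fraction is the protocol's default renewal point, which a server may override.

```python
from datetime import datetime, timedelta

def renewal_schedule(lease_start, lease_length, renew_fraction=0.5):
    """Return (renew_at, expires_at) for a lease.

    renew_fraction=0.5 is the default renewal point; treat it as an
    assumption, since a server can hand out a different value.
    """
    renew_at = lease_start + lease_length * renew_fraction
    expires_at = lease_start + lease_length
    return renew_at, expires_at

def outage_grace(outage_start, lease_start, lease_length):
    """How long a client with a standing lease survives an outage."""
    _, expires_at = renewal_schedule(lease_start, lease_length)
    return max(expires_at - outage_start, timedelta(0))

# The article's example: a 12-day lease, with a made-up start date.
start = datetime(2000, 10, 2)
lease = timedelta(days=12)
renew_at, expires_at = renewal_schedule(start, lease)

# Worst case: the server dies just as the client reaches its renewal point.
worst_case = outage_grace(renew_at, start, lease)
print(renew_at.date(), expires_at.date(), worst_case.days)
```

A client that renewed just before the failure has nearly the full 12 days left; the six-day figure is the floor, which applies to a client that was about to renew.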
For those workstations that have been added or moved, you could install a second server to assign addresses that don't overlap the original server's scope. Any adds or moves would simply get addresses from the secondary server. They would renew to the primary server as soon as their temporary lease reached its half-life.

Enter RFC 2131

RFC 2131 is the latest draft for implementing DHCP servers and includes new functionality that enables multiple redundant servers to draw from the same address space. In addition, a group is working on Draft 7 of the DHCP Failover Protocol, which defines a protocol for synchronizing primary and secondary servers to ensure redundancy.

RFC 2131 allows for two or more servers to assign leases for common address scopes. The primary server hands out leases, while the secondary server watches the primary server's health. All the while, the two servers share lease information with each other. To avoid the possibility of duplicate IP addresses, the secondary server has its own pool of addresses that it uses if the primary server fails.

Server-to-Server Syncing

If the redundant DHCP servers are to work properly, they must be able to synchronize lease information. Any client with a lease will be able to renew it with either server. RFC 2131 addresses this issue by defining three types of server-to-server messages: server lease synchronization, operational state (hello packets) and "I'm back" (when the primary DHCP server returns from the dead).

Redundant DHCP servers following the RFC 2131 DHCP Failover draft use server-lease-synchronization messages to communicate lease information between one another. When both of the servers are operating properly, a continuous stream of messages flows between the primary and secondary servers.

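As a rough illustration of the division of labor described above, the following Python sketch models a secondary that listens for the primary's hello packets and touches its reserved pool only after the primary has gone silent. Everything here is assumed for the example: the class and method names, the three-second timeout, and the 192.0.2.0/24 address ranges are hypothetical and do not come from the draft or from any product.

```python
import ipaddress

class SecondaryServer:
    """Toy secondary: tracks primary hellos and guards a reserved pool."""

    def __init__(self, backup_pool, hello_timeout=3.0):
        self.backup_pool = list(backup_pool)  # addresses only the secondary may assign
        self.hello_timeout = hello_timeout    # seconds of silence before takeover (assumed)
        self.last_hello = None

    def on_hello(self, now):
        """Record the arrival time of an operational-state message."""
        self.last_hello = now

    def primary_alive(self, now):
        return (self.last_hello is not None
                and now - self.last_hello <= self.hello_timeout)

    def offer(self, now):
        """Return a reserved address, or None while the primary is healthy."""
        if self.primary_alive(now) or not self.backup_pool:
            return None
        return self.backup_pool.pop(0)

# Hypothetical scope: the primary owns .10-.199, the secondary reserves .200-.219.
base = ipaddress.IPv4Address("192.0.2.0")
primary_pool = [base + i for i in range(10, 200)]
backup_pool = [base + i for i in range(200, 220)]

secondary = SecondaryServer(backup_pool)
secondary.on_hello(now=0.0)
print(secondary.offer(now=1.0))   # primary healthy: None
print(secondary.offer(now=10.0))  # primary silent past the timeout: 192.0.2.200
```

Because the two pools never overlap, the secondary can start handing out addresses without knowing which leases the primary granted in its final moments.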
There are three types of messages used to communicate lease information: The primary server sends an add message to the secondary server when it hands out a new lease; either server sends an update message when there is a change to a lease (such as a renewal or extension); and the servers send a delete message when a lease expires and is once again available. In all cases, the receiving server responds with a positive or a negative acknowledgment of the message. These messages are referred to as "lazy updates" because they are sent to the other server only after the transaction with the requesting DHCP client is complete.

In addition to maintaining a current database of lease information, the secondary server must keep a watchful eye over the primary server so it knows when to take over the handing out of leases. This is accomplished by monitoring the TCP connection between the two servers. The secondary server uses three criteria to determine if communications between it and the primary server are satisfactory. First, it must be able to establish a TCP connection. Then it must receive a connect message from the primary server and be able to respond with a connectack. Finally, it must receive state messages from the primary server, which it uses to determine its own operational state.

The RFC 2131 DHCP Failover draft specifies a mechanism for letting a primary server return after a failure. When the primary server comes back to life and wants control back from the secondary, it initiates a sequence of three messages: request for control, return of control initiated and return of control completed.

All message exchanges between the primary and the secondary servers are encoded in standard DHCP packets. The packet types are defined in the RFC, but the binding information itself has yet to be standardized.

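The add, update and delete exchange, along with the "lazy" ordering, can be modeled with a short Python sketch. This is a simplified illustration, not the wire protocol: the class names, the in-memory dictionaries and the rules for when a negative acknowledgment is returned are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class LeaseMessage:
    kind: str            # "add", "update" or "delete"
    ip: str
    client_id: str = ""
    expires: float = 0.0

class LeaseMirror:
    """The secondary's copy of the lease table."""

    def __init__(self):
        self.leases = {}

    def apply(self, msg):
        """Apply one message; True is a positive ack, False a negative one."""
        known = msg.ip in self.leases
        if msg.kind == "add" and not known:
            self.leases[msg.ip] = (msg.client_id, msg.expires)
            return True
        if msg.kind == "update" and known:
            self.leases[msg.ip] = (msg.client_id, msg.expires)
            return True
        if msg.kind == "delete" and known:
            del self.leases[msg.ip]
            return True
        return False  # assumed policy: reject anything inconsistent

class Primary:
    def __init__(self, peer):
        self.peer = peer
        self.leases = {}
        self.pending = []  # lazy updates not yet sent to the peer

    def grant(self, ip, client_id, expires):
        # The client transaction completes first; the peer hears about it later.
        self.leases[ip] = (client_id, expires)
        self.pending.append(LeaseMessage("add", ip, client_id, expires))

    def flush(self):
        """Send queued updates and collect the peer's acknowledgments."""
        acks = [self.peer.apply(m) for m in self.pending]
        self.pending.clear()
        return acks

mirror = LeaseMirror()
primary = Primary(mirror)
primary.grant("192.0.2.10", "client-a", expires=1000.0)
print(mirror.leases)    # still empty: the update is lazy
print(primary.flush())  # [True]
print(mirror.leases == primary.leases)  # True
```

The gap between `grant` and `flush` is the window in which the secondary's view lags behind the primary's, which is one reason the secondary assigns from its own pool rather than from the shared one.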
Because this draft has not been fully ratified, we aren't likely to see multivendor redundant DHCP solutions interoperating anytime in the near future.

A Trial Deployment

At Schneider National, we use Cisco Systems' Cisco Network Registrar (CNR). Recently we undertook a test to deploy a redundant DHCP server implementation, a feature that is new to version 3.0 of CNR.

Before deploying our redundant DHCP solution, we configured CNR in the lab. We installed both the primary and the secondary server on the same network to mimic our production configuration (though the protocol explicitly states the servers can be on separate networks). We added a router and a second network to model our user segments. The router had IP helper addresses pointing at both CNR (DNS and DHCP) servers. We then installed five Microsoft Windows 95 workstations on the "user segment" network to populate the DHCP server with lease information. Note that we also use CNR for DNS; our lab test also included configuring CNR to handle primary and secondary DNS duties.

Although configuring the DHCP servers for redundancy was not difficult, there were several gotchas that can plague a first-time deployment. In Chapter 10 of the CNR manual, Cisco presents three failover configurations. We chose the simple site configuration because we didn't require load-balancing between servers. This configuration required two steps: configuring the servers (primary and backup) to work together, and duplicating the scope data and transferring it to the backup. We then reloaded the servers and were ready to go.

Before You Start

Before you deploy your redundant DHCP solution, make sure it works properly. First and foremost, if you are using BOOTP (Bootstrap Protocol)/DHCP relays on your routers, you need to add a BOOTP/DHCP relay (called an IP helper on Cisco's routers) for the backup server. (You should have one in place for the primary server.)

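One way to catch a missing relay before it matters is to scan a saved router configuration for interfaces that point at only one of the two servers. The Python sketch below assumes Cisco IOS-style text in which each `ip helper-address` line sits indented under its `interface` line; the function name, the sample configuration and the addresses are hypothetical.

```python
def missing_helpers(config_text, required):
    """Map interface name -> required helper addresses it lacks.

    Interfaces with no helper at all are skipped, on the assumption
    that they do not relay DHCP in the first place.
    """
    helpers = {}
    current = None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
            helpers[current] = set()
        elif not line.startswith(" "):
            current = None  # an unindented line ends the interface block
        elif current and line.strip().startswith("ip helper-address "):
            helpers[current].add(line.split()[-1])
    required = set(required)
    return {name: required - found
            for name, found in helpers.items()
            if found and required - found}

# Made-up configuration: Ethernet1 was never given the backup server's helper.
SAMPLE = """\
interface Ethernet0
 ip address 192.0.2.1 255.255.255.0
 ip helper-address 198.51.100.10
 ip helper-address 198.51.100.11
!
interface Ethernet1
 ip address 203.0.113.1 255.255.255.0
 ip helper-address 198.51.100.10
!
"""

print(missing_helpers(SAMPLE, ["198.51.100.10", "198.51.100.11"]))
# → {'Ethernet1': {'198.51.100.11'}}
```

Run against every router that serves a user segment, a check like this turns the "add a second relay everywhere" step into something that can be verified rather than remembered.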
BOOTP/DHCP relays are configured on the Ethernet interface of the router that serves as the default gateway for the hosts or workstations on the segment. The BOOTP relay takes the DHCP broadcast packets off the segment and forwards them to the DHCP server. When you add a backup server, you need to add a second BOOTP relay to each Ethernet interface. If you skip this step, you'll end up with a non-fault-tolerant network. When the primary server fails, the packets will never get forwarded to the secondary server.

The second sticking point when deploying redundant DHCP servers is that the scope information needs to be synchronized between the primary and secondary servers manually. The DHCP Failover Protocol addresses lease information, but not scope information. CNR servers synchronize only DHCP lease data, omitting scopes and other configuration data. If you make any changes to the lease scopes, you need to make the changes on both servers manually. Thankfully, Cisco provides a utility that compares the servers' DHCP configurations and warns you of any differences. With more than 100 scopes in our network, it would be very difficult to set up and maintain server synchronization. However, Cisco supplies a very useful script for cloning the DHCP servers when setting up the secondary server but warns against using it on a regular basis.

Testing Our Implementation

We followed these instructions as we set up our test servers. After configuring the redundant server and adding the appropriate IP helpers to our Cisco router, we tested the backup functionality of CNR. We did this by both gracefully shutting down the CNR software and ungracefully unplugging its Unix host. In all cases, the failover worked as advertised. We did not detect any "outage" from the user segments. We used a Network Associates Sniffer to watch the packet exchanges between servers.
The packets were easy to recognize, and the secondary server responded to failures in less than one second using the default configuration.

Kevin Philpot is an administrator at Schneider National in Green Bay, Wis. Send your comments on this article to him at philpotk@schneider.com.