Building High-Availability Networks With Windows Storage Server R2
Disaster season is already upon us, with the first Atlantic storms of the year. Learn how to use Windows Storage Server to keep your servers available 24/7, no matter where disaster strikes.
June 24, 2006
We almost had a hurricane in Florida this week, and it's only mid-June. The entire Eastern seaboard, the Gulf region, the heartland, and possibly even the Canadian end of I-95 are up for grabs to African-born hurricanes and tropical storms. Alberto tried to disrupt our lives in Florida, even though he never made it to hurricane strength. What he did create was pandemonium for IT. And if that was not a big enough scare, Florida Power and Light (FPL) did something nasty in downtown Fort Lauderdale on June 8 and plunged a number of big corporate offices on Las Olas Boulevard into darkness for an entire day.
Businesses are now scrambling again to get disaster recovery systems upgraded or in place. Why now, when the whole winter was available? People procrastinate. IT departments fight with management to get a budget for DR and failover equipment when there is no hurricane on the horizon; CFOs feel no pressure to give up the funds and look for excuses to raid the IT coffers. When a hurricane threatens or the lights go out, everyone suddenly wants the ultimate failover on the lowest of "bank-width."
What are our options? Over the next few articles, I am going to explore a few. First, let's be clear on the objective: we want to set up a system or systems that allow Windows Server to be replicated -- services, data and all -- to other locations. Why? So that when the next hurricane hits, or a tornado strikes, or a dike breaks, or an FPL engineer drops a prune pit into a transformer and the city goes dark, users will be able to continue as if nothing happened.

Let's define two important words that will be prevalent throughout the architecture development: replication and failover. Replication is the most important aspect of any DR plan involving redundancy at another location; we must first replicate our data to remote servers and be sure the data is current before it can be used. Failover is the process of redirecting users to the replicated data in such a way that they are not aware the servers they are connecting to are now in Seattle and no longer in Miami (assuming that Miami has been obliterated). Both can be achieved in different ways, but none of the solutions can be implemented without a budget, and choosing among them takes more thought than sorting through a pile of credit cards for the best line of credit to use.
Let's look at replication first. The first data you typically want to replicate to a remote location is your Active Directory data. Fortunately, this is easy to do, because Active Directory and the domain controller services on which it lives were designed to be widely dispersed. Place domain controllers in your remote offices and join them to AD, and replication is handled for you. If you plan to offer failover for Exchange, SQL Server, your file server data, or any other data that resides on a Windows Server 2003 system, AD domain controllers must be within easy reach of the servers you need to "clone" in the new location. This is key.
AD "affinity" for services like Exchange or SQL Server (especially on clusters) needs to be reliable. As part of any HA solution, you thus need to place a DC in the same data center or rack as your servers, or the DCs need to be accessible over a reliable network with reasonable bandwidth (at least 128Kbps reserved, 256Kbps or higher if possible).

Placing a domain controller in the remote location (your own facility or a co-location facility you can extend your network to) is easy. Install Windows Server 2003 R2 Standard Edition on a server at your main site and connect it to the network. Once the server is running and has all necessary patches and hot-fixes, assign it its future IP address, subnet, and gateway information, shut it down, and ship it to the target site. Don't forget to enable Remote Desktop so that you can log in to the server remotely if you do not have remote eyes and hands to help you log on at the site.
At the target site, promote the server to an Active Directory DC. Once it's promoted, replication to and from the main site will only work once AD's Knowledge Consistency Checker (KCC) is able to discover routes to the server. You make that possible by defining a site for the remote subnet and placing the DC in it; the tool you use for this is Active Directory Sites and Services.
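Promotion can be run interactively with dcpromo, or unattended with an answer file -- handy when there is no keyboard attached at the remote site. As a rough illustration only (the domain, site name, and credentials below are hypothetical, and you should verify the supported keys against Microsoft's dcpromo documentation for your build), such a file might look like this:

```ini
; Invoked as: dcpromo /answer:C:\promote.txt  (hypothetical example)
[DCInstall]
ReplicaOrNewDomain=Replica          ; join as an additional DC in an existing domain
ReplicaDomainDNSName=corp.example.com
SiteName=DR-Site                    ; the AD site created for the remote subnet
UserDomain=corp.example.com         ; credentials allowed to add a DC
UserName=administrator
Password=*
SafeModeAdminPassword=*
RebootOnSuccess=Yes
```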
With AD running at or accessible to the remote location, we can turn to failing over file servers from any location to the DR site. Replicating files from one server to another is not difficult. At the low end (budget-wise), you can use the much-improved replication included with Windows Server 2003 R2. The better option, however, is to install Windows Storage Server (WSS) on the network and let it act as your replication bridgehead for a highly available file server solution. To kick off our full HA architecture, let's explore WSS first. The version we want to use is Release 2, or R2. You can get it on full servers or as an appliance from the likes of HP, Dell, or Iomega. WSS is a full member of the Windows Server operating system family, but it is sold through original equipment manufacturers (OEMs), who couple their hardware with an operating system optimized for Network Attached Storage (NAS) functionality.
WSS R2 is the latest release of this popular operating system and provides, among significant enhancements in storage management, improved support for replication and DFS over the WAN. If you are unclear on what a WSS NAS appliance is, it is a file server designed to run headless -- without a monitor, keyboard, or mouse -- that acts as a front-end filer to almost unlimited storage. You manage the appliance remotely via the Microsoft Management Console (MMC)-based Windows Storage Server Management user interface, which loads Remote Desktop for Administration.
A NAS can host its own hard disk drives, attach to an exclusively owned or shared storage array, or attach to a storage area network (SAN). Whatever the storage behind WSS, it is highly optimized as a file server. R2 of the operating system is further optimized with Single Instance Storage (SIS), some snazzy replication functionality, and the new version of Microsoft's Distributed File System (DFS).

The idea of SIS in WSS is to conserve disk space by seeking out identical files on the disks and storing only a single copy in a SIS common store. The duplicates are deleted and replaced with pointers to the copy in the common store. The algorithm reduces the amount of redundant data stored on a volume -- an ideal feature for mirroring data to backup servers and DR sites.
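SIS itself is built into WSS, but the idea behind it is simple enough to sketch. The toy version below is my illustration, not Microsoft's implementation (the store path is made up): it single-instances files by content hash and leaves small pointer files behind, much as the real SIS groveler leaves reparse points.

```python
import hashlib
import os
import shutil

# Hypothetical location for the common store; WSS keeps its own on-volume store.
COMMON_STORE = r"D:\SIS Common Store"

def file_digest(path, chunk=64 * 1024):
    """Hash a file's contents so that identical files collapse to one key."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def single_instance(volume_root):
    """Keep one copy of each unique file in the common store and replace
    every duplicate with a pointer file (a stand-in for SIS reparse points)."""
    os.makedirs(COMMON_STORE, exist_ok=True)
    for dirpath, _, filenames in os.walk(volume_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = file_digest(path)
            stored = os.path.join(COMMON_STORE, digest)
            if not os.path.exists(stored):
                shutil.copy2(path, stored)      # first instance: keep the bytes
            os.remove(path)
            with open(path + ".sis", "w") as ptr:
                ptr.write(stored)               # pointer to the single instance
```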
The SIS store caters to several duplication scenarios that occur all the time on a file server, the most common being files received as email attachments. Exchange keeps its own single copy of an attachment, but as users open the message and save the file to their home folders, a single file server can end up with hundreds of copies of the same file. Imagine what a new version of the Employee Handbook would do to your storage situation. Most importantly, SIS pays off when it comes to replicating data to failover storage. This brings us to the central theme of this article, DFS.
DFS and WSS were designed to meet one of the main goals of every IT department -- ensuring that users can get to their files, work with them, and save them. It is also important not to forget that the files need to be secure, even under replication and failover conditions. The more branches you have, or plan to have, the harder it becomes to keep users connected to the correct servers, shares, folders, and files.
If your users are scattered all over the country, you also have to consider traffic over slow wide area networks (WANs) and be confident that, if a server or network link is lost, your branch servers will still be accessible, backed up, and available to users. In other words, if it's business continuity you need, then you need DFS. DFS with Windows Storage Server 2003 R2 and Active Directory lets you cater to these scenarios with DFS Namespaces and DFS Replication, both new additions in Windows Server 2003 R2. Used in tandem, they give your users simplified, fault-tolerant access to files, load sharing, and -- most important -- WAN-friendly replication.
DFS Namespaces is a big change from the former domain "Dfs" share points; it lets you group shared folders that are scattered on different servers all over the country and present them to users as a unified namespace. The namespace is essentially a virtual tree of folders whose branches emanate from more than one server.
DFS Replication in R2 is the successor to the often problematic File Replication Service (FRS). DFS "repl" comprises a multi-master replication engine with support for scheduling and bandwidth throttling. Microsoft has focused on efficient WAN usage: only the changes made to a file are transmitted over the network. The OS now uses the so-called remote differential compression (RDC) algorithm for data mirroring across the WAN.

The RDC algorithm is best suited to making small changes to large files. In the past, the entire file was replicated on the slightest change; now only the bytes that actually changed are sent. This technique results in as much as 97 percent less data being sent across the network. RDC is thus very effective as a merge technology, keeping copies of files on widely dispersed servers up to date and minimizing conflict resolution.
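RDC is a Microsoft protocol, but its core trick -- exchange small block signatures, then ship only the blocks whose signatures differ -- is easy to sketch. The fixed-block version below is a simplification of my own, not RDC itself; real RDC cuts variable-size chunks and recurses over the signatures so it can also handle insertions that shift the rest of the file.

```python
import hashlib

BLOCK = 4096  # fixed-size blocks for simplicity; real RDC uses variable chunks

def signatures(data):
    """Per-block hashes: the small metadata the receiver sends first."""
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def delta(old_sigs, new):
    """On the sender: keep only the blocks whose hash no longer matches."""
    changes = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        n = i // BLOCK
        if n >= len(old_sigs) or hashlib.md5(block).hexdigest() != old_sigs[n]:
            changes.append((i, block))
    return changes, len(new)

def patch(old, changes, new_len):
    """On the receiver: rebuild the file from the old copy plus the changes."""
    out = bytearray(old.ljust(new_len, b"\0")[:new_len])
    for offset, block in changes:
        out[offset:offset + len(block)] = block
    return bytes(out)

# A one-byte edit in a 64KB file ships a single 4KB block, not the whole file.
old = b"A" * 65536
new = old[:10000] + b"!" + old[10001:]
changes, size = delta(signatures(old), new)
assert patch(old, changes, size) == new and len(changes) == 1
```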
The replication itself is also robust. When R2 servers in a replication group begin to synchronize, they first exchange metadata about the files to determine which ones need to be replicated. The metadata exchanged is minimal, and the possibility of sending changes unnecessarily (because of the order in which the changes occurred) is eliminated, because synchronization is state-based rather than event-based. The older FRS used event-based replication and was often plagued by latency issues, so that replication of a changed file could be overdue, sometimes by several minutes.
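The distinction is easiest to see in a toy model. In the sketch below, each member summarizes its current state as a per-file version number (a stand-in for the per-member version vectors DFS Replication actually keeps), and a sync pulls only files whose remote version is ahead -- there is no event log to replay, so out-of-order or redundant events cannot trigger extra transfers.

```python
def files_to_pull(local_state, remote_state):
    """State-based sync: compare current per-file versions instead of
    replaying an event log, so out-of-order changes cannot be re-sent."""
    return [name for name, version in remote_state.items()
            if version > local_state.get(name, 0)]

# Only handbook.doc crosses the wire, no matter how many intermediate
# edits happened while the link was down.
local  = {"handbook.doc": 3, "budget.xls": 7}
remote = {"handbook.doc": 5, "budget.xls": 7}
print(files_to_pull(local, remote))   # ['handbook.doc']
```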
State-based synchronization, along with RDC, means that replication can scale to many more members than was possible with FRS. Microsoft has published the following scalability figures for the new replication service:
Replication groups can be as large as 256 servers.
You can replicate up to 256 folders per group.
You can have up to 256 connections per server. This would allow for 128 inbound connections and 128 outbound connections.
On each server, the number of replication groups multiplied by the number of replicated folders multiplied by the number of simultaneously replicating connections must be kept to 1,024 or fewer (see the sketch after this list). If the replication schedule is staggered, you do not need to count connections that are not replicating because their schedule window is closed.
A volume can contain up to 8 million replicated files, and a server can contain up to 1 terabyte of replicated files. According to Microsoft, these are tested numbers and recommended guidelines for performance and scalability reasons.
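The connection rule above is the one most easily miscounted in a large design, so here is a back-of-the-envelope checker -- a convenience sketch of my own, not a Microsoft tool:

```python
def within_dfsr_limits(groups, folders_per_group, concurrent_connections):
    """Microsoft's guideline: replication groups x replicated folders x
    simultaneously replicating connections must stay at 1,024 or fewer.
    Connections idled by a staggered schedule do not count."""
    return groups * folders_per_group * concurrent_connections <= 1024

print(within_dfsr_limits(8, 16, 8))   # True  -- 1,024 exactly, just inside
print(within_dfsr_limits(8, 16, 9))   # False -- 1,152, over the guideline
```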
DFS Namespaces give you a simple way to redirect users to other servers holding the required data when their primary server is down, helping to achieve a higher level of availability. We can use DFS to synchronize the data on all servers that host a particular folder in the namespace -- something that was not possible on earlier versions of the OS, including the original release of Windows Server 2003. The net result is that no matter which server a user connects to, he or she sees the same data. The namespace offers uniform access across multiple file servers, sparing users from having to remember individual server names. And because content is easily published to local servers that are part of a namespace, users in branch offices can access their data locally. If a server malfunctions, the namespace automatically refers clients to the next closest server.
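Conceptually, a namespace folder is just an ordered list of server targets that clients fall through until one answers. The resolver below is a hypothetical illustration (the server names are invented, and the real referral process also ranks targets by Active Directory site cost):

```python
# Hypothetical namespace: one virtual folder, several real targets behind it.
NAMESPACE = {
    r"\\corp.example.com\files\reports": [
        r"\\miami-fs01\reports",     # closest target, tried first
        r"\\seattle-fs01\reports",   # DR copy, used when Miami is down
    ],
}

def resolve(virtual_path, is_reachable):
    """Return the first reachable target for a namespace folder."""
    for target in NAMESPACE.get(virtual_path, []):
        if is_reachable(target):
            return target
    raise IOError("no target reachable for %s" % virtual_path)

# If Miami goes dark, clients are transparently referred to Seattle.
up = lambda target: "miami" not in target
print(resolve(r"\\corp.example.com\files\reports", up))  # \\seattle-fs01\reports
```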
WSS with RDC and DFS replication presents a scalable, reliable, multi-master failover solution. No matter how big or small your business, the technology will let you drop two servers on opposite sides of the world or scale to several thousand nodes, cater to hundreds of departments, and set up topologies to satisfy the wildest imaginations.

Server Pipeline columnist Jeffrey R. Shapiro is the co-author of Windows Server 2003 Bible (Wiley) and an infrastructure architect who manages a large Windows Server network for an insurance firm.