Unraveling the Mysteries Of Clustering

October 2, 2000

By Ron Anderson, Mike Lee and Steve J. Chapin

What comes to mind when you think of clustering? Do you think of virtually uninterrupted computer services via sophisticated, multinode, failover software/hardware implementations? Or does the term bring to mind scalable, distributed computational engines that span hordes of inexpensive, networked computers, letting you save money on servers and come out smelling like a rose when budget time rolls around? Or does clustering make you think of large Web sites with load-balanced applications?

In the commercial arena, Digital Equipment Corp. (now part of Compaq Computer Corp.) was an early pioneer in clustering. Its primary operating system--VMS--offered built-in clustering support. Even today thousands of companies are still happily running Digital VAXclusters. The first VAXcluster shipped in 1983. The software offered a single-system view of the cluster and let both users and administrators use any node indistinguishably from any other. Ironically, many would argue this environment offered more seamless integration than is found in many clustering products today. But the proprietary nature of VMS--even after Digital pasted "Open" in front of the operating system's name--along with the evolution of super-cheap hardware eventually made it obsolete.

Whatever the term clustering conjures up for you, there are tried-and-true business reasons for making sense of these technologies. Depending on your IT infrastructure, some combination of clustered computers might just save your company big bucks and preserve your bonus at the end of the year.

The main goals of clustering are simple, even obvious. First, systems with redundant components can be more reliable than those without. Clustering for fault-tolerance uses extra computers to back up services and components. If a working component (or entire computer) fails, the backup machine takes over.
A group of computers also can do more work than a single one can. Clustering for scalability breaks down a unit of work into smaller pieces and spreads those pieces among the computers in the cluster.

Snipping Downtime

Every vendor peddling an operating system for servers recognizes that the market for fault-tolerant solutions is poised to explode as businesses continue to roll down the e-commerce highway. Unexpected computer downtime costs millions. Fault-tolerant solutions continue to end up on the right side of the cost/benefit analysis as an ever-increasing number of businesses begin to get a handle on the costs associated with not implementing them. The most common fault-tolerant solutions are based on high-availability technologies, but load-balancers can have fault-tolerant features as well.

Four-nine uptime (99.99 percent), or less than one hour of downtime per year, is the entry point for high-availability solutions. When you consider all the components that can go wrong--power, environment, connectivity, software, hardware and biologicals (us)--five-nine uptime, or about five minutes of downtime per year, is the Holy Grail of high availability for distributed systems. Five-nine uptime is expensive and complex but worth the cost and effort if your e-business lives or dies by the second--as Amazon.com and eBay do, for example.

Six-nine uptime is a marketing illusion--run for the hills if a vendor suggests its solutions provide six-nine uptime. Given today's distributed systems and our current computing infrastructures, this level of availability can't be achieved. But that's one less thing to worry about.
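To put those nines in perspective, here is a back-of-the-envelope sketch in Python that converts an availability percentage into minutes of downtime per year. The function name and the choice of a 365.25-day year are our assumptions for illustration, not anything a vendor specifies.

```python
# Assumption: an average year of 365.25 days, expressed in minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, implied by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, percent in [("four-nine", 99.99), ("five-nine", 99.999), ("six-nine", 99.9999)]:
    minutes = downtime_minutes_per_year(percent)
    print(f"{label} uptime ({percent}%): {minutes:.2f} minutes of downtime per year")
```

Run it and four nines works out to a little under 53 minutes a year, five nines to a bit more than 5 minutes, and six nines to roughly half a minute, which is why that last figure is so hard to take seriously.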
A word of warning to any IT professional who is contemplating a trip down the high-availability road: The high-availability solution alone may not be enough. To obtain true fault-tolerance, you must eliminate every single point of failure that can interrupt your service goals. Redundancy is the key from start to finish. High-availability computer components and connections to the systems' public network are important. But you'll also need to address redundant power and Internet connectivity issues as well as the possibility of computer-room disasters and stupid human tricks (the No. 1 cause of system failure) as you plan for high availability. In fact, to be truly effective, your high-availability configurations need to be tied tightly to your disaster-recovery plans.

Business users can choose from scads of high-availability solutions. The solutions are all very similar: They rely on server-class computers with redundant features and private disk space as well as shared redundant disk space, redundant public and private network connections, and failover-management software. Many of them contain some intelligence in the application so failover can be handled at the application level.

A server-class computer is at the heart of any high-availability cluster. You need the redundancy provided by server-class architecture, but just as important, you need expansion slots for redundant public and private network adapters, redundant host bus adapters for shared storage, and host bus adapters for the private disk space. That's a minimum of six slots depending on what's built onto the motherboard, not counting any other expansion needs. Look for Fibre Channel to dominate the shared-storage space. For now, however, Fibre Channel isn't cheap.

The private network is key to failover: It carries the heartbeat for the cluster, which is how the failover-management software recognizes that a system or process has failed.
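The heartbeat idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular failover-management product works: the `HeartbeatMonitor` class, the one-second interval and the three-missed-beats threshold are all assumptions of ours, and a real cluster would receive its heartbeats over the private network rather than through method calls.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor that flags nodes whose heartbeats have stopped."""
    interval: float = 1.0   # assumed seconds between expected heartbeats
    missed_limit: int = 3   # assumed number of missed beats before declaring failure
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: Optional[float] = None) -> None:
        """Note that a heartbeat arrived from the given node."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return the nodes that have been silent longer than the allowed window."""
        now = time.monotonic() if now is None else now
        window = self.interval * self.missed_limit
        return sorted(n for n, seen in self.last_seen.items() if now - seen > window)


# Explicit timestamps keep the example deterministic.
monitor = HeartbeatMonitor(interval=1.0, missed_limit=3)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=4.0)       # node-b has gone quiet
print(monitor.failed_nodes(now=4.5))    # ['node-b']
```

Waiting for several missed beats rather than one is a deliberate trade-off in this sketch: it delays detection slightly but avoids triggering a failover because of a single dropped packet.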
A two-node high-availability cluster should be interconnected using crossover network cables between redundant NICs in each server. Using crossover cables eliminates a potential point of failure at a hub or switch. High-availability clusters with more than two nodes will need to use a hub or switch to interconnect the nodes. Speed is not critical; 10-Mbps Ethernet will do fine (see "Typical Two-Node High-Availability Cluster Configuration," above).

Even when all the pieces of the cluster are functioning, failover is not immediate when a system goes down. First, the failover-management software needs to recognize via the heartbeat that a process has failed. Next, the shared disk space needs to be dismounted and the IP address "deallocated" from the failed system, then the disk space remounted and the IP address reallocated on the failover system. Finally, the application process needs to be restarted on the failover system and the clients need to reconnect. Depending on the state of the application, users may need to reauthenticate or they may just notice a delay between operations.
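The order of those failover steps matters, and it can be laid out as a small Python simulation. Everything here is hypothetical: the `Node` class, the `fail_over` function, the volume name and the example IP address are ours, and the code only shuffles names between in-memory sets rather than touching real disks or network interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node and the resources it holds."""
    name: str
    mounted: Set[str] = field(default_factory=set)
    addresses: Set[str] = field(default_factory=set)
    running: Set[str] = field(default_factory=set)


def fail_over(failed: Node, standby: Node, volume: str, ip: str, app: str) -> List[str]:
    """Move one service from the failed node to the standby and return the steps taken."""
    steps = []

    # 1. Release the shared resources still assigned to the failed node.
    failed.running.discard(app)
    failed.mounted.discard(volume)
    steps.append(f"dismount {volume} from {failed.name}")
    failed.addresses.discard(ip)
    steps.append(f"deallocate {ip} from {failed.name}")

    # 2. Claim the same resources on the standby node.
    standby.mounted.add(volume)
    steps.append(f"mount {volume} on {standby.name}")
    standby.addresses.add(ip)
    steps.append(f"allocate {ip} to {standby.name}")

    # 3. Restart the application; clients then reconnect to the same IP address.
    standby.running.add(app)
    steps.append(f"restart {app} on {standby.name}")
    return steps


primary = Node("node-a", {"shared-disk"}, {"192.0.2.10"}, {"database"})
backup = Node("node-b")
for step in fail_over(primary, backup, "shared-disk", "192.0.2.10", "database"):
    print(step)
```

Each of these steps takes real time on real hardware, which is where the delay users notice comes from.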






