Once content to bum around the basements of übergeek hobbyists, Linux is lately finding itself in some posh and sophisticated sites previously reserved for commercial Unix platforms. High-availability Web farms and massive-parallel-processing projects in both the scientific and commercial realms are examples.
The word clustering is hopelessly overloaded and is being thrown around by anyone who hooks up two or more computers to have them work together in any way. For our purposes, we divide the Linux clustering space into the following categories: parallel processing, batch processing, and load-balancing and failover.
Parallel Processing
Certain scientific calculations -- for example, simulating the movement of a liquid at the atomic level -- require that the problem be parallelized at a very low level. This used to require expensive supercomputers and, depending on the exact nature of the computation, sometimes still does. Certain applications, however, can be run on clusters of regular workstations, connected over normal -- but private -- network connections. The advantage of these parallel-processing clusters is the high bang for the buck they offer in terms of performance. By employing a large number of conventional workstations, you can keep costs low while assembling an amazing amount of processing power.
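To make the idea of splitting one computation across many machines concrete, here is a minimal Python sketch. It is an illustration only, not Beowulf code: the function names (`split_range`, `partial_energy`, `parallel_energy`) and the toy "energy" sum are assumptions made for this example, and threads on a single machine stand in for the workstations of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, nearly equal slices."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partial_energy(positions, lo, hi):
    """Toy per-slice work: sum of squared positions in [lo, hi)."""
    return sum(x * x for x in positions[lo:hi])


def parallel_energy(positions, n_workers=4):
    """Compute the total by combining one partial result per worker."""
    bounds = split_range(len(positions), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker handles one slice; threads stand in for cluster nodes.
        partials = pool.map(lambda b: partial_energy(positions, *b), bounds)
        return sum(partials)


if __name__ == "__main__":
    print(parallel_energy([1, 2, 3, 4, 5], n_workers=2))
```

On a real cluster each slice would be sent to a separate workstation over the private network, and the cost of moving data between nodes is what decides whether a given problem still needs a supercomputer.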
The Beowulf Project, started in the mid-1990s, offers parallel-processing options for the Linux platform. Donald Becker, the founder of Beowulf, has now moved on to make a commercial version of his brainchild, offered by Scyld Computing. The Scyld distribution provides a fast and polished installation, commercial support, and professional services for hire. The version we attempted to install in the lab was plagued with hardware issues. We hope customers holding commercial support contracts will have better luck than we did.
Batch Processing
Beowulf, in general, can run not only low-level parallel applications but also batch-oriented applications, such as data mining, 3-D rendering and engineering simulations. If you run a scheduler on top of Beowulf, any of these large batch jobs can be crunched on a cluster. These schedulers include Condor, from the University of Wisconsin's Computer Science Department, and Portable Batch System, by Veridian Systems.
A new player on the scene is Project Nimrod, whose commercial incarnation is featured in TurboLinux's EnFuzion product.
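What a batch scheduler does can be sketched in a few lines of Python. This is a hypothetical illustration, not the algorithm used by Condor, Portable Batch System or EnFuzion: it simply hands the longest remaining job to whichever node will be free first. The job names, runtimes and node names are made up for the example.

```python
import heapq


def schedule(jobs, nodes):
    """Assign each job to the node that becomes free earliest.

    jobs:  dict mapping job name -> estimated runtime (hypothetical units)
    nodes: list of node names
    Returns a dict mapping node name -> list of job names, in run order.
    """
    # Min-heap of (time at which the node becomes free, node name).
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    plan = {node: [] for node in nodes}

    # Place the longest jobs first; ties are broken by job name.
    ordered = sorted(jobs.items(), key=lambda item: (-item[1], item[0]))
    for name, runtime in ordered:
        busy_until, node = heapq.heappop(heap)
        plan[node].append(name)
        heapq.heappush(heap, (busy_until + runtime, node))
    return plan


if __name__ == "__main__":
    jobs = {"render": 8, "mine": 5, "sim": 4, "report": 1}
    print(schedule(jobs, ["node1", "node2"]))
```

Production schedulers also have to deal with priorities, resource requirements and nodes that come and go, none of which this sketch attempts.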
Load-Balancing and Failover
Linux clustering solutions for load-balancing and failover seem to be popping up just as fast as new Linux distributions. In addition to offering EnFuzion for true clustering, TurboLinux also offers Turbo Clustering Server (TCS). TCS supports load-balancing and failover clustering for HTTP, FTP, SMTP/POP3/IMAP, NNTP, DNS and LDAP. The cluster manager node, called Advanced Traffic Manager, can be configured in failover mode so there is no single point of failure in the system. While some of TCS's code may be derived from the community, most of what we saw appears to be new code developed by TurboLinux. It is being released under the GPL (GNU General Public License), so it may show up in other projects as well.
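The combination of load-balancing and failover described here can be illustrated with a small Python sketch. It is not TCS code and does not reflect how Advanced Traffic Manager is implemented; the `LoadBalancer` class, its methods and the server names are assumptions invented for the example. Requests rotate through the servers in turn, and a server marked as down is skipped.

```python
class LoadBalancer:
    """Round-robin dispatcher that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        """Take a server out of rotation, e.g. after a failed health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a known server to rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = LoadBalancer(["web1", "web2", "web3"])
    lb.mark_down("web2")
    print([lb.pick() for _ in range(4)])
```

Note that the dispatcher in this sketch is itself a single point of failure, which is the problem that running the manager node in failover mode is meant to address.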
The High-Availability Linux (Linux-HA) project's Web site isn't fancy but is comprehensive in scope and has links to many other Linux projects under way. The Linux-HA project spawned the popular Heartbeat tool now included in several mainstream distributions, including SuSE and Mandrake. Heartbeat supports serial and Ethernet communications for failover of simple applications, including DNS and Web proxy caching services.
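The general idea behind heartbeat-based failover can be sketched as follows. This is a simplified illustration under stated assumptions, not the Heartbeat tool's actual protocol or configuration: the `HeartbeatMonitor` class, the node names and the three-second timeout are all hypothetical. A standby node listens for periodic "I'm alive" messages from the primary and takes over the service when they stop arriving.

```python
class HeartbeatMonitor:
    """Tracks heartbeats from a primary node and decides who is active."""

    def __init__(self, primary, standby, timeout=3.0, started_at=0.0):
        self.primary = primary
        self.standby = standby
        self.timeout = timeout        # seconds of silence tolerated
        self.last_seen = started_at   # time of the most recent heartbeat
        self.active = primary

    def beat(self, now):
        """Record a heartbeat received from the primary at time `now`."""
        self.last_seen = now

    def check(self, now):
        """Return the node that should hold the service at time `now`."""
        silent_for = now - self.last_seen
        if self.active == self.primary and silent_for > self.timeout:
            # The primary has been quiet too long: the standby takes over.
            self.active = self.standby
        return self.active


if __name__ == "__main__":
    monitor = HeartbeatMonitor("node-a", "node-b", timeout=3.0)
    monitor.beat(1.0)
    print(monitor.check(3.5))  # primary is still within the timeout
    print(monitor.check(4.5))  # more than 3 seconds of silence
```

In this sketch the service stays on the standby even if the primary starts sending heartbeats again; whether to fail back automatically is a policy decision the example leaves out.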
The Linux Virtual Server (LVS) project borrows the Heartbeat code and is collaborating with the Linux-HA folks. In turn, Ultra Monkey provides some Layer 4 switching capabilities by building off LVS.
Finally, as always, commercial support is essential. Companies like SuSE and Silicon Graphics Inc. (SGI) are teaming up in an effort to port SGI's FailSafe product to Linux, so enterprise support should be a nonissue.