Strategic Info Management: Cleaning Up with MAID

MAID storage cleans up messes by improving the long-term reliability of disk-based archives.

September 8, 2006

Network Computing

When it comes to data archiving, you would no sooner rely on a hard disk for long-term storage than you would a pair of in-line skates for a cross-country trip. For enterprises, hard disks are a disposable commodity, while large disk arrays are reserved for highly transactional primary storage--when data integrity is more important than saving a few dollars. IT managers invest a lot of time, money and sleepless nights to ensure that the data stored on their large disk systems will survive the years or even decades required of active mid- to long-term archives.

Tape and optical may still be the best choice for passive or deep archives, but the increasing pressure of regulatory compliance and legal discovery makes a serious case for the advanced search capabilities and fast access of active archiving. MAID (Massive Array of Inactive Disk) storage was introduced in 2002 to reduce the operational costs and improve the long-term reliability of disk-based archives. In essence, MAID takes RAID one step further by introducing power management and enhanced disk monitoring as components of array control.

A MAID system powers down inactive drives, reducing the heat generation and electrical consumption of the overall system. As an added benefit, MAID advocates believe decreasing the run time of the individual disks results in a substantial increase in their life expectancy. This represents real progress in overcoming some of the concerns of archiving on disk, but MAID also adds complexity to low-level array management.

The Challenges

Hard disks are the most reliable they've ever been, but even hard-drive manufacturers remain low-key about their suitability as an archiving platform and simply refuse to comment on the stability of data on drives at rest. Unlike conventional RAID systems that remain constantly spinning, MAID systems must effectively handle data on disks that are intended to spend much of their time offline. Two of the main concerns here are data protection and access performance.

In a normal RAID array, the drive status is being constantly monitored by the controller, and any error conditions can be acted upon immediately when they occur. In a MAID environment, a drive may be inactive for weeks or even months. In this scenario, the system must be able to monitor the condition of these inactive drives to ensure they function correctly and their assigned data remains intact.

Even though one of the primary goals of a MAID system is to minimize disk activity and reduce power consumption, the system still must be responsive to random system requests. More important, the overall system must be capable of providing throughput for large-scale data transfers as well as support protracted search and recovery tasks.

The Players

Two storage vendors offering MAID technology today are COPAN Systems and Nexsan Technologies. Both base their high-density arrays on SATA disk technology, but differ substantially on their MAID implementations.

COPAN Systems

COPAN was the first to offer MAID-based storage, in 2003, and its Revolution 200 series of storage systems was built to take full advantage of the technology. Its power-management approach allows a maximum of 25 percent of the drives in a system to be active at any given time. With the reduction in heat, COPAN can pack up to 448 TB of storage, or 896 500-GB SATA drives, into a full rack that consumes only 6,034 watts of power at peak load and 3,253 watts on standby.

This magic is driven by the company's patented Power Managed RAID software, running on the individual storage controllers located on each of the eight 14-drive canisters filling a storage shelf. Of the 112 drives in each storage shelf, four are reserved as global spares and the rest are divided into 27 four-drive RAID 5 sets. The drives making up each RAID set are located on different canisters, so even the loss of a whole canister won't affect the data integrity of the shelf. Each canister's controllers can manage the RAID sets independently and power them up or down as needed.
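The shelf layout described above is easy to check with a few lines of arithmetic. The following is a rough sketch in Python, not anything supplied by COPAN: the drive counts come from the article, while the usable-capacity figure is derived here on the assumption of standard RAID 5 (one drive's worth of parity per set) and 500-GB drives.

```python
# Shelf layout figures from the article. The usable-capacity step is an
# assumption-based derivation (standard RAID 5, 500-GB drives), not a
# number published by COPAN.
CANISTERS_PER_SHELF = 8
DRIVES_PER_CANISTER = 14
GLOBAL_SPARES = 4
DRIVES_PER_RAID_SET = 4
DRIVE_CAPACITY_GB = 500

drives_per_shelf = CANISTERS_PER_SHELF * DRIVES_PER_CANISTER
raid_drives = drives_per_shelf - GLOBAL_SPARES
raid_sets, leftover = divmod(raid_drives, DRIVES_PER_RAID_SET)

# RAID 5 spends one drive's worth of capacity per set on parity.
usable_gb_per_set = (DRIVES_PER_RAID_SET - 1) * DRIVE_CAPACITY_GB
raw_tb_per_shelf = drives_per_shelf * DRIVE_CAPACITY_GB / 1000
usable_tb_per_shelf = raid_sets * usable_gb_per_set / 1000

print(f"drives per shelf: {drives_per_shelf}")
print(f"RAID 5 sets: {raid_sets} (drives left over: {leftover})")
print(f"raw TB per shelf: {raw_tb_per_shelf}")
print(f"usable TB per shelf (derived): {usable_tb_per_shelf}")
```

With four drives per set and eight canisters per shelf, each set can place every member on a different canister, which is why a canister failure costs any given set at most one drive.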

To further reduce heat and energy loads at storage level, the racks run on 48-VDC power provided by redundant power supply modules which are separate from the shelves. Each storage shelf has an independent controller that manages the activities of the RAID sets, records the location of stored objects within the shelf and provides the Fibre Channel interface to the rack controller module. A rack can contain up to eight storage shelves and each rack requires a rack controller, which is a dedicated server that supports the software front end for the shelves and provides FC and Ethernet front-end connectivity.
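The rack-level figures quoted earlier can be cross-checked the same way. This is an illustrative calculation only: the shelf count, drive count and wattages are the article's numbers, and the watts-per-terabyte values are derived here from raw capacity rather than taken from COPAN.

```python
# Rack-level cross-check using the article's figures. Watts per TB is a
# derived illustration based on raw (not usable) capacity.
SHELVES_PER_RACK = 8
DRIVES_PER_SHELF = 112
DRIVE_CAPACITY_TB = 0.5
PEAK_WATTS = 6034
STANDBY_WATTS = 3253

drives_per_rack = SHELVES_PER_RACK * DRIVES_PER_SHELF
raw_tb = drives_per_rack * DRIVE_CAPACITY_TB

peak_w_per_tb = PEAK_WATTS / raw_tb
standby_w_per_tb = STANDBY_WATTS / raw_tb

print(f"drives per rack: {drives_per_rack}")
print(f"raw capacity: {raw_tb} TB")
print(f"peak: {peak_w_per_tb:.2f} W/TB, standby: {standby_w_per_tb:.2f} W/TB")
```

Eight shelves of 112 drives match the 896 drives and 448 TB cited for a full rack.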

Perhaps one of the most important features of COPAN's system is its patented Disk Aerobics software technology. Lest you conjure up pictures of hard drives wearing sweat suits, this technology monitors disk health and metrics, regularly spins up idle drives and uses heuristic-based diagnostics to automatically migrate data to global spares when needed. COPAN has been keeping statistics on over 10,000 SATA drives since 2004, and it claims that the combination of reduced run time and drive monitoring has the potential to increase drive service life nearly fivefold. More important, it has zero reported incidents of lost or unavailable data since shipping began in the spring of 2004.

The COPAN Revolution 220 is targeted at two high-density markets: searchable archives and VTL (virtual tape library). The storage remains the same for either application--the only difference between models is in the rack controller, or personality module, providing the software front end. Perhaps the only downside is that the storage shelves are not designed as generic storage platforms and can only interface with COPAN's personality modules.

Nexsan Technologies

For several years COPAN was the only company offering MAID technology, but late in 2005 Nexsan began adding AutoMAID technology to its multipurpose line of SATA FC arrays. Rather than taking a purpose-built approach to array design, Nexsan uses AutoMAID to add power-management capabilities to its storage controllers, which allow multiple levels of power-saving modes to be assigned to individual arrays within the disk subsystem. This approach has a lot in common with the power-saving methodology used in modern laptops, and is just about as easy to manage.

Nexsan's 4U, 42-drive SATABeast enclosure can be divided up into any combination of RAID 0, 1, 1+0, 3, 5 or 6 arrays, for example, and each can use three different levels of power-saving modes. At Level 1, the drive heads are unloaded; at Level 2, the drive's speed is reduced to 4,000 RPM; and at Level 3 the drive is spun completely down but remains in a low-power sleep mode. These timer-based modes are user-designated, and can be stacked. For example, an array can be set to drop to Level 1 after 10 minutes without access, progress to Level 2 after 30 minutes and move to Level 3 after 2 hours. Spin-up times vary by power mode: recovery from Level 1 takes less than one second, Level 2 takes 15 seconds and Level 3 takes 30 seconds. High-performance arrays can be set to ignore power management altogether.
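The stacked timers described above amount to a simple threshold lookup. The sketch below is a minimal illustration, not Nexsan's controller logic: the function names, the `managed` flag and the use of Level 0 to mean "fully active" are assumptions made for the example, while the thresholds and recovery times are the article's example values.

```python
from bisect import bisect_right

# Example idle thresholds from the article, in minutes, for Levels 1, 2 and 3.
IDLE_THRESHOLDS_MIN = [10, 30, 120]

# Worst-case recovery time in seconds per level. Level 0 (hypothetical name
# for "fully active") needs no recovery; Level 1 is "less than one second".
WAKE_SECONDS = {0: 0, 1: 1, 2: 15, 3: 30}


def power_level(idle_minutes: float, managed: bool = True) -> int:
    """Return the power-saving level for an array idle for `idle_minutes`.

    `managed=False` models a high-performance array that is set to ignore
    power management and therefore always stays at Level 0.
    """
    if not managed:
        return 0
    # Count how many thresholds the idle time has reached or passed.
    return bisect_right(IDLE_THRESHOLDS_MIN, idle_minutes)


def worst_case_wake_seconds(idle_minutes: float, managed: bool = True) -> int:
    """Upper bound on the delay before the first I/O after an idle period."""
    return WAKE_SECONDS[power_level(idle_minutes, managed)]


for minutes in (5, 10, 45, 120):
    level = power_level(minutes)
    print(f"idle {minutes:>3} min -> Level {level}, "
          f"wake <= {worst_case_wake_seconds(minutes)} s")
```

The trade-off is visible in the two tables: the longer an array is allowed to sit idle, the more power it saves and the longer the first request after the idle period has to wait.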

As a multipurpose storage system, the SATA arrays from Nexsan have a great deal of flexibility when it comes to supporting multiple applications within the same storage subsystem. The trade-off comes at the density level, where Nexsan has no choice but to engineer its highest-density 4U to handle the full heat load of all 42 drives running at 100 percent. By comparison, COPAN's strictly power-budgeted system will only allow a maximum of 28 of the 112 drives in its 4U to be operational at any given time, offering substantially lower cooling requirements and nearly three times greater drive density.
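The density comparison works out as follows. This is a back-of-the-envelope sketch using only the drive counts given in the article; treating the number of spinning drives as a stand-in for heat load assumes comparable per-drive heat output, which the article does not state.

```python
# Drive counts per 4U enclosure, as given in the article.
COPAN_DRIVES_PER_4U = 112
COPAN_MAX_ACTIVE_FRACTION = 0.25
NEXSAN_DRIVES_PER_4U = 42  # all drives may run at once

copan_max_active = int(COPAN_DRIVES_PER_4U * COPAN_MAX_ACTIVE_FRACTION)
density_ratio = COPAN_DRIVES_PER_4U / NEXSAN_DRIVES_PER_4U

# Assumption: heat load scales with the number of spinning drives.
active_ratio = copan_max_active / NEXSAN_DRIVES_PER_4U

print(f"COPAN drives active at most: {copan_max_active}")
print(f"drive density ratio (COPAN / Nexsan): {density_ratio:.2f}")
print(f"spinning-drive ratio (COPAN / Nexsan): {active_ratio:.2f}")
```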

Nexsan's 42-drive SATABeast uses a custom storage controller that's connected to a custom, 42-port SATA controller. This allows for some creativity and artistic license when building multiple arrays. The system is powered by dual, 120-VAC power supplies and offers concurrent iSCSI and FC-SAN connectivity. There is also room for an additional storage controller to increase throughput for primary storage applications: Nexsan's approach would let you use its power-saving capabilities on even high-performance applications during off hours.

Two Voices in the Wilderness

Only COPAN and Nexsan have made the move to MAID technology, but with energy efficiency becoming a serious consideration in many data centers, it's possible they will benefit from their vision eventually. COPAN makes no secret that its purpose-built systems are targeted specifically at lower-bandwidth, non-transactional applications like archives and, as a result, its energy-efficient MAID solution offers some of the highest-density storage available today--more than 44 TB per square foot of floor space. By the same token, Nexsan markets the higher performance and flexibility of its storage solution, and offers user-specified MAID energy-saving capabilities as a bonus to all types of storage applications.

The sweet spot of MAID technology is in the active archive and VTL space, but like any hardware storage solution, MAID is still dependent on front-end software to provide metadata support, access management, search capabilities, audit trails and other information lifecycle management services. Both COPAN's Millennia Archive and Nexsan's Assureon offer the software features required for long-term archives, but that's a different article altogether.

As an archival medium, the MAID storage platform is designed to mitigate some of the heat, power and reliability problems associated with long-term disk storage. At this point, it's the only solution focused specifically on addressing these concerns for disk-based arrays.

The singing globe on the Saturday morning cartoons was right: Electricity is key. Power and cooling requirements are becoming a major concern in data centers everywhere, and a recent study by Gartner showed that the cost of energy to run a server could exceed the cost of the hardware in as few as four years. MAID technology offers the most energy-efficient disk storage platform today, and with the growing need for searchable archives it is certainly a technology to consider for storage that needs to be near-line accessible for seven to 10 years. However, it's still only part of a solution to the larger problem of long-term digital storage media.

Disk interfaces, drive capacities and storage controllers change, and hard disk storage remains similar to tape or optical in that accessing the media ultimately depends upon an outside translation system, whether it is a tape drive or an array controller. Regardless of the long-term archive media you choose, you'll need to refresh and migrate your data to new storage media options in the future. And that means, when it comes to disk-based archive solutions, the cost of security will remain eternal vigilance.

Steven Hill, an NWC technology editor, can be reached at [email protected].

Lies, Damn Lies, And MTBF

For drives that remain powered, reliability was originally measured as MTBF (mean time between failures), which is based on engineering estimates. Modern enterprise-class drives promised up to 1.2 million hours MTBF--a source of amusement to those actually working in the storage business, who knew better. More popular today is the AFR (annualized failure rate) specification, which is based on similar voodoo but expressed in a more useful annual percentage.

The newest generation of inexpensive, high-capacity 7,200-RPM SATA drives--the type commonly used for archive purposes--lists an AFR of 0.73 percent, for example, promising that less than 1 in 100 should fail per year. But in a 2004 study of 4,000 SATA drives done at the University of California at San Diego, researchers found that the annual failure rate was actually about 2.1 percent. These failures were caused by a number of common factors, such as head crash, IC component failure, head/actuator malfunction and, our personal favorite, head stiction. Of course, 2 percent doesn't seem like much unless those failures occur in your RAID 5 array at the exact same time.
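To put the MTBF and AFR figures side by side, the sketch below converts between the two and estimates yearly failures for a population the size of the one in the study. It is an illustration under a stated assumption, not a vendor formula: it treats failures as occurring at a constant rate (an exponential model) on drives powered around the clock, and the function names are made up for the example.

```python
import math

HOURS_PER_YEAR = 8760  # assumes drives powered 24 hours a day, 365 days a year


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr: float) -> float:
    """MTBF in hours implied by an AFR, under the same constant-rate assumption."""
    return -HOURS_PER_YEAR / math.log(1 - afr)


def expected_failures(drive_count: int, afr: float) -> float:
    """Average number of drives expected to fail in one year."""
    return drive_count * afr


spec_afr = afr_from_mtbf(1_200_000)
print(f"AFR implied by 1.2 million hours MTBF: {spec_afr:.2%}")
print(f"MTBF implied by a 2.1% observed AFR: {mtbf_from_afr(0.021):,.0f} hours")
print(f"Expected failures per year in 4,000 drives at 0.73%: "
      f"{expected_failures(4000, 0.0073):.1f}")
print(f"Expected failures per year in 4,000 drives at 2.1%: "
      f"{expected_failures(4000, 0.021):.1f}")
```

Under this model a 1.2-million-hour MTBF and a 0.73 percent AFR are the same claim in different units, so the observed 2.1 percent rate contradicts both.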

In all fairness to drive manufacturers, there's no way to take into account the actual production loads and operational environments their equipment is subjected to in the field. As we said, drive technology is the best it's ever been but our position still remains caveat emptor. High-reliability drives and parity-protected RAID may seem safe enough, but these are nothing more than short-term insurance policies that offer a small window of opportunity to save yourself should something go horribly wrong. Which, of course, it will. If your business depends on your data, there's still no replacement for regular data backups stored at a secure, second location.
