The last stop on the geek tour at Tech Field Day Seattle was NEC's Seattle office, where we got a live demo of their HydraStor deduplicating backup system. For a product no one has ever heard of, HydraStor is pretty darn impressive. You can buy a small HydraStor that can ingest 500MB/s to 12TB of disk and grow it to suck in over 25GB/s of backup data, dedupe it inline, and store it on up to 1.3PB of raw disk. That's much bigger than a DD880.
HydraStor is based on the redundant array of independent nodes (RAIN) architecture. A HydraStor grid includes accelerator nodes, to which backup and archive applications connect via common Internet file system (CIFS) and/or network file system (NFS), and storage nodes that hold the data. Each accelerator node can ingest data at 500MB/s, each storage node has 12 1TB or 2TB SATA drives, and the system scales pretty linearly from two storage nodes and an accelerator to 55 accelerators and 110 storage nodes. The accelerator nodes divide the incoming data into variable-size blocks, using a method NEC claims is not covered by Quantum's Rocksoft patent, calculate a hash for each block, and check the hash index to determine whether that data has already been stored in the grid. New blocks are forwarded to a storage node.
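For the curious, here's roughly what that chunk, hash and look-up flow looks like in Python. To be clear, this is my own toy sketch, not NEC's method: the article doesn't say how HydraStor picks block boundaries or which hash it uses, so the sliding-window byte sum, the SHA-256 digests, the size constants and the `DedupStore` class are all assumptions made up for illustration.

```python
import hashlib
import random

WINDOW = 16        # trailing bytes that decide a boundary (assumed value)
MASK = 0xFF        # cut when the low 8 bits of the window sum are zero
MIN_CHUNK = 32     # avoid tiny chunks (assumed value)
MAX_CHUNK = 1024   # force a cut eventually (assumed value)


def split_chunks(data: bytes) -> list:
    """Cut data into variable-size chunks at content-defined boundaries."""
    chunks = []
    start = 0
    window_sum = 0
    for i, byte in enumerate(data):
        # Keep the sum of the last WINDOW bytes as a toy rolling hash.
        window_sum += byte
        if i >= WINDOW:
            window_sum -= data[i - WINDOW]
        length = i - start + 1
        content_cut = (window_sum & MASK) == 0 and length >= MIN_CHUNK
        if content_cut or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Hypothetical stand-in for the hash index plus the storage nodes."""

    def __init__(self):
        self.index = {}          # digest -> chunk bytes
        self.bytes_ingested = 0

    def ingest(self, data: bytes) -> list:
        """Store only unseen chunks; return the digests needed to rebuild data."""
        recipe = []
        for piece in split_chunks(data):
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.index:   # new block: "forward" it to storage
                self.index[digest] = piece
            recipe.append(digest)
        self.bytes_ingested += len(data)
        return recipe

    def restore(self, recipe: list) -> bytes:
        return b"".join(self.index[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(piece) for piece in self.index.values())


if __name__ == "__main__":
    rng = random.Random(0)
    monday = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
    tuesday = b"a few new bytes" + monday   # same data, shifted by an insert
    store = DedupStore()
    store.ingest(monday)
    recipe = store.ingest(tuesday)
    assert store.restore(recipe) == tuesday
    print(f"ingested {store.bytes_ingested} bytes, stored {store.stored_bytes()} bytes")
```

The reason to cut on content rather than at fixed offsets is the Tuesday case: a few bytes inserted at the front shift everything after them, but the boundaries land on the same content again shortly after the insert, so most chunks still match what's already in the index.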
The storage nodes don't use conventional RAID, but instead Reed-Solomon style erasure codes. Each data block is stored as 12 chunks, each of which contains combined data and forward error correction information. HydraStor lets users specify the level of protection they want, from being able to recover data if one chunk is lost up to recovering data when as many as six chunks are missing. The default level of three provides several times the protection of RAID-6 with 25 percent overhead. The chunks are stored across 12 storage nodes if the grid is large enough, and even in the smallest HydraStor mini, which combines a storage node and an accelerator node in a single server, each chunk is stored on a separate disk drive. Should a few disks or storage nodes go offline, the grid will reconstruct the data across the remaining nodes without the need for designated spare drives or nodes. Users can add storage nodes or accelerator nodes as they need additional space or ingest speed. We saw a live demo of a 60-node grid with 20 accelerator nodes (ANs) and 40 storage nodes (SNs) ingest data at a steady 10GB/s.
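The numbers above hang together if you do the arithmetic, so here's a back-of-the-envelope sketch. The per-node figures come straight from this post; the function names are mine, and I'm assuming "overhead" means the share of raw space spent on redundancy (protection level divided by 12 chunks) and that the 1.3PB maximum is quoted with 1TB drives. It ignores metadata, spare space and dedupe ratios entirely.

```python
from fractions import Fraction

FRAGMENTS = 12        # chunks per data block, per the article
AN_RATE_MB_S = 500    # ingest rate per accelerator node, per the article
DRIVES_PER_SN = 12    # drives per storage node, per the article


def parity_overhead(resiliency: int) -> Fraction:
    """Share of raw space spent on redundancy (assumed: resiliency / 12)."""
    if not 1 <= resiliency <= 6:
        raise ValueError("the article describes protection levels 1 through 6")
    return Fraction(resiliency, FRAGMENTS)


def raw_capacity_tb(storage_nodes: int, drive_tb: int = 1) -> int:
    """Raw disk in TB for a grid with the given number of storage nodes."""
    return storage_nodes * DRIVES_PER_SN * drive_tb


def usable_capacity_tb(storage_nodes: int, resiliency: int = 3,
                       drive_tb: int = 1) -> Fraction:
    """Raw capacity minus the redundancy share, before any dedupe savings."""
    raw = raw_capacity_tb(storage_nodes, drive_tb)
    return raw * (1 - parity_overhead(resiliency))


def ingest_gb_per_s(accelerator_nodes: int) -> float:
    """Aggregate ingest rate, assuming perfectly linear scaling."""
    return accelerator_nodes * AN_RATE_MB_S / 1000


if __name__ == "__main__":
    print("default overhead:", parity_overhead(3))          # 3 of 12 chunks
    print("demo grid ingest GB/s:", ingest_gb_per_s(20))    # 20 ANs
    print("max grid ingest GB/s:", ingest_gb_per_s(55))     # 55 ANs
    print("max grid raw TB:", raw_capacity_tb(110))         # 110 SNs, 1TB drives
    print("max grid usable TB:", usable_capacity_tb(110))   # at the default level
```

Twenty accelerators at 500MB/s each works out to the 10GB/s we saw in the demo, 55 of them gets you past the 25GB/s mark, and 110 storage nodes of twelve 1TB drives is the 1.3PB of raw disk.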
Now don't let me lead you to believe I think HydraStor is perfect. Since each backup stream has to go to a specific AN, single-stream backups are limited to the 500MB/s of a single AN. And while NEC supports Symantec's OST for replication management, they're just using NFS for the data path. HydraStor is also lacking in the power management arena, as it doesn't support drive spin-down or, even better, node shutdowns.
Once we got back from the wind tunnel that is the NEC Seattle data center, the server guys in the room, led by Blades Made Simple's Kevin Houston, wanted NEC to put the HydraStor software on blades with 10Gbps interconnects to clean up the cables. Then the virtualization guys, including Jason Boche, wanted to run the software on their servers rather than NEC's. Of course, the kind of steely-eyed storage guys who work in the organizations where backup speeds in GB/s are needed don't want to buy blades, servers and software to integrate. That would mean they would need to work with the server wannabes and the network prima donnas to integrate everything. They want to issue one PO and have one throat to choke. Remember, these are the guys that buy Brocade FC switches from EMC because the Brocade sales guy hasn't bought lunch this month.