IBM unveiled a new storage architecture at the Supercomputing 2010 conference and walked away with the prize in the event's Storage Challenge, which rewards the most innovative storage solution entered in the competition. GPFS-SNC (General Parallel File System-Shared Nothing Cluster) is designed to reliably store petabytes to exabytes of data while processing highly parallel applications like Hadoop analytics twice as fast as competing solutions.
"Businesses often talk about the importance of collecting, organizing and analyzing information to make smarter business decisions but those that are unwilling to adapt their storage technology strategies are literally running into walls, unable to keep up with the vast amounts of data generated on a daily basis," said Prasenjit Sarkar, Master Inventor, Storage Analytics and Resiliency, IBM Research - Almaden. "We constantly research and develop the industry's most advanced storage technologies to solve the world's biggest data problems. This new way of storage partitioning is another step forward on this path as it gives businesses faster time-to-insight without concern for traditional storage limitations."
GPFS-SNC is a distributed cluster architecture. Unlike many other clustered file systems, including Hadoop's own HDFS, it distributes all functions across all its nodes, eliminating metadata servers, which are single points of failure and performance bottlenecks. Data is also distributed across the storage in the cluster, which allows the system to use commodity servers but still deliver enterprise reliability by tolerating not only disk but also node failures.
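As a rough illustration of the shared-nothing idea described above, the following Python sketch uses rendezvous-style hashing so that any node can work out where a block's replicas live without asking a central metadata server. This is not IBM's placement algorithm, which the announcement does not detail; the function names, node names and the replica count of three are all hypothetical.

```python
import hashlib


def place_replicas(block_id: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct nodes for a block (hypothetical scheme).

    Every node is scored by hashing the (block, node) pair, and the
    lowest-scoring nodes hold the copies. Because the result depends only
    on the block id and the node list, any node can compute it locally.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")

    def score(node: str) -> str:
        return hashlib.sha256(f"{block_id}:{node}".encode()).hexdigest()

    return sorted(nodes, key=score)[:replicas]


def is_readable(block_id: str, nodes: list[str], failed: set[str], replicas: int = 3) -> bool:
    """A block stays readable while at least one replica node is still up."""
    return any(n not in failed for n in place_replicas(block_id, nodes, replicas))


if __name__ == "__main__":
    cluster = [f"node{i}" for i in range(5)]  # assumed five-node commodity cluster
    holders = place_replicas("block-42", cluster)
    print("replicas on:", holders)
    print("survives loss of", holders[0], "->", is_readable("block-42", cluster, {holders[0]}))
```

With three copies on three different servers, losing any one server, or any disk inside it, still leaves two copies reachable, which is the sense in which commodity hardware can tolerate node failures.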
IBM designed GPFS-SNC to serve not just Hadoop-style MapReduce but also data warehouse and other OLAP applications, as well as cloud computing applications with large file stores. Unlike many research or OLAP-based systems, GPFS-SNC supports enterprise computing features including snapshots, replication, and full POSIX security and access control lists, allowing it to be used for more conventional applications, from engineering file storage to virtual server hosting. It also provides workload isolation features that allow a single cluster to serve several disparate workloads without the workloads affecting each other.
GPFS-SNC is strictly a research project, and IBM isn't saying anything about when, or if, a shared-nothing cluster will be orderable. GPFS itself, however, already serves as the core technology behind IBM's SONAS scale-out NAS solution, which relies on shared SAN storage behind its nodes, and several IBM HPC (High Performance Computing) oriented products, including IBM's Information Archive and Smart Business Compute Cloud. The GPFS-SNC project is also likely to be used in the VISION cloud project IBM announced earlier this month.