Object Storage: The Next Storage Paradigm
Object storage is evolving from a data archive to the primary form of storage in large systems.
November 25, 2013
I remember my first object store in 2007. Built on a COTS x86 server with 6 TB of storage, it was powered by Caringo software. I needed a cluster of four units to make a decent starter kit, and it promised metadata searching and replication. The setup worked fine.
Work I'd done previously on Replicus-derived solutions set my expectations. The object store wasn't very fast and seemed suited to archiving data. A year later, a bunch of my systems were storing the Human Genome Project at Johns Hopkins, and it was clear that this was no longer just an interesting technology.
Since then, we've come a long way. Many more vendors are in the game, and open-source programs like Ceph and OpenStack's Swift promise reduced prices. Amazon uses the S3 object store to underpin a good portion of its AWS cloud service. Other cloud providers now offer similar solutions.
None of this is occurring in a vacuum. There are many other changes in the storage market. For instance, solid-state disks (SSDs) are finally beginning to overtake mechanical disk drives, especially in the performance market, but the impact on bulk-oriented object storage has been slower than in other areas of storage.
Experience with large installations has made it clear that object storage scales out much further than block I/O or NAS systems. The traditional SAN becomes unwieldy at scale, and NAS is hard to manage.
This has resulted in renewed interest in object as the primary storage form in large systems. Performance has been addressed, and storage tiering is allowing SSD storage to provide a much faster path to active data, just as in hard disk drive (HDD) arrays. With lower drive prices, deduplication of objects, and the advent of erasure code systems -- which use about half the space of replication-based stores -- SSD is economical for a high-speed tier and much faster.
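The "about half the space" figure for erasure coding can be checked with simple arithmetic. The sketch below is illustrative only: the three-copy replication policy and the 10 data + 5 parity fragment layout are assumed example values, not figures from any particular product, and the function names are hypothetical.

```python
def replication_overhead(copies):
    """Raw bytes stored per usable byte when keeping `copies` full replicas."""
    return float(copies)


def erasure_overhead(data_fragments, parity_fragments):
    """Raw bytes stored per usable byte for a k data + m parity erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_tb_needed(usable_tb, overhead):
    """Raw capacity required to hold `usable_tb` of user data."""
    return usable_tb * overhead


# Assumed example policies: 3 replicas versus a 10+5 erasure code.
rep = replication_overhead(3)
ec = erasure_overhead(10, 5)

print("replication raw TB for 100 TB usable:", raw_tb_needed(100, rep))
print("erasure code raw TB for 100 TB usable:", raw_tb_needed(100, ec))
print("erasure code uses this fraction of replication's space:", ec / rep)
```

With these assumed parameters, the erasure-coded layout needs half the raw capacity of three-way replication while still tolerating the loss of several fragments. Other fragment layouts shift the ratio up or down.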
Cheap HDD bulk storage makes it possible to have more archive-class capacity per node for the more than 80% of data that is inactive. This also improves economics, making the latest stores attractive on both a cost-per-IOPS and cost-per-TB basis.
Ceph has already begun the next step. The RADOS object store that underpins the system also has block I/O (iSCSI) and NAS gateways and can support S3 and Swift APIs. This unified approach erodes the need for separate storage silos for each protocol. A management and tools ecosystem is growing around Ceph, whose client support is now part of the mainline Linux kernel.
EMC's ViPR system addresses a different point in unification. It brings the legacy gear typical of a datacenter into a single unified pool of storage. This is more a management approach than a data flow fix, since the legacy machines maintain their protocols, but it seems reasonable that, at some point, there will be gateways to bring them together, so that the pool can present as any of the standard options.
Ceph will likely evolve in ViPR's direction to provide the same capability. Theoretically, a Ceph server could expand to connect iSCSI or Fibre Channel storage devices, though today it can't manage them on a single screen.
Another aspect of object is the way that host systems connect to storage. A typical OS storage stack shows layers of inefficiency. As a result, work is happening on new interfaces and modes of access. Seagate just announced an object-mode interface for its new shingled bulk drives. This is not an object store in its own right, but it is a drastic simplification of the stack. The NVMe interface, aimed primarily at SSD, is an attempt to slim the stack further. These two interface changes should help object store performance.
In the area of unified object stores, the storage industry is working on a key/data access mechanism for database work. This is already in its early stages and should tie in to content and metadata searching as time goes on. If that happens, I'd expect to see graphics processing units added to object storage nodes.
To speed up operation and reduce network traffic, some storage features need to migrate to the server. Compression and deduplication hashing must be done before shipping data over the network, for instance. Encryption of data at rest seems a logical addition on the store side, but encryption in transit needs to be resolved, as with all storage.
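As a rough illustration of doing deduplication hashing and compression on the host before anything crosses the network, here is a minimal sketch. It assumes fixed-size chunking, SHA-256 fingerprints, and zlib compression; the function name `prepare_upload`, the `known_hashes` set standing in for the store's index, and the chunk size are all hypothetical and not taken from any specific object store.

```python
import hashlib
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB chunks; real systems vary


def prepare_upload(data, known_hashes, chunk_size=CHUNK_SIZE):
    """Hash and compress chunks on the host before sending them.

    Returns (manifest, payloads):
      manifest - ordered list of chunk fingerprints describing the object
      payloads - fingerprint -> compressed bytes, only for chunks that the
                 store (represented by known_hashes) does not already hold
    """
    manifest = []
    payloads = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        # Skip chunks the store already has, and repeats within this object.
        if digest not in known_hashes and digest not in payloads:
            payloads[digest] = zlib.compress(chunk)
    return manifest, payloads


# Tiny demonstration with 8-byte chunks: the first and third chunks match.
sample = b"A" * 8 + b"B" * 8 + b"A" * 8
manifest, payloads = prepare_upload(sample, known_hashes=set(), chunk_size=8)
print(len(manifest), "chunks in the object,", len(payloads), "sent over the network")
```

Only the unique, compressed chunks plus the small manifest would need to travel to the store. A production design would also have to settle how the host learns which fingerprints the store already holds, which this sketch glosses over with a local set.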
Looking further out, I predict that the Ceph/ViPR unified object model will be the norm for new storage products by 2020, and Ethernet-based access will be used in new installations. SAN and NAS will be absorbed into the unified storage pool, and storage management will be simpler.