As a concept, an object-based storage system is about as close to a pure data storage device as you could imagine. Instead of being constrained by artificial constructs like LUNs and file systems, an object store lets the user save and retrieve unformatted binary data, or data in essentially any format. So some 20 years after the idea of object storage emerged, why are we only now starting to see wide-scale adoption?
Object storage's history is steeped in academia, including work done at Carnegie Mellon University and the University of California at Berkeley. Many of these early ideas were based around large-scale file systems, including, of course, the Google File System, which underpins the most ubiquitous search platform. Probably the best-known early commercial object storage implementation in the enterprise was EMC’s Centera platform, a product developed from the April 2001 acquisition of FilePool NV, a Belgium-based startup.
Centera used a concept known as content addressable storage, meaning that data is stored and retrieved from the system based on a cryptographic hash value derived from the content itself. Modern object stores also rely on globally unique identifiers (GUIDs) but no longer generate them from the content. This method of addressing by ID highlights one initial barrier to the widespread adoption of object store technology: the need to access data through an API rather than existing protocols like SCSI or NFS. I’ll discuss this more later.
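The difference between content-derived and randomly generated identifiers can be sketched in a few lines of Python. This is a toy illustration only, assuming an in-memory dictionary as the store; the class names `ContentAddressedStore` and `GuidStore` are hypothetical, and SHA-256 is chosen purely for the example rather than reflecting what Centera or any other product actually uses.

```python
import hashlib
import uuid


class ContentAddressedStore:
    """Toy store: the object ID is a hash of the content itself."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumption made for this sketch.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


class GuidStore:
    """Toy store: the object ID is a random GUID, unrelated to the content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        object_id = str(uuid.uuid4())
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


if __name__ == "__main__":
    cas = ContentAddressedStore()
    guid = GuidStore()
    payload = b"unformatted binary data"

    # Identical content always maps to the same content-derived ID...
    print(cas.put(payload) == cas.put(payload))
    # ...whereas each GUID-based write gets a fresh identifier.
    print(guid.put(payload) == guid.put(payload))
```

In both cases the caller has to hold on to the returned ID to get the data back, which is why access goes through an API rather than a path or block address.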
The value of object stores
Before going further, it’s worth looking at why object stores are relevant in the first place. I touched on one of the benefits of object storage earlier: high scalability. Today, even with multi-terabyte hard drives, it’s rare to see traditional storage systems built to the petabyte scale. Both block- and file-based systems have inherent issues around resiliency and the practicalities of operation once capacities start reaching a few hundred terabytes. Data protection schemes like RAID don’t work well with large-capacity drives and wide striping; managing multi-gigabyte or even terabyte-sized files within a NAS appliance is equally challenging.
Object stores can hold vast quantities of data, reaching into multiple petabytes. Performance and resiliency don’t diminish with capacity; on the contrary, in most instances they improve as systems scale.
With these benefits, why are we just now in 2016 seeing a rise in the adoption of object platforms? There are a number of factors that have come together to make object storage an attractive proposition, including:
- Data retention demands – Organizations and individuals are storing more data than ever. At the consumer level, we create new data every day in the form of images, videos, text and log data -- everything from where our iPhone is located to paying tolls on the roads. At the corporate level, businesses are retaining data for potential future value as part of backup and archive. There are also large active archives generated by media organizations, energy and mining companies, plus an enormous amount of medical imaging information. Data is collected from sensors and other Internet of Things devices.
- Software-defined storage – Although the term has become something of a cliché, there has been an explosion in companies developing storage solutions based on commodity hardware running standard operating systems like Linux. The barriers to entry for developing new storage products are lower than ever before and there are open-source solutions looking to compete with commercial offerings. Pretty much all of today's object storage products are software-based, except for bespoke high performance computing (HPC) solutions.
- The Amazon effect – Amazon Web Services introduced Simple Storage Service (S3) almost 10 years ago. Since then, the S3 API has become a de facto standard for object storage access and is supported by all of the major object storage vendors. The S3 API introduced simplicity -- it can be accessed over HTTP using simple REST-based commands -- and interoperability. Code written to access S3 can easily be adapted to work with other solutions, making data much more portable than in the days of Centera.
- Widespread protocol support – Object storage vendors have accepted that REST-based APIs aren’t for everyone. As a result, most support block- and file-based protocols, as well as data-specific access methods like Hadoop Distributed File System (HDFS). The use of traditional protocols, plus the ability to access the same data with each protocol type, means that object stores can offer more flexibility than ever before. This makes them more practical as active data stores rather than simply as archives.
- Cost – Storing large volumes of data has never been more cost effective and the price of storage continues to drop. This is reflected in S3's price cuts. When it became generally available in March 2006, pricing for S3 was $0.15/GB/month. This has steadily declined to $0.03/GB/month over 10 years, with options for infrequently accessed data as low as 0.7¢/GB/month using AWS Glacier. The reduction in hard drive cost (with increasing capacity) means on-premises storage can be even more cost effective than AWS at the right scale. Storage at this price is simply unprecedented.
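To put the per-gigabyte prices quoted in the cost item into perspective, here is a rough sketch of what one petabyte would cost per month at each rate. It assumes 1 PB = 1,000,000 GB, takes the Glacier figure as 0.7 cents ($0.007) per GB, and applies flat pricing with no tiering, request or transfer charges, so treat the output as an order-of-magnitude guide rather than a bill. The function name `monthly_cost` is made up for this example.

```python
from decimal import Decimal

GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 1,000,000 GB

# Prices per GB per month as quoted in the article.
PRICES = {
    "S3 at launch (March 2006)": Decimal("0.15"),
    "S3 ten years on": Decimal("0.03"),
    "Glacier": Decimal("0.007"),  # 0.7 cents
}


def monthly_cost(gigabytes: int, price_per_gb: Decimal) -> Decimal:
    """Flat-rate monthly cost; ignores tiering, request and transfer fees."""
    return gigabytes * price_per_gb


if __name__ == "__main__":
    for label, price in PRICES.items():
        cost = monthly_cost(GB_PER_PB, price)
        print(f"{label}: ${cost:,.0f} per PB per month")
```

Under these assumptions, the drop from $0.15 to $0.03 per GB is a fivefold reduction in the monthly cost of the same petabyte.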
Object storage is already embedded into many applications without being overtly exposed to the end user. Services like Dropbox provide file-sharing and use S3 to store their data. Both Exablox and Coho Data use object storage as the basis for their storage appliances, but expose file and block protocols respectively. At the drive level, Seagate produced Kinetic, a hard-disk drive that stores data in object form rather than referencing it by logical block address.
In many ways, object storage is becoming increasingly important in IT. Cleversafe, a market leader, has been acquired by IBM, and Scality recently received a significant investment from Hewlett Packard Enterprise. Both of these moves highlight how object stores are becoming pervasive for enterprise customers.