It’s a truism that the amount of data created every year continues to grow at exponential rates. Almost every business now depends on technology, and the information those businesses generate has arguably become their greatest asset. Unstructured data, the kind best kept in object stores, has seen the biggest growth. So, where are we with object storage technology and what can we expect in the future?
Object storage systems
Object storage evolved out of the need to store large volumes of unstructured data for long periods of time at high levels of resiliency. Look back 20 years and we had block storage (the traditional kind) and NAS appliances (typically used as file servers). NAS, the most practical platform for unstructured data at the time, didn’t really scale to the petabyte level and certainly didn’t offer the levels of resiliency expected for long-term data retention. Generally, businesses used tape for this kind of requirement, but of course tape is slow and inefficient.
Object storage developed to fill the gap by offering online access to content, and over the years it has grown into a mature technology. With newer protection methods like erasure coding, the problem of protecting data in a large-scale archive is generally solved.
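To give a feel for what erasure coding does, here is a deliberately minimal sketch using a single XOR parity fragment. It is an illustration only: production object stores generally use stronger codes (Reed-Solomon, for example) that survive several lost fragments at once, and the function names below are hypothetical rather than taken from any product.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def encode(data, k):
    """Split data into k equal fragments and append one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad to a multiple of k
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [xor_blocks(fragments)]


def recover(fragments):
    """Rebuild the single fragment marked None from the surviving ones."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing] = xor_blocks(survivors)
    return repaired


original = b"object storage!"
pieces = encode(original, k=3)  # 3 data fragments + 1 parity fragment
pieces[1] = None                # simulate losing one drive or node
restored = recover(pieces)
rebuilt = b"".join(restored[:3])[:len(original)]
print(rebuilt == original)
```

The point of the sketch is the trade-off: four fragments are stored for three fragments' worth of data, yet any one of them can be lost, which is cheaper than keeping full extra copies.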
Object stores use web-based protocols to store and retrieve data. Essentially, most offer four primitives, based on the CRUD acronym: Create, Read, Update, Delete. In many instances, Update is simply a Delete and Create pair of operations. This means interacting with an object store is relatively simple: issue a REST-based API call over HTTP that embeds the data and associated metadata.
This simplicity of operation highlights an issue for object storage: applications need to be rewritten to use an object storage API. Thankfully, vendors do offer SDKs to help in this process, but application changes are still required. This problem points to the first evolution we’re seeing with object storage: multi-protocol access.
It’s fair to say that object stores have had multi-protocol access for some time, in the form of gateways or additional software that uses the object store back-end as a large pool of capacity. The question with these kinds of implementations is whether they truly offer concurrent access to the same data from different protocol stacks. It’s fine to be storing and retrieving objects with NFS, but how about storing with NFS and accessing with a web-based protocol?
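As a rough sketch of those four primitives, the toy in-memory class below stands in for an object store. The class and method names are hypothetical and do not represent any vendor's API; the comments note the HTTP verbs that S3-style interfaces commonly map each operation to.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}

    def create(self, key, data, metadata=None):
        # Comparable to an HTTP PUT carrying the data and its metadata.
        if key in self._objects:
            raise KeyError(f"object already exists: {key}")
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def read(self, key):
        # Comparable to an HTTP GET.
        return self._objects[key]

    def delete(self, key):
        # Comparable to an HTTP DELETE.
        del self._objects[key]

    def update(self, key, data, metadata=None):
        # Objects are replaced whole: an update is a delete plus a create.
        self.delete(key)
        self.create(key, data, metadata)


store = ToyObjectStore()
store.create("reports/q1.pdf", b"first draft", {"owner": "finance"})
store.update("reports/q1.pdf", b"final", {"owner": "finance"})
print(store.read("reports/q1.pdf").data)
```

Note that the metadata travels with the object in every call, which is what lets an object store be searched and managed without a separate database.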
Why would a business want to have the ability to store with one protocol and access via another? Well, offering NFS means applications can use an object store with no modification. Providing concurrent web-based access allows analytics tools to access the data without introducing performance issues associated with the NFS protocol, like locking or multiple threads hitting the same object. The typical read-only profile of analytics software means data can be analyzed without affecting the main application.
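A minimal sketch of the idea, assuming a single shared back-end with two hypothetical front-ends: a file-style view that an unmodified application writes through, and a read-only object-style view that an analytics tool reads through. Real products implement the protocol stacks themselves (NFS, HTTP); the classes here only model the sharing.

```python
class SharedBackend:
    """One pool of stored bytes that both protocol views sit on top of."""

    def __init__(self):
        self.blobs = {}


class FileView:
    """Hypothetical NFS-like face: the application writes to paths."""

    def __init__(self, backend):
        self.backend = backend

    def write_file(self, path, data):
        # Store under the path with the leading slash removed.
        self.backend.blobs[path.lstrip("/")] = data


class ObjectView:
    """Hypothetical web-style face: analytics tools read by key, read-only."""

    def __init__(self, backend):
        self.backend = backend

    def get_object(self, key):
        return self.backend.blobs[key]


backend = SharedBackend()
files = FileView(backend)
objects_api = ObjectView(backend)

# The unmodified application writes a file...
files.write_file("/cam01/clip-001.mp4", b"video bytes")
# ...and the analytics tool reads the same data as an object.
print(objects_api.get_object("cam01/clip-001.mp4"))
```

Because the object view exposes no write method, the analytics side cannot interfere with the application that owns the data.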
Many IoT devices, like video cameras, will only talk NFS, so ingesting this kind of content into an object store means file-based protocols are essential.
One factor influencing the use of object stores is the ability to scale down, rather than just scale up. Many object storage solutions start at capacities of many hundreds of terabytes, which isn’t practical for smaller IT organizations. We’re starting to see vendors address this problem by producing products that can scale down to tens of terabytes of capacity.
Obviously, large-capacity hard drives and flash can be a problem here, but object stores can still be implemented for their functional benefits, like storing data in a flat namespace. So, vendors are offering solutions that are software-only and can be deployed either on dedicated hardware or as virtual instances on-premises or in the public cloud.
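To illustrate what a flat namespace means in practice, the sketch below treats every object key as a plain string. The keys and the list_keys helper are made up for the example; the slashes in the keys are just characters, and "folders" appear only as a prefix filter applied at listing time.

```python
# Every object lives at the same level; there are no real directories.
objects = {
    "camera-01/monday/clip-001.mp4": b"...",
    "camera-01/tuesday/clip-002.mp4": b"...",
    "camera-02/monday/clip-003.mp4": b"...",
}


def list_keys(store, prefix=""):
    """Return the keys that start with prefix, in sorted order."""
    return sorted(key for key in store if key.startswith(prefix))


# A "directory listing" is simply a prefix match over the flat key space.
for key in list_keys(objects, "camera-01/"):
    print(key)
```

Since there is no directory tree to traverse or rebalance, adding more objects does not make the namespace itself any harder to manage.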
IoT is likely to be a big creator of data, and that data will be created across wide geographic areas, so larger numbers of smaller object stores should help meet its ongoing needs.
Turning back to software-only solutions for a moment, they allow businesses to choose the right type of hardware for their environments. Where hardware supply contracts already exist, businesses can simply pay for the object storage software and deploy it on existing equipment. This includes testing on older hardware that might otherwise be disposed of.
The software-defined avenue leads on to another area in which object storage is growing: open source. Ceph was one of the original platforms developed under an open source model. OpenIO offers the same experience, with advanced functionality, like serverless, charged at a premium. Minio, another open source solution, recently received $20 million in funding to take its platform to a wider audience, including Docker containers.
The focus on software means it’s easy for organizations to try out object stores. Almost all vendors, with the exception of IBM Cloud Storage and DDN, offer some sort of trial process, either by downloading the software or by using the company’s lab environment. Providing trials opens software to easier evaluation and adoption in the long run.
Looking at the future for object storage, it’s fair to say that recent developments have been about making solutions more consumable. There’s a greater focus on software-only products, and vendors are working on ease of use and installation. Multi-protocol access connects more applications, making it easier to get data into object stores in the first place. I’m sure in the coming years we will see object stores continue to be an important platform for persistent data storage.