Learn about the fast-growing technology that's reshaping enterprise storage.
Object storage is one of the hottest technology trends, but it isn't a particularly new idea: the concept surfaced in the mid-1990s, and by 2005 a number of alternatives had entered the market. Resistance from the entrenched file (NAS) and block (SAN) vendors, coupled with an unfamiliar interface method, slowed adoption of object storage. Today, with the runaway success of Amazon Web Services' S3 storage service, object storage is here to stay and is making huge gains against older storage methods.
Object storage is well suited to the new data environment. Unstructured data, which includes large media files and so-called big data objects, is growing at a much faster rate than structured data and, overall, data itself is growing at a phenomenal rate.
Experience has taught us that traditional block systems become complex to manage at relatively modest scale, while the concept of a single pool of data breaks down as the number of appliances increases, especially if the pool crosses the boundaries of different equipment types. Filers have hierarchies of file folders, which become cumbersome at scale, and today's thousands of virtual instances make file-sharing systems clumsy.
An inherent design feature of object stores is the distribution of objects across all of the storage devices, or at least across subsets when the cluster has a large number of devices. This removes a design weakness of the block/file approach, where the failure of an appliance, or of more than a single drive, could cause a loss of data availability or even the loss of the data itself.
Object stores typically use an algorithm such as CRUSH to spread chunks of a data object out in a known and predictable way. Coupling this with replication, and more recently with erasure coding, means that several nodes or drives can fail without materially impacting data integrity or access performance. The object approach also effectively parallelizes access to larger objects, since a number of nodes will all be transferring pieces of the object at the same time.
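Ceph's actual CRUSH algorithm is considerably more elaborate, but the core idea of known, predictable placement without a central lookup table can be sketched with rendezvous (HRW) hashing, a simpler stand-in. In this hypothetical Python example, every client that runs the same function over the same node list computes the same replica set for an object (node names and object IDs are made up for illustration):

```python
import hashlib

def place(object_id, nodes, replicas=3):
    # Rendezvous (HRW) hashing: score every node against the object ID
    # and pick the top scorers. Every client computes the same answer
    # independently, so no central placement database is needed.
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{object_id}:{n}".encode()).hexdigest(),
        reverse=True,
    )
    return ranked[:replicas]

nodes = [f"osd-{i}" for i in range(8)]
targets = place("movie-0042.mp4", nodes)

assert targets == place("movie-0042.mp4", nodes)   # deterministic
assert len(targets) == 3 and set(targets) <= set(nodes)
```

A useful property of this family of algorithms is that adding or removing a node changes the placement of only a small fraction of objects, which is what lets a cluster absorb failed or added drives gracefully.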
There are a good number of software-only vendors today, all of whose products install on a wide variety of COTS hardware platforms. These include the popular open source Ceph, backed by Red Hat. The combination of any of these software stacks and low-cost COTS gear makes object stores attractive on a price-per-terabyte basis compared with traditional proprietary NAS or SAN gear.
Object storage is evolving to absorb the other storage models by offering a "universal storage" model in which object, file, and block access portals all talk to the same pool of raw object storage. Universal storage will likely deploy as object storage, with the other two access modes used to present file or block secondary storage behind, say, all-flash arrays or filers. In the long term, universal storage looks to be the converging solution for the whole industry.
This trend is enhanced by the growth of software-defined storage (SDS). Object stores all run natively on standard COTS server engines, which means the transition from software built into an appliance to software virtualized into the instance pool is in most cases trivial. This is most definitely not the case for older proprietary NAS or SAN code. For object stores, SDS makes it possible to scale services such as compression and deduplication easily. It also opens up rich services such as data indexing.
Continue on to get up to speed on object storage and learn how it's shaking up enterprise storage.
(Image: Kitch Bain/Shutterstock)
What is an "object"?
An object is a blob of data coupled with extensible metadata describing all sorts of things about the data, the whole identified by a globally unique identifier (GUID). That data blob is the essence of object storage, since it can be anything from a database row to a Word document to a day's output from the CERN accelerator. The GUID is usually a hash of the data, which both identifies the unique object and serves to verify that the data is uncorrupted and unaltered.
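A content-derived GUID can be illustrated in a few lines of Python. This is a simplified sketch, not any particular vendor's scheme: real systems vary in hash choice and ID format, but the principle is the same. Identical data always yields the same ID, and any change to the data changes it:

```python
import hashlib

def object_guid(data):
    # Content-addressed ID: a SHA-256 hash of the data blob.
    # The same bytes always hash to the same GUID, and any
    # corruption or alteration produces a different hash,
    # so a read can be verified against the object's ID.
    return hashlib.sha256(data).hexdigest()

blob = b"a day's detector output"
guid = object_guid(blob)

assert object_guid(blob) == guid           # stable identity
assert object_guid(blob + b"!") != guid    # corruption is detectable
```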
Metadata is an extensible set of parameters about the data. These can define lifecycle handling of the object, access controls, content indexing, and relationships of the data to other data. App-specific metadata can be created, too.
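As a rough sketch of the structure, an object can be modeled as a data blob plus an open-ended key/value metadata map. The keys and values below are invented for illustration; real systems define their own system keys and conventions for app-specific ones:

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    guid: str
    data: bytes
    # Extensible metadata: a mix of system-defined keys (lifecycle,
    # access control) and arbitrary application-defined keys.
    metadata: dict = field(default_factory=dict)

obj = StoredObject(
    guid="example-guid",
    data=b"<video bytes>",
    metadata={
        "content-type": "video/mp4",
        "retention": "expire-after:365d",    # lifecycle handling
        "acl": "team:media=read",            # access control
        "x-app-campaign": "spring-launch",   # app-specific key
    },
)
```

Because the map is open-ended, new metadata can be attached later without any schema change, which is what makes features such as content indexing practical.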
The fact that objects can contain any data structure means that file and block storage are effectively subsets, which is an important factor as “universal” storage evolves.
What is an object store?
Object stores, in their purest form, are software elements running within the originating server that take an object, uniquely identify it (the GUID), and then deterministically identify groups of storage locations. Typically there are many groups, which may overlap on the actual storage devices; software handles adding more storage locations (drives) and copes with lost drives or nodes.
The data is then written by breaking it into chunks and placing those onto object storage devices (OSDs) within the group. This approach spreads reads and writes fairly evenly across the entire storage pool.
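The chunk-and-spread step can be sketched as follows. This is a toy illustration, assuming a fixed chunk size and simple round-robin striping across a hypothetical placement group; production systems use larger chunks and the deterministic placement described earlier:

```python
def chunk(data, size=4):
    # Split the object into fixed-size chunks (tiny here for illustration;
    # real systems use chunks on the order of megabytes).
    return [data[i:i + size] for i in range(0, len(data), size)]

def assign(chunks, group):
    # Stripe chunks round-robin across the OSDs in the placement group,
    # so reads and writes are spread roughly evenly over its devices.
    layout = {osd: [] for osd in group}
    for i, c in enumerate(chunks):
        layout[group[i % len(group)]].append(c)
    return layout

group = ["osd-0", "osd-1", "osd-2"]
layout = assign(chunk(b"abcdefghijkl"), group)

# Reassembling the chunks in index order rebuilds the original object.
assert b"".join(chunk(b"abcdefghijkl")) == b"abcdefghijkl"
```

Because chunks land on multiple devices, a large object can be read by pulling its pieces from several OSDs in parallel, which is the parallelism benefit noted earlier.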
Hardware to support object stores is COTS-based, usually x64 or ARM. There are a number of ways the actual OSD can be designed, ranging from a server-like appliance with 10 drives to just NVMe-over-Ethernet drives residing in a JBOD and connected directly to the LAN. All the options are inexpensive and the NVMe solutions are achieving phenomenal bandwidth. Typically, a minimum configuration has four appliances with 40 drives, but adding more storage is like building with Legos -- just add more appliance nodes; the software assimilates the extra storage into the pool automatically.
(Image: robbin lee/Shutterstock)
The early object stores were relatively untuned and suffered from a high level of overhead and low performance from hard drives. This pushed object storage into a secondary storage niche for several years, but the technology switched gears as first-generation code was rewritten for performance, on the one hand, and, on the other hand, fast, RDMA LANs and SSD drives began to change the latencies involved.
Today, object storage can match file and block speeds, but we are poised for rapid adoption of the NVMe approach, which implies drive-level parallel IO and a massive jump in performance even from current levels. File- and block-level storage piggybacking on the object store will see many of these benefits, too, but object storage is the overall winner and we can expect it to overshadow more traditional methods.
Object storage suppliers
There is a wide range of object storage software suppliers. On the commercial side, Caringo and Scality are the leaders, but they compete with Ceph, an open source project that is supported by Red Hat as well as a very large contributor community. OpenStack has an open source project, too: Swift, which competes in OpenStack installs with Ceph. Installing object store software on a COTS server node is straightforward.
The major cloud service providers like AWS and Google have their own object stores. The AWS S3 operation is by far the largest storage pool in the world. To date, no cloud provider has sold its object store stacks to commercial users.
Most server/storage providers and a host of startups offer an object store with hardware and software already integrated. Many are based on Ceph, especially among the startups, but Scality and Caringo appear with the large platform suppliers. See my list from last year of object storage vendors.
Scale in storage is a function of two things. First, there has to be a practical addressing scheme that can handle huge numbers of objects. The GUID method in object storage clearly meets that need, and the concepts of buckets and partitions, which segregate users' workspaces from each other, extend the range of the storage system well beyond any near-term need for the total object count in a storage cluster.
A more practical issue is finding and managing this scale of object IDs. A central database of GUIDs could lead to latencies of minutes or hours and snail's-pace throughput. All the successful object stores avoid such bottlenecks by using a deterministic algorithm to place data and by distributing pointer information out to the nodes containing the starting point of each object thread. In other words, as you scale storage, you add processing power to scale management.
(Image: Angkana Kittayachaweng/Shutterstock)
Object storage use cases
An object store is ideal for data that is rarely edited, such as movies and other media, web pages, and any fixed content. Big data is another place where object stores are the best solution, both because of their ability to scale and because of their tolerance for loosely structured data. Scientific data such as simulations, experimental results, and telemetry fits well because of its size and typically unstructured nature. Traditionally, object stores have been home to backup and archiving, another form of static data.
The internet of things will shift data creation toward sensors and similar devices and away from structured sources such as PCs and even phones, leading to massive data growth, at least if pundits such as IDC and Gartner have it right. Again, object stores are the best fit.
As object stores offer universal access methods that include block and file approaches, well-structured data will also fit the object storage model.
(Image: Timofeev Vladimir)
SDS and object storage
Software-defined storage (SDS) decouples control software from the underlying storage devices. Most object storage software, being relatively new, can be recast in the SDS mold fairly easily. Running the object store as a virtual pool of services further enhances scale and agility. Another implication is that heterogeneous code from a variety of vendors could be combined to create storage solutions, while a cluster could be built from a variety of hardware platforms.
Most cloud storage uses the object model. The replication technique in this model can easily be deployed to generate a disaster-resistant geographical dispersion of the replicas. AWS S3, for example, has two replicas of any object in the primary zone, but keeps a third copy in another zone to handle even zone-level crashes.
The new erasure-coding approach allows even more robustness, with the ability to handle zone outages or the loss of many object-storage devices in stride.
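Production erasure coding typically uses Reed-Solomon codes so that several simultaneous losses can be survived; the simplest member of the family, single-parity XOR, is enough to show the principle. In this toy Python sketch (shard contents are invented), XOR-ing all data shards produces a parity shard, and XOR-ing the survivors with the parity reproduces any one lost shard:

```python
def xor_parity(shards):
    # XOR all shards byte-by-byte to produce a parity shard.
    parity = bytearray(len(shards[0]))
    for s in shards:
        for i, b in enumerate(s):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving, parity):
    # XOR of the surviving shards with the parity shard cancels them
    # out, leaving exactly the one missing shard.
    return xor_parity(surviving + [parity])

shards = [b"AAAA", b"BBBB", b"CCCC"]
p = xor_parity(shards)

lost = shards.pop(1)               # simulate losing one shard
assert recover(shards, p) == lost  # rebuilt from survivors + parity
```

The appeal over plain replication is capacity: here three data shards are protected by one parity shard (33% overhead) instead of a full extra copy (100%), and wider Reed-Solomon layouts push the overhead lower still while tolerating multiple losses.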
Object storage in the cloud is very inexpensive, which has led to the enormous growth of the object model in cloud storage over the last decade. Work still ahead includes making the sharing of public and private cloud data in the hybrid cloud model efficient enough to allow operations to span both cloud spaces simultaneously.