Find out the key traits of this emerging data storage technology and the important vendors in the market.
Software-defined storage (SDS) is just emerging into the marketplace, which makes it one of the fastest-evolving sectors of the IT industry. It also makes it one of the most confusing areas of technology, with extravagant hype colliding with old-fashioned FUD (fear, uncertainty and doubt) to obscure one of the most important transitions in IT.
SDS is intended to align storage practices with the orchestrated world of servers and networks in clouds and virtual clusters. It moves us away from the current model of proprietary code running on proprietary hardware to an abstracted model in which data services run in virtual machines or containers in the server cloud, while the actual storage is handled by bare-bones commercial off-the-shelf (COTS) appliances.
Underlying this transition to software-defined storage is the realization that today’s storage appliance looks very much like a server: an x86 or ARM CPU with 6 to 12 drives and a lot of networking bandwidth. The logical next step in this move away from the monolithic RAID arrays of yore is to virtualize the computing functions, which leads to the SDS concept.
This virtualization requires standardized interfaces to the actual storage devices, so not all of the code migrates into virtual machines. Still, features such as file systems and management, access tables in object stores, deduplication, compression, erasure coding, encryption and indexing are all becoming virtual services. This should allow flexibility in vendor choice and let service capacity be sized dynamically to match changing workloads.
While by no means the only embodiment of SDS, hyperconverged systems are a good way to create the needed hardware environment, because they meld server and storage appliance functions into a common box type. The result is the ability to position data services and apps where the data is, rather than moving the data to where the apps are. This should speed up operations considerably and will certainly reduce network traffic.
Let’s go under the hood to see what SDS really is about and which vendors are helping deliver on its promise.
Software-defined storage basics
SDS separates data services from the actual storage functions. The data services are virtualized and should be designed to interlink into chains of operations and to scale by running new instances of a service in additional virtual machines. These data services run in the server segment of the cluster. The storage devices, heavily standardized and commoditized on the COTS model, are simpler than today’s boxes.
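To make the idea of chainable, scalable data services concrete, here is a minimal Python sketch. It assumes a data service can be modeled as a function from bytes to bytes; `chain`, `ServicePool` and the CRC-32 checksum service are hypothetical names for illustration, and zlib merely stands in for a real compression service. Scaling is modeled as adding instances to a pool rather than launching real VMs or containers.

```python
import zlib
from typing import Callable, List

# In this sketch a data service is simply a function that transforms a block of bytes.
DataService = Callable[[bytes], bytes]


def compress_service(block: bytes) -> bytes:
    """Compression service; zlib stands in for whatever codec a real service would use."""
    return zlib.compress(block)


def checksum_service(block: bytes) -> bytes:
    """Hypothetical integrity service: prefixes the block with its CRC-32."""
    return zlib.crc32(block).to_bytes(4, "big") + block


def chain(services: List[DataService]) -> DataService:
    """Link services so the output of one becomes the input of the next."""
    def run(block: bytes) -> bytes:
        for service in services:
            block = service(block)
        return block
    return run


class ServicePool:
    """Scales one service by holding several instances and spreading requests round-robin."""

    def __init__(self, factory: Callable[[], DataService], instances: int = 1) -> None:
        self._factory = factory
        self._instances: List[DataService] = []
        self._counter = 0
        self.scale_to(instances)

    def scale_to(self, instances: int) -> None:
        """Add or remove instances, standing in for starting or stopping VMs or containers."""
        if instances < 1:
            raise ValueError("a pool needs at least one instance")
        while len(self._instances) < instances:
            self._instances.append(self._factory())
        del self._instances[instances:]

    def __len__(self) -> int:
        return len(self._instances)

    def __call__(self, block: bytes) -> bytes:
        instance = self._instances[self._counter % len(self._instances)]
        self._counter += 1
        return instance(block)


# Usage: a compress-then-checksum chain, scaled from 2 to 4 instances.
pipeline = ServicePool(lambda: chain([compress_service, checksum_service]), instances=2)
pipeline.scale_to(4)
payload = b"block of user data " * 50
out = pipeline(payload)
```

Because every instance is built from the same factory, scaling up or down never changes the result of a request, only how many instances share the load.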
SDS implementations need automated orchestration. Because this is typically a multi-tenant environment, orchestration in turn demands strong security: access control systems and encryption of data at the point of creation are necessary to make it work.
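The deny-by-default principle behind multi-tenant access control can be sketched in a few lines of Python. This is an illustrative assumption, not any product’s API: `Volume`, `is_allowed` and the permission strings are hypothetical, and a real system would also authenticate tenants and encrypt data at creation, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Volume:
    name: str
    owner_tenant: str
    # Hypothetical explicit grants to other tenants: tenant -> {"read", "write", ...}
    grants: Dict[str, Set[str]] = field(default_factory=dict)


def is_allowed(volume: Volume, tenant: str, permission: str) -> bool:
    """Deny by default: the owning tenant has full access, others need an explicit grant."""
    if tenant == volume.owner_tenant:
        return True
    return permission in volume.grants.get(tenant, set())


# Usage: tenant-b may read tenant-a's volume but not write it; tenant-c gets nothing.
shared = Volume("analytics", owner_tenant="tenant-a", grants={"tenant-b": {"read"}})
```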
There are many ways to configure an SDS cluster. Using hyperconverged systems is one; running Ceph across the server pool and managing storage directly on Ethernet-attached drives is another.
Are pure software plays the essence of SDS?
For the data services side of software-defined storage, the answer is, almost, yes. We are coming from an environment where hardware providers are converting to a software model, so we often see both unbundled code and bundled solutions from the same vendor. Hyperconvergence supplier Nutanix offers both, plus licensing of its code to OEMs for their bundled systems. Another example is Scality, which offers its Ring object store code in much the same way. Still, the core value proposition is code that can be loaded onto COTS hardware.
We also have companies claiming SDS status when they just have proprietary software on their own hardware. That’s a bit of a stretch! This is where FUD and hype create a bit of a smokescreen.
The hardware side
Storage hardware can consist of server boxes with internal drives, as in hyperconverged systems, or of iSCSI appliances. Fitting Ethernet-interfaced smart drives into the picture is also on the cards; WDLabs has demonstrated it using Ceph.
There are also ways to convert existing storage and make it accessible as part of a larger pool. EMC developed ViPR software with that intent, and then released it to the open source community as CoprHD. Clearly, though, ViPR is intended as a transitional solution.
Storage orchestration and management
Automated tools are critical for software-defined storage. These tools provide discovery of new storage, a single pane of glass for control, classification of different storage types, health monitoring and, most importantly, automated user-driven storage provisioning.
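Automated, user-driven provisioning boils down to matching a request against classified, health-monitored storage. The Python sketch below assumes a simple policy (pick the healthy pool of the requested class with the most free space); the `StoragePool` fields and the `provision` function are hypothetical, and real orchestration tools apply far richer policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoragePool:
    name: str
    storage_class: str   # classification of the storage type, e.g. "ssd" or "hdd"
    free_gb: int
    healthy: bool = True  # result of health monitoring


def provision(pools: List[StoragePool], size_gb: int, storage_class: str) -> Optional[StoragePool]:
    """Pick the healthy pool of the requested class with the most free space and reserve capacity."""
    candidates = [
        p for p in pools
        if p.healthy and p.storage_class == storage_class and p.free_gb >= size_gb
    ]
    if not candidates:
        return None  # nothing suitable; a real tool might alert or expand a pool here
    chosen = max(candidates, key=lambda p: p.free_gb)
    chosen.free_gb -= size_gb
    return chosen


# Usage: the larger SSD pool is skipped because it failed its health check.
pools = [
    StoragePool("flash-1", "ssd", free_gb=500),
    StoragePool("flash-2", "ssd", free_gb=2000, healthy=False),
    StoragePool("bulk-1", "hdd", free_gb=10000),
]
chosen = provision(pools, size_gb=200, storage_class="ssd")
```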
Providers range from startups such as Prophetstor, Stratacloud and ioFabric to established vendors including VMware, Dell Technologies (with EMC's ViPR), IBM and HPE. The startups have the advantage of being nimble, and some have already moved ahead of the pack. One example is Nutanix, which has partnered with many of the major systems vendors worldwide.
As a newcomer to the arena, object storage carries little of the decades of baggage seen in iSCSI, NAS and RAID arrays. One result is that its code is designed for COTS systems, and many installations use server-sized appliances. These software packages (Red Hat’s open source Ceph, Scality Ring and Caringo Swarm, for example) all fit the SDS model, with services virtualized and distributed and with storage devices such as drives recognized as nodes in the cluster.
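One common way to treat drives as cluster nodes is to place objects with consistent hashing, so that adding a drive moves only a small share of the data. The sketch below is a generic consistent-hash ring in Python, not the placement algorithm of any product named here (Ceph, for one, uses its own CRUSH algorithm); the class name, the 64 virtual points per drive and the replica walk are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a point on a 64-bit ring (SHA-256 used only for an even spread)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._points: List[int] = []
        self._owners: Dict[int, str] = {}
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each drive gets many virtual points so objects spread evenly across drives.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def nodes_for(self, object_key: str, replicas: int = 3) -> List[str]:
        """Walk clockwise from the object's hash, collecting distinct drives for replicas."""
        start = bisect.bisect(self._points, _hash(object_key))
        chosen: List[str] = []
        for offset in range(len(self._points)):
            node = self._owners[self._points[(start + offset) % len(self._points)]]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen

    def node_for(self, object_key: str) -> str:
        """The primary drive: the first one clockwise from the object's hash."""
        return self.nodes_for(object_key, replicas=1)[0]


# Usage: twelve drives, three replicas per object.
drives = [f"drive-{n:02d}" for n in range(12)]
ring = HashRing(drives)
placement = ring.nodes_for("photos/cat.jpg", replicas=3)
```

Because each drive owns many small arcs of the ring, a new drive takes over only the keys whose arcs it now claims; every other object stays where it was.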
None of them fully conforms to the SDS model, but that is a consequence of the lack of industry standards and APIs for communication between services. Time, and perhaps the Storage Networking Industry Association (SNIA), should bring the industry to a common solution.
Simple network file systems such as NFS and CIFS should fit easily into SDS: it’s a case of instantiating the NAS system code and then linking storage pools to it via orchestration. Direct-attached server storage is much the same; it can be integrated into the storage pool and made available for sharing across the cluster. Distributed file systems are harder, and orchestration will need to get savvier, with knowledge of their metadata structures and redundancy systems.
Gluster and Lustre, the most common scale-out file systems, used in high-performance computing and other applications, will need a good deal of work before they can be virtualized on the SDS model.
Proponents of SDS often talk about uniformity of the deployed systems and about avoiding special hardware for tasks like compression. This ignores the fact that bandwidth is expanding rapidly, on both drives and networks, and plain x86 or ARM CPUs just can’t keep up. This will be a major issue for hyperconverged systems, for instance. Specialized hardware features on servers or storage nodes are something orchestration still needs to address. If we can handle the virtualization of GPUs, we can handle it for hardware accelerators for functions like encryption and compression.
We are just starting to see movement toward unbundled data services such as compression, deduplication, erasure coding and encryption. For example, startup Nyriad is demonstrating erasure coding at 60 GB per second.
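Erasure coding is easiest to see in its simplest form, a single XOR parity shard, the same idea as RAID 5 parity. The Python sketch below is only that simplest case and says nothing about how Nyriad or any other vendor implements erasure coding; production systems typically use Reed-Solomon-style codes that survive several simultaneous failures. The caller is assumed to remember the original data length so the zero padding can be stripped.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data shards (zero-padded) and append one XOR parity shard."""
    if k < 1:
        raise ValueError("need at least one data shard")
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_blocks, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (None) by XOR-ing all surviving shards."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost shard")
    rebuilt = reduce(xor_blocks, [s for s in shards if s is not None]) if missing else b""
    return [rebuilt if s is None else s for s in shards]


# Usage: 4 data shards + 1 parity shard; lose shard 2 and rebuild it.
data = b"software-defined storage"
shards = encode(data, k=4)
damaged: List[Optional[bytes]] = list(shards)
damaged[2] = None  # simulate a failed drive
restored = recover(damaged)
original = b"".join(restored[:4])[:len(data)]
```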
Once we have some semblance of industry standards, software-defined storage innovation should take off. There are many small players in the field, but right now it’s a bit like the Tower of Babel, with connectors needed to link products and translate between their APIs.
The next major phase, after we overcome this near-term issue, is achieving adequate bandwidth in the links between servers and between servers and the storage nodes. Faster networking is critical, and RDMA over 25/100 GbE is very attractive. This is where NVMe over Fabrics will land within a year, probably upsetting the interface APIs a bit in the process.
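A quick calculation shows why network bandwidth becomes the bottleneck for an appliance with 6 to 12 drives. The per-drive throughputs below are assumptions chosen for illustration, not measurements, and the line-rate math ignores protocol and RDMA overhead.

```python
import math


def link_gbytes_per_s(gbits_per_s: float) -> float:
    """Raw line rate in GB/s (8 bits per byte; ignores protocol and RDMA overhead)."""
    return gbits_per_s / 8


def links_needed(drives: int, drive_gbytes_per_s: float, link_gbits_per_s: float) -> int:
    """Number of links needed so the network can carry every drive streaming at full speed."""
    demand = drives * drive_gbytes_per_s
    return math.ceil(demand / link_gbytes_per_s(link_gbits_per_s))


# Assumed per-drive sequential throughputs, for illustration only.
NVME_SSD_GB_PER_S = 3.0  # hypothetical fast NVMe SSD
HDD_GB_PER_S = 0.2       # hypothetical hard drive
```

Under those assumptions, 12 NVMe drives (36 GB/s) would need three 100 GbE links at 12.5 GB/s each, or twelve 25 GbE links, while 12 hard drives (about 2.4 GB/s) fit within a single 25 GbE link.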
Using Ethernet for all drive interfaces makes sense. Drives are getting smarter, and that will lead to real solutions similar to the WDLabs experiment running Ceph on Ethernet-attached drives. This will greatly simplify configuration while expanding design options tremendously.
SDS is going to take over virtualized spaces such as the cloud and system clusters. The technologies involved will continue to evolve rapidly, and it’s worthwhile keeping up.