Scaling Data Backup Beyond Data Protection

A new breed of scale-out backup enables enterprises to put secondary data to other uses.

Chris M Evans

May 24, 2016

Network Computing

Probably the most unloved yet arguably most important part of infrastructure management is backup or, to be more precise, data protection. Look back at the mainframe days of the 1980s and 1990s and you’ll see that backups weren’t that rigorously managed; tapes broke and backups failed, so you just waited until the following day and ran them again.

Fast forward to today and having reliable backups is essential; many industries have to ensure backups are running correctly for compliance and regulatory reasons. We’ve reached the point of better reliability due to the ability to store backups on disk through products like EMC Data Domain. With so much backup or secondary data now on disk, what else can we do with it? The answer is that we can put that data to use for other functions, including seeding test/dev environments, mining data, and delivering a better backup experience.

Several companies, including Cohesity and Rubrik, are leading this charge and have developed scale-out storage solutions using a combination of disk and flash to extend the backup paradigm. With these solutions, backup is no longer a scale-up model where adding more capacity means creating a new backup server, integrating it into the infrastructure, and rebalancing where backup data is written. When you need more backup capacity, you simply add a new node to the cluster; software manages the integration and any necessary rebalancing tasks.
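
To make the "add a node and let software rebalance" idea concrete, here is a minimal Python sketch. It uses consistent hashing, which is one common way to place data in a scale-out cluster; neither vendor's actual placement scheme is described here, so treat the technique, the `HashRing` class, and the node and key names as illustrative assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not any vendor's real design."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes    # virtual points per node, to even out load
        self._hashes = []       # sorted ring positions
        self._node_at = {}      # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._hashes, h)
            self._node_at[h] = node

    def owner(self, key: str) -> str:
        if not self._hashes:
            raise ValueError("ring has no nodes")
        # First ring position clockwise from the key, wrapping at the end.
        i = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._node_at[self._hashes[i]]


def moved_after_adding(ring: HashRing, keys, new_node: str):
    """Add new_node and return the keys whose owner changed."""
    before = {k: ring.owner(k) for k in keys}
    ring.add_node(new_node)
    return [k for k in keys if ring.owner(k) != before[k]]


if __name__ == "__main__":
    ring = HashRing()
    for name in ("node-1", "node-2", "node-3"):
        ring.add_node(name)
    keys = [f"backup-block-{i}" for i in range(1000)]
    moved = moved_after_adding(ring, keys, "node-4")
    print(f"{len(moved)} of {len(keys)} blocks moved to the new node")
```

The point of the sketch is that only the blocks now owned by the new node move; everything else stays where it was, which is what keeps expansion from turning into a full reshuffle.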

Cohesity was founded by Mohit Aron, who previously co-founded and was CTO of Nutanix, a hyperconverged solutions vendor. With the Nutanix heritage and Aron’s work on the Google File System, it’s no surprise that the basis of the Cohesity platform is a distributed file system that implements global deduplication, real-time indexing, infinite snapshots, and dynamic scalability.



However, Cohesity is more than a file system. The platform provides data protection features that integrate with server virtualization platforms like VMware vSphere and allow efficient backups enabled by the company’s SnapTree technology. SnapTree provides the capability to take an infinite number of snapshots with very high granularity, pretty much to the individual block level, which is reminiscent of the continuous data protection technologies from a few years ago.
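
SnapTree's internals are not covered here, so the following Python sketch is only a generic illustration of block-level snapshots, not Cohesity's implementation. It assumes a hypothetical `SnapshotStore` in which every snapshot keeps its own complete block map while the underlying data blocks are stored once and shared.

```python
import hashlib
from typing import Dict, List


class SnapshotStore:
    """Illustrative only: each snapshot has a full block map, data is shared."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}          # digest -> block data
        self._snapshots: List[Dict[int, str]] = []   # id -> {index: digest}

    def _put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(digest, block)       # store each block once
        return digest

    def take_snapshot(self, changed: Dict[int, bytes]) -> int:
        """Record a new point in time from only the blocks that changed."""
        parent = self._snapshots[-1] if self._snapshots else {}
        block_map = dict(parent)                     # copies metadata, not data
        for index, block in changed.items():
            block_map[index] = self._put(block)
        self._snapshots.append(block_map)
        return len(self._snapshots) - 1

    def read(self, snapshot_id: int) -> bytes:
        """Rebuild the full image for one snapshot with direct lookups."""
        block_map = self._snapshots[snapshot_id]
        return b"".join(self._blocks[block_map[i]] for i in sorted(block_map))

    def stored_blocks(self) -> int:
        return len(self._blocks)


if __name__ == "__main__":
    store = SnapshotStore()
    monday = store.take_snapshot({0: b"AAAA", 1: b"BBBB", 2: b"CCCC"})
    tuesday = store.take_snapshot({2: b"DDDD"})      # only block 2 changed
    print(store.read(monday), store.read(tuesday), store.stored_blocks())
```

Because each snapshot carries a complete map, reading any point in time does not require replaying a chain of incrementals, yet two snapshots that differ by one block cost only one extra stored block.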

The difference with SnapTree is that data can be retrieved very efficiently, making it practical to use the Cohesity platform to perform functions such as instant VM restores. In this scenario, the Cohesity appliance acts as a datastore to the virtual server environment, allowing a VM to be instantiated directly on the appliance and moved back to primary storage if desired.

VM instantiation leads to another Cohesity feature: using the platform for test/dev work, where workload performance requirements are less demanding than in production but the ability to rapidly create and destroy environments is essential. This capability is generally known as copy data management, and a number of other software vendors, such as Catalogic Software and Actifio, leverage the capabilities of existing storage platforms to instantiate test/dev images.

Cohesity also allows its appliance to serve traditional file share services, supporting SMB and NFS protocols. There is also the capability to perform analytics on stored data, which naturally becomes more valuable as more data is stored in the system.

Rubrik also has a Nutanix heritage, through co-founder and CEO Bipul Sinha. Sinha is an investor in Nutanix via Lightspeed Venture Partners. Rubrik’s product is also a scale-out storage solution for managing backup. Each node -- known as a Brik -- is based on quad-socket 8-core Intel Haswell processors, 256 GB of DRAM, and a mix of capacity hard-disk drives and flash solid-state drives. The two models available today vary by the capacity of the HDDs in the system.

Rubrik implements a scale-out file system known as Atlas. Like Cohesity, the company's co-founders have a background in the Google File System as well as Google Search infrastructure and have obviously brought their knowledge to bear in the design of the system.

The Rubrik appliance is focused on data protection of virtual environments, such as VMware vSphere, aiming to be agentless and as unobtrusive as possible. It implements backups using policies and automation that set the service levels of data protection. Backup administrators -- or, increasingly, simply virtualization administrators -- don’t have to think about the traditional processes of jobs and schedules, as this is all handled automatically.
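
As a rough illustration of policy-driven protection replacing hand-built jobs and schedules, the Python sketch below lets an administrator declare a service level and leaves the scheduling decisions to code. The `SlaPolicy` type, its fields, and the helper functions are hypothetical names invented for this example; they are not Rubrik's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical service-level definition, not a vendor schema."""
    name: str
    frequency: timedelta   # maximum allowed gap between snapshots
    retention: timedelta   # how long each snapshot must be kept


def is_backup_due(policy: SlaPolicy,
                  last_snapshot: Optional[datetime],
                  now: datetime) -> bool:
    """A VM needs a snapshot if it has none or the last one is too old."""
    return last_snapshot is None or now - last_snapshot >= policy.frequency


def expired_snapshots(policy: SlaPolicy,
                      snapshot_times: List[datetime],
                      now: datetime) -> List[datetime]:
    """Snapshots older than the retention window can be removed."""
    return [t for t in snapshot_times if now - t > policy.retention]


if __name__ == "__main__":
    gold = SlaPolicy("gold", frequency=timedelta(hours=4),
                     retention=timedelta(days=30))
    now = datetime(2016, 5, 24, 12, 0)
    print(is_backup_due(gold, datetime(2016, 5, 24, 7, 0), now))
    print(expired_snapshots(gold, [datetime(2016, 4, 1), datetime(2016, 5, 20)], now))
```

In this model the administrator only assigns a VM to a policy such as "gold"; when to snapshot and what to expire both follow from the policy rather than from a job calendar.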

Like Cohesity, Rubrik can also provide instant VM restore capability, allowing a virtual machine to be accessed directly from the appliance acting as a datastore. The platform also provides the capability to replicate data between appliance clusters, allowing geographic dispersal of backups, with only changed, deduplicated blocks of data needing to be transferred between sites.
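
The bandwidth saving from shipping only changed, deduplicated blocks can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the remote cluster is represented by a plain dictionary keyed by block digest, and the `replicate` function is a hypothetical stand-in, not either vendor's replication protocol.

```python
import hashlib
from typing import Dict, List, Tuple


def block_digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def replicate(source_blocks: List[bytes],
              remote_store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Send only blocks the remote site lacks.

    Returns the backup's recipe (ordered digests) and the number of
    payload bytes that actually had to cross the link.
    """
    recipe = []
    bytes_sent = 0
    for block in source_blocks:
        digest = block_digest(block)
        if digest not in remote_store:   # remote has never seen this block
            remote_store[digest] = block
            bytes_sent += len(block)
        recipe.append(digest)            # digests are always recorded
    return recipe, bytes_sent


if __name__ == "__main__":
    remote: Dict[str, bytes] = {}
    _, first = replicate([b"AAAA", b"BBBB", b"AAAA"], remote)
    _, second = replicate([b"AAAA", b"BBBB", b"CCCC"], remote)
    print(first, second)
```

The first replication pays for each unique block once, and the second pays only for the single block that changed, even though both backups describe a full image.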

In the past, we looked at backup data as no more than an insurance policy to recover from data corruption, user error or hardware failures. Now, we can start using this secondary data creatively, consolidating other platforms that might be deployed for test/dev work and more importantly, reducing the time and effort required to provision those systems.

Scale-out backup offers the ability to do more with the storage assets on the floor and, as a result, will surpass the traditional backup process. The next stage is for backups to be more mobile -- both Cohesity and Rubrik support data replication to the cloud -- as IT organizations focus on the value of the data itself, in either primary or secondary form.

About the Author(s)

Chris M Evans

Chris M Evans has worked in the IT industry for over 27 years. After receiving a BSc (Hons) in computational science and mathematics from the University of Leeds, his early IT career started in mainframe and followed both systems programming and storage paths. During the boom, he also co-founded and successfully floated a company selling music and digital downloads. For most of the last 20 years, Chris has worked as an independent consultant, focusing on open systems storage and more recently, virtualization and cloud. He has worked in industry verticals including financials, transport, utilities and retail, designing, deploying and managing storage infrastructures from all the major vendors. In addition to his consultancy work, Chris writes a widely read and respected blog and produces articles for online publications. He has also featured in numerous podcasts as a guest and content provider.
