Deduplication Backup Software - Avamar

George Crump

March 31, 2010

Network Computing

As we begin our dive into deduplication backup software, we will start with the guys that started it all: Avamar. Since the early part of this decade, Avamar, originally a standalone company and later part of EMC, has been working to convince people that delivering deduplication in the backup software, and starting the process at the client, is the way to go. Over the past few years they seem to have been gaining steam, driven mostly by a clear articulation of where their source-side technology plays best, especially since one of those areas is the VMware use case.

Architecturally speaking, Avamar is essentially an enterprise backup software application designed to send data to a custom back-end disk target. The software installs as an agent on the servers to be protected and sends backup data to a grid of interconnected server and storage nodes. There are several forms of delivery for this grid, but the predominant package is a disk backup appliance called an Avamar Data Store.

The client software, unlike other solutions, does all of the deduplication processing and communicates with the server grid to ensure cross-client deduplication. The benefit is that only the changed segments are sent across the network to the disk target. With source-side deduplication, the bulk of the time is spent identifying and minimizing what to back up; with target-side dedupe, the bulk of the time is spent transferring all the data across the wire. Source-side dedupe means minimal use of LAN/WAN bandwidth, shorter backup windows and, of course, savings in backup storage, at the potential expense of source processor utilization.
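To make the source-side idea concrete, here is a minimal sketch in Python. It is not Avamar's implementation or API: the names `DedupeServer`, `backup` and `SEGMENT_SIZE` are hypothetical, the "server" is an in-memory dictionary, and fixed-size segments are a simplification that a real product may not share. The point is only the order of operations: the client hashes its segments first, asks the server which hashes it has never seen, and ships only those.

```python
import hashlib

SEGMENT_SIZE = 4  # assumption: tiny fixed-size segments, purely for illustration


class DedupeServer:
    """Hypothetical stand-in for the backup grid: keeps each unique segment once."""

    def __init__(self):
        self.segments = {}       # digest -> segment bytes
        self.bytes_received = 0  # payload that actually crossed "the wire"

    def missing(self, digests):
        """Return the digests the server has never stored (from any client)."""
        return [d for d in digests if d not in self.segments]

    def store(self, digest, data):
        self.segments[digest] = data
        self.bytes_received += len(data)


def backup(data, server):
    """Client-side dedupe: hash locally, then send only segments the server lacks."""
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    digests = [hashlib.sha256(s).hexdigest() for s in segments]
    by_digest = dict(zip(digests, segments))  # collapses duplicates within this client

    for digest in server.missing(list(by_digest)):
        server.store(digest, by_digest[digest])

    return digests  # the "recipe" describing this backup


if __name__ == "__main__":
    grid = DedupeServer()
    backup(b"AAAABBBBCCCC", grid)  # first client: everything is new
    print(grid.bytes_received)
    backup(b"AAAABBBBDDDD", grid)  # second client: only one new segment is sent
    print(grid.bytes_received)
```

The trade-off described above shows up directly: the hashing work happens on the client, and in exchange the second backup transfers a single segment instead of the whole data set.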

Processor utilization at the client has historically been perceived as a concern with source-side deduplication technology, and it was an issue when we first looked at the technology almost seven years ago. Of course, in seven years we have seen massive advances in server processing power, as well as improvements in the overall efficiency of deduplication backup software. As a result, what a customer should typically see today is a modest spike in CPU utilization at the client, but for a shorter amount of time than with traditional backup software.

The short-duration impact of deduplication processing should be manageable for most servers, and where there is a concern, the amount of CPU resources used can be capped at customer-specified limits. While this may lengthen the backups a bit, it allows you to maintain a service level on the host being backed up. This is especially important in VMware environments, where there is sensitivity to CPU consumption for backup, and where vMotion and other measures are often triggered by excessive CPU usage.
 
Once redundant, sub-file segments of data have been identified and eliminated (within and across clients), only unique, new data is sent across the wire to be backed up. In unstructured data environments, Avamar claims that they can reduce data by over 99 percent. The backup data is received and written to disk at the Avamar Data Store. In the Avamar Data Store, data is striped across the storage in the grid, and the processing load for the backup is distributed across the grid as well. Each node in the grid stores its data in a RAID 5 data protection scheme, and then RAIN protection (Redundant Array of Independent Nodes, a grid-like RAID) is applied across the nodes. RAIN lets the grid survive the failure of any individual node, and also allows you to scale the grid without excessive downtime.

In addition to RAID and RAIN, Avamar also offers data recovery verification. Data is validated twice daily to make sure that whatever has been backed up is always in a recoverable state. Since Avamar does not rely on a full plus incremental recovery scheme, all recoveries from Avamar are one-step recoveries from logical full backups. This means that pulling the last full backup from the weekend and layering nightly incrementals on top of it is not required.

When all the net new backup data has been transferred to the Data Store, replication can begin. While backup and replication are sequential processes, it is important to note that the overall backup window itself should shrink, so in theory the replication step should start fairly quickly. In most cases, total time to have data at the DR site should be comparable to the concurrent backup/replication processes possible on the leading target deduplication storage solutions.
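The one-step recovery point is easier to see with a small sketch. The Python below is an illustration under stated assumptions, not Avamar's on-disk format: `SegmentStore`, `backup` and `restore` are hypothetical names, and segments are fixed-size for simplicity. Each backup is recorded as a complete list of segment hashes (a "recipe"), so although only new segments consume space, any day's backup can be rebuilt on its own.

```python
import hashlib


class SegmentStore:
    """Hypothetical deduplicated store: one copy of each unique segment."""

    def __init__(self):
        self.segments = {}  # digest -> segment bytes

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.segments.setdefault(digest, data)  # keep only the first copy
        return digest

    def get(self, digest):
        return self.segments[digest]


def backup(data, store, segment_size=4):
    """Store new segments and return a full recipe for this point in time."""
    return [store.put(data[i:i + segment_size])
            for i in range(0, len(data), segment_size)]


def restore(recipe, store):
    """One-step recovery: follow a single recipe, no incrementals to layer."""
    return b"".join(store.get(digest) for digest in recipe)


if __name__ == "__main__":
    store = SegmentStore()
    sunday = backup(b"AAAABBBBCCCC", store)
    monday = backup(b"AAAABBBBDDDD", store)  # adds only one new segment

    # Either day restores directly from its own recipe.
    assert restore(sunday, store) == b"AAAABBBBCCCC"
    assert restore(monday, store) == b"AAAABBBBDDDD"
    print(len(store.segments), "unique segments stored for two logical fulls")
```

Every recipe is a logical full backup even though the store grew by just one segment on the second night, which is why no chain of incrementals has to be replayed at restore time.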

As I said earlier, the VMware use case is fertile ground for Avamar and its approach to deduplication. EMC has put considerable focus on advancing the product specifically for this environment. Avamar can provide either guest or image backups of VMware environments. Avamar is such a good fit for VMware because the way it approaches deduplication solves some of the unique backup challenges that virtualization and consolidation can bring, in particular the problem of too much data flowing through shared physical resources on ESX hosts. Avamar's dedupe cuts this data flow down to size and speeds up the backup. This allows for further consolidation and possibly a better ROI. Avamar now supports the VMware vStorage API for Data Protection and the new capabilities that it brings, as well as integration with VMware vCenter Server for more centralized management.

Being a veteran of the dedupe wars, Avamar has survived by maturing its software and is now prospering by jumping on key market opportunities that accentuate its capabilities, like remote office backup (low bandwidth), VMware backups (high levels of duplicate data), NAS backup and, most recently, desktops and laptops.

Disclosure: EMC's Backup Recovery Systems Division has engaged and is presently engaged in projects with Storage Switzerland, where George Crump is the lead analyst.

 
