Cofio's Unique Approach To Deduplication

Deduplication, at least from a backup standpoint, is about efficiently storing data on a backup device. Some suppliers leverage either block-based incremental, continuous data protection (CDP) or source-side deduplication to increase the efficiency of data going across the network to the backup target. Cofio's AIMstor application takes the unique approach of using all of the available techniques for maximum optimization across the network and on secondary storage devices.

George Crump

June 15, 2010

3 Min Read


AIMstor is a software-based solution with a client-side and a target-side component. The application first performs source-side deduplication for the initial seeding of data to the target, which provides maximum network and storage efficiency on that first pass. The deduplication process is also content-aware; knowing how to examine an Exchange store versus a Word document should yield higher deduplication ratios.
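The article does not describe AIMstor's internals, but the general idea of source-side deduplication is straightforward: hash each data chunk and ship only chunks the target has not already seen. A minimal sketch with made-up chunk data (the function name and data are illustrative, not Cofio's API) might look like:

```python
import hashlib

def source_side_dedup(chunks, target_index):
    """Send only chunks the backup target has not already stored.

    chunks: iterable of bytes objects (segments of the source data).
    target_index: set of hex digests already held on the target.
    Returns the list of chunks that actually cross the network.
    """
    to_send = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in target_index:
            target_index.add(digest)   # target now knows this chunk
            to_send.append(chunk)
    return to_send

# Two "servers" backing up overlapping data: the shared chunk
# crosses the network only once.
index = set()
first = source_side_dedup([b"os-patch", b"app-data"], index)
second = source_side_dedup([b"os-patch", b"other-data"], index)
```

Here `second` contains only `b"other-data"`, because the duplicate chunk was filtered at the source before it ever hit the network.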

As with all source-side deduplication, there is some performance impact from determining duplicate data and filtering it out. What makes the Cofio solution interesting is that all subsequent backups of the source disk are done via a CDP process: updates are captured and transferred at a byte level in real time as changes occur, or at a scheduled time if desired. CDP and block-level incremental (BLI) examination of a server typically does not impact performance as much as source-side deduplication does. As a result, Cofio gets the network and storage efficiency gains of deduplication while also getting the resource efficiency and frequency of protection that CDP or BLI provides.
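The CDP capture described above can be sketched generically: writes are intercepted as (offset, bytes) records in a journal, and only those bytes are replayed against the target copy. This is an illustrative model, not AIMstor's actual implementation:

```python
def capture_write(journal, offset, data):
    """Record a byte-level change as it happens (CDP-style),
    instead of re-reading the whole file at backup time."""
    journal.append((offset, bytes(data)))

def apply_journal(replica, journal):
    """Replay captured changes against the target-side copy."""
    replica = bytearray(replica)
    for offset, data in journal:
        replica[offset:offset + len(data)] = data
    return bytes(replica)

# A 2-byte update at offset 4 is the only data that needs to move.
journal = []
capture_write(journal, 4, b"XY")
replica = apply_journal(b"abcdefgh", journal)
```

Only the two changed bytes travel to the target, which is why this kind of capture costs far less than rehashing the whole source each cycle.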

The typical challenge for CDP and BLI technologies is maximizing storage efficiency on the target device: the same file could be added to multiple servers and then stored redundantly on the backup target. A good example is VMware, where multiple hosts may have VMs that receive the same OS patch update. In the typical CDP or BLI use case these copies would all be stored redundantly. To address this, Cofio performs a post-process deduplication pass over the disk repository on a scheduled basis, typically once per night. This pass identifies redundant data segments stored since the initial seed or since the last post-process dedupe pass.
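A post-process pass of this kind can be sketched as a sweep over the segment store that keeps one physical copy per unique segment and rewrites duplicates as references. Again, this is a generic illustration under assumed data structures, not Cofio's code:

```python
import hashlib

def post_process_dedup(repository):
    """Nightly pass over a segment store: keep one physical copy of
    each unique segment and map duplicate segment IDs to it.

    repository: dict of segment-id -> bytes.
    Returns (unique_store, reference_map).
    """
    unique = {}   # digest -> bytes (the single physical copy)
    refs = {}     # segment-id -> digest of the copy it points to
    for seg_id, data in repository.items():
        digest = hashlib.sha256(data).hexdigest()
        unique.setdefault(digest, data)   # store only the first copy
        refs[seg_id] = digest
    return unique, refs

# Two VMs received the same OS patch; after the pass, one copy remains.
repo = {"vm1/patch": b"patch-bits",
        "vm2/patch": b"patch-bits",
        "vm1/app":   b"app"}
unique, refs = post_process_dedup(repo)
```

After the pass, `vm1/patch` and `vm2/patch` both resolve to the same stored segment, recovering the storage efficiency that real-time CDP capture alone would miss.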

Cofio will soon provide VMware integration leveraging the vSphere API, but the client is also lightweight enough to run within the guest OS. Running at the guest-OS level should provide greater granularity of data examination and recovery. While you can't launch a protected VM directly from the AIMstor environment as some CDP tools allow, you can restore the backup of a physical server into the virtual environment, making physical-to-virtual migration another capability of AIMstor. The software provides real-time restore consistency for any application, and added support for Exchange, MS-SQL and Oracle will also be released soon. Interestingly, AIMstor file versioning will be extended to include e-mail archiving this fall. Within the foundation of the software are robust data management and retention policies to help with that task.

Cofio categorizes the AIMstor product as a unified data protection application. Functions that are typically done separately, such as snapshots, CDP, backup, archive, e-mail archive and information security, are unified not only under one interface but also into one data mover and one deduplicated storage repository. This provides the capability to deduplicate not only backup data but all the non-primary copies of data into a single repository. Further, since data moves only once for several different functions, network data transfers are reduced by a factor equal to the number of functions in use multiplied by the capacity of data each function would otherwise move.
