With all the hype about the cloud, one could be forgiven for believing that backing up a data center is best done there, but there are problems with this simplistic view. For one thing, at the enterprise and mid-range levels, the slow speed of WAN connections makes meeting backup windows difficult.
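To put rough numbers on the WAN problem, here is a minimal sketch of the arithmetic. The data size, link speed, window length, and the 80 percent link efficiency are all illustrative assumptions, not measurements from any particular site.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_tb terabytes over a link of link_mbps megabits/s.

    efficiency is an assumed fraction of the nominal link rate that is
    actually usable for backup traffic (protocol overhead, contention).
    """
    bits_to_move = data_tb * 1e12 * 8           # decimal terabytes to bits
    usable_bits_per_sec = link_mbps * 1e6 * efficiency
    return bits_to_move / usable_bits_per_sec / 3600


def fits_window(data_tb: float, link_mbps: float, window_hours: float,
                efficiency: float = 0.8) -> bool:
    """True if the transfer completes within the backup window."""
    return transfer_hours(data_tb, link_mbps, efficiency) <= window_hours


if __name__ == "__main__":
    # Hypothetical case: 10 TB nightly backup, 1 Gbit/s WAN, 8-hour window.
    hours = transfer_hours(10, 1000)
    print(f"{hours:.1f} hours needed; fits 8-hour window: {fits_window(10, 1000, 8)}")
```

Under these assumptions the transfer takes more than a full day, so the window is missed by a wide margin even on a link that would be considered fast for a WAN.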
Another issue is that we focus intensely on the backup part of the process. As a result, when we need to restore data we often have a weaker solution than we would if we turned the issue on its head and focused on optimizing restore. This translates into long recovery times and a lot of manual intervention.
We also confuse backup and archiving. The latter is an offline and offsite mechanism for storing information for extended periods, both to meet legal requirements and to provide common-sense protection for key files. There is no question the cloud is an ideal archiving vehicle. It offers multi-site replication to protect data, and tape library services that effectively take data offline.
So what would an ideal backup system offer? We are looking for a disk-to-disk-to-cloud scenario, where information is backed up to a separate disk farm in the data center at the fastest possible speed, ensuring that the backup window can be met. We would want the most recent backups to be kept on that disk, so that recovering a file is quick and easy to do. (Note that most queries for a backed-up file occur very soon after the backup.)
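The idea of keeping recent backups on local disk and older ones in the cloud can be sketched as a simple age-based rule. The 14-day local retention period and the tier names are hypothetical; a real appliance would make this a configurable policy.

```python
from datetime import date

# Assumed: how many days a backup stays on the local disk farm before
# only the cloud copy remains. Purely illustrative.
LOCAL_RETENTION_DAYS = 14


def restore_tier(backup_date: date, today: date,
                 local_days: int = LOCAL_RETENTION_DAYS) -> str:
    """Return which tier a restore request should be served from."""
    age_days = (today - backup_date).days
    return "local-disk" if age_days <= local_days else "cloud"


if __name__ == "__main__":
    today = date(2024, 1, 12)
    print(restore_tier(date(2024, 1, 10), today))   # recent backup
    print(restore_tier(date(2023, 11, 1), today))   # older backup
```

Because most restore requests arrive soon after the backup, a rule like this would serve the bulk of them from fast local disk and leave only the occasional older request to the slower cloud path.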
To migrate copies offsite to the cloud, the backup appliance could simply treat the cloud as another storage tier, and apply compression and deduplication before shipping the information out. Most backup systems use compression and deduplication to shrink the stored footprint, and encryption to protect it from hackers. The last piece of the puzzle would be a mechanism for applying policies to delete data, or to route it to different cloud repositories as needed.
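As a rough illustration of what deduplication and compression do before data is shipped offsite, the sketch below splits a byte stream into fixed-size chunks, stores each unique chunk once in compressed form, and keeps a "recipe" of chunk hashes for restore. The 4 KB chunk size, the in-memory dictionary standing in for the storage tier, and the function names are assumptions made for the example; commercial products typically use more sophisticated variable-size chunking, and encryption is omitted here.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def dedupe_and_compress(data: bytes, store: dict) -> list:
    """Store each unique chunk of data once, compressed; return the chunk recipe."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # Only chunks not seen before consume space (and WAN bandwidth).
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original byte stream from its recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


if __name__ == "__main__":
    store = {}
    data = b"A" * 8192 + b"B" * 4096   # two identical chunks plus one different
    recipe = dedupe_and_compress(data, store)
    stored_bytes = sum(len(blob) for blob in store.values())
    print(f"{len(recipe)} chunks referenced, {len(store)} stored, "
          f"{stored_bytes} bytes kept for {len(data)} bytes of input")
    assert restore(recipe, store) == data
```

The restore path shows the cost side of the trade: every chunk has to be looked up and decompressed before the file can be handed back.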
As we look around the market, there are plenty of choices, but most are focused either on backup to the cloud or on “old-fashioned” disk-to-disk-to-tape (D2D2T) solutions, and they fall short of what the leaders can do. According to Gartner, EMC is the leader in the backup appliance space, with HP some way behind. Exagrid, however, has both a strong local appliance and a cloud archiving story, and it understands the issue of scaling the solution as data grows better than the other players.
Most products scale by adding more disk, which creates an imbalance between compute power (to compress, etc.) and the amount of data to be processed. The Exagrid product, by contrast, is clustered, so each increment of capacity brings an increment of computing with it. The result is a constant backup window as capacity grows. Exagrid also stores the recent data uncompressed, which means a much faster recovery of files from the backup.
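The scaling argument can be made concrete with a deliberately simplified model. It assumes the backup window is just data volume divided by processing throughput, that a disk-only expansion leaves throughput fixed at one controller, and that a clustered design adds one node of fixed capacity and throughput at a time. All figures are hypothetical and do not describe any vendor's actual hardware.

```python
import math


def window_scale_up(data_tb: float, controller_tb_per_hour: float) -> float:
    """Backup window in hours when capacity grows but compute stays fixed."""
    return data_tb / controller_tb_per_hour


def window_scale_out(data_tb: float, node_capacity_tb: float,
                     node_tb_per_hour: float) -> float:
    """Backup window in hours when every capacity increment adds compute."""
    nodes = math.ceil(data_tb / node_capacity_tb)
    return data_tb / (nodes * node_tb_per_hour)


if __name__ == "__main__":
    for data_tb in (20, 40, 80):
        up = window_scale_up(data_tb, controller_tb_per_hour=5)
        out = window_scale_out(data_tb, node_capacity_tb=10, node_tb_per_hour=2.5)
        print(f"{data_tb} TB: scale-up {up:.1f} h, scale-out {out:.1f} h")
```

In this model the disk-only window doubles each time the data doubles, while the clustered window stays flat, which is the behavior described above.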
EMC’s Avamar product can be virtualized onto a VMware instance, but even with this, its ability to talk to the cloud is limited. Avamar does deduplication and compression prior to storage, with the result that, while it uses a smaller disk capacity than Exagrid, it is slower in both backup and recovery of files.
Other contenders in the segment are Quantum, busily trying to expand upon its tape legacy, and Sepaton. Quantum is doing many of the right things, including a cloud offering. Sepaton has been acquired by Hitachi Data Systems, which may mean some changes in its direction in the near future.
There are some loose ends with today’s backup appliances. Remote office backup is an issue with the EMC and Exagrid products, although HP addresses it. For such sites, the Internet is already in any backup path back to the central data centers, so the best solution is likely to be a backup of the data directly to the cloud. Obviously, data size affects this decision, and there will be instances that justify a backup appliance onsite.
So is tape dead? For archiving, tape is still useful, since it creates a copy that can be taken offline. With disk-based solutions creating incremental backups, the use of tape as a backup medium is limited, at least if archiving is implemented.
It’s important to remember that archiving and backup can fail, too, or rather, recovery of older files can be problematic. Providing multiple copies in the archive is crucial for peace of mind, especially for key information.