Historically, backup to tape was a chore, and as a result some IT operations were a bit lax about doing it. A decade or so ago, alternatives began to appear. First, online backup companies started offering remote copies. Then came the cloud approach, which not only backs up files but also provides working access to them. Because it combined backup with offsite recovery copies, the cloud looked like a silver bullet for backup and disaster recovery.
The result has been a proliferation of cloud backup services; Technology Advice counted 52, for instance. They range from heavyweights such as AWS and Rackspace to startups eager to differentiate themselves.
While the cloud approach creates some new opportunities, it also creates new problems. Some services are tape-based, which slows data recovery, while disk-based services tend to be pricier but offer immediate access to data.
The line between archiving and backup has blurred, too, especially with disk-based cloud storage. The reason is replication as a data protection method. A good cloud backup keeps at least two copies of any file object, and treating data as objects that are not overwritten when they are updated creates a path to rollback and point-in-time recovery.
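To show why objects that are never overwritten make point-in-time recovery straightforward, here is a minimal Python sketch. It assumes a hypothetical in-memory VersionedBucket with integer timestamps; it is not any provider's API. Every put appends a new version, so a read "as of" an earlier time simply picks the newest version written at or before that time.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedBucket:
    """Hypothetical in-memory object store: every put appends a new version."""
    # key -> list of (timestamp, data), kept in timestamp order
    versions: Dict[str, List[Tuple[int, bytes]]] = field(default_factory=dict)

    def put(self, key: str, data: bytes, ts: int) -> None:
        # An update never overwrites; it is stored alongside the older versions.
        history = self.versions.setdefault(key, [])
        if history and ts < history[-1][0]:
            raise ValueError("timestamps must not go backwards")
        history.append((ts, data))

    def get(self, key: str) -> bytes:
        # The current view is simply the newest version.
        return self.versions[key][-1][1]

    def get_as_of(self, key: str, ts: int) -> Optional[bytes]:
        # Point-in-time recovery: newest version written at or before ts.
        history = self.versions.get(key, [])
        stamps = [t for t, _ in history]
        i = bisect.bisect_right(stamps, ts)
        return history[i - 1][1] if i else None
```

For example, after putting b"draft" at time 100 and b"final" at time 200, get returns b"final", while get_as_of at time 150 rolls back to b"draft". A real service would also need retention rules, since keeping every version forever costs storage.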
This touches on a key question for cloud backup and archiving: How many copies are enough? Cloud copies can fail, just like any other storage. For backup, the two-copy rule is sufficient for most data, and the same holds for archived data, although the most important data might need a third copy.
The number of copies raises a related issue, however: location. A key tenet of disaster protection is keeping copies in more than one location. Most active data storage services in the cloud offer an option for a third replica in another “zone” of operation, with zones separated far enough to count as different locations.
What you really want for your two backup copies is one in each of two zones, rather than the standard two-in-one-zone, one-in-another model of, say, AWS S3. Most cloud service providers (CSPs) can provide this.
This geographic dispersion is particularly important if you back up data in the same zone where your active cloud server and storage instances run. If that zone goes down, you lose access to both your data and its backup.
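To make the placement rule concrete, here is a minimal Python sketch, assuming a provider that labels zones with plain strings. The zone names and all three functions are hypothetical. The sketch puts each backup copy in a different zone, prefers zones other than the one running the live workload, and checks the two failure cases described above: losing the most heavily used zone, and losing the primary's zone.

```python
from collections import Counter
from typing import List


def place_backup_copies(zones: List[str], primary_zone: str, copies: int = 2) -> List[str]:
    """Hypothetical policy: each copy in a different zone, avoiding the primary's zone if possible."""
    candidates = list(dict.fromkeys(zones))  # de-duplicate while keeping order
    # Stable sort: zones other than the primary come first (False sorts before True).
    candidates.sort(key=lambda z: z == primary_zone)
    if copies > len(candidates):
        raise ValueError(f"need {copies} distinct zones, only {len(candidates)} available")
    return candidates[:copies]


def copies_left_after_worst_zone_loss(placement: List[str]) -> int:
    """How many backup copies remain if the zone holding the most copies fails."""
    if not placement:
        return 0
    return len(placement) - max(Counter(placement).values())


def backup_survives_primary_outage(primary_zone: str, placement: List[str]) -> bool:
    """True if at least one backup copy lives outside the zone running the live workload."""
    return any(zone != primary_zone for zone in placement)
```

With zones zone-a, zone-b and zone-c and the workload in zone-a, the policy places the two copies in zone-b and zone-c, so any single zone failure still leaves a copy. Keeping both copies in zone-a fails both checks.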
At this point, you might wonder why, with all these replicas, you would need to back up your cloud data at all. One of the largest causes of data loss is “finger trouble,” or human error. Files are often erased, moved or renamed by mistake, and replication systems faithfully copy those changes to every replica. Backups, at least in theory, prevent this kind of loss because their files are read-only and cannot be erased. This also protects against a second major problem: malware.
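The difference between replication and backup can be seen in a small simulation. This is a minimal Python sketch built on two hypothetical classes: ReplicatedStore mirrors every change, including a mistaken delete, to all replicas, while WriteOnceBackup only lets you add snapshots. Here the "cannot be erased" property holds only because the class exposes no delete method; a real service would have to enforce it in the storage itself.

```python
from typing import Dict, List


class ReplicatedStore:
    """Hypothetical store that mirrors every change, including deletes, to all replicas."""

    def __init__(self, replica_count: int = 2) -> None:
        self.replicas: List[Dict[str, bytes]] = [{} for _ in range(replica_count)]

    def put(self, key: str, data: bytes) -> None:
        for replica in self.replicas:
            replica[key] = data

    def delete(self, key: str) -> None:
        # Replication faithfully copies the mistake to every replica.
        for replica in self.replicas:
            replica.pop(key, None)


class WriteOnceBackup:
    """Hypothetical backup target: snapshots can be added but not changed or erased."""

    def __init__(self) -> None:
        self._snapshots: List[Dict[str, bytes]] = []

    def snapshot(self, source: Dict[str, bytes]) -> int:
        # Copy the dict so later edits to the source do not leak into the snapshot.
        self._snapshots.append(dict(source))
        return len(self._snapshots) - 1

    def restore(self, snapshot_id: int, key: str) -> bytes:
        return self._snapshots[snapshot_id][key]
```

If a user snapshots the store, then deletes a file, the file vanishes from every replica, but it can still be restored from the snapshot.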
Today, a good deal of our data is created and stored in the cloud, and it needs backup and archiving, too. Data flows matter here: Why route the data through an in-house backup program when the goal is to copy it to another part of the cloud? Keep in mind that replication guards only against hardware failures, so choosing zones carefully is important.
As we move from NAS to object storage (OS), versioning and perpetual storage become much easier. This is one advantage OS has over scale-out NAS, for example. The extended metadata and search capabilities of OS provide new hooks for recovery processes, as well as faster rollback and snapshot operations.
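As an illustration of how metadata can serve as a hook for recovery, here is a minimal Python sketch that builds a restore plan. It assumes a hypothetical schema in which each stored object version carries a key, a write timestamp and a dictionary of user metadata. For a recovery point and a metadata filter, it selects, per object, the newest matching version written at or before that point. Filtering each version on its own metadata is a simplifying design choice, not a property of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectVersion:
    """One stored version of an object plus its user metadata (hypothetical schema)."""
    key: str
    written_at: int
    metadata: Dict[str, str]


def plan_restore(versions: List[ObjectVersion], match: Dict[str, str], as_of: int) -> Dict[str, int]:
    """Return key -> written_at of the newest matching version at or before as_of."""
    plan: Dict[str, int] = {}
    for v in versions:
        if v.written_at > as_of:
            continue  # written after the recovery point
        if any(v.metadata.get(k) != val for k, val in match.items()):
            continue  # metadata filter does not match this version
        if v.key not in plan or v.written_at > plan[v.key]:
            plan[v.key] = v.written_at
    return plan
```

Given versions of a.doc at times 100 and 300 and of c.doc at time 120, all tagged project=alpha, a plan for project=alpha as of time 200 selects a.doc from time 100 and c.doc from time 120. That kind of selection, applied to millions of objects, is the sort of recovery automation the next paragraph calls for.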
Even so, recovery remains a problem. Our systems are fast accumulating too many files; I checked my own workstation, and it has half a million! Multimedia, BYOD and the Internet of Things will make that number explode over the next few years. We’ll need highly sophisticated recovery automation. That’s a major business opportunity for some creative software companies!