Data Protection In The Cloud: The Basics
February 24, 2014
What is data protection in the cloud? This isn't an easy question to answer, since data protection comes in various forms, and the tools and technologies that support it are numerous and can be used in different combinations. From IT’s perspective, the large number of choices can make the cloud more difficult to work with than traditional approaches. Still, we can cut the Gordian knot of cloud complexity with three steps that will help guide further exploration of data protection.
First, we need to understand in general which data protection processes the cloud might address. Second, we need to understand that the big decisions related to the cloud and data protection include what is managed internally and what is managed by a third party. Third, we need a rough basis for putting together an inventory of workloads that an organization can consider when moving to the cloud. We can then use this foundation for future discussion of data protection in the cloud.
Data Protection Processes
Backup, recovery, business continuity (BC), and disaster recovery (DR) are among the terms that are bandied about in discussions of data protection, as are others, such as combined BC/DR and high availability (HA). Keep in mind that definitions matter. If you and a vendor define issues and processes differently, what you get may not be what you want or need.
Backup is the easiest and most familiar process in most situations. A backup is a protection copy of data derived from the production copy (the official working copy of the data). A backup copy is used to recover the data needed to restart an application correctly.
Disaster recovery (DR) is the recovery of the entire relevant IT infrastructure at a remote (i.e., secondary) site after a primary (i.e., production) site has become unavailable for an unacceptable period of time. Yes, the data is important, but so is the recovery of servers and their applications, as well as any required networking capabilities.
Business continuity (BC) covers both operational recovery (OR) and disaster recovery (DR). Operational recovery pertains to recovery from a specific problem at a primary site, such as a server, application, or disk failure. It may be absolutely critical, but it requires a fire-drill response rather than invoking (often with an official declaration of disaster) a widespread disaster recovery that affects all applications.
Very few recovery events are caused specifically by a disaster (thank heavens!). Instead, most are operational recoveries. While some of the DR infrastructure may be helpful in some cases for providing operational recovery, simply being able to recover data from a backup in the cloud may be all that is necessary.
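The distinction between operational recovery and disaster recovery can be sketched as a simple decision rule. The following is only a minimal illustration, not a formal method: the function name classify_recovery, its parameters, and the idea of a numeric outage threshold are all assumptions introduced here to make the definitions above concrete.

```python
from enum import Enum


class RecoveryType(Enum):
    OPERATIONAL = "operational recovery"
    DISASTER = "disaster recovery"


def classify_recovery(site_down: bool,
                      expected_outage_hours: float,
                      max_tolerable_outage_hours: float) -> RecoveryType:
    """Classify a recovery event using the definitions in this article.

    Hypothetical rule: DR applies only when the whole primary site is
    unavailable for longer than the organization can tolerate. Anything
    else (a failed server, application, or disk) is operational recovery.
    """
    if site_down and expected_outage_hours > max_tolerable_outage_hours:
        return RecoveryType.DISASTER
    return RecoveryType.OPERATIONAL


# A single failed disk at an otherwise healthy primary site.
print(classify_recovery(False, 2, 8).value)
# The primary site is expected to be down far longer than is acceptable.
print(classify_recovery(True, 72, 8).value)
```

In practice the threshold would come from each application's own recovery requirements, and declaring a disaster is usually a management decision rather than an automated one.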
Cloud Definitions And Data Protection
The National Institute of Standards and Technology (NIST) defines public cloud computing as cloud infrastructure provisioned for open use by the general public, existing on the premises of the cloud provider. In contrast, NIST defines private cloud as cloud infrastructure provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). "It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises," according to NIST.
Consider this definition in relation to data protection and a third party’s key role in providing a service, such as backup-as-a-service (BaaS), recovery-as-a-service (RaaS), or disaster recovery-as-a-service (DRaaS), even if no such label is used. That role implies a level of trust in a service provider that must be in place from the very beginning of a cloud engagement.
If an organization builds and hosts its own private cloud without service provider help, that is commendable, but it isn't very different from a traditional implementation. Some vendors, such as backup/recovery vendors, have products that can work across traditional, private, and public infrastructures, but our discussion of data protection in the cloud will focus on services provided by third parties. Keep in mind that the roles of the organization and a third party are not decoupled in a private cloud.
The concept of a “managed private cloud” addresses this problem. In a managed private cloud, a service provider supports specific services in a cloud for each organization individually. This is in contrast to multi-tenancy, where a public cloud provides multiple organizations with isolated access to the same pool of infrastructure.
Let’s say that an enterprise wants to consider what to do with cloud services. First, IT has to make an inventory of all workloads. What workloads are run in-house? Of those workloads, are any candidates for traditional outsourcing or moving to a cloud? If they move, their BC/DR requirements would move with them. If they do not move, what functions for backup, DR, and BC need to be performed for each application?
Rudyard Kipling famously memorialized the five W's of journalism (who, what, when, where, and why), plus how, as six "serving men" in a poem, and we can borrow this model for a non-journalistic purpose. Take each workload and fill in a table with the six related blanks; keep in mind that this is just a starting point for inquiry and not a full methodology.
Filling out the table for each workload provides you with a rough understanding of what you need and helps put you in the driver’s seat when dealing with vendors. They can have a good story, but unless you know what you want, you may become mesmerized by what they say.
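The six-question inventory described above could be captured in a simple record per workload. This is only a sketch under stated assumptions: the class name WorkloadEntry, the sample workload, and the example answers are hypothetical, and the comments on what each question might cover are one possible reading rather than a prescribed one.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class WorkloadEntry:
    """One row of a six-question workload inventory (illustrative only)."""
    workload: str
    who: str = ""    # assumed meaning: who manages protection (IT or a third party)
    what: str = ""   # assumed meaning: which functions are needed (backup, DR, BC)
    when: str = ""
    where: str = ""
    why: str = ""
    how: str = ""

    def unanswered(self) -> List[str]:
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self)
                if f.name != "workload" and not getattr(self, f.name).strip()]


entry = WorkloadEntry(
    workload="order-processing database",  # hypothetical workload
    who="third-party provider",
    what="backup and DR",
    where="managed private cloud",
)
print(entry.unanswered())  # ['when', 'why', 'how']
```

Listing the unanswered questions for each workload gives a quick view of where more investigation is needed before talking to vendors.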
The cloud is, of course, a hot topic in the IT world. But to figure out how data protection fits in the cloud, we have to distinguish among backup, disaster recovery, and business continuity and the parts or functions of the business they relate to. Otherwise, the terms can be bandied about to create confusion and unnecessary complexity.
Then we have to understand that data protection in the cloud is a service provided by a third party, through either a managed private cloud or a public cloud. Finally, we need a general understanding of whether the data protection requirements for each workload might fit in the cloud. That way, each organization has enough to get started in evaluating alternatives for data protection in the cloud. This is only a start towards understanding data protection in the cloud, but it should be enough to get you on your way.