IBM's Broad, Deep Data Protection And Retention Portfolio Strategy

That there is a data explosion is well-known. However, it is not simply an increase in volume for existing data and applications; it also involves new types of data for new applications. The resulting mix does not support all the same data protection and retention requirements businesses are used to, so a comprehensive portfolio of data protection and retention products and services is necessary to cover all requirements. IBM is one vendor that offers such a comprehensive portfolio.

David Hill

December 2, 2011


As background to understanding the needs for data protection and retention, let’s examine what is happening in the world of data. Although multiple data types have always existed, not all of them received equal emphasis in business:

  • Structured Data. The first big era in data revolved around structured data, most notably data residing in relational databases. This stage centered on online transaction processing (OLTP) applications, which automate manual business operational processes. Key here are revenue-generating applications, such as online order entry, and operational management applications, such as supply chain management, along with day-to-day operational applications, including accounts receivable, accounts payable and payroll.

    Traditional backup and restore applications have long been applied to these applications as a means of protecting vital corporate assets. Retention policies were not a major focus, as volumes were not oppressive and compliance was not much of an issue. These applications tended to be managed in house (although they could have been outsourced) and related to one company. WAN communications among multiple sites of the same company involved leased lines.

    That was then. Now these applications continue to grow, but life is more complex. Little, if anything, is totally private, as suppliers and customers may have access to some parts of an application, such as a supply chain. Complexity has also increased, as structured data applications may work in conjunction with semi-structured and unstructured data types in a hybrid environment, such as Web revenue-generating applications, as well as some medical applications. And, of course, compliance requirements have become stricter. Structured data is not going away, but now it shares the stage with other types of data.

  • Semi-structured Data. The second data era focused on semi-structured data. Now, semi-structured data can be searched--that is, the content can be examined and people can understand what is happening. Examples are emails, word processing documents and presentations. These could all be defined as files, but they represent only a part of all files (as unstructured data, which is bitmapped, can often be represented in a file structure as well).

    This might also be called the collaborative data era or era of data sharing. One does not create an email for individual use, but shares it with others. The same is true for a word processing document or presentation. Note that this sharing is not necessarily kept within the confines of one enterprise, but often occurs over a public network or the Internet. Although traditional backup and restore can be used to protect unstructured files, the policies may very well be different from those of OLTP applications. Now, multiple devices, such as smartphones and tablets, have profoundly increased the data protection complexity. Moreover, retention is a big deal not only for volume reasons, but also to make sure that e-discovery requirements are met.

  • Unstructured Data. The third data era now adds unstructured data to the mix. This is bitmapped data, such as audio and video recordings and medical images. This information cannot be sorted (as structured data can) or searched (as semi-structured data can), but rather has to be perceived through human senses or scanned by software solutions designed to find specific sorts of information. Much of this data has been in analog form and is now being digitized. Vendors love this, as the volume of data storage required increases tremendously, but customers love it, too, as information is easier to retrieve, to share (especially across distance) and to use (show me all the medical images in the past year for a particular patient).

    Unstructured data also tends to be fixed data. That means that, although copies have to be made to preserve the information, the traditional approach of daily backup as used for constantly updated OLTP applications is not necessary. Moreover, unstructured data is a great candidate for active archiving because, except during the creation process, such as a movie being filmed, the data is fixed.

    On top of the data explosion in all three major types of data, the overall complexity is increasing. Structured data is used for day-to-day (operational) business processes. Semi-structured data is used for interpersonal communications to facilitate working together. Unstructured data represents intellectual property, such as videos, and information assets that have an intrinsic value as standalone items, such as a medical image. Complexity is added by commingling different data types in a hybrid arrangement, by the different needs for presenting information on different devices (such as a newspaper on a smartphone, a tablet and a laptop computer), and by how and where the data is stored (various flavors of cloud are now in the mix).

    On top of that, there is a revolution (finally) of not just performing operational processing, but also of doing decision-making under the rubric of analytics. Analytics has suddenly emerged from its dark ages, and big data is the temptress that has added fuel to the analytics fire.

    What has all this meant for the world of data protection and retention? The simple answer is a lot. Traditional tools still have their place, but we have heard about the growing importance of disk-to-disk replication (such as a virtual tape library) because of, among other things, the need to restore data more quickly after a problem arises.

    But the sheer volume of information means that we also have to get rid of data that no longer has a business value and can be legally deleted, and that we have to be able to keep growing amounts of fixed data that no longer has high response-time requirements on more cost-effective media. This is driving a much-needed, but not high-visibility, IT trend called active archiving.
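    To make the active-archiving idea concrete, here is a minimal sketch of the policy decision just described. The function name, thresholds and tier labels are hypothetical illustrations, not any vendor's actual implementation: data past its retention period with no legal hold is deleted, fixed data that is rarely accessed moves to cheaper media, and everything else stays on primary disk.

```python
from datetime import date, timedelta

# Hypothetical active-archiving policy sketch (illustrative names and
# thresholds): delete what is legally deletable and valueless, archive
# fixed data that is rarely touched, keep the rest on primary disk.
def placement(last_modified, last_accessed, retention_days, legal_hold, today):
    age = (today - last_modified).days
    idle = (today - last_accessed).days
    if age > retention_days and not legal_hold:
        return "delete"      # no business value, legally deletable
    if idle > 90:            # fixed data, infrequently accessed
        return "archive"     # cost-effective media, e.g., tape
    return "primary"         # keep on fast disk

today = date(2011, 12, 2)
print(placement(today - timedelta(days=400),
                today - timedelta(days=200), 365, False, today))
```

    A litigation hold overrides deletion: held data falls through to the archive or primary tiers until the hold is released.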

    All the preceding was a lot of background, but it was necessary to convey a sense of why traditional data protection and retention strategies are no longer sufficient in and of themselves, but need to be complemented by emerging and maturing technologies. That is why a broad and deep data protection portfolio, of which IBM’s is a prime illustration, is essential.

    Of course, IBM offers tried-and-true approaches to data protection, such as Tivoli Storage Manager, which has also evolved to meet changing needs. But one area where particular change has come to the backup and restore process is the use of disk-to-disk instead of disk-to-tape approaches at the front end of the backup process.

    The discussion has tended to focus on data deduplication, which uses backup and restore storage media efficiently, and that is fine. But the key advantage here is the ability to do faster restores. One of the key approaches to disk-to-disk backup and restore is the use of virtual tape libraries (VTLs).
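    For readers unfamiliar with the mechanics, here is a minimal sketch of fixed-block deduplication. It is a generic illustration, not ProtecTIER's actual algorithm (which uses its own similarity-detection technology): each block is stored once, keyed by its hash, so repeated blocks add only a reference and backup storage grows with unique data rather than total data.

```python
import hashlib

# Generic fixed-block deduplication sketch (not any vendor's algorithm):
# identical blocks are stored once; duplicates only add a reference.
def dedup_store(data, block_size=4096, store=None):
    store = store if store is not None else {}
    refs = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep block only if unseen
        refs.append(digest)              # always record a reference
    return refs, store

backup = b"A" * 8192 + b"B" * 4096       # three blocks, two unique
refs, store = dedup_store(backup)
print(len(refs), len(store))             # 3 references, 2 stored blocks
```

    Restores rebuild the stream by following the references, which is why restore speed depends heavily on how the unique blocks are laid out on disk.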

    IBM’s answer to these scenarios is ProtecTIER, which supports both open systems and the mainframe. Customer production systems range from 4 Tbytes at the low end to more than 12 petabytes at the high end. Now, I don’t know about you, but I have difficulty visualizing a petabyte of data, let alone 10 petabytes or more, yet that is the way the world is going. At both ends of the spectrum (but more so at the high end), high availability is important, and IBM uses clustering to ensure that ProtecTIER supports that requirement. ProtecTIER has also long had strong data deduplication capabilities, and IBM claims the system supports high performance for restores. IBM counts a few thousand units in the field, and reports that terabyte shipments continue to beat existing records, which would seem to confirm that ProtecTIER is doing well.

    Now, physical tape (and that includes tape libraries and tape drives, as well as media) would seem to qualify as old-school data protection rather than new school. Well, yes, but most data being created and protected today tends to be semi-structured and unstructured, and tape can be used to protect copies of that type of information as well as serve its traditional role in OLTP applications. Think about it. Can organizations afford to store copies of big data for data protection on anything but tape? Elsewhere, I have addressed the issue that tape is more cost-efficient at larger volumes than disk.

    IBM claims that it is the worldwide leader in revenues for total tape as well as tape drives and tape automation. Quite frankly, no one is likely to challenge those assertions. That does not mean that disk-to-disk has not made some inroads in the tape market, but IBM feels that a combination of disk and tape as part of a tiered architecture will continue to serve the needs of many enterprise customers.

    For example, one outsourcer determines which storage tier (disk or tape) should be used based upon service-level agreements (SLAs). Data with stringent SLAs is stored on ProtecTIER, but later may be migrated to physical tape and then moved offsite. IBM introduced both LTO-5 tape drives for open systems and the TS1140 enterprise tape drive this year.
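    The outsourcer's SLA-driven tiering can be sketched as a simple policy function. The thresholds and tier names below are hypothetical illustrations of the approach, not the outsourcer's actual rules: a stringent restore-time objective keeps data on the disk-based VTL, while aged data with relaxed SLAs migrates to physical tape and then moves offsite.

```python
# Hypothetical SLA-driven tier selection (illustrative thresholds):
# stringent restore SLAs keep backups on a disk-based VTL for fast
# restores; older data with relaxed SLAs migrates to physical tape.
def choose_tier(restore_sla_hours, days_since_backup):
    if restore_sla_hours <= 4:
        return "vtl-disk"        # stringent SLA: stay on the VTL
    if days_since_backup > 30:
        return "tape-offsite"    # aged data: migrate to tape, ship offsite
    return "tape-onsite"         # relaxed SLA, still recent

print(choose_tier(2, 1))     # stringent SLA stays on disk
print(choose_tier(24, 45))   # relaxed SLA, aged data goes offsite
```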

    Retention policies are taking on more and more importance. One reason is that data should be kept on the most cost-effective media possible, as no enterprise likes to spend large sums of money unnecessarily on more expensive media. Compliance needs, including e-discovery, are also becoming more important.

    IBM has long had a DR550/450 information archiving solution for these applications and also offers physical data protection, such as non-erasable and non-rewritable solutions for data that has to be preserved, as well as data shredding on deletion solutions for information that can (and should) be deleted and whose deletion must be absolutely confirmed.

    Standard production systems cannot provide the mechanisms, optionally available in archiving solutions, that physically enforce the policies needed to meet legal requirements, such as litigation holds to ensure that key information remains legally available as required, and audit trails to prevent spoliation of data (which would render it legally unusable, a bad outcome since fines or sanctions could arise). Overall, we believe archiving is going to take on greater and greater importance in every sort of business.

    To the surprise of many and the consternation of some, physical tape has a growth market ahead of it. Simple economics dictate that the majority of the data explosion cannot be stored in a financially sound manner exclusively on disk, but rather requires a blend of disk and tape, with tape delivering the lower cost.

    Tape has had a manageability problem in that it could not be managed as if it were disk. (Files are stored and accessed sequentially on tape, as opposed to being accessed randomly on disk.) The introduction of LTFS (Linear Tape File System), which IBM invented, changed all that. In effect, anything that you can do with disk, you can now do with tape from a manageability perspective. The caveat is that tape still has to be read sequentially, which introduces a latency factor. However, with disk caching at the front end, fixed-content data that is infrequently accessed does not need the high response times required for interactive applications.
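    The manageability point is that once a cartridge is mounted through LTFS, applications address tape with ordinary file-system calls. The sketch below illustrates this; the mount path is hypothetical, and a local directory stands in for the mount so the example runs without a tape drive. On real LTFS the write lands sequentially on tape and the first read pays a seek-latency penalty.

```python
import os

# Simulated LTFS mount point: a local directory stands in for a path
# like /mnt/ltfs where an LTFS-mounted cartridge would appear.
mount_point = "/tmp/ltfs_demo"
os.makedirs(mount_point, exist_ok=True)

# Write and read a file exactly as on disk; no tape-specific API is
# needed, which is the manageability gain LTFS provides.
path = os.path.join(mount_point, "archive_clip.bin")
with open(path, "wb") as f:
    f.write(b"fixed-content payload")
with open(path, "rb") as f:
    print(f.read())
```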

    Moreover, this type of use of tape makes digitization economically more attractive for a number of applications. These include film digitization and productivity, digital video surveillance, electronic discovery and medical (high-resolution storage of large medical images). Note that this approach is especially useful in the cloud, which has huge volumes, as well as for migrating information from one cloud to another by physical transportation rather than tying up huge amounts of network bandwidth. Tape will be the home for a lot of active archive information (and, dare we say it, big data will be an especially compelling example).

    Note that IBM was recognized with a technical Emmy Award for its work with LTFS. IBM is also building an ecosystem of suppliers, like FOR-A, which is integrating LTFS into its own products, as well as archive storage companies like Arkivum, which is using LTFS to enhance archiving services.

    Every now and then we hear cries about information overload. That certainly seems to be the case today, especially if one takes the perspective that data alone can create a sense of information overload. Today’s data explosion contains the familiar structured data of operational processes, the semi-structured data that we deal with every day (from texting to email to document creation) and unstructured information, which we have always had, such as videos and medical images, but are now taking on greater importance. Plus, traditional approaches to data protection and retention have had to be supplemented and complemented by new technologies, as well as extensions to scalability and functionality of older applications.

    IBM is right in the middle of this maelstrom of change. Its ProtecTIER VTL plays a key role in the rapidly growing disk-to-disk backup and restore market, but old standby physical tape continues to play a key role as part of a tiered data protection architecture. On the retention side, active archiving is playing a critical role. Storing data for the long term as cost-efficiently as possible is critical, but so is being able to manage that data for the ever-growing compliance needs.

    IBM’s Information Archive pre-configured appliances for archiving put the company’s best foot forward to meet those types of requirements. In addition, the maturation of products that can take advantage of LTFS means that tape will have a greater and greater role in storing humongous volumes of data efficiently, while at the same time providing access as necessary to provide ongoing business value from the preservation of those long-term information assets.

    Effectively meeting data protection and retention requirements today demands a much broader and deeper portfolio than ever before. IBM realizes that point, and has assembled a wide range of offerings to satisfy its customers’ needs.

    IBM is currently a client of David Hill and the Mesabi Group.
