Expanding Role Of Data Deduplication
May 25, 2010
Data volumes continue to explode: Of the 437 business technology professionals InformationWeek Analytics surveyed for our data deduplication report (available free for a limited time at dedupe.informationweek.com), more than half manage more than 10 TB of data, compared with just 10% who control less than 1 TB. Seven percent manage between 201 TB and 500 TB, and 8% are charged with wrangling more than 500 TB of data. These massive volumes may be a recent development--25% of the 328 business technology pros we surveyed for our 2009 InformationWeek Analytics State of Storage Survey managed less than 1 TB of data--but all indications point to this level of growth being the new normal.
The applications most responsible for the data deluge include the usual suspects: Enterprise databases and data warehouse apps (33%) and e-mail (23%) are cited most in our survey. Rich media, mainly voice and video, was cited by just 16%, but we think the recent surge in voice and video applications will put increasing demands on storage. And yes, we've been warned before about huge looming increases in video traffic that never materialized. But there are good reasons to believe this time may be different, given an increased focus on telecommuting and multimedia. In addition, the American Recovery and Reinvestment Act aims to have up to 90% of healthcare providers in the United States using electronic medical records by 2020. That's a potential tsunami of high-value, regulated--and huge--files.
As more companies jump on the fast track to petabyte land, a variety of vendors have emerged with technologies and management approaches aimed at helping us more efficiently administer large storage pools while lowering costs and increasing security. In our survey on data deduplication, we asked IT pros about their use of some of these technologies, including compression, disk-to-disk-to-tape backups, encryption, virtual tape libraries, thin provisioning, massive array of idle disks (MAID), and data deduplication specifically. Of those, compression is the most commonly used, with 64% of respondents employing the technology in their environments. Survey results show relatively low current adoption rates for data deduplication, with just 24% of respondents using the technology. However, the good news is that 32% are evaluating dedupe, and just 10% say definitively that they won't consider adoption. Only 17% of respondents have deployed thin provisioning, while 15% say they flat out won't; and only 12% say they have deployed MAID, while 17% say they won't.
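To see why deduplication can shrink capacity requirements so dramatically, consider the basic mechanism: data is split into blocks, each block is fingerprinted with a cryptographic hash, and only blocks with previously unseen fingerprints are actually written to disk. The sketch below is a toy illustration of that idea using fixed-size chunking--the class name, block size, and API are our own invention, and real products (which often use variable-size chunking and handle hash collisions, metadata, and garbage collection) are far more sophisticated.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size chunking; commercial dedupe often uses variable-size chunks


class DedupeStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes (unique blocks only)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if we haven't seen this fingerprint before.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        # Reassemble the file by following its list of block references.
        return b"".join(self.blocks[h] for h in self.files[name])

    def stored_bytes(self):
        # Physical capacity consumed, versus the logical size of all files.
        return sum(len(b) for b in self.blocks.values())
```

Writing two identical 8 KB files to this store consumes only 4 KB of physical space, because all four logical blocks share one fingerprint--the same effect, at scale, that lets dedupe appliances claim large reduction ratios on backup data full of repeated content.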
We found the low adoption rates for these three promising technologies surprising, because business as usual is no longer a realistic option. The price of storage in the data center isn't limited to hardware: Escalating power and cooling costs and scarce floor space pose a serious challenge to the "just buy more disks" approach. These three technologies could enhance a well-designed storage plan and--along with increasing disk/platter densities, larger disk drives, and faster-performing drives such as solid-state disks--reduce storage hardware requirements.
Of course, compatibility with legacy systems is always an issue. McCarthy Building, a St. Louis-based construction firm with $3.5 billion in annual revenue, uses SATA disks in dual-parity RAID configurations for its Tier 2 storage (more on tiers later). "We replicate production data to a remote site on the same storage," says Chris Reed, director of infrastructure IT. "We deduplicate everywhere we can, especially since the cost is still $0 from NetApp and we haven't seen a performance downside."