We've all suffered from time to time from the law of unintended consequences. The company buys you a shiny new smartphone, and now you're expected to reply to emails in the middle of the first game at the new Yankee Stadium. You upgrade to new, faster tape drives and discover your Exchange backups take longer because the server can't keep up and the drive is shoe-shining.
Occasionally, however, we get blessed by the computing gods and rather than being stuck with unexpected consequences we get unexpected rewards. Such is the case when we combine three of the hottest technologies in the market today -- data de-duplication, virtualization, and extended read caching -- using RAM and flash memory.
Virtual machine images in most organizations, regardless of the hypervisor the organization chooses, contain lots of duplicate data. Fifty Windows VMs will each have 2 GB to 4 GB of common DLLs and other system files that any decent de-duplication scheme can reduce by at least 90 percent.
When I first started thinking about de-duping VM images, I thought condensing 50 or more virtual machine images -- especially in VDI-style desktop VM environments -- would create an I/O hot spot as all those VMs tried to access the same data. As I researched an article on primary storage data reduction for next week's issue of InformationWeek, I realized, with a little help from some vendors, two truths about frequently accessed, de-duped data like VM images.
The first is that de-duped data is, almost by definition, static data that's heavily accessed for reads. As soon as someone changes one copy of de-duped data, that copy ceases to be de-duped. The second is that, since de-duplication increases how often a given data block is going to be accessed, it also increases the probability that block will be cached. Combine de-dupe with a big read cache and you could really boost performance.
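To see why that second truth holds, here's a minimal sketch -- a toy simulation, not any vendor's implementation -- comparing cache hit rates with and without de-duplication. It assumes a simple LRU cache and a made-up workload: 50 VMs, each with 1,000 blocks, 90 percent of which are common system files. All the parameter values are illustrative assumptions.

```python
import random
from collections import OrderedDict

class LRUCache:
    """A toy LRU read cache that counts hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block):
        if block in self.store:
            self.store.move_to_end(block)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.store[block] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

def simulate(num_vms, blocks_per_vm, shared_fraction, cache_size,
             accesses, dedup, seed=42):
    """Return the cache hit rate for a random read workload.

    With dedup=True, the 'shared' blocks of every VM map to one
    common copy; without it, each VM reads its own private copy.
    """
    rng = random.Random(seed)
    shared = int(blocks_per_vm * shared_fraction)
    cache = LRUCache(cache_size)
    for _ in range(accesses):
        vm = rng.randrange(num_vms)
        b = rng.randrange(blocks_per_vm)
        if dedup and b < shared:
            key = ("shared", b)   # one copy serves all 50 VMs
        else:
            key = (vm, b)         # each VM has its own copy
        cache.access(key)
    return cache.hits / accesses

# Hypothetical workload: 50 VMs x 1,000 blocks, 90% shared,
# a cache holding 5,000 blocks, 200,000 random reads.
without_dedup = simulate(50, 1000, 0.9, 5000, 200000, dedup=False)
with_dedup = simulate(50, 1000, 0.9, 5000, 200000, dedup=True)
print(f"hit rate without dedup: {without_dedup:.2f}")
print(f"hit rate with dedup:    {with_dedup:.2f}")
```

Without dedup the working set is 50,000 distinct blocks, far bigger than the cache, so most reads miss; with dedup the same workload collapses to a few thousand unique blocks, the shared ones get hit 50 times as often, and the hit rate jumps -- which is the whole point of pairing de-dupe with a big read cache.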