Network Computing is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

De-Duping VM images

We've all suffered, from time to time, from the law of unintended consequences. Your company buys you a shiny new smartphone, and now you're expected to reply to email in the middle of the first game at the new Yankee Stadium. You upgrade to new, faster tape drives and discover that your Exchange backups take longer, because the server can't keep up and the drive is shoe-shining (repeatedly stopping, rewinding, and restarting because data isn't arriving fast enough to keep the tape streaming).

Occasionally, however, the computing gods bless us, and instead of unintended consequences we get unexpected rewards. Such is the case when we combine three of the hottest technologies on the market today: data de-duplication, virtualization, and extended read caching using RAM and flash memory.

Virtual machine images in most organizations, regardless of the hypervisor the organization chooses, contain lots of duplicate data. Fifty Windows VMs will each have 2 GB to 4 GB of common DLLs and other system files that any decent de-duplication scheme can reduce by at least 90 percent.
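To put rough numbers on that claim, here is a minimal sketch of fixed-block de-duplication. It assumes 4 KB blocks and SHA-256 fingerprints (real products differ in block size, hashing, and whether blocks are fixed or variable), and the VM images are hypothetical synthetic data built by `make_image`, not real Windows installs.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products use various (and variable) sizes


def fingerprint_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split an image into fixed-size blocks and return a SHA-256 fingerprint per block."""
    return [
        hashlib.sha256(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    ]


def dedup_counts(images: list[bytes]) -> tuple[int, int]:
    """Return (logical blocks written, unique blocks actually stored) across all images."""
    logical = 0
    stored: set[str] = set()
    for image in images:
        prints = fingerprint_blocks(image)
        logical += len(prints)
        stored.update(prints)  # a block already in the store costs only a pointer
    return logical, len(stored)


def make_image(vm_id: int, shared_blocks: int, private_blocks: int) -> bytes:
    """Hypothetical synthetic VM image: common 'system file' blocks plus per-VM data."""
    def block(label: str) -> bytes:
        # Repeat a 32-byte digest to fill one block with deterministic content.
        return hashlib.sha256(label.encode()).digest() * (BLOCK_SIZE // 32)

    shared = [block(f"system-{i}") for i in range(shared_blocks)]
    private = [block(f"vm{vm_id}-data-{j}") for j in range(private_blocks)]
    return b"".join(shared + private)


if __name__ == "__main__":
    images = [make_image(vm, shared_blocks=95, private_blocks=5) for vm in range(50)]
    logical, unique = dedup_counts(images)
    print(f"logical={logical} unique={unique} reduction={1 - unique / logical:.1%}")
```

With 50 synthetic images of 100 blocks each, 95 of them shared, the sketch finds 5,000 logical blocks but only 345 unique ones, roughly a 93 percent reduction. The same arithmetic applied to the figures above: 50 VMs carrying 3 GB of common files apiece hold 150 GB of that data, and storing a single 3 GB copy cuts it by 98 percent.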

When I first started thinking about de-duping VM images, I assumed that condensing 50 or more virtual machine images, especially in VDI-style desktop environments, would create an I/O hot spot as all those VMs tried to access the same data. While researching an article on primary storage data reduction for next week's issue of InformationWeek, I realized, with a little help from some vendors, two truths about frequently accessed, de-duped data such as VM images.

The first is that de-duped data is, almost by definition, static data that is heavily read but rarely written. As soon as someone changes one copy of de-duped data, that copy ceases to be de-duped. The second is that, because de-duplication increases how often any given data block is accessed, it also increases the probability that the block will be cached. Combine de-dupe with a big read cache and you could really boost performance.
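The caching argument can be illustrated with a small simulation. This is a hedged sketch, not any vendor's caching algorithm: it assumes a simple LRU read cache and a hypothetical "boot storm" in which every VM reads the same 100 system blocks in lockstep. Without de-dupe, each VM's copy of a block sits at its own physical address; with de-dupe, all 50 reads of a block resolve to one physical block.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity read cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, physical_block) -> None:
        if physical_block in self.entries:
            self.entries.move_to_end(physical_block)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # in a real system this read would go to disk
            self.entries[physical_block] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used block

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate_boot_storm(num_vms: int, blocks_per_vm: int, cache_blocks: int, dedup: bool) -> float:
    """Hypothetical workload: every VM reads the same logical blocks; return the hit rate."""
    cache = LRUCache(cache_blocks)
    for block in range(blocks_per_vm):
        for vm in range(num_vms):
            # With de-dupe, identical blocks from every VM map to one physical block;
            # without it, each VM's copy lives at its own physical address.
            physical = block if dedup else (vm, block)
            cache.read(physical)
    return cache.hit_rate


if __name__ == "__main__":
    for dedup in (False, True):
        rate = simulate_boot_storm(num_vms=50, blocks_per_vm=100, cache_blocks=200, dedup=dedup)
        print(f"dedup={dedup}: hit rate {rate:.0%}")
```

The de-duped run hits the cache 98 percent of the time, because only the first VM's read of each block misses; the run without de-dupe never hits, since no physical block is read twice. The sketch ignores writes; in a real system a write to a shared block would typically be redirected to a new private block (copy-on-write), taking that copy out of the de-duped pool.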
