11:00 AM -- Although data de-duplication is still a relatively new technology and its full benefits remain to be seen, we've talked to enough customers actually doing it in production to say one thing for sure: as with automobiles, your mileage will vary. (See De-Dupers Lining Up and De-Dupers Demand Disk Mindset.)
Mileage in this case means compression ratio. And people who have taken de-dupe for a ride know enough not to take vendor claims of large compression ratios at face value, but to verify the numbers for themselves. They're also smart enough to know that even if they're a long way from the 300-1 or whatever they were promised, 20-1 or even 12-1 isn't so bad. (See Dealing With De-Dupe Doubts.)
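Why isn't 20-1 so bad? A little back-of-the-envelope arithmetic (our illustration, not from any vendor's figures) shows that the capacity savings flatten out quickly as the ratio climbs:

```python
def space_saved_pct(ratio):
    """Percent of raw capacity saved at a given de-dupe ratio (e.g. 20 for 20-1)."""
    return (1 - 1 / ratio) * 100

# Compare the headline claim with the more modest real-world ratios.
for r in (300, 20, 12):
    print(f"{r}-1 ratio -> {space_saved_pct(r):.1f}% of raw capacity saved")
# 300-1 saves 99.7% of capacity; 20-1 still saves 95.0%, and 12-1 saves 91.7%.
```

In other words, the difference between the promised 300-1 and an achieved 20-1 is less than five points of actual capacity saved.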
Vendor claims -- like benchmark results such as SPEC SFS for NAS systems -- may be attainable, but only under optimal conditions that you'll rarely, if ever, see in production environments. But just what does affect the ratios?
According to analyst Greg Schulz of The StorageIO Group, factors include how long you've been running de-duplication -- some ratios improve as more data is processed -- the type of application you're de-duping, how you have the application tuned and configured, and the size of the data files.
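The "ratios improve as more data is processed" effect is easy to see with a toy model. The sketch below (a simplified fixed-block scheme of our own, not any vendor's algorithm) hashes fixed-size blocks and counts only unique ones; because successive backups repeat the same blocks, the ratio climbs with every run:

```python
import hashlib

def dedupe_ratio(backups, block_size=8):
    """Toy fixed-block de-dupe: ratio of raw bytes to unique-block bytes stored."""
    seen = set()   # fingerprints of blocks already stored
    raw = 0        # total bytes ingested before de-dupe
    for data in backups:
        raw += len(data)
        for i in range(0, len(data), block_size):
            seen.add(hashlib.sha256(data[i:i + block_size]).digest())
    return raw / (len(seen) * block_size)

nightly = bytes(range(64))  # stand-in for one night's backup image
for n in (1, 5, 20):
    print(f"after {n} identical backups: {dedupe_ratio([nightly] * n):.0f}-1")
# The ratio equals the number of repeated backups: 1-1, then 5-1, then 20-1.
```

The same model also hints at why application type and file size matter: data with little repetition, or blocks that shift alignment between runs, would add new fingerprints each night and hold the ratio down.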
Schulz advises de-dupe shoppers to press vendors on how they arrived at their advertised compression rates. "Understanding how the vendor derives their claims, how they performed benchmark tests and how those tests relate to your environment and to your data types is very important to set your level of expectation," he says.