Data Storage Group Receives Patent For Innovative Data Deduplication Technology


January 24, 2011


Data Storage Group has been awarded a patent for its core deduplication technology, which the company says is better suited for data that is distributed across multiple servers. While most deduplication technologies work by breaking data into chunks of 8K to 256K and then building index files to track those chunks and determine whether they've been seen before, DataStor's Adaptive Content Factoring technology creates one index entry per file version. This results in up to two or three orders of magnitude fewer index entries. The technology is used in the enterprise version of the company's DataStor Shield product.

The patent helps validate a product that Darrin Tams says has met his needs. Tams, the enterprise server administrator for the St. Vrain Valley School District in Longmont, Colo., says he has been using the product for more than four years. The district has 30,000 staff members and students, and uses just two staff members to support up to 200 servers, according to Tams.

With the software, the district backs up nearly a terabyte of raw data from about 60 servers into a baseline of about 600GBytes, and maintains 30 days of backups, says Tams. (The remaining servers are used primarily for applications rather than data and don't need to be backed up.) The data consists of a combination of user documents, user profiles and some Microsoft SQL Server database data. There is also software, with add-on modules, that works with this data; the students use it for performance testing, and the same software and data setup is duplicated among the schools.

The big challenge with the way other companies perform deduplication is that a 1GByte file, broken up into 8K chunks, results in 128,000 index entries, says Brian Dodd, president and chief executive officer of Data Storage Group, which is also based in Longmont. "If you have to manage that volume of index entries across a broad network, you very quickly get overwhelmed," he says.
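As rough, back-of-the-envelope arithmetic (an illustration only, not a description of any vendor's actual index format), the Python sketch below compares the index-entry counts for Dodd's 1GByte example under chunk-based deduplication versus a one-entry-per-file-version approach:

```python
# Illustrative arithmetic only -- not DataStor's actual index structures.
# Compares index-entry counts for a 1GByte file under chunk-based
# deduplication (8K chunks) versus one index entry per file version.

CHUNK_SIZE = 8 * 1024        # 8K chunks, the small end of the range cited
FILE_SIZE = 1 * 1024**3      # the 1GByte file in Dodd's example

chunk_entries = FILE_SIZE // CHUNK_SIZE   # one index entry per chunk
version_entries = 1                       # one index entry per file version

print(f"Chunk-based index entries:      {chunk_entries:,}")   # 131,072 (~128,000)
print(f"Per-file-version index entries: {version_entries}")
print(f"Reduction factor:               {chunk_entries // version_entries:,}x")
```

For typical file sets, where most files are far smaller than 1GByte, the overall reduction lands closer to the two or three orders of magnitude the company cites.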

The architecture works reasonably well for a single target-based system because organizations can beef up that server with more processing power and memory, he says. "But to try to distribute that load across a network is essentially impossible." DataStor's indexing architecture, by contrast, distributes the index information across all the computers and volumes in the network, allowing users to run dedupe on hundreds of servers simultaneously, Dodd says.
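As a conceptual sketch of why per-file-version indexing distributes more easily, the example below assumes each server keeps only its own small index of file-version signatures and makes the dedupe decision locally, with no shared network-wide chunk index; it illustrates the idea rather than Adaptive Content Factoring's actual internals:

```python
# A conceptual sketch of source-side, per-server deduplication. Each agent
# keeps one signature per file version in a small local index, so hundreds
# of servers can run dedupe in parallel without a shared chunk index.
# Illustrative only; not DataStor's actual implementation.

import hashlib

class ServerDedupAgent:
    def __init__(self, name):
        self.name = name
        self.seen_versions = set()   # signatures of versions already backed up

    def backup(self, path, content: bytes):
        """Return True if this file version still needs to be transferred."""
        signature = hashlib.sha256(path.encode() + content).hexdigest()
        if signature in self.seen_versions:
            return False             # unchanged version: nothing to send
        self.seen_versions.add(signature)
        return True                  # new version: send it to backup storage

agent = ServerDedupAgent("file-server-01")
print(agent.backup("profiles/user1.dat", b"v1"))   # True  (first time seen)
print(agent.backup("profiles/user1.dat", b"v1"))   # False (already indexed)
```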

Dodd credits the technology in part to his background in high-performance computing with supercomputer vendor Cray Research. The technology also distributes the deduplication process across the network of computers, which the company says makes it less compute-intensive and offers higher performance and scalability. In addition, it includes a virtual file system that makes it easier for users to restore files and to access stored data for all managed points in time.
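The point-in-time restore feature can be pictured as a catalog with one entry per file version that reconstructs the file tree as of any requested moment. The sketch below, with hypothetical names like VersionCatalog and view_at, illustrates that general idea and is not DataStor Shield's actual interface:

```python
# A minimal sketch of a point-in-time "virtual file system" view, assuming a
# catalog that records one entry per file version (path, backup time, content
# reference). Illustrative only; not DataStor Shield's actual API.

from collections import defaultdict
from datetime import datetime

class VersionCatalog:
    def __init__(self):
        # path -> list of (backup_time, content_ref) entries
        self._versions = defaultdict(list)

    def record(self, path, backup_time, content_ref):
        """Record a new version of a file captured at backup_time."""
        self._versions[path].append((backup_time, content_ref))

    def view_at(self, point_in_time):
        """Return the file tree as it existed at the given point in time."""
        snapshot = {}
        for path, versions in self._versions.items():
            eligible = [v for v in versions if v[0] <= point_in_time]
            if eligible:
                snapshot[path] = max(eligible)[1]   # latest version at or before that time
        return snapshot

catalog = VersionCatalog()
catalog.record("docs/report.doc", datetime(2011, 1, 1), "ref-001")
catalog.record("docs/report.doc", datetime(2011, 1, 15), "ref-002")
print(catalog.view_at(datetime(2011, 1, 10)))   # {'docs/report.doc': 'ref-001'}
```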

The DataStor Shield product, which was first available in 2006 and was originally known as ArchiveIQ, has about 120 customers and costs $990 per server, or $250 per virtual machine.

See more on this topic by subscribing to Network Computing Pro Reports' 2010 Data Deduplication report.
