Disk Archiving & Objects

Some of today's content-addressable storage systems have a relatively low limit on how many objects can be stored

George Crump

January 8, 2009

Network Computing

11:00 AM -- The safest prediction for 2009 is that, despite the economic slowdown, the requirement for more capacity in the data center will continue. The challenge is that the economics of continuing to store this data on expensive primary storage will be increasingly painful as the slowdown takes full grip.

Enter archiving -- specifically, disk archiving. Today's disk archive systems, like those from Nexsan Technologies Inc., Permabit Technology Corp., and EMC Corp. (NYSE: EMC), already overcome many of the limitations that led to the failure of the optical market, as I predicted in my entry, "Why Optical is Dead." They are also the most viable alternative to continuing to store old data on primary storage.

With my apologies to the archive suppliers, disk archives should be a sophisticated digital dumping ground. You want to easily put stuff in it (the dump part) so it is out of the way and forgotten until you need to find it (search). Ideally, it just sits there accepting data and protecting itself. The only interaction is to snap in more storage, which is added automatically.

Disk archives offer an ease of access and scalability that makes them very attractive for data centers looking to tighten their belts. However, as users look to move up to 80 percent of their data to the archive, disk archives must evolve to meet the increased storage demand.

Most archives are content-addressable storage (CAS) systems. Data is broken down at either a file level or a sub-file level and is "fingerprinted" with a unique ID. As a result, where the data physically resides on the CAS is irrelevant. Storing an identical file elsewhere on the CAS leads the system to create a pointer to the original data set rather than storing a second copy.

These fingerprints are called "objects," and some of today's CAS systems have a relatively low limit on how many objects can be stored on the system. They were designed for a single purpose -- email archiving or storing medical records, for example. As customers take full advantage of the easier-to-access nature of disk archiving and begin to store everything old on them, the amount of data stored on these archives will increase substantially, and so will the corresponding object count. Modern archive systems need to support a nearly limitless number of objects.
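To make the fingerprinting idea concrete, here is a minimal sketch of how file-level CAS deduplication works. The class and method names are hypothetical illustrations, not any vendor's actual API; real systems hash at the file or sub-file level and handle far more (replication, retention, self-healing), but the core pointer-instead-of-copy behavior looks roughly like this:

```python
import hashlib

class ContentAddressableStore:
    """Toy CAS: each unique blob is stored once, keyed by a
    fingerprint (hash) of its contents. Duplicate content gets a
    pointer to the existing object rather than a second copy."""

    def __init__(self, max_objects=None):
        self.objects = {}        # fingerprint -> data (the unique objects)
        self.references = {}     # logical name -> fingerprint (pointers)
        self.max_objects = max_objects  # some CAS systems cap object count

    def put(self, name, data: bytes) -> str:
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.objects:
            if self.max_objects is not None and len(self.objects) >= self.max_objects:
                raise RuntimeError("object limit reached -- a second archive would be needed")
            self.objects[fingerprint] = data   # store the first (and only) copy
        self.references[name] = fingerprint    # duplicates only add a pointer
        return fingerprint

    def get(self, name) -> bytes:
        return self.objects[self.references[name]]

store = ContentAddressableStore(max_objects=10)
store.put("report_v1.doc", b"quarterly numbers")
store.put("report_copy.doc", b"quarterly numbers")  # same content, no new object
```

After the two `put` calls above, the store holds two logical names but only one physical object, which is exactly why where the data lives on the CAS is irrelevant to the user.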

This is important because if you reach the limit on the number of objects that the archive can support, you need to implement a second archive. This is particularly painful in disk archiving: if the second archive cannot share object-level information with the first, redundant data will likely be stored, or retention policies on archive one will not match those on archive two. This object limitation also increases the time required to manage the system. And it certainly does not meet the requirement of establishing a single place to store all of your data until you need to retrieve it.
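The cost of splitting into a second archive can be sketched in a few lines. In this illustration (the variable names are mine, not any product's), two silos that cannot share fingerprints each end up keeping their own physical copy of identical content, so the deduplication benefit is lost at the boundary:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Content fingerprint, as a CAS would compute it."""
    return hashlib.sha256(data).hexdigest()

# Two independent archives that cannot share object-level information.
archive_one = {}   # fingerprint -> data
archive_two = {}

data = b"2008 year-end financials"

# The same content sent to each silo is physically stored twice,
# even though both silos compute the identical fingerprint:
archive_one[fingerprint(data)] = data
archive_two[fingerprint(data)] = data

total_copies = len(archive_one) + len(archive_two)  # two copies of one file
```

Had both names landed in a single archive, the shared fingerprint would have collapsed them into one stored object.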

So what's the number? When does object count become a problem and begin to affect performance, and how are vendors getting around this? My next entry will get into the details of this dirty little secret.

— George Crump is founder of Storage Switzerland, which provides strategic consulting and analysis to storage users, suppliers, and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.
