Library of Congress Groans Under Data Strain

US national archive looks to survive the digital data explosion

April 10, 2008


ORLANDO, Fla. -- If you think that your business is having a tough time coping with the data explosion, then spare a thought for the Library of Congress, which has to find some way of tackling a mind-blowing amount of information.

“The digital revolution is comparable to the one started by Gutenberg more than 500 years ago,” said Laura Campbell, the archive's associate librarian, referring to the first book printed with movable type.

In its 208-year history, the library has collected more than 138 million items in 450 languages, ranging from manuscripts to maps and sound recordings, but the Internet era poses a whole new set of challenges.

“We estimate that in the current digital age, the amount of information produced every 15 minutes is equivalent to all the data and information now in the Library of Congress,” explained Campbell during a keynote this morning. “The library can no longer collect everything.”

From TV shows to Web pages, geospatial images, and electronic documents, Campbell and her team have had to work out an entirely new preservation strategy for the Library of Congress. The library currently has more than 500 Tbytes of digital data stored within its infrastructure, split across three data centers and a plethora of different storage technologies. “We use all types of data storage: online, nearline, and tape,” she said. “About half of what we have is on nearline and online.”

Even with the falling cost of storage, the Library of Congress is still confronted with an almost incomprehensible volume of data, prompting officials to forge partnerships with a slew of government and commercial organizations.

The partners are helping the library design its storage systems and build specific preservation tools, and officials also need help deciding which digital data should be preserved and which should not.

“The partners bring a host of skills that are complementary, from academia to technology companies,” said Campbell. “Essentially, the whole is greater than the parts, if you will.”

Initiatives include "Preserving Creative America," a partnership with the Academy of Motion Picture Arts and Sciences. Another major preservation effort is the National Alliance for Content Stewardship, which includes 100 state and commercial partners and has already saved 300 Tbytes' worth of digital data. “This will increase to 650 Tbytes by 2013,” said Campbell, explaining that the library has also forged partnerships with Microsoft and with Google, which is also involved in an ambitious book digitization project with the University of Michigan.

While admitting that digital preservation “isn’t sexy,” the official urged storage vendors to help the library resolve its data challenges. “Third-party storage is a really important service for our network. I hope that some of you will get interested and contact me about getting involved.”

The Library of Congress technology guru was not the only speaker describing the challenges of digital preservation at SNW today. “I met with a large national library over in Europe and they have a series of content that they have stored using applications that no longer exist,” said Andy Monshaw, general manager of IBM’s system storage business, during another keynote.

“They are now looking at how they can get ahead of this. This is all about preservation and future-proofing.”


  • Google (Nasdaq: GOOG)

  • IBM Corp. (NYSE: IBM)

  • Microsoft Corp.
