U of Michigan Reveals Google-Based Digitization Project

University to build massive digital library with help from Google, Isilon

April 8, 2008


The University of Michigan has revealed details of an ambitious, six-year, 400-Tbyte book digitization and storage project involving Google and Isilon.

The Michigan Digitization Project will make digital copies of the university's 7.5 million books, which will be stored in a massive clustered storage system from Isilon. The school will link its work with Google's Book Search project.

Launched in 2006, Google Book Search aims to create a virtual card catalogue of all books in all languages. Designed to work like a typical Web search, Book Search also provides links to digitized versions of older books that are not subject to copyright.

“The Google Book Search side of this is hugely ambitious -- by working with Google we can digitize millions of volumes and get past the problem of books crumbling on the shelves,” says John Wilkin, associate librarian at the University of Michigan.

Wilkin explains that Google will scan the university’s books, sending one copy back to Michigan and keeping a second copy for its Book Search. Each digitized book is approximately 55 Mbytes in size, downloadable at a rate of 3 Mbytes per second, 24 hours a day, seven days a week. “Without Google’s support this never would have happened,” says Wilkin, explaining that, on its own, the university could only scan around 15,000 books a year.
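
The figures quoted in this story can be cross-checked with some back-of-the-envelope arithmetic. The Python sketch below is an illustration only, not anything the university or Google has published: it assumes decimal units (1 Tbyte = 1,000,000 Mbytes), that every one of the 7.5 million books comes in at the 55-Mbyte average, and that the 3-Mbyte-per-second rate is sustained around the clock for the whole collection. The constant and function names are made up for the example.

```python
# Rough capacity and transfer-time estimate from the figures in the article.
# Assumptions (not from the source): decimal units, a uniform 55-Mbyte book
# size, and a sustained 3 Mbytes/s transfer rate with no downtime.

BOOKS = 7_500_000          # volumes in the university's collection
MB_PER_BOOK = 55           # approximate size of one digitized book
RATE_MB_PER_S = 3          # quoted download rate
MB_PER_TB = 1_000_000      # assumption: decimal Tbytes
SECONDS_PER_DAY = 86_400


def total_terabytes(books: int, mb_per_book: float) -> float:
    """Total storage needed for one copy of the collection, in Tbytes."""
    return books * mb_per_book / MB_PER_TB


def transfer_days(books: int, mb_per_book: float, rate_mb_per_s: float) -> float:
    """Days needed to move the whole collection at a constant rate."""
    seconds = books * mb_per_book / rate_mb_per_s
    return seconds / SECONDS_PER_DAY


total_tb = total_terabytes(BOOKS, MB_PER_BOOK)
days = transfer_days(BOOKS, MB_PER_BOOK, RATE_MB_PER_S)

print(f"Estimated storage: {total_tb:.1f} Tbytes")
print(f"Estimated transfer time: {days:.0f} days ({days / 365:.1f} years)")
```

Under those assumptions, a single copy of the collection works out to about 412 Tbytes, in line with the 400-Tbyte figure the university cites, and moving it at 3 Mbytes per second would take a little under 1,600 days, which fits inside the six-year timeline.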

In an attempt to store the current influx of digital data, Wilkin and his team deployed 200 Tbytes worth of clustered storage from Isilon last fall. The 32 IQ 9000 and EX 9000 systems are split between the University of Michigan’s main data center in Ann Arbor and a disaster recovery site in Bloomington, Ind., linked by Isilon’s SyncIQ replication software.

“We want to ensure that this body of cultural heritage will be around for a long time,” says Wilkin, explaining that the library’s collection includes a rare edition of Chaucer’s Canterbury Tales and collections of early twentieth-century art monographs.

The Isilon hardware replaced a mixture of RAID systems from different vendors, according to Wilkin. “It’s a whole scaling thing -- when you get into hundreds of Tbytes for a single repository, you need good storage management,” he says. “Even with our best RAID systems, in the past, we would have been putting out fires all the time.”

The Wolverines’ RAID systems have now been deployed elsewhere within the University’s IT infrastructure, and Wilkin is already looking to expand the Isilon cluster. Michigan and its development partner, Indiana University, will open the digitization project up to other institutions within the Big Ten conference. “As more libraries come into this, we are likely to grow beyond 400 Tbytes,” says Wilkin. “It’s going to be on a continual basis, we will be adding things annually, maybe semi-annually.”

Wilkin was less forthcoming on the specific value of the University of Michigan’s contract with Isilon, although he estimates that the Michigan Digitization Project will cost around $1.1 million a year. “That’s in terms of the hardware, site, and electricity costs, but that doesn’t involve any of the staff costs.”

It's not just the University of Michigan that's ramping up its storage efforts at the moment. Houston-based Rice University today announced details of a deal to deploy 66 Tbytes of Isilon IQ-9000 and IQ-200 hardware as a digital repository for its Shepherd School of Music and the James Baker Institute for Public Policy.

Have a comment on this story? Please click "Discuss" below. If you'd like to contact Byte and Switch's editors directly, send us a message.

  • Google (Nasdaq: GOOG)

  • Isilon Systems Inc.
