Network Computing is part of the Informa Tech Division of Informa PLC

Storing Archival Data - Part Deux: Page 4 of 5

Data integrity assurance goes hand in hand with retention enforcement. It
would be bad to retrieve a document from the archive only to discover that
it's corrupted, and that the critical paragraph that would prove the company
followed all the rules, and that the CEO shouldn't be wearing an orange
jumpsuit, is now gibberish. Data objects should be hashed going into the
archival store, and the storage system should check data against these hashes
periodically and on retrieval. If the hashes don't match, the system should
retrieve another copy.
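
The hash-on-ingest, verify-on-read and periodic-check cycle described above can be sketched roughly as follows. This is a minimal illustration, not how any particular archive product works: the `ArchiveStore` class, its in-memory "replicas" and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ArchiveStore:
    """Toy in-memory archive (hypothetical); each replica maps object id -> bytes."""

    def __init__(self, replica_count: int = 2) -> None:
        self.replicas = [dict() for _ in range(replica_count)]
        self.digests = {}  # object id -> hash recorded at ingest

    def put(self, object_id: str, data: bytes) -> None:
        # Hash once on the way in, then write every replica.
        self.digests[object_id] = sha256_hex(data)
        for replica in self.replicas:
            replica[object_id] = data

    def get(self, object_id: str) -> bytes:
        # Verify on retrieval; fall through to the next copy on a mismatch.
        expected = self.digests[object_id]
        for replica in self.replicas:
            data = replica.get(object_id)
            if data is not None and sha256_hex(data) == expected:
                return data
        raise IOError(f"no intact copy of {object_id!r}")

    def scrub(self) -> list:
        # Periodic check: report (replica index, object id) for every bad copy.
        bad = []
        for index, replica in enumerate(self.replicas):
            for object_id, expected in self.digests.items():
                data = replica.get(object_id)
                if data is None or sha256_hex(data) != expected:
                    bad.append((index, object_id))
        return bad
```

A real system would run something like `scrub()` on a schedule and repair the bad copies it finds from a good one; the sketch only reports them.
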
That, of course, implies the system should store multiple independent copies,
preferably in multiple locations. This can be done through data
scatter-and-gather technology like Cleversafe's or through simple replication
between multiple systems. Policies should allow admins to specify "keep x
copies in each of y locations."
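
A "keep x copies in each of y locations" policy might be represented and checked along these lines. The `ReplicationPolicy` type, the `missing_copies` helper and the location names are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    copies_per_location: int  # the "x" in the policy
    locations: tuple          # the "y" locations, by name


def missing_copies(policy: ReplicationPolicy, placements: list) -> dict:
    """Return {location: shortfall} for locations holding too few copies.

    placements is a list of location names, one entry per existing copy.
    """
    held = Counter(placements)
    shortfall = {}
    for location in policy.locations:
        gap = policy.copies_per_location - held[location]
        if gap > 0:
            shortfall[location] = gap
    return shortfall


# Example: two copies required in each of two (made-up) sites.
policy = ReplicationPolicy(copies_per_location=2, locations=("london", "newyork"))
print(missing_copies(policy, ["london", "london", "newyork"]))
```

With two copies in London and one in New York, the example reports that New York is one copy short.
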
Archives are data Roach Motels -- data goes in but it doesn't check out for a
long time. While SarbOx and other general business regulations require five or
so years of data retention, HIPAA and OSHA regulations require data be retained
for 30 years or more under some conditions. Since the volume of data in an
archive 20 years from now isn't something you can predict, the system has to be
extremely scalable. Just supporting 1,000 hard drives in many shelves on a
small processor cluster, like most NASes, isn't enough. This scalability can be
provided with removable storage or a RAIN (redundant array of independent
nodes) architecture, where many processing and storage nodes can create a
single storage cloud.
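
To make the RAIN idea a little more concrete, here is one way a cluster of nodes could decide where an object's copies live, so that adding a node grows the pool without reshuffling everything. It uses rendezvous (highest-random-weight) hashing, which is a technique swapped in for illustration; the article does not say which placement scheme any given product uses, and the node names and the `score` and `place` functions are hypothetical.

```python
import hashlib


def score(node: str, object_id: str) -> int:
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}:{object_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_id: str, nodes: list, copies: int = 2) -> list:
    """Pick the `copies` highest-scoring nodes for this object."""
    ranked = sorted(nodes, key=lambda node: score(node, object_id), reverse=True)
    return ranked[:copies]


# Example with made-up node names: growing the cluster from three to four nodes.
nodes = ["node-a", "node-b", "node-c"]
before = place("contract-2009-001", nodes)
after = place("contract-2009-001", nodes + ["node-d"])
print(before, after)
```

Because each node's score for an object is independent of the other nodes, adding a node only moves the copies that the new node wins; every other copy stays where it was.
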