Network Computing is part of the Informa Tech Division of Informa PLC
Plot An Effective Data Archive Strategy
An effective archiving or data retention solution should do three things. It should let you store less-frequently-accessed data at a lower cost than if you kept it on primary storage. It should allow you to find and access data relatively quickly -- though access does not have to be instantaneous, as it is on primary storage. Finally, stored data should be durable. Data you archive today should be readable 10 years from now and beyond.
We used to say that all data has a decaying value: the further it gets from its creation date, the less valuable it becomes. Compliance and regulatory requirements, as well as big data analytics and archiving, have changed that. We now have to assume that all data will become valuable again -- we just don't know which data or when. If decades from now your grandchildren check into a hospital, the doctors might want to access your medical records. They will need them quickly, and the records had better be readable.
In theory, these archiving needs strengthen the position of many disk-based object-storage vendors. Their systems can provide data durability as well as quick access and cost effectiveness when compared to primary storage. The problem is that object storage is not as inexpensive as tape storage nor is it as power efficient.
Because we are talking about potentially storing all data for decades, we need to do everything we can, without putting data at risk, to reduce the overall storage cost of the system. After all, those records won't do you any good if the hospital can't afford to keep the system that stores them powered on and up-to-date.
However, before we turn over all archive data to the object storage vendors, there is a part of that "all data has a decaying value" theory that is still applicable. It's this: All data has a decaying speed at which it needs to be accessed. Using our medical example above, the doctors might need to access your medical records 50 years from now, but they probably don't need to have them in seconds. They can probably wait a minute or two.
As I noted in my article "Comparing LTO-6 to Scale-Out Storage for Long-Term Retention," in these situations tape is an ideal storage type. Data on tape can still be automatically scanned for durability, and it certainly meets the cost-effectiveness requirements. What surprises most people who are either new to tape or have forgotten about it is how quickly a modern tape library can deliver data. In most cases access takes less than a minute; in the worst case it is two to three minutes.
Understanding The Data Access Decay Rate
The speed at which you need to have data returned to primary storage will depend on the needs of the business. Because the predictable response to, "How long can you wait?" is, "I need it now," it is important to make sure that business line managers understand the value of waiting. If they understand that waiting two minutes could save the organization $2 million a year in storage expenses, waiting sounds much more attractive. In almost every case the durability of the data is far more important than the speed at which it can be recovered.
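To make the value of waiting concrete for business line managers, the trade-off can be reduced to simple arithmetic. The sketch below is only an illustration: the function name `annual_savings`, its parameters, and every number passed to it are hypothetical placeholders, not figures from this article or from any vendor. Substitute your own capacity and per-terabyte costs.

```python
def annual_savings(capacity_tb, disk_cost_per_tb_year,
                   tape_cost_per_tb_year, tape_fraction):
    """Estimate yearly savings from moving a fraction of archive data to tape.

    All inputs are assumptions supplied by the caller; costs are per TB per year.
    """
    if not 0.0 <= tape_fraction <= 1.0:
        raise ValueError("tape_fraction must be between 0 and 1")
    moved_tb = capacity_tb * tape_fraction
    # Savings are the per-TB cost difference applied to the data that moves.
    return moved_tb * (disk_cost_per_tb_year - tape_cost_per_tb_year)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    savings = annual_savings(
        capacity_tb=1000,
        disk_cost_per_tb_year=300.0,
        tape_cost_per_tb_year=50.0,
        tape_fraction=0.8,
    )
    print(f"Estimated annual savings: ${savings:,.0f}")
```

Putting a dollar figure like this next to the extra minute or two of retrieval time gives managers something specific to weigh against "I need it now."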
I typically suggest a blended strategy: As little primary storage as possible, a reasonable amount of object/archive storage, and a hefty amount of tape. The amount of object/archive disk storage will be driven by your data access decay rate. For many organizations that might mean keeping all data on object storage for three to five years. For almost all organizations, longer-term retention should be on tape. This blended strategy gives the right balance between access, affordability and durability.
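As a rough sketch, the blended strategy can be expressed as an age-based placement rule. The function `choose_tier`, the tier names, and the 90-day primary-storage window are assumptions made for illustration; the five-year object-storage window reflects the upper end of the three-to-five-year range mentioned above. Your own data access decay rate should set both thresholds.

```python
from datetime import date


def choose_tier(last_access, today, primary_days=90, object_years=5):
    """Pick a storage tier from how long ago the data was last accessed.

    primary_days is a hypothetical threshold; object_years follows the
    three-to-five-year range suggested for object/archive storage.
    """
    age_days = (today - last_access).days
    if age_days < 0:
        raise ValueError("last_access cannot be after today")
    if age_days <= primary_days:
        return "primary"
    if age_days <= object_years * 365:
        return "object"
    # Anything older goes to tape for long-term retention.
    return "tape"


if __name__ == "__main__":
    today = date(2024, 1, 1)
    for accessed in (date(2023, 12, 1), date(2021, 1, 1), date(2010, 1, 1)):
        print(accessed, "->", choose_tier(accessed, today))
```

Keeping the thresholds as parameters makes it easy to rerun the rule with a shorter or longer object-storage window once the business has agreed on how long it can wait.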