Network Computing is part of the Informa Tech Division of Informa PLC

Does Primary Storage Deduplication Kill Archive And Backup?

As we begin to test primary storage deduplication technology, our
initial findings are that the latency it introduces may be a non-issue
for many data centers and applications, and it may soon be a non-issue
for all of them. If you can get deduplication on primary storage for
"free" from a performance perspective, what is the impact of primary
storage deduplication on the other tiers of storage? Does it kill
archive and backup?

Consider this scenario. In the not-too-distant future, you buy a storage
solution from a vendor. It has a few shelves of solid state storage, a
few shelves of 15K SAS drives and many shelves of SATA storage. All of
the storage is under the control of a single storage management software
layer running either on the storage controller or across storage nodes.

Data is either deduplicated inline or post-process and may or may not be
compressed. The result is that a 100TB storage system may now store
10PB of actual data, but only a small fraction of that data resides on
either the SSD or SAS tier. This is because auto-tiering moves data up
and down the storage tiers automatically, based on age or other
user-defined parameters. Performance is high and costs are under
control.
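
To put rough numbers on that scenario, here is a minimal Python sketch. It
works out the data reduction ratio implied by a 100TB system holding 10PB,
and shows what a simple age-based tiering rule might look like. The function
names and the 7-day and 90-day thresholds are my own illustrative
assumptions, not any vendor's policy or API.

```python
# Decimal units are assumed here: 1 PB = 1000 TB.
TB_PER_PB = 1000


def reduction_ratio(logical_tb: float, physical_tb: float) -> float:
    """Return logical data stored per unit of physical capacity."""
    if physical_tb <= 0:
        raise ValueError("physical capacity must be positive")
    return logical_tb / physical_tb


def pick_tier(days_since_access: int,
              ssd_max_days: int = 7,
              sas_max_days: int = 90) -> str:
    """Hypothetical age-based tiering rule.

    The thresholds are illustrative assumptions; a real system would
    expose its own user-defined parameters.
    """
    if days_since_access < 0:
        raise ValueError("age cannot be negative")
    if days_since_access <= ssd_max_days:
        return "SSD"
    if days_since_access <= sas_max_days:
        return "SAS"
    return "SATA"


# The scenario above: 10PB of actual data on 100TB of physical storage.
ratio = reduction_ratio(10 * TB_PER_PB, 100)
print(f"Implied reduction ratio: {ratio:.0f}:1")

# Where data of different ages would land under the assumed thresholds.
for age_days in (3, 30, 365):
    print(f"{age_days} days since last access -> {pick_tier(age_days)}")
```

The scenario's figures work out to a 100:1 reduction ratio, so the capacity
claim depends entirely on how well a given data set deduplicates and
compresses.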

The impact of this type of storage system on archive and backup systems
could be significant. I think that archive specifically could become a
thing of the past: not archive as a process, but archive as a standalone
storage system. If I can store 10PB of information in one system, why
wouldn't I? No matter how cheap I make the disk archive, I still need
primary storage. Unless the primary storage vendor is charging a
ridiculous premium for SATA storage, letting the archive reside in the
SATA tier is something to consider.

Backups are equally at risk. Most primary storage suppliers claim either
unlimited or high numbers of snapshots, so rolling back in time is
covered. Most if not all primary storage suppliers can replicate data to
a secondary site, so failure of the primary system or even the site does
not mean data loss. The only concern is that something goes wrong in the
handling of metadata: a corruption is introduced, or the system simply
fails when you hit 5PB of snapshots. We've seen no indication of that,
but it could happen. At some point you are going to want your data on
some other platform just in case this type of scenario occurs.
