Network Computing is part of the Informa Tech Division of Informa PLC

Deduplication's Five Modes

I want to back up a bit in our deduplication discussion. I have had
trouble bracketing the deduplication field thus far, and maybe there is
a different approach. Let's discuss the modes of deduplication. I think
there are five: deduplication, replication, maintenance, rest and
restore. There is a sixth mode, move to tape, which is still relevant
for most data centers. I am going to pick these modes apart one at a
time and I may spend several entries on a single mode. If I don't cover
every aspect of each mode in a single entry, I ask your patience. If
you think I missed a mode, let me know.

The deduplication mode is where the community, myself included, has spent
much of its time arguing about when and where deduplication should be
done. But if you are a user of this technology, while this mode is
important, what should matter most is whether the deduplication process can
be done in an appropriate amount of time for your requirements, and whether
the end result of this mode delivers a high enough level of
optimization to make deduplication a worthwhile investment.

Deduplication can typically happen either before the data is sent to
the backup device or after the data gets to the device.
The advantage of deduplicating data before it gets to the device,
commonly called source-side deduplication, is that it reduces the demand on
the backup network and should make the actual storing of the data
relatively quick. The downside is that this requires a replacement of
the backup application or a new agent from your current backup software
supplier. Another potential downside is that there may be a performance
impact on the server being backed up. The performance issue seems to
have been reduced in recent years as the software suppliers have
improved the agents. It also helps that there is now additional
processing power in the CPUs of the servers being backed up. In short,
they can take on more tasks.
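
To make the network savings concrete, here is a minimal Python sketch of the source-side idea. It is not how any particular product works. The fixed 4-byte chunk size, the SHA-256 fingerprints, and the names `Target` and `source_side_backup` are all assumptions made for illustration. The agent on the server fingerprints each chunk, asks the target which fingerprints it has not seen, and sends only those chunks.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the data into fixed-size chunks (real products often use variable sizes)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Target:
    """Hypothetical backup device that stores chunks keyed by fingerprint."""

    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        """Return the fingerprints the device has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def put(self, fingerprint, chunk):
        self.store[fingerprint] = chunk


def source_side_backup(data, target):
    """Deduplicate on the server; return the chunk bytes actually sent."""
    chunks = split_chunks(data)
    fingerprints = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(fingerprints)  # only fingerprints cross the network here
    sent = 0
    for fp, chunk in zip(fingerprints, chunks):
        if fp in needed:
            target.put(fp, chunk)
            needed.discard(fp)  # do not resend a duplicate within the same backup
            sent += len(chunk)
    return sent
```

Backing up the 16 bytes `b"AAAABBBBAAAACCCC"` to an empty `Target` sends 12 bytes of chunk data, and backing up the same data a second time sends none. The hashing in `source_side_backup` is the work that lands on the server being backed up, which is where the performance concern comes from.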

The other option is to do the deduplication at the target. This can be
done via the backup application itself or by the deduplication system.
In these scenarios, the entire backup data set is sent across the
network, no different than most other backup processes. The advantage
is that there is little to no change in the backup agent or the
backup process. The deduplication system approach, typically a
disk-based appliance with deduplication capabilities, is merely a
target for the current backup application. The backup software option,
again, requires a replacement of your current backup application.
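
For contrast, the target-side flow can be sketched the same way. Again, this is only an illustration under assumed details: the class name `DedupTarget`, the fixed chunk size, and the SHA-256 fingerprints are mine, not any vendor's design. The full data set crosses the network, and the device does the fingerprinting and keeps one copy of each unique chunk.

```python
import hashlib


class DedupTarget:
    """Hypothetical deduplicating appliance: receives everything, stores unique chunks."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # tiny fixed chunks, for illustration only
        self.store = {}               # fingerprint -> chunk
        self.bytes_received = 0       # what the backup network had to carry

    def receive(self, data):
        """Accept a full backup stream and return the list of chunk fingerprints."""
        self.bytes_received += len(data)
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            self.store.setdefault(fp, chunk)  # keep only the first copy of a chunk
            recipe.append(fp)
        return recipe

    def bytes_stored(self):
        """Disk space consumed by unique chunks."""
        return sum(len(chunk) for chunk in self.store.values())

    def restore(self, recipe):
        """Rebuild the original stream from its fingerprint list."""
        return b"".join(self.store[fp] for fp in recipe)
```

Sending `b"AAAABBBBAAAACCCC"` twice to a `DedupTarget` moves 32 bytes across the network but stores only 12. The backup agent does not change, the savings are on disk rather than on the wire, and the hashing cost sits on the appliance instead of the server.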

Which one is best? Really, it depends. All claim very good performance.
Source-side deduplication may have some benefits in environments where
network bandwidth is constrained, but, like backup applications that are
now adding deduplication, it requires a change of backup software. In both
cases, the move to one of these products should be considered as
seriously as switching to a new backup application. Deduplication
systems, on the other hand, provide storage efficiencies with limited
changes to the environment, but they do require the same continued
investment in the backup infrastructure as the underlying data set
grows.
