A concern I have had with source-side deduplication is the additional load it places on the servers being backed up. The process of identifying duplicate information can burden the local system. Atempo says it has not seen any reports of significant impact, but this is something worth testing; time allowing, I'd like to get these applications into the lab and measure the impact myself.
Atempo's approach takes the traditional three-tier architecture of backup applications (client, server, device) and adds a fourth tier that it calls HyperStream Server. HyperStream is software only; you supply the server hardware and storage for it to use. If deduplication is enabled, the agent sends its data to the HyperStream server, which stores and manages the deduplicated data repository. Before sending a new block, the agent checks with the HyperStream server to see whether that data already exists; it can also compress the block before sending it. If the HyperStream server already has a copy of the data, it updates its reference and tells the agent not to send the block. If it does not, the agent sends the block, and HyperStream stores the data and updates its reference database.
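The agent-side exchange described above can be sketched in a few lines of Python. This is a hypothetical illustration of the general source-side deduplication pattern, not Atempo's actual protocol or API; the class and method names are my own, and a real implementation would fingerprint and query over the network rather than against an in-process object.

```python
import hashlib
import zlib

class DedupStore:
    """Stands in for the HyperStream server's deduplicated repository."""
    def __init__(self):
        self.blocks = {}      # fingerprint -> compressed block data
        self.refcount = {}    # fingerprint -> number of references

    def has_block(self, fingerprint):
        return fingerprint in self.blocks

    def add_reference(self, fingerprint):
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1

    def store_block(self, fingerprint, compressed):
        self.blocks[fingerprint] = compressed
        self.add_reference(fingerprint)

def backup_block(store, block):
    """One round of the agent's protocol: fingerprint, ask, send only if new.

    Returns the number of bytes that would cross the wire.
    """
    fingerprint = hashlib.sha256(block).hexdigest()
    if store.has_block(fingerprint):
        # Server already holds this data: update the reference, send nothing.
        store.add_reference(fingerprint)
        return 0
    compressed = zlib.compress(block)   # compress before sending
    store.store_block(fingerprint, compressed)
    return len(compressed)

store = DedupStore()
sent_first = backup_block(store, b"A" * 4096)   # new block: compressed bytes sent
sent_again = backup_block(store, b"A" * 4096)   # duplicate: nothing sent
```

The payoff is visible in the second call: the duplicate block costs only a fingerprint lookup, not a data transfer, which is exactly why the load question in the first paragraph matters — the hashing happens on the client.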
HyperStream Server is controlled and managed from Atempo's server software, Time Navigator. Deduplication is optional, so if you choose to send data directly to tape or standard disk, that data goes straight to the Time Navigator server. Time Navigator and HyperStream can coexist on the same physical server, but there are two separate storage areas for the two data types. When it comes to replication, one HyperStream server can communicate with another. As with the client software, it can compress data before sending it to the remote HyperStream media server, and like the client, it sends only unique information. At this point, replication is one-to-one; Atempo does not yet have the ability to fan in multiple sites to a single site.
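Server-to-server replication follows the same idea as the client protocol: only blocks the remote side lacks are transferred. A minimal sketch of that one-to-one pattern, assuming repositories are simple fingerprint-keyed dictionaries (nothing here reflects Atempo's actual on-disk or wire format):

```python
import hashlib
import zlib

def ingest(repo, block):
    """Add a raw block to a repository, compressed and keyed by fingerprint."""
    repo[hashlib.sha256(block).hexdigest()] = zlib.compress(block)

def replicate(local, remote):
    """Copy blocks from local to remote, skipping what remote already holds.

    Returns the number of bytes that would cross the wire.
    """
    bytes_sent = 0
    for fingerprint, compressed in local.items():
        if fingerprint in remote:
            continue                    # remote already has it: send nothing
        remote[fingerprint] = compressed
        bytes_sent += len(compressed)
    return bytes_sent

primary, dr_site = {}, {}
for data in (b"alpha" * 100, b"beta" * 100):
    ingest(primary, data)

first_pass = replicate(primary, dr_site)    # both blocks cross the wire
second_pass = replicate(primary, dr_site)   # nothing new: zero bytes sent
```

Because the repositories exchange fingerprints rather than raw data, a second replication pass after no changes moves nothing, which is what makes dedup-aware replication attractive over a WAN link.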
Assuming there are enough client CPU cycles available for source-side deduplication, there are clear advantages to not having to send data only to find out it already exists and will be discarded. The addition of the fourth tier, in this case HyperStream, is really not much different from adding a deduplication appliance. If you are re-evaluating your backup software, looking at one with built-in deduplication makes sense.