From a deduplication perspective, the system deduplicates data as each job completes and lands on disk, which means that smaller jobs can begin deduplicating while larger jobs are still backing up. One of the benefits of a grid storage solution is that it helps ensure that performance stays consistent. Additionally, all the nodes have access to all storage and a common deduplication repository. If inbound performance is a concern, you can add nodes to increase ingest and deduplication bandwidth while maintaining deduplication consistency. The system can also turn deduplication off for specific jobs if you expect limited space savings when backing up a particular server.
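To make the post-process model concrete, here is a minimal sketch of hash-based deduplication into a shared repository. The segment size, hashing scheme, and data structures are illustrative assumptions, not Sepaton's actual implementation.

```python
import hashlib

# Hypothetical sketch: each completed backup job is split into
# fixed-size segments, and a segment is stored in the shared
# repository only if its hash has not been seen before.

SEGMENT_SIZE = 4  # tiny segments so the example is easy to follow


def dedup_job(job_data: bytes, repository: dict) -> list:
    """Deduplicate one completed job against a shared repository.

    Returns the job's recipe: the list of segment hashes that,
    looked up in the repository, reconstruct the original data.
    """
    recipe = []
    for i in range(0, len(job_data), SEGMENT_SIZE):
        seg = job_data[i:i + SEGMENT_SIZE]
        digest = hashlib.sha256(seg).hexdigest()
        repository.setdefault(digest, seg)  # store only unique segments
        recipe.append(digest)
    return recipe


repo = {}
recipe_a = dedup_job(b"AAAABBBBCCCC", repo)
recipe_b = dedup_job(b"AAAABBBBDDDD", repo)  # shares two segments with job A
# repo holds 4 unique segments rather than 6
```

Because the repository is shared, any node in the grid could run `dedup_job` against it, which is what lets deduplication stay consistent as nodes are added.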
Sepaton, like a couple of other deduplication solutions, uses a technique called forward referencing, which leaves the newest copy of data complete. Essentially, forward referencing removes duplicate segments from older backup jobs and references them forward to the newest copy, rather than removing duplicate segments from the current job and referencing them "back" to the original data set. This method should also reduce the fragmentation that can occur on other deduplication platforms. While having this data in its native form should improve recovery performance, it does take up more disk capacity. Fortunately, it is stored in a compressed state, so it's typically safe to assume a 50 percent reduction in backup footprint on the non-deduplicated copy.
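The forward-referencing idea can be sketched as follows. This is an assumed model of the semantics, not Sepaton's actual data structures: when a new backup arrives, its data is kept whole, and matching segments in older backups are replaced with references that point forward at the newest copy.

```python
# Assumed model of forward referencing: the newest backup stays
# complete; duplicate segments in OLDER backups are replaced by
# ("ref", backup_index, segment_index) tuples pointing forward.


def forward_reference(backups: list, new_backup: list) -> None:
    """Add new_backup to the chain, rewriting older backups in place.

    Each backup is a list whose entries are either raw segments
    (bytes) or ("ref", backup_index, seg_index) tuples that point
    at a later backup in the chain.
    """
    new_index = len(backups)
    new_segments = set(new_backup)
    for backup in backups:
        for s_idx, entry in enumerate(backup):
            if isinstance(entry, bytes) and entry in new_segments:
                # Point the older copy forward at the newest backup.
                backup[s_idx] = ("ref", new_index, new_backup.index(entry))
    backups.append(list(new_backup))  # newest copy is stored complete


chain = []
forward_reference(chain, [b"AAAA", b"BBBB", b"CCCC"])
forward_reference(chain, [b"AAAA", b"BBBB", b"DDDD"])
# chain[1] remains whole; chain[0] now holds two forward
# references plus its one unique segment, b"CCCC"
```

Note how the most recent backup never contains references, which is exactly why restores from it avoid reassembly, while older backups accumulate the pointers and any fragmentation.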
Having a complete non-deduplicated copy should mean faster recoveries. The intent is that if you need to recover data, it is most likely going to be from your most recent backup, and having that data in its native format avoids the additional time it takes to reassemble data from a deduplicated repository. With an IP- or NAS-attached appliance, it is debatable whether this additional time would even be noticeable, given the speed of the network segment and the protocol overhead involved in IP. On Fibre Channel, where bandwidth and overhead are often less of an issue, the performance impact of recovering from a deduplicated repository may be noticeable.
As for replication, it occurs at the same time as the deduplication process. As unique segments are identified, the references are updated and the segments are replicated to the remote site. Just as all the nodes in the grid can perform deduplication, they can all participate in replication. Each node has a 1GbE connection that connects to the customer's existing IP WAN. Finally, you can also specify which jobs should be replicated and which should not. If you have an SLA that requires data to be at the remote site within a given window, and you have a job that takes eight hours to complete, then you may have to change or break up those jobs a bit. Of course, the reason you buy a solution like Sepaton's is to reduce the length of a backup job in the first place, so this may be less of an issue.
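The replicate-as-you-deduplicate flow described above can be sketched like this. The function and queue names are hypothetical, chosen only to illustrate the idea that new segments ship over the WAN while duplicates send only a lightweight reference update.

```python
import hashlib

# Hedged sketch of deduplication-concurrent replication: as each
# segment is identified as unique, it is queued for the WAN link;
# a duplicate segment sends only a small reference update instead.


def dedup_and_replicate(job_data: bytes, repo: dict, wan_queue: list,
                        seg_size: int = 4) -> None:
    for i in range(0, len(job_data), seg_size):
        seg = job_data[i:i + seg_size]
        digest = hashlib.sha256(seg).hexdigest()
        if digest not in repo:
            repo[digest] = seg
            wan_queue.append(("segment", digest, seg))  # ship new data
        else:
            wan_queue.append(("reference", digest))     # metadata only


repo, wan = {}, []
dedup_and_replicate(b"AAAABBBB", repo, wan)
dedup_and_replicate(b"AAAACCCC", repo, wan)
# The WAN queue carries three full segments and one small
# reference update, instead of four full segments.
```

The payoff is the same as on the ingest side: because only unique segments cross the WAN, adding nodes that share the repository can raise replication throughput without re-sending data another node has already shipped.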