George Crump



Deduplicating Replication - SEPATON

Sepaton is a pure VTL (virtual tape library) solution, meaning it cannot be used as a NAS backup target like some of the other products we have reviewed thus far. Sepaton is focused on enterprise data sets, in environments that should be able to handle the challenges of a Fibre Channel-attached backup device. It is also a grid-based, or clustered, system that scales to 16 nodes. Each node can perform deduplication as well as replication.

From a deduplication perspective, Sepaton deduplicates data as each job completes and lands on disk. This means that smaller jobs can begin deduplicating while larger jobs are still backing up. One of the benefits of a grid storage solution is that it helps ensure that performance stays consistent. Additionally, all the nodes have access to all storage and to a common deduplication repository. If inbound performance is a concern, you can add nodes to increase ingest and deduplication bandwidth while maintaining deduplication consistency. The system can also turn deduplication off for specific jobs if you expect limited space savings when backing up a particular server.
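The per-job behavior described above can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not Sepaton's implementation: the `Job` class, its fields, and the `dedupe_start_order` function are hypothetical names, and the sketch assumes deduplication for a job becomes eligible the moment that job finishes landing on disk.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A backup job (hypothetical model, not a vendor data structure)."""
    name: str
    start_hour: float
    duration_hours: float
    dedupe: bool = True  # per-job switch; False skips deduplication


def dedupe_start_order(jobs):
    """Return (job name, hour deduplication may begin), earliest first.

    Assumption: a job becomes eligible for deduplication as soon as it
    completes. Jobs with deduplication turned off are left out.
    """
    eligible = [j for j in jobs if j.dedupe]
    eligible.sort(key=lambda j: j.start_hour + j.duration_hours)
    return [(j.name, j.start_hour + j.duration_hours) for j in eligible]


jobs = [
    Job("database", start_hour=0, duration_hours=8),
    Job("mail", start_hour=0, duration_hours=1),
    Job("video", start_hour=0, duration_hours=2, dedupe=False),
]
order = dedupe_start_order(jobs)
```

In this made-up schedule the small mail job is eligible for deduplication seven hours before the database job finishes, and the video job, which is unlikely to deduplicate well, is skipped entirely.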

Sepaton, like a couple of other deduplication solutions, uses a technique called forward referencing, which leaves the newest copy of the data complete. Essentially, forward referencing removes duplicate segments from older backup jobs and references them forward to the newest job, rather than removing duplicate segments from the current backup and referencing them "back" to the original data set. This method should also reduce the fragmentation that can occur on other deduplication platforms. While keeping this data in its native form should improve recovery performance, it does take up more disk capacity. Fortunately, however, the data is stored in a compressed state, so it's typically safe to assume a 50 percent reduction in backup footprint on the non-deduplicated copy.
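To make the direction of the references concrete, here is a minimal Python sketch of forward referencing. It is a toy model under stated assumptions, not Sepaton's algorithm: segments are fixed at four bytes, duplicates are detected with SHA-256, jobs are ingested in chronological order, and the `ForwardRefStore` class and its method names are invented for illustration.

```python
import hashlib

SEGMENT_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split(data: bytes) -> list:
    """Cut data into fixed-size segments (an assumption; real products vary)."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


class ForwardRefStore:
    """Toy forward-referencing store: the newest job always holds its own data."""

    def __init__(self) -> None:
        self.recipes = {}  # job id -> ordered list of segment digests
        self.local = {}    # job id -> {digest: bytes physically kept with that job}
        self.holder = {}   # digest -> job id that currently holds the bytes

    def ingest(self, job_id: int, data: bytes) -> None:
        recipe = []
        self.local[job_id] = {}
        for seg in split(data):
            digest = hashlib.sha256(seg).hexdigest()
            recipe.append(digest)
            old = self.holder.get(digest)
            if old is not None and old != job_id:
                # Remove the duplicate from the OLDER job; it now refers forward.
                del self.local[old][digest]
            self.local[job_id][digest] = seg
            self.holder[digest] = job_id
        self.recipes[job_id] = recipe

    def restore(self, job_id: int) -> bytes:
        """Rebuild a job, following forward references where needed."""
        return b"".join(self.local[self.holder[d]][d] for d in self.recipes[job_id])

    def is_complete(self, job_id: int) -> bool:
        """True if every segment of the job is stored with the job itself."""
        return all(self.holder[d] == job_id for d in self.recipes[job_id])


store = ForwardRefStore()
store.ingest(1, b"AAAABBBBCCCC")  # older backup
store.ingest(2, b"AAAABBBBDDDD")  # newest backup
```

After the second ingest, job 2 is stored whole and can be read back without any lookups into other jobs, while job 1 keeps only its unique `CCCC` segment and resolves the other two through references to job 2. A backward-referencing design would do the opposite and leave the newest job as the one that needs reassembly.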

Having a complete, non-deduplicated copy should mean faster recoveries. The reasoning is that if you need to recover data, it is most likely going to be from your most recent backup. Having that data in its native format avoids the additional time it takes to reassemble data from a deduplicated repository. With an IP- or NAS-attached appliance, it is debatable whether this additional time would even be noticeable, because of the speed of the network segment and the overhead involved in IP. On Fibre Channel, where bandwidth and overhead are often less of an issue, the performance impact of recovering from a deduplicated repository may be noticeable.

As for replication, it occurs at the same time as the deduplication process. As unique segments are identified, the references are updated and the segment is replicated to the remote site. Just as all the nodes in the grid can perform deduplication, they can all participate in replication as well. Each node has a 1GbE connection to the customer's existing IP WAN. Finally, you can specify which jobs should be replicated and which should not. If you have an SLA that requires data to be at the remote site within a given window, and you have a job that takes eight hours to complete, then you may have to change or break up those jobs a bit. Of course, the reason you buy a solution like Sepaton's is to reduce the length of a backup job in the first place, so this may be less of an issue.
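A rough way to reason about that SLA question is a back-of-the-envelope calculation like the Python sketch below. Everything in it is an assumption for illustration: the function names, the `efficiency` factor, and especially the idea that the WAN can carry the combined 1GbE output of every node, which is often not true in practice. It also assumes replication of a job's unique data only starts once the job has finished, as the deduplication process described earlier implies.

```python
def replication_hours(unique_gb: float, nodes: int = 1,
                      link_gbps: float = 1.0, efficiency: float = 1.0) -> float:
    """Estimate hours needed to send unique (post-deduplication) data.

    Assumptions: 1 GB = 10**9 bytes, every node has its own link of
    `link_gbps`, and the WAN can absorb the aggregate rate. `efficiency`
    (0 to 1) is a hypothetical fudge factor for protocol overhead.
    """
    bits = unique_gb * 1e9 * 8
    rate_bps = nodes * link_gbps * 1e9 * efficiency
    return bits / rate_bps / 3600


def meets_window(job_hours: float, unique_gb: float, window_hours: float,
                 nodes: int = 1) -> bool:
    """True if backup time plus estimated replication time fits the window."""
    return job_hours + replication_hours(unique_gb, nodes) <= window_hours


one_node = replication_hours(450, nodes=1)    # 450 GB over a single 1GbE link
four_nodes = replication_hours(450, nodes=4)  # same data spread over four nodes
```

With these illustrative numbers, an eight-hour job that produces 450 GB of unique data would need about nine hours end to end on one node, which is why a tight window may force you to split the job or shorten the backup itself.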
