Network Computing is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

More on Advanced Erasure Codes

As we've previously discussed in "What Comes After RAID? Erasure Codes," forward error correction coding is a leading contender to replace parity RAID as disk hardware evolves past the point where parity provides effective protection. The question remains: Are Reed-Solomon and related coding techniques the inevitable replacement for parity in the RAID systems of the future?

I've had some interesting conversations on the subject with some fellow members of the storage cognoscenti. My friend, and soon to be fellow Network Computing blogger, Stephen Foskett thinks that high-level erasure codes are ready for primary storage. While I'm relatively satisfied that advanced erasure codes are a better solution than even double parity for secondary storage applications like backups and archives, I have a few reservations about advanced erasure codes for latency-sensitive applications such as online transaction processing databases.

I'm most concerned about the overhead, and resulting latency, that advanced erasure codes present for small writes. In any RAID system beyond mirroring, the systems write behavior depends on whether the data write is smaller or larger than a stripe across the entire RAID set. For large write requests--like those in backups, digital video or other multimedia applications--writing large chunks of data across X drives generates X+N disk I/Os, where N is the number of additional ECC blocks. So for RAID 5 it's X+1, RAID 6 N+2 and for a Reed-Solomon scheme like CleverSafe's 10 of 16 coding 10+6.

Things get more complicated when the amount of data to be written is smaller than X times the stripe size, which is commonly 4-128KB, for the array. To write one 4K database page, a RAID 5 system would have to read X blocks, recalculate the parity and then write X+1 blocks. On the 10 of 16 system it would have to read 10 blocks and then write 16. Add in that the processor on the storage system has a lot more math to do, and this process could add substantial latency to every small write.

One solution to this is to front end the erasure code logic with a write-in-place write-in-free-space file system, like WAFL, ZFS or Nimble Storage's CASL, that uses logging to maximize full stripe writes.  Since the file system already separates the logical data location from its physical location, it can coalesce small writes to a journal, which can be stored in flash, and the write multiple, otherwise unrelated writes to disk in full stripe writes. Since these systems often use redirect on write snapshot technology, they also want to leave the old data in place for snapshots.

  • 1