Jasmine McTigue

Microsoft ReFS and Oracle ZFS: How They Compare

Microsoft's ReFS and Oracle/Sun's ZFS file systems are designed to remedy errors that might otherwise corrupt data. Here's how they stack up against one another.

Storage pros generally regard hardware-driven, RAID-based file systems as highly reliable. The truth is, significant potential for error still exists, especially as systems scale for cloud architectures and big data applications. This potential is compounded by the operating system's or hypervisor's lack of awareness of low-level data errors: abstracted storage resources are passed up the stack, and the integrity of the data is simply assumed.

It's also true that simple RAID parity schemes are inadequate for volumes that span thousands of disks. If you'd like to take a very deep dive into how storage methodologies are failing the cloud revolution, check out the paper on data storage by Stanford University's David S. H. Rosenthal, "Keeping Bits Safe: How Hard Can It Be?"

To stay current, storage delivery engines need to get smarter and more transparent to the rest of the stack. That means better file systems, capable of detecting and remedying errors in real time on volumes that cannot simply be taken offline for something as routine as error detection. Two options are Oracle's ZFS and Microsoft's ReFS. I'll compare both.

Sun started working on real-time error detection as early as 2005 with the initial development of Solaris 11, codenamed Nevada. The prototype product's open-source, next-generation file system was dubbed ZFS, and has continued to undergo development and improvement through the acquisition by Oracle and into the present day.


Microsoft is attempting to create something similar in its new ReFS file system. ReFS is featured in Windows Server 2012 Storage Spaces. At its core, ReFS attempts to solve the same essential issues as ZFS while maintaining NTFS file system compatibility for legacy Windows applications, services and infrastructures. However, ReFS has some catching up to do compared to Oracle's mature product.

Both ReFS and ZFS offer several basic improvements over legacy file systems. First, both detect and repair data errors in real time, without dismounting or interfering with access to the volume. For ZFS, this is done at both the data level and metadata level through the use of hierarchical checksums that persist all the way up the file structure to the root node.

When a file is read, the checksum of the data read is compared to the stored checksum, so errors are detected automatically. In ReFS, checksums are taken on file metadata. An additional optional feature called Integrity Streams makes sure that when changes are written, they are written to a different member volume, ensuring that an original is not compromised. This error detection fights bit rot, the degradation of stored data on disk, as well as hardware failures, firmware glitches, cosmic radiation and other integrity-damaging events, without compromising the availability of the online file system.
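
To make the checksum-on-read idea concrete, here is a minimal Python sketch. It is not the ZFS or ReFS on-disk format: the `Node` class, the choice of SHA-256, and the in-memory block list are all assumptions made for illustration. It shows only the general pattern of a parent holding its children's checksums, so that corruption is caught when the block is read.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Return a SHA-256 digest (a stand-in for the file system's checksum)."""
    return hashlib.sha256(data).digest()


class Node:
    """Hypothetical parent block that stores a checksum for each child block."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # child data blocks
        # Checksums live in the parent, apart from the data they describe.
        self.child_sums = [checksum(b) for b in self.blocks]

    def root_checksum(self) -> bytes:
        # The parent's checksum covers its children's checksums, so a
        # legitimate change below propagates up toward the root.
        return checksum(b"".join(self.child_sums))

    def read(self, index: int) -> bytes:
        # Verify on every read: recompute and compare with the stored value.
        data = self.blocks[index]
        if checksum(data) != self.child_sums[index]:
            raise IOError(f"checksum mismatch in block {index}")
        return data


node = Node([b"alpha", b"beta", b"gamma"])
assert node.read(1) == b"beta"

# Simulate bit rot: flip one bit in block 1 without touching the stored checksum.
node.blocks[1] = bytes([node.blocks[1][0] ^ 0x01]) + node.blocks[1][1:]
try:
    node.read(1)
    detected = False
except IOError:
    detected = True
```

In this sketch the damaged block raises an error on read while its neighbors stay readable. A real file system would go one step further and fetch a good copy from a mirror or rebuild it from parity.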

While the two file systems share similar features, ZFS distinguishes itself in performance across parity-striped volumes. According to Microsoft documentation, parity on ReFS comes from the paired Microsoft feature Storage Spaces, and requires that existing parity information be "read and processed before a new write can occur."

While Microsoft advocates using dedicated SSDs for journals to mitigate poor write performance, ZFS combats this effect directly by writing parity information with a variable-sized stripe. This mitigates write speed penalties and also closes the infamous "RAID 5 write hole," a limitation in conventional RAID systems that allows data to be lost in the event of a crash or power failure. It amounts to a fundamentally different way for RAID sets to implement parity.
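
The difference between updating parity in place and writing a whole stripe at once can be sketched with plain XOR parity. The Python below is an illustration under stated assumptions, not code from Storage Spaces or ZFS: the helper `xor_blocks` and the two-byte blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A fixed-width stripe: three data blocks plus one parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(*stripe)

# Partial-stripe update (read-modify-write): the old data and the old
# parity must both be read back before the new parity can be computed.
new_block = b"\xaa\xbb"
old_block = stripe[1]
new_parity = xor_blocks(parity, old_block, new_block)
stripe[1] = new_block
# If a crash lands between the data write and the parity write, data and
# parity on disk disagree. That window is the hole described above.

# Full-stripe write: parity is derived from the new data alone, so no
# prior reads are needed and data and parity can be written together.
full_parity = xor_blocks(*stripe)

# Either way, a lost block can be rebuilt from the survivors plus parity.
rebuilt = xor_blocks(full_parity, stripe[0], stripe[2])
```

Both routes arrive at the same parity. The point of a variable-sized stripe is that every write can take the second route.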

The two file systems also differ widely in their caching implementations. ReFS can accommodate SSD caching, but architectural details on ReFS caching are sparse, though Microsoft does state that DRAM caching is not supported.

By contrast, ZFS caching is well documented and DRAM cache is a core feature. ZFS features three levels of caching: ARC, which is intelligent memory caching in RAM and uses as much free RAM as is available; L2ARC, which is disposable (non-storage) SSD read caching; and ZIL, which is SSD write caching that buffers writes to the underlying volume.
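
The read side of that hierarchy can be pictured as a chain of lookups: RAM first, then SSD, then disk. The Python sketch below is a deliberately simplified model. The class `TieredReadCache` is hypothetical, it uses plain least-recently-used eviction (the real ARC uses a more adaptive policy), and the way evicted entries are demoted to the SSD tier is an assumption made for brevity rather than a description of how L2ARC is actually populated.

```python
from collections import OrderedDict


class TieredReadCache:
    """RAM tier over an SSD tier over slow disk, all simulated with dicts."""

    def __init__(self, disk, ram_slots, ssd_slots):
        self.disk = disk            # slow backing store
        self.ram = OrderedDict()    # first-level cache (the role ARC plays)
        self.ssd = OrderedDict()    # second-level cache (the role L2ARC plays)
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.disk_reads = 0

    def read(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)   # mark as most recently used
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)   # promote back into RAM
        else:
            value = self.disk[key]      # fall through to the slow disk
            self.disk_reads += 1
        self._insert_ram(key, value)
        return value

    def _insert_ram(self, key, value):
        self.ram[key] = value
        if len(self.ram) > self.ram_slots:
            old_key, old_value = self.ram.popitem(last=False)  # evict LRU entry
            self.ssd[old_key] = old_value   # demote to SSD instead of dropping
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)  # SSD tier is disposable


cache = TieredReadCache({"a": 1, "b": 2, "c": 3}, ram_slots=1, ssd_slots=1)
cache.read("a")  # disk read; "a" now in RAM
cache.read("b")  # disk read; "a" is demoted to the SSD tier
cache.read("a")  # served from the SSD tier, no disk read
```

Because the SSD tier holds only copies of data that also lives on disk, losing it costs performance but not data, which is what "disposable" means here.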

ZFS supports using PCIe SSDs or battery-backed RAM disk devices with ultra-low latency as ZIL to dramatically reduce write latencies to ZFS volumes. Microsoft's only statement on this is that "customers can use third party solutions for this."

Both file systems support snapshots, encryption, very large disk sets, enormous numbers of files and a maximum volume size of 16 exabytes. ReFS, however, does not support compression or de-duplication, both of which are core features in ZFS, even though Windows Storage Spaces does support de-dupe on NTFS volumes.

The net result is two vastly dissimilar products that appear to offer similar features until you look closely. On the surface, ReFS and ZFS may seem to be competitors, but Microsoft's first-generation ReFS offering is hardly in the same ballpark as the more mature ZFS. Still, given Oracle's 12+ year developmental lead, ReFS is a noble effort to keep Windows competitive.

7/30/2013 | 10:24:02 PM
re: Microsoft ReFS and Oracle ZFS: How They Compare
If the author reads the Wikipedia article on ZFS, she will see that development started back in 2001, not in 2005. The Wikipedia article cites research papers showing that ZFS detects and corrects all artificially created data corruptions [web link nr 22 in the Wikipedia article], whereas NTFS fails [web link nr 17 in the Wikipedia article] even to detect the errors artificially injected by the research team.

Also, ReFS has checksums on metadata (log journal, etc.), which means that the data itself might still be corrupt. You need checksums on the data itself to detect data corruption. As of now, ReFS might detect corruption in metadata, but not corruption in the data. I've heard that it is possible to turn on checksums on the data too in ReFS, but that is not done by default. You need to tweak it manually. Why? Maybe it is an experimental feature? If it worked perfectly, it would be turned on by default.

Microsoft developer talks about ReFS:
"Oh god, the NTFS code is a purple opium-fueled Victorian horror novel that uses global recursive locks and SEH for flow control. Let's write ReFS instead. (And hey, let's start by copying and pasting the NTFS source code and removing half the features! Then let's add checksums, because checksums are cool, right, and now with checksums we're just as good as ZFS? Right? And who needs quotas anyway?)"