CAS Conundrum
Software-based CAS? That idea might find some traction with customers
January 25, 2006
How do you define "sticky technology"? It's a fashionable term being bandied about by purveyors of content addressable storage (CAS) products, but what is it?
I'll give you a hint: One of its many synonyms is "lock-in." Once you place data that you need to retain for a protracted length of time into one of these sticky CAS systems, the vendors have usually seen to it that you will never be able to buy anyone else's storage but theirs.
If you're OK with that -- go read another column or article on this site. If you want to avoid vendor lock-in that limits your choices and perpetuates insane costs by constraining your ability to take advantage of lower-cost, best-of-breed alternatives, then stay with me.
The "sticky" CAS strategy was pioneered by EMC, always the innovator, with its Centera platform. A couple of years ago, EMC acquired the assets of a small Belgian firm, FilePool, and embedded the technology right onto a hardware controller. Wedding the controller to some commodity disk, they jacked up the price of both and pitched the resulting "solution," which was unabashedly sticky, as "long-term retention and compliance in a box." They sold the heck out of it, creating a new market niche in an increasingly bland and undifferentiated storage landscape.

So successful was the EMC product that it spawned a frenzy of development efforts in other shops. Disk array manufacturers were delighted by the concept of CAS, seduced by its Japanese mortgage-like qualities (150-year mortgages that obligate you and three generations of your kids to pay off the note), and intrigued by the willingness of consumers to invest in something that was so patently a hardware vendor Ponzi scheme.
I expected products to proliferate, and sadly, they have. But late last year, a ray of hope appeared.
That's when I ran into a fellow named Paul Carpentier. He was the guy behind the FilePool technology (EMC bought four of his patents along with the company), and he piqued my interest when he started enumerating the many flaws in the EMC implementation. I had heard consumers bemoan the foibles of the product. For starters: long rebuild times for failed disks; a tendency to lose data in the wake of multiple disk failures; difficulty backing up and restoring data with tape, given that the data was no longer data but rather "objects"; and the inability to restore backed-up objects with any sort of granularity.

But these complaints were often dismissed by the vendor and by the satisfied customers the vendor pointed in my direction. Most consumers were content to solve their compliance needs with a single sales contract, whether or not the solution was really a solution over the long haul.
Carpentier's commentary was different. As the guy who wrote FilePool, he was in a position to know where the bodies were buried in the technology, or at least in EMC's implementation of it. He filled my ears with a technical critique to which he had clearly given much thought. I'll share the key points of our conversation with you here.
Toigo: What, in your view, is wrong with the current generation of CAS products?

Carpentier: First is their lack of a true architecture. Complexity on the outside reflects what happens on the inside. As a consequence, performance, robustness, and scalability suffer to the point of turning current product offerings into niche solutions rather than the very broad, universal storage category for reference data that CAS could be.
Second, because CAS applications are built on top of complex and proprietary APIs, customers are typically locked into a single vendor's hardware and software. And even if a system is praised as being built with commodity hardware, it doesn't make much economic sense if it's being sold at 10 times the price of an ordinary PC with similar specs.
Third, the hashing algorithms used to secure content have been compromised, or will be in the medium term, because of ever-increasing computing power. It is impossible to upgrade hashing algorithms without reworking all your data, not just in your CAS systems but in every single application holding hash-based CAS identifiers. That is just plain unworkable in 99 percent of cases, and it defeats the purpose of CAS. Expect some major blame games when customers and their audit departments start to come to grips with the fact that tens of petabytes stored on CAS systems worldwide have lost their integrity guarantees.
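To make the third point concrete, here is a minimal Python sketch of the kind of scheme Carpentier is criticizing, assuming an identifier that is simply the hex digest of the content. The `content_address` helper and the choice of SHA-1 and SHA-256 are illustrative assumptions, not any vendor's actual identifier format.

```python
import hashlib


def content_address(data: bytes, algorithm: str = "sha1") -> str:
    """Derive an object's identifier directly from a digest of its bytes.

    Hypothetical scheme for illustration: the identifier *is* the hash.
    """
    return hashlib.new(algorithm, data).hexdigest()


document = b"Quarterly compliance report, FY2005"

old_id = content_address(document, "sha1")    # identifier under the original hash
new_id = content_address(document, "sha256")  # identifier after an algorithm upgrade

# Same bytes, different identifier: every application that saved old_id
# would have to be found and rewritten to point at new_id.
print(old_id == new_id)          # False
print(len(old_id), len(new_id))  # 40 64
```

Because the identifier and the hash are the same thing, strengthening the hash necessarily changes every identifier that applications have already recorded.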
Fourth, APIs must be re-ported to every development and production environment on every relevant platform. Waiting for yours to hit the streets becomes a national pastime.
Fifth, because of lackluster performance, recovering from a lost node or disk takes massive amounts of time. That hurts robustness, because it opens excessively long windows during which a double disk failure results in irrecoverable data loss.

Sixth, vendors are positioning and selling CAS for archive use only, so as not to cannibalize their SAN/NAS product offerings.
Toigo: What do you think can be done to improve CAS?
Carpentier: CAS, in essence, is a very simple concept. But in a number of respects, it's also very counterintuitive to some people. Also, no architecture has established itself as a role model in the market yet. Since storage engineers in general aren't used to fundamental innovation, less-than-stellar "CAS-oid" architectures are the result. Fundamental new blueprints will have to come from outside the established market and its vested interests. Only a very strong architecture will be able to support a whole new category of applications at the performance and scalability levels required today and tomorrow, while delivering full fault resilience as well as hardware and software upgrading and migration without ever needing to shut down.
Toigo: Are you suggesting that CAS needs to become a software solution, rather than hardware?
Carpentier: Yes, a hardware-agnostic, CAS-in-software product could leverage anyone's true commodity hardware. For the first time, enterprise-size storage resources would be available at consumer technology unit prices. Combined with near-zero administration and maintenance requirements, this will result in total cost of ownership levels that may drive whole new categories of storage-intensive applications.

Toigo: The hardware CAS players would argue that specialty controllers are needed to impose data location and security hashing schemes, and that this is done most efficiently on the array via a specialty controller. Isn't this where your new company, Caringo, is headed?
Carpentier: However elegant it may have seemed at the time [that Centera was introduced], the direct use of hashes as long-term unique identifiers in compliant storage and archiving applications was a mistake, because it precludes the transparent upgrading of hash algorithms in reaction to ever-progressing, and quite unpredictable, attacks. The only way out is to use permanent, unique identifiers that are provably associated with underlying hash functions whose kind and strength can both be upgraded as time progresses. So it is necessary to offer a transparent process of provable integrity that associates the identifier with a stronger hash, and all of this needs to happen before the older hash function goes stale. In that way, the provable integrity of digital content need no longer be limited in time. It is exactly in this area that Caringo holds exclusive patent-pending technology that will start the era of true long-term digital storage.
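The general idea of decoupling the identifier from the hash can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Caringo's patent-pending method: the `ObjectRecord`, `store`, `verify`, and `upgrade` names are hypothetical, the permanent identifier is assumed to be a random UUID, and the sketch does not attempt the "provable" part (for example, a signed audit trail binding each new digest to the previous one).

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Hypothetical metadata for one stored object.

    The permanent identifier never changes; the list of digests can grow
    as stronger hash functions become available.
    """
    permanent_id: str
    digests: list = field(default_factory=list)  # (algorithm, hexdigest), oldest first

    def current(self):
        """Return the most recent (strongest) digest entry."""
        return self.digests[-1]


def store(data: bytes, algorithm: str = "sha1") -> ObjectRecord:
    # The identifier is random, not derived from the hash, so it can outlive the hash.
    record = ObjectRecord(permanent_id=str(uuid.uuid4()))
    record.digests.append((algorithm, hashlib.new(algorithm, data).hexdigest()))
    return record


def verify(record: ObjectRecord, data: bytes) -> bool:
    """Check the content against the newest digest on file."""
    algorithm, expected = record.current()
    return hashlib.new(algorithm, data).hexdigest() == expected


def upgrade(record: ObjectRecord, data: bytes, new_algorithm: str) -> None:
    # Only extend the chain while the older digest still verifies, i.e.
    # before the old hash function "goes stale".
    if not verify(record, data):
        raise ValueError("content does not match current digest; refusing upgrade")
    record.digests.append((new_algorithm, hashlib.new(new_algorithm, data).hexdigest()))


report = b"Quarterly compliance report, FY2005"
record = store(report)
original_id = record.permanent_id
upgrade(record, report, "sha256")
print(record.permanent_id == original_id)  # True: applications keep the same reference
print(record.current()[0])                 # sha256
```

The design choice worth noticing is that applications only ever hold `permanent_id`, so the hash strength can be raised inside the storage system without touching anything outside it.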
Toigo: What are some of the ingredients of a software-based, hardware-agnostic CAS solution that you envision?
Carpentier: Open protocols, rather than closed proprietary APIs, are the answer. Using our CAStor product as an example, we are using a simple HTTP-based interface to make sure that nearly any environment on any platform (from cell phone to mainframe) can start talking to CAStor right away. In the long term, standardization efforts like the eXtensible Access Method (XAM) might start offering alternatives, but only if customers are vocal enough to demand functional, high-performance, truly standard interfaces, which happens to be against the interests of the proprietary vendors. But waiting until that happens may mean the CAS market never really takes off, leaving its many promises unfulfilled.
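As a rough illustration of why a plain HTTP interface lowers the barrier, the Python sketch below builds write and read requests using nothing but the standard library. Everything protocol-specific here is a hypothetical assumption, not CAStor's documented interface: the `cas-node.example.com` address, the use of POST for writes with a server-assigned identifier, the `X-Meta-` header prefix for metadata, and the object identifier `abc123`.

```python
import urllib.request

# Hypothetical cluster address for illustration only; not a real CAStor endpoint.
CLUSTER_URL = "http://cas-node.example.com:8080"


def build_store_request(data: bytes, metadata=None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that writes a new object.

    Assumes the server assigns and returns the object identifier, hence POST.
    Metadata travels as headers under a hypothetical X-Meta- prefix.
    """
    headers = {"Content-Type": "application/octet-stream"}
    for key, value in (metadata or {}).items():
        headers["X-Meta-" + key] = value
    return urllib.request.Request(CLUSTER_URL + "/", data=data,
                                  headers=headers, method="POST")


def build_retrieve_request(object_id: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET for a previously stored object."""
    return urllib.request.Request(f"{CLUSTER_URL}/{object_id}", method="GET")


write_req = build_store_request(b"scanned invoice bytes", {"retention": "7y"})
read_req = build_retrieve_request("abc123")  # hypothetical identifier
print(write_req.get_method(), write_req.full_url)  # POST http://cas-node.example.com:8080/
print(read_req.get_method(), read_req.full_url)    # GET http://cas-node.example.com:8080/abc123
```

Passing either request to `urllib.request.urlopen` would actually send it. The point is only that any environment with a stock HTTP library can speak such a protocol without waiting for a vendor to port an SDK.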
Toigo: Even when you deliver CAStor to market, there will be debates about the proprietary nature of such software versus proprietary hardware. Isn't performance what really matters at the end of the day?

Carpentier: True performance is an often underestimated ingredient of true robustness. As long as hardware failure remains possible (double disk failures in current hardware-based CAS products being a case in point), the reliability of system recovery will depend greatly on sheer performance. In a scalable system, this can only be realized through an uncompromising, truly massively parallel design. [Leveraging a parallel design enables] disk failure recovery many times faster than even the fastest hardware RAID solutions, thereby nearly eliminating the probability of data loss through double disk failure.
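The link between rebuild speed and double-failure risk is simple arithmetic: the exposure window is the data to be reconstructed divided by the aggregate rebuild throughput, and the chance of a second failure grows with that window. The Python sketch below works through it under loudly stated assumptions: rebuild throughput scales linearly with the number of disks sharing the work, failures are independent with exponentially distributed lifetimes, and the capacity, rate, MTTF, and disk-count inputs are illustrative values, not measurements of any product.

```python
import math


def rebuild_hours(capacity_gb: float, rebuild_mb_per_s: float, parallel_disks: int = 1) -> float:
    """Hours needed to reconstruct one failed disk's contents.

    Assumes rebuild throughput scales linearly with the number of disks
    sharing the work (one hot spare for classic RAID, many peers in a cluster).
    """
    total_mb = capacity_gb * 1000
    return total_mb / (rebuild_mb_per_s * parallel_disks) / 3600


def loss_probability(window_hours: float, exposed_disks: int, mttf_hours: float) -> float:
    """Chance that at least one more exposed disk fails during the window.

    Assumes independent failures with exponentially distributed lifetimes.
    Treating every exposed disk failure as a loss is deliberately pessimistic.
    """
    return 1 - math.exp(-exposed_disks * window_hours / mttf_hours)


# Illustrative inputs only, not measured figures.
CAPACITY_GB, RATE_MB_S, MTTF_H, EXPOSED = 500, 30, 500_000, 7

for label, peers in (("single spare", 1), ("50-way parallel", 50)):
    window = rebuild_hours(CAPACITY_GB, RATE_MB_S, peers)
    risk = loss_probability(window, EXPOSED, MTTF_H)
    print(f"{label}: window {window:.2f} h, loss risk {risk:.2e}")
```

Under these assumptions the window, and with it the risk, shrinks roughly in proportion to the number of disks sharing the rebuild, which is the effect Carpentier attributes to a massively parallel design.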
Toigo: It seems that you are positioning content addressing as a service rather than a stovepipe. That makes sense to me. Why hasn’t it been delivered that way?
Carpentier: From an existing SAN/NAS vendor's perspective, not being eager to cannibalize your existing product ranges and technology is quite understandable. That is why they will never drive breakthrough innovation in a marketplace like CAS. They might try to use it to compete in segments where they had little or no presence; that is why EMC positioned Centera in the archiving corner against tape and optical, product ranges they didn't offer. But they won't take any chances with their main markets. At the opposite end of the spectrum, startup companies with no vested interests in the market whatsoever are able to leverage precisely that opportunity, and will drive innovation.
Additionally, Caringo believes that a [software-only CAS solution] has a role to play in the stark simplification of ILM: it's fast enough to serve as primary storage, less expensive than anything else for massive general storage, and has specific capabilities to deal with compliance as well as long-term archiving requirements. It also allows any kind of metadata to be included with the stored information at object creation time. Caringo believes that for a wide range of enterprises, at least as far as fixed content is concerned, single-tier storage will become a very attractive, very realistic option.
* * *

When I originally chatted with Carpentier and his partners at Caringo, CAStor was just an idea. It inspired little more than a polite nod when unveiled to a few of the big-league storage players. It seemed to me that everyone knew Carpentier was right, but they just weren't prepared to scrap their own plans for hardware stovepipes that would harvest the low-hanging fruit EMC was busily exploiting.
Not to be dissuaded, Carpentier and company continued the development of CAStor, a software CAS "that will run on clusters built with any brand or form factor PC architecture nodes, using any kind of disk technology -- although we expect SATA to be most popular by far."
Carpentier beams when he talks about the new approach: "Nodes will be interconnected with regular Gbit/s Ethernet; the cluster architecture is fully open and symmetrical. It will be targeting both the massive clustered storage and the traditional CAS markets. Regular local replication and data protection are part of the base product, while Caringo's stronger, patent-pending upgradeable hash protection and audit functions will be offered as an option (currently code-named Pollux, after CAStor's twin brother)."
CAStor is in beta now and the gloves are off. You'll be able to learn more from a company Website, www.caringo.com, on Feb. 15.
In the meantime, the gospel of software-only CAS is already spreading. A short time ago, I had an opportunity to chat with Nexsan Technologies executive vice president Diamond Lauffin about the company's new Assureon product. It seems Nexsan has had a change of heart since last summer, when CEO Phil Black took the stage at Storage World Middle East and shared plans for a forthcoming "sticky CAS solution" that would rival the price point of EMC Centera. I gave Black a negative review at the time for aspiring to be EMC's mini-me.

Lauffin admitted that, after much soul searching, the tune has changed. Assureon leverages platform-agnostic software that can be used with non-Nexsan gear as well. While some value-added features of Nexsan array controllers could be leveraged to further secure content-addressed data in the vendor's scheme, anyone's hardware would work with the software.
These changes, and the appearance of hardware-agnostic CAS generally, are a bright spot in storage today. Consumers should show their support and give their encouragement. At a minimum, be very circumspect about buying hardware CAS systems in the market now. Soon you'll have good alternatives for deciding for yourself where it makes sense for information to live.
— Jon William Toigo, Contributing Editor, Byte and Switch
What other storage applications are ripe for commoditization in software? Write Jon William Toigo with your suggestions.
Organizations mentioned in this article:
EMC Corp. (NYSE: EMC)
Hitachi Data Systems (HDS)
Nexsan Technologies Inc.