Building a New Storage Roadmap

  • It’s safe to say that we haven’t had this much happening in enterprise data storage in three decades. Things are evolving on every front, from software to networks, from drive interfaces to the drives themselves, and the vendor landscape is changing rapidly, with many new players and signs of struggle among the old leaders.

    The Dell-EMC merger is one of the waves of change flooding through what had once been the steadiest and slowest-evolving segment of IT. Here was the giant of the storage industry recognizing that business fundamentals such as hardware platforms were becoming commodities and that failing to adopt a software and services worldview was a recipe for disaster.

    Who would have thought the mighty RAID array would begin to lose market share so quickly? Likewise, even leading-edge pundits are surprised at the growth of public clouds, while the 100 TB 2.5-inch solid-state drives projected to arrive in 2018 have hard-drive makers a bit panicked, especially Seagate.

    You might be taken aback if I say all these changes are just a taste of what's ahead. The next three or four years will see a much wider restructuring of storage, changing what we store and how we store it in ways that will surprise and perhaps even scare you. The idea of fixed-size blocks of data has been in place for so long that it is a pillar of the storage faith. With some of the new technology, storage becomes byte-addressable, and everything we know about storing an entry changes, from hardware to operating systems, compilers and applications.

    Byte-addressability is on Intel’s Optane roadmap, so it’s real. Remember, Intel can make the CPU tweaks needed for space management; it owns the leading compilers and link editors, so code can be generated to a standard; and it has storage devices in the pipeline. The result will be blindingly fast data storage. Instead of moving 4 KB through a driver and the SCSI software stack, data can be permanently stored with a single CPU command!

    If all of this isn’t enough, servers and storage appliances are converging on a common design, where the storage bays of a server are sufficient for a set of very fast SSDs, which then can be accessed across the cluster as a pool of storage.

    But there’s more! Instead of the hierarchical architectural model that has been around since the start of the computing era, new server designs such as Gen-Z place memory and storage at the same peer level as CPUs, GPUs, and network interfaces on a shared fabric. All of these building blocks can reach out over the fabric to other computers and read and write directly to their memory or storage. This is indeed a “pool of resources,” but managing it requires a new view of how resources are accessed and allocated.

    Software-defined infrastructure is the new mantra for these virtualized systems. All the resources are virtual elements in a shared pool, with policy-driven orchestration managing the virtual resources and tying them to physical gear as needed.

    Part of the SDI concept is the use of chainable microservices, with instances being created to host more copies of any service as needed to meet demand. With software services so divorced from the base hardware, the value of the system shifts to the services and the hardware becomes standardized, but very inexpensive. This underscores the wisdom of the Dell-EMC merger.

    Let’s take a closer look at the changes ahead for enterprise data storage.

    (Image: WIRACHAIPHOTO/Shutterstock)


  • The miracle of solid-state

    Flash has set storage free from the bonds of spinning rust! After three decades of snail-paced progress, from 8-inch drives with 50 IOPS to 3.5-inch drives with just 300 IOPS, SSDs offer as many as 2 million IOPS per drive. Vendors have capacities of 30 and 100 TB on their roadmaps for 2018, while power profiles beat the idling power of the coolest hard drive.
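    The mechanical ceiling on hard-drive IOPS falls out of simple arithmetic: every random access pays an average seek plus half a platter rotation. A quick sketch (the seek figure below is a typical published value, not any specific drive's spec):

    ```python
    # Why hard drives plateau at a few hundred IOPS: each random access
    # waits for an average seek plus half a rotation of the platter.
    # The 2 ms seek below is a typical 15K RPM enterprise-drive figure.

    avg_seek_ms = 2.0                          # assumed average read seek
    half_rotation_ms = 0.5 * 60_000 / 15_000   # 2 ms at 15,000 RPM

    service_time_ms = avg_seek_ms + half_rotation_ms
    iops = 1000 / service_time_ms
    print(round(iops))   # on the order of 250 random IOPS per drive
    ```

    An SSD has no seek or rotation to wait on, which is why per-drive IOPS jump by four orders of magnitude.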

    SSDs still cost more than hard-disk drives, but the advent of 3D NAND will drive cost down while capacity per drive rockets. 3D NAND takes advantage of the tiny power usage of flash to stack layers of cells on a die and then stack die on each other. Since these die are wafer-thin, there’s plenty of room for more vertical stacking.

    We can expect 64-layer die this year, with Toshiba planning 192 layers next year and then die stacking up to four layers, giving 1 or 2 TB per flash chip. Let’s be real: spinning rust can’t get anywhere close to this sort of density. The best HDD capacity in 2018 will be 14 TB in a 3.5-inch form factor. As for HAMR, with doubts about its viability in mass production, the cost of hard drives will rise significantly, while achieving capacities beyond 20 TB will be a challenge.

    Looking out a few more years, 100 TB NVMe SSDs will be the standard server drive, and capacities of 200 TB will be on sale.

    (Image: jules2000/Shutterstock)

  • Resolving SSD write wear

    SSDs used to wear out. Writing stresses the bit cells, shifting threshold voltages until bits are lost. This originally limited us to single-bit cells (single-level cell), but better process control and electronics made 2-bit cells the norm a few years back, and now 3-bit (triple-level cell) and even 4-bit (quad-level cell) flash is proving useful in mainstream storage.

    The key has been huge research investments aimed at profiling write operations, understanding aging and finding optimum wave-shaping for writes and reads. Coupled with advanced error correction codes, wear life is now well managed.

    Still, usage matters with SSDs. Some are specified for very high write rates, while others aim at more read-biased environments, but drives come characterized for wear life, so it’s easy to select the model you need.

    QLC drives are targeted at the cold storage market, otherwise known as “write-once, read-mostly.” They will support only hundreds of write operations per cell, but that’s plenty for the likes of Facebook to archive photos, for example.

    Bottom line: wear-out isn’t an issue today. On a humorous note, I would guess that some brave soul at an HDD manufacturer applied the wear-life concept to HDDs and found that typical HDDs scored lower than SSDs!

    For SSDs, wear life is typically described as “full-drive writes daily” for a five-year period.
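    Translating a drive-writes-per-day (DWPD) rating into total endurance is simple arithmetic. The capacities and DWPD values below are hypothetical examples, not any vendor's spec:

    ```python
    # Convert a DWPD (drive writes per day) endurance rating into total
    # terabytes written (TBW) over the warranty period. The capacities
    # and DWPD figures below are hypothetical, not real drive specs.

    def endurance_tbw(capacity_tb, dwpd, warranty_years=5):
        """Total terabytes the drive is rated to absorb over its warranty."""
        return capacity_tb * dwpd * 365 * warranty_years

    # A write-intensive 3.2 TB drive rated at 3 DWPD:
    print(endurance_tbw(3.2, 3.0))    # roughly 17,520 TBW
    # A read-biased 7.68 TB QLC drive rated at 0.2 DWPD:
    print(endurance_tbw(7.68, 0.2))   # roughly 2,803 TBW
    ```

    The same math explains why QLC's lower cycle count is acceptable for write-once, read-mostly archives: the daily write volume is tiny relative to capacity.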

    (Image: jules2000/Shutterstock)

  • New storage appliances

    RAID is obsolescent! The “many-drives, few-interfaces” model of the RAID array breaks down at ultra-high drive bandwidths; RAID controllers simply can’t keep up with SSDs. The answer is to replace the array with smaller, server-like boxes with, say, six or 12 drive slots. The host-facing interfaces better match the speed of the smaller set of SSDs, and overall the cluster’s bandwidth is much higher.

    SSDs are so fast that even small appliances have bandwidth to spare. Many use the spare IOPS to support data compression, so those 100 TB SSDs will effectively be 0.5 PB capacity units. With the boxes shrinking thanks to the 2.5-inch form factor, we could easily see 5 PB of effective capacity in a single 10-drive 1U box in 2018.
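    The effective-capacity arithmetic is straightforward. The 5:1 compression ratio here is an assumption consistent with the figures above; real ratios vary widely with the data mix:

    ```python
    # Back-of-the-envelope effective capacity for a compact all-flash box.
    # The 5:1 compression ratio is an assumption; actual ratios depend on
    # the data (databases compress well, encrypted data not at all).

    raw_tb_per_drive = 100     # projected 100 TB SSD
    drive_slots = 10           # small 1U appliance
    compression_ratio = 5      # assumed average

    effective_pb = raw_tb_per_drive * drive_slots * compression_ratio / 1000
    print(effective_pb)        # 5.0 PB effective in a single 1U box
    ```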

    Now, a single box isn’t the modern answer. Data availability has taken a quantum leap with data replication and/or erasure coding across multiple appliances. Replication can protect against two out of three appliances dying, while erasure coding can support levels such as six of 10 fragments going away. Geographical dispersion of the data replicas can provide disaster protection, at least against natural calamities.
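    A small sketch makes the protection-versus-overhead trade-off concrete, using the two examples just mentioned: 3-way replication and an erasure code that survives six of 10 fragments going away:

    ```python
    # Failures tolerated vs. raw storage overhead for the two schemes.

    def replication(copies):
        """Returns (appliance failures tolerated, storage overhead multiplier)."""
        return copies - 1, float(copies)

    def erasure_code(total_fragments, data_fragments):
        """Any `data_fragments` of `total_fragments` can rebuild the data."""
        return total_fragments - data_fragments, total_fragments / data_fragments

    print(replication(3))       # (2, 3.0): two of three boxes can die, at 3x raw cost
    print(erasure_code(10, 4))  # (6, 2.5): six of ten fragments can vanish, at 2.5x
    ```

    Erasure coding buys more failure tolerance per raw byte, at the cost of compute on every read and write, which is one more place those spare SSD IOPS get spent.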

    But why have separate servers and storage boxes? If the storage box looks like a server, uses COTS processors such as x64 CPUs, and has an Ethernet interface, why not make it a server that runs storage too, pool all the drive space, and hyperconverge everything? This is the future of the server farm, and it is spawning software-defined infrastructure (SDI), which I’ll discuss later.

    (Image: Pilar Alcaro/Shutterstock)

  • New storage interfaces

    We’ve reached the end of the road for the old set of interfaces. SAS is being supplanted by NVMe; SATA will soon follow down the same path, simply because NVMe is as cheap as both of them and blindingly fast in comparison. NVMe also allows data and status to be directed back to originators easily, and it reduces system overhead enormously.

    Fibre Channel is fading away too. It no longer offers the performance leadership it once enjoyed. Ethernet is now faster, and it already has years of experience with RDMA, supporting NVMe over Ethernet and enabling HCI clusters with a single converged interconnect fabric. With many competing vendors, Ethernet is also more cost-competitive.

    Single-lane 25 GbE/quad-lane 100 GbE has supplanted 10/40 GbE on a revenue basis after just a year in the market. Next year we'll see 50/200 GbE on the market, and it’s a good bet that 100/400 GbE will arrive by 2021. Thanks to the leadership of Mellanox, all of these speeds support RDMA, providing a way to peek and poke directly into the memory of other appliance nodes. It’s much faster than archaic fixed-block I/O.

    (Image: Quality Stock Arts/Shutterstock)

  • Fresh server architectures

    What if memory, drives, CPUs, GPUs and LAN connections were all connected as peers inside a server? This is what much of the industry sees as the next-generation system, and Gen-Z, a consortium of nearly every major tech company except Intel, is driving this type of configuration. The new server class will have much higher bandwidth, from memory to communications, and GPUs will avoid the PCIe bottleneck they currently suffer from in having to copy memory before processing it. The boost in power will be tremendous, and the architecture makes great sense for HCI clusters. The bottom line is that, while Gen-Z may look different once Intel states the direction it will take its own architecture, there will be a similar solution and the server world will change.

    At the same time, variations on the Hybrid Memory Cube concept will bring DRAM, flash, and processors closer together, providing a huge boost to memory bandwidth. It looks like the HMC approach will first create a new 32 GB CPU cache layer, boosting performance drastically, but merging the concept with Gen-Z may supply CPU/memory complexes with more than a terabyte per second of memory bandwidth, along with persistent on-board memory.

    (Image source: Gen-Z Consortium)

  • NVDIMM

    Put persistent memory in a DIMM? Flash makes that possible. The exciting part is that the flash can be accessed at roughly twice the speed of the fastest SSD. Today's NVDIMMs are used as block I/O devices, but Intel has just launched Optane and is working on a byte-addressable mode of access. I did some early work on this approach, and the idea of using a single CPU command to write a byte, word or vector of data to memory, instead of going through the file stack with a 4 KB minimum transfer, is a real game changer. Think Memcached or an Oracle database and it’s easy to see the difference: both would be able to update individual entries in nanoseconds, compared with the milliseconds of a typical I/O.

    However, there’s some bad news: existing block-oriented apps will need to be partially rewritten to take advantage of this technology. But with acceleration on an individual I/O in the 10,000X range, it will be worth the effort. Look for byte-addressability in late 2018, and watch those database companies!
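    To get a rough feel for the programming model, here is a Python sketch with an ordinary memory-mapped file standing in for persistent memory. A real NVDIMM would be exposed through a DAX-capable filesystem and flushed with cache-line instructions; the path and offset below are purely illustrative:

    ```python
    import mmap
    import struct

    # Stand-in for a DAX-mapped persistent-memory region (illustrative path).
    path = "/tmp/pmem_demo"
    with open(path, "wb") as f:
        f.write(b"\x00" * 4096)        # back the mapping with one 4 KB page

    with open(path, "r+b") as f:
        mem = mmap.mmap(f.fileno(), 4096)
        # Block I/O would rewrite the whole 4 KB page to change one field.
        # With byte-addressable persistence, one 8-byte store is enough:
        struct.pack_into("<Q", mem, 128, 42)   # store a single 64-bit value
        mem.flush()                            # stands in for a cache-line flush
        value = struct.unpack_from("<Q", mem, 128)[0]
        mem.close()

    print(value)   # 42
    ```

    The update touched eight bytes at a known offset, with no file-stack round trip; that is the difference that lets a cache or database update individual entries in nanoseconds.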

    (Image source: Netlist)

  • Software-defined infrastructure

    If you’ve adopted or are considering hyperconverged solutions, making your switch and storage software a set of chainable virtual microservices is a small mental jump, though one with huge consequences. That’s the gist of software-defined infrastructure, where the microservices are encapsulated in instances or containers rather than being part of monolithic software on switches or storage appliances.

    SDI opens up scaling agility, pay-for-play models of financing, bare-bones, low-cost hardware and the emergence of many startups with new microservice offerings.

    There is still work to be done on standardizing APIs across the industry, so SDI will be a work in progress for a couple of years yet, but software-defined networking is already demonstrating the value of the approach, and SDS (software-defined storage) is picking up pace. Key storage software such as Ceph, the leading open source object storage stack, already has most of the structure required for SDI deployment.

    More than any of the previous technology evolutions, SDI will define storage in the 2020s and remake the way IT works.

    (Image: Evannovostro/Shutterstock)

  • Envisioning the future

    The storage appliance of 2020 will use a Gen-Z base to create an HCI cluster. Each compact appliance node will have its own server, NVDIMMs and NVMe drives, with a mixture of ultra-fast and bulk-capacity SSDs.

    Total box capacity will be 5 PB (compressed), ranging upward to 10 or more petabytes by decade's end, and a typical cluster will have many petabytes of pooled storage. SDI, working with automated orchestration, will drive the cluster and take over much of the admin effort.

    GPUs will be used to parallel process data so that data is always compressed when stored or shipped over the 50 or 100 GbE RDMA cluster feeds, providing an effective 250/500 GbE cluster fabric in many use cases.

    Data centers are going to shrink a great deal in physical size, while SSDs allow the old regime of using pre-chilled air to cool units to be dropped in favor of free-air cooling. This will lead to PUEs better than 1.05, as well as drastic power savings from all-solid-state environments and the Hybrid Memory Cube (HMC) architecture.

    (Image: zeber/Shutterstock)