A look at what's inside HCI and the cost considerations.
Hyperconverged infrastructure is a hot area of IT right now. The realization that next-generation storage appliances look a lot like servers triggered a movement to make server storage sharable, bringing together the benefits of local storage in the server with the ability to share that storage across the whole cluster of nodes.
The key is a software package that presents all of the storage in the cluster as a single pool, with automated addition or subtraction of drives. This pool of storage, sometimes described as a virtual SAN, can be divided up and presented as smaller drive volumes, subject to access controls and sharing rules. The virtual SAN software handles issues such as replication and erasure coding groups and also deals with drive errors and data recovery questions.
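To make the pooling idea concrete, here is a minimal sketch in Python. It is illustrative only, not any vendor's API; the drive sizes, replica counts, and class names are all assumptions.

```python
# Illustrative sketch, not any vendor's API: a virtual SAN pools every
# drive in the cluster and carves out volumes whose raw-capacity cost
# depends on the protection scheme chosen.

class VirtualSAN:
    def __init__(self):
        self.raw_tb = 0.0        # total raw capacity across all nodes
        self.allocated_tb = 0.0  # raw capacity consumed by volumes

    def add_node(self, drive_sizes_tb):
        # Adding a node simply adds its drives to the shared pool.
        self.raw_tb += sum(drive_sizes_tb)

    def create_volume(self, usable_tb, replicas=2):
        # Two-way replication consumes 2x raw capacity; an erasure-coded
        # 4+2 group would consume only 1.5x for similar protection.
        raw_needed = usable_tb * replicas
        if self.allocated_tb + raw_needed > self.raw_tb:
            raise ValueError("pool exhausted")
        self.allocated_tb += raw_needed
        return raw_needed

san = VirtualSAN()
for _ in range(4):                 # four identical nodes...
    san.add_node([3.84] * 10)      # ...with ten 3.84 TB SSDs each
print(round(san.raw_tb, 2))        # 153.6 TB raw in the pool
san.create_volume(20, replicas=2)  # a 20 TB volume consumes 40 TB raw
```

The point of the sketch is the accounting: usable capacity is always smaller than raw capacity, and the gap depends on whether volumes use replication or erasure coding.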
A hyperconverged cluster looks like a set of identical nodes, each with a set of drives, a multi-processor server motherboard, DIMMs and LAN connections. This rigidity in configurations is somewhat artificial, and we are already seeing more wriggle room in the rules on what drives can be used and whether all the nodes have to be the same. This is because, fundamentally, HCI nodes are commercial off-the-shelf (COTS) systems.
Now, the fact that we can use COTS parts doesn’t mean every drive type is equal in performance or dollars per terabyte. Good performance of a cluster means that drives need to be fast, making NVMe SSDs the drive of choice. The premiums associated with NVMe drives are dropping, though getting a drive with 10 gigabytes-per-second throughput is still expensive.
With SSD capacities rising sharply over the next two years, driven by 3D NAND technology, the number of expensive drives per node will drop and some slower bulk SSDs will be added to provide secondary, cold storage.
Networking needs a very low overhead scheme, making RDMA over Ethernet the solution of choice. 25 GbE ports are taking over from 10 GbE, but a discrete NIC is still needed for RDMA support. This should change as the large cloud service providers pressure Intel to add RDMA to the chipset.
In today's HCI nodes, server motherboards are typically dual- or quad-CPU cards. These are virtualized servers, so memory is the primary constraint on the number of VMs that can run; consequently, DIMMs are a significant cost element in a configuration. The advent of NVDIMMs will help by providing a bulk store to extend the effective DIMM capacity by a factor of 4X or more.
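A back-of-envelope calculation shows why memory dominates the VM count, and what a 4X NVDIMM extension buys. All of the figures below (slot count, VM size, stack overhead) are assumptions for illustration.

```python
# Back-of-envelope sketch: on a virtualized HCI node, memory usually caps
# VM density before CPU does. All figures below are assumptions.

dimm_slots = 24          # typical dual-CPU board
dimm_size_gb = 32
stack_overhead_gb = 64   # hypervisor plus storage-stack reservation
vm_size_gb = 8

dram_gb = dimm_slots * dimm_size_gb                      # 768 GB
vms = (dram_gb - stack_overhead_gb) // vm_size_gb
print(vms)                                               # 88 VMs per node

# NVDIMMs as a DRAM extender at roughly 4x effective capacity
effective_gb = dram_gb * 4
print((effective_gb - stack_overhead_gb) // vm_size_gb)  # 376 VMs
```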
Let’s dig a bit deeper into HCI to figure out how to pick the right HCI node and what it will cost you.
Hyperconverged systems should be inexpensive. The convergence effectively makes servers and storage the same box, so the vendor's combined purchase volumes are higher and the purchase process simpler. Vendors save on support costs, and those savings flow through to the customer.
Scaling out is a function of just adding more nodes. The software stack takes care of integrating the new node to the resource pool. Today, those nodes have to be homogeneous, but this will soon change. As long as the nodes follow COTS principles, almost any node will work, though performance may be less predictable.
HCI runs on COTS systems, which has many implications. Porting the software stacks to a node is quick and easy. Performance requirements point to having fewer, faster nodes in the cluster, so good drives and networks matter. Any recent x64 processor will do the job, with hardware support for virtualization and multi-tenancy.
One question to resolve up front when planning an HCI deployment: What form of hardware support does your organization want? Buying a cluster from one of the emerging Chinese HCI vendors costs less, but support may be somewhat limited. At the other end of the spectrum, spending more with a traditional OEM provides access to global 24/7 support. Perform a TCO and risk study!
The number of HCI software vendors is growing, but the majority of current users rely on Nutanix, available via OEMs, or, to a lesser extent, SimpliVity, also available through OEMs, though SimpliVity has been purchased by HPE, which may limit OEM sales. Dell Technologies has a VMware stack, too, which creates a virtual SAN.
The availability of pure software plays is expanding and we can also expect Nutanix and possibly SimpliVity to unbundle a version. This unbundling opens up the option for a much wider sourcing of platform hardware, including nodes from “white-box” or ODM manufacturers. Unbundling also implies a product free of vendor locks on storage or server boards.
Commercial pricing applies to all the current software stacks, though an open source product may well surface in a couple of years. One hidden cost issue with current unbundled stacks is the extra admin work of deployment, while licensing options such as “as-a-service” pricing still need work.
The spectrum of drive performance today is huge. I'm not just talking about the difference between SSDs and HDDs. There's a broad range of SSDs, from SATA drives in the 100 megabytes-per-second throughput range to 10 gigabytes-per-second speedsters. Right-size performance to the purpose of the cluster, estimating IO traffic both internally within a node and over the network. Good storage analytics, such as from Enmotus or Tintri, are needed to get the balance in an HCI system right. Using these on the sandbox test setup might save you from overprovisioning in your design.
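As a rough illustration of that right-sizing exercise, the sketch below compares an estimated per-node IO demand against a few drive throughput classes. Every number here (VM count, per-VM traffic, drive speeds) is an assumption, not a measurement.

```python
import math

# Rough right-sizing sketch, all numbers assumed: compare a node's
# estimated IO demand against per-drive throughput classes.

vms_per_node = 60
mb_s_per_vm = 25                              # avg sustained IO per VM
node_demand = vms_per_node * mb_s_per_vm      # 1500 MB/s per node

drive_classes = {"SATA SSD": 500, "midrange NVMe": 3000, "fast NVMe": 10000}
for name, mb_s in drive_classes.items():
    # Drives needed to satisfy throughput alone (capacity is separate)
    print(name, math.ceil(node_demand / mb_s))
```

The takeaway: for throughput alone, a single midrange NVMe drive can replace several SATA SSDs, which is why per-drive price comparisons can mislead.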
Generally, though, go with NVMe drives. At the slower end of their range, they are still fast, and prices are close to those of SATA drives of the same capacity. NVMe will likely displace SATA over the next few years, so the choice between them will become moot.
Secondary storage will likely fit within the node footprint, with typical nodes currently having 10 or 12 drive bays. Drive sizes could reach 30+ terabytes in a 2.5-inch footprint in 2018, so just a handful of drives per node, with compression software, would meet the cold storage needs of most users.
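A quick sizing sketch makes the "handful of drives" claim concrete. The bay count, drive size, and compression ratio below are assumptions based on the projections above.

```python
# Cold-tier sizing sketch with assumed figures: a handful of big SSDs
# plus compression covers most secondary-storage needs within the node.

cold_bays = 4
drive_tb = 30            # projected 2.5-inch drive capacity
compression_ratio = 3    # typical for compressible cold data

effective_tb = cold_bays * drive_tb * compression_ratio
print(effective_tb)      # 360 TB effective cold capacity per node
```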
With server instances running together with storage code, the overhead associated with LAN sharing traffic needs to be reduced from the traditional Ethernet level, which can eat up a CPU with high traffic. The remedy is to use RDMA over Ethernet, either RoCE or iWARP, to reduce latency and overhead. The savings are notable, with RDMA NICs offloading CPU workload, which can run as high as 30% of CPU cycles in a heavily trafficked system. Latency is much improved, too.
The problem is cost. Until the CPU chipset supports RDMA, there is the cost of a NIC, in the range of several hundred dollars, to deal with. However, the speed of the RDMA network means a given node will be much more efficient, reducing the number of nodes needed. Likely, on this basis, RDMA is cheaper!
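The "RDMA is cheaper" argument can be sketched with arithmetic. All prices and overhead figures here are assumptions; the structure of the calculation is the point.

```python
import math

# Sketch of the RDMA trade-off, prices and overheads assumed: the NIC
# costs extra, but reclaiming CPU cycles lost to the TCP/IP stack
# (up to ~30% under heavy traffic) means fewer nodes for the same work.

node_cost = 20_000
rdma_nic_cost = 600
useful_cpu_plain = 0.70    # 30% of cycles lost to network processing
useful_cpu_rdma = 0.95     # the NIC offloads most of that work

nodes_plain = 10
work = nodes_plain * useful_cpu_plain            # node-equivalents needed
nodes_rdma = math.ceil(work / useful_cpu_rdma)   # 8 nodes suffice

print(nodes_plain * node_cost)                   # 200000
print(nodes_rdma * (node_cost + rdma_nic_cost))  # 164800
```

Even after paying for ten NICs' worth of hardware across fewer nodes, the cluster comes out cheaper because two whole servers drop out of the configuration.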
Use 25 GbE for networking on new systems. There are RDMA NICs available to support that speed as well as a quad-link 100 GbE version.
By adopting a leading-edge software-defined networking (SDN) solution, HCI clusters can be built around inexpensive bare-bones switches. All the smarts for networking run in virtual instances in the server engines of selected nodes. This can reduce switch gear costs by a large factor compared to traditional rack switches and routers.
SDN is evolving rapidly and the software and hardware are continually improving. This creates challenges, but the industry is aiming for hardware-agnostic solutions, and should eventually get there.
Right-size the DIMM space to line up with the use cases for the cluster. In many cases, just as with drives and networking, having larger, denser DIMMs may reduce server count for the cluster and, even though they're more expensive, create a net saving.
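Here is that trade-off as a sketch. The DIMM prices, slot count, and base server cost are all assumptions chosen to illustrate the shape of the calculation, not current market figures.

```python
import math

# Assumed prices: denser DIMMs cost more per gigabyte, yet can cut the
# number of servers needed to reach a cluster-wide RAM target.

target_ram_gb = 12 * 1024    # 12 TB of RAM across the cluster
slots_per_node = 24
base_node_cost = 15_000      # server without memory, assumed

results = []
for dimm_gb, dimm_price in [(32, 300), (64, 700)]:
    nodes = math.ceil(target_ram_gb / (slots_per_node * dimm_gb))
    total = nodes * (base_node_cost + slots_per_node * dimm_price)
    results.append((dimm_gb, nodes, total))
    print(dimm_gb, nodes, total)
```

Under these assumptions, the 64 GB DIMMs more than double the per-node memory cost but halve the node count, and the cluster as a whole comes out cheaper.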
NVDIMM storage is designed in part as a DRAM extender, increasing the effective capacity of DRAM by factors of 4X or more. Since NVDIMMs currently are deployed as local drives from an operating system viewpoint, sharing should be possible. This, however, is more a wish than reality and it isn’t clear what performance gains to expect.
Intel’s Optane raises the stakes for NVDIMM, with claims of byte addressability for a second-generation product. NVDIMMs are something to watch in the HCI space. I suspect we’ll see them in 2018 configurations, fully supported by the software stack.