The world of computer systems is evolving. Instead of simple server/SAN architectures, we have a wide variety of choices. One strong model in the new wave is the hyperconverged system.
Ostensibly bred from the idea that sharing local server storage in a virtual SAN makes sense, hyperconverged systems illustrate the reality that we need local storage to make high-end instances and boxes with thousands of Docker containers work. At the same time, the virtualization model demands we have an efficient way to protect against appliance failure by copying writes to multiple nodes.
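That write-copying scheme can be illustrated with a minimal sketch. The node lists and the `replicate_write` helper below are hypothetical stand-ins, not any vendor's actual API:

```python
# Hypothetical sketch of synchronous write replication in a virtual SAN.
# Node stores are plain lists here; real systems replicate over the network.

REPLICATION_FACTOR = 2  # each write is mirrored to this many peer nodes


def replicate_write(block, local_store, peers):
    """Write a block locally, then copy it to peers before acknowledging."""
    local_store.append(block)                 # local write first
    targets = peers[:REPLICATION_FACTOR]      # choose peers to mirror to
    for peer in targets:
        peer.append(block)                    # synchronous copy to survive node loss
    return len(targets) + 1                   # total replicas now holding the block


local, peer_a, peer_b, peer_c = [], [], [], []
replicas = replicate_write(b"data-block-1", local, [peer_a, peer_b, peer_c])
```

With a replication factor of 2, the cluster tolerates the failure of the writing node or either mirror without losing the block.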
There are other drivers for hyperconvergence. Server and storage designs have already converged a great deal. We use COTS motherboards in both classes of product, and today's storage appliances typically have just a few drives, perhaps 10 or 12, which fits long-standing server box designs. The result is that servers can be used as-is as storage boxes, so it's a small mental step to using the same box design for both.
However, not everyone buys the hyperconverged approach. One issue is that resources all scale in lockstep, so adding more compute also adds more storage, for example. This could be an issue as needs change over the life of an installation. Another issue is the vendor lock-in implied by a homogeneous, single-vendor design, something we all aim to avoid.
So where do we get our hyperconverged server/storage boxes? We can buy from the old guard of vendors, but this carries a considerable premium. Traditional vendors argue guaranteed interchangeability of parts, better service and premium technical support, but all of these claims made much more sense when designs were proprietary and every failed component had to be repaired ASAP.
The advent of the cloud has made COTS the king of IT. It’s the nature of COTS designs that standardization of interfaces and mechanical specifications is excellent, well understood and tested over millions of units shipped. There are few surprises today at the design level.
Many IT shops are moving to clouds, both public and hybrid. This means adopting cloud maintenance approaches, which teach that it's cheaper to buy a few extra units than to pay for warranties and repairs. The result of COTS, then, is to make it feasible to buy high-quality servers from a much broader vendor set, including the ODM companies that make most of the servers used today.
Hyperconverged system components
Choosing a hyperconverged system is problematic. On the one hand, many vendors offer just one or two models, which may not have the lowest-cost components and almost certainly have long-term lock-in implications. On the other hand, a hyperconverged system is just a single server design that handles both compute and storage, and really there is nothing stopping you from configuring your own version. That allows much more flexibility over the life of the cluster.
Deciding what component choices are needed for your specific hyperconverged system can be a challenge. Most users will choose an x64 architecture for this type of server. ARM64 is a possibility, and offers cost and electrical power benefits, but isn’t fully mature.
The horsepower of the CPU is going to be driven by the tasks for the cluster. This is where the first tough decision comes. If workloads were homogeneous, this would be easy, but with today's rack servers, financial modeling and databases require a different model type than web serving. Sadly, one size doesn't fit all efficiently. The first might run best on 3U quad-CPU boxes, while web serving today needs only dual-CPU 1U or 2U designs.
Hyperconverged clusters make sense for general-purpose computing and clustered databases such as Oracle, while the need for drive storage suggests a 2U design with drive bays today. These boxes can hold a lot of DRAM and as many as 12 3.5-inch drives. They support mid-power CPUs, since the top-power chips typically need larger boxes, and up to a terabyte of memory.
A key element of a hyperconverged system is the network. I recommend buying the fastest network connections you can afford. I'd look for at least two 10 GbE ports per server, though in six months or so the sweet spot will move to 25 GbE. Economically, the gating item on cost is likely to be whether the motherboard chipset has 10 or 25 GbE ports built in, since this is much cheaper than a NIC card.
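As a rough sizing check, the gap between dual 10 GbE and dual 25 GbE can be put in bytes per second. The 90% efficiency figure and the helper below are illustrative assumptions, not measured values:

```python
# Back-of-envelope NIC throughput per server.
# Assumption: usable payload is roughly 90% of line rate after protocol overhead.

def usable_gbytes_per_sec(ports, gbits_per_port, efficiency=0.9):
    """Aggregate usable throughput in gigabytes per second."""
    return ports * gbits_per_port * efficiency / 8  # divide by 8: bits -> bytes

two_x10 = usable_gbytes_per_sec(2, 10)   # dual 10 GbE: ~2.25 GB/s usable
two_x25 = usable_gbytes_per_sec(2, 25)   # dual 25 GbE: ~5.6 GB/s usable
```

The 2.5x jump matters because every write in a hyperconverged cluster is also replicated over the same links.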
Faster networking is available. If the onboard ports don’t meet your projected need, dual-port NICs with 25 GbE ports, or single/dual 40 GbE or 100 GbE ports are an option, while for low latency, low system overhead and fast transfers, you might consider RDMA NICs.
RDMA makes sense with in-memory databases, for example. Similarly, with NVDIMM persistent memory a hot new item, RDMA would be the best approach for sharing large memory pools on the DRAM bus. Make sure you have suitable software to do this!
The choice of drives is very important. Here, it depends on whether you are buying from the old guard or getting drives from a major distributor. In distribution, SSDs are cheaper than enterprise hard drives. In hyperconverged systems, they have a tremendous, and necessary, performance advantage, since they deliver enough IOPS to cater to both local traffic and the networked storage demand. SSD prices continue to drop fast, while top capacities already exceed those of bulk storage hard drives. A wholesale move to SSDs in servers is on the near horizon.
Note that SATA SSDs are proving durable and provide a good level of IOPS. We no longer need dual-port drives or SAS to deliver millions of IOPS from a dozen drives. Choosing PCIe/NVMe drives is an option for high-end hyperconverged systems, but these will need the fastest network possible for balance.
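To see why a dozen SATA SSDs can cover both local and networked demand, a quick IOPS budget helps. The per-drive figures below are assumed round numbers for illustration, not vendor specifications:

```python
# Rough IOPS budget for one virtual SAN node; per-drive figures are assumptions.

SATA_SSD_IOPS = 90_000    # assumed random-read IOPS for one SATA SSD
NVME_SSD_IOPS = 500_000   # assumed figure for one PCIe/NVMe SSD
DRIVES = 12               # drives per 2U node

sata_total = SATA_SSD_IOPS * DRIVES   # aggregate for a SATA build
nvme_total = NVME_SSD_IOPS * DRIVES   # aggregate for an NVMe build

# Split the SATA budget between local VM traffic and networked storage demand.
local_share = 0.5
local_iops = int(sata_total * local_share)
network_iops = sata_total - local_iops
```

Even the SATA configuration lands around a million IOPS per node, which is why SAS dual-porting is no longer the gating factor; the NVMe build is where the network becomes the bottleneck.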
SSDs are achieving those large capacities in a 2.5-inch form factor, so we will migrate, probably next year, to a set of server boxes with 2.5-inch bays, having either more bays or a smaller footprint. There is talk of using an even more compact SSD form factor, M.2, to pack drives into a much smaller space, which fits with the no-maintenance approach.
External storage fits the hyperconverged architecture. First, cold storage can use one of those old HDD arrays, at least for a few years, although I wouldn’t invest in new HDD arrays for this. All-flash arrays also have a purpose. As shared storage, they are inexpensive compared with old-style RAID. If your use case requires sharing data with other server farms, or you need more fast storage than the virtual SAN can provide, all-flash arrays are a viable option.
Economics is a factor here. PCIe SSDs aren't cheap, and in a large cluster, a mix of local PCIe drives and an all-flash array might be a much cheaper solution. It comes down to use cases. This is an important issue, because the advent of persistent memory on the DRAM bus will change the profile of what a top-end system looks like. Such a system might use relatively expensive NVDIMMs to make that terabyte of DRAM look like 10 TB or more. If so, dropping local SSDs in favor of a shared AFA may make more sense economically.
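That trade-off reduces to simple arithmetic. Every price and count in this sketch is a made-up placeholder chosen to show the shape of the comparison, not real market data:

```python
# Illustrative cost comparison: local PCIe SSDs per node vs. one shared AFA.
# All dollar figures below are placeholders, not quotes.

def local_pcie_cost(nodes, ssds_per_node, price_per_ssd):
    """Total cost of putting fast PCIe SSDs in every node."""
    return nodes * ssds_per_node * price_per_ssd

def shared_afa_cost(afa_base_price, nodes, nic_upgrade_per_node):
    """Total cost of one shared all-flash array plus faster NICs per node."""
    return afa_base_price + nodes * nic_upgrade_per_node

NODES = 40
local_total = local_pcie_cost(NODES, 2, 2_500)          # 2 PCIe SSDs per node
shared_total = shared_afa_cost(150_000, NODES, 400)     # one AFA + NIC upgrades
```

With these placeholder numbers, the shared array wins at 40 nodes; at a handful of nodes the fixed AFA price dominates and local drives win, which is exactly the use-case dependence noted above.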
Hyperconverged systems look like a strong option going forward, but solutions based on ultra-compact servers, with persistent memory but no drives, may solve a number of problems and change the game somewhat.