Storage systems based on scale-out designs are becoming increasingly popular in the data center. These systems typically run on commodity servers, called nodes, and use the internal storage in those servers. As capacity or performance demands grow, nodes can be clustered together to meet them, while the storage administrator still manages the cluster as a single entity. The problem is that individual nodes may not be fully utilized, especially from a CPU perspective, which can lead to a new problem: node sprawl.
Each node in a scale-out storage design essentially has three resources that it needs to scale: capacity, network I/O, and CPU, which drives storage software functions like snapshots, data tiering, and replication. Most scale-out designs add nodes in response to a demand for capacity because the internal capacity per node is relatively limited. By comparison, network I/O (with 10 GbE) and the CPU power to handle additional functionality (snapshots, clones, tiering) are plentiful. The result is that CPU and network utilization per node are significantly lower than capacity utilization per node.
The impact of node sprawl is that nodes are added before all three of the node's resources are fully used. IT budget is spent early on CPU and network resources that then sit idle. Data center floor space, power, and cooling are also consumed as rack after rack of scale-out storage is added to the environment.
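The arithmetic behind node sprawl can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the per-node specifications and the demand figures are hypothetical numbers chosen for the example, and the function names are invented here. When all three resources ship inside the same node, the node count is set by whichever resource runs out first, and the other two are left partly idle.

```python
import math

def coupled_nodes(demand, per_node):
    """Nodes needed when capacity, network, and CPU all come in the same box.

    The most constrained resource dictates the node count.
    """
    return max(math.ceil(demand[r] / per_node[r]) for r in demand)

def utilization(demand, per_node, nodes):
    """Fraction of each resource actually used across the cluster."""
    return {r: demand[r] / (per_node[r] * nodes) for r in demand}

# Hypothetical per-node resources and workload demand (assumptions).
per_node = {"capacity_tb": 24, "network_gbps": 10, "cpu_cores": 16}
demand = {"capacity_tb": 480, "network_gbps": 40, "cpu_cores": 48}

nodes = coupled_nodes(demand, per_node)
util = utilization(demand, per_node, nodes)

print(f"nodes required: {nodes}")
for resource, fraction in util.items():
    print(f"{resource}: {fraction:.0%} utilized")
```

With these assumed figures, capacity alone forces 20 nodes, while network and CPU demand could have been met with four and three nodes respectively.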
The solution may be to break the link between capacity and storage processing. There are two ways to do this. The first is to virtualize the storage software and converge storage services into the virtual server infrastructure. The second option, as we discuss in our recent article "In Open Storage The Storage Infrastructure Matters," uses smaller servers with less storage capacity but connects them to a cost-effective storage backend.
Storage-as-software options are available now. We call them "open storage software solutions" because they will run on any server hardware and connect to any storage hardware. The key to keeping node sprawl to a minimum is to go against the default design of buying internal storage with the node and instead to separate those two purchases. First, build a storage and network I/O processing infrastructure, using either virtual servers or physically smaller servers. Then build an independent storage capacity infrastructure.
This allows backend capacity to grow independently of frontend processing and network I/O. It also allows a best-of-breed selection among multiple storage software applications, based on the demands of each application or user profile.
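To illustrate the effect of separating the two purchases, here is a minimal sketch that sizes the frontend and the backend independently. All names and figures are assumptions made for the example (a hypothetical small frontend server and a hypothetical 60 TB capacity shelf); they do not describe any particular product.

```python
import math

def decoupled_units(demand, frontend, backend_shelf_tb):
    """Size processing and capacity as two separate pools.

    Frontend servers are sized by network and CPU demand only;
    backend shelves are sized by capacity demand only.
    """
    frontends = max(
        math.ceil(demand["network_gbps"] / frontend["network_gbps"]),
        math.ceil(demand["cpu_cores"] / frontend["cpu_cores"]),
    )
    shelves = math.ceil(demand["capacity_tb"] / backend_shelf_tb)
    return frontends, shelves

# Hypothetical frontend server and workload demand (assumptions).
frontend = {"network_gbps": 10, "cpu_cores": 16}
demand = {"capacity_tb": 480, "network_gbps": 40, "cpu_cores": 48}

today = decoupled_units(demand, frontend, backend_shelf_tb=60)

# Capacity demand doubles while processing demand stays flat.
grown_demand = dict(demand, capacity_tb=960)
later = decoupled_units(grown_demand, frontend, backend_shelf_tb=60)

print(f"today: {today[0]} frontend servers, {today[1]} capacity shelves")
print(f"later: {later[0]} frontend servers, {later[1]} capacity shelves")
```

In this sketch, doubling the capacity demand adds shelves to the backend but leaves the number of frontend servers unchanged, which is the independence the separated design is meant to provide.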