A look at the relationship between SDN and SDS and potential performance issues.
The aim of software-defined infrastructure (SDI) is to let cloud users automate the handling of virtual resources in a unified way, freeing admins from routine tasks that, in a large cloud, would otherwise demand an unacceptably high level of support. In other words, we are trying to do for storage and networking what virtualization and the cloud have already done for servers.
In the world of software-defined infrastructure, services deployed in both storage and networking are virtualized, too. These data services are abstracted from storage appliances and network switches to run in any VM or container in the cluster. As a result, services can expand and contract as needed to meet any workload, while the tenant of any workspace can chain services to meet their requirements.
Abstraction means exactly that: software for services no longer needs to run inside a switch or a storage appliance, which should drive the cost of this equipment down dramatically. We are already seeing bare-bones switches, built from little more than switch chips and a small amount of other electronics, making inroads in top-of-rack deployments.
Undeniably, software-defined networking (SDN) has a lead of a year or two over software-defined storage. Many view SDN as an alternative to traditional proprietary switches. Notably, software suppliers are delivering services code that is independent of the switch hardware, and in that respect SDN has met a major goal of abstraction.
While SDN and software-defined storage (SDS) are usually considered separately, it's important to examine the relationship and interactions between SDN and SDS and look at potential performance issues.
SDN and SDS
In an SDN environment, central IT admins for the cloud can create sets of policies that define what tenants can do in networking. They can also create “fill-in-the-blanks” templates that make setting up and managing VLANs easy for tenants and almost foolproof against admin finger trouble.
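To make the policy-plus-template idea concrete, here is a minimal Python sketch. The `TenantPolicy` and `VlanRequest` types and the `validate` function are hypothetical illustrations, not the API of any particular SDN controller: the admin fixes the policy, the tenant fills in only the blanks, and anything outside policy is rejected before it reaches the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantPolicy:
    """Hypothetical admin-defined limits on what a tenant may request."""
    allowed_vlan_ids: range        # VLAN IDs reserved for this tenant
    max_bandwidth_gbps: float      # per-VLAN bandwidth cap
    allow_external_access: bool    # may tenant VLANs reach the outside world?


@dataclass(frozen=True)
class VlanRequest:
    """Hypothetical fill-in-the-blanks template: the only fields a tenant supplies."""
    name: str
    vlan_id: int
    bandwidth_gbps: float
    external_access: bool = False


def validate(request: VlanRequest, policy: TenantPolicy) -> list[str]:
    """Return the policy violations; an empty list means the request can be applied."""
    errors = []
    if request.vlan_id not in policy.allowed_vlan_ids:
        errors.append(f"VLAN ID {request.vlan_id} is outside the tenant's range")
    if request.bandwidth_gbps > policy.max_bandwidth_gbps:
        errors.append(f"{request.bandwidth_gbps} Gb/s exceeds the "
                      f"{policy.max_bandwidth_gbps} Gb/s cap")
    if request.external_access and not policy.allow_external_access:
        errors.append("external access is not permitted for this tenant")
    return errors


policy = TenantPolicy(allowed_vlan_ids=range(100, 200),
                      max_bandwidth_gbps=10.0,
                      allow_external_access=False)
print(validate(VlanRequest("render-net", 150, 10.0), policy))       # []
print(len(validate(VlanRequest("web", 250, 40.0, True), policy)))  # 3
```

Because tenants never write raw switch configuration, a mistyped VLAN ID or an oversized bandwidth request is caught at the template stage rather than on the switch.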
When we look at software-defined storage, we don’t see the same level of standardization or coherent thinking as with SDN. Much of the focus of so-called SDS products is on virtualizing services within the storage box, and some marketers take the approach that if there’s software in the storage appliance, it’s software-defined.
However, we are beginning to move beyond the hype phase of SDS with software products that can be abstracted from the hardware platform and run in a container or virtual machine. A prime example is Ceph, the very popular open source object store. This software can run virtually, and it’s easy to envision a Ceph cluster where the drives are parked in dense packages or distributed throughout the server farm while all but the raw transfer protocol code runs in virtual instances.
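Part of what lets Ceph run virtually is that its clients and daemons compute where data lives from a shared cluster map, using Ceph's CRUSH algorithm, instead of consulting a central lookup table. CRUSH is hierarchical and weighted; the Python sketch below swaps in the much simpler rendezvous (highest-random-weight) hashing purely to illustrate the principle that any instance holding the same node list arrives at the same placement. The node and object names are made up.

```python
import hashlib


def weight(object_name: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Choose `replicas` distinct nodes via rendezvous hashing (a stand-in for CRUSH)."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    # Every instance ranks the nodes the same way, so no central lookup is needed.
    ranked = sorted(nodes, key=lambda n: weight(object_name, n), reverse=True)
    return ranked[:replicas]


nodes = [f"osd-{i}" for i in range(8)]
print(place("video/master.mov", nodes))
```

With this scheme, removing a node only affects objects that had a replica on it; every other object keeps its placement. CRUSH is likewise designed to limit data movement when the cluster changes.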
This virtual approach becomes even more important in hyperconverged systems. The drives directly connected to each node are free of tightly bound services; those services instead execute in the pool of virtual servers.
However, life is never this simple! As anyone who has run several clusters knows, storage has network and topology issues that affect not only performance but also data integrity. Connect a lot of drives to the end of a slow network and you'll run into problems.
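A quick back-of-the-envelope calculation shows why. The Python sketch below compares the combined throughput of a group of drives with the single link they sit behind; the drive count, per-drive rate and link speed are illustrative assumptions, and protocol overhead is ignored.

```python
def link_bottleneck(num_drives: int, drive_mb_s: float, link_gbps: float):
    """Return (oversubscription ratio, usable MB/s per drive) for drives behind one link.

    Uses decimal units (1 GB = 1000 MB, 8 bits per byte) and ignores protocol overhead.
    """
    demand_gbps = num_drives * drive_mb_s * 8 / 1000      # what the drives could push
    ratio = demand_gbps / link_gbps                       # > 1 means the link is the bottleneck
    link_share_mb_s = link_gbps * 1000 / 8 / num_drives   # fair share of the link per drive
    return ratio, min(drive_mb_s, link_share_mb_s)


# Illustrative: 24 hard drives at ~200 MB/s each behind a single 10 GbE link.
ratio, per_drive = link_bottleneck(24, 200, 10)
print(f"{ratio:.2f}x oversubscribed, {per_drive:.1f} MB/s usable per drive")
# 3.84x oversubscribed, 52.1 MB/s usable per drive
```

In this example each drive delivers only about a quarter of what it could, and adding more drives behind the same link makes the ratio worse.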
A complicated relationship
Even with SDN, networks will not be homogeneous. In-rack switching, for instance, tends to have much higher aggregate bandwidth than the network backbone. It's good practice to place compute as close as possible to where the data is stored, since this lowers latency and reduces backbone traffic.
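The gap between in-rack and backbone bandwidth is usually expressed as an oversubscription ratio. The sketch below assumes a rack of 40 servers on 25 GbE with four 100 GbE uplinks and a non-blocking in-rack switch; it estimates what each server gets when all of them send traffic off-rack at once, and how long a 2 TB transfer (also an illustrative figure) takes at each rate.

```python
def cross_rack_share(servers: int, server_gbps: float,
                     uplinks: int, uplink_gbps: float) -> tuple[float, float]:
    """Return (oversubscription ratio, per-server Gb/s when every server sends off-rack)."""
    ratio = (servers * server_gbps) / (uplinks * uplink_gbps)
    share = server_gbps / max(ratio, 1.0)   # uplinks only limit traffic if ratio > 1
    return ratio, share


def transfer_seconds(terabytes: float, gbps: float) -> float:
    """Time to move a dataset at a given rate (decimal units, no protocol overhead)."""
    return terabytes * 8000 / gbps   # 1 TB = 8000 Gb


ratio, share = cross_rack_share(servers=40, server_gbps=25, uplinks=4, uplink_gbps=100)
print(ratio, share)                # 2.5 10.0
print(transfer_seconds(2, 25))     # 640.0 s in-rack at full line rate
print(transfer_seconds(2, share))  # 1600.0 s across the backbone, worst case
```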
SDN and SDS need to account for this kind of physical affinity, with capabilities in both admin tools and automated orchestration to cope with bandwidth demands. Consider this scenario: An app needs to render a large video file using multiple instances of a rendering tool. It’s a priority job and requires powerful GPU instances.
Orchestration today can set up the instances, though whether it is smart enough to place them in the same rack as the data is another matter. Next, we need wide-lane LAN connections to handle the bandwidth. If the SDN is service-aware, it should pick VLAN structures that use quad-lane high-speed links between the storage and the GPU instances.
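Here is a minimal sketch of the locality-aware placement this scenario calls for. The `Rack` fields and the scoring rule are assumptions for illustration: among racks with enough free GPUs, prefer one that already holds the video data, then the one with the widest path to storage (for example, a quad-lane link).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rack:
    """Hypothetical view of a rack that orchestration, SDN and SDS could share."""
    name: str
    free_gpus: int
    holds_data: bool           # does this rack store replicas of the input file?
    storage_path_gbps: float   # widest path the SDN can offer to the storage nodes


def place_render_job(racks: list[Rack], gpus_needed: int) -> Rack:
    """Pick a rack for the GPU instances: data locality first, then bandwidth."""
    candidates = [r for r in racks if r.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no rack has enough free GPUs")
    # Tuples compare element by element and True > False, so racks holding the
    # data win; ties are broken by the fastest path to storage.
    return max(candidates, key=lambda r: (r.holds_data, r.storage_path_gbps))


racks = [Rack("rack-a", 8, False, 100.0),
         Rack("rack-b", 4, True, 100.0),
         Rack("rack-c", 2, True, 400.0)]
print(place_render_job(racks, 4).name)   # rack-b
```

A locality-blind scheduler could just as easily pick `rack-a`, which has more free GPUs, and pull the whole file across the backbone.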
Finally, the storage has to be told what throughput it must deliver to these fast pipes, so it needs feedback on how fast they are and what portion of their bandwidth is allocated to the rendering process.
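The feedback the storage layer needs boils down to simple arithmetic. In this sketch the lane count, lane speed, allocated fraction and number of storage nodes are all assumed values, and load is assumed to spread evenly; the function turns them into the aggregate throughput the storage must sustain and the share each storage node has to deliver.

```python
def storage_targets(lanes: int, lane_gbps: float,
                    allocated_fraction: float, storage_nodes: int) -> tuple[float, float]:
    """Return (total MB/s, MB/s per storage node) needed to fill the allocated bandwidth."""
    if not 0 < allocated_fraction <= 1:
        raise ValueError("allocated_fraction must be in (0, 1]")
    allocated_gbps = lanes * lane_gbps * allocated_fraction
    total_mb_s = allocated_gbps * 1000 / 8          # Gb/s -> MB/s (decimal units)
    return total_mb_s, total_mb_s / storage_nodes   # assumes an even spread across nodes


# Assumed: a quad-lane 4 x 25 Gb/s link, 60% allocated to rendering, 5 storage nodes.
total, per_node = storage_targets(4, 25, 0.6, 5)
print(f"{total:.0f} MB/s total, {per_node:.0f} MB/s per node")
# 7500 MB/s total, 1500 MB/s per node
```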
These issues also come into play when integrating all-flash arrays into a network or when using ultra-fast flash drives. In both cases, we need to cope with the physical attributes of the nodes, something the marketing around software-defined infrastructure doesn’t address.
This may all sound complex, and it is, but unless these issues are resolved, mainly through automation, the resulting system performance will be disastrous. A proper SDI implementation means that instance orchestration, SDN and SDS have to negotiate the optimal configuration. Commercially available packages do not yet seem far enough along to achieve anything close to this. The best we can do today is to set up the hardware and treat it as homogeneous.
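One way to picture that negotiation: each layer states what it can offer for a candidate placement, and the job runs at the speed of whichever layer is the bottleneck. The `Offer` type and the `negotiate` function below are hypothetical, and a real negotiation would weigh far more (cost, fairness, failure domains); the sketch only shows why the three layers must be consulted together rather than one at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical per-rack offer gathered from the three layers."""
    rack: str
    gpus_free: int        # from instance orchestration
    network_gbps: float   # bandwidth the SDN controller can reserve
    storage_gbps: float   # throughput the SDS layer can sustain to that rack


def negotiate(offers: list[Offer], gpus_needed: int, demand_gbps: float):
    """Return (rack, achievable Gb/s) for the best feasible offer, or None."""
    feasible = [o for o in offers if o.gpus_free >= gpus_needed]
    if not feasible:
        return None

    def achievable(o: Offer) -> float:
        # The job can go no faster than its slowest layer.
        return min(demand_gbps, o.network_gbps, o.storage_gbps)

    best = max(feasible, key=achievable)
    return best.rack, achievable(best)


offers = [Offer("rack-a", 8, 100, 40),   # fast network, slow storage
          Offer("rack-b", 8, 60, 80),
          Offer("rack-c", 8, 400, 30)]   # fastest network, slowest storage
print(negotiate(offers, 4, 50))          # ('rack-b', 50)
```

Choosing by network speed alone would pick `rack-c`, the worst option in this example.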
Moreover, SDI raises security concerns that must be addressed. The world is woefully behind in encrypting data, a shortcoming that will become far more serious as automated networking in large-scale clusters becomes the norm. The resulting explosion in short-lived VLANs will open up a massive attack surface while making intrusions much harder to detect.
One way to better protect data storage is to build a separate storage LAN, perhaps based on 40 GbE for performance. Orchestrating this LAN independently of the one exposed to the outside world will improve security, but it implies that the storage LAN would also carry all the command and control traffic for orchestrating servers, networks and storage.
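If command and control is confined to the storage LAN, one simple guardrail is to check that every control-plane endpoint is addressed on that network. The Python sketch below uses the standard `ipaddress` module; the subnet and endpoint names are a hypothetical addressing plan, not a recommendation.

```python
import ipaddress

# Hypothetical addressing plan: the isolated storage/control LAN.
STORAGE_LAN = ipaddress.ip_network("10.200.0.0/16")


def misplaced_endpoints(endpoints: dict[str, str]) -> list[str]:
    """Return names of control-plane endpoints whose address is outside the storage LAN."""
    return [name for name, addr in endpoints.items()
            if ipaddress.ip_address(addr) not in STORAGE_LAN]


control_plane = {
    "sdn-controller": "10.200.1.5",
    "sds-manager": "10.200.2.9",
    "vm-orchestrator": "192.168.4.20",   # hypothetically on the tenant-facing LAN
}
print(misplaced_endpoints(control_plane))  # ['vm-orchestrator']
```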
Integrating SDS, SDN and server orchestration is clearly complex, and there's much work still to be done. At the end of the exercise, though, we should see smoothly running systems with near-optimal performance.