Putting Controller-Based Networks' Security Risk In Context
June 26, 2017
OpenFlow is starting to gain some buzz in the industry, with a demonstration at the upcoming Interop show in Las Vegas and vendors starting to adopt the protocol. However, as others begin to learn about OpenFlow and controller-based networking, complaints about single points of failure and single targets of attack get fired off in an almost knee-jerk reaction. Let's stop and take a breath. Single points of failure and single points of attack are common issues in networking and, frankly, have been dealt with in many ways. These objections are non-issues.
Whenever you stand up a new service or project, you, or someone you work with, has to address availability and security issues. And, guess what? The principles are the same, whether you are talking about an application service such as Microsoft Exchange, a network service like a firewall, or a network management command-and-control system. If you can, avoid single points of failure through clustering, hot stand-by or some other method. You may be able to do that in the product itself or by using external products. If you must have a single point of failure or attack, take steps to reduce the likelihood of failure or attack, and take steps to shorten recovery time.
I am being vague, I know, but how you do it depends on context: the severity of a service disruption, the likelihood of one, your management practices, product features, and a bunch of other things.
Look at your network today. Is every host dual-homed to different switches? Are those switches powered by separate power distribution systems with backup? Is each tier of the network fully redundant? I bet the answer to most of these questions is either "no" or "in some places." It probably doesn't make economic sense to make all of your access switches fully redundant. Having a spare switch that can be swapped in and running in 15 minutes or less is good enough. In your data center, where a disruption has a larger impact, you are more likely to build in more capacity and redundancy. What you do is all about context.
What about your servers? You know how to run mission-critical services. You do it every day. Sometimes that means running application clusters that can distribute load and fail over statefully if a cluster member fails. In other cases, you can put a load balancer in front of a set of servers and, if a server fails, you lose only the users connected to that server. Sometimes, you keep a stand-by server ready to take over the load after it has been provisioned and configured.

OpenFlow is just a protocol that securely facilitates communication between a controller and the network infrastructure. The OpenFlow controller manipulates the switch/router forwarding tables in the same way an individual switch/router manipulates its own forwarding table. Rather than having each switch/router build its own view of the network in order to make forwarding decisions, the controller creates a centralized view and then configures the switches to reflect that view. The process that builds the global forwarding table(s) is independent of the OpenFlow protocol. Vendors and researchers can use whatever they choose to build the forwarding table(s) and then use OpenFlow to configure each switch/router in the network.
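To make that split between decision and enforcement concrete, here is a minimal Python sketch of a controller computing a path centrally and pushing the resulting entries down to each switch. The Controller and Switch classes and the install_flow() helper are hypothetical stand-ins, not any particular OpenFlow library's API; in a real deployment the install step would be an OpenFlow flow-mod message sent over the control channel.

    # Hypothetical sketch: the controller computes paths centrally and uses
    # OpenFlow-style messages to program each switch's forwarding table.
    # Controller, Switch and install_flow() are illustrative stand-ins,
    # not a real OpenFlow library's API.

    class Switch:
        def __init__(self, dpid):
            self.dpid = dpid
            self.flow_table = []          # entries pushed by the controller

        def install_flow(self, match, actions, priority=100):
            # In a real deployment this would be an OpenFlow flow-mod message
            # sent over the secure controller channel.
            self.flow_table.append({"match": match,
                                    "actions": actions,
                                    "priority": priority})

    class Controller:
        def __init__(self, switches):
            self.switches = switches      # the controller's global view

        def program_path(self, src_subnet, dst_subnet, path):
            # The path computation itself happens outside of OpenFlow; only
            # the resulting per-switch entries are pushed with the protocol.
            for switch, out_port in path:
                switch.install_flow(
                    match={"ipv4_src": src_subnet, "ipv4_dst": dst_subnet},
                    actions=[("output", out_port)])

    edge, core = Switch("edge-1"), Switch("core-1")
    ctl = Controller([edge, core])
    ctl.program_path("10.1.0.0/16", "10.2.0.0/16", [(edge, 5), (core, 12)])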
The first objection to OpenFlow is that, since it is controller-based, the controller becomes a single point of failure. If that's the case, then don't design or purchase a controller-based system with a single point of failure. There is no reason that OpenFlow controllers can't be clustered, placed in active/active or active/passive fail-over, load balanced, or run in any other model that ensures up-time. This is an issue that can be addressed.
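OpenFlow switches can, in fact, maintain connections to more than one controller, with the controllers sorting out which one is in charge. Here is a minimal active/passive sketch of that idea in Python; the class and helper names are hypothetical, and the keepalive check is a stand-in for a real control-channel echo, not any vendor's implementation.

    # Hypothetical active/passive failover sketch. The switch keeps an ordered
    # list of controller endpoints and promotes the next one when the current
    # master stops answering. Names and the keepalive are illustrative only.

    class ControllerEndpoint:
        def __init__(self, address):
            self.address = address

        def is_alive(self):
            # Stand-in for an echo/keepalive exchange on the control channel.
            return True

    def pick_master(endpoints):
        """Return the first reachable controller; the rest remain backups."""
        for endpoint in endpoints:
            if endpoint.is_alive():
                return endpoint
        return None  # no controller reachable; the switch keeps forwarding
                     # on its existing flow table (see below)

    controllers = [ControllerEndpoint("10.0.0.10:6653"),
                   ControllerEndpoint("10.0.0.11:6653")]
    master = pick_master(controllers)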
Given the requirement that critical systems be redundant and resistant to failure, you can bet vendors that adopt OpenFlow will address redundancy and high availability of the controllers. Besides, even if an OpenFlow switch were cut off from the controller, it would still forward traffic based on its current forwarding table. New flows are sent to the controller only when there isn't a matching forwarding entry in the network device--a.k.a. a table miss. Designing your network so that every new flow has to go to the controller for disposition would be a poor design.
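That miss-only behavior is easy to see in a sketch. The following Python snippet (illustrative structures only, not a switch vendor's datapath code) shows the lookup a switch performs for each packet: forward locally on a table hit, and involve the controller only on a miss.

    # Hypothetical datapath lookup: the switch forwards on a table hit and
    # only punts to the controller on a miss. Losing the controller therefore
    # affects new, unknown flows, not traffic that already matches an entry.

    def matches(entry, packet):
        # Simplified exact-match check: every field named in the entry must
        # equal the corresponding field in the packet.
        return all(packet.get(k) == v for k, v in entry["match"].items())

    def handle_packet(flow_table, packet, controller_reachable):
        for entry in sorted(flow_table, key=lambda e: -e["priority"]):
            if matches(entry, packet):
                return entry["actions"]            # table hit: forward locally
        if controller_reachable:
            return [("send_to_controller", None)]  # miss: ask for a decision
        return [("drop", None)]                    # or keep forwarding locally,
                                                   # depending on the fail mode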
Using aggregated flow entries, you can design your network in much the same fashion as you design it today--with subnets talking to subnets and other many-to-many, many-to-one and one-to-many designs. An aggregated flow says "this group of stuff talks to that group of stuff" using IP addresses, TCP/UDP ports or whatever your controller supports. If a new flow enters the switch and it falls within an aggregated flow, then it doesn't need to go to the controller. Thus, the loss of a controller may not be that disruptive--in the short term, at least.
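Here is a small, standard-library-only Python sketch of that aggregation idea (the rule format is illustrative, not any controller's actual data model): one subnet-to-subnet entry absorbs a brand-new flow without a trip to the controller.

    # Hypothetical aggregated flow entry: one rule covers every host-to-host
    # flow between two subnets, so a new flow inside that range is a table hit.
    import ipaddress

    aggregate_entry = {
        "src": ipaddress.ip_network("10.1.0.0/16"),
        "dst": ipaddress.ip_network("10.2.0.0/16"),
        "actions": [("output", 12)],
    }

    def covered_by_aggregate(entry, src_ip, dst_ip):
        return (ipaddress.ip_address(src_ip) in entry["src"] and
                ipaddress.ip_address(dst_ip) in entry["dst"])

    # A flow the switch has never seen still matches the aggregate entry,
    # so it is forwarded locally and never reaches the controller.
    print(covered_by_aggregate(aggregate_entry, "10.1.42.7", "10.2.9.1"))  # True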
The second objection is that the OpenFlow controller becomes a single point of attack. If an attacker can get control of the controller, then he or she has the keys to the kingdom. Well, there is no denying that in any controller-based network, guarding the controller is paramount. But that is true for any critical piece of management software, such as network management systems, firewall management systems, VPN systems, hypervisor management systems ... you get the picture. You already have these critical systems to protect. An OpenFlow controller is no different. Put the management systems on an isolated network segment or virtual LAN (VLAN), make sure the controllers are hardened, monitor activity, restrict access to them--all the stuff that you do now.
The other security objection is availability. If an attacker can launch a denial of service (DoS) attack against the controller, then he or she can take out the network. Fair enough. I left this objection for last, because DoS mitigation, to a large extent, is addressed by the same principles of good controller design and existing anti-DoS methods. A good controller design doesn't have every new flow going to the controller for disposition--only flows that are unknown.
I think there is some research needed, but I don't think DoS is going to be that big of a risk. If an attacker floods the network with new flows that are guaranteed to miss, such as flows with spoofed off-net source IP addresses, then the controller has to deal with them. Send enough of those flows and you have a DoS. But in a well-designed network, you already know what your address ranges are, so traffic from off-net sources can be black-holed. Routers have had that capability--blocking source IP addresses that are off-net--for years and years. If memory serves, on Cisco's IOS, it's a single global command.
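As a rough Python sketch of that black-holing idea (illustrative only, not a feature of any particular controller or router), a pre-filter can drop anything whose source address falls outside the prefixes you know are on your network, so spoofed flows never even generate a table miss for the controller to process.

    # Hypothetical anti-spoofing pre-filter: packets whose source address is
    # not inside a known on-net prefix are dropped before they can cause a
    # table miss and a trip to the controller.
    import ipaddress

    ON_NET_PREFIXES = [ipaddress.ip_network(p)
                       for p in ("10.0.0.0/8", "192.168.0.0/16")]

    def source_is_on_net(src_ip):
        addr = ipaddress.ip_address(src_ip)
        return any(addr in prefix for prefix in ON_NET_PREFIXES)

    def admit_to_miss_path(src_ip):
        """Only let plausibly legitimate sources reach the controller."""
        return source_is_on_net(src_ip)

    print(admit_to_miss_path("10.3.4.5"))     # True: could be a real new flow
    print(admit_to_miss_path("203.0.113.9"))  # False: spoofed off-net source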
You shouldn't ignore single points of failure and single points of attack in any product or system, and you shouldn't acquire or use a critical system that doesn't offer high availability or quick recovery options. But you need to take these conditions in the context of the criticality of the system and the other systems that can support it.