Q&A: HP ProCurve CTO Paul Congdon

October 19, 2009


Our interview takes a deep dive into networking, with Paul Congdon, HP ProCurve's chief technology officer. Congdon not only opines on managing network sprawl, but he's playing a big part in the solution. As the vice chairman of the IEEE 802.1 working group, Congdon is hard at work on the data center bridging standard, which seeks to make Ethernet the single converged fabric in the data center.

In our talk, Congdon explains why better management tools are needed to deal with the literally millions of server instances--physical and virtual--now on many networks. He also sheds light on his advocacy for a distributed approach to networking architectures, which pushes more of the intelligent management and aggregation decision-making out to the edge. This stands in contrast to Cisco's more centralized philosophy.

Finally, we talk about how many enterprise data centers are likely to morph into internal cloud providers, or possibly even something akin to public hot-spot providers to their employees.

Our chat was longer than many other discussions I've conducted; this enabled us to dive deeply into the dense -- and timely -- subject matter. Accordingly, I've abandoned my usual practice of cutting the raw interview way down. Instead, I've left it nearly at full length. I think Paul's deep knowledge and passion come through better this way. I hope you'll stick with it.

NetworkComputing: What's the key issue in networking today?

Paul Congdon: How we're going to manage the sprawl that's taking place--the fact that we have thousands and thousands of servers, whether they're physical or virtual, and how we manage this new environment where the network and the server are intimately related.

You're very well aware of the challenges behind VMotion [live migration of virtual machines] and mobility within the data center. That's a huge value to customers -- being able to move workloads around and being able to optimize performance and power efficiency. Live migration is a really valuable tool, but it puts some challenges on network design and topology.

The big thing that we're all trying to solve is how to scale Layer 2 networks [i.e., the data link layer, where addresses are flat] out to be really large and flat, so that we can better manage this mobility. I don't know if this is the initial knee-jerk reaction to solve the problem, or if this is going to be the long-term answer, but having large Layer 2 networks is going to make things simple, because that's how administrators are used to dealing with their virtual networks and their switches, and that's how they're deploying their applications right now.

Getting Layer 2 to scale out is not necessarily as easy as it sounds, especially at the numbers we're talking about -- hundreds of thousands of servers. And if you have VMs running, you're talking millions of server instances.
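[To put rough numbers on that scale, here's a back-of-the-envelope sketch in Python. The server counts, consolidation ratio, and MAC-table size are illustrative assumptions, not figures from the interview. -- ed.]

# Back-of-the-envelope: MAC learning pressure in one flat Layer 2 domain.
# All numbers are illustrative assumptions, not figures from the interview.
physical_servers = 100_000   # "hundreds of thousands of servers"
vms_per_server = 20          # assumed consolidation ratio
total_macs = physical_servers * (1 + vms_per_server)  # one MAC per host plus one per VM
print(f"MAC addresses visible in the L2 domain: {total_macs:,}")

# A typical top-of-rack switch of the era might hold on the order of
# 64K MAC table entries (assumed), so a switch that must learn the whole
# domain is far past capacity.
typical_table = 64_000
print(f"Overflow factor vs. a {typical_table:,}-entry table: {total_macs / typical_table:.0f}x")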

NetworkComputing: Isn't part of the difficulty the fact that people don't have good visibility into their networks now, and things are getting vastly more complex?

Congdon: Right. We're definitely pushing the envelope with the toolsets we have today. Visibility into what's going on is really difficult to obtain. That's one of the drivers behind this virtual-edge bridging that we've been working on, which is to enable better visibility into what's going on in the servers -- for example, by using sFlow to give you that visibility, or by embedding it into the switches themselves.
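[As a rough illustration of the sFlow-based visibility Congdon mentions, this minimal Python listener just counts sFlow datagrams per exporting agent. sFlow agents export to UDP port 6343 by convention; a real collector would decode the flow samples, which this sketch deliberately skips. -- ed.]

# Minimal sketch of an sFlow listener: count datagrams per exporting agent.
# Decoding the samples themselves (per the sFlow v5 spec) is omitted here.
import socket
from collections import Counter

SFLOW_PORT = 6343  # conventional sFlow export port
counts = Counter()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", SFLOW_PORT))
print(f"Listening for sFlow datagrams on UDP {SFLOW_PORT} ...")

while True:
    data, (agent_ip, _port) = sock.recvfrom(65535)
    counts[agent_ip] += 1
    print(f"{agent_ip}: {counts[agent_ip]} datagrams (last was {len(data)} bytes)")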

Then there's the scale of these tools. We're literally talking about millions of servers. The desire would be to put this all on one big VLAN, and then everything could move where it wants.

Another driver for why we want these big, flat Layer 2 networks is the convergence onto Ethernet -- Ethernet taking over protocols that were classically built on networks that were "Layer 2-ish," if you will; that is, non-routable, such as Fibre Channel (now arriving as Fibre Channel over Ethernet, or FCoE) and InfiniBand.

The long-term vision is, we want to get all this stuff running over Ethernet. The easiest way to scale to thousands of servers is to make the network big and flat. But big and flat is often a challenge. There are a number of things about Ethernet that prohibit that level of scaling. The traditional topology management protocols, like spanning tree, tend to not allow you to use all of the links. So if you have a big, flat network, that means we might not have the cross-sectional bandwidth we want for all of the nodes. So we need new topology management schemes.
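[The arithmetic behind that point is simple enough to show. The sketch below, with assumed link counts and speeds, compares the uplink capacity a classic spanning tree leaves usable against what a multipath scheme could use. -- ed.]

# Why spanning tree limits cross-sectional bandwidth: a toy calculation.
# Link counts and speeds are assumptions for illustration.
uplinks_per_switch = 4
uplink_speed_gbps = 10

# Classic spanning tree keeps one loop-free path active and blocks the rest.
stp_usable = 1 * uplink_speed_gbps
# A multipath scheme (link aggregation, or newer L2 multipathing work)
# can use every uplink at once.
multipath_usable = uplinks_per_switch * uplink_speed_gbps

print(f"Usable uplink capacity with spanning tree: {stp_usable} Gb/s")
print(f"Usable uplink capacity with multipathing:  {multipath_usable} Gb/s")
print(f"Capacity left idle by spanning tree: {1 - stp_usable / multipath_usable:.0%}")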

The plug-and-play nature of Ethernet is what made it work really well. If you don't know where someone is, you flood the packet or broadcast to find them. Well, if you have 3 million machines all chattering to figure out how to find an IP address, that can be pretty dramatic. So broadcast control becomes a challenge; we have to figure out how to make that work.
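[Again, the scale is easy to sketch. With assumed per-host ARP rates, this estimates the broadcast load every host and link in a single flat domain would have to carry. -- ed.]

# Rough estimate of ARP broadcast load in one giant broadcast domain.
# All numbers are illustrative assumptions.
hosts = 3_000_000            # "3 million" endpoints in one domain
arps_per_host_per_sec = 0.1  # assume one ARP request every 10 seconds per host
arp_frame_bytes = 64         # minimum-size Ethernet frame

broadcast_pps = hosts * arps_per_host_per_sec
broadcast_mbps = broadcast_pps * arp_frame_bytes * 8 / 1e6

print(f"Broadcast frames every host must receive and inspect: {broadcast_pps:,.0f}/s")
print(f"Bandwidth consumed on every link by ARP alone: {broadcast_mbps:,.0f} Mb/s")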

NetworkComputing: Isn't one of the challenges the fact that, in increasingly virtualized environments, there are different bandwidth demands, depending on whether you're talking about interprocessor communication, processor to memory, or out to storage?

Congdon: Absolutely. That's another dimension. Each application may have different requirements, and then certain things like storage and clustering-type applications have different bandwidth and latency requirements. So now we have more constraints on how we try to meet the needs of those applications.

In the past, you could just throw bandwidth at these applications. That's always the easiest way to do it, but it's fairly expensive to make the jump to 100Gb Ethernet in the data center. We'll get there; we'll continue to drive the prices down. But there's a point in time where that higher-speed link is not necessarily cheaper than a bunch of lower-speed links aggregated or distributed in a certain way.
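[The trade-off Congdon describes is easy to see in cost-per-gigabit terms. The port prices below are placeholders purely for illustration, not 2009 list prices. -- ed.]

# Toy comparison: one fast link vs. a bundle of aggregated slower links.
# Prices are placeholder assumptions, not real list prices.
price_100g_port = 20_000   # assumed cost of one 100 Gb/s port
price_10g_port = 500       # assumed cost of one 10 Gb/s port

cost_per_gbps_single = price_100g_port / 100
cost_per_gbps_bundle = (10 * price_10g_port) / (10 * 10)

print(f"$ per Gb/s, single 100G link:    {cost_per_gbps_single:.0f}")
print(f"$ per Gb/s, 10 x 10G aggregated: {cost_per_gbps_bundle:.0f}")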

This means that, today, if I have to meet QoS requirements, bandwidth requirements and latency requirements, I have more of a challenge as to how I'm going to allocate that bandwidth to manage it.

NetworkComputing: Is there any tension between the desire of vendors to become the sole solution and everyone working and playing well together to develop interoperability standards?

Congdon: We have this tension right now, yes. We've spent the last 20 years decomposing that mainframe into a bunch of bits and parts, and now it's a bunch of virtual parts.

In the process of doing that, we've given customers a lot of choice about how they can deploy things. But the sprawl from that decomposition has created a management challenge. So we're trying to collapse back this disaggregated system into something that customers can manage at a higher level. It'll still be modular and virtualized. But it'll be a bigger building block.

We're competing for a new system, which is the data-center element, or whatever you want to call it. Is it a rack, is it a pod? Well, it's a disaggregated system that I now manage as more of a unit. We're fiercely competing to be the interface to customers, to provide that, and we need to do that in a way that's still going to give them choice and flexibility.

NetworkComputing: So that's what you're working on in Data Center Bridging, the IEEE task force effort aimed at a single converged fabric?

Congdon: Yes, that's absolutely one of the key motivations here. We recognize there's an ecosystem of people that need to be part of the solution: hypervisors, NIC vendors, switch vendors, management software, and of course the storage and compute guys. All of these people are putting the system together, and if we're going to have a chance of building something that gives customers more choice and flexibility, that's got to be based on open standards.

Technically speaking, I fundamentally believe there are tradeoffs customers make between performance and functionality all the time. With certain applications, you may want to go for the highest performance and the lowest cost. Whereas there may be other applications that require more policy enforcement, network control, and traffic-inspection visibility, and you want to have that traffic pass through intelligent network elements.

As a customer, I believe I'm going to want to have both of those capabilities available to me. Our position in the Edge Virtual Bridging (EVB) group is that this distributed architecture gives you the best of both worlds. That's as compared to the alternative, which is a unified-computing-type system, which is very centralized, proprietary, and requires all traffic to traverse into the core of the network. It doesn't really let you trade off between features and performance.

NetworkComputing: Tell me more about the Edge Virtual Bridging group.

Congdon: EVB is Edge Virtual Bridging. It's really all about the architecture, about how servers--whether they're virtual or physical--connect to the network edge. This is where the convergence of computing and networking is occurring.

Inside the hypervisor, there's a switch. But how does that switch interplay with the external switches? How does it allow you to create a consistent environment for those virtual machines, across the data center, with the external network? There's a lot of new protocol, forwarding behavior, and component definition that needs to occur so that we can better include that piece.
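[The heart of the EVB debate is where VM-to-VM frames get switched. This conceptual Python sketch contrasts the classic hypervisor vSwitch with a VEPA-style "hairpin" through the adjacent physical switch; it is not any product's actual forwarding code. -- ed.]

# Conceptual sketch of two edge-forwarding models discussed in EVB.
def vswitch_forward(same_host: bool) -> str:
    """Classic hypervisor vSwitch: VM-to-VM traffic on the same server
    never leaves the box, so the external switch can't see or police it."""
    if same_host:
        return "switched inside the hypervisor (invisible to the network)"
    return "sent out to the external switch"

def vepa_forward(same_host: bool) -> str:
    """VEPA-style edge: every frame goes to the adjacent physical bridge,
    which 'hairpins' local traffic back on the same port, so the external
    switch sees all flows and can apply ACLs, QoS, and sFlow sampling."""
    if same_host:
        return "sent to the external switch and hairpinned back"
    return "sent out to the external switch"

for name, forward in (("vSwitch", vswitch_forward), ("VEPA", vepa_forward)):
    print(f"{name}: VM-to-VM on the same host -> {forward(True)}")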

In September, we agreed to start two new working groups. Along with EVB, the other is called Port Extension. Port Extension is a Cisco extension to EVB, so we found this cooperative ground, where the competing architectures can build off of common technology. By mid-November, we expect to have these two new standards tracks under way.

That's within the IEEE. Meanwhile, offline, every week, there's a group of industry players -- it's pretty much everyone you can imagine -- on phone calls where we're trying to come to a common vision and an alignment on how these things will work. This feeds into the standards process.

NetworkComputing: Is the complexity for a user the reason we're seeing tighter couplings between networking and server providers?

Congdon: The important factor is, there's a convergence that's going on. If you look back in history, every time there's something interesting that's gone on, it has to do with some form of convergence -- for example, voice over IP or storage over Ethernet. Now we're in this whole virtualization-compute-network convergence.

The network is clearly not as standardized as the server component.

As for the storage piece, how well does a Brocade switch talk to a Cisco switch? At the edges, they talk to initiators and targets well, but switch-to-switch? The Fibre Channel world has never been too great with its interoperability. There's always been a fair amount of lock-in. What we're seeing with FCoE is kind of the same thing, just with the physical layer changed out.

NetworkComputing: Talk about the security challenges. People are very concerned these days about both the cloud and virtualization.

Congdon: There are two things that come to mind when I think of security and virtualization. One is the new set of security problems that exist because of the nature of virtualization. Again, these have a lot to do with the sprawl--the fact that there are thousands of operating systems that need to be patched. Sometimes those machines are offline, so they're not there for the scanning tools. Certain security tools don't work as well as they used to. Things are moving around. I have mobility, so I need to move my firewall rules. My network configuration with the virtual machines is moving around.

The other thing that comes to mind is related to provisioning and mobility within the network--the fact that we want to be able to dynamically configure the network. In order to manage millions of servers, I can't have guys going around typing into the command-line interface every time a virtual machine needs to migrate somewhere else. So I need more automation at the network edge.
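[What that edge automation might look like, in miniature: when a VM migrates, its network profile follows it to the destination switch port. Every name below (the profile fields, apply_profile, the migration hook) is hypothetical, invented for illustration rather than drawn from any ProCurve API. -- ed.]

# Hedged sketch: re-apply a VM's network profile after live migration,
# instead of having someone retype CLI. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class NetworkProfile:
    vlan: int
    acl: str
    rate_limit_mbps: int

# Hypothetical profile store keyed by VM identity.
PROFILES = {"vm-web-01": NetworkProfile(vlan=120, acl="web-tier", rate_limit_mbps=500)}

def apply_profile(switch: str, port: str, profile: NetworkProfile) -> None:
    # In practice this would be an SNMP set, a NETCONF call, or a vendor
    # API; this sketch just prints what would be configured.
    print(f"{switch} {port}: vlan {profile.vlan}, acl {profile.acl}, "
          f"rate-limit {profile.rate_limit_mbps} Mb/s")

def on_vm_migrated(vm: str, dst_switch: str, dst_port: str) -> None:
    """Hypothetical hook called by the hypervisor after live migration."""
    apply_profile(dst_switch, dst_port, PROFILES[vm])

on_vm_migrated("vm-web-01", dst_switch="tor-rack7-a", dst_port="1/0/24")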

This has been ProCurve's adaptive edge architecture for years, but now it's being applied in the data center; it's even more important there. It's a more contained environment, so I think it's a little bit easier for us to deploy it.

To summarize, there are new security challenges that virtual machines bring that we have to address. And then there is the ability to apply our security framework for mobile users and for dynamic network configuration that's very valid now in the data center, as well.

NetworkComputing: What does cloud do, as companies go to hybrid environments where they're running some stuff on premise and the rest in the cloud? Do they become less concerned about network architectures because part of it is not their problem anymore?

Congdon: Well, certainly for an enterprise customer who's going to use cloud services, it means that he changes his focus from his own network architecture design to managing the SLA from his provider. But you need to focus on how you integrate cloud into your enterprise network.

So your focus is on how you bring cloud into your enterprise architecture, and that actually leads people down another path that we believe is a trend, which is the internal cloud. We think enterprise IT organizations will increasingly deploy their own services the way a cloud provider does: they'll look like a cloud to their end users. That makes it even easier for them to dynamically move those applications and services off to a provider, or bring them back in-house, depending on what's most cost effective.

NetworkComputing: So you're saying that the enterprise IT org, instead of playing its traditional role, will become the internal cloud provider to its users?

Congdon: Yes, exactly. I believe that's a trend IT organizations will begin to adopt. We get these questions a lot when I talk to customers: 'Should I just make my network open, just like it's the Internet? Should I run VPN all the time, everywhere, so that my workers don't have to do anything different -- so that when they come to the office, it's as if they're connecting to a hot spot?'

There is a chance that the internal LAN itself -- which, by the way, I think will be the last to fall here -- begins to look like the public Internet. You secure access to resources, but you allow guests to come onto the network.

There are a number of benefits to doing this. It simplifies end-station management. It might increase the amount of VPN-type technologies or security technologies that you want to put in your enterprise LAN, but it could reduce the actual, individual management of people using the LAN.

NetworkComputing: Wouldn't that be the ultimate commoditization of the network?

Congdon: Well, it positions you in a place to do the outsourcing again. Imagine an infrastructure-as-a-service cloud provider. If you think about telecommuting, you're almost there already. You're using the Internet--the public pipe--to get back to work.

The real turning point is going to be desktop client virtualization. There's going to be a day when IT organizations say, I'm not even issuing you a laptop anymore. All I'm going to issue you is a virtual machine, and this VM contains all of our corporate applications. It contains a VPN, and I can manage that VM very well, and secure it. And you can move it around. Oh, by the way, I'm not buying you a laptop anymore, Mr. Employee -- you have to go buy your own. You can do whatever you want with your personal stuff, but this VM is your work persona.

NetworkComputing: I'm not sure that would be a bad thing. You've talked about customers using their network as a strategic asset. But doesn't everyone need the same things -- maximum bandwidth and throughput?

Congdon: Well, there are those fundamental tenets--maximum bandwidth and throughput. But things like flexibility, reprovisioning, and adaptability to new applications are also important.

There are a number of places where people can look at their network as a strategic asset and as a foundation for their IT systems. The final piece would be manageability. If you're spending all of your IT budget on configuration, command-line-interface scripts, and training your guys to be certified network engineers just so they can figure out how to turn on the power, then all that money is going to waste when you could be innovating instead.

The data center is an interesting proof point for this. Look at Google and Amazon: if you ask those guys whether they consider their network architecture a strategic asset, I think they would absolutely say yes. So much so that they don't even like to tell you what it is.

NetworkComputing: What do you see as the three burning issues for the next 12 to 36 months?

Congdon: This convergence is a big one -- getting your network up to the point where it can support these new applications. I think people are going to be grappling with: is my network ready to support FCoE? Is it ready to get rid of InfiniBand?

Then there's the integration of management systems. My server, network, and storage guys are all going to have to figure out how to work together. They've got to work off of some common toolsets. I think the management tools for those different disciplines are going to be a question customers have to grapple with.

There's always the next level of bandwidth. When do I go to 10-Gig servers, or when do I need to go to the next higher performance, like 100-Gb? When does the cost give me the right return on my investment?

The last area would be managing the sprawl -- the scalability of Layer 2. What's the new architecture for my data center backbone in order to give me the kind of scalability that I'm going to want? Are spanning trees going to do it for me? Do I need to go Layer 3, closer to the edge? At some point people are going to be looking at upgrading their backbones, as we increase the performance out at the edge, and as that in turn puts a little more pressure on the backbone.

NetworkComputing: As we close, let's talk about ProCurve.

Congdon: There's an architectural message, which is the adaptive edge architecture. Fundamentally, what ProCurve is all about is bringing complex solutions into the mainstream. Our value proposition to customers has not changed in many years. We're not about being the cheap solution, but we are about making the advanced solutions affordable.

The change we're in right now is a very powerful one. We are a critical component provider to the rest of HP solutions. We're in that position because we earned it; it wasn't because of a de facto 'I have to use an HP piece in my solution.' I spend quite a bit of my time working with other parts of the company internally and driving these solutions forward.

One of the key things about HP, especially Enterprise Network Services, is we have a lot of ways we reach customers and go to market. We're allowing customers to choose how they want their IT services delivered. Some people buy the products and manage them themselves. Some people have outsourced to EDS or service providers. [EDS was acquired by HP and is now HP Enterprise Services -- ed.] Some people go for cloud. Everyone's all along the service spectrum.
