• 06/26/2014
    3:00 AM

Network Design Decisions Go Both Ways

Network design should reflect business goals. But it's also important to anticipate how your technology decisions will affect the business moving forward.

Any good network designer knows that design is driven by business. Before starting a network design, you gather business-related information such as locations, scale of the network, internal and external users, how the users will communicate with one another, and how the traffic will flow. But a great network designer should also keep in mind that design decisions may also affect the business -- in the future, if not today.

One of my previous articles on MPLS traffic engineering drew a reader comment asking which questions we should put to service providers before choosing an MPLS service. I gave the classic answers from the convergence, management, security, and network control points of view.

But another consideration is also important, and it will help explain how design decisions can affect business.

If you choose an MPLS service, the service provider can give you Layer 2 or Layer 3 connectivity options. This means you will either interact with the service provider at Layer 2 over a technology like Ethernet, HDLC, or ATM, or you will be connected through IP and run a routing protocol at Layer 3. Though many service providers offer only static routing and BGP, interior gateway protocols can also be supported at the PE-CE edge.
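For concreteness, here's a minimal sketch of the Layer 3 handoff, assuming Cisco IOS-style syntax on the customer edge (CE) router; the interface, addresses, and AS numbers are made-up examples:

```
! Hypothetical CE-side configuration: the CE peers with the
! provider's PE over eBGP at Layer 3.
interface GigabitEthernet0/0
 description Link to service provider PE
 ip address 192.0.2.2 255.255.255.252
!
router bgp 65001                       ! customer AS (example value)
 neighbor 192.0.2.1 remote-as 64512    ! provider PE AS (example value)
 network 10.1.0.0 mask 255.255.0.0     ! advertise the site prefix
```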

Only one Layer 3-based IP MPLS VPN connection can be set up on each physical or logical connection. If you need more for redundancy or another purpose (such as multi-tenancy), you will need to pay more.

Redundancy always introduces complexity, at least for IP and routing protocols. For example, say you need a connection between point A and point B. Connecting via a single link is the most direct and least complex path, but it is not redundant. Two connections might be ideal for redundancy, but they also increase network complexity: both the configuration burden and the convergence time of the system grow.

If MPLS Layer 3 VPN connectivity is chosen and the business needs multi-tenancy, perhaps for unit-based separation, partner connections, or private/public cloud, you will require more and more physical or logical connections from the service provider. Alternatively, you could use Carrier Supporting Carrier (CSC) technology. This is not a common option, and it requires additional resources for staff training, network management, and MPLS tool support.

So a more common alternative method is creating overlay tunnels. You can do this with basic GRE tunnels or DMVPN (multipoint GRE tunnels), which may be more scalable. These overlay tunnels will not provide segmentation, however, so you will need to build VRF lite or more scalable MPLS VPN on top of the overlays. Here's where your complexity really increases based on design choice.
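As a rough illustration of that overlay approach, here is a sketch in Cisco IOS-style syntax of a GRE tunnel carrying one tenant inside a VRF; the names and addresses are invented for the example:

```
! Hypothetical GRE-over-WAN overlay with VRF lite for segmentation.
ip vrf TENANT-A
 rd 65001:10                           ! example route distinguisher
!
interface Tunnel100
 ip vrf forwarding TENANT-A            ! keep this tenant's traffic separate
 ip address 172.16.100.1 255.255.255.252
 tunnel source GigabitEthernet0/0
 tunnel destination 203.0.113.10       ! remote site's WAN address (example)
```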

You may be thinking, "Why is network complexity so important that it can affect the overall business?" The answer is simple. When you increase complexity, you expect many protocols and systems to interact with one another. Many things need to be configured, monitored, and troubleshot. If one system breaks, it usually affects many others, causing expensive outages and downtime.

Let me give an example. If the OSPF routing protocol breaks, it affects LDP, RSVP, BGP, and so on. Before the other protocols will work properly, you must wait for OSPF to converge. If one protocol converges faster than the other, you may try to fix the resulting black holes with IGP/BGP or IGP/MPLS synchronization. Solving one problem leads to another problem, which forces you to add another layer of complexity.
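To make the synchronization fix concrete, the IGP/LDP knob looks roughly like the sketch below in Cisco IOS-style syntax (a sketch, not a full configuration; keywords vary by release):

```
! Hypothetical IGP/LDP synchronization sketch: OSPF advertises maximum
! metric on a link until LDP has converged there, so labeled traffic
! is not black-holed while one protocol catches up with the other.
router ospf 1
 mpls ldp sync
!
mpls ldp igp sync holddown 10000       ! cap the wait at 10 seconds (ms)
```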

If you had focused on the need for segmentation to provide multi-tenancy when initially selecting your MPLS VPN, you could have simplified your situation greatly. Scalable multi-tenant designs can be achieved without too much complexity.

Don't forget that, beyond best-practices and design principles, there is a relationship between technology design and business that goes both ways.


Cloud decisions

This post is spot on, especially for those who are looking to do a lot more with the cloud in the future or want to become their own private cloud-based service provider. (For instance, if you want to provide something like IT-as-a-Service, you'd better be able to provide it across the board today, as well as to other business units, headquarters, and even partners in the future.) Saw this great interactive presentation about IT transformation and what needs to go into the decision-making process:


--KB

Re: Cloud decisions


I am glad you liked it. The key idea is definitely that: your technology choice shouldn't limit your design. And while business drives network design, your design also affects the business. If you design for the cloud, even if you don't use the capability from day one, it gives you multi-tenancy, agility, flexibility, and on-demand infrastructure.


Re: Cloud decisions

Orhan, agreed, the nature of business is that of change. There was a time when everyone was aiming for a hierarchical organization and then, came the trend to flatten the organization. Next came the need to be closer to the customer (social media), and so on. Predicting the future for a business is difficult, but the business that manages to adapt to industry changes in real-time will be the business that survives into the future.

Re: Cloud decisions

Good points overall, although I'd say we're a lot easier to convince than the financial stakeholders.

It seems as though convincing that group requires shifting the language away from IT and making it about insurance. Building smart now will insure the business against costly issues down the line.

Related Example

So, since we're talking about fun with complexity, I'll share an example I came across a while back in a network that wanted both high availability and fast convergence.

The network had core switches running OSPF and BGP. Timers for both had been lowered: the BGP timers were 5s (keepalive) and 15s (hold), and the OSPF hellos had been set to 1s intervals (OSPF hopefully reconverging before BGP notices, right?). The routers (Cisco) had also been configured with dual supervisors and Non-Stop Forwarding (NSF).

The problem was that when a failover event occurred and the standby supervisor took over, it sent out an OSPF NSF signal telling neighbors to hold on and not reset the peer. Unfortunately, with neighbors running 1s hellos, they had typically already declared the adjacency dead and downed the neighbor before the secondary Sup had a chance to send the NSF signal, causing exactly the massive recalculation the design was intended to avoid in the first place...

Re: Related Example

@jgherbert Agree. This is expected behavior. If you configure graceful restart (GR) for NSF to achieve non-stop forwarding, and at the same time tell the neighbors to find an alternate route and reconverge quickly, this is what happens.

The draft above can address how both should work together.

From the network design point of view, the rule of thumb is: configure BFD in the core, not NSF; configure NSF with the GR extension of the routing protocol at the edge.

I won't even discuss lowering protocol timers, since it is not scalable, and you cannot achieve sub-150 ms detection with routing protocol timers anyway. Whenever you can, use BFD for fast failure detection.
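For reference, BFD on a core link might look like the sketch below in Cisco IOS-style syntax (example values; check your platform's supported minimum intervals):

```
! Hypothetical BFD sketch: 50 ms transmit/receive intervals with a
! multiplier of 3 give roughly 150 ms failure detection, with OSPF
! registered as a BFD client instead of tuning its own hello timers.
interface GigabitEthernet0/1
 bfd interval 50 min_rx 50 multiplier 3
 ip ospf bfd
```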



Re: Related Example

@OrhanErgun: "This is expected behavior."

*laughs* Well, it's correct behavior for sure. Sadly it was evidently not expected by the client when it was designed and installed!

Re: Related Example

@jgherbert They need a good designer, obviously :). I see two problems there: first, forcing the device to do two opposite things; second, forcing the routing protocol and using its timers for fast failure detection. Maybe a third is not asking any designer at all?

Re: Related Example

@OrhanErgun: "They need a good designer, obviously :). I see two problems there: first, forcing the device to do two opposite things; second, forcing the routing protocol and using its timers for fast failure detection. Maybe a third is not asking any designer at all?"

Well, it's proof to me that blindly layering resiliency measures on top of one another doesn't make for a more resilient network. Sometimes the interactions between a variety of measures mean that you get a worse result than if you'd just left it alone ;-) It's another good reason to lab test heavily...

Re: Related Example

@jgherbert Agree. Particularly for your example: when you tune IGP timers, you need to worry about many other things. Since you mentioned interactions, if I didn't explain this I couldn't sleep :).

You need to worry about FHRP-to-IGP, IGP-to-MPLS, and IGP-to-BGP interactions as well. You mentioned OSPF; in the OSPF case, you may need to advertise maximum metric in your router LSA to avoid a black hole. Especially when you tune OSPF to converge faster (and even by default it can converge faster than BGP), a black hole can occur in a BGP core.
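The max-metric trick mentioned above is a one-liner in Cisco IOS-style syntax (a sketch; the keyword set varies by release):

```
! Hypothetical sketch: at startup, advertise maximum metric in the
! router LSA so neighbors avoid this router for transit traffic
! until BGP has converged, preventing the black hole described above.
router ospf 1
 max-metric router-lsa on-startup wait-for-bgp
```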

Or, for a BGP-free core design such as ISP, SP, or VPN SP designs, you want to check IGP/LDP synchronization; otherwise blackholing might occur due to a race condition...

Distributed systems rely heavily on protocol interactions; the failure of one protocol triggers another, and you always need to consider all of them together.

What should be done, IMO: if you will choose a protocol, at least choose one that hides the underlying mechanisms as much as possible, so configuration complexity can be avoided and the interactions may not be a concern.

OTV is a good example of the above. It is actually IS-IS over Ethernet over MPLS over GRE over IP over Ethernet :) but you configure a tunnel and a couple of interfaces and achieve MAC-in-IP routing. You could argue, then: what about visibility? Troubleshooting? But please don't, since this is already a long comment.


Re: Related Example

@PMITCHELLNA Sort of agree. SPB, like TRILL and FabricPath, can give you Layer 2 multipath capability. If you were talking about basic SPB, I would not agree, but since you mentioned Avaya's SPB, which uses a modified underlay and overlay control plane and modified multicast, I sort of agree. For large-scale bridging, and for those who do not want to change their existing gear, SPB is a good choice.

To explain why I said "sort of," and which other network overlays also give you Layer 2 multipath capability and STP-free operation, let me write a blog post for Network Computing; I would like to see your comment there as well. Thanks.

Network simplicity using Shortest Path Bridging

I've always held the view that a network design should be as simple as possible. I read several of the examples in this post and many of those issues can be simplified by using Shortest Path Bridging from Avaya.

The entire Winter Olympics network infrastructure was run on one protocol: SPB. It handled everything, from IPTV to wireless access, on one simple network. Multi-tenancy? No problem. IP multicast? Simplified.

This video explains it all.

Re: Network simplicity using Shortest Path Bridging


*lol* Small world... funny that you chose a video that I'm in :-) It's a great demo, and Paul Unbehagen is the boss when it comes to SPB. Especially with Multicast, Avaya seem to have nailed that solution to the wall.

SPB, then, is a good solution, or could be in theory. My irritation with all the fabric technologies right now is the vendor lock-in. Nobody seems to be sticking to standards, which is fine if you're in a homogeneous, single-vendor network environment and can use that enhanced feature set, but less great otherwise.



Can a network design ever reflect business goals?

Dare I say it, reading through some of these comments makes the case for SDN. If there was ever a reminder of Scott Shenker's "The Future of Networking, and the Past of Protocols" presentation at the first ONS, this was it!

As stated in a previous post, simplification appears to be key to a flexible network infrastructure, which is not only what the market is demanding, but also the direction in which the industry is moving.

No disrespect to the author, but I'm not sure many of the protocols espoused in the article reflected the needs of a dynamic business environment, hence designing a network to reflect business goals seems a pretty tall order.