Route Optimizers: Mapping out the Best Route
Whether you choose a BGP-based optimizer or the simpler DNS device, you'll be on the right road to managing your multiple ISP connections.
December 5, 2003
Larger organizations will more likely go the BGP route because many are using the protocol already. The Radware and F5 products are designed for smaller organizations that don't want to deal with the difficulties and costs of peering with the Internet. Our recommendation: If you are running BGP and have trudged over the protocol's learning curve, your best bet is to use a BGP-based route optimizer. If you are primarily looking for redundancy or just don't want to get involved in BGP, a DNS-based product will fill the bill while doing a better job of optimizing users' experience on your internal network. The disadvantage of the DNS-based devices is that you will have to use NAT, which can add its own problems for some applications (see a rundown of NAT foibles).
No matter which type you choose, these products will improve the user experience only if one of your providers' networks has noticeably better performance. Another factor to consider is ensuring that your ISPs have diverse routes through the Internet. You need to perform due diligence up front, and this is where the BGP-based products can help: In our tests, they were best at providing detailed reports and summaries about the overall performance of each ISP. This information could be used to justify switching to a better-performing ISP; to recover money for failure to honor an SLA (service-level agreement); to negotiate a better deal with a current provider; or to audition a prospective ISP via the device's monitoring features. You'll be able to see what traffic would be switched on or off the provider's network based on performance. To get the optimal balance between performance and cost among multiple ISPs, a BGP-based optimizer is your best bet.
The biggest problem with BGP is that, by default, it does not necessarily route traffic down the best-performing path. Instead, BGP chooses routes based on the number of networks--as defined by ASNs (Autonomous System Numbers)--that will be traversed. This limitation can be overcome because BGP comes with a lot of knobs that you can fiddle with to tune it--if you have the chops. It would mean manual intervention from a highly skilled network engineer for every network that needs improvement.
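To make the gap between BGP's default choice and the best-performing path concrete, here is a minimal sketch. It models only the AS-path-length comparison described above; real BGP best-path selection weighs other attributes as well. The ISP names, AS numbers (drawn from the private-use range) and latencies are invented for illustration.

```python
# Hypothetical candidate routes to one destination prefix. The ISP names,
# AS numbers (private-use range) and latencies are invented for illustration.
routes = [
    {"isp": "ISP-A", "as_path": [64512, 64520], "latency_ms": 180.0},
    {"isp": "ISP-B", "as_path": [64513, 64530, 64520], "latency_ms": 45.0},
]


def shortest_as_path(candidates):
    """Mimic only the AS-path-length step of BGP route selection."""
    return min(candidates, key=lambda r: len(r["as_path"]))


def lowest_latency(candidates):
    """What a route optimizer aims for: the best measured performance."""
    return min(candidates, key=lambda r: r["latency_ms"])


bgp_choice = shortest_as_path(routes)
optimizer_choice = lowest_latency(routes)
print(bgp_choice["isp"], optimizer_choice["isp"])
```

In this made-up case the shorter AS path runs through the slower provider, which is exactly the situation a route optimizer is meant to catch.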
There's another problem that a route optimizer can fix: the feedback loop. Say customers in Chicago are experiencing dismal bandwidth on your site. It could be hours, or even days, before the problem is reported and addressed. By then you may have lost business. In our tests, all the products were able to beat this lag, rerouting traffic down a better path within a few minutes with no manual intervention, ensuring that the best-performing route is always utilized.
We'll admit that there are probably a few resourceful, BGP-literate network engineers who have put together slick Perl scripts to automate some of these tasks, but we wouldn't want to be the ones to make sense of the scripts if these gurus ever left the company. And, unless you have a BGP savant, it's unlikely that any home-grown utility will be as good as a route optimizer.
One disadvantage we found with the BGP-based route optimizers is that they were able to reroute traffic only as it left our private network, not on the return path. Although it is possible to manage both ends of a transaction with BGP-based devices, it takes a long time to propagate changes across the Internet. And, since it requires updates to all the Internet's routing tables every time you make a change, we don't recommend going down this road very often. As a result, BGP-based products are best suited for networks that are serving data to the greater Internet, such as Web servers, because they will optimize the path that most of the data will take. A BGP device may provide a slight improvement for those inside the network because their requests to external networks will be optimized, but their return data will not be--that is, unless the site with which they are communicating is using a similar product. Which brings us to one popular application for these products: site-to-site VPNs. By installing one route optimizer at each company location, the Internet can be used with much greater confidence to transport critical data reliably.
Multiple Addresses
DNS-based devices, on the other hand, impact both outgoing and return paths by default. In fact, they have to change both because different external addresses are used for incoming and outgoing traffic; these addresses are advertised only by the ISP that connects each corresponding link (see "The DNS Advantage"). Basically, there is an external interface that connects directly to each ISP. On the inside, these multiple external IP addresses are mapped to single addresses via NAT. It's like giving one computer multiple addresses to communicate on the Internet, and each address has its own ISP. The address associated with the best-performing ISP is the one that is used. Performance is determined by pings and other types of probes. The DNS approach does more to improve performance for those inside a network, pulling data from the Internet. For example, whatever interface a Web request leaves on, it will also return via the ISP that advertises the source IP address.
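The outbound half of this scheme can be sketched in a few lines. This is an illustration of the idea only, not any vendor's implementation: the link table, the probe results and the function names are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
# Hypothetical link table: each ISP link has its own external address.
# Addresses are from documentation ranges; probe RTTs are invented.
links = {
    "ISP-A": {"external_ip": "192.0.2.10", "probe_rtt_ms": 95.0},
    "ISP-B": {"external_ip": "198.51.100.10", "probe_rtt_ms": 40.0},
}


def pick_outbound(link_table):
    """Return (ISP name, external address) of the best-performing link."""
    best = min(link_table, key=lambda name: link_table[name]["probe_rtt_ms"])
    return best, link_table[best]["external_ip"]


def translate(internal_ip, link_table):
    """Source-NAT an internal host to the address owned by the chosen ISP.

    Because that address is advertised only by the chosen ISP, replies
    come back over the same link.
    """
    isp, external_ip = pick_outbound(link_table)
    return {"internal": internal_ip, "source": external_ip, "egress": isp}


print(translate("10.0.0.5", links))
```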
Optimizing connections that are initiated from the outside is a little trickier. If a client attempts to access an internal resource from the outside, the DNS response the route optimizer provides will direct it to the appropriate IP address, forcing the traffic through the corresponding ISP. For this to be effective, the route optimizer must act as the authoritative DNS server for the external IP addresses. A DNS TTL of 10 seconds is published, forcing the external client's local DNS server to ask constantly for an updated IP address from the device. On the plus side, this makes it possible to update constantly the IP address provided based on performance. However, there are disadvantages to the DNS approach: It depends on DNS servers honoring the lower TTL; it's dependent on clients not caching the DNS information; and it requires the use of NAT.
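The inbound logic can be sketched as follows, assuming the device keeps a per-link record of availability and probe round-trip time. The record layout, host name and addresses are hypothetical; only the 10-second TTL comes from the behavior described above.

```python
TTL_SECONDS = 10  # the low TTL described above

# Hypothetical per-link state; addresses are from documentation ranges.
links = {
    "ISP-A": {"external_ip": "192.0.2.10", "rtt_ms": 35.0, "up": True},
    "ISP-B": {"external_ip": "198.51.100.10", "rtt_ms": 60.0, "up": True},
}


def dns_answer(hostname, link_table):
    """Build the A-record answer an authoritative responder might return.

    Only links that are up are considered; the one with the lowest probe
    RTT wins. Returns None when no link is available.
    """
    usable = {name: l for name, l in link_table.items() if l["up"]}
    if not usable:
        return None
    best = min(usable, key=lambda name: usable[name]["rtt_ms"])
    return {
        "name": hostname,
        "type": "A",
        "address": usable[best]["external_ip"],
        "ttl": TTL_SECONDS,
    }


print(dns_answer("www.example.com", links))
links["ISP-A"]["up"] = False  # simulate a link failure
print(dns_answer("www.example.com", links))
```

The short TTL is what lets the second answer take effect quickly, provided the resolvers and clients in between actually honor it.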
Given the differences between the two types of products, we split the report card, so that the BGP-based optimizers were not rated against the DNS-based ones. All the products performed well; however, the Internap product was a clear winner in the BGP category, thanks to better reporting and ease of management. On the DNS side, the Radware and F5 products tied in our report-card ratings. Radware had more flexibility in its features, but the F5 device compensated with better pricing. However, we're awarding Radware our DNS Editor's Choice because, though price is important, we weight a strong feature set much more heavily.
Internap Flow Control Platform (FCP) Model 100
The FCP 100 had the best combination of features, functionality and management of all four products tested. It had excellent reporting, superior management options and it outperformed the others at crunching the vast amount of information gathered into meaningful reports.
We set the FCP to peer with the Foundry router that controlled the direction of all the egress traffic. One port on the FCP was plugged into a switch behind the router for this purpose. This port was also used to send pings and other probes across our various ISPs to help assess performance. The FCP had a separate port for CLI and Web-management access.
The FCP monitors all TCP flows in real time--making it unique among the devices we tested. It looked for retransmits and long RTTs (round-trip times) as indicators of performance problems, such as latency or packet drops. Data had to be fed through the device via a span port off of a router or switch. The advantage is that the FCP does not sit in the data path, avoiding the potential to become a point of failure. There is an additional port on the FCP dedicated to this function. This setup won't help UDP (User Datagram Protocol) traffic, however, because UDP is stateless and doesn't have the information in its headers that TCP has. For UDP to benefit, there would have to be previous traffic to the network that had used TCP. Still, the probes can monitor potential UDP performance.
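The kind of analysis this implies can be sketched as below. The article does not describe Internap's actual algorithm, so the flow records, the thresholds and the per-prefix grouping are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical thresholds; a real device would make these tunable.
MAX_RETRANSMIT_RATE = 0.02   # 2 percent of packets retransmitted
MAX_AVG_RTT_MS = 200.0

# Invented flow summaries, as might be derived from mirrored TCP traffic.
flows = [
    {"prefix": "203.0.113.0/24", "packets": 1000, "retransmits": 50, "rtt_ms": 120.0},
    {"prefix": "203.0.113.0/24", "packets": 500, "retransmits": 10, "rtt_ms": 140.0},
    {"prefix": "198.51.100.0/24", "packets": 800, "retransmits": 2, "rtt_ms": 60.0},
]


def troubled_prefixes(flow_records):
    """Return prefixes whose flows show heavy retransmits or long RTTs."""
    by_prefix = defaultdict(list)
    for flow in flow_records:
        by_prefix[flow["prefix"]].append(flow)

    flagged = []
    for prefix, group in by_prefix.items():
        packets = sum(f["packets"] for f in group)
        retransmits = sum(f["retransmits"] for f in group)
        rate = retransmits / packets if packets else 0.0
        avg_rtt = mean(f["rtt_ms"] for f in group)
        if rate > MAX_RETRANSMIT_RATE or avg_rtt > MAX_AVG_RTT_MS:
            flagged.append(prefix)
    return sorted(flagged)


print(troubled_prefixes(flows))
```

Nothing here applies to UDP, which carries no retransmit or handshake information to measure; that is the limitation noted above.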
Another thing to love about the FCP is that before it starts sending probes to test the paths, it does multiple traceroutes to determine the point of convergence for all of the paths involved. At some point, all paths through the Internet will converge before arriving at a single destination. Once the point of convergence is found, the FCP starts sending probes to that point, avoiding the part of the network over which it has no control. This makes for efficient performance analysis.
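One way to locate such a convergence point from traceroute output is to find where the paths become identical for the rest of the trip. The sketch below does that by walking backward from the destination; it is an assumption about how this could be done, not a description of the FCP's internals, and the hop addresses are invented.

```python
def convergence_point(paths):
    """Return the first hop of the longest tail shared by every path.

    Each path is a list of hop addresses ending at the same destination.
    Returns None if the paths share no common tail.
    """
    if not paths:
        return None
    shortest = min(len(p) for p in paths)
    point = None
    # Walk backward from the destination while all paths agree.
    for i in range(1, shortest + 1):
        hops = {p[-i] for p in paths}
        if len(hops) != 1:
            break
        point = hops.pop()
    return point


# Invented traceroute results toward one destination via two ISPs.
via_isp_a = ["192.0.2.1", "192.0.2.254", "203.0.113.1", "203.0.113.50"]
via_isp_b = ["198.51.100.1", "198.51.100.254", "198.51.100.77",
             "203.0.113.1", "203.0.113.50"]

print(convergence_point([via_isp_a, via_isp_b]))
```

Probing that hop rather than the final destination measures only the portion of each path that differs between providers.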
Internap took the information it gathered from the traceroutes one step further and made it into a useful diagnostic utility. By clicking on one of the prefixes that were monitored, we could draw a graphical map of a traceroute for each possible path. This picture made it very easy for us to see the different paths to our destinations as well as potential problems. It also showed how much diversity there was among the paths. For example, we could see clearly if one ISP was relying on another ISP for access or if an ISP was playing hot potato with our traffic.
With the FCP we could configure cost-based policies for our test traffic. For instance, we could set the device to monitor thresholds of utilization where an ISP would start charging additional usage costs. When traffic reached the preset level, the device would move traffic to a less costly link. In our testing, we could actually watch the traffic move as it reached the threshold. And talk about granular controls: We could set the device for multiple pricing levels so that it would switch back and forth in steps as the traffic increased and different pricing rules kicked in. The FCP was also very flexible in its ability to balance cost versus performance goals. For example, we could configure it to move traffic to the least expensive link, but only on the condition that it didn't cause a significant decrease in performance. And if performance problems did cause traffic to shift to an expensive link, the device would test the performance of some of the traffic on that link to see if it could be moved to the cheaper link without a performance penalty. The RouteScience PathControl product has this feature as well.
The FCP command line interface was very similar to Cisco's IOS, including commands for setting the configuration and viewing routes and BGP status. Anyone familiar with the CLI on a Cisco or Foundry router will feel at home here. Most device management and reporting could be done from the Web interface, which was well designed and rich with useful information. We did the initial configuration via the CLI, but just about everything could also be done using the Web wizard. The Web configuration and management functionality was very thorough, and more intuitive than the command line, but both were easier than the configuration utilities on the RouteScience device.
The FCP's reporting options were almost endless. We were treated to a plethora of report styles, with just about every conceivable summary view, as well as detailed views of all the data on network prefixes and AS numbers. Summary reports included average latency and packet loss for all providers, plus summaries of how much traffic was being directed to or from a particular provider at any given time. More detailed reports listed network prefixes that had experienced a path change, along with previous and current latency and packet loss. TopN reports by AS and prefix showed the destinations with the highest volumes of traffic. In addition, we could have reports e-mailed to us automatically and receive real-time views of network utilization for each provider in the form of a line graph.
Internap Flow Control Platform (FCP) Model 100. Internap Network Services, (404) 302-9700. www.internap.com
RouteScience PathControl 3350
The PathControl 3350 uses BGP peering to the local router that controls egress traffic, just like Internap's FCP. The device was able to issue various types of probes to test performance and also made use of a span port, but only to monitor traffic levels. The PathControl has another method for monitoring traffic: It watches the TCP handshake as a 1 KB GIF is retrieved from the box. To force a client to retrieve the GIF, a pointer has to be put in a Web page. This approach is likely more scalable than the FCP's method of looking at all TCP streams, because it monitors only the TCP handshake, once per Web page access. But it requires that Web pages be edited to point to the GIF, which resides on the PathControl device, and it works only for Web access.
When we tested the PathControl in March 2002 (see "Making a Science out of Routing on the Internet") GIF monitoring was the only method available to monitor traffic performance. Since then, RouteScience has added probes. The company also has added two new models, including the 3350 we tested, which comprises a pair of 2U boxes.
Like the FCP, the PathControl did a better job of optimizing cost versus performance than either of the DNS-based products. For example, we were able to set up cost and performance in a conditional mode, where the PathControl could be configured to favor the cheapest link, but only when it didn't hurt performance. It is also capable of juggling network prefixes from one path to another to get the best mix of cost and performance.
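The conditional mode described here amounts to a two-step rule: first rule out links that would hurt performance, then take the cheapest of what remains. A minimal sketch follows, with invented link data and a hypothetical `max_penalty_ms` parameter standing in for whatever tolerance the products actually expose.

```python
def choose_link(links, max_penalty_ms):
    """Favor the cheapest link, but only if it is not much slower.

    A link is acceptable when its latency is within max_penalty_ms of the
    best latency currently measured; the cheapest acceptable link wins.
    """
    best_latency = min(l["latency_ms"] for l in links)
    acceptable = [
        l for l in links if l["latency_ms"] - best_latency <= max_penalty_ms
    ]
    return min(acceptable, key=lambda l: l["cost_per_mbps"])


# Invented measurements: ISP-A is cheap, ISP-B is fast.
links = [
    {"isp": "ISP-A", "cost_per_mbps": 50.0, "latency_ms": 80.0},
    {"isp": "ISP-B", "cost_per_mbps": 120.0, "latency_ms": 60.0},
]

print(choose_link(links, max_penalty_ms=25.0)["isp"])  # cheap link is close enough

links[0]["latency_ms"] = 150.0  # the cheap link degrades
print(choose_link(links, max_penalty_ms=25.0)["isp"])  # traffic shifts to the fast link
```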
Also like the FCP, the PathControl has an IOS-style CLI. Even though this helped for basic commands, we found the configuration much more complex than its rivals. The initial configuration, as well as tasks like setting up cost policies, required long strings of statements in different configuration modes. Once we got used to it, it was tolerable, but we recommend that you plan on a long learning curve and an intimate acquaintance with the manual. We also found that some configuration errors were not apparent until we ran a medic utility. In addition, major changes required a warm boot of the system. There was no relief available from the Web-based GUI, because all configuration must be done from the CLI.
Although the documentation was thorough, some quick-start guides, checklists and examples of proper configurations would have been helpful. Web-based configuration options would greatly simplify matters as well. We also had some issues with the speed of the GUI: It was completely Java-based and took a long time to load, even using a 1.7-GHz laptop with a gigabyte of memory!
The PathControl was able to report on the routes it changed and the relative improvements in performance, and RouteScience included many other reports that monitored overall performance of our ISPs. In general, the reports were better than those of the DNS products, but not quite as good as those included with the FCP device. One highlight was a report that specifically showed what performance would be like if it were left completely to BGP. This is a nice way to be reminded that you are getting your money's worth. It's also useful for those evaluating the technology for the first time, who may not trust it just yet. In fact, both the FCP and the PathControl could be set up in monitoring mode with all the same reports, indicating what they would be doing to manipulate the routes. Once you're confident in their capabilities, complete control over the routes can be turned over to the devices.
RouteScience PathControl 3350. RouteScience, (800) 866-8176, (650) 548-3300. www.routescience.com
Radware LinkProof 3.81
The architecture of the LinkProof device was very similar to that of F5's Link Controller. Both use a combination of DNS and NAT to control the paths of traffic through the Internet. But the LinkProof device had more flexibility in the configuration of performance and cost policies. And though this isn't specifically reflected in the report card, we liked the upward and downward scalability of the LinkProof, which has five different models--from a small- or branch-office device to a high-end model with a 10 Gigabit Ethernet connection. In contrast, F5 currently offers only two models.
The LinkProof sat behind our ISP edge routers and tapped directly into our traffic stream. This let it easily redirect outgoing traffic and answer DNS requests to direct traffic back into the correct interface, based on the IP address that it provides. The device also made it easy to monitor utilization. The disadvantage of this method is that it adds another point of failure--if the device fails, all traffic stops. There is no passive pass-through mode; there is really no way to minimize this risk, because the device has to be in the path of traffic to manipulate it. Both Radware and F5 address this by providing a way to set up a second box for failover.
On a side note, the LinkProof takes advantage of its position in the network to offer IDS services, as does the F5 box, but we did not test its IDS capabilities.
The LinkProof provided policies to optimize cost and performance, but it did not give us as much flexibility as the BGP-based products. The only way to reconcile cost versus performance was to give the attributes different weightings that would be added together to determine the best link. Radware says it is working on a change to make this feature more effective and easier to control by using an if/then approach. The F5 box suffered from the same limitations. Radware does attempt to compensate in the meantime by providing the ability to set up exceptions based on network prefixes. For example, critical networks could be assigned performance as a priority, so that they would always use the best-performing link no matter what the cost.
The LinkProof can be managed from a CLI or with a provided Java application. The Java app, called Configware, graphically maps out the LinkProof and all its connections to edge routers, or "Next Hop Routers," including an icon representing each. An indicator changed color to indicate the current status of our links--red for down or green for up. From here we were able to run reports and do most configuration tasks. The LinkProof can provide real-time network-utilization reports for each link, something the F5 box cannot do. It also offers a report showing the Top 10 busiest networks and their performance. For more detail, the CLI offered us a table showing each network, its latency and the ISP link in use. We could build a number of useful reports from this information, but it required screen scraping via telnet.
We would have liked to have seen reports from the GUI summarizing packet loss and latency for our two ISPs. Radware sent us software that it said would summarize the performance of the ISPs as well as the Top 50 busiest networks. We didn't receive this software in time for inclusion in the review, but we were impressed with the company's responsiveness. All available reports can be run in real-time or historical mode. However, there was no way to store the reports on the device.
If you plan to use more than one LinkProof device, Radware has a proprietary protocol that communicates between the boxes and provides control over the traffic without relying on DNS. This would be a perfect setup for VPN locations with multiple ISPs.
LinkProof 3.81. Radware, (888) 234-5736, (201) 512-9771. www.radware.com
F5 Big-IP Link Controller 1000
F5 builds its route-optimization product on its Big-IP platform. Some past Big-IP Web load-balancing products have won awards from Network Computing. The version of the Big-IP Link Controller we tested for this review was set up for route optimization, but many other features that F5 continues to add to this platform can be turned on for an additional cost. A shop that is already using Big-IP for Web load balancing would be able to minimize its learning curve--and its vendor management woes--by sticking with the Big-IP for adding multiple ISPs without using BGP.
The Link Controller taps directly into the path of traffic between the internal network and the ISP, just like Radware's LinkProof. F5 also makes a higher-end model but doesn't have a low-cost, branch-office solution, as Radware does.
A built-in 16-port switch provided connectivity to our internal or external traffic via user-configurable VLANs. We used two ports for external connectivity to the ISP links on our test network, and one port to plug into the Ixia port emulating a Web server via its IxWeb application. We used 100-Mb links for all three connections. The device also came with a fiber Gigabit Ethernet connection. F5 says that the box can support about 1 Gb of data distributed among various ISP connections, but we did not verify this. The disadvantage of having the box in line with the data, besides being a potential point of failure, is that it could add latency if it gets bogged down.
The Link Controller let us shift both outgoing and incoming traffic based on availability, performance or utilization. Controlling outgoing traffic was easy because the box is inline; this also made it a simple matter to track utilization for load balancing or controlling bandwidth for cost. The box also controls incoming traffic, via DNS. Each link will be associated with a specific ISP and a corresponding external address that is routable over the Internet using its associated provider. Each address is mapped to a common internal Web server or server farm.
Every time a new request came in, the Link Controller provided its first response without checking for the optimal route, to save time. Then, it checked both our paths by sending a ping or similar probe in each direction and noting the time it took to get a response. The next time a request was made from the client's local DNS server, the Link Controller provided an external IP address that forced the traffic to the ISP associated with the particular address.
The Link Controller did not have a lot of flexibility when it came to optimizing cost versus performance--it was an either/or proposition. For example, we were unable to set up a policy directing traffic to the least costly link, unless the performance exceeded a certain threshold. Both cost and performance were lumped together to form a total score. As with the LinkProof, we could weight one more than the other, but it was not possible to set up any exceptions to the rule. F5 also has another product, called 3DNS, that can set up exceptions based on network prefix.
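The weighted-total approach used by both DNS-based devices can be illustrated with a short sketch. The scoring scale (0 to 1, higher is better), the weights and the link data are assumptions for the example; neither vendor's actual formula is documented in this review.

```python
def weighted_score(link, cost_weight, perf_weight):
    """Combine cost and performance into one total, as described above.

    Both inputs are assumed to be normalized to 0..1, higher meaning
    better (cheaper or faster).
    """
    return cost_weight * link["cost_score"] + perf_weight * link["perf_score"]


def best_link(links, cost_weight, perf_weight):
    """Pick the link with the highest combined score."""
    return max(links, key=lambda l: weighted_score(l, cost_weight, perf_weight))


# Invented scores: ISP-A is cheap but slow, ISP-B is pricey but fast.
links = [
    {"isp": "ISP-A", "cost_score": 0.9, "perf_score": 0.3},
    {"isp": "ISP-B", "cost_score": 0.4, "perf_score": 0.9},
]

print(best_link(links, cost_weight=1, perf_weight=1)["isp"])
print(best_link(links, cost_weight=3, perf_weight=1)["isp"])
```

Changing the weights moves the winner from one link to the other, but no choice of weights expresses a rule such as "use the cheapest link unless performance falls below a floor." That is the flexibility the BGP-based products offered and these did not.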
Big-IP Link Controller 4.5. F5 Networks, (888) 88-BIGIP, (206) 272-5555. www.f5.com
Peter Morrissey is a full-time faculty member of Syracuse University's School of Information Studies, and a contributing editor and columnist for Network Computing. Write to him at [email protected].
The DNS devices change both the incoming and outgoing paths, unlike the BGP products, which can control only the outgoing paths. The architecture of these products requires that the outgoing and incoming paths be symmetric. Therefore, an organization must use external IP address spaces from two different ISPs, with an interface for each ISP that has an address issued. Individual servers are mapped to another address (usually a NAT address) behind the boxes.
When a user initiates a request from inside, the device uses the source address and the corresponding ISP connection that will provide the best path through the network. Handling external Web requests is a little trickier. This is accomplished by advertising a low DNS TTL of about 10 seconds. This forces the end user's DNS server to request an updated IP address every 10 seconds. Here again, the device will provide the external IP address that will provide the best path through the network.
For this reason, DNS-based devices have more direct control over the performance of those surfing the Internet. The downside of this approach is that it relies on the DNS infrastructure to honor the vendor's methods. If a DNS server does not honor the low TTL, then it won't work. And, if the client runs software that caches DNS requests, it won't work if a change in performance necessitates a change while the IP address is cached.
• "Route Optimizers Put You in the Driver's Seat"
• "Making a Science out of Routing on the Internet"
• "Better Bandwidth Management"
• "Saving Money With Tiered Access"