Tool Time

Does the thought of testing prospective network devices make you long to just drop the system into your live network and cross your fingers? Read on.

September 22, 2003


Performance, Latency and Loss

Before we get to specific tools, let's define the goals and measures of performance testing.

First, throughput and latency aren't goals of performance testing, but measures. Typically, the goal of a test is to see if the system under test (SUT) can handle current or future traffic conditions. It's not enough to test for raw bits per second at a fixed frame rate, because this doesn't come close to modeling real traffic. The goal of any performance test should be to model traffic based on known or estimated patterns. Once the traffic is modeled, the test can be run against a number of systems, and you can compare performance.

The IETF's RFC 1242, which specifies benchmarking terminology for network interconnection devices, defines throughput as the maximum rate at which there is no frame loss. However, the terminology in RFC 1242 is aimed at Layer 2/3 devices. The concept of frame loss breaks down when testing moves to TCP-oriented traffic, because TCP's error-control and congestion-recovery mechanisms mean that packets, which are carried in frames, may be lost along the way and retransmitted. When testing TCP, you'll typically want to find the maximum rate, in bits per second, that can be achieved before sessions start to fail.
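
To make that definition concrete, here is a minimal sketch of the zero-loss rate search that benchmarking tools automate (RFC 2544 describes the methodology). The run_trial hook is hypothetical--wire it to whatever traffic generator you actually use.

```python
# Sketch of the zero-loss throughput search most benchmarking tools
# automate. run_trial() is a hypothetical hook: wire it to your own
# traffic generator and have it return the number of frames lost when
# offering `rate_mbps` of traffic for `duration_s` seconds.

def run_trial(rate_mbps, duration_s=60):
    raise NotImplementedError("drive your traffic generator here")

def zero_loss_throughput(line_rate_mbps, resolution_mbps=1.0):
    """Binary-search the highest offered rate that shows no frame loss."""
    low, high, best = 0.0, line_rate_mbps, 0.0
    while high - low > resolution_mbps:
        rate = (low + high) / 2
        if run_trial(rate) == 0:      # no loss: push the rate up
            best, low = rate, rate
        else:                         # loss: back the rate off
            high = rate
    return best
```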

Raw bits per second are of little interest unless you're testing modems and other bit-oriented devices. In testing devices higher up the OSI model, especially at Layer 3 and higher, remember two things: A number of factors, including packet size, can alter the SUT's performance, and once you hit Layer 4 and above, you have to worry about the number of concurrent sessions.

Vendors of products that work at Layer 4 and above like to quote performance numbers at large packet sizes and with a few sessions, because most products perform better when dealing with large chunks of data from a handful of sources. But these performance numbers aren't particularly useful, because real-world traffic consists largely of small packets and lots of concurrent sessions. Before you trust anyone's performance claims, including those of third parties, make sure they explain how the test was conducted and the nature of the traffic used in the test.

Latency, to paraphrase RFC 1242, is the time interval from when the last bit enters the SUT to when the first bit exits the SUT. This is the definition for store-and-forward devices--a category that includes routers, switches, load balancers and security equipment--because each packet is first received by the SUT and held in a buffer, some processing occurs, and the packet is sent on its way. Latency is a problem for both transactional processing, such as database traffic, and interactive processing, like telnet or SSH (Secure Shell). Measuring latency precisely is difficult, but you can get reliable numbers by placing the testing devices adjacent to one another, as we did in our test of midrange Fibre Channel switches (see "High on Fibre").
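
As a worked example of that definition, the snippet below computes store-and-forward latency from a tester's bit-level timestamps. The frame size, line rate and timestamps are illustrative values, not measurements from any review.

```python
# Worked example of the RFC 1242 store-and-forward (LIFO) definition:
# latency runs from the last bit entering the SUT to the first bit leaving
# it. Frame size, line rate and timestamps below are illustrative.

frame_bytes = 1518                      # frame size used in the trial
line_rate_bps = 1_000_000_000           # gigabit test ports

serialization_s = frame_bytes * 8 / line_rate_bps   # time to clock the frame in

t_first_bit_in = 0.000000               # timestamps from the test tool (seconds)
t_first_bit_out = 0.000030

t_last_bit_in = t_first_bit_in + serialization_s
latency_s = t_first_bit_out - t_last_bit_in
print(f"LIFO latency: {latency_s * 1e6:.1f} microseconds")   # ~17.9 us here
```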

Loss is just that: lost packets or, in the case of TCP, lost sessions. Loss in an SUT can be as simple as the SUT not accepting new traffic or dropping old, queued traffic in favor of new traffic. Loss is commonly a limiting factor in throughput testing.

Of course, how an SUT performs during a test can vary widely based on the traffic presented. Most Layer 2/3 devices, when presented with a traffic volume that exceeds capacity, will simply start to drop packets, introducing loss. Latency tends not to manifest itself until the capacity is nearly reached and the SUT is overloaded.

On the other hand, when application-layer devices are tested, per-connection latency rises sooner as throughput increases, because connections are queued while other connections are processed. Especially at Layer 7, latency often limits the useful throughput an SUT can support long before raw throughput constraints are reached. For example, an SUT serving Web applications with latency under load in the seconds or tens of seconds is all but useless.

There are significant differences in how various testing tools measure performance, but the critical component is how they interact with the SUT. The two broad classes of tools are traffic generators, which create large numbers of packets of the sort that may or may not be produced by a working network stack, and transaction generators, which, at minimum, send and receive real transactions over a valid, working network stack. The main differentiator is that transaction generators implement a true network stack, including the application layers.

The key consideration when deciding on the right tool is whether the SUT enforces correct behavior at Layer 4 and above. Devices that interact with Layer 4 streams, like firewalls and load balancers, require a transaction generator, because the SUT is interacting with the Transport-layer stream and that stream has to behave properly. The SUT may also interact with the application, which can contain dynamic content; in this case, simulated traffic at Layer 4 and above won't be processed properly, because the simulated traffic expects specific responses to its requests. Dynamic content or unexpected responses will go unprocessed or be flagged as errors.

Traffic generators are bit blasters. They test performance and are good for testing Layer 2/3 devices. The advantage of traffic generators, like Spirent Communications' SmartBits and Ixia Communications' chassis, is that they can create packet traffic to exacting specifications and send those packets at near line rate. For a recent review of 10 Gigabit Ethernet switches (see "Life in the Really Fast Lane"), we used the Ixia 1600T to test performance and QoS (Quality of Service). By setting IP header fields, like IP address and ToS/DiffServ for QoS, we were able to exercise the 10 Gigabit Ethernet switches with high rates of varied traffic.
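
For a sense of what a traffic generator does, here is a minimal sketch that marks UDP datagrams with a DSCP value and paces them from a host stack. It is a functional illustration only--a commodity host can't approach the line rates hardware generators sustain--and the destination address, port, size and rate are placeholders; IP_TOS is available on Linux.

```python
# Minimal sketch of marked, paced UDP traffic using only the standard
# library. A host stack like this is a functional illustration only --
# hardware generators do this at near line rate. IP_TOS is available on
# Linux; the destination address, port, size and rate are placeholders.

import socket
import time

DSCP_EF = 46                         # Expedited Forwarding
tos = DSCP_EF << 2                   # DSCP occupies the top six ToS bits

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

payload = b"\x00" * 512              # fixed 512-byte datagrams
pps = 1000                           # target packets per second

for _ in range(10 * pps):            # roughly ten seconds of traffic
    sock.sendto(payload, ("192.0.2.10", 9000))
    time.sleep(1.0 / pps)            # crude pacing; real tools pace in hardware
```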



[Chart: Hardware Transaction Generators]

It's important to note that in addition to aiding performance testing, traffic generators provide reliable background traffic. In both the 10 Gigabit Ethernet switch tests and our review of midrange Fibre Channel switches, a traffic generator was used to stress the SUT with different types of traffic while a second tool measured performance. As with performance testing, the traffic generator must properly exercise the SUT; otherwise, the background traffic is useless.

A common mistake is to use a traffic generator to send UDP (User Datagram Protocol) traffic through a stateful packet-filtering firewall while using some other TCP-based tool to measure performance. The problem is that UDP traffic doesn't place the same load on the firewall as similar amounts of TCP traffic.

And though traffic generators often simulate TCP traffic with random initial sequence numbers, sequence number incrementing, source port generation and the like, the resulting traffic may not behave quite according to the TCP RFC. If the SUT actively tracks TCP state, you may find the tools don't perform as you'd expect.

Transactional generators, on the other hand, instantiate a full network stack and can often interact with existing servers as a client. Predominantly used to test HTTP, most products also support other common application protocols, including streaming media, SMTP, FTP and POP3. Transactional generators can test in a closed loop, with the test tool acting as both the client and the server, or in an open loop, where the tool acts as a client and is used to test a live system. The performance metrics transactional generators provide are similar in name to those kicked out by traffic generators--throughput, latency and loss--but there are some differences in terms of how latency and loss are measured.

Transactional generators typically use the network stack of the underlying OS to send and receive data, so their latency measurements include the latency induced by that stack. Latency is also measured across the whole system--the tool's network stack, the intervening infrastructure and the target server/application combination--which makes it a reasonable gauge of end-user experience.
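
Here is a minimal sketch of an open-loop transaction generator in that spirit: real HTTP requests over the operating system's stack, with each whole transaction timed end to end. The host, path, request count and thread count are placeholders, and this illustrates the measurement rather than substituting for a commercial tool.

```python
# Minimal open-loop transaction generator: real HTTP requests over the
# operating system's network stack, timing each whole transaction (client
# stack + infrastructure + server). Host, path, request count and thread
# count are placeholders.

import http.client
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

HOST, PATH, REQUESTS, WORKERS = "www.example.com", "/", 200, 20

def one_transaction(_):
    start = time.perf_counter()
    conn = http.client.HTTPConnection(HOST, timeout=10)
    try:
        conn.request("GET", PATH)
        resp = conn.getresponse()
        resp.read()
        ok = resp.status < 400
    except (OSError, http.client.HTTPException):
        ok = False
    finally:
        conn.close()
    return ok, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(one_transaction, range(REQUESTS)))

latencies = [t for ok, t in results if ok]
print(f"completed {len(latencies)}, failed {len(results) - len(latencies)}")
if latencies:
    print(f"median latency {statistics.median(latencies) * 1000:.1f} ms")
```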

When To Get Real

Other considerations for performance testing are the differences between synthetic and real traffic, and protocol standards compliance and interoperability. When it comes to performance testing, not all packets are created equal.



[Chart: Software Transaction Generators]

Synthetic packets generated by packet blasters should be supported by Layer 2/3 devices as long as the SUT doesn't track state or use parts of the packet for processing, because the fields outside of addressing are rarely used. Protocol implementations become more critical when testing TCP devices that track session state and monitor for appropriate field values. Test tools that create synthetic TCP traffic, like Spirent's SmartWindow and SmartTCP, are useless when testing TCP-aware devices. Likewise, field values in the TCP sessions, like sequence numbers, have to be appropriately selected and incremented during the life of the TCP session; otherwise, the TCP-aware device won't likely process the packets properly.

We ran into problems with synthetic TCP generators when testing stateful packet-filtering firewalls because the firewalls tracked session state. The tool didn't properly shut down TCP connections, so the firewall kept them in an open state, and packets that were lost were never retransmitted, which also left connections hanging open. The traffic generator reused initial sequence numbers, causing overlap when the ISN and IP address matched existing connections. All these issues combined made for lots of troubleshooting--and this warning to learn from our pain by vetting your tools before you use them!

Synthetic transactions can also ride on a fully functional IP stack, but the transactions themselves simply mimic the real thing. Synthetic application testing works fine for Layer 4 devices, where the IP stack behaves as intended and the transactional packets are of varying sizes, but devices that are application-aware are unlikely to process these transactions properly. For example, application-proxy firewalls won't process synthetic application traffic correctly because they track application state.

When testing application-layer devices, like application proxies and HTTP load balancers, you must generate real application traffic; otherwise, you'll run the very real risk of the SUT failing to treat the traffic properly. Many vendors claim in their marketing literature that their tools support application-level traffic, but the proof is whether the test tool and the server can interact.

The best way to determine if a test tool functions properly is to first understand how the protocol is supposed to work. Whip out your packet analyzer and capture and analyze a trace or two. Then, using the test tool, capture and analyze that traffic. Look for anomalies, such as sequence numbers starting at "1" and increasing, unusual protocol flag combinations, and other odd behavior.
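
One way to automate that sanity check: the sketch below reads a classic libpcap capture (untagged Ethernet/IPv4 assumed, not pcapng) and flags suspicious initial sequence numbers on SYN packets. It's an illustration of the idea, not a full protocol audit.

```python
# Sketch: audit a test tool's TCP behavior from a capture, as suggested
# above. Assumes a classic libpcap file (not pcapng) of untagged
# Ethernet/IPv4 traffic; flags initial sequence numbers on SYN packets
# that look synthetic (e.g., 0 or 1, or heavy reuse).

import struct
import sys
from collections import Counter

def syn_isns(path):
    with open(path, "rb") as f:
        magic = f.read(24)[:4]                     # 24-byte global header
        endian = "<" if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1") else ">"
        while True:
            rec = f.read(16)                       # per-packet record header
            if len(rec) < 16:
                break
            _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
            frame = f.read(incl_len)
            if len(frame) < 54 or frame[12:14] != b"\x08\x00":
                continue                           # not IPv4 over Ethernet
            ip = frame[14:]
            if ip[9] != 6:                         # not TCP
                continue
            tcp = ip[(ip[0] & 0x0F) * 4:]
            flags = tcp[13]
            if flags & 0x02 and not flags & 0x10:  # SYN without ACK
                yield struct.unpack("!I", tcp[4:8])[0]

counts = Counter(syn_isns(sys.argv[1]))
for isn, n in counts.most_common(5):
    print(f"ISN {isn} seen {n} times")
if any(isn in (0, 1) for isn in counts):
    print("warning: ISNs of 0/1 suggest synthetic, noncompliant TCP")
```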

Test tools can test application functionality, performance or both. What you want to measure is going to dictate the chosen tool. Functional testing can be as simple as checking to ensure that all the pages in a Web application are available, or as complex as in-depth quality-assurance testing to determine if the application is stable and to ferret out bugs.

Performance testing, too, can take many forms. We've talked about raw performance, and though the numbers are useful, they rarely tell the whole story. Benchmarking a single Web server tells you the performance of that server, but that doesn't mean you can add two servers and double performance. There has to be a device that balances traffic between the two--a device that has its own performance limitations. Fail to take that into account and you've pooched the test.

As you develop the goals of your test, consider the traffic scenario you wish to examine. Obviously, you'll model the traffic you have or expect to have as closely as possible, but there are many variations in how you create that model. For example, real traffic is often made up of a mix of UDP and TCP traffic, and a large portion of it is less than 256 bytes in size and composed of TCP-session setup and tear-down, DNS queries, fragments and ICMP traffic. As mentioned earlier, how detailed you need to be in modeling the traffic depends largely on the SUT.
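
One simple way to express such a model is a weighted packet-size distribution you can sample from when generating traffic. The sizes and weights below are illustrative placeholders, not figures from our lab; substitute a distribution measured on your own network.

```python
# Sketch: sample packet sizes from a weighted mix when modeling traffic.
# The sizes and weights are illustrative placeholders, not measurements --
# substitute a distribution captured from your own network.

import random

SIZE_WEIGHTS = {      # bytes -> relative share of packets
    64:   0.45,       # TCP setup/teardown, ACKs, DNS, ICMP
    128:  0.15,
    256:  0.10,
    512:  0.10,
    1024: 0.08,
    1518: 0.12,       # full-size data frames
}

def sample_sizes(n):
    return random.choices(list(SIZE_WEIGHTS),
                          weights=list(SIZE_WEIGHTS.values()), k=n)

mix = sample_sizes(100_000)
small = sum(1 for s in mix if s < 256) / len(mix)
print(f"share of packets under 256 bytes: {small:.0%}")
```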

It's common for performance tests to be conducted at large packet sizes because network devices typically perform better that way. However, in a review of VPN hardware (see "IPsec VPNs: Progress Slow but Steady") in which our performance tests used a mix of packet sizes, we were surprised to see the Cisco 7140 perform better with a predominance of small packets. When we asked Cisco to explain, it told us it had optimized packet handling for smaller sizes because small packets are more common than large ones. These distinctions don't come out if you test at a single fixed size.

Lost In ...

Loss can be measured in several ways. Remember that TCP has mechanisms to handle loss, such as retransmission and sliding windows. The effect of loss, in the conventional sense of a lost packet, can be hidden by TCP--which is acceptable, because when you're testing transactions, some packet loss is part of normal network operation. An increase in TCP retransmissions, however, is an early indicator that some part of the SUT is becoming overwhelmed. Retransmissions result in a reduction of TCP throughput, because less data is making it end to end, and/or an increase in latency, because each established TCP connection is waiting for data to arrive.

Loss of TCP connections--either the dropping of existing connections or the inability to establish new ones--means the SUT has reached capacity. Capacity can be defined as bits per second, sessions per second, concurrent sessions or a combination.

Note that transactional generators also track transaction state and success/failure. Transactions can become latent or be lost even though the target TCP stack continues to accept incoming connections--an indication that the application is overwhelmed while the underlying OS is still chugging along. So transactional generators track similar metrics to traffic generators, but the context of the metric dictates its meaning.

Other Tools

We keep a host of other gadgets in our toolbox to augment the test bed or aid in troubleshooting. Some of these are commercial products, but many are open source and just as feature-rich as their commercial counterparts. Here are a few of our favorites:

• Although we typically talk about application performance on a clean test bed, network applications often break down when contending with packet loss, latency and fragmentation. However, testing on degraded links makes repeatability difficult at best. We use products from Shunra to model a WAN with varying link characteristics, such as bandwidth, latency, loss and jitter (see "File Distribution Across the WAN"). Running applications across a modeled network shows how those applications will behave under similar real-world conditions. Application-simulation tools, such as NetIQ Chariot, are good at testing Layer 3/4 devices, but because their transactions are simulated and static, they don't work as well at the Session, Presentation and Application layers.

• We used Fragroute and Fragrouter (see "NIP Attacks in the Bud") to test security devices' ability to reassemble traffic streams and detect obvious evasion techniques. Fragroute can slice packets into tiny fragments, duplicate and reorder packets, set IP options and do a host of other nasty things.

• Before we run these packet games, we like to see what the SUTs are doing. Protocol-analysis tools are essential for discovering and troubleshooting network problems, and desktop protocol analyzers like Network Associates' Sniffer and WildPackets' EtherPeek are invaluable for digging into network traffic. Both analyzers support a wealth of protocol decodes for the hexadecimal-challenged, and they have extremely flexible packet-filtering capabilities. The open-source Ethereal has fewer features, especially in expert analysis, but it has good protocol decodes and has been ported to multiple OSs. Better than Java--learn once, run everywhere. For you command-line geeks, tcpdump is a viable option and, like Ethereal, has been ported to multiple OSs.

• Because most NICs drop errored frames, we use in-line protocol analyzers, such as Network Associates' Sniffer Distributed s400 Model EG2S appliance, to monitor all traffic passing a point in the wire. The EG2S, which sits in-line and uses an external console to capture packets and analyze data, can capture all traffic on full-duplex gigabit links. The downside is no real-time packet analysis.

Mike Fratto is a senior technology editor based in Network Computing's Syracuse University Real-World Labs®; he covers all security-related topics. Prior to joining Network Computing, Mike worked as an independent consultant in central New York. Write to him at mfratto@nwc.com.

Getting Traffic Off the Wire

There are several ways to get traffic off the wire and into a protocol analyzer, and each has advantages and disadvantages. At first glance, the easiest method is to plug the network and the protocol analyzer into a hub; you can have as many protocol analyzers as the hub supports. The caveat is that if you're monitoring Fast Ethernet, you've halved your overall throughput--100 Mbps shared versus 200 Mbps switched--and you've introduced contention on the network, which means collisions.

Better choices are a mirror port on a switch or a network tap. Mirror or span ports on a switch take traffic off a target port, group of ports or VLAN and mirror it to a monitor-only port, to which you attach a protocol analyzer. Just be aware that you're limited to half the bandwidth on a full-duplex connection, because all the traffic is passing down the transmit wire pairs from the switch. Also, note that switches won't pass frames that have Layer 1 or 2 errors, such as frame CRC errors, so if you're troubleshooting a Layer 1 or 2 problem, you'll have to use a hub or a network tap.

Some switches let you transmit traffic out of the span port so it can be used as a network port. But you shouldn't use that feature, because you're injecting traffic into the traffic flow you're monitoring and could overrun the switch's capacity. Use a separate network port to send/receive traffic.

Network taps sit in-line with the physical network and transparently pass electrical or optical signals through while shunting traffic off to an external port. Typically, a network tap forwards traffic to the monitor port only when the tap is powered; otherwise, it passes traffic through the wire but not out of the monitor port, so in the event of a power outage your network won't be affected. Although a network tap has the distinct advantage of mirroring the physical signal to a monitor port so you can see all errors from Layer 1 on up, you can't just plug the monitor link into any old NIC and expect to see all traffic.



[Diagram: Network Tap to a Span Port]

Remember: On full-duplex connections, a network tap turns both the A (transmit) and B (receive) wire pairs into transmit pairs. The NIC in your desktop listens only on its receive pair, not the transmit pair, so you'll see only half the traffic.

There are two ways to work around this problem if you're using fiber. If your OS and NIC support interface bonding, feed one fiber interface from the tap into one fiber NIC and the second fiber interface into a second fiber NIC. Then bond the two NICs into a virtual interface and use that interface for protocol analysis.

If you lack the NICs, drivers or OS support for bonding, an alternate method for capturing both directions of traffic is to feed one fiber interface into a switch port, feed the second fiber interface into a second switch port, then span both to a mirror port. Of course, you'll have to make sure your aggregate traffic doesn't devour your bandwidth, but we've used this method successfully with a Cisco Catalyst 2914.

Write a Test Plan

A well-designed test plan is essential if you want usable results when evaluating multiple products. Like a seasoned carpenter who measures twice and cuts once, you'll find that most of your conceptual thinking and problem-solving happens during development of the test plan. You'll set the goals of the testing, work out the network design, determine the resources needed, and create a checklist of metrics and their measurements.

To begin, write a problem statement and list the features you require in order of importance. If you started the purchase process with an RFI (Request for Information) or RFP (Request for Proposal), some of this work may be done. The features you want to test are goals to be achieved.

Next, assign measurement metrics to each testing goal. Determining the measurement metrics can be as straightforward as deciding whether to measure in bits per second, sessions per second, sustained sessions, latency or some combination of these. Softer metrics, such as testing integration, will depend on what you're trying to accomplish. For example, you could state that criteria for "good integration" include a well-defined API and support for multiple protocols.

Design your test bed and include all supporting hardware and software. Establish your addressing and routing. Try to make the test mirror your designated installation as closely as possible. For example, if a system under test (SUT) is commonly sandwiched between two routers, put the routers into your test bed.

Send your test plan to colleagues for comment. Ideally, get feedback from the people you're testing for to make sure you're testing the features important to them.

Finally, build your test network and vet it for functionality and performance. At this point, you may have to tweak the test plan if the test bed deviates from what you'd planned.

If you've done your job well, the test plan will be pretty polished and will need little modification during testing.

Notice we said little, not no, modification. If we've learned one thing in our years of product testing, it's that even the best-laid plans will require some tweaks. We always find little things we missed or miscalculated. The point is to have the test bed running and documented before you insert devices.

Run performance tests without an SUT in the mix so you can baseline the system. Then you'll be able to analyze the performance of the SUT in reference to the baseline.
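
A sketch of that comparison, with placeholder metric names and numbers, might look like this:

```python
# Sketch: report an SUT run against the no-SUT baseline from the same
# test bed. Metric names and numbers are placeholders.

baseline = {"throughput_mbps": 940.0, "median_latency_ms": 0.8, "failed_sessions": 0}
sut_run  = {"throughput_mbps": 610.0, "median_latency_ms": 4.6, "failed_sessions": 37}

for metric, base in baseline.items():
    value = sut_run[metric]
    delta = value - base
    pct = delta / base * 100 if base else float("inf")
    print(f"{metric:20s} baseline={base:8.1f} sut={value:8.1f} "
          f"delta={delta:+8.1f} ({pct:+.0f}%)")
```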

Now, a test plan is just that--a plan. Plans can change during testing. At times, you'll need to make architectural changes to accommodate an SUT that won't work in the standard configuration, or because an unplanned feature turned up. You should keep these changes to a minimum and ensure that they don't bias the test to a specific product. In addition, you should retest all your SUTs under the same conditions.

Always document changes; repeatability is important, and full documentation will help testing go more smoothly in the future.

After testing is complete, compile your results and present them. We find it helpful to build a template of features or questions that crop up during testing, and to answer each question for each product. This ensures that we can compare both hard and soft test results per SUT.

Even if you're testing just one system, maintaining a template helps organize results and assessments as you progress.

Here's a sample test plan we used for a recent review of NIP devices (see "Inside the NIP Hype War"):

NIP Test Plan

Test the devices for protection features, performance, management and reporting. First, test the devices on a test bed to gain familiarity with the products and to examine protection and performance in a controlled environment. Next, place all devices in one-arm mode on your production network, where you'll focus on protection, tuning and reporting.

Goal: Test each of the products to see if they accurately detect known attacks that are run directly from known tools and altered into tiny fragments or reordered. Check out the impact that high traffic levels have on the devices in terms of throughput, latency and degradation in attack detection.

Protection Features Testing
Tools:

Vulnerable Systems

  • Windows 2000 SP0

  • Red Hat Linux 6.2

  • Red Hat Linux 7.1

  • Solaris 8

Vulnerable Applications

  • IIS 5.0 (SP0)

  • Apache (several versions with known flaws)

  • Sendmail (several versions with known flaws)

  • Telnet

  • Oracle8i

  • Oracle9i

  • RPC

Other applications

Vulnerabilities tested:

  • SANS Top 20

  • Current vulnerabilities, including DoS and DDoS attacks

  • Port scan (nmap)

  • IDS evasion (fragroute, slow scans)

Match the three sections to build a vulnerable network. Vulnerabilities don't necessarily need to be working, as long as they're real.

Security Testing

  • On a clean network, test each vulnerability through the test bed to ensure that everything's working properly. Get packet captures of each for analysis and comparison. Define capture filters on the protocol analyzer to capture just the interesting traffic.

  • Insert SUT into network and run exploits (shunning should be disabled).

  • Ensure that the exploits are passing through the SUT.

  • Ensure that the exploits are properly detected.

  • Look for generic definitions.

  • Look for multiple signatures per exploit.

  • Look for misclassified exploits.

Performance Testing

  • On a clean test bed, get a baseline of the performance test, which will be used to determine any degradation introduced by the product and ensure that the test bed is fully functional.

  • Insert SUT into network with current policy.

  • Run performance test, increasing the bandwidth in a binary search starting at half the rated capacity.

  • Ensure that all tests and TCP sessions have closed properly. When in doubt, reboot the SUT to clear state tables.

  • Use test beds that generate real, valid IP/TCP/UDP traffic. Simulators will cause failures in security-related SUTs.

  • Ensure that ISNs are sanely chosen (examples of poor ISN choices are those starting at 1).

  • Ensure that sequence numbers increment properly.

  • Ensure that IP/TCP/UDP headers are properly written.

  • Ensure that there are no CRC errors generated.

Results to Look For

  • Increased latency per connection

  • Failure of new connections

Traffic Mix

HTTP 1.1 with keep-alive:

  • Main page: 2 KB

  • 10 images @ 1 KB each

  • 2 images @ 5 KB each

  • 2 images @ 20 KB each

DNS:

  • Lookups

SMTP:

  • Mail @ 5 KB each

POP3
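
Some rough arithmetic on the HTTP portion of this mix helps turn it into a load target. The calculation below ignores protocol overhead and the DNS/SMTP/POP3 traffic, so treat it as a starting point rather than a prediction.

```python
# Rough arithmetic on the HTTP portion of the mix above: payload bytes per
# "page view" and the page rate needed to fill a given link. Ignores
# protocol overhead and the DNS/SMTP/POP3 share, so treat the result as a
# starting point for a load target, not a prediction.

page_kb = 2 + 10 * 1 + 2 * 5 + 2 * 20        # main page plus images = 62 KB
page_bits = page_kb * 1024 * 8

for link_mbps in (100, 1000):
    pages_per_s = link_mbps * 1_000_000 / page_bits
    print(f"{link_mbps} Mbps ~ {pages_per_s:,.0f} page views/s of payload")
```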

