Dependency: You might think that going to a website is a simple process. You visualize that all you do is resolve the host name via DNS, then go to that site. This approach doesn’t factor in multi-tiered servers such as SQL, RADIUS, Oracle, or others that may reside on the backend.
An application command will require more resources, and possible drive writes and reads, to complete your request compared to a simple ping request/response.
Protocol Analysis: I am a student of protocol analysis and I have been practicing it for over 20 years, so this one is my personal favorite.
The first challenge with this approach is how you will capture the packets. For example, capturing from your host requires you to understand if you are using IP, TCP, UDP checksum offloading and why you may see packets larger than the Ethernet maximum (1,518 Bytes). Then there is the tap versus span port debate to keep in mind. I don’t think I’m doing a good job selling you on this approach, am I?
After you captured the packets you need time, patience, and protocol knowledge to calculate DNS, HTTP, TCP, and other response times. Things get tricky when you include HTTPS or QUIC since they are encrypted.
TCP SYN/SYN ACK or connect time is 130 milliseconds
HTTP response time is 170 milliseconds
DNS response time is 1 millisecond
Thresholds: Regardless of the technique you use, eventually you will come to this important question. What response time value is good or bad? Unfortunately, there is no cut and dry answer for ‘good’ or ‘bad,’ since it depends on what I call ‘context’. For example, I expect the HTTP response time over a busy WiFi network to an internet website would be different than accessing a local web server with a copper 1 GB connection. A multi-tier application will probably have a different response time than a single tiered application. This is where a baseline comes in. Ideally you should have several previous measurements to compare your results to. Unfortunately, if this process take too much time, you are less likely to do it on a regular interval.
One tip I give regarding determining what is bad and what is a good response time when no baseline is available is to take some sample TCP and application response times as a reference.
We have all performed active testing when troubleshooting. Active testing is you actually running the real application on the actual network to the production server. For example, if you get a complaint that webmail is slow, you would launch your browser to see if you encounter the same issues and test accordingly.
Active test systems all have the same architecture consisting of an agent and a console. Every vendor is a bit different so make sure you do your homework to better understand how they work.
In most cases the console controls the agent and gathers/analyzes the results. This is where the vendors start to get unique since some products only support certain operating systems. You should make sure that your solution supports your various operating systems like Microsoft, Apple, Linux, Android, or others.
You should also check if the test scripts are modifiable and if you can create your own. In many cases you might want to test http servers within your network, not just google.com.
Lastly, whatever system you decide to use, it should have built in thresholds and the ability for you to modify them. Ideally you should be able to see the test results, see which threshold caused the test to fail, and adjust accordingly. The idea here is that after a few tests you will develop a baseline of what is normal, good, or bad.