Network Computing is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Troubleshooting Network Performance in Cloud Architectures


(Image by Gerd Altmann from Pixabay)

Troubleshooting within public or hybrid clouds can be a challenge when end users begin complaining of network and application performance problems. The loss of visibility of the underlying cloud network renders some traditional troubleshooting methods and tools ineffective. Thus, we must come up with alternative ways to regain that visibility. Let's look at five tips on how to better troubleshoot application performance in public cloud or hybrid cloud environments.

Tip 1: Verify the application and all services are operational form end-to-end

The first step in the troubleshooting process should be to verify that the cloud provider is not having an issue on their end. Depending on whether your service uses a SaaS, PaaS or IaaS model, the verification process will change. For example, Salesforce SaaS platform has a status page where you can see if there are any incidents/outages or maintenance windows that may be impacting your users.

Also, don't forget to check other dependent services that can also impact access or performance to cloud services. Services such as DHCP and internal/external DNS are common dependencies can cause problems -- making it look like there is something wrong with the network. Depending on where the end user connects from in relation to the cloud application they are trying to access, the DHCP and DNS servers used will vary greatly. Verifying end users are receiving proper IP's and can resolve domains properly can save a great deal of time and headaches.

Tip 2: Review recent network configuration changes

If a performance problem to a cloud app seemingly crops up out of nowhere, it's likely a recent network change is to blame. On the corporate LAN, review any firewall, NAT or VLAN adds/changes didn't inadvertently cause an outage for a portion of your users. These same types of network changes should also be verified within IaaS clouds as well.

QoS or other traffic shaping changes can also accidentally degrade performance between the corporate LAN and remote cloud services. Automated tools can be used to verify that applications are being properly marked -- and those markings are being adhered to on a hop-by-hop basis between the end user and as far out to the cloud application or service as possible.

Tip 3: Use traditional network monitoring and troubleshooting tools

Depending on the cloud architecture model you're using, traditional network troubleshooting tools can be greater or less effective when troubleshooting performance degradation. For instance, if you use IaaS such as AWS EC2 or Microsoft Azure, you have enough visibility to use most network troubleshooting and support tools such as ping, traceroute, and SNMP. You can even get NetFlow/IPFIX data streamed to a collector -- or even run packet captures in a limited fashion. However, when troubleshooting PaaS or SaaS cloud models, these tools become far less useful. Thus, you end up having to trust your service provider that everything is operating as it should on their end.

Tip 4: Use built-in application diagnostics and assessment tools

Many enterprise applications have built-in or supplemental diagnostic tools that IT departments can use for troubleshooting purposes. These tools often provide detailed information that help you determine whether performance is an application-related issue -- or a problem with the network or infrastructure. For example, if you're having issues with Microsoft Teams through Office 365, you can test and verify sufficient end-to-end network performance using their Skype for Business Network Assessment Tool. Although this tool is most commonly used to verify whether Teams is a viable option pre-deployment. It can also be used post-deployment for troubleshooting purposes.

Tip 5: Consider SD-WAN built-in analytics or pure-play network analytics tools

Network analytics tools and platforms are the latest way for administrators to troubleshoot network and application performance problems. Network analytics platforms collect streaming telemetry and network health information using several methods and protocols. All data is then combined and analyzed using artificial intelligence (AI). The results of the analysis help pinpoint areas on the corporate network or cloud where network performance problems are occurring.

If you have extended your SD-WAN architecture to the public cloud, you can leverage the myriad of analytics components that are commonly included in these platforms. Alternatively, there are a growing number of pure-play vendors that sell multi-vendor network analytics tools that can be deployed across entire corporate LANs and into public clouds. While these two methods can be expensive and more complicated to deploy initially, they have shown to speed up performance troubleshooting and root cause analysis processes dramatically.