Network Support for AI

Is your network ready to handle AI processing and data? Here are some points to consider.


AI (artificial intelligence) processing and data payloads differ substantially from those of traditional network workloads. What changes do you need to consider to get your network ready to support AI applications?

This is the "front-and-center" question facing enterprise network professionals because AI is coming.

At the end of 2023, 35% of companies were using some kind of artificial intelligence, but the majority of organizations using it were resource-rich tech companies. As other companies begin to deploy AI, new investments and revisions to network architecture will have to be made.


How much does a network pro have to know about AI?

Historically, network staff didn't have to know much about applications beyond how much data they were sending from point to point and what the speeds and volumes of transactions were. This changed somewhat with the introduction of more unstructured "big" data into network traffic, but the adjustment to big data for video, analytics, and the like still wasn't a major disruption to network plans.

AI will change all that—and it will require network staff to learn more about the AI application and system side.

This is because there is no one-size-fits-all model that covers every style of AI processing.

Depending on what they process, AI applications will use different types of learning algorithms, and those algorithm types can dramatically affect the amount of bandwidth needed to support them.

For example, if the AI uses a supervised learning algorithm, all of the input data to the app is already tagged for easy retrieval and processing, and it comes from a finite data repository that can be quantified. In contrast, AI applications such as generative AI use an unsupervised learning algorithm, where the data is untagged and requires more processing as a result, and where the flow of data into the application can be effectively limitless and hard to quantify.

It’s easier to estimate and provision bandwidth for AI that uses supervised learning algorithms because many factors about the processing and data are already known and because the data is pre-tagged for better performance. Knowledge of these factors will likely allow you to allocate less bandwidth than you would need to provide for an unsupervised learning algorithm.
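
To make that concrete, here is a rough back-of-the-envelope sketch in Python; the dataset size, epoch count, and transfer window below are invented numbers purely for illustration:

```python
# Rough bandwidth estimate for a supervised-learning workload whose
# dataset size and transfer window are known up front.
# All figures below are hypothetical examples, not recommendations.

dataset_gb = 2_000            # finite, labeled training set (GB)
epochs_per_day = 4            # times the data traverses the network daily
transfer_window_hours = 20    # hours per day available to move the data

total_gbits = dataset_gb * 8 * epochs_per_day
seconds = transfer_window_hours * 3600

required_gbps = total_gbits / seconds
print(f"Average bandwidth needed: {required_gbps:.1f} Gbps")
# With a finite dataset, this number is knowable before deployment,
# so links can be sized with only modest headroom.
```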

If the AI system uses an unsupervised learning algorithm, the network bandwidth estimation and provisioning get tougher. You can't really gauge how much bandwidth you'll need until you gain experience with the app over time because you don't know how much data is coming, what its payload burst rates will be, or how hard it will be to process the data. Most likely, you will over-allocate at first and then fine-tune later as you gain experience.
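
For the unsupervised case, that "over-allocate first, fine-tune later" approach can be sketched the same way; the initial estimate, headroom factors, and the observed 95th-percentile figure below are all hypothetical:

```python
# Sketch of "over-allocate first, fine-tune later" for an unsupervised
# workload whose inflow and burst behavior are not known in advance.
# All numbers are illustrative assumptions.

initial_estimate_gbps = 10.0    # best guess at steady-state demand
initial_headroom = 3.0          # generous multiplier while behavior is unknown
provisioned_gbps = initial_estimate_gbps * initial_headroom

# After a few weeks, measured 95th-percentile utilization tells a better story.
observed_p95_gbps = 14.5        # hypothetical figure from monitoring
tuned_headroom = 1.5            # tighter margin once bursts are characterized
tuned_gbps = observed_p95_gbps * tuned_headroom

print(f"Initial allocation: {provisioned_gbps:.0f} Gbps")
print(f"Tuned allocation:   {tuned_gbps:.1f} Gbps")
```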

In all cases, the network staff needs to collaborate with the application and data science groups so it has an upfront understanding of the AI processing algorithms that will be used and can plan bandwidth and other elements of network performance to handle the workload.

Additionally, AI uses parallel computing, splitting work into a series of smaller tasks that run concurrently to speed processing. An AI job can use hundreds or even thousands of processors concurrently across many different machines. Process flows that are highly related to each other are grouped into computing clusters that place tremendous throughput demands on networks, and congestion in even one of these processing flows can slow down an entire cluster.
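
The reason one congested flow can stall a whole cluster is that a tightly coupled parallel step finishes only when its slowest flow finishes. A small sketch with invented flow-completion times makes the point:

```python
# Illustration of why one congested flow slows an entire AI cluster:
# a tightly coupled parallel step completes only when its slowest
# (straggler) flow completes. Flow times below are hypothetical.

flow_completion_ms = [12, 11, 13, 12, 11, 12, 95, 12]  # one congested flow: 95 ms

step_time_ms = max(flow_completion_ms)                   # step waits for the straggler
typical_ms = sorted(flow_completion_ms)[len(flow_completion_ms) // 2]  # median flow

print(f"Step time with congestion: {step_time_ms} ms")
print(f"Typical flow time:         {typical_ms} ms")
# One slow flow makes the whole synchronized step roughly 8x slower
# than a typical flow in this example.
```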

Network staffs will initially be challenged to facilitate the timely completion of AI jobs and to eliminate congestion. Since there are few best practices in this area of network management, network staff will need to learn from experience and develop their own best practices as they go.

What new network investments need to be made for AI?

Supercomputer-level performance will be needed to support processing-intensive, unsupervised algorithms in applications like generative AI, and networks and network technology will have to be scaled up to handle these workloads.

On the edge device side, Google's Tensor Processing Unit (TPU), an ASIC (application-specific integrated circuit) designed to accelerate the Google TensorFlow programming framework, is being used for AI machine learning and deep learning, while Apple is using its A11 and A12 Bionic chips, which include a dedicated Neural Engine.
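
As a small illustration of what that looks like from the software side, the snippet below is a minimal sketch, assuming TensorFlow 2.x is installed (connecting to an actual Cloud TPU would additionally require a cluster resolver); it simply asks the framework which accelerators it can see on the local host:

```python
import tensorflow as tf

# List any TPU, GPU, or CPU devices TensorFlow can see on this host.
for device_type in ("TPU", "GPU", "CPU"):
    devices = tf.config.list_physical_devices(device_type)
    print(f"{device_type}: {len(devices)} device(s) found")
```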

To support these and other AI technologies, internal Ethernet backbones need more beef, and organizations like the Ultra Ethernet Consortium (UEC, https://ultraethernet.org/) recognize this. That is why members of the consortium are working to define and develop an open, scalable, and cost-effective communications stack that will support high-performance AI processing and workloads while still building on the stable base of Ethernet.

Unfortunately, much of this new AI stack-enabled technology isn't here yet, but it could start arriving in 2025, so now is the time to begin planning for it and determining how its incorporation will alter network topologies.

About the Author

Mary E. Shacklett, President, Transworld Data

Mary E. Shacklett is an internationally recognized technology commentator and President of Transworld Data, a marketing and technology services firm.
