The finding may help to create more efficient data-center and cloud-based computing systems, but it strikes a blow to the notion of the cloud as a ubiquitous, universally accessible computing resource that makes the location of processors and data irrelevant.
The finding is the result of tests conducted in cooperation with Google by researchers Lingjia Tang and Jason Mars of the Jacobs School of Engineering at UC San Diego. The tests were conducted on Google Web servers, whose performance and data-access status were measured in real time.
The two found that software running on Google cluster servers ran significantly faster and more efficiently when the data it used was stored close by rather than in a remote location. The two tested Gmail, search and other applications running in a warehouse-sized Google server installation, then compared those results with tests on similar servers running in isolation rather than as part of a cloud.
They found that, in large clusters, long distances between server and data caused apps to run more slowly because individual processes had to wait longer for data they requested to arrive in cache where it could be processed.
"It's an issue of distance between execution and data," Tang said in an announcement from UC San Diego.
By testing the apps in isolation, where it was easier to identify confounding factors that might have come from other servers, other applications or the network, the two discovered that competition for computing resources within the server--especially competition for space in the CPU cache--also plays a major role. Distance, however, remains the primary factor affecting efficiency.
On multicore systems, applications running on one core will run more slowly if they have to access data accessible through controllers running on another core, they found. Loading the data into RAM, as most applications do, makes it available to apps running on any core. However, applications will still show more latency when they have to use data controlled by software running on a different core.
Part of the reason appears to be latency due to distance, even though the distance between processor and data is almost negligible; part is competition for space on the bus connecting the cores and for space in the cache, the two found.
Most of the physical servers on which the cloud is built use Non-Uniform Memory Access (NUMA), an architecture designed to allow efficient multiprocessing using servers with multiple cores or clusters with many servers.
Under NUMA, a processor can access its own on-board memory, or the memory in its own server, more quickly than it can access memory on another chip or another server. NUMA compensates for that lag by allowing the processor core to switch tasks to favor threads running on local memory while it waits for responses from tasks using more distant memory.
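The gap between local and remote access described above can be illustrated with a simple weighted average. The sketch below is a toy model, not a measurement: the 100 ns local and 150 ns remote latencies are assumed values picked for illustration, and average_latency_ns is a hypothetical helper that does not come from the researchers' work.

```python
def average_latency_ns(local_fraction, local_ns=100.0, remote_ns=150.0):
    """Expected memory latency when a given fraction of accesses is node-local.

    local_ns and remote_ns are assumed, illustrative figures, not measured ones.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    # Weighted average of the local and remote access costs.
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.0):
        print(f"{fraction:.0%} local -> {average_latency_ns(fraction):.0f} ns")
```

Under these assumed numbers, every access that moves from local to remote memory raises the average, which is the effect the task switching in NUMA tries to hide rather than remove.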
That makes multiprocessing possible in x86-based servers, but doesn't compensate for the greater distances involved in cloud-based computing. The greater the distance between executable code in active memory and the processor running it, the greater the lag time, Tang and Mars found.
The lag for applications using memory on another core of their own server is almost unnoticeable; applications using memory or processors in another data center, at the far end of I/O buses, Ethernet switches and WAN connections, will almost always run more slowly.
The only exception is when all the threads running on a processor are local and have to fight for space in the processor's memory cache. Latency from those collisions can exceed the lag that distance alone would cause if the application were instead designed to spread its threads among many processors and memory caches to reduce the level of conflict, the report states.
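The trade-off in that exception, cache contention on one side and distance on the other, can be sketched with a toy cost model. Everything here is an assumption made for illustration: the function thread_latency_ns is hypothetical, the penalty values are invented, and treating contention as growing linearly with the number of threads sharing a cache is a simplification, not a result from the report.

```python
def thread_latency_ns(threads_per_cache, remote_fraction,
                      base_ns=100.0, contention_ns=20.0, remote_ns=50.0):
    """Toy per-access latency for one thread.

    contention_ns: assumed extra cost per additional thread sharing the cache.
    remote_ns: assumed extra cost of an access that goes to remote memory.
    """
    contention = contention_ns * max(threads_per_cache - 1, 0)
    distance = remote_ns * remote_fraction
    return base_ns + contention + distance


# All threads packed onto one processor: no remote accesses, heavy cache sharing.
packed = thread_latency_ns(threads_per_cache=8, remote_fraction=0.0)

# Threads spread out: less cache sharing, but most accesses are now remote.
spread = thread_latency_ns(threads_per_cache=2, remote_fraction=0.75)

if __name__ == "__main__":
    print(f"packed: {packed} ns, spread: {spread} ns")
```

With these invented penalties the packed placement comes out slower than the spread one, matching the exception described above; with a smaller contention penalty the comparison would flip back in favor of keeping threads local.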
Working from their results, Mars and Tang created a metric called the NUMA Score that measures the amount of dispersion and potential for added latency in applications running on cloud or multicore systems. Keeping the NUMA Score within the right parameters can improve efficiency and speed by 15% to 20%, Mars said.
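The article does not give the formula behind the NUMA Score, so the sketch below is not the researchers' metric. It is one hypothetical way to turn "dispersion" into a single number: weight each pairing of a CPU node and a memory node by how much of the application's execution and data sit there, and discount pairings that are farther apart. The function name locality_score, the two-node layout, and the distance values (10 for local, 20 for remote, in the style of the node distances that Linux tools such as numactl report) are all assumptions.

```python
def locality_score(cpu_share, mem_share, distance):
    """Hypothetical locality score between 0 and 1; 1.0 means fully local.

    cpu_share[i]: fraction of the application's CPU time spent on node i.
    mem_share[j]: fraction of the application's memory resident on node j.
    distance[i][j]: relative access cost from node i to memory on node j.
    """
    score = 0.0
    for i, cpu in enumerate(cpu_share):
        for j, mem in enumerate(mem_share):
            # Local pairs contribute fully; remote pairs are discounted
            # by the ratio of local distance to actual distance.
            score += cpu * mem * (distance[i][i] / distance[i][j])
    return score


# Assumed two-node machine: local distance 10, remote distance 20.
DISTANCE = [[10, 20],
            [20, 10]]

if __name__ == "__main__":
    print(locality_score([1.0, 0.0], [1.0, 0.0], DISTANCE))  # code and data together
    print(locality_score([1.0, 0.0], [0.0, 1.0], DISTANCE))  # code and data apart
    print(locality_score([0.5, 0.5], [0.5, 0.5], DISTANCE))  # both spread evenly
```

A score like this only summarizes how well execution and data line up; it says nothing about which specific process or data pool would need to move.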
The score only measures how efficiently servers, processors and data are located; it doesn't map a cloud installation in detail to show which servers are using which pools of data or the ideal location to which applications, servers or databases must be moved to make them run as efficiently--as quickly--as possible.
Moving the data, applications or physical systems across warehouse-sized data centers--or even knowing for sure which applications are accessing which pools of data and when--could be trickier than measuring the end results, however. Tang and Mars presented their findings at an IEEE meeting in China last month and will present them again at UCSD's Research Expo on April 18.