No matter how much hardware and software we throw at data center computing, we run up against a hard fact: we are limited to the resources on a single piece of server hardware. Sure, we can virtualize the PCI bus and share peripherals among multiple servers with virtual IO, and that has its use cases. But Non-Uniform Memory Access (NUMA) is used in high-performance computing to aggregate fundamental computer resources like CPU, RAM and IO across a high-speed bus. NUMA has the capability of virtualizing the hardware itself, and I think it's going to be the hot thing within two years.
I first learned about NUMA in a reader comment on a product review Joe Hernick performed on Liquid Computing's LiquidIQ. (Liquid Computing has since closed its doors.) NUMA systems interconnect resources intelligently, allocating resources like CPU and memory close together on the same local board and reaching out over the bus (a crossbar) only when needed. The process is similar to how your OS uses virtual memory, keeping frequently used blocks in high-speed RAM and less frequently used blocks on disk. Of course, accessing RAM over a crossbar is orders of magnitude faster than paging from disk, but the idea is similar. NUMA is used in cases where data processing occurs over gigabytes or terabytes of data and going to disk, even to a high-speed SAN, is far too slow.
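The locality-first policy described above can be sketched in a few lines. This is a toy model, not a real OS or fabric API -- the `Node` class and `allocate` function are illustrative names I made up -- but it shows the idea: satisfy an allocation from the local node when possible, and spill over the interconnect only when you must.

```python
# Toy sketch of NUMA-style locality-aware allocation (illustrative names,
# not a real OS API): prefer memory on the node where the task is running,
# and fall back to a remote node over the crossbar only when the local
# node cannot satisfy the request.

class Node:
    def __init__(self, node_id, mem_gb):
        self.node_id = node_id
        self.free_gb = mem_gb

def allocate(nodes, local_id, request_gb):
    """Return the id of the node that satisfies the request, local node first."""
    # Sort so the local node is tried first; False sorts before True.
    ordered = sorted(nodes, key=lambda n: n.node_id != local_id)
    for node in ordered:
        if node.free_gb >= request_gb:
            node.free_gb -= request_gb
            return node.node_id
    raise MemoryError("no node can satisfy the request")

nodes = [Node(0, 64), Node(1, 64)]
print(allocate(nodes, local_id=0, request_gb=48))  # 0: served locally
print(allocate(nodes, local_id=0, request_gb=48))  # 1: local node full, goes remote
```

The second request spills to node 1 because node 0 has only 16 GB left -- the crossbar hop the paragraph describes, which a real system pays for in latency.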
While NUMA is used primarily in research applications, it's going to squeeze out into general IT. Why? Because, as Jayshree (who needs no last name) said a few days ago during an interview, today's virtualized data center looks a lot more like a high-performance computing data center, with high-volume, high-capacity, high-IO computing systems, than like racks of discrete servers doing discrete things. Soon, IT will start bumping into CPU and RAM limitations that adding new discrete servers won't solve well. Jake McTique has written about increasing performance in VMware's vSphere.
If you think NUMA is crazy talk, consider what happens when you can't fit your CPU, RAM and IO requirements within a single server. Even a modest requirement like an eight-core processor means you need a server that has eight cores. How many of those do you have lying around? On a NUMA-based system, that constraint is removed. Need eight cores? Take them from the available pool. Need 128 GB of RAM but your servers have 64 GB each? Pool the RAM and you're good to go. You stop thinking in terms of discrete servers and start thinking in terms of processing requirements: I need eight cores, 192 GB of RAM and 12 GB of IO.
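The "take it from the pool" idea above can be sketched as a toy scheduler. Again, the names (`reserve`, the board tuples) are hypothetical, but the mechanics match the paragraph: a single request for cores and RAM is satisfied by spanning however many boards it takes, rather than hunting for one server big enough.

```python
# Toy sketch (illustrative names, not a real fabric API): satisfy one
# "I need N cores and M GB of RAM" request from a pool of boards,
# spilling across board boundaries the way a pooled NUMA system would.

def reserve(boards, need_cores, need_ram_gb):
    """Greedily claim cores and RAM across boards.

    boards is a list of (cores, ram_gb) tuples; returns a list of
    (board_index, cores_taken, ram_gb_taken) claims.
    """
    claims = []
    for i, (cores, ram) in enumerate(boards):
        take_cores = min(cores, need_cores)
        take_ram = min(ram, need_ram_gb)
        if take_cores or take_ram:
            claims.append((i, take_cores, take_ram))
            need_cores -= take_cores
            need_ram_gb -= take_ram
        if need_cores == 0 and need_ram_gb == 0:
            return claims
    raise RuntimeError("pool cannot satisfy the request")

# Four-core, 64 GB boards: an 8-core / 128 GB job spans two of them.
boards = [(4, 64), (4, 64), (4, 64)]
print(reserve(boards, need_cores=8, need_ram_gb=128))
# [(0, 4, 64), (1, 4, 64)]
```

No single board here can run the job, but the pool can -- which is exactly the shift from "find a big enough server" to "state your processing requirements."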
I know I am glossing over lots and lots of details -- details that are not trivial -- and frankly, I don't even claim to fully understand them. I'm not even sure I know the right questions to ask. For example, what are the OS requirements for NUMA? How does the system schedule and allocate resources? How does a NUMA system change application development? Or does it? Remember, it took a long time for applications to take advantage of multi-threading and multi-core architectures. Will a NUMA-optimized application require a special software architecture, or will the NUMA system just magically handle the details? Is it fault tolerant? If so, what is the fault tolerance? And so on.