Based on NetLogic's NXCPU cores, which use a 64-bit MIPS architecture, the XLP900 achieves what Broadcom likens to supercomputer performance: more than 1 trillion operations per second and packet processing throughput of 160 Gbps. This is made possible by a 28nm process node that packs more than 5 billion transistors onto a single chip. That density allows for 20 CPU cores, each a superscalar module supporting four-way instruction issue, four simultaneous threads (better known by the Intel label, hyper-threading) and advanced out-of-order execution, along with a three-level cache and hardware virtualization. With four threads on each of 20 cores, the chip effectively provides 80 independent processing cores.
But that's just the computational guts. The XLP900 also includes a host of autonomous (that is, not requiring CPU intervention) hardware accelerators for deep packet inspection (DPI), RAID, deduplication, compression, RSA cryptography, packet processing and I/O acceleration for third-generation protocols including PCIe (16 lanes), SATA and USB.
A system on a chip (SoC) with this many modules needs a high-bandwidth, low-latency highway for passing data on chip, so they are all connected by an intrachip messaging network that O'Reilly calls a ring-of-rings 2D torus (for the curious, this PDF describes the technique in detail) with more than 2 Tbps of bandwidth. If the performance of a single device isn't up to the task, up to eight XLP900s can be interconnected in a low-latency grid network, with full cache coherency and inter-processor interrupts (ICI) to create systems with up to 640 virtualized cores and 1.28 Tbps of networking performance.
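The scaling figures above follow from simple multiplication, which the short Python sketch below reproduces. It uses only the numbers quoted in this article; the function and constant names are illustrative, and the throughput calculation assumes perfectly linear scaling across chips, which is an idealization rather than a measured result.

```python
# Figures quoted in the article for a single XLP900.
CORES_PER_CHIP = 20
THREADS_PER_CORE = 4
THROUGHPUT_GBPS_PER_CHIP = 160
MAX_CHIPS = 8


def virtual_cores(chips: int) -> int:
    """Hardware threads ("virtualized cores") across `chips` coherent devices."""
    return chips * CORES_PER_CHIP * THREADS_PER_CORE


def aggregate_throughput_gbps(chips: int) -> int:
    """Peak packet throughput, assuming ideal linear scaling (an assumption)."""
    return chips * THROUGHPUT_GBPS_PER_CHIP


if __name__ == "__main__":
    for chips in (1, MAX_CHIPS):
        print(
            f"{chips} chip(s): {virtual_cores(chips)} virtual cores, "
            f"{aggregate_throughput_gbps(chips) / 1000:.2f} Tbps"
        )
```

One chip yields 80 virtual cores at 160 Gbps; eight chips yield the 640 virtualized cores and 1.28 Tbps cited above.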
With specs like these, don't expect to find an XLP900 in your next UTM appliance or SOHO NAS box, unlike Cavium's recently announced OCTEON III, which uses a similar 28nm process. For those uses, O'Reilly said, the chip's little brothers--the single- and dual-CPU 200 series--are better fits; the XLP900 belongs in carrier networks and the data center core.
Likely targets for the chip, O'Reilly said, include high-density LTE base stations, 10- or 40-Gigabit Ethernet security appliances, line cards for carrier-class or data center core routers and switches, or even controllers for high-performance solid-state storage systems. Indeed, security appliances--which struggle to keep up with Ethernet line speeds because of the vast computational load of disassembling, scanning and pattern matching traffic--seem a particularly promising market. The XLP900's programmable DPI engine, with a dedicated local cache for content processing and a separate grammar-parsing unit for protocol recognition and application identification, can achieve 40-Gbps DPI throughput without loading the CPU cores, according to Broadcom.
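To see why inspecting traffic at line rate is so demanding, it helps to work out how little time each packet gets. The sketch below is a back-of-the-envelope calculation, not anything supplied by Broadcom: it assumes the worst case of minimum-size Ethernet frames and uses the standard framing overheads (64-byte minimum frame, 8-byte preamble, 12-byte interframe gap). The function names are illustrative.

```python
# Standard Ethernet framing constants (assumptions; not from the article).
MIN_FRAME_BYTES = 64        # minimum frame size, including the frame check sequence
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle period between frames


def packets_per_second(link_gbps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Maximum frame rate on a link running at `link_gbps` gigabits per second."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_gbps * 1e9 / wire_bits


def nanoseconds_per_packet(link_gbps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Time available to process each frame if the device must keep up with the link."""
    return 1e9 / packets_per_second(link_gbps, frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 40):
        print(
            f"{gbps} Gbps: {packets_per_second(gbps) / 1e6:.2f} Mpps, "
            f"{nanoseconds_per_packet(gbps):.1f} ns per packet"
        )
```

Under these assumptions a 40-Gbps link can deliver roughly 59.5 million minimum-size packets per second, leaving about 16.8 ns per packet. That gives a sense of why offloading inspection to dedicated hardware is attractive.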
The DPI accelerator is not the chip's only edge: it also has a significant advantage in raw performance, by factors of two to five whether measured in operations per second or data throughput, O'Reilly said. The chip is currently sampling to OEMs, so expect to see some blazing new hardware in time for next year's Las Vegas Interop. Of course, by then we might all be wondering how we'll ever handle 400-Gigabit Ethernet.