IBM: Give Frequency Its Due
While the likes of Intel and Sun are emphasizing the multi-threading capabilities of their newest lines of server processors over clock speed, IBM continues to push the envelope on sheer
February 13, 2006
SAN FRANCISCO — In an age of multithreaded multicores, is frequency irrelevant? Intel Corp. seemed to answer in the affirmative when it announced a "right-hand turn" in architectural philosophy five years ago. But in a Power microprocessor being readied for a line of servers, IBM Corp. makes a case for pushing performance by pushing frequency.
Intel group vice president Pat Gelsinger announced Intel's right-hand turn at the 2001 International Solid-State Circuits Conference, saying the company would keep power consumption flat by de-emphasizing megahertz and moving to dual-core designs with multiple threads. Sun Microsystems Inc. has taken a similar tack.
But at last week's ISSCC here, IBM threw down the gauntlet on frequency. IBM design engineers delivered three papers describing the pending Power6 microprocessor, aimed at the company's own pSeries servers. While the currently shipping, dual-core Power5+ design is in the 1.9-GHz range (and heading higher soon) on a 90-nanometer process, the 65-nm Power6 will debut in the 4- to 5-GHz range when servers begin shipping next year, said Mark Papermaster, IBM's vice president of technology development.
"We don't want to be blind on frequency," Papermaster said. "Otherwise, you will go right out of the thermal envelope of the data centers. But there still is a relationship between frequency and performance."
IBM's "megahertz burst" comes at a time when Intel has "taken its foot off the gas," said Rick Doherty, director of The Envisioneering Group (Seaford, N.Y.). "Intel has thrown in the towel," Doherty said. "They've stopped trying to get past 3.5 GHz."
Sun, too, has de-emphasized frequency as it has embraced multithreading. IBM's Power6 will come in single- and dual-core versions, with two threads per core; Sun's Niagara processor, shipping now, has four threads per processor core, for a total of 32, and runs at 1.2 GHz. Ana-Sonia Leon, director of technology at Sun, called Niagara "a very shallow-pipeline, single-issue, in-order processor" that consumes 63 watts total--"less than 2 W per thread."
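Leon's per-thread figure follows from simple division. A quick check in Python, using only the Niagara numbers quoted above (the variable names are illustrative):

```python
# Figures quoted in the article for Sun's Niagara.
THREADS_PER_CORE = 4
TOTAL_THREADS = 32
TOTAL_POWER_W = 63.0

# Core count implied by the thread figures.
cores = TOTAL_THREADS // THREADS_PER_CORE

# Average chip power attributed to each hardware thread.
watts_per_thread = TOTAL_POWER_W / TOTAL_THREADS

print(f"cores: {cores}")
print(f"power per thread: {watts_per_thread:.2f} W")
```

The result is just under 2 W per thread, consistent with Leon's claim.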
"We don't get performance out of frequency," Leon said. "We use bandwidth [more threads] and get performance scaling with a larger number of threads. We are very confident this is the way to go."
Kevin Krewell, editor-in-chief of the Microprocessor Report, said Sun's multi-threading approach "works well when there is a big pool of threads that needs servicing. Sun can parallelize the cores in Niagara. But IBM has customers with large data sets, and that is an area where they will compete with Sun's Rock, the next big Ultrasparc design."
Brad McCredie, an IBM fellow and lead engineer on the Power6, said IBM used a 13-stage pipeline--seven stages for the floating-point unit, six for integer operations--as it did in the Power5. But work that required 22 "fan-out of four" logic stages (in which one inverter drives four others) on the Power5 can be accomplished with 13 FO4s now.
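The FO4 figures can be tied to clock frequency with a first-order model in which cycle time equals the FO4 count multiplied by the delay of a single FO4 inverter. The sketch below is a back-of-envelope check, not IBM's analysis; the function name, and the assumption that the FO4 count alone sets cycle time, are illustrative.

```python
def speedup_from_fo4(old_fo4: float, new_fo4: float,
                     fo4_delay_ratio: float = 1.0) -> float:
    """Return the new/old frequency ratio under cycle_time = fo4_count * fo4_delay.

    fo4_delay_ratio is (new FO4 delay) / (old FO4 delay); 1.0 means the
    process itself got no faster, so only the logic restructuring counts.
    """
    return (old_fo4 / new_fo4) / fo4_delay_ratio


# Gain from cutting 22 FO4s to 13 FO4s, with the FO4 delay held constant.
logic_only = speedup_from_fo4(22, 13)
print(f"frequency gain from logic restructuring alone: {logic_only:.2f}x")
```

Under this model the cut from 22 to 13 FO4s accounts for roughly a 1.69x gain by itself; any further gain would have to come from a faster FO4 delay on the 65-nm process.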
The efficiency boost was accomplished by "moving the logic around," McCredie said. "We doubled the frequency and held the pipe depth the same, getting a stage of logic to do more. The goal was to get more logical functions per transistor. If we did not, then we would have had to blow out the pipeline stages. Some companies have gone to 20, 30, 40 pipeline stages, and that's a death spiral."
IBM does not dispute that higher frequencies can mean higher power consumption. "The real name of the game [in conserving power] is to monitor your transistor count," McCredie said, adding that "all transistors are not equal." For example, "caches are such regular structures that the threshold voltages in the caches can be higher. You can do stuff so the cache does not hit you as much as logic in terms of power."
The dual-core Power6 has 750 million transistors, nearly a billion fewer than Intel's Montecito version of Itanium, said Joel Tendler, technology assessment program manager at IBM.
McCredie said the Power6 has a second memory controller on-chip to double the memory bandwidth. "We took great pains to make sure the bandwidth scales with the processor," he said. "Once you give up on something, you're dead. Power matters. Frequency matters. It all matters."
Tendler declined to reveal the cache sizes, operating-voltage range or power consumption, saying that what matters to IBM's customers is power at the server level, not at the chip level. That led some engineers at ISSCC to argue that disclosing frequency without relating it to chip-level power consumption is--as one Intel engineer put it--"meaningless."
Some technical details of the processor did emerge at ISSCC. Brian Curran, Power6 circuits leader, said the binary floating-point unit (BFU) uses high-threshold-voltage transistors to reduce leakage. It consumes 310 mW at 1.1 V, running at 4 GHz. The BFU employs 54 FO4 logic stages, against 91 for the BFU on the Power5, using the same pipeline, instructions per cycle and latch cycle overhead. The integer execution unit requires 78 FO4 stages and consumes 160 mW at 1.1 V and 4 GHz.
Curran said IBM minimized the use of dynamic logic to conserve power. Also, "we designed the circuits to do more than one function, and each circuit does more work. We combined that with low latch delay overhead; a lower-latency design drives to a higher frequency."
IBM engineers in the lab have demonstrated a Power6 with 5.1-GHz operation from a power supply of 1.3 V, Curran said.
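As a rough guide to what moving from 4 GHz at 1.1 V to 5.1 GHz at 1.3 V could cost, the textbook first-order model has dynamic power scaling with the square of supply voltage times frequency. The sketch below applies that model to the two operating points quoted above. It ignores leakage, assumes switched capacitance and activity are unchanged, and is not a figure IBM disclosed.

```python
def dynamic_power_ratio(v_old: float, f_old: float,
                        v_new: float, f_new: float) -> float:
    """First-order dynamic power scaling: P is proportional to V^2 * f.

    Assumes switched capacitance and activity factor stay the same;
    leakage power is not modeled.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Operating points quoted in the article: 1.1 V at 4 GHz, 1.3 V at 5.1 GHz.
ratio = dynamic_power_ratio(1.1, 4.0, 1.3, 5.1)
print(f"estimated dynamic power increase: {ratio:.2f}x")
```

Under these assumptions, the 5.1-GHz lab operating point would draw on the order of 1.8 times the dynamic power of the 4-GHz point.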
Sam Naffziger, an Intel fellow and director of Itanium circuits and technology, said IBM has "borrowed" ideas first used at Intel, such as extensive use of pulsed latches rather than the master/slave-type flip-flops that he said were used in earlier IBM and Intel designs. "The pulse-based latches have half the overhead of the master/slave latches, and that can save you one or two FO4s per stage," he said.
IBM "did a good job of adjusting the circuits to get more frequency out of the same pipeline," he added. "Frequency does have a place. If you can keep the same pipeline depth and power consumption, then higher frequency will improve performance."
Naffziger asserted that engineers "are still working on frequency at Intel," adding that the Tukwila version of Itanium, the follow-on to Montecito, will have "more cores at higher frequency."
Shekhar Borkar, an Intel fellow and director of microprocessor research, said Intel determined as early as 1999 "that a low FO4 at higher frequency was not power-efficient. We started the turn [away from higher frequency and power consumption] with the Centrino processor at 1.6 GHz. They [IBM] are going backward."
At ISSCC, Intel engineer Stefan Rusu described Intel's dual-core Tulsa, a 65-nm-based Xeon server processor that runs at 3.4 GHz and consumes 150 W. Rusu said Tulsa has a 16-Mbyte Level 3 cache, for a total of 1.328 billion transistors, which he said is "the highest number of transistors yet reported for an X86 design."
Rusu said Intel did not use multiple transistor threshold voltages in the cache--a common power-saving technique--for Tulsa; instead, it made "massive use of longer-channel-length transistors." The transistors run slower but incur 3x less leakage, he said.
McCredie said IBM employed three threshold voltages and tuned the channel lengths in the Power6 to achieve a trade-off between leakage and performance.