OoOE

You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.

At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.

The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU’s fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they’re not already in local registers) and the combination of instruction + operands is issued for execution. The results are committed to memory (registers/cache/DRAM) and it’s on to the next one.

In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline are dependent on having the right operands immediately available. For a number of reasons, this isn’t always possible. Operands could depend on other earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor’s pipeline and the machine’s overall efficiency drops as no work is being done until those operands are available.

Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in-order, but in an OoO architecture instructions can be executed out-of-order to improve overall utilization of execution resources.
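
To make the difference concrete, here is a toy sketch in Python rather than a model of any real core. It assumes a single-issue machine, registers that are each written only once (so no renaming is needed), and made-up latencies; the names `Instr`, `simulate` and `program` are purely illustrative. It also ignores in-order retirement and only tracks when results become available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int  # cycles from issue until the result is available


def simulate(program, out_of_order):
    """Return the cycle at which the last result becomes available.

    At most one instruction issues per cycle. In-order issue only ever
    considers the oldest unissued instruction; out-of-order issue picks
    the oldest unissued instruction whose source operands are ready.
    """
    # Registers written by the program start out "not ready"; any other
    # register is treated as an input that is available from cycle 0.
    ready_at = {instr.dest: float("inf") for instr in program}
    pending = list(program)  # unissued instructions, in program order
    cycle = 0
    finish = 0
    while pending:
        candidates = pending if out_of_order else pending[:1]
        for instr in candidates:
            if all(ready_at.get(reg, 0) <= cycle for reg in instr.srcs):
                ready_at[instr.dest] = cycle + instr.latency
                finish = max(finish, ready_at[instr.dest])
                pending.remove(instr)
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return finish


# Hypothetical program: a slow load, a dependent add, then independent work.
program = [
    Instr("load", "r1", ("r0",), latency=10),     # e.g. a cache miss
    Instr("add", "r2", ("r1", "r0"), latency=1),  # waits on the load
    Instr("mul", "r3", ("r4", "r5"), latency=3),  # independent of the load
    Instr("add2", "r6", ("r3", "r4"), latency=1), # waits on the mul
]

print("in-order cycles:    ", simulate(program, out_of_order=False))
print("out-of-order cycles:", simulate(program, out_of_order=True))
```

With these assumed latencies the in-order run finishes at cycle 15 and the out-of-order run at cycle 11, because the multiply and its dependent add slip in underneath the stalled load instead of waiting behind it.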

The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to go up and smaller, lower power transistors became available, all of the players here started introducing OoO variants of their architectures. Although often referred to as out-of-order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are only mildly OoO compared to Cortex A15. Intel’s Silvermont joins the ranks of the Cortex A15 as a fully out-of-order design by modern day standards. The move to OoO alone should be good for around a 30% increase in single threaded performance vs. Bonnell.

Pipeline

Silvermont changes the Atom pipeline slightly. Bonnell featured a 16 stage in-order pipeline. One side effect of the design was that all operations, including those that didn’t need cache accesses (e.g. operations whose operands were in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don’t need data from memory, effectively shortening the mispredict penalty from 13 stages down to 10. The integer pipeline depth now varies depending on the type of instruction, but you’re looking at a range of 14 - 17 stages.

Branch prediction, a staple of any progressive microprocessor architecture, improves tremendously with Silvermont. Silvermont takes the gshare branch predictor of Bonnell and significantly increases the size of all associated data structures. It also adds an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.
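
For readers who haven't met gshare before, the textbook version is easy to sketch: XOR the branch address with a global history of recent outcomes, and use the result to index a table of 2-bit saturating counters. The Python below is that generic scheme only. The table size, the initial counter value and the names `GShare` and `run` are assumptions for illustration; Intel hasn't disclosed the actual structure sizes in Bonnell or Silvermont.

```python
class GShare:
    """Textbook gshare predictor (sizes here are illustrative assumptions)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # 2-bit saturating counters: 0-1 predict not-taken, 2-3 predict taken.
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # global history of recent branch outcomes

    def _index(self, pc):
        # The defining gshare step: XOR the branch address with the history.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run(predictor, pc, outcomes):
    """Predict then update for each outcome; return per-branch correctness."""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results


# A branch that alternates taken / not-taken, at a hypothetical address.
outcomes = [i % 2 == 0 for i in range(200)]
results = run(GShare(), pc=0x40, outcomes=outcomes)
print("correct in last 100 branches:", sum(results[-100:]))
```

A single 2-bit counter on its own never gets an alternating branch right more than about half the time. Because the history steers the two cases to different counters, this sketch predicts the pattern perfectly once it has warmed up. Larger tables, which is the direction Silvermont went, mainly reduce the chance of unrelated branches colliding on the same counter.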

Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.
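
A quick back-of-the-envelope model shows how those two factors combine. Only the 13 and 10 stage penalties come from the figures above; the base CPI, branch frequency and mispredict rates are made-up placeholders, and each penalty stage is assumed to cost one cycle.

```python
def cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction, including branch mispredict stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: CPI of 1.0 without mispredicts, 20% of instructions
# are branches. The mispredict rates (5% and 4%) are illustrative only.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20

old_cpi = cpi(BASE_CPI, BRANCH_FRACTION, mispredict_rate=0.05, penalty_cycles=13)
new_cpi = cpi(BASE_CPI, BRANCH_FRACTION, mispredict_rate=0.04, penalty_cycles=10)

# IPC is the reciprocal of CPI, so the IPC ratio is old_cpi / new_cpi.
ipc_gain = old_cpi / new_cpi - 1.0
print(f"IPC gain from branch handling alone: {ipc_gain:.1%}")
```

With these placeholder inputs the gain lands a little under 5%. The point is not the exact figure but that a shorter penalty and a lower mispredict rate multiply together, so modest improvements in each add up.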


174 Comments

  • Hector2 - Friday, May 17, 2013

    There are only 3 companies right now left in the world who have the muscle and volume to afford high tech fabs -- Intel, Samsung & TSMC. And Intel has about a 2 year lead. That means not just higher performance and lower power than before, but lower cost. Making the chips smaller multiplies the number of chips on a single, fixed-cost wafer and lowers costs. If the chip area is 1/2, the costs to make it are about 1/2 as well. 22nm tech gives Intel faster chips with less power than their competition. 14nm hits it out of the park.
  • BMNify - Wednesday, June 5, 2013

    You're absolutely wrong about "lower cost". x86 requires more die area. The process is more volatile (more failed wafers).

    If we combine the 2 above factors with better performance, lower power consumption and toss in a lack of experience we get GT3e. A technological marvel that few (OEMs) want.
  • BMNify - Wednesday, June 5, 2013

    Spot on Krysto - It's Intel's process advantage that is shining through. Soon they'll hit the point of diminishing returns and/or the rest of the market will catch up/get close enough. When I see AMD at 32nm (Richland) having lower power draw at idle than Intel at 22nm (Ivy Bridge) I wonder how special their "secret sauce" actually is.

    How long can Intel loss-lead? Probably as long as Xeon continues to make up for it but ARM is getting into the server market now too (looking forward to AMD and Calxeda ARM SoCs for the server market). Should be interesting in 3-5 years.
  • TheinsanegamerN - Monday, August 26, 2013

    only issue, though, is when you put that richland chip under load. all of a sudden, intel is using much less power.
  • t.s. - Monday, May 6, 2013

    "The mobile market is far more competitive than the PC industry was back when Conroe hit. There isn’t just one AMD, but many competitors in the SoC space that are already very lean fast moving. There’s also the fact that Intel doesn’t have tremendous marketshare in ultra mobile."

    Well, with their 'strategy' back then when facing AMD (http://news.bbc.co.uk/2/hi/8047546.stm), they surely'll win. :p
  • nunomoreira10 - Monday, May 6, 2013

    It's kinda suspicious that there are many comparisons against ARM but none against AMD Jaguar or even Bobcat.
    Jaguar will probably be a much better tablet CPU and GPU, while Intel competes on the phone market.
  • Khato - Monday, May 6, 2013

    Which AMD Jaguar/Bobcat SKU runs at 1.5 watts? They aren't included in the comparison because they're a markedly higher power level.
  • nunomoreira10 - Monday, May 6, 2013

    they will both be used on fan-less tablet designs...
  • extide - Tuesday, May 7, 2013

    Totally different markets. Jaguar/Bobcat will likely line up next to low end Core/Haswell, not an Atom/Silvermont
  • Penti - Tuesday, May 7, 2013

    Both will sadly be way too underpowered when it comes to the GPU, and that matters greatly on general OS's and applications like running a desktop OS X or Windows (or GNU/Linux) machine. You won't really be able to game on them at all as it's not smartphone games people want to run. GPGPU won't really be fast enough for anything and we talk about ~100-200 GFLOPs GPU-power on the AMD side for what is essentially a full blown computer.

    Intel is clearly targeting the phone market, something AMD/ATI divested from years back when their mobile GPU tech went to Qualcomm (Adreno, which isn't Radeon-based) and Broadcom. Before that, ATI's/AMD's mobile GPU tech was licensed to or used together with the likes of Intel (PXA/XScale – not integrated though), Samsung and Freescale, among others. Their technology is already the mainstay of the mobile business and has departed from the company; in effect their technology know-how was successful in the market without their leadership, so why would they compete with that? Of course they wouldn't.

    AMD simply has not invested, and likely will not any time soon, in an alternate route to dominate their own part of the smartphone/ARM-tablet market, while Intel has, with integrated designs replacing the custom ARMv5TE design. AMD going after ARM business is different, since they will license the core, and their manufacturer GloFo already manufactures and even offers hard macros for ARM designs, a bunch of which they already sell to other customers. AMD is also going after other embedded fields and the emerging ARM-server/appliance space, all without designing custom cores.

    While PXA (Intel) was quite successful in the market, moving to x86 and doing away with stuff like ARM-based network processors and RAID processors allows Intel to focus on delivering great support for a modern ISA across all sorts of devices. While it didn't make it into phones (until lately) like PXA, which continued to power BlackBerrys under Marvell and was the main Windows Mobile platform for years after Intel's departure, it was able to become a multimedia platform and a widely adopted chip for embedded use, driving NAS devices and the like. Thanks to Intel's purchase of Infineon's wireless portfolio, including many popular 3G radios/modems, and the forming of a new wireless division, their actual business and sales in the mobile market are also much higher than when they still had their custom PXA/XScale lineup. Plus, they couldn't have competed with their XScale lineup without designing new ARM-ISA compatible cores/designs to be able to match Cortex A8, A9, A7, A15, Krait 600 etc. It also puts them in a much better place to be a wireless/terminal supplier when they can support customers who want advanced wireless modems/baseband, application processors, Bluetooth, Wi-Fi etc. While Nvidia will have Tegra 4i with an integrated modem, AMD couldn't offer anything similar, as they have no team capable of producing radio baseband. Having modern compilers and the x86 ISA sure makes it convenient now for Intel, as does integrating their own GPU; just licensing ARM Ltd designs wouldn't have put them in a better position to continue their presence in the mobile field. They have basically developed and scaled their desktop GNU/Linux drivers in the Linux kernel, added mobile features and so on years before they shipped the hardware, and can leverage that software in mobile platforms (Android); it makes sense, and they don't have to rely on IP cores and third-party drivers for graphics with the coming Bay Trail. They couldn't have shared that much tech if they were anything other than x86.
    Of course AMD won't be in the same place, and scaling down a GPU designed for thousands of stream processors and Windows/OS X drivers to put it into phones is not the same. It would be awful if it were just scaled down to fit the power usage; even though Nvidia has a kinda custom mobile GPU, it's still worse than the competitors, which have no presence in desktop computing. Drivers for QNX, Android/Linux, iOS etc. are not the same as with Windows either. It takes a long time to start over when they did away with an okay solution (z460), and they haven't, but others have, and that's fine; there is more competition here than elsewhere. x86 is no stopper for Intel.
