One of the more esoteric elements of Intel’s Architecture Day 2020 came very near the end, where Intel spent a few minutes discussing what it believes is the future of some of its products. Brijesh Tripathi, VP and CTO of Intel’s Client Computing group, laid out a vision about the future of its client products in the 2024+ future timeframe. Centered around Intel’s 7+ manufacturing process, the goal was to enable ‘Client 2.0’ – a new way to deliver and enable immersive experiences through a more optimized silicon development strategy.

Chiplets aren’t new, especially with recent launches from Intel’s competitors, and as we move into more complex process node development, the era of chiplets enables faster time-to-market as well as better binning and yields for a given product. The key is enabling how those chiplets fit together, and at which points it makes sense to mix and match the relevant ones. Intel has spoken about this before in a more generalized context, at its Technology and Manufacturing Day 2017, as shown in the carousel image at the top.

The goal here is to mix and match which process nodes work best for different parts of the chip. Intel seems set to realize this vision starting with its 7nm platform. At Architecture Day 2020, Brijesh Tripathi showed this slide:

On the left is a typical chip design – monolithic with everything it requires. For Intel’s leading edge products, these take 3-4 years to develop, and bugs are found in silicon by both Intel initially and then later by Intel’s partners as they can ramp up the silicon-on time by a a few orders of magnitude.

In the middle is a basic chiplet layout, similar to that slide from 2017, where different functions of the die are split into their own modules. Assuming a consistent interconnect, there are some reuse of the silicon elements, such as AMD using the same core compute dies in client and server. For some semiconductor companies (except Intel), this is where we are.

On the right is where Intel sees its future. Instead of having a single digit number of chiplets in a product, it envisions a world where each IP can be split into multiple chiplets, enabling products to be built with different configurations of what works for the market. In this instance, a chiplet might be a PCIe 4.0 x16 link – if the product needs more, it simply adds in more of these chiplets. Same with memory channels, cores, media accelerators, AI accelerators, Ray Tracing engines, crypto accelerators, graphics, or even as far down as SRAM and caching blocks. The idea is that each IP can be split and then scaled. This means that the chiplets are tiny, can be built relatively quickly, and bugs should be ironed out very quickly.

In this diagram, we are treated to Intel’s long term vision for the client – a base interposer with an in-package memory (something like an L3 or L4) that can act as the main SRAM cache for the whole die, and then on top of this we get 24 different chiplets. Chiplets can be graphics, cores, AI, Media, IO, or anything else, but they can be mixed and matched based on what is needed. A content creator might want a balance between some good graphics acceleration and compute, while a gamer might want to focus purely on the graphics. A corporate client or workstation might need less graphics and more for compute and AI, whereas a mobile version of the chip will be heavily invested in IO.

As always, there is some trade-off between chiplet size and complexity of actually putting them together in a multi-die arrangement. Any communications between chiplets costs more power than a monolithic interpretation, and usually offer higher latency. Thermals have to be managed as well, and so sometimes those chiplets are limited by what thermal properties are available. Multi-die arrangements also cause headaches for mobile devices, where z-height is critical. However, the benefits afforded from using the right process at the right time for the right product are big, as it helps provide both performance and power at the best possible cost. It also gives the opportunity to bring in 3rd party IP quickly if something amazing hits the scene.

The only downside here is that Intel hasn’t spoken much about the glue that binds it all together. Chiplet strategies rely on complex high-speed interconnect protocols, custom or otherwise. Current uses of Intel’s die-to-die connectivity are either simply memory protocols or FPGA fabric extensions – the big ones for server CPUs like UPI aren’t necessarily up to the task. CXL could be the future here, however current CXL is built upon PCIe, which means a complex CXL/PCIe controller for every chiplet which will likely get power hungry fast.

Intel has stated that they are inventing new packaging technology and new levels of connectivity to act between the silicon - there is no disclosure on the protocols at this time, however Intel acknowledges that to get to this level of scale it will have to go beyond what the company has today, and that will require creating standards and innovation in this area. The goal is to create and support standards, and the first incarnation will have some standardization built in. Intel states that this is a method of extreme disaggregation, and to note that not everything that is connected has to be high bandwidth (such as USB) or a coherent interconnect - Intel sees the goal involving a handful of protocols throughout the spectrum.

There’s also the developer market, which might be used to a more homogeneous implementation of resources in any given product. Without careful planning, and relevant coding, there is the potential for certain chiplet configurations to fall over if the developer was expecting a certain ratio of compute to graphics, for example. This isn’t something that OneAPI could easily fix.

These are all issues that Intel will have to address, although they have a few years until this comes to fruition. We were told that the internal name is Client 2.0, although it will likely have more marketing dressing added as Intel starts talking about it in more detail.

Related Reading

 

POST A COMMENT

67 Comments

View All Comments

  • JayNor - Saturday, August 22, 2020 - link

    don't U.S. patents expire after 17 yrs? Reply
  • KAlmquist - Saturday, August 29, 2020 - link

    U.S. patents used to expire 17 years after being granted, but this was changed in the 1990's in order to comply with WTO rules. Patents in the U.S. now expire 20 years after being filed. Reply
  • JayNor - Saturday, August 22, 2020 - link

    Their fpgas already have a chiplet system. I can imagine them creating a XeMF chiplet and using the CXL based Xe Link communication from their Aurora node design. Reply
  • zamroni - Saturday, August 22, 2020 - link

    Chiplet is suitable for desktop and server where power consumption is not big concern.
    Even amd doesn't use chiplet for laptop processor
    Reply
  • JayNor - Sunday, August 23, 2020 - link

    Why is AMD's sprawling chiplet layout considered to be a technological achievement? It seems to me it is just a trade-off between performance and yield. If it solved both issues, shouldn't we be seeing 5.3GHz AMD chips? Reply
  • Spunjji - Monday, August 24, 2020 - link

    It *is* just a trade-off between performance and yield. It's still considered to be a technological achievement because:
    1) Despite the inherent compromises in such a design, it still outperforms its competition from a power/performance perspective.
    2) It puts them in a position where they've already trialled the tech and will be on their 2nd or 3rd generation before Intel hit their 1st.

    We aren't seeing 5.3Ghz AMD chips because their core design and/or the manufacturing process don't really support it.
    Reply
  • JayNor - Tuesday, August 25, 2020 - link

    I'm not so sure the power/performance can be claimed as a virtue of the chiplet design. In fact, the sprawling chiplets require longer traces with higher energy per bit than, for example, Intel's EMIB interconnections to FPGA chiplets.

    Yes, TSM's 7nm process uses less power than Intel's 14nm process, but I think it has nothing to do with their chiplets.

    I suspect we'll see that the power/performance differences have disappeared with Ice Lake Server, along with the PCIE4, 8 channel memory and process differences.
    Reply
  • PeachNCream - Sunday, August 23, 2020 - link

    Weird how this customization at the CPU package level is nearly identical to customization of compute technology to suit a specific role has been happening for very long. Need more CPU power, add more CPU. Need more graphics, add more GPU. Need more I/O, add more I/O. The only difference is that instead of replacing or adding an expansion card in some ISA/PCI/AGP/PCIe slot, you're just seeing it done by Intel at the fab before it reaches the end user. In a way, it's just moving the needle a little regarding who handles that system customization and only barely at that since the buyer is going to be the one picking whatever chiplet capabilities they get.

    The more things change, the more they stay the same.
    Reply
  • ender8282 - Sunday, August 23, 2020 - link

    "a gamer might want to focus purely on the graphics"

    Does Ian know something about Xe that we don't? I can't think of any gamer who would want Intel to focus purely on graphics.
    Reply
  • Duncan Macdonald - Sunday, August 23, 2020 - link

    A monolithic design like Intel's has a number of an advantage of faster internal communication but with a number of costs - yield, design effort, use of high performance technology to do slow jobs.
    The yield cost comes because the larger a chip, the fewer that are perfect - a wafer of AMD CPU chiplets with a 10% failure rate becomes a 40% failure rate with the larger Intel CPUs.
    The design effort comes from the fact that the whole chip has to be re laid out for any changes that increase the size of an area in the chip - with the AMD chiplet design changes to the CPU chiplet do not require changes to the I/O die and vice versa. Also changing the number of CPU cores is a major job. AMD can use the same basic design from 8 cores to 32 cores just by adding more chiplets (Ryzen) or from 8 to 64 cores (Threadripper, EPYC)
    The monolithic design requires the same technology be used for the whole chip so slower areas such as the memory interfaces still have to use the high speed, high power process as the CPU core. AMD are able to use a slower, less power hungry and cheaper process (14nm or 12nm) for the I/O die while using the high performance process (7nm) for the CPU chiplets.
    Reply

Log in

Don't have an account? Sign up now