Lightning-fast financial transactions. Natural language processing. Genome sequencing. For heavy-duty operations like these, a standard CPU will not suffice. No, these high-performance computing (HPC) applications demand much more powerful processors that can handle intense workloads without consuming excessive energy.
Faced with two primary objectives that are seemingly at odds—high performance and low power consumption—what’s a chip designer to do?
Time and again, engineering ingenuity has proven its prowess in solving tough problems. As it becomes impractical or too costly to produce larger SoCs or move to smaller process nodes to deliver power, performance, and area (PPA) advantages, engineers have devised new techniques to move semiconductor innovation forward. Chiplets present one such way to scale performance while also meeting power and form factor goals, particularly for businesses that don’t require super-high chip volumes. In this blog post, I’ll discuss how chiplets are providing the next level of abstraction to meet the demands of hyperscale data centers and their HPC workloads.
HPC comes into play to help solve complex computational problems across a wide spectrum, from scientific and academic research to business and commercial innovation. From COVID-19 to climate change, financial risk analysis, and product development, our world certainly has some significant and challenging issues to address. The fast and accurate data processing capabilities of HPC systems, along with artificial intelligence and machine learning algorithms, turn massive quantities of data into actionable insights through analysis, modeling, and simulation. MarketsandMarkets projects that the HPC market will grow from US$37.8 billion in 2020 to US$49.4 billion by 2025.
The original HPC application, supercomputing, involves thousands of CPU nodes solving complex problems. Traditional data centers also have CPUs at their foundation or, sometimes, a mix of CPUs, GPUs, and specialized ASICs. Google, for example, has its Tensor Processing Unit (TPU), a proprietary AI accelerator ASIC for neural network machine learning that is accessible on the cloud. Today, we’re seeing the growing popularity of hyperscale data centers, which are designed to scale up quickly and in a massive way, managing petabytes (and beyond) of data for HPC workloads. The types of chips that serve prevailing HPC applications are not sufficient to meet the PPA demands of hyperscale data centers.
Very large chips, such as the Cerebras Wafer-Scale Engine, provide an option for hyperscalers. But a big, advanced-node, monolithic die is expensive and challenging to produce from a yield perspective. This is one reason why chiplets are attractive. In fact, hyperscalers are very much on the front lines of driving the move towards new architectures like chiplets to achieve PPA goals.
Chiplets are small dies that, when integrated into a single package, form a larger, multi-die design. By partitioning a larger design into chiplets, designers gain the benefits of product modularity and flexibility. Separate dies—even those developed on different process nodes—can be assembled onto a package to address different market segments or needs. They’re also easier to fabricate and produce better yields compared to a large, monolithic die.
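To see why, here’s a rough illustration based on a classic Poisson yield model. The die areas and defect density below are made-up, illustrative numbers, not data for any real process node:

```python
import math

def poisson_yield(area_mm2: float, defect_density_per_mm2: float) -> float:
    """Classic Poisson die-yield model: Y = exp(-A * D0)."""
    return math.exp(-area_mm2 * defect_density_per_mm2)

# Illustrative assumptions only, not figures for any real process node.
D0 = 0.001               # assumed defects per mm^2
monolithic_area = 800.0  # one large die, in mm^2
chiplet_area = 200.0     # the same logic split into four 200 mm^2 chiplets

y_monolithic = poisson_yield(monolithic_area, D0)   # ~0.45
y_chiplet = poisson_yield(chiplet_area, D0)         # ~0.82

print(f"Monolithic 800 mm^2 die yield: {y_monolithic:.1%}")
print(f"Single 200 mm^2 chiplet yield: {y_chiplet:.1%}")
# Because each chiplet can be tested before packaging (known-good-die),
# a defect scraps one small die rather than the entire design.
```

With these assumed numbers, the single 800 mm² die yields roughly 45%, while each 200 mm² chiplet yields over 80%, and since chiplets can be screened before assembly, a defect costs one small die rather than the whole design.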
As for chiplet packaging, there are a variety of options to support higher transistor density, including multi-chip module (MCM), 2.5D, and 3D technologies. The earliest type of system-in-package (SiP), available for a few decades now, the MCM brings together at least two ICs, connected via wire bonding, on a common base in a single package. A typical 2.5D design assembles a GPU and high-bandwidth memory (HBM) side-by-side on an interposer in a single package. Even though the logic is not stacked, in some 2.5D designs the HBM consists of 3D stacked memory, thus bringing 3D content into the 2.5D design. In a 3D package, heterogeneous dies are stacked vertically and connected with through-silicon vias (TSVs); the architecture paves the way for very fast memory access bandwidth.
HPC designs typically use chiplets in a variety of packaging types. MCMs are ideal for smaller, low-power designs. 2.5D designs are suited to artificial intelligence (AI) workloads, as GPUs connected closely with HBM deliver a powerful combination of compute power and memory capacity. 3DICs, with their vertically stacked CPUs and fast memory access, are ideal for general HPC workloads.
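To get a feel for why that GPU-plus-HBM combination is so powerful, here’s a back-of-envelope bandwidth calculation. The per-pin data rate and stack count are illustrative assumptions, not figures for any specific product:

```python
def hbm_stack_bandwidth_gb_per_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s: interface width (bits) x rate per pin / 8."""
    return bus_width_bits * pin_rate_gbps / 8

# Illustrative assumptions: 1024-bit interface per stack (typical of HBM generations),
# 3.2 Gb/s per pin, and four stacks sitting on the interposer next to the GPU.
per_stack = hbm_stack_bandwidth_gb_per_s(1024, 3.2)   # ~410 GB/s per stack
stacks = 4

print(f"Per stack:   {per_stack:.0f} GB/s")
print(f"Four stacks: {stacks * per_stack / 1000:.1f} TB/s")
```

With these assumptions, four stacks on the interposer put roughly 1.6 TB/s of memory bandwidth right next to the GPU, which is exactly the kind of headroom AI workloads crave.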
Globally, data centers consumed about 200 TWh of electricity in 2019, or about 1% of global electricity demand. Even with a projected 60% increase in service demand, this usage level is anticipated to remain almost flat through 2022, so long as hardware and data center infrastructure efficiencies continue, according to a report by the International Energy Agency. Clearly, any reduction in power consumption at the chip level—particularly if the reduction can scale across the multi-die design—will be beneficial. To that end, the next frontier for HPC and data center applications could be optical ICs. Integrating optical ICs into the same package as silicon provides substantial benefits in power reduction and increased bandwidth. While optical technology is just starting to find its way into data centers, providing another way to scale up, reduce power, and contain costs, it is already a proven method in the supercomputing world for connecting hundreds or even thousands of CPU nodes.
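To put that in perspective, here’s a rough calculation. The share of data center energy drawn by compute silicon and the assumed chip-level saving are both illustrative assumptions, not published figures:

```python
# Back-of-envelope: what a chip-level power reduction could mean at data center scale.
data_center_twh = 200      # global data center electricity use, 2019 (IEA figure above)
compute_share = 0.40       # assumed fraction of that energy drawn by compute silicon
chip_level_saving = 0.10   # assumed 10% reduction from a more efficient multi-die design

saved_twh = data_center_twh * compute_share * chip_level_saving
print(f"Estimated savings: {saved_twh:.0f} TWh per year")   # 8 TWh with these assumptions
```

Even under these rough assumptions, a chip-level efficiency gain that scales across every multi-die package adds up to terawatt-hours per year.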
To ensure that chiplets deliver the desired PPA, it’s important to carefully choose the underlying technologies upon which they’re developed. For example, die-to-die interfaces that support high bandwidth, low latency, power efficiency, and error-free performance are essential for fast, reliable data transfer. Also important are design and verification flows that facilitate earlier discovery of problems, better quality-of-results, and faster time-to-market.
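As a sketch of what such an interface budget looks like, the following calculation assumes a lane count, per-lane rate, and energy per bit purely for illustration; they are not the specs of any particular die-to-die interface IP:

```python
def link_bandwidth_gb_per_s(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate die-to-die bandwidth in GB/s: lanes x Gb/s per lane / 8."""
    return lanes * lane_rate_gbps / 8

def link_power_watts(bandwidth_gb_per_s: float, pj_per_bit: float) -> float:
    """Interface power: bits transferred per second times energy per bit."""
    bits_per_s = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_s * pj_per_bit * 1e-12

# Illustrative assumptions only.
lanes, lane_rate_gbps, pj_per_bit = 40, 16.0, 0.5

bw = link_bandwidth_gb_per_s(lanes, lane_rate_gbps)   # 80 GB/s
pwr = link_power_watts(bw, pj_per_bit)                # ~0.32 W

print(f"Aggregate bandwidth: {bw:.0f} GB/s")
print(f"Interface power:     {pwr:.2f} W")
```

Doubling the lane count or the per-lane rate doubles the bandwidth, but it’s the energy per bit that determines how much of the package power budget the die-to-die interface itself consumes.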
With our leading electronic design automation (EDA) flows and IP solutions, Synopsys provides the technology resources to accelerate the development of chiplets that meet aggressive PPA targets for HPC applications. For example, the AI-enhanced, cloud-ready Synopsys Fusion Design Platform™ features massively parallel digital design tools with integrated engines across synthesis, place-and-route, and signoff. For 2.5D and 3D designs, our die-to-die connectivity IP provides leading power, latency, and die edge efficiency. For 3D designs, Synopsys 3DIC Compiler, part of the Fusion Design Platform, is the industry’s first unified platform for advanced multi-die system design and integration. 3DIC Compiler can be used to build the architecture for the multi-die design and conduct analysis on parameters such as thermal, power, and timing, while Synopsys Fusion Compiler provides an RTL-to-GDSII flow for building the CPUs. Our design portfolio also includes cloud-ready solutions for RTL analysis, golden signoff extraction, static timing analysis, simulation, testing, early analysis of TSVs via technology computer-aided design (TCAD), and more. On the verification side, we have the cloud-ready Synopsys Verification Continuum® for early software bring-up, earlier detection of SoC bugs, and faster system validation.
Big data analytics are uncovering otherwise hidden patterns, correlations, and insights that are helping us solve some of the world’s toughest problems. Traditional computing architectures don’t quite have the horsepower needed for these heavy-duty computations. Chiplets, however, are providing hyperscalers and other designers of HPC systems with a way to scale performance and power beyond Moore’s law—without encountering the yield and cost issues of large, monolithic dies. As HPC workloads drive demand for chiplets, designers can rest assured that EDA and IP solutions are available to help them meet increasingly aggressive PPA and time-to-market goals.