Introduction

The demand for application-specific system-on-chips (SoCs) for compute applications is ever-increasing. Today, the diversity of requirements means there is a need for a rich set of compute solutions in a wide range of process technologies. The resulting products may have very different -- but nonetheless demanding -- power, performance, and area (PPA) requirements. These compute needs span IoT wearables and mobile application processors; AI inference engines, machine-learning GPUs, and NPUs; hyperscale servers and high-performance compute engines in supercomputers at the highest performance end; networking and 5G/6G base stations; and crypto engines, automotive MCUs, and advanced driver-assistance engines. This diversity leads to a wide range of processing requirements, but almost all share one common objective: extracting the maximum compute performance within the optimum energy profile. The precise engineering trade-offs required to support this multitude of specifications will inevitably be very design specific.

This article will discuss how the SoC design-specific needs for these diverse computing applications, encompassing High-Performance Compute (HPC) and AI, can be addressed for a broad range of processes with a rich, tool-aware Foundation IP solution that includes optimized circuitry, broad operating voltage range support and the flexibility to add customer-specific optimizations. The article will explain how designers can achieve the optimum PPA for their compute applications, whether that goal is the maximum possible performance or the best power-performance trade-off for their designs.

Broad Support for a Wide Range of Processor Requirements

Synopsys has developed a versatile, highly optimized High-Performance Core (HPC) Design Kit comprising a range of specially architected logic cells and memory cache instances, optimized specifically to help SoCs meet stretched performance and power goals.

While the diverse compute applications might share the goal of achieving the best PPA, the environmental conditions and design constraints vary enormously. To meet the latest density and power requirements, high-performance compute and mobile application processors will harness the latest process nodes, such as 3nm and even 2nm, using complex implementation techniques like dynamic voltage scaling (DVS). This requires wide-range process, voltage, and temperature (PVT) support and may need custom characterization corners for targeted operating points. Automotive and networking compute applications might target slightly larger-geometry FinFET nodes, like 16nm, 12nm, 7nm, and 5nm, and they can also take advantage of the Synopsys HPC Design Kits to enhance PPA. Crypto engines, graphics processors, and consumer compute engines in 4nm and 6nm shrink processes can also benefit from Synopsys HPC Design Kits.

Figure 1 illustrates the optimized logic library circuits in the Synopsys HPC Design Kit that can significantly improve the performance and power envelope.

Figure 1: Synopsys HPC Design Kit components for processor PPA optimization  

In building the HPC Design Kits, the Synopsys Foundation IP Team has carefully selected and tuned the circuit architectures to optimize SoCs for the best PPA. Some HPC Design Kit features of this optimized logic and memory are as follows:

  • Advanced logic library cell architectures are defined with sufficient margins to cover all operating ranges. 
  • Logic cell heights are defined to fit perfectly with the fin options, cell width, and power and ground supply rails and are designed to be robust and integration-friendly at the block and chip level. 
  • Memory cache instances take advantage of further improvements to existing advanced assist techniques.
  • Logic versatility is enhanced through Synopsys' support for all possible device options, a very rich set of drive strengths, and the addition of complex circuits. 

These features culminate in a rich HPC Design Kit that meets the SoC optimization needs of high-performance, medium-performance, and highly power-constrained compute applications. In addition to taking great care with the architectural features, the HPC Design Kit also includes dedicated cell sets to boost performance and reduce dynamic power. These cells fall into groups designed to minimize switched capacitance and routing congestion: complex combinational cells, sequential and multi-bit cells, and cells with optimized timing arcs and delays.
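To make the switched-capacitance argument concrete, the short sketch below works through the standard dynamic-power relation P = αCV²f for a bank of flip-flops merged into 4-bit multi-bit cells. All of the numbers (flop count, clock-pin capacitances, voltage, and frequency) are illustrative assumptions, not Synopsys library data.

```python
# Back-of-the-envelope illustration (assumed figures, not Synopsys library
# data) of why multi-bit cells reduce switched clock capacitance, using the
# standard dynamic power relation P = alpha * C * V^2 * f.

V = 0.75          # supply voltage in volts (assumed)
F = 2.0e9         # clock frequency in Hz (assumed 2 GHz)

def dynamic_power(c_farads, alpha=1.0, v=V, f=F):
    """Dynamic switching power in watts: P = alpha * C * V^2 * f."""
    return alpha * c_farads * v * v * f

n_flops    = 100_000   # flops in a block (assumed)
c_clk_1bit = 0.8e-15   # clock-pin capacitance per single-bit flop (assumed, 0.8 fF)
c_clk_4bit = 2.0e-15   # shared clock-pin capacitance per 4-bit flop (assumed, 2 fF)

p_single = dynamic_power(n_flops * c_clk_1bit)          # every flop has its own clock pin
p_multi  = dynamic_power((n_flops // 4) * c_clk_4bit)   # four bits share one clock pin

print(f"clock-pin power, single-bit flops: {p_single * 1e3:.1f} mW")
print(f"clock-pin power, 4-bit multi-bit : {p_multi * 1e3:.1f} mW")
print(f"reduction                        : {100 * (1 - p_multi / p_single):.0f} %")
```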

Figure 2 shows where the optimized logic circuits within the Synopsys HPC Design Kit can best be utilized across CPU, GPU, DSP, and CNN application processor classes.

Figure 2: Key logic components in the Synopsys HPC Design Kit

Two examples of the Synopsys HPC Design Kit cells that provide PPA benefits are as follows:

  • Example 1: Complex combinational cells are optimized to pack more functionality into less real estate, enabling higher performance, reducing area, and minimizing power (Figure 3); a brief transistor-count sketch follows these examples.

Figure 3: Complex combinational cells reduce area, routing congestion and power

  • Example 2: Specialty flip-flops are tuned flops, essential for achieving optimal PPA in 2GHz+ SoCs, that stretch performance and minimize power (Figure 4).

Figure 4: Specialty flip-flops stretch performance and minimize power
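As a rough illustration of Example 1, the sketch below compares textbook static-CMOS transistor counts for the compound function f = (a AND b) OR (c AND d) built from discrete gates versus a single AOI22 complex cell plus an output inverter. The counts are generic CMOS figures, not Synopsys cell data, but they show why folding logic into complex cells saves area, wiring, and power.

```python
# Generic static-CMOS transistor bookkeeping (not Synopsys cell data) for
# f = (a AND b) OR (c AND d): discrete gates versus one complex AOI22 cell.

XTORS = {"AND2": 6, "OR2": 6, "AOI22": 8, "INV": 2}   # typical transistor counts

# Discrete implementation: two AND2 gates feeding one OR2 gate (3 cells, and
# two intermediate nets that must be placed and routed).
discrete = 2 * XTORS["AND2"] + XTORS["OR2"]

# Complex-cell implementation: one AOI22 computes NOT((a AND b) OR (c AND d))
# in a single stage; an inverter restores the positive output (2 cells, 1 net).
complex_cell = XTORS["AOI22"] + XTORS["INV"]

print(f"discrete gates   : {discrete} transistors in 3 cells")
print(f"AOI22 + inverter : {complex_cell} transistors in 2 cells")
print(f"transistor saving: {100 * (1 - complex_cell / discrete):.0f} %")
```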

The Synopsys HPC Design Kit also supports multiple standard cell architectures, with a wide range of VTs and channel lengths, to provide finer granularity for performance and power scaling. Some of the fastest application processors used in high-performance computing run at more than 4GHz. High-performance and ultra-high-performance library and memory options can be targeted for high-speed CPUs. Lower-performance blocks and performance-power-balanced processors can use the power-saving benefits of high-density and ultra-high-density library and memory architectures to achieve a lower power envelope. Leveraging such a broad and flexible range of options results in the best overall performance-power trade-off, and combining it with the extensive PVT support enables the Synopsys HPC Design Kit to cover a very extensive solution space.
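The value of multiple VT flavors can be seen from the simplified subthreshold leakage relation I_leak ∝ exp(-VT / (n·kT/q)): a modest VT increase cuts leakage by roughly an order of magnitude at the cost of speed. The sketch below is a generic device-physics illustration; the VT offsets and slope factor are assumptions, not foundry or Synopsys library values.

```python
# Generic subthreshold-leakage illustration of multi-VT granularity.
# I_leak ~ exp(-VT / (n * kT/q)); the VT offsets and slope factor are assumed.
import math

N_FACTOR = 1.4        # subthreshold slope factor (assumed)
THERMAL_V = 0.0259    # kT/q at ~300 K, in volts

def relative_leakage(delta_vt_volts):
    """Leakage relative to the lowest-VT device for a VT increase of delta_vt."""
    return math.exp(-delta_vt_volts / (N_FACTOR * THERMAL_V))

for flavor, delta_vt in [("LVT (fastest)", 0.00),
                         ("SVT", 0.06),
                         ("HVT (lowest leakage)", 0.12)]:
    print(f"{flavor:22s} relative leakage ~ {relative_leakage(delta_vt):.2f}x")
```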

Synopsys HPC Design Kit Enabling Dynamic Voltage Scaling across a Wide Operating Voltage Range

Combining frequency modulation with voltage scaling using Dynamic Voltage and Frequency Scaling (DVFS) is a common approach to optimizing performance and power in advanced application processors. To support DVFS, memory instances and logic libraries must support a wide voltage range. DVFS and voltage scaling can enable performance boost modes that maximize frequency by taking advantage of super-overdrive and overdrive PVTs for short bursts of performance, while lower-voltage PVT clusters are supported to minimize overall power consumption in non-boost modes.

Ultra-low-voltage PVTs are supported for applications where power is critical and the headline performance requirements are more modest but still challenging. Foundation IP that can scale efficiently across this wide range is critical: it helps reduce power when the core is generally operating at a lower load while still delivering high performance when needed. Synopsys Foundation IP supports a very extensive operating voltage range, from near threshold (0.375V) to high voltage (1.15V), giving designers the flexibility to scale their designs across a broad voltage range and take full advantage of voltage scaling to reduce dynamic and leakage power.
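The benefit of scaling across this range follows directly from the dynamic-power relation P_dyn = αCV²f. The sketch below applies it at three hypothetical operating points spanning the 0.375V to 1.15V range quoted above; the frequencies, switched capacitance, and activity factor are illustrative assumptions, not characterized silicon data.

```python
# Illustrative DVFS arithmetic using P_dyn = alpha * C * V^2 * f.  The voltage
# range (0.375 V to 1.15 V) is from the article; everything else is assumed.

ALPHA = 0.15        # average switching activity (assumed)
C_SW  = 2.0e-9      # total switched capacitance of the core, farads (assumed)

def p_dyn(v, f, alpha=ALPHA, c=C_SW):
    """Dynamic power in watts at supply voltage v (V) and clock frequency f (Hz)."""
    return alpha * c * v * v * f

operating_points = [
    ("boost (super overdrive)", 1.15,  3.2e9),
    ("nominal",                 0.75,  2.0e9),
    ("low-power (near-Vt)",     0.375, 0.6e9),
]

p_boost = p_dyn(*operating_points[0][1:])
for name, v, f in operating_points:
    p = p_dyn(v, f)
    print(f"{name:26s} {v:5.3f} V  {f / 1e9:3.1f} GHz  "
          f"{p:5.2f} W  ({100 * p / p_boost:5.1f} % of boost power)")
```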

For HPC processors operating at very high frequencies, the cache memories have stringent access time, setup, and hold time requirements. The area and aspect ratio of the memory also play an important role in defining the block's floorplan. These caches often need to be hand-crafted to provide the best PPA profile. The Synopsys HPC Design Kit is specifically designed to remove this bottleneck for SoC designers by providing expertly tuned cache instances, optimized beyond what is possible with a compiler.
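To show how tight the budget becomes, here is a rough per-cycle timing breakdown at a 4GHz clock (250ps per cycle). Every component delay below is an illustrative assumption rather than characterized Synopsys memory data; the point is simply how little slack a hand-tuned cache access must fit into.

```python
# Rough cache-access timing budget at 4 GHz (250 ps/cycle).  All component
# delays are illustrative assumptions, not characterized memory data.

F_CLK = 4.0e9                    # target clock frequency in Hz
cycle_ps = 1e12 / F_CLK          # cycle time in picoseconds

budget_ps = {
    "clock uncertainty (jitter/skew)": 20.0,   # assumed
    "launch flop clock-to-q":          30.0,   # assumed
    "address/data routing":            45.0,   # assumed
    "cache macro access time":        140.0,   # assumed
    "capture setup time":              10.0,   # assumed
}

used = sum(budget_ps.values())
for item, ps in budget_ps.items():
    print(f"{item:34s} {ps:6.1f} ps")
print(f"{'total used':34s} {used:6.1f} ps of {cycle_ps:.0f} ps cycle")
print(f"{'remaining slack':34s} {cycle_ps - used:6.1f} ps")
```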

Synopsys Foundation IP Optimizing AI and Application-Specific Accelerator Block PPA

As general computation needs increase, so does the need for accelerator blocks designed to perform specific processing tasks. The architectures of these accelerator blocks are highly structured and optimized for the best performance and power profile when processing a narrower set of specific operations, and they are often highly parallelized. AI accelerator blocks are very commonly used in the industry; they are designed and optimized to execute AI algorithms efficiently. These AI algorithms require repetitive multiply-accumulate (MAC) operations, so the architectures are designed to optimize these MAC operations. Figure 5 shows a typical AI block. Like GPUs, AI accelerator blocks are highly parallelized to maximize data throughput, so the blocks can run at a lower frequency; the overall throughput gain is achieved through thousands of replicated cores operating concurrently. These accelerator blocks are very memory intensive and highly replicated, requiring highly specialized memory instances for the best overall performance. At Synopsys, we have designed specialized memories to cater to these applications' growing memory capacity and performance needs.

Figure 5: Memory IP Solutions for AI SoC: Lower Power & Latency
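To make the parallelism argument concrete, the sketch below contrasts the clock a single MAC unit would need against a wide replicated array running at a modest frequency, and shows the elementary multiply-accumulate that the array unrolls. The throughput target, array size, and frequency are illustrative assumptions, not figures for any specific accelerator.

```python
# Illustrative MAC-array throughput arithmetic (all figures assumed).

def macs_per_second(num_units, freq_hz, macs_per_unit_per_cycle=1):
    """Peak MAC throughput of an array of MAC units."""
    return num_units * freq_hz * macs_per_unit_per_cycle

target = 16e12                                          # 16 TMAC/s target (assumed)
array_tput = macs_per_second(num_units=16_384, freq_hz=1.0e9)
single_unit_freq = target            # one MAC per cycle -> clock must equal target, in Hz

print(f"16,384-unit array @ 1 GHz : {array_tput / 1e12:.1f} TMAC/s")
print(f"single unit would need    : {single_unit_freq / 1e9:,.0f} GHz")

# The elementary operation the array replicates: a repeated multiply-accumulate.
def mac_dot(a, b):
    acc = 0
    for x, y in zip(a, b):
        acc += x * y        # one MAC per element pair
    return acc

print("dot([1, 2, 3], [4, 5, 6]) =", mac_dot([1, 2, 3], [4, 5, 6]))   # 32
```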

Synopsys Foundation IP Solution for Network on Chip

Networks on Chip (NoCs) carry high-intensity communication workloads at the SoC level. NoCs are therefore high-performance circuits with a high activity rate, and they are very power hungry. They require high-performance 1P, 2P, multi-port, and TCAM memories like those shown in Table 1 below.

Table 1: Synopsys Foundation IP for NoC applications

Synopsys HPC Design Kit is Co-Optimized with Synopsys EDA for Efficient SoC Implementation

The Synopsys Logic Libraries and Embedded Memories are a rich set of IP, co-optimized with EDA tools, enabling fine-grain SoC optimization and implementation. This allows designers to achieve precise PPA tuning that avoids overdriving and unnecessary capacitance and routing overheads. High-drive cells and complex combinational and sequential cells are optimized to minimize internal timing arcs and can be combined with multi-bit cells, which minimize switched capacitance to offer excellent PPA trade-offs. Co-optimizing with the EDA tooling ensures any innovative features are seamlessly accessible to SoC implementers building High Performance Compute, AI, and other processing applications.

EDA view support and PVTs are aligned across Synopsys Logic Libraries and Memory Compilers for each node to ensure a trouble-free integration experience. The Synopsys Foundation IP is offered across a wide range of foundries and process nodes to achieve optimized PPA regardless of the customer technology choice for the target application. Targeted customization can be supported to address any specific customer requirements.

Summary

Today's SoCs make great demands on implementation teams. They require compute solutions for a wide range of requirements, operating under diverse constraints. Whether at the very high-performance compute end of cloud infrastructure, in high-end mobile, low-power AI, or very low-voltage crypto engines, the need to extract the maximum performance at a low voltage or within a defined power budget is immensely challenging. As part of the Synopsys Foundation IP portfolio, the Synopsys HPC Design Kit provides a versatile solution to meet that range of challenges, giving SoC designers a comprehensive offering to optimize performance and power across a wide solution space. It addresses the needs of the highest CPU clock frequencies and provides optimized power trade-offs for middle- to lower-performance processor applications.

The challenges are not going away, but there is help in the form of the highly optimized logic library cells and embedded cache instances in the Synopsys HPC Design Kit.

For more information, visit Synopsys Foundation IP
