Multi-Die Health and Reliability: Synopsys and TSMC Showcase UCIe Advances

Faisal Goriawalla, Yervant Zorian

Jan 09, 2025 / 5 min read

Although multi-die designs (an increasingly popular approach for integrating heterogeneous and homogeneous dies into a single package) help resolve problems related to chip manufacturing and yield, they introduce a host of complexities and variables that must be addressed. In particular, designers must work diligently to ensure the health and reliability of their multi-die chip throughout its lifecycle. This includes testing and analyzing not only each individual die, but also the die-to-die connectivity and the entire multi-die package.


Synopsys is at the forefront of multi-die design innovation, and we recently worked with TSMC to demonstrate two dies communicating over a high-speed die-to-die interface based on the UCIe (Universal Chiplet Interconnect Express) specification. Synopsys Monitoring, Test & Repair (MTR) IP was central to the demonstration, showing the health of the multi-die interconnect both during manufacturing and in the field.

Read on as we explore the unique challenges of ensuring multi-die quality and reliability, why a comprehensive monitoring, test, and repair solution is crucial for chip designers, and what Synopsys and TSMC are doing to help. 

The need for interconnect monitoring, test, and repair

As semiconductors become more complex, with multiple heterogeneous and homogeneous dies (also called chiplets) integrated into a single package, the need for effective communication and reliable interconnects between the dies has greatly increased. The UCIe specification standardizes die-to-die interconnects and enables high-speed communication between chiplets. However, the high data rates of these connections require rigorous monitoring, testing, and repair to keep communication reliable over the lifecycle of the chip.

Monitoring signal integrity is vital for ensuring the overall health of interconnects. Rigorous algorithm-based testing can uncover the different types of opens, shorts, and crosstalk that can manifest between these closely spaced, high-data-rate lanes. Equally important is the ability to cumulatively augment the repair signature across process, voltage, and temperature (PVT) corners to cover different use cases.
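
The MTR BIST algorithms themselves are not described here. As a rough illustration of what algorithm-based interconnect testing means, the sketch below drives a classic walking-one / walking-zero pattern set across a group of lanes and flags every lane whose received bit ever differs from the transmitted bit. The fault model (`faulty_channel`, where open lanes read 0 and shorted pairs resolve to a wired-AND) and all function names are hypothetical assumptions. This static model also does not capture at-speed crosstalk, which needs aggressor/victim patterns at the real data rate.

```python
from typing import Callable, List, Set, Tuple

Pattern = List[int]  # one bit per lane; list index = lane number


def walking_patterns(n_lanes: int) -> List[Pattern]:
    """Walking-one vectors followed by walking-zero vectors (2 * n_lanes total)."""
    ones = [[1 if i == j else 0 for i in range(n_lanes)] for j in range(n_lanes)]
    zeros = [[1 - bit for bit in vec] for vec in ones]
    return ones + zeros


def run_interconnect_bist(n_lanes: int, channel: Callable[[Pattern], Pattern]) -> Set[int]:
    """Send every pattern through `channel`; return lanes that ever mismatched."""
    failing: Set[int] = set()
    for tx in walking_patterns(n_lanes):
        rx = channel(tx)
        failing.update(i for i, (t, r) in enumerate(zip(tx, rx)) if t != r)
    return failing


def faulty_channel(opens: Set[int], shorts: List[Tuple[int, int]]) -> Callable[[Pattern], Pattern]:
    """Hypothetical fault model: shorted pairs resolve to the wired-AND of
    their drivers, and open lanes always read 0 at the receiver."""
    def channel(tx: Pattern) -> Pattern:
        rx = list(tx)
        for a, b in shorts:
            rx[a] = rx[b] = tx[a] & tx[b]
        for lane in opens:
            rx[lane] = 0
        return rx
    return channel


if __name__ == "__main__":
    link = faulty_channel(opens={3}, shorts=[(5, 6)])
    print(sorted(run_interconnect_bist(8, link)))  # [3, 5, 6]
```

Walking patterns are easy to diagnose because each vector isolates one lane, but they need 2N vectors for N lanes; counting-sequence patterns detect shorts with on the order of log2(N) vectors, at the cost of weaker diagnosis.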

To implement a UCIe die-to-die link, designers must address several critical multi-die health challenges, including:

  1. Narrow pitch: The pitch, or distance between interconnects, is very small (25 to 55 µm) for UCIe advanced packages. Once the chip is manufactured, probing these microbumps is very difficult, so an embedded self-test capability is required instead of probing.
  2. Use of UCIe mainband and sideband only: Typically, besides the mainband and sideband channels, there are no additional Design for Test (DFT) ports that can be utilized for individual die-level testing.
  3. High-speed signal integrity: At UCIe data rates, maintaining signal integrity becomes challenging. The UCIe PHY parameters must be monitored continuously so that issues can be detected and rectified promptly.
  4. Redundancy and repair: Spare interconnects provide the redundancy needed to enhance quality, reliability, and yield. When an interconnect fails, a spare can replace it, ensuring uninterrupted communication.
  5. Environmental variability: Interconnects can behave differently under varying environmental conditions such as temperature and voltage. To ensure robustness, interconnects must be tested and repaired under multiple operating conditions.
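
One way to picture testing and repairing under multiple operating conditions is to accumulate failures from every PVT corner into a single repair signature that, like an eFuse, can only have bits added. The sketch below assumes per-corner test results are already available as sets of failing lane indices; the corner names, the results, `CumulativeRepairSignature`, and `accumulate_across_corners` are hypothetical and do not describe the MTR implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CumulativeRepairSignature:
    """Hypothetical one-time-programmable (eFuse-like) repair signature:
    a failing lane can be added, but a recorded lane is never cleared."""
    blown: Set[int] = field(default_factory=set)

    def program(self, failing_lanes: Set[int]) -> Set[int]:
        """Record newly failing lanes and return only the lanes added by this call."""
        new_lanes = set(failing_lanes) - self.blown
        self.blown |= new_lanes
        return new_lanes


def accumulate_across_corners(results: Dict[str, Set[int]]) -> CumulativeRepairSignature:
    """Merge per-corner test results into one cumulative signature."""
    signature = CumulativeRepairSignature()
    for corner, failing in results.items():
        added = signature.program(failing)
        print(f"{corner}: newly recorded lanes {sorted(added)}")
    return signature


if __name__ == "__main__":
    # Hypothetical results: lane 17 fails at two corners, lane 90 only when hot.
    per_corner = {
        "slow_cold": {17},
        "typical": set(),
        "fast_hot": {17, 90},
    }
    print(sorted(accumulate_across_corners(per_corner).blown))  # [17, 90]
```

Because the signature is a union, a lane that fails at only one corner stays repaired at every corner, and later test passes can add lanes but never remove them.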

A comprehensive solution for multi-die health monitoring and reliability

Our MTR IP solution comprises several components working in unison to provide a comprehensive health check for multi-die designs:

  • Specialized mission mode signal integrity monitoring: Signal Integrity Monitors (SIMs) embedded within the UCIe high-speed interconnect lanes continuously monitor signal integrity, providing real-time feedback on the health of the die-to-die communication channels.
  • Built-in self-test (BIST) algorithms: These deterministic algorithms are designed to detect advanced interconnect fault types, including crosstalk between interconnects, which can arise from the narrow pitch and high data rates.
  • Cumulative repair: UCIe advanced packages provide redundant lanes for repair. For every 136 main lanes, there are 12 additional redundant lanes, and for the sideband, four main lanes are complemented by four spares. This redundancy is critical for repairing faulty interconnects without affecting overall system performance. Leveraging the redundant lanes, MTR uses a built-in redundancy analysis (BIRA) algorithm to perform hard repair, storing the repair data cumulatively in eFuse.
  • High-speed access and test (HSAT) and automatic test pattern generation (ATPG) via high-speed interface: HSAT functions provide access to a hidden die, allowing adaptive, high-bandwidth testing over a functional interface. This can reduce test time, lower cost by reducing pin count and test hardware, and enable testing throughout the entire lifecycle of the silicon.
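
The redundancy figures in the cumulative-repair bullet lend themselves to a quick repairability check. The sketch below is a deliberately simplified redundancy analysis that assumes any good spare can stand in for any failing main lane; the UCIe specification defines its own lane-remapping rules, which this code does not model. `allocate_spares` and its parameters are hypothetical names chosen for illustration, not the BIRA algorithm used in MTR.

```python
from typing import AbstractSet, Dict, Optional

MAIN_LANES = 136   # main lanes, per the figures quoted in the text
SPARE_LANES = 12   # redundant lanes available to repair them


def allocate_spares(
    failing_main: AbstractSet[int],
    failing_spares: AbstractSet[int] = frozenset(),
    n_main: int = MAIN_LANES,
    n_spare: int = SPARE_LANES,
) -> Optional[Dict[int, int]]:
    """Simplified greedy redundancy analysis.

    Maps each failing main lane to a distinct spare that itself passed test.
    Returns None when there are not enough good spares (unrepairable link).
    """
    if any(not 0 <= lane < n_main for lane in failing_main):
        raise ValueError("main lane index out of range")
    good_spares = [s for s in range(n_spare) if s not in failing_spares]
    if len(failing_main) > len(good_spares):
        return None
    # zip stops at the shorter sequence; here every failing lane gets a spare.
    return dict(zip(sorted(failing_main), good_spares))


if __name__ == "__main__":
    print(allocate_spares({17, 90}))                      # {17: 0, 90: 1}
    print(allocate_spares({17, 90}, failing_spares={0}))  # {17: 1, 90: 2}
    print(allocate_spares(set(range(13))))                # None
    # Sideband: four main lanes backed by four spares.
    print(allocate_spares({2}, n_main=4, n_spare=4))      # {2: 0}
```

In a cumulative flow, the input would be the union of failures across test passes and PVT corners rather than the result of a single test run.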

Our MTR IP solution can be used in four lifecycle scenarios: at the individual die level, to ensure the health of a single die; at the multi-die level, which is especially important when manufacturing the stack; in power-on mode, so that monitoring, test, and repair run every time a user powers on a device in the field; and in real-time mission mode, which provides a deeper, real-time health check. The first two scenarios are most applicable to foundries like TSMC, and the last two apply to the foundry's customers.

Demonstrating UCIe-based multi-die advances

At Chiplet Summit, we presented our recent first-pass silicon success with UCIe PHY IP on TSMC N3E and the CoWoS-S interposer. We also shared the results of a demonstration featuring two dies communicating over both a high-speed UCIe die-to-die interface and a standard GPIO-based interface.

In the first configuration, our MTR IP provides interconnect reliability, test, and repair features between two Synopsys UCIe IPs. In the second configuration, the SLM MTR IP supports the IEEE 1838 test access infrastructure allowing intra-die lane testing.

Both configurations support full execution of monitor, test, debug, and repair capabilities internal to each die, utilizing on-chip technologies such as Synopsys HSAT and SEQ IPs for random logic blocks, SMS IP for embedded memory blocks, and SHS and MTR IP for UCIe blocks. This encompasses pre-bond and post-bond manufacturing stages, the in-field power-on stage, and periodic mission health monitoring. The design showcases how the above capabilities can be used across the entire silicon lifecycle of the multi-die package without loss of coverage or pattern inflation when targeting dies in a stack.

Our commitment to multi-die health and reliability

Synopsys is committed to helping our customers push the boundaries of semiconductor technology and deliver multi-die designs with maximum manufacturing yield and robustness throughout the silicon lifecycle. Our SLM MTR IP solution for UCIe-based multi-die designs is a testament to this commitment, providing a robust framework for monitoring, testing, and repairing die-to-die interconnects. The solution can be leveraged across all stages of the silicon lifecycle, from in-design and in-ramp to in-production and in-field. 

 
