Reducing Manual Effort and Achieving Better Chip Verification Coverage with AI and Formal Techniques

Taruna Reddy

Sep 04, 2024 / 5 min read

Given the size and complexity of modern semiconductor designs, functional verification has become a dominant phase in the chip development cycle. Coverage lies at the very heart of this process, providing the best way to assess verification progress and determine where to focus further effort. Code coverage of the register transfer level (RTL) chip design, functional coverage as specified by the verification team, and coverage derived from assertions are combined to yield a single metric for verification thoroughness.

Coverage goals are usually quite high (95% or more) and hard to achieve. Chip verification engineers spend weeks or months trying to hit unreached coverage targets to ensure that the design is thoroughly exercised and bugs are not missed. Traditionally, this has involved a lot of manual effort, consuming valuable human resources and delaying project schedules. Fortunately, in recent years, several powerful techniques have been developed to automate the coverage process, achieve faster coverage closure, and reach higher overall coverage.


NVIDIA Tests Chip Design Verification Tools

A presentation by NVIDIA at the Synopsys Users Group (SNUG) Silicon Valley 2024 event described a project in which three chip verification coverage enhancement techniques were highly successful: test grading, unreachability analysis, and artificial intelligence (AI). The NVIDIA team carefully measured the impact across three generations of related chips, providing an exceptionally quantitative case study. The designs involved were large, with more than 100 million coverage targets. Many blocks were instantiated multiple times, with unique tie-offs for each instance.

On the baseline design, Project A, this design topology made coverage convergence very challenging. The tie-offs left each instance with large unreachable cones of logic whose coverage targets could never be hit by any test. Each instance required its own unique set of coverage exclusions, so each instance had to be signed off for coverage independently. As shown in the following example for one set of coverage targets, convergence using a constrained-random testbench was slow and a large manual effort was required to reach coverage signoff. 

[Figure: functional verification coverage convergence, Project A]

Some important design bugs were not found until late in the project, a cause for concern. The chip verification engineers wanted to accelerate coverage to find bugs earlier and to reduce the amount of manual effort required. The first technique they tried on the derivative Project B was test grading, available in the Synopsys VCS® simulator. Test grading analyzes the simulation tests and ranks them according to achieved coverage. This enables verification engineers to set up simulation regressions in which the most productive tests run more often, with more seeds, than less productive tests. Coverage converges more efficiently, saving project resources. 
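To make the idea of ranking tests concrete, here is a minimal sketch of one common grading strategy: a greedy ordering by incremental coverage. It is not the VCS test grading implementation. It assumes per-test coverage is available as a set of hit coverage-target names; the function name, test names, and target names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def grade_tests(coverage: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank tests greedily by incremental coverage.

    coverage maps each test name to the set of coverage targets it hit.
    Each step picks the test that adds the most targets not already hit by
    higher-ranked tests (ties broken alphabetically). Returns
    (test, newly_hit_targets) pairs in rank order; tests that add nothing
    new end up at the bottom with a gain of 0.
    """
    covered: Set[str] = set()
    remaining = dict(coverage)
    ranking: List[Tuple[str, int]] = []
    while remaining:
        # max() keeps the first maximum it sees, so iterating in sorted order breaks ties by name
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = len(remaining[best] - covered)
        ranking.append((best, gain))
        covered |= remaining.pop(best)
    return ranking

# Hypothetical per-test coverage data
per_test = {
    "test_a": {"c1", "c2", "c3"},
    "test_b": {"c3", "c4"},
    "test_c": {"c1", "c2"},
    "test_d": {"c5"},
}
print(grade_tests(per_test))
# → [('test_a', 3), ('test_b', 1), ('test_d', 1), ('test_c', 0)]
```

In a regression built from this ranking, the top tests would get more seeds, while test_c, which adds nothing once test_a has run, would be a candidate for fewer runs.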

Test grading was a good first step, but the team still faced the challenge of the many unreachable coverage targets in the design. They found an effective solution with Synopsys VC Formal and its Formal Coverage Analyzer (FCA) application (app), which determines the unreachable coverage targets in the RTL design. This eliminates the traditional quagmire in which the verification team spends enormous time and resources trying to hit coverage targets that can never be reached. 
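VC Formal proves unreachability with formal engines on the real RTL. The toy below only illustrates the underlying question on a tiny, hypothetical combinational block: given each instance's tie-offs, can any assignment of the remaining free inputs hit a cover point? It answers by exhaustively enumerating the free inputs, a stand-in that works only for a handful of signals and says nothing about sequential logic; it is not how the FCA app works. The block, its cover points, and the instance names are all assumptions. It also shows why each instance needs its own exclusion list.

```python
from itertools import product
from typing import Callable, Dict, List

Inputs = Dict[str, bool]

# Hypothetical cover points of a tiny arbiter-like block, as predicates on its inputs
COVER_POINTS: Dict[str, Callable[[Inputs], bool]] = {
    "grant0":   lambda i: i["en"] and i["req0"],
    "grant1":   lambda i: i["en"] and i["req1"] and not i["req0"],
    "conflict": lambda i: i["req0"] and i["req1"],
    "idle":     lambda i: not i["en"],
}
INPUTS = ["req0", "req1", "en"]

def unreachable_targets(tie_offs: Inputs) -> List[str]:
    """Return cover points that no assignment of the free (untied) inputs can hit."""
    free = [name for name in INPUTS if name not in tie_offs]
    hit = set()
    # Try every combination of values on the free inputs; tied inputs stay fixed
    for values in product([False, True], repeat=len(free)):
        assignment = {**tie_offs, **dict(zip(free, values))}
        hit |= {cp for cp, pred in COVER_POINTS.items() if pred(assignment)}
    return sorted(set(COVER_POINTS) - hit)

def exclusions_per_instance(instances: Dict[str, Inputs]) -> Dict[str, List[str]]:
    """Build a separate exclusion list for each instance, since tie-offs differ."""
    return {inst: unreachable_targets(ties) for inst, ties in instances.items()}

# Hypothetical instances of the same block with different tie-offs
instances = {"u_arb0": {}, "u_arb1": {"req1": False}, "u_arb2": {"en": True}}
print(exclusions_per_instance(instances))
# → {'u_arb0': [], 'u_arb1': ['conflict', 'grant1'], 'u_arb2': ['idle']}
```

The same RTL yields three different exclusion lists, which is why per-instance signoff was so laborious when done by hand.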

Formal analysis conclusively determines unreachable coverage targets and removes them from consideration for future simulations. This benefits the overall coverage calculation:

[Figure: functional verification coverage calculation]
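The effect of exclusions on the metric comes down to simple arithmetic: coverage = hit / (total - excluded). Because unreachable targets can never be hit, excluding them shrinks only the denominator. The sketch below assumes that formula; the counts are made up for illustration and are not from the NVIDIA projects.

```python
def coverage_pct(hit: int, total: int, excluded: int = 0) -> float:
    """Percentage of coverage targets hit, after removing excluded
    (unreachable) targets from the denominator. Unreachable targets can
    never be hit, so excluding them does not change the numerator."""
    reachable = total - excluded
    if reachable <= 0:
        raise ValueError("no reachable coverage targets")
    if hit > reachable:
        raise ValueError("hit count cannot exceed reachable targets")
    return 100.0 * hit / reachable

# Hypothetical counts: 1,000 targets, 800 hit, 150 proven unreachable
print(coverage_pct(800, 1000))                  # → 80.0
print(round(coverage_pct(800, 1000, 150), 1))   # → 94.1
```

No additional simulation is needed for that jump; the remaining 50 holes are now all known to be reachable, so effort can focus there.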

Excluding the unreachable coverage targets boosts total coverage by eliminating apparent coverage holes that are actually unreachable and by reducing the total number of coverage targets to be hit in simulation. This is a completely automated process. The FCA app generates an exclusions file with the specific unreachable coverage points for each unique instance in the design. As shown in the following graph, the combination of test grading and unreachability analysis on Project B achieved a major “shift left” in coverage by two key milestones.

[Figure: NVIDIA chip signoff coverage progress, Project B]

In their SNUG presentation, the NVIDIA engineers reported the following learnings from Project B:

  • Focus early on test grading to improve stimulus productivity to hit more coverage
  • Focus early on coverage to uncover bugs earlier, which increases design quality and saves integration effort
  • Use automatic unreachability exclusion to save manual effort, focus verification efforts on reachable coverage gaps, and find bugs earlier
  • Achieve a left shift in coverage and bug finding by applying test grading and unreachability analysis effectively
  • Experiment with the tools, learn, and adjust to enhance verification methodologies

After the results of Project B, the verification team was eager to try additional techniques to further shift left the verification process. For Project C, they experimented with AI-based techniques, starting with the Synopsys VSO.ai Verification Space Optimization solution. It includes a Coverage Inference Engine to help define coverage points based on both simulated stimulus and the RTL design. It also uses connectivity engines and a machine learning (ML)-based solver to target hard-to-hit coverage points.

The verification team first tried Synopsys VSO.ai in the late stage of Project C, using a constrained-random testbench compliant with the Universal Verification Methodology (UVM). Compared with using only test grading and unreachability analysis, the results were impressive: adding VSO.ai achieved 33% more functional coverage in the same number of test runs while reducing the size of the regression test suite by 5X. Code coverage and assertion coverage improved by 20% in the same number of runs, with a 16X regression compression over the baseline.
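The article does not define "regression compression" precisely. One plausible reading, assumed here, is the ratio between the number of runs the baseline regression needs to reach its final coverage and the number of runs the optimized flow needs to reach that same coverage. The sketch computes that ratio from two cumulative coverage curves; the curves and function names are hypothetical.

```python
from typing import Optional, Sequence

def runs_to_reach(curve: Sequence[float], target: float) -> Optional[int]:
    """Number of runs after which cumulative coverage first reaches target,
    or None if it never does. curve[i] is cumulative coverage after i+1 runs."""
    for i, cov in enumerate(curve):
        if cov >= target:
            return i + 1
    return None

def compression(baseline: Sequence[float], optimized: Sequence[float]) -> float:
    """Runs the baseline needs to reach its final coverage, divided by the
    runs the optimized flow needs to reach that same coverage."""
    target = baseline[-1]
    base_runs = runs_to_reach(baseline, target)  # always found: the last point meets it
    opt_runs = runs_to_reach(optimized, target)
    if opt_runs is None:
        raise ValueError("optimized flow never reaches baseline coverage")
    return base_runs / opt_runs

# Hypothetical cumulative coverage (%) per regression run
baseline = [40, 55, 63, 68, 71, 73, 74, 75, 75.5, 76]
optimized = [50, 66, 72, 76, 78]
print(compression(baseline, optimized))  # → 2.5
```

The same curves also show the other reported effect, more coverage in the same number of runs: after five runs the optimized flow is at 78% versus 71% for the baseline.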

Using a different set of baseline regression tests, the engineers experimented with the Intelligent Coverage Optimization (ICO) capability in Synopsys VCS. ICO enhances test diversity using reinforcement learning, resulting in faster regression turnaround time (TAT), faster coverage closure, higher achieved coverage, and discovery of more design and testbench bugs. ICO provides testbench visibility and analytics, including stimulus distribution histograms and diversity metrics. It also provides root cause analysis to determine the reasons for low coverage, such as skewed stimulus distribution or over- or under-constraining.
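ICO's reinforcement learning engine and its diversity metrics are internal to VCS and are not reproduced here. As a rough illustration of what a "skewed stimulus distribution" looks like numerically, the sketch below builds a histogram of one randomized field and scores it with normalized Shannon entropy, a generic diversity measure chosen for this example. The field, its bins, and the choice of metric are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence

# Hypothetical legal bins for one randomized field (e.g., a packet-size class)
BINS = ["small", "medium", "large", "jumbo"]

def stimulus_histogram(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Count how often each value was generated across a regression."""
    return dict(Counter(values))

def normalized_entropy(hist: Dict[Hashable, int], legal_bins: Sequence[Hashable]) -> float:
    """Shannon entropy of the observed distribution divided by log(#legal bins).
    1.0 means all legal bins were generated equally often; values near 0 mean
    the stimulus is concentrated in a few bins (skewed)."""
    total = sum(hist.values())
    if total == 0 or len(legal_bins) <= 1:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in hist.values() if c > 0)
    return h / math.log(len(legal_bins))

def unhit_bins(hist: Dict[Hashable, int], legal_bins: Sequence[Hashable]) -> List[Hashable]:
    """Legal bins the stimulus never produced; these may point to over-constraining."""
    return sorted(set(legal_bins) - set(hist))

skewed = stimulus_histogram(["small"] * 90 + ["medium"] * 10)
print(round(normalized_entropy(skewed, BINS), 2))  # → 0.23
print(unhit_bins(skewed, BINS))                    # → ['jumbo', 'large']
```

A low score together with never-generated legal bins is the kind of signal that would prompt a look at the constraints behind a coverage hole.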

As shown in the graph below, applying ICO, VSO.ai, and unreachability analysis achieved 17% more coverage in the same number of runs with a 3.5X compression of regression tests compared to the baseline. Four unique bugs were also uncovered.

[Figure: signoff coverage progress chart, Project C]

The NVIDIA team reported the following learnings from Project C:

  • Better functional, code, and assertion coverage in the same number of runs
  • Faster coverage closure, higher coverage, and better regression compression
  • More bugs discovered due to better exercise of the design

The SNUG presentation concluded with a summary of the results from the three chip projects. Unreachability analysis provided the single biggest gain, boosting coverage metrics by 10-20% with minimal effort. The combination of chip verification technologies resulted in up to 33% better functional coverage with 2-7X regression compression on all testbenches. The team also found that ICO uncovered unique bugs and that VSO.ai could be used across all project milestones.

The recommendation from the NVIDIA verification engineers is that test grading be used from the very beginning of the project to improve stimulus effectiveness. VSO.ai should be used for early milestones, when stimulus is immature, to achieve high regression compression, and continued through late-stage milestones for additional compression and for increasing the total coverage. Finally, ICO and unreachability analysis should be enabled in mid-project to reduce compute resources, left-shift coverage by at least one milestone, and find unique bugs earlier. The combined power of all four technologies will benefit any complex chip project.
