Optimizing Signal Distribution in FPGAs: A Guide to Automation with HAPS ProtoCompiler

Christoph Kuznik, Rutger Carlsson

Mar 14, 2021 / 7 min read

Abstract

The synchronous distribution of time-critical signals in multi-FPGA environments is a challenging part of setting up a prototype successfully, in particular with regard to overall signal delay and integrity. Moreover, whenever design partitions change during the development cycle, the distribution has to be re-assessed to ensure correct synchronization of the signal and its load.

In this article we show how HAPS ProtoCompiler can ease this process by automating the synchronous distribution of signals across FPGA partitions. We focus on the most common use model of the feature, namely reset synchronization, which also lent its name to the new HAPS ProtoCompiler reset_synchronize PCF command.

Introduction

Among the use models of the synchronous distribution of signals across multiple FPGA partitions, the system reset distribution is of main interest.

A reset signal is a high fan-out net that is being distributed in an FPGA device similar to a clock net using buffer trees. As such, it exhibits insertion delay with various depths of the buffer tree contributing to reset network skew. To get a reliable release of the reset, all design flip-flops must be synchronized to the appropriate clock domain and the reset release must be distributed so that it occurs within the same clock cycle without timing errors for any flip-flop. Reset assertion may not be timing-critical, but the release of a reset signal is always a timing-critical event, which makes proper synchronization a challenge.

Time-critical events need synchronous distribution

Without implementing proper synchronization, incorrect reset-release behavior or meta-stability can occur within multi-FPGA environments even if using a global-system reset that is considered to be synchronous to a specific system clock domain.

For a large design partitioned into multiple FPGAs, each individual FPGA device exhibits different reset-tree depth and internal reset network skew due to differences such as partitioning size, place-and-route constraints, or logic structure differences. As the internal clock-tree depth and the clock-network skew for any one FPGA may not be consistent, the complete design reset tree, which spans all FPGA reset sub-trees, may not guarantee reset release within one clock cycle for all flip-flops. The same scenario can occur if synchronizers are replicated, if design sub-portions are separately triggered by multiple copies of reset signals, or if the reset is not synchronized to the clock domain where it is operating.

An asynchronous reset that happens to be released at a point in time such that all flip-flops in the design receive reset release within the same clock cycle and with correct setup timing behaves correctly, but purely by chance. The same asynchronous reset released at some other point in time, such that reset release occurs on or near a clock edge for some flip-flops in some FPGA or for some clock domain, may result in erratic design behavior.

Figure 1 illustrates the three possible scenarios. Flip-flops receiving reset in the green area become active on the first trailing clock edge, whereas flip-flops receiving reset in the red area become active one clock cycle later on the second trailing clock edge, resulting in erratic behavior.

Figure 1: Example for erratic reset signal distribution effects


Flip-flops receiving reset in the intermediate orange section are difficult to define as they violate setup timing and may be subject to reset meta-stability. Also, multi-FPGA systems may not specify or guarantee the reset insertion delay time at the system level, illustrated with question mark ‘?’ in Figure 1. Missing or incorrect timing constraints, and incorrect use without synchronization in non-related asynchronous clock domains are the primary reasons for failing to meet reset conditions. The resulting erratic circuit behavior may not be apparent from the timing reports.
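The three regions of Figure 1 can be reproduced with a small timing model. The sketch below is only an illustration: the clock period and the setup and hold windows are assumed values, not figures from any HAPS system, and the function names are hypothetical. It reports the clock edge on which a flip-flop leaves reset, or None when the release lands in the window around an edge (the orange region).

```python
from typing import Iterable, Optional

def release_edge(arrival_ps: int, period_ps: int,
                 setup_ps: int, hold_ps: int) -> Optional[int]:
    """Return the index of the clock edge on which a flip-flop leaves reset.

    Clock edges are assumed at k * period_ps. None means the release falls
    into the setup/hold window of an edge, so the outcome is undefined.
    """
    edge = -(-arrival_ps // period_ps)               # ceil division: first edge at or after arrival
    to_next = edge * period_ps - arrival_ps          # time left before that edge
    since_prev = arrival_ps - (edge - 1) * period_ps # time since the previous edge
    if to_next < setup_ps or since_prev < hold_ps:
        return None
    return edge

def release_is_coherent(arrivals_ps: Iterable[int], period_ps: int,
                        setup_ps: int, hold_ps: int) -> bool:
    """True only if every flip-flop leaves reset cleanly on the same edge."""
    edges = {release_edge(t, period_ps, setup_ps, hold_ps) for t in arrivals_ps}
    return None not in edges and len(edges) == 1

# Assumed numbers: 10 ns clock, 500 ps setup window, 300 ps hold window.
PERIOD, SETUP, HOLD = 10_000, 500, 300
print(release_edge(4_000, PERIOD, SETUP, HOLD))    # released on edge 1
print(release_edge(9_800, PERIOD, SETUP, HOLD))    # inside the setup window
print(release_edge(12_000, PERIOD, SETUP, HOLD))   # one cycle later, edge 2
print(release_is_coherent([4_000, 12_000], PERIOD, SETUP, HOLD))
```

With these assumed values, two flip-flops whose reset arrives at 4 ns and 12 ns leave reset on different edges, which is exactly the erratic case the figure describes.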

Distribution Challenge

Distribution of global signals that must be synchronous across the entire design, such as a synchronous reset signal, is one of the challenges when a design is partitioned into multiple FPGAs.

  • An RTL design with a single reset input can only have one reset input after being partitioned (i.e., the system reset input is applied to only one FPGA). From this single FPGA, the reset must be internally distributed within the partitioned design to all other FPGAs where it is needed. As the incoming reset is considered asynchronous, it must also be synchronized to the clock domain in which it operates.
  • A design with multiple reset signals must treat each reset individually in terms of distribution and synchronization to their respective clock domains. Also, as today's FPGAs are large, signal distribution through an FPGA must implement flip-flop pipeline structures to maintain maximum possible clock frequency.

In both scenarios the newly introduced HAPS ProtoCompiler PCF command reset_synchronize can help to automate this process to a great extent.

Signal Distribution with HAPS ProtoCompiler reset_synchronize command

The signal targeted for synchronization is specified with the reset_synchronize PCF command during the partition step. In its simplest form, the command takes the following arguments:

reset_synchronize -toplevel_net <netName> -clock <clockName> -init 0|1

where -toplevel_net specifies the signal to be synchronized and -clock the respective clock net. The -init argument defines the initial value of the flip-flops. The signal and clock nets can be either top-level FPGA ports or internal nets, but these nets must exist at the top level of the design hierarchy. Both active-low and active-high signals are supported, as well as internal signals that are already synchronous. The original signal behavior is maintained without altering the original RTL, but a propagation delay is introduced on the distributed net.
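When the constraint file is generated by a script, the simplest form of the command can be assembled from its three arguments. The helper below is a hypothetical sketch, not part of ProtoCompiler: it only formats the argument order shown above, uses the net names reset and clk from the example later in this article, and picks -init 0 arbitrarily. Whether net names need any quoting in a PCF file is not covered here.

```python
def reset_synchronize_cmd(toplevel_net: str, clock: str, init: int) -> str:
    """Format the simplest form of the reset_synchronize PCF command.

    Hypothetical helper: it emits only the three arguments documented in
    this article and validates that -init is 0 or 1.
    """
    if init not in (0, 1):
        raise ValueError("init must be 0 or 1")
    return (f"reset_synchronize -toplevel_net {toplevel_net} "
            f"-clock {clock} -init {init}")

# Net and clock names follow the article's example; the init value is assumed.
print(reset_synchronize_cmd("reset", "clk", 0))
# prints: reset_synchronize -toplevel_net reset -clock clk -init 0
```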

In the partition step, a single initial top-level flip-flop of the delay chain is inserted and replicated to each partition in which loads of the net are located in the final partitioning. During the system route step, a synchronous distribution tree is determined, depending on the target system and the available resources.

Note that reset_synchronize replicates the initial flip-flop into the partitions (bins) that hold loads of the specified net. This automatic load deduction can be altered with the following optional command arguments.

 

 

  • -repl_bins: replicate the initial top-level flip-flop into the listed FPGA bins
  • -force_repl: replicate the initial top-level flip-flop to all non-locked FPGA bins, regardless of loads on the signal

 

These optional arguments are useful if routing is to be allowed over locked FPGA bins, or over FPGA bins that have no load on the reset net themselves but are the only valid connection to other FPGA bins with load in the target system. By default, the distributed and synchronized net is not considered for time domain multiplexing (TDM) unless explicitly specified otherwise.

Taking a four-FPGA HAPS-80 S104 system as an example, we apply reset_synchronize to the net reset, whose load is partitioned to FPGAs FB1.uA and FB1.uB. The clock net clk is used for synchronization. The partition schematic includes an initial synchronization flip-flop at the top level of every partition in which the net has load, as can be seen in Figure 2.

Figure 2: Partition schematic shows the replicated initial flip-flop for synchronization in system route step


The identifiers of the initial flip-flop and of the added delay flip-flops for synchronization are derived from the originally supplied net. In the system route step, a route and the corresponding synchronization tree are calculated and applied, depending on route resources and constraints. Figure 3 shows the resulting synchronization for the given example. Here, a total of five flip-flops are introduced on each load path, of which reset_3 in FB1.uA and reset_0 in FB1.uB are IOB placed. This leads to an overall delay of 5 clock cycles until the reset reaches its load, regardless of whether the load is partitioned to FB1.uA or FB1.uB.
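The relation between chain depth and delay can be checked with a simple shift-register model. This is a sketch under stated assumptions: the reset is taken to be active-high, every flip-flop starts at an init value of 1, and the function name is hypothetical. Only the depth of five flip-flops per load path comes from the example above.

```python
def edges_until_release(depth: int, init: int = 1) -> int:
    """Count the clock edges a released reset needs to pass `depth` flip-flops.

    Assumes an active-high reset: all stages start at `init` (asserted) and
    the released value 0 shifts in by one stage on every clock edge.
    """
    regs = [init] * depth
    edges = 0
    while regs[-1] != 0:
        regs = [0] + regs[:-1]   # one clock edge: shift the chain by one stage
        edges += 1
    return edges

# Depths per load path as in the Figure 3 example: five flip-flops each.
paths = {"FB1.uA": 5, "FB1.uB": 5}
latency = {fpga: edges_until_release(depth) for fpga, depth in paths.items()}
print(latency)   # both paths release after the same number of edges
```

Because both paths have the same depth, the loads in FB1.uA and FB1.uB see the release in the same clock cycle. A path with a different depth would release in a different cycle, which is what the balanced distribution tree avoids.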

Figure 3: System route schematic shows the initial flip-flops and added synchronization delay chain


To support scripting in a wide range of applications, the command also supports single-FPGA boards. In this case no replications are generated. Because the single FPGA is a leaf node, two flip-flops (1x IOB-aware, 1x center) are still inserted to ensure high signal integrity.

Whenever new partitions are found during the flow, or new routing constraints are being defined, the synchronization will be updated in partition and system route runs.

The results of the HAPS ProtoCompiler reset_synchronize PCF command can be monitored in partition.rpt and system_route.rpt. Table 1 shows the resulting entries for the given partitioning and system route example. For the partition run of the example, the net to be synchronized had load on FB1.uA and FB1.uB, as can be seen from the replication in Section 5.3 of partition.rpt. Furthermore, Section 3 of system_route.rpt lists the number of synchronization pipeline stages and how many flip-flops have been added in the system route step (-bins), as well as how many synchronization-related flip-flops have been added in total for the FPGA partitions (-no_of_regs).

Table 1: Log file entries for HAPS ProtoCompiler reset_synchronize PCF command


More information about reset_synchronize is available within the HAPS ProtoCompiler Command Reference document, section 3.

Clues and Remarks

If custom combinational logic such as IBUFG primitives or logic inverters is present in front of the reset load, use reset_synchronize only with the net that connects to the actual reset load. In general, the net specified to the command should always be the net that actually drives all loads which might get assigned to the available non-locked FPGA partitions.

Moreover, reset_synchronize requires a unique net for each clock domain.

Summary

In this article we showed how to use the recently introduced HAPS ProtoCompiler PCF command reset_synchronize to synchronously distribute signals across FPGA partitions in an automated fashion. With this flow, the HAPS ProtoCompiler user can easily generate proper global reset scheme implementations, which eliminates one source of unpredictable design behavior. In particular, in the use model of reset synchronization, the release of reset is covered by static timing analysis within its clock domain. Moreover, introducing a synchronizer before signal distribution has the additional advantage of effectively eliminating meta-stability issues on the original net.
