Ralph Grundler, CAE Manager
One of the most common support questions asked about DesignWare Cores is, "What FIFO size should I select for my design?" The question sounds simple at first glance, but its complexity quickly becomes apparent. The answer is a balance of the system latency, the data bandwidth of the system bus, the I/O protocol overhead, and the data bandwidth of the I/O protocol bus the user is connecting to, with the IP FIFO buffering in between. In addition, every FIFO size decision requires the user to understand the tradeoff between size (gate count) and performance (throughput).
Compound this with the fact that the user of the core needs to make this hardware configuration decision very early in the design process, and the question can become almost impossible to answer because the designer does not yet know all the details of the system (or of the I/O protocol, for that matter). Since the cores are designed to accommodate all the system and product tradeoffs, the design choices for FIFOs may become overwhelming. This article explains the basic thought process and the different strategies needed to determine the right FIFO sizes for the various DesignWare Cores.
The first concept that needs to be established is the design goals of the product. Hopefully your marketing team has given you this data, but if not, you need to request it. Does the product need to support the maximum bandwidth or the minimum gate count? If gate count is the most important factor, then selecting the minimum FIFO size will suffice. Be sure to check the documentation to make sure the core will work at that FIFO size. Sometimes the products provide so much flexibility that they allow the user to configure the core into an undesirable state.
Selecting the maximum FIFO size is appropriate when nothing is known about the system, protocol, or latencies, and the user would like to achieve the best possible bandwidth in all cases of data transmission. You can always go back and reconfigure the FIFO size after you get some system-level simulations running and can experiment with the bandwidth results. You can also use the default configuration to get to that point, but be aware that it may not be useful for your application. Typically the design needs something in between, and the user needs to go a little deeper into the design constraints and limitations.
After the design goals have been addressed, the user needs to understand the system limitations. First, the user needs to calculate a good estimate of the system latency. These latencies could include arbitration, bridges, memory access, interrupts, etc. The system latency plus the buffer-filling capability of the system minus the protocol overhead should be less than or equal to the desired protocol bandwidth, or
system latency + buffer fill - protocol overhead <= protocol bus bandwidth
This keeps the protocol side of the core from emptying the buffers faster than the system can fill them, which avoids a FIFO underrun, and keeps the protocol side from filling the buffers faster than the system can empty them, which avoids a FIFO overrun.
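As a rough illustration of the underrun side of this relationship, the sketch below estimates how many bytes a transmit FIFO must already hold so that the protocol side can keep draining while the system side is stalled. It is a back-of-the-envelope model, not a formula from any DesignWare databook: the function name `min_fifo_bytes`, the 4-byte word width, and the example rate and latency are all assumptions chosen for illustration.

```python
def min_fifo_bytes(protocol_rate_bytes_per_s, system_latency_ns, word_bytes=4):
    """Estimate the bytes a FIFO must hold to ride out one system stall.

    Assumption: the protocol side drains at a constant rate and the system
    side delivers nothing for system_latency_ns. Integer math avoids
    floating-point rounding surprises.
    """
    # Bytes drained during the stall, rounded up to a whole byte.
    drained = -(-(protocol_rate_bytes_per_s * system_latency_ns) // 1_000_000_000)
    # Round up to a whole number of FIFO words (hypothetical word width).
    words = -(-drained // word_bytes)
    return words * word_bytes


# Hypothetical numbers: a 60 Mbyte/s protocol link and 2 us of system latency.
print(min_fifo_bytes(60_000_000, 2_000))  # 120
```

Protocol overhead and any safety margin are left out on purpose; a real estimate would add them on top of this minimum.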
The next system detail the user needs to consider is the system bus bandwidth. The bandwidth on the system side should be higher than on the protocol side unless, of course, you are not trying to achieve the maximum bandwidth on the protocol bus. The faster the system bus, the less latency there is in filling the FIFOs and transmitting on the protocol bus. If the system can fill the FIFOs faster, it can make up for some of the system latency.
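The effect of a faster system bus can be seen in a small tick-based model of a transmit FIFO. This is only a sketch under simplifying assumptions: the FIFO starts full, the system side writes a fixed number of bytes per tick except during stalls, and the protocol side tries to read a fixed number of bytes every tick. The function name, the depth, the rates, and the stall windows are all hypothetical.

```python
def count_underruns(depth, fill_per_tick, drain_per_tick, stalled_ticks, total_ticks):
    """Count ticks on which the protocol side finds too little data."""
    level = depth  # assumption: the FIFO starts full
    underruns = 0
    for tick in range(total_ticks):
        # System side: fill unless stalled, never beyond the FIFO depth.
        if tick not in stalled_ticks:
            level = min(depth, level + fill_per_tick)
        # Protocol side: drain if enough data is available, else underrun.
        if level >= drain_per_tick:
            level -= drain_per_tick
        else:
            underruns += 1
    return underruns


# Two hypothetical 10-tick system stalls in a 60-tick window.
stalls = set(range(10, 20)) | set(range(40, 50))

matched_bus = count_underruns(64, 4, 4, stalls, 60)  # system rate == protocol rate
faster_bus = count_underruns(64, 8, 4, stalls, 60)   # system rate == 2x protocol rate
print(matched_bus, faster_bus)  # 5 0
```

With matched rates, the FIFO never regains the data it lost during the first stall, so the second stall causes underruns. With the faster system bus, the FIFO refills between stalls and the same depth is sufficient.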
After these calculations have been reviewed, the user can figure out what type of data flow the system can handle. "Store and Forward" works as the name implies. The data is stored in a local buffer (the FIFOs) in the core until all of the data is received and the data integrity is checked, and it is then forwarded to memory by the system or a DMA controller. The clear advantage of this type of data flow is that the system does not need to handle corrupt data, and if the system is not ready for the data, the data can wait in the local buffer. The disadvantage of Store and Forward is that the FIFOs need to be large enough to hold the largest possible packet, and hence some systems that are strict on size will use a "Cut-Through" data flow.
In the Cut-Through data flow, the FIFOs can be much smaller but still need to be large enough to handle system latencies. If the total system latencies are not understood, it is safest to use Store and Forward. (Note that this still does not guarantee maximum data throughput on the protocol side of the core.) Cut-Through does not allow the core to check the data integrity before the data is sent to the system, so the system needs to be able to discard or resend data if the core signals that the data was corrupted.
In general, if the system bus is lightly loaded and there are not a lot of system delays, the preferred data flow is Cut-Through; if the user does not know these things at design time, it is safer to use Store and Forward. In some cases, because of restrictions on the protocol side of the bus or in standard software, the core does not offer this selection or the software does not allow it to be used.
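The sizing difference between the two data flows can be sketched as follows. This is an illustrative lower bound only, assuming that Store and Forward must hold one maximum-size packet and that Cut-Through must absorb the data arriving during the worst-case system latency. The function name and the example numbers (a 1518-byte frame, a 125 Mbyte/s link, 2 us of latency) are assumptions for illustration and do not come from a DesignWare databook.

```python
def fifo_bytes_needed(mode, max_packet_bytes, protocol_rate_bytes_per_s, system_latency_ns):
    """Rough lower bound on FIFO size for each data flow (illustrative only)."""
    if mode == "store_and_forward":
        # The whole packet is buffered before it is checked and forwarded.
        return max_packet_bytes
    if mode == "cut_through":
        # Only the data arriving while the system is busy must be buffered.
        return -(-(protocol_rate_bytes_per_s * system_latency_ns) // 1_000_000_000)
    raise ValueError(f"unknown data flow mode: {mode}")


# Hypothetical receive path: 1518-byte frames, 125 Mbyte/s link, 2 us latency.
saf = fifo_bytes_needed("store_and_forward", 1518, 125_000_000, 2_000)
cut = fifo_bytes_needed("cut_through", 1518, 125_000_000, 2_000)
print(saf, cut)  # 1518 250
```

The Cut-Through figure is only as good as the latency estimate behind it, which is why Store and Forward is the safer choice when the system latencies are not yet understood.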
Various DesignWare Cores databooks and SolvNet articles attempt to take some of the guesswork out of sizing the FIFOs, but with so many variables it still takes some skill on the engineer's part. Since the issues are different for each protocol, the issue of FIFO sizing or configuration is different for each core. The following is a brief overview of configuring the FIFOs of different products.
DesignWare Ethernet Core FIFO sizing is fairly conventional, as there are separate FIFOs for the TX and RX paths. The user needs to consider whether to use a Store and Forward data flow or whether their application can handle the extra work of a Cut-Through data flow.
For the DesignWare USB 2.0 Host Controller, there are several options for the customer to consider when planning buffering. The basic configuration options are Config1 and Config2, and there are various options within each configuration. Config1 is the smaller, lower-performance configuration option and consists of a single FIFO that can be configured for size and threshold levels. In general, a common starting point is 512 bytes or 1 Kbyte with the threshold registers set at the maximum size of the FIFO. More information on thresholding is at: https://solvnet.synopsys.com/retrieve/016685.html.
Keep in mind that thresholding levels can only be modified by the host controller driver. If you are using standard drivers (from Microsoft, for example), select the threshold needed when configuring the core (it becomes the reset default), because you cannot change the value once the host driver has started. Config2 is for higher-performance systems and can buffer up to 4k of data and descriptors. This configuration can give maximum performance, but at a higher cost in terms of gate count. See Appendix D of the DesignWare USB 2.0 Host Controller Subsystem-AHB databook for more details on the timing and advantages of each configuration.
When considering the configuration of the DesignWare Hi-Speed USB OTG Controller FIFOs, the user needs to understand that the configuration options have been designed to be very efficient with gates by reusing logic. This makes the calculations more complex and requires the user to better understand the USB transfers supported; however, it rewards the user with a low gate count design. Sharing the FIFOs between the Device and Host functions saves gates, but it also implies that the user should select the same FIFO sizes for both functions to get the maximum performance for that configuration. Basically, the number of device endpoints corresponds to the number of endpoints the core can support in Host mode. There are more details on this in the article at: https://solvnet.synopsys.com/retrieve/016804.html
The DesignWare IP for PCI Express Core has options similar to those of the other cores but offers one more configuration for receive buffering. Transmit buffering is Bypass only: the data from the application is transmitted directly on the PCIe link, and a copy of the data is saved in the Retry Buffer as required by the specification. For receive buffering there are three options. With Store and Forward, the core automatically returns credits and drops error packets, and the application can throttle data. For lower latency the user can select Cut-Through. In this configuration the application must discard the data in the case of an error, and the application can do only limited throttling of data. If Bypass (no FIFO) is selected, the application needs to handle error packets and cannot throttle data, but this selection has the lowest receive latency. Because many other options in the PCI Express configuration influence the sizing of the FIFOs (also called RAMs, buffers, or queues in the PCI Express databook), the user is given the option to autosize the FIFOs. Autosize is a good starting point for testing the system. Users can always overwrite these suggested values with their own selections.
FIFO sizing is an important design consideration that involves choices between size, system design, and data flow. Please take the time to consider this configuration option carefully. The configuration options are powerful, and in the end you have created Verilog that can only be modified by rerunning the configuration. Sometimes the cores give the user the option to dynamically resize or reassign the memory through software, but the physical memory size stays the same. Keep in mind that once the chip is made, the memory cannot be physically made bigger or smaller. This may sound obvious, but it is often overlooked in the hurry to tape out. So spend the time to understand the system, the software being used, and the I/O protocol to make sure the design meets the original goals of the product.