1. Introduction
DSP blocks, which are dedicated to multiplication and addition, are hard macros. Even so, if the design is not mapped properly onto their resources, they can limit the operating frequency. This article describes how to improve a circuit that fails to meet timing even though it uses a DSP block.
This document uses an Intel® Arria® 10 FPGA circuit as an example.
2. About the DSP Block
The structure of the DSP block is shown in Figure 1. The DSP block has various functions, but only the functions that are needed are enabled. In this example, the adder, the coefficient ROM, and all registers other than the input and output registers are unused.
Figure 1: DSP Block (18x19 in fixed-point multiplier mode)
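As a small worked example of what the 18x19 mode in Figure 1 implies for signal widths, the sketch below computes how many bits the product needs. It assumes both operands are signed two's-complement values, which the article does not state; the helper names are illustrative only.

```python
def signed_range(bits):
    """Return (min, max) of a two's-complement integer with the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def bits_needed_signed(value):
    """Smallest two's-complement width that can hold value."""
    bits = 1
    while not (signed_range(bits)[0] <= value <= signed_range(bits)[1]):
        bits += 1
    return bits


# Assumption: one operand is 18 bits wide and the other 19 bits, both signed.
A_BITS, B_BITS = 18, 19
a_min, a_max = signed_range(A_BITS)
b_min, b_max = signed_range(B_BITS)

# The extreme products occur at the corners of the two operand ranges.
corner_products = [a * b for a in (a_min, a_max) for b in (b_min, b_max)]
product_bits = max(bits_needed_signed(p) for p in corner_products)
print(product_bits)
```

Under these assumptions the full-precision product needs as many bits as the two operand widths added together, which is worth keeping in mind when sizing the signal that receives the multiplier output.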
3. Example: Configuration in which the internal memory output is directly input to the DSP block
3-1. Circuit Configuration
When the internal memory output is multiplied without first passing through a flip-flop (FF), the input register of the DSP block is not used. The path from the memory into the multiplier becomes longer, and Fmax drops.
Note that a shift register description may also be implemented in internal memory, so check the compilation result.
Figure 2: Circuit Configuration
3-2. Check the Timing Error Locations with the Timing Analyzer
Run "Report Timing" in the Timing Analyzer and check the names of the failing paths.
Figure 3: Checking the Failed Path Name
3-3. Graphically check the circuit structure
The circuit structure can be checked graphically in the Technology Map Viewer and Resource Property Viewer.
Step 1: Open the Technology Map Viewer from the path you want to check
- On the Timing Analyzer screen, right-click the line of the path you want to check → Locate Path → Locate in Technology Map Viewer
Step 2: Select the DSP block and open the Resource Property Viewer
- In the Technology Map Viewer, select the DSP block cell and right-click → Locate Node → Locate in Resource Property Viewer
The input registers in the DSP block are unused (shown dimmed, with no blue border), and the input signals bypass them.
Figure 4: Graphical confirmation of the circuit structure
3-4. Implement Countermeasures
Change the circuit so that the input register (DFF) of the DSP block is used.
Add one or more DFF stages between the internal RAM and the multiplier. The intent is for Register Packing to place the added DFF into the input register of the DSP block.
When adding DFFs, make sure that the resulting change in latency does not cause a problem.
If the RAM is used for shift registers, adjust the length of the internal memory section so that the overall number of stages does not change.
If the RAM is used for other functions (FIFO, coefficient/function table, etc.), adjust the read timing and the timing of using multiplication results to maintain the overall functionality.
Figure 5: Adding DFF
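The latency bookkeeping for the shift-register case can be checked with a small cycle-by-cycle model. The sketch below is a behavioral illustration in Python, not the evaluation circuit's HDL. The class and function names are hypothetical, the coefficient is an arbitrary constant, and the multiplier is modeled as purely combinational, with its output register ignored.

```python
from collections import deque


class ShiftRegister:
    """N-stage shift register: clock() returns the input from N clocks earlier."""

    def __init__(self, stages):
        self.regs = deque([0] * stages, maxlen=stages)

    def clock(self, din):
        dout = self.regs[0]    # value currently at the output of the last stage
        self.regs.append(din)  # shift in the new value; the oldest one drops out
        return dout


def run_before(samples, coeff):
    """16 RAM-based stages feeding the multiplier directly."""
    shift_reg = ShiftRegister(16)
    return [shift_reg.clock(x) * coeff for x in samples]


def run_after(samples, coeff):
    """15 RAM-based stages, then 1 DFF stage, then the multiplier."""
    shift_reg = ShiftRegister(15)
    dff = ShiftRegister(1)
    return [dff.clock(shift_reg.clock(x)) * coeff for x in samples]


samples = list(range(1, 41))
before = run_before(samples, coeff=3)
after = run_after(samples, coeff=3)
print(before == after)  # same values on the same clock cycles
```

Shortening the memory section by exactly the number of DFF stages added keeps the total delay at 16 clocks, so downstream logic sees no change. For a FIFO or a coefficient table the same kind of check applies, but the read timing must be adjusted instead of the shift-register length.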
3-5. Check the Results of the Countermeasure with the Timing Analyzer
The Fmax value improved from 302.21 MHz before the countermeasure to 372.21 MHz, satisfying the target frequency of 333 MHz.
Figure 6: Timing Analyzer screen
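To relate the reported Fmax values to the clock constraint, it can help to convert them to periods. The following sketch uses only the three frequencies quoted above. It assumes a single clock and treats slack as the simple difference between the target period and the period implied by Fmax, which is an approximation of what the Timing Analyzer reports.

```python
TARGET_MHZ = 333.0
FMAX_BEFORE_MHZ = 302.21
FMAX_AFTER_MHZ = 372.21


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(fmax_mhz, target_mhz=TARGET_MHZ):
    """Approximate slack; a positive value means the target clock is met."""
    return period_ns(target_mhz) - period_ns(fmax_mhz)


for label, fmax in (("before", FMAX_BEFORE_MHZ), ("after", FMAX_AFTER_MHZ)):
    print(f"{label}: period {period_ns(fmax):.3f} ns, slack {slack_ns(fmax):+.3f} ns")
```

With these numbers the slack changes sign from negative to positive, which is the same conclusion as comparing Fmax against the 333 MHz target directly.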
3-6. Graphical Confirmation of the Structure of the Circuit after Countermeasures
Checking the result of the countermeasure graphically shows that the input registers in the DSP block are now used (shown with blue borders). The internal memory (RAM) output is captured by the input register in the DSP block, so Register Packing was performed as expected.
Figure 7: Graphical Confirmation of the Circuit Structure
4. Reference: Evaluation Circuit
4-1. Circuit before Countermeasure (RAM-based shift register + DSP block multiplication)
Shift register: 16 stages, RAM-based
Multiplier: DSP block (using Native Fixed Point DSP)
Figure 8: Circuit before countermeasure (RAM-based shift register + DSP block multiplication)
4-2. Circuit after Countermeasure (RAM-based shift register + DFF + DSP block multiplication)
Shift register: 15 stages, RAM-based + 1 DFF stage (16 stages in total, unchanged)
Multiplier: DSP block (using Native Fixed Point DSP) (no change)
During compilation, the DFF is packed into the DSP block as an input register, which improves timing.
Figure 9: Circuit after countermeasures
5. Conclusion
This article presented an example of a design that uses a DSP block but fails to meet timing, along with the analysis method and the countermeasure. We hope it will be of some help to you.