In this column, we introduce "FPGA technical information that is surprisingly little known but makes a difference if you know it."
The contents are useful for everyone from FPGA beginners to veterans, so please stay with us until the end.
Part 8: How to Reduce Operating Frequency (F) and Toggle Rate (N)
This part examines how to reduce "F" (operating frequency) and "N" (toggle rate), the terms in the power consumption formula that affect dynamic power.
The key to reducing "F" and "N" is simple: do not let circuits switch unless they have to.
Clock gating
Clock gating stops the clock to save the power consumed by the clock net, multiplexers, registers, and the combinational logic downstream of those registers. Keep two points in mind when using clock gating:
- Stop the clock as close to its source as possible.
- Use a clock-gating method that the FPGA vendor supports.
Gate the clock at the root of the clock line, before it branches off. The savings are small unless a single gate stops as much clock wiring and as many registers as possible.
Also, do not stop clock lines with AND gates or similar logic. The result is not a synchronous circuit, so timing cannot be guaranteed, and clock glitches may occur. Always use the clock-gating method recommended by the FPGA vendor.
Intel provides a PLL that supports clock gating.
Please use this PLL since it satisfies the above two points and is highly effective.
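Why AND-gate clock gating can glitch is easiest to see in a small sampled-waveform model. This is only an illustrative sketch, not Intel's clock-gating PLL: the latch-based gate below is the generic integrated-clock-gate idea from ASIC design, and the waveform lengths and the moment the enable changes are arbitrary assumptions.

```python
# Illustrative sampled-waveform model (not vendor code): gating a clock with
# a plain AND gate versus a latch-based gate when the enable is asynchronous.

def and_gate(clk, en):
    """Gated clock = clk AND en, with en free to change at any time."""
    return [c & e for c, e in zip(clk, en)]

def latch_gate(clk, en):
    """Latch-based gate: en is captured only while clk is low, so the
    gating signal cannot change while the clock is high."""
    out, q = [], 0
    for c, e in zip(clk, en):
        if c == 0:
            q = e  # latch is transparent while the clock is low
        out.append(c & q)
    return out

def pulse_widths(wave):
    """Widths (in samples) of each high pulse in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths

# Clock: 8 samples per period, high for 4. The enable rises in the middle
# of the first high phase (sample 6), as an unsynchronized signal might.
clk = [0, 0, 0, 0, 1, 1, 1, 1] * 3
en = [0] * 6 + [1] * 18

print(pulse_widths(and_gate(clk, en)))    # [2, 4, 4]: the first pulse is a runt
print(pulse_widths(latch_gate(clk, en)))  # [4, 4]: only full-width pulses
```

The shortened first pulse from the AND gate is exactly the kind of clock glitch that can corrupt registers; the vendor's dedicated clock-gating resources exist to avoid it.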
Data Enable
For devices with clock-gating restrictions, or where data transitions are more frequent than clock transitions, use a data enable so that the data signal is passed on only when it is needed, stopping unnecessary transitions.
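The effect of a data enable can be sketched by counting the bit toggles a register passes to its downstream logic. This is a hypothetical behavioral model: the function names and sample values are assumptions, and toggles are counted only on the register output.

```python
# Hypothetical model: bit toggles a register passes downstream,
# with and without a data enable.

def toggles(a: int, b: int) -> int:
    """Number of bit positions that differ between two bus values."""
    return bin(a ^ b).count("1")

def downstream_toggles(samples, enables=None):
    """Toggles on a register's output over a sequence of clock edges.

    samples: value on the register's D input at each clock edge
    enables: optional per-edge bools; when given, the register loads D only
             when its enable is True and otherwise holds its value
    """
    q, total = 0, 0  # register output starts at reset value 0
    for i, d in enumerate(samples):
        load = enables is None or enables[i]
        nxt = d if load else q
        total += toggles(q, nxt)
        q = nxt
    return total

samples = [0b1010, 0b0101, 0b1111, 0b0000]
print(downstream_toggles(samples))                              # 12
print(downstream_toggles(samples, [True, False, False, True]))  # 4
```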
Use of PLL
For blocks that can run at a lower operating frequency, use a PLL to generate a lower-frequency clock.
Resource Sharing
Resource sharing reduces the number of operations, and therefore transitions, by factoring out a common term.
For example, rewriting Z = AB + AC as Z = A(B + C) removes one multiplier, along with the transitions it would generate.
Figure 1: Tree Type (Z = AB + AC)
Figure 2: Chain Type (Z = A(B + C))
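As a quick check of the factoring above, the sketch below confirms that both forms compute the same Z and counts the operators each one needs. It treats the expressions as software arithmetic; the helper name `count_ops` is hypothetical.

```python
import ast
import random

def count_ops(expr: str) -> dict:
    """Count multipliers and adders in an arithmetic expression."""
    counts = {"mul": 0, "add": 0}
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.BinOp):
            if isinstance(node.op, ast.Mult):
                counts["mul"] += 1
            elif isinstance(node.op, ast.Add):
                counts["add"] += 1
    return counts

unfactored = "a*b + a*c"  # Z = AB + AC
factored = "a*(b + c)"    # Z = A(B + C)

# Both forms compute the same Z (distributive law), checked on random inputs.
for _ in range(1000):
    env = {k: random.randrange(256) for k in "abc"}
    assert eval(unfactored, {}, env) == eval(factored, {}, env)

print(count_ops(unfactored))  # {'mul': 2, 'add': 1}
print(count_ops(factored))    # {'mul': 1, 'add': 1}
```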
Resource Non-Sharing
Resource sharing causes transitions every time the shared resource switches between operands, even if the input data itself does not change.
When the input data rarely changes, non-shared resources consume less power because no switching operation is required.
Figure 3: Resource Sharing
Figure 4: Resource Non-Sharing
For example, in the resource-sharing circuit (Figure 3), power is consumed not only by the added multiplexer but also every time the adder's inputs are switched (even if A and B do not transition).
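The sharing trade-off can be put in numbers with a toy toggle model. It assumes one cost unit per operand-bit toggle and ignores the multiplexer's own power and the adder's internals; the operand pairs and names are hypothetical, not taken from the figures.

```python
# Toy model: operand-bus toggles of one shared adder vs. two dedicated adders.

def toggles(x: int, y: int) -> int:
    return bin(x ^ y).count("1")

def operand_toggles(seq):
    """Total bit toggles on an adder's two operand buses over a sequence of (x, y)."""
    return sum(toggles(p[0], q[0]) + toggles(p[1], q[1])
               for p, q in zip(seq, seq[1:]))

def shared(pair1, pair2):
    # One adder time-multiplexed by a mux: it sees pair1, then pair2, then pair1, ...
    interleaved = [pair for both in zip(pair1, pair2) for pair in both]
    return operand_toggles(interleaved)

def dedicated(pair1, pair2):
    # Two adders, each keeping its own operands
    return operand_toggles(pair1) + operand_toggles(pair2)

# Input data that never changes: (3, 5) and (12, 10) every cycle.
pair1 = [(3, 5)] * 4
pair2 = [(12, 10)] * 4
print(shared(pair1, pair2))     # 56
print(dedicated(pair1, pair2))  # 0
```

Even though no input data changes, the shared adder's operands flip every cycle; with frequently changing data the gap narrows, which is why the choice depends on the input statistics.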
Glitch Reduction in Arithmetic Circuits
Arithmetic circuits generate a lot of internal glitches before the output is determined.
Power consumption can be reduced by reducing these glitches.
(a) Align input timing
Aligning the input timing of the arithmetic circuit can reduce internal glitches.
- Add a register to each input of the arithmetic circuit.
→ All inputs then arrive at the same time, which reduces glitches inside the arithmetic circuit.
- Reorder operations across blocks to reduce the timing skew of the signals arriving at the target block.
- Change a chain structure into a tree structure.
→ The number of logic levels decreases, so the input arrival times become closer.
For example, a 4-bit chain-type adder toggles about 45% more because of glitches than a 4-bit tree-type adder.
Figure 5: Chain Type (Z <= A + B + C + D)
Figure 6: Tree Type (Z <= (A + B) + (C + D))
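Why the tree form has closer input timing can be seen with a unit-delay model. It assumes every adder takes one delay unit and all primary inputs arrive at time 0; it shows arrival skew at each adder, and does not reproduce the 45% toggle figure, which depends on the actual device.

```python
# Unit-delay model (assumption): each adder takes 1 delay unit and the
# primary inputs A, B, C, D, ... all arrive at time 0.

def chain(n):
    """((A + B) + C) + D ...: returns (output time, input skew at each adder)."""
    t, skews = 0, []
    for _ in range(n - 1):
        skews.append(t)  # running sum is ready at t, the new input at 0
        t += 1
    return t, skews

def tree(n):
    """(A + B) + (C + D) ...: balanced tree; same return format as chain()."""
    if n == 1:
        return 0, []
    t_left, s_left = tree((n + 1) // 2)
    t_right, s_right = tree(n // 2)
    return max(t_left, t_right) + 1, s_left + s_right + [abs(t_left - t_right)]

print(chain(4))  # (3, [0, 1, 2])
print(tree(4))   # (2, [0, 0, 0])
```

In the chain, the last adder sees one input two delays after the other, so its output can change several times before settling; in the tree, every adder's inputs arrive together.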
Pipelining arithmetic circuits results in shallower logic depths and fewer glitches.
In some cases, flattening a hierarchically structured circuit lowers power consumption, so verify this on the actual device.
(A power simulator may not capture this effect.)
(b) Pipelining
A large arithmetic circuit is divided into smaller circuits and pipelined.
Glitches due to delay differences in signals passing through the arithmetic circuit are greatly reduced.
Optimization of number representation
(a) Bus invert encoding
This technique reduces the number of bus and adder transitions.
Bus invert encoding is effective for data buses that change randomly.
Specifically, when sending a new word would change more than half of the bits relative to the value on the bus in the previous clock, the control signal is asserted and the inverted data is sent instead. The data bus is widened by one bit for this control signal.
Table 1: Bus invert encoding
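The encoding rule described above can be sketched in a few lines. This follows the rule as stated here (invert when more than half of the data bits would change) and counts toggles on the data lines plus the invert line; the function names are hypothetical.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def bus_invert_encode(words, width):
    """Encode words as (bus_value, invert_bit) pairs.

    If sending a word as-is would flip more than half of the data lines
    relative to the previous bus value, send its complement and assert
    the invert line instead.
    """
    mask = (1 << width) - 1
    prev_bus, out = 0, []
    for w in words:
        invert = popcount((w ^ prev_bus) & mask) > width // 2
        bus = (~w & mask) if invert else w
        out.append((bus, int(invert)))
        prev_bus = bus
    return out

def bus_invert_decode(bus, invert, width):
    """Receiver side: undo the inversion when the invert line is high."""
    mask = (1 << width) - 1
    return (~bus & mask) if invert else bus

def line_toggles(seq):
    """Toggles on the data lines plus the invert line, starting from all 0."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in seq:
        total += popcount(bus ^ prev_bus) + (inv ^ prev_inv)
        prev_bus, prev_inv = bus, inv
    return total

words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words, 8)
print(line_toggles([(w, 0) for w in words]))  # 24 on a plain 8-bit bus
print(line_toggles(encoded))                  # 3: only the invert line toggles
assert [bus_invert_decode(b, i, 8) for b, i in encoded] == words
```

The worst case for a plain bus (every bit flipping) becomes the best case after encoding. For random data the average saving is smaller, and the extra line and encode/decode logic have their own cost.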
(b) Gray code or Johnson counter
This technique reduces the number of transitions between consecutive values on a bus or counter.
Gray code counters and Johnson counters have fewer transitions than binary counters because they transition only one bit each time they count.
Examples of each for the 3-bit case are shown below.
The Johnson counter circuit is simpler, but it can represent fewer states.
In the example below, the state values 010 and 101 do not exist in the Johnson counter.
Table 2: Difference in Number of Transitions by Coding
Over a full count cycle, Gray code needs between 1/2 and 1/1 of the transitions of binary code, approaching 1/2 as the number of bits increases.
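The transition counts for the 3-bit case can be reproduced with a short script. It counts bit toggles over one full count cycle, including the wrap back to the start; the Johnson counter is modeled as a left-shifting twisted ring, one common form of that circuit.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def cycle_transitions(states):
    """Bit toggles over one full count cycle, including the wrap to the start."""
    return sum(popcount(a ^ b) for a, b in zip(states, states[1:] + states[:1]))

def binary_states(n):
    return list(range(2 ** n))

def gray_states(n):
    # Reflected binary Gray code: g = b XOR (b >> 1)
    return [b ^ (b >> 1) for b in range(2 ** n)]

def johnson_states(n):
    # Twisted-ring counter: shift left, feeding the inverted MSB into the LSB
    mask = (1 << n) - 1
    s, out = 0, []
    for _ in range(2 * n):
        out.append(s)
        msb = (s >> (n - 1)) & 1
        s = ((s << 1) | (msb ^ 1)) & mask
    return out

print([format(s, "03b") for s in johnson_states(3)])
# ['000', '001', '011', '111', '110', '100'] (no 010 or 101)
print(cycle_transitions(binary_states(3)))   # 14 toggles over 8 counts
print(cycle_transitions(gray_states(3)))     # 8 toggles over 8 counts
print(cycle_transitions(johnson_states(3)))  # 6 toggles over 6 counts
```

For an n-bit counter, binary takes 2^(n+1) - 2 toggles per cycle and Gray code takes 2^n, so the ratio is 1 at 1 bit, 8/14 at 3 bits, and approaches 1/2 for wide counters.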
(c) Sign-magnitude representation instead of 2's complement
2's complement is often used because it makes adding and subtracting negative values easy, but in 2's complement every bit changes when the value goes from "-1" to "0". Using sign-magnitude representation instead can therefore reduce the number of transitions and the power consumption.
Sign-magnitude representation allocates one bit to the sign and uses the remaining bits for the absolute value of the data.
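The difference shows up clearly for a small signal that keeps crossing zero. This sketch assumes 8-bit words, encodes zero as +0 in sign-magnitude, and counts only bit toggles between consecutive values.

```python
def twos_complement(v: int, n: int) -> int:
    """n-bit two's-complement bit pattern of v."""
    return v & ((1 << n) - 1)

def sign_magnitude(v: int, n: int) -> int:
    """n-bit sign-magnitude pattern: MSB = sign, lower n-1 bits = |v|.
    Zero is encoded as +0; assumes |v| < 2**(n-1)."""
    sign = (1 << (n - 1)) if v < 0 else 0
    return sign | abs(v)

def total_toggles(values, encode, n):
    codes = [encode(v, n) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

# A small signal hovering around zero, 8-bit words
values = [-2, -1, 0, 1, 2, 1, 0, -1, -2]
print(total_toggles(values, twos_complement, 8))  # 24 (-1 <-> 0 flips all 8 bits)
print(total_toggles(values, sign_magnitude, 8))   # 14
```

The gain is specific to data that changes sign often; adders and subtractors for sign-magnitude values need extra logic, which has to be weighed against the saving.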
Change multiplier to shift-adders
Replacing a multiplier with a circuit that combines shifts and adders reduces the number of transitions.
For example, Z = X * 8 gives the same result as shifting X by three bit positions, with far fewer transitions than a multiplier.
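A constant multiplier can be decomposed into shifts and adds bit by bit. This sketch assumes the multiplier k is a known non-negative constant; each 1 bit of k contributes one shifted copy of x, so k = 8 needs a single shift and no adder.

```python
def shift_add_multiply(x: int, k: int) -> int:
    """Multiply x by a non-negative constant k using only shifts and adds.

    Each 1 bit of k contributes x shifted left by that bit position, which
    is what a hardware shift-and-add network would compute.
    """
    result, bit = 0, 0
    while k >> bit:
        if (k >> bit) & 1:
            result += x << bit
        bit += 1
    return result

print(shift_add_multiply(13, 8))   # 104: k = 0b1000, a single shift by 3
print(shift_add_multiply(13, 10))  # 130: k = 0b1010, (x << 3) + (x << 1)
```

In hardware, a shift by a constant is just wiring, so the cost is roughly one adder per extra 1 bit in k; constants with few 1 bits benefit most.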
Output Buffer
Since the output buffer drives external wiring with very large capacitance, it should be designed to output signals only during necessary operations or after glitches are settled and outputs are stable.
Pre-computation
Compute part of the result first; if that partial result already determines the outcome, stop the rest of the operation to reduce the number of transitions.
For example, an n-bit comparator compares all bits even when the result has already been decided by the upper bits.
Instead, compare only the MSBs first, and compare the remaining bits only when the MSBs are equal. When the MSBs differ, the MSB alone decides which value is greater, so the other bits are not compared and the number of transitions can be greatly reduced.
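The MSB-first comparator can be modeled as below. It assumes unsigned n-bit operands and reports whether the lower-bit comparison was needed; in hardware, the lower-bit comparator's inputs would be held when it is not.

```python
def precomputed_greater(a: int, b: int, n: int):
    """Unsigned n-bit a > b, comparing the MSBs first.

    Returns (result, lower_compare_used). When the MSBs differ, the MSB alone
    decides the result, so the (n-1)-bit comparator for the lower bits can be
    left idle.
    """
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        return msb_a > msb_b, False
    low_mask = (1 << (n - 1)) - 1
    return (a & low_mask) > (b & low_mask), True

n = 4
pairs = [(a, b) for a in range(2 ** n) for b in range(2 ** n)]
assert all(precomputed_greater(a, b, n)[0] == (a > b) for a, b in pairs)
used = sum(precomputed_greater(a, b, n)[1] for a, b in pairs)
print(used / len(pairs))  # 0.5: lower bits needed for only half the inputs
```

With uniformly distributed inputs, the lower comparator is needed only half the time; real data may do better or worse depending on how often the MSBs match.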
Operand Isolation
Since arithmetic circuits are composed of combinational circuits, the internal logic is frequently toggled until the result is determined.
When the result of a calculation is not needed, stop the input signals so that the internal logic of the arithmetic circuit stops switching.
For example, if a multiplexer follows the arithmetic circuit, its select signal can be used to stop the input data whenever the arithmetic result is not selected. This technique is used for multi-bit arithmetic circuits.
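Operand isolation can be sketched by counting operand toggles on a multiplier whose result feeds a multiplexer. This model assumes isolation is done by holding the operands at their last used values (for example with enabled registers); forcing them to zero with AND gates is another common choice. Names and values are hypothetical.

```python
def operand_toggles(xs, ys, sel, isolate):
    """Bit toggles on a multiplier's two operand buses.

    xs, ys:  operand values offered each cycle
    sel:     True when the output multiplexer selects the multiplier's result
    isolate: if True, operands are held at their last used values whenever
             the result is not selected, so the multiplier stops switching
    """
    px, py, total = 0, 0, 0  # operands start at 0
    for x, y, s in zip(xs, ys, sel):
        if isolate and not s:
            x, y = px, py  # hold the operands
        total += bin(px ^ x).count("1") + bin(py ^ y).count("1")
        px, py = x, y
    return total

xs, ys = [1, 2, 3, 4], [5, 6, 7, 8]
sel = [True, False, False, True]
print(operand_toggles(xs, ys, sel, isolate=False))  # 16
print(operand_toggles(xs, ys, sel, isolate=True))   # 8
```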
Bus Separation
Rather than sharing a bus for time-division operation, it is better to separate buses to reduce the number of transitions.
Also, if the lower bits (LSBs) and upper bits (MSBs) transition at very different rates, separating the buses can reduce power consumption.
For example, a RAM with a wide word may consume less power if the transition rates of the upper bits (MSBs) and lower bits (LSBs) are compared and the word is split into separate words that are selected with a multiplexer.
This is because lower bits generally transition more frequently than upper bits.
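Measuring per-bit toggle counts is a simple way to decide where to split a bus. The sketch uses an 8-bit counter as sample data, which is the clearest case of LSBs toggling more than MSBs; for random data all bits toggle at similar rates, so it is worth running the same count on your actual data.

```python
def per_bit_toggles(samples, width):
    """Toggle count of each bit (index 0 = LSB) over a sequence of words."""
    counts = [0] * width
    for a, b in zip(samples, samples[1:]):
        diff = a ^ b
        for i in range(width):
            counts[i] += (diff >> i) & 1
    return counts

# An 8-bit counter running through one full cycle and wrapping to 0
samples = list(range(256)) + [0]
print(per_bit_toggles(samples, 8))  # [256, 128, 64, 32, 16, 8, 4, 2]
```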
Utilizing Code Coverage
Run code coverage to find circuitry that never operates, and remove it.
Fine-Grained Parallelism
FPGAs consume less power than multiprocessors and GPUs because they parallelize processing at a finer granularity.
Moving processing that currently runs on a processor or GPU onto an FPGA can therefore reduce power consumption.
A program that has been running as software can also be ported to an FPGA with the OpenCL SDK.
Know the Difference! FPGA: The Only Thing You Need to Know Series
Power Consumption
- Part 1: Three Tips for Low Power Consumption
- Part 2: Is Clock Gating (Gated Clock) Effective?
- Part 3: Is this leakage power? No, it is DC power.
- Part 4: Why Precision Power Simulators Were Never Used
- Part 5: Is this the ultimate low-power method?
- Part 6: How to Reduce Load Capacitance (C)
- Part 7: How to Reduce Signal Amplitude (Vs) and Supply Voltage (VCC)
- Part 8: How to Reduce Operating Frequency (F) and Toggle Rate (N)
- Part 9: How to Reduce Short-Circuit Power
- Part 10: How to Reduce DC Power and Leakage Power
Verification
- Part 1: What is the first thing to ask designers who are concerned about design quality?
- Part 2: Verification methods not recommended for designs that have already been commercialized.
- Part 3: Verification is a combination of various methods.
- Part 4: FPGAs have more defects than ASICs - Asynchronous clocks.
- Part 5: What is formal verification?