Intel® HLS Compiler Sample Introduction

* The original content was created in Japanese, so some information, images, and links may still be in Japanese. We’re updating gradually and appreciate your patience.

Introduction to Intel® HLS Compiler

Many people who are interested in the Intel® HLS Compiler want to try out some samples first.
This article lists the samples and tutorials that Intel provides to help you get started.
The samples and tutorials are stored in the Intel® Quartus® Prime development software installation directory under

<installation directory>\intelFPGA\18.1\hls\examples

The examples presented in this article are those provided with the Intel® Quartus® Prime development software Standard Edition.

For instructions on how to run the sample tutorials, please see below.

HLS Sample & Tutorial Summary

Design Samples

(<installation directory>\hlsExamples)

Samples	Contents
counter	Sample to create a simple counter.
image_downsample	Sample that downsamples (reduces the size of) a bmp image.
interp_decim_filter	Sample of an interpolation filter and a decimation filter.
QRD	Sample of QR decomposition, a matrix operation.
YUV2RGB	Sample of color space conversion from YUV422 to RGB888.

ac datatype-related samples

(<installation directory>\hlsExamples\tutorials\ac_datatypes)

Samples	Contents
ac_fixed_constructor	Sample showing how the coding style used with the ac_fixed constructor affects QoR (quality of results). The recommended style, "Convert dynamic double / float values outside components", gives better results than the style that is not recommended, "Convert dynamic double / float values within components".
ac_fixed_math_library	Sample using the fixed-point math library. You can see the trade-off between precision and QoR when switching from floating point to fixed point.
ac_int_basic_ops	Sample to check multiple use cases of the ac_int type. The arithmetic examples cover addition, multiplication, and division. The operator examples cover the shift operators (including left shift) and slicing.
ac_int_overflow	Sample showing how to avoid DEBUG_AC_INT_WARNING and DEBUG_AC_INT_ERROR. It is recommended to use gdb for backtracing in Linux environments.
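
The precision trade-off described for ac_fixed_math_library can be illustrated without the HLS headers. The following is a minimal sketch in plain C++ and does not use the ac_fixed API. The 8-bit fractional format and the names fixed_t, toFixed, toDouble, and mulFixed are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical format: signed value with 8 fractional bits (not ac_fixed).
constexpr int kFracBits = 8;
constexpr double kScale = 1 << kFracBits;  // 256 steps per unit
using fixed_t = std::int32_t;

// Convert a double to fixed point, rounding to the nearest step.
fixed_t toFixed(double v) {
    assert(std::fabs(v) < 8388608.0);  // 2^23 keeps v * 256 inside int32_t
    return static_cast<fixed_t>(std::lround(v * kScale));
}

// Convert a fixed-point value back to double for comparison.
double toDouble(fixed_t v) {
    return v / kScale;
}

// Multiply two fixed-point values. The product is widened to 64 bits and
// then divided by the scale, which truncates toward zero. The caller is
// responsible for keeping the result inside the int32_t range.
fixed_t mulFixed(fixed_t a, fixed_t b) {
    const std::int64_t wide = static_cast<std::int64_t>(a) * b;
    return static_cast<fixed_t>(wide / (1 << kFracBits));
}
```

With 8 fractional bits, a value such as 1.5 is stored exactly, while 0.1 is rounded to the nearest multiple of 1/256. Fewer fractional bits mean narrower arithmetic but a larger rounding error, which is the kind of trade-off the sample lets you observe in the report.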

Sample optimizations presented in the Best Practices Guide

(<installation directory>\hlsExamples\tutorials\best_practices)

Sample	Contents
const_global	Sample that compares results with and without the const qualifier on a global variable. Even when the testbench keeps a global variable at a constant value, the compiler does not assume that the variable is constant in all situations. A const-qualified global variable, on the other hand, is guaranteed to be constant in all situations, so it is folded into the component and fully optimized. For best results, avoid using non-const global variables within components.
floating_point_ops	A sample showing the impact of the --fpc and --fp-relaxed flags of i++, which can be used when slightly lower precision in floating-point results is acceptable. 1) --fp-relaxed flag: if the order in which operations are performed does not matter, this flag lets the i++ compiler reorder them to minimize the pipeline. 2) --fpc flag: instructs the compiler to optimize intermediate rounding and normalization of floating-point operations when possible, reducing the amount of hardware required to chain floating-point operations.
integer_promotion	A sample that checks for integer promotion with the promote-integer option. The g++ compiler automatically performs integer promotion, but i++ does not unless you explicitly instruct it to do so with the --promote-integer option.
loop_memory_dependency	Sample to check the effect of the ivdep pragma, which indicates that there are no dependencies on a memory array. In a loop, the Intel® HLS Compiler assumes by default that a dependency exists, so it either waits until the dependent memory operation completes or adds hardware, creating a resource and performance bottleneck. For a loop without dependencies, the report shows that using the ivdep pragma reduces the II (Initiation Interval) and resource usage compared to not using it.
parameter_aliasing	Sample use of the "restrict" type qualifier to avoid creating unnecessary memory dependencies between non-conflicting read and write operations. Applying "restrict" correctly to pointer types avoids generating unnecessary circuitry, which improves loop performance and latency.
resource_sharing_filter	Sample that creates two filters, firfilt and firfilt_shared, so that they can be compared. firfilt is a high-throughput design, and simulations show that returndata is returned every clock cycle. firfilt_shared is a low-throughput design, and simulations show that returndata takes longer to be returned. However, it uses less logic: about half as many ALUTs as firfilt.
shift_register	A sample of the recommended coding style for implementing a shift register. This coding style generally results in optimized resource usage.
single_vs_double_precision_math	A sample that illustrates the impact of choosing between single-precision literals and functions versus double-precision ones. Users not familiar with the C++ standard may inadvertently use double-precision literals or functions when the single-precision version is intended. Double-precision arithmetic is more complex and, as a result, consumes more resources and usually has longer latency than single-precision implementations. Keep the following points in mind: 1) Always use single-precision literals (with the suffix "f") unless double-precision values are required. 2) Select functions that accept and return single-precision floating-point arguments unless you need the additional precision or range provided by double-precision values. 3) Explicitly convert double-precision floating-point values back to single-precision values as soon as possible to avoid accidentally performing further operations in double precision.
struct_interface	A sample showing how ac_int can be used to implement an interface without padding bits. Both components, created from the following two classes, have the same behavior: 1) I26 byte-aligned structure (184 bits): class IPPacketHeader_padding 2) ac_int type (160 bits): class IPPacketHeader_no_padding
swap_vs_copy	Sample that compares using swap with using deepcopy. Pointer swapping is a common way to avoid copying large amounts of data, but this technique only works with component memories, so a different technique is needed when registers are used to implement the buffers. This sample uses registers with deepcopy to improve resource usage and performance. Here, deepcopy provides better QoR than swap.
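
The C++ rules behind the integer_promotion and single_vs_double_precision_math samples can be checked on an ordinary host compiler. The following is a minimal sketch in standard C++ with no HLS headers. The function names are hypothetical, and addNarrow is only an assumed model of arithmetic carried out at the operand width, not a statement about the hardware that i++ generates.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Standard C++ promotes both uint8_t operands to int before the addition,
// so the carry out of bit 7 is kept in the result.
int addPromoted(std::uint8_t a, std::uint8_t b) {
    return a + b;
}

// Truncating the sum back to 8 bits models arithmetic done at operand width.
std::uint8_t addNarrow(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// float multiplied by an f-suffixed literal stays in single precision.
static_assert(
    std::is_same<decltype(std::declval<float>() * 0.5f), float>::value,
    "an f-suffixed literal keeps the product in float");

// float multiplied by an unsuffixed literal is computed in double precision.
static_assert(
    std::is_same<decltype(std::declval<float>() * 0.5), double>::value,
    "an unsuffixed literal widens the product to double");

// Runs the two integer cases side by side.
void checkPromotionExample() {
    assert(addPromoted(200, 100) == 300);  // carry preserved
    assert(addNarrow(200, 100) == 44);     // 300 modulo 256
}
```

The two static_assert lines show why a missing "f" suffix matters: the literal alone is enough to move the whole expression to double precision.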

Memory-related samples

(<installation directory>\hlsExamples\tutorials\component_memories)

Samples	Contents
bank_bits	This sample shows how to specify addresses when accessing multiple memories. The memory access pattern changes depending on how the addresses are specified, and the different access patterns can also be confirmed in the Component Memory Viewer in the report. For more information on how to specify the address, see "Specifying the bank selection bit at the local memory address" in the Best Practices Guide.
depth_wise_merge	An example of using the "merge" attribute to merge two blocks of memory into a single block of memory depth-wise. The load/store units can also be merged without sacrificing performance, reducing memory usage.
static_var_init	Sample that controls initialization behavior in a component at reset and power-on. Using MIF files to initialize memory reduces component resource utilization and startup latency.
width_wise_merge	Two memories that are accessed at the same address can be merged width-wise by using the "merge" attribute. The load/store units can also be merged without sacrificing performance, reducing memory usage.
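
The idea behind bank selection bits in the bank_bits sample can be modeled as simple address arithmetic. The following is a minimal sketch in plain C++ that does not use any HLS memory attribute. The helpers bankOf and banksTouched are hypothetical, and the sketch assumes that the bank index is simply a contiguous field of the word address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical helper: extract numBits bank-select bits, starting at bit
// lowBit, from a word address.
std::uint32_t bankOf(std::uint32_t addr, unsigned lowBit, unsigned numBits) {
    assert(numBits >= 1 && numBits < 32 && lowBit + numBits <= 32);
    const std::uint32_t mask = (1u << numBits) - 1u;
    return (addr >> lowBit) & mask;
}

// Count how many distinct banks a run of consecutive addresses touches.
std::size_t banksTouched(std::uint32_t first, unsigned count,
                         unsigned lowBit, unsigned numBits) {
    std::set<std::uint32_t> banks;
    for (unsigned i = 0; i < count; ++i) {
        banks.insert(bankOf(first + i, lowBit, numBits));
    }
    return banks.size();
}
```

In this model, selecting the bank with the two lowest address bits spreads addresses 0 to 3 over four banks, while selecting it with bits 2 and 3 places the same four addresses in a single bank. This is the kind of difference in access pattern that the report lets you compare.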

Interface-related samples

(<installation directory>\hlsExamples\tutorials\interfaces)

Sample	Contents
explicit_streams_buffer	Sample showing the usage of stream_in and stream_out. Using "ihc::stream_in" and "ihc::stream_out" gives the generated circuit an Avalon® streaming (ST) interface. The sample also shows how to define a buffer on the input.
explicit_streams_packets_ready_valid	Various samples of stream signal processing. 1) Using "ihc::stream_in" and "ihc::stream_out" (Avalon®-ST), the value of the stream signal is compared with a reference value, and if it is greater, the reference value is output. 2) Adding ihc::usesPackets to the part 1 circuit lets the start and end be determined from the start-of-packet and end-of-packet signals. 3) Applying ihc::usesValid and ihc::usesReady to the part 2 circuit removes the valid input signal and the ready output signal.
mm_master_testbench_operators	Sample showing the use of the getInterfaceAtIndex(int) operator in a memory-mapped master. It is similar to indexing an array. See the corresponding comments in the code for details.
mm_slaves	A sample that changes the interface of the module at each step, based on a 4-byte swap design. 1) Base design (default interface). 2) The "hls_avalon_slave_component" option changes the default HLS interface to an Avalon® Memory Mapped Slave (MM Slave) interface. 3) The "hls_avalon_slave_register_argument" option aggregates the user input interface into the Avalon® MM Slave as well. 4) The "hls_avalon_slave_memory_argument" option reduces logic and improves latency in exchange for RAM resources.
multiple_stream_call_sites	A sample showing the advantages of using multiple stream call sites. All call sites accessing the same stream share a single FIFO port, and areas containing multiple reads/writes to the same stream are serialized. That is, the hardware corresponding to the area executes only one thread or loop iteration at a time. The order of reads and writes will be the same whether the component is executed as an x86 emulation or in FPGA simulation.
pointer_mm_master	A simple vector and control implementation sample that operates on data stored in memory outside of the component. It optimizes the Avalon® MM interface on the component using various options. 1) Base design (default interface) with three pointer inputs, a, b, and c, and an Avalon® MM Master interface. 2) Additional component descriptions provide parallel Avalon® MM interfaces to external memory to improve latency. 3) Combine interfaces, expand data width, and change to burst support to increase load/store resources while providing efficient access control. 4) Combine load/store units to provide area and throughput changes to the compiler to improve latency. 5) Align merged memory accesses with memory words, simplifying the load/store part and thus improving Fmax.
stable_arguments	A sample that uses the "hls_stable_argument" attribute. When a pipelined component is called repeatedly with the same argument, the attribute declares that the argument does not change between component calls.
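
The packet behavior described for explicit_streams_packets_ready_valid can be modeled in software. The following is a minimal sketch in plain C++ that uses std::queue in place of ihc::stream_in and ihc::stream_out. The Beat structure and the clampPacket function are hypothetical, and the sketch does not reproduce the Avalon®-ST ready and valid handshake.

```cpp
#include <cassert>
#include <queue>

// Hypothetical model of one stream transfer with packet markers.
struct Beat {
    int data;
    bool sop;  // start of packet
    bool eop;  // end of packet
};

// Copy one packet from in to out, replacing every value above limit with
// limit. Returns the number of beats consumed. Reading stops after the beat
// that carries the end-of-packet marker.
int clampPacket(std::queue<Beat>& in, std::queue<Beat>& out, int limit) {
    int beats = 0;
    while (!in.empty()) {
        Beat b = in.front();
        in.pop();
        assert(beats > 0 || b.sop);  // the first beat must start a packet
        if (b.data > limit) {
            b.data = limit;
        }
        out.push(b);
        ++beats;
        if (b.eop) {
            break;
        }
    }
    return beats;
}
```

Because the loop stops at the end-of-packet marker, a second packet that is already waiting in the input queue is left untouched until the function is called again.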

Miscellaneous Samples

(<installation directory>\hls_examples\tutorials\usability)

Samples	Contents
enqueue_call	This sample compares component call times between direct function calls and enqueue calls. In enqueue_call, components can be enqueued and called one after another in a pipeline format. Therefore, the calling time can be shorter than that of a normal direct call. (ihc_hls_component_run_all() must be executed to execute the called function.)
qsys_2xclock	A sample that contains a very simple component that creates an interface with clock and clock2x signals, and explains how to take the generated component and incorporate it into the Platform Designer system. The PLL takes a reference clock and generates two output clock signals. Note that one clock is set to generate twice the frequency of the other, and the two clocks are generated in phase with each other.
qsys_stitching	A sample showing the flow of generating a bandpass filter by combining multiple components generated by the Intel® HLS Compiler. The Platform Designer feature of Intel® Quartus® Prime development software can be used to easily combine multiple components.
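
The difference between a direct call and an enqueued call in the enqueue_call sample can be modeled in software. The following is a minimal sketch in plain C++ and does not use the HLS enqueue API. The CallQueue class and the addOne function are hypothetical stand-ins: enqueue only records a call, and runAll plays the role that ihc_hls_component_run_all() plays in the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a component function.
int addOne(int x) {
    return x + 1;
}

class CallQueue {
public:
    // Record a call and where to store its result. Nothing runs yet.
    void enqueue(int* result, int arg) {
        assert(result != nullptr);
        pending_.push_back([result, arg] { *result = addOne(arg); });
    }

    // Run every recorded call in order, then empty the queue.
    void runAll() {
        for (auto& call : pending_) {
            call();
        }
        pending_.clear();
    }

    std::size_t size() const {
        return pending_.size();
    }

private:
    std::vector<std::function<void()>> pending_;
};
```

In this model the result variables keep their old values until runAll is called, which mirrors the note in the table that the enqueued function is not executed until ihc_hls_component_run_all() runs.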

Other Reference Materials