PYNQ now supports Vivado and Vitis HLS version 2020.2 (since PYNQ 2.7).
Time to re-check the hardware accelerator mechanisms, with DMA. This workflow has now stabilised.
Hardware acceleration is the technique of implementing program logic inside the FPGA. The logic becomes a set of logic blocks, instead of a set of machine instructions in executable memory.
The logic is written in a programming language, in this case C++. Vitis HLS will turn this into HDL (you can choose if it's VHDL or Verilog, but that's not important).
Typically, you'd select a block of code that is too resource-heavy on your processor. You copy that code into a Vitis HLS project, define how the interfaces in and out will be done, and let it build.
The build will generate an IP that can be used in Vivado like any other IP. There you add it to your hardware design.
Once you load the resulting bitfile into the Zynq, you can call this code from your program. In effect, you have now turned a bottleneck logic function into a hardware process.
I'm following the 3-part training "Using a HLS stream IP with DMA" on the PYNQ community. This blog will not repeat the steps. The goal is to document the experience.
Prepare a C++ function for hardware acceleration
The source code is a bit different, but you'll recognise that it's common logic.
#include "ap_axi_sdata.h"
#include "hls_stream.h"

void example(hls::stream< ap_axis<32,2,5,6> > &A,
             hls::stream< ap_axis<32,2,5,6> > &B)
{
#pragma HLS INTERFACE axis port=A
#pragma HLS INTERFACE axis port=B
#pragma hls interface s_axilite port=return

    ap_axis<32,2,5,6> tmp;
    while(1)
    {
        A.read(tmp);
        tmp.data = tmp.data.to_int() + 5;
        B.write(tmp);
        if(tmp.last)
        {
            break;
        }
    }
}
There are predefined patterns. The inputs and outputs have to be selected from a set of types. There are primitive types and containers.
In our example, we'll use hls::stream for input and output.
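To get a feel for what travels over such a stream, here is a rough plain C++ picture of one ap_axis<32,2,5,6> word. This is a sketch, not the Xilinx type: AxisWord32 and make_word are names I made up, and the real header uses ap_int/ap_uint members rather than bit-fields. As far as I understand the template, the four arguments are the widths of data, user, id and dest, and keep/strb get one bit per data byte.

```cpp
#include <cstdint>

// Hypothetical plain C++ stand-in for ap_axis<32,2,5,6>.
// It only mirrors the field widths; it is not the Vitis HLS type.
struct AxisWord32 {
    std::int32_t data;      // TDATA, 32 bits (1st template argument)
    std::uint8_t keep : 4;  // TKEEP, one bit per data byte
    std::uint8_t strb : 4;  // TSTRB, one bit per data byte
    std::uint8_t user : 2;  // TUSER (2nd template argument)
    std::uint8_t last : 1;  // TLAST, marks the final word of a packet
    std::uint8_t id   : 5;  // TID (3rd template argument)
    std::uint8_t dest : 6;  // TDEST (4th template argument)
};

// Hypothetical helper: build a word with all four data bytes valid.
inline AxisWord32 make_word(std::int32_t value, bool is_last) {
    AxisWord32 w{};            // zero every field first
    w.data = value;
    w.keep = 0xF;              // all 4 bytes carry data
    w.strb = 0xF;
    w.last = is_last ? 1 : 0;
    return w;
}
```

The field that matters most for this example is last: the DMA sets it on the final word, and the function uses it to know when to stop.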
The demo function reads the stream of integers. Then it adds 5 to each value and writes that sum to the output stream.
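If you want to try the logic on a PC without the Vitis HLS headers, a small software model could look like the sketch below. It assumes a std::queue as a stand-in for hls::stream and a made-up StreamWord struct that only carries the data and the last flag. The names example_model and run_example are mine, not part of the training.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for one stream word: payload plus the TLAST flag.
struct StreamWord {
    int  data;
    bool last;
};

// Software model of the add-5 function; std::queue stands in for hls::stream.
void example_model(std::queue<StreamWord> &A, std::queue<StreamWord> &B) {
    // The real hls::stream read blocks until data arrives; the model
    // simply stops when the input queue runs empty.
    while (!A.empty()) {
        StreamWord tmp = A.front();
        A.pop();
        tmp.data = tmp.data + 5;   // same arithmetic as the HLS version
        B.push(tmp);               // the last flag travels along unchanged
        if (tmp.last) {
            break;                 // end of packet
        }
    }
}

// Hypothetical helper: feed a vector through the model, collect the results.
std::vector<int> run_example(const std::vector<int> &input) {
    std::queue<StreamWord> in, out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        in.push({input[i], i + 1 == input.size()});  // flag the final word
    }
    example_model(in, out);
    std::vector<int> result;
    while (!out.empty()) {
        result.push_back(out.front().data);
        out.pop();
    }
    return result;
}
```

Calling run_example with the values 1, 2 and 3 pushes three words through the model, with the last flag set on the third one. It's only a model of the behaviour; it says nothing about timing or FPGA resources.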
In the real world, this could be something else, like image manipulation, FFT, filtering, compression, CRC calculation, en/decryption...
That's it.
image: FPGA resource cost of the generated IP
Accelerators are most useful in scenarios where data can stream. It's not your solution if OS resources are needed, such as access to files.
They work well together with DMA.
After a successful build, Vitis HLS will summarise the FPGA resource cost. It will also show the interfaces.
image: interface definition and possible performance warning
It will also generate the IP package that we'll use in the next post, to integrate it in a Vivado flow. The next step is to make this function callable from software.