PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 3: Use the Hardware Accelerated Code in Software

27 Nov 2021

I'm following the 3-part training "using a HLS stream IP with DMA" on the PYNQ community site. This blog doesn't repeat the steps. The goal is to document my experience.

Use the Accelerated function in Software

In part 1, we made the hardware accelerated function example(stream &in, stream &out).
In part 2, we created a Vivado hardware design that includes the accelerator IP. This makes the function available to your programs.
In this part, that accelerated function is used in a Python program.

Refresher 1: What does the function do?

The example function accepts a stream of integers, adds the constant 5 to each value in that stream, and outputs the result. This is the accelerated IP we made in Vitis HLS in part 1.

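// Signature reconstructed from the body (A = input stream, B = output stream).
// The interface pragmas are not shown.
void example(hls::stream< ap_axis<32,2,5,6> > &A,
             hls::stream< ap_axis<32,2,5,6> > &B)
{
    ap_axis<32,2,5,6> tmp;
    while (1)
    {
        A.read(tmp);                        // take one element from the input stream
        tmp.data = tmp.data.to_int() + 5;   // add the constant
        B.write(tmp);                       // pass it on, side-band signals included
        if (tmp.last)                       // the last flag marks the end of the stream
        {
            break;
        }
    }
}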
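For checking results later on, it helps to have a software model of the same behaviour. This is a minimal Python sketch of what the IP does to a stream, assuming plain integer values and ignoring the AXI side-band signals. The name add_five_model is my own, not something from the training.

```python
def add_five_model(values):
    """Software model of the accelerator: add 5 to every value in the stream."""
    return [v + 5 for v in values]


if __name__ == "__main__":
    # Feed the model the first ten test values.
    print(add_five_model(range(10)))
```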

Refresher 2: What does the resulting hardware design look like?

In part 2, we created the FPGA design that allows DMA data exchange between the function and the ARM part of the Zynq.

Next step: Load and activate the accelerated design into the FPGA

On the Zynq, the result looks the same as any other FPGA design: a set of IPs that are synthesized, implemented and written to a bitfile.
We're using a PYNQ board, where you load the design into the hardware with the overlay functions.

For convenience, an alias is created for the DMA parts and the accelerated IP. These are the parts that we'll interact with from the Python code.
Then, the accelerated IP is enabled.

Run the Accelerated Function

Like any function you use, you need to declare the variables that hold input and result.
We're using a buffer of 100 unsigned integers here, for both input and output.

Initialise the input buffer with test values
We'll send 100 different values to the function, as a test. Each position in the buffer holds the value of its index. E.g.: element 14 in the buffer will have a value of 14.
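A sketch of the buffer setup, assuming the buffers come from pynq.allocate so that the DMA engine can reach them. The helper names make_buffers and fill_with_index are mine. The fill function works on any indexable buffer, so it can be tried with a plain list as well.

```python
DATA_SIZE = 100


def fill_with_index(buffer):
    """Give every element the value of its own index: buffer[14] becomes 14."""
    for i in range(len(buffer)):
        buffer[i] = i
    return buffer


def make_buffers(size=DATA_SIZE):
    """Allocate DMA-capable input and output buffers (on the PYNQ board only)."""
    import numpy as np
    from pynq import allocate
    input_buffer = allocate(shape=(size,), dtype=np.uint32)
    output_buffer = allocate(shape=(size,), dtype=np.uint32)
    return fill_with_index(input_buffer), output_buffer
```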

We send the data to the IP by enabling the input DMA. The results are retrieved by enabling the output DMA.
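In code, the exchange comes down to one transfer call per DMA channel, followed by a wait on each. A minimal sketch, assuming dma_send and dma_recv are the send and receive channels of a PYNQ DMA object; the function name run_accelerator is my own.

```python
def run_accelerator(dma_send, dma_recv, input_buffer, output_buffer):
    """Stream input_buffer through the IP and collect the result in output_buffer."""
    dma_send.transfer(input_buffer)    # memory -> accelerator
    dma_recv.transfer(output_buffer)   # accelerator -> memory
    dma_send.wait()                    # block until both transfers have completed
    dma_recv.wait()
    return output_buffer
```

The wait calls are there so that the output buffer is only read after the DMA has finished writing to it.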

That's it. We've now executed the hardware accelerated function one time. It returned the 100 processed elements. We're showing the first 10 for evaluation.
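To evaluate more than the first 10 values by eye, a small check can compare every output element with its input plus 5. This is only a sketch: the helper find_mismatches is my own, and the lists below are stand-ins for the two DMA buffers.

```python
def find_mismatches(input_values, output_values, offset=5):
    """Return (index, expected, actual) for every element that is not input + offset."""
    return [
        (i, x + offset, y)
        for i, (x, y) in enumerate(zip(input_values, output_values))
        if y != x + offset
    ]


if __name__ == "__main__":
    # Stand-in data; on the board these would be the input and output buffers.
    sent = list(range(100))
    received = [v + 5 for v in sent]
    print(received[:10])                      # the first 10 results
    print(find_mismatches(sent, received))    # an empty list when all is well
```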

The example functionality (add 5 to a number) is intentionally kept simple. That lets you focus on the techniques.
A real speed gain is possible for complex transformations, such as image processing.
Example:
Resizing an image from 3840x2160 to 1920x1080 using the OpenCV resize() function implemented in FPGA on my Zynq runs 4 times faster (250 ms) than the same OpenCV resize() function running as software on the ARM (1 second).
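If you want to take this kind of measurement yourself, a simple timing helper is enough. This is a generic sketch, not the code I used for the numbers above. The names time_call and speedup are my own, and func stands for whichever software or hardware variant you want to time.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over a few runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(software_seconds, hardware_seconds):
    """How many times faster the hardware version is."""
    return software_seconds / hardware_seconds
```

With the numbers from the resize example, speedup(1.0, 0.250) gives 4.0.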

What I learned from doing this tutorial is that the whole cycle has become more stable and integrated.
Vitis HLS and Vivado, version 2020.2, work well together, and PYNQ's DMA examples now work reliably.

Pynq - Zynq - Vivado series
Add Pynq-Z2 board to Vivado
Learning Xilinx Zynq: port a Spartan 6 PWM example to Pynq
Learning Xilinx Zynq: use AXI with a VHDL example in Pynq
VHDL PWM generator with dead time: the design
Learning Xilinx Zynq: use AXI and MMIO with a VHDL example in Pynq
Learning Xilinx Zynq: port Rotary Decoder from Spartan 6 to Vivado and PYNQ
Learning Xilinx Zynq: FPGA based PWM generator with scroll wheel control
Learning Xilinx Zynq: use RAM design for Altera Cyclone on Vivado and PYNQ
Learning Xilinx Zynq: a Quadrature Oscillator - 2 implementations
Learning Xilinx Zynq: a Quadrature Oscillator - variable frequency
Learning Xilinx Zynq: Hardware Accelerated Software
Automate Repeatable Steps in Vivado
Learning Xilinx Zynq: Try to make my own Accelerated OpenCV Function - 1: Vitis HLS
Learning Xilinx Zynq: Try to make my own Accelerated OpenCV Function - 2: Vivado Block Design
Learning Xilinx Zynq: Logic Gates in Vivado
Learning Xilinx Zynq: Interrupt ARM from FPGA fabric
Learning Xilinx Zynq: reuse and combine components to build a multiplexer
PYNQ version 2.7 (Austin) is released
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 1: Turn C++ code into an FPGA IP
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 2: Add the Accelerated IP to a Vivado design
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 3: Use the Hardware Accelerated Code in Software
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Deep Dive: the data streams between Accelerator IP and ARM processors
Use the ZYNQ XADC with DMA part 1: bare metal
Use the ZYNQ XADC with DMA part 2: get and show samples in PYNQ
VHDL: Convert a Fixed Module into a Generic Module for Reuse

Top Comments

Jan Cumps over 4 years ago

What this example doesn't show is that you don't have to use this as an accelerator for software.
You can also use it directly in an FPGA flow.

As an example, you could flow the data from the on-board ADC into your FPGA datastream, or to a MicroBlaze with DMA.

Check this blog: https://www.hackster.io/adam-taylor/signal-processing-with-xadc-and-pynq-3c716c.
I'm going to try this one of these days.
Jan Cumps over 4 years ago in reply to Jan Cumps

Progress with the XADC sampling:

The Vivado design works, and I can retrieve 128 measures at a time.