Learning Xilinx Zynq: Hardware Accelerated Software

Jan Cumps

8 Aug 2021

The main target for the Zynq family is compute systems with hardware acceleration.

Its architecture focuses on streaming data efficiently between the ARM and FPGA submodules.

The FPGA can then perform manipulations in hardware that take too long in software.

The tool chain supports this. The sole purpose of the Vitis HLS IDE is to convert C functions to FPGA IPs.

I have edited this article for Vivado and Vitis HLS version 2020.2.

The examples are updated for these versions, and work with Pynq 2.6 and 2.7.

You can get the working sources right now with git clone -b image_v2.7_2020.2 https://github.com/mariodruiz/PYNQ-HelloWorld.git --recursive.

A merge request is open to get it into the Xilinx repo https://github.com/Xilinx/PYNQ-HelloWorld/tree/image_v2.7.

Hardware Accelerated Image Resize Algorithm

As proof of concept, Xilinx adapted OpenCV so that you can build functions and filters in hardware.

I'm reviewing such an example: image resize.

This demo performs the same exercise twice, resizing an image from 3840 * 2160 to 1920 * 1080:

using the ARM processor to execute OpenCV resize calls;
using the same function, implemented inside the FPGA fabric.

The results:

In software, it took 1.03 s; in hardware, 210 ms.
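
To put those two timings in perspective, the arithmetic below derives the speed-up and the rate at which the hardware consumes input pixels. Only the image sizes and the two timings come from the measurements above; the variable names are mine, and the pixel rate is a derived figure, not something the demo reports.

```python
# Figures taken from the measurements above.
SRC_WIDTH, SRC_HEIGHT = 3840, 2160
DST_WIDTH, DST_HEIGHT = 1920, 1080
SOFTWARE_SECONDS = 1.03
HARDWARE_SECONDS = 0.210

src_pixels = SRC_WIDTH * SRC_HEIGHT      # pixels read per resize
dst_pixels = DST_WIDTH * DST_HEIGHT      # pixels written per resize
speedup = SOFTWARE_SECONDS / HARDWARE_SECONDS
hw_megapixels_per_second = src_pixels / HARDWARE_SECONDS / 1e6

print(f"source pixels: {src_pixels}")
print(f"destination pixels: {dst_pixels}")
print(f"speed-up: {speedup:.1f}x")
print(f"hardware input rate: {hw_megapixels_per_second:.1f} Mpixel/s")
```

For this image, the FPGA version comes out roughly 4.9 times faster than the ARM version.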

Both tests run in a loop, and the fastest execution is taken as the benchmark. This accounts for optimisation, caching, incidental OS activity, and so on.

The time is measured from the moment the original image is in memory until the algorithm has finished writing the resized image to memory.
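
The "fastest of several runs" approach can be sketched as follows. This is a minimal illustration, not the code from the notebooks: `best_of` and `workload` are hypothetical names, and `workload` is only a stand-in for the actual resize call.

```python
import time


def best_of(func, repeats=10):
    """Run func() `repeats` times and return the shortest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def workload():
    # Stand-in for the resize call. On the board this would be the OpenCV
    # resize on the ARM, or the transfer through the FPGA accelerator.
    sum(i * i for i in range(10_000))


fastest = best_of(workload, repeats=5)
print(f"fastest run: {fastest * 1000:.3f} ms")
```

Taking the minimum instead of the average filters out runs that were slowed down by cache misses or other processes on the Linux system.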

[Figures: resize executed in ARM processors; resize executed in FPGA fabric]

The Accelerated Resize Function Design

This is done in Vitis HLS. The example uses the Xilinx OpenCV port and calls its resize() function.

The C code (full):

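void resize_accel(stream_t& src, stream_t& dst,
                  int src_rows, int src_cols,
                  int dst_rows, int dst_cols) {
    // Intermediate images. These two declarations are not shown in the
    // listing above and are assumed here, sized from the function arguments.
    xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPIX> src_mat(src_rows, src_cols);
    xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPIX> dst_mat(dst_rows, dst_cols);
    // Convert stream to xf::cv::Mat
    axis2xfMat<DATA_WIDTH, TYPE, HEIGHT, WIDTH, NPIX>(src, src_mat);
    // Run xfOpenCV kernel:
    xf::cv::resize<INTERPOLATION, TYPE, HEIGHT, WIDTH, HEIGHT, WIDTH, NPIX, MAXDOWNSCALE>(src_mat, dst_mat);
    // Convert xf::cv::Mat to stream
    xfMat2axis<DATA_WIDTH, TYPE, HEIGHT, WIDTH, NPIX>(dst_mat, dst);
}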

Vitis HLS converts this into an FPGA IP (both Verilog and VHDL source are generated).

This IP can then be used in Vivado.

The Accelerated Resize Function Used

You can use this generated IP like any other IP in Vivado. In the image below, it's the orange block.

Inputs and outputs are connected to the ARM processors via AXI interfaces.
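
From the ARM side, a streaming IP like this is typically fed through a DMA engine. The sketch below shows the order of calls, assuming the design uses an AXI DMA driven by the PYNQ DMA driver. `sendchannel.transfer`, `recvchannel.transfer` and `wait` follow that driver's interface; `FakeDma`, `FakeChannel` and `stream_through_accelerator` are hypothetical names that let the sketch run without a board. Setting the image dimensions and starting the IP is left out, because the register layout depends on the generated IP.

```python
class FakeChannel:
    """Hypothetical stand-in for one DMA channel (send or receive)."""

    def __init__(self, log, name):
        self._log = log
        self._name = name

    def transfer(self, buffer):
        # A real channel would start moving the buffer over AXI here.
        self._log.append((self._name, "transfer", len(buffer)))

    def wait(self):
        # A real channel would block here until the transfer completes.
        self._log.append((self._name, "wait", None))


class FakeDma:
    """Hypothetical stand-in for the DMA object of a loaded overlay."""

    def __init__(self):
        self.log = []
        self.sendchannel = FakeChannel(self.log, "send")
        self.recvchannel = FakeChannel(self.log, "recv")


def stream_through_accelerator(dma, in_buffer, out_buffer):
    """Queue both directions first, then block until each has finished."""
    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return out_buffer


dma = FakeDma()
src = bytearray(16)  # placeholder for the source image buffer
dst = bytearray(4)   # placeholder for the resized image buffer
stream_through_accelerator(dma, src, dst)
print(dma.log)
```

On the board, `dma` would be the DMA instance of the loaded overlay and the buffers would be physically contiguous memory allocated through PYNQ, not plain bytearrays.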

Install on Pynq Board

This is straightforward.

Open a Linux terminal. There is one available from the Jupyter home page of your board, or you can use PuTTY or a similar client.

Then follow the Quick Start. The two notebooks with the ARM and FPGA implementations will become available.


Building the example from source

If you want to have the Vivado and Vitis HLS projects available in your 2020.2 install, that's possible.

Clone the HelloWorld git, with subprojects.

It is no longer necessary to clone Mario D Ruiz' project; it is now merged into the Xilinx repo. The earlier command was:

git clone -b image_v2.7_2020.2 https://github.com/mariodruiz/PYNQ-HelloWorld.git --recursive

Use the Xilinx repo instead:

git clone -b image_v2.7_2020.2 https://github.com/Xilinx/PYNQ-HelloWorld.git --recursive

Vitis HLS Project

Start the Vitis 2020.2 Tcl shell, then run:
cd <where you cloned>\PYNQ-HelloWorld\boards\ip
vitis_hls -f build.tcl -tclargs ./vitis_lib/vision/L1/include/ resize/resize/impl/ip/component.xml

Vivado Project

Start the Vivado 2020.2 Tcl shell, then run:

cd <where you cloned>/PYNQ-HelloWorld/boards/Pynq-Z2/resizer

You may need to edit resizer.tcl. Comment below if you did not have to do this.
Replace: set_property ip_repo_paths ../../ip [current_project]
With: set_property ip_repo_paths ../ip [current_project]
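
If you prefer to script the path change, a small helper could look like this. It is a sketch: the two `set_property` lines are the ones quoted above, while `patch_ip_repo_path`, `patch_file` and the sample script text are hypothetical.

```python
from pathlib import Path

OLD_LINE = "set_property ip_repo_paths ../../ip [current_project]"
NEW_LINE = "set_property ip_repo_paths ../ip [current_project]"


def patch_ip_repo_path(tcl_text):
    """Return (patched_text, changed) with the IP repo path replaced."""
    if OLD_LINE not in tcl_text:
        return tcl_text, False
    return tcl_text.replace(OLD_LINE, NEW_LINE), True


def patch_file(path):
    """Patch a Tcl script in place; returns True if it was modified."""
    script = Path(path)
    patched_text, was_changed = patch_ip_repo_path(script.read_text())
    if was_changed:
        script.write_text(patched_text)
    return was_changed


# Made-up sample content, only to show the replacement.
sample = "create_project resizer\n" + OLD_LINE + "\n"
patched, changed = patch_ip_repo_path(sample)
print(changed)
print(patched)
```

Calling `patch_file("resizer.tcl")` from the resizer directory would apply the change to the real script. Running it a second time leaves the file untouched.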

vivado -mode batch -source resizer.tcl -notrace
vivado -mode batch -source build_bitstream.tcl -notrace
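
The two batch runs above can also be chained from a script, so that the bitstream build only starts when the project script succeeded. This is a sketch: the `vivado` command lines are the ones above, while `run_build`, `dry_run` and the relative working directory are assumptions of mine. The example call uses the dry-run runner, so it does not need Vivado installed.

```python
import subprocess

BUILD_COMMANDS = [
    ["vivado", "-mode", "batch", "-source", "resizer.tcl", "-notrace"],
    ["vivado", "-mode", "batch", "-source", "build_bitstream.tcl", "-notrace"],
]


def run_build(commands, cwd=".", runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    completed = []
    for command in commands:
        result = runner(command, cwd=cwd)
        if result.returncode != 0:
            raise RuntimeError(f"command failed: {' '.join(command)}")
        completed.append(command)
    return completed


class _DryRunResult:
    returncode = 0


def dry_run(command, cwd="."):
    """Print the command instead of executing it."""
    print(f"[dry run] ({cwd}) {' '.join(command)}")
    return _DryRunResult()


run_build(BUILD_COMMANDS, cwd="boards/Pynq-Z2/resizer", runner=dry_run)
```

Leaving out the `runner` argument makes `run_build` call `subprocess.run`, which requires `vivado` on the PATH and `cwd` pointing at the resizer directory of your clone.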

You now have the sources and projects for the accelerated function and the Vivado FPGA design.

Pynq - Zynq - Vivado series
Add Pynq-Z2 board to Vivado
Learning Xilinx Zynq: port a Spartan 6 PWM example to Pynq
Learning Xilinx Zynq: use AXI with a VHDL example in Pynq
VHDL PWM generator with dead time: the design
Learning Xilinx Zynq: use AXI and MMIO with a VHDL example in Pynq
Learning Xilinx Zynq: port Rotary Decoder from Spartan 6 to Vivado and PYNQ
Learning Xilinx Zynq: FPGA based PWM generator with scroll wheel control
Learning Xilinx Zynq: use RAM design for Altera Cyclone on Vivado and PYNQ
Learning Xilinx Zynq: a Quadrature Oscillator - 2 implementations
Learning Xilinx Zynq: a Quadrature Oscillator - variable frequency
Learning Xilinx Zynq: Hardware Accelerated Software
Automate Repeatable Steps in Vivado
Learning Xilinx Zynq: Try to make my own Accelerated OpenCV Function - 1: Vitis HLS
Learning Xilinx Zynq: Try to make my own Accelerated OpenCV Function - 2: Vivado Block Design
Learning Xilinx Zynq: Logic Gates in Vivado
Learning Xilinx Zynq: Interrupt ARM from FPGA fabric
Learning Xilinx Zynq: reuse and combine components to build a multiplexer
PYNQ version 2.7 (Austin) is released
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 1: Turn C++ code into an FPGA IP
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 2: Add the Accelerated IP to a Vivado design
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Part 3: Use the Hardware Accelerated Code in Software
PYNQ and Zynq: the Vitis HLS Accelerator with DMA training - Deep Dive: the data streams between Accelerator IP and ARM processors
Use the ZYNQ XADC with DMA part 1: bare metal
Use the ZYNQ XADC with DMA part 2: get and show samples in PYNQ
VHDL: Convert a Fixed Module into a Generic Module for Reuse

Top Comments

Jan Cumps over 3 years ago +1

I'm going to try to make my own project with acceleration, now that I understand a bit how the development cycle ticks.

Jan Cumps over 3 years ago in reply to Jan Cumps

There's a known solution.
In the current version of the DMA engine, if you use standard DMA, you have to set a final flag at the end of the exercise.
Here is an example of the change in code for the OpenCV image resizer example in Vitis HLS: https://github.com/mariodruiz/PYNQ-HelloWorld/blob/image_v2.7_2020.2/boards/ip/src/xf_resize_accel_stream.cpp.

I've tested it with version 2020.2 of the toolchain, on PYNQ 2.6 and 2.7.
Jan Cumps over 3 years ago in reply to Andrew J

Andrew J wrote:

"Early days yet: made the ubiquitous 'Hello World', created a boot loader for the device, flashed LEDs and read a button state, but that's all PS. I'm still working on the basics but will be attempting to recreate what you've been doing in your blogs Jan. Keep them coming, they are being read!!"

My blogs use the Pynq precompiled Linux image, so I'm not boot-loading or bare-metaling.
I'm following the Xilinx 4-part workshops, to see if I can do the core things that are explained in the next webinar on the Pynq board I have here...
But until now I've focused on custom FPGA bitstreams on an existing Linux with Pynq.
Jan Cumps over 3 years ago in reply to Jan Cumps

Jan Cumps wrote:

"I'm going to try to make my own project with acceleration, now that I understand a bit how the development cycle ticks."

Stuck, like everyone in the world it seems, at https://discuss.pynq.io/t/vitis-vision-core-fails-on-pynq-v2-5-1/1822
Edit: I'm going to try from scratch: Learning Xilinx Zynq: Try to make my own Accelerated OpenCV Function - 1: Vitis HLS
Andrew J over 3 years ago in reply to Jan Cumps

Early days yet: made the ubiquitous 'Hello World', created a boot loader for the device, flashed LEDs and read a button state, but that's all PS. I'm still working on the basics but will be attempting to recreate what you've been doing in your blogs Jan. Keep them coming, they are being read!!