This project details how to build a stereo depth camera with AI capabilities on a Zynq MPSoC platform.
This time we'll see how to use the Vitis Vision L1 libraries and the PYNQ framework to implement a complete stereo depth pipeline.
There are a number of past and existing bugs in the Vitis Vision libraries, so this procedure has not been smooth.
In addition, the PYNQ framework is in a state of flux (moving from 2.5.1 to 2.6), so there are some API changes.
1. Setting up the environment
First, install Vitis on Ubuntu 18.04. The installed point release is not strictly supported, so you'll need to modify /etc/os-release so that it reports version 18.04.4:
sudo nano /etc/os-release
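For example (a hedged illustration, since the exact string the installer checks may differ), the version fields can be edited to read:
VERSION="18.04.4 LTS (Bionic Beaver)"
PRETTY_NAME="Ubuntu 18.04.4 LTS"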
Now clone the Vitis Libraries repository:
git clone https://github.com/Xilinx/Vitis_Libraries.git
Install CMake:
sudo apt-get install cmake
2. Install OpenCV
Then install OpenCV 3.4.4.
This is needed in order to compile and simulate the Vitis Vision libraries. Note that this exact version is required.
mkdir ~/opencv_build && cd ~/opencv_build
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
cd opencv && git checkout 3.4.4 && cd ..
cd opencv_contrib && git checkout 3.4.4 && cd ..
cd ~/opencv_build/opencv
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D OPENCV_EXTRA_MODULES_PATH=~/opencv_build/opencv_contrib/modules ..
make -j4
sudo make install
sudo ldconfig
pkg-config --modversion opencv
# Rename the generated Python binding so it can be imported as cv2
cd /usr/local/python/cv2/python-3.6
sudo mv cv2.cpython-36m-x86_64-linux-gnu.so cv2.so
# If you use a virtualenv, link cv2.so into ~/.virtualenvs/cv/lib/python3.6/site-packages/
3. Set up the Vitis Vision IP core
The Vitis Vision cores use AXI for data transfer and AXI-Lite for parameter configuration.
To minimize energy consumption and process incoming image data on the fly, we'll have to rewrite the interfaces in streaming format.
Before we do that, test the IP by performing C simulation (CSIM) and co-simulation (COSIM).
Some additional variables have to be defined, either in a settings.tcl file or added to the project Tcl script as shown below, in order to give the location of the OpenCV library we installed before.
This is needed for co-simulation and synthesis.
#source settings.tcl
set PROJ "erosion.prj"
set SOLN "sol1"
set XF_PROJ_ROOT "/home/user/Documents/Vitis_Libraries/vision/"
set OPENCV_INCLUDE "/usr/local/include/opencv2"
set OPENCV_LIB "/usr/local/lib"
set XPART "xczu9eg-ffvb1156-2-i"
set CSIM "1"
set CSYNTH "1"
set COSIM "1"
set VIVADO_SYN "0"
set VIVADO_IMPL "0"
One can use either Vitis HLS or Vivado HLS. There are minor differences between the two, so be aware that once you create a project with Vitis HLS you won't be able to open it in Vivado HLS.
To generate the core, change into the example directory:
cd Vitis_Libraries/vision/L1/examples/stereolbm
and issue:
vivado_hls -f script.tcl
Depending on the flags set above, this will synthesize, simulate, and co-simulate the IP core.
There are two stereo vision IP cores: a) stereolbm (local block matching) and b) global block matching.
When simulating the global block matching IP I was not able to get a proper output, which left the stereo local block matching algorithm.
4. Vitis Vision IP with PYNQ
Initially, the IP core from the following repository was tested:
git clone --recursive https://github.com/Xilinx/PYNQ-HelloWorld.git
The problem is that it uses an old version of Vitis Vision with bugs and implements custom conversion functions from xf::cv::Mat to AXI stream that are not part of the API.
So one is left with two choices: a) implement the algorithm on your own, or b) find a way to get the Vitis Vision cores working in streaming mode.
Another issue is the bugs present in Vitis Vision:
https://github.com/Xilinx/Vitis_Libraries/issues/28
The same issues appear when trying to interface these IPs with the PYNQ framework:
https://discuss.pynq.io/t/vitis-vision-core-fails-on-pynq-v2-5-1/1822/17
The bottom line is that one has to rewrite the interfaces with custom data types.
// Headers needed for hls::stream, ap_uint and xf::cv::Mat
#include "hls_stream.h"
#include "ap_int.h"
#include "common/xf_common.hpp"
#include <cassert>

template <int W>
struct axis_t {
    ap_uint<W> data;
    ap_int<1> last;
};

/* Unpack an AXI video stream into a xf::cv::Mat<> object
 * input:  AXI_video_strm
 * output: img
 */
template <int TYPE, int ROWS, int COLS, int NPPC>
int AXIstream2xfMat(hls::stream<axis_t<8>>& AXI_video_strm, xf::cv::Mat<TYPE, ROWS, COLS, NPPC>& img) {
    axis_t<8> pixelpacket;
    int res = 0;
    int rows = img.rows;
    int cols = img.cols;
    int idx = 0;
    assert(img.rows <= ROWS);
    assert(img.cols <= COLS);
loop_row_axi2mat:
    for (int i = 0; i < rows; i++) {
    loop_col_axi2mat:
        for (int j = 0; j < cols; j++) {
// clang-format off
#pragma HLS loop_flatten off
#pragma HLS pipeline II=1
            // clang-format on
            AXI_video_strm >> pixelpacket;
            img.write(idx++, pixelpacket.data);
        }
    }
    return res;
}

/* Pack the data of a xf::cv::Mat<> object into an AXI video stream
 * input:  img
 * output: AXI_video_strm
 */
template <int TYPE, int ROWS, int COLS, int NPPC>
int xfMat2AXIstream(xf::cv::Mat<TYPE, ROWS, COLS, NPPC>& img, hls::stream<axis_t<8>>& AXI_video_strm) {
    axis_t<8> pixelpacket;
    int res = 0;
    int rows = img.rows;
    int cols = img.cols;
    int idx = 0;
    assert(img.rows <= ROWS);
    assert(img.cols <= COLS);
    bool sof = true; // Indicates start of frame
loop_row_mat2axi:
    for (int i = 0; i < rows; i++) {
    loop_col_mat2axi:
        for (int j = 0; j < cols; j++) {
// clang-format off
#pragma HLS loop_flatten off
#pragma HLS pipeline II=1
            // clang-format on
            ap_uint<1> tmp = 0;
            if ((i == rows - 1) && (j == cols - 1)) {
                tmp = 1; // assert TLAST on the final pixel of the frame
            }
            pixelpacket.last = tmp;
            pixelpacket.data = img.read(idx++);
            AXI_video_strm << pixelpacket;
        }
    }
    return res;
}
5. Simulating the stereo IP core
There are a couple of algorithms for stereo depth perception. It's important to note that these algorithms require a lot of resources, so the image resolution needs to be reduced for implementation.
We will use the cones image pair from the Middlebury dataset; however, the images will be downsized to 320x240 pixels and converted to grayscale beforehand.
This is done to preserve resources, as the stereo core uses a lot of fabric logic.
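As a rough sketch of this preprocessing step (the filenames below are placeholders rather than the actual Middlebury file names), the grayscale conversion and resize can be done with OpenCV in Python:

import cv2

# Placeholder filenames for the Middlebury cones stereo pair
for name in ("cones_left.png", "cones_right.png"):
    img = cv2.imread(name)                          # original colour image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # convert to 8-bit grayscale
    small = cv2.resize(gray, (320, 240))            # downscale to 320x240 for the core
    cv2.imwrite(name.replace(".png", "_320x240.png"), small)

The top-level accelerator function with streaming interfaces is shown below.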
void stereolbm_accel(stream_t& stream_inL, stream_t& stream_inR, streamwide_t& stream_out, int height, int width) {
#pragma HLS INTERFACE s_axilite port=height
#pragma HLS INTERFACE s_axilite port=width
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE axis port=stream_inL
#pragma HLS INTERFACE axis port=stream_inR
#pragma HLS INTERFACE axis port=stream_out

    xf::cv::Mat<IN_TYPE, HEIGHT, WIDTH, NPCC> imgInputL(height, width);
    xf::cv::Mat<IN_TYPE, HEIGHT, WIDTH, NPCC> imgInputR(height, width);
    xf::cv::Mat<OUT_TYPE, HEIGHT, WIDTH, NPCC> imgOutput(height, width);
    //xf::cv::Mat<IN_TYPE, HEIGHT, WIDTH, NPCC> imgOutputStream(height, width);

    xf::cv::xFSBMState<SAD_WINDOW_SIZE, NO_OF_DISPARITIES, PARALLEL_UNITS> bmState;

    // Initialize SBM State:
    bmState.preFilterCap = 31;
    bmState.uniquenessRatio = 15;
    bmState.textureThreshold = 20;
    bmState.minDisparity = 0;

// clang-format off
#pragma HLS DATAFLOW
    // clang-format on

    // Retrieve xf::cv::Mat objects from the input streams:
    AXIstream2xfMat<IN_TYPE, HEIGHT, WIDTH, NPCC>(stream_inL, imgInputL);
    AXIstream2xfMat<IN_TYPE, HEIGHT, WIDTH, NPCC>(stream_inR, imgInputR);

    // Run the xfOpenCV kernel:
    xf::cv::StereoBM<SAD_WINDOW_SIZE, NO_OF_DISPARITIES, PARALLEL_UNITS, IN_TYPE, OUT_TYPE, HEIGHT, WIDTH, NPCC, XF_USE_URAM>(
        imgInputL, imgInputR, imgOutput, bmState);

    // Convert the output xf::cv::Mat object to the (wide) output stream:
    xfMat2AXIstreamwide<OUT_TYPE, HEIGHT, WIDTH, NPCC>(imgOutput, stream_out);
}
The output from CSIM and COSIM is the depth disparity map.
6. Testing on the device
To test on the device, a stereo camera mezzanine with camera sources would be needed. I opted to simulate this using two DMAs to write the stereo pair and one DMA to read back the stereo output.
The inputs to the stereo core are 8-bit grayscale images; the output, however, is a 16-bit grayscale image.
PYNQ uses 8-bit and 32-bit data types for the DMA. Hence, to deal with the 16-bit output one has either to use the convert-bit-depth IP (modified into streaming mode) or to convert the data width of the output stream.
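As a hedged sketch of the host-side notebook code (the bitstream name, the DMA instance names, and the assumption that the output stream was widened to 32 bits are placeholders based on the description above, not the actual notebook from the repo):

import cv2
import numpy as np
from pynq import Overlay, allocate

ol = Overlay("stereolbm.bit")      # placeholder bitstream name
dma_left  = ol.axi_dma_0           # DMA feeding the left image (instance names depend on the block design)
dma_right = ol.axi_dma_1           # DMA feeding the right image
dma_out   = ol.axi_dma_2           # DMA reading back the disparity stream

h, w = 240, 320
in_left  = allocate(shape=(h, w), dtype=np.uint8)        # 8-bit grayscale inputs
in_right = allocate(shape=(h, w), dtype=np.uint8)
out_buf  = allocate(shape=(h, w // 2), dtype=np.uint32)  # two 16-bit disparities packed per 32-bit word

# Load the preprocessed 320x240 grayscale pair produced earlier (placeholder filenames)
in_left[:]  = cv2.imread("cones_left_320x240.png", cv2.IMREAD_GRAYSCALE)
in_right[:] = cv2.imread("cones_right_320x240.png", cv2.IMREAD_GRAYSCALE)

# The core's AXI-Lite registers (height, width, ap_start) must also be programmed;
# the register offsets come from the HLS-generated driver and are omitted here.

dma_out.recvchannel.transfer(out_buf)
dma_left.sendchannel.transfer(in_left)
dma_right.sendchannel.transfer(in_right)
dma_left.sendchannel.wait()
dma_right.sendchannel.wait()
dma_out.recvchannel.wait()

# Unpack the 32-bit words back into 16-bit disparity values
disparity = np.frombuffer(out_buf, dtype=np.uint16).reshape(h, w)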
The Vivado hardware block design is shown below.
The Python Jupyter notebook together with the IP cores is available in the GitHub repo linked below:
https://github.com/Q-point/StereoIPcores_MPSOC
The next step is to accelerate the RGB-to-grayscale conversion and grayscale resize IPs in hardware following the same procedure as above, and to apply DPU image segmentation to the original color input, in order to implement a depth camera with AI capabilities.