AVNET 96Boards Dual Camera Mezzanine + Ultra96-V2 - Review


RoadTest: AVNET 96Boards Dual Camera Mezzanine + Ultra96-V2

Author: dimiterk

Creation date:

Evaluation Type: Development Boards & Tools

Did you receive all parts the manufacturer stated would be included in the package?: True

What other parts do you consider comparable to this product?: There are very few dual camera mezzanine boards on the market, and even fewer for the Ultra96 ecosystem. There are some single camera boards from D3 and AISTAIR vision and some expensive stereo rigs.

What were the biggest problems encountered?: Lack of documentation (under NDA) for both the camera and the ISP chipset. The existing video pipeline API does not allow access to the independent image channels. The closed datasheet means kernel module issues cannot be addressed. The V4L2 ISP driver seems buggy when changing resolutions. No API for configuring the cameras via V4L2. Non-existent documentation. The OOB image looks like it has an issue with the WIFI driver.

Detailed Review:

In this road-test I'll take a look at the Avnet Dual Camera mezzanine for Ultra96. Many thanks to @rscansy, Element14 and Avnet for providing the hardware.


The initial idea behind this review was to implement a stereo image pipeline given that the dual camera setup is primed for such an application.


First we'll take a look at the hardware design of the board, then we'll focus on the Vivado hardware image pipeline.

Next we'll look at the firmware/software support and finally see how we can use the dual camera in a real world application.

Last but not least this review will document all the current issues with the product.


Long story short, the camera mezzanine is at the moment crippled by a lack of documentation (NDAs), the use of obsolete IP blocks from Xilinx, and connectivity issues with the OOB image.







The block diagram taken from the product page is shown below:




The main IC is the AP1302 ISP chipset, which is used as an imaging co-processor. The AP1302 connects to the U96 HS (high-speed) connector via 4 MIPI CSI lane pairs. MIPI CSI is a high-speed serial interface for high-resolution cameras. The I/O pins of Zynq MPSoC devices contain the PHY needed to interface with the MIPI protocol directly. Two grayscale CAV10-000A cameras are connected to the AP1302, also over a MIPI interface.


Each camera has 4 MIPI CSI lanes of its own. This makes it impossible to connect both cameras without some sort of serializer (in this case the ISP), since the HS connector carries only 2 MIPI channels, and only one of them (channel 0) has 4 lanes. The other channel (MIPI channel 1) has only 2 lanes.

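The ISP chipset in turn uses CSI0 on the HS connector, as well as the SPI and I2C2 serial buses on the same connector. The serial buses are used for configuration. As you can see in the images below, the AP1302 ISP reports an ID of 0x265 over I2C. This is a chip ID rather than a bus address: the entity name ap1302.4-003c used by media-ctl later in this review suggests the device sits at address 0x3c on I2C bus 4.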






Other than that, the hardware is pretty straightforward, with the LDOs required by the cameras. The master clock for the ISP can be sourced either from CLK0 on the HS connector or from an external oscillator. The selection is made via a jumper.


The camera was tested both under low light conditions and under moderately bright (daylight) conditions.


DSI interface


The hardware also contains a 15 pin FPC connector which is connected to the DSI interface.

This interface is identical to the DSI connector of the Raspberry Pi, so the same display can work if you have a datasheet for its configuration.


There was no display shipped with the package so this interface was not tested.







Testing the OOB image under low light conditions.

The OOB image uses 1920x1080 resolution out of the box.

If you look closely there is some camera noise.

Upon login one is met with the standard PetaLinux login.

Stereo view.




The original resolution of 1920x1080 is not feasible for stereo applications. Implementing local stereo block matching at that resolution requires more computational resources than the Ultra96 board provides.
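To put a rough number on that claim, here is a back-of-the-envelope sketch. It assumes a naive local block matcher that compares every pixel of a block for every candidate disparity; the disparity range and block size are illustrative values, not settings taken from this design.

```python
def block_matching_comparisons(width, height, num_disparities, block_size):
    """Approximate pixel comparisons per frame for naive local block matching."""
    return width * height * num_disparities * block_size * block_size


# Assumed matcher settings, chosen only to make the comparison concrete.
NUM_DISPARITIES = 64
BLOCK_SIZE = 9

full_hd = block_matching_comparisons(1920, 1080, NUM_DISPARITIES, BLOCK_SIZE)
vga = block_matching_comparisons(640, 480, NUM_DISPARITIES, BLOCK_SIZE)

print(f"1080p: {full_hd / 1e9:.2f} G comparisons per frame")
print(f"VGA:   {vga / 1e9:.2f} G comparisons per frame")
print(f"ratio: {full_hd / vga:.2f}x")
```

With identical matcher settings the workload scales with the pixel count, so 1080p costs 6.75 times as much as VGA per frame. In practice the gap is probably wider, because the disparity range measured in pixels also grows with resolution.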

So the next logical step was to change the resolution of the individual images to VGA.

This requires editing the script that configures the gstreamer pipeline.



Changing resolution to VGA



media-ctl -d /dev/media0 -V '"ap1302.4-003c":0 [fmt:UYVY8_1X16/2560x800 field:none]'

media-ctl -d /dev/media0 -V '"a0020000.mipi_csi2_rx_subsystem":0 [fmt:UYVY8_1X16/2560x800 field:none]'
media-ctl -d /dev/media0 -V '"a0020000.mipi_csi2_rx_subsystem":1 [fmt:UYVY8_1X16/2560x800 field:none]'
media-ctl -d /dev/media0 -V '"a0080000.v_proc_ss":0 [fmt:UYVY8_1X16/2560x800 field:none]'
media-ctl -d /dev/media0 -V '"a0080000.v_proc_ss":1 [fmt:UYVY8_1X16/640x480 field:none]'
modetest -M xlnx -s 42:640x480@RG16 -P 38@40:640x480@YUYV -w 39:alpha:0 &
gst-launch-1.0 v4l2src device=/dev/video0 io-mode="dmabuf" ! "video/x-raw, width=640, height=480, format=YUY2, framerate=60/1" ! videoconvert ! kmssink plane-id=38 bus-id=fd4a0000.zynqmp-display render-rectangle="<0,0,640,480>" fullscreen-overlay=true sync=false -v
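The media-ctl calls above follow one pattern: every pad from the ISP to the scaler input keeps the 2560x800 side-by-side format, and only the scaler output pad carries the target size. A small helper can generate them for other output resolutions. This is a sketch only: the entity names and bus format are copied from the commands above and may differ on other builds, and the function names are my own.

```python
import shlex

# Entity names and bus format copied from the commands in this review;
# they are specific to this image and may differ on other builds.
ISP = "ap1302.4-003c"
CSI = "a0020000.mipi_csi2_rx_subsystem"
SCALER = "a0080000.v_proc_ss"
BUS_FMT = "UYVY8_1X16"


def set_format_cmd(entity, pad, width, height, device="/dev/media0"):
    """Return one media-ctl invocation (as an argv list) that sets a pad format."""
    spec = f'"{entity}":{pad} [fmt:{BUS_FMT}/{width}x{height} field:none]'
    return ["media-ctl", "-d", device, "-V", spec]


def pipeline_cmds(out_width, out_height, in_width=2560, in_height=800):
    """Pad formats from the ISP output down to the scaler output."""
    return [
        set_format_cmd(ISP, 0, in_width, in_height),
        set_format_cmd(CSI, 0, in_width, in_height),
        set_format_cmd(CSI, 1, in_width, in_height),
        set_format_cmd(SCALER, 0, in_width, in_height),
        set_format_cmd(SCALER, 1, out_width, out_height),
    ]


if __name__ == "__main__":
    # Print the commands in shell form so they can be compared or pasted.
    for argv in pipeline_cmds(640, 480):
        print(shlex.join(argv))
```

On the target, each argv list could be handed to `subprocess.run(argv, check=True)` instead of being printed. The modetest and gst-launch steps would still need matching width and height values.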












As you can see above, the image is scaled down from the original resolution.

After the MIPI cores are initialized, the data is passed through the video processing subsystem, which converts it to the appropriate format and does the scaling.

The video feed enumerates as /dev/video0.


This means one can use OpenCV or any other program to read the video feed once the gstreamer pipeline has started.




The OOB (Out of Box) image comes with a script located under /usr/bin which configures the V4L2 pipeline and a GStreamer application to output the camera feed via the DisplayPort connector. To run it one has to issue


on the command line.

In this script both cameras are configured for 1080p resolution.


The only information one can glean about the camera is what the V4L2 framework reports.
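One way to collect that information programmatically is to call v4l2-ctl from v4l-utils and parse its output. The sketch below assumes v4l2-ctl is installed on the target and that the capture node is /dev/video0. The parsing relies only on v4l2-ctl printing each pixel format code in single quotes, and the helper names are hypothetical.

```python
import re
import subprocess


def list_formats_cmd(device="/dev/video0"):
    """argv that asks the driver which pixel formats and frame sizes it offers."""
    return ["v4l2-ctl", "-d", device, "--list-formats-ext"]


def parse_fourccs(text):
    """Extract the quoted four-character format codes, in order, without repeats."""
    seen = []
    for code in re.findall(r"'([^']{4})'", text):
        if code not in seen:
            seen.append(code)
    return seen


def query_formats(device="/dev/video0"):
    """Run v4l2-ctl on the target and return the advertised format codes."""
    result = subprocess.run(
        list_formats_cmd(device), capture_output=True, text=True, check=True
    )
    return parse_fourccs(result.stdout)
```

Calling `query_formats()` on the board should return the codes the driver advertises. `v4l2-ctl -d /dev/video0 --all` prints the remaining driver and control details.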





Planar YUV formats supported by the ISP
















Vivado Design




The Vivado hardware pipeline is composed of three blocks.


Block 1: This contains the ZYNQ MPSOC and reset IP


Block 2: This contains the MIPI CSI IP connected to the AP1302 and scaling IP together with a framebuffer write IP


Block 3: This contains the display output together with the timing generator and Video On Screen Display.



Vivado 2020.1 was set up on a virtual machine.



The design follows the steps outlined here:

Ultra96-V2 ON Semiconductor Dual Camera Mezzanine hardware build instructions


Ultra96-V2 Dual Camera Mezzanine Petalinux Build Instructions


However, bitstream generation fails because of the Video On Screen Display (OSD) block.

As you can see below, there is a licensing issue with the OSD core when the WebPACK license is used.








Issue with OSD core


OSD core has been deprecated








The kernel driver for the camera is under this link:




There is only one other link on GitHub about this co-processor, and even there, there is no information on the ISP.





Since the kernel of the OOB image already contains the AP1302 driver, the next approach I took was to use the PYNQ 2.6 root filesystem together with the OOB image kernel. This did not work as expected, even though the /dev/video device enumerates.

It seems there is an issue with the WIFI module on the OOB image kernel.


There is a reference design on Hackster:



however the author seems to have access to the SDK bare metal driver of the cameras and ISP chipset.


Without these it's not possible to have a working bare-metal implementation.




Software layer



The software layer makes use of the Video4Linux2 (V4L2) framework and the GStreamer API. The device enumerates as a video capture device at /dev/video0.



This allows one to read the video feed from a user-space application.


Unfortunately the OOB image does not include OpenCV, nor does it include Vitis AI. The PYNQ package could not be installed because of issues with the WIFI connectivity.

After the U96 enumerates as an access point and the WIFI network credentials are provided, the connection still fails.



Stereo Application


A typical stereo application consists of four main steps:


1. First, the matching cost is computed.

2. In the second step, the cost is aggregated.

3. Then a disparity is selected.

4. The final step applies disparity refinement algorithms.
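The four steps above can be sketched on a single pair of rectified scanlines in plain Python. This is an illustration only, not the stereo IP core discussed in this review: the cost function (absolute difference), the aggregation (a window sum) and the refinement (a parabola fit) are my own choices, and the function names are hypothetical.

```python
def refine_subpixel(costs, best):
    """Step 4: fit a parabola through the winning cost and its two neighbours."""
    if 0 < best < len(costs) - 1:
        c_prev, c_best, c_next = costs[best - 1], costs[best], costs[best + 1]
        curvature = c_prev - 2 * c_best + c_next
        if curvature > 0:
            return best + 0.5 * (c_prev - c_next) / curvature
    return float(best)


def match_scanline(left, right, max_disp, half_win=1):
    """Estimate a disparity for every pixel of one rectified scanline pair."""
    n = len(left)

    def clamp(i):
        # Replicate border pixels when the window runs off the scanline.
        return min(max(i, 0), n - 1)

    disparities = []
    for x in range(n):
        costs = []
        for d in range(max_disp + 1):
            # Step 1: absolute difference between left[x] and right[x - d].
            # Step 2: aggregate by summing that cost over a small window.
            costs.append(sum(
                abs(left[clamp(x + k)] - right[clamp(x + k - d)])
                for k in range(-half_win, half_win + 1)
            ))
        # Step 3: winner-take-all, keep the disparity with the lowest cost.
        best = min(range(len(costs)), key=costs.__getitem__)
        disparities.append(refine_subpixel(costs, best))
    return disparities


if __name__ == "__main__":
    right = [i * i for i in range(20)]
    left = [0, 0, 0] + right[:-3]  # left view is the right view shifted by 3 px
    print([round(d, 2) for d in match_scanline(left, right, max_disp=6)])
```

Away from the borders the estimate lands close to the 3 pixel shift built into the demo data. A real implementation would run the same steps over 2D blocks, which is what drives the resource requirements discussed earlier.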


No mechanical information has been published about the dual camera mezzanine regarding the camera distance from the center lines. This information is needed for depth inference once the stereo map is obtained.
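The reason the camera spacing matters is the standard pinhole relation for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the baseline between the two optical centres and d the disparity in pixels. A minimal sketch follows. The focal length and baseline are placeholder numbers, since neither is published for this board.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d for a rectified camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Placeholder values only: the real focal length and baseline of the
# mezzanine are not published and would have to be measured or calibrated.
FOCAL_LENGTH_PX = 800.0
BASELINE_M = 0.05

for d in (8, 16, 32):
    depth = depth_from_disparity(d, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {d:2d} px -> depth {depth:.2f} m")
```

Without a published baseline, B has to come from a stereo calibration, which is the role of the calibration file loaded by the OpenCV script further down.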


The main difficulty in implementing a stereo application is that the video feed does not arrive as separate channels that can be sent concurrently to the stereo IP core. Instead, the AP1302 ISP merges the video data from both cameras into one data stream in which each camera feed takes one virtual channel. This requires either de-encapsulating the virtual video channels in hardware using VDMA or simply cropping the video feed in user space and sending each cropped camera feed to the stereo core on the PL side.
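The user-space cropping option can be sketched with NumPy slicing. It assumes the two sensors arrive as equal-width halves of one frame, as the 2560x800 ISP format used earlier suggests. Which half belongs to which camera is an assumption to verify on the hardware, and the function names are hypothetical.

```python
import numpy as np


def split_side_by_side(frame):
    """Split a side-by-side stereo frame into two equal-width halves.

    Returns NumPy views, so no pixel data is copied.
    """
    width = frame.shape[1]
    if width % 2:
        raise ValueError("frame width must be even")
    half = width // 2
    return frame[:, :half], frame[:, half:]


def stereo_frames(device="/dev/video0"):
    """Yield (first_half, second_half) views cut from one V4L2 capture device."""
    import cv2  # imported lazily so the splitter can be used without OpenCV

    cap = cv2.VideoCapture(device, cv2.CAP_V4L2)
    if not cap.isOpened():
        raise RuntimeError(f"cannot open {device}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield split_side_by_side(frame)
    finally:
        cap.release()
```

The two halves could then be fed to a stereo matcher in place of two separate capture devices. The copy into PL-accessible buffers is where the performance penalty of this approach would show up.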




Given the lack of documentation and the inability to access the individual camera feeds, implementing a stereo application is not straightforward, or rather it involves many more steps than with a stereo camera jig.

Below, a typical stereo application using OpenCV for Python has been included. It assumes, however, that there are two /dev/video* devices, one for each camera.



import numpy as np
import cv2
import argparse
import sys
from calibration_store import load_stereo_coefficients


def depth_map(imgL, imgR):
    """ Depth map calculation. Works with SGBM and WLS. Needs rectified images, returns a filtered left-to-right disparity map. """
    # SGBM parameters -----------------
    window_size = 3  # 3, 5 or 7 for reduced-size images; 15 for full-size images (1300 px and above)

    left_matcher = cv2.StereoSGBM_create(
        # minDisparity, numDisparities and blockSize are missing from the
        # original listing; the three values below are assumptions.
        minDisparity=0,
        numDisparities=64,  # must be divisible by 16
        blockSize=window_size,
        P1=8 * 3 * window_size,
        P2=32 * 3 * window_size,
    )
    right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

    # WLS filter parameters
    lmbda = 80000
    sigma = 1.3

    wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
    wls_filter.setLambda(lmbda)
    wls_filter.setSigmaColor(sigma)

    displ = left_matcher.compute(imgL, imgR)  # fixed-point disparity, scaled by 16
    dispr = right_matcher.compute(imgR, imgL)
    displ = np.int16(displ)
    dispr = np.int16(dispr)
    filteredImg = wls_filter.filter(displ, imgL, None, dispr)  # important to put "imgL" here!!!

    # Stretch the result to 0..255 so it can be shown as an 8-bit image
    filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
    filteredImg = np.uint8(filteredImg)

    return filteredImg


if __name__ == '__main__':
    # Args handling -> check help parameters to understand
    parser = argparse.ArgumentParser(description='Camera calibration')
    parser.add_argument('--calibration_file', type=str, required=True, help='Path to the stereo calibration file')
    parser.add_argument('--left_source', type=str, required=True, help='Left video or v4l2 device name')
    parser.add_argument('--right_source', type=str, required=True, help='Right video or v4l2 device name')
    parser.add_argument('--is_real_time', type=int, required=True, help='Is it camera stream or video')

    args = parser.parse_args()

    # Is it a camera stream or a video file? This branch is missing from the
    # original listing and is reconstructed here as an assumption.
    if args.is_real_time:
        cap_left = cv2.VideoCapture(args.left_source, cv2.CAP_V4L2)
        cap_right = cv2.VideoCapture(args.right_source, cv2.CAP_V4L2)
    else:
        cap_left = cv2.VideoCapture(args.left_source)
        cap_right = cv2.VideoCapture(args.right_source)

    # Assumed return order of the calibration_store helper; adjust to your own helper.
    K1, D1, K2, D2, R, T, E, F, R1, R2, P1, P2, Q = load_stereo_coefficients(args.calibration_file)

    if not (cap_left.isOpened() and cap_right.isOpened()):  # Both sources are required
        print("Can't open the streams!")
        sys.exit(1)

    # Change the resolution if needed
    cap_right.set(cv2.CAP_PROP_FRAME_WIDTH, 640)  # float
    cap_right.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # float

    cap_left.set(cv2.CAP_PROP_FRAME_WIDTH, 640)  # float
    cap_left.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # float

    while True:  # Loop until 'q' pressed or stream ends
        # Grab first, then retrieve, to keep the two images as close in time as possible
        if not (cap_left.grab() and cap_right.grab()):
            print("No more frames")
            break

        _, leftFrame = cap_left.retrieve()
        _, rightFrame = cap_right.retrieve()
        height, width, channel = leftFrame.shape  # We will use the shape for remap

        # Undistortion and rectification
        leftMapX, leftMapY = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (width, height), cv2.CV_32FC1)
        left_rectified = cv2.remap(leftFrame, leftMapX, leftMapY, cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
        rightMapX, rightMapY = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (width, height), cv2.CV_32FC1)
        right_rectified = cv2.remap(rightFrame, rightMapX, rightMapY, cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)

        # We need grayscale for the disparity map.
        gray_left = cv2.cvtColor(left_rectified, cv2.COLOR_BGR2GRAY)
        gray_right = cv2.cvtColor(right_rectified, cv2.COLOR_BGR2GRAY)

        disparity_image = depth_map(gray_left, gray_right)  # Get the disparity map

        # Show the images
        cv2.imshow('left(R)', leftFrame)
        cv2.imshow('right(R)', rightFrame)
        cv2.imshow('Disparity', disparity_image)

        if cv2.waitKey(1) & 0xFF == ord('q'):  # Press q to exit
            break

    # Release the sources.
    cap_left.release()
    cap_right.release()
    cv2.destroyAllWindows()




In order to use the camera mezzanine with PYNQ or any stereo app, the following may need to be implemented:


a) revise design to use Mixer IP as opposed to VOSD

b) revise design to split virtual channels output into two separate AXIS streams

c) or use a VDMA to crop each L/R section of the MIPI output stream.






The good

1. The cameras work. You get two images side by side.

2. There is a basic no-frills V4L2 driver.

3. The cameras can be replaced thanks to the 30 pin connectors, though there are no options for color cameras.



The bad

1. Camera configuration codes and datasheet are under NDA. No release is possible unless you are a commercial entity. I asked the FAE to provide binary blobs or an SDK driver if possible and did not hear back from them.

2. ISP configuration code and datasheet are under NDA. This effectively makes it impossible to implement a bare-metal solution.

3. The ISP provides a single data stream encapsulating both images in virtual channels. Having no independent access to each video stream adds another difficulty to the implementation of a stereo algorithm.

4. As of February, the ISP kernel driver module is still under development.

5. The OOB image lacks the userspace software for a video solution (OpenCV, Vitis AI)

6. The OOB image looks like it has a problem with the WIFI chipset configuration



The nonsense

Datasheets for ISP and cameras are under NDA so if the kernel module is buggy or does not expose the functionality then reverse engineering is the answer.

  • Hi Stefan,


    The plan is to revise the design in the near future but the workaround is not exactly optimal. I don't see any issue with image resize. The GStreamer pipeline above shows an example of resizing it to 640x480.

    It would be helpful if a binary blob were provided for the camera together with an SDK API. The SDK driver on the link below is not public, so an SDK implementation would require patching together some sort of API from the Linux kernel driver.


    The same comment applies to the MIPI register set of the CAV10-000A cameras. The lack of a datasheet makes it impossible for an embedded HW/FW developer to design their own hardware.

  • Hi Dimiterk,


    I have to emphasize that I am not familiar with any of the Xilinx stuff on this platform. It's been a long time since I last worked with FPGAs, and we were using them for prototyping ASICs. So we were not interested in any provided IP, even if it was available, since the point was to test our RTL exactly as it would be in the ASIC.

    What I am imagining one could do in your case is to duplicate the incoming stream, feed each stream into a separate crop engine to crop out left and right image and then feed this into the stereo engine. This seems trivial from the perspective of how we used the FPGA, it would just take a bit of RTL coding. But I'm guessing here you are using building blocks provided by Xilinx and necessary blocks to do that are not available? If so, is it possible to insert your own RTL code between these blocks?

    You mentioned that certain IP cores lock if the frames are not in sync. How close in sync? Even if you output the two images on separate MIPI virtual channels, this still means you will receive a line of data for one image and then a line of data for the other. This is how MIPI works. Using virtual channels does not mean that data from both images is literally transmitted at the same time, but rather that it is interleaved on a packet-by-packet basis (and 1 packet is equal to 1 line if operating according to the MIPI spec).


    Regarding image size changes, the AP1302 V4L2 driver does provide necessary controls to do that. So I'm not really sure what is the issue here.


    You've been highlighting the issue with unavailability of AP1302 documentation. Unfortunately our company has to follow certain protocols on this and that is not up to me. I know that doesn't help solve your problem, but I can assure you that even our tier 1 customers, with all the documentation we ever wrote for this part, wouldn't be able to configure the AP1302 to split the image into two virtual channels. They would need our help to do that. So you are not worse off than our tier 1 customers in this regard, if that helps. It might be easier for them to get this kind of help, of course, but the whole idea behind this platform you are using is that it can be used by a large number of users. So if we solve your problem, we may actually be solving the problem for many users and then things start to make sense business wise. So our future development on AP1302 and what we will implement or expose very much depends on users like you.




  • Hi Stefan,


    1. Yes, that's the main issue. The workaround I can think of is to modify the design to use a VDMA and crop the individual sensor streams from the interleaved output stream. This is doable, but the main issue is to get both cropped output streams concurrently from two VDMAs plus an AXIS broadcaster.


    Also, there is no access to the full AP1302 datasheet, so for stereo applications one needs to configure for 640x480 resolution. Anything more would go beyond the fabric resources of the Ultra96.

    The other workaround is to do the cropping on the PS side in software, but there will be a performance penalty.


    So ideally the AP1302 AXI stream output would go to an AXI4S deinterleaver IP that splits the ISP merged camera outputs into two concurrent video streams on the fabric. To do this one would have to a) program the ISP with the proper resolution and b) understand the memory map of the ISP. Any HLS IP core that requires two concurrent inputs would lock if the frames are not in sync.



    I did not have access to the AP1302 datasheet, so at the moment the best one can do is compile the reference design with the SDSoC voucher for licensing the deprecated OSD IP core and couple the kernel with a PYNQ rootfs.


    The current kernel driver ap1302.c on the Avnet repo does not have any API calls for de-interleaving the virtual channels, so a datasheet/memory map is needed for any hacking.


    The U96 unit I received for this road-test has a defective WIFI module, so that was the reason why I could not connect to WIFI correctly above.

  • Hi Dimiterk,


    Is my understanding correct that the main issue you have with AP1302 is the inability to output the two images via separate virtual channels rather than side-by-side as we do by default? This is technically possible (the hardware supports it), but since we haven't had such a request before we haven't exposed that through firmware. The side-by-side output was actually preferred so far.

    So, this is something that can be addressed in the future if necessary, but this could take months. I'm trying to see now if there is any alternative solution to your problem for the time being. I'm not familiar with the FPGA side of this, so I'm not sure what's possible there, but given the flexible nature of the FPGA, is it possible to split the streams before feeding them into the stereo core? Is there any performance disadvantage to it?

    Also, I'm not clear whether you were able to compile the ap1302.c driver. I'm not familiar with Xilinx tools & PetaLinux so I was struggling with this myself a lot. This might be required if we want to try to hack around this somehow. Or at least, are you able to write the AP1302 registers?




  • Thanks for your review. I am disappointed to hear everything is under NDAs. I thought the whole point of having the "IAS compatible" system was to have a unified, easy, documented configuration interface no matter what sensor is being used. But if the details stay under NDA then that's of no use. Unfortunately this seems to be a common issue with most imagers which makes it very hard for anyone other than big companies to build embedded imaging systems with cameras of choice. Rather, one has to find out which sensors are available with open(/leaked) datasheets and then use one of those. When using FPGAs sometimes it makes sense (or is even easier) to have a baremetal application doing the hardware setup, so if everything is hidden behind NDAs and precompiled drivers, then it doesn't make things easy.

    Best Regards,


  • Without the datasheet and the ISP and camera sample code it's not possible to get a baremetal design like the one done by Adam Taylor.

    Even then it's not possible to get independent synchronous concurrent channels for the disparity core, since the ISP code is just a binary blob and the datasheet is under NDA.


    I have shown the stereo design procedure here, so assuming the issues can be addressed, getting a stereo reference design is doable.


    ZYNQ Stereo Camera Platform - Part2 stereolbm with Vitis Vision libraries


    To get this working would require a) recompiling the kernel for PYNQ or simply fixing the connectivity issues of the existing image, b) replacing the OSD with a video mixer, and c) revising the device tree.




    So, I asked the Avnet FAE and they did not respond to the email I sent. Keeping datasheets under NDA for what amounts to a serializer does not make sense to me, unless there is something else. It simply makes this an order of magnitude more challenging than needed.


    Frankly I find it easier to design the whole thing from scratch.






  • I bought the Dual camera mezzanine a while ago. At the time the sample Vivado project and IP wasn't even publicly available but I managed to get a beta version. I'd hoped things had improved since then - but it doesn't look like a lot has changed. It certainly has a lot of potential though.

  • Nice honest review.

    I was hoping that the system would live up to its potential, but it looks like the hardware was rushed out before the support tools could be completed.


    Sad, I was hoping to see a good stereo image analysis with some depth perception.