Avnet Ultra96 Dev Board - Review


RoadTest: Avnet Ultra96 Dev Board

Author: volodya

Creation date:

Evaluation Type: Development Boards & Tools

Did you receive all parts the manufacturer stated would be included in the package?: True

What other parts do you consider comparable to this product?: Xilinx Zynq UltraScale+ MPSoC ZCU102

What were the biggest problems encountered?: PetaLinux didn't build. There is no adequate community support for software issues from either Avnet or Xilinx.

Detailed Review:

This dev board is a convenient all-inclusive package for accessing the features and functionality of the programmable multiprocessor architecture. I became interested in it because I need an experimentation platform for accelerating cryptographic algorithms, in particular finite field operations. At the start of the roadtest I planned to prototype very basic blocks of threshold cryptography, but that soon got bogged down in system administration issues. Even though progress was hindered by the inadequacy of Xilinx software in general, I'm pretty sure that the Ultra96 hardware can be a way forward as far as cryptography acceleration is concerned. There is much to be done in providing open-source tooling but I'm sure we'll be there someday: Project X-Ray is underway already.


Hardware Architecture: Zynq UltraScale+ MPSoC

The ZU3EG device (see Xilinx DS891) is a mid-range ZU device featuring


  • an application processor unit (APU, quad-core Cortex-A53),
  • a real-time processing unit (RPU, dual-core Cortex-R5) and
  • a graphics processing unit (GPU, Mali-400MP2).


The Platform Management Unit (PMU) is also considered to be one of the main processing systems of the device. It is built on a dedicated 32-bit processor core.


The additional video codec unit is only present in higher-end devices of the ZU family. A higher-end ZU device would also be needed for PCIe Gen 3, high-speed serial interfaces or 100G Ethernet.


Thanks to their programmability, devices in the ZU family find applications in various software-defined and evolving systems. One of my favourite example applications has been software-defined radio. But only the next device up in the ZU line-up has transceiver fractional PLLs. Something to consider when choosing your ZU dev board. The choice of such a restricted device is all the more surprising given that a primary application for the EG family, as outlined by Xilinx, is radio communication.


Given its abilities and limitations, ZU3EG is great for any task that benefits from having the APU, RPU, GPU and programmable logic on the same die. 150K logic cells don't sound like a lot by modern standards, but consider that you already have several processor cores in hard IP, with all the dedicated bus infrastructure. Together, this is a powerful bundle.


Xilinx Software Stack

Software Setup

  1. Install Xilinx SDSoC (includes Vivado). This requires a license, but one is included with the board.
  2. Download the board definition files from http://ultra96.org/ and install them.


Design Workflow

  1. Define and build a hardware platform either in Vivado or in SDx.
  2. Create an application project on this hardware platform.
  3. Create an FSBL (First Stage Boot Loader) project.
  4. Create a boot image from the application project and assign the FSBL ELF image to the bootloader partition of the boot image.
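
Step 4 can also be scripted instead of clicked through in the IDE. Below is a minimal sketch that writes the image description (.bif) text that bootgen consumes. It is not the exact file the SDK wizard produces: the file names fsbl.elf, design.bit and app.elf are hypothetical placeholders, and the attribute names follow the bootgen BIF syntax for Zynq UltraScale+ as I understand it.

```python
def make_bif(fsbl, app, bitstream=None, image_name="the_ROM_image"):
    """Return the text of a bootgen .bif file for a Zynq UltraScale+ boot image."""
    lines = [f"{image_name}:", "{"]
    # The FSBL ELF is assigned to the bootloader partition (step 4 above)
    lines.append(f"    [bootloader, destination_cpu=a53-0] {fsbl}")
    if bitstream:
        # Optional PL bitstream, configured by the FSBL before the application starts
        lines.append(f"    [destination_device=pl] {bitstream}")
    # The application ELF runs on the first Cortex-A53 core
    lines.append(f"    [destination_cpu=a53-0] {app}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    print(make_bif("fsbl.elf", "app.elf", bitstream="design.bit"), end="")
```

The resulting text would be saved as, say, boot.bif and handed to bootgen (something along the lines of `bootgen -arch zynqmp -image boot.bif -o BOOT.BIN -w`); check the bootgen documentation for your tool version before relying on it.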


Programming Options

1. Avnet JTAG-USB pod. My favourite option. It gives you UART console and JTAG with only one USB cable. Very neat.


2. SD Card. Boots thanks to the First Stage Boot Loader. The **First Stage Boot Loader (FSBL)** for Zynq UltraScale+ MPSoC configures the FPGA with the hardware bitstream (if it exists) and loads

  •   the operating system image or
  •   the standalone image or
  •   the second stage boot loader image

from the non-volatile memory (microSD card) to RAM (DDR) and takes A53/R5 out of reset. It supports multiple partitions, and each partition can be a code image or a bitstream.


3. From Linux: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841645/Solution+Zynq+PL+Programming+With+FPGA+Manager.

  I'd like to try this one as soon as I sort out build issues with PetaLinux.
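
  For reference, the procedure on that wiki page boils down to copying the bitstream into the firmware directory and writing its name to the FPGA Manager sysfs node. Here is a minimal sketch of that sequence, untested on the board since I don't have a working PetaLinux image yet. The default paths are the ones I'd expect on a Xilinx kernel and are assumptions, as is the helper name program_pl.

```python
import shutil
from pathlib import Path


def program_pl(bitstream, firmware_dir="/lib/firmware",
               manager_dir="/sys/class/fpga_manager/fpga0"):
    """Load a full bitstream into the PL through the FPGA Manager sysfs files."""
    bitstream = Path(bitstream)
    # The kernel looks the firmware up by file name inside the firmware directory
    shutil.copy(bitstream, Path(firmware_dir) / bitstream.name)
    manager = Path(manager_dir)
    # flags = 0 requests a full (not partial) reconfiguration
    (manager / "flags").write_text("0\n")
    # Writing the file name is what triggers the programming
    (manager / "firmware").write_text(bitstream.name + "\n")
    state = manager / "state"
    return state.read_text().strip() if state.exists() else "unknown"
```

  On the target this would need root privileges, and the wiki page, as I read it, expects the bitstream in the .bin format produced by bootgen rather than a raw .bit file. Both parameters can be pointed at scratch directories to dry-run the logic on a PC.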



The required resources are not kept in a single location but are scattered over the Ultra96 and Xilinx websites. It's unclear why the PetaLinux BSP isn't featured on the Ultra96 site. A version of it is found on the Xilinx website, and its existence is communicated by word of mouth.

The version of the BSP is in the URL above. I was lucky that a version matching the latest PetaLinux SDK, 2018.3, was available. Originally I got a link to version 2018.2 of the BSP from the Ultra96 forum. After I tried it with SDK 2018.3 and it didn't build, I took a wild guess at the URL of the 2018.3 BSP and voilà, it was there. It appears Xilinx forgot to put it on their PetaLinux 2018.3 download page. Honestly, both websites, Ultra96 and Xilinx, aren't in their best possible shape.



I'm still trying to compile the PetaLinux Yocto distribution. Compilation on my Debian Linux machine ran into issues with the sstate-cache, which I hope can be fixed by downloading the PetaLinux sstate-cache (which is HUGE) from the Xilinx site. But I have yet to try this.


Other Yocto distros build fine on my machine in general without their sstate-cache. Something to investigate given enough time.



For debug purposes, steps 3 and 4 in Design Workflow above may be omitted by running the glorious Eclipse-powered debugger in Xilinx SDK. It works a treat unless confronted with some Linux file system IO, in which case it may silently crash. But crashes are relatively rare and there are solutions to avoid those such as not using the serial console in the IDE. The serial console in Xilinx SDK 2018.2 crashes the whole SDK on Linux. Use a normal Linux program such as screen or cu instead.


Test Procedure and Example Project: AXI DMA

For a demo project, let's take a well-motivated AXI DMA tutorial: http://www.fpgadeveloper.com/2014/08/using-the-axi-dma-in-vivado.html.


Unlike the Avnet Ultra96 tutorials 01-04, this tutorial does demonstrate the essence of PS-PL interaction in Vivado. The tutorial applies to the board with minor modifications.


First we construct a block design featuring a simple AXI FIFO (highlighted).




This block design can be used as a template for AXI-based PS-PL designs. In my cryptographic application, that FIFO will be replaced with a more complex PL system. But for now the FIFO is the PL component that receives data from the PS via the AXI bus and returns that data unchanged to the PS. Something akin to an echo test application.
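
The echo behaviour is easy to model in software before touching the hardware. The sketch below is only a stand-in for the data path: the class name LoopbackFifo, the depth of 512 words and the 32-bit word width are my assumptions for illustration, and nothing here reflects AXI timing or the DMA driver API.

```python
from collections import deque


class LoopbackFifo:
    """Software stand-in for the PL FIFO: words come out in the order they went in."""

    def __init__(self, depth=512):
        self.depth = depth
        self.words = deque()

    def push(self, word):
        if len(self.words) >= self.depth:
            raise OverflowError("FIFO full")
        self.words.append(word & 0xFFFFFFFF)  # assume a 32-bit data path

    def pop(self):
        return self.words.popleft()


def echo_test(tx_words, fifo):
    """Send tx_words through the FIFO and report whether they return unchanged."""
    tx_words = list(tx_words)
    for word in tx_words:
        fifo.push(word)
    rx_words = [fifo.pop() for _ in tx_words]
    return rx_words == tx_words


if __name__ == "__main__":
    print("echo test passed:", echo_test(range(32), LoopbackFifo()))
```

When the FIFO is later swapped for a real processing block, the same harness shape still applies: only the comparison against the expected output changes.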


Unlike in the tutorial, on Ultra96 we may optionally include the OCM (on-chip memory) segment in the memory mapping (excluded by default and causing non-critical warnings):




After that, generate the bitstream and export the BSP into SDK, including the bitstream. Launch the SDK and create the software project as the tutorial recommends. The only difference from the tutorial is that you need to switch the PS UART from UART0 to UART1 in system.mss of the BSP:


PARAMETER OS_NAME = standalone
PARAMETER stdin = psu_uart_1
PARAMETER stdout = psu_uart_1
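
If the BSP gets regenerated often, the same change can be scripted. This is a minimal sketch that assumes the parameter lines look exactly like the ones above; the helper name set_stdio_uart is hypothetical and the script is not part of any Xilinx tool.

```python
import re


def set_stdio_uart(mss_text, uart="psu_uart_1"):
    """Point the stdin and stdout parameters of a system.mss text at the given UART."""
    pattern = re.compile(
        r"^([ \t]*PARAMETER[ \t]+(?:stdin|stdout)[ \t]*=[ \t]*)\S+",
        re.MULTILINE,
    )
    # Keep the left-hand side of each matching line, replace only the UART name
    return pattern.sub(lambda m: m.group(1) + uart, mss_text)


if __name__ == "__main__":
    sample = (
        "PARAMETER OS_NAME = standalone\n"
        "PARAMETER stdin = psu_uart_0\n"
        "PARAMETER stdout = psu_uart_0\n"
    )
    print(set_stdio_uart(sample), end="")
```

Other parameters, such as OS_NAME, are left untouched.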


With this all done, compile and run or debug the project over GDB using the Avnet pod and observe the output of the PS system in the TTY console.



1. Avnet:


  - Matrix multiply (with PL). You get hands-on experience with the SDx environment, which allows you to code PL firmware in C. Just in case you like that sort of thing.


  - Hello World (PL used in a rudimentary configuration). The Hello World tutorial is copied from UG1209 almost one-to-one except that the PL requires a bitstream on Ultra96. Why would it be necessary? The tutorial offers no explanation. Yet it looks to be an interesting difference from the ZCU102 board.


2. Xilinx UG1209: Zynq UltraScale+ MPSoC: Embedded Design Tutorial.


  - Hello World: A useful tutorial but you have to add a PL bitstream to every design, including ones in which PL is not configured to have any IP, and program the FPGA with that bitstream.


3. Using AXI DMA in Vivado by Jeff Johnson. The tutorial works well for Ultra96 with minor modifications.


Lessons Learned

As usual, the hardware is well ahead of the software. Documentation is great on the Xilinx side and lacking on the Avnet side. Avnet tutorials 01-04 are heavily based on Xilinx UG1209, repeating all the steps in the IDE with very minimal changes, yet without an acknowledgement or citation. Documentation links on the Avnet Ultra96 site keep changing and are sometimes broken. In short, start with the Xilinx PDF manuals. They are the best.


Importing Verilog code into Vivado using the IP Creation and Packaging tool worked well (crashing just once though). I was able to integrate Verilog code into the block design and use it with Ultra96. Vivado even performed design automation by routing clock and reset signals of my new IP block.


Sometimes my exploration of IDE features in Vivado and Xilinx SDK led to those crashing. I noticed that crashes always coincided with certain Linux file system accesses. I didn't debug those crashes. They degraded my otherwise positive impression of those tools. It's a shame that the whole design process has to go through an IDE that is not stable after many years of development. Open-sourcing the IDE code would help fix its instability. But obviously that is not on the table because Vivado is proprietary.


On the sysadmin side, I found that you should allocate a lot of hard drive real estate to Xilinx tools upfront. My estimate is at least 110 GB for the SDx bundle (which includes Vivado, SDK and SDx) plus PetaLinux, the latter with the sstate-cache containing all necessary build dependencies. And that is not including the space needed to download the archived installation files. I found myself spending too much of the time I had allocated to the roadtest project on sorting out disk partitions and installing the packages. The users of SDx would benefit from a live image containing all of that, which they could run in a virtual machine with a Linux distro that Xilinx supports. That would be a real time saver.
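
A quick pre-flight check would have saved me some repartitioning. The sketch below compares the free space on a file system with the 110 GB estimate above; the threshold is only my rough figure (treated here as GiB for simplicity), and the function name has_room is hypothetical.

```python
import shutil

REQUIRED_GIB = 110  # rough estimate from this review, not an official Xilinx figure


def has_room(path, required_gib=REQUIRED_GIB):
    """Return (ok, free_gib) for the file system that holds path."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= required_gib, free_gib


if __name__ == "__main__":
    ok, free = has_room("/")
    verdict = "enough" if ok else "not enough"
    print(f"{free:.0f} GiB free, {verdict} for the Xilinx tools")
```

Point it at the partition where the tools will actually be installed, and remember the downloaded installers need room as well.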


Speaking of virtual machines, there is a Vivado Docker container. Might give it a try.