All my previous blogs have mostly been about experimenting with the Vivado, Vitis and PetaLinux tools for design and implementation, as the training material has been based on them. I had planned for the final training blogs to try an image processing application on the Ultra96 development board, but I have had some issues with the cable connecting the DisplayPort to my monitor, so I had to change my plans. Instead, I have been exploring the different methods/tools available for developing image processing algorithms on the Ultra96. In this blog I will give some details and summarize the methods (no implementation is provided; only design exploration is discussed).
Traditional approach:
In the traditional approach, developing an image processing algorithm would involve the following steps:
a. Understanding the algorithm (filtering, edge detection, object identification, etc.) and modelling it in Python
b. Breaking the algorithm into smaller modules
c. Developing the RTL for the algorithm modules
d. Using a testbench to simulate the RTL, with a mechanism to check it against the model
e. Selecting the right FPGA/SoC platform for the project; this requires an understanding of the logic cells required, interfaces supported, etc.
f. Synthesis, implementation, routing and timing closure for the design using synthesis and implementation tools
g. Prototyping the generated bitstream on the platform and testing it with sources and sinks
h. Iterating over the design until the objectives and performance targets are reached
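As a small illustration of the modelling step, here is a minimal NumPy golden model of a Sobel edge detector. This is only a sketch of the idea (a real golden model would also capture fixed-point behaviour so the RTL can be compared bit-for-bit), and the |gx| + |gy| approximation is chosen because it is cheap to implement in hardware:

```python
import numpy as np

def correlate2d(img, kernel):
    """Naive sliding-window 2D correlation with zero padding -- the
    golden model an RTL implementation would later be checked against."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(img.shape, dtype=np.float64)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

def sobel_magnitude(img):
    """Approximate gradient magnitude using the two Sobel kernels."""
    gx = correlate2d(img, np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]))
    gy = correlate2d(img, np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]))
    return np.abs(gx) + np.abs(gy)   # |gx| + |gy| is cheap in hardware
```

A testbench (step d) would feed the same stimulus, for example an image with a vertical step edge, to both this model and the RTL simulation and compare the outputs.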
Below are some newer approaches available to us thanks to recent EDA tool developments.
I had put off looking into the following approaches for some time, but this seems like the right opportunity to understand the tools and dig a bit deeper into them.
AMD Vitis Model Composer:
While searching for Ultra96-V2 documentation, I landed on the Avnet page, and towards the end of the references I noticed a link:
https://www.mathworks.com/videos/series/getting-started-with-the-avnet-ultra96-development-board.html
It redirects to the MathWorks website, which hosts a four-part video series going through the complete process of prototyping and deploying image processing algorithms on the Ultra96 board using Model Composer. That is what prompted me to look into AMD Vitis Model Composer. I would definitely like to try it (but found the license fee is on the order of a few hundred dollars).
Model Composer is a model-based design tool (it will feel familiar if you have used Simulink or any other block-based design tool before). It enables rapid design exploration using the MathWorks MATLAB and Simulink tools. It requires an additional license on top of the Vivado ML Standard or Enterprise Editions and the Vitis development environment.
Vitis AI is designed to enable accelerated machine learning inference on AMD hardware platforms. Model Composer ties into this flow as a graphical user interface (GUI) tool that helps developers build, optimize, and deploy deep learning models on AMD devices. It aims to simplify the process of creating and deploying machine learning models by providing a user-friendly interface for model design and optimization.
Key features of Vitis Model Composer typically include:
1. Graphical Model Design: Developers can create and design deep learning models by dragging and dropping pre-built layers, configuring their properties, and connecting them to construct the desired architecture.
2. Optimization: The tool may offer features for model compression, quantization, and other techniques to optimize the model for inference on AMD hardware, ensuring it runs efficiently and with lower resource requirements.
3. Model Deployment: After designing and optimizing the model, developers can use Vitis Model Composer to deploy the model on AMD devices, ready for accelerated inference.
PYNQ framework
PYNQ (Python Productivity for Zynq) is an open-source framework (much aligned with my interests) developed by AMD that enables Python developers to utilize the capabilities of adaptive compute platforms. More details can be found at http://www.pynq.io/
The main idea behind PYNQ is to provide a platform that makes it easier for software developers who are familiar with Python to work with FPGA-based hardware accelerators. PYNQ achieves its objectives through the following components:
a. Python APIs: Python libraries and APIs exposed by PYNQ allow developers/users to interact with the programmable logic of the adaptive platform (Zynq, MPSoCs, etc.) using familiar Python syntax. This enables them to accelerate their Python applications using FPGA-based hardware accelerators.
b. Jupyter Notebooks: PYNQ uses Jupyter Notebooks (a browser-based interactive computing environment) as its programming interface. Jupyter Notebooks allow developers to write code, visualise data and create interactive widgets, enabling them to experiment, learn, develop and accelerate applications rapidly.
c. Bitstream Overlays: PYNQ uses "bitstream overlays" to implement certain (prebuilt) features in the FPGA fabric. Developers can switch between different overlays easily to reconfigure the FPGA for different applications.
d. Pre-built Python Libraries: PYNQ provides a set of pre-built libraries for common functions such as image and signal processing, neural networks, etc. The availability of these libraries enables quick development of applications without having to write HDL or do any FPGA design at all.
PYNQ is typically the popular choice for hardware acceleration of applications, especially in the machine learning, image processing, data analytics and digital signal processing domains. It gives software developers a head start in leveraging the performance benefits of FPGAs without going into the nitty-gritty details of FPGA design.
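To give a flavour of the workflow, here is a sketch of how an overlay might be driven from a Jupyter notebook running on the board. It uses the real PYNQ `Overlay` and `allocate` APIs, but the bitstream name `edge_filter.bit` and the `axi_dma_0` IP handle are hypothetical; they depend entirely on the design you build, and this only runs on the board itself:

```python
# Sketch: driving a custom overlay from a PYNQ notebook on the Ultra96.
# "edge_filter.bit" and "axi_dma_0" are hypothetical names from an
# assumed design containing one AXI DMA in front of a filter IP.
from pynq import Overlay, allocate
import numpy as np

overlay = Overlay("edge_filter.bit")     # program the PL with the bitstream
dma = overlay.axi_dma_0                  # handle to the DMA IP in the design

in_buf = allocate(shape=(640 * 480,), dtype=np.uint8)   # contiguous buffers
out_buf = allocate(shape=(640 * 480,), dtype=np.uint8)

in_buf[:] = np.random.randint(0, 256, in_buf.shape, dtype=np.uint8)  # frame
dma.sendchannel.transfer(in_buf)         # stream the frame into the fabric
dma.recvchannel.transfer(out_buf)        # receive the processed frame
dma.sendchannel.wait()
dma.recvchannel.wait()
```

The appeal is that everything above is ordinary Python: swapping the bitstream or the processing IP does not change how the notebook code looks.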
Planned custom flow (based on PYNQ and Model composer):
Based on the above discussion, below is the plan/blueprint for developing an AI-based image processing application on the Ultra96 board. (Note: the following is subject to change as I read further about the topics and understand the pros/cons.)
1. Setting up the environment:
Install the necessary software tools (I have no more disk space :-() and drivers for the Ultra96 board, including the AMD Vitis platform and the PYNQ framework (a bootable image is available from https://github.com/Avnet/Ultra96-PYNQ/releases).
2. Preparing the AI Model and doing model optimization:
Based on the image processing task of interest (e.g., object detection, image classification), we can choose and design an appropriate AI model. Pre-trained models are available and can be fine-tuned for the specific application, or we can build custom models (depending on our skill set) using popular deep learning frameworks like TensorFlow or PyTorch. We need to train the model on a host computer using a dataset relevant to the image processing task, then optimize it for inference on the FPGA of the Ultra96 board. This may involve quantizing the model, removing unnecessary layers, and applying other techniques to improve performance and efficiency.
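The quantization mentioned above essentially maps float32 values onto 8-bit integers via a scale and zero-point. The Vitis AI quantizer does this per layer with calibration data automatically; the following is only a toy NumPy sketch of the underlying idea, not the actual tool flow:

```python
import numpy as np

def quantize_uint8(w):
    """Affine (asymmetric) post-training quantization of a float tensor.
    Illustrative only -- real quantizers work per layer/channel and use
    calibration data to pick the ranges."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = float(np.round(-lo / scale))
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the 8-bit codes back to (approximate) float values."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_uint8(w)
w_hat = dequantize(q, scale, zp)   # close to w, within one quantization step
```

The payoff on the FPGA side is that 8-bit multiplies are far cheaper than float32 ones, which is why the DPU operates on quantized models.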
3. Converting the Model for FPGA:
The model should be converted to a format compatible with the AMD FPGA/SoC. We might need to compile it into a DPU (Deep Learning Processing Unit) compatible format.
4. Building the Overlay:
Create an FPGA bitstream that includes the DPU and necessary hardware interfaces for the image input and output. Build the bitstream and generate the necessary configuration files for the FPGA.
5. Prototyping/Deploying the Model on Ultra96:
Transfer the bitstream/image and configuration files to the Ultra96 board. Load the bitstream onto the FPGA and configure it accordingly.
6. Integration with PYNQ and Jupyter Notebooks:
Integrate the AI model and associated Python code into a Jupyter Notebook using PYNQ. Create a Jupyter Notebook that allows us to interact with the image processing application. This should feel very similar to manipulating images with the OpenCV libraries.
7. Testing and Validation:
Test the AI-based image processing application with sample images first and then with real-time image inputs. Verify that the application functions correctly and meets our performance requirements.
8. Performance Optimization and deployment:
If required, fine-tune the application's performance by optimizing the code, reducing latency, and improving efficiency.
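For the latency side of this step, even a simple wall-clock harness around the per-frame call helps quantify whether an optimization actually helped. The `process_frame` below is just a stand-in for whatever the notebook really calls (e.g., a DMA transfer plus wait):

```python
import time
import statistics

def process_frame(frame):
    # Stand-in for the real accelerated call; here, trivial per-pixel work.
    return [p ^ 0xFF for p in frame]

def benchmark(fn, frame, runs=50):
    """Return (median, worst-case) latency of fn(frame) in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(frame)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)

frame = list(range(256)) * 64            # dummy 128x128 8-bit "image"
median_ms, worst_ms = benchmark(process_frame, frame)
```

Reporting the worst case alongside the median matters for real-time video, where a single slow frame is visible as a glitch.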
And with that, I conclude the final training blog. Thanks for reading; I hope you have learned a few things.