Sign Language Interpreter Blog 5: Running CNN image classification on Arty Z7 using PYNQ

8 Dec 2024

Introduction
In this blog, I will run machine learning on PYNQ and do some image classification using a neural network. Machine learning on PYNQ means using the FPGA (Field-Programmable Gate Array) fabric of a PYNQ board to accelerate machine learning algorithms.

The earlier plan was to use pretrained TFLite models on the FPGA board, but I ran into many issues. First I had to install the correct versions of the packages, and then the terminal crashed whenever I tried to invoke the TensorFlow interpreter. Frustrated by these issues, I started looking through the blogs of other participants to see if I was doing something wrong. In one good blog, I learned that PYNQ has prebuilt hardware overlays for running inference on neural networks. So, in this blog I will follow an example Jupyter notebook and learn about the steps involved in inference.

What is a Neural Network?
A neural network is a computational model inspired by the way biological neural networks in the human brain process information. Neural networks are designed to recognize patterns, make predictions, and solve complex problems by learning from data.

What are the components of a neural network?
Here's a breakdown of the main components:

Neurons (Nodes): The basic units that receive input, process it, and produce an output.
Layers:
- Input Layer: The first layer that receives the initial data.
- Hidden Layers: Intermediate layers between the input and output layers that perform complex transformations. There can be multiple hidden layers.
- Output Layer: The final layer that produces the output prediction.
Weights: Parameters that determine the importance of each input. Weights are adjusted during training to minimize the error between the predicted and actual outputs.
Biases: Additional parameters added to the weighted sum before applying the activation function. Biases allow the activation function to shift, improving the model's flexibility.
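
To make the roles of weights and biases concrete, here is a minimal sketch of a single neuron in plain Python. The input values, weights, bias, and the choice of a sigmoid activation are all made up for illustration; they are not taken from the network used later in this blog.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Illustrative numbers only
inputs = [0.5, -1.0, 2.0]
weights = [0.8, 0.2, -0.5]
bias = 0.1

print(neuron_output(inputs, weights, bias))
```

Training a network amounts to adjusting `weights` and `bias` for every neuron so that the outputs move closer to the expected values.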

What is Quantization of a Neural Network?
Quantization of neural networks is a technique used to reduce the computational complexity and memory requirements of deep learning models. This is achieved by converting the model's weights and activations from high-precision floating-point numbers (e.g., 32-bit or 64-bit) to lower-precision fixed-point numbers (e.g., 8-bit or 4-bit). We will be using the QNN-MO-PYNQ (QNN stands for Quantized neural network) pip package in this tutorial. The source code can be found here.
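
As a rough illustration of the idea, the sketch below maps a few floating-point weights to 8-bit integer codes with one shared scale factor and then maps them back. This is a generic symmetric scheme chosen for simplicity; the function names are hypothetical and it is not necessarily the scheme used inside QNN-MO-PYNQ.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [int(round(v / scale)) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]

# Made-up weights for illustration
weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print(codes)     # small integers instead of 32-bit floats
print(restored)  # close to the original weights, but not identical
```

The integer codes need a quarter of the storage of 32-bit floats, at the cost of a small rounding error of at most half a step per weight.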

Running ML on PYNQ
The QNN-MO-PYNQ package is tested on PYNQ v2.5, so I had to uninstall the latest version of PYNQ and install v2.5 instead.

Make sure the board is connected to the internet. I have written about how to share the laptop Wi-Fi with the board in this blog.
Open a new notebook by clicking on 'New', then 'Python 3 (ipykernel)'.
Enter the command below and run it to install the QNN-MO-PYNQ package. This should create a qnn folder in the Jupyter notebooks directory.
```
!pip3 install git+https://github.com/Xilinx/QNN-MO-PYNQ.git
```
There are many example notebooks in the qnn directory. I ran tiny-yolo-image.ipynb to make sure that the flow works end to end.

First we need to import all the required packages.

```
import sys
import os, platform
import json
import numpy as np
import cv2
import ctypes

from PIL import Image
from datetime import datetime

import qnn
from qnn import TinierYolo
from qnn import utils
sys.path.append("/opt/darknet/python/")
from darknet import *

from matplotlib import pyplot as plt
%matplotlib inline
```

Now, we will create a classifier and pass the network architecture file to it. The network architecture file (.json) describes the architecture of the model, including the configuration of layers.
```
classifier = TinierYolo()
classifier.init_accelerator()
net = classifier.load_network(json_layer="/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-layers.json")
```

Next, load the weights for the CNN layers that run in software. The weights are stored in NumPy's binary array format (.npy). Transposing the axes is necessary to get the layout that the convolution function expects. The np.broadcast_to() call expands each per-channel bias value across all spatial positions so that the bias array matches the shape of the layer output.

```
conv0_weights = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv0-W.npy', encoding="latin1", allow_pickle=True)
conv0_weights_correct = np.transpose(conv0_weights, axes=(3, 2, 1, 0))
conv8_weights = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv8-W.npy', encoding="latin1", allow_pickle=True)
conv8_weights_correct = np.transpose(conv8_weights, axes=(3, 2, 1, 0))
conv0_bias = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv0-bias.npy', encoding="latin1", allow_pickle=True)
conv0_bias_broadcast = np.broadcast_to(conv0_bias[:,np.newaxis], (net['conv1']['input'][0],net['conv1']['input'][1]*net['conv1']['input'][1]))
conv8_bias = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv8-bias.npy', encoding="latin1", allow_pickle=True)
conv8_bias_broadcast = np.broadcast_to(conv8_bias[:,np.newaxis], (125,13*13))
```
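
To see what the bias broadcasting produces, here is a plain-Python sketch of the same idea on a tiny made-up example. The helper `broadcast_bias` is hypothetical and only mimics the resulting shape; np.broadcast_to itself returns a read-only view rather than building copies like this.

```python
def broadcast_bias(bias, positions):
    """Repeat each per-channel bias value across all spatial positions."""
    return [[b] * positions for b in bias]

# Made-up numbers: 3 output channels, a 2x2 feature map
bias = [0.1, -0.2, 0.3]
grid = broadcast_bias(bias, 2 * 2)

for row in grid:
    print(row)  # one row per channel, one column per spatial position
```

In the notebook the same pattern yields one row per output channel and one column per pixel of the layer's feature map, for example 125 rows by 13*13 columns for the last layer.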

Now, we need to load and configure the neural network based on the settings specified in the configuration file.

```
file_name_cfg = c_char_p("/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-bwn-3bit-relu-nomaxpool.cfg".encode())
net_darknet = lib.parse_network_cfg(file_name_cfg)
```

I downloaded an image of a horse and a car from the internet to run object detection on.
The neural network has been trained on the PASCAL VOC (Visual Object Classes) dataset and is able to identify 20 classes of objects in an image, namely:
Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
```
img_file = '/home/xilinx/jupyter_notebooks/qnn/horse_car.jpg'
file_name = c_char_p(img_file.encode())

img = load_image(file_name,0,0)
img_letterbox = letterbox_image(img,416,416)
img_copy = np.copy(np.ctypeslib.as_array(img_letterbox.data, (3,416,416)))
img_copy = np.swapaxes(img_copy, 0,2)
free_image(img)
free_image(img_letterbox)

im = Image.open(img_file)
im
```
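
The 20 classes and the 416x416 input size also explain the shape (125, 13*13) used for the last-layer bias earlier. The arithmetic below is a sketch that assumes the usual YOLOv2-style output layout of 5 boxes per grid cell, each with 4 coordinates, 1 objectness score, and one score per class; the variable names are my own.

```python
num_classes = 20      # PASCAL VOC classes listed above
boxes_per_cell = 5    # assumption: 5 anchor boxes, as in the usual YOLOv2 VOC setup
values_per_box = 4 + 1 + num_classes  # x, y, w, h + objectness + class scores

out_channels = boxes_per_cell * values_per_box

input_size = 416      # letterboxed input width and height
grid_size = 13        # spatial size of the final feature map
downsample = input_size // grid_size  # total stride of the network

print(out_channels, downsample)
```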

Now, we will run the first layer of the neural network. The first layer of this network has not been quantized, so it is executed in software rather than in the HW accelerator.

```
start = datetime.now()
# Add a batch dimension: (416, 416, 3) -> (1, 416, 416, 3)
img_copy = img_copy[np.newaxis, :, :, :]

conv0_output = utils.conv_layer(img_copy,conv0_weights_correct,b=conv0_bias_broadcast,stride=2,padding=1)
# Clip activations to [0, 4], scale to [0, 1], then quantize
conv0_output_quant = conv0_output.clip(0.0,4.0)
conv0_output_quant = utils.quantize(conv0_output_quant/4,3)
end = datetime.now()
micros = int((end - start).total_seconds() * 1000000)
print("First layer SW implementation took {} microseconds".format(micros))
with open('timestamp.txt', 'w') as f:
    print(micros, file=f)
```
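
The clip, divide by 4, quantize, and later multiply by 7 steps can be read as turning each activation into a 3-bit integer code between 0 and 7. The sketch below shows that reading on a few made-up values. `quantize_unit` is a hypothetical stand-in that assumes uniform rounding to 2**3 - 1 = 7 steps; I have not checked that utils.quantize is implemented exactly this way.

```python
def clip(x, low, high):
    """Limit x to the range [low, high]."""
    return max(low, min(high, x))

def quantize_unit(x, bits):
    """Hypothetical: snap x in [0, 1] to the nearest of 2**bits - 1 equal steps."""
    levels = 2 ** bits - 1
    return round(x * levels) / levels

# Made-up activations for illustration
activations = [-1.5, 0.3, 2.2, 3.1, 9.0]
codes = []
for a in activations:
    clipped = clip(a, 0.0, 4.0)        # same range as .clip(0.0, 4.0)
    q = quantize_unit(clipped / 4, 3)  # value in {0, 1/7, ..., 1}
    codes.append(round(q * 7))         # integer code in 0..7

print(codes)
```

Under this assumption, the multiplication by 7 that appears when the accelerator input buffer is prepared simply converts the quantized fractions into integer codes.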

The core layers, whose values were quantized during training, are executed in the Programmable Logic.

```
out_dim = net['conv7']['output'][1]
out_ch = net['conv7']['output'][0]

# Allocate the output buffer and prepare the quantized input for the accelerator
conv_output = classifier.get_accel_buffer(out_ch, out_dim)
conv_input = classifier.prepare_buffer(conv0_output_quant*7)

start = datetime.now()
classifier.inference(conv_input, conv_output)
end = datetime.now()

conv7_out = classifier.postprocess_buffer(conv_output)

micros = int((end - start).total_seconds() * 1000000)
print("HW implementation took {} microseconds".format(micros))
with open('timestamp.txt', 'a') as f:
    print(micros, file=f)
```
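
The same start/end timing pattern is repeated for each stage of the notebook. One way to avoid the repetition is a small context manager, sketched below. The name `stopwatch` and the `timings` dictionary are my own and are not part of the QNN-MO-PYNQ notebook.

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label, results):
    """Store the elapsed time of the with-block, in microseconds, under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = int((time.perf_counter() - start) * 1_000_000)

timings = {}
with stopwatch("demo", timings):
    sum(range(100000))  # placeholder for a call such as the inference step

print("demo took {} microseconds".format(timings["demo"]))
```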

Execute the last layer of the neural network.

```
start = datetime.now()
conv7_out_reshaped = conv7_out.reshape(out_dim,out_dim,out_ch)
conv7_out_swapped = np.swapaxes(conv7_out_reshaped, 0, 1)  # swap the two spatial axes
conv7_out_swapped = conv7_out_swapped[np.newaxis, :, :, :]  # add a batch dimension

conv8_output = utils.conv_layer(conv7_out_swapped,conv8_weights_correct,b=conv8_bias_broadcast,stride=1)
# Pass the result to darknet as a C float pointer
conv8_out = conv8_output.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

end = datetime.now()
micros = int((end - start).total_seconds() * 1000000)
print("Last layer SW implementation took {} microseconds".format(micros))
with open('timestamp.txt', 'a') as f:
    print(micros, file=f)
```

Now we will draw bounding boxes using darknet and print probabilities.

```
lib.forward_region_layer_pointer_nolayer(net_darknet,conv8_out)
thresh = c_float(0.3)
thresh_hier = c_float(0.5)
file_name_out = c_char_p("/home/xilinx/jupyter_notebooks/qnn/detection".encode())
file_name_probs = c_char_p("/home/xilinx/jupyter_notebooks/qnn/probabilities.txt".encode())
file_names_voc = c_char_p("/opt/darknet/data/voc.names".encode())
darknet_path = c_char_p("/opt/darknet/".encode())
lib.draw_detection_python(net_darknet, file_name, thresh, thresh_hier, file_names_voc, darknet_path, file_name_out, file_name_probs)

# Print probabilities
with open(file_name_probs.value, "r") as f:
    file_content = f.read().splitlines()
detections = []
for line in file_content:
    name, probability = line.split(": ")
    detections.append((probability, name))
# Note: the probabilities are still strings here, so this sort is lexicographic
for det in sorted(detections, key=lambda tup: tup[0], reverse=True):
    print("class: {}\tprobability: {}".format(det[1], det[0]))
```

Now let's see the results.

```
res = Image.open(file_name_out.value.decode() + ".png")
res
```

Summary
In this blog, we learned how to use the prebuilt PYNQ overlays for neural network inference. Because the core layers are quantized, they could run on the hardware accelerator, and the model produced a fast and accurate result on the test image.
