PYNQ-Z2 Dev Board: Python Productivity for Zynq® - Review

Name: PYNQ-Z2 Dev Board: Python Productivity for Zynq®
Rating: 4.5208335 (4 reviews)

sambit1991 over 5 years ago

RoadTest: PYNQ-Z2 Dev Board: Python Productivity for Zynq®

Author: sambit1991

Creation date: 21 Jul 2019

Evaluation Type: Development Boards & Tools

Did you receive all parts the manufacturer stated would be included in the package?: True

What other parts do you consider comparable to this product?: Ultra96

What were the biggest problems encountered?: 1) Very unorganized and inadequate documentation. 2) Key concepts such as overlay design and not very well explained. 3) The third party repositories such as BNN are not very well documented and mostly only support inference.

Detailed Review:

1. Introduction

Element14 sent me a Pynq FPGA development kit and I had to road test it and provide a review of several topics such as :

Functionality
Ease of use
Performance
Scalability

Here is what I did with it.

2. Unboxing

The shipment came in a nice element14 box. The actual board and other accessories were neatly and securely packed inside in smaller boxes, with the Pynq Z2 having nice logos of Xilinx and TuL.

It also contained a USB cable for connecting to a laptop, an ethernet cable and power adapters.

I also found a SD card preloaded with Pynq image and an adapter for SD card.

Everything was pretty neat, secured and nicely packed and arrived on time.

Element14 helped me out a little more by shipping it on a preferred date, since I was not at home on the first planned date.

3. Method of review

My major area of interest in using FPGA is in machine learning application. The review is done using some machine learning applications.

The key aspects of review are:

Speed of inference
Usability with own data or data not in the training and test sets.
Ease of building new models using code from example models.
Comparision between hardware accelerated and software inference in terms of speed and accuracy

4. Applications

4.1 CIFAR 10 - Automobile detection

4.1.1 About the application

CIFAR 10 is a dataset containing 32*32 images of several categories:

print(hw_classifier.classes)

['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

The detailed code can be found in the code section.

The application uses a quantized binary neural network with the following architecture:

6 convolutional layers

3 max pool layers

3 fully connected layers.

The network is pre-trained and the weights are stored.

Instantiating a hardware or software version of the classifier, loads the trained weights and makes the network ready for inference.

Since my interest is in automobile detection, I have tested this network with automobile images downloaded from the internet.

The downloaded images were first transferred to the Pynq. This was super easy - thanks to the network drive that can be accessed as : \\192.168.2.99\xilinx

4.1.2 Code

The following code, iteratively fetches 9 images of cars from the Pynq file system, stored at :

It then passes each of the images to the classifiers instantiated in hardware and software respectively.

# take the above pieces and put together into a function
import bnn
from PIL import Image
import numpy as np
import time


hw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW1A1,'cifar10',bnn.RUNTIME_HW)
sw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW1A1,'cifar10',bnn.RUNTIME_SW)


im_name = '/home/xilinx/jupyter_notebooks/SAMBIT_DATA/car' + str(4) + '.jpg'
print("Classifying image : {0}".format(im_name))
im = Image.open(im_name)
im        
def classify_images_cifar10():
    """
    Function classifies a series of images using the hardware classifier
    platform : hw -> hardware
                sw -> software
    """
    
    print("Available classes")
    print(hw_classifier.classes)
    
    print ("========================== hardware classifications ==============================")
    for i in range(1,10):
        im_name = '/home/xilinx/jupyter_notebooks/SAMBIT_DATA/car' + str(i) + '.jpg'
        print("Classifying image : {0}".format(im_name))
        im = Image.open(im_name)
        im
        
        class_out=hw_classifier.classify_image(im)
        print("Class number: {0}".format(class_out))
        print("Class name: {0}".format(hw_classifier.class_name(class_out)))
        
    
    print("======================== software classifications ===============================")
    for i in range(1,10):
        im_name = '/home/xilinx/jupyter_notebooks/SAMBIT_DATA/car' + str(i) + '.jpg'
        print("Classifying image : {0}".format(im_name))
        im = Image.open(im_name)
        im
        
        class_out = sw_classifier.classify_image(im)
        print("Class number: {0}".format(class_out))
        print("Class name: {0}".format(sw_classifier.class_name(class_out)))
    
    return im
        


if __name__ == '__main__':
    im = classify_images_cifar10()
    im

Following is the output:

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car4.jpg

Available classes

['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

========================== hardware classifications ==============================

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car1.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 8

Class name: Ship

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car2.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car3.jpg

Inference took 1583.00 microseconds

Classification rate: 631.71 images per second

Class number: 8

Class name: Ship

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car4.jpg

Inference took 1581.00 microseconds

Classification rate: 632.51 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car5.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car6.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car7.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 0

Class name: Airplane

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car8.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 8

Class name: Ship

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car9.jpg

Inference took 1582.00 microseconds

Classification rate: 632.11 images per second

Class number: 3

Class name: Cat

======================== software classifications ===============================

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car1.jpg

Inference took 1587185.00 microseconds

Classification rate: 0.63 images per second

Class number: 8

Class name: Ship

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car2.jpg

Inference took 1586030.00 microseconds

Classification rate: 0.63 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car3.jpg

Inference took 1586563.00 microseconds

Classification rate: 0.63 images per second

Class number: 8

Class name: Ship

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car4.jpg

Inference took 1586526.00 microseconds

Classification rate: 0.63 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car5.jpg

Inference took 1586699.00 microseconds

Classification rate: 0.63 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car6.jpg

Inference took 1604946.00 microseconds

Classification rate: 0.62 images per second

Class number: 1

Class name: Automobile

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car7.jpg

Inference took 1586711.00 microseconds

Classification rate: 0.63 images per second

Class number: 0

Class name: Airplane

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car8.jpg

Inference took 1586855.00 microseconds

Classification rate: 0.63 images per second

Class number: 8

Class name: Ship

Classifying image : /home/xilinx/jupyter_notebooks/SAMBIT_DATA/car9.jpg

Inference took 1586411.00 microseconds

Classification rate: 0.63 images per second

Class number: 3

Class name: Cat

Some of the images used are as follows:

4.1.3 Results

Classifer	Accuracy	Time per image
Hardware	44.44%	1.5ms
Software	44.44%	1586ms

Inference using hardware acceleration is approximately 1000 times faster than in software.
The classifier performs rather poorly on images not from the same distribution as the training set.

4.1.4 Comments

Instantiating a model was very easy and straight forward.
However, the model did not perform with satisfactory accuracy since the test images came from a different distribution that the training set.
Options to retrain the network were very limited and difficult. The networks are mainly trained on GPUs and the quantized weights are then transferred onto the FPGA.
The following repo contains some details : https://github.com/Xilinx/BNN-PYNQ/tree/master/bnn/src/training

4.2 MNIST - Handwritten digit recognition

4.2.1 About the application

MNIST handwritten digit classification is widely regarded as the hello world of machine learning applications.

The task is to classify 28*28 images of handwritten digits into one of 9 digits 0 - 9.

The architecture of the tested network has just 3 fully connected layers.

The examples were adapted to make a single function, that collected pictures from the SD card and ran the classifier on them, in hardware and software.

Since I did not have a USB webcam, I simply saved some test images on the SD card on Pynq.

4.2.2 Code

# Run over several handwritten digits to test the network


import bnn
import cv2
from PIL import Image as PIL_Image
from PIL import ImageEnhance
from PIL import ImageOps
from PIL import Image as PIL_Image
import numpy as np
import math
from scipy import misc
from array import *
from PIL import Image as PIL_Image
from PIL import ImageOps
img_load = PIL_Image.open('/home/xilinx/img_webcam_mnist.png').convert("L")  


hw_classifier = bnn.LfcClassifier(bnn.NETWORK_LFCW1A1,"mnist",bnn.RUNTIME_HW)
sw_classifier = bnn.LfcClassifier(bnn.NETWORK_LFCW1A1,"mnist",bnn.RUNTIME_SW)


for i in range(3, 10):
    
    im_path = '/home/xilinx/jupyter_notebooks/SAMBIT_DATA/' + str(i) + '.jpg'
    cv2_im = cv2.imread(im_path, 1)
    cv2_im = cv2.cvtColor(cv2_im,cv2.COLOR_BGR2RGB)
    img = PIL_Image.fromarray(cv2_im).convert("L") 


    #original captured image
    #orig_img_path = '/home/xilinx/jupyter_notebooks/bnn/pictures/webcam_image_mnist.jpg'
    #img = PIL_Image.open(orig_img_path).convert("L")     


    #Image enhancement                
    contr = ImageEnhance.Contrast(img)
    img = contr.enhance(3)                                                    # The enhancement values (contrast and brightness) 
    bright = ImageEnhance.Brightness(img)                                     # depends on backgroud, external lights etc
    img = bright.enhance(4.0)          


    #img = img.rotate(180)                                                     # Rotate the image (depending on camera orientation)
    #Adding a border for future cropping
    img = ImageOps.expand(img,border=80,fill='white') 
    display(img)
    
    #Find bounding box  
    inverted = ImageOps.invert(img)  
    box = inverted.getbbox()  
    img_new = img.crop(box)  
    width, height = img_new.size  
    ratio = min((28./height), (28./width))  
    background = PIL_Image.new('RGB', (28,28), (255,255,255))  
    if(height == width):  
        img_new = img_new.resize((28,28))  
    elif(height>width):  
        img_new = img_new.resize((int(width*ratio),28))  
        background.paste(img_new, (int((28-img_new.size[0])/2),int((28-img_new.size[1])/2)))  
    else:  
        img_new = img_new.resize((28, int(height*ratio)))  
        background.paste(img_new, (int((28-img_new.size[0])/2),int((28-img_new.size[1])/2)))  


    background  
    img_data=np.asarray(background)  
    img_data = img_data[:,:,0]  
    misc.imsave('/home/xilinx/img_webcam_mnist.png', img_data) 
    
    #Resize the image and invert it (white on black)  
    smallimg = ImageOps.invert(img_load)  
    smallimg = smallimg.rotate(0)  


    data_image = array('B')  


    pixel = smallimg.load()  
    for x in range(0,28):  
        for y in range(0,28):  
            if(pixel[y,x] == 255):  
                data_image.append(255)  
            else:  
                data_image.append(1)  


    # Setting up the header of the MNIST format file - Required as the hardware is designed for MNIST dataset         
    hexval = "{0:#0{1}x}".format(1,6)  
    header = array('B')  
    header.extend([0,0,8,1,0,0])  
    header.append(int('0x'+hexval[2:][:2],16))  
    header.append(int('0x'+hexval[2:][2:],16))  
    header.extend([0,0,0,28,0,0,0,28])  
    header[3] = 3 # Changing MSB for image data (0x00000803)  
    data_image = header + data_image  
    output_file = open('/home/xilinx/img_webcam_mnist_processed', 'wb')  
    data_image.tofile(output_file)  
    output_file.close()   
    display(smallimg)
    
    class_out = hw_classifier.classify_mnist("/home/xilinx/img_webcam_mnist_processed")
    print("Class number: {0}".format(class_out))
    print("Class name: {0}".format(hw_classifier.class_name(class_out)))
    
    
    print("============================= SOFTWARE ======================================")
    class_out=sw_classifier.classify_mnist("/home/xilinx/img_webcam_mnist_processed")
    print("Class number: {0}".format(class_out))
    print("Class name: {0}".format(hw_classifier.class_name(class_out)))

4.2.3 Results

Now, this classified almost every image wrong! So, I do not feel it is useful to show the wrong classification results in here, to save space and time.

However, just for the sake of speed calculations, consider the following plot:

As we can see, the hardware inference is approximately 1000 times faster than software. This means a lot in case of training and inference in deep neural networks.

4.2.4 Comments

The FPGA itself is really impressive in terms of inference speed
The implemented MNIST classifier model does not really generalize well to webcam images.
A facility to train the network with own images would have been really nice.

4.3 Tiny YOLO - Object detection

4.3.1 About the application

This is a smaller implementation of the state-of-the-art You Only Look Once (YOLO) object detection algorithm.

It uses quantized weights for the convolution filters and the inference is accelerated in hardware.

This application tests the network with some random images downloaded from the internet.

4.3.2 Code

out_dim = net['conv7']['output'][1]
out_ch = net['conv7']['output'][0]


# img_folder = './yoloimages/' # uncomment to reset


img_folder = '/home/xilinx/jupyter_notebooks/SAMBIT_DATA/YOLO_TEST'


file_name_out = c_char_p("/home/xilinx/jupyter_notebooks/qnn/detection".encode())
file_name_probs = c_char_p("/home/xilinx/jupyter_notebooks/qnn/probabilities.txt".encode())
file_names_voc = c_char_p("/opt/darknet/data/voc.names".encode())
tresh = c_float(0.3)
tresh_hier = c_float(0.5)
darknet_path = c_char_p("/opt/darknet/".encode())


conv_output = classifier.get_accel_buffer(out_ch, out_dim)


while(1):
    for image_name in os.listdir(img_folder):
        img_file = os.path.join(img_folder, image_name)
        file_name = c_char_p(img_file.encode())


        img = load_image(file_name,0,0)
        img_letterbox = letterbox_image(img,416,416)
        img_copy = np.copy(np.ctypeslib.as_array(img_letterbox.data, (3,416,416)))
        img_copy = np.swapaxes(img_copy, 0,2)
        free_image(img)
        free_image(img_letterbox)


        #First convolution layer in sw
        if len(img_copy.shape)<4:
            img_copy = img_copy[np.newaxis, :, :, :]


        conv0_ouput = utils.conv_layer(img_copy,conv0_weights_correct,b=conv0_bias_broadcast,stride=2,padding=1)
        conv0_output_quant = conv0_ouput.clip(0.0,4.0)
        conv0_output_quant = utils.quantize(conv0_output_quant/4,3)


        #Offload to hardware
        conv_input = classifier.prepare_buffer(conv0_output_quant*7);
        classifier.inference(conv_input, conv_output)
        conv7_out = classifier.postprocess_buffer(conv_output)


        #Last convolution layer in sw
        conv7_out = conv7_out.reshape(out_dim,out_dim,out_ch)
        conv7_out = np.swapaxes(conv7_out, 0, 1) # exp 1
        if len(conv7_out.shape)<4:
            conv7_out = conv7_out[np.newaxis, :, :, :] 


        conv8_output = utils.conv_layer(conv7_out,conv8_weights_correct,b=conv8_bias_broadcast,stride=1)  
        conv8_out = conv8_output.ctypes.data_as(ctypes.POINTER(ctypes.c_float))


        #Draw detection boxes
        lib.forward_region_layer_pointer_nolayer(net_darknet,conv8_out)
        lib.draw_detection_python(net_darknet, file_name, tresh, tresh_hier,file_names_voc, darknet_path, file_name_out, file_name_probs);


        #Display result
        IPython.display.clear_output(1)
        file_content = open(file_name_probs.value,"r").read().splitlines()
        detections = []
        for line in file_content[0:]:
            name, probability = line.split(": ")
            detections.append((probability, name))
        for det in sorted(detections, key=lambda tup: tup[0], reverse=True):
            print("class: {}\tprobability: {}".format(det[1], det[0]))
        res = Image.open(file_name_out.value.decode() + ".png")
        display(res)
        
#         time.sleep(5)

4.3.3 Results

4.3.4 Comments

As can be seen from the video, the network performs really well!!

It mostly does a correct classification for all objects in the scene.

Only when there were several birds very close together, the network wrongly classified it as "aeroplane".

However, it is interesting to see that the detection and bounding box regression worked really well.

4.4 Custom Overlay - addmul

4.4.1 About the application

This is the best part, the part that interests me the most.

In this application, I followed some online resources to build a custom overlay that takes in two integers and returns their sum and product.

The logic itself is implemented in C++, on the FPGA fabric.

Pynq get's a handle to the overlay and I then use it to pass arguments and catch return values.

There is a nice surprise in this, read along!!

4.4.2 Code

The FPGA C++ code:

void addmul(int a, int b, int& sm, int& pr)
{
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE s_axilite port=a
#pragma HLS INTERFACE s_axilite port=b
#pragma HLS INTERFACE s_axilite port=sm
#pragma HLS INTERFACE s_axilite port=pr


sm = a + b;
}
pr = a * b;
}

The Block design:

The tcl and bit files can be found attached to try out.

The Python code for hardware acceleration:

from pynq import Overlay
import time
olay = Overlay('/home/xilinx/pynq/overlays/addmul/addmul_block.bit')
ip = olay.addmul_0

test_list1 = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23]
test_list2 = [23, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 10]


print("Adding and multiplying {0} sets of numbers in hardware ".format(len(test_list1)))


# t0 = time.clock()  # start time


# for i in range(len(test_list1)):
#     ip.write(0x10, test_list1[i])
#     ip.write(0x18, test_list2[i])
#     var1 = ip.read(0x20)
#     var2 = ip.read(0x28)
#     print("sum = {0}, prod = {1}".format(var1, var2))


# t1 = time.clock()


# print("HW acceleration took : {0} uS".format((t1 - t0) * 1000000))






# mimick 1-D convolution
t0 = time.clock()  # start time


for i in test_list1:
    for j in test_list2:
        
        ip.write(0x10, i)
        ip.write(0x18, j)
        var1 = ip.read(0x20)
        var2 = ip.read(0x28)
        print("sum = {0}, prod = {1}".format(var1, var2))


t1 = time.clock()


print("HW acceleration took : {0} uS".format((t1 - t0) * 1000000))

Python code for software:

test_list1 = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23]
test_list2 = [23, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 10]


print("Adding and multiplying {0} sets of numbers in software ".format(len(test_list1)))


# t0 = time.clock()  # start time


# for i in range(len(test_list1)):
#     var1 = test_list1[i] + test_list2[i]
#     var2 = test_list1[i] * test_list2[i]
#     print("sum = {0}, prod = {1}".format(var1, var2))


# t1 = time.clock()


# print("SW took : {0} uS".format((t1 - t0) * 1000000))




# mimick 1D conv


t0 = time.clock()  # start time


for i in test_list1:
    for j in test_list2:
        
        var1 = i + j
        var2 = i * j
        print("sum = {0}, prod = {1}".format(var1, var2))


t1 = time.clock()


print("SW took : {0} uS".format((t1 - t0) * 1000000))

4.4.3 Output

For the shown code, we see the following:

Whoa!!! Did you expect that? I did not, surely. Hardware acceleration actually took longer than the software.

Well, don't be disheartened. This is mainly due to the time lost in pushing data into the hardware, basically, memory read/write operations from the Python code space to the IP running on the hardware.

Let's try this:

test_list1 = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23]
test_list2 = [23, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 10]


print("Adding and multiplying {0} sets of numbers in hardware ".format(len(test_list1)))


t0 = time.clock()  # start time


for i in range(len(test_list1)):
    ip.write(0x10, test_list1[i])
    ip.write(0x18, test_list2[i])
    var1 = ip.read(0x20)
    var2 = ip.read(0x28)
    print("sum = {0}, prod = {1}".format(var1, var2))


t1 = time.clock()


print("HW acceleration took : {0} uS".format((t1 - t0) * 1000000))

test_list1 = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23]
test_list2 = [23, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 10]


print("Adding and multiplying {0} sets of numbers in software ".format(len(test_list1)))


t0 = time.clock()  # start time


for i in range(len(test_list1)):
    var1 = test_list1[i] + test_list2[i]
    var2 = test_list1[i] * test_list2[i]
    print("sum = {0}, prod = {1}".format(var1, var2))


t1 = time.clock()


print("SW took : {0} uS".format((t1 - t0) * 1000000))

And ....

hardware acceleration is the winner!!!

In this, I just reduced the number of memory operations by reducing the number of calls to the IP.

This is why, we generally want to pass all our data in a large chunk, like arrays or matrices, instead of one-by-one.

5. Overall assessment

5.1 Likes

Easy to implement simple logic in FPGA using the existing overlays.
Great tool to understand how hardware acceleration can benefit deep learning.
Provides opportunity to get a first feel of FPGA without really having to design HDL code.
Some nice examples to demonstrate pattern recognition, object detection and other hot topics in machine learning.
Can be used for other generic projects, essentially as a replacement for conventional MCUs and MPs, together with benefits of hardware accceleration.
The network file transfer option is a super like!!
Very easy to clone and update repositories.

5.2 Dislikes

Documentation is sketchy - needs to be put in a proper order and easy to follow pattern
Personally for me, the Programmable Logic (PL) part is of more interest than the general GPIO, soft processor and other peripherals. I found the documentation for PL insufficient. As a beginner, I would like a more in-depth example that shows the complete flow of operations from C/C++ code in Vivado to using the creating accelerator in Pynq, using some more elaborate examples.
There is a small adder example present, but it misses out on many parts.
Not enough demonstration of how to integrate Machine Learning (ML) frameworks like Caffe, Tensorflow, Keras into my overlay design.
I would like a way to write my code in some IDE like PyCharm, directly within Pynq, if this is possible. Jupyter notebooks, though intuitive, miss out on several cool features of a full IDE.

6. Conclusion

This was a wonderful experience for me, as my first road test review for a product.

The Pynq Z2 is truly an amazing low cost entry level hardware for student, hobbyists and experimenters who wish to leverage the hardware acceleration offered by FPGAs.

However, to be able to take advantage of all features, one needs to have an in-depth knowledge in Python, HDL design, Xilinx toolchain, Zynq architecture.

Since this is very hard to acquire all together, I feel it would be really useful if an exhaustive set of resources and tutorials were made available.

So, the only resources a developer needs to bring are time and patience.

Thank you Element14 for all the help and support and bearing with my often annoying and stupid questions and requests.

I have learnt quite a lot from this first review, most essentially, lot's of patience.

Though, I would have loved to do more with the Pynq, for now, I would stop here.

Looking forward to more such opportunities and great products to review.

7. References

Top Comments

sambit1991 over 5 years ago in reply to michaelkellett

Thanks MK.
I believe I would do better with more such experience.
- Cancel
- Up 0 Down
- Reply
- More
- Cancel
sambit1991 over 5 years ago in reply to beacon_dave

Thanks for going through the code section.
My idea in the first block was to actually make every element of list1, multiply and add with every single element of list 2 - sort of a conv operation (not exactly).
So, I expected it to run 144 times.
The surprising part was - both hardware and software loops run 144 times in the first case and software seems to have run faster.
Does this make sense? Or am I still missing something?
- Cancel
- Up 0 Down
- Reply
- More
- Cancel
beacon_dave over 5 years ago
FYI there appears to be an error in the code in section 4.4 where you are adding and multiplying the values in the two lists.

There is a nested for loop:
```
for i in test_list1:
     for j in test_list2:
```
which doesn't appear to make sense and appears to produce the wrong results in the first set of tests. The first test will loop 144 times instead of the anticipated 12 and the last 10 entries will have the following values for variables i and j:

i = 23, j = 20
i = 23, j = 19
i = 23, j = 18
i = 23, j = 17
i = 23, j = 16
i = 23, j = 15
i = 23, j = 14
i = 23, j = 13
i = 23, j = 12
i = 23, j = 10
resulting in the errors shown in the test results.

In the second set of tests this nested for loop has been replaced with :
```
for i in range(len(test_list1));
```
- Cancel
- Up +1 Down
- Reply
- More
- Cancel
michaelkellett over 5 years ago

Useful review - thank you.

I think your scoring was generous, but the text of the review tells a lot more.

MK
- Cancel
- Up +1 Down
- Reply
- More
- Cancel

PYNQ-Z2 Dev Board: Python Productivity for Zynq® - Review

Table of contents

1. Introduction

2. Unboxing

3. Method of review

4. Applications

4.1 CIFAR 10 - Automobile detection

4.1.1 About the application

4.1.2 Code

4.1.3 Results

4.1.4 Comments

4.2 MNIST - Handwritten digit recognition

4.2.1 About the application

4.2.2 Code

4.2.3 Results

4.2.4 Comments

4.3 Tiny YOLO - Object detection

4.3.1 About the application

4.3.2 Code

4.3.3 Results

4.3.4 Comments

4.4 Custom Overlay - addmul

4.4.1 About the application

4.4.2 Code

4.4.3 Output

5. Overall assessment

5.1 Likes

5.2 Dislikes

6. Conclusion

7. References

Top Comments