PYNQ-Z2 Dev Kit - Tiny-YOLO Object Detection

ralphjy
18 Aug 2019

The next neural network that I'm going to try is a variant of Tiny-YOLO.  The You Only Look Once (YOLO) architecture was developed to create a one-step process for detection and classification.  The image is divided into a fixed grid of uniform cells, and bounding boxes are predicted and classified within each cell.  This architecture enables faster object detection and has been applied to streaming video.
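
As a rough illustration of the grid idea (not the exact Tiny-YOLO decoding, whose anchor boxes and activations are defined in the Darknet config), here is how a single cell's prediction would map back to pixel coordinates; all of the numbers are made up:

S, W, H = 13, 416, 416                 # grid size and network input resolution
row, col = 6, 4                        # hypothetical cell holding a detection
tx, ty, tw, th = 0.5, 0.5, 0.3, 0.4    # hypothetical (already-activated) outputs

cell_w, cell_h = W / S, H / S
cx = (col + tx) * cell_w               # box center, in pixels
cy = (row + ty) * cell_h
bw, bh = tw * W, th * H                # box size, in pixels
print("center=({:.0f},{:.0f}) size={:.0f}x{:.0f}".format(cx, cy, bw, bh))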

 

The network topology is shown below.  The pink colored layers have been quantized with 1-bit weights and 3-bit activations and will be executed in the HW accelerator, while the other layers are executed in Python.
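
For intuition, a uniform quantizer along the lines of the utils.quantize(x/4, 3) call used later in this notebook can be sketched as below; this is a generic illustration, not the qnn library's actual implementation:

import numpy as np

def quantize(x, bits):
    # Snap values in [0, 1] onto 2**bits evenly spaced levels (illustrative only)
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

# 3-bit activations: clip to [0, 4], rescale to [0, 1], snap to 8 levels
act = np.array([0.1, 1.7, 3.2, 5.0])
print(quantize(act.clip(0.0, 4.0) / 4, 3))   # approx. [0. 0.429 0.857 1.]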

 

The image processing is performed within Darknet using Python bindings.

 

 

 

The neural network has been trained on the PASCAL VOC (Visual Object Classes) dataset and is able to identify 20 classes of objects in 4 categories:

  1. Person: person
  2. Animal: bird, cat, cow, dog, horse, sheep
  3. Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
  4. Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

 

The steps for detection and classification are similar to those for the previous network, as this network also uses the multi-layer offload architecture.

 

Initialize the network

  1. Import libraries
  2. Instantiate classifier
  3. Perform other initializations in the Darknet framework

 

Code for initialization:

import sys
import os, platform
import json
import random
import numpy as np
import cv2
import ctypes
from ctypes import c_char_p, c_float  # used directly below; darknet's star import also provides these

from PIL import Image
from datetime import datetime

import qnn
from qnn import TinierYolo
from qnn import utils 
sys.path.append("/opt/darknet/python/")
from darknet import *

from matplotlib import pyplot as plt
%matplotlib inline

classifier = TinierYolo()
classifier.init_accelerator()
net = classifier.load_network(json_layer="/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-layers.json")

# The first (conv0) and last (conv8) layers run in software, so load their
# floating-point weights and biases, reorder the weight axes for the SW conv
# routine, and broadcast the biases to the layer output shapes.
conv0_weights = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv0-W.npy', encoding="latin1")
conv0_weights_correct = np.transpose(conv0_weights, axes=(3, 2, 1, 0))
conv8_weights = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv8-W.npy', encoding="latin1")
conv8_weights_correct = np.transpose(conv8_weights, axes=(3, 2, 1, 0))
conv0_bias = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv0-bias.npy', encoding="latin1")
conv0_bias_broadcast = np.broadcast_to(conv0_bias[:,np.newaxis], (net['conv1']['input'][0],net['conv1']['input'][1]*net['conv1']['input'][1]))
conv8_bias = np.load('/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-conv8-bias.npy', encoding="latin1")
conv8_bias_broadcast = np.broadcast_to(conv8_bias[:,np.newaxis], (125,13*13))

# Parse the Darknet network description (used later for region-layer postprocessing)
file_name_cfg = c_char_p("/usr/local/lib/python3.6/dist-packages/qnn/params/tinier-yolo-bwn-3bit-relu-nomaxpool.cfg".encode())

net_darknet = lib.parse_network_cfg(file_name_cfg)

 

 

Classify image

  1. Open image to be classified
  2. Execute the first convolutional layer in Python
  3. Compute HW Offload of the quantized layers
  4. Normalize using fully connected layers in Python

 

Code for classification:

# Pick a random sample image from the provided folder
img_folder = './yoloimages/'
img_file = os.path.join(img_folder, random.choice(os.listdir(img_folder)))
file_name = c_char_p(img_file.encode())

# Load the image in Darknet, letterbox it to the 416x416 network input,
# and copy it into a numpy array with channels last
img = load_image(file_name,0,0)
img_letterbox = letterbox_image(img,416,416)
img_copy = np.copy(np.ctypeslib.as_array(img_letterbox.data, (3,416,416)))
img_copy = np.swapaxes(img_copy, 0,2)
free_image(img)
free_image(img_letterbox)

im = Image.open(img_file)
im   # display the input image in the notebook

start = datetime.now()
img_copy = img_copy[np.newaxis, :, :, :]
    
conv0_output = utils.conv_layer(img_copy,conv0_weights_correct,b=conv0_bias_broadcast,stride=2,padding=1)
conv0_output_quant = conv0_output.clip(0.0,4.0)
conv0_output_quant = utils.quantize(conv0_output_quant/4,3)
end = datetime.now()
micros = int((end - start).total_seconds() * 1000000)
print("First layer SW implementation took {} microseconds".format(micros))
print(micros, file=open('timestamp.txt', 'w'))

# Shape of the accelerator (conv7) output: spatial dimension and channels
out_dim = net['conv7']['output'][1]
out_ch = net['conv7']['output'][0]

conv_output = classifier.get_accel_buffer(out_ch, out_dim)
conv_input = classifier.prepare_buffer(conv0_output_quant*7)

start = datetime.now()
classifier.inference(conv_input, conv_output)
end = datetime.now()

conv7_out = classifier.postprocess_buffer(conv_output)

micros = int((end - start).total_seconds() * 1000000)
print("HW implementation took {} microseconds".format(micros))
print(micros, file=open('timestamp.txt', 'a'))

start = datetime.now()
conv7_out_reshaped = conv7_out.reshape(out_dim,out_dim,out_ch)
conv7_out_swapped = np.swapaxes(conv7_out_reshaped, 0, 1) # exp 1
conv7_out_swapped = conv7_out_swapped[np.newaxis, :, :, :] 

conv8_output = utils.conv_layer(conv7_out_swapped,conv8_weights_correct,b=conv8_bias_broadcast,stride=1)  
conv8_out = conv8_output.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

end = datetime.now()
micros = int((end - start).total_seconds() * 1000000)
print("Last layer SW implementation took {} microseconds".format(micros))
print(micros, file=open('timestamp.txt', 'a'))

 

 

Draw detection boxes using Darknet

   The image postprocessing (drawing the bounding boxes) is performed in Darknet using Python bindings.

 

Code for image postprocessing:

# Run Darknet's region layer on the conv8 output, then draw the detection boxes
lib.forward_region_layer_pointer_nolayer(net_darknet,conv8_out)
thresh = c_float(0.3)
thresh_hier = c_float(0.5)
file_name_out = c_char_p("/home/xilinx/jupyter_notebooks/qnn/detection".encode())
file_name_probs = c_char_p("/home/xilinx/jupyter_notebooks/qnn/probabilities.txt".encode())
file_names_voc = c_char_p("/opt/darknet/data/voc.names".encode())
darknet_path = c_char_p("/opt/darknet/".encode())
lib.draw_detection_python(net_darknet, file_name, thresh, thresh_hier, file_names_voc, darknet_path, file_name_out, file_name_probs)

# Print detections sorted by probability, parsing the "NN%" strings numerically
# (a plain string sort would put e.g. "9%" above "84%")
file_content = open(file_name_probs.value,"r").read().splitlines()
detections = []
for line in file_content:
    name, probability = line.split(": ")
    detections.append((probability, name))
for det in sorted(detections, key=lambda tup: int(tup[0].rstrip('%')), reverse=True):
    print("class: {}\tprobability: {}".format(det[1], det[0]))

 

 

Sample image (horses)

The first image that I'm going to use is a provided sample image of horses (773 x 512 pixels).

 

Execution time:

  • First layer SW implementation took 594523 microseconds
  • HW implementation took 593735 microseconds
  • Last layer SW implementation took 68420 microseconds
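
Summing the three stages: 594,523 + 593,735 + 68,420 µs ≈ 1.26 s per image, or roughly 0.8 frames per second end to end.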

 

Classification:

class: cow probability: 84%

class: horse probability: 74%

class: horse probability: 68%

 

Object detection bounding boxes:

The example shows the issues that occur with multiple overlapping objects.

 

 

IP camera images

The application that I would like to use neural networks for is object identification in video streams from surveillance cameras.  As an example, I have a PTZ IP camera at the front of my house that is primarily used to alert me to deliveries (mail, Amazon, UPS, etc.).  It is normally pointed at the driveway and mailbox, but the pan/tilt capability allows me to look up and down the street and also at my front door (270 degrees of coverage).  Currently, image motion detection and PIR sensing tell me when something is detected, but I need to look at the camera video to determine if it is something of interest.  And needless to say, there are a lot of false detections.  I have two video sources that I'd like to analyze: the live feed from the camera and stored video from a network video recorder (NVR).  I have multiple cameras, but I think it would be okay to require that each camera have dedicated processing hardware.

 

The PYNQ notebook examples that I've found use either the HDMI input or a webcam as a streaming video source.  For my application I need the ability to process an RTSP (Real Time Streaming Protocol) stream over Ethernet.  I had hoped that I could just use the VideoCapture function in OpenCV, but I can't seem to get that to work.  I'm sure that I'll be able to get something working eventually, but for the purposes of this roadtest I'm just going to use static images from the camera (actually from the NVR).  I currently stream 2 resolutions from this particular camera (1280x720 and 640x480).  I'd like to use the lower resolution stream for processing if it doesn't degrade the accuracy too much.  I'm going to test that with the image captures from the NVR (the lower resolution captures from the NVR are actually only 320x176, to allow for faster searching).  It turns out that because the detection grid is a fixed ratio of the image size, the large and small images have about the same execution time.
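
For reference, below is the kind of OpenCV capture I had hoped would work; whether it does depends on the PYNQ image's OpenCV build having FFmpeg/GStreamer support, and the URL is just a placeholder:

import cv2

# Hypothetical RTSP URL -- substitute your camera's address and stream path
url = "rtsp://user:password@192.168.1.10:554/stream2"

cap = cv2.VideoCapture(url)
if not cap.isOpened():
    raise RuntimeError("Could not open RTSP stream (OpenCV needs FFmpeg/GStreamer support)")

ok, frame = cap.read()    # grab a single frame to feed the classifier
cap.release()
if ok:
    cv2.imwrite("capture.jpg", frame)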

 

 

Night image (1280x720)

class: car probability: 30%

The car to the right is not detected.

 

Day image (1280x720)

class: car probability: 86%

class: car probability: 34%

Multiple bounding boxes for the same object

 

Truck image (320x176)

class: car probability: 79%    -- no separate class for truck

 

Truck image (1280x720)

class: car probability: 96%  -- improved classification with larger image size (better resolution?)

 

Different truck (320x176)

class: car probability: 63%

 

Multiple cars (320x176)

class: car probability: 60%

class: car probability: 47%

class: car probability: 33%

 

It did okay with the shadows.

 

Me (320x176)

class: person probability: 42%

 

Multiple objects (1280x720)

class: car probability: 79%

class: car probability: 75%

class: person probability: 51%

 

It seems to have a harder time with people.

 

Amazon and Mail trucks (1280x720)

class: car probability: 99%

class: car probability: 35%

 

 

Conclusion

   So, I've got a few challenges ahead of me after this roadtest.

  1. Figure out how to capture the RTSP stream (BTW, I do this successfully with a Raspberry Pi)
  2. Quantify usable frame rate (currently taking over a second to execute)
  3. Figure out how to train with something that allows me to differentiate vehicles

Comments

  • nctiglao 6 months ago in reply to weiwei2

    Can someone share their work around getting PYNQ-Z2 to access RTSP streams?

  • weiwei2 over 2 years ago

    RTSP is indeed challenging ... even now, trying to use RTSP from Windows .NET Core software is also challenging ... but it would be good to know if someone achieves it some day and provides an example.

  • DAB over 2 years ago

    Interesting results.

     

    DAB

  • genebren over 2 years ago in reply to ralphjy

    Ralph,

     

    I am not sure how much information you get from your classifier, but some additional testing might be able to figure out how to sub-classify objects.  If you have a container of the object, you might be able to run a color histogram over the region and use color to sub-classify (big brown car = UPS truck).  Size of the container might allow you to distinguish truck from car.  There are a lot of ways to go.

     

    I wrote (from scratch) a program to count cells from a microscope image.  There were a lot of steps to normalizing and correcting the image (poor illumination and lack of focus in outer regions of the image), but it eventually was successful.  Object classification occurred after segmentation, and the objects were sorted based on size (number of pixels).  A histogram of sizes was used to determine the most popular size range (to determine the estimated object size).  Smaller objects were ignored and larger objects were subjected to various attempts to de-cluster groups of cells into individuals (circle fitting, re-thresholding and segmenting, etc.).  The resulting code was pretty fast and fairly accurate.  It would have been interesting to have been able to try this in hardware, but that was not one of the options, as the hardware was already designed when I jumped in to improve/re-write the processing.

     

    Good luck,

    Gene

  • ralphjy over 2 years ago in reply to genebren

    Hi Gene,

     

    The classifier was trained with the PASCAL Visual Object Classes which only has the following vehicle classes: airplane, bicycle, boat, bus, car, motorbike, train.

     

    I also thought it was odd at first that it couldn't differentiate a truck.  Of course, I want to differentiate the UPS truck vs. Amazon vs. the mail truck.  Since I don't have such a general-purpose use case, I'm thinking maybe I can figure out how to train with a small subset of vehicles.

     

    Ralph
