Finale Blog - Monocular Visual SLAM on Zynq 7000 APSoC

venkat01
18 Nov 2024

This blog is the finale of the series listed below:

Blog #1 Introduction

Blog #2 Image Processing on Zynq 7000 APSoC

Blog #3 PMOD ALS

Blog #4 SLAM

Blog #5 CNNs on Zynq 7000 using Tensil.ai

In our first installment, we laid the groundwork with an introduction to the challenger's kit, setting the stage for what was to come. We then delved into the fascinating world of image processing, uncovering the capabilities of this SoC in handling complex visual data. Our discussions on PMOD ALS illuminated how we can integrate various sensors to enhance our projects, while the exploration of SLAM techniques showcased the potential for real-time spatial awareness. Finally, we ventured into the realm of Convolutional Neural Networks (CNNs) on Zynq 7000 using Tensil.ai, demonstrating how deep learning can be effectively deployed on this platform.


As we conclude this series, we’ll reflect on the key takeaways and insights gained throughout our journey. Join us as we summarize the highlights and discuss the future possibilities that lie ahead.

Deploying the ML Model onto the FPGA

For this we are going to use PYNQ v3.0.1, and for this version of PYNQ we need Vivado 2022.1 to generate the overlays for ML acceleration on the PL side. The diagram below shows the typical design flow.

[Image: typical design flow]

First, we need to get the Tensil toolchain. The easiest way is to pull the Tensil docker container from Docker Hub.

[Image: pulling the Tensil Docker container]

Then run the container.

[Image: running the Tensil Docker container]
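For reference, the pull-and-run steps shown above boil down to two commands. This is a sketch based on the public Tensil tutorials, so treat the image name (tensilai/tensil) and the /work mount point as assumptions rather than a transcription of the screenshots:

# Pull the Tensil toolchain container from Docker Hub
docker pull tensilai/tensil

# Start an interactive shell in the container, mounting the current
# directory so that generated files stay visible on the host
docker run -v $(pwd):/work -w /work -it tensilai/tensil bash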

Tensil’s strength is customizability, making it suitable for a very wide range of applications. The Tensil architecture definition file (.tarch) specifies the parameters of the architecture to be implemented. These parameters are what make Tensil flexible enough to work for small embedded FPGAs as well as large data-center FPGAs. Here we want to select parameters that provide the highest utilization of resources on the XC7Z020 FPGA part at the core of the Arty Z7-20 board.

The architecture file for the Arty Z7-20 is as follows:

[Image: Tensil architecture definition (.tarch) for the Arty Z7-20]
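If you prefer to start from a known-good baseline rather than writing the file from scratch, the toolchain container ships a set of stock architecture definitions. The path below, and the suggestion to reuse pynqz1.tarch (the PYNQ-Z1 carries the same XC7Z020 device as the Arty Z7-20), are assumptions drawn from the Tensil tutorials, not from the screenshot above:

# List the stock architecture definitions bundled with the container
ls /demo/arch/

# The PYNQ-Z1 definition targets the same XC7Z020 device, so it is a
# reasonable starting point to copy and tune for the Arty Z7-20
cat /demo/arch/pynqz1.tarch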

The first, data_type, defines the data type used throughout the Tensor Compute Unit (TCU), including in the systolic array, SIMD ALUs, accumulators, and local memory. We will use 16-bit fixed-point with an 8-bit base point (FP16BP8), which in most cases allows simple rounding of 32-bit floating-point models without the need for quantization. Next, array_size defines a systolic array size of 8x8, which results in 64 parallel multiply-accumulate (MAC) units. This number was chosen to balance the utilization of DSP units available on XC7Z020 in case you needed to use some DSPs for another application in parallel, but you could increase it for higher performance of the TCU.

With dram0_depth and dram1_depth, we define the size of DRAM0 and DRAM1 memory buffers on the host side. These buffers feed the TCU with the model’s weights and inputs, and also store intermediate results and outputs. Note that these memory sizes are in number of vectors, which means array size (8) multiplied by data type size (16-bits) for a total of 128 bits per vector.

Next, we define the size of the local and accumulator memories which will be implemented on the FPGA fabric itself. The difference between the accumulators and the local memory is that accumulators can perform a write-accumulate operation in which the input is added to the data already stored, as opposed to simply overwriting it. The total size of accumulators plus local memory is again selected to balance the utilization of BRAM resources on XC7Z020 in case resources are needed elsewhere.
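As a rough sanity check on those sizes, take the baseline depths of 8192 local and 2048 accumulator vectors: with an 8-wide array of 16-bit elements each vector is 8 × 16 = 128 bits, so the local memory occupies about 8192 × 128 bits ≈ 1 Mbit and the accumulators about 2048 × 128 bits ≈ 256 Kbit, which together sit comfortably within the roughly 4.9 Mbit of block RAM available on the XC7Z020.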

With simd_registers_depth, we specify the number of registers included in each SIMD ALU, which can perform SIMD operations on stored vectors used for ML operations like ReLU activation. Increasing this number is only needed rarely, to help compute special activation functions. Finally, stride0_depth and stride1_depth specify the number of bits to use for enabling “strided” memory reads and writes.

Here's the updated file. Compared with the baseline, array_size is raised to 12, the DRAM buffer depths are increased to 2097152 to accommodate more data, and the local and accumulator depths are doubled from 8192 and 2048 to 16384 and 4096:

{
    "data_type": "FP16BP8",
    "array_size": 12,
    "dram0_depth": 2097152,
    "dram1_depth": 2097152,
    "local_depth": 16384,
    "accumulator_depth": 4096,
    "simd_registers_depth": 1,
    "stride0_depth": 8,
    "stride1_depth": 8,
    "number_of_threads": 1,
    "thread_queue_depth": 8
}

[Image: RTL generation for the updated architecture]
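With the architecture file in hand, the RTL generation step is a single command inside the container. This is a sketch: arty_custom.tarch is a placeholder name for wherever you saved the updated definition, and the exact flags may vary between Tensil versions:

# Generate the Verilog sources for the customised architecture,
# printing a summary of the resulting design
tensil rtl -a arty_custom.tarch -s true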

Then import the generated files to create a block design in Vivado 2022.1.

[Image: Vivado 2022.1 block design]

Then compile the model and, once the required hardware files and bitstream have been generated, move everything onto the PYNQ board.
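As a sketch of what that looks like in practice (the ONNX file name, output node, bitstream name, and board address below are illustrative placeholders, not details from the original post):

# Compile the model for the customised architecture; this produces
# .tmodel, .tprog and .tdata artifacts for the Tensil driver
tensil compile -a arty_custom.tarch -m model.onnx -o "Identity:0" -s true

# Copy the compiled model artifacts together with the bitstream and
# hardware handoff file from Vivado over to the board
scp *.tmodel *.tprog *.tdata design.bit design.hwh xilinx@pynq:~/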

Observations and Testing

Compiling the drivers for the Wireless USB Adapter

I had planned to get the wireless USB adapter working on the Arty Z7's PYNQ image (so that the board could connect to the network without an Ethernet cable), but the drivers could not be compiled due to missing header files and some compiler issues.

[Image: driver compilation errors]

Monocular vSLAM Inference

After deploying the overlay onto the PYNQ image, the frame below was obtained from the USB web camera video feed.

[Image: frame captured from the USB web camera during monocular vSLAM inference]
