Blog #5: CNN HW Accelerator for Handwriting Recognition - Integrating the HW Accelerator as a PYNQ Overlay

yesha98
17 Nov 2024

The PYNQ (Python Productivity for Zynq) Framework

image

PYNQ overlays use a combination of .bit, .hwh, and optionally .tcl files to reconfigure the programmable logic (PL) on a Xilinx Zynq SoC and provide a seamless interface for Python-based interaction with the hardware. Here's a detailed breakdown of how each file plays a role:


1. .bit File: The Bitstream

  • Purpose:
    The .bit file contains the binary configuration data for programming the FPGA (PL). It is generated during the hardware design flow in Vivado and defines the placement and routing of logic elements, interconnects, and other resources in the PL.
  • Role in PYNQ:
    When an overlay is loaded in PYNQ, the .bit file is used to configure the FPGA hardware design into the PL:
    from pynq import Overlay
    overlay = Overlay("/path/to/your/overlay.bit")
    • This step reconfigures the PL to match the desired hardware architecture.
    • The .bit file defines the physical behavior of the hardware but does not expose its structure to Python directly.

2. .hwh File: Hardware Handoff

  • Purpose:
    The .hwh file is an XML metadata file generated alongside the .bit file by Vivado. It describes the hardware design's internal structure, including:
    • AXI interfaces.
    • Address maps.
    • Clock connections.
    • Configuration registers of hardware IPs.
  • Role in PYNQ:
    • PYNQ uses the .hwh file to parse the hardware's architecture and automatically map the hardware IPs to Python objects.
    • For example, when you access overlay.axi_dma, PYNQ knows how to interact with the AXI DMA IP block because the .hwh file provides its address map and interface details.
    • Without the .hwh file, you would need to manually provide these details, making the design less user-friendly.
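
To make the idea of "address map metadata" concrete, the sketch below parses a tiny XML fragment and builds a name-to-address table. The fragment is a hypothetical, heavily trimmed stand-in for a real .hwh file (which contains far more elements), and the instance names and addresses are illustrative only. The output keys phys_addr and addr_range mirror the ones PYNQ exposes in overlay.ip_dict.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a Vivado .hwh file.
# Instance names and addresses are illustrative, not taken from the real design.
SAMPLE_HWH = """
<EDKSYSTEM>
  <MODULES>
    <MODULE INSTANCE="processing_system7_0" MODTYPE="processing_system7">
      <MEMORYMAP>
        <MEMRANGE INSTANCE="axi_dma_0" BASEVALUE="0x40400000" HIGHVALUE="0x4040FFFF"/>
        <MEMRANGE INSTANCE="cnn_0" BASEVALUE="0x43C00000" HIGHVALUE="0x43C0FFFF"/>
      </MEMORYMAP>
    </MODULE>
    <MODULE INSTANCE="axi_dma_0" MODTYPE="axi_dma"/>
    <MODULE INSTANCE="cnn_0" MODTYPE="cnn"/>
  </MODULES>
</EDKSYSTEM>
"""


def build_address_map(hwh_text):
    """Map each addressable IP instance to its base address and region size."""
    root = ET.fromstring(hwh_text)
    address_map = {}
    for mem in root.iter("MEMRANGE"):
        base = int(mem.get("BASEVALUE"), 16)
        high = int(mem.get("HIGHVALUE"), 16)
        address_map[mem.get("INSTANCE")] = {
            "phys_addr": base,
            "addr_range": high - base + 1,  # HIGHVALUE is inclusive
        }
    return address_map


address_map = build_address_map(SAMPLE_HWH)
for name, info in address_map.items():
    print(f"{name}: base=0x{info['phys_addr']:08X}, size=0x{info['addr_range']:X}")
```

On the board, the equivalent information is available without any manual parsing by inspecting overlay.ip_dict after the overlay has been loaded.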

3. .tcl File: Tcl Script (Optional)

  • Purpose:
    The .tcl file is a Vivado script that describes how the hardware design was constructed. It includes:
    • The IP block configurations.
    • Connections and parameters.
    • Design constraints.
  • Role in PYNQ:
    • The .tcl file is optional and not directly used by PYNQ at runtime. However, it is helpful during the development or debugging process for regenerating or modifying the Vivado project.
    • It can also be used for advanced workflows, such as modifying an overlay dynamically or automating FPGA design customization.

How These Files Work Together:

  1. Development in Vivado:

    • You design the hardware, integrate IP cores, configure interfaces, and generate the .bit and .hwh files.
    • Optionally, you generate the .tcl file for reproducibility or further customization.
  2. Overlay Creation:

    • The .bit and .hwh files are packaged together as an overlay for use in PYNQ.
  3. Loading the Overlay:

    • The .bit file configures the PL with the desired design.
    • The .hwh file allows PYNQ to understand the design, automatically map hardware IPs, and provide a Python API for interacting with them.
  4. Python Integration:

    • PYNQ abstracts the hardware details using the .hwh metadata. Users can control the hardware from Python without needing low-level details:
      dma = overlay.axi_dma                    # Access the AXI DMA block
      dma.sendchannel.transfer(input_buffer)   # Start sending a pynq.allocate buffer through DMA
    • The .hwh file enables these high-level Python APIs to interact seamlessly with the hardware.

Why is This Powerful?

  • Dynamic Reconfiguration:
    Overlays can be swapped dynamically by loading different .bit files during runtime, making PYNQ versatile for multiple applications without rebooting.

  • Ease of Use:
    The .hwh file removes the need for manual address mapping and register configuration, providing Python libraries that abstract complex FPGA interactions.

  • Rapid Prototyping:
    By combining these files with Python, PYNQ enables hardware acceleration for complex applications with minimal overhead in software and hardware integration.

This approach streamlines FPGA development, making it accessible even for software developers unfamiliar with low-level FPGA programming.

The bitstream, the hardware handoff file, and the Tcl file for the block design are placed in a newly created directory inside the overlays directory.

image

How is the overlay programmed?

  1. Communication with FPGA:

    • The Zynq SoC includes an ARM processor (PS) that communicates with the FPGA fabric (PL) via a programming interface.
    • The PYNQ framework uses the Xilinx FPGA Manager (or a similar driver) to send the .bit file from the Linux filesystem to the FPGA's configuration memory.

  2. Configuration Memory Programming:

    • The .bit file is streamed to the FPGA configuration memory through the Processor Configuration Access Port (PCAP).
    • This process clears the existing PL configuration and replaces it with the new hardware design.

  3. Verification and Completion:

    • After programming, the FPGA reports a "done" signal to confirm successful configuration.
    • If an .hwh file is present, it is parsed to map the hardware design's interfaces to Python objects.

The process is dynamic, meaning overlays can be swapped at runtime without rebooting the system.
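
At the Linux level, the download step amounts to handing a file name to the FPGA Manager through sysfs. The sketch below is a minimal illustration of that hand-off, not PYNQ's own implementation. It assumes the flags and firmware sysfs attributes exposed by the Xilinx kernel used on PYNQ images, and it assumes the bitstream has already been converted to raw .bin form. The function name program_pl and its default paths are hypothetical.

```python
import shutil
from pathlib import Path


def program_pl(bin_file, firmware_dir="/lib/firmware",
               manager_dir="/sys/class/fpga_manager/fpga0"):
    """Ask the FPGA Manager to load a raw .bin bitstream (requires root on a board)."""
    bin_file = Path(bin_file)
    # The FPGA Manager looks up firmware by file name under the firmware directory.
    shutil.copy(bin_file, Path(firmware_dir) / bin_file.name)
    # flags = 0 requests a full (not partial) reconfiguration.
    (Path(manager_dir) / "flags").write_text("0")
    # Writing the file name triggers the kernel driver to program the PL.
    (Path(manager_dir) / "firmware").write_text(bin_file.name)
    return bin_file.name
```

In normal use there is no need to call anything like this directly, because Overlay("cnn.bit") performs the download. The sketch only shows what "sending the bitstream to the FPGA Manager" means in practice.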

The snippet below programs the custom overlay into the PL of the Zynq.

1. Loading the FPGA Bitstream

from pynq import Overlay

overlay = Overlay("cnn.bit")
print("Overlay successfully loaded!")
  • Overlay("cnn.bit"): Loads the specified bitstream file (cnn.bit) onto the FPGA. This file contains the hardware design for your CNN accelerator.
  • overlay: Represents the loaded bitstream and provides access to its components (DMA engines, MMIO registers, etc.).
  • The print statement confirms that the overlay was loaded successfully.

2. Checking Overlay Status

if overlay.is_loaded():
    print("Overlay is active.")
else:
    print("Overlay failed to load.")
  • overlay.is_loaded(): Checks whether the overlay's bitstream is the one currently loaded on the FPGA. is_loaded is a method, so it must be called; a bare overlay.is_loaded is a bound method object, which is always truthy.
  • If the bitstream is loaded and active, it prints confirmation. Otherwise, it indicates failure.

3. Error Handling

# These clauses close the try block that wraps steps 1 and 2.
except FileNotFoundError:
    print("Bitstream file not found.")
except Exception as e:
    print(f"Error loading overlay: {e}")
  • FileNotFoundError: Triggers if the specified bitstream file (cnn.bit) is not found in the working directory or specified path.
  • Exception: Catches any other errors that may occur during the bitstream loading process and prints the error details.
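
The three fragments above belong to a single try/except block. One way to assemble them is sketched below as a small helper. The function name load_overlay and the overlay_cls parameter are illustrative additions; the parameter exists only so the logic can be exercised off the board with a stand-in class. is_loaded is called as a method here, which is how the PYNQ Overlay class defines it.

```python
def load_overlay(bit_path, overlay_cls=None):
    """Load an overlay and return it, or return None if loading fails."""
    if overlay_cls is None:
        # Imported lazily so the helper can be tested without a PYNQ board.
        from pynq import Overlay as overlay_cls
    try:
        overlay = overlay_cls(bit_path)
        print("Overlay successfully loaded!")
    except FileNotFoundError:
        print("Bitstream file not found.")
        return None
    except Exception as e:
        print(f"Error loading overlay: {e}")
        return None

    if overlay.is_loaded():
        print("Overlay is active.")
        return overlay
    print("Overlay failed to load.")
    return None
```

On the board this would be used as overlay = load_overlay("cnn.bit"), followed by a check that the result is not None.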

image


Next, the same array that was used in the Vitis IDE/Xilinx SDK to test the CNN HW accelerator is added as a Python list.

image

Using DMA to transfer the data into our custom IP

image

1. Input Buffer Allocation

from pynq import allocate
import numpy as np

input_buffer = allocate(shape=sampleImage.shape, dtype=np.uint8)
np.copyto(input_buffer, sampleImage)
  • allocate: Allocates a physically contiguous buffer in memory to be used for DMA transfer. This is critical for FPGA-to-CPU communication, as DMA requires contiguous memory regions.
  • np.copyto: Copies the data from the preprocessed image (sampleImage) into the allocated buffer.
  • The buffer's shape and data type (uint8) match the input data.

Note: sampleImage should already be resized and preprocessed (e.g., the 28x28 MNIST image), and it must be a NumPy array rather than a plain list, because the code reads sampleImage.shape.


2. DMA Transfer to the Accelerator

dma = overlay.axi_dma_0
dma.sendchannel.transfer(input_buffer)
dma.sendchannel.wait()
print("DMA transfer complete.")
  • overlay.axi_dma_0: Refers to the DMA instance configured in the FPGA bitstream (cnn.bit).
  • dma.sendchannel.transfer(input_buffer): Initiates the transfer of data from the allocated input buffer to the hardware accelerator via the DMA engine.
  • dma.sendchannel.wait(): Waits for the DMA transfer to complete before proceeding.

3. Freeing the Input Buffer

input_buffer.freebuffer()
  • After the transfer is complete, the input buffer is freed to release the allocated memory.

4. Output Data Retrieval

import time

time.sleep(1)  # Small delay to ensure processing completion
offset = 0x8
value = mmio.read(offset)
print("Recognized Digit:", value)
  • time.sleep(1): Introduces a delay to allow the accelerator sufficient time to process the input data.
  • offset = 0x8: Specifies the register offset for reading the output of the accelerator. This value depends on your hardware's register map.
  • mmio.read(offset): Reads the result (recognized digit) from the specified register. mmio is a pynq.MMIO instance, created as described in the next section.
  • value: Contains the recognized digit output by the accelerator.
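
Steps 1 to 4 can be gathered into one function, sketched below under a few assumptions. The name classify and its parameters are hypothetical, the image is assumed to be a flattened 1-D sequence of uint8 pixel values, and allocate_fn stands for pynq.allocate (passed in so the flow can be checked with stand-in objects off the board). The try/finally is an addition that frees the buffer even if the transfer raises.

```python
import time


def classify(dma, mmio, image, allocate_fn, result_offset=0x8, settle_s=1.0):
    """Stream one flattened image to the accelerator and return the register value."""
    # "u1" is the NumPy dtype string for uint8.
    input_buffer = allocate_fn(shape=(len(image),), dtype="u1")
    try:
        input_buffer[:] = image                 # copy pixels into the contiguous buffer
        dma.sendchannel.transfer(input_buffer)  # start the memory-to-stream transfer
        dma.sendchannel.wait()                  # block until the transfer has finished
    finally:
        input_buffer.freebuffer()               # release the buffer even on failure
    time.sleep(settle_s)                        # fixed delay, as in step 4 above
    return mmio.read(result_offset)


# On the board (not executed here), assuming sampleImage is a NumPy array:
#   from pynq import allocate
#   digit = classify(overlay.axi_dma_0, mmio, sampleImage.flatten(), allocate)
```

The result offset defaults to 0x8 only because that is the value used in this post; it depends on the register map of the custom IP.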

image

Read AXI MMRs in PYNQ

image

To read an MMIO (Memory-Mapped I/O) register in PYNQ, you can use the pynq.MMIO class. Here's a step-by-step guide:


1. Import the Required Module

from pynq import MMIO

2. Identify the MMIO Address and Size

  • You need the base address and the size of the MMIO region. These are typically provided in the hardware specification (e.g., from a block design in Vivado).

3. Create an MMIO Instance

Initialize an MMIO object with the base address and size:

mmio = MMIO(base_addr, size)
  • Replace base_addr with the starting physical address of the MMIO region.
  • Replace size with the size of the MMIO region in bytes.

4. Read from an MMIO Register

To read a value from a specific offset within the MMIO region:

value = mmio.read(offset)
  • Replace offset with the register's offset (in bytes) from the base address.
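
Instead of copying the base address by hand from Vivado, it can be looked up in overlay.ip_dict, which PYNQ fills in from the .hwh file. The sketch below assumes the dictionary entries carry the phys_addr and addr_range keys. The helper names and the mmio_cls parameter are illustrative, and the IP name passed in must match the instance name in your block design.

```python
def open_ip_registers(ip_dict, ip_name, mmio_cls=None):
    """Create an MMIO window covering the register space of one IP block."""
    if mmio_cls is None:
        # Imported lazily so the helper can be tested without a PYNQ board.
        from pynq import MMIO as mmio_cls
    entry = ip_dict[ip_name]
    return mmio_cls(entry["phys_addr"], entry["addr_range"])


def read_register(mmio, offset):
    """Read one 32-bit register, rejecting offsets that are not word-aligned."""
    if offset % 4:
        raise ValueError("offset must be a multiple of 4 for a 32-bit read")
    return mmio.read(offset)


# On the board (not executed here); "cnn_0" is a placeholder instance name:
#   mmio = open_ip_registers(overlay.ip_dict, "cnn_0")
#   value = read_register(mmio, 0x8)
```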

image

Give the CNN accelerator some time to complete processing.

image
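
A fixed delay works but wastes time when the accelerator finishes early. If the custom IP exposes a status register with a "done" bit, polling with a timeout is a common alternative. The sketch below is only that: the status offset and bit mask are hypothetical, and whether this accelerator provides such a register depends on its register map.

```python
import time


def wait_until_done(read_fn, status_offset, done_mask, timeout_s=1.0, poll_s=0.001):
    """Poll a status register until the done bit is set; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_fn(status_offset) & done_mask:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# On the board (not executed here); offset 0x0 and mask 0x1 are placeholders:
#   if wait_until_done(mmio.read, 0x0, 0x1):
#       value = mmio.read(0x8)
```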

Output

image

This output is expected; it matches the result from the previous blog, where C was used to perform the same operation.

  • giachi, 3 months ago

    Dear, would it be possible to have the different files so I can try it on the z1?
