Using Tensil.ai for CNN Models on the Arty Z7
Tensil.ai is a platform for deploying convolutional neural networks (CNNs) efficiently on FPGA hardware, and it works well with development boards like the Arty Z7. This article provides a detailed technical guide to running CNN models on the Arty Z7 with Tensil.ai, covering environment setup, model compilation, and execution.
Setting Up the Environment
The first step in utilizing Tensil.ai on the Arty Z7 is to set up the PYNQ environment. Begin by downloading the appropriate SD card image for the Arty Z7 from the official PYNQ repository. The image typically includes a preconfigured Linux environment optimized for FPGA development.
- Flash the SD Card: Use a tool like Balena Etcher or Win32 Disk Imager to write the downloaded image to your SD card. Ensure that the SD card is at least 16 GB for optimal performance.
- Boot the Arty Z7: Insert the SD card into the Arty Z7 board, connect it to a power source, and establish a network connection (via Ethernet or Wi-Fi). Use a serial console or SSH to access the board.
- Kernel Configuration: Once logged into the PYNQ environment, you may need to adjust the kernel configuration to increase the size of the Contiguous Memory Allocator (CMA), which is crucial for handling large data transfers between the CPU and the FPGA. Modify the boot parameters by editing the `bootargs` entry in the `/boot/uEnv.txt` file to include:

  ```
  bootargs=console=ttyPS0,115200 root=/dev/mmcblk0p2 rw rootwait cma=128M
  ```

  After saving the changes, reboot the board to apply the new settings.
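Once the board is back up, you can confirm the reservation took effect: on kernels built with CMA, `/proc/meminfo` exposes `CmaTotal` and `CmaFree` (values in kB). A minimal parsing sketch, where the sample string stands in for the board's real `/proc/meminfo`:

```python
# Check that the CMA reservation took effect after reboot by parsing the
# CmaTotal line that CMA-enabled kernels report in /proc/meminfo.

def cma_total_kb(meminfo_text: str):
    """Return the CmaTotal value in kB, or None if the kernel lacks CMA."""
    for line in meminfo_text.splitlines():
        if line.startswith("CmaTotal:"):
            return int(line.split()[1])
    return None

# On the board you would pass open("/proc/meminfo").read() instead.
sample = "MemTotal: 508032 kB\nCmaTotal: 131072 kB\nCmaFree: 130048 kB\n"
print(cma_total_kb(sample) // 1024, "MiB reserved")  # 128 MiB for cma=128M
```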
Installing Tensil Driver and Artifacts
With the environment set up, the next step is to install the Tensil driver and necessary artifacts. This involves cloning the Tensil repository and transferring files to the Arty Z7.
- Clone the Tensil Repository: On your local machine, clone the Tensil GitHub repository:

  ```shell
  git clone https://github.com/tensil-ai/tensil.git
  ```
- Transfer Drivers: Use SCP (Secure Copy Protocol) to transfer the Tensil drivers to your Arty Z7:

  ```shell
  scp -r tensil/drivers/tcu_arty [email protected]:/home/xilinx/
  ```
- Bitstream and Model Files: After compiling your model, transfer the generated bitstream and the compiler's output artifacts (the `.tmodel`, `.tprog`, and `.tdata` files described below) to the Arty Z7 board. For instance:

  ```shell
  scp my_model.bit [email protected]:/home/xilinx/
  scp my_model.tmodel my_model.tprog my_model.tdata [email protected]:/home/xilinx/
  ```
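Large transfers occasionally corrupt files, so it is worth verifying each copy by comparing SHA-256 digests on both ends (for example with `sha256sum`, or from Python). A small sketch; on the board you would hash the transferred file's bytes:

```python
# Verify a transferred file by comparing SHA-256 digests computed on the
# host and on the board; matching hex digests mean the copy is intact.
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes; compare the host and board outputs."""
    return hashlib.sha256(data).hexdigest()

# On either machine you would pass open("my_model.bit", "rb").read().
digest = sha256_hex(b"example payload")
print(digest)
```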
Compiling the CNN Model
Compiling the CNN model is a critical step in the deployment process. Tensil provides a compiler that converts your high-level model definition into a format that can be executed on the FPGA.
- Prepare Your Model: Ensure that your CNN model is in ONNX format. If you have a model in another format (like TensorFlow or PyTorch), you can convert it to ONNX using the respective libraries.
- Compile the Model: Use the Tensil compiler to compile your ONNX model. The command typically looks like this:

  ```shell
  tensil compile -a /path/to/arty.tarch -m /path/to/my_model.onnx -o "Identity:0" -s true
  ```

  In this command:
  - `-a` specifies the architecture file for the Arty Z7.
  - `-m` points to your ONNX model file.
  - `-o` specifies the output node of your model.
  - `-s` indicates whether to include the softmax operation in the compilation.
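The flag-to-argument mapping above can be captured in a small host-side helper, which is handy when you compile the same model against several architecture files. This is only a convenience sketch; it assumes `tensil` is on your PATH, and the paths are placeholders:

```python
# Build the Tensil compile invocation programmatically so each flag is
# documented in one place. Paths here are placeholders, not real files.
import subprocess

def build_compile_cmd(tarch, onnx_model, output_node="Identity:0", softmax=True):
    return [
        "tensil", "compile",
        "-a", tarch,                           # architecture file for the Arty Z7
        "-m", onnx_model,                      # ONNX model to compile
        "-o", output_node,                     # output node of the model
        "-s", "true" if softmax else "false",  # include the softmax op
    ]

cmd = build_compile_cmd("/path/to/arty.tarch", "/path/to/my_model.onnx")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # run on a machine with tensil installed
```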
The compilation will generate several artifacts: a manifest file (`.tmodel`), a program file (`.tprog`), and weights data (`.tdata`). These files are essential for running the model on the FPGA.
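Since all three artifacts share the model's base name, a tiny helper can enumerate the paths a transfer or packaging script should expect. A sketch; adjust the base name to your model:

```python
# Enumerate the three files the Tensil compiler emits for a model, so a
# script can copy or sanity-check them as a unit.
from pathlib import Path

def tensil_artifacts(base: str):
    """Return the .tmodel, .tprog, and .tdata paths for a compiled model."""
    return [Path(base).with_suffix(ext) for ext in (".tmodel", ".tprog", ".tdata")]

for p in tensil_artifacts("my_model"):
    print(p)  # my_model.tmodel, my_model.tprog, my_model.tdata
```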
Running the Model on Arty Z7
After compiling the model, you can proceed to execute it on the Arty Z7. This involves initializing the PYNQ overlay, loading the model, and running inference.
- Initialize the PYNQ Overlay: In your Python environment on the Arty Z7, load the PYNQ overlay and instantiate the Tensil driver. Here's how you can do it:

  ```python
  from pynq import Overlay
  from tcu_arty.driver import Driver

  # Load the PYNQ overlay (the bitstream generated for your design)
  overlay = Overlay("path/to/your/overlay.bit")

  # Initialize the Tensil driver against the loaded overlay
  driver = Driver(overlay)
  ```
- Load and Preprocess Data: Prepare your input data for inference. If you are using the CIFAR dataset, you can load and preprocess the images as follows:

  ```python
  import numpy as np
  from PIL import Image

  def load_and_preprocess_image(image_path):
      # Force three channels in case the source image is RGBA or grayscale
      image = Image.open(image_path).convert("RGB").resize((32, 32))  # CIFAR dimensions
      image_array = np.array(image) / 255.0  # Normalize pixel values to [0, 1]
      return image_array.flatten()  # Flatten the image for input

  input_data = load_and_preprocess_image("path/to/image.png")
  ```
- Execute the Model: With the model loaded and data prepared, you can run inference. The following snippet demonstrates how to execute the model and retrieve the results:

  ```python
  # Load the compiled model into the driver
  driver.load_model("path/to/my_model.tmodel")

  # Run inference
  output = driver.run(input_data)

  # Process the output
  predicted_class = np.argmax(output)
  print(f"Predicted class: {predicted_class}")
  ```
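If you compiled with `-s false`, the driver returns raw logits, and you can apply softmax on the host instead. A minimal numpy sketch:

```python
# Host-side softmax for models compiled without the accelerator-side softmax.
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

logits = np.array([2.0, 1.0, 0.1])  # example raw outputs
probs = softmax(logits)
print(int(np.argmax(probs)))  # 0: index of the most likely class
```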
- Performance Optimization: To maximize performance, optimize the data transfer between the CPU and the FPGA. Use DMA (Direct Memory Access) for efficient data handling, and ensure that the input data is aligned with the memory requirements of the FPGA.
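One common alignment step is padding the flattened input to a multiple of the accelerator's vector width. The width of 8 below is an illustrative assumption; use the array size from your `.tarch` file:

```python
# Pad a flattened input vector with zeros up to a multiple of the
# accelerator's vector width. Width 8 is an assumption for illustration.
import numpy as np

def pad_to_multiple(vec, width=8):
    rem = (-len(vec)) % width  # zeros needed to reach the next multiple
    return np.pad(vec, (0, rem)) if rem else vec

x = np.arange(10, dtype=np.float32)
print(len(pad_to_multiple(x)))  # 16: 10 values plus 6 zeros of padding
```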
- Debugging and Monitoring: Use the PYNQ Jupyter notebooks to debug and monitor the performance of your model. You can visualize the data flow and check for bottlenecks in the processing pipeline.
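A lightweight way to spot bottlenecks from a notebook is to wall-clock each stage (preprocessing, data transfer, inference) separately. A small sketch:

```python
# Time a single call, returning its result and elapsed milliseconds; useful
# for comparing preprocessing, transfer, and inference stages in a notebook.
import time

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, (time.perf_counter() - t0) * 1000.0

# Example: time a cheap stand-in for an inference call.
result, ms = timed(sum, range(1000))
print(result, f"{ms:.3f} ms")
```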