Using Tensil.ai for CNN Models on the Arty Z7
Tensil.ai is a platform for deploying convolutional neural networks (CNNs) efficiently on FPGA hardware, and it works well with development boards like the Arty Z7. This article provides a detailed technical guide to running CNN models on the Arty Z7 with Tensil.ai, covering environment setup, model compilation, and execution.
Setting Up the Environment
The first step in utilizing Tensil.ai on the Arty Z7 is to set up the PYNQ environment. Begin by downloading the appropriate SD card image for the Arty Z7 from the official PYNQ repository. The image typically includes a preconfigured Linux environment optimized for FPGA development.
- Flash the SD Card: Use a tool like Balena Etcher or Win32 Disk Imager to write the downloaded image to your SD card. Ensure that the SD card is at least 16 GB for optimal performance.
- Boot the Arty Z7: Insert the SD card into the Arty Z7 board, connect it to a power source, and establish a network connection (via Ethernet or Wi-Fi). Use a serial console or SSH to access the board.
- Kernel Configuration: Once logged into the PYNQ environment, you may need to adjust the kernel configuration to increase the size of the Contiguous Memory Allocator (CMA), which is crucial for handling large data transfers between the CPU and the FPGA. Modify the boot parameters by editing the `bootargs` entry in the `/boot/uEnv.txt` file to include:

  ```
  bootargs=console=ttyPS0,115200 root=/dev/mmcblk0p2 rw rootwait cma=128M
  ```

  After saving the changes, reboot the board to apply the new settings.
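Once the board is back up, you can confirm the reservation took effect: on kernels built with CMA, `/proc/meminfo` exposes `CmaTotal` and `CmaFree` (values in kB). A minimal parsing sketch, where the sample string stands in for the board's real `/proc/meminfo`:

```python
# Check that the CMA reservation took effect after reboot by parsing the
# CmaTotal line that CMA-enabled kernels report in /proc/meminfo.

def cma_total_kb(meminfo_text: str):
    """Return the CmaTotal value in kB, or None if the kernel lacks CMA."""
    for line in meminfo_text.splitlines():
        if line.startswith("CmaTotal:"):
            return int(line.split()[1])
    return None

# On the board you would pass open("/proc/meminfo").read() instead.
sample = "MemTotal: 508032 kB\nCmaTotal: 131072 kB\nCmaFree: 130048 kB\n"
print(cma_total_kb(sample) // 1024, "MiB reserved")  # 128 MiB for cma=128M
```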
Installing Tensil Driver and Artifacts
With the environment set up, the next step is to install the Tensil driver and necessary artifacts. This involves cloning the Tensil repository and transferring files to the Arty Z7.
- Clone the Tensil Repository: On your local machine, clone the Tensil GitHub repository:

  ```shell
  git clone https://github.com/tensil-ai/tensil.git
  ```
- Transfer Drivers: Use SCP (Secure Copy Protocol) to transfer the Tensil drivers to your Arty Z7:

  ```shell
  scp -r tensil/drivers/tcu_arty [email protected]:/home/xilinx/
  ```
- Bitstream and Model Files: After compiling your model, transfer the generated bitstream and the compiler's output artifacts (the `.tmodel`, `.tprog`, and `.tdata` files described below) to the Arty Z7 board. For instance:

  ```shell
  scp my_model.bit [email protected]:/home/xilinx/
  scp my_model.tmodel my_model.tprog my_model.tdata [email protected]:/home/xilinx/
  ```
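Large transfers occasionally corrupt files, so it is worth verifying each copy by comparing SHA-256 digests on both ends (for example with `sha256sum`, or from Python). A small sketch; on the board you would hash the transferred file's bytes:

```python
# Verify a transferred file by comparing SHA-256 digests computed on the
# host and on the board; matching hex digests mean the copy is intact.
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes; compare the host and board outputs."""
    return hashlib.sha256(data).hexdigest()

# On either machine you would pass open("my_model.bit", "rb").read().
digest = sha256_hex(b"example payload")
print(digest)
```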
Compiling the CNN Model
Compiling the CNN model is a critical step in the deployment process. Tensil provides a compiler that converts your high-level model definition into a format that can be executed on the FPGA.
- Prepare Your Model: Ensure that your CNN model is in ONNX format. If you have a model in another format (like TensorFlow or PyTorch), you can convert it to ONNX using the respective libraries.
- Compile the Model: Use the Tensil compiler to compile your ONNX model. The command typically looks like this:

  ```shell
  tensil compile -a /path/to/arty.tarch -m /path/to/my_model.onnx -o "Identity:0" -s true
  ```

  In this command:
  - `-a` specifies the architecture file for the Arty Z7.
  - `-m` points to your ONNX model file.
  - `-o` specifies the output node of your model.
  - `-s` indicates whether to include the softmax operation in the compilation.
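The flag-to-argument mapping above can be captured in a small host-side helper, which is handy when you compile the same model against several architecture files. This is only a convenience sketch; it assumes `tensil` is on your PATH, and the paths are placeholders:

```python
# Build the Tensil compile invocation programmatically so each flag is
# documented in one place. Paths here are placeholders, not real files.
import subprocess

def build_compile_cmd(tarch, onnx_model, output_node="Identity:0", softmax=True):
    return [
        "tensil", "compile",
        "-a", tarch,                           # architecture file for the Arty Z7
        "-m", onnx_model,                      # ONNX model to compile
        "-o", output_node,                     # output node of the model
        "-s", "true" if softmax else "false",  # include the softmax op
    ]

cmd = build_compile_cmd("/path/to/arty.tarch", "/path/to/my_model.onnx")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # run on a machine with tensil installed
```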
The compilation will generate several artifacts: a manifest file (`.tmodel`), a program file (`.tprog`), and weights data (`.tdata`). These files are essential for running the model on the FPGA.
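Since all three artifacts share the model's base name, a tiny helper can enumerate the paths a transfer or packaging script should expect. A sketch; adjust the base name to your model:

```python
# Enumerate the three files the Tensil compiler emits for a model, so a
# script can copy or sanity-check them as a unit.
from pathlib import Path

def tensil_artifacts(base: str):
    """Return the .tmodel, .tprog, and .tdata paths for a compiled model."""
    return [Path(base).with_suffix(ext) for ext in (".tmodel", ".tprog", ".tdata")]

for p in tensil_artifacts("my_model"):
    print(p)  # my_model.tmodel, my_model.tprog, my_model.tdata
```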
Running the Model on Arty Z7
After compiling the model, you can proceed to execute it on the Arty Z7. This involves initializing the PYNQ overlay, loading the model, and running inference.
- Initialize the PYNQ Overlay: In your Python environment on the Arty Z7, load the PYNQ overlay and instantiate the Tensil driver. Here's how you can do it:

  ```python
  from pynq import Overlay
  from tcu_arty.driver import Driver

  # Load the PYNQ overlay (the bitstream generated for your design)
  overlay = Overlay("path/to/your/overlay.bit")

  # Initialize the Tensil driver against the loaded overlay
  driver = Driver(overlay)
  ```
- Load and Preprocess Data: Prepare your input data for inference. If you are using the CIFAR dataset, you can load and preprocess the images as follows:

  ```python
  import numpy as np
  from PIL import Image

  def load_and_preprocess_image(image_path):
      # Force three channels in case the source image is RGBA or grayscale
      image = Image.open(image_path).convert("RGB").resize((32, 32))  # CIFAR dimensions
      image_array = np.array(image) / 255.0  # Normalize pixel values to [0, 1]
      return image_array.flatten()  # Flatten the image for input

  input_data = load_and_preprocess_image("path/to/image.png")
  ```
- Execute the Model: With the model loaded and data prepared, you can run inference. The following snippet demonstrates how to execute the model and retrieve the results:

  ```python
  # Load the compiled model into the driver
  driver.load_model("path/to/my_model.tmodel")

  # Run inference
  output = driver.run(input_data)

  # Process the output
  predicted_class = np.argmax(output)
  print(f"Predicted class: {predicted_class}")
  ```
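If you compiled with `-s false`, the driver returns raw logits, and you can apply softmax on the host instead. A minimal numpy sketch:

```python
# Host-side softmax for models compiled without the accelerator-side softmax.
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

logits = np.array([2.0, 1.0, 0.1])  # example raw outputs
probs = softmax(logits)
print(int(np.argmax(probs)))  # 0: index of the most likely class
```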
- Performance Optimization: To maximize performance, optimize the data transfer between the CPU and the FPGA. Use DMA (Direct Memory Access) for efficient data handling, and ensure that the input data is aligned with the memory requirements of the FPGA.
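One common alignment step is padding the flattened input to a multiple of the accelerator's vector width. The width of 8 below is an illustrative assumption; use the array size from your `.tarch` file:

```python
# Pad a flattened input vector with zeros up to a multiple of the
# accelerator's vector width. Width 8 is an assumption for illustration.
import numpy as np

def pad_to_multiple(vec, width=8):
    rem = (-len(vec)) % width  # zeros needed to reach the next multiple
    return np.pad(vec, (0, rem)) if rem else vec

x = np.arange(10, dtype=np.float32)
print(len(pad_to_multiple(x)))  # 16: 10 values plus 6 zeros of padding
```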
- Debugging and Monitoring: Use the PYNQ Jupyter notebooks to debug and monitor the performance of your model. You can visualize the data flow and check for bottlenecks in the processing pipeline.
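A lightweight way to spot bottlenecks from a notebook is to wall-clock each stage (preprocessing, data transfer, inference) separately. A small sketch:

```python
# Time a single call, returning its result and elapsed milliseconds; useful
# for comparing preprocessing, transfer, and inference stages in a notebook.
import time

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, (time.perf_counter() - t0) * 1000.0

# Example: time a cheap stand-in for an inference call.
result, ms = timed(sum, range(1000))
print(result, f"{ms:.3f} ms")
```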