Introduction
Greetings Element14 community! This is my introductory post for the Experimenting with Sensor Fusion competition!
First I would just like to thank the Element14 staff for all the work that they put into organizing and administering the competition, as well as the sponsors from AMD Xilinx for making this possible!
Now without further ado, let's explore the enigmatic field of sensor fusion!
Project Description
The goal of this competition is to implement some kind of sensor fusion application. Sensor fusion is the process of collecting data from a variety of sensors and combining it in a useful manner. When I first saw this competition, my immediate reaction was that I really wanted to do Visual-Inertial Odometry (VIO), a type of sensor fusion that uses image sensors and IMU data to compute the position and orientation of an object. If you haven't seen VIO before, check out this video.
Given the time constraints, I decided that this was an unrealistic goal. As a compromise, I have decided to tackle only part of this problem, namely the inertial odometry part. My goal is to collect data from an IMU and accelerate the computation of an object's pose. Once this is computed, I will generate a visualization of a set of unit vectors which will be fused with a live camera feed. In essence, this is an augmented reality (AR) application which will attempt to project the pose of an object in real time. The reason I call this "drone pose" is that drones are probably the best example of a rigid-body robot that experiences linear and angular accelerations.
Check out the gallery below for some examples of what the visualization will look like. The goal is to overlay this on an image feed and, if time allows, track an object's position in that frame. This might not look great, but it uses actual OpenCV calls, which means I can spend most of my time accelerating the algorithm rather than figuring out how to implement such an algorithm from scratch.
Drone Pose Visualization Examples (image gallery):
- Drone Pose: ωx=0, ωy=0, ωz=0 (rad)
- Drone Pose: ωx=3.6, ωy=1.78, ωz=2.2 (rad)
Note 1: The axes are projected with the following rotation to give perspective: ωx=0.2, ωy=0.785398, ωz=0.1 (rad).
Note 2: Blue = x-axis, Red = y-axis, Green = z-axis.
Hardware
- The SP701 Evaluation Kit
- Digilent Pcam 5C Imaging Module
- Digilent Pmod NAV 9-axis IMU Plus Barometer
Task Breakdown
Graphics
- Implement an unaccelerated OpenCV application which handles the pose graphics.
- As mentioned before, the example drone poses in this blog were generated using OpenCV. There are two main functions used to generate these images: cv::projectPoints and cv::line (a minimal sketch using both follows this list).
- cv::projectPoints computes where the endpoint of each axis lands in the image, given a rotation vector, a translation vector, and the camera intrinsics.
- cv::line is the drawing routine that actually renders the lines onto the image.
- Simplify the OpenCV functions.
- OpenCV functions are often difficult to understand because they handle so many edge cases and overloaded function prototypes, so they can often be greatly simplified. For instance, cv::projectPoints allows you to take lens distortion into account, but that is an unnecessary feature that I plan to remove. The goal is to break these functions down into more digestible parts.
- Accelerate the OpenCV functions.
- I plan to use Vitis HLS along with the Vitis Vision Library to accelerate these OpenCV functions. The Vitis Vision Library provides a really neat interface for handling OpenCV-like matrices, which greatly reduces the effort involved in creating data structures for the images. Perhaps the most unsung feature is its support for fixed-point precision. OpenCV functions like cv::projectPoints use floating-point arithmetic, which is notoriously slow on FPGAs; fixed-point arithmetic is a great alternative because it is much faster and should still produce very good results.
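To make this concrete, here is a minimal, unaccelerated sketch of the axis overlay that uses only cv::projectPoints and cv::line. The frame size, camera intrinsics, translation, and axis scale are placeholder values rather than the final design, lens distortion is already left out (cv::noArray()) in line with the simplification described above, and the rotation vector simply reuses the perspective rotation from Note 1.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Stand-in for a captured camera frame.
    cv::Mat frame(480, 640, CV_8UC3, cv::Scalar::all(0));

    // Axis endpoints in the object frame: origin plus one unit vector per axis.
    std::vector<cv::Point3f> axes = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {0, 0, 1}};

    // Pose to visualize: the perspective rotation from Note 1 plus an arbitrary
    // translation that places the object in front of the camera.
    cv::Vec3d rvec(0.2, 0.785398, 0.1);
    cv::Vec3d tvec(0.0, 0.0, 5.0);

    // Placeholder pinhole intrinsics; distortion is deliberately omitted.
    cv::Matx33d K(500,   0, 320,
                    0, 500, 240,
                    0,   0,   1);

    std::vector<cv::Point2f> pts;
    cv::projectPoints(axes, rvec, tvec, K, cv::noArray(), pts);

    // One cv::line call per axis. Colors follow Note 2 (BGR order):
    // blue = x, red = y, green = z.
    cv::line(frame, cv::Point(pts[0]), cv::Point(pts[1]), cv::Scalar(255, 0, 0), 2);
    cv::line(frame, cv::Point(pts[0]), cv::Point(pts[2]), cv::Scalar(0, 0, 255), 2);
    cv::line(frame, cv::Point(pts[0]), cv::Point(pts[3]), cv::Scalar(0, 255, 0), 2);

    cv::imwrite("drone_pose.png", frame);
    return 0;
}
```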
Video Capture and Display Pipeline
- Run the MIPI CSI-2 RX Subsystem Example Design for the SP701.
- The PCAM-5C is a camera module built around the OV5640 MIPI CSI-2 image sensor. MIPI cameras are most often used in the mobile industry for products like smartphones, but are incredibly useful for FPGA designs. Xilinx provides an IP block called the "MIPI CSI-2 RX Subsystem" which can be used to capture video from these sensors. It's important to note that you still need a driver for such cameras. Luckily, Xilinx provides one for the PCAM-5C.
- The MIPI CSI-2 RX Subsystem comes with an example design for the SP701. Not all boards have an example, but we are lucky to have one for the SP701! We will take advantage of this as a starting point for this project.
- Modify the example design.
- The example design is meant to be flexible, but this actually hinders our project because we don't need all those features. My goal is to free up lots of resources so that they can be used for accelerating my computer vision application.
- Our modified design can utilize most of the camera pipeline without any changes. There is an AXI4-Stream switch that feeds the MIPI DSI interface, which I plan to remove since I only need the HDMI output.
- Our modified design needs to add support for the PMOD-NAV IMU.
- Optional: The PCAM-5C output looked kind of rough without any processing. I might try to enhance it using gamma and white balance correction (a quick software sketch of the gamma step follows this list).
- Optional: The PCAM-5C has support for autofocus, but the current driver does not support this. I might add this to make the output prettier.
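Just to pin down the gamma idea, here is a rough software sketch using a 256-entry lookup table. In the real design this would more likely live in the fabric as a LUT stage in the video pipeline, and the gamma value passed in is an arbitrary placeholder.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>

// Build a 256-entry gamma lookup table and apply it to every channel of a
// BGR frame. gamma < 1.0 brightens mid-tones; gamma > 1.0 darkens them.
cv::Mat apply_gamma(const cv::Mat &src, double gamma) {
    cv::Mat lut(1, 256, CV_8U);
    for (int i = 0; i < 256; ++i)
        lut.at<uchar>(i) = cv::saturate_cast<uchar>(std::pow(i / 255.0, gamma) * 255.0);

    cv::Mat dst;
    cv::LUT(src, lut, dst);   // cv::LUT applies the table per channel
    return dst;
}
```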
Integration
- Calculate the pose of the object from the PMOD-NAV.
- This is probably the biggest question mark I will face in this project. My gut tells me that I need to implement some kind of Kalman filter. The IMU gives linear accelerations and angular rates, but to get an absolute position and orientation you need some way to integrate that data, and naive integration drifts quickly because of sensor noise and bias. A Kalman filter uses statistics to weigh a motion-model prediction against the noisy measurements, which keeps the integrated pose much closer to reality (a simple placeholder filter is sketched after this list).
- Integrate the graphics accelerator into the camera pipeline.
- This should actually be relatively simple since I plan to use an AXI4-Stream interface, so the accelerator can sit inline with the camera display pipeline (see the skeleton after this list).
- Write software to configure the accelerator with data that is processed by the Kalman filter.
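As a placeholder until I settle on a proper Kalman filter, here is a minimal complementary-filter sketch for roll and pitch: it dead-reckons the gyro rates and nudges the result toward the gravity direction seen by the accelerometer. The variable names, the 0.98 blend factor, and the sample values in main are all assumptions for illustration, not the final design.

```cpp
#include <cmath>
#include <cstdio>

struct Attitude { double roll = 0.0, pitch = 0.0; };   // radians

Attitude update(Attitude att,
                double gx, double gy,             // gyro rates about x and y (rad/s)
                double ax, double ay, double az,  // accelerometer (m/s^2)
                double dt)                        // sample period (s)
{
    // Predict: dead-reckon the gyro rates over one sample period.
    double roll_g  = att.roll  + gx * dt;
    double pitch_g = att.pitch + gy * dt;

    // Correct: gravity direction from the accelerometer (only trustworthy when
    // linear acceleration is small compared to gravity).
    double roll_a  = std::atan2(ay, az);
    double pitch_a = std::atan2(-ax, std::sqrt(ay * ay + az * az));

    // Blend: trust the gyro short-term, the accelerometer long-term.
    const double alpha = 0.98;
    att.roll  = alpha * roll_g  + (1.0 - alpha) * roll_a;
    att.pitch = alpha * pitch_g + (1.0 - alpha) * pitch_a;
    return att;
}

int main() {
    Attitude att;
    // Fake sample: 10 ms period, small rotation about x, gravity along z.
    att = update(att, 0.1, 0.0, 0.0, 0.0, 9.81, 0.01);
    std::printf("roll=%f pitch=%f\n", att.roll, att.pitch);
    return 0;
}
```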
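To show what "sitting inline" could look like on the hardware side, here is a hypothetical Vitis HLS skeleton of a pass-through video kernel with AXI4-Stream ports. The kernel name, frame size, and pixel format are placeholders, and the actual overlay test against the projected axis endpoints still has to be filled in.

```cpp
#include <ap_axi_sdata.h>
#include <hls_stream.h>

// 24-bit RGB pixel on an AXI4-Stream video interface (TUSER/TLAST sidebands kept).
typedef ap_axiu<24, 1, 1, 1> pixel_t;

#define WIDTH  1280   // placeholder frame size
#define HEIGHT 720

// Hypothetical top-level: forwards every pixel from the camera pipeline to the
// display pipeline; the overlay recoloring would go inside the inner loop.
void pose_overlay(hls::stream<pixel_t> &s_axis_video,
                  hls::stream<pixel_t> &m_axis_video) {
#pragma HLS INTERFACE axis port=s_axis_video
#pragma HLS INTERFACE axis port=m_axis_video
#pragma HLS INTERFACE s_axilite port=return
    for (int y = 0; y < HEIGHT; ++y) {
        for (int x = 0; x < WIDTH; ++x) {
#pragma HLS PIPELINE II=1
            pixel_t px = s_axis_video.read();
            // TODO: recolor px.data when (x, y) falls on one of the axis lines
            m_axis_video.write(px);
        }
    }
}
```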
Current Status
Here is the current status of the image capture pipeline. As you can see, the camera quality isn't the best. If I have time, I will try to enhance the quality and add autofocus so that things become clearer. The goal is to fuse the coordinate frame onto the live camera feed based on the position of the IMU. Right now the IMU is connected to a PMOD connector, but I plan to extend it with wires so that I can more easily simulate the "drone" movement.
About Me
I'm a 2021 grad with a background in computer engineering and a concentration in robotics. I've worked with Xilinx FPGAs for about two years now. I started out by interfacing MIPI CSI cameras in bare metal, then progressively switched over to embedded Linux, where I learned how to use XRT to deploy accelerated computer vision applications for drones. When I first started out with FPGA design, I found that there was a steep learning curve. If you find yourself in the same boat, I encourage you to stick with it and ask questions, because eventually it will click, and you will find it very rewarding.