Hardware/Software Co-design with the LOGI Boards

9 Oct 2014

Introduction

In a previous blog post, ValentF(x) explained what FPGAs (field programmable gate arrays) are and why they are a valuable resource when designing electronic systems. The article went on to describe the major differences between how FPGAs and CPUs/MCUs operate. Finally, it highlighted that FPGAs are a powerful tool, especially when used in conjunction with CPUs, since each technology has its own strengths in how it processes data.

This article focuses on how to begin using an FPGA in conjunction with a CPU as a co-processing system. It explains how such a system is designed to handle multiple processing tasks, how those tasks are mapped and scheduled, and how the LOGI FPGA boards communicate with the host CPU or MCU.

This article uses examples from the LOGI Face project, an open-source animatronic robot, as the basis for discussing co-processing methodologies. We will also refer to the LOGI Face Lite project, a more basic version of LOGI Face that users can fully replicate with 3D-printable parts and off-the-shelf components. The LOGI Face Lite wiki page contains instructions to build and run the algorithms in the project.

What is Hardware/Software Co-design?

Co-design consists of designing an electronic system as a mixture of software and hardware components. Software components usually run on processors such as a CPU, DSP or GPU, whereas hardware components run on an FPGA or a dedicated ASIC (application specific integrated circuit). This design method takes advantage of the inherent parallelism between the tasks of an application and makes it easier to re-use components across applications.

Steps for Designing a Hardware/Software Co-processing System

Partition the application into the hardware and software components
Map the software components to the CPU resources
Map the needed custom hardware components to the FPGA resources
Schedule the software components
Manage the communications between the software and hardware components

These steps can be performed either by co-design tools or by hand, based on the designer's knowledge. In the following sections we describe how the LOGI Pi can be used with the Raspberry Pi to perform such a co-design for real-time control-oriented or high-performance applications.

Communication between the LOGI Pi and Raspberry Pi

A critical requirement of a co-design system is the method of communication between the FPGA and CPU processing units of the platform, which in this case are the LOGI FPGA and the Raspberry Pi. The LOGI projects use the wishbone bus to provide fast, reliable and expandable communication between the hardware and software components. Since the Raspberry Pi does not expose a wishbone bus on its expansion header, we used its SPI port and designed a hardware “wrapper” component in the FPGA that converts the SPI serial bus into a 16-bit wishbone master. Using this bus lets users take advantage of the extensive repository of open-source HDL components hosted on opencores.org and other shared HDL locations.

On the SPI bus, each transaction consists of the following steps:

1) Set slave select low.
2) Send a 16-bit command word: bits 15 to 2 hold the address of the access, bit 1 indicates burst mode (1), and bit 0 indicates a read (1) or write (0) (see figure).
3) Send/receive 16-bit words to/from the address set by the command word. If burst mode is set, the address is incremented on each subsequent access until slave select is set high (end of transaction).
4) Set slave select high.
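The command-word layout above can be sketched in Python. This is a minimal illustration, not the LOGI driver code: the function names are hypothetical, addresses are assumed to be 16-bit word addresses that increment by one per burst word, and each word is assumed to be sent most-significant byte first.

```python
from __future__ import annotations


def encode_command(address: int, burst: bool, read: bool) -> int:
    """Pack the 16-bit command word: bits 15..2 address, bit 1 burst, bit 0 read."""
    if not 0 <= address < (1 << 14):
        raise ValueError("address must fit in the 14 bits 15..2")
    return (address << 2) | (int(burst) << 1) | int(read)


def decode_command(word: int) -> tuple[int, bool, bool]:
    """Unpack a command word into (address, burst, read)."""
    return (word >> 2) & 0x3FFF, bool(word & 0b10), bool(word & 0b01)


def accessed_addresses(command: int, n_words: int) -> list[int]:
    """Addresses hit by the data words that follow the command word.

    In burst mode the address is incremented after every word (assumed +1 per
    16-bit word); otherwise every word targets the same address.
    """
    address, burst, _ = decode_command(command)
    return [address + i if burst else address for i in range(n_words)]


def build_write_frame(address: int, words: list[int], burst: bool = True) -> bytes:
    """Bytes clocked out between slave select low and high for a write.

    Assumes each 16-bit word is sent most-significant byte first.
    """
    frame = encode_command(address, burst, read=False).to_bytes(2, "big")
    for w in words:
        frame += (w & 0xFFFF).to_bytes(2, "big")
    return frame


if __name__ == "__main__":
    cmd = encode_command(0x10, burst=True, read=False)
    print(hex(cmd), accessed_addresses(cmd, 3))  # 0x42 [16, 17, 18]
```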

Such transactions let users take advantage of the Raspberry Pi's integrated SPI controller, which uses a 4096-byte FIFO. This access format achieves the following useful bandwidth (the 2 bits of synchronization are transfer overhead on the SPI bus):

For a single 16-bit access: (16 bit data) / (2 bit synchro + 16 bit command + 16 bit data) = 16/34 ≈ 47% of the theoretical bandwidth.
For a 4094-byte access: (2047 × 16 bit data) / (2 bit synchro + 16 bit command + 2047 × 16 bit data) = 32752/32770 ≈ 99.9% of the theoretical bandwidth.
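The two efficiency figures can be reproduced with a few lines of Python. The overhead constants come straight from the formulas above; the helper names are illustrative, and the word count assumes one FIFO-sized transfer holds the command word plus data.

```python
SYNC_BITS = 2      # per-transaction synchronization overhead on the SPI bus
COMMAND_BITS = 16  # one command word per transaction
WORD_BITS = 16     # each data word


def spi_efficiency(n_words: int) -> float:
    """Fraction of the raw SPI bit rate that carries payload in one transaction."""
    payload = n_words * WORD_BITS
    return payload / (SYNC_BITS + COMMAND_BITS + payload)


def max_words(fifo_bytes: int = 4096) -> int:
    """Data words that fit in one FIFO-sized transfer after the command word."""
    return (fifo_bytes - COMMAND_BITS // 8) // (WORD_BITS // 8)


if __name__ == "__main__":
    for n in (1, max_words()):
        print(f"{n:4d} word(s): {spi_efficiency(n):.2%}")
```

Running it prints 47.06% for a single word and 99.95% for a 2047-word (4094-byte) burst. The same function shows why grouping adjacent registers into one burst pays off for control code: a 4-word burst reaches 64/82 ≈ 78%, against about 47% for each of four separate single-word writes.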

This means that for most control-based applications (writing/reading registers) we get about half of the theoretical bandwidth, but for data-based applications, such as writing and reading buffers or memory, we get over 99% of it. It could be argued that replacing the wishbone interface with an application-specific protocol (formatted data packets) on the SPI bus would give the maximum bandwidth, but this would break the generic approach proposed here. The communication is abstracted by a dedicated C API that provides memory read and write functions. ValentF(x) also provides a library of hardware components (VHDL) that users can integrate into their designs (servo controller, PWM controller, FIFO controller, PID controller …).

Communication between the LOGI Bone and BeagleBone

The BeagleBone exposes an external memory bus (called GPMC, for General Purpose Memory Controller) on its P8 and P9 expansion connectors. This memory bus provides 16-bit multiplexed address/data, 3 chip selects, and read, write, clock, address latch and high-byte/low-byte signals. The bus behavior is configured through the device tree on the Linux system as a synchronous bus with a 50 MHz clock. The bus is theoretically capable of 80 MB/s, but current settings limit it to a maximum of 20 MB/s for reads and 27 MB/s for writes. Higher speeds (50 MB/s) can be achieved by enabling burst access (which requires recompiling the kernel), but this breaks support for some of the IPs (mainly wishbone_fifo). Even higher speeds were measured by switching the bus to asynchronous mode and disabling DMA, but data transfers then increase the CPU load significantly.
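To put the quoted GPMC figures in perspective, this sketch converts them into transfer times. It simply divides a byte count by the rates stated above (taking 1 MB as 10^6 bytes, an assumption); real transfers also pay per-request driver and DMA setup costs that it ignores.

```python
# Sustained rates quoted in the text for the LOGI Bone GPMC link, in MB/s.
RATES_MB_PER_S = {
    "read (default settings)": 20.0,
    "write (default settings)": 27.0,
    "burst access (recompiled kernel)": 50.0,
    "theoretical maximum": 80.0,
}


def transfer_time_ms(n_bytes: int, rate_mb_per_s: float) -> float:
    """Milliseconds needed to move n_bytes at a sustained rate (1 MB = 10**6 bytes)."""
    return n_bytes * 1000.0 / (rate_mb_per_s * 1_000_000)


if __name__ == "__main__":
    size = 1_000_000  # a one-megabyte buffer
    for label, rate in RATES_MB_PER_S.items():
        print(f"{label:>34}: {transfer_time_ms(size, rate):6.1f} ms")
```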

On the FPGA side, ValentF(x) provides a wishbone wrapper that converts this bus protocol into a wishbone master compatible with the LOGI drivers. On the Linux side, a kernel module is loaded that triggers DMA transfers for each user request. The driver exposes a logibone_mem char device in the “/dev” directory that can be accessed through open/read/write/close functions in C, or directly with dd from the command line.
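Because the driver presents the bus as an ordinary character device, standard file I/O is enough to reach it. The Python sketch below uses os.pread/os.pwrite; it assumes that the file offset maps directly to the bus byte address and that 16-bit words are little-endian (the BeagleBone's native order). Both are assumptions to check against the LOGI driver, and the helper names are hypothetical.

```python
from __future__ import annotations

import os

DEVICE = "/dev/logibone_mem"  # char device exposed by the LOGI Bone kernel module


def write_words(path: str, byte_address: int, words: list[int]) -> None:
    """Write 16-bit words starting at byte_address (offset == bus address assumed)."""
    data = b"".join((w & 0xFFFF).to_bytes(2, "little") for w in words)
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, byte_address)
    finally:
        os.close(fd)


def read_words(path: str, byte_address: int, count: int) -> list[int]:
    """Read count 16-bit words starting at byte_address."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 2 * count, byte_address)
    finally:
        os.close(fd)
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__" and os.path.exists(DEVICE):
    write_words(DEVICE, 0x0000, [0x00FF])
    print(read_words(DEVICE, 0x0000, 1))
```

Taking the path as a parameter also makes the helpers easy to exercise against an ordinary file before the board is attached.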

This communication is also abstracted by a dedicated C API that provides memory read/write functions. The C API uses the same function calls for the LOGI Bone and the LOGI Pi, so code written for the LOGI Bone can be ported to the LOGI Pi with no modification.

Abstracting the communication layer using Python

Because the Raspberry Pi platform is targeted toward education, we also offer a Python library that abstracts the communication layer and provides easy-to-use function calls for the LOGI Pi and LOGI Bone platforms. The communication package also comes with a Hardware Abstraction Library (HAL) that provides Python support for most of the hardware modules in the LOGI hardware library. LOGI HAL, which is part of the LOGI Stack, gives easy access to the wishbone hardware modules by providing direct read and write commands for each module. The HAL drivers will be extended as the module base grows.
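The layering described here (a transport at the bottom, one HAL class per hardware module on top) can be sketched as follows. Everything in it is hypothetical: MemoryBus stands in for the real SPI or GPMC transport, and the one-register-per-channel layout of ServoController is invented for illustration, not taken from the LOGI servo core.

```python
from __future__ import annotations


class MemoryBus:
    """Stand-in transport: a flat space of 16-bit words addressed by index.

    A real backend would forward read/write to the SPI (LOGI Pi) or GPMC
    (LOGI Bone) driver; the HAL classes only depend on these two methods.
    """

    def __init__(self) -> None:
        self._mem: dict[int, int] = {}

    def write(self, address: int, words: list[int]) -> None:
        for offset, word in enumerate(words):
            self._mem[address + offset] = word & 0xFFFF

    def read(self, address: int, count: int) -> list[int]:
        return [self._mem.get(address + i, 0) for i in range(count)]


class ServoController:
    """Hypothetical HAL wrapper: one 16-bit position register per channel."""

    def __init__(self, bus: MemoryBus, base_address: int, channels: int) -> None:
        self.bus = bus
        self.base = base_address
        self.channels = channels

    def _address(self, channel: int) -> int:
        if not 0 <= channel < self.channels:
            raise ValueError(f"channel must be in 0..{self.channels - 1}")
        return self.base + channel

    def set_position(self, channel: int, value: int) -> None:
        self.bus.write(self._address(channel), [value])

    def get_position(self, channel: int) -> int:
        return self.bus.read(self._address(channel), 1)[0]


if __name__ == "__main__":
    servos = ServoController(MemoryBus(), base_address=0x0008, channels=4)
    servos.set_position(1, 1500)
    print(servos.get_position(1))  # 1500
```

Swapping MemoryBus for an SPI- or GPMC-backed class with the same read/write methods leaves ServoController untouched, which mirrors the portability argument made for the C API.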

A Basic Example of Hardware/Software Co-design with LOGI Face

LOGI Face is a demonstration based on the LOGI Pi platform that acts as a telepresence animatronic device. The LOGI Face demo includes software and hardware functionality using the Raspberry Pi and the LOGI Pi FPGA in a co-design architecture.

LOGI Face Software

The software consists of a VOIP (voice over internet protocol) client, a text-to-speech synthesizer library and LOGI Tools, which consist of C SPI drivers and Python wrappers that give easy and direct communication with the wishbone devices on the FPGA. Using a VOIP client allows communication to and from LOGI Face from any internet-connected VOIP client, so anyone on the internet can send commands and data to LOGI Face, which are passed on to the FPGA to control the hardware components. The software parses the commands and data, and tagged subsets of the data are synthesized to speech using the espeak text-to-speech library. Users can also use the linphone VOIP client to communicate bi-directionally by voice through LOGI Face: the remote voice is played on the speaker installed in LOGI Face, and the local user can speak back to the remote user using the microphone installed in LOGI Face.
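The parse-then-dispatch flow can be illustrated with a small sketch. The line-based "TAG argument" message format, the SAY tag and the function names are all hypothetical, since the actual LOGI Face message format is not described here; only the espeak command-line invocation (espeak followed by the text to speak) reflects the real tool.

```python
from __future__ import annotations

import subprocess


def parse_message(message: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a message into hardware commands and text to be spoken.

    Each non-empty line is "TAG argument". SAY lines are speech; every other
    tag is treated as a command for the FPGA side (e.g. MOOD, EYES).
    """
    commands: list[tuple[str, str]] = []
    speech: list[str] = []
    for line in message.splitlines():
        tag, _, arg = line.strip().partition(" ")
        if not tag:
            continue  # skip blank lines
        if tag.upper() == "SAY":
            speech.append(arg)
        else:
            commands.append((tag.upper(), arg))
    return commands, speech


def speak(text: str) -> None:
    """Synthesize text through the espeak command-line tool (must be installed)."""
    subprocess.run(["espeak", text], check=True)


if __name__ == "__main__":
    cmds, lines = parse_message("MOOD happy\nSAY Hello there\nEYES blink")
    print(cmds)   # [('MOOD', 'happy'), ('EYES', 'blink')]
    print(lines)  # ['Hello there']
```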

LOGI Face Hardware

The FPGA hardware implementation consists of an SPI-to-wishbone wrapper and wishbone hardware modules including servos (mouth and eyebrows), RGB LEDs (hair color), an 8x8 LED matrix (eyes) and SPI ADC drivers. The wishbone wrapper acts as glue logic that converts the SPI data to the wishbone protocol. The servos emulate emotion: the mouth smiles or frowns, and the eyebrows likewise express emotions. The tasks of the application are depicted in the following diagram.


LOGI Face Tasks

The LOGI Face application's tasks are partitioned on the LOGI Pi platform, with software components running on the Raspberry Pi and hardware components on the LOGI Pi. The software components were chosen to take advantage of existing software libraries, including the espeak text-to-speech engine and the linphone SIP client. The hardware components are the time-critical tasks, including the wishbone wrapper, servo drivers, LED matrix controller, SPI ADC controller and PWM controller.
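The partitioning rule applied here can be written down as a tiny hand-partitioning helper. It is a simplification for illustration only (real co-design tools also weigh cost, area and communication): tasks that rely on existing software libraries stay on the CPU, time-critical tasks go to the FPGA, and everything else defaults to software. The task list mirrors the text; the attribute values and the extra "command parsing" entry are our own reading of it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    time_critical: bool    # needs deterministic, low-level timing
    uses_sw_library: bool  # built on an existing software library


def partition(tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Return (cpu_tasks, fpga_tasks) using the rule described above."""
    cpu: list[str] = []
    fpga: list[str] = []
    for task in tasks:
        if task.time_critical and not task.uses_sw_library:
            fpga.append(task.name)
        else:
            cpu.append(task.name)
    return cpu, fpga


LOGI_FACE_TASKS = [
    Task("espeak text-to-speech", time_critical=False, uses_sw_library=True),
    Task("linphone SIP client", time_critical=False, uses_sw_library=True),
    Task("command parsing", time_critical=False, uses_sw_library=False),
    Task("wishbone wrapper", time_critical=True, uses_sw_library=False),
    Task("servo drivers", time_critical=True, uses_sw_library=False),
    Task("LED matrix controller", time_critical=True, uses_sw_library=False),
    Task("SPI ADC controller", time_critical=True, uses_sw_library=False),
    Task("PWM controller", time_critical=True, uses_sw_library=False),
]

if __name__ == "__main__":
    cpu, fpga = partition(LOGI_FACE_TASKS)
    print("Raspberry Pi:", cpu)
    print("LOGI Pi FPGA:", fpga)
```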

Further work on this co-processing system could include reducing the CPU load by moving parts of the espeak TTS (text-to-speech) engine and other software tasks to hardware in the FPGA. Such a migration is a good showcase of the flexibility and power of pairing an FPGA with a CPU.

The final co-processing tasks of the LOGI Face application are shown in the following diagram.

LOGI Face Lite

LOGI Face Lite is a simplified version of the LOGI Face project described above, created to give direct access to its main hardware components. It is intended to let users quickly build and replicate the basic software and hardware co-processing functions, including the servos, SPI ADC, PWM RGB LED and 8x8 LED matrices. Each component has an HDL implementation on the FPGA and API function calls for access from the Raspberry Pi. We hope this gives users a feel for how they might go about designing a project using the Raspberry Pi or BeagleBone and the LOGI FPGA boards.

Note that the lite version removes the linphone VOIP client and the text-to-speech functionality, giving users a more direct interface to the hardware components through a simple Python script. We hope this makes it easier to understand and begin working with the components, and that users will move on to the full LOGI Face project with all of its features when they are ready.

Diagram of wiring and functions

3D model of frame and components

Assembled LOGI Face Lite

FPGA Control

2x servos to control the eyebrows - mad, happy, angry, surprised, etc.
2x servos to control the mouth - smile, frown, etc.
1x RGB LED to control the hair color, corresponding to mood, sounds, etc.
2x 8x8 LED matrices to display animated eyes - blink, look up/down or side to side, etc.
SPI ADC connected to a microphone to collect ambient sounds, which are used to dynamically add responses to LOGI Face
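Two of these controllers take data that is easy to prepare on the Raspberry Pi side. The sketch below converts a servo angle into a pulse width using the common 1 to 2 ms hobby-servo range, and packs an 8x8 eye bitmap into eight row bytes. The pulse range, the bit order (leftmost LED as the most significant bit) and the names are assumptions; the register formats of the actual LOGI servo and matrix cores may differ.

```python
from __future__ import annotations


def angle_to_pulse_us(angle_deg: float, min_us: int = 1000, max_us: int = 2000) -> int:
    """Map 0-180 degrees linearly onto a servo pulse width in microseconds."""
    if not 0 <= angle_deg <= 180:
        raise ValueError("angle must be between 0 and 180 degrees")
    return round(min_us + (max_us - min_us) * angle_deg / 180)


# Hypothetical open-eye frame: '1' = LED on, leftmost character = column 0.
OPEN_EYE = [
    "00111100",
    "01000010",
    "10000001",
    "10011001",
    "10011001",
    "10000001",
    "01000010",
    "00111100",
]


def pack_rows(bitmap: list[str]) -> bytes:
    """Pack 8 rows of 8 '0'/'1' characters into 8 bytes, column 0 as the MSB."""
    if len(bitmap) != 8 or any(len(row) != 8 or set(row) - {"0", "1"} for row in bitmap):
        raise ValueError("bitmap must be 8 rows of 8 '0'/'1' characters")
    return bytes(int(row, 2) for row in bitmap)


if __name__ == "__main__":
    print(angle_to_pulse_us(90))       # 1500
    print(pack_rows(OPEN_EYE).hex())   # 3c4281999981423c
```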

Software Control

Each of the FPGA controllers can send data to and receive data from the Raspberry Pi. A basic example Python program is supplied that shows how to directly access the FPGA hardware components from the Raspberry Pi.

Build the HDL using the Skeleton Editor

As an exercise, users can use the LOGI Skeleton Editor to generate the LOGI Face Lite hardware project, which can then be synthesized and loaded into the FPGA. A JSON file can be downloaded from the wiki and imported into the Skeleton Editor, which then configures the HDL project for the user. The user can then synthesize the HDL generated by the Skeleton Editor and generate a bitstream with Xilinx ISE. Alternatively, we supply a pre-built bitstream to configure the FPGA.

Re-create the Project

We encourage users to visit the LOGI Face Lite ValentF(x) wiki page for a walkthrough of how to build the mechanical assembly with a 3D-printable frame, a list of required parts, and instructions to configure the Skeleton project, build the hardware and finally run the software.

http://valentfx.com/wiki/index.php?title=LOGI_Face_Lite_-_Project


Conclusion

The LOGI Pi and LOGI Bone were designed for developing co-designed applications in a low-cost, tightly coupled FPGA/processor package. On the Raspberry Pi or BeagleBone, the ARM processor has plenty of computing power, while the Spartan 6 LX9 FPGA provides enough logic for many types of applications that would not be possible with a standalone processor.

An FPGA and processor platform allows users to progressively migrate pieces of an application to hardware to gain performance, whereas an FPGA-only platform can require a lot of work to implement simple tasks that processors are very good at. Using languages such as Python on the Raspberry Pi or BeagleBone lets users quickly connect to a variety of libraries or web services, while the FPGA acts as a proxy to sensors and actuators, taking care of all low-level real-time signal generation and processing. The LOGI Face project can easily be augmented with functions such as broadcasting the weather or reading tweets, emails or text messages by using the many available Python libraries. The hardware architecture can be extended to provide more actuators or to perform more advanced processing on the sound input, such as FFT, knock detection and other interesting applications.

We hope to hear from you about what kinds of projects you would like to see and/or how we might improve our current projects.

References

http://espeak.sourceforge.net/

http://www.linphone.org/technical-corner/liblinphone/overview

https://github.com/jpiat/hard-cv

https://github.com/fpga-logi/logi-projects

https://github.com/fpga-logi/logi-hard

http://valentfx.com/wiki/index.php?title=LOGI_Face_Lite_-_Project

This work is licensed to ValentF(x) under a Creative Commons Attribution 4.0 International License.

Top Comments


Former Member over 10 years ago

FPGA Group,
We appreciate your input and feedback, and are LOOKING to help any way we can... Ask questions to the LOGI team, they are here helping us... SO ask, and get involved!
DAB over 10 years ago in reply to johnbeetem

Hi John,

I am going to try to approach it from a different angle and look for the specific frequency of the collapsing charge column.
I came up with the idea when I was assessing how the charge flowed through the air and vaporized the water in its path.
I may have to do some fancy FFT work to figure it all out, but I have a good idea of how the charge flows and which frequencies I can track.

DAB
Former Member over 10 years ago in reply to DAB

Hi All,

DAB, sounds like an interesting project! Looking forward to seeing your results! I'd love to see a model of the sound of thunder.

It seems that there is plenty you would be able to do with the very time-deterministic and fast operation of the FPGA. Your limitation would be collecting the sound, which is the bottleneck. Have you thought about what ADC you are planning on using? Single channel, multi channel, serial/parallel/LVDS interface? Is this a homebrew project just for fun (low cost) or for professional work? It appears that there are a number of MCU projects set up to sample multiple sensors and detect direction based upon TDOA, but there are some other methods that may be well suited to the FPGA as well.

The cool thing about using the FPGA is that you could set up any number of parallel ADC samples and any number of algorithms running in parallel on the samples, which would make for some nice pipeline implementations. There are many ways of setting up such sampling and processing; the popular method has been to get a couple of concurrent ADC values with an MCU. With the FPGA you could do some really interesting things.

Jonathan, I would be interested in hearing what required your sensors to be spaced so far apart? It seems that you would be able to get a good amount of information over a relatively small distance (meters), since sound travels so slowly.

In the simplest case the mics could be spaced 40cm (~16") apart, and with the speed of sound being 340m/s you would end up with [1/(340m/s)]*0.4m ≈ 1.2ms before the sound hits the second sensor, which would give you roughly 1ms to work with before you need the second sensor sample. DAB, what kind of geometry were you thinking of using in your topology?
jpiat over 10 years ago in reply to johnbeetem

I am not an audio expert, but I worked on some localization techniques using audio, and depending on the time resolution of your system and the localization precision you require, you may need your microphones to be spaced quite far apart. I guess that if you want to detect and track thunder, having microphones spaced hundreds of meters apart could give you better spatial resolution. Thus one Pi per mic would greatly ease the setup.
johnbeetem over 10 years ago in reply to jpiat

Jonathan Piat wrote:

I have started to work on a fixed-point FFT for audio to implement pattern matching for voice recognition (a single word with very low latency would already be a nice demo). Do you have any idea on the sound signature you plan to use ?

I once heard of an early speech recognizer called the "watermelon box". It could very accurately detect the word "watermelon" spoken by many different speakers at many different speeds, at a time when practical speech recognizers required training. The "watermelon box" used the fact that the word "watermelon" has four equal-length syllables with equal emphasis, which makes the word quite unusual. The box just looked for that simple pattern. Can't spell pffft without FFT!