BBB - High speed data acquisition and web-based UI

4 Aug 2013

Introduction

This was a fun, if initially challenging, experiment to find a convenient way to read in data at reasonably high speed on a BeagleBone Black (BBB). This photo shows the results on a mobile phone: a couple of sampled waveforms (100kHz and 1MHz sine waves in this case).

This is another capture of the same signals, viewed on a PC (an older picture, taken with an x10 probe, so the amplitude is a little low in the photo; it should fill the screen).

What does it do?

In its current state, it grabs analog data from an ADC and dumps it into memory on the BBB, ready to be displayed or further processed. It could be used for gathering analog information from sensors, CCDs or other data acquisition use-cases. To be reasonably useful, the target was to support 20Mbytes/sec of data or more. It does achieve this, but finding higher-speed methods is left for further study.

How does it work?

A few different approaches were considered. The initial one used an FTDI device (USB interface). However, the method described here feeds data directly into one of the 200MHz PRUs (Programmable Real-time Units) built into the BeagleBone's AM3359 chip. Other methods are possible too.

An external clock was also desired, so that the data could be sampled with low and known jitter, making it useful for frequency analysis or maybe Software Defined Radio (SDR).

The overall approach that was taken is outlined here.

The analog signal was amplified and fed into a high-speed ADC (a parallel-bus ADC is needed in order to achieve high throughput).

A pre-built amp+ADC board from KNJN was used. (Note: in my opinion it is not a good choice. It is closed source, and there are no circuit diagrams for it, so it is hard to modify; the datasheet is sparse; and it is expensive. It is better to build one yourself.)

A Linux application (called adc_app in the diagram) was used to start the PRU code, which reads in the data and writes it to shared memory. Once the capture is complete, adc_app stores the data in a .csv file.

I had wanted to try out Node.js ever since a recent blog post, so some very basic code is used to create an HTTP server. The real-time communication between the browser and the web server is handled by Socket.IO, a library for passing arbitrary event-based messages in both directions (using WebSockets where available).

A photo of the overall system:

A bit more detail:

The underside was more untidy!

ADC Detail

As mentioned, a ready-built amplifier and ADC board was used. The on-board oscillator was disabled so that an external one could be fed in. I needed a clock of 20MHz or less, but I only had a 32MHz oscillator at hand and didn't want to wait (and the local Maplin store doesn't sell any 3.3V-compatible logic to divide by two!). I'm not entirely sure how long the shared memory write takes, but I did experience lost samples with the 32MHz oscillator. I plan on trying frequencies in the range 14MHz-20MHz to find the highest rate at which no samples are missed; for further study!

Note that some ADCs will have specific requirements related to the clock and duty cycle.

The ADC on this particular board was an ADC08200, but an ADC08100 or ADC08060 could have been used at lower cost.

Buffers

These were extremely important for two reasons. One reason is that the BBB pins that I wished to use need to be isolated during power-up, because they are used for selecting the boot method. If any unexpected level is present on these pins at power-up, the BBB will not boot from the eMMC. So, a tri-state buffer is needed.

The other reason is that these pins carry a fair bit of capacitance, and it is highly likely that the ADC cannot drive them directly at high speed. I actually came across this problem while trying to connect a camera to the BBB: I struggled for days without realising that the camera could not drive the load. So, buffers are likely to be essential for most designs using the pins that were selected. I used a 74LVC244A device as a buffer.

Note that the clock also needs a buffer, unless significant jitter is acceptable. No tri-state is required here, so I used a MC74VHC1GT50.

PRU code

The PRU code uses shared memory for communication. I designated a single byte of shared memory to be used for commands. When run, the PRU sits in a loop waiting for the command that instructs it to begin the data capture. PRU GPI mode is used, which allows inputs to be read at the processor speed of 200MHz with no varying latency. Just a few instructions are needed to write the data into shared memory. No attempt was made to pack the data: each 8-bit sample is stored in a 32-bit word. This is not such a bad idea, because in future the ADC could be swapped for (say) a 10-bit ADC with no code change on the PRU.
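Because each sample sits unpacked in a 32-bit word, the Linux side only needs a shift and a mask to recover the ADC codes, so a change of ADC width becomes a host-side parameter. The following is a minimal sketch, assuming the ADC's D0 lands at bit `first_bit` of each word (the real position depends on which R31 inputs the data lines are wired to); `extract_samples` is a hypothetical helper, not part of adc_app:

```c
#include <stddef.h>
#include <stdint.h>

/* Recover ADC codes from the unpacked 32-bit words written by the PRU.
 * first_bit: bit position of ADC D0 within each word (assumed; depends on
 *            the pin wiring).
 * adc_bits:  ADC resolution, e.g. 8 now or 10 after an ADC swap (1..16). */
void extract_samples(const uint32_t *words, size_t n,
                     unsigned first_bit, unsigned adc_bits, uint16_t *out)
{
    uint32_t mask = (adc_bits >= 16u) ? 0xFFFFu : ((1u << adc_bits) - 1u);

    for (size_t i = 0; i < n; i++) {
        /* Discard unrelated input bits above and below the data lines. */
        out[i] = (uint16_t)((words[i] >> first_bit) & mask);
    }
}
```

With the current 8-bit board this would be called with `adc_bits = 8`. After a swap to a 10-bit ADC, only that argument changes, which mirrors the point about leaving the PRU code untouched.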

There are two PRUs in the AM3359. A total of 12 PRU GPI-capable pins connected to PRU1 are brought out to the P8 header on the BBB. Since one of these is needed for the clock, realistically 10 or 11 bits is about the limit for high-speed parallel ADCs connected in this manner. Still, at (say) a 16MHz clock, a 10-bit ADC would produce 20Mbytes/sec of data.

EDIT: See comments section; on PRU0, all 16 pins are in theory available.
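The data-rate figure above is just samples per second times bits per sample, divided by eight. The sketch below assumes one sample per ADC clock. It also computes the rate actually written into shared memory when every sample occupies a 32-bit word, as described for the capture code (16MHz x 4 bytes = 64Mbytes/sec). The helper names are illustrative only:

```c
#include <stdint.h>

/* Raw ADC output rate in bytes per second, assuming one sample per clock. */
uint64_t adc_bytes_per_sec(uint64_t clock_hz, unsigned bits_per_sample)
{
    return clock_hz * bits_per_sample / 8u;
}

/* Rate written into shared memory when each sample is stored unpacked
 * in a 32-bit (4-byte) word, regardless of the ADC resolution. */
uint64_t stored_bytes_per_sec(uint64_t clock_hz)
{
    return clock_hz * 4u;
}
```

For example, `adc_bytes_per_sec(16000000, 10)` and `adc_bytes_per_sec(20000000, 8)` both give 20,000,000 bytes/sec, while the unpacked storage at 16MHz needs 64,000,000 bytes/sec of memory bandwidth.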

The pins used are shared with the HDMI interface; it proved necessary to disable the HDMI interface by recompiling the device tree file in the /boot folder (EDIT: See Brian's comment below for a better method to disable HDMI). Since I wished to display the data using a web browser, I have no issue with losing the HDMI.

Once the data has been captured (2000 samples in this example), the command byte is acknowledged, so that the Linux hosted application can know that the PRU has completed. The PRU now sits and waits for a new instruction from the Linux hosted application.
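The command-byte handshake can be sketched from the host side as follows. The command values and the "PRU clears the byte to acknowledge" convention are assumptions made for illustration; the actual codes in adc_v1.zip may differ. `fake_pru_service` is a hypothetical stand-in for the PRU loop, so the handshake can be exercised on an ordinary PC:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command codes (not taken from the original code). */
enum {
    CMD_IDLE    = 0x00,  /* PRU finished / waiting for work */
    CMD_CAPTURE = 0x01   /* host requests a capture */
};

/* Host side: request a capture by writing the shared command byte. */
void host_request_capture(volatile uint8_t *cmd)
{
    *cmd = CMD_CAPTURE;
}

/* Host side: busy-poll until the PRU acknowledges by clearing the byte.
 * Returns false if max_polls runs out (e.g. the PRU is not running). */
bool host_wait_for_ack(const volatile uint8_t *cmd, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if (*cmd == CMD_IDLE) {
            return true;
        }
    }
    return false;
}

/* Stand-in for the PRU firmware loop: if a capture was requested,
 * fill the buffer with placeholder data, then acknowledge. */
void fake_pru_service(volatile uint8_t *cmd, uint32_t *buf, unsigned n)
{
    if (*cmd == CMD_CAPTURE) {
        for (unsigned i = 0; i < n; i++) {
            buf[i] = i & 0xFFu;  /* not real ADC samples */
        }
        *cmd = CMD_IDLE;
    }
}
```

`volatile` matters here because on the BBB the byte is modified by the PRU, outside the compiler's view. A busy-poll is the simplest scheme; an interrupt-based wait (the prussdrv library offers `prussdrv_pru_wait_event`) would avoid spinning the ARM core.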

These were the pins used. They were used as D[0..7], CLK and a *EN signal for controlling the octal tri-state buffer.

Linux hosted application

The adc_app program is very simple (C code); it downloads the assembled code into PRU1 and executes it. The resultant data in shared memory is dumped to a text file and then the program exits.
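The "dump to a text file" step can be sketched as below. The PRU-loading part is left out (with the prussdrv library it would typically involve `prussdrv_map_prumem()` to map the shared RAM and `prussdrv_exec_program()` to start PRU1). The `index,value` row layout is an assumption; the real adc_app file format may differ:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write captured samples as CSV, one "index,value" row per sample.
 * Only the low 8 bits of each 32-bit word are treated as ADC data.
 * Returns 0 on success, -1 on a write error. */
int dump_csv(FILE *f, const uint32_t *words, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fprintf(f, "%zu,%u\n", i, (unsigned)(words[i] & 0xFFu)) < 0) {
            return -1;
        }
    }
    return (fflush(f) == 0) ? 0 : -1;
}
```

In adc_app, `words` would point at the mapped shared memory and `n` would be the capture length (2000 samples in this example).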

Node.js application

The Node.js application creates an HTTP server (no need for Apache!) and a Socket.IO connection. It sits and waits for a connection from any web browser. Once it receives a connection, it sends a handshake and then waits for a 'capture' command from the web browser. It then calls the adc_app program. Once that completes, it opens the file of captured data and transmits it over the Socket.IO connection line by line. This is very inefficient, but it is proof-of-concept code that could certainly be optimised.

Web Page

The web page served up contains some small bits of code to handle the Socket.IO connection and send a ‘capture’ request when a button is clicked and to display the received data with a canvas element and pixel manipulation.
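On the page, each received sample must be scaled to a pixel row of the canvas. The page does this in JavaScript; the scaling arithmetic is sketched here in C for consistency with the other examples. It assumes code 0 should appear on the bottom row and full scale on the top row (canvas y coordinates grow downwards). `sample_to_row` is a hypothetical name:

```c
/* Map an ADC code to a canvas row: code 0 -> bottom row,
 * full scale -> top row. adc_bits is assumed to be 1..16. */
int sample_to_row(unsigned code, unsigned adc_bits, int canvas_height)
{
    long full_scale = (1L << adc_bits) - 1L;
    /* Clamp out-of-range codes rather than drawing off-canvas. */
    long c = ((long)code > full_scale) ? full_scale : (long)code;

    return (canvas_height - 1) - (int)(c * (canvas_height - 1) / full_scale);
}
```

For an 8-bit ADC on a 200-pixel-high canvas, code 128 lands on row 100, roughly mid-screen.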

Summary

In conclusion, it is possible to read analog data with low jitter at fairly high speeds without any external FIFO or logic (beyond a simple buffer IC), while continuing to run Linux applications such as a web server. It is also nice that a web-based UI can be rapidly created using Node.js.

Note: How long captures can be sustained and read off by the Linux application without any data loss is still for further study. If sustained capture is possible, it could just about be useful for SDR applications, although a higher speed (and a better ADC) would be preferable.

Note2: The waveforms used to test out the system were generated by the same BBB using a low-cost ‘direct digital synthesis’ (DDS) board. That’s a subject for a later date.

Using the code

Disable the HDMI as mentioned in the comments.

You may need to install Socket.IO. Type this to install it:

npm install socket.io

Create a development folder, unzip the attached code into it (say, /home/root/development/adc), and then type:

make clean
make
cp BB-BONE-HSADC-00A0.dtbo /lib/firmware/
source install_hsadc_cape.sh
node index.js

Then, navigate to http://xx.xx.xx.xx:8081/index.html

If you make subsequent changes, note that there is a bug in the makefile: you will need to issue 'make clean' before typing 'make' whenever any change is made to the C code.

If you just want to reassemble the PRU code, type 'make pru' (no need for 'make clean').

Attachments:

adc_v1.zip

Top Comments


morgaine over 12 years ago
After your I2S audio DAC article, I was wondering when the ADC one would appear.

In addition to SDR, I'm interested in ADC functionality for the multifunction instrument idea.

Since you pointed out that:
1. the KNJN board is not optimal for several reasons, and
2. you also want more ADC width and speed for the future, and
3. the PRU has plenty of speed in reserve, but
4. the number of PRU-connected GPIs is limited,
these four factors seem to combine towards using just 8 GPIs to read in bytes from an external 16:8 selector that samples an ADC of up to 16 bits wide and runs a lot faster. Because the PRU triggers sampling, only one half of the 16 bits would need to sample and hold while the PRU gets around to reading in the other half, thus saving us an 8-bit register while also allowing the ADC to run faster because its sample acquisition time is overlapped with the PRU read time.

Or, at the cost of SAH on all 16 bits, the sampler could be fully external and asynchronous w.r.to the PRU's cycle period to give the ADC even more acquisition time, although I'm in two minds about whether this helps since that might require the addition of under-read logic to tell the PRU that a sample was missed..

Maybe it's too early to be thinking about this without a coffee.
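Under the 16:8 selector idea above, each sample arrives as two byte-wide reads that have to be recombined. A minimal sketch, assuming the high byte is read first and that ADCs narrower than 16 bits are right-aligned; both are assumptions about a design that does not exist yet, and the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Combine two 8-bit reads (high byte, then low byte) into one sample,
 * masked to the ADC resolution. */
uint16_t combine_halves(uint8_t hi, uint8_t lo, unsigned adc_bits)
{
    uint16_t raw = (uint16_t)(((unsigned)hi << 8) | lo);

    if (adc_bits >= 16u) {
        return raw;
    }
    return (uint16_t)(raw & ((1u << adc_bits) - 1u));
}

/* Convert an interleaved hi/lo byte stream into samples.
 * A trailing odd byte is an incomplete sample and is ignored.
 * Returns the number of samples written to out. */
size_t combine_stream(const uint8_t *bytes, size_t nbytes,
                      unsigned adc_bits, uint16_t *out)
{
    size_t n = nbytes / 2;

    for (size_t i = 0; i < n; i++) {
        out[i] = combine_halves(bytes[2 * i], bytes[2 * i + 1], adc_bits);
    }
    return n;
}
```

Whether this recombination runs on the PRU or on the Linux side is a design choice. Doing it on the host keeps the PRU loop to plain byte reads.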
shabaz over 12 years ago in reply to morgaine

Hi Morgaine,

You're right, there are 16 available per PRU, but unfortunately on PRU1, 4 of them are being used for the main UART (UART0) and MMC1, which is the eMMC.
I'm glad you brought up the question, since on PRU0, I just checked, it may be possible to use all of them! (This would make MMC0, i.e. the microSD socket, unusable while the application is running, but this is a small loss.) I should have checked in more detail earlier :-)
I will move my code to PRU0 and confirm 100% that these pins all work OK. Since one pin is needed for receiving the clock, this means 15 bits are usable; I imagine 15-bit ADCs are quite expensive at these frequencies :-) I'd be extremely happy with 14 bits or more. This really is awesome if 15-bit is possible.
morgaine over 11 years ago in reply to Former Member

Sebastien Saury wrote:

A wee while since this discussion took place so I hope I am not annoying people by reviving it.

You've earned yourself a liking purely for reviving it. It demonstrates nicely the longevity of engineering topics, which can remain highly relevant for years. Working technology doesn't stop working, barring component failure.

am looking at something robust and relatively simple to implement though as the idea is to interface the BBB with the FPGA board I produce (see: aes220 high speed USB FPGA mini-module if interested).

Very interesting! There's no shortage of people eager to combine programmable logic with the new generation of ARM application processors, although relevant pricing may be an issue because the low-cost ARM boards set the price expectations in this market niche.

BBB is clearly a good partner among Linux ARM boards owing to the realtime capabilities of its PRUs.

Morgaine.
Former Member over 11 years ago in reply to shabaz

shabaz wrote:

the amount of shared mem that is allocated for PRU use that can also be read by applications running on Linux is limited to 12kbyte today

it's probably solveable with a fairly simple driver that just uses CMA to allocate some memory http://lwn.net/Articles/486301/
Former Member over 11 years ago in reply to Former Member

Sebastien Saury wrote:

Am I right in thinking that it means the pin direction cannot be changed on the fly and it is a dts job?

The way around this would be to claim ownership of the pins you're interested in via some fragment in devicetree (perhaps with a device tree overlay if you're feeling brave). Once the rest of the system knows not to touch these pins, then you should be free to do whatever you like with them. As shabaz says, it's not appropriate to do that while some other driver thinks it owns control of the pins as it may legitimately change the pinmux settings without your knowledge. However, if you've claimed them for yourself in a proper manner that the rest of the system will respect, what you do with them is up to you.
Remember, devicetree is a static representation of the initial state that gets setup before handing the pins over to their defined owner, the state can be changed later for all sorts of valid reasons.
morgaine over 11 years ago in reply to Former Member

selsinork wrote:

it's probably solveable with a fairly simple driver that just uses CMA to allocate some memory

http://lwn.net/Articles/486301/

One way of giving the PRU access to a large chunk of host RAM is described in Hipstercircuits' blog post "BeagleBone PRU DDR memory access the right way", where he configures 512k for this purpose. Unfortunately it seems that he uses a user-space process to read out the mapped address from /sys and then passes this info to the PRU, which does seem unnecessarily indirect.

Until we figure out a more direct way like CMA, this does at least provide one working approach.

Morgaine.
stanto over 11 years ago in reply to morgaine

These values sound very small. Perhaps I have misread, but is this relevant? https://groups.google.com/forum/m/#!topic/beagleboard/Gb6xL7V7Z00
morgaine over 11 years ago in reply to stanto

Aha, looks like a kernel limit (but possibly redefinable) for the size of contiguous chunks, which is actually reasonable. It won't be a problem if we are able to request multiple such chunks and, instead of making the ring buffer in user space as that link described, create a ring buffer from multiple chunks in driver space.

OTOH, depending on the application, a single 8MB chunk by itself may be enough for a PRU's purposes. It's certainly better than the default 12KB.

Morgaine.
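To put the 12KB and 8MB figures in perspective: with each sample stored unpacked in a 32-bit word, as in the article, 12KB holds 3072 samples (the 2000-sample capture uses 8000 bytes), or about 0.19ms at a 16MHz sample rate, while a single 8MB chunk holds 2,097,152 samples, about 0.13s at 16MHz. A small sketch of that arithmetic, with illustrative function names:

```c
#include <stdint.h>

/* Number of samples a buffer can hold at a given storage size per sample
 * (4 bytes for the unpacked 32-bit words used in the article). */
uint64_t samples_that_fit(uint64_t buffer_bytes, unsigned bytes_per_sample)
{
    return buffer_bytes / bytes_per_sample;
}

/* How long a capture can run before the buffer is full. */
double capture_seconds(uint64_t buffer_bytes, unsigned bytes_per_sample,
                       double sample_rate_hz)
{
    return (double)samples_that_fit(buffer_bytes, bytes_per_sample)
           / sample_rate_hz;
}
```

Packing samples more tightly (for example one byte per 8-bit sample) would multiply these durations by four, at the cost of extra PRU instructions per sample.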
Former Member over 11 years ago in reply to stanto

Yes, very relevant. The problem is that getting higher order physically contiguous memory allocations is difficult. Especially difficult the longer the system has been running as other tasks will fragment physical memory.

As suggested in that thread, smaller allocation and ring buffer semantics will be the way to go.
Former Member over 11 years ago in reply to morgaine

Morgaine Dinova wrote:

Aha, looks like a kernel limit (but possibly redefinable) for the size of contiguous chunks, which is actually reasonable.

There'll be a bunch of hardware details tied up in how it works. For example, my i.MX6 system seems to only allow up to 4MB allocations by default, as do the A20 boards.

It won't be a problem if we are able to request multiple such chunks

requesting large allocations is always the problem: the larger the requested size, the lower the chance of the allocation succeeding. A single 8M allocation may fail while at the same time you're able to successfully request 200x 256K; they just won't be contiguous. Short of an IOMMU that can do the same as the main MMU does for userspace code, this will always be a problem for peripherals that have a raw hardware view. CMA seems to be designed to help here, as it's able to move the contents of pages in physical memory to make room for large allocations.

The ring buffer approach is still the most sensible and is something that's successfully used by all manner of peripherals.

There are probably other ways: you could implement some form of DMA-style scatter-gather in your code, but either that or a ring buffer will mean more PRU code dedicated to communicating with the Linux side. So you're down to trade-offs, what's the value in having linux available to do all the generic stuff compared to having to re-implement everything yourself on bare metal. I'd guess that implementing a ring-buffer driver is less work.
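The ring-buffer semantics discussed here can be illustrated with a minimal single-producer/single-consumer sketch, in which the producer would be the PRU and the consumer the Linux application. It uses free-running head/tail indices and a power-of-two capacity so wrap-around is a simple mask. The capacity and names are hypothetical. This PC-side version also omits the memory barriers and cache considerations a real PRU/ARM shared-memory implementation would need:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8u  /* hypothetical; must be a power of two */

struct ring {
    volatile uint32_t head;        /* total samples written by the producer */
    volatile uint32_t tail;        /* total samples read by the consumer */
    uint32_t data[RING_CAPACITY];
};

/* Producer side. Returns false when full, i.e. the sample would be lost. */
bool ring_push(struct ring *r, uint32_t sample)
{
    if (r->head - r->tail == RING_CAPACITY) {
        return false;
    }
    r->data[r->head & (RING_CAPACITY - 1u)] = sample;
    r->head++;  /* publish only after the data is written */
    return true;
}

/* Consumer side. Returns false when there is nothing to read. */
bool ring_pop(struct ring *r, uint32_t *sample)
{
    if (r->head == r->tail) {
        return false;
    }
    *sample = r->data[r->tail & (RING_CAPACITY - 1u)];
    r->tail++;
    return true;
}
```

A failed `ring_push` is exactly the "lost samples" condition from the article, so a real design would probably count such failures so that the Linux side can detect gaps.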
morgaine over 11 years ago in reply to Former Member

selsinork wrote:

Short of an IOMMU that can do the same as the main MMU does for userspace code, this will always be a problem for peripherals that have a raw hardware view.

That's the opposite of what I was suggesting though. We don't need IOMMU functionality in our case because the PRUs are capable enough to handle multiple 4 or 8MB buffers arranged as a ring buffer accessible dynamically from within their own address space. The total memory available to a PRU doesn't have to be physically contiguous, nor made to look contiguous through an IOMMU.

what's the value in having linux available to do all the generic stuff compared to having to re-implement everything yourself on bare metal.

Well we're talking about the PRUs here. They are bare metal.

Morgaine.
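The idea of a PRU treating several separately allocated chunks as one ring, with no IOMMU, can be sketched as follows. The producer keeps one running position and routes each write to the right chunk, wrapping after the last one. Equal-sized chunks are assumed, and all names are hypothetical. A PRU implementation would more naturally keep separate chunk and offset counters rather than dividing on every sample:

```c
#include <stddef.h>
#include <stdint.h>

/* Several non-contiguous chunks treated as one logical ring. */
struct chunk_ring {
    uint32_t **chunks;       /* base address of each chunk */
    size_t nchunks;
    size_t words_per_chunk;  /* all chunks assumed the same size */
    size_t pos;              /* next logical word: 0 .. nchunks*words_per_chunk-1 */
};

void chunk_ring_write(struct chunk_ring *cr, uint32_t sample)
{
    size_t chunk = cr->pos / cr->words_per_chunk;
    size_t offset = cr->pos % cr->words_per_chunk;

    cr->chunks[chunk][offset] = sample;
    cr->pos++;
    if (cr->pos == cr->nchunks * cr->words_per_chunk) {
        cr->pos = 0;  /* wrap back to the start of the first chunk */
    }
}
```

With two 4-word chunks, ten writes fill the first chunk, then the second, and then wrap to overwrite the first two words of the first chunk.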
Former Member over 11 years ago in reply to morgaine

Morgaine Dinova wrote:

That's the opposite of what I was suggesting though. We don't need IOMMU functionality in our case because the PRUs are capable enough to handle multiple 4 or 8MB buffers arranged as a ring buffer accessible dynamically from within their own address space. The total memory available to a PRU doesn't have to be physically contiguous, nor made to look contiguous through an IOMMU.

Absolutely. It does leave you with more PRU code to write to handle communication though.

Well we're talking about the PRUs here. They are bare metal.

Exactly my point! Do you really want to redo apache/python/mysql/whatever in PRU assembly? Or even bare metal on the Arm?

One of my previous projects was never completed simply due to the effort required to reimplement enough of the software stack from scratch on a tiny microcontroller. I've since re-designed it with an SBC to do all of the general stuff; cheap SBCs like the RPi and BBB made this not only possible, but preferable.

Anyway, back to the subject of communicating with the PRU. I've just found a kernel parameter 'rproc_mem' that seems designed for exactly this sort of thing. You do something like rproc_mem=16M@0xC3000000, and the allocations behind it are seemingly handled by CMA. Interestingly, the 'Remote Processor Framework' this relates to seems to have been largely written by TI and Google http://processors.wiki.ti.com/index.php/IPC_Install_Guide_Linux
It's obviously not targeted specifically at the PRU's (not really a surprise), but looks like exactly what's needed since the problem description is identical.