These are the detailed steps I went through to install PocketSphinx on my element14 Road Test Raspberry Pi 3B. (Thank you guys again - I love it!)
(It got late and I stopped writing things down toward the end - sorry.)
Refer to:
"Roomba, I Command Thee: Use Raspberry Pi for Voice Control" on makezine.com
http://makezine.com/projects/use-raspberry-pi-for-voice-control/
First, go get the packages required for SphinxBase by executing:
sudo apt-get update
sudo apt-get install libasound2-dev autoconf libtool bison \
swig python-dev python-pyaudio
You’ll also need to install some Python libraries for use with the demo application. To do this, install the Python pip command and then use it to install the libraries:
curl -O https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
sudo pip install gevent grequests
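As an optional check (not part of the original article), you can make sure the libraries import cleanly:
python -c "import gevent, grequests; print('Python libraries OK')"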
OBTAINING THE SPHINX TOOLS
Now you can go about getting the SphinxBase package, which is used by PocketSphinx as well as other software in the CMU Sphinx family.
To obtain SphinxBase, execute the following commands:
git clone git://github.com/cmusphinx/sphinxbase.git
cd sphinxbase
git checkout 3b34d87
./autogen.sh
make
(At this stage you may want to go make coffee …)
sudo make install
cd ..
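If you want a quick sanity check that the install landed where expected (this assumes the default /usr/local prefix), you can list the installed SphinxBase files:
ls /usr/local/lib/libsphinxbase*
ls /usr/local/include/sphinxbase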
You’re ready to move on to PocketSphinx.
To obtain PocketSphinx, execute the following commands:
git clone git://github.com/cmusphinx/pocketsphinx.git
cd pocketsphinx
git checkout 4e4e607
./autogen.sh
make
(Time for a second cup of coffee …)
sudo make install
cd ..
To update the system with your new libraries, run:
sudo ldconfig
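As a quick optional check, you can confirm the linker now sees the new libraries by querying the cache:
ldconfig -p | grep -i sphinx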
TESTING THE SPEECH RECOGNITION
Now that you have the building blocks of your speech recognition in place, you’ll want to test that it actually works before continuing.
Now you can run a test of PocketSphinx using:
pocketsphinx_continuous -inmic yes
You should see something like the following, which indicates the system is ready for you to start speaking:
...
...
Listening...
Input overrun, read calls are too rare (non-fatal)
You can safely ignore the warning. Go ahead and speak!
When you’re finished, you should see some technical information along with PocketSphinx’s best guess as to what you said, and then another READY prompt letting you know it’s ready for more input.
INFO: ngram_search.c(874): bestpath 0.10 CPU 0.071 xRT
INFO: ngram_search.c(877): bestpath 0.11 wall 0.078 xRT
what
READY....
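If pocketsphinx_continuous does not pick up the microphone you expect, you can point it at a specific ALSA capture device with the -adcdev option. The plughw:1,0 name below is just an example; list your capture devices with arecord -l first:
arecord -l
pocketsphinx_continuous -inmic yes -adcdev plughw:1,0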
RECO FROM FILE WITH A LARGE LM:
arecord -f S16_LE -r 16000 test16k.wav
pocketsphinx_continuous -infile test16k.wav 2>&1 | tee ./psphinx.log
xRT = sum of the fwdtree, fwdflat, and bestpath CPU xRT values (from Nikolay)
My files are located at: https://github.com/slowrunner/Pi3RoadTest
Pi3RoadTest/ top directory
recoMic/ contains the recognition test using PocketSphinx with the microphone
recoFile/ contains the recognition test and results for PocketSphinx with file input
Copy the folder cmusphinx-5prealpha-en-us-ptm-2.0 into recoMic/ and into recoFile/.
Download the prebuilt acoustic model from the Sphinx SourceForge site:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20M…
Copy the download into recoMic/ and into recoFile/, then extract it in each directory with:
tar -xvf cmusphinx-en-us-ptm-5.2.tar.gz
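Once the model is extracted, you can point pocketsphinx_continuous at it with the -hmm option. This is just a sketch of the kind of command involved; it assumes the tarball extracts to a cmusphinx-en-us-ptm-5.2 directory in the current folder (the notes files mentioned below have the exact commands I used):
pocketsphinx_continuous -hmm cmusphinx-en-us-ptm-5.2 -infile test16k.wav 2>&1 | tee ./psphinx.log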
See the notes file in each dir for exact commands to run the test and how to process the log file to get results statistics.
================
In LM mode the log shows performance information (the latest PocketSphinx supposedly shows it in grammar mode as well).
For an individual utterance or for the TOTAL:
Add up the fwdtree + fwdflat + bestpath CPU times = CPU time spent recognizing.
Add up the fwdtree + fwdflat + bestpath xRT values (a value below 1, e.g. 0.52, means 1 s of audio takes 0.52 s of CPU time, i.e. 52% of one core, to perform the reco).
(To calculate the length of audio processed, divide the total CPU time by the xRT value, e.g. 75.7 s / 0.52 ≈ 146 s of audio.)
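As a rough sketch of how you might total the CPU time from psphinx.log (the field position is an assumption based on the INFO lines shown above, so adjust it if your log format differs; the grep -v TOTAL just guards against any end-of-run summary lines):
grep -E '(fwdtree|fwdflat|bestpath) .* CPU ' psphinx.log | grep -v TOTAL \
| awk '{ cpu += $4 } END { printf "CPU time spent recognizing: %.1f s\n", cpu }'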
On the Pi 3 (126-phrase corpus using 136 words, 278 bi-grams, 295 tri-grams):
1.2 GHz, single-core processing
64 phrases:
Total CPU:  75.7 s   0.52 xRT   146 s of audio
Total wall: 190 s    1.3 xRT    (146 s of audio)
Wall time includes startup, teardown, output, and logging.
Reco from the mic cannot be faster than real time.
0.52 xRT CPU means 52% of one core is used by the ASR.