We have seen many forms of voice control, and I've used some of them in the past (IoT Alarm Clock using Jasper) or more recently (Running Amazon Echo (Alexa) on Raspberry Pi Zero).
For this project, I wanted to find a voice control solution that meets the following requirements:
- work offline
- easy to customise commands
For example, Alexa is extremely powerful and can understand and answer a lot of questions, while sounding very human. But the data is processed online, so it won't work without an active internet connection. Another issue is that easily customising commands with Alexa requires an additional service like IFTTT. So unfortunately: no internet, no voice control.
Luckily, alanmcdonley's Raspberry Pi 3 RoadTest was all about speech recognition performance using a Speech To Text tool called PocketSphinx. Alan also refers to a practical application by Neil Davenport for Make. Exactly what I was looking for!
Microphone
First things first. To be able to do voice control, we need to have an audio input device on the Pi. A USB microphone is probably the easiest and cheapest option.
I found this tiny USB microphone on eBay for about $1:
To verify it was properly detected, I listed the recording devices with the following command:
pi@piclock:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
Next, trigger a recording and generate some sound:
pi@piclock:~ $ arecord -Dhw:1 -r 44100 -f S16_LE file
Recording WAVE 'file' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
^CAborted by signal Interrupt...
And finally, verify the recording by playing it back:
pi@piclock:~ $ aplay -Dhw:0 -r 44100 -f S16_LE file
Playing WAVE 'file' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
You should be able to hear the recorded sounds, indicating the microphone is working as expected.
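If interrupting the recording with Ctrl+C is inconvenient, arecord can also capture for a fixed duration using its -d option. For example, a 5-second test clip (a variation on the command above, not part of the original steps):
pi@piclock:~ $ arecord -Dhw:1 -r 44100 -f S16_LE -d 5 file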
PocketSphinx
Installation
To install the software, I mainly followed the very clear and detailed instructions from Neil on Make: http://makezine.com/projects/use-raspberry-pi-for-voice-control/
There were, however, some things I needed to adapt or add to get it fully working, so here's my take on the PocketSphinx installation.
Dependencies
Some dependencies need to be installed first, to avoid running into problems when building PocketSphinx or running the code.
pi@piclock:~ $ sudo apt-get install libasound2-dev autoconf libtool bison swig python-dev python-pyaudio
pi@piclock:~ $ curl -O https://bootstrap.pypa.io/get-pip.py
pi@piclock:~ $ sudo python get-pip.py
pi@piclock:~ $ sudo pip install gevent grequests
Once the dependencies are installed, the first bit of software can be installed.
SphinxBase
These instructions are taken as-is from Neil's guide; they download the SphinxBase source files and build them.
pi@piclock:~ $ git clone git://github.com/cmusphinx/sphinxbase.git
pi@piclock:~ $ cd sphinxbase
pi@piclock:~/sphinxbase $ git checkout 3b34d87
pi@piclock:~/sphinxbase $ ./autogen.sh
pi@piclock:~/sphinxbase $ make
pi@piclock:~/sphinxbase $ sudo make install
pi@piclock:~/sphinxbase $ cd ..
PocketSphinx
After building SphinxBase, the same is done for PocketSphinx:
pi@piclock:~ $ git clone git://github.com/cmusphinx/pocketsphinx.git
pi@piclock:~ $ cd pocketsphinx
pi@piclock:~/pocketsphinx $ git checkout 4e4e607
pi@piclock:~/pocketsphinx $ ./autogen.sh
pi@piclock:~/pocketsphinx $ make
pi@piclock:~/pocketsphinx $ sudo make install
Testing
With the installation complete, I first tested PocketSphinx using the microphone input, in continuous listen mode:
pi@piclock:~ $ pocketsphinx_continuous -inmic yes
pocketsphinx_continuous: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory
This returned an error. For some reason, the location of the shared libraries needs to be added to the library search path, as it's not part of the defaults:
pi@piclock:~ $ sudo nano /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf
/usr/local/lib
After adding the "/usr/local/lib" path to "/etc/ld.so.conf", apply the change:
pi@piclock:~ $ sudo ldconfig
I then tried again and bumped into another issue:
pi@piclock:~ $ pocketsphinx_continuous -inmic yes
Error opening audio device default for capture: No such file or directory
PocketSphinx searched for the microphone on the default audio card, which is the Pi's onboard audio and has no input capabilities. This is easily solved by specifying the microphone device on the command line:
pi@piclock:~ $ pocketsphinx_continuous -inmic yes -adcdev plughw:1
PocketSphinx is then running and trying to recognise speech. Don't worry if at this stage it doesn't recognise what you say (at all), as it still needs to be configured with meaningful dictionary and grammar data.
Configuration
Audio Devices
As documented in Neil's post, I changed the alsa-base.conf config to put the USB device at index 0, followed by the onboard audio. This makes the USB device the default for playback and capture, without having to change other files. This works particularly well if you are using a USB sound card with both input and output capabilities.
pi@piclock:~ $ sudo nano /etc/modprobe.d/alsa-base.conf
options snd-usb-audio index=0
options snd_bcm2835 index=1
Then came what was, for me, the trickiest part. Since the USB dongle I used is microphone-only, I needed the default playback to remain the onboard audio, but the default capture to be the USB mic. After a lot of searching and different tests, this is the audio configuration that got both playback and capture working on the expected devices:
pi@piclock:~ $ sudo nano /etc/asound.conf
pcm.mic {
    type hw
    card 0
}
pcm.onboard {
    type hw
    card 1
}
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "onboard"
    }
    capture.pcm {
        type plug
        slave.pcm "mic"
    }
}
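To sanity-check this configuration, you can record and play back without specifying any device, so the defaults from asound.conf are used (a quick test of my own, the file name is arbitrary):
pi@piclock:~ $ arecord -r 44100 -f S16_LE test.wav
pi@piclock:~ $ aplay test.wav
If everything is set up correctly, the recording comes from the USB mic and the playback goes to the onboard audio, without needing any -D option.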
Dictionary & Language Model
The required input is a "corpus" file, a file containing the phrases that need to be recognised. This file can then be fed to Sphinx's lmtool to generate the dictionary and language model files.
As explained on that page:
To use: Create a sentence corpus file, consisting of all sentences you would like the decoder to recognize. The sentences should be one to a line (but do not need to have standard punctuation). You may not need to exhaustively list all possible sentences: the decoder will allow fragments to recombine into new sentences.
Example questions for my application are:
- is the door of the shed closed
- is the door of the shed open
- what is the temperature in the shed
- turn on the lab light
- turn off the lab light
Two of the generated files are relevant: the *.dic (dictionary) file and the *.lm (language model) file.
For ease of use, I renamed the files to "dictionary.dic" and "language_model.lm".
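At this point the recogniser can already be tried against the new vocabulary from the command line (a quick check, assuming both files are in the current directory; depending on your ALSA setup the -adcdev option from earlier may still be needed):
pi@piclock:~ $ pocketsphinx_continuous -inmic yes -lm language_model.lm -dict dictionary.dic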
Grammar File
The grammar file ("grammar.jsgf") contains the structure of the sentences that will be spoken. Based on Neil's example, I created my own grammar file:
#JSGF V1.0;
grammar commands;

<action> = TURN ON | TURN OFF | TEMPERATURE | DOOR ;
<object> = SHED | LAB ;

public <command> = <action> THE <object> LIGHT |
                   WHAT IS THE <action> OF THE <object> |
                   IS THE <action> OF THE <object> CLOSED |
                   IS THE <action> OF THE <object> OPEN ;
Be careful though: every word used in the grammar file must be present in the dictionary. Otherwise, an error is generated at startup and the script will fail to start.
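The grammar can also be tested from the command line by passing it instead of the language model (a sketch; the -jsgf and -lm options should not be combined):
pi@piclock:~ $ pocketsphinx_continuous -inmic yes -jsgf grammar.jsgf -dict dictionary.dic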
Demo
After replacing the files in the demo code with my own, I was able to detect my customised phrases accurately.
Here's a short clip demonstrating the recognition. It is now just a matter of linking the actual actions to the detected phrases.
As you can see, the recognition is extremely fast. Also, everything is done locally, without the need for an internet connection!
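To give an idea of how that linking could be done, here is a minimal sketch (not Neil's demo code, which works differently) that launches pocketsphinx_continuous as a subprocess and matches the recognised phrases against a few hard-coded actions. The file names and the do_action() logic are assumptions to adapt to your own setup.

#!/usr/bin/env python
# Minimal sketch: read recognised phrases from pocketsphinx_continuous
# and map them to actions. File names and actions are placeholders.
import subprocess

CMD = ["pocketsphinx_continuous", "-inmic", "yes",
       "-lm", "language_model.lm", "-dict", "dictionary.dic",
       "-logfn", "/dev/null"]  # -logfn hides the debug output

def do_action(phrase):
    # Replace these prints with your own GPIO / sensor code
    if phrase == "TURN ON THE LAB LIGHT":
        print("-> switching lab light on")
    elif phrase == "TURN OFF THE LAB LIGHT":
        print("-> switching lab light off")
    elif phrase == "WHAT IS THE TEMPERATURE IN THE SHED":
        print("-> reading shed temperature")

proc = subprocess.Popen(CMD, stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, b""):
    phrase = line.decode("utf-8", "ignore").strip()
    if phrase:  # ignore empty lines, act on recognised text
        do_action(phrase)

If no phrases show up, the output of pocketsphinx_continuous may be block-buffered when piped; prefixing the command with stdbuf -oL forces line buffering.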