element14 Community | Pi IoT Design Challenge

[Pi IoT] Alarm Clock #12: Voice Control

fvan
2 Aug 2016
  • Microphone
  • PocketSphinx
    • Installation
      • Dependencies
      • SphinxBase
      • PocketSphinx
      • Testing
    • Configuration
      • Audio Devices
      • Dictionary & Language Model
      • Grammar File
  • Demo

 

We have seen many forms of voice control, and I've used some of them in the past (IoT Alarm Clock using Jasper) or recently (Running Amazon Echo (Alexa) on Raspberry Pi Zero).

 

For this project, I thought I'd try to find a voice control solution that meets the following requirements:

  • works offline
  • makes it easy to customise commands

 

For example, Alexa is extremely powerful and can understand and answer a lot of questions while sounding very human. But the data is processed online, so it wouldn't work without an active internet connection. On top of that, easily customising commands with Alexa requires an additional service like IFTTT. So unfortunately, no internet = no voice control.

 

Luckily, alanmcdonley's Raspberry Pi 3 RoadTest was all about speech recognition performance using a Speech To Text tool called PocketSphinx. Alan also refers to a practical application by Neil Davenport for Make. Exactly what I was looking for!

 

Microphone

 

First things first: to be able to do voice control, we need an audio input device on the Pi. A USB microphone is probably the easiest and cheapest option.

 

I found this tiny USB microphone on eBay for about $1:

 

[Image: tiny USB microphone]

 

To verify it was properly detected, I listed the recording devices with the following command:

 

pi@piclock:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

 

Next, trigger a recording and generate some sound:

 

pi@piclock:~ $ arecord -Dhw:1 -r 44100 -f S16_LE file
Recording WAVE 'file' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
^CAborted by signal Interrupt...

 

And finally, verify the recording by playing it out:

 

pi@piclock:~ $ aplay -Dhw:0 -r 44100 -f S16_LE file
Playing WAVE 'file' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono

 

You should be able to hear the recorded sounds, indicating the microphone is working as expected.
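If you want to script this check (for example, when the alarm clock boots), the "arecord -l" output can be parsed. A minimal sketch in Python, assuming the output format shown above; the helper function is my own:

```python
import re

def find_capture_card(arecord_output):
    """Parse `arecord -l` output and return the (card, device) numbers
    of the first capture device listed, or None if none is found."""
    match = re.search(r"card (\d+): .*?device (\d+):", arecord_output)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None

# Sample output from `arecord -l` with the USB microphone attached:
sample = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0"""

print(find_capture_card(sample))  # the mic is card 1, device 0
```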

 

PocketSphinx

 

Installation

 

To install the software, I mainly followed the very clear and detailed instructions from Neil on Make: http://makezine.com/projects/use-raspberry-pi-for-voice-control/

There were, however, some things I needed to adapt or add to get it fully working, so here's my take on the PocketSphinx installation.

 

Dependencies

 

Some dependencies need to be installed to avoid running into problems when building PocketSphinx or running the code.

 

pi@piclock:~ $ sudo apt-get install libasound2-dev autoconf libtool bison swig python-dev python-pyaudio

 

pi@piclock:~ $ curl -O https://bootstrap.pypa.io/get-pip.py
pi@piclock:~ $ sudo python get-pip.py
pi@piclock:~ $ sudo pip install gevent grequests

 

Once the dependencies are installed, the first bit of software can be installed.

 

SphinxBase

 

These instructions are followed as-is from Neil's guide; they download the SphinxBase source files and build them.

 

pi@piclock:~ $ git clone git://github.com/cmusphinx/sphinxbase.git
pi@piclock:~ $ cd sphinxbase
pi@piclock:~/sphinxbase $ git checkout 3b34d87
pi@piclock:~/sphinxbase $ ./autogen.sh
pi@piclock:~/sphinxbase $ make
pi@piclock:~/sphinxbase $ sudo make install
pi@piclock:~/sphinxbase $ cd ..

 

PocketSphinx

 

After building SphinxBase, the same is done for PocketSphinx:

 

pi@piclock:~ $ git clone git://github.com/cmusphinx/pocketsphinx.git
pi@piclock:~ $ cd pocketsphinx
pi@piclock:~/pocketsphinx $ git checkout 4e4e607
pi@piclock:~/pocketsphinx $ ./autogen.sh
pi@piclock:~/pocketsphinx $ make
pi@piclock:~/pocketsphinx $ sudo make install

 

Testing

 

With the installation complete, I first tested PocketSphinx using the microphone input, in continuous listen mode:

 

pi@piclock:~ $ pocketsphinx_continuous -inmic yes
pocketsphinx_continuous: error while loading shared libraries: libpocketsphinx.so.3: cannot open shared object file: No such file or directory

 

This returned an error. For some reason, the location of the shared libraries is not part of the default library search path, so it needs to be added:

 

pi@piclock:~ $ sudo nano /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf
/usr/local/lib

 

After adding the "/usr/local/lib" path to "/etc/ld.so.conf", apply the change:

 

pi@piclock:~ $ sudo ldconfig

 

I then tried again and bumped into another issue:

 

pi@piclock:~ $ pocketsphinx_continuous -inmic yes
Error opening audio device default for capture: No such file or directory

 

PocketSphinx searched for the microphone on the default audio card, which is the Pi's onboard audio and has no input capabilities. This is easily fixed by specifying the microphone device on the command line:

 

pi@piclock:~ $ pocketsphinx_continuous -inmic yes -adcdev plughw:1

 

PocketSphinx is then running, trying to recognise speech. Don't worry if at this stage it's not recognising what you say (at all), as it still needs to be configured with meaningful dictionary and grammar data.
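Once custom dictionary and language model files exist (generated in the Configuration section below), they can be passed on the same command line via the standard -lm and -dict options. A sketch in Python that assembles that invocation; build_command is my own helper, and the file names are the ones I chose later in this post:

```python
# Assemble the pocketsphinx_continuous invocation for custom models.
# -inmic, -adcdev, -lm and -dict are standard pocketsphinx options.
def build_command(mic="plughw:1",
                  lm="language_model.lm",
                  dic="dictionary.dic"):
    return ["pocketsphinx_continuous",
            "-inmic", "yes",    # listen on the microphone
            "-adcdev", mic,     # ALSA capture device to use
            "-lm", lm,          # custom language model
            "-dict", dic]       # custom pronunciation dictionary

print(" ".join(build_command()))
```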

 

Configuration

 

Audio Devices

 

As documented in Neil's post, I changed the ALSA configuration to put the USB device at index 0, followed by the onboard audio. This makes the USB device the default for playback and capture, without having to change other files. This works particularly well if you are using a USB sound card with both input and output capabilities.

 

pi@piclock:~ $ sudo nano /etc/modprobe.d/alsa-base.conf
options snd-usb-audio index=0
options snd_bcm2835 index=1

 

Then came what was, for me, the trickiest part. Since the USB dongle I used is microphone-only, I needed the default playback to remain the onboard audio, but the default capture to be the USB mic. After a lot of searching and testing, this is the audio configuration that got both playback and capture working on the expected devices:

 

pi@piclock:~ $ sudo nano /etc/asound.conf

pcm.mic
{
    type hw
    card 0
}
pcm.onboard
{
    type hw
    card 1
}

pcm.!default
{
    type asym
    playback.pcm
    {
        type plug
        slave.pcm "onboard"
    }
    capture.pcm
    {
        type plug
        slave.pcm "mic"
    }
}

 

Dictionary & Language Model

 

The required input is a "corpus" file, a file containing the phrases that need to be recognised. This file can then be fed to Sphinx's lmtool to generate the dictionary and language model files.

 

As explained on that page:

To use: Create a sentence corpus file, consisting of all sentences you would like the decoder to recognize. The sentences should be one to a line (but do not need to have standard punctuation). You may not need to exhaustively list all possible sentences: the decoder will allow fragments to recombine into new sentences.

 

Example phrases for my application are:

  • is the door of the shed closed
  • is the door of the shed open
  • what is the temperature in the shed
  • turn on the lab light
  • turn off the lab light
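The corpus file itself is just these phrases, one per line. A minimal sketch that writes it out (the name corpus.txt is my choice; lmtool accepts any file name):

```python
# Write the sentence corpus for lmtool: one phrase per line,
# no punctuation required.
phrases = [
    "is the door of the shed closed",
    "is the door of the shed open",
    "what is the temperature in the shed",
    "turn on the lab light",
    "turn off the lab light",
]

with open("corpus.txt", "w") as f:
    f.write("\n".join(phrases) + "\n")
```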

 

[Image: lmtool output listing the generated files]

 

Two of the generated files are relevant: the *.dic (dictionary) file and the *.lm (language model) file.

 

For ease of use, I renamed the files to "dictionary.dic" and "language_model.lm".

 

Grammar File

 

The grammar file ("grammar.jsgf") contains the structure of the sentences that will be spoken. Based on Neil's example, I created my own grammar file:

 

#JSGF V1.0;
grammar commands;

<action> = TURN ON |
  TURN OFF       |
  TEMPERATURE |
  DOOR ;

<object> = SHED |
  LAB ;

public <command> = <action> THE <object> LIGHT |
  WHAT IS THE <action> OF THE <object> |
  IS THE <action> OF THE <object> CLOSED |
  IS THE <action> OF THE <object> OPEN ;

 

Be careful though: every word used in the grammar file should be present in the dictionary. Otherwise, an error is generated at startup and the script fails to start.
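That check can be automated before starting the script. A minimal sketch in Python (missing_words is my own helper), which treats the first token of each dictionary line as the defined word and every uppercase token in the grammar as a spoken word:

```python
import re

def missing_words(grammar_text, dictionary_text):
    """Return grammar words that the dictionary does not define."""
    # The first token of each dictionary line is the word itself
    known = {line.split()[0]
             for line in dictionary_text.splitlines() if line.strip()}
    # Uppercase tokens in the grammar are the words to be spoken
    used = set(re.findall(r"\b[A-Z]+\b", grammar_text))
    return sorted(used - known)

# Abbreviated example data for illustration
grammar = "public <command> = TURN ON THE LAB LIGHT ;"
dictionary = "LAB L AE B\nLIGHT L AY T\nON AA N\nTHE DH AH\nTURN T ER N"
print(missing_words(grammar, dictionary))  # -> [] means all words are defined
```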

 

Demo

 

After replacing the files in the demo code with my own, I was able to detect my customised phrases accurately.

 

Here's a short clip demonstrating the recognition. It is now just a matter of linking the actual actions to the detected phrases.

 

[Video: voice recognition demo]
 

As you can see, the recognition is extremely fast. Also, everything is done locally, without the need for an internet connection!
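Linking the actions can then be as simple as a phrase-to-handler table. A minimal sketch with placeholder handlers (none of these functions are from the actual clock code; the real ones would read sensors or toggle GPIO pins):

```python
# Map each recognised phrase to an action. The handlers are
# placeholders standing in for the real sensor/GPIO logic.
def lab_light(on):
    return "lab light " + ("on" if on else "off")

actions = {
    "TURN ON THE LAB LIGHT":  lambda: lab_light(True),
    "TURN OFF THE LAB LIGHT": lambda: lab_light(False),
}

def handle(phrase):
    """Dispatch a decoded phrase; unknown phrases are ignored."""
    action = actions.get(phrase.strip().upper())
    return action() if action else None

print(handle("turn on the lab light"))  # -> lab light on
```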

 



Top Comments

  • mcb1 over 9 years ago +2
    You have a knack of making it look easy... especially when it doesn't quite go right. Well done. Looking forward to the text to speech, we might have a use at work. Mark
  • Jan Cumps over 9 years ago +1
    Hey, the recognition worked really well.
  • fvan over 9 years ago in reply to Jan Cumps +1
    And fast! Moving on to text to speech now.