[Pi IoT] Alarm Clock #13: Text to Speech

3 Aug 2016

Offline
- Installation
- Voices
Online
Switch
Demo

Now that we can easily create and customise voice commands on the Pi, let's do the reverse and create voice responses. As mentioned in my previous post, there are a lot of voice tools available, but I would like an offline alternative that works without an internet connection. After all, what good is a home automation system if it's crippled by a lost internet connection?

That's why in this post, I will work with both an offline and an online text-to-speech tool, and provide a mechanism to switch between the two should the internet connection be down. I'm using both because, in my experience, the online alternatives simply sound better than the offline ones.

Offline

While searching for an offline, easy-to-use text-to-speech tool, I came across flite. Flite is a lightweight version of another text-to-speech tool called Festival ("flite" = "festival-lite"). It is designed specifically for embedded systems and can be driven directly from the command line, for example by passing the text to speak with the "-t" option.

Installation

Flite is available in the Raspbian repository and uses a mere 384 kB of disk space. I suppose that indeed qualifies as lightweight!

pi@piclock:~ $ sudo apt-get install flite
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  flite
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 234 kB of archives.
After this operation, 384 kB of additional disk space will be used.
Get:1 http://mirrordirector.raspbian.org/raspbian/ jessie/main flite armhf 1.4-release-12 [234 kB]
Fetched 234 kB in 0s (395 kB/s)
Selecting previously unselected package flite.
(Reading database ... 119163 files and directories currently installed.)
Preparing to unpack .../flite_1.4-release-12_armhf.deb ...
Unpacking flite (1.4-release-12) ...
Processing triggers for man-db (2.7.0.2-5) ...
Processing triggers for install-info (5.2.0.dfsg.1-6) ...
Setting up flite (1.4-release-12) ...

Voices

Several voices are installed by default. You can list them as follows:

pi@piclock:~ $ flite -lv
Voices available: kal awb_time kal16 awb rms slt
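If you want a script to pick a voice programmatically, the listing above can be parsed. A minimal Python 3 sketch, assuming the "Voices available:" line is printed to standard output exactly as in the terminal output above; `parse_voices` is a hypothetical helper name.

```python
import shutil
import subprocess

PREFIX = "Voices available:"

def parse_voices(output):
    """Return the voice names listed in the text printed by `flite -lv`."""
    for line in output.splitlines():
        if line.startswith(PREFIX):
            # The rest of the line is a space-separated list of voice names
            return line[len(PREFIX):].split()
    return []  # no listing line found

if __name__ == "__main__":
    # Only try this on a machine where flite is actually installed
    if shutil.which("flite"):
        result = subprocess.run(["flite", "-lv"], stdout=subprocess.PIPE,
                                universal_newlines=True)
        print(parse_voices(result.stdout))
```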

To use a certain voice, use the "-voice" option when launching flite. For example:

pi@piclock:~ $ flite -voice slt -t "Hello, is it me you're looking for?"
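When flite is called from a script rather than typed at the prompt, passing the arguments as a list avoids shell quoting problems: a double quote inside the message would otherwise end the quoted string early. A minimal Python 3 sketch; `flite_command` and `speak_offline` are hypothetical helper names, and only the `-voice` and `-t` options shown above are used.

```python
import shutil
import subprocess

def flite_command(text, voice="slt"):
    """Build the flite argument list for speaking `text` with `voice`."""
    # As a list, the text is handed over as one argument with no shell involved,
    # so quotes or apostrophes in the message need no escaping.
    return ["flite", "-voice", voice, "-t", text]

def speak_offline(text, voice="slt"):
    """Speak `text` with flite and return flite's exit status."""
    return subprocess.call(flite_command(text, voice))

if __name__ == "__main__":
    if shutil.which("flite"):
        speak_offline('He said "hello", is it me you\'re looking for?')
```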

If you can't find a voice you like, additional voices are available for download on the flite website: Flite English Synthesis Demo

Online

Nothing needs to be installed for the speech synthesis itself, since it is processed online, but a tool is required to play the received audio file.

With the preinstalled "omxplayer", the audio seemed to be cut off and the program didn't exit after playing the file. So instead, I installed "mplayer".

pi@piclock:~ $ sudo apt-get install mplayer
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'mplayer2' instead of 'mplayer'
The following extra packages will be installed:
  liba52-0.7.4 libbs2b0 liblircclient0 liblua5.2-0 libpostproc52 libquvi-scripts libquvi7
Suggested packages:
  lirc
The following NEW packages will be installed:
  liba52-0.7.4 libbs2b0 liblircclient0 liblua5.2-0 libpostproc52 libquvi-scripts libquvi7 mplayer2
0 upgraded, 8 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,042 kB of archives.
After this operation, 2,711 kB of additional disk space will be used.
Do you want to continue? [Y/n]
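Since the response script relies on flite, wget and mplayer, a quick check that all three are on the PATH can save some head-scratching. A small sketch using Python 3's `shutil.which`; `missing_tools` is a hypothetical helper, and it assumes the mplayer2 package provides the `mplayer` command the script calls.

```python
import shutil

# Commands the response script shells out to
REQUIRED_TOOLS = ("flite", "wget", "mplayer")

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All tools found")
```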

For the text-to-speech side of things, I'm using the Google Translate TTS API. You pass the text as a parameter in a URL, and it returns an mp3 file containing the spoken version.

Clicking the link below should play some audio:

https://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&tl=en&q=This%20is%20the%20Pi%20I%20O%20T%20design%20cha…

By integrating this URL in a script and making the query variable, custom responses can be generated on the fly.
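Making the query variable safely means percent-encoding it: spaces, apostrophes or an "&" in the message would otherwise break the URL. A minimal Python 3 sketch with the parameters copied from the link above; `tts_url` is a hypothetical helper, and since, as far as I know, this endpoint is not officially documented, it may change or stop working without notice.

```python
from urllib.parse import quote

# Base URL and parameters taken from the example link above
TTS_BASE = "https://translate.google.com/translate_tts"

def tts_url(text, lang="en"):
    """Build a Google Translate TTS URL for `text`, percent-encoding every value."""
    params = [("ie", "UTF-8"), ("client", "tw-ob"), ("tl", lang), ("q", text)]
    # safe="" also encodes "/", so every character outside A-Z, a-z, 0-9 and "_.-~" is escaped
    return TTS_BASE + "?" + "&".join(key + "=" + quote(value, safe="")
                                     for key, value in params)
```

For example, `tts_url("Hello world")` yields `https://translate.google.com/translate_tts?ie=UTF-8&client=tw-ob&tl=en&q=Hello%20world`, with the space encoded as `%20` like in the link above.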

Switch

To keep voice responses available at all times, even without an active internet connection, both solutions can be implemented and combined, with the code switching between them.

I wrote a script that takes the desired message as an argument. It first checks connectivity to Google: if a ping to Google succeeds, it uses Google Translate TTS; otherwise, it uses "flite".

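#!/usr/bin/env python3

import subprocess
import sys
from urllib.parse import quote

# Join all arguments, so an unquoted multi-word message also works
response = " ".join(sys.argv[1:])

def check_internet():
        host = "google.com"
        # Send one ping with a 1 second timeout; exit status 0 means a reply came back
        connectivity = subprocess.call(["ping", "-W", "1", "-c", "1", host])

        return connectivity

def offline_response():
        # Argument list instead of a shell string: quotes in the message need no escaping
        subprocess.call(["flite", "-voice", "slt", "-t", response])

def online_response():
        # Percent-encode the message so spaces, quotes and '&' survive in the query string
        url = "https://translate.google.com/translate_tts?ie=UTF-8&tl=en&client=tw-ob&q=" + quote(response, safe="")
        agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
        recording = "/tmp/recording.mp3"
        # Download the mp3, then play it only if the download succeeded (like "&&")
        if subprocess.call(["wget", "-U", agent, "-O", recording, url]) == 0:
                subprocess.call(["mplayer", recording])

def main():
        if check_internet() == 0:
                online_response()
        else:
                offline_response()

if __name__ == "__main__":
        main()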
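Two possible refinements to the switch, sketched in Python 3 under stated assumptions. First, ping uses ICMP, which some networks block even when the web is reachable, so the hypothetical `has_connectivity` tries a TCP connection to port 443 instead. Second, the script stays silent if the download fails after a successful check; the hypothetical `respond` falls back to the offline voice in that case. `respond` expects the online function to return True only when it actually played audio, which the original `online_response` does not do as written.

```python
import socket

def has_connectivity(host="google.com", port=443, timeout=1.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        return False

def respond(message, is_online, speak_online, speak_offline):
    """Try the online voice first and fall back to the offline one.

    The three callables stand in for the script's own functions.
    Returns which backend was used: "online" or "offline".
    """
    if is_online():
        try:
            if speak_online(message):
                return "online"
        except OSError:
            pass  # e.g. the download failed despite the connectivity check
    speak_offline(message)
    return "offline"
```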

Demo

OK, for this post's demo, I'm calling the response script from the previous section and having it repeat the incoming speech recognised by PocketSphinx, as installed in my previous post.

The first part of the video demonstrates the offline TTS by temporarily setting the ping host to a dummy value ("google.coma"), simulating an internet outage. In the second part, the ping host is valid and the script uses the online TTS. You can see the audio file being downloaded on the fly.
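Rather than editing the script to simulate an outage, the ping host could be read from an environment variable, with `check_internet` calling `ping_host()` instead of hard-coding "google.com". A minimal sketch; the `TTS_PING_HOST` variable name and the `ping_host` helper are assumptions, not something the original script supports.

```python
import os

DEFAULT_PING_HOST = "google.com"

def ping_host(environ=None):
    """Return the host to ping: TTS_PING_HOST if set, otherwise google.com."""
    if environ is None:
        environ = os.environ
    # Setting an unresolvable value such as "google.coma" forces the offline path
    return environ.get("TTS_PING_HOST", DEFAULT_PING_HOST)
```

For example, `TTS_PING_HOST=google.coma python3 response.py "Hello"`, where `response.py` stands for whatever name the script is saved under.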

I hope you've enjoyed this post!


Top Comments


mcb1 over 8 years ago

Brilliant you now have someone who can repeat your every wish ...

I'm surprised at how quick the conversion and subsequent readback is.
Almost faster than some humans ..

Mark

fvan over 8 years ago in reply to mcb1

Looking back at the videos of voice control I used on the B+, the difference in responsiveness is huge! IoT Alarm Clock - Part 4