Pi Spy Audio Recorder - Part 3 - Mozilla DeepSpeech

15 Nov 2021

This is the final blog post of the Pi Spy Audio Recorder series, which is part of the Project 14 Spy Nerd theme. In this post I wanted to use machine learning to convert the wav file recorded by the spy recorder into text, and then email that text to the spy. There are a bunch of speech-to-text engines, but based on a few blog posts I read online, Mozilla DeepSpeech seemed to be the best one to use, and pre-trained models for American English are already available.

DeepSpeech is an open-source speech-to-text engine that uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.

My two previous blog posts used the Raspberry Pi Zero W, but I soon realized that the setup below would not work on the Pi Zero, so I moved it to the latest Pi 4 with 8 GB RAM. If you plan to build this project and would rather have the recorded wav file emailed to you instead of the text, here are the links to the two previous blog posts:

Pi Spy Audio Recorder - part 1

Pi Spy Audio Recorder - part 2 - PIR sensor and Email

Here is a picture of the setup for this blog post

As you can see in the picture above, the build/circuit is as follows:

- the PIR sensor signal pin (the yellow wire) is connected to GPIO 4 on the Raspberry Pi 4
- the USB mic is connected to the Pi's USB port
- the SD card is flashed with the 2021-05-07-raspios-buster-armhf-lite image
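To connect the wiring above to the recording step, here is a minimal sketch of a motion-triggered recording loop. It is not the attached program; it assumes the gpiozero library (and a GPIO backend it can use) is installed in the virtual environment, uses gpiozero's `MotionSensor` on GPIO 4, and shells out to `arecord`. The file-naming scheme, the 5-second duration and the helper names are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path

PIR_GPIO_PIN = 4              # PIR signal (yellow wire) on GPIO 4, as in the build above
AUDIO_DEVICE = "plughw:1,0"   # USB mic; confirm card/device numbers with `arecord -l`
RECORD_SECONDS = 5            # hypothetical recording length per motion event


def make_recording_path(directory: Path, now: datetime) -> Path:
    """Hypothetical naming scheme: one timestamped wav file per motion event."""
    return directory / f"spy_{now:%Y%m%d_%H%M%S}.wav"


def build_arecord_cmd(device: str, seconds: int, path: Path) -> list:
    """arecord command for 16-bit, 16 kHz, mono audio (the format DeepSpeech expects)."""
    return ["arecord", "-D", device, "-f", "S16_LE", "-r", "16000", "-c", "1",
            "-d", str(seconds), str(path)]


def main() -> None:
    # Imported here so the helpers above can be used without GPIO hardware
    from gpiozero import MotionSensor

    recordings = Path("recordings")
    recordings.mkdir(exist_ok=True)
    pir = MotionSensor(PIR_GPIO_PIN)
    while True:
        pir.wait_for_motion()  # blocks until the PIR output goes high
        path = make_recording_path(recordings, datetime.now())
        subprocess.run(build_arecord_cmd(AUDIO_DEVICE, RECORD_SECONDS, path), check=True)
        print(f"Saved {path}")
        pir.wait_for_no_motion()  # wait for the sensor to reset before re-arming
```

On the Pi, call `main()` to start the loop; each recording can then be passed to the DeepSpeech transcription step.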

Before you run the attached Python program, you will need to follow the setup steps below.

- SSH into Pi4 and create a virtual environment called p14spy using the command

python3 -m venv ./some/pyenv/dir/path/p14spy

- activate the virtual environment

source ./some/pyenv/dir/path/p14spy/bin/activate

- install DeepSpeech. I installed an older version (0.6.0), as the latest 0.9.x version was throwing errors

pip3 install deepspeech==0.6.0

- create a directory to download the DeepSpeech models

mkdir -p ./some/workspace/path/p14spy

cd ./some/workspace/path/p14spy

- use curl to download the DeepSpeech models. This is going to take some time, so go and grab yourself a coffee after running this command

curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz

This is a download of more than 1 GB, and in my case it took about 26 minutes.

- once done, extract the archive

tar -xvzf deepspeech-0.6.0-models.tar.gz

- also download a few audio samples, which we will use to test that DeepSpeech and all its dependencies installed correctly and are working as expected

curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz

- extract the archive

tar -xvzf audio-0.6.0.tar.gz

- now, to check whether your install was successful, run deepspeech

deepspeech --model deepspeech-0.6.0-models/output_graph.tflite --lm deepspeech-0.6.0-models/lm.binary --trie ./deepspeech-0.6.0-models/trie --audio ./audio/4507-16021-0012.wav

- In addition to the above setup, as in the first blog post (Pi Spy Audio Recorder - part 1), we will set up pyaudio using pip and check that we are able to record audio using the USB microphone

pip3 install pyaudio

arecord -l

mkdir recordings

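arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 recordings/test1.wav

(Without -f, -r and -c, arecord records 8-bit, 8000 Hz mono audio by default; the options above record 16-bit, 16 kHz mono audio, which is the format the DeepSpeech model expects.)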

- Now we will write a quick Python program to transcribe the wav file we just recorded.

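import wave

import deepspeech
import numpy as np

# Paths to the pre-trained DeepSpeech 0.6.0 model files extracted earlier
model_file_path = 'deepspeech-0.6.0-models/output_graph.tflite'
lm_file_path = 'deepspeech-0.6.0-models/lm.binary'
trie_file_path = 'deepspeech-0.6.0-models/trie'

# Decoder settings: beam width and language-model weights
beam_width = 500
lm_alpha = 0.75
lm_beta = 1.85

# Load the acoustic model and enable decoding with the language model
model = deepspeech.Model(model_file_path, beam_width)
model.enableDecoderWithLM(lm_file_path, trie_file_path, lm_alpha, lm_beta)

# Read all frames from the recorded wav file
filename = 'recordings/test1.wav'
with wave.open(filename, 'rb') as w:
    rate = w.getframerate()
    frames = w.getnframes()
    buffer = w.readframes(frames)

# The recording's sample rate should match the model's (16000 Hz)
print(rate)
print(model.sampleRate())

# Interpret the raw bytes as 16-bit PCM samples and run speech-to-text
data16 = np.frombuffer(buffer, dtype=np.int16)
text = model.stt(data16)
print(text)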
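The program above only prints the two sample rates so you can compare them by eye. If the recording is not 16-bit, is stereo, or uses a different rate, `stt()` receives audio in the wrong format. Here is a minimal sketch of a loader that checks the sample width, averages stereo down to mono, and resamples with naive linear interpolation (`numpy.interp`). The function name is hypothetical, and linear interpolation has no anti-aliasing filter, so a dedicated resampler would give better quality.

```python
import wave

import numpy as np


def load_wav_for_deepspeech(path: str, target_rate: int = 16000) -> np.ndarray:
    """Read a 16-bit PCM wav file and return mono int16 samples at target_rate."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit samples, got {8 * w.getsampwidth()}-bit")
        channels = w.getnchannels()
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())

    audio = np.frombuffer(frames, dtype=np.int16)
    if channels > 1:
        # Samples are interleaved per frame: average the channels to get mono
        audio = audio.reshape(-1, channels).mean(axis=1)

    if rate != target_rate:
        # Naive linear-interpolation resampling onto the target time grid
        n_out = int(round(len(audio) * target_rate / rate))
        t_in = np.arange(len(audio)) / rate
        t_out = np.arange(n_out) / target_rate
        audio = np.interp(t_out, t_in, audio)

    return np.asarray(np.round(audio), dtype=np.int16)
```

With the model loaded as above, it would be used as `text = model.stt(load_wav_for_deepspeech(filename, model.sampleRate()))`.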

In my case, I recorded myself saying "Project 14 spy nerd theme" in the wav file, but DeepSpeech returned "project four ten spy nurse".
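To put a number on how close that transcript is, speech-to-text output is commonly scored with word error rate (WER): the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference sentence. A short sketch; the helper name is hypothetical and not part of DeepSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (or match if equal)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("Project 14 spy nerd theme", "project four ten spy nurse"))  # prints 0.8
```

Here 4 edits are needed against 5 reference words. Since DeepSpeech outputs spelled-out words, in a real comparison you would typically write numbers in the reference as words too.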

If you have successfully got to this point, you are ready to download the Python program attached below, which will send you (the spy) an email with the text of the spy recording in the email body. If you want to refer back to the recording stored on the SD card, the name of the wav file is in the subject of the email. Here is a sample email sent from the Pi4.
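The attached program is not reproduced here, but the email step described above (transcript in the body, wav file name in the subject) can be sketched with Python's standard `email` and `smtplib` modules. The subject wording, SMTP host, port and credentials are placeholders, and using STARTTLS on port 587 is an assumption about your mail provider.

```python
import smtplib
from email.message import EmailMessage


def build_transcript_email(wav_name: str, transcript: str,
                           sender: str, recipient: str) -> EmailMessage:
    """Hypothetical layout: wav file name in the subject, transcript in the body."""
    msg = EmailMessage()
    msg["Subject"] = f"Spy recording: {wav_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(transcript or "(no speech recognised)")
    return msg


def send_email(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send over SMTP with STARTTLS; host, port and credentials are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

For example, `send_email(build_transcript_email("test1.wav", text, "pi@example.com", "spy@example.com"), "smtp.example.com", 587, "pi@example.com", "app-password")`, with all addresses and credentials replaced by your own.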

For a video demo, please refer to the video in the previous blog post (Pi Spy Audio Recorder - part 2 - PIR sensor and Email), as the setup remains the same; just replace the Pi Zero W with the Pi4: Demo Video for Pi Spy Audio recorder

Attachments:

spyAudioArecordDeepSpeech.zip
spyAudioDeepSpeechEmail.zip