This is the final blog post on the Pi Spy Audio Recorder, which is part of the Project 14 Spy Nerd theme. For this post I wanted to use machine learning to convert the wav file recorded by the spy recorder into text, and then email that text to the spy. There are a bunch of speech-to-text engines, but based on my reading of a few blog posts on the internet, Mozilla DeepSpeech seemed to be the best one to use, and there are already trained models available for American English.
DeepSpeech is an open-source speech-to-text engine that uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.
My two previous blog posts used the Raspberry Pi Zero W, but I soon realized the setup below would not work on the Pi Zero, and I had to move it to the latest Pi 4 with 8 GB RAM. Here are the links to the two previous blog posts, in case you plan to build this project and would rather have the recorded wav file emailed to you instead of the text:
Pi Spy Audio Recorder - part 1
Pi Spy Audio Recorder - part 2 - PIR sensor and Email
Here is a picture of the setup for this blog post
For the build/circuit, as you can see in the picture above:
- the PIR sensor's signal pin (the yellow wire) is connected to GPIO 4 on the Raspberry Pi 4
- the USB mic is connected to one of the Pi's USB ports
- the SD card is flashed with the 2021-05-07-raspios-buster-armhf-lite image
Before you run the attached Python program, you will need to follow the setup steps below:
- SSH into the Pi 4 and create a virtual environment called p14spy using the command
python3 -m venv ./some/pyenv/dir/path/p14spy
- activate the virtual environment
source ./some/pyenv/dir/path/p14spy/bin/activate
- install DeepSpeech; I installed an older version, as the latest 0.9.X version was erroring out for me
pip3 install deepspeech==0.6.0
- create a directory to download the DeepSpeech models
mkdir -p ./some/workspace/path/p14spy
cd ./some/workspace/path/p14spy
- use curl to download the DeepSpeech models; this is going to take some time, so go and grab yourself a coffee after running this command
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
this is a download of more than 1 GB, and in my case it took about 26 minutes.
- once done, extract the tar archive
tar -xvzf deepspeech-0.6.0-models.tar.gz
- also download a few audio samples, which we will use to test that DeepSpeech and all its dependencies installed okay and are working as expected
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/audio-0.6.0.tar.gz
- extract the tar archive
tar -xvzf audio-0.6.0.tar.gz
- now, to check that your install is successful, run deepspeech on one of the audio samples
deepspeech --model deepspeech-0.6.0-models/output_graph.tflite --lm deepspeech-0.6.0-models/lm.binary --trie ./deepspeech-0.6.0-models/trie --audio ./audio/4507-16021-0012.wav
- in addition to the above setup, as in the first blog post (Pi Spy Audio Recorder - part 1), we will set up pyaudio using pip and check that we are able to record audio using the USB microphone
pip3 install pyaudio
arecord -l
mkdir recordings
arecord -D plughw:1,0 -d 5 recordings/test1.wav
- now we will write a quick Python program to read the wav file we just recorded and run it through DeepSpeech
import deepspeech
import wave
import numpy as np

# Load the TensorFlow Lite acoustic model
model_file_path = 'deepspeech-0.6.0-models/output_graph.tflite'
beam_width = 500
model = deepspeech.Model(model_file_path, beam_width)

# Enable the language model (decoder) for better word-level results
lm_file_path = 'deepspeech-0.6.0-models/lm.binary'
trie_file_path = 'deepspeech-0.6.0-models/trie'
lm_alpha = 0.75
lm_beta = 1.85
model.enableDecoderWithLM(lm_file_path, trie_file_path, lm_alpha, lm_beta)

# Open the recording and compare its sample rate with what the model expects
filename = 'recordings/test1.wav'
w = wave.open(filename, 'r')
rate = w.getframerate()
frames = w.getnframes()
print(rate)
print(model.sampleRate())

# Read all frames and interpret them as 16-bit samples
buffer = w.readframes(frames)
type(buffer)
data16 = np.frombuffer(buffer, dtype=np.int16)
type(data16)

# Run speech-to-text and print the transcript
text = model.stt(data16)
print(text)
In my case, I recorded myself saying "Project 14 spy nerd theme" in the wav file, but DeepSpeech returned "project four ten spy nurse".
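One thing worth checking when the transcript comes out wrong is the format of the recording. The script above prints the wav file's sample rate next to model.sampleRate() for exactly this reason. As far as I know, the released DeepSpeech English models expect 16 kHz, mono, 16-bit audio, so those are used as defaults in the sketch below. The helper name check_wav and its parameters are my own and are not part of DeepSpeech; treat this as a rough sanity check, not a definitive validation.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1, expected_width=2):
    """Return a list of mismatches between the wav file and the expected format.

    The defaults (16 kHz, mono, 2 bytes per sample) are assumptions about what
    the DeepSpeech model wants; pass model.sampleRate() as expected_rate to be sure.
    An empty list means the file matches.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channel(s), expected {expected_channels}")
    if width != expected_width:
        problems.append(f"samples are {width * 8}-bit, expected {expected_width * 8}-bit")
    return problems
```

For example, check_wav('recordings/test1.wav', expected_rate=model.sampleRate()) returns an empty list when the recording matches. If it reports a mismatch, arecord's -f, -r and -c options can set the format at record time, for instance -f S16_LE -r 16000 -c 1.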
If you have successfully got to this point, you are ready to download the Python program attached below, which will send you (the spy) an email with the text of the spy recording on the Pi 4 in the email body. And if you want to refer back to the recording stored on the SD card, the name of the wav file is in the subject of the email. Here is a sample email sent from the Pi 4.
For a video demo, please refer to the video in the previous blog post (Pi Spy Audio Recorder - part 2 - PIR sensor and Email), as the setup remains the same; just replace the Pi Zero W with the Pi 4 - Demo Video for Pi Spy Audio recorder
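The attached program is not reproduced in this post, but the email step described above (transcript in the body, wav file name in the subject) can be sketched roughly as follows using Python's standard email library. The function name build_transcript_email, the subject prefix, and the example addresses are all made up for illustration; the actual program may do this differently.

```python
import os
from email.message import EmailMessage


def build_transcript_email(wav_path, transcript, sender, recipient):
    """Build (but do not send) an email carrying a DeepSpeech transcript.

    The subject holds the wav file name so the recording can be found on the
    SD card later; the body holds the transcribed text.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # The "Pi Spy recording:" prefix is an illustrative choice
    msg["Subject"] = f"Pi Spy recording: {os.path.basename(wav_path)}"
    # Fall back to a placeholder if DeepSpeech returned an empty string
    msg.set_content(transcript or "(DeepSpeech returned no text)")
    return msg


# Hypothetical example values
message = build_transcript_email(
    "recordings/test1.wav",
    "project four ten spy nurse",
    "pi@example.com",
    "spy@example.com",
)
print(message["Subject"])
```

Sending the message would then be a matter of passing it to send_message() on an smtplib connection, with the mail server and credentials set up as in part 2.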