Welcome to this part of my review of the Arduino Nano 33 BLE Sense. The review is split into multiple blog posts. You can find all my thoughts about this Arduino and related parts in the chapters whose names begin with "Review". There are also articles describing test projects, like this one, which I did to gather experience with the board, and some tutorials. The main page of the review contains a summary and the final score. The following Table of Contents contains links to the other parts of my roadtest review.
Table of Contents
- Introduction
- Review of Development Board
- Review of Onboard Sensors
- Review of Microcontroller and BLE Module
- Review of Software
- Review of Documentation
- Tutorial 01: Accessing Sensor Values
- Tutorial 02: nRF52840 Application without Arduino IDE
- Project 01: Gestures over BLE
- Project 02: Speech Recognition and Machine Learning (this article)
- Summary and Score
Project 02: Speech Recognition and Machine Learning
As part of the review process, I tried to develop a machine learning app (my first ML app ever). Sadly, it was mostly unsuccessful, most probably because I am a total novice at machine learning and my experience was insufficient to complete this project. I tried to make a voice recognition app which detects the voice commands “red”, “green” and “blue” from microphone data and turns on the appropriate colour of the RGB LED.
My first attempt followed this tutorial: EloquentTinyML: Easier Voice Classifier on Nano 33 BLE Sense. The Arduino side was easy to develop and deploy. Because I had no experience with TensorFlow, I spent plenty of time installing it, but after that it was also seamless. I trained the model multiple times, always with 60 samples: 20 per classified word. Usually I was able to correctly classify the word “red”, but when I said nothing it was classified as “blue”, and saying “green” was classified very randomly. The “red” command, however, worked and was classified quite accurately. I also tried this with a tweaked gain, because the signal from the microphone had a very low volume, but changing the gain did not change anything.
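For reference, the gain tweak on the Arduino side is a one-liner with the built-in PDM library. This is a minimal sketch based on the standard PDMSerialPlotter example; the gain value 40 is just an arbitrary value to try, not a recommendation (the library default is 20):

```cpp
#include <PDM.h>

// Buffer for samples read from the PDM microphone (16-bit signed)
short sampleBuffer[256];
volatile int samplesRead;

// Called by the PDM library when new data is available
void onPDMdata() {
  int bytesAvailable = PDM.available();
  PDM.read(sampleBuffer, bytesAvailable);
  samplesRead = bytesAvailable / 2;   // two bytes per 16-bit sample
}

void setup() {
  Serial.begin(115200);
  PDM.onReceive(onPDMdata);
  PDM.setGain(40);                    // default is 20; higher values amplify the signal
  // one channel (mono), 16 kHz sample rate
  if (!PDM.begin(1, 16000)) {
    Serial.println("Failed to start PDM!");
    while (1);
  }
}

void loop() {
  if (samplesRead) {
    for (int i = 0; i < samplesRead; i++) {
      Serial.println(sampleBuffer[i]);
    }
    samplesRead = 0;
  }
}
```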
My second attempt used the Edge Impulse environment. It internally also uses TensorFlow, but it is probably much better tuned. Some parameters of the model and the training algorithm are configurable, but I had no experience doing this. The user interface is very user friendly, and almost anyone can start developing applications very quickly. I trained a model similar to the first case and the results were slightly better, but the behaviour was mostly the same: “red” worked, “green” triggered a random colour, and “blue” did not work at all. In this case I was unable to tweak the gain, because the firmware was provided by Edge Impulse and it did not allow me to adjust any parameters of the peripheral (the microphone in this case).
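For context, this is roughly how a library exported from Edge Impulse is invoked on the Arduino side. This is a sketch based on the SDK's static_buffer example; the header name is hypothetical (it is generated from the project name), and I am assuming the usual run_classifier() API:

```cpp
#include <my_voice_project_inferencing.h>   // hypothetical name, generated per project
#include <string.h>

// One window of input features expected by the model
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

// Callback that feeds slices of the feature buffer to the classifier
static int getFeatureData(size_t offset, size_t length, float *out) {
  memcpy(out, features + offset, length * sizeof(float));
  return 0;
}

void classifyWindow() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE;
  signal.get_data = &getFeatureData;

  ei_impulse_result_t result;
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    return;   // classification failed
  }

  // One score per trained label ("red", "green", "blue", ...)
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.print(result.classification[i].label);
    Serial.print(": ");
    Serial.println(result.classification[i].value);
  }
}
```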
Finally, after these failures, I thought about the cause. I suspected the low volume of the samples, so I collected some samples to check whether they were deformed or noisy. I wrote an application (using the nRF52840 stack rather than the Arduino environment, to go into more detail about the microphone, its interface with the MCU, the timing of the PDM signal, and the PDM peripheral, and to ensure that all configurations were correct) which uploaded samples over a high-baudrate UART to a PC and visualized them using a desktop application which I had also written. I tried to say “red”, “green” and “blue” and the results were as follows.
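For the record, the register-level PDM configuration on the nRF52840 looks roughly like this. This is a bare-metal sketch using the CMSIS register definitions from nrf.h, not my actual application; the pin numbers (CLK on P0.26, DIN on P0.25, microphone power on P0.17) are my assumptions based on the board variant files, and double buffering and the UART streaming are omitted:

```cpp
#include <nrf.h>

#define PDM_BUF_LEN 256
static int16_t pdmBuffer[PDM_BUF_LEN];
static volatile bool bufferFull = false;

void pdmInit(void) {
  // Power up the microphone (assumed to be on P0.17 on the Nano 33 BLE Sense)
  NRF_P0->DIRSET = (1UL << 17);
  NRF_P0->OUTSET = (1UL << 17);

  // Route the PDM peripheral to the microphone pins (assumed: CLK = P0.26, DIN = P0.25)
  NRF_PDM->PSEL.CLK = 26;
  NRF_PDM->PSEL.DIN = 25;

  // ~1.032 MHz PDM clock, which the peripheral decimates to a ~16 kHz sample rate
  NRF_PDM->PDMCLKCTRL = PDM_PDMCLKCTRL_FREQ_Default;
  NRF_PDM->MODE = (PDM_MODE_OPERATION_Mono << PDM_MODE_OPERATION_Pos);
  NRF_PDM->GAINL = PDM_GAINL_GAINL_DefaultGain;

  // EasyDMA destination for the decoded 16-bit samples
  NRF_PDM->SAMPLE.PTR = (uint32_t)pdmBuffer;
  NRF_PDM->SAMPLE.MAXCNT = PDM_BUF_LEN;

  // Interrupt when the buffer has been filled
  NRF_PDM->INTENSET = PDM_INTENSET_END_Msk;
  NVIC_EnableIRQ(PDM_IRQn);

  NRF_PDM->ENABLE = PDM_ENABLE_ENABLE_Enabled;
  NRF_PDM->TASKS_START = 1;
}

extern "C" void PDM_IRQHandler(void) {
  if (NRF_PDM->EVENTS_END) {
    NRF_PDM->EVENTS_END = 0;
    bufferFull = true;   // the main loop would stream pdmBuffer over UART here
  }
}
```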
Because I am not an audio expert, I cannot judge the correctness of the audio signal with certainty, but I think it is OK. Look at the value range: the PDM module reports 16-bit samples ranging from a minimum of -32768 to a maximum of 32767. I said the words as I normally speak, but the received amplitudes were only in a range of about -300 to 300. I zoomed into the middle part of the signal and it looks good and correct, I think. The noise is also pretty low. In the following picture you can see the zoomed middle part of the signal.
Finally, I tried clapping to check the amplitude range, and the results follow. As you can see, the signal went very high: I received samples with amplitudes between -14699 and 26550. After these tests I think the PDM microphone, the MCU and the PDM peripheral all work well, and the audio signal itself is not the cause of my ML failure.
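For completeness, this is the kind of quick min/max/peak check I ran on the captured buffers (a generic sketch, not my actual desktop tool; for scale, a speech peak of 300 corresponds to about -41 dBFS, while the clap's 26550 is about -1.8 dBFS):

```cpp
#include <math.h>
#include <stdint.h>
#include <stdio.h>

// Reports the range of a 16-bit sample buffer and the peak level in dBFS
// (0 dBFS = full scale 32768).
void analyzeBuffer(const int16_t *samples, size_t count) {
  int16_t minVal = INT16_MAX, maxVal = INT16_MIN;
  for (size_t i = 0; i < count; i++) {
    if (samples[i] < minVal) minVal = samples[i];
    if (samples[i] > maxVal) maxVal = samples[i];
  }
  int peak = (-minVal > maxVal) ? -(int)minVal : (int)maxVal;
  if (peak == 0) {
    printf("silence\n");
    return;
  }
  double dbfs = 20.0 * log10((double)peak / 32768.0);
  printf("min=%d max=%d peak=%d (%.1f dBFS)\n", minVal, maxVal, peak, dbfs);
}
```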
I then thought about the sample processing. When samples are collected, their RMS (root mean square) is calculated, and the RMS values are passed to the ML library. I am not sure this is a good approach. I think RMS depends mostly on amplitude and not on the frequency content of the signal, which I believe is more important for audio. If my thought is correct, it also matches the behaviour of the “red” and non-red voice commands: if you look back at the first chart, you can see that the word “red” (the first part of the chart) has a different amplitude envelope than the “green” and “blue” commands, which were confused many times, in contrast to “red”, which was classified very clearly. Currently I have no time for further experiments with passing frequencies rather than amplitudes, because that requires a lot of work, but I plan to return to this in the future and try further experiments.
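For clarity, the RMS of a window of samples is computed like this (a generic sketch; note that two tones of the same amplitude but different frequencies give essentially the same RMS, which is exactly my concern above):

```cpp
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// Root mean square of a window of 16-bit samples: sqrt(mean(x^2)).
// Two signals with the same amplitude envelope but different frequency
// content produce (nearly) the same RMS sequence, so the classifier only
// sees "loudness over time", not spectral shape.
float windowRms(const int16_t *samples, size_t count) {
  double sumSquares = 0.0;
  for (size_t i = 0; i < count; i++) {
    double s = (double)samples[i];
    sumSquares += s * s;
  }
  return (float)sqrt(sumSquares / (double)count);
}
```

This is, I believe, why typical keyword-spotting pipelines (including, as far as I know, Edge Impulse's audio blocks) extract spectral features such as MFCCs instead of feeding plain RMS to the model.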
Lastly, I want to say that I am not the first to fail at this. In fact, the author of the mentioned tutorial also says that his model and its recognition were not very accurate. On element14, in the discussion on the Edge Impulse webinar page, you can see another user who tried it (in fact, with different words and many more training samples than I used) on the same Arduino, but also failed.