Songspire - Machine Learning and Audio classification

14 Jun 2022

Hi all! I hope everyone is doing fine.

What is Machine Learning (ML)?

Since my project will deal with ML, it's only fair to briefly explain what Machine Learning is.

I'm not an expert on this and I'm still learning - this is a very big scientific field, with a lot of options to choose from and a lot to read and study.

Machine Learning is currently one of the most promising fields in programming. People who work with it are often called Data Scientists.

ML is a subfield of Artificial Intelligence.

Kinds of ML

ML algorithms mainly fall into one of two categories - supervised learning and unsupervised learning.

The difference sounds small, but it is significant: it comes down to whether the training data is labeled.

Supervised Learning

This type of learning is used when we already have labeled data that we will use to train our model to make predictions about new data.

For example: you're a real estate agent. You have a lot of data from previous sales that shows the values of houses, based on size, neighborhood, what similar houses have sold for, etc.

That data shows the relation between the number of bedrooms, the size in square meters, the neighborhood and the price the house sold for.

Using this data, we can train a network to predict the price of a house based on those parameters.

This is supervised learning. The computer will try to work out the relationship between all those fields.
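As a rough illustration of the house example, here is a minimal sketch that fits a straight line to labeled data using ordinary least squares. It uses only one feature (size) to keep things short. The sales figures and the names `sales`, `fit_line` and `predict` are made up for this sketch, and a real model would use more features and far more data.

```python
# Hypothetical labeled data: (size in square meters, sale price in thousands).
# These numbers are invented purely for illustration.
sales = [(50, 155.0), (70, 205.0), (90, 275.0), (120, 355.0)]


def fit_line(points):
    """Fit price = slope * size + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(size, slope, intercept):
    """Use the learned relationship to estimate a price for a new house."""
    return slope * size + intercept


slope, intercept = fit_line(sales)
predicted = predict(100, slope, intercept)
print(f"Estimated price for a 100 m2 house: {predicted:.1f}k")
```

The price is the label here: the model only learns because every training example comes with the answer attached.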

Unsupervised learning

Using the same example as above, you have similar values, but this time only the number of bedrooms, the size of the house and the neighborhood. They don't have any labels attached (no sale price, for instance), so nothing tells you what it all means.

You don't know what all these values mean, but perhaps you can find a pattern in there.

You feed this to an ML algorithm that will try to find patterns in that data, without having previous knowledge of the data or knowing what it means.
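One common way to look for patterns in unlabeled data is clustering. The sketch below uses k-means, which is just one possible algorithm and not necessarily what this project will use. The `houses` values are invented, and the features are left unscaled for brevity, so size dominates the distance.

```python
import random

# Hypothetical unlabeled data: (number of bedrooms, size in square meters).
houses = [(1, 40), (2, 55), (1, 45), (2, 50),
          (4, 150), (5, 180), (4, 160), (5, 170)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters without using any labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k random data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its members.
        centers = [
            tuple(sum(column) / len(members) for column in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


centers, clusters = kmeans(houses, k=2)
for center, members in zip(centers, clusters):
    print(center, members)
```

The algorithm is never told which houses are "small" or "large"; it only notices that the points fall into two groups.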

Hey, can't we use ML to predict the lottery numbers?

It turns out, you can't.

Unsurprisingly, studies and research have shown that you can't. In one word - randomness.

Mathematicians and ML experts agree that AI can't be used to predict randomly drawn numbers. Sorry!

Here's an article on Medium, by Pavel Baidaus, explaining this. It's fun to read.

Audio Classification

Sound classification is one of the most widely used applications of deep learning on audio. It means learning to classify sounds and predict the category a given sound belongs to.

What is sound?

A sound signal is produced by variations in air pressure. We can measure the intensity of those variations and plot them over time.

Here's a very crude representation of a sound wave.

Audio Wave

Digital Audio

To represent a sound digitally, we turn the sound waves into numbers. We do this by measuring the sound wave amplitude at fixed time intervals. Each measurement is a sample. The sample rate is the number of samples taken per second.
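To make the idea of sampling concrete, here is a small sketch that "measures" a pure 440 Hz tone at fixed time intervals. The tone, the 10 ms duration and the 16-bit quantization step are assumptions chosen for illustration; the helper names are hypothetical.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave's amplitude at fixed time intervals."""
    n_samples = round(sample_rate_hz * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def to_int16(samples):
    """Quantize samples in [-1, 1] to 16-bit integers (an assumed bit depth)."""
    return [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]


SAMPLE_RATE = 44_100  # samples per second
samples = sample_sine(440, SAMPLE_RATE, 0.01)
digital = to_int16(samples)
print(len(samples), "samples for 10 ms of audio")
```

Each entry in `samples` is one measurement, and `SAMPLE_RATE` controls how many of them are taken per second.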

A common sample rate is 44.1 kHz, that is, 44,100 samples per second. Why this number? (Remember, it is a common sample rate, but there are others.)

According to the Nyquist sampling theorem, to reconstruct the original waveform exactly, the sampling frequency must be more than twice the highest frequency present in the signal. Human hearing spans roughly 20 Hz to 20 kHz, so the sample rate needs to be above 40 kHz; 44.1 kHz satisfies that with some margin, which is why it is so commonly used.
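The sketch below shows what goes wrong when the rule is broken. A 30 kHz tone is above half of 44.1 kHz, so once sampled it becomes indistinguishable from a 14.1 kHz tone (an effect called aliasing). The 30 kHz tone and the helper names are assumptions picked for this illustration.

```python
import math


def min_sample_rate(highest_frequency_hz):
    """Lower bound on the sample rate given by the Nyquist criterion."""
    return 2 * highest_frequency_hz


def sample_cosine(frequency_hz, sample_rate_hz, n_samples):
    return [
        math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


SAMPLE_RATE = 44_100
print("Bound for 20 kHz hearing:", min_sample_rate(20_000), "Hz")

# A 30 kHz tone is above SAMPLE_RATE / 2, so it cannot be captured correctly.
too_high = sample_cosine(30_000, SAMPLE_RATE, 200)
# Its samples match those of a tone at 44,100 - 30,000 = 14,100 Hz.
alias = sample_cosine(SAMPLE_RATE - 30_000, SAMPLE_RATE, 200)

max_difference = max(abs(a - b) for a, b in zip(too_high, alias))
print("Largest difference between the two sample lists:", max_difference)
```

The two lists differ only by floating point rounding, which is why frequencies above half the sample rate have to be filtered out before sampling.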

Waveform

A waveform is a representation of the signal's amplitude over time. Below is an example: the waveform of a wav file containing the word "right".

Spectrograms

Since a signal produces different sounds over time, its frequencies also vary with time.

A spectrogram is an image representation of a signal's frequency content. It shows how intense each frequency is over time, typically with time on one axis, frequency on the other and color for intensity.

Here's the spectrogram of the word "right" - the same recording as above.
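For the curious, a spectrogram can be computed by cutting the signal into short overlapping frames and measuring the frequency content of each one (a short-time Fourier transform). The sketch below uses a deliberately slow, plain-Python DFT so it has no dependencies. The test signal, the 8 kHz sample rate and the frame sizes are assumptions; in practice a library with a fast FFT would be used instead.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return magnitudes indexed as [time frame][frequency bin]."""
    # A Hann window softens the edges of each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        # Naive DFT: only bins 0 .. frame_size / 2 carry unique information.
        frames.append([
            abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_size)
                    for t in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


def peak_bin(frame):
    """Index of the strongest frequency bin in one time frame."""
    return max(range(len(frame)), key=lambda k: frame[k])


SAMPLE_RATE = 8000  # assumed; each bin is SAMPLE_RATE / 64 = 125 Hz wide
# Test signal: 500 Hz for the first half, 2000 Hz for the second half.
signal = [
    math.sin(2 * math.pi * (500 if n < 256 else 2000) * n / SAMPLE_RATE)
    for n in range(512)
]
spec = spectrogram(signal)
print("frames:", len(spec), "bins per frame:", len(spec[0]))
print("strongest bin at start:", peak_bin(spec[0]),
      "at end:", peak_bin(spec[-1]))
```

The strongest bin moves from the 500 Hz bin to the 2000 Hz bin as time advances, which is exactly the change a spectrogram image makes visible.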

How is audio classified?

So why bother getting the audio spectrogram at all?

Because what we're going to feed our ML model is not the raw audio data, but a spectrogram of it.

Deep learning CNNs - convolutional neural networks (a topic for another post) - are very good at dealing with images, so we feed them an image representation of our audio signal and let them learn from it.

The steps are (in a broad sense - there are a lot of fine details):

Start with the raw audio data in a wave file (.wav)
Convert the audio data into a spectrogram
Optional steps involve:
- augmenting the data (more on this in another post)
- cropping, resizing or normalizing the data
Feed the image data to the deep learning/shallow learning architecture for learning and feature extraction
Generate output predictions by passing the extracted features to a classifier of fully connected layers.
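The steps above can be sketched end to end in plain Python. This is only an outline under heavy assumptions: the .wav file is generated in memory instead of being recorded, the CNN is swapped for a single dense layer with random, untrained weights, and the labels are hypothetical. The probabilities it prints are therefore meaningless; the point is the shape of the pipeline.

```python
import cmath
import io
import math
import random
import struct
import wave

SAMPLE_RATE = 8000  # assumed for this sketch


def make_wav(frequency_hz, n_samples=512):
    """Build a small mono 16-bit .wav in memory (stand-in for a recording)."""
    ints = [int(32767 * 0.8 * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
            for n in range(n_samples)]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{n_samples}h", *ints))
    buffer.seek(0)
    return buffer


def read_wav(file_obj):
    """Step 1: read the raw samples and scale them to [-1, 1]."""
    with wave.open(file_obj, "rb") as wav:
        raw = wav.readframes(wav.getnframes())
    return [v / 32768 for v in struct.unpack(f"<{len(raw) // 2}h", raw)]


def spectrogram(samples, frame_size=64, hop=32):
    """Step 2: magnitude spectrogram indexed as [time frame][frequency bin]."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        frames.append([
            abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_size)
                    for t in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


def normalize(spec):
    """Optional step: scale every value into [0, 1]."""
    peak = max(max(row) for row in spec) or 1.0
    return [[value / peak for value in row] for row in spec]


def classify(spec, weights, labels):
    """Last step: flatten the image, apply one dense layer, then softmax."""
    features = [value for row in spec for value in row]
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    return {label: e / sum(exps) for label, e in zip(labels, exps)}


labels = ["bird", "not_bird"]  # hypothetical classes
samples = read_wav(make_wav(1000))
image = normalize(spectrogram(samples))
rng = random.Random(0)
n_features = len(image) * len(image[0])
# Untrained random weights: a real model would learn these from labeled data.
weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)] for _ in labels]
probabilities = classify(image, weights, labels)
print(probabilities)
```

In a real project the feature extraction and the classifier weights would be learned during training, which is where the labeled recordings come in.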

This project will use a classifier to try to classify bird songs.

References

https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471

https://micropyramid.com/blog/understanding-audio-quality-bit-rate-sample-rate/

https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5

feiticeir0 over 3 years ago in reply to Jan Cumps

Hi Jan Cumps!

Because we're talking about the Pico, TinyML will have to be used because of its constraints. There are some studies on predictive maintenance that use what is called anomaly detection. I'm guessing it's the same thing.

There are some courses on Coursera by Shawn Hymel, using Edge Impulse, that can help you with that. This one is an introduction to ML, focusing on audio. It's great.

https://www.coursera.org/learn/introduction-to-embedded-machine-learning/home/week/1
feiticeir0 over 3 years ago in reply to robogary

Hi robogary

I haven't started to create the model yet. I'm still gathering data and analyzing what could be the best approach.

I will include some non-bird sounds, such as background noise, for the model to classify as non-birds. Some data augmentation will have to be used.

Of course, I will concentrate on just the birds near me - it's impossible to classify all the birds in the world.

I'll keep everybody posted.
robogary over 3 years ago in reply to Jan Cumps

Gearbox noise is a great idea and has amazing business potential. Avoid HAL 9000 diagnostics that tell you a gearbox needs to be replaced and then lock you out when you go to change it. :-)
robogary over 3 years ago

I really enjoy this project. Do you employ any filtering to eliminate non-bird frequencies, especially low frequencies?
Jan Cumps over 3 years ago

There have been some interesting posts on ML here on the community lately. I'm going to follow along with your adventure.
I want to investigate its capabilities for predictive maintenance. There's also this article: TinyML Gearbox Fault Prediction.

I don't have a preference for TinyML or some other stack - I'm completely new to the subject. But I have a way to gather physical data from gearboxes, and want to see if I can feed that into an ML process.