  • Author: feiticeir0
  • Date Created: 14 Jun 2022 2:39 PM
  • machine learning audio classification
  • shallow learning
  • songspire
  • pi
  • audio classification
  • ml
  • raspberry pi pico
  • machine learning
  • pi-fest
  • deep learning
  • pi-fest songspire

Songspire - Machine Learning and Audio classification

feiticeir0
14 Jun 2022

Hi all! Hope everyone is fine.

What is Machine Learning (ML)

Since my project deals with ML, it's only fair to briefly explain what Machine Learning is.

I'm not an expert on this and I'm still learning - this is a huge scientific field, with a lot of options to choose from and a lot to read and study.

Machine Learning is one of the most promising fields in programming today. People who work with it are often called Data Scientists.

ML is a subfield of Artificial Intelligence.

Kinds of ML

ML algorithms mainly fall into one of two categories - supervised learning and unsupervised learning.

The difference sounds small, but it is significant: it comes down to whether the training data is labeled.

Supervised Learning

This type of learning is used when we already have labeled data that we can use to train a model to make predictions about new data.

For example: you're a real estate agent. You have a lot of data from previous sales that shows the value of houses based on size, neighborhood, what similar houses have sold for, and so on.

That data shows the relationship between the number of bedrooms, the size in square meters, the neighborhood and the price each house sold for.

Using this data, we can train a model to predict the price of a house based on those parameters.

This is supervised learning. The sale price is the label, and the computer will try to work out the relationship between it and all the other fields.
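
To make the idea concrete, here is a minimal sketch in plain Python. It is not the method this project will use: it fits a straight line (least squares) to invented sales figures, using only the house size as input, and then predicts the price of a house that was not in the training data. The numbers and the function names `fit_line` and `predict_price` are made up for illustration.

```python
# Toy supervised learning: fit price = slope * size + intercept by least squares.
# The sales figures below are invented for illustration only.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Labeled training data: house size in square meters and its sale price (the label).
sizes = [50, 70, 90, 110, 130]
prices = [110_000, 150_000, 190_000, 230_000, 270_000]

slope, intercept = fit_line(sizes, prices)

def predict_price(size_m2):
    """Predict the price of a house the model has never seen."""
    return slope * size_m2 + intercept

print(round(predict_price(100)))
```

A real model would use more inputs (bedrooms, neighborhood) and far more data, but the principle is the same: learn the relationship from labeled examples, then apply it to new ones.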

Unsupervised learning

Using the same example, this time you only have the number of bedrooms, the size of the house and the neighborhood, with no labels attached, so you don't know which is which or what it all means.

You don't know what all these values mean, but perhaps you can find a pattern in there.

You feed this to an ML algorithm that will try to find patterns in the data, without any previous knowledge of the data or of what it means.
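
As a rough illustration, the sketch below uses k-means clustering, one common unsupervised algorithm (my choice for the example, not something this project depends on). It is given invented houses described only by bedrooms and size, with no prices or labels, and splits them into two groups on its own. The data, the starting centroids and the function names are all assumptions made for the sketch.

```python
# Toy unsupervised learning: group unlabeled houses with k-means (k = 2).
# The data is invented; no prices or labels are given to the algorithm.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Each house is (bedrooms, size in square meters).
houses = [(1, 45), (2, 60), (1, 50), (4, 150), (5, 180), (4, 160)]

# Deterministic start: use the first and last house as initial centroids.
centroids, clusters = k_means(houses, [houses[0], houses[-1]])
print(clusters)
```

The algorithm ends up separating the small houses from the large ones, even though nobody told it that "small" and "large" were categories.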

Hey, can't we use ML to predict the lottery numbers?

It turns out, you can't.

Unsurprisingly, people have studied this, and the conclusion is that you can't. In one word - randomness.

Mathematicians and ML experts agree that AI can't be used to predict randomly drawn numbers. Sorry!

Here's an article on Medium by Pavel Baidaus explaining this. It's fun to read.

Audio Classification

Sound classification is one of the most widely used applications of audio deep learning. It means learning to classify sounds and predict the category a given sound belongs to.

What is sound ?

A sound signal is produced by variations in air pressure. We can measure the intensity of those variations and plot them over time.

Here's a very crude representation of a sound wave.

image: Audio Wave

Digital Audio

To represent a sound digitally, we turn the sound waves into numbers. We do this by measuring the sound wave amplitude at fixed time intervals.  Each measurement is a sample. The sample rate is the number of samples taken per second.
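
Here is a small sketch of that sampling step in plain Python. Instead of a real microphone it "measures" a mathematical sine wave, and the 440 Hz tone, the 8 kHz sample rate and the function name `sample_sine` are assumptions picked just for the example.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave's amplitude at fixed intervals of 1 / sample_rate_hz."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# One second of a 440 Hz tone, sampled 8,000 times per second.
samples = sample_sine(440, 8_000, 1.0)
print(len(samples))   # one second at 8,000 samples per second gives 8000 samples
print(samples[:3])    # the first three amplitude measurements
```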

A common sample rate is 44.1 kHz - 44,100 samples per second. Why this number? (Remember, it is a common sample rate; there are others.)

According to the Nyquist sampling theorem, to reproduce the original waveform exactly, the sampling frequency must be more than twice the highest frequency in the signal. Human hearing covers roughly 20 Hz to 20 kHz, so the sample rate has to be above 40 kHz, hence the commonly used 44.1 kHz.
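
A quick way to see why the limit matters is to sample a tone that is too high for the sample rate. In the sketch below (plain Python, with frequencies chosen by me for the demonstration), a 30 kHz tone sampled at 44.1 kHz produces the same samples as a 14.1 kHz tone, so after sampling the two can no longer be told apart. This effect is known as aliasing.

```python
import math

SAMPLE_RATE = 44_100  # Hz

def sample_cosine(frequency_hz, n_samples, sample_rate_hz=SAMPLE_RATE):
    return [math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# Highest frequency that can be captured: half the sample rate (22,050 Hz).
nyquist_limit = SAMPLE_RATE / 2

# A 30 kHz tone is above the limit...
too_high = sample_cosine(30_000, 100)
# ...and its samples match those of a 14.1 kHz tone (44,100 - 30,000).
alias = sample_cosine(14_100, 100)

max_difference = max(abs(a - b) for a, b in zip(too_high, alias))
print(nyquist_limit)
print(max_difference < 1e-9)
```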

Waveform

A waveform is a representation of the signal's amplitude over time. Below we have an example of a waveform of a wav file with the word "right".

image

Spectrograms

Since a signal contains different sounds at different moments, its frequencies also vary with time.

A spectrogram is an image representation of a signal. It shows the intensity of each frequency over time.

Here's the spectrogram of the word "right" - the same recording as above.

image
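
For the curious, here is a bare-bones sketch of how a spectrogram can be computed, in plain Python with a slow, naive DFT. In practice you would use a library with a fast Fourier transform. The test signal, frame size, hop size and function names are all assumptions made for the example. The signal is a 1 kHz tone that switches to 3 kHz halfway through, so the spectrogram should show the energy moving from one frequency to the other over time.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin from 0 Hz up to half the sample rate (naive DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_size=64, hop=32):
    """Split the signal into overlapping frames, apply a Hann window to each,
    and return one row of frequency magnitudes per frame (time runs down the rows)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        rows.append(dft_magnitudes(frame))
    return rows

# Test signal: 1 kHz for the first half, 3 kHz for the second half, sampled at 8 kHz.
sample_rate = 8_000
signal = [math.sin(2 * math.pi * (1_000 if n < 512 else 3_000) * n / sample_rate)
          for n in range(1024)]

spec = spectrogram(signal)
bin_hz = sample_rate / 64   # width of one frequency bin in Hz
loudest = [row.index(max(row)) * bin_hz for row in spec]
print(loudest[0], loudest[-1])   # dominant frequency in the first and last frame
```

Each row of `spec` is one slice of time and each column is one frequency band. Drawing those values as pixel intensities gives an image like the one above.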

How is audio classified ?

So why bother with getting the audio spectrogram?

Because what we're going to feed our ML model is not the raw audio data, but a spectrogram of it.

Deep learning CNNs - Convolutional Neural Networks - (this is for another post) are very good at dealing with images, so we feed them an image representation of our audio signal and let them learn from it.

The steps are (in a broad sense - there are a lot of fine details):

  • Start with raw audio data in a wave file (.wav)
  • Convert the audio data into a spectrogram
  • Optional steps involve:
    • augmenting the data (more on this in another post)
    • cropping, resizing or normalizing the data
  • Feed the image data to the deep learning/shallow learning architecture for learning and feature extraction
  • Generate output predictions by passing the extracted features to a classifier made of fully connected layers.
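
The first steps of that list can be sketched in plain Python as below. This is only an illustration under my own assumptions: the WAV file is a synthetic tone created in memory rather than a real recording, the spectrogram is a bare-bones one (non-overlapping frames, naive DFT), and the helper names are hypothetical. The network itself is left out, since it is a topic for another post.

```python
import cmath
import io
import math
import struct
import wave

def write_tone_wav(frequency_hz, sample_rate=8_000, n_samples=256):
    """Create an in-memory 16-bit mono WAV file containing a sine tone."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(
            struct.pack("<h", int(20_000 * math.sin(
                2 * math.pi * frequency_hz * n / sample_rate)))
            for n in range(n_samples)))
    buffer.seek(0)
    return buffer

def read_wav(file_obj):
    """Step 1: load the raw audio as floats between -1 and 1."""
    with wave.open(file_obj, "rb") as wav:
        raw = wav.readframes(wav.getnframes())
        rate = wav.getframerate()
    count = len(raw) // 2
    return [s / 32768 for s in struct.unpack(f"<{count}h", raw)], rate

def magnitude_frames(samples, frame_size=64):
    """Step 2: a bare-bones spectrogram (non-overlapping frames, naive DFT)."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rows.append([abs(sum(x * cmath.exp(-2j * math.pi * k * t / frame_size)
                             for t, x in enumerate(frame)))
                     for k in range(frame_size // 2 + 1)])
    return rows

def normalize(image):
    """Optional step: rescale all values to the range 0 to 1."""
    flat = [v for row in image for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0   # avoid dividing by zero for a flat image
    return [[(v - low) / span for v in row] for row in image]

samples, rate = read_wav(write_tone_wav(1_000))
image = normalize(magnitude_frames(samples))
print(len(image), len(image[0]))   # rows (time) x columns (frequency)
# `image` is the kind of array that would be handed to the network.
```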

image

This project will use a classifier to try to classify bird songs.

References

https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471

https://micropyramid.com/blog/understanding-audio-quality-bit-rate-sample-rate/

https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5

  • feiticeir0
    feiticeir0 over 3 years ago in reply to Jan Cumps

    Hi Jan Cumps !

    Because we're talking about the Pico, TinyML will have to be used because of its constraints. There is some work being done on predictive maintenance under the name anomaly detection. I'm guessing it's the same thing.

    There are some courses on Coursera by Shawn Hymel, using Edge Impulse, that can help you with that. This one is an introduction to ML, focusing on audio. It's great.

    https://www.coursera.org/learn/introduction-to-embedded-machine-learning/home/week/1

  • feiticeir0
    feiticeir0 over 3 years ago in reply to robogary

    Hi robogary

    I haven't started to create the model yet. I'm still gathering data and analyzing what could be the best approach.

    I will include some non-bird sounds, such as background noise, for the model to classify as non-bird. Some data augmentation will have to be used.

    Of course, I will concentrate on just the birds near me - it's impossible to classify all the birds in the world :)

    I'll keep everybody posted.

  • robogary
    robogary over 3 years ago in reply to Jan Cumps

    Gearbox noise is a great idea with amazing business potential. Just avoid HAL 9000 diagnostics that tell you a gearbox needs replacing and then lock you out when you go to change it. :-)

  • robogary
    robogary over 3 years ago

    I really enjoy this project. Do you employ any filtering to eliminate non-bird frequencies, especially low frequencies?

  • Jan Cumps
    Jan Cumps over 3 years ago

    There have been some interesting posts on ML here on the community lately. I'm going to follow along with your adventure.
    I want to investigate its capabilities for predictive maintenance. There's also this article: TinyML Gearbox Fault Prediction.

    I don't have a preference for TinyML or some other stack - I'm completely new to the subject. But I have a way to gather physical data from gearboxes, and want to see if I can feed that into an ML process.
