Introduction
As the deadline has been extended, we are working further on using deep learning to detect dog barking that may be harmful to pedestrians.
We train and deploy under AWS SageMaker and connect our IoT device through AWS IoT, fine-tuning the model pre-trained on Google's open-source AudioSet (YAMNet, part of the TensorFlow models repository).
We fine-tune on top of this model so that it outputs only two classes, angrydog and other, and achieve around 90% accuracy.
Amazon SageMaker
Amazon SageMaker is an AWS service for running machine learning workloads. The platform provides easy-to-use tools for building machine learning models, such as SageMaker Autopilot running inside the new Amazon SageMaker Studio for general data analysis. On the other hand, it is also flexible enough to run custom tasks: you can specify the machine and kernel, and install custom modules and Git repositories.
We first studied running under Amazon SageMaker Studio, which provides Autopilot to automate model building. However, Autopilot only accepts CSV input, which is good for business data analysis but not for this job, which requires analysing waveforms, so we train our model with a custom Python script inside SageMaker Studio.
In addition, SageMaker Studio does not allow access by an external WebSocket, which we would like to use to trigger the recognition jobs automatically, so we shifted to a SageMaker Notebook instance, which lets us build the model, run predictions and connect from outside via WebSocket, all inside a virtual machine.
We created a new Notebook instance and imported the Git repository https://github.com/tensorflow/models.
Like Studio, the Notebook instance also runs on top of JupyterLab.
We activate the Conda environment tensorflow_p36, which has TensorFlow pre-installed:
source activate tensorflow_p36
We also install the additional libraries resampy, pysoundfile and libsndfile:
conda install -c conda-forge resampy
conda install -c conda-forge pysoundfile
conda install -c conda-forge libsndfile
Preparing the WAV and calling SageMaker
In the last blog, our sound clip was sent to Transcribe for speech recognition; here we redirect the WAV file stored under S3 to trigger SageMaker to run the prediction.
We change our voice_consumption function to invoke another Lambda function called test_sound_ai:
def run_ai(soundkey):
    # Forward the S3 key of the sound clip to the test_sound_ai Lambda
    event = '{"soundkey":"' + soundkey + '"}'
    client = boto3.client('lambda')
    response = client.invoke(
        FunctionName='test_sound_ai',
        Payload=event
    )
    print(response)
    return
Inside the test_sound_ai function we open a WebSocket terminal on the Notebook instance, start iot_inference.py and pass the sample's S3 key as an argument for further processing.
ws = websocket.create_connection(
    "wss://{}/terminals/websocket/1".format(http_hn),
    cookie=cookies,
    host=http_hn,
    origin=http_proto + "//" + http_hn
)
ws.send("""[ "stdin", "source activate tensorflow_p36\\r" ]""")
ws.send("""[ "stdin", "cd /home/ec2-user/SageMaker/models/research/audioset/yamnet\\r" ]""")
ws.send("""[ "stdin", "python iot_inference.py """ + soundkey + """\\r" ]""")
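For completeness, here is a hedged sketch of how that terminal WebSocket connection might be established from the Lambda. The notebook instance name is a placeholder, and it assumes the usual pattern of fetching a presigned notebook URL with boto3 and collecting the Jupyter session cookies with requests before connecting; the variable names http_hn, http_proto and cookies match the snippet above.

import boto3
import requests
import websocket
from urllib.parse import urlparse

def open_terminal_ws(notebook_name='sound-ai-notebook'):   # placeholder instance name
    sm = boto3.client('sagemaker')
    # Presigned URL that logs us into the notebook's Jupyter server
    url = sm.create_presigned_notebook_instance_url(
        NotebookInstanceName=notebook_name)['AuthorizedUrl']
    parsed = urlparse(url)
    http_proto, http_hn = parsed.scheme + ':', parsed.netloc

    # Visit the presigned URL once so the Jupyter session cookies get set
    sess = requests.Session()
    sess.get(url)
    cookies = "; ".join("{}={}".format(k, v) for k, v in sess.cookies.items())

    # A terminal may need to exist already; the Jupyter API can create one
    sess.post(http_proto + "//" + http_hn + "/api/terminals",
              headers={"X-XSRFToken": sess.cookies.get("_xsrf", "")})

    return websocket.create_connection(
        "wss://{}/terminals/websocket/1".format(http_hn),
        cookie=cookies,
        host=http_hn,
        origin=http_proto + "//" + http_hn)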
Using the default YAMNet model
YAMNet is a pretrained model that provides 521 audio classes based on AudioSet, including classes related to dog sounds:
https://github.com/tensorflow/models/blob/master/research/audioset/yamnet/yamnet_class_map.csv
Classes 67 (Animal), 69 (Dog) and 70 (Bark) are highly relevant to our requirement, so if we use the default YAMNet model directly, we simply calculate a score on top of those classes.
# Weighted sum of the dog-related classes among the top-5 predictions
dogscore = 0
for i in top5_i:
    if yamnet_id[i] == 67:
        dogscore += prediction[i] * 0.25
    elif yamnet_id[i] == 68:
        dogscore += prediction[i] * 0.25
    elif yamnet_id[i] == 69:
        dogscore += prediction[i] * 0.7
    elif yamnet_id[i] == 79:
        dogscore += prediction[i] * 0.25
    # we hate cat
    elif yamnet_id[i] == 76:
        dogscore -= prediction[i] * 0.25
print('like_dog:', dogscore, '\n')
iot_inference_yamnet.py uses this dogscore directly to trigger the device to play the alarm sound "Cat Meow", which is downloaded from S3 storage and sent through AWS IoT MQTT.
if dogscore < 0.5:
    return

# Download the alarm PCM from S3 and publish it in 1536-byte sections over MQTT
s3 = boto3.resource('s3')
obj = s3.Object('voicerecognise', 'alarm.pcm')
alarm = obj.get()['Body'].read()
total_alarm_section = int(len(alarm) / 1536)
alarm_section = total_alarm_section
while alarm_section:
    print(alarm_section, ',')
    section_data = base64.b64encode(
        alarm[alarm_section * 1536:(alarm_section + 1) * 1536]).decode("utf-8")
    message = "{ \"requests\":\"alarm\",\"section\":\"" + str(alarm_section) + \
              "\",\"totalsection\":\"" + str(total_alarm_section) + \
              "\",\"data\":\"" + section_data + "\"}"
    try:
        aitopic = things + '/ai/get'
        response = iotclient.publish(
            topic=aitopic,
            qos=0,
            payload=message
        )
    except:
        print("UnauthorizedException")
    alarm_section -= 1
    time.sleep(0.005)
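On the device, the Cypress firmware reassembles these sections and plays them. Purely as an illustration of the message format, a hedged Python subscriber using the AWS IoT Device SDK (endpoint, certificate paths and thing name below are placeholders) could rebuild the PCM buffer like this:

import base64
import json
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

sections = {}

def on_alarm(client, userdata, message):
    # Collect the base64 PCM sections and write the alarm once all have arrived
    msg = json.loads(message.payload)
    if msg.get("requests") != "alarm":
        return
    sections[int(msg["section"])] = base64.b64decode(msg["data"])
    if len(sections) == int(msg["totalsection"]):
        pcm = b"".join(sections[i] for i in sorted(sections))
        with open("alarm.pcm", "wb") as f:
            f.write(pcm)

mqtt = AWSIoTMQTTClient("alarm-listener")
mqtt.configureEndpoint("xxxxxxxx-ats.iot.us-east-1.amazonaws.com", 8883)   # placeholder
mqtt.configureCredentials("rootCA.pem", "private.key", "certificate.pem")  # placeholders
mqtt.connect()
mqtt.subscribe("mything/ai/get", 1, on_alarm)  # placeholder thing name; topic is <things>/ai/get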
Fine-Tuning for Angry Dog
The default YAMNet only detects dog barking; it does not tell a good dog from a bad dog whose barking is angry and likely to attack the innocent, so we need fine-tuning.
After some research and evaluation, and with reference to https://medium.com/@laanlabs/building-a-train-horn-detection-neural-network-ff368f127c1, we use the fine-tuning approach that gives the best result: extract the dense embedding just before the default classifier and train on top of it, so the result has only two classes: angrydog and other.
The samples
From https://freesound.org/ we downloaded and selected some very angry dog samples and put them into the S3 bucket under "soundsample/angrydog".
We also downloaded some samples for other, including cats, birds and noise (mostly outside the house), and put them under "soundsample/other".
We also included some friendly dog sounds to help the machine learning find the difference between bad and good dogs; the overall accuracy is lower compared with just checking for dog sounds, because this is harder to recognise.
We also downloaded some samples from Free Sound Clips | SoundBible.com, separated into angrydog and other, for test (evaluation) purposes; these are used to find the final accuracy of our model.
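The training script below relies on a load_dataset() helper that we do not list in full. As a rough sketch of what it does, here is a version that swaps in the TF Hub release of YAMNet (which exposes the 1024-dim embeddings directly, unlike the stock tensorflow/models copy, which has to be modified for that); the bucket layout, the test/ prefix and the global lists are assumptions based on the description above.

import io
import boto3
import numpy as np
import resampy
import soundfile as sf
import tensorflow_hub as hub   # swapped in: TF Hub YAMNet returns embeddings directly

yamnet = hub.load('https://tfhub.dev/google/yamnet/1')

samples = []          # one 1024-dim embedding per 0.96 s patch
classtypesindex = []  # label index of each embedding
classtypes = []       # class names, in the order load_dataset() is called

def load_dataset(classname, is_test, bucket='soundsample'):
    label = len(classtypes)
    classtypes.append(classname)
    prefix = ('test/' if is_test else '') + classname + '/'   # assumed S3 layout
    s3 = boto3.resource('s3')
    for obj in s3.Bucket(bucket).objects.filter(Prefix=prefix):
        if not obj.key.endswith('.wav'):
            continue
        print(obj.key)
        wav, sr = sf.read(io.BytesIO(obj.get()['Body'].read()), dtype='float32')
        if wav.ndim > 1:
            wav = wav.mean(axis=1)                   # stereo -> mono
        if sr != 16000:
            wav = resampy.resample(wav, sr, 16000)   # YAMNet expects 16 kHz
        _, embeddings, _ = yamnet(wav)               # (num_patches, 1024)
        for emb in embeddings.numpy():
            samples.append(emb)
            classtypesindex.append(label)

# shuffle() (not shown) is assumed to convert the lists to numpy arrays
# and shuffle samples and classtypesindex together.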
load_dataset("angrydog",False) load_dataset("other",False) shuffle() print(" Loaded samples: " , samples.shape, samples.dtype, classtypesindex.shape) input_layer = layers.Input(shape=(1024,)) output = layers.Dense(1024, activation=None)(input_layer) output = layers.Dense(2, activation='softmax')(output) model = Model(inputs=input_layer, outputs=output) opt = SGD(lr=0.002, decay=1e-5, momentum=0.8, nesterov=True) model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy']) history = model.fit(samples, classtypesindex, epochs=160, validation_split=0.200) history = model.fit(samples, classtypesindex, epochs=5) samples = [] classtypesindex = [] classtypes = [] load_dataset("angrydog",True) load_dataset("other",True) shuffle() test_mse_score, test_mae_score = model.evaluate(samples,classtypesindex) model.save("angrydog.h5", include_optimizer=False)
We load the dataset, shuffle it and train the model on top of the default YAMNet.
We also evaluate the accuracy with the test samples; as a result we got about 90% accuracy.
Because we mix bad and good dog barking to challenge the deep learning process, the accuracy is not that high; even a human sometimes finds it hard to tell a friendly bark from a dangerous one.
At the end, we save the model for future use.
The terminal output looks like this:
other/Car Driving-SoundBible.com-923766101.wav
other/Cow_Moo-Mike_Koenig-42670858.wav
other/Train_Honk_Horn_2x-Mike_Koenig-157974048.wav
other/puppy-barking_ds.wav
other/street-daniel_simon.wav
400/400 [==============================] - 0s 57us/sample - loss: 0.5594 - acc: 0.8700
Test Mse: {}, Test Mae: {} 0.5594366431236267 0.87
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 1024)]            0
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              1049600
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 2050
=================================================================
Total params: 1,051,650
Trainable params: 1,051,650
Non-trainable params: 0
The accuracy on our test samples is 0.87, as we intentionally include challenges such as small-dog barking, which is very hard to identify.
We found some YouTube sounds online to evaluate the result: for cute dogs the angrydog score is around 0.35, while for angry dog sounds the angrydog score is mostly higher than 0.8,
so we set the alarm threshold to 0.8, at which the alarm plays the cat meow ten times.
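Sketched out (reusing the same embedding extraction as in the load_dataset() sketch above, wrapped here in a hypothetical embeddings_for_wav() helper, plus a hypothetical send_alarm() wrapper around the MQTT publishing code shown earlier), the decision step inside iot_inference.py looks roughly like this:

import numpy as np
from tensorflow.keras.models import load_model

model = load_model("angrydog.h5")

def is_angry_dog(wav_path, threshold=0.8):
    # Average the angrydog probability over every 0.96 s patch of the clip
    embeddings = embeddings_for_wav(wav_path)        # hypothetical helper, see the sketch above
    scores = model.predict(np.asarray(embeddings))   # shape: (num_patches, 2)
    return float(scores[:, 0].mean()) >= threshold   # assumes column 0 = angrydog

if is_angry_dog("/tmp/sample.wav"):   # the clip downloaded from the S3 soundkey
    for _ in range(10):               # play the "Cat Meow" alarm ten times
        send_alarm()                  # hypothetical wrapper around the MQTT publishing above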
The whole round trip takes about one minute: uploading the sound clip, passing it to SageMaker, running the AI sound recognition process, and sending the result back through AWS IoT.
The demo is shown in the video above for reference.
Conclusion
By using Amazon SageMaker and Lambda functions, we successfully receive the sound clip picked up by the Cypress Pioneer Kit's PDM microphone, recognise angry dog sound, and feed back to the Cypress Pioneer Kit's internal sound card to play the alarm, all through the AWS Cloud.
Our final IoT platform is:
Source Code
We have updated the previous summary, and the final source code can be downloaded through the link below.