As part of my sensors project I thought that I would use voice commands to toggle the LCD display on and off. There has been a lot of activity in the last couple of years in the area of machine learning (ML) on embedded microcontrollers, which have limited memory and compute resources. TinyML using TensorFlow Lite has become a fairly popular implementation method.
Edge Impulse has developed a framework https://docs.edgeimpulse.com/docs/getting-started that makes it easy to create and deploy models (impulses) to a small set of development boards and mobile devices. They also provide a porting guide for using their framework with boards that are not currently supported. Using the Edge Impulse framework seems like it might be more straightforward than the other TensorFlow Lite tutorials that I've seen, so I'm going to give it a try.
The Wio Terminal falls in the category of "Community Board", which means that Seeed has created the interface to Edge Impulse rather than it being a directly supported board. The only sensors currently supported are the onboard 3-axis accelerometer and a gas sensor. I tried the example using the accelerometer with Edge Impulse to detect gestures and that worked well. There is also a "data forwarder" interface that should allow sending any sensor sample data over serial to Edge Impulse, as sketched below.
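To give an idea of how the data forwarder works, here is a minimal sketch (my own illustration, not Seeed's or Edge Impulse's code) that streams the Wio Terminal's onboard LIS3DHTR accelerometer as comma-separated values at a steady rate, which is the format the edge-impulse-data-forwarder CLI picks up over serial. The 50 Hz sample rate is an assumption.

// Minimal data-forwarder style sketch for the Wio Terminal (assumes the Seeed
// LIS3DHTR library): print one comma-separated accelerometer reading per line
// at a steady rate so the edge-impulse-data-forwarder CLI can infer the frequency.
#include "LIS3DHTR.h"

LIS3DHTR<TwoWire> lis;
const unsigned long INTERVAL_MS = 20;   // ~50 Hz (assumed rate)

void setup() {
  Serial.begin(115200);
  lis.begin(Wire1);                               // internal I2C bus on the Wio Terminal
  lis.setOutputDataRate(LIS3DHTR_DATARATE_100HZ);
}

void loop() {
  static unsigned long last = 0;
  if (millis() - last >= INTERVAL_MS) {
    last += INTERVAL_MS;
    // x,y,z in g, one reading per line
    Serial.print(lis.getAccelerationX()); Serial.print(',');
    Serial.print(lis.getAccelerationY()); Serial.print(',');
    Serial.println(lis.getAccelerationZ());
  }
}

On the host side the edge-impulse-data-forwarder CLI reads these lines, works out the sampling frequency, and uploads the data to the project as labeled samples.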
So, there are a couple of challenges to doing something simple like keyword spotting. The first is to generate enough labeled audio data and use it to create a model. Then you need to deploy that model to your device and incorporate it with the application code. It turns out that the biggest challenge will probably be just interfacing the microphone on the Wio Terminal. As a matter of fact, the response that I got on the Seeed forum is that the microphone would not work for keyword spotting. That doesn't make sense, so I'm going to try it anyway. But because it is an unknown, I thought I should first try the process on a supported board to verify that I could get it working.
When doing keyword spotting, the method used is "continuous audio sampling", which runs the model (impulse) in parallel with capturing the audio data. Running them sequentially instead could miss detection windows because of processing latency. The sequential vs continuous sampling approaches are shown in these images from the Edge Impulse documentation:
Currently, continuous audio sampling is implemented on the ST B-L475E-IOT01A and the Arduino Nano 33 BLE Sense boards. The implementation uses double buffering, which allows the impulse to use one buffer while the other one is being filled with sample data. I have an Arduino Nano 33 BLE Sense board that I've used with Edge Impulse examples, so I'm going to use that to initially develop and test the keyword spotting model before I try it on the Wio Terminal.
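To make the double-buffering idea concrete, here is a small self-contained Arduino-style sketch of my own (not Edge Impulse code; the buffer size and names are made up) that fakes the sample stream and shows the ping-pong hand-off between a capture buffer and a processing buffer. The real implementation, shown in full further down, does the capture side in the PDM interrupt callback.

// Illustration only: simulate the ping-pong buffer hand-off used by continuous sampling.
#define SLICE_SIZE 64                       // samples per slice (made-up size)

static int16_t buffers[2][SLICE_SIZE];      // the two audio buffers
static uint8_t fill_select = 0;             // which buffer is currently being filled
static uint16_t fill_count = 0;             // samples written into the fill buffer
static bool slice_ready = false;            // set when a buffer has been completed

// Stand-in for the microphone callback: append one sample to the fill buffer and
// swap buffers when it is full, so capture never has to wait on the classifier.
void produce_sample(int16_t sample) {
  buffers[fill_select][fill_count++] = sample;
  if (fill_count >= SLICE_SIZE) {
    fill_select ^= 1;
    fill_count = 0;
    slice_ready = true;
  }
}

void setup() {
  Serial.begin(115200);
}

void loop() {
  // Pretend the microphone produced a sample
  produce_sample((int16_t)random(-32768, 32767));

  if (slice_ready) {
    slice_ready = false;
    // "Process" the buffer that was just completed while the other one keeps filling
    const int16_t *ready = buffers[fill_select ^ 1];
    long sum = 0;
    for (int i = 0; i < SLICE_SIZE; i++) sum += ready[i];
    Serial.print("slice mean: ");
    Serial.println(sum / SLICE_SIZE);
  }
}

In the real code the producer runs inside an interrupt, so the shared flags need to be volatile and the slice size is set by the model; this only shows the hand-off logic.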
Keyword Spotting Impulse Development
Data acquisition
I want to use the impulse to turn the LCD display and NeoPixel strip on and off, so I'm going to train it to recognize the keywords "on" and "off". The key to accurate inference is having a robust data set. For keyword detection it is recommended to use 1 second data windows and at least 4 classification categories (labels). For labels I have the two keywords, plus "noise" for background sound and "unknown" for words other than the keywords. Normally one would generate the labeled data using data captured from the target board, but to save the time required to capture many 1 second samples I elected to use a prebuilt dataset and test how it performs. I used http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz which is a very large (> 3GB) dataset that contains 20 core keywords ("Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", and "Nine"). I used the "On" and "Off" data and merged it with the "noise" and "unknown" data from this dataset https://cdn.edgeimpulse.com/datasets/keywords2.zip . It contains 25 minutes of data per class, split up into 1 second windows, so that's 1500 samples per label. That data is then split 80% for Training and 20% for Test. This process is well documented by Edge Impulse so I won't repeat it here.
Here's a quick snapshot of how the Training data looks in the Data Acquisition window:
Then you create your impulse in the Impulse design window - shown here with the Data, Processing, and Learning blocks used for this impulse.
Then you test and deploy the impulse. You can choose to deploy the impulse as a library or as firmware if you are using a supported board. I chose to deploy it as an Arduino library that I could use to control device actions within my own program.
You can then load the library using the Library manager in the Arduino IDE. The library is shown loaded into my libraries directory:
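As a sanity check that the library is installed and working, the simplest way to call it is on a static buffer, without continuous sampling. This is a minimal sketch of my own, patterned after the Edge Impulse static buffer example; the features array is just a zero-filled placeholder where one full 1 second window of raw audio samples would go.

// Minimal one-shot classification using the deployed Arduino library.
#include <on-off-nano_33_ble_inference.h>

// One full model window of raw features (placeholder data; 16 kHz x 1 s = 16000 floats)
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE] = { 0 };

void setup() {
  Serial.begin(115200);
}

void loop() {
  // Wrap the buffer in a signal_t the classifier can read from
  signal_t signal;
  if (numpy::signal_from_buffer(features, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE, &signal) != 0) {
    Serial.println("Failed to create signal from buffer");
    return;
  }

  // Run the impulse once and print the confidence for each label
  ei_impulse_result_t result = { 0 };
  EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false);
  if (res != EI_IMPULSE_OK) {
    return;
  }

  for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    Serial.print(result.classification[ix].label);
    Serial.print(": ");
    Serial.println(result.classification[ix].value, 5);
  }

  delay(2000);
}

With all-zero input the confidences are meaningless, but if this compiles and prints the four labels, the library is wired up correctly.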
I used the nano_ble33_sense_microphone_continuous.ino example to create my test program.
Here's the program modified to toggle the LED_BUILTIN on when it recognizes the keyword "on" and off when it recognizes "off".
nano_ble33_sense_microphone_continuous.ino
/* Edge Impulse Arduino examples
 * Copyright (c) 2020 EdgeImpulse Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

// If your target is limited in memory remove this macro to save 10K RAM
#define EIDSP_QUANTIZE_FILTERBANK 0

/**
 * Define the number of slices per model window. E.g. a model window of 1000 ms
 * with slices per model window set to 4. Results in a slice size of 250 ms.
 * For more info: https://docs.edgeimpulse.com/docs/continuous-audio-sampling
 */
#define EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW 3

/* Includes ---------------------------------------------------------------- */
#include <PDM.h>
#include <on-off-nano_33_ble_inference.h>

/** Audio buffers, pointers and selectors */
typedef struct {
    signed short *buffers[2];
    unsigned char buf_select;
    unsigned char buf_ready;
    unsigned int buf_count;
    unsigned int n_samples;
} inference_t;

static inference_t inference;
static bool record_ready = false;
static signed short *sampleBuffer;
static bool debug_nn = false; // Set this to true to see e.g. features generated from the raw signal
static int print_results = -(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW);

/**
 * @brief      Arduino setup function
 */
void setup()
{
    // put your setup code here, to run once:
    pinMode(LED_BUILTIN, OUTPUT);
    Serial.begin(115200);

    Serial.println("Edge Impulse Inferencing Demo");

    // summary of inferencing settings (from model_metadata.h)
    ei_printf("Inferencing settings:\n");
    ei_printf("\tInterval: %.2f ms.\n", (float)EI_CLASSIFIER_INTERVAL_MS);
    ei_printf("\tFrame size: %d\n", EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE);
    ei_printf("\tSample length: %d ms.\n", EI_CLASSIFIER_RAW_SAMPLE_COUNT / 16);
    ei_printf("\tNo. of classes: %d\n", sizeof(ei_classifier_inferencing_categories) /
                                        sizeof(ei_classifier_inferencing_categories[0]));

    run_classifier_init();
    if (microphone_inference_start(EI_CLASSIFIER_SLICE_SIZE) == false) {
        ei_printf("ERR: Failed to setup audio sampling\r\n");
        return;
    }
}

/**
 * @brief      Arduino main function. Runs the inferencing loop.
 */
void loop()
{
    bool m = microphone_inference_record();
    if (!m) {
        ei_printf("ERR: Failed to record audio...\n");
        return;
    }

    signal_t signal;
    signal.total_length = EI_CLASSIFIER_SLICE_SIZE;
    signal.get_data = &microphone_audio_signal_get_data;
    ei_impulse_result_t result = {0};

    EI_IMPULSE_ERROR r = run_classifier_continuous(&signal, &result, debug_nn);
    if (r != EI_IMPULSE_OK) {
        ei_printf("ERR: Failed to run classifier (%d)\n", r);
        return;
    }

    if (++print_results >= (EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)) {
        // print the predictions
        ei_printf("Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n",
            result.timing.dsp, result.timing.classification, result.timing.anomaly);
        for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
            ei_printf("    %s: %.5f\n", result.classification[ix].label,
                      result.classification[ix].value);
        }
        // toggle the built-in LED when a keyword confidence crosses the 0.85 threshold
        // (the indices follow the model's label ordering)
        if (result.classification[2].value >= 0.85) {
            digitalWrite(LED_BUILTIN, LOW);
        }
        else if (result.classification[3].value >= 0.85) {
            digitalWrite(LED_BUILTIN, HIGH);
        }
#if EI_CLASSIFIER_HAS_ANOMALY == 1
        ei_printf("    anomaly score: %.3f\n", result.anomaly);
#endif

        print_results = 0;
    }
}

/**
 * @brief      Printf function uses vsnprintf and output using Arduino Serial
 *
 * @param[in]  format     Variable argument list
 */
void ei_printf(const char *format, ...) {
    static char print_buf[1024] = { 0 };

    va_list args;
    va_start(args, format);
    int r = vsnprintf(print_buf, sizeof(print_buf), format, args);
    va_end(args);

    if (r > 0) {
        Serial.write(print_buf);
    }
}

/**
 * @brief      PDM buffer full callback
 *             Get data and call audio thread callback
 */
static void pdm_data_ready_inference_callback(void)
{
    int bytesAvailable = PDM.available();

    // read into the sample buffer
    int bytesRead = PDM.read((char *)&sampleBuffer[0], bytesAvailable);

    if (record_ready == true) {
        for (int i = 0; i < bytesRead >> 1; i++) {
            inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];

            if (inference.buf_count >= inference.n_samples) {
                inference.buf_select ^= 1;
                inference.buf_count = 0;
                inference.buf_ready = 1;
            }
        }
    }
}

/**
 * @brief      Init inferencing struct and setup/start PDM
 *
 * @param[in]  n_samples  The n samples
 *
 * @return     { description_of_the_return_value }
 */
static bool microphone_inference_start(uint32_t n_samples)
{
    inference.buffers[0] = (signed short *)malloc(n_samples * sizeof(signed short));

    if (inference.buffers[0] == NULL) {
        return false;
    }

    inference.buffers[1] = (signed short *)malloc(n_samples * sizeof(signed short));

    if (inference.buffers[1] == NULL) {
        free(inference.buffers[0]);
        return false;
    }

    sampleBuffer = (signed short *)malloc((n_samples >> 1) * sizeof(signed short));

    if (sampleBuffer == NULL) {
        free(inference.buffers[0]);
        free(inference.buffers[1]);
        return false;
    }

    inference.buf_select = 0;
    inference.buf_count = 0;
    inference.n_samples = n_samples;
    inference.buf_ready = 0;

    // configure the data receive callback
    PDM.onReceive(&pdm_data_ready_inference_callback);

    // optionally set the gain, defaults to 20
    PDM.setGain(80);

    PDM.setBufferSize((n_samples >> 1) * sizeof(int16_t));

    // initialize PDM with:
    // - one channel (mono mode)
    // - a 16 kHz sample rate
    if (!PDM.begin(1, EI_CLASSIFIER_FREQUENCY)) {
        ei_printf("Failed to start PDM!");
    }

    record_ready = true;

    return true;
}

/**
 * @brief      Wait on new data
 *
 * @return     True when finished
 */
static bool microphone_inference_record(void)
{
    bool ret = true;

    if (inference.buf_ready == 1) {
        ei_printf(
            "Error sample buffer overrun. Decrease the number of slices per model window "
            "(EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)\n");
        ret = false;
    }

    while (inference.buf_ready == 0) {
        delay(1);
    }

    inference.buf_ready = 0;

    return ret;
}

/**
 * Get raw audio signal data
 */
static int microphone_audio_signal_get_data(size_t offset, size_t length, float *out_ptr)
{
    numpy::int16_to_float(&inference.buffers[inference.buf_select ^ 1][offset], out_ptr, length);

    return 0;
}

/**
 * @brief      Stop PDM and release buffers
 */
static void microphone_inference_end(void)
{
    PDM.end();
    free(inference.buffers[0]);
    free(inference.buffers[1]);
    free(sampleBuffer);
}

#if !defined(EI_CLASSIFIER_SENSOR) || EI_CLASSIFIER_SENSOR != EI_CLASSIFIER_SENSOR_MICROPHONE
#error "Invalid model for current sensor."
#endif
Here is a short video of the impulse working. There is a bit of inference lag.
So, now I have a working example template to use. In the next post I'll go through the steps necessary to get this working on the Wio Terminal. One caveat: I'm a little late starting this, so there is a possibility that I won't finish before the project deadline in 3 days.
Links to related posts
Wio Terminal Sensor Fusion - Introduction
Wio Terminal Sensor Fusion - Sensor Integration
Wio Terminal Sensor Fusion - Remote Data Display and Control using Blynk
Wio Terminal Sensor Fusion - Remote Data Display and Control using Blynk continued