Reason for the Design Challenge:
This summer I have been focused on trying tinyML on different platforms and in different applications. The current application I've been working on is an edge-deployed, multi-sensor secured entry system (no cloud interaction required). I already have a commercial video doorbell that serves me well; it lets me know when there is motion in its field of view. However, it uses a simple image-differencing algorithm within defined regions of interest, so it is prone to a lot of false alarms. I'd like to create a setup that uses tinyML with vision and voice to detect humans approaching my front door and also to identify the individual to allow entry. I tried something similar in a Hackster project built around a LUX-ESP32 device running Luxonis DepthAI with Intel OpenVINO models. That was a reasonably high-power application due to the Myriad-X VPU: inferencing was consuming 2-4 W @ 15-30 FPS. I thought it would be interesting and fun to try a low-power solution using much more constrained hardware.

I noticed that Infineon has recently added tinyML capability to its ModusToolbox software for the PSoC MCUs (MTB-ML). Hopefully, I can get that to work for vision and voice classification. Otherwise, I saw that there is a keyword spotting implementation using a PSoC 6 and ModusToolbox with the Edge Impulse framework; that will be my fallback. I haven't seen any vision applications for the PSoC, though, so vision will be the biggest challenge. MTB-ML currently only has a gesture (accelerometer) example.
Project Description
Concept
I have been working on the concept of a multi-stage security system for the front entry door of my house. There has been a trend lately toward "sensor fusion" to enhance application capability, and using different sensors in the various stages will let me optimize the power consumption of the overall system and improve the accuracy of person identification.
Stage 1 Intrusion Detection: This is an "always on" low-power microwave sensor that detects a person or animal entering the zone around the entry door and triggers the operation of the next stage. I designed a portable unit that communicates via MQTT for my Hackster project, and I'm going to reuse it here.
Stage 2 Object Classification: This stage classifies the detected object as a person or animal using a trained tinyML model with camera and microphone inputs, and alerts via MQTT that something "live" is in the detection zone.
Stage 3 Image Recognition/Audio Codeword: If the object is classified as a person, then entry access is granted if the person is identified by image recognition and/or an audio codeword response.
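To make the staged hand-off concrete, below is a minimal sketch of the control flow I have in mind. It is purely illustrative: every function in it (motion_detected(), object_is_person(), person_is_authorized(), publish_alert()) is a placeholder I made up, stubbed out so the sketch compiles; the real implementations will come out of the steps listed further down.

```cpp
// Illustrative control flow for the three stages. Every function here is a
// placeholder stub, not a real sensor or model API.
#include <cstdio>

enum class Stage { Idle, Classify, Identify };

// --- Placeholder hooks (stubbed so the sketch compiles and runs) ---
static bool motion_detected()      { return true;  } // Stage 1: microwave sensor
static bool object_is_person()     { return true;  } // Stage 2: tinyML classifier
static bool person_is_authorized() { return false; } // Stage 3: recognition/codeword
static void publish_alert(const char *msg) { std::printf("MQTT -> %s\n", msg); }

int main() {
    Stage stage = Stage::Idle;
    for (int tick = 0; tick < 10; tick++) {          // a few loop passes for demo
        switch (stage) {
        case Stage::Idle:
            // Only the low-power microwave sensor is active here.
            if (motion_detected()) stage = Stage::Classify;
            break;
        case Stage::Classify:
            // Camera and microphone are only woken up once Stage 1 has fired.
            if (object_is_person()) {
                publish_alert("person in detection zone");
                stage = Stage::Identify;
            } else {
                stage = Stage::Idle;                 // animal or false alarm
            }
            break;
        case Stage::Identify:
            if (person_is_authorized()) {
                publish_alert("entry granted");      // would drive a lock here
            }
            stage = Stage::Idle;
            break;
        }
    }
    return 0;
}
```

The point of the structure is that the camera and microphone only get powered after the microwave sensor has already seen something, which is where the power savings of the multi-stage approach should come from.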
This proof-of-concept project will only implement the control logic; I haven't invested in an electronic lock yet. There will be quite a few "challenges" to overcome, so I haven't attempted to come up with a rigorous schedule, but I do have a sequence (flow) of steps that I'll try to follow:
Step 1: Develop an MQTT client to communicate with the RPi4 server (message flow sketched after this list)
Step 2: Develop keyword spotting capability using Edge Impulse, since there is an existing example (inference logic sketched after this list)
Step 3: Try to do Step 2 using MTB-ML
Step 4: Develop the camera application (need to determine which camera to use)
Step 5: Develop an object classification model using MTB-ML, or Edge Impulse if necessary
Step 6: Develop a person recognition model based on learning from Step 5
Step 7: Develop the integrated application and test it
Step 8: Measure power consumption in the various stages of operation
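For Step 1, the PSoC side will use whatever MQTT client library ModusToolbox provides; I haven't worked through its API yet, so as a placeholder here is the same message flow sketched with the Eclipse Paho MQTT C client, which I can run on a Linux box to exercise the RPi4 broker. The broker address, client ID, topic, and JSON payload are all my assumptions.

```cpp
// Sketch of the alert publisher using the Eclipse Paho MQTT C client
// (paho.mqtt.c). Broker address, client ID, topic, and payload are assumed;
// the PSoC firmware will use the ModusToolbox MQTT library instead.
#include <stdio.h>
#include <string.h>
#include "MQTTClient.h"   // from paho.mqtt.c

int main() {
    MQTTClient client;
    MQTTClient_create(&client, "tcp://raspberrypi.local:1883", // RPi4 broker (assumed)
                      "entry-sensor", MQTTCLIENT_PERSISTENCE_NONE, nullptr);

    MQTTClient_connectOptions opts = MQTTClient_connectOptions_initializer;
    opts.keepAliveInterval = 20;
    opts.cleansession = 1;
    if (MQTTClient_connect(client, &opts) != MQTTCLIENT_SUCCESS) {
        fprintf(stderr, "MQTT connect failed\n");
        return 1;
    }

    // Publish a Stage 2 "live object detected" alert (topic/payload assumed).
    char payload[] = "{\"stage\":2,\"class\":\"person\"}";
    MQTTClient_publish(client, "frontdoor/alert", (int)strlen(payload),
                       payload, 1 /*QoS*/, 0 /*retained*/, nullptr);

    MQTTClient_disconnect(client, 1000);
    MQTTClient_destroy(&client);
    return 0;
}
```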
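For Step 2, the existing PSoC 6 keyword-spotting example is built on the Edge Impulse C++ inferencing SDK, so the decision logic should look roughly like the sketch below. It assumes a trained impulse has been exported as a C++ library; the keyword_heard() helper, the 0.8 confidence threshold, and the audio_buf name are my assumptions, and filling the buffer from the microphone is elided.

```cpp
// Sketch of keyword-spotting inference with the Edge Impulse C++ SDK.
// Assumes an exported impulse; buffer filling from the mic is elided.
#include <string.h>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

static float audio_buf[EI_CLASSIFIER_RAW_SAMPLE_COUNT]; // filled from the mic

// Callback the classifier uses to pull slices of the raw audio.
static int audio_get_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, audio_buf + offset, length * sizeof(float));
    return 0;
}

// Hypothetical helper: true if the trained keyword scored above threshold.
bool keyword_heard(const char *keyword) {
    signal_t signal;
    signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
    signal.get_data = &audio_get_data;

    ei_impulse_result_t result = { 0 };
    if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
        return false;
    }

    // Scan the class scores for the requested keyword label.
    for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
        if (strcmp(result.classification[ix].label, keyword) == 0 &&
            result.classification[ix].value > 0.8f) { // threshold (assumed)
            return true;
        }
    }
    return false;
}
```

If Edge Impulse ends up being the fallback for Steps 5 and 6 as well, the same run_classifier() call pattern should carry over to the vision models, just with an image buffer behind the signal instead of audio.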