1 Project Origin and Vision
1.1 The Original Tennis Picker Project
The Deaf Rover project traces its roots back to the "Tennis Picker" entry submitted for the "At the Core" Design Challenge. That entry, documented as "Tennis Picker@The Core#5 Picking Tennis via Dual-Core IPC", was built around the Infineon PSoC 62S4 Pioneer Kit, whose dual-core microcontroller combines an ARM Cortex-M4 with a Cortex-M0+.
However, the project hit a critical wall: the ML model required for object detection far exceeded the 256KB of flash available on the PSoC 62S4, and the machine learning component was still incomplete at the challenge deadline.
1.2 MAX78000: The AI Microcontroller Upgrade
For the Spring Clean 2026 revival, the Deaf Rover project moves to a fundamentally different platform, the Analog Devices MAX78000. Instead of running the neural network in software on a general-purpose core, the MAX78000 executes CNN inference on a dedicated hardware accelerator.

The MAX78000 integrates three processing elements on a single die:
1. ARM Cortex-M4F (with floating-point unit) at up to 100 MHz for application logic
2. 32-bit RISC-V core optimized for data movement and preprocessing
3. Dedicated hardware CNN accelerator with 64 parallel 8-bit processors at 50 MHz, capable of 28.8 GOPS
1.3 Current Phase: Voice-Controlled Navigation
The current phase uses a KWS (Keyword Spotting) model instead of visual AI: the rover moves in response to voice commands picked up by the on-board microphone. The name "Deaf Rover" comes from its habit of sometimes missing commands, as if it were hard of hearing, and occasionally moving on its own.
2 Function Design
2.1 Initialization
The system boots with the instruction cache enabled and the System Clock set to the IPO at 100MHz. The peripherals are then initialized:
- Console UART at 115200 baud for debug output
- CNN Accelerator (load weights, configure state machine), then the WUT (wake-up timer) at 380us in continuous mode
- I2S + HPF for audio: I2S at 16kHz, mono left channel, RX FIFO threshold=4, and a first-order High Pass Filter with coefficient 0.995 to remove the DC offset (nominally a 100Hz cutoff; at a 16kHz sample rate this coefficient actually places the -3dB point near 13Hz)
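A quick check of the filter's cutoff, assuming the HPF is the standard first-order DC blocker y[n] = x[n] - x[n-1] + a*y[n-1] (an assumption, since the filter code is not listed in this write-up), with a = 0.995 and fs = 16kHz:

```latex
H(z) = \frac{1 - z^{-1}}{1 - a\,z^{-1}}, \qquad a = 0.995,\; f_s = 16\,\text{kHz}

|H(e^{j\omega_c})|^2 = \frac{2(1-\cos\omega_c)}{1 - 2a\cos\omega_c + a^2} = \frac{1}{2}
\;\Rightarrow\; \cos\omega_c = \frac{3 - a^2}{4 - 2a} \approx 0.9999876

f_c = \frac{\omega_c}{2\pi} f_s \approx \frac{(1-a)\,f_s}{2\pi} = \frac{0.005 \times 16000}{2\pi} \approx 12.7\ \text{Hz}
```

To put the cutoff at 100Hz with the same structure, the coefficient would need to be roughly a = 1 - 2*pi*100/16000, which is about 0.96.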
2.2 Audio Capture & Processing
MicReadChunk() reads 128 samples (one CHUNK) from the I2S FIFO. Each sample is:
- Extracted from 32-bit I2S word (18 MSB, right-shifted by 14)
- Passed through HPF() to remove DC offset
- Scaled to 8-bit unsigned (x4 / 256)
- Stored in circular buffer micBuff[16384]
- Included in the chunk's average absolute amplitude, which is used for voice activity detection
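The per-sample steps above can be sketched in plain C, off-target, as follows. This is not the project's MicReadChunk(): the helper names (mic_word_to_sample, hpf_step, scale_to_byte, chunk_avg_abs), the Q15 coefficient 32604 (0.995 * 32768), the saturation to int16 and to a signed byte, and averaging over the filtered 16-bit samples are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed HPF coefficient a = 0.995 in Q15 (0.995 * 32768 = 32604.16) */
#define HPF_A_Q15 32604

/* Filter state: previous input and previous output */
typedef struct {
    int32_t x_prev;
    int32_t y_prev;
} hpf_state_t;

/* Keep the 18 MSBs of a 32-bit I2S word (>> 14), saturated to int16 (assumption) */
int16_t mic_word_to_sample(int32_t word)
{
    int32_t s = word >> 14; /* arithmetic shift on mainstream compilers */
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* First-order DC blocker: y[n] = x[n] - x[n-1] + a * y[n-1] */
int16_t hpf_step(hpf_state_t *st, int16_t x)
{
    int32_t y = (int32_t)x - st->x_prev + ((HPF_A_Q15 * st->y_prev) >> 15);
    if (y > INT16_MAX) y = INT16_MAX;
    if (y < INT16_MIN) y = INT16_MIN;
    st->x_prev = x;
    st->y_prev = y;
    return (int16_t)y;
}

/* Scale by 4/256 and saturate to one signed byte for the CNN input buffer */
int8_t scale_to_byte(int16_t s)
{
    int32_t v = (int32_t)s * 4 / 256;
    if (v > 127) v = 127;
    if (v < -128) v = -128;
    return (int8_t)v;
}

/* Average absolute amplitude of one chunk of filtered samples */
uint16_t chunk_avg_abs(const int16_t *chunk, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (uint32_t)(chunk[i] < 0 ? -(int32_t)chunk[i] : chunk[i]);
    }
    return n ? (uint16_t)(sum / n) : 0;
}
```

Feeding a constant input into hpf_step() shows the DC-removal behavior: the first output equals the input, and every following output shrinks until it settles at 0.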
2.3 Voice Activity Detection
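A state machine with two active states:
- SILENCE (avg < 350): keeps sampling and waits for the chunk energy to reach the threshold
- KEYWORD (avg >= 350): collects samples and watches for the end of the word. Once at least one third of the 16384-sample window has been collected, more than 20 consecutive chunks with avg below 100 mark the end of the word, and the window is zero-padded to 16384 samples.
A PREAMBLE check ensures at least 3840 samples (30 x 128) have been collected before detection starts; these preamble samples are also copied in front of the word so that its onset is not cut off.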
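The detection logic above can be modelled in isolation with a small counter-only state machine. This is a simplified sketch, not the project's main-loop code: no audio is copied, the names vad_t and vad_feed_chunk are hypothetical, and the thresholds are the values quoted in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parameters as described in this section */
#define CHUNK 128u
#define SAMPLE_SIZE 16384u
#define PREAMBLE_SIZE (30u * CHUNK)
#define THRESHOLD_HIGH 350u
#define THRESHOLD_LOW 100u
#define SILENCE_COUNTER_THRESHOLD 20u

typedef enum { VAD_SILENCE = 0, VAD_KEYWORD } vad_state_t;

typedef struct {
    vad_state_t state;
    uint32_t sample_counter; /* samples seen since start-up */
    uint32_t collected;      /* samples counted toward the current window */
    uint32_t silence_chunks; /* consecutive quiet chunks */
    uint32_t zero_pad;       /* zeros appended to the last completed window */
} vad_t;

/* Feed the average of one CHUNK; returns true when a full window is ready for the CNN */
bool vad_feed_chunk(vad_t *v, uint16_t avg)
{
    v->sample_counter += CHUNK;
    if (v->sample_counter < PREAMBLE_SIZE) {
        return false; /* still filling the preamble */
    }
    if (v->state == VAD_SILENCE) {
        if (avg >= THRESHOLD_HIGH) {
            v->state = VAD_KEYWORD;
            v->collected = PREAMBLE_SIZE; /* preamble is part of the window */
            v->silence_chunks = 0;
        }
        return false;
    }
    v->collected += CHUNK;
    /* quiet chunks only count once a third of the window has been collected */
    if (avg < THRESHOLD_LOW && v->collected >= SAMPLE_SIZE / 3) {
        v->silence_chunks++;
    } else {
        v->silence_chunks = 0;
    }
    v->zero_pad = 0;
    if (v->silence_chunks > SILENCE_COUNTER_THRESHOLD) {
        v->zero_pad = SAMPLE_SIZE - v->collected; /* end of word: pad the rest */
        v->collected = SAMPLE_SIZE;
    }
    if (v->collected >= SAMPLE_SIZE) {
        v->state = VAD_SILENCE;
        v->collected = 0;
        v->silence_chunks = 0;
        return true;
    }
    return false;
}
```

With 30 quiet chunks, one loud chunk and then quiet chunks, the window completes on the 33rd quiet chunk: the first 12 do not count as silence because less than a third of the window has been collected, and 21 more are needed to exceed the limit of 20. The remaining 8320 samples are zero-padded.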
2.4 CNN Inference
Once 16384 samples are ready:
- AddTranspose() rearranges data into 128x128 matrix format (16 groups x 1KB)
- CNN clock enabled at 50MHz
- cnn_load_data() copies 16KB across 4 memory banks (0x50400000, 0x50800000, 0x50C00000, 0x51000000)
- cnn_start() triggers hardware inference
- Wait for cnn_time flag from ISR
- cnn_unload() reads 21 class outputs
- cnn_stop() disables CNN clock to save power
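The 16KB input is split into 16 groups of 1KB spread over the four base addresses listed above. A small sketch of how a group index could map to its destination, assuming four groups per bank spaced 0x8000 bytes apart; that spacing is an assumption not stated in this write-up, so check the generated cnn_load_data() before relying on it. cnn_group_addr and cnn_group_src_offset are hypothetical helpers.

```c
#include <stdint.h>

#define GROUP_BYTES 1024u    /* 16 KB input = 16 groups x 1 KB */
#define GROUPS_PER_BANK 4u
#define NUM_GROUPS 16u
#define GROUP_STRIDE 0x8000u /* assumed spacing between groups inside one bank */

static const uint32_t bank_base[4] = { 0x50400000u, 0x50800000u, 0x50C00000u, 0x51000000u };

/* Destination address of input group g (0..15); returns 0 for an invalid index */
uint32_t cnn_group_addr(unsigned g)
{
    if (g >= NUM_GROUPS) {
        return 0;
    }
    return bank_base[g / GROUPS_PER_BANK] + (g % GROUPS_PER_BANK) * GROUP_STRIDE;
}

/* Offset of group g inside the 16 KB transposed input buffer */
uint32_t cnn_group_src_offset(unsigned g)
{
    return g * GROUP_BYTES;
}
```

On the target, a loader would then copy GROUP_BYTES from the transposed buffer at cnn_group_src_offset(g) to cnn_group_addr(g) for each of the 16 groups.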
2.5 Classification & Mapping
- Softmax converts raw CNN outputs to probabilities (Q15 fixed-point)
- check_inference() finds top-1 class; accepts only if probability > 91%
- Keyword-to-Command mapping: UP->FRONT, DOWN->BACK, LEFT->LEFT, RIGHT->RIGHT, STOP->STOP, GO->GO
- Low confidence or "Unknown" class keeps the previous command unchanged
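The acceptance and mapping rules above can be expressed as one pure function, which makes them easy to test off-target. In this sketch, select_command is a hypothetical name, the 91% test is done in integer Q15 arithmetic, and the class order (UP, DOWN, LEFT, RIGHT, STOP, GO as classes 0-5, Unknown as class 20) follows the keyword list used by the main program.

```c
#include <stdint.h>

#define NUM_OUTPUTS 21
#define UNKNOWN_CLASS (NUM_OUTPUTS - 1)
#define PROB_THRESHOLD_PCT 91

enum { CMD_STOP = 0, CMD_GO = 1, CMD_FRONT = 2, CMD_BACK = 3, CMD_LEFT = 4, CMD_RIGHT = 5 };

/* Keyword classes 0..5 are UP, DOWN, LEFT, RIGHT, STOP, GO */
static const uint16_t kws_to_cmd[6] = { CMD_FRONT, CMD_BACK, CMD_LEFT, CMD_RIGHT, CMD_STOP, CMD_GO };

/* top_prob_q15: softmax output of the top class in Q15 (32768 == 100%) */
uint16_t select_command(int16_t top_class, int16_t top_prob_q15, uint16_t prev_cmd)
{
    /* accept only if 100 * p / 32768 > 91, evaluated without floating point */
    int confident = (int32_t)top_prob_q15 * 100 > (int32_t)PROB_THRESHOLD_PCT * 32768;
    if (!confident || top_class < 0 || top_class == UNKNOWN_CLASS) {
        return prev_cmd; /* low confidence or Unknown: keep the previous command */
    }
    if (top_class < 6) {
        return kws_to_cmd[top_class];
    }
    return prev_cmd; /* other keywords (YES, NO, digits, ...) are ignored */
}
```

Folding the confidence check into the mapping guarantees that a low-confidence STOP or GO cannot change the command, which is the behavior described above.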
2.6 Motor Control & Safety
Before any motor action, madrover(cmd) performs a Bump Check on P0.7:
- Triggered: forces an EMERGENCY STOP via motor_stop_all() (all motor GPIOs off), then returns to the main loop
- Clear: the command is executed as follows:
  - STOP (cmd=0): P2.3=P2.4=P1.1=P1.0=0
  - GO/FRONT (cmd=1,2): both tracks forward (A: P2.3=1 P2.4=0, B: P1.1=1 P1.0=0)
  - BACK (cmd=3): both tracks backward (A: P2.3=0 P2.4=1, B: P1.1=0 P1.0=1)
  - LEFT (cmd=4): pivot left (A backward, B forward)
  - RIGHT (cmd=5): pivot right (A forward, B backward)
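The same truth table can be written as data instead of a switch statement. This is a hypothetical, hardware-free sketch: drive_t and drive_for are illustrative names, lc/rc stand for the LC_MOTORx/RC_MOTORx pins (P2.3/P2.4 for track A, P1.1/P1.0 for track B), and on the board each field would be applied with LED_On/LED_Off.

```c
#include <stdbool.h>
#include <stdint.h>

/* Logic levels for one track: lc = forward pin, rc = backward pin */
typedef struct { uint8_t lc, rc; } track_t;
typedef struct { track_t a, b; } drive_t;

#define FWD { 1, 0 }
#define BWD { 0, 1 }
#define OFF { 0, 0 }

/* Indexed by command: STOP, GO, FRONT, BACK, LEFT, RIGHT */
static const drive_t drive_table[6] = {
    { OFF, OFF }, /* STOP */
    { FWD, FWD }, /* GO */
    { FWD, FWD }, /* FRONT */
    { BWD, BWD }, /* BACK */
    { BWD, FWD }, /* LEFT: A backward, B forward */
    { FWD, BWD }, /* RIGHT: A forward, B backward */
};

/* Pin levels for a command; a triggered bump sensor or an unknown command forces STOP */
drive_t drive_for(uint16_t cmd, bool bump_triggered)
{
    if (bump_triggered || cmd >= 6) {
        cmd = 0;
    }
    return drive_table[cmd];
}
```

Keeping the levels in a table makes it easy to compare the code against the pin list above and to check that every command leaves each track with exactly one active input or none.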
3 Hardware
3.1 MAX78000FTHR - AI Microcontroller Board
Role in project: KWS inference, motor control, bump detection

The MAX78000FTHR is the central processing unit of the Deaf Rover. It is an ultra-low-power AI microcontroller board from Analog Devices (formerly Maxim Integrated) in a Feather-compatible form factor. The MAX78000 integrates three processing elements: an ARM Cortex-M4F at up to 100MHz for application logic, a 32-bit RISC-V coprocessor for data movement, and a dedicated hardware CNN accelerator with 64 parallel 8-bit processors capable of 28.8 GOPS. The MCU provides 512KB of flash and 128KB of SRAM, plus separate memories for the CNN weights and data; the board adds an on-board MEMS microphone connected via I2S, and the device supports multiple low-power modes. The CNN accelerator completes an inference in milliseconds and its clock can then be switched off, which makes battery-powered, always-on keyword spotting practical.
The on-board microphone is the project's audio source for voice detection.
3.2 H-Bridge Motor Driver Module

The H-Bridge motor driver module controls the two DC motors for the tracked chassis. It uses an L293D 16-pin DIP IC that provides dual H-bridge outputs for bidirectional motor control. The module accepts 2-10V DC input for motor power and TTL-level logic inputs from the MAX78000 GPIO pins. It has two 100uF electrolytic capacitors for power supply filtering and includes flyback protection diodes. The module features two 2-pin JST connectors (MOTOR-A and MOTOR-B) for motor output and a 5-pin header (IN1-IN4, GND) for control signals.
3.3 MH-B IR Obstacle Detection Sensor
Role in project: emergency stop when an obstacle is detected

The MH-B IR obstacle sensor provides collision detection for the Deaf Rover. It operates using active infrared reflection: an IR LED emits infrared light, and when an object is within range, the reflected light is detected by an IR photodiode. An LM393 dual comparator IC processes the signal against a threshold set by the onboard 10k ohm potentiometer. The module outputs a digital HIGH/LOW signal on its OUT pin. It has two indicator LEDs: a power indicator and an output indicator that lights when an obstacle is detected. The module connects via a 3-pin header (VCC, GND, OUT) and operates on 3.3V-5V DC with a detection range of approximately 2-30cm, adjustable via the potentiometer.
3.4 LiPo Battery Pack
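A single-cell Lithium Polymer (LiPo) pouch cell provides portable power for the Deaf Rover. Its nominal 3.7V (about 4.2V when fully charged) is above the 2.0-3.6V range quoted for the board's logic supply, so the cell feeds the MAX78000FTHR through the board's battery input and on-board power management rather than directly into its 3.3V rail. The H-Bridge motor driver is powered through a DC-DC boost converter that steps the cell voltage up to 5V for motor operation.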
3.5 Materials Summary
1. MAX78000FTHR Board (x1) - AI MCU + CNN + on-board mic + GPIO
2. H-Bridge Motor Driver, L293D (x1) - Dual H-bridge for 2 DC motors
3. MH-B IR Obstacle Sensor (x1) - Bump/collision detection
4. Tracked Chassis + 2x DC Motors (x1) - Mobile robot base with differential drive
5. LiPo Battery, 3.7V (x1) - Portable power source
6. USB-C Cable (x1) - Programming and debug via SWD
4 Software & Program Flow
4.1 Initialization
The main() function follows a structured initialization sequence before entering the main processing loop:
Step 1: System Clock Configuration
The clock source is selected by the CLOCK_SOURCE macro (default: IPO at 100MHz), and the instruction cache is enabled via MXC_ICC_Enable(MXC_ICC0) for improved performance.
Step 2: CNN Accelerator Setup
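The CNN accelerator is enabled with a 50MHz clock (PCLK, divider 1) and the CNN boost is switched on through P2.5. The weights are loaded and the CNN state machine is configured, the wake-up timer (WUT) is set to continuous mode when WUT_ENABLE is defined, and the CNN clock is then disabled until the first inference.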
Step 3: I2S and Microphone Initialization
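The I2SInit() function sets up the audio interface for the on-board microphone: 16kHz sample rate, mono left channel, and an RX FIFO threshold of 4. On the FTHR board, the microphone is powered earlier in main() with Microphone_Power(POWER_ON).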
4.2 Main Processing Loop
The main loop implements a state machine for keyword detection with three states:
- STOP: initial/reset state; no processing, waits for activation
- SILENCE (avg < THRESHOLD_HIGH): continuously samples the mic and waits for voice
- KEYWORD (avg >= THRESHOLD_HIGH): collects samples and runs CNN inference
4.3 Pin Definitions and Hardware Mapping
The motor control system uses GPIO pins defined in board.h, led.h, and pb.h.
The mapping is as follows:
- Bump Detect: P0.7, BUMP_ALERT (PB[2]), digital input
- Motor A (left track): P2.3 (LC_MOTORA), P2.4 (RC_MOTORA), digital outputs
- Motor B (right track): P1.1 (LC_MOTORB), P1.0 (RC_MOTORB), digital outputs
4.4 Motor Control Functions
The motor control is implemented through the LED API (LED_On/LED_Off): the motor pins are defined as LED outputs in the board files, so these calls set the GPIO states directly:
motor_stop_all()
Turns off all motor control signals. Both tracks stop. Used for STOP command and safety halt.
motor_a_forward() / motor_a_backward()
Controls the left track. Forward sets LC_MOTORA=1, RC_MOTORA=0. Backward sets LC_MOTORA=0, RC_MOTORA=1.
motor_b_forward() / motor_b_backward()
Controls the right track. Forward sets LC_MOTORB=1, RC_MOTORB=0. Backward sets LC_MOTORB=0, RC_MOTORB=1.
4.5 Command Mapping and madrover() Logic
The madrover() function maps voice commands to motor actions. It first checks the bump sensor (P0.7) for safety and only then drives the motors. The motor helpers and madrover() are:
/* Motor helpers: the motor pins are driven through the LED API */
void motor_stop_all(void)
{
    LED_Off(LC_MOTORA);
    LED_Off(RC_MOTORA);
    LED_Off(LC_MOTORB);
    LED_Off(RC_MOTORB);
}

/* Motor A = left track */
void motor_a_forward(void)
{
    LED_On(LC_MOTORA);
    LED_Off(RC_MOTORA);
}

void motor_a_backward(void)
{
    LED_Off(LC_MOTORA);
    LED_On(RC_MOTORA);
}

/* Motor B = right track */
void motor_b_forward(void)
{
    LED_On(LC_MOTORB);
    LED_Off(RC_MOTORB);
}

void motor_b_backward(void)
{
    LED_Off(LC_MOTORB);
    LED_On(RC_MOTORB);
}

/* Dispatch a rover command; the bump sensor overrides every command */
void madrover(uint16_t command)
{
    if (PB_Get(BUMP_ALERT) == 1) {
        motor_stop_all();
        printf("\n!!!!!! BUMP_ALERT is Trigger!!!!!!!!!!!!!\n");
        printf("v_commands: %s (FORCED STOP)\n", v_commands[0]);
        printf("----------------------------------------\n");
        return;
    }
    /* out-of-range commands fall back to STOP */
    if (command >= NUMBEROFCOMMONDS) {
        command = 0;
    }
    switch (command) {
    case 0: // STOP
        motor_stop_all();
        break;
    case 1: // GO
    case 2: // FRONT
        motor_a_forward();
        motor_b_forward();
        break;
    case 3: // BACK
        motor_a_backward();
        motor_b_backward();
        break;
    case 4: // LEFT: pivot, left track backward, right track forward
        motor_a_backward();
        motor_b_forward();
        break;
    case 5: // RIGHT: pivot, left track forward, right track backward
        motor_a_forward();
        motor_b_backward();
        break;
    default:
        motor_stop_all();
        break;
    }
    printf("v_commands: %s\n", v_commands[command]);
    printf("----------------------------------------\n");
}
4.6 Bump Sensor Safety Logic
The bump sensor on P0.7 (BUMP_ALERT) provides collision detection. The logic is:
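1. Before executing any command, check PB_Get(BUMP_ALERT)
2. If the bump sensor is triggered (returns 1), immediately call motor_stop_all()
3. Print the "BUMP_ALERT is Trigger" message and force the command to STOP
4. The check runs twice for every recognized word: once in main() right after CNN inference, and again inside madrover() just before the motors are driven
Because both checks are tied to keyword processing, a rover that is already moving is only stopped by the sensor once the next word has been classified; polling BUMP_ALERT on every audio chunk in the main loop would close that gap.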
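A minimal sketch of per-chunk polling, assuming it is called on every pass through the main loop with the result of PB_Get(BUMP_ALERT) == 1. rover_state_t and bump_poll are hypothetical names, and the function only decides whether to stop, so it can be tested without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_STOP 0

typedef struct {
    uint16_t current_cmd;  /* last command sent to madrover() */
    uint32_t forced_stops; /* how many times the sensor forced a stop */
} rover_state_t;

/* Call once per audio chunk with the sensor state (PB_Get(BUMP_ALERT) == 1 on the board).
 * Returns true when the caller should call motor_stop_all() now. */
bool bump_poll(rover_state_t *r, bool bump_triggered)
{
    if (bump_triggered && r->current_cmd != CMD_STOP) {
        r->current_cmd = CMD_STOP; /* latch STOP so the rover stays halted */
        r->forced_stops++;
        return true;
    }
    return false;
}
```

On the board, the caller would run motor_stop_all() whenever bump_poll() returns true and update current_cmd whenever madrover() is called. Because STOP is latched, the rover stays halted after the obstacle clears until madrover() receives a new command.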
4.7 Main Program Code
int main(void)
{
    uint16_t command = 0; // COMMAND for ROVER
    uint32_t sampleCounter = 0;
    mxc_tmr_unit_t units;
    uint8_t pChunkBuff[CHUNK];
    uint16_t avg = 0;
    uint16_t ai85Counter = 0;
    uint16_t wordCounter = 0;
    uint16_t avgSilenceCounter = 0;
    mic_processing_state procState = STOP;
#if defined(BOARD_FTHR_REVA)
    // Wait for PMIC 1.8V to become available, about 180ms after power up.
    MXC_Delay(200000);
#endif
    /* Enable cache */
    MXC_ICC_Enable(MXC_ICC0);
    switch (CLOCK_SOURCE) {
    case 0:
        MXC_SYS_ClockSourceEnable(MXC_SYS_CLOCK_IPO);
        MXC_SYS_Clock_Select(MXC_SYS_CLOCK_IPO);
        MXC_GCR->pm &= ~MXC_F_GCR_PM_IPO_PD; // enable IPO during sleep
        break;
    case 1:
        MXC_SYS_ClockSourceEnable(MXC_SYS_CLOCK_ISO);
        MXC_SYS_Clock_Select(MXC_SYS_CLOCK_ISO);
        MXC_GCR->pm &= ~MXC_F_GCR_PM_ISO_PD; // enable ISO during sleep
        break;
    case 2:
        MXC_SYS_ClockSourceEnable(MXC_SYS_CLOCK_IBRO);
        MXC_SYS_Clock_Select(MXC_SYS_CLOCK_IBRO);
        MXC_GCR->pm &= ~MXC_F_GCR_PM_IBRO_PD; // enable IBRO during sleep
        break;
    default:
        printf("UNKNOWN CLOCK SOURCE \n");
        while (1) {}
    }
    SystemCoreClockUpdate();
#ifdef ENABLE_MIC_PROCESSING
#if defined(ENABLE_CODEC_MIC)
    codec_init();
#elif defined(BOARD_FTHR_REVA)
    /* Enable microphone power on Feather board only if codec is not enabled */
    Microphone_Power(POWER_ON);
#endif
#endif
    // Initialize UART
    console_UART_init(CON_BAUD);
    //LED_Init();
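    /* Enable peripheral, enable CNN interrupt, turn on CNN clock */
    /* CNN clock: 50 MHz div 1 */
    cnn_enable(MXC_S_GCR_PCLKDIV_CNNCLKSEL_PCLK, MXC_S_GCR_PCLKDIV_CNNCLKDIV_DIV1);
    /* Configure P2.5, turn on the CNN Boost */
    cnn_boost_enable(MXC_GPIO2, MXC_GPIO_PIN_5);
    /* Bring the CNN state machine into a consistent state, then load the weights
     * and bias and configure the layers (functions generated by ai8x-synthesis) */
    cnn_init();
    cnn_load_weights();
    cnn_load_bias();
    cnn_configure();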
    PR_INFO("\nDeaf Rover ......\n");
    PR_INFO("\n***** Init *****\n");
    memset(pAI85Buffer, 0x0, sizeof(pAI85Buffer));
    //PR_DEBUG("pChunkBuff: %d\n", sizeof(pChunkBuff));
    //PR_DEBUG("pAI85Buffer: %d\n", sizeof(pAI85Buffer));
#if SLEEP_MODE == 1
    NVIC_EnableIRQ(CNN_IRQn);
#endif
#ifdef WUT_ENABLE
    // Get ticks based off of microseconds
    mxc_wut_cfg_t cfg;
    uint32_t ticks;
    MXC_WUT_GetTicks(WUT_USEC, MXC_WUT_UNIT_MICROSEC, &ticks);
    // Config structure for a continuous timer that fires every WUT_USEC microseconds
    cfg.mode = MXC_WUT_MODE_CONTINUOUS;
    cfg.cmp_cnt = ticks;
    // Init WUT
    MXC_WUT_Init(MXC_WUT_PRES_1);
    // Config WUT
    MXC_WUT_Config(&cfg);
    MXC_LP_EnableWUTAlarmWakeup();
    NVIC_EnableIRQ(WUT_IRQn);
#endif
    /* Disable CNN clock */
    MXC_SYS_ClockDisable(MXC_SYS_PERIPH_CLOCK_CNN);
    /* switch to silence state */
    procState = SILENCE;
#ifdef ENABLE_MIC_PROCESSING
    /* initialize I2S interface to Mic */
    I2SInit();
#endif
    PR_INFO("\n*** READY ***\n");
#ifdef WUT_ENABLE
    MXC_WUT_Enable(); // Start WUT
#endif
    /* Read samples */
    while (1) {
#ifndef ENABLE_MIC_PROCESSING
        /* end of test vectors */
        if (sampleCounter >= sizeof(voiceVector) / sizeof(voiceVector[0])) {
            PR_DEBUG("End of test Vector\n");
            break;
        }
#endif
        /* Read from Mic driver to get CHUNK worth of samples, otherwise next sample */
        if (MicReadChunk(&avg) == 0) {
#ifdef WUT_ENABLE
#ifdef ENERGY
            // keep LED on for about 10sec for energy measurement
            if (tot_usec > 10 * 1000 * 1000) {
                LED_Off(LED1);
                tot_usec = -10000000; // wait for 10sec before measuring again
            } else if (tot_usec > 0) {
                LED_On(LED1);
            }
#endif
#endif
#if SLEEP_MODE == 1
            __WFI();
#elif SLEEP_MODE == 2
#ifdef WUT_ENABLE
            MXC_LP_ClearWakeStatus();
            SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; // SLEEPDEEP=1
            __WFI();
#endif
#endif // #if SLEEP_MODE == 1
            continue;
        }
        sampleCounter += CHUNK;
        /* wait for at least PREAMBLE_SIZE samples before detecting the utterance */
        if (sampleCounter < PREAMBLE_SIZE)
            continue;
#ifdef ENABLE_SILENCE_DETECTION // disable to start collecting data immediately.
        /* Display average envelope as a bar */
#ifdef ENABLE_PRINT_ENVELOPE
        PR_DEBUG("%.6d|", sampleCounter);
        for (int i = 0; i < avg / 10; i++) {
            PR_DEBUG("=");
        }
        if (avg >= thresholdHigh) {
            PR_DEBUG("*");
        }
        PR_DEBUG("[%d]\n", avg);
#endif
        /* if we have not detected voice, check the average */
        if (procState == SILENCE) {
            /* compute average, proceed if greater than threshold */
            if (avg >= thresholdHigh) {
                /* switch to keyword data collection */
                procState = KEYWORD;
                /* record the average and index of the beginning of the word */
                utteranceAvg = avg;
                utteranceIndex = micBufIndex;
                ai85Counter += PREAMBLE_SIZE;
                continue;
            }
        }
        /* if it is in data collection, add samples to buffer */
        else if (procState == KEYWORD)
#endif //#ifdef ENABLE_SILENCE_DETECTION
        {
            uint8_t ret = 0;
            /* increment number of stored samples */
            ai85Counter += CHUNK;
            /* if there is silence after at least 1/3 of samples passed, increment the
             * number of back-to-back silent chunks to find the end of the keyword */
            if ((avg < thresholdLow) && (ai85Counter >= SAMPLE_SIZE / 3)) {
                avgSilenceCounter++;
            } else {
                avgSilenceCounter = 0;
            }
            /* if this is the last sample and there are not enough samples to
             * feed to CNN, or if it is long silence after keyword, append with zero (for reading file)
             */
#ifndef ENABLE_MIC_PROCESSING
            if (((ai85Counter < SAMPLE_SIZE) &&
                 (sampleCounter >= sizeof(voiceVector) / sizeof(voiceVector[0]) - 1)) ||
                (avgSilenceCounter > SILENCE_COUNTER_THRESHOLD))
#else
            if (avgSilenceCounter > SILENCE_COUNTER_THRESHOLD)
#endif
            {
                memset(pChunkBuff, 0, CHUNK);
                zeroPad = SAMPLE_SIZE - ai85Counter;
                ai85Counter = SAMPLE_SIZE;
            }
            /* if enough samples are collected, start CNN */
            if (ai85Counter >= SAMPLE_SIZE) {
                int16_t out_class = -1;
                double probability = 0;
                /* end of the utterance */
                int endIndex =
                    (utteranceIndex + SAMPLE_SIZE - PREAMBLE_SIZE - zeroPad) % SAMPLE_SIZE;
                //PR_DEBUG("Word starts from index %d to %d, padded with %d zeros, avg:%d > %d \n", utteranceIndex, endIndex, zeroPad, utteranceAvg, thresholdHigh);
                // zero padding
                memset(pChunkBuff, 0, CHUNK);
                /* PREAMBLE copy */
                if (utteranceIndex - PREAMBLE_SIZE >= 0) {
                    if (AddTranspose((uint8_t *)&micBuff[utteranceIndex - PREAMBLE_SIZE],
                                     pAI85Buffer, PREAMBLE_SIZE, SAMPLE_SIZE, TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                } else {
                    /* copy oldest samples to the beginning */
                    if (AddTranspose(
                            (uint8_t *)&micBuff[SAMPLE_SIZE - PREAMBLE_SIZE + utteranceIndex],
                            pAI85Buffer, PREAMBLE_SIZE - utteranceIndex, SAMPLE_SIZE,
                            TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                    /* copy latest samples afterwards */
                    if (AddTranspose((uint8_t *)&micBuff[0], pAI85Buffer, utteranceIndex,
                                     SAMPLE_SIZE, TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                }
                /* Utterance copy */
                if (utteranceIndex < endIndex) {
                    /* copy from utterance to the end */
                    if (AddTranspose((uint8_t *)&micBuff[utteranceIndex], pAI85Buffer,
                                     endIndex - utteranceIndex, SAMPLE_SIZE, TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                    // copy zero padding
                    while (!ret) {
                        ret = AddTranspose(pChunkBuff, pAI85Buffer, CHUNK, SAMPLE_SIZE,
                                           TRANSPOSE_WIDTH);
                    }
                } else {
                    /* copy from utterance to the end of the circular buffer */
                    if (AddTranspose((uint8_t *)&micBuff[utteranceIndex], pAI85Buffer,
                                     SAMPLE_SIZE - utteranceIndex, SAMPLE_SIZE, TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                    /* copy from beginning */
                    if (AddTranspose((uint8_t *)&micBuff[0], pAI85Buffer, endIndex, SAMPLE_SIZE,
                                     TRANSPOSE_WIDTH)) {
                        PR_DEBUG("ERROR: Transpose ended early \n");
                    }
                    // copy zero padding
                    while (!ret) {
                        ret = AddTranspose(pChunkBuff, pAI85Buffer, CHUNK, SAMPLE_SIZE,
                                           TRANSPOSE_WIDTH);
                    }
                }
                /* reset counters */
                ai85Counter = 0;
                avgSilenceCounter = 0;
                zeroPad = 0; /* clear so a later full-length word is not cut short */
                /* new word */
                wordCounter++;
                /* change state to silence */
                procState = SILENCE;
                /* sanity check, last transpose should have returned 1, as enough samples should have already been added */
                if (ret != 1) {
                    PR_DEBUG("ERROR: Transpose incomplete!\n");
                    fail();
                }
                //---------------------------------- : invoke AI85 CNN
                //PR_DEBUG("%.6d: Starts CNN: %d\n", sampleCounter, wordCounter);
                /* enable CNN clock */
                MXC_SYS_ClockEnable(MXC_SYS_PERIPH_CLOCK_CNN);
                /* load to CNN */
                if (!cnn_load_data(pAI85Buffer)) {
                    PR_DEBUG("ERROR: Loading data to CNN! \n");
                    fail();
                }
                /* Start CNN */
                if (!cnn_start()) {
                    PR_DEBUG("ERROR: Starting CNN! \n");
                    fail();
                }
#if SLEEP_MODE == 0
                /* Wait for CNN to complete */
                while (cnn_time == 0) {
                    __WFI();
                }
#elif SLEEP_MODE == 1
                while (cnn_time == 0) {
                    __WFI();
                }
#elif SLEEP_MODE == 2
                SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; // SLEEPDEEP=1
                while (cnn_time == 0) {
#ifdef WUT_ENABLE
                    MXC_LP_ClearWakeStatus();
                    __WFI();
#endif
                }
#endif // #if SLEEP_MODE==0
                /* Read CNN result */
                cnn_unload((uint32_t *)ml_data);
                /* Stop CNN */
                cnn_stop();
                /* Disable CNN clock to save power */
                MXC_SYS_ClockDisable(MXC_SYS_PERIPH_CLOCK_CNN);
                /* Get time */
                MXC_TMR_GetTime(MXC_TMR0, cnn_time, (void *)&cnn_time, &units);
                //PR_DEBUG("%.6d: Completes CNN: %d\n", sampleCounter, wordCounter);
                switch (units) {
                case TMR_UNIT_NANOSEC:
                    cnn_time /= 1000;
                    break;
                case TMR_UNIT_MILLISEC:
                    cnn_time *= 1000;
                    break;
                case TMR_UNIT_SEC:
                    cnn_time *= 1000000;
                    break;
                default:
                    break;
                }
                PR_DEBUG("CNN Time: %d us || ", cnn_time);
                /* run softmax */
                softmax_q17p14_q15((const q31_t *)ml_data, NUM_OUTPUTS, ml_softmax);
#ifdef ENABLE_CLASSIFICATION_DISPLAY
                PR_DEBUG("\nClassification results:\n");
                for (int i = 0; i < NUM_OUTPUTS; i++) {
                    int digs = (1000 * ml_softmax[i] + 0x4000) >> 15;
                    int tens = digs % 10;
                    digs = digs / 10;
                    PR_DEBUG("[%+.7d] -> Class %.2d %8s: %d.%d%%\n", ml_data[i], i, keywords[i],
                             digs, tens);
                }
#endif
                /* find detected class with max probability */
                ret = check_inference(ml_softmax, ml_data, &out_class, &probability);
                //PR_DEBUG("----------------------------------------- \n");
                /* Treat low confidence detections as unknown */
                if (!ret || out_class == NUM_OUTPUTS - 1) {
                    PR_DEBUG("Detected word: %s", "Unknown\n");
                } else {
                    PR_DEBUG("Detected word: %s (%0.1f%%)\n", keywords[out_class], probability);
                }
                //PR_DEBUG("\n----------------------------------------- \n");
                //v_commands[NUMBEROFCOMMONDS][5] = {"STOP", "GO", "FRONT", "BACK", "LEFT", "RIGHT", };
                //keywords[NUM_OUTPUTS][10] = { "UP", "DOWN", "LEFT", "RIGHT", "STOP", "GO", "YES", "NO", "ON", "OFF", "ONE", "TWO",
                //                              "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE", "ZERO", "Unknown" };
                /* Stop the rover first if the bump sensor is triggered */
                if (PB_Get(BUMP_ALERT) == 1) {
                    command = 0;
                    printf("\n!!!!!! BUMP_ALERT is Trigger!!!!!!!!!!!!!\n");
                    madrover(command);
                }
                /* Low-confidence or "Unknown" results keep the previous command */
                if (!ret || out_class == NUM_OUTPUTS - 1) {
                    out_class = -1;
                }
                /* Map keyword class to rover command */
                switch (out_class) {
                case 4: // STOP -> STOP
                    command = 0;
                    break;
                case 5: // GO -> GO
                    command = 1;
                    break;
                case 0: // UP -> FRONT
                    command = 2;
                    break;
                case 1: // DOWN -> BACK
                    command = 3;
                    break;
                case 2: // LEFT -> LEFT
                    command = 4;
                    break;
                case 3: // RIGHT -> RIGHT
                    command = 5;
                    break;
                default:
                    printf("Non_kws, keep command: %d and ", command);
                    break;
                }
                madrover(command); // drive the rover; madrover() re-checks the bump sensor
                Max = 0;
                Min = 0;
            } // end of CNN inference block
        }     // end of KEYWORD state
    }         // end of main loop
    //------------------------------------------------------------
    while (1) {}
}
4.8 Build and Run the Program
Build the project and download it to the board.

Run the code and watch the feedback in a UART terminal (115200 baud).
5 Hardware Assembly and Testing
5.1 Hardware Assembly

The hardware assembly consists of:
1. MAX78000FTHR board - Main AI controller with on-board microphone
2. H-Bridge Motor Driver - Controls two DC motors for track movement
3. LiPo Battery - Power source (3.7V nominal)
4. IR Obstacle Sensor - Connected to P0.7 for bump detection
5. Tracked Chassis - Two independent tracks with DC motors
5.2 Test with Tracked Chassis

6 AI Model Training and Deployment
6.1 Model Architecture (Brief)
The KWS model is a compact CNN optimized for the MAX78000 hardware constraints:
⦁ Input: 16384 raw 8-bit audio samples (about 1 second at 16kHz), arranged as a 128x128 array
⦁ Convolutional layers with quantized 8-bit weights
⦁ Output: 21 classes (20 keywords + Unknown)
⦁ Model trained with ai8x-training framework with quantization-aware training
6.2 Training Pipeline
The model was trained using the following pipeline:
1. Dataset: Google Speech Commands v2 (35 keywords, 105K samples)
2. Subset: 20 keywords (plus an Unknown class), of which six (UP, DOWN, LEFT, RIGHT, STOP, GO) are used for rover control
3. Augmentation: Background noise, time shift, speed perturbation
4. Quantization: 8-bit weights and activations for MAX78000
5. Compilation: ai8x-synthesis generates C code for CNN accelerator
6.3 Performance Characteristics
Based on the code parameters and MAX78000 specifications:
⦁ Inference time: ~1-2ms per keyword (measured via cnn_time)
⦁ Power consumption: Sub-mW during inference (ultra-low-power CNN)
⦁ Accuracy: ~90%+ on clean speech, lower with noise (hence "Deaf" Rover)
⦁ Latency: ~1 second from speech to motor action (includes sample collection)
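The latency figure follows from the buffer sizes used in this project. A small helper, assuming the 16kHz I2S rate, converts sample counts into microseconds; samples_to_us is a hypothetical name and the constants are the values used in the main program.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ 16000u
#define CHUNK 128u
#define SAMPLE_SIZE 16384u
#define PREAMBLE_SIZE (30u * CHUNK)
#define SILENCE_COUNTER_THRESHOLD 20u

/* Duration of n samples in microseconds at the 16 kHz I2S rate */
uint32_t samples_to_us(uint32_t n)
{
    return (uint32_t)((uint64_t)n * 1000000u / SAMPLE_RATE_HZ);
}
```

This gives 1024ms for the full 16384-sample window, 8ms per 128-sample chunk, 240ms of preamble, and 168ms of trailing silence (21 chunks, one more than the threshold of 20) before an utterance is considered finished, which is why the end-to-end latency lands near one second.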
6.4 Two CLI Tools for Model Training and Synthesis for the MAX78000
Download the two toolsets, ai8x-training and ai8x-synthesis, from GitHub.


Then run the provided scripts under \Script to train or synthesize the different models.
A custom model can be built if a custom dataset is prepared; the README explains the procedure clearly.
7 Conclusion and Future Work
7.1 Current Status
The Deaf Rover demonstrates voice-controlled navigation using the MAX78000's CNN accelerator for keyword spotting. The system successfully:
⦁ Recognizes six keywords (UP, DOWN, LEFT, RIGHT, STOP, GO) and maps them to the movement commands FRONT, BACK, LEFT, RIGHT, STOP, and GO
⦁ Controls two DC motors via H-bridge driver
⦁ Implements safety stop via bump sensor
⦁ Operates on battery power with ultra-low-power CNN inference

It works with voice commands.

7.2 Known Limitations
⦁ Occasionally misses commands (hence the name "Deaf Rover")
⦁ May move randomly due to false positives or misclassification
⦁ No visual object detection in current phase
7.3 Next Phase Vision
Future development will add:
1. Visual AI for tennis ball detection using camera input
2. Autonomous pathfinding with obstacle avoidance
3. Pick-up mechanism for tennis ball collection
4. Integration with a more powerful MPU (e.g., Arduino Uno Q) for complex tasks