Voice Detection on ESP-32

bcerjan · Jul 16, 2023

I've been working on making an offline voice timer using the ESP32 for voice recognition and the RP2040 for everything else (a little display, sounds, ...). I've hit a bit of a wall with the speech recognition portion, and I was wondering if anyone has suggestions or a better-performing method for speech detection.

Knowing very little about machine learning, I followed along with the Google Colab documentation for training 'tflite' models for microcontrollers, but adjusted it to detect the words marvin (as a wake word), stop, and the digits 0-9. I've tooled around with the settings and implemented the methods from a few papers (e.g. this one) to try to improve performance, but I'm typically limited to ~85% accuracy on a reduced set of words (marvin, stop, 0-3 and 5). When I modify the example code for the ESP32 (from the espressif repo), I can see that it is now trying to detect the correct words, but with very low confidence (typically ~130 on the scale it uses), and I am fairly certain it would never "work" in any real sense.
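To make the "low confidence" problem concrete: keyword-spotting demos of this kind typically average each word's score over the last few inferences and only report a detection when the averaged top score crosses a threshold. The sketch below is a rough illustration of that idea, not the code from the espressif repo. The class name `ScoreSmoother`, the 0-255 score scale, the threshold of 200, and the window size are all assumptions chosen for the example.

```python
from collections import deque


class ScoreSmoother:
    """Average per-word scores over the last `window` inferences.

    Hypothetical helper for illustration; the scale and threshold
    defaults are assumptions, not values from any specific repo.
    """

    def __init__(self, labels, window=5, threshold=200):
        self.labels = list(labels)
        self.threshold = threshold
        # deque with maxlen drops the oldest frame automatically
        self.history = deque(maxlen=window)

    def update(self, scores):
        """Add one inference result; return (label, avg_score, detected)."""
        if len(scores) != len(self.labels):
            raise ValueError("expected one score per label")
        self.history.append(list(scores))
        n = len(self.history)
        # Column-wise mean across the stored frames
        averages = [sum(col) / n for col in zip(*self.history)]
        best = max(range(len(averages)), key=averages.__getitem__)
        detected = averages[best] >= self.threshold
        return self.labels[best], averages[best], detected


if __name__ == "__main__":
    smoother = ScoreSmoother(["silence", "marvin", "stop"], window=3)
    # Made-up scores resembling the ~130 confidence described above
    for frame in ([20, 130, 10], [15, 135, 12], [25, 125, 8]):
        label, score, detected = smoother.update(frame)
    print(label, score, detected)  # marvin 130.0 False
```

Under these assumed numbers, the right word wins every frame but never clears the threshold, which matches the symptom described: the model points at the correct class without ever triggering a detection.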

Does anyone have any suggestions about how to improve accuracy? I also tried using ESP-Skainet, but I couldn't get it to recognize any commands (though I am not sure it was receiving audio correctly).

Lhimo · Nov 23, 2023

might be due to this?
