element14 Community
  • Author: cstanton
  • Date Created: 4 Feb 2026 1:01 PM
  • Last Updated: 5 Feb 2026 9:48 AM
  • Views: 5564
  • Likes: 6
  • Comments: 18

How Voice Recognition Works on Raspberry Pi (and Why It’s Easy to Break) -- Episode 700

Lorraine builds a voice-locked prize box using a Raspberry Pi, servo, microphone and speaker recognition. A hands-on project exploring voice authentication, hardware design, and how easy systems are to break.


Reviewing the Fort Vox Voice Box Project

The Fort Vox Voice Box is a practical exploration of voice biometrics framed as a hands-on outreach project. Lorraine’s aim is clear from the outset: build a physical box that can only be unlocked using a spoken voice password, and use it as a tool for linguistics and computing outreach with children. The result sits deliberately at the intersection of hardware hacking, applied machine learning, and human behaviour.

Lorraine introduces the concept succinctly: a locked box, a treat inside, and a spoken phrase as the key. As she puts it, it is “a box that’s locked with a password but it’s a voice password,” designed so children can actively try to defeat it during outreach events. The emphasis is not on security in the commercial sense, but on transparency and learning. The electronics are intentionally visible, and the mechanics are simple enough to provoke curiosity rather than mystique.


Concept and System Breakdown

Early in the project, Lorraine breaks the system down into its core components: microphone, speaker, screen, button, servo, and a Raspberry Pi as the controller. This early sketching phase is important, because it reveals one of the recurring themes of the build: reducing complexity by choosing parts that can serve multiple roles.

The decision to use a Seeed ReSpeaker HAT is a good example. It combines microphone input, speaker output, onboard buttons, and LEDs in a single board, reducing wiring and setup effort. Lorraine notes that it “looks perfect actually for what I need,” largely because it avoids juggling separate audio components. This choice also shapes later software decisions, including how audio devices are discovered and selected in code.
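
As a rough illustration of what selecting the HAT's microphone in code might involve, the sketch below picks a capture device by a fragment of its name. It is not taken from the project script. The dictionary keys mirror the device-info dictionaries that PyAudio returns, while the function name `find_input_device`, the example device list, and the name fragment "seeed" are assumptions made for this example.

```python
def find_input_device(devices, name_fragment):
    """Return the index of the first capture-capable device whose name matches.

    `devices` is a list of dictionaries with "index", "name" and
    "maxInputChannels" keys. Returns None if nothing matches.
    """
    fragment = name_fragment.lower()
    for info in devices:
        has_inputs = info.get("maxInputChannels", 0) > 0
        if fragment in info["name"].lower() and has_inputs:
            return info["index"]
    return None


# Hypothetical device list; real names depend on the installed driver.
example_devices = [
    {"index": 0, "name": "bcm2835 Headphones", "maxInputChannels": 0},
    {"index": 1, "name": "seeed-2mic-voicecard", "maxInputChannels": 2},
]

mic_index = find_input_device(example_devices, "seeed")
print("Selected input device index:", mic_index)
```

Matching on a name fragment rather than a fixed index is a design choice worth considering here, since device indices can change between boots or when other audio hardware is attached.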

From a physical design perspective, the box itself is as important as the electronics. The linguistics team’s requirement that the box be transparent drives several later decisions: how the Raspberry Pi is mounted, how the servo-driven locking mechanism is designed, and how the prize is separated from the electronics. Lorraine repeatedly returns to the idea that children should be able to see how the system works, not just interact with it.


Hardware Decisions and Constraints

As the project moves from concept to assembly, Lorraine reflects on practical constraints. Acrylic thickness, cutting methods, and durability under repeated handling all influence the design. A key concern is robustness: the box must survive being shaken, poked, and tested by children.

This leads to a change in the original locking idea. A freely rotating disc would be visually clear, but too easy to defeat physically. Lorraine recognises this risk: if the plastic is flexible, “you can just stick your hand in it and grab the sweet”. The revised design uses a servo-driven hook that physically blocks the lid, combined with a partial hinge so that only the prize compartment opens. This separation of electronics and reward is both a safety measure and a teaching tool.

Button placement and power switching are also considered carefully. The system needs to reset quickly between users, and adults need a way to intervene without dismantling the box. These considerations feed directly into the software flow later on.


Software Setup and Audio Handling

On the software side, Lorraine is explicit about the importance of environment choices. She deliberately avoids the newer Bookworm release of Raspberry Pi OS in favour of the older Bullseye release, noting that “we do not want bookworm… we need to go older, the bullseye”. This is a practical compatibility decision driven by audio libraries and HAT support, and it is a detail that anyone reproducing the project should pay close attention to.

Audio setup is validated early using low-level tools before being wrapped into Python. Lorraine accepts imperfections here, commenting that the microphones are “not amazing” and that some crackling is acceptable for the use case. This is an important reflection: the project is about relative similarity between voices, not studio-quality recordings.

The OLED screen is brought up next, using standard I²C detection and the Luma library. Lorraine demonstrates example animations not because they are part of the final system, but to confirm that the display pipeline works. This incremental approach of testing each subsystem in isolation is consistent throughout the build.


Voice Comparison Logic

The most technically dense part of the project is the voice comparison itself. Lorraine uses the Vosk speech recognition toolkit, together with its speaker identification model, to extract a speaker “signature” from audio recordings. She is candid about this part of the code, describing much of the maths as “gobbledegook” to her, but the implementation works.

Looking at the Python script, the process is clear. A reference recording (password.wav) is captured and processed to extract a speaker embedding. Each attempt is recorded in the same way, and the two embeddings are compared using cosine distance:

import numpy as np
def cosine_dist(x, y):
    # 1 minus the cosine similarity: 0 when both vectors point the same way
    return 1 - np.dot(np.array(x), np.array(y)) / np.linalg.norm(x) / np.linalg.norm(y)
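
To make the behaviour of this measure concrete, here is a small worked example. It restates the same formula using only the Python standard library so it runs without NumPy, and the short vectors are made-up toy values, far shorter than a real speaker embedding.

```python
import math


def cosine_distance(x, y):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)


# Toy vectors chosen for illustration only.
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])

print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way give a distance of approximately 0 regardless of their length, unrelated (orthogonal) vectors give 1, and opposite vectors give 2. The measure therefore compares the direction of the two signatures, not their magnitude.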

A smaller distance means a closer match. Lorraine highlights an important behavioural detail here: timing and pauses matter. A long pause can dramatically worsen the score, which leads her to note that “that pause is going to kill a lot of people”. This observation feeds directly into future ideas about clipping silence or adjusting thresholds.
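
One way the silence-clipping idea could be approached is sketched below. This is not part of Lorraine's script: the function name `trim_silence`, the amplitude threshold of 500, and the sample values are all assumptions chosen for illustration, and the samples are treated as plain signed integers such as 16-bit PCM values.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold`.

    Quiet samples between the first and last loud sample are kept,
    so pauses in the middle of the phrase are not removed.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole recording is below the threshold
    return samples[loud[0]:loud[-1] + 1]


# Made-up sample values: quiet lead-in, speech, quiet tail.
recording = [0, 10, -20, 900, -1200, 30, 800, 5, 0]
trimmed = trim_silence(recording)
print(trimmed)  # [900, -1200, 30, 800]
```

This only removes silence at the start and end of a recording. A pause in the middle of the phrase would need a different approach, and a suitable threshold would have to be found by experiment with the actual microphones.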

The threshold itself is deliberately loose. In the final system, a score below 0.5 triggers the servo to open the box. Lorraine leaves this choice open-ended, noting that it will ultimately be up to the linguistics team to decide what is “close enough” for their experiments.
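
For anyone deciding what counts as “close enough”, a simple way to compare candidate thresholds is to count how often the correct speaker would be rejected and how often someone else would be accepted. The sketch below assumes the same rule as the project (a score below the threshold opens the box). The function name `error_rates` is hypothetical and the scores are invented placeholders, not measurements from the project.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) for a given threshold.

    A score below `threshold` counts as a match, as in the project.
    """
    false_rejects = sum(1 for s in genuine_scores if s >= threshold)
    false_accepts = sum(1 for s in impostor_scores if s < threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Invented placeholder scores, for illustration only.
genuine = [0.21, 0.35, 0.48, 0.55]   # attempts by the password's speaker
impostor = [0.44, 0.62, 0.71, 0.90]  # attempts by other speakers

loose = error_rates(genuine, impostor, 0.5)
strict = error_rates(genuine, impostor, 0.4)
print("threshold 0.5:", loose)   # (0.25, 0.25)
print("threshold 0.4:", strict)  # (0.5, 0.0)
```

With these placeholder numbers, tightening the threshold from 0.5 to 0.4 stops the impostor from getting in but locks the genuine speaker out more often, which is the trade-off the linguistics team would be tuning.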


User Interaction and Feedback

The finished interaction loop is simple and effective. The screen displays short prompts such as “Ready,” “Speak,” and “Calculating,” while audio playback reinforces what the correct phrase should sound like. In the Python code, this is handled with a small OLED helper that redraws the display for each state change.

When a match succeeds, the servo opens the lock, pauses, and then closes again, ready for the next user. Lorraine notes a timing issue here during testing: the box can be slow to close if the servo movement overlaps with manual handling. This is flagged as something to refine, rather than ignored.
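
The flow described above can be sketched as a single function with the hardware passed in as callables, so the logic can be tried without a Pi, screen, or servo attached. This is a sketch under stated assumptions, not the project's code: the function name `run_attempt`, its parameters, the five-second open time, and the exact order of prompts are all hypothetical. Only the prompt texts and the 0.5 threshold come from the write-up.

```python
import time


def run_attempt(show, record_embedding, reference, distance, set_lock,
                threshold=0.5, open_seconds=5.0, sleep=time.sleep):
    """Run one unlock attempt and return the comparison score.

    show(text)           -- draws a prompt on the display
    record_embedding()   -- records audio and returns a speaker embedding
    distance(a, b)       -- compares two embeddings (smaller is closer)
    set_lock(locked)     -- moves the servo; True locks, False unlocks
    """
    show("Speak")
    attempt = record_embedding()
    show("Calculating")
    score = distance(reference, attempt)
    if score < threshold:
        set_lock(False)      # open the lock
        sleep(open_seconds)  # give the user time to take the prize
        set_lock(True)       # close again, ready for the next user
    show("Ready")
    return score


# Dry run with stand-ins for the display, microphone and servo.
events = []
score = run_attempt(
    show=events.append,
    record_embedding=lambda: [1.0, 0.0],
    reference=[1.0, 0.0],
    distance=lambda a, b: 0.0 if a == b else 1.0,
    set_lock=lambda locked: events.append("locked" if locked else "unlocked"),
    sleep=lambda seconds: None,  # skip the real delay in the dry run
)
print(score, events)
```

Passing the display, recorder, and servo in as functions keeps the decision logic separate from the hardware, which also makes it easier to experiment with the open time when looking at the slow-closing issue.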

The live testing section of the project is particularly revealing. Lorraine and colleagues quickly discover that replay attacks work: recording the correct voice and playing it back can defeat the system. Rather than treating this as a failure, Lorraine treats it as a success for outreach. One participant summarises the moment simply: “broke the system”. For a project designed to provoke discussion about security, this is exactly the outcome she wants.


Reflections and Future Directions

By the end of the build, Lorraine reflects openly on the difficulty of working with audio and the time spent debugging. She admits she “doesn’t like speakers,” but is glad she pushed through because the result is repeatable and scalable. She plans to build multiple units for linguistics outreach events.

Future possibilities are hinted at rather than fully specified. Background noise, accents, and mimicry are all areas of interest. Lorraine is particularly interested in how children adapt their behaviour once they realise that silence and consistency matter, and how different accents affect matching scores.

What stands out is that the Fort Vox Voice Box is not positioned as a finished product, but as a platform for experimentation. The hardware is robust enough for repeated use, the software is readable and modifiable, and the limitations are visible by design. Anyone recreating the project is encouraged, implicitly, to tweak thresholds, refine audio handling, or even deliberately exploit weaknesses as part of the learning experience.

In that sense, the project succeeds not because it is secure, but because it makes the trade-offs of voice authentication tangible.

Supporting Files and Links

- Episode 700 Resources 

- Raspberry Pi OS image used in project

- Vosk Alphacephei Audio Model

Bill of Materials

Product Name | Manufacturer | Quantity
Raspberry Pi 3 | RASPBERRY-PI | 1
Official Raspberry Pi PSU with UK and Euro Plugs | RASPBERRY-PI | 1
Expansion Board, Respeaker Dual Microphone HAT, Raspberry Pi, AI And Voice Applications | SEEED STUDIO | 1
Loudspeaker, Stereo Enclosed, 3W, 8ohm, 16 mm x 30 mm x 70 mm | DFROBOT | 1
Buckled Cable, Universal, 4 Pin, Grove Module, 50 mm Cable | SEEED STUDIO | 1
Cable, Female Jumper to Conversion, 4pin, Grove Modules, 5 PCs Per Pack | SEEED STUDIO | 1
Grove 4 pin Male Jumper to Grove 4 pin Conversion Cable (5Pk) | SEEED STUDIO | 1
Nano HAT Hacker for Raspberry Pi | Pimoroni | 1

Additional Parts

- A suitable enclosure
- A prize (maybe a chocolate bar!)
- OLED display


Top Comments

  • lorrainbow
    lorrainbow 25 days ago in reply to beacon_dave +1
    my son is OBSESSED with ducks! They're everywhere in our house. In all shapes and sizes.
  • beacon_dave
    beacon_dave 23 days ago

    It could be interesting to see what confidence score an audio synthesis model like one from WaveNet could achieve.

    The model could be trained on easily available sources of audio such as some of the e14 presents episodes before this one.

    3 years ago, it looks like about 10mins of high quality audio with transcripts is all that is required to create a viable model.

    Deepmind: The Podcast Me, myself and AI

  • DAB
    DAB 24 days ago

    Great project Lorraine.

  • beacon_dave
    beacon_dave 24 days ago

    If children are to be involved, it might be worth considering using a more graphical display of the result. A gauge type indicator that shows the current threshold setting required to be reached along with the actual score achieved as a percentage bar.

    Might also want an alternative way to get into the box for changing settings and releasing the servo lock without having to resort to the screwdriver every time. If the box locks before it is reloaded, then the researcher will need to supply the correct vox passphrase to open it or resort to the screwdriver.

    Perhaps consider adding a mode selector inside to allow a researcher to be able to quickly change between different settings.

  • beacon_dave
    beacon_dave 24 days ago in reply to robogary

    When Lorraine mentioned children, the first thing that crossed my mind was Roy Scheider saying "...you are going to need a bigger candy bar...". 🙂

  • robogary
    robogary 24 days ago in reply to kmikemoo

    you have to say Open Sesame in a Roy Scheider voice

  • robogary
    robogary 24 days ago in reply to beacon_dave

    and don't run with scissors

  • kmikemoo
    kmikemoo 25 days ago in reply to robogary

    SO... the duck (or an onlooker) has to scream first? Very Hollywood. 👍

  • robogary
    robogary 25 days ago in reply to beacon_dave

    I'll put a VOX lock on the Shark Chase Tank so it can't eat rubber ducks when no one is watching.

  • beacon_dave
    beacon_dave 25 days ago

    I noticed that when you made the initial passphrase recording you had the top cover open, so there is nothing obstructing the microphones on the ReSpeaker hat. However when you are trying to unlock the box, you now have a plastic cover over the microphones.

    You might get better results if you raise the hat up and cut a hole in the cover above each microphone. Pop an acoustic overcover windshield over it to reduce the effects of breath noise.

    May also help if you isolate the box from the tabletop with some foam pads.

  • beacon_dave
    beacon_dave 25 days ago in reply to lorrainbow

    I was getting concerned, your book highlights problems with Zombies but nothing about Werewolves... 🙂


An Avnet Company © 2026 Premier Farnell Limited. All Rights Reserved.

Premier Farnell Ltd, registered in England and Wales (no 00876412), registered office: Farnell House, Forge Lane, Leeds LS12 2NE.
