Has anyone successfully built a web-based voice call experience (not push-to-talk) with natural speech, silence detection, and barge-in using browser audio + AI?
We’re building a call-style UI where:
  • The user can talk freely (like a real phone call)
  • Silence detection determines when a “turn” ends
  • Short pauses are merged into one thought
  • The AI can be interrupted if the user starts talking
  • Audio playback and mic capture work reliably on iOS Safari
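To make that list concrete, here is a minimal sketch of the kind of turn logic it implies. Everything in it is an assumption for illustration: the `TurnDetector` class, the `rms` helper, the event names, and the thresholds are hypothetical, and it uses a plain energy threshold rather than a real VAD model.

```typescript
// Hypothetical sketch: energy-threshold turn detection with pause merging and barge-in.
// All names and threshold values are illustrative assumptions, not a tested production design.

type TurnEvent = "speech_start" | "barge_in" | "turn_end";

interface TurnConfig {
  speechThreshold: number; // RMS level at or above which a frame counts as speech
  endSilenceMs: number;    // silence needed to end a turn; shorter pauses are merged
  minSpeechMs: number;     // continuous speech needed before a turn opens (ignores clicks)
}

// Root-mean-square level of one frame of PCM samples in the range [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  samples.forEach((s) => {
    sum += s * s;
  });
  return Math.sqrt(sum / samples.length);
}

class TurnDetector {
  private cfg: TurnConfig;
  private inTurn = false;
  private speechStartedAt: number | null = null;
  private lastSpeechAt = 0;

  constructor(cfg: TurnConfig) {
    this.cfg = cfg;
  }

  // Feed one frame: its RMS level, its timestamp in ms, and whether AI audio is playing.
  // Returns an event when the turn state changes, otherwise null.
  push(level: number, nowMs: number, aiSpeaking: boolean): TurnEvent | null {
    const isSpeech = level >= this.cfg.speechThreshold;

    if (isSpeech) {
      if (this.speechStartedAt === null) this.speechStartedAt = nowMs;
      this.lastSpeechAt = nowMs;
      const spokenMs = nowMs - this.speechStartedAt;
      if (!this.inTurn && spokenMs >= this.cfg.minSpeechMs) {
        this.inTurn = true;
        // Speech while the AI is talking is treated as an interruption.
        return aiSpeaking ? "barge_in" : "speech_start";
      }
      return null;
    }

    if (this.inTurn) {
      // A pause shorter than endSilenceMs keeps the turn open (pauses are merged).
      if (nowMs - this.lastSpeechAt >= this.cfg.endSilenceMs) {
        this.inTurn = false;
        this.speechStartedAt = null;
        return "turn_end";
      }
      return null;
    }

    // Silence outside a turn: discard any too-short burst of sound.
    this.speechStartedAt = null;
    return null;
  }
}
```

In a browser, the frames would presumably come from an AudioWorklet or from `AnalyserNode.getFloatTimeDomainData`, with `rms(frame)` passed to `push` on every frame. On `barge_in` the app would stop AI playback, and on `turn_end` it would send the buffered audio for transcription. Keeping the detector free of browser APIs means the thresholds can be tuned offline against recorded audio.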
Right now we’re running into issues where:
  • Silence detection doesn’t reliably end the listening state
  • Turns fire too early or too late
  • Transcription sometimes fails or never triggers
  • iOS Safari adds extra constraints around audio unlock and playback
If you’ve solved this (or seen a solid pattern for frontend VAD + turn management in the browser), I’d love to hear:
  • What approach worked for you
  • Any gotchas with MediaRecorder / Web Audio API
  • Whether you kept the VAD and turn logic in the frontend or moved it to the backend
Appreciate any war stories or architecture advice 🙏
Blake Templeton