Has anyone successfully built a web-based voice call experience (not push-to-talk) with natural speech, silence detection, and barge-in using browser audio + AI?
We’re building a call-style UI where:

- The user can talk freely (like a real phone call)
- Silence detection determines when a “turn” ends
- Short pauses are merged into one thought
- The AI can be interrupted if the user starts talking
- Audio playback and mic capture work reliably on iOS Safari

Right now we’re running into issues where:

- Silence detection doesn’t reliably stop listening
- Turns fire too early or too late
- Transcription sometimes fails or never triggers
- iOS Safari adds extra constraints around audio unlock and playback

If you’ve solved this (or seen a solid pattern for frontend VAD + turn management in the browser), I’d love to hear:

- What approach worked for you
- Any gotchas with MediaRecorder / Web Audio API
- Whether you moved logic frontend vs. backend

Appreciate any war stories or architecture advice 🙏
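For concreteness, the pause-merging / end-of-turn behavior we’re after can be sketched as a tiny state machine. This is an illustration only, not our actual code: `TurnManager` and the threshold value are placeholder names/numbers, and the VAD that feeds it per-frame speech flags is out of scope. Any silence shorter than the end-of-turn threshold is implicitly merged into the same turn.

```typescript
type TurnEvent = "speech-start" | "turn-end" | null;

class TurnManager {
  private speaking = false;                   // currently inside a user turn?
  private silenceStart: number | null = null; // timestamp when silence began

  // Pauses shorter than endOfTurnMs are merged into one turn.
  constructor(private endOfTurnMs = 1200) {}

  // Call once per VAD frame with the frame timestamp (ms) and whether
  // the frame contained speech. Returns an event, or null.
  update(nowMs: number, isSpeech: boolean): TurnEvent {
    if (isSpeech) {
      this.silenceStart = null;               // any speech resets the timer
      if (!this.speaking) {
        this.speaking = true;
        return "speech-start";                // caller can stop TTS here (barge-in)
      }
      return null;
    }
    if (!this.speaking) return null;          // silence before any speech: ignore
    if (this.silenceStart === null) this.silenceStart = nowMs;
    if (nowMs - this.silenceStart >= this.endOfTurnMs) {
      this.speaking = false;
      this.silenceStart = null;
      return "turn-end";                      // caller finalizes audio, sends to STT
    }
    return null;                              // short pause so far: keep the turn open
  }
}
```

In our experience the fragile part isn’t this state machine, it’s the speech/silence signal feeding it, which is why we’re asking about VAD approaches.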