Has anyone successfully built a web-based voice call experience (not push-to-talk) with natural speech, silence detection, and barge-in using browser audio + AI?
We're building a call-style UI where:

- The user can talk freely (like a real phone call)
- Silence detection determines when a "turn" ends
- Short pauses are merged into one thought
- The AI can be interrupted if the user starts talking
- Audio playback and mic capture work reliably on iOS Safari

Right now we're running into issues where:

- Silence detection doesn't reliably stop listening
- Turns fire too early or too late
- Transcription sometimes fails or never triggers
- iOS Safari adds extra constraints around audio unlock and playback

If you've solved this (or seen a solid pattern for frontend VAD + turn management in the browser), I'd love to hear:

- What approach worked for you
- Any gotchas with MediaRecorder / Web Audio API
- Whether you moved logic frontend vs. backend

Appreciate any war stories or architecture advice 🙏
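For concreteness, here's a minimal sketch of the turn-end / pause-merging state machine we're describing (not our actual code; the `rmsThreshold` and `silenceMs` values are illustrative, and in the browser the sample frames would come from `AnalyserNode.getFloatTimeDomainData` on an animation-frame loop):

```javascript
// Sketch of silence-based turn detection with pause merging.
// A turn ends only after `silenceMs` of continuous silence; any speech
// before that resets the timer, so short pauses merge into one thought.
class TurnDetector {
  constructor({ rmsThreshold = 0.02, silenceMs = 800 } = {}) {
    this.rmsThreshold = rmsThreshold; // RMS level below which a frame counts as silence
    this.silenceMs = silenceMs;       // continuous silence required to end a turn
    this.speaking = false;
    this.silenceStart = null;
  }

  // Feed one frame of PCM samples (Float32Array) and a timestamp in ms.
  // Returns "turn-end" exactly once when a turn completes, else null.
  update(samples, nowMs) {
    let sum = 0;
    for (const s of samples) sum += s * s;
    const rms = Math.sqrt(sum / samples.length);

    if (rms >= this.rmsThreshold) {
      this.speaking = true;
      this.silenceStart = null; // speech resumed: merge the pause, reset timer
      return null;
    }
    if (!this.speaking) return null; // ignore silence before any speech
    if (this.silenceStart === null) this.silenceStart = nowMs;
    if (nowMs - this.silenceStart >= this.silenceMs) {
      this.speaking = false;
      this.silenceStart = null;
      return "turn-end";
    }
    return null;
  }
}
```

Barge-in would reuse the same RMS check: if the level crosses the threshold while TTS audio is playing, pause the playback element and start capturing a new turn. The iOS Safari wrinkle is separate: the `AudioContext` stays suspended until you call `resume()` inside a user-gesture handler.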