The 1,500ms Vapi default eating your latency budget (and a few other things worth knowing)
If your Vapi agents feel laggy on phone calls but smooth in the web demo, there's a single config line that explains most of it. Vapi ships with onNoPunctuationSeconds set to 1.5 seconds by default. At 1,500ms, that one setting can add more latency than your entire STT + LLM + TTS pipeline combined. Dropping it to 0.8 seconds is usually the highest-ROI change you can make on a production agent, and it costs you nothing.

I put together a deeper writeup that covers the rest of what we've been seeing across agency deployments in 2026. Sharing it here because I figured a few of you would find it useful.

Quick rundown of what's in it:

- The 1,200ms conversational ceiling above which callers start consciously noticing they're talking to AI
- April 2026 practitioner benchmarks across Vapi, Retell, Bland, and Synthflow (Vapi sits at 720ms median, 1,050ms P95)
- Honest cost-per-minute math: the advertised $0.05 platform fee vs the realistic $0.12 to $0.33 all-in once you add STT, LLM, TTS, and telephony
- The multi-provider fallback config that prevented a class of outages during the April 2026 incident
- Why your web demo shows 465ms but phone delivery lands at 965ms+, and what to scope into client SLAs accordingly
- How HIPAA mode locks the provider list, plus the ~$1,000/mo cost and the constraints to pre-qualify healthcare clients against
- A 6-step pre-launch checklist that runs about 45 minutes per agent

Most of this is stuff that only becomes obvious after you've shipped a handful of production agents and had a client call you about audio quality. Wanted to save someone else the slow path.

Link: https://voiceaiwrapper.com/insights/vapi-voice-ai-optimization-performance-guide-voiceaiwrapper
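For anyone who wants to try the onNoPunctuationSeconds change, here's a rough sketch of the override as a partial assistant config, plus the arithmetic behind the claim. The numbers (1,500ms default, 800ms target, 720ms median) come straight from the post. The nesting under startSpeakingPlan and transcriptionEndpointingPlan is my assumption about where the field lives, and the helper names are made up for illustration, so check the Vapi API reference before sending this anywhere.

```python
import json

# Figures quoted in the post, kept in integer milliseconds.
DEFAULT_NO_PUNCTUATION_MS = 1500  # Vapi default for onNoPunctuationSeconds
TUNED_NO_PUNCTUATION_MS = 800     # value the post suggests
PIPELINE_MEDIAN_MS = 720          # Vapi median latency from the cited benchmarks


def endpointing_patch(no_punctuation_ms: int) -> dict:
    """Build a partial assistant config overriding the no-punctuation wait.

    ASSUMPTION: the field is nested under
    startSpeakingPlan -> transcriptionEndpointingPlan. Verify this against
    the current Vapi API reference; this function is a hypothetical helper.
    """
    if no_punctuation_ms <= 0:
        raise ValueError("no_punctuation_ms must be positive")
    return {
        "startSpeakingPlan": {
            "transcriptionEndpointingPlan": {
                # The API field is expressed in seconds, so convert from ms.
                "onNoPunctuationSeconds": no_punctuation_ms / 1000,
            }
        }
    }


def wait_saved_ms(default_ms: int, tuned_ms: int) -> int:
    """Milliseconds of endpointing wait removed by lowering the setting."""
    return default_ms - tuned_ms


if __name__ == "__main__":
    patch = endpointing_patch(TUNED_NO_PUNCTUATION_MS)
    print(json.dumps(patch, indent=2))

    saved = wait_saved_ms(DEFAULT_NO_PUNCTUATION_MS, TUNED_NO_PUNCTUATION_MS)
    print(f"Endpointing wait removed: {saved} ms")
    print(
        "Default wait exceeds pipeline median:",
        DEFAULT_NO_PUNCTUATION_MS > PIPELINE_MEDIAN_MS,
    )
```

The patch is deliberately partial: it carries only the one field being changed, so it can be merged into whatever assistant config you already have without touching the rest.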