The 1,500ms Vapi default eating your latency budget (and a few other things worth knowing)
If your Vapi agents feel laggy on phone calls but smooth in the web demo, there's a single config line that explains most of it.
Vapi ships with onNoPunctuationSeconds set to 1.5 seconds by default, so whenever a caller's transcript ends without punctuation, the agent waits a full 1.5 seconds before it starts responding. That wait alone is longer than your entire STT + LLM + TTS pipeline combined. Dropping it to 0.8 seconds is usually the highest-ROI change you can make on a production agent, and it costs you nothing.
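For anyone who wants to see roughly what that change looks like, here is a minimal sketch that builds the update body for an assistant. It assumes the setting is nested under startSpeakingPlan and transcriptionEndpointingPlan, so check that against the current Vapi API reference before relying on it. The helper name endpointing_patch is made up for illustration, and nothing is sent over the network.

```python
import json


def endpointing_patch(no_punctuation_seconds: float = 0.8) -> dict:
    """Build a partial assistant update that shortens the no-punctuation wait.

    Assumption: the field lives at
    startSpeakingPlan.transcriptionEndpointingPlan.onNoPunctuationSeconds.
    Verify the nesting against the current Vapi API reference.
    """
    if no_punctuation_seconds <= 0:
        raise ValueError("no_punctuation_seconds must be positive")
    return {
        "startSpeakingPlan": {
            "transcriptionEndpointingPlan": {
                "onNoPunctuationSeconds": no_punctuation_seconds,
            }
        }
    }


# Print the JSON body you would send in an assistant update request.
payload = endpointing_patch()
print(json.dumps(payload, indent=2))
```

You would send this body through whatever you already use to update assistants (dashboard JSON editor, SDK, or a PATCH request). Test on real phone calls afterwards, not only the web demo.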
I put together a deeper writeup that covers the rest of what we've been seeing across agency deployments in 2026. Sharing it here because I figured a few of you would find it useful.
Quick rundown of what's in it:
  • The 1,200ms conversational ceiling above which callers start consciously noticing they're talking to AI
  • April 2026 practitioner benchmarks across Vapi, Retell, Bland, and Synthflow (Vapi sits at 720ms median, 1,050ms P95)
  • Honest cost-per-minute math: the advertised $0.05 platform fee vs the realistic $0.12 to $0.33 all-in once you add STT, LLM, TTS, and telephony
  • The multi-provider fallback config that prevented a class of outages during the April 2026 incident
  • Why your web demo shows 465ms but phone delivery lands at 965ms+, and what to scope into client SLAs accordingly
  • How HIPAA mode locks the provider list, plus the ~$1,000/mo cost and the constraints to pre-qualify healthcare clients against
  • A 6-step pre-launch checklist that runs about 45 minutes per agent
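To put the web-versus-phone numbers from the list next to the 1,200ms ceiling, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above. Treating the whole gap as telephony overhead is my simplification, and since 965ms is a lower bound ("965ms+"), the real headroom is at most what this prints.

```python
# Figures quoted in the list above, in milliseconds.
CEILING_MS = 1200   # point where callers start noticing they are talking to AI
WEB_DEMO_MS = 465   # latency seen in the web demo
PHONE_MS = 965      # phone delivery (a lower bound, quoted as "965ms+")

# Simplification: attribute the whole web-to-phone gap to the telephony leg.
telephony_overhead_ms = PHONE_MS - WEB_DEMO_MS

# Budget left before a phone call crosses the conversational ceiling.
headroom_ms = CEILING_MS - PHONE_MS

print(f"Telephony overhead: {telephony_overhead_ms} ms")
print(f"Headroom under the ceiling: {headroom_ms} ms")
```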
Most of this is stuff that's only obvious after you've shipped a handful of production agents and had a client call you about audio quality. Wanted to save someone else the slow path.
Raj Baruah
VoiceAIWrapper Academy
skool.com/voice-ai-wrapper-academy-3838