Need your help/support
Hey everyone, I am building my AI voice agent agency, Voiceonic. I have started posting on LinkedIn and am looking to connect with other people who are interested in AI. I have already posted valuable content there. If you like it, connect with me and follow along: I will be sharing more content in the upcoming days. My LinkedIn profile is below, and you will learn something new from it. See you there... https://www.linkedin.com/in/adamismaail/
Help Needed: Deepgram Nova-3 (Polish) Fragmenting Phone Numbers despite `utterance_end_ms`
Hi everyone, I'm building a specialized voice assistant using **Pipecat Flows v0.0.22** and running into a frustrating issue with phone number collection that I can't seem to solve.

### The Stack

- **Framework:** Pipecat Flows v0.0.22 (Python)
- **STT:** Deepgram Nova-3 (Polish `pl`)
- **TTS:** Cartesia (Polish voice)
- **Transport:** Local WebRTC (browser-based, no telephony yet)

### The Problem

When I dictate a 9-digit Polish phone number (e.g., "690807057"), the assistant receives partial fragments and processes them individually instead of waiting for the full number.

For example, if I say "690... 807... 057" (with natural pauses), the bot splits it into:

1. "6" -> sent to LLM -> LLM complains "Received only 1 digit"
2. "980" -> sent to LLM -> LLM complains
3. "5" ... and so on.

### What I Have Tried

I've gone through the documentation and tried several fixes, but the "defragmentation" issue persists.

1. **Deepgram Configuration (Current Setup):** I've configured the `LiveOptions` to handle phone numbers and utterance endings explicitly:

   ```python
   options = LiveOptions(
       model="nova-3",
       language="pl",
       smart_format=True,      # Enabled
       numerals=True,          # Enabled
       utterance_end_ms=1000,  # Set to 1000ms to force waiting
       interim_results=True    # Required for utterance_end_ms
   )
   ```

   *Result:* Even with `utterance_end_ms=1000`, Deepgram seems to finalize the results too early during the digit pauses.

2. **VAD Tuning:**
   - I tried increasing Pipecat's VAD `stop_secs` to `2.0s`.
   - *Result:* This caused massive latency (2s delay on every response) and didn't solve the valid STT fragmentation (Deepgram still finalized early). I've reverted to `0.5s` (and `0.2s` for barge-in) as `stop_secs=2.0s` is considered an anti-pattern for conversational flows.

3. **Prompt Engineering (Aggressive):**
   - I instructed the LLM to "call the function IMMEDIATELY with whatever fragments you have".
   - *Result:* This led to early failures where the LLM would call `capture_phone("6")`, which would fail validation (requires 9 digits), causing the bot to reject the input before the user finished speaking.
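
To make the behavior I'm after concrete, here is a minimal sketch in plain Python, independent of Pipecat and Deepgram. The `DigitAccumulator` class and its `add_fragment` method are hypothetical names made up for illustration, not part of any library. It assumes each finalized transcript fragment arrives as text containing digit characters (which is what `smart_format`/`numerals` appear to produce in my case), and it only releases a number once 9 digits have been collected across fragments:

```python
import re
from typing import Optional


class DigitAccumulator:
    """Hypothetical helper: buffers digits across transcript fragments."""

    def __init__(self, required_digits: int = 9) -> None:
        self.required_digits = required_digits
        self._digits = ""

    def add_fragment(self, text: str) -> Optional[str]:
        """Add one transcript fragment; return the full number once complete."""
        # Keep only digit characters, dropping spaces, dashes, and words.
        self._digits += re.sub(r"\D", "", text)
        if len(self._digits) >= self.required_digits:
            number = self._digits[: self.required_digits]
            self._digits = ""  # start fresh for the next number
            return number
        return None  # not enough digits yet, keep waiting

    def reset(self) -> None:
        """Discard buffered digits, e.g. when the user starts over."""
        self._digits = ""


acc = DigitAccumulator()
for fragment in ["6", "90 807", "057"]:
    result = acc.add_fragment(fragment)
    if result is not None:
        print(result)  # prints 690807057
```

The idea is that fragments would go through something like this before the LLM sees them, so `capture_phone` is only called with a complete number. Where exactly this would hook into a Pipecat pipeline is the part I'm unsure about.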
Any Vietnamese Voice Agent builders?
I'm currently in Hanoi, Vietnam, and wanted to see if there are any AI voice agent developers from Vietnam in this community. If so, just drop a comment. I would love to connect with some of you.
PersonaPlex-7B
Guys, do you know how to use PersonaPlex from Nvidia with LiveKit or Vapi?
AI Voice Tech Ready - Need Client Partners
Built AI phone systems from scratch. Know the full stack—scoping, design, development, testing, deployment. Systems can handle inbound calls, appointments, CRM integration for service businesses. Ready to launch but need partners who can source clients. Split profits 50/50. You handle sales, I handle delivery. Interested? Let's talk.
Voice AI HQ
skool.com/artilo-ai-6501
For developers, entrepreneurs, and anyone sick of voice AI hype without results.