Arek Wu

Voice AI HQ



1 contribution to Voice AI HQ

Arek Wu

Jan 15

General discussion

Help Needed: Deepgram Nova-3 (Polish) Fragmenting Phone Numbers despite `utterance_end_ms`

Hi everyone, I'm building a specialized voice assistant using **Pipecat Flows v0.0.22** and running into a frustrating issue with phone number collection that I can't seem to solve.

### The Stack

- **Framework:** Pipecat Flows v0.0.22 (Python)
- **STT:** Deepgram Nova-3 (Polish, `pl`)
- **TTS:** Cartesia (Polish voice)
- **Transport:** Local WebRTC (browser-based, no telephony yet)

### The Problem

When I dictate a 9-digit Polish phone number (e.g., "690807057"), the assistant receives partial fragments and processes them one by one instead of waiting for the full number. For example, if I say "690... 807... 057" (with natural pauses), the bot splits it into:

1. "6" -> sent to LLM -> LLM complains "Received only 1 digit"
2. "980" -> sent to LLM -> LLM complains
3. "5" ... and so on.

### What I Have Tried

I've gone through the documentation and tried several fixes, but the fragmentation issue persists.

1. **Deepgram Configuration (Current Setup):** I've configured `LiveOptions` to handle phone numbers and utterance endings explicitly:

    ```python
    from deepgram import LiveOptions

    options = LiveOptions(
        model="nova-3",
        language="pl",
        smart_format=True,      # Enabled
        numerals=True,          # Enabled
        utterance_end_ms=1000,  # Set to 1000 ms to force waiting
        interim_results=True,   # Required for utterance_end_ms
    )
    ```

    *Result:* Even with `utterance_end_ms=1000`, Deepgram seems to finalize results too early during the pauses between digit groups.

2. **VAD Tuning:**
    - I tried increasing Pipecat's VAD `stop_secs` to `2.0s`.
    - *Result:* This caused massive latency (a 2 s delay on every response) and didn't fix the underlying STT fragmentation (Deepgram still finalized early). I've reverted to `0.5s` (and `0.2s` for barge-in), since `stop_secs=2.0s` is considered an anti-pattern for conversational flows.

3. **Prompt Engineering (Aggressive):**
    - I instructed the LLM to "call the function IMMEDIATELY with whatever fragments you have".
    - *Result:* This led to early failures: the LLM would call `capture_phone("6")`, which failed validation (9 digits required), so the bot rejected the input before the user had finished speaking.
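One way to avoid the premature `capture_phone("6")` rejection described in attempt 3 is to buffer digits across transcript fragments and only run validation once nine digits have arrived. Below is a minimal sketch, assuming a plain-Python helper that the flow calls with each finalized transcript fragment. `PhoneDigitBuffer`, `add_fragment` and `pending` are hypothetical names, not part of Pipecat, Pipecat Flows or the Deepgram SDK, and this does not change when the STT finalizes; it only makes the downstream logic tolerant of fragments.

```python
import re
from typing import Optional

# Polish national numbers have 9 digits (excluding the +48 country code).
POLISH_PHONE_LEN = 9


class PhoneDigitBuffer:
    """Hypothetical helper: accumulates digits from STT fragments until a full number is present."""

    def __init__(self, expected_len: int = POLISH_PHONE_LEN) -> None:
        self.expected_len = expected_len
        self._digits = ""

    def add_fragment(self, text: str) -> Optional[str]:
        """Append the digits found in `text`; return the full number once complete, else None."""
        # Keep digits only, so "690 ..." or "6 9 0-807" contribute "690" / "690807".
        self._digits += re.sub(r"\D", "", text)
        if len(self._digits) >= self.expected_len:
            # Any digits beyond expected_len are discarded in this simple sketch.
            number = self._digits[: self.expected_len]
            self._digits = ""  # reset for the next capture
            return number
        return None

    @property
    def pending(self) -> str:
        """Digits collected so far that do not yet form a full number."""
        return self._digits


buf = PhoneDigitBuffer()
for fragment in ["690", "807", "057"]:
    result = buf.add_fragment(fragment)
print(result)  # 690807057
```

The idea is that validation (e.g. the existing `capture_phone` function) is only triggered when `add_fragment` returns a non-`None` value; while it returns `None`, the bot can stay silent or give a short "mhm" instead of rejecting partial input. How this is wired into a Pipecat Flows node handler is left open here.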


Arek Wu

25d

Soniox resolved the issue

Arek Wu

24d

Hey @Hugo Podworski, I figured you're Polish from the surname 😄, and from your videos I always assumed you're based in the UK. Am I right? I lived in London for about 8 years myself and am now back in PL, working at one of the largest companies and building a startup, uflow.pl. We've already got our first customers (mid-implementation right now), running custom Pipecat AI voice and a tailored CRM where everything flows into one system. It's pretty complex under the hood, but that's kind of the point: we wanted something comprehensive rather than patching tools together.

I watched your videos a while back and honestly pulled quite a few things into my own work, so cheers for that. I think we're already connected on LinkedIn too. Funnily enough, I made a Vapi + Make tutorial maybe 2 years ago. It was a good starting point, but over time I just hit too many walls with low-code. That pushed me more into proper coding with AI, and it really moved the needle for me.

Always happy to buddy up with someone who's deep in this space. I've been having calls with a few community members lately and the experience sharing is genuinely valuable, so it would be great to do the same with you!


Arek Wu

@arek-wu-8696

10+ years in IT industry. Optimist


Joined Nov 21, 2025
