Activity

[Contribution activity calendar, Jan–Dec]

Memberships

Open Source Voice AI Community

765 members • Free

6 contributions to Open Source Voice AI Community
Zadarma with LiveKit or Pipecat, self-hosted?
Hello, I'm new to the community :) I'm playing with Pipecat and LiveKit self-hosted, and the problem I have is that I need compatibility with Zadarma SIP (https://zadarma.com/) and I can't use it :_( I need to use Zadarma because Twilio and Telnyx are so expensive, and I also can't buy phone numbers from a specific part of Spain. Does anyone use Zadarma SIP with them? Thanks!! 😁
0 likes • 18m
Nir's solution sounds great, and if he can help you get this set up, you should definitely go that route.

Just in case general information about SIP interconnect is useful: with Pipecat, you can either use a telephony provider that supports WebSocket streaming, or use a Pipecat transport that supports SIP directly.

Here's the Pipecat guide to using the DailyTransport with Twilio SIP. Everything this guide says about SIP should apply to any telephony provider that supports SIP. (Only the configuration/setup parts of the guide are specific to Twilio.) https://docs.pipecat.ai/guides/telephony/twilio-daily-sip
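For concreteness, the transport side of that guide looks roughly like this. A minimal sketch, assuming the current pipecat package layout and parameter names; the room URL, token, and the actual SIP dial-in configuration are placeholders covered in the guide:

```python
# Minimal sketch of a SIP-dial-in-capable Pipecat transport. Assumes the
# current pipecat package layout; room URL, token, and SIP setup are
# placeholders (see the Twilio/Daily SIP guide linked above).
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    room_url="https://YOUR_DOMAIN.daily.co/YOUR_ROOM",  # Daily room with SIP dial-in enabled
    token="YOUR_DAILY_TOKEN",
    bot_name="SIP Bot",
    params=DailyParams(
        audio_in_enabled=True,   # receive caller audio from the SIP leg
        audio_out_enabled=True,  # send the bot's TTS audio back to the caller
    ),
)
```

The same transport object then plugs into a normal Pipecat pipeline; only the dial-in plumbing is provider-specific.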
New Gemini Live model release
Google released the latest version of the Gemini Live model today. This is still the 2.5 series (not based on Gemini 3, yet).

Using the AI Studio APIs, the model name is: gemini-2.5-flash-native-audio-preview-12-2025

I've been experimenting with the checkpoints for a couple of weeks, and it's pretty similar to the previous version (flash-native-audio-preview-09-2025). They focused a lot on tool calling reliability in this release, because everybody told them they needed to make tool calling more reliable! It's definitely better on their benchmarks, but I generally found you could prompt the previous model to do tool calling pretty well until you got fairly deep into a multi-turn conversation, at which point all bets were off. What we really need from this API is good context engineering capabilities!

Interestingly, the model is also GA today (not preview) on Google Cloud Vertex. The Vertex model name is gemini-live-2.5-flash-native-audio

I don't really understand the Vertex thinking here, other than that Vertex can give you contracts for committed TPU capacity, which removes one of the variables that makes it hard to deliver reliable latencies for a model like this.

You can try the new model at https://www.pipecat.ai/
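If you want to try the new checkpoint from code, here's a minimal connection sketch using the google-genai Python SDK. The connect() pattern follows Google's Live API docs; treat the config and send/receive details as assumptions and check the current docs:

```python
# Minimal Gemini Live connection sketch using the google-genai SDK.
# The connect() pattern follows Google's Live API docs; details may drift,
# so verify against current documentation. Needs GOOGLE_API_KEY in the env.
import asyncio
from google import genai

# AI Studio name for the new checkpoint; on Vertex, use
# gemini-live-2.5-flash-native-audio instead.
MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"

async def main():
    client = genai.Client()
    config = {"response_modalities": ["AUDIO"]}
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Say hello!"}]}
        )
        async for response in session.receive():
            if response.data:  # audio bytes stream back incrementally
                print(f"received {len(response.data)} bytes of audio")

asyncio.run(main())
```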
Voice agent observability with tracing
Are you using tracing in your voice agent? I thought about this today because the team at LangChain built voice AI support into their agent debugging and monitoring tool, LangSmith.

LangSmith is built around the concept of "tracing." If you've used OpenTelemetry for application logging, you're already familiar with tracing. If you haven't, think about it like this: a trace is a record of an operation that an application performs.

Today's production voice agents are complex, multi-model, multi-modal, multi-turn systems! Tracing gives you leverage to understand what your agents are doing. This saves time during development. And it's critical in production.

You can dig into what happened during each turn of any session. What did the user say, and how was that processed by each model you're using in your voice agent? What was the latency for each inference operation? What audio and text was actually sent back to the user?

You can also run analytics using tracing as your observability data. And you can use traces to build evals.

Tanushree is an engineer at LangChain. Her video below shows using a local (on-device) model for transcription, then switching to using the OpenAI speech-to-text model running in the cloud. You can see the difference in accuracy. (Using Pipecat, switching between different models is a single-line code change.)

Also, the video is fun! It's a French tutor. Which is a voice agent I definitely need.

How to debug voice agents with LangSmith (video): https://youtu.be/0FmbIgzKAkQ

LangSmith Pipecat integration docs page: https://docs.langchain.com/langsmith/trace-with-pipecat

I always like to read the code for nifty Pipecat services like the LangSmith tracing processor. It's here, though I think this nice work will likely make its way into Pipecat core soon: https://github.com/langchain-ai/voice-agents-tracing/blob/main/pipecat/langsmith_processor.py
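If tracing is new to you, here's a tiny illustrative sketch of the per-turn span structure, using the standard OpenTelemetry Python SDK. This is not the LangSmith processor linked above (which wires span creation into Pipecat's pipeline automatically), and the attribute names are made up for illustration:

```python
# Illustrative only: the general shape of per-turn tracing with the standard
# OpenTelemetry Python SDK. One span per conversation turn, with child spans
# for each inference step. Attribute names here are invented placeholders.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("voice-agent")

with tracer.start_as_current_span("turn") as turn:
    turn.set_attribute("session.id", "abc123")  # placeholder session id
    with tracer.start_as_current_span("stt") as stt:
        stt.set_attribute("stt.transcript", "what's the weather?")
    with tracer.start_as_current_span("llm") as llm:
        llm.set_attribute("llm.ttft_ms", 312)  # time to first token
    with tracer.start_as_current_span("tts") as tts:
        tts.set_attribute("tts.ttfb_ms", 95)   # time to first audio byte
```

Once spans like these are exported, "what happened in turn 14 of session abc123" becomes a query instead of an archaeology project.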
Who has built an extremely scalable voice AI system with LiveKit & Pipecat?
I mean a system that can do 10k calls per day. Has anyone built a system like this using LiveKit and Pipecat? Did you do it without using your own GPUs?
2 likes • 19d
Which part of the scaling are you thinking about?

For inference, most people doing 10k calls per day are using first-party, hosted services for STT, LLM, and TTS. You definitely can host your own models, but most people don't, for two reasons. First, the open weights models are still less capable than the commercial models. (Though I think that will change.) Second, it's actually more expensive to host your own models than to use the first-party hosted APIs until you are significantly bigger than 10k calls a day.

For hosting the voice agent itself, the basic answer is that you need to get various bits and pieces of Kubernetes auto-scaling set up for your specific use case and cloud provider. (You can also look and see if Pipecat Cloud fits your needs. It's "docker push for voice agents".)
Voice AI without Pipecat or LiveKit
Has anyone built a voice bot without Pipecat or LiveKit? I'd love to connect with them. I have built one and I'm facing some issues on the latency side. My tech stack is an OpenAI LLM, Deepgram for ASR, Azure for TTS, and one telephony vendor that connects everything.
2 likes • 19d
It's great to build things to learn! If there are latency issues that I can help you think through/diagnose, very happy to do that. Typical things that contribute to latency are:

- using WebSockets instead of WebRTC (for edge-to-cloud, but this is probably not your case since you mention telephony),
- long network round trips between all your providers (you want your voice agent very close to where your telephony provider terminates the PSTN, and close to all your inference providers),
- TTFT variance from your providers (you want to build observability tooling into your agent so you can measure and track this; see the sketch below),
- configuration parameters for Deepgram, etc.,
- turn detection configuration,
- not streaming between models.
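On the TTFT point, here's a minimal sketch of the measurement you want, using the OpenAI Python SDK's streaming chat completions. The model name and prompt are placeholders; the same "timestamp the first streamed chunk" pattern applies to your Deepgram and Azure TTS streams too:

```python
# Minimal time-to-first-token measurement for a streaming LLM call, using the
# OpenAI Python SDK. Model and prompt are placeholders; apply the same pattern
# (timestamp the first streamed chunk) to your STT and TTS providers.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.monotonic()
first_token_at = None

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you run in production
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta and first_token_at is None:
        first_token_at = time.monotonic()
        print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")

print(f"total: {(time.monotonic() - start) * 1000:.0f} ms")
```

Log these numbers per turn in production (tagged by provider and region) and the variance will tell you where the latency budget is actually going.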
Kwindla Kramer
Level 2 • 8 points to level up
@kwindla-kramer-2446
I work on Pipecat and Daily infrastructure

Active 16m ago
Joined Nov 7, 2025