You can use Pipecat or LiveKit for the orchestration. For the models, you can use open-source options such as Whisper for STT, Llama for the LLM, and Piper for TTS. However, open-source models generally don’t perform as well as commercial ones. On top of that, you need GPU servers to achieve acceptable latency, and those typically start at around $700 USD. In short, it’s not currently feasible to build and maintain production-grade voice AI agents entirely for free.

Another approach is to make the most of free-tier cloud credits. For example, in this video I showcase a multilingual agent that loops through Groq’s LLMs and uses only the free credits: https://www.youtube.com/watch?v=mhh3QNf6gwl

In another video, I show how to use Deepgram for TTS and STT, since they provide $200 in free credits, which can last quite a while: https://www.youtube.com/watch?v=_dIYv9YdT5s
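
To give a rough idea of what "looping through" several LLMs can look like, here is a minimal sketch in Python. It assumes the loop means falling back to the next model whenever the current one has run out of free quota; the video may do it differently. The names `QuotaExhausted`, `complete_with_fallback`, `fake_call`, and the model names are all made up for illustration and are not part of Groq's SDK or any other library. In a real agent you would replace `fake_call` with a function that calls your provider and raises `QuotaExhausted` when it gets a rate-limit or quota error.

```python
from typing import Callable, Optional, Sequence, Tuple


class QuotaExhausted(Exception):
    """Hypothetical error: a model's free quota or rate limit is used up."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order and return (model_name, reply) from the first that works."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except QuotaExhausted as err:
            # This model is out of quota, so move on to the next one.
            last_error = err
    raise RuntimeError("all models are out of quota") from last_error


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real API call; pretends that 'model-a' has no quota left."""
    if model == "model-a":
        raise QuotaExhausted(model)
    return f"[{model}] reply to: {prompt}"


if __name__ == "__main__":
    used, reply = complete_with_fallback("hello", ["model-a", "model-b"], fake_call)
    print(used, reply)
```

Keeping the provider call behind a plain function means the same loop works for any provider, and you can reorder the list to try the model you prefer first.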