No $, No Latency: local TTS for any need? Chats? Your agent? Yes, it's possible and easy NOW.

ElevenLabs is cool but pricey, and local models are GPU-intensive. Enter Kokoro: a high-quality, emotive text-to-speech model. It's been out for a while, but the local setup can be a pain, and there's no Ollama version yet (though Ollama is rumored to be building that functionality).

Now some great people have built a complete Docker version with a built-in API you can hit from your script for whatever REALTIME voice needs your app has. There are GPU and CPU builds, plus a Gradio and a web interface. I tried the latest beta branch and there was no noticeable latency, with some great high-quality voices (even some weird ASMR ones... not judging ;) ).

Don't be intimidated by Docker if you haven't used it. It's free, and a container is basically a complete .venv and app bundled together: you pull a prebuilt image from a registry in the cloud (or let it build itself) and it runs on your own machine. This is a work in progress, so I tried the 1.2 pre-beta GitHub branch using the web interface. For now, it seems the API is only set up on the main 1.0 branch. Did I mention it's basically a free voice service for whatever you're building?
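For the "hit it from your script" part, here's a minimal Python sketch using only the standard library. It rests on a few assumptions you should check against the README for your version: that the container serves on `localhost:8880`, that it exposes an OpenAI-style `/v1/audio/speech` endpoint, and that `kokoro` and `af_bella` are valid model and voice names. The function names `build_speech_request` and `synthesize` are just illustrative.

```python
import json
import urllib.request

# Assumption: the Kokoro-FastAPI container listens on localhost:8880 and exposes
# an OpenAI-style /v1/audio/speech endpoint. Check the repo README for your version.
BASE_URL = "http://localhost:8880/v1"


def build_speech_request(text, voice="af_bella", model="kokoro",
                         response_format="mp3", base_url=BASE_URL):
    """Build (but do not send) a POST request for the speech endpoint.

    The default model/voice names are assumptions, not verified values.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.mp3", **kwargs):
    """Send the request to the local server and save the returned audio bytes."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


if __name__ == "__main__":
    # Only works while the Docker container is actually running locally.
    print(synthesize("Hello from a local Kokoro voice!"))
```

Keeping request building separate from sending makes it easy to swap voices or point `base_url` at a different port without touching the network code.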

TL;DR: https://github.com/remsky/Kokoro-FastAPI.git (the README.md is easy to follow)
