Speech To Speech: an effort for an open-source and modular GPT-4o
The repository implements a cascaded speech-to-speech pipeline made up of the following consecutive parts:
  1. Voice Activity Detection (VAD): Silero VAD v5
  2. Speech to Text (STT): Whisper checkpoints (including distilled versions)
  3. Language Model (LM): any instruct model available on the Hugging Face Hub! 🤗
  4. Text to Speech (TTS): Parler-TTS 🤗
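The four-part cascade above can be pictured as a chain of independent stages connected by queues, so any one component (for example a different Whisper checkpoint or instruct model) can be swapped without touching the others. The sketch below is a minimal, standard-library-only illustration under stated assumptions: the helpers `run_stage`, `build_pipeline` and `run`, and the stand-in stages `ToyVAD`, `toy_stt`, `toy_lm` and `toy_tts`, are hypothetical and are not the repository's actual classes or API. "Audio" is mocked as strings, with an empty string standing for silence.

```python
import queue
import threading
from typing import Any, Callable, List

# Sentinel that tells each stage the input stream has ended.
STOP = object()


def run_stage(fn: Callable[[Any], Any], q_in: queue.Queue, q_out: queue.Queue) -> None:
    """Apply `fn` to every item from q_in and push non-None results to q_out.

    None means "nothing to emit yet" (e.g. the VAD is still buffering).
    """
    while True:
        item = q_in.get()
        if item is STOP:
            q_out.put(STOP)  # propagate shutdown downstream
            return
        result = fn(item)
        if result is not None:
            q_out.put(result)


def build_pipeline(stages: List[Callable[[Any], Any]]):
    """Chain stages with FIFO queues; each stage runs in its own thread."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    return queues[0], queues[-1], threads


def run(stages: List[Callable[[Any], Any]], chunks: List[Any]) -> List[Any]:
    """Feed input chunks through the pipeline and collect every output."""
    q_in, q_out, threads = build_pipeline(stages)
    for chunk in chunks:
        q_in.put(chunk)
    q_in.put(STOP)
    outputs = []
    while (item := q_out.get()) is not STOP:
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs


# --- Hypothetical stand-ins for the real models (NOT the repository's code) ---

class ToyVAD:
    """Stand-in for Silero VAD: buffers speech chunks and emits one utterance
    when a silent chunk ("") arrives. In this toy, trailing speech with no
    closing silence is never emitted."""

    def __init__(self) -> None:
        self.buffer: List[str] = []

    def __call__(self, chunk: str):
        if chunk:  # speech: keep buffering
            self.buffer.append(chunk)
            return None
        if not self.buffer:  # silence with nothing buffered
            return None
        utterance, self.buffer = self.buffer, []
        return utterance


def toy_stt(utterance: List[str]) -> str:  # stand-in for Whisper
    return " ".join(utterance)


def toy_lm(text: str) -> str:  # stand-in for an instruct LLM
    return f"You said: {text}"


def toy_tts(reply: str) -> str:  # stand-in for Parler-TTS; "audio" is tagged text
    return f"<audio:{reply}>"


if __name__ == "__main__":
    stages = [ToyVAD(), toy_stt, toy_lm, toy_tts]
    print(run(stages, ["hello", "there", "", "bye", ""]))
    # prints ['<audio:You said: hello there>', '<audio:You said: bye>']
```

Running each stage in its own thread lets a later stage (say, TTS) work on one reply while an earlier stage (say, STT) handles the next utterance; a plain sequential loop would also work but would wait on each model in turn.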
Marcio Pacheco
Data Alchemy
skool.com/data-alchemy-9173