Multimodal Generation MCP - AI-Powered Image, Music and Video Studio
Check out Google's Vertex AI Creative Studio on GitHub: it provides individual MCP servers with tools to automate generation and editing on each of Google's media platforms (Imagen for images, Veo for video, Lyria for music), plus TTS and even automated FFmpeg tooling. I'll admit it was a bit of a bear for me to get fully running, and I have some familiarity with GCP, but once you do, it's amazing.

Start with the ADK-agent-powered example in the "experiments" directory, which sets up a Google ADK agent so you can interact with each of the media MCPs in a chat UI. Once you've mastered that, you can add the whole media MCP system to Gemini CLI, Google's counterpart to Claude Code. That gives you the equivalent of a complete AI-powered media studio in your terminal or through a chat UI. For me, it's one of the most amazing MCP automations I've seen.

Caveat: this requires working with GCP, Google's cloud offering, which is a meaningful learning step of its own. But if you haven't done this yet, do so today: it comes with $300 of free GCP credits that you can use for any cloud offerings, including Gemini and the media AI APIs. That basically covers your AI needs while you get up to speed on the Gemini infrastructure.

This is a massive rabbit hole, so be careful, but I think it's a perfect time for anyone to check it out, ahead of the rumored Gemini 3 release (supposedly the ultimate monster SOTA model) and with the amazing new Imagen and Veo models just released. I'm not affiliated in any way. I just think it's good to diversify your AI understanding among the lead labs, and in my view Gemini is currently #1 in all multimodal media :)

https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main
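
For the Gemini CLI step, here's a rough sketch of what wiring the media MCPs in might look like. Gemini CLI reads MCP servers from an `mcpServers` section in its `settings.json`; everything else below is my assumption for illustration: the server command names are made-up placeholders, and the `PROJECT_ID` / `LOCATION` environment variable names are guesses, so check the repo's README for the real binaries and settings before using any of it.

```python
import json

# Hypothetical placeholders: these are NOT the real binary names from the repo.
# Swap in whatever the vertex-ai-creative-studio README tells you to install.
MEDIA_SERVERS = {
    "imagen": "imagen-mcp-placeholder",  # image generation
    "veo": "veo-mcp-placeholder",        # video generation
    "lyria": "lyria-mcp-placeholder",    # music generation
}


def build_settings(project_id, location="us-central1"):
    """Build a dict shaped like the `mcpServers` section of a Gemini CLI settings.json.

    The env variable names (PROJECT_ID, LOCATION) are assumptions for this sketch.
    """
    servers = {}
    for name, command in MEDIA_SERVERS.items():
        servers[name] = {
            "command": command,
            "args": [],
            "env": {"PROJECT_ID": project_id, "LOCATION": location},
        }
    return {"mcpServers": servers}


if __name__ == "__main__":
    # Print the fragment so you can review it before merging it into your own settings.json.
    print(json.dumps(build_settings("my-gcp-project"), indent=2))
```

The idea is just one entry per media server, all pointed at the same GCP project, so the CLI can launch each one as a separate MCP process.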