
Memberships

5minAI

2.7k members • Free

AI Agent Developer Academy

2.1k members • Free

Tech Snack University

10.9k members • Free

NATURAL 20

832 members • $37/m

AI Agents by BUSINESS24.AI

970 members • $24/m

AI Developer Accelerator

9.9k members • Free

Coding the Future With AI

1.3k members • Free

Agent Zero

1.6k members • Free

Dona's AI Community

1.6k members • Free

16 contributions to AI Developer Accelerator
Multimodal Generation MCP - AI Powered Image, Music and Video Studio
Check out Google's Vertex AI Creative Studio on GitHub: it provides individual MCP servers with tools to automate generating and editing media on each of Google's platforms for images, video, music, TTS, and even automated FFmpeg tooling (Imagen, Veo, Lyria). I'll admit it was a bit of a bear for me to get fully running, even with some familiarity with GCP, but once you do, it's amazing.

Start with the ADK-powered example in the "experiments" directory, which sets up a Google ADK agent so you can interact with each of the media MCPs through a chat UI. Once you've mastered that, you can add the whole media MCP system to Gemini's answer to Claude Code... Gemini CLI. That gives you the equivalent of a complete AI-powered media studio in your terminal or through a chat UI. For me, it's one of the most amazing MCP automations I've seen.

Caveat: this requires working with Google's GCP cloud offering, which is a meaningful learning step of its own. But if you haven't done that yet... do so today, as it comes with $300 of free GCP credits you can use for any cloud offering, including Gemini and media AI API calls. It basically pays for all your AI needs while you get up to speed on the Gemini infrastructure.

This is a massive rabbit hole, so be careful... but I think for anyone it's a perfect time to check out, in advance of what's rumored to be the ultimate monster SOTA model release of Gemini 3 and the just-released amazing Imagen and Veo models. I'm not affiliated in any way... I just think it's good to diversify your AI understanding among the lead labs... and Gemini is currently #1 in all multimodal media :) https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main
1 like • 20d
Another example. One fake, and one using a real place. Both completely automated by an ADK agent.
1 like • 16d
@Scott Graham Thx…great to meet a fellow AI kool-aid drinker :)
Gemini ADK tutorial (...and the Future of Vibe Coding)
Hopefully you're all seeing the benefits of vibe coding. If you are, then you've seen the downside too: when it goes off the rails, you need structure to bring it back around. Many people are now instituting manual processes to scaffold their vibe coding, and it generates great results.

I came across two packages that I believe are the way of the future. They add structure through a package installation that scaffolds the AI programmer with nodes, utility functions, and flow code that connects them. It can be built from a minimal or detailed PRD, either for a complete app or just for a new feature. This structure improves the code and the accuracy, while providing a life raft to return to if your code goes astray. Both can be used in Windsurf and Cursor. The proof is in the pudding... check them out in the repos below.

FYI, I also used the Pocket Flow-created app (`Codebase Knowledge`) to create the best tutorial I've seen for learning the awesome new Gemini ADK repo. It took 5 minutes, mostly processing time, and it can work to explain any GitHub repo as a tutorial!

ADK TUTORIAL: https://github.com/Kjdragan/google-adk-tutorial
VIBE CODE FRAMEWORKS:
https://github.com/The-Pocket/PocketFlow
https://github.com/eyaltoledano/claude-task-master
Mind Blown...Again. MCP, Roo Code and YouTube
Just had to share this with the enthusiasts. In 3 minutes, I got Roo Code (a VS Code extension... technically I had it running in the Windsurf IDE) to build its own MCP for fetching and processing YouTube transcripts. In one prompt, my Roo now gets the transcript, saves it to a file, and will process it however you want... exact transcript, cleaned-up markdown, thoughtful summarized takeaways, etc.

I've played around in the past with expansive hobby code to "extract knowledge" from YT videos and analyze and structure the output in various ways. Now Roo does this with one prompt using its own MCP, with complete flexibility. This is some awesome (and scary) sh*t! So you're aware, the newest noob can pull this off... so please try this for yourselves.

The prompt to Roo was simply: "create a mcp you can use to fetch youtube transcripts using the youtube-transcript-api library when the user supplies a youtube url". Not exactly rocket science. You could make this much better, but Roo (Claude 3.5) one-shotted it before I could even think about improving it.

The cost for this was pennies, and if you're careful, you could make it virtually free by using experimental Gemini models... or my favorite $ hack... V3 and R1. The OG 3.5 killed it for $0.39. I haven't tried the others, where I'm sure it could be done for maybe $0.01, or you could use a reasoning model with the MCP to create some amazing analysis. Go try it!! 🤯
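If you're recreating this by hand, the one fiddly helper the MCP needs before calling the youtube-transcript-api library is pulling the video ID out of whatever URL shape the user pastes. A stdlib-only sketch (my own helper, not part of any library):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Pull the video ID from common YouTube URL shapes.

    Handles youtu.be short links, /watch?v=..., /embed/..., /shorts/...
    (a hand-rolled helper for illustration; real MCPs may cover more cases).
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    if parsed.hostname and parsed.hostname.endswith("youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query)["v"][0]
        if parsed.path.startswith(("/embed/", "/shorts/")):
            return parsed.path.split("/")[2]
    raise ValueError(f"unrecognized YouTube URL: {url}")

print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
```

The extracted ID is then what you hand to the transcript-fetching call in youtube-transcript-api.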
2 likes • Mar 15
It will only get more dramatic from here. This is just the first case where the LLM in a chat interface is "untethered" and is directing actions that previously required a human in the loop. More importantly, it demonstrates the speed and intelligence with which an electronic initiator can act, compared to even the most skilled human directing something like inputting search parameters, choosing websites to access, etc.
Battle of the AI Models: Which One is Best for Agents?
I'm deep in testing O3-Mini, GPT-4o, DeepSeek-R1, and DeepSeek-V3 to figure out which model is truly the best for AI agents. To find the winner, I'm stress-testing them against the most common tasks agents need to handle:

🧠 Instruction Overload: Can it handle rule-heavy tasks without getting confused or hallucinating?
🛠️ Tool Call Hell: Can it handle 5 consecutive tool calls, feeding results from one into the next without breaking?
🔍 Needle in a Haystack: Can it retrieve precise information from large datasets while staying contextually aware?

So far, one model is dominating, and I'll be switching all of my agents over to it going forward. 💡 I'll share the full results (and the winning model) in my upcoming video: recording tomorrow, dropping Thursday! Stay tuned. Which model do you think will win? Drop your guesses below! 👇
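The "Tool Call Hell" test boils down to whether a model can drive a pipeline like this without dropping state between steps. A generic harness sketch with dummy tools standing in for real APIs (my own illustration, not any vendor's agent framework):

```python
# Generic sketch of the consecutive-tool-call stress test: each tool's
# output feeds the next call. In a real harness the model chooses each
# step; here the chain is fixed and the tools are dummies.

TOOLS = {
    "search":    lambda q: f"results for {q}",
    "summarize": lambda text: text.upper(),
    "count":     lambda text: str(len(text)),
}

def run_chain(steps, initial_input):
    """Execute tool calls in sequence, piping each result forward."""
    value = initial_input
    for tool_name in steps:
        value = TOOLS[tool_name](value)   # result of one call feeds the next
    return value

out = run_chain(["search", "summarize", "count"], "agents")
print(out)  # len("RESULTS FOR AGENTS") -> "18"
```

A model "breaks" this test when it reorders steps, drops an intermediate result, or hallucinates a tool output instead of passing the real one forward.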
0 likes • Feb 6
@Reuben Lomas Many tests and benchmarks have shown how far behind Anthropic is on this because it has no reasoning model. They need to get going.
No $ No Latency local TTS for any need? Chats? Your Agent? Yes it's possible and easy NOW.
ElevenLabs is cool but pricey. Local models are GPU intensive. Introducing Kokoro... a high-quality, emotive text-to-speech model. It's been out for a bit, but the local setup can be a pain, and there's no Ollama version available yet (though Ollama is rumored to be building that functionality).

Now some great people have built a complete Docker version with a built-in API to hit from your script for whatever REALTIME voice needs your app has, with GPU and CPU build versions plus Gradio and web interfaces. I tried the latest beta branch and there was ZERO latency, with some great high-quality voices (even some weird ASMR ones... not judging ;) ).

Don't be intimidated by Docker if you haven't used it (it's free and basically just a complete .venv-and-app combo that builds itself, hosted in the cloud). This is a work in progress, so I tried the 1.2 pre-beta GitHub branch using the web interface; the API seems to be set up only on the main 1.0 branch for now. Did I mention it's basically a free voice service for whatever you're building?

TLDR: https://github.com/remsky/Kokoro-FastAPI.git has an easy-to-use README.md
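For the script side, the repo advertises an OpenAI-style speech endpoint on the local container. A rough sketch of hitting it from Python: the port, endpoint path, and voice name here are my assumptions from the project's docs, so verify them against the README for your branch before relying on this.

```python
import json
import urllib.request

# Hedged sketch of calling a local Kokoro-FastAPI container.
# BASE_URL port and the /v1/audio/speech path are assumptions taken
# from the repo's OpenAI-compatible API docs; check your branch.
BASE_URL = "http://localhost:8880"

def build_speech_request(text, voice="af_bella"):
    """Assemble the JSON body for an OpenAI-style speech endpoint."""
    return {"model": "kokoro", "input": text, "voice": voice}

def synthesize(text, voice="af_bella", out_path="out.mp3"):
    """POST the text and save the returned audio bytes (needs a running container)."""
    body = json.dumps(build_speech_request(text, voice)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

payload = build_speech_request("Hello from my agent")
print(payload["input"])  # Hello from my agent
```

Swap `synthesize()` into whatever agent or app loop you're building; since the container runs locally, the per-request cost really is zero.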
Kevin Dragan
@kevin-dragan-5803
Former IB now seeking AI everything before the singularity 🤖

Active 3d ago
Joined Jun 26, 2024
Houston