
Memberships

5minAI

2.7k members • Free

AI Agent Developer Academy

2.1k members • Free

Tech Snack University

10.9k members • Free

NATURAL 20

832 members • $37/m

AI Agents by BUSINESS24.AI

970 members • $24/m

AI Developer Accelerator

9.9k members • Free

Coding the Future With AI

1.3k members • Free

Agent Zero

1.6k members • Free

Dona's AI Community

1.6k members • Free

16 contributions to AI Developer Accelerator
Multimodal Generation MCP - AI Powered Image, Music and Video Studio
Check out Google's Vertex AI Creative Studio on GitHub: it provides individual MCP servers with tools to automate generating and editing media on each of Google's platforms for images, video, music, TTS, and even automated FFmpeg tooling (Imagen, Veo, Lyria). I'll admit it was a bit of a bear for me to get fully running, even with some familiarity with GCP, but once you do, it's amazing.

Start with the ADK-powered example in the "experiments" directory, which sets up a Google ADK agent so you can interact with each of the media MCPs through a chat UI. Once you've mastered that, you can add the whole media MCP system to Gemini's answer to Claude Code... Gemini CLI. That gives you the equivalent of a complete AI-powered media studio in your terminal or through a chat UI. For me, it's one of the most amazing MCP automations I've seen.

Caveat: this requires working with Google's GCP cloud offering, which is a meaningful learning step of its own. But if you haven't done that yet... do so today, as it comes with $300 of free GCP credits you can use for any cloud offering, including Gemini and media AI API calls. It basically pays for all your AI needs while you get up to speed on the Gemini infrastructure.

This is a massive rabbit hole, so be careful... but I think for anyone it's a perfect time to check out, in advance of what's rumored to be the ultimate monster SOTA model release of Gemini 3 and the just-released amazing Imagen and Veo models. I'm not affiliated in any way... I just think it's good to diversify your AI understanding among the lead labs... and Gemini is currently #1 in all multimodal media :) https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main
1 like • 20d
Another example. One fake, and one using a real place. Both completely automated by an ADK agent.
1 like • 16d
@Scott Graham Thx…great to meet a fellow AI kool-aid drinker :)
Gemini ADK tutorial (...and the Future of Vibe Coding)
Hopefully you're all seeing the benefits of vibe coding. If you are, then you've seen the downside too: when it goes off the rails, you need structure to bring it back around. Many people are now instituting manual processes to scaffold their vibe coding, and it generates great results.

I came across two packages that I believe are the way of the future. They add structure through a package installation that scaffolds the AI programmer with nodes, utility functions, and flow code that connects them. It can be built from a minimal or detailed PRD, either for a complete app or just for a new feature. This structure improves the code and the accuracy, while providing a life raft to return to if your code goes astray. Both can be used in Windsurf and Cursor. The proof is in the pudding... check them out in the repos below.

FYI, I also used the Pocket Flow-created app (`Codebase Knowledge`) to create the best tutorial I've seen for learning the awesome new Gemini ADK repo. It took 5 minutes, mostly processing time, and it can work to explain any GitHub repo as a tutorial!

ADK TUTORIAL: https://github.com/Kjdragan/google-adk-tutorial
VIBE CODE FRAMEWORKS:
https://github.com/The-Pocket/PocketFlow
https://github.com/eyaltoledano/claude-task-master
Mind Blown...Again. MCP, Roo Code and YouTube
Just had to share this with the enthusiasts. In 3 minutes, I got Roo Code (a VS Code extension... technically I had it running in the Windsurf IDE) to build its own MCP for fetching and processing YouTube transcripts. In one prompt, my Roo now gets the transcript, saves it to a file, and will process it however you want... exact transcript, cleaned-up markdown, thoughtful summarized takeaways, etc.

I've played around in the past with expansive hobby code to "extract knowledge" from YT videos and analyze and structure the output in various ways. Now Roo does this with one prompt using its own MCP, with complete flexibility. This is some awesome (and scary) sh*t! So you're aware, the newest noob can pull this off... so please try this for yourselves.

The prompt to Roo was simply: "create a mcp you can use to fetch youtube transcripts using the youtube-transcript-api library when the user supplies a youtube url". Not exactly rocket science. You could make this much better, but Roo (Claude 3.5) one-shotted it before I could even think about improving it.

The cost for this was pennies, and if you're careful, you could make it virtually free by using experimental Gemini models... or my favorite $ hack... V3 and R1. The OG 3.5 killed it for $0.39. I haven't tried the others, where I'm sure it could be done for maybe $0.01, or you could use a reasoning model with the MCP to create some amazing analysis. Go try it!! 🤯
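If you're recreating this by hand, the one fiddly helper the MCP needs before calling the youtube-transcript-api library is pulling the video ID out of whatever URL shape the user pastes. A stdlib-only sketch (my own helper, not part of any library):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Pull the video ID from common YouTube URL shapes.

    Handles youtu.be short links, /watch?v=..., /embed/..., /shorts/...
    (a hand-rolled helper for illustration; real MCPs may cover more cases).
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    if parsed.hostname and parsed.hostname.endswith("youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query)["v"][0]
        if parsed.path.startswith(("/embed/", "/shorts/")):
            return parsed.path.split("/")[2]
    raise ValueError(f"unrecognized YouTube URL: {url}")

print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
```

The extracted ID is then what you hand to the transcript-fetching call in youtube-transcript-api.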
2 likes • Mar 15
It will only get more dramatic from here. This is just the first case where the LLM in a chat interface is "untethered" and is directing actions that previously required a human in the loop. More importantly, it demonstrates the speed and intelligence with which an electronic initiator can act, compared to even the most skilled human directing something like inputting search parameters, choosing websites to access, etc.
Battle of the AI Models: Which One is Best for Agents?
I'm deep in testing O3-Mini, GPT-4o, DeepSeek-R1, and DeepSeek-V3 to figure out which model is truly the best for AI agents. To find the winner, I'm stress-testing them against the most common tasks agents need to handle:

🧠 Instruction Overload: Can it handle rule-heavy tasks without getting confused or hallucinating?
🛠️ Tool Call Hell: Can it handle 5 consecutive tool calls, feeding results from one into the next without breaking?
🔍 Needle in a Haystack: Can it retrieve precise information from large datasets while staying contextually aware?

So far, one model is dominating, and I'll be switching all of my agents over to it going forward. 💡 I'll share the full results (and the winning model) in my upcoming video: recording tomorrow, dropping Thursday! Stay tuned. Which model do you think will win? Drop your guesses below! 👇
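The "Tool Call Hell" test boils down to whether a model can drive a pipeline like this without dropping state between steps. A generic harness sketch with dummy tools standing in for real APIs (my own illustration, not any vendor's agent framework):

```python
# Generic sketch of the consecutive-tool-call stress test: each tool's
# output feeds the next call. In a real harness the model chooses each
# step; here the chain is fixed and the tools are dummies.

TOOLS = {
    "search":    lambda q: f"results for {q}",
    "summarize": lambda text: text.upper(),
    "count":     lambda text: str(len(text)),
}

def run_chain(steps, initial_input):
    """Execute tool calls in sequence, piping each result forward."""
    value = initial_input
    for tool_name in steps:
        value = TOOLS[tool_name](value)   # result of one call feeds the next
    return value

out = run_chain(["search", "summarize", "count"], "agents")
print(out)  # len("RESULTS FOR AGENTS") -> "18"
```

A model "breaks" this test when it reorders steps, drops an intermediate result, or hallucinates a tool output instead of passing the real one forward.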
0 likes • Feb 6
@Reuben Lomas Many tests and benchmarks have shown how far behind Anthropic is on this because it has no reasoning model. They need to get going.
No $ No Latency local TTS for any need? Chats? Your Agent? Yes it's possible and easy NOW.
ElevenLabs is cool but pricey. Local models are GPU intensive. Introducing Kokoro... a high-quality, emotive text-to-speech model. It's been out for a bit, but the local setup can be a pain, and there's no Ollama version available yet (though Ollama is rumored to be building that functionality).

Now some great people have built a complete Docker version with a built-in API to hit from your script for whatever REALTIME voice needs your app has, with GPU and CPU build versions plus Gradio and web interfaces. I tried the latest beta branch and there was ZERO latency, with some great high-quality voices (even some weird ASMR ones... not judging ;) ).

Don't be intimidated by Docker if you haven't used it (it's free and basically just a complete .venv-and-app combo that builds itself, hosted in the cloud). This is a work in progress, so I tried the 1.2 pre-beta GitHub branch using the web interface; the API seems to be set up only on the main 1.0 branch for now. Did I mention it's basically a free voice service for whatever you're building?

TLDR: https://github.com/remsky/Kokoro-FastAPI.git has an easy-to-use README.md
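For the script side, the repo advertises an OpenAI-style speech endpoint on the local container. A rough sketch of hitting it from Python: the port, endpoint path, and voice name here are my assumptions from the project's docs, so verify them against the README for your branch before relying on this.

```python
import json
import urllib.request

# Hedged sketch of calling a local Kokoro-FastAPI container.
# BASE_URL port and the /v1/audio/speech path are assumptions taken
# from the repo's OpenAI-compatible API docs; check your branch.
BASE_URL = "http://localhost:8880"

def build_speech_request(text, voice="af_bella"):
    """Assemble the JSON body for an OpenAI-style speech endpoint."""
    return {"model": "kokoro", "input": text, "voice": voice}

def synthesize(text, voice="af_bella", out_path="out.mp3"):
    """POST the text and save the returned audio bytes (needs a running container)."""
    body = json.dumps(build_speech_request(text, voice)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

payload = build_speech_request("Hello from my agent")
print(payload["input"])  # Hello from my agent
```

Swap `synthesize()` into whatever agent or app loop you're building; since the container runs locally, the per-request cost really is zero.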
Kevin Dragan
@kevin-dragan-5803
Former IB now seeking AI everything before the singularity 🤖

Active 3d ago
Joined Jun 26, 2024
Houston