Knowledge extraction pipeline for YouTube videos. Use this on Jake's videos!!!

You watch a lot of content in this space. Demos, walkthroughs, systems builds, production case studies. Some of it is genuinely valuable. Most of it evaporates.

A summary gives you a shorter version of what was said. What you want to know is: is there anything here worth keeping, what's the mechanism behind it, and what would you build differently knowing it?

What this does: YouTube URL in. Claude extracts 3-7 discrete claims worth keeping. Each one gets:

Concept - the core assertion
Mechanism - the causal explanation (the most important field)
So what - what you'd build or decide differently
Open questions - what this raises but doesn't answer

Every extraction is appended to a local markdown log you own.
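Here is a minimal sketch of what one log entry could look like. It assumes each claim is stored with the four fields above and appended to the log as a markdown section. The names Claim, to_markdown, and append_to_log, and the heading layout, are hypothetical and are not taken from the repo.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Claim:
    """One extracted claim (hypothetical structure mirroring the four fields)."""
    concept: str
    mechanism: str
    so_what: str
    open_questions: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Render the claim as a small markdown section; the layout is an assumption.
        lines = [
            f"### {self.concept}",
            f"**Mechanism:** {self.mechanism}",
            f"**So what:** {self.so_what}",
            "**Open questions:**",
        ]
        bullets = [f"- {q}" for q in self.open_questions] or ["- (none)"]
        return "\n".join(lines + bullets) + "\n"


def append_to_log(log_path: Path, source_url: str, claims: list[Claim]) -> None:
    """Append one video's claims to the end of a local markdown log (hypothetical helper)."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends, so earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"\n## {source_url}\n\n")
        for claim in claims:
            f.write(claim.to_markdown() + "\n")
```

Because the file is opened in append mode, the log stays a plain, growing file that you can grep or keep under version control.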

Real example, from Curtis Hays's Collideascope OS walkthrough:

Concept: ICM deploys fast only when the doctrine layer is already documented. The folder structure is the last step, not the first.

Mechanism: Curtis had 8 months of prior work before touching the ICM - documented beliefs, brand voice, organizational why/how/what, all in markdown. He brought that corpus in and said "organize it using this structure." The system came together quickly because the content existed. Without pre-existing doctrine, the ICM produces mechanics without a belief layer.

So what: Before building the folder structure, ask: is the doctrine layer written down? Cloning a blueprint without existing beliefs produces a technically correct but contextually empty system.

That's not in any summary. That comes from extraction.

How the repo works: the primary mode uses Claude Code with your existing subscription, so no API key is needed. There are three components, each with one job: fetch_transcript.py gets the transcript, prompts/extract_default.md tells Claude what to look for, and CONTEXT.md tells Claude Code how to run the workflow.
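However fetch_transcript.py works internally, any transcript fetcher first has to turn the pasted URL into a video ID. Here is a minimal sketch of that step. It assumes the common watch, youtu.be, shorts, embed, and live URL shapes. video_id_from_url is a hypothetical helper, not the repo's code. The actual transcript download is left out because it depends on whichever library the script uses.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def video_id_from_url(url: str) -> str:
    """Extract the video ID from common YouTube URL shapes (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/", "/live/")):
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            candidate = parsed.path.split("/")[2]
    if not _ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"could not find a YouTube video ID in {url!r}")
    return candidate
```

Rejecting anything that is not a well-formed ID up front means a typo in the URL fails immediately instead of partway through the pipeline.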

The prompt file is the thing to edit. The scripts are plumbing.

If you've been building with ICM, you'll recognize the architecture: prompts/ is Layer 3 factory configuration, logs/ is Layer 4 working artifacts.

Edit the prompt, change what gets extracted - no code changes.

Default prompt is tuned for AI systems and workflow architecture content. There's a second prompt in the repo showing how to adapt it for a different use case.
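The repo runs this through Claude Code. If you scripted the same idea yourself, the "edit the prompt, not the code" property could look like this: the prompt is data loaded by name from prompts/, and the code only joins it with the transcript. This is a minimal sketch under that assumption. build_request, prompt_name, and the "## Transcript" separator are hypothetical.

```python
from pathlib import Path

PROMPTS_DIR = Path("prompts")  # assumption: mirrors the repo's prompts/ folder


def build_request(transcript: str, prompt_name: str = "extract_default",
                  prompts_dir: Path = PROMPTS_DIR) -> str:
    """Combine an extraction prompt file with a transcript (hypothetical helper).

    Editing the .md file or passing a different prompt_name changes what
    gets extracted; this function itself never changes.
    """
    prompt_path = prompts_dir / f"{prompt_name}.md"
    instructions = prompt_path.read_text(encoding="utf-8")
    return f"{instructions.rstrip()}\n\n## Transcript\n\n{transcript.strip()}\n"
```

To use the second prompt, you would pass its file name (without .md) as prompt_name. The extraction logic lives entirely in the markdown files.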

Get it: https://github.com/djterry15-afk/kim-youtube-ingest

Clone it, open it in VS Code, ask Claude Code to ingest a URL. Three setup commands.

This is Phase 1 of a longer pipeline that will eventually cover articles, books, and research, building toward a queryable knowledge base. Phase 1 is standalone and useful now.
