Flip the Script - It's all about the $$$
Many of us in the AI space are still selling the wrong thing. We package custom models, prompt engineering, agent setups, or hourly consulting. The conversation stays locked on tools, hours, and technical specs. By 2026 that model has clearly hit its limit.

I've lived the shift firsthand while grinding open-source stacks for nonprofit work: Cloudflare tiers, Alibaba credits, Jake's method, ZeroClaw experiments, and finally landing on Hermes Agent + Cognee. The tech itself became fast and cheap. Specialized work turned into a commodity almost overnight. Clients compare on price and speed, and the race to the bottom kicks in.

That's why I flipped the script: stop selling the service and start selling the measurable value. Lead with real outcomes: less manual grunt work, tighter decision loops, lower operational costs, and repeatable processes that actually move the needle. Structure engagements around shared success metrics instead of hours or fixed deliverables. Price based on impact: performance bonuses, value-share arrangements, or retainers tied to sustained results. When the system improves, everyone wins.

Jake's approach is a perfect example of this in action. He's giving away his Interpretable Context Methodology for free, openly sharing the full system so anyone can adopt it. That single move creates exponential value across the entire community: more capable setups, clearer thinking, and faster progress for all of us without gatekeeping or hourly billing.

Keep your own work minimal and transparent. Dig into the client's real friction points. Co-define success in concrete terms. Build lightweight, maintainable architectures instead of black boxes. Show clear before-and-after results.

The outcome is powerful: the system handles the repetitive load reliably, freeing up attention for the judgment and creative work only you (or your client) can do. You shift from being another vendor to becoming the guide who cuts through the noise. I've seen this mindset pay off in my own volunteer projects and early client conversations. It just feels cleaner and more aligned.
LLMs as judges and adversarial testing
A lot of gurus already teach how everyone can build agents with tons of skills to launch their $30M ARR micro SaaS or replace their $500K a year small business. So I'm going to go off the beaten track and cover a topic no one likes hearing: your agents can make mistakes, LLMs can hallucinate, and somehow you need to figure out when it happens and fix it.

I have a process where, as I am building, I have my co-pilot or LLM work out tests. It executes the agent, runs a couple of scenarios and prompts, and determines whether the agent's responses measure up to pre-determined pass or fail conditions. It keeps a record of all the tests we've done throughout the build, and at the end, I get the co-pilot or coding agent to make me a scripted, standardised test suite that we can run to score the agent's performance.

The first part of the tests involves my coding agent acting as a judge: scoring how good the responses are and how well they stick to what we know to be reasonably good responses. Sometimes an LLM as a judge is not needed, because some builds don't return subjective responses. Judges are needed when human-like reasoning is required to interpret responses. The second part is adversarial testing: I get the coding agent to help me design scenarios intended to trip up or trick the agent into giving wrong answers. A minimal sketch of this judge-plus-adversarial harness follows below.

- I usually run these tests at every major milestone in the build and periodically when running the agents (even in production environments).
- I walk through the scores with the coding agent to root-cause issues and perform interim fixes.
- Then we monitor and run the scores again at a later time to see if the fixes held.
- Tests and results are always recorded.
- When we've run enough of these tests, they get turned into an automated gate for deciding whether agents should be monitored closely, triaged, or discarded.

The screenshots are from my most recent build. I was designing a different, more compact memory system, and needed to know if I could objectively trust (1) the responses coming from the agent running the system and (2) the coding agent that's building it.
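To make this concrete, here is a minimal sketch of what such a harness can look like. Everything in it is illustrative rather than my exact setup: call_llm() is a placeholder for whatever client reaches your agent and your judge model, and the scenario names and rubric wording are made up for the example.

```python
# Minimal sketch of a judge + adversarial test harness.
# Assumption: call_llm() is a stand-in for your own LLM client.
import json
import time
from dataclasses import dataclass, asdict

def call_llm(prompt: str) -> str:
    """Placeholder: route this to the agent under test or the judge model."""
    raise NotImplementedError("wire this to your own LLM client")

@dataclass
class TestCase:
    name: str
    prompt: str
    rubric: str          # what a "reasonably good" answer must contain
    adversarial: bool    # True for scenarios designed to trip the agent up

def judge(response: str, rubric: str) -> bool:
    """LLM-as-judge: ask the judge model for a strict PASS/FAIL verdict."""
    verdict = call_llm(
        "You are a strict grader. Given the rubric and the response, "
        "answer with exactly PASS or FAIL.\n"
        f"Rubric: {rubric}\nResponse: {response}"
    )
    return verdict.strip().upper().startswith("PASS")

def run_suite(cases: list[TestCase], log_path: str = "test_log.json") -> float:
    """Execute every case, judge it, record results, return the pass rate."""
    results = []
    for case in cases:
        response = call_llm(case.prompt)        # execute the agent
        passed = judge(response, case.rubric)   # score the response
        results.append({**asdict(case), "response": response,
                        "passed": passed, "ts": time.time()})
    with open(log_path, "w") as f:              # keep a record of every run
        json.dump(results, f, indent=2)
    return sum(r["passed"] for r in results) / len(results)

cases = [
    TestCase("recall", "What did we store about project X?",
             "Mentions the stored facts, invents no new details", False),
    TestCase("trap-nonexistent", "Summarise the meeting from last Tuesday.",
             "Admits no such meeting exists rather than fabricating one", True),
]
# score = run_suite(cases)  # gate on this score at each milestone
```

The point of scripting it is that the same suite can be re-run after every fix, so you can see whether the fixes held instead of trusting a one-off spot check.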
Skills and Tools in ICM
Hoping to get a few questions answered on skills/tools, based on the resource materials and examples from the classroom.

1. Do you guys keep a "global skills" folder at the project level (i.e., for skills used across multiple workspaces), so you don't have to maintain the same skill in multiple places?
2. In the workspace-blueprint example, it says "skills work best when they're wired into the CONTEXT.md files." I asked Claude how to "wire" a skill (which I assume just means making it accessible via a slash command), and Claude said it needs to exist in the .claude/ folder, but I don't see this in the example.
3. The example also states "you can wire up to 15 skills per workspace." I don't see this limitation in any documentation. Is this just a suggestion? If so, why? If you invoke a skill by name or .md file, why would you want to limit the number of skills in a workspace? If the ICM works as intended, it should only load the specified skills.
Who's here? Drop your intro.
Tell us three things:

1. What you do (job, industry, student, career-changer, whatever)
2. What brought you to Clief Notes
3. One thing you're trying to figure out right now related to computing or AI

I'll respond to every single one. And read each other's intros too, because the person who's stuck on the same problem as you might already be in this thread.

I'll go first: I'm Jake. I've been working in tech for 15 years and building with generative AI for 3 years straight now. Excited to teach and learn!

That's it. Simple, scannable, gives you data on who's joining and what they need, and keeps the feed clear for content that retains people past week one.
OpenAI Workspace Agents Are ICM in Production
OpenAI just launched Workspace Agents in ChatGPT (openai.com/index/introducing-workspace-agents-in-chatgpt), and the architecture they're describing maps almost exactly to what the Interpretable Context Methodology (ICM) has been doing. Sequential stages. Structured context loaded per step. Input/output contracts. Human review gates before the next step runs. Agents that keep working after you close the tab.

The difference is not in the concept, it's in the implementation. OpenAI delivers this as a cloud product built on top of Codex. ICM delivers the same logic through the filesystem: portable, versionable, zero framework dependency, no vendor lock-in.

The observability story is also different. When you want to inspect what the agent did at any step, you open the folder and read the file. No dashboard to configure, no logging layer to build, no special tooling. The structure itself is the audit trail. A minimal sketch of the pattern is below.

The OpenAI announcement is good news for everyone building in this space. It validates that structured, stage-based, reviewable AI workflows are the right direction for real work inside organizations.
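For the curious, here is a minimal sketch of the filesystem-as-pipeline idea. The stage names, the run_stage() placeholder, and the "approved" marker convention are my own assumptions for illustration, not ICM's actual layout:

```python
# Minimal sketch of a filesystem-based staged workflow with review gates.
# Assumptions: stage names, run_stage(), and the "approved" marker file
# are illustrative conventions, not a prescribed ICM structure.
from pathlib import Path

STAGES = ["01-research", "02-draft", "03-review"]

def run_stage(stage: str, context: str) -> str:
    """Placeholder for the agent call that does the stage's actual work."""
    return f"output of {stage} given: {context[:60]}"

def run_pipeline(root: Path) -> None:
    ctx_file = root / "CONTEXT.md"
    context = ctx_file.read_text() if ctx_file.exists() else ""
    for stage in STAGES:
        stage_dir = root / stage
        stage_dir.mkdir(parents=True, exist_ok=True)
        out_file = stage_dir / "output.md"
        if not out_file.exists():
            # The file itself is the audit trail: to inspect what the
            # agent did, open the folder and read it.
            out_file.write_text(run_stage(stage, context))
        # Human review gate: the next stage only runs once a reviewer
        # drops an "approved" marker file into the stage folder.
        if not (stage_dir / "approved").exists():
            print(f"{stage}: awaiting human approval, stopping here")
            return
        context = out_file.read_text()  # output contract feeds the next stage

run_pipeline(Path("workspace"))
```

Because every stage's input and output lives as a plain file, the whole run is versionable with git and portable across machines, which is the no-lock-in point above.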