Ask any software engineer or architect how best to do something, and most will default to thinking in terms of "patterns". I've been asked, so I'll oblige with the odd pattern that crosses my mind every so often. I won't detract from the fine work Jake's been doing with ICM, but rather add on and write about specific use cases or case studies from the work I've been doing with poor man's memory. Most of these arise when I'm back to actually writing software (or vibe coding smaller codebases with non-Anthropic models), usually while in token jail (or in between contexts, or debugging).
I've mentioned before that I'm not a fan of MCP. I'll qualify it further: I'm not a fan of MCP for personal builds and small projects where tokens are a luxury. The way MCP works is that you create a messaging channel between your AI agent and an intermediary (the published MCP host). The host exposes specific tools (in Telegram, for example, that would be send_message, get_history, reply_message, etc). There's a lot of tool-calling between the MCP host and Telegram, and between the MCP host and your agent. When you have multiple agents running on Telegram, that's a potential token guzzler (or a recipe for disaster). It does, however, make the whole process of deploying and publishing services for agents and bots much easier: connect your agent harness to the MCP host, and it takes care of the rest.
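To make the overhead concrete, here's a hypothetical sketch of the kind of tool surface a Telegram MCP host exposes. The tool names come from above, but the schemas and the dispatcher are illustrative, not any real server's API:

```python
# Hypothetical sketch of a Telegram MCP host's tool surface.
# Tool names (send_message, get_history, reply_message) are from the post;
# schemas and dispatch logic are illustrative, not a real server's API.

TOOLS = {
    "send_message": {"params": ["chat_id", "text"]},
    "get_history": {"params": ["chat_id", "limit"]},
    "reply_message": {"params": ["chat_id", "message_id", "text"]},
}

def dispatch(tool_name: str, args: dict) -> dict:
    """Every agent<->host round trip is a tool call like this, and each call
    carries the tool schemas plus conversation context in the prompt."""
    spec = TOOLS.get(tool_name)
    if spec is None:
        return {"error": f"unknown tool {tool_name!r}"}
    missing = [p for p in spec["params"] if p not in args]
    if missing:
        return {"error": f"missing params: {missing}"}
    return {"ok": True, "tool": tool_name, "args": args}
```

The token cost isn't the tool call itself; it's that every one of these round trips drags the schemas and the chat context along with it.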
So I just checked the numbers for the Telegram demo with 3 friends and Vera, the agent bot. It was a proof of concept we set up in case any one of us needed to demo to VCs quickly. The agents created it in a single 30-minute session over the same Telegram setup we developed for demos. 0 hallucinations. It was a stress test.
[Edit: Yeah... I'm a big fan of recursion (AI developed the memory, to build themselves, to build the telegram, to build the demo artifact)].
In Telegram, we sent just shy of 800 messages in just under 18 hours (Vera was responding about 2/3 of the time).
- About 250 messages sent by everyone except Vera
- So a typical MCP setup would have to ingest 250 messages x 600 characters (150 tokens) average per message = 37.5K tokens of fresh message content
- Agents are dumb without context, so they need to be given context from the conversation history. Let's say we're keeping the last 20 messages: 20 x 150 tokens x 250 messages = 750K tokens just for context
- Then there's the overhead: the MCP host has to field the bots' requests for new messages, past history, and regular heartbeats. Say once a minute, enough to make the chat feel realtime: 1440 checks x 10 tokens = 14.4K tokens just to check if new messages came in
- MCP hosts need to perform their own processing, figure out which tools are needed, etc. Let's put that at another 14.4K tokens, conservatively.
That's 800K+ tokens just for input. With Sonnet 4.6 that's about $2.40 (Opus 4.6 = $4)
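The input-side arithmetic above, as a quick script. All figures are the post's own estimates, including an assumed $3/M Sonnet-class input rate:

```python
# Sanity-checking the input-side napkin math (estimates, not measurements).
TOKENS_PER_MSG = 150           # ~600 characters per message
human_msgs = 250               # messages sent by everyone except Vera

fresh = human_msgs * TOKENS_PER_MSG            # fresh message content
context = 20 * TOKENS_PER_MSG * human_msgs     # 20-message history per ingest
heartbeats = 1440 * 10                         # once-a-minute "anything new?" polls
host_overhead = 14_400                         # tool routing etc., a guess

total_input = fresh + context + heartbeats + host_overhead
print(total_input)                       # -> 816300, i.e. "800K+"
print(round(total_input / 1e6 * 3, 2))   # -> 2.45 at $3/M input
```

Context dominates: the 20-message history re-sent on every ingest is ~90% of the bill, which is why the collector pattern below attacks exactly that term.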
Then there's the output:
- Those MCP hosts need to think, produce the output, etc.
- The MCP hosts need to relay the responses from Telegram = 550 responses x 600 characters (150 tokens) = 82.5K tokens
- Your agents need to relay their responses to you = another 82.5K tokens
- There's a tax on thinking and on the back-and-forth between host and agents; let's put that at another 85K tokens
Total output: ~250K tokens (give or take). With Sonnet 4.6's output pricing that's about $3.75 (or ~$6.25 with Opus 4.6)
That's roughly 1.1M tokens and ~$6 per day.
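Same sanity check for the output side, assuming ~550 agent replies at ~150 tokens each and the $15/M output rate used above:

```python
# Output-side napkin math (estimates, not measurements).
TOKENS_PER_REPLY = 150          # ~600 characters per reply
agent_replies = 550             # Vera answered ~2/3 of ~800 messages

from_telegram = agent_replies * TOKENS_PER_REPLY   # host relaying Telegram replies
to_you = agent_replies * TOKENS_PER_REPLY          # agent relaying replies to you
thinking_tax = 85_000                              # host<->agent back and forth, a guess

total_output = from_telegram + to_you + thinking_tax
print(total_output)                        # -> 250000
print(round(total_output / 1e6 * 15, 2))   # -> 3.75 at $15/M output
```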
This is the tax most people pay for using Telegram or any other P2P or IM service to talk to their agents via MCP.
For contrast, the Claude Max 20 Plan is about 39M tokens (give or take) every 5 hours (window message limit) or 300M tokens every week (model limit). That makes Pro ~ 2M tokens every 5 hours and 60M tokens every week (but Anthropic made it worse recently by introducing something similar to surge pricing but for LLMs during peak hours).
That's a couple percent of the Pro Plan's weekly token limit, spent every single day, just to be able to talk to your agent over Telegram (or Slack, or Discord) in a single-user (you and your agent) chat.
I'm not sure what the cost looks like for multi-user and group environments, but most agents are stateless and dumb, and most will default to canned responses in a multi-user environment. This is the price of maintaining a near real-time chat with an agent over a noisy communications channel like MCP.
I'm a millennial, and we've lived through at least 3 global financial meltdowns in our lifetimes, so I hate paying for extra usage if I can help it.
So this is the pattern or architecture that I use for our demo telegram channel:
- A collector running on a raspberry pi (connected to my development environment via tailscale)
- The collector is dumb, no AI, no LLMs
- Every 60 seconds it polls Telegram for updates from the channel and appends them to a SQLite database
- On my development machine, Haiku checks every 30s (60s if the Telegram channel is chatty)
- It polls the collector, which returns the last 20 messages plus the new ones, and only sends content when there are new messages
- 2880 heartbeats x 10 tokens max = ~28.8K tokens per day base
- ~800K tokens for context + new messages (roughly 250 messages x a 20-message context window x 150 tokens, plus the fresh messages themselves)
- We'll round it up to 900K on Haiku = $1
- Haiku spits out max 100 tokens in the following format: { reply: <true|false>, immediate_reply: <string>, promise: <true|false>, promise_type: <text|photo|document|voice>, reply_to: <message_id>, status: <working|done|delivered> } x 250 messages = 25K tokens on Haiku ($0.025)
- Maybe 100 of those messages actually need replies (a bit under half): 150 tokens x 21 messages (20 past + 1 new) x 100 replies = 315K tokens
- Haiku decides, and Sonnet takes care of the responses (about $1)
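A minimal sketch of the collector half of the list above: no AI, no LLMs, just polling and SQLite. The `getUpdates` call is Telegram's standard Bot API long-poll; the table layout, function names, and the placeholder token are my assumptions:

```python
# Dumb collector sketch: polls Telegram, appends to SQLite, serves deltas.
# getUpdates is the real Telegram Bot API; schema and helpers are assumptions.
import json
import sqlite3
import time
import urllib.request

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "  id INTEGER PRIMARY KEY,"   # Telegram message_id
        "  chat_id INTEGER,"
        "  sender TEXT,"
        "  text TEXT,"
        "  ts INTEGER)"
    )

def store_updates(conn: sqlite3.Connection, updates: list) -> int:
    """Append new messages; INSERT OR IGNORE makes repeated polls idempotent."""
    rows = [
        (m["message_id"], m["chat"]["id"], m["from"]["first_name"],
         m.get("text", ""), m["date"])
        for u in updates if (m := u.get("message"))
    ]
    with conn:
        conn.executemany("INSERT OR IGNORE INTO messages VALUES (?,?,?,?,?)", rows)
    return len(rows)

def recent_and_new(conn: sqlite3.Connection, since_id: int, limit: int = 20):
    """What Haiku polls: last `limit` messages, but only when something is new,
    so an idle heartbeat costs next to nothing."""
    new = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE id > ?", (since_id,)).fetchone()[0]
    if new == 0:
        return None  # nothing new: the heartbeat reply stays tiny
    return conn.execute(
        "SELECT * FROM messages ORDER BY id DESC LIMIT ?", (limit,)).fetchall()

if __name__ == "__main__":
    TOKEN = "<bot-token>"  # placeholder
    conn = sqlite3.connect("channel.db")
    init_db(conn)
    offset = 0
    while True:  # poll Telegram every 60 seconds
        url = f"https://api.telegram.org/bot{TOKEN}/getUpdates?offset={offset}"
        with urllib.request.urlopen(url) as r:
            updates = json.load(r)["result"]
        if updates:
            offset = updates[-1]["update_id"] + 1
            store_updates(conn, updates)
        time.sleep(60)
```

The empty-channel case returning `None` is the whole trick: idle heartbeats stop carrying 20 messages of context on every check.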
Total input tokens = Haiku (900K) + Sonnet (315K), roughly 1.2M tokens ($2)
Output = 100 responses x 150 tokens = 15K on Sonnet (~$0.22)
Total = about $2.25 per day.
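The Haiku-decides, Sonnet-replies split can be sketched as a tiny router. The decision fields follow the format in the list above; `escalate_to_sonnet` is a hypothetical placeholder for the actual model call:

```python
# Sketch of the triage contract: Haiku emits a tiny JSON decision, and only
# messages marked reply=true ever reach Sonnet. Field names are from the post;
# escalate_to_sonnet is an illustrative stand-in for the real API call.
import json

def route(decision_json: str):
    d = json.loads(decision_json)
    if not d.get("reply"):
        return None                      # Haiku-only path: ~100 output tokens
    if d.get("promise"):
        # The agent promised a slower artifact (photo/document/voice):
        # acknowledge now, let the Sonnet-side worker deliver later.
        return d.get("immediate_reply", "on it")
    return escalate_to_sonnet(d)         # the full 20-message context goes here

def escalate_to_sonnet(d: dict) -> str:
    # Placeholder: in the real setup this is where the ~315K/day of
    # Sonnet input tokens get spent.
    return f"[sonnet reply to message {d['reply_to']}]"
```

The point of the design is that the cheap model gates the expensive one: Sonnet never sees the messages Haiku decides not to answer.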
Of course, the Telegram agent I have running has some intelligence: it tracks individual participants in a channel and sends and receives attachments (images, documents, and voice notes). I still rely on Sonnet for images and documents (10K tokens per message x 500 = 5M tokens a day), and voice notes cost nothing because a local Whisper and Voxtral setup handles TTS and STT for free.
But I've taken only the baselines for comparison. I might have missed some stuff, and I'm happy to stand corrected if anyone else has their numbers.
But the point is: no message queues, no buses, no complex protocols. For a personal build, it was worth it. For enterprise, maybe not. And the simple tweak to the architecture (from a real-time, multi-peer MCP host to collector-subscriber, which is what this pattern is called) means I'll avoid token jail for the foreseeable future and have more room for serious stuff with Opus.
Just like Jake's folders and files, it's low tech, but the patterns and foundational principles behind it are pretty solid and scale from single agent to multi-agent pretty well. Of course, Jake's recommended architecture has MCP in it for a very specific reason, and I don't question it; there are valid reasons for it. The moment multiple apps and agents are needed, MCP is probably the most productive option.