Your Agent Forgets. Here's How I Made Mine Stop.
Agents forget things across long coding sessions.
You decide something on Monday.
By Friday it's buried under 40,000 tokens of tool calls, file reads, and "thinking." You can't grep a session.
Re-reading it through an API is slow, expensive, and leaks every thought you had to a third party.
I wanted the durable facts.
Nothing else.
Here's what I built.
It runs locally, costs nothing, and nothing leaves the machine.
The Pattern
  1. Hook into the agent's Stop event. Claude Code gives you one. So does Cursor. So does any agent worth using.
  2. Track a byte offset per session. On each Stop, read only the new bytes since last time. No re-reading. No duplicate work.
  3. Pipe that delta through a tiny local model. I use gemma3:4b via Ollama. 3 GB. Runs on a laptop. Free.
  4. Ask it for ONE thing. Extract durable facts in a fixed line format. Format is a regex. Topics are a closed set. Anything off-format gets dropped.
  5. Append what survives to one markdown file. timeline.md. Append-only. Grep it. Tail it. Diff it.
That is it.
No vector DB. No RAG. No summarization pass. No API key.
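The five steps above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the repo's actual code: the state-file paths, the topic set, and the line format are all made-up stand-ins.

```python
import json
import re
import subprocess
from pathlib import Path

OFFSETS = Path.home() / ".observer" / "offsets.json"   # hypothetical state file
TIMELINE = Path.home() / ".observer" / "timeline.md"   # the append-only log

# Format is a regex; topics are a closed set. Off-format lines get dropped.
FACT_RE = re.compile(r"^- \[(decision|bug|config|todo)\] .{1,80}$")

def on_stop(session_id: str, transcript: Path) -> None:
    """Called from the agent's Stop hook. Reads only the new bytes."""
    OFFSETS.parent.mkdir(parents=True, exist_ok=True)
    offsets = json.loads(OFFSETS.read_text()) if OFFSETS.exists() else {}
    start = offsets.get(session_id, 0)

    with transcript.open("rb") as f:       # seek past everything already seen
        f.seek(start)
        delta = f.read()
    offsets[session_id] = start + len(delta)
    OFFSETS.write_text(json.dumps(offsets))
    if not delta:
        return

    prompt = (
        "Read this transcript delta. Output 0 to 10 lines matching\n"
        "'- [topic] past-tense fact' with topic in {decision,bug,config,todo},\n"
        "or output NONE.\n\n" + delta.decode("utf-8", errors="replace")
    )
    out = subprocess.run(
        ["ollama", "run", "gemma3:4b"],    # the replaceable part
        input=prompt, capture_output=True, text=True,
    ).stdout

    survivors = [line for line in out.splitlines() if FACT_RE.match(line)]
    if survivors:
        with TIMELINE.open("a") as f:      # append-only, grep-able, diff-able
            f.write("\n".join(survivors) + "\n")
```

The offset file is the only mutable state; everything else is append-only.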
The Insight
Small local models are shockingly good at structured extraction when the rules are narrow.
They are terrible at "be helpful, write a summary."
Give them one job:
"Read this delta. Output 0 to 10 lines matching this exact format, or output NONE."
That is a job a 4B parameter model can do.
It cannot do "summarize my week" well.
It cannot do "what did I decide?" well.
It can do "match this regex or output nothing."
The prompt does all the work.
The model is the cheapest, most replaceable part of the system.
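That one narrow job is small enough to write down as a validator. The line format here is illustrative; the real contract is whatever your regex says it is.

```python
import re

# Closed topic set, fixed shape, hard length cap.
LINE_RE = re.compile(r"^- \[(decision|bug|config|todo)\] .{1,80}$")

def filter_facts(model_output: str, max_lines: int = 10) -> list[str]:
    """Keep only lines matching the fixed format; drop everything else."""
    if model_output.strip() == "NONE":
        return []
    good = [line for line in model_output.splitlines() if LINE_RE.match(line)]
    return good[:max_lines]                # cap at 10, as the prompt promised
```

Chatty preambles, apologies, and "helpful" summaries all fail the regex and vanish. The model cannot misbehave in a way the validator lets through.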
The Principle
The intelligence is not the model.
The intelligence is the schema you force the model into.
A strict regex.
A closed topic set.
A required "reason" field under 80 characters.
Past tense only.
No inference beyond what is in the transcript.
Sanitization canary tests that fail the build if private terms leak into the prompt.
None of that is AI.
All of that is engineering.
The model is the text-in, text-out primitive in the middle.
You can swap gemma3 for any future 4B model and the system keeps working.
That is the whole point.
Stop prompting. Start defining outcomes.
The Repo
I shipped a sanitized public version of this pattern today.
One file to read:
observer/core.py. Everything else is wiring.
Clone it.
Install Ollama.
Pull gemma3:4b.
Copy one hook config into ~/.claude/settings.json.
Start a session.
Under 5 minutes from clone to first observation.
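The hook config is roughly this shape (the command path is illustrative; check the repo's actual config for the exact entry):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 /path/to/observer/core.py"
          }
        ]
      }
    ]
  }
}
```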
Clief Notes
skool.com/cliefnotes
Jake Van Clief, giving you the Cliff notes on the new AI age.