Your Agent Forgets. Here's How I Made Mine Stop.
Agents forget things across long coding sessions.
You decide something on Monday.
By Friday it's buried under 40,000 tokens of tool calls, file reads, and "thinking." You can't grep a session.
Re-reading it through an API is slow, expensive, and leaks every thought you had to a third party.
I wanted the durable facts.
Nothing else.
Here's what I built.
It runs locally, costs nothing, and nothing leaves the machine.
The Pattern
  1. Hook into the agent's Stop event. Claude Code gives you one. So does Cursor. So does any agent worth using.
  2. Track a byte offset per session. On each Stop, read only the new bytes since last time. No re-reading. No duplicate work.
  3. Pipe that delta through a tiny local model. I use gemma3:4b via Ollama. 3 GB. Runs on a laptop. Free.
  4. Ask it for ONE thing. Extract durable facts in a fixed line format. Format is a regex. Topics are a closed set. Anything off-format gets dropped.
  5. Append what survives to one markdown file. timeline.md. Append-only. Grep it. Tail it. Diff it.
That is it.
No vector DB. No RAG. No summarization pass. No API key.
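The five steps above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the repo's actual code: the state-file paths, the topic set, and the line format are all made-up stand-ins.

```python
import json
import re
import subprocess
from pathlib import Path

OFFSETS = Path.home() / ".observer" / "offsets.json"   # hypothetical state file
TIMELINE = Path.home() / ".observer" / "timeline.md"   # the append-only log

# Format is a regex; topics are a closed set. Off-format lines get dropped.
FACT_RE = re.compile(r"^- \[(decision|bug|config|todo)\] .{1,80}$")

def on_stop(session_id: str, transcript: Path) -> None:
    """Called from the agent's Stop hook. Reads only the new bytes."""
    OFFSETS.parent.mkdir(parents=True, exist_ok=True)
    offsets = json.loads(OFFSETS.read_text()) if OFFSETS.exists() else {}
    start = offsets.get(session_id, 0)

    with transcript.open("rb") as f:       # seek past everything already seen
        f.seek(start)
        delta = f.read()
    offsets[session_id] = start + len(delta)
    OFFSETS.write_text(json.dumps(offsets))
    if not delta:
        return

    prompt = (
        "Read this transcript delta. Output 0 to 10 lines matching\n"
        "'- [topic] past-tense fact' with topic in {decision,bug,config,todo},\n"
        "or output NONE.\n\n" + delta.decode("utf-8", errors="replace")
    )
    out = subprocess.run(
        ["ollama", "run", "gemma3:4b"],    # the replaceable part
        input=prompt, capture_output=True, text=True,
    ).stdout

    survivors = [line for line in out.splitlines() if FACT_RE.match(line)]
    if survivors:
        with TIMELINE.open("a") as f:      # append-only, grep-able, diff-able
            f.write("\n".join(survivors) + "\n")
```

The offset file is the only mutable state; everything else is append-only.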
The Insight
Small local models are shockingly good at structured extraction when the rules are narrow.
They are terrible at "be helpful, write a summary."
Give them one job:
"Read this delta. Output 0 to 10 lines matching this exact format, or output NONE."
That is a job a 4B parameter model can do.
It cannot do "summarize my week" well.
It cannot do "what did I decide?" well.
It can do "match this regex or output nothing."
The prompt does all the work.
The model is the cheapest, most replaceable part of the system.
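That one narrow job is small enough to write down as a validator. The line format here is illustrative; the real contract is whatever your regex says it is.

```python
import re

# Closed topic set, fixed shape, hard length cap.
LINE_RE = re.compile(r"^- \[(decision|bug|config|todo)\] .{1,80}$")

def filter_facts(model_output: str, max_lines: int = 10) -> list[str]:
    """Keep only lines matching the fixed format; drop everything else."""
    if model_output.strip() == "NONE":
        return []
    good = [line for line in model_output.splitlines() if LINE_RE.match(line)]
    return good[:max_lines]                # cap at 10, as the prompt promised
```

Chatty preambles, apologies, and "helpful" summaries all fail the regex and vanish. The model cannot misbehave in a way the validator lets through.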
The Principle
The intelligence is not the model.
The intelligence is the schema you force the model into.
A strict regex.
A closed topic set.
A required "reason" field under 80 characters.
Past tense only.
No inference beyond what is in the transcript.
Sanitization canary tests that fail the build if private terms leak into the prompt.
None of that is AI.
All of that is engineering.
The model is the text-in, text-out primitive in the middle.
You can swap gemma3 for any future 4B model and the system keeps working.
That is the whole point.
Stop prompting. Start defining outcomes.
The Repo
I shipped a sanitized public version of this pattern today.
One file to read:
observer/core.py. Everything else is wiring.
Clone it.
Install Ollama.
Pull gemma3:4b.
Copy one hook config into ~/.claude/settings.json.
Start a session.
Under 5 minutes from clone to first observation.
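The hook config is roughly this shape (the command path is illustrative; check the repo's actual config for the exact entry):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 /path/to/observer/core.py"
          }
        ]
      }
    ]
  }
}
```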
Clief Notes
skool.com/cliefnotes
Jake Van Clief, giving you the Cliff notes on the new AI age.