How I measure my Session Health
My Live Health Stream is a real-time telemetry dashboard that monitors the "cognitive load" and operational stability of autonomous AI agents across different sessions and models (e.g., Gemini 3.1 Pro, Claude Sonnet 4.6). I use both AntiGravity and Claude Code. Both write to my Obsidian vault, and the dashboard is an HTML/CSS view of that data, fed by Python scripts that aggregate my JSON data into the metrics I want to measure.
Tool Error Rate % (Red Line)
  • What it is: The frequency at which the agent is failing to execute its tools properly (e.g., syntax errors, bad file paths).
  • How it’s calculated: (Failed Tool Calls / Total Tool Calls in the session) * 100. Notice on the graph how the red line spikes almost immediately after the green saturation line peaks.
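As a rough illustration, the formula above could be computed like this in Python. The function name and the zero-call guard are assumptions made for the sketch, not the dashboard's actual code.

```python
def tool_error_rate(failed_calls: int, total_calls: int) -> float:
    """Failed tool calls as a percentage of all tool calls in one session."""
    if total_calls == 0:
        # Assumption: a session with no tool calls reports 0% rather than erroring.
        return 0.0
    return 100 * failed_calls / total_calls


print(tool_error_rate(3, 40))  # 7.5
```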
Context Saturation % (Green Line)
  • What it is: A measure of how "full" the agent's brain (its context window) is getting.
  • How it’s calculated: (Total Input Tokens / Model's Maximum Context Window) * 100. If this hits 100%, the agent starts forgetting earlier instructions (context eviction).
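A small Python sketch of the saturation formula, under stated assumptions: the function name is hypothetical, and the 200,000-token window in the example is a placeholder, not the limit of any particular model. Because the token counts are estimates, the result is deliberately left uncapped and can exceed 100.

```python
def context_saturation(input_tokens: int, max_context_tokens: int) -> float:
    """Total input tokens as a percentage of the model's maximum context window."""
    if max_context_tokens <= 0:
        raise ValueError("max_context_tokens must be positive")
    return 100 * input_tokens / max_context_tokens


# Placeholder window size, chosen only for the example.
print(context_saturation(150_000, 200_000))  # 75.0
```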
Lessons Learned (Yellow Bars)
  • What it is: The number of times an agent gets caught in a recursive failure loop (trying the exact same failed action over and over).
  • How it’s calculated: Programmatically tracked by scanning the execution log for identical, consecutive failed tool calls within a single session. I also keep a "Lessons Learned" log that my agents read on boot. If they **** up, they have to record a lesson there and update the .json file for that session, which is what I build this graph from.
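One way the loop detection described above might look in Python. The log shape (a list of dicts with "tool", "args", and "ok" keys) and the two-repeat threshold are assumptions for illustration; the real execution log may be structured differently.

```python
import json


def count_failure_loops(calls: list[dict], min_repeats: int = 2) -> int:
    """Count runs of identical, consecutive failed tool calls in one session.

    Each run is counted once, at the moment it reaches `min_repeats` failures.
    """
    loops = 0
    run_length = 0
    previous_key = None
    for call in calls:
        if call["ok"]:
            # A successful call breaks any failure streak.
            previous_key, run_length = None, 0
            continue
        # Two calls are "identical" if the tool and its arguments match exactly.
        key = (call["tool"], json.dumps(call["args"], sort_keys=True))
        if key == previous_key:
            run_length += 1
        else:
            previous_key, run_length = key, 1
        if run_length == min_repeats:
            loops += 1
    return loops


# Hypothetical session log: three identical failures, a success, then a new failure.
session_log = [
    {"tool": "read_file", "args": {"path": "notes.md"}, "ok": False},
    {"tool": "read_file", "args": {"path": "notes.md"}, "ok": False},
    {"tool": "read_file", "args": {"path": "notes.md"}, "ok": False},
    {"tool": "read_file", "args": {"path": "Notes.md"}, "ok": True},
    {"tool": "run", "args": {"cmd": "ls"}, "ok": False},
]
print(count_failure_loops(session_log))  # 1
```

Counting a run once, rather than once per repeated call, keeps a single long loop from dominating the bar for that session.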
Input/Output Tokens (Blue/Purple Bars)
Since token counts are not exposed, I have to estimate them, assuming roughly 4 bytes per token:
"input_tokens": input_bytes // 4,
"output_tokens": output_bytes // 4,
One of my core philosophies is No Vanity Metrics. Another is Things That Are Measured Improve. And ultimately: The Most Important Things Cannot Be Measured.
Peder Halseide