Eval Genius
What counts as education vs. distraction?
Over the past two weeks I got into evaluations.
The rate of Agent improvement is going to matter.
The ease of human feedback will be key to that rate.
Humans hate evaluating (it's the flossing of AI work). So we've gotta make it simpler.
So I created an eval framework.
Built a Slack app using their new AI SDK.
Learned how to attach all of that to an auth provider.
Built out the back end enough to send some tests through.
Called it Eval Genius, made a bit of a logo, made a business canvas.
Then found Humanloop and Weights & Biases, which are really good frameworks.
So... was it all a waste?
Nope. I learned how to build Git repositories.
Slack backends.
Connect all the things: n8n, Notion, ElevenLabs, blobs, enumerators, unfurls, Slack Block Kit.
Ultimately, I'll go with either W&B or Humanloop and I'll need to pay a dev to actually build it right.
I built almost all of it on v0.dev with Vercel & Supabase.
Tim Lockie