What should I use for the best performance-to-price ratio?

I am looking for the most cost-effective way to use SOTA LLMs. I am debating between standard subscriptions (Claude Pro, Cursor, Gemini Advanced) and a custom setup: hosting an agent (e.g., Hermes-Agent) on a VPS using OpenRouter API keys. This modular approach would let me optimize cost per token by routing complex queries to high-end models and using cheaper models for routine tasks. Since I don't have the hardware for local inference, is this API-centric approach more efficient than a flat-rate subscription in terms of performance-to-price ratio?
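To make the comparison concrete, here is a rough sketch of how I would estimate the monthly cost of the API setup. Every number in it is a placeholder I made up for illustration (token prices, request sizes, the share of "hard" queries, the VPS fee), not a real quote from OpenRouter or any provider, and the function names are my own.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # Prices are placeholders in USD per 1M tokens, not real quotes.
    price_in: float
    price_out: float


def cost_per_request(model: Model, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD of a single request at per-million-token pricing."""
    return (tokens_in * model.price_in + tokens_out * model.price_out) / 1_000_000


def monthly_api_cost(
    requests_per_day: float,
    hard_share: float,   # fraction of requests routed to the premium model
    premium: Model,
    cheap: Model,
    tokens_in: int,
    tokens_out: int,
    vps_fee: float,      # fixed monthly VPS cost in USD
    days: int = 30,
) -> float:
    """Fixed VPS fee plus the blended per-request cost over a month."""
    blended = (
        hard_share * cost_per_request(premium, tokens_in, tokens_out)
        + (1 - hard_share) * cost_per_request(cheap, tokens_in, tokens_out)
    )
    return vps_fee + requests_per_day * days * blended


if __name__ == "__main__":
    premium = Model(price_in=3.0, price_out=15.0)  # hypothetical high-end model
    cheap = Model(price_in=0.3, price_out=1.2)     # hypothetical budget model
    total = monthly_api_cost(
        requests_per_day=50, hard_share=0.2,
        premium=premium, cheap=cheap,
        tokens_in=2000, tokens_out=500, vps_fee=5.0,
    )
    print(f"Estimated monthly API cost: ${total:.2f}")
```

The idea would be to plug in current prices and my own usage, then compare the result against the monthly price of a subscription. Agent workloads tend to resend a lot of context on every call, so `tokens_in` is probably the number to estimate most carefully.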

What are your thoughts? Is it cheaper and better to pay per token, or should I just buy Claude Pro?
