I built an AI agent that reads my AWS bill and tells me exactly what to cut
AWS shipped its own FinOps Agent last week (June 9, public preview). It's genuinely good: plain-English cost investigations, anomaly root-cause, scheduled reports. But you don't have to wait for the preview, and you don't have to hand your entire cost dataset to a managed black box. You can build a leaner version yourself in an afternoon, and you'll understand every line of it.

Here's the core idea. Your AWS Cost and Usage data is just numbers. LLMs are great at turning structured numbers into prioritized, plain-English actions, IF you feed them the right slice.

The pattern is four stages.

Stage one, Collect: pull the last 30 to 60 days of cost data from the Cost Explorer API (boto3, get_cost_and_usage), grouped by SERVICE and by linked account, plus rightsizing and idle-resource signals from Compute Optimizer and Cost Optimization Hub.

Stage two, Reduce: compress that into a compact JSON summary of top movers, week-over-week deltas, and biggest absolute spend. This is the step that matters most.

Stage three, Reason: hand THAT to Claude or GPT with a tight prompt. "You are a FinOps analyst. Here is the cost summary. Return the top 5 actions ranked by dollars saved per month, each with the exact AWS steps and the risk."

Stage four, Deliver: render the ranked report to Slack, a PDF, or a ticket.

The magic isn't the model. It's the data engineering before the model. Feed it raw billing CSVs and you get mush. Feed it a clean "top 10 movers plus deltas" summary and you get an action list a senior engineer would write.

Three things I learned the hard way.

First, never paste raw account IDs or ARNs into a public LLM. Mask them; FinOps data is sensitive.

Second, anchor the model with thresholds: tell it to flag anything that grew over 20% week-over-week or any single idle resource over $200 a month. Without explicit thresholds, it hedges.

Third, make it output a diff, not a dashboard. "What changed and what do I do" beats "here are 40 charts" every time. That's literally why AWS's own agent emphasizes investigation summaries over more graphs.
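Since the Reduce stage carries most of the weight, here is a minimal sketch of what it could look like in Python. It assumes the input is the ResultsByTime list returned by a daily-granularity get_cost_and_usage call with a single GroupBy and the UnblendedCost metric. The function names (mask_account_id, weekly_totals, reduce_costs), the salt, and the output field names are my own illustrative choices, not part of any AWS API. The 20% threshold comes from the rule of thumb above; the idle-resource check is left out because it needs Compute Optimizer data.

```python
import hashlib
from collections import defaultdict


def mask_account_id(account_id, salt="rotate-me"):
    """Swap an account ID for a stable pseudonym before it reaches an LLM.

    The salt value is a placeholder; keep the real one out of source control.
    """
    digest = hashlib.sha256((salt + account_id).encode()).hexdigest()
    return "acct-" + digest[:8]


def weekly_totals(results_by_time, key_fn=lambda k: k, week_days=7):
    """Sum daily cost per group into [previous_week, latest_week] buckets."""
    days = results_by_time[-2 * week_days:]
    totals = defaultdict(lambda: [0.0, 0.0])
    for i, day in enumerate(days):
        # The last `week_days` entries form the latest week.
        bucket = 0 if i < len(days) - week_days else 1
        for group in day.get("Groups", []):
            key = key_fn(group["Keys"][0])
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[key][bucket] += amount
    return totals


def reduce_costs(results_by_time, key_fn=lambda k: k, top_n=10,
                 growth_threshold=0.20):
    """Build the compact 'top movers plus deltas' summary for the prompt."""
    movers = []
    for key, (prev, curr) in weekly_totals(results_by_time, key_fn).items():
        delta = curr - prev
        pct = delta / prev if prev > 0 else None  # new spend has no baseline
        movers.append({
            "group": key,
            "prev_week_usd": round(prev, 2),
            "last_week_usd": round(curr, 2),
            "delta_usd": round(delta, 2),
            "wow_pct": None if pct is None else round(pct * 100, 1),
            "flag": pct is not None and pct > growth_threshold,
        })
    # Largest absolute change first, so the model sees the big movers.
    movers.sort(key=lambda m: abs(m["delta_usd"]), reverse=True)
    return {"top_movers": movers[:top_n]}
```

When the call is grouped by linked account instead of SERVICE, passing key_fn=mask_account_id keeps raw account IDs out of the summary. Dump the result with json.dumps and it becomes the "cost summary" in the Stage three prompt.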
Your AI inference cluster is probably burning 60% of its GPU budget on idle silicon. Here's the fix.
Quick gut check: open your Kubernetes cluster right now and run kubectl describe node on any GPU node. Look at the nvidia.com/gpu requests vs. actual utilization in your monitoring. If you're like most teams I audit, you'll find pods that reserve a full A100 or L4 and then sit at 8-15% GPU utilization all day. That's not a tuning problem. That's money on fire.

Here's why it happens. Kubernetes treats nvidia.com/gpu as a countable, non-divisible resource by default. Request 1 and you get the whole card, even if your model only needs a sliver of it. Most inference models (anything under ~7B params, embeddings, rerankers, OCR, classic CV) don't come close to saturating a modern GPU. So you end up with a 1:1 pod-to-GPU mapping and a fleet that's mostly idle.

Three levers fix the bulk of this, in order of effort:

1) GPU time-slicing (NVIDIA device plugin). Lets multiple pods share one physical GPU by oversubscribing time. Zero hardware requirement, works on almost any card, configured with a single ConfigMap. Best for bursty, latency-tolerant inference. Downside: no memory isolation, a leaky pod can OOM its neighbors.

2) MIG (Multi-Instance GPU) on A100/H100/H200. Hardware-partitions one GPU into up to 7 isolated instances, each with dedicated memory and compute. Real isolation, predictable performance. Best for production multi-tenant inference. Downside: only on data-center GPUs, fixed partition profiles.

3) Right-size the request, then bin-pack onto Spot. Once pods share GPUs cleanly, schedule them onto spot/preemptible GPU nodes with proper PodDisruptionBudgets and a fallback node pool. This is where the 60-80% savings actually land.

Real example from a recent audit: a team running 12 inference services on 12 dedicated L4s. After time-slicing the latency-tolerant services 4:1 and moving them to spot, they dropped to 4 GPUs with a 1-GPU on-demand safety pool. ~65% monthly GPU cost cut, no SLA regression.
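To get a rough feel for how many GPUs a fleet needs once latency-tolerant services share cards, a back-of-the-envelope packing estimate helps. The sketch below is a hypothetical planner, not anything Kubernetes or the NVIDIA device plugin provides: the Service class, plan_gpus, the 4-replica limit, and the 80% utilization cap are all assumptions for illustration. It models compute utilization only and ignores GPU memory, which time-slicing does not isolate, so treat the result as an upper bound on how tightly you can pack.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    peak_gpu_util: float    # fraction of one GPU at peak, 0.0 to 1.0
    latency_tolerant: bool  # only these are candidates for time-slicing


def plan_gpus(services, replicas_per_gpu=4, util_cap=0.8):
    """First-fit-decreasing packing.

    Returns a list of GPUs, each represented as a list of service names.
    Latency-sensitive services keep a dedicated GPU; latency-tolerant ones
    share a GPU until it hits the replica limit or the utilization cap.
    """
    dedicated = [[s.name] for s in services if not s.latency_tolerant]
    shared = sorted(
        (s for s in services if s.latency_tolerant),
        key=lambda s: s.peak_gpu_util,
        reverse=True,
    )

    bins = []  # each bin tracks summed utilization and member names
    for s in shared:
        for b in bins:
            fits_count = len(b["names"]) < replicas_per_gpu
            fits_util = b["util"] + s.peak_gpu_util <= util_cap
            if fits_count and fits_util:
                b["util"] += s.peak_gpu_util
                b["names"].append(s.name)
                break
        else:
            # No existing GPU has room, so open a new one.
            bins.append({"util": s.peak_gpu_util, "names": [s.name]})

    return dedicated + [b["names"] for b in bins]


if __name__ == "__main__":
    # Hypothetical fleet: eight light services plus two latency-sensitive ones.
    fleet = [Service(f"embed-{i}", 0.15, True) for i in range(8)]
    fleet += [Service("chat-a", 0.60, False), Service("chat-b", 0.55, False)]
    plan = plan_gpus(fleet)
    print(f"{len(fleet)} services fit on {len(plan)} GPUs")
```

The replicas_per_gpu value would need to match whatever replica count you set in the device plugin's time-slicing ConfigMap; the planner does not read or write that config.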
The Cloud Bill Nobody Wants to Open
Let’s be honest. Most engineers love building things. Very few enjoy opening the monthly cloud bill.

At first, cloud costs seem manageable. A few virtual machines, some storage, a database or two. Then the project grows. More environments. More teams. More services. More “temporary” resources that somehow survive for years. And suddenly everyone is asking the same question: “Why are we spending so much?”

The Funny Thing About Cloud Costs

The biggest cloud bills rarely come from one huge mistake. They come from hundreds of small decisions. A larger VM because it’s easier. A test environment left running over the weekend. Snapshots that nobody reviews. A Kubernetes cluster sized for traffic that never arrived. Individually, these decisions seem harmless. Together, they become a budget problem.

FinOps Isn’t About Spending Less

This is where many people get FinOps wrong. FinOps isn’t about cutting costs at all costs. It’s about understanding where money creates value and where it creates waste. Nobody complains about spending money on infrastructure that helps the business grow. People complain when they’re paying for things nobody uses. That’s a big difference.

Engineers Are Part of the Solution

For years, cloud costs were often treated as a finance problem. Today, that’s changing. The engineers designing the architecture are often in the best position to optimize it. A small design decision can save thousands of dollars per year. A good tagging strategy can reveal hidden waste. A simple automation can shut down non-production environments every night. The best cost optimization stories often start with engineers asking: “Do we actually need this?”

Where AI Comes In

This is one of the reasons I’m excited about combining AI and cloud engineering. AI can already help:
- analyze cloud bills
- identify unusual spending patterns
- recommend rightsizing opportunities
- explain cost spikes
- generate optimization reports

Instead of spending hours digging through dashboards, engineers can focus on making decisions.
What this community is (and what it isn't)
If you landed here, you're probably a cloud engineer, DevOps engineer, sysadmin, MSP, or IT consultant dealing with one or more of these problems:
- Your cloud bill is rising and you don't know exactly where the waste is
- Someone asked you for an IAM audit and you're building it from scratch
- You need to deliver a security review to a client and have no template
- Your Kubernetes cluster costs more than it should and you're not sure why
- You're writing the same Terraform review checklist for the third time
- You need a script that does the thing, and Stack Overflow isn't cutting it

That's what this community is for.

What gets published here:
Every week you get usable technical assets. Scripts you can run. Checklists you can hand to a client. Templates you can drop into a report. Queries you can paste into CloudWatch, Azure Monitor, or BigQuery. Runbooks for the incidents you don't want to debug from scratch at 2am. AWS. Azure. Google Cloud. All three, with equal depth.

What this is not:
- Not a course. No videos to watch.
- Not a motivational community.
- Not a news digest.
- Not beginner theory.

Who it's for:
Cloud engineers and DevOps teams already in the cloud. MSPs and consultants who need client-ready deliverables without building them from scratch each time.

To get started:
1. Introduce yourself below: what cloud(s) do you work on, what's your biggest headache right now?
2. Check the Classroom, where the first assets are already posted
3. Come back Thursday, when the weekly Q&A thread opens

Richard
AI for Cloud Engineers
skool.com/cloud-cost-optimization-3746
Automate your cloud work with AI. GCP, Azure, VMware. Save hours every week with real workflows.