Reducing Infrastructure Costs by 40% on Kubernetes
February 12, 2024
The wake-up call
Our monthly AWS bill hit six figures before we took cost optimization seriously. The cloud makes it easy to spend money — every engineer can provision resources with a click. But bringing costs down requires discipline and the right tooling.
Spot instances
The single biggest cost saver: use spot instances for non-critical workloads. We run 70% of our cluster on spot instances with a fallback to on-demand:
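affinity:
  nodeAffinity:
    # Soft preference for spot; on-demand nodes stay eligible as the fallback
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values: ["SPOT"]
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: eks.amazonaws.com/capacityType
    whenUnsatisfiable: ScheduleAnyway
Resource right-sizing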
Most teams over-provision resources by 2-3x. Run the Vertical Pod Autoscaler (VPA) in recommendation mode (updateMode: "Off"), where it only reports suggested values and never evicts pods, to base resource requests on observed usage.
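As a sketch of what recommendation mode means in practice, a VPA object with updates disabled could look like the following. The names my-service-vpa and my-service are placeholders, and the target is assumed to be a Deployment; adjust both to the workload being measured.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa        # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment          # assumed workload type
    name: my-service          # hypothetical Deployment name
  updatePolicy:
    updateMode: "Off"         # publish recommendations only; never evict or resize pods
```

The object can only be applied once the VPA components are installed in the cluster. Recommendations generally become more useful after the workload has run through a representative traffic cycle.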
# Install VPA
kubectl apply -f https://github.com/harjjotsinghh
# Get recommendations
kubectl describe vpa my-service-vpa
Cluster autoscaling
Don't run a fixed-size cluster. Use the Cluster Autoscaler to scale nodes up and down based on pod resource requests. This alone saved us 15% on compute costs.
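For illustration, the cost-relevant behavior of the Cluster Autoscaler is mostly controlled through its command-line flags. The fragment below is a sketch of the container command in its Deployment, assuming an AWS cluster with Auto Scaling group auto-discovery. The cluster name my-cluster is a placeholder, and the threshold values shown are the upstream defaults rather than settings taken from our cluster.

```yaml
# Fragment of the cluster-autoscaler container spec (not a complete manifest)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # Discover node groups by ASG tag; "my-cluster" is a placeholder cluster name
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
  # Pick the node group that leaves the least idle CPU/memory after scale-up
  - --expander=least-waste
  - --balance-similar-node-groups
  # A node becomes a scale-down candidate when its requested resources fall below 50%
  - --scale-down-utilization-threshold=0.5
  # ...and it has stayed unneeded for this long
  - --scale-down-unneeded-time=10m
```

Because these decisions are driven by pod resource requests rather than live usage, inflated requests make the autoscaler keep more nodes than the workload needs, so right-sizing and autoscaling work best together.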
Waste elimination
- Remove unused load balancers (they cost about $20/month each just for existing)
- Delete untagged resources (they're invisible in cost reports)
- Use S3 Intelligent-Tiering for infrequently accessed data
- Right-size RDS instances (we were running db.r5.4xlarge when db.r5.xlarge was sufficient)
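The S3 item above can be expressed as a lifecycle rule. The following is a minimal sketch that assumes the bucket is managed with CloudFormation, which may not match your setup. The logical ID ArchiveBucket, the rule ID, and the 30-day threshold are illustrative choices.

```yaml
Resources:
  ArchiveBucket:                      # hypothetical logical ID
    Type: AWS::S3::Bucket
    Properties:
      LifecycleConfiguration:
        Rules:
          - Id: MoveToIntelligentTiering
            Status: Enabled
            Transitions:
              # Move objects to Intelligent-Tiering after 30 days (assumed threshold)
              - StorageClass: INTELLIGENT_TIERING
                TransitionInDays: 30
```

Intelligent-Tiering then moves objects between access tiers based on observed access patterns, which suits data whose access frequency is hard to predict.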
The result
After six months of optimization work, our monthly infrastructure bill dropped by 40% — from six figures to a much more reasonable number. And we maintained the same reliability SLA.