Key lessons & tech stack
- Move critical controllers (like Cluster Autoscaler) to IRSA or Pod Identity before changing AMIs.
- Separate concerns: IRSA controls which AWS APIs a pod can call; Kubernetes RBAC controls what the pod can do inside the cluster.
- Treat AMI upgrades as application changes: test in non-production with cordon/drain and synthetic scale-up/scale-down runs.
Why IRSA here:
For this incident we used IRSA as the fastest safe fix: the cluster already had an OIDC provider,
the Helm chart supported the “service account + annotation” pattern, and our AWS CDK stack had IRSA helpers.
Pod Identity stays on the roadmap for new clusters where we can design the model from day one.
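
For readers who have not seen the "service account + annotation" pattern, here is a minimal Python sketch of its two halves: the IAM trust policy that restricts a role to one service account, and the ServiceAccount manifest that points at that role. It is an illustration, not our actual CDK code. The function names, account ID, issuer URL, and role name are hypothetical placeholders, and the namespace and service account name are assumed values that depend on how the chart is installed.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Build an IAM trust policy that lets exactly one service account assume a role."""
    # IAM condition keys use the issuer URL without the scheme.
    issuer = oidc_issuer.split("://", 1)[-1]
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account pair may assume the role.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def service_account_manifest(name, namespace, role_arn):
    """Build the ServiceAccount that carries the IRSA role annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Hypothetical values for illustration only.
policy = irsa_trust_policy(
    account_id="111122223333",
    oidc_issuer="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    namespace="kube-system",
    service_account="cluster-autoscaler",
)
manifest = service_account_manifest(
    name="cluster-autoscaler",
    namespace="kube-system",
    role_arn="arn:aws:iam::111122223333:role/example-cluster-autoscaler",
)
print(json.dumps(policy, indent=2))
print(json.dumps(manifest, indent=2))
```

The `sub` condition is what keeps the two concerns apart: the role's AWS permissions apply only to pods running under that one service account, while anything the pod does against the Kubernetes API is still decided by RBAC.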
Tech stack: Amazon EKS, Amazon Linux 2 & Amazon Linux 2023, Kubernetes Cluster Autoscaler,
IAM Roles for Service Accounts (IRSA), Kubernetes RBAC, EKS OIDC provider, Terraform / AWS CDK (Python), Datadog.