I specialize in platform engineering and Kubernetes ecosystems — building the shared infrastructure, tooling, and standards that let engineering teams move fast and operate reliably at scale.
Over the years, I have worked across e-commerce and data-intensive platforms, shaping platform architecture and strategy, driving engineering standardization, and helping organizations navigate cloud adoption, large-scale migrations, and operational maturity.
I focus on turning complex, loosely defined problems into scalable, automated, production-ready platforms — and on influencing how engineering organizations build and operate those platforms long-term.
Here are some highlights of my profile:
Senior Cloud Engineer
Designed and evolved a Kubernetes-based cloud platform for a large European e-commerce organisation, enabling teams to build and operate services at scale.
Senior Site Reliability Engineer
Drove reliability engineering and platform modernisation for a high-traffic auctions and automotive platform, improving system stability and engineering practices org-wide.
Senior DevOps Engineer / Associate Technical Lead – DevOps
Led platform and DevOps engineering for large-scale foodservice and supply chain systems, driving architectural standards and mentoring engineers across multiple teams.
DevOps Engineer / Senior DevOps Engineer
Designed and operated cloud infrastructure for analytics and machine learning products serving customers across multiple industries, growing from engineer to senior ownership.
Systems Engineer
Supported mission-critical enterprise applications in the travel and hospitality space, maintaining high availability and contributing to release engineering improvements.
Associate Application Support Engineer
Started my career supporting capital markets and trading platforms, working closely with customers and engineering teams.
A few examples of real projects where I designed, debugged, and improved cloud platforms with measurable impact.
Role: Senior Cloud Engineer for a high-traffic e-commerce EKS platform.
During our migration from Amazon Linux 2 (AL2) to Amazon Linux 2023 (AL2023), the EKS Cluster Autoscaler
suddenly stopped scaling: pods were stuck in Pending and logs showed
“Failed to get nodes from apiserver: Unauthorized”. The tighter metadata behaviour on AL2023
broke our previous assumption that the autoscaler could “borrow” the node IAM role.
Impact: Restored safe, predictable autoscaling on AL2023 in non-production before touching production, and created a reusable IRSA + RBAC pattern for other controllers (Cluster Autoscaler, ExternalDNS, load balancer controllers) across the organisation.
Why IRSA here: For this incident we used IRSA as the fastest safe fix: the cluster already had an OIDC provider, the Helm chart supported the “service account + annotation” pattern, and our AWS CDK stack had IRSA helpers. Pod Identity stays on the roadmap for new clusters where we can design the model from day one.
Tech stack: AWS EKS, Amazon Linux 2 & Amazon Linux 2023, Kubernetes Cluster Autoscaler, IAM Roles for Service Accounts (IRSA), Kubernetes RBAC, EKS OIDC, Terraform / AWS CDK (Python), Datadog.
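To make the IRSA + RBAC pattern above more concrete, here is a minimal Python sketch of the two pieces IRSA ties together: the IAM trust policy scoped to a single Kubernetes service account, and the annotation placed on that service account. It is an illustration, not the code from the incident. The function names, account ID, OIDC issuer ID and role name are hypothetical placeholders; only the `sts:AssumeRoleWithWebIdentity` action, the `sub`/`aud` condition keys and the `eks.amazonaws.com/role-arn` annotation key are standard.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy that lets exactly one Kubernetes
    service account assume a role through the cluster's OIDC provider.

    oidc_provider is the issuer URL without the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to one namespace/service account pair.
                        f"{oidc_provider}:sub":
                            f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def service_account_annotation(account_id: str, role_name: str) -> dict:
    """Annotation that points the service account at its IAM role."""
    return {
        "eks.amazonaws.com/role-arn":
            f"arn:aws:iam::{account_id}:role/{role_name}"
    }


# Hypothetical placeholder values, for illustration only.
ACCOUNT_ID = "111122223333"
OIDC_PROVIDER = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE0123456789"

policy = irsa_trust_policy(ACCOUNT_ID, OIDC_PROVIDER,
                           "kube-system", "cluster-autoscaler")
annotation = service_account_annotation(ACCOUNT_ID, "cluster-autoscaler-irsa")

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
    print(json.dumps(annotation, indent=2))
```

Scoping the `sub` condition to one service account is what removes the need for a controller to rely on the node IAM role: each controller gets its own role, and the Kubernetes RBAC bindings for that service account are managed separately.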
Role: Senior Cloud & DevOps Engineer, leading Datadog governance for a multi-team engineering organisation.
Our Datadog logs setup “worked”, but ownership and costs were blurry. Dozens of teams shipped logs with inconsistent tags and ad-hoc indexes, making it hard to see who owned which volume, how long data stayed, and why costs kept creeping up.
team, costcenter, appgroup, env and retention.
Impact: Made log retention an explicit product team decision instead of a central bottleneck, improved cost transparency and paved the way for full IaC ownership of Datadog log indexes and enforcement rules.
index-retention-period-03, -07, -15, -30, -90 matching only fully tagged logs with allowed retention values.
LogsIndexManager module in Pulumi (Python) to manage indexes, routing rules and (optionally) index order and enforcement.
Tech stack: Datadog logs & monitors, tag-based routing and indexes,
Pulumi (pulumi-datadog), AWS workloads (EKS/Lambda/EC2), shared tagging model
for logs, metrics and traces across 70+ engineering teams.
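The routing rule described above (only fully tagged logs with an allowed retention value land in a retention index) can be sketched in plain Python. This models the decision only; it does not call Datadog or pulumi-datadog, and it is not the LogsIndexManager module itself. The function name is hypothetical, and the assumption that the retention tag carries a plain number of days is mine.

```python
from typing import Mapping, Optional

# Tag names and retention periods taken from the index naming above.
REQUIRED_TAGS = ("team", "costcenter", "appgroup", "env", "retention")
ALLOWED_RETENTION_DAYS = (3, 7, 15, 30, 90)


def index_for(tags: Mapping[str, str]) -> Optional[str]:
    """Return the retention index a log should be routed to, or None
    when the log is not fully tagged or asks for a disallowed retention.

    Assumption: the retention tag value is a number of days, e.g. "7".
    """
    # Every required tag must be present and non-empty.
    if any(not tags.get(name) for name in REQUIRED_TAGS):
        return None
    try:
        days = int(tags["retention"])
    except ValueError:
        return None
    if days not in ALLOWED_RETENTION_DAYS:
        return None
    return f"index-retention-period-{days:02d}"


# Hypothetical example tags, for illustration only.
fully_tagged = {
    "team": "checkout",
    "costcenter": "cc-1234",
    "appgroup": "payments",
    "env": "prod",
    "retention": "7",
}
missing_team = {k: v for k, v in fully_tagged.items() if k != "team"}
bad_retention = {**fully_tagged, "retention": "45"}

if __name__ == "__main__":
    for sample in (fully_tagged, missing_team, bad_retention):
        print(index_for(sample))
```

Returning `None` rather than a default index keeps the enforcement decision explicit: what happens to unrouted logs (a short-lived fallback index, exclusion, or an alert to the owning team) is a separate policy choice.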
Role: Platform/DevOps Engineer leading a registry migration from Google Artifact Registry / Container Registry to AWS ECR.
As part of a wider platform move to AWS, dozens of image repositories had to move from GCP (with hierarchical
paths like eu.gcr.io/project/app/service) to AWS ECR, which uses flatter repositories and tags.
A naive “pull & push” risked overwriting tags or losing the original structure.
project-app-service:1.2.3).
--limit) and a safe dry-run mode with clear logging.
Impact: Enabled teams to migrate image repositories without accidentally overwriting tags or losing traceability, and produced a reusable migration tool that can be shared or open-sourced for similar GCP → ECR moves.
Tech stack: Python, Docker CLI, Google Artifact Registry / Container Registry, AWS ECR, AWS CLI, bash automation and CI integration where needed. The tool encapsulates the GCP hierarchical naming model and the flatter AWS ECR repository/tag model so teams don’t have to think about it on every migration.
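As a rough illustration of the naming translation described above, the sketch below flattens a hierarchical GCP image reference into an ECR-style repository and tag, then builds a dry-run plan that never overwrites an existing tag. It is a simplified model under my own assumptions, not the migration tool itself: the function names, the action labels and the example images are hypothetical, and no Docker, GCP or AWS calls are made.

```python
from typing import Iterable, List, Optional, Set, Tuple


def ecr_target(gcp_image: str) -> str:
    """Flatten eu.gcr.io/project/app/service:1.2.3
    into project-app-service:1.2.3."""
    if "@" in gcp_image:
        raise ValueError(f"digest references are not handled: {gcp_image}")
    path, sep, tag = gcp_image.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a host:port.
    if not sep or not tag or "/" in tag:
        raise ValueError(f"image must carry an explicit tag: {gcp_image}")
    _host, _, repo_path = path.partition("/")
    if not repo_path:
        raise ValueError(f"image has no repository path: {gcp_image}")
    return f"{repo_path.replace('/', '-')}:{tag}"


def plan_migration(images: Iterable[str], existing: Set[str],
                   limit: Optional[int] = None) -> List[Tuple[str, str, str]]:
    """Dry run: decide what would happen to each image without copying.

    existing holds repo:tag pairs already present in the target registry.
    """
    plan = []
    planned: Set[str] = set()
    for source in list(images)[:limit]:
        target = ecr_target(source)
        if target in existing:
            action = "skip-exists"      # never overwrite an existing tag
        elif target in planned:
            action = "skip-collision"   # two sources flatten to one name
        else:
            action = "copy"
            planned.add(target)
        plan.append((source, target, action))
    return plan


# Hypothetical example inputs, for illustration only.
sources = [
    "eu.gcr.io/project/app/service:1.2.3",
    "eu.gcr.io/project/app/worker:2.0.0",
    "eu.gcr.io/project/app-worker:2.0.0",
    "eu.gcr.io/project/app/api:0.9.1",
]
plan = plan_migration(sources, existing={"project-app-service:1.2.3"}, limit=3)

if __name__ == "__main__":
    for source, target, action in plan:
        print(f"[dry-run] {action}: {source} -> {target}")
```

The collision check matters because flattening is lossy: `project/app/worker` and `project/app-worker` both map to `project-app-worker`, so a plan that surfaces the clash before any push is safer than discovering it afterwards.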
I write about Cloud, DevOps and platform engineering on Medium, and occasionally join podcasts to share lessons from real-world migrations and incidents.
A recent conversation where I talk about my work, platform engineering and lessons learned.
Completed primary and secondary education at Royal College, Colombo 7, Sri Lanka.
BSc (Hons) in Information Technology from Sri Lanka Institute of Information Technology (SLIIT).
MSc in Information Technology – Cyber Security from Sri Lanka Institute of Information Technology (SLIIT).