Over the last decade, I have worked with e-commerce, fintech and analytics teams, designing and operating
cloud platforms that power real-world products. Recently my work has spanned AWS, Azure and GCP,
using Terraform and AWS CDK in Python, Kubernetes where it fits, and observability tooling such as Datadog and Dynatrace.
I enjoy turning loosely defined problems into automated, reliable platforms – from greenfield designs to migrations,
cost optimisation and incident response.
Here are some highlights of my profile:
Senior Cloud Engineer
Designed and operated a Kubernetes-based cloud platform for a large European e-commerce organisation, hosting high-traffic web and backend services.
Site Reliability Engineer
Worked as an SRE for an online auctions and automotive platform, focusing on reliability, performance and modernising legacy systems.
Associate Technical Lead – DevOps
Led DevOps initiatives for a global payments and fintech organisation, building secure, scalable infrastructure for payment and merchant services.
Senior DevOps Engineer
Supported large-scale foodservice and supply chain systems, modernising infrastructure and improving deployment workflows.
DevOps Engineer / Senior DevOps Engineer
Built and operated cloud infrastructure for analytics and machine learning products used by customers across multiple industries.
Systems Engineer
Worked on enterprise systems in the travel and hospitality space, supporting mission-critical applications and infrastructure.
Associate Application Support Engineer
Started my career supporting capital markets and trading platforms, working closely with customers and engineering teams.
A few examples of real projects where I designed, debugged and improved cloud platforms with measurable impact.
Role: Senior Cloud Engineer for a high-traffic e-commerce EKS platform.
During our migration from Amazon Linux 2 (AL2) to Amazon Linux 2023 (AL2023), the EKS Cluster Autoscaler
suddenly stopped scaling: pods were stuck in Pending and logs showed
“Failed to get nodes from apiserver: Unauthorized”. The tighter instance metadata (IMDS) defaults on AL2023
broke our previous assumption that the autoscaler could “borrow” the node IAM role.
Impact: Restored safe, predictable autoscaling on AL2023 in non-production before touching production, and created a reusable IRSA + RBAC pattern for other controllers (Cluster Autoscaler, ExternalDNS, load balancer controllers) across the organisation.
Why IRSA here: For this incident we used IRSA as the fastest safe fix: the cluster already had an OIDC provider, the Helm chart supported the “service account + annotation” pattern, and our AWS CDK stack had IRSA helpers. Pod Identity stays on the roadmap for new clusters where we can design the model from day one.
Tech stack: AWS EKS, Amazon Linux 2 & Amazon Linux 2023, Kubernetes Cluster Autoscaler, IAM Roles for Service Accounts (IRSA), Kubernetes RBAC, EKS OIDC, Terraform / AWS CDK (Python), Datadog.
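The core of the IRSA fix is an IAM trust policy that lets exactly one Kubernetes service account assume the role via the cluster's OIDC provider. A minimal sketch of that policy, with hypothetical placeholder values for the OIDC provider URL, account ID and service account (the real ones come from your cluster):

```python
import json

# Hypothetical placeholders -- substitute your cluster's OIDC provider URL,
# AWS account ID, namespace and service-account name.
OIDC_PROVIDER = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
ACCOUNT_ID = "123456789012"
NAMESPACE = "kube-system"
SERVICE_ACCOUNT = "cluster-autoscaler"

def irsa_trust_policy(oidc_provider: str, account_id: str,
                      namespace: str, service_account: str) -> dict:
    """Build the IAM trust policy that allows only the named Kubernetes
    service account to assume the role through the cluster's OIDC provider."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Scope the role to exactly one service account --
                    # this is what stops other pods "borrowing" the role.
                    f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }

policy = irsa_trust_policy(OIDC_PROVIDER, ACCOUNT_ID, NAMESPACE, SERVICE_ACCOUNT)
print(json.dumps(policy, indent=2))
```

The matching Kubernetes side is just an annotation on the service account (`eks.amazonaws.com/role-arn`) pointing at the role, which is why the Helm chart's “service account + annotation” pattern made this the fastest safe fix.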
Role: Senior Cloud & DevOps Engineer, leading Datadog governance for a multi-team engineering organisation.
Our Datadog logs setup “worked”, but ownership and costs were blurry. Dozens of teams shipped logs with inconsistent tags and ad-hoc indexes, making it hard to see who owned which volume, how long data stayed, and why costs kept creeping up.
Standardised a required tagging model for logs: team, costcenter, appgroup, env and retention.
Created retention-based indexes (index-retention-period-03, -07, -15, -30, -90) matching only fully tagged logs with allowed retention values.
Built a LogsIndexManager module in Pulumi (Python) to manage indexes, routing rules and (optionally) index order and enforcement.
Impact: Made log retention an explicit product-team decision instead of a central bottleneck, improved cost transparency and paved the way for full IaC ownership of Datadog log indexes and enforcement rules.
Tech stack: Datadog logs & monitors, tag-based routing and indexes,
Pulumi (pulumi-datadog), AWS workloads (EKS/Lambda/EC2), shared tagging model
for logs, metrics and traces across 70+ engineering teams.
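The tag-based routing behind those indexes can be sketched as a small naming-and-filter convention. This is an illustrative sketch, not the actual LogsIndexManager code; the helper names and exact query syntax are assumptions based on the tagging model described above:

```python
# Illustrative sketch of the retention-index naming and filter convention.
# Tag names come from the shared tagging model; the query syntax mirrors
# Datadog's log search filters (tag:value, tag:* for "tag is present").
RETENTION_TIERS = [3, 7, 15, 30, 90]            # allowed retention values, in days
REQUIRED_TAGS = ["team", "costcenter", "appgroup", "env"]

def index_name(days: int) -> str:
    """Zero-padded index name, e.g. index-retention-period-03."""
    return f"index-retention-period-{days:02d}"

def index_query(days: int) -> str:
    """Filter that matches only fully tagged logs that explicitly
    request this retention tier -- untagged logs fall through."""
    required = " ".join(f"{tag}:*" for tag in REQUIRED_TAGS)
    return f"{required} retention:{days}"

for d in RETENTION_TIERS:
    print(index_name(d), "->", index_query(d))
```

In the real setup each name/query pair would back a `datadog_logs_index` resource managed by the Pulumi module, so adding or retiring a tier is a one-line IaC change.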
Role: Platform/DevOps Engineer leading a registry migration from Google Artifact Registry / Container Registry to AWS ECR.
As part of a wider platform move to AWS, dozens of image repositories had to move from GCP (with hierarchical
paths like eu.gcr.io/project/app/service) to AWS ECR, which uses flatter repositories and tags.
A naive “pull & push” risked overwriting tags or losing the original structure.
Built a Python tool that maps each hierarchical GCP path to a flat ECR name (e.g. project-app-service:1.2.3), with batched runs (--limit) and a safe dry-run mode with clear logging.
Impact: Enabled teams to migrate image repositories without accidentally overwriting tags or losing traceability, and produced a reusable migration tool that can be shared or open-sourced for similar GCP → ECR moves.
Tech stack: Python, Docker CLI, Google Artifact Registry / Container Registry, AWS ECR, AWS CLI, bash automation and CI integration where needed. The tool encapsulates the GCP hierarchical naming model and the flatter AWS ECR repository/tag model so teams don’t have to think about it on every migration.
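The heart of that encapsulation is the name mapping itself. A minimal sketch of the flattening rule, assuming simple references without registry ports or digests (the real tool handles more edge cases):

```python
def flatten_gcp_image(image_ref: str) -> str:
    """Map a hierarchical GCP image reference such as
    eu.gcr.io/project/app/service:1.2.3 to a flat ECR-style
    name like project-app-service:1.2.3.

    Assumes a plain host with no port and a tag (not a digest).
    """
    ref, _, tag = image_ref.partition(":")    # split off the tag, if any
    host, _, path = ref.partition("/")        # drop the registry host
    repo = path.replace("/", "-")             # flatten the hierarchy
    return f"{repo}:{tag}" if tag else repo

# Example: the hierarchical path collapses into one ECR repository name,
# so the original structure stays traceable in the flattened name.
print(flatten_gcp_image("eu.gcr.io/project/app/service:1.2.3"))
# -> project-app-service:1.2.3
```

Keeping the hierarchy encoded in the hyphenated name is what preserves traceability after the move, and the dry-run mode simply prints these mappings before any `docker pull`/`docker push` happens.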
I write about Cloud, DevOps and platform engineering on Medium, and occasionally join podcasts to share lessons from real-world migrations and incidents.
A recent conversation where I talk about my work, platform engineering and lessons learned.
Completed primary and secondary education at Royal College, Colombo 7, Sri Lanka.
BSc (Hons) in Information Technology from Sri Lanka Institute of Information Technology (SLIIT).
MSc in Information Technology – Cyber Security from Sri Lanka Institute of Information Technology (SLIIT).