Senior Cloud Platform Engineer building self-healing infrastructure at MongoDB.
Previously led a team of 5 at NoBroker — saving ₹10L+ annually on
logging infra and shipping a self-service staging platform for 100+ developers.
Nine years deep in cloud platforms, container orchestration, and the quiet engineering
of systems that need less attention over time — through automation,
observability, and self-service.
I spend my days designing systems that take operational toil and convert it into
Go code, RabbitMQ queues, and Temporal workflows. The work I'm proudest of looks
small from the outside: a logging pipeline that's 10× cheaper,
a staging environment that spins up in under five minutes,
a Go service that quietly closes 38% of incident tickets before
a human reads them.
I'm CKA-certified with a score of 91% (top 9%), AWS- and GCP-certified,
and have led a team of five through a PCI DSS audit. I write about what I build —
two technical articles on Medium have reached over a thousand engineers.
02
Measurable impact
numbers worth keeping
38%
Manual ops work eliminated
MongoDB · Go automation
₹10L/year
Infrastructure cost saved
NoBroker · ClickHouse migration
10:1
Log compression achieved
LZ4HC · 5K+ logs/sec
100+
Developers unblocked
Project Blackbox · staging-as-a-service
03
Experience
Cloud Operations Engineer
MongoDB · Atlas platform
Dec 2023 – Present · 2 yrs
Building Go services that categorize, correlate, and auto-resolve
Atlas infrastructure issues across AWS, GCP, and Azure — turning 3 years of repetitive
playbook work into deterministic automation.
Platform engineering
Automated infrastructure issue resolution. Built a Go service that categorizes incoming Jira tickets, compares cluster goal-state vs. current state, and auto-applies playbook fixes — eliminating 38% of manual ops work. Stack: Go, RabbitMQ, MongoDB, Jira APIs, Kubernetes, Prometheus, Grafana, Splunk.
Gateway service migration. Removed RabbitMQ from the alerting hot path by building a single gateway backed by MongoDB Change Streams and the Temporal framework. Idempotent child workflows eliminated race conditions and cut alert processing time.
Proactive cloud outage detection. Engineered a multi-source correlation system with a sliding-window algorithm (1-hour lookback) that produces confidence scores from Prometheus alerts and cloud-provider health dashboards, then auto-creates tracking incidents.
Atlas operations
Resolve Atlas infrastructure incidents surfaced by Go-planner automations across AWS, GCP, and Azure.
Coordinate with cloud-provider TAMs for RCA on integration failures.
Perform operational actions (resync, cluster maintenance) and prevent customer impact through proactive alerting.
Senior DevOps Engineer
NoBroker.com · led team of 5
Aug 2021 – Dec 2023 · 2 yrs 5 mo
Led a team of 5 DevOps engineers driving platform initiatives.
Owned PCI DSS certification for NBPay and led the ClickHouse migration that saved
₹10L+ annually.
Leadership
Led a team of 5 DevOps engineers over 2 years, driving platform initiatives, conducting code reviews, and mentoring on Kubernetes, CI/CD, and cloud best practices.
Established team processes for incident response, on-call rotations, and infrastructure-as-code reviews.
Security & compliance
PCI DSS certification lead for the NBPay payments system — server hardening, cloud security controls, network segmentation, audit logging.
Standardized vulnerability scanning, access controls, and encryption at rest / in transit.
Technical wins
ClickHouse migration. Replaced Elasticsearch with ClickHouse: ₹10L+ annual savings, 10:1 compression with LZ4HC codecs plus TTL-based data expiry, and 5K+ logs/sec throughput. Published on Medium.
Awarded Star Performer for cost & performance impact.
Promoted to Senior DevOps Engineer recognizing technical leadership.
DevOps Engineer
NoBroker.com
Jun 2020 – Jul 2021 · 1 yr 2 mo
Implemented Linkerd service mesh across microservices, architected
the initial Kubernetes infra for NoBrokerHood,
and shipped Project Blackbox — staging-as-a-service for 100+ devs.
Service mesh
Implemented Linkerd as a sidecar proxy across microservices to manage inter-service communication between NoBroker and NoBroker Search.
Enabled traffic monitoring, load balancing, and rate limiting — eliminated race conditions and improved reliability.
Gained observability into request flows, latency, and service dependencies.
NoBrokerHood
Architected initial Kubernetes infrastructure for the society-management platform.
Built nginx automations for application scaling and performance tuning.
Project Blackbox
Self-service staging platform on Docker Swarm — eliminated bottlenecks for 100+ developers.
Jenkins multi-branch pipelines in Groovy (47.5% of the codebase) with Perl automation (27.1%); Swarm + Prometheus + Loki monitoring; Git-to-HTTPS automation via Traefik + Let's Encrypt with dynamic subdomains.
Published on the NoBroker Engineering Medium blog.
DevOps Engineer
DXC Technology
Apr 2019 – Jun 2020 · 1 yr 3 mo
Multi-cloud infrastructure management across AWS, GCP, and Azure. Led the
Terraform 0.11 → 0.12.1 migration.
Received SPOT Award for exceptional project delivery.
Technical Lead
Cognizant Technology Solutions
Jun 2017 – Apr 2019 · 1 yr 11 mo
Production application-server management and middleware operations. SSL certificate migration and automation.
Client Performer Award for operational excellence.
Senior System Engineer
Cognizant Technology Solutions
Jun 2015 – Jun 2017 · 2 yrs 1 mo
Application server management and middleware support. Production environment
maintenance and troubleshooting.
04
Selected projects
case studies
01 · MongoDB
Automated infrastructure issue resolution
Self-healing Atlas · Go automation
−38% manual work · 3 years of toil replaced
Go service with a node-based execution graph using a modular executor pattern.
Fetches Jira tickets, compares Atlas cluster goal-state vs. current, applies
playbook fixes, or escalates to customers.
Removed RabbitMQ dependency by building a single gateway service.
MongoDB Change Streams for event-driven processing, the Temporal framework
for idempotent child workflows.
Stack: Golang · MongoDB Change Streams · Temporal · TTL
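A minimal sketch of the goal-state comparison and executor pattern described in this card, assuming a cluster can be reduced to a flat map of setting names to values. Every name here (ClusterState, Drift, Executor, setValueExecutor, resolve) is hypothetical, and the node-based execution graph is simplified to an ordered list of executors. Drifts that no executor can handle are returned for escalation rather than fixed.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterState is a hypothetical, simplified view of an Atlas cluster:
// setting name -> value.
type ClusterState map[string]string

// Drift is one setting whose current value differs from the goal value.
type Drift struct {
	Key, Goal, Current string
}

// diffStates compares goal state with current state and returns the drifted
// settings sorted by key so the order is deterministic.
func diffStates(goal, current ClusterState) []Drift {
	var drifts []Drift
	for k, want := range goal {
		if got := current[k]; got != want {
			drifts = append(drifts, Drift{Key: k, Goal: want, Current: got})
		}
	}
	sort.Slice(drifts, func(i, j int) bool { return drifts[i].Key < drifts[j].Key })
	return drifts
}

// Executor is one pluggable playbook step (a simplified "node").
type Executor interface {
	CanHandle(d Drift) bool
	Apply(d Drift, current ClusterState) error
}

// setValueExecutor is a hypothetical fix that writes the goal value back
// for the keys it is configured to own.
type setValueExecutor struct{ keys map[string]bool }

func (e setValueExecutor) CanHandle(d Drift) bool { return e.keys[d.Key] }

func (e setValueExecutor) Apply(d Drift, current ClusterState) error {
	current[d.Key] = d.Goal
	return nil
}

// resolve offers each drift to the executors in order; the first executor
// that handles it without error wins. Unhandled drifts are escalated.
func resolve(goal, current ClusterState, executors []Executor) (fixed, escalated []Drift) {
	for _, d := range diffStates(goal, current) {
		handled := false
		for _, ex := range executors {
			if ex.CanHandle(d) && ex.Apply(d, current) == nil {
				handled = true
				break
			}
		}
		if handled {
			fixed = append(fixed, d)
		} else {
			escalated = append(escalated, d)
		}
	}
	return fixed, escalated
}

func main() {
	goal := ClusterState{"diskSizeGB": "100", "mongoVersion": "7.0", "backup": "on"}
	current := ClusterState{"diskSizeGB": "80", "mongoVersion": "6.0", "backup": "on"}
	executors := []Executor{setValueExecutor{keys: map[string]bool{"diskSizeGB": true}}}

	fixed, escalated := resolve(goal, current, executors)
	fmt.Println("fixed:", len(fixed), "escalated:", len(escalated)) // fixed: 1 escalated: 1
	fmt.Println("diskSizeGB now", current["diskSizeGB"])             // diskSizeGB now 100
}
```

Keeping each fix behind a small interface is one way to get a modular executor design: adding a playbook means adding a type, not editing the dispatcher.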
03 · MongoDB
Proactive cloud outage detection
Correlation engine · confidence scoring
Auto-incident creation · 1-hour sliding window
Multi-source correlation engine using a sliding-window algorithm
with a 1-hour lookback. Correlates cluster signals with cloud-provider
health dashboards, produces confidence scores, auto-creates tracking incidents.
Architected initial Kubernetes infrastructure for the society-management
application. Built automation scripts, nginx configurations, and scalable
environments for platform expansion.
Capstone: Modelling and analysis of a high-boost DC–DC converter, published in IEEE.
07
Publications
1,000+ readers
Real-Time Log Analysis and Cost-Efficient Log Storage
LinkedIn / Medium · Dec 2023 · 1,000+ readers
Technical deep-dive on migrating from Elasticsearch to ClickHouse — architecture
decisions, compression strategy, performance benchmarks, and the path to ₹10L+ savings.
Architecture guide for self-service developer environments built on Docker Swarm,
Jenkins, and Traefik. Covers automation workflows and scalability patterns.