Vicknesh Rethinavelu

Cloud Operations Engineer | Platform Engineering Specialist

9+ Years Experience | CKA 91% | Multi-Cloud Expert

Professional Summary

Senior Cloud Platform Engineer with 9+ years of experience in cloud infrastructure, container orchestration, and platform engineering. Proven track record of cutting infrastructure costs (₹10L+ annual savings) while improving performance (10:1 compression ratio, 5,000+ logs/sec throughput). Expert in Kubernetes (CKA and CKAD certified; CKA score 91%), multi-cloud environments (AWS, GCP, Azure), and self-service developer platforms. Led critical infrastructure migrations and automation projects that reduced manual work by 38%. Published technical articles on Medium.

Key Achievements

38%

Manual Work Reduced

MongoDB Automation Platform

₹10L+

Annual Cost Savings

ClickHouse Migration

100+

Developers Enabled

Project Blackbox Platform

91%

CKA Score

Top 9th Percentile

Work Experience

Cloud Operations Engineer

MongoDB | Dec 2023 - Present (2 years)

Platform Engineering Projects:

  • Automated Infrastructure Issue Resolution: Built a Golang service that reduced manual work by 38% by categorizing issues, comparing cluster goal state with current state, and auto-applying playbook fixes. Stack: Go, RabbitMQ, MongoDB, Jira APIs, Kubernetes, Prometheus, Grafana, Splunk
  • Gateway Service Migration: Eliminated RabbitMQ dependency by building a single gateway endpoint with MongoDB Changestreams and Temporal framework, reducing race conditions and alert processing time. Implemented idempotent workflows
  • Proactive Cloud Outage Detection: Engineered a multi-source correlation system that uses a sliding window algorithm with a 1-hour lookback for confidence scoring. Auto-creates incidents from Prometheus alerts by correlating cluster signals with cloud provider health dashboards

Atlas Infrastructure Operations:

  • Handle Atlas infrastructure issues from Go planner automations across AWS, GCP, and Azure
  • Coordinate with cloud providers for RCA and analysis, resolve integration issues proactively
  • Perform operational actions (MongoDB resync, cluster maintenance), prevent customer impact through proactive alerts

Senior DevOps Engineer

NoBroker.com | Aug 2021 - Dec 2023 (2 years 5 months)

  • ClickHouse Logging Migration: Led Elasticsearch to ClickHouse migration achieving ₹10L+ annual savings, 10:1 compression ratio, 5,000+ logs/sec throughput. Implemented LZ4HC compression with TTL expressions. Published technical article on Medium
  • Awarded Star Performer for cost optimization and performance improvements
  • Promoted to Senior DevOps Engineer, recognizing technical leadership and project delivery

DevOps Engineer

NoBroker.com | Jun 2020 - Jul 2021 (1 year 2 months)

NoBrokerHood - Society Management Platform:

  • Architected initial Kubernetes infrastructure for NoBrokerHood society management application
  • Built automation scripts and nginx configurations for application scaling and performance optimization
  • Designed scalable environments enabling platform growth during the first year of operations

Project Blackbox - Self-Service Staging Platform:

  • Built self-service staging platform using Docker Swarm, eliminating bottlenecks for 100+ developers
  • Implemented Jenkins multi-branch pipelines (Groovy 47.5%) with Perl automation scripts (27.1%)
  • Deployed Swarm+Prometheus+Loki monitoring stack for observability
  • Automated Git-to-HTTPS pipeline with Traefik, Let's Encrypt SSL, dynamic subdomains
  • Published technical article on Medium (NoBroker Engineering)

DevOps Engineer

DXC Technology | Apr 2019 - Jun 2020 (1 year 3 months)

  • Multi-cloud infrastructure management (AWS, GCP, Azure)
  • Led Terraform 0.11 to 0.12.1 migration project
  • Received SPOT Award for exceptional project delivery

Technical Lead

Cognizant Technology Solutions | Jun 2017 - Apr 2019 (1 year 11 months)

  • Production application server management and middleware operations
  • SSL certificate migration and automation
  • Awarded Client Performer for operational excellence

Senior System Engineer

Cognizant Technology Solutions | Jun 2015 - Jun 2017 (2 years 1 month)

  • Application server management and middleware support
  • Production environment maintenance and troubleshooting

Major Projects

Automated Infrastructure Issue Resolution (MongoDB)

Tech Stack: Golang, RabbitMQ, MongoDB, Jira APIs, Kubernetes, Prometheus, Grafana, Splunk

Impact: Reduced manual work by 38%, replacing 3 years of repetitive manual processes

Built a Golang automation service around a node-based execution graph with a modular executor pattern. The system fetches Jira tickets, compares the Atlas cluster's goal state with its current state, and either applies fixes from playbooks automatically or escalates to customers.
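
The goal-versus-current comparison described above can be illustrated with a minimal sketch. It is not the production service: the ClusterState and Drift types, the diffStates and resolve functions, and the playbook names are all hypothetical, and cluster state is assumed to be a flat key/value map.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterState is a hypothetical flattened view of a cluster's settings.
type ClusterState map[string]string

// Drift records one setting whose current value differs from the goal.
type Drift struct {
	Key, Goal, Current string
}

// diffStates returns every key whose goal value is missing or different
// in the current state, sorted by key for stable output.
func diffStates(goal, current ClusterState) []Drift {
	var drifts []Drift
	for k, want := range goal {
		if got, ok := current[k]; !ok || got != want {
			drifts = append(drifts, Drift{Key: k, Goal: want, Current: got})
		}
	}
	sort.Slice(drifts, func(i, j int) bool { return drifts[i].Key < drifts[j].Key })
	return drifts
}

// resolve maps each drift to a playbook name, or marks it for escalation
// when no playbook is registered for that key (an assumed policy).
func resolve(drifts []Drift, playbooks map[string]string) (fixes, escalations []string) {
	for _, d := range drifts {
		if pb, ok := playbooks[d.Key]; ok {
			fixes = append(fixes, pb)
		} else {
			escalations = append(escalations, d.Key)
		}
	}
	return fixes, escalations
}

func main() {
	goal := ClusterState{"version": "7.0", "nodes": "3", "tls": "on"}
	current := ClusterState{"version": "7.0", "nodes": "2"}
	playbooks := map[string]string{"nodes": "scale-nodes"}

	fixes, escalations := resolve(diffStates(goal, current), playbooks)
	fmt.Println("apply:", fixes)
	fmt.Println("escalate:", escalations)
}
```

Keeping the diff separate from the playbook lookup means new fix types can be registered without touching the comparison logic.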

Gateway Service Migration (MongoDB)

Tech Stack: Golang, MongoDB Changestreams, Temporal Framework, TTL

Impact: Eliminated race conditions, reduced alert processing time, simplified architecture

Eliminated RabbitMQ dependency by building a single gateway service. Used MongoDB Changestreams for event-driven processing and the Temporal framework for idempotent child workflows.
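
As a rough illustration of the idempotency idea, the sketch below deduplicates events by ID so that a redelivered event has no second effect. It deliberately uses neither the Temporal SDK nor the MongoDB driver; the AlertEvent and Processor types are hypothetical, and the in-memory map stands in for whatever durable store a real workflow would use.

```go
package main

import (
	"fmt"
	"sync"
)

// AlertEvent is a hypothetical change event carrying a unique ID.
type AlertEvent struct {
	ID      string
	Payload string
}

// Processor handles each event ID at most once, so redelivered
// events become no-ops.
type Processor struct {
	mu      sync.Mutex
	seen    map[string]bool
	Handled []string
}

// NewProcessor returns a Processor with an empty deduplication set.
func NewProcessor() *Processor {
	return &Processor{seen: make(map[string]bool)}
}

// Handle returns true when the event was processed and false when it
// was a duplicate of an already-processed ID.
func (p *Processor) Handle(ev AlertEvent) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.seen[ev.ID] {
		return false
	}
	p.seen[ev.ID] = true
	p.Handled = append(p.Handled, ev.Payload)
	return true
}

func main() {
	p := NewProcessor()
	events := []AlertEvent{
		{ID: "a1", Payload: "disk full"},
		{ID: "a1", Payload: "disk full"}, // redelivery of the same event
		{ID: "a2", Payload: "node down"},
	}
	for _, ev := range events {
		fmt.Println(ev.ID, "processed:", p.Handle(ev))
	}
}
```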

Proactive Cloud Outage Detection (MongoDB)

Tech Stack: Temporal, MongoDB, Prometheus, Sliding Window Algorithm

Impact: Proactive detection with confidence scoring, auto-incident creation

Built a multi-source correlation engine using a sliding window algorithm with a 1-hour lookback. Correlates cluster signals with cloud provider health dashboards, produces confidence scores, and auto-creates tracking incidents.

ClickHouse Logging Infrastructure (NoBroker)

Tech Stack: ClickHouse, Fluent Bit, Redash, Grafana, LZ4HC

Impact: ₹10L+ annual savings, 10:1 compression, 5,000+ logs/sec throughput

Led Elasticsearch to ClickHouse migration. Researched ClickBench benchmarks, studied Zerodha/Cloudflare deployments, and implemented zero-downtime migration. Published an article on Medium reaching 1,000+ engineers.

Project Blackbox (NoBroker)

Tech Stack: Docker Swarm, Jenkins (Groovy), Perl, Traefik, Portainer, Prometheus, Loki

Impact: 100+ developers enabled, <5min Git push to HTTPS deployment

Built a complete self-service staging platform. Automated workflow: Git → Jenkins → Nexus → Portainer → Swarm → Traefik with Let's Encrypt SSL and dynamic subdomains. GitHub: vicknesh22/blackbox-swarm

NoBrokerHood Infrastructure (NoBroker)

Tech Stack: Kubernetes, Nginx, Automation Scripts

Impact: Scalable platform enabling application growth

Architected initial Kubernetes infrastructure for society management application. Built automation scripts, nginx configurations, and scalable environments for platform expansion.

Technical Skills

Programming & Automation

Golang (Advanced) | Python (Advanced) | Bash/Shell (Expert) | Groovy/Jenkins DSL (Advanced) | Perl (Advanced)

Container & Orchestration

Kubernetes (Expert - CKA 91%) | Docker & Swarm (Expert) | Temporal Framework (Advanced)

Cloud Platforms

Multi-Cloud AWS/GCP/Azure (Expert) | Google Cloud (Expert - Certified) | AWS (Advanced - Certified) | Azure (Advanced)

CI/CD & Infrastructure

Jenkins (Expert - 7+ years) | Terraform (Advanced) | GitOps (Expert) | Spinnaker (Advanced)

Observability & Data

Prometheus & Grafana (Expert) | ClickHouse (Expert) | Splunk (Advanced) | Fluent Bit (Expert) | Loki (Advanced) | Elasticsearch (Expert - 7+ years)

Workflow & Integration

MongoDB & Changestreams (Advanced) | RabbitMQ (Advanced) | Jira APIs (Advanced) | Traefik (Expert)

Certifications & Education

Certified Kubernetes Administrator (CKA)

Score: 91% (Top 9th Percentile)

The Linux Foundation

Certified Kubernetes Application Developer (CKAD)

The Linux Foundation

Google Cloud Associate Cloud Engineer

Google Cloud

AWS Certified Solutions Architect - Associate

Amazon Web Services

AWS CloudEndure Migration

Amazon Web Services

B.Tech - Electrical and Electronics Engineering

Pondicherry Engineering College | 2011 - 2015

Publications & Thought Leadership

"Real-Time Log Analysis and Cost-Efficient Log Storage"

Medium | December 2023 | 1,000+ readers

Detailed technical article on migrating from Elasticsearch to ClickHouse, achieving 10:1 compression and ₹10L+ cost savings. Covers architecture decisions, implementation challenges, and performance optimization.

"Project Blackbox - Our Mysterious Staging Environment"

Medium (NoBroker Engineering) | September 2020 | 30+ engagement

Architecture guide for building self-service developer environments using Docker Swarm, Jenkins, and modern DevOps tools. Covers automation workflows and scalability patterns.

Awards & Recognition