About Me

Senior Data Engineer.

6+ years designing and operating distributed data systems at scale — batch pipelines, streaming platforms, and lakehouse architecture on AWS and GCP.

Core focus: Spark performance tuning, Delta Lake table design, and Kafka-based streaming with exactly-once semantics. Data correctness, pipeline observability, and failure recovery are treated as system requirements — not operational concerns.

Targeting senior and staff-level data engineering roles — fintech, ML infrastructure, and large-scale data platform teams. Apache Spark, PySpark, Kafka, Databricks, Delta Lake, Snowflake, BigQuery, AWS, GCP, dbt, Airflow, Terraform.


Vasudev Rao

Data Engineer · 6+ Years

Bangalore · Remote Worldwide
AWS · GCP · Databricks

1B+ Events/Day
40% Cost Saved
50+ Pipelines Unified

Engineering Philosophy

Correctness First

Exactly-once delivery, idempotent writes, and late-event handling are system requirements — not afterthoughts. Pipelines that are correct under failure.
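The "correct under failure" principle above boils down to merging by key rather than appending: replaying the same batch after a crash must leave the table unchanged. A minimal, engine-agnostic sketch in plain Python (the field names `event_id` and `ts` are illustrative, not from any specific pipeline):

```python
def idempotent_upsert(table: dict, batch: list, key: str = "event_id") -> dict:
    """Merge a batch into a keyed table; replays are no-ops, newer records win."""
    for event in batch:
        current = table.get(event[key])
        # Keep the newer record; an identical replay leaves the table unchanged.
        if current is None or event["ts"] >= current["ts"]:
            table[event[key]] = event
    return table

table = {}
batch = [{"event_id": "a", "ts": 1, "amount": 10},
         {"event_id": "b", "ts": 1, "amount": 20}]
idempotent_upsert(table, batch)
idempotent_upsert(table, batch)  # replay after a failure: no duplicates
assert len(table) == 2
```

In Delta Lake terms this is the role a `MERGE` inside `foreachBatch` plays: the batch may be delivered more than once, but the keyed write makes the effect exactly-once.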

Performance at Scale

1B+ events/day Spark pipelines, sub-10ms feature serving, and 70% query improvements through physical data modelling and smart partitioning.

Operational Simplicity

Systems built for observability — structured logging, data quality checks, automated alerting, and runbook-driven incident response from day one.

Lakehouse Architecture

Medallion patterns on Delta Lake and Apache Iceberg with Unity Catalog governance. Schema evolution, time-travel, and zero-copy cloning as standard.

Technical Proficiency

Data Engineering Core

dbt (Data Build Tool) · 98%
Snowflake / BigQuery · 96%
Python Automation · 98%
Apache Airflow · 95%
Apache Spark / PySpark · 94%
Apache Kafka / Streaming · 91%

Warehousing & Lakehouse

Delta Lake / Databricks · 94%
Apache Iceberg · 88%
AWS Glue / EMR · 92%
GCP Dataflow / Pub/Sub · 90%
Great Expectations · 92%
OpenMetadata / Catalog · 86%

Cloud & Infrastructure

AWS (S3, Glue, EMR, Redshift) · 93%
GCP (BigQuery, Dataflow) · 90%
Terraform / IaC · 88%
Docker / Kubernetes · 87%
PostgreSQL / Redshift · 91%
Looker / Metabase / Power BI · 89%

Experience

Senior Data Engineer

Fintech Platform (100K+ apps/day) · 2023 – Present

Built a Kafka → PySpark real-time credit decisioning pipeline, reducing decision latency from 48 hours to under 2 minutes while maintaining 95%+ model accuracy at 100K+ applications/day

Built ML Feature Store on Databricks serving 1,000+ features with point-in-time correctness for offline training and sub-10ms retrieval for online inference across 4 ML teams

Designed cost governance framework with auto-remediation across 20+ Databricks workspaces — reduced combined AWS + GCP data platform spend by 40% within 90 days
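Point-in-time correctness, mentioned above, means each training example joins only against feature values observed at or before its own timestamp, so no future information leaks into training. A small illustrative sketch of the lookup rule in plain Python (the data shape is hypothetical, not the actual Databricks implementation):

```python
import bisect

def point_in_time_lookup(history, as_of):
    """Return the latest feature value with timestamp <= as_of.
    `history` is a list of (timestamp, value) pairs sorted by timestamp."""
    timestamps = [ts for ts, _ in history]
    idx = bisect.bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None

history = [(10, 0.2), (20, 0.5), (30, 0.9)]
assert point_in_time_lookup(history, 25) == 0.5  # sees only past values
assert point_in_time_lookup(history, 5) is None  # feature not yet observed
```

The offline training path applies this rule per label row; the online path serves only the latest value, which is why both stay consistent.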

Data Platform Engineer

Enterprise Data Platform · 2020 – 2023

Built Kafka → Spark Structured Streaming pipelines processing 1B+ events/day with exactly-once delivery guarantees to Delta Lake — reduced end-to-end latency from minutes to under 5 seconds

Replaced 50+ siloed ingestion jobs with a unified Databricks medallion lakehouse. Cut pipeline execution time by 60% and eliminated cross-team schema inconsistencies with Unity Catalog

Led migration of 100TB from Redshift + Oracle to Snowflake + BigQuery using dual-write validation strategy — zero downtime, 70% query performance improvement
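The late-event handling behind the streaming work above is usually a watermark: the stream tracks the maximum event time seen, and events older than that maximum minus an allowed lateness are dropped or routed to a correction path instead of silently reordering results. A toy sketch of the rule in plain Python (the lateness value and event shape are illustrative):

```python
def process(events, allowed_lateness):
    """Accept events within `allowed_lateness` of the max event time seen so far."""
    watermark = float("-inf")
    accepted, dropped = [], []
    for ts, payload in events:
        # The watermark only advances as newer event times arrive.
        watermark = max(watermark, ts - allowed_lateness)
        (accepted if ts >= watermark else dropped).append(payload)
    return accepted, dropped

events = [(100, "a"), (105, "b"), (90, "late"), (110, "c")]
accepted, dropped = process(events, allowed_lateness=10)
assert dropped == ["late"]  # 90 < 105 - 10 = 95, past the watermark
```

In Spark Structured Streaming the same idea is expressed with `withWatermark`, which bounds how long state is kept for late arrivals.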

Data Engineer

Analytics Consultancy · 2018 – 2020

Designed Airflow DAGs for multi-source ELT workflows across 50+ upstream sources into BigQuery and Snowflake

Reduced BigQuery costs 60% via date partitioning, clustering, and materialized view optimisation

Built cloud-native ingestion pipelines from REST APIs, CDC streams, and file sources into GCS and BigQuery
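The cost reduction from date partitioning comes from pruning: a query with a date filter scans only the partitions the filter touches rather than the whole table. A back-of-the-envelope sketch (the table sizes are hypothetical, not figures from the migration):

```python
def scanned_bytes(partition_sizes, date_filter):
    """Bytes scanned when the engine prunes to the filtered date partitions."""
    return sum(size for day, size in partition_sizes.items() if day in date_filter)

# A hypothetical three-day table, 100 GB per day.
table = {"2020-01-01": 100, "2020-01-02": 100, "2020-01-03": 100}
full_scan = sum(table.values())                # unpartitioned query: 300 GB
pruned = scanned_bytes(table, {"2020-01-03"})  # partitioned query: 100 GB
assert pruned < full_scan
```

Since BigQuery bills by bytes scanned, pruning translates directly into cost; clustering and materialized views then cut the bytes read within each surviving partition.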

Certifications & Recognition

Databricks Certified Data Engineer Associate

Databricks · 2023

AWS Certified Data Analytics – Specialty

Amazon Web Services · 2022

Google Professional Data Engineer

Google Cloud · 2022

dbt Certified Analytics Engineer

dbt Labs · 2023

Open to Senior DE Roles & Consulting

Let's build something
at scale.

Interested in distributed data systems, streaming architecture, or lakehouse design? Let's talk.