Available for senior data engineering roles
Vasu

I build data systems
that scale to billions

Senior Data Engineer specialising in real-time pipelines, distributed systems, and lakehouse architectures.

0+ Years exp.
0B+ Events / day
0TB+ Data managed
0.9% SLA delivered
Apache Kafka · Delta Lake · Apache Spark · Apache Flink · Kubernetes · Terraform · Airflow · Trino · BigQuery · Redshift · FastAPI · Databricks

Portfolio

Selected Work

Sub-5s end-to-end latency (down from 8 min), 10M+ events/day throughput, zero data loss with exactly-once delivery, 99.9% uptime

10M Events/Day Kafka → Spark Streaming Pipeline

Production Kafka → Spark Structured Streaming pipeline processing 10M+ events/day with exactly-once delivery to Delta Lake. It uses watermark-based late-event handling, idempotent MERGE upserts, and a dead-letter queue with automatic replay. Reduced end-to-end latency from 8 minutes to under 5 seconds.
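
To make those three ideas concrete, here is a minimal plain-Python sketch. It is not the production code: the `Event` and `Sink` names, the in-memory dictionary standing in for a Delta table, and the choice to park late events in the dead-letter list (Spark's watermark would normally drop them from stateful operators) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str
    event_time: int  # epoch seconds
    payload: dict


@dataclass
class Sink:
    """Hypothetical in-memory stand-in for a keyed table plus a dead-letter queue."""
    allowed_lateness: int = 60
    rows: dict = field(default_factory=dict)
    dead_letters: list = field(default_factory=list)
    max_event_time: int = 0

    def process(self, event: Event) -> str:
        # Malformed events are parked so they can be replayed later.
        if not event.event_id:
            self.dead_letters.append(event)
            return "dead-lettered"

        # Watermark = newest event time seen so far minus the allowed lateness.
        watermark = self.max_event_time - self.allowed_lateness
        if event.event_time < watermark:
            # Assumption: late events go to the dead-letter list instead of being dropped.
            self.dead_letters.append(event)
            return "late"
        self.max_event_time = max(self.max_event_time, event.event_time)

        # Upsert keyed on event_id: replaying the same event leaves the table unchanged.
        current = self.rows.get(event.event_id)
        if current is None or event.event_time >= current.event_time:
            self.rows[event.event_id] = event
        return "upserted"
```

Because the write is keyed on `event_id`, a retried batch cannot create duplicates. That property is what lets at-least-once delivery from the source behave like exactly-once at the sink.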

Apache Kafka · Spark Structured Streaming · Delta Lake · PySpark · AWS EMR · +1 more

60% pipeline runtime reduction, 50+ sources unified, 0 schema conflicts, full data lineage with Unity Catalog

Enterprise Lakehouse — Databricks Medallion Architecture

Unified 50+ isolated AWS Glue jobs into a Databricks Delta Lake medallion architecture (Bronze/Silver/Gold). Unity Catalog for governance, dbt for schema contracts, Photon-powered Gold layer. Achieved 60% pipeline runtime reduction and eliminated schema conflicts across 8 engineering teams.
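
The Bronze/Silver/Gold flow can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Databricks or dbt implementation: the `CONTRACT` mapping, the column names, and the `to_silver` / `to_gold` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical schema contract: column name -> required Python type.
CONTRACT = {"order_id": str, "team": str, "amount": float}


def to_silver(bronze_rows):
    """Split raw (Bronze) rows into contract-conforming rows and rejects."""
    valid, rejected = [], []
    for row in bronze_rows:
        conforms = all(isinstance(row.get(col), typ) for col, typ in CONTRACT.items())
        (valid if conforms else rejected).append(row)
    return valid, rejected


def to_gold(silver_rows):
    """Aggregate validated (Silver) rows into a per-team total (Gold)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["team"]] += row["amount"]
    return dict(totals)
```

Enforcing the contract at the Silver boundary is the design choice that matters here: bad rows are quarantined once, so every downstream consumer sees the same schema.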

Databricks · Delta Lake · PySpark · Unity Catalog · dbt · +3 more

70% query performance improvement (p95: 42s → 11s), 40% cost reduction, 100TB migrated with zero downtime

100TB Warehouse Migration — Redshift & Oracle → Snowflake + BigQuery

Led the migration of 100+ TB from on-premises Oracle and legacy AWS Redshift to Snowflake and BigQuery using a dual-write validation strategy. Re-modeled the physical layer with micro-partition clustering and incremental ELT using dbt. Achieved a 70% query performance improvement (p95: 42s → 11s) and a 40% cost reduction with a zero-downtime cutover.
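
Dual-write validation comes down to writing the same data to both systems and comparing them before cutover. The sketch below shows one way to do that comparison in plain Python. The `row_hash` and `diff_tables` helpers and the row format are hypothetical; a real migration would compute the hashes inside each warehouse rather than in application memory.

```python
import hashlib


def row_hash(row):
    """Stable hash of a row, independent of column order."""
    canonical = repr(sorted(row.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def diff_tables(legacy_rows, target_rows, key):
    """Compare two copies of a table by primary key and per-row hash."""
    legacy = {row[key]: row_hash(row) for row in legacy_rows}
    target = {row[key]: row_hash(row) for row in target_rows}
    shared = legacy.keys() & target.keys()
    return {
        "missing_in_target": sorted(legacy.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != target[k]),
    }
```

An empty report across all three lists is the signal that a table is safe to cut over. Any non-empty list points at the exact keys to investigate.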

Snowflake · BigQuery · dbt · Apache Airflow · Airbyte · +2 more

How I work

Engineering Principles

Reliability First

Systems designed for 99.9% SLA. Every pipeline ships with observability, alerting, and a documented recovery path.
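
For a sense of scale, a 99.9% target leaves roughly 43 minutes of downtime in a 30-day month. The arithmetic is sketched below. The function name and the 30-day window are illustrative assumptions, not part of any SLA definition quoted here.

```python
def downtime_budget_minutes(sla_percent, window_days=30):
    """Minutes of allowed downtime in the window for a given availability target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - sla_percent) / 100


# 30 days = 43,200 minutes; 0.1% of that is about 43.2 minutes.
budget = downtime_budget_minutes(99.9)
```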

Scalable by Design

Architecture that grows with your data — from gigabytes to petabytes without re-engineering the foundation.

Deep Observability

You can't fix what you can't see. Rich metrics, distributed tracing, and cost visibility across every layer.

Let's talk

Let's build something scalable

Need production-grade data systems that actually scale? Let's discuss your architecture.

Get in Touch