Senior Data Engineer building reliable batch & streaming data platforms

I build large-scale data platforms, real-time streaming systems, and AI-ready pipelines using Databricks, PySpark, Snowflake, BigQuery, Delta Lake, AWS, GCP, Airflow, and Kafka.

Databricks, PySpark, Snowflake, BigQuery, AWS, GCP, Delta Lake, Airflow, Kafka, Python, SQL, Docker, Kubernetes

⚡ Designed data platforms processing 1B+ records

⚡ Reduced ETL cost by 40% with optimized Delta pipelines

⚡ Built real-time streaming pipelines with Kafka & Spark

About Me

I design, build, and operate production-grade data platforms that teams trust.

I am a Senior Data Engineer with 6+ years of experience designing, building, and optimizing large-scale data platforms across AWS, GCP, Databricks, and modern lakehouse architectures.

I specialize in building real-time and batch data systems using PySpark, Delta Lake, Snowflake, BigQuery, Airflow, and Kafka.

I am passionate about data architecture, fintech systems, streaming pipelines, and MLOps.

What I Do

Build large-scale data platforms
Develop real-time pipelines (Kafka + Spark)
Design lakehouse architectures

What I Am Focusing On

Fintech AI and credit risk pipelines
LLM and vector database engineering
MLOps architecture

Skills & Technologies

Data Engineering

Python
SQL
PostgreSQL
MySQL
MongoDB
Apache Spark (PySpark)
Apache Kafka
Apache Airflow
Hadoop
HDFS
Hive
ETL / ELT
Data Warehousing
Data Modeling
Data Pipelines
Batch & Streaming Pipelines
Delta Lake
Databricks
Snowflake
dbt
Distributed Systems

AWS Cloud

AWS S3
AWS EC2
AWS Lambda
AWS Glue
AWS Redshift
AWS EMR
AWS IAM
AWS VPC
AWS CloudWatch
AWS SNS / SQS
AWS RDS

Google Cloud (GCP)

GCP BigQuery
GCP Dataproc
GCP Dataflow
GCP Cloud Storage
GCP Pub/Sub
GCP Composer (Managed Airflow)
GCP IAM
GCP VPC Networking
GCP Monitoring

Experience

Senior Data Engineer

2020 – Present

Designed and operated large-scale batch and streaming pipelines processing 1B+ records using PySpark, Databricks, and Delta Lake.
Built real-time streaming systems using Kafka and Spark Structured Streaming.
Reduced cloud cost by 40% via lakehouse migration.
Implemented CI/CD with Docker, GitHub Actions, and Terraform.
Modernized data warehouses on Snowflake and BigQuery.

Data Engineer

2018 – 2020

Built ETL workflows using Airflow, Python, and SQL.
Optimized BigQuery with partitioning and clustering (60% faster queries).
Developed cloud-native ingestion pipelines.
Collaborated on data modeling and warehouse design.

Featured Projects

Production-grade data engineering projects showcasing end-to-end expertise

Real-time Streaming Pipeline

End-to-end real-time data pipeline processing millions of events per second with Kafka, Spark Structured Streaming, and Delta Lake for real-time analytics and ML feature engineering.

Key Highlights

• Processes 10M+ events/second with sub-second latency
• Automated schema evolution and data quality checks
• Cost-optimized architecture with auto-scaling

Apache Kafka, Spark Streaming, Delta Lake, AWS S3, Python, Docker

Enterprise Lakehouse on Databricks

Modern data lakehouse architecture built on Databricks with Delta Lake, enabling ACID transactions, time travel, and unified batch/streaming workloads.

Key Highlights

• Unified 50+ data sources into a single lakehouse
• Reduced pipeline runtime by 60%
• Implemented governance with Unity Catalog

Databricks, Delta Lake, PySpark, Unity Catalog, AWS, Terraform

Snowflake / BigQuery Modernization

Migrated legacy warehouse workloads to Snowflake and BigQuery with optimized data models and automated ELT pipelines.

Key Highlights

• Migrated 100+ TB of historical data
• Improved query performance by 70%
• Automated transformations with dbt

Snowflake, BigQuery, dbt, Airflow, Python, GitHub Actions

MLOps Feature Store Architecture

Production-grade feature store supporting real-time and batch feature computation, lineage, and versioning.

Key Highlights

• Served 1,000+ features with millisecond latency
• Automated feature pipelines
• Full feature lineage and versioning

Databricks Feature Store, MLflow, PySpark, Delta Lake, FastAPI

Fintech Credit Risk Pipeline

Real-time credit risk scoring system with ML-driven decisioning and streaming feature engineering.

Key Highlights

• Reduced approval time from days to minutes
• Processed 100K+ applications daily
• Achieved 95%+ model accuracy

PySpark, Kafka, PostgreSQL, FastAPI, AWS Lambda, Docker

Cloud Cost Optimization Framework

Automated framework for monitoring, anomaly detection, and dynamic resource scaling.

Key Highlights

• Reduced cloud costs by 40%
• Automated rightsizing
• Real-time anomaly detection

Python, AWS Cost Explorer, Terraform, Lambda, CloudWatch

Get In Touch

Let's talk about data engineering, cloud platforms, real-time systems, and AI-ready architectures.

Email Me GitHub LinkedIn
