Senior Data Engineer building reliable batch & streaming data platforms
I build large-scale data platforms, real-time streaming systems, and AI-ready pipelines using Databricks, PySpark, Snowflake, BigQuery, Delta Lake, AWS, GCP, Airflow, and Kafka.
⚡ Designed data platforms processing 1B+ records
⚡ Reduced ETL cost by 40% with optimized Delta pipelines
⚡ Built real-time streaming pipelines with Kafka & Spark
About Me
I design, build, and operate production-grade data platforms that teams trust.
I am a Senior Data Engineer with 6+ years of experience designing, building, and optimizing large-scale data platforms across AWS, GCP, Databricks, and modern lakehouse architectures.
I specialize in building real-time and batch data systems using PySpark, Delta Lake, Snowflake, BigQuery, Airflow, and Kafka.
I am passionate about data architecture, fintech systems, streaming pipelines, and MLOps.
What I Do
- Build large-scale data platforms
- Develop real-time pipelines (Kafka + Spark)
- Design lakehouse architectures
What I Am Focusing On
- Fintech AI and credit risk pipelines
- LLM and vector database engineering
- MLOps architecture
Skills & Technologies
Data Engineering
- Python
- SQL
- PostgreSQL
- MySQL
- MongoDB
- Apache Spark (PySpark)
- Apache Kafka
- Apache Airflow
- Hadoop
- HDFS
- Hive
- ETL / ELT
- Data Warehousing
- Data Modeling
- Data Pipelines
- Batch & Streaming Pipelines
- Delta Lake
- Databricks
- Snowflake
- dbt
- Distributed Systems
AWS Cloud
- AWS S3
- AWS EC2
- AWS Lambda
- AWS Glue
- AWS Redshift
- AWS EMR
- AWS IAM
- AWS VPC
- AWS CloudWatch
- AWS SNS / SQS
- AWS RDS
Google Cloud (GCP)
- GCP BigQuery
- GCP Dataproc
- GCP Dataflow
- GCP Cloud Storage
- GCP Pub/Sub
- GCP Composer (Managed Airflow)
- GCP IAM
- GCP VPC Networking
- GCP Monitoring
Experience
Senior Data Engineer
2020 – Present
- Designed and operated large-scale batch and streaming pipelines processing 1B+ records using PySpark, Databricks, and Delta Lake.
- Built real-time streaming systems using Kafka and Spark Structured Streaming.
- Reduced cloud cost by 40% via lakehouse migration.
- Implemented CI/CD with Docker, GitHub Actions, and Terraform.
- Modernized data warehouses on Snowflake and BigQuery.
Data Engineer
2018 – 2020
- Built ETL workflows using Airflow, Python, and SQL.
- Optimized BigQuery tables with partitioning and clustering, making queries 60% faster.
- Developed cloud-native ingestion pipelines.
- Collaborated on data modeling and warehouse design.
Featured Projects
Production-grade data engineering projects showcasing end-to-end expertise
Real-time Streaming Pipeline
End-to-end streaming pipeline processing millions of events per second with Kafka, Spark Structured Streaming, and Delta Lake, feeding real-time analytics and ML feature engineering.
Key Highlights
- Processes 10M+ events/second with sub-second latency
- Automated schema evolution and data quality checks
- Cost-optimized architecture with auto-scaling
Enterprise Lakehouse on Databricks
Modern data lakehouse architecture built on Databricks with Delta Lake, enabling ACID transactions, time travel, and unified batch/streaming workloads.
Key Highlights
- Unified 50+ data sources into a single lakehouse
- Reduced pipeline runtime by 60%
- Implemented governance with Unity Catalog
Snowflake / BigQuery Modernization
Migrated legacy warehouse workloads to Snowflake and BigQuery with optimized data models and automated ELT pipelines.
Key Highlights
- Migrated 100+ TB of historical data
- Improved query performance by 70%
- Automated transformations with dbt
MLOps Feature Store Architecture
Production-grade feature store supporting real-time and batch feature computation, lineage, and versioning.
Key Highlights
- Served 1,000+ features with millisecond latency
- Automated feature pipelines
- Full feature lineage and versioning
Fintech Credit Risk Pipeline
Real-time credit risk scoring system with ML-driven decisioning and streaming feature engineering.
Key Highlights
- Reduced approval time from days to minutes
- Processed 100K+ applications daily
- Achieved 95%+ model accuracy