Welcome!
I'm Pradeep Kalluri, a Data Engineer specializing in building scalable cloud data platforms and production-grade data pipelines.
I currently work at NatWest Bank in London, designing and delivering reliable data engineering solutions that power analytics and business intelligence across the organization.
What I Do
I build end-to-end data platforms that transform raw data into actionable insights:
- Data Ingestion - Real-time streaming (Kafka) and batch processing from cloud storage (S3, Azure Data Lake)
- Distributed Processing - Large-scale data transformation using PySpark and Databricks
- Data Warehousing - Building curated datasets in Snowflake with optimized data models
- Pipeline Orchestration - Workflow automation and monitoring with Apache Airflow
- Analytics Engineering - dbt transformations, data quality frameworks, BI integration
Technical Stack
Languages: Python (PySpark, Pandas), SQL, Shell Scripting
Cloud Platforms: AWS (S3, Glue, Lambda), Azure (Databricks, ADF, Data Lake), Microsoft Fabric
Data Engineering: Apache Kafka, Apache Airflow, Snowflake, dbt, ETL/ELT pipelines
Databases: Snowflake, Azure SQL, Redshift, PostgreSQL, MySQL
DevOps: Docker, Terraform, CI/CD (GitHub Actions, Azure DevOps), Git
BI Tools: Tableau, Power BI
Recent Achievements
Certifications
- Microsoft Certified: Fabric Data Engineer Associate (January 2026) - NEW!
  - Data lakehouse architecture with OneLake and Delta Lake
  - Building data pipelines with Data Factory and Dataflow Gen2
  - Real-time analytics with KQL databases and Eventstream
Technical Writing
71,000+ views across platforms
- Rewriting My Apache Airflow PR: When Your First Solution Isn't the Right One (January 2026) - NEW!
  - Published in Apache Airflow official Medium publication (2.6K followers)
  - Featured lessons on handling maintainer feedback and complete rewrites
  - Cross-posted on Dev.to and Substack
- Why 71,000 Data Engineers Read My Article - Lessons on technical writing (Dec 2024)
- The Time Our Pipeline Processed the Same Day 47 Times - Approved by The New Stack (Dec 2024) - Pending Publication
- 5 Data Pipeline Mistakes That Cost Me Weeks - Production debugging stories (Dec 2024)
- Data Quality at Scale - 71,000 views! (Nov 2024)
- From Raw to Refined: Data Pipeline Architecture - Scalable pipeline design (Nov 2024)
Cross-posted on Dev.to and discussed on Reddit's r/dataengineering
Speaking
- Oxford Microsoft Data Platform Group - January 21, 2026 (Confirmed)
  Topic: "From Raw to Refined: Building Production Data Pipelines That Scale"
  Format: Online presentation with Azure-based demos
- 13 conference proposals submitted to data engineering conferences and meetups across Europe
Open Source Contributions
Apache Airflow - 2 merged contributions
- Pool name validation fix (PR #59938) - MERGED (January 2026) - NEW!
  - Fixed InvalidStatsNameException for pool names with special characters
  - Implemented normalization for backward compatibility
  - Complete rewrite after maintainer feedback
- Documentation enhancement (PR #58587) - MERGED (December 2024)
  - Improved data masking documentation for production deployments
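To illustrate the idea behind the pool name fix, here is a minimal sketch of name normalization. It is not the code merged in the PR: the function name `normalize_pool_name` is hypothetical, and the set of characters treated as safe (ASCII letters, digits, underscore, dot, dash) is an assumption about what a metrics backend accepts.

```python
import re

# Assumption: only ASCII letters, digits, underscore, dot, and dash are
# safe in a metric name. Everything else is treated as unsafe.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.\-]")


def normalize_pool_name(name: str) -> str:
    """Replace every character outside the assumed safe set with an underscore.

    Hypothetical helper for illustration only. Names that are already
    safe pass through unchanged, so existing pools keep their metric names.
    """
    return _UNSAFE_CHARS.sub("_", name)
```

Normalizing only when a metric is emitted, rather than rejecting the pool name outright, is one way to keep existing pools working while still producing valid metric names.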
dbt-core - Active contributions:
- Freshness summary output (PR #12231) - Under review
- User experience improvement (PR #12232) - Under review
Confluent Kafka Python - Active contribution:
- SSL configuration enhancement (PR submitted) - Under review
Projects
- Real-Time Data Quality Monitor - NEW!
  - Production-grade streaming data quality system
  - 6 quality dimensions (Completeness, Timeliness, Accuracy, Consistency, Uniqueness, Validity)
  - REST API with 7 endpoints exposing real-time metrics
  - Processing 600+ orders/minute with 46,000+ quality checks in 24 hours
  - Built with Kafka, Python, PostgreSQL, FastAPI, and Streamlit
  - View on GitHub
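As a rough illustration of per-record quality checks, the sketch below scores one order against four of the six dimensions listed above. It is not taken from the project: the `Order` fields, the currency set, and the five-minute freshness threshold are all assumptions. Accuracy and Consistency are left out because they need reference data to compare against.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Order:
    # Hypothetical order record; field names are illustrative assumptions.
    order_id: str
    amount: float
    currency: str
    created_at: datetime


REQUIRED_FIELDS = ("order_id", "amount", "currency", "created_at")
VALID_CURRENCIES = {"GBP", "EUR", "USD"}  # assumed reference set
MAX_LAG = timedelta(minutes=5)  # assumed freshness threshold


def check_order(order: Order, seen_ids: set, now: datetime) -> dict:
    """Return a dict mapping each checked dimension to True (pass) or False (fail)."""
    results = {
        # Completeness: every required field is present and non-empty.
        "completeness": all(
            getattr(order, field) not in (None, "") for field in REQUIRED_FIELDS
        ),
        # Timeliness: the record arrived within the allowed lag.
        "timeliness": order.created_at is not None
        and now - order.created_at <= MAX_LAG,
        # Uniqueness: this order_id has not been seen before.
        "uniqueness": order.order_id not in seen_ids,
        # Validity: values fall inside the allowed domain.
        "validity": order.currency in VALID_CURRENCIES
        and order.amount is not None
        and order.amount > 0,
    }
    # Remember the id so a later duplicate fails the uniqueness check.
    seen_ids.add(order.order_id)
    return results


# Example usage with a fixed clock so the result does not depend on wall time.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
seen = set()
print(check_order(Order("o-1", 25.0, "GBP", now - timedelta(minutes=1)), seen, now))
```

In a streaming setup, a function like this would run once per consumed message, with the pass/fail results aggregated into the metrics the API exposes.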
Currently
- Building production data pipelines at NatWest Bank processing millions of transactions daily
- Writing about data engineering on Medium (71K+ views) and Dev.to
- Contributing to Apache Airflow and dbt-core open source projects
- Speaking at the Oxford Microsoft Data Platform Group (Jan 2026)
- Pursuing UK Global Talent Visa in Digital Technology
Professional Experience
NatWest Bank (Sep 2025 - Present) - Data Engineer
Building scalable data platforms with Kafka, PySpark, Snowflake, and Airflow
Accenture (Jul 2023 - Aug 2025) - Data Engineer
Delivered enterprise cloud data solutions across Azure and AWS for major clients
Dpoint Group (May 2022 - Jun 2023) - Data Engineer
Developed BI solutions and ETL pipelines supporting operational analytics
Connect With Me
Email • LinkedIn • GitHub • Medium • Dev.to
Based in London, United Kingdom
Passionate about building reliable, scalable data platforms that empower data-driven decision making.