Hello, I'm
Building scalable cloud data platforms and production-grade pipelines at NatWest Bank, London.
I'm a Data Engineer with hands-on experience designing, developing, and deploying scalable data pipelines for analytics and business intelligence across enterprise organizations.
Currently at NatWest Bank, I work across modern data platforms building reliable data flows using Kafka, PySpark, Snowflake, and Airflow.
With experience spanning financial services and consulting, I've delivered data engineering solutions across cloud data platforms, real-time streaming systems, and advanced analytics environments.
Azure Databricks, AWS, Snowflake, Microsoft Fabric
Kafka, PySpark, Airflow orchestration
Python, SQL, dbt, Azure Data Factory
Tableau, Power BI, SAP BW integration
York St John University, London, UK
2023 – 2024
Tor Vergata University, Rome, Italy
2020 – 2023
London, United Kingdom
Retail Banking Data Quality and Pipeline Engineering. Building production data platforms processing millions of transactions daily.
Enterprise Data Platform Modernisation. Delivered large-scale cloud data engineering solutions for Fortune 500 clients across multiple industries.
SAP BW to Azure Migration & Power BI Reporting Modernisation. Developed BI and analytics solutions for manufacturing and logistics operations.
PySpark, Pandas
T-SQL, PL/SQL
Bash Scripting
S3, Glue, Lambda
Databricks, ADF
Microsoft Fabric
Streaming & Events
DAG Orchestration
Cloud Data Warehouse
Distributed Processing
Transform Layer
Relational DB
Relational DB
Azure SQL
Containerization
IaC
GitHub Actions
Data Visualization
Dashboards & KPIs
NatWest Bank
Building production-grade data pipelines processing millions of transactions daily with real-time streaming and automated quality frameworks.
Accenture
Delivered large-scale cloud platforms for Fortune 500 clients using Azure Databricks, Snowflake, and Microsoft Fabric with data mesh architecture.
Dpoint Group
Developed BI and analytics solutions for manufacturing and logistics operations, automating 30+ manual reporting processes.
✅ Production-Ready
ML-powered real-time data quality monitoring system detecting anomalies in streaming data with sub-10ms latency using Isolation Forest.
✅ Open Source
A full modern data engineering platform built from scratch. Cost-effective alternative to commercial tools, potentially saving companies £100K+ annually.
End-to-end data pipeline demonstrating modern data engineering practices with PySpark, Airflow, dbt, and comprehensive testing.
AWS
In Progress
Microsoft
In Progress
71,000+ views across platforms
Building Production Data Pipelines That Scale
February 2026
13 proposals submitted to data engineering conferences across Europe
Active mentor on Topmate (Top 5% Mentor) and internal training lead at NatWest Group. Helping aspiring data engineers transition into the field.
"Pradeep delivered an exceptional presentation on production data pipelines. His deep technical knowledge and ability to explain complex concepts clearly made a lasting impression on our audience."
"An outstanding data engineer who combines strong technical skills with excellent communication. His article on data quality reached 71,000+ engineers for a reason — he writes with clarity and real-world experience that resonates with practitioners."
"Pradeep's contributions to Apache Airflow demonstrate deep understanding of the codebase. His PRs are well-documented, thoroughly tested, and address real user pain points. A valuable contributor to the open source community."
"Pradeep's mentoring sessions on Topmate dramatically accelerated my career transition into data engineering. His practical approach and willingness to share real-world examples set him apart from other mentors."
Passionate about building reliable, scalable data platforms that empower data-driven decision-making.