Technical Writing

I write about data engineering, production debugging, and lessons learned from building scalable data platforms. My articles focus on real-world experiences and practical insights that help other data engineers solve similar problems.

📊 Impact

Published on The New Stack - Leading cloud-native and DevOps publication ⭐ NEW!
71,000+ total views across Medium, LinkedIn, Reddit, and Dev.to
Published in Apache Airflow official Medium publication (2.6K followers)
Featured on Reddit r/dataengineering with 100+ upvotes
Cross-platform reach with engaged discussions and community feedback

📝 Published Articles

A Beginner’s Guide to Contributing to Apache Airflow

Published: February 2026 Platform: Apache Airflow Official Medium Publication

A practical guide for developers looking to make their first open source contribution to Apache Airflow. Covers setting up a local development environment, navigating the codebase, understanding the contribution process, and working with maintainers effectively — lessons drawn from my own three merged PRs and 5+ code reviews.

Key Topics:

Setting up a local Airflow dev environment
Finding good first issues
Understanding the PR review process
Working with maintainer feedback
Writing tests and documentation alongside code

Platforms: Apache Airflow Medium • LinkedIn

Technologies: Apache Airflow, Python, Open Source Contribution

Impact: Published in Apache Airflow’s official publication, helping new contributors navigate the open source contribution process

The Weekend Our Pipeline Processed the Same Data 47 Times ⭐ Published on The New Stack

Published: January 12, 2026
Platform: The New Stack

A production debugging story published on The New Stack, a leading publication for software development and cloud-native infrastructure. After deploying new analytics code on Friday, our pipeline processed the same day’s data 47 times over the weekend due to a combination of zero-count edge cases and poorly designed retry logic.

Key Lessons:

Test with realistic data patterns (weekdays, weekends, holidays)
Retry logic should reprocess the same data, not fallback data
Handle division by zero explicitly in analytics
Monitor successful runs, not just failures
Keep execution date and data date separate

Platforms: The New Stack • Medium • Dev.to • LinkedIn

Technologies: Apache Airflow, PySpark, Python, ETL

Recognition: Selected for publication by The New Stack, a leading software development and infrastructure publication read by enterprise engineers and CTOs globally

Rewriting My Apache Airflow PR: When Your First Solution Isn’t the Right One

Published: January 2026
Platform: Apache Airflow Official Medium Publication (2.6K followers)

The story of contributing to Apache Airflow and completely rewriting my pull request after maintainer feedback. Covers the journey from initial solution to a better approach, handling feedback professionally, and the importance of understanding maintainer perspective.

The Journey:

Initial PR (#59938) fixing InvalidStatsNameException for pool names with special characters
Maintainer feedback suggesting a different approach
Complete rewrite from local fix to centralized normalization
Successfully merged into Apache Airflow main branch

Key Lessons:

Maintainer feedback is about improving the codebase, not rejecting you
The first solution that works isn’t always the best solution
Understanding the broader codebase leads to better contributions
Documentation and tests are as important as the fix itself
Complete rewrites can result in cleaner, more maintainable code

Technical Details:

Implemented normalize_metric_name() function for pool name sanitization
Applied normalization in both trigger_task() and allocate_slots() methods
Ensured backward compatibility with existing pool naming patterns
Added comprehensive test coverage for edge cases

Platforms: Apache Airflow Medium • Dev.to • Substack • LinkedIn

Technologies: Apache Airflow, Python, Open Source Contribution, Code Review

Impact: Featured in Apache Airflow’s official publication, helping other contributors navigate the PR process

Why 71,000 Data Engineers Read My Article: What I Learned About Technical Writing

Published: December 8, 2024

An analysis of what made my data quality article reach 71,000 views. Lessons on writing technical content that resonates: focusing on pain over solutions, showing vulnerability, using specific numbers, and structuring for skimmers.

Key Insights:

Write about pain, not just solutions
Show mistakes, not expertise
Use specific numbers (10%, $100 → 10,000)
Structure for skimmers with clear headings
End with questions to spark discussion

Platforms: Medium • Dev.to • LinkedIn • Reddit

Impact: Helped data engineers understand how to write technical content that gets read and shared

5 Production Data Pipeline Mistakes That Cost Me Weeks

Published: December 2, 2024

Real production incidents from financial services data engineering: silent data loss, weekend-only failures, currency format changes, undocumented schema changes, and validation issues. Each story includes the debugging process and lessons learned.

Mistakes Covered:

Validation logic silently dropping 10% of transactions
Hard-coded thresholds causing weekend-only failures
Currency format changes ($100 → 10,000)
Undocumented schema changes creating duplicates
COALESCE logic missing critical columns

Platforms: Medium • Dev.to • LinkedIn • Reddit

Technologies: PySpark, Data Quality, SQL, Production Debugging

Community Response: Featured discussion on r/dataengineering with engineers sharing similar experiences

Data Quality at Scale: Building Reliable Data Pipelines

Published: November 2024
Views: 71,000+

The article that started it all. Covers practical approaches to data quality in production pipelines: validation strategies, monitoring patterns, schema evolution, and handling edge cases. Based on real experiences processing millions of transactions daily.

Key Topics:

Data validation patterns at scale
Schema evolution strategies
Monitoring and alerting approaches
Handling edge cases and exceptions
Production debugging techniques

Platforms: Medium • Dev.to • LinkedIn • Reddit

Technologies: PySpark, Kafka, Airflow, Data Quality

Impact: 71,000+ views, 100+ upvotes on Reddit, extensive discussion and community engagement

From Raw to Refined: Building Production Data Pipelines That Scale

Published: November 2024

Architecture patterns for building scalable data pipelines. Covers the journey from raw data ingestion through transformation to analytics-ready datasets, including best practices for each layer.

Architecture Covered:

Bronze (Raw) → Silver (Cleaned) → Gold (Business) layer pattern
Data ingestion strategies (batch vs streaming)
Transformation best practices
Data modeling for analytics
Orchestration and monitoring

Platforms: Medium • Dev.to • LinkedIn

Technologies: Kafka, PySpark, Snowflake, Airflow, Databricks

Use Case: Topic presented at Oxford Microsoft Data Platform Group (January 2026)

🎯 Writing Philosophy

Practical over theoretical - Real production experiences, not abstract concepts

Honest about failures - Sharing mistakes and debugging journeys, not just solutions

Specific and actionable - Concrete examples with code, numbers, and real scenarios

Community-focused - Ending with questions to spark discussion and shared learning

Open source focused - Contributing back to the tools and communities that help us

📢 Where to Read

The New Stack: thenewstack.io/author/pradeep-kalluri
Apache Airflow Publication: medium.com/apache-airflow
Primary: Medium
Cross-posted: Dev.to • Substack
Discussions: Reddit r/dataengineering
Professional updates: LinkedIn

💬 Get in Touch

Have questions about data engineering? Want to discuss pipeline architecture or share your own production war stories?

Connect with me on LinkedIn or reach out via email.

← Back to Home

View Open Source Contributions →

View Projects →