Technical Writing

I write about data engineering, production debugging, and lessons learned from building scalable data platforms. My articles focus on real-world experiences and practical insights that help other data engineers solve similar problems.

📊 Impact

  • 71,000+ total views across Medium, LinkedIn, Reddit, and Dev.to
  • Published in the official Apache Airflow Medium publication (2.6K followers) - NEW!
  • Approved by The New Stack - Article pending publication - NEW!
  • Featured on Reddit r/dataengineering with 100+ upvotes
  • Cross-platform reach with engaged discussions and community feedback

📝 Published Articles

Rewriting My Apache Airflow PR: When Your First Solution Isn’t the Right One ⭐ NEW!

Published: January 2026
Platform: Apache Airflow Official Medium Publication (2.6K followers)

The story of contributing to Apache Airflow and completely rewriting my pull request after maintainer feedback. Covers the journey from the initial solution to a better approach, handling feedback professionally, and the importance of understanding the maintainers' perspective.

The Journey:

  • Initial PR (#59938) fixing InvalidStatsNameException for pool names with special characters
  • Maintainer feedback suggesting a different approach
  • Complete rewrite from local fix to centralized normalization
  • Successfully merged into Apache Airflow main branch

Key Lessons:

  • Maintainer feedback is about improving the codebase, not rejecting you
  • The first solution that works isn’t always the best solution
  • Understanding the broader codebase leads to better contributions
  • Documentation and tests are as important as the fix itself
  • Complete rewrites can result in cleaner, more maintainable code

Technical Details:

  • Implemented normalize_metric_name() function for pool name sanitization
  • Applied normalization in both trigger_task() and allocate_slots() methods
  • Ensured backward compatibility with existing pool naming patterns
  • Added comprehensive test coverage for edge cases
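
To make the idea of centralized normalization concrete, here is a minimal sketch. It is not the code merged into Airflow: the function body, the allow-list of characters, the `pool_metric` helper, and the metric key layout are all assumptions made for illustration.

```python
import re

# Assumed allow-list for illustration: ASCII letters, digits, "_", "." and "-".
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def normalize_metric_name(name: str, replacement: str = "_") -> str:
    """Replace every character outside the allow-list with `replacement`."""
    return _INVALID_CHARS.sub(replacement, name)


def pool_metric(pool_name: str, suffix: str) -> str:
    """Hypothetical helper: build a stats key from a normalized pool name."""
    return f"pool.{suffix}.{normalize_metric_name(pool_name)}"


if __name__ == "__main__":
    print(pool_metric("etl pool/eu", "open_slots"))
```

Keeping the normalization in one function means every call site emits the same key for the same pool, and names that were already valid pass through unchanged.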

Platforms: Apache Airflow Medium • Dev.to • Substack • LinkedIn

Technologies: Apache Airflow, Python, Open Source Contribution, Code Review

Impact: Featured in Apache Airflow’s official publication, helping other contributors navigate the PR process


Why 71,000 Data Engineers Read My Article: What I Learned About Technical Writing

Published: December 8, 2024

An analysis of what made my data quality article reach 71,000 views. Lessons on writing technical content that resonates: focusing on pain over solutions, showing vulnerability, using specific numbers, and structuring for skimmers.

Key Insights:

  • Write about pain, not just solutions
  • Show mistakes, not expertise
  • Use specific numbers (10%, $100 → 10,000)
  • Structure for skimmers with clear headings
  • End with questions to spark discussion

Platforms: Medium • Dev.to • LinkedIn • Reddit

Impact: Helped data engineers understand how to write technical content that gets read and shared


The Time Our Pipeline Processed the Same Day’s Data 47 Times ⭐ Approved by The New Stack

Published: December 16, 2024
Status: Approved by The New Stack - Pending Publication Date

A production debugging story about retry logic gone wrong. After deploying new analytics code on Friday, our pipeline processed the same day’s data 47 times over the weekend due to a combination of zero-count edge cases and poorly designed retry logic.

Key Lessons:

  • Test with realistic data patterns (weekdays, weekends, holidays)
  • Retry logic should reprocess the same data, not fallback data
  • Handle division by zero explicitly in analytics
  • Monitor successful runs, not just failures
  • Keep execution date and data date separate
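
Two of these lessons can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline's actual code: `safe_ratio` and `run_with_retries` are hypothetical names, and the real job ran on Airflow and PySpark rather than plain functions.

```python
from datetime import date
from typing import Callable, Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return None instead of raising when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def run_with_retries(process: Callable[[date], str], data_date: date,
                     max_attempts: int = 3) -> str:
    """Retry `process` with the SAME data_date on every attempt."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            # The data date is a fixed argument; it is never recomputed
            # from the wall clock or swapped for a fallback date.
            return process(data_date)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(
        f"processing {data_date} failed after {max_attempts} attempts"
    ) from last_error
```

Passing the data date in explicitly keeps it separate from the time the job happens to execute, so a retry on Sunday still reprocesses Friday's data.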

Platforms: The New Stack (pending) • Medium • Dev.to • LinkedIn • Reddit

Technologies: Apache Airflow, PySpark, Python, ETL

Recognition: Selected for publication by The New Stack, a leading software development and infrastructure publication


5 Production Data Pipeline Mistakes That Cost Me Weeks

Published: December 2, 2024

Real production incidents from financial services data engineering: silent data loss, weekend-only failures, currency format changes, undocumented schema changes, and validation issues. Each story includes the debugging process and lessons learned.

Mistakes Covered:

  1. Validation logic silently dropping 10% of transactions
  2. Hard-coded thresholds causing weekend-only failures
  3. Currency format changes ($100 → 10,000)
  4. Undocumented schema changes creating duplicates
  5. COALESCE logic missing critical columns
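
As an illustration of how the first mistake can be guarded against, the sketch below keeps rejected records and fails loudly when too many are dropped. The names `ValidationResult` and `validate` and the 1% default threshold are assumptions for this example, not code from the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ValidationResult:
    valid: List[dict] = field(default_factory=list)
    rejected: List[dict] = field(default_factory=list)

    @property
    def rejection_rate(self) -> float:
        total = len(self.valid) + len(self.rejected)
        return len(self.rejected) / total if total else 0.0


def validate(records: Iterable[dict], is_valid: Callable[[dict], bool],
             max_rejection_rate: float = 0.01) -> ValidationResult:
    """Split records into valid and rejected, and raise if too many are rejected."""
    result = ValidationResult()
    for record in records:
        target = result.valid if is_valid(record) else result.rejected
        target.append(record)
    if result.rejection_rate > max_rejection_rate:
        raise ValueError(
            f"rejected {len(result.rejected)} records "
            f"({result.rejection_rate:.1%}), above the allowed "
            f"{max_rejection_rate:.1%}"
        )
    return result
```

With a check like this, a validation rule that drops 10% of transactions stops the run instead of passing silently.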

Platforms: Medium • Dev.to • LinkedIn • Reddit

Technologies: PySpark, Data Quality, SQL, Production Debugging

Community Response: Featured discussion on r/dataengineering with engineers sharing similar experiences


Data Quality at Scale: Building Reliable Data Pipelines

Published: November 2024
Views: 71,000+

The article that started it all. Covers practical approaches to data quality in production pipelines: validation strategies, monitoring patterns, schema evolution, and handling edge cases. Based on real experiences processing millions of transactions daily.

Key Topics:

  • Data validation patterns at scale
  • Schema evolution strategies
  • Monitoring and alerting approaches
  • Handling edge cases and exceptions
  • Production debugging techniques

Platforms: Medium • Dev.to • LinkedIn • Reddit

Technologies: PySpark, Kafka, Airflow, Data Quality

Impact: 71,000+ views, 100+ upvotes on Reddit, extensive discussion and community engagement


From Raw to Refined: Building Production Data Pipelines That Scale

Published: November 2024

Architecture patterns for building scalable data pipelines. Covers the journey from raw data ingestion through transformation to analytics-ready datasets, including best practices for each layer.

Architecture Covered:

  • Bronze (Raw) → Silver (Cleaned) → Gold (Business) layer pattern
  • Data ingestion strategies (batch vs streaming)
  • Transformation best practices
  • Data modeling for analytics
  • Orchestration and monitoring
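
The layer pattern can be shown in miniature with plain Python. This is a toy sketch under stated assumptions: the field names (`id`, `account`, `amount`) and the functions `to_silver` and `to_gold` are invented for illustration, and a real pipeline would use the engines listed below (for example PySpark) rather than in-memory lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def to_silver(bronze_rows: Iterable[dict]) -> List[dict]:
    """Bronze -> Silver: drop rows without an id, deduplicate, normalize types."""
    seen_ids = set()
    silver = []
    for row in bronze_rows:
        txn_id = row.get("id")
        if txn_id is None or txn_id in seen_ids:
            continue
        seen_ids.add(txn_id)
        silver.append({
            "id": txn_id,
            "account": row["account"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver_rows: Iterable[dict]) -> Dict[str, float]:
    """Silver -> Gold: aggregate cleaned rows into a per-account total."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver_rows:
        totals[row["account"]] += row["amount"]
    return dict(totals)
```

The bronze rows are never modified in place, so the raw layer stays available for reprocessing if a cleaning rule turns out to be wrong.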

Platforms: Medium • Dev.to • LinkedIn

Technologies: Kafka, PySpark, Snowflake, Airflow, Databricks

Use Case: Submitted as a speaking proposal to the Oxford Microsoft Data Platform Group (accepted for January 2026)


🎯 Writing Philosophy

Practical over theoretical - Real production experiences, not abstract concepts

Honest about failures - Sharing mistakes and debugging journeys, not just solutions

Specific and actionable - Concrete examples with code, numbers, and real scenarios

Community-focused - Ending with questions to spark discussion and shared learning

Open source focused - Contributing back to the tools and communities that help us


📢 Where to Read

Apache Airflow Publication: medium.com/apache-airflow
Primary: Medium
Cross-posted: Dev.to • Substack
Discussions: Reddit r/dataengineering
Professional updates: LinkedIn


💬 Get in Touch

Have questions about data engineering? Want to discuss pipeline architecture or share your own production war stories?

Connect with me on LinkedIn or reach out via email.

