Hi Airflow Dev Community!

I am excited to share a new proposal written by TP and me, titled "Enhanced Data Awareness in Airflow <https://docs.google.com/document/d/1Sra65yjbAIZ2mZIbSUL9YMPrW73ltDEPWTCD4J3j2hQ/edit#heading=h.f9eh19p4yqfw>", that I believe will significantly advance our capabilities in data orchestration.

The proposal aims to bridge the gap between task management and data management within Airflow by integrating enhanced data awareness features. This evolution unlocks Airflow's ability to make informed orchestration decisions based on the actual data it produces or manipulates, and to provide actionable insights about that data as it moves through workflows, ultimately improving data reliability and data quality.

Key highlights of the proposal include:

- *Introducing Assets:* Redefining datasets as assets, allowing for more comprehensive data management and better alignment with modern data engineering practices.
- *Progressive Adoptability:* Ensuring that enhancements can be integrated incrementally without disrupting existing workflows.
- *Handling Incremental Load Strategies:* Providing first-class support for incremental processes to give visibility into data freshness, set the stage for targeted backfills, and ultimately improve data reliability.

For more details, please refer to the attached document. I am eager to hear your thoughts and feedback on this proposal, as well as any suggestions for improvement. We will follow up with a set of formal AIPs.

Constance

--
Constance Martineau
Senior Product Manager
Email: [email protected]
Time zone: US Eastern (EST UTC-5 / EDT UTC-4)
