First pass done. I think Dataset -> Asset is a great direction, but it
would help to clarify with some examples what Threshold and Tolerance would
look like. I am also a bit unclear on how partitioning works, and I think
some examples would be useful there as well.

On Thu, Jun 13, 2024 at 2:49 PM Constance Martineau
<[email protected]> wrote:

> Hi Airflow Dev Community!
>
> I am excited to share a new proposal written by TP and me, titled "Enhanced
> Data Awareness in Airflow"
> <https://docs.google.com/document/d/1Sra65yjbAIZ2mZIbSUL9YMPrW73ltDEPWTCD4J3j2hQ/edit#heading=h.f9eh19p4yqfw>,
> which I believe will significantly advance our capabilities in data
> orchestration.
>
> The proposal aims to bridge the gap between task management and data
> management within Airflow by integrating enhanced data awareness features.
> This evolution unlocks Airflow's ability to make informed orchestration
> decisions based on actual data that is produced/manipulated by Airflow and
> provide actionable insights about the data as it moves through workflows,
> ultimately improving data reliability and data quality.
>
> Key highlights of the proposal include:
>
>    - *Introducing Assets:* Redefining datasets as assets, allowing for more
>    comprehensive data management and better alignment with modern data
>    engineering practices.
>    - *Progressive Adoptability:* Ensuring that enhancements can be
>    integrated incrementally without disrupting existing workflows.
>    - *Handling Incremental Load Strategies:* Providing first-class support
>    for incremental processes to provide visibility on data freshness, set
>    the stage for targeted backfills, and ultimately improve data reliability.
>
> For more details, please refer to the attached document. I am eager to hear
> your thoughts and feedback on this proposal, as well as any suggestions for
> improvement. We will follow up with a set of formal AIPs.
>
> Constance
> --
>
> Constance Martineau
>
> Senior Product Manager
>
> Email: [email protected]
>
> Time zone: US Eastern (EST UTC-5 / EDT UTC-4)
>
