kaxil commented on PR #54383:
URL: https://github.com/apache/airflow/pull/54383#issuecomment-3229256100
> > I just stumbled across this PR. It's quite a massive change, but the description has almost no information/justification/explanation. Can someone provide some context for the goal here? @uranusjr @kaxil
>
> I have not looked at the details (yet) and have not reviewed it in detail (it's huge), but let me explain how I understand this change:
>
> The whole idea here is that we should get rid of airflow.models.DAG, which was used to define Dags. During the serialisation work, when we implemented Airflow 2, we introduced SerializedDag, which we used in places where we only retrieved the Dag from its serialized form. It was (quoting one of my favourite authors, Douglas Adams) "almost, but not quite, entirely unlike DAG". The two had different methods and helper methods: they were almost the same, but different.
>
> There was a class hierarchy (BaseDag) that was supposed to make things easier, but generally speaking it was very difficult to reason about when to use which. It has always been quite a complex and historically "convoluted" part of the Airflow code. Not because it was designed like that, but because it grew out of incremental changes we applied to the original Dag (notably Dag serialization) that made it overly complex.
>
> In the process of moving to the Task SDK, the Dag definition (the one Dag Authors use to create Dags) has been moved to the TaskSDK. So far, so good. However, we were still using the DAG from airflow.models (which actually derived from the TaskSDK's Dag), because airflow.models.DAG was used in many, many places, and a number of methods and properties of airflow.models.DAG should NOT be moved to the TaskSDK Dag because they are simply not needed there. Many of those methods were actually only usable in tests; many were only needed for airflow-core internals. And our goal is to make the TaskSDK as small and lean as possible, so that we expose to Dag Authors only what they **really** need.
>
> This is also a step towards complete "airflow-core" and "task.sdk" separation. There are still a few task-sdk -> airflow.models.Dag references, and they are removed in this PR. The idea (for Airflow 3.1) is that the dependency goes only one way: airflow-core uses task.sdk, but task.sdk does NOT use airflow.
>
> If I understand correctly, after this step is completed, airflow-core (scheduler, Triggerer) should generally only use SerializedDag. The DagProcessor (for the parsing part) will use TaskSDKDag to produce SerializedDag (and store it in the database).

Yup, that's exactly right
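To make the one-way flow described above concrete, here is a minimal Python sketch. All names in it (`SdkDag`, `serialize`, `CoreSerializedDag`, `roots`) are hypothetical stand-ins, not Airflow's real API, and real Dag serialization carries far more (operators, schedules, params). It only shows the shape of the split: authors build a lean SDK-side Dag, the DagProcessor turns it into a serialized form, and core components rebuild their own Dag from that form without ever importing the SDK class.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


# --- "task-sdk" side: what Dag authors use (hypothetical, deliberately minimal) ---
@dataclass
class SdkDag:
    """Hypothetical author-facing Dag: only what Dag authors need."""
    dag_id: str
    tasks: list[str] = field(default_factory=list)
    dependencies: list[tuple[str, str]] = field(default_factory=list)

    def add_task(self, task_id: str, upstream: str | None = None) -> None:
        self.tasks.append(task_id)
        if upstream is not None:
            self.dependencies.append((upstream, task_id))


# --- DagProcessor: turns the SDK Dag into a serialized form for the database ---
def serialize(dag: SdkDag) -> str:
    """Hypothetical parse step: SDK Dag -> JSON payload."""
    return json.dumps({
        "dag_id": dag.dag_id,
        "tasks": dag.tasks,
        "dependencies": [list(edge) for edge in dag.dependencies],
    })


# --- airflow-core side (scheduler, triggerer): never imports SdkDag ---
@dataclass(frozen=True)
class CoreSerializedDag:
    """Hypothetical core-side Dag rebuilt purely from the serialized payload."""
    dag_id: str
    tasks: tuple[str, ...]
    dependencies: tuple[tuple[str, str], ...]

    @classmethod
    def from_json(cls, payload: str) -> CoreSerializedDag:
        data = json.loads(payload)
        return cls(
            dag_id=data["dag_id"],
            tasks=tuple(data["tasks"]),
            dependencies=tuple((up, down) for up, down in data["dependencies"]),
        )

    def roots(self) -> list[str]:
        """Core-only helper (lives here, not on SdkDag): tasks with no upstream."""
        downstream = {down for _, down in self.dependencies}
        return [t for t in self.tasks if t not in downstream]


# Usage: author code -> DagProcessor -> scheduler-side object
dag = SdkDag("example")
dag.add_task("extract")
dag.add_task("load", upstream="extract")
stored = serialize(dag)
core_dag = CoreSerializedDag.from_json(stored)
print(core_dag.roots())  # ['extract']
```

The design point mirrored here is that helpers only the core needs (like `roots`) sit on the core-side class, which keeps the author-facing class small and means nothing in the SDK half imports the core half.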
