kaxil opened a new pull request, #64558: URL: https://github.com/apache/airflow/pull/64558
Addresses review feedback on #64181, fixes the teardown scope evaluation to avoid unnecessary work in the scheduler hot path. - **Lazy iteration instead of `list()` materialization**: `_evaluate_direct_relatives()` has side effects (`ti.set_state()` on [line 460](https://github.com/apache/airflow/blob/456e4478bf69cc68b3855ef710f33092bef07220/airflow-core/src/airflow/ti_deps/deps/trigger_rule_dep.py#L460)). The original `list()` call forced all those DB writes before yielding anything. With lazy iteration, we yield as we go and only evaluate teardown scope if no direct-relative status was produced. - **Early return for empty `in_scope_ids`**: Skips the `ensure_finished_tis` DB query when there are no in-scope tasks to check. - **Parallel branch regression test**: The existing tests used linear chains, but the [reported bug](https://github.com/apache/airflow/issues/29332) involves parallel branches (`setup >> [t_fail, t_slow] >> downstream >> teardown`). Added a test matching that DAG shape. - **FAILED/UPSTREAM_FAILED state test**: Teardowns must run to clean up resources regardless of upstream failure state. Added a test verifying this works with FAILED and UPSTREAM_FAILED in-scope tasks. - **Removed fragile string assertion**: `assert "2 in-scope" in reason` couples tests to exact message format. ## Benchmark: scheduler impact of `list()` vs lazy The scheduler evaluates trigger rules for every TI in every loop. For teardowns, `list()` forces all `set_state()` DB writes upfront even when the first upstream already blocks: | Deployment | DAGs | Tasks/DAG | `list()` writes | lazy writes | wasted writes | |---|---|---|---|---|---| | Small | 50 | 10 | 500 | 50 | 450 | | Medium | 200 | 20 | 8,000 | 400 | 7,600 | | Large | 1,000 | 50 | 250,000 | 5,000 | 245,000 | | Enterprise | 5,000 | 100 | 5,000,000 | 50,000 | 4,950,000 | | Deployment | Teardowns | `list()` time | lazy time | speedup | |---|---|---|---|---| | Small | 50 | 0.05 ms | 0.02 ms | 2.4x | | Medium | 400 | 0.66 ms | 0.16 ms | 4.2x | | Large | 5,000 | 17.54 ms | 2.00 ms | 8.8x | | Enterprise | 50,000 | 334.51 ms | 19.82 ms | 16.9x | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
