Subham-KRLX opened a new issue, #65125:
URL: https://github.com/apache/airflow/issues/65125

   ### Under which category would you file this issue?
   
   Airflow Core
   
   ### Apache Airflow version
   
   3.2.0.dev0
   
   ### What happened and how to reproduce it?
   
   In Airflow 3, logical_date is nullable for many run types. The scheduler's auto-pausing logic in dagrun.py relies on .order_by(DagRun.logical_date.desc()). Where that ordering places NULLs depends on the database backend, the order among the NULL rows themselves is unspecified, and the query does not separate manual from scheduled runs.
   
   Steps to Reproduce:
   
   1. Set max_consecutive_failed_dag_runs=3.
   2. A healthy scheduled run succeeds (Run A).
   3. Three manual test runs fail with logical_date=None (Runs B, C, D).
   
   On Postgres/MySQL, order_by(logical_date.desc()) places NULLs differently (PostgreSQL sorts NULLs first under DESC by default, MySQL sorts them last), and the order among the NULL rows is unspecified. Where NULLs sort last, the old success (Run A) is returned in the "top 3," preventing the auto-pause.
   In other cases, manual test failures "pollute" the count and pause the production schedule unnecessarily.
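
The ordering problem in the steps above can be illustrated outside Airflow. The following is a minimal sketch using Python's built-in sqlite3 module; the dag_run table here is a hypothetical, simplified stand-in for the real model, not Airflow's schema. SQLite sorts NULLs last under DESC, so the old success lands in the "top 3" and the all-failed check does not fire.

```python
import sqlite3

# Hypothetical, simplified stand-in for the dag_run table (not Airflow's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    " id INTEGER PRIMARY KEY,"
    " run_type TEXT NOT NULL,"
    " state TEXT NOT NULL,"
    " logical_date TEXT)"  # nullable, as in Airflow 3
)
conn.executemany(
    "INSERT INTO dag_run (id, run_type, state, logical_date) VALUES (?, ?, ?, ?)",
    [
        (1, "scheduled", "success", "2025-01-01T00:00:00"),  # Run A
        (2, "manual", "failed", None),  # Run B
        (3, "manual", "failed", None),  # Run C
        (4, "manual", "failed", None),  # Run D
    ],
)

# Same shape as the ordering described in this issue: logical_date DESC only.
top3 = conn.execute(
    "SELECT id, state FROM dag_run ORDER BY logical_date DESC LIMIT 3"
).fetchall()

# SQLite treats NULL as smaller than any value, so Run A sorts first under DESC.
all_failed = all(state == "failed" for _id, state in top3)
print(top3[0], all_failed)
```

On a backend that sorts NULLs first under DESC, the same query would instead return only the three manual failures, which corresponds to the "pollution" case.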
   
   ### What you think should happen instead?
   
   - Deterministic Ordering: The scheduler should use a stable ordering mechanism that accounts for nullable logical_date by using order_by(DagRun.logical_date.desc().nulls_last(), DagRun.id.desc()) or prioritizing the run_after column.
   - RunType Isolation: Evaluation of consecutive failures should be isolated by run_type (e.g., only scheduled runs should trigger a production auto-pause) to prevent manual test failures from impacting automated schedules.
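
Both proposals can be sketched together. This is a rough illustration only, again using the built-in sqlite3 module and a hypothetical simplified dag_run table. The helper name last_n_runs_failed is invented for this example and is not the Airflow function, and the sketch assumes run_after is always populated.

```python
import sqlite3


def last_n_runs_failed(conn, run_type, n):
    """Hypothetical helper: True if the n most recent runs of run_type all failed."""
    rows = conn.execute(
        "SELECT state FROM dag_run"
        " WHERE run_type = ?"                # isolate by run type
        " ORDER BY run_after DESC, id DESC"  # non-null column plus id tie-breaker
        " LIMIT ?",
        (run_type, n),
    ).fetchall()
    # Require a full window of n runs before reporting consecutive failures.
    return len(rows) == n and all(state == "failed" for (state,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    " id INTEGER PRIMARY KEY,"
    " run_type TEXT NOT NULL,"
    " state TEXT NOT NULL,"
    " logical_date TEXT,"        # nullable
    " run_after TEXT NOT NULL)"  # assumed always set
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?, ?, ?)",
    [
        (1, "scheduled", "success", "2025-01-01T00:00:00", "2025-01-01T00:00:00"),
        (2, "manual", "failed", None, "2025-01-01T01:00:00"),
        (3, "manual", "failed", None, "2025-01-01T02:00:00"),
        (4, "manual", "failed", None, "2025-01-01T03:00:00"),
    ],
)

scheduled_all_failed = last_n_runs_failed(conn, "scheduled", 3)
manual_all_failed = last_n_runs_failed(conn, "manual", 3)
print(scheduled_all_failed, manual_all_failed)
```

With the run history from the reproduction steps, the scheduled check stays false because the manual failures are excluded, while the manual runs are still evaluated in a stable order.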
   
   
   The DagRun model in Airflow 3 was updated to make logical_date optional, but the logic in _check_last_n_dagruns_failed in airflow/models/dagrun.py was not updated to handle this change. Specifically, the query at line 868 still uses .order_by(DagRun.logical_date.desc()), a regression that was missed when similar fixes were applied in PR #47301. It incorrectly treats all run types as a single chronological sequence and uses an unstable sort on a nullable column.
   
   
   
   ### Operating System
   
   macOS
   
   ### Deployment
   
   Virtualenv installation
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   N/A
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   N/A
   
   ### Helm Chart configuration
   
   N/A
   
   ### Docker Image customizations
   
   N/A
   
   ### Anything else?
   
   This issue was identified while researching the impact of nullable 
logical_date on core scheduler stability in Airflow 3. It appears to be a 
regression that was missed when similar fixes for nullable dates were applied 
elsewhere (such as in PR #47301 for get_previous_scheduled_dagrun). This 
problem occurs every time manual and scheduled runs are mixed in the history of 
a DAG with auto-pausing enabled.
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

