mahirhiro opened a new issue, #64765:
URL: https://github.com/apache/airflow/issues/64765

   ## Bug
   
   This is a follow-up to #29076, which fixed the missing 
`dagrun.duration.failed`
   metric emission on `dagrun_timeout`. That fix added the `Stats.timing` call 
to
   the timeout path in `scheduler_job_runner.py` but hardcoded:
   
       tags={"dag_id": dag_run.dag_id}
   
   instead of using `dag_run.stats_tags`.
   
   The `stats_tags` property on `DagRun` (dagrun.py:420) is defined as:
   
       {"dag_id": self.dag_id, "run_type": self.run_type}
   
   The normal finish path (dagrun.py:1648) correctly uses `self.stats_tags`,
   so `run_type` is present on all non-timeout failures but always absent
   from timeout-caused failures.
   
   ## Impact
   
   It is impossible to filter `dagrun.duration.failed` by `run_type` in
   monitoring queries when the failure was caused by `dagrun_timeout`. This
   affects monitor accuracy for teams that want to alert only on scheduled
   run failures.
   
   ## Fix
   
   One-line change in `task-sdk/src/airflow/jobs/scheduler_job_runner.py`
   around line 2011:
   
   ```python
   # Before
   tags={"dag_id": dag_run.dag_id}
   
   # After
   tags=dag_run.stats_tags
   ```
   
   ## Version
   
   Confirmed present in 3.1.7. Likely affects all versions since #29076
   was merged (2.5.2+).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to