mahirhiro opened a new issue, #64765:
URL: https://github.com/apache/airflow/issues/64765
## Bug
This is a follow-up to #29076, which fixed the missing
`dagrun.duration.failed`
metric emission on `dagrun_timeout`. That fix added the `Stats.timing` call
to
the timeout path in `scheduler_job_runner.py` but hardcoded:
tags={"dag_id": dag_run.dag_id}
instead of using `dag_run.stats_tags`.
The `stats_tags` property on `DagRun` (dagrun.py:420) is defined as:
{"dag_id": self.dag_id, "run_type": self.run_type}
The normal finish path (dagrun.py:1648) correctly uses `self.stats_tags`,
so `run_type` is present on all non-timeout failures but always absent
from timeout-caused failures.
## Impact
It is impossible to filter `dagrun.duration.failed` by `run_type` in
monitoring queries when the failure was caused by `dagrun_timeout`. This
affects monitor accuracy for teams that want to alert only on scheduled
run failures.
## Fix
One-line change in `task-sdk/src/airflow/jobs/scheduler_job_runner.py`
around line 2011:
```python
# Before
tags={"dag_id": dag_run.dag_id}
# After
tags=dag_run.stats_tags
```
## Version
Confirmed present in 3.1.7. Likely affects all versions since #29076
was merged (2.5.2+).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]