aurangzaib048 opened a new pull request, #64770:
URL: https://github.com/apache/airflow/pull/64770
When `EmrContainerOperator` runs in deferrable mode and the trigger
times out or the task is killed, the EMR job keeps running on the
cluster. This leads to orphaned jobs consuming resources and duplicate
executions on retry.
This PR adds cancel-on-kill support to `EmrContainerTrigger` following
the same proven pattern as `EmrServerlessStartJobTrigger` (PR #51883):
- Override `run()` in `EmrContainerTrigger` to catch
`asyncio.CancelledError` and cancel the EMR job via
`hook.stop_query()` when safe to do so
- Add a `safe_to_cancel()` check to distinguish user-initiated kills
from triggerer restarts, so that a job is not cancelled merely
because the triggerer restarted
- Add `cancel_on_kill` parameter (default `True`) for opt-out
- Update `EmrContainerOperator.execute_complete()` to cancel the job
when the trigger reports a failure/timeout event
- All cancellation paths are wrapped in `try`/`except` to ensure proper
error propagation (`CancelledError` is always re-raised, and the original
`AirflowException` is preserved)
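
The control flow described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `FakeHook`, `SketchTrigger`, `kill_trigger`, `execute_complete_sketch`, and the event dictionary shape are hypothetical stand-ins, the `safe_to_cancel()` body is reduced to a fixed flag, and `RuntimeError` stands in for `AirflowException` so the example runs without Airflow or AWS.

```python
import asyncio


class FakeHook:
    """Hypothetical stand-in for the EMR hook; records job IDs instead of calling AWS."""

    def __init__(self):
        self.stopped = []

    def stop_query(self, job_id):
        self.stopped.append(job_id)


class SketchTrigger:
    """Hypothetical trigger showing only the cancel-on-kill control flow."""

    def __init__(self, hook, job_id, cancel_on_kill=True, safe=True):
        self.hook = hook
        self.job_id = job_id
        self.cancel_on_kill = cancel_on_kill
        self._safe = safe

    async def safe_to_cancel(self):
        # Assumption: the real check decides whether this is a user kill or a
        # triggerer restart; here it is just a fixed flag.
        return self._safe

    async def run(self):
        try:
            await asyncio.sleep(3600)  # placeholder for waiting on the job
            yield {"status": "success", "job_id": self.job_id}
        except asyncio.CancelledError:
            if self.cancel_on_kill:
                try:
                    if await self.safe_to_cancel():
                        self.hook.stop_query(self.job_id)
                except Exception:
                    pass  # a failed cleanup must not hide the cancellation
            raise  # CancelledError is always re-raised


def execute_complete_sketch(hook, event):
    """Cancel the job on a non-success event, then raise the original failure."""
    if event["status"] != "success":
        try:
            hook.stop_query(event["job_id"])
        except Exception:
            pass  # keep the original failure as the error that surfaces
        raise RuntimeError(f"Job did not succeed: {event}")
    return event["job_id"]


async def kill_trigger(trigger):
    """Start the trigger, cancel it, and report whether CancelledError propagated."""

    async def consume():
        async for _ in trigger.run():
            pass

    task = asyncio.create_task(consume())
    await asyncio.sleep(0.01)  # let the trigger reach its wait
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False


if __name__ == "__main__":
    hook = FakeHook()
    print(asyncio.run(kill_trigger(SketchTrigger(hook, "job-1"))), hook.stopped)
```

In this sketch, the job is stopped only when `cancel_on_kill` is enabled and `safe_to_cancel()` returns true; in every case the `CancelledError` still reaches the caller.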
closes: #60517
---
##### Was generative AI tooling used to co-author this PR?
- [ ] No