shaleena commented on issue #65011:
URL: https://github.com/apache/airflow/issues/65011#issuecomment-4243649820
I want to add that this is not limited to manual retries / clears.
A lot of our failures were reproducible after clearing the task instance or
retrying the same DAG run, but we also observed cases where the scheduled run
itself failed with the same duplicate-XCom pattern, without us manually
rerunning it first.
So manual retry / clear makes the issue easier to reproduce, but it does not
appear to be the only trigger.
**Workaround behavior**
We tested disabling the default XCom behavior entirely:

- set `do_xcom_push = False`
- avoid returning a value (so no `return_value` XCom is written)
- manually push a custom key instead
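As a minimal sketch of that pattern, assuming a deferrable Glue-style operator: the `StubTI` class below is a stand-in for Airflow's `TaskInstance` so the snippet runs without an Airflow install; in a real operator you would use `context["ti"]` directly, and the key name is just the one we happened to use.

```python
class StubTI:
    """Stand-in for the XCom interface of Airflow's TaskInstance."""

    def __init__(self):
        self.xcoms = {}

    def xcom_push(self, key, value):
        self.xcoms[key] = value


def execute_complete(context, event=None):
    # Pull the run ID out of the trigger event and push it under a
    # custom key, instead of relying on the default return_value XCom.
    run_id = event.get("run_id") if event else None
    if run_id:
        context["ti"].xcom_push(key="glue_run_id_debug", value=run_id)
    # Returning None means no return_value XCom is written for the task.
    return None


ti = StubTI()
execute_complete({"ti": ti}, event={"run_id": "jr_abc123"})
print(ti.xcoms)  # {'glue_run_id_debug': 'jr_abc123'}
```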
In deferrable mode, the stable version was to extract the run ID from the
trigger event in execute_complete() and push it under a custom key:
```python
def execute_complete(self, context, event=None, **kwargs):
    # Extract the run ID from the trigger event and push it under a
    # custom key, rather than relying on the default return_value XCom.
    run_id = event.get("run_id") if event else None
    if run_id:
        context["ti"].xcom_push(key="glue_run_id_debug", value=run_id)
    super().execute_complete(context, event=event, **kwargs)
    # Return None so no return_value XCom is pushed for this task.
    return None
```
With this setup:

- tasks succeed consistently
- clearing / retrying no longer produces 409 errors
- the standard keys `return_value` and `glue_job_run_details` are no longer written in these runs
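One practical consequence of `return_value` no longer being written: downstream tasks have to pull the custom key explicitly. A rough sketch, again with a stubbed `TaskInstance` so it runs without Airflow, and a hypothetical `task_id`:

```python
class StubTI:
    """Stand-in for the XCom-pull side of Airflow's TaskInstance."""

    def __init__(self, store):
        # store maps (task_id, key) -> value, mimicking the XCom table
        self.store = store

    def xcom_pull(self, task_ids, key="return_value"):
        return self.store.get((task_ids, key))


store = {("run_glue_job", "glue_run_id_debug"): "jr_abc123"}
ti = StubTI(store)

# The default pull (key="return_value") now finds nothing:
print(ti.xcom_pull(task_ids="run_glue_job"))  # None

# Pulling the custom key works:
print(ti.xcom_pull(task_ids="run_glue_job", key="glue_run_id_debug"))  # jr_abc123
```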