amarhacks commented on issue #65011:
URL: https://github.com/apache/airflow/issues/65011#issuecomment-4224142705

   Thanks for the detailed report — we’re observing the same behavior in 
Airflow 3.1.8 with `GlueJobOperator(deferrable=True)`.
   
   From our investigation, this appears to be a systemic issue with how XComs 
are handled across the deferrable lifecycle (`execute → defer → 
execute_complete → retry`):
   
   * XCom rows created during `execute()` are **not cleared** when the task 
resumes.
   * On resume, `execute_complete()` attempts to **write the same keys again**, 
leading to duplicate key violations.
   * Retries further amplify the issue since the same XCom keys 
(`return_value`, `glue_job_run_details`) are re-inserted.
   
   A few specific observations:
   
   1. `return_value` is auto-pushed by the task runner in both `execute()` and 
`execute_complete()`, which guarantees a collision on resume/retry.
   2. `glue_job_run_details` is written before deferral, and the write is then 
attempted again on subsequent runs. Some of those writes are suppressed, but 
they still contribute to inconsistent behavior.
   3. Airflow explicitly avoids clearing XComs for deferred tasks, which makes 
the current insert-only semantics unsafe for deferrable operators.
   
   This suggests that deferrable operators are currently **not 
XCom-idempotent**, which can lead to failures even when operator logic itself 
is correct.
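
   To make the failure mode concrete, here is a toy model of it. It uses plain `sqlite3`, not Airflow's actual XCom table or code; the schema, the `push_xcom` helper, and the sample IDs are all assumptions for illustration. It only shows why an insert-only write collides when rows from the first phase survive the deferral:

```python
import sqlite3

# Toy stand-in for an XCom table: one row per (dag_id, task_id, run_id, key).
# Hypothetical schema for illustration only, not Airflow's real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE xcom (
        dag_id TEXT, task_id TEXT, run_id TEXT, key TEXT, value TEXT,
        PRIMARY KEY (dag_id, task_id, run_id, key)
    )
    """
)


def push_xcom(key, value):
    """Insert-only write: raises if the key already exists for this task run."""
    conn.execute(
        "INSERT INTO xcom VALUES (?, ?, ?, ?, ?)",
        ("my_dag", "glue_task", "run_1", key, value),
    )


# Phase 1: execute() writes a key, then the task defers (rows are kept).
push_xcom("glue_job_run_details", "jr_123")

# Phase 2: execute_complete() resumes and writes the same key again.
try:
    push_xcom("glue_job_run_details", "jr_123")
    collided = False
except sqlite3.IntegrityError as exc:
    collided = True
    print(f"duplicate key on resume: {exc}")
```

   In this model the second write fails with a uniqueness error, which is the same shape of failure we see on resume and on retry.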
   
   **Expected behavior:**
   
   * XCom writes should be idempotent across deferral/resume boundaries, OR
   * Existing keys should be updated/replaced instead of causing failures, OR
   * The framework should avoid auto-pushing duplicate `return_value` entries 
for resumed executions.
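
   As a rough sketch of the second option (replace on conflict), again using a toy `sqlite3` table rather than anything from Airflow: the `set_xcom` helper and the schema below are hypothetical, and `INSERT OR REPLACE` is just one simple way to get upsert-like behavior in SQLite, not a proposal for the actual implementation.

```python
import sqlite3

# Hypothetical, simplified XCom-like table keyed by (task_id, run_id, key).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE xcom (
        task_id TEXT, run_id TEXT, key TEXT, value TEXT,
        PRIMARY KEY (task_id, run_id, key)
    )
    """
)


def set_xcom(key, value):
    """Idempotent write: an existing row for the same key is replaced."""
    conn.execute(
        "INSERT OR REPLACE INTO xcom VALUES (?, ?, ?, ?)",
        ("glue_task", "run_1", key, value),
    )


set_xcom("return_value", "jr_123")  # written around execute()
set_xcom("return_value", "jr_123")  # resumed execute_complete(): no error
set_xcom("return_value", "jr_456")  # retry with a new value: last write wins

rows = conn.execute("SELECT key, value FROM xcom").fetchall()
print(rows)
```

   With this kind of write, repeating the push across deferral, resume, and retry leaves a single row holding the latest value instead of raising.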
   
   As a temporary workaround, we’ve had to suppress `return_value` pushes or 
avoid deferrable mode, but this is not ideal.
   
   It would be great to get guidance on whether:
   
   * This is a known limitation of the current deferrable execution model, or
   * There are plans to make XCom handling idempotent (e.g., upsert semantics 
or scoped lifecycle per phase).
   
   Happy to help with a minimal reproducible example if needed.
   

