skymensch opened a new pull request, #64743:
URL: https://github.com/apache/airflow/pull/64743

   When a task pushes a large XCom payload (e.g. 300 MB), the supervisor's 
single-threaded event loop is blocked on the synchronous HTTP POST to the API 
server. During that time no heartbeats can be sent, and the scheduler 
eventually marks the task instance as failed with a heartbeat timeout — even 
though the task itself is still running successfully.
   
   **Root cause:** `ActivitySubprocess._handle_request()` calls 
`self.client.xcoms.set()` synchronously. Because the supervisor uses a 
`selectors`-based event loop, any blocking call inside a handler stalls the 
entire loop, including `_send_heartbeat_if_needed()`.
   
   **Fix:** Offload the `SetXCom` API call to a single-worker 
`ThreadPoolExecutor`. The handler submits the call to the executor and returns 
immediately, so the event loop keeps ticking and heartbeats continue 
uninterrupted. A new `_drain_pending_requests()` helper runs on every loop 
iteration; it inspects completed futures and sends the appropriate response 
(or error) back to the task process.
   
   - `max_workers=1` preserves ordering of concurrent XCom writes from the same 
task.
   - `httpx.Client` is thread-safe, so sharing the existing client with the 
worker thread is safe.
   - On process cleanup `shutdown(wait=False)` discards any in-flight upload 
because the task process is already gone.
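
   The submit-then-drain pattern can be sketched as follows. This is an 
illustrative standalone example, not the actual Airflow code: the class and 
method names (`Supervisor`, `handle_set_xcom`, `drain_pending_requests`) and 
the fake client are placeholders for the real supervisor machinery.

   ```python
   import time
   from concurrent.futures import ThreadPoolExecutor


   class FakeXComClient:
       """Stands in for the real API client; sleeps to simulate a slow upload."""

       def set(self, key, value):
           time.sleep(0.05)  # simulate the blocking HTTP POST
           return {"ok": True, "key": key}


   class Supervisor:
       def __init__(self):
           self.client = FakeXComClient()
           # max_workers=1 preserves ordering of XCom writes from the same task
           self.executor = ThreadPoolExecutor(max_workers=1)
           self.pending = []    # (future, request_id) pairs awaiting completion
           self.responses = []  # stands in for messages sent back to the task
           self.heartbeats = 0

       def handle_set_xcom(self, request_id, key, value):
           # Submit and return immediately so the event loop keeps ticking.
           fut = self.executor.submit(self.client.set, key, value)
           self.pending.append((fut, request_id))

       def drain_pending_requests(self):
           # Called every loop iteration: reply for completed futures,
           # keep the rest pending.
           still_pending = []
           for fut, request_id in self.pending:
               if fut.done():
                   err = fut.exception()
                   if err is not None:
                       self.responses.append((request_id, {"error": str(err)}))
                   else:
                       self.responses.append((request_id, fut.result()))
               else:
                   still_pending.append((fut, request_id))
           self.pending = still_pending

       def run_loop_once(self):
           self.heartbeats += 1  # the heartbeat is never starved by the upload
           self.drain_pending_requests()


   if __name__ == "__main__":
       sup = Supervisor()
       sup.handle_set_xcom(1, "result", "x" * 1024)
       while sup.pending:  # the loop keeps ticking during the upload
           sup.run_loop_once()
           time.sleep(0.01)
       sup.executor.shutdown(wait=False)
       print(sup.responses)
   ```

   Because the worker thread only runs the HTTP call, the event loop thread 
remains the sole writer of the response channel, which keeps the change small 
and avoids extra locking.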
   
   closes: #64628
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (claude-sonnet-4-6)
   
   Generated-by: Claude Code (claude-sonnet-4-6) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)

