MichaelRBlack opened a new pull request, #64691:
URL: https://github.com/apache/airflow/pull/64691

   ## Summary
   
   Task-level OTel metrics (e.g. `ti.finish`) are silently dropped in forked 
task subprocesses because the OTel Python SDK's `Once()` guard on 
`set_meter_provider()` survives `fork()`.
   
   **Root cause:** `stats.py` correctly detects PID mismatches after fork and 
calls `otel_logger.get_otel_logger()` to re-initialize. This creates a fresh 
`MeterProvider` and calls `metrics.set_meter_provider()`, but the SDK's 
`_METER_PROVIDER_SET_ONCE._done = True` flag inherited from the parent blocks 
the call. The child is left with the parent's stale provider, whose 
`PeriodicExportingMetricReader` export thread did not survive the fork 
(`fork()` copies only the calling thread), so nothing is ever exported.
   
   **Fix:** Reset the SDK's provider state in `get_otel_logger()` before 
calling `set_meter_provider()`. Since `stats.py` only calls the factory after 
detecting a PID mismatch, this reset only runs in forked children that need a 
fresh provider.
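
A minimal sketch of the reset-then-set idea, again using a stdlib-only stand-in rather than the real SDK. The description above does not spell out how the provider state is reset, so `FakeSdk`, `reset_provider_state`, and the `get_logger` factory below are hypothetical and only illustrate one possible approach.

```python
import threading


class Once:
    """Stand-in run-once guard, modelled as a flag plus a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False

    def do_once(self, func):
        if self._done:
            return False
        with self._lock:
            if not self._done:
                func()
                self._done = True
                return True
        return False


class FakeSdk:
    """Hypothetical stand-in for the SDK's module-level provider state."""

    def __init__(self):
        self.set_once = Once()
        self.provider = None

    def set_provider(self, provider):
        def _set():
            self.provider = provider

        return self.set_once.do_once(_set)


def reset_provider_state(sdk):
    """Swap in a fresh guard and drop the stale provider."""
    sdk.set_once = Once()
    sdk.provider = None


def get_logger(sdk, provider):
    """Hypothetical factory: reset the inherited state, then install."""
    reset_provider_state(sdk)
    return sdk.set_provider(provider)


sdk = FakeSdk()
sdk.set_provider("parent-provider")  # parent initializes once
# A forked child would inherit set_once._done == True at this point.
installed = get_logger(sdk, "child-provider")
```

In this sketch the guard object is replaced rather than having its flag flipped back, which also gives the child a fresh lock; whether the actual change does the same is not stated here.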
   
   Closes #64690
   
   ## Test plan
   - [x] Added unit test that simulates `Once._done = True` (forked child 
state) and verifies `get_otel_logger()` successfully sets a new `MeterProvider`
   - [ ] Manual: Deploy and confirm `ti.finish` metrics appear in Grafana
   - [ ] Manual: Confirm "Overriding of current MeterProvider is not allowed" 
warning no longer appears in task logs
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
