MichaelRBlack opened a new pull request, #64703:
URL: https://github.com/apache/airflow/pull/64703

   ## Summary
   
   Fixes #64690 — task-level OTel metrics (`ti.finish`, `ti.start`) are 
silently dropped in forked task subprocesses (LocalExecutor, CeleryExecutor).
   
   ### Root cause
   
   `stats.py` correctly detects PID mismatches after fork and re-initializes 
the Stats instance by calling `get_otel_logger()`. This creates a fresh 
`MeterProvider` and calls `metrics.set_meter_provider()`.
   
   However, the OTel Python SDK wraps `set_meter_provider()` in a `Once()` guard, so only the first call in a process takes effect. Because `fork()` copies the parent's memory, the child inherits the parent's `Once._done = True` flag. The child's `set_meter_provider()` call is therefore ignored, and the only sign of it is a warning:
   
   ```
   WARNING - Overriding of current MeterProvider is not allowed
   ```
   
   The child ends up using the parent's stale `MeterProvider` whose 
`PeriodicExportingMetricReader` background thread is dead after fork.
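Both halves of the root cause are plain `fork()` semantics and can be reproduced with the standard library alone. The sketch below is illustrative and does not use the OTel SDK. `_provider_set_once_done` is a hypothetical stand-in for `Once._done`, and `_export_loop` is a hypothetical stand-in for the `PeriodicExportingMetricReader` export thread. The parent sets the flag and starts the thread, then forks. The child reports back through its exit status whether it still sees the flag and whether the thread is alive. This requires a POSIX system, because it uses `os.fork()`.

```python
import os
import threading
from typing import Tuple

# Hypothetical stand-in for the SDK's "provider already set" flag (Once._done).
_provider_set_once_done = False


def _export_loop(stop: threading.Event) -> None:
    # Hypothetical stand-in for PeriodicExportingMetricReader's export loop.
    while not stop.wait(0.05):
        pass


def inspect_child_state() -> Tuple[bool, bool]:
    """Fork and return (flag_inherited, exporter_alive) as observed in the child."""
    global _provider_set_once_done
    _provider_set_once_done = True  # the parent "registers" its provider
    stop = threading.Event()
    exporter = threading.Thread(target=_export_loop, args=(stop,), daemon=True)
    exporter.start()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: encode both observations in the exit status, then leave
            # immediately without running the parent's cleanup code.
            code = (1 if _provider_set_once_done else 0) | (2 if exporter.is_alive() else 0)
            os._exit(code)
        _, status = os.waitpid(pid, 0)
        code = os.WEXITSTATUS(status)
        return bool(code & 1), bool(code & 2)
    finally:
        # Only the parent reaches this point; it stops and joins its exporter.
        stop.set()
        exporter.join()
```

On CPython, `inspect_child_state()` should return `(True, False)`. The flag is copied into the child, but only the thread that called `fork()` exists there, so the exporter thread does not exist in the child.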
   
   ### Fix
   
   Reset the OTel SDK's `_METER_PROVIDER_SET_ONCE._done` and `_METER_PROVIDER` 
in `get_otel_logger()` before calling `set_meter_provider()`. Since 
`get_otel_logger()` always intends to create and register a new provider, this 
is safe:
   
   - **First call** (no fork): `_done` is already `False`, so the reset is a 
no-op.
   - **Re-init after fork**: `_done` is `True` (inherited from parent), so the 
reset allows the new provider to be registered.
   
   ### Changes
   
   - `shared/observability/src/.../otel_logger.py` — reset `Once()` guard 
before `set_meter_provider()`
   - `shared/observability/tests/.../test_otel_logger.py` — add 
`test_reinit_after_fork_exports_metrics` that calls `get_otel_logger()` twice 
and verifies metrics from the second initialization are exported
   
   ### Test plan
   
   - [ ] Existing `test_atexit_flush_on_process_exit` continues to pass (single 
init path unchanged)
   - [ ] New `test_reinit_after_fork_exports_metrics` passes — verifies that 
calling `get_otel_logger()` twice (simulating post-fork re-init) correctly 
exports metrics from the second provider
   - [ ] Manual verification: with LocalExecutor or CeleryExecutor and `[metrics] otel_on = True`, `ti.finish` metrics appear in the OTel collector after task completion

