MichaelRBlack opened a new pull request, #64703: URL: https://github.com/apache/airflow/pull/64703
## Summary

Fixes #64690: task-level OTel metrics (`ti.finish`, `ti.start`) are silently dropped in forked task subprocesses (LocalExecutor, CeleryExecutor).

### Root cause

`stats.py` correctly detects PID mismatches after fork and re-initializes the Stats instance by calling `get_otel_logger()`. This creates a fresh `MeterProvider` and calls `metrics.set_meter_provider()`.

However, the OTel Python SDK uses a `Once()` guard on `set_meter_provider()` that only allows it to be called once per process. The `Once._done = True` flag from the parent survives `fork()`, so the child's `set_meter_provider()` silently fails with:

```
WARNING - Overriding of current MeterProvider is not allowed
```

The child ends up using the parent's stale `MeterProvider` whose `PeriodicExportingMetricReader` background thread is dead after fork.

### Fix

Reset the OTel SDK's `_METER_PROVIDER_SET_ONCE._done` and `_METER_PROVIDER` in `get_otel_logger()` before calling `set_meter_provider()`. Since `get_otel_logger()` always intends to create and register a new provider, this is safe:

- **First call** (no fork): `_done` is already `False`, so the reset is a no-op.
- **Re-init after fork**: `_done` is `True` (inherited from parent), so the reset allows the new provider to be registered.
### Changes

- `shared/observability/src/.../otel_logger.py`: reset `Once()` guard before `set_meter_provider()`
- `shared/observability/tests/.../test_otel_logger.py`: add `test_reinit_after_fork_exports_metrics` that calls `get_otel_logger()` twice and verifies metrics from the second initialization are exported

### Test plan

- [ ] Existing `test_atexit_flush_on_process_exit` continues to pass (single init path unchanged)
- [ ] New `test_reinit_after_fork_exports_metrics` passes: verifies that calling `get_otel_logger()` twice (simulating post-fork re-init) correctly exports metrics from the second provider
- [ ] Manual verification: with LocalExecutor or CeleryExecutor and `otel_on = True`, `ti.finish` metrics appear in the OTel collector after task completion
