mobuchowski opened a new pull request, #64843:
URL: https://github.com/apache/airflow/pull/64843

   `emit_lineage_from_sql_extras` called `hook.get_connection()` twice per SQL 
extra: once in `_resolve_namespace` and once inside 
`get_openlineage_facets_with_sql`. For N extras from the same hook (common when 
a task runs multiple SQL statements) this is 2N round-trips that all return the 
same connection object, so all but the first are redundant. Each call hits 
SecretsManager (miss) and then the Airflow API server, making these lookups the 
dominant cost of SQL hook lineage processing.
   
   Fix: build a `conn_id → hook` mapping before the loop, then define three 
`@functools.cache`-decorated local closures keyed by `conn_id`:
   - `_get_connection` — one `hook.get_connection()` call per unique `conn_id`
   - `_get_database_info` — derived from `_get_connection`, cached separately
   - `_get_namespace` — derived from `_get_database_info`, cached separately
   
   Each concern is cached independently via `functools.cache`; no manual dict, 
no tuple packing. `hook.get_connection()` fires exactly once per unique 
`conn_id` regardless of how many SQL extras share it.
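
A minimal sketch of the caching pattern described above, not the code from this PR. `FakeHook`, `resolve_namespaces`, the dict-shaped connection, and the `postgres://` namespace format are hypothetical stand-ins chosen for illustration; only the closure names and the use of `functools.cache` come from the description.

```python
import functools


class FakeHook:
    """Hypothetical stand-in for a SQL hook; counts connection lookups."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.lookups = 0

    def get_connection(self, conn_id):
        self.lookups += 1  # stands in for the expensive round-trip
        return {"conn_id": conn_id, "host": "db.example.com", "port": 5432}


def resolve_namespaces(sql_extras):
    """Return one namespace per (hook, sql) pair, fetching each connection once."""
    # Build the conn_id -> hook mapping before the loop.
    hooks_by_conn_id = {hook.conn_id: hook for hook, _sql in sql_extras}

    @functools.cache
    def _get_connection(conn_id):
        return hooks_by_conn_id[conn_id].get_connection(conn_id)

    @functools.cache
    def _get_database_info(conn_id):
        conn = _get_connection(conn_id)
        return (conn["host"], conn["port"])

    @functools.cache
    def _get_namespace(conn_id):
        host, port = _get_database_info(conn_id)
        return f"postgres://{host}:{port}"  # assumed format, illustration only

    return [_get_namespace(hook.conn_id) for hook, _sql in sql_extras]


hook = FakeHook("warehouse")
namespaces = resolve_namespaces(
    [(hook, "SELECT 1"), (hook, "SELECT 2"), (hook, "SELECT 3")]
)
```

Because the closures are defined inside the function, their caches live only for the duration of one call and are discarded when it returns, so nothing is shared between invocations.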
   
   Also includes two related fixes:
   - `extractors/manager.py`: wrap `get_hook_lineage()` in try/except so an 
exception in SQL extras processing can't silently suppress the task-level 
COMPLETE event via `@print_warning`
   - `plugins/listener.py`: call `logging.shutdown()` before `os._exit(0)` in 
the fork child so buffered log messages (including failure warnings) are 
flushed before exit
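
The two fixes above can be sketched as follows. This is an illustration under stated assumptions, not the provider code: `get_hook_lineage` here is a hypothetical function that always fails, `extract_metadata` and the dict it returns are invented for the example, and a `MemoryHandler` stands in for whatever buffering the real logging setup does. `os._exit()` skips `atexit` handlers, which is where `logging` normally flushes, so the child has to call `logging.shutdown()` itself.

```python
import io
import logging
import logging.handlers

# A buffering handler: records stay in memory until flushed or shut down.
stream = io.StringIO()
target = logging.StreamHandler(stream)
buffered = logging.handlers.MemoryHandler(
    capacity=100, flushLevel=logging.CRITICAL, target=target
)

log = logging.getLogger("lineage_sketch")
log.addHandler(buffered)
log.propagate = False


def get_hook_lineage():
    """Hypothetical stand-in that fails, as SQL extras processing might."""
    raise RuntimeError("connection lookup failed")


def extract_metadata():
    """Build task-level metadata; a hook lineage failure must not abort it."""
    metadata = {"event": "COMPLETE"}
    try:
        metadata["hook_lineage"] = get_hook_lineage()
    except Exception:
        log.warning("Hook lineage extraction failed", exc_info=True)
    return metadata


metadata = extract_metadata()
before = stream.getvalue()  # the warning is still sitting in the buffer

# In the fork child this call would be followed directly by os._exit(0).
logging.shutdown()
after = stream.getvalue()  # the warning has now reached the stream
```

Without the explicit `logging.shutdown()`, a child that ends with `os._exit(0)` would drop the buffered warning, which is exactly the message needed to diagnose the failure.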
   
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (please specify the tool below)
   
   Generated-by: Claude Code following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)

