[PR] BigQuery: Reuse table from refresh during commit to reduce API calls [iceberg]

via GitHub Sat, 27 Dec 2025 23:20:06 -0800


joyhaldar opened a new pull request, #14940:
URL: https://github.com/apache/iceberg/pull/14940


   The current commit path loads the BigQuery table twice:
   1. During table refresh, to get the metadata location
   2. During commit, to get the ETag for the update call
   
   This change stores the table from the refresh step and reuses it during 
commit, eliminating the redundant load. Concurrent modification detection 
remains intact via [ETag-based optimistic 
locking](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/patch) 
in the BigQuery API.
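
   The reuse described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `EtagReuseSketch`, `FakeClient`, `Operations`, and the `metadata/v1.json` paths are hypothetical names, and the in-memory fake only stands in for the tables.get and tables.patch calls so that they can be counted.

```java
import java.util.ConcurrentModificationException;

// Hypothetical sketch: none of these names come from the Iceberg or BigQuery code.
final class EtagReuseSketch {

  /** Minimal stand-in for a BigQuery table resource: a metadata pointer plus an ETag. */
  static final class Table {
    final String metadataLocation;
    final String etag;

    Table(String metadataLocation, String etag) {
      this.metadataLocation = metadataLocation;
      this.etag = etag;
    }
  }

  /** In-memory fake of the get and patch calls that counts invocations. */
  static final class FakeClient {
    private Table stored = new Table("metadata/v1.json", "etag-1");
    private int version = 1;
    int getCalls = 0;
    int patchCalls = 0;

    Table get() {
      getCalls++;
      return stored;
    }

    // Applies the update only if the caller's ETag matches the stored one.
    Table patch(String newMetadataLocation, String expectedEtag) {
      patchCalls++;
      if (!stored.etag.equals(expectedEtag)) {
        throw new ConcurrentModificationException("ETag mismatch: table changed since refresh");
      }
      version++;
      stored = new Table(newMetadataLocation, "etag-" + version);
      return stored;
    }
  }

  /** Keeps the table loaded by refresh() so that commit() needs no second get. */
  static final class Operations {
    private final FakeClient client;
    private Table refreshed;

    Operations(FakeClient client) {
      this.client = client;
    }

    String refresh() {
      refreshed = client.get();
      return refreshed.metadataLocation;
    }

    void commit(String newMetadataLocation) {
      if (refreshed == null) {
        throw new IllegalStateException("refresh() must run before commit()");
      }
      // Reuse the ETag captured at refresh time instead of calling get() again.
      refreshed = client.patch(newMetadataLocation, refreshed.etag);
    }
  }

  public static void main(String[] args) {
    FakeClient client = new FakeClient();
    Operations ops = new Operations(client);
    ops.refresh();
    ops.commit("metadata/v2.json");
    System.out.println("get calls: " + client.getCalls + ", patch calls: " + client.patchCalls);
  }
}
```

   In this sketch one refresh followed by one commit issues a single get and a single patch, which mirrors the "After" column of the table below.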
   
   BigQuery API calls per commit:
   | Before | After |
   |--------|-------|
   | `doRefresh` → tables.get | `doRefresh` → tables.get |
   | `updateTable` → tables.get | (reuses table from refresh) |
   | `updateTable` → tables.patch | `updateTable` → tables.patch |
   
   This improves commit latency and reduces [tables.get quota 
consumption](https://cloud.google.com/bigquery/quotas#api_request_quotas).
   
   **Changes:**
   - Store table loaded during refresh for reuse during commit
   - Remove metadata location comparison (redundant with ETag check)
   - Update test to verify ETag-based conflict detection
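
   As an illustration of why the ETag check alone can catch a concurrent modification, here is a small sketch of two writers that read the same table version. It is an assumption-laden toy, not the updated test from this PR: `EtagConflictSketch`, `FakeStore`, and the metadata paths are hypothetical, and the fake signals a mismatch with `ConcurrentModificationException` rather than whatever error the real BigQuery client surfaces.

```java
import java.util.ConcurrentModificationException;

// Hypothetical sketch of ETag-based conflict detection with an in-memory store.
final class EtagConflictSketch {

  /** Stand-in for a table resource: a metadata pointer plus an ETag. */
  static final class Table {
    final String metadataLocation;
    final String etag;

    Table(String metadataLocation, String etag) {
      this.metadataLocation = metadataLocation;
      this.etag = etag;
    }
  }

  /** Fake store whose patch succeeds only when the supplied ETag is current. */
  static final class FakeStore {
    private Table current = new Table("metadata/v1.json", "etag-1");
    private int version = 1;

    Table get() {
      return current;
    }

    Table patch(String newMetadataLocation, String expectedEtag) {
      if (!current.etag.equals(expectedEtag)) {
        throw new ConcurrentModificationException("ETag mismatch");
      }
      version++;
      current = new Table(newMetadataLocation, "etag-" + version);
      return current;
    }
  }

  /** Returns true if the second writer, holding a stale ETag, is rejected. */
  static boolean staleCommitIsRejected() {
    FakeStore store = new FakeStore();
    Table seenByA = store.get(); // writer A refreshes
    Table seenByB = store.get(); // writer B refreshes the same version

    // Writer A commits first, which rotates the stored ETag.
    store.patch("metadata/v2-a.json", seenByA.etag);

    try {
      // Writer B never compares metadata locations; the stale ETag is enough.
      store.patch("metadata/v2-b.json", seenByB.etag);
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("stale commit rejected: " + staleCommitIsRejected());
  }
}
```

   Because every successful patch rotates the ETag in this model, a writer whose metadata location is out of date also holds an out-of-date ETag, which is the sense in which a separate location comparison would be redundant here.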


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

