aiborodin commented on code in PR #14792:
URL: https://github.com/apache/iceberg/pull/14792#discussion_r2605272290
##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/TableMetadataCache.java:
##########
@@ -207,9 +207,26 @@ private Tuple2<Boolean, Exception> refreshTable(TableIdentifier identifier) {
}
private boolean needsRefresh(CacheItem cacheItem, boolean allowRefresh) {
- return allowRefresh
- && (cacheItem == null
- || cacheRefreshClock.millis() - cacheItem.createdTimestampMillis > refreshMs);
+ if (!allowRefresh) {
+ return false;
+ }
+
+ if (cacheItem == null) {
+ return true;
+ }
+
+ long nowMillis = cacheRefreshClock.millis();
+ long timeElapsedMillis = nowMillis - cacheItem.createdTimestampMillis;
+ if (timeElapsedMillis > refreshMs) {
+ LOG.info(
Review Comment:
I was debating this. Refreshing table metadata is expensive because it hits
the catalog. In a well-tuned setup, both the cache refresh and this log line
should appear infrequently, because the refresh is lazy and only happens on
cache misses.
Additionally, users may want to know when and why the sink queries the
catalog. Currently, they have to investigate this manually. Keeping this log
on by default would give them that visibility.
What do you think?
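
To make the trade-off concrete, here is a minimal, self-contained sketch of a lazy refresh check that logs the reason whenever it decides to go to the catalog. It is not the actual `TableMetadataCache` code: the class name `RefreshPolicySketch`, the `LongSupplier` clock, the nullable `createdTimestampMillis` parameter, and the log wording are all assumptions for illustration. Logging the missing-entry case as well is also my assumption; the diff above only shows a log in the expiry branch.

```java
import java.util.function.LongSupplier;
import java.util.logging.Logger;

/** Hypothetical sketch of a lazy, time-based refresh check; not the real TableMetadataCache. */
final class RefreshPolicySketch {
  private static final Logger LOG = Logger.getLogger(RefreshPolicySketch.class.getName());

  // Stand-in for cacheRefreshClock; injectable so the check can be tested deterministically.
  private final LongSupplier clockMillis;
  private final long refreshMs;

  RefreshPolicySketch(LongSupplier clockMillis, long refreshMs) {
    this.clockMillis = clockMillis;
    this.refreshMs = refreshMs;
  }

  /**
   * Returns true when the caller should query the catalog.
   * A null createdTimestampMillis models a missing cache item (assumption).
   */
  boolean needsRefresh(String table, Long createdTimestampMillis, boolean allowRefresh) {
    if (!allowRefresh) {
      return false;
    }

    if (createdTimestampMillis == null) {
      // Assumption: also log the "nothing cached yet" case so every catalog hit is explained.
      LOG.info(() -> String.format("Refreshing %s: no cached metadata", table));
      return true;
    }

    long elapsedMillis = clockMillis.getAsLong() - createdTimestampMillis;
    if (elapsedMillis > refreshMs) {
      LOG.info(
          () ->
              String.format(
                  "Refreshing %s: cached metadata is %d ms old (limit %d ms)",
                  table, elapsedMillis, refreshMs));
      return true;
    }

    return false;
  }
}
```

Because the check only logs on the branches that return `true`, the log volume is bounded by the number of catalog queries, which is the expensive operation we want visibility into anyway.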
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]