aiborodin commented on code in PR #13032:
URL: https://github.com/apache/iceberg/pull/13032#discussion_r2321288049


##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/TableMetadataCache.java:
##########
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.sink.dynamic;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Set;
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.NoSuchTableException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * TableMetadataCache is responsible for caching table metadata to avoid hitting the catalog too
+ * frequently. We store table identifier, schema, partition spec, and a set of past schema
+ * comparison results of the active table schema against the last input schemas.
+ */
+@Internal
+class TableMetadataCache {
+
+  private static final Logger LOG = LoggerFactory.getLogger(TableMetadataCache.class);
+  private static final int MAX_SCHEMA_COMPARISON_RESULTS_TO_CACHE = 10;
+  private static final Tuple2<Boolean, Exception> EXISTS = Tuple2.of(true, null);
+  private static final Tuple2<Boolean, Exception> NOT_EXISTS = Tuple2.of(false, null);
+  static final Tuple2<Schema, CompareSchemasVisitor.Result> NOT_FOUND =
+      Tuple2.of(null, CompareSchemasVisitor.Result.SCHEMA_UPDATE_NEEDED);
+
+  private final Catalog catalog;
+  private final long refreshMs;
+  private final Cache<TableIdentifier, CacheItem> cache;
+
+  TableMetadataCache(Catalog catalog, int maximumSize, long refreshMs) {
+    this.catalog = catalog;
+    this.refreshMs = refreshMs;
+    this.cache = Caffeine.newBuilder().maximumSize(maximumSize).build();
+  }
+
+  Tuple2<Boolean, Exception> exists(TableIdentifier identifier) {
+    CacheItem cached = cache.getIfPresent(identifier);
+    if (cached != null && Boolean.TRUE.equals(cached.tableExists)) {
+      return EXISTS;
+    } else if (needsRefresh(cached, true)) {
+      return refreshTable(identifier);
+    } else {
+      return NOT_EXISTS;
+    }
+  }
+
+  String branch(TableIdentifier identifier, String branch) {
+    return branch(identifier, branch, true);
+  }
+
+  Tuple2<Schema, CompareSchemasVisitor.Result> schema(TableIdentifier identifier, Schema input) {
+    return schema(identifier, input, true);
+  }
+
+  PartitionSpec spec(TableIdentifier identifier, PartitionSpec spec) {
+    return spec(identifier, spec, true);
+  }
+
+  void update(TableIdentifier identifier, Table table) {
+    cache.put(
+        identifier,
+        new CacheItem(true, table.refs().keySet(), new SchemaInfo(table.schemas()), table.specs()));
+  }
+
+  private String branch(TableIdentifier identifier, String branch, boolean allowRefresh) {
+    CacheItem cached = cache.getIfPresent(identifier);
+    if (cached != null && cached.tableExists && cached.branches.contains(branch)) {
+      return branch;
+    }
+
+    if (needsRefresh(cached, allowRefresh)) {
+      refreshTable(identifier);
+      return branch(identifier, branch, false);
+    } else {
+      return null;
+    }
+  }
+
+  private Tuple2<Schema, CompareSchemasVisitor.Result> schema(
+      TableIdentifier identifier, Schema input, boolean allowRefresh) {
+    CacheItem cached = cache.getIfPresent(identifier);
+    Schema compatible = null;
+    if (cached != null && cached.tableExists) {
+      // This only works if the {@link Schema#equals(Object)} returns true for the old schema
+      // and a new schema. Performance is paramount as this code is on the hot path. Every other
+      // way for comparing 2 schemas were performing worse than the
+      // {@link CompareByNameVisitor#visit(Schema, Schema, boolean)}, so caching was useless.
+      Tuple2<Schema, CompareSchemasVisitor.Result> lastResult =
+          cached.schema.lastResults.get(input);
+      if (lastResult != null) {
+        return lastResult;
+      }
+
+      for (Map.Entry<Integer, Schema> tableSchema : cached.schema.schemas.entrySet()) {

Review Comment:
   @mxm @pvary, why do we need to loop through all previous table schemas here? Would it be more correct to always compare the incoming record schema to the latest table schema and evolve the table, or convert the incoming records to match the latest schema?
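
   Roughly what I have in mind, as a sketch (the `CompareSchemasVisitor.visit(input, latestTableSchema, true)` call below is my assumption about the comparison entry point, not necessarily the exact signature used in this PR):

   ```java
   // Hypothetical sketch: resolve the incoming schema against the latest table schema only,
   // instead of looping over all cached schemas. If the latest schema cannot serve the input,
   // signal that the table needs to be evolved (or the incoming records converted).
   private Tuple2<Schema, CompareSchemasVisitor.Result> resolveAgainstLatestOnly(
       Schema latestTableSchema, Schema input) {
     CompareSchemasVisitor.Result result =
         CompareSchemasVisitor.visit(input, latestTableSchema, true); // assumed entry point
     if (result == CompareSchemasVisitor.Result.SCHEMA_UPDATE_NEEDED) {
       return NOT_FOUND;
     }

     return Tuple2.of(latestTableSchema, result);
   }
   ```
   That should keep every committable of a checkpoint on a single schema id, at the cost of running the conversion path for records that still arrive with an older shape.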
   
   The side effect of the current approach is that the dynamic sink will create multiple `DynamicCommittable` instances, one for each resolved table schema in `cached.schema.schemas.entrySet()`, in the [dynamic commit aggregator](https://github.com/apache/iceberg/blob/be03c998d96d0d1fae13aa8c53d6c7c87e2d60ba/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java#L101), which will issue multiple commits to an Iceberg table per checkpoint - one commit per table schema id. However, each of these commits will have a `schema-id` in its snapshot metadata pointing to the latest table schema, which seems wrong. See the [SnapshotProducer implementation](https://github.com/apache/iceberg/blob/be03c998d96d0d1fae13aa8c53d6c7c87e2d60ba/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L322).

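   Purely as an illustration of that side effect (hypothetical types below, not the actual aggregator classes):

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.stream.Collectors;

   // Hypothetical model: each write result remembers the schema it was resolved against.
   record ResolvedWrite(int schemaId, String dataFile) {}

   class CommitsPerCheckpointIllustration {
     // Grouping the writes of one checkpoint by resolved schema id yields one group
     // per schema id - and, with the current approach, one Iceberg commit per group.
     static Map<Integer, List<ResolvedWrite>> groupForCommit(List<ResolvedWrite> checkpointWrites) {
       return checkpointWrites.stream()
           .collect(Collectors.groupingBy(ResolvedWrite::schemaId));
     }
   }
   ```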
   
   I can see the performance benefit of exactly matching an incoming schema to a previous table schema, as we don't have to go through the conversion code and can write directly using one of the earlier schemas.
   
   Is it valid in Iceberg to constantly alternate between multiple schemas? Or should we only evolve the latest schema and adjust incoming records to match it? What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

