Vino1016 commented on issue #15215:
URL: https://github.com/apache/iceberg/issues/15215#issuecomment-3882477610

   I found the root cause:
   
   Root Cause Analysis
   Flink serialization invalidates Schema object references:
   
   Upstream Operator: reuses the same Schema instance for every record
   Serialization Transfer: the DynamicRecord is serialized and sent to the downstream writer
   Downstream Writer: each parallel instance receives a new Schema object after deserialization
   Cache Invalidation: Iceberg's TableMetadataCache uses object references as keys, so every lookup misses (see the sketch below)
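   
   To make the failure mode concrete, here is a minimal, self-contained sketch.
   It is not Iceberg or Flink code: the FakeSchema class and the IdentityHashMap
   are hypothetical stand-ins for Schema and the reference-keyed
   TableMetadataCache described above, and plain Java serialization stands in
   for Flink's serializer. A round trip yields a new object, so the
   identity-keyed lookup misses even though the contents are identical:
   
   ```java
   import java.io.*;
   import java.util.IdentityHashMap;
   
   public class IdentityCacheMissDemo {
     // Stand-in for an Iceberg Schema: same content, but a distinct object
     // after a serialization round trip.
     static class FakeSchema implements Serializable {
       final String fields;
       FakeSchema(String fields) { this.fields = fields; }
     }
   
     public static void main(String[] args) throws Exception {
       FakeSchema original = new FakeSchema("id:long,data:string");
   
       // Reference-keyed cache: lookups hit only on the exact same object.
       IdentityHashMap<FakeSchema, String> cache = new IdentityHashMap<>();
       cache.put(original, "cached metadata");
   
       // Simulate shipping the schema to a downstream writer.
       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
       ObjectOutputStream out = new ObjectOutputStream(bytes);
       out.writeObject(original);
       out.flush();
       FakeSchema copy = (FakeSchema) new ObjectInputStream(
           new ByteArrayInputStream(bytes.toByteArray())).readObject();
   
       System.out.println(original == copy);        // false: new object
       System.out.println(cache.containsKey(copy)); // false: cache miss
     }
   }
   ```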
   
   Solution
   Cache Schema objects in DynamicRecordGenerator:
   
   ```java
   // Each writer instance maintains its own Schema cache.
   private transient Schema cachedSchema;
   
   // Per record: compare schemas structurally, then make all records with an
   // identical schema share one object reference.
   if (cachedSchema == null || !cachedSchema.sameSchema(dynamicRecord.schema())) {
     // First record, or a genuine schema change (evolution): adopt it.
     cachedSchema = dynamicRecord.schema();
   } else {
     // Structurally identical: swap in the cached object so reference-keyed
     // lookups hit again.
     dynamicRecord.setSchema(cachedSchema);
   }
   ```
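   
   Design note: the transient modifier means each parallel writer rebuilds its
   own cache after the operator itself is deserialized, so no stale Schema
   reference survives the serialization boundary. The structural sameSchema
   comparison runs once per record, and only a genuine schema change replaces
   the cached reference, which is why schema evolution detection keeps working
   while identical schemas collapse back to a single shared object that
   reference-keyed caches can hit.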
   
   Results
   ✅ Performance warnings eliminated
   ✅ 100% cache hit rate
   ✅ Schema evolution detection still works

