jhump commented on PR #385: URL: https://github.com/apache/iceberg-go/pull/385#issuecomment-2802368954
> Is there an example of an issue that could happen by using the default cache? Can we test for it?

Good question. I think so, though it depends slightly on what the `ocf` package does with a schema before serializing it. The issue would arise if a file has a schema that **omits** the definition of one of the record types and relies solely on the type's name. That type could then be resolved from the cache (which would be incorrect -- the Avro files for Iceberg metadata should not rely on some cache of schema types but should include the definitions of all record types).

> Could we use an internal module scoped cache? Such as in the internal package?

I don't think you'd want to, as that would have the same problems as the default cache: it's a global that allows the schema from one file to "infect" a different file. So after reading one file (which implicitly caches all record types in that file's schema), loading any subsequent file using the same cache could incorrectly resolve a named type whose definition is missing from that later file.