Re: [PR] fix: Don't use avro.DefaultSchemaCache [iceberg-go]

via GitHub Mon, 14 Apr 2025 10:14:54 -0700


jhump commented on PR #385:
URL: https://github.com/apache/iceberg-go/pull/385#issuecomment-2802368954


   > Is there an example of an issue that could happen by using the default 
cache? Can we test for it?
   
   Good question. I think so, though it depends slightly on what the `ocf` 
package does with a schema before serializing it. The issue would be if a file 
has a schema that **omits** the definition of one of the record types and 
relies solely on the type's name. That type could then be resolved from the 
cache (which would be incorrect -- the Avro files for Iceberg metadata should 
not rely on some cache of schema types but should include the definitions of 
all record types).
   
   > Could we use an internal module scoped cache? Such as in the internal 
package?
   
   I don't think you'd want to, as that would have the same problems as the 
default cache: it's a global that allows the schema from one file to "infect" a 
different file. So after reading one file (which implicitly caches all record 
types in that file's schema), loading any subsequent file using the same cache 
could incorrectly resolve a named type whose definition is missing from that 
later file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


