zeroshade commented on issue #386:
URL: https://github.com/apache/iceberg-go/issues/386#issuecomment-2847904700

   Hey @jhump, after talking with @loicalleyne about this a bunch, I'm trying 
to figure out how this can be useful without being misleading.
   
   > Currently, manifest Avro files store the corresponding Iceberg schema and 
schema ID as well as partition spec and partition spec ID in the files' metadata
   
   They only store the corresponding schema, schema ID, partition spec, and 
partition spec ID **as of the time of writing** the manifest file. The Partition 
Spec ID is retrievable from the Manifest file via a `PartitionSpecID()` method 
on the `ManifestFile` interface. This makes some sense because the physical 
data in the manifest entry data files will reflect the partition spec at the 
time of writing.
   
   However, you mention that the Schema ID is the important thing you want 
exposed. I'm not sure that would be useful; it could actually end up being 
misleading. A ManifestFile and ManifestEntry are only aware of the Schema ID / 
Schema at the time the file was written, and with schema evolution the 
*current* schema may be very different from what the schema was when the 
manifest was written. Both the ManifestFile and the ManifestEntry contain the 
SnapshotID from the point when they were added to the table, so you should be 
able to retrieve the Schema ID and Schema by looking up the Snapshot that 
corresponds to that ID. If we were to embed the Schema information into the 
manifest entry/manifest file, consumers might mistakenly believe it shows the 
*current* schema of the table, as opposed to just reflecting the schema *at 
the time the manifest was written*, which may or may not be the same as the 
current one.
   
   In the interests of avoiding this possible ambiguity, can you expand more on 
why you want to expose the schema *directly* through the ManifestFile and 
ManifestEntry as opposed to having to go through the Snapshot to get them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

