zeroshade commented on issue #386: URL: https://github.com/apache/iceberg-go/issues/386#issuecomment-2847904700
Hey @jhump, after talking with @loicalleyne about this a bunch, I'm trying to figure out how this can be useful without being misleading.

> Currently, manifest Avro files store the corresponding Iceberg schema and schema ID as well as partition spec and partition spec ID in the files' metadata

They only store the corresponding schema, schema ID, partition spec, and partition spec ID **at the time of writing** the manifest file. The Partition Spec ID is retrievable from the manifest file via the `PartitionSpecID()` method on the `ManifestFile` interface. That makes some sense, because the physical layout of the data files referenced by the manifest entries reflects the partition spec that was in effect when they were written.

However, you mention that the Schema ID is the important thing you want exposed, and I'm not sure that would be useful; it could instead end up being quite misleading. Since a ManifestFile and ManifestEntry would only know the Schema ID / Schema at the time the file was written, schema evolution means the *current* schema may be very different from the schema when the manifest was written. Both the ManifestFile and the ManifestEntry contain the SnapshotID from the point when they were added to the table, so you should be able to retrieve the Schema ID and Schema by looking up the Snapshot with that ID.

If we were to embed the Schema information into the manifest entry/manifest file, consumers might mistakenly believe it reflects the *current* schema of the table, rather than just the schema *at the time the manifest was written*, which may or may not be the same as the current one. In the interest of avoiding this ambiguity, can you expand more on why you want to expose the schema *directly* through the ManifestFile and ManifestEntry instead of going through the Snapshot to get it?