yshcz opened a new pull request, #14949: URL: https://github.com/apache/iceberg/pull/14949
Closes https://github.com/apache/iceberg/issues/14926. While implementing the spec, I looked into how other implementations detect the format version of manifest list files and found some notable differences in approaches: | Implementation | Description | |----------------|-------------| | Java | Uses Avro schema resolution [without version-specific read path](https://github.com/apache/iceberg/blob/apache-iceberg-1.10.1/core/src/main/java/org/apache/iceberg/ManifestLists.java#L33-L48) - always uses [the same schema](https://github.com/apache/iceberg/blob/apache-iceberg-1.10.1/api/src/main/java/org/apache/iceberg/ManifestFile.java#L100-L117) that includes all fields up to V3+ | | Python | Similar to Java - uses [schema resolution](https://github.com/apache/iceberg-python/blob/pyiceberg-0.10.0/pyiceberg/avro/resolver.py#L236-L253) with a [fixed read schema](https://github.com/apache/iceberg-python/blob/pyiceberg-0.10.0/pyiceberg/manifest.py#L897) | | Rust | Uses current version of table metadata and [branches on it](https://github.com/apache/iceberg-rust/blob/v0.7.0/crates/iceberg/src/spec/manifest_list.rs#L60-L72), though a TODO comment notes this [may not be needed](https://github.com/apache/iceberg-rust/blob/v0.7.0/crates/iceberg/src/spec/snapshot.rs#L195-L196) | | Go | Reads `format-version` from [Avro metadata](https://github.com/apache/iceberg-go/blob/v0.4.0/manifest.go#L809-L812) and errors if missing, which is not strictly following the spec since this field is not required for manifest lists | This PR adds an implementation note to Appendix F to document the recommended approach and guide future implementations. It explains how Avro schema resolution enables implementations to use a single read schema without version detection, how to determine the exact format version when needed (by examining the writer schema), and why `format-version` in Avro metadata cannot be reliably used for manifest lists. If manifest lists are removed in v4, this note would only apply to v1 to v3. I'm not sure how to frame this, any guidance from reviewers would be appreciated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
