yshcz opened a new pull request, #14949:
URL: https://github.com/apache/iceberg/pull/14949

   Closes https://github.com/apache/iceberg/issues/14926.
   
   While implementing the spec, I looked into how other implementations detect 
the format version of manifest list files and found some notable differences in 
approaches:
   
     | Implementation | Description |
     |----------------|-------------|
     | Java | Uses Avro schema resolution [without version-specific read 
path](https://github.com/apache/iceberg/blob/apache-iceberg-1.10.1/core/src/main/java/org/apache/iceberg/ManifestLists.java#L33-L48)
 - always uses [the same 
schema](https://github.com/apache/iceberg/blob/apache-iceberg-1.10.1/api/src/main/java/org/apache/iceberg/ManifestFile.java#L100-L117)
 that includes all fields up to V3+ |
     | Python | Similar to Java - uses [schema 
resolution](https://github.com/apache/iceberg-python/blob/pyiceberg-0.10.0/pyiceberg/avro/resolver.py#L236-L253)
 with a [fixed read 
schema](https://github.com/apache/iceberg-python/blob/pyiceberg-0.10.0/pyiceberg/manifest.py#L897)
 |
     | Rust | Uses current version of table metadata and [branches on 
it](https://github.com/apache/iceberg-rust/blob/v0.7.0/crates/iceberg/src/spec/manifest_list.rs#L60-L72),
 though a TODO comment notes this [may not be 
needed](https://github.com/apache/iceberg-rust/blob/v0.7.0/crates/iceberg/src/spec/snapshot.rs#L195-L196)
 |
     | Go | Reads `format-version` from [Avro 
metadata](https://github.com/apache/iceberg-go/blob/v0.4.0/manifest.go#L809-L812)
 and errors if missing, which is not strictly following the spec since this 
field is not required for manifest lists |
   
   This PR adds an implementation note to Appendix F to document the 
recommended approach and guide future implementations. It explains how Avro 
schema resolution enables implementations to use a single read schema without 
version detection, how to determine the exact format version when needed (by 
examining the writer schema), and why `format-version` in Avro metadata cannot 
be reliably used for manifest lists.
   
   If manifest lists are removed in v4, this note would only apply to v1 to v3. 
I'm not sure how to frame this, any guidance from reviewers would be 
appreciated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to