srilman opened a new issue, #401: URL: https://github.com/apache/iceberg-python/issues/401
### Question

Is there a recommended way to get the base / original schema or schema-id of a data file in a FileScanTask returned during `FileTableScan.plan_files`? This is useful for determining what kind of schema evolution occurred across the subset of files we are reading, and for grouping files with the same schema together for reads. I had a hard time accomplishing this in the Java library, but found it much easier to do in Python.

In `plan_files`, we can get the ID of the snapshot that created a data file by looking at the `snapshot_id` of the associated manifest entry (or the `added_snapshot_id` of the manifest list entry if the former is null). From there, we can get the schema associated with each snapshot. Is this a logical approach, or is there a better way to get the original schema?

Happy to open a PR to integrate this into `FileTableScan` if it would be useful!
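
The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's API: `EntryInfo`, `resolve_snapshot_id`, `group_files_by_schema`, and the `snapshot_to_schema_id` mapping are hypothetical stand-ins for the values one would pull from the manifest entries, the manifest list, and the table's snapshot metadata during planning.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class EntryInfo:
    """Hypothetical stand-in for the per-file values available while planning."""

    file_path: str
    # snapshot_id recorded on the manifest entry; may be null
    entry_snapshot_id: Optional[int]
    # added_snapshot_id recorded for the manifest in the manifest list
    manifest_added_snapshot_id: int


def resolve_snapshot_id(entry: EntryInfo) -> int:
    """Prefer the manifest entry's snapshot_id, else fall back to the manifest's."""
    if entry.entry_snapshot_id is not None:
        return entry.entry_snapshot_id
    return entry.manifest_added_snapshot_id


def group_files_by_schema(
    entries: Iterable[EntryInfo],
    snapshot_to_schema_id: Dict[int, int],
) -> Dict[int, List[str]]:
    """Group data file paths by the schema id of the snapshot that added them.

    snapshot_to_schema_id is assumed to be built from the table's snapshot
    metadata; a snapshot missing from it raises KeyError here.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for entry in entries:
        snapshot_id = resolve_snapshot_id(entry)
        schema_id = snapshot_to_schema_id[snapshot_id]
        groups[schema_id].append(entry.file_path)
    return dict(groups)
```

Each resulting group could then be read with a single projection from its original schema to the requested one. Snapshots that do not record a schema id would need separate handling, which this sketch leaves out.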