kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894421818
##########
tests/io/test_pyarrow.py:
##########
@@ -1122,6 +1123,110 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None:
     assert repr(result_table.schema) == "id: int32"
+def test_projection_partition_inference(tmp_path: str) -> None:

Review Comment:
> ensuring that the partition struct contains a value during the table scan

Good point. [From the spec](https://iceberg.apache.org/spec/#column-projection):
```
Return the value from partition metadata if an [Identity Transform](https://iceberg.apache.org/spec/#partition-transforms) exists for the field and the partition value is present in the partition struct on data_file object in the manifest.
```

Specifically, `the partition value is present in the partition struct on data_file object in the manifest`.
Looks like in order to create this, we need to ~~write data files on an already partitioned table~~. Otherwise, the `data_file`'s partition value will not be populated.

Edit: this is one of those Hive table trivia items. Hive tables do not store partition fields in the underlying data files; the values are inferred from the file path. Iceberg tables do store partition fields in the data files. So we need to do something extra weird to set this up...
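For illustration only, the spec rule quoted above can be sketched in plain Python. This is not PyIceberg code: `PartitionFieldSketch`, `value_from_partition_metadata`, and `MISSING` are hypothetical names, the partition struct is modeled as a plain mapping, and the transform is modeled as a string.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

# Hypothetical sentinel: distinguishes "no value available" from a real None value.
MISSING = object()


@dataclass(frozen=True)
class PartitionFieldSketch:
    """Hypothetical stand-in for one field of a partition spec."""

    source_id: int  # field ID of the source column in the table schema
    name: str       # key of this field in the data file's partition struct
    transform: str  # e.g. "identity", "bucket[16]", "day"


def value_from_partition_metadata(
    field_id: int,
    partition_fields: Sequence[PartitionFieldSketch],
    partition_struct: Mapping[str, Any],
) -> Any:
    """Return the partition value for a column that is absent from the data file.

    A value is returned only when both conditions from the spec hold:
    an identity transform exists for the field, and the value is present
    in the data file's partition struct. Otherwise MISSING is returned.
    """
    for pf in partition_fields:
        if pf.source_id == field_id and pf.transform == "identity":
            if pf.name in partition_struct:
                return partition_struct[pf.name]
    return MISSING


# Assumed example: column with field ID 2 is identity-partitioned as "partition_id".
spec = [PartitionFieldSketch(source_id=2, name="partition_id", transform="identity")]
populated = value_from_partition_metadata(2, spec, {"partition_id": 1})
unpopulated = value_from_partition_metadata(2, spec, {})
```

The second call models the situation described in the comment: when the data file's partition struct is not populated, the rule yields nothing, so a test has to arrange for the struct to carry a value.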