Re: [PR] Implement column projection [iceberg-python]

via GitHub Fri, 20 Dec 2024 14:07:12 -0800


kevinjqliu commented on code in PR #1443:
URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894421818



##########
tests/io/test_pyarrow.py:
##########
@@ -1122,6 +1123,110 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None:
     assert repr(result_table.schema) == "id: int32"
 
 
+def test_projection_partition_inference(tmp_path: str) -> None:

Review Comment:
   > ensuring that the partition struct contains a value during the table scan
   
   Good point. [From the spec](https://iceberg.apache.org/spec/#column-projection):
   ```
   Return the value from partition metadata if an [Identity Transform](https://iceberg.apache.org/spec/#partition-transforms) exists for the field and the partition value is present in the partition struct on data_file object in the manifest.
   ```
   
   Specifically, `the partition value is present in the partition struct on data_file object in the manifest`.
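
For illustration only, the quoted rule can be sketched in plain Python. This is a minimal sketch and not the pyiceberg implementation: `PartitionField`, `DataFile`, `MISSING`, and `project_missing_field` are hypothetical stand-ins, and the partition struct is modeled as a dict keyed by partition field name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Sentinel for "the rule does not apply". A partition value may legitimately
# be None, so None cannot be used to signal absence.
MISSING = object()


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for one field of a partition spec."""
    source_id: int   # field id of the source column in the table schema
    name: str        # key of this field in the data file's partition struct
    transform: str   # e.g. "identity", "bucket[16]", "day"


@dataclass
class DataFile:
    """Hypothetical stand-in for a manifest's data_file entry."""
    partition: Dict[str, Any] = field(default_factory=dict)


def project_missing_field(field_id: int, spec: List[PartitionField], data_file: DataFile) -> Any:
    """Resolve a column that is absent from the data file.

    Returns the partition value only when an identity transform exists for
    the field AND the value is present in the partition struct; otherwise
    returns MISSING so the caller can fall through to the next rule.
    """
    for pf in spec:
        if (
            pf.source_id == field_id
            and pf.transform == "identity"
            and pf.name in data_file.partition
        ):
            return data_file.partition[pf.name]
    return MISSING


spec = [PartitionField(source_id=2, name="partition_id", transform="identity")]
populated = DataFile(partition={"partition_id": 1})
unpopulated = DataFile()  # e.g. the file was added to an unpartitioned table

print(project_missing_field(2, spec, populated))               # 1
print(project_missing_field(2, spec, unpopulated) is MISSING)  # True
```

The second call shows the case the test has to avoid: with an empty partition struct, the identity transform alone is not enough to recover the value.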
   
   Looks like in order to create this, we need to ~~write data files on an already partitioned table~~. Otherwise, the `data_file`'s partition value will not be populated.
   
   Edit: this is one of those Hive table trivia items. Hive tables do not store the partition field in the underlying data files; the values are inferred from the file path. Iceberg tables do store the partition field in the data files, so we need to do something extra weird to set this up...
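
As a rough illustration of the Hive side of this, partition values can be recovered from `key=value` directory segments in the file path. This is a minimal sketch under assumptions: `infer_hive_partition` is a hypothetical helper, the example path is made up, and values are returned as raw strings with no type coercion or unescaping.

```python
from typing import Dict


def infer_hive_partition(path: str) -> Dict[str, str]:
    """Collect key=value directory segments from a Hive-style file path."""
    values: Dict[str, str] = {}
    # Drop the last segment: it is the file name, not a partition directory.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


# Made-up path for illustration; the data file itself holds no partition_id column.
print(infer_hive_partition("warehouse/db/tbl/partition_id=1/part-00000.parquet"))
# {'partition_id': '1'}
```

A file laid out this way has no partition column of its own, which is the situation the projection rule covers once the value is recorded in the partition struct.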



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


