Re: [PR] Cast PyArrow schema to `large_*` types [iceberg-python]

via GitHub Wed, 12 Jun 2024 01:02:08 -0700


HonahX commented on code in PR #807:
URL: https://github.com/apache/iceberg-python/pull/807#discussion_r1635920344



##########
pyiceberg/io/pyarrow.py:
##########
@@ -680,6 +680,10 @@ def _pyarrow_to_schema_without_ids(schema: pa.Schema) -> Schema:
     return visit_pyarrow(schema, _ConvertToIcebergWithoutIDs())
 
 
+def _pyarrow_with_large_types(schema: pa.Schema) -> pa.Schema:

Review Comment:
   How about naming it `_pyarrow_ensure_large_types`, to reflect that we 
convert types directly to their `large_*` counterparts where applicable?
   



##########
pyiceberg/io/pyarrow.py:
##########
@@ -998,7 +1026,7 @@ def _task_to_table(
 
         fragment_scanner = ds.Scanner.from_fragment(
             fragment=fragment,
-            schema=physical_schema,
+            schema=_pyarrow_with_large_types(physical_schema),

Review Comment:
   It may be good to add a comment (either here or in the method body) 
explaining that we read data as `large_*` types to improve performance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

