HonahX commented on code in PR #807:
URL: https://github.com/apache/iceberg-python/pull/807#discussion_r1635920344


##########
pyiceberg/io/pyarrow.py:
##########
@@ -680,6 +680,10 @@ def _pyarrow_to_schema_without_ids(schema: pa.Schema) -> Schema:
     return visit_pyarrow(schema, _ConvertToIcebergWithoutIDs())
 
 
+def _pyarrow_with_large_types(schema: pa.Schema) -> pa.Schema:

Review Comment:
   How about naming it `_pyarrow_ensure_large_types`, to reflect that we directly convert types to their `large_*` counterparts where applicable?
   



##########
pyiceberg/io/pyarrow.py:
##########
@@ -998,7 +1026,7 @@ def _task_to_table(
 
         fragment_scanner = ds.Scanner.from_fragment(
             fragment=fragment,
-            schema=physical_schema,
+            schema=_pyarrow_with_large_types(physical_schema),

Review Comment:
   It may be good to add a comment (either here or in the method body) explaining that we read data as `large_*` types to improve performance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

