Re: [PR] Pyarrow IO property for configuring large v small types on read [iceberg-python]

via GitHub Wed, 07 Aug 2024 02:10:11 -0700


Fokko commented on code in PR #986:
URL: https://github.com/apache/iceberg-python/pull/986#discussion_r1706662170



##########
pyiceberg/io/pyarrow.py:
##########
@@ -1303,6 +1345,8 @@ def project_table(
             # When FsSpec is not installed
             raise ValueError(f"Expected PyArrowFileIO or FsspecFileIO, got: {io}") from e
 
+    use_large_types = property_as_bool(io.properties, PYARROW_USE_LARGE_TYPES_ON_READ, True)

Review Comment:
   This is the only part I don't particularly like: we now force the table to 
use either large or normal types. When we read record batches, I agree that we 
need to force the schema, but for the table we have to read all the footers anyway.
   
   Once https://github.com/apache/iceberg-python/pull/929 goes in, I think we 
still need to change that, but let's defer that question for now.
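
   For readers following the diff, here is a minimal sketch of how a boolean 
property with a default of `True` could be resolved from a properties mapping. 
`read_bool_property` is a hypothetical stand-in written for illustration; it is 
not pyiceberg's `property_as_bool`, and the accepted spellings of true/false are 
an assumption.

```python
from typing import Dict, Optional

# Property key taken from the diff in this PR.
PYARROW_USE_LARGE_TYPES_ON_READ = "pyarrow.use-large-types-on-read"


def read_bool_property(properties: Dict[str, str], name: str, default: bool) -> bool:
    """Hypothetical helper: resolve a string property to a bool, or use the default."""
    raw: Optional[str] = properties.get(name)
    if raw is None:
        # Property not configured: fall back to the caller's default.
        return default
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"Could not parse property {name!r} as a boolean: {raw!r}")


# Property absent: the default (large types) applies.
use_large_default = read_bool_property({}, PYARROW_USE_LARGE_TYPES_ON_READ, True)

# Property set explicitly to "false": the reader would ask for the small types.
use_large_disabled = read_bool_property(
    {PYARROW_USE_LARGE_TYPES_ON_READ: "false"},
    PYARROW_USE_LARGE_TYPES_ON_READ,
    True,
)
```

   With a default of `True`, existing users keep the large types unless they 
opt out explicitly, which matches the default passed in the diff above.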



##########
pyiceberg/io/__init__.py:
##########
@@ -80,6 +80,7 @@
 GCS_ENDPOINT = "gcs.endpoint"
 GCS_DEFAULT_LOCATION = "gcs.default-bucket-location"
 GCS_VERSION_AWARE = "gcs.version-aware"
+PYARROW_USE_LARGE_TYPES_ON_READ = "pyarrow.use-large-types-on-read"

Review Comment:
   I think it also makes more sense to move this constant into the Arrow file 
(`pyiceberg/io/pyarrow.py`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

