Fokko commented on code in PR #902:
URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1671798339


##########
pyiceberg/table/__init__.py:
##########
@@ -1884,8 +1884,9 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader:
 
         from pyiceberg.io.pyarrow import project_batches, schema_to_pyarrow
 
+        target_schema = schema_to_pyarrow(self.projection())

Review Comment:
   My preference would be to let Arrow decide. For Polars it is different, 
because Polars is also the query engine. Casting the types recomputes the 
buffers, consuming additional memory and CPU, which I would rather avoid.
   
   For the table, we first materialize all the batches in memory, so if one of 
the batches uses a large type, the result will automatically be upcast; 
otherwise, the small types are kept.
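
   A minimal sketch of the cost being discussed, using plain pyarrow with 
hypothetical sample data (not code from this PR): casting a `string` array to 
`large_string` rewrites the 32-bit offset buffer as 64-bit, so new buffers are 
allocated rather than reused.

   ```python
   import pyarrow as pa

   # Hypothetical sample data, purely for illustration.
   arr = pa.array(["a", "b", "c"])  # pa.string(): 32-bit offsets

   # Casting to large_string builds a new 64-bit offset buffer,
   # which is the extra memory/CPU cost the comment refers to.
   cast = arr.cast(pa.large_string())

   assert cast.type == pa.large_string()
   assert cast.to_pylist() == arr.to_pylist()  # same values, new representation
   ```

   Letting Arrow decide avoids this rewrite when none of the batches actually 
need the large types.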



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
