HonahX commented on code in PR #902:
URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1673398145
##########
pyiceberg/table/__init__.py:
##########
@@ -1884,8 +1884,9 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader:
         from pyiceberg.io.pyarrow import project_batches, schema_to_pyarrow

+        target_schema = schema_to_pyarrow(self.projection())

Review Comment:
   > I think the only time we are casting the types is on write, where we may want to downcast it for forward compatibility.

   +1 Currently we use `large_*` types during write. I think it would be better to write the file based on the input PyArrow dataframe schema: if the dataframe uses `string`, we also write with `string`.
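   For context, a minimal self-contained sketch of the `string` vs `large_string` distinction under discussion (not pyiceberg code; the field name and data are made up for illustration). `string` stores values with 32-bit offsets, `large_string` with 64-bit offsets, so upcasting is always safe while downcasting can only fail once a column exceeds the 32-bit offset range:

   ```python
   import pyarrow as pa

   # Two schemas that differ only in the string offset width.
   small = pa.schema([pa.field("name", pa.string())])
   large = pa.schema([pa.field("name", pa.large_string())])

   # A table built with the plain `string` type, as a user's dataframe might be.
   table = pa.table({"name": ["a", "b"]}, schema=small)

   # Upcasting to large_string is lossless; writing with the input schema
   # instead would skip this cast entirely.
   upcast = table.cast(large)
   print(upcast.schema)  # name: large_string
   ```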