Re: [I] `to_arrow_batch_reader` returns a different schema than `to_arrow` [iceberg-python]

via GitHub Mon, 04 Aug 2025 13:13:38 -0700


Fokko commented on issue #2250:
URL: https://github.com/apache/iceberg-python/issues/2250#issuecomment-3152205462


   Some context, as this is a bit of a loose end:
   
   - When we read using `to_arrow()`, we pull all the data into memory right 
away, so we know the types of all the columns.
   - When reading through `to_arrow_batch_reader()`, we read the files one by 
one, and we might hit a Parquet file that uses `large_string` rather than 
`string`. We cannot push that data into a `string` column, since it could 
overflow the 32-bit offsets that `string` uses. Therefore we take the safe 
path and upcast to `large_string` upfront.
   
   Out of curiosity, what is your expected outcome here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

