Fokko commented on issue #2250: URL: https://github.com/apache/iceberg-python/issues/2250#issuecomment-3152205462
Some context, as this is a bit of a loose end:

- When we read using `to_arrow()`, we pull all the data into memory right away, so we know the types of all the columns.
- When reading through `to_arrow_batch_reader()`, we read the files one by one, and we might hit a Parquet file that uses `large_string` rather than `string`. We cannot push it into a `string`, since that could cause a buffer overflow. Therefore, we take the safe path and upcast to `large_string` upfront.

Out of curiosity, what is your expected outcome here?
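To make the overflow concern concrete, here is a minimal standard-library sketch. It is not PyIceberg or PyArrow code. It only models the fact that Arrow's `string` type stores 32-bit offsets into its data buffer, while `large_string` stores 64-bit offsets. The helper names `pack_offsets` and `fits_in_string`, and the value lengths, are hypothetical and chosen for illustration.

```python
import struct

INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold


def pack_offsets(value_lengths, large=False):
    """Pack cumulative byte offsets for a variable-length string column.

    large=False models `string` (32-bit offsets);
    large=True models `large_string` (64-bit offsets).
    Hypothetical helper, not part of any library.
    """
    fmt = "<q" if large else "<i"
    offsets = [0]
    for length in value_lengths:
        offsets.append(offsets[-1] + length)
    # struct.pack raises struct.error when an offset does not fit the format
    return b"".join(struct.pack(fmt, offset) for offset in offsets)


def fits_in_string(value_lengths):
    """Return True if the total byte length is addressable with 32-bit offsets."""
    return sum(value_lengths) <= INT32_MAX


# Hypothetical column: two values of 1.5 GiB each, 3 GiB in total
lengths = [3 * 2**29, 3 * 2**29]

print(fits_in_string(lengths))  # the total exceeds INT32_MAX

try:
    pack_offsets(lengths)  # 32-bit offsets, as in `string`
except struct.error as exc:
    print("32-bit offsets overflow:", exc)

packed = pack_offsets(lengths, large=True)  # 64-bit offsets, as in `large_string`
print(len(packed), "bytes of 64-bit offsets")
```

A streaming reader cannot know in advance whether a later file will need the wider offsets, which is presumably why promising `large_string` upfront is the safe choice.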
