syun64 commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2156782618

> For Arrow, the `binary` cannot store more than 2GB in a single buffer, not a single field. See [Arrow docs](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) for more context.

My apologies @Fokko - I don't think I did a good job of explaining the problem. I believe the issue is with Parquet, not Arrow or Polars. I'm using those two libraries as examples to show that a record exceeding 2GB cannot be written to a Parquet file, even when it can be represented in memory with a large Arrow data type. This Polars issue seems to reiterate the same limitation: https://github.com/pola-rs/polars/issues/10774

This is just based on my research this week, so it's definitely possible that I'm missing something here, but so far I haven't been able to write a genuinely large record (>2GB) into Parquet.
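
For reference, here's a minimal sketch of the kind of write I've been attempting (assuming `pyarrow`; the column name and file path are just placeholders). The value fits fine in an Arrow `large_binary` column, but the Parquet write is where it falls over in my testing:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A single value just over 2 GiB. It sits in Arrow memory without issue
# because large_binary uses 64-bit offsets.
big_value = b"x" * (2**31 + 1)
table = pa.table({"data": pa.array([big_value], type=pa.large_binary())})

# This is where it fails for me: Parquet encodes each byte-array value
# with a 32-bit length, so a single value cannot exceed 2 GiB, and
# pyarrow raises an error at write time.
pq.write_table(table, "big_record.parquet")
```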
> For Arrow, the `binary` cannot store more than 2GB in a single buffer, not a single field. See [Arrow docs](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) for more context. My apologies - I think I might not have done a good job explaining the problem @Fokko . I think the issue is with Parquet, not Arrow or PolaRs. I'm using these two libraries as examples to show that writing a record that exceeds 2GB, even if they are able to be represented in memory as large Arrow data type, cannot be written into a Parquet file. This issue raised on PolaRs seems to reiterate that issue as well: https://github.com/pola-rs/polars/issues/10774 This is just based on my research this week, so it is definitely possible that I'm missing something here, but so far I haven't been able to write an actually large record (>2GB) into Parquet -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org