syun64 commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2156782618

   > For Arrow, the `binary` cannot store more than 2GB in a single buffer, not a single field. See [Arrow docs](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) for more context.
   
   My apologies - I think I might not have done a good job of explaining the problem, @Fokko. I think the issue is with Parquet, not Arrow or Polars. I'm using these two libraries as examples to show that a record exceeding 2GB cannot be written into a Parquet file, even when it can be represented in memory with a large Arrow data type. This issue raised on Polars seems to describe the same limitation: https://github.com/pola-rs/polars/issues/10774
   
   This is just based on my research this week, so it is definitely possible that I'm missing something, but so far I haven't been able to write an actually large (>2GB) record into a Parquet file.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
