syun64 commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159307590

Gotcha - thank you for the explanation @Fokko. I hadn't considered how using `large_binary` could actually improve performance because the data is grouped together into large buffers. I think I was conflating Parquet's 2 GB limit with the necessity of using a large type. Put simply, I was asking: if we can't write data that large into Parquet anyway, why bother using a type specifically designed to support larger data (which can't be written into the file)? But now I see that the motivation for supporting large types is different from the motivation for writing larger data.
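
To make the distinction concrete, here is a minimal PyArrow sketch (my own illustration, not PyIceberg internals or code from this thread): Arrow's `binary` type stores 32-bit offsets into a single value buffer, so one array tops out around 2 GiB and larger data has to be split into chunks, while `large_binary` stores 64-bit offsets and can keep all values contiguous in one large buffer. Parquet's own size limits are independent of which Arrow type is used in memory.

```python
# Minimal sketch (assumes only PyArrow): compare the 32-bit-offset `binary`
# type with the 64-bit-offset `large_binary` type.
import pyarrow as pa

binary_arr = pa.array([b"foo", b"bar", b"baz"], type=pa.binary())
large_arr = binary_arr.cast(pa.large_binary())

print(binary_arr.type)  # binary        (int32 offsets, ~2 GiB cap per array)
print(large_arr.type)   # large_binary  (int64 offsets, no practical cap)

# The offsets buffer is wider for the large type, but the value layout is the
# same; the benefit is avoiding chunk boundaries in memory, not changing what
# Parquet will accept on write.
print(binary_arr.nbytes, large_arr.nbytes)
```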
Gotcha - thank you for the explanation @Fokko I didn't think of how using a large_binary could actually improve the performance because the data is grouped together into large buffers. I think I might have been convoluting the issue with the 2GB limit of Parquet with that of the necessity of using a large type. Simply, I was asking: if we can't write that large of a data into Parquet, why do we even bother using a type that is specifically designed to be able to support larger data (which can't be written into the file)? But now I see that the motivation to support large types is different from the motivation to write larger data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org