syun64 commented on issue #791: URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159307590

Gotcha - thank you for the explanation @Fokko. I hadn't considered how using `large_binary` could actually improve performance because the data is grouped together into large buffers. I think I was conflating Parquet's 2 GB limit with the necessity of using a large type. Put simply, I was asking: if we can't write data that large into Parquet anyway, why bother using a type specifically designed to support larger data (which can't be written into the file)? But now I see that the motivation for supporting large types is different from the motivation for writing larger data.
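
To make the distinction concrete, here is a minimal PyArrow sketch (my own illustration, not PyIceberg internals or code from this thread): Arrow's `binary` type stores 32-bit offsets into a single value buffer, so one array tops out around 2 GiB and larger data has to be split into chunks, while `large_binary` stores 64-bit offsets and can keep all values contiguous in one large buffer. Parquet's own size limits are independent of which Arrow type is used in memory.

```python
# Minimal sketch (assumes only PyArrow): compare the 32-bit-offset `binary`
# type with the 64-bit-offset `large_binary` type.
import pyarrow as pa

binary_arr = pa.array([b"foo", b"bar", b"baz"], type=pa.binary())
large_arr = binary_arr.cast(pa.large_binary())

print(binary_arr.type)  # binary        (int32 offsets, ~2 GiB cap per array)
print(large_arr.type)   # large_binary  (int64 offsets, no practical cap)

# The offsets buffer is wider for the large type, but the value layout is the
# same; the benefit is avoiding chunk boundaries in memory, not changing what
# Parquet will accept on write.
print(binary_arr.nbytes, large_arr.nbytes)
```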
Gotcha - thank you for the explanation @Fokko I didn't think of how using a large_binary could actually improve the performance because the data is grouped together into large buffers. I think I might have been convoluting the issue with the 2GB limit of Parquet with that of the necessity of using a large type. Simply, I was asking: if we can't write that large of a data into Parquet, why do we even bother using a type that is specifically designed to be able to support larger data (which can't be written into the file)? But now I see that the motivation to support large types is different from the motivation to write larger data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org