zinking commented on PR #8807: URL: https://github.com/apache/iceberg/pull/8807#issuecomment-1760653401
> wondering if we could benefit from reads in general as well ?

Yep. As mentioned in the distributed planning work: when metadata becomes large, hand-crafted parallel code is no longer optimal. If reads are planned optimally, these delete files would be read concurrently instead of the way they are read now.

> Also do you have more crisp benchmarks demonstrating this would benefit always ?

I don't think this benefits in every case; it's easy to imagine that with only a couple of delete files, a join would certainly not outperform. But as metadata grows, it should consistently benefit, since in theory the number of file reads decreases. I don't have more numbers at the moment, and the benchmark above isn't fully optimized.

> have you tried the caching of delete files on executor solution which @aokolnychyi is working on and integrating with it ?

Not yet.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
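To make the "read delete files concurrently" idea concrete, here is a minimal, hypothetical sketch (not the actual Iceberg reader): `readDeleteFile` stands in for a real delete-file read and returns fake deleted-row positions, and a fixed thread pool fans the reads out instead of looping over the files sequentially.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of concurrent delete-file reads.
// readDeleteFile is a stand-in for the real reader; it fabricates positions.
public class ConcurrentDeleteReads {
    static List<Long> readDeleteFile(int fileId) {
        // Placeholder: each "delete file" contributes two deleted row positions.
        return List.of(fileId * 10L, fileId * 10L + 1);
    }

    static Set<Long> readAllConcurrently(int numFiles, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Callable<List<Long>>> tasks = new ArrayList<>();
            for (int i = 0; i < numFiles; i++) {
                final int id = i;
                tasks.add(() -> readDeleteFile(id));
            }
            // invokeAll runs the reads on the pool; merge results into one sorted set.
            Set<Long> deleted = new TreeSet<>();
            for (Future<List<Long>> f : pool.invokeAll(tasks)) {
                deleted.addAll(f.get());
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAllConcurrently(3, 2));
    }
}
```

With few files the pool overhead dominates, which matches the point above that a join-based plan only wins once the number of delete files is large.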