zinking commented on PR #8807: URL: https://github.com/apache/iceberg/pull/8807#issuecomment-1760653401
> wondering if we could benefit from reads in general as well ?

Yep. As mentioned in the distributed planning work: when metadata becomes large, hand-crafted parallel code is no longer optimal. If reads are planned optimally, these delete files would be read concurrently instead of the way they are read now.

> Also do you have more crisp benchmarks demonstrating this would benefit always ?

I don't think this benefits in every case; it's easy to imagine that with only a couple of delete files, a join would certainly not outperform. But as metadata grows, it should consistently benefit, since in theory the number of file reads decreases. I don't have more numbers at the moment, and the benchmark above isn't fully optimized.

> have you tried the caching of delete files on executor solution which @aokolnychyi is working on and integrating with it ?

Not yet.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
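To make the "read delete files concurrently" idea concrete, here is a minimal, hypothetical sketch (not the actual Iceberg reader): `readDeleteFile` stands in for a real delete-file read and returns fake deleted-row positions, and a fixed thread pool fans the reads out instead of looping over the files sequentially.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of concurrent delete-file reads.
// readDeleteFile is a stand-in for the real reader; it fabricates positions.
public class ConcurrentDeleteReads {
    static List<Long> readDeleteFile(int fileId) {
        // Placeholder: each "delete file" contributes two deleted row positions.
        return List.of(fileId * 10L, fileId * 10L + 1);
    }

    static Set<Long> readAllConcurrently(int numFiles, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Callable<List<Long>>> tasks = new ArrayList<>();
            for (int i = 0; i < numFiles; i++) {
                final int id = i;
                tasks.add(() -> readDeleteFile(id));
            }
            // invokeAll runs the reads on the pool; merge results into one sorted set.
            Set<Long> deleted = new TreeSet<>();
            for (Future<List<Long>> f : pool.invokeAll(tasks)) {
                deleted.addAll(f.get());
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAllConcurrently(3, 2));
    }
}
```

With few files the pool overhead dominates, which matches the point above that a join-based plan only wins once the number of delete files is large.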