rbalamohan opened a new pull request, #6432: URL: https://github.com/apache/iceberg/pull/6432
Issue: https://github.com/apache/iceberg/issues/6387

When tables are updated in "merge-on-read" mode, positional delete files are created. Read performance degrades quite a bit, even with as few as 4 positional delete files (I tried with TPC-DS queries). Depending on the workload, a data file may have to read multiple "positional delete" files to construct its delete positions. This does not sound expensive, but when a large number of medium-sized files are present in a partition, combinedfiletask ends up with many files. A task then has to process its data files sequentially, and every data file reads multiple positional delete files, which causes the slowness.

This PR uses "ParallelIterable" in "Deletes::toPositionIndex". RoaringBitmap isn't thread-safe, hence the added synchronization in BitmapPositionDeleteIndex. Tried it out on a local cluster and confirmed that this is not expensive.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
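As a rough illustration of the idea in the pull request description above (read several positional delete files concurrently and collect their positions into one shared, synchronized index), here is a minimal standalone sketch. It is not the code from the PR and does not use Iceberg or RoaringBitmap: `SyncPositionIndex` and `buildIndex` are hypothetical names, `java.util.BitSet` stands in for the bitmap (it is likewise not thread-safe), each "delete file" is just an in-memory list of int positions, and a plain `ExecutorService` stands in for "ParallelIterable".

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelDeleteIndexSketch {

  // Hypothetical stand-in for a bitmap-backed position delete index.
  // BitSet is not thread-safe, so every access goes through a synchronized method.
  static final class SyncPositionIndex {
    private final BitSet deleted = new BitSet();

    synchronized void delete(int position) {
      deleted.set(position);
    }

    synchronized boolean isDeleted(int position) {
      return deleted.get(position);
    }

    synchronized int cardinality() {
      return deleted.cardinality();
    }
  }

  // Each inner list models the positions stored in one positional delete file
  // (an assumption made for this sketch). One task per file writes into the shared index.
  static SyncPositionIndex buildIndex(List<List<Integer>> deleteFiles, int threads) {
    SyncPositionIndex index = new SyncPositionIndex();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (List<Integer> file : deleteFiles) {
        pending.add(pool.submit(() -> {
          for (int pos : file) {
            index.delete(pos);
          }
        }));
      }
      // Wait for every file to be processed before handing the index to the reader.
      for (Future<?> f : pending) {
        f.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    } finally {
      pool.shutdown();
    }
    return index;
  }

  public static void main(String[] args) {
    List<List<Integer>> files = List.of(List.of(1, 5), List.of(5, 9), List.of(42));
    SyncPositionIndex index = buildIndex(files, 3);
    System.out.println(index.cardinality()); // prints 4 (positions 1, 5, 9, 42)
  }
}
```

The lock is held only for a single bit update, so contention stays small as long as reading a delete file costs far more than setting a bit. Whether that holds for a real workload is something to measure, not something this sketch shows.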