rbalamohan opened a new pull request, #6432:
URL: https://github.com/apache/iceberg/pull/6432

   Issue: https://github.com/apache/iceberg/issues/6387
   
   When tables are updated in "merge-on-read" mode, positional delete files 
are created. Read performance degrades noticeably, even with as few as 4+ 
positional delete files (I tried with TPC-DS queries).
   
   Depending on the workload, a data file may require reading multiple 
"positional delete" files to construct its delete positions. This does not sound 
expensive on its own, but when a large number of medium-sized files are present 
in a partition, the combinedfiletask ends up with many files. A task then 
processes those data files sequentially, and every data file reads multiple 
positional delete files, which causes the slowdown.
   
   This PR uses "ParallelIterable" in "Deletes::toPositionIndex". Since RoaringBitmap 
isn't thread-safe, synchronization was added in "BitmapPositionDeleteIndex". I tried 
this out on a local cluster and confirmed that the added synchronization is not expensive.
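   A minimal sketch of the idea, using only the Java standard library and under stated assumptions: "java.util.BitSet" stands in for RoaringBitmap (both are not thread-safe), a fixed thread pool stands in for Iceberg's "ParallelIterable", and each inner list stands in for the positions read from one positional delete file. "ParallelPositionIndex", "SyncPositionIndex" and "build" are hypothetical names for illustration, not Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPositionIndex {

  /**
   * Hypothetical stand-in for a bitmap-backed position delete index.
   * BitSet (like RoaringBitmap) is not thread-safe, so every access is synchronized.
   */
  static final class SyncPositionIndex {
    private final BitSet deleted = new BitSet();

    synchronized void delete(int position) {
      deleted.set(position);
    }

    synchronized boolean isDeleted(int position) {
      return deleted.get(position);
    }

    synchronized int cardinality() {
      return deleted.cardinality();
    }
  }

  /**
   * Reads several "delete files" concurrently and merges their positions into one index.
   * Each inner list is an assumed stand-in for the positions of one positional delete file.
   */
  static SyncPositionIndex build(List<List<Integer>> deleteFiles, int threads) throws Exception {
    SyncPositionIndex index = new SyncPositionIndex();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<Integer> file : deleteFiles) {
        // One task per delete file, so files are read in parallel instead of one after another.
        futures.add(pool.submit(() -> {
          for (int pos : file) {
            index.delete(pos); // synchronized write into the shared bitmap
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every reader and propagate any failure
      }
    } finally {
      pool.shutdown();
    }
    return index;
  }

  public static void main(String[] args) throws Exception {
    List<List<Integer>> files = List.of(List.of(1, 5, 9), List.of(2, 5), List.of(100));
    SyncPositionIndex index = build(files, 4);
    System.out.println(index.cardinality()); // 5 distinct positions: 1, 2, 5, 9, 100
    System.out.println(index.isDeleted(5));  // true
  }
}
```

   Synchronizing each "delete" call keeps the shared bitmap consistent while the files are read concurrently. An alternative (not what this PR describes) would be to fill one bitmap per delete file and merge them under a single lock.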


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


For additional commands, e-mail: issues-h...@iceberg.apache.org
