aokolnychyi commented on code in PR #9117:
URL: https://github.com/apache/iceberg/pull/9117#discussion_r1399860140


##########
data/src/main/java/org/apache/iceberg/data/DeleteFilter.java:
##########
@@ -245,18 +242,9 @@ private CloseableIterable<T> applyPosDeletes(CloseableIterable<T> records) {
 
     List<CloseableIterable<Record>> deletes = Lists.transform(posDeletes, this::openPosDeletes);
 
-    // if there are fewer deletes than a reasonable number to keep in memory, use a set

Review Comment:
   When this logic was added a few years ago, position deletes were loaded into 
a set (see the comment above). We have been using bitmaps for a while now. In 
fact, vectorized reads always build bitmaps and apply no threshold on the number 
of deletes, and that has worked really well. Position deletes represented as 
bitmaps should always fit in memory.
   
   Position deletes compress really well both on disk and in memory. We have 
seen this 100K threshold degrade jobs for no good reason.
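
   To illustrate why a bitmap is cheaper than a set here, below is a minimal sketch rather than the code in this PR. It uses `java.util.BitSet` as a stand-in for a compressed bitmap and assumes positions fit in an `int`; the class and method names (`PosDeleteSketch`, `toSet`, `toBitmap`, `liveRows`) are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Iceberg's DeleteFilter or its delete index.
class PosDeleteSketch {

  // Set-based index: every deleted position becomes a boxed Long plus a hash entry.
  static Set<Long> toSet(long[] deletedPositions) {
    Set<Long> set = new HashSet<>();
    for (long pos : deletedPositions) {
      set.add(pos);
    }
    return set;
  }

  // Bitmap-based index: one bit per row position up to the highest deleted one.
  // Assumption: positions fit in an int, which BitSet requires.
  static BitSet toBitmap(long[] deletedPositions) {
    BitSet bitmap = new BitSet();
    for (long pos : deletedPositions) {
      bitmap.set(Math.toIntExact(pos));
    }
    return bitmap;
  }

  // Returns the positions of rows that survive the deletes.
  static List<Long> liveRows(int rowCount, BitSet deleted) {
    List<Long> live = new ArrayList<>();
    for (int pos = 0; pos < rowCount; pos++) {
      if (!deleted.get(pos)) {
        live.add((long) pos);
      }
    }
    return live;
  }

  public static void main(String[] args) {
    long[] deletes = {1, 2, 3, 7};
    BitSet bitmap = toBitmap(deletes);
    System.out.println(liveRows(10, bitmap)); // prints [0, 4, 5, 6, 8, 9]
  }
}
```

   A plain `BitSet` is not compressed, so it understates the benefit for sparse deletes, but it shows the shape of the trade-off: membership checks are a bit test and memory grows with the position range rather than with one object per delete.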



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

