rbalamohan commented on code in PR #6432:
URL: https://github.com/apache/iceberg/pull/6432#discussion_r1051680991
########## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ##########

@@ -144,7 +146,18 @@ public static <T extends StructLike> PositionDeleteIndex toPositionIndex(
             deletes ->
                 CloseableIterable.transform(
                     locationFilter.filter(deletes), row -> (Long) POSITION_ACCESSOR.get(row)));
-    return toPositionIndex(CloseableIterable.concat(positions));
+    return toPositionIndex(positions);
+  }
+
+  public static PositionDeleteIndex toPositionIndex(List<CloseableIterable<Long>> positions) {

Review Comment:
   Thanks @rdblue. Yes, this happens when more than one positional delete ("POS") file qualifies for a single data file. For example, assume a trickle-feed job ingests data into a partition. Due to late-arriving data, other jobs update records in that partition and write POS files. The same data file can qualify under update jobs with different criteria, each of which adds another POS file. So during scanning, one data file may need to read several POS files (e.g., 4), which causes slowness. ParallelIterable helps in this case.
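   For context, a minimal sketch of how the list-based overload could fan reads over multiple qualifying POS files via ParallelIterable. ParallelIterable, ThreadPools, CloseableIterable, and the existing Deletes.toPositionIndex(CloseableIterable) are real Iceberg APIs; the helper class name, the choice of ThreadPools.getWorkerPool(), and the size-based branch are assumptions for illustration, and the actual change in this PR may differ.

```java
import java.util.List;

import org.apache.iceberg.deletes.Deletes;
import org.apache.iceberg.deletes.PositionDeleteIndex;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.util.ParallelIterable;
import org.apache.iceberg.util.ThreadPools;

// Hypothetical helper, not the PR's actual code.
public class ParallelPositionIndexSketch {

  private ParallelPositionIndexSketch() {}

  public static PositionDeleteIndex toPositionIndex(List<CloseableIterable<Long>> positions) {
    if (positions.size() > 1) {
      // Several POS files qualified for the same data file: read them concurrently
      // on the shared worker pool instead of chaining them sequentially.
      CloseableIterable<Long> parallelPositions =
          new ParallelIterable<>(positions, ThreadPools.getWorkerPool());
      return Deletes.toPositionIndex(parallelPositions);
    } else {
      // A single POS file gains nothing from parallelism; keep the simple concat path.
      return Deletes.toPositionIndex(CloseableIterable.concat(positions));
    }
  }
}
```

   The trade-off in a sketch like this is that ParallelIterable spins up worker tasks and buffers rows, so it only pays off when a data file really does have multiple POS files to read; the single-file path stays sequential.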