singhpk234 commented on code in PR #11273:
URL: https://github.com/apache/iceberg/pull/11273#discussion_r1794359228
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########

@@ -158,6 +163,26 @@ public void filter(Predicate[] predicates) {
     }
   }

+  protected Map<String, DeleteFileSet> dataToFileScopedDeletes() {

Review Comment:
   [doubt] Why do we need this whole hash-map of all the files with deletes to be broadcast from the driver to the executors? The entries are derived from the scan tasks anyway, and each Spark executor should already have its scan tasks, so can we not build a local hash-map within each executor and merge there? An executor only needs to apply the deletes for the data files its own tasks point to. Am I missing something here?

   [a bit orthogonal] Can we put an estimate on the size of this hash-map? If it grows very large it can fail the query; I think the broadcast size limit is 8GB, IIRC.
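
   To make the alternative concrete, here is a minimal sketch of what I have in mind, written against the Iceberg API as I understand it (`FileScanTask#deletes()`, `DeleteFileSet.create()`); the helper name and its placement are hypothetical, not the PR's code:

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.iceberg.FileScanTask;
   import org.apache.iceberg.util.DeleteFileSet;

   class LocalDeleteIndex {
     // Hypothetical executor-side helper: derive the data-file -> delete-files
     // mapping locally from the scan tasks the executor already holds, instead
     // of broadcasting one global map from the driver.
     static Map<String, DeleteFileSet> localDeletesFor(Iterable<FileScanTask> tasks) {
       Map<String, DeleteFileSet> deletes = new HashMap<>();
       for (FileScanTask task : tasks) {
         if (!task.deletes().isEmpty()) {
           deletes
               // location() on newer ContentFile; older versions would use path().toString()
               .computeIfAbsent(task.file().location(), ignored -> DeleteFileSet.create())
               .addAll(task.deletes());
         }
       }
       return deletes;
     }
   }
   ```

   The trade-off would be that each executor recomputes its slice of the map, but nothing beyond the tasks themselves has to ship over the network.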
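
   On the size question, a rough pre-broadcast check is possible with Spark's `SizeEstimator.estimate`, which is real Spark API; the wrapper class, the logger, and the 1 GiB threshold below are placeholders for illustration:

   ```java
   import java.util.Map;
   import org.apache.iceberg.util.DeleteFileSet;
   import org.apache.spark.util.SizeEstimator;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   class BroadcastSizeCheck {
     private static final Logger LOG = LoggerFactory.getLogger(BroadcastSizeCheck.class);

     // Rough JVM-side approximation of the map's in-memory footprint before
     // broadcasting; the 1 GiB threshold is illustrative, not a Spark limit.
     static void warnIfLarge(Map<String, DeleteFileSet> deletes) {
       long approxBytes = SizeEstimator.estimate(deletes);
       if (approxBytes > (1L << 30)) {
         LOG.warn("data-to-deletes map is ~{} bytes; broadcasting may be expensive", approxBytes);
       }
     }
   }
   ```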