singhpk234 commented on issue #12280: URL: https://github.com/apache/iceberg/issues/12280#issuecomment-2661558013
I see: if it is written this way, i.e. as a join, each eq delete file would be scanned only once, right? (That is also what Impala does.) Is there a configuration to read multiple eq delete files in a single execution task (essentially packing them)? Otherwise there will always be a parallelism issue if we try to rewrite eq deletes. Consider an `ICEBERG_SCAN` (rows from eq delete files) that could scan 10 files in parallel: if we compact / rewrite those 10 files into 1, the read parallelism drops to 1. Maybe it is better to bin-pack at the engine level at scan time? The problem is even worse in Spark, as a single eq delete file can get scanned multiple times. So we need some strategies around how to distribute these tasks.
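A minimal sketch of the engine-level bin-packing idea, in Python. Equality delete files are grouped into scan tasks up to a target size, so parallelism is decided at scan time rather than by how many files happen to exist on disk. `DeleteFile`, `pack_delete_files`, `target_task_bytes`, and `open_file_cost` are hypothetical names for illustration, not Iceberg or Spark APIs. Weighting each file by a minimum open cost mirrors the idea behind Iceberg's `read.split.open-file-cost` table property for data splits. The sketch also assumes each delete file is an unsplittable unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeleteFile:
    # Hypothetical descriptor for one equality delete file.
    path: str
    size_bytes: int


def pack_delete_files(
    files: List[DeleteFile],
    target_task_bytes: int,
    open_file_cost: int = 4 * 1024 * 1024,
) -> List[List[DeleteFile]]:
    """Greedily bin-pack equality delete files into scan tasks (hypothetical helper).

    Each file is weighted by max(size, open_file_cost) so that a pile of tiny
    files is not crammed into one task. Files are placed heaviest-first into
    the first task that still has room (first-fit decreasing); a file heavier
    than the target gets a task of its own, since files are never split here.
    """
    if target_task_bytes <= 0:
        raise ValueError("target_task_bytes must be positive")

    def weight(f: DeleteFile) -> int:
        return max(f.size_bytes, open_file_cost)

    tasks: List[List[DeleteFile]] = []
    loads: List[int] = []
    for f in sorted(files, key=weight, reverse=True):
        w = weight(f)
        for i, load in enumerate(loads):
            if load + w <= target_task_bytes:
                tasks[i].append(f)
                loads[i] += w
                break
        else:
            # No existing task has room: open a new one.
            tasks.append([f])
            loads.append(w)
    return tasks


if __name__ == "__main__":
    MB = 1024 * 1024
    # Ten small eq delete files: the engine can spread them over several tasks.
    small = [DeleteFile(f"eq-delete-{i}.parquet", 10 * MB) for i in range(10)]
    print([len(t) for t in pack_delete_files(small, target_task_bytes=32 * MB)])
    # → [3, 3, 3, 1]

    # The same data compacted into one file: only one task is possible.
    compacted = [DeleteFile("eq-delete-compacted.parquet", 100 * MB)]
    print([len(t) for t in pack_delete_files(compacted, target_task_bytes=32 * MB)])
    # → [1]
```

Ten 10 MB files with a 32 MB target pack into four tasks. A single 100 MB file produced by compacting them can only ever occupy one task in this model, which is the parallelism loss raised in the comment. Keeping the packing decision in the engine lets the target size be tuned per query instead of being fixed by the file layout.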