Re: [I] previous eq deletes handling on new write [iceberg]

via GitHub Sun, 16 Feb 2025 10:25:27 -0800


singhpk234 commented on issue #12280:
URL: https://github.com/apache/iceberg/issues/12280#issuecomment-2661558013


   I see. If it is written this way, i.e. as a join, each eq delete would be 
scanned only once, right? (That is the same as what Impala does.) Is there a 
configuration to read multiple eq deletes in a single execution task 
(essentially packing them)? Otherwise there will always be an issue with 
parallelism if we try to rewrite eq deletes. 
   
   Consider that `ICEBERG_SCAN rows from eq delete files)` could scan 10 files 
in parallel, but if we compacted / rewrote those 10 files into 1, the read 
parallelism is now reduced to 1. It may be better to bin-pack at the engine 
level at scan time. 
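
   To make the bin-packing idea concrete, here is a minimal sketch of packing 
eq delete files into scan tasks at planning time instead of rewriting them. 
Everything in it is hypothetical (`DeleteFile`, `pack_delete_files`, 
`target_task_bytes`, the file names and sizes); it is not an existing Iceberg, 
Spark, or Impala API or configuration, just a greedy first-fit-decreasing 
packing to show the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteFile:
    """Hypothetical descriptor of one eq delete file."""
    path: str
    size_bytes: int


def pack_delete_files(files, target_task_bytes):
    """Greedily pack delete files into tasks of at most target_task_bytes.

    Files are placed largest first into the first task that still has room.
    A file larger than the target gets a task of its own.
    """
    bins = []  # each entry is [total_bytes, [files]]
    for f in sorted(files, key=lambda f: f.size_bytes, reverse=True):
        for b in bins:
            if b[0] + f.size_bytes <= target_task_bytes:
                b[0] += f.size_bytes
                b[1].append(f)
                break
        else:
            # No existing task has room, so open a new one.
            bins.append([f.size_bytes, [f]])
    return [b[1] for b in bins]


MB = 1024 * 1024
# Assumed example: 10 eq delete files of 10 MB each, 32 MB per task.
files = [DeleteFile(f"eq-delete-{i}.parquet", 10 * MB) for i in range(10)]
tasks = pack_delete_files(files, target_task_bytes=32 * MB)
print(len(tasks), [len(t) for t in tasks])
```

   With these assumed sizes the 10 files end up in 4 tasks, so the files stay 
as they are on disk and the degree of parallelism is tuned by the target size 
rather than fixed by compaction.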
   
   The problem is even worse in Spark, since an eq delete can get scanned 
multiple times for a single file, so we need some strategies for distributing 
these tasks. 
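
   As a rough illustration of the repeated-scan cost, the sketch below counts 
eq delete file reads under two simplified models. It assumes that in the 
per-task model every scan task opens every eq delete file that applies to it, 
and that a join-style plan reads each distinct eq delete file once. The 
function names and the example mapping are hypothetical; this is a counting 
model, not how Spark or Impala actually plan.

```python
def scans_per_task_apply(task_to_deletes):
    """Total delete-file reads when each task opens all deletes that apply to it."""
    return sum(len(deletes) for deletes in task_to_deletes.values())


def scans_with_join(task_to_deletes):
    """Total delete-file reads when each distinct delete file is read once."""
    distinct = {d for deletes in task_to_deletes.values() for d in deletes}
    return len(distinct)


# Assumed example: 4 scan tasks, and the same 2 eq delete files apply to each.
task_to_deletes = {
    f"task-{i}": ["eq-delete-a.parquet", "eq-delete-b.parquet"]
    for i in range(4)
}
per_task = scans_per_task_apply(task_to_deletes)
joined = scans_with_join(task_to_deletes)
print(per_task, joined)
```

   Under these assumptions the per-task model reads delete files 8 times while 
the join-style model reads them 2 times, and the gap grows with the number of 
tasks each eq delete applies to.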


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

