pvary commented on issue #8802: URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1757603105
Currently there is no way to order the scan tasks. The planning side is deliberately designed so that planning can be done by parallel threads (reading manifest files in parallel), so the tasks are not produced in any guaranteed order.

Sometimes we need something similar in the Flink Source, and we ended up creating our own comparator for this, which compares Iceberg splits (which are wrappers around ScanTasks).

You can do something similar in Java code, with one serious caveat: for a big table you might not want, or be able, to keep all of the tasks in memory, which is needed for sorting. What we do in Flink is limit the number of snapshots read at once.

I hope this helps,
Peter
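A minimal sketch of the idea above, in plain Java: collect a bounded number of tasks into memory, then sort them with a custom comparator. Everything here is an assumption made for illustration. `PlannedTask` is a hypothetical stand-in rather than an Iceberg class, the sequence-number ordering is just one possible sort key, and bounding by task count is a simplification of the snapshot limit described above. In real code the iterator would be fed from the result of `table.newScan().planFiles()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a planned scan task; NOT an Iceberg class.
final class PlannedTask {
    final String path;
    final long sequenceNumber;

    PlannedTask(String path, long sequenceNumber) {
        this.path = path;
        this.sequenceNumber = sequenceNumber;
    }
}

final class ScanTaskOrdering {
    // Assumed ordering: by sequence number, then by path so ties are deterministic.
    static final Comparator<PlannedTask> BY_SEQUENCE =
        Comparator.comparingLong((PlannedTask t) -> t.sequenceNumber)
                  .thenComparing(t -> t.path);

    // Drains at most maxTasks tasks from the iterator and returns them sorted.
    // Only this batch is held in memory, so memory use stays bounded.
    static List<PlannedTask> nextSortedBatch(Iterator<PlannedTask> tasks, int maxTasks) {
        List<PlannedTask> batch = new ArrayList<>();
        while (batch.size() < maxTasks && tasks.hasNext()) {
            batch.add(tasks.next());
        }
        batch.sort(BY_SEQUENCE);
        return batch;
    }
}
```

Note that each batch is sorted only internally. Tasks in a later batch may still sort before tasks in an earlier one unless the input is already grouped, for example by snapshot.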