Re: [I] How to read data in the order in which files are commited? [iceberg]

via GitHub Wed, 11 Oct 2023 05:42:16 -0700


pvary commented on issue #8802:
URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1757603105


   Currently there is no way to order the scan tasks. The planning side 
is deliberately built so that planning itself can be done by parallel 
threads (reading manifest files in parallel), so no task order is guaranteed.
   
   Sometimes we need to do a similar thing in the Flink Source, and we ended up 
creating our own comparator for this, which compares Iceberg splits (which are a 
wrapper around ScanTasks).
   
   You can do something similar in Java code, with one serious caveat: 
for a big table you might not want, or be able, to keep all of the tasks in 
memory, which is needed for sorting. What we do in Flink is limit the number of 
snapshots read at once.
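
A rough sketch of the comparator idea, under stated assumptions: `Task` is a hypothetical stand-in for a planned scan task (it is not an Iceberg class), and its `sequenceNumber` field is assumed to increase with every commit. All planned tasks are collected into a list before sorting, which is exactly the memory cost mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CommitOrderSketch {

    /** Hypothetical stand-in for a planned scan task; not an Iceberg type. */
    public static final class Task {
        final String path;
        final long sequenceNumber; // assumption: grows with each commit

        public Task(String path, long sequenceNumber) {
            this.path = path;
            this.sequenceNumber = sequenceNumber;
        }
    }

    /** Orders tasks by commit sequence, then by path to keep ties deterministic. */
    static final Comparator<Task> COMMIT_ORDER =
        Comparator.comparingLong((Task t) -> t.sequenceNumber)
                  .thenComparing(t -> t.path);

    /** Materializes every planned task in memory, then sorts the list. */
    public static List<Task> sortByCommitOrder(Iterable<Task> planned) {
        List<Task> tasks = new ArrayList<>();
        planned.forEach(tasks::add);
        tasks.sort(COMMIT_ORDER);
        return tasks;
    }

    public static void main(String[] args) {
        // Planning may hand tasks back in any order.
        List<Task> planned = List.of(
            new Task("c.parquet", 3L),
            new Task("a.parquet", 1L),
            new Task("b.parquet", 2L));
        for (Task t : sortByCommitOrder(planned)) {
            System.out.println(t.sequenceNumber + " " + t.path);
        }
    }
}
```

To keep memory bounded on a large table, the same sort could be applied to one limited range of snapshots at a time rather than to the whole table.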
   
   I hope this helps,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


