Re: [PR] feat: make file scan task serializable [iceberg-rust]

via GitHub Tue, 21 May 2024 23:31:54 -0700


ZENOTME commented on PR #377:
URL: https://github.com/apache/iceberg-rust/pull/377#issuecomment-2123974633


   > This seems reasonable, but perhaps we might want to consider having this 
as a separate method to the existing `plan_files` though so that anyone who is 
using the existing stream of file plan tasks does not get broken by this.
   
   I think we don't need a separate method for this.🤔 We just need to make 
`FileScanTask` implement `Serialize` and `Deserialize`; users can then call 
`plan_files()` to get the `FileScanTask`s and transfer them to the compute 
nodes. A simple case might look like the following, which reads all files at 
once. Users can also use the stream interface for optimizations, e.g. reading 
in a streaming fashion.
   ```
   let plan_file_stream = scan.plan_files();
   
   // Collect all file scan tasks from the stream.
   // (`#[for_await]` comes from the `futures-async-stream` crate.)
   let mut file_scans = vec![];
   #[for_await]
   for file_scan in plan_file_stream {
     file_scans.push(file_scan);
   }
   
   // Send the file scan tasks to the compute node. The compute node can 
   // read them all at once and use the reader to read the data.
   arrow_reader(stream::iter(file_scans))
   ```
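
As a rough, std-only sketch of the flow described above (plan tasks, encode them, ship them to a compute node, decode them there): the `FileScanTask` struct below is a hypothetical, simplified stand-in rather than the real iceberg-rust type, `to_wire`/`from_wire` are hand-written placeholders for what `#[derive(Serialize, Deserialize)]` would provide, and a channel plus a thread play the role of the network and the compute node.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, simplified stand-in for iceberg-rust's `FileScanTask`.
#[derive(Debug, Clone, PartialEq)]
struct FileScanTask {
    data_file_path: String,
    start: u64,
    length: u64,
}

impl FileScanTask {
    /// Hand-written encoding; a placeholder for `#[derive(Serialize)]`.
    fn to_wire(&self) -> String {
        format!("{}\t{}\t{}", self.start, self.length, self.data_file_path)
    }

    /// Hand-written decoding; a placeholder for `#[derive(Deserialize)]`.
    fn from_wire(line: &str) -> Option<Self> {
        let mut parts = line.splitn(3, '\t');
        let start = parts.next()?.parse::<u64>().ok()?;
        let length = parts.next()?.parse::<u64>().ok()?;
        let data_file_path = parts.next()?.to_string();
        Some(Self { data_file_path, start, length })
    }
}

/// Encodes the planned tasks on the "coordinator", sends them over a
/// channel, and decodes them on a "compute node" thread.
fn dispatch(tasks: Vec<FileScanTask>) -> Vec<FileScanTask> {
    let (tx, rx) = mpsc::channel::<String>();

    // Compute node: decode each task as it arrives.
    let worker = thread::spawn(move || {
        rx.iter()
            .filter_map(|line| FileScanTask::from_wire(&line))
            .collect::<Vec<_>>()
    });

    // Coordinator: encode and send every planned task.
    for task in &tasks {
        tx.send(task.to_wire()).expect("compute node hung up");
    }
    drop(tx); // close the channel so the worker's iterator ends

    worker.join().expect("compute node panicked")
}
```

Calling `dispatch` with a vector of tasks should hand back an equal vector on the receiving side. With real serde derives on the task type, the two hand-written methods would go away and any serde format could carry the tasks, while `plan_files()` itself stays unchanged.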


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


