ZENOTME commented on PR #377: URL: https://github.com/apache/iceberg-rust/pull/377#issuecomment-2123974633
> This seems reasonable, but perhaps we might want to consider having this as a separate method to the existing `plan_files` though so that anyone who is using the existing stream of file plan tasks does not get broken by this.

I think we don't need a separate method for this. 🤔 We just need `FileScanTask` to implement `Serialize` and `Deserialize`; then users can call `plan_files()` to get the `FileScanTask`s and transfer them to the compute node.

A simple case might look like the following, which reads all the files at once. Users can also use the stream interface for optimizations, e.g. reading the tasks in a streaming fashion.

```
let plan_file_stream = scan.plan_files();
// Collect all file scan tasks.
let mut file_scans = vec![];
#[for_await]
for file_scan in plan_file_stream {
    file_scans.push(file_scan);
}
// Send the file scan tasks to the compute node. The compute node can
// read them all at once and use the reader to read the data.
arrow_reader(stream::iter(file_scans.into_iter()))
```
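
To illustrate the idea in the comment above (planned tasks are plain data that can be encoded, shipped to a compute node, and decoded there), here is a minimal std-only sketch. Everything in it is an assumption for illustration: the `FileScanTask` struct is a hypothetical stand-in with made-up fields, not the iceberg-rust type; the tab-separated `encode`/`decode` pair is a placeholder for what `serde` derives would provide; and a thread with a channel plays the role of the compute node.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a planned scan task; NOT the iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
struct FileScanTask {
    data_file_path: String,
    start: u64,
    length: u64,
}

impl FileScanTask {
    /// Encode as one tab-separated line (placeholder for a serde-based format).
    /// Assumes the path itself contains no tab characters.
    fn encode(&self) -> String {
        format!("{}\t{}\t{}", self.data_file_path, self.start, self.length)
    }

    /// Decode a line produced by `encode`; returns `None` on malformed input.
    fn decode(line: &str) -> Option<Self> {
        let mut parts = line.split('\t');
        let data_file_path = parts.next()?.to_string();
        let start = parts.next()?.parse().ok()?;
        let length = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None; // too many fields
        }
        Some(FileScanTask { data_file_path, start, length })
    }
}

/// Ship encoded tasks over a channel and decode them on a worker thread,
/// which stands in for a remote compute node.
fn ship_tasks(tasks: Vec<FileScanTask>) -> Vec<FileScanTask> {
    let (tx, rx) = mpsc::channel::<String>();
    let worker = thread::spawn(move || {
        // `rx.iter()` ends once every sender has been dropped.
        rx.iter()
            .filter_map(|line| FileScanTask::decode(&line))
            .collect::<Vec<_>>()
    });
    for task in &tasks {
        tx.send(task.encode()).expect("worker hung up");
    }
    drop(tx); // close the channel so the worker can finish
    worker.join().expect("worker panicked")
}

fn main() {
    let planned = vec![
        FileScanTask { data_file_path: "s3://bucket/a.parquet".into(), start: 0, length: 128 },
        FileScanTask { data_file_path: "s3://bucket/b.parquet".into(), start: 128, length: 64 },
    ];
    let received = ship_tasks(planned.clone());
    assert_eq!(planned, received);
    println!("compute node received {} tasks", received.len());
}
```

With real `Serialize`/`Deserialize` implementations, the hand-written encoding would be replaced by whichever wire format the engine already uses, and the existing `plan_files()` stream would stay unchanged for current callers.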