liurenjie1024 commented on code in PR #207: URL: https://github.com/apache/iceberg-rust/pull/207#discussion_r1502118976
########## crates/iceberg/src/spec/manifest.rs: ########## @@ -855,6 +855,11 @@ impl ManifestEntry { self.data_file.content } + /// Content type of this manifest entry. + pub fn file_format(&self) -> DataFileFormat { Review Comment: ```suggestion #[inline(always)] pub fn file_format(&self) -> DataFileFormat { ``` ########## crates/iceberg/src/scan.rs: ########## @@ -180,7 +185,42 @@ pub type ArrowRecordBatchStream = BoxStream<'static, crate::Result<RecordBatch>> impl FileScanTask { /// Returns a stream of arrow record batches. pub async fn execute(&self) -> crate::Result<ArrowRecordBatchStream> { - todo!() + match self.data_file.content_type() { Review Comment: After checking java/python's implementation, I think it's in appropriate to have this api. To convert a `FileScanTask` to arrow record batch stream, we may need some other configurations: 1. Record batch size, e.g. number of records of each record batch, this may affect memory usage of upstream executors 2. `FileIO`, as in your implementation. Also it's more general to have multi `FileScanTask`s in size based reader. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org