sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1598841588
########## crates/iceberg/src/scan.rs: ########## @@ -189,66 +195,20 @@ impl TableScan { self.case_sensitive, )?; - let mut partition_filter_cache = PartitionFilterCache::new(); - let mut manifest_evaluator_cache = ManifestEvaluatorCache::new(); - - Ok(try_stream! { - let manifest_list = context - .snapshot - .load_manifest_list(&context.file_io, &context.table_metadata) - .await?; - - for entry in manifest_list.entries() { - if !Self::content_type_is_data(entry) { - continue; - } + let (sender, receiver) = channel(CHANNEL_BUFFER_SIZE); - let partition_spec_id = entry.partition_spec_id; + let manifest_list = context + .snapshot + .load_manifest_list(&context.file_io, &context.table_metadata) + .await?; - let partition_filter = partition_filter_cache.get( - partition_spec_id, - &context, - )?; + spawn(async move { Review Comment: I went with using an mpsc channel to avoid having 2 nested try_for_each_concurrent macros. On reflection the inner try_for_each_concurrent is probably unnecessary, since there's only one async operation needed for each manifest file, despite that manifest file producing n DataFileTasks. So I can probably ditch the channel too. The channel in reader.rs is also overkill as I've only got a single try_for_each_concurrent there. I just wanted to get the ball rolling on this while we're waiting for the filtering code to get signed off, it's fun writing this kind of stuff 😁 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org