Re: [PR] Concurrent table scans [iceberg-rust]

via GitHub Mon, 13 May 2024 10:52:36 -0700


sdd commented on code in PR #373:
URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1598841588



##########
crates/iceberg/src/scan.rs:
##########
@@ -189,66 +195,20 @@ impl TableScan {
             self.case_sensitive,
         )?;
 
-        let mut partition_filter_cache = PartitionFilterCache::new();
-        let mut manifest_evaluator_cache = ManifestEvaluatorCache::new();
-
-        Ok(try_stream! {
-            let manifest_list = context
-                .snapshot
-                .load_manifest_list(&context.file_io, &context.table_metadata)
-                .await?;
-
-            for entry in manifest_list.entries() {
-                if !Self::content_type_is_data(entry) {
-                    continue;
-                }
+        let (sender, receiver) = channel(CHANNEL_BUFFER_SIZE);
 
-                let partition_spec_id = entry.partition_spec_id;
+        let manifest_list = context
+            .snapshot
+            .load_manifest_list(&context.file_io, &context.table_metadata)
+            .await?;
 
-                let partition_filter = partition_filter_cache.get(
-                    partition_spec_id,
-                    &context,
-                )?;
+        spawn(async move {

Review Comment:
   I went with using an mpsc channel to avoid having 2 nested 
try_for_each_concurrent macros. On reflection the inner try_for_each_concurrent 
is probably unnecessary, since there's only one async operation needed for each 
manifest file, despite that manifest file producing n DataFileTasks. So I can 
probably ditch the channel too. The channel in reader.rs is also overkill as 
I've only got a single try_for_each_concurrent there.
   
   I just wanted to get the ball rolling on this while we're waiting for the 
filtering code to get signed off, it's fun writing this kind of stuff 😁



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Concurrent table scans [iceberg-rust]

Reply via email to