ZENOTME commented on issue #398:
URL: https://github.com/apache/iceberg-rust/issues/398#issuecomment-2169216557

   > > > Sorry, I don't quite get what's the meta here?
   > > 
   > > 
   > > e.g. field_id, predicate. The info that can be shared by the set of
   > > FileScanTasks. Maybe "metadata" is not precise here.
   > 
   > I agree that they are the same across tasks, but I don't quite get how
   > to share them. A distributed query engine like Spark distributes tasks to
   > different hosts. One way to share them is to use broadcast in Spark,
   > but that increases implementation complexity.
   
   In this model, it's the user's responsibility to share (distribute) the
   "metadata" to different hosts. What we provide is a method to get the
   "metadata" from the scan, and the "metadata" is serializable/deserializable.
   
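   ```
   use serde::{Deserialize, Serialize};

   // Information shared by every FileScanTask of one scan.
   #[derive(Serialize, Deserialize)]
   struct Metadata {
     field_ids: Vec<i32>,
     predicate: BoundPredicate,
   }

   // master node
   // `scan.meta()` is the proposed method; `send` stands for whatever
   // transport the compute engine uses.
   let metadata = scan.meta();
   send(metadata);

   // worker node
   // `with_metadata` is the proposed builder method; `receive` is the
   // engine's counterpart to `send`.
   let metadata = receive();
   let reader = ArrowReaderBuilder::new().with_metadata(metadata);
   let scan_tasks = receive();
   for task in scan_tasks {
     reader.read(task);
   }
   ```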
   
   > One way to share them is to use broadcast in Spark
   
   I'm not familiar with this, but I think it can be a way for Spark to
   distribute the "metadata". Different compute engines can do this in different
   ways. It's flexible, but as you say, it increases implementation complexity.
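
   To make the flow above concrete, here is a minimal std-only sketch of the
   "send the shared metadata once, then send tasks" idea. Everything in it is
   an assumption for illustration: the `Metadata`, `FileScanTask` and `Reader`
   types are simplified stand-ins (the predicate is a plain `String` instead
   of a `BoundPredicate`), and an in-process channel plays the role of the
   engine's transport. It is not the iceberg-rust API.

   ```rust
   use std::sync::mpsc;
   use std::thread;

   /// Hypothetical stand-in for the shared scan "metadata".
   /// A real version would hold a BoundPredicate and derive Serialize/Deserialize.
   #[derive(Debug)]
   struct Metadata {
       field_ids: Vec<i32>,
       predicate: String,
   }

   /// Hypothetical stand-in for a FileScanTask: only per-file information.
   #[derive(Debug)]
   struct FileScanTask {
       data_file_path: String,
   }

   /// What the "master" sends to a "worker": metadata once, then tasks.
   enum Message {
       Metadata(Metadata),
       Task(FileScanTask),
   }

   /// Hypothetical reader that is configured once with the shared metadata.
   struct Reader {
       metadata: Metadata,
   }

   impl Reader {
       /// Describes the read instead of touching any file.
       fn read(&self, task: &FileScanTask) -> String {
           format!(
               "{} fields={:?} filter={}",
               task.data_file_path, self.metadata.field_ids, self.metadata.predicate
           )
       }
   }

   /// Worker side: build the reader from the first metadata message,
   /// then reuse it for every task that follows.
   fn worker(rx: mpsc::Receiver<Message>) -> Vec<String> {
       let mut reader: Option<Reader> = None;
       let mut results = Vec::new();
       for msg in rx {
           match msg {
               Message::Metadata(metadata) => reader = Some(Reader { metadata }),
               Message::Task(task) => {
                   let r = reader.as_ref().expect("metadata must arrive before tasks");
                   results.push(r.read(&task));
               }
           }
       }
       results
   }

   /// Master side: send the metadata once, then the individual tasks.
   fn run_demo() -> Vec<String> {
       let (tx, rx) = mpsc::channel();
       let handle = thread::spawn(move || worker(rx));

       let metadata = Metadata {
           field_ids: vec![1, 2],
           predicate: "id > 10".to_string(),
       };
       tx.send(Message::Metadata(metadata)).unwrap();
       for path in ["a.parquet", "b.parquet"] {
           let task = FileScanTask { data_file_path: path.to_string() };
           tx.send(Message::Task(task)).unwrap();
       }
       drop(tx); // closing the channel lets the worker loop finish
       handle.join().unwrap()
   }
   ```

   Calling `run_demo()` returns one description per task, each built from the
   same metadata. In a real engine the channel would be replaced by its own
   mechanism (for example a broadcast), which is where the extra
   implementation complexity mentioned above comes in.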


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

