ZENOTME commented on issue #398: URL: https://github.com/apache/iceberg-rust/issues/398#issuecomment-2169216557
> > > Sorry, I don't quite get what's the "meta" here?
> >
> > e.g. field_id, predicate. The info that can be shared by the set of FileScanTasks. Maybe "metadata" is not precise here.
>
> I agree that they are the same in different tasks, but I don't quite get how to share them. A distributed query engine like Spark distributes tasks to different hosts; one way to do that is to use Spark's broadcast mechanism, but that increases implementation complexity.

In this model, it's the user's responsibility to share (distribute) the "metadata" to the different hosts. What we provide is a method to get the "metadata" from the scan, and the "metadata" is serializable/deserializable:

```rust
#[derive(Serialize, Deserialize)]
struct Metadata {
    field_ids: Vec<i32>,
    predicate: BoundPredicate,
}

// master node
let metadata = scan.meta();
send(metadata);

// worker node
let metadata = receive();
let reader = ArrowReaderBuilder::new().with_metadata(metadata);
let scan_tasks = receive();
for task in scan_tasks {
    reader.read(task);
}
```

> one way to do that is utilizing broadcast in spark

I'm not familiar with this, but I think it can be one way for Spark to distribute the "metadata". Different compute engines can distribute it in different ways. It's flexible, but as you say, it increases implementation complexity.