ZENOTME commented on issue #1540: URL: https://github.com/apache/iceberg-rust/issues/1540#issuecomment-3111981699
> Hi [@ZENOTME](https://github.com/ZENOTME),
>
> > how write node and commit node interact in this path
>
> The write node will need to serialize `Vec<DataFile>` and send it to the commit node in a stream, and the commit node will deserialize it. My draft [here](https://github.com/apache/iceberg-rust/pull/1511) will probably make more sense than me explaining it in text :)
>
> I've discussed the DataSink trait with [@liurenjie1024](https://github.com/liurenjie1024) offline before, and we are not sure about some design details in `DataSink`:
>
> * The repartitioning/demuxing and the subsequent writing process in DataSinkExec are done on a single node using multiple threads ([link](https://github.com/apache/datafusion/blob/4084894ebe1889b6ce80f3a207453154de274b03/datafusion/datasource/src/file_sink_config.rs#L99)), and demuxing only makes sense when you don't know the parallelism beforehand, which is not the case here because parallelism should be configurable
> * DataSinkExec enforces that its input is single-partitioned ([link](https://github.com/apache/datafusion/blob/14ac31d0e0d3cdb1c38bf1bf0d5afe24ee4f05b3/datafusion/datasource/src/sink.rs#L225))

I see. Thanks for the explanation, @CTTY! This design LGTM. Let's move!
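For readers following the thread, the write-node to commit-node handoff described in the quoted comment can be illustrated with a toy, std-only sketch. Everything here is an assumption made for illustration: `DataFileStub` is a hypothetical two-field stand-in for iceberg's `DataFile`, the tab-separated encoding is invented, and an in-process `mpsc` channel stands in for the stream between nodes. It does not reflect how the linked draft PR actually serializes or transports data files.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for iceberg's `DataFile` (assumption: the real
/// type carries many more fields and is serialized differently).
#[derive(Debug, Clone, PartialEq)]
struct DataFileStub {
    file_path: String,
    record_count: u64,
}

/// Toy wire format (assumption): "<record_count>\t<file_path>".
fn encode(f: &DataFileStub) -> String {
    format!("{}\t{}", f.record_count, f.file_path)
}

/// Inverse of `encode`; returns `None` for malformed input.
fn decode(line: &str) -> Option<DataFileStub> {
    let (count, path) = line.split_once('\t')?;
    Some(DataFileStub {
        file_path: path.to_string(),
        record_count: count.parse().ok()?,
    })
}

/// Simulates several write nodes (threads) that each serialize their data
/// files and stream them to one commit node, which deserializes everything
/// and returns the collected files sorted by path.
fn write_then_commit(per_writer: Vec<Vec<DataFileStub>>) -> Vec<DataFileStub> {
    let (tx, rx) = mpsc::channel::<String>();

    let handles: Vec<_> = per_writer
        .into_iter()
        .map(|files| {
            let tx = tx.clone();
            thread::spawn(move || {
                for f in &files {
                    tx.send(encode(f)).expect("commit node hung up");
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees end-of-stream once
    // every writer thread has finished and dropped its clone.
    drop(tx);
    for h in handles {
        h.join().expect("writer thread panicked");
    }

    // Commit side: deserialize every message received from the stream.
    let mut all: Vec<DataFileStub> = rx.iter().filter_map(|s| decode(&s)).collect();
    all.sort_by(|a, b| a.file_path.cmp(&b.file_path));
    all
}
```

The number of writers is simply the length of the outer `Vec`, which mirrors the point in the quoted comment that parallelism is known and configurable up front rather than discovered by demuxing.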
