ZENOTME commented on issue #1540:
URL: https://github.com/apache/iceberg-rust/issues/1540#issuecomment-3111981699

   > Hi [@ZENOTME](https://github.com/ZENOTME) ,
   > 
   > > how write node and commit node Interact in this path
   > 
   > Write node will need to serialize `Vec<DataFile>` and send it to commit 
node in a stream, and commit node will deserialize it. My draft 
[here](https://github.com/apache/iceberg-rust/pull/1511) will probably make 
more sense than me explaining it in text :)
   > 
   > I've discussed with [@liurenjie1024](https://github.com/liurenjie1024) 
offline over the DataSink trait before, and we are not sure about some design 
details in `DataSink`:
   > 
   > * The repartitioning/demuxing and the subsequent writing in 
DataSinkExec are done on a single node using multiple threads 
([link](https://github.com/apache/datafusion/blob/4084894ebe1889b6ce80f3a207453154de274b03/datafusion/datasource/src/file_sink_config.rs#L99)).
 Demuxing only makes sense when you don't know the parallelism beforehand, 
which is not the case here because parallelism should be configurable
   > * DataSinkExec enforces input to be single partitioned 
([link](https://github.com/apache/datafusion/blob/14ac31d0e0d3cdb1c38bf1bf0d5afe24ee4f05b3/datafusion/datasource/src/sink.rs#L225))
   
   I see. Thanks for the explanation, @CTTY! This design LGTM. Let's move forward!
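
For readers following along, here is a rough sketch of the write node / commit node interaction described in the quote. It is not taken from the draft PR: `DataFile` here is a hypothetical two-field stand-in for the real iceberg type, the tab-separated wire format and the names `write_node`, `commit_node`, and `run` are made up for illustration, and a `std::sync::mpsc` channel stands in for the stream between nodes.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for iceberg's `DataFile`; the real type has many more fields.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    file_path: String,
    record_count: u64,
}

/// Encode one file as a `path<TAB>count` line (illustrative wire format only).
fn serialize(file: &DataFile) -> String {
    format!("{}\t{}", file.file_path, file.record_count)
}

/// Decode a line produced by `serialize`; returns `None` on malformed input.
fn deserialize(line: &str) -> Option<DataFile> {
    let (path, count) = line.rsplit_once('\t')?;
    Some(DataFile {
        file_path: path.to_string(),
        record_count: count.parse().ok()?,
    })
}

/// Write node: one per partition. It pretends to have written one file and
/// sends the serialized metadata downstream.
fn write_node(partition: usize, tx: mpsc::Sender<String>) {
    let written = vec![DataFile {
        file_path: format!("data/part-{partition}.parquet"),
        record_count: 100 * (partition as u64 + 1),
    }];
    for file in &written {
        tx.send(serialize(file)).expect("commit node hung up");
    }
}

/// Commit node: drains the stream and deserializes every entry. A real
/// implementation would then commit all files to the table in one transaction.
fn commit_node(rx: mpsc::Receiver<String>) -> Vec<DataFile> {
    let mut files: Vec<DataFile> = rx
        .iter()
        .filter_map(|line| deserialize(&line))
        .collect();
    // Sort so the result does not depend on thread scheduling.
    files.sort_by(|a, b| a.file_path.cmp(&b.file_path));
    files
}

/// Run `parallelism` write nodes feeding a single commit node.
fn run(parallelism: usize) -> Vec<DataFile> {
    let (tx, rx) = mpsc::channel();
    let writers: Vec<_> = (0..parallelism)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || write_node(p, tx))
        })
        .collect();
    drop(tx); // the channel closes once every writer's clone is dropped
    for w in writers {
        w.join().expect("writer panicked");
    }
    commit_node(rx)
}
```

Calling `run(3)` from a `main` function yields three `DataFile` entries, one per write node. The parallelism is an explicit parameter, which mirrors the point in the quote that it is known up front rather than discovered by demuxing.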


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

