yegangy0718 opened a new pull request, #6382:
URL: https://github.com/apache/iceberg/pull/6382

   This PR is created as part of issue 
https://github.com/apache/iceberg/issues/6303 and project 
https://github.com/apache/iceberg/projects/27
   
   In this PR, we focus on bin packing based on traffic distribution 
statistics. This works well for data that is skewed on partition columns (like 
event time). It requires calculating traffic distribution statistics across 
partition columns and using those statistics to guide shuffling decisions.
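
   To illustrate the idea only: a minimal sketch of greedy bin packing driven 
by per-key traffic counts. The class name `TrafficBinPacker`, the `assign` 
method, and the use of plain `String` keys are all hypothetical and are not 
taken from this PR; the actual assignment logic may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of this PR): packs keys onto writer subtasks
// using observed per-key record counts as weights.
final class TrafficBinPacker {
  private TrafficBinPacker() {}

  // Heaviest keys first, each going to the currently least-loaded subtask.
  static Map<String, Integer> assign(Map<String, Long> keyCounts, int numSubtasks) {
    if (numSubtasks <= 0) {
      throw new IllegalArgumentException("numSubtasks must be positive");
    }

    List<Map.Entry<String, Long>> entries = new ArrayList<>(keyCounts.entrySet());
    // Sort by descending count; break ties by key so the result is deterministic.
    entries.sort((a, b) -> {
      int byCount = Long.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });

    long[] load = new long[numSubtasks];
    Map<String, Integer> assignment = new HashMap<>();
    for (Map.Entry<String, Long> entry : entries) {
      // Pick the subtask with the smallest accumulated load (lowest index wins ties).
      int target = 0;
      for (int i = 1; i < numSubtasks; i++) {
        if (load[i] < load[target]) {
          target = i;
        }
      }
      load[target] += entry.getValue();
      assignment.put(entry.getKey(), target);
    }
    return assignment;
  }
}
```

   With skewed event-time counts such as 70/20/10 and two subtasks, the hot key 
gets a subtask to itself and the two colder keys share the other one.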
   
   Changes:
   1. Implement ShuffleOperator, which will be added before the Iceberg writer 
operator to collect the data distribution by key (generated from the provided 
KeySelector)
   2. Implement ShuffleRecordWrapper, which contains either the record or the 
data distribution information
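
   As a rough sketch of the wrapper idea, assuming an "exactly one of two 
payloads" design: the class below is hypothetical (`RecordOrStatistics`, its 
factory methods, and the `Map<K, Long>` statistics type are illustrative names 
and are not the PR's actual ShuffleRecordWrapper API).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration (not the PR's class): a wrapper that carries
// either a data record or a per-key count map, never both.
final class RecordOrStatistics<R, K> {
  private final R record;
  private final Map<K, Long> statistics;

  private RecordOrStatistics(R record, Map<K, Long> statistics) {
    this.record = record;
    this.statistics = statistics;
  }

  static <R, K> RecordOrStatistics<R, K> fromRecord(R record) {
    if (record == null) {
      throw new IllegalArgumentException("record must not be null");
    }
    return new RecordOrStatistics<>(record, null);
  }

  static <R, K> RecordOrStatistics<R, K> fromStatistics(Map<K, Long> statistics) {
    if (statistics == null) {
      throw new IllegalArgumentException("statistics must not be null");
    }
    // Defensive copy so later changes to the caller's map are not visible here.
    return new RecordOrStatistics<>(
        null, Collections.unmodifiableMap(new HashMap<>(statistics)));
  }

  boolean hasRecord() {
    return record != null;
  }

  boolean hasStatistics() {
    return statistics != null;
  }

  R record() {
    if (record == null) {
      throw new IllegalStateException("wrapper holds statistics, not a record");
    }
    return record;
  }

  Map<K, Long> statistics() {
    if (statistics == null) {
      throw new IllegalStateException("wrapper holds a record, not statistics");
    }
    return statistics;
  }
}
```

   A downstream consumer can then branch on `hasRecord()` to decide whether to 
write the row or to update its view of the distribution.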
   
   I will open follow-up PRs to implement ShuffleCoordinator, the logic for 
sending and receiving the data distribution between coordinator and operator, 
and so on.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

