sajjad-moradi commented on PR #11476:
URL: https://github.com/apache/pinot/pull/11476#issuecomment-1765447754

   This fixes the metadata of the new consuming segments for the new 
partitions. The consuming segments for existing partitions keep their existing 
partitioning metadata. Depending on how you repartition, that may or may not 
be correct. For example, at LinkedIn, every partition change is by a factor of 
two. With that repartitioning approach, the existing segment metadata still 
results in correct partition routing. 
   
   Let's say there are two partitions and 4 ids in total for the partitioning 
column. Before repartitioning, the segment ZK metadata for the two 
partitions is:
   p0 -> numPartitions: 2, partitionIds = [0] (corresponding to ids 0, 2)
   p1 -> numPartitions: 2, partitionIds = [1] (corresponding to ids 1, 3)
   
   After repartitioning, two new partitions are added, but the existing 
consuming segments keep their existing metadata:
   p0 -> numPartitions: 2, partitionIds = [0] (corresponding to ids 0, 2)
   p1 -> numPartitions: 2, partitionIds = [1] (corresponding to ids 1, 3)
   p2 -> numPartitions: 4, partitionIds = [2] (corresponding to id 2)
   p3 -> numPartitions: 4, partitionIds = [3] (corresponding to id 3)
   
   After the next segment commit on each existing partition, the consuming 
segment ZK metadata looks like this:
   p0 -> numPartitions: 4, partitionIds = [0] (corresponding to id 0)
   p1 -> numPartitions: 4, partitionIds = [1] (corresponding to id 1)
   p2 -> numPartitions: 4, partitionIds = [2] (corresponding to id 2)
   p3 -> numPartitions: 4, partitionIds = [3] (corresponding to id 3)
   
   This shows that with the "factor of two" repartitioning approach, the 
partition routing is correct before, during, and after repartitioning. 
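   
   The walkthrough above can be checked with a small sketch. It assumes a 
plain modulo partition function (id % numPartitions), which is what the 
example implies; the names `may_contain`, `selected_segments`, and 
`routing_is_safe` are hypothetical helpers for illustration, not Pinot APIs.

```python
# Segment metadata during the transition, as (numPartitions, partitionIds).
# Assumption: the partition function is plain modulo on the id.
DURING = {
    "p0": (2, [0]),
    "p1": (2, [1]),
    "p2": (4, [2]),
    "p3": (4, [3]),
}


def may_contain(meta, value):
    """True if the metadata says the segment may hold rows with this id."""
    num_partitions, partition_ids = meta
    return value % num_partitions in partition_ids


def selected_segments(metadata, value):
    """Segments that partition-aware routing would keep for this id."""
    return {name for name, meta in metadata.items() if may_contain(meta, value)}


def routing_is_safe(metadata, old_n, new_n, ids):
    """Check that no segment that can actually hold an id is pruned.

    A consuming segment may hold rows written before the change
    (stream partition id % old_n) and after it (id % new_n).
    """
    for value in ids:
        holders = {f"p{value % old_n}", f"p{value % new_n}"}
        if not holders <= selected_segments(metadata, value):
            return False
    return True


print(routing_is_safe(DURING, old_n=2, new_n=4, ids=range(4)))
print(sorted(selected_segments(DURING, 2)))
```

   For id 2, routing keeps both p0 (stale metadata, mod 2) and p2 (new 
metadata, mod 4), so the stale metadata only makes routing less selective, 
not wrong.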
   
   If you're using a different repartitioning scheme, you need to do a 
force-commit right after the repartitioning happens. This significantly 
shortens the window during which partition-aware routing is unreliable.
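
   To illustrate why a change that is not a factor of two is a problem, here 
is a minimal sketch, again assuming a plain modulo partition function. The 
function `stale_misroutes` is a hypothetical helper, not part of Pinot. It 
lists the ids that land in an existing partition whose stale metadata would 
cause that consuming segment to be pruned.

```python
def stale_misroutes(old_n, new_n, max_id):
    """Ids in [0, max_id) whose consuming segment would be wrongly pruned.

    Assumptions: ids are assigned to stream partitions by id % new_n after
    the change, new partitions get fresh metadata, and existing partition p
    still carries the stale metadata (numPartitions=old_n, partitionIds=[p]).
    """
    missed = []
    for value in range(max_id):
        p = value % new_n  # stream partition that now receives this id
        if p >= old_n:
            continue  # new partition: its metadata is already correct
        if value % old_n != p:
            # Stale metadata says partition p cannot hold this id.
            missed.append(value)
    return missed


print(stale_misroutes(2, 4, 8))  # factor of two
print(stale_misroutes(2, 3, 6))  # 2 -> 3 partitions
```

   Under these assumptions, going from 2 to 4 partitions yields no misrouted 
ids, while going from 2 to 3 misroutes ids 3 and 4 until the existing 
consuming segments commit and pick up the new metadata.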


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

