sajjad-moradi commented on PR #11476: URL: https://github.com/apache/pinot/pull/11476#issuecomment-1765447754
This fixes the metadata of the new consuming segments for the new partitions. The consuming segments for existing partitions keep their existing partitioning metadata. Depending on how you repartition, that may or may not be correct. For example, at LinkedIn, any partition change is by a factor of two. With this repartitioning approach, the existing segment metadata still results in correct partition routing.

Let's say there are two partitions and 4 ids in total for the partitioning column. Before repartitioning, the segment ZK metadata for the two partitions is:

p0 -> numPartitions: 2, partitionIds = [0] (corresponding to ids 0, 2)
p1 -> numPartitions: 2, partitionIds = [1] (corresponding to ids 1, 3)

After repartitioning, two new partitions are added, but the existing consuming segments keep their existing metadata:

p0 -> numPartitions: 2, partitionIds = [0] (corresponding to ids 0, 2)
p1 -> numPartitions: 2, partitionIds = [1] (corresponding to ids 1, 3)
p2 -> numPartitions: 4, partitionIds = [2] (corresponding to id 2)
p3 -> numPartitions: 4, partitionIds = [3] (corresponding to id 3)

After the next segment commits, the consuming segment ZK metadata looks like this:

p0 -> numPartitions: 4, partitionIds = [0] (corresponding to id 0)
p1 -> numPartitions: 4, partitionIds = [1] (corresponding to id 1)
p2 -> numPartitions: 4, partitionIds = [2] (corresponding to id 2)
p3 -> numPartitions: 4, partitionIds = [3] (corresponding to id 3)

This shows that with the "factor of two" repartitioning approach, partition routing is correct before, during, and after repartitioning: the stale metadata on p0 and p1 claims a superset of the ids those partitions actually receive, so no segment is wrongly pruned.

If you repartition in a different way, you need to do a force-commit right after the repartition happens. This significantly shortens the window during which partition-aware routing is problematic.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
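The routing argument in the comment can be checked with a small sketch. It assumes a plain modulo partition function (`id % numPartitions`), which matches the example ids above but may differ from the partition function configured on a real table. The helper names are hypothetical and are not part of Pinot.

```python
def ids_claimed(num_partitions, partition_ids, all_ids):
    """Ids that the segment metadata says the segment may contain."""
    return {i for i in all_ids if i % num_partitions in partition_ids}


def ids_received(stream_partition, new_num_partitions, all_ids):
    """Ids that actually arrive on a stream partition after repartitioning."""
    return {i for i in all_ids if i % new_num_partitions == stream_partition}


def stale_metadata_is_safe(old_n, new_n, all_ids):
    """True if every pre-existing partition's stale metadata still covers
    all ids it receives under the new partition count (assumed modulo)."""
    for p in range(old_n):
        claimed = ids_claimed(old_n, [p], all_ids)
        received = ids_received(p, new_n, all_ids)
        if not received <= claimed:
            # Some id lands in a segment whose metadata excludes it,
            # so partition-aware routing would wrongly prune that segment.
            return False
    return True


ids = range(4)  # the 4 ids from the example
print(stale_metadata_is_safe(2, 4, ids))  # doubling: 2 -> 4 partitions
print(stale_metadata_is_safe(2, 3, ids))  # non-doubling: 2 -> 3 partitions
```

Under these assumptions the first call reports that the stale metadata is safe, while the second does not: with 3 partitions, id 3 arrives on p0, whose stale metadata only claims ids 0 and 2. That is the situation where a force-commit right after repartitioning helps.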
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org