sajjad-moradi opened a new issue, #8585:
URL: https://github.com/apache/pinot/issues/8585

   Currently, consuming segments are assigned to instances using the following formula:
   ```
   (partitionId * numReplicas + replicaId) % numInstances
   ```
   This produces an unbalanced assignment, as the following examples show:
   ```
   Scenario#1:
   
   10 instances -> i0, i1, ..., i9
   2 partitions -> 0, 1
   3 replicas   -> 0, 1, 2
   
   segment name: partitionId_seqId(replicaId) 
   
   i0      i1      i2      i3      i4      i5      i6      i7      i8     i9
   0_0(0)  0_0(1)  0_0(2)  1_0(0)  1_0(1)  1_0(2)
   0_1(0)  0_1(1)  0_1(2)  1_1(0)  1_1(1)  1_1(2)
   0_2(0)  0_2(1)  0_2(2)  1_2(0)  1_2(1)  1_2(2)
   0_3(0)  0_3(1)  0_3(2)  1_3(0)  1_3(1)  1_3(2)
   0_4(0)  0_4(1)  0_4(2)  1_4(0)  1_4(1)  1_4(2)
   0_5(0)  0_5(1)  0_5(2)  1_5(0)  1_5(1)  1_5(2)
   0_6(0)  0_6(1)  0_6(2)  1_6(0)  1_6(1)  1_6(2)
   
   
   Scenario#2:
   
   5 instances
   2 partitions
   3 replicas
   
   i0      i1      i2      i3      i4
   0_0(0)  0_0(1)  0_0(2)  1_0(0)  1_0(1)
   1_0(2)
   0_1(0)  0_1(1)  0_1(2)  1_1(0)  1_1(1)
   1_1(2)
   0_2(0)  0_2(1)  0_2(2)  1_2(0)  1_2(1)
   1_2(2)
   0_3(0)  0_3(1)  0_3(2)  1_3(0)  1_3(1)
   1_3(2)
   0_4(0)  0_4(1)  0_4(2)  1_4(0)  1_4(1)
   1_4(2)
   0_5(0)  0_5(1)  0_5(2)  1_5(0)  1_5(1)
   1_5(2)
   0_6(0)  0_6(1)  0_6(2)  1_6(0)  1_6(1)
   1_6(2)
   ```
   In scenario#1, four instances (`i6` through `i9`) are never assigned any segments.
   In scenario#2, instance `i0` holds twice as many segments as each of the other instances, because replica 2 of partition 1 maps to `(1 * 3 + 2) % 5 = 0`.
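
The two scenarios can be reproduced with a short script. This is a minimal sketch that applies only the formula quoted above; the function names and the choice of 7 sequence IDs per partition (matching the tables) are assumptions made for illustration and are not taken from the Pinot code base.

```python
def assign_instance(partition_id, replica_id, num_replicas, num_instances):
    """Instance index according to the formula quoted in this issue."""
    return (partition_id * num_replicas + replica_id) % num_instances


def segments_per_instance(num_instances, num_partitions, num_replicas, num_sequences):
    """Count how many segment replicas land on each instance."""
    counts = [0] * num_instances
    for partition_id in range(num_partitions):
        # The sequence ID does not appear in the formula, so every segment
        # of a partition replica lands on the same instance.
        for _seq_id in range(num_sequences):
            for replica_id in range(num_replicas):
                instance = assign_instance(
                    partition_id, replica_id, num_replicas, num_instances
                )
                counts[instance] += 1
    return counts


scenario_1 = segments_per_instance(10, 2, 3, 7)
scenario_2 = segments_per_instance(5, 2, 3, 7)
print(scenario_1)  # [7, 7, 7, 7, 7, 7, 0, 0, 0, 0]
print(scenario_2)  # [14, 7, 7, 7, 7]
```

The zeros in the first list correspond to the idle instances `i6` through `i9`, and the 14 in the second list is the doubled load on `i0`.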
   
   The existing realtime segment assignment tries to keep all segments of the same partition on the same instance. That works well for use cases whose queries always filter on the partition column in the WHERE clause, because query execution is faster.
   But that is not a hard requirement for every realtime table. For use cases that don't have that requirement, a balanced segment assignment leads to a balanced load distribution across the servers.
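
To make the trade-off concrete, here is one hypothetical way to spread segments more evenly: shift the starting instance by the sequence ID. This variant is an assumption made purely for illustration. It is neither proposed in this issue nor taken from Pinot, and it deliberately gives up the partition locality described above.

```python
def rotated_instance(partition_id, seq_id, replica_id, num_replicas, num_instances):
    """Hypothetical variant: rotate the starting instance with each new sequence ID."""
    return ((partition_id + seq_id) * num_replicas + replica_id) % num_instances


def rotated_counts(num_instances, num_partitions, num_replicas, num_sequences):
    """Count segment replicas per instance under the hypothetical rotation."""
    counts = [0] * num_instances
    for partition_id in range(num_partitions):
        for seq_id in range(num_sequences):
            for replica_id in range(num_replicas):
                instance = rotated_instance(
                    partition_id, seq_id, replica_id, num_replicas, num_instances
                )
                counts[instance] += 1
    return counts


counts_1 = rotated_counts(10, 2, 3, 7)  # same inputs as Scenario#1
counts_2 = rotated_counts(5, 2, 3, 7)   # same inputs as Scenario#2
print(counts_1, max(counts_1) - min(counts_1))
print(counts_2, max(counts_2) - min(counts_2))
```

For these two scenarios, the per-instance counts differ by at most one segment. Replicas of the same segment still land on distinct instances as long as the number of replicas does not exceed the number of instances, since they occupy consecutive indices modulo `num_instances`.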

