Jackie-Jiang commented on PR #8441:
URL: https://github.com/apache/pinot/pull/8441#issuecomment-1087919194

   We assign the servers in the following steps:
   1. Pick the pools for the table based on the tenant and pool config
   2. Apply the constraint to the servers if any
   3. Map each replica to a server pool
   4. Pick servers from the server pool
   
   Currently all steps are deterministic, and adding or removing pools should 
be very rare, so it should be okay to move more segments when the pool count 
changes. If we can assume the first 3 steps do not change, then the 
algorithm can be very straightforward: keep the original server if it still 
exists in the pool, or replace it with a new server if not. This algorithm 
should also be implemented in a deterministic way.
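   
   A rough sketch of that server selection step, just to illustrate the idea 
(Python for brevity, not Pinot code; `reassign_replica` is a hypothetical 
helper, and picking replacements in sorted order is only one assumed way to 
keep the result deterministic):

```python
from typing import List, Sequence


def reassign_replica(current: Sequence[str], pool: Sequence[str]) -> List[str]:
    """Keep every server that is still in the pool; replace the others.

    Positions are preserved, so a server that survives stays at the same
    index and only the slots of missing servers change.
    Assumes `current` has no duplicate servers.
    """
    pool_set = set(pool)
    still_present = {s for s in current if s in pool_set}
    # Sorted order makes the choice of replacements repeatable across runs.
    spare = iter(sorted(pool_set - still_present))

    result = []
    for server in current:
        if server in pool_set:
            result.append(server)  # unchanged, no segment movement
        else:
            try:
                result.append(next(spare))  # deterministic replacement
            except StopIteration:
                raise ValueError("pool has too few servers") from None
    return result


old_servers = ["s1", "s2", "s3"]
new_pool = ["s1", "s3", "s4", "s5"]  # s2 was removed from the pool
new_servers = reassign_replica(old_servers, new_pool)
```

   Here only the slot that held `s2` changes; `s1` and `s3` keep their 
positions, so only the segments that were on `s2` need to move.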
   
   If we want to solve the corner case of adding/removing pools, we can save 
the pool id into the instance partitions for each replica, and keep them fixed 
during the re-assignment.
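   
   A minimal sketch of pinning the pool id per replica, under the assumption 
that the default mapping is a simple round-robin over the sorted pool ids 
(the real selector may map differently; `map_replicas_to_pools` and the 
`saved` argument are hypothetical names for illustration):

```python
from typing import Dict, Optional, Sequence


def map_replicas_to_pools(
    num_replicas: int,
    pools: Sequence[int],
    saved: Optional[Dict[int, int]] = None,
) -> Dict[int, int]:
    """Return replica id -> pool id, reusing saved pool ids when possible."""
    if not pools:
        raise ValueError("at least one pool is required")
    saved = saved or {}
    ordered = sorted(pools)
    pool_set = set(ordered)

    mapping = {}
    for replica_id in range(num_replicas):
        previous = saved.get(replica_id)
        if previous in pool_set:
            # The saved pool still exists: keep the replica where it is.
            mapping[replica_id] = previous
        else:
            # Assumed default: round-robin over the sorted pool ids.
            mapping[replica_id] = ordered[replica_id % len(ordered)]
    return mapping


initial = map_replicas_to_pools(3, [0, 1])
# Pool 2 is added. Without the saved mapping the round-robin result shifts;
# with it, every replica stays in its original pool.
unpinned = map_replicas_to_pools(3, [0, 1, 2])
pinned = map_replicas_to_pools(3, [0, 1, 2], saved=initial)
```

   Only the replica-to-pool map is stored here, one integer per replica, 
rather than the list of servers in each pool.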
   
   Some potential problems with the current approach:
   1. For a large cluster, there can be hundreds of servers or even more in 
each pool. Storing them in the instance partitions adds overhead and makes 
the instance partitions very hard to debug
   2. The overall idea is to optimize the server selection to minimize the 
movement, so the logic should be applied to the server selection step instead 
of the pool selection step
   
   > IMO there is no hard requirement for a pool id to be mapped 1:1 1:N to a 
   > replica id, right? It's just in current strategy of 
   > InstanceReplicaGroupPartitionSelector we assign instances to a replica from one 
   > pool. But this should not be enforced for future use, especially right now we 
   > are implementing selector with FD awareness and it can have instances from 
   > multiple pools in 1 replica group.
   > In other words we should not rely on the status-quo of we can "reverse 
   > engineering the pool id from replica group id". So I think the pool -> instance 
   > mapping should probably be saved.
   
   We don't rely on reverse engineering, but on a deterministic selection 
algorithm. Storing the pool -> server mapping can be very costly, and 
processing it can be costly as well. We may store the replica-group -> pool 
mapping, but not the individual servers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

