yashmayya opened a new issue, #17179:
URL: https://github.com/apache/pinot/issues/17179

   - https://github.com/apache/pinot/pull/15843#discussion_r2105706875
   - Essentially, while the instance partitions in ZK and the segment 
assignment in the ideal state are out of sync for a table during a rebalance 
with instance reassignment, the `MultiStageReplicaGroupSelector` will drop 
segments.
   - Consider this scenario (extreme example where all servers are replaced for 
illustrative purposes) - a table initially has two replica groups, RG0: 
`{instance-0, instance-1}`, RG1: `{instance-2, instance-3}`. The table config 
is updated (let's say a server tenant / tag change) and a rebalance with 
instance reassignment is triggered. The instance partitions in ZK are updated to 
RG0: `{instance-100, instance-101}`, RG1: `{instance-102, instance-103}`. After 
this change is made, and before the ideal state is fully updated by the table 
rebalancer to move all the segments to the new servers, the 
`MultiStageReplicaGroupSelector` will compute a `null` partitionId 
[here](https://github.com/apache/pinot/blob/189bbdfcfff334eef2fbc116d4f79de96b12c9b7/pinot-broker/src/main/java/org/apache/pinot/broker/routing/instanceselector/MultiStageReplicaGroupSelector.java#L156),
 and as a result the returned segment-to-selected-instance map will be 
empty.
   - The proposal to fix this issue is an overhaul of the 
`MultiStageReplicaGroupSelector` logic. We can track the instance partitions in 
the ideal state (through new list fields); these lists will contain both the 
old set of instances and the new set of instances during a rebalance with 
instance reassignment. After the rebalance is successfully completed, the lists 
will be updated to contain only the new set of instances. The 
`MultiStageReplicaGroupSelector` can use this new metadata from the ideal state 
to choose the instances for a request, instead of relying on the ZK instance 
partitions. This also paves the way for other replica group based routing 
strategies.
   - The difference between the ideal state instance partitions and the 
instance partitions stored separately in the property store in ZK is that the 
ideal state version will be used for query routing (and can contain 
intermediate states) whereas the dedicated instance partitions ZNode will 
always contain the _target_ instance partitions, which are used for making 
assignment decisions for any new segments (except for upsert tables).
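
To make the scenario and the proposal above concrete, here is a minimal, 
simplified model of the routing decision. It is only a sketch under stated 
assumptions: `ReplicaGroupRoutingSketch`, `select` and `mergeForRebalance` are 
hypothetical names, not Pinot APIs, and the real 
`MultiStageReplicaGroupSelector` logic (including the `null` partitionId 
lookup) is more involved. The sketch also assumes the old and new instance 
partitions have the same number of replica groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of replica-group routing. Not Pinot code.
final class ReplicaGroupRoutingSketch {

  // For the chosen replica group, map each segment to the first instance in
  // that group which hosts the segment according to the ideal state.
  // Segments with no host in the group are left out of the result.
  static Map<String, String> select(List<List<String>> replicaGroups, int replicaGroupId,
      Map<String, Set<String>> idealState) {
    List<String> group = replicaGroups.get(replicaGroupId);
    Map<String, String> segmentToInstance = new HashMap<>();
    for (Map.Entry<String, Set<String>> entry : idealState.entrySet()) {
      for (String instance : group) {
        if (entry.getValue().contains(instance)) {
          segmentToInstance.put(entry.getKey(), instance);
          break;
        }
      }
    }
    return segmentToInstance;
  }

  // Assumed shape of the proposed ideal-state metadata during a rebalance:
  // each replica group lists its old instances followed by its new ones.
  // Assumes oldGroups and newGroups have the same number of replica groups.
  static List<List<String>> mergeForRebalance(List<List<String>> oldGroups,
      List<List<String>> newGroups) {
    List<List<String>> merged = new ArrayList<>();
    for (int i = 0; i < oldGroups.size(); i++) {
      List<String> group = new ArrayList<>(oldGroups.get(i));
      group.addAll(newGroups.get(i));
      merged.add(group);
    }
    return merged;
  }
}
```

With the ideal state still pointing at `instance-0` through `instance-3`, 
calling `select` with only the new replica groups (`instance-100` through 
`instance-103`) returns an empty map in this model, which mirrors the dropped 
segments described above. Calling it with the merged groups returns an entry 
for every segment, since each group still contains the old instances that 
currently host the data.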


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

