chenboat commented on PR #18433:
URL: https://github.com/apache/pinot/pull/18433#issuecomment-4580920166

   > > I do not understand this example and the argument. The partitions 0 and 
10000 belong to two different streams. Although both of them are partition 0 
of their streams, why do they need to be colocated?
   > 
   > Good question - let me clarify the colocation argument.
   > 
   > In this specific setup, both streams are co-partitioned by the same key 
(trace_id). That means stream 0 partition 0 and stream 1 partition 0 contain 
data for the same set of trace IDs (those where trace_id % 3 == 0). Colocating 
them on the same server means a query filtering by a specific trace_id can be 
served entirely locally without scatter-gathering across multiple server groups.
   > 
   > That said, you're right that if the two streams have no relationship 
between their partition keys, colocation across streams wouldn't be a 
requirement.
   
   > 
   > But more fundamentally, the fix is necessary for correctness of instance 
assignment even independent of the colocation argument. With numPartitions: 3 
configured, the intent is:
   > 
   > * stream partition 0 → instance group 0
   > * stream partition 1 → instance group 1
   > * stream partition 2 → instance group 2
   > 
   > Without the fix, stream 1's segments get assigned via the raw Pinot 
partition ID:
   > 
   > * 10000 % 3 = 1 → instance group 1 (wrong, should be 0)
   > * 10001 % 3 = 2 → instance group 2 (wrong, should be 1)
   > * 10002 % 3 = 0 → instance group 0 (wrong, should be 2)
   
   In the above example, the current assignment places partitions in a 
round-robin manner to balance segment placement. Why is 10000 % 3 = 1 a wrong 
assignment result?
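
   To make the question concrete, here is a minimal sketch of the arithmetic 
from the quoted example. It is not Pinot code; the helper name 
`group_from_raw_id` is hypothetical, and the partition IDs and numPartitions 
value are taken from the example above.

```python
NUM_PARTITIONS = 3  # numPartitions from the quoted example


def group_from_raw_id(pinot_partition_id: int) -> int:
    """Hypothetical helper: instance group chosen from the raw Pinot partition ID."""
    return pinot_partition_id % NUM_PARTITIONS


stream0 = [0, 1, 2]              # Pinot partition IDs of stream 0
stream1 = [10000, 10001, 10002]  # Pinot partition IDs of stream 1

raw_groups_stream0 = [group_from_raw_id(p) for p in stream0]
raw_groups_stream1 = [group_from_raw_id(p) for p in stream1]

# Every instance group still receives exactly one partition per stream.
is_balanced = sorted(raw_groups_stream1) == list(range(NUM_PARTITIONS))

# But partition i of stream 1 does not share a group with partition i of stream 0.
same_group_per_index = raw_groups_stream0 == raw_groups_stream1

print(raw_groups_stream0, raw_groups_stream1, is_balanced, same_group_per_index)
```

   The placement stays balanced; it is only rotated relative to stream 0. So 
whether it counts as "wrong" seems to depend on whether the same partition 
index across streams is expected to land on the same instance group.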
   
   > 
   > This produces an arbitrary and scrambled mapping that doesn't match what 
the user configured at all — segments from stream 1 would be distributed across 
servers in a way that's inconsistent with the replicaGroupPartitionConfig. The 
fix ensures both streams use their stream-level partition ID consistently when 
computing the instance group.
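
   If I read the proposal correctly, the fix amounts to something like the 
sketch below. The padding constant of 10000 and both function names are my 
assumptions, inferred only from the IDs in this example (10000 being 
partition 0 of stream 1); the actual implementation in the PR may differ.

```python
PARTITION_PADDING = 10000  # assumed offset between streams, inferred from the example
NUM_PARTITIONS = 3         # numPartitions from the quoted example


def stream_partition_id(pinot_partition_id: int) -> int:
    """Hypothetical: recover the stream-level partition ID from the Pinot partition ID."""
    return pinot_partition_id % PARTITION_PADDING


def instance_group(pinot_partition_id: int) -> int:
    """Hypothetical: pick the instance group from the stream-level partition ID."""
    return stream_partition_id(pinot_partition_id) % NUM_PARTITIONS


assignment = {pid: instance_group(pid) for pid in (0, 1, 2, 10000, 10001, 10002)}

for pid, group in assignment.items():
    print(f"Pinot partition {pid} -> instance group {group}")
```

   Under these assumptions, partition i of both streams maps to instance 
group i, which is the mapping listed in the quoted comment.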
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

