shauryachats commented on PR #18433:
URL: https://github.com/apache/pinot/pull/18433#issuecomment-4579885435
> I do not understand this example and the argument. The partitions 0 and
> 10000 belong to two different streams. Although both of them is the partition 0
> of their streams, why do they need to be colocated?

Good question; let me clarify the colocation argument.

In this specific setup, both streams are co-partitioned by the same key
(trace_id), so partition 0 of stream 0 and partition 0 of stream 1 contain
data for the same set of trace IDs (those where trace_id % 3 == 0). Colocating
them on the same server means a query filtering on a specific trace_id can be
served entirely locally, without scatter-gathering across multiple server groups.

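To make the colocation point concrete, here is a small Python sketch (not Pinot code) that models segment placement as a dictionary from (stream index, stream partition) to instance group. It assumes both streams partition by trace_id % 3, as in this setup. The helper names and the "shifted" placement used for contrast are hypothetical.

```python
# Hypothetical model, not Pinot code: a placement maps
# (stream_index, stream_partition_id) -> instance group.
NUM_PARTITIONS = 3
NUM_STREAMS = 2


def stream_partition_for(trace_id: int) -> int:
    # Assumption: both streams partition records by trace_id % NUM_PARTITIONS.
    return trace_id % NUM_PARTITIONS


def groups_to_query(trace_id: int, placement: dict) -> set:
    # A trace_id filter needs the matching partition from every stream.
    p = stream_partition_for(trace_id)
    return {placement[(s, p)] for s in range(NUM_STREAMS)}


# Colocated: partition p of every stream lives on instance group p.
colocated = {(s, p): p for s in range(NUM_STREAMS) for p in range(NUM_PARTITIONS)}

# For contrast: stream 1 placed one group away from stream 0.
shifted = {(s, p): (p + s) % NUM_PARTITIONS
           for s in range(NUM_STREAMS) for p in range(NUM_PARTITIONS)}

print(groups_to_query(42, colocated))  # {0}
print(groups_to_query(42, shifted))    # {0, 1}
```

With the colocated placement every trace_id resolves to a single instance group; with the shifted one, the same query has to fan out to two groups.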
That said, you're right that if the two streams have no relationship
between their partition keys, colocation across
streams wouldn't be a requirement.
But more fundamentally, the fix is necessary for the correctness of instance
assignment, independent of the colocation argument. With numPartitions: 3
configured, the intent is:
- stream partition 0 → instance group 0
- stream partition 1 → instance group 1
- stream partition 2 → instance group 2
Without the fix, stream 1's segments are assigned using the raw Pinot
partition ID (10000, 10001 and 10002 for stream partitions 0, 1 and 2):
- 10000 % 3 = 1 → instance group 1 (wrong, should be 0)
- 10001 % 3 = 2 → instance group 2 (wrong, should be 1)
- 10002 % 3 = 0 → instance group 0 (wrong, should be 2)
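
The pre-fix arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, and the fixed offset of 10000 per stream is inferred from the partition IDs quoted in this thread rather than taken from Pinot's source, so treat it as an assumption.

```python
# Hypothetical sketch of the pre-fix computation, not Pinot source.
NUM_PARTITIONS = 3  # numPartitions in replicaGroupPartitionConfig
PARTITION_ID_OFFSET = 10000  # assumed per-stream offset, inferred from this thread


def pinot_partition_id(stream_index: int, stream_partition_id: int) -> int:
    return stream_index * PARTITION_ID_OFFSET + stream_partition_id


def instance_group_before_fix(stream_index: int, stream_partition_id: int) -> int:
    # Bug: the modulo is applied to the raw Pinot partition ID.
    return pinot_partition_id(stream_index, stream_partition_id) % NUM_PARTITIONS


for p in range(NUM_PARTITIONS):
    print(f"stream partition {p}: "
          f"stream 0 -> group {instance_group_before_fix(0, p)}, "
          f"stream 1 -> group {instance_group_before_fix(1, p)}")
# stream partition 0: stream 0 -> group 0, stream 1 -> group 1
# stream partition 1: stream 0 -> group 1, stream 1 -> group 2
# stream partition 2: stream 0 -> group 2, stream 1 -> group 0
```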

This rotates every stream 1 partition onto the wrong instance group (a shift
of 10000 % 3 = 1), so the mapping doesn't match what the user configured at
all: segments from stream 1 end up distributed across servers in a way that's
inconsistent with the replicaGroupPartitionConfig. The fix ensures both streams
use their stream-level partition ID consistently when computing the instance
group.

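A matching sketch of the intended behaviour: recover the stream-level partition ID from the Pinot partition ID before taking the modulo. As with the pre-fix arithmetic, the helper names are hypothetical and the per-stream offset of 10000 is an assumption inferred from the IDs in this thread; the recovery step also assumes each stream has fewer than 10000 partitions.

```python
# Hypothetical sketch of the intended computation, not Pinot source.
NUM_PARTITIONS = 3  # numPartitions in replicaGroupPartitionConfig
PARTITION_ID_OFFSET = 10000  # assumed per-stream offset, inferred from this thread


def stream_partition_id(pinot_partition_id: int) -> int:
    # Assumes every stream has fewer than PARTITION_ID_OFFSET partitions.
    return pinot_partition_id % PARTITION_ID_OFFSET


def instance_group_after_fix(pinot_partition_id: int) -> int:
    # Fix: take the modulo of the stream-level partition ID instead.
    return stream_partition_id(pinot_partition_id) % NUM_PARTITIONS


for pinot_id in (0, 1, 2, 10000, 10001, 10002):
    print(pinot_id, "-> group", instance_group_after_fix(pinot_id))
```

With this, stream partition p of either stream lands on instance group p, which is the mapping listed for numPartitions: 3.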
--
This is an automated message from the Apache Git Service.