npawar commented on a change in pull request #6667: URL: https://github.com/apache/incubator-pinot/pull/6667#discussion_r606531505
########## File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotTableIdealStateBuilder.java ########## @@ -115,14 +117,45 @@ public static void buildLowLevelRealtimeIdealStateFor(PinotLLCRealtimeSegmentMan pinotLLCRealtimeSegmentManager.setUpNewTable(realtimeTableConfig, idealState); } - public static int getPartitionCount(StreamConfig streamConfig) { - PartitionCountFetcher partitionCountFetcher = new PartitionCountFetcher(streamConfig); + /** + * Fetches the list of {@link PartitionGroupInfo} for the new partition groups for the stream, + * with the help of the {@link PartitionGroupMetadata} of the current partitionGroups. + * + * Reasons why <code>currentPartitionGroupMetadata</code> is needed: + * + * The current partition group metadata is used to determine the offsets that have been consumed for a partition group. + * An example of where the offsets would be used: + * e.g. If partition group 1 contains shardId 1, with status DONE and endOffset 150. There's 2 possibilities: + * 1) the stream indicates that shardId's last offset is 200. + * This tells Pinot that partition group 1 still has messages which haven't been consumed, and must be included in the response. + * 2) the stream indicates that shardId's last offset is 150, + * This tells Pinot that all messages of partition group 1 have been consumed, and it need not be included in the response. + * Thus, this call will skip a partition group when it has reached end of life and all messages from that partition group have been consumed. + * + * The current partition group metadata is also used to know about existing groupings of partitions, + * and accordingly make the new partition groups. + * e.g. Assume that partition group 1 has status IN_PROGRESS and contains shards 0,1,2 + * and partition group 2 has status DONE and contains shards 3,4. + * In the above example, the currentPartitionGroupMetadataList indicates that + * the collection of shards in partition group 1, should remain unchanged in the response, + * whereas shards 3,4 can be added to new partition groups if needed. + * + * @param streamConfig the streamConfig from the tableConfig + * @param currentPartitionGroupMetadataList List of {@link PartitionGroupMetadata} for the current partition groups. + * The size of this list is equal to the number of partition groups, + * and is created using the latest segment zk metadata. + */ + public static List<PartitionGroupInfo> getPartitionGroupInfoList(StreamConfig streamConfig, + List<PartitionGroupMetadata> currentPartitionGroupMetadataList) { + PartitionGroupInfoFetcher partitionGroupInfoFetcher = + new PartitionGroupInfoFetcher(streamConfig, currentPartitionGroupMetadataList); try { - RetryPolicies.noDelayRetryPolicy(3).attempt(partitionCountFetcher); - return partitionCountFetcher.getPartitionCount(); + RetryPolicies.noDelayRetryPolicy(3).attempt(partitionGroupInfoFetcher); Review comment: Added randomDelayRetryPolicy -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org