somandal commented on code in PR #15368: URL: https://github.com/apache/pinot/pull/15368#discussion_r2029139368
########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceSummaryResult.java: ########## @@ -306,18 +297,133 @@ public Map<String, ServerSegmentChangeInfo> getServerSegmentChangeInfo() { } } + public static class ConsumingSegmentToBeMovedSummary { + private final int _numConsumingSegmentsToBeMoved; + private final int _numServersGettingConsumingSegmentsAdded; + private final Map<String, Integer> _consumingSegmentsToBeMovedWithMostOffsetsToCatchUp; + private final Map<String, Integer> _consumingSegmentsToBeMovedWithOldestAgeInMinutes; + private final Map<String, ConsumingSegmentSummaryPerServer> _serverConsumingSegmentSummary; + + /** + * Constructor for ConsumingSegmentToBeMovedSummary + * @param numConsumingSegmentsToBeMoved total number of consuming segments to be moved as part of this rebalance + * @param numServersGettingConsumingSegmentsAdded maximum bytes of consuming segments to be moved to catch up + * @param consumingSegmentsToBeMovedWithMostOffsetsToCatchUp top consuming segments to be moved to catch up. + * Map from segment name to its number of offsets to + * catch up on the new server. This is essentially the + * difference between the latest offset of the stream + * and the segment's start offset of the stream. The + * map is set to null if the number of offsets to catch + * up could not be determined for at least one + * consuming segment + * @param consumingSegmentsToBeMovedWithOldestAgeInMinutes oldest consuming segments to be moved to catch up. Map + * from segment name to its age in minutes. The map is + * set to null if ZK metadata is not available or the + * creation time is not found for at least one consuming + * segment. + * The age of a segment is determined by its creation + * time from ZK metadata. Segment age is an approximation + * to data age for a consuming segment. It may not reflect + * the actual oldest age of data in the consuming segment. + * For reasons, a segment could consume events which date + * before the segment created. We collect segment age + * here as there is no obvious way to get the age of the + * oldest data in the stream for a specific consuming + * segment. + * consumingSegmentsToBeMovedWithMostOffsetsToCatchUp + * is a better indicator to the cost of moving a consuming + * segment if it presents. Review Comment: @J-HowHuang let's not add this I think I've said this multiple times and I feel like my point is getting completely lost - Number of offsets to catch up gives information about how much work needs to be done when the segment moves (and this is also not a very good estimate as different Kafka topics can have different payload sizes which will have a large effect on how much memory / cpu is used) - Segment age, or any age related type data we're trying to track gives ideas to the user about how stale the data will be when it first moves. This can be useful if an application is sensitive to querying stale data. E.g. application where staleness may not be tolerated well would be ad data for example. Is there a use case misunderstanding with this age stuff? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org