somandal commented on PR #15368:
URL: https://github.com/apache/pinot/pull/15368#issuecomment-2754805807

   I was thinking about this some more, and was wondering if it would be 
possible to come up with something like the following:
   
   ```
   "consumingSegmentSummary": {
       "numConsumingSegmentsToBeMoved": 10,
       "numServersGettingConsumingSegmentsAdded": 5,
       // map of the top 10 consuming segments with highest offset difference (latest - startOffset)
       "topConsumingSegmentsToOffsetDifferenceMap": {
           "consuming_seg_10": 500,
           "consuming_seg_8": 400,
           "consuming_seg_24": 350,
           ...
           ...
       },
       // map of the top 10 consuming segments with oldest startOffset (please dig into
       // whether age can be determined based on Kafka offset; if not, CONSUMING segment
       // creation time can be a proxy - update field name in that case)
       "topConsumingSegmentsToAgeDifferenceInMinutesMap": {
           "consuming_seg_10": 128,
           "consuming_seg_6": 120,
           "consuming_seg_12": 65,
           ...
           ...
       },
       // only has information about the servers getting new consuming segments, so it is
       // easy to verify there is no issue with numServersGettingConsumingSegmentsAdded
       "serverInfoOnConsumingSegments": {
           // let's assume this has consuming_seg_10 and consuming_seg_24 added (so numbers add up)
           "Server_10": {
               "consumingSegmentsAdded": 2,
               "totalOffsetsToCatchUpAcrossAllConsumingSegments": 850,
               "oldestConsumingSegmentAgeAdded": 128
           },
           "Server_42": {
               "consumingSegmentsAdded": 2,
               "totalOffsetsToCatchUpAcrossAllConsumingSegments": 350,
               "oldestConsumingSegmentAgeAdded": 50
           },
           ...
           ...
       }
   }
   
   ```
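
A minimal sketch of how these fields could be derived, assuming the rebalance code already knows, for each consuming segment being moved, its start offset, the latest stream offset, an age in minutes, and the servers it is being added to. `ConsumingSegment`, `top_n`, and `build_summary` are hypothetical names for illustration, not Pinot APIs, and Python is used only for brevity. The age of `consuming_seg_24` is not given in the example above, so the value below is made up.

```python
from dataclasses import dataclass
from typing import Dict, List

TOP_N = 10  # the proposal caps both maps at the top 10 segments


@dataclass
class ConsumingSegment:
    # Hypothetical input record; not a Pinot class.
    name: str
    start_offset: int
    latest_offset: int
    age_minutes: int
    servers_added: List[str]  # servers that gain this segment in the rebalance


def top_n(values: Dict[str, int], n: int = TOP_N) -> Dict[str, int]:
    """Keep the n largest entries, ordered from largest to smallest."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def build_summary(segments: List[ConsumingSegment]) -> dict:
    offset_diff = {s.name: s.latest_offset - s.start_offset for s in segments}
    age = {s.name: s.age_minutes for s in segments}

    # Aggregate per server, only for servers that receive a consuming segment.
    servers: Dict[str, dict] = {}
    for s in segments:
        for server in s.servers_added:
            info = servers.setdefault(server, {
                "consumingSegmentsAdded": 0,
                "totalOffsetsToCatchUpAcrossAllConsumingSegments": 0,
                "oldestConsumingSegmentAgeAdded": 0,
            })
            info["consumingSegmentsAdded"] += 1
            info["totalOffsetsToCatchUpAcrossAllConsumingSegments"] += offset_diff[s.name]
            info["oldestConsumingSegmentAgeAdded"] = max(
                info["oldestConsumingSegmentAgeAdded"], s.age_minutes)

    return {
        "numConsumingSegmentsToBeMoved": len(segments),
        "numServersGettingConsumingSegmentsAdded": len(servers),
        "topConsumingSegmentsToOffsetDifferenceMap": top_n(offset_diff),
        "topConsumingSegmentsToAgeDifferenceInMinutesMap": top_n(age),
        "serverInfoOnConsumingSegments": servers,
    }


# Usage: the two segments assumed to land on Server_10 in the example above.
segments = [
    ConsumingSegment("consuming_seg_10", 1000, 1500, 128, ["Server_10"]),
    ConsumingSegment("consuming_seg_24", 200, 550, 40, ["Server_10"]),  # age is made up
]
summary = build_summary(segments)
```

With these inputs, Server_10 ends up with 2 segments added, 500 + 350 = 850 offsets to catch up, and an oldest age of 128, which lines up with the numbers in the example.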
   
   Let's say we get errors when fetching the consuming segment info (latest 
offsets, ZK metadata lookup, etc.). In that case we make the affected map 
fields null and set the per-server numeric fields to -1. An example might 
look like:
   
   ```
   "consumingSegmentSummary": {
       "numConsumingSegmentsToBeMoved": 10,
       "numServersGettingConsumingSegmentsAdded": 5,
       // could not fetch partition info for consuming segments
       "topConsumingSegmentsToOffsetDifferenceMap": null,
       // could not fetch partition info for consuming segment or ZK metadata
       "topConsumingSegmentsToAgeDifferenceInMinutesMap": null,
       // only has information about the servers getting new consuming segments, so it is
       // easy to verify there is no issue with numServersGettingConsumingSegmentsAdded
       "serverInfoOnConsumingSegments": {
           "Server_10": {
               "consumingSegmentsAdded": 2,
               // error, couldn't fetch partition info for consuming segments
               "totalOffsetsToCatchUpAcrossAllConsumingSegments": -1,
               // error, couldn't fetch partition info for consuming segment or ZK metadata
               "oldestConsumingSegmentAgeAdded": -1
           },
           "Server_42": {
               "consumingSegmentsAdded": 2,
               "totalOffsetsToCatchUpAcrossAllConsumingSegments": -1,
               "oldestConsumingSegmentAgeAdded": -1
           },
           ...
           ...
       }
   }
   ```
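
One way the fallback could be wired up, as a rough sketch: wrap each metadata fetch so a failure yields `None`, then emit -1 for the per-server numbers when the underlying data is missing. `fetch_or_none`, `server_info`, and the `UNKNOWN` constant are hypothetical helpers, not existing Pinot code, and Python is used only for brevity.

```python
from typing import Callable, Dict, List, Optional

UNKNOWN = -1  # sentinel for per-server numeric fields, as in the example above


def fetch_or_none(fetch: Callable[[], Dict[str, int]]) -> Optional[Dict[str, int]]:
    """Run a metadata fetch; return None instead of raising on failure."""
    try:
        return fetch()
    except Exception:  # broad on purpose: any fetch failure degrades the summary
        return None


def server_info(segments_added: List[str],
                offset_diff: Optional[Dict[str, int]],
                age_minutes: Optional[Dict[str, int]]) -> dict:
    """Per-server entry; numeric fields fall back to UNKNOWN when data is missing."""
    return {
        # The segment count comes from the assignment itself, so it is always known.
        "consumingSegmentsAdded": len(segments_added),
        "totalOffsetsToCatchUpAcrossAllConsumingSegments":
            UNKNOWN if offset_diff is None
            else sum(offset_diff[s] for s in segments_added),
        "oldestConsumingSegmentAgeAdded":
            UNKNOWN if age_minutes is None
            else max((age_minutes[s] for s in segments_added), default=UNKNOWN),
    }
```

The two top-level maps can then be set directly from the `fetch_or_none` result, so a failed fetch shows up as null there and as -1 in each server entry.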
   
   Then for offline tables, and realtime tables that have 0 consuming segments, 
let's have:
   
   ```
   "consumingSegmentSummary": null
   ```
   That way we don't even show the field, and it's clear there is no need to 
think about consuming segments.
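
A small sketch of the "don't even show the field" behaviour, assuming the response is serialized with null fields dropped. `summary_or_none` and `to_json` are hypothetical helpers in Python for illustration. In Java the usual equivalent would be Jackson's `@JsonInclude(JsonInclude.Include.NON_NULL)`; whether the rebalance result class is configured that way is an assumption here.

```python
import json
from typing import Callable, Optional


def summary_or_none(is_realtime: bool, num_consuming_segments: int,
                    build: Callable[[], dict]) -> Optional[dict]:
    """Return None for offline tables and realtime tables with no consuming segments."""
    if not is_realtime or num_consuming_segments == 0:
        return None
    return build()


def to_json(result: dict) -> str:
    """Serialize, dropping top-level keys whose value is None."""
    return json.dumps({k: v for k, v in result.items() if v is not None})
```

With this, an offline table's response simply has no `consumingSegmentSummary` key, while a realtime table with consuming segments gets the full object.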
   
   How does the above sound? Let me know if you'd like to discuss more.

