priyen opened a new issue, #11448: URL: https://github.com/apache/pinot/issues/11448
The scenario is as follows, assuming we have 3 replicas for the table:

- Increase the instances / replica config by some number.
- Kick off a rebalance with reAssignInstances=True, minReplicas=2, includeConsuming=False, and downtime=False.
- This should cause all sealed segments to move appropriately.
- Finally, once a consuming segment seals (consuming to online), we notice that the ingestion delay tracker metric (pinot.server.realtime_ingestion_delay) continues to rise from 0.

Tracing the code, we determined that this happens when the segment is sealed but is no longer owned by that instance, so it is also dropped from it. In the code, the partition is marked for "verification" as part of the transition message handling. At some point, the background thread realizes the partition is inactive and stops tracking the lag. This takes ~10 minutes, so we see an increasing lag from the moment the transition happens.

Relevant code: https://github.com/apache/pinot/blob/399f033ec3917df2bc478b5904406a95e0bc7258/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/IngestionDelayTracker.java#L91

Desired behaviour: lag tracking is stopped the moment the partition is transitioned/dropped from said instance. Right now, that function call simply marks the partition for verification.

cc @jugomezv
cc @jadami-stripe
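
To illustrate the difference between the current and the desired behaviour, here is a minimal, self-contained sketch. It is not the actual IngestionDelayTracker code: the class `DelayTrackerSketch` and all of its method names are hypothetical stand-ins, and the background verification thread is left out. It only shows why a partition that is merely marked for verification keeps reporting a growing delay, while one that is dropped immediately does not.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for a per-partition ingestion delay tracker. */
public class DelayTrackerSketch {
  // Last ingested event time (epoch millis) for each tracked partition.
  private final Map<Integer, Long> lastIngestionTimeMs = new ConcurrentHashMap<>();
  // Partitions flagged for a later check by a periodic background task (not modelled here).
  private final Set<Integer> partitionsToVerify = ConcurrentHashMap.newKeySet();

  /** Records the event time of the most recently ingested row for a partition. */
  public void recordIngestion(int partitionId, long eventTimeMs) {
    lastIngestionTimeMs.put(partitionId, eventTimeMs);
  }

  /**
   * Behaviour described in the issue: the partition is only flagged.
   * It stays in the map, so its reported delay keeps growing until
   * some later background check removes it.
   */
  public void markForVerification(int partitionId) {
    partitionsToVerify.add(partitionId);
  }

  /** Desired behaviour: stop tracking as soon as the partition leaves this instance. */
  public void stopTracking(int partitionId) {
    lastIngestionTimeMs.remove(partitionId);
    partitionsToVerify.remove(partitionId);
  }

  /** Returns the delay that would be reported, or 0 if the partition is not tracked. */
  public long getDelayMs(int partitionId, long nowMs) {
    Long last = lastIngestionTimeMs.get(partitionId);
    return last == null ? 0L : Math.max(0L, nowMs - last);
  }

  public boolean isTracked(int partitionId) {
    return lastIngestionTimeMs.containsKey(partitionId);
  }
}
```

In this sketch, calling `stopTracking` from the segment transition handler would replace the call to `markForVerification` when the instance knows it no longer owns the partition. Whether the instance can know that reliably at transition time is an assumption that would need to be checked against the real code.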