jugomezv commented on PR #9994: URL: https://github.com/apache/pinot/pull/9994#issuecomment-1403953161
> I know this image doesn't have a lot of context, but it's definitely in milliseconds, and it seems this affects ~7/48 partitions for this topic.

Thanks a lot. What is the scale on the y-axis: hours? Days? Let me keep looking into the consume loop to see whether there are other places where consumption could get stuck and lead to such increases.

Having looked at the consume loop code, I have the following suggestions:

Can you enable debug logs? There is a wealth of debug/info traces in there that can help us tell the difference in consumption patterns between your partitions and what leads to the ramping-up times (see the logging sketch below).

Currently there are two places where this code blocks: when fetching a message batch from the stream (with the configurable timeout described above), and right after we get an empty batch, where we block for 100 milliseconds (a sketch of this pattern follows below).

We also have a number of other interesting metrics that you should correlate with the graph above:
- `LLC_PARTITION_CONSUMING` should indicate whether the partition is actively consuming or not
- `HIGHEST_STREAM_OFFSET_CONSUMED`
- `REALTIME_ROWS_CONSUMED`
- `INVALID_REALTIME_ROWS_DROPPED`
- `INCOMPLETE_REALTIME_ROWS_CONSUMED`

One more question for you: is there any filtering of messages going on? I noticed that if we get a batch of messages and all of them are filtered, the metric could reflect the lag of the last unfiltered message (illustrated in the last sketch below).
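For the debug logs, here is a minimal sketch of one way to raise the level at runtime with Log4j2 (which Pinot ships with); the package name `org.apache.pinot.core.data.manager.realtime` is an assumption about where the consume-loop traces live, so adjust it to whatever logger names you actually see in your server logs (or set the equivalent logger in your `log4j2` config file instead):

```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class EnableConsumeLoopDebugLogs {
  public static void main(String[] args) {
    // Assumption: the consume-loop debug/info traces live under this package.
    // Adjust the logger name to match the class names that appear in your logs.
    Configurator.setLevel("org.apache.pinot.core.data.manager.realtime", Level.DEBUG);
  }
}
```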
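To make the two blocking points concrete, here is a rough sketch of the pattern described above; `StreamConsumer`, `MessageBatch`, `FETCH_TIMEOUT_MS`, and `EMPTY_BATCH_SLEEP_MS` are hypothetical names for illustration only, not the actual Pinot classes or config keys:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical consumer interface, used only to show where the loop can block.
interface StreamConsumer {
  MessageBatch fetchBatch(long timeoutMs) throws InterruptedException;
}

interface MessageBatch {
  int getMessageCount();
}

public class ConsumeLoopSketch {
  // Blocking point 1: how long a fetch from the stream may block (configurable in the real code).
  private static final long FETCH_TIMEOUT_MS = 5_000;
  // Blocking point 2: the ~100 ms pause taken after an empty batch.
  private static final long EMPTY_BATCH_SLEEP_MS = 100;

  static void consumeLoop(StreamConsumer consumer) throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      // Block for up to FETCH_TIMEOUT_MS waiting for a batch from the stream.
      MessageBatch batch = consumer.fetchBatch(FETCH_TIMEOUT_MS);
      if (batch == null || batch.getMessageCount() == 0) {
        // Empty batch: block for ~100 ms before trying again.
        TimeUnit.MILLISECONDS.sleep(EMPTY_BATCH_SLEEP_MS);
        continue;
      }
      // ... index the messages and update offsets/metrics here ...
    }
  }
}
```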
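And to illustrate the filtering question: the sketch below shows how, if every message in a batch is dropped by a filter, a lag gauge computed from the last unfiltered message could keep growing even though the partition is consuming. `StreamMessage`, `recordFilter`, and `processBatch` are hypothetical stand-ins, not Pinot's actual classes:

```java
import java.util.List;
import java.util.function.Predicate;

public class FilteredBatchLagSketch {
  // Hypothetical message carrying the stream ingestion timestamp.
  record StreamMessage(long ingestionTimeMs, String payload) {}

  private long lastUnfilteredIngestionTimeMs = -1;

  long processBatch(List<StreamMessage> batch, Predicate<StreamMessage> recordFilter) {
    for (StreamMessage msg : batch) {
      if (recordFilter.test(msg)) {
        continue; // message dropped by the filter: the timestamp below is not refreshed
      }
      lastUnfilteredIngestionTimeMs = msg.ingestionTimeMs();
    }
    // If the whole batch was filtered, the reported lag is still computed from the
    // last *unfiltered* message, so it can ramp up even while consumption continues.
    return System.currentTimeMillis() - lastUnfilteredIngestionTimeMs;
  }
}
```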