ege-st commented on code in PR #12157:
URL: https://github.com/apache/pinot/pull/12157#discussion_r1474666166
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -140,6 +142,8 @@ public class PinotLLCRealtimeSegmentManager {
   // Max time to wait for all LLC segments to complete committing their metadata while stopping the controller.
   private static final long MAX_LLC_SEGMENT_METADATA_COMMIT_TIME_MILLIS = 30_000L;
+  private Map<Pair<String, String>, SegmentErrorInfo> _errorCache;

Review Comment:
Just double-checking my understanding of the error cache: it's a map from each (table, segment) pair on this server to the most recent error message seen for that table/segment? In other words, for each server, we'll see the most recent error on each segment hosted on that server.

1. Longer term, the question is how to handle noisy vs. non-noisy errors. For example, if there's an error with missing offsets (which you're monitoring for in this PR) and a decoding error on 1 in 5 messages, the decoding errors will flood the cache and keep the offset error from ever being seen.
2. What happens when a table/segment is deleted or moved? The error cache will still hold the non-existent segments and report invalid information. We have this issue with Ingestion Lag metrics, and it frequently causes false alerts and other problems. If this happens multiple times, we can wind up with many servers reporting errors for the same segment, which will be confusing during investigations.
3. If you limit the size of this map, it still needs to hold entries for every extant segment on a server. I'm not sure a fixed limit will work, because, as far as I know, the number of segments a single server can host is not strictly limited. So how would we determine what the max size should be?
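To make the three concerns concrete, here is a minimal sketch of one possible design. It is not what the PR implements, and every name in it (`SegmentErrorCache`, `ErrorCategory`, `SegmentKey`, `ErrorEntry`) is hypothetical. It assumes errors can be classified into categories, keeps the latest error per (table, segment, category) so a frequent decoding error cannot overwrite an offset error, removes entries when a segment is dropped or when a caller reconciles the cache against the currently assigned segments, and bounds memory with a TTL instead of a fixed entry count. It uses Java records (Java 16+).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not the PR's code: an error cache keyed by (table, segment)
 * that keeps the latest error per category and evicts stale or expired entries.
 */
final class SegmentErrorCache {

  /** Hypothetical error categories; real code might classify by exception type. */
  enum ErrorCategory { OFFSET, DECODING, OTHER }

  record SegmentKey(String table, String segment) { }

  record ErrorEntry(String message, long timestampMs) { }

  private final Map<SegmentKey, Map<ErrorCategory, ErrorEntry>> _errors = new HashMap<>();
  private final long _ttlMs;

  SegmentErrorCache(long ttlMs) {
    _ttlMs = ttlMs;
  }

  /** Overwrites only the entry for this category, so noisy categories cannot mask rare ones. */
  synchronized void record(String table, String segment, ErrorCategory category,
      String message, long nowMs) {
    _errors.computeIfAbsent(new SegmentKey(table, segment), k -> new EnumMap<>(ErrorCategory.class))
        .put(category, new ErrorEntry(message, nowMs));
  }

  /** Drops all errors for a segment, e.g. when it is deleted or moved off this server. */
  synchronized void onSegmentRemoved(String table, String segment) {
    _errors.remove(new SegmentKey(table, segment));
  }

  /** Removes every key not in the caller-supplied set of currently assigned segments. */
  synchronized void retainOnly(Set<SegmentKey> liveSegments) {
    _errors.keySet().retainAll(liveSegments);
  }

  /** Removes errors older than the TTL, then any segment left with no errors. */
  synchronized void evictExpired(long nowMs) {
    Iterator<Map<ErrorCategory, ErrorEntry>> it = _errors.values().iterator();
    while (it.hasNext()) {
      Map<ErrorCategory, ErrorEntry> byCategory = it.next();
      byCategory.values().removeIf(e -> nowMs - e.timestampMs() > _ttlMs);
      if (byCategory.isEmpty()) {
        it.remove();
      }
    }
  }

  /** Returns an immutable snapshot of one segment's errors (empty if none). */
  synchronized Map<ErrorCategory, ErrorEntry> get(String table, String segment) {
    Map<ErrorCategory, ErrorEntry> byCategory = _errors.get(new SegmentKey(table, segment));
    return byCategory == null ? Map.of() : Map.copyOf(byCategory);
  }

  /** Number of segments that currently have at least one cached error. */
  synchronized int size() {
    return _errors.size();
  }
}
```

With this shape the cache needs no fixed capacity: its size is bounded by the number of live segments times the number of categories, and the TTL cleans up anything reconciliation misses. The trade-off is that an error older than the TTL disappears even if the underlying problem persists, so the TTL would need to exceed the interval at which that error recurs. Where the set of live segments comes from is left open here.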