ege-st commented on code in PR #12157:
URL: https://github.com/apache/pinot/pull/12157#discussion_r1474666166


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -140,6 +142,8 @@ public class PinotLLCRealtimeSegmentManager {
   // Max time to wait for all LLC segments to complete committing their metadata while stopping the controller.
   private static final long MAX_LLC_SEGMENT_METADATA_COMMIT_TIME_MILLIS = 30_000L;
 
+  private Map<Pair<String, String>, SegmentErrorInfo> _errorCache;

Review Comment:
   Just double-checking my understanding of the error cache: it's a map from 
each (table, segment) pair on this server to the most recent error message 
seen for that table/segment? In other words, on each server we'll see the 
most recent error for each segment hosted on that server. 
   
   1. Longer term, how do we manage noisy errors vs. non-noisy errors? For 
example: if there's an error with missing offsets (which you're monitoring for 
in this PR) and a decoding error on 1 in 5 messages, the decoding error will 
flood the cache and block the offset error from being seen.
   2. What happens when a table/segment is deleted or moved? The error cache 
will still hold entries for segments that no longer exist and will report stale 
information. We have this issue with the Ingestion Lag metrics, where it 
frequently causes false alerts. If this happens multiple times, we can wind up 
with many servers reporting errors for the same segment, which will be 
confusing during investigations.
   3. If you limit the size of this map, it still needs to cover all the 
segments currently on a server, so I'm not sure a fixed limit will work: the 
number of segments a single server can host is not, as far as I know, strictly 
limited. How would we determine what the max size should be?
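
   To make points 1 and 2 concrete, here is a rough sketch of one possible shape for the cache. It is not the PR's implementation: the class name, the `ErrorType` categories, and the `onSegmentRemoved` hook are all hypothetical, and plain strings stand in for `SegmentErrorInfo`. It keeps the latest error per (table, segment) *and* per error category, so a frequent decoding error cannot overwrite a rare offset error, and it drops a segment's entries when the segment leaves the server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; names and structure are assumptions, not the PR's code.
class SegmentErrorCacheSketch {
  // Assumed error categories; the real code may classify errors differently.
  enum ErrorType { MISSING_OFFSET, DECODING }

  // (table, segment) -> error type -> most recent message of that type.
  private final Map<Map.Entry<String, String>, Map<ErrorType, String>> _errors =
      new ConcurrentHashMap<>();

  // Record an error; only the slot for this error type is overwritten.
  void record(String table, String segment, ErrorType type, String message) {
    _errors.computeIfAbsent(Map.entry(table, segment), k -> new ConcurrentHashMap<>())
        .put(type, message);
  }

  // Most recent message of the given type, or null if none was recorded.
  String latest(String table, String segment, ErrorType type) {
    Map<ErrorType, String> byType = _errors.get(Map.entry(table, segment));
    return byType == null ? null : byType.get(type);
  }

  // Hypothetical hook: call when a segment is deleted or moved off this server.
  void onSegmentRemoved(String table, String segment) {
    _errors.remove(Map.entry(table, segment));
  }

  // Number of (table, segment) pairs that currently have cached errors.
  int trackedSegments() {
    return _errors.size();
  }
}
```

   With this shape the cache size follows the number of segments currently on the server (times a small, fixed number of error types) rather than a fixed cap, which may sidestep point 3, assuming something reliably calls the removal hook.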



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

