ankitsultana commented on issue #10010: URL: https://github.com/apache/pinot/issues/10010#issuecomment-1362450209
This edge case can happen when a realtime segment goes through a "Consuming to Online" transition, which calls `LLRealtimeSegmentDataManager::destroy`, and the next segment for the same partition is assigned to the same server. Between the `containsKey` check in `AbstractMetrics` and the `get()` on the `_gaugeValues` map, the thread that does the destroy removes the key from the map, causing an NPE in the new segment's consumer thread.

There are multiple issues here:

1. `AbstractMetrics` has an unsafe access pattern. Between the calls to `containsKey` and `get`, a competing thread may remove that metric, causing the NullPointerException we saw above.
2. Even if we were to fix that, I am not sure how we guarantee the correctness of metrics. At a very minimum, ingestion shouldn't be impacted by metrics, which is what happens right now. I raised #10022 which should fix that.

---

Just so there's no confusion, this is the line where the exception is thrown. Our internal fork is a few months old.

<img width="519" alt="image" src="https://user-images.githubusercontent.com/8644710/209063451-1c096942-f62a-43ad-82ef-659efde6f083.png">
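
To make the race concrete, here is a minimal sketch of the check-then-act pattern described above, next to one possible safer alternative. This is not the actual `AbstractMetrics` code and not necessarily what #10022 does: the class `GaugeRaceSketch`, its methods, and the `gaugeValues` field are hypothetical stand-ins for illustration, and the map is assumed to be a `ConcurrentHashMap`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; names do not come from the Pinot codebase.
final class GaugeRaceSketch {
    // Stand-in for the `_gaugeValues` map mentioned in the comment.
    private final Map<String, AtomicLong> gaugeValues = new ConcurrentHashMap<>();

    // Unsafe check-then-act: a concurrent remove() may run between
    // containsKey() and get(), so get() can return null.
    void setUnsafe(String name, long value) {
        if (!gaugeValues.containsKey(name)) {
            gaugeValues.put(name, new AtomicLong(value));
        } else {
            gaugeValues.get(name).set(value); // NPE if the key was just removed
        }
    }

    // Safer: a single atomic map operation that never returns null here,
    // because the mapping function always produces a value.
    void setSafe(String name, long value) {
        gaugeValues.computeIfAbsent(name, k -> new AtomicLong()).set(value);
    }

    // Models the removal done by the thread that destroys the old segment.
    void remove(String name) {
        gaugeValues.remove(name);
    }

    // Single lookup plus null check instead of containsKey() followed by get().
    Long read(String name) {
        AtomicLong current = gaugeValues.get(name);
        return current == null ? null : current.get();
    }
}
```

Note that `setSafe` only removes the NullPointerException. If the remove lands right after `computeIfAbsent` returns, the update is written to a gauge that is no longer in the map, so the second concern (correctness of the metric) still stands.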