ankitsultana commented on issue #10010:
URL: https://github.com/apache/pinot/issues/10010#issuecomment-1362450209

   This edge case can happen when a realtime segment goes through a "Consuming to 
Online" transition, which calls `LLRealtimeSegmentDataManager::destroy`, and the 
next segment for the same partition is assigned to the same server. Between 
the `containsKey` check in `AbstractMetrics` and the `get()` on the 
`_gaugeValues` map, the thread running the destroy can remove the key from 
the map, causing an NPE in the new segment's consumer thread.
   
   There are multiple issues here:
   
   1. `AbstractMetrics` has an unsafe access pattern. Between the calls to 
`containsKey` and `get`, a competing thread may remove that metric, causing the 
NullPointerException we saw above.
   2. Even if we were to fix that, I am not sure how we guarantee the 
correctness of metrics.
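
   To make the first point concrete, here is a minimal sketch of the check-then-act race and one way to avoid the NPE. The class `GaugeRegistrySketch` and its methods are hypothetical stand-ins, not Pinot's actual `AbstractMetrics` code; only the `_gaugeValues` map name and the `containsKey`/`get` sequence come from the description above, and the `AtomicLong` value type is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the gauge bookkeeping; not the real AbstractMetrics.
class GaugeRegistrySketch {
  // Assumed shape: gauge name -> current value holder.
  private final Map<String, AtomicLong> _gaugeValues = new ConcurrentHashMap<>();

  // Racy check-then-act: another thread may remove the key after
  // containsKey() returns true but before get() runs, so get() returns null.
  void setValueUnsafe(String gaugeName, long value) {
    if (!_gaugeValues.containsKey(gaugeName)) {
      _gaugeValues.put(gaugeName, new AtomicLong(value));
    } else {
      _gaugeValues.get(gaugeName).set(value); // NPE if removed in between
    }
  }

  // One atomic lookup-or-create: the returned reference is never null,
  // so a concurrent remove can no longer cause an NPE here.
  void setValueSafe(String gaugeName, long value) {
    _gaugeValues.computeIfAbsent(gaugeName, k -> new AtomicLong()).set(value);
  }

  // What the destroy path would call in this sketch.
  void remove(String gaugeName) {
    _gaugeValues.remove(gaugeName);
  }

  // Returns the current value, or null if the gauge is not registered.
  Long read(String gaugeName) {
    AtomicLong holder = _gaugeValues.get(gaugeName);
    return holder == null ? null : holder.get();
  }
}
```

   Note that this only addresses the NPE. If the remove lands right after `computeIfAbsent` returns, the value is written to a holder that is no longer in the map, so the second point about metric correctness still stands.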
   
   At a bare minimum, a metrics failure shouldn't impact ingestion, which it 
does right now. I raised #10022, which should fix that.
   
   ---
   
   Just so there's no confusion, this is the line where the exception is 
thrown. Our internal fork is a few months old.
   
   <img width="519" alt="image" 
src="https://user-images.githubusercontent.com/8644710/209063451-1c096942-f62a-43ad-82ef-659efde6f083.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

