rhodo opened a new pull request, #15722:
URL: https://github.com/apache/pinot/pull/15722

   During large table rebalances, a massive number of state transitions may be 
triggered. If a server cannot keep up, the size of its Helix message queue can 
grow significantly. This PR adds visibility into the server-side Helix message 
queue size.
   
   Some rationale:
   - This PR delegates responsibility to each server instance to monitor and 
log its own message queue size metrics, instead of relying on the controller.
   - It decouples the getHelixServerMessageCount() method from the metrics 
scraping thread. This ensures that:  
     - The frequency of metrics scraping does not introduce additional I/O 
pressure on ZooKeeper.
     - ZooKeeper I/O latency does not interfere with the metrics scraping process.
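
A minimal sketch of the decoupling described above, not the actual code in this PR: a background task refreshes a cached count on its own schedule, and the metrics gauge only reads the cached value. The class name `CachedQueueSizeGauge`, the thread name, and the `IntSupplier` standing in for `getHelixServerMessageCount()` are all hypothetical and chosen for illustration only.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/**
 * Hypothetical helper: caches the server's message queue size so that the
 * metrics scraper never touches ZooKeeper directly.
 */
class CachedQueueSizeGauge implements AutoCloseable {
  // Last value read by the refresher; this is all the scraper ever sees.
  private final AtomicInteger cachedSize = new AtomicInteger(0);
  // Assumption: stands in for the (potentially slow) ZooKeeper-backed read.
  private final IntSupplier queueSizeReader;
  private final ScheduledExecutorService refresher;

  CachedQueueSizeGauge(IntSupplier queueSizeReader) {
    this.queueSizeReader = queueSizeReader;
    this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "helix-queue-size-refresher");
      t.setDaemon(true);
      return t;
    });
  }

  /** Starts periodic refreshes, independent of how often metrics are scraped. */
  void start(long period, TimeUnit unit) {
    refresher.scheduleWithFixedDelay(this::refresh, 0, period, unit);
  }

  /** Performs one read and updates the cache; keeps the old value on failure. */
  void refresh() {
    try {
      cachedSize.set(queueSizeReader.getAsInt());
    } catch (RuntimeException e) {
      // Swallow so the scheduled task keeps running; the gauge stays stale.
    }
  }

  /** Called by the metrics scraper: a cheap in-memory read, no ZooKeeper I/O. */
  int getValue() {
    return cachedSize.get();
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }
}
```

With this shape, scraping more often costs nothing on the ZooKeeper side, and a slow ZooKeeper read delays only the next cache update rather than the scrape itself. The trade-off is that the reported value can lag the real queue size by up to one refresh period.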
   
    ## Test
   
   In quickstart, triggered a segment reload while intentionally blocking the 
segment reload handler on the server, and observed the queue size go from 0 to 1. 
After letting the segment reload go through, the metric went back to 0.
   
   ![Screenshot 2025-05-06 at 4 20 47 
PM](https://github.com/user-attachments/assets/29a0cd39-b88f-454d-95d8-b35186b5c869)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

