jmint-stripe opened a new issue #8352: URL: https://github.com/apache/pinot/issues/8352
We observed a situation with our Pinot cluster where a host got stuck processing Helix messages and the Helix task queue got backed up and kept increasing in size until we restarted the server and at that point it recovered. We eventually tracked down what we believe is the root cause which were some Zookeeper connection disconnects. We would expect that once the ZK connection is re-established it would be able to process the queued messages. Below are some logs from our investigation that shows how the message for adding a new consuming segment did not get processed on a server. ---------------- From inspecting our logs we see the controller assigns new segment to two streaming servers: ``` 2022/02/10 17:29:30.901 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-13] Adding new CONSUMING segment: SEGMENT_NAME_REDACTED to instances: [ServerA, ServerB] 2022/02/10 17:29:41.293 INFO [MessageDispatchStage] [HelixController-pipeline-default-PinotClusterRedacted-(f382b6fe_DEFAULT)] Event f382b6fe_DEFAULT : Sending Message 06e151ca-965b-47d6-9dcc-bbd4a0d02463 to ServerA transit point_transaction_REALTIME.SEGMENT_NAME_REDACTED|[] from:OFFLINE to:CONSUMING, relayMessages: 0 2022/02/10 17:29:41.293 INFO [MessageDispatchStage] [HelixController-pipeline-default-PinotClusterRedacted-(f382b6fe_DEFAULT)] Event f382b6fe_DEFAULT : Sending Message 7074fc66-0986-4199-bc3d-24825eaaf79b to ServerB transit point_transaction_REALTIME.SEGMENT_NAME_REDACTED|[] from:OFFLINE to:CONSUMING, relayMessages: 0 ``` Each server processes the message ServerA - we see it actually handle the message ``` 2022/02/10 17:29:41.311 INFO [HelixTaskExecutor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] Scheduling message 06e151ca-965b-47d6-9dcc-bbd4a0d02463: point_transaction_REALTIME:SEGMENT_NAME_REDACTED, OFFLINE->CONSUMING 2022/02/10 17:29:41.316 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_STATE_TRANSITION] handling message: 06e151ca-965b-47d6-9dcc-bbd4a0d02463 transit point_transaction_REALTIME.SEGMENT_NAME_REDACTED|[] from:OFFLINE to:CONSUMING, relayedFrom: null ``` ServerB - we do not see any further log messages for this message ``` 2022/02/10 17:29:41.306 INFO [HelixTaskExecutor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] Scheduling message 7074fc66-0986-4199-bc3d-24825eaaf79b: point_transaction_REALTIME:SEGMENT_NAME_REDACTED, OFFLINE->CONSUMING ``` if we look up the message UUID 7074fc66-0986-4199-bc3d-24825eaaf79b, we see that it’s scheduled, but the number queued tasks on that threadpool is 1133 ``` 2022/02/10 17:29:41.306 INFO [HelixTaskExecutor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] Message: 7074fc66-0986-4199-bc3d-24825eaaf79b handling task scheduled 2022/02/10 17:29:41.306 INFO [HelixTaskExecutor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] Submit task: 7074fc66-0986-4199-bc3d-24825eaaf79b to pool: java.util.concurrent.ThreadPoolExecutor@92d8506[Running, pool size = 40, active threads = 40, queued tasks = 1133, completed tasks = 1147] 2022/02/10 17:29:41.306 INFO [HelixTaskExecutor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] Scheduling message 7074fc66-0986-4199-bc3d-24825eaaf79b: point_transaction_REALTIME:SEGMENT_NAME_REDACTED, OFFLINE->CONSUMING 2022/02/10 17:29:41.303 INFO [MessageLatencyMonitor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] The latency of message 7074fc66-0986-4199-bc3d-24825eaaf79b is 289 ms 2022/02/10 17:29:41.293 INFO [MessageDispatchStage] [HelixController-pipeline-default-PinotClusterRedacted-(f382b6fe_DEFAULT)] Event f382b6fe_DEFAULT : Sending Message 7074fc66-0986-4199-bc3d-24825eaaf79b to ServerB transit point_transaction_REALTIME.SEGMENT_NAME_REDACTED|[] from:OFFLINE to:CONSUMING, relayMessages: 0 ``` in comparison, the threadpool on the other streaming host which only has zero queued tasks so it processes it immediately ``` 2022/02/10 17:29:41.311 INFO [HelixTaskExecutor] [ZkClient-EventThread-49-pinot-zookeeper-PinotClusterRedacted.service.consul:2181] Submit task: dd34ddd4-425b-41e5-a0f0-2863442daea6 to pool: java.util.concurrent.ThreadPoolExecutor@97a16b1[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 864] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org