dang-stripe opened a new issue, #10787: URL: https://github.com/apache/pinot/issues/10787
We've observed some 425 error query failures during rolling restarts on a relatively low QPS cluster. Looking at the logs, we noticed that the server shut down before the broker finished processing the routing table update. It doesn't seem as though the server waits the full `pinot.server.shutdown.noQueryThresholdMs` before shutting down the process.

```
# server begins shutdown
[2023-05-18 05:44:52.728337] INFO [BaseServerStarter] [Thread-41:17] Shutting down Pinot server
[2023-05-18 05:44:52.747490] INFO [BaseServerStarter] [Thread-41:17] Sleep for 4608ms as there are still incoming queries (no query time: 10392ms is smaller than the threshold: 15000ms)

# broker receives signal to remove server from routing table
[2023-05-18 05:44:52.817685] INFO [BrokerRoutingManager] [ClusterChangeHandlingThread:25] Removing entry for server=Server1, table=Table1

# server stops quiescing after 4.6s
[2023-05-18 05:44:57.355546] INFO [BaseServerStarter] [Thread-41:17] No query received within 15000ms (larger than the threshold: 15000ms), mark it as no incoming queries
[2023-05-18 05:44:57.355592] INFO [BaseServerStarter] [Thread-41:17] Finished draining queries after 4608ms

# roughly the time when broker starts query
[2023-05-18 05:45:00.671645] Caused by: java.net.ConnectException: Connection refused
[2023-05-18 05:45:00.671634] org.apache.pinot.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: Server1/10.20.30.40:8098
[2023-05-18 05:45:00.671597] ERROR [QueryRouter] [jersey-server-managed-async-executor-788:25] Caught exception while sending request 55024 to server: Server1, marking query failed
[2023-05-18 05:45:00.723279] INFO [QueryLogger] [jersey-server-managed-async-executor-788:25] requestId=55024,table=Table1,timeMs=490

# broker finishes processing routing table change
[2023-05-18 05:45:00.944494] INFO [BrokerRoutingManager] [ClusterChangeHandlingThread:25] Processed instance config change in 191ms (fetch 1040 instance configs: 68ms, calculate changed servers: 2ms, update 4 routing entries: 121ms), new enabled servers: [], new disabled servers: [Server1], excluded servers: [Server1]
```
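To make the timing gap concrete, here is a rough sketch that redoes the arithmetic from the log excerpt. It assumes the drain sleep is the threshold minus the time since the last query, which is only inferred from the log message (15000ms - 10392ms = 4608ms) and not checked against the Pinot source. The helper names `remaining_drain_ms` and `elapsed_ms` are made up for illustration.

```python
from datetime import datetime, timedelta

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(text: str) -> datetime:
    """Parse a timestamp in the format used by the log excerpt."""
    return datetime.strptime(text, LOG_TS_FORMAT)


def remaining_drain_ms(threshold_ms: int, no_query_ms: int) -> int:
    """Assumed drain rule: sleep only for the part of the threshold that has
    not already elapsed since the last query (inferred from the log line)."""
    return max(threshold_ms - no_query_ms, 0)


def elapsed_ms(start: datetime, end: datetime) -> int:
    """Whole milliseconds between two timestamps."""
    return (end - start) // timedelta(milliseconds=1)


# Timestamps copied from the log excerpt.
shutdown_start = parse_ts("2023-05-18 05:44:52.728337")
drain_done = parse_ts("2023-05-18 05:44:57.355592")
query_failed = parse_ts("2023-05-18 05:45:00.671597")
routing_updated = parse_ts("2023-05-18 05:45:00.944494")

sleep_ms = remaining_drain_ms(threshold_ms=15000, no_query_ms=10392)
server_drain_ms = elapsed_ms(shutdown_start, drain_done)
broker_update_ms = elapsed_ms(shutdown_start, routing_updated)
exposed_ms = elapsed_ms(drain_done, routing_updated)
failed_in_window = drain_done <= query_failed <= routing_updated

print(f"server slept {sleep_ms}ms, finished draining after {server_drain_ms}ms")
print(f"broker finished routing update after {broker_update_ms}ms")
print(f"window where broker could still route to the server: {exposed_ms}ms")
print(f"failed request fell inside that window: {failed_in_window}")
```

Under that reading, on a low QPS cluster most of the threshold has already elapsed when shutdown starts, so the server stops accepting connections several seconds before the broker has removed it from the routing table.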