dang-stripe opened a new issue, #16001: URL: https://github.com/apache/pinot/issues/16001
We've found it difficult to do post-hoc analysis of query timeouts and failures on the multistage engine without access to the stage stats. Debugging these queries has involved:

1. Piecing together distributed server logs to understand where the query failed or slowed down
2. Re-running the query, raising the timeout as needed, and collecting the stage stats ad hoc

Given the complexity of the multistage engine, it'd be ideal for these queries to have a concise summary log from the broker, similar to how the single-stage engine does it [here](https://github.com/apache/pinot/blob/master/pinot-broker/src/main/java/org/apache/pinot/broker/querylog/QueryLogger.java#L184).

Some metadata that'd be helpful for debugging:

1. Query success or failure
2. Query latency
3. Concise representation of the stage graph (1->[2,3],2->4,etc) including which stages are leaves
4. Which stages of the query succeeded or failed
5. How many servers were involved in each stage
6. How much wall-clock time each stage took

This would make it much easier to debug production issues on the fly and provide centralized visibility into query execution across all stages.

cc @gortiz @Jackie-Jiang @jadami10
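
To make the request concrete, here is a rough sketch of what a one-line summary covering the six metadata items above could look like. Everything in it is hypothetical: `StageSummarySketch`, `StageInfo`, `formatGraph`, `formatSummary`, and the key names in the log line are made up for illustration and are not existing Pinot classes or log fields. It also assumes the broker already has per-stage children, status, server count, and wall-clock time available.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

public class StageSummarySketch {
  /** Hypothetical per-stage data holder; not a Pinot class. */
  static final class StageInfo {
    final int id;
    final List<Integer> children; // empty for leaf stages
    final boolean succeeded;
    final int numServers;
    final long wallTimeMs;

    StageInfo(int id, List<Integer> children, boolean succeeded, int numServers, long wallTimeMs) {
      this.id = id;
      this.children = children;
      this.succeeded = succeeded;
      this.numServers = numServers;
      this.wallTimeMs = wallTimeMs;
    }
  }

  /** Renders parent-to-child edges in the "1->[2,3],2->4" style; leaf stages add no edge. */
  static String formatGraph(List<StageInfo> stages) {
    StringJoiner edges = new StringJoiner(",");
    for (StageInfo s : stages) {
      if (s.children.isEmpty()) {
        continue;
      }
      if (s.children.size() == 1) {
        edges.add(s.id + "->" + s.children.get(0));
      } else {
        StringJoiner kids = new StringJoiner(",", "[", "]");
        for (int child : s.children) {
          kids.add(Integer.toString(child));
        }
        edges.add(s.id + "->" + kids);
      }
    }
    return edges.toString();
  }

  /** Builds a single space-separated key=value line with overall and per-stage details. */
  static String formatSummary(boolean success, long latencyMs, List<StageInfo> stages) {
    StringJoiner leaves = new StringJoiner(",", "[", "]");
    StringJoiner perStage = new StringJoiner(";", "[", "]");
    for (StageInfo s : stages) {
      if (s.children.isEmpty()) {
        leaves.add(Integer.toString(s.id));
      }
      perStage.add(s.id + "(" + (s.succeeded ? "OK" : "FAILED")
          + ",servers=" + s.numServers + ",wallMs=" + s.wallTimeMs + ")");
    }
    return "status=" + (success ? "OK" : "FAILED")
        + " latencyMs=" + latencyMs
        + " graph=" + formatGraph(stages)
        + " leaves=" + leaves
        + " stages=" + perStage;
  }

  public static void main(String[] args) {
    // Made-up example: a query that timed out because leaf stage 4 was slow.
    List<StageInfo> stages = Arrays.asList(
        new StageInfo(1, Arrays.asList(2, 3), false, 1, 10000),
        new StageInfo(2, Collections.singletonList(4), false, 4, 9800),
        new StageInfo(3, Collections.<Integer>emptyList(), true, 8, 120),
        new StageInfo(4, Collections.<Integer>emptyList(), false, 8, 9500));
    System.out.println(formatSummary(false, 10000, stages));
  }
}
```

With a line like this in the broker log, the slow or failed stage (stage 4 in the made-up data) would be visible without stitching together server logs or re-running the query.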