dang-stripe opened a new issue, #16001:
URL: https://github.com/apache/pinot/issues/16001

   We've found it difficult to do post-hoc analysis of query timeouts/failures 
on the multistage engine without access to the stage stats. Debugging these 
queries has involved:
   
   1. Piecing together distributed server logs to understand where the query 
failed or slowed down
   2. Re-running the query, raising the timeout as needed, and collecting the 
stage stats ad hoc
   
   Given the complexity of the multistage engine, it'd be ideal for these 
queries to have a concise summary log from the broker, similar to how the 
single-stage engine does it 
[here](https://github.com/apache/pinot/blob/master/pinot-broker/src/main/java/org/apache/pinot/broker/querylog/QueryLogger.java#L184).
 Some metadata that'd be helpful for debugging:
   
   1. Query success or failure
   2. Query latency
   3. Concise representation of the stage graph (1->[2,3],2->4,etc) including 
which stages are leaves
   4. Which stages of the query succeeded or failed
   5. How many servers were involved in each stage
   6. How much wall-clock time each stage took
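
   To make the list above concrete, here is a rough sketch of what such a 
summary line could contain. Everything in it is hypothetical: the class 
`StageSummaryLogSketch`, its fields, and the output format are illustrative 
assumptions, not existing Pinot APIs or the actual `QueryLogger` format.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch only: none of these types or field names exist in Pinot.
final class StageSummaryLogSketch {

  /** Per-stage facts from the list above (status, server count, wall-clock time). */
  static final class StageSummary {
    final int stageId;
    final List<Integer> childStageIds; // empty for a leaf stage
    final boolean succeeded;
    final int numServers;
    final long wallTimeMs;

    StageSummary(int stageId, List<Integer> childStageIds, boolean succeeded,
        int numServers, long wallTimeMs) {
      this.stageId = stageId;
      this.childStageIds = childStageIds;
      this.succeeded = succeeded;
      this.numServers = numServers;
      this.wallTimeMs = wallTimeMs;
    }

    boolean isLeaf() {
      return childStageIds.isEmpty();
    }
  }

  /** Renders the stage graph compactly, e.g. "1->[2,3],2->[4]". Leaves emit no edge. */
  static String formatGraph(List<StageSummary> stages) {
    StringJoiner edges = new StringJoiner(",");
    for (StageSummary stage : stages) {
      if (!stage.isLeaf()) {
        StringJoiner children = new StringJoiner(",", "[", "]");
        for (int childId : stage.childStageIds) {
          children.add(Integer.toString(childId));
        }
        edges.add(stage.stageId + "->" + children);
      }
    }
    return edges.toString();
  }

  /** Builds one space-separated key=value line covering all six items. */
  static String formatSummary(long requestId, boolean success, long latencyMs,
      List<StageSummary> stages) {
    StringJoiner perStage = new StringJoiner(";", "[", "]");
    for (StageSummary s : stages) {
      perStage.add(s.stageId + ":" + (s.succeeded ? "OK" : "FAILED")
          + (s.isLeaf() ? ":leaf" : "")
          + ":servers=" + s.numServers
          + ":wallMs=" + s.wallTimeMs);
    }
    return "requestId=" + requestId
        + " success=" + success
        + " timeMs=" + latencyMs
        + " stageGraph=" + formatGraph(stages)
        + " stages=" + perStage;
  }
}
```

   With four stages where 3 and 4 are leaves and stage 4 fails, such a line 
would carry the overall status and latency, the graph `1->[2,3],2->[4]`, and 
one `id:status[:leaf]:servers=N:wallMs=T` entry per stage.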
   
   This would make it much easier to debug production issues on the fly and 
provide centralized visibility into query execution across all stages.
   
   cc @gortiz @Jackie-Jiang @jadami10 

