gortiz opened a new pull request, #15609: URL: https://github.com/apache/pinot/pull/15609
This PR adds the ability to include stats on timeouts. Although we are ready to return stats on errors (since https://github.com/apache/pinot/pull/15245), the tricky part about timeouts is that different threads (on different JVMs) use the same deadline, calculated as `now() + timeout`, so they all time out at roughly the same moment. To get updated stats, we would need to implement at least one of the following:

1. Fail as close to the leaves as possible
2. Ask for stats on failure
3. Use an alternative mechanism

In this PR we try the last option. What I propose (and I'm very open to discussing) is the following:

- On servers, when an opchain is scheduled, we store the stage root operator in a Guava cache indexed by opchain id.
  - This cache has a configurable maximum size. The size is based on the number of operators.
  - When the opchain finishes:
    - If it finished successfully, the entry is removed from the cache.
    - In other cases, it is kept.
  - When a query is cancelled, we look for its opchains in the cache and retrieve their statistics. These stats are now returned in the cancel response.
- On brokers, when a query fails, a cancel signal is sent to all servers. This is what we did before, but now we wait a bit for the stats in the responses. These stats are sent to the user.

This system has several flaws: it has race conditions, may return partial data (some opchains may have been removed from the cache), and requires keeping some extra (bounded) memory.
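
To make the server-side lifecycle described in this PR easier to discuss, here is a minimal sketch of it. It is not the code in the PR: the class and method names (`OpChainStatsCacheSketch`, `onScheduled`, `onFinished`, `onCancel`) are hypothetical, stats are represented as plain strings, and a `LinkedHashMap` with manual oldest-first eviction stands in for the Guava cache so that the example has no dependencies. Dropping entries once a cancel has read them is also an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

/** Illustrative only: a bounded store of stage root operators keyed by opchain id. */
final class OpChainStatsCacheSketch {

  /** Hypothetical stand-in for a stage root operator and the stats it can report. */
  static final class RootOperator {
    final String opChainId;
    final long requestId;
    final int operatorCount; // used as the entry's weight
    final String stats;

    RootOperator(String opChainId, long requestId, int operatorCount, String stats) {
      this.opChainId = opChainId;
      this.requestId = requestId;
      this.operatorCount = operatorCount;
      this.stats = stats;
    }
  }

  private final long maxWeight;
  private long currentWeight = 0;
  // Insertion-ordered, so iteration starts at the oldest entry.
  private final LinkedHashMap<String, RootOperator> cache = new LinkedHashMap<>();

  OpChainStatsCacheSketch(long maxWeight) {
    this.maxWeight = maxWeight;
  }

  /** Called when an opchain is scheduled: remember its root operator. */
  synchronized void onScheduled(RootOperator root) {
    RootOperator previous = cache.put(root.opChainId, root);
    if (previous != null) {
      currentWeight -= previous.operatorCount;
    }
    currentWeight += root.operatorCount;
    // Assumption: evict oldest entries until the total operator count fits the bound.
    Iterator<RootOperator> it = cache.values().iterator();
    while (currentWeight > maxWeight && it.hasNext()) {
      currentWeight -= it.next().operatorCount;
      it.remove();
    }
  }

  /** Called when an opchain finishes: successful opchains are dropped, others are kept. */
  synchronized void onFinished(String opChainId, boolean success) {
    if (success) {
      RootOperator removed = cache.remove(opChainId);
      if (removed != null) {
        currentWeight -= removed.operatorCount;
      }
    }
  }

  /** Called on cancel: collect the stats of every cached opchain of the given query. */
  synchronized List<String> onCancel(long requestId) {
    List<String> collected = new ArrayList<>();
    Iterator<RootOperator> it = cache.values().iterator();
    while (it.hasNext()) {
      RootOperator root = it.next();
      if (root.requestId == requestId) {
        collected.add(root.stats);
        // Assumption: entries are removed once their stats have been returned.
        currentWeight -= root.operatorCount;
        it.remove();
      }
    }
    return collected;
  }
}
```

In this sketch the list returned by `onCancel` plays the role of the stats carried in the cancel response. It also shows where partial data comes from: an opchain evicted to respect the size bound, or one that finished successfully before the cancel arrived, contributes nothing to that list.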