gortiz opened a new issue, #12696: URL: https://github.com/apache/pinot/issues/12696
# Proposal: Multi Stage actual physical/implementation explain Apache Pinot has some particularities that make `explain` sentences a bit more complex than customers may be used to. First, Pinot is a distributed database where each node may not have all the information. In fact that is the normal case. For example Brokers don’t know much about the actual schema, indexes and other important information like max, min, sorting, etc. Second, different segments on the same table may have been created in a different way. For example, in the same table some segments may be sorted by a column while others are not. Same happens with the presence of indexes or even columns. ## Current State In Apache Pinot 1.1.0 we have the following explains: * In V1 (aka single stage) we have two versions: * `explain plan for`, where the result combines the broker part with the result of a single segment of a single server. This includes important physical information like whether an index is being used or not. This is good enough for most cases. * Specifically we ask for all plans be keep only the one that: 1. The greater sorted in the following order: No matching segment < empty < match all < any other 2. In case of a tie, we pick the one that is deeper. * Verbose mode, which is enabled by setting the query option `explainPlanVerbose` to true. When this is set, all unique server plans are returned. * In V2 (aka multi stage) we have the following versions: * `explain implementation for`, which returns some kind of physical plan. The returned plan is different from the one returned in V1. Not just in format but also semantically. The plan here does not include important physical information like whether indexes are being used or segments are being pruned. Instead it seems to be focused on how exchanges are implemented as send and receive mailboxes. * `explain plan for`, which returns the logical plan. This sentence can be decorated with: * `AS JSON/XML/DOT`, which returns the same result but in different formats. JSON/XML is very verbose but can be useful to deeply understand the plan and DOT can be used to draw the plan like. See the Appendix A * `WITH TYPE`, which does nothing * `WITH IMPLEMENTATION`, which means to show the physical plan, but does nothing. * `WITHOUT IMPLEMENTATION`, which means to show the logical plan, but we ignore it. Notice that in V2 `explainPlanVerbose` is ignored and there is no way to get the actual physical plan. The only way to understand whether we are using an index or not is to extract the equivalent V1 query (which is not always possible) and explain that query in V1, using `explainPlanVerbose` if necessary. Also notice we have the `trace` option. When it is used, we trace how time is being spent during query execution. When a explain query is executed with trace enabled, what we trace is how long it takes to create the explain plan. ## Ideal state The `explain` sentence should be homogenous in Apache Pinot. In both V1 and V2, `explain plan` for should: * Return the physical plan, including: * Exchange mechanism used, if any (Queries in V1 will not contain exchanges). * Indexes used in the leaf stage, if any. * Return the result in the same format. Ideally we should use the Calcite default format (internally called TEXT). This sentence can be decorated with: * `without implementation` to return the logical plan instead. Given there is no concept of logical plan in V1, we can either create one simple logical plan or return the physical plan. Alternatively we could fail in V1, but the error must explicitly say we fail because there is no logical plan in V1. * `as JSON/XML/DOT` to return the result in that specific format. * By default explain will be simplified in a similar way V1 explain is simplified right now (although the heuristic may be different) so only one plan is returned even if there are segments with different plans. * In order to receive all possible plans, the query option `explainPlanVerbose` must be true (same as V1 in the current form). * There must be a way to explain a query using trace in a way that the query is actually executed but no result is sent. Instead trace information is attached to the explain plan. In this mode we should be able to see, for each explain node, how much time has been spent on that node. * This mode should be similar to explain analyze in Postgres. * This mode may work with the same trace we have right now or with a different method, depending on how much we care about the ability to trace the explain sentence itself. * Trace should include * Rows and or bytes serialized between exchanges. * Pinot UI should provide a way to visualize the plain as a chart. * At least we should be able to provide something like what we can currently do in V2 using DOT format. See [Appendix A](https://docs.google.com/document/d/14YD-Tbb2NxxMULwpFX4b6uC9YjhygPYgiHvI0jt0q0s/edit#bookmark=id.v5c39orxv8dy). * Ideally, we should be able to do something like [this](https://explain.dalibo.com/plan/d3ff19b3agf57951) postgres plan. ## Appendix A Logical explain for query: ``` explain plan as dot for SELECT a.playerName, a.teamID, b.teamName FROM baseballStats_OFFLINE AS a JOIN dimBaseballTeams_OFFLINE AS b ON a.teamID = b.teamID where a.teamID = 10 LIMIT 10 ``` Drawn using [GraphizOnline](https://dreampuf.github.io/GraphvizOnline/#digraph%20%7B%0A%22PinotLogicalSortExch%5Cnange%5Cndistribution%20%3D%20hash%5Cncollation%20%3D%20%5B%5D%5CnisSortOnSender%20%3D%20fal%5Cnse%5CnisSortOnReceiver%20%3D%20f%5Cnalse%22%20-%3E%20%22LogicalSort%5Cnoffset%20%3D%200%5Cnfetch%20%3D%2010%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalSort%5Cnfetch%20%3D%2010%5Cn%22%20-%3E%20%22PinotLogicalSortExch%5Cnange%5Cndistribution%20%3D%20hash%5Cncollation%20%3D%20%5B%5D%5CnisSortOnSender%20%3D%20fal%5Cnse%5CnisSortOnReceiver%20%3D%20f%5Cnalse%22%20%5Blabel%3D%220%22%5D%0A%22LogicalProject%5CnplayerName%20%3D%20%240%5CnteamID%20%3D%20%241%5CnteamName%20%3D%20%243%5Cn%22%20-%3E%20%22LogicalSort%5Cnfetch%20%3D%2010%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20-%3E%20%22LogicalProject%5CnplayerName%20%3D%20%240%5CnteamID%20%3D%20%241%5CnteamName%20%3D%20%243%5Cn%22%20%5Blabel%3D%220%22%5D %0A%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn1%5D%5Cn%22%20-%3E%20%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn0%5D%5Cn%22%20-%3E%20%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20%5Blabel%3D%221%22%5D%0A%22LogicalProject%5CnplayerName%20%3D%20%2417%5CnteamID%20%3D%20%2425%5Cn%22%20-%3E%20%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn1%5D%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalFilter%5Cncondition%20%3D%20%3D(CAST(%24%5Cn25)%3AINTEGER%20NOT%20NULL%5Cn%2C%2010)%5Cn%22%20-%3E%20%22LogicalProject%5CnplayerName%20%3D%20%2417%5CnteamID%20%3D%20%2425%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalTableScan%5Cntable%20%3D%20%5BbaseballSta%5Cnts_OFFLINE%5D%5Cn%22%20-%3E%20%22LogicalFilter%5Cncondition%20%3D%20%3D(CAST(%24%5Cn25)%3AINTEGER%20NOT%20NULL%5Cn%2C%2010)%5Cn%22%20%5Blabel%3D% 220%22%5D%0A%22LogicalProject%5CnteamID%20%3D%20%243%5CnteamName%20%3D%20%244%5Cn%22%20-%3E%20%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn0%5D%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalTableScan%5Cntable%20%3D%20%5BdimBaseball%5CnTeams_OFFLINE%5D%5Cn%22%20-%3E%20%22LogicalProject%5CnteamID%20%3D%20%243%5CnteamName%20%3D%20%244%5Cn%22%20%5Blabel%3D%220%22%5D%0A%7D). Edges should be annotated with information related to cost or rows, but given we don’t support that in our explain, all edges are annotated with 0.  This issue is related to #11039 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org