gortiz opened a new issue, #12696:
URL: https://github.com/apache/pinot/issues/12696

   # Proposal: Multi Stage actual physical/implementation explain
   
   Apache Pinot has some particularities that make `explain` sentences a bit 
more complex than customers may be used to.
   First, Pinot is a distributed database where each node may not have all the 
information.
   In fact that is the normal case.
   For example Brokers don’t know much about the actual schema, indexes and 
other important information like max, min, sorting, etc. 
   Second, different segments on the same table may have been created in a 
different way. 
   For example, in the same table some segments may be sorted by a column while 
others are not.
   Same happens with the presence of indexes or even columns.
   
   ## Current State
   In Apache Pinot 1.1.0 we have the following explains:
   
   * In V1 (aka single stage) we have two versions:
     * `explain plan for`, where the result combines the broker part with the 
result of a single segment of a single server. This includes important physical 
information like whether an index is being used or not. This is good enough for 
most cases.
        * Specifically we ask for all plans be keep only the one that:
            1. The greater sorted in the following order: No matching segment < 
empty < match all < any other
            2. In case of a tie, we pick the one that is deeper.
        * Verbose mode, which is enabled by setting the query option 
`explainPlanVerbose` to true. When this is set, all unique server plans are 
returned.
   * In V2 (aka multi stage) we have the following versions:
     * `explain implementation for`, which returns some kind of physical plan. 
The returned plan is different from the one returned in V1. Not just in format 
but also semantically. The plan here does not include important physical 
information like whether indexes are being used or segments are being pruned. 
Instead it seems to be focused on how exchanges are implemented as send and 
receive mailboxes.
     * `explain plan for`, which returns the logical plan. This sentence can be 
decorated with:
        * `AS JSON/XML/DOT`, which returns the same result but in different 
formats. JSON/XML is very verbose but can be useful to deeply understand the 
plan and DOT can be used to draw the plan like. See the Appendix A
       * `WITH TYPE`, which does nothing
       * `WITH IMPLEMENTATION`, which means to show the physical plan, but does 
nothing.
       * `WITHOUT IMPLEMENTATION`, which means to show the logical plan, but we 
ignore it.
   
   Notice that in V2 `explainPlanVerbose` is ignored and there is no way to get 
the actual physical plan. The only way to understand whether we are using an 
index or not is to extract the equivalent V1 query (which is not always 
possible) and explain that query in V1, using `explainPlanVerbose` if necessary.
   
   Also notice we have the `trace` option. When it is used, we trace how time 
is being spent during query execution. When a explain query is executed with 
trace enabled, what we trace is how long it takes to create the explain plan.
   
   ## Ideal state
   The `explain` sentence should be homogenous in Apache Pinot. In both V1 and 
V2, `explain plan` for should:
   * Return the physical plan, including:
     * Exchange mechanism used, if any (Queries in V1 will not contain 
exchanges).
     * Indexes used in the leaf stage, if any. 
   * Return the result in the same format. Ideally we should use the Calcite 
default format (internally called TEXT). This sentence can be decorated with:
     * `without implementation` to return the logical plan instead. Given there 
is no concept of logical plan in V1, we can either create one simple logical 
plan or return the physical plan. Alternatively we could fail in V1, but the 
error must explicitly say we fail because there is no logical plan in V1.
     * `as JSON/XML/DOT` to return the result in that specific format.
   * By default explain will be simplified in a similar way V1 explain is 
simplified right now (although the heuristic may be different) so only one plan 
is returned even if there are segments with different plans.
     * In order to receive all possible plans, the query option 
`explainPlanVerbose` must be true (same as V1 in the current form).
   * There must be a way to explain a query using trace in a way that the query 
is actually executed but no result is sent. Instead trace information is 
attached to the explain plan. In this mode we should be able to see, for each 
explain node, how much time has been spent on that node.
     * This mode should be similar to explain analyze in Postgres.
     * This mode may work with the same trace we have right now or with a 
different method, depending on how much we care about the ability to trace the 
explain sentence itself.
     * Trace should include
        * Rows and or bytes serialized between exchanges.
   * Pinot UI should provide a way to visualize the plain as a chart.
     * At least we should be able to provide something like what we can 
currently do in V2 using DOT format. See [Appendix 
A](https://docs.google.com/document/d/14YD-Tbb2NxxMULwpFX4b6uC9YjhygPYgiHvI0jt0q0s/edit#bookmark=id.v5c39orxv8dy).
     * Ideally, we should be able to do something like 
[this](https://explain.dalibo.com/plan/d3ff19b3agf57951) postgres plan.
   
   ## Appendix A
   Logical explain for query:
   
   ```
   explain plan as dot for 
   SELECT a.playerName,
     a.teamID,
     b.teamName
   FROM baseballStats_OFFLINE AS a
     JOIN dimBaseballTeams_OFFLINE AS b ON a.teamID = b.teamID
     where a.teamID = 10
   LIMIT 10
   ```
   
   Drawn using 
[GraphizOnline](https://dreampuf.github.io/GraphvizOnline/#digraph%20%7B%0A%22PinotLogicalSortExch%5Cnange%5Cndistribution%20%3D%20hash%5Cncollation%20%3D%20%5B%5D%5CnisSortOnSender%20%3D%20fal%5Cnse%5CnisSortOnReceiver%20%3D%20f%5Cnalse%22%20-%3E%20%22LogicalSort%5Cnoffset%20%3D%200%5Cnfetch%20%3D%2010%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalSort%5Cnfetch%20%3D%2010%5Cn%22%20-%3E%20%22PinotLogicalSortExch%5Cnange%5Cndistribution%20%3D%20hash%5Cncollation%20%3D%20%5B%5D%5CnisSortOnSender%20%3D%20fal%5Cnse%5CnisSortOnReceiver%20%3D%20f%5Cnalse%22%20%5Blabel%3D%220%22%5D%0A%22LogicalProject%5CnplayerName%20%3D%20%240%5CnteamID%20%3D%20%241%5CnteamName%20%3D%20%243%5Cn%22%20-%3E%20%22LogicalSort%5Cnfetch%20%3D%2010%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20-%3E%20%22LogicalProject%5CnplayerName%20%3D%20%240%5CnteamID%20%3D%20%241%5CnteamName%20%3D%20%243%5Cn%22%20%5Blabel%3D%220%22%5D
 
%0A%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn1%5D%5Cn%22%20-%3E%20%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn0%5D%5Cn%22%20-%3E%20%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20%5Blabel%3D%221%22%5D%0A%22LogicalProject%5CnplayerName%20%3D%20%2417%5CnteamID%20%3D%20%2425%5Cn%22%20-%3E%20%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn1%5D%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalFilter%5Cncondition%20%3D%20%3D(CAST(%24%5Cn25)%3AINTEGER%20NOT%20NULL%5Cn%2C%2010)%5Cn%22%20-%3E%20%22LogicalProject%5CnplayerName%20%3D%20%2417%5CnteamID%20%3D%20%2425%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalTableScan%5Cntable%20%3D%20%5BbaseballSta%5Cnts_OFFLINE%5D%5Cn%22%20-%3E%20%22LogicalFilter%5Cncondition%20%3D%20%3D(CAST(%24%5Cn25)%3AINTEGER%20NOT%20NULL%5Cn%2C%2010)%5Cn%22%20%5Blabel%3D%
 
220%22%5D%0A%22LogicalProject%5CnteamID%20%3D%20%243%5CnteamName%20%3D%20%244%5Cn%22%20-%3E%20%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn0%5D%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalTableScan%5Cntable%20%3D%20%5BdimBaseball%5CnTeams_OFFLINE%5D%5Cn%22%20-%3E%20%22LogicalProject%5CnteamID%20%3D%20%243%5CnteamName%20%3D%20%244%5Cn%22%20%5Blabel%3D%220%22%5D%0A%7D).
 Edges should be annotated with information related to cost or rows, but given 
we don’t support that in our explain, all edges are annotated with 0.
   
   
![graphviz](https://github.com/apache/pinot/assets/1913993/00c643f5-964b-406d-93cf-d3917d78a731)
   
   
   This issue is related to #11039


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to