[I] Improve and unify explain plan [pinot]

via GitHub Fri, 22 Mar 2024 01:41:06 -0700


gortiz opened a new issue, #12696:
URL: https://github.com/apache/pinot/issues/12696

# Proposal: Multi Stage actual physical/implementation explain

Apache Pinot has some particularities that make `explain` sentences a bit
more complex than customers may be used to.
First, Pinot is a distributed database where each node may not have all the
information.
In fact that is the normal case.
For example Brokers don’t know much about the actual schema, indexes and
other important information like max, min, sorting, etc.
Second, different segments on the same table may have been created in a
different way.
For example, in the same table some segments may be sorted by a column while
others are not.
Same happens with the presence of indexes or even columns.

## Current State
In Apache Pinot 1.1.0 we have the following explains:

* In V1 (aka single stage) we have two versions:
* `explain plan for`, where the result combines the broker part with the
result of a single segment of a single server. This includes important physical
information like whether an index is being used or not. This is good enough for
most cases.
* Specifically we ask for all plans be keep only the one that:
1. The greater sorted in the following order: No matching segment <
empty < match all < any other
2. In case of a tie, we pick the one that is deeper.
* Verbose mode, which is enabled by setting the query option
`explainPlanVerbose` to true. When this is set, all unique server plans are
returned.
* In V2 (aka multi stage) we have the following versions:
* `explain implementation for`, which returns some kind of physical plan.
The returned plan is different from the one returned in V1. Not just in format
but also semantically. The plan here does not include important physical
information like whether indexes are being used or segments are being pruned.
Instead it seems to be focused on how exchanges are implemented as send and
receive mailboxes.
* `explain plan for`, which returns the logical plan. This sentence can be
decorated with:
* `AS JSON/XML/DOT`, which returns the same result but in different
formats. JSON/XML is very verbose but can be useful to deeply understand the
plan and DOT can be used to draw the plan like. See the Appendix A
* `WITH TYPE`, which does nothing
* `WITH IMPLEMENTATION`, which means to show the physical plan, but does
nothing.
* `WITHOUT IMPLEMENTATION`, which means to show the logical plan, but we
ignore it.

Notice that in V2 `explainPlanVerbose` is ignored and there is no way to get
the actual physical plan. The only way to understand whether we are using an
index or not is to extract the equivalent V1 query (which is not always
possible) and explain that query in V1, using `explainPlanVerbose` if necessary.

Also notice we have the `trace` option. When it is used, we trace how time
is being spent during query execution. When a explain query is executed with
trace enabled, what we trace is how long it takes to create the explain plan.

## Ideal state
The `explain` sentence should be homogenous in Apache Pinot. In both V1 and
V2, `explain plan` for should:
* Return the physical plan, including:
* Exchange mechanism used, if any (Queries in V1 will not contain
exchanges).
* Indexes used in the leaf stage, if any.
* Return the result in the same format. Ideally we should use the Calcite
default format (internally called TEXT). This sentence can be decorated with:
* `without implementation` to return the logical plan instead. Given there
is no concept of logical plan in V1, we can either create one simple logical
plan or return the physical plan. Alternatively we could fail in V1, but the
error must explicitly say we fail because there is no logical plan in V1.
* `as JSON/XML/DOT` to return the result in that specific format.
* By default explain will be simplified in a similar way V1 explain is
simplified right now (although the heuristic may be different) so only one plan
is returned even if there are segments with different plans.
* In order to receive all possible plans, the query option
`explainPlanVerbose` must be true (same as V1 in the current form).
* There must be a way to explain a query using trace in a way that the query
is actually executed but no result is sent. Instead trace information is
attached to the explain plan. In this mode we should be able to see, for each
explain node, how much time has been spent on that node.
* This mode should be similar to explain analyze in Postgres.
* This mode may work with the same trace we have right now or with a
different method, depending on how much we care about the ability to trace the
explain sentence itself.
* Trace should include
* Rows and or bytes serialized between exchanges.
* Pinot UI should provide a way to visualize the plain as a chart.
* At least we should be able to provide something like what we can
currently do in V2 using DOT format. See [Appendix
A](https://docs.google.com/document/d/14YD-Tbb2NxxMULwpFX4b6uC9YjhygPYgiHvI0jt0q0s/edit#bookmark=id.v5c39orxv8dy).
* Ideally, we should be able to do something like
[this](https://explain.dalibo.com/plan/d3ff19b3agf57951) postgres plan.

## Appendix A
Logical explain for query:

```
explain plan as dot for
SELECT a.playerName,
a.teamID,
b.teamName
FROM baseballStats_OFFLINE AS a
JOIN dimBaseballTeams_OFFLINE AS b ON a.teamID = b.teamID
where a.teamID = 10
LIMIT 10
```

Drawn using
[GraphizOnline](https://dreampuf.github.io/GraphvizOnline/#digraph%20%7B%0A%22PinotLogicalSortExch%5Cnange%5Cndistribution%20%3D%20hash%5Cncollation%20%3D%20%5B%5D%5CnisSortOnSender%20%3D%20fal%5Cnse%5CnisSortOnReceiver%20%3D%20f%5Cnalse%22%20-%3E%20%22LogicalSort%5Cnoffset%20%3D%200%5Cnfetch%20%3D%2010%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalSort%5Cnfetch%20%3D%2010%5Cn%22%20-%3E%20%22PinotLogicalSortExch%5Cnange%5Cndistribution%20%3D%20hash%5Cncollation%20%3D%20%5B%5D%5CnisSortOnSender%20%3D%20fal%5Cnse%5CnisSortOnReceiver%20%3D%20f%5Cnalse%22%20%5Blabel%3D%220%22%5D%0A%22LogicalProject%5CnplayerName%20%3D%20%240%5CnteamID%20%3D%20%241%5CnteamName%20%3D%20%243%5Cn%22%20-%3E%20%22LogicalSort%5Cnfetch%20%3D%2010%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20-%3E%20%22LogicalProject%5CnplayerName%20%3D%20%240%5CnteamID%20%3D%20%241%5CnteamName%20%3D%20%243%5Cn%22%20%5Blabel%3D%220%22%5D

%0A%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn1%5D%5Cn%22%20-%3E%20%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn0%5D%5Cn%22%20-%3E%20%22LogicalJoin%5Cncondition%20%3D%20%3D(%241%2C%20%242%5Cn)%5CnjoinType%20%3D%20inner%5Cn%22%20%5Blabel%3D%221%22%5D%0A%22LogicalProject%5CnplayerName%20%3D%20%2417%5CnteamID%20%3D%20%2425%5Cn%22%20-%3E%20%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn1%5D%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalFilter%5Cncondition%20%3D%20%3D(CAST(%24%5Cn25)%3AINTEGER%20NOT%20NULL%5Cn%2C%2010)%5Cn%22%20-%3E%20%22LogicalProject%5CnplayerName%20%3D%20%2417%5CnteamID%20%3D%20%2425%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalTableScan%5Cntable%20%3D%20%5BbaseballSta%5Cnts_OFFLINE%5D%5Cn%22%20-%3E%20%22LogicalFilter%5Cncondition%20%3D%20%3D(CAST(%24%5Cn25)%3AINTEGER%20NOT%20NULL%5Cn%2C%2010)%5Cn%22%20%5Blabel%3D%

220%22%5D%0A%22LogicalProject%5CnteamID%20%3D%20%243%5CnteamName%20%3D%20%244%5Cn%22%20-%3E%20%22PinotLogicalExchange%5Cndistribution%20%3D%20hash%5B%5Cn0%5D%5Cn%22%20%5Blabel%3D%220%22%5D%0A%22LogicalTableScan%5Cntable%20%3D%20%5BdimBaseball%5CnTeams_OFFLINE%5D%5Cn%22%20-%3E%20%22LogicalProject%5CnteamID%20%3D%20%243%5CnteamName%20%3D%20%244%5Cn%22%20%5Blabel%3D%220%22%5D%0A%7D).
Edges should be annotated with information related to cost or rows, but given
we don’t support that in our explain, all edges are annotated with 0.

![graphviz](https://github.com/apache/pinot/assets/1913993/00c643f5-964b-406d-93cf-d3917d78a731)

This issue is related to #11039

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[I] Improve and unify explain plan [pinot]

Reply via email to