gortiz opened a new pull request, #13733:
URL: https://github.com/apache/pinot/pull/13733

   This PR introduces a new way to explain multi-stage queries in Pinot. 
   The main goal is to provide a more detailed explanation of the query 
execution plan, including information about
   the physical operators that are being used.
   
   By default, `explain plan for` will return the new plan. If you want to use 
   the old plan you can use `explain plan without implementation for`. 
   Changing the default may be problematic, so we can discuss introducing a 
   new flag for this.
   The main reason to break the default behavior is that the new plan is more 
   verbose and is actually what a user should expect
   when asking for _implementation_, at least following Calcite terminology.
   Alternatively we can change the syntax in the same way we already did with 
   `explain physical plan for`.
   
   At the architectural level, the new explain mode is closer to the one used 
   in single stage.
   The broker parses and optimizes the query, generating a logical plan made 
   of RelNodes.
   These nodes are transformed into PlanNodes as usual and sent to the servers.
   But instead of asking to execute the plan, the broker asks to explain it 
using a new protobuf endpoint.
   This new endpoint returns a list of PlanNodes.
   
   When the server receives the explain request, it analyzes the plan looking 
for leaf operators and creates single-stage
   operators as usual.
   There are two key differences with respect to the execution mode:
   1. The server tracks which PlanNodes have been converted into single-stage 
operators.
   2. The server does not execute the operator. Instead it calls a newly 
   introduced method, `Operator.getOperatorInfo`, 
   which returns the same information returned by `Operator.explainPlan` but 
   as POJOs.
   
   The server then converts these POJOs into PlanNodes and substitutes the 
   tracked PlanNodes with the new ones.
   Finally the new plan is sent back to the broker.
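   
   The substitution step described above can be sketched roughly as follows. 
   This is only an illustration: `SketchNode` and `ExplainSubstitutionSketch` 
   are hypothetical, simplified stand-ins and not the classes used in this PR, 
   and the tracked nodes are assumed to be looked up by identity.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a plan node (not Pinot's PlanNode).
final class SketchNode {
  final String type;
  final Map<String, String> attributes;
  final List<SketchNode> inputs;

  SketchNode(String type, Map<String, String> attributes, List<SketchNode> inputs) {
    this.type = type;
    this.attributes = attributes;
    this.inputs = inputs;
  }
}

// Hypothetical helper showing how tracked nodes could be swapped for explained ones.
final class ExplainSubstitutionSketch {
  private ExplainSubstitutionSketch() {
  }

  // Returns a copy of the tree in which every tracked node is replaced by the
  // node registered for it. The caller is assumed to pass an IdentityHashMap,
  // so two structurally equal nodes are never confused with each other.
  static SketchNode substitute(SketchNode node, Map<SketchNode, SketchNode> tracked) {
    SketchNode replacement = tracked.get(node);
    if (replacement != null) {
      // The whole subtree that became single-stage operators is replaced.
      return replacement;
    }
    List<SketchNode> newInputs = new ArrayList<>();
    for (SketchNode input : node.inputs) {
      newInputs.add(substitute(input, tracked));
    }
    return new SketchNode(node.type, node.attributes, newInputs);
  }
}
```

   In this sketch the caller would register each converted node in an 
   `IdentityHashMap` while building the single-stage operators, then call 
   `substitute` on the stage root before sending the plan back.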
   
   In order to be able to include physical information (e.g. which indexes 
   are used) in the PlanNode, a new ExplainedPlanNode is
   created.
   These nodes are not meant to be translated into actual operators, but to be 
used to explain the query execution plan.
   When the broker receives the PlanNodes, it converts them back into a RelNode 
using a new class `PinotExplainedRelNode`.
   
   Then the broker substitutes the original logical RelNodes with the new ones 
returned by the servers.
   Finally, it explains the RelNode as expected in Calcite.
   
   The result can be seen in the following pictures:
   
   Without implementation (similar to the current explain):
   
![image](https://github.com/user-attachments/assets/761ee0d5-24d5-4f10-8d1c-d9c4a886ebef)
   
   
   With implementation:
   
![image](https://github.com/user-attachments/assets/13e96262-3ea4-4479-89a9-7c201dfe8993)
   
   
   The PR is still a work in progress, but it is already partially functional.
   
   - [ ] Combine different plans for segments (right now the first one is used)
       - Ideally we should _group by plan_ and count how many segments use that 
plan
   - [ ] Add a flag that we can use to decide whether the new plan or the old 
   one is used by default
   - [ ] PlanNodeToRelConverted
     - [ ] Support Window
     - [ ] Support SetOp
     - [ ] Decide what to do with send and receive mailboxes
   - [ ] Support pipeline breaker
   - [ ] PinotTable.getRowType may not use TypeFactory
   - [ ] Try to simplify LeafStageTransferableBlockOperator.explain
   - [ ] Try to simplify QueryServer.explain (similar to what was done in 
QueryDispatcher)
   - [ ] Try to simplify QueryRunner.explain (similar to what was done in 
QueryDispatcher)
   - [ ] Resolve corner cases in MultiStageBrokerRequestHandler
   - [ ] Review ServerQueryExecutorV1Impl
   - [ ] Decide if we need to keep TransformationTraker
   - [ ] Decide if we need to keep ImplementationExplainUtils
   - [ ] Verify tests
   - [ ] Optional: Extend V1 to be able to use the new ExplainMode
   - [ ] Split PinotExplainedRelNode and PinotExplainedRelNode.Info
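
   As a rough illustration of the first item in the list above (_group by 
   plan_ and count segments), a minimal sketch could look like this. It 
   assumes each segment's plan can be reduced to a comparable string 
   representation; `SegmentPlanGroupingSketch` and `countSegmentsPerPlan` are 
   hypothetical names, not part of this PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups per-segment plans and counts the segments using each one.
final class SegmentPlanGroupingSketch {
  private SegmentPlanGroupingSketch() {
  }

  // planPerSegment holds one entry per segment; each entry is assumed to be a
  // textual form of that segment's plan. Insertion order is kept, so the
  // first plan seen is listed first.
  static Map<String, Integer> countSegmentsPerPlan(List<String> planPerSegment) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String plan : planPerSegment) {
      counts.merge(plan, 1, Integer::sum);
    }
    return counts;
  }
}
```

   The explain output could then show each distinct plan once, together with 
   the number of segments that use it, instead of only the first plan.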


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

