bziobrowski commented on code in PR #14346:
URL: https://github.com/apache/pinot/pull/14346#discussion_r1853796560


##########
docs/dev/query/msq/tree-lifecycle.md:
##########
@@ -0,0 +1,99 @@
+# Query tree lifecycle
+
+As usual in parsers and compilers, the query is transformed from a string to a 
tree of objects that represent the query.
+This page describes the different transformations and phases this query tree 
goes through.
+
+[TOC]
+
+## Parsing
+
+The `BaseBrokerRequestHandler` and `MultiStageBrokerRequestHandler` receive a 
Jackson `JsonNode` with the query.
+That node includes the query in string format and some options.
+They are parsed by the SQL parser provided by Calcite using `CalciteSqlParser`.
+This generates an object of type `SqlNodeAndOptions` which contains the parsed 
`SqlNode` and the options.
+
+At this point any syntax error in the query will be detected and an exception 
will be thrown.
+These syntax errors are not easy to customize, because most of the parser is 
generated with JavaCC.
+
+The `SqlNode`s are Calcite entities that are bound to the SQL language.
+They are basically classes that represent exactly what the user wrote in the 
query.
+They are even bound to the exact syntax of the query, so if the user writes 
`SELECT * FROM table` the `SqlNode` 
+will contain a `SqlSelect` with a `SqlIdentifier` for the `*` and a 
`SqlIdentifier` for the `table`.
+They even contain the exact position in the query where the user wrote that.
+
+But up to this point we have not validated the query semantically.
+That means that the query could be syntactically correct but still not make 
sense.
+For example, the referenced tables, attributes or functions may not exist or 
some expression may not be well typed.
+
+## Semantic validation
+
+The `SqlNodeAndOptions` is passed to `QueryEnvironment.validate`, which uses 
Calcite rules and the Pinot schema and
+function registries to validate the query.
+This is where we check that the tables and columns exist, that the 
functions are well typed, etc.
+
+## Relational algebra
+
+Once the query is validated, the `SqlNode` is converted to a `RelNode` using 
the `QueryEnvironment.toRelation` method.
+This is done by the `SqlToRelConverter` class, which is a Calcite class that 
converts `SqlNode`s to `RelNode`s.
+
+Unlike the `SqlNode`, the `RelNode` is not bound to the SQL language or its 
syntax.
+`RelNode`s represent relational algebra instead.
+For example, a query in MongoDB, in Cassandra or in Prometheus can be 
represented as a `RelNode`. 
+
+In Calcite these `RelNode`s can be very abstract, representing a query plan 
that is not tied to any specific execution 
+algorithm or even engine.
+This is the case of the logical `RelNode`s, which are the output of the 
`SqlToRelConverter`.
+They say what needs to be done but not how to do it.
+Should we use a hash join or a merge join? Should we use a bitmap index or a 
B-tree index?
+This is not specified at the logical level.
+
+Moreover, at this point the query plan has not really been optimized.
+Calcite applies some small optimizations here, such as reducing some literal 
expressions (for example `1 + 1` to `2`), but
+the actual optimization happens in the next phase.
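
The literal reduction mentioned above can be pictured with a toy expression tree. This is only a sketch: `Expr`, `Literal`, `Column`, `Plus` and `fold` are hypothetical names invented for the illustration, not Calcite or Pinot classes, and the real reduction in Calcite works on its own expression types.

```java
// Toy expression tree. All classes here are hypothetical and are NOT Calcite's API.
final class ConstantFoldingSketch {
    interface Expr {}

    static final class Literal implements Expr {
        final int value;
        Literal(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Plus implements Expr {
        final Expr left;
        final Expr right;
        Plus(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // Folds bottom-up: a Plus whose operands are both literals becomes one literal.
    static Expr fold(Expr expr) {
        if (!(expr instanceof Plus)) {
            return expr;
        }
        Plus plus = (Plus) expr;
        Expr left = fold(plus.left);
        Expr right = fold(plus.right);
        if (left instanceof Literal && right instanceof Literal) {
            return new Literal(((Literal) left).value + ((Literal) right).value);
        }
        return new Plus(left, right);
    }

    public static void main(String[] args) {
        Expr expr = new Plus(new Column("a"), new Plus(new Literal(1), new Literal(1)));
        System.out.println(expr + " -> " + fold(expr)); // prints "(a + (1 + 1)) -> (a + 2)"
    }
}
```

Only the literal subtree is rewritten; the column reference is left alone, which is why this kind of rewrite is safe to apply before any real planning.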
+
+## Optimization
+
+If Apache Pinot were following the Calcite architecture, this phase would 
optimize the logical `RelNode`s into
+physical `RelNode`s.
+But Pinot only partially follows this model.
+During optimization we apply the rules defined in `PinotQueryRuleSets`, which 
transform the `RelNode`s in several ways:
+1. Apply some common relational algebra rules. For example, push down filters, 
prune project columns that are not used,
+etc.
+2. Apply some Pinot-specific rules. For example, try to optimize semi-joins 
into pipeline breakers, add exchanges
+to separate the query into multiple stages, etc.
+
+But these nodes are still not executable.
+
+## PlanNodes and DispatchableSubPlan
+
+In this phase, still in `QueryExecutor`, the `RelNode`s are converted to 
`PlanNode`s using `PinotLogicalQueryPlanner`.
+
+These `PlanNode`s were originally created to be able to serialize the query 
plan and send it to the servers, but they
+ended up being used to apply some final optimizations and are actually used to 
calculate how to distribute the
+stages between servers.
+Specifically, in this phase we decide how many workers will execute each stage.
+These workers are distributed between the servers, so a server may execute the 
same stage more than once.
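
To make the last point concrete, here is a minimal sketch of how the workers of one stage could end up on servers. The round-robin policy, the `assign` method and the server names are assumptions made up for this illustration; they are not how Pinot actually picks servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Pinot code, and round-robin is an assumed policy.
final class WorkerAssignmentSketch {
    // Assigns worker ids 0..workerCount-1 of a single stage to the given servers.
    static Map<String, List<Integer>> assign(List<String> servers, int workerCount) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        Map<String, List<Integer>> workersByServer = new LinkedHashMap<>();
        for (String server : servers) {
            workersByServer.put(server, new ArrayList<>());
        }
        for (int workerId = 0; workerId < workerCount; workerId++) {
            String server = servers.get(workerId % servers.size());
            workersByServer.get(server).add(workerId);
        }
        return workersByServer;
    }

    public static void main(String[] args) {
        // Three workers on two servers: one server runs the stage twice.
        System.out.println(assign(Arrays.asList("server-1", "server-2"), 3));
        // prints {server-1=[0, 2], server-2=[1]}
    }
}
```

With more workers than servers, at least one server necessarily hosts several workers of the same stage, which is the situation described above.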
+
+## Serialization and deserialization
+The `PlanNode`s are sent over gRPC.
+The definition of the RPC methods is in 
`pinot-common/src/main/proto/worker.proto` and the protobuf used to serialize
+the plans is in `pinot-common/src/main/proto/plan.proto`.
+
+## Multi-stage operators
+
+The plan in protobuf format is received by the server and deserialized into 
`Worker.StagePlan`, generated by protobuf.
+These objects are then transformed back into `PlanNode`s using 
`QueryPlanSerDeUtils` and then transformed into
+`MultiStageOperator`s by the MSQ `QueryRunner` (not to be confused with the 
`QueryRunner` in SSQ).
+
+During this phase, two interesting things happen:
+1. The mailboxes are created. These are the objects that will be used to send 
the data to workers executing the stages
+that depend on the output of other stages. Therefore, for each worker, send 
mailboxes are created to send the result
+of the stage and receive mailboxes are created to receive the input of the 
stage.
+2. In the leaf stages, the `PlanNode` is analyzed to look for all the work 
that could be done by SSQ.
+This covers almost any table read, filter, aggregate and project whose children 
are also supported by SSQ.
+For example, joins and window functions are not supported by SSQ.
+See `PlanNodeToOpChain` to learn more about the SSQ boundary.
+This subplan is then transformed into a single 
`LeafStageTransferableBlockOperator`, which uses SSQ to execute that
+part of the query. This is done mainly in order to do not have to rewrite all 
the optimizations we have in SSQ.

Review Comment:
   ```suggestion
   part of the query. This is done mainly in order to not have to rewrite all 
the optimizations we have in SSQ.
   ```
   or 
   ```suggestion
   part of the query. This is done mainly in order to avoid having to rewrite 
all the optimizations we have in SSQ.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

