bziobrowski commented on code in PR #14346: URL: https://github.com/apache/pinot/pull/14346#discussion_r1853762300
##########
docs/dev/query/msq/tree-lifecycle.md:
##########
@@ -0,0 +1,99 @@
+# Query tree lifecycle
+
+As usual in parsers and compilers, the query is transformed from a string to a tree of objects that represent the query.
+This page describes the different transformations and phases this query tree goes through.
+
+[TOC]
+
+## Parsing
+
+The `BaseBrokerRequestHandler` and `MultiStageBrokerRequestHandler` receive a Jackson `JsonNode` with the query.
+That node includes the query in string format and some options.
+They are parsed by the SQL parser provided by Calcite using `CalciteSqlParser`.
+This generates an object of type `SqlNodeAndOptions` which contains the parsed `SqlNode` and the options.
+
+At this point any syntax error in the query will be detected and an exception will be thrown.
+It is not easy to customize these syntactic errors given most of that is done using Javacc.
+
+The `SqlNode`s are Calcite entities that are bound to the SQL language.
+They are basically classes that represent exactly what the user wrote in the query.
+They are even bound to the exact syntax of the query, so if the user writes `SELECT * FROM table` the `SqlNode`
+will contain a `SqlSelect` with a `SqlIdentifier` for the `*` and a `SqlIdentifier` for the `table`.
+They even contain the exact position in the query where the user wrote that.
+
+But up to this point we have not validated the query semantically.
+That means that the query could be syntactically correct but still not make sense.
+For example, the referenced tables, attributes or functions may not exist or some expression may not be well typed.
+
+## Semantic validation
+
+The `SqlNodeAndOptions` is passed to `QueryEnvironment.validate`, which uses Calcite rules and the Pinot schema and
+function registries to validate the query.
+At this point is where we check that the tables and columns exist, that the functions are well typed, etc.
+
+## Relational algebra
+
+Once the query is validated, the `SqlNode` is converted to a `RelNode` using `QueryEnvironment.toRelation` method.
+This is done by the `SqlToRelConverter` class, which is a Calcite class that converts `SqlNode`s to `RelNode`s.
+
+Contrary to the `SqlNode`, the `RelNode` is not bound to the SQL language.
+The `RelNode`s represent the relational algebra and it is not bound to the SQL language or its syntax.
+For example, a query in MongoDB, in Casandra or in Prometheus can be represented as a `RelNode`.
+
+In Calcite these `RelNode`s can be very abstract, representing a query plan that is not tied to any specific execution
+algorithm or even engine.
+This is the case of the Logical rel nodes, which are the output of the `SqlToRelConverter`.
+They say what needs to be done but not how to do it.
+Should we use a hash join or a merge join? Should we use a bitmap index or a B-tree index?
+This is not specified in the logical level.
+
+Even more, at this point the query plan wasn't actually optimized.
+Calcite applies some small optimizations here like reducing some literal expressions (for example `1 + 1` to `2`) but
+the query plan is not optimized yet..
+
+## Optimization
+
+If Apache Pinot were following the Calcite architecture, this phase would optimize the logical `RelNode`s into
+physical `RelNode`s.
+But Pinot just partially follows this model.
+During optimization apply the rules defined in `PinotQueryRuleSets`, which transform the `RelNode`s in several ways:

Review Comment:
   Maybe rewrite the start of the sentence above to: `During optimization it applies the rules`?