agavra commented on code in PR #10120:
URL: https://github.com/apache/pinot/pull/10120#discussion_r1069654441


##########
pinot-query-runtime/src/test/resources/queries/Skew.json:
##########
@@ -0,0 +1,31 @@
+{
+  "skew": {
+    "tables": {
+      "tbl": {
+        "schema": [
+          {"name": "key", "type": "STRING"},
+          {"name": "val", "type": "INT"}
+        ],
+        "inputs": [
+          ["a", 1],
+          ["a", 2],
+          ["a", 3],
+          ["a", 4],
+          ["a", 4],
+          ["a", 4],
+          ["a", 7],
+          ["a", 9]

Review Comment:
   I added the test and it still passes :) 
   
   > I felt like we are missing an intermediate aggregation that needs to deal with after-shuffle results
   
   I'm not sure why we need this, here's the plan:
   
   ```
   [0]@localhost:57105 MAIL_RECEIVE(RANDOM_DISTRIBUTED)
   ├── [1]@localhost:57103 MAIL_SEND(RANDOM_DISTRIBUTED)->{[0]@localhost:57105} (Subtree Omitted)
   └── [1]@localhost:57104 MAIL_SEND(RANDOM_DISTRIBUTED)->{[0]@localhost:57105}
      └── [1]@localhost:57104 PROJECT
         └── [1]@localhost:57104 AGGREGATE
            └── [1]@localhost:57104 MAIL_RECEIVE(HASH_DISTRIBUTED)
               ├── [2]@localhost:57103 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@localhost:57103,[1]@localhost:57104}
               │   └── [2]@localhost:57103 AGGREGATE
               │      └── [2]@localhost:57103 TABLE SCAN (skew_tbl) {OFFLINE=[skew_tbl_OFFLINE_064c7c7a-676d-44e8-b2a8-c9c03da47be3]}
               └── [2]@localhost:57104 MAIL_SEND(HASH_DISTRIBUTED)->{[1]@localhost:57103,[1]@localhost:57104}
                  └── [2]@localhost:57104 AGGREGATE
                     └── [2]@localhost:57104 TABLE SCAN (skew_tbl) {OFFLINE=[skew_tbl_OFFLINE_4d071bc0-898d-474b-9f75-9747d9a4aa98]}
   ```
   
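   Note that it aggregates both at the leaf (stage 2) and at the intermediate node (stage 1). Because the exchange between those two stages is hash distributed, every partial result for a given key lands on the same stage-1 worker, so there should be no key overlap between `localhost:57103` and `localhost:57104` (which does the final project).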
   
   Am I missing something?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

