siddharthteotia commented on code in PR #10120: URL: https://github.com/apache/pinot/pull/10120#discussion_r1073802214
##########
pinot-query-runtime/src/test/resources/queries/Skew.json:
##########
@@ -0,0 +1,79 @@
+{
+  "skew": {
+    "tables": {
+      "tbl": {
+        "schema": [
+          {"name": "groupingCol", "type": "STRING"},
+          {"name": "partitionCol", "type": "STRING"},
+          {"name": "val", "type": "INT"}
+        ],
+        "inputs": [
+          ["a", "key1", 1],
+          ["a", "key2", 2],
+          ["a", "key3", 3],
+          ["a", "key1", 4],
+          ["a", "key2", 4],
+          ["a", "key3", 4],
+          ["a", "key1", 7],
+          ["a", "key2", 9],
+          ["b", "key3", 1],
+          ["b", "key1", 2],
+          ["b", "key2", 3],
+          ["b", "key3", 4],
+          ["b", "key1", 4],
+          ["b", "key2", 4],
+          ["b", "key3", 7],
+          ["b", "key1", 9]
+        ],
+        "partitionColumns": [
+          "partitionCol"
+        ]
+      },
+      "tbl2": {
+        "schema": [
+          {"name": "groupingCol", "type": "STRING"},
+          {"name": "partitionCol", "type": "STRING"},
+          {"name": "val", "type": "INT"}
+        ],
+        "inputs": [
+          ["a", "key1", 1],
+          ["a", "key2", 2],
+          ["a", "key3", 3],
+          ["a", "key1", 4],
+          ["a", "key2", 4],
+          ["a", "key3", 4],
+          ["a", "key1", 7],
+          ["a", "key2", 9],
+          ["b", "key3", 1],
+          ["b", "key1", 2],
+          ["b", "key2", 3],
+          ["b", "key3", 4],
+          ["b", "key1", 4],
+          ["b", "key2", 4],
+          ["b", "key3", 7],
+          ["b", "key1", 9]
+        ],
+        "partitionColumns": [
+          "partitionCol"
+        ]
+      }
+    },
+    "queries": [
+      {
+        "description": "skew for int column",
+        "sql": "SELECT groupingCol, SKEWNESS(val), KURTOSIS(val) FROM {tbl} GROUP BY groupingCol",
+        "outputs": [
+          ["a", 0.8647536091225356, 0.3561662049861511],
+          ["b", 0.8647536091225356, 0.3561662049861511]
+        ]
+      },
+      {
+        "sql": "SELECT t1.groupingCol, SKEWNESS(t1.val + t2.val), KURTOSIS(t1.val + t2.val) FROM {tbl} AS t1 LEFT JOIN {tbl2} AS t2 USING (partitionCol) GROUP BY t1.groupingCol",

Review Comment:
You may also want to add an `EXPLAIN PLAN` test for these queries. IIUC, the 2nd query will not push `fourthMoment` down to the leaf stage, since it has to be computed on top of the `JOIN`, and will therefore use the new code you added in this PR.
The previous query, by contrast, is a typical 2-stage plan: the aggregates are computed at the leaf layer by the current engine operators and then merged/reduced on the broker.
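
As a side check on the fixture itself, the expected values for the first query can be reproduced outside Pinot. The sketch below is a minimal illustration, assuming `SKEWNESS` and `KURTOSIS` use the bias-corrected sample formulas (the same form as Apache Commons Math's `Skewness` and `Kurtosis`). That assumption is not stated anywhere in this thread; it is only consistent with the numbers in the `outputs` array. The function names and the `GROUP_VALUES` dict are hypothetical helpers, not part of the PR.

```python
import math


def sample_skewness(values):
    """Bias-corrected sample skewness (assumed formula, see note above)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m3 = sum((x - mean) ** 3 for x in values)  # sum of cubed deviations
    variance = m2 / (n - 1)                    # unbiased sample variance
    return n / ((n - 1) * (n - 2)) * m3 / variance ** 1.5


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (assumed formula, see note above)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    m4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    variance = m2 / (n - 1)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / variance ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scaled - correction


# `val` column of `tbl`, grouped by `groupingCol`, copied from the fixture inputs.
GROUP_VALUES = {
    "a": [1, 2, 3, 4, 4, 4, 7, 9],
    "b": [1, 2, 3, 4, 4, 4, 7, 9],
}

# Expected outputs of the first query in the fixture.
EXPECTED_SKEWNESS = 0.8647536091225356
EXPECTED_KURTOSIS = 0.3561662049861511

for group, values in GROUP_VALUES.items():
    skew = sample_skewness(values)
    kurt = sample_excess_kurtosis(values)
    # Compare with a tolerance; exact last-digit agreement is not assumed.
    print(group,
          math.isclose(skew, EXPECTED_SKEWNESS, rel_tol=1e-6),
          math.isclose(kurt, EXPECTED_KURTOSIS, rel_tol=1e-6))
```

Both groups hold the same multiset of values, which is why the two expected rows are identical. This only checks the single-table query; the `JOIN` query aggregates `t1.val + t2.val` over the joined rows and would need the join expanded first.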