SabrinaZhaozyf commented on issue #9277: URL: https://github.com/apache/pinot/issues/9277#issuecomment-1227537033
Hi @VenkatDatta, thank you for taking this up! Hopefully the following information helps you get started :)

**Definition**

https://en.wikipedia.org/wiki/Correlation. In Pinot, correlation can be used to describe the dependence/association between two columns.

**Support in Existing DBs**

- Presto: https://prestodb.io/docs/current/functions/aggregate.html#statistical-aggregate-functions
- Postgres: https://www.postgresql.org/docs/9.4/functions-aggregate.html

Pinot should follow the same syntax: `corr(x, y)` -> DOUBLE

**Calculation**

The formula for correlation can be found at https://en.wikipedia.org/wiki/Correlation. You can also think of it as a normalized covariance:

`corr(x, y) = cov(x, y) / (std(x) * std(y))`

**Related PRs**

- Background: #8493
- Covariance: #9236
- Histogram: #8724

**Good Starting Point**

- Run/step through the unit tests for aggregation functions under `src/test/java/org/apache/pinot/queries` to get familiar with how aggregation happens in a distributed fashion
  - See how aggregate and merge update the state we keep track of, and how the two differ
  - Helpful breakpoints:
    - `org/apache/pinot/core/query/reduce/AggregationDataTableReducer.java`
    - `org/apache/pinot/core/operator/combine/BaseCombineOperator.java`
    - `org/apache/pinot/core/operator/query/AggregationOperator.java`
- Extend `CovarianceTuple` to add fields (sum of squares?) that help calculate the standard deviation of the 2 columns
  - It is also helpful to look at the other classes in `org/apache/pinot/segment/local/customobject`
- It is also good to look at other aggregation functions under `org/apache/pinot/core/query/aggregation/function` and see how they are wired in

**Testing**

- Should have a very similar structure to `CovarianceQueriesTest.java`
- Test on both individual segments and across segments to make sure both the intermediate results and the reduced results are correct
- Make sure to test on distinct servers
  - Please feel free to ask me any questions! This part can be a bit tricky.
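
To make the calculation and the "extend `CovarianceTuple`" idea above a bit more concrete, here is a rough sketch in plain Java. It is not actual Pinot code: the class name `CorrelationTuple`, its fields, and its methods are all hypothetical, and it assumes the covariance state keeps running sums of x, y, x*y plus a count, to which the two sums of squares are added. `apply` plays the role of the per-row aggregate step and `merge` the role of combining partial results from different segments or servers.

```java
/**
 * Hypothetical intermediate state for corr(x, y).
 * Illustration only; names and layout are assumptions, not Pinot's API.
 */
final class CorrelationTuple {
  private double sumX;
  private double sumY;
  private double sumXY;
  private double sumXX; // extra field needed for std(x)
  private double sumYY; // extra field needed for std(y)
  private long count;

  /** Aggregate step: fold one (x, y) row into the running sums. */
  void apply(double x, double y) {
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
    sumYY += y * y;
    count++;
  }

  /** Merge step: combine partial state from another segment or server. */
  void merge(CorrelationTuple other) {
    sumX += other.sumX;
    sumY += other.sumY;
    sumXY += other.sumXY;
    sumXX += other.sumXX;
    sumYY += other.sumYY;
    count += other.count;
  }

  /** Final step: corr(x, y) = cov(x, y) / (std(x) * std(y)). */
  double correlation() {
    if (count == 0) {
      return Double.NaN; // no rows, so the result is undefined
    }
    double n = count;
    double meanX = sumX / n;
    double meanY = sumY / n;
    double cov = sumXY / n - meanX * meanY;
    double varX = sumXX / n - meanX * meanX;
    double varY = sumYY / n - meanY * meanY;
    // Yields NaN when either column is constant (zero variance).
    return cov / Math.sqrt(varX * varY);
  }
}
```

Because every field is a plain sum, merging two tuples gives the same state as applying all rows to a single tuple, which is what the single-segment vs. inter-segment tests should confirm. The population vs. sample normalization cancels in the ratio, so it does not affect the result. One caveat: this sum-of-squares form can lose precision when values are large relative to their variance, so a more numerically stable update may be worth considering.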
Please let me / @jasperjiaguo know if you have any questions!