SabrinaZhaozyf commented on issue #9277:
URL: https://github.com/apache/pinot/issues/9277#issuecomment-1227537033

   Hi @VenkatDatta, thank you for taking this up! Hopefully, the following 
information can help you get started:) 
   
   **Definition**
   https://en.wikipedia.org/wiki/Correlation. In Pinot, correlation can be used 
to describe the dependence/association of two columns.
   
   **Support in Existing DBs**
   Presto: 
https://prestodb.io/docs/current/functions/aggregate.html#statistical-aggregate-functions
   Postgres: https://www.postgresql.org/docs/9.4/functions-aggregate.html
   Pinot should follow the same syntax: `corr(x, y)`  -> DOUBLE
   
   **Calculation**
   The formula for correlation can be found at 
https://en.wikipedia.org/wiki/Correlation. You can also think of it as a 
normalized covariance: `corr(x, y) = cov(x, y) / (std(x) * std(y))`
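   
   For a single-pass, mergeable computation, the normalized-covariance form 
can be expanded into running sums. This is the standard algebraic expansion of 
the Pearson correlation, not something taken from the Pinot codebase; here $n$ 
is the row count and every sum runs over all rows:

```latex
\begin{aligned}
\operatorname{cov}(x, y) &= \frac{1}{n}\sum x_i y_i
    - \Bigl(\frac{1}{n}\sum x_i\Bigr)\Bigl(\frac{1}{n}\sum y_i\Bigr) \\
\operatorname{std}(x)^2 &= \frac{1}{n}\sum x_i^2
    - \Bigl(\frac{1}{n}\sum x_i\Bigr)^2 \\
\operatorname{corr}(x, y) &=
  \frac{n\sum x_i y_i - \sum x_i \sum y_i}
       {\sqrt{\bigl(n\sum x_i^2 - (\sum x_i)^2\bigr)
              \bigl(n\sum y_i^2 - (\sum y_i)^2\bigr)}}
\end{aligned}
```

   The normalization factor cancels between numerator and denominator, so the 
population ($1/n$) and sample ($1/(n-1)$) conventions give the same 
correlation.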
   
   **Related PRs**
   - Background: #8493 
   - Covariance: #9236
   - Histogram: #8724 
   
   **Good Starting Point**
   - Run/step through the unit tests for aggregation functions under 
`src/test/java/org/apache/pinot/queries` to familiarize yourself with how 
aggregation is distributed
     - See how aggregate and merge update the state being tracked, and how 
they differ
     - Helpful breakpoints:
       - `org/apache/pinot/core/query/reduce/AggregationDataTableReducer.java`
       - `org/apache/pinot/core/operator/combine/BaseCombineOperator.java`
       - `org/apache/pinot/core/operator/query/AggregationOperator.java`
   - Extend `CovarianceTuple` to add fields (sum of squares?) that help 
calculate the standard deviation of the two columns
   - Also helpful to look at other classes in 
`org/apache/pinot/segment/local/customobject`
   - Also good to look at other aggregation functions under 
`org/apache/pinot/core/query/aggregation/function` and see how they are wired 
in
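   
   As a rough illustration of the "sum of squares" idea above, here is a 
minimal sketch of a mergeable state. `CorrelationState` and its method names 
are hypothetical and are not Pinot's actual `CovarianceTuple` API; the sketch 
only shows which running sums are needed and how two partial states (for 
example, from two segments) could be combined:

```java
// Hypothetical sketch only: the class and method names are illustrative and
// do not mirror Pinot's real CovarianceTuple or AggregationFunction APIs.
final class CorrelationState {
  private double sumX;
  private double sumY;
  private double sumXY;
  private double sumXX; // sum of squares of x, needed for std(x)
  private double sumYY; // sum of squares of y, needed for std(y)
  private long count;

  // Fold one (x, y) row into the running sums.
  void apply(double x, double y) {
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
    sumYY += y * y;
    count++;
  }

  // Combine a partial state computed elsewhere (e.g. another segment).
  void merge(CorrelationState other) {
    sumX += other.sumX;
    sumY += other.sumY;
    sumXY += other.sumXY;
    sumXX += other.sumXX;
    sumYY += other.sumYY;
    count += other.count;
  }

  // corr(x, y) = cov(x, y) / (std(x) * std(y)), expressed via running sums.
  // Evaluates to NaN when there are no rows or a column has zero variance.
  double result() {
    double n = count;
    double covTimesN2 = n * sumXY - sumX * sumY;
    double varXTimesN2 = n * sumXX - sumX * sumX;
    double varYTimesN2 = n * sumYY - sumY * sumY;
    return covTimesN2 / Math.sqrt(varXTimesN2 * varYTimesN2);
  }

  public static void main(String[] args) {
    double[] xs = {1, 2, 3, 4, 5, 6};
    double[] ys = {2, 1, 4, 3, 7, 5};

    // Pretend the first three rows live in one segment and the rest in another.
    CorrelationState segmentA = new CorrelationState();
    CorrelationState segmentB = new CorrelationState();
    for (int i = 0; i < xs.length; i++) {
      (i < 3 ? segmentA : segmentB).apply(xs[i], ys[i]);
    }
    segmentA.merge(segmentB);
    System.out.println("corr after merge = " + segmentA.result());
  }
}
```

   Plain running sums keep the sketch short, but they can lose precision when 
values are large relative to their spread, so a real implementation may prefer 
a more numerically stable update.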
   
   **Testing** 
   - Should have a structure very similar to `CovarianceQueriesTest.java`
   - Test both within individual segments and across segments to make sure 
both intermediate results and reduced results are correct 
   - Make sure to test on distinct servers
     - Please feel free to ask me any questions! This part can be a bit tricky.
   
   Please let me / @jasperjiaguo know if you have any questions! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org