On 8/22/15 6:04 PM, Andrew Schaumberg wrote:
> Dear devs,
>
> Chi-squared test takes degrees of freedom to be one fewer than the sample
> count.

You mean one fewer than the number of categories (the common length of
the expected and observed count arrays).

> https://commons.apache.org/proper/commons-math/apidocs/src-html/org/apache/commons/math3/stat/inference/ChiSquareTest.html#line.154
>
> Should be two fewer, right?
> http://courses.wcupa.edu/rbove/Berenson/10th%20ed%20CD-ROM%20topics/section12_5.pdf
>
> d.f. = k - p - 1
> k is sample count
> p is # parameters being estimated (typically 1)

If the expected counts are computed from the same underlying dataset
used to get the observed counts, and they depend on parameters estimated
from the data, then you are correct: you need to reduce the degrees of
freedom by the number of parameters estimated from the data. The [math]
code has no way of knowing where the expected counts come from. The test
evaluates the null hypothesis stated in the javadoc, namely that the
observed counts conform to the (fixed) distribution described by the
expected counts.

To handle the case where the expected counts are derived from a
parametric distribution fitted to the data, you need to use the
ChiSquare distribution directly to perform the test. That is a little
inconvenient, so an enhancement request allowing the degrees of freedom
to be passed to the test makes sense. Feel free to open a JIRA with
this request.

>
> One fewer is typically for T-test, I think.
>
> Not a statistician, thanks for your time, sorry if this is the wrong agent,
> first time here,

Thanks for the question. These kinds of questions probably belong on
the user list in the future, though. Welcome to Commons!

Phil

> -A
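
The suggestion to work with the chi-square distribution directly can be sketched as follows. This is a minimal, self-contained illustration and not Commons Math code: the class `AdjustedChiSquare` and its method names are hypothetical, the counts are made-up numbers, and the tail probability comes from a plain series for the regularized incomplete gamma function so that the example runs without any library. With Commons Math on the classpath, the same tail probability would presumably be obtained from a `ChiSquaredDistribution` built with the adjusted degrees of freedom, as `1 - cumulativeProbability(statistic)`. The sketch assumes the expected and observed counts already have the same total.

```java
/** Hypothetical sketch: goodness-of-fit p-value with d.f. = k - p - 1. */
final class AdjustedChiSquare {

    /** Pearson statistic: sum of (observed - expected)^2 / expected. */
    static double statistic(double[] expected, long[] observed) {
        if (expected.length != observed.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double dev = observed[i] - expected[i];
            sum += dev * dev / expected[i];
        }
        return sum;
    }

    /** k categories, p parameters estimated from the same data. */
    static int degreesOfFreedom(int categories, int estimatedParams) {
        int df = categories - estimatedParams - 1;
        if (df < 1) {
            throw new IllegalArgumentException("need k - p - 1 >= 1");
        }
        return df;
    }

    /**
     * P(X > x) for X ~ chi-square(df), computed as 1 - P(a, y) with
     * a = df / 2, y = x / 2 and the series
     * P(a, y) = exp(-y) * y^a * sum_{n >= 0} y^n / Gamma(a + 1 + n).
     * Adequate for moderate x; very small tail values lose precision.
     */
    static double upperTail(double x, int df) {
        if (x <= 0.0) {
            return 1.0;
        }
        double a = df / 2.0;
        double y = x / 2.0;
        boolean even = (df % 2 == 0);
        // Build Gamma(a + 1) from Gamma(1) = 1 or Gamma(1/2) = sqrt(pi).
        double gamma = even ? 1.0 : Math.sqrt(Math.PI);
        for (double s = even ? 1.0 : 0.5; s <= a; s += 1.0) {
            gamma *= s;
        }
        double term = 1.0 / gamma;
        double sum = term;
        for (int n = 1; n < 10_000; n++) {
            term *= y / (a + n);
            sum += term;
            if (term < 1e-16 * sum) {
                break;
            }
        }
        double lower = Math.exp(-y + a * Math.log(y)) * sum;
        return Math.max(0.0, 1.0 - lower);
    }

    public static void main(String[] args) {
        // Made-up counts for k = 4 categories with equal totals.
        double[] expected = {20, 30, 30, 20};
        long[] observed = {25, 25, 35, 15};
        double stat = statistic(expected, observed);

        int fixedDf = expected.length - 1;                       // expected counts fixed in advance
        int adjustedDf = degreesOfFreedom(expected.length, 1);   // one parameter fitted from the data

        System.out.println("statistic        = " + stat);
        System.out.println("p-value, d.f. " + fixedDf + " = " + upperTail(stat, fixedDf));
        System.out.println("p-value, d.f. " + adjustedDf + " = " + upperTail(stat, adjustedDf));
    }
}
```

For a fixed statistic, the tail probability is smaller with fewer degrees of freedom, so using k - 1 when a parameter was estimated from the same data overstates the p-value.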