I need to do some pretty simple statistics in a Clojure program and Incanter
produces results that I think must be wrong (details below). So I don't think I
can trust it.
Is there other code for statistical testing out there? Or maybe somebody could
explain to me how to interpret the seemingly anomalous Incanter results? (I
received no reply on the Incanter list). I only need a t-test at the moment,
but this is a bit of a pain to code from scratch (because of the table that it
uses).
I'm trying to use an un-paired, two-tailed t-test to tell whether the means of
two sets of numbers differ significantly. (Whether or not this is the right
test for my application -- e.g. whether the assumptions of normal distributions
are valid -- is another matter. I just want to know if the tests are being
calculated correctly.)
If I understand correctly, the t-test should produce a p-value that ranges from
0 to 1. If it's less than 0.05, we can say that the means differ. (Again, there
would be more to say here about what's statistically meaningful, but that
discussion isn't relevant to my question).
Again, if I understand correctly, under no circumstances should the p-value
ever be outside of the range from 0 to 1. It's a probability, and no value
outside of that range makes any sense. But Incanter sometimes returns p-values
greater than 1.
Sometimes it seems to give reasonable results:
=> (use 'incanter.stats)
nil
=> (t-test [2 3 4 3 2 3] :y [3 4 5 6 5 4 3])
{:conf-int [-2.6129722457891322 -0.2917896589727722],
:x-mean 2.8333333333333335,
:t-stat -2.7883256115163184,
:p-value 0.018335366451909547,
:n1 6,
:df 10.519255193727584,
:n2 7,
:y-var 1.2380952380952408,
:x-var 0.5666666666666658,
:y-mean 4.285714285714286}
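As a cross-check on the result above, here is a minimal sketch that recomputes the
t statistic and degrees of freedom by hand in plain Clojure. It assumes Incanter
is doing Welch's unequal-variance test (the fractional :df suggests so, but I have
not confirmed it in the source). The names my-mean, my-variance and welch-by-hand
are my own, chosen so they don't clash with anything in incanter.stats.

```clojure
;; Welch's unpaired t statistic and Welch-Satterthwaite degrees of freedom,
;; written from the textbook formulas. No Incanter calls are used here.
(defn my-mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn my-variance
  "Sample variance (divides by n - 1)."
  [xs]
  (let [m (my-mean xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

(defn welch-by-hand [xs ys]
  (let [n1  (count xs)
        n2  (count ys)
        a   (/ (my-variance xs) n1)   ; variance of the mean of xs
        b   (/ (my-variance ys) n2)   ; variance of the mean of ys
        se2 (+ a b)]                  ; squared standard error of the difference
    {:t-stat (/ (- (my-mean xs) (my-mean ys)) (Math/sqrt se2))
     :df     (/ (* se2 se2)
                (+ (/ (* a a) (dec n1))
                   (/ (* b b) (dec n2))))}))

(welch-by-hand [2 3 4 3 2 3] [3 4 5 6 5 4 3])
;; :t-stat is about -2.788 and :df about 10.52, matching the Incanter output
```

So the :t-stat and :df values look right to me; it is only the :p-value that I
have doubts about.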
But in other cases the :p-value is over 1. Here's an example from Incanter's
own documentation:
=> (t-test (range 1 11) :mu 0)
{:conf-int [3.33414941027723 7.66585058972277],
:x-mean 5.5,
:t-stat 5.744562646538029,
:p-value 1.9997218039889517,
:n1 10,
:df 9,
:n2 nil,
:y-var nil,
:x-var 9.166666666666666,
:y-mean nil}
Here's an example that's closer to what can arise in my application, and again
I just don't see how the calculation can be right if it's producing this kind
of p-value:
=> (t-test '(40 5 2) :y '(1 5 1))
{:conf-int [-39.46068349230474 66.12735015897141],
:x-mean 15.666666666666666,
:t-stat 1.0866516498483223,
:p-value 1.6115506955016772,
:n1 3,
:df 2.0477900396893336,
:n2 3,
:y-var 5.333333333333332,
:x-var 446.33333333333337,
:y-mean 2.3333333333333335}
Am I missing something that would rationalize these results?
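One guess, which I have not checked against the Incanter source: in the example
that looks reasonable the t statistic is negative, and in both examples with a
p-value over 1 it is positive. That pattern is what you would see if the value
were computed as 2 * CDF(t) rather than 2 * CDF(-|t|). Since the t distribution is
symmetric, 2 * CDF(t) for positive t equals 2 minus the two-tailed p-value. If
that assumption holds, the intended value can be recovered as sketched below.
The function name fold-p-value is hypothetical.

```clojure
;; ASSUMPTION: the reported value is 2 * CDF(t), so for positive t it equals
;; 2 - p, where p is the two-tailed p-value. This is a guess, not a
;; confirmed description of Incanter's implementation.
(defn fold-p-value
  "Maps a reported value in (1, 2] back into [0, 1); leaves others alone."
  [reported]
  (if (> reported 1.0)
    (- 2.0 reported)
    reported))

(fold-p-value 1.9997218039889517)   ; roughly 0.00028
(fold-p-value 1.6115506955016772)   ; roughly 0.39
(fold-p-value 0.018335366451909547) ; unchanged
```

Both folded values are at least plausible for t statistics of about 5.74 with 9
degrees of freedom and about 1.09 with roughly 2, but I would not rely on this
without confirmation.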
If not, then does anyone have a pointer to more reliable statistics code in
Clojure? Or pointers to using a Java library? I see that there are libraries
out there -- e.g.
http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/inference/TTest.html
-- but Java interop is not my strong suit and I'm not sure how to call this
from my Clojure code.
Any pointers would be appreciated.
Thanks,
-Lee