statistics library?

Lee Spector Tue, 27 Sep 2011 09:14:33 -0700

I need to do some pretty simple statistics in a Clojure program and Incanter 
produces results that I think must be wrong (details below). So I don't think I 
can trust it.


Is there other code for statistical testing out there? Or maybe somebody could 
explain to me how to interpret the seemingly anomalous Incanter results? (I 
received no reply on the Incanter list). I only need a t-test at the moment, 
but this is a bit of a pain to code from scratch (because of the table that it 
uses).

I'm trying to use an un-paired, two-tailed t-test to tell whether the means of 
two sets of numbers differ significantly. (Whether or not this is the right 
test for my application -- e.g. whether the assumptions of normal distributions 
are valid -- is another matter. I just want to know if the tests are being 
calculated correctly.)

If I understand correctly the t-test should produce a p-value which ranges from 
0 to 1. If it's less than 0.05 we can say that the means differ. (Again, there 
would be more to say here about what's statistically meaningful, but that 
discussion isn't relevant to my question).

Again, if I understand correctly, under no circumstances should the p-value 
ever be outside of the range from 0 to 1. It's a probability, and no value 
outside of that range makes any sense. But Incanter sometimes returns p-values 
greater than 1.

Sometimes it seems to give reasonable results:

=> (use 'incanter.stats)
nil

=> (t-test [2 3 4 3 2 3] :y [3 4 5 6 5 4 3])
{:conf-int [-2.6129722457891322 -0.2917896589727722],
 :x-mean 2.8333333333333335,
 :t-stat -2.7883256115163184,
 :p-value 0.018335366451909547,
 :n1 6,
 :df 10.519255193727584,
 :n2 7,
 :y-var 1.2380952380952408,
 :x-var 0.5666666666666658,
 :y-mean 4.285714285714286}
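
As a cross-check on the arithmetic (not on the p-value), the :t-stat and :df above can be recomputed by hand. The sketch below assumes the two-sample test is the unequal-variance (Welch) form, which the fractional :df suggests but which I have not confirmed from Incanter's source. The names check-mean, check-var and welch-check are made up for this example, chosen so they should not clash with anything pulled in by (use 'incanter.stats).

```clojure
;; Hand-rolled Welch statistics for cross-checking a library's output.
;; Assumption: the library uses the unequal-variance (Welch) t-test.
;; check-mean, check-var and welch-check are illustrative names only.

(defn check-mean
  "Arithmetic mean, always returned as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn check-var
  "Unbiased sample variance (divides by n - 1)."
  [xs]
  (let [m (check-mean xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

(defn welch-check
  "Returns the Welch t statistic and the Welch-Satterthwaite degrees of
  freedom for two independent samples."
  [xs ys]
  (let [n1  (count xs)
        n2  (count ys)
        a   (/ (check-var xs) n1)   ; squared standard error of mean(xs)
        b   (/ (check-var ys) n2)   ; squared standard error of mean(ys)
        se2 (+ a b)]
    {:t-stat (/ (- (check-mean xs) (check-mean ys)) (Math/sqrt se2))
     :df     (/ (* se2 se2)
                (+ (/ (* a a) (dec n1))
                   (/ (* b b) (dec n2))))}))

(welch-check [2 3 4 3 2 3] [3 4 5 6 5 4 3])
```

If the Welch assumption holds, the map returned here should agree with the :t-stat and :df shown above up to floating-point rounding. That would narrow any problem down to the step that turns t and df into a p-value.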

But in other cases the :p-value is over 1. Here's an example from Incanter's 
own documentation:

=> (t-test (range 1 11) :mu 0)
{:conf-int [3.33414941027723 7.66585058972277],
 :x-mean 5.5,
 :t-stat 5.744562646538029,
 :p-value 1.9997218039889517,
 :n1 10,
 :df 9,
 :n2 nil,
 :y-var nil,
 :x-var 9.166666666666666,
 :y-mean nil}

Here's an example that's closer to what can arise in my application, and again 
I just don't see how the calculation can be right if it's producing this kind 
of p-value:

=> (t-test '(40 5 2) :y '(1 5 1))
{:conf-int [-39.46068349230474 66.12735015897141],
 :x-mean 15.666666666666666,
 :t-stat 1.0866516498483223,
 :p-value 1.6115506955016772,
 :n1 3,
 :df 2.0477900396893336,
 :n2 3,
 :y-var 5.333333333333332,
 :x-var 446.33333333333337,
 :y-mean 2.3333333333333335}
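
A small range check makes the anomaly easy to flag programmatically. The sketch below also encodes one possible explanation, stated purely as an assumption and not as a confirmed diagnosis of Incanter: if the reported value were 2 * P(T <= t) rather than 2 * P(T >= |t|), it would exceed 1 whenever t is positive, and subtracting it from 2 would recover the usual two-tailed value. Both over-range results above come with a positive :t-stat, which is at least consistent with that reading. The names valid-p? and fold-two-tailed are hypothetical helpers, not part of any library.

```clojure
;; Range check plus a hypothetical correction for over-range p-values.
;; valid-p? and fold-two-tailed are illustrative names only.

(defn valid-p?
  "True when p lies in the closed interval [0, 1]."
  [p]
  (<= 0.0 p 1.0))

(defn fold-two-tailed
  "ASSUMPTION: p was computed as 2 * P(T <= t) instead of 2 * P(T >= |t|).
  Only under that assumption does a value above 1 map back via (- 2 p).
  Values already in range are returned unchanged."
  [p]
  (if (> p 1.0) (- 2.0 p) p))

;; The three :p-value results quoted in this message.
(def reported [0.018335366451909547 1.9997218039889517 1.6115506955016772])

(map valid-p? reported)
(map fold-two-tailed reported)
```

Under the stated assumption, the second and third values fold back to roughly 0.00028 and 0.39, which are at least plausible for the t statistics and degrees of freedom shown. Whether that is what the library actually does would need to be checked against its source.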

Am I missing something that would rationalize these results? 

If not, then does anyone have a pointer to more reliable statistics code in 
Clojure? Or pointers to using a Java library? I see that there are libraries 
out there -- e.g. 
http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/inference/TTest.html
 -- but Java interop is not my strong suit and I'm not sure how to call this 
from my Clojure code.

Any pointers would be appreciated.

Thanks,

 -Lee

