On 09-May-2013 01:42:07 Pascal Oettli wrote:
> On 05/09/2013 10:29 AM, Gundala Viswanath wrote:
>> I have the following list of data each has 10 samples.
>> The values indicate binding strength of a particular molecule.
>>
>> What I want so show is that 'x' is statistically different from
>> 'y', 'z' and 'w'. Which it does if you look at X it has
>> more values greater than zero (2.8,1.00,5.4, etc) than others.
>>
>> I tried t-test, but all of them shows insignificant difference
>> with high P-value.
>>
>> What's the appropriate test for that?
>>
>> Below is my code:
>>
>> x <- c(2.852672123,0.076840264,1.009542943,0.430716968,5.4016,
>>        0.084281843,0.065654548,0.971907344,3.325405405,0.606504718)
>> y <- c(0.122615039,0.844203734,0.002128992,0.628740077,0.87752229,
>>        0.888600425,0.728667099,0.000375047,0.911153571,0.553786408);
>> z <- c(0.766445916,0.726801899,0.389718652,0.978733927,0.405585807,
>>        0.408554832,0.799010791,0.737676439,0.433279599,0.947906524)
>> w <- c(0.000124984,1.486637663,0.979713013,0.917105894,0.660855127,
>>        0.338574774,0.211689885,0.434050179,0.955522972,0.014195184)
>>
>> t.test(x,y)
>> t.test(x,z)
>>
>> --END--
>>
>> G.V.
>
> Hello,
>
> 1) Why 'x' should be statistically different from others?
> 2) 'y' looks to be bimodal. The mean is not an appropriate measurement
> for this kind of distribution.
>
> Regards,
> Pascal

Running the commands:

  plot(x,pch="+",col="red",ylim=c(0,6))
  points(y,pch="+",col="green")
  points(z,pch="+",col="blue")
  points(w,pch="+",col="black")
  lines(x,col="red")
  lines(y,col="green")
  lines(z,col="blue")
  lines(w,col="black")

indicates that y, z and w are similar to each other (with some
suggestion of a serial structure).

However, while part of x is also similar to y, z and w, it is clear
that 3 values of x are "outliers" (well above the range of all other
values, including the remaining values of x).
[And I think Pascal meant "x" when he wrote "'y' looks to be bimodal."]

And it may be of interest that these exceptional values of x occur
at x[1], x[5], x[9] (i.e. every 4th observation).

Taken together, these facts suggest that an examination of the
procedure giving rise to the data may be relevant. As one example
of the sort of thing to look for: were the 3 outlying observations
obtained by the same worker/laboratory/apparatus as the others
(or a similar question for x as opposed to y, z, w, raising issues
of reliability). There are many similar questions one could think
of raising, but knowledge of the background is essential for
appropriate choice!

I would agree with Pascal that a "routine" t-test is not appropriate.

One thing that can be directly looked at statistically is, taking
as given that there are 3 outliers somewhere in all 40 data, what
is the probability that all three occur in one of the 4 groups
(x,y,z,w) of data?

This is 4 times the probability that they occur in a specific
group (say x). The chance of all 3 being in x is the number of
ways of choosing the remaining 7 members of x out of the remaining
37 data, divided by the number of ways of choosing any 10 out of
40, i.e. (in R-speak)

  choose(37,7)/choose(40,10)
  # [1] 0.01214575

so the chance of all 3 being in some one of the 4 groups is

  4*choose(37,7)/choose(40,10)
  # [1] 0.048583

which, if you are addicted to P-values, is just significant
at the 5% (P <= 0.05) level.
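
As a cross-check on the arithmetic above, here is a minimal sketch that
computes the same probability three ways and then approximates it by
simulation. The variable names, the seed and the number of replicates are
illustrative assumptions, not part of the original analysis; it assumes only
that 3 "outlier" labels fall at random among 40 observations split into 4
groups of 10.

```r
# Probability that all 3 outliers fall in one pre-specified group (say x).
p_one <- choose(37, 7) / choose(40, 10)

# Same quantity from the hypergeometric distribution: draw k = 10
# observations (the x group) from m = 3 outliers and n = 37 others,
# and ask for all 3 outliers to be among the 10 drawn.
p_hyper <- dhyper(3, m = 3, n = 37, k = 10)

# Same quantity as a product of sequential placement probabilities.
p_prod <- (10 / 40) * (9 / 39) * (8 / 38)

# "All 3 in x", "all 3 in y", etc. are mutually exclusive events,
# so the probabilities for the 4 groups simply add.
p_any <- 4 * p_one

# Monte Carlo approximation (seed and n_sim are arbitrary choices).
set.seed(1)
groups <- rep(c("x", "y", "z", "w"), each = 10)
n_sim <- 100000
hits <- replicate(n_sim, {
  pos <- sample(40, 3)                # random positions of the 3 outliers
  length(unique(groups[pos])) == 1    # TRUE if all 3 share one group
})
p_sim <- mean(hits)

print(c(p_one = p_one, p_hyper = p_hyper, p_prod = p_prod))
print(c(p_any = p_any, p_sim = p_sim))
```

The three closed-form values should agree with each other, and the simulated
proportion should land close to 4*choose(37,7)/choose(40,10), up to Monte
Carlo error.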

So this gives some indication that the "x" group of data is not
on the same footing as the other ("y", "z", "w") groups.

However, such a test does not address any question of why such
outliers should be there in the first place; this needs to be
addressed differently (see above).

And one must not forget that the above "P-value" has been obtained
by a method which was prompted by looking at the data in the
first place.

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 09-May-2013  Time: 09:35:05