Hi,

[EMAIL PROTECTED] wrote on 07.11.2007 18:23:55:
> hello,
>
> i am a bit of a statistical neophyte and currently trying to make some sense
> of confidence intervals for correlation coefficients. i am using the
> cor.test() function. the documentation is quite terse and i am having trouble
> tying up the output from this function with stuff that i have read in the
> literature. so, for example, i make two sequences and calculate the
> correlation coefficient:
>
> > x <- runif(20)
> > y <- jitter(x, amount = 0.7)
> > cor(x, y)
> [1] 0.5198252
>
> now i want to establish what confidence i can attach to this value. from the
> table i retrieved from the article "Understanding Correlation" by r. j. rummel
> [online] i get that the probability of a correlation coefficient of 0.5198252
> arising by chance from two sequences of length 20 is less than 0.01. so this
> seems like i can attach some significance to the result. i still don't
> understand where the table comes from and it only goes up as far as sequences
> of length 1000. the data i am wanting to analyse has length of more than
> 70000, so i need to calculate these confidence levels myself. i assume that
> cor.test() is the way to do this. so i tried:

You should consult a basic statistics textbook. Some are listed in the
recommended literature on CRAN, but much is explained in the output itself.
>
> > cor.test(x, y, "greater", conf.level = 0.95)
>
>    Pearson's product-moment correlation
>
> data: x and y
> t = 2.5816, df = 18, p-value = 0.009405
                                ^^^^^^^^^
Here is your 0.01 value: the probability of getting a correlation coefficient
at least this large by chance.

> alternative hypothesis: true correlation is greater than 0

That is, positive correlation.

> 95 percent confidence interval:
> 0.1753340 1.0000000

This is the confidence interval for the correlation coefficient.

> sample estimates:
> cor
> 0.5198252
>
> > cor.test(x, y, "less", conf.level = 0.95)
>
>    Pearson's product-moment correlation
>
> data: x and y
> t = 2.5816, df = 18, p-value = 0.9906
> alternative hypothesis: true correlation is less than 0

That is, negative correlation.

> 95 percent confidence interval:
> -1.0000000 0.7509089
> sample estimates:
> cor
> 0.5198252
>
> > cor.test(x, y, "two.sided", conf.level = 0.95)
>
>    Pearson's product-moment correlation
>
> data: x and y
> t = 2.5816, df = 18, p-value = 0.01881
> alternative hypothesis: true correlation is not equal to 0

That is, any type of correlation.

> 95 percent confidence interval:
> 0.1003997 0.7823738
> sample estimates:
> cor
> 0.5198252
>
> i reckon that the first invocation of the function is closest to what i am
> looking for. now the rest of the output from the function is a total mystery
> to me. could anyone please tell me:
>
> o what is a p-value?

Wikipedia says: "In statistical hypothesis testing, the p-value is the
probability of obtaining a result at least as extreme as a given data point,
assuming the data point was the result of chance alone. The fact that p-values
are based on this assumption is crucial to their correct interpretation."

> o how to interpret the quoted confidence interval?
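To tie the output to the textbook formulas, here is a minimal sketch that recomputes the quoted figures by hand. It assumes the usual t statistic for a Pearson correlation and the Fisher z transformation that ?cor.test mentions for the interval. The variable names (t_stat, p_greater, ci_two and so on) are only illustrative, and r and n are copied from the output above rather than recomputed from data.

```r
# Values copied from the quoted output (not recomputed from x and y)
r <- 0.5198252   # sample correlation coefficient
n <- 20          # number of (x, y) pairs

# Test statistic: under the null hypothesis of zero correlation,
# t follows a t distribution with n - 2 degrees of freedom.
df     <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# p-values for the three alternatives used above
p_greater <- pt(t_stat, df, lower.tail = FALSE)          # "greater"
p_less    <- pt(t_stat, df)                              # "less"
p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE) # "two.sided"

# Confidence intervals via Fisher's z transformation:
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci_two     <- tanh(z + c(-1, 1) * qnorm(0.975) * se)  # two-sided, 95%
ci_greater <- c(tanh(z - qnorm(0.95) * se), 1)        # one-sided, 95%

print(c(t = t_stat, df = df))
print(c(greater = p_greater, less = p_less, two.sided = p_two))
print(ci_two)
print(ci_greater)
```

If the assumptions hold, the printed values should agree with the quoted cor.test() output up to rounding. Nothing in these formulas depends on a lookup table, so the same calculation applies to sequences of length 70000.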
>
> i do see that as i increase the conf.level input parameter to cor.test() the
> lower bound of the confidence interval gets lower:
>
> 0.95  ->  0.1753340  1.0000000
> 0.975 ->  0.1003997  1.0000000
> 0.995 -> -0.04859184 1.00000000
>
> does this mean that with 99.5% certainty the correlation coefficient should
> lie in the range -0.04859184 to 1.00000000? hmmm. i am doubtful. plus this
> doesn't really answer my question, which is more about what confidence i can
> assign to the measured correlation coefficient (0.5198252).

Why not? Those figures are really what they seem to be. In the first case, the
true correlation coefficient lies between 0.17 and 1 with 95% confidence, based
on the data and the assumption of positive correlation. If you want to increase
the confidence that the true coefficient lies in some interval, you need to
widen the interval (and if you want to be 100% sure you need to expand it
infinitely :-).

Regards
Petr

>
> an alternative question would be: given two sequences and a calculated
> correlation coefficient, with what probability could i assert that the
> underlying processes are indeed correlated and that the calculated correlation
> coefficient does not simply arise by chance.
>
> please forgive my ignorance. any help will be vastly appreciated. thanks!
>
> best regards,
> andrew.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.