Dear Val,

On 2022-08-22 1:33 p.m., Val wrote:
For the time being I am assuming the relationship across variables
is linear. I want to get the values first; a detailed examination of
the relationships will follow later.

This seems backwards to me, but I'll refrain from commenting further on whether what you want to do makes sense and instead address how to do it (not, BTW, because I disagree with Bert's and Tim's remarks).

Please see below:


On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:

I (maybe) agree, but I would go further than that. The test comes with
assumptions that have not been addressed, and it is not clear that the
relationships are all linear. Regardless of a "significant outcome,"
all of the relationships need to be explored in more detail than the
correlation test provides.

Multiplicity adjustment as in
https://www.sciencedirect.com/science/article/pii/S0197245600001069
is not an issue that I can see in these data from the information
provided, at least not in the same sense as used in the link.

My first guess at the meaning of "multiplicity adjustment" was closer
to the experimentwise error rate in a multiple comparison procedure
(https://dictionary.apa.org/experiment-wise-error-rate). Essentially,
the type 1 error rate is inflated the more tests you do, and if you
perform enough tests you will find significant outcomes by chance
alone. There is great significance in the Redskins Rule:
https://en.wikipedia.org/wiki/Redskins_Rule.

A simple solution is to apply a Bonferroni correction, where alpha is
divided by the number of comparisons: if there are 250, then 0.05/250
= 0.0002. Another approach is to try to discuss the outcomes in a way
that makes sense. What is the connection between a football team's
last home game and the election result that would enable me to take
another team and apply their last home game result to the outcome of
a different election?
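
In R, this kind of correction is available through p.adjust(); a
minimal sketch, with made-up p-values standing in for the real ones:

## made-up p-values standing in for the family of correlation tests
pvals <- c(0.001, 0.0002, 0.04, 0.20)

## Bonferroni: equivalent to comparing each raw p-value to alpha/m
p.adjust(pvals, method = "bonferroni")

## Holm's step-down method controls the same error rate with more power
p.adjust(pvals, method = "holm")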

Another complication is if variables x2 through x250 are themselves
correlated. Not enough information was provided in the problem to know
whether this is an issue, but 250 orthogonal variables in a real
dataset would be a bit unusual, considering the experimentwise error
rate previously mentioned.

Large datasets can be very messy.


Tim

-----Original Message-----
From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Monday, August 22, 2022 12:07 PM
To: Ebert,Timothy Aaron <teb...@ufl.edu>
Cc: Val <valkr...@gmail.com>; r-help@R-project.org (r-help@r-project.org) 
<r-help@r-project.org>
Subject: Re: [R] Correlate

... But of course the p-values are essentially meaningless without some sort of 
multiplicity adjustment.
(search on "multiplicity adjustment" for details). :-(

-- Bert


On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:

A somewhat clunky solution:

for (i in colnames(dat)) {
    ## run the test once per column; cor.test() removes incomplete
    ## pairs itself (the 'use' argument belongs to cor(), not cor.test())
    ct <- cor.test(dat[, i], dat$x1, method = "pearson")
    print(ct$estimate)
    print(ct$p.value)
}

Because of missing data, this computes the correlations on different subsets of the data. A simple solution is to filter the data for NAs:

D <- na.omit(dat)

More comments below:


Rather than printing, you could set up an array or list to save the results.
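
For example, a minimal sketch (one possibility among several, assuming
dat as read in from Val's message below) that collects the estimates
and p-values in a data frame:

## run cor.test() once per column and bind the results together
vars <- setdiff(colnames(dat), "x1")
res <- do.call(rbind, lapply(vars, function(v) {
    ct <- cor.test(dat[, v], dat$x1, method = "pearson")
    data.frame(variable = v,
               estimate = unname(ct$estimate),
               p.value  = ct$p.value)
}))
res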


Tim

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Val
Sent: Monday, August 22, 2022 11:09 AM
To: r-help@R-project.org (r-help@r-project.org) <r-help@r-project.org>
Subject: [R] Correlate

Hi all,

I have a data set with ~250 variables (columns). I want to calculate
the correlation of one variable with each of the other variables, and
I also want the p-value for each correlation. Please see the sample
data and my attempt. I have got the correlations but am unable to
get the p-values.

dat <- read.table(text="x1 x2 x3 x4
            1.68 -0.96 -1.25  0.61
           -0.06  0.41  0.06 -0.96
               .    0.08  1.14  1.42
            0.80 -0.67  0.53 -0.68
            0.23 -0.97 -1.18 -0.78
           -1.03  1.11 -0.61    .
            2.15     .    0.02  0.66
            0.35 -0.37 -0.26  0.39
           -0.66  0.89   .    -1.49
            0.11  1.52  0.73  -1.03",header=TRUE)

#change all to numeric
     dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))

This data manipulation is unnecessary. Just specify the argument na.strings="." to read.table().
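
For example, abbreviating the data above to three rows:

## "." is read as NA directly, so every column is numeric from the start
dat <- read.table(text="x1 x2 x3 x4
            1.68 -0.96 -1.25  0.61
               .  0.08  1.14  1.42
            2.15     .  0.02  0.66",
            header=TRUE, na.strings=".")
str(dat)  # four numeric columns, with NAs where the dots were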


     data_cor <- cor(dat[, colnames(dat) != "x1"], dat$x1,
                     method = "pearson", use = "complete.obs")

Result
         [,1]
x2 -0.5845835
x3 -0.4664220
x4  0.7202837

How do I get the p-values ?

Taking a somewhat different approach from cor.test(), you can apply Fisher's z-transformation: under the null hypothesis of zero correlation, atanh(r) is approximately normal with mean 0 and standard deviation 1/sqrt(n - 3) (recall that D is the data filtered for NAs):

> 2*pnorm(abs(atanh(data_cor)), sd=1/sqrt(nrow(D) - 3), lower.tail=FALSE)
        [,1]
x2 0.2462807
x3 0.3812854
x4 0.1156939
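
These z-based p-values are approximate; as a cross-check, cor.test()
gives t-based p-values that should be close:

## t-based p-values from cor.test(), for comparison with the z approximation
sapply(c("x2", "x3", "x4"), function(v)
    cor.test(D[, v], D$x1, method = "pearson")$p.value)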

I hope this helps,
 John


Thank you,

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
