Thank you, John, for your help and advice.

On Fri, Aug 26, 2022 at 11:04 AM John Fox <j...@mcmaster.ca> wrote:
>
> Dear Val,
>
> On 2022-08-26 10:41 a.m., Val wrote:
> > Hi John and Timothy,
> >
> > Thank you for your suggestions and help. Using the sample data, I
> > carried out a test run and found a difference in the correlation
> > results.
> >
> > Option 1:
> >
> > data_cor <- cor(dat[ , colnames(dat) != "x1"],  # calculate correlations
> >                 dat$x1, method = "pearson", use = "complete.obs")
> >
> > resulted in
> >
> >          [,1]
> > x2 -0.5845835
> > x3 -0.4664220
> > x4  0.7202837
> >
> > Option 2:
> >
> > # note: cor.test() has no 'use' argument (one passed to it is silently
> > # ignored); it always drops incomplete pairs for each pair it tests
> > for (i in colnames(dat)) {
> >   print(cor.test(dat[, i], dat$x1, method = "pearson")$estimate)
> > }
> >
> >          [,1]
> > x2 -0.7362030
> > x3 -0.04935132
> > x4  0.85766290
> >
> > This was crosschecked using Excel and other software, and everything
> > matches Option 2.
> >
> > One factor that contributes to this difference is the loss of
> > information under use = "complete.obs": if x2 has a missing value but
> > x3 and x4 do not, the entire row is removed, including the x3 and x4
> > values.
>
> Yes, I already explained that in my previous message.
>
> As well, cor() is capable of computing pairwise-complete correlations
> -- see ?cor.
>
> There's not an obvious right answer here, however. Using
> pairwise-complete correlations can produce inconsistent (i.e.,
> non-positive semi-definite) correlation matrices, because the
> correlations are computed on different subsets of the data.
>
> There are much better ways to deal with missing data.
>
> > My question is: is there a way to extract the number of rows (N) used
> > in the correlation analysis?
>
> I'm sure that there are many ways, but here is one that is very
> simple-minded and should be reasonably efficient for ~250 variables:
>
> > (nc <- ncol(dat))
> [1] 4
>
> > R <- N <- matrix(NA, nc, nc)
> > diag(R) <- 1
> > for (i in 1:(nc - 1)) {
> +   for (j in (i + 1):nc) {
> +     R[i, j] <- R[j, i] <- cor(dat[, i], dat[, j], use = "complete.obs")
> +     N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)]))
> +   }
> + }
>
> > round(R, 3)
>        [,1]   [,2]   [,3]   [,4]
> [1,]  1.000 -0.736 -0.049  0.858
> [2,] -0.736  1.000  0.458 -0.428
> [3,] -0.049  0.458  1.000  0.092
> [4,]  0.858 -0.428  0.092  1.000
>
> > N
>      [,1] [,2] [,3] [,4]
> [1,]   NA    8    8    8
> [2,]    8   NA    8    8
> [3,]    8    8   NA    8
> [4,]    8    8    8   NA
>
> > round(cor(dat, use = "pairwise.complete.obs"), 3)  # check
>        x1     x2     x3     x4
> x1  1.000 -0.736 -0.049  0.858
> x2 -0.736  1.000  0.458 -0.428
> x3 -0.049  0.458  1.000  0.092
> x4  0.858 -0.428  0.092  1.000
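> The same pairwise counts can also be obtained without the explicit
> double loop. A minimal sketch (assuming, as in the example, that dat
> contains only numeric columns):
>
> M <- !is.na(dat)    # is.na() on a data frame returns a logical matrix
> N2 <- crossprod(M)  # t(M) %*% M: rows where both columns are observed
> diag(N2) <- NA      # match N above, which leaves the diagonal NA
>
> crossprod() treats the logical values as 0/1, so entry [i, j] counts
> the rows in which columns i and j are both non-missing.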
> More generally, I think that it's a good idea to learn a little bit
> about R programming if you intend to use R in your work. You'll then
> be able to solve problems like this yourself.
>
> I hope this helps,
>  John
>
> > Thank you,
> >
> > On Mon, Aug 22, 2022 at 1:00 PM John Fox <j...@mcmaster.ca> wrote:
> >>
> >> Dear Val,
> >>
> >> On 2022-08-22 1:33 p.m., Val wrote:
> >>> For the time being, I am assuming that the relationships across
> >>> variables are linear. I want to get the values first; a detailed
> >>> examination of the relationships will follow later.
> >>
> >> This seems backwards to me, but I'll refrain from commenting further
> >> on whether what you want to do makes sense and instead address how
> >> to do it (not, BTW, because I disagree with Bert's and Tim's
> >> remarks).
> >>
> >> Please see below:
> >>
> >>> On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron
> >>> <teb...@ufl.edu> wrote:
> >>>>
> >>>> I (maybe) agree, but I would go further than that. There are
> >>>> assumptions associated with the test that are not being addressed.
> >>>> It is not clear that the relationships are all linear. Regardless
> >>>> of a "significant" outcome, all of the relationships need to be
> >>>> explored in more detail than the correlation test provides.
> >>>>
> >>>> Multiplicity adjustment as in
> >>>> https://www.sciencedirect.com/science/article/pii/S0197245600001069
> >>>> is not an issue that I can see in these data from the information
> >>>> provided -- at least not in the same sense as used in the link.
> >>>>
> >>>> My first guess at the meaning of "multiplicity adjustment" was
> >>>> closer to the experiment-wise error rate in a multiple comparison
> >>>> procedure: https://dictionary.apa.org/experiment-wise-error-rate
> >>>> Essentially, the type 1 error rate is inflated the more tests you
> >>>> do, and if you perform enough tests you will find significant
> >>>> outcomes by chance alone. There is great significance in the
> >>>> Redskins Rule: https://en.wikipedia.org/wiki/Redskins_Rule
> >>>>
> >>>> A simple solution is to apply a Bonferroni correction, in which
> >>>> alpha is divided by the number of comparisons. If there are 250,
> >>>> then 0.05/250 = 0.0002. Another approach is to try to discuss the
> >>>> outcomes in a way that makes sense. What is the connection between
> >>>> a football team's last home game and the election result that
> >>>> would enable me to take another team and apply their last home
> >>>> game result to the outcome of a different election?
> >>>>
> >>>> Another complication arises if variables x2 through x250 are
> >>>> themselves correlated. Not enough information was provided in the
> >>>> problem to know whether this is an issue, but 250 orthogonal
> >>>> variables in a real dataset would be a bit unusual, considering
> >>>> the experiment-wise error rate previously mentioned.
> >>>>
> >>>> Large datasets can be very messy.
> >>>>
> >>>> Tim
> >>>>
> >>>> -----Original Message-----
> >>>> From: Bert Gunter <bgunter.4...@gmail.com>
> >>>> Sent: Monday, August 22, 2022 12:07 PM
> >>>> To: Ebert,Timothy Aaron <teb...@ufl.edu>
> >>>> Cc: Val <valkr...@gmail.com>; r-help@R-project.org
> >>>> Subject: Re: [R] Correlate
> >>>>
> >>>> [External Email]
> >>>>
> >>>> ... But of course the p-values are essentially meaningless without
> >>>> some sort of multiplicity adjustment (search on "multiplicity
> >>>> adjustment" for details). :-(
> >>>>
> >>>> -- Bert
> >>>>
> >>>> On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron
> >>>> <teb...@ufl.edu> wrote:
> >>>>>
> >>>>> A somewhat clunky solution:
> >>>>>
> >>>>> for (i in colnames(dat)) {
> >>>>>   print(cor.test(dat[, i], dat$x1, method = "pearson")$estimate)
> >>>>>   print(cor.test(dat[, i], dat$x1, method = "pearson")$p.value)
> >>>>> }
> >>
> >> Because of missing data, this computes the correlations on different
> >> subsets of the data. A simple solution is to filter the data for
> >> NAs:
> >>
> >> D <- na.omit(dat)
> >>
> >> More comments below:
> >>
> >>>>> Rather than printing, you could set up an array or list to save
> >>>>> the results.
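> >>>>> For example, a minimal sketch (with hypothetical object names)
> >>>>> that collects the estimates and p-values in a data frame and then
> >>>>> applies the Bonferroni correction discussed above via p.adjust():
> >>>>>
> >>>>> vars <- setdiff(colnames(dat), "x1")  # every variable except x1
> >>>>> res <- data.frame(variable = vars, r = NA_real_, p = NA_real_)
> >>>>> for (i in seq_along(vars)) {
> >>>>>   ct <- cor.test(dat[, vars[i]], dat$x1, method = "pearson")
> >>>>>   res$r[i] <- ct$estimate  # correlation estimate
> >>>>>   res$p[i] <- ct$p.value   # unadjusted p-value
> >>>>> }
> >>>>> res$p_bonf <- p.adjust(res$p, method = "bonferroni")
> >>>>> res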
> >>>>>
> >>>>> Tim
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Val
> >>>>> Sent: Monday, August 22, 2022 11:09 AM
> >>>>> To: r-help@R-project.org
> >>>>> Subject: [R] Correlate
> >>>>>
> >>>>> [External Email]
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I have a data set with ~250 variables (columns). I want to
> >>>>> calculate the correlation of one variable with each of the other
> >>>>> variables, and I also want the p-value for each correlation.
> >>>>> Please see the sample data and my attempt. I have got the
> >>>>> correlations but am unable to get the p-values.
> >>>>>
> >>>>> dat <- read.table(text = "x1 x2 x3 x4
> >>>>> 1.68 -0.96 -1.25 0.61
> >>>>> -0.06 0.41 0.06 -0.96
> >>>>> . 0.08 1.14 1.42
> >>>>> 0.80 -0.67 0.53 -0.68
> >>>>> 0.23 -0.97 -1.18 -0.78
> >>>>> -1.03 1.11 -0.61 .
> >>>>> 2.15 . 0.02 0.66
> >>>>> 0.35 -0.37 -0.26 0.39
> >>>>> -0.66 0.89 . -1.49
> >>>>> 0.11 1.52 0.73 -1.03", header = TRUE)
> >>>>>
> >>>>> # change all columns to numeric ("." becomes NA, with a warning)
> >>>>> dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))
> >>
> >> This data manipulation is unnecessary. Just specify the argument
> >> na.strings="." to read.table().
> >>
> >>>>> data_cor <- cor(dat[ , colnames(dat) != "x1"], dat$x1,
> >>>>>                 method = "pearson", use = "complete.obs")
> >>>>>
> >>>>> Result:
> >>>>>
> >>>>>          [,1]
> >>>>> x2 -0.5845835
> >>>>> x3 -0.4664220
> >>>>> x4  0.7202837
> >>>>>
> >>>>> How do I get the p-values?
> >>
> >> Taking a somewhat different approach from cor.test(), you can apply
> >> Fisher's z-transformation (recall that D is the data filtered for
> >> NAs):
> >>
> >> > 2*pnorm(abs(atanh(data_cor)), sd=1/sqrt(nrow(D) - 3), lower.tail=FALSE)
> >>         [,1]
> >> x2 0.2462807
> >> x3 0.3812854
> >> x4 0.1156939
> >>
> >> I hope this helps,
> >>  John
> >>
> >>>>> Thank you,
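> >> P.S. As a rough cross-check (a sketch, assuming D <- na.omit(dat)
> >> as above), cor.test() on the filtered data gives p-values for the
> >> same complete-case correlations; it uses a t statistic rather than
> >> the normal approximation, so the two agree only approximately:
> >>
> >> sapply(c("x2", "x3", "x4"),
> >>        function(v) cor.test(D[[v]], D$x1, method = "pearson")$p.value)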
> >> --
> >> John Fox, Professor Emeritus
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> web: https://socialsciences.mcmaster.ca/jfox/
>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.