Thank you, John, for your help and advice.

On Fri, Aug 26, 2022 at 11:04 AM John Fox <j...@mcmaster.ca> wrote:
>
> Dear Val,
>
> On 2022-08-26 10:41 a.m., Val wrote:
> > Hi John and Timothy,
> >
> > Thank you for your suggestions and help. Using the sample data, I
> > carried out a test run and found a difference in the correlation
> > results.
> >
> > Option 1:
> >
> > data_cor <- cor(dat[ , colnames(dat) != "x1"],  # calculate correlations
> >                 dat$x1, method = "pearson", use = "complete.obs")
> >
> > resulted in
> >
> >          [,1]
> > x2 -0.5845835
> > x3 -0.4664220
> > x4  0.7202837
> >
> > Option 2:
> >
> > # note: cor.test() has no 'use' argument (one passed to it is silently
> > # ignored); it always drops incomplete pairs for each pair it tests
> > for (i in colnames(dat)) {
> >   print(cor.test(dat[, i], dat$x1, method = "pearson")$estimate)
> > }
> >
> >          [,1]
> > x2 -0.7362030
> > x3 -0.04935132
> > x4  0.85766290
> >
> > This was crosschecked using Excel and other software, and everything
> > matches Option 2.
> >
> > One factor that contributes to this difference is the loss of
> > information under use = "complete.obs": if x2 has a missing value but
> > x3 and x4 do not, the entire row is removed, including the x3 and x4
> > values.
>
> Yes, I already explained that in my previous message.
>
> As well, cor() is capable of computing pairwise-complete correlations
> -- see ?cor.
>
> There's not an obvious right answer here, however. Using
> pairwise-complete correlations can produce inconsistent (i.e.,
> non-positive semi-definite) correlation matrices, because the
> correlations are computed on different subsets of the data.
>
> There are much better ways to deal with missing data.
>
> > My question is: is there a way to extract the number of rows (N) used
> > in the correlation analysis?
>
> I'm sure that there are many ways, but here is one that is very
> simple-minded and should be reasonably efficient for ~250 variables:
>
> > (nc <- ncol(dat))
> [1] 4
>
> > R <- N <- matrix(NA, nc, nc)
> > diag(R) <- 1
> > for (i in 1:(nc - 1)) {
> +   for (j in (i + 1):nc) {
> +     R[i, j] <- R[j, i] <- cor(dat[, i], dat[, j], use = "complete.obs")
> +     N[i, j] <- N[j, i] <- nrow(na.omit(dat[, c(i, j)]))
> +   }
> + }
>
> > round(R, 3)
>        [,1]   [,2]   [,3]   [,4]
> [1,]  1.000 -0.736 -0.049  0.858
> [2,] -0.736  1.000  0.458 -0.428
> [3,] -0.049  0.458  1.000  0.092
> [4,]  0.858 -0.428  0.092  1.000
>
> > N
>      [,1] [,2] [,3] [,4]
> [1,]   NA    8    8    8
> [2,]    8   NA    8    8
> [3,]    8    8   NA    8
> [4,]    8    8    8   NA
>
> > round(cor(dat, use = "pairwise.complete.obs"), 3)  # check
>        x1     x2     x3     x4
> x1  1.000 -0.736 -0.049  0.858
> x2 -0.736  1.000  0.458 -0.428
> x3 -0.049  0.458  1.000  0.092
> x4  0.858 -0.428  0.092  1.000
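> The same pairwise counts can also be obtained without the explicit
> double loop. A minimal sketch (assuming, as in the example, that dat
> contains only numeric columns):
>
> M <- !is.na(dat)    # is.na() on a data frame returns a logical matrix
> N2 <- crossprod(M)  # t(M) %*% M: rows where both columns are observed
> diag(N2) <- NA      # match N above, which leaves the diagonal NA
>
> crossprod() treats the logical values as 0/1, so entry [i, j] counts
> the rows in which columns i and j are both non-missing.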
> More generally, I think that it's a good idea to learn a little bit
> about R programming if you intend to use R in your work. You'll then
> be able to solve problems like this yourself.
>
> I hope this helps,
>  John
>
> > Thank you,
> >
> > On Mon, Aug 22, 2022 at 1:00 PM John Fox <j...@mcmaster.ca> wrote:
> >>
> >> Dear Val,
> >>
> >> On 2022-08-22 1:33 p.m., Val wrote:
> >>> For the time being, I am assuming that the relationships across
> >>> variables are linear. I want to get the values first; a detailed
> >>> examination of the relationships will follow later.
> >>
> >> This seems backwards to me, but I'll refrain from commenting further
> >> on whether what you want to do makes sense and instead address how
> >> to do it (not, BTW, because I disagree with Bert's and Tim's
> >> remarks).
> >>
> >> Please see below:
> >>
> >>> On Mon, Aug 22, 2022 at 12:23 PM Ebert,Timothy Aaron
> >>> <teb...@ufl.edu> wrote:
> >>>>
> >>>> I (maybe) agree, but I would go further than that. There are
> >>>> assumptions associated with the test that are not being addressed.
> >>>> It is not clear that the relationships are all linear. Regardless
> >>>> of a "significant" outcome, all of the relationships need to be
> >>>> explored in more detail than the correlation test provides.
> >>>>
> >>>> Multiplicity adjustment as in
> >>>> https://www.sciencedirect.com/science/article/pii/S0197245600001069
> >>>> is not an issue that I can see in these data from the information
> >>>> provided -- at least not in the same sense as used in the link.
> >>>>
> >>>> My first guess at the meaning of "multiplicity adjustment" was
> >>>> closer to the experiment-wise error rate in a multiple comparison
> >>>> procedure: https://dictionary.apa.org/experiment-wise-error-rate
> >>>> Essentially, the type 1 error rate is inflated the more tests you
> >>>> do, and if you perform enough tests you will find significant
> >>>> outcomes by chance alone. There is great significance in the
> >>>> Redskins Rule: https://en.wikipedia.org/wiki/Redskins_Rule
> >>>>
> >>>> A simple solution is to apply a Bonferroni correction, in which
> >>>> alpha is divided by the number of comparisons. If there are 250,
> >>>> then 0.05/250 = 0.0002. Another approach is to try to discuss the
> >>>> outcomes in a way that makes sense. What is the connection between
> >>>> a football team's last home game and the election result that
> >>>> would enable me to take another team and apply their last home
> >>>> game result to the outcome of a different election?
> >>>>
> >>>> Another complication arises if variables x2 through x250 are
> >>>> themselves correlated. Not enough information was provided in the
> >>>> problem to know whether this is an issue, but 250 orthogonal
> >>>> variables in a real dataset would be a bit unusual, considering
> >>>> the experiment-wise error rate previously mentioned.
> >>>>
> >>>> Large datasets can be very messy.
> >>>>
> >>>> Tim
> >>>>
> >>>> -----Original Message-----
> >>>> From: Bert Gunter <bgunter.4...@gmail.com>
> >>>> Sent: Monday, August 22, 2022 12:07 PM
> >>>> To: Ebert,Timothy Aaron <teb...@ufl.edu>
> >>>> Cc: Val <valkr...@gmail.com>; r-help@R-project.org
> >>>> Subject: Re: [R] Correlate
> >>>>
> >>>> [External Email]
> >>>>
> >>>> ... But of course the p-values are essentially meaningless without
> >>>> some sort of multiplicity adjustment (search on "multiplicity
> >>>> adjustment" for details). :-(
> >>>>
> >>>> -- Bert
> >>>>
> >>>> On Mon, Aug 22, 2022 at 8:59 AM Ebert,Timothy Aaron
> >>>> <teb...@ufl.edu> wrote:
> >>>>>
> >>>>> A somewhat clunky solution:
> >>>>>
> >>>>> for (i in colnames(dat)) {
> >>>>>   print(cor.test(dat[, i], dat$x1, method = "pearson")$estimate)
> >>>>>   print(cor.test(dat[, i], dat$x1, method = "pearson")$p.value)
> >>>>> }
> >>
> >> Because of missing data, this computes the correlations on different
> >> subsets of the data. A simple solution is to filter the data for
> >> NAs:
> >>
> >> D <- na.omit(dat)
> >>
> >> More comments below:
> >>
> >>>>> Rather than printing, you could set up an array or list to save
> >>>>> the results.
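> >>>>> For example, a minimal sketch (with hypothetical object names)
> >>>>> that collects the estimates and p-values in a data frame and then
> >>>>> applies the Bonferroni correction discussed above via p.adjust():
> >>>>>
> >>>>> vars <- setdiff(colnames(dat), "x1")  # every variable except x1
> >>>>> res <- data.frame(variable = vars, r = NA_real_, p = NA_real_)
> >>>>> for (i in seq_along(vars)) {
> >>>>>   ct <- cor.test(dat[, vars[i]], dat$x1, method = "pearson")
> >>>>>   res$r[i] <- ct$estimate  # correlation estimate
> >>>>>   res$p[i] <- ct$p.value   # unadjusted p-value
> >>>>> }
> >>>>> res$p_bonf <- p.adjust(res$p, method = "bonferroni")
> >>>>> res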
> >>>>>
> >>>>> Tim
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: R-help <r-help-boun...@r-project.org> On Behalf Of Val
> >>>>> Sent: Monday, August 22, 2022 11:09 AM
> >>>>> To: r-help@R-project.org
> >>>>> Subject: [R] Correlate
> >>>>>
> >>>>> [External Email]
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I have a data set with ~250 variables (columns). I want to
> >>>>> calculate the correlation of one variable with each of the other
> >>>>> variables, and I also want the p-value for each correlation.
> >>>>> Please see the sample data and my attempt. I have got the
> >>>>> correlations but am unable to get the p-values.
> >>>>>
> >>>>> dat <- read.table(text = "x1 x2 x3 x4
> >>>>> 1.68 -0.96 -1.25 0.61
> >>>>> -0.06 0.41 0.06 -0.96
> >>>>> . 0.08 1.14 1.42
> >>>>> 0.80 -0.67 0.53 -0.68
> >>>>> 0.23 -0.97 -1.18 -0.78
> >>>>> -1.03 1.11 -0.61 .
> >>>>> 2.15 . 0.02 0.66
> >>>>> 0.35 -0.37 -0.26 0.39
> >>>>> -0.66 0.89 . -1.49
> >>>>> 0.11 1.52 0.73 -1.03", header = TRUE)
> >>>>>
> >>>>> # change all columns to numeric ("." becomes NA, with a warning)
> >>>>> dat[] <- lapply(dat, function(x) as.numeric(as.character(x)))
> >>
> >> This data manipulation is unnecessary. Just specify the argument
> >> na.strings="." to read.table().
> >>
> >>>>> data_cor <- cor(dat[ , colnames(dat) != "x1"], dat$x1,
> >>>>>                 method = "pearson", use = "complete.obs")
> >>>>>
> >>>>> Result:
> >>>>>
> >>>>>          [,1]
> >>>>> x2 -0.5845835
> >>>>> x3 -0.4664220
> >>>>> x4  0.7202837
> >>>>>
> >>>>> How do I get the p-values?
> >>
> >> Taking a somewhat different approach from cor.test(), you can apply
> >> Fisher's z-transformation (recall that D is the data filtered for
> >> NAs):
> >>
> >> > 2*pnorm(abs(atanh(data_cor)), sd=1/sqrt(nrow(D) - 3), lower.tail=FALSE)
> >>         [,1]
> >> x2 0.2462807
> >> x3 0.3812854
> >> x4 0.1156939
> >>
> >> I hope this helps,
> >>  John
> >>
> >>>>> Thank you,
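> >> P.S. As a rough cross-check (a sketch, assuming D <- na.omit(dat)
> >> as above), cor.test() on the filtered data gives p-values for the
> >> same complete-case correlations; it uses a t statistic rather than
> >> the normal approximation, so the two agree only approximately:
> >>
> >> sapply(c("x2", "x3", "x4"),
> >>        function(v) cor.test(D[[v]], D$x1, method = "pearson")$p.value)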
> >> --
> >> John Fox, Professor Emeritus
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> web: https://socialsciences.mcmaster.ca/jfox/
>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.