Re: [R] please help generate a square correlation matrix

Bert Gunter Sat, 27 Jul 2024 16:50:42 -0700

Your expanded explanation helps clarify your intent. Herewith some
comments. Of course, feel free to ignore and not respond. And, as
always, my apologies if I have failed to comprehend your intent.


1. I would avoid any notion of "statistical significance" like the
plague. This is a purely exploratory exercise.

2. My understanding is that, for a pair of columns/vectors, you want the
proportion of rows in which exactly one of the two values is 1, out of
the number of rows in which at least one of the two values is 1. In R
syntax, this is simply:

sum(xor(x, y)) / sum(x | y)  ,
where x and y are two columns of 1's and 0's

Better yet might be to report both this *and* sum(x|y) to help you
judge "meaningfulness".
Here is a simple function that does this:

## first, define a function that does above calculation:
assoc <- \(z){
   x <- z[,1]; y <- z[,2]
   n <- sum(x|y)
   c(prop = sum(xor(x, y))/n, N = n)
}

## Now a function that uses it for the various combinations:

somecor <- function(dat, func = assoc){
   dat <- as.matrix(dat)
   indx <- seq_len(ncol(dat))
   rbind(w <- combn(indx,2),
         combn(indx, 2, FUN = \(m)func(dat[,m]) )) |>
     t()  |> round(digits =2) |>
  'dimnames<-'(list(rep.int('',ncol(w)), c("","", "prop","N")))
}

# Now apply it to your example data:

somecor(dat)
## which gives
     prop N
 1 2 0.67 6
 1 3 0.60 5
 1 4 0.57 7
 2 3 0.60 5
 2 4 0.33 6
 3 4 0.71 7

This seems more interpretable and directly useful to me. Larger values
of prop with larger N are the most interesting, assuming I have
interpreted you correctly.

Cheers,
Bert
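Ding's stated goal (in the quoted message below) is to list the most exclusive pairs first. The base-R sketch here recomputes prop and N for every pair so that it runs on its own, then sorts by prop and breaks ties by N, following the remark that larger prop with larger N is more interesting. The column names gene1/gene2 and the sort rule are assumptions for illustration, not part of Bert's code. Like the code above, it needs R 4.1 or later for the \(x) lambda syntax.

```r
# Example data from the original post (1 = mutation present, 0 = absent)
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))

prs <- combn(names(dat), 2)   # 2 x 6 character matrix, one column per pair

# For each pair: N = rows where at least one value is 1,
# prop = share of those rows where exactly one value is 1
res <- data.frame(
  gene1 = prs[1, ],
  gene2 = prs[2, ],
  prop  = apply(prs, 2, \(p) sum(xor(dat[[p[1]]], dat[[p[2]]])) /
                             sum(dat[[p[1]]] | dat[[p[2]]])),
  N     = apply(prs, 2, \(p) sum(dat[[p[1]]] | dat[[p[2]]]))
)

# Most exclusive pairs first; ties in prop broken by larger N
res <- res[order(-res$prop, -res$N), ]
rownames(res) <- NULL
print(res, digits = 2)
```

On the example data this puts g3/g4 (prop 5/7, N = 7) first and g2/g4 (prop 1/3) last, consistent with the somecor() output above.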


On Sat, Jul 27, 2024 at 12:54 PM Yuan Chun Ding <ycd...@coh.org> wrote:
>
> Hi Richard,
>
> Nice to know you had a similar experience.
>
> Yes, your understanding is right: all correlations are negative after
> removing double-zero rows.
>
> This is consistent with a heatmap we generated.
>
> A 1 means a cancer patient has a specific mutation; a 0 means the patient
> does not have that mutation type.
>
> A pair of mutation types (two different mutations) is mutually exclusive in
> most patients in the heatmap or oncoplots.
>
> If we include all 1000 patients, 900 of whom have neither mutation type,
> then the correlation is not significant at all.
>
> But eyeballing the heatmap (oncoplot) of mutation (row) by patient (column),
> the mutations are exclusive for most patients,
>
> so I want to measure how strong the exclusiveness between two specific
> mutation types is across those patients with at least one of the two types.
>
> Then I would put the pairs of mutations with strong negative correlations
> in the top rows, ordered by their negative correlation values.
>
> As for a final application, there may be some use for this in my case.
>
> If one develops two drugs, each specific to one of two negatively correlated
> mutations, and treatment is usually given only to patients carrying the
> specific mutation, then it is informative to know how strong the negative
> correlation is when considering different combinations of treatment strategies.
>
> Ding
>
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Richard O'Keefe
> Sent: Saturday, July 27, 2024 4:47 AM
> To: Bert Gunter <bgunter.4...@gmail.com>
> Cc: r-help@r-project.org
> Subject: Re: [R] please help generate a square correlation matrix
>
> Curses, my laptop is hallucinating again.  Hope I can get through this.
>
> So we're talking about correlations between binary variables.
> Suppose we have two 0-1-valued variables, x and y.
>
> Let A <- sum(x*y)      # number of cases where x and y are both 1.
> Let B <- sum(x)-A      # number of cases where x is 1 and y is 0
> Let C <- sum(y)-A      # number of cases where y is 1 and x is 0
> Let D <- sum(!x & !y)  # number of cases where x and y are both 0.
> (also D = length(x)-A-B-C)
>
> All the information is summarised in the 2-by-2 contingency table.
>
> Some years ago, Nathan Rountree and I supervised Yun-Sing Koh's
> data-mining PhD. She surveyed the data mining literature and found some
> 37 different "interestingness measures" for two-variable associations
> (if I remember correctly; there were a lot of them). They fell into a
> much smaller number of qualitatively similar groups.
>
> At any rate, the Pearson correlation between x and y is
>
>     (A*D - B*C)/sqrt((A+B)*(C+D)*(A+C)*(B+D))
>
> So what happens when we delete the rows where x = 0 and y = 0?
> Right, it forces D to 0, leaving A, B, and C unchanged.
> The numerator then becomes -B*C, so:
>
>   If you delete rows with x = 0 and y = 0 you MUST get a negative correlation
>   (or an undefined one, when B or C is 0 and the denominator is 0 as well).
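The count formula and the sign argument can be checked numerically. The base-R sketch below (the helper name phi_from_counts is hypothetical, not from the thread) computes A, B, C and D with & and !, which sidesteps R's precedence rule that makes !x * !y parse as !(x * !y), and compares the result with cor() on columns g1 and g2 of the example data, first on all rows and then with the double-zero rows removed.

```r
# Phi (Pearson) correlation of two 0/1 vectors from their 2x2 counts
phi_from_counts <- function(x, y) {
  # as.numeric() keeps the products below in double precision; an integer
  # product of four counts can overflow for large data sets
  A <- as.numeric(sum(x & y))    # x = 1, y = 1
  B <- as.numeric(sum(x & !y))   # x = 1, y = 0
  C <- as.numeric(sum(!x & y))   # x = 0, y = 1
  D <- as.numeric(sum(!x & !y))  # x = 0, y = 0
  (A * D - B * C) / sqrt((A + B) * (C + D) * (A + C) * (B + D))
}

# Columns g1 and g2 from the example data in the original post
g1 <- c(1, 0, 0, 1, 1, 1, 0, 0, 0)
g2 <- c(0, 1, 0, 1, 0, 1, 1, 0, 0)

# All rows: the count formula agrees with cor()
r_all <- phi_from_counts(g1, g2)
all.equal(r_all, cor(g1, g2))

# Double-zero rows removed: D becomes 0, so only -B*C is left in the
# numerator (here A = B = C = 2)
keep   <- g1 | g2
r_kept <- phi_from_counts(g1[keep], g2[keep])
all.equal(r_kept, cor(g1[keep], g2[keep]))
```

On these data the two values are 0.1 (all rows) and -0.5 (double-zero rows removed), the same numbers as the rcorr() and somecors() output quoted elsewhere in the thread.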
>
> Quite a modest "true" correlation (based on all the data) like -0.2
> can masquerade as quite a strong "zero-suppressed" correlation like
> -0.6.  Even +0.2 can turn into -0.4.  (These figures are from a
> particular simulation run and may not apply in your case.)
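How large the effect is depends on the data. The rough simulation sketch below is entirely an assumption of this note, not Richard's setup: the prevalences, the shared latent factor z and the helper sim_once are made up for illustration. It compares the correlation on all rows with the one computed after dropping double-zero rows. No particular numbers are claimed; only the sign of the zero-suppressed value is guaranteed by the argument above.

```r
# Hypothetical simulation: two rare binary traits that share a latent
# factor z, so they are positively associated over all rows
sim_once <- function(n = 1000) {
  z <- rbinom(n, 1, 0.1)                # latent factor, present in about 10%
  x <- rbinom(n, 1, 0.04 + 0.2 * z)     # P(x = 1) is higher when z = 1
  y <- rbinom(n, 1, 0.04 + 0.2 * z)
  keep <- x | y                         # drop rows where both are 0
  c(all_rows        = cor(x, y),
    zero_suppressed = cor(x[keep], y[keep]))
}

set.seed(1)
sims <- replicate(200, sim_once())      # 2 x 200 matrix, one column per run
rowMeans(sims, na.rm = TRUE)            # average of each version over runs
```

With these made-up settings the all-rows correlation tends to be small and positive, while the zero-suppressed one tends to be strongly negative, which is the pattern Richard describes; other settings will give other numbers.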
>
> Now one of the reasons why Yun-Sing Koh, Nathan Rountree, and I were
> interested in interestingness measures is, perhaps coincidentally,
> related to the file drawer/underreporting problem: it's quite common
> for rows where x = 0 and y = 0 never to have been reported to you, so
> we were hoping there were measures immune to that.  I have argued for
> years that "till record analysis" for supermarkets &c is badly flawed
> by two facts: (a) it is hard to measure how much of a product people
> WOULD have bought if only you had offered it for sale (although you
> can make educated guesses) and (b) till records provide no evidence on
> what the people who walked out without buying anything wanted (was the
> price too high?  could they not find it?).  Problem (a) leads to a
> commercial variant of the Signor-Lipps effect: "when x and/or y were
> available for purchase" is not the same as "the period for which data
> were recorded", thus inflating D, perhaps massively.  Methods
> developed for handling the Signor-Lipps effect in paleontology can be
> used to estimate when x and y were available, helping you to recover a
> more realistic N=A+B+C+D.  I really should have published that.
>
> All of which is a long-winded way of saying that
> - Pearson correlations on binary columns can be computed very efficiently
> - the rows with x=0 and y=0 may be very informative, even essential for
>   analysis
> - delete them at your peril.
> - really, delete them at your peril.
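On the first point: for a 0/1 matrix, the counts A, B, C and D for every pair of columns can all be obtained from one matrix product, so the whole Pearson matrix needs no loop over pairs. A minimal sketch (the function name binary_cor is hypothetical), checked against cor() on the example data; it uses all rows, in line with the advice above.

```r
# Example data from the original post
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))

# Pearson (phi) correlation for every pair of 0/1 columns, built from the
# 2x2 counts of all pairs at once
binary_cor <- function(X) {
  X <- as.matrix(X) * 1.0      # numeric (double) matrix of 0s and 1s
  n <- nrow(X)
  A <- crossprod(X)            # A[i, j]: rows where columns i and j are both 1
  s <- colSums(X)              # number of 1s in each column
  B <- s - A                   # B[i, j] = s[i] - A[i, j]: i is 1, j is 0
  C <- t(B)                    # C[i, j]: i is 0, j is 1
  D <- n - A - B - C           # D[i, j]: both are 0
  (A * D - B * C) / sqrt((A + B) * (C + D) * (A + C) * (B + D))
}

R <- binary_cor(dat)
all.equal(R, cor(as.matrix(dat)))   # same as cor(), up to rounding error
round(R, 2)
```

Setting D to 0 in the last line of binary_cor() would give the zero-suppressed value for every pair at once, which is another way to see why it can never be positive.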
>
> On Sat, 27 Jul 2024 at 23:07, Richard O'Keefe <rao...@gmail.com> wrote:
> >
> > Let's go back to the original posting.
> >
> > > > >> in each column, less than 10% values are 1, most of them are 0;
> > > >
> > > > >> so I want to remove a  row with value of zero in both columns when
> > > > >> calculate correlation between two columns.
> >
> > So we're talking about correlations between binary variables.
> > Suppose we have two 0-1-valued variables, x and y.
> > Let A <- sum(x*y)      # number of cases where x and y are both 1.
> > Let B <- sum(x)-A      # number of cases where x is 1 and y is 0
> > Let C <- sum(y)-A      # number of cases where y is 1 and x is 0
> > Let D <- sum(!x & !y)  # number of cases where x and y are both 0.
> >
> > N
> >
>
> > On Fri, 26 Jul 2024 at 12:07, Bert Gunter <bgunter.4...@gmail.com> wrote:
> > >
> > > If I have understood the request, I'm not sure that omitting all 0
> > > pairs for each pair of columns makes much sense, but be that as it
> > > may, here's another way to do it by using the 'FUN' argument of combn
> > > to encapsulate any calculations that you do. I just use cor() as the
> > > calculation -- you can use anything you like that takes two vectors of
> > > 0's and 1's and produces fixed length numeric results (or from which
> > > you can extract such).
> > >
> > > I encapsulated it all in a little function. Note that I first
> > > converted the data frame to a matrix. Because of their generality,
> > > data frames carry a lot of extra baggage that can slow purely numeric
> > > manipulations down.
> > >
> > > Anyway, here's the function, 'somecors' (I'm a bad name picker :(  ! )
> > >
> > >    somecors <- function(dat, func = cor){
> > >       dat <- as.matrix(dat)
> > >       indx <- seq_len(ncol(dat))
> > >       combn(indx, 2, FUN = \(z) {
> > >          i <- z[1]; j <- z[2]
> > >          k <- dat[, i ] | dat[, j ]   # rows where at least one value is 1
> > >          c(z, func(dat[k, i ], dat[k, j ]))
> > >       })
> > >    }
> > >
> > > Results come out as a matrix with choose(ncol(dat), 2) columns, the
> > > first 2 rows giving the pair of column numbers for each column, and
> > > then 1 or more rows (possibly extracted) from whatever func you use.
> > > Here are the results for your data formatted to 2 decimal places:
> > >
> > > > round(somecors(dat),2)
> > >      [,1]  [,2]  [,3]  [,4] [,5]  [,6]
> > > [1,]  1.0  1.00  1.00  2.00    2  3.00
> > > [2,]  2.0  3.00  4.00  3.00    4  4.00
> > > [3,] -0.5 -0.41 -0.35 -0.41   NA -0.47
> > > Warning message:
> > > In func(dat[k, i], dat[k, j]) : the standard deviation is zero
> > >
> > > The NA and the warning come from the 2,4 pair of columns because, after
> > > removing all zero rows in the pair, dat[,4] is all 1's, giving a zero
> > > in the denominator of the cor() calculation -- again, assuming I have
> > > correctly understood your request. If so, this might be something you
> > > need to worry about.
> > >
> > > Again, feel free to ignore if I have misinterpreted or this does not
> > > suit.
> > >
> > > Cheers,
> > > Bert
> > >
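If the constant-column case is expected (it is likely to be common with sparse mutation data), one option is a small wrapper that returns NA without a warning. This is a sketch, not part of Bert's function, and the name cor_or_na is hypothetical. It takes two vectors, so it fits the func argument of somecors(), e.g. somecors(dat, func = cor_or_na).

```r
# Hypothetical helper: like cor(), but returns NA without a warning when
# either input has fewer than two distinct values (zero standard deviation)
cor_or_na <- function(x, y) {
  if (length(unique(x)) < 2L || length(unique(y)) < 2L) return(NA_real_)
  cor(x, y)
}

# Columns g1, g2 and g4 from the example data in the original post
g1 <- c(1,0,0,1,1,1,0,0,0)
g2 <- c(0,1,0,1,0,1,1,0,0)
g4 <- c(0,1,0,1,1,1,1,1,0)

k24 <- g2 | g4                      # rows kept for the (g2, g4) pair
cor_or_na(g2[k24], g4[k24])         # g4 is all 1s here: NA, no warning

k12 <- g1 | g2                      # rows kept for the (g1, g2) pair
cor_or_na(g1[k12], g2[k12])         # an ordinary correlation
```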
> > > On Thu, Jul 25, 2024 at 2:01 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:
> > > >
> > > > At 20:47 on 25/07/2024, Yuan Chun Ding wrote:
> > > > > Hi Rui,
> > > > >
> > > > > You are always very helpful!! Thank you,
> > > > >
> > > > > I just modified your R code to remove rows with zero values in both
> > > > > columns of the pair, as below, for my real data.
> > > > >
> > > > > Ding
> > > > >
> > > > > dat<-gene22mut.coded
> > > > > r <- P <- matrix(NA, nrow = 22L, ncol = 22L,
> > > > >                   dimnames = list(names(dat), names(dat)))
> > > > >
> > > > > for(i in 1:22) {
> > > > >    #i=1
> > > > >    x <- dat[[i]]
> > > > >    for(j in (1:22)) {
> > > > >      #j=2
> > > > >      if(i == j) {
> > > > >        # there's nothing to test, assign correlation 1
> > > > >        r[i, j] <- 1
> > > > >      } else {
> > > > >        tmp <-cbind(x,dat[[j]])
> > > > >        row0 <-rowSums(tmp)
> > > > >        tem2 <-tmp[row0!=0,]
> > > > >        tmp3 <- cor.test(tem2[,1],tem2[,2])
> > > > >        r[i, j] <- tmp3$estimate
> > > > >        P[i, j] <- tmp3$p.value
> > > > >      }
> > > > >    }
> > > > > }
> > > > > r<-as.data.frame(r)
> > > > > P<-as.data.frame(P)
> > > > >
> > > > > From: R-help <r-help-boun...@r-project.org> On Behalf Of Yuan Chun Ding via R-help
> > > > > Sent: Thursday, July 25, 2024 11:26 AM
> > > > > To: Rui Barradas <ruipbarra...@sapo.pt>; r-help@r-project.org
> > > > > Subject: Re: [R] please help generate a square correlation matrix
> > > > >
> > > > > Hi Rui,
> > > > >
> > > > > Thank you for the help!
> > > > >
> > > > > You did not remove a row if zero values exist in both columns of the
> > > > > pair, right?
> > > > >
> > > > > Ding
> > > > >
> > > > > From: Rui Barradas <ruipbarra...@sapo.pt>
> > > > > Sent: Thursday, July 25, 2024 11:15 AM
> > > > > To: Yuan Chun Ding <ycd...@coh.org>; r-help@r-project.org
> > > > > Subject: Re: [R] please help generate a square correlation matrix
> > > > >
> > > > > At 17:39 on 25/07/2024, Yuan Chun Ding via R-help wrote:
> > > > >> Hi R users,
> > > > >>
> > > > >> I generated a square correlation matrix for the dat dataframe below;
> > > > >> dat<-data.frame(g1=c(1,0,0,1,1,1,0,0,0),
> > > > >>                   g2=c(0,1,0,1,0,1,1,0,0),
> > > > >>                   g3=c(1,1,0,0,0,1,0,0,0),
> > > > >>                   g4=c(0,1,0,1,1,1,1,1,0))
> > > > >> library("Hmisc")
> > > > >> dat.rcorr = rcorr(as.matrix(dat))
> > > > >> dat.r <-round(dat.rcorr$r,2)
> > > > >>
> > > > >> however, I want to modify this correlation calculation;
> > > > >> my dat has more than 1000 rows and 22 columns;
> > > > >> in each column, less than 10% values are 1, most of them are 0;
> > > > >> so I want to remove a  row with value of zero in both columns when
> > > > >> calculate correlation between two columns.
> > > > >> I just want to check whether those values of 1 are correlated
> > > > >> between two columns.
> > > > >> Please look at my code in the following;
> > > > >>
> > > > >> cor.4gene <-matrix(0,nrow=4*4, ncol=4)
> > > > >> for (i in 1:4){
> > > > >>     #i=1
> > > > >>     for (j in 1:4) {
> > > > >>       #j=1
> > > > >>       d <-dat[,c(i,j)]%>%
> > > > >>         filter(eval(as.symbol(colnames(dat)[i]))!=0 |
> > > > >>                  eval(as.symbol(colnames(dat)[j]))!=0)
> > > > >>       c <-cor.test(d[,1],d[,2])
> > > > >>       cor.4gene[i*j,]<-c(colnames(dat)[i],colnames(dat)[j],
> > > > >>                           c$estimate,c$p.value)
> > > > >>     }
> > > > >> }
> > > > >> cor.4gene<-as.data.frame(cor.4gene)%>%filter(V1 !=0)
> > > > >> colnames(cor.4gene)<-c("gene1","gene2","cor","P")
> > > > >>
> > > > >> Can you tell me what mistakes I made?
> > > > >> first, why cor is NA when calculation of correlation for g1 and g1,
> > > > >> I thought it should be 1.
> > > > >>
> > > > >> cor.4gene$cor[is.na(cor.4gene$cor)]<-1
> > > > >> cor.4gene$cor[is.na(cor.4gene$P)]<-0
> > > > >> cor.4gene.sq <-pivot_wider(cor.4gene, names_from = gene1,
> > > > >> values_from = cor)
> > > > >>
> > > > >> Then this line of code above did not generate a square matrix as
> > > > >> what the HMisc library did.
> > > > >> How to fix my code?
> > > > >>
> > > > >> Thank you,
> > > > >>
> > > > >> Ding
> > > > >
> > > > > Hello,
> > > > >
> > > > > You are complicating the code, there's no need for as.symbol/eval, the
> > > > > column numbers do exactly the same.
> > > > >
> > > > > # create the two results matrices beforehand
> > > > > r <- P <- matrix(NA, nrow = 4L, ncol = 4L,
> > > > >                  dimnames = list(names(dat), names(dat)))
> > > > >
> > > > > for(i in 1:4) {
> > > > >     x <- dat[[i]]
> > > > >     for(j in (1:4)) {
> > > > >       if(i == j) {
> > > > >         # there's nothing to test, assign correlation 1
> > > > >         r[i, j] <- 1
> > > > >       } else {
> > > > >         tmp <- cor.test(x, dat[[j]])
> > > > >         r[i, j] <- tmp$estimate
> > > > >         P[i, j] <- tmp$p.value
> > > > >       }
> > > > >     }
> > > > > }
> > > > >
> > > > > # these two results are equal up to floating-point precision
> > > > > dat.rcorr$r
> > > > > #>           g1        g2        g3        g4
> > > > > #> g1 1.0000000 0.1000000 0.3162278 0.1581139
> > > > > #> g2 0.1000000 1.0000000 0.3162278 0.6324555
> > > > > #> g3 0.3162278 0.3162278 1.0000000 0.0000000
> > > > > #> g4 0.1581139 0.6324555 0.0000000 1.0000000
> > > > > r
> > > > > #>           g1        g2           g3           g4
> > > > > #> g1 1.0000000 0.1000000 3.162278e-01 1.581139e-01
> > > > > #> g2 0.1000000 1.0000000 3.162278e-01 6.324555e-01
> > > > > #> g3 0.3162278 0.3162278 1.000000e+00 1.355253e-20
> > > > > #> g4 0.1581139 0.6324555 1.355253e-20 1.000000e+00
> > > > >
> > > > > # these two results are equal up to floating-point precision
> > > > > dat.rcorr$P
> > > > > #>           g1         g2        g3         g4
> > > > > #> g1        NA 0.79797170 0.4070838 0.68452834
> > > > > #> g2 0.7979717         NA 0.4070838 0.06758329
> > > > > #> g3 0.4070838 0.40708382        NA 1.00000000
> > > > > #> g4 0.6845283 0.06758329 1.0000000         NA
> > > > > P
> > > > > #>           g1         g2        g3         g4
> > > > > #> g1        NA 0.79797170 0.4070838 0.68452834
> > > > > #> g2 0.7979717         NA 0.4070838 0.06758329
> > > > > #> g3 0.4070838 0.40708382        NA 1.00000000
> > > > > #> g4 0.6845283 0.06758329 1.0000000         NA
> > > > >
> > > > > You can put these two results in a list, like Hmisc::rcorr does.
> > > > >
> > > > > lst_rcorr <- list(r = r, P = P)
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Rui Barradas
> > > >
> > > > Hello,
> > > >
> > > > Here are two other ways.
> > > >
> > > > The first is equivalent to your long format attempt.
> > > >
> > > > library(tidyverse)
> > > >
> > > > # all pairs of column names; kept because cbind() below needs it again
> > > > tmp_df <- dat %>%
> > > >    names() %>%
> > > >    expand.grid(., .)
> > > >
> > > > tmp_df %>%
> > > >    apply(1L, \(x) {
> > > >      tmp <- dat[rowSums(dat[x]) > 0, ]
> > > >      tmp2 <- cor.test(tmp[[ x[1L] ]], tmp[[ x[2L] ]])
> > > >      c(tmp2$estimate, P = tmp2$p.value)
> > > >    }) %>%
> > > >    t() %>%
> > > >    as.data.frame() %>%
> > > >    cbind(tmp_df, .) %>%
> > > >    na.omit()
> > > >
> > > > The second is, in my opinion, the one that makes more sense. If you
> > > > look at the results, cor is symmetric (as it should be), so the
> > > > calculations are repeated. If you only run the cor.tests on the
> > > > combinations of names(dat) taken 2 at a time, it will save a lot of
> > > > work, and the output is a much smaller data.frame.
> > > >
> > > > cbind(
> > > >    combn(names(dat), 2L) %>%
> > > >      t() %>%
> > > >      as.data.frame(),
> > > >    combn(dat, 2L, FUN = \(d) {
> > > >      d2 <- d[rowSums(d) > 0, ]
> > > >      tmp2 <- cor.test(d2[[1L]], d2[[2L]])
> > > >      c(tmp2$estimate, P = tmp2$p.value)
> > > >    }) %>% t()
> > > > ) %>% na.omit()
> > > >
> > > > Hope this helps,
> > > >
> > > > Rui Barradas
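The original question asked for a square matrix like the one Hmisc::rcorr() returns, which the pivot_wider() attempt did not produce. A base-R sketch, assuming the zero-suppressed correlation is what is wanted (the helper name pair_cor is hypothetical): compute one value per pair with combn(), as in the second approach above, then write each value into both [i, j] and [j, i] of a matrix with 1 on the diagonal, as in the loops earlier in the thread. suppressWarnings() hides the zero-standard-deviation warning, so undefined pairs show up as NA. It needs R 4.1 or later for the \(x) lambda syntax.

```r
# Example data from the original post
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))

# Correlation of two 0/1 columns after dropping rows where both are 0;
# NA (warning suppressed) when a column is constant after the drop
pair_cor <- function(d) {
  d2 <- d[rowSums(d) > 0, , drop = FALSE]
  suppressWarnings(cor(d2[[1L]], d2[[2L]]))
}

p    <- ncol(dat)
prs  <- combn(p, 2L)                                 # column-index pairs, 2 x choose(p, 2)
vals <- combn(p, 2L, FUN = \(ij) pair_cor(dat[ij]))  # one value per pair, same order

# Symmetric square matrix with 1 on the diagonal
r <- diag(1, p)
dimnames(r) <- list(names(dat), names(dat))
r[t(prs)]        <- vals          # fills [i, j]
r[t(prs[2:1, ])] <- vals          # fills [j, i]
round(r, 2)
```

The off-diagonal values are the same ones Bert's somecors() reports for the example data (for instance -0.5 for g1/g2 and NA for g2/g4), now laid out as a square matrix.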

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
