> On May 6, 2016, at 2:12 PM, Lida Zeighami <lid.z...@gmail.com> wrote:
>
> Hi there,
>
> Is there any way to find highly correlated variables in a big matrix?
> For example, I have a 2000 x 5000 matrix called data, and I need to find
> the highly correlated variables among the columns. (I need 100 highly
> correlated variables out of the 5000 column variables.)
>
> I could calculate the correlation matrix and pick the highly correlated
> ones, but my problem is that I can only pick pairs of variables with high
> correlation, and there may be low correlation across the pairs. That is,
> in my 100 x 100 correlation matrix there are some pairs with low
> correlation, and I couldn't find 100 variables that are all highly
> correlated with each other.
>
> Would you please let me know if there is any way?

The rcorr function in Hmisc will return a list whose first element is a correlation matrix:

> library(Hmisc)
> base <- rnorm(100)
> test <- matrix(base + 0.2*rnorm(300), 100)
> rcorr(test)[[1]]
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.9631220 0.9721688
[2,] 0.9631220 1.0000000 0.9666564
[3,] 0.9721688 0.9666564 1.0000000

You can use which to find the locations meeting a criterion (or two):

> mycorr <- .Last.value
> which(mycorr > 0.97 & mycorr != 1, arr.ind=TRUE)
     row col
[1,]   3   1
[2,]   1   3

Each qualifying pair shows up twice (once as row/col and once as col/row) because the correlation matrix is symmetric.

-- 
David Winsemius
Alameda, CA, USA
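As a rough follow-up sketch (not part of the reply above), the same which() idea can be restricted to the upper triangle so that each pair is reported once, and then extended toward the original question of finding a group of variables that are all highly correlated with each other. The names dat, cutoff, pairs and group, the 0.9 threshold, and the simulated data are assumptions made for illustration. The grouping step is a simple greedy heuristic swapped in here, so it is not guaranteed to find the largest possible set. Base R's cor is used instead of rcorr to keep the example self-contained.

```r
# Sketch only: base R, illustrative names and simulated data (assumptions).
set.seed(1)
base <- rnorm(100)
dat  <- cbind(matrix(base + 0.2 * rnorm(300), 100),  # 3 columns sharing one signal
              matrix(rnorm(200), 100))               # 2 unrelated noise columns

cm <- cor(dat)                     # 5 x 5 Pearson correlation matrix
cutoff <- 0.9                      # assumed threshold for "high" correlation

# Upper triangle only: drops the diagonal and reports each pair once.
idx   <- which(cm > cutoff & upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(var1 = idx[, "row"], var2 = idx[, "col"], r = cm[idx])
pairs <- pairs[order(-pairs$r), ]  # strongest pairs first

# Greedy heuristic: grow a group in which every pair exceeds the cutoff.
high <- cm > cutoff
diag(high) <- FALSE
group <- which.max(rowSums(high))  # seed: variable with the most strong partners
repeat {
  # candidates must be highly correlated with every current group member
  ok   <- colSums(high[group, , drop = FALSE]) == length(group)
  cand <- setdiff(which(ok), group)
  if (length(cand) == 0) break
  # add the candidate with the highest mean correlation to the group
  best  <- cand[which.max(colMeans(cm[group, cand, drop = FALSE]))]
  group <- c(group, best)
}
group <- sort(group)

print(pairs)
print(group)
```

For a matrix like the 2000 x 5000 one in the question, cor(data) gives a 5000 x 5000 matrix, and the loop could be stopped once length(group) reaches the desired size (for example 100). If the group stalls below that size, lowering cutoff or trying a different seed variable are the obvious things to experiment with.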